The library helps campus research labs and centers manage and publish their research data.

CaltechDATA

Caltech Library offers a free managed data storage and sharing service at https://data.caltech.edu. Find out more information or read the CaltechDATA FAQ.

 

Data Consultations

Library staff can help you or your research group work through challenges associated with research data.  We can provide:

  • Data Management Plan (DMP) assistance
  • Capacity planning
  • Storage technology recommendations
  • Suggestions for data processing and visualization
  • Options for long-term archiving and data sharing

Contact us at data [at] library.caltech.edu or call x3827 to schedule a consultation appointment.

Data Preservation and DOIs

Our standard data preservation and DOI (permanent identifier) services are provided through CaltechDATA. We also offer services (at an additional cost) for preserving large volumes of data (> 500 GB). Contact us at data [at] library.caltech.edu to discuss the options.

Caltech Library also manages custom DOIs for groups on campus. For more information, see our DOI page.

Interactive Visualizations

Caltech Library is exploring how to support interactive research figures. For example, our Geology Thesis Map shows the location of all the Geology and Planetary Science division theses that have content in CaltechDATA. This map is interactive but is built from standard HTML, CSS, and JavaScript using the Bokeh library. We can host your interactive visualizations as long as they can be bundled as static web content; just contact us at data [at] library.caltech.edu.

Other Resources

Research Data Management LibGuide - Library-curated resources on data management

Data Storage Hardware and Services

Network Attached Storage

Network attached storage (NAS) devices are boxes that contain both storage and the hardware needed to manage it. They can be thought of as small computers with lots of storage. Using a NAS to store your research data has many benefits. Because they are internet accessible, it is easy to centralize data collected on different instruments and to access data for later analysis. Most models contain multiple hard drives and can be set up with RAID to protect against data loss if a hard drive fails. NAS devices are generally affordable ($300-$1500 depending on the storage space needed) and are usually cheaper than purchasing cloud storage over 4-5 years. We currently provide instructions for setting up Synology NAS devices, but many manufacturers make similar products. If you have a different NAS, let us know at data [at] caltech.edu and we can work on putting together setup instructions.
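When sizing a NAS, it helps to remember that RAID trades raw capacity for protection against drive failure. The sketch below estimates usable space for the common RAID levels using the standard textbook formulas. The function name is hypothetical, and actual usable space on a given device will be somewhat lower because of filesystem overhead and vendor-specific layouts.

```python
def usable_capacity_tb(drive_sizes_tb, raid_level):
    """Estimate usable capacity (TB) for common RAID levels.

    Uses the standard formulas based on the smallest drive in the array.
    This is an approximation; it ignores filesystem overhead.
    """
    n = len(drive_sizes_tb)
    smallest = min(drive_sizes_tb)

    if raid_level == 0:        # striping only, no redundancy
        minimum, usable = 2, n * smallest
    elif raid_level == 1:      # every drive holds a full copy
        minimum, usable = 2, smallest
    elif raid_level == 5:      # one drive's worth of parity
        minimum, usable = 3, (n - 1) * smallest
    elif raid_level == 6:      # two drives' worth of parity
        minimum, usable = 4, (n - 2) * smallest
    elif raid_level == 10:     # striped mirrors, needs an even drive count
        if n % 2 != 0:
            raise ValueError("RAID 10 needs an even number of drives")
        minimum, usable = 4, (n // 2) * smallest
    else:
        raise ValueError(f"unsupported RAID level: {raid_level}")

    if n < minimum:
        raise ValueError(f"RAID {raid_level} needs at least {minimum} drives")
    return usable


if __name__ == "__main__":
    drives = [4, 4, 4, 4]  # four 4 TB drives (example values)
    for level in (0, 5, 6, 10):
        print(f"RAID {level}: {usable_capacity_tb(drives, level)} TB usable")
```

For example, four 4 TB drives give 12 TB usable in RAID 5 (survives one drive failure) and 8 TB in RAID 6 (survives two).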

Instructions for setting up a Synology NAS

Interested in trying out a NAS? Send an email to data [at] caltech.edu to get access to our demo NAS!

Box.com

Caltech IMSS manages a campus site license for Box.com. Students get 50 GB of free storage, and faculty and staff can request unlimited storage at https://help.caltech.edu (request type IMSS > Data Storage & Backup > Request Additional File Storage > Box). Box.com is a good resource for storing backup copies of data and syncing between computers, but it should not be used as a primary data storage location. Continued availability of Box.com depends on IMSS and Box. A comparison of IMSS-provided file storage systems is available at https://imss.caltech.edu/node/941#comparison

Have questions about other storage services? Send an email to data [at] caltech.edu.

High Performance Computing Resources

Caltech Resources

IMSS maintains high performance clusters for data analysis and visualization for specific campus groups, as shown in this (possibly out of date) list. You should contact help-hpc [at] caltech.edu if you have questions about setting up a cluster on campus.

IMSS also contracts with a cloud-based vendor that can set up virtual computing clusters. You pay for how much you use, charged by the second. You also get 500 free core hours to test the service. Find more information here.
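To get a rough feel for per-second billing with a free allowance, the following sketch estimates the charge for a single job. It assumes that usage is metered in core-seconds and that the 500 free core hours are consumed before any charge applies. The price per core hour is a hypothetical placeholder, not the vendor's actual rate.

```python
FREE_CORE_HOURS = 500  # free test allowance mentioned above


def estimated_charge(cores, runtime_seconds, price_per_core_hour,
                     free_core_hours=FREE_CORE_HOURS):
    """Estimate the charge for one job under per-second billing.

    price_per_core_hour is a placeholder the caller must supply;
    the real rate has to come from the vendor.
    """
    core_hours = cores * runtime_seconds / 3600
    billable_core_hours = max(0.0, core_hours - free_core_hours)
    return billable_core_hours * price_per_core_hour


if __name__ == "__main__":
    # 100 cores for 10 hours = 1000 core hours, 500 of which are free.
    print(estimated_charge(100, 10 * 3600, price_per_core_hour=0.05))
```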

XSEDE

XSEDE is a National Science Foundation funded nationwide high performance computing resource. Researchers can request time on more than 10 national supercomputers, visualization resources, storage systems, and scientific gateways, also listed below. A separate NSF grant is not required to gain access to these resources. Caltech users interested in testing one of these systems can contact the Caltech Campus Champion, Tom Morrell at tmorrell [at] caltech.edu, for trial access. Faculty, Postdoctoral Researchers, and NSF Graduate Research Fellows can submit a startup allocation request, which provides up to 50,000 compute hours to test XSEDE resources. The startup allocation request process is very straightforward and enables users to quickly access computing resources. After one year or when the startup resources are exhausted, researchers can submit a more thorough research allocation proposal.
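As a planning aid, the sketch below estimates how many runs of a given size fit within a startup allocation. It assumes that one compute hour equals one core for one hour; actual accounting may differ between resources, so treat the result as a rough guide. The function names are hypothetical.

```python
STARTUP_ALLOCATION_HOURS = 50_000  # upper limit stated above


def core_hours(nodes, cores_per_node, wall_hours):
    """Core hours consumed by one run (assumes 1 compute hour = 1 core hour)."""
    return nodes * cores_per_node * wall_hours


def runs_within_allocation(nodes, cores_per_node, wall_hours,
                           allocation=STARTUP_ALLOCATION_HOURS):
    """Number of complete runs that fit in the allocation."""
    return allocation // core_hours(nodes, cores_per_node, wall_hours)


if __name__ == "__main__":
    # Example: 4 nodes with 24 cores each, running for 48 hours per run.
    per_run = core_hours(4, 24, 48)
    print(per_run, "core hours per run")
    print(runs_within_allocation(4, 24, 48), "runs fit in a startup allocation")
```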

Current XSEDE Resources (12/2017)

Each entry lists the resource name, host, node specifications, and maximum queue time.

Traditional

  • Comet, SDSC, 24 core; 128 GB RAM; 320 GB SSD, 2 days
  • Stampede2, TACC, select KNL or SKX: 68 or 48 core; 96 or 192 GB RAM; 107 or 144 GB SSD, 4 days
  • SuperMIC, LSU, 20 core; 64 GB RAM; 500 GB HDD, 3 days
  • Bridges, PSC, 28 core; 128 GB RAM; 8 TB storage, 2 days (can also schedule segments of a node)

GPU

  • Comet, SDSC, NVIDIA K80, 2 days
  • Bridges, PSC, NVIDIA K80 or P100, 2 days
  • XStream, Stanford, 12 core; 4 Kepler cores; 64 GB RAM; 480 GB SSD, 7 days

Large Memory

  • Bridges Large, PSC, 3 or 12 TB RAM; 16 or 64 TB storage, 4 days
  • Comet, SDSC, 64 core; 1.5 TB RAM; 400 GB SSD, 2 days

Virtualized/Distributed

  • Jetstream, Indiana/TACC, Can spin up various sized custom imaged environments
  • Open Science Grid, Distributed computing for smaller jobs (single thread, < 2 GB memory, 1-12 hour execution, <10 GB storage)
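The Open Science Grid limits listed above can be turned into a quick self-check before deciding where to run a job. This is a minimal sketch: the `Job` class and `osg_mismatches` function are hypothetical names, and the thresholds are taken directly from the list above rather than from any official scheduler policy.

```python
from dataclasses import dataclass


@dataclass
class Job:
    threads: int
    memory_gb: float
    runtime_hours: float
    storage_gb: float


def osg_mismatches(job):
    """Return the reasons a job falls outside the Open Science Grid profile.

    An empty list means the job matches the limits listed above
    (single thread, < 2 GB memory, 1-12 hour execution, < 10 GB storage).
    """
    problems = []
    if job.threads != 1:
        problems.append("uses more than one thread")
    if job.memory_gb >= 2:
        problems.append("needs 2 GB of memory or more")
    if not 1 <= job.runtime_hours <= 12:
        problems.append("runtime is outside the 1-12 hour range")
    if job.storage_gb >= 10:
        problems.append("needs 10 GB of storage or more")
    return problems


if __name__ == "__main__":
    small = Job(threads=1, memory_gb=1.5, runtime_hours=6, storage_gb=5)
    large = Job(threads=8, memory_gb=16, runtime_hours=24, storage_gb=50)
    print(osg_mismatches(small))  # no mismatches
    print(osg_mismatches(large))  # one entry per limit exceeded
```

A job that fails several of these checks is probably a better match for one of the traditional or large memory systems.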

CyVerse

CyVerse (formerly iPlant) is another NSF-funded cyberinfrastructure project that provides computing resources, primarily targeted at life science researchers. They offer free access to Atmosphere, a cloud-based computing resource where you can spin up computing resources with specific images. A basic allocation is available by registering, and additional allocations require an application.

Have questions about high performance computing resources? Send an email to data [at] caltech.edu.