Research Data FAQ
Caltech Library and Information Management Systems & Services (IMSS) have collaborated to bring you answers to these frequently asked questions about research data storage at Caltech. You can also find out more information about:
IMSS file storage - https://www.imss.caltech.edu/file-storage
CaltechDATA Information - https://www.library.caltech.edu/caltechdata
IMSS high performance computing (HPC) - http://www.hpc.caltech.edu/
Library Data Services - https://www.library.caltech.edu/data
I’m going to be collecting research data. Where should I put it?
Caltech IMSS provides Box.com cloud storage, with 50 GB of storage for each community member and 1 TB per group. Additional storage is available at extra cost. Box manages the storage, but you manage access to files. However, Box may not be fast or efficient enough for large amounts of data, and it has a maximum single-file size of 15 GB. If your data management needs are too large for Box, you may want to purchase local storage hardware such as a Network Attached Storage (NAS) device or storage array. IMSS and the Library can help you decide which option is best for your needs: go to help.caltech.edu and select IMSS/Data Storage & Backup, or email data [at] caltech.edu (subject: Research Data Storage Recommendation).
I need to analyze research data or run simulations.
The new Caltech High Performance Computing (HPC) cluster (http://www.hpc.caltech.edu/) is an excellent option. Your calculations will run on a state-of-the-art resource with local support. Your research group leader has to set up an account (http://www.hpc.caltech.edu/documentation/getting-started), and there is a charge depending on how much computing time you use. Groups get up to 30 TB of free data storage, although this storage is not backed up, so groups must store primary data elsewhere. National (off-campus) computing resources like XSEDE (https://www.xsede.org/) are also available by application and can provide additional computing resources at no charge.
I want to ensure that my data remains available for a long time (like a publication).
You can deposit your files in CaltechDATA (data.caltech.edu), the Library-run repository. CaltechDATA accepts files of any type and size, although you should email data [at] caltech.edu (subject: I want to store large files in CaltechDATA) if you’re planning to upload more than 500 GB of data. The Caltech Library is responsible for maintaining access to the files, and all data records are assigned a Digital Object Identifier (DOI) to provide permanent linking and simplify citation. You can make your files public immediately or after an embargo period.
I’m developing software and want to make sure it remains available for a long time.
The CaltechDATA repository (data.caltech.edu) can accept software and even has an integration with GitHub to automatically preserve software releases. Contact us at data [at] caltech.edu (subject: CaltechDATA GitHub Integration) with any questions on configuring the integration.
I want to share data with collaborators or reviewers.
To share research data files, you can use the file sharing options in Box.com, which also let you set a custom password for the files. Box.com is a complete cloud file service, so you can add collaborators who access files with their own Box.com credentials. Unlike services like Dropbox, collaborators can store files in a shared folder using your institutional Box storage allocation.
I’m collecting data on human subjects.
Talk to the Institutional Review Board (IRB) about all data collection and storage plans for your project (http://irb.caltech.edu/). Box.com, SharePoint, and OneDrive are certified by IMSS for personal data covered by HIPAA or FERPA regulations.