How long will data be stored in CaltechDATA?

In most cases, indefinitely.  Any data that violate the Terms of Deposit or fail to meet minimum standards for retention may be withdrawn.  For example, we may eventually remove deposits that consist of unusable or obsolete files or that are inadequately described.  Larger files may be held to higher standards for retention.  The DOI for every record will be retained and will lead to a tombstone page listing the reason for withdrawal.

Who runs CaltechDATA?

CaltechDATA is a service of the Caltech Library as part of CODA.  All members of the Caltech community can upload research data for long-term preservation and public access.


Who can upload files to CaltechDATA?

Any member of the Caltech community with an access.caltech username can upload files to CaltechDATA.  For issues with your username or password, please contact IMSS.


What can I upload to CaltechDATA?

You can upload any digital files to CaltechDATA.  You can also directly import software from GitHub for long-term preservation.  Publications should be submitted to CaltechAUTHORS.


Is there anything that can't be deposited in CaltechDATA?

You must have the rights to any data you deposit.  Data cannot violate the publicity, privacy, or confidentiality rights of others or be covered by HIPAA or FERPA.  Read our Terms of Deposit for complete details.


How much data can I store on CaltechDATA?

There are currently no hard storage limits, but you should only deposit data that will be useful to others.  All data must be described in sufficient detail so that others can understand it.  If you're planning on uploading more than 500 GB of data, please contact us first.


Does CaltechDATA have any file size restrictions?

No, but file size can affect how quickly files are available.  Files under 1 GB will always be immediately available.  Files under 100 GB may be immediately available only if they are accessed at least once every four months.  Otherwise, files may be stored on Infrequent Access Storage (IAS) and may take up to 24 hours to retrieve.
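The availability rules above can be summarized in a short sketch.  The function name, the reading of "four months" as 120 days, and the decimal definition of a gigabyte are all assumptions made for illustration; this is not code used by CaltechDATA.

```python
GB = 10**9                 # assumption: decimal gigabytes
ACCESS_WINDOW_DAYS = 120   # assumption: "four months" approximated as 120 days


def expected_availability(size_bytes, days_since_last_access):
    """Return a rough availability label for a file, following the FAQ text.

    Hypothetical helper: the labels are hedged because the FAQ only says
    files "may" be immediately available or "may" be moved to IAS.
    """
    if size_bytes < 1 * GB:
        # Files under 1 GB are always immediately available.
        return "immediate"
    if size_bytes < 100 * GB and days_since_last_access <= ACCESS_WINDOW_DAYS:
        # Mid-sized files stay available if they are accessed regularly.
        return "likely immediate"
    # Everything else may be placed on Infrequent Access Storage.
    return "possibly IAS (up to 24 hours)"
```

For example, a 50 GB file that was last accessed 200 days ago falls into the last category under these assumptions.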


How do I access my CaltechDATA files in Infrequent Access Storage (IAS)?

Currently no files are stored on IAS.  When this feature becomes active, instructions will be provided.


Can I make my files in CaltechDATA private?

Not indefinitely.  You can embargo data for a specific period of time, but all data must be intended for public access at some point in the future.


Can I restrict data in CaltechDATA to specific users or groups?

No, CaltechDATA is an open repository.


Does my data need to be published before uploading to CaltechDATA?

No, you can upload unpublished data.  If you wish, you can embargo the data until publication.  You can easily link your data to publications by entering the DOI in the related publications field.


What metadata fields are used in CaltechDATA?

Our metadata is derived from the DataCite 4.0 schema, and is compliant with the Project Open Data 1.1 schema (used by US Federal Government Agencies).  Our metadata includes an explicit related publications list.
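As a rough illustration of what a DataCite-style record with a related publications list might look like, here is a minimal sketch.  The helper name build_metadata and the exact JSON key names are assumptions based on DataCite 4.0 property names; the keys CaltechDATA actually expects may differ, and the DOI in the usage note is a placeholder.

```python
def build_metadata(title, creator_names, publication_year, related_dois=()):
    """Build a minimal DataCite-style metadata dictionary.

    Hypothetical helper for illustration only.  Key names follow DataCite 4.0
    property names and are an assumption, not the confirmed CaltechDATA format.
    """
    if not title or not creator_names:
        raise ValueError("a title and at least one creator are required")
    return {
        "titles": [{"title": title}],
        "creators": [{"creatorName": name} for name in creator_names],
        "publisher": "CaltechDATA",
        "publicationYear": str(publication_year),
        "resourceType": {"resourceTypeGeneral": "Dataset"},
        # Each related publication is recorded by DOI.  "IsSupplementTo" is
        # one of the DataCite relation types; pick whichever fits your data.
        "relatedIdentifiers": [
            {
                "relatedIdentifier": doi,
                "relatedIdentifierType": "DOI",
                "relationType": "IsSupplementTo",
            }
            for doi in related_dois
        ],
    }
```

Calling build_metadata("Mineral spectra", ["Doe, Jane"], 2018, ["10.1234/example"]) returns a dictionary with one entry in the related identifiers list.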

What can I do with data in CaltechDATA?

You can use the read API to load files in CaltechDATA into another application.  For example, a demonstration web application lets you interactively plot two mineral spectra files, and the API code that generates the demo is available for reference.  Feel free to send us an email at data [at] if you'd like help integrating the repository with your application.

Is there any charge for storing data in CaltechDATA?

CaltechDATA storage is provided by the library at no cost in most cases.  Users planning on depositing more than 500 GB of data should email us at data [at] to discuss their requirements.

Can I make changes to a record in CaltechDATA?

Yes, once you're logged in you should see an edit button appear for all records you created.  This allows you to edit the metadata in the record.  If you need to change the files associated with a record, send us an email at data [at]

How do I use the CaltechDATA API to create records?

First, you need to generate an access token.  Log into CaltechDATA, and then click on your user menu (the person icon in the upper right hand corner). Then click "Applications".

Click on the "+ New Token" button in the Personal access tokens section.

Make up a name for your token and check all of the scope buttons.

Your token will be shown on screen.  Copy it down and store it somewhere secure.  It functions just like an account password.  

You can create records using our Python library caltechdata_api.  You can install the library by downloading the source code of the latest release, extracting the file, and navigating to the caltechdata_api-x.x.x directory using the command line.  Then type 'python setup.py install' to install the library.

To use the library, you'll need to set the access token you just created.  Type 'export TINDTOK=TOKEN', where TOKEN is replaced by your actual token - or use the token.bash script that is distributed with the library.
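If you write your own script around the library, it can help to check that the token is set before doing anything else.  The sketch below reads the TINDTOK environment variable described above; the helper name load_token is hypothetical and is not part of caltechdata_api.

```python
import os


def load_token(env=None, var="TINDTOK"):
    """Return the access token stored in an environment variable.

    Hypothetical helper, not part of caltechdata_api.  `env` defaults to the
    real process environment; passing a dict makes the function easy to test.
    """
    env = os.environ if env is None else env
    token = env.get(var, "").strip()
    if not token:
        raise RuntimeError(
            f"{var} is not set; run 'export {var}=TOKEN' with your real token first"
        )
    return token
```

Reading the token from the environment keeps it out of your source code, which matters because the token functions like a password.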

Some scripts used for creating more complex data records are located in the caltechdata_migrate repository.  An example that publishes a Mercurial repository to CaltechDATA is available at caltechdata_hg.

We're also here to help - just send us an email at data [at]

Does the CaltechDATA repository provide private links for journal peer reviewers to use before publication?

A peer review access password/link is the most requested feature for CaltechDATA.  Our development partner is working on implementing the feature, but we do not have a schedule for when it will be available.  

For now, we recommend submitting the files to CaltechDATA with an embargo.  This will generate a permanent DOI link that you can include in the paper text or references.  Reviewers will be able to follow the DOI and see that the files have been uploaded, but they will not be able to download them.  You should also upload the files to a file sharing service where you can generate a link (or password) to give the reviewers so they can access all the files.


How do I get a DOI badge on my GitHub repository?

Different formats of the GitHub badge will appear in the GitHub section of CaltechDATA once you make your first software release after activating GitHub preservation. 

If you want the DOI badge to appear with your first release, find your GitHub repo id at: (swap out your repo name).
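One way to look up a repo id from a script is sketched below.  It assumes the public GitHub REST API repositories endpoint, which returns an "id" field for a repository; this may not be the same address this FAQ originally pointed to.  The function names are hypothetical.

```python
import json
import urllib.request


def repo_api_url(owner, repo):
    """Build the GitHub REST API address for a repository."""
    return f"https://api.github.com/repos/{owner}/{repo}"


def repo_id_from_response(body):
    """Extract the numeric repo id from the JSON text returned by the API."""
    return json.loads(body)["id"]


def fetch_repo_id(owner, repo):
    """Look up the repo id over the network (requires internet access)."""
    with urllib.request.urlopen(repo_api_url(owner, repo)) as response:
        return repo_id_from_response(response.read())
```

Call fetch_repo_id with your own GitHub account or organization name and repository name to get the number to use in the badge.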

Copy the markdown snippet below to your README file and replace the two long number sequences with your GitHub repo id:


The badge will not appear until you make a release in GitHub and a DOI is assigned.

Who should be listed as an author in CaltechDATA?

The authors of a dataset are "The main researchers involved in producing the data", according to the DataCite specification.  The authors listed in CaltechDATA do not need to match the authors in a related publication.  You can list other individuals who assisted with a research project as contributors. 

How can I test making records in CaltechDATA?

You should not conduct testing in CaltechDATA, as submitting a record will automatically generate a DOI that cannot be deleted.  We have a separate instance for testing - send us an email at data [at] so we can set up a test account for you.