Friday, March 9, 2018


CaltechDATA has supported automatic preservation of GitHub software repositories since launch, so anyone at Caltech can get a DOI (a permanent identifier) for their software project and have Caltech Library handle long-term preservation. However, most GitHub repositories do not include clear metadata such as authors, affiliations, or ORCID identifiers. CaltechDATA now supports CodeMeta, a new standard format for software metadata. If you include a codemeta.json file in your GitHub repo, your full author list, keywords, and license will be listed in CaltechDATA and registered with your DOI.
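To give a sense of what a codemeta.json file might contain, here is a minimal sketch in Python that builds one. The field names follow our reading of the CodeMeta 2.0 vocabulary; the `build_codemeta` helper, the project name, the author, and the all-zero ORCID are hypothetical placeholders, so check the CodeMeta documentation before relying on the exact layout.

```python
import json


def build_codemeta(name, authors, keywords, license_url):
    """Return a CodeMeta-style dictionary for a software project.

    `authors` is a list of dicts with the keys "given", "family",
    "orcid", and "affiliation" (a hypothetical input shape chosen
    for this example).
    """
    return {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
        "name": name,
        "author": [
            {
                "@type": "Person",
                "givenName": a["given"],
                "familyName": a["family"],
                "@id": "https://orcid.org/" + a["orcid"],
                "affiliation": {
                    "@type": "Organization",
                    "name": a["affiliation"],
                },
            }
            for a in authors
        ],
        "keywords": keywords,
        "license": license_url,
    }


# Placeholder values only; replace them with your own project details.
example = build_codemeta(
    name="example-project",
    authors=[
        {
            "given": "Ada",
            "family": "Example",
            "orcid": "0000-0000-0000-0000",  # placeholder, not a real ORCID
            "affiliation": "Caltech",
        }
    ],
    keywords=["metadata", "software"],
    license_url="https://spdx.org/licenses/BSD-3-Clause",
)

codemeta_text = json.dumps(example, indent=2)
print(codemeta_text)
```

Saving the printed text as codemeta.json at the top level of a repository is one way to provide the metadata described above.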

This improvement is powered by ames, a Python package for automating metadata changes developed at Caltech Library. Every 5 minutes, ames harvests all the GitHub-created records in CaltechDATA and stores them using dataset (our lightweight data storage package). Each record is then checked for a codemeta.json file. If a CodeMeta file is found, the relevant metadata is extracted and added to the CaltechDATA record and DOI. We currently support the authors, keywords, and license fields, and more will be added as a community of practice develops. We’re also exploring better ways to generate CodeMeta files as part of the software release process.
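The extraction step could look roughly like the sketch below. This is not the ames code: `extract_metadata`, `as_list`, and the shape of the returned dictionary are hypothetical names chosen for illustration, and the sketch assumes the codemeta.json file has already been parsed into a Python dictionary. It pulls out only the three fields mentioned above.

```python
def as_list(value):
    """Normalize a JSON value that may be missing, single, or a list."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def extract_metadata(codemeta):
    """Pull authors, keywords, and license out of a parsed CodeMeta dict."""
    authors = []
    for person in as_list(codemeta.get("author")):
        if not isinstance(person, dict):
            # Some files may list an author as a plain string.
            authors.append({"name": str(person)})
            continue
        parts = [person.get("givenName"), person.get("familyName")]
        name = " ".join(p for p in parts if p) or person.get("name", "")
        entry = {"name": name}
        if person.get("@id"):
            entry["id"] = person["@id"]
        affiliation = person.get("affiliation")
        if isinstance(affiliation, dict):
            affiliation = affiliation.get("name")
        if affiliation:
            entry["affiliation"] = affiliation
        authors.append(entry)

    keywords = codemeta.get("keywords")
    if isinstance(keywords, str):
        # Accept a comma-separated string as well as a list.
        keywords = [k.strip() for k in keywords.split(",") if k.strip()]
    else:
        keywords = as_list(keywords)

    return {
        "authors": authors,
        "keywords": keywords,
        "license": codemeta.get("license"),
    }


# Made-up sample input for illustration.
sample = {
    "author": {
        "givenName": "Ada",
        "familyName": "Example",
        "affiliation": {"name": "Caltech"},
    },
    "keywords": "metadata, software",
    "license": "https://spdx.org/licenses/BSD-3-Clause",
}
result = extract_metadata(sample)
print(result)
```

Normalizing single values and lists up front keeps the rest of the function simple, since JSON-LD files often write a one-element list as a bare value.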