Preservation Policy

Data Backup & Preservation Terms

Harvard Library Technical Services (LTS), in collaboration with Harvard University Information Technology (HUIT) and the Institute for Quantitative Social Science (IQSS), hosts the Harvard Dataverse repository using Amazon Web Services and S3, and maintains a full backup of all data and directories using Amazon Glacier. Additionally, FAS Research Computing at Harvard University keeps a backup of all Harvard Dataverse data and directories. This means that there are always full, recent copies of the Harvard Dataverse repository at multiple locations.

Backup Schedule

All research data files in the Harvard Dataverse repository are stored in an Amazon S3 bucket. All content placed in that bucket is immediately replicated to a second S3 bucket in a different, isolated availability zone. After seven days in this second bucket, all files are moved into Glacier, Amazon's cloud data archiving service for long-term backup storage.
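The seven-day move to Glacier described above could be expressed as an S3 lifecycle rule. The following is a minimal sketch only, not the repository's actual configuration: the rule ID and the empty prefix are assumptions made for illustration, and only the seven-day delay comes from this policy.

```python
import json

# Only the seven-day delay is taken from the policy text.
DAYS_BEFORE_GLACIER = 7


def build_lifecycle_configuration(days: int = DAYS_BEFORE_GLACIER) -> dict:
    """Return a lifecycle configuration that moves every object to Glacier."""
    return {
        "Rules": [
            {
                "ID": "move-to-glacier",      # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},     # assumption: rule covers the whole bucket
                "Transitions": [
                    {"Days": days, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


if __name__ == "__main__":
    # Print the configuration so it can be reviewed before being applied.
    print(json.dumps(build_lifecycle_configuration(), indent=2))
```

A dictionary of this shape is what the boto3 S3 client accepts as the LifecycleConfiguration argument of put_bucket_lifecycle_configuration; the sketch deliberately stops short of calling AWS.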

Additionally, all of Harvard Dataverse's application/system files and databases are automatically backed up daily to a data center run by FAS Research Computing at Harvard University.

Policy and Procedures for Digital Archiving

Harvard University’s policy for digital archiving is part of the institution’s general mission to preserve all of its archival collections and to ensure their availability for current and future use. More specifically, this policy for preserving our digital data collections is meant to ensure continued access to born-digital and digitized data, to ensure their authenticity, and to maintain data quality using the best digital archival practices.

Harvard University (in particular with support from IQSS) commits to best archival practice to ensure that all materials deposited in the archive remain available and usable. This includes: preserving previously deposited versions of materials; deaccessioning (removing) datasets only when legally compelled; maintaining public access to the materials; regularly reviewing risks to materials; and reformatting materials, where necessary and possible, to avoid format obsolescence.

Preservation of Materials Deposited in the Harvard Dataverse

Harvard University supports permanent bit-level preservation of all materials directly deposited in the Harvard Dataverse. In addition, all social science data deposited in the Harvard Dataverse that is made publicly available is replicated by the Data-PASS partners for permanent preservation by the partnership.

In addition to Harvard University’s commitment to archiving and long-term access for all data published in the Harvard Dataverse, the Harvard Dataverse takes data publication very seriously (see Joint Declaration of Data Citation Principles). It encourages good curation practices through support of standards-based metadata schemas, proper documentation, and automatic extraction of metadata from FITS and tabular files to enable data discovery and reuse. Tabular files deposited in the Harvard Dataverse are reformatted into simple, open-format text files (.tab format), with variable-level XML metadata based on the Data Documentation Initiative (DDI), to ensure long-term preservation of the data. Also, once a dataset is published, the repository guarantees archiving and long-term access to that dataset with a DOI persistent identifier provided by DataCite.

To ensure long-term accessibility of datasets in the Harvard Dataverse, a published dataset cannot be unpublished and can be deaccessioned only under extreme circumstances, such as a legal requirement to destroy that dataset. Even in these circumstances, a tombstone landing page with the basic citation metadata will always be accessible to the public through the persistent URL (Handle or DOI) provided in the citation for that dataset. Users will not be able to see any of the files or additional metadata that were available before deaccession.
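As an illustration of how a persistent identifier from a citation maps to the persistent URL mentioned above, here is a minimal sketch. It assumes identifiers written with a "doi:" or "hdl:" prefix and uses the public doi.org and hdl.handle.net resolvers; the function name and the sample identifier are hypothetical.

```python
def persistent_url(identifier: str) -> str:
    """Turn a 'doi:...' or 'hdl:...' identifier into a resolver URL."""
    scheme, separator, value = identifier.strip().partition(":")
    if not separator or not value:
        raise ValueError(f"not a prefixed identifier: {identifier!r}")

    scheme = scheme.lower()
    if scheme == "doi":
        return "https://doi.org/" + value
    if scheme == "hdl":
        return "https://hdl.handle.net/" + value
    raise ValueError(f"unsupported identifier scheme: {scheme!r}")


if __name__ == "__main__":
    # Hypothetical identifier, used only to show the URL shape.
    print(persistent_url("doi:10.7910/DVN/EXAMPLE"))
```

For a deaccessioned dataset, following such a URL would lead to the tombstone landing page rather than to the files.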

Because some datasets in the Harvard Dataverse are self-curated, owners or distributors of individual datasets control the selection of materials, documentation, access policies, and data use agreements for their datasets. Questions about finding and using data distributed by others in the Harvard Dataverse should therefore generally be referred to the individual dataset owners.

Changes to this Preservation Policy

Harvard Dataverse may revise this preservation policy at its sole discretion. Please check this page regularly for our current practices. If you have any questions about this preservation policy, the practices of this site, or your dealings with this site, please contact support@dataverse.harvard.edu.

This policy was last modified: 01/15/2020.