By Sarah Higgins, University of Glasgow
Metadata is the backbone of digital curation. Without it a digital resource may be irretrievable, unidentifiable or unusable. Metadata is descriptive or contextual information which refers to or is associated with another object or resource. It usually takes the form of a structured set of elements which describe the information resource and help users to identify, locate and retrieve it, while facilitating content and access management. Metadata standards formalise the element structure to ensure that the aims of a user community can be fulfilled. More information concerning the nature of a metadata standard and how to implement one can be found in DCC Standards Watch 1: What are Metadata Standards? and DCC Standards Watch 2: Using Metadata Standards.
The Preservation Metadata: Implementation Strategies (PREMIS) international working group was set up by OCLC and RLG in 2003 to define a core set of preservation metadata elements which could be applied broadly across the preservation community, and to examine a number of practical application issues. In 2005 the group published its final report, which included version 1 of the PREMIS Data Dictionary, a metadata set for long-term digital preservation, and accompanying XML schemas, which allow PREMIS-compliant metadata to be expressed consistently in XML.
The PREMIS Data Dictionary's scope is restricted to the following digital preservation activities: maintaining viability, renderability, understandability, authenticity and identity. It assumes preservation metadata will be auto-generated as much as possible and that other suitable descriptive, technical and packaging metadata standards will be used in conjunction with PREMIS.
The PREMIS Data Dictionary is rapidly gaining community acceptance, and its maintenance is coordinated by the Library of Congress through a Managing Agency, an Editorial Committee and an Implementers' Group. It won the 2005 Digital Preservation Award from the Digital Preservation Coalition and the 2006 Preservation Publication Award from the Society of American Archivists. The PREMIS Schema has been endorsed by the Metadata Encoding and Transmission Standard (METS) editorial board for use with METS.
The PREMIS Data Dictionary's data model builds on the Open Archival Information System (OAIS) Reference Model (ISO 14721), and defines relationships between five entities: Intellectual Entities, Objects, Events, Agents and Rights.
The PREMIS Data Dictionary defines semantic units and semantic components to describe properties of the latter four entities. Eight of the semantic units defined are mandatory, along with a number of their semantic components. These are regarded as the minimum information required for the digital preservation of a digital object. A number of the other semantic components defined become mandatory if the semantic unit in which they are contained is used in the application.
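The distinction between mandatory semantic units and components that become mandatory only when their parent unit is used can be illustrated with a short Python sketch. The rule table below is a hypothetical subset written for illustration; the unit and component names are assumptions and should not be read as a transcription of the Data Dictionary.

```python
# Hypothetical subset of rules; names are illustrative only.
# Each semantic unit maps to:
#   (is the unit mandatory?, components required whenever the unit is present)
RULES = {
    "objectIdentifier": (True, ["objectIdentifierType", "objectIdentifierValue"]),
    "storage": (False, ["contentLocation"]),
}


def missing_items(record):
    """Return a sorted list of missing mandatory units and components."""
    problems = []
    for unit, (mandatory, required_components) in RULES.items():
        if unit not in record:
            if mandatory:
                problems.append(unit)
            continue  # an absent optional unit imposes no further requirements
        for component in required_components:
            if component not in record[unit]:
                problems.append(f"{unit}.{component}")
    return sorted(problems)


complete = {
    "objectIdentifier": {
        "objectIdentifierType": "local",
        "objectIdentifierValue": "obj-001",
    }
}
partial = {"storage": {}}

print(missing_items(complete))  # []
print(missing_items(partial))   # ['objectIdentifier', 'storage.contentLocation']
```

In the second record the optional unit is present, so its component becomes required, while the mandatory unit is reported as missing outright.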
Implementers are expected to use other applicable metadata standards in conjunction with the PREMIS Data Dictionary to describe: Intellectual Entities, the characteristics of Agents, technical metadata for file formats, rights relating to access and/or distribution, details of media and hardware, the business rules of a repository and information concerning the creation of the PREMIS record. Very few values for semantic units are defined by the Data Dictionary, but the use of controlled vocabularies is recommended and the use of ISO 8601:2004 for formatting dates is mandated.
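As an illustration of the date formatting requirement, the following minimal Python sketch produces ISO 8601 strings using only the standard library. The helper names and the sample dates are arbitrary, and the decision to reject datetimes without a time zone is an assumption of this sketch rather than a rule taken from the Data Dictionary.

```python
from datetime import date, datetime, timezone


def iso_date(d):
    """Format a date as an ISO 8601 calendar date (YYYY-MM-DD)."""
    return d.isoformat()


def iso_datetime(dt):
    """Format a timezone-aware datetime as ISO 8601 with a UTC offset."""
    if dt.tzinfo is None:
        # Assumption of this sketch: ambiguous local times are rejected.
        raise ValueError("a timezone-aware datetime is required")
    return dt.isoformat(timespec="seconds")


print(iso_date(date(2005, 5, 17)))
# 2005-05-17
print(iso_datetime(datetime(2005, 5, 17, 14, 30, 0, tzinfo=timezone.utc)))
# 2005-05-17T14:30:00+00:00
```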
The PREMIS XML schema is made up of individual schemas for the four entities which are in scope: Objects, Events, Agents and Rights. This allows each to be used independently of the others. A container schema is available if an implementation requires the PREMIS metadata to be kept together. At least one object must be described if the container schema is used.
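The container rule can be sketched in Python with the standard library's ElementTree. This is an illustration only: the namespace URI is a placeholder, and the element names (`premis`, `object`, `event`, `identifier`) and the `build_container` helper are assumptions made for the example, so the published schemas should be consulted for the actual structure.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace for illustration only; substitute the namespace of
# the PREMIS schema version actually in use.
NS = "urn:example:premis-sketch"


def build_container(object_ids, event_ids=()):
    """Build a container element holding object and event descriptions.

    Mirrors the rule that a container must describe at least one object.
    """
    if not object_ids:
        raise ValueError("a container must describe at least one object")
    root = ET.Element(f"{{{NS}}}premis")
    for tag, ids in (("object", object_ids), ("event", event_ids)):
        for value in ids:
            entity = ET.SubElement(root, f"{{{NS}}}{tag}")
            # Hypothetical child element carrying the identifier value.
            ET.SubElement(entity, f"{{{NS}}}identifier").text = value
    return root


container = build_container(["obj-001"], ["evt-001"])
print(ET.tostring(container, encoding="unicode"))
```

Calling `build_container([])` raises a `ValueError`, which is how this sketch enforces the requirement for at least one object. A real implementation would more likely validate the serialised XML against the published schema.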
The PREMIS Implementers' Group (PIG) maintains a wiki for sharing documents (known as the Pig Pen), an implementation registry and a discussion list. Implementers are encouraged to share their experiences and feed these back into the ongoing revision process.