
What are Metadata Standards?

By Sarah Higgins, Aberystwyth University 

Published: February 2007

1. Metadata and Digital Curation

Metadata is the backbone of digital curation. Without it, a digital resource may be irretrievable, unidentifiable or unusable. Metadata is descriptive or contextual information which refers to or is associated with another object or resource. It usually takes the form of a structured set of elements which describe the information resource and assist users in identifying, locating and retrieving it, while facilitating content and access management.


2. Description of a Metadata Standard

Metadata is made up of a number of elements which can be categorised into the different functions they support. A metadata standard will normally support a number of defined functions, and will specify elements which make these possible. A metadata standard may support some or all of the following functions:

  • Descriptive Metadata enables identification, location and retrieval of information resources by users, often including the use of controlled vocabularies for classification and indexing and links to related resources.
  • Technical Metadata describes the technical processes used to produce a digital object, or required to use it.
  • Administrative Metadata is used to manage administrative aspects of the digital object such as intellectual property rights and acquisition. Administrative Metadata also documents information concerning the creation, alteration and version control of the metadata itself. This is sometimes known as meta-metadata!
  • Use Metadata manages user access, user tracking and multi-versioning information.
  • Preservation Metadata, amongst other things, documents actions which have been undertaken to preserve a digital resource such as migrations and checksum calculations.
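
To make these categories concrete, the following Python sketch groups a few illustrative elements by the function they support and shows the kind of checksum calculation that preservation metadata documents. It is only a sketch under stated assumptions: the element names, the grouping keys and the choice of SHA-256 are hypothetical and are not taken from any particular standard. Use metadata is omitted for brevity.

```python
import hashlib
from datetime import datetime, timezone


def sha256_of(content: bytes) -> str:
    """Return the SHA-256 digest of the content as a hex string."""
    return hashlib.sha256(content).hexdigest()


def build_record(content: bytes, title: str, rights: str) -> dict:
    """Group illustrative (hypothetical) elements by the function they support."""
    return {
        "descriptive": {"title": title},
        "technical": {"size_bytes": len(content)},
        "administrative": {
            "rights": rights,
            # Information about the metadata itself ("meta-metadata").
            "metadata_created": datetime.now(timezone.utc).isoformat(),
        },
        "preservation": {
            "checksum_algorithm": "SHA-256",
            "checksum": sha256_of(content),
        },
    }


def fixity_ok(content: bytes, record: dict) -> bool:
    """Re-compute the checksum and compare it with the recorded value."""
    return sha256_of(content) == record["preservation"]["checksum"]
```

A repository could call `fixity_ok` periodically: a mismatch between the stored checksum and a freshly computed one indicates that the object has changed since the metadata was created.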

Metadata standards often start as schemas developed by a particular user community to enable the best possible description of a resource type for their needs. The development of such schemas tends to be controlled through community consensus combined with formal processes for submission, approval and publishing of new elements. Metadata definitions are generally stored and maintained in a controlled manner by the user community. Published specifications are often held in a central location, such as a reference document on a website or in a Metadata Registry, which may be accessible from a website. These generally contain semantic definitions of the elements and standardised ways of representing them in digital formats such as databases and XML (eXtensible Markup Language). XML is rapidly becoming the de facto mark-up standard in many communities.

Semantic definitions include both Metadata Structure Standards and Metadata Content Standards. Structure standards ensure consistent structure to enable data sharing and searching, manage the creation process, record provenance and technical processes, and manage access permissions. Content standards ensure effective machine searches through consistent data entry and the inclusion of access points using controlled vocabularies such as authority files, thesauri or encoding schemes.
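
The role of a content standard can be illustrated with a short Python sketch that checks one metadata entry for consistent data entry and for use of a controlled vocabulary. The vocabulary terms, the field names and the choice of ISO 8601 dates are assumptions made purely for illustration; a real content standard would define its own rules.

```python
from datetime import date

# Hypothetical controlled vocabulary for the "subject" element.
SUBJECT_TERMS = {"Hydrology", "Meteorology", "Oceanography"}


def validate_entry(entry: dict) -> list:
    """Return a list of problems found in one metadata entry (empty if none)."""
    problems = []

    # Access points should come from the controlled vocabulary.
    if entry.get("subject") not in SUBJECT_TERMS:
        problems.append("subject is not a term from the controlled vocabulary")

    # Consistent data entry: dates are assumed to be ISO 8601 (YYYY-MM-DD).
    try:
        date.fromisoformat(str(entry.get("date", "")))
    except ValueError:
        problems.append("date is not in YYYY-MM-DD form")

    return problems
```

An entry such as `{"subject": "Hydrology", "date": "2007-02-01"}` passes both checks, whereas a free-text subject or a date typed as "February 2007" would be reported.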

Metadata Schemas develop in response to a community need and often gain wide acceptance, or are widely used, while still in development. Maintenance by nationally or internationally recognised centres of excellence, such as the Library of Congress, or support from a professional body increases both visibility and take-up, so that they become a community's standard schema. Some bodies, such as the Dublin Core Metadata Initiative or the Open Geospatial Consortium, actively develop schemas and ratify them as standards for their user community. Statutory bodies also develop schemas which, once internally ratified, may become the compulsory standard for metadata creation across the body. A number of schemas or standards are later ratified by professional, national or international bodies such as the ICA (International Council on Archives), BSI (British Standards Institution) and ISO (International Organization for Standardization).


3. Examples of Metadata Standards

There are a large number of metadata standards which address the needs of particular user communities. The first three profiled below primarily support discovery and access. They are progressively more complex to implement and more specialised to particular domains. The last, PREMIS, has been developed specifically to support digital preservation activities.

Dublin Core Metadata Element Set

The Dublin Core Metadata Element Set (ISO Standard 15836) is a basic standard which can be easily understood and implemented and, as such, is one of the best-known metadata standards. It was originally developed in 1995 as a core set of elements for describing the content of web pages and enabling their search and retrieval. The Dublin Core Metadata Element Set consists of 15 elements which address the most basic descriptive, administrative and technical elements required to uniquely identify a digital resource. The emphasis is now on supporting resource discovery across domains. The Dublin Core Metadata Initiative develops and maintains a suite of inter-related standards. It coordinates a number of working groups which collaborate to develop a metadata registry supporting extended and qualified profiles of Dublin Core, tailored to the needs of a number of different communities or functions, e.g. the Dublin Core Collection Description Application Profile (for describing whole collections) and the Dublin Core Library Application Profile (for describing published library holdings). Most resource discovery metadata standards can be mapped to the Dublin Core Metadata Element Set, enabling basic federated searching across metadata created using a number of different standards, without detracting from richer metadata held elsewhere. A draft specification for expressing Dublin Core in XML is available from the Dublin Core Metadata Initiative.
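
The idea of mapping a richer schema to Dublin Core and expressing the result in XML can be sketched in Python as follows. This is an illustration only: the local field names and the crosswalk table are hypothetical, the `metadata` wrapper element is an assumption, and the output is not claimed to follow the draft XML specification mentioned above. The namespace URI is the one used for the 15 Dublin Core elements.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical crosswalk from a local schema's fields to Dublin Core elements.
CROSSWALK = {
    "dataset_name": "title",
    "principal_investigator": "creator",
    "summary": "description",
    "release_date": "date",
    "doi": "identifier",
}


def to_dublin_core(local_record: dict) -> dict:
    """Keep only the local fields that have a Dublin Core equivalent."""
    return {
        CROSSWALK[name]: value
        for name, value in local_record.items()
        if name in CROSSWALK
    }


def dc_to_xml(dc_record: dict) -> str:
    """Serialise a flat Dublin Core record as namespaced XML."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for element, value in dc_record.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")
```

Fields with no Dublin Core equivalent are simply left out of the mapped record. The richer original stays where it is, which is how a simple crosswalk can support federated searching without detracting from the fuller metadata.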

e-GMS (e-Government Metadata Standard)

The UK Government is committed to enabling consistency across public sector information and providing better access to public services. As part of this commitment it has developed e-GMS, a metadata standard for government information resources, to enable consistency across government and public sector organisations. Its use is compulsory within the sector, and it is part of the wider e-GIF (the e-Government Interoperability Framework), which defines technical policies and specifications to enable interoperability and easy access to information across the sector. The standard is currently at version 3 (2004), although version 3.1 will be released soon and a complete overhaul to version 4 is planned. The 15 elements of Dublin Core make up the core of the standard, and it can be readily mapped to 5 other standards if interoperability across metadata records from other disciplines is required. The further 10 elements take account of records management functions, Data Protection and Freedom of Information legislation and basic preservation information. A cut-down version of the standard, e-GMS for websites (currently at version 3), is available for those creating metadata for websites.

ISO 19115: 2003(E) — Geographic Information: Metadata

ISO 19115 was developed by the geospatial community to address specific issues relating to both the description and the curation of spatial data. This complex metadata standard can be used for describing digital or physical objects or datasets which have a spatial dimension. There are over 400 elements in the Data Dictionary, which are divided into 14 metadata packages. Each package supports a particular function: some are specific to spatial data and some deal with general description and data curation issues. Abstract models written in UML (Unified Modeling Language) are provided for most of the packages to help the implementer understand how the elements interrelate. The standard also includes methodologies for creating application profiles, metadata extensions and hierarchical metadata, and provides implementation examples. Geospatial professionals have developed a number of profiles of this standard to fit particular uses. One of these is UK GEMINI, which defines an element set for discovery-level metadata. It is also compliant with e-GMS and was developed collaboratively by the UK Association for Geographic Information (AGI) and the Cabinet Office e-Government Unit.

The accompanying XML schema, ISO/CD TS 19139 Geographic information — Metadata — XML schema implementation, enables interoperable XML expression of ISO 19115-compliant metadata.

PREMIS: Data Dictionary for Preservation Metadata

The Preservation Metadata: Implementation Strategies (PREMIS) international working group was set up by OCLC and RLG in 2003 to define a core set of preservation metadata elements which could be applied broadly across the preservation community, and to examine a number of practical application issues. In 2005 the group published its final report, which included version 1 of the PREMIS Data Dictionary. The accompanying XML schema allows PREMIS-compliant metadata to be expressed consistently in XML. PREMIS is rapidly gaining community acceptance and is maintained by the Library of Congress. It won the Digital Preservation Coalition's 2005 Digital Preservation Award.

The PREMIS data model builds on the Open Archival Information System (OAIS) Reference Model (ISO 14721), and defines relationships between five kinds of entity involved in digital preservation activities: Intellectual Entities, Objects (divided into three types: representation, file and bitstream), Events, Agents and Rights. A total of 108 sub-entities and further qualifiers are defined for describing preservation activities of the latter four entities. Only 8 of these are mandatory. The PREMIS Data Dictionary's scope is restricted to the digital preservation activities of maintaining viability, renderability, understandability, authenticity and identity. It assumes metadata will be auto-generated as much as possible. Implementers are expected to use other applicable metadata standards to describe Intellectual Entities, the characteristics of Agents, Rights relating to access and/or distribution, details of media and hardware, and the business rules of a repository.
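
The relationships between the four entities that PREMIS describes in detail can be sketched with Python dataclasses. This is a deliberately simplified illustration: the class names, attribute names and the helper function are hypothetical and do not reproduce the semantic units defined in the PREMIS Data Dictionary. Intellectual Entities are left out because, as noted above, they are expected to be described using other standards.

```python
from dataclasses import dataclass, field
from typing import List

OBJECT_CATEGORIES = {"representation", "file", "bitstream"}


@dataclass
class Agent:
    agent_id: str
    name: str


@dataclass
class Event:
    event_id: str
    event_type: str  # e.g. "migration" or "fixity check"
    date_time: str   # ISO 8601 timestamp, kept as a string in this sketch
    agent_ids: List[str] = field(default_factory=list)


@dataclass
class Rights:
    rights_id: str
    statement: str


@dataclass
class PreservedObject:
    object_id: str
    category: str  # one of the three object types
    event_ids: List[str] = field(default_factory=list)
    rights_ids: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject anything other than the three object types.
        if self.category not in OBJECT_CATEGORIES:
            raise ValueError(f"unknown object category: {self.category}")


def events_for(obj: PreservedObject, events: List[Event]) -> List[Event]:
    """Return the events linked to an object, in their original order."""
    wanted = set(obj.event_ids)
    return [event for event in events if event.event_id in wanted]
```

Linking entities by identifier rather than nesting them keeps each record independent, so a single agent or rights statement can be shared by many objects and events.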

