
Automated Metadata Generation

Milena Dobreva, University of Malta, Yunhyong Kim, University of Glasgow, and Seamus Ross, University of Toronto

Automated metadata extraction is still not widely used in digital preservation workflows. Yet it can improve the efficiency of time and resource management within preservation systems, and it can also alleviate the problems associated with the “metadata bottleneck”. Applying automated metadata extraction successfully requires informed solutions based on a broad understanding and integration of existing methods and tools. In particular, solutions should identify weak links in the metadata collection workflow, so as to highlight the components that need further development, and should be firmly grounded in strict quality control at each stage of extraction.

This chapter provides an overview of existing methods and tools, paying special attention to quality-related issues (in particular, the precision and recall of extracted metadata and the need for human intervention). It presents examples of ingest processes and illustrates the essential role of automated metadata extraction as part of the ingest process.
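As a rough illustration of the two quality measures named above, the sketch below compares automatically extracted metadata with a manually checked reference set for the same object. Precision is the share of extracted values that are correct; recall is the share of reference values that the extractor found. The function name `precision_recall` and the sample element/value pairs are hypothetical and are not taken from the chapter.

```python
def precision_recall(extracted, reference):
    """Return (precision, recall) for extracted metadata against a reference set.

    Both arguments are iterables of (element, value) pairs, e.g. ("title", "...").
    Matching is exact; real evaluations often use looser matching rules.
    """
    extracted, reference = set(extracted), set(reference)
    correct = extracted & reference  # pairs the extractor got right
    precision = len(correct) / len(extracted) if extracted else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical data for a single document (illustrative values only).
extracted = {
    ("title", "Example Report"),
    ("author", "A. Author"),
    ("date", "2001-01-01"),      # wrong: not present in the reference set
}
reference = {
    ("title", "Example Report"),
    ("author", "A. Author"),
    ("author", "B. Author"),     # missed by the extractor
    ("author", "C. Author"),     # missed by the extractor
}

precision, recall = precision_recall(extracted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Low recall of this kind indicates values that would have to be supplied by a person, which is one way of estimating how much human intervention an extraction step still needs.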

This instalment also makes the case for using automated metadata extraction as part of a metadata enrichment scenario.

The chapter is relevant to the pre-ingest and ingest stages of the digital preservation workflow.

Download the Automated Metadata Generation instalment 

Key Points

  • Overview of methods for automated metadata extraction
    • file types
    • elements being extracted
    • quality metrics
  • Case studies in the use of automated metadata extraction in the digital preservation lifecycle
  • Automated metadata extraction as part of ingest
  • Automated metadata extraction for metadata enrichment