Automated Metadata Extraction
Automated metadata extraction is still not widely used in digital preservation workflows. However, it can not only improve the efficiency of time and resource management within preservation systems, but also alleviate the problems associated with the “metadata bottleneck”. The successful application of automated metadata extraction requires informed solutions based on a broad understanding and integration of existing methods and tools. In particular, solutions should identify weak links in the metadata collection workflow, so as to highlight the components requiring further development, and be firmly grounded in strict quality control at each stage of extraction.
This chapter aims to provide an overview of existing methods and tools, paying special attention to quality-related issues (in particular, the precision and recall of extracted metadata and the need for human intervention). The chapter presents examples of ingest processes and illustrates the essential role of automated metadata extraction as part of the ingest process.
This instalment also justifies the use of automated metadata extraction as part of a metadata enrichment scenario.
The chapter is relevant to pre-ingest and ingest within the digital preservation workflow.
Key Points
- Overview of methods for automated metadata extraction
  - file types
  - elements being extracted
  - quality metrics
- Case studies in the use of automated metadata extraction in the digital preservation lifecycle
  - Automated metadata extraction as part of ingest
  - Automated metadata extraction for metadata enrichment
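
The quality metrics named above, precision and recall, can be illustrated with a minimal sketch. It assumes that extracted metadata is compared, element by element, against a hand-checked reference record, and that a value counts as correct only on an exact match. The function name `precision_recall` and the sample records are hypothetical and do not come from any tool discussed in the chapter.

```python
def precision_recall(extracted, reference):
    """Score extracted metadata against a hand-checked reference record.

    Both arguments map element names to values. A pair counts as correct
    only if the element and its value match the reference exactly
    (an assumption; real evaluations may allow fuzzy matching).
    """
    extracted_pairs = set(extracted.items())
    reference_pairs = set(reference.items())
    correct = extracted_pairs & reference_pairs

    # Precision: share of extracted values that are correct.
    precision = len(correct) / len(extracted_pairs) if extracted_pairs else 0.0
    # Recall: share of reference values that the extractor found.
    recall = len(correct) / len(reference_pairs) if reference_pairs else 0.0
    return precision, recall


# Hypothetical records, for illustration only.
reference = {
    "title": "Annual Report 2008",
    "creator": "J. Smith",
    "date": "2008-05-01",
    "format": "application/pdf",
}
extracted = {
    "title": "Annual Report 2008",
    "creator": "Smith, J.",  # wrong form of the name, so not an exact match
    "format": "application/pdf",
}

precision, recall = precision_recall(extracted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the three extracted values are correct, and two of the four reference values were found. Low precision suggests that extracted values need human review before ingest; low recall suggests that elements are being missed and may need manual entry or a different extraction method.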
