Metadata, metadata, metadata

15 April, 2013

Metadata emerged as the underlying theme of this session on Data Repositories, Portals and Catalogues at the JISC MRD Achievements, Challenges and Recommendations workshop, held in Birmingham on 25-26 March 2013. Programme

Tom Ensom kicked off the session by describing two technical outputs from one strand of work by the UKDA-led Research Data @ Essex project, carried out in close collaboration with support services at the University of Essex: a metadata profile and the Recollect plug-in for the EPrints software. Presentation

The project took a metadata model from the IDMB project as a starting point, adopting its three-tier model of metadata. Another influence was the consensus that seemed to be emerging around the DataCite core metadata elements; however, the project recognised a gap in consensus around the additional metadata that would be needed.

The three-tier model defines three levels of detail for metadata: a Core level covering the basic required metadata, a second Detail level that expands on it with further elements, and a third Description level that flexibly accommodates discipline-specific metadata requirements.

The three-tier metadata model adopted by Research Data @ Essex
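The tiering described above can be sketched in code. The field names below are illustrative assumptions, not the project's actual profile; the point is simply that the Core tier carries the required minimum while the other tiers add progressively richer, more discipline-specific detail.

```python
# A sketch of a three-tier metadata record: a Core tier with basic required
# elements, a Detail tier with further elements, and a Description tier for
# discipline-specific extensions. All field names are hypothetical.
REQUIRED_CORE_FIELDS = {"title", "creator", "publisher", "publication_year"}

def validate_core(record: dict) -> bool:
    """Check that the Core tier carries the basic required metadata."""
    core = record.get("core", {})
    return REQUIRED_CORE_FIELDS.issubset(core)

record = {
    "core": {                      # tier 1: basic required metadata
        "title": "Example survey dataset",
        "creator": "Doe, J.",
        "publisher": "University of Essex",
        "publication_year": 2013,
    },
    "detail": {                    # tier 2: further descriptive metadata
        "temporal_coverage": "2010-2012",
        "rights": "CC-BY",
    },
    "description": {               # tier 3: discipline-specific metadata
        "sampling_procedure": "stratified random sample",
    },
}
```

A record missing any Core element would fail validation, while Detail and Description remain optional and extensible.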

In developing the metadata profile underpinning the pilot EPrints repository instance deployed by the project, the project also considered mappings between different schemas, including those from INSPIRE and DDI 2.1.

The resulting metadata profile is described by the project.
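One common way to express such schema mappings is a field-level crosswalk. The sketch below is illustrative only; the element names are placeholders loosely modelled on DDI- and INSPIRE-style naming, not the project's actual mapping.

```python
# Hypothetical crosswalk from profile fields to other schemas' element names.
# The target element names are illustrative assumptions.
CROSSWALK = {
    "title":    {"ddi": "titl",     "inspire": "resourceTitle"},
    "creator":  {"ddi": "AuthEnty", "inspire": "responsibleParty"},
    "abstract": {"ddi": "abstract", "inspire": "resourceAbstract"},
}

def map_record(record: dict, target: str) -> dict:
    """Rename fields in `record` to the target schema's element names,
    dropping fields with no mapping."""
    return {CROSSWALK[field][target]: value
            for field, value in record.items()
            if field in CROSSWALK}
```

A crosswalk table like this makes it explicit which fields survive a conversion and which are silently dropped, which is often the hard part of cross-schema interoperability.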

The project then went on to adapt the EPrints software to store and present research data, for example by modelling data objects as rich collections made up of metadata and various files. The Recollect plugin was developed by taking advantage of the customisation features of EPrints, and adds display features such as a column listing the files associated with a data object and expandable panels showing additional metadata. Data upload is a three-stage process that builds on classic EPrints features, such as help fields to assist with metadata entry, and offers the option of a metadata-review step. The plugin is available through bazaar.eprints.org or the EPrints plug-in management system, and support materials for those wishing to re-use these outputs are to be made available soon.
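Modelling a data object as a collection of metadata plus files might look like the sketch below. This structure is an illustrative assumption, not Recollect's internal representation.

```python
# Hypothetical data object: one block of collection-level metadata plus a
# list of associated files (as surfaced in the plugin's files column).
data_object = {
    "metadata": {
        "title": "Interview transcripts",
        "collection_date": "2012-06",
    },
    "files": [
        {"name": "transcripts.zip", "format": "application/zip", "size_bytes": 1048576},
        {"name": "codebook.pdf",    "format": "application/pdf", "size_bytes": 20480},
    ],
}

def available_files(obj: dict) -> list:
    """File names to show in an 'available files' display column."""
    return [f["name"] for f in obj["files"]]
```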

We also heard from Glasgow's David McElroy on current testing of their EPrints instance, where academics are trialling an interface that uses RCUK subject topics and themes as part of metadata entry. Glasgow is also aiming for a DataCite-compatible metadata schema as a minimum requirement.
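"DataCite-compatible as a minimum" is a checkable condition: the DataCite metadata schema defines five mandatory properties (Identifier, Creator, Title, Publisher, PublicationYear). The flat record layout below is an illustrative assumption.

```python
# The five mandatory properties of the DataCite metadata schema.
DATACITE_MANDATORY = {"identifier", "creator", "title", "publisher", "publicationYear"}

def is_datacite_compatible(record: dict) -> bool:
    """Check that a (hypothetical, flat) metadata record carries at least
    the mandatory DataCite properties."""
    return DATACITE_MANDATORY.issubset(record)
```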

The CERIF for Datasets project (C4D), presented by Kevin Ginty from the University of Sunderland, has worked with euroCRIS to extend the CERIF metadata standard (which already handles research information management fairly well) to improve its handling of research data output metadata, and particularly the links between outputs, projects and grants. A demo has been developed in the marine sciences, although the project outputs are intended to be subject-neutral. The project has also worked with systems providers (Atira, the providers of the Pure CRIS system, and the EPrints team) to ensure that these proposed extensions to CERIF will be widely available to institutions already using those tools. Anyone interested in trying metadata creation through a demo from the related irios2.wordpress.com project, and in exploring import and export options for CERIF, should get in touch with Kevin to arrange access (contact details are on the last slide of his presentation).
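CERIF's distinctive feature is that it models research information as base entities (projects, fundings, outputs) connected by typed link entities. The sketch below illustrates that pattern only; the entity and role names are assumptions, not CERIF's actual element names.

```python
# Base entities keyed by id; typed link entities connect them, which is how
# CERIF-style models express "grant funds project" and "project produced
# dataset". Names here are illustrative, not CERIF vocabulary.
entities = {
    "proj-1": {"type": "project", "title": "Marine survey"},
    "fund-1": {"type": "funding", "title": "Example grant"},
    "data-1": {"type": "dataset", "title": "Plankton counts 2012"},
}

links = [
    {"from": "fund-1", "to": "proj-1", "role": "funds"},
    {"from": "proj-1", "to": "data-1", "role": "produced"},
]

def related(entity_id: str, role: str) -> list:
    """Follow typed links, e.g. find which outputs a project produced."""
    return [l["to"] for l in links if l["from"] == entity_id and l["role"] == role]
```

Keeping relationships in separate, typed link records (rather than embedding them in the entities) is what lets the same model answer questions in both directions: from a grant to its datasets, or from a dataset back to its funding.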

Architecture incorporating CERIF metadata with other institutional systems by the C4D project

Wendy White, on behalf of the DataPool project at the University of Southampton, described the implementation of DOI assignment via the DataCite service at the British Library, with a particular focus on work being explored with the crystallography service. Negotiations are at an advanced stage on formal agreements for the University to start assigning DOIs through DataCite; the library will be the main controller initially, with control devolved to identified gatekeepers at a later stage. Practical policy details are being worked out to determine, for example, the circumstances in which assignment of a DOI might not be considered suitable.

The crystallography group at Southampton has provided an interesting testbed for considering DOI assignment. The group has experience of using DOIs (provided through CrossRef) for crystallography going back to 2003, driven by an interest in enabling data-driven science. Questions raised around the migration to DataCite-minted DOIs include whether to mint a new DOI for the same resources (what are the implications of having two identifiers?). Further questions are prompted by the use case of data captured through lab notebooks (primarily through the labtrove.org initiative): what is the appropriate granularity of citation (and identification)? The whole notebook, a single note, or a single molecule described in the electronic notebook? The metadata captured via the electronic notebooks also provides opportunities for exploring the layering of metadata in the landing pages to which identifiers resolve, and navigational interfaces offered by systems such as Microsoft's Silverlight.
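One way to make the granularity question concrete: nested resources (notebook, note, molecule) could each receive their own DOI, composed under a common prefix with a structured suffix. This is purely a sketch of one possible convention, not DataCite or LabTrove practice; 10.5072 is DataCite's well-known test prefix, used here as a placeholder.

```python
# Compose hypothetical DOIs for resources at different granularities.
PREFIX = "10.5072"  # DataCite test prefix, stand-in for a real allocation

def compose_doi(*path_parts: str) -> str:
    """Build a DOI whose suffix encodes a resource path within a notebook."""
    return f"{PREFIX}/{'.'.join(path_parts)}"
```

Under such a scheme, `compose_doi("notebook42")`, `compose_doi("notebook42", "note7")` and `compose_doi("notebook42", "note7", "mol3")` would identify the whole notebook, one note, and one molecule respectively; whether each level merits its own citable identifier is exactly the open question raised in the session.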

DataPool would be very interested in community discussion with others who are considering similar questions regarding institutional policies and practices for assigning DOIs. Presentation

Finally, Charlotte Pascoe described a metadata creation tool developed in the PIMMS project. PIMMS (Portable Infrastructure for the Metafor Metadata System) revolves around the Common Information Model (CIM) developed in the METAFOR project for the climate system modelling domain. The presentation focused on how this metadata tool has provided a means of engagement for users and helped to evolve their thinking about data. In particular, a free-text metadata field capturing the rationale for an experiment (why was the experiment run in this way?) has added an extra dimension to the way researchers can communicate about their data and experiments, layering information from a metadata infrastructure onto tools that previously supported interaction mainly through visualisation and data display. The tool also captures other basic information on experiments, such as project and grant details and experimental parameters, and supports the recording of metadata for model runs that failed (decoupling the capture of metadata from data), thus opening up the possibility of avoiding repetition of failed experiments. It offers further features such as management of controlled vocabularies for metadata entry.
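The kind of record described above might look like the sketch below: experiment metadata including a free-text rationale, with model runs recorded even when they failed. The field names are illustrative assumptions, not the CIM's actual structure.

```python
# Hypothetical experiment record: rationale is captured as free text, and
# runs are recorded regardless of outcome (metadata decoupled from data).
experiment = {
    "project": "climate model intercomparison",
    "rationale": "Test sensitivity of the model to doubled CO2 forcing",
    "parameters": {"co2_ppm": 560},
    "runs": [
        {"id": "run-1", "status": "failed", "note": "crashed during spin-up"},
        {"id": "run-2", "status": "completed"},
    ],
}

def failed_runs(exp: dict) -> list:
    """Surface failed runs so others can avoid repeating them."""
    return [r["id"] for r in exp["runs"] if r["status"] == "failed"]
```

Because the failed run exists as metadata even though it produced no usable data, a later researcher querying the records can see that this configuration was already tried.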

Between them, these projects showcase several aspects that institutions creating metadata collections for their research outputs need to consider: adapting existing software and schemas, designing interfaces for capturing metadata (including the use of vocabularies), and integrating with external systems (such as those for digital identifiers).