Digital Curation Centre

Research & Development


Research and development are key DCC activities. Our R&D teams work together to transform research-led innovation into services that enhance the productivity of digital curation practice. To find out more about what we are doing, visit our Publications page and browse the documents we have written.


Research Activities

Our research team, led by Professor Peter Buneman, Associate Director (Research), and head of the Edinburgh University Database Group, has four main goals:

  • To draw together the various functions of curation, from the traditional archival functions to the maintenance and publication of evolving knowledge as seen in scientific databases
  • To identify through direct research collaboration, and through interaction with the service arm of the DCC, the key projects in which research is needed
  • To conduct research in areas already identified by the partners as crucial to digital curation
  • To institute two-way conduits between research and service in which practical issues can be drawn to the attention of researchers and the products of research can be tested in practice

Current research priorities, together with further areas that the research team has identified as important to digital curation, are outlined in the sections below.

The DCC hosts a Visitors Programme that brings researchers engaged in cutting-edge work to the UK to give talks, disseminate their findings and engage with DCC staff. See upcoming and previous events for more information.

If you have any questions or comments, or want to collaborate, please contact the research team.


Annotation in databases

Annotation refers broadly to the process of adding notes to something. Annotation plays an especially important role in scientific and scholarly work, because descriptions or interpretations from trusted sources may inform further interpretation or research. Different communities want to annotate different types of digital data (structured text, images, geospatial data, audio, video), but major questions remain about where and how to "attach" annotations of various types to base data, and how to let others search (query), view, or track those annotations.

We are conducting a general study of how annotation is used in scientific data and of the technology being developed to support it. A prototype system, MONDRIAN, has been developed for annotating relational data and is currently being tested. In addition, building on the BioDAS concept of a Distributed Annotation System (DAS) from the bioinformatics community, the AstroDAS project aims to help researchers match sky objects across distributed astronomy catalogues.
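To make the "where to attach" question concrete, the sketch below attaches annotations to individual cells of a relational table and carries them through selection and projection, so a query answer keeps the notes on the values it returns. This is not MONDRIAN's design; it is a minimal illustration under our own assumptions, and the `Cell`, `annotate`, `select` and `project` names and the toy gene table are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """A single table value plus the set of annotations attached to it."""
    value: object
    notes: set = field(default_factory=set)


def annotate(table, column, predicate, note):
    """Attach `note` to the `column` cell of every row that satisfies `predicate`."""
    for row in table:
        if predicate(row):
            row[column].notes.add(note)


def select(table, predicate):
    """Relational selection: keep matching rows; annotations stay on their cells."""
    return [row for row in table if predicate(row)]


def project(table, columns):
    """Relational projection: keep only `columns`, copying each cell's annotations."""
    return [{c: Cell(row[c].value, set(row[c].notes)) for c in columns} for row in table]


# Toy gene table; each row maps column names to annotated cells.
genes = [
    {"gene": Cell("BRCA1"), "chrom": Cell("17")},
    {"gene": Cell("TP53"), "chrom": Cell("17")},
    {"gene": Cell("CFTR"), "chrom": Cell("7")},
]
annotate(genes, "chrom", lambda r: r["gene"].value == "TP53", "curator: location checked")
result = project(select(genes, lambda r: r["chrom"].value == "17"), ["chrom"])
print([(r["chrom"].value, sorted(r["chrom"].notes)) for r in result])
# [('17', []), ('17', ['curator: location checked'])]
```

Annotating single cells is only one choice; annotating whole rows or blocks of values changes how annotations should propagate through joins and aggregates.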


Data archiving

Techniques have been developed for efficiently archiving databases such as genome databases, which evolve rapidly as biological research moves forward. The challenge is to preserve the scientific record by recording all past states of the data. We are currently investigating the feasibility of building a distributed archive for long-term preservation. We are also pursuing the implementation of a pilot system to help create Representation Information for long-term data preservation.
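One way to record all past states without storing a full copy of every release is to merge the versions into a single structure keyed by record identifier, tagging each value with the versions in which it appeared. The sketch below is a simplified illustration of that key-based merging idea, not the DCC's implementation; the `Archive` class and the protein-style keys are hypothetical, and a real system would compress version sets into intervals.

```python
from dataclasses import dataclass, field


@dataclass
class Archive:
    """Hypothetical archive that merges successive versions of a keyed dataset.

    Each key maps to a list of [value, versions] entries, so a value that
    stays the same across many releases is stored only once.
    """
    history: dict = field(default_factory=dict)
    latest: int = 0

    def add_version(self, snapshot):
        """Merge a new snapshot (key -> value) and return its version number."""
        self.latest += 1
        v = self.latest
        for key, value in snapshot.items():
            entries = self.history.setdefault(key, [])
            for entry in entries:
                if entry[0] == value:  # unchanged value: just extend its version set
                    entry[1].add(v)
                    break
            else:
                entries.append([value, {v}])  # new or changed value
        return v

    def get_version(self, v):
        """Reconstruct the complete snapshot as it was at version v."""
        return {
            key: value
            for key, entries in self.history.items()
            for value, versions in entries
            if v in versions
        }


arc = Archive()
arc.add_version({"P01": "kinase", "P02": "unknown"})
arc.add_version({"P01": "kinase", "P02": "phosphatase", "P03": "transporter"})
arc.add_version({"P01": "kinase", "P03": "transporter"})  # P02 deleted
print(arc.get_version(1))  # {'P01': 'kinase', 'P02': 'unknown'}
print(arc.get_version(3))  # {'P01': 'kinase', 'P03': 'transporter'}
```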


Socio-economic and legal issues

Several legal issues are relevant to digital curation, of which the impact of intellectual property rights is arguably the most significant. We are investigating how copyright and the database right operate and what they imply for those involved in curation activity. In particular, we have looked at the impact of the database right framework on those engaged in scientific research. We are also involved in the GRADE project, which aims to investigate and report on the technical and cultural issues surrounding the reuse of geospatial data within the JISC Information Environment (IE) in the context of media-centric, informal and institutional repositories. Our contribution is to the project's legal work package, which aims to understand the legal rights issues that will arise in a geospatial repository environment and then to establish a licensing system that is acceptable to the community.

Read commentary and news on the legal aspects of digital curation on our legal blog.

It has long been acknowledged that successful digital preservation initiatives are likely to depend heavily on co-operation, for example on infrastructures that enable the sharing of knowledge, expertise and infrastructure across traditional domain boundaries. The rise of the 'institutional repository' paradigm reminds us that co-operation will also be needed at the repository level. Instead of physically integrated, centralised digital preservation systems that attempt to fulfil all the functions defined by the OAIS model, we are likely to see distributed preservation infrastructures based on networks of co-operating repositories, together with shared services such as registries of file format information. In these networks, repositories will need ways of interacting both with each other and with their designated communities. Initial work will involve producing a scoping paper that introduces the topic and focuses on issues such as architectures, object exchange and repository certification. This, in turn, will help identify other relevant research topics.
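As a rough picture of what "co-operating repositories plus shared services" might mean in practice, the sketch below gives two repositories a common format registry and lets them exchange objects, verifying a SHA-256 checksum on arrival. It is an illustration under our own assumptions only: the `FormatRegistry` and `Repository` classes and the `fmt:` identifiers are hypothetical and do not correspond to any real registry such as PRONOM.

```python
import hashlib
from dataclasses import dataclass, field


class FormatRegistry:
    """Hypothetical shared service mapping format identifiers to descriptions."""

    def __init__(self):
        self._formats = {}

    def register(self, format_id, description):
        self._formats[format_id] = description

    def lookup(self, format_id):
        return self._formats.get(format_id)


@dataclass
class Repository:
    name: str
    registry: FormatRegistry
    objects: dict = field(default_factory=dict)  # id -> (bytes, format id, checksum)

    def ingest(self, obj_id, data, format_id):
        # Refuse objects whose format the shared registry cannot describe.
        if self.registry.lookup(format_id) is None:
            raise ValueError(f"unknown format {format_id!r}")
        checksum = hashlib.sha256(data).hexdigest()
        self.objects[obj_id] = (data, format_id, checksum)

    def send_to(self, peer, obj_id):
        data, format_id, checksum = self.objects[obj_id]
        peer.receive(obj_id, data, format_id, checksum)

    def receive(self, obj_id, data, format_id, checksum):
        # Check fixity on arrival before accepting the copy.
        if hashlib.sha256(data).hexdigest() != checksum:
            raise ValueError(f"checksum mismatch for {obj_id!r}")
        self.ingest(obj_id, data, format_id)


registry = FormatRegistry()
registry.register("fmt:csv", "Comma-separated values")
a = Repository("repo-a", registry)
b = Repository("repo-b", registry)
a.ingest("obj-1", b"id,value\n1,42\n", "fmt:csv")
a.send_to(b, "obj-1")
print(b.objects["obj-1"][2] == a.objects["obj-1"][2])  # True
```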


Metadata extraction and curation

Creating and collecting metadata manually is becoming more difficult as the volume of digital objects grows exponentially. Research here concerns methods for automatically extracting semantic metadata using text mining, machine learning and natural language processing. We are also investigating standards and tools for curating the scientific metadata produced at large-scale scientific facilities.
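The sketch below shows the shape of an automatic extraction step: it takes a plain-text description and fills in a few metadata fields (title, years, DOIs, keywords). For brevity it substitutes simple regular expressions and word frequencies for the text-mining, machine-learning and NLP methods the research concerns, so treat it as a hypothetical baseline; the `extract_metadata` function, stop-word list and sample text are all invented for illustration.

```python
import re
from collections import Counter

# Small illustrative stop-word list; practical systems use far larger ones.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "from", "were", "with", "this"}


def extract_metadata(text, n_keywords=3):
    """Extract a few descriptive metadata fields from plain text.

    Rule-based heuristics stand in here for learned extraction models.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    title = lines[0] if lines else None  # assume the first non-empty line is the title
    years = sorted(set(re.findall(r"\b(?:19|20)\d{2}\b", text)))
    dois = re.findall(r"\b10\.\d{4,9}/\S+\b", text)  # "10.<registrant>/<suffix>"
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOP_WORDS and len(w) > 3]
    keywords = [w for w, _ in Counter(words).most_common(n_keywords)]
    return {"title": title, "years": years, "dois": dois, "keywords": keywords}


sample = """
Sea Surface Temperature Observations 2004-2006
Observations of sea surface temperature from moored buoys.
Temperature records were quality controlled in 2007.
Dataset DOI: 10.9999/example.sst
"""
print(extract_metadata(sample))
```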


Semantic data curation

Data curation is concerned with the preservation, annotation and availability of data. For all three, and especially for preservation, we believe that the emphasis on 'meaning' and 'machine processability' underpinning the Semantic Web and ontology communities is directly relevant. We are investigating how applicable these approaches are to representative data formats used by some of the world-class facilities and data centres run by STFC.
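The sketch below illustrates the "meaning plus machine processability" idea: a data file's format is described as subject-predicate-object triples, so a program can follow the links from a file to its format and then to the meaning, type and unit of each field. It uses plain Python tuples rather than an RDF toolkit to stay self-contained; every `ex:` term, the file name and the `match`/`describe` helpers are hypothetical and not drawn from any STFC format or published ontology.

```python
from typing import Optional

# Hypothetical vocabulary: every "ex:" term is invented for illustration.
KB = {
    ("ex:run42.dat", "ex:hasFormat", "ex:DetectorFormatV1"),
    ("ex:DetectorFormatV1", "ex:hasField", "ex:counts"),
    ("ex:DetectorFormatV1", "ex:hasField", "ex:wavelength"),
    ("ex:counts", "ex:datatype", "xsd:int"),
    ("ex:counts", "ex:meaning", "detector counts per pixel"),
    ("ex:wavelength", "ex:datatype", "xsd:double"),
    ("ex:wavelength", "ex:unit", "angstrom"),
}


def match(triples, s: Optional[str] = None, p: Optional[str] = None, o: Optional[str] = None):
    """Return matching (subject, predicate, object) triples, sorted; None is a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    )


def describe(triples, dataset):
    """Follow dataset -> format -> fields to explain what each field means."""
    formats = match(triples, s=dataset, p="ex:hasFormat")
    if not formats:
        raise LookupError(f"no format recorded for {dataset}")
    fmt = formats[0][2]
    fields = [t[2] for t in match(triples, s=fmt, p="ex:hasField")]
    return {f: {pred: obj for _, pred, obj in match(triples, s=f)} for f in fields}


print(describe(KB, "ex:run42.dat"))
```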


Provenance and databases

Current work includes developing formal models that state, and improve our understanding of, the problems that arise when data is copied from one database to another without keeping track of where it came from (its provenance). These formal models will assist in developing software tools and standards for tracking, exchanging and managing the provenance of data transferred between databases. A related issue is retrieving the provenance, or lineage, of data products that others have derived through a "pipeline" of data processing steps.
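The two problems described above can be sketched together: recording, for each copied value, the location it was copied from (so the chain can be followed back to its origin), and recording, for each derived product, the processing step and inputs that produced it. This is a minimal sketch under our own assumptions, not one of the formal models under development; the `ProvenanceStore` class, the location tuples and the file names are hypothetical.

```python
class ProvenanceStore:
    """Hypothetical store that records copy provenance and pipeline lineage.

    A location is a (database, table, row key, column) tuple.
    """

    def __init__(self):
        self.data = {}           # location -> value
        self.copied_from = {}    # destination location -> source location
        self.derived_from = {}   # product name -> (processing step, input names)

    def insert(self, loc, value):
        """Record a value entered directly at `loc`; it has no upstream source."""
        self.data[loc] = value
        self.copied_from.pop(loc, None)

    def copy(self, src, dst):
        """Copy a value to another database and remember where it came from."""
        self.data[dst] = self.data[src]
        self.copied_from[dst] = src

    def origin(self, loc):
        """Follow the copy chain from `loc` back to where the value was entered."""
        chain, seen = [loc], {loc}
        while chain[-1] in self.copied_from:
            prev = self.copied_from[chain[-1]]
            if prev in seen:  # guard against cyclic copying
                break
            chain.append(prev)
            seen.add(prev)
        return chain

    def derive(self, product, step, inputs):
        """Record that `product` was made by `step` from the named `inputs`."""
        self.derived_from[product] = (step, list(inputs))

    def lineage(self, product, depth=0):
        """Return the processing history of `product` as indented text lines."""
        indent = "  " * depth
        if product not in self.derived_from:
            return [f"{indent}{product} (raw input)"]
        step, inputs = self.derived_from[product]
        lines = [f"{indent}{product} <- {step}"]
        for name in inputs:
            lines.extend(self.lineage(name, depth + 1))
        return lines


store = ProvenanceStore()
a = ("db_a", "proteins", "P1", "function")
b = ("db_b", "entries", "E7", "description")
c = ("db_c", "genes", "G3", "product")
store.insert(a, "tumour suppressor")
store.copy(a, b)
store.copy(b, c)
print(store.origin(c))  # c, then b, then a

store.derive("calibrated.fits", "calibrate", ["raw.fits", "flat.fits"])
store.derive("catalogue.csv", "extract_sources", ["calibrated.fits"])
print("\n".join(store.lineage("catalogue.csv")))
```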


Data transformation, integration and publishing

This is a large topic of importance to all areas of digital curation. It includes the manipulation of data formats, the combination of data from varied sources, and the efficient transmission of data. It is important in metadata conversion, in publishing in community data formats, and in maintaining the currency of information content.
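Metadata conversion is a typical small instance of this topic. The sketch below maps a flat record in a local schema onto Dublin Core elements (using the real Dublin Core 1.1 namespace) and returns the fields it could not map, so nothing is dropped silently. The field mapping, the `metadata` wrapper element and the sample record are hypothetical choices made for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical mapping from a local metadata schema to Dublin Core elements.
FIELD_MAP = {
    "dataset_name": "title",
    "principal_investigator": "creator",
    "release_date": "date",
    "file_type": "format",
}


def to_dublin_core(record):
    """Convert a flat local record to simple Dublin Core XML.

    Returns the XML string and the list of fields that had no mapping,
    so that nothing is silently lost during conversion.
    """
    root = ET.Element("metadata")  # hypothetical wrapper element
    unmapped = []
    for key, value in record.items():
        element = FIELD_MAP.get(key)
        if element is None:
            unmapped.append(key)
            continue
        ET.SubElement(root, f"{{{DC_NS}}}{element}").text = str(value)
    return ET.tostring(root, encoding="unicode"), unmapped


xml_text, unmapped = to_dublin_core({
    "dataset_name": "Coastal salinity survey",
    "principal_investigator": "A. N. Other",
    "release_date": "2007-03-01",
    "instrument": "CTD profiler",
})
print(xml_text)
print("unmapped:", unmapped)
```

Reporting unmapped fields rather than discarding them is a deliberate choice: in curation, a lossy conversion should at least be visible.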


Security

Research in this area includes preparing a report on ensuring a safe data analysis environment for astronomical data centres, and developing an access-control model for XML database content.
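An access-control model for XML content has to decide which parts of a document each user may see. The sketch below is one simple, hypothetical policy, not the model being developed: rules give a role, an element path and allow/deny; the most specific matching rule wins, anything unmatched is denied, and a denied element hides its whole subtree. The `RULES` list, role names and document structure are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: (role, element path, allowed?). A path is the "/"-joined
# list of tags from the root; a rule covers that element and everything below it.
RULES = [
    ("public", "observations", True),
    ("public", "observations/target/coordinates", False),
    ("team", "observations", True),
]


def is_allowed(role, path, rules):
    """Most specific (longest) matching rule wins; no matching rule means deny."""
    best = None
    for r_role, r_path, allow in rules:
        if r_role != role:
            continue
        if path == r_path or path.startswith(r_path + "/"):
            if best is None or len(r_path) > len(best[0]):
                best = (r_path, allow)
    return best is not None and best[1]


def authorized_view(root, role, rules):
    """Return a pruned copy of the document; a denied element hides its subtree."""
    def prune(elem, path):
        kept = ET.Element(elem.tag, elem.attrib)
        kept.text, kept.tail = elem.text, elem.tail
        for child in elem:
            child_path = f"{path}/{child.tag}"
            if is_allowed(role, child_path, rules):
                kept.append(prune(child, child_path))
        return kept

    if not is_allowed(role, root.tag, rules):
        return None
    return prune(root, root.tag)


doc = ET.fromstring(
    "<observations><target><name>M31</name>"
    "<coordinates>00 42 44.3 +41 16 09</coordinates></target></observations>"
)
public = authorized_view(doc, "public", RULES)
print(ET.tostring(public, encoding="unicode"))
# <observations><target><name>M31</name></target></observations>
```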


Supporting technologies

Much of the research and development in digital curation requires supporting database and XML technology. The Edinburgh University Database Group works in close collaboration with the DCC on these topics.


Development Activities

Our development team, led by Dr. David Giaretta, Associate Director (Development), and head of STFC's Astronomical Systems and Services Group at the Rutherford Appleton Laboratory (RAL), works to transform research-led innovation into services that enhance the productivity of digital curation practice.

Our white paper, DCC Approach to Digital Curation, sets out the path for development activities. These activities are based on the Reference Model for an Open Archival Information System (OAIS), which is registered as ISO standard 14721, and include:

  • Monitoring international standards
  • Development of a Representation Information registry/repository
  • Development of recommendations for tools and methods for generating Representation Information
  • Creating testbeds for digital curation tools
  • Creating auditing and certification processes for trusted repositories
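In OAIS, Representation Information can itself need further Representation Information to be understood, forming a network that is followed until the remaining pieces fall within the Designated Community's existing knowledge. The sketch below shows how a registry/repository might resolve such a network. It is a minimal illustration under our own assumptions, not the DCC registry's interface: the `REGISTRY` layout, the `ri:` identifiers and the `resolve` function are hypothetical.

```python
# Hypothetical registry: identifier -> (description, identifiers of the
# Representation Information needed to understand this entry).
REGISTRY = {
    "ri:temperature-table": ("Column layout and units of the temperature table",
                             ["ri:csv", "ri:celsius"]),
    "ri:csv": ("Comma-separated values syntax", ["ri:ascii"]),
    "ri:celsius": ("Definition of the degree Celsius", []),
    "ri:ascii": ("ASCII character encoding", []),
}


def resolve(ri_id, registry, known=frozenset()):
    """List every registry entry needed to interpret `ri_id`, depth first.

    Traversal stops at entries the Designated Community already understands
    (`known`); the `seen` set guards against cycles in the network.
    """
    needed, seen, stack = [], set(), [ri_id]
    while stack:
        current = stack.pop()
        if current in seen or current in known:
            continue
        seen.add(current)
        if current not in registry:
            raise KeyError(f"no Representation Information registered for {current!r}")
        _description, dependencies = registry[current]
        needed.append(current)
        stack.extend(reversed(dependencies))  # keep the listed order when popping
    return needed


print(resolve("ri:temperature-table", REGISTRY))
print(resolve("ri:temperature-table", REGISTRY, known={"ri:ascii"}))
```

Passing a different `known` set models different Designated Communities: the same object may need more or less Representation Information depending on who must understand it.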

To find out more, visit our development team's homepage.

If you have any questions or comments, or want to collaborate, please contact the development team.

