Because good research needs good data

Introducing DataFlow and ViDaaS

Marieke Guy | 05 March 2012

“Data management is too important to leave to the data managers; it needs to be an important part of research.” Mark Thorley, data management co-ordinator for NERC.

Friday saw the launch of two new UMF-funded data management infrastructure projects at the Saïd Business School, University of Oxford. The UMF programme aims to help universities and colleges deliver better efficiency and value for money through the development of shared services.

DataFlow

DataFlow is a collaborative project led by the University of Oxford. It is a two-tier data management infrastructure that allows users to manage and store research data. The project builds on prototypes developed in the JISC-funded ADMIRAL project.

The first tier, called DataStage, is a file store which can be accessed through private network drives or the web. Users can upload any research data files, and the service is backed up nightly. DataStage is likely to be used by single research groups, but as it allows different levels of access it can also be employed to handle data in collaborative projects. Deployment can be on a local server or on an institutional or commercial cloud.

The second tier is DataBank, which, through a web submission interface, allows users to select and package files for publication. Each package is accompanied by simple metadata and contains an RDF manifest, which is then displayed as linked open data. Metadata collection is made easier by being semi-automated and as minimal as possible. The files are packaged using the BagIt format. DataBank is a scalable data repository where data packages are published and released under a CCZero licence, though users can choose to keep data private or add an optional embargo period.
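To make the packaging step more concrete, here is a minimal Python sketch of what a BagIt-style package looks like on disk: a bag declaration, a data/ payload directory, a checksum manifest and a small bag-info.txt for simple metadata. It is an illustration only and not DataBank's own code. The function name make_bag, the file names and the metadata values are hypothetical, and the layout assumes version 1.0 of the BagIt specification with SHA-256 checksums, which may differ from the version DataBank uses.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def make_bag(source_files, bag_dir, metadata=None):
    """Copy files into a minimal BagIt-style bag and return {payload path: sha256}.

    Hypothetical helper for illustration; not part of DataFlow or DataBank.
    """
    bag_dir = Path(bag_dir)
    payload_dir = bag_dir / "data"
    payload_dir.mkdir(parents=True, exist_ok=True)

    # Bag declaration: marks the directory as a bag (assumes BagIt 1.0).
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # Copy each file into the payload and record its checksum.
    manifest = {}
    for src in map(Path, source_files):
        dest = payload_dir / src.name
        shutil.copy2(src, dest)
        manifest[f"data/{src.name}"] = hashlib.sha256(dest.read_bytes()).hexdigest()

    # Payload manifest: one "<checksum>  <path>" line per payload file.
    lines = [f"{digest}  {path}\n" for path, digest in sorted(manifest.items())]
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")

    # Optional simple metadata as "Label: value" lines.
    if metadata:
        info = "".join(f"{key}: {value}\n" for key, value in metadata.items())
        (bag_dir / "bag-info.txt").write_text(info, encoding="utf-8")

    return manifest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "readings.csv"  # hypothetical research data file
        sample.write_text("site,value\nA,1.5\n", encoding="utf-8")
        bag = Path(tmp) / "example-bag"
        make_bag([sample], bag, {"Contact-Name": "Example Researcher"})
        for path in sorted(bag.rglob("*")):
            print(path.relative_to(bag))
```

The checksum manifest is what lets a repository, or a later reader of the data, confirm that the payload files have not changed since they were packaged.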

DataFlow is now at beta release v0.1. The DataFlow team are keen to build a user community and have put a number of processes in place that allow users to comment on developments.

ViDaaS

ViDaaS (Virtual Infrastructure with Database as a Service) comprises two separate elements. DaaS is a web-based system that enables researchers to quickly and intuitively build an online database from scratch, or import an existing database. The virtual infrastructure (VI) enables the DaaS to function within a cloud computing environment. At Oxford this is the Online Research Database Service (ORDS). ViDaaS builds on ideas from the JISC-funded Sudamih project. The ViDaaS service currently has three business models:

  • £600 per year for a standard project (25 GB)
  • £2000 per year for a large project (100 GB)
  • A public cloud hosting option, to follow later

ViDaaS is officially launching this summer.

Further details on interoperability between ViDaaS and DataFlow are contained within the Data Management Rollout at Oxford (DaMaRO) Project. Resources from the launch workshop are available from the DataFlow website.

Both services are seen as enabling 'sheer curation'. This is an approach to digital curation where curation activities are quietly integrated into the normal workflow of those creating and managing data and other digital assets.

Use of shared infrastructure services is supported by the JISC. Such services offer potential cost savings and reduced overheads, easy transferability of data and reuse of tools.