DataBank

DataBank is a scalable, domain-agnostic data repository system designed specifically to manage and share research data in an institutional setting.

It is one of the two components of the DataFlow data management infrastructure, designed to allow researchers to work with, annotate, publish, and permanently store research data. The other is DataStage.

Provider

Oxford University Bodleian Libraries, as part of the wider DataFlow project

Licensing and cost

The software is free to download and use. The source code is released under the MIT (Expat) license.

Development activity

Version 1.0.2 of DataBank was released in May 2013. While the DataFlow project has finished, the source code repository and mailing list show the code continues to be maintained.

Platform and interoperability

DataBank is written in Python and is designed to work with the Ubuntu Linux 11.10 (Oneiric Ocelot) operating system. Virtual machine images are provided for VMware Fusion 4.x (Mac OS X) and VMware Player (Windows). The system may be deployed on the Eduserv cloud, on a commercial data storage cloud, or on a local institutional server.

Although designed to work together with DataStage, the software offers a simple API that other services can use to integrate with it.

Functional notes

DataBank’s digital object model is based around "collections," also known as "silos," that function as virtual administrative groups. Each silo has a set of users who can read and write files in the silo, and an administrator to manage it. A data package belongs to a silo and may contain one or many data files, metadata files, and license information for the contents of the data package. The administrator can set up an embargo period for the silo, or for an individual data package.
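The relationships described above can be sketched as a pair of Python data classes. This is only an illustration of the object model, not DataBank's actual code: the class names, field names, and the rule that an administrator may also write to the silo are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class DataPackage:
    """Hypothetical data package: data files, metadata files, and a license."""
    name: str
    data_files: list[str] = field(default_factory=list)
    metadata_files: list[str] = field(default_factory=list)
    license: Optional[str] = None
    embargo_until: Optional[date] = None  # package-level embargo, if any


@dataclass
class Silo:
    """Hypothetical silo (collection): a virtual administrative group."""
    name: str
    administrator: str
    users: set[str] = field(default_factory=set)
    packages: dict[str, DataPackage] = field(default_factory=dict)
    embargo_until: Optional[date] = None  # silo-wide embargo, if any

    def can_write(self, user: str) -> bool:
        # Assumption: the administrator has the same file access as users.
        return user == self.administrator or user in self.users

    def is_embargoed(self, package_name: str, today: date) -> bool:
        """True if the package or the whole silo is still under embargo."""
        package = self.packages[package_name]
        limits = [d for d in (package.embargo_until, self.embargo_until) if d]
        return any(today < limit for limit in limits)


# Example: one silo holding one package embargoed until the start of 2015.
silo = Silo(name="example-silo", administrator="admin", users={"alice"})
silo.packages["survey-2013"] = DataPackage(
    name="survey-2013",
    data_files=["results.csv"],
    metadata_files=["manifest.rdf"],
    license="CC-BY",
    embargo_until=date(2015, 1, 1),
)
embargoed_now = silo.is_embargoed("survey-2013", date(2014, 11, 24))
```

Keeping the embargo date on both classes mirrors the text: an embargo can apply to a whole silo or to a single data package, and a package is treated as embargoed if either date is still in the future.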

All data is stored as ZIP files, which the software unzips when the data is accessed. Metadata is added and modified directly through RDF files, although it is exposed in both human- and machine-readable forms. The software can also assign DOIs to data sets.
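As a rough illustration of this packaging approach, the sketch below uses only the Python standard library to bundle a data file and an RDF metadata file into a ZIP archive, then reads a Dublin Core title back out of it. The file names, the metadata values, and the helper functions `build_package` and `read_title` are hypothetical; DataBank's own packaging rules may differ.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/terms/"

# Illustrative RDF/XML metadata using two Dublin Core terms.
MANIFEST = f"""<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dcterms="{DC_NS}">
  <rdf:Description rdf:about="">
    <dcterms:title>Example survey results</dcterms:title>
    <dcterms:creator>A. Researcher</dcterms:creator>
  </rdf:Description>
</rdf:RDF>
"""


def build_package(files: dict[str, str]) -> bytes:
    """Write the given name -> content mapping into an in-memory ZIP file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


def read_title(package: bytes, manifest_name: str = "manifest.rdf") -> str:
    """Unzip the metadata file and return its Dublin Core title, if present."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        root = ET.fromstring(archive.read(manifest_name))
    title = root.find(f".//{{{DC_NS}}}title")
    return title.text if title is not None else ""


package = build_package({"results.csv": "id,score\n1,42\n", "manifest.rdf": MANIFEST})
title = read_title(package)
```

Because the metadata travels inside the archive as a plain RDF file, editing the metadata means replacing that file, which matches the upload-based workflow described in the text.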

By default, all data held in a non-dark instance of DataBank is visible to Google and any other web crawlers, unless the administrator adds a robots.txt file asking crawlers not to index the site.
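One way to check what a given robots file permits is Python's standard `urllib.robotparser` module. The sketch below compares a blanket "disallow everything" file with an empty one; the URL and the helper name `is_crawlable` are made up for the example, and well-behaved crawlers honour these rules voluntarily, so this is not an access control mechanism.

```python
from urllib.robotparser import RobotFileParser

# A robots file asking every crawler to stay away from the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

# Hypothetical address of a data package on a DataBank instance.
EXAMPLE_URL = "https://databank.example.org/example-silo/datasets/survey-2013"


def is_crawlable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the robots rules allow the agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


blocked = not is_crawlable(ROBOTS_TXT, EXAMPLE_URL, "Googlebot")
open_by_default = is_crawlable("", EXAMPLE_URL, "Googlebot")
```

With the restrictive file in place the URL is reported as not crawlable, while an empty file leaves everything open, which corresponds to the default behaviour described above.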

Documentation and user support

There is a variety of documentation for the project, including an Information for Test Users page and a DataBank documentation wiki. The software has a developer mailing list and JIRA issue tracker.

Video walkthroughs are available that describe how to set up a suitable server platform, and how to download and set up the software.

Usability

End-users can interact with the DataBank system through a web interface, but metadata must be added by uploading an RDF file. Installation and configuration use a command-line interface.

Expertise required

Installation and configuration require solid knowledge of command-line interfaces, and benefit from system administration experience. The walkthrough videos should make it possible to get the system running without specialist expertise, but novice users may not be able to get the most out of the system's functionality and customisability.

Standards compliance

DataBank uses Dublin Core as its default metadata standard. The system is able to assign DOIs using the DataCite API.

Influence and take-up

DataBank is used at the Oxford Bodleian Libraries. It is not known whether it is used in production elsewhere, but it has been tested by the RoaDMap project at the University of Leeds, the YHMAN Shared Visual Data Centre, and Microsoft Research.

Last reviewed: 24 November 2014