DataStage
DataStage is a flexible data storage system that provides controlled access, secure backup, and the ability to transfer selected files to a more permanent archiving facility. Designed for research groups, the system appears as a mapped drive on the end-user’s computer, with additional features such as repository submission and addition of metadata available via a web interface.
It is one of the two components of the DataFlow data management infrastructure, designed to allow researchers to work with, annotate, publish, and permanently store research data. The other is DataBank.
Provider
Oxford University Bodleian Libraries, as part of the wider DataFlow project
Licensing and cost
The software is free to download and use. The source code is released under the MIT (Expat) license.
Development activity
Version 0.3.1 of DataStage was released in May 2012. Although the DataFlow project has finished, activity on the source code repository and mailing list indicates that the code continues to be maintained.
Platform and interoperability
DataStage is written in Python and is designed to run on the Ubuntu Linux 11.10 (Oneiric Ocelot) operating system. Virtual machine images are provided for VMware Fusion 4.x (Mac OS X) and VMware Player (Windows). Although it is intended to integrate with DataBank, the software offers an API so that it can package datasets for submission to any SWORD-2-compliant repository.
End-users can connect to DataStage through a web interface or as a mapped drive on Mac, Linux or Windows machines.
Functional notes
The software gives three levels of password-controlled access: a "private" area accessible only to the file owner and the group leader, a "shared" area giving the group read-only access, and a "collaborative" area giving the group read and write access. The administrator can invite outside collaborators into the group and specify their level of access. Users can also access and annotate the files through a web interface.
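The three access areas can be illustrated with a minimal Python sketch. This is not DataStage's own code: the function name `permissions` and its parameters are hypothetical, and two points the description leaves open are filled in as labelled assumptions (the file owner can always read and write their own files, and the group leader gets read access only to private files).

```python
def permissions(area, user, owner, leader, members):
    """Return the set of permissions ("read", "write") that `user` has
    on a file owned by `owner` in the given area (hypothetical model)."""
    if area not in ("private", "shared", "collaborative"):
        raise ValueError(f"unknown area: {area!r}")
    if user == owner:
        # Assumption: owners keep full access to their own files everywhere.
        return {"read", "write"}
    if area == "private":
        # Only the owner and the group leader can reach private files.
        # Assumption: the leader's access is read-only.
        return {"read"} if user == leader else set()
    if user != leader and user not in members:
        # People outside the group see nothing.
        return set()
    if area == "shared":
        return {"read"}           # read-only for the group
    return {"read", "write"}      # collaborative: read and write


group = {"alice", "bob", "lee"}
print(permissions("shared", "bob", "alice", "lee", group))
print(permissions("collaborative", "bob", "alice", "lee", group))
```

Invited outside collaborators could be modelled by adding them to `members` for the areas they are allowed to reach.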
DataStage can be deployed on a local server, or on an institutional or commercial cloud; users can also dynamically invoke additional cloud storage as required. Users can integrate the system into existing backup procedures. The repository interface also allows researchers to push selected files into a more permanent archive facility.
While users can add free-text metadata via the web interface, DataStage also automatically captures a number of general file attributes: date uploaded; file name; last modified; type; owner; location; and size.
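As a rough illustration of what capturing these general attributes involves, the sketch below gathers the same fields for a local file using only the Python standard library. It is not taken from DataStage: the function name `capture_attributes` and the dictionary keys are hypothetical, the owner is passed in by the caller because ownership is a DataStage concept rather than a file-system one, and the upload date defaults to the current time.

```python
import mimetypes
from datetime import datetime, timezone
from pathlib import Path


def capture_attributes(path, owner, uploaded=None):
    """Collect general file attributes for `path` (hypothetical helper)."""
    p = Path(path).resolve()
    info = p.stat()
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    uploaded = uploaded or datetime.now(timezone.utc)
    return {
        "file_name": p.name,
        "location": str(p.parent),
        "size": info.st_size,                      # size in bytes
        "last_modified": modified.isoformat(),
        "date_uploaded": uploaded.isoformat(),
        # Guess the type from the file extension; fall back to a generic type.
        "type": mimetypes.guess_type(p.name)[0] or "application/octet-stream",
        "owner": owner,
    }
```

Free-text metadata added through the web interface would sit alongside a record like this one rather than replace it.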
Documentation and user support
Documentation is available in the form of an Information for Test Users page and the DataStage documentation wiki. The software has a developer mailing list and JIRA issue tracker. Installation instructions are included in a README file, which comes zipped with the installation package.
Video walkthroughs are available that describe how to set up a suitable server platform, how to download and set up the software, and how to interact with it from the desktop.
Usability
End-users interact with the system either as a mapped drive on their computer, implicitly integrating with their operating system’s current navigation structure, or through a web interface. Installation and configuration use a command-line interface.
Expertise required
Installation and configuration benefit greatly from knowledge of system administration and familiarity with the Linux command line. The walkthrough videos should make it possible to get DataStage running without such expertise, but novice users may not be able to get the full functionality and customisability out of the system.
Standards compliance
DataStage automatically gathers metadata in RDF format. The system uses the BagIt specification when transferring files to a permanent archive, which must be SWORD-2 compliant.
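For readers unfamiliar with BagIt, the following sketch builds a minimal bag: a `data/` payload directory, a `bagit.txt` declaration, and a checksum manifest. It is an illustration only, not DataStage's packaging code. The function name `make_bag` is hypothetical, and because the BagIt version and checksum algorithm DataStage uses are not stated here, the sketch assumes the BagIt 1.0 layout with SHA-256. The SWORD-2 deposit step is not shown.

```python
import hashlib
from pathlib import Path


def make_bag(bag_dir, files):
    """Write a minimal BagIt bag to `bag_dir`.

    `files` maps payload-relative names to their content as bytes.
    """
    bag = Path(bag_dir)
    data = bag / "data"
    data.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for name, content in sorted(files.items()):
        target = data / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        # Each manifest line pairs a checksum with a path relative to the bag.
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}")

    # The bag declaration names the BagIt version and tag-file encoding.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag
```

A receiving repository can recompute each checksum in the manifest to confirm that the payload arrived intact.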
Influence and take-up
DataStage is used at the Oxford University Bodleian Libraries. It is not known whether it is in production use elsewhere, but it has been tested by:
- UK Data Archive (in conjunction with Eprints);
- University of Hertfordshire and the Centre for Digital Music, Queen Mary University London (in conjunction with DSpace);
- RoaDMap project, University of Leeds;
- YHMAN Shared Virtual Data Centre;
- a pool of research group leaders in the University of Oxford who implemented ADMIRAL (a precursor to DataStage).
