CKAN for research data management

22 February, 2013

CKAN, developed by the Open Knowledge Foundation (OKF), is an open source data management system that makes data accessible by providing tools to publish, share, find and use data. It has been used to power a number of government data portals, including data.gov.uk.

On Monday 18th February, the Jisc Managing Research Data programme, in conjunction with the OKF and the Orbital and data.bris projects, ran a workshop. It offered an opportunity to learn more about CKAN and its potential for managing research data, and it helped Joss Winn of the Orbital project gather requirements for evaluating CKAN for academic use.

After an introduction from Joss and Simon Hodson of Jisc, Mark Wainwright of the OKF gave an introductory presentation on CKAN use cases and functionality. He covered its role as a data store, catalogue and publishing platform; the web interface and API for adding data; and the search and visualisation tools for finding it. He described CKAN as a catalogue for discovery with very flexible metadata. He noted that data.gov.uk/data is pure CKAN, and drew our attention to PublicData.eu, which harvests data from a variety of sources, only some of which are CKAN instances. CKAN 2.0 is due for imminent release, and a beta instance is accessible at beta.ckan.org. Among the features of interest to delegates were the ability for users to follow datasets or publishers, and the option to assign ownership of datasets to organisations rather than only to individuals. CKAN has a number of extension points, with a range of extensions available, and there should be migration paths for these when upgrading to version 2.0 (an upgrade that can be done without loss of data). A number of libraries are also available for visualising data.

Joss spoke about the use of CKAN in the Orbital project at the University of Lincoln. He mentioned a post on the (excellent) project blog explaining why CKAN was chosen. The team wanted to provide a platform that would encourage researchers to deposit data throughout the research process, rather than only on publication. They had been developing their own solution, but switched to CKAN once its data store functionality was added. The project focuses on integrating research data with research information systems in order to maximise its value to the university. The system includes a researcher dashboard and links with staff databases, the university’s ePrints repository and the awards management system.

Simon Price from data.bris told us that the University of Bristol already offers researchers a large amount of storage space for research data, which they see as a network drive. data.bris works with this storage system and with the university’s Pure research information system. Again, the team had considered writing their own system before deciding to use CKAN. Metadata harvesting was very easy to set up, and the system is being used for research data publication. However, they have found that researchers also want something to help them share active research data, and they are looking at the possibility of running a second CKAN instance for this purpose. Simon has since written a blog post on the workshop.

The second half of the day involved a requirements gathering exercise to help establish how CKAN could be made more useful for research data management. The outputs are available in a Google spreadsheet. A range of issues and desired features emerged. Some were simple but useful, such as presenting users with an appropriately formatted citation for a dataset to use in publications; some were more complex; and some, such as the ability to search for data across multiple institutions, were possibly outside the scope of CKAN development.

All in all, it was a very useful workshop, and it proved much more popular than the organisers had originally anticipated. There appears to be an appetite for using CKAN in research data management, and it is to be hoped that the requirements of researchers and research institutions will influence the future direction of the software. A community of users and developers in the sector could help to move things in the right direction.