Because good research needs good data

IDCC14 Report on Parallel B, session 2: 'Linking with articles'

Jonathan Rans | 10 March 2014

Building a Bridge Between Journal Articles & Research Data - Eleni Castro and Alex Garnett

In the data publishing parallel session on Wednesday 26 February we heard from Eleni Castro of Harvard and Alex Garnett of Simon Fraser University, who spoke about building a bridge between journal articles and research data.

The fundamental problem, Eleni explained, is the lack of consolidated infrastructure connecting journal articles with the data that supports them, and this infrastructure cannot be developed without the participation of publishers.

Eleni and Alex have been collaborating to connect the Public Knowledge Project’s Open Journal Systems (OJS) to the Dataverse repository platform. This would allow researchers to deposit data through a journal’s interface, making the deposit of datasets, supporting files, and metadata a seamless part of submitting an article. Eleni emphasised the importance of interoperability in system design: they use the EZID service from the California Digital Library (CDL) to issue DataCite DOIs to datasets deposited through the SWORD API.

The project has now released an OJS/Dataverse plugin and a data deposit API, both of which have been tested with a small sample of journals. They currently use a fixed set of metadata based on a Dublin Core crosswalk, but this may not be the solution that is ultimately implemented.
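To give a rough idea of what a fixed Dublin Core crosswalk involves, the sketch below maps journal-submission fields onto Dublin Core elements for the dataset record. The submission field names and the specific mappings are hypothetical assumptions, not the actual OJS or Dataverse schema; only the right-hand element names (title, creator, subject, and so on) come from Dublin Core.

```python
# Hypothetical crosswalk: the submission field names on the left are
# illustrative assumptions, not the real OJS/Dataverse schema; the values on
# the right are Dublin Core element names.
CROSSWALK = {
    "dataset_title": "title",
    "authors": "creator",
    "keywords": "subject",
    "abstract": "description",
    "journal_name": "publisher",
    "submission_date": "date",
    "article_doi": "relation",  # points the dataset record back at its article
    "licence": "rights",
}


def to_dublin_core(submission: dict) -> dict:
    """Map submission fields onto Dublin Core elements.

    Dublin Core elements are repeatable, so every element maps to a list.
    Fields outside the fixed crosswalk are dropped.
    """
    record: dict = {}
    for field, value in submission.items():
        element = CROSSWALK.get(field)
        if element is None:
            continue  # not part of the fixed metadata set
        values = value if isinstance(value, list) else [value]
        record.setdefault(element, []).extend(str(v) for v in values)
    return record


example = {
    "dataset_title": "Survey responses, wave 1",
    "authors": ["Doe, Jane", "Roe, Richard"],
    "article_doi": "10.1234/example.5678",  # hypothetical DOI
    "internal_note": "not exported",
}
print(to_dublin_core(example))
# {'title': ['Survey responses, wave 1'], 'creator': ['Doe, Jane', 'Roe, Richard'], 'relation': ['10.1234/example.5678']}
```

The trade-off of a fixed set shows up in the example: anything the crosswalk does not name (here `internal_note`) is silently lost.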

Feedback so far has been mostly positive; interestingly, users wanted citations and DOIs to be displayed prominently. Future work will focus on updating the plugin and API and on developing permanent, two-way links between articles and data.

In response to a question from the audience, Eleni explained that any changes to a dataset between initial submission and publication would be handled by the Dataverse architecture, which already has that facility built in. Dataverse uses Universal Numerical Fingerprints (UNFs) to validate dataset content.
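The real UNF algorithm is specified in the Dataverse documentation; the sketch below is a deliberately simplified, hypothetical stand-in that shows only the core idea: normalise numeric values to a canonical text form, then hash them, so the fingerprint follows the data’s content rather than the file format it happens to be stored in. The rounding precision, terminator and truncation used here are assumptions, and the output will not match genuine UNF strings.

```python
import base64
import hashlib


def normalise(value: float, digits: int = 7) -> str:
    """Canonical text form: round to `digits` significant digits and write in
    scientific notation, so 1, 1.0 and 1.00000001 all become '1.000000e+00'."""
    return f"{value:.{digits - 1}e}"


def fingerprint(column: list, digits: int = 7) -> str:
    """Simplified content fingerprint for a numeric column.

    Illustration only: this is NOT the UNF algorithm used by Dataverse and
    will not reproduce real UNF strings.
    """
    h = hashlib.sha256()
    for value in column:
        h.update(normalise(value, digits).encode("ascii"))
        h.update(b"\n")  # terminator keeps value boundaries unambiguous
    # Keep the first 128 bits and base64-encode them for a compact string
    return base64.b64encode(h.digest()[:16]).decode("ascii")


original = [1.0, 2.5, 3.75]
reformatted = [1, 2.50000001, 3.75]  # e.g. the same data saved by another tool
changed = [1.0, 2.5, 3.76]

print(fingerprint(original) == fingerprint(reformatted))  # True
print(fingerprint(original) == fingerprint(changed))      # False
```

Because the fingerprint depends only on normalised values, a repository can tell a genuine change to the data from a harmless change of file format between submission and publication.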


Cross-linking between journal publication and data repositories: a selection of examples - Sarah Callaghan

The second talk of the session was from Sarah Callaghan (STFC), who discussed current examples of cross-linking between journals and data repositories identified by the Preparde project.

The primary example of cross-linking is the inclusion of a DOI in a journal article’s data citation. The main issue here is that the return link, from repository to article, requires manual input of information that the data journal supplies by email. This process clearly does not scale.
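The article-to-data direction is easy to generate automatically once a dataset has a DOI. The minimal sketch below formats a reference-list entry in the pattern Creator (PublicationYear): Title. Publisher. Identifier, loosely following DataCite’s recommended citation form, with the DOI written as a resolvable `https://doi.org/` link. The `DatasetRecord` class and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata needed to cite a dataset."""
    creators: list   # e.g. ["Doe, J."]
    year: int
    title: str
    publisher: str   # the repository holding the data
    doi: str         # bare DOI such as "10.1234/abcd" (hypothetical)


def data_citation(ds: DatasetRecord) -> str:
    """Reference-list entry in the pattern
    Creator (PublicationYear): Title. Publisher. Identifier,
    with the DOI written as a resolvable https://doi.org/ link."""
    creators = "; ".join(ds.creators)
    return f"{creators} ({ds.year}): {ds.title}. {ds.publisher}. https://doi.org/{ds.doi}"


record = DatasetRecord(["Doe, J.", "Roe, R."], 2014, "Example survey data",
                       "Example Data Repository", "10.1234/abcd")
print(data_citation(record))
# Doe, J.; Roe, R. (2014): Example survey data. Example Data Repository. https://doi.org/10.1234/abcd
```

Note that this only covers one direction: nothing in the citation tells the repository which article cited the dataset, which is exactly the gap described in the paragraph above.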

In response to this problem, Preparde suggested developing an intermediate registry that would act as a resolution service. However, considerable standardisation issues would still have to be overcome to implement this service, and it would introduce a potential single point of failure. A possible starting point for developing such a registry could be the DataCite Metadata Store.
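The value of an intermediate registry is that a single registration creates the link in both directions, so the repository no longer waits for an email. The in-memory class below is a hypothetical sketch of that data structure only; a real service would need persistent storage, authentication and an agreed metadata standard, and would carry the single-point-of-failure risk noted above.

```python
from collections import defaultdict


class LinkRegistry:
    """Hypothetical in-memory article <-> dataset link registry.

    One call to register() records the link in both directions, so either
    side can be resolved from the other.
    """

    def __init__(self) -> None:
        self._datasets_for_article = defaultdict(set)
        self._articles_for_dataset = defaultdict(set)

    @staticmethod
    def _norm(doi: str) -> str:
        # DOI names are case-insensitive, so compare them in lower case
        return doi.strip().lower()

    def register(self, article_doi: str, dataset_doi: str) -> None:
        a, d = self._norm(article_doi), self._norm(dataset_doi)
        self._datasets_for_article[a].add(d)
        self._articles_for_dataset[d].add(a)

    def datasets_for(self, article_doi: str) -> set:
        # .get() avoids creating empty entries for unknown DOIs
        return set(self._datasets_for_article.get(self._norm(article_doi), set()))

    def articles_for(self, dataset_doi: str) -> set:
        return set(self._articles_for_dataset.get(self._norm(dataset_doi), set()))


registry = LinkRegistry()
registry.register("10.1234/Article.1", "10.5678/data.1")  # hypothetical DOIs
print(registry.articles_for("10.5678/DATA.1"))  # {'10.1234/article.1'}
```

The repository-side lookup (`articles_for`) is the return link that currently depends on manual input.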

Sarah discussed four other forms of cross-linking, starting with the inclusion of links in journal article sidebars. The difficulty with this approach is that web pages usually have limited free space in which to place links.

Collecting datasets on geographical map displays was another example, with repositories like PANGAEA using geolocation metadata to place datasets together on a map. This offers powerful ways of visualising disparate datasets, but full implementation would require standardised metadata.
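Why standardised metadata matters here can be seen in even the simplest map query. The sketch below selects datasets whose point location falls inside a bounding box; it only works if every repository records coordinates the same way (here, assumed to be decimal degrees with latitude and longitude as separate fields). The `GeoDataset` record and the example DOIs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoDataset:
    """Hypothetical dataset record with a single point location."""
    doi: str
    lat: float  # decimal degrees, -90 to 90
    lon: float  # decimal degrees, -180 to 180


def in_box(ds: GeoDataset, south: float, west: float, north: float, east: float) -> bool:
    """True if the dataset's point lies inside the box.

    A box with west > east is treated as crossing the antimeridian
    (for example, a box spanning the middle of the Pacific).
    """
    if not (south <= ds.lat <= north):
        return False
    if west <= east:
        return west <= ds.lon <= east
    return ds.lon >= west or ds.lon <= east


def datasets_in_box(datasets: list, south: float, west: float,
                    north: float, east: float) -> list:
    return [d for d in datasets if in_box(d, south, west, north, east)]


catalogue = [
    GeoDataset("10.1234/north-sea", 56.0, 3.0),
    GeoDataset("10.1234/fiji-east", -17.0, 179.5),
    GeoDataset("10.1234/fiji-west", -17.0, -179.5),
    GeoDataset("10.1234/bay-of-bengal", 15.0, 88.0),
]
print([d.doi for d in datasets_in_box(catalogue, -20, 170, 0, -170)])
# ['10.1234/fiji-east', '10.1234/fiji-west']
```

A repository that stored longitude first, or used degrees-minutes-seconds, would silently drop out of such a query, which is the standardisation problem in practice.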

By pulling metadata from a repository to populate journal articles, publishers can provide richer information about supporting datasets. This might include metrics such as the number of dataset downloads alongside relevant links. Figshare offers a good current example, providing a widget that enables publishers to extract this information. F1000 has a working example of this kind of relationship, although it still relies on email; Figshare are working on an API.

The final example of currently operational cross-linking comes from the OECD and F1000, both of which offer a ‘data behind the graph’ function that lets site users interact with visualisations of the datasets supporting an article.

The primary recommendations on cross-linking from Preparde were that interoperability is essential, so standardisation is key; DataCite DOIs and metadata should be used; citations should be placed in an article’s reference list for visibility; and the development of an intermediate cross-linking registry is strongly recommended.