Repositories for scientists

Chris Rusbridge | 31 March 2008

Nico Adams, in a post on Staudinger's Semantic Molecules, has added to the scenarios for repositories for scientists:

"Now today it struck me: a repository should be a place where information is (a) collected, preserved and disseminated (b) semantically enriched, (c) on the basis of the semantic enrichment put into a relationship with other repository content to enable knowledge discovery, collaboration and, ultimately, the creation of knowledge spaces. "

The linkage seems to be the important thing, and it is something current repositories don't do well. He goes on:

"Let’s take a concrete example [CR: scenario, I think] to illustrate what I mean: my institutional repository provides me with a workspace, which I can use in my scientific work everyday. In that workspace, I have my collection of literature (scientific papers, other people’s theses etc.), my scientific data (spectra, chromatograms etc) as well as drafts of my papers that I am working on at the moment. Furthermore, I have the ability to share some of this stuff with my colleagues and also to permanently archive data and information that I don’t require for projects in the future."

He then develops this argument further, making a lot of sense, and concludes:

"Now all of the technologies for this are, in principle, in place: we have DSpace, for example, for storage and dissemination, natural language processing systems such as OSCAR3 or parts of speech taggers for entity recognition, RDF to hold the data, OWL and SWRL for reasoning. And, although the example here, was chemistry specific, the same thing should be doable for any other discipline. As an outsider, it seems to me that what needs to happen now, is for these technologies to converge and integrate. Yes it needs to be done in a subject specific manner. Yes, every department should have an embedded informatician to take care of the data structures that are specific for a particular discipline. But the important thing is to just make the damn data do work!"
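To make the "enrich, then link" idea a little more concrete, here is a minimal sketch in Python of how repository items might be related through the entities they mention. It is only an illustration under loud assumptions: the tiny entity vocabulary is a hypothetical stand-in for a real recogniser such as OSCAR3, the `mentions` and `relatedTo` predicates and the item identifiers are invented for the example, and a plain list of tuples stands in for a proper RDF store.

```python
from collections import defaultdict

# Hypothetical vocabulary standing in for a real entity recogniser (e.g. OSCAR3).
KNOWN_ENTITIES = {"benzene", "toluene", "caffeine"}

# A triple is (subject, predicate, object), loosely mirroring RDF.
Triple = tuple[str, str, str]


def enrich(item_id: str, text: str) -> list[Triple]:
    """Toy 'semantic enrichment': tag each known entity name found in the text."""
    words = {w.strip(".,;:()").lower() for w in text.split()}
    return [(item_id, "mentions", e) for e in sorted(KNOWN_ENTITIES & words)]


def link(triples: list[Triple]) -> list[Triple]:
    """Relate every pair of distinct items that mention the same entity."""
    by_entity: dict[str, set[str]] = defaultdict(set)
    for s, p, o in triples:
        if p == "mentions":
            by_entity[o].add(s)
    links = set()
    for items in by_entity.values():
        for a in items:
            for b in items:
                if a != b:
                    links.add((a, "relatedTo", b))
    return sorted(links)


def related(triples: list[Triple], item_id: str) -> list[str]:
    """Return the items linked to item_id (a simple knowledge-discovery query)."""
    return sorted({o for s, p, o in triples if s == item_id and p == "relatedTo"})


# Hypothetical workspace contents: a paper, a spectrum record and a thesis.
repository = {
    "paper:1": "Synthesis of caffeine derivatives.",
    "spectrum:7": "IR spectrum of caffeine, batch 3",
    "thesis:2": "Solvent effects of toluene in flow reactors.",
}

triples = [t for item, text in repository.items() for t in enrich(item, text)]
graph = triples + link(triples)
print(related(graph, "paper:1"))  # ['spectrum:7']
```

The point of the sketch is that the link between the paper and the spectrum is never entered by hand: it falls out of the enrichment step, which is the kind of relationship-building Adams argues repositories should be doing.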

I have heard it suggested elsewhere that DSpace may not be up to this kind of linkage, and also that FEDORA is a suitable candidate, but I don't have enough experience of the workings of either to be sure. Can anyone comment?