Because good research needs good data

UK Repositories claiming to hold data

Chris Rusbridge | 31 March 2008

The OpenDOAR and ROAR services both present self-reported claims by repositories across the world about their contents, backed up by some harvested facts. I’m interested in those UK repositories that claim to hold data.

My first problem is that neither service allows me simply to choose data. OpenDOAR allows me to search on “Datasets” (63 world-wide, 8 in the UK), while ROAR allows me to search for “Database/A&I Index” (24 world-wide, 6 in the UK). I thought the latter was a surprisingly “library science” classification, given the origins of ROAR. Not surprisingly, most repositories are in only one of the lists. Also not surprisingly, given the origins of these services in the Open Access and OAI-PMH movements, there are many first-class data repositories NOT listed here (UKDA and BADC, for example).

The UK repositories listed are:

OpenDOAR “Datasets”

Looking at the OpenDOAR listing, and linking through to the repositories themselves, I find it very difficult to actually FIND the datasets in most cases. Looking at ERA, for example, there is no effective search for these datasets. Browsing soon leads to the realisation that the contents are papers, articles, theses, etc. Some of these may have datasets associated with or within them, but they are a bit shy! The Edinburgh Datashare repository is a pilot, but does have a couple of real datasets. In a different way, Nature Precedings is also shy of disclosing its datasets.

The 3 that do have serious amounts of data are DSpace @ Cambridge, eCrystals and NDAD. DSpace @ Cambridge is dominated by its collection of more than 100,000 chemical structures encoded in CML, but there are plenty of other datasets there, including some from Archaeology. Sadly, there are plenty of empty collections, and many collections where the last deposit was in 2006 (I guess around when the funded project died). eCrystals consists entirely of crystal structures, and has some very nice features: find a compound, and as you look, perhaps rather bemused, at the page, a Java object loads and there you have a rotatable image of the molecular structure before your eyes on the data page! NDAD has many ex-Government datasets, some of them very large.

ROAR “Database/A&I Index”

NDAD and eCrystals (under a slightly different name) appear again in this ROAR set. Of the others, HEER seems to be closed (it wanted a password for every page I found), and ReFer seems to have been withdrawn. ReOrient seems to be frozen and to present its data in the form of maps, while the Linnaean Collection seems to be images (which can, of course, be good data as well).

It’s a rather sad study! I do hope that the Open Repositories 2008 conference in Southampton over the next couple of days leads to an improvement. I can’t get there, unfortunately, but I hope someone will report from it here. I particularly liked the idea of the developers’ challenges. Can we have some oriented to data, please?