Because good research needs good data

Edinburgh IT futures workshop

Chris Rusbridge | 23 November 2008

I spent Friday morning at a workshop entitled “Research, wRiting and Reputation” at Edinburgh University. Pleasingly, data form quite a large part of the programme (although I’ll miss the afternoon talks, including Simon Coles from Southampton). First speaker is Prof Simon Tett of Geosciences talking about data curation and climate change. Problem is that climate change is slow, relative to changes in the observation systems, not to mention changes in computation and data storage. Baselines tend to be non-digital, eg 19th century observations from East India company shipping: such data need to be digitised to be available for computation. By comparison, almost all US Navy 2nd World War meteorological log books were destroyed (what a loss!). He’s making the point that paper records are robust (if cared for in archives) but digital data are fragile (I have problems with some of these ideas, as you may know: both documents and digital data have different kinds of fragility and robustness… the easy replicability of data compared with paper being a major advantage). Data should be looked after by organisations with a culture of long term curatorship: not researchers but libraries or similar organisations. The models themselves need to be maintained to be usable, although if continually used they can continue to be useful (but not for so long as the data). Not just the data, but the models need curation and provenance.Paul Anderson of Informatics talking about software packages as research outputs. His main subject is a large scale UNIX configuration system, submitted by Edinburgh as part of the RAE. It’s become an Open Source system, so there are contributions from outside people in it as well. Now around 23K lines of code. Interesting to be reminded how long-lived, how large such projects can be, and how magically sustainable some of the underlying infrastructure appears to be. What would happen if SourceForge or equivalents failed? However, it does strike me that there are lessons to be earned by the data curation community from the software community, particularly from these large scale, long lived open source projects. Quite what these lessons are, I’m not sure yet!Four speakers followed, three from creative arts (music composition and dance), and the fourth from humanities. The creative arts folks partly had problems with the recognition (or lack of recognition) of their creative outputs as research by the RAE panels. Their problems were compounded by a paucity of peer-reviewed journals, and the static, paper-oriented nature even of eJournals. The composer had established a specialist music publisher (Sumtone), since existing publishers are reluctant to handle electronic music. Interestingly, they offer their music with a Creative Commons licence.The dance researchers had different problems. Complexity relates to temporality of dance. It’s also a young discipline, only 25 years or so, whereas dance is a very ancient form of art (although of course documentation of that is static). Dance notation is very specialised; many choreographers and many dancers cannot notate! Only a very small set of journals are interested, few books etc. Lots of online resources however. Intellectual property and performance rights are major issues for them, however.The final researcher in this group was interested in integrating text with visual data and multimodal research objects. Her problems seemed to boil down to limitations of traditional journals in terms of the numbers and nature (colour, size, location etc) of images allowed; these restrictions themselves affected the nature of the argument she was able to make.OK, these last few are at least partly concerned at the inappropriateness of static journals for their disciplines. Even 99% of eJournals are simply paper pages transported in PDF over the Internet. Why this still should be, 12 years after Internet Archaeology started publishing, as a peer-reviewed eJournal specifically designed to exploit the power of the Internet to enhance the scholarly content, beats me! Surely, just after the last ever RAE is exactly the time to create multimedia peer-reviewed journals to serve these disciplines. They’ll take time to get established, and to build up their citations and impact-factor (or equivalent). Does the new REF really militate so strongly against this? What else have I missed?