IDCC15 session report: A Decade of Digital Curation

A report on the plenary session 'A Decade of Digital Curation' at the International Digital Curation Conference, London, on Tuesday 10 February 2015.

Martin Donnelly | 24 February 2015

On day two of this year’s IDCC, CODATA Executive Director Simon Hodson chaired a plenary session on the topic of “A Decade of Digital Curation”, with an opportunity for reflection on where we’ve been, lessons learned, and how we might apply these to our future endeavours. Each of the five presenters spoke for around fifteen minutes, followed by a joint panel with opportunities for audience participation.

The first presentation came from the University of Pretoria’s Heila Pienaar & Martie Van Deventer, and was entitled ‘Research data management in a developing country: A personal journey’. Heila and Martie gave an enthusiastic account of a community-driven, bottom-up approach to RDM, which was largely bootstrapped with very little dedicated funding. They tracked international developments (notably Jisc, the British Library and the DCC in the UK) from afar, following up with visits and other liaison with institutional RDM activities at Oxford (UK), Monash (Australia) and Purdue (US) universities, repurposing suitable surveys and training materials via open licences. They concluded with an overview of the new mandate from the National Research Foundation, which dictates that from March 2015 research papers will have to be Open Access and data should be deposited in an open repository. There’s an obvious risk of disconnect here between demands and resourcing, and despite the many creative uses made of time and international goodwill thus far, the key take-home message was on the importance of funding for RDM activities: bootstrapping has its limits and can only take you so far.

Christopher Fryer, Senior Digital Archivist at the UK Parliamentary Archives, then gave a presentation entitled ‘Project to production: Digital preservation at the Houses of Parliament, 2010-2020’, which – rather than reflecting on the past decade – looked back at the last five years, and forward to the next five. He outlined a general pathway from pilot projects to mature services, noting – as others have in the past – that digital preservation is a social/organisational problem more than a technical one, and described the Archives’ collaborations with internal and external stakeholders via formal and informal user groups. Christopher also outlined the Archives’ use of the governmental G-Cloud framework, and their “cloud-first” policy, which mandates that open data is stored in the cloud, while more sensitive, closed data is stored internally. With cloud being a relatively recent technological development, the longer-term economic implications are currently being monitored, and information is being gathered to support more evidence-based investment decisions in future.

Next up was Sam Pepler, covering not ten but “Twenty years of data management in the British Atmospheric Data Centre”, and noting that the data centre’s origins stretched back even further than this. Sam and his colleagues examined BADC reports and other internal documents from three years – 1995, 2004, 2014 – and asked “What does this examination tell us about discovery access, data management planning, policies etc?” (Readers are encouraged to look at Sam’s slides for much more detail about the development of data management trends than I have space for here!) Sam noted that it wasn’t always a given that BADC would endure as a long-term data repository, and indeed it is thanks to sensible decisions taken (or deferred!) in the 1990s that continuity of access to older materials has been maintained. The increasing priority of data management planning was highlighted as a consistent, upward trend to combat the disconnect between comparatively short-term research careers and comparatively long-term data utility. The final takeaway message was on the benefits of openness, and the desirability of an open-by-default (unless there’s good reason to do otherwise) approach, which chimed with Christopher Fryer’s earlier presentation.

Not to be outdone, Southampton’s Jeremy Frey then cast his mind back to 1665 (!), so to speak, for one of his slides on tackling the themes of memory, note-taking and curation. Jeremy’s presentation, titled ‘Collection, curation, citation at source: Publication@Source 10 years on’, gave a chemist’s eye view of the relationship between data, its creator(s), and its reuser(s) in a larger-scale, collaborative/e-Science context. Jeremy’s was a somewhat more technical presentation than the others, drilling down into issues of metadata and structure as crucial to framing and understanding information. The organisation of information is of utmost importance, but getting people to actually do it is really difficult. Metadata is, therefore, a major challenge, and a possible (partial) solution may be to encourage documentation via readable/writable sentences rather than by attaching keywords from fixed vocabularies. This has the benefit of capturing actions as well as descriptions of things, thereby enriching the would-be reuser’s understanding of the original processes and, in turn, reducing, or at least managing, uncertainty.

In the final presentation of the session, Chris Awre spoke about ‘Meeting institutional needs for digital curation through shared endeavour: the application of Hydra/Fedora at the University of Hull’. Chris concentrated on the role of the institutional repository as an integral component of the infrastructure, and the importance of being considered ‘part of the furniture’ – or, perhaps better, ‘in with the bricks’ – and enumerated five guiding principles, before relating each to the development of Hull’s chosen repository platform (Hydra):

  1. A repository should be content agnostic;
  2. A repository should be (open) standards-based;
  3. A repository should be scalable;
  4. A repository should understand how pieces of content relate to each other; and
  5. A repository should be manageable with limited resource.

The final message of Chris’s presentation was that resource representation (or record display) should adapt to the content, i.e. bibliographic and data records should display differently. So a single repository can meet diverse needs, provided the development is carried out sympathetically to different stakeholder requirements.

The session concluded with a short panel discussion, which Simon Hodson got underway by asking about community collaboration. What are the cultural norms and expectations, and what are the pros and cons, of a collaborative approach? The first comment came from the floor, noting that collaboration is driven by shared self-interest, and that obliging people to collaborate doesn’t work. So the incentives generally have to emerge organically, and the job of infrastructure providers is to foster an environment in which collaboration is encouraged and supported. Christopher Fryer picked up on this, noting the potential friction between a fluid, hands-off supporting approach and an impact-driven funding model. The benefits realisation process is still ongoing, and there is a need to demonstrate progress against objectives even as the environment continues to evolve.

Simon next asked how the community has reacted to relatively new policies on Open Access and data. In South Africa, where the new mandate has only very recently been announced (and comes, very quickly, into effect), there is no compliance to speak of just yet, but the presenters expect that compliance rates will be high due to the financial penalties that the new mandate enables (in that if you don’t do what the funder says you won’t get any more money!). However, as the repositories aren’t yet in place, there’s no prospect of financial penalties until that happens.

Finally, Simon asked each of the panellists to identify the most important single lesson learned in their engagement with digital curation to date. Jeremy Frey emphasised the need to engage with the data producers, and to update supporting institutional systems and procedures sympathetically to keep up with the data producers’ own changing environments. He also cautioned against trying to do too many things at once, a sentiment echoed by Christopher Fryer in his recommendation for small, practical, pragmatic, incremental steps. Sam Pepler noted the need to keep systems and procedures as simple and straightforward as possible, sticking to meeting the needs of core business and resisting the temptation to over-engineer. Chris Awre emphasised the benefits of collaboration via the oft-quoted African proverb ("If you want to go fast, go alone. If you want to go far, go together"), but warned that increasing collaboration requires considerable coordination effort to prevent mission creep: in short, always remember what you’re trying to achieve. Our South African colleagues had the last word in the session, agreeing with the ‘keep it simple’ maxim, and exhorting us to remain sympathetic to researchers’ needs and feelings.