House of Lords debates research data curation and disclosure

15 January, 2012

There is an intriguing debate going on in the House of Lords that has implications for everyone involved in managing and curating research data.

The other day Ibrahim Hasan pointed the Jiscmail Freedom of Information list to an interesting article on the "FOI Man" blog. The article poured scorn on Universities UK's support for amending the Protection of Freedoms Bill, which the Lords are currently scrutinising. The proposed new law would do various things, such as ban wheel clamping on private land. More to the point, Clause 100 of the bill extends Freedom of Information rights. It requires any "public authority" not only to make datasets available on request but also to provide them in a "re-usable format".

This includes universities and other publicly funded research organisations. The HEI lobbying group Universities UK is trying to get the Lords to amend the bill to introduce an exemption for pre-publication research, similar to the one that already exists in the Scottish version of the FOI legislation.

A retreat to the "ivory towers" is how FOI Man characterises the amendment, and it was interesting to see former DCC Director Chris Rusbridge and Andrew Charlesworth, co-authors of JISC guidance on FOI and Research Data, mount a staunch defence of the Scottish exemption. Chris's response is along the lines that FOI requests for research data are increasingly common, and could be used much more intensively in ways that are not necessarily helpful to research.

It seems professionally perverse to argue against enshrining in law a public right of access to reusable data. What greater incentive could there be for researchers to ensure the data they create is safe, understandable, and verifiable? It is hard to argue with FOI Man that "It is a fabulous privilege to be funded to increase society’s knowledge. Stubbornly refusing to accept and embrace FOI as a method of engaging with the world is going to leave people with a very old-fashioned image of universities, that in my experience is not reflective of their true ambitions." The exemptions that currently apply outside Scotland would still stand and, as FOI Man points out, these include protection for personal and commercially confidential data. Moreover, section 22 of the FOI Act already gives a qualified exemption protecting information that is intended for future publication.

The question is whether FOI is providing the outcomes we want in terms of more transparent research, public engagement, or reusable outputs. In my opinion Rusbridge and Charlesworth are right. The current FOI legislation is not a good enough carrot for these, except perhaps that it gives university FOI officers a strong incentive to bring research data into the fold of their planned release schedules. I wonder how many researchers are excited by that prospect, when claiming an exemption may effectively require them to justify withholding data in the 'public interest' before they themselves even understand what that data means. FOI may not be a Sword of Damocles hanging over researchers' heads, but other unfortunate analogies come to mind; at worst it is a flick-knife that cranks and bullying corporations can use to intimidate researchers and undermine work they dislike.

So what would work better? Reading the excellent committee debate on the UK Parliament website, I was intrigued by the argument put forward by Cambridge University philosopher Professor Onora O'Neill, aka Baroness O'Neill of Bengarve. Clause 100, she said, "would currently require disclosure of data sets while data were still being entered and had not yet been checked", and this "could be misleading as well as damaging to research projects and to those provided with the incomplete, and perhaps misleading, data." This is a well-worn argument with a well-honed retort from the open science community, along the lines that many hands make light work, and open scrutiny is the best route to spotting errors in analysis. That retort is rather less convincing, however, when the data, its description or its analysis have not been developed far enough to enable scrutiny, and are extracted at the demand of a requester who is under no obligation to share them with anyone else.

The Bill's clauses requiring reusable release are admirable in O'Neill's view, but with the caveat that "...for a relatively simple spreadsheet, this requirement would create no more difficulty for research databases than it does for government data sets. However, some scientific data sets are of orders of magnitude larger and do not use standard software; even if it is feasible, it may be extremely costly to render them usable by others …it may be necessary to provide metadata or to process data further in order to make access to them more feasible even for competent others."

On the other hand, Amendment 148B would permit holders of research data to "undertake to provide" data by deposit in an archive or by setting out a "data publication or sharing scheme that will provide access for others and also secure the crucial benefits of professional data curation and data security." This amendment "seeks to postpone access where such archiving is not merely foreseen but is something that data holders have undertaken to provide. In effect, it would create a temporary exemption for the data concerned."

This amendment did not go forward, more's the pity, so I hope that something like it reappears as the bill is revised, or in the Code of Practice that the government promises. It could have implications for everyone concerned with managing research data. It could dispel the perception of Data Management Plans as vague commitments that researchers can safely forget once made, in the knowledge that the funding bodies requiring them do not monitor them. Instead they would become active plans, in effect insurance policies underwritten by whatever infrastructure their institution has in place to support data management and publication.

Helping researchers to make the data from completed projects reusable will have its rewards for research, and for economic and societal impact. For research in progress, release should be voluntary and driven by those benefits (derivative research, collaboration, and so on) that researchers can see for themselves or with advice from their peers and support services. There is certainly a case for saying (as Rufus Pollock once put it) that "the coolest thing to do with your data will be thought of by someone else", but not if they don't know what it is. Data is not research data unless you can follow the story of how it came to be. That is a big challenge to the greater ambition of a public right of access to research.
