FOI dataset provisions - what do they mean for RDM?

RDM managers should be aware of recent changes in Freedom of Information legislation in England and Wales, which add to the risks and opportunities around making research information and datasets openly accessible.

A Whyte | 27 September 2013

Records managers and compliance officers in UK Higher Education Institutions should already be aware of changes to how the law on Freedom of Information deals with datasets. They affect universities as 'public authorities', but are they relevant to implementation of Research Data Management? The short answer is 'yes', even though one might be forgiven for thinking the new provisions don't change much.

This article (soon to be followed by a new DCC guide on Legal aspects of RDM) identifies why 'RDM managers' need to be aware of the changes. I'm using that term as shorthand for anyone who is accountable or responsible for implementing services that support research data management in their university/HEI, whether or not they have 'RDM' in their job title. The article identifies steps they should take, including keeping a watchful eye on the Government's transparency agenda, of which the 'dataset provisions' are part.

The FOI dataset provisions came into effect on 1 September 2013 and identify more explicitly where datasets fit within the scope of FOI legislation in England and Wales. The changes include new obligations on institutions to make datasets available in response to access requests. That means ensuring they are machine-readable and use open file formats as far as is practically feasible. It also means releasing them using one of several licences from the Government's licensing framework to ensure that reuse is permitted.

While much of this sounds like Open Science and is in tune with the Government's transparency agenda, which I'll return to below, there is a degree of fuzziness around how far the FOI provisions will apply more specifically to research data. The Information Commissioner's Office has issued general clarification in its guidance on the provisions. A revised 'definition document' shows the types of information ICO would expect universities to publish. The latter notes that "public authorities must publish under their publication scheme any dataset they hold that has been requested, together with any updated versions, unless they are satisfied that it is not appropriate to do so". The ICO guidance clarifies what 'appropriate' means, distilling and clarifying some of the government Code of Practice for public authorities.

That guidance is full of don't-panic-buttons or get-out clauses (depending on your perspective). As it points out, "...there is no new right to obtain information that was not previously accessible under FOIA; the changes are about providing the information in a re-usable form and making it available for re-use, if it is a dataset". The FOI definition of 'data' is and always has been limited to 'raw' and 'factual' data, i.e. that which "is not the product of analysis or interpretation other than calculation". Just to complicate things, anything that researchers or their institutions count as research data that does not fit that description may still be described as 'research information' and covered by FOI, but in that case it will be unaffected by the dataset provisions.

All the FOI exemptions previously available to institutions still apply, albeit qualified by the 'public interest test' that FOI access decisions are subject to. Moreover, for research data there should soon be a new exemption. To prevent the premature disclosure of research data, the Intellectual Property Bill, currently proceeding through Parliament, introduces into FOI legislation in England and Wales a new exemption for continuing research that is intended for academic publication. This will bring the law in England and Wales into line with Scotland, where such an exemption already exists.

Any of the exemptions available offer a route to deciding that releasing a dataset would be 'inappropriate'. Institutions can also take into consideration other factors to decide whether it is 'reasonably practical' to convert a requested dataset into a reusable form. The legislation does not define 'reasonably practical', but the ICO guidance says relevant factors may include "...the time and cost of conversion, technical issues and the resources of the public authority". Practicalities and costs aside, the ICO guidance gives a further steer towards focusing on data that produces "management information, or information that the public authority itself needs in order to provide services and carry out functions. This would be consistent with the Open Data policy aim of providing greater transparency about the work of public authorities."

Despite these caveats, institutions will need to take a measured approach to the changes. The ICO 'definition document' is worth quoting on the subject of research data:

"In line with the overall direction of travel towards greater transparency, we expect HEIs to progressively publish information on publicly funded research, or to provide a direct link to it. Where appropriate we recommend HEIs ask researchers to follow the Research Councils UK’s Policy and Guidance on Access to Research Outputs. In future the “Gateway to Research”, under development by Research Councils UK, will open up access to Research Council funded research information and related data outputs. It is hoped that this will be available by 2014. The ICO will keep the position under review."

Is ICO hinting that it may lend its statutory weight to monitoring whether institutions play their part in meeting RCUK policy expectations? Probably it is just taking a hands-off approach as there will be further policy action in this area. The Government response to the Shakespeare Review made it clear that its moves to open up public sector information and data will continue, and are driven by the value open data should yield for research as well as economic development opportunities. Government support is planned in "a focused programme of investment to build skill-sets in basic data science through our academic institutions" according to the response. Meanwhile, the Research Sector Transparency Board continues work to "develop a policy agenda around access to research data" - and the G8 Open Data Charter signals that policy development will not be limited to the UK.

It seems likely that HEIs will get further carrots in the form of support for data science, as well as sticks in the form of more regulatory 'nudges' to make research data and information accessible. Even if these do not happen immediately, RDM managers should be recognising and supporting the action their FOI colleagues need to take, and working out what else needs to be done. According to ICO Head of Policy Development Steve Wood, writing on the ICO blog, "In the same way that the Act contains the underlying principle of an 'assumption in favour of disclosure', it’s important to now adopt an approach of 'open data by default'."

So what does that mean in practice? The ICO blog article sets out the key first steps reproduced below. I have changed the order slightly for convenience, and underneath each added suggestions on how RDM managers might address the points.

1. Start to think about the definition of dataset: what information or categories of information do you have that fits the definition?

  • RDM steering groups will want to ensure any institutional RDM policy gives a sufficient steer on what kinds of data are within its scope, and how that differs from the FOI dataset definition.

2. Make sure you know who owns the intellectual property rights (IPR) in your datasets.

3. Promote the key principles of open data in your organisation: use an open format and open licences by default and only deviate from this when you have good reasons to do so.

  • Remind yourself of existing FOI advice on research data, e.g. the basics in the DCC Freedom of Information FAQ and Jisc Legal's more detailed Q&A guidance written by Chris Rusbridge and Andrew Charlesworth. These do not yet cover the new dataset provisions but clarify the available exemptions, which are unchanged by them.
  • Ensure planning for RDM guidance and awareness training includes making researchers, or the staff who support them, aware of the FOI dataset provisions, and prepares them for making data 'open by default' in licensing terms.
  • Ensure RDM policy guidance clearly identifies who owns the IPR in datasets and related research information, e.g. by linking up any relevant advice already given under Freedom of Information, Copyright and IPR or other headings.
  • Ensure the institution gives coherent guidance supporting researchers to choose an 'open by default' path to innovation by providing accurate information on the pros and cons of open licences and identifying circumstances where it may not be appropriate to use them. This could take account of potential indirect research and commercialisation benefits from making data openly accessible, as well as any direct benefits arising from selling licences to it. Guidance might give a steer on where in general terms the institution sees the balance and/or more concrete guidance for researchers to apply to specific scenarios.
  • Ensure researchers are sufficiently informed and supported to judge when the data they are responsible for is too ethically sensitive to be openly released. The ICO blog notes that the new provisions "may encourage new requests for datasets that contain personal data" and points to the ICO code of practice, Anonymisation: managing data protection risk.
  • Ensure that when implementing any data repository capabilities, whether in-house or outsourced, you offer researchers and repository managers the functionality to make open data release easy when it is appropriate and (conversely) secure archiving easy when that is appropriate, and that in either case the support for licensing matches both the institution's needs and the compliance requirements. A 'one size fits all' platform may not be possible!

4. Charging for re-use is not encouraged but can be justified in some situations: do you have existing powers that allow you to charge? If you charge for re-use under the new regulations, can you justify the cost recovery and return on investment?

  • The case for charging is likely to be more difficult to make for research data than other kinds, except perhaps where there are commercialisation opportunities that would warrant licensing - but exemptions are already available for that.

5. Familiarise yourself with the licensing framework and the new version 2.0 of the OGL. FOI officers may need to learn a little more about copyright.

and

6. In some organisations open data is not part of the remit of the FOI officer. It’s crucial to make sure these two functions have an understanding that they need to work together.

  • RDM steering groups and working groups provide great opportunities to address these points, as they should already be consulting staff with expertise on FOI and IPR!
  • More specifically, FOI officers may need support on the range of open licensing options for data. The DCC guide How to License Research Data should help here.

7. Consider what datasets you can make available for re-use proactively in your publication scheme.

  • Take stock of what research information and research data you may include in your FOI publication scheme that is not covered by the existing exemptions.  How will your records management function deal with any requests for the raw data that may be needed to scrutinise research that has been completed, and has already been published?
  • Again, review your policy guidance: if data is not in a repository but on a departmental server, or a USB drive under someone's desk, who can decide whether it is in a fit state to release and then make it openly available? Will researchers and support staff be equipped to make sound decisions on whether access is 'reasonably practical'?
  • Consider how your research information system, data catalogue or repository could help to fulfil the institution's FOI obligations. The ICO definition document quoted above highlights the forthcoming RCUK Gateway to Research, which will provide an outlet for research information and data on funded projects. Jisc's Research Data Registry, a service pilot operated by the DCC, also offers a route for proactive data publication with minimal institutional overhead.

The DCC helps universities become more aware of the extent of their data holdings, for example using the Data Asset Framework. Even so, many institutions have scarcely scratched the surface!

To sum up, the FOI dataset provisions are likely to 'up the ante' for Higher Education Institutions to use open licences by default, and are in line with a broader transparency agenda that RDM managers need to be aware of. They provide a statutory spur towards more openness around research data, and any institution that is not actively preparing for that is putting itself at risk, a risk that it at least needs to assess.