Jisc Research Data Network event, Cardiff, 18 May 2016

24 May, 2016

Kicking off the event, Jisc Deputy Chief Innovation Officer, Rachel Bruce, gave an overview of the intentions behind this new series of workshops, and explained how they are expected to link up with existing and planned Jisc-supported work in the RDM space.

Next up, Neil Penry, RDIM programme manager at Cardiff University, welcomed us to Cardiff and outlined the university’s wider RDM infrastructure, alongside the rationale behind its composition, its choice of technologies (including Thomson-Reuters’ Converis data storage platform), and its plans for taking this forward. Following Neil’s talk, Rachel asked a few questions about drivers. Neil replied that, in common with many institutions, Cardiff’s foremost initial driver was the EPSRC requirements, but that readiness for future research excellence assessment exercises came a close second.

Jisc’s Head of Change – Research, Catherine Grout, then spoke about the proposed Research Data Shared Service, including the ‘why’ behind it, and the ‘must-have’ features for it to be accepted by institutions and researchers, high amongst which were ease of use and a good fit with existing workflows and systems. She spoke of the ideal endpoint as being “visible data, invisible infrastructure”. Interoperability and integration, including with commonly used CRIS systems, are therefore key areas of focus, as is compatibility with the requirements of the major UK funders. Catherine gave an update on the current status of the service, namely that the procurement/tendering processes are complete and the chosen suppliers are now in place. Numerous external consultants have been engaged to help support various aspects of the work.

Ending the first of the day’s sessions, Marta Teperek of the University of Cambridge spoke briefly about Cambridge’s motivations for getting involved with the Jisc Shared Service, citing preservation, scale and sensitive data as three thorny issues which they are tackling, and towards which they hope the shared service will contribute a solution.

Following coffee, we moved into one of three parallel sessions. In the first of the two that I attended, Sarah Middle, Repository Manager at Cambridge, asked “Should embargo conditions be applied to metadata?” Sarah described four different types of embargo and outlined the key differences between publication and dataset embargoes. Sarah reported confusion from researchers about the difference between releasing a dataset and publishing its descriptive metadata. Some researchers worry that releasing metadata before publication could have negative consequences. Sarah’s slides included a couple of very good quotes from concerned researchers, which indicated a continuing need for advocacy, awareness raising and training. Around half of Cambridge researchers ask for their data to be embargoed until the accompanying article is published. This is especially prevalent in the life sciences. The publishers have little readily accessible advice about this, but some have responded positively to direct questions, and Sarah shared as an example the response from Nature Publishing Group. Cell Press, however, see prior data publication as problematic, fearing that this may dilute press attention. Science also take a cautious approach, but may be open to revising this to help with things like REF/HEFCE policy. Cambridge’s current approach is to provide researchers with tailored advice on specific publishers’ policies, but they have in place a workflow capable of placing metadata records in a dark archive until publication in cases where either the publisher or the researcher is concerned about negative impact. However, Sarah noted that this is contrary to the Open Data Principles, and does add an additional, manual step to the repository workflow. There is also a risk that the paper could be published before the dataset metadata record is made live, which may result in broken links until the situation is rectified. Possible solutions Sarah proposed include:

  • Better communications with researchers;
  • Resolution between wants of publishers and institutions;
  • Minimal metadata prior to publication;
  • The Open Scholarship Initiative embargo research project;
  • Collating case studies – Q. is there a role for Jisc here?

Next up was David Kernohan of Jisc, providing an “Update on Journal research data policy project.” This work follows on from a previous investigation, JORD, which looked at the viability of a registry service for journal data policies. In future this may complement the information available via SHERPA services. David and his colleagues convened an expert group, including international representatives, to support the project. Linda Naughton and David’s article in UKSG Insights (2016) highlights some of the difficulties inherent in the work, including subjectivity in understanding policies and the need for ‘hermeneutics’. As with funder policies, this will likely remain the case until such time as we have machine-readable / licence-based policies, covering agreed areas with a shared ontology or taxonomy of terms. There’s also a need to translate terms across academic domains: the language used is quite different in Engineering and History of Art, to say nothing of the word ‘ontology’ itself, which has quite different meanings in computing science and philosophy! The project’s shorter-term aim is to develop templates and guidance for journal publishers to help them streamline their RDM policies. It was noted that the PLOS policy could be a good starting point.

After lunch, Charlie Dormer, User Researcher at RCUK (although currently on secondment to BIS), spoke about the new RCUK grant submission system, which will succeed the much-loved Je-S. A basic (“stripped-down”) version is due by March 2017, its purpose “to create digital services that support the entire grant funding process, from idea generation to impact reporting, that enables the best possible funding of research excellence.” Charlie emphasised the core goals of cost effectiveness, flexibility, and interoperability, which she acknowledged aren’t always the first words that spring to mind when thinking about Je-S! The system will be developed using an agile approach, meaning gradual introduction of service functions. Key themes are: User Research, Service Design and Iterative Prototypes. Charlie showed some behind-the-scenes screenshots of the current back-end, which is a heavily customised Siebel database, followed by a series of wireframe mock-ups for the proposed new system, and also a list of requirements derived from a recent Interoperability Workshop. She noted that the new system will not replace Researchfish, but that there will be a bidirectional interface to link the two systems. A couple of questions were asked about how DMPonline might fit in with the new system. The preference is for it to be built into the form, as opposed to remaining as an Appendix.

We then split into the second group of parallel sessions. I went to see my former HATII colleague Kellie Snow, now with the University of Bristol, present a “Case study on managing sensitive data”. Kellie outlined how the first version of Bristol’s data repository was for open data only, but it very quickly became clear that different levels of access would be required for different use cases. A range of initial questions were brainstormed (including “what different levels of access might be required?”, echoing the different types of embargoes outlined in the morning parallel session by Sarah Middle from Cambridge). They went with four Data Access Levels: Open, Restricted, Controlled and Closed (see photo, above, for descriptions). There followed a number of meetings and discussions over a period of several months. The upshot of it all is that a Data Access Committee is soon to be constituted within the university’s committee structure. Data Access Levels are assigned by the researchers at the data deposit stage, and this is subsequently verified by the Research Data Service. Kellie ended with a list of future plans in this area, including the development of further procedures around commercial data (e.g. dealing with IP and contract issues), and identifying boundaries and shared responsibilities around clinical data, especially research co-funded by the NHS.

After more coffee we had the last of the day’s parallel sessions. Graham Hay of Cambridge Econometrics spoke about “Methods/approaches for measuring the costs and benefits of RDM”. Problems Graham cited included the lack of clear economic evidence to support the case for sustained investment in RDM, a few niche reports notwithstanding. The approach to costing is bespoke and fragmented throughout the sector, and there is also a lack of clarity on how to model costs and benefits. Graham’s project sought to develop an overarching framework, and his presentation guided us through some of the economic aspects of RDM, including an activity-cost structure. The team are still interested in feedback on the model, although I have to admit it was a little too complex (and text-heavy) for us to tackle fruitfully at this event. The consensus in the room was that it was difficult to see how it might be utilised in its current form by a non-specialist. Some detail-masking / black-boxing required!

A very useful and enjoyable event. The next meeting is scheduled to take place in Cambridge in the Autumn.