Because good research needs good data

Beyond Impact, and the Catch-22 in Data Reuse

A Whyte | 11 May 2011

Take an impressive line-up of thirty-odd specialists in research assessment and open data, put them in a large open meeting space for a couple of days, add some inspiring facilitation with regular refreshments, and what do you get? Apart from a roomful of people wanting to carry on the conversation, the ‘scoping’ part of the Beyond Impact workshop produced a fine line in actionable items. This Open Society Initiative-funded project aims to bring together funders, developers, and service providers to develop a community “focussed on providing the tools and policies to enable more effective research assessment that serves the needs of researchers, funders, institutions, and business.”

The project has already gone far beyond a mere nod to open principles, having sprung out of Cameron Neylon’s open invitation in FriendFeed to help write the proposal, which was drafted openly in Google Docs. This workshop was an initial face-to-face meeting to frame an agenda, followed by a ‘development sprint’ to demo some of the ideas discussed. Over the first two days, in successive group and plenary sessions, we whittled down a list of problems to match the opportunities offered by the tools and standards for research information – both the administrative kind and the research-artefact kind.

Challenges for institutions discussed at this workshop included the need for more harmonisation of indicators, something that the RCUK outcome project has been leading on recently. The overheads of recording outcomes and impacts are painful. Integration of administrative systems, and interoperability with the HEFCE and RCUK outcome recording systems, will be key to achieving efficiency. The CERIF standard for CRIS cropped up often in this discussion, thanks in part to the presence of its creator Keith Jeffery.

For researchers, the big issue as ever was getting credit for data and software. The mechanisms for citation are key to this, of course. The availability of persistent ID services, e.g. from DataCite, lets data repository initiatives like Dryad and SageCite give researchers a ‘pathway to impact’ for datasets relating to their articles. Marrying up identifiers for outputs with identifiers for researchers from ORCID, and with grant numbers, brings a gleam to the eye of open data hackers and funders alike, offering more effective ways to harvest data for impact profiles. To my mind the conversation got a little over-intrigued by the ‘baseball card’ analogy: the possibilities for researcher scorecards populated by mining links from articles to the breadcrumb trails of tweeting, blogging, visiting and downloading anything related to them. Filthy lucre there may be in that stuff, but innovation is a product of networks, not individuals, so we need a bigger picture. And how representative can a picture based on online activity be when studies repeatedly show researchers’ wariness of spending time on activity that doesn’t count towards the REF?

So while researchers’ curated datasets, standards work, and other contributions made to data infrastructure do not feature highly in REF submissions, greater awareness of indicators and guidance on how to ‘make the grade’ might instil greater confidence in how this work will be judged, and get beyond the catch-22. A white paper along these lines is one of the action points that might make it into the workshop report, and one I hope DCC might contribute to.

I would have found a more concerted effort at listing some of the things that funders and others mean by ‘impact’ worthwhile. The discussion sidestepped this, I think to avoid getting bogged down in philosophical differences. Instead the workshop began with a ‘vote with your feet’ kind of Likert-scale questionnaire. Cautioning that he’d borrowed the technique from a Californian, Cameron invited all present to stand along a line across the room, according to how far we agreed with statements like ‘most of the tractable problems are technical not social’. Then we were called on to explain ourselves, and talk to our neighbours. I was thankful not to be the only one standing in the middle of the room bemoaning that technology is social or, as Bruno Latour once said, ‘technology is society made durable’.

As you might expect from this workshop, the proceedings were drafted online as they happened, and a report on the actions arising will soon be available.

Overall there was a keenly felt need for "above campus" shared solutions and services, for communities of best practice in areas where universities don't compete. If impact is the ‘demand’ end of the research process, and data management the ‘supply’ side, it will be critical for DCC to help get services for data management, curation and publishing talking to those for recording outputs, outcomes and impacts. So I expect we’ll continue to find common purpose with the Beyond Impact project’s efforts towards that.