Business intelligence for research data curation?

15 July, 2010

Gaining business intelligence from user activity data was the topic of a JISC workshop I attended this week – and a hot topic if the turnout of JISC programme managers is anything to go by. I counted five, plus a large contingent of JISC service people and, of course, Deputy Chair Professor David Baker, who chaired the event.

The ‘business intelligence’ on the agenda was wide-ranging: from Amazon-style recommendations based on other users’ online behaviour to the potential for hard-pressed senior managers to make better decisions on which services and resources to select or dispose of, by mining anonymised user activity data extracted from library systems and VLEs. The workshop debated whether data of this sort could be made open in the interests of transparency, or at least aggregated within and between institutions, and how this might help ‘segment’ provision to serve user needs more cost-effectively, whether clustered by subject or by type of institution, and with or without the involvement of commercial players.

There are some parallels here with the possibilities of ‘community curation’ to find new approaches to valuing datasets, approaches that rely less on the expense of an expert committee. However, the focus of the workshop was elsewhere. The ‘users’ in question were primarily students, and discussion was mainly around their book-borrowing activities. The workshop showcased results from several JISC projects in this area, including EIE (Extending the Information Environment) and MOSAIC (Making Our Shared Activity Information Count).

The University of Huddersfield library was the main exemplar of this work, and their Dave Pattern presented impressive-looking graphs indicating correlations between student performance and library resource usage. Both are apparently on an upward trend following the introduction of recommendations to the library catalogue based on other users’ activity, of the “students who borrowed this title also borrowed…” sort. There was talk of exploiting further links across institutional silos to ‘improve the user experience’. There were also cautionary notes from Naomi Korn on the need to manage legal and ethical risks, and pointers to a new report on “handling personalisation data lawfully” to be made available soon by JISC Legal. Discussion towards the end of the day called for JISC to marshal further evidence of the value the HE sector can expect to gain from mining user activity data.
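For readers curious what a “students who borrowed this title also borrowed…” recommendation involves in principle, here is a minimal Python sketch. It is not Huddersfield's implementation, which was not described in that detail at the workshop. The loan records, the function names and the `min_borrowers` threshold are all assumptions made for illustration; the threshold is simply one way of not surfacing a pairing that rests on very few borrowers, in the spirit of the cautions about legal and ethical risk.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical anonymised loan records: (borrower_id, title).
loans = [
    ("u1", "Title A"), ("u1", "Title B"), ("u1", "Title C"),
    ("u2", "Title A"), ("u2", "Title B"),
    ("u3", "Title A"), ("u3", "Title C"),
    ("u4", "Title B"),
]


def build_co_borrowing(loans):
    """Count, for each pair of titles, how many borrowers took out both."""
    titles_by_borrower = defaultdict(set)
    for borrower, title in loans:
        titles_by_borrower[borrower].add(title)

    co_counts = defaultdict(Counter)
    for titles in titles_by_borrower.values():
        # Each unordered pair of titles borrowed by one person counts once.
        for a, b in combinations(sorted(titles), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return co_counts


def also_borrowed(co_counts, title, top_n=3, min_borrowers=1):
    """Return up to top_n titles most often borrowed alongside `title`.

    Pairings shared by fewer than min_borrowers people are dropped.
    Ties are broken alphabetically so the result is deterministic.
    """
    counts = co_counts.get(title, Counter())
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [other for other, n in ranked if n >= min_borrowers][:top_n]


co_counts = build_co_borrowing(loans)
print(also_borrowed(co_counts, "Title B"))
print(also_borrowed(co_counts, "Title B", min_borrowers=2))
```

A real catalogue would presumably work from far larger loan histories and might weight by recency or course, but the underlying idea is this simple co-occurrence count.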

A key issue from the point of view of research usage, as the OUP’s Richard Gedye pointed out, is that, with the notable exception of IRs, much ‘article-level’ activity data belongs to publishers. And yet institutions collect Shibboleth activity data that should reveal useful patterns about the users behind the clicks. JISC is funding the RAPTOR project, which looks promising in this area. It is to deliver a reporting tool that will extract data on “usage of e-resources broken down by e-resource and by department” from Shibboleth and EZproxy – a potential gold mine for tracking the uptake of datasets linked to subscription journals (to compare with uptake from open access IRs?).

The implications for data curation are worth speculating on. Initiatives like the Dryad project are finding new ways to sustain post-publication data repositories through links to publishers’ journal articles. JISC is prompting the development of tools and standards for gathering ‘article-level’ activity metrics, for example in the PIRUS2 project and through the Mimas Journals Usage Statistics Portal. Ideally we might want to track how ‘within-article’ activity relates to usage of supplementary datasets and, even better, find indications of perceived usefulness. Online journal usage metrics are on the cards as an element in the Research Excellence Framework, and if such metrics are an acceptable way to measure current scientific value, they may be a route to giving credit where it’s due for datasets, and a guide to curators’ decisions on which of these to keep, enhance or dispose of.