Because good research needs good data

IDCC11 Preview: An interview with Mark Hahnel

In the third of our preview posts, Mark Hahnel from FigShare, Digital Science, gives us his perspectives on the issues we hope to address during the rapidly approaching 7th International Digital Curation Conference...

Kirsty Pitkin | 24 November 2011

You will be discussing the issues associated with giving researchers credit for their research data in your presentation at IDCC11. Are there any specific messages you hope people will take away from your talk?

In terms of open research data, the landscape is changing, but we all have a role to play if we want to speed up the cultural change. The progression of science is more efficient if all of the research outputs are available to the public for reuse and scrutiny.

Many people claim a stake in research once it is published, but few are keen to take responsibility for the large volume of research outputs that go unpublished, and are therefore not maintained, yet remain reusable. The global scientific community has a responsibility to improve the efficiency of scholarly communications.

Lots of people argue for open data but far fewer practice what they preach. What do you think is needed to encourage more data sharing?

‘Credit’. Researchers need to get credit for making all of their research objects available. This credit can be defined as career- or profile-enhancing rewards. To date, these career-enhancing rewards have been limited to the ‘impact factor’. This is a great form of credit, but it is not the only one. Now that we can offer more forms of credit for more research outputs, why aren’t we doing so?

Case studies can help. If you or your repository has a success story of some description, it should be shared. We need to be sharing these stories, building up a catalogue of real-life examples where researchers have actually received this ‘credit’.

How do you think we can encourage 'unexpected' reuse of data (that is, use by communities other than those who originally collected the data)?

I feel there are two things we could be doing better here: improving the discoverability of research objects, and linking data more effectively based on user descriptions, tagging, etc.

By linking cross-disciplinary data and offering browsing researchers ‘similar research’ based on metadata, as sites like YouTube do, new relationships and collaborations can be forged. These are unlikely to come about if we continue digesting and filtering scientific research as we do now.

PDF is a problem. It is nigh on impossible to search for specific figures on Google when they are locked away inside PDFs, which are not machine-readable. This means that most of the content in a traditionally published scientific paper is undiscoverable beyond the one-sentence summary that is the paper's title.

This can be improved on. Google Scholar is helping, but there is so much more that can be done to make research outputs more discoverable and thus help researchers find exactly what they are looking for.

We usually see funders, data creators, universities and data users as the typical set of stakeholders for data. Would you add any to that list?

Government.

PubMed Central (PMC) is a free archive of biomedical and life sciences journal literature at the U.S. National Institutes of Health's National Library of Medicine (NLM). In keeping with NLM’s legislative mandate to collect and preserve the biomedical literature, PMC serves as a digital counterpart to NLM’s extensive print journal collection.

This legislative mandate implies that biomedical literature is the sole scholarly output of government-funded research. This is not true. If your research is publicly funded, the public should have access to it where appropriate.

Which stakeholders do you think can do the most to promote a culture of wider reuse of data?

Reuse is not the problem; access is. If data is available online, researchers will find it and reuse it. All stakeholders can and will play a role. However, I feel action by the funders is the most efficient way to make this happen.

We need to compel content creators to change the way in which they manage their research outputs. To be a successful researcher, you must jump through a number of hoops to secure funding.

The funders decide what these criteria should be, and to date the vast majority do not require good data management practices that can in turn promote reuse.

Which research data management projects do you think will be the ones to watch?

Other than FigShare, there are a few interesting projects that have sprung up in the last six months out of the ‘altmetrics’ movement, notably altmetric.com and total-impact.org. Researchers want to see exactly what people are saying about their data and where. These tools show that this data is trackable using existing technology. APIs for research metrics, and even for individual datasets, will allow greater use by a wider range of people.

A large number of researchers, including myself, are forced to become egomaniacs when it comes to research, self-promotion and career advancement. The current funding model requires that researchers ‘sell’ the quality and significance of their research by publishing in select journals solely because of their impact factor. By measuring impact in several ways, researchers can describe the reach of their work in more detail. This is also good for funders, who can assess how successful each grant has been based on a more detailed analysis.

If there was one change that you could make to improve research data management practice, what would it be?

Correct the uneducated in one fell swoop. I have had many conversations with researchers who suggest that they would make their data available but their funder or institution forbids them from doing so. Upon further investigation, I have found that every single one of them was misinformed or making incorrect assumptions.

If I could make one change, it would be to force researchers to make all of their research outputs available in a timely manner, where appropriate. On a more realistic level, I would like funders, institutions and PIs to make researchers aware of their rights and responsibilities when it comes to data management, ideally on day one of a PhD, so this confusion never gets a chance to set in.

You may also be interested in previous interviews from this series with Ewan McIntosh, David Lynn and Victoria Stodden.

Mark will be presenting a session entitled 'Give researchers credit for all of their research' as part of the research perspective strand of talks on Tuesday 6 December. You can still book your place at the 7th International Digital Curation Conference.

If you are unable to attend in person, look out for an announcement next week about how you can take part remotely, or track the conference via Lanyrd to be notified about the arrangements.