Reflections on IDCC 11 day one

7 December, 2011
Kirsty Pitkin

Clifford Lynch, CNI, began by noting how interesting it was to hear at the DCC Symposium how much the health system frames the debate about personal genomic data. He observed that this would have been a very different discussion in the US, where one of the big issues with direct-to-consumer genomics companies is that you can obtain data from outside the medical establishment, so it cannot influence your insurance records. This is still very much an open issue in the US.

One connection Lynch felt needed to be made more strongly was the question of interpretation. We currently have a system where you can obtain some data and then separately obtain an interpretation of that data, and the interpretation is the more debatable part. This means that people who want to deal seriously in the area of personal genomics have to rely on access to the literature. That makes a compelling connection between the movement to make medical data available to the public and the concurrent movement to open up the literature so that the public can interpret that data. Lynch observed that these trends will feed each other going forward.
 
As we move towards electronic medical records, Lynch noted that what happens to the medical records of people who have died is an increasingly interesting question, particularly as we try to build up international databases of research information. He emphasised that so much of the potential, and so many of the issues, associated with data curation and reuse, which we would once simply have pursued in the name of furthering scholarship, are now bound up in public policy. He felt this was a strong theme that ran through all of the day's presentations.
 
Lynch focussed on the comments made by David Lynn from the Wellcome Trust at the start of the day, observing that funders of research are taking on a more active role in the reuse and sharing of data. He predicted that we are likely to see funders architecting how data gets reused, which is a new and powerful role for them.
 
With reference to Victoria Stodden's remarks about reproducible research, Lynch observed that he has been hearing about reproducibility from many sources, not just as a data concern but as a methodological concern. He believes that we are going to hear more about this in future. None of the issues are simple, especially when we start looking at large datasets on one side and code on the other. He noted that we have made very limited progress in writing correct code first time, relying instead on ongoing testing and use to identify the bugs. We need some new thinking about failure rates and software architecture to address this problem.
 
He observed that there is a lot of talk about citing data in the same way as we cite research publications, and that we need not just the technical means but also the social compact to do that. At the same time as we have tactical discussions about this issue, there is also a strategic discussion going on about how best to demonstrate impact. Citation is a limited tool for demonstrating impact, and we need to look at that problem with fresh eyes as we consider how best to measure the impact of open data and design recognition mechanisms for those who share their datasets.
 
Lynch concluded by expressing his surprise that we did not hear more about citizen science activities, particularly given the context set by Ewan McIntosh in his opening keynote. This seemed to Lynch to be particularly relevant, given that one of the big themes of citizen science is to make data open and reusable in ways that are accessible to very large parts of the population, including students of all ages.