Because good research needs good data

Open Data... Open Season?

Chris Rusbridge | 16 July 2007

Peter Murray-Rust is an enthusiastic advocate of Open Data (the discussion runs right through his blog; this link is just to one of his articles that is close to the subject). I understand him to want to make science data openly accessible for scientific access and re-use. It sounds a pretty good thing! Are there significant downsides?

Mags McGinley recently posted in the DCC Blawg about the report "Building the Infrastructure for Data Access and Reuse in Collaborative Research" from the Australian OAK Law project. This report includes a substantial section (Chapter 4) on Current Practices and Attitudes to Data Sharing, which includes 31 examples, many from genomics and related areas. Peter MR wants a very strong definition of Open Access (defined by Peter Suber as BBB, for Budapest, Bethesda and Berlin, which effectively requires no restrictions on reuse, even commercially). Although licences were often not clear, what could be inferred in these 31 cases would generally not fit the BBB definition.

However, buried in the middle of the report is a cautionary tale. Towards the end of Chapter 4, there is a section on the risks of open data in relation to patents, following on from experiences in the Human Genome and related projects.

"Claire Driscoll of the NIH describes the dilemma as follows: It would be theoretically possible for an unscrupulous company or entity to add on a trivial amount of information to the published…data and then attempt to secure ‘parasitic’ patent claims such that all others would be prohibited from using the original public data."

(The reference given is Claire T Driscoll, ‘NIH data and resource sharing, data release and intellectual property policies for genomics community resource projects’ Expert Opin. Ther. Patents (2005) 15(1), 4)

The report goes on:

"Consequently, subsequent research projects relied on licensing methods in an attempt to restrict the development of intellectual property in downstream discoveries based on the disclosed data, rather than simply releasing the data into the public domain."

They then discuss the HapMap (International Haplotype) project, which attempted to make data available while restricting the possibilities for parasitic patenting.

"Individual genotypes were made available on the HapMap website, but anyone seeking to use the research data was first required to register via the website and enter into a click-wrap licence for the use of the data. The licence entered into, the International HapMap Project Public Access Licence, was explicitly modeled on the General Public Licence (GPL) used by open source software developers. A central term of the licence related to patents. It allowed users of the HapMap data to file patent applications on associations they uncovered between particular SNP data and disease or disease susceptibility, but the patent had to allow further use of the HapMap data. The licence specifically prohibited licensees from combining the HapMap data with their own in order to seek product patents..."

Checking HapMap, the Project's Data Release Policy describes the process, but the link to the Click-Wrap agreement says that the data is now open. (See also the NIH press release.) The licence caused obvious problems, in that the data could not be incorporated into more open databases. The turning point for them seems to be:

"...advances led the consortium to conclude that the patterns of human genetic variation can readily be determined clearly enough from the primary genotype data to constitute prior art. Thus, in the view of the consortium, derivation of haplotypes and 'haplotype tag SNPs' from HapMap data should be considered obvious and thus not patentable. Therefore, the original reasons for imposing the licensing requirement no longer exist and the requirement can be dropped."

So they do not say that the threat is absent from all such open data releases, only that it was mitigated in this case.

Are there other examples of these kinds of restrictions being imposed? Or of problems ensuing because they have not been imposed, and the data left open? (Note, I'm not at all advocating closed access!)