IDCC13: ELIXIR - a bioinformatics infrastructure

Ewan Birney opened the 8th International Digital Curation Conference with a keynote presentation describing how changes in genetic research inspired a new bioinformatics infrastructure called ELIXIR.

Alex Ball | 15 January 2013

The data used in molecular biology has undergone a massive shift in scale over the last decade. While this opens up many new opportunities, it means that the existing research infrastructure needs an overhaul. In his opening keynote presentation at IDCC13, Ewan Birney, Associate Director of the EMBL European Bioinformatics Institute (EMBL-EBI), described the new ELIXIR infrastructure and illustrated why it was needed with examples from his work in genomics.

The cost of mapping genomes has fallen steadily since the techniques were first invented, but between 2007 and 2010 it plunged dramatically. As a result, the volume of genomic data is currently doubling every 6-8 months, and is now only an order of magnitude smaller than that of high energy physics data. This wealth of data is invaluable for understanding the relationships between genes and exhibited characteristics (predisposition to certain heart conditions, for example), but it has many other uses. Comparing the genomes of healthy cells and cancer cells has led to the development of highly specific treatments. Sequencing has also been used to identify pathogens and detect where they occur in an environment.
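
As a rough illustration of what a 6-8 month doubling time implies, the arithmetic can be sketched as below. This is a back-of-the-envelope calculation, not something presented in the talk: it assumes the doubling time stays constant, and the function names are made up for the example.

```python
import math


def annual_growth_factor(doubling_months: float) -> float:
    """Factor by which a quantity grows in 12 months, given its doubling time."""
    return 2 ** (12 / doubling_months)


def months_to_multiply(factor: float, doubling_months: float) -> float:
    """Months needed to grow by `factor`, assuming a constant doubling time."""
    return doubling_months * math.log2(factor)


# The two ends of the 6-8 month range quoted above.
for months in (6, 8):
    print(
        f"doubling every {months} months: "
        f"x{annual_growth_factor(months):.2f} per year, "
        f"tenfold in {months_to_multiply(10, months):.1f} months"
    )
```

Under that assumption, the data volume grows by a factor of roughly 2.8 to 4 each year, and a tenfold increase takes about 20 to 27 months.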

Ewan gave the example of the ENCODE (Encyclopedia Of DNA Elements) Project, which since 2003 has been working to discover the functions of various parts of the human genome. So far it has generated more than 15 TB of raw data and consumed more than 300 years of CPU time, and yet it is only a medium-sized project. It is estimated that the storage and processing power available for molecular biology will need to increase tenfold to keep pace with demand, and this is where ELIXIR comes in. ELIXIR is a pan-European network of computing resources for life sciences research, currently under construction. It uses a hub-and-nodes model, rather than being fully distributed or fully centralised, in order to get the best of both worlds with respect to sustainable funding, resilience and concentration of resources. The hub will be based at the EMBL-EBI site in Cambridgeshire, UK, while the nodes will include both national centres and individual institutions.

Ewan concluded his talk with a novel proposal that, sadly, must remain within the conference's four walls for now. He presented it as a bit of fun, but if it pays off it could prove revolutionary. Sorry to be such a tease about it, but if you don't already, it's as good a reason as any to keep an eye on his blog or follow him on Twitter.