IDCC13 Preview: Scott Edmunds

20 December, 2012

The 8th International Digital Curation Conference is just around the corner and we are anticipating great discussions about data science when our international audience gathers in Amsterdam in January 2013.

In the seventh of our series of preview posts, Scott Edmunds, from GigaScience/BGI HK, gives us his insights into some of the current issues...

Your presentation will focus on publishing the outputs from data-heavy research projects. Are there any specific messages you would like people to take away from your talk?

We live in exciting times, with many new developments surrounding the stewardship and handling of data, and the release this year of a number of novel platforms and schemes that facilitate data release and allow better mechanisms for tracking and crediting data reuse. I think the main message from my talk is that, on top of just providing the means for storage, the next step should be moving the compute to the data and building cool things on top of these precious data resources. With data citation being used as a means to credit the release of data, similar schemes should also be used to reward reproducibility through the open release of code, methods and workflows.

We address three areas in our call this year - Infrastructure, Intelligence and Innovation. What do you see as the most pressing challenges across these?

Infrastructure is obviously a massive challenge, with data production in many areas now outpacing the ability to handle it, so I think intelligence and innovation are key to keeping on top of this deluge. This will involve technological innovations in data compression and cloud computing, as well as improved visualization and data handling tools to aid data curation, interoperability and reuse.

And in terms of opportunities, do you see potential in data science as a new discipline?

What people classify as “data science” seems to be a new discipline in the way it has come from the recent explosive growth of online electronic information, but as many traditional areas of research also become increasingly data driven, the overlap with this type of work may eventually lead to it all just being classified as “science”. Whatever this field is eventually called, it obviously has huge potential for growth and a need for people with the right skills and literacies.

The conference theme recognises that the term ‘data’ can be applied to all manner of content. Do you also apply such a broad definition or are you less convinced that all data are equal?

From my perspective of working with biological data, the approach until now has been that “all data is equal” and needs to be archived. The explosive growth of genomic sequencing data, and the logistical challenges of keeping up with its production, mean there are now difficult decisions to be made as to whether everything needs to be kept, and at what level and quality of data compression. With this in mind there may be a move (in biology anyway) towards a graded system ranking data quality and utility (see Ewan Birney’s recent paper in our journal discussing this topic).

You’ll undoubtedly have looked at the programme in preparation for IDCC. Which speakers / sessions are you most looking forward to?

I’m obviously most interested in the data publication session, but it’s great to see there are some talks from more applied and researcher-led perspectives, with a number of talks on cloud computing and data applications. Ewan Birney’s keynote should be excellent, as the many innovations the ENCODE project developed to make their work reproducible and accessible were groundbreaking and have really raised the bar for the future.


Scott's presentation is on Day 1 of the conference, 15 January. The programme is available.

If you have not already done so, you can still book your place.

Please share your attendance at IDCC13 via Lanyrd