IDCC13 Preview: Francine Bennett

Magdalena Getler | 08 January 2013

The 8th International Digital Curation Conference is just around the corner and we are anticipating great discussions about data science when our international audience gather in Amsterdam in January 2013.

In the eighth of our series of preview posts, Francine Bennett from Mastodon C gives us her insights into some of the current issues...

Your presentation will focus on data science. Are there any specific messages you would like people to take away from your talk?

It’s a fast-evolving area, so I suspect all of us panellists will have a slightly different view on what a data scientist actually is. But the thing that I would emphasise as important in a data scientist, which is quite unusual and hard to find, is the combination of a flexible and inquiring mind with hard analytical and technical skills. In fact, I would rather hire a data scientist who is great at asking good questions but has fairly shallow statistical training than the opposite. Hard skills can always be taught or learned from textbooks if someone has the right aptitude, and it’s easy to disappear down an analytical rabbit hole if you have a big statistical toolkit but a lack of thoughtfulness about how and where to apply it.

We address three areas in our call this year - Infrastructure, Intelligence and Innovation. What do you see as the most pressing challenges across these?

My view is from the business world, and there I see a lot of rapid progress across all of those areas. Infrastructure challenges are being overcome by a fast-developing toolbox of both open and proprietary options, including services like Mastodon C’s, which aim to manage the latest big data software stacks and make them easy to use. Intelligence is the biggest open challenge - intelligence in making good infrastructure decisions, but also intelligence in what to do with data and infrastructure once you have it. Data is worth almost nothing in its raw form; it only acquires real value once transformed into intelligence and insight, and the process of doing that is nuanced, hard to systematise, and requires expertise to do well.

And in terms of opportunities, do you see potential in data science as a new discipline?

I don’t think it’s something entirely new, but it’s a very useful name for something that’s been evolving for a while, and for focussing minds on the set of skills and tools that are needed to get good at it. In one way or another, I’ve been a ‘data scientist’ working with ‘big data’ for most of my career, but now that those labels exist it is much easier to communicate what I do, why it is useful, and to find all the interesting people in the same field.

The conference theme recognises that the term ‘data’ can be applied to all manner of content. Do you also apply such a broad definition or are you less convinced that all data are equal?

I'd definitely apply a broad definition. I’m a terrible data hoarder. My company runs a Hadoop platform and big data analytics for clients, and one of the major results of these newer open source data technologies which deal well with large or messy datasets, the plummeting price of compute time, and increased emphasis on machine learning models, is that all data can now become usable and interesting in the right context. 

I, and I think many others, have the attitude nowadays that it’s worth throwing all the possibly relevant data you have into the pot when trying to solve a problem or achieve insight. As long as you understand the source and features of what you’re using, and handle it appropriately (for example, understanding where content can be inaccurate or incomplete, and building models which deal gracefully with that possibility), it would be a waste to reject any potentially useful raw material.

You’ll undoubtedly have looked at the programme in preparation for IDCC. Which speakers / sessions are you most looking forward to?

I’m looking forward to sitting in on a few of the more academic sessions and seeing what I can learn, as I mostly work in the business world and don’t get much contact with academia. I’m also looking forward to Paul Miller’s talk on stories of Intelligence, as he’s always plugged in to what’s most interesting and I’m sure he’ll have something good to share.

Francine's presentation is on Day 1 of the conference, 15 January. The programme is available.

If you have not already done so, you can still book your place.

Please share your attendance at IDCC13 via Lanyrd.