IDCC13: The Power of Intelligence

21 January, 2013

The second of the three themes at this year’s IDCC was Intelligence, explored in a two-talk session chaired by Clifford Lynch, Executive Director of the Coalition for Networked Information (CNI). Intelligence and innovation are beginning to drive the data science agenda, from data mining and visualisation to open data and business intelligence, but the two talks showed us that the coordination behind these efforts is still lacking.

Digital collections: new challenges for curation and new opportunities for data driven scholarship

Adam Farquhar, Head of Digital Library Technology at the British Library, spoke first about the British Library’s digital collection initiatives. Farquhar argued that if digital content and digital data are viewed as ubiquitous, then information held in libraries is data. Real intelligence requires a transition from considering individual items to reflecting on collections of data. Data collections pose many curation challenges, e.g. access to and preservation of these collections, but they also offer opportunities, e.g. visualisation. However, to realise these opportunities we must stop treating digital content the way we treat print content.

The British Library holds a growing volume of content, much of which has been created by large-scale book digitisation. They have digitised over 40 million pages in 10 years. In 2012 they received ‘Highly Commended’ in the category of Big Data Project of the year for the British Newspaper Archive, created through a partnership with Brightsolid. Unfortunately, some of their digitisation projects with Google and Microsoft continue to entail access restrictions.

However, Adam Farquhar believes the British Library are on the cusp of change. He distinguished between providing page views of digitised newspapers and 'providing access to the newspaper as a dataset'. He also appealed for us to look at the UK web archive, and other online assemblies, as ‘data collections’.

The British Library have recently embarked on an ‘Opening up Speech Archives’ project, which is looking at the potential of speech-to-text technologies for research and has been sharing data sets. They have also been creating bibliographic data in an open data format, allowing researchers to download copies and interlink with it. Other work includes enhancing collections, for example geo-referencing historic maps through crowdsourcing, and visualisation activities.

The British Library also now offers a training programme in digital scholarship which incorporates ideas from Stanford and Oxford universities.

Farquhar ended with a quote from Franco Moretti of Stanford University: “Reading individual works is as irrelevant as describing the architecture of a building from a single brick, or the layout of a city from a single church.”

Intelligence, Insight, and the role of Scale; data stories from the business world

Big data is everywhere, but how much has actually changed? In his talk Paul Miller from Cloud of Data began by pointing out that ‘data speaks’, a point illustrated with some interesting historical data stories. In the 1854 cholera outbreak, scientists had believed that the disease was caused by bad air. Physician John Snow was skeptical about this theory and had ‘gone back to first principles’. He drew a map and plotted the deaths on it; the map also indicated where 12 water pumps were, and so the real cause was established. Keeping with the disease thread, Miller showed us how data can easily ‘predict’ activity: Google Flu Trends shows the severity of flu outbreaks all over the world using search data. Unfortunately for us, Amsterdam looked to be the flu hotspot of the Netherlands!

We were then asked to consider whether size really does matter. The influx of data is described using emotive language and imagery: the data deluge, the data flood, the data tsunami. It’s as if “everyone wants to be a data scientist”. We often hear about the 3 Vs of big data: volume, velocity and variety. More recently there has been discussion about a possible 4th V: value. There is an implicit presumption that bigger is better, but this is not always the case. Sometimes bigger just means the value is even more hidden than before.

Personally Identifiable Information is another trend on the horizon. Miller suggested that we consider our own personal information and that a huge opportunity lies in our own connections. Individuals should consider getting their own personal data locker containing data about interactions with government, banks, businesses and more. Thought should also be put into where data is stored; for example, in the Mastodon C dashboard you can select green data centers over not-so-green data centers.

Miller concluded that data is incredibly powerful, and more and more of it is becoming freely and openly available for our use. However, we need skills and tools to extract value. One challenge for the future is how we will protect individuals and create the market conditions for new businesses to emerge.
