IDCC13 Preview: Kevin Ashley
The 8th International Digital Curation Conference is just around the corner and we are anticipating great discussions about data science when our international audience gather in Amsterdam in January 2013.
In the first of our series of preview posts, DCC's Director, Kevin Ashley, gives us his insights into some of the current issues...
You are a conference co-chair. Are there any specific messages you would like people to take away from the conference?
I think the conference has multiple messages to offer to what is a diverse audience, professionally and geographically. Overall, I would like everyone to come away aware of the potential for reuse of the work that others are doing and the potential for collaboration. Whether it is software tools, training materials, methodologies or analyses, many of the talks describe things that others can use to deal with data curation issues in their own research group, institution or national setting.
There are already signs of worrying duplication of effort in the digital curation field and this is something we can't afford. Internationally we will struggle to command the resources to solve these problems once. We can't afford to solve some of them twice or three times and others not at all.
We address three areas in our call this year - Infrastructure, Intelligence and Innovation. What do you see as the most pressing challenges across these?
I would not want to single any one out. We have immediate and pressing challenges in the area of infrastructure, particularly because its effective use will be key to realising necessary efficiencies at a time when money may be harder to come by.
We don't lack for either intelligence or innovation in this field, but we need to work harder to coordinate and build on the innovation and to encourage greater adoption of some of the techniques described under the 'intelligence' heading in the call for papers.
And in terms of opportunities, do you see potential in data science as a new discipline?
I'm not at all convinced that it is a new discipline. I think people have been doing what we call data science for many years, albeit with different names and without the ease that increasing compute power & data collections offer. That doesn't mean that there isn't potential.
I think one goal that's within reach is to enable those who currently think of themselves as domain specialists who happen to deal with data to realise their potential as generalists who can apply their skills in data analysis & synthesis in many disciplines. It's an approach we've used in much of the training developed for the DCC's DC101 course, for instance.
Matters such as data quality can be taught in a generic way and only then does one need to consider how to apply them in specific research domains. Many data scientists learn their skills on the job in a way that can make them believe that their skills aren't transferable; they usually are.
The conference theme recognises that the term ‘data’ can be applied to all manner of content. Do you also apply such a broad definition or are you less convinced that all data are equal?
I have always been an advocate of the broadest possible definition of data. Even in the relatively constrained area of relational databases with fields and rows, I've always been at pains to point out that the cells don't just contain numbers or short text strings, but might contain audio, video, rich documents or a variety of other types of content.
This was important for the design of systems like NDAD, the initial service for preserving and providing access to UK government data for the Public Record Office which we were building in 1997. It was also important to communicate this to archivists and records managers who were making selection decisions about what would be preserved. By encouraging them to take a broad view of what 'structured data' was, we acquired material that might otherwise have been lost.
The NSF/JISC/NEH/NWO/ESRC/SSHRC/AHRC/IMLS-funded 'Digging Into Data' challenges have also done a great deal to encourage a broad view of what data can be. That doesn't mean that all data are equal. What you can do with a data collection depends a lot on the amount of structure it has and on many of its other properties. But there isn't a simple hierarchy of good and bad data; the quality, and even the 'data-ness', of something is in the eye of the beholder. I can read a novel as a novel, for enjoyment. You can take the same text and use analytical techniques to make deductions about authorship and style. It's the same content, but it is only data to one of us.
You’ll undoubtedly have looked at the programme in preparation for IDCC. Which speakers / sessions are you most looking forward to?
As a chair, it would be unseemly for me to pick out particular submissions – we value them all! But I'm glad that we've got a greater percentage of talks from speakers across Europe this year, which was one of the reasons for holding the conference in Amsterdam.
We knew there was lots of innovative work taking place across the continent that wasn't getting the attention it should have done at IDCC in past years. I'll be catching as many of these talks as I can.
I'm also looking forward to two workshops on Monday and Thursday that I'll be personally involved in. The first is to promote awareness of and European participation in the fledgling Research Data Alliance, and the second will try to develop a common understanding of pricing (as opposed to costing) schemes for data repositories.
If you have not already done so, you can still book your place.