IDCC11 Preview: An interview with Victoria Stodden
In the fourth in our series of preview posts ahead of IDCC 11, we interview Victoria Stodden, Assistant Professor in the Department of Statistics at Columbia University. She shares with us what she sees as the main stumbling blocks to open science and explains why she believes reproducibility of research is a key driver for openness...
You will be talking about reproducible research in your presentation at IDCC 11. What are the main outcomes you are hoping for from your talk?
I'm very happy to see themes such as open science and open data being discussed more frequently at conferences and workshops. What I hope to do with my talk is frame many of these different issues people are discussing within the context of the scientific method. I've found this to be the most powerful communication tool when trying to reach scientists and science supporters: how can we support scientific norms, particularly regarding computational research?
Reproducibility is a key part of the scientific method and provides the underlying rationale for openness in scientific knowledge.
Science isn't about finding answers: a diligent researcher in his or her basement figuring things out but telling no one isn't doing science. Science is about communicating both discovery and method.
Lots of people argue for open data but far fewer practice what they preach. What do you think is needed to encourage more data sharing?
Big question. Short answer: incentives. Scientists do lots of things they don't like in order to conform with the scientific method (who likes actually writing papers?) and we don't have the incentives in place to reward the full communication of method, which must include data and code sharing, such that published results can be conveniently reproduced by others in the field.
We are moving toward this though and I am encouraged, but scientists are subject to pressure from journal publication requirements, funding agency requirements, promotion and hiring committee expectations, the demands of their particular research problem, incentives to commercialize discoveries, and legal restrictions on sharing, among others.
Data and code sharing requires serious effort, and without incentives in place computational scientists face a collective action problem: why do something unrewarded at the expense of doing work that is rewarded? As you can see from the list, it is nearly impossible for all the incentive mechanisms to work together to rectify the problem, and there is no easy one-size-fits-all solution to implement.
So we are moving step by step and slowly as the pieces fall into place from the various groups, but we are moving.
Reproducibility is too fundamental to science for this not to be the case.
How do you think we can encourage 'unexpected' reuse of data?
I'd be wary of creating additional demands on those who are sharing data. As you can see from my answer to the previous question, there are plenty of difficult barriers already. One thing that can help, and that doesn't inhibit sharing, is research that shows the usefulness of data and code, both to innovation and to researchers' careers.
Karim Lakhani of Harvard Business School has a study showing that research problems posed by InnoCentive were typically solved by people in quite different fields, and more studies showing increased citation from shared data and code will help break the myth that data sharing isn't personally advantageous to the sharer.
Making sure we cite appropriately for reused data and code will certainly help as well.
We usually see funders, data creators, universities and data users as the typical set of stakeholders for data. Would you add any to that list?
What about journals (e.g. the Elsevier Executable Paper Grand Challenge), governments (e.g. Data.gov), and the public?
Which group of stakeholders do you believe can do the most to promote a culture of wider reuse of data?
It's an interlocking effort. Each has a role to play and all can take steps to facilitate reproducibility in computational science.
What research data management tools do you think will be the ones to watch in the future?
How about project management and code development tools? One thing that is sometimes forgotten in the discussion of open science and open data is that code is inescapably and intrinsically part of the discussion. When you have data, you have code. There is no other way the dataset got there! And no other way to access it!
Some exciting management tools were discussed at a workshop I co-organized in the summer called Reproducible Research: Tools and Strategies for Scientific Computing. It is a mistake to think of data management in isolation, without regard to its interwoven role in a broader research context. This is where the framing of reproducibility can help.
If there were one change that you could make to improve research data management practice, what would it be?
Version control. It's necessary for all sorts of data sharing practices (and code sharing practices) and can be done on web interfaces, openly or privately until publication.
Victoria will be presenting a session entitled 'Reproducible Research' as part of the research perspectives strand of talks on Tuesday 6 December. You can still book your place at the 7th International Digital Curation Conference here.
If you are unable to attend in person, look out for an announcement next week about how you can take part remotely, or track the conference via Lanyrd [http://lanyrd.com/2011/idcc11/] to be notified about the arrangements.