Because good research needs good data

Microsoft Research Faculty Summit: eScience

Adrian Richardson | 17 July 2007

Microsoft Research Faculty Summit 2007
Microsoft Conference Center, Redmond, Washington, July 16

ESCIENCE: DATA CAPTURE TO SCHOLARLY COMMUNICATION
Tony Hey, Microsoft Research (Chair)

Research Communication, Navigation, Evaluation, and Impact in the
Open Access Era, Stevan Harnad, University of Southampton

The global research community is moving toward the optimal and
inevitable outcome in the online age: all research articles, together
with the data on which they are based, will be freely accessible to all
on the web, deposited in researchers' own OAI-compliant Institutional
Repositories and mandated by their institutions and funders. Research
users, funders, evaluators, and analysts, as well as teachers and the
general public, will have an unprecedented capacity not only to read,
assess, and use research findings, but to comment upon them, entering
into the global knowledge-growth process. Prepublication preprints,
published postprints, data, analytic tools, and commentary will all be
fully and navigably interlinked. Scientometrics will generate powerful
new ways to navigate, analyze, rank, and evaluate this Open Access
corpus, its past history, and its future trajectory. A vast potential
for providing services that mine and manage this rich global research
database will be open to both the academic community and enterprising
industries. [See: "Publication-Archiving, Data-Archiving and
Scientometrics," forthcoming in CTWatch]
http://users.ecs.soton.ac.uk/harnad/Temp/ctwatch.doc
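The OAI-compliant repositories Harnad describes expose their metadata through the OAI-PMH harvesting protocol (a `ListRecords` HTTP request returning Dublin Core XML). A minimal sketch of the request and the parsing step follows; the repository endpoint URL and the trimmed sample response are hypothetical, but the request parameters and namespaces are the protocol's own:

```python
# Sketch of harvesting article metadata from an OAI-compliant
# Institutional Repository via OAI-PMH.  The endpoint URL below is a
# made-up example; any OAI-PMH repository answers the same request shape.
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

NAMESPACES = {"oai": "http://www.openarchives.org/OAI/2.0/",
              "dc": "http://purl.org/dc/elements/1.1/"}

def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL."""
    return base_url + "?" + urlencode({"verb": "ListRecords",
                                       "metadataPrefix": metadata_prefix})

def extract_titles(response_xml: str) -> list:
    """Pull Dublin Core titles out of a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [t.text for t in root.iterfind(".//dc:title", NAMESPACES)]

# A trimmed sample response, shaped like a real repository's reply:
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><metadata>
      <dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">Open Access and Impact</dc:title>
    </metadata></record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("https://eprints.example.org/cgi/oai2")
print(url)                     # ...?verb=ListRecords&metadataPrefix=oai_dc
print(extract_titles(SAMPLE))  # ['Open Access and Impact']
```

Scientometric services of the kind the abstract envisions would harvest such records across many repositories and interlink them by identifier and citation.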

The Digital Data Universe
Chris Greer, National Science Foundation

CyberInfrastructure to Support Scientific Exploration and
Collaboration, Dennis Gannon, Indiana University

Funding for experimental and computational science has undergone a
dramatic shift, from being dominated by single-investigator research
projects to large, distributed, multidisciplinary collaborations tied
together by powerful information technologies. Because cutting-edge
science now requires access to vast data resources, extremely
high-powered computation, and state-of-the-art tools, the individual
researcher with a great idea or insight is at a serious disadvantage
compared to large, well-financed groups. However, just as the Web now
provides most of humanity with access to nearly unlimited data, theory,
and knowledge, a transformation is also underway that can broaden
participation in basic scientific discovery and empower entirely new
communities with the tools needed to bring about a paradigm shift in
basic research techniques.

The roots of this transformation can be seen in the emergence of
on-demand supercomputing and vast data storage from companies like
Amazon, and in the National Science Foundation's TeraGrid Science
Gateways program, which takes the concept of a Web portal and turns it
into an access point for state-of-the-art data archives and scientific
applications that run on back-end supercomputers. However, this
transformation is far from complete. What we are now seeing emerge is a
redefinition of the computational experiment: from simple reporting of
the results of simulations or data analysis to a documented and
repeatable workflow in which every derived data product has an
automatically generated provenance record. This talk extrapolates these
ideas to the broader domain of scholarly workflow and scientific
publication, to qualitative as well as quantitative data, and ponders
the possible impact of multicore, ubiquitous gigabyte bandwidth, and
personal exabyte storage.
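The provenance-record idea in Gannon's abstract can be sketched in a few lines: every derived data product carries a record of which inputs it came from, which tool produced it, and when, so the workflow is auditable and repeatable. The names here (`ProvenanceRecord`, `derive`) are illustrative assumptions, not any particular workflow system's API:

```python
# Sketch of automatically generated provenance for derived data products.
# Class and function names are illustrative, not a real system's API.
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

def digest(data: bytes) -> str:
    """Content hash that identifies a data product immutably."""
    return hashlib.sha256(data).hexdigest()

@dataclass
class ProvenanceRecord:
    product: str     # hash of the derived product
    inputs: list     # hashes of the inputs it was derived from
    tool: str        # name/version of the program that produced it
    timestamp: str   # when the derivation ran (UTC)

def derive(inputs: list, tool: str, transform):
    """Run a transformation and attach a provenance record to its output."""
    output = transform(inputs)
    record = ProvenanceRecord(
        product=digest(output),
        inputs=[digest(i) for i in inputs],
        tool=tool,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    return output, record

# Example: concatenating two raw observation files into a derived product.
out, prov = derive([b"obs-1", b"obs-2"], "concat-1.0", lambda xs: b"".join(xs))
print(json.dumps(asdict(prov), indent=2))
```

Because the record is generated by the derivation step itself rather than by hand, every product in the workflow can be traced back to its raw inputs and re-derived on demand.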