Report on Data Preservation in High Energy Physics
2 March, 2009 | in Blogs
By: Chris Rusbridge
There's a really interesting (if somewhat telegraphic) report by Richard Mount of SLAC on the workshop on data preservation in high energy physics, published in the January 2009 issue of Ariadne. The workshop was held at DESY (Deutsches Elektronen-Synchrotron), Hamburg, Germany, on 26-28 January 2009.
"The workshop heard from HEP experiments long past (‘it’s hopeless to try now’), recent or almost past (‘we really must do something’) and included representatives from experiments just starting (‘interesting issue, but we’re really very busy right now’). We were told how luck and industry had succeeded in obtaining new results from 20-year-old data from the JADE experiment, and how the astronomy community apparently shames HEP by taking a formalised approach to preserving data in an intelligible format. Technical issues including preserving the bits and preserving the ability to run ancient software on long-dead operating systems were also addressed. The final input to the workshop was a somewhat asymmetric picture of the funding agency interests from the two sides of the Atlantic."
There's a great deal to digest in this report. I'd agree with its author on one section:
"Experience from Re-analysis of PETRA (and LEP) Data, Siegfried Bethke (Max-Planck-Institut für Physik)
For [Richard], this was the most fascinating talk of the workshop. It described ‘the only example of reviving and still using 25-30 year old data & software in HEP.’ JADE was an e+e- experiment at DESY’s PETRA collider. The PETRA (and SLAC’s PEP) data are unlikely to be superseded, and improved theoretical understanding of QCD (Quantum ChromoDynamics) now allows valuable new physics results to be obtained if it is possible to analyse the old data. Only JADE has succeeded in this, and that by a combination of industry and luck. A sample luck and industry anecdote:
‘The file containing the recorded luminosities of each run and fill, was stored on a private account and therefore lost when [the] DESY archive was cleaned up. Jan Olsson, when cleaning up his office in ~1997, found an old ASCII-printout of the luminosity file. Unfortunately, it was printed on green recycling paper - not suitable for scanning and OCR-ing. A secretary at Aachen re-typed it within 4 weeks. A checksum routine found (and recovered) only 4 typos.’
The key conclusion of the talk was: ‘archiving & re-use of data & software must be planned while [an] experiment is still in running mode!’ The fact that the talk documented how to succeed when no such planning had been done only served to strengthen the conclusion."
I had heard of this story from a separate source (Ken Peach, then at CCLRC), so it's good to see it confirmed. I think the article that eventuated is:
Bethke, S. (2000). Determination of the QCD coupling α_s. J. Phys. G: Nucl. Part. Phys., 26.
One particularly sad remark came from Amber Boehnlein (US Department of Energy (DOE)):
"Amber was clear about the DoE/HEP policy on data preservation: ‘there isn’t one.’"
The DCC got a mention from David Corney of STFC, who runs the Atlas Petabyte Data Store; however, I can confirm that we don't have 80 staff, or anywhere near that number (just under 13 FTE, if you're interested!). The reporter may have mixed us up with David's group, which I suspect is much larger.
In the closing sessions, we have Homer Neal,
"who set out a plan for work leading up to the next workshop. In his words:
- ‘establish the clear justification for Data Preservation & Long Term Analysis
- establish the means (and the feasibility of these means) by which this will be achieved
- give guidance to the past, present and future experiments
- a draft document by the next meeting @SLAC.’"
Well worth a read!
