
Archiving Web Resources

Dave Thompson, Wellcome Library

Published: December 2008

The World Wide Web is among the most important information resources, and is certainly the most voluminous. In a relatively short time, it has become a vital medium for a range of academic and commercial publishers.

However, until recently, little effort has been directed towards ensuring the long-term preservation of the digital assets that reside online. The web's dynamic nature makes it prone to frequent change, and without a means of capture and preservation it is likely that vast quantities of content will be lost forever.

The web is home to a vast range of materials that vary widely in format, scale and behaviour, so collecting, managing and preserving them raises issues that must be overcome.

Download the Archiving Web Resources chapter (PDF)

Key Points

  • Automation of harvesting
  • Deposit approach
  • Selection, negotiation and capture
  • Issues associated with the "deep" web
  • Existing initiatives/products (e.g. Internet Archive, NWA, PANDAS)
  • Legal implications
  • Collaboration and responsibility
  • Non-standard media types
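
To make the first key point concrete, the following is a minimal sketch of automated harvesting, assuming a simple "fetch, fingerprint, compare" workflow. It is not taken from the chapter and does not reflect how the Internet Archive, NWA or PANDAS work. The names `Capture`, `make_capture`, `has_changed` and `harvest` are hypothetical and used for illustration only.

```python
import hashlib
import urllib.request
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Capture:
    """One harvested snapshot of a web resource (hypothetical record layout)."""
    url: str
    captured_at: datetime
    sha256: str
    size: int


def make_capture(url: str, body: bytes, when: Optional[datetime] = None) -> Capture:
    """Build a capture record from bytes that have already been fetched."""
    if when is None:
        when = datetime.now(timezone.utc)
    digest = hashlib.sha256(body).hexdigest()
    return Capture(url=url, captured_at=when, sha256=digest, size=len(body))


def has_changed(previous: Capture, current: Capture) -> bool:
    """Report whether the content differs between two captures of one URL."""
    return previous.sha256 != current.sha256


def harvest(url: str, timeout: float = 30.0) -> Capture:
    """Fetch a single URL and record what was captured and when.

    Assumption: a plain HTTP GET is enough. Content in the "deep" web
    (behind forms, logins or scripts) cannot be reached this way.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read()
    return make_capture(url, body)
```

A scheduler could call `harvest` at intervals and store a new snapshot only when `has_changed` returns true. A real archive would also need to record response headers, follow links within an agreed scope, and respect the legal constraints noted above.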