Archiving Web Resources
Dave Thompson, Wellcome Library
Published: December 2008
The World Wide Web is among the most important information resources, and is certainly the most voluminous. In a relatively short time, it has become a vital medium for a range of academic and commercial publishers.
However, until recently, little effort has been directed towards ensuring the long-term preservation of the digital assets that reside on-line. The web's dynamic nature makes it prone to frequent change, and without a means of capture and preservation it is likely that vast quantities of content will be lost forever.
Since the web is home to a vast range of materials that vary widely in format, scale and behaviour, there are inevitable issues that must be overcome to collect, manage and preserve them.
- Automation of harvesting
- Deposit approach
- Selection, negotiation and capture
- Issues associated with the "deep" web
- Existing initiatives/products (e.g. Internet Archive, NWA, PANDAS)
- Legal implications
- Collaboration and responsibility
- Non-standard media types