Archiving Web Resources

Dave Thompson, Wellcome Library

Published: December 2008

The World Wide Web is among the most important information resources, and is certainly the most voluminous. In a relatively short time, it has become a vital medium for a range of academic and commercial publishers.

However, until recently, little effort has been directed towards ensuring the long-term preservation of the digital assets that reside online. The web's dynamic nature makes it prone to frequent change, and without a means of capture and preservation, vast quantities of content are likely to be lost forever.

Because the web is home to a vast range of materials with widely varying formats, scale and behaviour, there are inevitable issues that must be overcome to facilitate their collection, management and preservation.

Download the Archiving Web Resources chapter (pdf)

Key Points

Automation of harvesting
Deposit approach
Selection, negotiation and capture
Issues associated with the "deep" web
Existing initiatives/products (e.g. Internet Archive, NWA, PANDAS)
Legal implications
Collaboration and responsibility
Non-standard media types
