CDL Web Archiving Service

The California Digital Library's Web Archiving Service is a subscription-based service aimed at libraries and academic institutions. It allows them to build archives of websites relevant to their research interests. The archives created can be made private, public immediately, or public after an embargo period, and can be branded to suit the subscriber.


California Digital Library.

Licensing and cost

The service is subscription-based. The fee depends on the subscriber's requirements, so subscribers must contact the service provider for a quote. The tools used by the service (Heritrix, Nutch, Wayback) are open source, and content is harvested in line with copyright law.

Development activity

The service is active and is one of the main production services of the California Digital Library's University of California Curation Center (UC3).

Platform and interoperability

The Web Archiving Service, as a web-based service, is platform agnostic.

Functional notes

The service is flexible on matters such as:

  • the frequency and depth of captures;
  • the number of sites (base URLs) captured;
  • the number of archives used to organise the captured material;
  • the number of curators managing the account.
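
The "depth" of a capture is commonly understood as how many links a crawler follows outward from a seed (base) URL. The sketch below illustrates that idea only. It assumes a hypothetical in-memory link map in place of real fetched pages and a same-host scoping rule. The service's own Heritrix configuration may define depth and scope differently.

```python
from collections import deque
from urllib.parse import urlparse


def scoped_capture(seed: str, links: dict[str, list[str]], max_depth: int) -> list[str]:
    """Return URLs reachable from `seed` within `max_depth` link hops,
    following only links on the seed's own host.

    `links` is a hypothetical map of page URL -> outgoing link URLs,
    standing in for the pages a real crawler would fetch and parse.
    """
    host = urlparse(seed).netloc
    seen = {seed}
    order = []
    queue = deque([(seed, 0)])  # breadth-first: (url, hops from seed)
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # depth limit reached: capture this page but follow none of its links
        for target in links.get(url, []):
            # Stay within the base URL's host and skip pages already queued.
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return order
```

For example, with a depth limit of 2, a page three links away from the seed and any link to another host would both be left out of the capture.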

A search form for the archive can be embedded in other websites, and records from the archive can be exported as XML. A tool is provided for comparing the capture results from different crawls.
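
This review does not describe how the crawl-comparison tool works internally. The following is a minimal, hypothetical sketch of one way two crawls could be compared. It treats each crawl as a map from URL to a content digest and reports pages that were added, removed, or changed. The data model and the function name are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CrawlDiff:
    added: list[str]    # URLs captured only in the newer crawl
    removed: list[str]  # URLs captured only in the older crawl
    changed: list[str]  # URLs in both crawls whose content digest differs


def compare_crawls(old: dict[str, str], new: dict[str, str]) -> CrawlDiff:
    """Compare two crawls given as hypothetical URL -> content-digest maps."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(u for u in old.keys() & new.keys() if old[u] != new[u])
    return CrawlDiff(added, removed, changed)
```

Comparing digests rather than full page content keeps the comparison cheap. The trade-off is that trivial changes, such as a timestamp in a page footer, still register as "changed".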

Documentation and user support

The website has an FAQ page and a Learning Center that provides several PDF user guides and video tutorials.

Technical support is provided as part of the service.

Usability

The service provides a straightforward web interface for both administrators and users. Collections can be browsed by URL and are indexed for full-text search.

Expertise required

At the simplest level, subscribers need only select the sites they want to archive, and the service will set up a sensible harvesting programme. A good understanding of collection management and web archiving helps subscribers get the most from the service.

Standards compliance

The archived material is stored in the industry-standard ARC format.
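
In the ARC format, each archived resource is preceded by a one-line header. In version 1 of the format, that header has five space-separated fields: URL, IP address, archive date (14 digits, YYYYMMDDhhmmss), content type, and content length in bytes. The sketch below parses such a line. It handles the version-1 header only. Version-2 headers carry additional fields, and this review does not state which version the service writes. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ArcHeader:
    url: str
    ip_address: str
    archive_date: datetime
    content_type: str
    length: int  # number of content bytes that follow the header line


def parse_arc_v1_header(line: str) -> ArcHeader:
    """Parse a version-1 ARC record header line:
    URL IP-address Archive-date Content-type Archive-length
    """
    fields = line.rstrip("\n").split(" ")
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields in ARC v1 header, got {len(fields)}")
    url, ip, date, ctype, length = fields
    return ArcHeader(
        url=url,
        ip_address=ip,
        archive_date=datetime.strptime(date, "%Y%m%d%H%M%S"),
        content_type=ctype,
        length=int(length),
    )
```

A reader of an ARC file would call this on each header line, then read `length` bytes of content before looking for the next header.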

Influence and take-up

The service is mainly used by US universities and colleges, though its subscribers also include the United Nations Food and Agriculture Organization. The website lists the public archives hosted by the service.

Last reviewed: 29 April 2014