Web Curator Tool

The Web Curator Tool (WCT) is a tool for managing the selective web harvesting process. It is designed for use in libraries and other collecting organisations, and supports collection by non-technical users while still allowing complete control of the web harvesting process. The WCT Project is a collaborative effort by the National Library of New Zealand and the British Library, initiated by the International Internet Preservation Consortium. The WCT software was developed by Sytec Resources Ltd and is now available under the terms of the Apache Public License.
Functionality: 
The tool's workflow encompasses the following tasks:
  • Harvest Authorisation: seeking and recording permission to harvest web material, and to make it accessible to the general public.
  • Selection and scoping: determining what material should be harvested, be it a web site, a web page, a partial web site, a group (or collection) of web sites, or any combination of these.
  • Scheduling: determining when a harvest should occur, and when it should be repeated.
  • Description: describing harvests with basic Dublin Core metadata and other specialised fields (or by providing a reference to an external catalogue).
  • Harvesting: the Web Curator Tool downloads the selected web material at the appointed time using the Internet Archive's Heritrix web crawler. Each installation can have multiple harvesters on different machines, each of which can perform several harvests simultaneously.
  • Quality Review: tools are provided for making sure the harvest worked as expected, and correcting simple harvest errors.
  • Endorsing and submitting: if the harvest was a success, it is endorsed and then submitted to an external digital archive.
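As a rough illustration only, the tasks listed above can be modelled as an ordered sequence of stages. The Python sketch below is not taken from the WCT code base: the names `Stage`, `next_stage` and `run_workflow` are hypothetical, and it assumes that a harvest which fails quality review is simply not endorsed or submitted.

```python
from enum import Enum


class Stage(Enum):
    """Workflow tasks, in the order the list above presents them."""
    HARVEST_AUTHORISATION = 1
    SELECTION_AND_SCOPING = 2
    SCHEDULING = 3
    DESCRIPTION = 4
    HARVESTING = 5
    QUALITY_REVIEW = 6
    ENDORSING_AND_SUBMITTING = 7


def next_stage(current, review_passed=True):
    """Return the stage that follows `current`, or None when the workflow ends.

    Assumption: a harvest that fails quality review is not endorsed,
    so the workflow stops instead of moving on to submission.
    """
    if current is Stage.QUALITY_REVIEW and not review_passed:
        return None
    if current is Stage.ENDORSING_AND_SUBMITTING:
        return None
    return Stage(current.value + 1)


def run_workflow(review_passed=True):
    """List the stages a single harvest passes through."""
    stages = []
    stage = Stage.HARVEST_AUTHORISATION
    while stage is not None:
        stages.append(stage)
        stage = next_stage(stage, review_passed)
    return stages
```

Calling `run_workflow()` walks through all seven stages, while `run_workflow(review_passed=False)` stops at quality review. In practice the tasks are unlikely to be this strictly linear (a harvest may be scheduled to repeat, for example), so the sketch is a simplification.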
Level of Expertise: 
The WCT is designed for non-technical users in libraries and other collecting institutions who need to capture web material for archival purposes. It is designed to run in an enterprise setting, and would normally be installed by a system administrator (it is not a desktop application).

The DCC is funded by the Joint Information Systems Committee.