Screenshot of the Web Curator Tool, showing the in tray, harvest authorisations, permission request templates and reports.

Web Curator Tool (WCT)

One of our innovative tools is the Web Curator Tool, which is used for harvesting websites, web pages, and other documents on the internet.

What is the Web Curator Tool?

The Web Curator Tool is designed for use in libraries and other collecting organisations. It supports collection by non-technical users while still allowing complete control of the web harvesting process. The tool supports:

  • harvest authorisation — obtaining permission to harvest web material and make it accessible;
  • selection, scoping and scheduling — deciding what to harvest, how, and when;
  • basic description — adding unqualified Dublin Core metadata and web-specific notes;
  • harvesting — downloading the selected material from the internet;
  • quality review — ensuring the harvested material is ready to archive; and
  • archiving — submitting harvest results to a digital archive.

The National Library of New Zealand runs a selective web harvesting programme using the Web Curator Tool (WCT). Websites harvested by this method are deposited into the Library’s National Digital Heritage Archive (NDHA) using Rosetta.

Workflow of the Web Curator Tool: deposit to staging, TA assessor, arranger, approver, then permanent repository.
Diagram of the workflow of a web harvest into a preservation system.

National Library web harvesting

Development of the Web Curator Tool

The Web Curator Tool was developed in 2006 as a collaborative effort by the National Library of New Zealand and the British Library. The project was initiated by the International Internet Preservation Consortium (IIPC).

In December 2018, the Web Curator Tool 2.0 was released. This release is the product of a collaborative development effort started in late 2017 between the National Library of New Zealand (NLNZ) and the National Library of the Netherlands (KB-NL).

British Library
International Internet Preservation Consortium (IIPC)
National Library of the Netherlands

Web Curator Tool is open-source

The Web Curator Tool is written in Java and designed to run in Apache Tomcat. It has a flexible architecture, allowing the components of the tool to be distributed over multiple servers.

The Web Curator Tool is open-source software, available under the terms of the Apache License 2.0, and can be downloaded from GitHub.

Web Curator Tool on GitHub

Before you use the Web Curator Tool, we recommend you read the Web Curator Tool documentation. Documentation for older versions of the Web Curator Tool is available on GitHub.

Web Curator Tool documentation — current Web Curator Tool documentation.

Documentation for older versions of the Web Curator Tool on GitHub — these documents relate to previous versions of the Web Curator Tool.

Web Curator Tool 2.0 handout (pdf, 110KB) — find out what is new in the Web Curator Tool 2.0.