Whole of domain web harvest

This page is the primary source of information about the Library’s domain harvesting.

Why does the National Library collect websites?

The National Library has a social responsibility to preserve New Zealand's social and cultural history, whether in the form of books, newspapers and photographs, or of websites, blogs and videos.

The New Zealand Web Domain Harvest recognises the importance of the internet in all areas of New Zealand society and culture by taking a ‘snapshot’ of the whole .nz domain as it exists on the web at the time of harvesting. The Library’s first domain harvest took place in 2008. Further harvests were run in 2010 and 2013, and they have been run annually since 2015.

The National Librarian is authorised to harvest websites by the National Library of New Zealand (Te Puna Mātauranga o Aotearoa) Act 2003 and the Minister’s National Library Requirement (Electronic Documents) Notice 2006.

What is a domain harvest?

The Library undertakes two streams of web archiving: selective harvesting and domain harvesting.

Selective harvesting is where Library staff select high-value websites for inclusion in our collections. The Library has been harvesting websites selectively since 1999.

Domain harvesting is an attempt to harvest as much material as is technically possible with a minimum of human intervention. It is called "domain harvesting" because the simplest approach is to try to harvest an internet domain, such as the NZ (or ".nz") domain for New Zealand.

Technical details

The technical parameters of the harvest were developed after consultation with the public and internet stakeholder groups.

We acquired:

  • Websites that fall under the .nz country code
  • Websites that fall under .com, .net and .org that can be programmatically determined to be hosted on machines that are physically located in New Zealand
  • Selected websites based overseas that are covered by the provisions of the National Library of New Zealand (Te Puna Mātauranga o Aotearoa) Act 2003
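
The three criteria above can be read as a simple scoping rule. The following is a minimal sketch of that rule, not the harvester's actual implementation: the function name in_harvest_scope, the hosted_in_nz callback (standing in for whatever geolocation check is used) and the selected_hosts set are all hypothetical names introduced for illustration.

```python
from urllib.parse import urlsplit

# Generic top-level domains named in the criteria above.
GENERIC_TLDS = {"com", "net", "org"}


def in_harvest_scope(url, hosted_in_nz, selected_hosts=frozenset()):
    """Return True if a URL would fall within the harvest criteria.

    hosted_in_nz: hypothetical callable taking a hostname and returning
        True when the host is judged to be physically located in New Zealand.
    selected_hosts: hypothetical set of individually selected overseas hosts.
    """
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    if not host:
        return False
    tld = host.rsplit(".", 1)[-1]
    if tld == "nz":
        # Criterion 1: anything under the .nz country code.
        return True
    if tld in GENERIC_TLDS and hosted_in_nz(host):
        # Criterion 2: .com, .net or .org hosted in New Zealand.
        return True
    # Criterion 3: individually selected overseas websites.
    return host in selected_hosts
```

For example, in_harvest_scope("https://natlib.govt.nz/", lambda host: False) returns True on the first criterion alone, without consulting the location check.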

The Library commissioned the Internet Archive (a US-based not-for-profit organisation) to perform the harvest on our behalf.

The crawler uses the user agent string NLNZ_IAHarvester[year].
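
Website operators who want to identify harvest traffic in their server logs could match on this string. The sketch below assumes that [year] is a placeholder replaced by a four-digit year (for example NLNZ_IAHarvester2015) and that the string may appear alongside other tokens in the full User-Agent header; neither detail is confirmed on this page, and the function name harvest_year is hypothetical.

```python
import re

# Assumption: "[year]" stands for a four-digit year appended directly
# to the crawler name, e.g. "NLNZ_IAHarvester2015".
HARVESTER_UA = re.compile(r"NLNZ_IAHarvester(\d{4})")


def harvest_year(user_agent):
    """Return the harvest year found in a User-Agent header, or None."""
    match = HARVESTER_UA.search(user_agent)
    return int(match.group(1)) if match else None
```

A header that does not contain the crawler name yields None, so the function can double as a simple filter when scanning log lines.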

Some web harvesting statistics

Year    Number of URLs harvested    Size of the harvest (uncompressed)
2008    105 million                 4.1 TB
2010    130 million                 8 TB
2013    150 million                 12.59 TB
2015    200 million                 15 TB

The data is stored at the National Library in Wellington.
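
As a rough illustration of how the harvest has grown, the statistics above can be turned into an average amount of uncompressed data per URL. This sketch assumes the sizes are decimal terabytes (10^12 bytes), which the page does not specify; the names HARVESTS and avg_kb_per_url are introduced here for illustration only.

```python
# Figures copied from the statistics above: year -> (URLs, terabytes).
HARVESTS = {
    2008: (105e6, 4.1),
    2010: (130e6, 8.0),
    2013: (150e6, 12.59),
    2015: (200e6, 15.0),
}


def avg_kb_per_url(urls, terabytes):
    """Average uncompressed kilobytes per harvested URL.

    Assumes decimal units: 1 TB = 10**12 bytes, 1 KB = 10**3 bytes.
    """
    return terabytes * 1e12 / urls / 1e3


for year, (urls, terabytes) in sorted(HARVESTS.items()):
    print(f"{year}: about {avg_kb_per_url(urls, terabytes):.0f} KB per URL")
```

Under that assumption the 2015 harvest averages about 75 KB per URL, compared with roughly 39 KB in 2008.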

Contact us

If you have a question or comment about the domain harvest, email Web.Archive@dia.govt.nz.

Nominate a site for the harvest

If you feel a site may be missed or overlooked by our regular harvesting (such as a New Zealand-based site that does not have a .nz domain), please fill out the nomination form to nominate it for inclusion.