New Zealand whole of domain web harvest 2010
April 8th, 2010 By Courtney Johnston
The Library has announced the dates for the 2010 New Zealand whole of domain web harvest: 12 - 25 May 2010.
In January this year we released an Options Paper, seeking feedback on several issues that arose during the first whole of domain web harvest in 2008. Our sincere thanks to those who took the time to respond. The following decisions have been made:
- The harvest is scheduled to begin on 12 May 2010. There will be a five-week notification period.
- The Library will use several channels to communicate about the harvest, including its corporate website, the LibraryTechNZ blog, a Twitter account (coming today or tomorrow), various mailing lists and forums, and media releases.
- In 2008 the Library made the decision to ignore the robots.txt convention.
- For the 2010 harvest, where a robots.txt file exists the harvester will honour it, except when downloading images and other elements that are embedded in web pages.
- Website owners can set specific rules for the Library's harvester, which will have the user agent string: NLNZHarvester2010.
- If you have a very restrictive robots.txt file in place already, we would appreciate it if you could provide a more permissive rule for NLNZHarvester2010 to help us capture a complete copy of your website.
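For website owners who want to check what a more permissive rule might look like, here is a minimal sketch. The robots.txt content and the example URL are hypothetical. Only the user agent string NLNZHarvester2010 comes from the announcement above. The check uses Python's standard urllib.robotparser module, which shows how one common parser reads the rules; the Library's harvester may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that blocks all crawlers by default
# but gives the Library's harvester full access.
# An empty "Disallow:" value means nothing is disallowed for that agent.
ROBOTS_TXT = """\
User-agent: NLNZHarvester2010
Disallow:

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical page on a hypothetical site.
    page = "http://www.example.org.nz/about.html"
    for agent in ("NLNZHarvester2010", "SomeOtherBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, page))
```

Putting the harvester-specific record in its own block leaves the restrictive default in place for every other crawler.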
Location of harvester
- After consultation with New Zealand telecom vendors we have decided to run the harvest from the United States using the Internet Archive's hardware and network infrastructure, as we did in 2008.
More information and a copy of the full Summary of Submissions are available on the Web Harvest 2010 page on our website. This page will be the home for all information for website owners and administrators.
If you want to nominate a website for inclusion in the harvest, use our Nomination Form. You can also use that form to send us a site map.
If you have questions or feedback, please email us at Web.Archive@dia.govt.nz. Of course, if you leave comments here we'll do our best to answer them.