IIPC WW1 commemoration
March 9th, 2016
The National Library of New Zealand is a member of the International Internet Preservation Consortium (IIPC), which includes many national libraries, universities and other institutions around the world who are committed to preserving our web history and making it available for research.
Some institutions, like the National Library, have a legal mandate to collect and preserve websites within their national domain, but the Web itself crosses international boundaries. Working collaboratively gives us a chance to identify websites from each country that relate to an international event and to showcase the collections to the public together.
Websites are collected using a process we call ‘crawling’ or ‘harvesting’: a web crawler copies the publicly available parts of a website, the copy is stored in WARC (Web ARChive) files, and researchers can then view it using a special viewer.
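For readers curious about what ends up inside a WARC file, here is a minimal sketch in Python of how one captured page could be wrapped in a WARC record. It is an illustration only, not the crawler the Library or Archive-It uses: the function name `build_warc_resource_record` and the example address are made up, and a real crawler records much more (the HTTP request and response headers, for instance) and usually compresses the result.

```python
import uuid
from datetime import datetime, timezone


def build_warc_resource_record(target_uri: str, payload: bytes,
                               content_type: str = "text/html") -> bytes:
    """Wrap one captured payload in a simple WARC 'resource' record.

    Hypothetical helper for illustration; real crawlers usually write
    'request' and 'response' records with full HTTP headers.
    """
    capture_time = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    header_lines = [
        "WARC/1.0",                                   # version line
        "WARC-Type: resource",                        # kind of record
        f"WARC-Target-URI: {target_uri}",             # where the content came from
        f"WARC-Date: {capture_time}",                 # when it was captured (UTC)
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>", # unique ID for this record
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",            # size of the payload in bytes
    ]
    # Headers end with a blank line; the record ends with two CRLFs.
    head = "\r\n".join(header_lines).encode("utf-8") + b"\r\n\r\n"
    return head + payload + b"\r\n\r\n"


if __name__ == "__main__":
    record = build_warc_resource_record("http://example.org/", b"<html></html>")
    print(record.decode("utf-8"))
```

Because each record carries its own address and capture date, a viewer can later replay the page as it looked on the day it was harvested.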
In May last year a call went out from the IIPC asking if any members were interested in taking part in a collaborative project to identify and collect websites relating to the commemoration of World War One. The British Library, Bibliothèque nationale de France (National Library of France), Deutsche Nationalbibliothek (German National Library), Netarchive.dk (which collects and preserves the Danish part of the Internet), Instituto de História Contemporânea (Portuguese Web Archive), Library of Congress, Library and Archives Canada-Bibliothèque et Archives Canada, National Library of Australia and the National Library of New Zealand expressed interest.
Two crawls took place in November 2015 and March 2016. Here’s the result: www.archive-it.org/collections/6415
The scope of the crawl
The National Library was already archiving military history websites, so it was a matter of selecting websites that specifically covered WW1 commemorations and activities. The Library identified 33 New Zealand websites for the crawl. Websites with broader coverage of military history were excluded from the collaborative crawl, but were harvested within the Library's usual web harvesting programme. They are available to the public by searching the collections on the National Library website.
The IIPC members also identified websites from outside the IIPC membership that were important to consider in the crawl, such as Turkish websites about Gallipoli and other campaigns. The aim was to collect content from a variety of perspectives.
The IIPC WW1 collaborative collection is hosted on Archive-It, a web archiving service built at the Internet Archive. A total of 1500 websites were collected during the November crawl (774 GB of data) and more will be collected in March.
Advantages of a collaborative crawl
Our web archiving team in the Alexander Turnbull Library often find overseas websites that are either written by or contain information about New Zealanders. If they’re not in scope for Legal Deposit but are important to the Turnbull Collections, we will seek permission to harvest and archive these sites. This, however, is time-consuming, and it is not practical if only a few pages scattered throughout a website relate to New Zealand. If the website is located in a country that has a web archiving programme, that’s great, because you know the content is being captured, but often access is limited to a library’s reading room.
The National Library’s web collections are catalogued, but they’re not indexed, which means researchers have to browse through an archived website to find any specific content they may be looking for. That’s relatively easy if the website is small and easy to navigate, but much harder when the site is large. The advantage of the collaborative crawl is that the collections are fully indexed by the Archive-It service, so researchers can keyword search the entire collection and also apply some search filters.
The collaborative nature of the crawl means that when researchers are looking at a website and want to follow a link through to another website, the link will take them there if that website has been included in the collection. For example, the National Library archived Big Bearers NZ Tunnellers in our web harvesting programme, but not the resource link to “Les Boisselle Study Group” because it’s not in scope for our Library’s collection. However, if you click on the link in the IIPC collaborative collection you’ll be able to view the website because another IIPC member identified it as in scope for their collections.
Limitations of the crawl
There are always limitations with collecting the web. Technical and legal issues may prevent the harvesting of some content. A crawl is a snapshot in time and is run within a set budget. Some websites may have disappeared from the web prior to the crawl or have yet to be created. As with many collections, this is a great place to start, but you may also need to search local web archives or the Internet Archive for content not found in the collection.
For example, one site I had identified for the collaborative crawl had disappeared by the time the crawl took place. It was an online Field of Remembrance where people could donate to the RSA and post their thoughts online. It wasn’t part of the WW100 centenary project that a number of countries have created during 2014-2019, but it did commemorate Anzac Day. Thankfully, it was captured in the regular web harvesting work we carry out. We archived a copy for the Library’s collections.
How are people commemorating WW1?
Looking at the collection of websites, it’s interesting to see how people have been commemorating WW1. As you would expect, there are numerous blogs that have been created to remember the lives of family members who served in WW1. Others focus on a specific topic like painting miniatures or collecting WW1 postcards.
Some websites provide social commentary on why and how we commemorate such events: http://wayback.archive-it.org/6415/20151116184137/http://clioscurrent.com/blog/2013/10/28/the-trouble-with-canadas-great-war-commemoration
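The archived address above appears to follow a simple pattern: the collection number (6415, the same number as in the collection link), a 14-digit capture date and time, and then the original web address. Here is a minimal Python sketch that pulls those three parts apart. It assumes that pattern holds for other addresses in the collection and that the timestamp is in UTC; the names `ArchivedCapture` and `parse_archive_it_url` are made up for illustration and are not part of any Archive-It tool.

```python
from datetime import datetime
from typing import NamedTuple

# Assumed prefix, taken from the example address in this post.
WAYBACK_PREFIX = "http://wayback.archive-it.org/"


class ArchivedCapture(NamedTuple):
    collection_id: str     # e.g. "6415" for the IIPC WW1 collection
    captured_at: datetime  # when the crawler copied the page (assumed UTC)
    original_url: str      # the address on the 'live' Web


def parse_archive_it_url(url: str) -> ArchivedCapture:
    """Split an archived address into collection, capture time and original URL."""
    if not url.startswith(WAYBACK_PREFIX):
        raise ValueError(f"Not an Archive-It Wayback address: {url}")
    rest = url[len(WAYBACK_PREFIX):]
    # Split only twice so the slashes in the original URL are left alone.
    collection_id, timestamp, original_url = rest.split("/", 2)
    captured_at = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    return ArchivedCapture(collection_id, captured_at, original_url)


if __name__ == "__main__":
    capture = parse_archive_it_url(
        "http://wayback.archive-it.org/6415/20151116184137/"
        "http://clioscurrent.com/blog/2013/10/28/"
        "the-trouble-with-canadas-great-war-commemoration"
    )
    print(capture.collection_id, capture.captured_at, capture.original_url)
```

Read this way, the example address tells us the page was captured on 16 November 2015, during the first of the two crawls.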
There are educational sites as well. Te Kete Ipurangi, for example, highlights the participation of Pacific Islanders during WW1.
The commemorations also look at recent events such as Anzac celebrations.
And of course there is the WW100 project.
Gone but not forgotten
Over time, as these websites change and are taken down from the ‘live’ Web, it’s encouraging to know they won’t be lost forever. Two hundred years after the Great War, when we commemorate WW1 again, people will be able to see how we commemorated the first 100 years.