1. Introduction and context
The purpose of this research project was for the National Library of New Zealand (NLNZ) to establish a greater understanding of the use of archived websites by university academics, and to explore the future direction for web archiving in New Zealand. The Library has been archiving New Zealand websites since 1999 but to date there has been no research completed on the extent to which this collection is known and used.
The hypothesis was that university academics in New Zealand had little knowledge of the existence of web archives and therefore made little use of this resource for research purposes. However, the Library was aware of several New Zealand tertiary courses that use archived websites as a learning resource.
The research sought information regarding how archived websites are used as a teaching and learning resource and what more the Library could offer New Zealand teachers at tertiary level to assist with providing access to archived websites for educational purposes.
2. Definition of terms
Archived website: A copy of a website or blog that is preserved and made accessible in an archive. Most websites selected by NLNZ are collected regularly to capture new content. For example, the Library has taken a copy of the website Pacifica Inc every year since 2008 and these copies are available online.
Domain harvest: A regular harvest of the publicly available web content from the New Zealand .nz, .net, .org, and .com domains. The 2013 domain harvest collected 150 million URLs and consists of 14 TB of data.
3. Summary of findings
A full overview of the findings and analysis is provided in section 7 of this report. Below is a brief overview of the main findings of the research:
- The majority of respondents (211, 73%) believe that it is important for New Zealand websites and blogs to be archived.
- 147 respondents (51%) indicated that our collection of archived websites is important for their current research or will be within the next five years.
- Many respondents demonstrated confusion between an archived website and archived items that are available via the web. With filters applied, 21 respondents (7%) could be reliably identified as being aware of and having used the New Zealand Web Archive.
- In total 128 respondents (44%) had at some point used one of the following international sources of web archives: the Internet Archive, the National Library of Australia Web Archive, the UK Web Archive and the US Library of Congress Web Archive.
- 88 respondents (30%) used the Internet Archive at some point for their research, including 14 (5%) using it often.
- URL searching was preferred by only 81 respondents (28%). It was rejected the most strongly out of the search options and notably was rejected more strongly by those who had experience searching the New Zealand Web Archive.
- 43 respondents (15%) use archived websites or blogs from international sources as a resource in their teaching. The most popular method of using archived websites or blogs in teaching is to provide a link to the archived site; this method is used by 31 respondents (11%).
- 191 respondents (66%) considered government websites the most important subject area to be archived for their research.
- 114 respondents (39%) believed archives of social media would be useful to their current or future research. While 105 respondents (36%) did not know, this section engaged the most respondents with 77 adding comments to their answers.
- Video channels were considered the most important medium of social media to be archived by 166 respondents. Discussion forums were the second most important medium, with 124 respondents agreeing or strongly agreeing that they should be archived.
- Respondents were evenly divided in their opinion of the importance of personal identity and microblogging channels to research.
4. Survey population and methodology
The research was undertaken as a partnership between the National Library of New Zealand and Victoria University of Wellington, which provided the researcher. The survey was sent to 2470 university academics in the humanities and social science disciplines, from seven New Zealand universities and one wānanga (public tertiary institution providing education in a Māori context). Other disciplines were excluded in order to keep the survey to a more manageable scope and scale.
The invitation bounced from 96 of the recipients, who had previously indicated they no longer wished to receive SurveyMonkey requests. In total the survey was delivered to 2374 recipients and 290 of them responded, giving a 12.2% response rate. Of the 290 survey responses gathered, 257 were fully completed and 33 were only partially completed. Partially completed surveys were included in the final results.
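The figures in the paragraph above can be cross-checked with a few lines of arithmetic. The following Python sketch is illustrative only: the variable and function names are assumptions introduced here, and the numbers are those quoted in this section.

```python
# Figures quoted in the survey methodology paragraph above.
invitations_sent = 2470   # academics the survey was sent to
bounced = 96              # invitations that bounced (SurveyMonkey opt-outs)
responses = 290           # total survey responses gathered
fully_completed = 257     # responses that were fully completed


def percentage(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Invitations that actually reached a recipient.
delivered = invitations_sent - bounced

# Responses that were only partially completed.
partially_completed = responses - fully_completed

# Response rate is measured against delivered invitations, not all invitations.
response_rate = percentage(responses, delivered)

print(delivered, partially_completed, response_rate)
```

The response rate is calculated against the invitations that were delivered rather than the full mailing list, which matches the 12.2% reported here.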
The survey population was chosen because of its likelihood of using archived websites and its relationship to the Library’s resources. To allow the Library to focus on its commitment to New Zealand universities, polytechnics and other tertiary institutions were excluded from the survey population. New Zealand’s two other wānanga were unintentionally excluded from the survey population because their staff email addresses were inaccessible. Survey recipients were contacted solely by email and all email addresses were sourced from publicly available websites.
The survey consisted of twenty-four questions based on the following themes: career demographics; awareness of archived websites; use, access and value of archived websites for research and teaching; subject content of web collections; the importance of archiving social media; and domain harvests. The questions contained a mix of dichotomous and trichotomous questions, Likert scales, and free text sections which allowed respondents to elaborate on their answers.
The survey questions were constructed in consultation with Library staff and Gillian Oliver at the School of Information Management, Victoria University of Wellington. The survey was piloted amongst a select group of staff members during November 2014. The survey was distributed on Thursday 4th December 2014 using the online survey tool SurveyMonkey. The majority of university academics have finished their teaching by December, although the Library acknowledges that this is still a busy period and that the distribution timing of the survey likely affected the response rate. The survey was open for six weeks, closing on the 14th of January 2015. The survey was completed anonymously and was granted approval by the Victoria University of Wellington, School of Information Management’s Human Ethics Committee.
5. International context
The National Library of New Zealand would like to thank the international archiving community for what little has already been published on the scholarly usage of web archives and acknowledge the influence that it has had on this research. Publications on web archiving tend to be either technical discussions among archivists or projections about the generally unknown user of web archives. There is still little in the way of scholarly articles and research on the use of web archives by researchers. (1) However, the following research projects have provided some insight that was drawn upon in our research.
5.1 Netherlands 2007
The ‘Web Archiving User Survey’ was undertaken by Marcel Ras and Sara van Bussel at the National Library of the Netherlands (KB) in an attempt to establish the potential users of a permanent archive of selected Dutch websites. (2) They consulted sixteen existing web archives around the world and concluded that generally the end users of web archives fall into the following categories:
- Website owners
- Public institutions
- Members of the public
A user test was then conducted on fifteen participants drawn from the above categories. They completed a short survey and were observed searching and accessing web archives. In this study users ranked full-text searching as the most important condition to be satisfied by a web archive. This was followed by URL searching and by the importance of making a clear distinction between the presentation of an archived website and a live website. The most important subjects for archiving mentioned by the participants in the Netherlands’ user test were news sites, weblogs, cultural websites, government websites and scholarly websites. One of the conclusions drawn from the Netherlands’ research was the expectation that the main use of web archives will continue to be for research purposes.
5.2 Portugal 2010
The report ‘Understanding the Information Needs of Web Archive Users’ was produced by Miguel Costa and Mario Silva in 2010 for the Portuguese Web Archive (PWA). (3) The aim of the report and the research was to discover the intentions of web archive users and gather more information regarding which subjects should be archived. The research for this report was threefold and included: analysing 400 search logs; an online questionnaire, which had nineteen respondents; and a laboratory study of twenty participants. Generally their conclusions were in keeping with other trends from research on the use of web archives.
The Portuguese Web Archive can interpret full-text queries and URL searches from the same search box. Their research showed users preferred full-text over URL searches, although examination of the search logs showed that URL searches made up 20.96% of all of the searches submitted, so they are still a popular means of accessing the Portuguese Web Archive. The most frequent use of their web archive was to find a page that was already known to the user, the second was to collect information written in the past, and lastly a small number used the Web Archive to satisfy a transactional need such as downloading an old file. Interestingly, this research revealed a slight tendency for users to prefer older versions of archived websites to newer ones, which suggests that the importance of an archived website increases with age.
5.3 United Kingdom 2010
The research most similar to the NLNZ project was completed in the UK in 2010 by the Joint Information Systems Committee (Jisc). (4) Their extensive report, entitled ‘Researcher Engagement with Web Archives’, included qualitative research conducted with seventeen individuals from the web archiving community and academics from the humanities and social science disciplines. The report was primarily focused on answering the following two questions:
- How are researchers in the humanities and social science disciplines currently making use of web archives?
- What sort of technical and policy infrastructures will researchers need in the future in order to facilitate their work?
Their report discusses the gap between the potential and the actual number of researchers using web archives, reflecting a general lack of awareness about web archives. Indeed Jisc recognise that they too know little about the users of their web archive. Overall, their research raised concerns about the rapid turnover of web content and the resulting loss of web data, a concern shared amongst web archivists around the world.
The report also discusses the difficulties of deciding what to archive and how best to provide access to the masses of information held on the internet. Firstly, decisions about what to archive have to be made. Interviewee and European Archive director, Julien Masanès, explains the impracticality of librarians and archivists replacing the publisher’s filter at the magnitude of the internet. The selection process creates a tension between libraries, who want to build large multi-purpose web archives, and researchers, who want deep, project-specific archives. Interviewee Dr Kristen Foot refers to this relationship and suggests that there should be some level of university commitment and discussion with researchers to support web archiving. Jisc’s report also discusses the relevance and difficulty of archiving social media, which is of interest to the National Library of New Zealand as it gauges researcher interest in this data in New Zealand.
The findings and report by Jisc were important in establishing the NLNZ research into the ‘Use of Web Archives by Researchers.’ The findings of Jisc are consistent with the results of our research, both showing how few people are aware that archived websites exist. Jisc present strong evidence for the need to archive the web for academic and cultural heritage reasons and they encourage library and research communities to develop and share best practices for web archiving to increase accessibility and awareness.
5.4 France 2012
A qualitative study entitled ‘Web Archives for Researchers: Representations, Expectations and Potential Uses’ was undertaken in France by the Bibliothèque nationale de France (BnF) in 2012. (5) The aim of the project was to assist planning for the future of their Internet legal deposit. There were three user groups and a total of fifteen interviews conducted. This study focused on the first user group, consisting of five researchers. The intention was to analyse the subjects’ practice in web research and their perception of web archives. All five researchers recognised the value of web archiving but none had in fact used the web archives of the BnF.
The BnF’s research, like that of Jisc in the United Kingdom, grappled with the debate around archiving social media. The BnF researchers suggested that the difficulty in determining the value of social media stems from the difficulty of distinguishing between the public, or ‘published’, domain and the private personas and conversations of individuals, with the latter having questionable value for archival purposes and raising ethical questions that should be addressed before collecting. However, the BnF also acknowledge that archives of social media could be a great resource for future researchers, and that it is impossible to predict what material will interest researchers in the future. The BnF conclude that promotion and communication of web archives is necessary to engage researchers and improve their service for the future.
6. The New Zealand Web Archive
The New Zealand Web Archive forms part of the Alexander Turnbull Library research collections. The Library began selecting websites in 1999 and the collection has continued to grow, with active development since 2005, reflecting New Zealand’s growing online cultural and historical presence. The collection, which held over 22,000 selected websites at the time of writing, covers a diverse range of subjects and significant events, and is strong in the following areas:
- politics, including blogs, and general and local body elections
- Māori, including iwi and Treaty of Waitangi
- community and ethnic groups
- music, including labels, organisations, artists and directories
- the arts
- the environment
The collection also provides a visual history of how websites change over time and most of the websites in the archive are collected at regular intervals to ensure new content is captured. This content includes web pages, images, multi-media, and publications, such as journals, that are made publicly available.
1. Brenda Reyes Ayala. Web Archiving Bibliography. Denton: UNT Libraries, 28 June 2013. p. 2.
2. Marcel Ras, and Sara van Bussel. Web Archiving User Survey. The Hague: National Library of the Netherlands (Koninklijke Bibliotheek), 2007.
3. Miguel Costa, and Mario J Silva. “Understanding the Information Needs of Web Archive Users.” Proceedings of the International Web Archiving Workshop IWAW 2010. Vienna, Austria, 2010. 9-16. Portuguese Web Archive (PWA), 2010.
4. Meghan Dougherty, Eric Meyer, Christine Madsen, Charles van den Heuvel, Arthur Thomas, and Sally Wyatt. Researcher Engagement with Web Archives: State of the Art. London: JISC, 2010.
5. Peter Stirling, Philippe Chevallier, and Gildas Illien. “Web Archives for Researchers: Representations, Expectations and Potential Uses.” The Magazine of Digital Library Research 18, no. 3/4 (March/April 2012).