Collecting social media a bite at a time

New Zealanders are discussing important issues on social media. We want to ensure that the issues that are important to us today are documented and preserved for the future.

Web archiving at the National Library

The National Library uses the terms “web archiving” and “web harvesting” to describe the copying of webpages using a web crawler.

We collect New Zealand websites because Legal Deposit legislation requires a full archive of what’s published in New Zealand and by New Zealanders.

Sites and pages disappear from the internet, and you might want to access them in the future. We actively look for the things we think you’ll need, capturing events and moments that are important to New Zealand society.

We’ve been collecting and archiving New Zealand websites since 1999.

Gaps in the collection

Our team has noticed that content is shifting away from the traditional website to social media. This is understandable — it’s easier to post content and quick to circulate. It’s also much more difficult to collect!

We’re worried that some of our documentary heritage may be lost if we don’t start collecting content in this area.

Collecting social media is a bit of a minefield. There are legal, privacy and technical issues to work through. Since we started harvesting the web, we’ve only been able to collect small amounts of social media.

Social media timeline

Social media platforms have grown and multiplied in the last couple of decades. Here’s a partial timeline of platforms that have come, and sometimes gone.

A sample of popular social media channels launched since 1999. There are 18 shown, from Blogger and Livejournal to Periscope and Meerkat.

And this is what the library has collected so far: representative examples from Twitter, Blogspot, WordPress, and Tumblr, along with some videos provided by content owners that are posted on YouTube or Vimeo.

A contrasting timeline of channels we've collected. It only includes Twitter, Blogspot, Wordpress, and Tumblr.

In future we expect donors will deposit their social media collections with us, just as we receive donations of print material into our archival collections today. However, not all social media platforms provide an option for you to archive your own accounts.

What happens when a social media platform closes down? In the future, will we have any record of what it looked like?

New Zealand General Election event harvest

We’re currently preparing to harvest websites for the 2017 general election, capturing a snapshot of the event.

These are some of the social media trends we’ve noticed over the years as we’ve collected in this area:

Blogging and YouTube

In 2008 blogging was very popular in the political arena and we were able to collect a number of blogs in addition to the usual party, candidate, political commentary and lobby group websites. YouTube was also growing in popularity and most of the major political parties were using it to get their campaign messages out.

Twitter and Facebook

By 2011 Twitter and Facebook had become increasingly common. Many candidates had a static official campaign website, but used Twitter and Facebook to post updates and current news.

The trend away from websites

In 2014 there was a significant shift away from traditional websites to using social media almost exclusively. Many candidates either had a largely static website, or abandoned using a website altogether. Instead they opted to use social media as their main campaign platform.

To give you an example, here are some statistics our team gathered during the 2011 and 2014 elections.

Chart showing candidates' use of online channels in 2011 and 2014 elections.

While a lot of candidates in 2011 only had a website, very few stuck with that choice in 2014. Hundreds added some form of social media, or went with just Facebook and Twitter, and no website. The number using Facebook only stayed level, while a few more picked up Twitter only. Interestingly, many more candidates had no website or social media presence in 2014.

Keep in mind that these are rough statistics based on what we identified when we were searching online for candidate websites and social media.

While we noted the use of social media in 2014, we didn’t differentiate which of those accounts were public, personal or a combination of the two, because we weren’t able to collect them.

If you’d like more information about these statistics, email web.archive@dia.govt.nz.

We expect that in 2017 the use of social media during the election will expand to include other platforms, in addition to YouTube, Twitter and Facebook.

Where to begin?

Our web archiving team collects the public web. Our aim is to collect social media content that is clearly intended by the content owner to be viewed by a wide (and public) audience — people who want their voices heard.

In the September general election the focus will be on collecting the political campaigns, candidate sites and political commentary on Twitter. We chose Twitter because it’s a popular platform for discussing events, and it happens to be one of the easiest forms of social media to collect.

Archiving Twitter

There are two ways Twitter allows you to capture content. One is by downloading your own Twitter archive and the other is by using the Twitter API to crawl content.

Archive your own Twitter account

Twitter lets you download your own Twitter archive as a zipped file.

The zipped file contains a number of files that make up a Twitter archive, including an index.html file which gives you the Twitter view of your own tweets. Your tweets are organised by month and you can search your archive by keyword, phrase or hashtag. There are also CSV and JSON files, which are useful if you want to run some analytics over your data.
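As a rough illustration of the kind of analytics the CSV file makes possible, here is a minimal Python sketch that counts tweets per month and tallies hashtags. The column names ("timestamp" and "text"), the timestamp layout and the sample rows are all assumptions made for the example, so check them against the file in your own download before relying on this.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for the CSV file in a downloaded archive.
# The column names and the timestamp layout are assumptions, not a
# documented format; adjust them to match your own file.
SAMPLE_CSV = """timestamp,text
2017-08-01 09:15:00 +0000,Enrolled to vote today #nzpol
2017-08-20 18:30:00 +0000,Watching the leaders debate #nzpol #decision17
2017-09-23 07:00:00 +0000,Election day! #decision17
"""


def tweets_per_month(csv_file):
    """Count tweets per calendar month, keyed as 'YYYY-MM'."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # The first seven characters of the assumed timestamp are 'YYYY-MM'.
        counts[row["timestamp"][:7]] += 1
    return counts


def hashtag_counts(csv_file):
    """Count how often each hashtag appears, ignoring case."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        for word in row["text"].split():
            if word.startswith("#") and len(word) > 1:
                # Strip trailing punctuation so '#nzpol.' matches '#nzpol'.
                counts[word.lower().rstrip(".,!?")] += 1
    return counts


if __name__ == "__main__":
    print(tweets_per_month(io.StringIO(SAMPLE_CSV)))
    print(hashtag_counts(io.StringIO(SAMPLE_CSV)))
```

To run it over a real archive, you would open the CSV file with `open(path, newline="", encoding="utf-8")` and pass the file object in place of the `io.StringIO` sample.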

Twitter API

You can also programmatically crawl Twitter using their APIs. If you want to capture commentary about a particular event, you can run a crawl to capture tweets relating to a particular hashtag or set of search terms. We trialled this during the 2016 Kaikoura earthquake.

We’re planning to run a hashtag crawl during the general elections using a combination of the search API and streaming API. Much of this is based on the work undertaken by Ian Milligan and Nick Ruest when they captured tweets relating to the 2015 Canadian Federal Election.
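Running two crawls over the same hashtag means some tweets are likely to be captured twice. The Python sketch below shows one way the two result sets might be filtered and merged. It is not our production workflow, nor the method Milligan and Ruest used. It assumes each tweet is a dictionary shaped like the JSON the Twitter API returns, with an "id_str" field and an "entities" field listing hashtags; the function names and sample tweets are hypothetical.

```python
def has_hashtag(tweet, hashtag):
    """Return True if the tweet's hashtag entities include the given tag."""
    wanted = hashtag.lstrip("#").lower()
    tags = tweet.get("entities", {}).get("hashtags", [])
    return any(tag.get("text", "").lower() == wanted for tag in tags)


def merge_crawls(search_results, stream_results, hashtag):
    """Combine two crawls, keeping one copy of each matching tweet."""
    merged = {}
    for tweet in list(search_results) + list(stream_results):
        if has_hashtag(tweet, hashtag):
            # Key on the tweet ID so a tweet seen by both crawls is kept once.
            merged.setdefault(tweet["id_str"], tweet)
    # Sort by numeric ID to give the merged set a stable order.
    return sorted(merged.values(), key=lambda t: int(t["id_str"]))


# Hypothetical sample data standing in for real API responses.
SEARCH_SAMPLE = [
    {"id_str": "101", "entities": {"hashtags": [{"text": "nzpol"}]}},
    {"id_str": "102", "entities": {"hashtags": [{"text": "rugby"}]}},
]
STREAM_SAMPLE = [
    {"id_str": "101", "entities": {"hashtags": [{"text": "nzpol"}]}},
    {"id_str": "103", "entities": {"hashtags": [{"text": "NZpol"}]}},
]

if __name__ == "__main__":
    for tweet in merge_crawls(SEARCH_SAMPLE, STREAM_SAMPLE, "#nzpol"):
        print(tweet["id_str"])
```

Keying on the tweet ID rather than the text means retweets and near-identical posts are still kept as separate records, which matters if the goal is to preserve the conversation as it happened.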

Access to archived websites

You can search for archived websites on this site or by using the National Library catalogue.

How do you eat an elephant?

We continue to investigate ways of collecting other types of social media to ensure that New Zealand’s documentary heritage is preserved. Meanwhile we’re following the “One bite at a time” principle, starting with Twitter.

In our next blog post we’ll share details of the Twitter crawl we ran during the 2016 Kaikoura Earthquake. Stay tuned!

By Gillian Lee

Gillian Lee is the Coordinator, Web Archives at the Alexander Turnbull Library.

Vivienne August 11th at 8:52AM

Do you have any recommended tools for archiving Pinterest sites? Are you planning to capture the New Zealand Pinterest space?

Gillian Lee August 14th at 11:53AM

Hi Vivienne,
If you search Pinterest for “how to save your pin board” you will find options available to you to save copies of your personal pin boards. Pins tend to point to people’s websites and blogs. We’re more focussed on collecting those NZ sites rather than the ‘pins’ themselves. However if we received donated copies of people’s digital pin boards, then they would go through our usual selection process.

Andrew Henry August 15th at 9:17AM

Great to hear you'll be able to capture Twitter for this election harvest :-)
Keep up the good work!