Five real life lessons from digital archiving
Part of Connecting to collections 2023 series
Video | 1 hour
Event recorded on Tuesday 16 May 2023
In this Connecting to Collections online talk, Valerie Love, Kaipupuri Pūranga Matihiko Matua Senior Digital Archivist will offer insights into the behind-the-scenes work that goes on to preserve unpublished born-digital collections at the Alexander Turnbull Library.
Transcript — Five real life lessons from digital archiving
Speakers
Joan McCracken and Valerie Love
Mihi and acknowledgements
Joan McCracken: To open our talk today, we have as our whakatauki a verse from the National Library's waiata, “Kōkori, kōkori, kōkori,” Na our Wainui colleague Balla Tarawhiti.
Haere mai e te iwi
Kia piri tāua
Kia kite atu ai
Ngā kupu whakairi e
And now, it's my great pleasure to introduce Val Love, our senior digital archivist at the Alexander Turnbull Library. Welcome, Val.
Introduction
Valerie Love: Kia ora Joan. Tēnā koutou katoa. Nō Pennsylvania, America ahau, noho ana ahau ki Te Whanganui-a-Tara. He Kaipupuri Pūranga Matihiko Matua ahau. Mahi ana ahau ki te whare pukapuka o Alexander Turnbull. Ko Valerie Love tōku ingoa. No reira tēnā koutou, tēnā koutou, tēnā koutou katoa.
Well, hello and welcome, everyone. My name is Valerie Love, and I am the senior digital archivist at the Alexander Turnbull Library, which holds the archives and special collections for the National Library of New Zealand. And in my role at the Library, I care for incoming born-digital collections, and I oversee the lifecycle of unpublished digital files.
I've been working at the Turnbull Library for nearly 11 years now, first as a digital specialist on the arrangement and description team, a curator, and currently as senior digital archivist. For today's talk, I'm going to talk about some of the things that I've learned over the years working with digital collections. And because May is New Zealand Music Month, I've picked some great case studies from the born-digital music collections held at the Turnbull Library and some key lessons and insights that have come from that.
So when people think of the Turnbull Library, is this the first thing that comes to mind? What's on the slide at the moment are some early pages from the Kirkcaldie & Stains catalog, probably from about 1909 or 1910. And those pages from the catalog have been interleaved with notes and recipes. And they were part of the archive of the Kirkcaldie & Stains Department Store which operated in Wellington on Lambton Quay from 1863 to 2016.
But the collections at Turnbull Library are much more than just the early history of Aotearoa, New Zealand. The Library is actively documenting and preserving contemporary 21st-century culture as well. And this includes everything from photographs taken on smartphones to oral history recording interviews to websites and social media, and just about everything in between.
So as the 110-year-old papers from the Kirkcaldie & Stains collection show, printed materials will last a pretty long time if they're stored in reasonable conditions. Even if they are just put in a box and forgotten about, as long as they aren't exposed to the elements, they should be OK. But for digital files, the exact opposite is true. If digital material is left on its own for decades, it very, very likely will not survive.
So the Library has a whole digital preservation program to provide active care and management for born-digital collections to ensure that they will be accessible into the future, just like these pages are. But before we dive into the collections in more detail, let's take a step back and talk about what we mean by born-digital collections.
So what actually is a digital file?
So what actually is a digital file? Digital files are really just strings of binary data, either electrical or magnetic charges represented by ones and zeros, and put together in myriad ways to convey information. In addition to the raw data that they contain, digital files usually include instructions that tell the computer or other device reading it what to do with the particular file.
People often think of digital files as this ethereal thing just floating up there in the cloud. But there is also very much a physicality to digital materials in the form of hardware, disks, drives, and servers. But digital files also have infrastructure dependencies. They need operating systems and software to function because a digital file can't just exist on its own.
It requires hardware and software to be rendered which, in turn, requires equipment and electricity. And they also need a stable environment to protect digital files from damage.
So at its core, my work is really to help the Library acquire and care for digital files.
To keep digital files safe and accessible in perpetuity, there are four main digital preservation strategies that could be considered. The first one is to maintain the original technical environment. The second one is to replace the original software with a backwards-compatible application. The third strategy is to emulate by creating a virtual version of a suitable environment for digital materials. And the fourth strategy is migration — to migrate the content into a new format that can be accessed by contemporary systems.
For digital files, where the informational content itself is the primary material to be preserved, migration is generally the simplest approach for long-term preservation. And many of us are already doing this, whether or not we realise it. You might open a file in Microsoft Word, and it asks you if you want to save it to the latest version.
Or you might take a raw Photoshop file and save it as a TIFF or a PDF so it can be shared with somebody who doesn't have a Photoshop license. So many of us are already engaging in digital migration practices one way or another. And that brings me to my first real-life lesson in digital archiving.
Lesson 1: Digital curation begins at content creation
So digital curation begins at content creation. So plan for the future of your digital files right from the start. The National Digital Heritage Archive was established nearly 15 years ago to preserve and provide ongoing access to Aotearoa's digital content or digital cultural heritage, I should say. Currently, it contains over 46 million files spanning across 250 different file formats and consisting of 575TB of data for both the National Library and Archives New Zealand.
But the truth is that the Turnbull Library and other libraries and archives like it will only ever hold just a tiny fraction of our digital cultural heritage. So I encourage everyone to become proactive in terms of thinking about your own digital files. Because, spoiler alert, the more digital stuff you have, the easier it is for the stuff that really matters to get buried or overlooked.
So take some time to think about the content that you're creating. Are you creating digital files that you'll want to last for generations? Or is your digital content mainly useful just for right now — what if that photograph of my morning tea scone was accidentally deleted?
Whereas the photos from my citizenship ceremony, for example, I would be very upset if those were accidentally deleted or if they weren't able to be shared in the future with people that I wanted to share them with.
So it's really all about prioritizing the digital files that are actually important to you to preserve and making sure that you're looking after those.
But let's get back to some of the ATL collections. So I've been lucky to work on a number of really fascinating music-related collections in my time at Turnbull Library, including processing the professional papers of the Topp Twins via their management company, Diva Productions.
You can see the photo of the Twins at the Library in the collection stores pointing to their boxes on the shelves. The Topp Twins material comprised 22 boxes and primarily print material such as the magazine cover on the slide. But there was also an entire box full of CDs, DVDs and other AV material. And that's really where the challenges were.
As Riccardo Ferrante writes, “A key challenge for the preservation of digital content is technical obsolescence.” Born digital objects require particular hardware and operating systems in order to be accessed and rendered with integrity.
As technology evolves, parts of these technical environments are replaced with newer, faster, and smaller components. The hardware and software necessary for a 10-year-old object to operate as designed is often no longer readily available, neither is the skill set required to maintain that environment. Innovation in the marketplace is, itself, a risk to the cultural heritage that it produces.
And on that note, give me a thumbs up in the chat if you have a CD or a DVD player at home. Anyone out there still have a CD or DVD player? You can use the little emojis to respond. No, maybe not.
I mean, I actually don't have a DVD or a CD player at home. I had a CD player in an old car but not anymore. And none of my laptops or anything have a DVD drive. And that's getting more and more common.
And so my next question for folks is who has CDs or DVDs at home without any way of playing them? I know I certainly do. There's a few DVDs that I can't quite bear to get rid of, even though at the moment, I have no way of watching them. But they hold sentimental value. And so I'm holding on to them and thinking, well, maybe I will get a DVD player at some point in time or maybe I will have to think about other ways of being able to access this content.
So when we look at how an optical disk, like a CD or a DVD, works, they're basically like a sandwich. There's a piece of foil between two protective layers of vinyl. And when the disk is mastered, the foil has tiny pits, which are read by a laser as ones, or as zeros if there is no pit. And that provides the bitstream that is processed into digital information.
So information is stored very densely on optical disks. And minor damage or deterioration can cause significant information loss. But there's just this very thin layer on top. And if we apply sticky labels or if we write on the disk with a Sharpie like this one has been, we're putting even more strain on that very thin layer of protection for the disk and the digital files on it.
And I know many of us have done exactly this, writing with a Sharpie on a disk when we've burned CDs in the past. So instead, we do advise labeling the disk on the clear plastic inner ring because there's no information stored on that bit of plastic. And so that just gives a little bit of extra protection for the digital files themselves.
Now most CDs use a thin piece of aluminum to store the data which is malleable and cheap, but it's also prone to oxidation and microscopic surface irregularities, often scratches, which can cause loss of data. Now if a CD or a DVD has been professionally mastered, like a commercial music recording or video, then it's going to be a little bit more robust or quite a bit more robust than the ones that we might burn at home in the early 2000s. But even those disks won't last forever.
So you can see in the slide an example of a poor-quality disk that turned up in our collections a few years back. And unfortunately, the data that had been on this disk was completely unreadable. So unlike back in 2008, we now know that optical disks of any variety, even the fancy gold ones that promise to live up to 100 years, are not suitable for long-term archival storage.
Technological obsolescence
The average lifespan of an optical disk is between 10 and 30 years. But even if that disk is stored in optimal conditions, it's still under threat by technological obsolescence. The disk may still be pristine. But if the machinery necessary to read the disk or the software to interpret it is no longer available, then that data will still be inaccessible. And that's something that we encounter quite a lot.
So media carriers like CDs and DVDs and even more recent types of carriers like USBs and hard drives are not a permanent storage solution. And this is vitally important for those of us working with digital files and creating digital files. If a digital file exists only in one location, especially if that location is an obsolete media carrier, then it is incredibly vulnerable.
And so backups and redundancies are vital for long-term preservation of digital files. So that way, if one format fails, then another might survive. And some people even advocate buying different brands of hard drives and saving files on each one so that way they are both less likely to fail at the same time.
So in the early part of my digital archivist career, I was transferring digital files from laptops and computers onto CDs and DVDs for safekeeping. But now my work entails pretty much the exact opposite — identifying disks and portable drives in the collections and copying the files from them to ensure that they will be accessible into the future. However, migration isn't a once-and-done thing.
The digital files will need to be migrated from their original disks onto a more stable media carrier, likely a secure server or a portable hard drive. And then, they'll need to be migrated from that drive or server in the future as well to remain accessible. And this goes on and on, particularly as technology and media and digital infrastructure changes over time. So with digital curation, there's not really a finish line.
And one of the big challenges of digital archiving is that we need to be ready for the digital content of today as well as the technology of the past, like this Macintosh Powerbook 170 from 1991 that came across my desk earlier this year with a collection. So now seems like a good time to introduce our second real life lesson from digital archiving.
Lesson 2: Set aside some time for your digital files
So lesson two — set aside some time for your digital files. Experience has taught me that digital collections can easily fall to the bottom of people's priorities. And because digital files aren't taking up space on a desk like papers and boxes might be, digital collections can easily suffer from 'out of sight, out of mind'.
And I have to admit, when I come home from work, the last thing I want to do is think about my digital files after having spent the entire day thinking about digital collections. But it is important to set aside time and do this. That way, the significant materials that you create and want to be preserved will still be accessible into the future.
And so we recommend four key steps to better manage digital files. The first is to identify. The second is to select and prioritize. The third step is to then organize the materials that you have. And the fourth is to save or back them up, and make sure that they are stored safely.
So regardless of format, whether you're dealing with digital photographs or documents or recordings, these same steps will apply. And these four steps can actually be applied to any physical materials that you have as well, such as papers, or photograph albums, or things of that sort, too. And so the first step really is to figure out what exactly you have and to gather up the information about it. And this is the first thing that we do when a digital collection arrives at the library as well.
So getting back to some of our library collections — John Cousins is a Christchurch-based composer and sonic artist who created multi-media works incorporating sound, image, video, and other formats using a bespoke multi-channel studio that he referred to as the Acousmonium. And in 2017, Cousins donated his archives to the Library. Now, unlike the Topp Twins collection, which was mostly analog with a little bit of digital splashed in there, this collection was the exact opposite. It was mostly digital files with a few analog reel tapes.
So, after lengthy discussions with our music curator, Michael Brown, and the Library's AV conservators about exactly what types of materials and digital files would be in-scope for the Turnbull Library's collections, the digital files came to the library on a 3-terabyte portable hard drive. And so the very first thing that we did was a virus scan of the drive to make sure that we weren't going to inadvertently be bringing any unwanted nasties to the Library's network and computing systems.
And as part of this initial check, I discovered that the hard drive was Mac-formatted. And thus, it wouldn't actually work properly with the library's default PCs that we use for processing digital collections. But luckily, we have a MacBook Pro available for just these occasions.
When I used the MacBook Pro to read the drive, we discovered that the hard drive contained over 18,000 digital files that Cousins had arranged into 15 categories or series such as master audio files, mixed media, photographic montages, source audio, and transformations, as well as alternate versions of his musical works. And each of these series of folders had a set of nested folders inside of them, which was no problem on a Mac. With a Mac, you can have as many folders inside a folder for as many layers as you want.
However, this caused issues on a Windows operating system where there's actually a set limit on the length of file names and file paths. So when we tried to copy the files from the original Mac hard drive, we had to copy them first to a hard drive that could be read by both Mac and PC and then, from that drive, to a Windows operating system. And then, we were getting some error messages because the file names and the file paths were too long for Windows to recognise.
So this collection posed a lot of technical challenges because of that, being created on a Mac and then needing to go into a Windows environment, which the National Digital Heritage Archive is based on. But we had lots of tools up our sleeve, and it took lots of work. So we used a tool called rsync to copy the files.
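As an illustration of the kind of check that can catch this problem before copying, here is a minimal Python sketch. It is not the Library's actual tooling. It assumes the classic Windows `MAX_PATH` limit of 260 characters, and the function name `too_long_for_windows` and the destination folder `D:\ingest` are hypothetical.

```python
from pathlib import PureWindowsPath

# Classic Windows MAX_PATH limit. It counts the drive letter, separators
# and a terminating null character, so 259 characters are usable.
MAX_PATH = 260


def too_long_for_windows(relative_paths, destination_root, limit=MAX_PATH):
    """Return the relative paths whose full destination path would exceed the limit.

    relative_paths use "/" separators, as they would on a Mac-formatted drive.
    destination_root is a hypothetical Windows folder the files will be copied to.
    """
    flagged = []
    for rel in relative_paths:
        full = PureWindowsPath(destination_root, *rel.split("/"))
        if len(str(full)) > limit - 1:
            flagged.append(rel)
    return flagged


if __name__ == "__main__":
    deep = "/".join(["nested folder"] * 25) + "/master.wav"
    print(too_long_for_windows(["audio/master.wav", deep], r"D:\ingest"))
```

Running a check like this against a file list first makes it possible to decide what to do with over-long paths before the copy fails partway through.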
And this tool also generated a checksum, which is basically a unique digital fingerprint for each of the files, and created a file list of all 18,000 files that were on that hard drive. So then we were able to analyze that master list of files and get a bit more information about which ones had the file paths that were too long, which files were actually system files that we didn't need to retain, and which files were what I like to call the good stuff that we definitely did want to retain as part of the collection.
And the other thing that that file list and that list of checksums did was give us a baseline. The checksums are basically that unique digital fingerprint for each file, and they give us a baseline for each digital file as it's received by the Library. So that way, we know if anything has changed in that file over time.
If in 10 years, we generate checksums on those files again, they should match the original checksums from when the files were received. And if for some reason, they don't match, then we know that something has gone wrong with this file — that either it's become corrupted, or it's been accidentally changed, or something's not matching up. So we use those checksums to just make sure that the integrity of the files is consistent over time.
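The baseline-and-recheck idea described above can be sketched in a few lines of Python. This is an illustrative sketch only. The talk does not say which checksum algorithm was used, so SHA-256 is an assumption here, and the function names `sha256_of`, `build_manifest` and `changed_files` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute a SHA-256 checksum, reading in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(baseline, current):
    """List baseline paths that are now missing or whose checksum no longer matches."""
    return sorted(
        path for path, checksum in baseline.items()
        if current.get(path) != checksum
    )
```

In use, `build_manifest` would be run once when the files are received and its result stored. Running it again years later and passing both results to `changed_files` gives the list of files that need investigating. An empty list means every file still matches its original fingerprint.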
So as I said, we had 18,000 digital files on this 3-terabyte hard drive. And then, we were able to do our analysis. Did we need to preserve all of these files? Were there duplicates? Were there files that were out of scope for the collection?
And when we did this analysis, we determined that out of those 18,000 files, there were actually only about 8,137 files that were really in scope for the collection. There were a number of duplicates and a lot of system files and things that just didn't actually need to be retained by the Library.
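A first pass at this kind of analysis can be automated from a list of paths and checksums. The Python sketch below is a hypothetical illustration, not the Library's process. The set of system file names and the rule of keeping the first copy in sorted order are both assumptions, and in practice a person would still review the result.

```python
from pathlib import PurePosixPath

# Assumed examples of operating-system housekeeping files, not an exhaustive list.
SYSTEM_FILE_NAMES = {".DS_Store", "Thumbs.db", "desktop.ini"}


def is_system_file(path):
    """Treat known housekeeping names and macOS '._' sidecar files as system files."""
    name = PurePosixPath(path).name
    return name in SYSTEM_FILE_NAMES or name.startswith("._")


def triage(manifest):
    """Split a {path: checksum} manifest into (keep, duplicates, system) lists.

    Files with a checksum already seen are listed as duplicates. The first copy
    in sorted path order is kept, which is an arbitrary choice for this sketch.
    """
    seen = set()
    keep, duplicates, system = [], [], []
    for path in sorted(manifest):
        checksum = manifest[path]
        if is_system_file(path):
            system.append(path)
        elif checksum in seen:
            duplicates.append(path)
        else:
            seen.add(checksum)
            keep.append(path)
    return keep, duplicates, system
```

A triage like this only narrows the field. Deciding which copy of a duplicate to retain, and which of the remaining files are in scope, is still a curatorial judgement.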
However, we kept the hard drive itself as a collection item exactly as it was received. Those 8,000 files that were in scope for the collection were transferred and copied off of the hard drive and ingested into the National Digital Heritage Archive. So that way, they could be accessible to researchers through the library's online catalog and website.
So researchers can actually access those digital files from the website instead of actually having to use this hard drive. This hard drive is in the collections as an item. It stays in the nice, climate-controlled storage for perpetuity. But those digital files are available via the catalog and website.
Now, the collection includes a rather extraordinary sound library of what some might consider random audio and sounds that Cousins had recorded in order to incorporate into his works. And so this included all sorts of household noises. There were noises of a vacuum cleaner running. There were just all of these day-to-day sounds that you don't really think about, but which actually provide this really wonderful snapshot in time in terms of the things that people were using.
There was audio of birdsong. There was audio of waves. There was audio of all sorts of things you can imagine, including this example here, which is recordings of a bottle of fizzy drink being opened. And if there's any fans of ASMR watching today, you are in for a treat.
[BOTTLE OPENING]
[CAP FALLING ON TABLE TOP]
So that's just a tiny little recording. But that was used in a couple of his works. And so, it's just part of this wonderful compilation of these little audio files that have been generated to be used in other projects. And the vast majority of digital files in this collection are openly available online under a Creative Commons license and downloadable so they are available for other artists to use as well.
Now believe it or not, but a collection of 8,000 digital files is actually relatively small. It's not unusual for digital collections to contain hundreds of thousands of files. So we have tools and scripts that we use to automate as much of the analysis and processing as we can. But there's still a lot that comes down to a human doing the work and doing the analysis of the materials, and determining what is in scope for the collection.
[BOTTLE OPENING]
Oops. There we go. So the other thing to consider is the absolute mind-bending scale of digital content that is being created. Digital content is being created by us, around us, about us. And sometimes digital preservation can feel a bit overwhelming or even impossible.
I mentioned earlier that the library holds about 600TB of data in its digital collections, including both born digital and digitised material. And when I say digitised, I mean that it was originally in an analogue format, and then a digital version was created from it. So we have 600TB of digital data.
But in 2022, apparently, 95 zettabytes of data were created globally. And in case you wondered, one zettabyte is a billion terabytes. So the scale of data creation is absolutely massive. And this is really why it's important for us to think about our digital footprints, what digital material we're creating, and what really does need to be preserved and saved for the future, and what actually, maybe, we don't need after all.
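To put those two figures side by side, here is a short worked calculation in Python. It uses only the numbers quoted in the talk (about 600TB held, 95 zettabytes created in 2022) and assumes decimal units, where one zettabyte is a billion terabytes. The variable names are illustrative.

```python
# Decimal (SI) units: 1 ZB = 10**21 bytes and 1 TB = 10**12 bytes.
TB_PER_ZB = 10**9

library_tb = 600        # approximate holdings quoted in the talk
global_2022_zb = 95     # global data creation figure quoted in the talk

global_2022_tb = global_2022_zb * TB_PER_ZB
share = library_tb / global_2022_tb

print(f"Global data created in 2022: {global_2022_tb:,} TB")
print(f"Library holdings as a share of that: {share:.1e}")
```

On these figures, the Library's entire digital holdings come to roughly six billionths of the data created worldwide in that single year.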
Lesson 3: You can't preserve it all
So that brings us to real-life lesson number three. And that is that you can't preserve it all. We, very literally, cannot come anywhere close to preserving the volume of digital data that is created nor should we even try. But again, if you take one thing out of my talk today, I hope that you feel encouraged to be proactive about understanding and managing your own digital files if you're not already.
Now, remember when I said that digital files have a physicality about them. The sad truth is that digital files can and do degrade. So in library and museum land, a significant amount of our work is to mitigate what we call the agents of deterioration for physical materials.
And this is basically the reason why we store manuscript collections in acid-free folders inside of acid-free boxes inside the secure climate-controlled collection stores inside the National Library building which looks like a bit of a concrete fort. And this is to make sure that all of our collections are as safe and secure as they can possibly be and protected from the outside elements.
Now, born-digital collections face the same challenges as physical but with a few additions that are unique to born-digital collections because of the technical dependencies of files and power requirements. So if the internet loses connectivity or there's an electrical outage, you can still access your paper files.
That's the very reason I've printed out this script, just in case something went wrong with the technology today. But the same can't be said for digital. So issues in red are the types of things that we try to mitigate with our digital preservation activities. And you'll see that I've bolded custodial neglect because that really is, honestly, one of the biggest dangers to digital files. Again, if our digital files are out of sight, out of mind, and left on their own, they may not be accessible in the future.
So a good first step is to really inventory and understand what kinds of digital files you actually have. And if those files are important to you, to make sure that there's always a backup.
We also recommend looking at the types of files and file formats that you create and identifying if any special software is required to open them. And again, like the John Cousins collection, this is incredibly important for those of us who are Mac users since Apple files don't always open properly on Windows machines.
But generally, the more common a file format is, such as a Microsoft Word document (DOC or DOCX) or a JPEG image file, those ones that are really common that we encounter all the time, the more likely it is that there will continue to be tools available to access those files in the future. If Microsoft Word, for whatever reason, ceased to exist tomorrow, there would be a global outcry because there are so many users of that software.
And because there are currently so many users of that software, there are also other types of software that already exist that can read those files because it is so ubiquitous. So we can be reasonably confident that a standard file format like Word will continue to be accessible into the near future. But does ClarisWorks ring a bell for anyone?
So we regularly get older files being transferred to us, which were created by obsolete software packages like ClarisWorks, which was an old Mac word processing software, or things like WordStar or WordPerfect, or even more recent file types like Apple Pages or Keynote presentations that won't read properly on a Windows system. So you get files like the one on the right that won't open or won't render properly. And so the information in them can't be accessed.
So if a file is particularly significant to you, but in a less common file format, it may be valuable to create a backup access copy in a more standard file format. And again, those standard file formats will have a critical mass of people using them and working to ensure that they continue to be accessible over time, whereas a really niche or proprietary format is just at a much greater risk of obsolescence.
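One simple way to start looking at the formats in a set of files is to count them by file extension and flag anything outside a list of widely used formats. The Python sketch below is illustrative only. The contents of `COMMON_EXTENSIONS` are an assumption made for this example rather than an official list, and an extension is only a hint about a file's real format.

```python
from collections import Counter
from pathlib import PurePosixPath

# Assumed set of widely supported formats, chosen for this example only.
COMMON_EXTENSIONS = {
    ".doc", ".docx", ".pdf", ".txt",
    ".jpg", ".jpeg", ".tif", ".tiff", ".wav",
}


def format_inventory(paths):
    """Count files by lower-cased extension; files without one are grouped as '(none)'."""
    return Counter(PurePosixPath(p).suffix.lower() or "(none)" for p in paths)


def less_common_formats(inventory, common=COMMON_EXTENSIONS):
    """Return the extensions in the inventory that are not in the common set."""
    return sorted(ext for ext in inventory if ext not in common)


if __name__ == "__main__":
    files = ["letters/draft.DOC", "old/report.cwk", "slides/talk.key", "photos/scone.JPG"]
    inventory = format_inventory(files)
    print(inventory)
    print(less_common_formats(inventory))
```

Anything that ends up on the flagged list is a candidate for a backup access copy in a more standard format, as described above.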
So in my work as a digital archivist, I've encountered a lot of different digital file formats and types of media carriers. But there's still always some surprises. In 2013, the Turnbull Library accessioned a collection of papers and digital files by musician Ian Morris, mainly relating to his work as a record producer and sound engineer. He performed as part of Th' Dudes and is known for his work to create the iconic snare drum sound in Hello Sailor's "Gutter Black" or as I came to know it, the theme song for "Outrageous Fortune".
Now the Ian Morris collection contained 91 CDs, 86 open reel tapes, 64 digital audio cassette tapes, and 6 audio cassettes, all of which hold their own challenges in terms of the fragility of media carriers and obsolescence. But it also contains something I hadn't come across before. And that was a Jaz disk.
So Jaz disks were a short-lived cousin of Zip disks which were popular in the early 2000s. And we had several Zip drives at the Library, but we didn't have any Jaz drives. And for a long time, we had no luck sourcing one. So we were completely unable to read this disk and figure out what was on it and if it was in scope for the collections.
This past spring, our music curator decided to try again to put out the call for a Jaz drive. And we found one on eBay in Australia, but they wouldn't ship overseas. But then, like a beacon in the night, a Jaz drive came onto the market. It was brand new, still in the original packaging, and had never actually been opened before.
It had been listed online by a seller in the US. And he was willing to ship to New Zealand, so that was amazing. I think we actually referred to it as a digital archivist miracle. But now, we had to actually get this drive to work.
The Jaz drive didn't come with a power cable. So initially, we tried a power cable from another device. And that didn't work at all. That resulted in the power pack and the cable heating up very quickly. So we stopped using that immediately and unplugged everything.
After a chat with another member of staff who just happened to be an old tech aficionado, we learned that for those types of power cables, there are two different ways that the 5-volt and ground connections can be wired. The cable that we had tried first was wired the wrong way. But luckily, we had another cable and power pack from an old Iomega hard drive that had the correct wiring. And so then, the Jaz drive turned on.
We inserted the Jaz tools disk to check that the drive was working. And it made all of the appropriate whirring noises so we were in business. However, we still had to get the Jaz drive connected to a computer. And not just any computer, but a Mac computer. And again, cabling was the key difficulty.
We needed to find a SCSI (pronounced "scuzzy") cable that could connect the Jaz drive to an old G3 Mac tower which still ran the original operating system of Mac OS 8.6. None of this is a particularly easy task, given the speed at which materials go obsolete and hardware is replaced by newer machinery.
However, again, our old tech aficionado had a G3 Mac tower which was, again, a small miracle. And we were able to create a disk image of the Ian Morris Jaz disk on the G3. And then we mounted the disk image in the correct environment and transferred the disk image to a USB. And from there, we used our Forensic Recovery of Evidence Device, the really big computer on the left, with our team leader, Anna, and FTK Imager to extract the files from the disk image.
So the secrets of the Jaz drive were finally revealed. And it held a Pro Tools session for importing audio from digital audio tape. And the big reveal was that we had sound recordings of the poet, Sam Hunt, recording or reciting various psalms from the Bible. Now, the digital files themselves were SD2 files, a type of proprietary digital audio workstation file, which we had previously encountered in the John Cousins collection.
As the SD2 files were well obsolete, the library used an old MacBook Pro 7.1 from 2010 to convert them to WAV files or waveform audio files using iTunes software where they could then be preserved more easily in the National Digital Heritage Archive. So we had finally extracted the files. The curator confirmed that they were of significance to the collection. And we were able to retain migrated versions to be accessible into the future.
It was a great outcome. And I always enjoy being able to put on my detective hat and try to get to the bottom of these digital mysteries. But it's not a quick or easy process. And it was very much dependent upon working together as a team to basically do a bit of experimentation and just keep trying and keep sourcing the types of hardware and software that we needed to make it work.
Lesson 4: Older digital is fragile, but so is new
And this brings us to real-life lesson number four — older digital is fragile, but so is new. So this brings us to my final collection case study. Luke Rowell, better known as Disasteradio or Eyeliner, is one of New Zealand's foremost computer musicians.
Over the last 20 years, Luke has performed hundreds of gigs around the world and released over 15 albums of popular electronic music. Among Luke's best-known tracks are the synth-pop hit, "Gravy Rainbow", and his Eyeliner albums are considered among the essential works of the vaporwave movement.
As a computer musician, Rowell's compositions were created using Digital Audio Workstations, or DAWs. GarageBand, which is bundled with Apple devices, is a DAW that people may be familiar with. Now these digital audio workstations allow for techniques that were previously only available in recording studios to be accessible to these software users, including multi-track recording, tape editing, sound mixing, and signal processing. And they also offer entirely new tools relating to digital sampling, sequencing, and sound synthesis.
So the files that are created by these DAWs or digital audio workstations are essentially the digital equivalent of what music archives have been previously collecting in analogue form. And if we don't preserve this material, we inadvertently create a void around how born-digital music — and that's most music today — is now being created. However, there wasn't a lot of research available in the digital preservation community about how to actually preserve these files long-term. So in 2020, the library set up a pilot project working directly with Luke to try to archive his materials.
The two albums chosen for the pilot were the 2007 Disasteradio album, Visions, which was created on Jeskola Buzz software, and the 2015 Eyeliner album, Buy Now, which used Nuendo software. Our first discovery, as it turned out, was that the Visions Buzz projects could no longer be opened properly, a mere 13 years after they were created. So, sadly, this was a bit of a digital preservation fail, but it also proved the value and necessity of embarking on this work.
So in place of Visions, we swapped in another Nuendo album, which was 2010's Charisma. The Library worked directly with Luke to create migrated versions of the files in more stable file formats, such as those WAV or waveform audio files, which have a much greater chance of remaining accessible into the future.
Now as a form of migration for digital preservation, this was a very new approach for the Library. Normally, digital preservation staff would manage the migration process following the receipt of the digital files, creating access copies in more stable formats. But the Disasteradio project deviated from this, with migration being undertaken prior to transfer and by the donor himself, as the person best placed to understand his setup and to accomplish the task efficiently. And I'm going to let Luke speak about his creation.
Recording of Luke Rowell: Hey, it's me. And now we're out in the country in the sunny Canterbury plains—
Valerie Love: Sorry.
Recording of Luke Rowell: —Aotearoa, New Zealand with another special announcement.
Last year, with National Libraries New Zealand, we released an archive project for Eyeliner's Buy Now. Now we're back at it again for Disasteradio's 2010 album, Charisma. Produced with the National Libraries team and music curator, Michael Brown, we have for you stems with effects, Horses, MIDI, Hot Compost, screencasts featuring the production techniques and ideas behind all the songs, Cool Dogs, including Ralph, stems without effects, herbs, including coriander or cilantro, spreadsheets and screenshots detailing all the recording sessions.
Once again, released under Creative Commons license, attribution, noncommercial, share alike. Yep, this includes everybody's favorite, "Gravy Rainbow." Enjoy Disasteradio's Charisma with National Libraries New Zealand as part of the Luke Rowell music collection, and find all the links below. Cheers.
Valerie Love: So, in essence, Luke became his own digital archivist, working remotely on his files but with lots of conversation with the Library over Zoom and support from the curator and digital archivists at the Library.
Lesson 5: Mahi tahi — working together
Which brings us to our final real-life lesson from digital archiving — and that is mahi tahi — working together. There's a lot that we can do as individuals, but for the big challenges, we need to work together.
I'm thinking back to that slide about the amount of digital content out there. We can't rely on institutions like the Library being able to archive it all. So we all need to play a role in looking after the significant digital files that we create. And it all comes back to those four steps for managing your digital files from the start: identifying what's important; selecting the materials that you want to preserve over time; organising the files so they're findable, make sense, and have information about them; and then saving the files and backing them up into another location.
Now the Library has a lot of advice on the website about how you can care for your own digital files and manage them over time. But at its core, it really boils down to identifying and prioritizing what matters, backing up and migrating the files that you really care about. And again, making sure that there's good information about them. So that way other people can understand those files and the significance of them as well.
So in the digital world, we're always playing catch up. Technology changes at a dizzying pace. And we can't actually plan for what technology and best practice will look like 50 years from now, maybe not even 15 years from now. And with environmental changes and impacts of climate change becoming more and more severe, we do need to rethink what's currently considered best practice and really work towards a more sustainable future.
Remember that your digital files do have a carbon footprint. So less really is more. It really is OK to hit that delete button on the scone photos and other digital files that you don't need, and spend your time preserving the files that you do.
So if I could offer one last piece of advice, it would be to keep taking those small steps forward, one day at a time. By understanding the digital files that you have, prioritising what's important to preserve, and making sure those files are backed up and saved, we can keep the files that we create today accessible tomorrow, and next month, and next year, and the year after that. And if we keep doing that, then eventually we get to that unknowable future. So ngā mihi nui, thank you all very much.
Pātai | Questions
Joan McCracken: Ngā mihi, Val, for some reason, my camera does not want to come on. However, we do have — oh, hooray, I think — we do have some questions. And we do have a little time for more. So if anyone else would like to add a question to Q&A, please do so. But we've got two really interesting questions to start off, one, from Fiona Brooker.
Is there a central database of historic equipment?
Listening to the need for historic equipment, is there a central database of who has what equipment (cables, et cetera) so that multiple repositories can share equipment in New Zealand and potentially further afield?
Valerie Love: Yeah, that's a really great question. I don't know of any sort of official listing. But I do know that we definitely talk to each other at other institutions. I have lots of back-and-forth conversations with people at Archives New Zealand, and Te Papa and other places about what tools and resources they might need for incoming collections. And we do try to work together as much as possible.
So one of the things that I'm actually really excited about in the next few years is there's the new Archives New Zealand building that's being built next door and is going to have a shared bridge between the National Library and Archives. And there's actually going to be a digital preservation studio space in that new building, which is going to have equipment and resources that we all already have.
But they're going to be in one single location. So that way, we can have a dedicated space instead of just at our desks and working a bit more individually. So that's a really great step forward. And yeah, it would be great to have a bit more of a formal registry of what materials there are and what types of things we still might need to source.
How would you suggest smaller institutions without government funding approach the ongoing storage of their digital collections?
Joan McCracken: Thank you. Kia ora, Valerie. From Sussie Best. Great kōrero. How would you suggest smaller institutions without government funding approach the ongoing storage of their digital collections? We were advised to get a NAS and cloud storage, but these are prohibitively expensive options.
Valerie Love: Yeah, that's a great question. Digital preservation is one of those things that can basically be as big as you want it to be. So I mean, obviously, at the National Library, we've got like a pretty robust setup. But digital preservation can be really small scale as well.
It can be backing things up onto portable hard drives, making sure that you check them on a regular basis, and having a plan in place to migrate them to different drives every 5 or 10 years, something like that. It can also be running scripts to generate checksums on the digital files so that, again, you can tell if they've accidentally changed over time.
So there's like the real big things that you can do. But then there's also little steps that organisations that don't have as much resources and funding can still do to protect their digital files. And there is lots of information out there and people are very much welcome to get in touch too if they have specific questions that we might be able to provide some advice for.
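The checksum idea mentioned in this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a script the Library is known to use: SHA-256 is an assumed choice of hash, and the function names (sha256_of, build_manifest, compare_manifests) are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file under root (by relative path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(old: dict, new: dict) -> dict:
    """Report files that changed, went missing, or appeared since the old manifest."""
    return {
        "changed": sorted(k for k in old if k in new and old[k] != new[k]),
        "missing": sorted(k for k in old if k not in new),
        "added": sorted(k for k in new if k not in old),
    }
```

Running build_manifest on a folder once, saving the result, and comparing it with a fresh manifest at each regular check would flag any file whose contents have changed, gone missing, or been added since the last check.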
Joan McCracken: Thank you, Val. I have put in the chat some of the links that you mentioned, especially to the caring for collections. So that might also be helpful to people. And we have another question here about social media platforms.
Archiving social media platforms
Looking into social media platforms, where most are generated in a public forum without hardware backups, do you anticipate these being adopted as archival records?
Valerie Love: Yeah. That's a really, really interesting question. Lots of people are using social media as a default archive. There's lots of Facebook groups where people are uploading photographs to document neighbourhoods at different times. So social media is definitely a way in which communities are documenting themselves and creating these digital archives. In some ways, that's a fantastic thing.
In other ways, that's a little bit scary. Because social media platforms are ultimately a commercial enterprise. And if they stop making money, then they can easily just cease to exist and delete all of those files and delete all of that information. So there's lots of social media platforms that no longer exist where people put a whole bunch of really great content and really great stuff that does have historical value but which we no longer have access to because those platforms have gone under, like Bebo, like Old Friends — although, the Library does actually have an archive copy of Old Friends, which is great.
But again, there are so many different platforms out there. So if you are using social media as a digital archive, make sure that you export or download a copy of your social media. So that way, you can still have a record of what was uploaded to those spaces. Some social media platforms are better than others about letting you do that.
For example, Facebook groups can't be downloaded because they're created by so many different contributors that you can't actually export that content out. So once it's there, it's sort of there. And we just sort of have to hope that it will continue to be there. But there's other platforms where it is a bit easier to export your data.
How future-proof are sites like Dropbox for storing files?
Joan McCracken: Thank you, Val. I have just posted into the chat your blog from a few years ago about the Library's project to do preservation of Facebook and other social media. So let's go to another question. Hi, Val. How future-proof are sites like Dropbox for storing files?
Valerie Love: Oh, that's a great question. Again, it sort of comes down to that whole, it's a commercial entity. And if they stop making money, they will stop providing a service. We've seen also a lot of platforms like things like Flickr and even some of the Google platforms where you started out with a whole bunch of storage or unlimited storage and then they realised oh, actually, that's not financially viable.
So we're going to give you a certain amount of time to cut down your free storage to five gigabytes or something like that instead of the 15 that you may have previously had. So it is something to consider that even those platforms — they can change their policies at any time. And so you do have to make sure that you're ready to respond to that and have a copy of your stuff elsewhere.
But generally, things like Dropbox are a pretty good place to make sure that you've got that cloud copy of the files that you care about instead of just being on a laptop or something like that. So it's a really good thing to have. If you have your digital files on a laptop, for example, or a computer, you maybe have them on a portable drive. And then you maybe have a copy of them in the cloud if it's that really significant stuff that you want to make sure you continue to have a copy of.
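The "copy on a portable drive" part of this advice could look something like the following Python sketch. It is a hedged illustration only: the function names (file_digest, back_up) are hypothetical, SHA-256 is an assumed way of confirming that each copy matches its original, and the destination is assumed to sit outside the source folder.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destination: Path) -> list:
    """Copy every file under source into destination.

    Returns the relative paths of any copies that do not match
    their originals (an empty list means every copy verified).
    """
    mismatched = []
    for original in sorted(source.rglob("*")):
        if not original.is_file():
            continue
        relative = original.relative_to(source)
        copy = destination / relative
        copy.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(original, copy)  # copy2 also keeps file timestamps
        if file_digest(original) != file_digest(copy):
            mismatched.append(relative.as_posix())
    return mismatched
```

Calling back_up with a folder on the laptop as the source and a folder on the portable drive as the destination would give a second copy, and the returned list shows whether anything needs to be copied again.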
Thoughts on the cloud
Joan McCracken: Oh, well, that leads very nicely into the next question, Val. Our archive collection is tied to a territorial authority who is storing info in the cloud. Your thoughts on the cloud?
Valerie Love: Oh, my thoughts on the cloud. My thoughts on the cloud are complicated. At the moment, most of the commercial cloud storage is not going to be Aotearoa-based. Things like Amazon Web Services, those cloud storage providers will be either US, Australia, or other places.
So there are private clouds in Aotearoa New Zealand. And so if making sure that the data you're creating is being hosted locally is important, which oftentimes it very much is, that is definitely something that exists, just not with those massive providers. Although my understanding is that they are building, or planning to build, new data centers in Southland and other places. So there will be more local cloud storage available in the future. I don't have the details about exactly when that will happen.
Cloud storage generally is tricky. Again, there is that environmental cost to cloud storage and those servers. There's been lots of research about the impacts of those servers running 24/7, the amount of fuel that they use, the amount of noise that they create, the impact on local communities. It's a real tricky one.
At the same time, data is absolutely necessary. Data storage is necessary. It's not anything that I have a good answer for, unfortunately.
Can computer manufacturers such as Apple assist in preservation efforts?
Joan McCracken: Well, thank you for that, Val. We've only got a couple of minutes left, but we've got two questions that are really touching on the commercial aspects of preservation. And one is from our Facebook livestream. Can computer manufacturers such as Apple assist in preservation efforts? And the other one is about the monopoly that is the Wayback Machine. Don't know if you want to comment on either or both of those.
Valerie Love: Yes. I would love to see tech companies becoming more proactive about digital preservation and about ensuring that software is more backwards compatible, that it is easier to migrate things. I mean, there's definitely been work in this space. But it is something that I do think the tech companies need to become more involved in. Some of them are sort of, shall I say, moving in the wrong direction.
Joan McCracken: — And the very last question.
Valerie Love: Yes? Sorry.
What about the Sound Archive?
Joan McCracken: Sorry. No, no. My — I had little cut out there. I just — we did have one other question, but I'm not quite sure — Oh, I'll give it to you and see how you get on with it. What about the Sound Archives, is the question.
Valerie Love: So this is sort of — I guess, this sort of goes back to all of us, like working together and needing to make sure that there's different places that are preserving things. And I guess that sort of ties into the Internet Archive question as well.
Sound archives, I think, are incredibly important. It's really important to have these resources available in order to really understand pieces of our culture and pieces of contemporary life. And the same thing with web archives. And it is a bit tricky.
Sorry, I'm jumping back and forth between questions. But it is always a bit worrying if there's like one place that is the source of information and it does need to be more distributed to make sure that if something goes wrong with that one Archive that there's still other places where that information isn't just totally lost. I'm not sure I answered that at all but —
Closing comments
Joan McCracken: Oh, well thank you very much. I think you did. We have come to the end of our time for today. I will just finish off. But there's some lovely comments in chat. So make sure you have those as well. And thank you to all of you who've contributed to the conversation through the chat. And my apologies for the difficulties early in the presentation.
Thank you too, to my colleagues who have supported today's presentation. And thank you to everyone who's joined us. If you'd like to hear about future events being held at the Library on site or online and you're not already on our What's On mailing list, please do sign up. You can subscribe on the Events page on the National Library website. We've added the address. Actually, we haven't. We should have added the address in the chat. I'll try and do that now.
Remember, you can save the chat and the links we've added by clicking on the ellipsis by the chat button. We look forward to the next time you can join us. Next month, our talk will be about children's literature and children's books. So I hope you can come to that one. Ka kite ano. And we'll finish with a whakatauaki.
Mā te kimi ka kite.
Mā te kite ka mōhio.
Mā te mōhio ka mārama.
Nau mai, haere mai. Thank you, Val.
Valerie Love: Kia ora. Thanks, everyone, for coming along today.
Any errors with the transcript, let us know and we will fix them. Email us at digital-services@dia.govt.nz
Preserving born-digital collections with a music month theme
In honour of New Zealand Music month, Valerie will give an overview of some of the contemporary born-digital music collections held at Turnbull Library, and challenges of preserving digital media. The talk will also offer tips and advice for preserving your own digital files.
About the speaker
Valerie Love is Kaipupuri Pūranga Matihiko Matua Senior Digital Archivist at the Alexander Turnbull Library.
Connecting to collections talks
Want to know more about the collections and services of the Alexander Turnbull Library and National Library of New Zealand? Keen to learn how you can connect to the collections and use them in your research or publication? Then these talks are for you. Connecting to Collections talks are held on the 3rd Tuesday of each month (February to November).