Caring for taonga — Digital collections | Āta tiakina ngā kohinga matihiko
Digital material depends on technology and is inherently fragile. If you want to access your digital files into the future, you will need to think now about digital preservation.
Digital preservation is the active, ongoing management of digital files to ensure your digital content continues to be accessible. This may include copying files from their original locations to a dedicated digital storage area, ensuring that you have backup copies, and checking that your digital files can still be opened.
Digital file basics
A digital file is stored information that is accessible to a computer program. Computer operating systems and application software interpret the data as text characters, image pixels, or audio samples.
A file format is a standard arrangement of data within a digital file. A format specifies how bits are structured and encoded (written) to represent information (sound, text, images). Examples of file formats include TIFF, JPEG, WAV, MP3 and DOCX.
To read a digital file you need hardware (the computer device) and software (the set of instructions run on the device).
This dependence on technology means digital content is continually at risk. Rapid technological changes mean digital files created just a few years ago may no longer be accessible as new technology appears, and older hardware and software become obsolete and disappear.
Born-digital material is anything that is created digitally, without an analogue original. Examples include word processing documents, digital photographs, websites, social media materials, digital audio and video files.
Digitised files began as physical objects and were converted to a digital format. Examples include PDF scans of original meeting minutes, and scans of original photographs.
The long-term care of born-digital and digitised files is the same; however, it is useful to know which files have been digitised and whether each is a faithful version of the original.
If you aren't sure of some of the terms we use, have a look at the glossaries in the resources section.
Digital preservation principles, policies and priorities
Digital preservation is a combination of principles, policies and actions to ensure that content is accessible from one day to the next, and that you can provide future access.
Preservation principles outline the preservation values of your organisation; policies are the set of guidelines that provide the mandate for you to carry out digital preservation work. Both will inform your preservation priorities and actions.
These are interrelated: some actions will be needed to work out what your priorities should be, and those actions will also help inform and develop your policies.
The key principles for working with digital collections are:
- Do no harm to the physical media or digital content
- Regularly and systematically backup your files
- Regularly check your files
- Be consistent and document your process
Keeping your digital collections preserved and accessible requires adequate dedicated, secure storage, and the financial resources to meet the ongoing costs of maintaining your preservation storage.
The Sustainable Heritage Network has resources online that can help you determine how much storage you might need.
To understand what your storage needs are, consider:
- Who will have access to your digital storage? (the fewer people the better)
- Where will you keep the primary files, and where will you keep any working files or access copies? (see the 3-2-1 rule)
- What is your backup policy and frequency of backups?
- How often will you check that you can still open and read your files, that the files are stable, and that any necessary software is available?
The Sustainable Heritage Network has a range of useful resources, including creating policies for digital preservation.
Know and understand where your data is stored
Data sovereignty generally refers to understanding that data is subject to the laws of the country in which it is stored. Māori data sovereignty recognises that Māori data should be subject to Māori governance, and ensures that data for and about Māori can be safeguarded and protected.
Many cloud storage providers host their data on servers overseas, so if the location of your data is important to you, make sure you are aware of where your data will be held, and what protections are in place.
Digital file inventory
To ensure the preservation of your digital files you need to know what you have. Knowing what you have helps you make informed decisions and determine your priorities. This will take some time.
Approach this step methodically and take time to ensure you have accurate information to manage your digital collections.
At this stage of information gathering, it is not necessary to run or try to open the files. Record any documentation and contextual information about the digital content, such as information from labels.
For now, leave the physical digital carriers where they are. The carriers can be relocated once you have systems and processes in place that can track and document movement.
Action: Create a list of your digital materials and physical media carriers
Gather information about what digital files you have and where they are stored. Use a simple spreadsheet to make a list. The two examples below show the kind of information you might find useful.
Example table of where files are currently stored
|Question||Yes||Number or details|
|Do you have files on a server?||✔||1|
|Do you have a collection management system?||✔|
|Do you have cloud storage?||✔||2 - Google Drive & Dropbox|
|Do you have floppy disks?||✔||34|
|Do you have CDs?||✔||12|
|Do you have USBs?||✔||15|
|Do you have memory cards? Compact flash cards or SD cards for example?||✔||5|
|Do you have computer hard drives and/or laptops?||✔||4 total (2 computers, 2 laptops)|
Example table capturing information about your digital material
Below is an example of the fields you might use to capture relevant information from your digital material.
|Physical digital carrier||Location||Labels or written information (as written)||Number|
|5.25 inch floppy disks||Shelf Four, Box 22 with spiral bound Tribunal reports||Tribunal research ’87||2|
|3.5 inch floppy disk||Shelf 2, unlabelled box with physical folders of papers||AGM and meeting minutes 1990-92||8|
|External hard drive||Unlabelled box with copies of Māori land court minute papers||WD 1 TB||1|
|CD-ROMs||Drawer 4, Filing cabinet, Admin office||Marae 100th 1994||2|
|USB drive||Drawer 2, Filing cabinet, Admin office with draft newsletters||Apacer 32 GB||1|
|USB drives||Shelf 1, Box 4 with Trust Holdings files||None||2|
Dedicated digital storage space
You will need dedicated, secure storage for your digital materials before you read and copy your media. You may not have a full digital preservation system in place, but there are things you can do in the short to medium term to manage and care for your digital collection.
Action: Establish dedicated digital storage space
Types of storage include physical servers or hard drives, a network location, or virtual machines/cloud servers. You may also wish to invest in a digital preservation or digital asset management system.
Unique digital content should never exist in just one location. It is best practice to have at least three copies of files in multiple locations.
How much digital storage will I need?
- The size of files depends on the format type and the file quality.
- Plan for 1-2 GB of data per hour of audio files.
- Video files may be 100 GB or more of data per hour.
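To check the audio figure for your own recordings, the size of uncompressed audio is the sample rate multiplied by the bytes per sample, the number of channels and the number of seconds. The minimal sketch below assumes uncompressed PCM audio (as in most WAV files) and ignores file headers; the collection size of 50 hours is a hypothetical example.

```python
# Rough storage estimate for uncompressed (PCM) audio such as WAV files.
# Assumes no compression; file headers and metadata are ignored.

def audio_bytes_per_hour(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bytes needed for one hour of uncompressed PCM audio."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * 3600


def to_gb(n_bytes: int) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 1,000,000,000 bytes)."""
    return n_bytes / 1_000_000_000


per_hour = audio_bytes_per_hour(96_000, 24, 2)  # 96 kHz, 24-bit, stereo
hours = 50    # hypothetical: hours of recordings in the collection
copies = 3    # three copies in total, following the 3-2-1 rule
total = per_hour * hours * copies

print(f"{to_gb(per_hour):.2f} GB per hour")                      # 2.07 GB per hour
print(f"{to_gb(total):.0f} GB for {hours} h x {copies} copies")  # 311 GB for 50 h x 3 copies
```

At 48 kHz, 24-bit stereo the same function gives about 1.04 GB per hour, which is why 1-2 GB per hour is a reasonable planning figure. Compressed formats such as MP3 are much smaller.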
Action: Create a folder for each digital collection
Now that you’ve determined where your digital taonga are going to be stored generally, you can begin thinking about how best to organise your digital collections.
We recommend creating an electronic folder for each collection, with a consistent naming structure. This might include the name of the collection, an accession number, the year the material was received, or other identifying information.
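As one way to keep folder names consistent, the sketch below builds a name from an accession number, the year received and a short collection name. The pattern, the example values and the helper names are hypothetical. Because not all software handles macrons in file names, this sketch turns macron vowels into plain vowels; some organisations prefer to keep macrons or use doubled vowels instead, so choose one approach and document it.

```python
import re
import unicodedata
from pathlib import Path


def ascii_safe(text: str) -> str:
    """Make a name that copies safely between systems (macrons become plain vowels)."""
    # Split letters such as 'ā' into 'a' + a combining macron, then drop the marks.
    decomposed = unicodedata.normalize("NFD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Turn runs of spaces into single hyphens and drop other punctuation.
    hyphenated = re.sub(r"\s+", "-", no_marks.strip())
    return re.sub(r"[^A-Za-z0-9_-]", "", hyphenated)


def collection_folder(root: Path, accession: str, year: int, name: str) -> Path:
    """Hypothetical pattern: <accession number>_<year received>_<collection name>."""
    return root / f"{accession}_{year}_{ascii_safe(name)}"


folder = collection_folder(Path("DigitalCollections"), "2024-007", 2024, "Māori Land Court papers")
print(folder.name)  # 2024-007_2024_Maori-Land-Court-papers
# folder.mkdir(parents=True, exist_ok=True) would create it on disk.
```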
Assess your digital files
More ‘active’ assessment of the digital files will be required to understand what types of files you have. This includes reading the physical media, scanning for viruses, documenting what is on each media carrier and copying files to a secure location.
Action: Create file list/content inventory
Create a file list, or content inventory, for each piece of media. This will allow you to better understand and document the context of the files. Some types of metadata, such as file name, creation date, last modified date, and more can be extracted from the digital files themselves using file profiling tools.
The file list allows you to record collection metadata both for your own processing and management of the digital content, and for future researchers' information.
Be aware that not all software can recognise macrons in file names, so use care when naming files.
The file list should include metadata such as:
|Metadata||Example|
|File path||My Documents\Events\Tui_27th_birthday_ photos\Keke.jpg|
|Last modified date||19/03/2016|
|Note||Duplicate of Keke.jpg|
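File profiling tools can produce a list like this for you. As a minimal alternative, the sketch below walks a folder (a copied carrier, or a carrier connected through a write-blocker) and writes one spreadsheet row per file. It reads file-system metadata with `stat` rather than opening each file. The column names mirror the example above plus a size column, and the function name is hypothetical. Save the CSV outside the folder being listed so it does not list itself.

```python
import csv
import os
from datetime import datetime
from pathlib import Path


def build_file_list(carrier_root: Path, out_csv: Path) -> int:
    """Write one CSV row per file under carrier_root; return the number of files."""
    count = 0
    with out_csv.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["File path", "Size (bytes)", "Last modified date", "Note"])
        for dirpath, dirnames, filenames in os.walk(carrier_root):
            dirnames.sort()  # visit folders in a stable, repeatable order
            for name in sorted(filenames):
                path = Path(dirpath) / name
                info = path.stat()  # metadata only; file contents are not read
                modified = datetime.fromtimestamp(info.st_mtime).strftime("%d/%m/%Y")
                relative = str(path.relative_to(carrier_root))
                writer.writerow([relative, info.st_size, modified, ""])  # Note left for you
                count += 1
    return count
```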
Action: Read your physical media
Principle: Do no harm to the physical media or digital content.
Digital files and carriers are fragile, and data can be easily modified, deleted or corrupted. Just opening a file can affect the metadata and encoded information in a file, such as time and date stamps. When you are handling your digital materials:
- Always handle media in a clear, clean workspace.
- Make sure your hands are clean and dry before handling your media. Media are damaged by dust, dirt and oil from fingerprints.
- Avoid having other electronic devices nearby that can interfere with signals.
- Avoid placing media directly beside devices containing magnets (such as headphones or speakers) as magnetic fields can damage magnetic media like floppy disks.
- Avoid touching the surfaces of optical discs and floppy disks, and the contact pins of SD cards.
- Avoid writing directly on the media or using adhesive labels.
- Ideally place each item in a conservation quality paper or polypropylene bag and label the enclosure rather than the physical media.
Some equipment or tools may be needed to read your physical media such as:
- a USB 3.5 inch floppy disk drive
- an external optical disc drive for CDs and DVDs
- a write-blocker when working with USB drives or external hard drives
Scan for viruses
Once you have inserted the disk or plugged in your media, scan for any viruses, particularly if you are introducing content to a networked or corporate machine. If your computer doesn’t already have anti-virus software on it, there are a variety of commercially available options.
Make copies of original files
Copy digital files from the original media to the secure storage location, making sure to record any documentation and contextual information about the digital content, such as information from labels.
Action: Create file directory and folders to copy digital files
In your dedicated digital storage space, create a folder for each digital collection.
Create a new directory or folder within your collection folder for each physical media carrier you are copying.
When copying digital content, retain the original order of the folders and file structure and the original file names, as well as the information captured in your file list, such as file creation and last modified dates.
Action: Copy digital files
Copying individual files from a media carrier or other location is often the easiest way of transferring files, using a computer’s basic ‘copy and paste’ functionality. There are also a range of file transfer tools available.
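One file transfer approach is Python's standard `shutil.copytree`, which copies a whole folder tree while keeping the folder structure and file names. By default it copies each file with `shutil.copy2`, which also carries over the last modified time; creation dates are generally not carried over, so record them in your file list first. The function name and the carrier ID are hypothetical.

```python
import shutil
from pathlib import Path


def copy_carrier(source: Path, collection_folder: Path, carrier_id: str) -> Path:
    """Copy a mounted carrier into <collection_folder>/<carrier_id>/."""
    destination = collection_folder / carrier_id
    # copy2 keeps last modified times; copytree keeps folder order and names.
    # copytree raises FileExistsError if destination exists, so an earlier
    # copy is never silently overwritten.
    shutil.copytree(source, destination, copy_function=shutil.copy2)
    return destination
```

For example, `copy_carrier(Path("E:/"), Path("DigitalCollections/2024-007"), "USB-001")` would copy a USB drive mounted as E: into its own carrier folder (paths hypothetical).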
Prioritise older media carriers like floppy disks and CDs for copying first to secure storage. The older the media carrier, the more likely it is to become obsolete and unreadable.
Action: Establish fixity (checksums)
File fixity, in digital preservation, is the property of a digital file remaining unchanged. Establishing fixity is a bedrock of digital preservation: it lets you maintain the integrity of digital content and confirm that digital files have not been altered or degraded during the copying process or over time.
A checksum is a sequence of letters and numbers, effectively unique to a file's contents (e.g. F4A711A70A21D2BBF0FA402D7F5375EF), generated by a software application and used to check files and data for corruption or change. Common checksum types are named for the algorithm that produces them: MD5, SHA1 or SHA256.
Fixity checks can be done by computing checksums for files and comparing them to a checksum that is already stored – if the checksum is the same, nothing has changed in the file.
You can use tools (like DROID) to generate a checksum for a file before you copy it, and then check that the copy has the same checksum as the original. A checksum acts as a digital fingerprint for each file.
Regularly generating checksums for files and comparing them with earlier results will identify any files that have changed over time. While a changed checksum can alert you to a change, it cannot tell you exactly what has changed or reverse the change. It might indicate that a digital file did not download properly, or that hard drive problems have caused an error or corruption in the file.
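Tools like DROID do this for you. To show what is happening underneath, the sketch below uses Python's standard `hashlib` to compute a checksum for an original file and its copy and compare them. It reads each file in binary mode in chunks, so contents are not modified (some systems may still update the last access time), and large files do not need to fit in memory. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256", chunk_size: int = 1024 * 1024) -> str:
    """Return the hexadecimal checksum of a file, reading it 1 MB at a time."""
    digest = hashlib.new(algorithm)  # e.g. "md5", "sha1" or "sha256"
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_is_intact(original: Path, copy: Path, algorithm: str = "sha256") -> bool:
    """True if the copy has exactly the same checksum as the original."""
    return file_checksum(original, algorithm) == file_checksum(copy, algorithm)
```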
Digital Record Object Identification (DROID) — software used to profile digital collections. It can list files, identify file formats and generate checksums for fixity. Created by The National Archives UK and based on their file format database PRONOM.
Make some decisions about long term preservation
Use the information you have collected doing your digital file inventory and thinking about storage space to make decisions about the long-term preservation of your digital content.
Action: Identify file formats and other technical metadata
Knowing what kind of file formats you have is the first step in the long-term preservation of your digital content.
For preservation and long-term management of digital files, it is essential to understand the file types in your collections. File extensions such as .doc, .pdf and .jpg are often used to identify file types, but they may not always be correct. Older files, as well as contemporary files created on Macintosh computers, may not have file extensions.
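Format identification tools like DROID look at the bytes inside a file, not just its extension: many formats begin with a fixed "signature" (sometimes called a magic number). The sketch below checks a handful of well-known signatures to illustrate the idea. It is a tiny hypothetical subset, not DROID's method or the PRONOM registry, and it cannot tell format versions apart.

```python
from pathlib import Path

# A few well-known signatures found at the very start of files (illustrative subset).
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"II*\x00", "TIFF image (little-endian)"),
    (b"MM\x00*", "TIFF image (big-endian)"),
    (b"PK\x03\x04", "ZIP container (also DOCX, XLSX, PPTX)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 container (older .doc, .xls, .ppt)"),
]


def guess_format(path: Path) -> str:
    """Guess a file's format from its first bytes, ignoring the extension."""
    with path.open("rb") as fh:
        header = fh.read(16)
    # WAV files start with 'RIFF', then four size bytes, then 'WAVE'.
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "WAV audio"
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"
```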
The National Archives UK has developed a free software tool, Digital Record Object Identification (DROID), that can tell you which file formats and format versions you have, their age and size, and when they were last changed. It can also generate checksums to help identify any duplicates.
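Checksums find duplicates because files with identical contents have identical checksums, whatever their names. The sketch below groups files under a folder by their SHA-256 checksum and returns only the groups with more than one file. It is an illustration of the idea, not DROID's output; the function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum a file in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Return groups of files under root whose contents are byte-for-byte identical."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Record duplicates in the Note column of your file list before deciding which copy, if any, to set aside.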
Action: Prioritise your digital materials
Once you know what digital materials you have and where they are, you can begin to prioritise your collections to determine where best to deploy your efforts, time, and money.
The volume of digital material being created is constantly increasing. It is impossible to save every digital file ever created, so prioritising content for preservation is essential.
Be selective about the digital content you decide to retain. It may be that only certain files or types of files are required for long term preservation. It is better to have a smaller set of high-quality digital files, than millions of files that have little or no intellectual or cultural value.
Priorities to consider about content include:
- value, importance, or function of material to your organisation
- uniqueness of material
- level of current and future use
- other organisation-specific conditions/policies
Priorities to consider for the media carrier include the risk of loss from:
- media degradation due to age or condition of carrier media
- obsolescence, software or hardware incompatibility
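If your inventory is in a spreadsheet, a simple sort can turn these priorities into a copying order. The sketch below ranks carriers by a hypothetical risk score (older carriers first, following the advice to copy floppy disks and CDs early) and then by a hypothetical content significance rating. The scores are placeholders to adjust to your own collection and policies; the items come from the example inventory table.

```python
from dataclasses import dataclass

# Hypothetical risk ranks (1 = copy first). Adjust to suit your collection.
CARRIER_RISK = {
    "5.25 inch floppy disk": 1,
    "3.5 inch floppy disk": 2,
    "CD-ROM": 3,
    "External hard drive": 4,
    "USB drive": 5,
}


@dataclass
class Carrier:
    kind: str
    label: str
    significance: int  # hypothetical content rating: 1 = high, 3 = low


def copy_order(carriers: list[Carrier]) -> list[Carrier]:
    """Sort by carrier risk first (unknown types last), then by content significance."""
    return sorted(carriers, key=lambda c: (CARRIER_RISK.get(c.kind, 99), c.significance))


inventory = [
    Carrier("USB drive", "Apacer 32 GB", 2),
    Carrier("CD-ROM", "Marae 100th 1994", 1),
    Carrier("3.5 inch floppy disk", "AGM and meeting minutes 1990-92", 1),
    Carrier("5.25 inch floppy disk", "Tribunal research '87", 1),
]
for carrier in copy_order(inventory):
    print(carrier.kind, "|", carrier.label)
```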
Backup, backup, backup
Principle: Regularly and systematically backup your files
Regular backups, and regular checks to ensure that files have not changed or become corrupted over time, are critical.
Action: Implement the 3-2-1 rule of backing up
Implement the 3-2-1 rule of backing up important digital materials.
Keep three copies of your data in two separate, secure locations, one of which should be offsite.
- Your primary files should be separate from your backups.
- Your two backups should be on two separate media carriers.
- Use different brands/types of media carrier so there is less chance of failing at the same time.
- Make sure that one copy of your digital files is kept offsite.
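The points above can be written down as a short checklist and tested against your actual backup plan. The sketch below describes each copy by its role, location, carrier and whether it is offsite, then reports any rule it breaks. The data structure and the example plan are hypothetical; it does not check brands or media types, which you would still review by hand.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    role: str       # "primary" or "backup"
    location: str   # where the copy physically lives
    carrier: str    # the drive, server or service holding it
    offsite: bool


def check_321(copies: list[Copy]) -> list[str]:
    """Return the ways a backup plan falls short of the 3-2-1 rule (empty if none)."""
    problems = []
    primaries = [c for c in copies if c.role == "primary"]
    backups = [c for c in copies if c.role == "backup"]
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies; keep at least three")
    if len({c.location for c in copies}) < 2:
        problems.append("all copies are in one location")
    if not any(c.offsite for c in copies):
        problems.append("no copy is kept offsite")
    if len({c.carrier for c in backups}) < len(backups):
        problems.append("two backups are on the same carrier")
    if any(p.carrier == b.carrier for p in primaries for b in backups):
        problems.append("a backup shares a carrier with the primary files")
    return problems


plan = [
    Copy("primary", "Marae office", "Office server", offsite=False),
    Copy("backup", "Marae office", "External hard drive A", offsite=False),
    Copy("backup", "Trustee's home", "External hard drive B", offsite=True),
]
print(check_321(plan) or "Plan meets the 3-2-1 checks")
```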
A cloud storage service is also an option; just be certain that you are using a cloud backup service and not a file-syncing tool, because a syncing tool copies deletions and changes (including accidental ones) to every synced copy. It is also important to consider the integrity of your data, for example by understanding where and how a cloud storage service provider stores and manages your digital data.
Establish and document which set of files is the primary set and which sets are the backups, to avoid confusion.
You can do this by using meaningful folder naming conventions for sets of files and backups, for example,
Action: Regularly check your digital files
Digital files can degrade over time, and if the software or hardware required to render them becomes obsolete, the files may no longer be accessible with contemporary software.
Backing up your digital files and documentation is also an integral part of developing your organisation’s disaster preparedness and response planning.
In addition to regular backups, check that you can still open older files. Always make a copy of the file before opening.
You can tell if a file has been changed or corrupted by generating checksums for digital files and comparing them over time.
Set a schedule to run regular fixity checks over your files to ensure that they haven't changed over time. An easy way to do this is to re-run DROID over your files, then compare the checksum output with the checksums recorded when the files were first processed. You may wish to do this annually, or more frequently as resourcing allows.
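The same comparison can be scripted. The sketch below writes a baseline manifest of SHA-256 checksums when files are first stored, then later reports files whose checksum has changed or which are missing. It does not read DROID's own export; the two-column CSV (path, sha256) and the function names are hypothetical. Keep the manifest outside the folder being checked, and back it up with the files.

```python
import csv
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum a file in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root: Path, manifest: Path) -> None:
    """Record a baseline checksum for every file under root."""
    with manifest.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["path", "sha256"])
        for path in sorted(root.rglob("*")):
            if path.is_file():
                writer.writerow([path.relative_to(root).as_posix(), sha256_of(path)])


def check_fixity(root: Path, manifest: Path) -> dict[str, list[str]]:
    """Compare current checksums with the baseline; report changed and missing files."""
    with manifest.open(newline="", encoding="utf-8") as fh:
        expected = {row["path"]: row["sha256"] for row in csv.DictReader(fh)}
    report: dict[str, list[str]] = {"changed": [], "missing": []}
    for rel_path, checksum in expected.items():
        path = root / rel_path
        if not path.is_file():
            report["missing"].append(rel_path)
        elif sha256_of(path) != checksum:
            report["changed"].append(rel_path)
    return report
```

Run `write_manifest` once when files are first copied, and `check_fixity` on your schedule. Any file it lists can be compared against a backup whose checksum still matches the manifest.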
World Digital Preservation Day is the first Thursday in November each year. You could set a calendar reminder to check your digital files each November.
Preservation storage physical handling
Errors and failures can be caused by improper physical handling. A clean, dust-free environment away from sources of damp, heat, sunlight and magnetic fields should be maintained. Storing each item in its own enclosure, on a shelf in a well-insulated room will help maintain a stable environment and reduce damaging fluctuations. See below for advice on storing digital collections.
Magnetic media — Floppy disks
- Store away from magnetic fields — such as speakers, televisions and PCs.
- Store disks upright in individual boxes constructed from polypropylene, as they provide better protection than paper sleeves.
- Boxed disks can be stored upright on shelves or within boxes on shelves.
- Label the boxes rather than the disk.
- Avoid adhesive labels as the adhesive can migrate and damage the magnetic media.
Optical media — CDs, DVDs
- Store discs vertically in jewel cases. Ideally the jewel cases should be constructed from polypropylene.
- If labelling discs is necessary, use a water-based permanent marker on the clear inner hub.
- Avoid applying adhesive labels to the discs.
Solid state media — USB sticks, memory cards
Flash drives, pen drives, compact flash and SD cards are solid state, with no moving parts. The inner workings are primarily a printed circuit board soldered into place.
- Always replace the cap on USB sticks.
- Use polyethylene ‘ziplock’ bags to individually house drives or retain in original cases.
Action: Establish physical and intellectual control over physical media carriers
You will need dedicated storage locations for physical media carriers such as CD-ROMs and USB drives, as well as secure location(s) for storing the digital files themselves. You will also need to establish a set location for supporting contextual information or policies for intellectually managing digital files.
Physical media carriers can damage or be damaged by paper and other types of materials they may be stored with. It is recommended that physical media carriers be stored in separate housing, where possible.
Once the inventory of floppy disks, CDs and DVDs, external or flash drives has been completed you can remove the physical media from their original folder or box location and arrange in a dedicated physical media storage location. Include a separation sheet or other documentation to record where the media came from. A copy of this can remain inside the box.
Document and be consistent
Principle: Be consistent and document your process
It can take a long time to move through the workflow and fully process digital collections. It is essential that work is documented, so everyone across your organisation understands what has happened, and can make informed decisions for their step in the workflow.
This is important whether you are one staff member doing the entire workflow or various people do different parts. Documenting a decision at the time also helps people understand why it was made, so you can re-evaluate it if you need to.
Action: Establish guidelines and apply them consistently
Establish guidelines to manage your digital collections that are relevant and applicable to your collection and organisation. Apply them consistently.
Have someone else review the guidelines to ensure that your process is clear, understandable, and easily reusable. Ensure that all staff and any volunteers who work with digital materials understand handling principles for digital materials.
Access to your digital collection
Once you have the files safely stored you need to make them discoverable to potential users. This may be as simple as a PDF file or spreadsheet listing digital collections and their locations, or a full digital asset management system which links the digital object to its description.
To provide access to the digital collections requires:
- a description of the material so it can be found
- an access copy of the original digital file
Copyright and other considerations
Digital files are often contemporary content, which means there may be copyright considerations relating to access and use. Digital materials, particularly social media content, may have multiple creators and contributors. Ensure that access to digital content does not compromise the rights and privacy of those represented in the collections, and that any conditions for or restrictions on access are documented.
Copyright and museums — Te Papa National Services Te Paerangi
Action: Describe or catalogue digital collections
Create catalogue records or descriptive finding aids to provide information relating to digital collections. Provide an overview of the content, quantities of material, access and use conditions, technical requirements, and other information to facilitate access to the digital material. Be transparent about processing of digital collections and decisions to not retain materials. If materials are awaiting processing and not yet available for use, note this as well.
Action: Create access copies
Keep your original files secure by creating access or working copies of digital content for staff and researcher use. Some types of files can be altered each time the file is opened, so it is safest to always work on copies.
If the original file format is unusual, or requires specialised software, it is useful to create preservation access copies for digital files.
For example, you may wish to create a preservation copy of a file in a more widely used format. This means that the informational content will still be accessible even if the original software becomes obsolete or ceases to exist.
When making preservation access copies, keep the original file in its original format as the primary file, and make a new copy in another format for access. Document when and why the access copy was made.
The most common issue when working with digital materials is incompatibility. This may be a result of incompatibility of hardware, software, or operating systems, and is often a result of obsolescence.
My media won’t read
Hard drives may not always be compatible with a computer if they have been formatted for a different operating system, generally Mac or PC. Creative communities tend to work on Mac computers, so if your PC cannot read a drive, it is helpful to try reading it on a Mac.
Optical discs such as CDs and DVDs may not read if they have been damaged or scratched, or if the disc has degraded. The average lifespan of an optical disc is 10-25 years, so it is important to copy content from optical discs.
File names/file paths are too long to copy
Sometimes when a transfer fails, you might get an error saying that the file path (the location of a file within a directory structure) is too long.
Mac operating systems allow much longer file paths, but Windows has a default limit of 260 characters for a full file path. You may need to shorten the file path or file name in order to copy the content. Ensure that any changes to file names or file paths are documented first.
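You can find over-long paths before a transfer fails. The sketch below lists files whose full path would reach the traditional 260-character Windows limit once copied under a given Windows destination folder, so you can document and shorten them first. The function name and example paths are hypothetical, and newer Windows versions can be configured to allow longer paths.

```python
from pathlib import Path, PureWindowsPath

WINDOWS_MAX_PATH = 260  # traditional Windows limit for a full path


def long_paths(source_root: Path, destination_root: str,
               limit: int = WINDOWS_MAX_PATH) -> list[tuple[int, str]]:
    """List files whose full Windows path would reach `limit` characters once copied."""
    results = []
    for path in sorted(source_root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(source_root)
            # Rebuild the path as it will look under the Windows destination folder.
            full = str(PureWindowsPath(destination_root, *relative.parts))
            if len(full) >= limit:
                results.append((len(full), full))
    return results
```

For example, `long_paths(Path("/Volumes/USB"), r"D:\DigitalCollections\USB-001")` checks a drive mounted on a Mac against a planned Windows destination.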
File name has special characters that prevent copying
If a file name has special characters that prevent copying, you may need to remove the special character and record the original name. Be aware that some operating systems have difficulty with macrons in file names.
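Before copying, you can flag names likely to cause trouble so you can record the original name first. The sketch below checks for characters Windows does not allow in file names, for non-ASCII characters such as macrons, for names stored in decomposed form (a letter followed by a separate combining macron, as older Mac file systems store names), and for names ending in a space or full stop, which Windows does not accept. The function name and messages are hypothetical.

```python
import unicodedata

# Characters that Windows does not allow in file or folder names.
WINDOWS_FORBIDDEN = set('<>:"/\\|?*')


def name_problems(name: str) -> list[str]:
    """List reasons a file name might fail to copy or display correctly."""
    problems = []
    bad = sorted({ch for ch in name if ch in WINDOWS_FORBIDDEN or ord(ch) < 32})
    if bad:
        problems.append("characters not allowed on Windows: " + " ".join(repr(ch) for ch in bad))
    if any(ord(ch) > 127 for ch in name):
        problems.append("non-ASCII characters (for example, macrons)")
    if unicodedata.normalize("NFC", name) != name:
        # e.g. 'a' followed by a separate combining macron (U+0304)
        problems.append("decomposed characters that may not match on other systems")
    if name.endswith((" ", ".")):
        problems.append("ends with a space or full stop")
    return problems
```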
A word processing document isn’t readable
Older word processing files may have encoding issues, so that the text doesn't render properly and the file isn't readable. Opening the file in a different version of the software, or viewing it in a text editor, may allow you to read at least the text of the document.
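A step beyond a text editor is to pull out only the readable runs of text from the raw bytes, similar to the Unix `strings` command. The sketch below does this on a copy of the file, never the original. It keeps plain ASCII only, so macrons and other non-ASCII characters are lost, and text stored as UTF-16 (used by some later word processors) will be missed. The function names are hypothetical.

```python
import re
from pathlib import Path


def printable_runs(data: bytes, min_length: int = 4) -> list[str]:
    """Return runs of plain ASCII text at least min_length characters long."""
    pattern = re.compile(rb"[\x20-\x7e\t\r\n]{%d,}" % min_length)
    return [run.decode("ascii") for run in pattern.findall(data)]


def salvage_text(copy_path: Path, min_length: int = 4) -> str:
    """Read a COPY of a document as raw bytes and join the readable text runs."""
    return "\n".join(printable_runs(copy_path.read_bytes(), min_length))
```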
If you have questions that aren’t covered here, or just want to talk your process through with someone, please contact the National Preservation Office, email email@example.com
Ngā rauemi | Resources
Australasia Preserves — A digital preservation community of practice
Demystifying Born Digital — This project helps library and archive staff gain the confidence necessary for taking initial steps to launch a born-digital management program that can be scaled up over time.
Digital Preservation Coalition Handbook — A key knowledge base for digital preservation, peer-reviewed and freely accessible to all.
Digital Preservation Handbook Glossary — Glossary of definitions and acronyms about digital preservation.
Digital Preservation at the National Library of New Zealand — preserving New Zealand digital heritage.
Barrera-Gomez, Julianna and Erway, Ricky, OCLC Research Report, ‘Walk This Way: Detailed Steps for Transferring Born-Digital Content from Media You Can Read In-house’ (2013) — This report collects the assembled wisdom of experienced practitioners to help those with less experience make appropriate choices in gaining control of born-digital content.
Erway, Ricky, OCLC Research Report, ‘You’ve got to Walk Before You Can Run: First Steps for Managing Born-Digital Content Received on Physical Media’ (2012) — This report is intended for anyone who doesn’t know where to begin in managing born-digital materials.
Federal Agencies Digital Guidelines Initiative Glossary — Glossary of terms about digital preservation.
Te Mana Raraunga Māori Data Sovereignty Network — Te Mana Raraunga is the Māori Data Sovereignty Network and advocates for Māori rights and interests in data to be protected as the world moves into an increasingly open data environment.
Sustainable Heritage Network — Digital Stewardship Curriculum: self-guided modules, particularly for those starting out in digitisation and digital preservation, including on developing policies.