4.7 Identifying and describing born digital

We sought to better understand some of the technical aspects of the kinds of digital content being collected, from how born-digital content is being described and managed to what kind of technical appraisal and processing is taking place in institutions across New Zealand.

Almost 80% of respondents reported having a collection management system or catalogue for describing archival material, and 60% of those answering reported always using that system to describe born-digital archival material. However, 42% of respondents skipped the question of born-digital description entirely.

Only 36 respondents were able to estimate the percentage of born-digital material with at least a collection level record, and the average response for material with no catalogue record of any kind was 65.9%. From the data, we can estimate that at least half of born-digital material held by institutions is not described and not under sufficient intellectual control. Of those who selected “other”, all reported some version of “not yet, but maybe in the future.”

Table 11: Collection management and description systems

Do you have a collection management system, catalogue, or database for describing your special collections and archival material?

Answered question: 74. Skipped question: 33.

Answer Percent Count
Yes 79.7% 59
No 13.5% 10
Other 6.8% 5

Table 12: Description of born-digital archival material

Do you use this collection management system, catalogue, or database system to describe your born-digital archival material?

Answered question: 62. Skipped question: 45.

Answer Percent Count
Yes, always 59.7% 37
Yes, sometimes 30.6% 19
No 3.2% 2
Other 6.5% 4
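
The percentages in these tables are calculated against the number of respondents who answered each question, not against everyone surveyed. The short Python sketch below reproduces the Table 12 figures and the skip rate quoted in the text. It is an illustration only, not part of the survey method. The total of 107 is an assumption inferred from the answered and skipped counts (62 + 45), and the names `share` and `TOTAL_SURVEYED` are hypothetical.

```python
# Counts copied from Table 12. TOTAL_SURVEYED is an assumption inferred
# from "Answered question: 62. Skipped question: 45." (62 + 45 = 107).
TOTAL_SURVEYED = 107

def share(count, base):
    """Return count as a percentage of base, rounded to one decimal place."""
    return round(100 * count / base, 1)

table_12 = {"Yes, always": 37, "Yes, sometimes": 19, "No": 2, "Other": 4}

answered = sum(table_12.values())       # respondents who answered the question
skipped = TOTAL_SURVEYED - answered     # respondents who skipped it

# Percentages as reported in the table (base: those who answered).
percent_of_answered = {k: share(v, answered) for k, v in table_12.items()}

# The same counts against everyone surveyed (base: all respondents).
percent_of_surveyed = {k: share(v, TOTAL_SURVEYED) for k, v in table_12.items()}

skip_rate = share(skipped, TOTAL_SURVEYED)

for answer, pct in percent_of_answered.items():
    print(f"{answer}: {pct}% of answered, {percent_of_surveyed[answer]}% of surveyed")
print(f"Skipped: {skip_rate}% of surveyed")
```

Comparing the two bases shows why the skip rate matters: a figure that looks like a clear majority of those answering is a much smaller share of all institutions surveyed.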

Table 13: Estimated percentage of born-digital materials with at least a collection level record

Estimate the percentage of Born-Digital materials that have at least a collection level record.

Answered question: 36. Skipped question: 71.

Answer Average Count
No catalogue records of any kind 65.9% 14
Print catalogue record only 18.5% 10
Online catalogue record 59.1% 25
Catalogued as part of larger archival and manuscript collections 50.9% 17

4.8 Born digital processing and workflows

We asked a number of questions about identifying, inventorying and processing born-digital material. Encouragingly, only 31% of respondents reported that they had born-digital material they could not identify or did not know how to manage and preserve. For most, it seems less a case of not knowing what they have and more a case of not knowing how to manage or preserve it, for example “we have an overall understanding of the preservation of born-digital, but limited technical skills and equipment” or “we have material we cannot manage and preserve to correct standards due to institutional resources and digital architecture.” A small number of respondents were also unable to answer because they have not yet done an inventory of collections to identify and check all content.

Figure 5: Unidentifiable or unmanageable born digital

Does your institution have born digital material that you cannot identify or do not know how to manage and preserve? 31% say yes, and 60% say no.

For those who were able to report on their born-digital material, respondents overwhelmingly had material on physical digital media carriers such as floppy disks, CDs, DVDs, and external hard drives.

Table 14: Institutions with born-digital archival material on physical digital media

Does your institution have born-digital archival material on physical digital media such as floppy disks, CDs, DVDs, external hard drives, etc?

Answered question: 71. Skipped question: 36.

Answer Percent Count
Yes 88.7% 63
No 5.6% 4
Don't know 2.8% 2
Other 2.8% 2

Respondents reported that this material is spread across different formats, with over 90% reporting that they had CDs and DVDs. A large percentage also reported holding material on obsolete media such as 5.25” floppies (33%) and 3.5” floppies (59%). But born digital lives on a diversity of media formats, as survey respondents also reported holding zip disks, data tapes, video and audio tape, and in one case an external server.

Figure 6: Types of physical digital media held

What kinds of physical media do you currently hold? Over 90% of organisations hold CDs and DVDs. Around half hold 3.5 inch floppy disks, USB stick drives, external USB drives, and internal hard drives. 20-30% hold 5.25 inch floppy disks and ZIP disks.

Most also estimated that the majority of their born-digital physical media dates to the last two decades. However, respondents did note significant proportions of material from the pre-1990s and 1990s periods.

Figure 7: Estimated percentage of physical digital media by date range

Please estimate the percentage of physical media that you hold from each date range. One third of material is pre 1990s, 20% is 1990s, 30% is 2000s, and 25% is 2010s.

Physical digital storage media are fragile, unstable, and unsuitable for long-term management and preservation of data. This includes older media already headed toward obsolescence, such as floppy disks, and optical disks like CDs and DVDs. Current practice recommends transferring content on these types of media to a more stable and secure storage medium. When it comes to managing these physical media carriers and the content held on them, only 34% of participants reported currently transferring content to another more secure location or medium (Figure 8). This represents a substantial risk to New Zealand’s digital heritage: the chance of losing collections due to media, hardware, or software failure only increases the longer this content remains on physical media carriers. A first priority for organisations looking to begin a born-digital management programme should be to transfer the contents of this media to a safer storage medium.
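
For institutions without specialised tools, a transfer of this kind can be sketched with nothing more than the Python standard library. The example below copies every file from a mounted carrier to a storage folder and checks each copy against a SHA-256 checksum taken before the copy. It is a minimal sketch under stated assumptions, not a method described by survey respondents: the function names `sha256_of` and `transfer_with_fixity` and the folder paths are hypothetical, the carrier is assumed to be readable as an ordinary folder, and a fuller workflow would normally add write blocking and disk imaging.

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def transfer_with_fixity(source_dir, target_dir):
    """Copy all files from source_dir to target_dir and verify each copy.

    Returns a manifest mapping each relative file path to its checksum.
    Raises IOError if a copied file does not match its source.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    manifest = {}
    for source in sorted(source_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(source_dir)
        target = target_dir / relative
        target.parent.mkdir(parents=True, exist_ok=True)

        before = sha256_of(source)      # checksum of the file on the carrier
        shutil.copy2(source, target)    # copy2 also preserves timestamps
        after = sha256_of(target)       # checksum of the new copy

        if before != after:
            raise IOError(f"Checksum mismatch for {relative}")
        manifest[relative.as_posix()] = before
    return manifest

if __name__ == "__main__":
    # Hypothetical paths: a mounted CD and a folder on managed storage.
    for name, checksum in transfer_with_fixity("/media/cdrom", "/storage/accession_001").items():
        print(checksum, name)
```

Keeping the returned manifest alongside the transferred files allows the same checksums to be recalculated later to confirm that the content has not changed.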

Figure 8: Percentage of institutions transferring content from physical digital media

Are you currently transferring born digital archival content from physical media? One third say yes, 28% say not yet, but planning to, 24% say no.

For those not currently transferring content off physical digital media, most (72.7%) reported not doing so because they lacked the staff time, and 50% reported that they did not currently have the proper technology to read and transfer content (Table 15). Comments on this question included: “a backlog of material that has never been transferred,” “would like to, but lack training, direction, equipment, and staff,” and “our storage space is limited and not fully secure.” The results suggest a significant institutional prioritisation of this work will be necessary in order to begin caring for this born-digital archival material.

Table 15: Reasons for not transferring from physical digital media

If you are not transferring born-digital archival content from physical media, why not? Check all that apply.

Answered question: 44. Skipped question: 63.

Answer Percent Count
Do not have digital storage available 15.9% 7
Do not know how to do the transfer 15.9% 7
Do not have proper technology to read and transfer content 50% 22
Lack of staff time 72.7% 32
This is not a priority for our institution 40.9% 18
This is not considered a high enough risk for our institution 11.4% 5
Other 20.5% 9

The results clearly show that most institutions are not systematically processing and managing their born-digital content. When asked about specific digital processing tools, almost half of survey takers skipped the question entirely, and very few reported using any of the tools currently available. The comments under these questions included: “don’t know,” “unknown,” “no idea” and “never heard of any of them so we obviously aren’t doing anything at all!” While a few institutions do report managing transfers and having systems in place to track the authenticity and integrity of their born-digital content, the evidence suggests the majority of institutions could use guidance about available tools and training on when and how to implement them. As two respondents noted, “It would be great to have information about tools for processing workflow - I don’t know about these yet” and “guidelines in this area would be very helpful and would enable faster progress for us in this area.”

Table 16: Tools used in processing workflow

What tools are you currently using or plan to use in your processing workflow? Please check all that apply.

Answered question: 58. Skipped question: 49.

Answer Percent Count
BagIt 1.7% 1
BitCurator 3.4% 2
DROID 8.6% 5
Exactly 1.7% 1
FITS 1.7% 1
FTK (Forensic Tool Kit) or other commercial digital forensic software 0% 0
FTK Imager 1.7% 1
Forensic writeblockers 3.4% 2
JHOVE 5.2% 3
Commercial software for file transfer and verification 3.4% 2
Open source or free software for file transfer and verification 20.7% 12
Siegfried 1.7% 1
No specialised tools yet 63.8% 37
Other 19% 11

4.9 Staffing

Table 16: How many positions are working with born digital collections

How many staff positions are responsible for working, either full or part of their time, with born digital collections?

Answered question: 69. Skipped question: 38.

Answer Percent Count
None 10.1% 7
Less than one 33.3% 23
One 10.1% 7
2-5 37.7% 26
6 or more 2.9% 2
Other 5.8% 4

Throughout the survey, issues of staffing and training emerged as a common theme. In designing the survey, we wanted to see how institutions rated their current staff expertise and training and what gaps participants would identify in knowledge and skills. Over 50% of respondents noted that they currently have one or fewer staff positions responsible for working with born-digital collections. When asked about future staffing needs, the three highest ranked future needs were: basic introductory training in digital collecting in practice and theory, development of workflows, and training in specific tools or processes.

Table 17: Ranked future staff training and requirements

Thinking about the future, please rank from 1 to 5, with 1 being highest and 5 lowest, the staff requirements necessary to meet institutional needs for collecting and managing born-digital archival material.

Answered question: 63. Skipped question: 44.

Answer 1 2 3 4 5 Average Count
Basic introduction to digital collecting theory 14 10 10 7 12 2.87 53
Advanced technical training and development for staff 8 14 10 13 10 3.05 55
Development of workflows for managing born digital 19 11 12 10 4 2.45 56
Training on specific tools and processes 7 14 17 15 3 2.88 56
The creation of new roles to work specifically with born digital 12 6 7 11 25 3.51 61
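
The Average column in this table is a weighted mean rank: each rank from 1 to 5 is multiplied by the number of respondents who chose it, and the total is divided by the number of respondents who ranked that item. Because 1 is the highest rank, a lower average indicates a higher priority. The Python sketch below recomputes the averages from the counts in Table 17 and orders the items accordingly. It is illustrative only, and the function name `mean_rank` is hypothetical.

```python
def mean_rank(counts):
    """Return the weighted mean rank, where counts[i] is the number of
    respondents who gave rank i + 1 (rank 1 is the highest priority)."""
    total = sum(counts)
    weighted = sum(rank * n for rank, n in enumerate(counts, start=1))
    return weighted / total

# Counts for ranks 1 to 5, copied from Table 17.
table_17 = {
    "Basic introduction to digital collecting theory": [14, 10, 10, 7, 12],
    "Advanced technical training and development for staff": [8, 14, 10, 13, 10],
    "Development of workflows for managing born digital": [19, 11, 12, 10, 4],
    "Training on specific tools and processes": [7, 14, 17, 15, 3],
    "The creation of new roles to work specifically with born digital": [12, 6, 7, 11, 25],
}

# Sort so that the lowest mean rank (highest priority) comes first.
ranked = sorted(table_17, key=lambda item: mean_rank(table_17[item]))

for item in ranked:
    counts = table_17[item]
    print(f"{mean_rank(counts):.2f}  (n={sum(counts)})  {item}")
```

The same calculation can be applied to any other table in this report that lists counts for ranks 1 to 5.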

When asked in more detail about specific areas of education and training (Figure 9), those selected by the greatest number of respondents included born-digital transfers and processing (71.4%), digital preservation (69.8%), born-digital collecting and appraisal (69.1%), and arrangement and description (60.3%). Notably, seven of the ten possible training areas were selected by over fifty percent of respondents as areas where training is needed. While one response noted that “at present staff have sufficient training for our needs,” other comments included “probably everything to be quite honest,” and “need more everywhere.” Participants also identified lack of staff and funding more generally as an impediment to developing staff experience with born-digital content. Open-ended comments echoed this theme: “If I am being realistic - we won't be thinking about employing anyone specifically for born digital material because we don't have the funding. We will be employing a 2 day a week Archives assistant but they will be doing everything!!” and “We are severely understaffed and underfunded. We do not have the time nor the resources to spend too much time on born digital material when we are struggling to managed the physical collection we already have.”

Figure 9: Identified areas of staff education and training

In which areas do staff need more education and training? Two thirds of respondents say born digital collecting and appraisal, transfers and processing, arrangement and description, and digital preservation. Other main areas are knowledge of obsolete technology, file format identification, and metadata standards.

4.10 Access

The survey asked a number of questions about providing access to born-digital collections. 70% of respondents reported users requesting access at least some of the time to born digital collections.

Figure 10: User requests for access to born-digital archival materials

Do users currently request access to your institution's born digital archival materials? Nearly a quarter say yes, often. 29% say yes, occasionally, and 20% say yes, rarely. 20% say no, but we anticipate use in the future, and less than 10% say no, never.

While 70% of survey respondents reported requests for access to born-digital content, 78% reported that they do or would provide access to this kind of material.

Figure 11: Provide access to born-digital archival materials

Do you provide access to any of your born digital archival material? Nearly 80% say yes, the remainder say no.

How much born-digital archival material was available varied, from less than 10% for 23% of those answering, to 100% for 20% of respondents. Respondents provided clarifying details in the comments. Often, access to collections depended on the rights associated with the item or collection. Some collections were available and others were not. Other respondents provided information about items through online catalogues and collection management systems, but not the items themselves. For a number of museum respondents, items were accessible only when included in exhibitions.

Figure 12: Access to born-digital material for researchers

For born digital archival material available to researchers, how is that access provided? Nearly 40% provide access by permission only, 35% provide unrestricted access through the internet, around a quarter provide restricted access in a reading room, and around 10% provide unrestricted access in a reading room. 10% provide restricted access through a virtual reading room.

If material was currently unavailable to researchers, the most likely reasons were that it was not yet processed, that the institution lacked the technical infrastructure to provide access, or that there were privacy and confidentiality concerns. Copyright, lack of description, and lack of staff resources also rated highly as reasons researchers could not access collections.

Table 18: Reasons for not providing access to born-digital archival materials

If you do not provide access to some or all of your born-digital archival materials, why not? Please check all that apply.

Answered question: 62. Skipped question: 45.

Answer Percent Count
Material not processed 50% 31
Do not have the technical infrastructure 45.2% 28
Privacy and confidentiality concerns 45.2% 28
Copyright concerns 40.3% 25
Material not described or catalogued 37.1% 23
No staff resources or expertise to facilitate access 37.1% 23
Donor restrictions on access 32.3% 20
Security concerns 27.4% 17
Legal or statutory restrictions on access 22.6% 14
Other 12.9% 8

4.11 Future requirements and current challenges

Finally, we asked participants to think about future requirements and current challenges. We asked them to rank their requirements for collecting, managing, and preserving born-digital archival material five to ten years into the future. Most ranked building staff expertise with born digital highest, closely followed by technical infrastructure. These results echo the responses throughout the survey that have identified staffing, expertise, and infrastructure as key to moving forward with born-digital collecting and management.

Table 19: Institutional future needs ranked

Thinking about your institution in the next 5-10 years, please rank your requirements for collecting, managing, and preserving born-digital archival material from 1 to 5, with 1 being of highest importance and 5 lowest.

Answered question: 65. Skipped question: 42.

Answer 1 2 3 4 5 Average Count
Building staff expertise with born digital in particular 18 14 15 10 5 2.52 62
Technical infrastructure 10 15 15 12 5 2.77 57
New or increased staffing 13 12 12 7 18 3.08 62
Increased or targeted funding 10 11 11 12 12 3.09 56
Institutional prioritisation of born digital management 10 6 7 13 16 3.37 52

The two final open-ended questions also reflected this trend, with most describing their biggest challenges as staffing and staff expertise, technical infrastructure and developing institutional support, and business processes and workflows. Figure 13 (below) illustrates our categorisation of participants’ top three identified challenges. The results show that staffing, technology and institutional support all rank highly. Also high on many respondents’ lists was finding the time to plan and implement new processes and procedures for born-digital collecting.

Figure 13: Top challenges identified by institutions

Top institutional challenges. 45% identified staffing, 30% identified technology.

Finally, there was a strong desire for collaboration and information sharing among respondents, illustrating again the real need for training and guidance as well as the potential to build a community of practice in this area. Some of the final comments include:

I would welcome a collaborative approach to knowledge sharing, planning and collecting digital-born materials between New Zealand collecting institutions working in this space. [Name redacted] hopes to begin piloting targeted collecting of digital-born manuscript and ephemera materials by the end of 2016, to work out what's involved - particularly looking at digital materials/iterations which are progressively replacing physical formats like letters, diaries, invitations etc.

A challenge for us is changing the focus of the library from passively collecting published material to being actively involved in the creation of born digital local content/local history e.g. training staff/volunteers to collect oral histories.

A challenge is shifting perceptions that born digital collecting is credible content to collect.

Institutional prioritisation of born digital management & policies to support this is a challenge.

We need a new paradigm - current structure based on analogue environment.

Support from national institutions for digital repositories, practice, training and financially viable solutions for born digital material is lacking.

Born digital content is important, and will become a priority... once we have better control over the physical collection. As a small regional library, we would like to benefit from the research/processes of larger, municipal libraries with greater resources in this area.