Limited access to some knowledge
February 2nd, 2015 By Lucy Schrader
Universal access to all knowledge
Leading up to last year’s National Digital Forum, we got a special treat: a day to chat with the Internet Archive's founder Brewster Kahle, brainstorm new ideas, and figure out how we can work together more closely. We had a couple of spark-filled sessions on web preservation and digitisation, which he then built on during a panel discussion on the digital future of creative works and his keynote at NDF.
Brewster’s visit has left me stewing on some big questions about open access, copyright, and the role of libraries and archives. In my head that’s coalesced around one significant fact of user behaviour: with today's internet, people already think we have universal access to all knowledge, and act accordingly.
Because the internet's huge, right? It's ubiquitous, and every service, tool, and morsel of information is just a google away.
But that's enormously untrue. For starters:
- A lot of material is still in copyright, or restricted for ethical or cultural reasons
- Colossal amounts of material have never been digitised
- Both primary and secondary sources are behind various paywalls
- Data is often not pristine and ready to use – it's frequently a mess of inconsistencies that makes it hard to work with
- A lot of heritage websites are old and getting creaky, making them hard to find and use
If you take the view that what's online and accessible is the sum of human creation and knowledge, you end up with a poor shadow of what we've actually done since we began writing stuff down. Heading into the future using only this sliver as a foundation will see us repeating ourselves and making old mistakes.
But that's the assumption, and expectation, of most internet users. We do a lot to help people become better researchers and more able users of information (research guides plug!), with tools and processes that can always be improved. But there’s another strand to access that may motivate change and help us meet the expectations people bring to the internet today.
Insufficient access to certain amounts of knowledge
To be clear, this post is talking about access to information and material that should be accessible, but isn’t due to technical or procedural barriers. We collect a great deal of material that can’t be released without restriction, due to copyright, cultural or ethical constraints, or the agreements we make with donors. If we don’t abide by those, we’ll either get in some pretty steep legal trouble, or we’ll lose the trust that allows us to build our fantastic and unique collections. Or both.
Before looking at what can be done, here are some problems that still stand between us and the information utopia of making this material as accessible as possible.
Not yet digitised
Most of the items we hold aren't digital. Our online newspapers at Papers Past are only part of the corpus of New Zealand's newspapers. Likewise, we have around 4-5% of our photographic collections online, and only 7,500 items out of our 11 kilometres of manuscripts. That's not even counting published works where copyright takes any consideration of making digitised text available off the table.
Adding to the problem, a lot of what's digitised isn't digital enough. The newspapers on Papers Past are fully transcribed, but a lot of the computer-generated text contains significant errors, making search less useful. A lot of photos are accompanied by descriptions, but few have text equivalents detailed enough to let you find all photos that contain, say, impressive moustaches. Audio and video items take more work again to make them fully searchable. Even completely digital objects like software aren't comprehensible by themselves, especially if you don't have well-commented source code.
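To make the OCR problem concrete, here's a minimal Python sketch of how an exact-match search misses a word the OCR has mangled, and how a fuzzy match can recover it. The sample text and the `fuzzy_find` helper are invented for illustration, and this is not a description of how Papers Past search actually works.

```python
from difflib import SequenceMatcher

def fuzzy_find(query, text, threshold=0.8):
    """Return words in `text` whose similarity to `query` is at least `threshold`."""
    query = query.lower()
    matches = []
    for word in text.lower().split():
        word = word.strip(".,;:!?\"'")  # drop surrounding punctuation
        if word and SequenceMatcher(None, query, word).ratio() >= threshold:
            matches.append(word)
    return matches

# Invented example of OCR output: "Wellmgton" is a misread "Wellington".
ocr_text = "The Wellmgton harbour was crowded with vessels. Wellington celebrated."

# Exact matching only finds the correctly transcribed occurrence.
exact_hits = [w.strip(".,") for w in ocr_text.lower().split()
              if w.strip(".,") == "wellington"]

# Fuzzy matching also picks up the mangled one.
fuzzy_hits = fuzzy_find("wellington", ocr_text)

print(exact_hits)
print(fuzzy_hits)
```

The trade-off is that a looser threshold finds more damaged words but also lets in more false matches, which is part of why messy transcriptions make search less useful.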
Even without digitising everything and generating all possible metadata, the scale and complexity of large bodies of digital items is often a barrier to their use.
With the Internet Archive's help, we regularly take snapshots of the entire New Zealand internet, 200 million pages and multiple terabytes of data per harvest. These have the potential to be a phenomenal resource. However, the data is extremely hard to work with, a problem the Internet Archive has found with their own web archive as well.
We’ve only been able to make this amazing resource accessible within the Library via the Internet Archive’s Wayback Machine. While you can check out what a particular website looked like at a particular time, snapshot by snapshot, it’s still severely limited compared to what solid search and retrieval mechanisms could do with millions of pieces of content.
We need to make these digital collections computationally available. Instead of just serving up web sites, we’d like to provide entire data sets from the web corpus that can be interrogated by specialist researchers, like linguists, historians, and sociologists.
However, there is a singular lack of fully featured ‘middleware’: the tools and processes that make the corpus easily usable by people and their computers. There's real research to be done and knowledge to be gleaned from our web collections. Fixing the middleware problem – a mix of willingness and resourcing – is imperative and potentially a chance to work together with the Internet Archive. Watch this space.
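As a rough idea of what "computationally available" could mean for a researcher, here's a small Python sketch that counts how often words appear in each harvest year. It assumes a derived data set with one record per captured page; the record layout (`url`, `year`, `text`) and the sample pages are hypothetical, not a format the Library or the Internet Archive currently provides.

```python
from collections import Counter, defaultdict
import re

# Hypothetical derived data set: one record per captured page.
snapshots = [
    {"url": "http://example.nz/a", "year": 2008,
     "text": "Kia ora and welcome to our website"},
    {"url": "http://example.nz/b", "year": 2008,
     "text": "Welcome to the home page"},
    {"url": "http://example.nz/a", "year": 2013,
     "text": "Kia ora! Kia ora koutou, welcome back"},
]

def term_counts_by_year(records):
    """Tally word frequencies separately for each harvest year."""
    counts = defaultdict(Counter)
    for record in records:
        # Lowercase, then keep runs of letters (including macronised vowels).
        words = re.findall(r"[a-zāēīōū']+", record["text"].lower())
        counts[record["year"]].update(words)
    return counts

counts = term_counts_by_year(snapshots)
for year in sorted(counts):
    print(year, counts[year]["kia"], counts[year]["welcome"])
```

A linguist could run the same kind of tally across a whole harvest to track how usage shifts over time, which is something browsing the archive snapshot by snapshot can't offer.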
The kind of 'it just works' access people expect of the internet is hampered in several ways:
- Collections are often siloed off from each other on different websites
- Access copies may be small and low-fidelity, or might have lost functionality during collection and preservation
- Mobile access is frequently lacking or missing entirely
Legal and cultural issues are reasonable limitations on access, but with our holdings it's frequently harder than it should be to access items that don’t have those restrictions. We use piles of conflicting usage statements, sometimes applied in a blanket way to items they don’t relate to. This makes figuring out whether something is usable a confusing mess, and keeps us from properly identifying and releasing open material.
Access is also less universal than we often think. The digital divide still persists, from geographic differences in service provision to varying levels of personal and social capability to inconsistent application of accessibility standards. Even if some users can get what they're looking for, not everyone can.
Money and storage
Resourcing is always going to hit us with hard limits. Improving digitisation, preservation, and access brings increased storage and bandwidth requirements, possibly at unsustainable scales.
Though we can easily keep up with all the text produced by New Zealanders today, other media are far weightier. We might want to archive work produced by YouTubers or Twitch streamers, some of whom create thousands of hours of video, and similar video-intensive applications are certainly coming.
Even if the hard drives and network usage remain practical, they might not be affordable in the quantities we need.
Solutions and steps
That's enough grim airing of problems. What’s the Library going to do about it? If we want to remove all non-purposeful restrictions on access, we have a lot of barriers to bring down and concepts to rethink.
One key is in our relationships with others, and the roles we’ll each have to play. The Internet Archive has had a lot of success being the cheeky kid to Very Serious Adults like the Library of Congress and Boston Public Library. There are no doubt areas where we could help you kids break new ground, sometimes by standing back.
So what can we do to invite creative work, support, and generally just give as much as possible to those who have the freedom and resources to act? Our upcoming work on cleaning up usage rights will help this, particularly when it's combined with the DigitalNZ API.
Maybe we should also follow the Internet Archive's lead in providing online infrastructure, like hosting for websites and podcasts, or producing software and hardware tools that help people do their own digitisation. Or at least look at how that can be better supported in New Zealand.
Internally, Brewster’s visit challenges us to rethink a lot of assumptions – or at the very least, to make sure they're still justified. What we think is and isn't okay about access and usage online might be out of step with the expectations of those who are trying to use our stuff, and those who are donating it too.
What boundaries can we push outward? We are bound by our legislation, but there may be some room within that to try new things.
As we're doing all that deep thinking, can we also figure out other ways to open up the collections right now? It may only be small, patchy releases of content or data, but maybe small and patchy is okay as long as we're sure we're going to keep improving what we offer.
It's on us to demand more from ourselves so we can close the gap between people's perception of what's possible, and where we are now. We'll need support from across the Department, especially in terms of willingness to try out new things, and ongoing feedback from all of you out there about what you need and what’s missing.
Thanks, Brewster, for an inspiring few days!