Library tech

Books on screen

July 22nd, 2013, By Greig Roulston

Books, like you’ve frequently seen them before!

After a whole lot of trial (and a little error), the Library’s book digitisation pilot programme has successfully produced our first ebooks. We want to know what you think of them so we can dive into digitising a whole lot more.

We’ve done this with Papers Past and AtoJs Online, digitising the full text of collection items to give you online access to huge piles of our documentary heritage. Whole books seemed like a good next step.

36 titles made up of 12,700 pages have been digitised and turned into pdf and epub files. We’ve loaded them into the National Digital Heritage Archive for preservation, but you can grab them all from the bottom of this post.

The digitisation team get to work. Man operating large computer, 1975. Ref: 1/2-216797-F

I’m really looking for something on government volcanoes

The books cover a range of topics, from volcanic eruptions to government reports on famous plane crashes, and run between 42 and 920 pages.

Books selected had to be out of copyright – or works that we could easily get clearance for – and we tried to avoid reproducing someone else’s digitisation work. We stuck to New Zealand books, and made an effort to find material about Maori and the Pacific, as well as government publications.

The end result is a rather random assortment of books, ranging from 1857 (The rise and progress of Australia, Tasmania and New Zealand) to as late as 1988 (The April Report: Report of the Royal Commission on Social Policy Vol 1-5). I’m sure there’s something for everyone?

From page to a whole lot of file types

Although there was a lot about this project that was new to us, we’ve learned a lot from the project to digitise the Appendices to the Journals of the House of Representatives – thanks to that, we could follow a similar workflow.

We started with surplus copies of these books that had been flagged as extras we could safely take apart. (Never fear, in the future we won’t have to unbind books to digitise them!)

The pages were then sent off for scanning, and run through optical character recognition to create the text files. That gave us tiff files for the images, xml for the text, and pdfs and epubs for access copies.

At first, we stuck closely to the specifications for the AtoJs work, which included using bitonal (black and white) images in the pdfs. However, the resulting scans had a lot of bleed through – especially on blank pages, you could easily see the image from the other side of the paper.

We gave colour pdfs a go, and happily that gave us a far cleaner scan, as well as being much closer to the original book. The downside is a much larger file.

A bitonal scan, alongside the colour version.

He ain’t heavy, he’s my epub

We ran into trouble again with the epub versions – they were huge, and somehow the largest one reached 90MB! To fix this, we experimented with several kinds of optimisation, but the best result came when we optimised the images for the web, which brought that massive 90MB down to just 12MB.
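If you're curious where the bulk of an epub comes from, here's a minimal sketch of one way to check. An epub is a zip archive, so Python's standard zipfile module can total up the compressed size per file extension. The demo archive and its file names are made up for illustration; this is not the tooling we used in the pilot.

```python
import io
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def size_by_extension(epub_file):
    """Return {extension: total compressed bytes} for the entries in an epub.

    An epub is a zip archive, so the standard zipfile module can read it.
    `epub_file` may be a path or a file-like object.
    """
    totals = defaultdict(int)
    with zipfile.ZipFile(epub_file) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Entries without a suffix (such as "mimetype") are grouped together.
            ext = PurePosixPath(info.filename).suffix.lower() or "(none)"
            totals[ext] += info.compress_size
    return dict(totals)


def build_demo_epub():
    """Build a tiny stand-in archive in memory (hypothetical contents)."""
    buffer = io.BytesIO()
    # ZIP_STORED means no compression, so the sizes are easy to predict.
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr("OEBPS/chapter1.xhtml", "<html/>" * 10)
        archive.writestr("OEBPS/images/table1.png", b"\x00" * 5000)
        archive.writestr("OEBPS/images/plate1.jpg", b"\x00" * 3000)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    sizes = size_by_extension(build_demo_epub())
    # Largest contributors first.
    for ext, size in sorted(sizes.items(), key=lambda item: -item[1]):
        print(f"{ext:8} {size:>8} bytes")
```

Pointing size_by_extension at a real epub path gives a quick sense of how much of the file is images rather than text.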

Unfortunately, that’s still not good enough if we want to make these books as easy to access as possible. Amazon won’t allow free downloads for the Kindle to be any more than 3MB, for example.
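To put that limit in context, here's a rough sketch that checks a handful of the epub sizes from the download list at the bottom of this post against a 3MB ceiling. The 3MB figure is simply the one quoted above, the conversion assumes 1MB = 1024KB, and the function names are made up for illustration.

```python
import re

KINDLE_LIMIT_MB = 3.0  # the limit quoted in this post; it may have changed since


def size_in_mb(label):
    """Convert a label such as '979KB' or '12.9MB' to megabytes.

    Assumes 1MB = 1024KB, which may not match how the labels were rounded.
    """
    match = re.fullmatch(r"\s*([\d.]+)\s*(KB|MB)\s*", label, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised size label: {label!r}")
    value, unit = float(match.group(1)), match.group(2).upper()
    return value / 1024 if unit == "KB" else value


# A few epub sizes copied from the download list in this post.
EPUB_SIZES = {
    "Adventures in Geyserland": "979KB",
    "The April report, Volume 1": "12.9MB",
    "The forest flora of New Zealand": "7.2MB",
    "Pioneer engineering": "3MB",
    "The Tarawera eruption, 1886": "639KB",
}


def over_limit(sizes, limit_mb=KINDLE_LIMIT_MB):
    """Return the titles whose epub is strictly larger than the limit, sorted."""
    return sorted(
        title for title, label in sizes.items() if size_in_mb(label) > limit_mb
    )


if __name__ == "__main__":
    for title in over_limit(EPUB_SIZES):
        print(f"Too big: {title} ({EPUB_SIZES[title]})")
```

Of these five, the first volume of The April report and The forest flora of New Zealand come out over the limit, both of them books with plenty of tables or plates.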

Until there’s a better way of presenting tabular data of the kind in these books, we’ll probably have to stick with including a whole lot of images.

You get really sick of hitting these keys by page 12,000. Wellington Stock Exchange 'Stockmaster' computer, 1970. Ref: EP/1970/4278/20-F

The NatLib Independent Press

We ran into more confusion when we tried to tackle the issue of ISBNs. At the start, we wanted one number for each epub/pdf combo, but there were a couple of problems with that. Firstly, each format that a book is released in needs its own ISBN.

But the larger issue is that we were reluctant to become publishers in our own right – this didn’t seem to make sense in terms of what the content actually is, and it could have implications for later digitisation projects.

The final outcome is that we have created reproductions of these books in digital formats – we don’t own them or have a stake in them. Instead, we’ve created a new avenue of access to the content. Your usual library stuff, really.

Finding and the future

Now that we’ve made these, we need to make sure they’ll be findable. We’ve created catalogue records for these books in the new RDA standard, with one record covering all the digital formats. The ebooks have been loaded into the NDHA and attached to the records, so people looking them up will be able to get the files. Each record has multiple MARC 347 fields, one for each digital format.
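For anyone who hasn't met the 347 field, here's a small sketch of what "one 347 per format" can look like. In MARC 21, field 347 holds digital file characteristics, with $a for the file type, $b for the encoding format and $2 for the source of the terms. The values and the spaced-out display style below are illustrative assumptions, not copied from our actual catalogue records.

```python
def marc_347(file_type, encoding_format, source="rda"):
    """Format one MARC 347 (Digital File Characteristics) field as text.

    $a = file type, $b = encoding format, $2 = source of the terms.
    Both indicators are blank, shown here as '##'.
    """
    return f"347 ## $a {file_type} $b {encoding_format} $2 {source}"


def fields_for_formats(formats):
    """Return one 347 line per digital format, in the order given."""
    return [marc_347("text file", fmt) for fmt in formats]


if __name__ == "__main__":
    # One record, two access formats, so two repeated 347 fields.
    for line in fields_for_formats(["PDF", "EPUB"]):
        print(line)
```

Because the field repeats, a single record can describe both the pdf and the epub without needing a separate record for each.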

Eventually, we hope to have all this content (and more) loaded into a fulltext searchable platform like Papers Past, but that’s still looking to be a while away.

We’re hoping to complete a second pilot project during the 2013/2014 year, which will digitise from bound volumes and test a different workflow.

How’s our digitising?

As this is a pilot project, we’re interested in hearing your thoughts and comments. Here are a few questions to get you thinking, but feel free to get in touch with any thoughts or queries. Your feedback will help us design the second pilot run and later work.

  • Do the pdfs need to be in colour? Is the additional file size worth it to get a sense of what the actual book looked like, or is the textual content and portability more important?

  • Have we picked the right formats? Do the ones we’ve picked add to the usability or value of the digitised items?

  • Would this kind of content be better off sitting in a fulltext searchable environment, like Papers Past or AtoJs Online? Are the epubs better as additional content rather than the main event?

  • Do you gather ebooks from a variety of sources, or just use the ecosystems of your chosen platform? Should we try to get these books into the Nook, Kobo, Kindle, and other ecosystems?

If you have any thoughts, comments or ideas on more of this content being digitised please do let us know. Send any feedback or comments to Greig Roulston or Sam Minchin, or leave a comment here.


Download books from the digitisation pilot!

View the pdf versions of these books through the NDHA, or download the epub directly for use with your e-reader.

Adventures in Geyserland: life in New Zealand's thermal regions, including the story of the Tarawera eruption and the destruction of the famous terraces of Rotomahana

Alfred Warbrick. Published by Reed in Dunedin, 1934

Catalogue record

Download this book

Download the epub (979KB) | View the pdf (21MB)


The April report: report of the Royal Commission on Social Policy

Published by the New Zealand Royal Commission on Social Policy in Wellington, 1988

Catalogue record

Download this book

Volume 1: New Zealand Today
Download the epub (12.9MB) | View the pdf (159MB)

Volume 2: Future Directions
Download the epub (5MB) | View the pdf (199MB)

Volume 3, part 1: Future Directions, Associated Papers
Download the epub (3.5MB) | View the pdf (166MB)

Volume 3, part 2: Future Directions, Associated Papers
Download the epub (4.7MB) | View the pdf (188MB)

Volume 4: Social Perspectives
Download the epub (4.2MB) | View the pdf (174MB)


Arthur's Pass and the Otira Gorge

B.E. Baughan. Published by Whitcombe & Tombs in Auckland, 1925

Catalogue record

Download this book

Download the epub (1.6MB) | View the pdf (10MB)


Captain James Cook, R.N.: one hundred and fifty years after

Joseph Carruthers. Published by John Murray in London, 1930

Catalogue record

Download this book

Download the epub (1.6MB) | View the pdf (53MB)


Catalogue of the Echinodermata of New Zealand: with diagnoses of the species

Frederick Wollaston Hutton. Published by James Hughes, Printer, in Wellington, 1872

Catalogue record

Download this book

Download the epub (2.2MB) | View the pdf (33MB)


The conquest of the New Zealand alps

Samuel Turner. Published by T. Fisher Unwin in London, 1922

Catalogue record

Download this book

Download the epub (1.9MB) | View the pdf (61MB)


Fifty years in Maoriland

James Thomas Pinfold. Published by Epworth Press, J. Alfred Sharp in London, 1930

Catalogue record

Download this book

Download the epub (1.8MB) | View the pdf (29MB)


The forest flora of New Zealand

Thomas Kirk. Published by Government Printer in Wellington, 1889

Catalogue record

Download this book

Download the epub (7.2MB) | View the pdf (264MB)


Forests and forestry in New Zealand: a statement prepared for the British Empire Forestry Conference (Australia and New Zealand)

Published by Government Printer in Wellington, 1928

Catalogue record

Download this book

Download the epub (1.8MB) | View the pdf (11MB)


Geology of New Zealand

Patrick Marshall. Published by Government Printer in Wellington, 1912

Catalogue record

Download this book

Download the epub (6.4MB) | View the pdf (40MB)


The geology of New Zealand: an introduction to the historical, structural and economic geology

James Park. Published by Whitcombe & Tombs in Christchurch, 1910

Catalogue record

Download this book

Download the epub (7.1MB) | View the pdf (87MB)


Glimpses of the Australian colonies and New Zealand: a thrilling narrative of the early days: embodying the life-history of Captain William Jackson Barry who arrived in New South Wales in 1829

William Jackson Barry. Published by Brett in Auckland, 1903

Catalogue record

Download this book

Download the epub (5.7MB) | View the pdf (62MB)


A history of New Zealand

Arnold Wilfred Shrimpton. Published by Whitcombe & Tombs in Christchurch, 1930

Catalogue record

Download this book

Download the epub (5MB) | View the pdf (68MB)


The history of the Church Missionary Society in New Zealand

Eugene Stock. Published by the New Zealand Church Missionary Society in Christchurch, 1935

Catalogue record

Download this book

Download the epub (727KB) | View the pdf (12MB)


Journal kept in New Zealand in 1820

Alexander McCrae. Published by the Alexander Turnbull Library in Wellington, 1928

Catalogue record

Download this book

Download the epub (681KB) | View the pdf (10MB)


Maori witchery: native life in New Zealand

Charles Robert Browne. Published by J.M. Dent in London, 1929

Catalogue record

Download this book

Download the epub (569KB) | View the pdf (30MB)


New Zealand – a healthy country: striking facts and records: survey of activities of Department of Health

Published by Government Printer in Wellington, 1925

Catalogue record

Download this book

Download the epub (482KB) | View the pdf (9MB)


Nuclear power generation in New Zealand: report of the Royal Commission of Inquiry

Royal Commission on Nuclear Power Generation in New Zealand. Published by Government Printer in Wellington, 1978

Catalogue record

Download this book

Download the epub (2.2MB) | View the pdf (66MB)


Pioneer engineering: a treatise on the engineering operations connected with the settlement of waste land in new countries

Edward Dobson. Published by Crosby Lockwood and Co. in London, 1877

Catalogue record

Download this book

Download the epub (3MB) | View the pdf (41MB)


Pure gold and rough diamonds: gems from the scrapbook of a travelling watchmaker and jeweller in Otago and Southland

James Ballantyne Hislop. Published by Whitcombe & Tombs in Christchurch, 1943

Catalogue record

Download this book

Download the epub (1MB) | View the pdf (25MB)


Recollections of early New Zealand

Henry Bruce Morton. Published by Whitcombe & Tombs in Auckland, 1925

Catalogue record

Download this book

Download the epub (2.9MB) | View the pdf (34MB)


Report of the Royal Commission on Monetary, Banking, and Credit Systems

Published by Government Printer in Wellington, 1956

Catalogue record

Download this book

Download the epub (5.5MB) | View the pdf (105MB)


Report of the Royal Commission to Inquire into the Crash on Mount Erebus, Antarctica, of a DC10 Aircraft Operated by Air New Zealand Limited

Published by Government Printer in Wellington, 1981

Catalogue record

Download this book

Download the epub (1.3MB) | View the pdf (41MB)


The rise and progress of Australia, Tasmania, and New Zealand: in which will be found a colonial directory, increase and habits of population, tables of revenue and expenditure, commercial growth and present position of each dependency, intellectual, social & moral condition of the people, &c., gathered from authentic sources, official documents, and personal observation in each of the colonies, cities, and provinces enumerated

Daniel Puseley. Published by Saunders & Otley in London, 1857

Catalogue record

Download this book

Download the epub (3.1MB) | View the pdf (67MB)


The rural economy and agriculture of Australia and New Zealand

Robert Wallace. Published by Sampson Low, Marston in London, 1891

Catalogue record

Download this book

Download the epub (7.1MB) | View the pdf (121MB)


Salary and wage fixing procedures in the New Zealand State Services: report of the Royal Commission of Inquiry

Published by A. R. Shearer, Government Printer in Wellington, 1968

Catalogue record

Download this book

Download the epub (1.2MB) | View the pdf (50MB)


Tales of the golden west: being reminiscences of Westland from its settlement by gold-seekers and traders

Waratah. Published by Whitcombe and Tombs in Christchurch, 1906

Catalogue record

Download this book

Download the epub (625KB) | View the pdf (26MB)


The Tarawera eruption, 1886

Ellen Ida Massy. Published by the Proprietor of the "Empire Review" in London, 1903

Catalogue record

Download this book

Download the epub (639KB) | View the pdf (6MB)


Waitangi: ninety-four years after

Thomas Lindsay Buick. Published by Thomas Avery in New Plymouth, 1934

Catalogue record

Download this book

Download the epub (1.5MB) | View the pdf (27MB)


A weird region: New Zealand lakes, terraces, geysers, and volcanoes, with an account of the eruption of Tarawera

Thomson Wilson Leys. Published by New Zealand Newspapers Ltd. in Auckland, 1950

Catalogue record

Download this book

Download the epub (1.7MB) | View the pdf (16MB)


Who are the Maoris?

Alfred Kingcome Newman. Published by Whitcombe & Tombs in Christchurch, 1912

Catalogue record

Download this book

Download the epub (1.3MB) | View the pdf (44MB)


Young New Zealand: a history of the early contact of the Maori race with the European, and of the establishment of a national system of education for both races

Arthur Gordon Butchers. Published by Coulls Somerville Wilkie in Dunedin, 1929

Catalogue record

Download this book

Download the epub (2.7MB) | View the pdf (72MB)


Sarah Ell
28 August 2017 8:55pm

This is genius! Three of those books I really needed for a project I am working on and now I have lovely e-versions of them. Would be good to be able to download the PDF to use offline.

Pete Sime
8 August 2013 1:22pm

Excellent work. I'd like to see more reports from Royal Commissions and other commissions of inquiry (if they won't already be covered by the AtoJs project)

Katherine
30 July 2013 10:40am

Thank you for such an interesting and informative post! You raise some really good questions, I'll be interested to see what you decide and how these discussions influence future book digitization projects.

Russell Tuffery
24 July 2013 12:21pm

Great work. One work that would be good to have online is the historic Mason Inquiry of 1988

Report of the Committee of Inquiry into Procedures Used in Certain Psychiatric Hospitals in Relation to Admission, Discharge or Release on Leave of Certain Classes of Patients. Publisher: [Wellington, N.Z.] : The Committee, [1988]
Description: Book 237 p. : ill. ; 26 cm.

Ramesh Chakrapani
24 July 2013 4:02am

Congrats on the successful digitisation. The epubs render perfectly on my iphone. Kudos.