“I could have confronted a million monkeys in the Himalayas…”

June 27th, 2012 By maxine

I’m getting a bit sick of all these Dacobra movies and TV shows

“DACOBRA, like Sherlock Holmes, Captain Kettle, Dr. Nikola, and other celebrated names in modern fiction, will become a household name all the world over, and you will do well not to miss the opportunity our columns will afford of making the early acquaintance of this great new character.”

DACOBRA! The name was splashed across the pages of the Saturday supplement to the Auckland Star, promising adventure and something vaguely Eastern. The edition of March 8, 1902, proudly touted its new serial, a sure “sensation in the literary world” and source of “unusual pleasure”.

Advertising Dacobra in the Auckland Star.
A sure hit! Unputdownable! Superlative! Auckland Star, 8 March 1902, Page 3.

Serialised fiction was a staple of both metropolitan and local papers, particularly in the decades around the turn of the century. The stories were the length of hefty novels – often because they were novels, chopped up for regular reading.

The newspapers of New Zealand had hundreds of them, and now so does Papers Past.

From the printing press to your screen

I was curious about what it would take to pull one of these stories out of Papers Past, where the original pages have been scanned and transcribed with OCR, and turn it into something you could read all together.

Download Dacobra, or, The White Priests of Arhiman (1MB, pdf)

Raw text for anyone who wants to try their hand at turning it into an epub (410KB, txt)

Turns out, it still takes a lot. No one’s going to be piping these stories straight to their Kindle any time soon. Even with all the technological underpinnings, the path from A to B requires an awful lot of human intervention.

The trouble with w’s, and other problems

Papers Past isn’t just scanned – it comes with a text equivalent that makes it searchable. That’s been created by running the scans through optical character recognition, or OCR. It does pretty well, but it’s not a perfect process, and in some cases you get very odd results.

Though the sheer variety of mistakes makes it hard to do anything programmatic (like automatically turning every instance of nr:u|=#\j into monkey), there are some patterns that can smooth the path.

Large chunks of this story were having real trouble with w, frequently turning it into Av and producing sentences like “Ave Avish you Avould Awash Avith Avater”. I could try replacing every Av with w, but of course there are entirely legitimate words like avuncular and flavour and moshav.

More piecemeal, but less likely to lead to other problems, was replacing particular words throughout the text. Just swapping out Avould, Avhich, and Avhere with would, which, and where saved me possibly hundreds of edits. What I couldn’t avoid were the hundreds of dots and scratches the OCR process had decided were commas, periods, or semicolons.
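The whole-word approach above can be sketched in a few lines of Python. This is a minimal illustration, not the process actually used on this story: the table OCR_FIXES and the functions fix_ocr_words and naive_fix are hypothetical names, and the table holds only the three misreadings mentioned, assuming “which” came through as “Avhich”. Matching on word boundaries is what keeps legitimate words like flavour and avuncular intact, which a blanket swap does not.

```python
import re

# Hypothetical correction table: whole-word OCR misreadings -> intended words.
# Assumes "which" was read as "Avhich"; extend the table as new cases turn up.
OCR_FIXES = {
    "Avould": "would",
    "Avhich": "which",
    "Avhere": "where",
}

# \b restricts matches to whole words, so "flavour" or "avuncular" never match.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, OCR_FIXES)) + r")\b")


def fix_ocr_words(text: str) -> str:
    """Replace known whole-word OCR misreadings; leave everything else untouched."""
    return _PATTERN.sub(lambda m: OCR_FIXES[m.group(1)], text)


def naive_fix(text: str) -> str:
    """The blanket approach, for contrast: swap every "av" (any case) for "w"."""
    return re.sub("av", "w", text, flags=re.IGNORECASE)


sample = "I Avould ask Avhere the flavour is, Avhich is odd."
print(fix_ocr_words(sample))  # I would ask where the flavour is, which is odd.
print(naive_fix(sample))      # I would ask where the flwour is, which is odd.
```

Even the targeted version needs a human eye: a sentence-initial “Avould” that stood for “Would” comes out lowercase, and every new misreading has to be spotted before it can go in the table.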

On occasion, columns hadn’t been properly separated before the pages were run through transcription. The text became a solid clump of half-sentences running into each other, and it’s really annoying. At a couple of points I found it easier to transcribe the text myself instead of fixing the machine-made version. Robot uprising avoided.

Double columns in Papers Past.
Still reads pretty well, really.

Even more troubling, there’s a whole Saturday supplement (where the story ran in the Star) missing from the database. The collections can only be as good as what’s collected, of course, and apparently that week’s bonus pages never made it to the Library in the first place.

Luckily, the Auckland Star wasn’t as widely available as the Sunday Star-Times of today. The Hawera & Normanby Star therefore had a reason to bring DACOBRA to a whole new audience, and their supplements made it into Papers Past.

Actually, Papers Past is really really good

It’s not all bad news, of course. It’s phenomenal to me that this 110-year-old newspaper is even more accessible to me than it was to its original readers. The processes, software, and flat-out work that have gone into Papers Past blow me away. As someone who was doing research with historic materials only 10 years ago, I’m envious of what the kids today have to play with. For example, you could go in, yank out and consolidate a serialised story, clean it up, and fire it out to the world…

Papers Past sits on a platform called Veridian, which also underlies a lot of full-text newspaper collections out there. Later versions (which we’re eyeing up, and which others like Trove are currently using) include some very nice features that would help with book extraction like mine, such as user corrections.

Papers Past could be an even better resource if users could correct errors they find. Right now, messed-up text isn’t just ugly, it means search results are less useful, and it could be made a whole lot better with the contributions of our amazing and dedicated users. It’d be like Distributed Proofreaders, but with newspapers.

Mr. Maxwell, monkey-shooter

So was it worth it? Is the story all I’d hoped for? Good grief no. It’s actually quite bad. Unsympathetic characters with muddled motivations, a mystery that delivers little interest when revealed, and an utter lack of economy in the writing (yes, I’m one to talk).

Lionel Maxwell’s an unsympathetic character, but I can’t tell if that’s the author’s intent. Aside from the depressingly unsurprising racism, sexism, colonialism, and ignore-the-plebs-ism that come naturally to characters of the day, our sculptor-hero is also arrogant in his art, unreflective in his understanding of the world, and generally a bad friend.

Maybe the plot or world-building could have made up for it. As I made my way through the early chapters, I thought I was in for something almost Lovecraftian: a well-travelled scholarly type finds some truth in those weird old tales in a strange, out-of-the-way part of the world. Well, Scotland. But it just never gets that weird, or that horror-laden.

Even the inherent funniness of monkeys, which pop up throughout the story, is chilled somewhat by the inclination of characters to kill them every chance they get.

Still, please do give it a go, especially since I spent so long cleaning it up for you. Maybe on a quiet Saturday morning, when you’re done with the paper.

Random notes

I made the PDF in Apple’s iBooks Author, before realising it doesn’t export to ePub. Darn it.
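For anyone taking up the epub challenge, here is a minimal sketch of packaging plain-text chapters as an EPUB 3 file using only Python’s standard library. It is an illustration under assumptions, not how the PDF was made: the function build_epub, the OEBPS/ file layout, and the placeholder chapter in the example are hypothetical, and the output should be checked with a validator such as EPUBCheck before it goes near an e-reader. The bare minimum is an uncompressed mimetype entry stored first in the zip, a META-INF/container.xml pointing at the package file, a package file listing metadata, manifest and reading order (spine), and a navigation document.

```python
import html
import uuid
import zipfile
from datetime import datetime, timezone

CONTAINER = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""


def xhtml_page(title: str, body: str) -> str:
    """Wrap body markup (already escaped) in a minimal XHTML document."""
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>{html.escape(title)}</title></head>
<body>{body}</body>
</html>"""


def build_epub(path: str, title: str, chapters: list[tuple[str, str]]) -> None:
    """Hypothetical helper: write (heading, plain text) chapters to an EPUB 3 file."""
    files = {}
    for i, (heading, text) in enumerate(chapters, start=1):
        # Blank lines in the plain text are treated as paragraph breaks.
        paras = "".join(f"<p>{html.escape(p.strip())}</p>"
                        for p in text.split("\n\n") if p.strip())
        files[f"chap{i}.xhtml"] = xhtml_page(heading, f"<h1>{html.escape(heading)}</h1>{paras}")

    # Navigation document: EPUB 3 requires a table of contents marked epub:type="toc".
    toc = "".join(f'<li><a href="chap{i}.xhtml">{html.escape(h)}</a></li>'
                  for i, (h, _) in enumerate(chapters, start=1))
    files["nav.xhtml"] = xhtml_page(title, f'<nav epub:type="toc"><ol>{toc}</ol></nav>')

    manifest = '<item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>'
    manifest += "".join(f'<item id="chap{i}" href="chap{i}.xhtml" media-type="application/xhtml+xml"/>'
                        for i in range(1, len(chapters) + 1))
    spine = "".join(f'<itemref idref="chap{i}"/>' for i in range(1, len(chapters) + 1))
    modified = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    opf = f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="bookid">urn:uuid:{uuid.uuid4()}</dc:identifier>
    <dc:title>{html.escape(title)}</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>{manifest}</manifest>
  <spine>{spine}</spine>
</package>"""

    with zipfile.ZipFile(path, "w") as zf:
        # The mimetype entry must come first and be stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER, compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("OEBPS/content.opf", opf, compress_type=zipfile.ZIP_DEFLATED)
        for name, content in files.items():
            zf.writestr(f"OEBPS/{name}", content, compress_type=zipfile.ZIP_DEFLATED)


# Placeholder chapter; in practice the raw text would be split into chapters first.
build_epub("dacobra.epub", "Dacobra, or, The White Priests of Arhiman",
           [("Chapter I", "First paragraph.\n\nSecond paragraph.")])
```

Splitting the raw text into chapters is left out on purpose: with OCR-derived text, chapter headings are exactly the kind of thing that needs checking by hand.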

This isn’t a pure transcript – I fixed a lot of typos from the original newspaper. Ideally the text version on Papers Past would be faithful to the source, though.

The photo on the PDF’s cover is of Miss Nance O’Neil, as portrayed in a piece of ephemera in our free download pool.

Rowan Gibbs
19 July 2012 5:05pm

I spend a lot of time searching for and then downloading and saving short stories and longer serials by New Zealand authors in PapersPast. But I seldom even look at the OCR'ed text (let alone spend time correcting it) - as far as I'm concerned that serves only the (very useful) purpose of making the stories findable. I copy the images of the text and paste them (trimmed when necessary) into a Word file. One can then read the story just as it appeared in the paper - in the original font and keeping the original "flavour" (and illustrations where present, e.g. in the NZ Illustrated Magazine). This also avoids a major problem in that the OCR'ed text sometimes OMITS several words, and this can be detected only if the OCR version and the image are carefully compared - a LOT of work. Of course my Word files end up very large, but that is only a problem if emailing - and rather than that they can easily be placed on Flickr or in Dropbox - I have done this when sending stories by Australian authors to researchers in Australia.

PapersPast is a great resource and the history of New Zealand newspaper fiction is still to be written. (I have also, to a lesser extent, searched out poetry by NZ authors in PPast. This I pass on to the New Zealand poetry archive, who print it out, but I have also published two sample booklets on NZ poets which give a biography (derived largely from PPast) and an anthology of their poetry (printed from images taken from PPast): copies of these are in the National Library - Mary Anne Josephine Wall (Mary Anne Josephine Crawford) : a bio-bibliography : with a selection of M.A.J. Wall’s poems, and "A bird of our clime" : Otago’s songstress Marie R. Randle ("Wych Elm") : a bio-bibliography.)

I do agree that an online correctable PPast would be even better - when I find something of interest to me on the Australian or California online newspaper sites I always correct the text.

Rowan Gibbs / Wellington