Behind the scenes on Matapihi
September 7th, 2007
Matapihi is no more! But everything that was in there is now findable in DigitalNZ – Ed.
We have just completed adding more than 600 new artworks, paintings and drawings to Matapihi. These items come from War Art Online, a new digital collection from Archives New Zealand Te Rua Mahara o te Kāwanatanga.
Matapihi is a collaborative project, initiated at the National Digital Forum, as a window to the rich and varied "digital collections of different New Zealand organisations". The website was the recipient of the Créme de la Créme Innovation award in the 2004 TUANZ (e)-vision Awards.
Getting the metadata
Staff at the National Library have been involved in the software and website build, and work collaboratively with peers in partner organisations to organise the metadata that goes into Matapihi. The metadata standards used for Matapihi are Dublin Core, XML/RDF and XLink.
QA and Pre-processing
Bringing divergent data together from different sources is never a straightforward process. Once we receive the XML metadata, it goes through a quality assessment (QA) process and we hold discussions with the provider (there may be several rounds of these). Then we put it through a custom-made Java pre-processing program, which parses the data and performs the required transformations, including:
- expanding country abbreviations
- expanding language abbreviations
- expanding date ranges
- replacing non-standard Unicode characters with standard Unicode characters
- adding the partner label to each record when it doesn't exist
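To make the list above more concrete, here is a minimal Java sketch of what such transformations might look like. It is not the Matapihi pre-processor: the class name, method names and the lookup table are hypothetical, "expanding" a date range is assumed to mean listing each year it covers, and Unicode clean-up is assumed to mean NFC normalisation via the standard `java.text.Normalizer`. Language abbreviations could be handled with the same lookup approach as countries.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative only: names and mappings are assumptions, not the real pre-processor.
final class RecordTransforms {

    // Hypothetical lookup table; the actual mapping used for Matapihi is not published here.
    private static final Map<String, String> COUNTRIES = Map.of(
            "NZ", "New Zealand",
            "AU", "Australia",
            "GB", "United Kingdom");

    /** Returns the full country name, or the input unchanged when the code is unknown. */
    static String expandCountry(String code) {
        return COUNTRIES.getOrDefault(code.trim().toUpperCase(Locale.ROOT), code);
    }

    /** Expands "1914-1918" into one entry per year; any other value passes through as-is. */
    static List<String> expandDateRange(String value) {
        String trimmed = value.trim();
        if (trimmed.matches("\\d{4}-\\d{4}")) {
            int start = Integer.parseInt(trimmed.substring(0, 4));
            int end = Integer.parseInt(trimmed.substring(5));
            if (start <= end) {
                List<String> years = new ArrayList<>();
                for (int year = start; year <= end; year++) {
                    years.add(String.valueOf(year));
                }
                return years;
            }
        }
        return List.of(trimmed);
    }

    /** Composes base letters and combining marks (e.g. a + macron) into single characters. */
    static String normaliseUnicode(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    /** Supplies the partner label only when the record does not already carry one. */
    static String withPartnerLabel(String existingLabel, String partnerName) {
        return (existingLabel == null || existingLabel.isBlank()) ? partnerName : existingLabel;
    }
}
```

Under these assumptions, `RecordTransforms.expandCountry("nz")` gives "New Zealand", and a title typed with a separate combining macron is normalised to the single precomposed character that browsers display most reliably.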
These transformations make the data more accessible to searchers and ensure it displays properly across all supported browsers. We built the pre-processor over one and a half months to cover the various types of data processing - file, string and XML. The Java side of the program does the string manipulation - extracting and merging records with files, Unicode processing and the like. The XSLT transformations handle manipulation at the XML metadata level. The pre-processor outputs metadata load files of 1000 records each, a size the system can digest.
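The last step, splitting the output into load files of 1000 records, can be sketched as follows. This is an illustration under assumptions rather than the real code: the class and method names are hypothetical, and records are treated as opaque objects instead of XML fragments.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a generic batching helper, not the Matapihi pre-processor.
final class LoadFileBatcher {

    // Batch size taken from the post: 1000 records per load file.
    static final int RECORDS_PER_FILE = 1000;

    /** Splits records into consecutive batches of at most {@code size} items each. */
    static <T> List<List<T>> partition(List<T> records, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < records.size(); from += size) {
            int to = Math.min(from + size, records.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(records.subList(from, to)));
        }
        return batches;
    }
}
```

With 2,500 records and `RECORDS_PER_FILE`, this yields three batches of 1000, 1000 and 500 records, each of which would then be written out as one load file.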
Loading and configuration
Once the metadata files are ready, they are loaded into an institution-specific 'container' in the repository. The container is then linked to an online collection for that particular institution. Finally, all these collections are linked together in one Matapihi collection. Keeping the institution collections separate, however, makes it possible to run an advanced search on a particular institution.
Showcases and Lucky Dip
After the metadata load, the next steps are to include the latest records in the various features we have to make them more accessible. Yes, I'm referring to the Showcases and Lucky Dips.
Showcases are collections defined in our digital repository system. They give users a way to browse items by a particular subject or theme, such as Kiwiana, Self-portraits, Native Birds and many more. Authorised curators for each institution select items that belong to a showcase theme. The selected items are then added to the relevant showcase collection.
Lucky Dip is a custom program, written in Perl, that is designed to bring you a random image from the collections every time you click on the link. Once a partner's data is added to the repository, all of its item numbers are added to this program. Different partners may supply different numbers of items, so the program has a weight value assigned to each partner's collection. This value gives each partner an equal chance of coming up in Lucky Dip, whether they have 300 items or 50,000. We do try to be fair.
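The real Lucky Dip is a Perl program and its weighting scheme is not spelled out here, so the following Java sketch shows just one way to get the same effect. It assumes a two-stage draw: pick a partner uniformly at random, then pick one of that partner's items uniformly. That is equivalent to weighting every item by one over the size of its partner's collection. The class, method and item identifiers are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Illustrative only: one possible weighting scheme, not the Perl program itself.
final class LuckyDip {

    /** Picks a partner uniformly, then an item uniformly from that partner's list. */
    static String pick(Map<String, List<String>> itemsByPartner, Random rng) {
        // Ignore partners that have no items so every draw can succeed.
        List<String> partners = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : itemsByPartner.entrySet()) {
            if (!entry.getValue().isEmpty()) {
                partners.add(entry.getKey());
            }
        }
        if (partners.isEmpty()) {
            throw new IllegalArgumentException("no items to choose from");
        }
        String partner = partners.get(rng.nextInt(partners.size()));
        List<String> items = itemsByPartner.get(partner);
        return items.get(rng.nextInt(items.size()));
    }

    /** Per-item weight under this scheme: a partner's items together always sum to 1.0. */
    static double itemWeight(int partnerItemCount) {
        return 1.0 / partnerItemCount;
    }
}
```

Under this scheme a partner with 300 items and a partner with 50,000 items each come up in about half of all draws, although any individual item from the larger collection appears far less often.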
Other considerations - Tagging
The War Art Online project has done a very nice implementation of tagging. A user can go to any image and add tags relevant to the image contents. Naturally, there have been some queries about tagging in Matapihi. Our current build of Matapihi is based on systems defined in 2004, when tagging wasn't such a common feature. We are in the process of looking at the next generation of platforms that provide this and other related functionality.