2016-04-04

It's time for another Islandora Show & Tell! This one comes to us from a milestone repository: the Chinese University of Hong Kong Digital Repository, which is the first known Islandora site in production in Asia (and also a member of the Islandora Foundation!).

Unfortunately, my traditional search for cats doesn't turn up anything in the CUHK repository. But the search feature itself is a showpiece in terms of Islandora implementations, for the way it handles Chinese characters, supporting an impressive set of Chinese book collections. While there are materials in English as well, I found more results for 書 than for book.

This is all-around a flagship site for handling Chinese text in Islandora, with three important customizations: flipping the bookreader direction from left to right, vertical text display of characters, and the implementation of traditional, simplified and variant forms of Chinese characters for searching across all digital collections. They have shared their custom Solr config files on GitHub. The site runs on AWS, starting with an m3.large instance and recently upgrading to m4.xlarge for better performance in browsing and object ingestion.
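To give a feel for what searching across traditional, simplified and variant forms involves, here is a minimal Python sketch. It is not CUHK's implementation (theirs lives in the Solr config files mentioned above); it only assumes the general idea of folding every form of a character to one canonical form before matching. The tiny `FOLD` table and the `fold` and `search` helpers are hypothetical names made up for illustration.

```python
# Hypothetical folding table: maps simplified and variant forms to a single
# canonical (here, traditional) form. A real system needs a far larger table.
FOLD = {
    "书": "書",  # simplified -> traditional
    "国": "國",  # simplified -> traditional
    "羣": "群",  # variant -> common form
    "峯": "峰",  # variant -> common form
}
_TABLE = str.maketrans(FOLD)


def fold(text: str) -> str:
    """Replace every character found in FOLD with its canonical form."""
    return text.translate(_TABLE)


def search(query: str, titles: list[str]) -> list[str]:
    """Return titles containing the query once both sides are folded."""
    folded_query = fold(query)
    return [title for title in titles if folded_query in fold(title)]


titles = ["中國叢書", "中国丛书目录", "Book of Songs"]
matches = search("国", titles)  # matches both Chinese titles
```

Folding both the query and the indexed text means a search in one form can match titles written in another, which is the behaviour described for the repository's search.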

My favourite collection in the repository is the Calligraphy & Paintings collection, which contains beautiful images such as this:



I highly recommend you explore for yourself, especially if you have an interest in preserving Chinese texts as digital objects. CUHK is leading the way. For more information about how they built their collections, I interviewed Dr. Louisa Lam, the head of Research Support and Digital Initiatives at the Chinese University of Hong Kong Library:

What is the primary purpose of your repository? Who is the intended audience?

CUHK Library started its digitization initiatives 20 years ago and has created over 5 million digital objects. Because the early work was done on an ad hoc basis, different systems were tailor-made to the needs of each project, resulting in a myriad of backend and frontend systems that took much time to manage and were difficult to upgrade with new functionality. In order to better support research and learning, and to promote open access and sharing, the Library saw the need to find a new permanent home for this digital content. The main purpose of developing this Digital Repository is to bring all the digital content into a single platform so that users are able to search and browse across all collections in the same system; users are more at ease using a single web interface; the web presence and discoverability of these collections can be enhanced; and more functionality can be developed on the Repository, in particular support for digital scholarship.

Our repository is intended for all researchers and anyone else interested in our collections. The Library supports open access, and by opening up the digital collections, it is hoped that the University academics will find it easier to collaborate with the global research community.

Why did you choose Islandora?

There are both push and pull factors. Push factors: As mentioned above, we encountered difficulties in maintaining the old system, which is based on the proprietary Tamino XML Database. As it has become less popular in recent years, not many technical staff are well versed in this system. Its lack of a frontend system requires us to develop a new interface for each project, which is time-consuming and costly. In view of the long-term sustainability of the Library’s digital initiatives, we needed to explore a new system that would overcome the problems of the old system and also meet international standards and best practices for metadata, discovery and preservation. In particular, open source software that will grow faster and further than a proprietary system is preferred.

Pull factors: Islandora and Hydra were the two final candidates in our selection from a number of open source and commercial systems. We decided to go for Islandora because it meets all our requirements, the most important of which are its multi-language support, especially CJK (Chinese, Japanese & Korean) for our Chinese book collections; its support for digital humanities; the availability of both frontend and backend; and a very strong and active user community that develops new features. In addition, our team has more practical experience in using Drupal, so it is natural for us to select a system that is also Drupal-based. Another consideration is the vendor support offered by discoverygarden (DGI). We have a very small technical team and they are not very familiar with open source systems.

Since we were given a tight timeline to launch the Digital Repository, our focus was on the smooth migration of millions of digital objects. The availability of discoverygarden to support Islandora gives staff strong psychological support in case last-resort help is needed. As it turned out, the services of discoverygarden helped a lot in our development.

Which modules or solution packs are most important to your repository?

The Internet Archive (IA) Book Solution Pack is definitely our choice. The reason is very simple: most of our digital collections are Chinese books, so we rely heavily on the processing and display of book items. discoverygarden has helped enormously to customize the IA reader and enable a different flipping direction for our Chinese collections. The details are in this PPT presented at the Islandora Conference 2015.

What feature of your repository are you most proud of?

Other than the customized IA book reader, we are also proud to have vertical text display in the transcription view (example). Transcription view is not new in Islandora, but a vertical display that fits the traditional writing direction of Chinese characters enriches our user experience a lot and also helps researchers explore the collections. We are also proud to have implemented simplified, traditional and variant Chinese characters to ensure the same Chinese character in different forms can be accurately retrieved.

Another feature that sets the Repository apart from most Islandora sites is that most of our millions of digital objects are in Chinese and in book format. We hope to benefit researchers in this area by opening up these rich Chinese collections.

Who built/developed/designed your repository (i.e., who was on the team)?

Even though we have a very large digital collection pending migration, we only have a small team of staff to manage the repository.

Our Digital Services Librarian, Jeff Liu, assumed the role of Repository Manager. He plans the project, including the budget, interface, functions, and most importantly, the digitisation workflows and the migration of collections and objects into the repository.

We have one web developer, who joined in summer 2014. He familiarized himself with Drupal and deployed our library website, which launched in summer 2015. After that, he dived into Islandora, and he is responsible for all the technical tasks of our repository, including tweaking Drupal, XSLT transformations to fit our local use case, and OS and server management in the AWS Cloud. He is also our main contact with discoverygarden.

With such a small team, we also hire the services of experts from discoverygarden, who know Islandora much better than anyone.

Do you have plans to expand your site in the future?

Our Repository has just launched; many collections are not yet migrated, the largest of which is Chinese rare books. Though we do not have the exact number of objects on hand, as it is growing every day, we expect the migration will take around 1.5 years.

Another significant collection in the migration pipeline is the Electronic Theses and Dissertations Collection, which has around 15,000 records; we are currently testing the Islandora Scholar module. We are also planning to migrate millions of digital objects from the Hong Kong Literature Collection using the Serial Solution Pack. Most of these digital objects come from local literature journals that were scanned as PDFs with embedded text. We believe migrating the collection will revitalize it and also increase traffic to the new repository.

At the same time, we will offer digitization services to faculty who are interested in developing and depositing their digital collections in the Repository. The Repository will also be used to support digital humanities projects if the nature of such projects fits the Repository system.

What is your favourite object in your collection?

Most of the digital objects in the Repository are either books or journal articles. But the Oracle Bones collection is quite different. It is an image collection that contains images of Oracle Bones from the Shang Dynasty (c. 1558 BC - c. 1046 BC) like this one. In the past, we only used a webpage to display the images of the Oracle Bones. But during the migration of this collection, an expert from the Institute of History and Philology of Academia Sinica was invited to preserve the bones and provide metadata for each piece. Then we created MODS using OpenRefine (thanks to the University of Toronto for sharing their methods). The collection has been revitalized in the repository with a new set of metadata, and the OpenSeadragon viewer is also perfect for magnifying the object for a clearer view.
