Digitizing the Darlington Memorial Library is an ambitious undertaking for the University Library System (ULS), since the scale and complexity of the formats present so many challenges. Although the ULS considered outsourcing the digitization of major components of this collection, the costs were simply too high and too difficult to control. Instead, the ULS made a strategic investment of its own resources to provide the necessary staffing and equipment for the project.
While many library departments are involved in different phases of the project, the primary digitization responsibility falls on the ULS Digital Research Library (DRL). The DRL commenced scanning books from the Darlington library in the summer of 2006 and will continue its digitization activities for some time.
The DRL comprises three librarians, three scanning technicians, and several University of Pittsburgh graduate students enrolled in the School of Information Science. The DRL has allocated 90 hours of scanning time per week across several devices.
After carefully investigating the equipment needed to digitize the variety of material found in the Darlington library, the DRL acquired two large-format scanning devices from the French manufacturer i2S. With the A0 and A1 DigiBooks, the DRL can scan bound books and large-format objects up to 34 x 49 inches using a combination of grayscale and color cameras. The DRL also uses two flatbed scanners (Epson Expression 10000XL and Microtek 9800XL) for prints, negatives, etchings, lithographs, engravings, maps, and similar items that measure up to 11 x 14 inches.
Over the course of the next several years, the DRL will digitize material from the Darlington library that falls into six categories: atlases, books, broadsides, images, manuscripts, and maps. A carefully planned workflow allows the DRL to accommodate a constant flow of material into and out of the department.
Because of the quantity of books it must handle for this project, the DRL created a tracking database to monitor the steps necessary to digitize, inspect, index, and mount the digital versions. Books are a complicated format to represent in an online environment: the workflow for digitizing books consists of eight major divisions of labor and 53 individual steps. While some of these steps require human interaction (e.g., scanning, verifying, and assigning structural divisions), many are automated, such as the creation of derivative page images for online display, the optical character recognition (OCR) process that derives searchable text, and the assembly and indexing of the XML for each book.
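As an illustration of how such a tracking database might enforce the order of work, here is a minimal sketch in Python using SQLite. It is not the DRL's actual system: the table layout and function names are hypothetical, and the four stage names are simply the activities named above, standing in for the 53 steps that are not enumerated here.

```python
import sqlite3

# Hypothetical stage list: the four activities named in the text.
# The real workflow has eight divisions of labor and 53 steps.
STAGES = ["digitize", "inspect", "index", "mount"]


def open_tracker():
    """Create an in-memory tracking database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE progress ("
        " book_id TEXT NOT NULL,"
        " stage TEXT NOT NULL,"
        " PRIMARY KEY (book_id, stage))"
    )
    return conn


def next_stage(conn, book_id):
    """Return the first stage not yet completed, or None when finished."""
    rows = conn.execute(
        "SELECT stage FROM progress WHERE book_id = ?", (book_id,)
    )
    done = {row[0] for row in rows}
    for stage in STAGES:
        if stage not in done:
            return stage
    return None


def complete_stage(conn, book_id, stage):
    """Record a finished stage, refusing to skip earlier ones."""
    expected = next_stage(conn, book_id)
    if stage != expected:
        raise ValueError(f"{book_id}: expected {expected!r}, got {stage!r}")
    conn.execute("INSERT INTO progress VALUES (?, ?)", (book_id, stage))


conn = open_tracker()
complete_stage(conn, "book-001", "digitize")
print(next_stage(conn, "book-001"))  # the stage this book is waiting on
```

Keeping one row per completed stage makes it easy to ask which books are waiting at a given point, which is the kind of question a constant flow of material raises.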
The workflow for image content, such as maps, prints, and broadsides, is far less complicated, since this content is typically represented as one single-sided object associated with one metadata record. While the DRL digitizes this material, archivists and curators within the ULS Archives Service Center handle its description and cataloging.
All material is digitized at 400 ppi in either 8-bit grayscale or 24-bit color, depending on the nature of the material. Digital master files are saved as uncompressed TIFF images; derivatives in the JPEG 2000 format are generated for display.
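These settings have direct storage consequences. The short Python sketch below estimates the pixel data in an uncompressed master file from the physical size of the object. The function name is hypothetical, the example dimensions are the maximum scan sizes mentioned earlier, and the figure ignores TIFF header and metadata overhead, so treat the results as rough lower bounds.

```python
def uncompressed_bytes(width_in, height_in, ppi=400, bits_per_pixel=24):
    """Estimate raw pixel data for a scan, excluding file-format overhead."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px * height_px * bits_per_pixel // 8


# Largest object the large-format scanners accept, in 24-bit color.
print(uncompressed_bytes(34, 49))  # 799680000 bytes, roughly 800 MB

# Largest flatbed item, in 8-bit grayscale.
print(uncompressed_bytes(11, 14, bits_per_pixel=8))  # 24640000 bytes, roughly 25 MB
```

The difference between the two figures suggests why compressed derivatives, rather than the masters, are what gets delivered for online display.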
During the digitization of books and atlases, only the text block is captured. When foldouts are present, they are digitized as well. The field of capture for other material, such as broadsides, prints, etchings, artwork, manuscripts, and maps, extends slightly beyond the object’s physical border to show the entire object.
Metadata supporting the online books and atlases are based on the Text Encoding Initiative (TEI) Document Type Definition. Metadata for the broadsides, images, manuscripts, and maps are based on the Dublin Core and local practices.
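To give a sense of what a Dublin Core description of a single image object can look like, here is a minimal Python sketch that serializes a few elements as XML. Only the element names and the namespace come from Dublin Core. The `record` wrapper, the function name, and all field values are placeholders, and the DRL's local practices are not represented.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Serialize a dict of Dublin Core element names and values as XML."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(record, encoding="unicode")


# Placeholder values only; not taken from the Darlington collection.
xml_text = dublin_core_record({
    "title": "Placeholder map title",
    "type": "Image",
    "format": "image/jp2",
})
print(xml_text)
```

One flat record per object matches the simpler case described above, where a map or print corresponds to one image and one metadata record.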
To serve its collections online, the DRL uses the DLXS suite of digital library middleware developed by the University of Michigan’s Digital Library Production Service. DLXS tools support the indexing, searching, and display of an array of digital content. In particular, the DRL uses the DLXS Text-class and Image-class middleware to serve the Darlington content.