Digitisation – experience from the Digitised Legislation Knowledge Store (DTT) project

Authors

  • Ildikó Boros

Abstract

The Digitised Legislation Knowledge Store (DTT) project of the Library of the Hungarian Parliament was implemented between January 4, 2010 and November 30, 2012 as a priority project within the framework of the Electronic Administration Operational Programme (EKOP), supported by the European Union and co-financed by the European Regional Development Fund. This article describes the digitisation workflow. Once the IT infrastructure was in place, the general technical parameters were defined and work began on the physical and logical presentation of the digital documents. This was followed by the workflow for mass uploading: METS/XML was prepared from templates for the various groups of materials, and quality control plans were drawn up using mathematical-statistical methods. Principles for selecting works for digitisation were defined, with categories according to field of science, document type, language, period, etc. Hungarian-language materials were selected from the main collecting areas of the Library (law, history and political science). The size, structure and condition of the selected volumes were recorded on status sheets. Books were described individually, while journals, official gazettes and decisions were described in groups in a conservation status database. To select suitable copies, it was necessary to identify the bibliographic, copy and publishing data in the catalogues; to check, correct and prepare for conversion the records of the books and journals concerned; and to enter their copyright status and collection organisation codes into the database. Metadata were prepared. The digitisation company was chosen through a public procurement process. Because of the value of the selected works, their uniqueness and, in many cases, irreplaceable nature, and for preservation reasons, digitisation took place on site, mostly using a Kirtas KABIS III robotic scanner.
Large volumes and foldout attachments were digitised on a flatbed scanner. Image processing, including cropping and correction, was carried out with the Book Scan Editor (BSE) software. Metadata were assigned to each page of each volume with the GLOBE-Index software. This was followed by optical character recognition (OCR): after image processing the data were transferred into the OCR database, and the OCR Engine automatically created the final two-layer PDF files. The prepared pages were then loaded into the DigiTool software in this format. The first checking step was an automatic check of the TIFF images; the general quality control of the scanning results was then carried out by manual inspection. Two million pages in a total of 5272 volumes were digitised within the project. The documents can be accessed in accordance with the copyright legislation in force. A substantial share of the works (40%) is under copyright protection and can therefore be displayed only on the Library’s computers, for scientific research or private study. Works that are not subject to copyright protection are available to the public on the Internet without restriction. The DTT portal is barrier-free, so visually impaired users can use it fully as well.

Published

2013-05-20

How to Cite

Boros, I. Digitisation – experience from the Digitised Legislation Knowledge Store (DTT) project, Scientific and Technical Information, 60(7), pp. 283–290, 2013.

Section

Articles