Introduction

The various electronic aspects of our project can be presented schematically as shown below. Central to the diagram are the XML-encoded transcriptions of the manuscripts (using a subset of the TEI encoding guidelines). Two presentations of the data emerge from the TEI-encoded transcriptions: first, the online version for searching and textual analysis; second, the typeset version, to be published by Oxford University Press. The other parts of the diagram are briefly described below.

We chose XML as the vehicle for encoding the manuscripts for several reasons. XML has rapidly become a widely used standard for the logical (as opposed to presentational) markup of texts, which makes it possible to use the same data for different ends. Further, XML is independent of any particular hardware or software system, which helps keep our files accessible well beyond the near future. Most importantly, the DTD developed by the Text Encoding Initiative (TEI) provides a very solid basis on which to build a fine-tuned DTD specific to our project. The TEI consortium has done an impressive job of describing a vast range of structures found in scholarly texts, including the complex structure of critical editions. Its authority is evident from the large number of digitization projects that use its guidelines.

Project-specific DTD

Our encoding rules are defined in DLP.dtd. It uses a subset of the TEI DTD that covers all the elements needed for manuscripts with scribal additions, deletions, marginal notes, etc. It will also provide for a full philosophical and historical apparatus. Our DTD is described in detail in the Markup Policy section. The DTD is used to validate our documents before they are converted for the web or to TeX.
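To illustrate what "a subset of the TEI DTD" means in practice, here is a minimal Python sketch that reports any element in a transcription that falls outside a permitted set. The element list `ALLOWED` and the sample fragment are hypothetical (the real inventory lives in DLP.dtd), and the sample assumes un-namespaced TEI elements. This is only an element-inventory check: unlike real DTD validation it does not check content models, attributes, or element order, and Python's standard library has no DTD validator.

```python
import xml.etree.ElementTree as ET

# Hypothetical subset of TEI element names; the real list is defined in DLP.dtd.
ALLOWED = {"TEI", "text", "body", "p", "add", "del", "note", "hi"}


def undeclared_elements(xml_text: str) -> set[str]:
    """Return the names of elements that are not in the ALLOWED subset.

    Only element names are checked; content models, attributes and
    element order (which a DTD would also enforce) are ignored.
    """
    root = ET.fromstring(xml_text)
    # root.iter() visits the root element itself and all its descendants.
    return {el.tag for el in root.iter() if el.tag not in ALLOWED}


sample = """<TEI><text><body>
<p>The <del>first</del><add place="above">second</add> draft
<note place="margin">a marginal note</note></p>
</body></text></TEI>"""

print(undeclared_elements(sample))  # set()
print(undeclared_elements("<TEI><text><body><app/></body></text></TEI>"))  # {'app'}
```

In the actual workflow this role is played by a validating parser reading DLP.dtd; the sketch only shows the idea of restricting documents to a declared subset.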

TEI to Web Conversion

The presentation of the XML data online is handled by modular, object-oriented middleware, which processes the queries users enter on a web page. Searches are processed by Open Text's flexible PAT search engine. To make the data searchable by PAT, they are converted to Text Class XML, which is closely related to TEI XML; this conversion is carried out with XSLT and Perl. The search engine returns the retrieved data to the middleware, which dynamically transforms them into web pages on the basis of templates and style sheets. Separate views of the normalized and diplomatic forms of the text are produced by a combination of templates and the middleware. More information is available in the TEI to Web Conversion section.
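The difference between the normalized and diplomatic views can be sketched in a few lines of Python. This is not the project's middleware, which works through PAT, Text Class XML and templates; it is a standalone illustration that assumes the transcription marks deletions with `<del>`, additions with `<add>`, and spelling variants with TEI's `<choice>`/`<orig>`/`<reg>`. The bracket conventions for the diplomatic view are hypothetical.

```python
import xml.etree.ElementTree as ET


def render(el: ET.Element, view: str) -> str:
    """Flatten a transcription fragment into plain text for one view.

    "diplomatic": deletions shown in [brackets], additions in {braces},
                  original spellings kept.
    "normalized": deletions dropped, additions inlined,
                  regularized spellings used.
    """
    parts = [el.text or ""]
    for child in el:
        if child.tag == "del":
            if view == "diplomatic":
                parts.append("[" + render(child, view) + "]")
        elif child.tag == "add":
            inner = render(child, view)
            parts.append("{" + inner + "}" if view == "diplomatic" else inner)
        elif child.tag == "choice":
            # Pick the <orig> or <reg> alternative; whitespace between them is ignored.
            wanted = "orig" if view == "diplomatic" else "reg"
            for option in child:
                if option.tag == wanted:
                    parts.append(render(option, view))
        else:
            parts.append(render(child, view))
        parts.append(child.tail or "")  # text that follows the child element
    return "".join(parts)


SAMPLE = ("<p>He <del>wrote</del><add>said</add> it "
          "<choice><orig>shew'd</orig><reg>showed</reg></choice> plainly.</p>")

fragment = ET.fromstring(SAMPLE)
print(render(fragment, "diplomatic"))  # He [wrote]{said} it shew'd plainly.
print(render(fragment, "normalized"))  # He said it showed plainly.
```

The point of the design is that both views are derived from one encoded source, so no second transcription has to be maintained.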

TEI to TeX Conversion

To obtain a printed version, we convert the XML documents to the TeX typesetting language by means of a Perl program. TeX is a flexible typesetting system that can be programmed to handle the design of complex books, such as text-critical editions (with marginal notes, line numbering, and notes automatically keyed to line numbers). Moreover, TeX can meet the highest typographical standards; for example, it provides subtle character kerning, ligatures, old-style figures, and a sophisticated line-breaking algorithm. TeX can also be used as a logical markup language: the control codes supplied in the transcriptions are translated into a graphical result by typesetting routines stored in a separate document. A collection of such routines is called a TeX macro package. In our project we use a dedicated macro package programmed in accordance with the specifications of the Clarendon Locke Edition. More information is available in the TEI to TeX Conversion section.
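The core of such a conversion is mapping each TEI element to a TeX control sequence and escaping characters that TeX treats specially. The project does this in Perl; the Python sketch below only shows the principle. The macro names `\deleted`, `\added` and `\marginnote` are hypothetical stand-ins for whatever the Clarendon macro package actually defines, and attributes such as `place` are ignored. The escapes are chosen to work in plain TeX as well as LaTeX.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from TEI element names to macro-package commands.
MACROS = {"del": "deleted", "add": "added", "note": "marginnote"}

# Characters with special meaning in TeX, with text-mode-safe replacements.
TEX_ESCAPES = {
    "\\": r"$\backslash$",
    "{": r"$\{$",
    "}": r"$\}$",
    "$": r"\$",
    "&": r"\&",
    "#": r"\#",
    "%": r"\%",
    "_": r"\_",
    "^": r"\^{}",
    "~": r"\~{}",
}


def escape(text: str) -> str:
    """Escape TeX special characters one character at a time."""
    return "".join(TEX_ESCAPES.get(ch, ch) for ch in text)


def to_tex(el: ET.Element) -> str:
    """Convert an element's content to TeX, wrapping known tags in macros."""
    parts = [escape(el.text or "")]
    for child in el:
        body = to_tex(child)
        macro = MACROS.get(child.tag)
        # Known elements become \macro{...}; unknown ones pass their text through.
        parts.append(f"\\{macro}{{{body}}}" if macro else body)
        parts.append(escape(child.tail or ""))
    return "".join(parts)


sample = ('<p>Interest at 5% on <del>the</del><add>a</add> sum'
          '<note place="margin">see p. 2</note></p>')
print(to_tex(ET.fromstring(sample)))
# Interest at 5\% on \deleted{the}\added{a} sum\marginnote{see p. 2}
```

Keeping the output as logical macros rather than explicit formatting leaves all layout decisions to the macro package, in line with using TeX as a logical markup language.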

Metadata

All transcriptions in our database are accompanied by metadata. Because of their complex nature, these metadata are not stored in the XML files but in a separate FileMaker Pro database. This database holds various types of information about the manuscripts and the texts they contain, such as a physical description of each manuscript (size, watermarks, stitching, and scribes) and a description of the historical background of the texts and their relation to Locke’s other works. The contents of the FileMaker database are exported and converted by Perl into an XML format that is validated against a project-specific DTD. These XML data can be processed by the middleware and presented on the web. More information is available in the Metadata section.
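The export step can be sketched as follows, assuming the FileMaker records have been exported as comma-separated text. The column names (`ms_id`, `shelfmark`, `size`, `watermark`, `scribe`) and the output element names are hypothetical; the project's actual conversion is written in Perl and its output is checked against the metadata DTD, a step this sketch does not perform.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical metadata fields copied into child elements.
FIELDS = ("shelfmark", "size", "watermark", "scribe")


def records_to_xml(csv_text: str) -> str:
    """Turn exported metadata rows into one <manuscript> element per row."""
    root = ET.Element("manuscripts")
    for row in csv.DictReader(io.StringIO(csv_text)):
        ms = ET.SubElement(root, "manuscript", id=row["ms_id"])
        for field in FIELDS:
            value = (row.get(field) or "").strip()
            if value:  # omit empty fields rather than emit empty elements
                ET.SubElement(ms, field).text = value
    return ET.tostring(root, encoding="unicode")


SAMPLE_CSV = """ms_id,shelfmark,size,watermark,scribe
MS-001,Shelfmark X,200 x 150 mm,,Hand A
"""

print(records_to_xml(SAMPLE_CSV))
```

Generating XML from the database, rather than storing the metadata in the transcription files, keeps a single authoritative copy of the metadata while still letting the middleware treat it as XML.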