Markup Policy

This document describes the markup policy for the XML documents of the Digital Locke Project. The markup is laid down technically in the Document Type Definition DLP.dtd, which contains the definitions of the XML tags and the relations between them. The file DLP.dtd consists exactly of the DTD declarations (<!ELEMENT ...>, <!ATTLIST ...> and <!ENTITY ...>) shown in sections 2 and 3 of this document.

Contents

1. Relation to tei.2

2. Manuscript description in <teiHeader> element

3. Manuscript transcription in <text> element

3.1. Non-empty elements (<p>, <add>, <del>, <sic>, <hi>, <note>, <corr>, <unclear>, <xref>, <abbr>)

3.2. Empty elements (<anchor>, <gap>, <milestone>, <lb>, <pb>)

3.3. Critical apparatus (<app>, <lem>, <rdg>)

3.4. Entities

4. Line breaking and indentation conventions

1. Relation to tei.2

This DTD is a subset of TEI P4 (the base tag set for prose, with the additional tag sets linking, transcr and textcrit). As a result, a document that is valid according to DLP.dtd will also be valid according to tei2.dtd, provided that the DOCTYPE declaration contains the following INCLUDE declarations:

<!DOCTYPE TEI.2 PUBLIC "-//TEI P3//DTD Main Document Type//EN" "tei2.dtd" [
   <!ENTITY % TEI.XML      'INCLUDE'>
   <!ENTITY % TEI.prose    'INCLUDE'>
   <!ENTITY % TEI.linking  'INCLUDE'>
   <!ENTITY % TEI.transcr  'INCLUDE'>
   <!ENTITY % TEI.textcrit 'INCLUDE'>
]>

Using a subset of TEI has two advantages. First, it gives us a small, compact set of tagging rules that can be documented specifically with a view to the Locke manuscripts. Second, DLP.dtd describes documents that can be processed by our TEI2TeX tools; therefore, this DTD can be used to validate documents prior to conversion.

2. Manuscript description in <teiHeader> element

Like all TEI documents, a DLP document contains two elements at the top level: <teiHeader> and <text>.

<!ELEMENT TEI.2 (teiHeader,text)>

In our DTD the <teiHeader> element is kept minimal, as all metadata for the transcriptions are stored in a separate database. For the sake of compatibility all elements required by TEI are retained, but the only important element is the <idno> within the <publicationStmt>; it links the DLP document to the metadata database. The other elements are supplied for completeness. Here is a sample of a <teiHeader>:

<teiHeader>
   <fileDesc>
      <titleStmt>
         <title>Of the Conduct of the Understanding</title>
         <author>John Locke</author>
      </titleStmt>
      <publicationStmt>
         <idno type="DLP">1</idno>
      </publicationStmt>
      <sourceDesc>
         <p>Source information is stored in a separate database</p>
      </sourceDesc>
   </fileDesc>
</teiHeader>

This structure is represented in the following element definitions in DLP.dtd.

<!ELEMENT teiHeader (fileDesc)>
<!ELEMENT fileDesc (titleStmt,publicationStmt,sourceDesc)>
<!ELEMENT titleStmt (title,author)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT publicationStmt (idno)>
<!ELEMENT idno (#PCDATA)>
<!ATTLIST idno type CDATA #REQUIRED>
<!ELEMENT sourceDesc (p)>
<!ELEMENT bibl (#PCDATA)>
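Since the <idno> is the only link between a DLP document and the metadata database, a processing tool essentially needs to read that single value. The following Python sketch (standard library only) shows one way to do this; the function name dlp_id is hypothetical and not part of the DLP tools, and the sketch assumes the document is parsed for well-formedness only, without DTD validation.

```python
import xml.etree.ElementTree as ET

def dlp_id(xml_text: str) -> str:
    """Return the value of <idno type="DLP"> in the teiHeader (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    idno = root.find("./teiHeader/fileDesc/publicationStmt/idno")
    if idno is None or idno.get("type") != "DLP":
        raise ValueError('no <idno type="DLP"> found in <publicationStmt>')
    return (idno.text or "").strip()

SAMPLE = """<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Of the Conduct of the Understanding</title>
        <author>John Locke</author>
      </titleStmt>
      <publicationStmt>
        <idno type="DLP">1</idno>
      </publicationStmt>
      <sourceDesc>
        <p>Source information is stored in a separate database</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body><div id="D1"><p>...</p></div></body></text>
</TEI.2>"""

print(dlp_id(SAMPLE))  # prints 1
```

The returned string would then serve as the key for a lookup in the metadata database; how that lookup is done is outside the scope of this sketch.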

3. Manuscript transcription in <text> element

The <text> element contains the <body> element, which contains the various divisions of the manuscript. Each <div> element must have an id attribute containing a unique label by which the division can be identified. This is not required by TEI, but DLP.dtd is much stricter than TEI in various respects, as will become clear below.

<!ELEMENT text (body)>
<!ELEMENT body (div)+>
<!ELEMENT div (p+)>
<!ATTLIST div id ID #REQUIRED>
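A validating parser enforces the id rule, and also the XML rule that no ID value may occur twice in a document (in DLP.dtd this covers the id attributes of both <div> and <anchor>). Python's standard XML library does not validate against a DTD, so for quick checks the same conditions can be approximated by hand. A minimal sketch; the name check_ids is hypothetical:

```python
import xml.etree.ElementTree as ET
from collections import Counter

def check_ids(xml_text: str) -> list[str]:
    """Report <div> elements without an id and ID values that occur more than once."""
    root = ET.fromstring(xml_text)
    problems = []
    for number, div in enumerate(root.iter("div"), start=1):
        if div.get("id") is None:
            problems.append(f"<div> number {number} has no id attribute")
    # In DLP.dtd only <div> and <anchor> carry an id, and both are of type ID,
    # so their values must be unique across the whole document.
    counts = Counter(el.get("id") for el in root.iter() if el.get("id") is not None)
    for value, count in counts.items():
        if count > 1:
            problems.append(f"ID {value!r} occurs {count} times")
    return problems

doc = """<body>
  <div id="D1"><p>first <anchor id="D1"/></p></div>
  <div><p>second</p></div>
</body>"""
print(check_ids(doc))  # reports the missing id and the duplicated value D1
```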

3.1. Non-empty elements

This group contains the elements that can contain text and various other elements. To simplify the definitions of these elements, we introduce the following parameter entity, %higher;.

<!ENTITY % higher '(#PCDATA|add|del|sic|hi|note|abbr|corr|unclear|anchor|gap|milestone|
 pb|lb|app|lem|rdg|xref)*'>

This entity is used in the definitions of the elements <p>, <add>, <del>, <sic>, <hi>, <note>, <corr>, <unclear>, <xref>, <lem> and <rdg>. They are discussed in this section, with the exception of <lem> and <rdg>, which are discussed in section 3.3. The introductory definition of each element is taken directly from the TEI specification. As a result, some of these definitions are more generic than our project requires.
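%higher; is a mixed content model: each of these elements may contain text interleaved with any of the listed elements, in any order and any number. For documents that are checked without a validating parser, the rule can be mirrored as below. The two sets are copied from the declarations in this document; the function name is hypothetical, and only elements whose content model is %higher; are checked.

```python
import xml.etree.ElementTree as ET

# Element names allowed by %higher; (#PCDATA is implicit: text is always allowed).
HIGHER = {"add", "del", "sic", "hi", "note", "abbr", "corr", "unclear", "anchor",
          "gap", "milestone", "pb", "lb", "app", "lem", "rdg", "xref"}

# Elements whose content model in DLP.dtd is %higher;.
USES_HIGHER = {"p", "add", "del", "sic", "hi", "note", "corr", "unclear",
               "xref", "lem", "rdg"}

def content_violations(xml_text: str) -> list[tuple[str, str]]:
    """Return (parent, child) pairs where the child is not allowed by %higher;."""
    root = ET.fromstring(xml_text)
    return [(parent.tag, child.tag)
            for parent in root.iter() if parent.tag in USES_HIGHER
            for child in parent if child.tag not in HIGHER]

print(content_violations(
    '<p>that service it might <add place="supralinear">doe</add> '
    'and was <title>designed</title> for:</p>'))
# [('p', 'title')]
```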

<p> (paragraph) marks paragraphs. This element is always a child element of <div>.

<!ELEMENT p %higher;>

<add> (addition) contains letters, words, or phrases inserted in the text by an author, scribe, annotator, or corrector. This tag is not used for additions by the editor, for which the <corr> tag is intended. The <add> tag may contain a place attribute to indicate that an addition is not inline. If the addition is above the line, the attribute reads place="supralinear". If the addition is above the line but the caret is missing, the attribute reads place="supralinearnocaret". If the addition is in the margin, the attribute reads place="margin". If the addition is on another page, the attribute reads, for example, place="p259". If no place attribute is supplied, the addition is inline. If an addition is written in different ink or in pencil, the rend attribute is used with the value rend="difink" or rend="pencil". Sometimes Locke deletes a word or phrase and then undeletes it by subdotting; we mark this as an <add> with rend="subdotting". In some cases an addition is made in another hand, which can be indicated by the resp attribute.

<!ELEMENT add %higher;>
<!ATTLIST add place CDATA #IMPLIED>
<!ATTLIST add rend (difink|pencil|subdotting) #IMPLIED>
<!ATTLIST add resp CDATA #IMPLIED>

Sample:

that service it might
<add place="supralinear">
   doe
</add>
and was designed for:
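Because place is declared as CDATA, DLP.dtd cannot restrict its values, and a validating parser will accept any string. A small check like the following Python sketch can catch typos. It assumes, hypothetically, that a reference to another page always has the form p followed by digits, as in place="p259"; the function name add_position is likewise illustrative.

```python
import re
import xml.etree.ElementTree as ET

FIXED_PLACES = {"supralinear", "supralinearnocaret", "margin"}
OTHER_PAGE = re.compile(r"p\d+")  # assumption: "p" + page number, e.g. "p259"

def add_position(add: ET.Element) -> str:
    """Describe where an <add> sits according to its place attribute."""
    place = add.get("place")
    if place is None:
        return "inline"  # no place attribute means an inline addition
    if place in FIXED_PLACES:
        return place
    if OTHER_PAGE.fullmatch(place):
        return "on page " + place[1:]
    raise ValueError(f"unexpected place value {place!r}")

for snippet in ('<add place="supralinear">doe</add>',
                '<add place="p259">naturall</add>',
                '<add>and</add>'):
    print(add_position(ET.fromstring(snippet)))
```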

<del> (deletion) contains a letter, word or passage deleted, marked as deleted, or otherwise indicated as superfluous or spurious in the copy text by an author, scribe, annotator, or corrector. This tag is not used for deletions by the editor, for which the <sic> tag is intended. The <del> tag may contain a rend attribute to indicate that a deletion is made by superimposition (rend="superimposition") instead of overstrike. If no attribute is supplied, the deletion is by overstrike. As with the <add> tag, rend="difink" can be used for a deletion in different ink.

<!ELEMENT del %higher;>
<!ATTLIST del rend (superimposition|difink) #IMPLIED>

Sample:

may imagin
<del>
   and
</del>
a vast and almost

<sic> contains text reproduced although apparently incorrect or inaccurate. The editor uses this tag for editorial deletions.

<!ELEMENT sic %higher;>

Sample:

The first is of those <sic>those</sic> who seldom reason at all

For editorial alterations of the text, <sic> is combined with <corr> (described later in this section). For example:

The first is of <sic>that</sic> <corr>those</corr> who seldom reason at all
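The combination of <sic> and <corr> allows a tool to derive the editor's reading from the markup: text inside <sic> is omitted and text inside <corr> is kept. The following Python sketch applies only this one rule (deletions, notes and other elements receive no special treatment) and collapses runs of whitespace; the name editorial_reading is hypothetical.

```python
import xml.etree.ElementTree as ET

def _pieces(el: ET.Element):
    """Yield the text of el in document order, skipping the content of <sic>."""
    if el.tag != "sic":
        yield el.text or ""
        for child in el:
            yield from _pieces(child)
    # The tail follows the closing tag, so it is kept even after a <sic>.
    yield el.tail or ""

def editorial_reading(fragment: str) -> str:
    root = ET.fromstring(fragment)
    return " ".join("".join(_pieces(root)).split())

print(editorial_reading(
    "<p>The first is of <sic>that</sic> <corr>those</corr> who seldom reason at all</p>"))
# The first is of those who seldom reason at all
```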

<hi> (highlighted) marks a word or phrase as graphically distinct from the surrounding text, for reasons concerning which no claim is made. A rend attribute is required. Note that in the transcription the value italic indicates that the text is underlined by the scribe.

<!ELEMENT hi %higher;>
<!ATTLIST hi rend (italic|superscript|underline|bold) #REQUIRED>

Sample:

may call <hi rend="italic">large sound round about sense</hi> have not

<note> contains a note or annotation. <note>s can be divided into four categories, depending on the attributes used.

<!ELEMENT note %higher;>
<!ATTLIST note
   place (margin) #IMPLIED
   n (trans|crit) #IMPLIED
   resp (ed) #IMPLIED
   targetEnd IDREF #IMPLIED>

The first type of note is the marginal note by the scribe. This note has only one attribute: place="margin". For example:

our partiall views
<note place="margin">
   Partial views
</note>
.

The second type of note is philosophical or historical commentary by the editor. This note has only one attribute: resp="ed". For example:

for gold and hid treasure,
<note resp="ed">
   Cf. Prov. 2: 3-5.
</note>
but he that does soe

The third type of note is editorial commentary on an aspect of the transcription that is not covered by the other tags. This is indicated by the resp="ed" and n="trans" attributes. If the note is attached to a point in the text (i.e. the point where the <note> tag is placed), no targetEnd attribute is supplied. If the note refers to a passage, the targetEnd attribute is supplied; its value is the id of the <anchor> that marks the end of the passage. On the use of the <anchor> element see section 3.2. For example:

<note resp="ed" n="trans" targetEnd="A815">
   <hi rend="italic">Par. 99 continues addition to the previous par.</hi>
</note>
...
or deformity to any of their opinions.
<anchor id="A815"/>

The fourth type of note is text-critical commentary, indicated by the resp="ed" and n="crit" attributes. This type of note is related to the critical apparatus, which is covered in section 3.3.
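Because the four types are distinguished only by their attributes, the classification can be written as a small decision function. This Python sketch encodes the combinations described above (DLP.dtd itself cannot express such co-occurrence constraints); the returned labels and the function name are hypothetical, and it assumes, as in the descriptions above, that editorial notes carry no place attribute.

```python
import xml.etree.ElementTree as ET

def note_type(note: ET.Element) -> str:
    """Classify a <note> into one of the four types by its attributes."""
    place, resp, n = note.get("place"), note.get("resp"), note.get("n")
    if place == "margin" and resp is None and n is None:
        return "scribal marginal note"
    if resp == "ed" and place is None:
        if n is None:
            return "editorial commentary"
        if n == "trans":
            # With targetEnd the note covers a passage; without it, a single point.
            if note.get("targetEnd") is not None:
                return "transcription note (passage)"
            return "transcription note (point)"
        if n == "crit":
            return "text-critical note"
    raise ValueError(f"attribute combination not covered by the policy: {note.attrib}")

print(note_type(ET.fromstring('<note place="margin">Partial views</note>')))
print(note_type(ET.fromstring('<note resp="ed">Cf. Prov. 2: 3-5.</note>')))
```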

<corr> (correction) contains the correct form of a passage apparently erroneous in the copy text. The editor uses this tag for editorial additions. There are three types of editorial additions. As it is not always clear to which type an editorial addition belongs, we do not differentiate between them in the markup.

<!ELEMENT corr %higher;>

The first type is the addition of letters or words that are absent from the manuscript through a mistake by the author or scribe. For example:

that come under their consideration <corr>and</corr> can as it were in the twinkleing

The second type is the addition of punctuation that is absent from the manuscript because of differences between seventeenth-century and modern punctuation conventions. The most important instance of this type is the editorial full stop. For example:

frequent and very observable<corr>.</corr>

The third type is the addition of text that was not written out because it is part of a passage that was deleted by the author before the word was completed. For example:

we conclude not right from
<del>
   p<corr>artiall</corr>
</del>
our partiall views

<unclear> contains a word, phrase, or passage which cannot be transcribed with certainty because it is illegible or inaudible in the source. Most often this tag applies to heavy overstrikes which have made the text difficult to read.

<!ELEMENT unclear %higher;>

Sample:

in love with it and
<del>
   <unclear>sincerely</unclear>
</del>
diligently seeking after it

<xref> (extended reference) defines a reference to another location in the current document or in an external document. This tag is used in editorial notes (<note resp="ed">) to refer to other texts. The n attribute contains a document ID that corresponds to the ID in the metadata database, followed by a comma and a section number (n="27,1" in the sample below).

<!ELEMENT xref %higher;>
<!ATTLIST xref n CDATA #IMPLIED>

Sample:

<note resp="ed">
   For Locke on logic see also ‘By this learned art’,
   <xref n="27,1">MS Locke, c.28, fol. 117r</xref>.
</note>

<abbr> (abbreviation) contains an abbreviation of any sort. The expan attribute gives the expansion of the abbreviation. Please note that this tag is currently not in use in our transcriptions.

<!ELEMENT abbr (#PCDATA)>
<!ATTLIST abbr expan CDATA #REQUIRED>

Sample:

as <abbr expan="and">&amp;</abbr> of Sagacity

3.2. Empty elements

In this group we find five empty elements <anchor/>, <gap/>, <milestone/>, <lb/>, and <pb/>, which do not apply to a range of text but to a point in the text.

<anchor> (anchor point) attaches an identifier to a point within a text, whether or not it corresponds with a textual element. This element is used to mark the points in the text to which the targetEnd attribute of <note> and the <app> element refer. IDs are generated automatically and have no logical connection with the text. However, IDs starting with an "A" belong to tags dealing with the transcription of the manuscript text, and IDs starting with a "B" belong to an <app> tag (i.e. the tag dealing with the variant readings of the critical apparatus, as explained in section 3.3).

<!ELEMENT anchor EMPTY>
<!ATTLIST anchor id ID #REQUIRED>
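Because targetEnd is declared as IDREF, a validating parser rejects a targetEnd that names no existing ID. It does not check that the anchor actually follows the note, so that the annotated passage runs forward from the note to the anchor, as in the <note n="trans"> example in section 3.1. The Python sketch below checks both conditions; the function name is hypothetical, and document order is taken to be the order of ElementTree's iter(), which visits elements by their start tags.

```python
import xml.etree.ElementTree as ET

def target_end_problems(xml_text: str) -> list[str]:
    """Check that every note/@targetEnd names an <anchor> that follows the note."""
    root = ET.fromstring(xml_text)
    anchor_pos = {}  # anchor id -> position in document order
    notes = []       # (position, targetEnd value)
    for pos, el in enumerate(root.iter()):
        if el.tag == "anchor":
            anchor_pos[el.get("id")] = pos
        elif el.tag == "note" and el.get("targetEnd") is not None:
            notes.append((pos, el.get("targetEnd")))
    problems = []
    for pos, target in notes:
        if target not in anchor_pos:
            problems.append(f"targetEnd {target!r} matches no <anchor>")
        elif anchor_pos[target] < pos:
            problems.append(f"<anchor id={target!r}> precedes its note")
    return problems

ok = ('<p><note resp="ed" n="trans" targetEnd="A815"><hi rend="italic">Par. 99 '
      'continues addition to the previous par.</hi></note> or deformity to any of '
      'their opinions.<anchor id="A815"/></p>')
print(target_end_problems(ok))  # []
```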

<gap> (omitted material) indicates a point where material has been omitted in a transcription, whether for editorial reasons described in the TEI header, as part of sampling practice, or because the material is illegible or inaudible. This tag requires the extent attribute, which indicates the length of the gap, expressed in (an estimate of) the number of characters. This tag often occurs in combination with the <del> tag.

<!ELEMENT gap EMPTY>
<!ATTLIST gap extent CDATA #REQUIRED>

Sample:

may
<del>
   cl<gap extent="1"/>
</del>
call

<milestone> marks the boundary between sections of a text, as indicated by changes in a standard reference system. This tag has three required attributes: n, ed and unit. The n attribute contains the number of the section or part. The ed attribute contains the edition from which the numbering is taken. The unit attribute contains the level of the numbering; at the moment three levels are provided: "section", "part" and "page".

<!ELEMENT milestone EMPTY>
<!ATTLIST milestone
   n CDATA #REQUIRED
   ed CDATA #REQUIRED
   unit (section|part|page) #REQUIRED>

Sample:

<milestone n="B" ed="Schuurman" unit="part"/>
<milestone n="98" ed="Schuurman" unit="section"/>
<milestone n="3" ed="King" unit="section"/>
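Since milestones of several editions can occur at the same point, a tool that cites the text will usually want them grouped by edition and unit. A hypothetical Python sketch that builds such an index in document order and rejects units not allowed by DLP.dtd:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

UNITS = ("section", "part", "page")  # the values allowed by DLP.dtd

def milestone_index(xml_text: str) -> dict[str, dict[str, list[str]]]:
    """Map edition -> unit -> milestone numbers, in document order."""
    index = defaultdict(lambda: defaultdict(list))
    for ms in ET.fromstring(xml_text).iter("milestone"):
        unit = ms.get("unit")
        if unit not in UNITS:
            raise ValueError(f"unit {unit!r} is not allowed by DLP.dtd")
        index[ms.get("ed")][unit].append(ms.get("n"))
    # Convert the nested defaultdicts to plain dicts for display.
    return {ed: dict(units) for ed, units in index.items()}

SAMPLE = """<p>
<milestone n="B" ed="Schuurman" unit="part"/>
<milestone n="98" ed="Schuurman" unit="section"/>
<milestone n="3" ed="King" unit="section"/>
</p>"""
print(milestone_index(SAMPLE))
# {'Schuurman': {'part': ['B'], 'section': ['98']}, 'King': {'section': ['3']}}
```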

<lb> (line break) marks the start of a new (typographic) line in some edition or version of a text. This element is used when Locke has broken a word across two lines, indicating the word break by a hyphen, and it is unclear to the editor whether the word was intended with or without a fixed hyphen.

<!ELEMENT lb EMPTY>

Sample:

the ordinary drudgery of a day-<lb/>labourer.

<pb> (page break) marks the boundary between one page of a text and the next in a standard reference system. The n attribute contains the page or folio number of the manuscript. If no catchword is present at a page break, the rend="nocatch" attribute is supplied. Sometimes the scribe has omitted a page number; in this case the page number is supplied by the editor, which is indicated by resp="ed".

<!ELEMENT pb EMPTY>
<!ATTLIST pb
   n CDATA #REQUIRED
   rend (nocatch) #IMPLIED
   resp (ed) #IMPLIED>

Sample:

the Actions and discourses of <pb n="58" rend="nocatch"/> mankinde will finde their
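The two optional attributes of <pb> answer two separate questions: is there a catchword, and did the editor supply the page number? The following Python sketch lists the page breaks of a document with both answers; the function name and dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET

def page_breaks(xml_text: str) -> list[dict]:
    """List each <pb> with its page number and the two optional flags."""
    return [
        {
            "page": pb.get("n"),
            # rend="nocatch" is supplied only when the catchword is missing.
            "catchword": pb.get("rend") != "nocatch",
            "number_supplied_by_editor": pb.get("resp") == "ed",
        }
        for pb in ET.fromstring(xml_text).iter("pb")
    ]

print(page_breaks('<p>the Actions and discourses of <pb n="58" rend="nocatch"/> '
                  'mankinde will finde their</p>'))
# [{'page': '58', 'catchword': False, 'number_supplied_by_editor': False}]
```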

3.3. Critical apparatus

For some texts and fragments in our project, multiple sources are available. If these sources differ in their readings of the text, the editor can collate the variants with the <app> tag.

<app> (apparatus entry) contains one entry in a critical apparatus, with an optional lemma and at least one reading. We allow two types of <app> entries: one without and one with the to attribute. The first type must contain the <lem> element; the second type must not. The second type is used when the first type is not applicable for some reason, for example when the lemma is too long or when lemmas overlap. At the moment all <app> elements in the transcriptions of the Digital Locke Project are of the first type, i.e. without the to attribute. The second type may occur in future transcriptions; for now, it is not covered in these guidelines.

<!ELEMENT app (lem?,rdg+)>
<!ATTLIST app to IDREF #IMPLIED>

<lem> (lemma) contains the lemma, or base text, of a textual variation. In other words, this is the text accepted by the editor as the correct reading. Of course, this reading can itself be the result of a transcription containing additions, deletions, etc.; therefore, the %higher; parameter entity is applied. The n attribute can be used for additional information, such as a reference to a page number.

<!ELEMENT lem %higher;>
<!ATTLIST lem
   wit CDATA #REQUIRED
   resp CDATA #IMPLIED
   n CDATA #IMPLIED>

<rdg> (reading) contains a single reading within a textual variation. The required wit attribute contains the source of the variation. A resp attribute may be added to indicate who is responsible for the variant, if this can be determined.

<!ELEMENT rdg %higher;>
<!ATTLIST rdg
   wit CDATA #REQUIRED
   resp CDATA #IMPLIED
   n CDATA #IMPLIED>

Here is an example:

but the rest of that vast
<app>
   <lem>expansum</lem>
   <rdg wit="C28" resp="Locke">expansion</rdg>
</app>
they give up to night and darkeness and so avoid comeing near it.
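An <app> entry of the first type maps directly onto a lemma plus a list of readings. The Python sketch below collects the entries of a document in that form, for instance as input for a typeset apparatus. It handles only <app> entries without the to attribute, flattens any markup inside <lem> and <rdg> to plain text, and uses hypothetical names.

```python
import xml.etree.ElementTree as ET

def text_of(el: ET.Element) -> str:
    """All text inside el (markup flattened), with whitespace collapsed."""
    return " ".join("".join(el.itertext()).split())

def apparatus(xml_text: str) -> list[dict]:
    entries = []
    for app in ET.fromstring(xml_text).iter("app"):
        lem = app.find("lem")
        entries.append({
            "lemma": text_of(lem) if lem is not None else None,
            # One (text, witness, responsibility) triple per <rdg>.
            "readings": [(text_of(rdg), rdg.get("wit"), rdg.get("resp"))
                         for rdg in app.findall("rdg")],
        })
    return entries

doc = """<p>but the rest of that vast
<app>
   <lem>expansum</lem>
   <rdg wit="C28" resp="Locke">expansion</rdg>
</app>
they give up to night and darkeness and so avoid comeing near it.</p>"""
print(apparatus(doc))
# [{'lemma': 'expansum', 'readings': [('expansion', 'C28', 'Locke')]}]
```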

3.4. Entities

At present, a few entities are used in the transcriptions. The &and;, &And; and &et; entities are used for abbreviations. &tmb; (text-to-margin break) indicates the point where an addition continues from the main text area into the margin of the manuscript. &supdot; is a short notation for the editorial full stop, which occurs very often in the transcriptions. On &sp; see section 4.

<!ENTITY and "and">
<!ENTITY And "And">
<!ENTITY et "et">
<!ENTITY sp " ">
<!ENTITY supdot "<corr>.</corr>">
<!ENTITY tmb "|">
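A non-validating parser can only expand these entities if it sees their declarations. Python's standard XML library does not read external DTDs such as DLP.dtd, so one workaround, sketched below as an assumption rather than as part of the DLP tools, is to copy the entity declarations into an internal DTD subset before parsing. The helper name parse_fragment is hypothetical; it wraps a fragment of transcription in a single root element. Note that &supdot; expands to a real <corr> element, not just to a full stop.

```python
import xml.etree.ElementTree as ET

# The entity declarations from DLP.dtd, copied into an internal DTD subset
# so that expat (the parser behind ElementTree) can expand them.
DLP_ENTITIES = """<!ENTITY and "and">
<!ENTITY And "And">
<!ENTITY et "et">
<!ENTITY sp " ">
<!ENTITY supdot "<corr>.</corr>">
<!ENTITY tmb "|">"""

def parse_fragment(body: str, root: str = "p") -> ET.Element:
    """Parse a transcription fragment with the DLP entities available."""
    doc = f"<!DOCTYPE {root} [\n{DLP_ENTITIES}\n]>\n<{root}>{body}</{root}>"
    return ET.fromstring(doc)

p = parse_fragment("frequent &and; very observable&supdot;")
print(ET.tostring(p, encoding="unicode"))
```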

4. Line breaking and indentation conventions

To enhance the readability of the XML documents, we use hard carriage returns and indentation. As the lines between two carriage returns can be rather long, the documents are best viewed with a text editor that can toggle between a non-wrapped layout (in which the structure of the text becomes apparent) and a soft-wrapped layout (in which the entire text is visible). On the Mac platform, for example, the BBEdit text editor provides this functionality. Here is some sample code:

... doe not arise soe much from&sp;
<?B0?>
  their <del>nall</del> <add place="p69">naturall</add>
  <note resp="ed" n="trans">
    <hi rend="italic">abbreviation expanded for copyist</hi>
  </note>
<?E0?>&sp;
faculties as acquired habits. He would be laughed at that should goe&sp;
<?B0?>
  <note resp="ed" n="trans" targetEnd="EA8_40">
    <hi rend="italic">catchword not repeated on p. 70</hi>
  </note>
  about
  <anchor id="EA8_40"/>
<?E0?>&sp;
<pb n="70"/> to make a fine dancer ...

Note that the &sp; entity, which is defined as a space, is used at the end of some lines. This is needed because in our XML documents carriage returns and indentation are not treated as spaces. The processing instructions <?B0?> and <?E0?> also require some comment: they are used to control the TEI2TeX script. More information about this script is available in the TEI to TeX section.
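The effect of this convention can be illustrated with a small Python sketch; it is not the TEI2TeX script. It assumes that a carriage return together with the indentation after it carries no meaning, that <note> and <del> are left out of the running text, and that processing instructions are ignored. Because a parser would turn &sp; into an ordinary space that looks just like indentation, the sketch declares &sp; as the private-use character U+E000 instead and turns it into a space only after the line breaks have been removed. All names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

MARK = "\ue000"                       # temporary stand-in for &sp;
SKIP = {"note", "del"}                # assumption: not part of the running text
LINE_BREAK = re.compile(r"\n[ \t]*")  # a carriage return plus the indentation after it

def _pieces(el: ET.Element):
    """Yield text in document order, skipping the content (not the tail) of SKIP."""
    if el.tag not in SKIP:
        yield el.text or ""
        for child in el:
            yield from _pieces(child)
    yield el.tail or ""

def running_text(body: str) -> str:
    doc = f'<!DOCTYPE p [<!ENTITY sp "&#xE000;">]>\n<p>{body}</p>'
    raw = "".join(_pieces(ET.fromstring(doc)))  # processing instructions are dropped
    text = LINE_BREAK.sub("", raw).replace(MARK, " ")
    return re.sub(r" {2,}", " ", text).strip()

SAMPLE = """doe not arise soe much from&sp;
<?B0?>
  their <del>nall</del> <add place="p69">naturall</add>
  <note resp="ed" n="trans">
    <hi rend="italic">abbreviation expanded for copyist</hi>
  </note>
<?E0?>&sp;
faculties as acquired habits."""
print(running_text(SAMPLE))
```

Applied to the first part of the sample above, the sketch yields "doe not arise soe much from their naturall faculties as acquired habits."; without the two &sp; entities, "from" and "their", and "naturall" and "faculties", would run together.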