Difference between revisions of "Transcription"

From XML
The transcription, which contains the actual text of the edition, is encoded within a <text> element in the XML/TEI file. The encoding takes into account the structure of the text, its source, any required annotations and all other peculiarities.
The transcriptions that we make usually have a double purpose:
* they document the state of the text as encountered on the (manuscript or printed) page, thereby facilitating the creation of a diplomatic edition,
* they encode all the decisions made by the editors to create a reading text out of the diplomatic transcription, facilitating the display of a reading text.
For the transcription of manuscripts, the aim is to stay as close to the original text as possible. We focus on the production of diplomatic editions: the edition is based on a single source, and the text and all graphic information are displayed in accordance with that source. We do not aim for a typographic imitation of the source, but for a functional reproduction of the text. Staying true to the text means that deviations from standard spelling and grammar are retained, that changes made during writing (immediately or later) are documented, and, where relevant, that the physical structure of the source is reproduced.
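As an illustration of this double purpose, a short fragment might combine a deletion, an addition above the line and a spelling that deviates from the standard, using elements described further down this page. The wording and the misspelling ''tekst'' are hypothetical, and the attribute values follow the examples used elsewhere on this page. A diplomatic view can show the struck-through word, the added word and the spelling as found on the page, whereas a reading text would typically display ''It is a complete text.''
<pre><lb/>It is <del type="strikethrough">not</del> a <add place="above">complete</add>
<lb/><choice><sic>tekst</sic><corr>text</corr></choice>.</pre>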
==Structure of the text==
A text is usually not a ‘flat’ entity, but consists of different layers, each with its own hierarchy. These layers are reproduced within the XML file. The text of the edition is encoded in a <text> element, which contains a mandatory <body> element. In a simple text, the <body> directly contains the paragraphs or sections of the text.
The layers of the text can be encoded with the following elements (supplemented, if so desired, with a @type attribute to specify them further):
* '''&lt;p>''' Paragraph;
* '''<ab>''' Anonymous block (an arbitrary unit of text, without the semantic baggage of a paragraph);
* '''&lt;div>''' Section (a chapter, rubric or another kind of section);
* '''<head>''' Titles of chapters, headings, etc.;
* '''<group>''' To group different texts within the transcription.
==Physical structure==
The logical structure is discussed under [[Text structure]]. The physical structure is usually encoded using so-called milestone (empty) elements: they do not enclose any content, but are placed at the points where certain properties of the text change. The elements that we commonly employ are:
* linebreak: <lb/>
* page break: <pb/>
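A minimal sketch of how the <text> and <body> elements, the layer elements and the <pb/> and <lb/> milestones can combine in a simple text. The page number, the @type value and the wording are illustrative only; the values actually allowed depend on the project schema.
<pre><text>
  <body>
    <div type="chapter">
      <pb n="1"/>
      <head><lb/>Chapter one</head>
      <p><lb/>The first line of the paragraph
        <lb/>continues on a second line.</p>
    </div>
  </body>
</text></pre>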
Linebreaks and page breaks are placed at the beginning of the line or page they mark.
In many of our editions we want to be able to reproduce both the physical linebreaks in the manuscript or printed text and the logical units that result from editorial interpretation. For that reason we usually encode <lb/> elements even where the start of a new paragraph or a new verse line would imply a linebreak anyway.
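The following sketch, with a hypothetical page number and wording, shows both conventions: the <pb/> precedes the first line of the new page, and each paragraph opens with an <lb/> even though the &lt;p> element already implies a new line.
<pre><p><lb/>The last line of one paragraph.</p>
<pb n="2"/>
<p><lb/>A new paragraph begins on the next page
<lb/>and continues on a second line.</p></pre>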
==Writing process and damage==
We use the following elements:
* '''<add>'''          Addition;
<pre>It is a <add>complete</add> text.</pre>
* '''&lt;del>''' Deletion;
<pre>It is <del type="strikethrough">not</del> a complete text.</pre>
* '''<restore>''' Used to mark an earlier deletion that has been undone. In the example below, the writer first replaced ''cocktail'' with ''drink'' (seq="1") and later reversed that change (seq="2");
<pre><restore seq="2"><del seq="1">cocktail</del></restore>
<del seq="2"><add seq="1">drink</add></del></pre>
* '''<retrace>''' A letter or word that has been retraced to clarify the intended word or letter;
* '''<damage>''' Damage or text loss;
<pre><damage extent="whole leaf" agent="rubbing"> ... </damage></pre>
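No example is given above for <retrace>. A minimal one, in the same style as the other examples, might look like this, showing a retraced word and a single retraced letter (both sentences are hypothetical):
<pre>It is a <retrace>complete</retrace> text.
It is a compl<retrace>e</retrace>te text.</pre>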
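Two (or more) pieces of text that have switched position are encoded using the hi:transpose element (note the hi: namespace prefix). The transposed fragments are written in the order in which they appear in the source, and the @target attribute of each fragment points to the @xml:id of the fragment whose position it takes. In the example, ''development'' and ''art'' change places: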
<pre><hi:transpose seq="1" xml:id="i1" target="#i2">development</hi:transpose> <hi:transpose seq="1" xml:id="i2" target="#i1">art</hi:transpose></pre>
==Rhetorical structure==
==Visual characteristics==
* '''<hi>''' Highlighting (use this when you need to mark text as highlighted and specify how it appears in the source). The value of the @rend attribute is project-specific, though we try to use the same values for the same phenomena in all projects. If you need multiple values (such as underlined and superscript), enter them separated by a space: <hi rend="underline super">.
<pre><hi rend="underline">This text is underlined</hi></pre>
<pre><hi rend="underline bold">This text is underlined and bold</hi></pre>
==Reproduction of the source==
* '''<space>'''        Empty lines, using dim="vertical" and the unit and quantity attributes to indicate the number of lines;
<pre><space dim="vertical" unit="lines" quantity="2"/></pre>
* '''<unclear>''' Uncertain reading of the text, for example due to damage;
<pre>A single <unclear>word</unclear> is hard to read.</pre>
* '''<metamark>''' A sign added to aid in reading the text, for instance to indicate where the text continues.
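The <metamark> entry has no example. A sketch of a continuation sign might look like the following; the @function attribute is part of TEI, but the value continuation and the arrow sign are assumptions, since the appropriate values are project-specific.
<pre><lb/>and the story goes on <metamark function="continuation">→</metamark></pre>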
==Editorial interventions==
===Incorrect text===
Incorrect text can be encoded in <sic>, and the corresponding correction in <corr>. Both elements go into a <choice> element, as in this example:
<pre><choice><sic type="grammar">Happely</sic><corr>Happily</corr></choice></pre>
===Supplied text===
Where the text is believed to be incomplete, editors can supply the missing text in a <supplied> element:
<pre>The last letter is missin<supplied>g</supplied>.</pre>
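TEI also offers a @reason attribute on <supplied> to record why text has been supplied. Whether a project records this is not stated here, so the following is only a sketch; the value omitted is an illustrative choice:
<pre>The last letter is missin<supplied reason="omitted">g</supplied>.</pre>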
Some attributes are available on a number of elements to further qualify the content:
* '''@rend'''         This is one of the global attributes in TEI, so it is allowed on almost all elements. It indicates how the element in question was rendered or presented in the source. The allowed values are defined in the schema;
* '''@place'''         Indicates the location of, for example, an addition (above, below, margin, etc.) or the closer;
* '''@type'''          Used to classify or sort elements.  
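A short sketch of these attributes in use. The @place values follow the description above and @rend reuses a value from the <hi> examples; the wording is hypothetical, and the allowed values are in each case defined by the project schema.
<pre>It is a <add place="above">complete</add> text.
<add place="margin" rend="underline">A note added and underlined in the margin.</add>
<div type="chapter"> ... </div></pre>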
Different source types have their own special characteristics, so the XML elements that encode these qualities of the text will vary to some extent per project.
==See also==

Revision as of 15:25, 26 April 2017
