Difference between revisions of "Transcription"
Latest revision as of 10:31, 23 January 2018
The transcriptions that we make usually have a double purpose:
- they document the state of the text as encountered on the (manuscript or printed) page, thereby facilitating the creation of a diplomatic edition,
- they encode all the decisions made by the editors to create a reading text out of the diplomatic transcription, facilitating the display of a reading text.
Physical structure
The logical structure was discussed under Text structure. The physical structure is usually encoded using so-called milestone (empty) elements: they have no content of their own, but are placed at the points where certain properties of the text change. The elements that we commonly employ are:
- linebreak: <lb>
- page break: <pb>
Linebreaks and page breaks are placed at the beginning of the line or page.
In many of our editions we want to be able to reproduce the physical linebreaks in the ms or printed text, as well as the logical units that are the result of editorial interpretation. Because of that we usually encode <lb/> elements even where the onset of a new paragraph or a new verse line would dictate a new linebreak anyway.
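As a minimal sketch of this practice (the page number and the wording of the lines are invented for illustration), a paragraph at the top of a new page could be encoded with an explicit <lb/> at the start of every physical line, including the first one:

```xml
<!-- hypothetical page number and text; only the placement of pb and lb matters -->
<pb n="2"/>
<p><lb/>First physical line of the paragraph,
<lb/>second physical line of the same paragraph.</p>
```

The <lb/> directly after <p> looks redundant, but it lets a diplomatic display reconstruct the physical lines without having to consult the logical structure.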
Writing process
We use the following elements:
- <add> Addition;
It is a <add>complete</add> text.
- <del> Deletion;
It is <del type="strikethrough">not</del> a complete text.
- <restore> Used to mark an earlier deletion that is undone;
<restore seq="2"><del seq="1">cocktail</del></restore> <del seq="2"><add seq="1">drink</add></del>
- <retrace> A letter or word retraced to clarify the intended word or letter
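By analogy with the examples above, a retraced word might be encoded as follows. This is only a sketch with an invented sentence; individual projects may add further attributes.

```xml
<!-- hypothetical sentence: the word "complete" was written over again by the author -->
It is a <retrace>complete</retrace> text.
```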
Transpositions
Two (or more) pieces of text that have switched position are encoded using the hi:transpose element. The transposed texts are written in their original order. The target attribute indicates where the text fragment is moved to. Example:
<hi:transpose seq="1" xml:id="i1" target="#i2">development</hi:transpose> <hi:transpose seq="1" xml:id="i2" target="#i1">art</hi:transpose>
Rhetorical structure
Opener and closer
The beginning and the ending of, for example, a letter are encoded in <opener> and <closer>.
- In the <opener> common elements are: <address> <date> <salute>, in different orders.
- In the <closer> common elements are: <salute> <signed>, but it is possible to have <address> and <date> in it.
Keep <closer> limited to the really closing elements.
Remarks:
- <p> is not allowed within <opener> and <closer>.
- If the text of a letter starts immediately after the salute, on the same line, we still encode the opener/salute outside of the paragraph.
In this case, the opener ends halfway through the line:
<opener><lb/><salute>Cher ami,</salute></opener>
<p>peux-tu venir me
<lb/>voir demain dans la matinée? … </p>
- In the same way, the closer need not start on a new line.
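The mirror case for the closer could look like the sketch below. The French text and the signature are invented; the point is only that the <closer> begins directly after the paragraph, without an <lb/> of its own, because the salute continues on the last line of the body text.

```xml
<!-- hypothetical letter ending: the salute shares a physical line with the last sentence -->
<p><lb/>Je compte sur toi. </p><closer><salute>Bien à toi,</salute>
<lb/><signed>Jean</signed></closer>
```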
Postscripts
A postscript is not necessarily indicated by P.S. A postscript is any text added as an afterthought after a letter has been signed. A postscript contains one or more paragraphs:
<postscript>
<p>Say hello to your mother.</p>
</postscript>
A letter can have multiple postscripts. Postscripts can be numbered (using the n-attribute) to indicate a logical sequence.
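A sketch of two numbered postscripts, assuming plain integers as values of the n-attribute (the numbering scheme and the second sentence are illustrative, not prescribed here):

```xml
<!-- hypothetical values for n; they record the logical order of the postscripts -->
<postscript n="1">
  <p>Say hello to your mother.</p>
</postscript>
<postscript n="2">
  <p>And to your father as well.</p>
</postscript>
```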
Visual characteristics
- <hi> Highlight (if you need to highlight something and then specify how it appears in the source). The content of the rend-attribute is project-specific, though we try to use the same values for the same phenomena in all projects. If you need multiple values (like underlined and superscript) just enter them separated by a space: <hi rend="underline super">.
<hi rend="underline">This text is underlined</hi>
<hi rend="underline bold">This text is underlined and bold</hi>
- <hi:sepLine> Use for lines (not verselines or manuscript lines but graphical lines) that authors use to separate writing sections. Individual projects may want to further specify the nature of the line using the rend attribute.
Reproduction of the source
- <space> Empty lines, using dim="vertical" and the unit and quantity attributes to indicate the number of lines;
<space dim="vertical" unit="lines" quantity="2"/>
- <unclear> Uncertain reading of the text, for example due to damage or bad handwriting. If there are several possible readings, group the <unclear> elements within a <choice>:
A single <unclear>word</unclear> is hard to read.
A single wor<choice><unclear>d</unclear><unclear>k</unclear></choice> is hard to read.
- <gap> If text is completely illegible and cannot be transcribed at all, the <gap> element is used. The size of the gap can be indicated using the unit and quantity attributes.
A single <gap quantity="1" unit="word"/> is illegible.
- <metamark> Indicates a sign added to aid in reading the text, for instance to indicate where a text is continuing.
- <damage> Damage or text loss;
<damage extent="whole leaf" agent="rubbing"> ... </damage>
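The <metamark> element has no example in the list above. A hedged sketch: TEI allows a function attribute on <metamark>, but the value "continuation" and the arrow sign used here are assumptions chosen for illustration, not project conventions.

```xml
<!-- hypothetical: an arrow drawn by the author to show that the text continues elsewhere -->
The sentence continues overleaf <metamark function="continuation">→</metamark>
```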
Hyphenation and other dashes
<c type="wbh">-</c> is used to encode an end-of-line hyphen. We encode it only where it has been used, not where it should have been used! Other kinds of hyphen receive no special encoding.
If the source contains:
normal-
ly
we encode (do not introduce whitespace!):
normal<c type="wbh">-</c><lb/>ly
If the source contains:
normal
ly
we encode:
normal<lb/>ly
If the source contains:
well-
known brands
we encode:
well-<lb/>known brands
Special characters
- — (em dash): corresponds to Unicode code point U+2014.
In Oxygen, special characters can be entered through the Symbol button (if needed, add the Symbol toolbar), through the Edit menu option Insert from Character Map, or by typing the appropriate character reference, e.g. &#x2014; for the em dash.
Editorial interventions
Incorrect text
An incorrect text can be encoded in <sic>. The corresponding correction is encoded in <corr>. Both elements go into <choice>, as in the example:
<choice><sic type="grammar">Happely</sic><corr>Happily</corr></choice>
The elements <sic> and <corr> can also be used by themselves, but if the editor provides both the incorrect text and the correction, it is advisable to group them with <choice>.
Supplied text
Where the text is believed to be incomplete, editors can supply information:
The last letter is missin<supplied>g</supplied>.
Attributes
Some attributes are available on a number of elements to further qualify the content:
- @rend This is one of the global attributes in TEI, so it is allowed with a lot of elements. It indicates how the element in question was rendered or presented in the source. The allowed values are defined in the schema;
- @place Indicates the location of (e.g.) an addition (above, below, margin, etc) or the closer;
- @type Used to classify or sort elements.
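To illustrate how these attributes combine with the elements described earlier, here is a sketch in which an addition written above the line replaces a struck-through word. The values "above" and "strikethrough" follow the examples on this page; the actual list of allowed values depends on the project schema.

```xml
<!-- hypothetical sentence; attribute values must match those allowed by the project schema -->
It is a <del type="strikethrough">partial</del><add place="above">complete</add> text.
```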
Different source types have their own special characteristics, and the XML elements that encode these qualities of the text will vary to a certain extent per project.