Difference between revisions of "Transcription (Mondrian)"


Latest revision as of 16:12, 26 April 2017

Physical structure

We encode a page break before each page. The page break is represented by a pb-element, which carries an f-attribute (the folio, or sheet), an n-attribute (the page number) and a facs-attribute (a reference to the corresponding <surface> or <zone> element under <facsimile>).

Example:

<pb f="1r" n="1" facs="#z1r-1"/>
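
The facs value in this example points at a zone rather than at a whole surface. A minimal sketch of what the matching facsimile entry might look like, assuming the zone is nested inside the surface for folio 1 recto. The surface id and the coordinate values are placeholders, not taken from an actual letter:

```xml
<facsimile>
  <!-- hypothetical surface for folio 1 recto -->
  <surface n="1" xml:id="s1r">
    <!-- hypothetical zone; the coordinates are placeholders -->
    <zone xml:id="z1r-1" ulx="0" uly="0" lrx="1200" lry="1800"/>
  </surface>
</facsimile>
```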

Paragraphs of text that belong at a certain position in the letter, even though they are written on another page, are placed in their logical, not physical, position. In that case, we repeat the pb-element of the physical page to indicate the writing is continued on this earlier page. (We still need to discuss how this will be shown to the user.)

For example: suppose Mondrian writes a two-page letter (one sheet) and, after finishing the letter, adds ‘BTW, say hello to your wife’ on page 1. We represent that as follows (simplified):

In the facsimile element:

<facsimile>
 <surface n="1" xml:id="s1r">...</surface>
 <surface n="2" xml:id="s1v">...</surface>
</facsimile>

In the text:

<pb f="1r" n="1" facs="#s1r"/>   
(text of page 1)

<pb f="1v" n="2" facs="#s1v"/>   
(text of page 2)

<pb f="1r" n="1" facs="#s1r"/>   
<p>BTW, say hello to your wife</p>

Writing process and damage

To encode different stages of changes, we use @seq, for example when Mondrian deleted a word and then added a new one. @seq assigns a sequence number reflecting the order in which the encoded features carrying this attribute are believed to have occurred.

<del seq="1">yellow</del>
<add seq="2" place="above">red</add>

You can also use seq="0" for immediate deletions (deletions while writing, or Sofortkorrektur).

If a single act of modification requires multiple elements, these elements have the same seq attribute:

Ik <del seq="1">heb</del><add seq="1">lees</add> het boek<del seq="1"> gelezen</del>.

On the del element, we can use the rend attribute to indicate it has been overwritten, either by an add or in a Sofortkorrektur:

<del rend="overwritten">wel</del><add>niet</add>

A text fragment that has been modified is tagged as <seg> (segment), for the purpose of being able to display the multiple states of the text. It is up to the editor to choose meaningful segments. In the above example the sentence might be tagged as <seg>:

<seg>Ik <del seq="1">heb</del><add seq="1">lees</add> het boek<del seq="1"> gelezen</del>.</seg>


If an earlier deletion is restored, we can encode this using <restore>. An example ('cocktail' is replaced by 'drink', which is then deleted while 'cocktail' is restored):

<restore seq="2"><del seq="1">cocktail</del></restore>
<del seq="2"><add seq="1">drink</add></del>

A Sofortkorrektur is not embedded in a seg-element, because there is no need to show the different states of the text. If it is desirable to show the scope of an immediate correction by overwriting, we use del and add (both with seq="0"):

<del seq="0">F</del><add seq="0">V</add>ics

We don't use @seq on retrace.
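
A retraced word is therefore tagged without a sequence number. A minimal sketch, assuming Mondrian went over the word 'boek' a second time to make it legible (the sentence is invented for illustration):

```xml
<!-- retrace carries no seq attribute -->
Ik lees het <retrace>boek</retrace>.
```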

When a text continues in the margin, that does not by itself make the margin text an addition (<add>).

Visual characteristics

If a paragraph is indented, use rend="indent" on the lb-element of its first line (<lb rend="indent"/>).

We use rend="blockletters" for block capitals.
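
A short sketch combining the two conventions above, assuming that block capitals are marked with a hi-element and that every new line is introduced by an lb-element. The text is invented for illustration:

```xml
<p>
  <!-- first line of an indented paragraph -->
  <lb rend="indent"/>I arrived in <hi rend="blockletters">NEW YORK</hi> last week
  <lb/>and have already started painting.
</p>
```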

There is technically no difference between using a rend-attribute on an existing element and using a hi-element with that rend-attribute within that existing element. So

<addrLine rend="underline">New York City</addrLine>

and

<addrLine><hi rend="underline">New York City</hi></addrLine>

are completely equivalent.


Foreign additions in the text

Foreign additions unrelated to the contents of the letter, probably in another hand, we encode as <ab> (anonymous block). ‘Anonymous’ here refers not to the author being unknown, but to this being a block of text not identified as a paragraph, a list or another block-level element. We use the hand attribute to point to the probable writer and describe the hands in the document hands.xml.

For instance (this is an example of a changed address provided by an anonymous person):

<div type="envelope" xml:id="PD">
   <!-- envelope recto -->
   <pb n="envelope-r" xml:id="env-r" facs="#zone-env-r"/>
   <div type="postalData">
      <md:postmark>Paris XIV … </md:postmark>
      <address type="receiver">
         …
      </address>
   </div>
   <ab hand="hands.xml#anon">James Abbott // 20xx Newbold Eve // Bronx</ab>
   …
</div>

See annotations for notes as used to annotate the text.


Envelopes

Encode as <div type="envelope">. See SampleLetterWithEnvelope.xml. The addresses are encoded as divs with type="postalData". The address of the receiver goes on the front (recto) side of the envelope. Code the addresses as <address> with <addrLine>s. Give the <address> a type-attribute (receiver, sender). <addrLine>-elements are preceded by <lb>-elements if they begin on a new line. Short descriptive phrases (‘sent by’, ‘sender’, ‘To’) before the address go into <label>-elements.

An example of an address:

<address type="receiver">
    <lb/><addrLine rend="underline2">M<hi rend="super underline">r</hi>
        Harry Holtzman</addrLine>
    <lb/><addrLine>231 East 60<hi rend="super">th</hi> Street</addrLine>
    <lb/><addrLine rend="underline">New York City</addrLine>
</address>
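
The description also mentions sender addresses and label-elements, which the receiver example does not show. A sketch of how a sender address might look, assuming the label sits directly before the address inside the postalData div. The address lines are placeholders, and the exact position of the label should be checked against the project schema:

```xml
<div type="postalData">
  <!-- short descriptive phrase before the address -->
  <lb/><label>Sender</label>
  <address type="sender">
    <lb/><addrLine>(name of sender)</addrLine>
    <lb/><addrLine>(street and number)</addrLine>
    <lb/><addrLine>(city)</addrLine>
  </address>
</div>
```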

See also