Textual Editing with the TEI Or, Documentary Editing with the TEI

slide-1
SLIDE 1

Textual Editing with the TEI Or, Documentary Editing with the TEI Or, TEI for Text Bearing Objects

Lou Burnard Consulting July 2014

1/78

slide-2
SLIDE 2

What is this thing?

1. It's a text!
2. It's a document!
3. It's a moment in a developing process!

2/78

slide-3
SLIDE 3

Digital Editing with the TEI

Topics we will cover:
- Texts vs. documents
- TEI markup for facsimile editions
- TEI markup for transcription
- TEI ‘genetic’ markup

Topics we won't:
- Marking up an existing collation
- Creating facsimile editions
- Documenting sources with <msDesc>
- Software for managing, creating, or visualising scholarly editions

3/78

slide-4
SLIDE 4

The digital turn

The humanities are all about text:
- (non-digital) books, manuscripts, archival papers ...
- ... as well as other, increasingly digital, cultural manifestations such as sounds, images, blogs, tweets ...

The digital humanities are all about digital technologies and techniques for manipulating such manifestations in an integrated way.

Markup (aka encoding or tagging) is one of the key technologies behind such integration.

4/78

slide-5
SLIDE 5

What does markup do?

It makes explicit to a processor how something should be processed.

Historically, "markup" was what told a typesetter how to deal with a manuscript. Nowadays, it is what tells a computer program how to deal with a stream of textual data.

5/78

slide-6
SLIDE 6

Where is the textual data and where is the markup?

6/78

slide-7
SLIDE 7

Where is the textual data and where is the markup?

7/78

slide-8
SLIDE 8

Which textual data matters most?

- the shape of letters and their layout?
- the presumed creator of the writing?
- the (presumed) intentions of the creator?
- the stories we read into the writing?

It may be helpful to distinguish documents from the texts they embody...

8/78

slide-13
SLIDE 13

Document and text

A "document" is something that exists in the world, which we can digitize.

A "text" is an abstraction, created by or for a community of readers, which we can encode.

9/78

slide-14
SLIDE 14

The document as ‘Text-Bearing Object’

Materia appetit formam ut virum foemina

Traditionally, we distinguish form and content. In the same way, we think of an inscription or a manuscript as the bearer, container, or form instantiating an abstract notion: a text.

But don't forget... digital texts are also TBOs!

10/78

slide-15
SLIDE 15

A word from our sponsor

11/78

slide-16
SLIDE 16

Digital simulacra

Texts are four-dimensional:
- a document has a physical presence with visual aspects, (some of) which may be transferred more or less automatically from one document to another
- a text has linguistic and structural properties which may be transcribed, translated, and transmitted, but only with some human intervention
- a text conveys information about the real world, which may be understood (or not), annotated, or used to generate new texts
- texts and documents usually have associated metadata, documenting what they are, where they came from, their history, etc.

Good markup thus has to operate in all of these dimensions.

12/78

slide-17
SLIDE 17

Ebooks, for example

An ebook provides:
- a surrogate for the appearance of a pre-existing (non-digital) document
- a re-presentation of that document's linguistic and structural content
- annotations explaining the context in which it was originally produced and the ideas it contains

Managing large numbers of such resources requires good descriptions ("metadata") which make possible "intelligent" complex searching and analysis.

Increasingly we want to share and integrate (or mash up) these digital resources in new and unexpected ways.

13/78

slide-18
SLIDE 18

Editorial underpinnings

Textual editing inevitably reflects a theoretical stance about what a text is, or should be. But there are many conflicting theories/traditions about the editing of texts:
- Greg, Bowers, McKerrow, Tanselle ...
- Greetham, McGann, Shillingsburg ...
- historisch-kritische Ausgabe (aka ‘The Germans’)
- l'édition génétique (aka ‘The French’)

As facilitator of multiple theories, the TEI tries to avoid a theoretical stance, but rarely succeeds ...

14/78

slide-19
SLIDE 19

Old Skool Textual Criticism

This sort of thing...

A complex print format containing information whose structure it might be useful to encode... cf. dictionaries.

15/78

slide-20
SLIDE 20

Looking closer at a simple example

The following line from Hamlet might be printed as:

LAERTES. Alas, then she is drowned.

together with the following critical apparatus:

4.7.156 Alas, then is she drowned.] HIBBARD; Alas then, is she drown'd? F; Alas then is she drownd. Q3; Alas, then, she is drownd. Q2; So, she is drownde: Q1.

16/78

slide-21
SLIDE 21

Critical Apparatus: <app>, <rdg>, and <lem>

- <app> (apparatus entry) contains one entry in a critical apparatus, with an optional lemma and at least one reading.
- <rdg> (reading) contains a single reading within a textual variation.
- <lem> (lemma) contains the lemma, or base text, of a textual variation.

17/78

slide-22
SLIDE 22

For example ...

<app>
  <lem>Alas, then she is drowned.</lem>
  <rdg wit="#Hib">Alas, then is she drowned.</rdg>
  <rdg wit="#F">Alas then, is she drown'd?</rdg>
  <rdg wit="#Q3">Alas then is she drownd.</rdg>
  <rdg wit="#Q2">Alas, then, she is drownd.</rdg>
  <rdg wit="#Q1">So, she is drownde:</rdg>
</app>
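As an illustration of what a program might do with such an apparatus entry, here is a minimal Python sketch using only the standard library. The function name `readings_by_witness` and the choice to store the lemma under the key `lem` are assumptions made for this example, not part of the TEI.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# The apparatus entry from the slide, with the TEI namespace declared.
APP = """
<app xmlns="http://www.tei-c.org/ns/1.0">
  <lem>Alas, then she is drowned.</lem>
  <rdg wit="#Hib">Alas, then is she drowned.</rdg>
  <rdg wit="#F">Alas then, is she drown'd?</rdg>
  <rdg wit="#Q3">Alas then is she drownd.</rdg>
  <rdg wit="#Q2">Alas, then, she is drownd.</rdg>
  <rdg wit="#Q1">So, she is drownde:</rdg>
</app>
"""


def readings_by_witness(app_xml):
    """Map each witness siglum to its reading (hypothetical helper).

    The lemma, if present, is stored under the key 'lem'.
    """
    app = ET.fromstring(app_xml)
    table = {"lem": app.findtext("tei:lem", namespaces=TEI_NS)}
    for rdg in app.findall("tei:rdg", TEI_NS):
        # @wit may point to several witnesses, separated by whitespace.
        for pointer in rdg.get("wit", "").split():
            table[pointer.lstrip("#")] = rdg.text
    return table


readings = readings_by_witness(APP)
print(readings["Q1"])
```

A fuller treatment would resolve each `#...` pointer against a witness list in the header; this sketch simply strips the `#`.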

18/78

slide-23
SLIDE 23

Modelling textual variation

Schmidt's model of ‘multiversion documents’: http://multiversiondocs.blogspot.com

Not unlike Sperberg-McQueen's Rhine-delta model from 1989 (http://cmsmcq.com/1989/rhine-delta-abstract.html), this probably provides a better data structure for representing the results of automatic collation.

But in fact it seems that people don't care that much about pre-existing collations. They want to make their own, sharing outputs from collation engines such as Juxta.

19/78

slide-24
SLIDE 24

Transcription of primary sources using the TEI

- <text>: contains a structured reading of a document's intellectual content ... its ‘text’
- <facsimile>: organizes a set of page images representing a document
- <sourceDoc>: contains a structured representation of a document considered purely as a physical object
- <teiHeader>: provides metadata for the whole thing, at various levels, notably including a <msDesc>

A <TEI> element contains at least a <teiHeader>, followed by as many of the others as you wish to encode.

20/78

slide-25
SLIDE 25

A digital facsimile edition

In the simplest case, we just want to organize a series of page image files so that an application will display them correctly.

<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <!-- metadata describing our digital edition -->
  </teiHeader>
  <facsimile>
    <graphic url="page1r.png"/>
    <graphic url="page1v.png"/>
    <graphic url="page2r.png"/>
    <graphic url="page2v.png"/>
  </facsimile>
</TEI>

This method lacks structure...

21/78

slide-26
SLIDE 26

Structuring a digital facsimile

- we might have several graphics for the same component surface
- we might want to indicate groupings of surfaces, e.g. leaves of a codex

<facsimile>
  <surfaceGrp type="leaf">
    <surface>
      <graphic url="page1r.png"/>
      <graphic url="page1r.tiff"/>
    </surface>
    <surface>
      <graphic url="page1v.png"/>
    </surface>
  </surfaceGrp>
</facsimile>

22/78

slide-27
SLIDE 27

Structuring a digital facsimile

We might also want to distinguish zones, rectangular or otherwise, within a surface.
- A <zone> is any polygon identified within a surface
- Its location may be indicated by means of the @points attribute
- Points defining a zone must use the coordinate system defined for the surface
- A coordinate system defines a range of values for the (x,y) point-pairs defining a two-dimensional polygon: it is not a measurement

<facsimile>
  <surface ulx="0" uly="0" lrx="40" lry="30">
    <graphic url="page1r.png"/>
    <zone points="22,10 30,21 17,25 12,23">
      <graphic url="page1rdetail.png"/>
    </zone>
  </surface>
</facsimile>
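To make the coordinate rule concrete, the following Python sketch parses a @points value into (x, y) pairs and checks that each one falls inside the surface's coordinate space. The helper names are hypothetical, and treating the surface bounds as inclusive is an assumption of this example.

```python
def parse_points(points):
    """Turn a @points value such as '22,10 30,21' into a list of (x, y) tuples."""
    return [tuple(float(n) for n in pair.split(",")) for pair in points.split()]


def bounding_box(points):
    """Smallest rectangle (ulx, uly, lrx, lry) enclosing the polygon."""
    xs, ys = zip(*parse_points(points))
    return min(xs), min(ys), max(xs), max(ys)


def inside_surface(points, ulx, uly, lrx, lry):
    """True if every point lies within the surface's coordinate space."""
    return all(ulx <= x <= lrx and uly <= y <= lry
               for x, y in parse_points(points))


# Values taken from the <zone> and <surface> in the example above.
ZONE_POINTS = "22,10 30,21 17,25 12,23"
print(bounding_box(ZONE_POINTS))
print(inside_surface(ZONE_POINTS, 0, 0, 40, 30))
```

Because the coordinate space is abstract, a viewer would still need to scale these values to the pixel dimensions of whichever image it displays.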

23/78

slide-28
SLIDE 28

Aligning images and transcription

- The @facs attribute, available on any transcriptional element, points to a <zone>, <surface>, or (in the simplest case) a <graphic>
- (And the @start attribute on <zone> or <surface> can also point into a transcription)

<facsimile>
  <surfaceGrp type="leaf">
    <surface xml:id="p1r">
      <graphic url="page1r.png"/>
      <graphic url="page1r.tiff"/>
    </surface>
    <surface xml:id="p1v">
      <graphic url="page1v.png"/>
    </surface>
  </surfaceGrp>
</facsimile>
<text>
  <pb facs="#p1r"/>
  <!-- text from page 1 recto transcribed here -->
  <pb facs="#p1v"/>
  <!-- text from page 1 verso transcribed here -->
</text>

24/78
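A display application has to follow each @facs pointer back to its images. The Python sketch below shows one way this might be done with the standard library. The function name is hypothetical, the <teiHeader> is omitted for brevity, and only local `#id` pointers are handled.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# A cut-down document: the header is left out to keep the sketch short.
DOC = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <facsimile>
    <surface xml:id="p1r">
      <graphic url="page1r.png"/>
      <graphic url="page1r.tiff"/>
    </surface>
    <surface xml:id="p1v">
      <graphic url="page1v.png"/>
    </surface>
  </facsimile>
  <text><body><p><pb facs="#p1r"/>recto text<pb facs="#p1v"/>verso text</p></body></text>
</TEI>
"""


def images_for_page_breaks(tei_xml):
    """List (facs pointer, [image URLs]) for every <pb> in document order."""
    root = ET.fromstring(tei_xml)
    # Index every element that carries an xml:id.
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    result = []
    for pb in root.iter(TEI + "pb"):
        target = by_id.get(pb.get("facs", "").lstrip("#"))
        urls = []
        if target is not None:
            urls = [g.get("url") for g in target.iter(TEI + "graphic")]
        result.append((pb.get("facs"), urls))
    return result


pages = images_for_page_breaks(DOC)
print(pages)
```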

slide-29
SLIDE 29

A more complex example (1)

We can identify several distinct zones in this page:
- the heading
- the ornamental capital
- the picture of a bell
- ...

25/78

slide-30
SLIDE 30

A more complex example (2)

<facsimile>
  <surface xml:id="B49r" ulx="0" uly="0" lrx="32" lry="52">
    <graphic url="bovelles.png"/>
    <zone xml:id="B49rHead" ulx="4" uly="4" lrx="27" lry="13"/>
    <!-- contains the title -->
    <zone xml:id="B49rCap" ulx="4" uly="31" lrx="12" lry="38"/>
    <!-- contains the ornamental capital -->
    <zone xml:id="B49rFig" ulx="10" uly="40" lrx="27" lry="52"/>
    <!-- contains the diagram of a bell -->
  </surface>
</facsimile>

26/78

slide-31
SLIDE 31

And the transcription

<facsimile>
  <surface xml:id="B49r" ulx="0" uly="0" lrx="32" lry="52">
    <graphic url="bovelles.png"/>
    <zone xml:id="B49rHead" ulx="4" uly="4" lrx="27" lry="13"/>
    <!-- contains the title -->
    <zone xml:id="B49rCap" ulx="4" uly="31" lrx="12" lry="38"/>
    <!-- contains the ornamental capital -->
    <zone xml:id="B49rFig" ulx="10" uly="40" lrx="27" lry="52"/>
    <!-- contains the diagram of a bell -->
  </surface>
</facsimile>
<text>
  <body>
    <pb facs="#B49r"/>
    <fw>De Geometrie 159</fw>
    <head facs="#B49rHead">DU SON ET ACCORD DES CLOCHES ET DES ALleures des cheuaux, chariots &amp; charges: des fontaines,&amp; encyclie du monde: &amp; de la dimension du corps humain.</head>
    <head>Chapitre Septiesme.</head>
    <div n="1">
      <p>Le son &amp; accord des cloches pendans en ung mesme axe, est faict en contraires parties.</p>
      <p>
        <g facs="#B49rCap">L</g>Es cloches ont quasi figures de rondes pyramides imperfaictes &amp; irregulieres: &amp; leur accord se fait par reigle Geometrique: comme si les deux cloches C &amp; D sont pendantes à vn mesme axe, ou essieu, A B: je dy que leur accord se fera en contraires parties co<ex>m</ex>me voyez icy figuré. Car quand l'vne sera en haut, l'autre declinera en bas. Autreme<ex>n</ex>t si elles de.<figure facs="#B49rFig"/>
        <pb/>
        <!-- ... -->
      </p>
    </div>
  </body>
</text>

27/78

slide-32
SLIDE 32

Surfaces and zones...

The relationship between surface and zone can be quite complex :

28/78

slide-33
SLIDE 33

Multi-part surfaces

Zones can cross surface and zone boundaries :

29/78

slide-34
SLIDE 34

Transcription : a can of worms

What's going on here?

1. ‘agreable’ is struck through; ‘pleasing’ is written above it, in the interlinear space.
2. ‘agreable’ is deleted and replaced by ‘pleasing’.
3. Originally, the text read ‘agreable’, but at some subsequent stage this word was deleted; the word ‘pleasing’ was added in the same context.

30/78

slide-38
SLIDE 38

Transcription: a special kind of reading

What is the goal of a transcription?
- to make a primary source accessible ...
- ... and comprehensible
- which may imply adding or using much additional information

Hence,
- all transcription is selective
- all transcription is imaginative

31/78

slide-39
SLIDE 39

Making a documentary transcription

- The <sourceDoc> element allows us to represent ‘uninterpreted’ text within a document
- The <sourceDoc> element contains <surface> and <zone> elements, just like a <facsimile> ...
- ... except that its components contain transcribed text as well as (or instead of) images
- A special kind of zone, called <line>, is also available
- along with a small number of neutral tags for obvious metatextual interventions

32/78

slide-40
SLIDE 40

Layers of transcription

- The palaeographic layer: what characters do we see here?
- The documentary or diplomatic layer: what was actually written on the page?
- The editorial or semantic layer: how should it be read?

33/78

slide-41
SLIDE 41

Palaeographic layer

- identify the marks we consider to be letters
- map the letters to an appropriate Unicode character
- decide which non-standard or variant characters we need to preserve

The TEI <g> element is your friend...

34/78

slide-42
SLIDE 42

Documentary transcription of page 5

<surface n="5r">
  <zone>5</zone>
  <line>&amp; demy trebuchans de</line>
  <line>soixante sept pieces au marc</line>
  <line>a ung felin &amp; demy de remede</line>
  <line>Cours chacune piece po&#xFFFD;</line>
  <line>cinquāte soubz tourois.</line>
  <line>Sur le pris de quinze livres </line>
  <line>tourois le marc dargent le roy </line>
  <line>fust cõtinuee la fabricaciõ</line>
  <line>des testons a dix deniers </line>
  <line>dix huict grains troys quartz</line>
  <line>de fin qui valt unze deniers </line>
  <line>six grains dargent le roy, a</line>
</surface>

35/78

slide-43
SLIDE 43

Source-oriented (physical) transcriptional elements

- <mod>: generic tag for marking any kind of modification in the document, without attributing a specific function to it
- <metaMark>: any kind of written mark intended to determine how the document should be read
- <retrace>: writing which has been rewritten or otherwise fixed
- <undo>, <redo>: written modifications which have been reversed or reinstated respectively
- <transpose> and <transposeGrp>: transposed sequences

36/78

slide-44
SLIDE 44

Example of metamark

<surface>
  <metaMark function="used" rend="line" target="#X2"/>
  <zone xml:id="X2">
    <line>I am that halfgrown <add>angry</add> boy, fallen asleep</line>
    <line>The tears of foolish passion yet undried</line>
    <line>upon my cheeks.</line>
    <!-- ... -->
    <line>I pass through <add>the</add> travels and <del>fortunes</del> of <retrace>thirty</retrace></line>
    <line>years and become old,</line>
    <line>Each in its due order comes and goes,</line>
    <line>And thus a message for me comes.</line>
    <line>The</line>
  </zone>
  <metaMark function="used" target="#X2">Entered - Yes</metaMark>
</surface>

37/78

slide-45
SLIDE 45

Example of retracing

Image from a ms of Peer Gynt, Collin 2869, 4°, I.1.1, the Royal Library of Copenhagen

<line>... Sku<retrace cause="unclear">l</retrace>dren</line>

38/78

slide-46
SLIDE 46

Example of transposition

<line>
  <seg xml:id="ib01">bör</seg>
  <metaMark rend="ul" function="transposition" target="#ib01">2.</metaMark>
  og
  <seg xml:id="ib02">hör</seg>
  <metaMark rend="ul" function="transposition" target="#ib02">1.</metaMark>
</line>
<!-- ... -->
<transpose>
  <ptr target="#ib02"/>
  <ptr target="#ib01"/>
</transpose>

39/78

slide-47
SLIDE 47

Text-oriented (logical) transcriptional elements

Traditional TEI structuring elements (<div>, <head>, <p> etc.)

Various other phenomena which commonly attract editorial attention:
- original layout information
- abbreviations or other arcana
- ‘evident’ errors which invite correction or conjecture
- scribal additions, deletions, substitutions, restorations
- non-standard orthography (etc.) which invites normalisation
- irrelevant or non-transcribable material
- passages which are damaged or illegible

40/78

slide-48
SLIDE 48

Original layout information

Within <text> the logical view is privileged, but the physical view can ‘show through’ as empty milestone elements:
- <gb>: the start of a new gathering or quire
- <pb>: the start of a new page
- <cb>: the start of a new column
- <lb>: the start of a new written line

These are primarily useful to establish a reference system.

The <fw> element can be used to mark ‘paratextual’ features such as running heads, foliation etc.

The <handShift> element can be used to mark changes of hand or writing in a document.

41/78

slide-49
SLIDE 49

Textual transcription of page 5

<p>
  <!-- ... -->
  <pb n="5r"/>
  <fw place="topRight" type="pageNum">5</fw>
  <lb/><expan>et</expan> demy trebuchans de
  <lb/>soixante sept pieces au marc
  <lb/>a ung felin <expan>et</expan> demy de remede
  <lb/>Cours chacune piece <expan>pour</expan>
  <lb/><expan>cinquante</expan> soubz <expan>tournois</expan><pc>.</pc>
</p>
<p>
  <lb/>Sur le pris de quinze livres
  <lb/><expan>tournois</expan> le marc dargent le roy
  <lb/>fust <expan>continuee</expan> la <expan>fabricacion</expan>
  <lb/>des testons a dix deniers
  <lb/>dix huict grains troys quartz
  <lb/>de fin qui <expan>valent</expan> unze deniers
  <lb/>six grains dargent le roy, a
  <!-- ... -->
</p>

42/78

slide-50
SLIDE 50

Abbreviations &c.

In Western MSS, we commonly distinguish:
- Suspensions: the first letter or letters of the word are written, generally followed by a point: for example ‘e.g.’ for ‘exempli gratia’
- Contractions: both first and last letters are written, generally with some mark of abbreviation such as superscript strokes, or points: e.g. ‘Mr.’ for ‘Mister’
- Brevigraphs: special signs such as the Tironian nota used for ‘et’, the letter p with a barred tail used for ‘per’, the letter c with a circumflex used for ‘cum’, etc.
- Superscripts: superscript letters (vowels or consonants) used to indicate various kinds of contraction: e.g. ‘w’ followed by superscript ‘ch’ for ‘which’

Most of the symbols needed are available in Unicode, though not necessarily in all fonts.

43/78

slide-51
SLIDE 51

Abbreviation and Expansion

An abbreviation may be viewed in two different ways:
- as a particular sequence of letters or marks upon the page: thus, a ‘p with a bar through the descender’, a ‘superscript hook’, a ‘macron’
- as another way of representing the letter or letters it is believed to be standing for: thus, ‘per’, ‘re’, ‘n’

44/78

slide-52
SLIDE 52

Two Levels of Encoding Abbreviations

TEI proposes elements for two levels of encoding:
- the whole of an abbreviated word and the whole of its expansion may be marked using <abbr> and <expan> respectively
- abbreviatory signs or characters and the ‘invisible’ characters they imply may be marked using <am> and <ex> respectively

45/78

slide-53
SLIDE 53

A simple example (1)

Editorial strategy may be simply to note that we have expanded the abbreviations:

<p>
  <lb/>Cours chacune piece <expan>pour</expan>
  <lb/><expan>cinquante</expan> soubz <expan>tournois</expan><pc>.</pc>
</p>

46/78

slide-54
SLIDE 54

A simple example (2)

As you noticed, ‘pour’ was actually written ‘po’ followed by an ‘r’ subscript; ‘cinquante’ as ‘cinquāte’ with a macron on the ‘a’ to indicate nasalisation. We could therefore encode as follows:

<p>
  <abbr>po&#xFFFD;</abbr> .... <abbr>cinquāte</abbr>
</p>

... or we could choose one of the following styles:

<p>
  po<am>&#xFFFD;</am> ... or po<ex>u</ex>r
</p>

<abbr>po<am>&#xFFFD;</am></abbr>

<expan>po<ex>u</ex>r</expan>

47/78

slide-55
SLIDE 55

Simple example (3)

And of course TEI permits both cake and the eating of it:

<p>
  po<choice>
    <am>&#xFFFD;</am>
    <ex>ur</ex>
  </choice>
</p>

<choice>
  <abbr>po<am>&#xFFFD;</am></abbr>
  <expan>po<ex>u</ex>r</expan>
</choice>

48/78

slide-56
SLIDE 56

Classifying abbreviations

The @type attribute on <abbr> is a useful way of categorising abbreviations, whether for statistical purposes, or to allow for different types to be rendered differently:

<choice>
  <abbr type="brevigraphe">po<am>&#xFFFD;</am></abbr>
  <expan>po<ex>u</ex>r</expan>
</choice> en <choice>
  <abbr type="suspension">fin<am>.</am></abbr>
  <expan>fin<ex>ir</ex></expan>
</choice>

This encoding might be displayed as: ‘po(u)r en finir’.

As elsewhere, the @resp and @cert attributes can also be used to indicate who is responsible for an expansion, and the degree of certainty attached to it.
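How the two sides of a <choice> get displayed is a rendering decision. The Python sketch below shows one possible approach: it flattens a paragraph to text, keeping either the <abbr> or the <expan> alternative and, as an assumed display convention, wrapping <ex> content in parentheses. The function name `render` is hypothetical, and the sample uses only the ‘fin.’ suspension because the brevigraph glyph did not survive in the slides.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

SNIPPET = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">pour en '
    '<choice>'
    '<abbr type="suspension">fin<am>.</am></abbr>'
    '<expan>fin<ex>ir</ex></expan>'
    '</choice></p>'
)


def render(el, prefer):
    """Flatten el to plain text.

    Within each <choice>, only the child element named `prefer`
    ('abbr' or 'expan') contributes; <ex> content is parenthesised.
    """
    if el.tag == TEI + "choice":
        return "".join(render(c, prefer) for c in el if c.tag == TEI + prefer)
    parts = [el.text or ""]
    for child in el:
        inner = render(child, prefer)
        parts.append(f"({inner})" if child.tag == TEI + "ex" else inner)
        parts.append(child.tail or "")
    return "".join(parts)


para = ET.fromstring(SNIPPET)
diplomatic = render(para, "abbr")
expanded = render(para, "expan")
print(diplomatic)
print(expanded)
```

The same walk could select on @type instead, so that, say, suspensions and brevigraphs are styled differently.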

49/78

slide-57
SLIDE 57

Changes of hand : <handShift>

A special kind of milestone, <handShift>, can be used to mark the beginning of a sequence of text written in a new hand, or the beginning of a scribal stint.

<l>When wolde the cat dwelle in his ynne</l>
<handShift medium="greenish-ink"/>
<l>And if the cattes skynne be slyk <handShift medium="black-ink"/> and gaye</l>

The @medium attribute normalizes features of the writing itself. Similar attributes, @script and @scribe, are also provided.

50/78

slide-58
SLIDE 58

Changes of hand : <handNote>

More precise descriptions of a hand (or a script) may be provided in the header:

<handNotes>
  <handNote xml:id="h1" script="copperplate" medium="brown ink">Carefully written with regular descenders</handNote>
  <handNote xml:id="h2" medium="pencil">Unschooled scrawl</handNote>
</handNotes>

These are then referenced from the transcription, either by means of the @hand attribute on a transcriptional element, or by means of the @new attribute on a <handShift> element.

<zone hand="#h1">.. and that good Order Decency and regular worship may be once more introduced and Established in this Parish according to the Rules and Ceremonies of the Church of England <handShift new="#h2"/> and as under a good Consciencious and sober Curate there would and ought to be</zone>

51/78

slide-59
SLIDE 59

Corrections and emendations

The <sic> element can be used to indicate that the reading of the manuscript is erroneous or nonsensical, while <corr> (correction) can be used to provide what in the editor's opinion is the correct reading:

<sic>relea</sic>

<corr>relicta</corr>

The two may, of course, be combined within a <choice> element:

<choice>
  <sic>relea</sic>
  <corr cert="high">relicta</corr>
  <corr cert="low">relatio</corr>
</choice>
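When a <choice> offers several corrections, a reading text has to pick one. The Python sketch below picks the <corr> with the highest @cert and falls back to the <sic> reading when no correction is offered. The numeric ranking of certainty values and the function name are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Assumed ordering of @cert values; anything else ranks lowest.
CERT_RANK = {"high": 3, "medium": 2, "low": 1}

CHOICE = """
<choice xmlns="http://www.tei-c.org/ns/1.0">
  <sic>relea</sic>
  <corr cert="high">relicta</corr>
  <corr cert="low">relatio</corr>
</choice>
"""


def best_correction(choice):
    """Return the text of the most certain <corr>, or of <sic> if there is none."""
    corrs = choice.findall(TEI + "corr")
    if not corrs:
        return choice.findtext(TEI + "sic")
    return max(corrs, key=lambda c: CERT_RANK.get(c.get("cert"), 0)).text


choice = ET.fromstring(CHOICE)
print(best_correction(choice))
```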

52/78

slide-60
SLIDE 60

Normalization

Source texts rarely use modern orthography. For retrieval and other processing reasons, however, the modernized form may be very useful.
- The <reg> (regularized) element is available to mark a normalized form; the <orig> (original) element to indicate a non-standard spelling.
- These elements can optionally be grouped as alternatives using the <choice> element:

53/78

slide-61
SLIDE 61

Normalisation example

<lb/>dix <choice>
  <orig>huict</orig>
  <reg>huit</reg>
</choice> grains <choice>
  <orig>troys quartz</orig>
  <reg>trois-quart</reg>
</choice>

In this case, a further semantic regularisation is possible:

<lb/>
<measure quantity="18.75" unit="gr">dix <choice>
    <orig>huict</orig>
    <reg>huit</reg>
  </choice> grains <choice>
    <orig>troys quartz</orig>
    <reg>trois-quart</reg>
  </choice>
</measure>

54/78

slide-62
SLIDE 62

Additions, deletions, and substitutions

Alterations made to the text, whether by the scribe or in some later hand, can be encoded using <add> (addition) or <del> (deletion). Where the addition and deletion are regarded as a single substitution, they can be grouped together using the <subst> (substitution) element:
- <add> (addition) or <del> (deletion) are used for evident alterations in the source
- a combined addition and deletion may be marked using <subst> (substitution)

55/78

slide-63
SLIDE 63

(Probably) a substitution

<subst>
  <del>half-</del>
  <add>all</add>
</subst> blind

56/78

slide-64
SLIDE 64

A fuller example

<l>And towards our distant rest began to trudge,</l>
<l>
  <subst>
    <del>Helping the worst amongst us</del>
    <add>Dragging the worst amongt us</add>
  </subst>, who'd no boots
</l>
<l>But limped on, blood-shod. All went lame; <subst>
    <del status="shortEnd">half-</del>
    <add>all</add>
  </subst> blind;</l>
<l>Drunk with fatigue ; deaf even to the hoots</l>
<l>Of tired, outstripped <del>fif</del> five-nines that dropped behind.</l>
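Encoding alterations this way lets a program reconstruct the line before and after revision. The Python sketch below does so for the second line of the example by dropping either every <add> or every <del>. The helper name `reading` is hypothetical, and the approach assumes a single campaign of revision, which real drafts often violate.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

LINE = (
    '<l xmlns="http://www.tei-c.org/ns/1.0">'
    '<subst>'
    '<del>Helping the worst amongst us</del>'
    '<add>Dragging the worst amongt us</add>'
    "</subst>, who'd no boots</l>"
)


def reading(el, drop):
    """Flatten el to text, omitting the content of every element named `drop`.

    The text following a dropped element (its tail) is still kept.
    """
    parts = [el.text or ""]
    for child in el:
        if child.tag != TEI + drop:
            parts.append(reading(child, drop))
        parts.append(child.tail or "")
    return "".join(parts)


line = ET.fromstring(LINE)
before = reading(line, "add")  # state before revision: ignore additions
after = reading(line, "del")   # state after revision: ignore deletions
print(before)
print(after)
```

Note that the ‘after’ state preserves the author's slip ‘amongt’, exactly as transcribed.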

57/78

slide-65
SLIDE 65

Semi-legible writing

Use <unclear> if the document is partly illegible, i.e. it can be read but without perfect confidence. The @reason attribute here states the cause of the uncertainty in transcription.

I <subst>
  <add>might</add>
  <del><unclear reason="overinking" cert="medium" resp="#LDB">should</unclear></del>
</subst>have

58/78

slide-66
SLIDE 66

Supplied and damaged writing

Use the <supplied> element if the transcriber has provided a reading not actually visible in the text, whether because of damage or scribal error: @reason here indicates why the text has been supplied.

…Dragging the worst among<supplied reason="authorialError">s</supplied>t us…

Use the <damage> element to record the existence of physical damage to the document, whether or not the damaged part is readable.

59/78

slide-67
SLIDE 67

A damaged ms

<lb/>IN the bosom <damage group="1">o</damage>f one of those spa<lb n="2"/>cious coves wh<damage group="1">ich inde</damage>nt the eastern
<lb n="3"/>shore of the <damage group="1"><supplied>Hudson, at </supplied></damage>that broad
<lb n="4"/>expansion <damage group="1"><supplied>of the</supplied> r</damage>iver denominated
<lb n="5"/>by the ancient Dutch navigators <del>of
<lb/>those waters</del> the Tappaan Zee, and

60/78

slide-68
SLIDE 68

Lacunae

When what is missing cannot be confidently supplied, the <gap> element should be used. Its @reason attribute explains the reason for the omission, and its @quantity and @unit attributes (or a free-text @extent) indicate its size.

<gap reason="wormhole" quantity="7" unit="mm"/>

I am dr Sr yr <gap reason="illegible" quantity="3" unit="word"/>Sydney Smith

61/78

slide-69
SLIDE 69

Some difficulties

These methods are perfectly adequate in simple cases. They rapidly encounter problems when:
- overlap happens (as it always does)
- the sequence of interventions is important or indeterminate
- the layout and the meaning of the writing are not easily separable

Work-arounds do exist for all of these, of course.

62/78

slide-70
SLIDE 70

Additions and deletions crossing element boundaries

When additions and deletions are not conveniently well-nested within other parts of the structure, we can use spanning techniques.
- The elements <addSpan> and <delSpan> delimit a span of text by pointing mechanisms rather than by enclosing it.
- @spanTo indicates the end of a span initiated by the element bearing this attribute.

<l> .... </l>
<addSpan spanTo="#id4"/>
<l> <!-- an interpolated line --> </l>
<anchor xml:id="id4"/>
<l> .... </l>
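Spans delimited by pointers have to be resolved by the processor. The Python sketch below handles only the simple case shown here, where the <addSpan> and the <anchor> it points to are siblings, and collects the elements lying between them. The wrapping <lg>, the sample line texts, and the function name are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

LINES = """
<lg xmlns="http://www.tei-c.org/ns/1.0">
  <l>first line</l>
  <addSpan spanTo="#id4"/>
  <l>an interpolated line</l>
  <anchor xml:id="id4"/>
  <l>last line</l>
</lg>
"""


def added_siblings(parent):
    """Return the child elements between each <addSpan> and its @spanTo target.

    Assumes the span starts and ends directly under the same parent.
    """
    added, end_id = [], None
    for el in parent:
        if el.tag == TEI + "addSpan":
            end_id = el.get("spanTo", "").lstrip("#")  # a span opens here
        elif end_id is not None and el.get(XML_ID) == end_id:
            end_id = None                              # the span closes here
        elif end_id is not None:
            added.append(el)                           # inside the span
    return added


lg = ET.fromstring(LINES)
added_text = [l.text for l in added_siblings(lg)]
print(added_text)
```

Spans that start and end under different parents need a document-order walk instead, and that is where interpretation becomes less obvious.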

63/78

slide-71
SLIDE 71

Underspecification

The Guidelines are a bit vague on how exactly spanning should be interpreted. Consider

<p>apple <delSpan spanTo="#x"/>banana</p>
<p>cherry<anchor xml:id="x"/>date</p>

Does this mean

<p>apple date</p>

or

<p>apple</p>
<p>date</p>

?

64/78

slide-72
SLIDE 72

Using attributes to clarify who did what when

- The author (WJ) wrote ‘One must have lived...’
- The author added the word ‘But’ before ‘One’
- An editor (FB) corrected ‘One’ to ‘one’

<add place="supra" hand="#WJ" cert="medium">But</add>
<choice>
  <sic>One</sic>
  <corr resp="#FB" cert="high">one</corr>
</choice> must have lived ...
<!-- elsewhere -->
<respStmt xml:id="FB">
  <resp>editorial changes</resp>
  <name>Fredson Bowers</name>
</respStmt>
<respStmt xml:id="WJ">
  <resp>authorial changes</resp>
  <name>William James</name>
</respStmt>

65/78

slide-73
SLIDE 73

Using <restore> to indicate authorial change of mind

- The author writes ‘For I hate this my body’
- The word ‘my’ is deleted
- The author writes ‘stet’ in the margin

The <restore> element can be used to indicate that a deletion has been reversed:

<l>[...] For I hate this <restore hand="#dhl" type="marginalStetNote"><del>my</del></restore> body [...]</l>

... note that we have not encoded the ‘metamark’ ‘stet’, but rather its effect.

66/78

slide-74
SLIDE 74

Recording the genesis of a text

Let us hypothesize that the different colours of ink here are associated with different layers (stages, phases...)

67/78

slide-75
SLIDE 75

Documenting the layers

<profileDesc>
  <creation>
    <listChange ordered="true">
      <change xml:id="ST-1">First layer, in black ink</change>
      <change xml:id="ST-2">Second layer, in red</change>
      <change xml:id="ST-3">Corrections and revisions, in blue</change>
      <change xml:id="ST-4">Deletions and usage notes in green</change>
    </listChange>
  </creation>
</profileDesc>

68/78

slide-76
SLIDE 76

Associating a layer with a part of the transcript

<zone xml:id="zone1" layer="#ST-1">
  <line> 28) le court de tennis. Les tribunes sont ... Deux joueurs</line>
  <!-- ... -->
  <line>30) l’un des joueurs de tennis se tient ... trois</line>
  <line>fois sur le sol</line>
  <zone layer="#ST-2">
    <line>31) </line>
    <line>Vue de face</line>
    <line> à contre jour</line>
    <metaMark function="add"/>
    <line>la vieille dame ... dans le vestibule (contre-jour)</line>
  </zone>
  <line>3<subst layer="#ST-2"><del>1</del><add>2</add></subst>)Le groupe de cavaliers ...</line>
  <!-- ... -->
  <line>... Le mot FIN apparaît sur l’écran.</line>
</zone>
<zone layer="#ST-3">
  <line>Dans ce cas...</line>
</zone>

69/78
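Once zones are tied to layers, a program can separate the transcript into its stages. The Python sketch below groups <line> text by the nearest enclosing layer pointer. It copies the attribute name `layer` from the example above, which is an assumption: check which attribute your version of the TEI schema actually provides for this purpose. The function name and the shortened sample are likewise illustrative.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

SCRIPT = """
<sourceDoc xmlns="http://www.tei-c.org/ns/1.0">
  <surface>
    <zone layer="#ST-1">
      <line>fois sur le sol</line>
      <zone layer="#ST-2">
        <line>Vue de face</line>
      </zone>
    </zone>
    <zone layer="#ST-3">
      <line>Dans ce cas...</line>
    </zone>
  </surface>
</sourceDoc>
"""


def lines_by_layer(el, inherited=None, out=None):
    """Group the text of each <line> under its nearest enclosing layer pointer."""
    out = {} if out is None else out
    # An element without its own pointer inherits the one from its ancestors.
    layer = el.get("layer", inherited)
    if el.tag == TEI + "line":
        out.setdefault(layer, []).append("".join(el.itertext()).strip())
    for child in el:
        lines_by_layer(child, layer, out)
    return out


layers = lines_by_layer(ET.fromstring(SCRIPT))
print(layers)
```

Phrase-level pointers, such as the one on <subst> in the example, would need finer handling than this line-level grouping.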

slide-77
SLIDE 77

How far will the TEI take us ?

In particular, is the TEI scheme adequate for the needs of those transcribing ‘modern’ manuscripts?
- surviving medieval or early modern manuscripts generally have a public function, and a more or less conventionalised (if complex) format
- modern manuscripts or authorial drafts, however, often contain entirely private or idiosyncratic signs, with no clear communicative function

70/78

slide-78
SLIDE 78

For example...

71/78

slide-79
SLIDE 79

Text/Image

At all periods we find ‘playful’ texts whose meaning is conveyed by their documentary appearance as much as by their linguistic properties, or by the interplay between the two. The TEI initially ruled such texts out of scope, for a variety of reasons, not least a dearth of image-processing technologies.

72/78

slide-80
SLIDE 80

Concerns that won't go away

- The process by which a document was created may be as important as its final or canonical textual form
- It may be impossible to talk about the text independently of its documentary instantiation, whether because
  - the meaning of the text is presented entirely or partly graphically
  - the document is deliberately constructed in a non-linear or combinatorial way, in order to generate many ‘texts’

73/78

slide-81
SLIDE 81

Document vs. text

By hypothesis, distinguishing these levels may help our editorial task:
- at the documentary level: pages, surfaces, writing, tears, crossings-out, stains...
- at the textual level: corrections, modifications, additions, deletions, transpositions...

Distinguishing these levels in our encoding is a good way of studying their interaction.

74/78

slide-82
SLIDE 82

Robinsonian provocations

One may contemplate, with equanimity, every complexity of Byzantine medieval military history but be quite defeated by the unfamiliar vocabulary of the mysteriously interconnected universe which is the TEI.
http://www.digitalmedievalist.org/journal/1.1/robinson/#robinson.dm.1.1.0140 (2005)

‘almost without exception, no scholarly electronic edition has presented material which could not have been presented in book form ... most electronic scholarly editions [fail] to use new computer methodologies to explore the texts which they present’
http://computerphilologie.uni-muenchen.de/jg03/robinson.html (2005)

75/78

slide-83
SLIDE 83

So what should we anticipate doing with the editions of the future?

- Better visualisation tools
- Better access mechanisms, for both metadata and data
- Tools for dynamic hypothesis testing
- More use of social networking and crowd sourcing

76/78

slide-84
SLIDE 84

New technical paradigms

Visualisation and analysis at two levels:
- sub-documentary components
- across corpora of documents

locating and presenting patterns of variation

quantitative codicology meets evolutionary biology

77/78

slide-85
SLIDE 85

The wisdom of crowds and the demise of the editor

‘We are all engaged in the business of understanding: distributed editions fashioned collaboratively may become the ground of our mutual enterprise.’ (PMWR, 2007)

- Transcribe Bentham: http://www.ucl.ac.uk/transcribe-bentham/
- Oxyrhynchus papyri: ancientlives.org

‘What is needed is a commitment to cooperative work among developers in a chaotic environment of experimentation and communication.’ (CMSMCQ, 1996)

78/78