Markup - Why? How? Espen S. Ore, University of Oslo


SLIDE 1

Markup - Why? How?

Espen S. Ore University of Oslo

SLIDE 2

ESO - Köln 2012 2

What is markup? (My working definition/description)

Markup is whatever is done to a document (a text, an image, a sound file etc.) so that a part of it can be identified and pointed at, and so that information or data can be linked to the selected part. Text and markup are not clearly separate animals. Word separators are just as much markup as they are text.

SLIDE 3

Pylos Ta641 markup with pictograms

SLIDE 4

The Rök stone - phonetic script

"raiþiaurikR" = "raiþ þiaurikR"

SLIDE 5

Rök - cryptic script

SLIDE 6

The hidden text - standoff lookup?

[image] = 3/5 3/2 (clockwise) = R U

ᚠᚢᚦᛆᚱᚴ  FUThORK
ᚼᚿᛁᛆᛌ  HNIAS
ᛐᛒᛘᛚᛦ  TBMLY
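
The lookup on this slide can be sketched as a small table-driven decoder. This is only an illustration: the slide confirms that row 3 is FUThORK (3/5 = R, 3/2 = U), while giving HNIAS the number 2 and TBMLY the number 1 is an assumption, and the function names are made up for the example.

```python
# Rune rows as transliterated on the slide; "Th" stands for a single rune.
ROWS = {
    3: ["F", "U", "Th", "O", "R", "K"],  # confirmed by the slide: 3/5 = R, 3/2 = U
    2: ["H", "N", "I", "A", "S"],        # assumed row number
    1: ["T", "B", "M", "L", "Y"],        # assumed row number
}


def parse(cipher):
    """Turn a string such as '3/5 3/2' into [(3, 5), (3, 2)]."""
    return [tuple(int(n) for n in item.split("/")) for item in cipher.split()]


def decode(pairs):
    """Resolve (row, position) pairs to runes; positions are 1-based."""
    return [ROWS[row][pos - 1] for row, pos in pairs]


print(decode(parse("3/5 3/2")))  # ['R', 'U']
```

The cipher signs carry no text of their own; they only point into a table kept elsewhere, which is what makes the comparison with standoff markup tempting.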

SLIDE 7

Classical (or Hellenistic) markup

  • Zenodotus and Aristarchos (3rd/2nd centuries BCE) used special symbols as flags or tags in the texts of Homer. Separate books held comments on the marked text lines.

  • This standoff markup was brought closer to the text as scholia in medieval MSS

  • Some of this information today appears both in apparatus entries and in modern editions of scholia

SLIDE 8

Dialect collection 1746 - 85, modern edition

SLIDE 9

Letter from Ibsen to Gustaf af Geijerstam, Nov. 1898

SLIDE 10

The Ibsen letter on the web

SLIDE 11

The letter with markup

<div type="letter"><pb n="[1]"/>
<dateline> Kristiania, den <date>4.<HIS:hisRef type="tcNote" xml:id="noteT15_876" target="B1890-1905ht_noter.xml" corresp="noteT15_876">11</HIS:hisRef>.98</date>. </dateline>
<salute>Kære ven!</salute>
<p> Jeg beder Dem have så hjertelig tak for alt, hvad De i den senere tid har sendt mig, både for
<anchor type="lemma" xml:id="koB15_2941"/> Deres breve, som jeg har så svært for at
<anchor type="lemma" xml:id="koB15_2942"/> udtyde, og nu senest for
<anchor type="lemma" xml:id="koB15_2943"/> den nye bog af
<HIS:hisRef type="person" target="Navneregister_HISe.xml#peASt">Strindberg</HIS:hisRef>.
<anchor type="lemma" xml:id="koB15_2944"/> Bring ham min varmeste og oprigtigste taksigelse derfor og sig ham at han har beredt mig en i sandhed stor og overraskende glæde ved dette
<HIS:hisRef type="tcNote" xml:id="noteT15_877" target="B1890-1905ht_noter.xml" corresp="noteT15_877">vidnesbyrd</HIS:hisRef> om at han i venlighed har tænkt på mig. Som De véd har jeg
<anchor type="lemma" xml:id="koB15_2945"/>hans
<HIS:hisRef type="tcNote" xml:id="noteT15_878" target="B1890-1905ht_noter.xml" corresp="noteT15_878">billede</HIS:hisRef>
<pb n="[2]"/> ... </p>
</div>

SLIDE 12

From the notes-file for the letter

<note resp="editor" xml:id="noteT15_878"> <HIS:hisRef type="tcNote" target="B1890-1905ht.xml" corresp="noteT15_878">billede</HIS:hisRef>] <hi rend="italic">HIS,</hi> billde </note>

SLIDE 13

A manuscript (MS) of A Doll's House

SLIDE 14

A Doll's House on the web

SLIDE 15

A Doll's House in XML

<HIS:hisSp who="HELMER">
<HIS:spOpener><speaker>H.</speaker></HIS:spOpener> <lb/>
<p> Du har ret; dette har rystet os begge. Der er kommet uskøn<gap reason="binding"/><lb/>
hed ind imellem os, tanker om død og opløsning – Dette må<lb/>
<sic>i</sic> søge frigørelse for; <HIS:hisAdd place="infralinear">Indtil da –.</HIS:hisAdd>
<app type="alteration"> <lem> <HIS:hisAdd place="offline">Vi</HIS:hisAdd> </lem> <HIS:hisRdg> <HIS:hisDel rend="overstrike">nu</HIS:hisDel> </HIS:hisRdg> </app>
vil <HIS:hisDel rend="overstrike">vi</HIS:hisDel> gå hver til sit. </p>
</HIS:hisSp>
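
One thing inline alteration markup of this kind makes possible is reading the passage before and after the author's changes. The sketch below is a rough illustration on an abridged copy of the fragment, not the edition's own processing: the namespace URI bound to the HIS prefix is a placeholder (the slides do not give it), and the function names are invented for the example.

```python
import xml.etree.ElementTree as ET

HIS_NS = "http://example.org/his"  # placeholder: the real HIS namespace URI is not given

# Abridged from the slide; <lb/> and <gap/> are left out for brevity.
SPEECH = f"""<p xmlns:HIS="{HIS_NS}">Dette må <sic>i</sic> søge frigørelse for;
<HIS:hisAdd place="infralinear">Indtil da –.</HIS:hisAdd>
<app type="alteration"><lem><HIS:hisAdd place="offline">Vi</HIS:hisAdd></lem>
<HIS:hisRdg><HIS:hisDel rend="overstrike">nu</HIS:hisDel></HIS:hisRdg></app>
vil <HIS:hisDel rend="overstrike">vi</HIS:hisDel> gå hver til sit.</p>"""


def reading(elem, drop):
    """Collect text, skipping the content of HIS elements named in `drop`."""
    skipped = {f"{{{HIS_NS}}}{name}" for name in drop}
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in skipped:
            parts.append(reading(child, drop))
        parts.append(child.tail or "")  # text after the child is always kept
    return "".join(parts)


def normalized(elem, drop):
    """Same as reading(), with runs of whitespace collapsed to single spaces."""
    return " ".join(reading(elem, drop).split())


root = ET.fromstring(SPEECH)
after_changes = normalized(root, {"hisDel"})   # deletions removed, additions kept
before_changes = normalized(root, {"hisAdd"})  # additions removed, deletions kept
print(after_changes)
print(before_changes)
```

Both readings come out of the same file, which is the point of encoding additions and deletions rather than only the final text.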

SLIDE 16

Peer Gynt - where and when

SLIDE 17

Peer Gynt on the web

SLIDE 18

Peer Gynt in XML

<set>
<p> Handlingen, der begynder i
<app type="alteration"><lem> <HIS:hisAdd place="offline"> <unclear reason="writing">Førstningen</unclear> af dette Aar<unclear reason="writing">hundrede</unclear> </HIS:hisAdd></lem>
<HIS:hisRdg> <HIS:hisDel rend="overstrike">forrige og slutter i dett</HIS:hisDel>e </HIS:hisRdg> </app> <lb/>
<app type="alteration"> <lem><HIS:hisAdd place="offline">og slutter henimod vore Dage,</HIS:hisAdd></lem>
<HIS:hisRdg><HIS:hisDel rend="overstrike">Aarhundrede,</HIS:hisDel></HIS:hisRdg> </app>
foregaar i Gudbrandsdalen, paa Højfjel-<lb/>
dene, paa Kysten af Afrika, i Ørkenen Sahara, i Daa-<lb/>
rekisten i Cairo, paa Havet o. s. v. o. s. v. – </p>
</set>

SLIDE 19

The reason behind the Ibsen encoding

Endringene er gjengitt så diplomatarisk som mulig, slik at tilføyelser er plassert der de er foretatt, for eksempel over linjen, og markert med innføyningstegn. Strykninger er markert med gjennomstrekning, ....

(The changes are reproduced as diplomatically as possible: additions are placed where they were made, for instance above the line, and marked with insertion signs. Deleted text is marked with overstrike, ...)

  • but of course: there are overlapping hierarchies

SLIDE 20

Inline or standoff?

Back to the Ibsen letter:

<HIS:hisRef type="tcNote" xml:id="noteT15_878" target="B1890-1905ht_noter.xml" corresp="noteT15_878">billede</HIS:hisRef>

in the letter, and

<note resp="editor" xml:id="noteT15_878"><HIS:hisRef type="tcNote" target="B1890-1905ht.xml" corresp="noteT15_878">billede</HIS:hisRef>] <hi rend="italic">HIS,</hi> billde</note>

in the note file.
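
How the two anchored ends find each other can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the project's actual tooling: the HIS namespace URI is a placeholder, the <notes> wrapper element is hypothetical, and both "files" are short in-memory strings.

```python
import xml.etree.ElementTree as ET

HIS_NS = "http://example.org/his"  # placeholder: the real HIS namespace URI is not given
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id

LETTER = f"""<p xmlns:HIS="{HIS_NS}">Som De véd har jeg hans
<HIS:hisRef type="tcNote" xml:id="noteT15_878"
 target="B1890-1905ht_noter.xml" corresp="noteT15_878">billede</HIS:hisRef></p>"""

# The <notes> wrapper is hypothetical; only the <note> element is from the slide.
NOTES = f"""<notes xmlns:HIS="{HIS_NS}"><note resp="editor" xml:id="noteT15_878">
<HIS:hisRef type="tcNote" target="B1890-1905ht.xml"
 corresp="noteT15_878">billede</HIS:hisRef>] <hi rend="italic">HIS,</hi> billde</note></notes>"""


def notes_by_id(notes_root):
    """Index every <note> by its xml:id."""
    return {note.get(XML_ID): note for note in notes_root.iter("note")}


def resolve(letter_root, notes_root):
    """Pair each tcNote reference in the letter with the text of its note."""
    index = notes_by_id(notes_root)
    pairs = []
    for ref in letter_root.iter(f"{{{HIS_NS}}}hisRef"):
        if ref.get("type") != "tcNote":
            continue
        note = index.get(ref.get("corresp"))
        if note is not None:
            note_text = " ".join("".join(note.itertext()).split())
            pairs.append((ref.text, note_text))
    return pairs


print(resolve(ET.fromstring(LETTER), ET.fromstring(NOTES)))
```

The link works only because both files carry the identifier: the anchor is inline even though the note itself lives in a separate file.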

SLIDE 21

Linking without anchors

<HIS:hisRef type="tcNote" target="B1890-1905ht_noter.xml" corresp="byte count">billede</HIS:hisRef>

in the letter, and

<note resp="editor"><HIS:hisRef type="tcNote" target="B1890-1905ht.xml" corresp="byte count">billede</HIS:hisRef>] <hi rend="italic">HIS,</hi> billde</note>

in the note file.

SLIDE 22

Standing completely off

billede

in the letter,

<note resp="editor">billede <hi rend="italic">HIS,</hi> billde</note>

in the note file, and this for instance in a record in a database:

Note-id   Textfile          T-from     T-to       N-file                  N-from     N-to

T15_878   B1890-1905ht.xml  <b-count>  <b-count>  B1890-1905ht_noter.xml  <b-count>  <b-count>
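
A record like this could be modelled and resolved roughly as follows. This is a sketch only: the field names mirror the column headings, the two files are short in-memory stand-ins rather than the real edition files, and UTF-8 is assumed as the encoding. The offsets are byte counts, so a letter such as "é" takes up two positions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandoffLink:
    note_id: str
    text_file: str
    t_from: int  # byte offset into the text file, inclusive
    t_to: int    # byte offset into the text file, exclusive
    note_file: str
    n_from: int  # byte offset into the note file, inclusive
    n_to: int    # byte offset into the note file, exclusive


def byte_span(data, needle, encoding="utf-8"):
    """Return (start, end) byte offsets of the first occurrence of needle."""
    raw = needle.encode(encoding)
    start = data.index(raw)
    return start, start + len(raw)


# In-memory stand-ins for the two files (assumption: UTF-8 encoded).
NOTE_XML = '<note resp="editor">billede <hi rend="italic">HIS,</hi> billde</note>'
FILES = {
    "B1890-1905ht.xml": "Som De véd har jeg hans billede".encode("utf-8"),
    "B1890-1905ht_noter.xml": NOTE_XML.encode("utf-8"),
}

t_from, t_to = byte_span(FILES["B1890-1905ht.xml"], "billede")
n_from, n_to = byte_span(FILES["B1890-1905ht_noter.xml"], NOTE_XML)
link = StandoffLink("T15_878", "B1890-1905ht.xml", t_from, t_to,
                    "B1890-1905ht_noter.xml", n_from, n_to)


def resolve(link, files, encoding="utf-8"):
    """Cut the annotated passage and its note out of the two files."""
    passage = files[link.text_file][link.t_from:link.t_to].decode(encoding)
    note = files[link.note_file][link.n_from:link.n_to].decode(encoding)
    return passage, note


print(link.t_from, link.t_to)  # 25 32: "billede" starts at character 24 but byte 25
print(resolve(link, FILES))
```

Neither file needs to know about the other, but every edit to either file shifts the counts, so the record has to be maintained together with the files.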

SLIDE 23

Standoff encoding?

  • Plus:

Makes overlapping hierarchies easier to handle (which in itself opens up many possibilities)

  • Minus:

Not many off-the-shelf tools available

No standard for data interchange

  • Possible solution:

Standoff use inside the project – or a combination of inline and standoff encoding

Project the data into a suitable hierarchy for XML (TEI) export
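
Projecting standoff data into one hierarchy might look like the sketch below. It assumes annotations are stored as (start, end, tag) character offsets, which is an invented format for the example, and it simply refuses overlapping spans rather than splitting them; a real export would have to decide how to fragment or drop them.

```python
from xml.sax.saxutils import escape


def project_inline(text, spans):
    """Render standoff spans (start, end, tag) as inline XML.

    Offsets are character positions, end exclusive. Spans must nest;
    an overlapping pair raises ValueError.
    """
    # Outer spans first: sort by start, and by longer span on ties.
    ordered = sorted(spans, key=lambda s: (s[0], -s[1]))
    out, stack, pos = [], [], 0  # stack holds (end, tag) of open elements

    def close_until(limit):
        """Close every open element that ends at or before limit."""
        nonlocal pos
        while stack and stack[-1][0] <= limit:
            end, tag = stack.pop()
            out.append(escape(text[pos:end]))
            out.append(f"</{tag}>")
            pos = end

    for start, end, tag in ordered:
        close_until(start)
        if stack and end > stack[-1][0]:
            raise ValueError(f"<{tag}> overlaps <{stack[-1][1]}>")
        out.append(escape(text[pos:start]))
        out.append(f"<{tag}>")
        pos = start
        stack.append((end, tag))

    close_until(len(text))
    out.append(escape(text[pos:]))
    return "".join(out)


TEXT = "Som De véd har jeg hans billede"
print(project_inline(TEXT, [(24, 31, "ref"), (0, 31, "p")]))
# <p>Som De véd har jeg hans <ref>billede</ref></p>
```

The same standoff data can be projected into different hierarchies for different exports, which is where the flexibility listed under "Plus" comes from.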

SLIDE 24

What are text and markup really?

  • There is not one single formal model for either text or markup. Belief in a single model tries to make the Humanities into natural science.

  • The humanities can use models of text and markup for certain uses, for certain purposes.

  • The choice of markup type (inline, standoff or a mixture) is a matter of convenience.