SLIDE 1

Information Structure Annotation and Secondary Accents

Arndt Riester (1), Stefan Baumann (2)

(1) Institute for Natural Language Processing, University of Stuttgart
(2) IfL Phonetics, University of Cologne

DGfS Workshop Beyond Semantics, 24.2.2011

Riester & Baumann (2011) Beyond Semantics 24.2.2011 1 / 22

SLIDE 2

Annotating Information Structure (Focus-Background) for Research on Prosody

SLIDE 8

Annotating Focus-Background Structure

Is considered “difficult”.

Usually, focus theorists talk about given and new information, about presupposition, about “triggering alternatives”, about contrast.

All these notions have multiple interpretations and do not easily apply to corpus data.

Usually, focus theorists don’t care. But we do!

In corpus annotation, what comes closest to the given-new distinction is anaphora or information status. This is where we would like to start.

Annotating focus “directly” often involves question-answer tests (but where do the questions come from in narrative text?) or prosody (but does every pitch accent mean “focus”?).

SLIDE 11

Some consensus

Recently, some consensus seems to be emerging among people taking an annotation perspective on focus (Selkirk, 2007; Götze et al., 2007; Beaver & Velleman (subm.)).

We must identify two sources responsible for the assignment of accent: novelty and F-features (Rooth, 1992; e.g. overt contrast, association with exhaustive particles etc.). The latter are variously referred to as “focus” (Selkirk), “importance” (Beaver & Velleman), “contrast(ive focus)” (Götze et al.), or “elicited alternatives” (Riester & Baumann).

SLIDE 16

What to annotate?

Given vs. new constituents (a.k.a. information status)

◮ Information status of DPs / terms (referential information status)
◮ Focus-background structure additionally requires a notion of information status for non-referential expressions (lexical information status)
◮ Ref + Lex = RefLex scheme (Baumann & Riester, submitted; Riester & Baumann, 2011)

Alternative-eliciting features (F-features, contrast)

SLIDE 17

Annotating referential information status

Units: referring expressions (terms, DPs) (for details, see Baumann & Riester, submitted)

Prince (1981):
· evoked
· inferable
· new unused
· new brand-new

Gundel et al. (1993):
· activated
· familiar
· uniquely identifiable
· referential

Lambrecht (1994):
· identifiable active
· identifiable accessible
· identifiable inactive
· unidentifiable

Götze et al. (2007):
· given
· accessible inferable
· accessible general
· new

SLIDE 22

Referential vs. lexical GIVENNESS (Baumann & Riester, submitted)

Schwarzschild (1999) distinguishes between (i) expressions of type e and (ii) “functional” expressions of type ⟨α, β⟩:

  • i. GIVENNESS = coreference anaphora
  • ii. GIVENNESS = entailment / set inclusion (for words: synonymy or hypernymy)

Halliday and Hasan (1976): referential relations vs. lexical cohesion

Baumann & Riester: R-GIVENNESS vs. L-GIVENNESS

SLIDE 27

R-GIVEN vs. L-GIVEN

(1) A colleague came in. [The idiot]R-GIVEN dropped a vase.

(2) A student came in. Another [student]L-GIVEN greeted [him]R-GIVEN.

(3) A policeman came in. Another [guy]L-GIVEN left.

(4) A man came in. [The [man]L-GIVEN]R-GIVEN coughed.

Neither type of GIVENNESS is a prerequisite for the other!
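
The independence of the two layers can be made concrete by storing each label on its own tier. The following is a minimal sketch in Python, not part of the RefLex tooling: the `Annotation` record and the `given_layers` helper are hypothetical names, and `None` only means that the examples show no GIVEN label on that layer for that expression.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    """One annotated expression (hypothetical record for illustration)."""
    example: int
    expression: str
    r_label: Optional[str]  # referential layer, None if no label is shown
    l_label: Optional[str]  # lexical layer, None if no label is shown


# Labels copied from examples (1)-(4).
ANNOTATIONS = [
    Annotation(1, "The idiot", "R-GIVEN", None),
    Annotation(2, "student", None, "L-GIVEN"),
    Annotation(2, "him", "R-GIVEN", None),
    Annotation(3, "guy", None, "L-GIVEN"),
    Annotation(4, "The man", "R-GIVEN", "L-GIVEN"),
]


def given_layers(annotation: Annotation) -> frozenset:
    """Return the set of layers ("R", "L") on which the expression is GIVEN."""
    layers = set()
    if annotation.r_label == "R-GIVEN":
        layers.add("R")
    if annotation.l_label == "L-GIVEN":
        layers.add("L")
    return frozenset(layers)


# All three combinations occur: R only, L only, and both.
patterns = {given_layers(a) for a in ANNOTATIONS}
for annotation in ANNOTATIONS:
    print(annotation.example, annotation.expression, sorted(given_layers(annotation)))
```

Keeping the tiers separate means an expression can be queried for R-GIVENNESS without any assumption about its L-label, and vice versa.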

SLIDE 31

R-NEW, R-UNUSED, L-NEW

R-NEW: specific indefinite

R-UNUSED: discourse-new, context-free definite

L-NEW: unrelated word

(5) [A [man]L-NEW]R-NEW came in. [Another [man]L-GIVEN]R-NEW left.

(6) [[George]L-NEW]R-UNUSED came in. [[Mary]L-NEW]R-UNUSED likes [[George]L-GIVEN]R-GIVEN.

(7) [The [man]L-NEW who stole [my [wallet]L-NEW]R-UNUSED]R-UNUSED is very tall.

SLIDE 35

R-BRIDGING, L-ACCESSIBLE

Prince (1981), Chafe (1976), Lambrecht (1994): assume an intermediate class in between GIVEN and NEW: inferable / accessible information

L-ACCESSIBLE: hyponym, meronym

R-BRIDGING: definite, context-dependent, non-coreferential expression

[[Bill]L-NEW]R-UNUSED saw [a [house]L-NEW]R-NEW. [The [door]L-ACCESSIBLE]R-BRIDGING was open.

[[John]L-NEW]R-UNUSED was murdered. [The [harpoon]L-NEW]R-BRIDGING was lying nearby.
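
The lexical layer described so far amounts to a lookup from the relation between a word and earlier material to an L-label. Below is a minimal sketch in Python under stated assumptions: the relation names are plain strings chosen for illustration, the function `l_label` is hypothetical, and only the relations named up to this point are covered. Deciding which relation actually holds is the annotator's job and is not modelled here.

```python
# Relation of the current word to a word in the preceding context.
# "hypernym" means the current word is a hypernym of the earlier one,
# e.g. "guy" after "policeman".
L_GIVEN_RELATIONS = {"identity", "synonym", "hypernym"}

# "meronym" means the current word names a part of the earlier one,
# e.g. "door" after "house".
L_ACCESSIBLE_RELATIONS = {"hyponym", "meronym"}


def l_label(relation: str) -> str:
    """Map a lexical relation to an L-label (illustrative sketch only)."""
    if relation in L_GIVEN_RELATIONS:
        return "L-GIVEN"
    if relation in L_ACCESSIBLE_RELATIONS:
        return "L-ACCESSIBLE"
    if relation == "unrelated":
        return "L-NEW"
    raise ValueError(f"unknown lexical relation: {relation!r}")


print(l_label("hypernym"))   # "guy" after "policeman"
print(l_label("meronym"))    # "door" after "house"
print(l_label("unrelated"))  # "harpoon" after "John was murdered"
```

Raising an error on unknown relation names, rather than defaulting to L-NEW, keeps typos from silently inflating the count of new items.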

SLIDE 36

From information status to information structure

Corpus data: radio news

A strong earthquake has hit central Japan.

GToBI: H* L+H* H* H* H+!H* L-%
Ein starkes Erdbeben hat Zentral-Japan erschüttert.
Labels: L-NEW L-NEW L-NEW L-NEW L-NEW R-UNUSED R-NEW L-NEW L-NEW

The authorities issued a tsunami warning for the southwest.

GToBI: H* L+H* L+H* L-%
Die Behörden gaben eine Tsunami-Warnung für den Südwesten heraus.
Labels: L-NEW L-NEW L-NEW L-NEW L-NEW R-BRIDGING R-BRIDGING R-NEW L-NEW L-NEW

SLIDE 39

Overview RefLex scheme

R-Level (κ = 0.70), units: DP, PP, that-CP

Label        Description
R-GIVEN      coreferential anaphor
R-BRIDGING   non-coreferential context-dependent expression
R-UNUSED     definite discourse-new context-free expression
R-NEW        specific indefinite
R-GENERIC    generic definite or indefinite
OTHER        e.g. cataphors

L-Level (κ = 0.78), units: AP, AdvP, NP, VP, S

Label          Description
L-GIVEN        word identity / synonym / hypernym / holonym / superset
L-ACCESSIBLE   hyponym / meronym / subset / otherwise related
L-NEW          unrelated expression (within last five clauses)
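
The κ values above are inter-annotator agreement scores. The slides do not say which variant was used. As an illustration only, the sketch below assumes Cohen's kappa for two annotators, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name `cohens_kappa` and the toy label sequences are hypothetical and are not taken from the annotated corpora.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)

    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's label proportions.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy data (hypothetical): six expressions labelled by two annotators.
annotator_a = ["R-GIVEN", "R-GIVEN", "R-NEW", "R-BRIDGING", "R-UNUSED", "R-NEW"]
annotator_b = ["R-GIVEN", "R-NEW", "R-NEW", "R-BRIDGING", "R-UNUSED", "R-NEW"]
kappa = cohens_kappa(annotator_a, annotator_b)
print(round(kappa, 2))
```

Because chance agreement is subtracted, kappa is lower than the raw share of matching labels whenever some labels are much more frequent than others.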

SLIDE 40

Annotation in SALTO (Burchardt et al. 2006)

SLIDE 42

Current situation

Annotation of information structure (on all syntactic levels) is currently somewhat hampered for technical reasons (and so far can only be done by expert annotators).

Annotation of information status (RefLex) (only on DPs and PPs and the words contained in them) is more robust (and can easily be done by untrained annotators).

SLIDE 43

Spoken-written corpora (German) annotated so far

Site       Genre                                 Amount of data                          Annotations
Stuttgart  Deutschlandfunk radio news bulletins  ∼3,500 sentences (3 days)               GToBI (∼8,500 accents); R-level, recursive (∼10,000 labels)
Cologne    Read stories                          134 intonation phrases (115 sentences)  R-level, L-level on term expressions; elicited alternatives; GToBI
Cologne    Spontaneous monologues                374 intonation phrases                  (annotations shared with read stories)

SLIDE 44

Database solution: see the poster by Kerstin Eckart et al. (DGfS-CL poster session, today 1pm!)

SLIDE 46

Results

SLIDE 49

Some results (Baumann & Riester, 2011b)

adapted from Ladd / Büring:

(8) a. If you need a dentist, why don’t you go to Dr Cremer?
    b. Because I HATE [the [butcher]L-NEW]R-GIVEN.

Prediction: R-GIVEN, L-NEW expressions are deaccented.

Not confirmed in our German data!

                    n    No accent   Secondary accent (phrase acc., L*, !H*)   H*
Spontaneous speech  17   18 %        35 %                                      47 %
Read data           25   16 %        68 %                                      16 %

Some, but not all, of these cases can be explained by elicited alternatives (F-features) and occurrence in predications.

Interestingly high proportion of secondary accents.

SLIDE 51

Results II

from Büring (2007):

(9) a. Why do you study Italian?
    b. Because I’m MARried to [an [Italian]L-GIVEN]R-NEW.

Prediction: R-NEW, L-GIVEN expressions are deaccented.

Not confirmed!

                    n    No accent   Secondary accent (phrase acc., L*, !H*)   H*
Spontaneous speech  13   23 %        39 %                                      38 %
Read data           10   0 %         90 %                                      10 %
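
Distributions like the ones reported above can be tallied from tokens that carry an R-label, an L-label and an accent class. The following Python sketch shows one way to do this. It is not the authors' procedure: the function `accent_shares`, the dictionary keys and the accent class names are assumptions made for illustration, and the toy tokens are invented rather than drawn from the corpora.

```python
from collections import Counter
from typing import Dict, List

# Accent classes assumed for illustration, following the three result columns.
ACCENT_CLASSES = ("none", "secondary", "H*")


def accent_shares(tokens: List[Dict[str, str]], r_label: str, l_label: str) -> Dict[str, float]:
    """Share of each accent class among tokens with the given label pair."""
    selected = [t["accent"] for t in tokens if t["r"] == r_label and t["l"] == l_label]
    if not selected:
        raise ValueError("no tokens with this label combination")
    counts = Counter(selected)
    return {accent: counts[accent] / len(selected) for accent in ACCENT_CLASSES}


# Toy tokens (hypothetical), not corpus data.
toy_tokens = [
    {"r": "R-NEW", "l": "L-GIVEN", "accent": "none"},
    {"r": "R-NEW", "l": "L-GIVEN", "accent": "secondary"},
    {"r": "R-NEW", "l": "L-GIVEN", "accent": "secondary"},
    {"r": "R-NEW", "l": "L-GIVEN", "accent": "H*"},
    {"r": "R-GIVEN", "l": "L-NEW", "accent": "H*"},
]

shares = accent_shares(toy_tokens, "R-NEW", "L-GIVEN")
print(shares)
```

A deaccentuation prediction would correspond to a "none" share close to 1 for the relevant label pair, which is the quantity to compare against the reported percentages.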

SLIDE 55

Next steps

Annotate more data

Apply RefLex scheme to radio news

Annotate L-labels on all relevant constituents in the tree (focus-background structure) for a smaller part of the corpus

Discourse structure and information structure: mutual benefits?

SLIDE 56

Thank you!
