SLIDE 1

Introduction Our Approach Experiments Summary

Modeling Other Talkers for Improved Dialog Act Recognition in Meetings

Kornel Laskowski (1) & Elizabeth Shriberg (2, 3)

(1) Carnegie Mellon University, Pittsburgh PA, USA
(2) SRI International, Menlo Park CA, USA
(3) International Computer Science Institute, Berkeley CA, USA

10 September, 2009

  • K. Laskowski & E. Shriberg

Interspeech 2009, Brighton, UK 1/16

SLIDE 2

Suppose you’re given ...

[Figure: speech activity timelines for four participants (SPKR A, SPKR B, SPKR C, SPKR D), with one TALKSPURT labeled]

TASK: segment into dialog acts and classify into dialog act types

SLIDE 6

Why use only speech/non-speech information?

  • sensitive data in which word information must be masked for privacy reasons
    (Wyatt et al., “Capturing spontaneous conversation and social dynamics: A privacy-sensitive data collection effort”, 2007)
  • noisy data where word recognition performs poorly
  • image-only data in which speech activity has to be inferred from video only
  • resource-poor languages in which ASR and/or lexical DA recognizers may be unavailable
  • contexts requiring speed: SAD is faster than ASR

SLIDE 7

Why do we care about DAs?

Because sometimes, we want

  • to discard specific DA types
    Example 1: summarization systems (retain only speech implementing propositional content)
  • to detect the absence of specific DA types
    Example 2: spoken dialogue systems (change strategy when active listening cues are not offered)
  • to detect the presence of specific DA types
    Example 3: discourse analysis systems (atypical flooring behavior may indicate grounding problems)

DA segmentation is important even when DA classification is not.

SLIDE 8

DA Types in ICSI Meetings

Propositional Content DA Types
  • statement, s (85%)
  • question, q (6.6%)

“Short” DA Types
  • Feedback Types (5.4%)
      backchannel, b (2.8%)
      acknowledgment, bk (1.5%)
      assert, aa (1.1%)
  • Floor Mechanism Types (3.6%)
      floor holder, fh (2.7%)
      floor grabber, fg (0.6%)
      hold, h (0.3%)

SLIDE 9

Goal of This Work

[Figure: speech activity timelines for SPKR A, SPKR B, SPKR C, and SPKR D]

Use only speech activity patterns to segment and classify DAs.
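To make the input concrete, here is a minimal sketch of what "speech activity patterns" could look like in code. It assumes, purely for illustration, that each participant's activity is a list of 0/1 values, one per fixed-length frame; the function name `talkspurts` and the toy data are hypothetical and not taken from the slides.

```python
from typing import Dict, List, Tuple


def talkspurts(frames: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices, end exclusive, for each run of 1s."""
    spurts = []
    start = None
    for i, active in enumerate(frames):
        if active and start is None:
            start = i                      # a talkspurt begins
        elif not active and start is not None:
            spurts.append((start, i))      # the talkspurt ended at frame i
            start = None
    if start is not None:                  # still speaking at the last frame
        spurts.append((start, len(frames)))
    return spurts


# Toy speech/non-speech rows (hypothetical data, one value per frame).
activity: Dict[str, List[int]] = {
    "SPKR A": [0, 1, 1, 1, 0, 0, 0, 0],
    "SPKR B": [0, 0, 0, 0, 1, 1, 0, 1],
}
spurts_by_speaker = {name: talkspurts(row) for name, row in activity.items()}
# SPKR A -> [(1, 4)], SPKR B -> [(4, 6), (7, 8)]
```

Everything downstream in this sketch of the task sees only such 0/1 rows, never words.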

SLIDE 10

Previous Research on DA Recognition in Meetings

lots of work, e.g.

  • Ang, Liu & Shriberg, ICASSP 2005.
  • Ji & Bilmes, ICASSP 2005.
  • Zimmermann, Stolcke & Shriberg, ICASSP 2006.
  • Dielmann & Renals, MLMI 2007.

relying on one or more of

  • true DA boundaries (i.e., DA classification only)
  • word identities (true or ASR)
  • word boundaries (true or ASR)

work in which DA boundaries, word boundaries, and word identities are not assumed has not been done

SLIDE 11

Previous Research on Talkspurt Modeling in Meetings

also lots of work, e.g.

  • Brdiczka, Maisonnasse & Reignier, ICMI 2005.
  • Rienks, Zhang, Gatica-Perez & Post, ICMI 2005.
  • Laskowski, Ostendorf & Schultz, SIGdial 2007.
  • Favre, Salamin, Dines & Vinciarelli, ICMI 2008.

collect and model statistics over long observation intervals

explicit modeling of speech activity for segmenting and classifying talk in individual talkspurts (and from other participants) has not been done

SLIDE 12

Talkspurt (TS) Boundaries ≠ DA Boundaries

[Figure: speech activity timelines for SPKR A, SPKR B, SPKR C, and SPKR D; the SPKR B row is then shown alone with its TALKSPURT and DIALOG ACT segmentations]

  • decoding the state of one participant at a time
  • may have
      1:1 correspondence between DAs and TSs, and
      1:1 correspondence between DA-gaps and TS-gaps
  • but may also have TS gaps inside DAs
      1:N correspondence between DAs and TSs
      → explicitly model intra-DA silence
  • opposite (N:1 correspondence) may also occur
      → entertain possibility that DA boundaries occur anywhere
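The 1:N and N:1 cases above can be checked mechanically once DAs and talkspurts are both expressed as frame intervals. The following is a small sketch under that assumption; the helper names and the toy intervals are hypothetical and only illustrate the two kinds of mismatch.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) in frames, end exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two half-open intervals share at least one frame."""
    return a[0] < b[1] and b[0] < a[1]


def talkspurts_per_da(das: List[Interval], spurts: List[Interval]) -> List[int]:
    """For each DA, count the talkspurts it overlaps (values > 1 mean 1:N)."""
    return [sum(overlaps(da, ts) for ts in spurts) for da in das]


def das_per_talkspurt(das: List[Interval], spurts: List[Interval]) -> List[int]:
    """For each talkspurt, count the DAs it overlaps (values > 1 mean N:1)."""
    return [sum(overlaps(da, ts) for da in das) for ts in spurts]


# Toy example: the first DA spans a short intra-DA silence (frames 10-11),
# and the last talkspurt contains two back-to-back DAs.
spurts = [(0, 10), (12, 20), (30, 50)]
das = [(0, 20), (30, 40), (40, 50)]

print(talkspurts_per_da(das, spurts))  # [2, 1, 1]
print(das_per_talkspurt(das, spurts))  # [1, 1, 2]
```

A count of 2 in the first list is the intra-DA silence case; a count of 2 in the second list is a DA boundary falling inside a talkspurt.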

SLIDE 23

Proposed HMM Sub-Topology for DAs

[Figure: HMM sub-topology for a single DA, with states ENTRY, EGRESS, NON-DA-TERMINAL TALKSPURT FRAGMENT, DA-TERMINAL TALKSPURT FRAGMENT, and INTRA-DA TALKSPURT GAP, shown alongside the speech activity of SPKR B]

SLIDE 41

Proposed HMM Topology for Conversational Speech

the complete topology consists of

  • a DA sub-topology for each of 8 DA types
  • fully connected via inter-DA GAP subnetworks

[Figure: the 8 DA sub-topologies (s, q, h, fh, fg, bk, b, aa)]

SLIDE 42

Our HMM Observations

[Figure: speech activity timelines for SPKR and OTH1 through OTH4, annotated with T/2 on either side of instant t, K, and FEATURE “VECTOR”]

  • decoding one participant (SPKR) at a time
  • at instant t, model the thumbnail image of context
      consider a temporal context of width T
  • want invariance under participant-index rotation
      rank “OTH” participants by local speaking time
  • want a fixed-size feature vector: consider only K others
  • model features using state-specific GMMs (after LDA)
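The windowing and ranking steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: activity is assumed to be a 0/1 list per participant, the window is taken as `half_width` frames on each side of frame `t` (so 2 * half_width + 1 frames stand in for the width T), frames outside the meeting count as silence, and missing interlocutors are padded with silent rows. The LDA and GMM stages are not shown.

```python
from typing import List


def context_features(activity: List[List[int]], target: int, t: int,
                     half_width: int, k: int) -> List[int]:
    """Flattened window for the target speaker followed by the k locally
    most talkative other participants, ranked by speaking time in the window."""
    n_frames = len(activity[target])

    def window(row: List[int]) -> List[int]:
        # Frames before the start or after the end are treated as silence
        # (an assumption made for this sketch).
        return [row[f] if 0 <= f < n_frames else 0
                for f in range(t - half_width, t + half_width + 1)]

    target_win = window(activity[target])
    other_wins = [window(row) for p, row in enumerate(activity) if p != target]

    # Rank the others by local speaking time, most talkative first. Ranking
    # makes the result independent of how the other participants are indexed.
    other_wins.sort(key=sum, reverse=True)

    # Pad with silent rows so the vector has a fixed size even when fewer
    # than k other participants are present.
    silent = [0] * (2 * half_width + 1)
    while len(other_wins) < k:
        other_wins.append(silent)

    features = list(target_win)
    for win in other_wins[:k]:
        features.extend(win)
    return features


# Toy meeting with three participants and five frames (hypothetical data).
meeting = [
    [1, 1, 0, 0, 0],  # participant 0 (decoded as SPKR below)
    [0, 0, 1, 1, 1],  # participant 1
    [0, 0, 0, 1, 0],  # participant 2
]
vec = context_features(meeting, target=0, t=2, half_width=1, k=1)
print(vec)  # [1, 0, 0, 0, 1, 1]
```

Swapping the rows of participants 1 and 2 leaves `vec` unchanged, which is the rotation invariance the slide asks for; the vector length is always (k + 1) * (2 * half_width + 1).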

SLIDE 51

Experiments

How well can SAD predict DA boundaries and types?

  • in this work, we decided to use oracle speech activity
  • want to know the inherent information

three specific questions

  1. Do other talkers matter?
  2. How many others (K) should be considered?
  3. What width (T) of temporal context is needed?

K and T have a conversation analysis interpretation

  • talk is predominantly one-at-a-time → K is small
  • turns are locally managed → T is small

SLIDE 52

Effect of Context Size (T) and Number (K) of Interlocutors

[Figure: average F-score over 8 classes (vertical axis, 20 to 30) versus T in seconds (horizontal axis: 1, 2, 5, 10, 15, 20, 40), with curves for K = 0, K = 1, K = 2, K = 3, and a no-context reference]

  • considering K≥1 most-talkative interlocutors is always better
  • considering the K = 1 most-talkative suffices
  • performance for K≥1 flattens out as T → 10 seconds
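For readers unfamiliar with the metric on the vertical axis, the sketch below shows one common way to compute an F-score per class and average it over classes. It assumes frame-level reference and hypothesis labels and an unweighted mean; the slides do not state the exact scoring protocol, so treat this as an illustration only.

```python
from typing import List, Sequence


def f_score(ref: Sequence[str], hyp: Sequence[str], label: str) -> float:
    """F-score (harmonic mean of precision and recall) for one class."""
    tp = sum(r == label and h == label for r, h in zip(ref, hyp))
    fp = sum(r != label and h == label for r, h in zip(ref, hyp))
    fn = sum(r == label and h != label for r, h in zip(ref, hyp))
    if tp == 0:
        return 0.0  # no correct detections for this class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f_score(ref: Sequence[str], hyp: Sequence[str],
                    labels: List[str]) -> float:
    """Unweighted mean of the per-class F-scores."""
    return sum(f_score(ref, hyp, lab) for lab in labels) / len(labels)


# Toy frame labels using three of the DA type codes (hypothetical data).
ref = ["s", "s", "s", "b", "q", "q"]
hyp = ["s", "s", "b", "b", "q", "s"]
avg = average_f_score(ref, hyp, ["s", "b", "q"])
```

Because the mean is unweighted, rare classes such as h or fg count as much as statements, which is why gains on short DA types can move the average noticeably.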

SLIDE 53

Effect of Adding Other Talkers

DA Type               K = 0      K = 3    ∆F/Forig (%)
Statement        s     91.4  →   91.3        −0.08
Question         q     23.4  →   26.3       +12.3†
Backchannel      b     56.7  →   57.8        +1.9†
Acknowledgment   bk    12.6  →   14.9       +18.5
Assert           aa     8.7  →   13.0       +49.4†
Floor holder     fh    21.7  →   25.6       +18.3†
Floor grabber    fg    10.4  →   13.7       +31.8
Hold             h      1.1  →    6.3      +485.6†

  • large improvements for all but statements and backchannels
  • for backchannels, already doing well at K = 0
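The last column is read here as a relative change in F-score. As a worked check, assuming that reading and using the rounded values from the aa row:

```latex
\[
\frac{\Delta F}{F_{\mathrm{orig}}}
  = \frac{F_{K=3} - F_{K=0}}{F_{K=0}} \times 100\%,
\qquad
\text{aa: } \frac{13.0 - 8.7}{8.7} \times 100\% \approx 49.4\%
\]
```

Recomputing other rows from the rounded F-scores gives slightly different figures (for example about 18.0% rather than 18.3% for fh), presumably because the reported values were derived from unrounded scores.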

SLIDE 54

Further Results

  1. by adding speech activity, we achieved improvements over a state-of-the-art lexical DA recognizer
       particularly for floor grabbers, asserts, and questions
       remarkable because the lexical system uses true words
  2. large and significant improvements for DA-terminal phenomena, in particular for interruption (F = 10.7% → 22.6%)

SLIDE 55

Summary

GOAL:

  • given only speech/non-speech activity
  • jointly segment and classify into DAs

APPROACH:

  • frame-level HMM decoding
  • consider (target speaker and) interlocutor activity

RESULTS:

  • can actually get a lot out of speech/non-speech
  • it’s useful to model the other talkers
  • sufficient to consider the single locally most-talkative interlocutor, K = 1
  • sufficient to consider a temporal window of T = 10 seconds
  • additional benefit: complementary to lexical information
  • additional benefit: improved recognition of DA termination

SLIDE 56

THANK YOU

SLIDE 57

DA Type               Lexical    Lexical & VocInt    ∆F (% rel)
Floor grabber    fg    24.5   →       27.0             +9.8*
Hold             h     41.5   →       42.3             +2.0*
Floor holder     fh    63.5   →       64.5             +1.5
Backchannel      b     77.0   →       77.9             +1.1*
Acknowledgment   bk    56.3   →       56.0             −0.5
Assert           aa    40.0   →       42.0             +5.0*†
Question         q     39.8   →       42.5             +6.8*†
Statement        s     93.3   →       93.5             +0.2*†
Interruption           21.9   →       34.1            +56.0*†
Abandonment            13.0   →       14.4            +10.3†
Termination            69.1   →       69.6             +0.7†
