An interactive timeline for Speech Database Browsing
Benoit Favre, SRI STAR Lab Seminar Series, 2007-08-02

SLIDE 1

Introduction Speech Database Browsing Prototype Conclusion

An interactive timeline for Speech Database Browsing

Benoit Favre SRI – STAR Lab Seminar Series 2007-08-02

1 / 24

SLIDE 2

Who am I?

Benoit Favre
  • PhD “Automatic Speech Summarization”, at LIA
  • Postdoc at ICSI until March 2008 (sentence segmentation)
  • favre@icsi.berkeley.edu

Former lab: Laboratoire Informatique d’Avignon (LIA)
  • http://www.lia.univ-avignon.fr – English coming soon
  • Speech group (∼10 permanent and 20 PhD students)
      • Dialogue systems (Renato De Mori)
      • Speaker id/diarization (Alize toolkit, Jean-François Bonastre)
      • STT: French and resource-sparse languages
      • Voice/Language pathologies

2 / 24

SLIDE 4

Outline

  1. Introduction
  2. Speech Database Browsing
       • Context
       • Interactive timeline
  3. Prototype
       • Demo
       • Implementation
       • Performance
  4. Conclusion

3 / 24

SLIDE 6

Application context: spoken information retrieval

  • SMS: text-based query (e.g. “baseball results”)
  • Generate a spoken summary of the news
  • Audio delivered by MMS

[Figure: query sent by SMS, audio returned by MMS]

5 / 24

SLIDE 9

Approaches

Knowledge rich
  • Database of information items
  • Text generation
  • Speech synthesis

Open domain (data driven)
  • Collect broadcast news (BN) and/or other sources
  • Select informative segments (sentences)
  • Segment playback

Hybrid
  • Fill the knowledge base from collected BN
  • Contextualize the segment playback with speech synthesis
  • ...

6 / 24

SLIDE 13

From text to speech summarization

Rich transcription
  • Acoustic segmentation, diarization
  • Speech-to-text transcript
  • Information extraction

Summarization by sentence selection
  • Impact of STT errors (and other RT errors)
  • Requires accurate sentence boundaries
  • Perception of “cut-and-paste”

Audio-only features
  • Speaker state and identity
  • Emphasis
  • Speech quality

7 / 24

SLIDE 16

My work at LIA

Set up a rich transcription processing chain
  • Speeral toolkit for STT
  • Alize platform for diarization
  • Word-lattice-based NE tagging
  • CRF-based sentence segmentation

Build and evaluate a text summarization system
  • MMR-LSA summarization system
  • Document Understanding Conference (DUC) evaluation
  • Impact on audio: simulate ASR

Study possible user interactions
  • Speech database browsing
  • Interactive timeline

Next PhD student: audio-only features

8 / 24

SLIDE 21

Constraints

Continuous audio archives (BN, meetings...)
  • “Decades” of recordings
  • Multiple sources

Why isn’t “raw” summarization suitable?
  • Reintroduce context
  • Track the source

Information retrieval → exploration
  • Structure discovery
  • Temporal vs. topical structure

Speech is bound to time
  • Wait to hear more
  • No static representation

10 / 24

SLIDE 25

Multiscale playhead

Synchronous multiscale timeline
  • Slices representing years, months, days...
  • Dragging one slice synchronizes the others
  • Easy “time travel” at every granularity

Annotation

[Figure: timeline made of three synchronized slices (years 2001 to 2005; months March to July; days Sun. 11 to Thu. 15). The playhead marks the current time location, 05/13/2003 8h31 pm; a slice can be dragged to go forward or backward in time.]

11 / 24
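The synchronization between slices can be sketched as follows. This is only an illustration, not the prototype's code: it assumes that all slices are views of a single playhead timestamp, so that dragging any slice moves the playhead and the other slices follow. The names `Timeline`, `drag`, `slices` and `shift_months` are hypothetical.

```python
import calendar
from dataclasses import dataclass
from datetime import datetime, timedelta


def shift_months(t: datetime, months: int) -> datetime:
    """Move t by whole calendar months, clamping the day (Jan 31 -> Feb 28/29)."""
    index = t.year * 12 + (t.month - 1) + months
    year, month = index // 12, index % 12 + 1
    day = min(t.day, calendar.monthrange(year, month)[1])
    return t.replace(year=year, month=month, day=day)


@dataclass
class Timeline:
    playhead: datetime  # current time location, shared by every slice

    def drag(self, scale: str, steps: int) -> None:
        """Drag one slice by `steps` units; positive steps go forward in time."""
        if scale == "year":
            self.playhead = shift_months(self.playhead, 12 * steps)
        elif scale == "month":
            self.playhead = shift_months(self.playhead, steps)
        elif scale == "day":
            self.playhead += timedelta(days=steps)
        else:
            raise ValueError(f"unknown scale: {scale}")

    def slices(self, width: int = 2) -> dict:
        """Labels displayed in each slice, centred on the playhead."""
        offsets = range(-width, width + 1)
        return {
            "year": [shift_months(self.playhead, 12 * o).year for o in offsets],
            "month": [shift_months(self.playhead, o).strftime("%B") for o in offsets],
            "day": [(self.playhead + timedelta(days=o)).strftime("%a. %d") for o in offsets],
        }


timeline = Timeline(datetime(2003, 5, 13, 20, 31))
print(timeline.slices())
timeline.drag("day", 1)     # the month and year slices follow the playhead
timeline.drag("month", -2)
print(timeline.playhead)
```

Because every slice is recomputed from the same playhead, no slice can drift out of sync with the others, whichever one the user drags.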

SLIDE 26

Multiscale playhead

Synchronous multiscale timeline

Annotation
  • Need for structure information
  • Topic/Event labels
  • Example from Wikipedia (Iraq war)

[Figure: the same timeline (playhead at 05/13/2003 8h31 pm) annotated with event labels: “Invasion”, “Iraq disarmament crisis”, “The insurgency expands”, “Coalition Provisional Authority and the Iraq Survey Group”, “End of major combat”.]

11 / 24

SLIDE 27

Automatic Annotation

Constraints
  • Reflect a user query
  • Highlight regions of interest
  • Interactive

Approach
  • Relevance density (information retrieval)
  • Anchorage points (automatic summarization)

12 / 24

SLIDE 31

Screen capture (and demo if lucky)

14 / 24

SLIDE 32

Information density

  • Keep the n most relevant sentences
  • Okapi IR model [Robertson et al], P(R|D, Q):

    \[ P(R \mid D, Q) \sim \prod_{w} \frac{P_w (1 - \bar{P}_w)}{\bar{P}_w (1 - P_w)} \sim \sum_{w} \log f(w, D, \Lambda) \]

  • Stop-word removal
  • Context modeling (interpolation with neighboring sentences)

[Figure: relevance density plotted along the time axis]

15 / 24
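A relevance density of this kind might be computed as in the sketch below. It rests on several assumptions that the slide does not confirm: each sentence is treated as a document, the term weight f is the usual Okapi BM25 formula with default parameters `k1` and `b`, the stop-word list is a toy one, and context modeling is a linear interpolation with the two immediate neighbors. All function names are hypothetical.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "a", "of", "in", "and", "to", "is"}  # toy list (assumption)


def tokenize(text):
    """Lowercase, split on whitespace and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def bm25_scores(sentences, query, k1=1.2, b=0.75):
    """Okapi BM25 score of each sentence for the query."""
    docs = [Counter(tokenize(s)) for s in sentences]
    lengths = [sum(d.values()) for d in docs]
    avg_len = sum(lengths) / len(docs) or 1.0
    scores = []
    for doc, length in zip(docs, lengths):
        score = 0.0
        for w in set(tokenize(query)):
            df = sum(1 for d in docs if w in d)  # sentences containing w
            idf = math.log((len(docs) - df + 0.5) / (df + 0.5) + 1.0)
            tf = doc[w]
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * length / avg_len))
        scores.append(score)
    return scores


def smooth(scores, weight=0.3):
    """Interpolate each score with the mean score of its immediate neighbours."""
    out = []
    for i, s in enumerate(scores):
        neighbours = scores[max(0, i - 1):i] + scores[i + 1:i + 2]
        context = sum(neighbours) / len(neighbours) if neighbours else 0.0
        out.append((1 - weight) * s + weight * context)
    return out


def top_n(scores, n):
    """Indices of the n highest-scoring sentences, returned in time order."""
    best = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]
    return sorted(best)


sentences = [
    "the weather will be sunny tomorrow",
    "the baseball results are in",
    "the home team won the baseball game",
    "stocks fell sharply in early trading",
]
density = smooth(bm25_scores(sentences, "baseball results"))
print(top_n(density, 2))
```

The smoothing step gives a small non-zero score to sentences next to a relevant one, which is one way to turn isolated hits into regions of interest on the timeline.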

SLIDE 33

Anchorage points: Maximal Marginal Relevance (MMR)

  • Select the m most representative sentences
  • Greedy sentence selection [Goldstein et al]:

    \[ \hat{s}_{k+1} = \arg\max_{s_i \notin \mathit{mmr}_k} \Big( \lambda \, \mathrm{coverage}(s_i, q) - (1 - \lambda) \max_{s_j \in \mathit{mmr}_k} \mathrm{redundancy}(s_i, s_j) \Big) \]

  • Duration-based stopping criterion

16 / 24
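The greedy selection can be illustrated with a minimal sketch. The assumptions here are mine: coverage and redundancy are both plain bag-of-words cosine similarities (the talk uses LSA similarities), the sentence durations are made-up values in seconds, and selection stops as soon as the best candidate would exceed the duration budget. `mmr_select` is a hypothetical name.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mmr_select(sentences, durations, query, lam=0.7, max_duration=10.0):
    """Greedy MMR selection with a duration-based stopping criterion."""
    vectors = [Counter(s.lower().split()) for s in sentences]
    q = Counter(query.lower().split())
    selected, total = [], 0.0

    def score(i):
        # Redundancy is the highest similarity to any sentence already selected.
        redundancy = max((cosine(vectors[i], vectors[j]) for j in selected), default=0.0)
        return lam * cosine(vectors[i], q) - (1 - lam) * redundancy

    while len(selected) < len(sentences):
        candidates = [i for i in range(len(sentences)) if i not in selected]
        best = max(candidates, key=score)
        if total + durations[best] > max_duration:
            break  # the summary would become too long
        selected.append(best)
        total += durations[best]
    return selected


sentences = [
    "baseball results from last night",
    "results from last night baseball",  # same words as the first sentence
    "the home team won the final game",
    "baseball fans celebrated the results downtown",
]
durations = [3.0, 3.0, 4.0, 4.0]  # seconds, hypothetical values
print(mmr_select(sentences, durations, "baseball results", max_duration=8.0))
```

In this toy run the second sentence is as relevant as the first but fully redundant with it, so the redundancy term pushes it behind the fourth sentence.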

SLIDE 40

Latent Semantic Analysis (LSA)

  • Similarity between sentences (Generalized VSM)
      “Chris purchased a BMW”
      “Mr. Jones bought a car”

[Figure: sentence 1 and sentence 2 plotted against word axes (word 1, word 2) and against latent dimensions (dim 1, dim 2)]

  • Cooccurrence matrix (lexicon × lexicon, sliding window)
  • Train on a big corpus [Peters et al]
  • Reduce the matrix by SVD: \( X^{*} = U \Sigma_k V^{T} \)
  • Project sentences: \( s^{*} = \Sigma_k^{-1} U^{T} s \)
  • Cosine similarity: \( \mathrm{cosine}(a, b) = \frac{a \cdot b}{|a|\,|b|} \)

17 / 24
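The effect of the cooccurrence matrix can be shown on a toy corpus. The sketch below is deliberately simplified and is not full LSA: it builds the sliding-window cooccurrence counts and compares sentences in that space with the cosine, but leaves out the SVD reduction. The corpus, the window size and the function names are assumptions made for illustration.

```python
import math
from collections import defaultdict


def cooccurrence(corpus, window=2):
    """Count how often two words appear within `window` positions of each other."""
    counts = defaultdict(lambda: defaultdict(float))
    for sentence in corpus:
        words = sentence.lower().split()
        for i, w in enumerate(words):
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if j != i:
                    counts[w][words[j]] += 1.0
    return counts


def sentence_vector(sentence, counts):
    """Represent a sentence as the sum of the cooccurrence rows of its words."""
    vec = defaultdict(float)
    for w in sentence.lower().split():
        for context, c in counts.get(w, {}).items():
            vec[context] += c
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


corpus = [
    "chris purchased a new car",
    "chris bought a new car",
    "jones purchased a bmw",
    "jones bought a bmw",
    "a bmw is a car",
]
counts = cooccurrence(corpus)
s1 = sentence_vector("chris purchased a bmw", counts)
s2 = sentence_vector("jones bought a car", counts)
print(round(cosine(s1, s2), 3))
```

The two sentences share only the word "a", so a plain word-overlap cosine would be 0.25; in the cooccurrence space they come out much closer because "purchased"/"bought" and "bmw"/"car" occur in similar contexts.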

SLIDE 42

Performance

ESTER 2005 Evaluation (French BN)

Task                    Perf.   Measure
Speech detection        99      F1-m
Speech+Music det.       92      F1-m
Music detection         54      F1-m
Diarization             19      %err
STT                     22      WER
Sentence Segmentation   68      F1-m
Named Entities          63      F1-m

18 / 24

SLIDE 43

Document Understanding Evaluation

Multidocument, user-oriented text summarization
  • 50 topics, 25 newswire documents per topic
  • Human judgments (linguistic quality and responsiveness)
  • Automatic judgments (not trivial at all)

ROUGE
  • Recall in n-grams against a set of hand-written summaries
  • Correlated with human judgments

[Figure: the hypothesis is compared with multiple references; Rouge = NB(n-grams shared with the references) / NB(n-grams in the references)]

19 / 24
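As an illustration of the recall computation, here is a minimal ROUGE-N sketch. It follows the usual definition (matches clipped by the hypothesis counts, summed over all references) with whitespace tokenization; it is a simplification, not the official evaluation toolkit, and the example sentences are made up.

```python
from collections import Counter


def ngrams(text, n):
    """Multiset of word n-grams in a text."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def rouge_n(hypothesis, references, n=2):
    """N-gram recall of the hypothesis against one or more reference summaries."""
    hyp = ngrams(hypothesis, n)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref, n)
        # A reference n-gram is matched at most as often as it occurs in the hypothesis.
        matched += sum(min(c, hyp[g]) for g, c in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


references = ["the cat sat on the mat", "a cat was sitting on the mat"]
hypothesis = "the cat sat on a mat"
print(rouge_n(hypothesis, references, n=2))
```

Since the measure is a recall, the denominator counts the n-grams of the references only; a longer hypothesis is never penalized for extra material.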

SLIDE 45

DUC Results on text documents

LIA submission at DUC 2006, 2007
  • Fusion of up to 7 (sentence ranking) systems
  • A lot of heuristics, linguistic pre/post processing

[Figure: bar chart of Rouge-2 scores (axis from 0.02 to 0.1) for Fusion, MMR-LSA, S3, S2, S5, S4 and the Baseline]

20 / 24

SLIDE 46

Simulating spoken content

Simulated STT on DUC documents
  • Uniform random errors
  • Worst case for a summarizer

Conditions
  • Noisy: word errors appear in the summary
  • Cleaned: only sentence selection is affected

Degradation      WER    R2 Noisy            R2 Cleaned
None             0.0    0.08407             0.08407
Replace OOV      1.0    0.08255 (-1.8%)     0.08318 (-1.0%)
Remove OOV       1.0    0.08283 (-1.4%)     0.08279 (-1.5%)
Replace NE      10.4    0.06741 (-19.8%)    0.08029 (-4.4%)
Remove NE       10.4    0.07211 (-14.2%)    0.07991 (-4.9%)
Random errors   10.0    0.07440 (-11.5%)    0.08232 (-2.0%)

21 / 24
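The "random errors" condition could be simulated along the lines of the sketch below. It is an assumption-laden illustration, not the procedure used for the table: errors are substitutions only, the positions are drawn uniformly, the replacement words come from a small made-up vocabulary, and WER is the word-level edit distance divided by the reference length. `corrupt` and `wer` are hypothetical names.

```python
import random


def corrupt(words, vocabulary, error_rate, seed=0):
    """Substitute a fixed fraction of the words, at positions drawn uniformly."""
    rng = random.Random(seed)
    n_errors = round(error_rate * len(words))
    noisy = list(words)
    for i in rng.sample(range(len(words)), n_errors):
        # Always pick a replacement that differs from the original word.
        noisy[i] = rng.choice([v for v in vocabulary if v != words[i]])
    return noisy


def wer(reference, hypothesis):
    """Word error rate of a hypothesis against a non-empty reference."""
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, 1):
        cur = [i]
        for j, h in enumerate(hypothesis, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(reference)


reference = "the coalition announced the end of major combat operations in may".split()
vocabulary = sorted(set(reference)) + ["insurgency", "authority", "survey"]
noisy = corrupt(reference, vocabulary, error_rate=0.2)
print(" ".join(noisy), wer(reference, noisy))
```

In the noisy condition the summarizer would both select and output the corrupted sentences; in the cleaned condition it would select on the corrupted text and then output the original sentences at the same positions.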

SLIDE 49

Rouge-2 / WER

[Figure: Rouge-2 (axis from 0.076 to 0.085) as a function of WER (axis from 10 to 80) for three curves: Noisy, Cleaned, and Randomly reranked clean sentences]

Head-Baseline: Rouge2 = 0.049
Random-Baseline: Rouge2 = 0.055

22 / 24

SLIDE 51

Conclusion and future work

Improving speech database browsing
  • Multi-scale interactive timeline
  • Annotation using IR and automatic summarization techniques

Future work
  • Evaluation (ergonomics and relevance)
  • Topical dimension: representation, exploration
  • Label formulation
  • Timeline of discourse → timeline of events
  • Indirect/passive querying

24 / 24
