
SLIDE 1

Prosody-Based Unsupervised Speech Summarization with Two-Layer Mutually Reinforced Random Walk

Sujay Kumar Jauhar, Yun-Nung (Vivian) Chen, Florian Metze
{sjauhar, yvchen, fmetze}@cs.cmu.edu

Language Technologies Institute, School of Computer Science, Carnegie Mellon University

The 6th International Joint Conference on Natural Language Processing – Oct. 14-18, 2013

SLIDE 2

Outline

- Introduction
- Approach
- Experiments
- Conclusion

SLIDE 3

Outline

- Introduction
  - Motivation
  - Extractive Summarization
- Approach
- Experiments
- Conclusion


SLIDE 5

Motivation

- Speech Summarization
  - Spoken documents are more difficult to browse than text
  - Summaries are easy to browse, save time, and convey the key points
- Prosodic Features
  - Speakers may use prosody to implicitly convey the importance of parts of the speech

SLIDE 6

Outline

- Introduction
  - Motivation
  - Extractive Summarization
- Approach
- Experiments
- Conclusion

SLIDE 7

Extractive Summarization (1/2)

- Extractive Speech Summarization
  - Select the indicative utterances in a spoken document
  - Cascade the selected utterances to form a summary

[Diagram: utterances 1 … n of a spoken document; the selected utterances are cascaded into the extractive summary]

SLIDE 8

Extractive Summarization (2/2)

- Selection of Indicative Utterances
  - Each utterance U in a spoken document d is given an importance score I(U, d)
  - The indicative utterances are selected based on I(U, d)
  - The number of utterances selected as the summary is decided by a predefined ratio

U = t_1 t_2 ... t_i ... t_n   (utterance U as a sequence of terms t_i)

I(U, d) = Σ_{i=1}^{n} s(t_i, d)   (importance score)

where s(t_i, d) is a statistical measure (e.g., TF-IDF) of term t_i in document d.
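The importance score above can be sketched with TF-IDF as the statistical measure s(t_i, d). This is a minimal illustration; the function names and corpus shape are ours, not from the paper:

```python
import math
from collections import Counter

def tfidf_scores(documents):
    """Compute TF-IDF for every term of every document.

    documents: list of token lists. Returns one {term: score} dict
    per document, where score = tf * log(N / df).
    """
    df = Counter()
    for doc in documents:
        df.update(set(doc))          # document frequency per term
    n_docs = len(documents)
    scores = []
    for doc in documents:
        tf = Counter(doc)            # term frequency in this document
        scores.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return scores

def importance(utterance, doc_scores):
    """I(U, d) = sum over terms t_i in U of s(t_i, d)."""
    return sum(doc_scores.get(t, 0.0) for t in utterance)
```

A term that occurs in every document gets zero weight, so an utterance made only of such terms scores 0.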

SLIDE 9

Outline

- Introduction
- Approach
  - Prosodic Feature Extraction
  - Graph Construction
  - Two-Layer Mutually Reinforced Random Walk
- Experiments
- Conclusion


SLIDE 11

Prosodic Feature Extraction

For each pre-segmented audio file, we extract:

- number of syllables
- number of pauses
- duration time: speaking time including pauses
- phonation time: speaking time excluding pauses
- speaking rate: #syllables / duration time
- articulation rate: #syllables / phonation time
- fundamental frequency (Hz): avg, max, min
- energy (Pa²/sec)
- intensity (dB)
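The rate features follow directly from the syllable, pause, and duration measurements. A minimal sketch (the field names are illustrative):

```python
def prosodic_features(n_syllables, pause_durations, total_duration):
    """Derive rate features from pre-segmented audio measurements.

    n_syllables: syllable count in the utterance.
    pause_durations: list of pause lengths in seconds.
    total_duration: utterance length in seconds, including pauses.
    """
    # phonation time: speaking time with the pauses removed
    phonation_time = total_duration - sum(pause_durations)
    return {
        "n_syllables": n_syllables,
        "n_pauses": len(pause_durations),
        "duration": total_duration,
        "phonation": phonation_time,
        "speaking_rate": n_syllables / total_duration,
        "articulation_rate": n_syllables / phonation_time,
    }
```

Articulation rate is always at least the speaking rate, since pauses only shrink the denominator.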

SLIDE 12

Outline

- Introduction
- Approach
  - Prosodic Feature Extraction
  - Graph Construction
  - Two-Layer Mutually Reinforced Random Walk
- Experiments
- Conclusion

SLIDE 13

Graph Construction (1/3)

- Utterance-Layer
  - Each node is an utterance in the meeting document

[Diagram: utterance-layer nodes U1–U7]

SLIDE 14

Graph Construction (2/3)

- Utterance-Layer
  - Each node is an utterance in the meeting document
- Prosody-Layer
  - Each node is a prosodic feature

[Diagram: utterance-layer nodes U1–U7 and prosody-layer nodes P1–P6]

SLIDE 15

Graph Construction (3/3)

- Utterance-Layer
  - Each node is an utterance in the meeting document
- Prosody-Layer
  - Each node is a prosodic feature
- Between-Layer Relation
  - The weight of each edge is the normalized value of the prosodic feature extracted from the utterance

[Diagram: utterance-layer nodes U1–U7 connected to prosody-layer nodes P1–P6]
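A minimal sketch of the between-layer edge weights. The slides say only "normalized", so the particular scheme here, normalizing each feature's values across utterances so they sum to one, is an assumption:

```python
def between_layer_weights(feature_matrix):
    """Build edge weights between the utterance and prosody layers.

    feature_matrix: rows = utterances, columns = prosodic features
    (raw non-negative values). Each column is normalized to sum to 1,
    so a feature's edge weights form a distribution over utterances.
    """
    n_feats = len(feature_matrix[0])
    col_sums = [sum(row[j] for row in feature_matrix) for j in range(n_feats)]
    return [[row[j] / col_sums[j] if col_sums[j] else 0.0
             for j in range(n_feats)]
            for row in feature_matrix]
```

With this normalization, an utterance's weight on a feature directly reflects how strongly it exhibits that feature relative to the other utterances.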

SLIDE 16

Outline

- Introduction
- Approach
  - Prosodic Feature Extraction
  - Graph Construction
  - Two-Layer Mutually Reinforced Random Walk
- Experiments
- Conclusion

SLIDE 17

Two-Layer Mutually Reinforced Random Walk (1/2)

- Mathematical Formulation
  - [Equation: utterance scores at the (t+1)-th iteration]

[Diagram: utterance-layer nodes U1–U7 and prosody-layer nodes P1–P6]

SLIDE 18

Two-Layer Mutually Reinforced Random Walk (1/2)

- Mathematical Formulation
  - [Equation term: original importance of utterances]
- Original importance
  - Utterance: equal weight

[Diagram: utterance-layer nodes U1–U7 and prosody-layer nodes P1–P6]

SLIDE 19

Two-Layer Mutually Reinforced Random Walk (1/2)

- Mathematical Formulation
  - [Equation term: scores propagated from prosody nodes, weighted by prosodic values]
- Original importance
  - Utterance: equal weight

[Diagram: utterance-layer nodes U1–U7 and prosody-layer nodes P1–P6]

SLIDE 20

Two-Layer Mutually Reinforced Random Walk (1/2)

- Mathematical Formulation
  - [Equation: prosody scores at the (t+1)-th iteration]
- Original importance
  - Utterance: equal weight

[Diagram: utterance-layer nodes U1–U7 and prosody-layer nodes P1–P6]

SLIDE 21

Two-Layer Mutually Reinforced Random Walk (1/2)

- Mathematical Formulation
  - [Equation term: original importance of prosodic features]
- Original importance
  - Utterance: equal weight
  - Prosody: equal weight

[Diagram: utterance-layer nodes U1–U7 and prosody-layer nodes P1–P6]

SLIDE 22

Two-Layer Mutually Reinforced Random Walk (1/2)

- Mathematical Formulation
  - [Equation term: scores propagated from utterances, weighted by prosodic values]
- Original importance
  - Utterance: equal weight
  - Prosody: equal weight

[Diagram: utterance-layer nodes U1–U7 and prosody-layer nodes P1–P6]

SLIDE 23

Two-Layer Mutually Reinforced Random Walk (2/2)

- Mathematical Formulation
  - Utterance node U gets a higher score when more important prosodic features have higher weights corresponding to utterance U

SLIDE 24

Two-Layer Mutually Reinforced Random Walk (2/2)

- Mathematical Formulation
  - Utterance node U gets a higher score when more important prosodic features have higher weights corresponding to utterance U
  - Prosody node P gets a higher score when more important utterances have higher weights corresponding to prosodic feature P
  - → Important utterances and prosodic features are learned in an unsupervised manner
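The mutual reinforcement described above can be sketched as an iterative walk over the bipartite graph. The exact update equations are images in the original deck, so this is an illustrative reconstruction: each layer's scores interpolate a uniform prior (weight 1−α) with mass passed across the between-layer edges (weight α), then renormalize:

```python
def two_layer_random_walk(W, alpha=0.9, n_iter=100, tol=1e-8):
    """Mutually reinforced random walk over utterance and prosody layers.

    W: n_utterances x n_features between-layer weights (non-negative).
    Returns (utterance_scores, prosody_scores), each summing to 1.
    Sketch only; not the paper's exact update equations.
    """
    n_u, n_p = len(W), len(W[0])
    v_u = [1.0 / n_u] * n_u          # original importance: equal weight
    v_p = [1.0 / n_p] * n_p          # original importance: equal weight
    for _ in range(n_iter):
        # utterance scores: prior + scores propagated from prosody nodes,
        # weighted by the prosodic values on the between-layer edges
        new_u = [(1 - alpha) / n_u +
                 alpha * sum(W[i][j] * v_p[j] for j in range(n_p))
                 for i in range(n_u)]
        s = sum(new_u)
        new_u = [x / s for x in new_u]
        # prosody scores: prior + scores propagated from utterances
        new_p = [(1 - alpha) / n_p +
                 alpha * sum(W[i][j] * v_u[i] for i in range(n_u))
                 for j in range(n_p)]
        s = sum(new_p)
        new_p = [x / s for x in new_p]
        converged = max(abs(a - b) for a, b in zip(new_u, v_u)) < tol
        v_u, v_p = new_u, new_p
        if converged:
            break
    return v_u, v_p
```

At convergence, utterances with strong values on highly ranked prosodic features end up with the highest scores, which is the mutual-reinforcement behavior the slides describe.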

SLIDE 25

Outline

- Introduction
- Approach
- Experiments
  - Experimental Setup
  - Evaluation Metrics
  - Results
  - Analysis
- Conclusion


SLIDE 27

Experimental Setup

- CMU Speech Meeting Corpus
  - 10 meetings from 2006/04 – 2006/06
  - #Speakers: 6 in total, 2–4 per meeting
  - WER = 44%
- Reference Summaries
  - Manually labeled by two annotators with three "noteworthiness" levels (1–3)
  - Utterances with level 3 are extracted as reference summaries
- Parameter Setting
  - α = 0.9
  - Extractive summary ratios = 10%, 20%, 30%
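The predefined-ratio selection can be sketched as keeping the top-scoring utterances. This helper is hypothetical; the slides only say the ratio decides how many utterances are selected:

```python
def select_summary(utterances, scores, ratio=0.1):
    """Select the top-scoring utterances as the extractive summary.

    The number selected is a predefined ratio of the utterance count
    (10%, 20%, or 30% in the experiments). Selected utterances are
    returned in their original document order.
    """
    k = max(1, round(ratio * len(utterances)))
    ranked = sorted(range(len(utterances)),
                    key=lambda i: scores[i], reverse=True)
    return [utterances[i] for i in sorted(ranked[:k])]
```

Restoring document order matters because the summary is formed by cascading the selected utterances in sequence.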

SLIDE 28

Outline

- Introduction
- Approach
- Experiments
  - Experimental Setup
  - Evaluation Metrics
  - Results
  - Analysis
- Conclusion

SLIDE 29

Evaluation Metrics

- ROUGE-1
  - F-measure of matched unigrams between the extracted summary and the reference summary
- ROUGE-L (Longest Common Subsequence)
  - F-measure of the matched LCS between the extracted summary and the reference summary
- Average Relevance Score
  - Average noteworthiness score of the extracted utterances
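Both ROUGE variants can be sketched at the token level (unsmoothed, single-reference; a simplification of the official ROUGE toolkit):

```python
from collections import Counter

def rouge_1_f(candidate, reference):
    """ROUGE-1 F-measure: unigram overlap between a candidate and a
    reference summary, given as token lists."""
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum((cand & ref).values())   # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def lcs_len(x, y):
    """Length of the longest common subsequence (basis of ROUGE-L)."""
    dp = [0] * (len(y) + 1)                # one-row DP over y
    for xi in x:
        prev = 0                           # dp value from previous row, col j-1
        for j, yj in enumerate(y, 1):
            if xi == yj:
                prev, dp[j] = dp[j], prev + 1
            else:
                prev, dp[j] = dp[j], max(dp[j], dp[j - 1])
    return dp[-1]
```

ROUGE-L then plugs `lcs_len(candidate, reference)` into the same precision/recall/F computation in place of the unigram overlap.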

SLIDE 30

Outline

- Introduction
- Approach
- Experiments
  - Experimental Setup
  - Evaluation Metrics
  - Results
  - Analysis
- Conclusion

SLIDE 31

Baselines

- Longest: the longest utterances by #tokens
- Begin: the utterances that appear at the beginning of the document
- Latent Topic Entropy (LTE): estimates the "focus" of an utterance; lower topic entropy means more topically informative
- TFIDF: the average TF-IDF score of all words in the utterance

SLIDE 32

Results

[Bar charts: Avg. Relevance, ROUGE-1, and ROUGE-L for Longest, Begin, LTE, TFIDF, and Proposed]

- For 10% summaries, Begin performs best; the proposed approach gives comparable results

SLIDE 33

Results

[Bar charts: Avg. Relevance, ROUGE-1, and ROUGE-L for Longest, Begin, LTE, TFIDF, and Proposed]

- For 20% summaries, the proposed approach outperforms all of the baselines

SLIDE 34

Results

[Bar charts: Avg. Relevance, ROUGE-1, and ROUGE-L for Longest, Begin, LTE, TFIDF, and Proposed]

- For 30% summaries, the proposed approach outperforms all of the baselines

SLIDE 35

Outline

- Introduction
- Approach
- Experiments
  - Experimental Setup
  - Evaluation Metrics
  - Results
  - Analysis
- Conclusion

SLIDE 36

Analysis

- Based on the converged scores of the prosodic features
  - Most predictive features: number of pauses, min pitch, avg pitch, intensity
  - Least predictive features: duration time, number of syllables, energy

SLIDE 37

Conclusion

- Two-layer mutually reinforced random walk integrates prosodic knowledge into an unsupervised model for speech summarization
- This is the first attempt at unsupervised speech summarization without using lexical information
- The proposed approach outperforms the lexically derived baselines in all but one scenario
