
Prosody-Based Unsupervised Speech Summarization with Two-Layer Mutually Reinforced Random Walk - PowerPoint PPT Presentation

Prosody-Based Unsupervised Speech Summarization with Two-Layer Mutually Reinforced Random Walk. Sujay Kumar Jauhar, Yun-Nung (Vivian) Chen, Florian Metze ({sjauhar, yvchen, fmetze}@cs.cmu.edu). The 6th International Joint Conference on Natural Language Processing.


  1. Prosody-Based Unsupervised Speech Summarization with Two-Layer Mutually Reinforced Random Walk. Sujay Kumar Jauhar, Yun-Nung (Vivian) Chen, Florian Metze ({sjauhar, yvchen, fmetze}@cs.cmu.edu). The 6th International Joint Conference on Natural Language Processing – Oct. 14-18, 2013. Language Technologies Institute, School of Computer Science, Carnegie Mellon University.

  2. Outline: Introduction | Approach | Experiments | Conclusion

  3. Outline: Introduction (Motivation, Extractive Summarization) | Approach | Experiments | Conclusion

  4. Outline: Introduction (Motivation, Extractive Summarization) | Approach | Experiments | Conclusion

  5. Motivation
     • Speech summarization: spoken documents are more difficult to browse than text; a summary makes them easy to browse, saves time, and lets listeners quickly get the key points.
     • Prosodic features: speakers may use prosody to implicitly convey the importance of parts of the speech.

  6. Outline: Introduction (Motivation, Extractive Summarization) | Approach | Experiments | Conclusion

  7. Extractive Summarization (1/2)
     • Extractive speech summarization: select the indicative utterances in a spoken document, then cascade the selected utterances to form a summary.
     • [Figure: utterances 1 through n of a spoken document, with the selected utterances concatenated into the extractive summary]

  8. Extractive Summarization (2/2)
     • Selection of indicative utterances: each utterance U in a spoken document d is given an importance score I(U, d), and the indicative utterances are selected based on I(U, d).
     • The number of utterances selected as the summary is decided by a predefined ratio.
     • For an utterance U = t_1 t_2 … t_n, the importance score is I(U, d) = (1/n) \sum_{i=1}^{n} s(t_i, d), where s(t_i, d) is a term statistical measure (e.g., TF-IDF).
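
To make the term-statistics scoring concrete, here is a minimal Python sketch of the importance score in the spirit of slide 8, using TF-IDF as the term statistical measure. The function names and the raw-count TF / log-IDF variant are illustrative choices, not taken from the paper.

```python
import math
from collections import Counter

def tfidf_scores(documents):
    """Compute per-document TF-IDF scores; each document is a list of terms."""
    df = Counter()
    for doc in documents:
        df.update(set(doc))
    n_docs = len(documents)
    scores = []
    for doc in documents:
        tf = Counter(doc)
        scores.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return scores

def utterance_importance(utterance_terms, term_scores):
    """I(U, d): average statistical score s(t_i, d) over the terms of the utterance."""
    if not utterance_terms:
        return 0.0
    return sum(term_scores.get(t, 0.0) for t in utterance_terms) / len(utterance_terms)
```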

  9. Outline: Introduction | Approach (Prosodic Feature Extraction, Graph Construction, Two-Layer Mutually Reinforced Random Walk) | Experiments | Conclusion

  10. Outline: Introduction | Approach (Prosodic Feature Extraction, Graph Construction, Two-Layer Mutually Reinforced Random Walk) | Experiments | Conclusion

  11. Prosodic Feature Extraction
      For each pre-segmented audio file, we extract:
      • number of syllables
      • number of pauses
      • duration time: speaking time including pauses
      • phonation time: speaking time excluding pauses
      • speaking rate: #syllables / duration time
      • articulation rate: #syllables / phonation time
      • fundamental frequency (Hz): average, max, min
      • energy (Pa²/sec)
      • intensity (dB)
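
As a rough illustration of how the derived rate features relate to the raw measurements, the sketch below assembles the per-utterance feature vector. The `RawMeasurements` container, its field names, and the assumption that syllable/pause counts, pitch, energy, and intensity come from an external acoustic tool (e.g., a Praat script) are all illustrative, not part of the original description.

```python
from dataclasses import dataclass

@dataclass
class RawMeasurements:
    # Raw values assumed to come from an external acoustic analysis tool (e.g., Praat).
    n_syllables: int
    n_pauses: int
    duration: float        # speaking time including pauses (s)
    phonation_time: float  # speaking time excluding pauses (s)
    f0_avg: float          # fundamental frequency statistics (Hz)
    f0_max: float
    f0_min: float
    energy: float          # Pa^2/s
    intensity: float       # dB

def prosodic_features(m: RawMeasurements) -> dict:
    """Assemble the per-utterance prosodic feature vector listed on slide 11."""
    return {
        "n_syllables": m.n_syllables,
        "n_pauses": m.n_pauses,
        "duration": m.duration,
        "phonation_time": m.phonation_time,
        "speaking_rate": m.n_syllables / m.duration if m.duration else 0.0,
        "articulation_rate": m.n_syllables / m.phonation_time if m.phonation_time else 0.0,
        "f0_avg": m.f0_avg,
        "f0_max": m.f0_max,
        "f0_min": m.f0_min,
        "energy": m.energy,
        "intensity": m.intensity,
    }
```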

  12. Outline: Introduction | Approach (Prosodic Feature Extraction, Graph Construction, Two-Layer Mutually Reinforced Random Walk) | Experiments | Conclusion

  13. Graph Construction (1/3)
      • Utterance layer: each node is an utterance in the meeting document.
      • [Figure: utterance-layer graph with nodes U1–U7]

  14. Graph Construction (2/3)
      • Utterance layer: each node is an utterance in the meeting document.
      • Prosody layer: each node is a prosodic feature.
      • [Figure: two-layer graph with prosody nodes P1–P6 above utterance nodes U1–U7]

  15. Graph Construction (3/3)
      • Utterance layer: each node is an utterance in the meeting document.
      • Prosody layer: each node is a prosodic feature.
      • Between-layer relation: the weight of the edge between an utterance and a prosodic feature is the normalized value of that prosodic feature extracted from the utterance.
      • [Figure: two-layer graph with prosody nodes P1–P6 connected to utterance nodes U1–U7]
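
A small sketch of the between-layer weight construction described on slide 15. The slide does not specify how the prosodic values are normalized, so the per-feature min-max normalization below is an assumption.

```python
import numpy as np

def between_layer_weights(feature_matrix: np.ndarray) -> np.ndarray:
    """Build utterance-to-prosody edge weights from raw prosodic feature values.

    feature_matrix: shape (n_utterances, n_prosodic_features), raw values.
    Each feature (column) is normalized so that edge weights are comparable
    across features; min-max normalization per feature is an assumption here.
    """
    mins = feature_matrix.min(axis=0, keepdims=True)
    maxs = feature_matrix.max(axis=0, keepdims=True)
    ranges = np.where(maxs - mins > 0, maxs - mins, 1.0)
    return (feature_matrix - mins) / ranges
```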

  16. Outline: Introduction | Approach (Prosodic Feature Extraction, Graph Construction, Two-Layer Mutually Reinforced Random Walk) | Experiments | Conclusion

  17. Two-Layer Mutually Reinforced Random Walk (1/2)
      • Mathematical formulation – highlighted term: utterance scores at the (t+1)-th iteration.
      • [Figure: two-layer graph with prosody nodes P1–P6 and utterance nodes U1–U7]

  18. Two-Layer Mutually Reinforced Random Walk (1/2)
      • Mathematical formulation – highlighted term: original importance of utterances.
      • Original importance – utterance: equal weight.
      • [Figure: two-layer graph with prosody nodes P1–P6 and utterance nodes U1–U7]

  19. Two-Layer Mutually Reinforced Random Walk (1/2)
      • Mathematical formulation – highlighted term: scores propagated from prosody nodes, weighted by prosodic values.
      • Original importance – utterance: equal weight.
      • [Figure: two-layer graph with prosody nodes P1–P6 and utterance nodes U1–U7]

  20. Two-Layer Mutually Reinforced Random Walk (1/2)
      • Mathematical formulation – highlighted term: prosody scores at the (t+1)-th iteration.
      • Original importance – utterance: equal weight.
      • [Figure: two-layer graph with prosody nodes P1–P6 and utterance nodes U1–U7]

  21. Two-Layer Mutually Reinforced Random Walk (1/2)
      • Mathematical formulation – highlighted term: original importance of prosodic features.
      • Original importance – utterance: equal weight; prosody: equal weight.
      • [Figure: two-layer graph with prosody nodes P1–P6 and utterance nodes U1–U7]

  22. Two-Layer Mutually Reinforced Random Walk (1/2)
      • Mathematical formulation – highlighted term: scores propagated from utterances, weighted by prosodic values.
      • Original importance – utterance: equal weight; prosody: equal weight.
      • [Figure: two-layer graph with prosody nodes P1–P6 and utterance nodes U1–U7]

  23. Two-Layer Mutually Reinforced Random Walk (2/2)
      • Mathematical formulation: an utterance node U gets a higher score when more important prosodic features have higher weights on their edges to U.

  24. Two-Layer Mutually Reinforced Random Walk (2/2)
      • An utterance node U gets a higher score when more important prosodic features have higher weights on their edges to U.
      • A prosody node P gets a higher score when more important utterances have higher weights on their edges to P.
      • The random walk therefore learns important utterances and important prosodic features jointly, in an unsupervised way.
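
The update equations on slides 17–24 appear only as images in the original deck, so the sketch below reconstructs them from the verbal description: each layer's scores at iteration t+1 interpolate (with weight α) between that layer's original importance (uniform) and the scores propagated from the other layer through the normalized between-layer weights. The interpolation form, the column normalization, and the update order are assumptions, not taken verbatim from the paper.

```python
import numpy as np

def mutually_reinforced_random_walk(W, alpha=0.9, n_iter=100, tol=1e-8):
    """Two-layer mutually reinforced random walk (sketch).

    W: (n_utterances, n_prosodic_features) between-layer edge weights
       (normalized prosodic feature values).
    Returns converged utterance scores and prosody scores.
    """
    n_u, n_p = W.shape
    # Column-stochastic propagation matrices (assumption); epsilon avoids division by zero.
    up = W / (W.sum(axis=0, keepdims=True) + 1e-12)      # prosody -> utterance
    pu = W.T / (W.T.sum(axis=0, keepdims=True) + 1e-12)  # utterance -> prosody

    # Original importance: equal weight for both layers (as stated on the slides).
    u0 = np.full(n_u, 1.0 / n_u)
    p0 = np.full(n_p, 1.0 / n_p)
    u, p = u0.copy(), p0.copy()

    for _ in range(n_iter):
        # Interpolate original importance with scores propagated from the other layer.
        u_new = (1 - alpha) * u0 + alpha * up @ p
        p_new = (1 - alpha) * p0 + alpha * pu @ u_new
        if np.abs(u_new - u).max() < tol and np.abs(p_new - p).max() < tol:
            u, p = u_new, p_new
            break
        u, p = u_new, p_new
    return u, p
```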

  25. Outline: Introduction | Approach | Experiments (Experimental Setup, Evaluation Metrics, Results, Analysis) | Conclusion

  26. Outline: Introduction | Approach | Experiments (Experimental Setup, Evaluation Metrics, Results, Analysis) | Conclusion

  27. Experimental Setup
      • CMU Speech Meeting Corpus: 10 meetings from 2006/04–2006/06; 6 speakers in total, 2–4 per meeting; WER = 44%.
      • Reference summaries: manually labeled by two annotators on a three-level "noteworthiness" scale (1–3); utterances with level 3 are extracted as the reference summaries.
      • Parameter settings: α = 0.9; extractive summary ratio = 10%, 20%, 30%.
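
Given converged utterance scores and the predefined summary ratio, selection could look like the sketch below. Cascading the selected utterances in their original temporal order follows slide 7; the rounding rule and the minimum of one utterance are assumptions.

```python
def extract_summary(utterances, scores, ratio=0.1):
    """Select the top-scoring utterances at the given ratio and cascade them
    in their original order to form the extractive summary."""
    n_select = max(1, round(len(utterances) * ratio))
    top = sorted(range(len(utterances)), key=lambda i: scores[i], reverse=True)[:n_select]
    return [utterances[i] for i in sorted(top)]
```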

  28. Outline: Introduction | Approach | Experiments (Experimental Setup, Evaluation Metrics, Results, Analysis) | Conclusion

  29. Evaluation Metrics
      • ROUGE-1: F-measure of matched unigrams between the extracted summary and the reference summary.
      • ROUGE-L (Longest Common Subsequence): F-measure of the matched LCS between the extracted summary and the reference summary.
      • Average relevance score: average noteworthiness score of the extracted utterances.
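
A minimal sketch of the ROUGE-1 F-measure as described above, computed from clipped unigram overlap counts; it omits the stemming and stopword options available in the official ROUGE toolkit.

```python
from collections import Counter

def rouge1_f(summary_tokens, reference_tokens):
    """ROUGE-1 F-measure: harmonic mean of unigram precision and recall."""
    if not summary_tokens or not reference_tokens:
        return 0.0
    sum_counts = Counter(summary_tokens)
    ref_counts = Counter(reference_tokens)
    overlap = sum(min(sum_counts[t], ref_counts[t]) for t in sum_counts)
    precision = overlap / len(summary_tokens)
    recall = overlap / len(reference_tokens)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```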

  30. Outline: Introduction | Approach | Experiments (Experimental Setup, Evaluation Metrics, Results, Analysis) | Conclusion

  31. Baselines
      • Longest: the longest utterances, measured by the number of tokens.
      • Begin: the utterances that appear at the beginning of the document.
      • Latent Topic Entropy (LTE): estimates the "focus" of an utterance; lower topic entropy indicates a more topically informative utterance.
      • TFIDF: the average TF-IDF score of all words in the utterance.
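
For concreteness, here are rough sketches of the baseline scoring functions. The LTE score in particular is only a schematic rendering (negative entropy of an utterance's topic distribution, so lower entropy scores higher); the original latent topic entropy measure may be defined somewhat differently.

```python
import math

def longest_score(utterance_tokens):
    """Longest baseline: score an utterance by its number of tokens."""
    return len(utterance_tokens)

def tfidf_baseline_score(utterance_tokens, term_scores):
    """TFIDF baseline: average TF-IDF score of the words in the utterance."""
    if not utterance_tokens:
        return 0.0
    return sum(term_scores.get(t, 0.0) for t in utterance_tokens) / len(utterance_tokens)

def lte_score(topic_distribution):
    """LTE baseline (sketch): use negative topic entropy as the score,
    so more topically focused utterances rank higher."""
    entropy = -sum(p * math.log(p) for p in topic_distribution if p > 0)
    return -entropy
```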

  32. 10% Results
      • [Bar charts: ROUGE-1, ROUGE-L, and average relevance for Longest, Begin, LTE, TFIDF, and the proposed approach]
      • For 10% summaries, Begin performs best and the proposed approach gives comparable results.

  33. 10% & 20% Results
      • [Bar charts: ROUGE-1, ROUGE-L, and average relevance for Longest, Begin, LTE, TFIDF, and the proposed approach]
      • For 20% summaries, the proposed approach outperforms all of the baselines.

  34. 10% & 20% & 30% Results
      • [Bar charts: ROUGE-1, ROUGE-L, and average relevance for Longest, Begin, LTE, TFIDF, and the proposed approach]
      • For 30% summaries, the proposed approach outperforms all of the baselines.

  35. Outline: Introduction | Approach | Experiments (Experimental Setup, Evaluation Metrics, Results, Analysis) | Conclusion

  36. Analysis
      Based on the converged scores for the prosodic features:
      • Most predictive features: number of pauses, minimum pitch, average pitch, intensity.
      • Least predictive features: duration time, number of syllables, energy.

  37. Conclusion
      • The two-layer mutually reinforced random walk integrates prosodic knowledge into an unsupervised model for speech summarization.
      • We present the first attempt at performing unsupervised speech summarization without using lexical information.
      • Compared with the lexically derived baselines, the proposed approach outperforms all of them in all but one scenario.


