Recent Advances in Automatic Speech Summarization
Sadaoki Furui
Department of Computer Science, Tokyo Institute of Technology
Outline
- Introduction
- Speech-to-text & speech-to-speech summarization
- Summarization methods
– Sentence extraction-based methods
– Sentence compaction-based methods
– Combination of sentence extraction and sentence compaction
– Sentence segmentation
- Evaluation schemes
– Extrinsic and intrinsic evaluations
– SumACCY
– ROUGE
– Experimental results
- Conclusions
Major speech recognition applications
- Conversational systems for accessing information services (e.g. automatic flight status or stock quote information systems)
- Systems for transcribing, understanding, and extracting information from ubiquitous speech documents (e.g. broadcast news, meetings, lectures, presentations and voicemails)

Spoken Document Retrieval (SDR)
[System diagram: at enrollment, audio entering the archive undergoes rich transcription (speech recognition & audio tagging, audio segmentation & clustering) by the spoken document transcriber, building metadata (transcriptions, segmentation/cluster information) and a retrieval index; at query & retrieval time, a user query to the web server triggers audio clip requests, and matching clips are fetched and transcoded from the audio archive.]
Spoken document retrieval system at Univ. Colorado Boulder
[System diagram: ASR transcription in multiple languages feeds successive analysis levels: word level (named entity detection: people, locations, organizations), building-block level (segmentation & diarization: style chunks, speaker turns, paragraphs), concept level (information extraction: titles, key concepts, relationships), topic level (document summarization: a concise abstract of desired length), and structure level (analysis & organization), supporting information retrieval & browsing and machine translation.]
Spoken document retrieval (SDR)
(J. Hansen, 2005)
Speech transcription and summarization for spoken document retrieval (SDR)
- Although speech is the most natural and effective method of communication between human beings, it is not easy to quickly review, retrieve and reuse speech documents if they are simply recorded as audio signals.
- Therefore, transcribing speech is expected to become a crucial capability for the coming IT era.
- Speech summarization, which extracts important information and removes redundant and incorrect information, is necessary for transcribing spontaneous speech.
- Efficient speech summarization saves time for reviewing speech documents and improves the efficiency of document retrieval.
- Summarization results can be presented as either text or speech.
Classification of speech summarization methods
Audience
- Generic summarization
- User-focused summarization
- Query-focused summarization
- Topic-focused summarization
Function
- Indicative summarization
- Informative summarization
Extracts vs. abstracts
- Extract: consists wholly of portions from the source
- Abstract: contains material which is not present in the source
Output modality
- Speech-to-text summarization
- Speech-to-speech summarization
Single vs. multiple documents
Indicative vs. informative summarization
[Diagram: indicative summarization extracts topics and target sentences from raw utterances, while informative summarization additionally relies on information extraction and speech understanding to produce an abstract or summarized utterances; presentation summarization is the target considered here.]
Fundamental problems with speech summarization
- Disfluencies, repetitions, word fragments, etc.
- Difficulties of sentence segmentation
- More spontaneous parts of speech (e.g. interviews in broadcast news) are less amenable to standard text summarization
- Speech recognition errors
Speech-to-text/speech summarization
Speech-to-text summarization:
a) The documents can be easily looked through
b) The parts of the documents that interest users can be easily extracted
c) Information extraction and retrieval techniques can be easily applied to the documents

Speech-to-speech summarization:
a) Wrong information due to speech recognition errors can be avoided
b) Prosodic information, such as the emotion of speakers, that is conveyed only by speech can be presented
Speech-to-speech summarization
Two approaches:
- Simply presenting concatenated speech segments that are extracted from the original speech, or
- Synthesizing the summarized text using a speech synthesizer.

– Since state-of-the-art speech synthesizers still cannot produce completely natural speech, the former method can more easily produce better-quality summaries, and it does not have the problem of synthesizing wrong messages due to speech recognition errors.
– The major problem is how to avoid the unnatural, noisy sound caused by concatenation.
Speech-to-text summarization methods
- Sentence extraction-based methods
– LSA-based methods
– MMR-based methods
– Feature-based methods
- Sentence compaction-based methods
- Combination of sentence extraction and sentence compaction
Sentence clustering using SVD
Deriving a latent semantic structure from a presentation speech represented by the matrix A (M content words x N sentences):

    A = U Σ V^T,   Σ = diag(σ1, σ2, ..., σN)

U: left singular vector matrix (information of each word)
V: right singular vector matrix (information of each sentence)
Σ: singular value matrix

SVD semantically clusters the content words and sentences.

Each element a_mn of the matrix A is given by

    a_mn = f_mn · log(F_A / F_m)

f_mn: number of occurrences of content word m in sentence n
F_m: number of occurrences of content word m in a large corpus
F_A: the corresponding total count in that corpus (giving an IDF-like weight)
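A minimal sketch of constructing the word-sentence matrix A with the weighting above. The toy sentences and background corpus counts are illustrative assumptions, not data from the slides, and the reading of F_A as the corpus total is an assumption as well.

import math
from collections import Counter

import numpy as np

sentences = [
    ["cherry", "blossoms", "bloom", "spring"],
    ["japan", "famous", "cherry", "blossoms"],
    ["spring", "season", "japan"],
]
# F_m: occurrences of each content word in a large corpus (toy numbers);
# F_A: total count of content words in that corpus (assumed reading).
corpus_counts = Counter(cherry=50, blossoms=40, bloom=20, spring=80,
                        japan=120, famous=60, season=70)
F_A = sum(corpus_counts.values())

vocab = sorted({w for s in sentences for w in s})
A = np.zeros((len(vocab), len(sentences)))
for n, sent in enumerate(sentences):
    freq = Counter(sent)
    for m, word in enumerate(vocab):
        # a_mn = f_mn * log(F_A / F_m): term frequency weighted by an
        # IDF-like factor from the background corpus.
        if freq[word]:
            A[m, n] = freq[word] * math.log(F_A / corpus_counts[word])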
LSA-based sentence extraction - 1
One of the summarization techniques using the SVD (Gong et al., 2001).

    V^T = [v_ik]: the k-th row (v_1k, v_2k, ..., v_Nk) of V^T is the k-th right singular vector over the N sentences.

- Each singular vector represents a salient topic; the singular vector with the largest corresponding singular value represents the topic that is most salient in the presentation speech.
- For singular vector k, choose the sentence having the largest component within that vector; this sentence best describes the topic represented by the singular vector.
- The extracted sentences best describe the topics represented by the singular vectors and are semantically different from each other.
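A sketch of this selection rule using numpy's SVD. Taking the absolute value of the components is an assumption made here to sidestep the sign ambiguity of singular vectors; the function name and signature are illustrative.

import numpy as np

def lsa_extract_gong(A: np.ndarray, num_sentences: int) -> list[int]:
    # Rows of Vt are the right singular vectors, ordered by decreasing
    # singular value; each represents a salient topic.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    selected: list[int] = []
    for k in range(min(num_sentences, Vt.shape[0])):
        # The sentence with the largest k-th component best describes topic k;
        # skip sentences already chosen for earlier topics.
        for idx in np.argsort(-np.abs(Vt[k])):
            if idx not in selected:
                selected.append(int(idx))
                break
    return selected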
Drawbacks to the LSA-based method - 1
- Its dimensionality is tied to the summary length, so good sentence candidates may not be chosen if they do not "win" in any dimension.
- When singular vectors are selected incrementally, as the number of selected vectors increases, the chance that non-relevant topics get included in a summary also increases.
LSA-based sentence extraction - 2

Dimension reduction by SVD: each sentence is represented by a weighted singular-value vector. Sentence i (the i-th column A_i of A) is mapped as

    A_i = (a_1i, a_2i, ..., a_Mi)^T  →(SVD)→  (σ1 v_i1, σ2 v_i2, ..., σN v_iN)^T  →(dimension reduction)→  ψ_i = (σ1 v_i1, ..., σK v_iK)^T

To evaluate each sentence, its score is calculated as the (squared) norm in the K-dimensional space:

    Score(i) = Σ_{k=1..K} (σk · v_ik)²

A fixed number of sentences having relatively large sentence scores in the reduced dimensional space are selected.
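A sketch of this scoring; the choice of K and the input matrix A are assumptions carried over from the earlier sketches. Ranking by the squared norm is equivalent to ranking by the norm itself.

import numpy as np

def lsa_scores(A: np.ndarray, K: int) -> np.ndarray:
    _, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    # Score(i) = sum_{k=1..K} (sigma_k * v_ik)^2: the squared norm of each
    # sentence's weighted singular-value vector in the reduced K-dim space.
    weighted = (sigma[:K, None] * Vt[:K, :]) ** 2
    return weighted.sum(axis=0)

# Sentences with the largest scores are selected up to the summarization ratio.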
Sentence extraction from introduction and conclusion parts
Hypothesis: a presentation speech consists of introduction, main-subject, and conclusion parts. Under the condition of a 10% summarization ratio, human subjects tend to extract sentences from the introduction and conclusion parts, so sentences are extracted from these parts.

The boundaries of the introduction and conclusion parts are detected via cohesiveness, measured as the cosine value between content-word frequency vectors consisting of a fixed number of content words.

[Figure: cohesiveness plotted against content-word position (1-250), showing the detected introduction and conclusion boundaries.]
Subjective evaluation results represented by the normalized score
- 180 automatic summaries (30 presentations x 6 summarization methods) were evaluated by 12 human subjects in terms of ease of understanding and appropriateness as summaries on five levels.
- The scores were converted into factor scores to normalize subjective differences.
- The IC method significantly improves summarization performance.
- The difference between SIG+IC and DIM+IC is not significant.
Summarization methods
SIG: sentence extraction by a significance score (amount of information)
LSA: LSA-based method 1
DIM: LSA-based method 2 (dimension reduction)
IC: weighting of the beginning (introduction) and ending (conclusion) periods
MMR-based method
- Vector-space model of text retrieval
- Particularly applicable to query-based and multi-document summarization
- Chooses sentences via a weighted combination of their relevance to a query (or, for generic summaries, their general relevance) and their redundancy with sentences that have already been extracted, both derived using cosine similarity
- MMR score for a given sentence S_i in the document:

    Sc_MMR(i) = λ · Sim(S_i, D) − (1 − λ) · Sim(S_i, Summ)

    D: average document vector
    Summ: average vector of the set of sentences already selected
    λ: trade-off between relevance and redundancy (annealed)
    Sim: cosine similarity between documents
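A minimal MMR sketch over sentence vectors (e.g. rows of a tf-idf matrix). The fixed λ is a simplification (the slide anneals it), and the cosine helper and function names are illustrative.

import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def mmr_select(S: np.ndarray, num_sentences: int, lam: float = 0.7) -> list[int]:
    D = S.mean(axis=0)                      # average document vector
    selected: list[int] = []
    while len(selected) < min(num_sentences, len(S)):
        summ = S[selected].mean(axis=0) if selected else np.zeros_like(D)
        best, best_score = -1, -np.inf
        for i in range(len(S)):
            if i in selected:
                continue
            # Sc_MMR(i) = lambda * Sim(S_i, D) - (1 - lambda) * Sim(S_i, Summ)
            score = lam * cosine(S[i], D) - (1 - lam) * cosine(S[i], summ)
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected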
Feature-based method
- Textual features
– Named entities (person, organization and place names)
– Mean and maximum TF-IDF scores
– LSA sentence score
– Topic significance scores and term entropy obtained through PLSA
– Confidence score
- Structural and discourse features
– Structural features (sentence position, speaker type, etc.)
– Discourse features (number of new nouns in each sentence, etc.)
- Prosodic features
– F0, energy, and duration (mean, standard deviation, minimum, maximum, range, slope, etc.)
– Speaking rate
An example of a feature-based important sentence extraction method

For a sentence with N words, W = w1, w2, ..., wN, the sentence extraction score is

    S(W) = (1/N) Σ_{i=1..N} [ L(wi) + λI · I(wi) + λC · C(wi) ]

L(wi): linguistic score (linguistic correctness: bigram/trigram)
I(wi): significance (topic) score (important information extraction: amount of information)
C(wi): confidence score (recognition error exclusion: acoustic & linguistic reliability)
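A direct transcription of this score as a function. The per-word scorers L, I, C and the λ weights are placeholders to be supplied by an N-gram language model, topic statistics, and recognizer confidences.

def sentence_score(words, L, I, C, lam_I=1.0, lam_C=1.0):
    """words: non-empty list of word tokens; L, I, C: callables returning the
    linguistic, significance, and confidence scores of a word."""
    total = sum(L(w) + lam_I * I(w) + lam_C * C(w) for w in words)
    return total / len(words)   # (1/N) * sum over the N words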
Speech-to-text summarization methods
- Sentence extraction-based methods
– LSA-based methods
– MMR-based methods
– Feature-based methods
- Sentence compaction-based methods
- Combination of sentence extraction and sentence compaction
Sentence compaction
[Diagram: from each transcribed utterance (words 1-10), a set of words is extracted in order to form the summarized (compressed) sentence at a specified ratio, e.g. extracting 7 words from 10 words gives a 70% ratio.]
Word extraction score
For a summarized sentence with M words, V = v1, v2, ..., vM, the word extraction score is

    S(V) = Σ_{m=1..M} [ L(vm | ... vm-1) + λI · I(vm) + λC · C(vm) + λT · Tr(vm) ]

L(vm | ... vm-1): linguistic score (linguistic correctness: bigram/trigram)
I(vm): significance (topic) score (important information extraction: amount of information)
C(vm): confidence score (recognition error exclusion: acoustic & linguistic reliability)
Tr(vm): word concatenation score (semantic correctness: word dependency probability)
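The slides do not spell out the search procedure at this point; the following dynamic-programming sketch is one standard way to maximize the word extraction score when selecting M of N words in order (the two-stage DP over multiple sentences appears later). All score functions and weights are placeholders.

import math

def compact(words, M, L, I, C, Tr, lam_I=1.0, lam_C=1.0, lam_T=1.0):
    """Pick M of the N words, preserving order, maximizing S(V).
    L(curr, prev): bigram linguistic score; Tr(prev, curr): concatenation
    score. Assumes 1 <= M <= len(words)."""
    N = len(words)
    NEG = -math.inf
    # dp[m][j]: best score of an m-word summary whose last word is words[j]
    dp = [[NEG] * N for _ in range(M + 1)]
    back = [[-1] * N for _ in range(M + 1)]
    for j in range(N):
        dp[1][j] = lam_I * I(words[j]) + lam_C * C(words[j])
    for m in range(2, M + 1):
        for j in range(N):
            for i in range(j):          # the previous word must precede j
                if dp[m - 1][i] == NEG:
                    continue
                s = (dp[m - 1][i] + L(words[j], words[i])
                     + lam_I * I(words[j]) + lam_C * C(words[j])
                     + lam_T * Tr(words[i], words[j]))
                if s > dp[m][j]:
                    dp[m][j], back[m][j] = s, i
    # Trace back from the best-scoring final word.
    j = max(range(N), key=lambda j: dp[M][j])
    out = []
    for m in range(M, 0, -1):
        out.append(words[j])
        j = back[m][j]
    return list(reversed(out))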
Word concatenation score
Example: the original sentence contains phrase 1, "the beautiful cherry blossoms", and phrase 2, "in Japan". Concatenating "the beautiful" with "Japan" yields "the beautiful Japan", which is grammatically correct but incorrect as a summary. A penalty is therefore applied to word concatenations, whether intra-phrase or inter-phrase, that have no dependency in the original sentence.
Word concatenation score based on SDCFG
Word dependency probability based on an SDCFG (stochastic dependency context-free grammar):

- If the dependency structure between words is deterministic, the dependency probability is 0 or 1.
- If the dependency structure between words is ambiguous, the dependency probability between wm and wl, d(wm, wl, i, k, j), is calculated using inside-outside probabilities based on the SDCFG.

    Tr(wm, wn) = log Σ_{i=1..m} Σ_{k=m..n-1} Σ_{j=n..L} Σ_{l=n..j} d(wm, wl, i, k, j)

[Figure: a parse of the word sequence w1 ... wi-1 wi ... wm ... wk wk+1 ... wn ... wl wj wj+1 ... wL under initial symbol S and non-terminal symbols α, β, illustrating the inside probability of a span and the outside probability of its context.]
Integration of ASR and sentence compaction by WFST (Hori et al., NTT)
- Speech recognition with compaction
  – Transcribes the speech signal & extracts important phrases, excluding recognition errors
- Paraphrasing
  – Translates spoken language into written language

[Diagram: speech input passes through speech recognition with compaction and stochastic paraphrasing, each hypothesis carrying a score and both stages implemented as weighted finite-state transducers, to yield the summarization result.]
Integration of speech recognition and paraphrasing
O: feature vector sequence, W: source word sequence, T: target word sequence

Speech recognition:
    Ŵ = argmax_W P(O|W) P(W)

Paraphrasing:
    T̂ = argmax_T P(Ŵ|T) P(T)

Composition & optimization (one-pass speech summarization):
    T̂ = argmax_T max_W P(O|W) P(W|T) P(T)
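A brute-force sketch of the composed decoding rule over toy hypothesis tables. The real system performs this maximization by WFST composition and optimization rather than enumeration, and every probability value below is invented purely for illustration.

def summarize(p_acoustic, p_translate, p_target):
    """p_acoustic: {W: P(O|W)}, p_translate: {(W, T): P(W|T)}, p_target: {T: P(T)}."""
    best_t, best_score = None, 0.0
    for (w, t), p_wt in p_translate.items():
        # Score of the (W, T) pair: P(O|W) * P(W|T) * P(T); taking the
        # best-scoring pair realizes argmax_T max_W.
        score = p_acoustic.get(w, 0.0) * p_wt * p_target.get(t, 0.0)
        if score > best_score:
            best_t, best_score = t, score
    return best_t

# Toy example echoing the paraphrasing slide's sentence pair.
p_acoustic = {"its a piece of cake": 0.6, "it is peace of cake": 0.4}
p_translate = {("its a piece of cake", "it is an easy task"): 0.5,
               ("it is peace of cake", "it is an easy task"): 0.1,
               ("its a piece of cake", "it is a piece of cake"): 0.5}
p_target = {"it is an easy task": 0.3, "it is a piece of cake": 0.2}
print(summarize(p_acoustic, p_translate, p_target))  # -> "it is an easy task"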
Extended Lexicon WFST for sentence compaction
[Diagram: the original word lexicon (phone-input, word-output arcs such as a k a → "aka", a o → "ao", d e s u → "desu") is extended with a wildcard word whose ε-transitions emit <sp> with weight λ.]

Wildcard word:
- Accepts an arbitrary phone sequence, weighted by phone 2-grams
- Outputs an inter-phrase symbol (<sp>)
- Controls the summarization ratio through its penetration weight
WFST for paraphrasing
Translation of spoken language into written language by word substitution:

    T̂ = argmax_T P(W|T) P(T) ≈ argmax_T δ(W, T) P(T)

W: source word sequence
T: target word sequence
δ(W, T): word substitution model
P(T): written-style word 3-gram

The word substitution WFST and the 3-gram(T) are combined by composition & optimization.
WFST for paraphrasing
[Diagram: a paraphrasing WFST over states 1-8 with arcs such as OH:ε, UM:ε (filler deletion), IT'S:IT, ε:IS, NO:I, WAY:CANNOT, ε:BELIEVE, ε:IT, A:AN, PIECE:EASY, OF:TASK, CAKE:ε, and a self-loop w:w/ω.]

- Example: "OH, NO WAY. IT'S A PIECE OF CAKE." ⇒ "I CANNOT BELIEVE IT. IT IS AN EASY TASK."
* Any word can be replaced by itself with a cost ω.
Speech summarization using WFST
[Diagram: the WFST decoder composes the transducers H (triphone HMMs), C, L (extended lexicon), D, S, and G, built from the spoken-style N-gram (transcriptions), the paraphrasing rules (parallel corpus), and the written-style N-gram (written text corpus); given speech input O, it outputs the summarized result T̂.]
A multi-stage compaction approach to broadcast news summarization (by Kolluru)
[Pipeline diagram: broadcast news is transcribed by an automatic speech recognizer (acoustic model, language model; handling speech disfluencies, sentence & story boundaries, prosodic cues, confidence scores), then passed through a POS tagger, a partial parser (up to 3 levels of parsed output), a named entity tagger, and a co-reference module; an MLP chunker combines MLP confidence scores, mean tf.idf, sum tf.idf, named entity frequency, chunk length, and lexical cues to produce a 100-word summary.]
Combination of sentence extraction and compaction
[System diagram: spontaneous speech is processed by speech recognition (acoustic model and language model trained on a speech corpus), followed by sentence segmentation, sentence extraction, and sentence compaction; word posterior probabilities from the recognition results, a summarization language model (summary corpus, large-scale text corpus), and word dependency probabilities (word frequencies from a manually parsed corpus) feed the summarizer, which outputs records, minutes, captions, and indexes.]
2-stage dynamic programming for summarizing multiple sentences

[Diagram: each recognition-result sentence <s> w1 w2 ... </s> (sentences 1-3) is aligned with the summarization hypothesis <s> v1,1 v1,2 ... </s> <s> v2,1 v2,2 ... </s> <s> v3,1 v3,2 ... </s>; the summarization ratio is controlled between 0% and 100%.]

* Initial and terminal symbols cannot be skipped.
* The word concatenation score is not applied at sentence boundaries.
Sentence segmentation
- Speech recognition results have no punctuation or proper segmentation.
- The readability and usability of transcripts can be significantly improved by segmenting the text into logical units such as sentences.
- Segmentation has a significant effect on further processing of speech, such as information extraction, topic detection and summarization.
- Prosodic and N-gram features have been employed, as in the illustrative sketch below.
- Due to poor grammatical structure, an unclear definition of sentences, disfluencies, and incorrectly recognized words, sentence segmentation of speech is still difficult.
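The slides name the feature families but no concrete detector; this sketch is an invented illustration combining one prosodic cue (pause duration) with one N-gram cue (the language model's sentence-end log-probability) under made-up weights and threshold that would need tuning on held-out data.

def is_boundary(pause_sec: float, lm_end_logprob: float,
                w_pause: float = 2.0, w_lm: float = 1.0,
                threshold: float = 0.5) -> bool:
    # A longer pause and a higher P(</s> | history) both favor a boundary.
    score = w_pause * pause_sec + w_lm * lm_end_logprob
    return score > threshold

# (word, following pause in seconds, LM sentence-end log-probability)
words = [("so", 0.05, -6.0), ("concludes", 0.1, -3.5), ("talk", 0.9, -0.5)]
boundaries = [w for w, pause, lm in words if is_boundary(pause, lm)]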
Evaluation schemes
- The quality of a summary depends on how it is used, how readable an individual finds it, and what information an individual thinks should be included.
- Extrinsic evaluation: assessed in a task-based setting, e.g. an information browsing and access interface (ideal, but time-consuming and expensive)
- Intrinsic evaluation: assessed in a task-independent setting (normally employed)
- Subjective evaluation: too costly
- Objective evaluation: essential (using manual summaries, which vary across human subjects, as targets)
Objective evaluation methods
- Summarization accuracy using a network merging manual summaries (SumACCY) (Hori et al., 2001)
- Summarization accuracy weighted by the majority of manual summaries (WSumACCY) (Hori et al., 2003)
- Summarization accuracy using individual manual summaries (SumACCY-E) (Hirohata et al., 2004)
- N-gram precision (Hori et al., 2000)
- Number of overlapping n-grams (ROUGE-N) (Lin et al., 2003)
- Sentence recall/precision (Kitade et al., 2004)
Summarization accuracy
SumACCY:
- Variations of manual summarization results are merged into a word network.
- The word network is considered to approximately express all possible correct summaries, covering subjective variations.
- The word accuracy of the automatic summary against the word network is taken as the summarization accuracy.

SumACCY-E:
- At a 10% summarization ratio the variations are too large compared with 50%, so the network can accept inappropriate summaries.
- The word accuracy of the automatic summary is therefore also calculated against the manual summaries individually (not using a network).
- SumACCY-E/max: largest word accuracy over the manual summaries
- SumACCY-E/ave: average word accuracy over the manual summaries
Summarization accuracy (SumACCY)
SumACCY is defined as word accuracy based on a word string, extracted from the word network, that is closest to the automatic summarization result.
Human summaries are merged into a single network.

    SumACCY = {Len − (Sub + Ins + Del)} / Len × 100 [%]

Len: number of words in the most similar word string in the network
Sub: number of substitution errors
Ins: number of insertion errors
Del: number of deletion errors

[Figure: a word network merging manual summaries over words such as "the", "Japan", "in", "beautiful", "cherry", "blossoms", "bloom", "spring", with alternative paths representing the different summaries.]
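A simplified sketch of the accuracy computation. SumACCY proper aligns against the closest path in the merged word network; this version aligns against a single manual summary (i.e. the SumACCY-E case), counting Sub/Ins/Del by standard edit-distance alignment.

def word_accuracy(reference: list[str], hypothesis: list[str]) -> float:
    R, H = len(reference), len(hypothesis)
    d = [[0] * (H + 1) for _ in range(R + 1)]
    for i in range(R + 1):
        d[i][0] = i                      # all deletions
    for j in range(H + 1):
        d[0][j] = j                      # all insertions
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            sub = d[i - 1][j - 1] + (reference[i - 1] != hypothesis[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    errors = d[R][H]                     # Sub + Ins + Del
    return (R - errors) / R * 100.0      # {Len - (Sub+Ins+Del)} / Len * 100

print(word_accuracy("cherry blossoms bloom in spring".split(),
                    "cherry blossoms in spring".split()))  # 80.0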
ROUGE-N
ROUGE-N: the N-gram recall between an automatic summary and a set of manual summaries (N-grams: 1-grams, 2-grams and 3-grams). ROUGE-N is computed as follows:

    ROUGE-N = Σ_{S∈S_H} Σ_{g_n∈S} C_m(g_n) / Σ_{S∈S_H} Σ_{g_n∈S} C(g_n)

S_H: the set of manual summaries
S: an individual summary
g_n: an N-gram
C(g_n): the number of occurrences of g_n in the manual summary
C_m(g_n): the number of co-occurrences of g_n in the manual summary and the automatic summary
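A sketch of ROUGE-N as defined above. Reading C_m(g_n) as a clipped co-occurrence count is the standard interpretation; the toy summaries are illustrative.

from collections import Counter

def ngrams(tokens: list[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(auto: list[str], manuals: list[list[str]], n: int) -> float:
    auto_counts = ngrams(auto, n)
    match, total = 0, 0
    for manual in manuals:
        ref_counts = ngrams(manual, n)
        # C_m(g_n): co-occurrences, clipped by the count in the automatic
        # summary; C(g_n): count in the manual summary.
        match += sum(min(c, auto_counts[g]) for g, c in ref_counts.items())
        total += sum(ref_counts.values())
    return match / total if total else 0.0

manuals = [["cherry", "blossoms", "bloom", "in", "spring"],
           ["cherry", "blossoms", "in", "japan"]]
auto = ["cherry", "blossoms", "in", "spring"]
print(rouge_n(auto, manuals, 2))  # bigram recall: 4/7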
Correlation between subjective and objective evaluation scores (averaged over presentations)
In the subjective evaluation, the summaries were evaluated in terms of ease of understanding and appropriateness as summaries on five levels.
Correlation between subjective and objective evaluation scores (each presentation)
Conclusions
- Although various automatic speech summarization techniques have been proposed and tested, their performance is still much worse than that of manual summarization.
- In order to build truly useful speech summarization systems applicable to real applications, we definitely need more efficient and speech-focused techniques, including sentence (utterance) segmentation methods.
- It remains to be determined, through further experiments by researchers using various corpora, whether the proposed objective evaluation measures correlate well with human judgments. There is still large room for improvement in the objective measures.
- It is also necessary to evaluate summaries extrinsically, in task-based settings.