Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems

Sep 26, 2022

Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems
Leda Sarı
January 31, 2019
Based on: Belinkov and Glass, NIPS 2017
Introduction
- End-to-End (E2E) ASR directly maps acoustic features to symbol (character or word) sequences
- Connectionist temporal classification (CTC)
- Sequence-to-sequence learning (seq2seq)
- Question: Do E2E models implicitly learn phonetic representations internally, and if so, to what extent?
- Goal: Interpret the hidden layer activations in an E2E ASR system
- Use a pretrained model ⇒ get frame-level features ⇒ evaluate representations and compare layers
E2E ASR Model
- Based on the DeepSpeech2 architecture (CNN and RNN layers)
- Maps acoustics to a character sequence using CTC
- Inputs are spectrograms
- If x is the input spectrogram, evaluate $\mathrm{ASR}^k_t(x)$, the output of the k-th layer at the t-th input frame
- Trained on LibriSpeech with a PyTorch implementation of the Baidu DeepSpeech2 model
- Figure: ASR network architecture (from Belinkov and Glass, NIPS 2017)
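
As a rough illustration of how a CTC-trained model turns per-frame outputs into a character sequence, the sketch below applies the standard CTC collapsing rule (merge repeated symbols, then drop blanks) to a best-path labeling. The blank symbol "_" and the function name are assumptions made for this example; they are not taken from the presentation or from the DeepSpeech2 implementation.

```python
def ctc_collapse(frame_labels, blank="_"):
    """Collapse a per-frame label sequence with the CTC rule.

    `blank` is a hypothetical blank symbol chosen for this sketch.
    """
    out = []
    prev = None
    for sym in frame_labels:
        # Emit a symbol only when it differs from the previous frame
        # and is not the blank symbol.
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return "".join(out)


print(ctc_collapse("hh_e_ll_ll_oo"))  # prints "hello"
```

The blank between the two "ll" runs is what lets the output keep a double letter; without it the repeats would be merged into one.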
Phoneme Classifier
- Input: features from different layers of the DeepSpeech2 model
- Output: phoneme label
- Single hidden layer with ReLU nonlinearity
- Kept simple because the goal is to evaluate the features, not to achieve the best phoneme recognition
- Phoneme recognition is performed on TIMIT
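
The classifier described above can be sketched as a forward pass through one ReLU hidden layer followed by a softmax over phoneme labels. This is a minimal pure-Python sketch under stated assumptions: the layer sizes are toy values, the weights are random and untrained, and the softmax output is an assumption, since the slide gives neither the hidden size nor the training setup.

```python
import math
import random


def relu(v):
    return [max(0.0, x) for x in v]


def affine(W, b, v):
    # W is a list of rows; returns W @ v + b as a list.
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def classify_frame(feat, W1, b1, W2, b2):
    """Return (predicted label index, class probabilities) for one frame."""
    hidden = relu(affine(W1, b1, feat))
    probs = softmax(affine(W2, b2, hidden))
    return max(range(len(probs)), key=probs.__getitem__), probs


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
feat_dim, hidden_dim, n_phones = 8, 16, 5  # toy sizes, not from the slides
W1, b1 = random_matrix(hidden_dim, feat_dim, rng), [0.0] * hidden_dim
W2, b2 = random_matrix(n_phones, hidden_dim, rng), [0.0] * n_phones

# Stand-in for one frame of layer activations from the ASR model.
feat = [rng.gauss(0, 1) for _ in range(feat_dim)]
label, probs = classify_frame(feat, W1, b1, W2, b2)
print(label, round(sum(probs), 6))
```

Keeping the classifier this small means that differences in accuracy between layers mostly reflect the quality of the input features rather than the capacity of the classifier.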
Results - Phoneme Classification Accuracy
- Top layers of the deeper model focus more on modeling character sequences
- Stride affects time resolution ⇒ better frame accuracy
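
To make the link between stride and frame-level evaluation concrete: a layer with time stride s yields roughly one output frame per s input frames, so input-rate phoneme labels have to be aligned to the layer's frame rate before frame accuracy can be computed. The sketch below assumes a simple "keep every s-th label" alignment; the presentation does not state how labels were aligned, and the function names and toy labels are hypothetical.

```python
def subsample_labels(labels, stride):
    """Keep every `stride`-th label (assumed alignment, for illustration)."""
    return labels[::stride]


def frame_accuracy(predicted, reference):
    """Fraction of frames whose predicted label matches the reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Toy labels at the input frame rate (made up for this example).
input_labels = ["s", "s", "iy", "iy", "iy", "iy", "t", "t"]
layer_labels = subsample_labels(input_labels, stride=2)

# Toy classifier predictions at the layer's frame rate.
predictions = ["s", "iy", "t", "t"]
print(layer_labels, frame_accuracy(predictions, layer_labels))
# prints ['s', 'iy', 'iy', 't'] 0.75
```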
Results - Clustering
- Cluster layer activations with k-means (k = 500)
- Plot the cluster centers using t-SNE
- Assign a phone label to each cluster by majority voting
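
The majority-voting step can be sketched as follows. The example assumes the k-means assignments are already available as one cluster id per frame (in practice they would come from a clustering library), and it does not cover the t-SNE plot. The function name and the toy data are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def label_clusters(cluster_ids, phone_labels):
    """Give each cluster the phone label most frequent among its frames."""
    votes = defaultdict(Counter)
    for cid, phone in zip(cluster_ids, phone_labels):
        votes[cid][phone] += 1
    # most_common(1) returns the (label, count) pair with the highest count.
    return {cid: counts.most_common(1)[0][0] for cid, counts in votes.items()}


# Toy frame-level data: cluster id and reference phone per frame.
cluster_ids = [0, 0, 0, 1, 1, 2]
phone_labels = ["aa", "aa", "ae", "s", "s", "m"]
print(label_clusters(cluster_ids, phone_labels))
# prints {0: 'aa', 1: 's', 2: 'm'}
```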
Sound Classes
- Coarse classes: affricates, fricatives, nasals, semivowels, stops and vowels
- Train the classifier to predict these classes
- Better classification accuracy compared to phonemes
- Class-based comparison between rnn5 and the input layer
- rnn5 is better at distinguishing between different nasals
- Affricates are better predicted at rnn5
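
Training on coarse classes amounts to relabeling each phoneme with its class before fitting the same classifier. The mapping below is an illustrative subset built from commonly used TIMIT phone symbols; the exact phone inventory and class assignments used in the presentation are not given, so this table should be read as an assumption.

```python
# Illustrative subset of a phone-to-class table (not the full TIMIT inventory).
COARSE_CLASSES = {
    "affricates": ["jh", "ch"],
    "fricatives": ["s", "sh", "z", "zh", "f", "th", "v", "dh"],
    "nasals": ["m", "n", "ng"],
    "semivowels": ["l", "r", "w", "y"],
    "stops": ["b", "d", "g", "p", "t", "k"],
    "vowels": ["iy", "ih", "eh", "ae", "aa", "uw"],
}

# Invert the table so each phone maps to its coarse class.
PHONE_TO_CLASS = {
    phone: cls for cls, phones in COARSE_CLASSES.items() for phone in phones
}


def to_coarse(phone_labels):
    """Relabel a sequence of phone labels with coarse sound classes."""
    return [PHONE_TO_CLASS[p] for p in phone_labels]


print(to_coarse(["ch", "iy", "n"]))
# prints ['affricates', 'vowels', 'nasals']
```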
Sound Classes - Confusions
Maximum confusions are between:
1. semivowels/vowels
2. affricates/stops
3. affricates/fricatives
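
A ranking like the one above can be obtained by counting misclassified frames per unordered class pair. The sketch below shows one way to do that; the function name is hypothetical and the labels are toy data invented for the example, not results from the presentation.

```python
from collections import Counter


def top_confusions(reference, predicted, n=3):
    """Return the n most frequent unordered confusion pairs with counts."""
    pairs = Counter()
    for ref, hyp in zip(reference, predicted):
        if ref != hyp:
            # frozenset treats (a, b) and (b, a) as the same confusion.
            pairs[frozenset((ref, hyp))] += 1
    return [(tuple(sorted(pair)), count) for pair, count in pairs.most_common(n)]


# Toy frame-level labels (made up for illustration).
reference = ["vowels", "semivowels", "stops", "affricates", "vowels"]
predicted = ["semivowels", "vowels", "stops", "stops", "vowels"]
print(top_confusions(reference, predicted))
# prints [(('semivowels', 'vowels'), 2), (('affricates', 'stops'), 1)]
```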
Summary
1. Empirically evaluate the quality of hidden representations with phoneme classification
2. The first CNN layer represents phonetic information better than the second CNN layer
3. After a certain number of RNN layers, accuracy drops ⇒ top layers do not preserve all the phonetic information
4. Relatively similar coarse classes are confused more
