

SLIDE 1

LID-senone Extraction via Deep Neural Networks for End-to-End Language Identification

Ma Jin 1, Yan Song 1, Ian McLoughlin 2, Li-Rong Dai 1, Zhong-Fu Ye 1,3

1 National Engineering Laboratory of Speech and Language Information Processing

University of Science and Technology of China, China

2 School of Computing, University of Kent, Medway, UK 3 State Key Laboratory of Mathematical Engineering and Advanced Computing, China

Presented by Professor Ian McLoughlin

2016.06.22

SLIDE 2

Outline

  • Introduction
  • Proposed Method
  • Experiments and Analysis
  • Conclusion and Future Work
SLIDE 3

Introduction – background

  • What is Language Identification?
  • extract an utterance-level language representation from a given speech signal
  • State-of-the-art Method
  • GMM/i-vector
  • trained in an unsupervised fashion
  • Deep Learning Methods
  • natural advantage of supervised training
SLIDE 4

Introduction – existing method

  • Improved i-vector Method via Deep Learning
  • Deep bottleneck network based i-vector representation for language identification (Song et al.)
  • Study of senone-based deep neural network approaches for spoken language recognition (Ferrer et al.)
  • End-to-End Neural Network
  • Automatic language identification using deep neural networks (Lopez-Moreno et al.)
  • Automatic language identification using long short-term memory recurrent neural networks (Gonzalez-Dominguez et al.)
  • An end-to-end approach to language identification in short utterances using convolutional neural networks (Lozano-Diez et al.)
SLIDE 5

Outline

  • Introduction
  • Proposed Method
  • Experiments and Analysis
  • Conclusion and Future Work
SLIDE 6

Proposed Method – motivation and structure

  • Convolutional Neural Network
  • convolutional layers: feature extractor at frame level
  • pooling layers: map frame level features to utterance representation
  • Structure
  • DNN layer: transform acoustic features to a compact representation frame by frame
  • convolutional layer: transform bottleneck (BN) features into units discriminative between languages
SLIDE 7

Proposed Method – structure details

  • LID-feature
  • general acoustic features contain too much irrelevant information, which may degrade performance
  • deep bottleneck features (DBFs) are discriminative for phones, not for languages
  • LID-features are discriminative between languages, with decorrelated dimensions (obtained with a large conv kernel)
  • Spatial Pyramid Pooling
  • spans features from the frame level to the utterance level
  • deals with arbitrary input sizes
  • obtains statistical information at different time scales
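The pooling idea above can be sketched as follows; the pyramid levels (1/2/4 bins) and average pooling are illustrative assumptions, not necessarily the paper's exact SPP configuration:

```python
import numpy as np

def spatial_pyramid_pool(frames, levels=(1, 2, 4)):
    """Pool variable-length frame features [T, D] into a fixed-length
    utterance vector: the time axis is split into 1, 2 and 4 bins and
    each bin is average-pooled, so any T maps to the same output size."""
    T, D = frames.shape
    pooled = []
    for n_bins in levels:
        # split the time axis into n_bins roughly equal segments
        edges = np.linspace(0, T, n_bins + 1).astype(int)
        for b in range(n_bins):
            pooled.append(frames[edges[b]:edges[b + 1]].mean(axis=0))
    return np.concatenate(pooled)  # shape: (D * sum(levels),)

# Different utterance lengths yield the same representation size:
short = spatial_pyramid_pool(np.random.randn(300, 64))   # e.g. 3 s at 100 fps
long_ = spatial_pyramid_pool(np.random.randn(3000, 64))  # e.g. 30 s
```

This is what lets one network handle arbitrary input durations: the bins stretch with the input while the output dimensionality stays fixed.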
SLIDE 8

Proposed Method – incremental training strategy

  • LID-features cannot be extracted directly from general acoustic features
  • lack of training data
  • features should be tied to phones at the frame level, so the training target cannot be languages
  • Incremental Training Strategy
  • transfer learning from a large-scale corpus
  • incremental training with the language corpus

SLIDE 9

Proposed Method – LID-senone and its statistics

[Figure: LID-senone statistics, discriminative at the utterance level vs. at the frame level]

  • only a few LID-senones can be activated

SLIDE 10

Proposed Method – hybrid temporal evaluation

  • 30s/10s/3s neural networks are trained independently
  • 30s speech can be segmented into 10s/3s chunks and scored with the corresponding networks
  • 10s speech can be segmented into 3s chunks and scored with the corresponding network
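A minimal sketch of this evaluation scheme, assuming 100 frames per second and simple score averaging as the fusion rule (the fusion details are an assumption here, not stated on the slide):

```python
import numpy as np

def hybrid_temporal_scores(frames, nets, fps=100):
    """Score an utterance with every duration-specific network that fits:
    e.g. a 30 s input is also cut into non-overlapping 10 s and 3 s chunks,
    each chunk is scored by the matching network, and all chunk scores
    are averaged into one fused language-score vector."""
    all_scores = []
    for dur, net in nets.items():
        win = dur * fps
        if len(frames) < win:
            continue  # utterance too short for this network
        for start in range(0, len(frames) - win + 1, win):
            all_scores.append(net(frames[start:start + win]))
    return np.mean(all_scores, axis=0)

# Toy duration-specific "networks" returning scores for 6 languages:
nets = {30: lambda x: np.ones(6),
        10: lambda x: np.ones(6),
        3:  lambda x: np.ones(6)}
fused = hybrid_temporal_scores(np.zeros((3000, 40)), nets)  # 30 s utterance
```

A 30 s utterance thus contributes 1 + 3 + 10 = 14 chunk scores across the three networks before fusion.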

SLIDE 11

Outline

  • Introduction
  • Proposed Method
  • Experiments and Analysis
  • Conclusion and Future Work
SLIDE 12

Experiments and Analysis

  • Dataset
  • six most confusable languages from NIST LRE 09 (Dari, Farsi, Russian, Ukrainian, Hindi and Urdu)
  • training data: about 150 hours
  • evaluation on 30s/10s/3s utterances
  • Performance indicator: Equal Error Rate (EER)
  • System
  • baseline1: BN-GMM/i-vector
  • baseline2: BN-DNN/i-vector
  • proposed network1: LID-net
  • proposed network2: LID-HT-net, LID-net with hybrid temporal evaluation
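EER is the operating point at which the miss rate equals the false-alarm rate. A simple threshold-sweep approximation (illustrative only, not the scoring tool used in the paper):

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all scores and
    finding where miss rate and false-alarm rate are closest."""
    scores = np.concatenate([target_scores, nontarget_scores])
    labels = np.concatenate([np.ones(len(target_scores)),
                             np.zeros(len(nontarget_scores))])
    labels = labels[np.argsort(scores)]            # sweep low -> high
    miss = np.cumsum(labels) / labels.sum()        # targets below threshold
    fa = 1.0 - np.cumsum(1 - labels) / (1 - labels).sum()  # nontargets above
    i = np.argmin(np.abs(miss - fa))
    return (miss[i] + fa[i]) / 2

# Well-separated target/nontarget scores give a zero EER:
eer = equal_error_rate(np.array([0.9, 0.8, 0.7]), np.array([0.1, 0.2, 0.3]))
```

In practice a DET-curve interpolation is used for exact EER values, but the crossing-point idea is the same.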
SLIDE 13

Experiments and Analysis

  • Evaluation of Different Convolutional Filter Sizes

[Table: EER as the convolutional filter size changes]

  • As a consequence, a filter size of 50x21 is selected for all of the following experiments.

SLIDE 14

Experiments and Analysis

  • Evaluation of Convolutional Layer Complexity

[Table: EER as the complexity of the conv. layer changes]

  • The performance improves when the complexity increases

SLIDE 15

Experiments and Analysis

  • Hybrid Temporal Evaluation
  • the final LID-net performs well compared with the two baseline systems
  • i-vector uses both zeroth-order and first-order Baum-Welch statistics; in LID-net, the SPP layer only uses zeroth-order statistics
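The distinction can be made concrete: given per-frame LID-senone posteriors, the zeroth-order statistics are occupancy counts (what the SPP layer effectively pools), while the first-order statistics additionally weight the frame features. A sketch, with shapes and names chosen for illustration:

```python
import numpy as np

def baum_welch_stats(posteriors, features):
    """posteriors: [T, C] per-frame senone posteriors;
    features:   [T, D] frame-level features.
    Returns zeroth-order occupancy counts [C] and
    first-order posterior-weighted feature sums [C, D]."""
    n = posteriors.sum(axis=0)      # zeroth order: how often each senone fires
    f = posteriors.T @ features     # first order: where in feature space it fires
    return n, f

post = np.random.rand(500, 32)      # 500 frames, 32 senones
feat = np.random.randn(500, 40)     # 40-dim features
n, f = baum_welch_stats(post, feat)
```

This is why the slide's closing question matters: the first-order term carries per-senone feature information that the zeroth-order counts alone discard.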

SLIDE 16

Outline

  • Introduction
  • Proposed Method
  • Experiments and Analysis
  • Conclusion and Future Work
SLIDE 17

Conclusion and Future Work

  • Conclusion
  • we have proposed a comprehensive task-aware network spanning the frame level to the utterance level
  • an incremental training strategy has been introduced to address over-fitting in the deep structure
  • hybrid temporal evaluation is proposed to handle the various time scales in the same test dataset

  • Future Work
  • consider a more comprehensive network rather than relying on three independent networks
  • can we incorporate first-order Baum-Welch statistics?