Multilingual acoustic word embedding models for processing zero-resource languages (ICASSP 2020)
Herman Kamper (Stellenbosch University, South Africa), Yevgen Matusevych and Sharon Goldwater (University of Edinburgh, UK)


  1. Multilingual acoustic word embedding models for processing zero-resource languages. ICASSP 2020. Herman Kamper (1), Yevgen Matusevych (2), Sharon Goldwater (2). (1) Stellenbosch University, South Africa; (2) University of Edinburgh, UK. http://www.kamperh.com/

  2. Background: Why acoustic word embeddings?
     • Current speech recognition methods require large labelled data sets
     • Zero-resource speech processing aims to develop methods that can discover linguistic structure from unlabelled speech [Dunbar et al., ASRU'17]
     • Example applications: unsupervised term discovery, query-by-example search
     • Problem: we need to compare speech segments of variable duration

  3. Acoustic word embeddings: map a variable-duration speech segment X to a fixed-dimensional embedding z ∈ R^M. [Figure: two segments X(1) and X(2) of different durations are mapped to points z(1) and z(2) in the embedding space]
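To make the mapping concrete, here is a minimal PyTorch sketch of an RNN encoder that maps a variable-length sequence of acoustic feature vectors to a fixed-dimensional embedding. This is not the architecture from the talk; the layer sizes and the use of a GRU are illustrative assumptions, with the embedding size set to 130 to match the M = 130 mentioned in the experimental setup.

```python
import torch
import torch.nn as nn

class AcousticWordEncoder(nn.Module):
    """Map a variable-length segment X (T x D frames) to a fixed z in R^M."""

    def __init__(self, feat_dim=13, hidden_dim=256, embed_dim=130):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden_dim, num_layers=2, batch_first=True)
        self.proj = nn.Linear(hidden_dim, embed_dim)

    def forward(self, x):
        # x: (batch, T, feat_dim); T may differ from call to call.
        _, h = self.rnn(x)            # h: (num_layers, batch, hidden_dim)
        return self.proj(h[-1])       # final hidden state -> (batch, embed_dim)

# Two segments of different duration map to embeddings of the same size.
encoder = AcousticWordEncoder()
z1 = encoder(torch.randn(1, 58, 13))   # a 58-frame segment
z2 = encoder(torch.randn(1, 91, 13))   # a 91-frame segment
print(z1.shape, z2.shape)              # both torch.Size([1, 130])
```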

  4. Example application: Query-by-example search. [Figure: the spoken query is embedded as z(q); all segments/utterances in the search database are embedded as z(1), ..., z(N); hits are retrieved by nearest-neighbour search in the embedding space] [Levin et al., ICASSP'15]
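A rough sketch of the retrieval step, assuming the query and all database segments have already been embedded; the random embeddings, their dimensionality, and the choice of cosine similarity are placeholders for illustration, not details given in the talk.

```python
import numpy as np

def cosine_nearest(query_emb, database_embs, k=5):
    """Return indices of the k database embeddings closest to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    db = database_embs / np.linalg.norm(database_embs, axis=1, keepdims=True)
    similarities = db @ q                  # cosine similarity to every entry
    return np.argsort(-similarities)[:k]   # indices of the top-k hits

# Illustrative usage with random 130-dimensional embeddings.
rng = np.random.default_rng(0)
z_query = rng.standard_normal(130)              # z(q): the embedded spoken query
z_database = rng.standard_normal((1000, 130))   # z(1), ..., z(N)
print(cosine_nearest(z_query, z_database))
```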

  5. Supervised and unsupervised acoustic embeddings
     • Growing body of work on acoustic word embeddings
     • Supervised and unsupervised methods
     • Unsupervised methods can be applied in zero-resource settings
     • But there is still a large performance gap

  6. Supervised and unsupervised acoustic embeddings
     • Growing body of work on acoustic word embeddings
     • Supervised and unsupervised methods
     • Unsupervised methods can be applied in zero-resource settings
     • But there is still a large performance gap
     [Figure: bar chart of average precision (%) comparing an unsupervised CAE-RNN with a supervised model, illustrating the gap] [Kamper, ICASSP'19]

  7. Unsupervised monolingual acoustic word embeddings. [Figure: an encoder-decoder RNN (autoencoder) takes the frames x_1, x_2, ..., x_T of a segment X, encodes them to an embedding, and is trained to reconstruct output frames f_1, f_2, ..., f_T] [Chung et al., Interspeech'16; Kamper, ICASSP'19]

  8. Unsupervised monolingual acoustic word embeddings. [Figure: the correspondence autoencoder variant; the encoder takes the frames x_1, ..., x_T of a segment X, and the decoder is trained to reconstruct the frames f_1, ..., f_T' of X', the other member of a discovered pair] [Chung et al., Interspeech'16; Kamper, ICASSP'19]
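A minimal sketch of one correspondence-autoencoder training step, assuming pairs (X, X') have already been found by an unsupervised term discovery (UTD) system. The layer sizes, feeding the embedding to the decoder at every output step, the MSE loss, and the optimiser settings are all illustrative assumptions rather than the exact CAE-RNN configuration from the talk.

```python
import torch
import torch.nn as nn

class CAERNN(nn.Module):
    """Correspondence autoencoder sketch: encode X, reconstruct its pair X'."""

    def __init__(self, feat_dim=13, hidden_dim=256, embed_dim=130):
        super().__init__()
        self.enc_rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.to_embed = nn.Linear(hidden_dim, embed_dim)
        self.dec_rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.to_frame = nn.Linear(hidden_dim, feat_dim)

    def forward(self, x, target_len):
        _, h = self.enc_rnn(x)
        z = self.to_embed(h[-1])                     # acoustic word embedding
        # Condition the decoder on the embedding at every output time step.
        dec_in = z.unsqueeze(1).repeat(1, target_len, 1)
        dec_out, _ = self.dec_rnn(dec_in)
        return self.to_frame(dec_out), z             # predicted frames f_1..f_T'

model = CAERNN()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step on a single discovered pair (toy random data here).
x = torch.randn(1, 60, 13)         # segment X
x_pair = torch.randn(1, 72, 13)    # discovered pair X'
pred, z = model(x, target_len=x_pair.shape[1])
loss = nn.functional.mse_loss(pred, x_pair)   # reconstruct X' from X's embedding
loss.backward()
optimiser.step()
```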

  9. Supervised multilingual acoustic word embeddings. [Figure: a single acoustic word embedding model is trained on labelled words from several well-resourced languages, e.g. Russian яблоки ("apples") and бежать ("to run"), Polish jabłka and biec, French pommes and courir; the frames x_1, ..., x_T of a segment X are mapped to an embedding z]
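Later slides compare two supervised multilingual models, a multilingual CAE-RNN and a ClassifierRNN. The sketch below is my rough approximation of the ClassifierRNN idea only: an encoder is trained to classify word types pooled over the labelled training languages, and at test time the classification layer is discarded and the embedding layer is applied to the zero-resource language. The vocabulary size, layer sizes, and where exactly the embedding is taken are placeholders, not details confirmed in the talk.

```python
import torch
import torch.nn as nn

class ClassifierRNN(nn.Module):
    """Encoder trained to classify word types pooled over training languages."""

    def __init__(self, feat_dim=13, hidden_dim=256, embed_dim=130, n_word_types=10000):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.to_embed = nn.Linear(hidden_dim, embed_dim)
        self.to_word = nn.Linear(embed_dim, n_word_types)  # joint multilingual vocab

    def forward(self, x):
        _, h = self.rnn(x)
        z = self.to_embed(h[-1])      # the embedding used at test time
        return self.to_word(z), z

model = ClassifierRNN()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)

# One supervised step: a labelled segment from any well-resourced training
# language; the word-type id 4271 is purely illustrative.
x = torch.randn(1, 65, 13)
word_id = torch.tensor([4271])
logits, z = model(x)
loss = nn.functional.cross_entropy(logits, word_id)
loss.backward()
optimiser.step()

# On a zero-resource language, only z is used; the softmax layer is discarded.
```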

  10. Experimental setup
     • Training data: six well-resourced languages: Czech (CS), French (FR), Polish (PL), Portuguese (PT), Russian (RU), Thai (TH)
     • Test data: six languages treated as zero-resource: Spanish (ES), Hausa (HA), Croatian (HR), Swedish (SV), Turkish (TR), Mandarin (ZH)
     • Evaluation: same-different isolated word discrimination
     • Embeddings: M = 130 for all models
     • Baselines:
       — Downsampling: 10 equally spaced MFCC vectors, flattened (a minimal sketch follows this slide)
       — Dynamic time warping (DTW) alignment cost between test segments
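The downsampling baseline takes only a few lines. The sketch below assumes 13-dimensional MFCC frames, so that 10 equally spaced frames flattened give a 130-dimensional vector, matching M = 130; the exact frame-selection scheme used in the talk may differ.

```python
import numpy as np

def downsample_embed(mfccs, n_keep=10):
    """Flatten n_keep equally spaced frames of a (T, 13) MFCC segment."""
    T = mfccs.shape[0]
    # Indices of n_keep frames spread evenly over the segment.
    idx = np.linspace(0, T - 1, n_keep).astype(int)
    return mfccs[idx].flatten()           # shape: (n_keep * 13,) = (130,)

segment = np.random.randn(47, 13)         # a 47-frame word segment
print(downsample_embed(segment).shape)    # (130,)
```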

  11. 1. Is multilingual supervised > monolingual unsupervised? Test results on Spanish. [Figure: bar chart of average precision (%) for the baselines (DTW, downsampling), the unsupervised CAE-RNN trained on UTD-discovered pairs, and the multilingual CAE-RNN and ClassifierRNN; the multilingual models outperform the unsupervised CAE-RNN]
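The average precision reported in these plots comes from the same-different task: every pair of test segments is scored by the similarity of their embeddings, pairs of the same word type are the positives, and AP summarises how well same-word pairs are ranked above different-word pairs. The sketch below is my own minimal version using cosine similarity and scikit-learn's average_precision_score; the official evaluation setup may differ in details.

```python
import numpy as np
from itertools import combinations
from sklearn.metrics import average_precision_score

def same_different_ap(embeddings, labels):
    """Same-different AP: rank all segment pairs by cosine similarity and
    measure how well same-word pairs are ranked above different-word pairs."""
    embs = np.asarray(embeddings, dtype=float)
    embs = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    scores, targets = [], []
    for i, j in combinations(range(len(labels)), 2):
        scores.append(embs[i] @ embs[j])             # cosine similarity
        targets.append(int(labels[i] == labels[j]))  # 1 if same word type
    return average_precision_score(targets, scores)

# Toy usage: four segments, two word types (random embeddings, so AP is low).
rng = np.random.default_rng(0)
toy_embs = rng.standard_normal((4, 130))
print(same_different_ap(toy_embs, ["agua", "agua", "casa", "casa"]))
```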

  12. 1. Is multilingual supervised > monolingual unsupervised? Test results on Hausa. [Figure: bar chart of average precision (%) for the baselines (DTW, downsampling), the unsupervised CAE-RNN trained on UTD-discovered pairs, and the multilingual CAE-RNN and ClassifierRNN; the multilingual models again outperform the unsupervised CAE-RNN]

  13. 2. Does training on more languages help? Development results on Croatian. [Figure: bar chart of average precision (%) for the CAE-RNN and ClassifierRNN as the training set grows: HR (UTD pairs), RU, RU+CS, RU+CS+FR, and all six training languages (Multilingual)]

  14. 3. Is the choice of training language important? Average precision (%) by training language (rows) and evaluation language (columns):

              ES     HA     HR     SV     TR     ZH
      CS    41.6   51.1   41.0   28.7   37.0   42.6
      FR    42.6   41.8   30.4   25.3   32.5   35.8
      PL    41.1   43.7   35.8   25.5   33.7   39.5
      PT    45.9   46.2   36.4   26.6   34.1   39.6
      RU    35.0   39.7   31.3   22.3   29.7   37.1
      TH    28.5   44.5   29.9   17.9   23.6   36.2

  15. Conclusions and future work
     Conclusions:
     • Proposed to train a supervised multilingual acoustic word embedding model on well-resourced languages and then apply it to zero-resource languages
     • The multilingual CAE-RNN and ClassifierRNN consistently outperform unsupervised models trained on the zero-resource languages

  16. Conclusions and future work
     Conclusions:
     • Proposed to train a supervised multilingual acoustic word embedding model on well-resourced languages and then apply it to zero-resource languages
     • The multilingual CAE-RNN and ClassifierRNN consistently outperform unsupervised models trained on the zero-resource languages
     Future work:
     • Different models, both for multilingual and unsupervised training
     • Analysis to understand the difference between the CAE-RNN and the ClassifierRNN
     • Does language conditioning help during decoding?

  17. Paper: https://arxiv.org/abs/2002.02109
      Code: https://github.com/kamperh/globalphone_awe
