Cross-language mapping for small-vocabulary ASR in under-resourced languages: Investigating the impact of source language choice

Anjana Vakil and Alexis Palmer
Department of Computational Linguistics and Phonetics, Saarland University, Saarbrücken, Germany


SLIDE 1

Cross-language mapping for small-vocabulary ASR in under-resourced languages: Investigating the impact of source language choice

Anjana Vakil and Alexis Palmer

Department of Computational Linguistics and Phonetics, Saarland University, Saarbrücken, Germany

SLTU’14, St. Petersburg, 15 May 2014

SLIDE 2

Outline

◮ Small-vocabulary recognition: Why & how
◮ Cross-language pronunciation mapping
  • The Salaam method (Qiao et al. 2010)
◮ Our contribution: Impact of source language choice
  • Data & method
  • Experimental results
◮ Conclusions
◮ Ongoing & future work


SLIDE 3

Small-vocabulary recognition: Why & how

Goal: Enable non-experts to quickly develop basic speech-driven applications in any Under-Resourced Language (URL)

◮ Training/adapting a recognizer takes data and expertise
◮ Many applications use ≤100 terms (e.g. Bali et al. 2013)

Strategy: Use an existing high-resource-language (HRL) recognizer for small-vocabulary recognition in URLs (Sherwani 2009; Qiao et al. 2010)


SLIDE 4

Small-vocabulary recognition: Why & how

Key: Mapped pronunciation lexicon

Terms in target lg. (URL) → Pronunciations in source lg. (HRL)

Yoruba → English: igba /iɡ͡ba/ → igb@ | ib@ | ...?

[Diagram: mapped pronunciation lexicon + HRL recognizer ≈ small-vocabulary URL recognizer]
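In code terms, the mapped lexicon is just a lookup table from target-language terms to candidate source-language phone strings; a minimal Python sketch (the phone strings below are invented stand-ins, not mappings from the paper):

```python
# Mapped pronunciation lexicon: each target-language (Yoruba) term is paired
# with one or more source-language (English) phone sequences. The phone
# strings here are hypothetical examples, not the paper's actual mappings.
lexicon = {
    "igba": ["ih g b ah", "ih b ah"],
    "meji": ["m ey jh iy"],
    "duro": ["d uw r ow"],
}

def pronunciations(term):
    """Return the candidate source-language pronunciations for a target term."""
    return lexicon.get(term, [])
```

The HRL recognizer then treats each target term as an in-vocabulary "word" whose pronunciations are these source-language phone sequences.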



SLIDE 6

Cross-language pronunciation mapping

The Salaam Method (Qiao et al. 2010)

◮ Requires ≥1 sample per term (a few minutes of audio)
◮ Mimics phone decoding
◮ “Super-wildcard” recognition grammar:

  term → {∗ | ∗∗ | ∗∗∗}^10   (∗ = any source-language phoneme)

◮ Iterative training algorithm finds confidence-ranked matches:

  igba → ibæ@, ibõ@, ibE@, …

◮ Accuracy: ≈80–98% for ≤50 terms
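The super-wildcard grammar can be sketched programmatically; a simplified illustration (this emits a toy notation, not the Microsoft Speech Platform's actual SRGS format, and flattens the grammar to plain sequences of 1..n wildcard phones):

```python
def super_wildcard_grammar(max_phones=10):
    """Build a toy grammar string: a term may match any sequence of
    1..max_phones wildcard phones, where each wildcard (*) stands for
    any phoneme of the source language."""
    alternatives = ["*" * n for n in range(1, max_phones + 1)]
    return "term -> { " + " | ".join(alternatives) + " }"

print(super_wildcard_grammar(3))  # term -> { * | ** | *** }
```

Decoding audio against such a grammar forces the recognizer to output some source-language phone sequence for each sample, which is what lets the method mimic phone decoding with an off-the-shelf word recognizer.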



SLIDE 8

Impact of source language choice

Hypothesis

More phoneme overlap between source/target languages
→ Easier pronunciation mapping
→ Higher recognition accuracy

Experiment

◮ Target language: Yoruba
◮ Source languages: English (US), French (France)


SLIDE 9

Impact of source language choice

[Figure: Phonemic segments of Yoruba, marked by whether each segment is also found in English and/or French. Segments shown: e, i, a, u, ɛ, ɔ, h, ɾ, b, t, d, k, ɡ, f, s, ʃ, m, l, j, w, ɟ, the nasal vowels ĩ, ũ, ɛ̃, ɔ̃/ã, and the labial-velar stops k͡p, ɡ͡b.]
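The overlap pictured here can be quantified as set intersection over phoneme inventories; a small sketch (the inventories below are abbreviated, illustrative subsets, not the full sets from the figure):

```python
# Abbreviated, illustrative phoneme inventories (not the complete sets).
yoruba  = {"e", "i", "a", "u", "ɛ", "ɔ", "b", "t", "d", "k", "ɡ",
           "s", "ʃ", "m", "l", "j", "w", "k͡p", "ɡ͡b"}
english = {"e", "i", "a", "u", "b", "t", "d", "k", "ɡ",
           "s", "ʃ", "m", "l", "j", "w", "h"}
french  = {"e", "i", "a", "u", "ɛ", "ɔ", "b", "t", "d", "k", "ɡ",
           "s", "ʃ", "m", "l", "j", "w"}

def overlap(target, source):
    """Fraction of the target inventory also present in the source inventory."""
    return len(target & source) / len(target)

print(f"English covers {overlap(yoruba, english):.2f} of Yoruba")
print(f"French  covers {overlap(yoruba, french):.2f} of Yoruba")
```

Even on these toy subsets, French covers more of the Yoruba inventory than English does, which is the premise behind the hypothesis on the previous slide.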



SLIDE 12

Data & method

Data

◮ 25 Yoruba terms (subset of the Qiao et al. 2010 dataset)
◮ 5 samples/term from 2 speakers (1 male, 1 female)
◮ Telephone quality (8 kHz)

Method

◮ Generate Fr./En. lexicons with Salaam (Qiao et al. 2010)

  • Microsoft Speech Platform (msdn.microsoft.com/library/hh361572)
  • 1, 3, and 5 pronunciations per term

◮ Compare mean word recognition accuracy

  • Same-speaker: Leave-one-out
  • Cross-speaker: Train on M, test on F (M > F); train on F, test on M (F > M)
  • t-tests for significance (α = 0.05)
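The accuracy comparison can be sketched as computing per-lexicon means and a two-sample t-statistic; the slides do not specify the t-test variant, so this uses a plain Welch statistic, and the accuracy lists are made-up numbers:

```python
from statistics import mean, variance

def t_statistic(a, b):
    """Welch's two-sample t-statistic for two lists of per-fold accuracies."""
    va, vb = variance(a), variance(b)
    return (mean(a) - mean(b)) / ((va / len(a) + vb / len(b)) ** 0.5)

# Hypothetical per-fold word accuracies (%) for the two lexicons.
english = [80.0, 82.0, 78.0, 81.0]
french  = [75.0, 77.0, 74.0, 76.0]

print(f"En mean {mean(english):.1f}, Fr mean {mean(french):.1f}, "
      f"t = {t_statistic(english, french):.2f}")
```

The resulting t-statistic would then be converted to a p-value and checked against α = 0.05, as in the significance tests reported on the next slides.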


SLIDE 13

Results

Same-speaker word recognition accuracy (%):

              1 pron.   3 prons.   5 prons.
  English       80.0      80.0       81.6
  French        75.2      77.2       80.0
  p-value       0.20      0.34       0.59


SLIDE 14

Results

Cross-speaker word recognition accuracy (%), means over the M > F and F > M directions:

              1 pron.   3 prons.   5 prons.
  English       63.2      71.6       73.6
  French        60.0      64.8       61.6
  p-value       0.41      0.04*      0.04*

(* significant at p ≤ .05)


SLIDE 15

Results

Accuracy by word type

[Table: terms ranked from best to worst recognition accuracy under the English and French lexicons, with nasal terms marked. Best: duro; worst: meji, igba (order differs between the two source languages); other terms listed include gba, iba, shii, mejo, goji, mesan, lehin, beeni, tunse, okan, gorun, sun, meta, bere.]



SLIDE 19

Conclusions

Hypothesis

More phoneme overlap between source/target languages
→ Easier pronunciation mapping
→ Higher recognition accuracy

Predicted: French accuracy > English accuracy
Observed: French accuracy ≤ English accuracy

Possible explanations:

◮ Source languages may be too similar w.r.t. the target language
◮ Better metric needed for evaluating source-target match
◮ Baseline recognizer accuracy may play a role


SLIDE 20

Ongoing & future work

lex4all: Pronunciation Lexicons for Any Low-resource Language (Vakil et al. 2014)
http://lex4all.github.io/lex4all

Planned experiments:

◮ More source-target language pairs
◮ Discriminative training (Chan and Rosenfeld 2012)
◮ Algorithm modifications


SLIDE 21

References

• K. Bali, S. Sitaram, S. Cuendet, and I. Medhi. “A Hindi speech recognizer for an agricultural video search application”. In: ACM DEV. 2013.
• H. Y. Chan and R. Rosenfeld. “Discriminative pronunciation learning for speech recognition for resource scarce languages”. In: ACM DEV. 2012.
• F. Qiao, J. Sherwani, and R. Rosenfeld. “Small-vocabulary speech recognition for resource-scarce languages”. In: ACM DEV. 2010.
• J. Sherwani. “Speech interfaces for information access by low literate users”. PhD thesis. Carnegie Mellon University, 2009.
• A. Vakil, M. Paulus, A. Palmer, and M. Regneri. “lex4all: A language-independent tool for building and evaluating pronunciation lexicons for small-vocabulary speech recognition”. In: ACL 2014: System Demonstrations. 2014.

Thank you! Thanks also to:

Roni Rosenfeld, Mark Qiao, Hao Yee Chan, Dietrich Klakow, Manfred Pinkal
