

SLIDE 1

Generating segment-level foreign-accented synthetic speech with natural speech prosody

Gustav Eje HENTER, Jaime LORENZO-TRUEBA, Xin WANG, Mariko KONDO, Junichi YAMAGISHI

gustav@nii.ac.jp Digital Content and Media Sciences Research Division, National Institute of Informatics (NII), Tokyo, Japan

Sunday 18th February, 2018

  • G. E. Henter et al. (NII)

Generating foreign accent 2018-02-18 1 / 28

SLIDE 2

Synopsis

  • We generate foreign-accented synthetic speech audio
  • ... with native prosody
  • ... and finely controllable accent
  • ... using deep learning and multilingual speech synthesis
  • ... from non-accented speech data alone
SLIDE 3

Overview

  • 1. Introduction
  • 2. Method
  • 3. Experiment

3.1 Setup
3.2 Evaluation and results

  • 4. Conclusion
SLIDE 5

Studying foreign accent

What makes speech sound foreign-accented?

  • A question of speech perception research
  • Empirical method: Measure how listeners respond to speech stimuli with carefully controlled differences
  • Knowledge about accent perception can inform, e.g., foreign-language instruction

SLIDE 7

Cues to foreign accent

What makes speech sound foreign-accented?

  • Supra-segmental properties
  • Intonation and pauses (Kang et al., 2010)
  • Nuclear stress (Hahn, 2004)
  • Duration (Tajima et al., 1997)
  • Speech rate (Munro and Derwing, 2001)
  • And more...
  • Segmental properties
  • Pronunciation errors
  • This is often the most important aspect according to listeners! (Derwing and Munro, 1997)

SLIDE 11

Studying segmental foreign accent

  • Need speech stimuli isolating and interpolating segmental effects
  • Without supra-segmental effects
  • Only specific segments should be affected
  • Method 1: Record deliberate mispronunciations
  • Difficult to elicit
  • Method 2: Cross-language splicing
  • Labour intensive
  • Join artefacts
  • Method 3: Synthesise stimuli
  • Data-driven, automated approach
  • No joins
SLIDE 15

Our approach

  • Methods for synthesising foreign-accented stimuli
  • Multilingual HMM-based TTS (García Lecumberri et al., 2014)
  • Multilingual deep learning (this presentation!)
  • We extend (García Lecumberri et al., 2014) in two ways:
  • Improvement 1: Deep learning
  • Improved signal quality (Watts et al., 2016), thus replicating more perceptual cues
  • Flexible in inputs and outputs
  • Allows easy control of the output synthesis (Watts et al., 2015; Luong et al., 2017)
  • Improvement 2: Use reference prosody (pitch and duration)
  • Can be taken from natural speech or predicted by a separate system
  • Allows us to impose native-like suprasegmental properties
SLIDE 17

Building the synthesiser

Traditional text-to-speech:

[Diagram: text → text analysis → quinphones and other features → duration model → durations → acoustic model → MGCs, BAPs, F0/V-UV → vocoder → speech]

SLIDE 18

Building the synthesiser

Speech synthesis with arbitrary prosody:

[Diagram: text → text analysis → quinphones and other features → acoustic model → MGCs, BAPs, F0/V-UV → vocoder → speech, with durations supplied by a separate prosody generator rather than a duration model]

SLIDE 19

Building the synthesiser

Speech synthesis with natural prosody:

[Diagram: text → text analysis → quinphones and other features → acoustic model → MGCs, BAPs, F0/V-UV → vocoder → speech, with durations extracted from natural speech via speech analysis and HTK forced alignment]

SLIDE 21

“Cyborg speech”

  • “A being with both organic and biomechatronic body parts”
  • Our acoustic parameters are a chimeric combination of man and machine

SLIDE 23

Making it foreign

  • Segmental foreign accent through multilingual speech synthesis:
  • Teach a single model to synthesise several languages natively
  • Interpolate specific phones in the spoken language towards phones in the accent language

  • Maintain the same voice across languages
  • In this case by using data from a multilingually native speaker
  • Running example: American English and Japanese
  • Combilex GAM (Richmond et al., 2009): 54 English phones
  • Open JTalk (Oura et al., 2010): 44 Japanese phones
  • Combined phoneset: 54 + 44 = 98 phones
SLIDE 24

Synthesising foreign accent

Cyborg speech:

[Diagram: text → text analysis → quinphones and other features → acoustic model → MGCs, BAPs, F0/V-UV → vocoder → speech, with durations extracted from natural speech via speech analysis and HTK forced alignment]

SLIDE 25

Synthesising foreign accent

Bilingual cyborg speech synthesis:

[Diagram: text → language-dependent text analysis → bilingual quinphones, other features, and a language flag → DBLSTM bilingual acoustic model → MGCs, BAPs, F0/V-UV → vocoder → bilingual speech, with durations from native speech via speech analysis and HTK]

SLIDE 27

Synthesising foreign accent

Foreign-accented speech synthesis:

[Diagram: text → language-dependent text analysis → bilingual quinphones (with a CONTROL input acting on them), other features, and a language flag → DBLSTM bilingual acoustic model → MGCs, BAPs, F0/V-UV → vocoder → accented speech, with durations from native speech via speech analysis and HTK]

Synthetic mispronunciations through cross-language interpolation between 98-dimensional one-hot phone encodings in the quinphones
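The interpolation above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the phone indices are hypothetical, and only the blending of one-hot encodings is shown.

```python
import numpy as np

N_PHONES = 98  # 54 English (Combilex GAM) + 44 Japanese (Open JTalk)

def one_hot(idx, n=N_PHONES):
    """One-hot encoding of a single phone identity."""
    v = np.zeros(n)
    v[idx] = 1.0
    return v

def interpolate_phone(spoken_idx, accent_idx, alpha):
    """Blend the spoken-language phone towards the accent-language phone.

    alpha = 0.0 reproduces the native phone, alpha = 1.0 is a full
    cross-language substitution, and intermediate values give a graded,
    finely controllable accent.
    """
    return (1.0 - alpha) * one_hot(spoken_idx) + alpha * one_hot(accent_idx)

# Hypothetical indices: Japanese /r/ at 60, English /r/ at 6
x = interpolate_phone(60, 6, alpha=0.25)
# The blended encoding still sums to one, like a soft phone posterior
assert abs(x.sum() - 1.0) < 1e-9
```

Because the acoustic model consumes real-valued input vectors, such soft encodings can be fed to it directly, with no splicing or re-recording.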

SLIDE 30

Data and processing

  • Male voice talent native in both US English and Japanese
  • 2000 utterances per language
  • US English example
  • Japanese example
  • 20 pre-recorded test utterances in each language
  • 48 kHz at 16 bits
  • WORLD vocoder for analysis and synthesis
  • GlottDNN pitch extractor (fewer VUV errors)
  • Static and dynamic features (MLPG)
  • Forced alignment using monolingual HTS systems
SLIDE 32

Network and training

  • Network topology
  • Same as in (Wang et al., 2017):
  • 2 logistic sigmoid feed-forward layers
  • 2 bidirectional LSTM layers
  • Minibatch training to minimise frame mean-square error
  • 160 epochs of raw SGD
  • ≤30 epochs of AdaGrad
  • Early stopping based on 5% validation utterances
  • Using the C++ framework CURRENNT (Weninger et al., 2015)
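The training schedule above (raw SGD, then AdaGrad, with early stopping on a 5% validation split) can be sketched on a toy problem. The model here is a plain linear regressor standing in for the DBLSTM, and all sizes and learning rates are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data standing in for (linguistic features -> acoustic frames)
X = rng.normal(size=(400, 8))
true_w = rng.normal(size=8)
y = X @ true_w + 0.1 * rng.normal(size=400)

# Hold out 5% of the data for early stopping, as on the slide
n_val = len(X) // 20
X_tr, y_tr, X_val, y_val = X[n_val:], y[n_val:], X[:n_val], y[:n_val]

w = np.zeros(8)
grad_sq = np.zeros(8)                    # AdaGrad accumulator
best_w, best_val = w.copy(), np.inf

def minibatches(Xs, ys, size=32):
    idx = rng.permutation(len(Xs))
    for i in range(0, len(Xs), size):
        j = idx[i:i + size]
        yield Xs[j], ys[j]

for epoch in range(190):                 # 160 SGD epochs + up to 30 AdaGrad
    use_adagrad = epoch >= 160
    for xb, yb in minibatches(X_tr, y_tr):
        g = 2 * xb.T @ (xb @ w - yb) / len(xb)    # gradient of frame MSE
        if use_adagrad:
            grad_sq += g * g
            w -= 0.1 * g / (np.sqrt(grad_sq) + 1e-8)
        else:
            w -= 0.01 * g                          # raw SGD
    val = np.mean((X_val @ w - y_val) ** 2)
    if val < best_val:                   # keep the best model seen so far,
        best_val, best_w = val, w.copy() # a simple form of early stopping

print(f"best validation MSE: {best_val:.4f}")
```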
SLIDE 33

Systems

  • Natural speech (NAT)
  • Analysis-synthesis (VOC)
  • Monolingual Japanese cyborg system (MON)
  • Bilingual cyborg system (BIL)
  • Only this system can interpolate phones across languages
SLIDE 34

Cross-language substitutions

Consonant substitutions inspired by common mispronunciations among native American English speakers (L1) learning Japanese (L2):

  Japanese            English              Substitutions
  IPA   Open JTalk    IPA   Combilex GAM   Max   Prompts
  ɾ     r             ɹ     r              9     19
  ɕ     sh            ʃ     S              8     13
  dz    z             z     z              5     7
  dʑ    j             dʒ    dZ             3     8
  tɕ    ch            tʃ    tS             2     11

(Other substitutions allow BIL to generate Japanese-accented English)
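Applying such a table to a phone sequence is mechanical. A hypothetical sketch (the label mapping follows the Open JTalk and Combilex GAM conventions in the table, but the example word segmentation is invented):

```python
# Japanese (Open JTalk) label -> English (Combilex GAM) label, per the table
JA_TO_EN = {"r": "r", "sh": "S", "z": "z", "j": "dZ", "ch": "tS"}

def substitute(phones, targets):
    """Swap the selected Japanese phones for their English counterparts.

    This performs a full substitution; in the actual system each swap can
    instead be a graded interpolation between the two phone encodings.
    """
    return [JA_TO_EN.get(p, p) if p in targets else p for p in phones]

# "sushi" as a toy phone sequence (hypothetical segmentation)
print(substitute(["s", "u", "sh", "i"], targets={"sh"}))
# -> ['s', 'u', 'S', 'i']
```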

SLIDE 35

Example stimuli

[Audio examples: utterance IDs 12 and 13 synthesised by each system (NAT, VOC, MON, BIL), and by BIL with each substitution (r, sh, z, j, ch, all)]

(How perceptible the differences are depends on your native language; they might be more obvious to non-Japanese listeners)

SLIDE 38

Listening test

  • Crowdsourced listening test
  • 131 native Japanese listeners
  • Rating balanced sets of utterances
  • 599 ratings per condition (system and substitution)
  • Responses collected per stimulus presentation:
  • Speech quality: 1 (poor) to 5 (excellent)
  • Strength of foreign accent: 1 (native-like) to 7 (very strong)
  • Foreign accent classification: 5 nationalities (CHI, KOR, AUS, IDN, and USA), "none", and "unknown"

SLIDE 39

Strength of perceived foreign accent

System  Substitution  Accent strength  Change
NAT     none          1.60±0.046
VOC     none          1.73±0.050       0.13 vs. NAT
MON     none          2.42±0.064       0.69 vs. VOC
BIL     none          2.39±0.063       −0.03 vs. MON
BIL     r             3.38±0.071       0.99 vs. none
BIL     sh            2.53±0.064       0.14 vs. none
BIL     z             2.42±0.064       0.03 vs. none
BIL     j             2.48±0.064       0.09 vs. none
BIL     ch            2.45±0.062       0.06 vs. none
BIL     all           3.55±0.071       1.16 vs. none

(Ranges are 95% confidence intervals on the mean accent strength)
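The intervals reported here are standard mean ± 95% confidence half-widths. A self-contained sketch of the computation, using invented toy ratings and a normal approximation (reasonable for the ~600 ratings per condition):

```python
import math

def mean_ci95(ratings):
    """Mean and half-width of an approximate 95% confidence interval
    for the mean (normal approximation, sample standard deviation)."""
    n = len(ratings)
    mean = sum(ratings) / n
    var = sum((r - mean) ** 2 for r in ratings) / (n - 1)
    half = 1.96 * math.sqrt(var / n)
    return mean, half

# Toy ratings on the 1 (native-like) to 7 (very strong) accent scale
m, h = mean_ci95([1, 2, 2, 3, 1, 2, 4, 2, 3, 2])
print(f"{m:.2f}±{h:.2f}")  # prints "2.20±0.57"
```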

SLIDE 40

Distribution of perceived accent

Perceived accent language (%):

System  Substitution  None  USA  CHI  Other  Unk.
NAT     none          77    5    3    4      12
VOC     none          72    8    3    4      13
MON     none          50    9    8    7      27
BIL     none          51    10   7    8      24
BIL     r             23    29   9    11     28
BIL     sh            44    10   10   9      27
BIL     z             48    11   7    7      28
BIL     j             47    11   9    8      26
BIL     ch            45    12   10   7      26
BIL     all           19    33   10   11     28

SLIDE 41

Scatterplot of BIL stimuli

[Scatterplot: fraction of substituted phones (x-axis, 0.000–0.175) against mean strength of foreign accent (y-axis, 1.5–5.0), one point per BIL stimulus, labelled by substitution: none, r, sh, z, j, ch, all]

(The overall Pearson correlation coefficient is 0.43)

SLIDE 43

Empirical conclusions

  • Natural prosody was maintained (high correlation)
  • Bilingual synthesis did not reduce speech quality
  • Substituting the phone “r” (in r and all)
  • Produced foreign-accented speech
  • The accent was distinctly American
  • Was judged as somewhat lower quality (due to foreign accent?)
  • Other substitutions were less noticeable
  • Also less prevalent in the test sentences
  • Synthesis artefacts were perceived as an “unknown” accent
SLIDE 44

Summary of achievements

  • We have generated foreign-accented synthetic speech audio
  • ... with native prosody
  • ... and finely controllable accent
  • ... using deep learning and multilingual speech synthesis
  • ... from non-accented speech data alone
  • ... achieving a distinct and recognisable accent
SLIDE 45

Possible extensions

  • Use a neural vocoder (e.g., WaveNet) to improve signal quality
  • Also consider Tacotron 2-style matched training
  • Consider other phone encodings (control spaces)
  • IPA place/manner of articulation?
  • Formant frequencies?
  • Apply the work in foreign-accent research
  • Currently in progress
SLIDE 46

The end

Thank you for listening! Any questions?

SLIDE 49

Acknowledgement

This research has been supported by the Diacex project, in collaboration with Prof. María Luisa García Lecumberri, Prof. Martin Cooke, and Mr. Rubén Pérez Ramón.

SLIDE 50

References I

Derwing, T. M. and Munro, M. J. (1997). Accent, intelligibility, and comprehensibility. Stud. Second Lang. Acq., 19(1):1–16.

García Lecumberri, M. L., Barra Chicote, R., Pérez Ramón, R., Yamagishi, J., and Cooke, M. (2014). Generating segmental foreign accent. In Proc. Interspeech, pages 1303–1306.

Hahn, L. D. (2004). Primary stress and intelligibility: Research to motivate the teaching of suprasegmentals. TESOL Quart., 38(2):201–223.

Kang, O., Rubin, D., and Pickering, L. (2010). Suprasegmental measures of accentedness and judgments of language learner proficiency in oral English. Mod. Lang. J., 94(4):554–566.

Luong, H.-T., Takaki, S., Henter, G. E., and Yamagishi, J. (2017). Adapting and controlling DNN-based speech synthesis using input codes. In Proc. ICASSP, pages 4905–4909.

SLIDE 51

References II

Munro, M. J. and Derwing, T. M. (2001). Modeling perceptions of the accentedness and comprehensibility of L2 speech. Stud. Second Lang. Acq., 23(4):451–468.

Oura, K., Sako, S., and Tokuda, K. (2010). Japanese text-to-speech synthesis system: Open JTalk. In Proc. ASJ Spring, pages 343–344.

Richmond, K., Clark, R. A. J., and Fitt, S. (2009). Robust LTS rules with the Combilex speech technology lexicon. In Proc. Interspeech, pages 1295–1298.

Tajima, K., Port, R., and Dalby, J. (1997). Effects of temporal correction on intelligibility of foreign-accented English. J. Phonetics, 25(1):1–24.

Wang, X., Takaki, S., and Yamagishi, J. (2017). An autoregressive recurrent mixture density network for parametric speech synthesis. In Proc. ICASSP, pages 4895–4899.

SLIDE 52

References III

Watts, O., Henter, G. E., Merritt, T., Wu, Z., and King, S. (2016). From HMMs to DNNs: where do the improvements come from? In Proc. ICASSP, pages 5505–5509.

Watts, O., Wu, Z., and King, S. (2015). Sentence-level control vectors for deep neural network speech synthesis. In Proc. Interspeech, pages 2217–2221.

Weninger, F., Bergmann, J., and Schuller, B. W. (2015). Introducing CURRENNT: The Munich open-source CUDA recurrent neural network toolkit. J. Mach. Learn. Res., 16(3):547–551.
  • G. E. Henter et al. (NII)

Generating foreign accent 2018-02-18 32 / 28

slide-53
SLIDE 53

Subjective quality

System  Substitution  Quality MOS  Change
NAT     none          4.43±0.031
VOC     none          3.71±0.040   −0.72 vs. NAT
MON     none          3.34±0.035   −0.37 vs. VOC
BIL     none          3.33±0.035   −0.01 vs. MON
BIL     r             3.07±0.036   −0.26 vs. none
BIL     sh            3.27±0.035   −0.06 vs. none
BIL     z             3.31±0.035   −0.02 vs. none
BIL     j             3.31±0.036   −0.02 vs. none
BIL     ch            3.28±0.035   −0.05 vs. none
BIL     all           3.01±0.037   −0.32 vs. none

(Ranges are 95% MOS confidence intervals)

SLIDE 54

Prosodic faithfulness

Correlation between NAT and test-stimulus pitch (log F0):

System  Substitution?  Pearson correlation
NAT     no             1
VOC     no             0.990
MON     no             0.986
BIL     no             0.965
BIL     yes            0.961–0.965

  • Note that these numbers are much higher than for standard TTS
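The figure of merit in the table is the Pearson correlation between the natural and synthetic log-F0 contours. A self-contained sketch, with invented toy contours in place of real extracted pitch tracks:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length contours."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy log-F0 contours: a reference and a slightly perturbed copy
ref = [math.log(120 + 10 * math.sin(t / 5)) for t in range(100)]
syn = [v + 0.01 * ((t % 7) - 3) / 3 for t, v in enumerate(ref)]
print(round(pearson(ref, syn), 3))
```

In practice the correlation would be computed only over voiced frames, since F0 is undefined in unvoiced regions.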