How do you pronounce your name? Improving G2P with transliterations

SLIDE 1

How do you pronounce your name? Improving G2P with transliterations

Aditya Bhargava and Grzegorz Kondrak University of Alberta ACL-HLT 2011

SLIDE 2

Introduction

  • Name pronunciations can be fickle

– Speech synthesis systems must handle them
– Best G2P system can't account for how I decide my name is pronounced

  • Existing transliterations encode this info

– Ample data that can be easily mined from the Web

SLIDE 3

Objective: apply transliterations

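[Figure: the name “Gershwin” with competing candidate pronunciations, one beginning with /d͡ʒ/ and one with /ɡ/, alongside its transliterations ガーシュウィン (Japanese) and Гершвин (Russian)]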

SLIDE 4

Applying transliterations

  • Assume existing G2P base systems

– Produce n-best output lists

  • Assume a transliteration is available
  • Pick the candidate output that is “most similar” to the transliteration
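A minimal sketch of the selection step above, in Python. The slides do not say at this point how similarity is computed, so plain Levenshtein distance over phoneme symbols stands in for it here. The names `edit_distance`, `similarity`, and `pick_most_similar` are hypothetical, the phoneme lists are toy data, and the transliteration is assumed to have already been converted to a phoneme sequence.

```python
from typing import List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Plain Levenshtein distance between two symbol sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(candidate: Sequence[str], reference: Sequence[str]) -> float:
    """Similarity in [0, 1]; 1.0 means the sequences are identical."""
    longest = max(len(candidate), len(reference))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(candidate, reference) / longest


def pick_most_similar(n_best: List[List[str]], reference: List[str]) -> List[str]:
    """Return the n-best candidate closest to the transliteration-derived phonemes."""
    return max(n_best, key=lambda cand: similarity(cand, reference))


# Toy data (assumed, not from the slides): two candidate transcriptions and
# a rough phoneme string standing in for a transliteration.
n_best = [["dʒ", "ɜ", "ʃ", "w", "ɪ", "n"], ["ɡ", "ɜ", "ʃ", "w", "ɪ", "n"]]
from_transliteration = ["ɡ", "e", "r", "ʃ", "v", "i", "n"]
best = pick_most_similar(n_best, from_transliteration)
```

With this toy input the candidate starting with ɡ is selected, because it shares its first symbol with the transliteration-derived sequence.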

SLIDE 5

Data

  • G2P: Combilex

– Provides “name” annotations

  • Transliterations: NEWS Shared Task 2010 English-to-Hindi data
  • Intersect data

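One way the intersection step might look, as a rough Python sketch. It assumes both resources can be loaded as dictionaries keyed by the English spelling and that matching on the lowercased spelling is acceptable; the function name `intersect` and the placeholder entries are hypothetical and do not reflect the actual file formats of either data set.

```python
from typing import Dict, List, Tuple


def intersect(pronunciations: Dict[str, List[str]],
              transliterations: Dict[str, str]) -> Dict[str, Tuple[List[str], str]]:
    """Keep only names that appear in both resources (case-insensitive match)."""
    translit_by_key = {name.lower(): form for name, form in transliterations.items()}
    shared = {}
    for name, phonemes in pronunciations.items():
        key = name.lower()
        if key in translit_by_key:
            shared[key] = (phonemes, translit_by_key[key])
    return shared


# Placeholder entries only; the strings in angle brackets stand in for real data.
g2p_names = {"Gershwin": ["<phonemes>"], "Bacchus": ["<phonemes>"]}
hindi_names = {"gershwin": "<Hindi transliteration>", "Alberta": "<Hindi transliteration>"}
shared = intersect(g2p_names, hindi_names)
```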
SLIDE 6

Base systems

  • Festival (Black et al., 1998)

– CARTs
– Popular end-to-end speech synthesis

  • Sequitur (Bisani and Ney, 2008)

– Generative joint n-grams
– G2P only

  • DirecTL+ (Jiampojamarn et al., 2008)

– Discriminative phrasal decoding
– G2P only

SLIDE 7

Similarity

  • Similarity measures:

– ALINE phoneme-to-phoneme aligner score
    • Rule-based G2P converter for Hindi
– M2M-Aligner alignment system score
    • Extension of learned edit distance algorithm

  • Two overall approaches:

– Use highest similarity score
– Combine similarity score with system score
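The two approaches can be illustrated with a small Python sketch. The slides do not state how the similarity and system scores are combined, so the linear interpolation below, the min-max normalization, and the parameter `alpha` are all assumptions made for illustration; setting `alpha` to 1.0 corresponds to using the highest similarity score alone.

```python
from typing import List, Tuple


def min_max(values: List[float]) -> List[float]:
    """Rescale values to [0, 1] within one n-best list."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rerank(candidates: List[Tuple[str, float, float]], alpha: float) -> List[str]:
    """Re-order candidates given as (transcription, system_score, similarity).

    alpha = 1.0 ranks by similarity only; alpha = 0.0 keeps the base ranking.
    """
    system = min_max([c[1] for c in candidates])
    sim = min_max([c[2] for c in candidates])
    scored = [(alpha * s + (1.0 - alpha) * b, c[0])
              for c, b, s in zip(candidates, system, sim)]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [transcription for _, transcription in scored]


# Toy n-best list (assumed values): system scores are log-probability-like,
# similarity scores are in [0, 1].
toy = [("A", -1.0, 0.2), ("B", -1.5, 0.9), ("C", -3.0, 0.5)]
by_similarity = rerank(toy, alpha=1.0)
combined = rerank(toy, alpha=0.5)
```

Normalizing inside each n-best list is one simple way to put the two scores on a comparable scale; other choices would change the resulting order.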

SLIDE 8

Similarity: results

[Bar chart: word accuracy (axis ticks 10 to 80) for Festival, Sequitur, and DirecTL+; bars for Base, ALINE, M2M, ALINE+Base, and M2M+Base]
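For reference, word accuracy can be computed along the following lines. This Python sketch assumes a single gold transcription per word, counts a word as correct only when the whole phoneme sequence matches, and reports a percentage; the function name `word_accuracy` is hypothetical.

```python
from typing import List, Sequence


def word_accuracy(predicted: Sequence[List[str]], gold: Sequence[List[str]]) -> float:
    """Percentage of words whose top transcription equals the gold one exactly."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return 100.0 * correct / len(gold)
```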

SLIDE 12

Similarity: post mortem

  • Difficult to do!
  • Can't follow transliterations exactly

– Differences in scripts
– Differences in languages (phonologies)
– Noisy data

  • Need to smooth out this volatility
  • Limited to one language

SLIDE 13

SVM re-ranking

  • Many features

– Similarity scores (M2M-Aligner)
– Score differences
– N-grams based on alignments between transcriptions and transliterations
    • Similar to features used in DirecTL+
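A rough Python sketch of the score-based features and of how a linear re-ranker would use them. The slides do not say what the score differences are taken against, so the difference from the best score in the n-best list is an assumption; the feature names, the function names, and the weight values are hypothetical, and in practice the weights would be learned by the SVM rather than set by hand.

```python
from typing import Dict, List

Features = Dict[str, float]


def score_features(system_scores: List[float], sim_scores: List[float], i: int) -> Features:
    """Features for candidate i: raw scores plus differences from the list's best."""
    return {
        "system": system_scores[i],
        "similarity": sim_scores[i],
        "system_diff": system_scores[i] - max(system_scores),
        "similarity_diff": sim_scores[i] - max(sim_scores),
    }


def linear_score(features: Features, weights: Features) -> float:
    """Dot product of a sparse feature vector with a weight vector."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def best_candidate(system_scores: List[float], sim_scores: List[float],
                   weights: Features) -> int:
    """Index of the candidate with the highest linear score."""
    return max(range(len(system_scores)),
               key=lambda i: linear_score(score_features(system_scores, sim_scores, i), weights))


# Toy values (assumed): two candidates, hand-set weights.
system_scores = [-1.0, -1.5]
sim_scores = [0.2, 0.9]
chosen = best_candidate(system_scores, sim_scores, {"system": 1.0, "similarity": 2.0})
```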

SLIDE 14

SVM re-ranking

ガ | ー | シュ | ウィ | ン
ɡ | ɜ | ʃ | wɪ | n
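To illustrate how n-gram features could be read off such an alignment, here is a small Python sketch. The segment pairs are taken from the slide as best they can be recovered from the extracted text; the feature naming scheme, the `max_n` parameter, and the function name `alignment_ngrams` are assumptions, not the authors' feature templates.

```python
from typing import Dict, List, Tuple

# One aligned pair is (transliteration segment, phoneme segment).
Alignment = List[Tuple[str, str]]


def alignment_ngrams(alignment: Alignment, max_n: int = 2) -> Dict[str, float]:
    """Count n-grams of aligned pairs, for n = 1 .. max_n."""
    feats: Dict[str, float] = {}
    for n in range(1, max_n + 1):
        for start in range(len(alignment) - n + 1):
            window = alignment[start:start + n]
            name = "ng:" + "+".join(f"{src}>{tgt}" for src, tgt in window)
            feats[name] = feats.get(name, 0.0) + 1.0
    return feats


gershwin = [("ガ", "ɡ"), ("ー", "ɜ"), ("シュ", "ʃ"), ("ウィ", "wɪ"), ("ン", "n")]
features = alignment_ngrams(gershwin)
```

Each feature ties a stretch of the transliteration to a stretch of the candidate transcription, so a learned weight can reward or penalize specific correspondences.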

SLIDE 15

SVM re-ranking

  • Allows many languages

– English-to-{Bengali, Chinese, Hindi, Thai, Japanese, Kannada, Korean, Russian, Tamil}
– Features repeated for each transliteration
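Repeating the features per transliteration could be sketched as below, in Python. Prefixing each feature name with a language code is an assumed encoding, and the codes `ja` and `ru`, the feature names, and the values are illustrative only. A language with no transliteration for a given name simply contributes no features.

```python
from typing import Dict


def combine_language_features(per_language: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Merge per-language feature dicts into one, keeping the copies distinct."""
    combined: Dict[str, float] = {}
    for lang, feats in per_language.items():
        for name, value in feats.items():
            combined[f"{lang}:{name}"] = value
    return combined


# Toy input (assumed values): this name has Japanese and Russian transliterations only.
combined = combine_language_features({
    "ja": {"similarity": 0.8},
    "ru": {"similarity": 0.6},
})
```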

SLIDE 16

SVM re-ranking

SLIDE 17

SVM re-ranking

[Bar chart: word accuracy (axis ticks 50 to 80) for Festival, Sequitur, and DirecTL+; bars for Base, SVM-score, SVM-ngram, and SVM-all]

SLIDE 21

Analysis

  • SVM re-ranking gives significant improvements
  • Festival and Sequitur see larger improvements

– The better the base system, the harder it is to re-rank
– n-gram features styled after DirecTL+
    • This benefits Festival and Sequitur

  • Similar features in a novel direction can lead to improved performance

SLIDE 22

Analysis

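  • N-gram features most useful

– Granular features
– Includes unable-to-align feature

[Figure: the name “Bacchus” with candidate phonemes k and ʧ set against the transliteration characters क, カ, and к, with an X mark]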

SLIDE 23

Multiple languages

[Plot: absolute improvement in word accuracy (axis ticks 0.5 to 4) against number of available transliterations (≤1 to ≤9)]

SLIDE 24

Future work

  • Apply same re-ranking approach to different tasks (e.g. transliteration) and different data (e.g. transcriptions)

– Very successful results so far

  • Leverage noisy web transcriptions
  • Incorporate supplemental information directly in system

SLIDE 25

Conclusion

  • First use of transliterations for G2P
  • Basic similarity-based methods don't work
  • SVM re-ranking improves all tested base systems

  • Multiple languages are vital
  • Relevant scripts, etc. are online