Leveraging supplemental representations for sequential transduction

Aditya Bhargava (Department of Computer Science, University of Toronto) and Grzegorz Kondrak (Department of Computing Science, University of Alberta). NAACL-HLT 2012.


slide-1
SLIDE 1

Leveraging supplemental representations for sequential transduction

Aditya Bhargava1 Grzegorz Kondrak2

1Department of Computer Science

University of Toronto

2Department of Computing Science

University of Alberta

NAACL-HLT 2012

1 / 31

slide-4
SLIDE 4

Pronunciation-based tasks

orthography: Dickens

transliterations: डिकेंस, ディケンズ, Диккенс, Ντίκενς, ⁞

transcriptions: /dɪkɪnz/, dIkInz, D IH K AH N Z, dIk@nz, d I k x n z, ⁞

tasks: MTL, G2P, BTL, TTS, SR, P2G

2 / 31

slide-6
SLIDE 6

Overview

x supplemental data for y
  x ∈ {transcription, transliteration}
  y ∈ {G2P, MTL}

Rerank outputs from existing system
  Features similar to base system, but applied to supplemental data
  n-grams, alignment/similarity scores

Same approach for system combination
  Use another G2P/MTL system’s outputs as supplemental data
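As a rough illustration of the feature types named above (n-grams and similarity scores computed over the supplemental data), the following Python sketch builds a sparse feature dictionary for one candidate output paired with one supplemental representation. The function names, the cross-product feature template, and the use of `difflib` for the similarity score are assumptions made for illustration; the slides do not specify the actual feature templates or the alignment method.

```python
from difflib import SequenceMatcher


def char_ngrams(s, n):
    """Return all character n-grams of s (empty list if s is shorter than n)."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def supplemental_features(candidate, supplemental, max_n=2):
    """Build a sparse feature dict for one (candidate, supplemental) pair.

    Hypothetical template: every candidate n-gram is paired with every
    supplemental n-gram of the same order (no alignment is used here).
    """
    feats = {}
    for n in range(1, max_n + 1):
        for c in set(char_ngrams(candidate, n)):
            for s in set(char_ngrams(supplemental, n)):
                feats[f"{n}:{c}|{s}"] = 1.0
    # One real-valued similarity score in [0, 1], from the standard library.
    feats["similarity"] = SequenceMatcher(None, candidate, supplemental).ratio()
    return feats


# Candidate and supplemental strings taken from the Sudan example in the slides.
feats = supplemental_features("sud#n", "sudAn")
print(len(feats), feats["similarity"])
```

A learned model would then assign a weight to each feature name; a cross-product template like this grows quickly with string length, which is one reason real systems usually restrict the pairs with an alignment.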

3 / 31

slide-9
SLIDE 9

Overview

Excellent results (mostly)

Up to 8.7% error reduction for system combination
MTL sees error reduction up to 14% from transliterations and 18% from transcriptions
G2P sees error reduction up to 43% from transcriptions
But transliterations help G2P for names only
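The figures above are error reductions, not absolute accuracy gains. Assuming they are relative reductions in word error rate (the slides do not define the term), the calculation can be sketched as follows. The function name and the example accuracies are made up for illustration and are not results from the talk.

```python
def relative_error_reduction(base_accuracy, new_accuracy):
    """Relative error reduction (%) given two word accuracies in percent."""
    base_error = 100.0 - base_accuracy
    new_error = 100.0 - new_accuracy
    if base_error == 0.0:
        raise ValueError("base system has no errors to reduce")
    return 100.0 * (base_error - new_error) / base_error


# Illustrative numbers only: 80% -> 85% accuracy removes a quarter of the errors.
print(relative_error_reduction(80.0, 85.0))
```

Under this definition, the same absolute gain counts for more when the base system is already accurate, since fewer errors are left to remove.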

4 / 31

slide-10
SLIDE 10

Reranking method

From ACL 2011
Looks specifically at transliterations as supplemental data for G2P of names
  Names are hard(er)
  Transliteration is generally applied to named entities
  Encodes relevant pronunciation information

Using supplemental data, rerank n-best output list of G2P base system
Additional findings:
  Simple similarity-based methods don’t work
  Multiple languages are helpful

5 / 31

slide-11
SLIDE 11

Reranking method

Here, we experiment with:

1 Transcriptions as supplemental data for both G2P and MTL
2 Transcriptions and transliterations simultaneously
3 G2P in general, rather than names only
4 System combination as supplemental data

6 / 31

slide-16
SLIDE 16

Related work

G2P systems
  Neural networks, instance-based learning, ..., joint n-gram models (Sequitur), online discriminative learning (DirecTL+)

MTL systems
  Similarly many approaches
  Lately Sequitur and DirecTL+ have performed quite well at NEWS

7 / 31

slide-17
SLIDE 17

Related work

Using heterogeneous data

Pivot through a third language for transliteration
Mostly useful for low-resource environments
Hard to incorporate more languages
Linear combination of system scores

8 / 31

slide-22
SLIDE 22

Method

input word: Sudan
base system
n-best outputs: sud@n, sud{n, ⁞, sud#n
supplemental representations: sudAn, S UW D AE N, スーダン, सूडान, Судан, ⁞
re-ranker
re-ranked n-best list: sud#n, sUd#n, ⁞, sud@n

9 / 31
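The pipeline on this slide can be sketched in a few lines of Python: each candidate in the n-best list is scored against the supplemental representations and the list is re-sorted. This is a minimal sketch, not the authors' implementation. The position-wise pair features, the single hand-set weight, and the function names are all hypothetical; a real reranker would learn its weights from training data.

```python
def pair_features(candidate, supplemental):
    """Toy feature function: count position-wise symbol pairs.

    zip stops at the shorter string, so no alignment is attempted.
    """
    feats = {}
    for c, s in zip(candidate, supplemental):
        key = f"{c}|{s}"
        feats[key] = feats.get(key, 0.0) + 1.0
    return feats


def rerank(nbest, supplemental, weights, feature_fn=pair_features):
    """Re-sort an n-best list by a linear score over supplemental features."""
    def score(candidate):
        total = 0.0
        for supp in supplemental:
            for name, value in feature_fn(candidate, supp).items():
                total += weights.get(name, 0.0) * value
        return total

    # sorted() is stable, so ties keep the base system's original order.
    return sorted(nbest, key=score, reverse=True)


nbest = ["sud@n", "sud{n", "sud#n"]   # base system output from the slide
supplemental = ["sudAn"]              # one supplemental transcription
weights = {"#|A": 1.0}                # hypothetical learned weight
reranked = rerank(nbest, supplemental, weights)
print(reranked)
```

Because the sort is stable, candidates that the supplemental data says nothing about stay in the base system's order, so the reranker only changes the list where it has evidence.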

slide-24
SLIDE 24

Method

input: Gershwin

n-best outputs: /ɡɜːʃwɪn/, /d͡ʒɜːʃwɪn/, /d͡ʒɛɹʃwɪn/

transliterations:
  गर्शविन (/ɡʌrʃʋɪn/)
  ガーシュウィン (/ɡaːɕuwiɴ/)
  Гершвин (/ɡerʂvin/)

10 / 31

slide-31
SLIDE 31

Data and base systems

Transcriptions from Combilex and CELEX
Transliterations from NEWS 2011
  Experiment on English-to-Japanese transliteration
80/10/10 train/dev/test split
Sequitur and DirecTL+ as base systems
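An 80/10/10 train/dev/test split like the one mentioned above can be sketched as follows. The slides do not say how entries were assigned to the three sets, so the seeded random shuffle and the function name here are assumptions.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle items reproducibly and cut them into train/dev/test parts."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10   # integer arithmetic avoids float rounding
    n_dev = len(items) // 10
    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    test = items[n_train + n_dev:]   # the remainder goes to the test set
    return train, dev, test


entries = [f"word{i}" for i in range(100)]   # placeholder lexicon entries
train, dev, held_out = split_80_10_10(entries)
print(len(train), len(dev), len(held_out))
```

Fixing the seed keeps the split identical across runs, which matters when several base systems are compared on the same test set.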

11 / 31

slide-34
SLIDE 34

G2P experiments

Supplemental transliterations

input: McGee
candidate outputs: m@kJi, m@gi, ..., m@CJi
supplemental: मगी, マギー, Макги

12 / 31

slide-35
SLIDE 35

G2P experiments: names

Supplemental transliterations

[Bar chart: word accuracy (%) for Sequitur and DirecTL+; bars: Base, Reranked]

13 / 31

slide-36
SLIDE 36

G2P experiments: full set

Supplemental transliterations

[Bar chart: word accuracy (%) for Sequitur and DirecTL+; bars: Base, Reranked]

14 / 31

slide-37
SLIDE 37

G2P experiments: core vocab

Supplemental transliterations

[Bar chart: word accuracy (%) for Sequitur and DirecTL+; bars: Base, Reranked]

15 / 31

slide-40
SLIDE 40

G2P experiments

Supplemental transcriptions

input (word/name): Sudan
candidate outputs (CELEX): sud@n, sud{n, ..., sud#n
supplemental (Combilex): sudAn

16 / 31

slide-41
SLIDE 41

G2P experiments: baselines

Supplemental transcriptions

MERGE
1 Convert Combilex to CELEX
2 Merge with CELEX
3 Train on combined set

P2P: phoneme-to-phoneme converter
1 Intersect Combilex and CELEX
2 Train a transduction system to convert Combilex to CELEX
3 If a test word appears in Combilex, grab it from there and convert it to CELEX format
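The P2P baseline steps above can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the lexicons are tiny toy dictionaries, and a one-symbol string replacement stands in for the trained transduction system of step 2. The slides do not say what the baseline does for test words missing from Combilex, so the lookup simply returns `None` in that case.

```python
def build_p2p_training_pairs(combilex, celex):
    """Step 1: intersect the lexicons, pairing the two transcriptions of each shared word."""
    shared = sorted(combilex.keys() & celex.keys())
    return [(combilex[word], celex[word]) for word in shared]


def p2p_lookup(word, combilex, convert):
    """Step 3: if the word is in Combilex, convert its transcription to CELEX format."""
    if word not in combilex:
        return None   # behaviour for unseen words is not specified in the slides
    return convert(combilex[word])


# Toy data built from strings shown in the slides; not real lexicon entries.
combilex = {"sudan": "sudAn"}
celex = {"sudan": "sud#n", "mcgee": "m@gi"}


def toy_convert(transcription):
    """Stand-in for the trained Combilex-to-CELEX converter of step 2."""
    return transcription.replace("A", "#")


pairs = build_p2p_training_pairs(combilex, celex)
print(pairs)
print(p2p_lookup("sudan", combilex, toy_convert))
print(p2p_lookup("mcgee", combilex, toy_convert))
```

In an actual experiment, the words used to train the converter would be kept separate from the test words.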

17 / 31

slide-43
SLIDE 43

G2P experiments

Supplemental transcriptions: results

[Bar chart: word accuracy (%) for Sequitur and DirecTL+; bars: Base, MERGE, P2P, Reranked]

18 / 31

slide-49
SLIDE 49

MTL experiments

Supplemental transliterations

input: John Petrucci
candidate outputs: जॉन पटरृसी, जॉन पटरृची, ..., जॉन पटरृकॎसी
supplemental: ジョン ペトルーシ, Джон Петруччи

19 / 31

slide-50
SLIDE 50

MTL experiments

Supplemental transliterations

Wikipedia example
  John Petrucci article exists in English & Japanese, but not Hindi
  Want to automatically generate stub article in Hindi
    Need transliteration of name
  Start from English, use Japanese (etc.) transliterations to help generate the Hindi transliteration

20 / 31

slide-51
SLIDE 51

MTL experiments

Supplemental transliterations: results

[Bar chart: word accuracy (%) for Sequitur and DirecTL+; bars: Base, Reranked]

21 / 31

slide-55
SLIDE 55

MTL experiments

Supplemental transcriptions

input: Sudan
candidate outputs: ズーダン, スーダン, ..., スユーダン
supplemental: sud#n (CELEX), sudAn (Combilex)

22 / 31

slide-56
SLIDE 56

MTL experiments

Supplemental transcriptions: results

[Bar chart: word accuracy (%) for Sequitur and DirecTL+; bars: Base, Reranked]

23 / 31

slide-58
SLIDE 58

Analysis

Method works across base systems, but magnitude of improvement varies
Sequitur sees higher improvements
1 Lower base score
2 Higher oracle reranker score
3 Reranking features are similar to those used in DirecTL+
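The base score and the oracle reranker score in the list above can be computed as in the sketch below, assuming the usual meaning of the term: the oracle score is the word accuracy a perfect reranker would reach, which is the share of inputs whose correct output appears anywhere in the n-best list. The function names and the gold answers in the example are made up for illustration.

```python
def top1_accuracy(nbest_lists, gold):
    """Word accuracy (%) of the base system: is the first candidate correct?"""
    correct = sum(nbest[0] == answer for nbest, answer in zip(nbest_lists, gold))
    return 100.0 * correct / len(gold)


def oracle_accuracy(nbest_lists, gold):
    """Upper bound (%) for any reranker: is the answer anywhere in the list?"""
    hits = sum(answer in nbest for nbest, answer in zip(nbest_lists, gold))
    return 100.0 * hits / len(gold)


# Candidate lists from the slides; the gold answers are hypothetical.
nbest_lists = [["sud@n", "sud{n", "sud#n"], ["m@kJi", "m@gi", "m@CJi"]]
gold = ["sud#n", "m@Ji"]

print(top1_accuracy(nbest_lists, gold))
print(oracle_accuracy(nbest_lists, gold))
```

The gap between the two numbers is the headroom available to a reranker, so a system with a lower base score and a higher oracle score has more room to improve.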

24 / 31

slide-59
SLIDE 59

Analysis

Feature similarity indicates DirecTL+’s improvement comes from the supplemental representations, not new features

25 / 31

slide-60
SLIDE 60

Analysis

Using transcriptions and transliterations simultaneously doesn’t provide any additional benefit

26 / 31

slide-67
SLIDE 67

System combination

input: cirino

DirecTL+ n-best list: シリーノ, チリーノ, シリノ, チリノ, キリノ, ..., チシーリノ
Sequitur n-best list: チリノ, シリノ, チリーノ, シリーノ, チージーノ, ..., チリノー

reranker outputs:
  チリーノ, シリノ, チリノ, シリーノ, チリノ, ..., チシーリノ
  シリノ, チリノ, シリーノ, チリーノ, キリノ, ..., チリノー

27 / 31

slide-68
SLIDE 68

System combination

Baseline

Linear combination baseline

Merge the base system lists
Linearly combine system scores
Manually tune linear parameter on training data
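The linear combination baseline described above can be sketched as follows. The details are assumptions: scores are taken to be comparable across systems (for example, normalized to sum to one), a candidate missing from one list gets a score of zero from that system, and the weight `lam` plays the role of the manually tuned linear parameter. The example scores are invented.

```python
def linear_combination(scores_a, scores_b, lam):
    """Merge two candidate->score dicts and rank by lam*a + (1 - lam)*b."""
    candidates = set(scores_a) | set(scores_b)

    def combined(candidate):
        return (lam * scores_a.get(candidate, 0.0)
                + (1.0 - lam) * scores_b.get(candidate, 0.0))

    # Sort by descending combined score; break exact ties by the string itself.
    return sorted(candidates, key=lambda c: (-combined(c), c))


# Invented scores for a few candidates from the "cirino" example.
directl_scores = {"シリーノ": 0.5, "チリーノ": 0.3, "シリノ": 0.2}
sequitur_scores = {"チリノ": 0.5, "シリノ": 0.4, "チリーノ": 0.1}

merged = linear_combination(directl_scores, sequitur_scores, lam=0.6)
print(merged)
```

Tuning then amounts to trying a handful of values of `lam` and keeping the one with the best word accuracy on the tuning data.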

28 / 31

slide-69
SLIDE 69

System combination

Results

[Bar chart: word accuracy (%) for Sequitur and DirecTL+; bars: Base, LinComb, Reranked]

29 / 31

slide-72
SLIDE 72

Summary

Reranking approach effectively leverages supplemental transcriptions and transliterations for G2P and MTL
Improvements across two base systems demonstrate that there is inherently useful information in the supplemental representations
Treating another system’s output as supplemental data works, but so does a linear combination

30 / 31

slide-73
SLIDE 73

Future work

Reranking is post hoc; direct integration might be more effective
Incorporate supplemental information rather than data
Other (noisy?) supplemental sources
  Wikipedia IPA transcriptions
  Ad hoc approximately-phonetic re-spellings

31 / 31