Leveraging supplemental representations for sequential transduction

Aditya Bhargava (Department of Computer Science, University of Toronto) and Grzegorz Kondrak (Department of Computing Science, University of Alberta). NAACL-HLT 2012.

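Slide 2: Pronunciation-based tasks
Orthography: Dickens
Transcriptions: /dɪkɪnz/, dIkInz, D IH K AH N Z, dIk@nz, d I k x n z
Transliterations: डिकेंस (Hindi), ディケンズ (Japanese), Диккенс (Russian), Ντίκενς (Greek)
Tasks linking these representations: MTL (machine transliteration), BTL (back-transliteration), G2P (grapheme-to-phoneme conversion), P2G (phoneme-to-grapheme conversion), SR (speech recognition), and TTS (text-to-speech).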

Slide 3: Overview
- Use x as supplemental data for y, where x ∈ {transcription, transliteration} and y ∈ {G2P, MTL}
- Rerank the outputs of an existing system
- Features are similar to those of the base system, but applied to the supplemental data: n-grams and alignment/similarity scores
- The same approach gives system combination: use another G2P/MTL system's outputs as supplemental data
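
The slide names n-gram and alignment/similarity features without spelling them out. As a hedged illustration for the system-combination case, where the supplemental data is another system's output in the same phoneme scheme, here is a minimal Python sketch of one possible similarity score: a character-bigram Dice coefficient between a candidate and a second system's outputs. The function names and the choice of Dice are assumptions, not the authors' feature set; the slides also report that simple similarity-based methods alone don't work, so a score like this would be one feature among the n-gram features rather than the whole reranker.

```python
from collections import Counter
from typing import List


def char_ngrams(s: str, n: int = 2) -> Counter:
    """Count character n-grams, with ^ and $ marking the word boundaries."""
    padded = "^" + s + "$"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def dice(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over character n-gram multisets (1.0 = same n-grams)."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    total = sum(ga.values()) + sum(gb.values())
    if total == 0:
        return 0.0
    overlap = sum((ga & gb).values())  # multiset intersection: min of counts
    return 2.0 * overlap / total


def combination_similarity(candidate: str, other_outputs: List[str]) -> float:
    """Hypothetical feature: best similarity between a candidate and any output
    of a second G2P/MTL system (0.0 if that system produced no outputs)."""
    return max((dice(candidate, o) for o in other_outputs), default=0.0)


if __name__ == "__main__":
    # Hypothetical outputs of a second system for "Sudan" (CELEX-style symbols).
    other = ["sud#n", "sud@n"]
    for cand in ["sud@n", "sud{n", "sud#n"]:
        print(cand, round(combination_similarity(cand, other), 3))
```

Taking the maximum over the other system's list rewards a candidate that the second system also proposed anywhere in its n-best list; weighting by the other system's rank would be an equally plausible variant.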

Slide 4: Overview
Excellent results (mostly):
- Up to 8.7% error reduction for system combination
- MTL: error reduction of up to 14% from transliterations and 18% from transcriptions
- G2P: error reduction of up to 43% from transcriptions
- But transliterations help G2P for names only

Slide 5: Reranking method
The method comes from earlier work published at ACL 2011, which looked specifically at transliterations as supplemental data for G2P of names:
- Names are hard(er)
- Transliteration is generally applied to named entities
- Transliterations encode relevant pronunciation information
- Using the supplemental data, rerank the n-best output list of a G2P base system
Additional findings:
- Simple similarity-based methods don't work
- Multiple languages are helpful

Slide 6: Reranking method
Here, we experiment with:
1. Transcriptions as supplemental data for both G2P and MTL
2. Transcriptions and transliterations simultaneously
3. G2P in general, rather than names only
4. System combination as supplemental data

Slide 7: Related work
- G2P systems: neural networks, instance-based learning, . . . , joint n-gram models (Sequitur), online discriminative learning (DirecTL+)
- MTL systems: similarly many approaches
- Recently, Sequitur and DirecTL+ have performed quite well at NEWS

Slide 8: Related work
Using heterogeneous data:
- Pivoting through a third language for transliteration
  - Mostly useful for low-resource environments
  - Hard to incorporate more languages
- Linear combination of system scores

Slide 9: Method
Input word: Sudan
Base system n-best outputs: sud@n, sud{n, sud#n
Supplemental representations: sudAn, S UW D AE N, スーダン, सूडान, Судан
The re-ranker rescores the n-best outputs using the supplemental representations and produces a re-ranked n-best list, with sud#n moved to the top.
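
A minimal Python sketch of the pipeline on this slide, assuming a linear re-ranker: each candidate gets a feature vector made of its base-system score plus features comparing it with every supplemental representation, and the n-best list is re-sorted by a weighted sum. The slides do not say how the weights are learned; the perceptron update below is a stand-in chosen for brevity, and `Candidate`, `featurize`, `rerank`, `perceptron_update` and the toy `mismatch_pairs` feature function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Features = Dict[str, float]
PairFeatures = Callable[[str, str], Features]  # (candidate, supplemental) -> features


@dataclass
class Candidate:
    output: str        # candidate transcription, e.g. "sud@n"
    base_score: float  # score from the base system (higher = better)


def featurize(cand: Candidate, supplemental: Dict[str, str],
              pair_features: PairFeatures) -> Features:
    """Base score plus pair features, each prefixed by its supplemental source."""
    feats: Features = {"base_score": cand.base_score}
    for source, rep in supplemental.items():
        for name, value in pair_features(cand.output, rep).items():
            key = f"{source}:{name}"
            feats[key] = feats.get(key, 0.0) + value
    return feats


def score(feats: Features, weights: Features) -> float:
    """Linear model: dot product of features and weights (missing weights = 0)."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())


def rerank(nbest: List[Candidate], supplemental: Dict[str, str],
           pair_features: PairFeatures, weights: Features) -> List[Candidate]:
    """Return the n-best list sorted by re-ranker score, best first."""
    return sorted(nbest,
                  key=lambda c: score(featurize(c, supplemental, pair_features), weights),
                  reverse=True)


def perceptron_update(weights: Features, nbest: List[Candidate],
                      supplemental: Dict[str, str], pair_features: PairFeatures,
                      gold: str, lr: float = 1.0) -> None:
    """If the gold output is in the list but not ranked first, move the weights
    toward its features and away from the current top candidate's features."""
    top = rerank(nbest, supplemental, pair_features, weights)[0]
    gold_cand = next((c for c in nbest if c.output == gold), None)
    if gold_cand is None or top.output == gold:
        return
    for k, v in featurize(gold_cand, supplemental, pair_features).items():
        weights[k] = weights.get(k, 0.0) + lr * v
    for k, v in featurize(top, supplemental, pair_features).items():
        weights[k] = weights.get(k, 0.0) - lr * v


def mismatch_pairs(cand: str, supp: str) -> Features:
    """Toy pair features: position-wise symbol mismatches (assumes equal lengths)."""
    return {f"{a}:{b}": 1.0 for a, b in zip(cand, supp) if a != b}
```

For example, with hypothetical base scores for the slide's n-best list (sud@n: -1.0, sud{n: -1.2, sud#n: -1.5), the Combilex transcription sudAn as the only supplemental representation, initial weights {"base_score": 1.0}, and one update toward the gold output sud#n, the re-ranked order becomes sud#n, sud{n, sud@n. Keying features by source (here "combilex:") lets several supplemental languages or lexicons contribute separately weighted evidence.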

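Slide 10: Method
Input: Gershwin
n-best outputs: /d͡ʒɜːʃwɪn/, /ɡɜːʃwɪn/, /d͡ʒɛɹʃwɪn/
Transliterations: गर्शविन (/ɡʌrʃʋɪn/), ガーシュウィン (/ɡaːɕuwiɴ/), Гершвин (/ɡerʂvin/)
All three transliterations begin with /ɡ/, which supports /ɡɜːʃwɪn/ over the two candidates beginning with /d͡ʒ/.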

Slide 11: Data and base systems
- Transcriptions from Combilex and CELEX
- Transliterations from NEWS 2011
- Experiments on English-to-Japanese transliteration
- 80/10/10 train/dev/test split
- Sequitur and DirecTL+ as base systems
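
The slide gives only the proportions of the split. A minimal sketch of an 80/10/10 train/dev/test partition, assuming a seeded random shuffle over lexicon entries (the slides do not say how entries were assigned to each set); `split_80_10_10` is a hypothetical helper.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_80_10_10(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle with a fixed seed, then cut into 80% train, 10% dev, rest test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # private RNG: global state untouched
    n = len(shuffled)
    n_train = n * 8 // 10  # integer arithmetic avoids float rounding surprises
    n_dev = n // 10
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # any remainder goes to test
    return train, dev, test
```

A fixed seed means reruns produce the same partition, which matters when several base systems and rerankers are compared on the same test set.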

Slide 12: G2P experiments
Supplemental transliterations
Input: McGee
Candidate outputs: m@kJi, m@gi, ..., m@CJi
Supplemental: मगी, マギー, Макги

Slide 13: G2P experiments: names
Supplemental transliterations. [Chart: word accuracy (%) of Sequitur and DirecTL+, base vs. reranked]

Slide 14: G2P experiments: full set
Supplemental transliterations. [Chart: word accuracy (%) of Sequitur and DirecTL+, base vs. reranked]

Slide 15: G2P experiments: core vocabulary
Supplemental transliterations. [Chart: word accuracy (%) of Sequitur and DirecTL+, base vs. reranked]

Slide 16: G2P experiments
Supplemental transcriptions (word/name)
Input: Sudan (CELEX)
Candidate outputs: sud@n, sud{n, ..., sud#n
Supplemental (Combilex): sudAn
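
CELEX and Combilex use different symbol inventories, so a candidate and its supplemental transcription cannot simply be tested for equality. A hedged Python sketch of the alignment-plus-n-gram idea: align the two strings with a unit-cost edit-distance alignment (an assumption; the slides do not say which aligner is used, and unit costs only make sense when the two schemes share most symbols, as in this example) and count aligned symbol pairs and pair bigrams as features. Here CELEX `#` lines up with Combilex `A`, so a re-ranker given these features could learn to reward that pair. `align`, `pair_ngram_features` and the `-` gap symbol are hypothetical.

```python
from typing import Dict, List, Tuple

GAP = "-"  # gap marker; assumed not to occur as a phoneme symbol


def align(a: str, b: str) -> List[Tuple[str, str]]:
    """Minimum edit-distance alignment (unit costs), recovered by backtrace."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    pairs: List[Tuple[str, str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else 1):
            pairs.append((a[i - 1], b[j - 1]))  # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((a[i - 1], GAP))  # symbol deleted from a
            i -= 1
        else:
            pairs.append((GAP, b[j - 1]))  # symbol inserted from b
            j -= 1
    pairs.reverse()
    return pairs


def pair_ngram_features(cand: str, supp: str, max_n: int = 2) -> Dict[str, float]:
    """Count n-grams (n = 1..max_n) of aligned 'x:y' symbol pairs."""
    pairs = [f"{x}:{y}" for x, y in align(cand, supp)]
    feats: Dict[str, float] = {}
    for n in range(1, max_n + 1):
        for k in range(len(pairs) - n + 1):
            key = " ".join(pairs[k:k + n])
            feats[key] = feats.get(key, 0.0) + 1.0
    return feats


if __name__ == "__main__":
    for cand in ["sud@n", "sud{n", "sud#n"]:
        print(cand, sorted(pair_ngram_features(cand, "sudAn")))
```

Pair bigrams such as `d:d #:A` add a little context, so the same cross-scheme correspondence can be weighted differently depending on its neighbours.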

Slide 17: G2P experiments: baselines
Supplemental transcriptions
MERGE:
1. Convert Combilex transcriptions to CELEX format
2. Merge them with CELEX
3. Train on the combined set
P2P (phoneme-to-phoneme converter):
1. Intersect Combilex and CELEX
2. Train a transduction system to convert Combilex transcriptions to CELEX
3. If a test word appears in Combilex, take its transcription from there and convert it to CELEX format
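
Both baselines amount to a little glue code around trained models. A hedged Python sketch in which lexicons are plain word-to-transcription dicts and `convert` stands in for a trained Combilex-to-CELEX transduction model (hypothetical; any P2P system could fill that role). Two choices are assumptions not stated on the slide: in MERGE, the CELEX entry wins when a word appears in both lexicons, and in P2P, words missing from Combilex fall back to the base G2P system.

```python
from typing import Callable, Dict, List, Tuple

Lexicon = Dict[str, str]          # word -> transcription
Converter = Callable[[str], str]  # Combilex transcription -> CELEX transcription


def merge_training_set(celex: Lexicon, combilex: Lexicon, convert: Converter) -> Lexicon:
    """MERGE steps 1-2: convert Combilex entries to CELEX format and merge.
    Assumption: on conflicts the original CELEX transcription is kept."""
    merged = {word: convert(pron) for word, pron in combilex.items()}
    merged.update(celex)  # CELEX entries overwrite converted Combilex ones
    return merged


def p2p_training_pairs(celex: Lexicon, combilex: Lexicon) -> List[Tuple[str, str]]:
    """P2P step 1: intersect the lexicons into (Combilex, CELEX) training pairs."""
    shared = sorted(celex.keys() & combilex.keys())
    return [(combilex[word], celex[word]) for word in shared]


def p2p_predict(word: str, combilex: Lexicon, convert: Converter,
                g2p_fallback: Callable[[str], str]) -> str:
    """P2P step 3: if the test word is in Combilex, convert its transcription;
    otherwise (assumption) fall back to the base G2P system."""
    if word in combilex:
        return convert(combilex[word])
    return g2p_fallback(word)
```

The pairs from `p2p_training_pairs` would be the training data for whatever transduction system plays the role of `convert`.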
