Leveraging supplemental representations for sequential transduction

Aditya Bhargava (Department of Computer Science, University of Toronto) and Grzegorz Kondrak (Department of Computing Science, University of Alberta)
NAACL-HLT 2012

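Pronunciation-based tasks
Orthography: Dickens
Transcriptions: /dɪkɪnz/, dIkInz, D IH K AH N Z, dIk@nz, d I k x n z
Transliterations: [Hindi, garbled], ディケンズ, Диккенс, Ντίκενς
Tasks linking these representations: G2P (grapheme-to-phoneme), P2G (phoneme-to-grapheme), MTL (machine transliteration), BTL (back-transliteration), TTS (text-to-speech), SR (speech recognition)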

Overview
Use x as supplemental data for y, where x ∈ {transcription, transliteration} and y ∈ {G2P, MTL}
Rerank the outputs of an existing system
Features are similar to the base system's, but applied to the supplemental data: n-grams, alignment/similarity scores
The same approach applies to system combination: use another G2P/MTL system's outputs as supplemental data

Overview
Excellent results (mostly):
Up to 8.7% error reduction for system combination
MTL: error reduction of up to 14% from transliterations and up to 18% from transcriptions
G2P: error reduction of up to 43% from transcriptions
But transliterations help G2P for names only
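The slides do not define "error reduction". Assuming it means the usual relative reduction in word error rate, (base error − new error) / base error, the arithmetic can be sketched as below. The function name is hypothetical and the accuracies in the usage line are made up, not results from the talk.

```python
def relative_error_reduction(base_accuracy: float, new_accuracy: float) -> float:
    """Fraction of the base system's errors eliminated, from word accuracies in %."""
    base_error = 100.0 - base_accuracy
    new_error = 100.0 - new_accuracy
    if base_error == 0:
        raise ValueError("base system makes no errors; reduction is undefined")
    return (base_error - new_error) / base_error


# Hypothetical accuracies, not figures from the talk:
print(f"{relative_error_reduction(80.0, 85.0):.0%}")  # 25%
```

Under this reading, a 43% error reduction does not mean accuracy rose by 43 points; it means 43% of the base system's errors were corrected.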

Reranking method
Introduced at ACL 2011, which looked specifically at transliterations as supplemental data for G2P of names
Names are hard(er)
Transliteration is generally applied to named entities and encodes relevant pronunciation information
Using the supplemental data, rerank the n-best output list of a G2P base system
Additional findings: simple similarity-based methods don't work; multiple languages are helpful

Reranking method
Here, we experiment with:
1. Transcriptions as supplemental data for both G2P and MTL
2. Transcriptions and transliterations simultaneously
3. G2P in general, rather than names only
4. System combination as supplemental data

Related work
G2P systems: neural networks, instance-based learning, …, joint n-gram models (Sequitur), online discriminative learning (DirecTL+)
MTL systems: similarly many approaches
Lately, Sequitur and DirecTL+ have performed quite well at NEWS

Related work
Using heterogeneous data:
Pivoting through a third language for transliteration: mostly useful in low-resource settings, and hard to extend to more languages
Linear combination of system scores
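A linear combination of system scores, the related-work approach named above, can be sketched as an interpolation of two systems' candidate scores. The function name, the log-probability inputs, the fixed weight, and the floor for candidates missing from one system are all assumptions for illustration; in practice the weight would be tuned on held-out data.

```python
def combine_scores(scores_a, scores_b, weight_a=0.5, floor=-1e9):
    """Rank candidates by a linear interpolation of two systems' scores.

    scores_a, scores_b: dicts mapping candidate strings to scores
    (e.g. log-probabilities). A candidate missing from one system gets
    `floor` from that system.
    """
    candidates = set(scores_a) | set(scores_b)
    combined = {
        c: weight_a * scores_a.get(c, floor) + (1.0 - weight_a) * scores_b.get(c, floor)
        for c in candidates
    }
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical scores from two systems for the same input word.
system_a = {"sud@n": -0.5, "sud#n": -1.2}
system_b = {"sud#n": -0.3, "sud@n": -1.5}
print(combine_scores(system_a, system_b))  # ['sud#n', 'sud@n']
```

A single global weight is the simplest design; it cannot exploit what the supplemental-data reranker can, namely which specific symbols the two systems agree on.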

Method
Input word: Sudan
Base system → n-best outputs: sud@n, sud{n, sud#n, …
Supplemental representations: sudAn, S UW D AE N, スーダン, सूडान, Судан
Re-ranker → re-ranked n-best list, with sud#n now first and sud@n moved down
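The re-ranking step can be sketched as a linear model over sparse features that pair candidate symbols with symbols of each supplemental representation. This is a toy under stated assumptions, not the authors' implementation: the position-based alignment, the feature template, the function names, the base scores, and the weights are all hypothetical (a real re-ranker would learn the weights and use a proper alignment). Only the Sudan strings come from the slide.

```python
from collections import Counter
from itertools import zip_longest


def pair_features(candidate: str, supplemental: str, tag: str) -> Counter:
    """Count (tag, candidate symbol, supplemental symbol) pairs.

    Assumption: symbols are aligned by position, padding the shorter
    string with "-". A real re-ranker would use a learned alignment.
    """
    feats = Counter()
    for c, s in zip_longest(candidate, supplemental, fillvalue="-"):
        feats[(tag, c, s)] += 1
    return feats


def rerank(nbest, supplemental, weights, base_weight=1.0):
    """Return candidates sorted by base score plus weighted pair features.

    nbest: list of (candidate, base_score) pairs from the base system.
    supplemental: dict mapping a source tag (e.g. "combilex") to a string.
    weights: dict mapping feature keys to weights (hypothetical here).
    """
    def score(item):
        candidate, base_score = item
        total = base_weight * base_score
        for tag, rep in supplemental.items():
            for feat, count in pair_features(candidate, rep, tag).items():
                total += weights.get(feat, 0.0) * count
        return total

    return [cand for cand, _ in sorted(nbest, key=score, reverse=True)]


# Hypothetical base scores and weights; the strings are from the Sudan slide.
nbest = [("sud@n", -1.0), ("sud{n", -1.3), ("sud#n", -1.5)]
supplemental = {"combilex": "sudAn"}
weights = {("combilex", "#", "A"): 1.0, ("combilex", "@", "A"): -0.5}
print(rerank(nbest, supplemental, weights))  # ['sud#n', 'sud{n', 'sud@n']
```

Transliterations would enter the same way, as additional tagged entries in `supplemental`; because features are paired symbols rather than raw string similarity, the model can learn that Combilex `A` corresponds to CELEX `#` even though the characters differ.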

Method
Input: Gershwin
n-best outputs: /d͡ʒɜːʃwɪn/, /ɡɜːʃwɪn/, /d͡ʒɛɹʃwɪn/
Transliterations: गर्शविन (/ɡʌrʃʋɪn/), ガーシュウィン (/ɡaːɕuwiɴ/), Гершвин (/ɡerʂvin/)
All three transliterations begin with /ɡ/, which is evidence for /ɡɜːʃwɪn/ over the /d͡ʒ/ candidates
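One of the alignment/similarity scores mentioned in the overview can be illustrated with a unit-cost edit distance between each candidate and the transliterations' approximate pronunciations. This is a crude, hypothetical feature: it compares raw IPA code points across languages, and the ACL 2011 finding that simple similarity-based methods don't work suggests it could serve at most as one feature inside the re-ranker, not as a method on its own. The function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance over Unicode code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (or match at no cost)
            ))
        prev = curr
    return prev[-1]


def total_distance(candidate: str, translit_prons: list) -> int:
    """Sum of edit distances from a candidate to every transliteration."""
    return sum(levenshtein(candidate, t) for t in translit_prons)


# Strings from the Gershwin slide (slashes dropped).
candidates = ["d͡ʒɜːʃwɪn", "ɡɜːʃwɪn", "d͡ʒɛɹʃwɪn"]
translit_prons = ["ɡʌrʃʋɪn", "ɡaːɕuwiɴ", "ɡerʂvin"]
ranked = sorted(candidates, key=lambda c: total_distance(c, translit_prons))
print(ranked[0])  # ɡɜːʃwɪn
```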

Data and base systems
Transcriptions from Combilex and CELEX
Transliterations from NEWS 2011
Experiments on English-to-Japanese transliteration
80/10/10 train/dev/test split
Sequitur and DirecTL+ as base systems
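The 80/10/10 split can be sketched as follows. The slide gives only the proportions; random shuffling, the seed, the function name, and splitting a plain word list are assumptions.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle a copy of `items` and cut it into 80% train, 10% dev, 10% test."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed: reproducible split
    n = len(items)
    n_train = n * 8 // 10
    n_dev = n // 10
    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    test = items[n_train + n_dev:]  # remainder, so nothing is dropped
    return train, dev, test


words = [f"word{i}" for i in range(1000)]  # hypothetical word list
train, dev, test = split_80_10_10(words)
print(len(train), len(dev), len(test))  # 800 100 100
```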

G2P experiments
Supplemental transliterations
Input: McGee
Candidate outputs: m@kJi, m@gi, …, m@CJi
Supplemental: मगी, マギー, Макги

G2P experiments: names (supplemental transliterations)
[Figure: word accuracy (%) of Sequitur and DirecTL+, base vs. reranked]

G2P experiments: full set (supplemental transliterations)
[Figure: word accuracy (%) of Sequitur and DirecTL+, base vs. reranked]

G2P experiments: core vocabulary (supplemental transliterations)
[Figure: word accuracy (%) of Sequitur and DirecTL+, base vs. reranked]

G2P experiments
Supplemental transcriptions (word/name)
Input: Sudan
Candidate outputs (CELEX): sud@n, sud{n, …, sud#n
Supplemental (Combilex): sudAn

G2P experiments: baselines
Supplemental transcriptions
MERGE:
1. Convert Combilex to CELEX
2. Merge with CELEX
3. Train on the combined set
P2P (phoneme-to-phoneme converter):
1. Intersect Combilex and CELEX
2. Train a transduction system to convert Combilex to CELEX
3. If a test word appears in Combilex, take its transcription from there and convert it to CELEX format
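The two baselines can be sketched in a few lines. `combilex_to_celex` stands in for the trained Combilex-to-CELEX transduction system and `base_g2p` for an ordinary G2P system; both, along with the toy one-symbol mapping and the rule that CELEX entries win when both lexicons contain a word, are hypothetical choices for illustration. Training on the merged set is left out.

```python
from typing import Callable, Dict


def p2p_predict(word: str,
                combilex: Dict[str, str],
                combilex_to_celex: Callable[[str], str],
                base_g2p: Callable[[str], str]) -> str:
    """P2P baseline: convert the Combilex entry if there is one, else run G2P."""
    if word in combilex:
        return combilex_to_celex(combilex[word])
    return base_g2p(word)


def merge_training_data(celex: Dict[str, str],
                        combilex: Dict[str, str],
                        combilex_to_celex: Callable[[str], str]) -> Dict[str, str]:
    """MERGE baseline: convert Combilex to CELEX notation and merge the lexicons.

    Assumption: where both lexicons contain a word, the CELEX entry is kept.
    """
    merged = {w: combilex_to_celex(p) for w, p in combilex.items()}
    merged.update(celex)
    return merged


# Toy stand-ins (hypothetical): a one-symbol mapping instead of a trained
# transducer, and a G2P "system" that returns a placeholder.
SYMBOL_MAP = {"A": "#"}


def toy_converter(pron: str) -> str:
    return "".join(SYMBOL_MAP.get(ch, ch) for ch in pron)


def toy_g2p(word: str) -> str:
    return "<base G2P output>"


print(p2p_predict("Sudan", {"Sudan": "sudAn"}, toy_converter, toy_g2p))  # sud#n
```

Unlike the reranker, P2P trusts the converted Combilex transcription outright whenever one exists, so any conversion error passes straight through to the output.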
