SLIDE 1

CRF Word Alignment & Noisy Channel Translation

January 31, 2013

Tuesday, February 19, 13

SLIDE 2

Last Time ...

p(Translation) = Σ_Alignment p(Translation, Alignment)

SLIDE 3

Last Time ...

p(Translation) = Σ_Alignment p(Translation, Alignment)
              = Σ_Alignment p(Alignment) × p(Translation | Alignment)

SLIDE 4

Last Time ...

p(Translation) = Σ_Alignment p(Alignment) × p(Translation | Alignment)

Written out for a source sentence f, a target sentence e of length m, and alignments a:

p(e | f, m) = Σ_{a ∈ [0,n]^m} p(a | f, m) × ∏_{i=1}^{m} p(e_i | f_{a_i})

SLIDES 5-6

MAP alignment

[Figure: the MAP alignments produced by IBM Model 4 and by our model, compared side by side]

SLIDES 7-9

A few tricks ...

p(f | e)    p(e | f)

SLIDE 10

Another View

With this model:

p(e | f, m) = Σ_{a ∈ [0,n]^m} p(a | f, m) × ∏_{i=1}^{m} p(e_i | f_{a_i})

the problem of word alignment can be stated as:

a* = argmax_{a ∈ [0,n]^m} p(a | e, f, m)

SLIDE 11

Another View

With this model:

p(e | f, m) = Σ_{a ∈ [0,n]^m} p(a | f, m) × ∏_{i=1}^{m} p(e_i | f_{a_i})

the problem of word alignment can be stated as:

a* = argmax_{a ∈ [0,n]^m} p(a | e, f, m)

Can we model this distribution directly?
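The sum over alignments can be computed directly for toy inputs. A minimal sketch, assuming a uniform alignment model p(a | f, m) = (1/(n+1))^m (as in IBM Model 1) and an invented toy translation table:

```python
from itertools import product

# Toy lexical translation table p(e_i | f_ai); values invented for illustration.
# Position 0 of the source is the NULL word.
t = {
    ("the", "das"): 0.7, ("the", "NULL"): 0.1,
    ("house", "haus"): 0.8, ("house", "NULL"): 0.05,
}

def p_e_given_f(e, f, t):
    """Sum over all alignments a in [0, n]^m of p(a | f, m) * prod_i p(e_i | f_{a_i}).

    Assumes a uniform alignment model p(a | f, m) = (1 / (n + 1))^m.
    """
    f = ["NULL"] + f                             # position 0 is the NULL source word
    n, m = len(f) - 1, len(e)
    total = 0.0
    for a in product(range(n + 1), repeat=m):    # enumerate all of [0, n]^m
        p_a = (1.0 / (n + 1)) ** m               # uniform p(a | f, m)
        lex = 1.0
        for i, ai in enumerate(a):
            lex *= t.get((e[i], f[ai]), 0.0)     # p(e_i | f_{a_i})
        total += p_a * lex
    return total

prob = p_e_given_f(["the", "house"], ["das", "haus"], t)
```

Enumerating alignments is exponential in general; this brute force is only for checking the definition on tiny inputs.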

SLIDE 12

Markov Random Fields (MRFs)

[Graph: chain A - B - C, with leaves X, Y, Z attached to A, B, C respectively]

p(A, B, C, X, Y, Z) = p(A) × p(B | A) × p(C | B) × p(X | A) × p(Y | B) × p(Z | C)

SLIDE 13

Markov Random Fields (MRFs)

p(A, B, C, X, Y, Z) = p(A) × p(B | A) × p(C | B) × p(X | A) × p(Y | B) × p(Z | C)

The same graph, undirected, factorizes into potentials over its edges:

p(A, B, C, X, Y, Z) = (1/Z) × Ψ1(A, B) × Ψ2(B, C) × Ψ3(A, X) × Ψ4(B, Y) × Ψ5(C, Z)
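The undirected factorization can be checked numerically on a toy model. A minimal sketch, assuming binary variables and invented agreement potentials on the pairwise edges of the graph:

```python
from itertools import product

# Toy binary MRF over the chain A - B - C with leaves X, Y, Z.
# The potential values are arbitrary illustrations.
def psi_pair(u, v):            # shared pairwise potential: prefers agreement
    return 2.0 if u == v else 1.0

factors = [   # (variable indices into (A, B, C, X, Y, Z), potential)
    ((0, 1), psi_pair),  # Psi1(A, B)
    ((1, 2), psi_pair),  # Psi2(B, C)
    ((0, 3), psi_pair),  # Psi3(A, X)
    ((1, 4), psi_pair),  # Psi4(B, Y)
    ((2, 5), psi_pair),  # Psi5(C, Z)
]

def unnormalized(assignment):
    score = 1.0
    for idx, psi in factors:
        score *= psi(*(assignment[i] for i in idx))
    return score

# Z sums the product of factors over all 2^6 joint assignments.
Z = sum(unnormalized(v) for v in product([0, 1], repeat=6))
p = lambda v: unnormalized(v) / Z    # p(A,B,C,X,Y,Z) = (1/Z) * prod of factors
```

Dividing by Z is exactly what makes the product of unnormalized potentials a probability distribution.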

SLIDE 14

Markov Random Fields (MRFs)

p(A, B, C, X, Y, Z) = (1/Z) × Ψ1(A, B) × Ψ2(B, C) × Ψ3(A, X) × Ψ4(B, Y) × Ψ5(C, Z)

The Ψi are called "factors."

SLIDE 15

Computing Z

Let 𝒳 = {a, b, c} with X ∈ 𝒳 and Y ∈ 𝒳. Then

Z = Σ_{x ∈ 𝒳} Σ_{y ∈ 𝒳} Ψ1(x, y) Ψ2(x) Ψ3(y)

When the graph has certain structures (e.g., chains), you can factor the sum to get polynomial-time dynamic-programming algorithms:

Z = Σ_{x ∈ 𝒳} Ψ2(x) Σ_{y ∈ 𝒳} Ψ1(x, y) Ψ3(y)
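The two expressions for Z compute the same number; the factored form just does less work. A small numerical check with invented potentials:

```python
# Z for a two-variable model: naive double sum vs. the factored form.
# Toy potentials; the values are made up.
X_DOM = ["a", "b", "c"]

psi1 = lambda x, y: 1.0 if x == y else 0.5          # pairwise potential
psi2 = lambda x: {"a": 1.0, "b": 2.0, "c": 3.0}[x]  # unary on x
psi3 = lambda y: {"a": 3.0, "b": 2.0, "c": 1.0}[y]  # unary on y

# Naive: sum over all |X|^2 joint assignments.
Z_naive = sum(psi1(x, y) * psi2(x) * psi3(y) for x in X_DOM for y in X_DOM)

# Factored: pull psi2(x) out of the inner sum. On a chain of length L this
# trick turns an O(|X|^L) sum into an O(L * |X|^2) dynamic program.
Z_factored = sum(psi2(x) * sum(psi1(x, y) * psi3(y) for y in X_DOM)
                 for x in X_DOM)
```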

SLIDES 16-18

Log-linear models

p(A, B, C, X, Y, Z) = (1/Z) × Ψ1(A, B) × Ψ2(B, C) × Ψ3(A, X) × Ψ4(B, Y) × Ψ5(C, Z)

Each pairwise factor is parameterized log-linearly:

Ψj(x, y) = exp Σ_k w_k f_k(x, y)

where the w_k are weights (learned) and the f_k are feature functions (specified).
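A log-linear factor is just an exponentiated weighted feature sum. A minimal sketch with invented feature functions and weights:

```python
import math

# A factor Psi(x, y) = exp(sum_k w_k * f_k(x, y)).
# The feature functions and weights below are invented for illustration.
features = [
    lambda x, y: 1.0 if x == y else 0.0,           # f_0: identity indicator
    lambda x, y: 1.0 if x[0] == y[0] else 0.0,     # f_1: same first letter
]
weights = [1.5, 0.5]   # w_k, learned in practice; fixed here

def psi(x, y):
    return math.exp(sum(w * f(x, y) for w, f in zip(weights, features)))
```

Because exp is strictly positive, any weight vector yields valid (positive) potentials; that is what makes this parameterization so convenient.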

SLIDE 19

Random Fields

  • Benefits
    • Potential functions can be defined with respect to arbitrary features (functions) of the variables
    • Great way to incorporate knowledge
  • Drawbacks
    • Likelihood involves computing Z
    • Maximizing likelihood usually requires computing Z (often over and over again!)

SLIDES 20-22

Conditional Random Fields

  • Use MRFs to parameterize a conditional distribution. Very easy: let feature functions look at anything they want in the "input"

p(y | x) = (1/Z_w(x)) exp Σ_{F ∈ G} Σ_k w_k f_k(F, x)

where F ranges over all factors in the graph of y.

SLIDES 23-24

Parameter Learning

  • CRFs are trained to maximize conditional likelihood
  • Recall we want to directly model p(a | e, f)
  • The likelihood of what alignments? Gold reference alignments!

ŵ_MLE = argmax_w ∏_{(x_i, y_i) ∈ D} p(y_i | x_i; w)
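The conditional MLE objective can be written down directly for a toy model with a tractable Z_w(x). A sketch, assuming a single binary output variable and invented features; in practice one maximizes the log of this product with gradient methods:

```python
import math

# Conditional likelihood for a toy CRF with one output variable y in {0, 1}.
LABELS = [0, 1]

def feats(x, y):                       # f_k(x, y): invented toy features
    return [x * y, float(y)]

def p_y_given_x(y, x, w):
    score = lambda yy: math.exp(sum(wk * fk for wk, fk in zip(w, feats(x, yy))))
    return score(y) / sum(score(yy) for yy in LABELS)   # denominator is Z_w(x)

def conditional_log_likelihood(data, w):
    # log prod_i p(y_i | x_i; w) = sum_i log p(y_i | x_i; w)
    return sum(math.log(p_y_given_x(y, x, w)) for x, y in data)

data = [(1.0, 1), (2.0, 0)]            # (x_i, y_i) pairs; gold labels
ll = conditional_log_likelihood(data, [0.5, -0.2])
```

Note that Z_w(x) here sums over outputs only, for a fixed input x; that is what makes conditional training cheaper than joint MRF training.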

SLIDES 25-26

CRF for Alignment

  • One of many possibilities, due to Blunsom & Cohn (2006)
  • a has the same form as in the lexical translation models (we still make a one-to-many assumption)
  • the w_k are the model parameters
  • the f_k are the feature functions

p(a | e, f) = (1/Z_w(e, f)) exp Σ_{i=1}^{|e|} Σ_k w_k f_k(a_i, a_{i-1}, i, e, f)

Inference is O(n²m) ≈ O(n³).
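Because Z_w(e, f) does not depend on a, the argmax alignment can be found with Viterbi over the chain in O(n²m). A minimal sketch, with a single invented "jump distance" feature standing in for the real feature set:

```python
# Viterbi decoding a* = argmax_a p(a | e, f) for a chain-structured alignment
# CRF. Score(a) = sum_i sum_k w_k f_k(a_i, a_{i-1}, i, e, f); since Z_w(e, f)
# is constant in a, maximizing the unnormalized score suffices.

def local_score(ai, ai_prev, i, e, f, w):
    # Invented single feature: prefer small jumps between adjacent alignments.
    return w[0] * -abs(ai - ai_prev)

def viterbi_align(e, f, w):
    n, m = len(f), len(e)
    # best[j] = max score of a prefix ending with a_i = j.
    # Assume a virtual previous alignment at position 0 for the first word.
    best = [local_score(j, 0, 0, e, f, w) for j in range(n + 1)]
    back = []
    for i in range(1, m):
        prev_best = best
        best, ptr = [], []
        for j in range(n + 1):                     # O(n^2) per target position
            cands = [prev_best[k] + local_score(j, k, i, e, f, w)
                     for k in range(n + 1)]
            k_star = max(range(n + 1), key=lambda k: cands[k])
            best.append(cands[k_star])
            ptr.append(k_star)
        back.append(ptr)
    # Follow back-pointers from the best final state.
    j = max(range(n + 1), key=lambda jj: best[jj])
    a = [j]
    for ptr in reversed(back):
        j = ptr[j]
        a.append(j)
    return list(reversed(a))
```

Replacing max with logsumexp in the same recursion computes Z_w(e, f) itself, which is what training needs.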

SLIDE 27

Model

  • Labels (one per target word) index positions in the source sentence
  • Train models for both (e, f) and (f, e) [inverting the reference alignments]

SLIDE 28

Experiments

SLIDES 29-34

Feature types, illustrated on the pair "pervez musharrafs langer abschied" ↔ "pervez musharraf 's long goodbye":

  • Identical word
  • Matching prefix
  • Matching suffix
  • Orthographic similarity
  • In dictionary
  • ...

SLIDE 35

Lexical Features

  • Word ↔ word indicator features
  • Various word ↔ word co-occurrence scores
  • IBM Model 1 probabilities (t→s, s→t)
  • Geometric mean of Model 1 probabilities
  • Dice's coefficient [binned]
  • Products of the above
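Dice's coefficient for a word pair can be computed from sentence-level co-occurrence counts. A sketch, assuming dice(e, f) = 2·C(e, f) / (C(e) + C(f)) where C(e, f) counts sentence pairs containing both words, over an invented two-sentence corpus:

```python
# Tiny invented parallel corpus: (English sentence, German sentence) pairs.
corpus = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
]

def dice(e_word, f_word, corpus):
    c_e = sum(1 for e, f in corpus if e_word in e)                    # C(e)
    c_f = sum(1 for e, f in corpus if f_word in f)                    # C(f)
    c_ef = sum(1 for e, f in corpus if e_word in e and f_word in f)   # C(e, f)
    return 2.0 * c_ef / (c_e + c_f) if c_e + c_f else 0.0
```

For the CRF, the real-valued score would then be bucketed into binned indicator features rather than used raw.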

SLIDE 36

Lexical Features

  • Word class ↔ word class indicator
    • NN translates as NN (NN_NN=1)
    • NN does not translate as MD (NN_MD=1)
  • Identical word feature
    • 2010 = 2010 (IdentWord=1 IdentNum=1)
  • Identical prefix feature
    • Obama ~ Obamu (IdentPrefix=1)
  • Orthographic similarity measure [binned]
    • Al-Qaeda ~ Al-Kaida (OrthoSim050_080=1)
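One way to realize the binned orthographic similarity feature: normalized edit distance, bucketed into indicator features like OrthoSim050_080. The similarity measure and the bin boundaries here are assumptions, not the authors' exact recipe:

```python
def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def ortho_sim(a, b):
    # Similarity in [0, 1]: 1 - edit_distance / max length (an assumption).
    m = max(len(a), len(b))
    return 1.0 - edit_distance(a.lower(), b.lower()) / m if m else 1.0

def ortho_bin(a, b, bins=((0.5, 0.8), (0.8, 1.0 + 1e-9))):
    # Fire one indicator feature per bin the similarity falls into.
    s = ortho_sim(a, b)
    return {f"OrthoSim{int(lo * 100):03d}_{int(hi * 100):03d}": 1
            for lo, hi in bins if lo <= s < hi}
```

For example, "Al-Qaeda" vs. "Al-Kaida" differ in two of eight characters, giving similarity 0.75 and firing OrthoSim050_080, matching the slide's example.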

SLIDE 37

Other Features

  • Compute features from large amounts of unlabeled text
  • Does the Model 4 alignment contain this alignment point?
  • What is the Model 1 posterior probability of this alignment point?

SLIDE 38

Results

SLIDES 39-40

Summary

Unfortunately, you need gold alignments!

SLIDES 41-44

Putting the pieces together

  • We have seen how to model the following: p(e), p(e | f, m), p(e, a | f, m), p(a | e, f)
  • Goal: a better model of p(e | f, m) that knows about p(e)

SLIDE 45

"One naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: 'This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.'"

Warren Weaver to Norbert Wiener, March 1947

SLIDES 46-49

Claude Shannon. "A Mathematical Theory of Communication." 1948.

Message M → Encoder → sent transmission Y → "noisy" channel → received transmission X → Decoder → recovered message M′

The source is modeled by p(y) and the channel by p(x | y).

Shannon's theory tells us:
1) how much data you can send
2) the limits of compression
3) why your download is so slow
4) how to translate

SLIDES 50-55

The decoder's job is to recover Y′, the most likely sent transmission given the received X:

y′ = argmax_y p(y | x) = argmax_y p(x | y) p(y) / p(x) = argmax_y p(x | y) p(y)

Note that p(y | x) ≠ p(x | y); Bayes' rule ("I can help.") relates the two.

SLIDES 56-59

y′ = argmax_y p(y | x) = argmax_y p(x | y) p(y) / p(x) = argmax_y p(x | y) p(y)

The denominator p(x) doesn't depend on y, so it can be dropped from the argmax:

y′ = argmax_y p(x | y) p(y)

SLIDES 60-63

For translation, the message is English, the received transmission is "French," and the recovered message is English′:

y′ = argmax_y p(x | y) p(y)

e′ = argmax_e p(f | e) p(e)

Here p(f | e) is the translation model and p(e) is the language model. Other noisy channel applications: OCR, speech recognition, spelling correction, ...

SLIDE 64

Division of labor

  • Translation model
    • probability of translating back into the source
    • ensures adequacy of the translation
  • Language model
    • is a translation hypothesis "good" English?
    • ensures fluency of the translation

SLIDES 65-67

English: p(e) and p(f | e)

e* = argmax_e p(e | f) = argmax_e p(f | e) × p(e)

SLIDE 68

Announcements

  • Upcoming language-in-10
    • Tuesday: Jon/Austin - Russian (Русский)
  • Leaderboard is functional