SLIDE 1

Decoding, continued
SLIDE 2

Activity

Build a translation model that we’ll use later today.

Instructions:
  • Subject is “mt-class”
  • The body has six lines
  • There is one one-word translation per line
SLIDE 3

ADMINISTRATIVE

  • Schedule for “language in 10 minutes”
  • Leaderboard
SLIDE 4

THE STORY SO FAR...

[Diagram: training data (parallel text) → learner → model → decoder, with example sentences: “However, the sky remained clear under the strong north wind.” (English) and “联合国 安全 理事会 的 五个 常任 理事 国都” (“All five permanent members of the UN Security Council”, Chinese)]
SLIDE 5

SCHEDULE

  • TUESDAY
    • stack-based decoding in conception
  • TODAY
    • stack-based decoding in practice
    • scoring, dynamic programming, pruning
SLIDE 6

DECODING

  • the process of producing a translation of a sentence
  • Two main problems:
    • modeling – given a pair of sentences, how do we assign a probability to them?

C: 他们还缺乏国际比赛的经验。
E: They still lack experience in international competitions

P(C → E) = high
SLIDE 7

DECODING

  • the process of producing a translation of a sentence
  • Two main problems:
    • modeling – given a pair of sentences, how do we assign a probability to them?

C: 他们还缺乏国际比赛的经验。
E: This is not a good translation of the above sentence.

P(C → E) = low
SLIDE 8

MODEL

  • Noisy Channel model

P(e | f) ∝ P(f | e) P(e)

[Diagram: the noisy channel in speech recognition (English words → NOISE → English words) and in machine translation (English words → NOISE → French words)]
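The proportionality above is just Bayes’ rule with the constant denominator dropped: the source sentence f is fixed during decoding, so P(e | f) = P(f | e) P(e) / P(f) ∝ P(f | e) P(e).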

SLIDE 9

MODEL TRANSFORMS

  • Add weights

P(e | f) ∝ P(f | e) P(e) ∝ P(f | e)^λ₁ P(e)^λ₂
SLIDE 10

WEIGHTS

  • Why?
  • Just like in real life, where we trust people’s claims differently, we will want to learn how to trust different models

[Bar chart: credibility (25–100) of the claim “I can do a backflip off this pommel horse”, comparing your brother and Paul Hamm]
SLIDE 11

MODEL TRANSFORMS

  • Log space transform
  • Because:

0.0001 * 0.0001 * 0.0001 = 0.000000000001
log(0.0001) + log(0.0001) + log(0.0001) = -12

P(e | f) ∝ P(f | e) P(e) ∝ P(f | e)^λ₁ P(e)^λ₂ = λ₁ log P(f | e) + λ₂ log P(e)
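A minimal sketch in plain Python of why decoders work in log space (base-10 logs to match the slide; the numbers are the slide’s own):

```python
import math

probs = [0.0001, 0.0001, 0.0001]

# Multiplying many small probabilities drives the product toward the
# limits of floating point; long products eventually underflow to 0.0.
product = 1.0
for p in probs:
    product *= p
print(product)     # ~1e-12

# Summing log probabilities is numerically stable, however many terms.
log_score = sum(math.log10(p) for p in probs)
print(log_score)   # -12.0
```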

SLIDE 12

MODEL TRANSFORMS

  • Generalization

P(e | f) ∝ P(f | e) P(e)
         ∝ P(f | e)^λ₁ P(e)^λ₂
         = λ₁ log P(f | e) + λ₂ log P(e)
         = λ₁ φ₁(f, e) + λ₂ φ₂(f, e)
         = Σᵢ λᵢ φᵢ(f, e)
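The generalized form is just a dot product between weights and feature values. A hedged sketch; the feature names and numbers below are invented for illustration:

```python
# phi_i(f, e): log-domain feature values for one (f, e) pair (illustrative)
features = {"log_tm": -2.3, "log_lm": -4.1}
# lambda_i: how much we trust each feature (illustrative)
weights = {"log_tm": 0.8, "log_lm": 1.2}

# score = sum_i lambda_i * phi_i(f, e)
score = sum(weights[name] * value for name, value in features.items())
```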

SLIDE 13

MODEL

e*, a* = argmax_{e,a} Pr(e, a | c) = argmax_{e,a} Σᵢ λᵢ φᵢ(e, a, c)

  • search: how do we find it? (the argmax)
  • model: what is a good translation? (the weighted sum, with weights λᵢ and feature functions φᵢ)

A better “fundamental equation” for MT
SLIDE 14

DECODING

  • the process of producing a translation of a sentence
  • Two main problems:
    • search – given a model and a source sentence, how do we find the sentence that the model likes best?
      • impractical: enumerate all sentences, score them
      • stack decoding: assemble translations piece by piece
SLIDE 15

STACK DECODING

  • Start with a list of hypotheses, containing only the empty hypothesis
  • For each stack
    • For each hypothesis
      • For each applicable word
        • Extend the hypothesis with the word
        • Place the new hypothesis on the right stack
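A minimal Python sketch of this loop, assuming monotone word-for-word translation (one stack per number of translated words; the Hypothesis fields and helper names are my assumptions, not the course’s reference decoder):

```python
from collections import namedtuple

# A hypothesis records how much of the source is covered, the last
# target word (all a bigram LM needs), and the accumulated log score.
Hypothesis = namedtuple("Hypothesis", "covered last_word score")

def stack_decode(source, translations, extension_score):
    """translations: dict mapping each source word to candidate target
    words; extension_score: log-score increment for one extension."""
    stacks = [[] for _ in range(len(source) + 1)]
    stacks[0].append(Hypothesis(covered=0, last_word="<s>", score=0.0))

    for stack in stacks[:-1]:                    # for each stack
        for hyp in stack:                        # for each hypothesis
            src = source[hyp.covered]
            for tgt in translations[src]:        # for each applicable word
                new = Hypothesis(                # extend the hypothesis
                    covered=hyp.covered + 1,
                    last_word=tgt,
                    score=hyp.score + extension_score(src, tgt, hyp.last_word),
                )
                stacks[new.covered].append(new)  # place it on the right stack

    return max(stacks[-1], key=lambda h: h.score)
```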
SLIDE 16

FACTORING MODELS

  • Stack decoding works by extending hypotheses word by word
  • These can be arranged into a search graph representing the space we search

[Diagram: hypothesis + add word (tengo → am) = new hypothesis]
SLIDE 17

FACTORING MODELS

[Diagram: search graph for “Yo tengo hambre”: Yo → I; tengo → am, tengo → have; hambre → hungry, hambre → hunger]
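One way to picture the graph above in code: per-word translation options, where every path through the graph is one candidate translation (the word pairs come from the slide; the data structure is my own):

```python
from itertools import product

# Translation options for each source word, as drawn on the slide.
options = {
    "Yo":     ["I"],
    "tengo":  ["am", "have"],
    "hambre": ["hungry", "hunger"],
}

source = ["Yo", "tengo", "hambre"]
# Paths: "I am hungry", "I am hunger", "I have hungry", "I have hunger"
candidates = [" ".join(path) for path in product(*(options[w] for w in source))]
```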

SLIDE 18

FACTORING MODELS

  • Stack decoding works by extending hypotheses word by word
  • These can be arranged into a search graph representing the space we search
  • The component models we use need to factorize over this graph, and we accumulate the score as we go

[Diagram: hypothesis + add word (tengo → am) = new hypothesis]
SLIDE 19

FACTORING MODELS

  • Example hypothesis creation:
    • translation model: trivial case, since all the words are translated independently
      hypothesis.score += P_TM(am | tengo)
    • a function of just the word that is added

[Diagram: hypothesis + add word (tengo → am) = new hypothesis]
SLIDE 20

FACTORING MODELS

  • Example hypothesis creation:
    • language model: still easy, since (bigram) language models depend only on the previous word
      hypothesis.score += P_LM(am | I)
    • a function of the old hypothesis and the new word’s translation

[Diagram: hypothesis + add word (tengo → am) = new hypothesis]
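The two increments from slides 19 and 20 combine into a single extension score that depends only on the new edge and the previous target word. A sketch with made-up probability tables:

```python
import math

# Illustrative probabilities; real models are estimated from parallel
# text (TM) and monolingual text (LM).
p_tm = {("tengo", "am"): 0.4}   # P_TM(target | source)
p_lm = {("I", "am"): 0.3}       # P_LM(word | previous word)

def extension_score(src, tgt, prev_word):
    # The score factorizes over the edge: only (src, tgt) and the
    # previous target word matter, not the rest of the hypothesis.
    return math.log10(p_tm[(src, tgt)]) + math.log10(p_lm[(prev_word, tgt)])

# score += P_TM(am | tengo) + P_LM(am | I), in log space
increment = extension_score("tengo", "am", "I")
```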

SLIDE 21

DYNAMIC PROGRAMMING

  • We saw Tuesday how huge the search space could get
  • Notice anything here?
    • (1) <s> is never used in computing the scores AND (2) <s> is implicit in the graph structure
    • let’s get rid of the extra state!

[Diagram: hypothesis + add word (tengo → am) = new hypothesis; score += P_TM(am | tengo) + P_LM(am | I)]
SLIDE 22

DYNAMIC PROGRAMMING

  • Before
  • After

[Diagram: the search graph before and after recombination; the score of the new hypothesis is the maximum over all ways of computing it]
SLIDE 23

STACK DECODING (WITH DP)

  • Start with a list of hypotheses, containing only the empty hypothesis
  • For each stack
    • For each hypothesis
      • For each applicable word
        • Extend the hypothesis with the word
        • Place the new hypothesis on the right stack IF either (1) no equivalent hypothesis exists, or (2) this hypothesis has a higher score
SLIDE 24

MORE GENERALLY

  • What is an “equivalent hypothesis”?
  • Hypotheses that match on the minimum necessary state:
    • last word (for language model computation)
    • the score (of the best way to get here)
    • the coverage vector (so we know which words we haven’t translated)
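Under these assumptions, recombination is a dictionary lookup: the key is the minimum necessary state, and only the best-scoring hypothesis per key survives. A sketch (the coverage vector generalizes the integer covered count in the earlier sketch once reordering is allowed; field names are mine):

```python
def add_with_recombination(stack, hyp):
    """stack: dict mapping a recombination key to the best hypothesis
    seen so far. Two hypotheses are equivalent when they agree on the
    coverage vector and the last target word."""
    key = (hyp.coverage, hyp.last_word)
    incumbent = stack.get(key)
    # Keep the new hypothesis if no equivalent exists or it scores higher.
    if incumbent is None or hyp.score > incumbent.score:
        stack[key] = hyp
```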
SLIDE 25

OLD GRAPH (BEFORE DP)

SLIDE 26

PRUNING

  • Even with DP, there are still too many hypotheses
  • So we prune:
    • histogram pruning: keep only k items on each stack
    • threshold pruning: don’t keep items that have a score beyond some distance from the most probable item in the stack
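Both pruning rules fit in a few lines; the defaults for k and the threshold below are arbitrary illustrative values:

```python
def prune(stack, k=100, threshold=10.0):
    """Histogram pruning: keep at most k hypotheses on the stack.
    Threshold (beam) pruning: drop hypotheses whose log score falls
    more than `threshold` below the best item on the stack."""
    if not stack:
        return stack
    ranked = sorted(stack, key=lambda h: h.score, reverse=True)
    best = ranked[0].score
    return [h for h in ranked[:k] if best - h.score <= threshold]
```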
SLIDE 27

STACK DECODING (WITH PRUNING)

  • Start with a list of hypotheses, containing only the empty hypothesis
  • For each stack
    • For each hypothesis
      • For each applicable word
        • Extend the hypothesis with the word
        • If it’s the best, place the new hypothesis on the right stack (possibly replacing an old one)
    • Prune
SLIDE 28

PITFALLS

  • Search errors
    • def: not finding the model’s highest-scoring translation
    • this happens when the shortcuts we took excluded good hypotheses
  • Model errors
    • def: the model’s best hypothesis isn’t a good one
    • depends on some metric (e.g., human judgment)
SLIDE 29

Activity

http://cs.jhu.edu/~post/mt-class/stack-decoder/

Instructions (10 minutes): In groups or alone, find the highest-scoring translation under our model under different stack size and reordering settings. Are there any search or model errors?
SLIDE 30

IMPORTANT CONCEPTS

  • generalized weighted feature function formulation
  • decoding as graph search
  • factorized models for scoring edges
  • dynamic programming
  • pruning (histogram, beam/threshold)
SLIDE 31

NOT DISCUSSED (BUT IMPORTANT)

  • Outside (future) cost estimates and A* search
  • Computational complexity