Decoding and Inference with Syntactic Translation Models


  1. Decoding and Inference with Syntactic Translation Models
     Machine Translation Lecture 15
     Instructor: Chris Callison-Burch
     TAs: Mitchell Stern, Justin Chiu
     Website: mt-class.org/penn

  2. CFGs
     A context-free grammar deriving the Japanese sentence jon-ga ringo-o tabeta:
     S → NP VP
     VP → NP V
     V → tabeta
     NP → jon-ga
     NP → ringo-o
     Output: jon-ga ringo-o tabeta

  3. Synchronous CFGs
     The same source-side rules for jon-ga ringo-o tabeta, each now paired with a target-side right-hand side (filled in on the next slide):
     S → NP VP
     VP → NP V
     V → tabeta
     NP → jon-ga
     NP → ringo-o

  4. Synchronous CFGs
     S → ⟨NP₁ VP₂, NP₁ VP₂⟩ (monotonic)
     VP → ⟨NP₁ V₂, V₂ NP₁⟩ (inverted)
     V → ⟨tabeta, ate⟩
     NP → ⟨jon-ga, John⟩
     NP → ⟨ringo-o, an apple⟩

  5. Synchronous generation
     Expanding linked nonterminals in lockstep builds two trees at once:
     source: (S (NP jon-ga) (VP (NP ringo-o) (V tabeta)))
     target: (S (NP John) (VP (V ate) (NP an apple)))
     Output: ⟨jon-ga ringo-o tabeta, John ate an apple⟩
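
To make the lockstep expansion concrete, here is a minimal Python sketch of synchronous generation with the toy grammar of slides 3-5. The encoding is my own assumption, not from the deck: a rule is a pair of right-hand sides, integers mark linked substitution sites, and a finished constituent is a (source, target) string pair.

```python
# Rules from slide 4: each rule pairs a source RHS with a target RHS;
# integers are linked substitution sites, strings would be terminals.
S  = ([1, 2], [1, 2])   # S  -> <NP1 VP2, NP1 VP2>  (monotonic)
VP = ([1, 2], [2, 1])   # VP -> <NP1 V2,  V2 NP1>   (inverted)

# Lexical rules as finished (source, target) pairs.
john  = ("jon-ga",  "John")
apple = ("ringo-o", "an apple")
ate   = ("tabeta",  "ate")

def expand(rule, children):
    """Substitute child pair i into site i on each side of the rule,
    yielding a (source, target) string pair."""
    def realize(rhs, side):
        return " ".join(children[s][side] if isinstance(s, int) else s
                        for s in rhs)
    src_rhs, tgt_rhs = rule
    return realize(src_rhs, 0), realize(tgt_rhs, 1)

vp = expand(VP, {1: apple, 2: ate})   # ('ringo-o tabeta', 'ate an apple')
s  = expand(S,  {1: john,  2: vp})
print(s)  # ('jon-ga ringo-o tabeta', 'John ate an apple')
```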

  6. Translation as parsing
     Parse the source sentence with the source sides of the rules, then project the parse to the target through the paired target sides:
     source parse: (S (NP jon-ga) (VP (NP ringo-o) (V tabeta)))
     projected target: (S (NP John) (VP (V ate) (NP an apple))), i.e. John ate an apple

  7. A closer look at parsing
     • Parsing is usually done with dynamic programming
     • Share common computations and structure
     • Represent an exponential number of alternatives in polynomial space
     • With SCFGs there are two kinds of ambiguity:
       • source parse ambiguity
       • translation ambiguity
     • parse forests can represent both!

  8. A closer look at parsing
     • Any monolingual parser can be used (most often: CKY or variants on the CKY algorithm)
     • Parsing complexity is O(n³|G|³)
       • cubic in the length of the sentence (n³)
       • cubic in the number of non-terminals (|G|³)
       • adding nonterminal types increases parsing complexity substantially!
     • With few NTs, exhaustive parsing is tractable

  9. Parsing as deduction
     Antecedents (with weights) above the line, side conditions beside it, the consequent below:

         A : u    B : v
         --------------  φ
             C : w

     "If A and B are true with weights u and v, and φ is also true, then C is true with weight w."

  10. Example: CKY
      Inputs: a sentence f = ⟨f₁, f₂, …, f_ℓ⟩ and a context-free grammar G in Chomsky normal form.
      Item form: [X, i, j] means a subtree rooted with NT type X spanning i to j has been recognized.

  11. Example: CKY
      Goal: [S, 0, ℓ]
      Axioms, one per rule (X → f_i) ∈ G with weight w:

          [X, i−1, i] : w

      Inference rule, for each (Z → X Y) ∈ G with weight w:

          [X, i, k] : u    [Y, k, j] : v
          ------------------------------
          [Z, i, j] : u × v × w
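
The deduction system maps almost line for line onto code. Below is a minimal weighted-CKY sketch in Python; the grammar encoding (dicts from rules to weights) and the use of max to keep the best weight per item are my assumptions for illustration.

```python
from collections import defaultdict

def cky(words, lex, binary):
    """chart[X, i, j] = best weight of any parse of words[i:j] rooted in X.
    lex maps (X, word) -> w; binary maps (Z, X, Y) -> w."""
    n = len(words)
    chart = defaultdict(float)
    # Axioms: (X -> f_i) in G with weight w gives item [X, i-1, i] : w.
    for i, f in enumerate(words):
        for (X, term), w in lex.items():
            if term == f:
                chart[X, i, i + 1] = max(chart[X, i, i + 1], w)
    # Inference: [X, i, k] : u and [Y, k, j] : v with (Z -> X Y) : w
    # derive [Z, i, j] : u * v * w.  Spans are filled shortest first.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (Z, X, Y), w in binary.items():
                    u, v = chart[X, i, k], chart[Y, k, j]
                    if u > 0 and v > 0:
                        chart[Z, i, j] = max(chart[Z, i, j], u * v * w)
    return chart
```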

  12. Grammar for the running example "I saw her duck" (word positions 0 1 2 3 4):
      S → PRP VP       V → saw      PRP → I
      VP → V NP        V → duck     PRP → her
      VP → V SBAR      NN → duck
      SBAR → PRP V
      NP → PRP NN
      The axioms fill the chart with the lexical items: PRP,0,1  V,1,2  PRP,2,3  NN,3,4  V,3,4

  18. NP → PRP NN combines PRP,2,3 and NN,3,4: add NP,2,4.

  19. SBAR → PRP V combines PRP,2,3 and V,3,4: add SBAR,2,4.

  20. VP → V NP combines V,1,2 and NP,2,4: add VP,1,4.

  21. VP → V SBAR combines V,1,2 and SBAR,2,4: a second derivation of the same item VP,1,4 (the two readings of "saw her duck").

  22. S → PRP VP combines PRP,0,1 and VP,1,4: add S,0,4. The goal item [S, 0, 4] has been derived.
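
For concreteness, here is the CKY sketch from slide 11 run on this example. The grammar is the one on slide 12 with every weight set to 1.0, since the slides show an unweighted chart:

```python
lex = {("PRP", "I"): 1.0, ("V", "saw"): 1.0, ("PRP", "her"): 1.0,
       ("NN", "duck"): 1.0, ("V", "duck"): 1.0}
binary = {("S", "PRP", "VP"): 1.0, ("VP", "V", "NP"): 1.0,
          ("VP", "V", "SBAR"): 1.0, ("SBAR", "PRP", "V"): 1.0,
          ("NP", "PRP", "NN"): 1.0}

chart = cky("I saw her duck".split(), lex, binary)
print(chart["S", 0, 4])   # 1.0: the goal item [S, 0, 4] is derivable
```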

  23. What is this object?
      The completed chart over "I saw her duck" (0 1 2 3 4): S,0,4  VP,1,4  SBAR,2,4  NP,2,4  V,3,4  PRP,0,1  V,1,2  PRP,2,3  NN,3,4, together with the rule applications that connect the items.

  24. Semantics of hypergraphs
      • Generalization of directed graphs
      • A special node is designated the "goal"
      • Every edge has a single head and 0 or more tails (the arity of the edge is the number of tails)
      • Node labels correspond to LHSs of CFG rules
      • A derivation is the generalization of the graph concept of a path to hypergraphs
      • Weights multiply along edges in a derivation, and add across alternative derivations at nodes (cf. semiring parsing)
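
A minimal sketch of this structure in Python (the class and field names are mine, not from the deck):

```python
from dataclasses import dataclass, field

@dataclass
class Edge:
    head: "Node"             # the single head node
    tails: list              # 0 or more tail nodes; arity = len(tails)
    weight: float
    label: tuple = ("", "")  # (source string, target string) in translation

@dataclass
class Node:
    label: str                                    # e.g. the item "VP,1,4"
    incoming: list = field(default_factory=list)  # alternative derivations
```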

  25. Edge labels
      • Edge labels may be a mix of terminals and substitution sites (non-terminals)
      • In translation hypergraphs, edges are labeled in both the source and target languages
      • The number of substitution sites must equal the arity of the edge and must be the same in both languages
      • The two languages may order the substitution sites differently
      • There is no restriction on the number of terminal symbols

  26. Edge labels
      Example over Spanish "la lectura de ayer", with X → ⟨la lectura, reading⟩ and X → ⟨ayer, yesterday⟩:
      ⟨X₁ de X₂, X₂ 's X₁⟩ (inverted) yields ⟨la lectura de ayer, yesterday 's reading⟩
      ⟨X₁ de X₂, X₁ from X₂⟩ (monotonic) yields ⟨la lectura de ayer, reading from yesterday⟩

  27. Inference algorithms
      • Viterbi, O(|E| + |V|): find the maximum-weighted derivation; requires a partial ordering of weights
      • Inside-outside, O(|E| + |V|): compute the marginal (sum) weight of all derivations passing through each edge/node
      • k-best derivations, O(|E| + |D_max| k log k): enumerate the k best derivations in the hypergraph; see the IWPT paper by Huang and Chiang (2005)
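
As an illustration, a sketch of the Viterbi recursion over the Node/Edge classes sketched after slide 24. It assumes the nodes are supplied in topological order (every tail before its head), which is what makes a single pass, and hence the O(|E| + |V|) bound, possible:

```python
from math import prod

def viterbi(nodes):
    """Best derivation weight per node; nodes must be topologically sorted."""
    best = {}
    for node in nodes:
        if not node.incoming:          # an axiom/leaf node
            best[id(node)] = 1.0
        else:                          # max over alternative incoming edges;
            best[id(node)] = max(      # weights multiply along a derivation
                e.weight * prod(best[id(t)] for t in e.tails)
                for e in node.incoming
            )
    return best                        # best[id(goal)] is the Viterbi weight
```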

  28. Things to keep in mind
      Bound on the number of edges: |E| ∈ O(n³|G|³)
      Bound on the number of nodes: |V| ∈ O(n²|G|)

  29. Decoding Again
      • Translation hypergraphs are a "lingua franca" for translation search spaces
      • Note that FST lattices are a special case
      • Decoding problem: how do I build a translation hypergraph?

  30. Representational limits
      Consider this very simple SCFG translation model:
      "Glue" rules:
      S → ⟨S₁ S₂, S₁ S₂⟩
      S → ⟨S₁ S₂, S₂ S₁⟩

  31. Representational limits
      Consider this very simple SCFG translation model for jon-ga ringo-o tabeta:
      "Glue" rules:
      S → ⟨S₁ S₂, S₁ S₂⟩
      S → ⟨S₁ S₂, S₂ S₁⟩
      "Lexical" rules:
      S → ⟨tabeta, ate⟩
      S → ⟨jon-ga, John⟩
      S → ⟨ringo-o, an apple⟩

  32. Representational limits
      • Phrase-based decoding runs in exponential time
      • All permutations of the source are modeled (traveling salesman problem!)
      • Typically distortion limits are used to mitigate this
      • But parsing is polynomial... what's going on?

  33. Representational limits
      Binary SCFGs cannot model this reordering (however, ternary SCFGs can):
      A B C D → B D A C

  34. Representational limits
      Binary SCFGs cannot model this reordering (however, ternary SCFGs can):
      A B C D → B D A C
      But can't we binarize any grammar?

  35. Representational limits
      Binary SCFGs cannot model this reordering (however, ternary SCFGs can):
      A B C D → B D A C
      But can't we binarize any grammar? No. Synchronous CFGs cannot generally be binarized!
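
The claim is easy to sanity-check by brute force: the reorderings a binary SCFG can realize are exactly those built by straight or inverted combination of adjacent spans (ITG reorderings). The following sketch (mine, not from the deck) enumerates them for four symbols and confirms the pattern above is missing:

```python
def itg_permutations(n):
    """All orderings of source positions 0..n-1 reachable by binary
    straight/inverted combination of adjacent spans (ITG)."""
    memo = {}
    def reach(i, j):
        if (i, j) not in memo:
            out = {(i,)} if j - i == 1 else set()
            for k in range(i + 1, j):
                for a in reach(i, k):
                    for b in reach(k, j):
                        out.add(a + b)      # straight (monotonic)
                        out.add(b + a)      # inverted
            memo[i, j] = out
        return memo[i, j]
    return reach(0, n)

perms = itg_permutations(4)
print(len(perms))               # 22 of the 24 permutations of length 4
print((1, 3, 0, 2) in perms)    # False: A B C D -> B D A C is unreachable
```

This also recovers the known count: 22 of the 24 permutations of length four are reachable; only B D A C and its mirror C A D B are not.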

  36. Does this matter?
      • The "forbidden" pattern is observed in real data (Melamed, 2003)
      • Does this matter?
      • Learning: phrasal units and higher-rank grammars can account for the pattern; sentences can be simplified or ignored
      • Translation: the pattern does exist, but how often must it exist (i.e., is there a good translation that doesn't violate the SCFG matching property)?

  37. Tree-to-string
      • How do we generate a hypergraph for a tree-to-string translation model?
      • Simple linear-time (given a fixed translation model) top-down matching algorithm
      • Recursively cover "uncovered" sites in the tree
      • Each node in the input tree becomes a node in the translation forest
      • For details, see Huang et al. (AMTA, 2006) and Huang et al. (EMNLP, 2010)

  38. Tree-to-string grammar
      S(x₁:NP x₂:VP) → x₁ x₂
      VP(x₁:NP x₂:V) → x₂ x₁
      tabeta → ate
      ringo-o → an apple
      jon-ga → John

  39. The input tree for jon-ga ringo-o tabeta:
      (S (NP jon-ga) (VP (NP ringo-o) (V tabeta)))
      The grammar above is matched top-down against this tree.

  40. Matching the rules against the tree:
      S(x₁:NP x₂:VP) → x₁ x₂ fires at the root; VP(x₁:NP x₂:V) → x₂ x₁ fires at the VP node, swapping its children on the target side; the lexical rules translate jon-ga → John, tabeta → ate, ringo-o → an apple.
      Output: John ate an apple
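
A minimal sketch of the top-down matching from slides 37-40, specialized to this toy grammar. The tuple encoding of trees and rules is my own; note the real algorithm records every matching rule at each node to build a translation forest, while this sketch follows the single derivation shown:

```python
TREE_RULES = {
    # (parent label, child labels) -> target-side order of the children
    ("S",  ("NP", "VP")): [0, 1],   # S(x1:NP x2:VP) -> x1 x2
    ("VP", ("NP", "V")):  [1, 0],   # VP(x1:NP x2:V) -> x2 x1
}
LEX = {"tabeta": "ate", "ringo-o": "an apple", "jon-ga": "John"}

def translate(tree):
    """Match rules top-down; trees are (label, children...) tuples."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return LEX[children[0]]                    # preterminal over a word
    order = TREE_RULES[label, tuple(c[0] for c in children)]
    return " ".join(translate(children[i]) for i in order)

src = ("S", ("NP", "jon-ga"), ("VP", ("NP", "ringo-o"), ("V", "tabeta")))
print(translate(src))   # John ate an apple
```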

