
Compiling Comp Ling: Practical weighted dynamic programming and the Dyna language



  1. Compiling Comp Ling: Practical weighted dynamic programming and the Dyna language
     Jason Eisner, Eric Goldlust, Noah A. Smith
     HLT-EMNLP, October 2005

     An Anecdote from ACL'05
     "Just draw a model that actually makes sense for your problem. Just do Gibbs sampling. Um, it's only 6 lines in Matlab…" (Michael Jordan)

     Conclusions to draw from that talk
     1. Mike & his students are great.
     2. Graphical models are great (because they're flexible).
     3. Gibbs sampling is great (because it works with nearly any graphical model).
     4. Matlab is great (because it frees up Mike and his students to doodle all day and then execute their doodles).

     Parts of it already are …
     Toolkits available; you don't have to be an expert:
     • Language modeling
     • Binary classification (e.g., SVMs)
     • Finite-state transductions
     • Linear-chain graphical models

     But other parts aren't …
     Efficient parsers and MT systems are complicated and painful to write:
     • Context-free and beyond
     • Machine translation

  2. This talk: A toolkit that's general enough for these cases
     (stretches from finite-state to Turing machines): "Dyna"

     But other parts aren't …
     Efficient parsers and MT systems are complicated and painful to write:
     • Context-free and beyond
     • Machine translation

     Warning
     • This talk is only an advertisement!
     • For more details, please
       see the paper
       see http://dyna.org (download + documentation)
       sign up for updates by email

     How you build a system ("big picture" slide)
     cool model (PCFG)
     → practical equations:
         $\beta_x(i,k) = \sum_{0 \le i < j < k \le n} p(N^x \to N^y N^z \mid N^x)\, \beta_y(i,j)\, \beta_z(j,k) \;\ldots$
     → pseudocode (execution order):
         for width from 2 to n
           for i from 0 to n-width
             k = i+width
             for j from i+1 to k-1
               …
     → tuned C++ implementation (data structures, etc.)

     Wait a minute …
     Didn't I just implement something like this last month?
     • chart management / indexing
     • cache-conscious data structures
     • prioritize partial solutions (best-first, pruning)
     • parameter management
     • inside-outside formulas
     • different algorithms for training and decoding
     • conjugate gradient, annealing, ...
     • parallelization?
     We thought computers were supposed to automate drudgery.

     How you build a system ("big picture" slide)
     Dyna language specifies these equations.
     Most programs just need to compute some values from other values. Any order is ok.
     Some programs also need to update the outputs if the inputs change:
     • spreadsheets, makefiles, email readers
     • dynamic graph algorithms
     • EM and other iterative optimization
     • leave-one-out training of smoothing params
     Compilation strategies (we'll come back to this)
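The execution order in the "big picture" pseudocode can be made concrete with a small Python sketch of the inside recurrence. This is only an illustration under assumptions, not Dyna and not the C++ that Dyna generates: the function name `inside` and the dictionary encodings `lexical` (for rules x → word) and `binary` (for rules x → y z) are hypothetical choices made for this example.

```python
from collections import defaultdict


def inside(words, lexical, binary, root="s"):
    """Total weight of all parses of `words` under a binarized weighted grammar.

    lexical: dict mapping (x, word) -> weight of the rule x -> word
    binary:  dict mapping (x, y, z) -> weight of the rule x -> y z
    (Both encodings are assumptions made for this sketch.)
    """
    n = len(words)
    # beta[x, i, k] is the inside weight of nonterminal x over words[i:k].
    beta = defaultdict(float)

    # Width-1 spans come straight from the lexical rules.
    for i, w in enumerate(words):
        for (x, word), p in lexical.items():
            if word == w:
                beta[x, i, i + 1] += p

    # Wider spans, visited in the order given by the slide's pseudocode.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            k = i + width
            for j in range(i + 1, k):
                for (x, y, z), p in binary.items():
                    beta[x, i, k] += p * beta[y, i, j] * beta[z, j, k]

    return beta[root, 0, n]


# Toy grammar: s -> s s (0.5), s -> "a" (0.5).
lexical = {("s", "a"): 0.5}
binary = {("s", "s", "s"): 0.5}
total = inside(["a", "a", "a"], lexical, binary)
```

With three words the toy grammar has two bracketings, so `total` adds the weights of both. Everything the "Wait a minute …" slide lists (indexing, pruning, parameter management) is missing here, which is the drudgery the talk proposes to automate.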

  3. Writing equations in Dyna
     • int a.
     • a = b * c.
       a will be kept up to date if b or c changes.
     • b += x.
       b += y.
       equivalent to b = x+y. b is a sum of two variables. Also kept up to date.
     • c += z(1).
       c += z(2).
       c += z(3).
       c += z("four").
       c += z(foo(bar,5)).
       c += z(N).
       a "pattern": the capitalized N matches anything.
       c is a sum of all nonzero z(…) values. At compile time, we don't know how many!

     More interesting use of patterns
     • a = b * c.
       scalar multiplication
     • a(I) = b(I) * c(I).
       pointwise multiplication
     • a += b(I) * c(I).
       means $a = \sum_I b(I) \cdot c(I)$
       dot product; could be sparse
       sparse dot product of query & document: ... + b("yetis")*c("yetis") + b("zebra")*c("zebra")
     • a(I,K) += b(I,J) * c(J,K).
       means $a(I,K) = \sum_J b(I,J) \cdot c(J,K)$
       matrix multiplication; could be sparse
       J is free on the right-hand side, so we sum over it

     Dyna vs. Prolog
     By now you may see what we're up to!
     Prolog has Horn clauses:
         a(I,K) :- b(I,J) , c(J,K).
     Dyna has "Horn equations":
         a(I,K) += b(I,J) * c(J,K).
     The left-hand side has a value (e.g., a real number); the right-hand side is its definition from other values.
     Like Prolog:
     • Allow nested terms
     • Syntactic sugar for lists, etc.
     • Turing-complete
     Unlike Prolog:
     • Charts, not backtracking!
     • Compile → efficient C++ classes
     • Integrates with your C++ code

     The CKY inside algorithm in Dyna
         :- double item = 0.
         :- bool length = false.
         constit(X,I,J) += word(W,I,J) * rewrite(X,W).
         constit(X,I,J) += constit(Y,I,Mid) * constit(Z,Mid,J) * rewrite(X,Y,Z).
         goal += constit("s",0,N) if length(N).

         using namespace cky;
         chart c;
         // put in axioms (values not defined by the above program)
         c[rewrite("s","np","vp")] = 0.7;
         c[word("Pierre",0,1)] = 1;
         c[length(30)] = true;  // 30-word sentence
         cin >> c;              // get more axioms from stdin
         cout << c[goal];       // print total weight of all parses (the theorem pops out)

     visual debugger: browse the proof forest
     [Figure: proof forest in the visual debugger, with labels "ambiguity" and "shared substructure"]

     Related algorithms in Dyna?
         constit(X,I,J) += word(W,I,J) * rewrite(X,W).
         constit(X,I,J) += constit(Y,I,Mid) * constit(Z,Mid,J) * rewrite(X,Y,Z).
         goal += constit("s",0,N) if length(N).
     • Viterbi parsing?
     • Logarithmic domain?
     • Lattice parsing?
     • Earley's algorithm?
     • Binarized CKY?
     • Incremental (left-to-right) parsing?
     • Log-linear parsing?
     • Lexicalized or synchronous parsing?
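To make the pattern semantics concrete, here is a small Python sketch of what `a += b(I) * c(I).` and `a(I,K) += b(I,J) * c(J,K).` compute when the items are stored sparsely. It is an illustration only, not how Dyna is implemented; the function names `dot` and `matmul` and the dict-based storage are assumptions made for this example.

```python
from collections import defaultdict


def dot(b, c):
    """a += b(I) * c(I): sum over every I that has a value in both b and c."""
    # Iterate over the smaller sparse vector and look up matches in the larger.
    small, large = (b, c) if len(b) <= len(c) else (c, b)
    return sum(v * large[i] for i, v in small.items() if i in large)


def matmul(b, c):
    """a(I,K) += b(I,J) * c(J,K): J is free on the right, so it is summed out."""
    # Index c by its first argument so each b(I,J) only meets matching c(J,K).
    c_by_j = defaultdict(list)
    for (j, k), v in c.items():
        c_by_j[j].append((k, v))

    a = defaultdict(float)
    for (i, j), v in b.items():
        for k, w in c_by_j.get(j, ()):
            a[i, k] += v * w
    return dict(a)


# Sparse dot product of a query and a document, keyed by word.
query = {"yetis": 2.0, "zebra": 3.0, "the": 1.0}
document = {"yetis": 0.5, "zebra": 2.0}
score = dot(query, document)
```

Only entries that actually exist are touched, which is the sense in which the slide says the dot product and matrix product "could be sparse". Unlike Dyna, this sketch recomputes from scratch and does not keep the result up to date when an input changes.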

  4. Related algorithms in Dyna?
     • Viterbi parsing?
     • Logarithmic domain?
     • Lattice parsing?
     • Earley's algorithm?
     • Binarized CKY?
     • Incremental (left-to-right) parsing?
     • Log-linear parsing?
     • Lexicalized or synchronous parsing?

     Viterbi parsing: replace += with max=
         constit(X,I,J) max= word(W,I,J) * rewrite(X,W).
         constit(X,I,J) max= constit(Y,I,Mid) * constit(Z,Mid,J) * rewrite(X,Y,Z).
         goal max= constit("s",0,N) if length(N).

     Logarithmic domain: replace += with log+= (or max=) and * with +
         constit(X,I,J) log+= word(W,I,J) + rewrite(X,W).
         constit(X,I,J) log+= constit(Y,I,Mid) + constit(Z,Mid,J) + rewrite(X,Y,Z).
         goal log+= constit("s",0,N) if length(N).

     Lattice parsing: the program is unchanged; only the axioms change
         constit(X,I,J) += word(W,I,J) * rewrite(X,W).
         constit(X,I,J) += constit(Y,I,Mid) * constit(Z,Mid,J) * rewrite(X,Y,Z).
         goal += constit("s",0,N) if length(N).
     c[ word("Pierre", 0, 1) ] = 1 becomes c[ word("Pierre", state(5), state(9)) ] = 0.2
     [Figure: word lattice over states 5, 8, 9 with arcs Pierre/0.2, P/0.5, air/0.3]

     Earley's algorithm in Dyna
         constit(X,I,J) += word(W,I,J) * rewrite(X,W).
         constit(X,I,J) += constit(Y,I,Mid) * constit(Z,Mid,J) * rewrite(X,Y,Z).
         goal += constit("s",0,N) if length(N).
     magic templates transformation (as noted by Minnen 1996)
         need("s",0) = true.
         need(Nonterm,J) |= ?constit(_/[Nonterm|_],_,J).
         constit(Nonterm/Needed,I,I) += rewrite(Nonterm,Needed) if need(Nonterm,I).
         constit(Nonterm/Needed,I,K) += constit(Nonterm/[W|Needed],I,J) * word(W,J,K).
         constit(Nonterm/Needed,I,K) += constit(Nonterm/[X|Needed],I,J) * constit(X/[],J,K).
         goal += constit("s"/[],0,N) if length(N).

     Program transformations
     [Figure: the "big picture" pipeline: cool model (PCFG) → practical equations → pseudocode (execution order) → tuned C++ implementation (data structures, etc.)]
     Lots of equivalent ways to write a system of equations!
     Transforming from one to another may improve efficiency.
     (Or, transform to related equations that compute gradients, upper bounds, etc.)
     Many parsing "tricks" can be generalized into automatic transformations that help other programs, too!
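The += / max= / log+= variants differ only in which aggregation operator and which combining operator are used. The Python sketch below illustrates that idea by passing both operators as arguments to one CKY loop. It is a hedged illustration, not Dyna's implementation: the names `cky` and `log_add`, the `plus`/`times`/`zero` parameters, and the dict-based grammar encoding are all assumptions made for this example.

```python
import math
import operator


def log_add(a, b):
    """log(exp(a) + exp(b)), the aggregator that log+= stands for in this sketch."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def cky(words, lexical, binary, plus, times, zero, root="s"):
    """CKY where `plus` is the aggregator and `times` combines antecedents."""
    n = len(words)
    chart = {}

    def update(key, value):
        chart[key] = plus(chart.get(key, zero), value)

    for i, w in enumerate(words):
        for (x, word), p in lexical.items():
            if word == w:
                update((x, i, i + 1), p)
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            k = i + width
            for j in range(i + 1, k):
                for (x, y, z), p in binary.items():
                    if (y, i, j) in chart and (z, j, k) in chart:
                        both = times(chart[y, i, j], chart[z, j, k])
                        update((x, i, k), times(both, p))
    return chart.get((root, 0, n), zero)


# Toy grammar: s -> s s (0.5), s -> "a" (0.5).
lexical = {("s", "a"): 0.5}
binary = {("s", "s", "s"): 0.5}
sentence = ["a", "a", "a"]

# += with *: total weight of all parses.
inside_total = cky(sentence, lexical, binary, operator.add, operator.mul, 0.0)
# max= with *: weight of the single best parse (Viterbi).
viterbi_best = cky(sentence, lexical, binary, max, operator.mul, 0.0)
# log+= with +: the same total, computed on log weights.
log_lexical = {rule: math.log(p) for rule, p in lexical.items()}
log_binary = {rule: math.log(p) for rule, p in binary.items()}
log_total = cky(sentence, log_lexical, log_binary, log_add, operator.add, -math.inf)
```

The loop body never changes across the three calls, which mirrors how the Dyna program changes only its operators. Lattice parsing and the Earley transformation are not covered by this sketch.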
