Random Generation of Nondeterministic Tree Automata

  1. Random Generation of Nondeterministic Tree Automata
     Thomas Hanneforth¹, Andreas Maletti², and Daniel Quernheim²
     ¹ Department of Linguistics, University of Potsdam, Germany
     ² Institute for Natural Language Processing, University of Stuttgart, Germany
     maletti@ims.uni-stuttgart.de
     Hanoi, Vietnam (TTATT 2013)
     A. Maletti, Random Generation of NTA, October 19, 2013

  2. Outline
     ◮ Motivation
     ◮ Nondeterministic Tree Automata
     ◮ Random Generation
     ◮ Analysis

  3. Tree Substitution Grammar with Latent Variables
     Experiment [Shindo et al., ACL 2012 best paper]

     grammar                              F1 score (|w| ≤ 40 / full)
     CFG = LTL                            62.7
     TSG [Post, Gildea, 2009] = xLTL      82.6
     TSG [Cohn et al., 2010] = xLTL       85.4 / 84.7
     CFGlv [Collins, 1999] = NTA          88.6 / 88.2
     CFGlv [Petrov, Klein, 2007] = NTA    90.6 / 90.1
     CFGlv [Petrov, 2010] = NTA           91.8
     TSGlv (single) = RTG                 91.6 / 91.1
     TSGlv (multiple) = RTG               92.9 / 92.4
     Discriminative Parsers
     Carreras et al., 2008                91.1
     Charniak, Johnson, 2005              92.0 / 91.4
     Huang, 2008                          92.3 / 91.7

  5. Berkeley Parser
     Example parse
     (S (NP (DT This)) (VP (VBZ is) (NP (DT a) (JJ silly) (NN sentence))))
     from http://tomato.banatao.berkeley.edu:8080/parser/parser.html

  6. Berkeley Parser
     Example productions
     0.0035453455987323125 · 10^0     S-1 → ADJP-2 S-1
     2.108608433271444 · 10^−6        S-1 → ADJP-1 S-1
     1.6367163259885093 · 10^−4       S-1 → VP-5 VP-3
     9.724998692152419 · 10^−8        S-2 → VP-5 VP-3
     1.0686659961009547 · 10^−5       S-1 → PP-7 VP-0
     0.012551243773149695 · 10^0      S-9 → “ NP-3
     Formalism
     Berkeley parser = CFG (local tree grammar) + relabeling (+ weights)
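
The "CFG + relabeling" reading can be illustrated with a small Python sketch. It assumes, as the production strings on this slide suggest, that a refined category is written as `BASE-k` with a numeric index `k`. The helper names `relabel` and `project` are hypothetical and are not part of the Berkeley parser.

```python
def relabel(symbol: str) -> str:
    """Map a refined category such as 'S-1' to its base category 'S'.

    Assumption: the refinement index follows the last '-' and is numeric.
    Symbols without such a suffix are returned unchanged.
    """
    base, sep, index = symbol.rpartition("-")
    return base if sep and index.isdigit() else symbol


def project(production: str) -> str:
    """Apply the relabeling to every symbol of a production 'A → B C'."""
    lhs, rhs = production.split("→")
    new_rhs = " ".join(relabel(s) for s in rhs.split())
    return f"{relabel(lhs.strip())} → {new_rhs}"


refined = ["S-1 → ADJP-2 S-1", "S-1 → ADJP-1 S-1", "S-2 → VP-5 VP-3"]
projected = [project(p) for p in refined]
```

Several refined productions collapse onto the same plain CFG production, which is why the refined grammar is best viewed as a tree automaton whose states are the refined categories.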

  7. Typical NTA Sizes
     ◮ English Berkeley parser grammar: 153 MB (1,133 states and 4,267,277 transitions)
     ◮ English Egret parser grammar: 107 MB
     ◮ Chinese Egret parser grammar: 98 MB
     Egret = Hui Zhang's C++ reimplementation of the Berkeley parser (Java)

  11. Algorithm testing
      Observations
      ◮ even efficient algorithms run slow on such data
      ◮ often require huge amounts of memory
      ◮ impossible for inefficient algorithms
      ◮ realistic, but difficult to use as test data
      Testing on random NTA
      ◮ straightforward to implement
      ◮ straightforward to scale
      ◮ but what is the significance of the results?

  13. Tree automaton
      Definition (Thatcher and Wright, 1965)
      A tree automaton is a tuple $A = (Q, \Sigma, F, R)$ with
      ◮ an alphabet $Q$ of states
      ◮ a ranked alphabet $\Sigma$ of terminals
      ◮ a set $F \subseteq Q$ of final states
      ◮ a finite set $R \subseteq \Sigma(Q) \times Q$ of rules
      Remark
      Instead of $(\ell, q)$ we write $\ell \to q$

  14. Regular Tree Grammar
      Example
      ◮ $Q = \{q_0, q_1, q_2, q_3, q_4, q_5, q_6\}$
      ◮ $\Sigma = \{\mathrm{VP}, \mathrm{S}, \dots\}$
      ◮ $F = \{q_0\}$
      ◮ and three rules whose left-hand sides are rooted in VP, S, and S and whose right-hand sides are $q_4$, $q_0$, and $q_0$, respectively [rule diagrams; child states in extraction order: $q_5, q_1, q_3, q_1, q_4, q_6, q_2$]

  16. Regular Tree Grammar
      Definition (Derivation semantics)
      Sentential forms: $\xi, \zeta \in T_\Sigma(Q)$
      $\xi \Rightarrow_A \zeta$ if there exist a position $w \in \mathrm{pos}(\xi)$ and a rule $\ell \to q \in R$ such that
      ◮ $\xi = \xi[\ell]_w$
      ◮ $\zeta = \xi[q]_w$
      Definition (Recognized tree language)
      $L(A) = \{ t \in T_\Sigma \mid \exists f \in F \colon t \Rightarrow_A^* f \}$
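
Membership in the recognized tree language can be checked bottom-up: compute, for every subtree, the set of states it can be rewritten to, and test whether the set for the whole tree contains a final state. The following Python sketch illustrates this under assumed encodings that do not come from the slides: a tree is a nested tuple `(symbol, child_1, ..., child_k)`, and the rules are a dictionary from `(symbol, (q_1, ..., q_k))` to a set of target states. The toy automaton is hypothetical.

```python
from itertools import product


def states_of(tree, rules):
    """Return all states q with tree ⇒* q, computed bottom-up."""
    symbol, *children = tree
    child_states = [states_of(child, rules) for child in children]
    result = set()
    # Try every combination of states reachable at the children.
    for combo in product(*child_states):
        result |= rules.get((symbol, combo), set())
    return result


def accepts(tree, rules, final):
    """t is in L(A) iff t can be rewritten to some final state."""
    return bool(states_of(tree, rules) & final)


# Hypothetical toy automaton (not the example from the slides).
rules = {
    ("a", ()): {"q1"},
    ("b", ()): {"q1", "q2"},
    ("S", ("q1", "q2")): {"q0"},
}
final = {"q0"}

tree_ok = ("S", ("a",), ("b",))
tree_bad = ("S", ("b",), ("a",))
```

Here `accepts(tree_ok, rules, final)` holds because `b` can be rewritten to `q2`, whereas `tree_bad` is rejected because `a` only reaches `q1`.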

  20. Previous Approaches
      Héam et al. 2009
      ◮ for deterministic tree-walking automata (and deterministic top-down tree automata)
      ◮ focus on generating automata uniformly at random (for estimating average-case complexity)
      ◮ generator used for evaluation of conversion from det. TWA to NTA

  23. Previous Approaches
      Hugot et al. 2010
      ◮ for tree automata with global equality constraints
      ◮ focus on avoiding trivial cases (removal of unreachable states, minimum height requirement)
      ◮ generator used for evaluation of emptiness checker
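
For plain NTA (without equality constraints), removing unreachable states and checking emptiness both reduce to one least-fixpoint computation. The Python sketch below shows one common reading of "reachable" (some tree can be rewritten to the state); it is not taken from Hugot et al., and the rule encoding `(symbol, (q_1, ..., q_k)) -> set of states` and the function names are assumptions made for illustration.

```python
def reachable_states(rules):
    """States q such that some tree over the alphabet rewrites to q."""
    reachable = set()
    changed = True
    while changed:
        changed = False
        for (_symbol, child_states), targets in rules.items():
            # A rule fires once all of its child states are known to be reachable.
            if all(q in reachable for q in child_states) and not targets <= reachable:
                reachable |= targets
                changed = True
    return reachable


def trim(rules, final):
    """Drop rules that mention unreachable states; report whether L(A) is empty."""
    live = reachable_states(rules)
    kept = {lhs: targets for lhs, targets in rules.items()
            if all(q in live for q in lhs[1])}
    return kept, not (live & final)


# Hypothetical toy rules: q2 is never produced, so the last rule is useless.
rules = {
    ("a", ()): {"q1"},
    ("f", ("q1", "q1")): {"q0"},
    ("f", ("q2", "q1")): {"q3"},
}
kept, is_empty = trim(rules, {"q0"})
```

A random generator can run such a check after sampling and reject or repair automata whose language is empty.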

  29. Our Approach
      Goals
      ◮ randomly generate non-trivial NTA
      ◮ generator (potentially) usable for all NTA algorithms
      When is an NTA non-trivial?
      ◮ large number of states
      ◮ large number of rules
      ◮ its language contains large trees
      ◮ its language has many Myhill-Nerode congruence classes
        → canonical NTA has many states
        (canonical NTA = equivalent minimal deterministic NTA)
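
One way to approach the last criterion is to determinize the NTA and count the resulting states. The Python sketch below performs a bottom-up subset construction and returns the reachable non-empty subsets; their number is only an upper bound on the non-sink states of the canonical NTA, since the minimization step is not shown. The rule encoding `(symbol, (q_1, ..., q_k)) -> set of states` and the name `determinize` are assumptions, not the authors' implementation.

```python
from itertools import product


def determinize(rules):
    """Bottom-up subset construction; returns (subset-states, deterministic rules)."""
    symbols = {(symbol, len(children)) for symbol, children in rules}
    subsets, det_rules, seen = set(), {}, set()
    changed = True
    while changed:
        changed = False
        for symbol, arity in symbols:
            # Snapshot the current subsets; new ones are handled in the next round.
            for combo in product(tuple(subsets), repeat=arity):
                if (symbol, combo) in seen:
                    continue
                seen.add((symbol, combo))
                changed = True
                target = frozenset(
                    q
                    for children in product(*combo)
                    for q in rules.get((symbol, children), ())
                )
                if target:  # the empty set would be the sink state; it is omitted
                    det_rules[(symbol, combo)] = target
                    subsets.add(target)
    return subsets, det_rules


# Hypothetical toy NTA with a nullary symbol a and a binary symbol f.
rules = {
    ("a", ()): {"p", "q"},
    ("f", ("p", "p")): {"p"},
    ("f", ("q", "q")): {"r"},
}
subsets, det_rules = determinize(rules)
```

The construction can produce exponentially many subsets in the worst case, so it is only practical as a diagnostic for small generated automata.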

  32. Our Approach
      Restrictions
      ◮ binary trees (every RTL can be encoded this way with linear overhead)
      ◮ each state is final with probability .5
      ◮ uniform probability for binary/nullary rules
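
A generator respecting these restrictions might look like the following Python sketch. It is not the authors' generator: the slides state only the restrictions, so the sampling scheme is an assumption. Here every candidate rule (nullary or binary) is included independently with the same probability `density`. The function name `random_nta` and all parameter names are hypothetical.

```python
import random
from itertools import product


def random_nta(n_states, nullary, binary, density, seed=None):
    """Sample an NTA over nullary and binary symbols.

    Assumption: each candidate rule lhs → q is kept independently with
    probability `density`; each state is final with probability 0.5.
    """
    rng = random.Random(seed)
    states = [f"q{i}" for i in range(n_states)]
    final = {q for q in states if rng.random() < 0.5}

    # All possible left-hand sides: a() and f(q_i, q_j).
    lhs_nullary = [(a, ()) for a in nullary]
    lhs_binary = [(f, pair) for f in binary for pair in product(states, repeat=2)]

    rules = {}
    for lhs in lhs_nullary + lhs_binary:
        for q in states:
            if rng.random() < density:
                rules.setdefault(lhs, set()).add(q)
    return states, final, rules


states, final, rules = random_nta(3, ["a"], ["f"], density=1.0, seed=0)
```

With `density=1.0` every candidate rule is present, so the example yields 1 + 3² = 10 left-hand sides. Smaller densities give sparser automata, and a fixed `seed` makes a test run reproducible.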
