Random Generation of Nondeterministic Tree Automata
Thomas Hanneforth (1), Andreas Maletti (2), and Daniel Quernheim (2)
(1) Department of Linguistics, University of Potsdam, Germany
(2) Institute for Natural Language Processing, University of Stuttgart, Germany
maletti@ims.uni-stuttgart.de
Hanoi, Vietnam (TTATT 2013), October 19, 2013

Outline
◮ Motivation
◮ Nondeterministic Tree Automata
◮ Random Generation
◮ Analysis

Tree Substitution Grammar with Latent Variables
Experiment [Shindo et al., ACL 2012 best paper]
F1 score
grammar                              |w| ≤ 40   full
CFG = LTL                            62.7       –
TSG [Post, Gildea, 2009] = xLTL      82.6       –
TSG [Cohn et al., 2010] = xLTL       85.4       84.7
CFGlv [Collins, 1999] = NTA          88.6       88.2
CFGlv [Petrov, Klein, 2007] = NTA    90.6       90.1
CFGlv [Petrov, 2010] = NTA           –          91.8
TSGlv (single) = RTG                 91.6       91.1
TSGlv (multiple) = RTG               92.9       92.4
Discriminative parsers
Carreras et al., 2008                –          91.1
Charniak, Johnson, 2005              92.0       91.4
Huang, 2008                          92.3       91.7

Berkeley Parser
Example parse of "This is a silly sentence" (from http://tomato.banatao.berkeley.edu:8080/parser/parser.html):
(S (NP (DT This)) (VP (VBZ is) (NP (DT a) (JJ silly) (NN sentence))))

Berkeley Parser
Example productions (weight, production)
◮ 0.0035453455987323125 · 10^0    S-1 → ADJP-2 S-1
◮ 2.108608433271444 · 10^−6    S-1 → ADJP-1 S-1
◮ 1.6367163259885093 · 10^−4    S-1 → VP-5 VP-3
◮ 9.724998692152419 · 10^−8    S-2 → VP-5 VP-3
◮ 1.0686659961009547 · 10^−5    S-1 → PP-7 VP-0
◮ 0.012551243773149695 · 10^0    S-9 → “ NP-3
Formalism
Berkeley parser = CFG (local tree grammar) + relabeling (+ weights)
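To make the "CFG + relabeling" reading concrete, here is a minimal Python sketch. It assumes that each latent-annotated production X-i → Y-j Z-k is read as the tree automaton rule X(Y-j, Z-k) → X-i: the annotated nonterminals act as states, and the relabeling (dropping the -i suffix) yields the visible node label. The names `Rule`, `strip_annotation`, and `production_to_rule` are hypothetical; this is not the Berkeley parser's own code, and lexical productions are ignored.

```python
import re
from typing import NamedTuple

class Rule(NamedTuple):
    label: str        # visible node label after relabeling, e.g. "S"
    children: tuple   # child states, i.e. annotated nonterminals
    target: str       # state reached by the rule, e.g. "S-1"
    weight: float

def strip_annotation(symbol: str) -> str:
    """Relabeling: drop a latent-variable suffix such as '-1' ('S-1' -> 'S')."""
    return re.sub(r"-\d+$", "", symbol)

def production_to_rule(weight: float, lhs: str, rhs: list) -> Rule:
    # lhs -> rhs becomes the tree automaton rule strip(lhs)(rhs...) -> lhs
    return Rule(strip_annotation(lhs), tuple(rhs), lhs, weight)

# two of the productions listed on the slide
productions = [
    (0.0035453455987323125, "S-1", ["ADJP-2", "S-1"]),
    (9.724998692152419e-08, "S-2", ["VP-5", "VP-3"]),
]
rules = [production_to_rule(w, lhs, rhs) for w, lhs, rhs in productions]
for r in rules:
    print(f"{r.label}({', '.join(r.children)}) -> {r.target}  [{r.weight:g}]")
```

The design choice here is that the latent annotation lives only in the states, so two rules such as S-1 → VP-5 VP-3 and S-2 → VP-5 VP-3 share the label S but reach different states, which is exactly where the nondeterminism comes from.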

Typical NTA Sizes
◮ English Berkeley parser grammar: 153 MB (1,133 states and 4,267,277 transitions)
◮ English Egret parser grammar: 107 MB
◮ Chinese Egret parser grammar: 98 MB
Egret = Hui Zhang's C++ reimplementation of the Berkeley parser (Java)

Algorithm testing
Observations
◮ even efficient algorithms run slowly on such data
◮ they often require huge amounts of memory
◮ such data is infeasible for inefficient algorithms
◮ realistic, but difficult to use as test data
Testing on random NTA
◮ straightforward to implement
◮ straightforward to scale
◮ but what is the significance of the results?

Tree automaton
Definition (Thatcher and Wright, 1965)
A tree automaton is a tuple $A = (Q, \Sigma, F, R)$ with
◮ a finite set $Q$ of states
◮ a ranked alphabet $\Sigma$ of terminals
◮ a set $F \subseteq Q$ of final states
◮ a finite set $R \subseteq \Sigma(Q) \times Q$ of rules, where $\Sigma(Q) = \{\sigma(q_1, \dots, q_k) \mid k \ge 0,\ \sigma \in \Sigma_k,\ q_1, \dots, q_k \in Q\}$
Remark: Instead of $(\ell, q)$ we write $\ell \to q$.
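The definition translates directly into a small data structure. The following Python sketch is one possible encoding, not the authors' implementation: the class name `NTA` and the representation of a rule σ(q_1, …, q_k) → q as the triple (σ, (q_1, …, q_k), q) are our own assumptions. The constructor checks the side conditions of the definition (F ⊆ Q, ranks respected, only states from Q used).

```python
from dataclasses import dataclass

# A rule sigma(q_1, ..., q_k) -> q is stored as (sigma, (q_1, ..., q_k), q).
@dataclass
class NTA:
    states: frozenset   # Q
    ranks: dict         # ranked alphabet Sigma: symbol -> rank
    final: frozenset    # F, must be a subset of Q
    rules: frozenset    # R, a finite subset of Sigma(Q) x Q

    def __post_init__(self):
        if not self.final <= self.states:
            raise ValueError("F must be a subset of Q")
        for symbol, children, target in self.rules:
            # the number of child states must equal the rank of the symbol
            if self.ranks.get(symbol) != len(children):
                raise ValueError(f"rank mismatch in a rule for {symbol!r}")
            # every state mentioned in a rule must belong to Q
            if not set(children) | {target} <= self.states:
                raise ValueError(f"unknown state in a rule for {symbol!r}")

# Example: accepts the left combs f(...f(f(a, a), a)..., a) over Sigma = {f/2, a/0}
A = NTA(
    states=frozenset({"q0", "q1"}),
    ranks={"f": 2, "a": 0},
    final=frozenset({"q1"}),
    rules=frozenset({
        ("a", (), "q0"),
        ("f", ("q0", "q0"), "q1"),
        ("f", ("q1", "q0"), "q1"),
    }),
)
```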

Regular Tree Grammar
Example
◮ $Q = \{q_0, q_1, q_2, q_3, q_4, q_5, q_6\}$
◮ $\Sigma = \{\mathrm{VP}, \mathrm{S}, \dots\}$
◮ $F = \{q_0\}$
◮ and the following rules: [figure, only partly recoverable: three rules drawn as trees with root labels VP, S, S and target states $q_4$, $q_0$, $q_0$; leaf states in extraction order $q_5, q_1, q_3, q_1, q_4, q_6, q_2$]

Regular Tree Grammar
Definition (Derivation semantics)
Sentential forms: $\xi, \zeta \in T_\Sigma(Q)$
$\xi \Rightarrow_A \zeta$ if there exist a position $w \in \mathrm{pos}(\xi)$ and a rule $\ell \to q \in R$ such that
◮ $\xi = \xi[\ell]_w$
◮ $\zeta = \xi[q]_w$
Definition (Recognized tree language)
$L(A) = \{t \in T_\Sigma \mid \exists f \in F : t \Rightarrow_A^* f\}$
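For rules in $\Sigma(Q) \times Q$, the derivation semantics can be evaluated bottom-up: $t \Rightarrow_A^* q$ holds exactly when $q$ is among the states computed for the root of $t$, so $t \in L(A)$ iff that set meets $F$. A minimal Python sketch, assuming trees are nested tuples (label, subtree_1, …, subtree_k) and rules are (symbol, child states, target) triples; both encodings and the function names are hypothetical. Rules with deeper left-hand sides (general regular tree grammar rules) are not covered.

```python
def reachable_states(tree, rules):
    """All states q such that tree =>*_A q, computed bottom-up.

    tree: nested tuple (label, subtree_1, ..., subtree_k); a leaf is (label,).
    rules: a collection of (symbol, child_states, target) triples.
    """
    label, *subtrees = tree
    child_sets = [reachable_states(s, rules) for s in subtrees]
    return {
        target
        for symbol, children, target in rules
        if symbol == label
        and len(children) == len(child_sets)
        and all(c in cs for c, cs in zip(children, child_sets))
    }

def accepts(tree, rules, final):
    """t is in L(A) iff some final state is reachable from t."""
    return not reachable_states(tree, rules).isdisjoint(final)

rules = {("a", (), "q0"), ("f", ("q0", "q0"), "q1"), ("f", ("q1", "q0"), "q1")}
t = ("f", ("f", ("a",), ("a",)), ("a",))
print(accepts(t, rules, {"q1"}))  # True
```

Computing the full set of states per node (rather than guessing one) is what handles the nondeterminism; it is the run-time analogue of the subset construction.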

Previous Approaches
Héam et al. 2009
◮ for deterministic tree-walking automata (and deterministic top-down tree automata)
◮ focus on generating automata uniformly at random (for estimating average-case complexity)
◮ generator used to evaluate the conversion from deterministic TWA to NTA

Previous Approaches
Hugot et al. 2010
◮ for tree automata with global equality constraints
◮ focus on avoiding trivial cases (removal of unreachable states, minimum height requirement)
◮ generator used to evaluate an emptiness checker
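Hugot et al.'s checker targets automata with global equality constraints, which the following does not handle. For plain NTA, the classical fixpoint below computes, for every state, the minimal height of a tree reaching it: states that never receive a height are reached by no tree and can be removed, the language is empty iff no final state receives a height, and the minimum over the final states is what a minimum height requirement would test. This is a sketch under our own rule encoding, (symbol, child states, target) triples; `min_heights` and `is_empty` are hypothetical names.

```python
def min_heights(rules):
    """Minimal height of a tree reaching each state (leaves have height 0).

    rules: iterable of (symbol, child_states, target). States missing from
    the result are reached by no tree at all and can be removed.
    """
    rules = list(rules)
    height = {}
    changed = True
    while changed:  # relax until no height can be lowered any further
        changed = False
        for _symbol, children, target in rules:
            if all(c in height for c in children):
                h = 1 + max((height[c] for c in children), default=-1)
                if h < height.get(target, float("inf")):
                    height[target] = h
                    changed = True
    return height

def is_empty(rules, final):
    """L(A) is empty iff no final state is reached by some tree."""
    heights = min_heights(rules)
    return not any(q in heights for q in final)

example = [("a", (), "q0"), ("f", ("q0", "q0"), "q1"),
           ("f", ("q1", "q0"), "q1"), ("g", ("q2",), "q2")]
print(min_heights(example))  # {'q0': 0, 'q1': 1}
```

The loop terminates because every update strictly lowers a non-negative integer height; here q2 never gets a height, since its only rule needs a tree that already reaches q2.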

Our Approach
Goals
◮ randomly generate non-trivial NTA
◮ generator (potentially) usable for all NTA algorithms
When is an NTA non-trivial?
◮ large number of states
◮ large number of rules
◮ its language contains large trees
◮ its language has many Myhill-Nerode congruence classes, so the canonical NTA has many states (canonical NTA = equivalent minimal deterministic NTA)
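A rough way to gauge the last criterion is the bottom-up subset construction: the number of reachable sets of states is an upper bound on the number of states of the canonical NTA, which a subsequent minimization (not shown) would determine exactly. The following Python sketch assumes rules are (symbol, child states, target) triples and that each symbol occurs with a single rank; the function name is hypothetical, and the construction is exponential in the worst case, so it is only meant for small automata.

```python
from itertools import product

def reachable_subsets(rules):
    """Bottom-up subset construction, restricted to reachable macro-states.

    Returns a set of frozensets of NTA states; the empty frozenset
    (if present) plays the role of a sink state.
    """
    rules = list(rules)
    ranks = {symbol: len(children) for symbol, children, _ in rules}
    reached = set()
    changed = True
    while changed:
        changed = False
        for symbol, rank in ranks.items():
            # try every combination of already reached macro-states as children
            for args in product(list(reached), repeat=rank):
                target = frozenset(
                    t for s, children, t in rules
                    if s == symbol
                    and all(c in a for c, a in zip(children, args))
                )
                if target not in reached:
                    reached.add(target)
                    changed = True
    return reached

example = [("a", (), "q0"), ("f", ("q0", "q0"), "q1"), ("f", ("q1", "q0"), "q1")]
print(len(reachable_subsets(example)))  # 3
```

The three macro-states here are {q0}, {q1}, and the empty set (sink).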

Our Approach
Restrictions
◮ binary trees (every regular tree language can be encoded this way with linear overhead)
◮ each state is final with probability 0.5
◮ uniform probability for binary and nullary rules
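The following is one possible reading of these restrictions as code, not the authors' generator: each state is final independently with probability 0.5, every binary rule σ(p, q) → r is kept with one common probability, and every nullary rule α → r with another. The function name `random_nta`, the parameters `binary_density` and `nullary_density`, and the state names q0, q1, … are hypothetical. The sketch enumerates all |Σ_2| · n³ candidate binary rules, which is fine for small n; large n would call for sampling rules directly instead.

```python
import random
from itertools import product

def random_nta(n_states, binary_symbols, nullary_symbols,
               binary_density, nullary_density, seed=None):
    """Hypothetical random generator for binary NTA.

    Returns (states, final, rules) with rules encoded as
    (symbol, child_states, target_state).
    """
    rng = random.Random(seed)
    states = [f"q{i}" for i in range(n_states)]
    # each state is final independently with probability 0.5
    final = {q for q in states if rng.random() < 0.5}
    rules = []
    # every nullary rule alpha -> r is kept with the same probability
    for alpha, r in product(nullary_symbols, states):
        if rng.random() < nullary_density:
            rules.append((alpha, (), r))
    # every binary rule sigma(p, q) -> r is kept with the same probability
    for sigma, p, q, r in product(binary_symbols, states, states, states):
        if rng.random() < binary_density:
            rules.append((sigma, (p, q), r))
    return states, final, rules

states, final, rules = random_nta(4, ["f"], ["a", "b"], 0.05, 0.5, seed=1)
```

Passing a seed makes a generated test instance reproducible, which matters when a random NTA exposes a bug.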
