Estimating Strictly Piecewise Distributions

ACL 2010: Heinz and Rogers

Slide 1: Estimating Strictly Piecewise Distributions

Jeffery Heinz
Dept. of Linguistics and Cognitive Science
University of Delaware
heinz@udel.edu

James Rogers
Dept. of Computer Science
Earlham College
jrogers@cs.earlham.edu

http://cs.earlham.edu/~jrogers/slides/acl2010talk.ho.pdf

Slide 2: Regular Models of Long-Distance Dependencies

“. . . we wish to escape the linear tyranny of these n-gram models and HMM tagging models, and to start to explore more complex notions of grammar.” (Manning and Schütze, 1999)

Samala (Chumash): [+anterior] (e.g., [s], [t͡s]) do not occur after [-anterior] (e.g., [ʃ], [t͡ʃ]).

[ʃtojonowonowaʃ] ‘it stood upright’
*[ʃtojonowonowas]

Forbidden pattern:

$$\Sigma^* \cdot (\text{[ʃ]} + \text{[t͡ʃ]}) \cdot \Sigma^* \cdot (\text{[s]} + \text{[t͡s]}) \cdot \Sigma^*$$

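Slide 3: n-gram Models of Language

[Figure: automaton for a bigram model over Σ = {a, b, c} with boundary symbol ♯; transitions are labeled with probabilities.]

$$\Pr_L(\sigma_1 \cdots \sigma_n) = \Pr_L(\sigma_1 \mid \sharp) \cdot \prod_{1 < i \le n} \left[ \Pr_L(\sigma_i \mid \sigma_{i-1}) \right] \cdot \Pr_L(\sharp \mid \sigma_n)$$

$$F_k(w) \stackrel{\text{def}}{=} \{ v \in \Sigma^k \mid w \in \Sigma^* \cdot v \cdot \Sigma^* \}$$

$$F^M_k(w) \stackrel{\text{def}}{=} \{\!\{ v \in \Sigma^k \mid w \in \Sigma^* \cdot v \cdot \Sigma^* \}\!\}$$

$$\Pr_L(w) = \prod_{v \cdot \sigma \in F^M_k(\sharp \cdot w \cdot \sharp)} \left[ \Pr_L(\sigma \mid v) \right]$$

Slide 4: Strictly k-Local Languages (SL_k)

[Figure: k-scanner automaton over Σ = {a, b, c} with boundary symbol ♯; transitions are unweighted.]

$$T_M \stackrel{\text{def}}{=} \{ v\sigma \in F_k(\sharp \cdot \Sigma^* \cdot \sharp) \mid \delta(v, \sigma)\!\downarrow \}$$

$$L(M) = \{ w \in \Sigma^* \mid F_k(w) \subseteq T_M \}$$

$$L \in \mathrm{SL}_k \stackrel{\text{def}}{\iff} L \text{ is } L(M) \text{ for some } k\text{-scanner } M$$

$$L \in \mathrm{SL} \stackrel{\text{def}}{\iff} (\exists k)[L \in \mathrm{SL}_k]$$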
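The definitions on Slides 3 and 4 can be sketched in Python as follows. This is only an illustration: the function names, the `#` character standing in for ♯, and the toy probabilities are assumptions of this example, not values from the talk. The membership test also pads the word with boundary symbols before taking k-factors, which is an assumption about how the k-scanner is applied.

```python
def k_factors(w, k):
    """F_k(w): the set of contiguous length-k substrings of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def bigram_probability(w, cond_prob, boundary="#"):
    """Multiply Pr(cur | prev) over adjacent pairs of boundary + w + boundary.

    cond_prob maps (prev, cur) pairs to probabilities; missing pairs count as 0.
    """
    padded = boundary + w + boundary
    p = 1.0
    for prev, cur in zip(padded, padded[1:]):
        p *= cond_prob.get((prev, cur), 0.0)
    return p


def sl_accepts(w, permitted, k, boundary="#"):
    """True if every k-factor of the padded word is in the permitted set."""
    return k_factors(boundary + w + boundary, k) <= permitted


# Hypothetical toy model (not the probabilities shown on the slide).
toy = {("#", "a"): 0.5, ("a", "b"): 0.4, ("b", "#"): 0.5}
permitted = {"#a", "ab", "b#"}

print(k_factors("abab", 2))
print(bigram_probability("ab", toy))
print(sl_accepts("ab", permitted, 2), sl_accepts("ba", permitted, 2))
```

Because each probability depends only on the immediately preceding symbol, a model of this kind has no way to relate symbols that are far apart, which is the limitation the rest of the talk addresses.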

Slide 5: Subsequences

v is a subsequence of w:

$$v \sqsubseteq w \stackrel{\text{def}}{\iff} v = \sigma_1 \cdots \sigma_k \text{ and } w \in \Sigma^* \cdot \sigma_1 \cdot \Sigma^* \cdots \Sigma^* \cdot \sigma_k \cdot \Sigma^*$$

$$P_k(w) \stackrel{\text{def}}{=} \{ v \in \Sigma^k \mid v \sqsubseteq w \}$$

$$P_{\le k}(w) \stackrel{\text{def}}{=} \bigcup_{0 < i \le k} \left[ P_i(w) \right]$$

$$P^M_k(w) \stackrel{\text{def}}{=} \{\!\{ v \in \Sigma^k \mid v \sqsubseteq w \}\!\}$$

Would like:

$$\Pr_L(w) = \prod_{v \cdot \sigma \in P^M_{\le k}(w)} \left[ \Pr_L(\sigma \mid v) \right]$$

Slide 6: Initial Model

[Figure: automaton whose states are the sets {ε}, {ε, a}, {ε, b}, {ε, c}, {ε, a, b}, {ε, a, c}, {ε, b, c}, {ε, a, b, c}; transitions on a, b, c are labeled with probabilities.]

$$Q = \mathcal{P}(P_{\le k}(\Sigma^*))$$

Let $w = v \cdot \sigma \cdot u$ and $q = \hat{\delta}(\{\varepsilon\}, v)$:

$$T(q, \sigma) = \Pr_L(\sigma \mid P_{\le k}(v) = q)$$
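A minimal Python sketch of the subsequence relation and the sets P_k(w) and P_≤k(w) from Slide 5 follows. The function names are assumptions of this example. The union runs over 0 < i ≤ k as written on the slide, so the empty string ε that appears in the state labels of the Slide 6 automaton is not included here.

```python
from itertools import combinations


def is_subsequence(v, w):
    """True if v ⊑ w: the symbols of v occur in w in order, not necessarily adjacent."""
    remaining = iter(w)
    # Each membership test consumes the iterator up to the match,
    # so later symbols of v must be found further to the right in w.
    return all(symbol in remaining for symbol in v)


def subsequences_k(w, k):
    """P_k(w): the set of length-k subsequences of w."""
    return {"".join(c) for c in combinations(w, k)}


def subsequences_up_to(w, k):
    """P_{<=k}(w): union of P_i(w) for 0 < i <= k."""
    result = set()
    for i in range(1, k + 1):
        result |= subsequences_k(w, i)
    return result


print(is_subsequence("ac", "abc"), is_subsequence("ca", "abc"))
print(sorted(subsequences_k("abc", 2)))
print(sorted(subsequences_up_to("aba", 2)))
```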

Slide 7: PT-Automata

[Figure: automaton whose states are the sets {ε}, {ε, a}, {ε, b}, {ε, c}, {ε, a, b}, {ε, a, c}, {ε, b, c}, {ε, a, b, c}; transitions on a, b, c are unweighted.]

Slide 8: Piecewise-Testable Languages (PT)

$$\mathrm{SI}(w) \stackrel{\text{def}}{=} \{ v \in \Sigma^* \mid w \sqsubseteq v \}$$

L is Piecewise Testable $\stackrel{\text{def}}{\iff}$ L is a finite Boolean combination of principal shuffle ideals.

P_k-expressions

Atoms: $v \in P_{\le k}(\Sigma^*)$

$$w \models v \stackrel{\text{def}}{\iff} w \in \mathrm{SI}(v) \quad \text{(i.e., } v \sqsubseteq w \text{)}$$

Operators: truth-functional connectives

$$L \in \mathrm{PT}_k \iff L = \{ w \in \Sigma^* \mid w \models \varphi \} \text{ for some } P_k\text{-expression } \varphi$$

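Slide 9: PT-Automata and P_k-expressions

[Figure: the PT-automaton with states {ε}, {ε, a}, {ε, b}, {ε, c}, {ε, a, b}, {ε, a, c}, {ε, b, c}, {ε, a, b, c}.]

$$F_\varphi = \Big\{ q \in \mathcal{P}(P_{\le k}(\Sigma^*)) \;\Big|\; \Big( \bigwedge_{s \in q} [s] \wedge \bigwedge_{s \notin q} [\neg s] \Big) \rightarrow \varphi \Big\}$$

$$L(M_\varphi) = \{ w \in \Sigma^* \mid w \models \varphi \}$$

Slide 10: Subregular Hierarchies

[Figure: diagram of the subregular hierarchies relating the classes Reg, SF, LTT, LT, PT, SL, SP, and Fin, with the logics MSO, FO, and Prop and the signatures +1 and <.]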

Slide 11: Strictly Piecewise Testable Languages (SP)

The following are equivalent:

1. $L \in \mathrm{SP}$
2. L is the set of strings satisfying a finite conjunction of negative P_k-literals.
3. $L = \bigcap_{w \in S} \left[ \overline{\mathrm{SI}(w)} \right]$, S finite.
4. $(\exists k)[P_{\le k}(w) \subseteq P_{\le k}(L) \Rightarrow w \in L]$
5. $w \in L$ and $v \sqsubseteq w \Rightarrow v \in L$ (L is subsequence closed).
6. $L = \overline{\mathrm{SI}(X)}$, $X \subseteq \Sigma^*$ (L is the complement of a shuffle ideal).

Slide 12: DFA representation of SP_k languages

Let M be a trimmed minimal DFA recognizing an SP_k language. Then:

1. All states of M are accepting states.
2. If $\delta(q, \sigma)\!\uparrow$ then there is some $s \in P_{\le k}(\{ w \mid \hat{\delta}(q_0, w) = q \})$ such that for all $q' \in Q$:

$$s \in P_{\le k}(\{ w \mid \hat{\delta}(q_0, w) = q' \}) \Rightarrow \delta(q', \sigma)\!\uparrow$$

Consequently, for all $q_1, q_2 \in Q$ and $\sigma \in \Sigma$, if $\delta(q_1, \sigma)\!\uparrow$ and $\hat{\delta}(q_1, w) = q_2$ for some $w \in \Sigma^*$, then $\delta(q_2, \sigma)\!\uparrow$.

(Missing edges propagate down.)
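Characterization 2 on Slide 11 suggests a direct membership test: a string is in the language exactly when it contains none of a finite set of forbidden subsequences. The sketch below assumes that reading. The function names and the ASCII stand-ins (`S` for [ʃ], `s` for [s]) are assumptions of this example, and the forbidden set covers only the [ʃ]…[s] case of the Samala pattern, not the affricates.

```python
def is_subsequence(v, w):
    """True if the symbols of v occur in w in order, not necessarily adjacent."""
    remaining = iter(w)
    return all(symbol in remaining for symbol in v)


def sp_accepts(w, forbidden):
    """True if w contains none of the forbidden subsequences."""
    return not any(is_subsequence(v, w) for v in forbidden)


# ASCII stand-ins (an assumption of this sketch): "S" for [ʃ], "s" for [s].
forbidden = {"Ss"}

print(sp_accepts("StojonowonowaS", forbidden))  # attested form
print(sp_accepts("Stojonowonowas", forbidden))  # starred form
```

Deleting symbols from an accepted string cannot create a forbidden subsequence, which is why languages defined this way are subsequence closed (characterization 5).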

Slide 13: SP_k-automata

[Figure: automaton with states {ε}, {ε, a}, {ε, b}, {ε, c}, {ε, a, b}, {ε, a, c}, {ε, b, c}, {ε, a, b, c} and transitions on a, b, c.]

$$Q = \mathcal{P}(P_{\le k-1}(\Sigma^*))$$

Size of automaton: $\Theta(2^{\mathrm{card}(\Sigma)^k})$

Slide 14: Factored SP_k-automata

[Figure: two factor automata over {a, b, c}, labeled SI(aa) and SI(bc), each with initial state ε.]

Slide 15: SP-PDFA

[Figure: factor PDFAs over {a, b}, one for each of ε, a, b, aa, ab, ba, bb.]

Slide 16: Product PDFAs

Co-emission Probability

$$\mathrm{CT}(\langle \sigma, q_1 \ldots q_n \rangle) = \prod_{i=1}^{n} T_i(q_i, \sigma)$$

$$\mathrm{CF}(\langle q_1 \ldots q_n \rangle) = \prod_{i=1}^{n} F_i(q_i)$$

$$Z(\langle q_1 \ldots q_n \rangle) = \mathrm{CF}(\langle q_1 \ldots q_n \rangle) + \sum_{\sigma \in \Sigma} \mathrm{CT}(\langle \sigma, q_1 \ldots q_n \rangle)$$

$$F(\langle q_1 \ldots q_n \rangle) = \frac{\mathrm{CF}(\langle q_1 \ldots q_n \rangle)}{Z(\langle q_1 \ldots q_n \rangle)}$$

$$T(\langle q_1 \ldots q_n \rangle, \sigma) = \frac{\mathrm{CT}(\langle \sigma, q_1 \ldots q_n \rangle)}{Z(\langle q_1 \ldots q_n \rangle)}$$
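The normalization on Slide 16 can be sketched in Python as below. This is an illustration under stated assumptions: the data layout (each factor as a pair of dictionaries), the function name, and the toy numbers are invented for this example and do not come from the talk.

```python
from math import prod


def product_distribution(factors, states, alphabet):
    """Normalize co-emission probabilities at one product state.

    factors: list of (T_i, F_i) pairs, where T_i maps (state, symbol) to a
             transition probability and F_i maps state to a final probability.
    states:  the current state q_i of each factor, in the same order.
    Returns (T, F): normalized transition probabilities per symbol and the
    normalized final probability.
    """
    # CF: product of the factors' final probabilities.
    cf = prod(F[q] for (_, F), q in zip(factors, states))
    # CT: for each symbol, product of the factors' transition probabilities.
    ct = {
        s: prod(T.get((q, s), 0.0) for (T, _), q in zip(factors, states))
        for s in alphabet
    }
    z = cf + sum(ct.values())
    if z == 0.0:
        raise ValueError("all co-emission probabilities are zero at this state")
    return {s: ct[s] / z for s in alphabet}, cf / z


# Hypothetical one-state factors over the alphabet {a, b}.
f1 = ({("q", "a"): 0.5, ("q", "b"): 0.25}, {"q": 0.25})
f2 = ({("q", "a"): 0.25, ("q", "b"): 0.5}, {"q": 0.25})
T, F = product_distribution([f1, f2], ["q", "q"], "ab")
print(T, F)
```

Dividing by Z is what makes the outgoing probabilities at the product state sum to one again, since the raw products CT and CF generally do not.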

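Slide 17: Product PDFAs: k-sets

Positive Co-emission Probability

$$\mathrm{PCT}(\langle \sigma, q_\varepsilon \ldots q_u \rangle) = \prod_{\substack{q_w \in \langle q_\varepsilon \ldots q_u \rangle \\ q_w = w}} T_w(q_w, \sigma)$$

$$\mathrm{PCF}(\langle q_\varepsilon \ldots q_u \rangle) = \prod_{\substack{q_w \in \langle q_\varepsilon \ldots q_u \rangle \\ q_w = w}} F_w(q_w)$$

$$Z(\langle q_1 \ldots q_n \rangle) = \mathrm{PCF}(\langle q_1 \ldots q_n \rangle) + \sum_{\sigma \in \Sigma} \mathrm{PCT}(\langle \sigma, q_1 \ldots q_n \rangle)$$

Let $q = \langle \varepsilon, \varepsilon, b, aa, a, ba, b \rangle$:

$$\mathrm{CT}(a, q) = T_\varepsilon(\varepsilon, a) \cdot T_a(\varepsilon, a) \cdot T_b(b, a) \cdot T_{aa}(aa, a) \cdot T_{ab}(a, a) \cdot T_{ba}(ba, a) \cdot T_{bb}(b, a)$$

$$\mathrm{PCT}(a, q) = T_\varepsilon(\varepsilon, a) \cdot T_b(b, a) \cdot T_{aa}(aa, a) \cdot T_{ba}(ba, a)$$

Slide 18: Complexity

Number of automata:

$$\sum_{0 \le i < k} \left[ \mathrm{card}(\Sigma)^i \right] = \Theta(\mathrm{card}(\Sigma)^{k-1})$$

Number of states:

$$\sum_{0 \le i < k} \left[ (i + 1)\, \mathrm{card}(\Sigma)^i \right] = \Theta(k\, \mathrm{card}(\Sigma)^{k-1})$$

ML estimation: $n = \sum_{w \in S} [\,|w|\,]$ (size of corpus); $\Theta(n\, \mathrm{card}(\Sigma)^{k-1})$ (vs. $\Theta(n)$)

$\Pr_L(w)$: $\Theta(n\, \mathrm{card}(\Sigma)^{k-1})$ (vs. $\Theta(n)$)

Parameters: only final states matter; $\mathrm{card}(\Sigma) \cdot \Theta(\mathrm{card}(\Sigma)^{k-1}) = \Theta(\mathrm{card}(\Sigma)^k)$ (same)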
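The worked example on Slide 17 and the counts on Slide 18 can be checked with a short Python sketch. The function names and the dictionary encoding of the product state are assumptions of this example; the empty string `""` stands in for ε.

```python
def num_automata(sigma_size, k):
    """Sum of card(Sigma)^i for 0 <= i < k."""
    return sum(sigma_size ** i for i in range(k))


def num_states(sigma_size, k):
    """Sum of (i + 1) * card(Sigma)^i for 0 <= i < k."""
    return sum((i + 1) * sigma_size ** i for i in range(k))


def positive_components(q):
    """Names w of the factors whose current state q_w equals w.

    These are the factors that contribute to PCT and PCF.
    """
    return [w for w, state in q.items() if state == w]


# The product state <ε, ε, b, aa, a, ba, b> from Slide 17, keyed by the
# factor names ε, a, b, aa, ab, ba, bb ("" stands for ε).
q = {"": "", "a": "", "b": "b", "aa": "aa", "ab": "a", "ba": "ba", "bb": "b"}

print(positive_components(q))
print(num_automata(2, 3), num_states(2, 3))
```

For a two-symbol alphabet and k = 3 the first sum gives seven automata, matching the seven components of q, and the selected components are exactly the four factors that appear in PCT(a, q) on the slide.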

Slide 19: Remaining issues

• Estimation undercounts
  – counts number of k-sequences that start with first prefix: $\Theta(n)$
  – actual number: $\binom{n}{k} \in \Theta(2^n)$
• Want probability to depend on multiset of subsequences
  – infinitely many states
  – but probability of n occurrences is (probability of occurrence)$^n$
  – same number of parameters/still linear time
• Not Regular distribution
  – Not clear that there is a corresponding class of distributions over strings

Slide 20: Summary

SP-Distributions
• Regular distribution
  Model (some) long distance dependencies
• Asymptotic complexity same as SL-distributions (n-gram models)
• SL-distributions can’t model long distance dependencies
  SP-distributions can’t model local ones
• Both are classes of Regular distributions
  Combination is straightforward

Slide 21: Results of SP_2 estimation on the Samala corpus

Pr(x | P_≤1(y)), with y in rows and x in columns:

y \ x    s         t͡s        ʃ         t͡ʃ
s        0.0325    0.0051    0.0013    0.0002
t͡s       0.0212    0.0114    0.0008    0.
ʃ        0.0011    0.        0.067     0.0359
t͡ʃ       0.0006    0.        0.0458    0.0314
