

  1. ACL2010: Heinz and Rogers

     Slide 1: Estimating Strictly Piecewise Distributions
     Jeffrey Heinz, Dept. of Linguistics and Cognitive Science, University of Delaware (heinz@udel.edu)
     James Rogers, Dept. of Computer Science, Earlham College (jrogers@cs.earlham.edu)
     Slides: http://cs.earlham.edu/~jrogers/slides/acl2010talk.ho.pdf

     Slide 2: Regular Models of Long-Distance Dependencies
     “... we wish to escape the linear tyranny of these n-gram models and HMM tagging models, and to start to explore more complex notions of grammar.” (Manning and Schütze, 1999)
     Samala (Chumash): [+anterior] sibilants (e.g., [s], [t͡s]) do not occur after [-anterior] sibilants (e.g., [ʃ], [t͡ʃ]).
       [ʃtojonowonowaʃ] ‘it stood upright’
       *[ʃtojonowonowas]
     Forbidden pattern: $\Sigma^* \cdot (\text{ʃ} + \text{t͡ʃ}) \cdot \Sigma^* \cdot (\text{s} + \text{t͡s}) \cdot \Sigma^*$
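The Samala pattern bans a subsequence: a [-anterior] sibilant followed, at any distance, by a [+anterior] one. Below is a minimal Python sketch. It assumes words arrive as lists of segments, so an affricate like t͡ʃ counts as one symbol. The function name `violates_harmony` and the two segment sets are illustrative and do not come from the talk.

```python
# Hypothetical encoding of the Samala sibilant classes from Slide 2.
MINUS_ANTERIOR = {"ʃ", "t͡ʃ"}
PLUS_ANTERIOR = {"s", "t͡s"}


def violates_harmony(segments):
    """True iff the word contains [-anterior] ... [+anterior] as a subsequence,
    i.e. it matches Σ*·(ʃ + t͡ʃ)·Σ*·(s + t͡s)·Σ*."""
    seen_minus = False
    for seg in segments:
        if seg in PLUS_ANTERIOR and seen_minus:
            return True
        if seg in MINUS_ANTERIOR:
            seen_minus = True
    return False


# Neither word below contains an affricate, so list() segments them correctly.
print(violates_harmony(list("ʃtojonowonowaʃ")))  # False
print(violates_harmony(list("ʃtojonowonowas")))  # True
```

Only one bit of memory is needed, "a [-anterior] sibilant has been seen", however far apart the two sibilants are.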

  2. Slide 3: n-gram Models of Language
     [Figure: a PDFA over Σ = {a, b, c} with boundary state ♯ and states a, b, c; edges labeled with transition probabilities]
     $\Pr_L(\sigma_1 \cdots \sigma_n) = \Pr_L(\sigma_1 \mid \sharp) \cdot \prod_{1 < i \le n} \Pr_L(\sigma_i \mid \sigma_{i-1}) \cdot \Pr_L(\sharp \mid \sigma_n)$
     $F_k(w) \stackrel{\text{def}}{=} \{ v \in \Sigma^k \mid w \in \Sigma^* \cdot v \cdot \Sigma^* \}$
     $F^M_k(w) \stackrel{\text{def}}{=} \{\!\{ v \in \Sigma^k \mid w \in \Sigma^* \cdot v \cdot \Sigma^* \}\!\}$
     $\Pr_L(w) = \prod_{v \cdot \sigma \in F^M_k(\sharp \cdot w \cdot \sharp)} \Pr_L(\sigma \mid v)$

     Slide 4: Strictly k-Local Languages (SL_k)
     [Figure: the same automaton as a k-scanner, without probabilities]
     $T_M \stackrel{\text{def}}{=} \{ v\sigma \in F_k(\sharp \cdot \Sigma^* \cdot \sharp) \mid \delta(v, \sigma)\downarrow \}$
     $L(M) = \{ w \in \Sigma^* \mid F_k(w) \subseteq T_M \}$
     $L \in SL_k \stackrel{\text{def}}{\iff} L = L(M)$ for some k-scanner $M$
     $L \in SL \stackrel{\text{def}}{\iff} (\exists k)[L \in SL_k]$
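Here is a sketch of Slides 3 and 4. In the bigram case ($k = 2$), one conditional probability is multiplied in for each 2-factor of $\sharp \cdot w \cdot \sharp$, counted with multiplicity. SL_k membership checks that every k-factor is permitted. The sketch makes three assumptions: `#` stands in for ♯, the word is padded with ♯ before its factors are taken (as in $T_M$), and the probability table is invented for illustration.

```python
from math import prod

BOUNDARY = "#"  # stands in for the boundary symbol ♯


def k_factors(w, k):
    """F_k(w): the set of length-k substrings of w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def bigram_prob(w, cond):
    """Pr(σ1|♯) · ∏ Pr(σi|σi-1) · Pr(♯|σn).

    cond maps (previous, next) to Pr(next | previous); unseen pairs get 0.
    Iterating over positions visits the multiset F^M_2(♯w♯)."""
    padded = BOUNDARY + w + BOUNDARY
    return prod(cond.get((padded[i], padded[i + 1]), 0.0)
                for i in range(len(padded) - 1))


def in_sl(w, k, permitted):
    """SL_k membership (assuming padding): every k-factor of ♯w♯ is in T_M."""
    return k_factors(BOUNDARY + w + BOUNDARY, k) <= permitted


# Illustrative (invented) probabilities.
cond = {("#", "a"): 0.5, ("a", "b"): 0.4, ("b", "#"): 0.3}
print(bigram_prob("ab", cond))             # 0.5 * 0.4 * 0.3
print(in_sl("ab", 2, {"#a", "ab", "b#"}))  # True
print(in_sl("ba", 2, {"#a", "ab", "b#"}))  # False
```

Both functions scan the padded word once. This is the linear-time behaviour that the later complexity slide compares SP-distributions against.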

  3. Slide 5: Subsequences
     $v$ is a subsequence of $w$: $v \sqsubseteq w \stackrel{\text{def}}{\iff} v = \sigma_1 \cdots \sigma_k$ and $w \in \Sigma^* \cdot \sigma_1 \cdot \Sigma^* \cdots \Sigma^* \cdot \sigma_k \cdot \Sigma^*$
     $P_k(w) \stackrel{\text{def}}{=} \{ v \in \Sigma^k \mid v \sqsubseteq w \}$
     $P_{\le k}(w) \stackrel{\text{def}}{=} \bigcup_{0 \le i \le k} P_i(w)$
     $P^M_k(w) \stackrel{\text{def}}{=} \{\!\{ v \sqsubseteq w \}\!\}$
     Would like: $\Pr_L(w) = \prod_{v \cdot \sigma \in P^M_{\le k}(w)} \Pr_L(\sigma \mid v)$

     Slide 6: Initial Model
     [Figure: a PDFA over Σ = {a, b, c} whose states are {ε}, {ε, a}, {ε, b}, {ε, c}, {ε, a, b}, {ε, a, c}, {ε, b, c}, {ε, a, b, c}; edges labeled with transition probabilities]
     $Q = \mathcal{P}(P_{\le k}(\Sigma^*))$
     Let $w = v \cdot \sigma \cdot u$ and $q = \hat\delta(\{\varepsilon\}, v)$. Then $T(q, \sigma) = \Pr_L(\sigma \mid P_{\le k}(v) = q)$.
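The definitions on Slides 5 and 6 translate directly into code. This sketch computes $P_{\le k}(w)$ by brute force, which is exponential in $|w|$ but fine for short words. It estimates the Initial Model by relative frequency: each prefix $v$ of a word is mapped to its state $q = P_{\le k}(v)$, and the symbol that follows is counted. Two things are assumptions of this sketch and not part of the slide: treating the end of a word as an extra symbol `#` (which gives each state a stopping probability), and the function names.

```python
from collections import Counter, defaultdict
from itertools import combinations

END = "#"  # assumed end-of-word symbol, so each state also gets a stop probability


def is_subsequence(v, w):
    """v ⊑ w: the symbols of v occur in w in order, not necessarily adjacent."""
    it = iter(w)
    return all(ch in it for ch in v)  # 'in' consumes the iterator up to each match


def subseqs_upto(w, k):
    """P_{≤k}(w): every subsequence of w of length 0..k (ε is the empty string)."""
    return frozenset("".join(c) for i in range(k + 1) for c in combinations(w, i))


def estimate_initial_model(corpus, k):
    """Relative-frequency estimate of T(q, σ) = Pr(σ | P_{≤k}(v) = q)."""
    counts = defaultdict(Counter)
    for w in corpus:
        for i in range(len(w) + 1):
            q = subseqs_upto(w[:i], k)  # state reached after the prefix v = w[:i]
            sym = w[i] if i < len(w) else END
            counts[q][sym] += 1
    return {q: {s: c / sum(cnt.values()) for s, c in cnt.items()}
            for q, cnt in counts.items()}


model = estimate_initial_model(["ab", "ba", "aa"], k=1)
print(model[frozenset({""})])       # distribution over the first symbol, from state {ε}
print(model[frozenset({"", "a"})])  # after having seen only a's
```

Each state in this sketch is labeled by the set itself, so the number of possible states grows as $2^{|P_{\le k}(\Sigma^*)|}$, matching $Q = \mathcal{P}(P_{\le k}(\Sigma^*))$.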

  4. Slide 7: PT-Automata
     [Figure: the automaton of Slide 6, with states {ε} through {ε, a, b, c}, drawn without probabilities]

     Slide 8: Piecewise-Testable Languages (PT)
     $\mathrm{SI}(w) \stackrel{\text{def}}{=} \{ v \in \Sigma^* \mid w \sqsubseteq v \}$ (the principal shuffle ideal of $w$)
     $L$ is Piecewise Testable $\stackrel{\text{def}}{\iff}$ $L$ is a finite Boolean combination of principal shuffle ideals.
     $P_k$-expressions
       Atoms: $v \in P_{\le k}(\Sigma^*)$, with $w \models v \stackrel{\text{def}}{\iff} w \in \mathrm{SI}(v)$ (i.e., $v \sqsubseteq w$)
       Operators: truth-functional connectives
     $L \in PT_k \iff L = \{ w \in \Sigma^* \mid w \models \varphi \}$ for some $P_k$-expression $\varphi$
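The satisfaction relation on Slide 8 can be evaluated recursively. This sketch encodes a $P_k$-expression as nested tuples, which is an assumed encoding: `("atom", v)`, `("not", e)`, `("and", e1, e2)`, `("or", e1, e2)`. An atom $v$ holds of $w$ exactly when $w \in \mathrm{SI}(v)$, i.e. when $v \sqsubseteq w$.

```python
def is_subsequence(v, w):
    """v ⊑ w, equivalently w ∈ SI(v)."""
    it = iter(w)
    return all(ch in it for ch in v)


def satisfies(w, phi):
    """w ⊨ φ for a P_k-expression φ encoded as nested tuples."""
    op = phi[0]
    if op == "atom":
        return is_subsequence(phi[1], w)
    if op == "not":
        return not satisfies(w, phi[1])
    if op == "and":
        return satisfies(w, phi[1]) and satisfies(w, phi[2])
    if op == "or":
        return satisfies(w, phi[1]) or satisfies(w, phi[2])
    raise ValueError(f"unknown connective: {op!r}")


# "contains ab as a subsequence but not ba": a Boolean combination of two
# principal shuffle ideals, SI(ab) minus SI(ba).
phi = ("and", ("atom", "ab"), ("not", ("atom", "ba")))
print([w for w in ["ab", "acb", "aba", "ba"] if satisfies(w, phi)])  # ['ab', 'acb']
```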

  5. Slide 9: PT-Automata and $P_k$-expressions
     [Figure: the PT-automaton of Slide 7]
     $F_\varphi = \{ q \in \mathcal{P}(P_{\le k}(\Sigma^*)) \mid \big(\bigwedge_{s \in q} [s] \wedge \bigwedge_{s \notin q} [\neg s]\big) \rightarrow \varphi \}$
     $L(M_\varphi) = \{ w \in \Sigma^* \mid w \models \varphi \}$

     Slide 10: Subregular Hierarchies
     [Figure: subregular hierarchy diagram relating Reg, SF, LTT, LT, PT, SL, SP, and Fin, annotated with the logics MSO, FO, and Prop and with successor (+1) and precedence (<)]
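Slide 9's construction can be checked by brute force for a small alphabet. The sketch uses the same hypothetical tuple encoding of $P_k$-expressions. It enumerates every state $q \subseteq P_{\le k}(\Sigma^*)$ and keeps the states whose valuation makes $\varphi$ true, where the valuation sets the atoms in $q$ true and all others false. It then accepts $w$ when the state $w$ reaches, namely $P_{\le k}(w)$, is in $F_\varphi$. Enumerating all $2^{|P_{\le k}(\Sigma^*)|}$ states is only practical for a tiny $\Sigma$ and $k$.

```python
from itertools import chain, combinations, product


def holds(phi, q):
    """Evaluate φ under the valuation that makes exactly the atoms in q true."""
    op = phi[0]
    if op == "atom":
        return phi[1] in q
    if op == "not":
        return not holds(phi[1], q)
    if op == "and":
        return holds(phi[1], q) and holds(phi[2], q)
    if op == "or":
        return holds(phi[1], q) or holds(phi[2], q)
    raise ValueError(f"unknown connective: {op!r}")


def all_subseqs_upto(alphabet, k):
    """P_{≤k}(Σ*): every string over Σ of length 0..k."""
    return ["".join(p) for i in range(k + 1) for p in product(alphabet, repeat=i)]


def final_states(phi, alphabet, k):
    """F_φ: the subsets q of P_{≤k}(Σ*) whose valuation satisfies φ."""
    atoms = all_subseqs_upto(alphabet, k)
    subsets = chain.from_iterable(combinations(atoms, r) for r in range(len(atoms) + 1))
    return {frozenset(s) for s in subsets if holds(phi, frozenset(s))}


def state_of(w, k):
    """δ̂({ε}, w) = P_{≤k}(w), the state the PT-automaton reaches on w."""
    return frozenset("".join(c) for i in range(k + 1) for c in combinations(w, i))


phi = ("not", ("atom", "ba"))
F = final_states(phi, "ab", 2)
print(len(F))                   # half of the 2**7 subsets lack "ba"
print(state_of("aab", 2) in F)  # True: no b precedes an a
print(state_of("aba", 2) in F)  # False
```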

  6. Slide 11: Strictly Piecewise Testable Languages (SP)
     The following are equivalent:
       1. $L \in SP$
       2. $L$ is the set of strings satisfying a finite conjunction of negative $P_k$-literals.
       3. $L = \bigcap_{w \in S} [\overline{\mathrm{SI}(w)}]$, $S$ finite
       4. $(\exists k)[P_{\le k}(w) \subseteq P_{\le k}(L) \Rightarrow w \in L]$
       5. $w \in L$ and $v \sqsubseteq w \Rightarrow v \in L$ ($L$ is subsequence closed)
       6. $L = \overline{\mathrm{SI}(X)}$, $X \subseteq \Sigma^*$ ($L$ is the complement of a shuffle ideal)

     Slide 12: DFA representation of $SP_k$ languages
     Let $M$ be a trimmed minimal DFA recognizing an $SP_k$ language. Then:
       1. All states of $M$ are accepting states.
       2. If $\delta(q, \sigma)\uparrow$, then there is some $s \in P_{\le k}(\{ w \mid \hat\delta(q_0, w) = q \})$ such that for all $q' \in Q$, $s \in P_{\le k}(\{ w \mid \hat\delta(q_0, w) = q' \}) \Rightarrow \delta(q', \sigma)\uparrow$.
     Consequently, for all $q_1, q_2 \in Q$ and $\sigma \in \Sigma$, if $\delta(q_1, \sigma)\uparrow$ and $\hat\delta(q_1, w) = q_2$ for some $w \in \Sigma^*$, then $\delta(q_2, \sigma)\uparrow$. (Missing edges propagate down.)
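Characterizations 2, 3, and 5 on Slide 11 can be illustrated together. This sketch assumes an SP language is represented by a finite set $S$ of forbidden subsequences, so that $w \in L$ iff $w \notin \mathrm{SI}(u)$ for every $u \in S$. A brute-force check then confirms, for all words up to a small length, that deleting a symbol from a word in $L$ never takes it out of $L$ (subsequence closure). The forbidden set {aa, bc} is taken from the factored-automata example on Slide 14.

```python
from itertools import product


def is_subsequence(v, w):
    it = iter(w)
    return all(ch in it for ch in v)


def in_sp(w, forbidden):
    """w ∈ ⋂_{u ∈ S} complement(SI(u)): no forbidden u is a subsequence of w."""
    return not any(is_subsequence(u, w) for u in forbidden)


def subsequence_closed_upto(forbidden, alphabet, n):
    """Check characterization 5 on all words of length ≤ n: if w ∈ L,
    then every one-symbol deletion of w is in L as well."""
    for length in range(n + 1):
        for t in product(alphabet, repeat=length):
            w = "".join(t)
            if in_sp(w, forbidden):
                if not all(in_sp(w[:i] + w[i + 1:], forbidden) for i in range(len(w))):
                    return False
    return True


S = {"aa", "bc"}
print(in_sp("acb", S), in_sp("abc", S))      # True False
print(subsequence_closed_upto(S, "abc", 5))  # True
```

Checking one-symbol deletions is enough: any subsequence of $w$ can be reached by repeating such deletions.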

  7. Slide 13: $SP_k$-automata
     [Figure: automaton with states {ε} through {ε, a, b, c}, as on Slide 7]
     $Q = \mathcal{P}(P_{\le k-1}(\Sigma^*))$
     Size of automaton: $\Theta(2^{\mathrm{card}(\Sigma)^k})$

     Slide 14: Factored $SP_k$-automata
     [Figure: two small factored automata over Σ = {a, b, c}: one for SI(aa), with states ε and a, and one for SI(bc), with states ε and b]
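The point of Slide 14 is that the single $SP_k$-automaton can be replaced by one small automaton per constraint. Below is a minimal sketch with two assumptions. First, each factor tracks the length of the longest prefix of its forbidden string $u$ seen so far as a subsequence; greedy left-to-right matching finds this length. Second, a word is rejected as soon as any factor completes its $u$. The class name is hypothetical.

```python
class FactorAutomaton:
    """Tracks how much of the forbidden subsequence u has been seen.

    State j means u[:j] is the longest prefix of u that is a subsequence of
    the input so far; reaching j == len(u) corresponds to a missing edge."""

    def __init__(self, u):
        self.u = u
        self.j = 0

    def step(self, sym):
        if self.j < len(self.u) and sym == self.u[self.j]:
            self.j += 1

    def alive(self):
        return self.j < len(self.u)


def accepts(w, forbidden):
    """Run one factor per forbidden subsequence in parallel; accept iff all survive."""
    factors = [FactorAutomaton(u) for u in forbidden]
    for sym in w:
        for f in factors:
            f.step(sym)
        if not all(f.alive() for f in factors):
            return False
    return True


print(accepts("acb", ["aa", "bc"]))   # True
print(accepts("abca", ["aa", "bc"]))  # False
```

The live states of a factor are j = 0, …, |u| − 1. For SI(aa) these are the two states ε and a drawn on Slide 14.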

  8. Slide 15: SP-PDFA
     [Figure: one component PDFA over Σ = {a, b} for each of ε, a, b, aa, ab, ba, bb; the states of the component for w are the prefixes of w]

     Slide 16: Product PDFAs, Co-emission Probability
     $\mathrm{CT}(\langle \sigma, q_1 \ldots q_n \rangle) = \prod_{i=1}^{n} T_i(q_i, \sigma)$
     $\mathrm{CF}(\langle q_1 \ldots q_n \rangle) = \prod_{i=1}^{n} F_i(q_i)$
     $Z(\langle q_1 \ldots q_n \rangle) = \mathrm{CF}(\langle q_1 \ldots q_n \rangle) + \sum_{\sigma \in \Sigma} \mathrm{CT}(\langle \sigma, q_1 \ldots q_n \rangle)$
     $F(\langle q_1 \ldots q_n \rangle) = \dfrac{\mathrm{CF}(\langle q_1 \ldots q_n \rangle)}{Z(\langle q_1 \ldots q_n \rangle)}$
     $T(\langle q_1 \ldots q_n \rangle, \sigma) = \dfrac{\mathrm{CT}(\langle \sigma, q_1 \ldots q_n \rangle)}{Z(\langle q_1 \ldots q_n \rangle)}$
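Slide 16's normalization can be written out directly. This sketch assumes each component PDFA $i$ is given as two dictionaries: `T[i][q][σ]` for transition probabilities and `F[i][q]` for final (stopping) probabilities. The product state is a tuple of component states. The numbers are invented for illustration.

```python
from math import prod


def co_emission(T, F, state, alphabet):
    """Return (F(state), {σ: T(state, σ)}) for the product of the components.

    CT(σ, state) = ∏_i T_i(q_i, σ), CF(state) = ∏_i F_i(q_i),
    and both are divided by Z = CF + Σ_σ CT."""
    ct = {s: prod(T[i][q].get(s, 0.0) for i, q in enumerate(state)) for s in alphabet}
    cf = prod(F[i][q] for i, q in enumerate(state))
    z = cf + sum(ct.values())
    return cf / z, {s: v / z for s, v in ct.items()}


# Two single-state components over Σ = {a, b} (invented numbers).
T = [{"q": {"a": 0.6, "b": 0.2}}, {"r": {"a": 0.5, "b": 0.3}}]
F = [{"q": 0.2}, {"r": 0.2}]
final, trans = co_emission(T, F, ("q", "r"), "ab")
print(final, trans)
```

Dividing by $Z$ makes the final and transition probabilities at each product state sum to 1 again.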

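  9. Slide 17: Product PDFAs, k-sets
     Positive Co-emission Probability
     $\mathrm{PCT}(\langle \sigma, q_\varepsilon \ldots q_u \rangle) = \prod_{q_w \in \langle q_\varepsilon \ldots q_u \rangle,\ q_w = w} T_w(q_w, \sigma)$
     $\mathrm{PCF}(\langle q_\varepsilon \ldots q_u \rangle) = \prod_{q_w \in \langle q_\varepsilon \ldots q_u \rangle,\ q_w = w} F_w(q_w)$
     $Z(\langle q_1 \ldots q_n \rangle) = \mathrm{PCF}(\langle q_1 \ldots q_n \rangle) + \sum_{\sigma \in \Sigma} \mathrm{PCT}(\langle \sigma, q_1 \ldots q_n \rangle)$
     Let $q = \langle \varepsilon, \varepsilon, b, aa, a, ba, b \rangle$:
       $\mathrm{CT}(a, q) = T_\varepsilon(\varepsilon, a) \cdot T_a(\varepsilon, a) \cdot T_b(b, a) \cdot T_{aa}(aa, a) \cdot T_{ab}(a, a) \cdot T_{ba}(ba, a) \cdot T_{bb}(b, a)$
       $\mathrm{PCT}(a, q) = T_\varepsilon(\varepsilon, a) \cdot T_b(b, a) \cdot T_{aa}(aa, a) \cdot T_{ba}(ba, a)$

     Slide 18: Complexity
     Number of automata: $\sum_{0 \le i < k} \mathrm{card}(\Sigma)^i = \Theta(\mathrm{card}(\Sigma)^{k-1})$
     Number of states: $\sum_{0 \le i < k} (i+1)\,\mathrm{card}(\Sigma)^i = \Theta(k\,\mathrm{card}(\Sigma)^{k-1})$
     ML estimation, with $n = \sum_{w \in S} |w|$ the size of the corpus: $\Theta(n\,\mathrm{card}(\Sigma)^{k-1})$ (vs. $\Theta(n)$)
     Computing $\Pr_L(w)$: $\Theta(n\,\mathrm{card}(\Sigma)^{k-1})$ (vs. $\Theta(n)$)
     Parameters (only final states matter): $\mathrm{card}(\Sigma) \cdot \Theta(\mathrm{card}(\Sigma)^{k-1}) = \Theta(\mathrm{card}(\Sigma)^k)$ (same as for n-gram models)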
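On Slide 17, CT and PCT differ only in which factors enter the product. PCT keeps component $w$ only when its current state $q_w$ equals $w$ itself. The sketch below assumes components are keyed by the subsequence they track, with `""` standing for ε. It recovers which factors appear in the slide's example and also evaluates the two counting sums from Slide 18.

```python
def pct_factors(state):
    """Components whose state equals their own label, i.e. the factors kept by PCT.

    state maps each component label w to its current state q_w."""
    return [w for w, q in state.items() if q == w]


def num_automata(card, k):
    """Σ_{0≤i<k} card(Σ)^i components, one per subsequence of length < k."""
    return sum(card ** i for i in range(k))


def num_states(card, k):
    """Σ_{0≤i<k} (i+1)·card(Σ)^i states: a component for a length-i string has i+1 states."""
    return sum((i + 1) * card ** i for i in range(k))


# Slide 17's example q = ⟨ε, ε, b, aa, a, ba, b⟩ for components ε, a, b, aa, ab, ba, bb.
q = {"": "", "a": "", "b": "b", "aa": "aa", "ab": "a", "ba": "ba", "bb": "b"}
print(pct_factors(q))                        # ['', 'b', 'aa', 'ba']
print(num_automata(2, 3), num_states(2, 3))  # 7 17
```

With card(Σ) = 2 and k = 3 there are 7 components, exactly the seven PDFAs drawn on Slide 15 (ε, a, b, aa, ab, ba, bb).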

  10. Slide 19: Remaining issues
      • Estimation undercounts
        – it counts the number of k-sequences that start with the first prefix: $\Theta(n)$
        – the actual number is $\binom{n}{k}$, which is $\Theta(n^k)$ for fixed $k$ (and $2^n$ summed over all $k$)
      • Want the probability to depend on the multiset of subsequences
        – infinitely many states
        – but the probability of n occurrences is (probability of occurrence)^n
        – same number of parameters; still linear time
      • Not a Regular distribution
        – not clear that there is a corresponding class of distributions over strings

      Slide 20: Summary: SP-Distributions
      • Regular distributions
      • Model (some) long-distance dependencies
      • Asymptotic complexity the same as SL-distributions (n-gram models)
      • SL-distributions can't model long-distance dependencies; SP-distributions can't model local ones
      • Both are classes of Regular distributions; combination is straightforward
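To see what "estimation undercounts" refers to, it helps to count occurrences of subsequences with multiplicity. This sketch is not the authors' estimator. It counts the embeddings of $u$ in $w$ with a standard dynamic program, then checks that a word of length $n$ has $\binom{n}{k}$ length-$k$ subsequence occurrences in total.

```python
from itertools import product
from math import comb


def count_embeddings(u, w):
    """Number of index tuples i1 < ... < im with w[i1..im] == u (occurrences of u ⊑ w)."""
    # ways[j] = number of ways to embed u[:j] in the part of w read so far
    ways = [1] + [0] * len(u)
    for ch in w:
        # go right-to-left so each symbol of w extends each embedding at most once
        for j in range(len(u), 0, -1):
            if u[j - 1] == ch:
                ways[j] += ways[j - 1]
    return ways[len(u)]


def total_occurrences(w, k, alphabet):
    """Sum of count_embeddings over all u in Σ^k."""
    return sum(count_embeddings("".join(u), w) for u in product(alphabet, repeat=k))


w = "abab"
print(count_embeddings("ab", w))                  # 3
print(total_occurrences(w, 2, "ab"), comb(4, 2))  # 6 6
```

Each choice of k positions in $w$ spells exactly one $u \in \Sigma^k$. That is why the per-$u$ counts add up to $\binom{n}{k}$.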

  11. Slide 21: Results of $SP_2$ estimation on the Samala corpus
      $\Pr(x \mid P_{\le 1}(y))$, rows y, columns x:

                 x = s     x = t͡s    x = ʃ     x = t͡ʃ
      y = s      0.0325    0.0051    0.0013    0.0002
      y = t͡s     0.0212    0.0114    0.0008    0.
      y = ʃ      0.0011    0.        0.067     0.0359
      y = t͡ʃ     0.0006    0.        0.0458    0.0314
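One quick way to read the table is to ask which column x is most probable in each row y. The sketch below copies the values from the table, entering `0.` as 0.0. It checks that the most probable x always agrees with y in anteriority, which is what the Samala sibilant pattern on Slide 2 leads one to expect.

```python
# Values copied from the Slide 21 table: TABLE[y][x] = Pr(x | P_{≤1}(y)).
TABLE = {
    "s": {"s": 0.0325, "t͡s": 0.0051, "ʃ": 0.0013, "t͡ʃ": 0.0002},
    "t͡s": {"s": 0.0212, "t͡s": 0.0114, "ʃ": 0.0008, "t͡ʃ": 0.0},
    "ʃ": {"s": 0.0011, "t͡s": 0.0, "ʃ": 0.067, "t͡ʃ": 0.0359},
    "t͡ʃ": {"s": 0.0006, "t͡s": 0.0, "ʃ": 0.0458, "t͡ʃ": 0.0314},
}
ANTERIOR = {"s": True, "t͡s": True, "ʃ": False, "t͡ʃ": False}

# For each row y, pick the column x with the highest probability.
best = {y: max(row, key=row.get) for y, row in TABLE.items()}
print(best)
print(all(ANTERIOR[y] == ANTERIOR[x] for y, x in best.items()))  # True
```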
