Estimating Strictly Piecewise Distributions


ACL2010—Heinz and Rogers 1 Slide 1

Estimating Strictly Piecewise Distributions

Jeffrey Heinz
Dept. of Linguistics and Cognitive Science
University of Delaware
heinz@udel.edu

James Rogers
Dept. of Computer Science
Earlham College
jrogers@cs.earlham.edu
http://cs.earlham.edu/~jrogers/slides/acl2010talk.ho.pdf

Slide 2

Regular Models of Long-Distance Dependencies

“. . . we wish to escape the linear tyranny of these n-gram models and HMM tagging models, and to start to explore more complex notions of grammar.”

(Manning and Schütze, 1999)

Samala (Chumash): [+anterior] sibilants (e.g., [s], [t͡s]) do not occur after [-anterior] sibilants (e.g., [ʃ], [t͡ʃ])

[ʃtojonowonowaʃ] ‘it stood upright’
*[ʃtojonowonowas]

\[ \Sigma^{*} \cdot (\text{[ʃ]} + \text{[t͡ʃ]}) \cdot \Sigma^{*} \cdot (\text{[s]} + \text{[t͡s]}) \cdot \Sigma^{*} \]
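As a rough illustration, the forbidden pattern on this slide can be checked with an ordinary regular expression. The sketch below is a toy under stated assumptions: it uses the ASCII letters `s` and `S` as stand-ins for the two sibilant classes (so the affricates are `ts` and `tS`), and the name `violates_harmony` is hypothetical.

```python
import re

# Sigma* (S + tS) Sigma* (s + ts) Sigma*: some S-type sibilant followed,
# at any distance, by some s-type sibilant. Because "tS" contains "S" and
# "ts" contains "s", matching the bare letters covers the affricates too.
FORBIDDEN = re.compile(r"S.*s")


def violates_harmony(word: str) -> bool:
    """Return True if the word contains the forbidden long-distance pattern."""
    return FORBIDDEN.search(word) is not None


if __name__ == "__main__":
    print(violates_harmony("StojonowonowaS"))  # the attested form
    print(violates_harmony("Stojonowonowas"))  # the starred form
```

The amount of material between the two sibilants is unbounded, which is the kind of dependency a fixed-width n-gram window cannot see.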

Slide 3

n-gram Models of Language

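[Figure: probabilistic automaton over the symbols a, b, c and the boundary symbol ♯, with a probability on each transition.]

\[ \mathrm{Pr}_L(\sigma_1 \cdots \sigma_n) = \mathrm{Pr}_L(\sigma_1 \mid \sharp) \cdot \prod_{1 < i \le n} \left[ \mathrm{Pr}_L(\sigma_i \mid \sigma_{i-1}) \right] \cdot \mathrm{Pr}_L(\sharp \mid \sigma_n) \]

\[ F_k(w) \overset{\text{def}}{=} \{ v \in \Sigma^{k} \mid w \in \Sigma^{*} \cdot v \cdot \Sigma^{*} \} \]

\[ F_k^{M}(w) \overset{\text{def}}{=} \{\!\{ v \in \Sigma^{k} \mid w \in \Sigma^{*} \cdot v \cdot \Sigma^{*} \}\!\} \]

\[ \mathrm{Pr}_L(w) = \prod_{v \cdot \sigma \in F_k^{M}(\sharp \cdot w \cdot \sharp)} \left[ \mathrm{Pr}_L(\sigma \mid v) \right] \]

Slide 4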

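Strictly k-Local Languages ($SL_k$)

[Figure: automaton over the symbols a, b, c and the boundary symbol ♯.]

\[ T_M \overset{\text{def}}{=} \{ v\sigma \in F_k(\sharp \cdot \Sigma^{*} \cdot \sharp) \mid \delta(v, \sigma)\!\downarrow \} \]

\[ L(M) = \{ w \in \Sigma^{*} \mid F_k(w) \subseteq T_M \} \]

\[ L \in SL_k \overset{\text{def}}{\iff} L \text{ is } L(M) \text{ for some } k\text{-scanner } M \]

\[ L \in SL \overset{\text{def}}{\iff} (\exists k)[L \in SL_k] \]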
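The k-factor definitions, the strictly k-local membership test, and the n-gram product can be sketched together in a few lines. This is a minimal illustration, not the authors' code: the names `k_factors`, `sl_accepts`, and `bigram_prob` are hypothetical, `#` is an ASCII stand-in for the boundary symbol, and padding the word with boundaries before taking its factors is an assumption made here so that the factors can be compared with a set drawn from boundary-delimited strings.

```python
from math import prod  # available from Python 3.8

BOUNDARY = "#"  # ASCII stand-in for the word-boundary symbol


def k_factors(w: str, k: int) -> list[str]:
    """Multiset (as a list) of the length-k contiguous substrings of w."""
    return [w[i:i + k] for i in range(len(w) - k + 1)]


def sl_accepts(w: str, permitted: set[str], k: int) -> bool:
    """Strictly k-local test: every k-factor of #w# must be permitted."""
    return set(k_factors(BOUNDARY + w + BOUNDARY, k)) <= permitted


def bigram_prob(w: str, cond: dict[tuple[str, str], float]) -> float:
    """Product of Pr(sigma | v) over the bigram factors v.sigma of #w#.

    cond maps (v, sigma) to a conditional probability; missing pairs count as 0.
    """
    return prod(cond.get((f[0], f[1]), 0.0)
                for f in k_factors(BOUNDARY + w + BOUNDARY, 2))


if __name__ == "__main__":
    permitted = {"#a", "ab", "ba", "a#"}           # hypothetical grammar
    print(sl_accepts("aba", permitted, 2))
    cond = {("#", "a"): 0.5, ("a", "b"): 0.4, ("b", "#"): 0.5}  # made-up values
    print(bigram_prob("ab", cond))
```

Keeping the factors as a list rather than a set is what makes the probability a product over the multiset: a factor that occurs twice contributes twice.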

Slide 5

Subsequences

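v is a subsequence of w:

\[ v \sqsubseteq w \overset{\text{def}}{\iff} v = \sigma_1 \cdots \sigma_k \text{ and } w \in \Sigma^{*} \cdot \sigma_1 \cdot \Sigma^{*} \cdots \Sigma^{*} \cdot \sigma_k \cdot \Sigma^{*} \]

\[ P_k(w) \overset{\text{def}}{=} \{ v \in \Sigma^{k} \mid v \sqsubseteq w \} \]

\[ P_{\le k}(w) \overset{\text{def}}{=} \bigcup_{0 < i \le k} \left[ P_i(w) \right] \]

\[ P_k^{M}(w) \overset{\text{def}}{=} \{\!\{ v \sqsubseteq w \}\!\} \]

Would like:

\[ \mathrm{Pr}_L(w) = \prod_{v \cdot \sigma \in P_{\le k}^{M}(w)} \left[ \mathrm{Pr}_L(\sigma \mid v) \right] \]

Slide 6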

Initial Model

[Figure: PDFA with states {ε}, {ε, a}, {ε, b}, {ε, c}, {ε, a, b}, {ε, a, c}, {ε, b, c}, {ε, a, b, c}; edges are labeled a, b, c and carry transition probabilities.]

\[ Q = \mathcal{P}(P_{\le k}(\Sigma^{*})) \]

Let $w = v \cdot \sigma \cdot u$ and $q = \hat{\delta}(\{\varepsilon\}, v)$:

\[ T(q, \sigma) = \mathrm{Pr}_L(\sigma \mid P_{\le k}(v) = q) \]
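To make the state space concrete, the following sketch tracks the set of subsequences seen so far as a string is read left to right, which is one way to realize the state $q = \hat{\delta}(\{\varepsilon\}, v)$. The function names are hypothetical, and the empty string is kept in every state to mirror the state labels such as {ε, a}.

```python
from itertools import combinations


def subsequences_upto(w: str, k: int) -> frozenset[str]:
    """All subsequences of w of length <= k, including the empty string."""
    return frozenset("".join(c) for i in range(k + 1)
                     for c in combinations(w, i))


def step(q: frozenset[str], sigma: str, k: int) -> frozenset[str]:
    """One transition: extend each stored subsequence by sigma, capped at length k."""
    return q | frozenset(s + sigma for s in q if len(s) < k)


def run(v: str, k: int) -> frozenset[str]:
    """Read v from the initial state containing only the empty string."""
    q = frozenset({""})
    for sigma in v:
        q = step(q, sigma, k)
    return q


if __name__ == "__main__":
    # The state reached after reading v equals the subsequence set of v.
    print(sorted(run("aba", 2)))
    print(run("aba", 2) == subsequences_upto("aba", 2))
```

Since a state only ever gains subsequences, each run moves monotonically upward through the subset lattice.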

Slide 7

PT-Automata

[Figure: automaton with states {ε}, {ε, a}, {ε, b}, {ε, c}, {ε, a, b}, {ε, a, c}, {ε, b, c}, {ε, a, b, c}; edges are labeled a, b, c.]

Slide 8

Piecewise-Testable Languages (PT)

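\[ SI(w) \overset{\text{def}}{=} \{ v \in \Sigma^{*} \mid w \sqsubseteq v \} \]

L is Piecewise Testable $\overset{\text{def}}{\iff}$ L is a finite Boolean combination of principal shuffle ideals.

$P_k$-expressions

  • Atoms: $v \in P_{\le k}(\Sigma^{*})$, with $w \models v \overset{\text{def}}{\iff} w \in SI(v)$ (i.e., $v \sqsubseteq w$)
  • Operators: truth-functional connectives

\[ L \in PT_k \iff L = \{ w \in \Sigma^{*} \mid w \models \varphi \} \text{ for some } P_k\text{-expression } \varphi \]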
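The satisfaction relation can be illustrated with a small evaluator. In this sketch a $P_k$-expression is encoded as a nested tuple; that encoding, and the names `is_subsequence` and `satisfies`, are assumptions chosen for the example rather than anything fixed by the slides.

```python
def is_subsequence(v: str, w: str) -> bool:
    """True if the symbols of v occur in w in order, not necessarily adjacent."""
    remaining = iter(w)
    # `sym in remaining` consumes the iterator up to the first match,
    # so successive symbols of v are searched for strictly left to right.
    return all(sym in remaining for sym in v)


def satisfies(w: str, expr: tuple) -> bool:
    """Evaluate a Boolean combination of subsequence atoms on w.

    Expressions: ("atom", v), ("not", e), ("and", e1, e2), ("or", e1, e2).
    """
    op = expr[0]
    if op == "atom":
        return is_subsequence(expr[1], w)
    if op == "not":
        return not satisfies(w, expr[1])
    if op == "and":
        return satisfies(w, expr[1]) and satisfies(w, expr[2])
    if op == "or":
        return satisfies(w, expr[1]) or satisfies(w, expr[2])
    raise ValueError(f"unknown operator: {op!r}")


if __name__ == "__main__":
    # "contains ab as a subsequence but not ba"
    phi = ("and", ("atom", "ab"), ("not", ("atom", "ba")))
    print(satisfies("aab", phi), satisfies("aba", phi))
```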

Slide 9

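PT-Automata and $P_k$-expressions

[Figure: automaton with states {ε}, {ε, a}, {ε, b}, {ε, c}, {ε, a, b}, {ε, a, c}, {ε, b, c}, {ε, a, b, c}; edges are labeled a, b, c.]

\[ F_\varphi = \{ q \in \mathcal{P}(P_{\le k}(\Sigma^{*})) \mid ( \bigwedge_{s \in q} [s] \wedge \bigwedge_{s \notin q} [\neg s] ) \rightarrow \varphi \} \]

\[ L(M_\varphi) = \{ w \in \Sigma^{*} \mid w \models \varphi \} \]

Slide 10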

Subregular Hierarchies

[Figure: diagram of the subregular hierarchies, with nodes SL, SP, LT, PT, LTT, SF, Reg, and Fin and the labels Prop, FO, MSO, +1, and <.]

Slide 11

Strictly Piecewise Testable Languages (SP)

The following are equivalent:

1. $L \in SP$
2. L is the set of strings satisfying a finite conjunction of negative $P_k$-literals.
3. $L = \bigcap_{w \in S} [\overline{SI(w)}]$, S finite,
4. $(\exists k)[P_{\le k}(w) \subseteq P_{\le k}(L) \Rightarrow w \in L]$,
5. $w \in L$ and $v \sqsubseteq w \Rightarrow v \in L$ (L is subsequence closed),
6. $L = \overline{SI(X)}$, $X \subseteq \Sigma^{*}$ (L is the complement of a shuffle ideal).

Slide 12

DFA representation of $SP_k$ languages

Let M be a trimmed minimal DFA recognizing an $SP_k$ language. Then:

1. All states of M are accepting states.
2. If $\delta(q, \sigma)\!\uparrow$ then there is some $s \in P_{\le k}(\{ w \mid \hat{\delta}(q_0, w) = q \})$ such that for all $q' \in Q$:

\[ s \in P_{\le k}(\{ w \mid \hat{\delta}(q_0, w) = q' \}) \Rightarrow \delta(q', \sigma)\!\uparrow \]

Consequently, for all $q_1, q_2 \in Q$ and $\sigma \in \Sigma$, if $\delta(q_1, \sigma)\!\uparrow$ and $\hat{\delta}(q_1, w) = q_2$ for some $w \in \Sigma^{*}$, then $\delta(q_2, \sigma)\!\uparrow$. (Missing edges propagate down.)
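The "missing edges propagate down" property can be observed on a toy example. The sketch below does not build the minimal DFA; it assumes instead that the language is given by its set of permitted subsequences of length at most k (gathered here from a small made-up sample), and asks which symbols may follow a given prefix. All function names are hypothetical.

```python
from itertools import combinations


def subseqs_upto(w: str, k: int) -> set[str]:
    """All subsequences of w of length <= k, including the empty string."""
    return {"".join(c) for i in range(k + 1) for c in combinations(w, i)}


def learn_sp(sample: list[str], k: int) -> set[str]:
    """Collect every subsequence of length <= k attested in the sample."""
    grammar: set[str] = set()
    for w in sample:
        grammar |= subseqs_upto(w, k)
    return grammar


def sp_accepts(w: str, grammar: set[str], k: int) -> bool:
    """w is accepted iff all of its subsequences of length <= k are permitted."""
    return subseqs_upto(w, k) <= grammar


def permitted_next(prefix: str, grammar: set[str], alphabet: str, k: int) -> set[str]:
    """Symbols that can extend prefix without leaving the language."""
    return {s for s in alphabet if sp_accepts(prefix + s, grammar, k)}


if __name__ == "__main__":
    g = learn_sp(["aab", "abb"], 2)          # hypothetical sample; "ba" never occurs
    print(permitted_next("a", g, "ab", 2))   # both symbols still possible
    print(permitted_next("ab", g, "ab", 2))  # "a" is now blocked, and stays blocked
```

Once a symbol is blocked after some prefix, it remains blocked after every extension of that prefix, because the offending subsequence is still present.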

Slide 13

$SP_k$-automata

[Figure: automaton with states {ε}, {ε, a}, {ε, b}, {ε, c}, {ε, a, b}, {ε, a, c}, {ε, b, c}, {ε, a, b, c}; edges are labeled a, b, c.]

\[ Q = \mathcal{P}(P_{\le k-1}(\Sigma^{*})) \]

Size of automaton: $\Theta(2^{\mathrm{card}(\Sigma)^{k}})$

Slide 14

Factored $SP_k$-automata

[Figure: two factor automata, labeled SI(aa) and SI(bc), with edges over the symbols a, b, c.]
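One way to picture the factoring is to run one small automaton per forbidden subsequence and accept only if none of them hits a missing edge. The sketch below assumes the two factors forbid the subsequences `aa` and `bc` (an interpretation of the figure labels), represents each factor's state as the number of symbols matched so far, and uses hypothetical function names.

```python
from typing import Optional


def factor_step(v: str, matched: int, sigma: str) -> Optional[int]:
    """Advance the factor automaton for the (non-empty) forbidden subsequence v.

    The state is the number of leading symbols of v matched so far.
    Returns None when sigma would complete v, i.e. the edge is missing.
    """
    if sigma == v[matched]:
        matched += 1
    return None if matched == len(v) else matched


def accepts(w: str, forbidden: list[str]) -> bool:
    """Run all factor automata in parallel; reject on the first missing edge."""
    states = [0] * len(forbidden)
    for sigma in w:
        for i, v in enumerate(forbidden):
            nxt = factor_step(v, states[i], sigma)
            if nxt is None:
                return False
            states[i] = nxt
    return True


if __name__ == "__main__":
    factors = ["aa", "bc"]
    for word in ["acb", "cba", "abab", "bac"]:
        print(word, accepts(word, factors))
```

Each factor needs only as many states as its forbidden subsequence has symbols, so the combined state is a short tuple rather than an element of a powerset.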

Slide 15

SP-PDFA

[Figure: the factor PDFAs of an SP-PDFA over the symbols a and b, with states drawn from ε, a, b, aa, ab, ba, bb.]

Slide 16

Product PDFAs

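Co-emission Probability

\[ CT(\sigma, q_1 \ldots q_n) = \prod_{i=1}^{n} T_i(q_i, \sigma) \]

\[ CF(q_1 \ldots q_n) = \prod_{i=1}^{n} F_i(q_i) \]

\[ Z(q_1 \ldots q_n) = CF(q_1 \ldots q_n) + \sum_{\sigma \in \Sigma} CT(\sigma, q_1 \ldots q_n) \]

\[ F(q_1 \ldots q_n) = \frac{CF(q_1 \ldots q_n)}{Z(q_1 \ldots q_n)} \]

\[ T(q_1 \ldots q_n, \sigma) = \frac{CT(\sigma, q_1 \ldots q_n)}{Z(q_1 \ldots q_n)} \]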
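The normalization above is easy to check numerically. The following sketch computes the co-emission quantities for one product state and returns the normalized transition and final probabilities; the data layout (one dictionary per factor) and all numbers are invented for illustration.

```python
from math import prod


def product_distribution(T, F, states, alphabet):
    """Normalized T(q, .) and F(q) for the product state q = states.

    T[i] maps (q_i, sigma) to factor i's transition probability,
    F[i] maps q_i to factor i's final (stopping) probability.
    """
    # CT(sigma, q): product of the factors' transition probabilities.
    ct = {s: prod(T[i].get((q, s), 0.0) for i, q in enumerate(states))
          for s in alphabet}
    # CF(q): product of the factors' final probabilities.
    cf = prod(F[i][q] for i, q in enumerate(states))
    z = cf + sum(ct.values())
    if z == 0:
        raise ValueError("normalizer Z is zero for this product state")
    return {s: ct[s] / z for s in alphabet}, cf / z


if __name__ == "__main__":
    # Two hypothetical factors, currently in states "x" and "y".
    T = [{("x", "a"): 0.5, ("x", "b"): 0.3},
         {("y", "a"): 0.2, ("y", "b"): 0.6}]
    F = [{"x": 0.2}, {"y": 0.2}]
    trans, final = product_distribution(T, F, ("x", "y"), "ab")
    print(trans, final, sum(trans.values()) + final)
```

Dividing by Z is what turns the raw products, which generally sum to less than one, back into a probability distribution over the next symbol or stopping.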

Slide 17

Product PDFAs—k-sets

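Positive Co-emission Probability

\[ PCT(\sigma, q_\varepsilon \ldots q_u) = \prod_{\substack{q_w \in q_\varepsilon \ldots q_u \\ q_w = w}} T_w(q_w, \sigma) \]

\[ PCF(q_\varepsilon \ldots q_u) = \prod_{\substack{q_w \in q_\varepsilon \ldots q_u \\ q_w = w}} F_w(q_w) \]

\[ Z(q_1 \ldots q_n) = PCF(q_1 \ldots q_n) + \sum_{\sigma \in \Sigma} PCT(\sigma, q_1 \ldots q_n) \]

Let $q = \langle \varepsilon, \varepsilon, b, aa, a, ba, b \rangle$:

\[ CT(a, q) = T_\varepsilon(\varepsilon, a) \cdot T_a(\varepsilon, a) \cdot T_b(b, a) \cdot T_{aa}(aa, a) \cdot T_{ab}(a, a) \cdot T_{ba}(ba, a) \cdot T_{bb}(b, a) \]

\[ PCT(a, q) = T_\varepsilon(\varepsilon, a) \cdot T_b(b, a) \cdot T_{aa}(aa, a) \cdot T_{ba}(ba, a) \]

Slide 18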

Complexity

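Number of automata:

\[ \sum_{0 \le i < k} [\mathrm{card}(\Sigma)^{i}] = \Theta(\mathrm{card}(\Sigma)^{k-1}) \]

Number of states:

\[ \sum_{0 \le i < k} [(i + 1)\,\mathrm{card}(\Sigma)^{i}] = \Theta(k\,\mathrm{card}(\Sigma)^{k-1}) \]

ML estimation, with $n = \sum_{w \in S} [|w|]$ the size of the corpus: $\Theta(n\,\mathrm{card}(\Sigma)^{k-1})$ (vs. $\Theta(n)$)

$\mathrm{Pr}_L(w)$: $\Theta(n\,\mathrm{card}(\Sigma)^{k-1})$ (vs. $\Theta(n)$)

Parameters: only final states matter. $\mathrm{card}(\Sigma) \cdot \Theta(\mathrm{card}(\Sigma)^{k-1}) = \Theta(\mathrm{card}(\Sigma)^{k})$ (same)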
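The two sums can be evaluated directly for small cases. This short sketch uses hypothetical function names and simply transcribes the formulas for the number of automata and the number of states.

```python
def num_automata(sigma_size: int, k: int) -> int:
    """Sum over 0 <= i < k of card(Sigma)^i: one automaton per string shorter than k."""
    return sum(sigma_size ** i for i in range(k))


def num_states(sigma_size: int, k: int) -> int:
    """Sum over 0 <= i < k of (i + 1) * card(Sigma)^i."""
    return sum((i + 1) * sigma_size ** i for i in range(k))


if __name__ == "__main__":
    for sigma_size, k in [(2, 2), (2, 3), (3, 3)]:
        print(sigma_size, k, num_automata(sigma_size, k), num_states(sigma_size, k))
```

For a two-symbol alphabet and k = 3 this gives seven automata, in line with the seven factors $T_\varepsilon, T_a, T_b, T_{aa}, T_{ab}, T_{ba}, T_{bb}$ in the worked co-emission example.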

Slide 19

Remaining issues

  • Estimation undercounts

– counts number of k-sequences that start with first prefix: $\Theta(n)$
– actual number $\binom{n}{k} \in \Theta(2^{n})$.

  • Want probability to depend on multiset of subsequences

– infinitely many states
– but probability of n occurrences is (probability of occurrence)$^{n}$
– same number of parameters/still linear time

  • Not Regular distribution

– Not clear that there is a corresponding class of distributions over strings

Slide 20

Summary

SP-Distributions

  • Regular distribution
  • Model (some) long distance dependencies
  • Asymptotic complexity same as SL-distributions (n-gram models)
  • SL-distributions can’t model long distance dependencies
  • SP-distributions can’t model local ones
  • Both are classes of Regular distributions
  • Combination is straightforward

Slide 21

Results of $SP_2$ estimation on the Samala corpus

$\mathrm{Pr}(x \mid P_{\le 1}(y))$, rows indexed by y, columns by x:

             x = s     x = t͡s    x = ʃ     x = t͡ʃ
y = s        0.0325    0.0051    0.0013    0.0002
y = t͡s       0.0212    0.0114    0.0008    0.
y = ʃ        0.0011    0.        0.067     0.0359
y = t͡ʃ       0.0006    0.        0.0458    0.0314

