Spectral Learning Techniques for Weighted Automata, Transducers, and - PowerPoint PPT Presentation

Applications of WFA WFA Can Model: ➓ Probability distributions f A ♣ x q ✏ P r x s ➓ Binary classifiers g ♣ x q ✏ sign ♣ f A ♣ x q � θ q ➓ Real predictors f A ♣ x q ➓ Sequence predictors g ♣ x q ✏ argmax y f A ♣ x , y q (with Σ ✏ X ✂ Y ) Used In Several Applications: ➓ Speech recognition [Mohri et al., 2008] ➓ Machine translation [de Gispert et al., 2010] ➓ Image processing [Albert and Kari, 2009] ➓ OCR systems [Knight and May, 2009] ➓ System testing [Baier et al., 2009]

Useful Intuitions About f A f A ♣ x q ✏ f A ♣ x 1 . . . x T q ✏ α ❏ 0 A x 1 ☎ ☎ ☎ A x T α ✽ ✏ α ❏ 0 A x α ✽ ➓ Sum-Product: f A ♣ x q is a sum–product computation ✄ T ☛ ➳ ➵ α 0 ♣ i 0 q A x t ♣ i t ✁ 1 , i t q α ✽ ♣ i T q i 0 , i 1 ,..., i T Pr n s t ✏ 1 ➓ Forward-Backward: f A ♣ x q is dot product between forward and backward vectors α ❏ � ✟ f A ♣ ps q ✏ ☎ ♣ A s α ✽ q ✏ α p ☎ β s 0 A p ➓ Compositional Features: f A ♣ x q is a linear model α ❏ � ✟ f A ♣ x q ✏ ☎ α ✽ ✏ φ ♣ x q ☎ α ✽ 0 A x where φ : Σ ✍ Ñ R n compositional features (i.e. φ ♣ xσ q ✏ φ ♣ x q A σ )

➓ r ♣ qs ✏ r � ✏ s r ♣ qs ✏ r ⑤ ✏ s Forward–Backward Equations for A σ Any WFA A defines forward and backward maps α A , β A : Σ ✍ Ñ R n such that for any splitting x ✏ p ☎ s one has α ❏ � ✟ � ✟ f A ♣ x q ✏ 0 A p 1 ☎ ☎ ☎ A p T ☎ A s 1 ☎ ☎ ☎ A s T ✶ α ✽ ✏ α A ♣ p q ☎ β A ♣ s q

Forward–Backward Equations for A σ Any WFA A defines forward and backward maps α A , β A : Σ ✍ Ñ R n such that for any splitting x ✏ p ☎ s one has α ❏ � ✟ � ✟ f A ♣ x q ✏ 0 A p 1 ☎ ☎ ☎ A p T ☎ A s 1 ☎ ☎ ☎ A s T ✶ α ✽ ✏ α A ♣ p q ☎ β A ♣ s q Example ➓ In HMM coordinates of α A and β A have probabilistic interpretation: r α A ♣ p qs i ✏ P r p , h � 1 ✏ i s r β A ♣ s qs i ✏ P r s ⑤ h ✏ i s

Forward–Backward Equations for A σ Any WFA A defines forward and backward maps α A , β A : Σ ✍ Ñ R n such that for any splitting x ✏ p ☎ s one has α ❏ � ✟ � ✟ f A ♣ x q ✏ 0 A p 1 ☎ ☎ ☎ A p T ☎ A s 1 ☎ ☎ ☎ A s T ✶ α ✽ ✏ α A ♣ p q ☎ β A ♣ s q Key Observation Comparing f A ♣ ps q and f A ♣ pσs q reveals information about A σ : f A ♣ ps q ✏ α A ♣ p q ☎ β A ♣ s q f A ♣ pσs q ✏ α A ♣ p q ☎ A σ ☎ β A ♣ s q

Forward–Backward Equations for A σ Any WFA A defines forward and backward maps α A , β A : Σ ✍ Ñ R n such that for any splitting x ✏ p ☎ s one has α ❏ � ✟ � ✟ f A ♣ x q ✏ 0 A p 1 ☎ ☎ ☎ A p T ☎ A s 1 ☎ ☎ ☎ A s T ✶ α ✽ ✏ α A ♣ p q ☎ β A ♣ s q Key Observation Comparing f A ♣ ps q and f A ♣ pσs q reveals information about A σ : f A ♣ ps q ✏ α A ♣ p q ☎ β A ♣ s q f A ♣ pσs q ✏ α A ♣ p q ☎ A σ ☎ β A ♣ s q Hankel matrices help organize and solve these equations!

♣ q ✏ ⑤ ⑤ ☎☎☎ ✔ ☎ ☎ ☎ ✜ ✖ ✣ ✖ ✣ ✏ ✖ ✣ ✖ ✣ ✖ ✣ ✕ ✢ ♣ q ✏ ♣ q ✏ ♣ q ✏ The Hankel Matrix Two Equivalent Representations ➓ Functional: f : Σ ✍ Ñ R ➓ Matricial: H f P R Σ ✍ ✂ Σ ✍ , the Hankel matrix of f Definition: p prefix, s suffix ñ H f ♣ p , s q ✏ f ♣ p ☎ s q

♣ q ✏ ♣ q ✏ ♣ q ✏ The Hankel Matrix Two Equivalent Representations ➓ Functional: f : Σ ✍ Ñ R ➓ Matricial: H f P R Σ ✍ ✂ Σ ✍ , the Hankel matrix of f Definition: p prefix, s suffix ñ H f ♣ p , s q ✏ f ♣ p ☎ s q Example f ♣ x q ✏ ⑤ x ⑤ a λ a b aa ☎☎☎ ✔ ✜ 0 1 0 2 ☎ ☎ ☎ (number of a ’s in x ) λ 1 2 1 3 a ✖ ✣ ✖ ✣ 0 1 0 2 H f ✏ b ✖ ✣ ✖ ✣ 2 3 2 4 aa ✖ ✣ . ✕ . ✢ ... . . . .

The Hankel Matrix Two Equivalent Representations ➓ Functional: f : Σ ✍ Ñ R ➓ Matricial: H f P R Σ ✍ ✂ Σ ✍ , the Hankel matrix of f Definition: p prefix, s suffix ñ H f ♣ p , s q ✏ f ♣ p ☎ s q Example f ♣ x q ✏ ⑤ x ⑤ a λ a b aa ☎☎☎ ✔ ✜ 0 1 0 2 ☎ ☎ ☎ (number of a ’s in x ) λ 1 2 1 3 a ✖ ✣ ✖ ✣ 0 1 0 2 H f ✏ b ✖ ✣ ✖ ✣ 2 3 2 4 aa ✖ ✣ . ✕ . ✢ ... . . . . H f ♣ λ , aa q ✏ H f ♣ a , a q ✏ H f ♣ aa , λ q ✏ 2

♣ q ✏ ♣ q ✏ ♣ q ✏ The Hankel Matrix Two Equivalent Representations ➓ Functional: f : Σ ✍ Ñ R ➓ Matricial: H f P R Σ ✍ ✂ Σ ✍ , the Hankel matrix of f Definition: p prefix, s suffix ñ H f ♣ p , s q ✏ f ♣ p ☎ s q Properties λ a b aa ☎☎☎ ✔ ✜ 0 1 0 2 ☎ ☎ ☎ λ ➓ ⑤ x ⑤ � 1 entries for f ♣ x q 1 2 1 3 a ✖ ✣ ✖ ✣ 0 1 0 2 ➓ Depends on ordering of Σ ✍ H f ✏ b ✖ ✣ ✖ ✣ 2 3 2 4 aa ✖ ✣ ➓ Captures structure . ✕ . ✢ ... . . . .

A Fundamental Theorem about WFA Relates the rank of H f and the number of states of WFA computing f

A Fundamental Theorem about WFA Theorem [Carlyle and Paz, 1971, Fliess, 1974] Let f : Σ ✍ Ñ R be any function 1. If f ✏ f A for some WFA A with n states ñ rank ♣ H f q ↕ n 2. If rank ♣ H f q ✏ n ñ exists WFA A with n states s.t. f ✏ f A

A Fundamental Theorem about WFA Theorem [Carlyle and Paz, 1971, Fliess, 1974] Let f : Σ ✍ Ñ R be any function 1. If f ✏ f A for some WFA A with n states ñ rank ♣ H f q ↕ n 2. If rank ♣ H f q ✏ n ñ exists WFA A with n states s.t. f ✏ f A Why Fundamental? Because proof of (2) gives an algorithm for recovering A from the Hankel matrix of f A

A Fundamental Theorem about WFA Theorem [Carlyle and Paz, 1971, Fliess, 1974] Let f : Σ ✍ Ñ R be any function 1. If f ✏ f A for some WFA A with n states ñ rank ♣ H f q ↕ n 2. If rank ♣ H f q ✏ n ñ exists WFA A with n states s.t. f ✏ f A Why Fundamental? Because proof of (2) gives an algorithm for recovering A from the Hankel matrix of f A Example: Can recover an HMM from the probabilities it assigns to sequences of observations

♣ q ✏ ♣ ☎q ♣ q ✏ ♣☎ q Structure of Low-rank Hankel Matrices H f A P R Σ ✍ ✂ Σ ✍ P P R Σ ✍ ✂ n S P R n ✂ Σ ✍ s . ✔ ✜ . . ✔ ✜ ☎ ☎ ☎ s . ✖ . ✣ ✔ ✜ . ☎ ☎ ☎ ☎ ☎ ✌ ☎ ☎ ✖ ✣ ✖ ✣ ✖ ✣ . ✖ ✣ ✏ ☎ ☎ ☎ ☎ ☎ ✌ ☎ ☎ ✖ . ✣ . ✖ ✣ ✕ ✢ ✖ ✣ ✖ ✣ ✌ ✌ ✌ ☎ ☎ ✌ ☎ ☎ ✖ ✣ p ✕ ✢ ☎ ☎ ☎ ☎ ☎ ☎ ✌ ☎ ☎ ☎ ☎ ☎ ☎ p ✖ ✣ ☎ ☎ ☎ ✕ ✢ . . . α ❏ f A ♣ p 1 ☎ ☎ ☎ p T ☎ s 1 ☎ ☎ ☎ s T ✶ q ✏ 0 A p 1 ☎ ☎ ☎ A p T A s 1 ☎ ☎ ☎ A s T ✶ α ✽ ❧♦♦♦♦♦♦♦♦♠♦♦♦♦♦♦♦♦♥ ❧♦♦♦♦♦♦♦♦♠♦♦♦♦♦♦♦♦♥ α A ♣ p q β A ♣ s q

Structure of Low-rank Hankel Matrices H f A P R Σ ✍ ✂ Σ ✍ P P R Σ ✍ ✂ n S P R n ✂ Σ ✍ s . ✔ ✜ . . ✔ ✜ ☎ ☎ ☎ s . ✖ . ✣ ✔ ✜ . ☎ ☎ ☎ ☎ ☎ ✌ ☎ ☎ ✖ ✣ ✖ ✣ ✖ ✣ . ✖ ✣ ✏ ☎ ☎ ☎ ☎ ☎ ✌ ☎ ☎ ✖ . ✣ . ✖ ✣ ✕ ✢ ✖ ✣ ✖ ✣ ✌ ✌ ✌ ☎ ☎ ✌ ☎ ☎ ✖ ✣ p ✕ ✢ ☎ ☎ ☎ ☎ ☎ ☎ ✌ ☎ ☎ ☎ ☎ ☎ ☎ p ✖ ✣ ☎ ☎ ☎ ✕ ✢ . . . α ❏ f A ♣ p 1 ☎ ☎ ☎ p T ☎ s 1 ☎ ☎ ☎ s T ✶ q ✏ 0 A p 1 ☎ ☎ ☎ A p T A s 1 ☎ ☎ ☎ A s T ✶ α ✽ ❧♦♦♦♦♦♦♦♦♠♦♦♦♦♦♦♦♦♥ ❧♦♦♦♦♦♦♦♦♠♦♦♦♦♦♦♦♦♥ α A ♣ p q β A ♣ s q α A ♣ p q ✏ P ♣ p , ☎q β A ♣ s q ✏ S ♣☎ , s q

♣ q ✏ ♣ q ✏ ✍ ✂ ✍ ✂ ✂ P P P ✔ ✜ ☎ ☎ ☎ ✔ ✜ ✔ ✜ ☎ ☎ ☎ ✌ ✌ ✌ ☎ ☎ ✌ ☎ ☎ ✖ ✣ ✖ ✣ ✏ ☎ ☎ ☎ ✌ ✌ ✌ ☎ ☎ ✌ ☎ ☎ ✖ ✣ ✕ ✢ ✕ ✢ ✖ ✣ ✌ ✌ ✌ ✌ ✌ ✌ ☎ ☎ ✌ ☎ ☎ ✕ ✢ ☎ ☎ ☎ ❏ ✏ ☎ ☎ ☎ ☎ ☎ ☎ ☎ ☎ ✽ ✶ ❧♦♦♦♦♦♦♦♦♠♦♦♦♦♦♦♦♦♥ ❧♦♦♦♦♦♦♦♦♠♦♦♦♦♦♦♦♦♥ ♣ q ♣ q � � ✏ ñ ✏ ñ ✏ Hankel Factorizations and Operators H σ P R Σ ✍ ✂ Σ ✍ s ✔ ✜ ☎ ☎ ✖ ✣ ✖ ✣ ☎ ✖ ✣ ✖ ✣ ☎ ☎ ✌ ☎ ☎ p ✕ ✢ ☎ f A ♣ p 1 ☎ ☎ ☎ p T ☎ σ ☎ s 1 ☎ ☎ ☎ s T ✶ q

♣ q ✏ ♣ q ✏ � � ✏ ñ ✏ ñ ✏ Hankel Factorizations and Operators H σ P R Σ ✍ ✂ Σ ✍ P P R Σ ✍ ✂ n S P R n ✂ Σ ✍ A σ P R n ✂ n s ✔ ✜ ✔ ✜ ☎ ☎ ☎ ☎ s ✔ ✜ ✔ ✜ ☎ ☎ ☎ ☎ ✌ ✌ ✌ ☎ ☎ ✌ ☎ ☎ ✖ ✣ ✖ ✣ ✖ ✣ ✖ ✣ ☎ ✏ ☎ ☎ ☎ ✌ ✌ ✌ ☎ ☎ ✌ ☎ ☎ ✖ ✣ ✖ ✣ ✕ ✢ ✕ ✢ ✖ ✣ ✖ ✣ ☎ ☎ ✌ ☎ ☎ ✌ ✌ ✌ ✌ ✌ ✌ ☎ ☎ ✌ ☎ ☎ p p ✕ ✢ ✕ ✢ ☎ ☎ ☎ ☎ α ❏ f A ♣ p 1 ☎ ☎ ☎ p T ☎ σ ☎ s 1 ☎ ☎ ☎ s T ✶ q ✏ 0 A p 1 ☎ ☎ ☎ A p T ☎ A σ ☎ A s 1 ☎ ☎ ☎ A s T ✶ α ✽ ❧♦♦♦♦♦♦♦♦♠♦♦♦♦♦♦♦♦♥ ❧♦♦♦♦♦♦♦♦♠♦♦♦♦♦♦♦♦♥ α A ♣ p q β A ♣ s q

♣ q ✏ ♣ q ✏ Hankel Factorizations and Operators H σ P R Σ ✍ ✂ Σ ✍ P P R Σ ✍ ✂ n S P R n ✂ Σ ✍ A σ P R n ✂ n s ✔ ✜ ✔ ✜ ☎ ☎ ☎ ☎ s ✔ ✜ ✔ ✜ ☎ ☎ ☎ ☎ ✌ ✌ ✌ ☎ ☎ ✌ ☎ ☎ ✖ ✣ ✖ ✣ ✖ ✣ ✖ ✣ ☎ ✏ ☎ ☎ ☎ ✌ ✌ ✌ ☎ ☎ ✌ ☎ ☎ ✖ ✣ ✖ ✣ ✕ ✢ ✕ ✢ ✖ ✣ ✖ ✣ ☎ ☎ ✌ ☎ ☎ ✌ ✌ ✌ ✌ ✌ ✌ ☎ ☎ ✌ ☎ ☎ p p ✕ ✢ ✕ ✢ ☎ ☎ ☎ ☎ α ❏ f A ♣ p 1 ☎ ☎ ☎ p T ☎ σ ☎ s 1 ☎ ☎ ☎ s T ✶ q ✏ 0 A p 1 ☎ ☎ ☎ A p T ☎ A σ ☎ A s 1 ☎ ☎ ☎ A s T ✶ α ✽ ❧♦♦♦♦♦♦♦♦♠♦♦♦♦♦♦♦♦♥ ❧♦♦♦♦♦♦♦♦♠♦♦♦♦♦♦♦♦♥ α A ♣ p q β A ♣ s q A σ ✏ P � H σ S � H ✏ P S ñ H σ ✏ P A σ S ñ = =

Hankel Factorizations and Operators H σ P R Σ ✍ ✂ Σ ✍ P P R Σ ✍ ✂ n S P R n ✂ Σ ✍ A σ P R n ✂ n s ✔ ✜ ✔ ✜ ☎ ☎ ☎ ☎ s ✔ ✜ ✔ ✜ ☎ ☎ ☎ ☎ ✌ ✌ ✌ ☎ ☎ ✌ ☎ ☎ ✖ ✣ ✖ ✣ ✖ ✣ ✖ ✣ ☎ ✏ ☎ ☎ ☎ ✌ ✌ ✌ ☎ ☎ ✌ ☎ ☎ ✖ ✣ ✖ ✣ ✕ ✢ ✕ ✢ ✖ ✣ ✖ ✣ ☎ ☎ ✌ ☎ ☎ ✌ ✌ ✌ ✌ ✌ ✌ ☎ ☎ ✌ ☎ ☎ p p ✕ ✢ ✕ ✢ ☎ ☎ ☎ ☎ α ❏ f A ♣ p 1 ☎ ☎ ☎ p T ☎ σ ☎ s 1 ☎ ☎ ☎ s T ✶ q ✏ 0 A p 1 ☎ ☎ ☎ A p T ☎ A σ ☎ A s 1 ☎ ☎ ☎ A s T ✶ α ✽ ❧♦♦♦♦♦♦♦♦♠♦♦♦♦♦♦♦♦♥ ❧♦♦♦♦♦♦♦♦♠♦♦♦♦♦♦♦♦♥ α A ♣ p q β A ♣ s q A σ ✏ P � H σ S � H ✏ P S ñ H σ ✏ P A σ S ñ = = Note: Works with finite sub-blocks as well (assuming rank ♣ P q ✏ rank ♣ S q ✏ n )

General Learning Algorithm for WFA Hankel Low-rank matrix Factorization and Data WFA estimation matrix linear algebra

General Learning Algorithm for WFA Hankel Low-rank matrix Factorization and Data WFA estimation matrix linear algebra Key Idea: The Hankel Trick 1. Learn a low-rank Hankel matrix that implicitly induces “latent” states 2. Recover the states from a decomposition of the Hankel matrix

✒ ✚ ✒ ✚ ✒ ✚ ✁ ✁ ✏ ✏ ✏ ✁ ✁ Limitations of WFA Invariance Under Change of Basis For any invertible matrix Q the following WFA are equivalent: ➓ A ✏ ① α 0 , α ✽ , t A σ ✉② ➓ B ✏ ① Q ❏ α 0 , Q ✁ 1 α ✽ , t Q ✁ 1 A σ Q ✉② f A ♣ x q ✏ α ❏ 0 A x 1 ☎ ☎ ☎ A x T α ✽ ✏ ♣ α ❏ 0 Q q♣ Q ✁ 1 A x 1 Q q ☎ ☎ ☎ ♣ Q ✁ 1 A x T Q q♣ Q ✁ 1 α ✽ q ✏ f B ♣ x q

Limitations of WFA Invariance Under Change of Basis For any invertible matrix Q the following WFA are equivalent: ➓ A ✏ ① α 0 , α ✽ , t A σ ✉② ➓ B ✏ ① Q ❏ α 0 , Q ✁ 1 α ✽ , t Q ✁ 1 A σ Q ✉② f A ♣ x q ✏ α ❏ 0 A x 1 ☎ ☎ ☎ A x T α ✽ ✏ ♣ α ❏ 0 Q q♣ Q ✁ 1 A x 1 Q q ☎ ☎ ☎ ♣ Q ✁ 1 A x T Q q♣ Q ✁ 1 α ✽ q ✏ f B ♣ x q Example ✒ 0.5 ✚ ✒ ✚ ✒ ✚ 0.1 0 1 0.3 ✁ 0.2 Q ✁ 1 A a Q ✏ A a ✏ Q ✏ 0.2 0.3 ✁ 1 0 ✁ 0.1 0.5

Limitations of WFA Invariance Under Change of Basis For any invertible matrix Q the following WFA are equivalent: ➓ A ✏ ① α 0 , α ✽ , t A σ ✉② ➓ B ✏ ① Q ❏ α 0 , Q ✁ 1 α ✽ , t Q ✁ 1 A σ Q ✉② f A ♣ x q ✏ α ❏ 0 A x 1 ☎ ☎ ☎ A x T α ✽ ✏ ♣ α ❏ 0 Q q♣ Q ✁ 1 A x 1 Q q ☎ ☎ ☎ ♣ Q ✁ 1 A x T Q q♣ Q ✁ 1 α ✽ q ✏ f B ♣ x q Consequences ➓ There is no unique parametrization for WFA ➓ Given A it is undecidable whether ❅ x f A ♣ x q ➙ 0 ➓ Cannot expect to recover a probabilistic parametrization

Outline 1. Weighted Automata and Hankel Matrices 2. Spectral Learning of Probabilistic Automata 3. Spectral Methods for Transducers and Grammars Sequence Tagging Finite-State Transductions Tree Automata 4. Hankel Matrices with Missing Entries 5. Conclusion 6. References

Spectral Learning of Probabilistic Automata Hankel Low-rank matrix Factorization and Data WFA estimation linear algebra matrix Basic Setup: ➓ Data are strings sampled from probability distribution on Σ ✍ ➓ Hankel matrix is estimated by empiricial probabilities ➓ Factorization and low-rank approximation is computed using SVD

✏ t ✉ ✏ t ✉ ✩ ✱ ✬ ✴ ✬ ✴ ✫ ✳ ✏ Ñ ♣ q ✏ ✓ ✬ ✴ ✬ ✴ ✪ ✲ The Empirical Hankel Matrix Suppose S ✏ ♣ x 1 , . . . , x N q is a sample of N i.i.d. strings Empirical distribution Empirical Hankel matrix H S ♣ p , s q ✏ ˆ ˆ N f S ♣ ps q f S ♣ x q ✏ 1 I r x i ✏ x s ˆ ➳ N i ✏ 1

✏ t ✉ ✏ t ✉ The Empirical Hankel Matrix Suppose S ✏ ♣ x 1 , . . . , x N q is a sample of N i.i.d. strings Empirical distribution Empirical Hankel matrix H S ♣ p , s q ✏ ˆ ˆ N f S ♣ ps q f S ♣ x q ✏ 1 I r x i ✏ x s ˆ ➳ N i ✏ 1 Example: ✩ ✱ aa , b , bab , a , ✬ ✴ ✬ ✴ b , a , ab , aa , f S ♣ aa q ✏ 5 ✫ ✳ ˆ S ✏ Ñ 16 ✓ 0.31 − ba , b , aa , a , ✬ ✴ ✬ ✴ aa , bab , b , aa ✪ ✲

The Empirical Hankel Matrix Suppose S ✏ ♣ x 1 , . . . , x N q is a sample of N i.i.d. strings Empirical distribution Empirical Hankel matrix H S ♣ p , s q ✏ ˆ ˆ N f S ♣ ps q f S ♣ x q ✏ 1 I r x i ✏ x s ˆ ➳ N i ✏ 1 Example: a b ✩ ✱ aa , b , bab , a , ✔ .19 .25 ✜ λ ✬ ✴ ✬ ✴ b , a , ab , aa , .31 .06 ✫ ✳ a ˆ ✖ ✣ S ✏ Ñ H S ✏ − ✖ ✣ ba , b , aa , a , .06 .00 b ✕ ✢ ✬ ✴ ✬ ✴ aa , bab , b , aa .00 .13 ✪ ✲ ba (Hankel with rows P ✏ t λ , a , b , ba ✉ and columns S ✏ t a , b ✉ )

Finite Sub-blocks of Hankel Matrices Parameters: ➓ Set of rows (prefixes) P ⑨ Σ ✍ ➓ Set of columns (suffixes) S ⑨ Σ ✍ S ... Σ λ a b aa ab h λ , S 1 0.3 0.7 0.05 0.25 . . . λ H 0.3 0.05 0.25 0.02 0.03 . . . a P 0.7 0.6 0.1 0.03 0.2 . . . b 0.05 0.02 0.03 0.017 0.003 . . . aa H a h P , λ 0.25 0.23 0.02 0.11 0.12 . . . ab . . . . . . ... . . . . . . . . . . . . ➓ H P R P ✂ S for finding P and S ➓ H σ P R P ✂ S for finding A σ ➓ h λ , S P R 1 ✂ S for finding α 0 ➓ h P , λ P R P ✂ 1 for finding α ✽

Low-rank Approximation and Factorization Will use the singular value decomposition (SVD) as the main building block Hence the name spectral!

❏ ✓ ❧♦ ♦♠♦ ♦♥ ❧♦ ♦♠♦ ♦♥ ❧♦ ♦♠♦ ♦♥ ❧♦ ♦♠♦ ♦♥ ✂ ✂ ✂ ✂ ✓ � ✏ ✁ ❏ q � ✟ � ✏ ñ ✏ ♣ � ✏ ❏ ✏ ñ Low-rank Approximation and Factorization Parameters: ➓ Desired number of states n ➓ Block H P R P ✂ S of the empirical Hankel matrix

✓ � ✏ ✁ ❏ q � ✟ � ✏ ñ ✏ ♣ � ✏ ❏ ✏ ñ Low-rank Approximation and Factorization Parameters: ➓ Desired number of states n ➓ Block H P R P ✂ S of the empirical Hankel matrix Low-rank Approximation: compute truncated SVD of rank n V ❏ ✓ H U n Λ n n ❧♦ ♦♠♦ ♦♥ ❧♦ ♦♠♦ ♦♥ ❧♦ ♦♠♦ ♦♥ ❧♦ ♦♠♦ ♦♥ P ✂ S n ✂ n P ✂ n n ✂ S

Low-rank Approximation and Factorization Parameters: ➓ Desired number of states n ➓ Block H P R P ✂ S of the empirical Hankel matrix Low-rank Approximation: compute truncated SVD of rank n V ❏ ✓ H U n Λ n n ❧♦ ♦♠♦ ♦♥ ❧♦ ♦♠♦ ♦♥ ❧♦ ♦♠♦ ♦♥ ❧♦ ♦♠♦ ♦♥ P ✂ S n ✂ n P ✂ n n ✂ S Factorization: H ✓ PS given by SVD, pseudo-inverses are easy P � ✏ Λ ✁ 1 n U ❏ ✏ ♣ HV n q � ✟ � P ✏ U n Λ n ñ n S � ✏ V n S ✏ V ❏ ñ n

� ✏ � ✁ ❏ q � � ✟ ✏ ✏ ♣ ❏ ✏ � ✏ � ✁ ❏ q � � ✟ ✽ ✏ ✏ ✏ ♣ ✽ Computing the WFA Parameters: ➓ Factorization H ✓ ♣ U Λ q ☎ V ❏ ✏ P ☎ S ➓ Hankel blocks H σ , h λ , S , h P , λ

✽ Computing the WFA Parameters: ➓ Factorization H ✓ ♣ U Λ q ☎ V ❏ ✏ P ☎ S ➓ Hankel blocks H σ , h λ , S , h P , λ Equations: A σ ✏ P � H σ S � ✏ Λ ✁ 1 U ❏ H σ V ✏ ♣ HV q � H σ V � ✟ h λ , S S � ✏ α ❏ 0 ✏ h λ , S V P � h P , λ ✏ Λ ✁ 1 U ❏ h P , λ ✏ ♣ HV q � h P , λ � ✟ α ✽ ✏

Computing the WFA Parameters: ➓ Factorization H ✓ ♣ U Λ q ☎ V ❏ ✏ P ☎ S ➓ Hankel blocks H σ , h λ , S , h P , λ Equations: A σ ✏ P � H σ S � ✏ Λ ✁ 1 U ❏ H σ V ✏ ♣ HV q � H σ V � ✟ h λ , S S � ✏ α ❏ 0 ✏ h λ , S V P � h P , λ ✏ Λ ✁ 1 U ❏ h P , λ ✏ ♣ HV q � h P , λ � ✟ α ✽ ✏ Full Algorithm 1. Estimate empirical Hankel and retrieve sub-blocks H , H σ , h λ , S , h P , λ 2. Perform SVD of H 3. Solve for A σ , α 0 , α ✽ with pseudo-inverses

Computational and Statistical Complexity Running Time: ➓ Empirical Hankel matrix: O ♣⑤ PS ⑤ ☎ N q ➓ SVD and linear algebra: O ♣⑤ P ⑤ ☎ ⑤ S ⑤ ☎ n q Statistical Consistency: ➓ By law of large numbers, ˆ H S Ñ E r H s when N Ñ ✽ ➓ If E r H s is Hankel of some WFA A , then ˆ A Ñ A ➓ Works for data coming from PFA and HMM PAC Analysis: (assuming data from A with n states) ❄ ➓ With high probability, ⑥ ˆ H S ✁ H ⑥ ↕ O ♣ 1 ④ N q ➓ When N ➙ O ♣ n ⑤ Σ ⑤ 2 T 4 ④ ε 2 s n ♣ H q 4 q , then ➳ ⑤ f A ♣ x q ✁ f ˆ A ♣ x q⑤ ↕ ε ⑤ x ⑤↕ T Proofs can be found in [Hsu et al., 2009, Bailly, 2011, Balle, 2013]

Practical Considerations Hankel Low-rank matrix Factorization and Data WFA estimation matrix linear algebra Basic Setup: ➓ Data are strings sampled from probability distribution on Σ ✍ ➓ Hankel matrix is estimated by empiricial probabilities ➓ Factorization and low-rank approximation is computed using SVD Advanced Implementations: ➓ Choice of parameters P and S ➓ Scalable estimation and factorization of Hankel matrices ➓ Smoothing and variance normalization ➓ Use of prefix and substring statistics

↕ ✏ ✏ ➙ ➓ ➓ ➓ Choosing the Basis Definition: The pair ♣ P , S q defining the sub-block is called a basis Intuitions: ➓ Basis should be choosen such that E r H s has full rank ➓ P must contain strings reaching each possible state of the WFA ➓ S must contain string producing different outcomes for each pair of states in the WFA

Choosing the Basis Definition: The pair ♣ P , S q defining the sub-block is called a basis Intuitions: ➓ Basis should be choosen such that E r H s has full rank ➓ P must contain strings reaching each possible state of the WFA ➓ S must contain string producing different outcomes for each pair of states in the WFA Popular Approaches: ➓ Set P ✏ S ✏ Σ ↕ k for some k ➙ 1 [Hsu et al., 2009] ➓ Choose P and S to contain the K most frequent prefixes and suffixes in the sample [Balle et al., 2012] ➓ Take all prefixes and suffixes appearing in the sample [Bailly et al., 2009]

Scalable Implementations Problem: When ⑤ Σ ⑤ is large, even the simplest basis become huge Hankel Matrix Representation: ➓ Use hash functions to map P ( S ) to row (column) indices ➓ Use sparse matrix data structures because statistics are usually sparse ➓ Never store the full Hankel matrix in memory Efficient SVD Computation: ➓ SVD for sparse matrices [Berry, 1992] ➓ Approximate randomized SVD [Halko et al., 2011] ➓ On-line SVD with rank 1 updates [Brand, 2006]

➓ ➓ ➓ ➓ Refining the Statistics in the Hankel Matrix Smoothing the Estimates ➓ Empirical probabilities ˆ f S ♣ x q tend to be sparse ➓ Like in n -gram models, smoothing can help when Σ is large ➓ Should take into account that strings in PS have different lengths ➓ Open Problem: How to smooth empirical Hankels properly

Refining the Statistics in the Hankel Matrix Smoothing the Estimates ➓ Empirical probabilities ˆ f S ♣ x q tend to be sparse ➓ Like in n -gram models, smoothing can help when Σ is large ➓ Should take into account that strings in PS have different lengths ➓ Open Problem: How to smooth empirical Hankels properly Row and Column Weighting ➓ More frequent prefixes (suffixes) have better estimated rows (columns) ➓ Can scale rows and columns to reflect that ➓ Will lead to more reliable SVD decompositions ➓ See [Cohen et al., 2013] for details

➳ ✏ r s ✏ ✩ ✱ ✔ ✜ ✬ ✴ ✬ ✴ ✫ ✳ ✖ ✣ ✏ Ñ ✏ ✖ ✣ ✕ ✢ ✬ ✴ ✬ ✴ ✪ ✲ ✩ ✱ ✔ ✜ ✬ ✴ ✬ ✴ ✫ ✳ ✖ ✣ ✏ Ñ ✏ ✖ ✣ ✕ ✢ ✬ ✴ ✬ ✴ ✪ ✲ Substring Statistics Problem: If the sample contains strings with wide range of lengths, small basis will ignore most of the examples

➳ ✏ r s ✏ ✩ ✱ ✔ ✜ ✬ ✴ ✬ ✴ ✫ ✳ ✖ ✣ ✏ Ñ ✏ ✖ ✣ ✕ ✢ ✬ ✴ ✬ ✴ ✪ ✲ Substring Statistics Problem: If the sample contains strings with wide range of lengths, small basis will ignore most of the examples String Statistics (occurence probability): a b ✩ ✱ ✔ ✜ aa , b , bab , a , .19 .06 λ ✬ ✴ ✬ ✴ bbab , abb , babba , abbb , .06 .06 ✫ ✳ a ˆ ✖ ✣ S ✏ Ñ H ✏ − ✖ ✣ ab , a , aabba , baa , .00 .06 b ✕ ✢ ✬ ✴ ✬ ✴ abbab , baba , bb , a .06 .06 ✪ ✲ ba

Substring Statistics Problem: If the sample contains strings with wide range of lengths, small basis will ignore most of the examples String Statistics (occurence probability): a b ✩ ✱ ✔ ✜ aa , b , bab , a , .19 .06 λ ✬ ✴ ✬ ✴ bbab , abb , babba , abbb , .06 .06 ✫ ✳ a ˆ ✖ ✣ S ✏ Ñ H ✏ − ✖ ✣ ab , a , aabba , baa , .00 .06 b ✕ ✢ ✬ ✴ ✬ ✴ abbab , baba , bb , a .06 .06 ✪ ✲ ba Substring Statistics (expected number of occurences as substring): N ✏ 1 ➳ r number of occurences of x in x i s Empirical expectation N i ✏ 1 a b ✩ ✱ ✔ ✜ aa , b , bab , a , 1.31 1.56 λ ✬ ✴ ✬ ✴ bbab , abb , babba , abbb , .19 .62 ✫ ✳ ˆ a ✖ ✣ S ✏ Ñ H ✏ − ✖ ✣ ab , a , aabba , baa , .56 .50 b ✕ ✢ ✬ ✴ ✬ ✴ abbab , baba , bb , a .06 .31 ✪ ✲ ba

Substring Statistics Theorem [Balle et al., 2014] If a probability distribution f is computed by a WFA with n states, then the corresponding substring statistics are also computed by a WFA with n states Learning from Substring Statistics ➓ Can work with smaller Hankel matrices ➓ But estimating the matrix takes longer

Experiment: PoS-tag Sequence Models Spectral, Σ basis 74 Spectral, basis k=25 Spectral, basis k=50 Spectral, basis k=100 72 Spectral, basis k=300 Spectral, basis k=500 Word Error Rate (%) Unigram 70 Bigram 68 66 64 62 60 0 10 20 30 40 50 Number of States ➓ PTB sequences of simplified PoS tags [Petrov et al., 2012] ➓ Configuration: expectations on frequent substrings ➓ Metric: error rate on predicting next symbol in test sequences

Experiment: PoS-tag Sequence Models 70 68 Word Error Rate (%) 66 64 62 Spectral, Σ basis 60 Spectral, basis k=500 EM Unigram 58 Bigram 0 10 20 30 40 50 Number of States ➓ Comparison with a bigram baseline and EM ➓ Metric: error rate on predicting next symbol in test sequences ➓ At training, the Spectral Method is → 100 faster than EM

Outline 1. Weighted Automata and Hankel Matrices 2. Spectral Learning of Probabilistic Automata 3. Spectral Methods for Transducers and Grammars Sequence Tagging Finite-State Transductions Tree Automata 4. Hankel Matrices with Missing Entries 5. Conclusion 6. References

Sequence Tagging and Transduction ➓ Many applications involve pairs of input-output sequences: ➓ Sequence tagging (one output tag per input token) e.g.: part of speech tagging output: NNP NNP VBZ NNP . input: Ms. Haag plays Elianti . ➓ Transductions (sequence lenghts might differ) e.g.: spelling correction output: a p p l e input: a p l e ➓ Finite-state automata are classic methods to model these relations. Spectral methods apply naturally to this setting.

Sequence Tagging ➓ Notation: ➓ Input alphabet X ➓ Output alphabet Y ➓ Joint alphabet Σ ✏ X ✂ Y ➓ Goal: map input sequences to output sequences of the same length ➓ Approach: learn a function f : ♣ X ✂ Y q ✍ Ñ R Then, given an input x P X T return argmax f ♣ x , y q y P Y T (note: this maximization is not tractable in general)

✂ ➓ ✏ ① ✽ t ✉② ➓ q ✍ Ñ ♣ ✂ ❏ ❏ ♣ q ✏ ☎ ☎ ☎ ✽ ✏ ✽ Weighted Finite Tagger ➓ Notation: ➓ X ✂ Y : joint alphabet – finite set ➓ n : number of states – positive integer ➓ α 0 : initial weights – vector in R n (features of empty prefix) ➓ α ✽ : final weights – vector in R n (features of empty suffix) a : transition weights – matrix in R n ✂ n ( ❅ a P X , b P Y ) ➓ A b

➓ q ✍ Ñ ♣ ✂ ❏ ❏ ♣ q ✏ ☎ ☎ ☎ ✽ ✏ ✽ Weighted Finite Tagger ➓ Notation: ➓ X ✂ Y : joint alphabet – finite set ➓ n : number of states – positive integer ➓ α 0 : initial weights – vector in R n (features of empty prefix) ➓ α ✽ : final weights – vector in R n (features of empty suffix) a : transition weights – matrix in R n ✂ n ( ❅ a P X , b P Y ) ➓ A b ➓ Definition: WFTagger with n states over X ✂ Y A ✏ ① α 0 , α ✽ , t A b a ✉②

Weighted Finite Tagger ➓ Notation: ➓ X ✂ Y : joint alphabet – finite set ➓ n : number of states – positive integer ➓ α 0 : initial weights – vector in R n (features of empty prefix) ➓ α ✽ : final weights – vector in R n (features of empty suffix) a : transition weights – matrix in R n ✂ n ( ❅ a P X , b P Y ) ➓ A b ➓ Definition: WFTagger with n states over X ✂ Y A ✏ ① α 0 , α ✽ , t A b a ✉② ➓ Compositional Function: Every WFTagger defines a function f A : ♣ X ✂ Y q ✍ Ñ R f A ♣ x 1 . . . x T , y 1 . . . y T q ✏ α ❏ x T α ✽ ✏ α ❏ 0 A y 1 x 1 ☎ ☎ ☎ A y T 0 A y x α ✽

The Spectral Method for WFTaggers Low-rank matrix Hankel Factorization and Data WFA estimation linear algebra matrix ➓ Assume f ♣ x , y q ✏ P ♣ x , y q ➓ Same mechanics as for WFA, with Σ ✏ X ✂ Y ➓ In a nutshell: 1. Choose set of prefixes and suffixes to define Hankel Ñ in this case they are bistrings 2. Estimate Hankel with prefix-suffix training statistics 3. Factorize Hankel using SVD 4. Compute α and β projections, and compute operators ① α 0 , α ✽ , t A σ ✉② ➓ Other cases: ➓ f A ♣ x , y q ✏ P ♣ y ⑤ x q — see [Balle et al., 2011] ➓ f A ♣ x , y q non-probabilistic — see [Quattoni et al., 2014]

➳ ❏ ✾ ✽ ✏ ✄ ☛ ✄ ☛ ❏ ➳ ➳ ✁ � ✾ ✽ ✁ � ✁ � ❧♦♦♦♦♦♦♦♦♦♦♦♦♠♦♦♦♦♦♦♦♦♦♦♦♦♥ ❧♦♦♦♦♦♦♦♦♦♦♦♦♠♦♦♦♦♦♦♦♦♦♦♦♦♥ ✍ ♣ ✍ ♣ ✁ q q � ✄ ➳ ☛ ✄ ➳ ☛ ✍ ♣ ✍ ♣ ✍ ♣ ✍ ♣ q ✏ ✁ q q ✏ q � P P Prediction with WFTaggers ➓ Assume f A ♣ x , y q ✏ P ♣ x , y q ➓ Given x 1 : T , compute most likely output tag at position t : argmax µ ♣ t , a q a P Y where ➳ µ ♣ t , a q ✜ P ♣ y t ✏ a ⑤ x q ✾ P ♣ x , y q y ✏ y 1 ... a ... y T

Prediction with WFTaggers ➓ Assume f A ♣ x , y q ✏ P ♣ x , y q ➓ Given x 1 : T , compute most likely output tag at position t : argmax µ ♣ t , a q a P Y where ➳ µ ♣ t , a q ✜ P ♣ y t ✏ a ⑤ x q ✾ P ♣ x , y q y ✏ y 1 ... a ... y T ➳ α ❏ 0 A y ✾ x α ✽ y ✏ y 1 ... a ... y T ✄ ☛ ✄ ☛ A y 1 : t ✁ 1 A y i � 1 : T ✾ α ❏ ➳ A a ➳ α ✽ x 1 : t ✁ 1 x t � 1 : T 0 x t y 1 ... y t ✁ 1 y t � 1 ... y T ❧♦♦♦♦♦♦♦♦♦♦♦♦♠♦♦♦♦♦♦♦♦♦♦♦♦♥ ❧♦♦♦♦♦♦♦♦♦♦♦♦♠♦♦♦♦♦♦♦♦♦♦♦♦♥ α ✍ β ✍ A ♣ x 1 : t ✁ 1 q A ♣ x t � 1 : T q ✄ ➳ ☛ ✄ ➳ ☛ α ✍ A ♣ x 1 : t q ✏ α ✍ A b β ✍ A b β ✍ A ♣ x 1 : t ✁ 1 q A ♣ x t : T q ✏ A ♣ x t � 1 : T q x t x t b P Y b P Y

Prediction with WFTaggers (II) ➓ Assume f A ♣ x , y q ✏ P ♣ x , y q ➓ Given x 1 : T , compute most likely output bigram ab at position t : argmax µ ♣ t , a , b q a , b P Y where µ ♣ t , a , b q ✏ P ♣ y t ✏ a , y t � 1 ✏ b ⑤ x q α ✍ A ♣ x 1 : t ✁ 1 q A a x t A b x t � 1 β ✍ ✾ A ♣ x t � 2 : T q ➓ Compute most likely full sequence y – intractable In practice, use Minimum Bayes-Risk decoding: ➳ argmax µ ♣ t , y t , y t � 1 q y P Y T t

Finite State Transducers c e d a a-c ǫ -d (ab,cde) b-e b ➓ A WFTransducer evaluates aligned strings, using the empty symbol ǫ to produce one-to-one alignments: f ♣ c d e b q ✏ α ❏ 0 A c a A d ǫ A e b α ✽ a ǫ ➓ Then, a function g can be defined on unaligned strings by aggregating alignments ➳ g ♣ ab , cde q ✏ f ♣ π q π P Π ♣ ab , cde q

Ñ Ñ Ñ ➓ ➓ ➓ Finite State Transducers: Main Problems ➓ Prediction: given an FST A , how to . . . ➓ Compute g ♣ x , y q for unaligned strings? ➓ Compute marginal quantities µ ♣ edge q ✏ P ♣ edge ⑤ x q ? ➓ Compute most-likely y for given x ?

➓ ➓ ➓ Finite State Transducers: Main Problems ➓ Prediction: given an FST A , how to . . . ➓ Compute g ♣ x , y q for unaligned strings? Ñ using edit-distance recursions ➓ Compute marginal quantities µ ♣ edge q ✏ P ♣ edge ⑤ x q ? Ñ also using edit-distance recursions ➓ Compute most-likely y for given x ? Ñ use MBR-decoding with marginal scores

Finite State Transducers: Main Problems ➓ Prediction: given an FST A , how to . . . ➓ Compute g ♣ x , y q for unaligned strings? Ñ using edit-distance recursions ➓ Compute marginal quantities µ ♣ edge q ✏ P ♣ edge ⑤ x q ? Ñ also using edit-distance recursions ➓ Compute most-likely y for given x ? Ñ use MBR-decoding with marginal scores ➓ Unsupervised Learning: learn an FST from pairs of unaligned strings ➓ Unlike with EM, the spectral method can not recover latent structure such as alignments (recall: alignments are needed to estimate Hankel entries) ➓ See [Bailly et al., 2013b] for a solution based on Hankel matrix completion

Spectral Learning of Tree Automata and Grammars S NP VP noun verb NP Mary det noun plays the guitar Some References: ➓ Tree Series: [Bailly et al., 2010, Bailly et al., 2010] ➓ Latent-annotated PCFG: [Cohen et al., 2012, Cohen et al., 2013] ➓ Dependency parsing: [Luque et al., 2012, Dhillon et al., 2012] ➓ Unsupervised learning of WCFG: [Bailly et al., 2013a, Parikh et al., 2014] ➓ Synchronous grammars: [Saluja et al., 2014]

☎ ☞ ☛ ❏ ✄ ✁ ✁ ✠ ✠ ✏ ✌ ✏ ❜ ♣ q ✝ ✍ ✍ ✆ ☎ ☞ ☛ ❏ ✄ ✏ ✌ ✏ ♣ ♣ q ❜ ♣ qq ✝ ✍ ✆ ✍ Compositional Functions over Trees ☎ ☞ ☎ ☞ ☛ ❏ ☎ ☞ a a ✄ a a b a b a ✌ ✏ ✌ ✏ α A f f β A ✝ ✍ ✝ ✍ ✍ b ✆ ✌ c ✆ ✆ c c c c c b b b b b b

☎ ☞ ☛ ❏ ✄ ✏ ✌ ✏ ♣ ♣ q ❜ ♣ qq ✝ ✍ ✆ ✍ Compositional Functions over Trees ☎ ☞ ☎ ☞ ☛ ❏ ☎ ☞ a a ✄ a a b a b a ✌ ✏ ✌ ✏ α A f f β A ✝ ✍ ✝ ✍ ✍ b ✆ ✌ c ✆ ✆ c c c c c b b b b b b ☎ ☞ ☛ ❏ a ✄ a ✁ ✁ ✠ ✠ c b a ✏ f ✌ ✏ α A A a β A ❜ β A ♣ c q ✝ ✍ b ✍ ✆ b b c c b b

Compositional Functions over Trees ☎ ☞ ☎ ☞ ☛ ❏ ☎ ☞ a a ✄ a a b a b a ✌ ✏ ✌ ✏ α A f f β A ✝ ✍ ✝ ✍ ✍ b ✆ ✌ c ✆ ✆ c c c c c b b b b b b ☎ ☞ ☛ ❏ a ✄ a ✁ ✁ ✠ ✠ c b a ✏ f ✌ ✏ α A A a β A ❜ β A ♣ c q ✝ ✍ b ✍ ✆ b b c c b b ☎ ☞ ☛ ❏ a ✄ a b a ✏ f ✌ ✏ α A A c ♣ β A ♣ b q ❜ β A ♣ b qq ✝ ✍ b a ✆ c c c ✍ b b

❞ ✏ Inside-Outside Composition of Trees a a c c ✍ b b c a c a b b t ✏ t o ❞ t i note: i-o composition generalizes the notion of concatenation in strings, i.e., outside trees are prefixes, inside trees are suffixes

Spectral Learning Techniques for Weighted Automata, Transducers, and - PowerPoint PPT Presentation

Spectral Learning Techniques for Weighted Automata, Transducers, and Grammars Borja Balle Ariadna Quattoni Xavier Carreras q McGill University q Xerox Research Centre Europe TUTORIAL @ EMNLP 2014 Status Quo

Scikit Spectral Learning (SpLearn): a toolbox for the spectral learning of weighted automata Denis

Automata and program analysis Thomas Colcombet FCT Bordeaux 13 September 2017 based on

Weighted graphs Weighted graphs Weighted graphs Weighted graphs Graphs with numbers, called

A Nivat Theorem for Weighted Timed Automata and Weighted Relative Distance Logic Manfred Droste

Weighted Automata and Logics for Infinite Nested Words Manfred Droste and Stefan D uck

Spectral Clustering Spectral Clustering? Spectral methods Methods using eigenvectors of

An Introduction to Spectral Learning Hanxiao Liu November 8, 2013 An Introduction to Spectral

Rigorous Approximated Determinization of Weighted Automata Benjamin Aminof (Hebrew University)

A Weak Bisimulation for Weighted Automata Peter Kemper College of William and Mary Weighted

CSC 473 Automata, Grammars & Languages 9/29/10 Automata, Grammars and Languages Discourse 03

Weighted graphs 2 Weighted graphs So far we have only considered weighted graphs with

Weighted graphs 3 Weighted graph Edges in weighted graph are assigned a weight: w(v 1 , v 2 ),

Spectral estimates and Random Weighted Sobolev Inequalities Didier Robert in collaboration with

Confidence Weighted Marginal Utility Analyses of Internet Mapping Techniques Craig Prince and

Weighted Automata and Concurrency Akash Lal Microsoft Research, India Microsoft Research, India

Rationality & Recognisability An introduction to weighted automata theory Tutorial given at

Your 2 is My 1, Your 3 is My 9: Handling Arbitrary Miscalibrations in Ratings Jingyan Wang,

A First Investigation of Sturmian Trees Jean Berstel 2 , Luc Boasson 1 Olivier Carton 1 , Isabelle

Observations of IPv6 Addresses David Malone <David.Malone@nuim.ie> Hamilton Institute, NUI

Finite-state Strategies in Delay Games Martin Zimmermann Saarland University September 21st,

Predicting Share Prices in Real-Time with Apache Spark and Apache Ignite MANUEL MOURATO Summary

Fractal Structures in Functions Related to Number Theory Je ff Lagarias University of Michigan

String Matching Inge Li Grtz CLRS 32 String Matching String matching problem: string

1 & 2 Samuel Series Lesson #001 February 5, 2015 Dean Bible Ministries

Spectral Learning Techniques for Weighted Automata, Transducers, and - PowerPoint PPT Presentation

Spectral Learning Techniques for Weighted Automata, Transducers, and Grammars Borja Balle Ariadna Quattoni Xavier Carreras q McGill University q Xerox Research Centre Europe TUTORIAL @ EMNLP 2014 Status Quo

Scikit Spectral Learning (SpLearn): a toolbox for the spectral learning of weighted automata Denis

Automata and program analysis Thomas Colcombet FCT Bordeaux 13 September 2017 based on

Weighted graphs Weighted graphs Weighted graphs Weighted graphs Graphs with numbers, called

A Nivat Theorem for Weighted Timed Automata and Weighted Relative Distance Logic Manfred Droste

Weighted Automata and Logics for Infinite Nested Words Manfred Droste and Stefan D uck

Spectral Clustering Spectral Clustering? Spectral methods Methods using eigenvectors of

An Introduction to Spectral Learning Hanxiao Liu November 8, 2013 An Introduction to Spectral

Rigorous Approximated Determinization of Weighted Automata Benjamin Aminof (Hebrew University)

A Weak Bisimulation for Weighted Automata Peter Kemper College of William and Mary Weighted

CSC 473 Automata, Grammars &amp; Languages 9/29/10 Automata, Grammars and Languages Discourse 03

Weighted graphs 2 Weighted graphs So far we have only considered weighted graphs with

Weighted graphs 3 Weighted graph Edges in weighted graph are assigned a weight: w(v 1 , v 2 ),

Spectral estimates and Random Weighted Sobolev Inequalities Didier Robert in collaboration with

Confidence Weighted Marginal Utility Analyses of Internet Mapping Techniques Craig Prince and

Weighted Automata and Concurrency Akash Lal Microsoft Research, India Microsoft Research, India

Rationality &amp; Recognisability An introduction to weighted automata theory Tutorial given at

Your 2 is My 1, Your 3 is My 9: Handling Arbitrary Miscalibrations in Ratings Jingyan Wang,

A First Investigation of Sturmian Trees Jean Berstel 2 , Luc Boasson 1 Olivier Carton 1 , Isabelle

Observations of IPv6 Addresses David Malone &lt;David.Malone@nuim.ie&gt; Hamilton Institute, NUI

Finite-state Strategies in Delay Games Martin Zimmermann Saarland University September 21st,

Predicting Share Prices in Real-Time with Apache Spark and Apache Ignite MANUEL MOURATO Summary

Fractal Structures in Functions Related to Number Theory Je ff Lagarias University of Michigan

String Matching Inge Li Grtz CLRS 32 String Matching String matching problem: string

1 &amp; 2 Samuel Series Lesson #001 February 5, 2015 Dean Bible Ministries

CSC 473 Automata, Grammars & Languages 9/29/10 Automata, Grammars and Languages Discourse 03

Rationality & Recognisability An introduction to weighted automata theory Tutorial given at

Observations of IPv6 Addresses David Malone <David.Malone@nuim.ie> Hamilton Institute, NUI

1 & 2 Samuel Series Lesson #001 February 5, 2015 Dean Bible Ministries