  1. A Spectral Learning Algorithm for Finite State Transducers
     Borja Balle, Ariadna Quattoni, Xavier Carreras
     ECML PKDD — September 7, 2011
     [B. Balle, A. Quattoni, X. Carreras. Spectral Learning FST. ECML PKDD 2011]

  2. Overview: Probabilistic Transducers
     ◮ Model input-output relations with hidden states
     ◮ As a conditional distribution Pr[y | x] over strings
     ◮ With certain independence assumptions
     [Figure: graphical model with inputs X_1 X_2 X_3 X_4 ..., hidden states H_1 H_2 H_3 H_4 ..., and outputs Y_1 Y_2 Y_3 Y_4 ...]
     ◮ Used in many applications: NLP, biology, ...
     ◮ Hard to learn in general — usually the EM algorithm is used

  3. Overview: Spectral Learning of Probabilistic Transducers
     Our contribution:
     ◮ Fast learning algorithm for probabilistic FST
     ◮ With PAC-style theoretical guarantees
     ◮ Based on an Observable Operator Model for FST
     ◮ Using spectral methods (Chang ’96; Mossel-Roch ’05; Hsu et al. ’09; Siddiqi et al. ’10)
     ◮ Performing better than EM in experiments with real data

  4. Outline
     Observable Operators for FST
     Learning Observable Operator Models
     Experimental Evaluation
     Conclusion

  5. Observable Operators for FST: Deriving Observable Operator Models
     Given aligned sequences (x, y) ∈ (X × Y)^t (i.e. |x| = |y|), the model computes the conditional probability
       Pr[y | x] = Σ_{h ∈ H^{t+1}} Pr[y, h | x]              (marginalize states)
                 = Σ_{h_{t+1} ∈ H} Pr[y, h_{t+1} | x]        (independence assumptions)
                 = 1^⊤ α_{t+1}                               (vector form, α_{t+1} ∈ R^m)
                 = 1^⊤ A_{x_t}^{y_t} α_t                     (forward-backward equations)
                 = 1^⊤ A_{x_t}^{y_t} ··· A_{x_1}^{y_1} α     (induction on t)
     The choice of an operator A_a^b depends only on observable symbols

  6. Observable Operators for FST: Observable Operator Model Parameters
     Given X = {a_1, ..., a_k}, Y = {b_1, ..., b_l}, H = {c_1, ..., c_m}, then
       Pr[y | x] = 1^⊤ A_{x_t}^{y_t} ··· A_{x_1}^{y_1} α
     with parameters:
       A_a^b = T_a D_b ∈ R^{m×m}                                         (factorized operator)
       T_a(i, j) = Pr[H_s = c_i | X_{s−1} = a, H_{s−1} = c_j] ∈ R^{m×m}  (state transition)
       D_b(i, j) = δ_{i,j} Pr[Y_s = b | H_s = c_j] ∈ R^{m×m}             (observation emission)
       O(i, j) = Pr[Y_s = b_i | H_s = c_j] ∈ R^{l×m}                     (collected emissions)
       α(i) = Pr[H_1 = c_i] ∈ R^m                                        (initial probabilities)
     The choice of an operator A_a^b depends only on observable symbols ...
     ... but the operator parameters are conditioned on hidden states
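The forward computation on this slide can be sketched numerically. The snippet below is our own illustration (not the authors' code): it builds made-up random parameters for a tiny model with m = 2 hidden states and binary alphabets, forms the factorized operators A_a^b = T_a D_b, and evaluates Pr[y | x]; summing over all outputs of a fixed length checks that the conditional distribution is properly normalized.

```python
import numpy as np
from itertools import product

# Illustrative random parameters (all made up): m hidden states,
# k input symbols, l output symbols.
m, k, l = 2, 2, 2
rng = np.random.default_rng(0)

# T[a][:, j] is a distribution over next states given input a, state c_j.
T = [rng.dirichlet(np.ones(m), size=m).T for _ in range(k)]
# O[b, j] = Pr[Y = b | H = c_j]; columns sum to one.
O = rng.dirichlet(np.ones(l), size=m).T
# D_b = diagonal matrix with row b of O on the diagonal.
D = [np.diag(O[b]) for b in range(l)]
alpha = rng.dirichlet(np.ones(m))          # initial state distribution

def cond_prob(x, y):
    """Pr[y | x] = 1^T A_{x_t}^{y_t} ... A_{x_1}^{y_1} alpha."""
    state = alpha.copy()
    for a, b in zip(x, y):                 # alpha_{s+1} = T_a D_b alpha_s
        state = T[a] @ D[b] @ state
    return state.sum()                     # 1^T alpha_{t+1}

# For any fixed input x, probabilities over all length-t outputs sum to 1.
x = (0, 1, 0)
total = sum(cond_prob(x, y) for y in product(range(l), repeat=3))
print(round(total, 6))                     # -> 1.0
```

Normalization follows because Σ_b D_b = I and each T_a is column-stochastic, so each factor preserves total probability mass.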

  7. Observable Operators for FST: A Learnable Set of Observable Operators
     Note that for any invertible Q ∈ R^{m×m}
       Pr[y | x] = 1^⊤ Q^{−1} (Q A_{x_t}^{y_t} Q^{−1}) ··· (Q A_{x_1}^{y_1} Q^{−1}) Q α
     Idea (subspace identification methods for linear systems, ’80s):
     Find a basis for the state space such that the operators in the new basis are related to observable quantities
     Following multiplicity automata and spectral HMM learning ...
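The invariance stated above is easy to check numerically. In this small sketch (ours, with arbitrary matrices standing in for the operators A_{x_s}^{y_s}), conjugating every operator by an invertible Q while transforming α and the all-ones vector accordingly leaves the computed value unchanged:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 3
ops = [rng.random((m, m)) for _ in range(4)]   # stand-ins for A_{x_s}^{y_s}
alpha = rng.random(m)
ones = np.ones(m)

# Original value 1^T A_4 A_3 A_2 A_1 alpha.
p = ones @ ops[3] @ ops[2] @ ops[1] @ ops[0] @ alpha

# Any invertible Q; strict diagonal dominance guarantees invertibility.
Q = rng.random((m, m)) + 3 * np.eye(m)
Qinv = np.linalg.inv(Q)
conj = [Q @ A @ Qinv for A in ops]             # operators in the new basis

# Same value computed as 1^T Q^{-1} (Q A Q^{-1}) ... (Q A Q^{-1}) Q alpha.
p_q = (ones @ Qinv) @ conj[3] @ conj[2] @ conj[1] @ conj[0] @ (Q @ alpha)
print(np.isclose(p, p_q))                      # -> True
```

The Q^{-1} Q pairs cancel telescopically, which is exactly why only the products Q A_a^b Q^{-1}, Q α, and 1^⊤ Q^{−1} need to be recovered.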

  8. Observable Operators for FST: A Learnable Set of Observable Operators
     Find a basis Q where the operators can be expressed in terms of unigram, bigram and trigram probabilities
       ρ(i) = Pr[Y_1 = b_i] ∈ R^l
       P(i, j) = Pr[Y_1 = b_j, Y_2 = b_i] ∈ R^{l×l}
       P_a^b(i, j) = Pr[Y_1 = b_j, Y_2 = b, Y_3 = b_i | X_2 = a] ∈ R^{l×l}
     Theorem (ρ, P and P_a^b are sufficient statistics)
     Let P = U Σ V* be a thin SVD decomposition. Then Q = U^⊤ O yields (under certain assumptions)
       Q α = U^⊤ ρ
       1^⊤ Q^{−1} = ρ^⊤ (U^⊤ P)^+
       Q A_a^b Q^{−1} = (U^⊤ P_a^b)(U^⊤ P)^+
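A sanity check of the theorem on a small synthetic model, with our own made-up parameters and an assumed uniform, independent input distribution (the slide's "certain assumptions"): we compute the exact statistics ρ, P, P_a^b in closed form, take the SVD basis, and verify that the observable operators reproduce Pr[y | x].

```python
import numpy as np

rng = np.random.default_rng(2)
m, k, l = 2, 2, 3                           # hidden states, inputs, outputs (l >= m)

# Made-up model parameters.
T = [rng.dirichlet(np.ones(m), size=m).T for _ in range(k)]   # column-stochastic
O = rng.dirichlet(np.ones(l), size=m).T                       # l x m emissions
D = [np.diag(O[b]) for b in range(l)]
alpha = rng.dirichlet(np.ones(m))
px = np.full(k, 1.0 / k)                    # assumed uniform input distribution

# Exact statistics under these assumptions: with Tbar the mean transition,
# rho = O alpha, P = O Tbar diag(alpha) O^T, P_a^b = O T_a D_b Tbar diag(alpha) O^T.
Tbar = sum(px[a] * T[a] for a in range(k))
M = Tbar @ np.diag(alpha) @ O.T
rho = O @ alpha
P = O @ M
Pab = [[O @ T[a] @ D[b] @ M for b in range(l)] for a in range(k)]

# Observable operators from the theorem.
U = np.linalg.svd(P)[0][:, :m]              # top-m left singular vectors
pinvUP = np.linalg.pinv(U.T @ P)
binf = rho @ pinvUP                         # 1^T Q^{-1}
b0 = U.T @ rho                              # Q alpha
B = [[U.T @ Pab[a][b] @ pinvUP for b in range(l)] for a in range(k)]

# Compare against the direct forward computation on one pair (x, y).
x, y = [0, 1, 1], [2, 0, 1]
direct = (np.ones(m) @ T[x[2]] @ D[y[2]] @ T[x[1]] @ D[y[1]]
          @ T[x[0]] @ D[y[0]] @ alpha)
spectral = binf @ B[x[2]][y[2]] @ B[x[1]][y[1]] @ B[x[0]][y[0]] @ b0
print(np.isclose(direct, spectral))         # -> True
```

The cancellation works because U^⊤ P_a^b = Q A_a^b M and (U^⊤ P)^+ = M^+ Q^{−1} when Q and M have full rank, so M M^+ = I drops out of every product.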

  9. Learning Observable Operator Models: Spectral Learning Algorithm
     Given
     ◮ Input alphabet X and output alphabet Y
     ◮ Number of hidden states m
     ◮ Training sample S = {(x^1, y^1), ..., (x^n, y^n)}
     Do
     ◮ Compute unigram ρ̂, bigram P̂ and trigram P̂_a^b relative frequencies in S
     ◮ Perform SVD on P̂ and take Û with the top m left singular vectors
     ◮ Return operators computed using ρ̂, P̂, P̂_a^b and Û
     In time
     ◮ O(n) to compute relative frequencies
     ◮ O(|Y|^3) to compute the SVD
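The estimation step above can be sketched as follows. This is an illustrative implementation of ours (function name and data layout are our own choices, not from the paper): one pass over the sample to count unigram, bigram and trigram relative frequencies, then an SVD of the estimated bigram matrix.

```python
import numpy as np

def spectral_estimates(S, k, l, m):
    """Relative-frequency estimates of rho, P, P_a^b and the SVD basis U.

    S is a list of aligned pairs (x, y) of equal-length tuples of symbol
    indices; k = |X|, l = |Y|, m = number of hidden states.
    """
    rho = np.zeros(l)
    P = np.zeros((l, l))
    Pab = np.zeros((k, l, l, l))            # Pab[a, b] is the l x l matrix P_a^b
    n2 = np.zeros(k)                        # counts of X_2 = a, for conditioning
    for x, y in S:                          # single O(n) pass
        rho[y[0]] += 1
        if len(y) >= 2:
            P[y[1], y[0]] += 1
        if len(y) >= 3:
            Pab[x[1], y[1], y[2], y[0]] += 1
            n2[x[1]] += 1
    rho /= len(S)
    P /= sum(1 for x, y in S if len(y) >= 2)
    for a in range(k):
        if n2[a] > 0:
            Pab[a] /= n2[a]                 # conditional on X_2 = a
    U = np.linalg.svd(P)[0][:, :m]          # O(l^3) SVD, top-m left vectors
    return rho, P, Pab, U

# Tiny fake sample, purely for shape checking.
S = [((0, 1, 0), (1, 0, 1)), ((1, 0, 1), (0, 0, 1)), ((0, 0, 1), (1, 1, 0))]
rho, P, Pab, U = spectral_estimates(S, k=2, l=2, m=2)
print(rho)                                  # -> [0.33333333 0.66666667]
```

The operators themselves then follow from the formulas of the previous slide, with the estimated quantities plugged in for the exact ones.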

  10. Learning Observable Operator Models: PAC-Style Result
      ◮ Input distribution D_X over X* with λ = E[|X|], μ = min_a Pr[X_2 = a]
      ◮ Conditional distributions D_{Y|x} on Y* given x ∈ X*, modeled by an FST with m states (satisfying certain rank assumptions)
      ◮ Sampling i.i.d. from the joint distribution D_X ⊗ D_{Y|X}
      Theorem
      For any 0 < ε, δ < 1, if the algorithm receives a sample of size
        n ≥ O( λ^2 m |Y| log(|X|/δ) / (ε^4 μ σ_O^2 σ_P^4) )
      (where σ_O and σ_P are the m-th singular values of O and P in the target), then with probability at least 1 − δ the hypothesis D̂_{Y|x} satisfies
        E_X [ Σ_{y ∈ Y*} | D_{Y|X}(y) − D̂_{Y|X}(y) | ] ≤ ε
      (the L1 distance between the joint distributions D_X ⊗ D_{Y|X} and D_X ⊗ D̂_{Y|X}).

  11. Experimental Evaluation: Synthetic Experiments
      Goal: Compare against baselines when the learning hypotheses hold
      Target: Randomly generated with |X| = 3, |Y| = 3, |H| = 2
      ◮ HMM: model input-output jointly
      ◮ k-HMM: one model for each input symbol
      ◮ Results averaged over 5 runs
      [Figure: L1 distance (0 to 0.7) vs. number of training samples in thousands (32 to 32768) for HMM, k-HMM and FST]

  12. Experimental Evaluation: Transliteration Experiments
      Goal: Compare against EM in a real task (where modeling assumptions fail)
      Task: English to Russian transliteration (brooklyn → бруклин)
      Training times:
        Spectral            26 s
        EM (per iteration)  37 s
        EM (best)         1133 s
      ◮ Sequence alignment done in preprocessing
      ◮ Standard techniques used for inference
      ◮ Test size: 943, |X| = 82, |Y| = 34
      [Figure: normalized edit distance (20 to 80) vs. number of training sequences (75 to 6000) for Spectral and EM with m = 2 and m = 3]

  13. Conclusion: Summary of Contributions
      ◮ Fast spectral method for learning input-output OOMs
      ◮ Strong theoretical guarantees with few assumptions on the input distribution
      ◮ Outperforms previous spectral algorithms on FSTs
      ◮ Faster and better than EM in some real tasks

  14. A Spectral Learning Algorithm for Finite State Transducers
      Borja Balle, Ariadna Quattoni, Xavier Carreras
      ECML PKDD — September 7, 2011

  15. Technical Assumptions
      X = {a_1, ..., a_k}, Y = {b_1, ..., b_l}, H = {c_1, ..., c_m}
      Parameters
        T_a(i, j) = Pr[H_s = c_i | X_{s−1} = a, H_{s−1} = c_j] ∈ R^{m×m}  (state transition)
        T = Σ_a T_a Pr[X_1 = a] ∈ R^{m×m}                                 ("mean" transition matrix)
        O(i, j) = Pr[Y_s = b_i | H_s = c_j] ∈ R^{l×m}                     (collected emissions)
        α(i) = Pr[H_1 = c_i] ∈ R^m                                        (initial probabilities)
      Assumptions
        1. l ≥ m
        2. α > 0
        3. rank(T) = rank(O) = m
        4. min_a Pr[X_2 = a] > 0
