Local Loss Optimization in Operator Models: A New Insight into Spectral Learning


  1. Local Loss Optimization in Operator Models: A New Insight into Spectral Learning
     Borja Balle, Ariadna Quattoni, Xavier Carreras
     ICML 2012, June 2012, Edinburgh
     This work is partially supported by the PASCAL2 Network and a Google Research Award

  2. A Simple Spectral Method [HKZ09]
     - Discrete homogeneous Hidden Markov Model with $n$ states, $Y_t \in \{1, \dots, n\}$, and $k$ symbols, $X_t \in \{\sigma_1, \dots, \sigma_k\}$ (for now assume $n \leq k$)
     - Forward-backward equations with operators $A_\sigma \in \mathbb{R}^{n \times n}$:
       $\Pr[X_{1:t} = w] = \alpha_1^\top A_{w_1} \cdots A_{w_t} \mathbf{1}$
     - Probabilities arranged into matrices $H, H_{\sigma_1}, \dots, H_{\sigma_k} \in \mathbb{R}^{k \times k}$:
       $H(i,j) = \Pr[X_1 = \sigma_i, X_2 = \sigma_j]$
       $H_\sigma(i,j) = \Pr[X_1 = \sigma_i, X_2 = \sigma, X_3 = \sigma_j]$
     - Spectral learning algorithm for $B_\sigma = Q A_\sigma Q^{-1}$:
       1. Compute the SVD $H = U D V^\top$ and take the top $n$ right singular vectors $V_n$
       2. Set $B_\sigma = (H V_n)^+ H_\sigma V_n$
     (For simplicity, in this talk we ignore learning of the initial and final vectors)
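A minimal numpy sketch of the two steps above, assuming the matrices $H$ and $H_\sigma$ have already been estimated from a sample; the function name and interface are illustrative, not from the paper's code:

```python
import numpy as np

def spectral_operators(H, H_sigmas, n):
    """Recover B_sigma = (H V_n)^+ H_sigma V_n for every symbol sigma.

    H        : (k, k) matrix, H[i, j] = Pr[X1 = s_i, X2 = s_j]
    H_sigmas : dict mapping each symbol to its (k, k) matrix H_sigma
    n        : number of states (assumes n <= k)
    """
    # Step 1: top-n right singular vectors of H.
    _, _, Vt = np.linalg.svd(H)
    V_n = Vt[:n].T                        # (k, n)
    # Step 2: one pseudo-inverse shared across all symbols.
    HV_pinv = np.linalg.pinv(H @ V_n)     # (n, k)
    return {s: HV_pinv @ (Hs @ V_n) for s, Hs in H_sigmas.items()}
```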

  3. A Local Approach to Learning?
     - Maximum likelihood uses the whole sample $S = \{w^1, \dots, w^N\}$ and is always consistent in the realizable case:
       $\max_{\alpha_1, \{A_\sigma\}} \frac{1}{N} \sum_{i=1}^{N} \log\left(\alpha_1^\top A_{w^i_1} \cdots A_{w^i_{t_i}} \mathbf{1}\right)$
     - The spectral method only uses local information from the sample, through $\hat{H}, \hat{H}_a, \hat{H}_b$, and its consistency depends on properties of $H$:
       $S = \{abbabba, aabaa, baaabbbabab, bbaaba, bababbabbaaaba, abbb, \dots\}$
     Questions
     - Is the spectral method minimizing a "local" loss function?
     - When does this minimization yield a consistent algorithm?
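For contrast with the local statistics, here is a sketch of evaluating the global maximum-likelihood objective above by running the forward recursion over each string; the function name and argument layout are our assumptions, mirroring the slide's notation:

```python
import numpy as np

def avg_log_likelihood(sample, alpha1, A):
    """Average of log Pr[w] = log(alpha1^T A_{w_1} ... A_{w_t} 1)."""
    total = 0.0
    for w in sample:              # e.g. sample = ["abbabba", "aabaa", ...]
        v = alpha1                # forward recursion over the string w
        for symbol in w:
            v = v @ A[symbol]     # A maps each symbol to its (n, n) operator
        total += np.log(v.sum())  # product with the all-ones vector = sum over states
    return total / len(sample)
```

Maximum likelihood would optimize this quantity over $\alpha_1$ and $\{A_\sigma\}$, touching every string in $S$; the spectral method never does.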

  4. Outline
     - Spectral Learning as Local Loss Optimization
     - A Convex Relaxation of the Local Loss
     - Choosing a Consistent Local Loss

  5. Loss Function of the Spectral Method
     - Both ingredients in the spectral method have optimization interpretations:
       SVD: $\min_{V_n^\top V_n = I} \| H V_n V_n^\top - H \|_F$
       Pseudo-inverse: $\min_{B_\sigma} \| H V_n B_\sigma - H_\sigma V_n \|_F$
     - Can formulate a joint optimization for the spectral method:
       $\min_{\{B_\sigma\},\; V_n^\top V_n = I} \; \sum_{\sigma \in \Sigma} \| H V_n B_\sigma - H_\sigma V_n \|_F^2$
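A quick numerical check of the pseudo-inverse interpretation, with random matrices standing in for $H$ and $H_\sigma$ (a sketch, not the paper's experiment): for a fixed $V_n$, the spectral method's pseudo-inverse step coincides with the least-squares minimizer of the local loss.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 5, 3
H = rng.standard_normal((k, k))
H_s = rng.standard_normal((k, k))        # stands in for one H_sigma

_, _, Vt = np.linalg.svd(H)
V_n = Vt[:n].T

B_pinv = np.linalg.pinv(H @ V_n) @ (H_s @ V_n)                # spectral step
B_lsq = np.linalg.lstsq(H @ V_n, H_s @ V_n, rcond=None)[0]    # argmin of the local loss
assert np.allclose(B_pinv, B_lsq)
```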

  6. Properties of the Spectral Optimization
     $\min_{\{B_\sigma\},\; V_n^\top V_n = I} \; \sum_{\sigma \in \Sigma} \| H V_n B_\sigma - H_\sigma V_n \|_F^2$
     - Theorem: the optimization is consistent under the same conditions as the spectral method
     - The loss is non-convex, due to the product $V_n B_\sigma$ and the constraint $V_n^\top V_n = I$
     - The spectral method is equivalent to: 1. choosing $V_n$ using the SVD; 2. optimizing $\{B_\sigma\}$ with $V_n$ fixed
     Intuition about the Loss Function
     - Minimize the $\ell_2$ norm of the unexplained (finite set of) futures when a symbol $\sigma$ is generated and the transition is explained using $B_\sigma$ (over a finite set of pasts)
     - Strongly based on the Markovianity of the process, which generic maximum likelihood does not exploit

  7. A Convex Relaxation of the Local Loss
     - For algorithmic purposes a convex local loss function is more desirable
     - A relaxation can be obtained by replacing the projection $V_n$ with a regularization term. Starting from
       $\min_{\{B_\sigma\},\; V_n^\top V_n = I} \; \sum_{\sigma \in \Sigma} \| H V_n B_\sigma - H_\sigma V_n \|_F^2$:
       1. fix $n = |S|$ and take $V_n = I$
       2. stack $B_\Sigma = [B_{\sigma_1} | \cdots | B_{\sigma_k}]$ and $H_\Sigma = [H_{\sigma_1} | \cdots | H_{\sigma_k}]$
       3. regularize via the nuclear norm to emulate $V_n$:
       $\min_{B_\Sigma} \| H B_\Sigma - H_\Sigma \|_F^2 + \tau \| B_\Sigma \|_*$
     - This optimization is convex and has some interesting theoretical (see paper) and empirical properties; a solver sketch follows below
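One standard way to solve this convex program is proximal gradient descent, whose proximal operator for the nuclear norm is singular-value soft-thresholding. The sketch below assumes that solver choice and a fixed step size; the paper may use a different method:

```python
import numpy as np

def solve_convex_relaxation(H, H_Sigma, tau, iters=500):
    """Minimize ||H B - H_Sigma||_F^2 + tau * ||B||_* by proximal gradient."""
    step = 1.0 / (2 * np.linalg.norm(H, 2) ** 2)    # 1/L for the smooth term
    B = np.zeros((H.shape[1], H_Sigma.shape[1]))
    for _ in range(iters):
        G = 2 * H.T @ (H @ B - H_Sigma)             # gradient of the squared loss
        U, s, Vt = np.linalg.svd(B - step * G, full_matrices=False)
        B = U @ np.diag(np.maximum(s - step * tau, 0.0)) @ Vt   # soft-threshold singular values
    return B
```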

  8. Experimental Results with the Convex Local Loss
     Performing experiments with synthetic targets, the following facts are observed:
     - Tuning the regularization parameter $\tau$, a better trade-off between generalization and model complexity can be achieved
     - The largest gains from the convex relaxation are attained on targets that are supposedly hard for the spectral method
     [Figure: two plots of L1 error. Left: L1 error vs. $\tau$ for SVD with $n = 1, \dots, 5$ and the convex optimization (CO). Right: L1 error of SVD and CO, and their difference, vs. the minimum singular value of the target model.]

  9. The Hankel Matrix
     For any function $f: \Sigma^* \to \mathbb{R}$, its Hankel matrix $H_f \in \mathbb{R}^{\Sigma^* \times \Sigma^*}$ is defined as $H_f(p, s) = f(p \cdot s)$

            λ      a      b      aa     ab     ...
       λ    1      0.3    0.7    0.05   0.25   ...
       a    0.3    0.05   0.25   0.02   0.03   ...
       b    0.7    0.6    0.1    0.03   0.2    ...
       aa   0.05   0.02   0.03   0.017  0.003  ...
       ab   0.25   0.23   0.02   0.11   0.12   ...
       ...

     (The matrices $H$ and $H_a$ used by the spectral method are sub-blocks of this matrix.)
     - Blocks are defined by sets of rows (prefixes $P$) and columns (suffixes $S$)
     - Can parametrize the spectral method by $P$ and $S$, taking $H \in \mathbb{R}^{P \times S}$
     - Each pair $(P, S)$ defines a different local loss function
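A sketch of filling a finite sub-block $H \in \mathbb{R}^{P \times S}$ directly from the definition; here `f` stands for any estimate of the target function (e.g. empirical string frequencies), and the helper name is ours:

```python
import numpy as np

def hankel_block(f, prefixes, suffixes):
    # H(p, s) = f(p . s); the empty string "" plays the role of lambda
    return np.array([[f(p + s) for s in suffixes] for p in prefixes])

# e.g. the top-left block of the table above:
# hankel_block(f, prefixes=["", "a", "b"], suffixes=["", "a", "b"])
```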

  10. Consistency of the Local Loss
      Theorem (Schützenberger '61): $\mathrm{rank}(H_f) = n$ iff $f$ can be computed with operators $A_\sigma \in \mathbb{R}^{n \times n}$
      Consequences
      - The spectral method is consistent iff $\mathrm{rank}(H) = \mathrm{rank}(H_f) = n$
      - There always exist $P, S$ with $|P| = |S| = n$ such that $\mathrm{rank}(H) = n$
      Trade-off
      - Larger $P$ and $S$ are more likely to give $\mathrm{rank}(H) = n$, but also require larger samples for a good estimate $\hat{H}$
      Question
      - Given a sample, how to choose good $P$ and $S$?
      Answer
      - Random sampling succeeds w.h.p. with $|P|$ and $|S|$ depending polynomially on the complexity of the target (a sketch follows below)
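A sketch of that random strategy: draw candidate prefixes and suffixes from substrings seen in the sample, build the corresponding sub-block, and keep the pair once it reaches rank $n$. Sampling from observed substrings and the retry loop are our assumptions, not the paper's exact procedure:

```python
import numpy as np

def random_basis(sample, f, n, size, tries=50, rng=None):
    rng = rng or np.random.default_rng()
    # candidates: all substrings occurring in the sample
    subs = sorted({w[i:j] for w in sample
                   for i in range(len(w)) for j in range(i + 1, len(w) + 1)})
    for _ in range(tries):
        P = list(rng.choice(subs, size=size))
        S = list(rng.choice(subs, size=size))
        H = np.array([[f(p + s) for s in S] for p in P])
        if np.linalg.matrix_rank(H) >= n:   # rank n certifies consistency
            return P, S
    raise RuntimeError("no rank-n block found; try a larger size")
```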

  11. Visit us at poster 53

  12. Local Loss Optimization in Operator Models: A New Insight into Spectral Learning
      Borja Balle, Ariadna Quattoni, Xavier Carreras
      ICML 2012, June 2012, Edinburgh
      This work is partially supported by the PASCAL2 Network and a Google Research Award
