
Learning with Structured Output Spaces (Keerthiram Murugesan)



  1. Learning with Structured Output Spaces (Keerthiram Murugesan)

  2. Standard Prediction • Find a function from input space X to output space Y such that the prediction error is low. Example (x, y) pairs:
– x = "Microsoft announced today that they acquired Apple for the amount equal to the gross national product of Switzerland." → y = 1
– x = a DNA sequence (GATACAACCTATCCCCGTATATATATTCTATGGGTATAGTAT…) → y = -1
– x = "Microsoft officials stated that they first wanted to buy Switzerland, but eventually were turned off by the mountains and the snowy winters…" → y = 7.3
(Typically Y is "simple".)

  3. Structured Prediction • Here both X and Y can be structured. Examples (from figures):
– Parsing: x = "The dog chased the cat." → y = a parse tree (S → NP VP; NP → Det N; VP → V NP; NP → Det N)
– Protein structure: x = an amino-acid sequence (APPGEAYLQPGEAYLQV) → y = a folded structure
– Coreference: "[Obama]'s [position] on [climate change] in the [presidential election] has mobilized [many young voters]. [His] [position] was well received by [this group]."
– Conservation reservoir corridors

  4. Talk Overview • Structured Prediction (Quick Review)
– Conventional Approach
• Structured Prediction Cascades
– Ensemble Cascades
• Ensemble Learning for Structured Prediction
– Online algorithm
– Boosting-style algorithm

  5. Structured Prediction

  6. Structured Output Spaces • Input: x • Predict: y ∈ Y(x) — structured!
• Quality determined by a utility (scoring) function
• Conventional approach:
– Train: learn a model U(x, y) of utility
– Test: predict via h(x) = argmax_{y ∈ Y(x)} U(x, y) (can be challenging)

  7. Example: Sequence Prediction • Part-of-Speech Tagging: h(x) = argmax_{y ∈ Y(x)} U(x, y)
– Given a sequence of words x, e.g. "The rain wet the cat"
– Predict a sequence of tags y; candidate tag sequences include (Det V Det N N), (Det V V Adj V), (Adv N V V Det), …

  8. Example: Sequence Prediction • MAP inference in 1st-order Markov models
– 1st-order dynamics: a chain y_1 → y_2 → y_3 → y_4, with each y_t linked to its observation x_t
– Similar models include CRFs, Kalman Filters, Linear Dynamical Systems, etc.

  9. Example: Sequence Prediction • Utility function (a sum over maximal cliques): U(x, y) = Σ_{t=1}^{n} u(x_t, y_t, y_{t−1})
• Prediction (computed by dynamic programming): h(x) = argmax_y Σ_{t=1}^{n} u(x_t, y_t, y_{t−1})
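Because the argmax on this slide decomposes over cliques, dynamic programming recovers it exactly. A minimal Viterbi-style sketch, assuming a caller-supplied clique scorer `u(x_t, y_t, y_prev)` (a hypothetical interface, with `y_prev = None` at the first position):

```python
def viterbi(x, labels, u):
    """Return argmax_y sum_t u(x_t, y_t, y_{t-1}) by dynamic programming."""
    n = len(x)
    # best[t][y] = best score of any prefix ending in label y at position t
    best = [{y: u(x[0], y, None) for y in labels}]
    back = [{}]
    for t in range(1, n):
        best.append({})
        back.append({})
        for y in labels:
            # maximize over the previous label
            p = max(labels, key=lambda q: best[t - 1][q] + u(x[t], y, q))
            best[t][y] = best[t - 1][p] + u(x[t], y, p)
            back[t][y] = p
    # trace back the highest-scoring path
    y_t = max(labels, key=lambda y: best[n - 1][y])
    path = [y_t]
    for t in range(n - 1, 0, -1):
        y_t = back[t][y_t]
        path.append(y_t)
    return list(reversed(path))
```

The table `best` is exactly the prefix maximization that makes the runtime O(n·|labels|²) instead of exponential in n.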

  10. Scoring Function as a Linear Model • U/u is parameterized linearly: U(x, y; θ) = Σ_t u(x_t, y_t, y_{t−1}; θ)
• With some feature representation f: u(x, y_1, y_2; θ) = θᵀ f(x, y_1, y_2)
• Prediction (dynamic programming): h(x; θ) = argmax_y Σ_t θᵀ f(x_t, y_t, y_{t−1})
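The linear form u(x, y_1, y_2; θ) = θᵀ f(x, y_1, y_2) can be sketched with sparse dicts; both `theta` and the feature map `f` below are hypothetical stand-ins, not taken from the slides:

```python
def linear_clique_score(theta, f, x_t, y_t, y_prev):
    """u(x_t, y_t, y_{t-1}; theta) = theta^T f(x_t, y_t, y_{t-1}),
    with weights and features stored as sparse dicts (name -> value)."""
    feats = f(x_t, y_t, y_prev)
    return sum(theta.get(k, 0.0) * v for k, v in feats.items())
```

Features absent from `theta` contribute zero, which is what makes sparse indicator features cheap here.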

  11. Feature representation

  12. Generalizing to Other Structures • From the last slide: h(x; θ) = argmax_y Σ_t θᵀ f(x_t, y_t, y_{t−1})
• General formulation: Ψ(x, y) = Σ_t f(x_t, y_t, y_{t−1}), so h(x; θ) = argmax_y θᵀ Ψ(x, y)
• Inference procedures: Viterbi, CKY Parsing, Sorting, Belief Propagation, Integer Programming
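The general formulation sums local clique features into one global vector Ψ(x, y), so the total score collapses to a single dot product θᵀΨ(x, y). A sketch for the sequence case, with `f` again a hypothetical local feature map:

```python
from collections import Counter

def joint_features(x, y, f):
    """Psi(x, y) = sum over t of f(x_t, y_t, y_{t-1}): local clique
    features accumulated into one global (sparse) feature vector."""
    psi, y_prev = Counter(), None
    for x_t, y_t in zip(x, y):
        psi.update(f(x_t, y_t, y_prev))
        y_prev = y_t
    return psi
```

Other structures (trees, matchings) differ only in which local parts the sum ranges over; the θᵀΨ(x, y) form is unchanged.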

  13. Learning Setting • Objective: argmin_θ (λ/2)‖θ‖² + Σ_{(x,y)} ℓ(y, h(x; θ)) (regularization + loss function), where h(x) = argmax_{y ∈ Y(x)} U(x, y)
• Generalization of conventional settings:
– Hinge loss → Structural SVMs
– Log-loss → Conditional Random Fields
– Optimized via Gradient Descent, Cutting Plane, etc.
• Requires running inference during training
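The hinge-loss and log-loss objectives above need specialized solvers; as a simpler, self-contained stand-in that shows the same "run inference during training" loop, here is a structured perceptron sketch. The indicator features and brute-force argmax are illustrative assumptions (Viterbi would replace the brute force at scale):

```python
from collections import Counter
from itertools import product

def feats(x_t, y_t, y_prev):
    # hypothetical indicator features: one emission, one transition
    f = Counter({("emit", x_t, y_t): 1.0})
    if y_prev is not None:
        f[("trans", y_prev, y_t)] = 1.0
    return f

def joint(x, y):
    """Psi(x, y): sum of local features over the sequence."""
    psi, prev = Counter(), None
    for x_t, y_t in zip(x, y):
        psi.update(feats(x_t, y_t, prev))
        prev = y_t
    return psi

def score(theta, x, y):
    return sum(theta[k] * v for k, v in joint(x, y).items())

def predict(theta, x, labels):
    # brute-force argmax over all label sequences (tiny examples only)
    return max((list(y) for y in product(labels, repeat=len(x))),
               key=lambda y: score(theta, x, y))

def train_perceptron(data, labels, epochs=5):
    """On each mistake, move theta toward the gold features and away
    from the predicted ones."""
    theta = Counter()
    for _ in range(epochs):
        for x, y_gold in data:
            y_hat = predict(theta, x, labels)
            if y_hat != y_gold:
                theta.update(joint(x, y_gold))
                theta.subtract(joint(x, y_hat))
    return theta
```

The update uses only Ψ(x, y_gold) − Ψ(x, ŷ), which is why inference is the inner loop of training, exactly as the slide warns.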

  14. Restriction: Increased Complexity

  15. Restriction: Pre-specified Structure • h(x; θ) = argmax_y U(x, y; θ), with the structure fixed in advance
• Learn a (linearly) parameterized U such that h(x) gives good predictions
• What if U is "wrong"?
– Known to not be consistent
– Infinite training data ≠ converging to the best model

  16. Summary: Structured Prediction • h(x; θ) = argmax_y U(x, y; θ), for a pre-specified structure
• Conventional approach:
– Specify the structure & inference procedure
– Train parameters on a training set {(x, y)}
• Limitations:
– Runtime proportional to model complexity
– Structure mismatch & inconsistency

  17. Structured Prediction Cascades

  18. Classifier Cascades (Face Classifier)

  19. Classifier Cascades

  20. Tradeoffs in Cascaded Learning • Accuracy : Minimize the number of errors incurred by each level • Efficiency : Maximize the number of filtered assignments at each level

  21. Structured Prediction Cascades

  22. Clique Assignments • Valid assignment for clique (Y_{k-1}, Y_k): e.g. (Adj, N). Remember the sum over cliques? U(x, y) = Σ_t u(x_t, y_t, y_{t−1})
• Invalid assignment (one that will be eliminated/pruned): e.g. (N, N)

  23. Clique Assignments • How do we know whether an assignment such as (Adj, N) is good or bad?
1. Compute its score
2. Compare against a threshold
• Assignments that fall below the threshold, e.g. (N, N), are eliminated/pruned

  24. Max-marginal score (sequence models)

  25. Threshold (t)

  26. Threshold (t)

  27. Threshold (t)
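Slides 24–27 score each clique assignment by its max-marginal and prune those below a threshold t. A brute-force sketch, assuming a hypothetical clique scorer `u`; the convex-combination threshold (a mix of the max score and the mean max-marginal) is one rule used in structured prediction cascades:

```python
from itertools import product

def max_marginals(x, labels, u):
    """For each position t and clique assignment (y_prev, y_t), the score
    of the best COMPLETE sequence consistent with that assignment.
    Brute force for clarity; max-sum message passing does this efficiently."""
    n = len(x)
    def seq_score(y):
        return sum(u(x[t], y[t], y[t - 1] if t else None) for t in range(n))
    mm = {}
    for y in product(labels, repeat=n):
        s = seq_score(y)
        for t in range(1, n):
            key = (t, y[t - 1], y[t])
            mm[key] = max(mm.get(key, float("-inf")), s)
    return mm

def prune(mm, alpha=0.5):
    """Keep assignments whose max-marginal clears
    t = alpha * max + (1 - alpha) * mean of the max-marginals."""
    scores = list(mm.values())
    thresh = alpha * max(scores) + (1 - alpha) * sum(scores) / len(scores)
    return {k for k, s in mm.items() if s >= thresh}
```

Note that pruning by max-marginals can never discard the overall argmax sequence: every clique it uses attains the maximum score, which always clears the threshold.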

  28. Learning θ at each cascade level

  29. Online learning

  30. Structured Prediction Ensembles

  31. Ensemble Learning • Given the outputs of multiple models / hypotheses / experts h_1, h_2, h_3, …, h_p (e.g. face, face, no face, face), the goal is to combine them:
1) Majority voting
2) A linear combination of hypotheses/experts
3) Boosting, etc.

  32. Weighted Majority Algorithm
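A sketch of the classic (non-structured) weighted majority algorithm the slide refers to: predict by weighted vote, then shrink the weight of every expert that erred by a factor β (β = 1/2 here is an arbitrary illustrative choice):

```python
def weighted_majority(experts, stream, beta=0.5):
    """Run weighted majority over a stream of (x, y) pairs.
    Returns the final expert weights and the ensemble's mistake count."""
    w = [1.0] * len(experts)
    mistakes = 0
    for x, y in stream:
        preds = [h(x) for h in experts]
        votes = {}
        for wi, p in zip(w, preds):
            votes[p] = votes.get(p, 0.0) + wi
        y_hat = max(votes, key=votes.get)   # weighted vote
        if y_hat != y:
            mistakes += 1
        # multiplicatively penalize every expert that was wrong
        w = [wi * beta if p != y else wi for wi, p in zip(w, preds)]
    return w, mistakes
```

The multiplicative update is what gives the standard guarantee: the ensemble's mistakes are within a constant factor (plus a log term in the number of experts) of the best single expert's.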

  33. Ensemble Learning for Structured Prediction • Base models h_1, h_2, h_3, …, h_p each produce a structured output (e.g. a tag sequence such as Adv N V V Det), which decomposes into parts h_i^1, h_i^2, …, h_i^l

  34. Example: Sequence Model

  35. Weighted Majority Algorithm for Structured Prediction Ensembles

  36. Ensemble Output from the Weighted Majority Algorithm • Given the weights W_1, W_2, …, W_T

  37. Boosting for Structured Prediction Ensembles

  38. Ensemble Output from Boosting • Given the base learners h_1, h_2, …, h_T:
• Note: these h_1, h_2, …, h_T (one per boosting round) are different from the p base models h_1, h_2, …, h_p above
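The slides leave the combination rule implicit; assuming an AdaBoost-style scheme (an assumption, not stated on the slides), each round t contributes its base learner h_t with weight α_t = ½ ln((1 − e_t)/e_t), and the ensemble output is the weighted vote:

```python
import math

def boost_round_weight(error):
    """AdaBoost-style round weight: alpha_t = 0.5 * ln((1 - e_t) / e_t).
    Smaller error -> larger say in the final vote."""
    return 0.5 * math.log((1.0 - error) / error)

def ensemble_predict(base_learners, alphas, x, labels):
    """H(x) = argmax_y sum_t alpha_t * [h_t(x) == y]."""
    score = {y: 0.0 for y in labels}
    for h, a in zip(base_learners, alphas):
        score[h(x)] += a
    return max(score, key=score.get)
```

For structured outputs the same idea applies per part of the structure rather than per whole label, but the α_t weighting is unchanged.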

  39. • THE END
