Learning with Structured Output Spaces
Keerthiram Murugesan
Standard Prediction
- Find a function from input space X to output space Y such that the prediction error is low.
  (typically Y is "simple")
- Example (classification): x = "Microsoft announced today that they acquired Apple for the amount equal to the gross national product of Switzerland. Microsoft officials stated that they first wanted to buy Switzerland, but eventually were turned off by the mountains and the snowy winters…" → y = 1
- Example (classification): x = GATACAACCTATCCCCGTATATATATTCTATGGGTATAGTATTAAATCAATACAACCTATCCCCGTATATATATTCTATGGGTATAGTATTAAATCAATACAACCTATCCCCGTATATATATTCTATGGGTATAGTATTAAATCAGATACAACCTATCCCCGTATATATATTCTATGGGTATAGTATTAAATCACATTTA → y = −1
- Example (regression): x → y = 7.3
- Example: Conservation Reservoir Corridors (figure)
Structured Prediction
(typically Y is structured)
- Parsing: x = "The dog chased the cat." → y = parse tree: S → NP VP; NP → Det N; VP → V NP; NP → Det N
- Protein structure: x = APPGEAYLQPGEAYLQV → y = structure
- Coreference: x = "[Obama] running in the [presidential election] has mobilized [many young voters]. [His] [position] on [climate change] was well received by [this group]." → y = coreference links among the mentions Obama, presidential election, many young voters, His, position, climate change, this group
Talk Overview
- Structured Prediction (Quick Review)
  – Conventional Approach
- Structured Prediction Cascades
  – Ensemble Cascades
- Ensemble Learning for Structured Prediction
  – Online algorithm
  – Boosting-style algorithm
Structured Prediction
- Recall: x = "The dog chased the cat." → y = parse tree (S → NP VP; NP → Det N; VP → V NP; NP → Det N)
Structured Output Spaces
- Input: x
- Predict: y ∈ Y(x)      (Y(x) is structured!)
- Quality determined by a utility function
- Conventional Approach:
  – Train: learn a model U(x, y) of the utility
  – Test: predict via the scoring function
    h(x) = argmax_{y ∈ Y(x)} U(x, y)      (inference can be challenging)
Example: Sequence Prediction
- Part-of-Speech Tagging
  – Given a sequence of words x
  – Predict a sequence of tags y
- x = "The rain wet the cat" → y = Det N V Det N
- Y(x) contains every possible tag sequence, e.g. Adj V V Det V, V V N Adv Det, …
- h(x) = argmax_{y ∈ Y(x)} U(x, y)
Example: Sequence Prediction
- MAP inference in 1st-order Markov models
  – Chain y1 → y2 → y3 → y4, each state y_t emitting an observation x_t
  – 1st-order dynamics
- Similar models include CRFs, Kalman Filters, Linear Dynamical Systems, etc.
Example: Sequence Prediction
- Utility function (a sum over the maximal cliques):
  U(x, y) = Σ_{t=1}^{n} u(x_t, y_t, y_{t−1})
- Prediction (computed by dynamic programming):
  h(x) = argmax_y Σ_{t=1}^{n} u(x_t, y_t, y_{t−1})
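The dynamic program here is the Viterbi algorithm. A minimal sketch, assuming the clique utility u(x_t, y_t, y_{t−1}) is supplied as a callable `u(t, y, y_prev)` (a hypothetical interface, not from the slides):

```python
# Viterbi: MAP inference for a first-order chain model (minimal sketch).
# u(t, y, y_prev) stands in for the clique utility u(x_t, y_t, y_{t-1});
# at t = 0 it is called with y_prev = None.

def viterbi(n_steps, labels, u):
    """Return argmax_y sum_t u(t, y_t, y_{t-1}) by dynamic programming."""
    # best[t][y] = best score of any prefix ending in label y at step t
    best = [{y: u(0, y, None) for y in labels}]
    back = []
    for t in range(1, n_steps):
        row, ptr = {}, {}
        for y in labels:
            prev = max(labels, key=lambda yp: best[-1][yp] + u(t, y, yp))
            row[y] = best[-1][prev] + u(t, y, prev)
            ptr[y] = prev
        best.append(row)
        back.append(ptr)
    # Backtrace from the best final label
    y = max(labels, key=lambda yl: best[-1][yl])
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    return list(reversed(path))
```

Runtime is O(n · |labels|²), which is the "runtime proportional to model complexity" limitation discussed later.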
Scoring Function as a Linear Model
- U/u is parameterized linearly, with some feature representation f:
  u(x, y1, y2; θ) = θᵀ f(x, y1, y2)
  U(x, y; θ) = Σ_t u(x_t, y_t, y_{t−1}; θ)
- Prediction (by dynamic programming):
  h(x; θ) = argmax_y Σ_t θᵀ f(x_t, y_t, y_{t−1})
Generalizing to Other Structures
- From the last slide:
  h(x; θ) = argmax_y Σ_t θᵀ f(x_t, y_t, y_{t−1})
- General formulation:
  Ψ(x, y) = Σ_t f(x_t, y_t, y_{t−1})
  h(x; θ) = argmax_y θᵀ Ψ(x, y)
- Inference for different structures:
  – Viterbi
  – CKY Parsing
  – Sorting
  – Belief Propagation
  – Integer Programming
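The joint feature map Ψ(x, y) = Σ_t f(x_t, y_t, y_{t−1}) and the linear score θᵀΨ(x, y) can be sketched concretely. The feature function below (emission and transition indicators) is an illustrative assumption, not the talk's definition:

```python
# Sketch: joint feature map Psi(x, y) = sum_t f(x_t, y_t, y_{t-1}),
# scored linearly as theta^T Psi(x, y). `f` here is a toy stand-in.
from collections import Counter

def f(x_t, y_t, y_prev):
    """Toy clique features: one emission indicator, one transition indicator."""
    feats = Counter({("emit", x_t, y_t): 1.0})
    if y_prev is not None:
        feats[("trans", y_prev, y_t)] = 1.0
    return feats

def psi(x, y):
    """Psi(x, y): sum the clique features over the whole sequence."""
    total = Counter()
    for t in range(len(x)):
        total.update(f(x[t], y[t], y[t - 1] if t > 0 else None))
    return total

def score(theta, x, y):
    """theta^T Psi(x, y), with theta stored as a sparse dict."""
    return sum(theta.get(k, 0.0) * v for k, v in psi(x, y).items())
```

Because Ψ decomposes over cliques, the argmax over y can still be computed by the structure-specific procedures listed above (Viterbi for chains, CKY for trees, etc.).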
Learning Setting
- Generalization of conventional settings:
  – Hinge loss → Structural SVMs
  – Log-loss → Conditional Random Fields
  – Trained via Gradient Descent, Cutting Plane, etc.
- Requires running inference during training:
  argmin_θ (λ/2) ‖θ‖² + Σ_{(x,y)} ℓ(y, h(x; θ))      (regularization term + loss function)
  where h(x) = argmax_{y ∈ Y(x)} U(x, y)
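For the hinge-loss (structural SVM) case, training can be sketched as stochastic subgradient descent with loss-augmented inference inside each update. This is a minimal sketch, assuming toy emission-only features and a brute-force argmax over a tiny label set; real implementations use structured inference (e.g. Viterbi) instead:

```python
# Sketch: stochastic subgradient descent on the structured hinge loss
#   max_y' [ score(x, y') + loss(y, y') ] - score(x, y)
# using loss-augmented inference. Brute-force argmax over all label
# sequences; `features` is a toy stand-in.
from itertools import product

def features(x, y):
    """Toy joint features: emission indicators only (a hypothetical choice)."""
    feats = {}
    for xt, yt in zip(x, y):
        feats[(xt, yt)] = feats.get((xt, yt), 0.0) + 1.0
    return feats

def score(theta, x, y):
    return sum(theta.get(k, 0.0) * v for k, v in features(x, y).items())

def hamming(y, y_hat):
    return sum(a != b for a, b in zip(y, y_hat))

def sgd_epoch(theta, data, labels, lr=1.0):
    """One pass over the data; each example triggers loss-augmented inference."""
    for x, y in data:
        # Loss-augmented argmax (brute force; inference runs during training)
        y_hat = max(product(labels, repeat=len(x)),
                    key=lambda yp: score(theta, x, yp) + hamming(y, yp))
        if y_hat != tuple(y):
            # Subgradient step: toward gold features, away from y_hat's
            for k, v in features(x, y).items():
                theta[k] = theta.get(k, 0.0) + lr * v
            for k, v in features(x, y_hat).items():
                theta[k] = theta.get(k, 0.0) - lr * v
    return theta
```

Note that inference is called once per example per epoch, which is exactly why expensive structures make training slow.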
Restriction: Increased Complexity
Restriction: Pre-specified Structure
- Learn a (linearly) parameterized U
  – Such that h(x) gives good predictions
- What if the structure of U is "wrong"?
  – Known to not be consistent
  – Infinite training data ≠ converging to the best model
- h(x; θ) = argmax_y U(x, y; θ)
Summary: Structured Prediction
- Conventional Approach:
  – Specify the structure & inference procedure
  – Train parameters on the training set {(x, y)}
- Limitations:
  – Runtime proportional to model complexity
  – Structure mismatch & inconsistency
- h(x; θ) = argmax_y U(x, y; θ)
Structured Prediction Cascades
Classifier Cascades (Face Classifier)
Classifier Cascades
Tradeoffs in Cascaded Learning
- Accuracy: minimize the number of errors incurred by each level
- Efficiency: maximize the number of filtered assignments at each level
Structured Prediction Cascades
Clique Assignments
- Valid assignment for clique (Y_{k−1}, Y_k): e.g. (Adj, N)
- Invalid assignment (that will be eliminated/pruned): e.g. (N, N)
- Remember the sum over cliques? U(x, y) = Σ_{c ∈ C} u(x_t, y_t, y_{t−1})
  (for a sequence model, each clique c is a pair of adjacent tags)
Clique Assignments
- Valid assignment for clique (Y_{k−1}, Y_k): e.g. (Adj, N)
- Invalid assignment (that will be eliminated/pruned): e.g. (N, N)
- How do we know whether an assignment is good or bad?
  1. Score: the max-marginal score (for sequence models)
  2. Threshold: prune assignments whose max-marginal score falls below a threshold t
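One cascade level for a chain model can be sketched as: compute each clique assignment's max-marginal (the best full-sequence score consistent with that assignment, via forward and backward max passes), then prune assignments below the threshold. A minimal sketch with a fixed threshold; the cascades work derives the threshold from the max-marginals themselves, and `u` is a toy stand-in for the clique score:

```python
# Sketch: one structured-prediction-cascade level for a chain model.
# Max-marginal of (y_{t-1}=a, y_t=b) = best score of any full sequence
# passing through that clique assignment.

def max_marginals(n, labels, u):
    """m[t][(a, b)] = best full-sequence score with y_{t-1}=a, y_t=b."""
    # Forward max: best prefix score ending in each label
    fwd = [{y: u(0, y, None) for y in labels}]
    for t in range(1, n):
        fwd.append({y: max(fwd[-1][a] + u(t, y, a) for a in labels)
                    for y in labels})
    # Backward max: best suffix score after each label
    bwd = [dict.fromkeys(labels, 0.0) for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for y in labels:
            bwd[t][y] = max(u(t + 1, b, y) + bwd[t + 1][b] for b in labels)
    return {t: {(a, b): fwd[t - 1][a] + u(t, b, a) + bwd[t][b]
                for a in labels for b in labels}
            for t in range(1, n)}

def prune(m, threshold):
    """Keep only clique assignments whose max-marginal clears the threshold."""
    return {t: {ab for ab, s in row.items() if s >= threshold}
            for t, row in m.items()}
```

The surviving assignments define the (smaller) search space handed to the next, more expensive cascade level.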
Learning θ at each cascade level
Online learning
Structured Prediction Ensembles
Ensemble Learning
- Base hypotheses h1, h2, h3, …, hp each output a prediction (e.g. face, face, face, no face)
- Goal: combine the outputs from multiple models / hypotheses / experts:
  1) Majority voting
  2) Linear combination of hypotheses/experts
  3) Boosting, etc.
Weighted Majority Algorithm
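The classic (binary) Weighted Majority Algorithm keeps one weight per expert, predicts by weighted vote, and multiplicatively down-weights every expert that errs. A minimal sketch; the penalty factor `beta` is the usual free parameter:

```python
# Sketch: the (binary) Weighted Majority Algorithm.
# Predict by weighted vote; multiply the weight of every wrong expert
# by beta once the true label is revealed.

def weighted_majority(expert_preds, truths, beta=0.5):
    """expert_preds[i][t] is expert i's 0/1 prediction at round t."""
    weights = [1.0] * len(expert_preds)
    predictions = []
    for t, truth in enumerate(truths):
        # Weighted vote over the experts' current predictions
        vote_for_one = sum(w for w, e in zip(weights, expert_preds)
                           if e[t] == 1)
        predictions.append(1 if vote_for_one >= sum(weights) / 2 else 0)
        # Penalize every expert that was wrong this round
        weights = [w * beta if e[t] != truth else w
                   for w, e in zip(weights, expert_preds)]
    return predictions, weights
```

The standard guarantee is that the algorithm's mistake count is within a constant factor (depending on beta) of the best single expert's.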
Ensemble learning for Structured Predic8on
h1 h2 hp h3 h1
1
h1
2
h1
l
. . . . . . hp
1
hp
2
hp
l
. . .
h1 V V N Adv Det
Example: Sequence Model
Weighted Majority Algorithm for Structured Prediction Ensembles
Ensemble output from Weighted Majority Algorithm
- Given W1, W2, … WT
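Given weights W1, W2, …, WT, one way to assemble a structured ensemble output is a weighted vote per clique (here: per position of a tag sequence). The position-wise decomposition is an illustrative assumption and not necessarily the talk's exact combination rule:

```python
# Sketch: combine T sequence hypotheses with weights W_1..W_T by a
# weighted vote at each position. Position-wise voting is an assumption
# made for illustration.
from collections import defaultdict

def ensemble_sequence(hypotheses, weights):
    """hypotheses[i] is a tag sequence; weights[i] is its weight W_i."""
    length = len(hypotheses[0])
    output = []
    for t in range(length):
        votes = defaultdict(float)
        for h, w in zip(hypotheses, weights):
            votes[h[t]] += w
        # Keep the highest-weighted tag at this position
        output.append(max(votes, key=votes.get))
    return output
```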
Boosting for Structured Prediction Ensembles
Ensemble output from Boosting
- Given the base learners h1, h2, …, hT:
- Note: the boosting base learners h1, h2, …, hT are different from the ensemble hypotheses h1, h2, …, hp
- THE END