Combining Models
Oliver Schulte - CMPT 726
Bishop PRML Ch. 14

Outline
- Combining Models: Some Theory
- Boosting
- Derivation of AdaBoost from the Exponential Loss Function
Combining Models
- Motivation: let's say we have a number of models for a problem
- e.g. regression with polynomials (different degrees)
- e.g. classification with support vector machines (kernel type, parameters)
- Often, improved performance can be obtained by combining different models.
- But how do we combine classifiers?
Why Combining Works
Intuitively, there are two reasons.
- 1. Portfolio diversification: if you combine options that on average perform equally well, you keep the same average performance but you lower your risk (variance reduction).
- E.g., invest in both gold and equities.
- 2. The Boosting Theorem from computational learning theory.
Probably Approximately Correct Learning
- 1. We have discussed generalization error in terms of the expected error wrt a random test set.
- 2. PAC learning considers the worst-case error wrt a random test set.
- Guarantees bounds on test error.
- 3. Intuitively, a PAC guarantee works like this, for a given learning problem:
- The theory specifies a sample size n, s.t. after seeing n i.i.d. data points, with high probability (1 − δ), a classifier with training error 0 will have test error no greater than ε on any test set. (A standard example bound is sketched below.)
- Leslie Valiant, Turing Award 2011.
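As a concrete illustration of such a guarantee (a standard textbook bound, not stated on the slides): for a finite hypothesis class H and a learner that returns some hypothesis consistent with the training data, a sample size of

  n ≥ (1/ε) ( ln|H| + ln(1/δ) )

suffices so that, with probability at least 1 − δ over the n i.i.d. training points, any consistent hypothesis has true error at most ε.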
The Boosting Theorem
- Suppose you have a learning algorithm L with a PAC guarantee that is guaranteed to achieve test accuracy strictly better than 50% (i.e., better than random guessing).
- Then you can repeatedly run L and combine the resulting classifiers in such a way that, with high confidence, you can achieve any desired degree of accuracy < 100%.
Committees
- A combination of models is often called a committee.
- The simplest way to combine models is to just average them together:
  y_COM(x) = (1/M) Σ_{m=1}^M y_m(x)
- It turns out this simple method is better than (or the same as) the individual models on average (in expectation).
- And usually slightly better.
- Example: if the errors of 5 classifiers are independent, then averaging predictions reduces an error rate of 10% to about 1% (see the computation sketched below)!
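To make the 5-classifier example concrete, here is a short Python sketch (my own illustration, not from the slides) that computes the error rate of a majority vote over M classifiers whose errors are independent, each with error rate p:

from math import comb

def majority_vote_error(p, M):
    """Error of a majority vote of M classifiers with independent errors,
    each wrong with probability p: the vote errs when more than half are wrong."""
    return sum(comb(M, k) * p**k * (1 - p)**(M - k) for k in range(M // 2 + 1, M + 1))

print(majority_vote_error(0.10, 5))   # ~0.0086, i.e. roughly 1%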
Error of Individual Models
- Consider individual models y_m(x); assume they can be written as the true value plus an error term: y_m(x) = h(x) + ε_m(x)
- Exercise: show that the expected squared error of an individual model is E_x[{y_m(x) − h(x)}^2] = E_x[ε_m(x)^2]
- The average error made by an individual model is then:
  E_AV = (1/M) Σ_{m=1}^M E_x[ε_m(x)^2]
Error of Committee
- Similarly, the committee y_COM(x) = (1/M) Σ_{m=1}^M y_m(x) has expected error
  E_COM = E_x[{(1/M) Σ_{m=1}^M y_m(x) − h(x)}^2]
        = E_x[{(1/M) Σ_{m=1}^M (h(x) + ε_m(x)) − h(x)}^2]
        = E_x[{(1/M) Σ_{m=1}^M ε_m(x) + h(x) − h(x)}^2]
        = E_x[{(1/M) Σ_{m=1}^M ε_m(x)}^2]
Committee Error vs. Individual Error
- Multiplying out the inner sum over m, the committee error is
  E_COM = E_x[{(1/M) Σ_{m=1}^M ε_m(x)}^2] = (1/M^2) Σ_{m=1}^M Σ_{n=1}^M E_x[ε_m(x) ε_n(x)]
- If we assume the errors are uncorrelated, i.e. E_x[ε_m(x) ε_n(x)] = 0 when m ≠ n, then:
  E_COM = (1/M^2) Σ_{m=1}^M E_x[ε_m(x)^2] = (1/M) E_AV (see the simulation sketched below)
- However, errors are rarely uncorrelated.
- For example, if all errors are the same, ε_m(x) = ε_n(x), then E_COM = E_AV.
- Using Jensen's inequality (convexity of the squared error), one can show E_COM ≤ E_AV in general.
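A small Python sketch (my own illustration, not from the slides) that checks the 1/M reduction empirically: M regressors whose errors are independent zero-mean noise are averaged, and the committee's mean squared error comes out close to E_AV / M.

import numpy as np

rng = np.random.default_rng(0)
M, N = 10, 100_000                              # committee size, number of test points

h = rng.uniform(-1.0, 1.0, size=N)              # "true" target values h(x)
eps = rng.normal(0.0, 0.5, size=(M, N))         # independent errors eps_m(x)
y = h + eps                                     # individual model predictions y_m(x)

E_AV = np.mean(eps**2)                          # average individual error
E_COM = np.mean((y.mean(axis=0) - h)**2)        # committee (average) error

print(E_AV, E_COM, E_AV / M)   # E_COM is close to E_AV / M for uncorrelated errors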
Enlarging the Hypothesis Space
[Figure: positive (+) and negative (−) examples classified by three threshold classifiers; Russell and Norvig, Fig. 18.32]
- Classifier committees are more expressive than a single classifier.
- Example: classify as positive if all three threshold classifiers classify as positive.
Boosting
- Boosting is a technique for combining classifiers into a committee.
- We describe AdaBoost (adaptive boosting), the most commonly used variant (Freund and Schapire 1995, Gödel Prize 2003).
- Boosting is a meta-learning technique:
- It combines a set of classifiers trained using their own learning algorithms.
- Magic: it can work well even if those classifiers only perform slightly better than random!
Boosting Model
- We consider two-class classification problems, with training data (x_i, t_i), where t_i ∈ {−1, 1}.
- In boosting we build a "linear" classifier of the form:
  y(x) = Σ_{m=1}^M α_m y_m(x)
- A committee of classifiers, with weights.
- In boosting terminology:
- Each y_m(x) is called a weak learner or base classifier.
- The final classifier y(x) is called the strong learner.
- Learning problem: how do we choose the weak learners y_m(x) and the weights α_m?
Community Notes on Boosting
- Boosting with decision trees was used by Dugan O'Neill (SFU, Physics) to find evidence for the top quark. (Yes, this is a big deal.) http://www.phy.bnl.gov/edg/samba/neil_summary.pdf
- Boosting demo: http://cseweb.ucsd.edu/~yfreund/adaboost/index.html
Boosting Intuition
- The weights α_k reflect the training error of the different classifiers.
- Classifier y_{k+1} is trained on weighted examples, where instances misclassified by the committee y_k(x) = Σ_{m=1}^k α_m y_m(x) receive higher weight.
- The instance weights can be interpreted as resampling: build a new sample where instances with higher weight occur more frequently (see the sketch below).
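A minimal sketch of this resampling interpretation (my own illustration, not from the slides; it assumes X and t are NumPy arrays): draw a new training set in which each instance appears with probability proportional to its boosting weight.

import numpy as np

def resample_by_weight(X, t, w, rng=None):
    """Draw a new training set of the same size in which instance n
    appears with probability proportional to its boosting weight w[n]."""
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.choice(len(X), size=len(X), replace=True, p=w / w.sum())
    return X[idx], t[idx]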
Example - Boosting Decision Trees
[Figure: combined classifier h and boosted trees h_1, h_2, h_3, h_4]
- Shaded rectangle: classification example.
- Sizes of rectangles and trees indicate weight.
Example - Thresholds
- Let's consider a simple example where the weak learners are thresholds on a single feature.
- i.e. each y_m(x) is of the form: y_m(x) = +1 if x_i > θ, and −1 otherwise.
- To allow both directions of the threshold, include a sign p ∈ {−1, +1}: y_m(x) = +1 if p·x_i > p·θ, and −1 otherwise (a decision stump; see the sketch below).
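A possible Python sketch of such a threshold weak learner (a decision stump), trained to minimize the (weighted) number of mistakes; this is my own illustration of the idea, not code from the course. It is reused by the AdaBoost sketch later on.

import numpy as np

def fit_stump(X, t, w):
    """Exhaustively search over feature i, threshold theta, and direction p in {-1, +1}
    for the stump that predicts +1 when p * x_i > p * theta (and -1 otherwise)
    with minimum weighted 0-1 loss. Labels t are in {-1, +1}; weights w sum to one."""
    best = None
    for i in range(X.shape[1]):
        vals = np.unique(X[:, i])
        # candidate thresholds: midpoints between sorted feature values, plus the two ends
        thetas = np.concatenate(([vals[0] - 1.0], (vals[:-1] + vals[1:]) / 2.0, [vals[-1] + 1.0]))
        for theta in thetas:
            for p in (-1, +1):
                pred = p * np.where(X[:, i] > theta, 1, -1)
                err = np.sum(w * (pred != t))
                if best is None or err < best[0]:
                    best = (err, i, theta, p)
    return best  # (weighted error, feature index, threshold, direction)

def stump_predict(stump, X):
    """Apply a fitted stump to a data matrix X, returning +/-1 predictions."""
    _, i, theta, p = stump
    return p * np.where(X[:, i] > theta, 1, -1)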
Choosing Weak Learners
[Figure: toy data set with the first threshold classifier's decision boundary]
- Boosting is a greedy strategy for building the strong learner y(x) = Σ_{m=1}^M α_m y_m(x).
- Start by choosing the best weak learner, and use it as y_1(x).
- Best is defined as the one that minimizes the number of mistakes made (0-1 classification loss).
- i.e. search over all p, θ, i to find the best threshold rule y_1(x): p·x_i > p·θ.
Choosing Weak Learners
[Figure: toy data set after adding the second weak learner]
- The first weak learner y_1(x) made some mistakes.
- Choose the second weak learner y_2(x) to try to get those ones correct.
- Best is now defined as the one that minimizes the weighted number of mistakes made.
- Higher weight is given to the points y_1(x) got incorrect.
- The strong learner is now y(x) = α_1 y_1(x) + α_2 y_2(x).
Choosing Weak Learners
[Figure: toy data set over several boosting iterations]
- Repeat: reweight the examples and choose a new weak learner based on the weights.
- The green line shows the decision boundary of the strong learner.
What About Those Weights?
- So exactly how should we choose the weights for the examples that were classified incorrectly?
- And what should the α_m be for combining the weak learners y_m(x)?
- Original approach: make sure the strong learner satisfies the PAC guarantee.
- Alternative view: define a loss function, and choose the parameters to minimize it.
AdaBoost Algorithm
- Initialize the weights w_n^{(1)} = 1/N.
- For m = 1, ..., M (and while ε_m < 1/2):
  - Find the weak learner y_m(x) with minimum weighted error
    ε_m = Σ_{n=1}^N w_n^{(m)} I(y_m(x_n) ≠ t_n)
  - With normalized weights, ε_m = probability of a mistake.
  - Set α_m = (1/2) ln((1 − ε_m)/ε_m).
  - Update the weights: w_n^{(m+1)} = w_n^{(m)} exp{−α_m t_n y_m(x_n)}.
  - Normalize the weights to sum to one.
- The final classifier is y(x) = sign( Σ_{m=1}^M α_m y_m(x) ) (see the code sketch below).
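Putting the steps above together, here is a short Python sketch of the AdaBoost loop; it is my own illustration (not the course's reference code) and assumes a weak-learner pair like the fit_stump / stump_predict functions sketched earlier.

import numpy as np

def adaboost(X, t, M, fit_weak, predict_weak):
    """Train an AdaBoost committee of up to M weak learners; labels t are in {-1, +1}.
    fit_weak(X, t, w) returns a weak learner trained on weighted data, and
    predict_weak(learner, X) returns its +/-1 predictions."""
    N = len(X)
    w = np.full(N, 1.0 / N)                  # initialize weights w_n^(1) = 1/N
    learners, alphas = [], []
    for _ in range(M):
        learner = fit_weak(X, t, w)          # weak learner with minimum weighted 0-1 loss
        pred = predict_weak(learner, X)
        eps = np.sum(w * (pred != t))        # weighted error (weights sum to one)
        if eps >= 0.5:                       # stop once the weak learner is no better than chance
            break
        eps = max(eps, 1e-12)                # guard against division by zero for a perfect learner
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        w = w * np.exp(-alpha * t * pred)    # up-weight misclassified examples
        w = w / w.sum()                      # renormalize the weights to sum to one
        learners.append(learner)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(learners, alphas, X, predict_weak):
    """Strong learner: y(x) = sign( sum_m alpha_m * y_m(x) )."""
    scores = np.zeros(len(X))
    for learner, alpha in zip(learners, alphas):
        scores += alpha * predict_weak(learner, X)
    return np.sign(scores)

For example, with the stump sketch from before: learners, alphas = adaboost(X, t, 10, fit_stump, stump_predict).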
Exponential Loss
- Boosting attempts to minimize the exponential loss E_n = exp{−t_n y(x_n)}, the error on the n-th training example.
- The exponential loss is a differentiable approximation to the 0/1 loss (in fact an upper bound; see the note below).
- Better for optimization.
- Total error:
  E = Σ_{n=1}^N exp{−t_n y(x_n)}
[Figure: exponential loss vs. 0/1 loss as a function of t·y(x); figure from G. Shakhnarovich]
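To spell out the "approximation" claim (a standard observation, consistent with the later slide noting that these losses upper-bound the 0-1 loss): since t_n ∈ {−1, +1}, a misclassification means t_n y(x_n) ≤ 0, and then exp{−t_n y(x_n)} ≥ 1. Hence

  I(t_n y(x_n) ≤ 0) ≤ exp{−t_n y(x_n)} for every n,  and so  Σ_{n=1}^N I(t_n y(x_n) ≤ 0) ≤ E,

so minimizing E also pushes down the number of training mistakes.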
Minimizing Exponential Loss
- Let's assume we've already chosen the weak learners y_1(x), ..., y_{m−1}(x) and their weights α_1, ..., α_{m−1}.
- Define f_{m−1}(x) = α_1 y_1(x) + ... + α_{m−1} y_{m−1}(x).
- Just focus on choosing y_m(x) and α_m.
- This is a greedy optimization strategy.
- The total error using the exponential loss is:
  E = Σ_{n=1}^N exp{−t_n y(x_n)}
    = Σ_{n=1}^N exp{−t_n [f_{m−1}(x_n) + α_m y_m(x_n)]}
    = Σ_{n=1}^N exp{−t_n f_{m−1}(x_n) − t_n α_m y_m(x_n)}
    = Σ_{n=1}^N exp{−t_n f_{m−1}(x_n)} · exp{−t_n α_m y_m(x_n)}
  where the first factor, exp{−t_n f_{m−1}(x_n)}, acts as a weight w_n^{(m)}.
Weighted Loss
- On the m-th iteration of boosting, we are choosing y_m and α_m to minimize the weighted loss:
  E = Σ_{n=1}^N w_n^{(m)} exp{−t_n α_m y_m(x_n)}, where w_n^{(m)} = exp{−t_n f_{m−1}(x_n)}
- We can treat the w_n^{(m)} as weights since they are constant wrt y_m and α_m (see the note below on how they relate to the AdaBoost weight update).
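A short note (my own filling-in, but it follows directly from the definitions above): the weights can be computed recursively, which is exactly the weight update used in the AdaBoost algorithm, up to the normalization constant:

  w_n^{(m+1)} = exp{−t_n f_m(x_n)} = exp{−t_n [f_{m−1}(x_n) + α_m y_m(x_n)]} = w_n^{(m)} exp{−α_m t_n y_m(x_n)}.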
Minimization wrt y_m
- Consider the weighted loss
  E = Σ_{n=1}^N w_n^{(m)} e^{−t_n α_m y_m(x_n)} = e^{−α_m} Σ_{n ∈ T_m} w_n^{(m)} + e^{α_m} Σ_{n ∈ M_m} w_n^{(m)}
  where T_m is the set of points correctly classified by the choice of y_m(x), and M_m the set of those that are not.
- Rewriting with indicator functions:
  E = e^{α_m} Σ_{n=1}^N w_n^{(m)} I(y_m(x_n) ≠ t_n) + e^{−α_m} Σ_{n=1}^N w_n^{(m)} (1 − I(y_m(x_n) ≠ t_n))
    = (e^{α_m} − e^{−α_m}) Σ_{n=1}^N w_n^{(m)} I(y_m(x_n) ≠ t_n) + e^{−α_m} Σ_{n=1}^N w_n^{(m)}
- Since the second term is a constant wrt y_m, and e^{α_m} − e^{−α_m} > 0 if α_m > 0, the best y_m minimizes the weighted 0-1 loss Σ_{n=1}^N w_n^{(m)} I(y_m(x_n) ≠ t_n).
Choosing α_m
- So the best y_m minimizes the weighted 0-1 loss, regardless of α_m.
- How should we set α_m given this best y_m?
- Recall from above:
  E = e^{α_m} Σ_{n=1}^N w_n^{(m)} I(y_m(x_n) ≠ t_n) + e^{−α_m} Σ_{n=1}^N w_n^{(m)} (1 − I(y_m(x_n) ≠ t_n))
    = e^{α_m} ε_m + e^{−α_m} (1 − ε_m)
  where we define ε_m to be the weighted error of y_m (with the weights normalized to sum to one).
- Calculus: α_m = (1/2) ln((1 − ε_m)/ε_m) minimizes E (the derivation is sketched below).
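The one-line calculus step, spelled out (my own filling-in of the step the slide calls "Calculus"):

  dE/dα_m = e^{α_m} ε_m − e^{−α_m} (1 − ε_m) = 0  ⟹  e^{2α_m} = (1 − ε_m)/ε_m  ⟹  α_m = (1/2) ln((1 − ε_m)/ε_m).

Since E is a sum of exponentials it is convex in α_m, so this stationary point is the minimum.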
AdaBoost Behaviour
- Typical behaviour:
- Test error decreases even after the training error is flat (even zero!)
- Tends not to overfit.
[Figure from G. Shakhnarovich]
Boosting the Margin
- Define the margin of an example:
  γ(x_i) = t_i (α_1 y_1(x_i) + ... + α_m y_m(x_i)) / (α_1 + ... + α_m)
- The margin is 1 iff all the weak learners classify x_i correctly, and −1 if none do.
- Iterations of AdaBoost increase the margin of the training examples (even after the training error is zero).
- Intuitively, the classifier becomes more "definite".
Loss Functions for Classification
[Figure: loss functions E(z) plotted against the margin z = t·y(x)]
- We revisit a graph from earlier: the 0-1 loss, SVM hinge loss, logistic regression cross-entropy loss, and AdaBoost exponential loss are shown.
- All are approximations (upper bounds) to the 0-1 loss.
- The exponential loss leads to a simple greedy optimization scheme.
- But it has problems with outliers: note the different behaviour compared to the logistic regression cross-entropy loss for badly misclassified examples (see the numerical comparison below).
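A small numerical comparison (my own illustration; the hinge and cross-entropy forms are the standard ones, up to the rescaling used in the plotted curves): evaluate each loss at a few margins z = t·y(x). For a badly misclassified point (z very negative) the exponential loss blows up, while the cross-entropy loss grows only linearly.

import numpy as np

z = np.array([2.0, 0.5, 0.0, -0.5, -2.0, -5.0])    # margins z = t * y(x)

zero_one    = (z <= 0).astype(float)               # 0-1 loss
hinge       = np.maximum(0.0, 1.0 - z)             # SVM hinge loss
cross_ent   = np.log(1.0 + np.exp(-z))             # logistic (cross-entropy) loss
exponential = np.exp(-z)                           # AdaBoost exponential loss

for name, loss in [("0-1", zero_one), ("hinge", hinge),
                   ("cross-entropy", cross_ent), ("exponential", exponential)]:
    print(f"{name:>13}: {np.round(loss, 3)}")
# At z = -5 the exponential loss is ~148 while the cross-entropy loss is ~5:
# badly misclassified outliers dominate the exponential loss.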
Conclusion
- Readings: Ch. 14.3, 14.4
- Methods for combining models
- Simple averaging into a committee
- Greedy selection of models to minimize exponential loss