10-701 Ensemble of Trees: Bagging and Random Forest
Bagging
- Bagging, or bootstrap aggregation, is a technique for reducing the variance of an estimated prediction function.
- For classification, a committee of trees each casts a vote for the predicted class.
Bootstrap
The basic idea: randomly draw datasets with replacement from the training data, each sample the same size as the original training set.
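As a minimal sketch of this idea (assuming NumPy; the function name is just for illustration):

```python
import numpy as np

def bootstrap_sample(X, y, rng):
    """Draw one bootstrap sample: N rows drawn with replacement."""
    n = X.shape[0]
    idx = rng.integers(0, n, size=n)   # sample indices with replacement
    return X[idx], y[idx]

rng = np.random.default_rng(0)
X = np.arange(20).reshape(10, 2)       # toy data: N=10 examples, M=2 features
y = np.arange(10)
Xb, yb = bootstrap_sample(X, y, rng)   # same size as the original set
```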
Bagging tree classifier (pipeline)
1. Training data: N examples, M features.
2. Create bootstrap samples from the training data.
3. Construct a decision tree on each bootstrap sample.
4. Take the majority vote over the trees' predictions.
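A hedged from-scratch sketch of this pipeline, assuming scikit-learn's DecisionTreeClassifier as the base learner:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_trees_fit(X, y, B, rng):
    """Fit B decision trees, each on its own bootstrap sample."""
    n = X.shape[0]
    trees = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)                  # bootstrap sample
        trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return trees

def bagged_trees_predict(trees, X):
    """Majority vote across the committee of trees."""
    votes = np.stack([t.predict(X) for t in trees])       # shape (B, n_test)
    return np.apply_along_axis(
        lambda v: np.bincount(v.astype(int)).argmax(), 0, votes)
```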
Bagging
Given a training set $Z = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$, draw bootstrap samples $Z^{*b}$, $b = 1, \dots, B$. Let $\hat{f}^{*b}(x)$ be the prediction at input $x$ when bootstrap sample $b$ is used for training; the bagged estimate is
$$\hat{f}_{\text{bag}}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}^{*b}(x)$$
Bagging
[Figure from Hastie et al., The Elements of Statistical Learning: bagging a classification tree.]
Treat the voting proportions as class probabilities.
Random forest classifier
Random forest is an extension of bagging: in addition to bootstrap-sampling the examples, it samples a random subset of the features when splitting each node.
Random Forest Classifier (pipeline)
1. Training data: N examples, M features.
2. Create bootstrap samples from the training data.
3. Construct a decision tree from each bootstrap sample. At each node, choose the split feature from only m < M randomly selected features.
4. Take the majority vote over the trees' predictions.
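For illustration, the same pipeline via scikit-learn's RandomForestClassifier, where max_features plays the role of m < M (the sizes below are arbitrary demo choices):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# toy data: N=200 examples, M=10 features
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# B=100 trees; each split considers only m=3 of the M=10 features
forest = RandomForestClassifier(n_estimators=100, max_features=3, random_state=0)
forest.fit(X, y)
print(forest.predict(X[:5]))         # majority-vote class labels
print(forest.predict_proba(X[:5]))   # voting proportions as probabilities
```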
Random forest for biology
[Figure: example decision trees over biological features such as GeneExpress, TAP, Y2H, GOProcess, HMS-PCI, GeneOccur, GOLocalization, ProteinExpress, Domain, and SynExpress, with Y/N branches at each node.]
Regression
10-701 Machine Learning

Where we are
- Inputs → Classifier → predict category ✓
- Inputs → Density Estimator → predict probability ✓
- Inputs → Regressor → predict real number ← today
Choosing a restaurant

Reviews (out of 5 stars) | $ | Distance | Cuisine | Score (out of 10)
4 | 30 | 21 | 7 | 8.5
2 | 15 | 12 | 8 | 7.8
5 | 27 | 53 | 9 | 6.7
3 | 20 | 5 | 6 | 5.4
- In everyday life we need to make decisions by taking into account lots of factors.
- The question is what weight we put on each of these factors (how important each is relative to the others).
- Assume we would like to build a recommender system for ranking potential restaurants based on an individual's preferences.
- If we have many observations, we may be able to recover the weights.
Linear regression
- Given an input x we would like to compute an output y.
- For example:
  - Predict height from age
  - Predict Google's price from Yahoo's price
  - Predict distance from wall using sensor readings
[Plot: data points in the X-Y plane.] Note that now y can be continuous.
Linear regression
- Given an input x we would like to compute an output y.
- In linear regression we assume that y and x are related by the equation
  $y = wx + \epsilon$
  where $w$ is a parameter and $\epsilon$ represents measurement or other noise.
[Plot: observed values $(x, y)$ and the fitted line, which is what we are trying to predict.]
- Our goal is to estimate $w$ from a training set of $\langle x_i, y_i \rangle$ pairs.
- One way to find such a relationship is to minimize the least squares error:
- Several other approaches can be used as well, so why least squares?
  - it minimizes the squared distance between measurements and the predicted line
  - it has a nice probabilistic interpretation
  - it is easy to compute
$$\arg\min_w \sum_i (y_i - w x_i)^2$$
If the noise $\epsilon$ is Gaussian with mean 0, then least squares is also the maximum likelihood estimate of $w$.
Solving linear regression using least squares minimization
- You should be familiar with this by now…
- We just take the derivative w.r.t. $w$ and set it to 0:
$$\frac{\partial}{\partial w} \sum_i (y_i - w x_i)^2 = -2 \sum_i x_i (y_i - w x_i) = 0
\;\Rightarrow\; \sum_i x_i y_i = w \sum_i x_i^2
\;\Rightarrow\; w = \frac{\sum_i x_i y_i}{\sum_i x_i^2}$$
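A quick sketch of this closed form (assuming NumPy):

```python
import numpy as np

def fit_w(x, y):
    """Closed-form least squares for y = w*x (line through the origin)."""
    return np.sum(x * y) / np.sum(x ** 2)
```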
Regression examples
- Generated: w = 2; recovered: w = 2.03 (noise std = 1)
- Generated: w = 2; recovered: w = 2.05 (noise std = 2)
- Generated: w = 2; recovered: w = 2.08 (noise std = 4)
As the noise level grows, the recovered slope drifts further from the true w.
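A sketch reproducing this kind of experiment (exact recovered values will vary with the random seed):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=100)
for std in (1, 2, 4):
    y = 2.0 * x + rng.normal(0, std, size=100)   # generated with w = 2
    w_hat = np.sum(x * y) / np.sum(x ** 2)       # recovered least squares estimate
    print(f"noise std={std}: recovered w = {w_hat:.2f}")
```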
Bias term
- So far we assumed that the line passes through the origin.
- What if the line does not? No problem: simply change the model to
  $y = w_0 + w_1 x + \epsilon$
- Can use least squares to determine $w_0$, $w_1$:
$$w_0 = \frac{\sum_i (y_i - w_1 x_i)}{n} \qquad w_1 = \frac{\sum_i x_i (y_i - w_0)}{\sum_i x_i^2}$$
Just a second, we will soon give a simpler solution
Multivariate regression
- What if we have several inputs? E.g., stock prices for Yahoo, Microsoft, and Ebay for the Google prediction task.
- This becomes a multivariate linear regression problem.
- Again, it's easy to model:
  $y = w_0 + w_1 x_1 + \dots + w_k x_k + \epsilon$
  (e.g., $y$ = Google's stock price; the $x_j$ = Yahoo's, Microsoft's, … stock prices)
Not all functions can be approximated using the input values directly. For example:
$$y = 10 + 3x_1^2 - 2x_2^2 + \epsilon$$
In some cases we would like to use polynomial or other terms based on the input data. Are these still linear regression problems?
- Yes. As long as the equation is linear in the coefficients, it is still a linear regression problem!
Non-linear basis functions
- So far we only used the observed values directly.
- However, linear regression can be applied in the same way to functions of these values.
- As long as these functions can be directly computed from the observed values, the parameters are still linear in the data and the problem remains a linear regression problem:
$$y = w_0 + w_1 \phi_1(x) + w_2 \phi_2(x) + \dots + w_k \phi_k(x)$$
Non-linear basis functions
- What type of functions can we use? A few common examples:
  - Polynomial: $\phi_j(x) = x^j$ for $j = 0, \dots, n$
  - Gaussian: $\phi_j(x) = \exp\left(-\frac{(x - \mu_j)^2}{2\sigma_j^2}\right)$
  - Sigmoid: $\phi_j(x) = \frac{1}{1 + \exp(-s_j x)}$
- Any function of the input values can be used; the solution for the parameters of the regression remains the same.
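A sketch of building such basis features (the Gaussian centers and width below are arbitrary demo choices):

```python
import numpy as np

def polynomial_features(x, degree):
    """Columns phi_j(x) = x**j for j = 0..degree; shape (n, degree+1)."""
    return np.vander(x, degree + 1, increasing=True)

def gaussian_features(x, centers, sigma):
    """Columns phi_j(x) = exp(-(x - mu_j)^2 / (2 sigma^2))."""
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * sigma ** 2))
```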
General linear regression problem
- Using our new notation for the basis functions, linear regression can be written as
$$y = \sum_{j=0}^{n} w_j \phi_j(x)$$
- Here $\phi_j(x)$ can be either $x_j$ for multivariate regression or one of the non-linear bases we defined.
- Once again we can use least squares to find the optimal solution.
LMS for the general linear regression problem
Our goal is to minimize the following loss function:
$$J(w) = \sum_i \Big(y_i - \sum_j w_j \phi_j(x_i)\Big)^2$$
Moving to vector notation we get:
$$J(w) = \sum_i \big(y_i - w^T \phi(x_i)\big)^2$$
where $w$ is a vector of dimension $k+1$, $\phi(x_i)$ is a vector of dimension $k+1$, and $y_i$ is a scalar. We take the derivative w.r.t. $w$:
$$\frac{\partial}{\partial w} \sum_i \big(y_i - w^T \phi(x_i)\big)^2 = -2 \sum_i \big(y_i - w^T \phi(x_i)\big)\, \phi(x_i)^T$$
Equating to 0 we get:
$$\sum_i y_i \, \phi(x_i)^T = w^T \sum_i \phi(x_i)\, \phi(x_i)^T$$
Define:
$$\Phi = \begin{pmatrix} \phi_0(x_1) & \phi_1(x_1) & \cdots & \phi_k(x_1) \\ \phi_0(x_2) & \phi_1(x_2) & \cdots & \phi_k(x_2) \\ \vdots & & & \vdots \\ \phi_0(x_n) & \phi_1(x_n) & \cdots & \phi_k(x_n) \end{pmatrix}$$
Then solving for $w$ we get:
$$w = (\Phi^T \Phi)^{-1} \Phi^T y$$
LMS for the general linear regression problem
$$J(w) = \sum_i \big(y_i - w^T \phi(x_i)\big)^2 \qquad\Rightarrow\qquad w = (\Phi^T \Phi)^{-1} \Phi^T y$$
Here $\Phi$ is an $n \times (k+1)$ matrix, $y$ is a vector with $n$ entries, and $w$ is a vector with $k+1$ entries. This solution is also known as the 'pseudo-inverse'.
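A minimal sketch of this closed form (assuming NumPy; np.linalg.lstsq is numerically safer than forming the inverse explicitly):

```python
import numpy as np

def solve_lms(Phi, y):
    """w = (Phi^T Phi)^{-1} Phi^T y, computed stably via least squares."""
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w
```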
Example: Polynomial regression
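A hedged end-to-end sketch of polynomial regression combining the pieces above (degree, data, and noise level are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = np.sin(3 * x) + rng.normal(0, 0.1, size=50)   # toy noisy target

Phi = np.vander(x, 4, increasing=True)            # basis: 1, x, x^2, x^3
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)       # least squares fit
y_hat = Phi @ w                                   # fitted values
```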
A probabilistic interpretation
Our least squares minimization solution can also be motivated by a probabilistic interpretation of the regression problem:
$$y = w^T \phi(x) + \epsilon$$
If the noise $\epsilon$ is Gaussian with mean 0, the MLE for $w$ in this model is the same as the solution we derived for the least squares criterion:
$$w = (\Phi^T \Phi)^{-1} \Phi^T y$$
Other types of linear regression
- Linear regression is a useful model for many problems.
- However, the parameters we learn for this model are global; they are the same regardless of the value of the input x.
- Extensions to linear regression adjust their parameters based on the region of the input we are dealing with.
Splines
- Instead of fitting one function over the entire region, fit a set of piecewise (usually cubic) polynomials satisfying continuity and smoothness constraints.
- This results in smooth and flexible functions without too many parameters.
- Need to define the regions in advance (usually uniform), with one cubic per region:
$$y = a_1 x^3 + b_1 x^2 + c_1 x + d_1, \quad y = a_2 x^3 + b_2 x^2 + c_2 x + d_2, \quad y = a_3 x^3 + b_3 x^2 + c_3 x + d_3$$
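In practice, one way to fit smooth piecewise cubics to noisy data is SciPy's UnivariateSpline (a smoothing spline; the smoothing factor s below is an arbitrary choice):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = np.sin(x) + rng.normal(0, 0.2, size=100)   # noisy observations

spline = UnivariateSpline(x, y, k=3, s=1.0)    # cubic pieces, C^2 smooth
y_hat = spline(x)
```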
LOCAL KERNEL REGRESSION
Local Kernel Regression
- What is the temperature in the room? A single global average? Or a "local" average at location x?

Local Average Regression
- Recall the NN classifier with majority vote; here we use an average instead:
$$\hat{f}(x) = \frac{\text{sum of } y_i \text{ in the } h\text{-ball around } x}{\text{number of points in the } h\text{-ball around } x} = \frac{\sum_i y_i \, \mathbb{1}(\|x_i - x\| \le h)}{\sum_i \mathbb{1}(\|x_i - x\| \le h)}$$
Nadaraya-Watson Kernel Regression

Local Kernel Regression
- Nonparametric estimator akin to kNN.
- Nadaraya-Watson kernel estimator:
$$\hat{f}(x) = \sum_i w_i(x)\, y_i, \quad \text{where} \quad w_i(x) = \frac{K\left(\frac{x - x_i}{h}\right)}{\sum_j K\left(\frac{x - x_j}{h}\right)}$$
- Weight each training point based on its distance to the test point.
- A boxcar kernel yields the local average.
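A minimal sketch of the Nadaraya-Watson estimator with a Gaussian kernel (the kernel choice and function name are assumptions for the demo):

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_test, h):
    """Kernel-weighted local average with a Gaussian kernel of bandwidth h."""
    u = (x_test[:, None] - x_train[None, :]) / h   # shape (n_test, n_train)
    K = np.exp(-0.5 * u ** 2)                      # Gaussian kernel weights
    return (K @ y_train) / K.sum(axis=1)           # weighted average per test point
```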
Kernels
[Figure: common kernel functions K(u).]
Spatially adaptive regression
- If the function's smoothness varies spatially, we want to allow the bandwidth h to depend on X.
- Examples: local polynomials, splines, wavelets, regression trees, …
Choice of kernel bandwidth h
(Image source: Larry Wasserman, All of Nonparametric Statistics.)
[Figure: Nadaraya-Watson fits with bandwidths h = 1, 10, 50, 200, ranging from too small (undersmoothed) through just right to too large (oversmoothed).]
Choice of Bandwidth
- Large bandwidth: averages more data points, reduces noise (lower variance).
- Small bandwidth: less smoothing, more accurate fit (lower bias).
- Bias-variance tradeoff: h should depend on n, the number of training points (which determines variance), and on the smoothness of the function (which determines bias).
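One common way to pick h is leave-one-out cross-validation; a sketch, reusing the hypothetical nadaraya_watson function above:

```python
import numpy as np

def loo_cv_bandwidth(x, y, candidates):
    """Pick the bandwidth minimizing leave-one-out squared error."""
    n = len(x)
    errors = []
    for h in candidates:
        sq_err = 0.0
        for i in range(n):
            mask = np.arange(n) != i                          # hold out point i
            pred = nadaraya_watson(x[mask], y[mask], x[i:i+1], h)[0]
            sq_err += (y[i] - pred) ** 2
        errors.append(sq_err / n)
    return candidates[int(np.argmin(errors))]
```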
Important points
- Linear regression
  - the basic model
  - regression as a function of the input
- Solving linear regression
- Error in linear regression
- Advanced regression models