

SLIDE 1

AdaBoost

MACHINE LEARNING WITH TREE-BASED MODELS IN PYTHON

Elie Kawerk

Data Scientist

SLIDE 2

Boosting

Boosting: Ensemble method combining several weak learners to form a strong learner. Weak learner: Model doing slightly better than random guessing. Example of weak learner: Decision stump (CART whose maximum depth is 1).
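As an illustration (not from the slides), a decision stump can be fit directly in sklearn; on its own it is a weak learner, well short of what the boosted ensembles below achieve. The dataset and variable names here are illustrative:

```python
# A decision stump is a CART with max_depth=1: a weak learner that does
# better than chance, but far worse than an ensemble of stumps.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)

stump = DecisionTreeClassifier(max_depth=1, random_state=1)
stump.fit(X_train, y_train)
print('Stump accuracy: {:.2f}'.format(stump.score(X_test, y_test)))
```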

SLIDE 3

Boosting

Train an ensemble of predictors sequentially. Each predictor tries to correct its predecessor. Most popular boosting methods: AdaBoost, Gradient Boosting.

SLIDE 4

AdaBoost

Stands for Adaptive Boosting. Each predictor pays more attention to the instances wrongly predicted by its predecessor. Achieved by changing the weights of training instances. Each predictor is assigned a coefficient α.

α depends on the predictor's training error.

SLIDE 5

AdaBoost: Training
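The slide's training diagram is not reproduced in this transcript. The loop it depicts can be sketched by hand as follows (synthetic data and variable names are illustrative, not from the course): misclassified instances are up-weighted, and each stump receives a coefficient α based on its weighted training error.

```python
# Hand-rolled sketch of the AdaBoost training loop for labels in {-1, +1}.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(1)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)   # synthetic diagonal boundary

w = np.full(len(y), 1.0 / len(y))            # start with uniform weights
stumps, alphas = [], []
for _ in range(10):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    err = w[pred != y].sum()                 # weighted training error
    alpha = 0.5 * np.log((1.0 - err) / err)  # alpha grows as err shrinks
    w *= np.exp(-alpha * y * pred)           # up-weight the mistakes
    w /= w.sum()
    stumps.append(stump)
    alphas.append(alpha)

# Prediction: weighted majority vote of the stumps
vote = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
ensemble = np.sign(vote)
print('Ensemble training accuracy: {:.2f}'.format((ensemble == y).mean()))
```

The up-weighting step is what makes each predictor "pay more attention" to its predecessor's mistakes; the α coefficients reappear at prediction time as voting weights.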

SLIDE 6

Learning Rate

Learning rate: 0 < η ≤ 1
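A smaller η shrinks each predictor's contribution and is usually compensated by a larger number of estimators. A minimal sketch of the `learning_rate` parameter in sklearn (dataset choice and settings are illustrative, not from the slides):

```python
# Compare two learning rates for AdaBoost via cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

scores = {}
for eta in (1.0, 0.1):
    adb = AdaBoostClassifier(n_estimators=100, learning_rate=eta,
                             random_state=1)
    scores[eta] = cross_val_score(adb, X, y, cv=3).mean()
    print('learning_rate={:>4} -> CV accuracy: {:.3f}'.format(eta, scores[eta]))
```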

SLIDE 7

AdaBoost: Prediction

Classification: Weighted majority voting. In sklearn: AdaBoostClassifier. Regression: Weighted average. In sklearn: AdaBoostRegressor.

SLIDE 8

AdaBoost Classification in sklearn (Breast Cancer dataset)

# Import models and utility functions
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Set seed for reproducibility
SEED = 1

# Split data into 70% train and 30% test
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.3,
                                                    stratify=y,
                                                    random_state=SEED)

SLIDE 9

# Instantiate a classification-tree 'dt'
dt = DecisionTreeClassifier(max_depth=1, random_state=SEED)

# Instantiate an AdaBoost classifier 'adb_clf'
adb_clf = AdaBoostClassifier(base_estimator=dt, n_estimators=100)

# Fit 'adb_clf' to the training set
adb_clf.fit(X_train, y_train)

# Predict the test set probabilities of the positive class
y_pred_proba = adb_clf.predict_proba(X_test)[:, 1]

# Evaluate the test-set roc_auc_score
adb_clf_roc_auc_score = roc_auc_score(y_test, y_pred_proba)

SLIDE 10

AdaBoost Classification in sklearn (Breast Cancer dataset)

# Print adb_clf_roc_auc_score
print('ROC AUC score: {:.2f}'.format(adb_clf_roc_auc_score))

ROC AUC score: 0.99

SLIDE 11

Let's practice!


SLIDE 12

Gradient Boosting (GB)


Elie Kawerk

Data Scientist

SLIDE 13

Gradient Boosted Trees

Sequential correction of predecessor's errors. Does not tweak the weights of training instances. Each predictor is trained using its predecessor's residual errors as labels. Gradient Boosted Trees: a CART is used as the base learner.

SLIDE 14

Gradient Boosted Trees for Regression: Training
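The slide's training diagram is not reproduced in this transcript. The procedure it depicts (for squared loss) can be sketched by hand as follows, with synthetic data and illustrative variable names: each stump is fit to the residual errors left by the ensemble built so far, and its contribution is shrunk by η.

```python
# Hand-rolled sketch of gradient-boosted regression with squared loss.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.3, size=200)

eta = 0.1                          # shrinkage (learning rate)
pred = np.full_like(y, y.mean())   # start from the mean prediction
trees = []
for _ in range(100):
    residuals = y - pred                       # predecessor's errors as labels
    tree = DecisionTreeRegressor(max_depth=1).fit(X, residuals)
    pred += eta * tree.predict(X)              # shrunken contribution
    trees.append(tree)

print('Training MSE: {:.3f}'.format(np.mean((y - pred) ** 2)))
```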

SLIDE 15

Shrinkage
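The slide's shrinkage figure is not reproduced in this transcript. Shrinkage multiplies each tree's prediction by the learning rate before adding it to the ensemble, so a smaller rate needs more trees. A minimal sketch of the corresponding sklearn knob (synthetic data, illustrative settings):

```python
# Compare two shrinkage settings of GradientBoostingRegressor.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(1)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=300)

for lr in (1.0, 0.1):
    gbt = GradientBoostingRegressor(n_estimators=300, max_depth=1,
                                    learning_rate=lr, random_state=1)
    gbt.fit(X, y)
    print('learning_rate={:>4} -> train R^2: {:.3f}'.format(lr, gbt.score(X, y)))
```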

SLIDE 16

Gradient Boosted Trees: Prediction

Regression:

y_pred = y_1 + η·r_1 + ... + η·r_N

(y_1: the first predictor's prediction; r_i: the residuals predicted by tree i.) In sklearn: GradientBoostingRegressor. Classification: In sklearn: GradientBoostingClassifier.

SLIDE 17

Gradient Boosting in sklearn (auto dataset)

# Import models and utility functions
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error as MSE

# Set seed for reproducibility
SEED = 1

# Split dataset into 70% train and 30% test
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.3,
                                                    random_state=SEED)

SLIDE 18

# Instantiate a GradientBoostingRegressor 'gbt'
gbt = GradientBoostingRegressor(n_estimators=300, max_depth=1,
                                random_state=SEED)

# Fit 'gbt' to the training set
gbt.fit(X_train, y_train)

# Predict the test set labels
y_pred = gbt.predict(X_test)

# Evaluate the test set RMSE
rmse_test = MSE(y_test, y_pred)**(1/2)

# Print the test set RMSE
print('Test set RMSE: {:.2f}'.format(rmse_test))

Test set RMSE: 4.01

SLIDE 19

Let's practice!


SLIDE 20

Stochastic Gradient Boosting (SGB)


Elie Kawerk

Data Scientist

SLIDE 21

Gradient Boosting: Cons

GB involves an exhaustive search procedure. Each CART is trained to find the best split points and features. May lead to CARTs using the same split points and maybe the same features.

SLIDE 22

Stochastic Gradient Boosting

Each tree is trained on a random subset of rows of the training data. Instances (40%-80% of the training set) are sampled without replacement. Features are also sampled (without replacement) when choosing split points. Result: further ensemble diversity. Effect: adding further variance to the ensemble of trees.

SLIDE 23

Stochastic Gradient Boosting: Training

SLIDE 24

Stochastic Gradient Boosting in sklearn (auto dataset)

# Import models and utility functions
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error as MSE

# Set seed for reproducibility
SEED = 1

# Split dataset into 70% train and 30% test
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.3,
                                                    random_state=SEED)

SLIDE 25

Stochastic Gradient Boosting in sklearn (auto dataset)

# Instantiate a stochastic GradientBoostingRegressor 'sgbt'
sgbt = GradientBoostingRegressor(max_depth=1, subsample=0.8,
                                 max_features=0.2, n_estimators=300,
                                 random_state=SEED)

# Fit 'sgbt' to the training set
sgbt.fit(X_train, y_train)

# Predict the test set labels
y_pred = sgbt.predict(X_test)

SLIDE 26

Stochastic Gradient Boosting in sklearn (auto dataset)

# Evaluate test set RMSE 'rmse_test'
rmse_test = MSE(y_test, y_pred)**(1/2)

# Print 'rmse_test'
print('Test set RMSE: {:.2f}'.format(rmse_test))

Test set RMSE: 3.95

SLIDE 27

Let's practice!
