SLIDE 1

The strength of “weak” models

ENSEMBLE METHODS IN PYTHON

Román de las Heras

Data Scientist, SAP / Agile Solutions

SLIDE 2

"Weak" model

Voting and Averaging:
- Small number of estimators
- Fine-tuned estimators
- Individually trained

New concept: the "weak" estimator


SLIDE 4

Properties of "weak" models

Weak estimator:
- Performance better than random guessing
- Light model: low training and evaluation time
- Example: Decision Tree
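As a quick illustration, a depth-limited tree comfortably beats a random-guessing baseline while staying cheap to train. A minimal sketch, assuming scikit-learn and an illustrative dataset (not from the slides):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.dummy import DummyClassifier

# Illustrative dataset; any binary classification task works
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

weak = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
baseline = DummyClassifier(strategy="uniform", random_state=42).fit(X_train, y_train)

print(weak.score(X_test, y_test))      # well above the baseline
print(baseline.score(X_test, y_test))  # ~0.5, i.e. random guessing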

SLIDE 5

Examples of "weak" models

Some "weak" models: Decision tree: small depth Logistic Regression Linear Regression Other restricted models Sample code:

from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression, LinearRegression

model = DecisionTreeClassifier(max_depth=3)
model = LogisticRegression(max_iter=50, C=100.0)
model = LinearRegression()  # the 'normalize' flag was removed in scikit-learn 1.2

SLIDE 6

Let's practice!

ENSEMBLE METHODS IN PYTHON

SLIDE 7

Bootstrap aggregating

ENSEMBLE METHODS IN PYTHON

Román de las Heras

Data Scientist, SAP / Agile Solutions

SLIDE 8

Heterogeneous vs Homogeneous Ensembles

Heterogeneous:
- Different algorithms (fine-tuned)
- Small number of estimators
- Voting, Averaging, and Stacking

Homogeneous:
- The same algorithm (a "weak" model)
- Large number of estimators
- Bagging and Boosting

SLIDE 9

Condorcet's Jury Theorem

Requirements:
- Models are independent
- Each model performs better than random guessing
- All individual models have similar performance

Conclusion: adding more models improves the performance of the ensemble (Voting or Averaging), and its probability of being correct approaches 1 (100%).

Marquis de Condorcet, French philosopher and mathematician
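To see the conclusion numerically, consider a majority vote of n independent models, each correct with probability p = 0.6. A minimal sketch (the numbers are illustrative):

from math import comb

def majority_accuracy(n, p):
    # Probability that more than half of n independent models are correct
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n // 2 + 1, n + 1))

for n in [1, 11, 101, 1001]:
    print(n, round(majority_accuracy(n, 0.6), 4))
# The ensemble's accuracy climbs toward 1 as n grows, as the theorem predicts.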

SLIDE 10

Bootstrapping

Bootstrapping requires:
- Random subsamples
- Drawn with replacement

Bootstrapping guarantees:
- A diverse crowd: different datasets
- Independence: separately sampled
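A minimal sketch of one bootstrap draw, using scikit-learn's resample utility (the toy arrays are illustrative):

import numpy as np
from sklearn.utils import resample

X = np.arange(10).reshape(-1, 1)
y = np.arange(10)

# Same size as the original, drawn with replacement: some rows repeat,
# others are left out entirely ("out of bag")
X_boot, y_boot = resample(X, y, replace=True, random_state=42)
print(y_boot)
print(set(y) - set(y_boot))  # instances this subsample never saw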

SLIDE 11

Pros and cons of bagging

Pros:
- Bagging usually reduces variance
- Overfitting can be avoided by the ensemble itself
- More stability and robustness

Cons:
- It is computationally expensive

SLIDE 12

It's time to practice!

ENSEMBLE METHODS IN PYTHON

SLIDE 13

BaggingClassifier: nuts and bolts

ENSEMBLE METHODS IN PYTHON

Román de las Heras

Data Scientist, SAP / Agile Solutions

SLIDE 14

Heterogeneous vs Homogeneous Functions

Heterogeneous Ensemble Function

het_est = HeterogeneousEnsemble(
    estimators=[('est1', est1), ('est2', est2), ...],
    # additional parameters
)

Homogeneous Ensemble Function

hom_est = HomogeneousEnsemble(
    base_estimator=est_base,
    n_estimators=chosen_number,
    # additional parameters
)
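For concreteness, here is how those two generic signatures map onto real scikit-learn classes (the estimator choices are illustrative assumptions, not from the slide):

from sklearn.ensemble import VotingClassifier, BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Heterogeneous: a named list of different, individually tuned algorithms
het_est = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=50)),
                ('dt', DecisionTreeClassifier(max_depth=3))]
)

# Homogeneous: many copies of a single "weak" base estimator
# (note: scikit-learn >= 1.2 renames base_estimator to estimator)
hom_est = BaggingClassifier(
    base_estimator=DecisionTreeClassifier(max_depth=3),
    n_estimators=50
)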

SLIDE 15

BaggingClassifier

Bagging Classifier example:

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

# Instantiate the base estimator ("weak" model)
clf_dt = DecisionTreeClassifier(max_depth=3)

# Build the Bagging classifier with 5 estimators
clf_bag = BaggingClassifier(
    base_estimator=clf_dt,
    n_estimators=5
)

# Fit the Bagging model to the training set
clf_bag.fit(X_train, y_train)

# Make predictions on the test set
y_pred = clf_bag.predict(X_test)
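As a small usage note (not on the slide), the fitted ensemble keeps its individual models in the estimators_ attribute:

print(len(clf_bag.estimators_))  # 5 fitted trees
print(clf_bag.estimators_[0])    # the first fitted DecisionTreeClassifier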

SLIDE 16

BaggingRegressor

Bagging Regressor example:

from sklearn.linear_model import LinearRegression
from sklearn.ensemble import BaggingRegressor

# Instantiate the base estimator ("weak" model)
reg_lr = LinearRegression()

# Build the Bagging regressor with 10 estimators (the n_estimators default)
reg_bag = BaggingRegressor(
    base_estimator=reg_lr
)

# Fit the Bagging model to the training set
reg_bag.fit(X_train, y_train)

# Make predictions on the test set
y_pred = reg_bag.predict(X_test)
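To evaluate the regressor, R^2 is the natural metric (and the one the out-of-bag score uses for regression, as the next slide notes). A one-line sketch:

from sklearn.metrics import r2_score

print(r2_score(y_test, y_pred))  # reg_bag.score(X_test, y_test) reports the same R^2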

SLIDE 17

Out-of-bag score

- Calculate the individual predictions using all estimators for which an instance was out of the sample
- Combine the individual predictions
- Evaluate the metric on those predictions:
  - Classification: accuracy
  - Regression: R^2

clf_bag = BaggingClassifier(
    base_estimator=clf_dt,
    oob_score=True
)
clf_bag.fit(X_train, y_train)

print(clf_bag.oob_score_)
0.9328125

pred = clf_bag.predict(X_test)
print(accuracy_score(y_test, pred))
0.9625
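For intuition, the procedure above can be reproduced by hand from the fitted ensemble's bookkeeping. A sketch, assuming X_train and y_train are NumPy arrays and that the ensemble bootstraps rows only (all features kept):

import numpy as np

# Each training instance is scored only by the estimators whose
# bootstrap sample did NOT contain it
preds = [[] for _ in range(len(X_train))]
for est, sample_idx in zip(clf_bag.estimators_, clf_bag.estimators_samples_):
    oob_idx = np.setdiff1d(np.arange(len(X_train)), sample_idx)
    for i, p in zip(oob_idx, est.predict(X_train[oob_idx])):
        preds[i].append(p)

# Majority vote per instance, skipping the rare instances that appeared
# in every bootstrap sample
pairs = [(y_train[i], max(set(p), key=p.count)) for i, p in enumerate(preds) if p]
print(np.mean([true == pred for true, pred in pairs]))  # close to clf_bag.oob_score_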

SLIDE 18

Now it's your turn!

ENSEMBLE METHODS IN PYTHON

SLIDE 19

Bagging parameters: tips and tricks

ENSEMBLE METHODS IN PYTHON

Román de las Heras

Data Scientist, SAP / Agile Solutions

SLIDE 20

Basic parameters for bagging

- base_estimator
- n_estimators
- oob_score (after fitting, read it from est_bag.oob_score_)

SLIDE 21

Additional parameters for bagging

- max_samples: the number of samples to draw for each estimator
- max_features: the number of features to draw for each estimator
  - Classification: ~ sqrt(number_of_features)
  - Regression: ~ number_of_features / 3
- bootstrap: whether samples are drawn with replacement
  - True -> max_samples = 1.0
  - False -> max_samples < 1.0
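A sketch putting these parameters together for a classification task (the specific values are illustrative, not from the slide):

import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

clf_bag = BaggingClassifier(
    base_estimator=DecisionTreeClassifier(max_depth=3),
    n_estimators=50,
    max_samples=0.8,                              # 80% of the rows per estimator
    max_features=int(np.sqrt(X_train.shape[1])),  # ~sqrt heuristic for classification
    bootstrap=True                                # rows drawn with replacement
)
clf_bag.fit(X_train, y_train)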

SLIDE 22

Random forest

Classification

from sklearn.ensemble import RandomForestClassifier

clf_rf = RandomForestClassifier(
    # parameters...
)

Regression

from sklearn.ensemble import RandomForestRegressor

reg_rf = RandomForestRegressor(
    # parameters...
)

Bagging parameters:

- n_estimators
- max_features
- oob_score

Tree-specific parameters:

- max_depth
- min_samples_split
- min_samples_leaf
- class_weight ("balanced")
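Putting the two groups together, a concrete instantiation might look like this (the values are illustrative, not from the slide):

from sklearn.ensemble import RandomForestClassifier

clf_rf = RandomForestClassifier(
    n_estimators=500,         # bagging: size of the ensemble
    max_features='sqrt',      # bagging: features considered at each split
    oob_score=True,           # bagging: free accuracy estimate
    max_depth=5,              # tree: keep each tree "weak"
    min_samples_leaf=3,       # tree: require a minimum leaf size
    class_weight='balanced'   # tree: reweight imbalanced classes
)
clf_rf.fit(X_train, y_train)
print(clf_rf.oob_score_)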

SLIDE 23

Bias-variance tradeoff

SLIDE 24

Let's practice!

ENSEMBLE METHODS IN PYTHON