Introduction to ensemble methods - ENSEMBLE METHODS IN PYTHON - PowerPoint PPT Presentation

SLIDE 1

Introduction to ensemble methods

ENSEMBLE METHODS IN PYTHON

Román de las Heras

Data Scientist, SAP / Agile Solutions

SLIDE 2

Choosing the best model

SLIDE 3

Surveys

SLIDE 4

Prerequisite knowledge

Supervised Learning with scikit-learn
Machine Learning with Tree-Based Models in Python
Linear Classifiers in Python

SLIDE 5

Technologies

scikit-learn, numpy, pandas, seaborn

# Generic template: MetaEstimator and Model1 ... ModelN are placeholders,
# not real scikit-learn classes
from sklearn.ensemble import MetaEstimator
# Base estimators
est1 = Model1()
est2 = Model2()
estN = ModelN()
# Meta estimator
est_combined = MetaEstimator(
    estimators=[est1, est2, ..., estN],
    # Additional parameters
)
# Train and test
est_combined.fit(X_train, y_train)
pred = est_combined.predict(X_test)
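The sketch below is one possible instantiation of the template above. It assumes VotingClassifier as the meta-estimator, three arbitrary base models (LogisticRegression, DecisionTreeClassifier, GaussianNB) and a synthetic dataset from make_classification. None of these choices come from the slide. Note that VotingClassifier expects (name, estimator) pairs rather than bare estimators.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data (an assumption for illustration, not the course data)
X, y = make_classification(n_samples=500, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Base estimators: three different algorithms for diversity
est1 = LogisticRegression(max_iter=1000)
est2 = DecisionTreeClassifier(max_depth=4, random_state=42)
est3 = GaussianNB()

# Meta estimator: VotingClassifier takes (name, estimator) pairs
est_combined = VotingClassifier(
    estimators=[('lr', est1), ('dt', est2), ('nb', est3)],
    voting='hard',  # the "additional parameter" in this instance
)

# Train and test
est_combined.fit(X_train, y_train)
pred = est_combined.predict(X_test)
```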

SLIDE 6

Learners, ensemble!

SLIDE 7

Voting

SLIDE 8


Ask the audience

Wisdom of the crowd
Collective intelligence: a large group of individuals >= a single expert
Problem solving
Decision making
Innovation
Prediction
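Here is a small calculation that puts a number on "large group >= single expert". It rests on an idealised assumption: on a two-class problem, every voter is independently correct with the same probability p. The majority is right when more than half of the n voters are right. That probability is the binomial tail sum of C(n, k) * p^k * (1 - p)^(n - k) over k from floor(n/2) + 1 to n. The helper name majority_accuracy is hypothetical.

```python
from math import comb


def majority_accuracy(n: int, p: float) -> float:
    """Probability that a strict majority of n independent voters is correct.

    Assumes a two-class problem, an odd n, and that every voter is
    independently correct with probability p.
    """
    k_min = n // 2 + 1  # smallest strict majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


for n in (1, 3, 5, 11):
    print(n, round(majority_accuracy(n, 0.7), 3))
```

With p = 0.7, three voters reach 0.784 and five reach about 0.837. The same formula also shows the flip side. If p < 0.5, adding voters makes the majority worse. Correlated voters also gain less than this independent case suggests.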

SLIDE 9

Majority voting

Properties
Classification problems
Majority voting: the mode of the individual predictions
Odd number of classifiers (3+), which avoids ties between two classes
Wise crowd characteristics:
Diverse: different algorithms or datasets
Independent and uncorrelated
Use individual knowledge
Aggregate individual predictions
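Majority voting is simply the mode of the individual predictions for each sample. The following is a minimal NumPy sketch. It assumes integer class labels starting at 0, with predictions stacked as one row per classifier. The function name majority_vote and the sample predictions are hypothetical.

```python
import numpy as np


def majority_vote(predictions: np.ndarray) -> np.ndarray:
    """Hard (majority) voting: most frequent label for each sample.

    predictions has shape (n_classifiers, n_samples) and holds
    non-negative integer labels. Ties go to the smallest label,
    because np.argmax returns the first maximum.
    """
    n_classes = predictions.max() + 1
    # Vote counts per class for every sample: shape (n_classes, n_samples)
    votes = np.apply_along_axis(np.bincount, 0, predictions, minlength=n_classes)
    return votes.argmax(axis=0)


# Three classifiers (rows) predicting four samples (columns)
preds = np.array([[0, 1, 2, 1],
                  [0, 2, 2, 0],
                  [1, 1, 0, 0]])
result = majority_vote(preds)
print(result)
```

Each column is resolved independently. For example, the last sample receives votes 1, 0, 0, so the mode is 0.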

SLIDE 10

Voting ensemble using scikit-learn

from sklearn.ensemble import VotingClassifier
# clf_1, clf_2, ..., clf_N: classifiers created beforehand
clf_voting = VotingClassifier(
    estimators=[
        ('label1', clf_1),
        ('label2', clf_2),
        ('labelN', clf_N)])

Evaluate the performance

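from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
# Create the individual models
clf_knn = KNeighborsClassifier(5)
clf_dt = DecisionTreeClassifier()
clf_lr = LogisticRegression()
# Create voting classifier
clf_voting = VotingClassifier(
    estimators=[
        ('knn', clf_knn),
        ('dt', clf_dt),
        ('lr', clf_lr)])
# Fit it to the training set and predict
# (X_train, X_test, y_train, y_test come from an earlier train/test split)
clf_voting.fit(X_train, y_train)
y_pred = clf_voting.predict(X_test)
# Get the accuracy score
acc = accuracy_score(y_test, y_pred)
print("Accuracy: {:0.3f}".format(acc))
# Accuracy: 0.938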
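The slide's snippet assumes that X_train, X_test, y_train and y_test already exist. The self-contained sketch below uses scikit-learn's bundled breast-cancer dataset in place of the course data. It also scores each model on its own, so the ensemble can be compared against its members. Wrapping KNN and logistic regression in a StandardScaler pipeline is an added choice that is not part of the original code. Because the data differ, the printed numbers will not match the 0.938 shown above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Bundled dataset used in place of the course data (an assumption)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Same three model types as above; KNN and LR get feature scaling
models = [
    ('knn', make_pipeline(StandardScaler(), KNeighborsClassifier(5))),
    ('dt', DecisionTreeClassifier(random_state=0)),
    ('lr', make_pipeline(StandardScaler(), LogisticRegression())),
]

# Accuracy of each model on its own
scores = {}
for name, model in models:
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Hard-voting ensemble; VotingClassifier clones and refits each estimator
clf_voting = VotingClassifier(estimators=models)
clf_voting.fit(X_train, y_train)
scores['voting'] = accuracy_score(y_test, clf_voting.predict(X_test))

for name, acc in scores.items():
    print(f"{name}: {acc:0.3f}")
```

Because VotingClassifier clones its estimators before fitting, reusing the already-fitted objects is safe. The ensemble is not guaranteed to beat its best member. Any gain depends on how diverse and uncorrelated the members' errors are.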

SLIDE 11

Let's give it a try!

SLIDE 12

Averaging

SLIDE 13


Counting Jelly Beans

How to provide a good estimate?
Guessing (random number)
Volume approximation
Many more approaches
Actual value ≈ mean(estimates)
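The jelly-bean intuition can be simulated. This sketch assumes that each guess equals the true count plus independent, unbiased Gaussian noise. The true count of 1000, the noise scale of 250 and the 200 guessers are made-up values.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
true_count = 1000  # hypothetical number of jelly beans
n_guessers = 200   # hypothetical crowd size

# Each guess = truth + independent, unbiased noise (an idealised assumption)
guesses = true_count + rng.normal(loc=0.0, scale=250.0, size=n_guessers)

# Crowd estimate: the mean of all guesses
crowd_estimate = guesses.mean()
crowd_error = abs(crowd_estimate - true_count)

# Typical individual: median absolute error of a single guess
typical_individual_error = np.median(np.abs(guesses - true_count))

print(f"crowd error: {crowd_error:.1f}, "
      f"median individual error: {typical_individual_error:.1f}")
```

If every guesser shared the same bias (say, everyone underestimated by 20%), the mean would inherit that bias. Averaging cancels independent errors, not systematic ones.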

SLIDE 14

Averaging (Soft Voting)

Properties
Classification & regression problems
Soft voting: mean
Regression: mean of predicted values
Classification: mean of predicted probabilities
Needs at least 2 estimators
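For classification, soft voting averages the members' predicted class probabilities and then picks the class with the highest average. The sketch below checks this by hand against VotingClassifier(voting='soft') on the bundled iris dataset. The three member models are arbitrary choices. Soft voting requires every member to implement predict_proba.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Arbitrary, diverse members; all of them implement predict_proba
estimators = [
    ('lr', LogisticRegression(max_iter=1000)),
    ('nb', GaussianNB()),
    ('dt', DecisionTreeClassifier(max_depth=3, random_state=0)),
]
soft = VotingClassifier(estimators=estimators, voting='soft').fit(X, y)

# Unweighted mean of the fitted members' class probabilities
manual_proba = np.mean([est.predict_proba(X) for est in soft.estimators_], axis=0)

# Predicted class = class with the highest averaged probability
manual_pred = soft.classes_[manual_proba.argmax(axis=1)]
```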

SLIDE 15

Averaging ensemble with scikit-learn

Averaging Classifier

from sklearn.ensemble import VotingClassifier
# clf_1 ... clf_N: classifiers that implement predict_proba; w_1 ... w_N: their weights
clf_voting = VotingClassifier(
    estimators=[
        ('label1', clf_1),
        ('label2', clf_2),
        ...,
        ('labelN', clf_N)],
    voting='soft',
    weights=[w_1, w_2, ..., w_N]
)

Averaging Regressor

from sklearn.ensemble import VotingRegressor
# reg_1 ... reg_N: regressors created beforehand; w_1 ... w_N: their weights
reg_voting = VotingRegressor(
    estimators=[
        ('label1', reg_1),
        ('label2', reg_2),
        ...,
        ('labelN', reg_N)],
    weights=[w_1, w_2, ..., w_N]
)
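For regression, the VotingRegressor prediction is the weighted mean of the members' predictions, sum_i w_i * pred_i / sum_i w_i. The sketch below reproduces it by hand. It assumes a synthetic make_regression dataset, arbitrary member models and hypothetical weights [2, 1, 1].

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption for illustration)
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

weights = [2, 1, 1]  # hypothetical: trust the linear model twice as much
reg_voting = VotingRegressor(
    estimators=[
        ('lin', LinearRegression()),
        ('knn', KNeighborsRegressor(5)),
        ('dt', DecisionTreeRegressor(max_depth=5, random_state=0)),
    ],
    weights=weights,
).fit(X, y)

# Weighted mean of the individual predictions, computed by hand
member_preds = np.column_stack([est.predict(X) for est in reg_voting.estimators_])
manual_pred = member_preds @ np.array(weights) / sum(weights)
```

Leaving weights unset gives every member the same weight, which reduces this to the plain mean from the slide above.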

SLIDE 16

scikit-learn example

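from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
# Instantiate the individual models
clf_knn = KNeighborsClassifier(5)
clf_dt = DecisionTreeClassifier()
clf_lr = LogisticRegression()
# Create an averaging classifier; weights follow the estimator order (knn=1, dt=2, lr=1)
clf_voting = VotingClassifier(
    estimators=[
        ('knn', clf_knn),
        ('dt', clf_dt),
        ('lr', clf_lr)],
    voting='soft',
    weights=[1, 2, 1]
)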
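The example above stops before fitting. The runnable continuation below assumes the bundled breast-cancer dataset and scaled KNN and logistic-regression pipelines, both of which are additions. It fits the weighted soft-voting classifier and confirms that weights=[1, 2, 1] produces a weighted mean of class probabilities in which the decision tree counts twice. Setting max_depth=4 on the tree is also an arbitrary choice. A fully grown tree outputs hard 0/1 probabilities, which is one reason to limit its depth before giving it extra weight.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Bundled dataset used in place of the course data (an assumption)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

clf_voting = VotingClassifier(
    estimators=[
        ('knn', make_pipeline(StandardScaler(), KNeighborsClassifier(5))),
        ('dt', DecisionTreeClassifier(max_depth=4, random_state=0)),
        ('lr', make_pipeline(StandardScaler(), LogisticRegression())),
    ],
    voting='soft',
    weights=[1, 2, 1],
).fit(X_train, y_train)

# Weighted mean of member probabilities: the tree's vote counts twice
member_probas = np.array([est.predict_proba(X_test) for est in clf_voting.estimators_])
manual_proba = np.average(member_probas, axis=0, weights=[1, 2, 1])

# Accuracy of the weighted averaging ensemble on the held-out data
accuracy = clf_voting.score(X_test, y_test)
print(f"Accuracy: {accuracy:0.3f}")
```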

SLIDE 17

Game of Thrones deaths

Target: predict whether a character is alive or not
Features:
Age
Gender
Books of appearance
Popularity
Whether relatives are alive or not

SLIDE 18

Time to practice!
