Power of Ensembles - Bargava Subramanian, Data Scientist - PowerPoint PPT Presentation



SLIDE 1

Power of Ensembles

Bargava Subramanian Data Scientist Cisco Systems, India

SLIDE 2

Two huntsmen go bird-hunting. Both huntsmen can hit a target with a probability of 0.2. They see a flock of 150 birds atop a banyan tree. The first huntsman takes aim and fires three consecutive shots. A minute after that, the second huntsman fires three shots at the banyan tree.

How many birds did the second huntsman shoot?

SLIDE 3

How many birds did the second huntsman shoot?

And then, there were none.

SLIDE 4

Your model is only as good as you (and your features)

SLIDE 5

Feature identification/creation/generation takes a lot of time

SLIDE 6

Two different models with the same features can result in different outputs

Why?

SLIDE 7

Two different models with the same features can result in different outputs

Searched different regions of the solution space
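A minimal sketch of this point (the dataset and model pairing are my own illustration, not from the slides): give two models the exact same features and they can still land in very different parts of the solution space, because each can only express the boundaries its family allows.

```python
# Same XOR-style features, two different models, two different outputs:
# a linear model searches linear boundaries, a tree searches axis-aligned splits.
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# XOR truth table: no single linear boundary separates the classes
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

linear = LogisticRegression().fit(X, y)
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

tree_acc = tree.score(X, y)      # the tree carves out all four regions
linear_acc = linear.score(X, y)  # a single line cannot separate XOR
print(tree_acc, linear_acc)
```

The tree reaches 100% on this toy data while the linear model cannot, even though both saw identical features: they explored different regions of the solution space.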

SLIDE 8

Some common problems faced by modelers

  • 1. Different models
  • 2. Model parameters
  • 3. Number of features
SLIDE 9

Possible Solution Approach?

SLIDE 10

Ensemble models are our friends

SLIDE 11

What is an ensemble?
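The deck answers this with image slides; as a back-of-envelope numeric sketch of the idea (my own numbers, not from the deck): combine several independent models by majority vote, and the combination is right whenever most of its members are.

```python
# Assumed illustration: three independent base models, each correct
# with probability p = 0.7, aggregated by majority vote.
p = 0.7

# the majority is correct if all three are right, or exactly two are right
all_three = p ** 3
exactly_two = 3 * p ** 2 * (1 - p)
ensemble_accuracy = all_three + exactly_two

print(round(ensemble_accuracy, 3))  # → 0.784, better than any single 0.7 model
```

That jump from 0.7 to roughly 0.784 comes purely from aggregation; it relies on the base models' errors being independent, which is why diversity among base models matters.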

SLIDE 12

SLIDE 13

SLIDE 14

CPU as a proxy for human IQ

SLIDE 15

Clever Algorithmic way to search the solution space

SLIDE 16

But is it new?

SLIDE 17

But is it new?

Known to researchers/academia for a long time. Wasn't widely used in industry until....

SLIDE 18

Success Story

Netflix $1 million prize competition

SLIDE 19

SLIDE 20

Some Advantages

  • 1. Improved accuracy
  • 2. Robustness
  • 3. Parallelization
SLIDE 21

Base model diversity
Model aggregation

SLIDE 22

Base Model

  • 1. Different training sets
  • 2. Feature sampling
  • 3. Different algorithms
  • 4. Different hyperparameters
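Items 1 and 2 above can be combined in one estimator. A hedged sketch (the dataset and parameter values are my own choices): scikit-learn's BaggingClassifier trains each base tree on a bootstrap sample of the rows (different training sets) and a random subset of the columns (feature sampling).

```python
# Base model diversity via row and column sampling.
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

bagged = BaggingClassifier(
    DecisionTreeClassifier(),  # base model
    n_estimators=10,
    max_samples=0.8,    # each tree sees a different 80% of the rows
    max_features=0.75,  # ...and a random 3 of the 4 features
    bootstrap=True,     # sample rows with replacement
    random_state=0,
)
bagged.fit(X, y)
print(bagged.score(X, y))
```

Each of the 10 trees ends up slightly different, which is exactly the diversity the aggregation step needs.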
SLIDE 23

Model Aggregation

  • 1. Voting
  • 2. Averaging
  • 3. Bagging
  • 4. Stacking
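A hedged sketch of option 1, voting (the base models and dataset are my own choices, not from the slides): scikit-learn's VotingClassifier aggregates several fitted models by majority vote.

```python
# Model aggregation by majority ("hard") vote over three diverse base models.
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

vote = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="hard",  # each model casts one vote per sample
)
vote.fit(X, y)
print(vote.score(X, y))
```

Switching `voting` to `"soft"` averages predicted probabilities instead, which corresponds to option 2 (averaging).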
SLIDE 24

SLIDE 25

WHERE IS PYTHON?

SLIDE 26

SLIDE 27

RandomizedSearchCV

from scipy.stats import randint as sp_randint
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV  # sklearn.grid_search has been removed

# build a classifier
clf = RandomForestClassifier(n_estimators=20)

# specify parameters and distributions to sample from
param_dist = {"max_depth": [3, None],
              "max_features": sp_randint(1, 11),
              "min_samples_split": sp_randint(2, 11),  # must be >= 2 in current scikit-learn
              "min_samples_leaf": sp_randint(1, 11),
              "bootstrap": [True, False],
              "criterion": ["gini", "entropy"]}

# run randomized search
n_iter_search = 20
random_search = RandomizedSearchCV(clf, param_distributions=param_dist,
                                   n_iter=n_iter_search)
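A usage sketch for a search like the one above (the dataset, the cv setting, and the reduced parameter grid are my assumptions): fit the search object like any estimator, then read off the best sampled configuration.

```python
from scipy.stats import randint as sp_randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# a smaller grid so the example runs quickly
param_dist = {"max_depth": [3, None],
              "min_samples_leaf": sp_randint(1, 11)}

# try 10 random parameter draws, scored by 3-fold cross-validation
random_search = RandomizedSearchCV(
    RandomForestClassifier(n_estimators=20, random_state=0),
    param_distributions=param_dist,
    n_iter=10,
    cv=3,
    random_state=0,
)
random_search.fit(X, y)
print(random_search.best_params_)  # best sampled configuration
print(random_search.best_score_)   # its mean cross-validated accuracy
```

Because it samples a fixed number of draws instead of enumerating the full grid, the cost is controlled by `n_iter` rather than by the size of the search space.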

SLIDE 28

hyperopt

Python library for serial and parallel optimization over awkward search spaces, which may include real-valued, discrete, and conditional dimensions.

https://github.com/hyperopt/hyperopt

SLIDE 29

hyperopt

# define an objective function
def objective(args):
    # define the objective function (e.g. a validation loss) here
    pass

# define a search space
from hyperopt import hp
space = hp.choice('a', [
    ('Model 1', randomForestModel),
    ('Model 2', xgboostModel)
])

# minimize the objective over the space
from hyperopt import fmin, tpe
best = fmin(objective, space, algo=tpe.suggest, max_evals=100)

SLIDE 30

joblib

  • 1. Transparent disk-caching of the output values and lazy re-evaluation (memoize pattern)
  • 2. Easy simple parallel computing
  • 3. Logging and tracing of the execution
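A short sketch of features 1 and 2 above (the cached function is my own toy example): `Memory` memoizes results to disk, and `Parallel`/`delayed` fan calls out across worker processes.

```python
# joblib: disk-caching (memoize) and simple parallelism.
import tempfile
from joblib import Memory, Parallel, delayed

# 1. transparent disk-caching: repeated calls with the same argument
#    are served from the cache directory instead of being recomputed
memory = Memory(tempfile.mkdtemp(), verbose=0)

@memory.cache
def slow_square(x):
    return x * x

first = slow_square(7)   # computed
second = slow_square(7)  # loaded from the on-disk cache

# 2. simple parallel computing: run the calls across 2 worker processes
squares = Parallel(n_jobs=2)(delayed(slow_square)(i) for i in range(5))
print(first, second, squares)
```

Both features matter for ensembles: caching avoids refitting identical base models, and `Parallel` is the same machinery scikit-learn uses under `n_jobs`.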
SLIDE 31

joblib

import joblib  # sklearn.externals.joblib is deprecated; import joblib directly
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# build a classifier
train = pd.read_csv('train.csv')
clf = RandomForestClassifier(n_estimators=20)
# fit needs features and labels; 'target' is a placeholder column name
clf.fit(train.drop('target', axis=1), train['target'])

# once the classifier is built we can store it as a serialized object
# and can load it later and use it to predict, thereby reducing memory footprint
joblib.dump(clf, 'randomforest_20estimator.pkl')
clf = joblib.load('randomforest_20estimator.pkl')

SLIDE 32

Disadvantages

  • 1. Model human readability isn't great
  • 2. Time/Effort trade-off to improve accuracy may not make sense

SLIDE 33

Questions?