Classification with scikit-learn (Artificial Intelligence @ Allegheny College)



SLIDE 1

Classification scikit-learn

Artificial Intelligence @ Allegheny College Janyl Jumadinova February 17–21, 2020

Janyl Jumadinova Classification scikit-learn February 17–21, 2020 1 / 18

SLIDE 2

scikit-learn

Popular Python machine learning library:

  • Designed to be well documented and approachable for non-specialists
  • Built on top of NumPy and SciPy
  • Easily installed with pip or conda:

      pip install scikit-learn
      conda install scikit-learn



SLIDE 4

Data representation in scikit-learn

A training dataset is described by a pair of matrices, one for the input data and one for the output. The most commonly used data formats are a NumPy ndarray or a pandas DataFrame / Series. Each row of these matrices corresponds to one sample of the dataset, and each column represents a quantitative piece of information used to describe each sample (called a "feature").
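As a minimal sketch, a tiny dataset of three samples with two features each (the measurement names here are purely illustrative):

```python
import numpy as np

# Feature matrix X: each row is one sample, each column one feature
# (here, two hypothetical measurements per sample).
X = np.array([
    [1.4, 0.2],
    [4.7, 1.4],
    [5.1, 1.9],
])

# Target vector y: one output label per sample.
y = np.array([0, 1, 2])

print(X.shape)  # (3, 2): 3 samples, 2 features
print(y.shape)  # (3,): one label per sample
```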


SLIDE 5

Data representation in scikit-learn

image credit: James Bourbeau

SLIDE 6

Features in scikit-learn

The feature module (in scikit-image): https://scikit-image.org/docs/dev/api/skimage.feature.html


SLIDE 7

Local Binary Pattern Feature Extraction

Introduced by Ojala et al. in "Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns".

1. Check whether the points surrounding the central point are greater than or less than the central point → get LBP codes (stored as an array).
2. Calculate a histogram of the LBP codes as a feature vector.

image credit: https://scikit-image.org/docs/dev/auto_examples/features_detection/plot_local_binary_pattern.html
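The two steps above can be sketched directly in NumPy. This is a simplified 8-neighbour variant for illustration only; the full multiresolution, rotation-invariant version is provided by skimage.feature.local_binary_pattern:

```python
import numpy as np

def lbp_codes(image):
    """Basic 8-neighbour LBP: compare each interior pixel to its
    neighbours and pack the comparisons into an 8-bit code."""
    # Offsets of the 8 neighbours, in a fixed clockwise order.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = image.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    centre = image[1:h - 1, 1:w - 1]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = image[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Step 1: neighbour >= centre gives one bit of the LBP code.
        codes |= (neighbour >= centre).astype(np.uint8) << bit
    return codes

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(16, 16))
codes = lbp_codes(image)

# Step 2: the histogram of LBP codes is the texture feature vector.
feature_vector, _ = np.histogram(codes, bins=256, range=(0, 256))
```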

SLIDE 8

Local Binary Pattern Feature Extraction

Example: The histogram of the LBP outcome is used as a measure to classify textures.

image credit: https://scikit-image.org/docs/dev/auto_examples/features_detection/plot_local_binary_pattern.html

SLIDE 9

Estimators in scikit-learn

Algorithms are implemented as estimator classes in scikit-learn. Each estimator is extensively documented (e.g. the KNeighborsClassifier documentation) with API documentation, user guides, and example usage. A model is an instance of one of these estimator classes.
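For example, a k-nearest-neighbours model is an instance of the KNeighborsClassifier estimator class; hyperparameters (such as n_neighbors here) are set in the constructor:

```python
from sklearn.neighbors import KNeighborsClassifier

# Instantiating the estimator class produces a (not yet trained) model.
model = KNeighborsClassifier(n_neighbors=3)

print(model.get_params()["n_neighbors"])  # 3
```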


SLIDE 10

Training a model

Fit, then predict:

    # Fit the model
    model.fit(X, y)

    # Get model predictions
    y_pred = model.predict(X)
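A complete fit/predict round trip might look like the following sketch, using the built-in iris dataset and a k-nearest-neighbours model as illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

model = KNeighborsClassifier(n_neighbors=5)
model.fit(X, y)            # fit the model
y_pred = model.predict(X)  # get model predictions

print(y_pred.shape)  # (150,): one prediction per sample
```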


SLIDE 11

Decision Tree in scikit-learn

image credit: James Bourbeau
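A decision tree follows the same estimator pattern; a minimal sketch on the iris dataset (max_depth and random_state values are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# max_depth limits how deep the tree may grow (a hyperparameter).
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)

print(tree.get_depth())  # at most 3
```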

SLIDE 12

Model performance metrics

Many commonly used performance metrics are built into the metrics subpackage in scikit-learn. However, a user-defined scoring function can be created using the sklearn.metrics.make_scorer function.
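A short sketch of both cases; the custom scoring function here (fraction_of_ones_matched) is a hypothetical example, not a scikit-learn built-in:

```python
from sklearn.metrics import accuracy_score, make_scorer

y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]

# A built-in metric from the metrics subpackage.
print(accuracy_score(y_true, y_pred))  # 0.75

# A user-defined scoring function, wrapped with make_scorer so it can
# be passed to cross-validation or grid-search utilities.
def fraction_of_ones_matched(y_true, y_pred):
    return sum(t == p == 1 for t, p in zip(y_true, y_pred)) / sum(y_true)

custom_scorer = make_scorer(fraction_of_ones_matched)
```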

image credit: James Bourbeau

SLIDE 13

Separate training and testing sets

scikit-learn has a convenient train_test_split function that randomly splits a dataset into a training and a testing set.
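A minimal sketch (the test_size and random_state values are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Randomly hold out 25% of the samples for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

print(X_train.shape, X_test.shape)
```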

image credit: James Bourbeau


SLIDE 15

Model selection - hyperparameter optimization

Model hyperparameter values (parameters whose values are set before the learning process begins) can be used to avoid under- and over-fitting.

Under-fitting: the model isn't sufficiently complex to properly model the dataset at hand.
Over-fitting: the model is too complex and begins to learn the noise in the training dataset.
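One way to see this is to vary a complexity hyperparameter and compare training and test accuracy. A sketch using a decision tree's max_depth (the specific depths chosen are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0)

# Too simple (risk of under-fitting), moderate, and unconstrained
# (risk of over-fitting): compare train vs. test accuracy for each.
for max_depth in (1, 3, None):
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    tree.fit(X_train, y_train)
    print(max_depth,
          tree.score(X_train, y_train),  # training accuracy
          tree.score(X_test, y_test))    # test accuracy
```

An unconstrained tree typically reaches perfect training accuracy while generalizing less well, whereas a depth-1 tree under-fits both sets.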


SLIDE 16

k-fold cross validation

Cross-validation is a resampling procedure used to evaluate machine learning models on a limited data sample. It uses a limited sample in order to estimate how the model is expected to perform in general when used to make predictions on data not used during the training of the model. The parameter k refers to the number of groups that a given data sample is to be split into.


SLIDE 17

k-fold cross validation

1. Shuffle the dataset randomly.
2. Split the dataset into k groups.
3. For each unique group:
   3.1 Take the group as a hold-out or test data set.
   3.2 Take the remaining groups as a training data set.
   3.3 Fit a model on the training set and evaluate it on the test set.
   3.4 Retain the evaluation score and discard the model.
4. Summarize the skill of the model using the sample of model evaluation scores.
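The steps above can be sketched with scikit-learn's KFold splitter (k-nearest neighbours is an illustrative choice of model):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Steps 1-2: shuffle and split the dataset into k = 5 groups.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

scores = []
for train_idx, test_idx in kfold.split(X):
    # Steps 3.1-3.2: hold out one group, train on the rest.
    model = KNeighborsClassifier()
    model.fit(X[train_idx], y[train_idx])
    # Steps 3.3-3.4: evaluate on the held-out group, keep the score.
    scores.append(model.score(X[test_idx], y[test_idx]))

# Step 4: summarize with the mean score across folds.
print(np.mean(scores))
```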


SLIDE 18

k-fold cross validation

image credit: https://scikit-learn.org/stable/modules/cross_validation.html

SLIDE 19

k-fold cross validation

image credit: James Bourbeau

SLIDE 20

Cross Validation in scikit-learn

image credit: James Bourbeau
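In scikit-learn, the whole cross-validation loop is available as a single call, cross_val_score (sketched here with k-nearest neighbours as an illustrative estimator):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# cv=5 performs 5-fold cross-validation; one score is returned per fold.
scores = cross_val_score(KNeighborsClassifier(), X, y, cv=5)

print(scores.shape)  # (5,)
print(scores.mean())
```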