Decision tree learning
Introduction to Machine Learning (PowerPoint PPT presentation)


SLIDE 1

INTRODUCTION TO MACHINE LEARNING

Decision tree learning

SLIDE 2

Introduction to Machine Learning

Task of classification

  • Automatically assign a class to observations with features
  • Observation: a vector of features, with a class
  • Automatically assign a class to a new observation with features, using previous observations
  • Binary classification: two classes
  • Multiclass classification: more than two classes
SLIDE 3

Introduction to Machine Learning

Example

  • A dataset consisting of persons
  • Features: age, weight and income
  • Class:
    • binary: happy or not happy
    • multiclass: happy, satisfied or not happy
SLIDE 4

Introduction to Machine Learning

Examples of features

  • Features can be numerical
    • age: 23, 25, 75, …
    • height: 175.3, 179.5, …
  • Features can be categorical
    • travel_class: first class, business class, coach class
    • smokes?: yes, no
SLIDE 5

Introduction to Machine Learning

The decision tree

  • Suppose you’re classifying patients as sick or not sick
  • Intuitive way of classifying: ask questions

    Is the patient young or old?
      Young → Vaccinated against the measles?   Yes → … | No → …
      Old   → Smoked for more than 10 years?    Yes → … | No → …

It’s a decision tree!
SLIDE 11

Introduction to Machine Learning

Define the tree

    A
    ├── B
    │   ├── D
    │   └── E
    └── C
        ├── F
        └── G

  • Nodes: A, B, C, D, E, F, G, connected by edges
  • Root: A
  • Leaves: D, E, F, G
  • Children of A: B and C
  • Children of B and C (grandchildren of A): D, E, F and G
SLIDE 18

Introduction to Machine Learning

Questions to ask

    age <= 18?
      yes → vaccinated?   yes: not sick | no: sick
      no  → smoked?       yes: sick | no: not sick
SLIDE 19

Introduction to Machine Learning

Categorical feature

  • A categorical feature can be a feature test on its own
  • travel_class: coach, business or first

    travel_class?
      coach → …   business → …   first → …

SLIDE 20

Introduction to Machine Learning

Classifying with the tree

    age <= 18?
      yes → vaccinated?   yes: not sick | no: sick
      no  → smoked?       yes: sick | no: not sick

Observation: a patient of 40 years, vaccinated, who didn’t smoke
Walk the tree: age <= 18? no → smoked? no → not sick

Prediction: not sick
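The walk-through above can be sketched as a tiny hard-coded function (my own illustration of the slide’s tree, not course code):

```python
# Hard-coded version of the slide's decision tree (illustrative sketch only).
def classify(age, vaccinated, smoked):
    """Return 'sick' or 'not sick' by walking the tree from the slides."""
    if age <= 18:                                   # root feature test
        return "not sick" if vaccinated else "sick"
    else:
        return "sick" if smoked else "not sick"

# Patient of 40 years, vaccinated, who didn't smoke:
print(classify(40, vaccinated=True, smoked=False))  # not sick
```

Each `if` corresponds to one node of the tree; the returned strings are its leaves.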

SLIDE 26

Introduction to Machine Learning

Learn a tree

  • Use the training set
  • Come up with queries (feature tests) at each node
SLIDE 27

Introduction to Machine Learning

    training set
      age <= 18?
        TRUE  → part of training set
        FALSE → part of training set

  • Split into parts: 2 parts for a binary test
  • Each part is split again by a new feature test
  • Keep splitting until the leaves contain only a small portion of the training set
SLIDE 30

Introduction to Machine Learning

Learn the tree

  • Goal: end up with pure leaves — leaves that contain observations of one particular class
  • In practice: almost never the case, because of noise
  • When classifying a new instance, it ends up in a leaf
  • Assign the class of the majority of the training instances in that leaf
SLIDE 33

Introduction to Machine Learning

Learn the tree

  • At each node:
    • iterate over different feature tests
    • choose the best one
  • Comes down to two parts:
    • make a list of feature tests
    • choose the test with the best split
SLIDE 34

Introduction to Machine Learning

Construct list of tests

  • Categorical features
    • use tests that parents/grandparents/… didn’t use yet
  • Numerical features
    • choose a feature
    • choose a threshold
SLIDE 35

Introduction to Machine Learning

Choose best feature test

  • More complex
  • Use splitting criteria to decide which test to use
  • Information gain ~ entropy
SLIDE 36

Introduction to Machine Learning

Information gain

  • Information gained from a split based on a feature test
  • Test leads to nicely divided classes → high information gain
  • Test leads to scrambled classes → low information gain
  • The test with the highest information gain will be chosen
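The slides give no formulas, but entropy and information gain for a binary split can be sketched with their standard definitions (my own illustration, not course code):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy of the parent minus the weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

parent = ["sick"] * 5 + ["not sick"] * 5
# A test that divides the classes nicely gives high information gain:
print(information_gain(parent, ["sick"] * 5, ["not sick"] * 5))        # 1.0
# A test that leaves the classes scrambled gives low information gain:
left  = ["sick", "sick", "not sick", "not sick", "not sick"]
right = ["sick", "sick", "sick", "not sick", "not sick"]
print(round(information_gain(parent, left, right), 3))                 # 0.029
```

The learner would evaluate `information_gain` for every candidate feature test and keep the highest-scoring one.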
SLIDE 37

Introduction to Machine Learning

Pruning

  • The number of nodes influences the chance of overfitting
  • Restrict the size — higher bias
    • decreases the chance of overfitting
  • Done by pruning the tree
SLIDE 38

INTRODUCTION TO MACHINE LEARNING

Let’s practice!

SLIDE 39

INTRODUCTION TO MACHINE LEARNING

k-Nearest Neighbors

SLIDE 40

Introduction to Machine Learning

Instance-based learning

  • Save the training set in memory
  • No real model, unlike a decision tree
  • Compare unseen instances to the training set
  • Predict using the comparison of the unseen data and the training set
SLIDE 41

Introduction to Machine Learning

k-Nearest Neighbor

  • Form of instance-based learning
  • Simplest form: 1-Nearest Neighbor or Nearest Neighbor
SLIDE 42

Introduction to Machine Learning

Nearest Neighbor - example

  • 2 features: X1 and X2
  • Class: red or blue
  • Binary classification
SLIDE 43

Introduction to Machine Learning

Nearest Neighbor - example

[scatter plot: training observations in the (X1, X2) plane, colored by class]
SLIDE 44

Introduction to Machine Learning

Nearest Neighbor - example

  • Save the complete training set
  • Given: an unseen observation with features X = (1.3, -2)
  • Compare the training set with the new observation
  • Find the closest observation — the nearest neighbor — and assign the same class

just Euclidean distance, nothing fancy
SLIDE 48

Introduction to Machine Learning

k-Nearest Neighbors

  • k is the number of neighbors
  • If k = 5
    • use the 5 most similar observations (neighbors)
    • the assigned class will be the most represented class within the 5 neighbors
SLIDE 49

Introduction to Machine Learning

Distance metric

  • Important aspect of k-NN
  • Euclidean distance: d(p, q) = √( Σᵢ (pᵢ − qᵢ)² )
  • Manhattan distance: d(p, q) = Σᵢ |pᵢ − qᵢ|
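A bare-bones k-NN sketch using the Euclidean distance (my own illustration; `knn_predict` and the toy training set are made up for this example):

```python
from collections import Counter
from math import dist  # Euclidean distance between two points (Python 3.8+)

def knn_predict(training, new_x, k=1):
    """training: list of (features, label) pairs. Return the majority label
    among the k observations closest to new_x."""
    neighbors = sorted(training, key=lambda obs: dist(obs[0], new_x))[:k]
    labels = [label for _, label in neighbors]
    return Counter(labels).most_common(1)[0][0]

# Toy training set: two features (X1, X2), classes red or blue.
training = [((1.0, -2.1), "red"), ((4.0, 3.0), "blue"), ((1.5, -1.8), "red")]
print(knn_predict(training, (1.3, -2.0), k=1))  # red
```

With k = 1 this is plain Nearest Neighbor; raising `k` makes the prediction a majority vote over more neighbors.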
SLIDE 52

Introduction to Machine Learning

Scaling - example

  • Dataset with
    • 2 features: weight and height
    • 3 observations

       height (m)   weight (kg)
    1  1.83         80
    2  1.83         80.5
    3  1.70         80

  distance(1, 2) = 0.5
  distance(1, 3) = 0.13
SLIDE 53

Introduction to Machine Learning

Scaling - example

  • Dataset with
    • 2 features: weight and height
    • 3 observations

       height (cm)  weight (kg)
    1  183          80
    2  183          80.5
    3  170          80

  distance(1, 2) = 0.5
  distance(1, 3) = 13

Scale influences distance!
SLIDE 54

Introduction to Machine Learning

Scaling

  • Normalize all features
    • e.g. rescale values between 0 and 1
  • Gives a better measure of the real distance
  • Don’t forget to scale new observations as well
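The rescaling above, sketched on the slides’ own height/weight data (a minimal min-max normalization, my own illustration):

```python
def rescale(values):
    """Min-max normalization: map values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

heights = [183, 183, 170]   # cm
weights = [80, 80.5, 80]    # kg
print(rescale(heights))  # [1.0, 1.0, 0.0]
print(rescale(weights))  # [0.0, 1.0, 0.0]
```

After rescaling, a 13 cm height difference and a 0.5 kg weight difference both map onto the same [0, 1] range. Note that for new observations you would reuse the `lo`/`hi` computed on the training set rather than recompute them.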
SLIDE 55

Introduction to Machine Learning

Categorical features

  • How to use them in a distance metric?
  • Dummy variables
    • 1 categorical feature with N possible outcomes becomes N binary features (2 outcomes each)
SLIDE 56

Introduction to Machine Learning

Dummy variables — Example

mother tongue: Spanish, Italian or French

  mother_tongue   spanish  italian  french
  Spanish         1        0        0
  Italian         0        1        0
  Italian         0        1        0
  Spanish         1        0        0
  French          0        0        1
  French          0        0        1
  French          0        0        1
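The expansion above could be sketched as follows (my own illustration; `dummies` is a hypothetical helper, not course code):

```python
def dummies(values, categories):
    """Expand one categorical feature into len(categories) binary features."""
    return [[1 if v == c else 0 for c in categories] for v in values]

tongues = ["Spanish", "Italian", "Italian", "Spanish", "French", "French", "French"]
for row in dummies(tongues, ["Spanish", "Italian", "French"]):
    print(row)  # e.g. first row: [1, 0, 0]
```

Each row now consists of 0/1 values, so it can be fed into any numerical distance metric.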

SLIDE 57

INTRODUCTION TO MACHINE LEARNING

Let’s practice!

SLIDE 58

INTRODUCTION TO MACHINE LEARNING

Introducing: The ROC curve

SLIDE 59

Introduction to Machine Learning

Introducing

  • Very powerful performance measure
  • For binary classification
  • Receiver Operating Characteristic curve (ROC curve)
SLIDE 60

Introduction to Machine Learning

Probabilities as output

  • Used decision trees and k-NN to predict a class
  • They can also output the probability that an instance belongs to a class
SLIDE 61

Introduction to Machine Learning

Probabilities as output - example

  • Binary classification
  • Decide whether a patient is sick or not sick
  • Define a probability threshold from which you decide a patient to be sick

New patient: 70% sick, 30% not sick
Decision tree: probability higher than 50% → classify as sick
Avoid sending a sick patient home: lower the threshold to 30%

The threshold is the decision function!

More patients classified as sick, but also more healthy patients classified as sick
SLIDE 62

Introduction to Machine Learning

Confusion matrix

  • Other performance measure for classification
  • Important to construct the ROC curve
SLIDE 63

Introduction to Machine Learning

Confusion matrix

  • Binary classifier: positive or negative (1 or 0)

               Prediction
                 P     N
   Truth   p    TP    FN
           n    FP    TN
SLIDE 64

Introduction to Machine Learning

Confusion matrix

  • Binary classifier: positive or negative (1 or 0)

               Prediction
                 P     N
   Truth   p    TP    FN
           n    FP    TN

  • True Positives (TP): prediction P, truth p
  • False Negatives (FN): prediction N, truth p
  • False Positives (FP): prediction P, truth n
  • True Negatives (TN): prediction N, truth n
SLIDE 68

Introduction to Machine Learning

Ratios in the confusion matrix

  • True positive rate (TPR) = recall
      TPR = TP / (TP + FN)
      (truly positive / all actually positive)
  • False positive rate (FPR)
      FPR = FP / (FP + TN)
      (falsely positive / all actually negative)
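These two ratios can be computed directly from paired truth/prediction labels (a minimal sketch, my own illustration):

```python
def rates(truth, pred):
    """Return (TPR, FPR) for binary labels, where 1 means positive."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)

truth = [1, 1, 1, 0, 0]
pred  = [1, 1, 0, 1, 0]
tpr, fpr = rates(truth, pred)
print(round(tpr, 2), fpr)  # 0.67 0.5
```

Here 2 of the 3 actual positives are caught (TPR = 2/3) and 1 of the 2 actual negatives is falsely flagged (FPR = 1/2).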

SLIDE 72

Introduction to Machine Learning

ROC curve

  • Horizontal axis: FPR
  • Vertical axis: TPR
  • How to draw the curve?

SLIDE 73

Introduction to Machine Learning

Draw the curve

  • Need a classifier which outputs probabilities
  • The decision function: the probability threshold from which you decide to diagnose sick
SLIDE 75

Introduction to Machine Learning

Draw the curve

  • Threshold 50%: probability >= 50% → sick, < 50% → healthy
  • Threshold 0%: everyone classified as sick
  • Threshold 100%: everyone classified as healthy

Each threshold gives one point (FPR, TPR) on the ROC curve.
SLIDE 78

Introduction to Machine Learning

Interpreting the curve

  • Is it a good curve?
  • Closer to the left upper corner = better
  • Good classifiers have a big area under the curve
SLIDE 79

Introduction to Machine Learning


AUC = 0.905

Area under the curve (AUC) > 0.9 = very good
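Sweeping the threshold and accumulating (FPR, TPR) points, plus the trapezoid rule for the area under the curve, might look like this (my own sketch; `roc_points` and `auc` are made-up names, not course code):

```python
def roc_points(probs, truth):
    """Sweep the threshold over every predicted probability, from high to low,
    and return the resulting (FPR, TPR) points, starting at (0, 0)."""
    pos = sum(truth)
    neg = len(truth) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(probs), reverse=True):
        pred = [1 if p >= thr else 0 for p in probs]
        tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
        fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve via the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

probs = [0.9, 0.8, 0.7, 0.3, 0.2]
truth = [1, 1, 0, 1, 0]
print(round(auc(roc_points(probs, truth)), 3))  # 0.833
```

Lowering the threshold moves along the curve toward (1, 1); a classifier whose high probabilities line up with the true positives gets an area close to 1.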

SLIDE 80

INTRODUCTION TO MACHINE LEARNING

Let’s practice!