ECE 5984: Introduction to Machine Learning


SLIDE 1

ECE 5984: Introduction to Machine Learning

Dhruv Batra, Virginia Tech

Topics:

– Decision/Classification Trees

Readings: Murphy 16.1-16.2; Hastie 9.2

SLIDE 2

Project Proposals Graded

  • Mean 3.6/5 = 72%


SLIDE 3

Administrivia

  • Project Mid-Sem Spotlight Presentations

    – Friday, 3-5pm, Whittemore 457A (updated from 5-7pm, Whittemore 654)
    – 5 slides (recommended)
    – 4-minute limit (STRICT) + 1-2 min Q&A
    – Tell the class what you’re working on
    – Any results yet? Problems faced?
    – Upload slides on Scholar

SLIDE 4

Recap of Last Time


SLIDE 5

Convolution Explained

  • http://setosa.io/ev/image-kernels/
  • https://github.com/bruckner/deepViz

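Both demos above animate the same sliding-window operation. Here is a minimal NumPy sketch of it (my own illustration, not course code; the toy image and the classic sharpen kernel are standard examples, not taken from the lecture):

    # Valid-mode 2D cross-correlation: slide the kernel over the image
    # and take a weighted sum at each location (this is what "convolution"
    # means in conv nets, which skip the kernel flip).
    import numpy as np

    def convolve2d(image, kernel):
        kh, kw = kernel.shape
        h, w = image.shape
        out = np.zeros((h - kh + 1, w - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
    sharpen = np.array([[ 0., -1.,  0.],
                        [-1.,  5., -1.],
                        [ 0., -1.,  0.]])              # classic sharpen kernel
    print(convolve2d(image, sharpen))                  # 3x3 feature map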

SLIDES 6-30

[Image-only slides recapping convolutions and convolutional nets; Slide Credit: Marc'Aurelio Ranzato]

SLIDE 31

Convolutional Nets

  • Example:

– http://yann.lecun.com/exdb/lenet/index.html


[Figure: LeNet-5 architecture. INPUT 32x32 → (convolutions) C1: feature maps 6@28x28 → (subsampling) S2: f. maps 6@14x14 → (convolutions) C3: f. maps 16@10x10 → (subsampling) S4: f. maps 16@5x5 → (full connection) C5: layer 120 → (full connection) F6: layer 84 → (Gaussian connections) OUTPUT 10]

Image Credit: Yann LeCun, Kevin Murphy
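As a rough shape-check of the figure, here is a hedged PyTorch sketch that reproduces the listed layer sizes (an approximation, not LeCun's original code: average pooling stands in for the trainable subsampling layers, and a plain linear layer for the Gaussian connections):

    import torch
    import torch.nn as nn

    lenet5 = nn.Sequential(
        nn.Conv2d(1, 6, kernel_size=5),     # INPUT 32x32 -> C1: 6@28x28
        nn.Tanh(),
        nn.AvgPool2d(2),                    # -> S2: 6@14x14
        nn.Conv2d(6, 16, kernel_size=5),    # -> C3: 16@10x10
        nn.Tanh(),
        nn.AvgPool2d(2),                    # -> S4: 16@5x5
        nn.Conv2d(16, 120, kernel_size=5),  # -> C5: 120@1x1
        nn.Tanh(),
        nn.Flatten(),
        nn.Linear(120, 84),                 # -> F6: 84
        nn.Tanh(),
        nn.Linear(84, 10),                  # -> OUTPUT: 10
    )

    x = torch.randn(1, 1, 32, 32)           # one dummy 32x32 grayscale image
    print(lenet5(x).shape)                   # torch.Size([1, 10])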

SLIDES 32-34

Visualizing Learned Filters

[Image-only slides; Figure Credit: Zeiler & Fergus, ECCV 2014]

SLIDE 35

Addressing non-linearly separable data – Option 1: non-linear features

  • Choose non-linear features, e.g.:

    – Typical linear features: w0 + Σi wi xi
    – Example of non-linear features: degree-2 polynomials, w0 + Σi wi xi + Σij wij xi xj

  • The classifier hw(x) is still linear in the parameters w:

    – As easy to learn
    – Data becomes linearly separable in the higher-dimensional feature space
    – Can be expressed via kernels

(A small sketch of this idea follows below.)

Slide Credit: Carlos Guestrin
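A minimal scikit-learn sketch of Option 1 (the synthetic dataset and model choices are mine, for illustration only): degree-2 polynomial features make circularly separated data linearly separable, while the classifier stays linear in the parameters w.

    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0]**2 + X[:, 1]**2 > 1).astype(int)       # circular boundary

    linear = LogisticRegression(max_iter=1000).fit(X, y)

    X2 = PolynomialFeatures(degree=2).fit_transform(X)  # adds xi^2, xi*xj terms
    quadratic = LogisticRegression(max_iter=1000).fit(X2, y)  # still linear in w

    print(linear.score(X, y))      # poor: no linear separator exists in 2D
    print(quadratic.score(X2, y))  # ~1.0: separable in the lifted feature space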

SLIDE 36

Addressing non-linearly separable data – Option 2: non-linear classifier

  • Choose a classifier hw(x) that is non-linear in the parameters w, e.g.:

    – Decision trees, neural networks, …

  • More general than linear classifiers
  • But often harder to learn (non-convex/non-concave optimization required)
  • Often very useful (outperforms linear classifiers)
  • In a way, both ideas are related

Slide Credit: Carlos Guestrin
SLIDE 37

New Topic: Decision Trees


SLIDE 38

Synonyms

  • Decision Trees
  • Classification and Regression Trees (CART)
  • Algorithms for learning decision trees:

– ID3
– C4.5

  • Random Forests

– Multiple decision trees


SLIDE 39

Decision Trees

  • Demo

– http://www.cs.technion.ac.il/~rani/LocBoost/


SLIDE 40

Pose Estimation

  • Random Forests!

– Multiple decision trees
– http://youtu.be/HNkbG3KsY84


SLIDES 41-44

[Image-only slides introducing decision trees; Slide Credit: Pedro Domingos, Tom Mitchell, Tom Dietterich]
SLIDE 45

A small dataset: Miles Per Gallon

From the UCI repository (thanks to Ross Quinlan)

40 Records

mpg   cylinders  displacement  horsepower  weight  acceleration  modelyear  maker
good  4          low           low         low     high          75to78     asia
bad   6          medium        medium      medium  medium        70to74     america
bad   4          medium        medium      medium  low           75to78     europe
bad   8          high          high        high    low           70to74     america
bad   6          medium        medium      medium  medium        70to74     america
bad   4          low           medium      low     medium        70to74     asia
bad   4          low           medium      low     low           70to74     asia
bad   8          high          high        high    low           75to78     america
:     :          :             :           :       :             :          :
:     :          :             :           :       :             :          :
:     :          :             :           :       :             :          :
bad   8          high          high        high    low           70to74     america
good  8          high          medium      high    high          79to83     america
bad   8          high          high        high    low           75to78     america
good  4          low           low         low     low           79to83     america
bad   6          medium        medium      medium  high          75to78     america
good  4          medium        low         low     low           79to83     america
good  4          low           low         medium  high          79to83     america
bad   8          high          high        high    low           70to74     america
good  4          low           medium      low     medium        75to78     europe
bad   5          medium        medium      medium  medium        75to78     europe

Suppose we want to predict MPG

Slide Credit: Carlos Guestrin

SLIDE 46

A Decision Stump

Slide Credit: Carlos Guestrin

SLIDE 47

The final tree

Slide Credit: Carlos Guestrin

SLIDE 48

Comments

  • Not all features/attributes need to appear in the tree.
  • A feature/attribute Xi may appear in multiple branches.
  • On a path, no feature may appear more than once.

    – Not true for continuous features. We’ll see this later.

  • Many trees can represent the same concept.
  • But not all trees will have the same size!

    – e.g., Y = (A ∧ B) ∨ (¬A ∧ C), i.e., (A and B) or (not A and C)

SLIDE 49

Learning decision trees is hard!!!

  • Learning the simplest (smallest) decision tree is an NP-complete problem [Hyafil & Rivest ’76]
  • Resort to a greedy heuristic:

    – Start from an empty decision tree
    – Split on the next best attribute (feature)
    – Recurse

  • “Iterative Dichotomizer” (ID3)
  • C4.5 (ID3 + improvements)

Slide Credit: Carlos Guestrin

SLIDE 50

Recursion Step

Take the original dataset and partition it according to the value of the attribute we split on:

  – Records in which cylinders = 4
  – Records in which cylinders = 5
  – Records in which cylinders = 6
  – Records in which cylinders = 8

Slide Credit: Carlos Guestrin

SLIDE 51

Recursion Step

For each partition (records in which cylinders = 4, 5, 6, or 8), build a tree from those records. (A sketch of this partition step follows below.)

Slide Credit: Carlos Guestrin
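A minimal sketch of this partition step (the toy records below are illustrative, not the actual MPG rows):

    from collections import defaultdict

    records = [
        {"cylinders": 4, "maker": "asia",    "mpg": "good"},
        {"cylinders": 6, "maker": "america", "mpg": "bad"},
        {"cylinders": 8, "maker": "america", "mpg": "bad"},
        {"cylinders": 4, "maker": "europe",  "mpg": "bad"},
    ]

    # One child node per value of the attribute we split on.
    partitions = defaultdict(list)
    for r in records:
        partitions[r["cylinders"]].append(r)

    for value, subset in sorted(partitions.items()):
        print(f"cylinders = {value}: build a tree from {len(subset)} record(s)")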

SLIDE 52

Second level of tree

Recursively build a tree from the seven records in which cylinders = 4 and the maker was based in Asia. (Similar recursion in the other cases.)

Slide Credit: Carlos Guestrin

SLIDE 53

The final tree

Slide Credit: Carlos Guestrin

SLIDE 54

Choosing a good attribute

X1  X2  Y
T   T   T
T   F   T
T   T   T
T   F   T
F   T   T
F   F   F
F   T   F
F   F   F

Slide Credit: Carlos Guestrin

SLIDE 55

Measuring uncertainty

  • Good split if we are more certain about the classification after the split

    – Deterministic is good (all true or all false)
    – Uniform distribution is bad

  • From the table on the previous slide:

    – P(Y=T | X2=F) = 1/2, P(Y=F | X2=F) = 1/2   (uniform: a bad split)
    – P(Y=T | X1=T) = 1,  P(Y=F | X1=T) = 0      (deterministic: a good split)

SLIDE 56

Entropy

Entropy H(Y) of a random variable Y: more uncertainty, more entropy!

    H(Y) = − Σi P(Y = yi) log2 P(Y = yi)

Information-theory interpretation: H(Y) is the expected number of bits needed to encode a randomly drawn value of Y (under the most efficient code).

Slide Credit: Carlos Guestrin
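A minimal Python sketch of this formula (the function name is mine), applied to the Y column of the X1/X2/Y table from Slide 54 (five T's, three F's):

    import math
    from collections import Counter

    def entropy(labels):
        """H(Y) = -sum_i P(Y = y_i) * log2 P(Y = y_i)."""
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    Y = ["T", "T", "T", "T", "T", "F", "F", "F"]
    print(entropy(Y))   # ~0.954 bits; a fair coin would give exactly 1 bit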

SLIDE 57

Information gain

  • Advantage of an attribute = decrease in uncertainty:

    – Entropy of Y before the split
    – Entropy of Y after the split, weighted by the probability of following each branch (i.e., the normalized number of records)

  • Information gain is the difference:

      IG(Y, X) = H(Y) − Σx P(X = x) H(Y | X = x)

    – (Technically it’s mutual information, but in this context it is also referred to as information gain)

Slide Credit: Carlos Guestrin
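A minimal sketch of this quantity on the Slide 54 table (function names are mine, not the course's):

    import math
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def info_gain(xs, ys):
        """IG(Y, X) = H(Y) - sum_x P(X = x) * H(Y | X = x)."""
        after = 0.0
        for x in set(xs):
            branch = [y for xi, y in zip(xs, ys) if xi == x]
            after += len(branch) / len(ys) * entropy(branch)
        return entropy(ys) - after

    X1 = ["T", "T", "T", "T", "F", "F", "F", "F"]
    X2 = ["T", "F", "T", "F", "T", "F", "T", "F"]
    Y  = ["T", "T", "T", "T", "T", "F", "F", "F"]
    print(info_gain(X1, Y))  # ~0.55 bits: X1 is a good split
    print(info_gain(X2, Y))  # ~0.05 bits: X2 is nearly useless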

SLIDE 58

Learning decision trees

  • Start from an empty decision tree
  • Split on the next best attribute (feature)

    – Use, for example, information gain to select the attribute to split on

  • Recurse

Slide Credit: Carlos Guestrin
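Putting the pieces together, here is a hedged sketch of this greedy recursion (a toy ID3-style learner, not the course's reference implementation; records are dicts and "label" names the output attribute):

    import math
    from collections import Counter, defaultdict

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def info_gain(records, attr, target="label"):
        groups = defaultdict(list)
        for r in records:
            groups[r[attr]].append(r[target])
        after = sum(len(g) / len(records) * entropy(g) for g in groups.values())
        return entropy([r[target] for r in records]) - after

    def id3(records, attrs, target="label"):
        labels = [r[target] for r in records]
        if len(set(labels)) == 1:      # stop: node is pure (Base Case One)
            return labels[0]
        if not attrs:                  # stop: nothing left to split on
            return Counter(labels).most_common(1)[0][0]
        best = max(attrs, key=lambda a: info_gain(records, a, target))
        rest = [a for a in attrs if a != best]
        return {(best, v): id3([r for r in records if r[best] == v], rest, target)
                for v in {r[best] for r in records}}

    data = [
        {"cylinders": "4", "maker": "asia",    "label": "good"},
        {"cylinders": "8", "maker": "america", "label": "bad"},
        {"cylinders": "4", "maker": "europe",  "label": "bad"},
        {"cylinders": "8", "maker": "america", "label": "bad"},
    ]
    print(id3(data, ["cylinders", "maker"]))  # splits on maker first here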

SLIDE 59

Look at all the information gains…

Suppose we want to predict MPG

Slide Credit: Carlos Guestrin

SLIDE 60

When do we stop?


SLIDE 61

Base Case One

Don’t split a node if all matching records have the same output value.

Slide Credit: Carlos Guestrin

SLIDE 62

Base Case Two: No attributes can distinguish

Don’t split a node if none of the attributes can create multiple non-empty children.

Slide Credit: Carlos Guestrin

SLIDE 63

Base Cases

  • Base Case One: If all records in the current data subset have the same output, then don’t recurse.
  • Base Case Two: If all records have exactly the same set of input attributes, then don’t recurse.

Slide Credit: Carlos Guestrin

SLIDE 64

Base Cases: An idea

  • Base Case One: If all records in the current data subset have the same output, then don’t recurse.
  • Base Case Two: If all records have exactly the same set of input attributes, then don’t recurse.
  • Proposed Base Case 3: If all attributes have zero information gain, then don’t recurse.

  • Is this a good idea?

Slide Credit: Carlos Guestrin

SLIDE 65

The problem with Base Case 3

y = a XOR b:

a  b  y
0  0  0
0  1  1
1  0  1
1  1  0

The information gains: zero for both a and b, so Base Case 3 stops at the root. The resulting decision tree: a single leaf predicting the majority class, even though a perfect tree exists. (A quick numeric check of the gains follows below.)

Slide Credit: Carlos Guestrin
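A quick numeric check (my own illustration) that both attributes have zero information gain on the XOR data, which is why Proposed Base Case 3 gives up at the root:

    import math
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def info_gain(xs, ys):
        after = 0.0
        for x in set(xs):
            branch = [y for xi, y in zip(xs, ys) if xi == x]
            after += len(branch) / len(ys) * entropy(branch)
        return entropy(ys) - after

    a = [0, 0, 1, 1]
    b = [0, 1, 0, 1]
    y = [ai ^ bi for ai, bi in zip(a, b)]     # y = a XOR b
    print(info_gain(a, y), info_gain(b, y))   # 0.0 0.0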

SLIDE 66

If we omit Base Case 3:

y = a XOR b (same table as the previous slide):

a  b  y
0  0  0
0  1  1
1  0  1
1  1  0

The resulting decision tree: split on a, then on b in each branch; every record is classified correctly.

Slide Credit: Carlos Guestrin