10-601 Introduction to Machine Learning
Machine Learning Department
School of Computer Science
Carnegie Mellon University

Overfitting + k-Nearest Neighbors

Matt Gormley
Lecture 4, Jan. 27, 2020
Reminders:
– Homework: Out: Wed, Jan. 22; Due: Wed, Feb. 05 at 11:59pm
– 10-601 Notation Crib Sheet
– Command Line and File I/O Tutorial (check out our colab.google.com template!)
A splitting criterion measures the effectiveness of splitting on a particular attribute. The learner chooses the attribute to split on as the one that maximizes the splitting criterion. Common splitting criteria:
– error rate (or accuracy, if we want to pick the tree that maximizes the criterion)
– Gini gain
– mutual information
– random
– …
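To make this concrete, here is a minimal Python sketch (the function names are my own) of scoring a split by error rate; Gini gain and mutual information are worked out further below.

```python
from collections import Counter

def error_rate(labels):
    """Misclassification rate if we predict the majority label."""
    majority = Counter(labels).most_common(1)[0][1]
    return 1.0 - majority / len(labels)

def split_error_rate(labels, attr_values):
    """Weighted error rate of the children after splitting on one attribute."""
    n = len(labels)
    return sum(
        len(child) / n * error_rate(child)
        for v in set(attr_values)
        for child in [[y for y, a in zip(labels, attr_values) if a == v]]
    )

# Maximizing the "error rate" criterion amounts to picking the attribute
# whose split leaves the lowest weighted child error:
# best = min(attributes, key=lambda a: split_error_rate(Y, column(a)))
```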
Dataset: output Y, attributes A and B

Y  A  B
-  1  0
-  1  0
+  1  0
+  1  0
+  1  1
+  1  1
+  1  1
+  1  1
Gini impurity:
– Expected misclassification rate
  – Coin (example)
  – Weighted dice roll (example)
– Gini impurity
– Gini impurity of a Bernoulli random variable
– Gini gain as a splitting criterion
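For reference, a LaTeX restatement of the standard definitions these bullets refer to (consistent with the worked example below):

```latex
% Gini impurity of a random variable Y:
G(Y) = \sum_{y} P(Y=y)\,\bigl(1 - P(Y=y)\bigr) = 1 - \sum_{y} P(Y=y)^2

% Special case: Y \sim \mathrm{Bernoulli}(\phi):
G(Y) = 2\,\phi\,(1 - \phi)

% Gini gain of splitting on attribute A:
\mathrm{GiniGain}(Y \mid A) = G(Y) - \sum_{a} P(A=a)\, G(Y \mid A=a)
```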
Worked example, for the Y/A/B dataset above:
1) G(Y) = 1 − (6/8)² − (2/8)² = 0.375
2) P(A=1) = 8/8 = 1
3) P(A=0) = 0/8 = 0
4) G(Y | A=1) = G(Y) = 0.375
5) G(Y | A=0) = undefined (no examples have A=0)
6) GiniGain(Y | A) = 0.375 − 0·(undef) − 1·(0.375) = 0
7) P(B=1) = 4/8 = 0.5
8) P(B=0) = 4/8 = 0.5
9) G(Y | B=1) = 1 − (4/4)² − (0/4)² = 0
10) G(Y | B=0) = 1 − (2/4)² − (2/4)² = 0.5
11) GiniGain(Y | B) = 0.375 − 0.5·(0) − 0.5·(0.5) = 0.125
So Gini gain prefers the split on B.
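A short sketch (variable and function names are mine) that reproduces the numbers above from the 8-example dataset:

```python
from collections import Counter

data = [  # (Y, A, B) for the 8 training examples above
    ('-', 1, 0), ('-', 1, 0),
    ('+', 1, 0), ('+', 1, 0),
    ('+', 1, 1), ('+', 1, 1), ('+', 1, 1), ('+', 1, 1),
]

def gini(labels):
    """Gini impurity, 1 - sum of squared empirical class probabilities."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_gain(data, attr_idx):
    """G(Y) minus the weighted impurity of the children of the split."""
    labels = [row[0] for row in data]
    gain = gini(labels)
    for v in set(row[attr_idx] for row in data):
        subset = [row[0] for row in data if row[attr_idx] == v]
        gain -= len(subset) / len(data) * gini(subset)
    return gain

print(gini([row[0] for row in data]))  # G(Y) = 0.375
print(gini_gain(data, 1))              # GiniGain(Y|A) = 0.0
print(gini_gain(data, 2))              # GiniGain(Y|B) = 0.125
```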
We can use the mutual information of the output class Y and some attribute X on which to split as a splitting criterion. Given a collection of training examples, we can estimate the required probabilities as empirical frequencies (counts divided by the number of examples).

Informally, mutual information measures the following: if we know X, how much does this reduce our uncertainty about Y? Here Y is the class we are trying to predict. Conditional entropy is the expected value of the specific conditional entropy, H(Y | X) = E_{P(X=x)}[H(Y | X = x)] = Σ_x P(X=x) H(Y | X=x), and the mutual information is I(Y; X) = H(Y) − H(Y | X).
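A sketch of estimating these quantities from a sample, assuming discrete attributes; the helper names are my own:

```python
import math
from collections import Counter

def entropy(labels):
    """H(Y), with probabilities estimated as empirical frequencies."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def mutual_information(labels, attr_values):
    """I(Y; X) = H(Y) - sum_x P(X=x) H(Y | X=x), estimated from counts."""
    n = len(labels)
    cond = sum(
        len(sub) / n * entropy(sub)
        for x in set(attr_values)
        for sub in [[y for y, v in zip(labels, attr_values) if v == x]]
    )
    return entropy(labels) - cond

# On the Y/A/B dataset above: I(Y; A) = 0 and I(Y; B) ≈ 0.311,
# so mutual information also prefers the split on B.
```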
[Figure: the PlayTennis training data, with columns Day, Outlook, Temperature, Humidity, Wind, and PlayTennis? Figure from Tom Mitchell.]

Test your understanding:
[Figure: entropy calculations for candidate splits of the PlayTennis data: H(Y) = 0.940 at the root; splitting on Humidity gives H = 0.985 (High) and H = 0.592 (Normal); splitting on Wind gives H = 0.811 (Weak) and H = 1.0 (Strong). Figure from Tom Mitchell.]
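Using the entropies shown in the figure (and the 7/7 and 8/6 branch sizes from Mitchell's PlayTennis data), the information gain of each candidate split works out to:

```latex
\mathrm{Gain}(S, \mathrm{Humidity}) = 0.940 - \tfrac{7}{14}(0.985) - \tfrac{7}{14}(0.592) = 0.151
\mathrm{Gain}(S, \mathrm{Wind})     = 0.940 - \tfrac{8}{14}(0.811) - \tfrac{6}{14}(1.000) = 0.048
```

so splitting on Humidity is preferred over splitting on Wind.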
Buntine & Niblett (1992) compared 4 criteria (random, Gini, mutual information, Marshall) on 12 datasets.
Table 1. Properties of the data sets.

Data Set  Classes  Attrs  Real  Multi  % Unkn  Training Set  Test Set  % Base Error
hypo      4        29     7     1      5.5     1000          2772      7.7
breast    2        9      4     2      0.4     200           86        29.7
tumor     22       18     –     3      3.7     237           102       75.2
lymph     4        18     1     8      –       103           45        45.3
LED       10       7      –     –      –       200           1800      90.0
mush      2        22     –     18     –       200           7924      48.2
votes     2        17     –     17     –       200           235       38.6
votes1    2        16     –     16     –       200           235       38.6
iris      3        4      4     –      –       100           50        66.7
glass     7        9      9     –      –       100           114       64.5
xd6       2        10     –     –      –       200           400       35.5
pole      2        4      4     –      –       200           1647      49.0

Some data sets were obtained through indirect sources. The "breast," "tumor" and "lymph" data sets were originally collected at the University Medical Center, Institute of Oncology, Ljubljana, Yugoslavia, in particular by G. Klajnšek and M. Soklic (lymphography data), and M. Zwitter (breast cancer and primary tumor). The data was converted into easy-to-use experimental material by Igor Kononenko, Faculty of Electrical Engineering, Ljubljana University. The data has been the subject of a series of comparative studies, for instance (Cestnik, et al., 1987). The hypothyroid data ("hypo") came originally from the Garvan Institute of Medical Research, Sydney. The data sets "glass," "votes" and "mush" came from David Aha's Machine Learning Database, available over the academic computer network from the University of California at Irvine; "hypo" and "xd6" came from a collection by Ross Quinlan of the University of Sydney (Quinlan, 1988); "breast," "lymph" and "tumor" came via Pete Clark of the Turing Institute; and "iris" from Stuart Crawford of Advanced Decision Systems. Versions 2 of the last four mentioned data sets are also available from the Irvine Machine Learning Database. Major properties of the data sets are given in Table 1. Columns headed "real" and "multi" are the number of attributes that are treated as real-valued or ordered and as multi-valued discrete attributes respectively. Percentage unknown is the proportion of all attribute values that are unknown. These are usually concentrated in a few attributes. Percentage base error is the percentage error obtained if the most frequent class is always predicted. Good trees should give a significant improvement over this.

3. Implementation

The decision tree implementation used in these experiments was originally written by David Harper, Chris Carter, and other students at the University of Sydney from 1984 to 1988. The present version has been largely rewritten by Wray Buntine. Performance of the current system was compared to earlier versions to check that bugs were not introduced during the rewrite.

An example with unknown outcome had its unit weight split across outcomes according to the proportion found for examples of the same class.
Medical Diagnosis Datasets (4 of 12):
– hypo: expert opinion on possible hypothyroid conditions, from 29 real and discrete attributes of the patient such as sex, age, taking of relevant drugs, and hormone readings taken from blood samples.
– breast: recurrence or non-recurrence of breast cancer sometime after an operation. There are nine attributes giving details about the breast, and age, with multi-valued discrete and real values.
– tumor: the location of a primary tumor (22 possible classes).
– lymph: classes include metastases, malignant, and fibrosis, and there are nineteen attributes giving details about the lymphatics and lymph nodes.

Table from Buntine & Niblett (1992)
When partitioning examples, an example with unknown outcome was passed down the most frequent branch. When classifying a new example, an example with unknown outcome was passed down each branch with weight proportional to the number of examples in the training set passed down the branch.

Leaf counts and average errors for pruned trees grown as described above are given in Tables 2 and 3 respectively. These results are given in the form "29.7 ± 3.4".

Table 2. Leaf count of pruned trees for different splitting rules.

Data Set   GINI          Info.         Marsh.        Random
hypo       5.0 ± 1.2     4.8 ± 1.3     5.8 ± 1.3     34.0 ± 14.6
breast     10.2 ± 7.1    9.3 ± 6.8     6.0 ± 4.1     25.4 ± 10.0
tumor      19.6 ± 5.8    22.5 ± 5.4    17.7 ± 6.2    32.8 ± 11.4
lymph      8.2 ± 5.0     7.5 ± 3.8     7.7 ± 3.2     15.5 ± 8.0
LED        13.3 ± 2.7    13.0 ± 1.9    13.1 ± 1.7    19.4 ± 4.7
mush       12.4 ± 5.2    12.4 ± 5.2    23.3 ± 8.1    48.7 ± 21.5
votes      5.1 ± 2.5     5.2 ± 2.6     12.4 ± 6.0    15.9 ± 8.9
votes1     8.9 ± 4.0     9.4 ± 5.6     13.0 ± 5.5    22.9 ± 10.2
iris       3.5 ± 0.5     3.5 ± 0.5     3.4 ± 0.7     12.1 ± 5.7
glass      8.1 ± 2.4     8.9 ± 1.8     8.5 ± 2.8     21.8 ± 6.5
xd6        14.9 ± 3.6    14.8 ± 3.8    14.8 ± 3.9    20.1 ± 5.1
pole       5.7 ± 4.0     5.8 ± 3.4     5.4 ± 2.9     22.7 ± 8.2

Table 3. Error (%) for different splitting rules (pruned trees).

Data Set   GINI           Info.          Marsh.         Random
hypo       1.01 ± 0.29    0.95 ± 0.22    1.27 ± 0.47    7.44 ± 0.53
breast     28.66 ± 3.87   28.49 ± 4.28   27.15 ± 4.22   29.65 ± 4.97
tumor      60.88 ± 5.44   62.70 ± 3.89   61.62 ± 3.98   67.94 ± 5.68
lymph      24.44 ± 6.92   24.00 ± 6.87   24.33 ± 5.51   32.33 ± 11.25
LED        32.77 ± 3.06   32.89 ± 2.59   33.15 ± 4.02   38.18 ± 4.57
mush       1.44 ± 0.47    1.44 ± 0.47    7.31 ± 2.25    8.77 ± 4.65
votes      4.47 ± 0.95    4.57 ± 0.87    11.77 ± 3.95   12.40 ± 4.56
votes1     12.79 ± 1.48   13.04 ± 1.65   15.13 ± 2.89   15.62 ± 2.73
iris       5.00 ± 3.08    4.90 ± 3.08    5.50 ± 2.59    14.20 ± 6.77
glass      39.56 ± 6.20   40.57 ± 6.73   40.53 ± 6.41   53.20 ± 5.01
xd6        22.14 ± 3.23   22.17 ± 3.36   22.06 ± 3.37   31.86 ± 3.62
pole       15.43 ± 1.51   15.47 ± 0.88   15.01 ± 1.15   26.38 ± 6.92
(The "Info." column is the mutual information criterion.)

Table from Buntine & Niblett (1992)

Key Takeaway: GINI gain and Mutual Information are statistically indistinguishable!
Table 4. Difference and significance of error for GINI splitting rule versus others.

Data Set   Info.          Marsh.         Random
hypo       –              0.26 (0.99)    6.43 (1.00)
breast     –              –              0.99 (0.72)
tumor      1.81 (0.84)    0.74 (0.39)    7.06 (0.99)
lymph      –              –              7.89 (0.99)
LED        0.12 (0.17)    0.38 (0.41)    5.41 (0.99)
mush       0.00 (0.00)    5.86 (1.00)    7.32 (0.99)
votes      0.11 (0.55)    7.30 (0.99)    7.94 (0.99)
votes1     0.26 (0.47)    2.34 (0.98)    2.83 (0.99)
iris       –              0.50 (0.90)    9.20 (0.99)
glass      1.01 (0.50)    0.96 (0.53)    13.64 (0.99)
xd6        0.04 (0.11)    –              9.72 (0.99)
pole       0.03 (0.11)    –              10.95 (0.99)

("–" marks entries that are not legible in the source scan.)
The first figure means that the average was 29.7; the second figure means that the sample standard deviation of this figure is 3.4%. This gives an idea of how much the quantity varied from sample to sample. The sample standard deviation for error also contains a residual element due to the fact that error is an estimate from a sometimes small test set. Bear in mind this residual element is constant across tree growing methods because training/test data sets are identical for each method. Significance testing using the two-tailed paired t-test is reported in Table 4. All significance results are given in a form such as 0.53 (0.21). The first number is the average difference in errors between the second and first methods, calculated as

\frac{1}{|\mathrm{trials}|} \sum_{p \in \mathrm{trials}} \bigl(\mathrm{error}_{2,p} - \mathrm{error}_{1,p}\bigr),

where error_{1,p} is the error for the p-th trial of the 1st method, etc. Bear in mind there were 20 trials. The second number is the significance of this difference according to the two-tailed paired t-test. This is done by first constructing a t-value testing whether the average difference is zero, and then computing the significance of this value according to the two-tailed t-test. For instance, a result of the form 0.53 (0.99) means the average error is less for GINI splitting with significance of greater than 99%; a result of the form −0.53 (0.86) means the average error is greater for GINI splitting with significance of greater than 86%; and a result with difference of 0.00 always has a significance of 0%, because we have no evidence that it is greater or less. Sometimes a significance of 100% is reported; in these cases, the t-value was so large that the significance level is more than 99.9%. If we require a significance level of 90%, then the random splitting rule is inferior to GINI in 11 of the 12 domains, the Marshall correction is inferior to GINI in 4 domains and superior in 1 domain out of the 12, and the information gain criterion is statistically indistinguishable from the GINI criterion.
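A sketch of the same procedure in modern terms, assuming SciPy is available; the per-trial error arrays here are synthetic stand-ins, not the paper's data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic per-trial test errors (20 trials) for two splitting rules.
errors_gini = rng.normal(loc=28.7, scale=3.4, size=20)
errors_other = errors_gini + rng.normal(loc=0.5, scale=1.0, size=20)

diff = (errors_other - errors_gini).mean()  # average difference in errors
t_stat, p_value = stats.ttest_rel(errors_other, errors_gini)
print(f"{diff:.2f} ({1 - p_value:.2f})")    # "difference (significance)" style
```

Here 1 − p plays the role of the paper's "significance": e.g., significance greater than 99% corresponds to p < 0.01.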
Results are of the form A.AA (B.BB), where:
1. A.AA is the average difference in errors between the two methods
2. B.BB is the significance according to a two-tailed paired t-test

Table from Buntine & Niblett (1992)

Key Takeaway: GINI gain and Mutual Information are statistically indistinguishable!
Which of the following trees would be learned by the decision tree learning algorithm using "error rate" as the splitting criterion? (Assume ties are broken alphabetically.)
[Figure: a small dataset with output Y and attributes A, B, C, and six candidate decision trees (answer choices 1–6).]
Decision tree learning as search: the search space has a start state, end states, and weighted edges.
– Goal: find the lowest (total) weight path from root to a leaf.
– Greedy search: at each node, follow the edge with lowest (immediate) weight.
– Greedy search is a heuristic search (i.e., it does not necessarily find the best path).
[Figure: example search tree with edge weights, showing the greedy path vs. the optimal path.]
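A minimal sketch of greedy search over a weighted tree (the data structure and names are my own), illustrating why greedy search can miss the best path:

```python
def greedy_path(tree, root):
    """Follow the lowest immediate edge weight from root to a leaf.

    `tree` maps a node to a list of (child, edge_weight) pairs; leaves
    have no entry. Returns (path, total_weight).
    """
    path, total, node = [root], 0, root
    while tree.get(node):
        child, w = min(tree[node], key=lambda edge: edge[1])
        path.append(child)
        total += w
        node = child
    return path, total

# Hypothetical example where greedy is suboptimal: the cheap first edge
# (weight 1) leads to an expensive leaf, while root -> b -> d costs only 3.
tree = {'root': [('a', 1), ('b', 2)], 'a': [('c', 9)], 'b': [('d', 1)]}
print(greedy_path(tree, 'root'))  # (['root', 'a', 'c'], 10), not the optimal 3
```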
– Decision Tree Learning as Search
ID3 = Decision Tree Learning with Mutual Information as the splitting criterion
Definition:
We say that the inductive bias of a machine learning algorithm is the principle by which it generalizes to unseen examples.
Inductive Bias of ID3:
Smallest tree that matches the data, with high mutual information attributes near the top
Prefer the simplest hypothesis that explains the data
ID3 = Decision Tree Learning with Mutual Information as the splitting criterion
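A compact recursive sketch of ID3 under this definition (helper and parameter names are mine; categorical attributes assumed):

```python
import math
from collections import Counter

def _mutual_info(labels, values):
    """Estimated I(Y; X) from counts, as defined earlier."""
    def H(ys):
        n = len(ys)
        return -sum(c / n * math.log2(c / n) for c in Counter(ys).values())
    n = len(labels)
    cond = sum(
        len(sub) / n * H(sub)
        for v in set(values)
        for sub in [[y for y, x in zip(labels, values) if x == v]]
    )
    return H(labels) - cond

def id3(data, attrs):
    """`data`: list of (label, features) pairs, features a dict attr -> value;
    `attrs`: set of attribute names still available for splitting."""
    labels = [y for y, _ in data]
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # majority-label leaf
    # Greedily split on the attribute with highest mutual information with Y.
    best = max(attrs, key=lambda a: _mutual_info(labels, [f[a] for _, f in data]))
    return {(best, v): id3([(y, f) for y, f in data if f[best] == v],
                           attrs - {best})
            for v in set(f[best] for _, f in data)}
```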
Suppose you had an algorithm that found the tree with the lowest training error that was also as small as possible (i.e., an exhaustive global search). Which tree would it return? (Assume ties are broken by choosing the smallest tree.)
[Figure: the same dataset (output Y; attributes A, B, C) and six candidate trees as in the earlier question.]
Iris dataset (excerpt). Full dataset: https://en.wikipedia.org/wiki/Iris_flower_data_set

Species  Sepal Length  Sepal Width  Petal Length  Petal Width
0        4.3           3.0          1.1           0.1
0        4.9           3.6          1.4           0.1
0        5.3           3.7          1.5           0.2
1        4.9           2.4          3.3           1.0
1        5.7           2.8          4.1           1.3
1        6.3           3.3          4.7           1.6
1        6.7           3.0          5.0           1.7
– Binary classification
– 2D examples
– Decision rules / hypotheses
– Nearest Neighbor classifier
– KNN for binary classification
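A minimal kNN sketch for binary classification (the example points loosely follow the Iris excerpt above; function names are mine):

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """k-Nearest Neighbors: predict the majority label among the k
    training points closest to `query` under Euclidean distance.

    `train` is a list of (features, label) pairs, features a tuple of floats;
    k=1 gives the plain Nearest Neighbor classifier.
    """
    neighbors = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# Hypothetical 2D binary-classification example (sepal length/width, species):
train = [((4.3, 3.0), 0), ((4.9, 3.6), 0), ((5.3, 3.7), 0),
         ((4.9, 2.4), 1), ((5.7, 2.8), 1), ((6.3, 3.3), 1)]
print(knn_predict(train, (5.0, 3.5), k=3))  # -> 0
```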