
Overfitting + k-Nearest Neighbors. Matt Gormley, Lecture 4, Jan. 27, 2020.



  1. 10-601 Introduction to Machine Learning, Machine Learning Department, School of Computer Science, Carnegie Mellon University. Overfitting + k-Nearest Neighbors. Matt Gormley, Lecture 4, Jan. 27, 2020

  2. Course Staff

  3. Course Staff: Team A

  4. Course Staff: Team B

  5. Course Staff: Team C

  6. Course Staff: Team D

  7. Course Staff

  8. Q&A Q: When and how do we decide to stop growing trees? What if the set of values an attribute could take was really large or even infinite? A: We’ll address this question for discrete attributes today. If an attribute is real-valued, there’s a clever trick that only considers O(L) splits where L = # of values the attribute takes in the training set. Can you guess what it does? 9

  9. Reminders
      • Homework 2: Decision Trees. Out: Wed, Jan. 22; Due: Wed, Feb. 05 at 11:59pm
      • Required Readings: 10601 Notation Crib Sheet; Command Line and File I/O Tutorial (check out our colab.google.com template!)

  10. SPLITTING CRITERIA FOR DECISION TREES

  11. Decision Tree Learning
      • Definition: a splitting criterion is a function that measures the effectiveness of splitting on a particular attribute.
      • Our decision tree learner selects the “best” attribute as the one that maximizes the splitting criterion.
      • Lots of options for a splitting criterion: error rate (or accuracy, if we want a criterion to maximize), Gini gain, mutual information, random, …
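As a concrete illustration of "pick the attribute that maximizes the criterion", here is a minimal sketch (my own code, not the course's reference implementation) using error-rate reduction as the criterion; Gini gain or mutual information could be plugged in the same way.

```python
from collections import Counter

# Minimal sketch (my own, not the course's reference code) of selecting the
# attribute that maximizes a splitting criterion.
def majority_error(labels):
    """Training error of always predicting the majority label."""
    if not labels:
        return 0.0
    return 1.0 - Counter(labels).most_common(1)[0][1] / len(labels)

def error_rate_gain(rows, attr, label_key="Y"):
    """Reduction in training error from splitting on `attr` (rows are dicts)."""
    labels = [r[label_key] for r in rows]
    after = 0.0
    for v in set(r[attr] for r in rows):
        subset = [r[label_key] for r in rows if r[attr] == v]
        after += len(subset) / len(rows) * majority_error(subset)
    return majority_error(labels) - after

def best_attribute(rows, attrs, criterion=error_rate_gain):
    """Return the attribute with the highest criterion value."""
    return max(attrs, key=lambda a: criterion(rows, a))
```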

  12. Decision Tree Learning. In-Class Exercise. Example dataset: output Y, attributes A and B.

         Y   A   B
         -   1   0
         -   1   0
         +   1   0
         +   1   0
         +   1   1
         +   1   1
         +   1   1
         +   1   1

      Which attribute would error rate select for the next split?
      1. A   2. B   3. A or B (tie)   4. Neither

  13. Decision Tree Learning. Example dataset: output Y, attributes A and B (the same 8 examples as above).

  14. Decision Tree Learning. Example dataset: output Y, attributes A and B (the same 8 examples as above).
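The answer to the exercise is presumably worked out in the slides' figures, which are not part of the extracted text; the following is my own computation on the reconstructed dataset. Splitting on A keeps all 8 examples in one branch, so predicting the majority label (+) still makes 2 mistakes; splitting on B gives a pure B=1 branch and a B=0 branch where the majority vote makes 2 of 4 mistakes, so the weighted error is again 2/8. Error rate therefore cannot distinguish the two splits, and neither improves on the unsplit error of 2/8.

```python
# Hand check (my own arithmetic, not from the slides):
# unsplit:    majority "+" -> error 2/8
# split on A: one branch with all 8 rows, majority "+" -> error 2/8
# split on B: B=1 branch pure (0/4 wrong), B=0 branch majority "+" (2/4 wrong)
assert 4/8 * 0/4 + 4/8 * 2/4 == 2/8   # error after splitting on B is also 2/8
```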

  15. Gini Impurity (chalkboard)
      – Expected misclassification rate: predicting a weighted coin with another weighted coin; predicting a weighted dice roll with another weighted dice roll
      – Gini impurity
      – Gini impurity of a Bernoulli random variable
      – Gini gain as a splitting criterion
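The chalkboard derivation itself is not in the extracted text; the standard definitions it refers to are G(Y) = 1 - sum_y P(Y=y)^2 and GiniGain(Y | X) = G(Y) - sum_v P(X=v) G(Y | X=v), which match the worked numbers on slide 18 below. A minimal sketch in code (function names are my own):

```python
from collections import Counter

# Standard Gini definitions (my own sketch; names are not from the course):
#   G(Y)            = 1 - sum_y P(Y=y)^2
#   GiniGain(Y | X) = G(Y) - sum_v P(X=v) * G(Y | X=v)
def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_gain(rows, attr, label_key="Y"):
    labels = [r[label_key] for r in rows]
    gain = gini(labels)
    for v in set(r[attr] for r in rows):
        subset = [r[label_key] for r in rows if r[attr] == v]
        gain -= len(subset) / len(rows) * gini(subset)
    return gain
```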

  16. Decision Tree Learning. In-Class Exercise (same dataset as above: output Y, attributes A and B). Which attribute would Gini gain select for the next split? 1. A   2. B   3. A or B (tie)   4. Neither

  17. Decision Tree Learning. Example dataset: output Y, attributes A and B (the same 8 examples as above).

  18. Decision Tree Learning. Example dataset: output Y, attributes A and B (the same 8 examples as above).
      1)  G(Y) = 1 - (6/8)^2 - (2/8)^2 = 0.375
      2)  P(A=1) = 8/8 = 1
      3)  P(A=0) = 0/8 = 0
      4)  G(Y | A=1) = G(Y)
      5)  G(Y | A=0) = undefined
      6)  GiniGain(Y | A) = 0.375 - 0*(undef) - 1*(0.375) = 0
      7)  P(B=1) = 4/8 = 0.5
      8)  P(B=0) = 4/8 = 0.5
      9)  G(Y | B=1) = 1 - (4/4)^2 - (0/4)^2 = 0
      10) G(Y | B=0) = 1 - (2/4)^2 - (2/4)^2 = 0.5
      11) GiniGain(Y | B) = 0.375 - 0.5*(0) - 0.5*(0.5) = 0.125
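A quick numeric check of the steps above (my own verification, not part of the slides):

```python
# Verify the Gini-gain arithmetic from steps 1)-11) above.
G_Y  = 1 - (6/8)**2 - (2/8)**2     # 0.375
G_B1 = 1 - (4/4)**2 - (0/4)**2     # 0.0  (pure branch)
G_B0 = 1 - (2/4)**2 - (2/4)**2     # 0.5
assert G_Y == 0.375
assert G_Y - 1.0 * G_Y == 0.0                    # GiniGain(Y | A)
assert G_Y - 0.5 * G_B1 - 0.5 * G_B0 == 0.125    # GiniGain(Y | B)
```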

  19. Mutual Information
      • For a decision tree, we can use the mutual information of the output class Y and some attribute X on which to split as a splitting criterion.
      • Given a dataset D of training examples, we can estimate the required probabilities as…

  20. Mutual Information
      • Entropy measures the expected # of bits to code one random draw from X.
      • For a decision tree, we want to reduce the entropy of the random variable we are trying to predict!
      • Conditional entropy is the expected value of the specific conditional entropy: H(Y | X) = E_{P(X=x)}[ H(Y | X = x) ].
      • For a decision tree, we can use the mutual information of the output class Y and some attribute X on which to split as a splitting criterion.
      • Given a dataset D of training examples, we can estimate the required probabilities as…
      • Informally, we say that mutual information is a measure of the following: if we know X, how much does this reduce our uncertainty about Y?
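The slide's probability-estimation formulas live in a figure that did not survive extraction; presumably they are the usual empirical (count-based) frequencies. A minimal sketch under that assumption (my own code, not the course's):

```python
import math
from collections import Counter

# Minimal sketch (my own illustration) of estimating entropy, conditional
# entropy, and mutual information from empirical counts in a training set.
def entropy(labels):
    """H(Y) in bits, with probabilities estimated as empirical frequencies."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(xs, ys):
    """H(Y | X) = sum_x P(X=x) * H(Y | X=x)."""
    n = len(xs)
    h = 0.0
    for x in set(xs):
        y_given_x = [y for xi, y in zip(xs, ys) if xi == x]
        h += len(y_given_x) / n * entropy(y_given_x)
    return h

def mutual_information(xs, ys):
    """I(Y; X) = H(Y) - H(Y | X)."""
    return entropy(ys) - conditional_entropy(xs, ys)
```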

  21. Decision Tree Learning. In-Class Exercise (same dataset as above: output Y, attributes A and B). Which attribute would mutual information select for the next split? 1. A   2. B   3. A or B (tie)   4. Neither

  22. Decision Tree Learning. Example dataset: output Y, attributes A and B (the same 8 examples as above).

  23. Decision Tree Learning. Example dataset: output Y, attributes A and B (the same 8 examples as above).
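As with the error-rate exercise, the worked answer appears only in the slides' figures; the following is my own computation on the reconstructed dataset. It shows that mutual information, like Gini gain, prefers attribute B, whereas error rate could not distinguish A from B.

```python
import math

# My own worked check (not from the extracted slides):
# H(Y) = -(6/8)log2(6/8) - (2/8)log2(2/8) ~ 0.811 bits
# Splitting on A keeps all 8 rows in one branch, so H(Y|A) = H(Y) and I(Y;A) = 0.
# Splitting on B: H(Y|B=1) = 0 (pure), H(Y|B=0) = 1 bit (2 vs 2),
# so H(Y|B) = 0.5*0 + 0.5*1 = 0.5 and I(Y;B) ~ 0.811 - 0.5 ~ 0.311 bits.
H_Y = -(6/8) * math.log2(6/8) - (2/8) * math.log2(2/8)
print(round(H_Y, 3), round(H_Y - 0.5, 3))   # 0.811 0.311
```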

  24. Tennis Example (Test your understanding). Dataset columns: Day, Outlook, Temperature, Humidity, Wind, PlayTennis? (Figure from Tom Mitchell)

  25. Tennis Example (Test your understanding). Which attribute yields the best classifier? Entropy values from the figure: H = 0.940, H = 0.940, H = 0.985, H = 0.592, H = 0.811, H = 1.0. (Figure from Tom Mitchell)

  26. Tennis Example (Test your understanding). Which attribute yields the best classifier? Same figure and entropy values as the previous slide. (Figure from Tom Mitchell)

  27. Tennis Example (Test your understanding). Which attribute yields the best classifier? Same figure and entropy values as the previous slide. (Figure from Tom Mitchell)
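The entropy values match the standard two-attribute comparison in Mitchell's figure, which contrasts splitting on Humidity against splitting on Wind; assuming that is the figure being shown (an assumption on my part, since only the numbers survived extraction), the information gains work out as follows.

```python
# Assuming Mitchell's standard Humidity-vs-Wind comparison (9 yes / 5 no at the
# root; Humidity splits 7/7, Wind splits 8/6), an assumption since only the
# entropy values survived extraction, the information gains are:
H_S = 0.940                                          # entropy at the root
gain_humidity = H_S - (7/14)*0.985 - (7/14)*0.592    # approx. 0.151
gain_wind     = H_S - (8/14)*0.811 - (6/14)*1.0      # approx. 0.048
print(gain_humidity, gain_wind)   # Humidity gives the larger gain
```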

  28. Tennis Example (Test your understanding). (Figure from Tom Mitchell)

  29. EMPIRICAL COMPARISON OF SPLITTING CRITERIA
