Chapter 6: Classification



  1. Chapter 6: Classification. Jilles Vreeken, IRDM '15/16, 17 Nov 2015

  2. IRDM Chapter 6, overview
     1. The basic idea
     2. Instance-based classification
     3. Decision trees
     4. Probabilistic classification
     You'll find this covered in Aggarwal Ch. 10 and Zaki & Meira, Ch. 18, 19, (22).

  3. Chapter 6.1: The Basic Idea (Aggarwal Ch. 10.1-10.2)

  4. Definitions
     Data for classification comes in tuples $(\mathbf{x}, y)$:
      - vector $\mathbf{x}$ is the attribute (feature) set; attributes can be binary, categorical, or numerical
      - value $y$ is the class label; we concentrate on binary or nominal class labels
      - compare classification with regression!
     A classifier is a function that maps attribute sets to class labels, $f(\mathbf{x}) = y$.

     TID | Home Owner | Marital Status | Annual Income | Defaulted Borrower
     ----|------------|----------------|---------------|-------------------
       1 | Yes        | Single         | 125K          | No
       2 | No         | Married        | 100K          | No
       3 | No         | Single         | 70K           | No
       4 | Yes        | Married        | 120K          | No
       5 | No         | Divorced       | 95K           | Yes
       6 | No         | Married        | 60K           | No
       7 | Yes        | Divorced       | 220K          | No
       8 | No         | Single         | 85K           | Yes
       9 | No         | Married        | 75K           | No
      10 | No         | Single         | 90K           | Yes
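To make the definition concrete, here is a tiny sketch of a classifier as a plain function from an attribute set to a class label, over the borrower table's attributes. This is not from the slides; the decision rule is invented for illustration only.

```python
# A classifier is just a function f(x) = y from attribute sets to class labels.
# The rule below is made up for illustration; it ignores income entirely.

def classify(home_owner: str, marital_status: str, annual_income: int) -> str:
    """Map one attribute set to a 'Defaulted Borrower' label (Yes/No)."""
    if home_owner == "Yes":          # home owners in the table never default
        return "No"
    if marital_status == "Married":  # married non-owners in the table never default
        return "No"
    return "Yes"                     # the remaining records mostly default

# Record 5 from the table: (No, Divorced, 95K) -> Yes
print(classify("No", "Divorced", 95_000))
```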

  5. Classification function as a black box: attribute set $\mathbf{x}$ goes in, the classification function returns class label $y$.

  6. Descriptive vs. Predictive
     In descriptive data mining the goal is to give a description of the data:
      - those who have bought diapers have also bought beer
      - these are the clusters of documents from this corpus
     In predictive data mining the goal is to predict the future:
      - those who will buy diapers will also buy beer
      - if new documents arrive, they will be similar to one of the cluster centroids
     The difference between predictive data mining and machine learning is hard to define.

  7. Descriptive vs. Predictive, ctd. (same slide, with one addition:) In data mining we care more about insightfulness than about prediction performance.

  8. Descriptive vs. Predictive
     Who are the borrowers that will default? (descriptive)
     If a new borrower comes, will they default? (predictive)
     Predictive classification is the usual application, and what we concentrate on.
     (Borrower table as on slide 4.)

  9. General classification framework

  10. Classification model evaluation
      Recall contingency tables: a confusion matrix is simply a contingency table between actual and predicted class labels.

                       | Predicted Class=1 | Predicted Class=0
      Actual Class=1   | $t_{11}$          | $t_{10}$
      Actual Class=0   | $t_{01}$          | $t_{00}$

      Many measures are available; we focus on accuracy and error rate:

      $\text{accuracy} = \frac{t_{11} + t_{00}}{t_{11} + t_{10} + t_{01} + t_{00}}$, $\quad\text{error rate} = \frac{t_{10} + t_{01}}{t_{11} + t_{10} + t_{01} + t_{00}}$

      The error rate estimates
      $P(f(\mathbf{x}) \neq y) = P(f(\mathbf{x}) = 1, y = -1) + P(f(\mathbf{x}) = -1, y = 1) = P(f(\mathbf{x}) = 1 \mid y = -1)\,P(y = -1) + P(f(\mathbf{x}) = -1 \mid y = 1)\,P(y = 1)$

      There are also precision, recall, F-scores, etc. (Here the $t_{ij}$ notation makes clear that we consider absolute counts; in the wild $t_{ij}$ can mean either absolute or relative numbers, so pay close attention.)
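As a quick illustration (a sketch, not from the slides), both measures follow directly from the four confusion-matrix counts:

```python
def accuracy_and_error(t11: int, t10: int, t01: int, t00: int) -> tuple[float, float]:
    """Compute accuracy and error rate from absolute confusion-matrix counts.

    t11: actual 1, predicted 1;  t10: actual 1, predicted 0;
    t01: actual 0, predicted 1;  t00: actual 0, predicted 0.
    """
    total = t11 + t10 + t01 + t00
    accuracy = (t11 + t00) / total
    error_rate = (t10 + t01) / total  # equivalently, 1 - accuracy
    return accuracy, error_rate

# e.g. 40 + 45 correct out of 100 -> accuracy 0.85, error rate 0.15
print(accuracy_and_error(40, 5, 10, 45))
```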

  11. Supervised vs. unsupervised learning
      In supervised learning:
       - training data is accompanied by class labels
       - new data is classified based on the training set
       - classification
      In unsupervised learning:
       - the class labels are unknown
       - the aim is to establish the existence of classes in the data, based on measurements, observations, etc.
       - clustering

  12. Chapter 6.2: Instance-based classification (Aggarwal Ch. 10.8)

  13. Classification per instance
      Let us first consider the simplest effective classifier: “similar instances have similar labels”. The key idea is to find instances in the training data that are similar to the test instance.

  14. $k$-Nearest Neighbours
      The most basic classifier is $k$-nearest neighbours. Given a database $D$ of labeled instances, a distance function $d$, and a parameter $k$: for a test instance $\mathbf{x}$, find the $k$ instances from $D$ most similar to $\mathbf{x}$, and assign it the majority label over this top-$k$.
      We can make it more locally sensitive by weighing votes by distance $\delta$: $w(\delta) = e^{-\delta^2 / t^2}$.
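A minimal sketch of this classifier, assuming numeric feature vectors and Euclidean distance (any distance function $d$ can be substituted); the optional vote weighting follows the $w(\delta) = e^{-\delta^2/t^2}$ scheme above:

```python
import math
from collections import Counter

def knn_classify(D, x, k, t=None):
    """k-NN: find the k instances in D closest to x, return the majority label.

    D is a list of (vector, label) pairs; the distance is Euclidean here,
    but any distance function d could be substituted.
    If t is given, votes are weighted by w(delta) = exp(-delta^2 / t^2).
    """
    dist = lambda a, b: math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    top_k = sorted(D, key=lambda pair: dist(pair[0], x))[:k]
    votes = Counter()
    for v, label in top_k:
        delta = dist(v, x)
        votes[label] += math.exp(-delta**2 / t**2) if t else 1.0
    return votes.most_common(1)[0][0]

# toy data: two 2-d clusters
D = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((5, 6), "b")]
print(knn_classify(D, (1, 1), k=3))  # -> "a"
```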

  15. $k$-Nearest Neighbours, ctd.
      $k$NN classifiers work surprisingly well in practice, iff we have ample training data and the distance function is chosen wisely.
      How to choose $k$?
       - odd, to avoid ties
       - not too small, or it will not be robust against noise
       - not too large, or it will lose local sensitivity
      A simple selection procedure is sketched below.
      Computational complexity: training is instant, $O(1)$; testing is slow, $O(n)$ per test instance.
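One common way to settle the "not too small, not too large" trade-off is to pick the odd $k$ that scores best on held-out data. This is an assumption on my part, not a procedure the slides prescribe; choose_k and its candidate list are hypothetical, and the sketch reuses knn_classify from above:

```python
def choose_k(train, validation, candidates=(1, 3, 5, 7, 9)):
    """Pick the odd k with the best validation accuracy.

    train is a list of (vector, label) pairs used as the kNN database;
    validation is a held-out list of (vector, label) pairs.
    """
    def acc(k):
        hits = sum(knn_classify(train, x, k) == y for x, y in validation)
        return hits / len(validation)
    return max(candidates, key=acc)
```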

  16. Chapter 6.3: Decision Trees (Aggarwal Ch. 10.3-10.4)

  17. Basic idea
      We define the label by asking a series of questions about the attributes:
       - each question depends on the answer to the previous one
       - ultimately, all samples with satisfying attribute values have the same label and we're done
      The flow chart of the questions can be drawn as a tree. We can classify new instances by following the proper edges of the tree until we meet a leaf; decision tree leaves are always class labels.

  18. Example: training data

      age   | income | student | credit_rating | buys PS4
      ------|--------|---------|---------------|---------
      ≤30   | high   | no      | fair          | no
      ≤30   | high   | no      | excellent     | no
      31…40 | high   | no      | fair          | yes
      >40   | medium | no      | fair          | yes
      >40   | low    | yes     | fair          | yes
      >40   | low    | yes     | excellent     | no
      31…40 | low    | yes     | excellent     | yes
      ≤30   | medium | no      | fair          | no
      ≤30   | low    | yes     | fair          | yes
      >40   | medium | yes     | fair          | yes
      ≤30   | medium | yes     | excellent     | yes
      31…40 | medium | no      | excellent     | yes
      31…40 | high   | yes     | fair          | yes
      >40   | medium | no      | excellent     | no

  19. Example: decision tree

      age?
      ├─ ≤30: student?
      │   ├─ no: no
      │   └─ yes: yes
      ├─ 31…40: yes
      └─ >40: credit rating?
          ├─ fair: yes
          └─ excellent: no
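Reading the tree top-down is the same as evaluating nested conditionals; here is slide 19's tree written as plain code (attribute values transliterated to ASCII):

```python
def buys_ps4(age: str, student: str, credit_rating: str) -> str:
    """Slide 19's decision tree as nested if/else; each leaf is a class label."""
    if age == "<=30":                 # left branch: test 'student?'
        return "yes" if student == "yes" else "no"
    elif age == "31...40":            # middle branch: always 'yes'
        return "yes"
    else:                             # age > 40: test 'credit rating?'
        return "yes" if credit_rating == "fair" else "no"

print(buys_ps4("<=30", "yes", "fair"))  # -> "yes"
```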

  20. Hunt’s algorithm
      The number of decision trees for a given set of attributes is exponential, and finding the most accurate tree is NP-hard. Practical algorithms therefore use greedy heuristics: the decision tree is grown by making a series of locally optimal decisions on which attributes to use and how to split on them. Most algorithms are based on Hunt’s algorithm.

  21. Hunt’s algorithm
      Let $X_t$ be the set of training records for node $t$, and let $y = \{y_1, \ldots, y_c\}$ be the class labels.
      1. If $X_t$ contains records that belong to more than one class:
         a. select an attribute test condition to partition the records into smaller subsets
         b. create a child node for each outcome of the test condition
         c. apply the algorithm recursively to each child
      2. Else, if all records in $X_t$ belong to the same class $y_j$, then $t$ is a leaf node with label $y_j$.
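A minimal recursive sketch of Hunt's algorithm under simplifying assumptions: categorical attributes, one child per attribute value, and a deliberately naive test-condition choice (the first attribute that actually splits the records), where real implementations use impurity measures:

```python
from collections import Counter

def hunt(records, attributes):
    """Grow a decision tree on (dict-of-attributes, label) records.

    Returns either a label (leaf) or a pair (attribute, {value: subtree}).
    Attribute selection is naive on purpose: the first attribute that
    takes more than one value among the records.
    """
    labels = [y for _, y in records]
    if len(set(labels)) == 1:             # all records share one class: leaf
        return labels[0]
    for a in attributes:                  # pick an attribute test condition
        values = {x[a] for x, _ in records}
        if len(values) > 1:
            children = {}
            for v in values:              # one child per outcome of the test
                subset = [(x, y) for x, y in records if x[a] == v]
                children[v] = hunt(subset, [b for b in attributes if b != a])
            return (a, children)
    return Counter(labels).most_common(1)[0][0]  # no split possible: majority label

# toy usage on two hand-made records (attribute names are hypothetical):
data = [({"owner": "Yes", "status": "Single"}, "No"),
        ({"owner": "No", "status": "Divorced"}, "Yes")]
print(hunt(data, ["owner", "status"]))  # -> ('owner', {...}), one leaf per value
```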

  22. Example: decision tree, growing the root
      Start with all records in a single node. It has multiple labels; the best single label is ‘no’, so as a leaf it would predict Defaulted = No.
      (Borrower table as on slide 4.)

  23. Example: decision tree, first split
      Split on Home Owner: the ‘yes’ branch contains only one label (Defaulted = No), so it becomes a leaf; the ‘no’ branch still has multiple labels and must be partitioned further.
      (Borrower table as on slide 4.)


