Data Warehousing and Machine Learning: Probabilistic Classifiers


  1. Data Warehousing and Machine Learning: Probabilistic Classifiers
Thomas D. Nielsen, Aalborg University, Department of Computer Science, Spring 2008

  2. Probabilistic Classifiers: Conditional class probabilities

Training data:

Id  Savings  Assets  Income  Credit risk
1   Medium   High     75     Good
2   Low      Low      50     Bad
3   High     Medium   25     Bad
4   Medium   High     75     Good
5   Low      Medium  100     Good
6   High     High     25     Good
7   Medium   High     75     Bad
8   Medium   Medium   75     Good
...

  3. Probabilistic Classifiers: Conditional class probabilities (cont.)

Using the same training data: three rows (Id 1, 4 and 7) match Savings = Medium, Assets = High, Income = 75; two of them are Good and one is Bad, so

P(Risk = Good | Savings = Medium, Assets = High, Income = 75) = 2/3
P(Risk = Bad  | Savings = Medium, Assets = High, Income = 75) = 1/3
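The following is a minimal sketch (the row tuples mirror the table above; the variable names are otherwise assumptions) of how such conditional class probabilities are read off the data: select the matching rows and normalize the class counts.

```python
# Conditional class probabilities from empirical counts (sketch).
from collections import Counter

rows = [
    # (Savings, Assets, Income, Risk) - the eight instances shown above
    ("Medium", "High",   75,  "Good"),
    ("Low",    "Low",    50,  "Bad"),
    ("High",   "Medium", 25,  "Bad"),
    ("Medium", "High",   75,  "Good"),
    ("Low",    "Medium", 100, "Good"),
    ("High",   "High",   25,  "Good"),
    ("Medium", "High",   75,  "Bad"),
    ("Medium", "Medium", 75,  "Good"),
]

query = ("Medium", "High", 75)                        # Savings, Assets, Income
counts = Counter(risk for *attrs, risk in rows if tuple(attrs) == query)
total = sum(counts.values())
for risk, n in counts.items():
    print(f"P(Risk = {risk} | Savings=Medium, Assets=High, Income=75) = {n}/{total}")
# Good: 2/3, Bad: 1/3
```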

  4. Probabilistic Classifiers: Empirical Distribution

The training data defines the empirical distribution, which can be represented in a table. Empirical distribution obtained from 1000 data instances:

Gender  Blood pressure  Weight  Smoker  Stroke  P
m       low             under   no      no      32/1000
m       low             under   no      yes      1/1000
m       low             under   yes     no      27/1000
...
f       normal          normal  no      yes      0/1000
...
f       high            over    yes     yes     54/1000

Such a table is not a suitable probabilistic model, because
• Size of representation
• It overfits the data
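To make the size-of-representation point concrete, here is a tiny sketch (the state counts are read off the example above; the exponential-growth line is a generic illustration, not from the slide): the full joint table needs one entry per value combination.

```python
# Number of entries in the full joint probability table (sketch).
state_counts = {"Gender": 2, "BloodPressure": 3, "Weight": 3, "Smoker": 2, "Stroke": 2}

n_entries = 1
for n in state_counts.values():
    n_entries *= n
print(n_entries)            # 72 entries already for this small example

# With n binary attributes plus a binary class the table has 2 ** (n + 1) entries,
# far more than 1000 instances can estimate reliably for even moderate n.
print(2 ** (20 + 1))        # 2097152 entries for 20 binary attributes
```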

  5. Probabilistic Classifiers: Model

View the data as being produced by a random process that is described by a joint probability distribution P on States(A1, ..., An, C), i.e. P assigns a probability P(a1, ..., an, c) ∈ [0, 1] to every tuple (a1, ..., an, c) of values for the attribute and class variables, such that

Σ_{(a1, ..., an, c) ∈ States(A1, ..., An, C)} P(a1, ..., an, c) = 1

(for discrete attributes; integration instead of summation for continuous attributes).

Conditional probability: the joint distribution P also defines the conditional probability distribution of C given A1, ..., An, i.e. the values

P(c | a1, ..., an) := P(a1, ..., an, c) / P(a1, ..., an) = P(a1, ..., an, c) / Σ_{c'} P(a1, ..., an, c')

which represent the probability that C = c given that it is known that A1 = a1, ..., An = an.
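A minimal sketch of the conditional-probability definition (the joint probabilities below are made-up toy numbers, not taken from the slides): store the joint as a table and normalize over the class values.

```python
# P(c | a1, ..., an) = P(a1, ..., an, c) / sum_c' P(a1, ..., an, c')   (sketch)
joint = {
    # (Savings, Assets, class) -> P; toy values that sum to 1
    ("Medium", "High", "Good"): 0.30,
    ("Medium", "High", "Bad"):  0.15,
    ("Low",    "Low",  "Good"): 0.05,
    ("Low",    "Low",  "Bad"):  0.20,
    ("High",   "High", "Good"): 0.25,
    ("High",   "High", "Bad"):  0.05,
}

def conditional(joint, attrs, c):
    """P(C = c | attributes = attrs), obtained by normalizing the joint over the class."""
    numerator = joint.get(attrs + (c,), 0.0)
    denominator = sum(p for key, p in joint.items() if key[:-1] == attrs)
    return numerator / denominator

print(conditional(joint, ("Medium", "High"), "Good"))   # 0.30 / 0.45 = 0.666...
```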

  6. Probabilistic Classifiers: Classification Rule

For a loss function L(c, c') an instance is classified according to

C(a1, ..., an) := arg min_{c' ∈ States(C)} Σ_{c ∈ States(C)} L(c, c') P(c | a1, ..., an)

Example loss matrices L(c, c') (rows: true class, columns: predicted class):

Cancer example:
True \ Predicted   Cancer  Normal
Cancer               1      1000
Normal               1         0

0/1 loss:
True \ Predicted    c    c'
c                   0    1
c'                  1    0

  7. Probabilistic Classifiers: Classification Rule (cont.)

For a loss function L(c, c') an instance is classified according to

C(a1, ..., an) := arg min_{c' ∈ States(C)} Σ_{c ∈ States(C)} L(c, c') P(c | a1, ..., an)

Under 0/1-loss we get

C(a1, ..., an) := arg max_{c ∈ States(C)} P(c | a1, ..., an)

In the binary case, e.g. States(C) = {notinfected, infected}, also with a variable threshold t:

C(a1, ..., an) = notinfected  :⇔  P(notinfected | a1, ..., an) ≥ t

(this can also be generalized for non-binary attributes).
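A minimal sketch of the classification rule (the posterior values are made up, and the cancer loss matrix follows the reconstruction above; everything else is an assumption): compute the expected loss of each possible prediction and pick the cheapest one.

```python
# Expected-loss classification (sketch): C(a) = arg min_c' sum_c L(c, c') * P(c | a)

def classify(posterior, loss):
    """posterior: dict class -> P(class | a1, ..., an); loss: dict (true, predicted) -> cost."""
    classes = list(posterior)
    def expected_loss(predicted):
        return sum(loss[(c, predicted)] * posterior[c] for c in classes)
    return min(classes, key=expected_loss)

posterior = {"Cancer": 0.01, "Normal": 0.99}          # toy posterior
loss = {("Cancer", "Cancer"): 1, ("Cancer", "Normal"): 1000,
        ("Normal", "Cancer"): 1, ("Normal", "Normal"): 0}
print(classify(posterior, loss))       # "Cancer": expected loss 1.0 vs 10.0 for "Normal"

# Under 0/1-loss the rule reduces to arg max_c P(c | a):
zero_one = {("Cancer", "Cancer"): 0, ("Cancer", "Normal"): 1,
            ("Normal", "Cancer"): 1, ("Normal", "Normal"): 0}
print(classify(posterior, zero_one))   # "Normal"
```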

  8. Naive Bayes: The Naive Bayes Model

Structural assumption:

P(a1, ..., an, c) = P(a1 | c) · P(a2 | c) · ... · P(an | c) · P(c)

Graphical representation as a Bayesian network: the class node C is the parent of each attribute node A1, ..., A7, and there are no edges between the attributes.

Interpretation: given the true class label, the different attributes take their values independently.
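A minimal sketch of the structural assumption (the prior and the conditional tables are made-up toy numbers, not estimated from the credit-risk data): the joint probability is the class prior times one conditional factor per attribute.

```python
# Naive Bayes factorization P(a1, ..., an, c) = P(c) * prod_i P(ai | c)   (sketch)
prior = {"Good": 0.6, "Bad": 0.4}                       # P(c)
cpt = {                                                 # P(ai | c), one table per attribute
    "Savings": {("Medium", "Good"): 0.5, ("Medium", "Bad"): 0.3,
                ("Low", "Good"): 0.2,    ("Low", "Bad"): 0.4,
                ("High", "Good"): 0.3,   ("High", "Bad"): 0.3},
    "Assets":  {("High", "Good"): 0.7,   ("High", "Bad"): 0.2,
                ("Medium", "Good"): 0.2, ("Medium", "Bad"): 0.4,
                ("Low", "Good"): 0.1,    ("Low", "Bad"): 0.4},
}

def joint(instance, c):
    """P(a1, ..., an, c) under the naive Bayes structural assumption."""
    p = prior[c]
    for attribute, value in instance.items():
        p *= cpt[attribute][(value, c)]
    return p

print(joint({"Savings": "Medium", "Assets": "High"}, "Good"))   # 0.6 * 0.5 * 0.7 = 0.21
```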

  9. Naive Bayes: The naive Bayes assumption I

[Figure: a symbol drawn on a grid of cells numbered 1-9; each cell value is an attribute, the symbol is the class.]

For example:

P(Cell-2 = b | Cell-5 = b, Symbol = 1) > P(Cell-2 = b | Symbol = 1)

The attributes are not independent given Symbol = 1!

  10. Naive Bayes: The naive Bayes assumption II

For the spam example, e.g.:

P(Body'nigeria' = y | Body'confidential' = y, Spam = y) ≫ P(Body'nigeria' = y | Spam = y)

The attributes are not independent given Spam = yes!

⇒ The naive Bayes assumption is often not realistic. Nevertheless, naive Bayes is often successful.

  11. Naive Bayes: Learning a Naive Bayes Classifier

• Determine the parameters P(ai | c) (ai ∈ States(Ai), c ∈ States(C)) from empirical counts in the data.
• Missing values are easily handled: instances for which Ai is missing are ignored when estimating P(ai | c).
• Discrete and continuous attributes can be mixed.
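A minimal sketch of this learning step (the data layout and variable names are assumptions; None marks a missing value as described in the second bullet): parameters are relative frequencies, and an instance with a missing attribute is skipped only for that attribute.

```python
# Learning naive Bayes parameters from empirical counts (sketch).
from collections import Counter, defaultdict

data = [
    # (Savings, Assets, class); None marks a missing value
    ("Medium", "High", "Good"),
    ("Low",    "Low",  "Bad"),
    ("High",   None,   "Bad"),
    ("Medium", "High", "Good"),
]
attributes = ["Savings", "Assets"]

prior = {c: n / len(data) for c, n in Counter(row[-1] for row in data).items()}   # P(c)

cpt = {a: defaultdict(float) for a in attributes}                                 # P(ai | c)
for a_idx, a in enumerate(attributes):
    value_counts, class_totals = Counter(), Counter()
    for row in data:
        value, c = row[a_idx], row[-1]
        if value is None:
            continue                 # missing value: skip this instance for this attribute only
        value_counts[(value, c)] += 1
        class_totals[c] += 1
    for (value, c), n in value_counts.items():
        cpt[a][(value, c)] = n / class_totals[c]

print(prior)                              # {'Good': 0.5, 'Bad': 0.5}
print(cpt["Assets"][("High", "Good")])    # 1.0 (Assets is High in both observed Good rows)
```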

  12. Naive Bayes: The paradoxical success of Naive Bayes

One explanation for the surprisingly good performance of Naive Bayes in many domains: classification does not require the exact distribution, only the right decision boundaries [Domingos, Pazzani 97].

[Figure: the real conditional probability P(C = ⊕ | a1, ..., an) plotted over States(A1, ..., An), with the 0.5 threshold marked.]

  13. Naive Bayes: The paradoxical success of Naive Bayes (cont.)

[Figure: the same plot, now also showing the Naive Bayes estimate of P(C = ⊕ | a1, ..., an). The estimate deviates from the real distribution, but it crosses the 0.5 threshold in the same places, so the decision boundaries are preserved.]

  14. Naive Bayes: When Naive Bayes must fail

No Naive Bayes classifier can produce the following (XOR-style) classification:

A    B    Class
yes  yes   ⊕
yes  no    ⊖
no   yes   ⊖
no   no    ⊕

because, assuming it did, the four instances would give:

1.  P(A=y | ⊕) P(B=y | ⊕) P(⊕)  >  P(A=y | ⊖) P(B=y | ⊖) P(⊖)
2.  P(A=y | ⊖) P(B=n | ⊖) P(⊖)  >  P(A=y | ⊕) P(B=n | ⊕) P(⊕)
3.  P(A=n | ⊖) P(B=y | ⊖) P(⊖)  >  P(A=n | ⊕) P(B=y | ⊕) P(⊕)
4.  P(A=n | ⊕) P(B=n | ⊕) P(⊕)  >  P(A=n | ⊖) P(B=n | ⊖) P(⊖)

  15. Naive Bayes: When Naive Bayes must fail (cont.)

1.  P(A=y | ⊕) P(B=y | ⊕) P(⊕)  >  P(A=y | ⊖) P(B=y | ⊖) P(⊖)
2.  P(A=y | ⊖) P(B=n | ⊖) P(⊖)  >  P(A=y | ⊕) P(B=n | ⊕) P(⊕)
3.  P(A=n | ⊖) P(B=y | ⊖) P(⊖)  >  P(A=n | ⊕) P(B=y | ⊕) P(⊕)
4.  P(A=n | ⊕) P(B=n | ⊕) P(⊕)  >  P(A=n | ⊖) P(B=n | ⊖) P(⊖)

Multiplying the four left sides and the four right sides of these inequalities:

Π_{i=1..4} (left side of i.)  >  Π_{i=1..4} (right side of i.)

But this is false, because both products are actually equal.
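To see why the two products coincide, one can spell out the factors; the following is a short worked restatement of the argument (the grouping is ours, not verbatim from the slide):

```latex
% Every factor occurs exactly once in each product, so the strict inequality
% obtained by multiplying inequalities 1.-4. cannot hold:
\prod_{i=1}^{4}(\text{left side of } i)
  = \prod_{i=1}^{4}(\text{right side of } i)
  = P(\oplus)^2 \, P(\ominus)^2
    \prod_{v \in \{y,n\}} P(A{=}v \mid \oplus)\, P(A{=}v \mid \ominus)\,
                          P(B{=}v \mid \oplus)\, P(B{=}v \mid \ominus)
```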

  16. Naive Bayes: Tree Augmented Naive Bayes

Model: all Bayesian network structures where
- The class node is a parent of each attribute node
- The substructure on the attribute nodes is a tree

[Figure: an example TAN structure over the class node C and attribute nodes A1, ..., A7.]

Learning a TAN classifier: learning the tree structure and the parameters. The optimal tree structure can be found efficiently (Chow, Liu 1968; Friedman et al. 1997).

  17. Naive Bayes: TAN classifier for the XOR data

A    B    Class
yes  yes   ⊕
yes  no    ⊖
no   yes   ⊖
no   no    ⊕

TAN structure: C → A, C → B, A → B, with parameters

P(C):
⊕    ⊖
0.5  0.5

P(A | C):
C \ A   yes  no
⊕       0.5  0.5
⊖       0.5  0.5

P(B | C, A):
C   A     B=yes  B=no
⊕   yes    1.0   0.0
⊕   no     0.0   1.0
⊖   yes    0.0   1.0
⊖   no     1.0   0.0
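A minimal sketch verifying that these parameters reproduce the classification that plain Naive Bayes cannot (the dictionaries simply transcribe the tables above, with "+" standing for ⊕ and "-" for ⊖):

```python
# TAN posterior score P(c) * P(a | c) * P(b | c, a), maximized over the class (sketch).
p_c = {"+": 0.5, "-": 0.5}                                        # P(C)
p_a_given_c = {("yes", "+"): 0.5, ("no", "+"): 0.5,
               ("yes", "-"): 0.5, ("no", "-"): 0.5}               # P(A | C)
p_b_given_ca = {("yes", "+", "yes"): 1.0, ("no", "+", "yes"): 0.0,
                ("yes", "+", "no"):  0.0, ("no", "+", "no"):  1.0,
                ("yes", "-", "yes"): 0.0, ("no", "-", "yes"): 1.0,
                ("yes", "-", "no"):  1.0, ("no", "-", "no"):  0.0}  # P(B | C, A), keyed (b, c, a)

def classify(a, b):
    scores = {c: p_c[c] * p_a_given_c[(a, c)] * p_b_given_ca[(b, c, a)] for c in p_c}
    return max(scores, key=scores.get)

for a in ("yes", "no"):
    for b in ("yes", "no"):
        print(a, b, "->", classify(a, b))
# yes yes -> +, yes no -> -, no yes -> -, no no -> +   (exactly the table above)
```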

  18. Tree Augmented Naive Bayes: Learning a TAN Classifier (a rough overview)

• Learn a (class-conditional) maximum-likelihood tree structure over the attributes.
• Insert the class variable as a parent of all the attributes.

  19. Tree Augmented Naive Bayes: Learning a TAN Classifier (a rough overview, cont.)

• Learn a (class-conditional) maximum-likelihood tree structure over the attributes.
• Insert the class variable as a parent of all the attributes.

Learning a Chow-Liu tree: a Chow-Liu tree of maximal likelihood can be constructed as follows.

1. Calculate MI(Ai, Aj) for each pair (Ai, Aj).
2. Build a maximum-weight spanning tree over the attributes, using the MI values as edge weights.
3. Direct the resulting tree.
4. Learn the parameters.

Here the mutual information is computed from the empirical distribution P#:

MI(Ai, Aj) = Σ_{ai, aj} P#(ai, aj) log2 ( P#(ai, aj) / (P#(ai) P#(aj)) )
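A minimal sketch of steps 1 and 2 (the data layout, the toy data set and the Kruskal-style spanning-tree construction are assumptions; directing the tree and learning the parameters are left out):

```python
# Chow-Liu skeleton: pairwise mutual information, then a maximum-weight spanning tree (sketch).
from collections import Counter
from itertools import combinations
from math import log2

def mutual_information(data, i, j):
    """MI(Ai, Aj) computed from the empirical distribution of columns i and j."""
    n = len(data)
    p_ij = Counter((row[i], row[j]) for row in data)
    p_i = Counter(row[i] for row in data)
    p_j = Counter(row[j] for row in data)
    return sum((nij / n) * log2((nij / n) / ((p_i[a] / n) * (p_j[b] / n)))
               for (a, b), nij in p_ij.items())

def chow_liu_skeleton(data, n_attrs):
    """Edges of a maximum-weight spanning tree over the attributes, weighted by MI."""
    parent = list(range(n_attrs))                # union-find for cycle detection
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    candidate_edges = sorted(combinations(range(n_attrs), 2),
                             key=lambda e: mutual_information(data, *e), reverse=True)
    tree = []
    for i, j in candidate_edges:
        ri, rj = find(i), find(j)
        if ri != rj:                             # adding this edge keeps the structure a tree
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Toy data: attribute 1 copies attribute 0, attribute 2 is unrelated.
data = [(0, 0, 1), (1, 1, 0), (0, 0, 0), (1, 1, 1)]
print(chow_liu_skeleton(data, 3))   # [(0, 1), (0, 2)]: the highest-MI edge (0, 1) is chosen first
```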
