Data Warehousing and Machine Learning: Probabilistic Classifiers


slide-1
SLIDE 1

Data Warehousing and Machine Learning

Probabilistic Classifiers Thomas D. Nielsen

Aalborg University Department of Computer Science

Spring 2008

DWML Spring 2008 1 / 34

slide-2
SLIDE 2

Probabilistic Classifiers

Conditional class probabilities

Id.  Savings  Assets  Income  Credit risk
 1   Medium   High      75    Good
 2   Low      Low       50    Bad
 3   High     Medium    25    Bad
 4   Medium   High      75    Good
 5   Low      Medium   100    Good
 6   High     High      25    Good
 7   Medium   High      75    Bad
 8   Medium   Medium    75    Good
 ...

Probabilistic Classifiers DWML Spring 2008 2 / 34

slide-3
SLIDE 3

Probabilistic Classifiers

Conditional class probabilities (same table as on the previous slide). The three rows matching the instance (Savings = Medium, Assets = High, Income = 75) are rows 1, 4 and 7, with class labels Good, Good and Bad, so

P(Risk = Good | Savings = Medium, Assets = High, Income = 75) = 2/3
P(Risk = Bad  | Savings = Medium, Assets = High, Income = 75) = 1/3

Probabilistic Classifiers DWML Spring 2008 2 / 34
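The same counting can be done mechanically. Below is a small illustrative sketch (not part of the original slides) that builds the credit-risk table with pandas and reads off the conditional class probabilities; the column names are taken from the slide.

```python
# Illustrative sketch (not from the slides): estimate the conditional class
# probabilities on the credit-risk table above by counting matching rows.
import pandas as pd

data = pd.DataFrame(
    [("Medium", "High", 75, "Good"), ("Low", "Low", 50, "Bad"),
     ("High", "Medium", 25, "Bad"), ("Medium", "High", 75, "Good"),
     ("Low", "Medium", 100, "Good"), ("High", "High", 25, "Good"),
     ("Medium", "High", 75, "Bad"), ("Medium", "Medium", 75, "Good")],
    columns=["Savings", "Assets", "Income", "Risk"],
)

# Rows matching the evidence Savings = Medium, Assets = High, Income = 75 ...
match = data[(data.Savings == "Medium") & (data.Assets == "High") & (data.Income == 75)]
# ... and the relative frequency of each class label among them.
print(match.Risk.value_counts(normalize=True))   # Good 2/3, Bad 1/3
```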

slide-4
SLIDE 4

Probabilistic Classifiers

Empirical Distribution

The training data defines the empirical distribution, which can be represented in a table. Empirical distribution obtained from 1000 data instances:

Gender  Blood pressure  Weight  Smoker  Stroke  P
m       low             under   no      no      32/1000
m       low             under   no      yes      1/1000
m       low             under   yes     no      27/1000
...
f       normal          normal  no      yes      0/1000
...
f       high            over    yes     yes     54/1000

Such a table is not a suitable probabilistic model, because

  • the size of the representation grows exponentially in the number of attributes
  • it overfits the data

Probabilistic Classifiers DWML Spring 2008 3 / 34

slide-5
SLIDE 5

Probabilistic Classifiers

Model

View data as being produced by a random process that is described by a joint probability distribution P on States(A1, . . . , An, C), i.e. P assigns a probability P(a1, . . . , an, c) ∈ [0, 1] to every tuple (a1, . . . , an, c) of values for the attribute and class variables, such that

Σ_{(a1, . . . , an, c) ∈ States(A1, . . . , An, C)} P(a1, . . . , an, c) = 1

(for discrete attributes; integration instead of summation for continuous attributes).

Conditional Probability

The joint distribution P also defines the conditional probability distribution of C given A1, . . . , An, i.e. the values

P(c | a1, . . . , an) := P(a1, . . . , an, c) / P(a1, . . . , an) = P(a1, . . . , an, c) / Σ_{c′} P(a1, . . . , an, c′)

that represent the probability that C = c given that it is known that A1 = a1, . . . , An = an.

Probabilistic Classifiers DWML Spring 2008 4 / 34

slide-6
SLIDE 6

Probabilistic Classifiers

Classification Rule

For a loss function L(c, c′) an instance is classified according to

C(a1, . . . , an) := arg min_{c′ ∈ States(C)} Σ_{c ∈ States(C)} L(c, c′) P(c | a1, . . . , an)

Examples of loss matrices L(c, c′):

Cancer test (misclassifying a true Cancer case as Normal is far more expensive than the reverse):

              predicted Cancer   predicted Normal
true Cancer                            1000
true Normal          1

0/1 loss:

              predicted c   predicted c′
true c                           1
true c′            1

Probabilistic Classifiers DWML Spring 2008 5 / 34

slide-7
SLIDE 7

Probabilistic Classifiers

Classification Rule

For a loss function L(c, c′) an instance is classified according to

C(a1, . . . , an) := arg min_{c′ ∈ States(C)} Σ_{c ∈ States(C)} L(c, c′) P(c | a1, . . . , an)

Under 0/1 loss we get

C(a1, . . . , an) := arg max_{c ∈ States(C)} P(c | a1, . . . , an)

In the binary case, e.g. States(C) = {notinfected, infected}, one can also classify with a variable threshold t:

C(a1, . . . , an) = notinfected  :⇔  P(notinfected | a1, . . . , an) ≥ t

(this can also be generalized to the non-binary case).

Probabilistic Classifiers DWML Spring 2008 5 / 34
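As an illustration (not from the slides), the following sketch evaluates the rule for a hypothetical two-class problem; the loss values and the posterior are made up, chosen so that the cost-sensitive decision differs from the 0/1-loss decision.

```python
# Illustrative sketch (not from the slides): the loss-minimizing rule for a
# hypothetical binary problem. Loss values and the posterior are made up.
import numpy as np

states = ["notinfected", "infected"]
# loss[c, c'] = L(c, c'): cost of predicting c' when the true class is c
loss = np.array([[0.0, 1.0],      # true notinfected
                 [50.0, 0.0]])    # true infected: missing an infection is costly
posterior = np.array([0.9, 0.1])  # P(c | a1, ..., an)

expected = posterior @ loss                            # expected loss of each prediction c'
print(states[int(np.argmin(expected))])                # 'infected' under the asymmetric loss

zero_one = 1.0 - np.eye(2)                             # 0/1 loss
print(states[int(np.argmin(posterior @ zero_one))])    # 'notinfected' = arg max posterior
```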

slide-8
SLIDE 8

Naive Bayes

The Naive Bayes Model

Structural assumption:

P(a1, . . . , an, c) = P(a1 | c) · P(a2 | c) · · · P(an | c) · P(c)

Graphical representation as a Bayesian network: the class node C is the only parent of each attribute node A1, . . . , A7, and there are no edges between the attributes.

Interpretation: given the true class label, the different attributes take their values independently.

Probabilistic Classifiers DWML Spring 2008 6 / 34

slide-9
SLIDE 9

Naive Bayes

The naive Bayes assumption I

[Figure: a symbol drawn on a 3 × 3 grid of pixel cells Cell-1, . . . , Cell-9]

For example:

P(Cell-2 = b | Cell-5 = b, Symbol = 1) > P(Cell-2 = b | Symbol = 1)

The attributes are not independent given Symbol = 1!

Probabilistic Classifiers DWML Spring 2008 7 / 34

slide-10
SLIDE 10

Naive Bayes

The naive Bayes assumption II

For the spam example, e.g.:

P(Body'nigeria' = y | Body'confidential' = y, Spam = y) ≫ P(Body'nigeria' = y | Spam = y)

The attributes are not independent given Spam = yes! The naive Bayes assumption is therefore often not realistic. Nevertheless, Naive Bayes is often successful.

Probabilistic Classifiers DWML Spring 2008 8 / 34

slide-11
SLIDE 11

Naive Bayes

Learning a Naive Bayes Classifier

  • Determine the parameters P(ai | c) (ai ∈ States(Ai), c ∈ States(C)) from empirical counts in the data.

  • Missing values are easily handled: instances for which Ai is missing are ignored for P(ai | c).
  • Discrete and continuous attributes can be mixed.

Probabilistic Classifiers DWML Spring 2008 9 / 34
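A minimal sketch of this count-based learning and the resulting classifier is given below (illustrative only; the toy rows, attribute values and class labels are made up, and no smoothing of zero counts is applied).

```python
# Illustrative sketch (not from the slides) of count-based Naive Bayes learning.
# Toy attribute values and class labels are made up; no smoothing is applied.
from collections import Counter, defaultdict

def learn_nb(rows, labels):
    """Estimate P(c) and P(a_i | c) by relative frequencies."""
    prior = Counter(labels)
    cond = defaultdict(Counter)                  # cond[(i, c)][a_i] = count
    for row, c in zip(rows, labels):
        for i, a in enumerate(row):
            if a is not None:                    # instances with A_i missing are ignored
                cond[(i, c)][a] += 1
    return prior, cond

def predict_nb(prior, cond, row):
    """Return arg max_c P(c) * prod_i P(a_i | c)."""
    def score(c):
        s = prior[c] / sum(prior.values())
        for i, a in enumerate(row):
            counts = cond[(i, c)]
            total = sum(counts.values())
            s *= counts[a] / total if total else 0.0
        return s
    return max(prior, key=score)

rows = [("Medium", "High"), ("Low", "Low"), ("High", "Medium"), ("Medium", "High")]
labels = ["Good", "Bad", "Bad", "Good"]
print(predict_nb(*learn_nb(rows, labels), ("Medium", "High")))   # Good
```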

slide-12
SLIDE 12

Naive Bayes

The paradoxical success of Naive Bayes One explanation for the surprisingly good performance of Naive Bayes in many domains: do not require exact distribution for classification, only the right decision boundaries [Domingos, Pazzani 97]

[Figure: P(C = ⊕ | a1, . . . , an) for the real distribution, plotted over States(A1, . . . , An) with the decision threshold 0.5 marked]

Probabilistic Classifiers DWML Spring 2008 10 / 34

slide-13
SLIDE 13

Naive Bayes

The paradoxical success of Naive Bayes One explanation for the surprisingly good performance of Naive Bayes in many domains: do not require exact distribution for classification, only the right decision boundaries [Domingos, Pazzani 97]

[Figure: P(C = ⊕ | a1, . . . , an) for the real distribution and for Naive Bayes, plotted over States(A1, . . . , An) with the decision threshold 0.5 marked; both curves fall on the same side of 0.5 for every instance, so the decision boundaries agree]

Probabilistic Classifiers DWML Spring 2008 10 / 34

slide-14
SLIDE 14

Naive Bayes

When Naive Bayes must fail

No Naive Bayes classifier can produce the following classification:

A    B    Class
yes  yes   ⊕
yes  no    ⊖
no   yes   ⊖
no   no    ⊕

because assume it did; then:

1. P(A = y | ⊕) P(B = y | ⊕) P(⊕) > P(A = y | ⊖) P(B = y | ⊖) P(⊖)
2. P(A = y | ⊖) P(B = n | ⊖) P(⊖) > P(A = y | ⊕) P(B = n | ⊕) P(⊕)
3. P(A = n | ⊖) P(B = y | ⊖) P(⊖) > P(A = n | ⊕) P(B = y | ⊕) P(⊕)
4. P(A = n | ⊕) P(B = n | ⊕) P(⊕) > P(A = n | ⊖) P(B = n | ⊖) P(⊖)

Probabilistic Classifiers DWML Spring 2008 11 / 34

slide-15
SLIDE 15

Naive Bayes

When Naive Bayes must fail (cont.)

1. P(A = y | ⊕) P(B = y | ⊕) P(⊕) > P(A = y | ⊖) P(B = y | ⊖) P(⊖)
2. P(A = y | ⊖) P(B = n | ⊖) P(⊖) > P(A = y | ⊕) P(B = n | ⊕) P(⊕)
3. P(A = n | ⊖) P(B = y | ⊖) P(⊖) > P(A = n | ⊕) P(B = y | ⊕) P(⊕)
4. P(A = n | ⊕) P(B = n | ⊕) P(⊕) > P(A = n | ⊖) P(B = n | ⊖) P(⊖)

Multiplying the four left sides and the four right sides of these inequalities gives

Π_{i=1..4} (left side of i.) > Π_{i=1..4} (right side of i.)

But this is false, because both products are actually equal: every factor appears exactly once on each side.

Probabilistic Classifiers DWML Spring 2008 12 / 34
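This can also be checked numerically. The sketch below (not from the slides) draws random Naive Bayes parameters for the two attributes and the class and confirms that the two products always coincide, so the four strict inequalities can never hold simultaneously.

```python
# Illustrative numeric check (not from the slides): for any Naive Bayes
# parameters, the product of the four left-hand sides equals the product of
# the four right-hand sides, so the four strict inequalities cannot all hold.
import math
import random

def products(pA_pos, pA_neg, pB_pos, pB_neg, p_pos):
    """pX_pos = P(X = y | plus), pX_neg = P(X = y | minus), p_pos = P(plus)."""
    p_neg = 1 - p_pos
    left = ((pA_pos * pB_pos * p_pos) *                 # inequality 1, left side
            (pA_neg * (1 - pB_neg) * p_neg) *           # inequality 2, left side
            ((1 - pA_neg) * pB_neg * p_neg) *           # inequality 3, left side
            ((1 - pA_pos) * (1 - pB_pos) * p_pos))      # inequality 4, left side
    right = ((pA_neg * pB_neg * p_neg) *
             (pA_pos * (1 - pB_pos) * p_pos) *
             ((1 - pA_pos) * pB_pos * p_pos) *
             ((1 - pA_neg) * (1 - pB_neg) * p_neg))
    return left, right

for _ in range(5):
    l, r = products(*(random.random() for _ in range(5)))
    print(math.isclose(l, r))    # True every time: the two products coincide
```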

slide-16
SLIDE 16

Naive Bayes

Tree Augmented Naive Bayes

Model: all Bayesian network structures where
  • the class node is a parent of each attribute node
  • the substructure on the attribute nodes is a tree

[Figure: Bayesian network with class node C as parent of A1, . . . , A7, and a tree over the attribute nodes]

Learning a TAN classifier means learning the tree structure and the parameters. The optimal tree structure can be found efficiently (Chow & Liu 1968; Friedman et al. 1997).

Probabilistic Classifiers DWML Spring 2008 13 / 34

slide-17
SLIDE 17

Naive Bayes

A TAN classifier for

A    B    Class
yes  yes   ⊕
yes  no    ⊖
no   yes   ⊖
no   no    ⊕

Network structure: C is a parent of both A and B, and A is additionally a parent of B. Parameters:

P(C):
⊕     ⊖
0.5   0.5

P(A | C):
C    yes   no
⊕    0.5   0.5
⊖    0.5   0.5

P(B | C, A):
C    A     yes   no
⊕    yes   1.0   0.0
⊕    no    0.0   1.0
⊖    yes   0.0   1.0
⊖    no    1.0   0.0

Probabilistic Classifiers DWML Spring 2008 14 / 34

slide-18
SLIDE 18

Tree Augmented Naive Bayes

Learning a TAN Classifier: a rough overview

  • Learn a (class conditional) maximum likelihood tree structure of the attributes.
  • Insert the class variable as a parent of all the attributes.

Probabilistic Classifiers DWML Spring 2008 15 / 34

slide-19
SLIDE 19

Tree Augmented Naive Bayes

Learning a TAN Classifier: a rough overview

  • Learn a (class conditional) maximum likelihood tree structure of the attributes.
  • Insert the class variable as a parent of all the attributes.

Learning a Chow-Liu tree

A Chow-Liu tree of maximal likelihood can be constructed as follows:

1. Calculate MI(Ai, Aj) for each pair (Ai, Aj).
2. Build a maximum-weight spanning tree over the attributes.
3. Direct the resulting tree.
4. Learn the parameters.

where the mutual information is computed from the empirical distribution P#:

MI(Ai, Aj) = Σ_{Ai, Aj} P#(Ai, Aj) log2 ( P#(Ai, Aj) / (P#(Ai) P#(Aj)) )

Probabilistic Classifiers DWML Spring 2008 15 / 34
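A compact sketch of steps 1 and 2 is shown below (illustrative only; the toy data and attribute names are made up, and networkx is assumed to be available for the maximum-weight spanning tree).

```python
# Illustrative sketch (not from the slides) of steps 1-2: empirical mutual
# information for each attribute pair, then a maximum-weight spanning tree.
import itertools
import math
from collections import Counter
import networkx as nx

names = ["Cold", "SoreThroat", "Fever"]
data = [("yes", "yes", "no"), ("yes", "yes", "yes"), ("no", "no", "no"),
        ("no", "yes", "no"), ("yes", "no", "yes"), ("no", "no", "no")]

def mutual_information(xs, ys):
    """MI(X, Y) under the empirical distribution P# of two columns."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

g = nx.Graph()
for i, j in itertools.combinations(range(len(names)), 2):
    xs, ys = [r[i] for r in data], [r[j] for r in data]
    g.add_edge(names[i], names[j], weight=mutual_information(xs, ys))

tree = nx.maximum_spanning_tree(g)      # step 2; steps 3-4 direct it and fit the CPTs
print(sorted(tree.edges(data="weight")))
```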

slide-20
SLIDE 20

Tree Augmented Naive Bayes

Example: learning a maximum likelihood tree structure (Chow-Liu tree)

[Figure: the five attribute nodes Cold, Sore Throat?, See Spots?, Fever?, Angina]

1. Calculate MI(Ai, Aj) for each pair (Ai, Aj).
2. Build a maximum-weight spanning tree over the attributes.
3. Direct the resulting tree.
4. Learn the parameters.

Probabilistic Classifiers DWML Spring 2008 16 / 34

slide-21
SLIDE 21

Tree Augmented Naive Bayes

Example: learning a maximum likelihood tree structure (Chow-Liu tree)

[Figure: the five attribute nodes Cold, Sore Throat?, See Spots?, Fever?, Angina]

Step 1: calculate MI(Ai, Aj) for each pair (Ai, Aj):

MI(Cold, Angina) = 0
MI(Fever?, Angina) = 0.015076
MI(SoreThroat?, Angina) = 0.018016
MI(SeeSpots?, Angina) = 0.0180588
MI(Cold, Fever?) = 0.014392
MI(Cold, SoreThroat?) = 0.0210122
MI(Cold, SeeSpots?) = 0
MI(SoreThroat?, Fever?) = 0.0015214
MI(Fever?, SeeSpots?) = 0.0017066
MI(SeeSpots?, SoreThroat?) = 0.0070697

For example,

MI(Cold, Sore) = Σ_{Cold, Sore} P(Cold, Sore) log2 ( P(Cold, Sore) / (P(Cold) P(Sore)) ) = 0.02101216

(Remaining steps: 2. build a maximum-weight spanning tree over the attributes, 3. direct the resulting tree, 4. learn the parameters.)

Probabilistic Classifiers DWML Spring 2008 16 / 34

slide-22
SLIDE 22

Tree Augmented Naive Bayes

Example: learning a maximum likelihood tree structure (Chow-Liu tree)

1. Calculate MI(Ai, Aj) for each pair (Ai, Aj).
2. Build a maximum-weight spanning tree over the attributes.
3. Direct the resulting tree.
4. Learn the parameters.

[Figure: graph over Cold, Sore Throat?, See Spots?, Fever?, Angina with the pairwise MI values (0.021, 0.018, 0.018, 0.015, 0.014, 0.007, 0.002, 0.002) as edge weights]

Probabilistic Classifiers DWML Spring 2008 16 / 34


slide-24
SLIDE 24

Tree Augmented Naive Bayes

Example: learning a maximum likelihood tree structure (Chow-Liu tree)

1. Calculate MI(Ai, Aj) for each pair (Ai, Aj).
2. Build a maximum-weight spanning tree over the attributes.
3. Direct the resulting tree.
4. Learn the parameters.

[Figure: the attribute nodes Cold, Sore Throat?, See Spots?, Fever?, Angina connected by the selected spanning-tree edges]

Probabilistic Classifiers DWML Spring 2008 16 / 34


slide-26
SLIDE 26

Tree Augmented Naive Bayes

Example: learning a maximum likelihood tree structure (Chow-Liu tree)

1. Calculate MI(Ai, Aj) for each pair (Ai, Aj).
2. Build a maximum-weight spanning tree over the attributes.
3. Direct the resulting tree.
4. Learn the parameters.

[Figure: the directed tree over the attribute nodes Cold, Sore Throat?, See Spots?, Fever?, Angina]

Step 4 is standard parameter learning.

Probabilistic Classifiers DWML Spring 2008 16 / 34

slide-27
SLIDE 27

Tree Augmented Naive Bayes

Learning a TAN Classifier

A TAN of maximal likelihood can be constructed as follows:

1. Calculate CMI(Ai, Aj | C) for each pair (Ai, Aj).
2. Build a maximum-weight spanning tree over the attributes.
3. Direct the resulting tree.
4. Insert C as a parent of all the attributes.
5. Learn the parameters.

where the conditional mutual information is computed from the empirical distribution P#:

CMI(Ai, Aj | C) = Σ_C P#(C) Σ_{Ai, Aj} P#(Ai, Aj | C) log2 ( P#(Ai, Aj | C) / (P#(Ai | C) P#(Aj | C)) )

Probabilistic Classifiers DWML Spring 2008 17 / 34
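The sketch below (not from the slides) spells out this CMI computation on three toy columns; the attribute values and class labels are made up.

```python
# Illustrative sketch (not from the slides) of the CMI(Ai, Aj | C) computation
# in step 1, from empirical counts over three toy columns.
import math
from collections import Counter

def cmi(xs, ys, cs):
    """Conditional mutual information CMI(X, Y | C) under the empirical P#."""
    n = len(cs)
    pc = Counter(cs)
    pxyc, pxc, pyc = Counter(zip(xs, ys, cs)), Counter(zip(xs, cs)), Counter(zip(ys, cs))
    total = 0.0
    for (x, y, c), k in pxyc.items():
        p_xy = k / pc[c]                                   # P#(x, y | c)
        p_x, p_y = pxc[(x, c)] / pc[c], pyc[(y, c)] / pc[c]
        total += (pc[c] / n) * p_xy * math.log2(p_xy / (p_x * p_y))
    return total

A1 = ["y", "y", "n", "n", "y", "n"]
A2 = ["y", "n", "n", "y", "y", "n"]
C  = ["+", "+", "+", "-", "-", "-"]
print(cmi(A1, A2, C))
```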

slide-28
SLIDE 28

Tree Augmented Naive Bayes

Other ways of handling attribute dependence Introduce hidden variables to model the dependence.

[Figure: Bayesian network with class node C, attributes A1, . . . , A5, and hidden variables L1, L2, L3 modelling dependencies among groups of attributes]

  • Use e.g. CMI to decide on where to insert hidden variables.
  • Selecting the number of states is a (structural) learning problem (use e.g. BIC).
  • If |sp(Li)| = |sp(ch(Li))|, then Li can represent any configuration over its children.

Probabilistic Classifiers DWML Spring 2008 18 / 34

slide-29
SLIDE 29

Evaluating Classifiers

Probabilistic Classifiers Thomas D. Nielsen

Aalborg University Department of Computer Science

Spring 2008

Evaluating Classifiers Evaluating Classifiers Spring 2008 19 / 34

slide-30
SLIDE 30

Evaluating Classifiers

Classification Error

A classifier C (e.g. a decision tree) is used to classify instances a1, . . . , aN with true class labels c1, . . . , cN. The class labels assigned by C are c′1, . . . , c′N. Classification error:

|{i ∈ 1, . . . , N | ci ≠ c′i}| / N

  • Evaluation: estimate of the performance of a classifier on future data.
  • Estimate obtained by:
      • Hold-out set or test set
      • Random sub-sampling
      • Cross-validation

Evaluating Classifiers Evaluating Classifiers Spring 2008 20 / 34

slide-31
SLIDE 31

Evaluating Classifiers

Hold-out data

1. Divide the data into training data and test data (50–50 or 2/3–1/3).
2. Learn a classifier on the training data.
3. Accuracy can be estimated as the accuracy over the test set.

Pitfalls
  • Assuming that accuracy increases with the size of the training data, the hold-out method is pessimistic.

Evaluating Classifiers Evaluating Classifiers Spring 2008 21 / 34

slide-32
SLIDE 32

Evaluating Classifiers

Hold-out data

1. Divide the data into training data and test data (50–50 or 2/3–1/3).
2. Learn a classifier on the training data.
3. Accuracy can be estimated as the accuracy over the test set.

Pitfalls
  • Assuming that accuracy increases with the size of the training data, the hold-out method is pessimistic.
  • Increasing the training set reduces the test set, which gives a higher variance (a larger confidence interval) for the accuracy estimate.
  • Reducing the training set introduces a bias in our estimate.

Evaluating Classifiers Evaluating Classifiers Spring 2008 21 / 34
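A minimal hold-out evaluation in code might look as follows (a sketch, not from the slides; it assumes scikit-learn and uses its bundled iris data as a stand-in for a real dataset, with the 2/3-1/3 split mentioned above).

```python
# Illustrative hold-out evaluation (not from the slides). Assumes scikit-learn;
# the bundled iris data stands in for a real dataset, split 2/3-1/3.
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)

clf = GaussianNB().fit(X_train, y_train)             # learn on the training data
print(accuracy_score(y_test, clf.predict(X_test)))   # accuracy estimated on the test set
```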

slide-33
SLIDE 33

Evaluating Classifiers

Hold-out data: confidence intervals for accuracy

Imagine tossing a coin 100 times, resulting in 70 heads and 30 tails. In this experiment we have that
  • each toss can have two outcomes,
  • the probability p of heads is constant.

This can be considered a binomial experiment: if X is the number of heads in N tosses, then

P(X = x) = (N choose x) · p^x · (1 − p)^(N−x),

with mean N·p and variance N·p·(1 − p).

[Figure: binomial probability mass function]

Evaluating Classifiers Evaluating Classifiers Spring 2008 22 / 34

slide-34
SLIDE 34

Evaluating Classifiers

Hold-out data: confidence intervals for accuracy

Imagine tossing a coin 100 times, resulting in 70 heads and 30 tails. In this experiment we have that
  • each toss can have two outcomes,
  • the probability p of heads is constant.

This can be considered a binomial experiment: if X is the number of heads in N tosses, then

P(X = x) = (N choose x) · p^x · (1 − p)^(N−x),

with mean N·p and variance N·p·(1 − p).

Example

With p = 0.6 we get P(X = 70) = (100 choose 70) · 0.6^70 · (1 − 0.6)^(100−70) ≈ 0.01. The expectation is 0.6 · 100 = 60 and the variance is 100 · 0.6 · (1 − 0.6) = 24.

Evaluating Classifiers Evaluating Classifiers Spring 2008 22 / 34

slide-35
SLIDE 35

Evaluating Classifiers

Hold-out data: confidence intervals for accuracy

Imagine tossing a coin 100 times, resulting in 70 heads and 30 tails. In this experiment we have that
  • each toss can have two outcomes,
  • the probability p of heads is constant.

This can be considered a binomial experiment: if X is the number of heads in N tosses, then

P(X = x) = (N choose x) · p^x · (1 − p)^(N−x),

with mean N·p and variance N·p·(1 − p).

Example

With p = 0.6 we get P(X = 70) = (100 choose 70) · 0.6^70 · (1 − 0.6)^(100−70) ≈ 0.01. The expectation is 0.6 · 100 = 60 and the variance is 100 · 0.6 · (1 − 0.6) = 24.

The task of predicting class labels can also be seen as a binomial experiment.

Evaluating Classifiers Evaluating Classifiers Spring 2008 22 / 34
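These numbers are easy to check, e.g. with scipy (an illustrative sketch, not part of the slides):

```python
# Quick check of the coin example (a sketch, assuming scipy is available).
from scipy.stats import binom

N, p = 100, 0.6
print(binom.pmf(70, N, p))                 # about 0.01
print(binom.mean(N, p), binom.var(N, p))   # 60.0 and 24.0
```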

slide-36
SLIDE 36

Evaluating Classifiers

Hold-out data: confidence intervals for accuracy Given a test set with N cases, let X be the number of cases correctly predicted by the classifier. The empirical accuracy of the classifier is then a = X/N. However, how confident can we be in the empirical accuracy estimated for a given test set?

Evaluating Classifiers Evaluating Classifiers Spring 2008 23 / 34

slide-37
SLIDE 37

Evaluating Classifiers

Hold-out data: confidence intervals for accuracy

Given a test set with N cases, let X be the number of cases correctly predicted by the classifier. The empirical accuracy of the classifier is then a = X/N. However, how confident can we be in the empirical accuracy estimated for a given test set?

Confidence interval

First note:
  • The empirical accuracy a = X/N of the classifier follows a binomial distribution with mean p and variance p(1 − p)/N.
  • For sufficiently large N, the binomial distribution is close to a normal distribution with mean p and variance p(1 − p)/N.

[Figure: binomial distribution for N = 30 and p = 0.8]

Evaluating Classifiers Evaluating Classifiers Spring 2008 23 / 34

slide-38
SLIDE 38

Evaluating Classifiers

Confidence interval

First note:
  • The empirical accuracy a = X/N of the classifier follows a binomial distribution with mean p and variance p(1 − p)/N.
  • For sufficiently large N, the binomial distribution is close to a normal distribution with mean p and variance p(1 − p)/N.

By standardizing the normal distribution, the following confidence interval for a can be found:

P( −Zα/2 ≤ (a − p) / √(p(1 − p)/N) ≤ Z1−α/2 ) = 1 − α

(for a given α, the value for Zα/2 can be found by table lookup; note Zα/2 = Z1−α/2). By rearranging we get

( 2·N·a + Z²α/2 ± Zα/2 · √( Z²α/2 + 4·N·a − 4·N·a² ) ) / ( 2·(N + Z²α/2) )

This should be read as: at confidence level 1 − α, the true accuracy will be in the interval defined by the expression above.

Evaluating Classifiers Evaluating Classifiers Spring 2008 23 / 34

slide-39
SLIDE 39

Evaluating Classifiers

Hold-out data: confidence intervals for accuracy Given a test set with N cases, let X be the number of cases correctly predicted by the classifier. The empirical accuracy of the classifier is then a = X/N. However, how confident can we be in the empirical accuracy estimated for a given test set? Example Consider a model with accuracy 70% when evaluated on 100 test cases. What is the confidence interval for its true accuracy at a 95% confidence level?

  • Find Zα/2 by table lookup ⇒ Zα/2 = 1.96

Evaluating Classifiers Evaluating Classifiers Spring 2008 23 / 34

slide-40
SLIDE 40

Evaluating Classifiers

Hold-out data: confidence intervals for accuracy Given a test set with N cases, let X be the number of cases correctly predicted by the classifier. The empirical accuracy of the classifier is then a = X/N. However, how confident can we be in the empirical accuracy estimated for a given test set? Example Consider a model with accuracy 70% when evaluated on 100 test cases. What is the confidence interval for its true accuracy at a 95% confidence level?

  • Find Zα/2 by table lookup ⇒ Zα/2 = 1.96
  • Inserting into the previous expression gives the interval [0.60; 0.78].

Evaluating Classifiers Evaluating Classifiers Spring 2008 23 / 34
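The sketch below (not from the slides) evaluates the rearranged expression for this example:

```python
# Sketch (not from the slides) evaluating the rearranged expression for the
# example above: a = 0.7, N = 100, Z = 1.96.
import math

def accuracy_interval(a, N, Z=1.96):
    """Confidence interval for the true accuracy, as derived two slides back."""
    center = 2 * N * a + Z ** 2
    spread = Z * math.sqrt(Z ** 2 + 4 * N * a - 4 * N * a ** 2)
    denom = 2 * (N + Z ** 2)
    return (center - spread) / denom, (center + spread) / denom

print(accuracy_interval(0.7, 100))   # roughly (0.60, 0.78)
```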

slide-41
SLIDE 41

Evaluating Classifiers

Expected Loss

A more detailed picture is provided by the confusion matrix and a cost function (e.g. for States(C) = {a, b, c} and n = 150):

Confusion matrix: fractions of cases with each true/predicted combination

                 true a    true b    true c
predicted a      45/150     4/150     3/150
predicted b       2/150    39/150     1/150
predicted c       3/150     7/150    46/150

Loss matrix: Loss(x, y) gives the cost of each true/predicted combination (x, y).

Expected Loss:

Σ_{x, y ∈ {a, b, c}} Confusion(x, y) · Loss(x, y)

When a cost function is given, try to minimize the expected loss (minimizing classification error is the special case of 0/1 loss: Loss(x, x) = 0 and Loss(x, y) = 1 for x ≠ y)!

Evaluating Classifiers Evaluating Classifiers Spring 2008 24 / 34
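The computation itself is a single element-wise product and sum. In the sketch below (not from the slides) the confusion matrix is the one above, while the loss-matrix entries are made-up placeholders:

```python
# Sketch (not from the slides): expected loss as an element-wise product and sum.
# The confusion matrix is the one above; the loss values are hypothetical.
import numpy as np

# rows: predicted a, b, c; columns: true a, b, c
confusion = np.array([[45, 4, 3],
                      [2, 39, 1],
                      [3, 7, 46]]) / 150.0

loss = np.array([[0.0, 3.0, 3.0],    # hypothetical costs, 0 on the diagonal
                 [2.0, 0.0, 1.0],
                 [4.0, 3.0, 0.0]])

print((confusion * loss).sum())      # expected loss under this cost function
print(1.0 - np.trace(confusion))     # expected 0/1 loss = classification error
```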

slide-42
SLIDE 42

Evaluating Classifiers

Classifiers with Confidence Most classifiers (implicitly) provide a numeric measurement for the likelihood of class label c for instance a:

  • Probabilistic classifier: Probability of c given a.
  • Decision Tree: Frequency of label c (among training cases) in leaf reached by a.
  • k-Nearest-Neighbor: Frequency of label c among k nearest neighbors of a.
  • Neural Network: Output value of c output neuron given input a.

Evaluating Classifiers Evaluating Classifiers Spring 2008 25 / 34

slide-43
SLIDE 43

Evaluating Classifiers

Quantiles

For a given class label c, sort the instances according to decreasing confidence in c:

Instance:  a3    a5    a1    a7    a8    a4    a2    a10   a6    a9
P(c):      0.96  0.91  0.86  0.83  0.74  0.55  0.51  0.42  0.11  0.06

Evaluating Classifiers Evaluating Classifiers Spring 2008 26 / 34

slide-44
SLIDE 44

Evaluating Classifiers

Quantiles

For a given class label c, sort the instances according to decreasing confidence in c:

Instance:  a3    a5    a1    a7    a8    a4    a2    a10   a6    a9
P(c):      0.96  0.91  0.86  0.83  0.74  0.55  0.51  0.42  0.11  0.06

The 40% quantile consists of the 40% of cases with the highest confidence in c.

Evaluating Classifiers Evaluating Classifiers Spring 2008 26 / 34

slide-45
SLIDE 45

Evaluating Classifiers

Quantiles

For a given class label c, sort the instances according to decreasing confidence in c:

Instance:  a3    a5    a1    a7    a8    a4    a2    a10   a6    a9
P(c):      0.96  0.91  0.86  0.83  0.74  0.55  0.51  0.42  0.11  0.06
ci = c:    yes   yes   no    yes   yes   no    yes   no    no    no

The 40% quantile consists of the 40% of cases with the highest confidence in c. Given the correct class labels, we can compute the accuracy in the 40% quantile (3/4) and the ratio of this accuracy to the base rate of the label c:

Lift(40%, C, c) = (3/4) / (5/10) = 1.5

Evaluating Classifiers Evaluating Classifiers Spring 2008 26 / 34
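A short sketch of this computation (not from the slides), using the confidences and labels from the table above:

```python
# Sketch (not from the slides) of the 40%-quantile lift computation.
confidence = [0.96, 0.91, 0.86, 0.83, 0.74, 0.55, 0.51, 0.42, 0.11, 0.06]
is_c       = [1,    1,    0,    1,    1,    0,    1,    0,    0,    0]

ranked = [label for _, label in sorted(zip(confidence, is_c), reverse=True)]
k = int(0.4 * len(ranked))                                  # size of the 40% quantile
lift = (sum(ranked[:k]) / k) / (sum(ranked) / len(ranked))  # accuracy in quantile / base rate
print(lift)                                                 # (3/4) / (5/10) = 1.5
```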

slide-46
SLIDE 46

Evaluating Classifiers

Lift Charts Lift plotted for different quantiles:

[Figure: lift chart with Lift(C, c) plotted for the quantiles 10%, 20%, . . . , 100%]

Evaluating Classifiers Evaluating Classifiers Spring 2008 27 / 34

slide-47
SLIDE 47

Evaluating Classifiers

Lift Charts Lift plotted for different quantiles:

[Figure: lift chart with Lift(C, c) and Lift(C′, c) plotted for the quantiles 10%, 20%, . . . , 100%]

Lift for a classifier C′ generating a perfect ordering:

Instance:  a7    a5    a2    a3    a8    a9    a1    a10   a6    a4
P(c):      0.98  0.97  0.97  0.87  0.74  0.34  0.29  0.12  0.11  0.02
ci = c:    yes   yes   yes   yes   yes   no    no    no    no    no

Evaluating Classifiers Evaluating Classifiers Spring 2008 27 / 34

slide-48
SLIDE 48

Evaluating Classifiers

Lift and Costs

What is better:

  • predicting C = c for all instances in the 40% quantile (say lift = 1.5), and C ≠ c for all others, or
  • predicting C = c for all instances in the 60% quantile (say lift = 1.333), and C ≠ c for all others?

That depends on the cost function! The first option will be better when wrong predictions of C = c are very expensive; the second option will be better when wrong predictions of C ≠ c are very expensive.

Evaluating Classifiers Evaluating Classifiers Spring 2008 28 / 34

slide-49
SLIDE 49

Evaluating Classifiers

ROC Space

Confusion matrix for binary classification problems:

                 true pos                true neg
predicted pos    true positives (tp)     false positives (fp)
predicted neg    false negatives (fn)    true negatives (tn)

True positive rate (tpr): tp / (tp + fn)
False positive rate (fpr): fp / (fp + tn)

Each classifier (applied to some dataset) defines a point in ROC space:

[Figure: ROC space, fpr on the x-axis and tpr on the y-axis, both from 0 to 1]

Evaluating Classifiers Evaluating Classifiers Spring 2008 29 / 34

slide-50
SLIDE 50

Evaluating Classifiers

ROC space (as on the previous slide).

[Figure: ROC space with the point (fpr, tpr) = (1, 1) marked: "always classify positive"]

Evaluating Classifiers Evaluating Classifiers Spring 2008 29 / 34

slide-51
SLIDE 51

Evaluating Classifiers

ROC space (as on the previous slides).

[Figure: ROC space with "always classify negative" at (0, 0) and "always classify positive" at (1, 1) marked]

Evaluating Classifiers Evaluating Classifiers Spring 2008 29 / 34

slide-52
SLIDE 52

Evaluating Classifiers

ROC space (as on the previous slides).

[Figure: ROC space with "always classify negative" at (0, 0), "always classify positive" at (1, 1), and "classify positive with probability q" at (q, q) marked]

Evaluating Classifiers Evaluating Classifiers Spring 2008 29 / 34

slide-53
SLIDE 53

Evaluating Classifiers

ROC space (as on the previous slides).

[Figure: ROC space with "always classify negative" at (0, 0), "always classify positive" at (1, 1), "classify positive with probability q" at (q, q), and "perfect classification" at (0, 1) marked]

Evaluating Classifiers Evaluating Classifiers Spring 2008 29 / 34

slide-54
SLIDE 54

Evaluating Classifiers

Comparison

One classifier is strictly better than another if its tpr/fpr point is to the left and above in ROC space:

[Figure: ROC space with the points of three classifiers C1, C2, C3]

C1 is better than C2. C3 is incomparable with C1 and C2.

Evaluating Classifiers Evaluating Classifiers Spring 2008 30 / 34

slide-55
SLIDE 55

Evaluating Classifiers

ROC curves Probabilistic classifiers (and many others) are parameterized by an acceptance threshold. Plotting the tpr/fpr values for all parameters (and a given dataset) gives a ROC curve:

[Figure: a ROC curve in the unit square, fpr on the x-axis and tpr on the y-axis]

Evaluating Classifiers Evaluating Classifiers Spring 2008 31 / 34

slide-56
SLIDE 56

Evaluating Classifiers

ROC curves Probabilistic classifiers (and many others) are parameterized by an acceptance threshold. Plotting the tpr/fpr values for all parameters (and a given dataset) gives a ROC curve:

[Figure: a ROC curve in the unit square, fpr on the x-axis and tpr on the y-axis]

Performance measure for a parameterized family of classifiers: the area under the (ROC) curve (AUC).

Evaluating Classifiers Evaluating Classifiers Spring 2008 31 / 34
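As an illustration (not from the slides), the sketch below sweeps the threshold over the confidence scores from the quantile example and reports the resulting ROC points and AUC; it assumes scikit-learn is available.

```python
# Sketch (not from the slides): ROC points from sweeping the acceptance
# threshold over the confidence scores of the earlier quantile example,
# plus the area under the resulting curve. Assumes scikit-learn.
from sklearn.metrics import roc_auc_score, roc_curve

y_true  = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
y_score = [0.96, 0.91, 0.86, 0.83, 0.74, 0.55, 0.51, 0.42, 0.11, 0.06]

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(list(zip(fpr, tpr)))             # one (fpr, tpr) point per threshold
print(roc_auc_score(y_true, y_score))  # area under the ROC curve (AUC)
```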

slide-57
SLIDE 57

Optimizing Predictive Performance

Overfitting again

[Figure: performance measure plotted against a model parameter, for training data and for future data]

Possible performance measures:

  • Misclassification rate
  • Expected loss
  • AUC
  • . . .

Model parameter:

  • Pruning parameter for decision trees
  • k in k-nearest neighbor
  • Complexity of probabilistic model (e.g. Naive Bayes, TAN, . . . )

  • . . .

How do we determine the model performing best on future data?

Evaluating Classifiers Evaluating Classifiers Spring 2008 32 / 34

slide-58
SLIDE 58

Optimizing Predictive Performance

Test Set

  • Set aside part (e.g. one third) of the available data as a test set
  • Learn models with different parameters using the remaining data as the training data
  • Measure the performance of each learned model on the test set
  • Choose parameter setting with best performance
  • Learn final model with chosen parameter setting using the whole available data

Problem: for small datasets we cannot afford to set aside a test set.

Evaluating Classifiers Evaluating Classifiers Spring 2008 33 / 34

slide-59
SLIDE 59

Optimizing Predictive Performance

Cross Validation

  • Partition the data into n subsets or folds (typically n = 10).
  • For each model parameter setting:
      for i = 1 to n:
          learn a model using folds 1, . . . , i − 1, i + 1, . . . , n as training data
          measure performance on fold i
      model performance = average performance on the n test sets
  • Choose the parameter setting with the best performance.
  • Learn the final model with the chosen parameter setting using the whole available data.

Evaluating Classifiers Evaluating Classifiers Spring 2008 34 / 34
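A sketch of this loop in code (not from the slides; it assumes scikit-learn, and the k-nearest-neighbor model with its candidate values of k is just a placeholder for whatever model parameter is being tuned):

```python
# Sketch (not from the slides) of the cross-validation loop above. Assumes
# scikit-learn; k-nearest neighbor and its candidate k values are placeholders.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
folds = KFold(n_splits=10, shuffle=True, random_state=0)

best_k, best_score = None, -np.inf
for k in [1, 3, 5, 7]:                               # candidate parameter settings
    fold_scores = []
    for train_idx, test_idx in folds.split(X):
        model = KNeighborsClassifier(n_neighbors=k).fit(X[train_idx], y[train_idx])
        fold_scores.append(model.score(X[test_idx], y[test_idx]))  # performance on fold i
    if np.mean(fold_scores) > best_score:            # model performance = average over folds
        best_k, best_score = k, float(np.mean(fold_scores))

final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X, y)   # whole available data
print(best_k, round(best_score, 3))
```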