Naive Bayes Classifier
Ricco Rakotomalala
Tutoriels Tanagra - http://data-mining-tutorials.blogspot.fr/

Maximum a posteriori rule: calculating the posterior probability P(Y = y_k / X)
y* = argmax_k P(Y = y_k / X) = argmax_k [ P(Y = y_k) × P(X / Y = y_k) ] / [ Σ_{l=1..K} P(Y = y_l) × P(X / Y = y_l) ]

Since the denominator does not depend on k, the MAP rule reduces to: y* = argmax_k P(Y = y_k) × P(X / Y = y_k)
Prior probability of the class y_k: P(Y = y_k), estimated by the empirical frequency n_k / n.
Likelihood: P(X / Y = y_k). Assumptions are introduced in order to make the calculation of this likelihood tractable.
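A minimal Python sketch of the MAP rule (the priors and likelihoods below are illustrative values, not taken from the slides):

```python
import numpy as np

priors = np.array([0.5, 0.5])          # P(Y = y_k), estimated by n_k / n
likelihoods = np.array([0.02, 0.08])   # P(X / Y = y_k) for one instance X

numerators = priors * likelihoods             # numerator of Bayes' theorem
posteriors = numerators / numerators.sum()    # P(Y = y_k / X)
print(posteriors, np.argmax(posteriors))      # [0.2 0.8] -> class 1 (MAP)
```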
Conditional independence for the calculation of the likelihood
P(X / Y = y_k) = ∏_{j=1..J} P(X_j / Y = y_k)
The attributes are all conditionally independent of one another given the value of Y
For a categorical attribute X, the conditional probability for the value xl is computed as follows…
P(X = x_l / Y = y_k): the probability is estimated using the conditional relative frequency:

P̂(X = x_l / Y = y_k) = #(X = x_l, Y = y_k) / #(Y = y_k) = n_kl / n_k
The Laplace rule of succession is often used to estimate the conditional probability
P̂(X = x_l / Y = y_k) = (n_kl + 1) / (n_k + L), where L is the number of levels of X

This is a kind of smoothing; it also overcomes the (n_kl = 0) problem, which would otherwise set the whole product of probabilities to zero.
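A small sketch of both estimators (the function names are ours, for illustration):

```python
def cond_proba(n_kl, n_k):
    # raw conditional relative frequency
    return n_kl / n_k

def cond_proba_laplace(n_kl, n_k, L):
    # Laplace rule of succession, L = number of levels of X
    return (n_kl + 1) / (n_k + L)

print(cond_proba(3, 5))                # 0.6
print(cond_proba_laplace(0, 5, 2))     # ~0.143 instead of 0
```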
NB Maladie (counts): Absent = 5 ; Présent = 5 ; Total = 10

Maladie × Marié (counts)
            Non   Oui   Total
Absent       2     3      5
Présent      4     1      5
Total        6     4     10

Maladie × Etud.Sup (counts)
            Non   Oui   Total
Absent       4     1      5
Présent      1     4      5
Total        5     5     10
Conditional independence assumption
P̂(Maladie = Absent) × P̂(Marié = oui / Absent) × P̂(Etu = oui / Absent)
  = (5+1)/(10+2) × (3+1)/(5+2) × (1+1)/(5+2) = 0.082

P̂(Maladie = Présent) × P̂(Marié = oui / Présent) × P̂(Etu = oui / Présent)
  = (5+1)/(10+2) × (1+1)/(5+2) × (4+1)/(5+2) = 0.102

(probabilities estimated with the Laplace rule of succession)
Since 0.102 > 0.082: If Etu = oui and Marié = oui Then Maladie = Présent
The dataset (n = 10):

Maladie   Marié   Etud.Sup
Présent   Non     Oui
Présent   Non     Oui
Absent    Non     Non
Absent    Oui     Oui
Présent   Non     Oui
Absent    Non     Non
Absent    Oui     Non
Présent   Non     Oui
Absent    Oui     Non
Présent   Oui     Non
Direct estimation of the posterior probability
P̂(Maladie = Absent / Marié = oui, Etu = oui) = 1/1 = 1
P̂(Maladie = Présent / Marié = oui, Etu = oui) = 0/1 = 0

(only one instance of the dataset satisfies Marié = oui and Etu = oui)
If Etu = oui and Marié = oui Then Maladie = Absent
Direct estimation: (+) no assumptions, (-) small number of covered examples.
Naive bayes: (-) questionable assumption, (+) more reliable estimation of the probabilities.
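The naive bayes side of this comparison can be checked with a few lines of Python (counts read off the contingency tables above, Laplace smoothing as in the slide):

```python
# P(class) x P(Marié = oui / class) x P(Etu = oui / class), Laplace-smoothed
score_absent  = (5 + 1) / (10 + 2) * (3 + 1) / (5 + 2) * (1 + 1) / (5 + 2)
score_present = (5 + 1) / (10 + 2) * (1 + 1) / (5 + 2) * (4 + 1) / (5 + 2)

print(round(score_absent, 3), round(score_present, 3))   # 0.082 0.102
# 0.102 > 0.082 -> Maladie = Présent
```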
Advantages and shortcomings (end of the course?)

>> Simplicity, quickness, ability to handle very large datasets, no possible crash during the calculations
>> This is a linear classifier -> classification performance similar to that of the other linear methods (see the numerous experiments described in scientific papers)
>> Incrementality (we store only the contingency tables)
>> Statistically robust (even if the independence assumption is very questionable)
>> No indication about the relevance of the attributes (really?)
>> Very high number of rules (in practice, the logical rules are not computed; the contingency tables used for the calculation of the conditional frequencies are deployed instead, e.g. in PMML format)
>> No explicit model (really?) -> not used in the marketing domain, etc.

We often see these conclusions in the literature… Is it possible to go beyond that?
Logarithmic transformation:

y* = argmax_k P(Y = y_k) × ∏_{j=1..J} P(X_j / Y = y_k)
   = argmax_k [ ln P(Y = y_k) + Σ_{j=1..J} ln P(X_j / Y = y_k) ]
d(y_k, X) = ln P(Y = y_k) + ln P(X / Y = y_k)

For a categorical attribute X with L levels, coded with L dummy (indicator) variables I_1, …, I_L:

d(y_k, X) = ln P(Y = y_k) + Σ_{l=1..L} ln P(X = x_l / Y = y_k) × I_l = a_{k,0} + Σ_{l=1..L} a_{k,l} × I_l
We obtain a linear combination of the dummy variables, i.e. an explicit model which is easy to deploy -> K linear classification functions (as in linear discriminant analysis)
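A sketch of these coefficients in Python (counts from the « Maladie » example, with Laplace smoothing; the variable names are ours):

```python
import numpy as np

def log_coefs(counts, n_k, L):
    # a_{k,l} = ln P(X = x_l / Y = y_k), one coefficient per level of X
    return np.log((np.asarray(counts) + 1) / (n_k + L))

intercept = np.log((5 + 1) / (10 + 2))      # a_{k,0} = ln P(Y = absent)
coefs = log_coefs([4, 1], 5, 2)             # levels: Etud.Sup = [Non, Oui]

dummies = np.array([1, 0])                  # an instance with Etu.Sup = Non
print(round(intercept + coefs @ dummies, 4))   # -1.0296 = d(absent, X)
```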
NB Maladie (counts): Absent = 5 ; Présent = 5 ; Total = 10

Maladie × Etud.Sup (counts)
            Non   Oui   Total
Absent       4     1      5
Présent      1     4      5
Total        5     5     10
An example (Y: Maladie; X: Etud.Sup)

d(Absent, X) = ln[(5+1)/(10+2)] + ln[(4+1)/(5+2)] × I_non(X) + ln[(1+1)/(5+2)] × I_oui(X)
             = -0.6931 - 0.3365 × I_non(X) - 1.2528 × I_oui(X)

d(Présent, X) = -0.6931 - 1.2528 × I_non(X) - 0.3365 × I_oui(X)

For an instance with (Etu.Sup = non):
d(Absent, X) = -0.6931 - 0.3365 = -1.0296
d(Présent, X) = -0.6931 - 1.2528 = -1.9459

Prediction: Maladie = Absent
d(y_k, X) = ln P(Y = y_k) + Σ_{l=1..L} ln P(X = x_l / Y = y_k) × I_l
          = [ ln P(Y = y_k) + ln P(X = x_L / Y = y_k) ] + Σ_{l=1..L-1} ln [ P(X = x_l / Y = y_k) / P(X = x_L / Y = y_k) ] × I_l
          = b_{k,0} + Σ_{l=1..L-1} b_{k,l} × I_l

since I_L = 1 - (I_1 + I_2 + … + I_{L-1})
One level [x_L] becomes the reference level. The dummy coding is the most commonly used coding scheme.
This is the solution implemented in TANAGRA (using [L-1] dummy variables for an attribute X with L levels).
Extension to J predictive attributes. Dummy coding scheme: each X_j with L_j levels -> (L_j - 1) dummy variables.
(Same « Maladie » dataset as above.)
Linear classification functions using the indicator variables
The class attribute has 2 levels: Y = {+, -}

d(+, X) = a_{+,0} + a_{+,1} X_1 + a_{+,2} X_2 + … + a_{+,J} X_J
d(-, X) = a_{-,0} + a_{-,1} X_1 + a_{-,2} X_2 + … + a_{-,J} X_J

D(X) = d(+, X) - d(-, X) = c_0 + c_1 X_1 + c_2 X_2 + … + c_J X_J

Decision rule: D(X) > 0 -> Y = +
>> D(X) is the SCORE function: it assigns to each instance a score that increases with the estimated probability of the positive class. >> The sign of the coefficients allows us to interpret the influence of the descriptors.
Interpretation
Classification functions and SCORE

Descriptors      Présent     Absent      D(X)
Marié = Non      0.916291    -0.287682   1.203973
Etud.Sup = Oui   0.916291    -0.916291   1.832582
constant
Our example: not being married makes you sick… studying makes you sick…
The particular case of binary classification (K = 2): construction of the SCORE function
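A minimal sketch of the SCORE computation with the coefficients of the table above (the constant term, not shown in the table, is omitted here):

```python
import numpy as np

coef_present = np.array([0.916291, 0.916291])    # Marié = Non, Etud.Sup = Oui
coef_absent  = np.array([-0.287682, -0.916291])
c = coef_present - coef_absent                   # SCORE coefficients c_j

x = np.array([1, 0])    # unmarried individual, no higher education
print(round(c @ x, 6))  # 1.203973 -> pushes D(X) towards the positive class
```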
Reading of the coefficients in the classification functions
Estimation of the conditional probabilities
P(M = N / Y = présent) / P(M = O / Y = présent) = 0.8 / 0.2 = 4 -> ln(4) = 1.386294

Naive bayes classifier (explicit representation): the sick individuals (Maladie = présent) are 4 times more likely to be unmarried than married. The coefficient of the classification function corresponds to the logarithm of this odds.
P(M = N / Y = absent) / P(M = O / Y = absent) = 0.4 / 0.6 = 0.667 -> ln(0.667) = -0.4055

The non-sick individuals are (1 / 0.667) = 1.5 times more likely to be married than unmarried.
Maladie × Marié (row profiles)
            Non   Oui   Total
Présent     0.8   0.2   1.0
Absent      0.4   0.6   1.0
Total       0.6   0.4   1.0
Classification functions

Descriptors   Présent    Absent
Marié = Non   1.38629    -0.4055
constant
(Maladie × Marié row-profile table as above.)
odds-ratio = [ P(M = N / présent) / P(M = O / présent) ] ÷ [ P(M = N / absent) / P(M = O / absent) ] = 4 / 0.667 = 6

Classification functions and SCORE

Descriptors   Présent    Absent      SCORE
Marié = Non   1.38629    -0.40547    1.79176
constant

The odds of being unmarried are 6 times higher for the sick individuals than for the non-sick individuals.
The coefficient of the score function corresponds to the logarithm of the odds ratio (ln 6 = 1.79176); it can thus be compared with the coefficients of a logistic regression. A coefficient far from 0 indicates that the association between X and Y is significant.
Reading of the coefficients in the score function (binary problem)
Amazing consequence of the conditional independence assumption
Classifier with 1 variable

Descriptors   Présent     Absent
Marié = Non   0.916291    -0.287682
constant

Classifier with 2 variables (« Etu.Sup » is added)

Descriptors      Présent     Absent
Marié = Non      0.916291    -0.287682
Etud.Sup = Oui   0.916291    -0.916291
constant
There is no need to recalculate the other coefficients when a new variable is added: under the conditional independence assumption, each coefficient depends only on its own attribute.
Relevance of an attribute (1)
A variable is influential if it increases the differences between the classification functions d(y_k, X) across the classes y_k: if the conditional distributions P(X / y_k) differ from one class to another, or equivalently, if the conditional distributions P(X / y_k) differ from the marginal distribution P(X).
Maladie × Marié (row profiles)
            Non   Oui   Total
Absent      0.4   0.6   1.0
Présent     0.8   0.2   1.0
Total       0.6   0.4   1.0

Maladie × Etud.Sup (row profiles)
            Non   Oui   Total
Absent      0.8   0.2   1.0
Présent     0.2   0.8   1.0
Total       0.5   0.5   1.0
Shannon entropy of X:
H(X) = - Σ_{l=1..L} p_{.l} × log2 p_{.l}

Conditional entropy of X given Y:
H(X / Y) = - Σ_{k=1..K} p_{k.} Σ_{l=1..L} p_{l/k} × log2 p_{l/k}

Mutual information:
I(Y, X) = H(X) - H(X / Y) = Σ_{l=1..L} Σ_{k=1..K} p_{kl} × log2 [ p_{kl} / (p_{k.} × p_{.l}) ]
Maladie × Etud.Sup (joint probabilities)
            Non   Oui   Total
Absent      0.4   0.1   0.5
Présent     0.1   0.4   0.5
Total       0.5   0.5   1.0

Maladie × Marié (joint probabilities)
            Non   Oui   Total
Absent      0.2   0.3   0.5
Présent     0.4   0.1   0.5
Total       0.6   0.4   1.0

We can establish a hierarchy between the predictive variables: the association between Y and Etud.Sup is significant; the association between Y and Marié is not significant.
Relevance of an attribute (2)
Ranking using the symmetrical uncertainty measure
Defined on [0 ; 1]:

s_{Y,X} = 2 × I(Y, X) / [ H(Y) + H(X) ]
RANKING: 1. Compute s for each predictive variable. 2. Sort them in decreasing order. 3. Retain only the variables significantly related to Y.
E.g. « kr-vs-kp » dataset: 19 variables selected for alpha = 0.001.

Shortcoming: choosing the significance level « alpha » is difficult, and the test becomes less and less selective as the database size « n » increases. Possible solution: the “elbow rule”.

Unacceptable shortcoming: this solution does not take the redundancy between the variables into account.
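A hedged sketch of the RANKING procedure on the two joint tables of the example (the significance test is left aside):

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    return -np.sum(p * np.log2(p))

def symmetrical_uncertainty(joint):
    joint = np.asarray(joint, dtype=float)
    h_y, h_x = entropy(joint.sum(axis=1)), entropy(joint.sum(axis=0))
    i_yx = h_y + h_x - entropy(joint)    # I(Y,X) = H(Y) + H(X) - H(Y,X)
    return 2 * i_yx / (h_y + h_x)

tables = {"Etud.Sup": [[0.4, 0.1], [0.1, 0.4]],
          "Marié":    [[0.2, 0.3], [0.4, 0.1]]}
ranking = sorted(tables, key=lambda v: -symmetrical_uncertainty(tables[v]))
print(ranking)    # ['Etud.Sup', 'Marié']
```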
Feature selection which handles the redundancy – CFS approach
The MERIT of a subset of « p » attributes is defined as follows:

merit_p = ( p × mean(s_{Y,X}) ) / sqrt( p + p × (p - 1) × mean(s_{X,X'}) )

Numerator: association of the predictive attributes with the target variable (relevance).
Denominator: association between the predictive attributes (redundancy).
The aim is to obtain a subset of attributes which are strongly related to the target attribute and weakly related to each other.
« FORWARD » strategy: at each step, the variable which maximizes the increase of the MERIT is added.
Stopping rule: stop when the additional variable does not increase the MERIT any more.
E.g. « kr-vs-kp » -
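A sketch of the MERIT criterion and the FORWARD search (s_y and s_xx are assumed to be precomputed symmetrical uncertainties; the values below are illustrative):

```python
import numpy as np

def merit(subset, s_y, s_xx):
    p = len(subset)
    mean_sy = np.mean([s_y[j] for j in subset])
    mean_sxx = (np.mean([s_xx[i, j] for i in subset for j in subset if i != j])
                if p > 1 else 0.0)
    return p * mean_sy / np.sqrt(p + p * (p - 1) * mean_sxx)

def forward_cfs(s_y, s_xx):
    remaining, selected, best = set(range(len(s_y))), [], -np.inf
    while remaining:
        j = max(remaining, key=lambda v: merit(selected + [v], s_y, s_xx))
        m = merit(selected + [j], s_y, s_xx)
        if m <= best:        # stop: the MERIT does not increase any more
            break
        selected.append(j); remaining.discard(j); best = m
    return selected

s_y = np.array([0.30, 0.29, 0.02])        # relevance s(Y, Xj)
s_xx = np.array([[1.0, 0.95, 0.05],       # redundancy s(Xi, Xj)
                 [0.95, 1.0, 0.05],
                 [0.05, 0.05, 1.0]])
print(forward_cfs(s_y, s_xx))   # [0] -> X1 rejected (redundant), X2 (irrelevant)
```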
Is the selection justified and appropriate?

« kr-vs-kp »: training set size = 1500, test set size = 1696.
All 34 descriptors -> test error rate = 14.80%
3 selected descriptors -> test error rate = 9.67%

Variable selection reduces the number of variables while maintaining the performance level. Sometimes it even improves the prediction performance (as here, but this is rare). Sometimes it hurts (when too many variables are removed).
Discretization of continuous attributes: using a specific supervised algorithm. The well-known unsupervised approaches (e.g. equal width, equal frequency) do not consider the target attribute; they are not adapted to the supervised learning context.
[Figure: density plots of four examples of conditional distributions (x1 to x4).]

Why are supervised algorithms (MDLPC, Fayyad & Irani, 1993; Chi-merge, Kerber, 1992) more convenient? They search for intervals in which one of the classes is overrepresented, and they determine automatically the appropriate number of intervals.
Discretization of continuous attributes using a decision tree learning algorithm: the variable to discretize is the only predictive variable used in the decision tree, and the induced split thresholds define the discretization intervals.
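A hedged sketch of this idea with scikit-learn (assuming it is available; the depth limit controls the number of intervals):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tree_cut_points(x, y, max_depth=2):
    # fit a tree on the single variable to discretize
    tree = DecisionTreeClassifier(max_depth=max_depth).fit(
        np.asarray(x, dtype=float).reshape(-1, 1), y)
    t = tree.tree_
    # internal nodes carry the thresholds (leaves are flagged with -2)
    return sorted(t.threshold[t.feature >= 0])

x = [1.0, 1.2, 1.4, 4.5, 4.7, 5.0, 5.1, 6.0]   # toy continuous variable
y = [0, 0, 0, 1, 1, 2, 2, 2]                   # target classes
print(tree_cut_points(x, y))   # two cut points -> three intervals
```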
Assumption 1 – Gaussian conditional distribution

Normal distribution for X_j conditionally on y_k:

P(X_j / Y = y_k) -> f(x_j) = 1 / (σ_{k,j} √(2π)) × exp[ - (x_j - μ_{k,j})² / (2 σ²_{k,j}) ]
[Figure: conditional density plots of X for the two classes. One example is compatible with the Gaussian assumption; the other is not compatible -> possible solution: discretization.]
Note: this is a particular case of discriminant analysis in which the conditional covariance matrices are assumed to be diagonal (the attributes are conditionally independent, so only the variances σ²_{k,j} are considered).
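A minimal sketch of assumption 1 for one continuous attribute (toy values; the class names are illustrative):

```python
import numpy as np

def gaussian_density(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# estimate (mu_k, sigma_k) per class, then evaluate P(X = x / Y = y_k)
samples = {"setosa": [1.4, 1.3, 1.5], "versicolor": [4.5, 4.1, 4.7]}
for k, v in samples.items():
    mu, sigma = np.mean(v), np.std(v, ddof=1)
    print(k, gaussian_density(4.4, mu, sigma))   # likelihood of x = 4.4
```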
Consequences of the Gaussian assumption
The decision rule is not modified, i.e.

ŷ = y_{k*} with y_{k*} = argmax_k d(y_k, X)

d(y_k, X) = ln p_k + Σ_j [ - (1/2) × ln(2π σ²_{k,j}) - (x_j - μ_{k,j})² / (2 σ²_{k,j}) ]
          = Σ_j ( a_{k,j} x²_j + b_{k,j} x_j ) + c_k

The classification function is quadratic in the x_j.

[Figure: decision boundaries on the IRIS dataset (2 predictive variables).]
The interpretation is not easy.
Assumption 2 – Homoscedasticity

σ_{k,j} = σ_j for all k: the conditional variances are the same over the classes.

P(X_j / Y = y_k) -> f(x_j) = 1 / (σ_j √(2π)) × exp[ - (x_j - μ_{k,j})² / (2 σ²_j) ]

The common variance σ²_j is estimated with the within-class variance.
[Figure: two examples of conditional density plots. One is not compatible with the homoscedasticity assumption (but the approach is robust); the other is compatible with the homoscedasticity assumption.]
Consequences of the homoscedasticity assumption
The decision rule is not modified, i.e.

ŷ = y_{k*} with y_{k*} = argmax_k d(y_k, X)

d(y_k, X) = ln p_k - Σ_j (x_j - μ_{k,j})² / (2 σ²_j) = a_{k,0} + a_{k,1} x_1 + a_{k,2} x_2 + … + a_{k,J} x_J

(the x²_j terms are identical for all the classes and cancel out: the classification functions become linear)
IRIS dataset (2 variables). If K = 2 (binary problem), we can also compute the SCORE function D(X).
The interpretation is easier:
PET.LENGTH low -> Setosa ; PET.LENGTH middle -> Versicolor ; PET.LENGTH high -> Virginica
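A sketch of these linear coefficients for one attribute under homoscedasticity (the means and common variance below are illustrative):

```python
import numpy as np

def linear_coefs(p_k, mu_k, s2):
    # d(y_k, x) = ln p_k - (x - mu_k)^2 / (2 s2); dropping the x^2 term
    # (identical for all classes) leaves an intercept and a slope:
    return np.log(p_k) - mu_k ** 2 / (2 * s2), mu_k / s2

means = {"Setosa": 1.5, "Versicolor": 4.3, "Virginica": 5.6}  # PET.LENGTH
for k, mu in means.items():
    print(k, linear_coefs(1 / 3, mu, s2=0.2))    # (intercept, slope)
```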
Variable importance – one-way ANOVA scheme

[Figure: conditional density plots with the class-conditional means μ_{k,j}.]

Comparison of the conditional means. Test statistic F:
F = [ Σ_k n_k (μ̂_k - μ̂)² / (K - 1) ] / [ Σ_k (n_k - 1) σ̂²_k / (n - K) ] = between variance / within variance
Under H0, F ~ Fisher (K-1, n-K) d.f.
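A small sketch with scipy (illustrative per-class samples):

```python
import numpy as np
from scipy import stats

groups = [np.array([1.4, 1.3, 1.5, 1.6]),   # values of X for class 1
          np.array([4.5, 4.1, 4.7, 4.4]),   # class 2
          np.array([5.6, 5.8, 5.1, 5.5])]   # class 3

F, p_value = stats.f_oneway(*groups)        # F ~ Fisher(K-1, n-K) under H0
print(F, p_value)
```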
Variable selection - RANKING
E.g. IRIS + 1 ALEA (a variable generated randomly). Same problems as for the discrete attributes: choosing the significance level “alpha”, and dealing with the redundancy.
Variable selection – How to deal with redundancy?
Problem: the two measures, relevance (association with Y) and redundancy (association between the predictors), must be comparable!
merit_p = ( p × mean(s_{Y,X}) ) / sqrt( p + p × (p - 1) × mean(s_{X,X'}) )

(the same MERIT criterion as above, applied to the continuous attributes)
Possible solutions:
>> STEPDISC algorithm for linear discriminant analysis (multivariate analysis of variance – MANOVA). But the calculations are costly.
>> Using the embedded approach of another learning algorithm (e.g. decision tree). But the variables that are relevant for one method are not necessarily relevant for the naive bayes classifier.
>> Discretize the predictive variables and use the selection approaches for discrete attributes.
>> Strong advantages (incrementality, ability to handle very large databases)
>> We can extract an explicit model (a fact that remains largely unknown)
>> Very often used in the research domain (text mining, etc.)
>> Not used in some domains (e.g. marketing)… because the users do not know that we can extract an explicit model that can be deployed easily
References

Tanagra tutorial, “Naive Bayes Classifier for discrete predictors”: http://data-mining-tutorials.blogspot.fr/2010/07/naive-bayes-classifier-for-discrete.html
Tanagra tutorial, “Naive Bayes Classifier for continuous predictors”: http://data-mining-tutorials.blogspot.fr/2010/11/naive-bayes-classifier-for-continuous.html
Wikipedia, “Naive Bayes Classifier”: http://en.wikipedia.org/wiki/Naive_Bayes_classifier
STATSOFT e-books, “Naive Bayes Classifier” (see other distribution assumptions): http://www.statsoft.com/textbook/naive-bayes-classifier/