
Bayesian Network Classifiers
Pedro Larrañaga
Computational Intelligence Group, Artificial Intelligence Department, Universidad Politécnica de Madrid
Bayesian Networks: From Theory to Practice
International Black Sea University Autumn School on Machine Learning
3-11 October 2019, Tbilisi, Georgia

Outline
1 Introduction
2 Discrete Predictors
3 Validation of Supervised Classifiers
4 Summary

Supervised classification

| | $X_1$ | ... | $X_n$ | $C$ |
|---|---|---|---|---|
| $(\mathbf{x}^{(1)}, c^{(1)})$ | $x_1^{(1)}$ | ... | $x_n^{(1)}$ | $c^{(1)}$ |
| $(\mathbf{x}^{(2)}, c^{(2)})$ | $x_1^{(2)}$ | ... | $x_n^{(2)}$ | $c^{(2)}$ |
| ... | ... | ... | ... | ... |
| $(\mathbf{x}^{(N)}, c^{(N)})$ | $x_1^{(N)}$ | ... | $x_n^{(N)}$ | $c^{(N)}$ |
| $\mathbf{x}^{(N+1)}$ | $x_1^{(N+1)}$ | ... | $x_n^{(N+1)}$ | ??? |

Application domains
- Supervised pattern recognition
- Decision support systems for diagnosis and prognosis
- Loan decision
- Spam detection
- Prediction of sport results
- Handwriting character recognition
- Weather forecast
- Prediction of the secondary structure of proteins
- ...

Optical character recognition
Figure: Handwriting character recognition

Weather forecast
Figure: Meteorology

Computational biology
Figure: Prediction of the secondary structure of proteins

Paradigms for supervised classification
Statistical and machine learning:
- Bayesian networks (Pearl, 1988)
- Classification trees (Quinlan, 1986; Breiman et al., 1984)
- Classifier systems (Holland, 1975)
- Discriminant analysis (Fisher, 1936)
- k-NN classifiers (Cover and Hart, 1967; Dasarathy, 1991)
- Logistic regression (Hosmer and Lemeshow, 1989)
- Neural networks (McCulloch and Pitts, 1943)
- Rule induction (Clark and Niblett, 1989; Cohen, 1995; Holte, 1993)
- Support vector machines (Cristianini and Shawe-Taylor, 2000)

Bayesian network-based classifiers
Hierarchy of classifiers:
- Naïve Bayes (NB) (Minsky, 1961)
- Seminaïve Bayes (Pazzani, 1997)
- Tree augmented naïve Bayes (TAN) (Friedman et al., 1997)
- k-dependence Bayesian classifier (k-DB) (Sahami, 1996)
- Markov blanket (Sierra and Larrañaga, 1998)
- Bayesian multinets (Kontkanen et al., 2000)

Introduction
Fundamentals
Cost matrix: $\text{cost}(r, s)$, with $r$ the predicted class and $s$ the true class, $r, s = 1, \dots, r_0$
Minimization of the expected total cost (Bayes rule):
$\gamma(\mathbf{x}) = \arg\min_{c} \sum_{k=1}^{r_0} \text{cost}(c, k)\, P(k \mid x_1, \dots, x_n)$
In the case of a 0/1 loss function:
$\gamma(\mathbf{x}) = \arg\max_{c} P(c \mid x_1, \dots, x_n)$
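As an illustration of the Bayes rule for a cost matrix, here is a minimal Python sketch that picks the class with the smallest expected cost, given `cost[r][s]` (r predicted, s true) and a posterior vector. The function names, the 0-based class indices and all numbers are hypothetical; the posterior is assumed to have been computed already by some classifier.

```python
def bayes_decision(cost, posterior):
    """Bayes rule for a cost matrix: return the class c minimising
    sum_k cost[c][k] * posterior[k], where c is the predicted class,
    k the true class and posterior[k] = P(k | x_1, ..., x_n)."""
    r0 = len(posterior)
    expected = [sum(cost[c][k] * posterior[k] for k in range(r0)) for c in range(r0)]
    return min(range(r0), key=lambda c: expected[c])


def zero_one_cost(r0):
    """0/1 loss: no cost for a correct prediction, unit cost otherwise."""
    return [[0 if r == s else 1 for s in range(r0)] for r in range(r0)]


if __name__ == "__main__":
    posterior = [0.6, 0.3, 0.1]  # hypothetical posterior over classes 0, 1, 2
    # With 0/1 loss the rule picks the most probable class.
    print(bayes_decision(zero_one_cost(3), posterior))  # 0
    # Hypothetical asymmetric costs: predicting 0 or 1 when the truth is 2 costs 20.
    cost = [[0, 1, 20],
            [1, 0, 20],
            [1, 1, 0]]
    print(bayes_decision(cost, posterior))  # 2
```

With the 0/1 cost matrix the expected cost of predicting c is 1 - P(c | x), so the rule reduces to the arg max of the posterior. With the asymmetric matrix, the expected costs are 2.3, 2.6 and 0.9, so the least probable class is chosen.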

Generative versus discriminative classifiers
Generative classifiers
$P(c \mid x_1, \dots, x_n)$ is obtained in an indirect way:
$P(c \mid x_1, \dots, x_n) \propto P(c, x_1, \dots, x_n) = P(c)\, P(x_1, \dots, x_n \mid c)$
Parameters are estimated from the joint log-likelihood:
$\mathcal{L}\big((\mathbf{x}^{(1)}, c^{(1)}), \dots, (\mathbf{x}^{(N)}, c^{(N)})\big) = \sum_{j=1}^{N} \log P(\mathbf{x}^{(j)}, c^{(j)})$
Examples: discriminant analysis, naïve Bayes
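A minimal sketch of the generative route for discrete predictors, assuming an unrestricted joint table: $P(\mathbf{x}, c)$ is estimated by relative frequencies (the maximum likelihood estimate under the joint log-likelihood), and $P(c \mid \mathbf{x})$ is then obtained indirectly by normalizing over the classes. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def fit_joint(data):
    """Maximum likelihood estimate of the joint P(x, c) by relative frequencies.
    data: list of (x, c) pairs, x a tuple of discrete predictor values."""
    counts = Counter(data)
    return {pair: k / len(data) for pair, k in counts.items()}


def joint_log_likelihood(p_joint, data):
    """L = sum_j log P(x^(j), c^(j))."""
    return sum(math.log(p_joint[(x, c)]) for x, c in data)


def posterior(p_joint, x, classes):
    """P(c | x) obtained indirectly by normalising P(x, c) over the classes."""
    scores = {c: p_joint.get((x, c), 0.0) for c in classes}
    z = sum(scores.values())
    if z == 0.0:
        raise ValueError("x never observed: the unrestricted joint table has no estimate")
    return {c: s / z for c, s in scores.items()}


if __name__ == "__main__":
    data = [((0, 1), "a"), ((0, 1), "a"), ((0, 1), "b"), ((1, 0), "b")]  # toy data
    p = fit_joint(data)
    print(joint_log_likelihood(p, data))     # -6 log 2, about -4.159
    print(posterior(p, (0, 1), ["a", "b"]))  # a: 2/3, b: 1/3
```

The `ValueError` branch shows the weakness of an unrestricted joint table: a predictor configuration never seen in training gets no estimate at all.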

Generative versus discriminative classifiers
Discriminative classifiers
Discriminative classifiers model $P(c \mid x_1, \dots, x_n)$ directly.
Parameters are estimated from the conditional log-likelihood:
$\mathcal{L}\big((c^{(1)} \mid \mathbf{x}^{(1)}), \dots, (c^{(N)} \mid \mathbf{x}^{(N)})\big) = \sum_{j=1}^{N} \log P(c^{(j)} \mid \mathbf{x}^{(j)})$
Example: logistic regression
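For contrast, a minimal discriminative sketch: binary logistic regression, $P(C = 1 \mid \mathbf{x}) = \sigma(\mathbf{w} \cdot \mathbf{x} + b)$, fitted by plain batch gradient ascent on the conditional log-likelihood. The learning rate, the number of steps, the function names and the toy data are assumptions chosen for illustration, not a recommended training procedure.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def p_class1(w, b, x):
    """P(C = 1 | x) = sigmoid(w . x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def conditional_log_likelihood(w, b, data):
    """L = sum_j log P(c^(j) | x^(j)) for a binary class c in {0, 1}."""
    total = 0.0
    for x, c in data:
        p1 = p_class1(w, b, x)
        total += math.log(p1 if c == 1 else 1.0 - p1)
    return total


def fit_logistic(data, lr=0.1, steps=500):
    """Batch gradient ascent on the conditional log-likelihood.
    The gradient is sum_j (c^(j) - P(C = 1 | x^(j))) x^(j)."""
    n_features = len(data[0][0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, c in data:
            err = c - p_class1(w, b, x)
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi + lr * g for wi, g in zip(w, grad_w)]
        b += lr * grad_b
    return w, b


if __name__ == "__main__":
    # Toy, non-separable data: one numeric predictor, binary class.
    data = [((0.0,), 0), ((1.0,), 0), ((2.5,), 0), ((1.5,), 1), ((2.0,), 1), ((3.0,), 1)]
    w, b = fit_logistic(data)
    print(conditional_log_likelihood([0.0], 0.0, data))  # 6 log 0.5, about -4.159
    print(conditional_log_likelihood(w, b, data))        # larger (less negative)
```

Only $P(c \mid \mathbf{x})$ is modelled here; no distribution over the predictors is ever estimated.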

From the classical diagnosis problem to the naïve Bayes
Classical diagnosis problem: multiple diseases

| | $X_1$ | ... | $X_n$ | $Y_1$ | ... | $Y_m$ |
|---|---|---|---|---|---|---|
| $(\mathbf{x}^{(1)}, \mathbf{y}^{(1)})$ | $x_1^{(1)}$ | ... | $x_n^{(1)}$ | $y_1^{(1)}$ | ... | $y_m^{(1)}$ |
| $(\mathbf{x}^{(2)}, \mathbf{y}^{(2)})$ | $x_1^{(2)}$ | ... | $x_n^{(2)}$ | $y_1^{(2)}$ | ... | $y_m^{(2)}$ |
| ... | ... | ... | ... | ... | ... | ... |
| $(\mathbf{x}^{(N)}, \mathbf{y}^{(N)})$ | $x_1^{(N)}$ | ... | $x_n^{(N)}$ | $y_1^{(N)}$ | ... | $y_m^{(N)}$ |

Table: Classical diagnosis problem

From the classical diagnosis problem to the naïve Bayes
Classical diagnosis problem: multiple diseases
$(y_1^*, \dots, y_m^*) = \arg\max_{(y_1, \dots, y_m)} P(Y_1 = y_1, \dots, Y_m = y_m \mid X_1 = x_1, \dots, X_n = x_n)$
$P(Y_1 = y_1, \dots, Y_m = y_m \mid X_1 = x_1, \dots, X_n = x_n) \propto P(Y_1 = y_1, \dots, Y_m = y_m)\, P(X_1 = x_1, \dots, X_n = x_n \mid Y_1 = y_1, \dots, Y_m = y_m)$
Number of parameters, with binary symptoms and diseases: $2^m - 1 + (2^n - 1)\, 2^m$
- $m = 3$, $n = 10$: $\simeq 8 \cdot 10^3$ parameters
- $m = 5$, $n = 20$: $\simeq 33 \cdot 10^6$ parameters
- $m = 10$, $n = 50$: $\simeq 11 \cdot 10^{17}$ parameters
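The multiple-disease parameter count can be checked with a few lines of Python, assuming, as the formula does, that every symptom and disease is binary. The function name is hypothetical. Algebraically the count simplifies to $2^{m+n} - 1$, the size of an unrestricted joint distribution over the $m + n$ binary variables.

```python
def params_multi_disease(m, n):
    """Free parameters of P(y_1..y_m) P(x_1..x_n | y_1..y_m) with binary variables:
    2^m - 1 for the prior over disease configurations, plus 2^n - 1 for the
    symptom distribution under each of the 2^m configurations."""
    return (2 ** m - 1) + (2 ** n - 1) * 2 ** m


if __name__ == "__main__":
    for m, n in [(3, 10), (5, 20), (10, 50)]:
        print(f"m={m}, n={n}: {params_multi_disease(m, n):.2e}")
```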

From the classical diagnosis problem to the naïve Bayes
Single disease
$c^* = \arg\max_c P(C = c \mid X_1 = x_1, \dots, X_n = x_n)$
$P(C = c \mid X_1 = x_1, \dots, X_n = x_n) \propto P(C = c)\, P(X_1 = x_1, \dots, X_n = x_n \mid C = c)$
Number of parameters, with $r_0$ classes and binary symptoms: $(r_0 - 1) + r_0 (2^n - 1)$
- $r_0 = 3$, $n = 10$: $\simeq 3 \cdot 10^3$ parameters
- $r_0 = 5$, $n = 20$: $\simeq 5 \cdot 10^6$ parameters
- $r_0 = 10$, $n = 50$: $\simeq 11 \cdot 10^{15}$ parameters

From the classical diagnosis problem to the naïve Bayes
Single disease, with symptoms conditionally independent given the disease
$c^* = \arg\max_c P(C = c \mid X_1 = x_1, \dots, X_n = x_n) = \arg\max_c P(C = c) \prod_{i=1}^{n} P(X_i = x_i \mid C = c)$
Number of parameters: $r_0 - 1 + r_0 n$
- $r_0 = 3$, $n = 10$: 32 parameters
- $r_0 = 5$, $n = 20$: 104 parameters
- $r_0 = 10$, $n = 50$: 509 parameters
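A short Python sketch contrasting the two single-disease models: the full conditional table $P(x_1, \dots, x_n \mid c)$ versus the factorization into one term per symptom, again assuming binary symptoms. The function names are hypothetical.

```python
def params_single_disease(r0, n):
    """(r0 - 1) + r0 (2^n - 1): class prior plus a full table P(x_1..x_n | c)
    over n binary symptoms for each of the r0 classes."""
    return (r0 - 1) + r0 * (2 ** n - 1)


def params_naive_bayes(r0, n):
    """(r0 - 1) + r0 n: class prior plus one parameter P(X_i = 1 | c)
    per binary symptom and class."""
    return (r0 - 1) + r0 * n


if __name__ == "__main__":
    for r0, n in [(3, 10), (5, 20), (10, 50)]:
        print(r0, n, params_single_disease(r0, n), params_naive_bayes(r0, n))
```

The full model grows exponentially in $n$, while the conditionally independent one grows linearly, which is what keeps it estimable with many predictors.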

Naïve Bayes as a probabilistic graphical model
Naïve Bayes (Minsky, 1961)
Predictor variables are conditionally independent given $C$:
$c^* = \arg\max_c P(C = c) \prod_{i=1}^{n} P(X_i = x_i \mid C = c)$
Figure: Structure of a naïve Bayes classifier (the class $C$ is the only parent of every predictor $X_i$)
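A minimal sketch of a naïve Bayes classifier for discrete predictors: parameters are estimated by counting, and the arg max is computed in log space to avoid underflow when $n$ is large. The additive (Laplace) smoothing parameter `alpha` is an assumption added here, not part of the slide; with `alpha=0` the estimates are the maximum likelihood ones, but unseen value/class combinations then make `math.log` fail. Function names and the toy weather data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def fit_naive_bayes(data, alpha=1.0):
    """Estimate P(C = c) and P(X_i = v | C = c) from (x, c) pairs by counting.
    alpha is additive (Laplace) smoothing, an assumption added to avoid
    zero probabilities; alpha = 0 gives the maximum likelihood estimates."""
    class_counts = Counter(c for _, c in data)
    n_vars = len(data[0][0])
    n_values = [len({x[i] for x, _ in data}) for i in range(n_vars)]
    counts = defaultdict(int)  # (i, v, c) -> number of cases with X_i = v and C = c
    for x, c in data:
        for i, v in enumerate(x):
            counts[(i, v, c)] += 1
    prior = {c: k / len(data) for c, k in class_counts.items()}

    def cond(i, v, c):
        return (counts[(i, v, c)] + alpha) / (class_counts[c] + alpha * n_values[i])

    return prior, cond


def predict(prior, cond, x):
    """c* = arg max_c log P(C = c) + sum_i log P(X_i = x_i | C = c)."""
    def score(c):
        return math.log(prior[c]) + sum(math.log(cond(i, v, c)) for i, v in enumerate(x))
    return max(prior, key=score)


if __name__ == "__main__":
    data = [  # toy data: (outlook, temperature) -> play
        (("sunny", "hot"), "no"),
        (("sunny", "mild"), "no"),
        (("rainy", "mild"), "yes"),
        (("rainy", "cool"), "yes"),
        (("overcast", "hot"), "yes"),
    ]
    prior, cond = fit_naive_bayes(data)
    print(predict(prior, cond, ("rainy", "mild")))  # yes
    print(predict(prior, cond, ("sunny", "hot")))   # no
```

Taking logs does not change the arg max, since the logarithm is monotone; it only replaces a product of many small numbers by a sum.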

Naïve Bayes (Minsky, 1961)
Pattern recognition versus machine learning
Long tradition in the pattern recognition community: Minsky (1961), van Woerkom and Brodman (1961), Warner et al. (1961), Bailey (1964), Boyle et al. (1966), Maron (1961), Duda and Hart (1973)
Introduced in the machine learning field by Cestnik et al. (1987)
Different names:
- idiot Bayes: Ohmann et al. (1988)
- naïve Bayes: Kononenko (1990)
- simple Bayes: Gammerman and Thatcher (1991)
- independent Bayes: Todd and Stamper (1994)

Naïve Bayes (Minsky, 1961)
Theoretical results
- Minsky (1961): the decision surfaces of a naïve Bayes classifier with binary predictor variables are hyperplanes
- Peot (1996): generalization of this result to nominal (non-binary) predictor variables
- Duda and Hart (1973): for ordinal predictor variables, the decision surfaces are polynomials
- Domingos and Pazzani (1997): although its estimates of $P(c \mid x_1, \dots, x_n)$ are poorly calibrated, naïve Bayes can achieve competitive accuracy
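Minsky's result can be checked numerically for two classes and binary predictors. Writing $p_i = P(X_i = 1 \mid C = 1)$ and $q_i = P(X_i = 1 \mid C = 0)$, the naïve Bayes log-odds expands to
$\log \frac{P(C=1 \mid \mathbf{x})}{P(C=0 \mid \mathbf{x})} = b + \sum_{i=1}^{n} w_i x_i$, with $w_i = \log \frac{p_i (1 - q_i)}{q_i (1 - p_i)}$ and $b = \log \frac{P(C=1)}{P(C=0)} + \sum_{i=1}^{n} \log \frac{1 - p_i}{1 - q_i}$,
so the decision surface (log-odds equal to 0) is a hyperplane. The sketch below, with hypothetical parameters and function names, compares both expressions on every $\mathbf{x} \in \{0,1\}^3$; it assumes all probabilities lie strictly between 0 and 1.

```python
import itertools
import math


def nb_log_odds(prior1, p, q, x):
    """log P(C=1 | x) - log P(C=0 | x) from the naive Bayes factorisation,
    with binary x, p[i] = P(X_i = 1 | C = 1) and q[i] = P(X_i = 1 | C = 0)."""
    s = math.log(prior1) - math.log(1.0 - prior1)
    for pi, qi, xi in zip(p, q, x):
        s += math.log(pi if xi else 1.0 - pi) - math.log(qi if xi else 1.0 - qi)
    return s


def linear_form(prior1, p, q):
    """Weights w and bias b such that the log-odds equal w . x + b."""
    w = [math.log(pi * (1.0 - qi) / (qi * (1.0 - pi))) for pi, qi in zip(p, q)]
    b = math.log(prior1 / (1.0 - prior1)) + sum(
        math.log((1.0 - pi) / (1.0 - qi)) for pi, qi in zip(p, q))
    return w, b


if __name__ == "__main__":
    prior1, p, q = 0.4, [0.8, 0.3, 0.6], [0.2, 0.5, 0.6]  # hypothetical parameters
    w, b = linear_form(prior1, p, q)
    for x in itertools.product([0, 1], repeat=len(p)):
        linear = sum(wi * xi for wi, xi in zip(w, x)) + b
        print(x, round(nb_log_odds(prior1, p, q, x), 6), round(linear, 6))
```

Both columns agree for every input; a predictor with $p_i = q_i$ (the third one here) gets weight 0 and does not affect the decision.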
