SLIDE 1

Introduction Discrete Predictors Validation Summary

BAYESIAN NETWORK CLASSIFIERS

Pedro Larrañaga

Computational Intelligence Group Artificial Intelligence Department Universidad Politécnica de Madrid

Bayesian Networks: From Theory to Practice
International Black Sea University Autumn School on Machine Learning, 3-11 October 2019, Tbilisi, Georgia

Pedro Larrañaga, Bayesian Network Classifiers, 1 / 52

SLIDE 2

Outline

1. Introduction
2. Discrete Predictors
3. Validation of Supervised Classifiers
4. Summary

SLIDE 3

Outline

1. Introduction
2. Discrete Predictors
3. Validation of Supervised Classifiers
4. Summary

SLIDE 4

Supervised classification

Given a data set of N labelled cases over the predictor variables X1, ..., Xn and the class variable C,

(x^(1), c^(1)) = (x_1^(1), ..., x_n^(1), c^(1))
(x^(2), c^(2)) = (x_1^(2), ..., x_n^(2), c^(2))
...
(x^(N), c^(N)) = (x_1^(N), ..., x_n^(N), c^(N))

the task is to predict the unknown class value (???) of a new unlabelled case x^(N+1) = (x_1^(N+1), ..., x_n^(N+1)).

SLIDE 5

Application domains

Supervised pattern recognition:
- Decision support systems for diagnosis and prognosis
- Loan decisions
- Spam detection
- Prediction of sport results
- Handwritten character recognition
- Weather forecasting
- Prediction of the secondary structure of proteins
- ...

SLIDE 6

Optical character recognition

Figure: Handwritten character recognition

SLIDE 7

Weather forecast

Figure: Meteorology

SLIDE 8

Computational biology

Figure: Prediction of the secondary structure of proteins

SLIDE 9

Paradigms for supervised classification

Statistical and machine learning paradigms:
- Bayesian networks (Pearl, 1988)
- Classification trees (Quinlan, 1986; Breiman et al., 1984)
- Classifier systems (Holland, 1975)
- Discriminant analysis (Fisher, 1936)
- k-NN classifiers (Cover and Hart, 1967; Dasarathy, 1991)
- Logistic regression (Hosmer and Lemeshow, 1989)
- Neural networks (McCulloch and Pitts, 1943)
- Rule induction (Clark and Niblett, 1989; Cohen, 1995; Holte, 1993)
- Support vector machines (Cristianini and Shawe-Taylor, 2000)

SLIDE 10

Bayesian network based classifiers

Hierarchy of classifiers:
- Naïve Bayes (NB) (Minsky, 1961)
- Seminaïve Bayes (Pazzani, 1997)
- Tree augmented naïve Bayes (TAN) (Friedman et al., 1997)
- k-dependence Bayesian classifier (k-DB) (Sahami, 1996)
- Markov blanket (Sierra and Larrañaga, 1998)
- Bayesian multinets (Kontkanen et al., 2000)

SLIDE 11

Outline

1. Introduction
2. Discrete Predictors
3. Validation of Supervised Classifiers
4. Summary

SLIDE 12

Introduction

Fundamentals

Cost matrix: cost(r, s), with r the predicted class and s the true class, r, s = 1, ..., r0

Minimization of the total expected cost (Bayes rule):

γ(x) = arg min_c Σ_{k=1}^{r0} cost(c, k) P(C = k | x1, ..., xn)

In the case of a 0/1 loss function:

γ(x) = arg max_c P(C = c | x1, ..., xn)

SLIDE 13

Generative versus discriminative classifiers

Generative classifiers

P(c | x1, ..., xn) is obtained in an indirect way:

P(c | x1, ..., xn) ∝ P(c, x1, ..., xn) = P(c) P(x1, ..., xn | c)

Parameters are estimated from the joint log-likelihood:

L((x^(1), c^(1)), ..., (x^(N), c^(N))) = Σ_{j=1}^{N} log P(x^(j), c^(j))

Examples: discriminant analysis, naïve Bayes

SLIDE 14

Generative versus discriminative classifiers

Discriminative classifiers

P(c | x1, ..., xn) is modelled directly.

Parameters are estimated from the conditional log-likelihood:

L(c^(1) | x^(1), ..., c^(N) | x^(N)) = Σ_{j=1}^{N} log P(c^(j) | x^(j))

Example: logistic regression

SLIDE 15

From the classical diagnosis problem to the naïve Bayes

Classical diagnosis problem: multiple diseases Y1, ..., Ym and symptoms X1, ..., Xn

(x^(1), y^(1)) = (x_1^(1), ..., x_n^(1), y_1^(1), ..., y_m^(1))
(x^(2), y^(2)) = (x_1^(2), ..., x_n^(2), y_1^(2), ..., y_m^(2))
...
(x^(N), y^(N)) = (x_1^(N), ..., x_n^(N), y_1^(N), ..., y_m^(N))

Table: Classical diagnosis problem

SLIDE 16

From the classical diagnosis problem to the naïve Bayes

Classical diagnosis problem: multiple diseases

(y*_1, ..., y*_m) = arg max_{(y1, ..., ym)} P(Y1 = y1, ..., Ym = ym | X1 = x1, ..., Xn = xn)

P(Y1 = y1, ..., Ym = ym | X1 = x1, ..., Xn = xn) ∝ P(Y1 = y1, ..., Ym = ym) P(X1 = x1, ..., Xn = xn | Y1 = y1, ..., Ym = ym)

Number of parameters (binary variables): 2^m − 1 + (2^n − 1) 2^m
- m = 3, n = 10: ≈ 8 · 10^3 parameters
- m = 5, n = 20: ≈ 33 · 10^6 parameters
- m = 10, n = 50: ≈ 11 · 10^17 parameters
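These counts are easy to check with a few lines (a quick sketch; `n_params` is just an illustrative helper name, and note that the expression simplifies to 2^(m+n) − 1, the size of the full joint table):

```python
# Parameter count for the multiple-disease model with binary variables:
# 2^m - 1 prior parameters plus (2^n - 1) * 2^m conditional ones.
def n_params(m, n):
    return 2 ** m - 1 + (2 ** n - 1) * 2 ** m

# The three cases from the slide:
for m, n in [(3, 10), (5, 20), (10, 50)]:
    print(m, n, n_params(m, n))
```

Algebraically, 2^m − 1 + 2^(m+n) − 2^m = 2^(m+n) − 1, which is why the count explodes exponentially in m + n.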

SLIDE 17

From the classical diagnosis problem to the naïve Bayes

Single disease

c* = arg max_c P(C = c | X1 = x1, ..., Xn = xn)

P(C = c | X1 = x1, ..., Xn = xn) ∝ P(C = c) P(X1 = x1, ..., Xn = xn | C = c)

Number of parameters (binary predictors): (r0 − 1) + r0 (2^n − 1)
- r0 = 3, n = 10: ≈ 3 · 10^3 parameters
- r0 = 5, n = 20: ≈ 5 · 10^6 parameters
- r0 = 10, n = 50: ≈ 11 · 10^15 parameters

SLIDE 18

From the classical diagnosis problem to the naïve Bayes

Single disease, with symptoms conditionally independent given the disease:

c* = arg max_c P(C = c | X1 = x1, ..., Xn = xn) = arg max_c P(C = c) Π_{i=1}^{n} P(Xi = xi | C = c)

Number of parameters: (r0 − 1) + r0 n
- r0 = 3, n = 10: 32 parameters
- r0 = 5, n = 20: 104 parameters
- r0 = 10, n = 50: 509 parameters

SLIDE 19

Naïve Bayes as a probabilistic graphical model

Naïve Bayes (Minsky, 1961)

Predictor variables are conditionally independent given C:

c* = arg max_c P(C = c) Π_{i=1}^{n} P(Xi = xi | C = c)

Figure: Structure of a naïve Bayes
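A minimal implementation of this classification rule for discrete predictors might look as follows (a sketch, not Minsky's original formulation; the Laplace smoothing and all names are my own choices):

```python
from collections import Counter

# Naive Bayes for discrete predictors: c* = argmax_c P(c) * prod_i P(x_i | c),
# with Laplace-smoothed conditional probability estimates.
class NaiveBayes:
    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.n = len(X[0])
        self.prior = {c: y.count(c) / len(y) for c in self.classes}
        # counts[c][i][v] = number of training cases with class c and X_i = v
        self.counts = {c: [Counter() for _ in range(self.n)] for c in self.classes}
        self.values = [set() for _ in range(self.n)]
        for x, c in zip(X, y):
            for i, v in enumerate(x):
                self.counts[c][i][v] += 1
                self.values[i].add(v)
        return self

    def predict(self, x):
        def score(c):
            s = self.prior[c]
            n_c = sum(self.counts[c][0].values())  # cases with class c
            for i, v in enumerate(x):
                # Laplace-smoothed estimate of P(X_i = v | C = c)
                s *= (self.counts[c][i][v] + 1) / (n_c + len(self.values[i]))
            return s
        return max(self.classes, key=score)
```

On a toy data set where the class copies the first predictor, the rule recovers that dependence, e.g. `NaiveBayes().fit([(1, 1), (1, 0), (0, 1), (0, 0)], [1, 1, 0, 0]).predict((1, 1))` returns 1.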

SLIDE 20

Naïve Bayes (Minsky, 1961)

Pattern recognition versus machine learning

Long tradition in the pattern recognition community: Minsky (1961), van Woerkom and Brodman (1961), Warner et al. (1961), Bailey (1964), Boyle et al. (1966), Maron (1961), Duda and Hart (1973)

Introduced in the machine learning field by Cestnik et al. (1987). Different names:
- idiot Bayes: Ohmann et al. (1988)
- naïve Bayes: Kononenko (1990)
- simple Bayes: Gammerman and Thatcher (1991)
- independent Bayes: Todd and Stamper (1994)

SLIDE 21

Naïve Bayes (Minsky, 1961)

Theoretical results

- Minsky (1961): the decision surfaces of a naïve Bayes classifier with binary predictor variables are hyperplanes
- Peot (1996): generalization of the previous result to nominal (non-binary) predictor variables
- Duda and Hart (1973): for ordinal predictor variables, the decision surfaces are polynomials
- Domingos and Pazzani (1997): although the estimation of P(c | x1, ..., xn) is not well calibrated, naïve Bayes can obtain competitive accuracies

SLIDE 22

Seminaïve Bayes (Pazzani, 1997)

Step 1. Initialize the set of variables used by the model to the empty set. Classify all the examples as belonging to the class with the highest P(c).
Step 2. In every iteration, choose the best of the following options:
  (a) Consider each variable not in the model as a new one to be included in it. The chosen variable is added as conditionally independent of the variables already in the model given the class.
  (b) Join each variable not present in the model with a variable that is in the model.
Evaluate each option by estimating the percentage of correctly classified cases.
Repeat until no improvement can be obtained.

Figure: Pseudocode of the forward sequential selection and joining (FSSJ) algorithm (Pazzani, 1997)

SLIDE 23

Building process (FSSJ)

(a) The selective naïve Bayes with X2 has yielded the best accuracy. (b) After building the models with the predictor sets {X2, X1}, {X2, X3}, {X2, X4}, {(X2, X1)}, {(X2, X3)} and {(X2, X4)}, the last option is selected according to its accuracy. (c) The winning model out of {X1, (X2, X4)}, {X3, (X2, X4)}, {(X1, X2, X4)}, and {(X3, X2, X4)}. The accuracy does not improve with {X1, X3, (X2, X4)}, {(X1, X3), (X2, X4)}, and {X3, (X1, X2, X4)}, and the process stops.

SLIDE 24

Tree augmented naïve Bayes (TAN) (Friedman et al., 1997)

Mutual information between X and Y:

MI(X, Y) = Σ_{i=1}^{rX} Σ_{j=1}^{rY} P(xi, yj) log [ P(xi, yj) / (P(xi) P(yj)) ]

It measures the reduction in the uncertainty about one of the variables when the value of the other variable is known.

Mutual information between X and Y conditioned on C:

MI(X, Y | C) = Σ_{k=1}^{r0} P(ck) MI(X, Y | C = ck) = Σ_{i=1}^{rX} Σ_{j=1}^{rY} Σ_{k=1}^{r0} P(xi, yj, ck) log [ P(xi, yj | ck) / (P(xi | ck) P(yj | ck)) ]
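The (unconditional) mutual information can be computed straight from a joint probability table; a small sketch, where the `joint` dict layout `{(x, y): p}` is my own choice:

```python
from math import log

# MI(X, Y) = sum over (x, y) of P(x, y) * log(P(x, y) / (P(x) P(y))),
# with the marginals accumulated from the joint table.
def mutual_information(joint):
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * log(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)
```

Independent variables give MI 0, while two perfectly correlated binary variables give log 2; the conditional version MI(X, Y | C) is then the P(ck)-weighted average of this quantity over the class-conditional joints.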

SLIDE 25

TAN algorithm (Friedman et al., 1997)

Step 1. Calculate MI(Xi, Xj | C) for i < j, i, j = 1, ..., n.
Step 2. Build a complete undirected graph whose nodes are the predictor variables X1, ..., Xn. Assign to the edge connecting Xi and Xj the weight MI(Xi, Xj | C).
Step 3. Add the two branches with the largest weights to the tree being constructed.
Step 4. Examine the next largest branch and add it to the tree unless it forms a loop; in that case discard it and examine the next largest branch.
Step 5. Repeat Step 4 until n − 1 branches have been added.
Step 6. Transform the undirected tree into a directed one by choosing a random variable as the root.
Step 7. Build the TAN structure by adding a node labelled C, and then an arc from C to each of the predictor variables.

Figure: Pseudocode of the tree augmented naïve Bayes (TAN) algorithm (Friedman et al., 1997)
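Steps 2-5 amount to building a maximum weighted spanning tree. A sketch using Kruskal's algorithm with a union-find (my choice of method; the `edges` mapping of variable pairs to MI(Xi, Xj | C) weights is assumed precomputed):

```python
# Maximum weighted spanning tree over n predictor nodes.
# edges: {(u, v): weight}; returns the n - 1 accepted edges.
def max_spanning_tree(n, edges):
    parent = list(range(n))

    def find(u):  # union-find root lookup with path halving
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    tree = []
    # examine branches from the largest weight downwards (Steps 3-4)
    for (u, v), w in sorted(edges.items(), key=lambda e: -e[1]):
        ru, rv = find(u), find(v)
        if ru != rv:            # adding the edge does not form a loop
            parent[ru] = rv
            tree.append((u, v))
        if len(tree) == n - 1:  # Step 5: stop at n - 1 branches
            break
    return tree
```

Directing the resulting tree from an arbitrary root and adding C as a parent of every predictor (Steps 6-7) completes the TAN structure.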

SLIDE 26

TAN building process

MI(X1, X3 | C) > MI(X2, X4 | C) > MI(X1, X2 | C) > MI(X3, X4 | C) > MI(X1, X4 | C) > MI(X3, X5 | C) > MI(X1, X5 | C) > MI(X2, X3 | C) > MI(X2, X5 | C) > MI(X4, X5 | C)

(a-c) Edges are added according to the conditional mutual information quantities, arranged in descending order. (d-e) Edges X3 − X4 and X1 − X4 (dashed lines) cannot be added since they would form a cycle. (f) Maximum weighted spanning tree. (g) The directed tree obtained by choosing X1 as the root node. (h) Final TAN structure.

SLIDE 27

k-DB algorithm (Sahami, 1996)

Scheme of k-DB:
- Calculate MI(Xi, C) and MI(Xi, Xj | C) for each pair of variables.
- At every iteration, add the variable Xmax not yet in the model with the highest MI(Xi, C).
- Set C, together with the (up to) k already-included variables with the highest MI(Xj, Xmax | C), as the parents of Xmax.
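The scheme above can be sketched in a few lines (assuming the mutual information quantities are precomputed; `mi_c` and `cmi` are illustrative names, with `mi_c[i]` = MI(Xi, C) and `cmi[i][j]` = MI(Xi, Xj | C)):

```python
# Structure learning for a k-dependence Bayesian classifier.
# Returns, for each variable index, its parent list (class "C" plus
# up to k previously included predictors).
def kdb_structure(mi_c, cmi, k):
    remaining = set(range(len(mi_c)))
    included, parents = [], {}
    while remaining:
        # next variable: highest mutual information with the class
        x_max = max(remaining, key=lambda i: mi_c[i])
        remaining.remove(x_max)
        # up to k already-included variables, ranked by MI(Xj, Xmax | C)
        best = sorted(included, key=lambda j: -cmi[x_max][j])[:k]
        parents[x_max] = ["C"] + best
        included.append(x_max)
    return parents
```

With k = 2 and the orderings of the worked example on the following slides, this reproduces the factorization P(c) P(x1 | x3, c) P(x2 | x1, x5, c) P(x3 | c) P(x4 | x1, x3, c) P(x5 | x1, x4, c).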

SLIDE 28

k-DB building process

Figure: Example for k-DB with k = 2.

MI(X3, C) > MI(X1, C) > MI(X4, C) > MI(X5, C) > MI(X2, C)

MI(X3, X4 | C) > MI(X2, X5 | C) > MI(X1, X3 | C) > MI(X1, X2 | C) > MI(X2, X4 | C) > MI(X2, X3 | C) > MI(X1, X4 | C) > MI(X4, X5 | C) > MI(X1, X5 | C) > MI(X3, X5 | C)

SLIDE 29

k-DB building process

Figure: Example for k-DB with k = 2.

MI(X3, C) > MI(X1, C) > MI(X4, C) > MI(X5, C) > MI(X2, C)

MI(X3, X4 | C) > MI(X2, X5 | C) > MI(X1, X3 | C) > MI(X1, X2 | C) > MI(X2, X4 | C) > MI(X2, X3 | C) > MI(X1, X4 | C) > MI(X4, X5 | C) > MI(X1, X5 | C) > MI(X3, X5 | C)

SLIDE 30

k-DB building process

Figure: Example for k-DB with k = 2.

MI(X3, C) > MI(X1, C) > MI(X4, C) > MI(X5, C) > MI(X2, C)

MI(X3, X4 | C) > MI(X2, X5 | C) > MI(X1, X3 | C) > MI(X1, X2 | C) > MI(X2, X4 | C) > MI(X2, X3 | C) > MI(X1, X4 | C) > MI(X4, X5 | C) > MI(X1, X5 | C) > MI(X3, X5 | C)

P(c | x1, x2, x3, x4, x5) ∝ P(c) P(x1 | x3, c) P(x2 | x1, x5, c) P(x3 | c) P(x4 | x1, x3, c) P(x5 | x1, x4, c)

SLIDE 31

Markov blanket (Sierra and Larrañaga, 1998)

Parents, children, and parents of the children

The value of a variable only depends on the values of its parents, its children, and the parents of its children. This set is called the Markov blanket (MB) of the variable:

P(C | MB(C), X \ MB(C)) = P(C | MB(C))

Figure: Markov blanket of variable A

SLIDE 32

Markov blanket

Prediction with a Markov blanket

Figure: Example of a Markov blanket for C.

P(c | x1, ..., x9) = P(c | x1, x2, x3, x4, x6) ∝ P(c, x1, x2, x3, x4, x6) = P(x6) P(x3) P(x4) P(c | x6) P(x1 | x3, c) P(x2 | c, x4)

Pedro Larra˜ naga Bayesian Network Classifiers 32 / 52

slide-33
SLIDE 33

Introduction Discrete Predictors Validation Summary

Bayesian multinet (Kontkanen et al., 2000)

Prediction with a Bayesian multinet

Figure: Example of a Bayesian multinet for classification

P(C = 0 | x, y, v, z) ∝ P(C = 0) P(x | C = 0) P(y | x, C = 0) P(v | x, C = 0) P(z | y, C = 0)
P(C = 1 | x, y, v, z) ∝ P(C = 1) P(x | C = 1) P(y | x, C = 1) P(v | x, C = 1) P(z | v, C = 1)

SLIDE 34

Outline

1. Introduction
2. Discrete Predictors
3. Validation of Supervised Classifiers
4. Summary

SLIDE 35

Measuring the performance of a supervised classifier

Comparison criteria:
- Accuracy (Nadeau and Bengio, 2003)
- Brier score
- Area under the ROC curve (Zou, 2002)
- Complexity of the inductor
- Transparency of the model
- Simplicity of the model
- Comprehensibility of the model
- ...

SLIDE 36

Measuring the performance of a supervised classifier

Confusion matrix

                        True class
                        +       −
Predicted class   +     a       b
                  −     c       d

Figures of merit:
- Accuracy: (a + d) / (a + b + c + d)
- Error rate: (b + c) / (a + b + c + d)
- True positive rate (sensitivity): a / (a + c)
- True negative rate (specificity): d / (b + d)
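These figures of merit are immediate to compute (a small sketch; `merit` is an illustrative name, and a, b, c, d follow the matrix layout above):

```python
# Figures of merit from a 2x2 confusion matrix:
# a = true positives, b = false positives, c = false negatives, d = true negatives.
def merit(a, b, c, d):
    total = a + b + c + d
    return {
        "accuracy": (a + d) / total,
        "error_rate": (b + c) / total,
        "sensitivity": a / (a + c),   # true positive rate
        "specificity": d / (b + d),   # true negative rate
    }
```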

SLIDE 37

Estimation methods. Resubstitution (non-honest)

p̂_M = (1/N) Σ_{i=1}^{N} δ(c^(i), c_M^(i))

where c_M^(i) is the class predicted by model M for the i-th case and δ(·,·) equals 1 when its two arguments coincide and 0 otherwise; the same N cases are used for training and for estimating the accuracy.

SLIDE 38

Estimation methods. Train and test

p̂_M = (1 / (N − N1)) Σ_{i=1}^{N−N1} δ(c^(N1+i), c_M^(N1+i))

where the model is trained on the first N1 cases and tested on the remaining N − N1 cases.

SLIDE 39

Estimation methods. Train and test several times

p̂_M = (1/B) Σ_{i=1}^{B} p̂_i

where p̂_i is the train-and-test estimate obtained in the i-th of the B repetitions.
SLIDE 40

Estimation methods. k-fold cross-validation

p̂_M = (1/k) Σ_{i=1}^{k} p̂_i

where p̂_i is the accuracy estimated on the i-th fold.
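A sketch of the k-fold estimate (the `fit_predict` callback, which trains on one split and predicts another, is a hypothetical stand-in for any inductor):

```python
import random

# p_hat_M = (1/k) * sum of per-fold accuracies.
# fit_predict(X_train, y_train, X_test) -> predicted classes for X_test.
def kfold_accuracy(X, y, k, fit_predict, seed=0):
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accs = []
    for test in folds:
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        accs.append(sum(p == y[i] for p, i in zip(preds, test)) / len(test))
    return sum(accs) / k
```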

SLIDE 41

Estimation methods. 0.632 bootstrap

p̂_a = (1/B) Σ_{i=1}^{B} p̂_{i,a}        p̂_0 = (1/B) Σ_{i=1}^{B} p̂_{i,0}

p̂_M = p̂_{0.632B} = 0.368 p̂_a + 0.632 p̂_0

where p̂_{i,a} is the apparent (resubstitution) accuracy on the i-th bootstrap sample and p̂_{i,0} is the accuracy on the cases left out of that sample.
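A sketch of the 0.632 bootstrap estimate (the `fit_predict` callback is a hypothetical stand-in for training a classifier and predicting given test cases):

```python
import random

# 0.632 bootstrap: combine apparent accuracy on the bootstrap sample
# with accuracy on the out-of-bootstrap cases, over B resamples.
def bootstrap632(X, y, B, fit_predict, seed=0):
    rng = random.Random(seed)
    N = len(X)
    acc_app, acc_oob = [], []
    for _ in range(B):
        sample = [rng.randrange(N) for _ in range(N)]  # draw N with replacement
        oob = sorted(set(range(N)) - set(sample))      # cases never drawn
        Xtr, ytr = [X[i] for i in sample], [y[i] for i in sample]
        preds = fit_predict(Xtr, ytr, Xtr)  # apparent (resubstitution) accuracy
        acc_app.append(sum(p == t for p, t in zip(preds, ytr)) / N)
        if oob:  # accuracy on the out-of-bootstrap cases
            preds = fit_predict(Xtr, ytr, [X[i] for i in oob])
            acc_oob.append(sum(p == y[i] for p, i in zip(preds, oob)) / len(oob))
    p_a = sum(acc_app) / len(acc_app)
    p_0 = sum(acc_oob) / len(acc_oob)
    return 0.368 * p_a + 0.632 * p_0
```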

SLIDE 42

About the different estimation methods

- Train and test: appropriate when N is large
- Train and test several times: no control over which cases are used for training (or testing) the classifier
- k-fold cross-validation: unbiased estimate of the probability of success of the classifier, but with a high variance
- 0.632 bootstrap: asymptotically unbiased and with small variance

SLIDE 43

Brier score

Calibrating a probabilistic classifier

A calibration measure for a classifier that assigns a posterior probability to each value of the class. Assuming that the true class value for x is C = 0, we are interested in distinguishing between P(C_M = 0 | x) = 0.51 and P(C_M = 0 | x) = 0.97. We would like classifiers that are almost sure when making decisions; such classifiers have a lower value of the Brier score.

SLIDE 44

Brier score

For a binary class, each case receives a posterior distribution over the two class values:

         X1 ... Xn   C    P(C_M = 0 | x)    P(C_M = 1 | x)
x^(1)    ...         1    0.18              0.82
x^(2)    ...         0    0.51              0.49
...
x^(N)    ...         1    0.55              0.45

B = (1/N) Σ_{i=1}^{N} Σ_{c ∈ {0,1}} [P(C_M = c | x^(i)) − δ(c, c^(i))]^2

B = (1/N) [(0.18 − 0)^2 + (0.82 − 1)^2 + (0.51 − 1)^2 + (0.49 − 0)^2 + ... + (0.55 − 0)^2 + (0.45 − 1)^2]
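The Brier score computation can be sketched directly from the formula above (binary class; the `(P(C_M = 0 | x), P(C_M = 1 | x))` tuple layout is my own choice):

```python
# Brier score: mean squared difference between the predicted posterior
# and the 0/1 indicator of the true class, summed over both class values.
def brier(probs, truth):
    N = len(truth)
    return sum((p_c - (1 if c == t else 0)) ** 2
               for (p0, p1), t in zip(probs, truth)
               for c, p_c in enumerate((p0, p1))) / N
```

Applied to the three visible rows of the table above, this reproduces the worked sum term by term.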

SLIDE 45

Cost sensitive classification

Total cost versus accuracy

In general, the costs of a false positive and a false negative are not the same. It is therefore interesting to search for the classifier with the smallest total cost, which can be different from the one with the highest accuracy (probability of correct classification).

SLIDE 46

Cost sensitive classification

SLIDE 47

ROC curve

ROC points: (1 − specificity, sensitivity) = (false positive rate, true positive rate)

In some situations it is difficult to estimate the cost matrix; an analysis based on the receiver operating characteristic (ROC) curve can be used instead.

- In 1970 the ROC curve was used for the first time in a medical diagnosis problem; it reached machine learning in the 90s
- C_M = 1 ⟺ P(C = 1 | x) > t, where t denotes a threshold
- ROC curve: for each t in the interval [0, 1], plot the corresponding bidimensional point (false positive rate_t, true positive rate_t)
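Generating the ROC points from predicted probabilities is straightforward (a sketch; the threshold grid and all names are illustrative, and both classes are assumed present):

```python
# One (FPR, TPR) point per threshold t, thresholding P(C = 1 | x) > t.
def roc_points(scores, truth, thresholds):
    pos = sum(truth)
    neg = len(truth) - pos
    pts = []
    for t in thresholds:
        preds = [1 if s > t else 0 for s in scores]
        tp = sum(p == 1 and y == 1 for p, y in zip(preds, truth))
        fp = sum(p == 1 and y == 0 for p, y in zip(preds, truth))
        pts.append((fp / neg, tp / pos))  # (false positive rate, true positive rate)
    return pts
```

Sweeping t from 0 to 1 traces the curve from (1, 1) down to (0, 0); the area under it (AUC) is the comparison measure used on the following slides.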

SLIDE 48

ROC curve

SLIDE 49

ROC curve. Comparing two classifiers with the area under the ROC curve (AUC)

SLIDE 50

Outline

1. Introduction
2. Discrete Predictors
3. Validation of Supervised Classifiers
4. Summary

SLIDE 51

Bayesian network based classifiers

- They provide a posterior probability for each possible value of the class
- Competitive results (accuracy, Brier score, ROC) with the state of the art in supervised classification
- Knowledge discovery from the structure of the Bayesian network
- Honest validation is mandatory

SLIDE 52

BAYESIAN NETWORK CLASSIFIERS

Pedro Larrañaga

Computational Intelligence Group Artificial Intelligence Department Universidad Politécnica de Madrid

Bayesian Networks: From Theory to Practice
International Black Sea University Autumn School on Machine Learning, 3-11 October 2019, Tbilisi, Georgia