SLIDE 1

Bayesian Decision Theory

Steven J Zeil

Old Dominion Univ.

Fall 2010

SLIDE 2

Outline

1. Classification
2. Losses & Risks
3. Discriminant Functions
4. Association Rules

SLIDE 3

Bernoulli Distribution

Random variable $X \in \{0, 1\}$.

Bernoulli: $P\{X = x\} = p_0^x (1 - p_0)^{1 - x}$, so $P\{X = 1\} = p_0$.

Given a sample $\mathcal{X} = \{x^t\}_{t=1}^{N}$, we can estimate $\hat{p}_0 = \frac{\sum_t x^t}{N}$.
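As a concrete illustration (a minimal sketch, not from the slides), the estimate $\hat{p}_0$ is just the sample mean of the 0/1 observations:

```python
import numpy as np

# Maximum-likelihood estimate of the Bernoulli parameter p0:
# p0_hat = (sum_t x^t) / N, i.e. the sample mean of 0/1 outcomes.
def estimate_p0(sample):
    return np.asarray(sample).mean()

# Hypothetical sample: 10 trials with 7 successes gives p0_hat = 0.7.
x = [1, 0, 1, 1, 1, 0, 1, 1, 0, 1]
print(estimate_p0(x))  # 0.7
```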

SLIDE 4

Classification

Input $\mathbf{x} = [x_1, x_2]$, output $C \in \{0, 1\}$.

Prediction: choose $C = 1$ if $P(C = 1 \mid \mathbf{x}) > 0.5$, and $C = 0$ otherwise.

Equivalently: choose $C = 1$ if $P(C = 1 \mid \mathbf{x}) > P(C = 0 \mid \mathbf{x})$, and $C = 0$ otherwise.

E.g., credit scoring: inputs are income and savings; output is low-risk versus high-risk.

SLIDE 5

Bayes’ Rule

$$P(C \mid \mathbf{x}) = \frac{P(C)\, p(\mathbf{x} \mid C)}{p(\mathbf{x})}$$

$P(C \mid \mathbf{x})$: posterior probability

Given that we have observed $\mathbf{x}$, what is the probability that the item is in class $C$?

$P(C)$: prior probability

What would we expect for the probability of getting something in $C$ if we had no information about the specific case?

$p(\mathbf{x} \mid C)$: likelihood

If we knew that an item really was in $C$, what is the probability that it would have the values $\mathbf{x}$? In effect, the reverse of what we are trying to find out.

$p(\mathbf{x})$: evidence

If we ignore the classes, how likely are we to see the value $\mathbf{x}$?

SLIDE 14

Bayes’ Rule - Multiple Classes

$$P(C_i \mid \mathbf{x}) = \frac{P(C_i)\, p(\mathbf{x} \mid C_i)}{p(\mathbf{x})} = \frac{P(C_i)\, p(\mathbf{x} \mid C_i)}{\sum_{k=1}^{K} p(\mathbf{x} \mid C_k)\, P(C_k)}$$

with $P(C_i) \ge 0$ and $\sum_{i=1}^{K} P(C_i) = 1$.

Choose $C_i$ if $P(C_i \mid \mathbf{x}) = \max_k P(C_k \mid \mathbf{x})$.
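A minimal sketch of this decision rule (illustrative, not from the slides), assuming the priors and the class-conditional likelihood values $p(\mathbf{x} \mid C_k)$ at the observed $\mathbf{x}$ are already available:

```python
import numpy as np

# Bayes classifier: posterior_k ∝ P(C_k) p(x | C_k); choose the argmax.
def bayes_classify(priors, likelihoods):
    joint = priors * likelihoods       # P(C_k) p(x | C_k) for each class
    posteriors = joint / joint.sum()   # divide by the evidence p(x)
    return int(np.argmax(posteriors)), posteriors

cls, post = bayes_classify(np.array([0.5, 0.3, 0.2]),
                           np.array([0.1, 0.4, 0.2]))
print(cls, post)  # class 1 wins: 0.3 * 0.4 dominates the other products
```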

SLIDE 15

Unequal Risks

In many situations, different actions carry different potential gains and costs.

Actions: $\alpha_i$. Let $\lambda_{ik}$ denote the loss incurred by taking action $\alpha_i$ when the current state is actually $C_k$.

Expected risk of taking action $\alpha_i$:
$$R(\alpha_i \mid \mathbf{x}) = \sum_{k=1}^{K} \lambda_{ik}\, P(C_k \mid \mathbf{x})$$

This is simply the expected value of the loss function given that we have chosen $\alpha_i$.

Choose $\alpha_i$ if $R(\alpha_i \mid \mathbf{x}) = \min_k R(\alpha_k \mid \mathbf{x})$.
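A short sketch of minimum-risk action selection (illustrative; the loss values are made up), showing how unequal losses can overrule the most probable class:

```python
import numpy as np

# R(a_i | x) = sum_k loss[i, k] * P(C_k | x); choose the argmin.
def min_risk_action(loss, posteriors):
    risks = loss @ posteriors          # expected risk of each action
    return int(np.argmin(risks)), risks

# Hypothetical losses: calling class 1 "class 0" costs 10, the reverse costs 1.
loss = np.array([[0.0, 10.0],
                 [1.0,  0.0]])
action, risks = min_risk_action(loss, np.array([0.8, 0.2]))
print(action, risks)  # action 1 (risk 0.8) beats action 0 (risk 2.0),
                      # even though P(C_0 | x) = 0.8 is the larger posterior
```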

SLIDE 16

Special Case: Equal Risks

Suppose
$$\lambda_{ik} = \begin{cases} 0 & \text{if } i = k \\ 1 & \text{if } i \neq k \end{cases}$$

Expected risk of taking action $\alpha_i$:
$$R(\alpha_i \mid \mathbf{x}) = \sum_{k=1}^{K} \lambda_{ik}\, P(C_k \mid \mathbf{x}) = \sum_{k \neq i} P(C_k \mid \mathbf{x}) = 1 - P(C_i \mid \mathbf{x})$$

Choose $\alpha_i$ if $R(\alpha_i \mid \mathbf{x}) = \min_k R(\alpha_k \mid \mathbf{x})$, which happens when $P(C_i \mid \mathbf{x})$ is largest.

So if all actions have equal cost, choose the action for the most probable class.

SLIDE 17

Special Case: Indecision

Suppose that making the wrong decision is more expensive than making no decision at all (i.e., falling back to some other procedure)

Introduce a special reject action $\alpha_{K+1}$ that denotes the decision not to select a “real” action. The cost of a reject is $\lambda$, with $0 < \lambda < 1$:

$$\lambda_{ik} = \begin{cases} 0 & \text{if } i = k \\ \lambda & \text{if } i = K + 1 \\ 1 & \text{if } i \neq k \end{cases}$$

SLIDE 18

The Risk of Indecision

Risk:
$$R(\alpha_{K+1} \mid \mathbf{x}) = \sum_{k=1}^{K} \lambda\, P(C_k \mid \mathbf{x}) = \lambda$$
$$R(\alpha_i \mid \mathbf{x}) = \sum_{k \neq i} P(C_k \mid \mathbf{x}) = 1 - P(C_i \mid \mathbf{x})$$

Choose $\alpha_i$ if $P(C_i \mid \mathbf{x}) > P(C_k \mid \mathbf{x})\ \forall k \neq i$ and $P(C_i \mid \mathbf{x}) > 1 - \lambda$; otherwise reject all actions.
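A minimal sketch of the resulting decision rule with a reject option (illustrative, not from the slides):

```python
import numpy as np

# Pick the most probable class only if its posterior exceeds 1 - lambda;
# otherwise return None, meaning "reject" (fall back to another procedure).
def classify_with_reject(posteriors, lam):
    i = int(np.argmax(posteriors))
    return i if posteriors[i] > 1.0 - lam else None

post = np.array([0.45, 0.40, 0.15])
print(classify_with_reject(post, lam=0.3))  # None: 0.45 <= 0.7, so reject
print(classify_with_reject(post, lam=0.6))  # 0: 0.45 > 0.4, so classify
```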

SLIDE 19

Discriminant Functions

An alternate vision: instead of searching for the most probable class, we seek a set of functions that divide the space into $K$ decision regions $\mathcal{R}_1, \ldots, \mathcal{R}_K$:

$$\mathcal{R}_i = \left\{ \mathbf{x} \,\middle|\, g_i(\mathbf{x}) = \max_k g_k(\mathbf{x}) \right\}$$
SLIDE 20

Why Discriminants?

Discriminants are more general because they do not have to lie in a $0 \ldots 1$ range, nor correspond to actual probabilities.

This allows us to use them when we have no knowledge of the underlying distribution.

Later techniques will seek discriminant functions directly.

SLIDE 23

Bayes Classifier as Discriminant Functions

We can form a discriminant function for the Bayes classifier very simply: $g_i(\mathbf{x}) = -R(\alpha_i \mid \mathbf{x})$.

If we have a constant loss function, we can use
$$g_i(\mathbf{x}) = P(C_i \mid \mathbf{x}) = \frac{P(C_i)\, p(\mathbf{x} \mid C_i)}{p(\mathbf{x})}$$

SLIDE 24

Bayes Classifier as Discriminant Functions (cont.)

$$g_i(\mathbf{x}) = \frac{P(C_i)\, p(\mathbf{x} \mid C_i)}{p(\mathbf{x})}$$

Because all of the $g_i$ above would have the same denominator, we could alternatively use
$$g_i(\mathbf{x}) = P(C_i)\, p(\mathbf{x} \mid C_i)$$
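A sketch of why dropping the shared denominator is safe (illustrative, reusing the numbers from the earlier Bayes-classifier example): the unnormalized discriminants produce the same argmax as the full posteriors.

```python
import numpy as np

# g_i(x) = P(C_i) p(x | C_i): same ranking as the posterior, since the
# evidence p(x) is a common positive denominator for every class.
def discriminant_classify(priors, likelihoods):
    return int(np.argmax(priors * likelihoods))

print(discriminant_classify(np.array([0.5, 0.3, 0.2]),
                            np.array([0.1, 0.4, 0.2])))  # 1, as before
```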

SLIDE 25

Association Rules

Suppose that we want to learn an association rule $X \rightarrow Y$; e.g., customers who buy $X$ often buy $Y$ as well.

Three common measures: support, confidence, and lift (a.k.a. interest).

$$\text{Support}(X \rightarrow Y) \equiv P(X, Y)$$
e.g., $\dfrac{\#\,\text{customers who bought both}}{\#\,\text{customers}}$

$$\text{Confidence}(X \rightarrow Y) \equiv P(Y \mid X) = \frac{P(X, Y)}{P(X)}$$
e.g., $\dfrac{\#\,\text{customers who bought both}}{\#\,\text{customers who bought } X}$

$$\text{Lift}(X \rightarrow Y) \equiv \frac{P(X, Y)}{P(X)\,P(Y)} = \frac{P(Y \mid X)}{P(Y)}$$

If $X$ and $Y$ are independent, lift should be 1. Lift $> 1$ means that having $X$ makes $Y$ more likely; lift $< 1$ means that having $X$ makes $Y$ less likely.
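A small sketch computing the three measures from a list of transactions (the basket contents are made up for illustration):

```python
# Support, confidence, and lift of the rule x -> y over a transaction list.
def measures(transactions, x, y):
    n = len(transactions)
    n_x = sum(1 for t in transactions if x in t)
    n_y = sum(1 for t in transactions if y in t)
    n_xy = sum(1 for t in transactions if x in t and y in t)
    support = n_xy / n                          # P(X, Y)
    confidence = n_xy / n_x                     # P(Y | X)
    lift = support / ((n_x / n) * (n_y / n))    # P(X, Y) / (P(X) P(Y))
    return support, confidence, lift

baskets = [{"milk", "bread"}, {"milk"}, {"bread"}, {"milk", "bread"}]
print(measures(baskets, "milk", "bread"))  # (0.5, 0.666..., 0.888...)
```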

SLIDE 36

Association Rules

$$\text{Support}(X \rightarrow Y) \equiv P(X, Y) \qquad \text{Confidence}(X \rightarrow Y) \equiv P(Y \mid X) = \frac{P(X, Y)}{P(X)} \qquad \text{Lift}(X \rightarrow Y) \equiv \frac{P(X, Y)}{P(X)\,P(Y)} = \frac{P(Y \mid X)}{P(Y)}$$

Support and confidence are the more commonly used measures. Support and lift are symmetric in $X$ and $Y$.

SLIDE 37

Generalized Association Rules

Suppose that we want to learn an association rule $\{X_i\} \rightarrow Y$.

$$\text{Support}(\{X_i\} \rightarrow Y) \equiv P(X_1, X_2, \ldots, Y)$$
$$\text{Confidence}(\{X_i\} \rightarrow Y) \equiv P(Y \mid \{X_i\}) = \frac{P(X_1, X_2, \ldots, Y)}{P(X_1, X_2, \ldots)}$$

Let’s say we have a large database of tuples $\{X_i^j\}_j$. We want to find rules with support and confidence above designated thresholds.

SLIDE 42

Agrawal’s Apriori Algorithm

1. Start by finding item sets $\{X_i\}$ with high support.
2. Then find an association rule $\{X_i\}_{i \neq k} \rightarrow X_k$, using the values in each such tuple, that has high confidence.

SLIDE 44

Apriori Algorithm - Subsets

1. Start by finding item sets $\{X_i\}$ with high support.

   $$\text{Support}(\{X_i\} \rightarrow Y) \equiv P(X_1, X_2, \ldots)$$

   If $\{X_i\}$ has high support, then all subsets of it must have high support.

2. Then find an association rule $\{X_i\}_{i \neq k} \rightarrow X_k$, using the values in each such tuple, that has high confidence.

SLIDE 46

Apriori Algorithm - Finding Support

$$\text{Support}(\{X_i\} \rightarrow Y) \equiv P(X_1, X_2, \ldots)$$

1. Start with “frequent” (high support) single items.
2. By induction:
   1. Given a set of frequent $k$-item sets,
   2. construct all candidate $(k+1)$-item sets,
   3. make a pass evaluating the support of these candidates, discarding low-support sets.
3. Continue until no new sets are discovered or until all inputs are gathered into one set.

A sketch of this level-wise search appears below.
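A minimal, unoptimized sketch of the level-wise frequent-item-set search (the function and basket names are made up; min_support is a fraction of the transactions):

```python
# Level-wise (Apriori) search for frequent item sets.
def frequent_itemsets(transactions, min_support):
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = {x for t in transactions for x in t}
    level = [s for s in (frozenset([x]) for x in items)
             if support(s) >= min_support]
    frequent = list(level)
    while level:
        # Candidate (k+1)-item sets: unions of frequent k-item sets
        # that differ in exactly one item.
        candidates = {a | b for a in level for b in level
                      if len(a | b) == len(a) + 1}
        level = [c for c in candidates if support(c) >= min_support]
        frequent.extend(level)
    return frequent

baskets = [{"milk", "bread", "butter"}, {"milk", "bread"},
           {"bread", "butter"}, {"milk", "bread"}]
print(frequent_itemsets(baskets, min_support=0.5))
# the frequent singletons plus {milk, bread} and {bread, butter}
```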

SLIDE 47

Apriori Algorithm - Finding Confidence

$$\text{Confidence}(\{X_i\} \rightarrow Y) = \frac{P(X_1, X_2, \ldots, Y)}{P(X_1, X_2, \ldots)}$$

Given a $k$-item set with high support:

1. Try each possible single-consequent rule:
   1. move a term from the antecedent into the consequent,
   2. evaluate confidence and discard if low.
2. By induction:
   1. Given a set of rules with $j$ consequents,
   2. construct all candidates with $j + 1$ consequents,
   3. discard those with low confidence.
3. Repeat until no higher-consequent rules are found.

A sketch of the single-consequent step appears below.
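A short sketch of step 1, generating single-consequent rules from one frequent item set (reusing the hypothetical baskets from the previous sketch):

```python
# Single-consequent rules from one frequent item set: for each item y,
# test the rule (itemset - {y}) -> y against a confidence threshold.
def single_consequent_rules(transactions, itemset, min_confidence):
    n = len(transactions)

    def support(s):
        return sum(1 for t in transactions if s <= t) / n

    rules = []
    for y in itemset:
        antecedent = itemset - {y}
        confidence = support(itemset) / support(antecedent)
        if confidence >= min_confidence:
            rules.append((set(antecedent), y, confidence))
    return rules

baskets = [{"milk", "bread", "butter"}, {"milk", "bread"},
           {"bread", "butter"}, {"milk", "bread"}]
print(single_consequent_rules(baskets, frozenset({"milk", "bread"}), 0.7))
# keeps both rules: {bread} -> milk (conf 0.75) and {milk} -> bread (conf 1.0)
```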