
Bayesian Decision Theory

Steven J Zeil

Old Dominion Univ.

Fall 2010


Outline

1. Classification
2. Losses & Risks
3. Discriminant Functions
4. Association Rules


Bernoulli Distribution

Random variable X ∈ {0, 1}. Bernoulli: P{X = x} = p_0^x (1 − p_0)^{1−x}, where p_0 = P{X = 1}.

Given a sample X = {x^t}_{t=1}^N, we can estimate

p̂_0 = (Σ_t x^t) / N
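As a quick illustration (not from the slides), a few lines of Python computing this estimate on a made-up sample:

```python
# Maximum-likelihood estimate of p_0 from a Bernoulli sample:
# p0_hat = (sum over t of x^t) / N
sample = [1, 0, 1, 1, 0, 1, 0, 1]  # hypothetical sample, N = 8

p0_hat = sum(sample) / len(sample)
print(p0_hat)  # 0.625
```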


Classification

Input x = [x_1, x_2], output C ∈ {0, 1}.

Prediction: choose C = 1 if P(C = 1 | x) > 0.5, and C = 0 otherwise.

Equivalently: choose C = 1 if P(C = 1 | x) > P(C = 0 | x), and C = 0 otherwise.

E.g., credit scoring: inputs are income and savings; output is low-risk versus high-risk.
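As a minimal sketch (not from the slides), the two equivalent rules in Python, assuming the posterior P(C = 1 | x) has already been computed somewhere:

```python
def choose_class(p_c1_given_x):
    """Return 1 if P(C=1|x) > 0.5, else 0.

    Since P(C=0|x) = 1 - P(C=1|x), thresholding at 0.5 is the same
    decision as comparing P(C=1|x) > P(C=0|x).
    """
    return 1 if p_c1_given_x > 0.5 else 0

print(choose_class(0.73))  # 1
print(choose_class(0.40))  # 0
```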


Bayes’ Rule

P(C | x) = P(C) p(x | C) / p(x)

P(C | x): posterior probability. Given that we have learned something (x), what is the probability that x is in class C?

P(C): prior probability. What would we expect for the probability of getting something in C if we had no info about the specific case?

p(x | C): likelihood. If we knew that an item really was in C, what is the probability that it would have values x? In effect, the reverse of what we are trying to find out.

p(x): evidence. If we ignore the classes, how likely are we to see a value x?


Bayes’ Rule - Multiple Classes

P(C_i | x) = P(C_i) p(x | C_i) / p(x) = P(C_i) p(x | C_i) / Σ_{k=1}^{K} p(x | C_k) P(C_k)

with P(C_i) ≥ 0 and Σ_{i=1}^{K} P(C_i) = 1.

Choose C_i if P(C_i | x) = max_k P(C_k | x).
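A small numeric sketch of this rule (the priors and likelihood values below are invented for illustration):

```python
# Invented priors P(C_k) and likelihoods p(x | C_k) for K = 3 classes
# at some fixed input x.
priors = [0.5, 0.3, 0.2]
likelihoods = [0.10, 0.40, 0.25]

evidence = sum(p * l for p, l in zip(priors, likelihoods))  # p(x)
posteriors = [p * l / evidence for p, l in zip(priors, likelihoods)]

# choose C_i with the maximum posterior
best = max(range(len(posteriors)), key=lambda k: posteriors[k])
print(best, posteriors)  # class 1 wins: posterior ~0.545
```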


Unequal Risks

In many situations, different actions carry different potential gains and costs.

Actions: α_i. Let λ_ik denote the loss incurred by taking action α_i when the current state is actually C_k.

Expected risk of taking action α_i:

R(α_i | x) = Σ_{k=1}^{K} λ_ik P(C_k | x)

This is simply the expected value of the loss function given that we have chosen α_i.

Choose α_i if R(α_i | x) = min_k R(α_k | x).
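A sketch of risk minimization with an invented 2×2 loss matrix; note that the minimum-risk action need not correspond to the most probable class:

```python
# Expected risk R(a_i | x) = sum_k lambda_ik * P(C_k | x); pick the
# minimum-risk action. The loss matrix and posterior are invented.
loss = [
    [0.0, 10.0],  # action a_0: no loss if state is C_0, heavy loss if C_1
    [1.0,  0.0],  # action a_1: small loss if C_0, no loss if C_1
]
posterior = [0.8, 0.2]  # P(C_0 | x), P(C_1 | x)

risks = [sum(l * p for l, p in zip(row, posterior)) for row in loss]
best = min(range(len(risks)), key=lambda i: risks[i])
print(risks, best)  # [2.0, 0.8] -> a_1, even though C_0 is more probable
```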


Special Case: Equal Risks

Suppose

λ_ik = 0 if i = k, 1 if i ≠ k

Expected risk of taking action α_i:

R(α_i | x) = Σ_{k=1}^{K} λ_ik P(C_k | x) = Σ_{k ≠ i} P(C_k | x) = 1 − P(C_i | x)

Choose α_i if R(α_i | x) = min_k R(α_k | x), which happens when P(C_i | x) is largest.

So if all actions have equal cost, choose the action for the most probable class.



Special Case: Indecision

Suppose that making the wrong decision is more expensive than making no decision at all (i.e., falling back to some other procedure).

Introduce a special reject action α_{K+1} that denotes the decision not to select a “real” action. The cost of a reject is λ, 0 < λ < 1:

λ_ik = 0 if i = k, λ if i = K + 1, 1 otherwise


The Risk of Indecision

Risk:

R(α_{K+1} | x) = Σ_{k=1}^{K} λ P(C_k | x) = λ

R(α_i | x) = Σ_{k ≠ i} P(C_k | x) = 1 − P(C_i | x)

Choose α_i if P(C_i | x) > P(C_k | x) ∀k ≠ i and P(C_i | x) > 1 − λ; otherwise reject all actions.
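A sketch of the combined rule in Python (the REJECT sentinel is just an illustrative convention):

```python
REJECT = -1  # illustrative sentinel for the reject action alpha_{K+1}

def decide_with_reject(posteriors, lam):
    """Choose class i if P(C_i|x) is the maximum posterior and exceeds
    1 - lam; otherwise reject (0 < lam < 1 is the cost of rejecting)."""
    i = max(range(len(posteriors)), key=lambda k: posteriors[k])
    return i if posteriors[i] > 1 - lam else REJECT

print(decide_with_reject([0.6, 0.3, 0.1], lam=0.5))  # 0.6 > 0.5 -> class 0
print(decide_with_reject([0.6, 0.3, 0.1], lam=0.3))  # 0.6 <= 0.7 -> reject
```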


Discriminant Functions

An alternate vision: instead of searching for the most probable class, we seek a set of functions that divide the space into K decision regions R_1, . . ., R_K:

R_i = { x | g_i(x) = max_k g_k(x) }


Why Discriminants?

Discriminants are more general than posterior probabilities: they do not have to lie in the 0 . . . 1 range, nor correspond to actual probabilities. This allows us to use them when we have no information about the underlying distribution. Later techniques will seek discriminant functions directly.


Bayes Classifier as Discriminant Functions

We can form a discriminant function for the Bayes classifier very simply:

g_i(x) = −R(α_i | x)

If we have a constant loss function, we can use

g_i(x) = P(C_i | x) = P(C_i) p(x | C_i) / p(x)


Bayes Classifier as Discriminant Functions (cont.)

g_i(x) = P(C_i) p(x | C_i) / p(x)

Because all the g_i above would have the same denominator, we could alternatively use:

g_i(x) = P(C_i) p(x | C_i)
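A quick sketch (invented numbers) confirming that dropping the shared denominator p(x) leaves the decision unchanged:

```python
# Invented priors and likelihoods; g_i(x) = P(C_i) p(x | C_i).
priors = [0.5, 0.3, 0.2]
likelihoods = [0.10, 0.40, 0.25]

g = [p * l for p, l in zip(priors, likelihoods)]  # unnormalized
posteriors = [gi / sum(g) for gi in g]            # normalized by p(x)

argmax = lambda v: max(range(len(v)), key=lambda k: v[k])
assert argmax(g) == argmax(posteriors)  # same decision either way
print(argmax(g))  # 1
```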


Association Rules

Suppose that we want to learn an association rule X → Y

e.g., customers who buy X often buy Y as well

Three common measures: support, confidence, & lift (a.k.a. interest).

Support(X → Y) ≡ P(X, Y)

e.g., (# customers who bought both) / (# customers)

Confidence(X → Y) ≡ P(Y | X) = P(X, Y) / P(X)

e.g., (# customers who bought both) / (# customers who bought X)

Lift(X → Y) ≡ P(X, Y) / (P(X) P(Y)) = P(Y | X) / P(Y)

If X and Y are independent, lift should be 1. Lift > 1 means that having X makes Y more likely; lift < 1 means that having X makes Y less likely.
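A sketch estimating all three measures from a toy transaction list (the baskets below are made up):

```python
# Toy transaction data: one set of items per customer (made up).
baskets = [
    {"milk", "bread"}, {"milk", "bread", "eggs"}, {"bread"},
    {"milk"}, {"milk", "bread"}, {"eggs"},
]
X, Y = "milk", "bread"

n = len(baskets)
n_x = sum(1 for b in baskets if X in b)              # bought X
n_y = sum(1 for b in baskets if Y in b)              # bought Y
n_xy = sum(1 for b in baskets if X in b and Y in b)  # bought both

support = n_xy / n                          # P(X, Y)
confidence = n_xy / n_x                     # P(Y | X)
lift = support / ((n_x / n) * (n_y / n))    # P(X,Y) / (P(X) P(Y))
print(support, confidence, lift)  # 0.5 0.75 1.125
```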


Association Rules

Support(X → Y) ≡ P(X, Y)

Confidence(X → Y) ≡ P(Y | X) = P(X, Y) / P(X)

Lift(X → Y) ≡ P(X, Y) / (P(X) P(Y)) = P(Y | X) / P(Y)

Support and confidence are the more commonly used measures. Support and lift are symmetric in X and Y.


Generalized Association Rules

Suppose that we want to learn an association rule {X_i} → Y.

Support({X_i} → Y) ≡ P(X_1, X_2, . . ., Y)

Confidence({X_i} → Y) ≡ P(Y | {X_i}) = P(X_1, X_2, . . ., Y) / P(X_1, X_2, . . .)

Let’s say we have a large database of tuples {X_i^j}_j. We want to find rules with support and confidence above designated thresholds.


Agrawal’s Apriori Algorithm

1. Start by finding item sets {X_i} with high support.
2. Then find an association rule {X_i}_{i≠k} → X_k, using the values in each such tuple, that has high confidence.


Apriori Algorithm - Subsets

1. Start by finding item sets {X_i} with high support.

Support({X_i}) ≡ P(X_1, X_2, . . .). If {X_i} has high support, then all subsets of it must have high support.

2. Then find an association rule {X_i}_{i≠k} → X_k, using the values in each such tuple, that has high confidence.


Apriori Algorithm - Finding Support

Support({X_i}) ≡ P(X_1, X_2, . . .)

1. Start with “frequent” (high-support) single items.
2. By induction:
   1. Given the set of frequent k-item sets,
   2. construct all candidate (k + 1)-item sets,
   3. make a pass evaluating the support of these candidates, discarding low-support sets.
3. Continue until no new sets are discovered or until all inputs are gathered into one set.
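A compact sketch of this level-wise search in Python (the candidate join below is the simple union-based variant, without the full Apriori pruning of candidates whose subsets are infrequent):

```python
def frequent_itemsets(baskets, min_support):
    """Level-wise (Apriori-style) search: keep frequent k-item sets,
    join them into (k+1)-item candidates, and re-check support."""
    n = len(baskets)
    support = lambda s: sum(1 for b in baskets if s <= b) / n

    items = {x for b in baskets for x in b}
    level = [frozenset([x]) for x in items
             if support(frozenset([x])) >= min_support]
    frequent = list(level)
    while level:
        # join pairs of frequent k-sets whose union has k + 1 items
        candidates = {a | b for a in level for b in level
                      if len(a | b) == len(a) + 1}
        level = [c for c in candidates if support(c) >= min_support]
        frequent += level
    return frequent

baskets = [{"milk", "bread"}, {"milk", "bread", "eggs"},
           {"bread", "eggs"}, {"milk", "bread"}]
print(frequent_itemsets(baskets, min_support=0.5))
```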


Apriori Algorithm - Finding Confidence

Confidence({X_i} → Y) = P(X_1, X_2, . . ., Y) / P(X_1, X_2, . . .)

Given a k-item set with high support:

1. Try each possible single-consequent rule:
   1. move a term from the antecedent into the consequent,
   2. evaluate confidence and discard the rule if it is low.
2. By induction:
   1. Given a set of rules with j consequents,
   2. construct all candidates with j + 1 consequents,
   3. discard those with low confidence.
3. Repeat until no rules with more consequents are found.
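A sketch of rule generation from one frequent item set; for brevity it enumerates every antecedent/consequent split directly rather than growing consequents incrementally as on the slide:

```python
from itertools import combinations

def rules_from_itemset(itemset, baskets, min_conf):
    """Split a frequent item set into antecedent -> consequent rules,
    keeping those whose confidence clears the threshold."""
    n = len(baskets)
    support = lambda s: sum(1 for b in baskets if s <= b) / n

    itemset = frozenset(itemset)
    rules = []
    for r in range(1, len(itemset)):  # consequent sizes 1, 2, ...
        for consequent in map(frozenset, combinations(itemset, r)):
            antecedent = itemset - consequent
            confidence = support(itemset) / support(antecedent)
            if confidence >= min_conf:
                rules.append((set(antecedent), set(consequent), confidence))
    return rules

baskets = [{"milk", "bread", "eggs"}, {"milk", "bread"},
           {"bread", "eggs"}, {"milk", "bread"}]
print(rules_from_itemset({"milk", "bread"}, baskets, min_conf=0.7))
```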