

SLIDE 1


Statistical Pattern Recognition

Prepared by Xiao-Ping Zhang and Ling Guan

References:

1. K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd ed. San Diego: Academic Press, 1990.

2. D. W. Scott, Multivariate Density Estimation: Theory, Practice, and Visualization. New York: John Wiley, 1992.

3. R. Sharma et al., "Toward multimodal human-computer interface," Proc. IEEE, May 1998.


An Example

  • Is this an Australian Flag?
  • What is the probability that this is an Australian Flag?
  • From what you see, you guess what it belongs to.
  • Can computers do it in a similar way?

SLIDE 2

Bayesian Decision Theory

  • In a nutshell...
  • Use a training set of samples of the data to derive some basic probabilistic relationships
  • Use these probabilities to make a decision about how to categorise the next (unseen) data sample


Classification Example

[Figure: images of the two classes, sea bass and salmon]

SLIDE 3

Simple approach:

  • Consider a world comprised of states of nature ωi
  • State of nature ωi = category/class
  • Assume only two types of fish: ω1 = sea bass, ω2 = salmon
  • Basic way of deciding which category an unknown fish belongs to: use PRIOR knowledge (see the worked example below)
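A worked example with made-up numbers (illustrative only, not from the slides): suppose past catch records give priors $P(\omega_1) = 0.7$ for sea bass and $P(\omega_2) = 0.3$ for salmon. With no observation at all, the prior-only rule is

$$\text{decide } \omega_1 \ \text{if } P(\omega_1) > P(\omega_2), \ \text{otherwise decide } \omega_2,$$

so every incoming fish is labelled sea bass and the rule is wrong 30% of the time, no matter what the fish looks like.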


Simple approach [2]


SLIDE 4

Better Approach?

Recap: observations aid decisions


Bayesian Decision Theory (BDT)

Use observations to condition our decisions, rather than relying on fixed thresholds based on prior knowledge


SLIDE 5

Conditional Probability

  • P(x | ω1): the probability of observing x given that the true class is ω1 (the class-conditional density)
  • P(ω1 | x): the probability that the class is ω1 given the observation x

[Figure: class-conditional density of x for class ω1]

Conditional Probability [2]

More generally, with c classes ω1, ω2, …, ωc, each class has its own class-conditional density for the observation x.

[Figure: class-conditional densities p(x | ωi) for classes ω1 … ωc]

SLIDE 6

Bayes Rule

$$P(\omega_i \mid x) = \frac{p(x \mid \omega_i)\, P(\omega_i)}{p(x)}, \qquad \text{where } p(x) = \sum_{i=1}^{c} p(x \mid \omega_i)\, P(\omega_i)$$
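A numeric check with made-up numbers (illustrative only): take priors $P(\omega_1) = 0.7$, $P(\omega_2) = 0.3$ and likelihoods $p(x \mid \omega_1) = 0.2$, $p(x \mid \omega_2) = 0.6$ at some observed $x$. Then $p(x) = 0.2 \cdot 0.7 + 0.6 \cdot 0.3 = 0.32$, so $P(\omega_1 \mid x) = 0.14/0.32 \approx 0.44$ and $P(\omega_2 \mid x) = 0.18/0.32 \approx 0.56$: the observation overturns the prior and we decide $\omega_2$.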

BDT

  • Model likelihoods from an initial sample set
  • Use likelihood models + BAYES rule to estimate posteriors for a given sample
  • Use posteriors to make a decision about which class the sample belongs to (see the sketch below)
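A minimal sketch of these three steps, assuming 1-D "fish length" measurements and Gaussian likelihood models; all names and numbers are illustrative, not from the slides:

```python
# A minimal Bayesian-decision pipeline: fit likelihoods, apply Bayes rule, decide.
import numpy as np
from scipy.stats import norm

# 1. Model likelihoods p(x | class) from an initial sample set,
#    here with one Gaussian per class (illustrative data).
train = {
    "sea bass": np.array([10.2, 11.1, 9.8, 10.7, 11.5]),
    "salmon": np.array([6.9, 7.4, 8.1, 7.0, 7.8]),
}
models = {c: norm(s.mean(), s.std(ddof=1)) for c, s in train.items()}
priors = {"sea bass": 0.5, "salmon": 0.5}  # assumed equal priors

def posteriors(x):
    """2. Bayes rule: P(class | x) = p(x | class) P(class) / p(x)."""
    joint = {c: m.pdf(x) * priors[c] for c, m in models.items()}
    p_x = sum(joint.values())  # normalization factor p(x)
    return {c: j / p_x for c, j in joint.items()}

# 3. Decide: assign the sample to the class with the largest posterior.
post = posteriors(9.0)
decision = max(post, key=post.get)
```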

SLIDE 7

Decision Theory

  • Concerned with the relationship between the choice of decision boundary and the associated cost
    – Threshold = decision boundary
  • Decision boundary
    – Chosen to minimise some notion of cost
  • Cost ↔ Error / Loss

Cost, Error & Loss

  • In general, we want to reduce the chances of making bad decisions!
  • What constitutes a bad decision?

SLIDE 8

Bayes Decision Rule

  • Decide ω1 if P(ω1 | x) > P(ω2 | x); otherwise decide ω2

What is the error incurred for each decision we make?

$$P(\text{error} \mid x) = \min\big[\, P(\omega_1 \mid x),\; P(\omega_2 \mid x)\,\big]$$

How do we minimise the error?

[Figure: posteriors P(ω1 | x) and P(ω2 | x) plotted against x]

SLIDE 9

Average probability of error

$$P(\text{error}) = \int P(\text{error} \mid x)\, p(x)\, dx$$

  • Why is the Bayes decision rule good in this case? Because it minimises P(error | x) at every x, it also minimises the average probability of error.

Bayes Rule

  • Posterior probabilities (notation change: ωi is now written Ck):

$$\underbrace{P(C_k \mid \mathbf{x})}_{\text{posterior}} = \frac{\overbrace{p_X(\mathbf{x} \mid C_k)}^{\text{likelihood}}\; \overbrace{P(C_k)}^{\text{prior}}}{\underbrace{p_X(\mathbf{x})}_{\text{normalization factor}}}, \qquad p_X(\mathbf{x}) = \sum_{k=1}^{M} p_X(\mathbf{x} \mid C_k)\, P(C_k), \qquad \sum_{k=1}^{M} P(C_k \mid \mathbf{x}) = 1$$

SLIDE 10


Definitions

  • P(Ck | x) – posterior: given the observed value x, the probability that x belongs to class Ck
  • pX(x | Ck) – likelihood: given that the sample is from class Ck, the probability that x is observed
  • P(Ck) – prior: the probability that samples from class Ck are observed (discrete values)
  • pX(x) – normalization factor: the probability that x is observed

Minimum Error Probability Decision

  • Decision rule:

$$\mathbf{x} \rightarrow C_k \ (\mathbf{x} \in R_k) \quad \text{if } P(C_k \mid \mathbf{x}) \ge P(C_j \mid \mathbf{x}), \ \forall j \ne k$$

  • Error probability of misclassification:

$$P_e = \sum_{k} \sum_{j \ne k} P(\mathbf{x} \in R_j,\, C_k) = \sum_{k} \sum_{j \ne k} P(\mathbf{x} \in R_j \mid C_k)\, P(C_k) = \sum_{k} \sum_{j \ne k} \int_{R_j} p_X(\mathbf{x} \mid C_k)\, P(C_k)\, d\mathbf{x}$$

  • Given x, the above decision rule minimises Pe by picking out the class with the largest p(x | Cj) P(Cj)
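For the two-class case (M = 2), spelling out one step the slide leaves implicit, the sum reduces to

$$P_e = \int_{R_2} p_X(\mathbf{x} \mid C_1)\, P(C_1)\, d\mathbf{x} + \int_{R_1} p_X(\mathbf{x} \mid C_2)\, P(C_2)\, d\mathbf{x},$$

which is minimised pointwise by assigning each x to the region of the class with the larger pX(x | Ck) P(Ck).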

SLIDE 11


Discriminant Functions

  • Define a set of discriminant functions y1(x), …, yM(x) such that

$$\mathbf{x} \rightarrow C_k \ (\mathbf{x} \in R_k) \quad \text{if } y_k(\mathbf{x}) > y_j(\mathbf{x}), \ \forall j \ne k$$

  • Choose

$$y_k(\mathbf{x}) = P(C_k \mid \mathbf{x}) \qquad \text{or} \qquad y_k(\mathbf{x}) = p_X(\mathbf{x} \mid C_k)\, P(C_k)$$

  • We can also write (see the sketch below)

$$y_k(\mathbf{x}) = \ln p_X(\mathbf{x} \mid C_k) + \ln P(C_k)$$
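A minimal sketch of this log-discriminant for Gaussian class models, assuming scipy is available; the means, covariances and priors are illustrative:

```python
# Log-discriminant y_k(x) = ln p_X(x | C_k) + ln P(C_k) for Gaussian classes.
import numpy as np
from scipy.stats import multivariate_normal

means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), np.eye(2)]
priors = [0.5, 0.5]

def y(k, x):
    # ln p_X(x | C_k) + ln P(C_k)
    return multivariate_normal.logpdf(x, mean=means[k], cov=covs[k]) + np.log(priors[k])

x = np.array([1.0, 2.0])
k_star = max(range(len(means)), key=lambda k: y(k, x))  # choose the largest y_k(x)
```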

Maximum Likelihood (ML) Decision

  • If the a priori probabilities P(Ck) of the classes are not known, we assume they are uniformly distributed (maximum entropy), P(C1) = P(C2) = …; then

$$\mathbf{x} \rightarrow C_k \ (\mathbf{x} \in R_k) \quad \text{if } p_X(\mathbf{x} \mid C_k) \ge p_X(\mathbf{x} \mid C_j), \ \forall j \ne k$$

  • Likelihood function (maximum likelihood):

$$L(\mathbf{x}) = \ln p_X(\mathbf{x} \mid C_k)$$
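Tying this back to the discriminant sketch above: with equal priors, the ln P(Ck) term is identical for every class, so maximising yk(x) reduces to maximising the log-likelihood ln pX(x | Ck) alone; in the code this amounts to dropping the np.log(priors[k]) term.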

SLIDE 12


The Gaussian Distribution (1)

  • PDF:

$$p_X(\mathbf{x} \mid C_k) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}} \exp\!\left[-\tfrac{1}{2}(\mathbf{x} - \mathbf{m})^T \Sigma^{-1} (\mathbf{x} - \mathbf{m})\right], \qquad \mathbf{m} = E[\mathbf{x}], \quad \Sigma = E\!\left[(\mathbf{x} - \mathbf{m})(\mathbf{x} - \mathbf{m})^T\right]$$

  • Discriminant function:

$$y_k(\mathbf{x}) = \ln p_X(\mathbf{x} \mid C_k) + \ln P(C_k) = -\tfrac{1}{2}(\mathbf{x} - \mathbf{m}_k)^T \Sigma_k^{-1} (\mathbf{x} - \mathbf{m}_k) - \tfrac{1}{2} \ln |\Sigma_k| + \ln P(C_k)$$

  • Mahalanobis distance: the quadratic form $(\mathbf{x} - \mathbf{m}_k)^T \Sigma_k^{-1} (\mathbf{x} - \mathbf{m}_k)$ (see the sketch below)
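A small numpy sketch of the squared Mahalanobis distance; the vectors and covariance matrix are made up for illustration:

```python
# Squared Mahalanobis distance (x - m)^T Sigma^{-1} (x - m).
import numpy as np

x = np.array([2.0, 1.0])
m = np.array([0.0, 0.0])
sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

d2 = (x - m) @ np.linalg.solve(sigma, x - m)  # avoids forming Sigma^{-1} explicitly
```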

The Gaussian Distribution (2)

  • Assume the components of x are i.i.d. (independent & identically distributed) with unit variance; then Σ becomes an identity matrix, Σ = I
  • And the discriminant function becomes

$$y_k(\mathbf{x}) = \ln p_X(\mathbf{x} \mid C_k) + \ln P(C_k) = -\tfrac{1}{2}(\mathbf{x} - \mathbf{m}_k)^T (\mathbf{x} - \mathbf{m}_k) + \ln P(C_k)$$

SLIDE 13


The Gaussian Distribution (3)

  • Since P(Ck) is assumed uniformly distributed (maximum entropy), ln P(Ck) is a constant. What is left is to maximise

$$-\tfrac{1}{2}(\mathbf{x} - \mathbf{m}_k)^T (\mathbf{x} - \mathbf{m}_k)$$

  • This is equivalent to minimising the Euclidean distance between x and mk:

$$(\mathbf{x} - \mathbf{m}_k)^T (\mathbf{x} - \mathbf{m}_k) = \|\mathbf{x} - \mathbf{m}_k\|^2$$

The Gaussian Distribution (4)

  • So the ML decision is equivalent to the minimum Euclidean distance decision under the following conditions (see the sketch below):
    – P(Ck) is assumed uniformly distributed
    – The distribution of x is Gaussian with Σ = I (i.i.d. components)
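A minimal sketch of the resulting nearest-mean classifier; the class means and test point are illustrative:

```python
# Minimum-Euclidean-distance (nearest-mean) decision rule.
import numpy as np

means = np.array([[0.0, 0.0],   # m_1
                  [3.0, 3.0],   # m_2
                  [0.0, 4.0]])  # m_3

def classify(x):
    d2 = ((means - x) ** 2).sum(axis=1)  # ||x - m_k||^2 for every class k
    return int(np.argmin(d2))            # index of the nearest class mean

label = classify(np.array([2.5, 2.0]))
```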

SLIDE 14


Bayes Rule (revisit)

  • Posterior probabilities:

$$\underbrace{P(C_k \mid \mathbf{x})}_{\text{posterior}} = \frac{\overbrace{p_X(\mathbf{x} \mid C_k)}^{\text{likelihood}}\; \overbrace{P(C_k)}^{\text{prior}}}{\underbrace{p_X(\mathbf{x})}_{\text{normalization factor}}}, \qquad p_X(\mathbf{x}) = \sum_{k=1}^{M} p_X(\mathbf{x} \mid C_k)\, P(C_k)$$

FINITE MIXTURE MODEL

  • The general model:

$$p(\mathbf{x}) = \sum_{i=1}^{m} \pi_i\, p_i(\mathbf{x} \mid \theta_i)$$

  • When m = 2:

$$p(\mathbf{x}) = \pi_1\, p_1(\mathbf{x}) + \pi_2\, p_2(\mathbf{x})$$

  • Why is the Gaussian mixture so important? (see the sketch below)
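A small sketch of fitting a two-component Gaussian mixture to synthetic two-cluster data, assuming scikit-learn is available; the data and seed are illustrative:

```python
# Fit p(x) = pi_1 p_1(x) + pi_2 p_2(x) with a two-component Gaussian mixture.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(0.0, 1.0, size=(200, 1)),   # samples from component 1
    rng.normal(5.0, 0.5, size=(200, 1)),   # samples from component 2
])

gmm = GaussianMixture(n_components=2, random_state=0).fit(data)
print(gmm.weights_)         # estimated mixing weights pi_i
print(gmm.means_.ravel())   # estimated component means
```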

SLIDE 15


LMM vs GMM (2 COMPONENTS)

  • T. Amin, M. Zeytinoglu, and L. Guan, "Application of Laplacian mixture model for image and video retrieval," IEEE Transactions on Multimedia, vol. 9, no. 7, pp. 1416-1429, November 2007.

Content-based Multimedia Processing

  • Content-based (object-based) description
    – Segmentation and recognition
  • Content-based information retrieval
    – Indexing and retrieval
    – One popular ranking criterion: the squared Euclidean distance $(\mathbf{x} - \mathbf{m}_k)^T (\mathbf{x} - \mathbf{m}_k) = \|\mathbf{x} - \mathbf{m}_k\|^2$
    – e.g. retrieve a song from the music database by humming a small piece of it
  • Multimedia information fusion
    – Multi-modality: multi-dimensional space