SLIDE 1

Machine Learning Probabilistic KNN.

Mark Girolami

girolami@dcs.gla.ac.uk

Department of Computing Science, University of Glasgow


SLIDE 6

Probabilistic KNN

  • KNN is a remarkably simple algorithm with proven error rates
  • One drawback is that it is not built on any probabilistic framework
  • No posterior probabilities of class membership
  • No way to infer the number of neighbours or metric parameters probabilistically
  • Let us try to get around this 'problem'

SLIDE 9

Probabilistic KNN

  • The first thing needed is a likelihood
  • Consider a finite data sample $\{(t_1, x_1), \ldots, (t_N, x_N)\}$, where each $t_n \in \{1, \ldots, C\}$ denotes the class label and $x_n \in \mathbb{R}^D$ the $D$-dimensional feature vector. The feature space $\mathbb{R}^D$ has an associated metric with parameters $\theta$, denoted $M_\theta$.

  • A likelihood can be formed as

$$p(\mathbf{t} \mid X, \beta, k, \theta, M) \approx \prod_{n=1}^{N} \frac{\exp\left(\frac{\beta}{k} \sum_{j \sim n \mid k}^{M_\theta} \delta_{t_n t_j}\right)}{\sum_{c=1}^{C} \exp\left(\frac{\beta}{k} \sum_{j \sim n \mid k}^{M_\theta} \delta_{c t_j}\right)}$$
SLIDE 10

Probabilistic KNN

  • The number of nearest neighbours is $k$ and $\beta$ defines a scaling variable. The expression

$$\sum_{j \sim n \mid k}^{M_\theta} \delta_{t_n t_j}$$

denotes the number of the $k$ nearest neighbours of $x_n$, measured under the metric $M_\theta$ within the $N-1$ samples that remain when $x_n$ is removed from $X$ (denoted $X_{-n}$), whose class label equals $t_n$; likewise, each term in the summation of the denominator counts the number of the $k$ neighbours of $x_n$ whose class label equals $c$.
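As a concrete illustration, here is a minimal Python sketch of this approximate LOO likelihood (the slides mention Matlab and C implementations; this is not that code). It assumes a plain Euclidean metric with $\theta$ held fixed and integer labels $0, \ldots, C-1$; the helper name pknn_log_likelihood is hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist

def pknn_log_likelihood(X, t, beta, k, C):
    """Approximate LOO log-likelihood of the PKNN model, assuming a
    Euclidean metric (theta fixed). X is (N, D); t holds integer
    labels in {0, ..., C-1}."""
    D = cdist(X, X)                    # pairwise distances under the metric
    np.fill_diagonal(D, np.inf)        # exclude x_n itself: the LOO step
    log_lik = 0.0
    for n in range(X.shape[0]):
        nbrs = np.argsort(D[n])[:k]    # the k nearest neighbours of x_n
        counts = np.bincount(t[nbrs], minlength=C)  # neighbours per class
        scores = (beta / k) * counts   # (beta/k) * sum_{j~n|k} delta_{c t_j}
        # log of exp(scores[t_n]) / sum_c exp(scores[c]), computed stably
        log_lik += scores[t[n]] - np.logaddexp.reduce(scores)
    return log_lik
```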

SLIDE 14

Probabilistic KNN

  • The likelihood is formed as a product of terms $p(t_n \mid x_n, X_{-n}, \mathbf{t}_{-n}, \beta, k, \theta, M)$
  • This is a Leave-One-Out (LOO) predictive likelihood, where $\mathbf{t}_{-n}$ denotes the vector $\mathbf{t}$ with the $n$'th element removed
  • The approximate joint likelihood provides an overall measure of the LOO predictive likelihood
  • It should exhibit some resilience to overfitting due to the LOO nature of the approximate likelihood (see the usage sketch below)
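Because of the LOO structure, this joint likelihood can be scored directly on the training set, for instance over a grid of $k$. A toy usage of the pknn_log_likelihood sketch above; the synthetic data here is only a stand-in for the Ripley set used later:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy two-class data: two Gaussian clusters in 2-D
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
t = np.repeat([0, 1], 50)

# Score several values of k on the training set via the LOO likelihood
for k in (1, 3, 5, 15, 31):
    print(k, pknn_log_likelihood(X, t, beta=1.0, k=k, C=2))
```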

SLIDE 17

Probabilistic KNN

  • Posterior inference follows by obtaining the parameter posterior distribution $p(\beta, k, \theta \mid \mathbf{t}, X, M)$
  • Predictions of the target class label $t_*$ for a new datum $x_*$ are made by posterior averaging, such that

$$p(t_* \mid x_*, \mathbf{t}, X, M) = \sum_{k} \iint p(t_* \mid x_*, \mathbf{t}, X, \beta, k, \theta, M)\, p(\beta, k, \theta \mid \mathbf{t}, X, M)\, \mathrm{d}\beta\, \mathrm{d}\theta$$

  • The posterior takes an intractable form, so an MCMC procedure is proposed and the following Monte Carlo estimate is employed (a sketch follows below):

$$\hat{p}(t_* \mid x_*, \mathbf{t}, X, M) = \frac{1}{N_s} \sum_{s=1}^{N_s} p(t_* \mid x_*, \mathbf{t}, X, \beta^{(s)}, k^{(s)}, \theta^{(s)}, M)$$
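A minimal sketch of this Monte Carlo average, continuing the assumptions above (Euclidean metric, $\theta$ fixed, hypothetical helper names); the posterior samples of $(\beta, k)$ would come from a sampler such as the one sketched further below:

```python
import numpy as np

def pknn_predict(x_star, X, t, samples, C):
    """Monte Carlo estimate of p(t_* | x_*, t, X, M): average the PKNN
    predictive over posterior samples. `samples` is a list of
    (beta, k) pairs drawn from p(beta, k | t, X, M)."""
    d = np.linalg.norm(X - x_star, axis=1)   # distances to training points
    probs = np.zeros(C)
    for beta, k in samples:
        nbrs = np.argsort(d)[:k]             # k nearest training points
        scores = (beta / k) * np.bincount(t[nbrs], minlength=C)
        probs += np.exp(scores - np.logaddexp.reduce(scores))  # softmax
    return probs / len(samples)
```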

SLIDE 21

Probabilistic KNN

  • The posterior sampling algorithm is a simple Metropolis algorithm
  • Assume the priors on $k$ and $\beta$ are uniform over all possible values (integer & real)
  • The proposal distribution for $\beta_{\text{new}}$ is Gaussian, i.e. $\mathcal{N}(\beta^{(i)}, h)$
  • The proposal distribution for $k$ is uniform between Min & Max values (see the sketch below):

$$\text{index} \sim U(0, k_{\text{step}} + 1); \qquad k_{\text{new}} = k_{\text{old}} + k_{\text{inc}}(\text{index})$$
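A sketch of this proposal step. The values of h and k_step, the bounds on k, and the signed reading of k_inc(index) are illustrative assumptions, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def propose(beta, k, h=0.5, k_step=3, k_min=1, k_max=100):
    """One Metropolis proposal: a Gaussian random walk on beta and a
    uniform integer jump on k, clipped to [k_min, k_max]."""
    beta_new = rng.normal(beta, h)             # beta_new ~ N(beta^(i), h)
    index = rng.integers(0, k_step + 1)        # index ~ U(0, k_step + 1)
    k_inc = int(rng.choice([-1, 1])) * index   # signed step of size `index`
    k_new = int(np.clip(k + k_inc, k_min, k_max))
    return beta_new, k_new
```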

SLIDE 25

Probabilistic KNN

  • The new move is accepted using the Metropolis ratio

$$\min\left(1,\; \frac{p(\mathbf{t} \mid X, \beta_{\text{new}}, k_{\text{new}}, \theta_{\text{new}}, M)}{p(\mathbf{t} \mid X, \beta, k, \theta, M)}\right)$$

  • This builds up a Markov chain whose stationary distribution is $p(\beta, k, \theta \mid \mathbf{t}, X, M)$
  • A very simple algorithm to implement; Matlab and C implementations are available (a Python sketch follows below)
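Putting the pieces together, a minimal Metropolis loop reusing rng, propose, and pknn_log_likelihood from the sketches above (this is not the Matlab/C code the slide refers to). With uniform priors, the acceptance ratio reduces to the likelihood ratio:

```python
import numpy as np

def metropolis_pknn(X, t, C, n_samples=5000, beta0=1.0, k0=5):
    """Minimal Metropolis sampler for (beta, k) under uniform priors;
    since the priors cancel, moves are accepted on the likelihood
    ratio alone."""
    beta, k = beta0, k0
    log_l = pknn_log_likelihood(X, t, beta, k, C)
    samples = []
    for _ in range(n_samples):
        beta_new, k_new = propose(beta, k)
        log_l_new = pknn_log_likelihood(X, t, beta_new, k_new, C)
        # accept with probability min(1, p_new / p_old), on the log scale
        if np.log(rng.random()) < log_l_new - log_l:
            beta, k, log_l = beta_new, k_new, log_l_new
        samples.append((beta, k))
    return samples
```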

SLIDE 26

Probabilistic KNN

  • Trace of Metropolis Sampler for β & k

[Trace plots of the Metropolis samples for β (top) and k (bottom) over 5 × 10^4 iterations.]
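A trace of this kind can be reproduced from the sketches above, for example (the matplotlib usage is illustrative, reusing the toy X and t from earlier):

```python
import matplotlib.pyplot as plt

samples = metropolis_pknn(X, t, C=2, n_samples=20000)
betas, ks = zip(*samples)

# Trace plots of the sampled beta (top) and k (bottom)
fig, (ax_b, ax_k) = plt.subplots(2, 1, sharex=True)
ax_b.plot(betas); ax_b.set_ylabel("beta")
ax_k.plot(ks); ax_k.set_ylabel("k"); ax_k.set_xlabel("sample")
plt.show()
```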

SLIDE 27

Probabilistic KNN

Figure 1: The top graph shows a histogram of the marginal posterior for K on the synthetic Ripley dataset, and the bottom shows the 10-fold CV error against the value of K.

SLIDE 28

Probabilistic KNN

Figure 2: The percentage test error obtained with training sets of varying size, from 25 to 250 data points. For each sub-sample size, 50 random subsets were drawn, and each was used to obtain a KNN and a PKNN classifier; these were then used to make predictions on the 1000 independent test points. The mean percentage error and associated standard error obtained for each training-set size are shown for each classifier.

SLIDE 29

Probabilistic KNN

Data       KNN             PKNN            P-Value
Glass      29.91 ± 9.22    26.67 ± 8.81    0.517
Iris        5.33 ± 5.25     4.00 ± 5.62    0.537
Crabs      15.00 ± 8.82    19.50 ± 6.85    0.240
Pima       27.00 ± 8.88    24.00 ± 14.68   0.645
Soybean    14.50 ± 16.74    4.50 ± 9.56    0.155
Wine        3.92 ± 3.77     3.37 ± 2.89    0.805
Balance    11.52 ± 2.99    10.23 ± 3.02    0.324
Heart      15.18 ± 5.91    15.18 ± 4.43    1.000
Liver      33.60 ± 6.98    36.26 ± 12.93   0.705
Diabetes   25.91 ± 7.15    25.25 ± 8.11    0.970
Vehicle    36.28 ± 5.16    37.22 ± 4.53    0.732

SLIDE 30

Probabilistic KNN

Data       KNN        PKNN
Glass        39.55     243.52
Iris          7.58      91.80
Crabs        21.99     156.30
Pima         24.10     103.60
Soybean       1.16      38.38
Wine         27.90     144.90
Balance     609.86     555.72
Heart        96.11     145.22
Liver       116.71     189.73
Diabetes   1643.09     567.03
Vehicle    4226.69    1063.13

Table 1: The running times (seconds) for KNN and PKNN.

SLIDE 35

Probabilistic KNN

  • PKNN is a fully Bayesian method for KNN classification
  • Requires MCMC and is therefore slow
  • It is possible to learn the metric, though this is computationally demanding
  • Predictive probabilities are more useful in certain applications, e.g. clinical prediction
  • Under 0-1 loss there is no statistically significant difference from cross-validated KNN