SLIDE 1

Machine Learning Probabilistic KNN.

Mark Girolami

girolami@dcs.gla.ac.uk

Department of Computing Science, University of Glasgow


SLIDE 6

Probabilistic KNN

  • KNN is a remarkably simple algorithm with proven error rates
  • One drawback is that it is not built on any probabilistic framework
  • No posterior probabilities of class membership
  • No way to infer the number of neighbours or metric parameters probabilistically
  • Let us try to get around this 'problem'

SLIDE 9

Probabilistic KNN

  • The first thing needed is a likelihood
  • Consider a finite data sample $\{(t_1, x_1), \ldots, (t_N, x_N)\}$, where each $t_n \in \{1, \ldots, C\}$ denotes the class label and $x_n \in \mathbb{R}^D$ the $D$-dimensional feature vector. The feature space $\mathbb{R}^D$ has an associated metric with parameters $\theta$, denoted $M_\theta$.

  • A likelihood can be formed as

$$p(\mathbf{t} \mid X, \beta, k, \theta, M) \approx \prod_{n=1}^{N} \frac{\exp\left(\frac{\beta}{k} \sum_{j \sim n \mid k}^{M_\theta} \delta_{t_n t_j}\right)}{\sum_{c=1}^{C} \exp\left(\frac{\beta}{k} \sum_{j \sim n \mid k}^{M_\theta} \delta_{c t_j}\right)}$$
SLIDE 10

Probabilistic KNN

  • The number of nearest neighbours is $k$ and $\beta$ defines a scaling variable. The expression

$$\sum_{j \sim n \mid k}^{M_\theta} \delta_{t_n t_j}$$

denotes the number of the $k$ nearest neighbours of $x_n$, measured under the metric $M_\theta$ within the $N-1$ samples that remain when $x_n$ is removed from $X$ (denoted $X_{-n}$), whose class label equals $t_n$; likewise, each term in the summation of the denominator counts the number of the $k$ neighbours of $x_n$ whose class label equals $c$.
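As a concrete illustration, here is a minimal Python sketch of this approximate LOO likelihood (the slides mention Matlab and C implementations; this is not that code). It assumes a plain Euclidean metric with $\theta$ held fixed and integer labels $0, \ldots, C-1$; the helper name pknn_log_likelihood is hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist

def pknn_log_likelihood(X, t, beta, k, C):
    """Approximate LOO log-likelihood of the PKNN model, assuming a
    Euclidean metric (theta fixed). X is (N, D); t holds integer
    labels in {0, ..., C-1}."""
    D = cdist(X, X)                    # pairwise distances under the metric
    np.fill_diagonal(D, np.inf)        # exclude x_n itself: the LOO step
    log_lik = 0.0
    for n in range(X.shape[0]):
        nbrs = np.argsort(D[n])[:k]    # the k nearest neighbours of x_n
        counts = np.bincount(t[nbrs], minlength=C)  # neighbours per class
        scores = (beta / k) * counts   # (beta/k) * sum_{j~n|k} delta_{c t_j}
        # log of exp(scores[t_n]) / sum_c exp(scores[c]), computed stably
        log_lik += scores[t[n]] - np.logaddexp.reduce(scores)
    return log_lik
```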

SLIDE 14

Probabilistic KNN

  • The likelihood is formed as a product of terms $p(t_n \mid x_n, X_{-n}, \mathbf{t}_{-n}, \beta, k, \theta, M)$
  • This is a Leave-One-Out (LOO) predictive likelihood, where $\mathbf{t}_{-n}$ denotes the vector $\mathbf{t}$ with the $n$'th element removed
  • The approximate joint likelihood provides an overall measure of the LOO predictive likelihood
  • It should exhibit some resilience to overfitting due to the LOO nature of the approximate likelihood (see the usage sketch below)
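Because of the LOO structure, this joint likelihood can be scored directly on the training set, for instance over a grid of $k$. A toy usage of the pknn_log_likelihood sketch above; the synthetic data here is only a stand-in for the Ripley set used later:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy two-class data: two Gaussian clusters in 2-D
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
t = np.repeat([0, 1], 50)

# Score several values of k on the training set via the LOO likelihood
for k in (1, 3, 5, 15, 31):
    print(k, pknn_log_likelihood(X, t, beta=1.0, k=k, C=2))
```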

SLIDE 17

Probabilistic KNN

  • Posterior inference follows by obtaining the parameter posterior distribution $p(\beta, k, \theta \mid \mathbf{t}, X, M)$
  • Predictions of the target class label $t_*$ for a new datum $x_*$ are made by posterior averaging, such that

$$p(t_* \mid x_*, \mathbf{t}, X, M) = \sum_{k} \iint p(t_* \mid x_*, \mathbf{t}, X, \beta, k, \theta, M)\, p(\beta, k, \theta \mid \mathbf{t}, X, M)\, \mathrm{d}\beta\, \mathrm{d}\theta$$

  • The posterior takes an intractable form, so an MCMC procedure is proposed and the following Monte Carlo estimate is employed (a sketch follows below):

$$\hat{p}(t_* \mid x_*, \mathbf{t}, X, M) = \frac{1}{N_s} \sum_{s=1}^{N_s} p(t_* \mid x_*, \mathbf{t}, X, \beta^{(s)}, k^{(s)}, \theta^{(s)}, M)$$
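A minimal sketch of this Monte Carlo average, continuing the assumptions above (Euclidean metric, $\theta$ fixed, hypothetical helper names); the posterior samples of $(\beta, k)$ would come from a sampler such as the one sketched further below:

```python
import numpy as np

def pknn_predict(x_star, X, t, samples, C):
    """Monte Carlo estimate of p(t_* | x_*, t, X, M): average the PKNN
    predictive over posterior samples. `samples` is a list of
    (beta, k) pairs drawn from p(beta, k | t, X, M)."""
    d = np.linalg.norm(X - x_star, axis=1)   # distances to training points
    probs = np.zeros(C)
    for beta, k in samples:
        nbrs = np.argsort(d)[:k]             # k nearest training points
        scores = (beta / k) * np.bincount(t[nbrs], minlength=C)
        probs += np.exp(scores - np.logaddexp.reduce(scores))  # softmax
    return probs / len(samples)
```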

SLIDE 21

Probabilistic KNN

  • The posterior sampling algorithm is a simple Metropolis algorithm
  • Assume the priors on $k$ and $\beta$ are uniform over all possible values (integer & real)
  • The proposal distribution for $\beta_{\text{new}}$ is Gaussian, i.e. $\mathcal{N}(\beta^{(i)}, h)$
  • The proposal distribution for $k$ is uniform between Min & Max values (see the sketch below):

$$\text{index} \sim U(0, k_{\text{step}} + 1); \qquad k_{\text{new}} = k_{\text{old}} + k_{\text{inc}}(\text{index})$$
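A sketch of this proposal step. The values of h and k_step, the bounds on k, and the signed reading of k_inc(index) are illustrative assumptions, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def propose(beta, k, h=0.5, k_step=3, k_min=1, k_max=100):
    """One Metropolis proposal: a Gaussian random walk on beta and a
    uniform integer jump on k, clipped to [k_min, k_max]."""
    beta_new = rng.normal(beta, h)             # beta_new ~ N(beta^(i), h)
    index = rng.integers(0, k_step + 1)        # index ~ U(0, k_step + 1)
    k_inc = int(rng.choice([-1, 1])) * index   # signed step of size `index`
    k_new = int(np.clip(k + k_inc, k_min, k_max))
    return beta_new, k_new
```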

SLIDE 25

Probabilistic KNN

  • The new move is accepted using the Metropolis ratio

$$\min\left(1,\; \frac{p(\mathbf{t} \mid X, \beta_{\text{new}}, k_{\text{new}}, \theta_{\text{new}}, M)}{p(\mathbf{t} \mid X, \beta, k, \theta, M)}\right)$$

  • This builds up a Markov chain whose stationary distribution is $p(\beta, k, \theta \mid \mathbf{t}, X, M)$
  • A very simple algorithm to implement; Matlab and C implementations are available (a Python sketch follows below)
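Putting the pieces together, a minimal Metropolis loop reusing rng, propose, and pknn_log_likelihood from the sketches above (this is not the Matlab/C code the slide refers to). With uniform priors, the acceptance ratio reduces to the likelihood ratio:

```python
import numpy as np

def metropolis_pknn(X, t, C, n_samples=5000, beta0=1.0, k0=5):
    """Minimal Metropolis sampler for (beta, k) under uniform priors;
    since the priors cancel, moves are accepted on the likelihood
    ratio alone."""
    beta, k = beta0, k0
    log_l = pknn_log_likelihood(X, t, beta, k, C)
    samples = []
    for _ in range(n_samples):
        beta_new, k_new = propose(beta, k)
        log_l_new = pknn_log_likelihood(X, t, beta_new, k_new, C)
        # accept with probability min(1, p_new / p_old), on the log scale
        if np.log(rng.random()) < log_l_new - log_l:
            beta, k, log_l = beta_new, k_new, log_l_new
        samples.append((beta, k))
    return samples
```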

SLIDE 26

Probabilistic KNN

  • Trace of Metropolis Sampler for β & k

[Trace plots of the Metropolis samples for β (top) and k (bottom) over 5 × 10^4 iterations.]
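A trace of this kind can be reproduced from the sketches above, for example (the matplotlib usage is illustrative, reusing the toy X and t from earlier):

```python
import matplotlib.pyplot as plt

samples = metropolis_pknn(X, t, C=2, n_samples=20000)
betas, ks = zip(*samples)

# Trace plots of the sampled beta (top) and k (bottom)
fig, (ax_b, ax_k) = plt.subplots(2, 1, sharex=True)
ax_b.plot(betas); ax_b.set_ylabel("beta")
ax_k.plot(ks); ax_k.set_ylabel("k"); ax_k.set_xlabel("sample")
plt.show()
```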

SLIDE 27

Probabilistic KNN

Figure 1: The top graph shows a histogram of the marginal posterior for K on the synthetic Ripley dataset, and the bottom shows the 10-fold CV error against the value of K.

SLIDE 28

Probabilistic KNN

Figure 2: The percentage test error obtained with training sets of varying size, from 25 to 250 data points. For each sub-sample size, 50 random subsets were drawn, and each was used to obtain a KNN and a PKNN classifier; these were then used to make predictions on the 1000 independent test points. The mean percentage error and associated standard error obtained for each training-set size are shown for each classifier.

SLIDE 29

Probabilistic KNN

Data       KNN             PKNN            P-Value
Glass      29.91 ± 9.22    26.67 ± 8.81    0.517
Iris        5.33 ± 5.25     4.00 ± 5.62    0.537
Crabs      15.00 ± 8.82    19.50 ± 6.85    0.240
Pima       27.00 ± 8.88    24.00 ± 14.68   0.645
Soybean    14.50 ± 16.74    4.50 ± 9.56    0.155
Wine        3.92 ± 3.77     3.37 ± 2.89    0.805
Balance    11.52 ± 2.99    10.23 ± 3.02    0.324
Heart      15.18 ± 5.91    15.18 ± 4.43    1.000
Liver      33.60 ± 6.98    36.26 ± 12.93   0.705
Diabetes   25.91 ± 7.15    25.25 ± 8.11    0.970
Vehicle    36.28 ± 5.16    37.22 ± 4.53    0.732

SLIDE 30

Probabilistic KNN

Data       KNN        PKNN
Glass        39.55     243.52
Iris          7.58      91.80
Crabs        21.99     156.30
Pima         24.10     103.60
Soybean       1.16      38.38
Wine         27.90     144.90
Balance     609.86     555.72
Heart        96.11     145.22
Liver       116.71     189.73
Diabetes   1643.09     567.03
Vehicle    4226.69    1063.13

Table 1: The running times (seconds) for KNN and PKNN.

SLIDE 35

Probabilistic KNN

  • PKNN is a fully Bayesian method for KNN classification
  • Requires MCMC and is therefore slow
  • It is possible to learn the metric, though this is computationally demanding
  • Predictive probabilities are more useful in certain applications, e.g. clinical prediction
  • Under 0-1 loss there is no statistically significant difference from cross-validated KNN