Instance-based Learning
Hamid R. Rabiee
Spring 2015 http://ce.sharif.edu/courses/93-94/2/ce717-1
Agenda
- Instance-based learning
- K-Nearest Neighbor
- Locally-weighted Regression
- Radial Basis Function Networks
- Case-based Reasoning
Instance-based Learning
Key idea: in contrast to learning methods that construct a general, explicit description of the target function as training examples are provided, instance-based learning constructs an approximation of the target function only when a new instance must be classified.
- Simply store all training examples <𝒚𝒋, 𝒈(𝒚𝒋)>, where 𝒚𝒋 describes the attributes of an instance and 𝒈(𝒚𝒋) denotes its class (or value).
- Example: K-Nearest Neighbor
K-Nearest Neighbor
Simple 2-D case: each instance is described by just two values (its x and y coordinates).
Given a query instance x_query, take a vote among its k nearest neighbors to decide its class (return the most common value of f among the k training examples nearest to x_query).
Need to consider:
1. Similarity (how to calculate distance)
2. Number (and weight) of similar (near) instances
Similarity
Euclidean distance. More precisely, let an arbitrary instance x be described by the feature vector (set of attributes)

( a_1(x), a_2(x), …, a_n(x) )

where a_r(x) denotes the value of the r-th attribute of instance x. Then the distance between two instances x_i and x_j is defined to be d(x_i, x_j), where
d(x_i, x_j) = sqrt( Σ_{r=1}^{n} ( a_r(x_i) − a_r(x_j) )² )
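This distance can be sketched in a few lines of Python (the function name is illustrative, not from the slides):

```python
from math import sqrt

def euclidean_distance(xi, xj):
    """d(xi, xj) = sqrt(sum over attributes r of (a_r(xi) - a_r(xj))^2)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

# Two instances described by three attributes each.
print(euclidean_distance((1.0, 2.0, 3.0), (4.0, 6.0, 3.0)))  # 5.0
```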
Training data
Number   Lines   Line types   Rectangles   Colors   Mondrian?
  1        6         1            10          4       No
  2        4         2             8          5       No
  3        5         2             7          4       Yes
  4        5         1             8          4       Yes
  5        5         1            10          5       No
  6        6         1             8          6       Yes
  7        7         1            14          5       No

Test instance:
  8        7         2             9          4       ?
Keep data in normalised form
One way to normalise the data a_r(x) to a′_r(x) is

a′_r(x) = ( a_r(x) − ā_r ) / σ_r

where ā_r = (1/|T|) Σ_{t∈T} a_r(t) is the mean and σ_r = sqrt( (1/|T|) Σ_{t∈T} ( a_r(t) − ā_r )² ) is the standard deviation of the r-th attribute over the training set T.
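A sketch of this normalisation in Python, assuming the standard zero-mean, unit-standard-deviation rescaling per attribute (which matches the normalised table that follows); the function name is an assumption:

```python
from math import sqrt

def normalise_column(values):
    """Map each a_r(x) to (a_r(x) - mean) / sd, using the population
    standard deviation over the training instances."""
    mean = sum(values) / len(values)
    sd = sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]

# "Lines" column of the training data above.
lines = [6, 4, 5, 5, 5, 6, 7]
print([round(v, 3) for v in normalise_column(lines)])
# first entry rounds to 0.632, matching the normalised table
```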
Normalised training data
Number   Lines    Line types   Rectangles   Colors   Mondrian?
  1       0.632     -0.632        0.327     -1.020     No
  2      -1.581      1.581       -0.588      0.408     No
  3      -0.474      1.581       -1.046     -1.020     Yes
  4      -0.474     -0.632       -0.588     -1.020     Yes
  5      -0.474     -0.632        0.327      0.408     No
  6       0.632     -0.632       -0.588      1.837     Yes
  7       1.739     -0.632        2.157      0.408     No

Test instance:
  8       1.739      1.581       -0.131     -1.020     ?
Distances of test instance from training data
Example   Distance from test instance   Mondrian?
   1                2.517                  No
   2                3.644                  No
   3                2.395                  Yes
   4                3.164                  Yes
   5                3.472                  No
   6                3.808                  Yes
   7                3.490                  No
Classification:
  1-NN: Yes
  3-NN: Yes
  5-NN: No
  7-NN: No
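The whole classification step for this example can be sketched as follows, using the normalised training data; all function and variable names are illustrative:

```python
from collections import Counter
from math import sqrt

def knn_classify(query, examples, labels, k):
    """Majority vote among the k training examples nearest to the query."""
    dist = lambda a, b: sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    ranked = sorted(zip(examples, labels), key=lambda e: dist(query, e[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Normalised training data (Lines, Line types, Rectangles, Colors).
X = [( 0.632, -0.632,  0.327, -1.020),
     (-1.581,  1.581, -0.588,  0.408),
     (-0.474,  1.581, -1.046, -1.020),
     (-0.474, -0.632, -0.588, -1.020),
     (-0.474, -0.632,  0.327,  0.408),
     ( 0.632, -0.632, -0.588,  1.837),
     ( 1.739, -0.632,  2.157,  0.408)]
y = ["No", "No", "Yes", "Yes", "No", "Yes", "No"]
query = (1.739, 1.581, -0.131, -1.020)

for k in (1, 3, 5, 7):
    print(k, knn_classify(query, X, y, k))
# 1 Yes / 3 Yes / 5 No / 7 No, matching the classification above
```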
Lazy vs Eager Learning
The k-NN method does not form an explicit hypothesis about the target classification function; it simply computes the classification of each new query instance as needed.
Implied hypothesis: a Voronoi diagram shows the shape of the implied hypothesis about the decision surface that can be derived for the simple 1-NN case.
The decision surface is a combination of convex polyhedra surrounding each of the training examples: for every training example, its polyhedron is the set of query points whose classification will be completely determined by that training example.
Continuous vs Discrete-valued Functions (Classes)
k-NN works well for discrete-valued target functions. Furthermore, the idea extends easily to continuous (real-valued) target functions: take the mean of the f values of the k nearest neighbors:

f̂(x_q) = ( Σ_{i=1}^{k} f(x_i) ) / k
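A minimal sketch for the continuous case (the 1-D data set is a made-up illustration):

```python
def knn_regress(query, examples, values, k):
    """Estimate f(x_q) as the mean f value of the k nearest neighbours."""
    dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5
    ranked = sorted(zip(examples, values), key=lambda e: dist(query, e[0]))
    return sum(v for _, v in ranked[:k]) / k

# Hypothetical 1-D training set sampled from f(x) = 2x.
X = [(0.0,), (1.0,), (2.0,), (3.0,)]
f = [0.0, 2.0, 4.0, 6.0]
print(knn_regress((1.4,), X, f, 2))  # averages f at x=1 and x=2 -> 3.0
```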
When to Consider Nearest Neighbor
- Instances map to points in R^n
- Moderate number of attributes (e.g. fewer than 20 attributes per instance)
- Lots of training data
- Target function is complex but can be approximated by separate, simple local approximations
Advantages:
- Training is very fast
- Can learn complex target functions
Disadvantages:
- Slow at query time
- Easily fooled by irrelevant attributes
Collaborative Filtering (AKA Recommender Systems)
Problem:
- Predict whether someone will like a web page, posting, movie, book, etc.
Previous approach:
- Look at the content
Collaborative filtering:
- Look at what similar users liked (similar users = similar likes and dislikes)
Collaborative Filtering
Represent each user by a vector of ratings. Two types of ratings:
- Yes/No
- Explicit ratings (e.g., 0 to ***)
Predict missing ratings from similar users, with similarity measured by the Pearson correlation coefficient.
Collaborative Filtering
Primitive version:
- N_i can be the whole database, or only the k nearest neighbors
- R_jk: rating of user j on item k
- R̄_j: average of all of user j's ratings
- The summation in the Pearson coefficient runs over all items rated by both users
- In principle, any prediction method can be used for collaborative filtering
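A sketch of the Pearson similarity between two users' rating vectors; the rating data and the None-for-unrated convention are illustrative assumptions:

```python
from math import sqrt

def pearson(ri, rj):
    """Pearson correlation over the items rated by both users
    (positions where neither rating is None)."""
    common = [(a, b) for a, b in zip(ri, rj) if a is not None and b is not None]
    mi = sum(a for a, _ in common) / len(common)
    mj = sum(b for _, b in common) / len(common)
    num = sum((a - mi) * (b - mj) for a, b in common)
    den = sqrt(sum((a - mi) ** 2 for a, _ in common) *
               sum((b - mj) ** 2 for _, b in common))
    return num / den

# Hypothetical rating vectors (None = item not rated).
user_i = [5, 3, None, 4]
user_j = [4, 2, 5, 3]
print(pearson(user_i, user_j))  # 1.0: same preference ordering on shared items
```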
Example (Collaborative Filtering)
[Ratings-matrix example shown on the slide.]
Distance-weighted k-NN
We might want to weight nearer neighbors more heavily.
For discrete-valued target functions:

f̂(x_q) = argmax_{v∈V} Σ_{i=1}^{k} w_i δ(v, f(x_i)),   where w_i = 1 / d(x_q, x_i)²

and d(x_q, x_i) is the distance between x_q and x_i (δ(a, b) = 1 if a = b, else 0).
For continuous functions:

f̂(x_q) = ( Σ_{i=1}^{k} w_i f(x_i) ) / ( Σ_{i=1}^{k} w_i ),   where w_i = 1 / d(x_q, x_i)²
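The continuous case, with the weighting w_i = 1 / d(x_q, x_i)², can be sketched as follows (names and data are illustrative):

```python
def weighted_knn_regress(query, examples, values, k):
    """f_hat(x_q) = sum_i w_i f(x_i) / sum_i w_i with w_i = 1 / d(x_q, x_i)^2.
    If the query coincides with a training point, return its value directly."""
    dist2 = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    ranked = sorted(zip(examples, values), key=lambda e: dist2(query, e[0]))[:k]
    if dist2(query, ranked[0][0]) == 0.0:
        return ranked[0][1]
    weights = [1.0 / dist2(query, x) for x, _ in ranked]
    return sum(w * v for w, (_, v) in zip(weights, ranked)) / sum(weights)

# Hypothetical 1-D data: f(0) = 0, f(2) = 4.
X = [(0.0,), (2.0,)]
f = [0.0, 4.0]
# Query at x=0.5: weights 1/0.25 and 1/2.25, so the nearer point dominates.
print(round(weighted_knn_regress((0.5,), X, f, 2), 3))  # 0.4
```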
Curse of Dimensionality
Imagine instances described by 20 attributes, of which only 2 are relevant to the target function: instances that have identical values for the two relevant attributes may nevertheless be distant from one another in the 20-dimensional space.
Curse of dimensionality: nearest neighbor is easily misled when X is high-dimensional (compare to decision trees, which select relevant attributes).
One approach: weight each attribute differently, using the training data:
1. Stretch the j-th axis by weight z_j, where z_1, …, z_n are chosen to minimize prediction error
2. Use cross-validation to automatically choose the weights z_1, …, z_n
3. Note that setting z_j to zero eliminates dimension j altogether
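A toy sketch of choosing axis weights by leave-one-out 1-NN accuracy; the candidate weight grid and the data are illustrative assumptions:

```python
from itertools import product

def loo_accuracy(weights, X, y):
    """Leave-one-out 1-NN accuracy after stretching axis j by weights[j]."""
    def d2(a, b):
        return sum(w * w * (u - v) ** 2 for w, u, v in zip(weights, a, b))
    hits = 0
    for i, (xi, yi) in enumerate(zip(X, y)):
        rest = [(x, lab) for j, (x, lab) in enumerate(zip(X, y)) if j != i]
        nearest = min(rest, key=lambda e: d2(xi, e[0]))
        hits += nearest[1] == yi
    return hits / len(X)

# Toy data: only the first attribute matters; the second is noise.
X = [(0, 9), (1, 0), (0, 1), (1, 8), (0, 5), (1, 4)]
y = ["a", "b", "a", "b", "a", "b"]
best = max(product([0.0, 1.0], repeat=2), key=lambda w: loo_accuracy(w, X, y))
print(best)  # (1.0, 0.0): zero weight eliminates the irrelevant axis
```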
Locally-weighted Regression
Basic idea: k-NN forms a local approximation to f for each query point x_q. Why not form an explicit approximation f̂(x) for the region surrounding x_q?
- Fit a linear function to the k nearest neighbors
- Or fit a quadratic, ...
- Thus producing a "piecewise approximation" to f
[Figure: training data, a single simple-regression fit f1, and locally-weighted regression fits f2, f3, f4; predicted values from locally weighted (piecewise) regression vs. simple regression.]
Several choices of error to minimize, e.g. the squared error over the k nearest neighbors.
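A sketch of locally-weighted regression in 1-D: fit an ordinary least-squares line to the k nearest neighbours of x_q (minimising squared error over those neighbours) and evaluate it at x_q; all names and the data are illustrative:

```python
def local_linear_predict(xq, xs, ys, k):
    """Fit a least-squares line to the k training points nearest x_q,
    then evaluate that local line at x_q."""
    nbrs = sorted(zip(xs, ys), key=lambda p: abs(p[0] - xq))[:k]
    n = len(nbrs)
    mx = sum(x for x, _ in nbrs) / n
    my = sum(y for _, y in nbrs) / n
    sxx = sum((x - mx) ** 2 for x, _ in nbrs)
    slope = sum((x - mx) * (y - my) for x, y in nbrs) / sxx
    return my + slope * (xq - mx)

# Hypothetical noiseless data from f(x) = 2x + 1: the local line is exact.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
print(local_linear_predict(2.5, xs, ys, 3))  # 6.0 (= 2*2.5 + 1)
```

On curved data the local line only approximates f near x_q, which is exactly the "piecewise approximation" idea above.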
Radial Basis Function Networks
A ‘global’ approximation to the target function, formed as a linear combination of local kernel functions.
- Used, e.g., for image classification
- A different kind of neural network
- Closely related to distance-weighted regression, but “eager” instead of “lazy”
Radial Basis Function Networks

f̂(x) = w_0 + Σ_{u=1}^{k} w_u K_u( d(x_u, x) )

where the a_i(x) are the attributes describing instance x (the inputs of the network), and one common choice for K_u(d(x_u, x)) is the Gaussian

K_u( d(x_u, x) ) = exp( − d²(x_u, x) / (2σ_u²) )
Training Radial Basis Function Networks
Question 1: what centres x_u to use for each kernel function K_u(d(x_u, x))?
- Scatter them uniformly throughout the instance space
- Or use the training instances themselves (reflects the instance distribution)
Question 2: how to train the weights (assuming Gaussian K_u)?
- First choose the variance (and perhaps the mean) for each K_u, e.g. using EM
- Then hold the K_u fixed and train the linear output layer (efficient methods exist to fit a linear function)
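A sketch of this training recipe: centre one Gaussian on each training instance, fix the kernels, and solve for the output-layer weights (a square linear system here; the bias weight w_0 is omitted for brevity, and all names are assumptions):

```python
from math import exp

def gaussian(d2, sigma=1.0):
    """K_u(d) = exp(-d^2 / (2 * sigma^2)), taking the squared distance d2."""
    return exp(-d2 / (2.0 * sigma ** 2))

def train_rbf(xs, ys, sigma=1.0):
    """Eager training: one Gaussian per training instance, then solve the
    square linear system phi * w = ys for the output-layer weights."""
    n = len(xs)
    phi = [[gaussian((xi - xu) ** 2, sigma) for xu in xs] for xi in xs]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    a = [row[:] + [y] for row, y in zip(phi, ys)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            a[r] = [x - factor * v for x, v in zip(a[r], a[col])]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (a[r][n] - sum(a[r][c] * w[c] for c in range(r + 1, n))) / a[r][r]
    return w

def rbf_predict(x, xs, w, sigma=1.0):
    return sum(wu * gaussian((x - xu) ** 2, sigma) for wu, xu in zip(w, xs))

xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 4.0, 9.0]          # samples of f(x) = x^2
w = train_rbf(xs, ys)
print(round(rbf_predict(2.0, xs, w), 6))  # 4.0: interpolates the training data
```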
Case-based Reasoning
Instance-based learning can be applied even when X ≠ R^n; in this case, however, we need different “distance” metrics. For example, case-based reasoning is instance-based learning applied to instances with symbolic logic descriptions:
((user-complaint error53-on-shutdown)
 (cpu-model PowerPC)
 (operating-system Windows)
 (network-connection PCIA)
 (memory 48meg)
 (installed-applications Excel Netscape VirusScan)
 (disk 1gig)
 (likely-cause ???))
Case-based Reasoning (CBR)
- Objects may include complex structural descriptions of cases and adaptation rules
- CBR cannot use Euclidean distance measures; distance measures must instead be defined for these complex objects (e.g. over semantic nets)
- CBR tries to model human problem-solving: it uses past experience (cases) to solve new problems and retains the solutions to those new problems
- CBR is an ongoing area of machine learning research with many applications
CBR Example
Case   Location code   Bedrooms   Recep rooms   Type       Floors   Condition   Price £
 1          8              2           1        terraced      1       poor       20,500
 2          8              2           2        terraced      1       fair       25,000
 3          5              1           2        semi          2       good       48,000
 4          5              1           2        terraced      2       good       41,000

Test instance:
 5          7              2           2        semi          1       poor       ???
How rules are generated
There is no unique way of doing it. Here is one possibility: examine the cases and look for pairs that are almost identical.
- case 1 and case 2 → R1: if recep-rooms changes from 2 to 1, then reduce price by £5,000
- case 3 and case 4 → R2: if type changes from semi to terraced, then reduce price by £7,000
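A hypothetical retrieve-and-adapt sketch for the housing example: retrieve case 1 as a similar stored case and apply R1 and R2 in reverse. The choice of base case and the resulting estimate are illustrative assumptions, not from the slides:

```python
# Retrieved base case (case 1) and the new problem (test instance, case 5),
# keeping only the attributes the two adaptation rules mention.
base_case = {"recep": 1, "type": "terraced", "price": 20500}  # case 1
query     = {"recep": 2, "type": "semi"}                      # case 5

price = base_case["price"]
# R1 reversed: recep rooms changing from 1 to 2 adds 5,000 pounds.
if query["recep"] == 2 and base_case["recep"] == 1:
    price += 5000
# R2 reversed: type changing from terraced to semi adds 7,000 pounds.
if query["type"] == "semi" and base_case["type"] == "terraced":
    price += 7000
print(price)  # 32500
```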
CBR Challenges
- How should cases be represented?
- How should cases be indexed for fast retrieval?
- How can good adaptation heuristics be developed?
- When should old cases be removed?
Sharif University of Technology, Computer Engineering Department, Machine Learning Course
Conclusions
- Instance-based learning
- K-Nearest Neighbor: a simple algorithm
- Locally-weighted Regression
- Radial Basis Function Networks
- Case-based Reasoning: for instance spaces where X ≠ R^n
Any Question?