

SLIDE 1

Instance-based Learning

Hamid R. Rabiee

Spring 2015 http://ce.sharif.edu/courses/93-94/2/ce717-1

SLIDE 2

Agenda

- Instance-based learning
- K-Nearest Neighbor
- Locally-weighted Regression
- Radial Basis Function Networks
- Case-based Reasoning

SLIDE 3

Instance-based Learning

- Key idea: in contrast to learning methods that construct a general, explicit description of the target function when training examples are provided, instance-based learning constructs the target function only when a new instance must be classified.
- Training consists only of storing all training examples ⟨y_j, g(y_j)⟩, where y_j describes the attributes of each instance and g(y_j) denotes its class (or value).
- Examples: K-Nearest Neighbor

SLIDE 4

K-Nearest Neighbor

- Simple 2-D case: each instance is described by only two values (x, y coordinates).
- Given a query instance x_q, take a vote among its k nearest neighbors to decide its class (return the most common value of f among the k nearest training elements to x_q).
- Need to consider:
  1. Similarity (how to calculate distance)
  2. Number (and weight) of similar (near) instances

SLIDE 5

Similarity

- Euclidean distance: more precisely, let an arbitrary instance x be described by the feature vector (set of attributes)

  $\langle a_1(x), a_2(x), \ldots, a_n(x) \rangle$

  where $a_r(x)$ denotes the value of the r-th attribute of instance x. Then the distance between two instances $x_i$ and $x_j$ is defined to be $d(x_i, x_j)$, where

  $d(x_i, x_j) = \sqrt{\sum_{r=1}^{n} \bigl(a_r(x_i) - a_r(x_j)\bigr)^2}$
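To make the distance and the voting rule concrete, here is a minimal k-NN classification sketch (illustrative Python/NumPy, not from the original slides; the function and variable names are ours):

```python
import numpy as np
from collections import Counter

def euclidean(xi, xj):
    # d(x_i, x_j) = sqrt( sum_r (a_r(x_i) - a_r(x_j))^2 )
    return np.sqrt(np.sum((xi - xj) ** 2))

def knn_classify(X_train, y_train, x_query, k=3):
    # distance from the query instance to every stored training instance
    dists = np.array([euclidean(x, x_query) for x in X_train])
    # indices of the k nearest training instances
    nearest = np.argsort(dists)[:k]
    # return the most common class among those k neighbours
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]
```

Training amounts to storing (X_train, y_train); all of the work happens at query time.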

SLIDE 6

Training data

Number  Lines  Line types  Rectangles  Colors  Mondrian?
1       6      1           10          4       No
2       4      2            8          5       No
3       5      2            7          4       Yes
4       5      1            8          4       Yes
5       5      1           10          5       No
6       6      1            8          6       Yes
7       7      1           14          5       No

Test instance:

Number  Lines  Line types  Rectangles  Colors  Mondrian?
8       7      2            9          4       ?

SLIDE 7

Keep data in normalised form

- One way to normalize the data $a_r(x)$ to $a'_r(x)$ is

  $x'_t = \dfrac{x_t - \bar{x}_t}{\sigma_t}$

  where $\bar{x}_t$ is the mean of the t-th attribute and $\sigma_t$ is the standard deviation of the t-th attribute.
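A short sketch of this normalization (illustrative Python; computing the mean and the population standard deviation from the seven training rows only is what reproduces the normalized table on the next slide):

```python
import numpy as np

def normalize(X_train, x_query):
    # x'_t = (x_t - mean_t) / std_t, computed attribute-wise on the training data
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)          # population standard deviation
    return (X_train - mean) / std, (x_query - mean) / std
```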

SLIDE 8

Normalized training data

Number   Lines   Line types  Rectangles  Colors  Mondrian?
1         0.632  -0.632       0.327      -1.021  No
2        -1.581   1.581      -0.588       0.408  No
3        -0.474   1.581      -1.046      -1.021  Yes
4        -0.474  -0.632      -0.588      -1.021  Yes
5        -0.474  -0.632       0.327       0.408  No
6         0.632  -0.632      -0.588       1.837  Yes
7         1.739  -0.632       2.157       0.408  No

Test instance:

Number   Lines   Line types  Rectangles  Colors  Mondrian?
8         1.739   1.581      -0.131      -1.021  ?

SLIDE 9

Distances of test instance from training data

Example  Distance of test from example  Mondrian?
1        2.517                          No
2        3.644                          No
3        2.395                          Yes
4        3.164                          Yes
5        3.472                          No
6        3.808                          Yes
7        3.490                          No
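As a check, the smallest distance in the table (example 3, the 1-nearest neighbor) can be reproduced from the normalized values of the test instance and example 3:

$d(x_8, x_3) = \sqrt{(1.739 + 0.474)^2 + (1.581 - 1.581)^2 + (-0.131 + 1.046)^2 + (-1.021 + 1.021)^2} = \sqrt{2.213^2 + 0.915^2} \approx 2.395$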

Classification:

1-NN  Yes
3-NN  Yes
5-NN  No
7-NN  No

SLIDE 10

Lazy vs Eager Learning

- The k-NN method does not form an explicit hypothesis regarding the target classification function. It simply computes the classification for each new query instance as needed.
- Implied hypothesis: the following diagram (Voronoi diagram) shows the shape of the implied hypothesis about the decision surface that can be derived for a simple 1-NN case.
- The decision surface is a combination of convex polyhedra surrounding each of the training examples. For every training example, the polyhedron indicates the set of query points whose classification will be completely determined by that training example.

SLIDE 11

Continuous vs Discrete-valued Functions (Classes)

- k-NN works well for discrete-valued target functions.
- Furthermore, the idea can be extended to continuous (real) valued functions. In this case we take the mean of the f values of the k nearest neighbors:

  $\hat{f}(x_q) = \dfrac{1}{k} \sum_{i=1}^{k} f(x_i)$
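A minimal sketch of this regression variant (illustrative Python; NumPy arrays are assumed and the function name is ours):

```python
import numpy as np

def knn_regress(X_train, y_train, x_query, k=3):
    # f_hat(x_q) = mean of f over the k nearest training instances
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()
```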

SLIDE 12

When to Consider Nearest Neighbor?

- Instances map to points in R^n
- A moderate number of attributes (e.g., fewer than 20 attributes per instance)
- Lots of training data
- When the target function is complex but can be approximated by separate, local, simple approximations

Advantages:
- Training is very fast
- Can learn complex target functions

Disadvantages:
- Slow at query time
- Easily fooled by irrelevant attributes

SLIDE 13

Collaborative Filtering (AKA Recommender Systems)

- Problem:
  - Predict whether someone will like a webpage, posting, movie, book, etc.
- Previous approach:
  - Look at the content
- Collaborative filtering:
  - Look at what similar users liked
  - Similar users = similar likes and dislikes

SLIDE 14

Collaborative Filtering

- Represent each user by a vector of ratings
- Two types:
  - Yes/No
  - Explicit ratings (e.g., 0 - ***)
- Predict rating
- Similarity (Pearson coefficient)

SLIDE 15

Collaborative Filtering

- Primitive version (see the sketch below):
  - N_i can be the whole database, or only the k nearest neighbors
  - R_jk: rating of user j on item k
  - R̄_j: average of all of user j's ratings
  - The summation in the Pearson coefficient is over all items rated by both users
- In principle, any prediction method can be used for collaborative filtering
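One common neighborhood-based prediction rule consistent with the bullets above predicts user i's rating of item k as user i's average rating plus a Pearson-weighted average of the neighbors' deviations from their own averages. A rough sketch (illustrative Python, not taken from the slides; NaN marks an unrated item, and the function names are ours):

```python
import numpy as np

def pearson(u, v):
    # similarity computed only over items rated by both users
    both = ~np.isnan(u) & ~np.isnan(v)
    if both.sum() < 2:
        return 0.0
    du, dv = u[both] - u[both].mean(), v[both] - v[both].mean()
    denom = np.sqrt((du ** 2).sum() * (dv ** 2).sum())
    return 0.0 if denom == 0 else float((du * dv).sum() / denom)

def predict_rating(R, i, item, k=5):
    # N_i: the k users most similar to user i who have rated the item
    sims = [(pearson(R[i], R[j]), j) for j in range(len(R))
            if j != i and not np.isnan(R[j, item])]
    top = sorted(sims, reverse=True)[:k]
    num = sum(s * (R[j, item] - np.nanmean(R[j])) for s, j in top)
    den = sum(abs(s) for s, _ in top)
    base = np.nanmean(R[i])                  # user i's average rating
    return base if den == 0 else base + num / den
```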

SLIDE 16

Example (Collaborative Filtering)

SLIDE 17

Distance-weighted k-NN

- Might want to weigh nearer neighbors more heavily:

  $\hat{f}(x_q) = \operatorname{argmax}_{v \in V} \sum_{i=1}^{k} w_i \, \delta(v, f(x_i)) \qquad \text{where } w_i \equiv \dfrac{1}{d(x_q, x_i)^2}$

  and $d(x_q, x_i)$ is the distance between $x_q$ and $x_i$.

- For continuous functions:

  $\hat{f}(x_q) = \dfrac{\sum_{i=1}^{k} w_i f(x_i)}{\sum_{i=1}^{k} w_i} \qquad \text{where } w_i \equiv \dfrac{1}{d(x_q, x_i)^2}$
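A sketch covering both weighted rules above (illustrative Python, not from the slides; an exact match, d = 0, is handled by returning that training value directly):

```python
import numpy as np
from collections import defaultdict

def weighted_knn(X_train, y_train, x_query, k=3, classify=True):
    d = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    nearest = np.argsort(d)[:k]
    if d[nearest[0]] == 0:                    # exact match: use its value directly
        return y_train[nearest[0]]
    w = 1.0 / d[nearest] ** 2                 # w_i = 1 / d(x_q, x_i)^2
    if classify:
        votes = defaultdict(float)
        for wi, yi in zip(w, y_train[nearest]):
            votes[yi] += wi                   # distance-weighted vote
        return max(votes, key=votes.get)
    return (w * y_train[nearest]).sum() / w.sum()   # distance-weighted mean
```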

SLIDE 18

Curse of Dimensionality

- Imagine instances described by 20 attributes, of which only 2 are relevant to the target function: instances that have identical values for the two relevant attributes may nevertheless be distant from one another in the 20-dimensional space.
- Curse of dimensionality: nearest neighbor is easily misled when X is high-dimensional (compare to decision trees).
- One approach: weight each attribute differently (using the training data); see the sketch below.
  1. Stretch the j-th axis by weight z_j, where z_1, ..., z_n are chosen to minimize prediction error
  2. Use cross-validation to automatically choose the weights z_1, ..., z_n
  3. Note that setting z_j to zero eliminates dimension j altogether
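A minimal sketch of scoring one candidate weight vector z by leave-one-out 1-NN error (illustrative Python; in practice z would be searched over, e.g. by cross-validation):

```python
import numpy as np

def loo_error_1nn(X, y, z):
    # stretch the j-th axis by z_j; z_j = 0 eliminates dimension j altogether
    Xz = X * np.asarray(z)
    mistakes = 0
    for i in range(len(Xz)):
        d = np.sqrt(((Xz - Xz[i]) ** 2).sum(axis=1))
        d[i] = np.inf                    # leave the query instance out
        mistakes += y[d.argmin()] != y[i]
    return mistakes / len(Xz)
```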

SLIDE 19

Locally-weighted Regression

- Basic idea:
  - k-NN forms a local approximation to f for each query point x_q
  - Why not form an explicit approximation f(x) for the region surrounding x_q?
    - Fit a linear function to the k nearest neighbors (see the sketch below)
    - Fit a quadratic, ...
    - Thus producing a "piecewise approximation" to f
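A minimal sketch of the linear version: fit an ordinary least-squares line to the k nearest neighbors of the query and evaluate it at the query point (illustrative Python; the names are ours):

```python
import numpy as np

def lwr_predict(X_train, y_train, x_query, k=5):
    # find the k training instances nearest the query point
    d = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    nearest = np.argsort(d)[:k]
    # fit a linear function (with intercept) to just those neighbours
    A = np.c_[np.ones(k), X_train[nearest]]
    coef, *_ = np.linalg.lstsq(A, y_train[nearest], rcond=None)
    # evaluate the local linear fit at the query point
    return float(np.r_[1.0, np.atleast_1d(x_query)] @ coef)
```

Repeating this for every query point yields the piecewise approximation illustrated on the next slide.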

SLIDE 20

[Figure: training data with predicted values from simple regression (f1) and from locally-weighted (piecewise) regression (f2, f3, f4)]

SLIDE 21

- Several choices of error to minimize, e.g.:
  - Squared error over the k nearest neighbors
  - Distance-weighted squared error over all neighbors
  - ...

[Figure: simple regression f1 and locally-weighted regressions f2, f3, f4, as on the previous slide]

SLIDE 22

Radial Basis Function Networks

- 'Global' approximation to the target function, in terms of a linear combination of 'local' approximations
- Used, e.g., for image classification
- A different kind of neural network
- Closely related to distance-weighted regression, but 'eager' instead of 'lazy'

SLIDE 23

Radial Basis Function Networks

- The learned function is a weighted sum of kernel functions K_u (see below), where a_i(x) are the attributes describing instance x.
- One common choice for K_u(d(x_u, x)) is a Gaussian (see below).
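A standard form of this approximation, consistent with the description above and with the Gaussian choice discussed on the next slide (stated here as a reconstruction, not verbatim from the slide), is

$\hat{f}(x) = w_0 + \sum_{u=1}^{k} w_u \, K_u\bigl(d(x_u, x)\bigr), \qquad K_u\bigl(d(x_u, x)\bigr) = e^{-\frac{1}{2\sigma_u^2} d^2(x_u, x)}$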

SLIDE 24

Training Radial Basis Function Networks

- Question 1: What x_u should be used for each kernel function K_u(d(x_u, x))?
  - Scatter them uniformly throughout the instance space
  - Or use the training instances (reflects the instance distribution)
- Question 2: How to train the weights (assuming Gaussian K_u)?
  - First choose the variance (and perhaps the mean) for each K_u, e.g. using EM
  - Then hold the K_u fixed and train the linear output layer; efficient methods exist to fit a linear function (see the sketch below)
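A minimal sketch of that second step, with the Gaussian kernels held fixed and the output weights fitted by linear least squares (illustrative Python; the centres and sigma are assumed to have been chosen already, e.g. as training instances):

```python
import numpy as np

def gaussian_features(X, centers, sigma):
    # K_u(d(x_u, x)) = exp(-d^2(x_u, x) / (2 * sigma_u^2)) for every instance/centre pair
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * np.asarray(sigma) ** 2))

def train_output_weights(X_train, y_train, centers, sigma):
    # with the kernels fixed, the weights w_0 ... w_k solve a linear least-squares problem
    Phi = np.c_[np.ones(len(X_train)), gaussian_features(X_train, centers, sigma)]
    w, *_ = np.linalg.lstsq(Phi, y_train, rcond=None)
    return w

def rbf_predict(X, centers, sigma, w):
    Phi = np.c_[np.ones(len(X)), gaussian_features(X, centers, sigma)]
    return Phi @ w
```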

SLIDE 25

Case-based Reasoning

- Instance-based learning can be applied even when X ≠ R^n.
- However, in this case we need different "distance" metrics.
- For example, case-based reasoning is instance-based learning applied to instances with symbolic logic descriptions:

  ((user-complaint error53-on-shutdown)
   (cpu-model PowerPC)
   (operating-system Windows)
   (network-connection PCIA)
   (memory 48meg)
   (installed-applications Excel Netscape VirusScan)
   (disk 1gig)
   (likely-cause ???))

SLIDE 26

Case-based Reasoning (CBR)

- Objects may include complex structural descriptions of cases and adaptation rules.
- CBR cannot use Euclidean distance measures; distance measures must instead be defined for those complex objects (e.g. semantic nets).
- CBR tries to model human problem-solving:
  - uses past experience (cases) to solve new problems
  - retains solutions to new problems
- CBR is an ongoing area of machine learning research with many applications.

SLIDE 27

CBR Example

Case  Location code  Bedrooms  Recep rooms  Type      Floors  Condition  Price (£)
1     8              2         1            terraced  1       poor       20,500
2     8              2         2            terraced  1       fair       25,000
3     5              1         2            semi      2       good       48,000
4     5              1         2            terraced  2       good       41,000

Test instance:

Case  Location code  Bedrooms  Recep rooms  Type      Floors  Condition  Price (£)
5     7              2         2            semi      1       poor       ???

SLIDE 28

How rules are generated

- There is no unique way of doing it. Here is one possibility:
- Examine cases and look for ones that are almost identical:
  - Case 1 and case 2 give R1: if recep-rooms changes from 2 to 1, then reduce price by £5,000
  - Case 3 and case 4 give R2: if type changes from semi to terraced, then reduce price by £7,000
- CBR challenges:
  - How should cases be represented?
  - How should cases be indexed for fast retrieval?
  - How can good adaptation heuristics be developed?
  - When should old cases be removed?

SLIDE 29

Sharif University of Technology, Computer Engineering Department, Machine Learning Course


Conclusions

- Instance-based learning
- K-Nearest Neighbor: a simple algorithm
- Locally-weighted Regression
- Radial Basis Function Networks
- Case-based Reasoning: for instances where X ≠ R^n

SLIDE 30

Any Questions?

End of Lecture 15. Thank you!

Spring 2015

http://ce.sharif.edu/courses/93-94/2/ce717-1