
Machine Learning: Instance Based Learning - Hamid Beigy, Sharif University of Technology (PowerPoint PPT presentation)



  1. Machine Learning: Instance Based Learning. Hamid Beigy, Sharif University of Technology, Fall 1396.

  2. Table of contents
     1. Introduction
     2. Nearest neighbor algorithms
     3. Distance-weighted nearest neighbor algorithms
     4. Locally weighted regression
     5. Finding KNN(x) efficiently

  3. Outline: Introduction; Nearest neighbor algorithms; Distance-weighted nearest neighbor algorithms; Locally weighted regression; Finding KNN(x) efficiently.

  4. Introduction
     The methods described before, such as decision trees, Bayesian classifiers, and boosting, first find a hypothesis and then use this hypothesis to classify new test examples. These methods are called eager learning.
     Instance based learning algorithms such as k-NN store all of the training examples and classify a new example $x$ by finding the training example $(x_i, y_i)$ that is nearest to $x$ according to some distance metric.
     Instance based classifiers do not explicitly compute decision boundaries. However, the boundaries form a subset of the Voronoi diagram of the training data.

  5. Outline: Introduction; Nearest neighbor algorithms; Distance-weighted nearest neighbor algorithms; Locally weighted regression; Finding KNN(x) efficiently.

  6. Nearest neighbor algorithms
     Fix $k \ge 1$ and let $S = \{(x_1, t_1), \ldots, (x_N, t_N)\}$ be a labeled sample with $t_i \in \{0, 1\}$. For any test example $x$, the k-NN rule returns the hypothesis $h$ defined by
       $h(x) = \mathbb{I}\left[\sum_{i:\, t_i = 1} w_i > \sum_{i:\, t_i = 0} w_i\right]$,
     where the weights $w_1, \ldots, w_N$ are chosen such that $w_i = \frac{1}{k}$ if $x_i$ is among the $k$ nearest neighbors of $x$, and $w_i = 0$ otherwise.
     The decision boundaries form a subset of the Voronoi diagram of the training data.
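A minimal NumPy sketch of this rule (not from the slides; the function name and the use of the Euclidean distance are illustrative choices):

```python
import numpy as np

def knn_classify(X_train, t_train, x, k=3):
    """Classify x by the weighted vote h(x) = I[sum of class-1 weights > sum of class-0 weights].

    X_train: (N, D) array of training inputs.
    t_train: (N,) array of binary labels in {0, 1}.
    x: (D,) query point.
    """
    # Euclidean distance from x to every training example.
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k closest examples; each carries weight w_i = 1/k.
    nn_idx = np.argsort(dists)[:k]
    # Comparing the counts is equivalent to comparing the summed 1/k weights.
    return int(np.sum(t_train[nn_idx] == 1) > np.sum(t_train[nn_idx] == 0))
```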

  7. Nearest neighbor algorithms
     The k-NN rule only requires:
     - an integer $k$,
     - a set of labeled examples $S$,
     - a metric to measure closeness.
     For all points $x, y, z$, a metric $d$ must satisfy the following properties:
     - Non-negativity: $d(x, y) \ge 0$.
     - Reflexivity: $d(x, y) = 0 \Leftrightarrow x = y$.
     - Symmetry: $d(x, y) = d(y, x)$.
     - Triangle inequality: $d(x, y) + d(y, z) \ge d(x, z)$.
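As an illustration only (not from the slides), the following hypothetical helper spot-checks these four axioms numerically for the Euclidean distance on a few random points:

```python
import numpy as np

def check_metric_axioms(d, points, tol=1e-12):
    """Spot-check the four metric axioms for a distance function d on sample points."""
    for x in points:
        for y in points:
            assert d(x, y) >= -tol                          # non-negativity
            assert abs(d(x, y) - d(y, x)) <= tol            # symmetry
            assert (d(x, y) <= tol) == np.allclose(x, y)    # reflexivity
            for z in points:
                assert d(x, y) + d(y, z) >= d(x, z) - tol   # triangle inequality

euclidean = lambda x, y: np.linalg.norm(x - y)
check_metric_axioms(euclidean, np.random.randn(10, 3))
```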

  8. Distance functions
     The Minkowski distance for D-dimensional examples is the $L_p$ norm:
       $L_p(x, y) = \left(\sum_{i=1}^{D} |x_i - y_i|^p\right)^{1/p}$.
     The Euclidean distance is the $L_2$ norm:
       $L_2(x, y) = \left(\sum_{i=1}^{D} |x_i - y_i|^2\right)^{1/2}$.
     The Manhattan or city block distance is the $L_1$ norm:
       $L_1(x, y) = \sum_{i=1}^{D} |x_i - y_i|$.
     The $L_\infty$ norm is the maximum of the distances along the axes:
       $L_\infty(x, y) = \max_i |x_i - y_i|$.
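A straightforward NumPy sketch of these distance functions (the function names are illustrative, not from the slides):

```python
import numpy as np

def minkowski(x, y, p):
    """L_p (Minkowski) distance between two D-dimensional vectors."""
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

def euclidean(x, y):          # L_2 norm
    return minkowski(x, y, 2)

def manhattan(x, y):          # L_1 norm (city block)
    return np.sum(np.abs(x - y))

def chebyshev(x, y):          # L_infinity norm
    return np.max(np.abs(x - y))
```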

  9. Nearest neighbor algorithm for regression
     The k-NN algorithm can be adapted to approximate a continuous-valued target function.
     Instead of taking a majority vote, we take the mean of the target values of the k nearest training examples:
       $\hat{f}(x) = \frac{1}{k} \sum_{i=1}^{k} f(x_i)$,
     where $x_1, \ldots, x_k$ denote the k nearest neighbors of $x$.
     [Figure: the effect of k on the performance of the algorithm. Pictures are taken from P. Rai's slides.]
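A minimal NumPy sketch of k-NN regression under these definitions (the function name and the Euclidean metric are assumptions, not from the slides):

```python
import numpy as np

def knn_regress(X_train, f_train, x, k=3):
    """Predict f(x) as the mean target value of the k nearest training examples."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances to x
    nn_idx = np.argsort(dists)[:k]                # indices of the k nearest neighbors
    return np.mean(f_train[nn_idx])               # \hat{f}(x) = (1/k) * sum_i f(x_i)
```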

  10. Nearest neighbor algorithms
     The k-NN algorithm is a lazy learning algorithm:
     - It defers finding a hypothesis until a test example $x$ arrives.
     - For a test example $x$, it uses the stored training data directly.
     - It discards the found hypothesis and any intermediate results after classifying $x$.
     This strategy is opposed to an eager learning algorithm, which
     - finds a hypothesis $h$ using the training set, and
     - uses the found hypothesis $h$ to classify any test example $x$.
     Trade-offs:
     - During the training phase, lazy algorithms have lower computational costs than eager algorithms.
     - During the testing phase, lazy algorithms have greater storage requirements and higher computational costs.
     What is the inductive bias of k-NN?

  11. Properties of nearest neighbor algorithms
     Advantages:
     - Analytically tractable.
     - Simple implementation.
     - Uses local information, which results in highly adaptive behavior.
     - Its parallel implementation is very easy.
     - Nearly optimal in the large-sample limit ($N \to \infty$): $E(\text{Bayes}) \le E(\text{NN}) \le 2\,E(\text{Bayes})$.
     Disadvantages:
     - Large storage requirements.
     - High computational cost during testing.
     - Highly susceptible to irrelevant features.
     Large values of k:
     - result in smoother decision boundaries, and
     - provide more accurate probabilistic information.
     However, large values of k also:
     - increase the computational cost, and
     - destroy the locality of the estimation.
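To see the effect of k empirically, a small experiment of this kind can be run with scikit-learn; this example, including the choice of the Iris data and 5-fold cross-validation, is an assumption and not part of the slides:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in (1, 5, 15, 45):
    clf = KNeighborsClassifier(n_neighbors=k)      # k-NN with the default Euclidean metric
    scores = cross_val_score(clf, X, y, cv=5)      # 5-fold cross-validated accuracy
    print(f"k = {k:2d}: mean accuracy = {scores.mean():.3f}")
```

Very small k tends to give noisy boundaries, while very large k smooths them at the cost of locality, which is the trade-off described above.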

  12. Outline: Introduction; Nearest neighbor algorithms; Distance-weighted nearest neighbor algorithms; Locally weighted regression; Finding KNN(x) efficiently.

  13. Distance-weighted nearest neighbor algorithms
     One refinement of k-NN is to weight the contribution of each of the k neighbors according to its distance to the query point $x$.
     For two-class classification:
       $h(x) = \mathbb{I}\left[\sum_{i:\, t_i = 1} w_i > \sum_{i:\, t_i = 0} w_i\right]$, where $w_i = \frac{1}{d(x, x_i)^2}$.
     For C-class classification:
       $h(x) = \underset{c \in C}{\operatorname{argmax}} \sum_{i=1}^{k} w_i \, \delta(c, t_i)$.
     For regression:
       $\hat{f}(x) = \frac{\sum_{i=1}^{k} w_i f(x_i)}{\sum_{i=1}^{k} w_i}$.
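A NumPy sketch of both distance-weighted rules (the function names and the small epsilon guarding against division by zero are assumptions, not from the slides):

```python
import numpy as np

def weighted_knn_predict(X_train, t_train, x, k=5, classes=(0, 1)):
    """Distance-weighted k-NN classification: each neighbor votes with weight 1/d^2."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nn_idx = np.argsort(dists)[:k]
    w = 1.0 / (dists[nn_idx] ** 2 + 1e-12)        # w_i = 1 / d(x, x_i)^2
    # argmax over classes of the summed weights of neighbors carrying that label
    votes = {c: np.sum(w[t_train[nn_idx] == c]) for c in classes}
    return max(votes, key=votes.get)

def weighted_knn_regress(X_train, f_train, x, k=5):
    """Distance-weighted k-NN regression: weighted mean of the neighbors' targets."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nn_idx = np.argsort(dists)[:k]
    w = 1.0 / (dists[nn_idx] ** 2 + 1e-12)
    return np.sum(w * f_train[nn_idx]) / np.sum(w)
```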

  14. Outline: Introduction; Nearest neighbor algorithms; Distance-weighted nearest neighbor algorithms; Locally weighted regression; Finding KNN(x) efficiently.

  15. Locally weighted regression
     In locally weighted regression (LWR), we use a linear model for the local approximation $\hat{f}$:
       $\hat{f}(x) = w_0 + w_1 x_1 + w_2 x_2 + \ldots + w_D x_D$.
     Suppose we aim to minimize the total squared error
       $E = \frac{1}{2} \sum_{x \in S} (f(x) - \hat{f}(x))^2$.
     Using gradient descent, the weight update is
       $\Delta w_j = \eta \sum_{x \in S} (f(x) - \hat{f}(x)) \, x_j$,
     where $\eta$ is a small positive constant (the learning rate).
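A minimal sketch of this global gradient-descent fit, assuming NumPy, a zero initialization, and a fixed number of iterations (all assumptions, not from the slides):

```python
import numpy as np

def fit_linear_gd(X, f, eta=0.01, n_iters=1000):
    """Fit \\hat{f}(x) = w_0 + w_1 x_1 + ... + w_D x_D by batch gradient descent
    on E = 1/2 * sum_{x in S} (f(x) - \\hat{f}(x))^2."""
    N, D = X.shape
    Xb = np.hstack([np.ones((N, 1)), X])      # prepend x_0 = 1 so w_0 is the bias term
    w = np.zeros(D + 1)
    for _ in range(n_iters):
        residual = f - Xb @ w                 # f(x) - \hat{f}(x) for every example
        w += eta * Xb.T @ residual            # Delta w_j = eta * sum residual * x_j
        # eta must be small enough for this batch update to converge on the given data
    return w
```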

  16. Locally weighted regression I
     How shall we modify this procedure to obtain a local approximation rather than a global one?
     The simple way is to redefine the error criterion $E$ to emphasize fitting the local training examples.
     Three possible criteria are given below. Note that we write the error $E(x_q)$ to emphasize that the error is now defined as a function of the query point $x_q$.
     1. Minimize the squared error over just the k nearest neighbors:
        $E_1(x_q) = \frac{1}{2} \sum_{x \in KNN(x_q)} (f(x) - \hat{f}(x))^2$.
     2. Minimize the squared error over the whole set $S$ of training examples, weighting the error of each training example by some decreasing function $K$ of its distance from $x_q$:
        $E_2(x_q) = \frac{1}{2} \sum_{x \in S} (f(x) - \hat{f}(x))^2 \, K(d(x_q, x))$.
     3. Combine 1 and 2:
        $E_3(x_q) = \frac{1}{2} \sum_{x \in KNN(x_q)} (f(x) - \hat{f}(x))^2 \, K(d(x_q, x))$.

  17. Locally weighted regression II
     If we choose the third criterion above and re-derive the gradient descent rule, we obtain
       $\Delta w_j = \eta \sum_{x \in KNN(x_q)} K(d(x_q, x)) \, (f(x) - \hat{f}(x)) \, x_j$,
     where $\eta$ is a small positive constant (the learning rate).
     Criterion 2 is perhaps the most aesthetically pleasing because it allows every training example to have an impact on the classification of $x_q$.
     However, this approach requires computation that grows linearly with the number of training examples.
     Criterion 3 is a good approximation to criterion 2 and has the advantage that its computational cost is independent of the total number of training examples; its cost depends only on the number k of neighbors considered.
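A sketch of criterion 3 in NumPy, assuming a Gaussian kernel $K(d) = \exp(-d^2 / 2\sigma^2)$ with a fixed bandwidth; the kernel choice, the bandwidth, and the function name are assumptions, not from the slides:

```python
import numpy as np

def lwr_predict(X, f, x_q, k=10, eta=0.01, n_iters=500, bandwidth=1.0):
    """Locally weighted regression at query x_q using criterion 3:
    fit a linear model by gradient descent on the k nearest neighbors,
    each weighted by K(d) = exp(-d^2 / (2 * bandwidth^2))."""
    dists = np.linalg.norm(X - x_q, axis=1)
    nn = np.argsort(dists)[:k]                              # KNN(x_q)
    K = np.exp(-dists[nn] ** 2 / (2 * bandwidth ** 2))      # kernel weights K(d(x_q, x))
    Xb = np.hstack([np.ones((k, 1)), X[nn]])                # prepend x_0 = 1 for the bias w_0
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        residual = f[nn] - Xb @ w                           # f(x) - \hat{f}(x) on the neighbors
        w += eta * Xb.T @ (K * residual)                    # Delta w_j from criterion 3
    return np.append(1.0, x_q) @ w                          # \hat{f}(x_q)
```

The local fit is recomputed for every query point, which is exactly the lazy-learning behavior described earlier: nothing is learned until $x_q$ arrives.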

  18. Outline: Introduction; Nearest neighbor algorithms; Distance-weighted nearest neighbor algorithms; Locally weighted regression; Finding KNN(x) efficiently.
