

slide-1
SLIDE 1

CZECH TECHNICAL UNIVERSITY IN PRAGUE

Faculty of Electrical Engineering Department of Cybernetics


Nearest neighbors. Kernel functions, SVM. Decision trees.

Petr Pošík, Czech Technical University in Prague, Faculty of Electrical Engineering, Department of Cybernetics
slide-2
SLIDE 2

Nearest neighbors



slide-4
SLIDE 4

Method of k nearest neighbors


■ Simple, non-parametric, instance-based method for supervised learning, applicable for both classification and regression.
■ Do not confuse k-NN with
  ■ k-means (a clustering algorithm),
  ■ NN (neural networks).
■ Training: Just remember the whole training dataset T.
■ Prediction: To get the model prediction for a new data point x (the query),
  ■ find the set N_k(x) of the k nearest neighbors of x in T using a certain distance measure,
  ■ in case of classification, determine the predicted class y = h(x) as the majority vote among the nearest neighbors, i.e.
    y = h(x) = arg max_y ∑_{(x′,y′)∈N_k(x)} I(y′ = y),
    where I(P) is an indicator function (returns 1 if P is true, 0 otherwise),
  ■ in case of regression, determine the predicted value y = h(x) as the average of the values y′ of the nearest neighbors, i.e.
    y = h(x) = (1/k) ∑_{(x′,y′)∈N_k(x)} y′.
■ What is the influence of k on the final model?
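A minimal sketch of the training and prediction steps described above (illustrative code; Euclidean distance and a NumPy data layout are assumptions, not part of the slides):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3, classification=True):
    """k-NN prediction: find the k nearest neighbors of the query,
    then take a majority vote (classification) or an average (regression)."""
    # "Training" is just remembering (X_train, y_train); only prediction does work.
    dists = np.linalg.norm(X_train - x_query, axis=1)   # distances to all training points
    nn_idx = np.argsort(dists)[:k]                      # indices of the k nearest neighbors
    if classification:
        return Counter(y_train[nn_idx]).most_common(1)[0][0]   # majority vote
    return y_train[nn_idx].mean()                               # average of neighbor values
```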

slide-5
SLIDE 5

Question


The influence of method parameters on model flexibility:

■ Polynomial models: the larger the degree of the polynomial, the higher the model flexibility.
■ Basis expansion: the more basis functions we derive, the higher the model flexibility.
■ Regularization: the higher the coefficient-size penalty, the lower the model flexibility.

What is the influence of the number of neighbours k on the flexibility of k-NN?
A The flexibility of k-NN does not depend on k.
B The flexibility of k-NN grows with growing k.
C The flexibility of k-NN drops with growing k.
D The flexibility of k-NN first drops with growing k, then it grows again.


slide-7
SLIDE 7

k-NN classification: Example


[Figure: k-NN classification of a 2D toy dataset; panels show the decision regions for increasing values of k.]

■ Only for 1-NN are all training examples classified correctly (unless there are two identical observations with different labels).
■ Unbalanced classes may be an issue: the more frequent class takes over with increasing k.

slide-8
SLIDE 8

k-NN Regression Example


The training data:

[Figure: the 1D training data used in the regression example.]


slide-10
SLIDE 10

k-NN regression example


[Figure: k-NN regression fits of the training data for several values of k.]

■ For small k, the regression surface is rugged.
■ For large k, too much averaging (smoothing) takes place.


slide-12
SLIDE 12

k-NN Summary


Comments:

■ For 1-NN, the division of the input space into convex cells is called a Voronoi tessellation.
■ A weighted variant can be constructed:
  ■ Each of the k nearest neighbors has a weight inversely proportional to its distance to the query point.
  ■ Prediction is then done by weighted voting (classification) or weighted averaging (regression).
■ In regression tasks, instead of averaging you can use e.g. (weighted) linear regression to compute the prediction.

Advantages:
■ Simple and widely applicable method.
■ Works for both classification and regression tasks.
■ Works for both categorical and continuous predictors (independent variables).

Disadvantages:
■ Must store the whole training set (there are methods for training-set reduction).
■ During prediction, it must compute the distances to all the training data points (can be alleviated e.g. by using a KD-tree structure for the training set).

Overfitting prevention:
■ Choose the right value of k, e.g. using cross-validation (see the sketch below).
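A short sketch of that last point, choosing k by cross-validation with scikit-learn (assuming scikit-learn is available; the dataset and the candidate values of k are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Toy dataset standing in for the training set T
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Estimate generalization accuracy for several values of k and keep the best one
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
          for k in (1, 3, 5, 7, 9, 15)}
best_k = max(scores, key=scores.get)
print(scores, "-> chosen k:", best_k)
```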

slide-13
SLIDE 13

Support vector machine



slide-16
SLIDE 16

Revision


Optimal separating hyperplane:

■ A way to find a linear classifier that is optimal in a certain sense by means of a quadratic program (dual task for the soft-margin version): maximize

  ∑_{i=1}^{|T|} α_i − (1/2) ∑_{i=1}^{|T|} ∑_{j=1}^{|T|} α_i α_j y^(i) y^(j) x^(i) x^(j)T

  w.r.t. α_1, . . . , α_|T|, µ_1, . . . , µ_|T|, subject to α_i ≥ 0, µ_i ≥ 0, α_i + µ_i = C, and ∑_{i=1}^{|T|} α_i y^(i) = 0.

■ The parameters of the hyperplane are given as a weighted linear combination of the support vectors:

  w = ∑_{i=1}^{|T|} α_i y^(i) x^(i),   w_0 = y^(k) − x^(k) w^T.

Basis expansion:

■ Instead of a linear model ⟨w, x⟩, create a linear model of nonlinearly transformed features ⟨w′, Φ(x)⟩, which represents a nonlinear model in the original space.

What if we put these two things together?

slide-19
SLIDE 19

Optimal separating hyperplane combined with the basis expansion


Using the optimal separating hyperplane, the examples x occur only in the form of dot products: in the optimization criterion

  ∑_{i=1}^{|T|} α_i − (1/2) ∑_{i=1}^{|T|} ∑_{j=1}^{|T|} α_i α_j y^(i) y^(j) x^(i) x^(j)T

and in the decision rule

  f(x) = sign( ∑_{i=1}^{|T|} α_i y^(i) x^(i) x^T + w_0 ).

Application of the basis expansion changes the optimization criterion to

  ∑_{i=1}^{|T|} α_i − (1/2) ∑_{i=1}^{|T|} ∑_{j=1}^{|T|} α_i α_j y^(i) y^(j) Φ(x^(i)) Φ(x^(j))T

and the decision rule to

  f(x) = sign( ∑_{i=1}^{|T|} α_i y^(i) Φ(x^(i)) Φ(x)T + w_0 ).

What if we use a scalar function K(x, x′) instead of the dot product in the image space? The optimization criterion becomes

  ∑_{i=1}^{|T|} α_i − (1/2) ∑_{i=1}^{|T|} ∑_{j=1}^{|T|} α_i α_j y^(i) y^(j) K(x^(i), x^(j))

and the discrimination function

  f(x) = sign( ∑_{i=1}^{|T|} α_i y^(i) K(x^(i), x) + w_0 ).
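The kernelized decision rule above is easy to express directly in code. A minimal sketch (illustrative only; the coefficients α_i, the bias w_0, and the support vectors would normally come from solving the dual problem):

```python
import numpy as np

def rbf_kernel(a, b, sigma2=1.0):
    """Gaussian (RBF) kernel K(a, b) = exp(-||a - b||^2 / sigma^2)."""
    return np.exp(-np.sum((a - b) ** 2) / sigma2)

def svm_decision(x, support_vectors, labels, alphas, w0, kernel=rbf_kernel):
    """f(x) = sign( sum_i alpha_i * y_i * K(x_i, x) + w0 )."""
    s = sum(a * y * kernel(x_i, x)
            for a, y, x_i in zip(alphas, labels, support_vectors))
    return np.sign(s + w0)
```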

slide-22
SLIDE 22

Kernel trick


Kernel function, or just a kernel:
■ A generalized inner product (dot product, scalar product).
■ A function of two vector arguments, K(a, b), whose value equals the dot product Φ(a)Φ(b)T of the images of the vectors a and b in a certain high-dimensional image space.

Kernel trick:
■ Let's have a linear algorithm in which the examples x occur only in dot products.
■ Such an algorithm can be made non-linear by replacing the dot products of examples x with kernels.
■ The result is the same as if the algorithm was trained in some high-dimensional image space with coordinates given by many non-linear basis functions.
■ Thanks to kernels, we do not need to explicitly perform the mapping from the input space to the high-dimensional image space; the algorithm is much more efficient.

Frequently used kernels:
■ Polynomial: K(a, b) = (ab^T + 1)^d, where d is the degree of the polynomial.
■ Gaussian (RBF): K(a, b) = exp(−|a − b|² / σ²), where σ² is the "width" of the kernel.
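To make the kernel definition concrete, a small numeric check (purely illustrative) that the degree-2 polynomial kernel equals an ordinary dot product of explicitly mapped features, for one possible choice of Φ:

```python
import numpy as np

def poly_kernel(a, b, d=2):
    """Polynomial kernel K(a, b) = (a.b + 1)^d."""
    return (a @ b + 1) ** d

def phi(x):
    """An explicit degree-2 feature map for a 2-D input matching the kernel above."""
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

a, b = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(poly_kernel(a, b), phi(a) @ phi(b))   # both print 4.0
```

The kernel evaluates the 6-dimensional dot product without ever constructing Φ(a) and Φ(b), which is the point of the kernel trick.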

slide-24
SLIDE 24

Support vector machine


Support vector machine (SVM) = the optimal separating hyperplane learning algorithm + the kernel trick.

slide-25
SLIDE 25

Demo: SVM with linear kernel


[Figure: decision boundaries of SVMs with a linear kernel on several 2D datasets.]

slide-26
SLIDE 26

Demo: SVM with RBF kernel


[Figure: decision boundaries of SVMs with a Gaussian (RBF) kernel on the same 2D datasets.]

slide-27
SLIDE 27

SVM: Summary


■ SVM is a very popular model; in the past, the best performance on many tasks was achieved by SVM (nowadays, boosting or deep NN often perform better).
■ When using SVM, you usually have to set
  ■ the kernel type,
  ■ the kernel parameter(s), and
  ■ the (regularization) constant C,
  or use a method to find them automatically (see the sketch below).
■ Support vector regression (SVR) exists as well.
■ There are many other (originally linear) methods that were kernelized: kernel PCA, kernel logistic regression, . . .
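One common way to "find them automatically" is a cross-validated grid search. A hedged sketch with scikit-learn (assuming it is available; the dataset and parameter grids are illustrative):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)   # toy 2D dataset

# Search over the regularization constant C and the RBF kernel width (gamma)
grid = GridSearchCV(SVC(kernel="rbf"),
                    param_grid={"C": [0.1, 1, 10, 100],
                                "gamma": [0.01, 0.1, 1, 10]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```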

slide-28
SLIDE 28

Decision Trees


slide-29
SLIDE 29

Question


Seemingly unrelated question (but wait for it): How many binary functions of n binary attributes exist? Let's be concrete: how many binary functions of 5 binary attributes exist?
A (5 choose 2) = 10
B 5² = 25
C 2⁵ = 32
D 2^(2⁵) = 2³² = 4,294,967,296


slide-31
SLIDE 31

What is a decision tree?


Decision tree
■ is a function that
  ■ takes a vector of attribute values as its input, and
  ■ returns a "decision" as its output.
■ Both input and output values can be measured on nominal, ordinal, interval, and ratio scales, and can be discrete or continuous.
■ The decision is formed via a sequence of tests:
  ■ each internal node of the tree represents a test,
  ■ the branches are labeled with the possible outcomes of the test, and
  ■ each leaf node represents a decision to be returned by the tree.

Decision tree examples:
■ classification keys in biology (cz: určovací klíče),
■ diagnostic sections in illness encyclopedias,
■ online troubleshooting sections on software web pages,
■ . . .

slide-32
SLIDE 32

Attribute description


Example: A computer game. The main character of the game meets various robots along his way. Some behave like allies, others like enemies. Each robot is described by attributes:

  head    body    smile  neck  holds    class
  circle  circle  yes    tie   nothing  ally
  circle  square  no     tie   sword    enemy
  . . .   . . .   . . .  . . . . . .    . . .

The game engine may use e.g. the following tree to assign the ally or enemy attitude to the generated robots:

  neck?
    = tie:   smile?  yes → ally,  no → enemy
    = other: body?   triangle → ally,  other → enemy
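A minimal sketch of how such a tree could be represented and evaluated in code (a nested-tuple representation; names and layout are illustrative, not part of the original slides):

```python
# The tree above: test "neck" first, then "smile" or "body"; strings are leaf decisions.
TREE = ("neck", {
    "tie":   ("smile", {"yes": "ally", "no": "enemy"}),
    "other": ("body",  {"triangle": "ally", "other": "enemy"}),
})

def classify(tree, robot):
    """Walk the tree: each internal node is (attribute, branches), each leaf is a class."""
    if isinstance(tree, str):                 # leaf node -> return the decision
        return tree
    attribute, branches = tree
    value = robot.get(attribute)
    branch = branches.get(value, branches.get("other"))   # unlisted values follow "other"
    return classify(branch, robot)

robot = {"head": "circle", "body": "circle", "smile": "yes", "neck": "tie", "holds": "nothing"}
print(classify(TREE, robot))   # -> ally
```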
slide-33
SLIDE 33

Expressiveness of decision trees


The tree on the previous slide is a Boolean decision tree:
■ the decision is a binary variable (true, false), and
■ the attributes are discrete.
■ It returns ally iff the input attributes satisfy one of the paths leading to an ally leaf:
  ally ⇔ (neck = tie ∧ smile = yes) ∨ (neck ≠ tie ∧ body = triangle),
  i.e. in general Goal ⇔ (Path1 ∨ Path2 ∨ . . .), where each Path is a conjunction of attribute-value tests, i.e.
■ the tree is equivalent to a DNF (disjunctive normal form) of a function.

Any function in propositional logic can be expressed as a decision tree.
■ Trees are a suitable representation for some functions and unsuitable for others.
■ What is the cardinality of the set of Boolean functions of n attributes?
  ■ It is equal to the number of truth tables that can be created with n attributes.
  ■ The truth table has 2^n rows, i.e. there are 2^(2^n) different functions.
■ The set of trees is even larger; several trees represent the same function.
■ We need a clever algorithm to find good hypotheses (trees) in such a large space.


slide-37
SLIDE 37

A computer game


Example 1: Can you distinguish between allies and enemies after seeing a few of them?

[Figure: two groups of example robots, labeled Allies and Enemies.]

Hint: concentrate on the shapes of heads and bodies.
Answer: It seems that allies have the same shape of head and body. How would you represent this by a decision tree? (It is a relation among attributes.)
How do you know that you are right?


slide-39
SLIDE 39

A computer game


Example 2: Some robots changed their attitudes:

[Figure: the updated groups of robots, labeled Allies and Enemies.]

No obvious simple rule. How do we build a decision tree discriminating the two robot classes?

slide-40
SLIDE 40

Alternative hypotheses


Example 2: Attribute description:

  head      body      smile  neck     holds    class
  triangle  circle    yes    tie      nothing  ally
  triangle  triangle  no     nothing  ball     ally
  circle    triangle  yes    nothing  flower   ally
  circle    circle    yes    tie      nothing  ally
  triangle  square    no     tie      ball     enemy
  circle    square    no     tie      sword    enemy
  square    square    yes    bow      nothing  enemy
  circle    circle    no     bow      sword    enemy

Alternative hypotheses (suggested by an oracle for now). Which of the trees is the best (right) one?

  Left tree:    neck?   = tie:      smile?  yes → ally,  no → enemy
                        = other:    body?   triangle → ally,  other → enemy
  Middle tree:  body?   = triangle: ally
                        = circle:   holds?  sword → enemy,  other → ally
                        = square:   enemy
  Right tree:   holds?  = sword:    enemy
                        = other:    body?   square → enemy,  other → ally
slide-41
SLIDE 41

How to choose the best tree?

Nearest neighbors SVM Decision Trees

  • Question
  • Intuition
  • Attributes
  • Expressivness
  • Test 1
  • Test 2
  • Alternatives
  • Best tree?
  • Learning
  • Attr. importance
  • Information gain
  • Entropy, binary
  • Example: step 1
  • Example: step 2
  • TDIDT
  • TDIDT Features
  • Overfitting
  • Missing data
  • Question
  • Multivalued attr.
  • Attr. price
  • Continuous inputs
  • Regression tree
  • Summary

Summary

  • P. Poˇ

s´ ık c 2020 Artificial Intelligence – 25 / 43

We want a tree that is

■ consistent with the training data, ■ is as small as possible, and ■ which also works for new data.

slide-42
SLIDE 42

How to choose the best tree?

Nearest neighbors SVM Decision Trees

  • Question
  • Intuition
  • Attributes
  • Expressivness
  • Test 1
  • Test 2
  • Alternatives
  • Best tree?
  • Learning
  • Attr. importance
  • Information gain
  • Entropy, binary
  • Example: step 1
  • Example: step 2
  • TDIDT
  • TDIDT Features
  • Overfitting
  • Missing data
  • Question
  • Multivalued attr.
  • Attr. price
  • Continuous inputs
  • Regression tree
  • Summary

Summary

  • P. Poˇ

s´ ık c 2020 Artificial Intelligence – 25 / 43

We want a tree that is

■ consistent with the training data, ■ is as small as possible, and ■ which also works for new data.

Consistent with data?

slide-43
SLIDE 43

How to choose the best tree?

Nearest neighbors SVM Decision Trees

  • Question
  • Intuition
  • Attributes
  • Expressivness
  • Test 1
  • Test 2
  • Alternatives
  • Best tree?
  • Learning
  • Attr. importance
  • Information gain
  • Entropy, binary
  • Example: step 1
  • Example: step 2
  • TDIDT
  • TDIDT Features
  • Overfitting
  • Missing data
  • Question
  • Multivalued attr.
  • Attr. price
  • Continuous inputs
  • Regression tree
  • Summary

Summary

  • P. Poˇ

s´ ık c 2020 Artificial Intelligence – 25 / 43

We want a tree that is

■ consistent with the training data, ■ is as small as possible, and ■ which also works for new data.

Consistent with data?

■ All 3 trees are consistent.

Small?

slide-44
SLIDE 44

How to choose the best tree?

Nearest neighbors SVM Decision Trees

  • Question
  • Intuition
  • Attributes
  • Expressivness
  • Test 1
  • Test 2
  • Alternatives
  • Best tree?
  • Learning
  • Attr. importance
  • Information gain
  • Entropy, binary
  • Example: step 1
  • Example: step 2
  • TDIDT
  • TDIDT Features
  • Overfitting
  • Missing data
  • Question
  • Multivalued attr.
  • Attr. price
  • Continuous inputs
  • Regression tree
  • Summary

Summary

  • P. Poˇ

s´ ık c 2020 Artificial Intelligence – 25 / 43

We want a tree that is

■ consistent with the training data, ■ is as small as possible, and ■ which also works for new data.

Consistent with data?

■ All 3 trees are consistent.

Small?

■ The right-hand side one is the simplest one:

left middle right depth 2 2 2 leaves 4 4 3 conditions 3 2 2 Will it work for new data?

slide-45
SLIDE 45

How to choose the best tree?


We want a tree that
■ is consistent with the training data,
■ is as small as possible, and
■ also works for new data.

Consistent with the data?
■ All 3 trees are consistent.

Small?
■ The right-hand one is the simplest:

               left   middle   right
  depth          2       2       2
  leaves         4       4       3
  conditions     3       2       2

Will it work for new data?
■ We have no idea!
■ We need a set of new testing data (different data from the same source).


slide-47
SLIDE 47

Learning a Decision Tree


It is an intractable problem to find the smallest consistent tree among more than 2^(2^n) trees. We can find an approximate solution: a small (but not the smallest) consistent tree.

Top-Down Induction of Decision Trees (TDIDT):
■ A greedy divide-and-conquer strategy.
■ Procedure:
  1. Find the most important attribute.
  2. Divide the data set using the attribute values.
  3. For each subset, build an independent tree (recursion).
■ "Most important attribute": the attribute that makes the most difference to the classification.
■ All paths in the tree will be short, and the tree will be shallow.


slide-49
SLIDE 49

Attribute importance


Training data and class counts (ally : enemy) per attribute value:

  head      body      smile  neck     holds    class
  triangle  circle    yes    tie      nothing  ally
  triangle  triangle  no     nothing  ball     ally
  circle    triangle  yes    nothing  flower   ally
  circle    circle    yes    tie      nothing  ally
  triangle  square    no     tie      ball     enemy
  circle    square    no     tie      sword    enemy
  square    square    yes    bow      nothing  enemy
  circle    circle    no     bow      sword    enemy

  head:  triangle 2:1,  circle 2:2,  square 0:1
  body:  triangle 2:0,  circle 2:1,  square 0:3
  smile: yes 3:1,  no 1:3
  neck:  tie 2:2,  bow 0:2,  nothing 2:0
  holds: ball 1:1,  sword 0:2,  flower 1:0,  nothing 2:1

A perfect attribute divides the examples into sets containing only a single class. (Do you remember the simply created perfect attribute from Example 1?) A useless attribute divides the examples into sets containing the same distribution of classes as the set before splitting. None of the above attributes is perfect or useless; some are more useful than others.


slide-52
SLIDE 52

Choosing the best attribute


Information gain:
■ Formalization of the terms "useless", "perfect", "more useful".
■ Based on entropy, a measure of the uncertainty of a random variable V with possible values v_i:

  H(V) = −∑_i p(v_i) log₂ p(v_i)

■ Entropy of the target variable C (usually a class) measured on a data set S (a finite-sample estimate of the true entropy):

  H(C, S) = −∑_i p̂(c_i) log₂ p̂(c_i),  where p̂(c_i) = N_S(c_i) / |S|,

  and N_S(c_i) is the number of examples in S that belong to class c_i.
■ The entropy of the target variable C remaining in the data set S after splitting into subsets S_k using the values of attribute A (a weighted average of the entropies of the individual subsets):

  H(C, S, A) = ∑_k p̂(S_k) H(C, S_k),  where p̂(S_k) = |S_k| / |S|.

■ The information gain of attribute A for a data set S is

  Gain(A, S) = H(C, S) − H(C, S, A).

Choose the attribute with the highest information gain, i.e. the attribute with the lowest H(C, S, A).

slide-53
SLIDE 53

Choosing the test attribute (special case: binary classification)


■ For a Boolean random variable V which is true with probability q, we can define:

  H_B(q) = −q log₂ q − (1 − q) log₂ (1 − q)

■ Specifically, for q = 0.5,

  H_B(0.5) = −(1/2) log₂ (1/2) − (1 − 1/2) log₂ (1 − 1/2) = 1

■ Entropy of the target variable C measured on a data set S with N_p positive and N_n negative examples:

  H(C, S) = H_B( N_p / (N_p + N_n) ) = H_B( N_p / |S| )
slide-54
SLIDE 54

Choosing the test attribute (example)


Class counts (ally : enemy) per attribute value:

  head:  triangle 2:1,  circle 2:2,  square 0:1
  body:  triangle 2:0,  circle 2:1,  square 0:3
  smile: yes 3:1,  no 1:3
  neck:  tie 2:2,  bow 0:2,  nothing 2:0
  holds: ball 1:1,  sword 0:2,  flower 1:0,  nothing 2:1

head:
  p̂(S_head=tri) = 3/8,  H(C, S_head=tri) = H_B(2/3) = 0.92
  p̂(S_head=cir) = 4/8,  H(C, S_head=cir) = H_B(2/4) = 1
  p̂(S_head=sq)  = 1/8,  H(C, S_head=sq)  = H_B(0/1) = 0
  H(C, S, head) = 3/8 · 0.92 + 4/8 · 1 + 1/8 · 0 = 0.84
  Gain(head, S) = 1 − 0.84 = 0.16

body:
  p̂(S_body=tri) = 2/8,  H(C, S_body=tri) = H_B(2/2) = 0
  p̂(S_body=cir) = 3/8,  H(C, S_body=cir) = H_B(2/3) = 0.92
  p̂(S_body=sq)  = 3/8,  H(C, S_body=sq)  = H_B(0/3) = 0
  H(C, S, body) = 2/8 · 0 + 3/8 · 0.92 + 3/8 · 0 = 0.35
  Gain(body, S) = 1 − 0.35 = 0.65

smile:
  p̂(S_smile=yes) = 4/8,  H(C, S_smile=yes) = H_B(3/4) = 0.81
  p̂(S_smile=no)  = 4/8,  H(C, S_smile=no)  = H_B(1/4) = 0.81
  H(C, S, smile) = 4/8 · 0.81 + 4/8 · 0.81 = 0.81
  Gain(smile, S) = 1 − 0.81 = 0.19

neck:
  p̂(S_neck=tie) = 4/8,  H(C, S_neck=tie) = H_B(2/4) = 1
  p̂(S_neck=bow) = 2/8,  H(C, S_neck=bow) = H_B(0/2) = 0
  p̂(S_neck=nothing) = 2/8,  H(C, S_neck=nothing) = H_B(2/2) = 0
  H(C, S, neck) = 4/8 · 1 + 2/8 · 0 + 2/8 · 0 = 0.5
  Gain(neck, S) = 1 − 0.5 = 0.5

holds:
  p̂(S_holds=ball) = 2/8,  H(C, S_holds=ball) = H_B(1/2) = 1
  p̂(S_holds=sword) = 2/8,  H(C, S_holds=sword) = H_B(0/2) = 0
  p̂(S_holds=flower) = 1/8,  H(C, S_holds=flower) = H_B(1/1) = 0
  p̂(S_holds=nothing) = 3/8,  H(C, S_holds=nothing) = H_B(2/3) = 0.92
  H(C, S, holds) = 2/8 · 1 + 2/8 · 0 + 1/8 · 0 + 3/8 · 0.92 = 0.6
  Gain(holds, S) = 1 − 0.6 = 0.4

The body attribute
■ brings us the largest information gain, thus
■ it shall be chosen for the first test in the tree!
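These numbers can be reproduced with a short script (a sketch; the data is the robot table from the previous slides, and the small differences from the hand computation above come from its intermediate rounding):

```python
from collections import Counter
from math import log2

# The 8 training robots: (head, body, smile, neck, holds, class)
DATA = [
    ("triangle", "circle",   "yes", "tie",     "nothing", "ally"),
    ("triangle", "triangle", "no",  "nothing", "ball",    "ally"),
    ("circle",   "triangle", "yes", "nothing", "flower",  "ally"),
    ("circle",   "circle",   "yes", "tie",     "nothing", "ally"),
    ("triangle", "square",   "no",  "tie",     "ball",    "enemy"),
    ("circle",   "square",   "no",  "tie",     "sword",   "enemy"),
    ("square",   "square",   "yes", "bow",     "nothing", "enemy"),
    ("circle",   "circle",   "no",  "bow",     "sword",   "enemy"),
]
ATTRS = ["head", "body", "smile", "neck", "holds"]

def entropy(labels):
    """H(C, S) estimated from the class frequencies in S."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(rows, col):
    """Gain(A, S) = H(C, S) - sum_k |S_k|/|S| * H(C, S_k)."""
    labels = [r[-1] for r in rows]
    remainder = 0.0
    for v in {r[col] for r in rows}:
        subset = [r[-1] for r in rows if r[col] == v]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

for i, name in enumerate(ATTRS):
    print(f"Gain({name}) = {gain(DATA, i):.2f}")
# -> head 0.16, body 0.66, smile 0.19, neck 0.50, holds 0.41 (body wins)
```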


slide-56
SLIDE 56

Choosing subsequent test attribute


No further tests are needed for robots with triangular and square bodies. Dataset for robots with circular bodies:

  head      body    smile  neck  holds    class
  triangle  circle  yes    tie   nothing  ally
  circle    circle  yes    tie   nothing  ally
  circle    circle  no     bow   sword    enemy

Class counts (ally : enemy):
  head:  triangle 1:0,  circle 1:1
  smile: yes 2:0,  no 0:1
  neck:  tie 2:0,  bow 0:1
  holds: nothing 2:0,  sword 0:1

All the attributes smile, neck, and holds
■ take up the remaining entropy in the data set, and
■ are equally good for the test in the group of robots with circular bodies.

slide-57
SLIDE 57

Decision tree building procedure


Algorithm 1: BuildDT
Input : the set of examples S, the set of attributes A, the majority class of the parent node C_P
Output: a decision tree

 1  begin
 2      if S is empty then
 3          return a leaf with C_P
 4      C ← the majority class in S
 5      if all examples in S belong to the same class C then
 6          return a leaf with C
 7      if A is empty then
 8          return a leaf with C
 9      A* ← arg max_{a∈A} Gain(a, S)
10      T ← a new decision tree with the root test on attribute A*
11      foreach value v_k of A* do
12          S_k ← {x | x ∈ S ∧ x.A* = v_k}
13          t_k ← BuildDT(S_k, A − {A*}, C)
14          add a branch to T with label A* = v_k and attach the subtree t_k
15      return tree T
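A compact, runnable sketch of BuildDT in Python (illustrative only: rows are dicts with a "class" key, the information-gain helpers are restated inline, and ties between attributes are broken arbitrarily):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(rows, attr):
    labels = [r["class"] for r in rows]
    remainder = 0.0
    for v in {r[attr] for r in rows}:
        subset = [r["class"] for r in rows if r[attr] == v]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

def build_dt(rows, attrs, parent_majority=None):
    """Top-down induction: greedily split on the attribute with the highest gain."""
    if not rows:                                     # lines 2-3: empty set -> parent's majority
        return parent_majority
    majority = Counter(r["class"] for r in rows).most_common(1)[0][0]
    if len({r["class"] for r in rows}) == 1:         # lines 5-6: pure node -> leaf
        return majority
    if not attrs:                                    # lines 7-8: no attributes left -> leaf
        return majority
    best = max(attrs, key=lambda a: gain(rows, a))   # line 9: most important attribute
    branches = {}
    for v in {r[best] for r in rows}:                # lines 11-14: one subtree per value
        subset = [r for r in rows if r[best] == v]
        branches[v] = build_dt(subset, [a for a in attrs if a != best], majority)
    return (best, branches)

# e.g. tree = build_dt(list_of_robot_dicts, ["head", "body", "smile", "neck", "holds"])
```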

slide-58
SLIDE 58

Algorithm characteristics


■ There are many hypotheses (trees) consistent with the dataset S; the algorithm will return any one of them, unless there is some bias in choosing the tests.
■ The current set of considered hypotheses always has only 1 member (greedy selection of the successor). The algorithm cannot answer the question of how many hypotheses consistent with the data exist.
■ The algorithm does not use backtracking; it can get stuck in a local optimum.
■ The algorithm uses batch learning, not incremental learning.


slide-61
SLIDE 61

How to prevent overfitting for trees?


Tree pruning:
■ Let's have a fully grown tree T.
■ Choose a test node having only leaf nodes as descendants.
■ If the test appears to be irrelevant, remove the test and replace it with a leaf node labeled with the majority class.
■ Repeat until all tests seem to be relevant.

How to check whether a split is (ir)relevant?
1. Using a statistical χ² test:
   ■ If the distribution of classes in the leaves does not differ much from the distribution of classes in their parent, the split is irrelevant.
2. Using an (independent) validation data set:
   ■ Create a temporary tree by replacing a subtree with a leaf.
   ■ If the error on the validation set decreased, accept the pruned tree.

Early stopping:
■ Hmm, if we grow the tree fully and then prune it, why can't we just stop the tree building when there is no good attribute to split on?
■ It prevents us from recognizing situations when
  ■ there is no single good attribute to split on, but
  ■ there are combinations of attributes that lead to a good tree!

slide-62
SLIDE 62

Missing data


Decision trees are one of the rare model types able to handle missing attribute values.

1. Given a complete tree, how do we classify an example with a missing attribute value needed for a test?
   ■ Pretend that the object has all possible values of this attribute.
   ■ Track all possible paths to the leaves.
   ■ The leaf decisions are weighted using the number of training examples in the leaves.
2. How do we build a tree if the training set contains examples with missing attribute values?
   ■ Introduce a new attribute value: "Missing" (or N/A).
   ■ Build the tree in the normal way.

slide-63
SLIDE 63

Question


Past decisions of a bank about loan applications:

  ID  Enough savings?  High salary?  Good past experience?  Loan granted?
  1   No               No            No                      No
  2   No               No            Yes                     No
  3   No               Yes           Yes                     No
  4   Yes              No            No                      No
  5   Yes              Yes           Yes                     Yes

Based on the data above, which of the attributes will be selected for the root node according to information gain?
A ID
B Enough savings
C High salary
D Good past experience


slide-65
SLIDE 65

Multivalued attributes


What if the training set contains e.g. a name, a social insurance number, or another ID?
■ When each example has a unique value of an attribute A, the information gain of A is equal to the entropy of the whole data set!
■ Attribute A is chosen for the tree root; yet, such a tree is useless (overfitted).

Solutions:
1. Allow only Boolean tests of the form A = v_k and allow the remaining values to be tested later in the tree.
2. Use a different split-importance measure instead of Gain, e.g. GainRatio:
   ■ Normalize the information gain by the maximal amount of information the split can have:

     GainRatio(A, S) = Gain(A, S) / H(A, S),

   where H(A, S) is the entropy of attribute A and represents the largest information gain we can get from splitting using A.

slide-66
SLIDE 66

Attributes with different prices


What if the tests in the tree also cost us some "money"?
■ Then we would like to have the cheap tests close to the root.
■ If we have Cost(A) ∈ [0, 1], then we can use e.g.

    Gain²(A, S) / Cost(A),   or   (2^Gain(A,S) − 1) / (Cost(A) + 1)^w

  to bias the preference towards cheaper tests.

slide-67
SLIDE 67

Continuous input attributes


Continuous or integer-valued input attributes:
■ Use binary splits with the highest information gain.
■ Sort the values of the attribute.
■ Consider only split points lying between 2 examples with different classification.

  Temperature  −20  −9  −2   5    16   26   32   35
  Go out?      No   No  Yes  Yes  Yes  Yes  No   No

■ Previously used attributes can be used again in subsequent tests!

[Figure: a decision tree for the iris data with repeated tests on Petal Length (< 2.45, < 4.95) and Petal Width (< 1.75, < 1.65) and SETOSA/VERSICOLOR/VIRGINICA leaves, together with the corresponding axis-parallel partitioning of the Petal Length × Petal Width plane.]
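The "candidate split points" bullet can be made concrete with a few lines of code (a sketch; the temperature values are taken from the table above as reconstructed, with No/Yes encoded as 0/1):

```python
import numpy as np

temps  = np.array([-20, -9, -2, 5, 16, 26, 32, 35])   # already sorted
go_out = np.array([0, 0, 1, 1, 1, 1, 0, 0])           # No/Yes encoded as 0/1

# Candidate thresholds: midpoints between neighbouring examples with different classes
candidates = [(a + b) / 2
              for a, b, ya, yb in zip(temps, temps[1:], go_out, go_out[1:])
              if ya != yb]
print(candidates)   # -> [-5.5, 29.0]; only these two thresholds need to be evaluated
```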

slide-68
SLIDE 68

Continuous output variable


Regression tree:
■ In each leaf, it can have
  ■ a constant value (usually the average of the output variable over the training examples that reach the leaf), or
  ■ a linear function of some subset of the numerical input attributes.
■ The learning algorithm must decide when to stop splitting and begin applying linear regression.

[Figure: a 1D regression dataset and the piecewise-constant fit produced by a regression tree.]
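A hedged sketch of a constant-leaf regression tree fitted with scikit-learn (assuming it is available; the synthetic 1D data merely stands in for the dataset in the figure):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-10, 10, size=(80, 1)), axis=0)
y = np.sin(X.ravel() / 3) * 10 + rng.normal(0, 1, size=80)

# max_depth limits the number of splits; each leaf predicts the mean of its training targets
tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
print(tree.predict([[0.0], [5.0]]))
```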

slide-69
SLIDE 69

Trees: Summary


■ Decision trees belong among the simplest, most universal, and most widely used prediction models.
■ They are often used as a building block in ensemble methods.
■ They are not suitable for all modeling problems (relations, etc.).
■ TDIDT is the most widely used technique to build a tree from data.
■ It uses a greedy divide-and-conquer approach.
■ Individual tree variants differ mainly
  ■ in what types of attributes they are able to handle,
  ■ in the attribute-importance measure (information gain, gain ratio, Gini index, χ², etc.),
  ■ in whether they make enumerative or just binary splits,
  ■ in whether and how they can handle missing data,
  ■ in whether they make only axis-parallel splits or allow oblique trees,
  ■ etc.

slide-70
SLIDE 70

Summary


slide-71
SLIDE 71

Competencies


After this lecture, a student shall be able to . . .
1. explain, use, and implement the method of k nearest neighbors for both classification and regression;
2. explain the influence of k on the form of the final model;
3. describe advantages and disadvantages of k-NN, and suggest a way to find a suitable value of k;
4. show how to force the algorithm for learning the optimal separating hyperplane to find a nonlinear model using basis expansion, and using a kernel function;
5. explain the meaning of kernels, and their advantages compared to basis expansion;
6. explain the principle of the support vector machine;
7. describe the structure of a classification and regression tree, and the way it is used to determine a prediction;
8. know a lower bound on the number of Boolean decision trees for a dataset with n attributes;
9. describe the TDIDT algorithm and its features, and know whether it will find the optimal tree;
10. explain how to choose the best attribute for a split, and be able to manually perform the choice for simple examples;
11. describe 2 methods to prevent tree overfitting, and argue which of them is better;
12. explain how a decision tree can handle missing data during training and during prediction;
13. describe what happens in a tree-building algorithm and what to do if the dataset contains an attribute with a unique value for each observation;
14. explain how to handle continuous input and output variables (as opposed to discrete attributes).