SLIDE 1

CS480/680 Lecture 2: May 8th, 2019

Nearest Neighbour [RN] Sec. 18.8.1, [HTF] Sec. 2.3.2, [D] Chapt. 3, [B] Sec. 2.5.2, [M] Sec. 1.4.2

CS480/680 Spring 2019 Pascal Poupart 1 University of Waterloo

SLIDE 2

Inductive Learning (recap)

  • Induction

– Given a training set of examples of the form (x, f(x))

  • x is the input, f(x) is the output

– Return a function h that approximates f

  • h is called the hypothesis


SLIDE 3

Supervised Learning

  • Two types of problems

1. Classification
2. Regression

  • NB: The nature (categorical or continuous) of the domain (input space) of f does not matter


SLIDE 4

Classification Example

  • Problem: Will you enjoy an outdoor sport based on the weather?

  • Training set (the first five columns are the input x; EnjoySport is the output f(x)):

Sky    Humidity  Wind    Water  Forecast  EnjoySport
Sunny  Normal    Strong  Warm   Same      yes
Sunny  High      Strong  Warm   Same      yes
Sunny  High      Strong  Warm   Change    no
Sunny  High      Strong  Cool   Change    yes

  • Possible Hypotheses:

– h1: Sky = Sunny → EnjoySport = yes
– h2: Water = Cool or Forecast = Same → EnjoySport = yes


SLIDE 5

Regression Example

  • Find a function h that fits f at the training instances x


SLIDE 6

More Examples

Problem                  Domain    Range    Classification / Regression
Spam Detection
Stock price prediction
Speech recognition
Digit recognition
Housing valuation
Weather prediction

(the Domain, Range and Classification / Regression columns are blank on the slide)


SLIDE 7

Hypothesis Space

  • Hypothesis space H

– Set of all hypotheses h that the learner may consider
– Learning is a search through the hypothesis space

  • Objective: find h that minimizes

– Misclassification
– Or, more generally, some error function

with respect to the training examples

  • But what about unseen examples?


SLIDE 8

Generalization

  • A good hypothesis will generalize well

– i.e., predict unseen examples correctly

  • Usually …

– Any hypothesis h found to approximate the target function f well over a sufficiently large set of training examples will also approximate the target function well over any unobserved examples


SLIDE 9

Inductive Learning

  • Goal: find an h that agrees with f on the training set

– h is consistent if it agrees with f on all examples

  • Finding a consistent hypothesis is not always possible

– Insufficient hypothesis space:

  • E.g., it is not possible to learn exactly f(x) = ax + b + x·sin(x) when H = the space of polynomials of finite degree

– Noisy data

  • E.g., in weather prediction, identical conditions may lead to rainy and sunny days


SLIDE 10

Inductive Learning

  • A learning problem is realizable if the hypothesis space contains the true function; otherwise it is unrealizable.

– It is difficult to determine whether a learning problem is realizable since the true function is not known

  • It is possible to use a very large hypothesis space

– For example: H = class of all Turing machines

  • But there is a tradeoff between expressiveness of a

hypothesis class and the complexity of finding a good hypothesis


SLIDE 11

Nearest Neighbour Classification

  • Classification function

h(x) = y* where y* is the label associated with the nearest neighbour
x* = argmin_{x'} d(x, x')

  • Distance measures d(x, x'):

L1: d(x, x') = Σ_i |x_i − x_i'|
Lp: d(x, x') = (Σ_i |x_i − x_i'|^p)^{1/p}
…
L∞: d(x, x') = (Σ_i |x_i − x_i'|^∞)^{1/∞} = max_i |x_i − x_i'|

Weighted dimensions: d(x, x') = (Σ_i w_i |x_i − x_i'|^p)^{1/p}
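As a concrete sketch of these distance measures and the nearest-neighbour rule (the function names and the NumPy dependency are my choices, not from the slides):

```python
import numpy as np

def lp_distance(x, xp, p=2, w=None):
    """Minkowski (Lp) distance between feature vectors x and x'.

    w gives optional per-dimension weights (uniform if None); for
    p = infinity the distance is the largest per-coordinate gap."""
    x, xp = np.asarray(x, dtype=float), np.asarray(xp, dtype=float)
    w = np.ones_like(x) if w is None else np.asarray(w, dtype=float)
    gaps = np.abs(x - xp)
    if np.isinf(p):
        return float(np.max(gaps))  # limit of the Lp formula as p -> infinity
    return float((w * gaps ** p).sum() ** (1.0 / p))

def nearest_neighbour(x, train_X, train_y, p=2):
    """h(x) = label of the training point x* minimizing d(x, x*)."""
    dists = [lp_distance(x, xp, p) for xp in train_X]
    return train_y[int(np.argmin(dists))]
```

For p = 1 and p = 2 these are the Manhattan and Euclidean distances, respectively.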


SLIDE 12

Voronoi Diagram

  • Partition implied by the nearest neighbour function h

– Assuming Euclidean distance


SLIDE 13

K-Nearest Neighbour

  • Nearest neighbour is often unstable (sensitive to noise)
  • Idea: assign the most frequent label among the k nearest neighbours

– Let kNN(x) be the k nearest neighbours of x according to distance d
– Label: ŷ ← mode({y_i | x_i ∈ kNN(x)})
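A minimal runnable sketch of this majority-vote rule, assuming Euclidean distance (`knn_classify` and its signature are illustrative names, not from the slides):

```python
from collections import Counter
import math

def knn_classify(x, train_X, train_y, k=3):
    """Assign the most frequent label among the k nearest neighbours
    of x under Euclidean distance (math.dist)."""
    # Indices of training points sorted by distance to the query.
    by_dist = sorted(range(len(train_X)),
                     key=lambda i: math.dist(x, train_X[i]))
    # Majority vote over the k closest labels.
    votes = Counter(train_y[i] for i in by_dist[:k])
    return votes.most_common(1)[0][0]
```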


SLIDE 14

Effect of k

  • k controls the degree of smoothing.
  • Which partition do you prefer? Why?


SLIDE 15

Performance of a learning algorithm

  • A learning algorithm is good if it produces a hypothesis that does a good job of predicting the classifications of unseen examples

  • Verify performance with a test set:

1. Collect a large set of examples
2. Divide it into 2 disjoint sets: a training set and a test set
3. Learn hypothesis h with the training set
4. Measure the percentage of examples in the test set correctly classified by h
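Steps 1–4 can be sketched as follows (`evaluate` and `learner` are illustrative names; `learner` is any function mapping a training set to a hypothesis h):

```python
import random

def evaluate(examples, learner, test_fraction=0.25, seed=0):
    """Split (x, y) pairs into disjoint train/test sets, learn h on
    the training set, and report test-set accuracy."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    h = learner(train)                       # step 3: learn hypothesis
    correct = sum(h(x) == y for x, y in test)
    return correct / len(test)               # step 4: % correct on test set
```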


SLIDE 16

The effect of K

  • The best k depends on

– The problem
– The amount of training data

[Plot: test accuracy (% correct) as a function of k]


SLIDE 17

Underfitting

  • Definition: underfitting occurs when an algorithm finds a hypothesis h whose training accuracy is lower than the future accuracy of some other hypothesis h'

  • Amount of underfitting of h:

max{0, max_{h'} futureAccuracy(h') − trainAccuracy(h)}
≈ max{0, max_{h'} testAccuracy(h') − trainAccuracy(h)}

  • Common cause:

– The classifier is not expressive enough


SLIDE 18

Overfitting

  • Definition: overfitting occurs when an algorithm finds a hypothesis h with higher training accuracy than its future accuracy.

  • Amount of overfitting of h:

max{0, trainAccuracy(h) − futureAccuracy(h)}
≈ max{0, trainAccuracy(h) − testAccuracy(h)}

  • Common causes:

– The classifier is too expressive
– Noisy data
– Lack of data
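Under the approximation that test accuracy stands in for the unknown future accuracy, the amount of overfitting is just a clipped accuracy gap (a trivial helper, named here for illustration):

```python
def overfitting_amount(train_accuracy, test_accuracy):
    """max{0, trainAccuracy(h) - testAccuracy(h)}: the clipped gap
    between training accuracy and estimated future accuracy."""
    return max(0.0, train_accuracy - test_accuracy)
```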


SLIDE 19

Choosing K

  • How should we choose K?

– Ideally: select the K with the highest future accuracy
– Alternative: select the K with the highest test accuracy

  • Problem: since we are choosing K based on the test set, the test set effectively becomes part of the training set when optimizing K. Hence we can no longer trust the test-set accuracy to be representative of the future accuracy.

  • Solution: split the data into training, validation and test sets

– Training set: compute the nearest neighbours
– Validation set: optimize hyperparameters such as K
– Test set: measure performance


SLIDE 20

Choosing K based on Validation Set

Let k be the number of neighbours
For k = 1 to the max # of neighbours:
    h_k ← train(k, trainingData)
    accuracy_k ← test(h_k, validationData)
k* ← argmax_k accuracy_k
h ← train(k*, trainingData ∪ validationData)
accuracy ← test(h, testData)
Return k*, h, accuracy
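A runnable sketch of this procedure, with a simple k-NN learner standing in for train (all names are illustrative, not from the slides):

```python
from collections import Counter
import math

def knn_train(k, data):
    """'Training' a k-NN model just stores the labelled (x, y) pairs."""
    def h(x):
        by_dist = sorted(data, key=lambda ex: math.dist(x, ex[0]))
        return Counter(y for _, y in by_dist[:k]).most_common(1)[0][0]
    return h

def accuracy(h, data):
    return sum(h(x) == y for x, y in data) / len(data)

def choose_k(training_data, validation_data, test_data, max_k):
    """Pick k* on the validation set, retrain on training ∪ validation,
    report accuracy on the held-out test set."""
    scores = {k: accuracy(knn_train(k, training_data), validation_data)
              for k in range(1, max_k + 1)}
    k_star = max(scores, key=scores.get)
    h = knn_train(k_star, training_data + validation_data)
    return k_star, h, accuracy(h, test_data)
```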


SLIDE 21

Robust validation

  • How can we ensure that the validation accuracy is representative of the future accuracy?

– Validation accuracy becomes more reliable as we increase the size of the validation set
– However, this reduces the amount of data left for training

  • Popular solution: cross-validation


SLIDE 22

Cross-Validation

  • Repeatedly split the training data in two parts, one for training and one for validation. Report the average validation accuracy.

  • k-fold cross-validation: split the training data into k equal-size subsets. Run k experiments, each time validating on one subset and training on the remaining subsets. Compute the average validation accuracy over the k experiments.

[Figure: k-fold cross-validation splits]


SLIDE 23

Selecting the Number of Neighbours by Cross-Validation

Let k be the number of neighbours
Let k' be the number of trainingData splits
For k = 1 to the max # of neighbours:
    For i = 1 to k' (where i indexes the trainingData splits):
        h_ki ← train(k, trainingData_{1..i−1, i+1..k'})
        accuracy_ki ← test(h_ki, trainingData_i)
    accuracy_k ← average(accuracy_ki over all i)
k* ← argmax_k accuracy_k
h ← train(k*, trainingData_{1..k'})
accuracy ← test(h, testData)
Return k*, h, accuracy
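This cross-validation loop can be sketched as follows (illustrative names; folds are formed by simple striding, since the slide does not specify a splitting scheme):

```python
import math
from collections import Counter
from statistics import mean

def knn_train(k, data):
    """'Training' a k-NN model just stores the labelled (x, y) pairs."""
    def h(x):
        by_dist = sorted(data, key=lambda ex: math.dist(x, ex[0]))
        return Counter(y for _, y in by_dist[:k]).most_common(1)[0][0]
    return h

def accuracy(h, data):
    return sum(h(x) == y for x, y in data) / len(data)

def choose_k_cv(training_data, test_data, max_k, n_splits):
    """For each k, average validation accuracy over n_splits folds;
    retrain the best k on all training data and score on the test set."""
    folds = [training_data[i::n_splits] for i in range(n_splits)]
    scores = {}
    for k in range(1, max_k + 1):
        accs = []
        for i in range(n_splits):
            held_out = folds[i]
            rest = [ex for j, f in enumerate(folds) if j != i for ex in f]
            accs.append(accuracy(knn_train(k, rest), held_out))
        scores[k] = mean(accs)
    k_star = max(scores, key=scores.get)
    h = knn_train(k_star, training_data)
    return k_star, h, accuracy(h, test_data)
```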


SLIDE 24

Weighted K-Nearest Neighbour

  • We can often improve K-nearest neighbours by weighting each neighbour based on some distance measure:

w(x, x') ∝ 1 / distance(x, x')

  • Label:

ŷ ← argmax_y Σ_{x_i ∈ kNN(x) ∧ y_i = y} w(x, x_i)

where kNN(x) is the set of K nearest neighbours of x
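A sketch of this inverse-distance-weighted vote (names are illustrative; a small eps guards against division by zero when x coincides with a neighbour):

```python
import math
from collections import defaultdict

def weighted_knn_classify(x, train, k=3, eps=1e-12):
    """Pick the label whose k-nearest neighbours carry the largest
    total inverse-distance weight w(x, x_i) = 1 / distance(x, x_i)."""
    by_dist = sorted(train, key=lambda ex: math.dist(x, ex[0]))[:k]
    scores = defaultdict(float)
    for xi, yi in by_dist:
        scores[yi] += 1.0 / (math.dist(x, xi) + eps)
    return max(scores, key=scores.get)
```

Note that with these weights a single very close neighbour can outvote several distant ones, which the unweighted majority vote cannot do.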


SLIDE 25

K-Nearest Neighbour Regression

  • We can also use KNN for regression
  • Let y_i be a real value instead of a categorical label
  • K-nearest neighbour regression:

ŷ ← average({y_i | x_i ∈ kNN(x)})

  • Weighted K-nearest neighbour regression:

ŷ ← (Σ_{x_i ∈ kNN(x)} w(x, x_i) y_i) / (Σ_{x_i ∈ kNN(x)} w(x, x_i))
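The weighted regression formula can be sketched as (illustrative names; eps guards against an exact match making the weight infinite):

```python
import math

def weighted_knn_regress(x, train, k=3, eps=1e-12):
    """Weighted average of the k nearest neighbours' targets, with
    inverse-distance weights w(x, x_i) = 1 / distance(x, x_i)."""
    by_dist = sorted(train, key=lambda ex: math.dist(x, ex[0]))[:k]
    weights = [1.0 / (math.dist(x, xi) + eps) for xi, _ in by_dist]
    return sum(w * yi for w, (_, yi) in zip(weights, by_dist)) / sum(weights)
```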
