CSC 311: Introduction to Machine Learning
Lecture 1 - Introduction and Nearest Neighbors Roger Grosse Chris Maddison Juhan Bae Silviu Pitis
University of Toronto, Fall 2020
Intro ML (UofT) CSC311-Lec1 1 / 55
This course
◮ Algorithms and principles for supervised learning
  ◮ nearest neighbors, decision trees, ensembles, linear regression,
◮ Unsupervised learning: PCA, K-means, mixture models
◮ Basics of reinforcement learning
◮ Combination of pen & paper derivations and programming exercises
◮ Weighted equally
◮ Worth 5%, honor system
◮ Your higher mark will count for 15%, and your lower mark for 10%
◮ See website for times and dates (tentative)
◮ Will require you to apply several algorithms to a challenge problem
◮ Due during the final evaluation period
◮ More details TBA
◮ recognizing people and objects
◮ understanding human speech
◮ hard to code up a solution by hand (e.g. vision, speech)
◮ system needs to adapt to a changing environment (e.g. spam
◮ want the system to perform better than the human programmers
◮ privacy/fairness (e.g. ranking search results)
◮ Both fields try to uncover patterns in data
◮ Both fields draw heavily on calculus, probability, and linear algebra,
◮ Stats is more concerned with helping scientists and policymakers
◮ Stats puts more emphasis on interpretability and mathematical
◮ Symbolic reasoning
◮ Rule based system
◮ Tree search
◮ etc.
◮ Very data efficient
◮ An entire multitasking system (vision, language, motor control, etc.)
◮ Takes at least a few years :)
◮ Supervised learning: have labeled examples of the correct
◮ Reinforcement learning: learning system (agent) interacts with
◮ Unsupervised learning: no labeled examples – instead, looking
◮ Connectionist psychologists explored neural models of cognition
◮ 1984 — Leslie Valiant formalized the problem of learning as PAC
◮ 1988 — Backpropagation (re-)discovered by Geoffrey Hinton and
◮ 1988 — Judea Pearl’s book Probabilistic Reasoning in Intelligent
◮ Markov chain Monte Carlo
◮ variational inference
◮ kernels and support vector machines
◮ boosting
◮ convolutional networks
◮ reinforcement learning
◮ 2010–2012 — neural nets smashed previous records in
◮ increasing adoption by the tech industry
◮ 2016 — AlphaGo defeated the human Go champion
◮ 2018-now — generating photorealistic images and videos
◮ 2020 — GPT3 language model
◮ E.g., try logistic regression before building a deep neural net!
◮ Is there a pattern to detect?
◮ Can I solve it analytically?
◮ Do I have data?
◮ Preprocessing, cleaning, visualizing.
◮ vectorize computations (express them in terms of matrix/vector
◮ This also makes your code cleaner and more readable!
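As a small illustration of vectorizing a computation, the sketch below computes the same squared Euclidean distances twice: once with explicit Python loops and once with array operations. It assumes NumPy is available; the function names and the toy arrays are made up for this example and are not from the course materials.

```python
import numpy as np

def squared_distances_loop(X, x):
    """Squared Euclidean distance from x to every row of X, using Python loops."""
    out = []
    for row in X:
        total = 0.0
        for a, b in zip(row, x):
            total += (a - b) ** 2
        out.append(total)
    return np.array(out)

def squared_distances_vectorized(X, x):
    """Same computation expressed as array operations."""
    diff = X - x                      # shape (N, D): x is broadcast across the rows of X
    return np.sum(diff ** 2, axis=1)  # sum over the D features of each row

# Toy data (hypothetical): three points in two dimensions and one query point.
X = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
x = np.array([0.0, 0.0])

print(squared_distances_loop(X, x))
print(squared_distances_vectorized(X, x))
```

Both functions return the same values; the vectorized version is shorter and pushes the inner loops into NumPy's compiled routines.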
◮ automatic differentiation
◮ compiling computation graphs
◮ libraries of algorithms and network primitives
◮ support for graphics processing units (GPUs)
◮ So you know what to do if something goes wrong!
◮ Debugging learning algorithms requires sophisticated detective
◮ That’s why we derive things by hand in this class!
◮ Representation = mapping to another space that’s easy to
◮ Vectors are a great representation since we can do linear algebra!
◮ Regression: t is a real number (e.g. stock price)
◮ Classification: t is an element of a discrete set {1, . . . , C}
◮ These days, t is often a highly structured object (e.g. image)
◮ Note: these superscripts have nothing to do with exponentiation!
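Euclidean distance: $\|x^{(a)} - x^{(b)}\|_2 = \sqrt{\sum_j \big(x_j^{(a)} - x_j^{(b)}\big)^2}$
Nearest neighbor: $x^* = \operatorname*{argmin}_{x^{(i)} \in \text{train. set}} \operatorname{distance}(x^{(i)}, x)$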
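A minimal sketch of the nearest-neighbor rule: find the training input closest to the query x in Euclidean distance and copy its target. This uses only the Python standard library; the function names and the toy training set are assumptions made for illustration.

```python
import math

def euclidean_distance(a, b):
    """Square root of the sum over features j of (a_j - b_j)^2."""
    return math.sqrt(sum((aj - bj) ** 2 for aj, bj in zip(a, b)))

def nearest_neighbor_predict(train_inputs, train_targets, x):
    """Return the target of the training input closest to x."""
    best_index = min(
        range(len(train_inputs)),
        key=lambda i: euclidean_distance(train_inputs[i], x),
    )
    return train_targets[best_index]

# Toy training set (hypothetical): 2-D inputs with string labels.
train_inputs = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
train_targets = ["a", "a", "b"]

prediction = nearest_neighbor_predict(train_inputs, train_targets, (4.0, 4.5))
print(prediction)
```

There is no training step in this sketch: the whole training set is stored, and all of the work happens at query time.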
Intro ML (UofT) CSC311-Lec1 35 / 55
Intro ML (UofT) CSC311-Lec1 36 / 55
Intro ML (UofT) CSC311-Lec1 37 / 55
Intro ML (UofT) CSC311-Lec1 38 / 55
[Pic by Olga Veksler]
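Majority vote over the k nearest neighbors: $y = \operatorname*{argmax}_{t^{(z)}} \sum_{i=1}^{k} \mathbb{I}\big(t^{(z)} = t^{(i)}\big)$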
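The k-nearest-neighbors classifier can be sketched as follows: sort the training points by distance to the query, keep the k closest, and output the most common target among them. The function name and toy data are hypothetical. Ties in the vote are broken here in favor of the label seen first among the nearest neighbors, which is a choice made for this sketch and not something the slides specify.

```python
import math
from collections import Counter

def knn_predict(train_inputs, train_targets, x, k):
    """Majority vote among the k training points closest to x."""
    # Indices of training points ordered by Euclidean distance to x.
    order = sorted(
        range(len(train_inputs)),
        key=lambda i: math.dist(train_inputs[i], x),
    )
    # Count how often each target occurs among the k nearest points.
    votes = Counter(train_targets[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy training set (hypothetical).
train_inputs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_targets = ["red", "blue", "blue", "red"]
query = (0.1, 0.1)

print(knn_predict(train_inputs, train_targets, query, k=1))
print(knn_predict(train_inputs, train_targets, query, k=3))
```

With k = 1 the single closest point decides the output; with k = 3 the two "blue" neighbors outvote it, so the prediction depends on k.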
◮ Good at capturing fine-grained patterns
◮ May overfit, i.e. be sensitive to random idiosyncrasies in the
◮ Makes stable predictions by averaging over lots of examples
◮ May underfit, i.e. fail to capture important regularities
◮ Optimal choice of k depends on number of data points n.
◮ Nice theoretical properties if k → ∞ and k/n → 0.
◮ Rule of thumb: choose k < √n.
◮ We can choose k using validation set (next slides).
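One way to choose k with a validation set is sketched below: evaluate several candidate values of k on held-out examples and keep the one with the highest validation accuracy. The data, the candidate list, and all function names are made up for illustration; the training set deliberately contains one noisy label, at input 2.5, so that k = 1 overfits.

```python
import math
from collections import Counter

def knn_predict(train_inputs, train_targets, x, k):
    """Majority vote among the k training points closest to x."""
    order = sorted(
        range(len(train_inputs)),
        key=lambda i: math.dist(train_inputs[i], x),
    )
    votes = Counter(train_targets[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def validation_accuracy(train_inputs, train_targets, val_inputs, val_targets, k):
    """Fraction of validation examples that k-NN classifies correctly."""
    correct = sum(
        knn_predict(train_inputs, train_targets, x, k) == t
        for x, t in zip(val_inputs, val_targets)
    )
    return correct / len(val_inputs)

# Hypothetical 1-D data; the last training point (2.5 -> 1) is a noisy label.
train_inputs = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,), (2.5,)]
train_targets = [0, 0, 0, 0, 1, 1, 1, 1]
val_inputs = [(2.4,), (2.6,), (10.5,), (0.5,)]
val_targets = [0, 0, 1, 0]

candidate_ks = [1, 3, 5]
accuracies = {
    k: validation_accuracy(train_inputs, train_targets, val_inputs, val_targets, k)
    for k in candidate_ks
}
# max() returns the first candidate that attains the highest accuracy.
best_k = max(candidate_ks, key=lambda k: accuracies[k])
print(accuracies, best_k)
```

The validation examples are kept separate from the training set, so the accuracy used to pick k is measured on data the classifier has not stored.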
$(1/\epsilon)^d$
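To get a feel for how quickly $(1/\epsilon)^d$ grows with the dimension d, the short sketch below evaluates it for a fixed ε. The function name is hypothetical, and constant factors are ignored, so the numbers are order-of-magnitude estimates only.

```python
def points_needed(epsilon, d):
    """Order-of-magnitude count (1/epsilon)^d, ignoring constant factors."""
    return (1.0 / epsilon) ** d

# With epsilon fixed at 0.1, the count grows by a factor of 10 per dimension.
for d in (1, 2, 10, 100):
    print(d, points_needed(0.1, d))
```

Even for a modest ε the count becomes astronomically large as d grows, which is why nearest-neighbor intuition from low-dimensional pictures can mislead in high dimensions.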
Image credit: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_swiss_roll.html
◮ Calculate D-dimensional Euclidean distances with N data points:
◮ Sort the distances: O(N log N)
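The two query-time steps above can be sketched with NumPy, assuming it is available: computing the distance from the query to all N training points touches every one of the N × D entries once, so it costs O(ND), and sorting the N distances costs O(N log N). The function name and the toy arrays are hypothetical.

```python
import numpy as np

def k_nearest_indices(X, x, k):
    """Indices of the k rows of X closest to x in Euclidean distance."""
    # Step 1, O(N D): squared distance from x to each of the N rows.
    # Squared distances give the same ordering as true distances,
    # so the square root is skipped.
    sq_dists = np.sum((X - x) ** 2, axis=1)
    # Step 2, O(N log N): sort all N distances, then keep the first k.
    order = np.argsort(sq_dists)
    return order[:k]

# Toy data (hypothetical): N = 4 points in D = 2 dimensions.
X = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0], [10.0, 10.0]])
x = np.array([0.9, 0.9])

print(k_nearest_indices(X, x, k=2))
```

Both steps are repeated for every query, which is why nearest-neighbor prediction becomes expensive when N or D is large.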
◮ Distance measure: average distance between corresponding points