1
University College London
Introduction to Machine Learning
Iasonas Kokkinos
Iasonas.kokkinos@gmail.com
Lecture 1: Introduction and Linear Regression
2
Lecture outline
– Introduction to the course
– Introduction to Machine Learning
– Least squares
3
Machine Learning
Principles, methods, and algorithms for learning and prediction based on past evidence.
Goal: machines that perform a task based on experience, instead of explicitly coded instructions.
Why?
4
Machine Learning variants
– Classification – Regression
– Clustering – Dimensionality Reduction
Some data supervised, some unsupervised
Supervision: sparse reward for a sequence of decisions
5
Classification
– Binary decision: yes/no
Decision boundary
6
Classification examples
7
Decision boundary
(Figure: 'Face' vs. 'Background' regions separated by a decision boundary; the learned 'faceness' function is the classifier)
8
Test time: deploy the learned function
– Scan windows at multiple scales and multiple orientations
– The classifier maps each window to Face or Non-face
9
Machine Learning variants
– Classification – Regression
– Clustering – Dimensionality Reduction
Some data supervised, some unsupervised
Supervision: reward for a sequence of decisions
10
Regression
– E.g. the price of a car based on its age, mileage, condition, …
11
Computer vision example
12
Machine Learning variants
– Classification – Regression
– Clustering – Dimensionality Reduction
Some data supervised, some unsupervised
Supervision: reward for a sequence of decisions
13
Clustering
– Labels are `invented’
14
Clustering examples
15
Clustering examples
16
Machine Learning variants
– Classification – Regression
– Clustering – Dimensionality Reduction
Some data supervised, some unsupervised
Supervision: reward for a sequence of decisions
17
Dimensionality reduction & manifold learning
– Continuous outputs are `invented’
18
Example of nonlinear manifold: faces
(Figure: two face images x_1 and x_2, and their average ½(x_1 + x_2))
The average of two faces is not a face.
19
Moving along the learned face manifold
Trajectory along the "male" dimension; trajectory along the "young" dimension (Lample et al., Fader Networks, NIPS 2017)
20
Machine Learning variants
– Classification – Regression
– Clustering – Dimensionality Reduction
Partially supervised
Supervision: reward for a sequence of decisions
21
Weakly supervised learning: only part of the supervision signal
Supervision signal: "motorcycle". Inferred: localization information.
23
Semi-supervised learning: only part of the data labelled
(Figure: labelled data only vs. labelled + unlabelled data)
24
Machine Learning variants
– Classification – Regression
– Clustering – Dimensionality Reduction
Some data supervised, some unsupervised
Supervision: reward for a sequence of decisions
25
Reinforcement learning
– Take actions, based on state – (occasionally) receive rewards – Update state – Repeat
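As a rough illustration of this loop, here is a self-contained Python sketch; the toy environment, its sparse reward at position 10, and the random policy are all made up for illustration, not taken from the lecture.

```python
import random

class ToyEnv:
    """Tiny illustrative environment: walk on positions 0..10;
    a reward arrives only when position 10 is reached (sparse reward)."""
    def reset(self):
        self.pos = 0
        return self.pos
    def step(self, action):                 # action in {-1, +1}
        self.pos = max(0, min(10, self.pos + action))
        done = (self.pos == 10)
        reward = 1.0 if done else 0.0       # reward only occasionally
        return self.pos, reward, done

def random_policy(state):
    return random.choice([-1, +1])

env = ToyEnv()
state = env.reset()
total_reward = 0.0
for _ in range(1000):                       # take an action based on the state
    action = random_policy(state)
    state, reward, done = env.step(action)  # (occasionally) receive a reward, update state
    total_reward += reward
    if done:                                # repeat
        state = env.reset()
print("episodes solved:", total_reward)
```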
26
Reinforcement learning examples
Backgammon, 90’s GO, 2015
27
Focus of first part: supervised learning
– Classification – Regression
– Clustering – Dimensionality Reduction, Manifold Learning
Some data supervised, some unsupervised
Supervision: reward for a sequence of decisions
28
Classification: yes/no decision
29
Regression: continuous output
30
What we want to learn: a function
y = f_w(x)
31
What we want to learn: a function
y = f_w(x)   (x: input, f: method, w: parameters, y: prediction)
32
What we want to learn: a function
y = f_w(x)   (x: input, f: method, w: parameters, y: prediction)
– x ∈ R: calculus
– x ∈ R^D: vector calculus
– Machine learning can also work with discrete inputs: strings, trees, graphs, …
33
What we want to learn: a function
y = f_w(x)   (x: input, f: method, w: parameters, y: prediction)
– Classification: y ∈ {0, 1}
– Regression: y ∈ R
34
What we want to learn: a function
y = f_w(x)
Possible methods f: linear classifiers, neural networks, decision trees, ensemble models, probabilistic classifiers, …
35
Example of method: K-nearest neighbor classifier
(Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor neighbourhoods around a query point X)
– Compute the distance to the other training records
– Identify the K nearest neighbors
– Take a majority vote over their labels
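A minimal NumPy sketch of this procedure (the toy training points and the function name `knn_predict` are illustrative assumptions, not from the lecture):

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, K=3):
    """Classify x_query by majority vote among its K nearest training points."""
    dists = np.linalg.norm(X_train - x_query, axis=1)   # distance to every training record
    nearest = np.argsort(dists)[:K]                      # indices of the K nearest neighbors
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                     # majority vote

# toy 2D example with two classes
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.8, 0.9]), K=3))  # -> 1
```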
36
Training data for NN classifier (in R2)
37
1-nn classifier prediction (in R2)
38
3-nn classifier prediction
39
Method example: decision tree
Machine learning can also work with discrete inputs: strings, trees, graphs, …
40
Method example: decision tree
41
Method example: decision tree
What is the depth of the decision tree for this problem?
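Independently of the specific problem in the figure, a decision tree is just a nested sequence of threshold tests; a tiny hand-written sketch (the features and thresholds below are made up):

```python
def classify(x):
    """A depth-2 decision tree on two features (illustrative thresholds)."""
    if x[0] < 0.5:            # first split on feature 0
        return "A"
    else:
        if x[1] < 0.3:        # second split on feature 1
            return "A"
        else:
            return "B"

print(classify([0.2, 0.9]))   # "A"
print(classify([0.8, 0.9]))   # "B"
```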
42
Method example: linear classifier
(Axes: feature coordinate i, feature coordinate j)
43
Method example: neural network
44
Method example: neural network
45
Method example: neural network
46
We have two centuries of material to cover!
"The first clear and concise exposition of the method of least squares was published by Legendre in 1805. The technique is described as an algebraic procedure for fitting linear equations to data and Legendre demonstrates the new method by analyzing the same data as Laplace for the shape of the earth. The value of Legendre's method of least squares was immediately recognized by leading astronomers and geodesists of the time." (https://en.wikipedia.org/wiki/Least_squares)
47
What we want to learn: a function
y = f_w(x) = f(x; w)   (x: input, f: method, w: parameters, y: prediction)
Parameters: w ∈ R or w ∈ R^K
48
Assumption: linear function
y = f_w(x) = f(x, w) = w^T x
Inner product: for x ∈ R^D and w ∈ R^D,
w^T x = ⟨w, x⟩ = Σ_{d=1}^D w_d x_d
49
Reminder: linear classifier
(Axes: feature coordinate i, feature coordinate j)
Each data point has a class label: y_t = +1 (positive) or −1 (negative)
Linear classifier: predict positive if w · x_i + b ≥ 0, negative if w · x_i + b < 0
50
Each data point has a class label: y_t = +1 (positive) or −1 (negative); the classifier predicts positive if w · x_i + b ≥ 0 and negative if w · x_i + b < 0.
(Axes: feature coordinate i, feature coordinate j)
Question: which one?
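For concreteness, a minimal Python sketch of such a linear decision rule (the parameter values w and b below are made up):

```python
import numpy as np

def linear_classify(x, w, b):
    """Predict +1 if w·x + b >= 0, else -1."""
    return 1 if np.dot(w, x) + b >= 0 else -1

w, b = np.array([1.0, -2.0]), 0.5                      # illustrative parameters
print(linear_classify(np.array([3.0, 1.0]), w, b))     # +1
print(linear_classify(np.array([0.0, 2.0]), w, b))     # -1
```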
51
Linear regression in 1D
52
Linear regression in 1D
Training set: input–output pairs  S = {(x^i, y^i)},  i = 1, …, N,  with x^i ∈ R, y^i ∈ R
53
Linear regression in 1D
y^i = w_0 + w_1 x^i_1 + ε^i
    = w_0 x^i_0 + w_1 x^i_1 + ε^i,   with x^i_0 = 1 ∀i
    = w^T x^i + ε^i
54
Sum of squared errors criterion
y^i = w^T x^i + ε^i
Loss function: sum of squared errors:
L(w) = Σ_{i=1}^N (ε^i)²
Expressed as a function of the two variables:
L(w_0, w_1) = Σ_{i=1}^N [ y^i − (w_0 x^i_0 + w_1 x^i_1) ]²
Question: what is the best (or least bad) value of w? Answer: least squares.
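As a quick illustration, this loss can be evaluated directly for any candidate (w_0, w_1); a sketch with made-up data (not from the slides):

```python
import numpy as np

def sse_loss(w0, w1, x, y):
    """L(w0, w1) = sum_i [y_i - (w0 + w1 * x_i)]^2"""
    residuals = y - (w0 + w1 * x)       # epsilon_i for every training sample
    return np.sum(residuals ** 2)

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 7.1])      # roughly y = 1 + 2x
print(sse_loss(1.0, 2.0, x, y))         # near the minimum: small loss
print(sse_loss(0.0, 0.0, x, y))         # far from it: large loss
```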
55
Calculus 101
(Figure: a function f(x) of x, with optimum x*)
56
Calculus 101
x* = argmax_x f(x)   (figure: f(x) with its maximizer x*)
57
Condition for maximum: derivative is zero
x* = argmax_x f(x)   (figure: f(x) with its maximizer x*)
58
Condition for maximum: derivative is zero
x* = argmax_x f(x)  →  f'(x*) = 0
59
Condition for minimum: derivative is zero
x* = argmin_x f(x)  →  f'(x*) = 0
60
Vector calculus 101
(Figure: graph of a 2D function f(x), its isocontours f(x) = c, and its gradient field)
Gradient:  ∇f(x) = [∂f/∂x_1, ∂f/∂x_2]^T
At the minimum of the function:  ∇f(x) = 0
61
Back to least squares..
y^i = w^T x^i + ε^i
Loss function: sum of squared errors:  L(w) = Σ_{i=1}^N (ε^i)²
Expressed as a function of the two variables (superscript i: training sample, subscript: feature dimension):
L(w_0, w_1) = Σ_{i=1}^N [ y^i − (w_0 x^i_0 + w_1 x^i_1) ]²
Question: what is the best (or least bad) value of w? Answer: least squares.
62
Fitting a line
L(w_0, w_1) = Σ_{i=1}^N [ y^i − (w_0 x^i_0 + w_1 x^i_1) ]²
∂L(w_0, w_1)/∂w_0 = Σ_{i=1}^N ∂[ y^i − (w_0 x^i_0 + w_1 x^i_1) ]² / ∂w_0
                  = Σ_{i=1}^N 2 [ y^i − (w_0 x^i_0 + w_1 x^i_1) ] (−x^i_0)
                  = −2 Σ_{i=1}^N ( y^i x^i_0 − w_0 x^i_0 x^i_0 − w_1 x^i_1 x^i_0 )
Setting ∂L(w_0, w_1)/∂w_0 = 0:
⇔  Σ_{i=1}^N y^i x^i_0 = w_0 Σ_{i=1}^N x^i_0 x^i_0 + w_1 Σ_{i=1}^N x^i_1 x^i_0
63
Fitting a line, continued
∂L(w_0, w_1)/∂w_0 = 0  ⇔  Σ_{i=1}^N y^i x^i_0 = w_0 Σ_{i=1}^N x^i_0 x^i_0 + w_1 Σ_{i=1}^N x^i_1 x^i_0
∂L(w_0, w_1)/∂w_1 = 0  ⇔  Σ_{i=1}^N y^i x^i_1 = w_0 Σ_{i=1}^N x^i_0 x^i_1 + w_1 Σ_{i=1}^N x^i_1 x^i_1
2 linear equations, 2 unknowns
64
Fitting a line, continued
Σ_{i=1}^N y^i x^i_0 = w_0 Σ_{i=1}^N x^i_0 x^i_0 + w_1 Σ_{i=1}^N x^i_1 x^i_0
Σ_{i=1}^N y^i x^i_1 = w_0 Σ_{i=1}^N x^i_0 x^i_1 + w_1 Σ_{i=1}^N x^i_1 x^i_1
In matrix form:
[ Σ_i y^i x^i_0 ]   [ Σ_i x^i_0 x^i_0   Σ_i x^i_0 x^i_1 ] [ w_0 ]
[ Σ_i y^i x^i_1 ] = [ Σ_i x^i_0 x^i_1   Σ_i x^i_1 x^i_1 ] [ w_1 ]
That's it!
65
Fitting a line, continued
The same 2×2 system of equations:
[ Σ_i y^i x^i_0 ]   [ Σ_i x^i_0 x^i_0   Σ_i x^i_0 x^i_1 ] [ w_0 ]
[ Σ_i y^i x^i_1 ] = [ Σ_i x^i_0 x^i_1   Σ_i x^i_1 x^i_1 ] [ w_1 ]
Or, without summations, with y = [y^1, …, y^N]^T and X the N×2 matrix whose i-th row is [x^i_0, x^i_1]:
Solution:  w = (X^T X)^{-1} X^T y
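A minimal NumPy sketch of this closed-form line fit on synthetic data (in practice np.linalg.lstsq is usually preferable to forming an explicit inverse; the data below are made up):

```python
import numpy as np

# synthetic 1D data, roughly y = 1 + 2x plus noise
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(30)

# design matrix with x_0 = 1 (bias) and x_1 = x
X = np.column_stack([np.ones_like(x), x])       # shape (N, 2)

# normal equations: (X^T X) w = X^T y
w = np.linalg.solve(X.T @ X, X.T @ y)
print(w)                                        # approximately [1.0, 2.0]
```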
66
Linear regression in 1D
67
Linear regression in 2D (or ND)
68
Least squares solution for linear regression
Dimensions: y is N×1, X is N×D, w is D×1, ε is N×1  (D: problem dimension, N: training set size)
y = Xw + ε,  with  y = [y^1, y^2, …, y^N]^T,  X the matrix with entries X_{id} = x^i_d,  w = [w_1, …, w_D]^T,  ε = [ε^1, …, ε^N]^T
69
Least squares solution for linear regression
70
Least squares solution for linear regression
Loss function:  L(w) = Σ_{i=1}^N (y^i − w^T x^i)² = Σ_{i=1}^N (ε^i)²
In matrix form:  L(w) = [ε^1, ε^2, …, ε^N] [ε^1, ε^2, …, ε^N]^T
71
Least squares solution for linear regression
Loss function:  L(w) = Σ_{i=1}^N (y^i − w^T x^i)² = Σ_{i=1}^N (ε^i)²
In matrix form:  L(w) = [ε^1, ε^2, …, ε^N] [ε^1, ε^2, …, ε^N]^T = ε^T ε,  where  y = Xw + ε
72
Generalized linear regression
x → φ(x) = [φ_1(x), …, φ_M(x)]^T
73
1D Example: 2nd degree polynomial fitting
φ(x) = [1, x, x²]^T,   ⟨w, φ(x)⟩ = w_0 + w_1 x + w_2 x²
74
1D Example: K-th degree polynomial fitting
φ(x) = [1, x, …, x^K]^T,   ⟨w, φ(x)⟩ = w_0 + w_1 x + … + w_K x^K
75
2D example: second-order polynomials
x = (x1, x2)
⟨w, φ(x)⟩ = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_1² + w_4 x_2² + w_5 x_1 x_2
φ(x) = [1, x_1, x_2, x_1², x_2², x_1 x_2]^T
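A small sketch of this particular feature map in code (the function name phi_2d_quadratic and the parameter values are illustrative):

```python
import numpy as np

def phi_2d_quadratic(x):
    """phi(x) = [1, x1, x2, x1^2, x2^2, x1*x2] for x = (x1, x2)."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1**2, x2**2, x1 * x2])

w = np.array([0.5, 1.0, -2.0, 0.3, 0.0, 1.5])    # some parameter vector w0..w5
x = np.array([2.0, -1.0])
print(w @ phi_2d_quadratic(x))                   # <w, phi(x)>
```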
76
Reminder: linear regression
Loss function:  L(w) = Σ_{i=1}^N (y^i − w^T x^i)² = Σ_{i=1}^N (ε^i)²
In matrix form:  y = Xw + ε,  with  y = [y^1, y^2, …, y^N]^T,  X the N×D matrix with entries X_{id} = x^i_d,  w = [w_1, …, w_D]^T,  ε = [ε^1, …, ε^N]^T
77
Reminder: linear regression
Loss function:  L(w) = Σ_{i=1}^N (y^i − w^T x^i)² = Σ_{i=1}^N (ε^i)²
Equivalently:  y = Xw + ε,  where X is the matrix with rows (x^1)^T, (x^2)^T, …, (x^N)^T
78
Generalized linear regression
Feature map:  φ(x) : R^D → R^M
Loss function:  L(w) = Σ_{i=1}^N (y^i − w^T φ(x^i))² = Σ_{i=1}^N (ε^i)²
In matrix form:  y = Φw + ε,  with y N×1, Φ N×M (rows φ(x^i)^T), w = [w_1, …, w_M]^T M×1, ε N×1
79
Least squares solution for linear regression
Minimize:  L(w) = (y − Xw)^T (y − Xw),  where X is the N×D matrix with rows (x^1)^T, (x^2)^T, …, (x^N)^T
Solution:  w = (X^T X)^{-1} X^T y
80
Least squares solution for generalized linear regression
Φ is the N×M matrix with rows φ(x^1)^T, φ(x^2)^T, …, φ(x^N)^T
Minimize:  L(w) = (y − Φw)^T (y − Φw)
Solution:  w = (Φ^T Φ)^{-1} Φ^T y
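Putting the pieces together, a hedged sketch of generalized linear regression with a polynomial design matrix Φ (the 1D data and the degree are made up; np.linalg.lstsq minimizes ||y − Φw||²):

```python
import numpy as np

def poly_features(x, K):
    """phi(x) = [1, x, x^2, ..., x^K] applied to every sample in x."""
    return np.vander(x, K + 1, increasing=True)      # shape (N, K+1)

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 40)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(x.size)

Phi = poly_features(x, K=5)                          # N x M design matrix
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)          # least-squares solution
print(w)
```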
81
2D example: second-order polynomials
x = (x1, x2)
⟨w, φ(x)⟩ = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_1² + w_4 x_2² + w_5 x_1 x_2
φ(x) = [1, x_1, x_2, x_1², x_2², x_1 x_2]^T
82
5D Example: fourth-order polynomials in 5D
x = (x1, . . . , x5)
15625 dimensions ⇒ 15625 parameters
φ(x) = [1, x_1, …, x_5, …, (x_1 x_2 x_3 x_4 x_5)^4]^T
83
What was happening before: approximations
Training set:  S = {(x^i, y^i)},  i = 1, …, N
y^1 ≈ w_0 x^1_0 + w_1 x^1_1 + … + w_D x^1_D
y^2 ≈ w_0 x^2_0 + w_1 x^2_1 + … + w_D x^2_D
…
y^N ≈ w_0 x^N_0 + w_1 x^N_1 + … + w_D x^N_D
If N > D (e.g. 30 points, 2 dimensions) we have more equations than unknowns: an overdetermined system! The input–output relations can only hold approximately.
84
What is happening now: overfitting
Training set:  S = {(x^i, y^i)},  i = 1, …, N
y^1 = w_0 x^1_0 + w_1 x^1_1 + … + w_D x^1_D
y^2 = w_0 x^2_0 + w_1 x^2_1 + … + w_D x^2_D
…
y^N = w_0 x^N_0 + w_1 x^N_1 + … + w_D x^N_D
If N < D (e.g. 30 points, 15625 dimensions) we have more unknowns than equations: an underdetermined system! The input–output equations hold exactly, but we are simply memorizing the data.
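A small numerical illustration of this regime (the dimensions and data below are made up): with more parameters than training points, a least-squares fit satisfies the training equations exactly, i.e. it memorizes the data.

```python
import numpy as np

rng = np.random.default_rng(2)
N, D = 5, 20                          # N < D: underdetermined
X = rng.standard_normal((N, D))
y = rng.standard_normal(N)

# minimum-norm least-squares solution (lstsq handles the underdetermined case)
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(X @ w, y))          # True: the training equations hold exactly
```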
85
Overfitting, in images
(Figure: classification and regression fits, including one that is "just right")
86
Tuning the model’s complexity
A flexible model approximates the target function well on the training set, but can "overtrain" and perform poorly on the test set ("variance"). A rigid model's performance is more predictable on the test set, but the model may not be good even on the training set ("bias").
87
Regularization: keeping it simple
In high dimensions there are too many solutions to the same problem. Regularization: prefer the least complex among them. How? Penalize complexity.
88
How to control complexity?
Observation: the problem started with high-dimensional embeddings. Guess: the number of dimensions relates to "complexity". But what if we force the classifier not to use all of the parameters? (Week 4: we will guess again!) Intuition: with many parameters, we can fit anything. Idea: penalize the use of large parameter values. How do we measure "large"? How do we enforce small values?
89
How do we measure “large”?
Method parameters: D-dimensional vector  w = [w_1, w_2, …, w_D]
"Large" vector: measured with a vector norm.
L2 ("Euclidean") norm:  ||w||_2 := √( Σ_{d=1}^D w_d² ) = √⟨w, w⟩
L1 ("Manhattan") norm:  ||w||_1 := Σ_{d=1}^D |w_d|
Lp norm, p > 1:  ||w||_p := ( Σ_{d=1}^D |w_d|^p )^{1/p}
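For concreteness, these norms computed directly in NumPy (the example vector is arbitrary):

```python
import numpy as np

w = np.array([3.0, -4.0, 0.5])

l2 = np.sqrt(np.sum(w ** 2))                            # Euclidean norm, same as np.linalg.norm(w, 2)
l1 = np.sum(np.abs(w))                                  # Manhattan norm, np.linalg.norm(w, 1)
lp = lambda p: np.sum(np.abs(w) ** p) ** (1.0 / p)      # general Lp norm

print(l2, l1, lp(3))
```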
90
Regularized linear regression
Residual vector:  ε = y − Φw
Linear regression: minimize the model error ("data fidelity")  ε^T ε
Complexity term (regularizer):  R(w) := ||w||_2² = w^T w
New objective: "data fidelity" + complexity, i.e.  ε^T ε + λ R(w),  where λ is a scalar that remains to be determined; the minimum also remains to be determined.
91
Least squares solution
L(w) = ε^T ε = (y − Xw)^T (y − Xw) = y^T y − 2 y^T Xw + w^T X^T X w
Condition for minimum:  ∇L(w*) = 0
⇔  −2 X^T y + 2 X^T X w* = 0   ⇔   w* = (X^T X)^{-1} X^T y
92
Ridge regression: L2-regularized linear regression
L(w) = ε^T ε + λ w^T w = y^T y − 2 y^T Xw + w^T X^T X w + λ w^T I w   (the first terms as before, for linear regression; I is the identity matrix)
     = y^T y − 2 y^T Xw + w^T (X^T X + λ I) w
Condition for minimum:  ∇L(w*) = 0
⇔  −2 X^T y + 2 (X^T X + λ I) w* = 0   ⇔   w* = (X^T X + λ I)^{-1} X^T y
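A minimal sketch of the ridge solution on synthetic data, compared with the unregularized solution (the data and the λ value below are made up):

```python
import numpy as np

rng = np.random.default_rng(3)
N, D = 30, 10
X = rng.standard_normal((N, D))
w_true = rng.standard_normal(D)
y = X @ w_true + 0.1 * rng.standard_normal(N)

lam = 0.5                                                        # regularization strength lambda
w_ols = np.linalg.solve(X.T @ X, X.T @ y)                        # plain least squares
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ y)    # ridge regression
print(np.linalg.norm(w_ols), np.linalg.norm(w_ridge))            # ridge solution has the smaller norm
```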
93
Ridge regression, continued
Regularizer:  R(w) := ||w||_2² = w^T w
New objective: "data fidelity" + complexity, i.e.  ε^T ε + λ R(w)
λ: a "hyperparameter" scalar that remains to be determined; we just determined the minimum with respect to w.
Note: direct minimization of the objective with respect to λ would lead to λ = 0.
94
Bias-Variance tradeoff as a function of λ
(Figure: bias–variance tradeoff as a function of λ; the "sweet spot" lies at an intermediate value)
95
Selecting λ with cross-validation
– Exclude part of the training data from parameter estimation
– Use this held-out part only to predict the test error
– K splits, average the K errors
– Repeat for different values of the λ parameter
– Pick the value that minimizes the cross-validation error
Least glorious, most effective.
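A hedged sketch of this procedure for ridge regression (K-fold splits over a made-up grid of λ values; all names and data are illustrative):

```python
import numpy as np

def ridge_fit(X, y, lam):
    D = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ y)

def cv_error(X, y, lam, K=5):
    """Average validation error over K splits: each fold is excluded from
    parameter estimation and used only to predict the test error."""
    folds = np.array_split(np.random.default_rng(0).permutation(len(y)), K)
    errs = []
    for k in range(K):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        w = ridge_fit(X[train], y[train], lam)
        errs.append(np.mean((y[val] - X[val] @ w) ** 2))
    return np.mean(errs)

# pick the lambda that minimizes the cross-validation error
X = np.random.default_rng(1).standard_normal((50, 10))
y = X[:, 0] + 0.1 * np.random.default_rng(2).standard_normal(50)
lambdas = [1e-3, 1e-2, 1e-1, 1.0, 10.0]
best = min(lambdas, key=lambda lam: cv_error(X, y, lam))
print("selected lambda:", best)
```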