
Introduction to Machine Learning — COMPSCI 371D Machine Learning



  1. Introduction to Machine Learning (COMPSCI 371D — Machine Learning)

  2. Outline
     1. Classification, Regression, Unsupervised Learning
     2. About Dimensionality
     3. Drawings and Intuition in Higher Dimensions
     4. Classification through Regression
     5. Linear Separability

  3. About Slides
     • By popular demand, lecture slides will be made available online
     • They will show up just before a lecture starts
     • Slides are grouped by topic, not by lecture
     • Slides are not for studying
     • Class notes and homework assignments are the materials of record

  4. Classification, Regression, Unsupervised Learning — Parenthesis: Supervised vs Unsupervised
     • Supervised: train with pairs (x, y)
     • Classification: hand-written digit recognition
     • Regression: median age of YouTube viewers for each video
     • Unsupervised: train with x alone
     • Clustering: color compression
     • Distances matter!
     • We will not cover unsupervised learning
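     The distinction is mostly about what the training data looks like. A tiny illustration in Python (the numbers and feature vectors below are made up):

        # Supervised training data: pairs (x, y); the learner sees inputs and targets.
        supervised_T = [([0.2, 0.7, 0.1], 3),    # e.g. digit recognition: x = features, y = class label
                        ([0.9, 0.1, 0.4], 7)]

        # Unsupervised training data: inputs x only; any structure (e.g. clusters for
        # color compression) must come from the x's themselves, so distances matter.
        unsupervised_T = [[0.2, 0.7, 0.1],
                          [0.9, 0.1, 0.4]]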

  5. Classification, Regression, Unsupervised Learning — Machine Learning Terminology
     • Predictor h : X → Y (the signature of h)
     • X ⊆ R^d is the data space
     • Y (categorical) is the label space for a classifier
     • Y (⊆ R^e) is the value space for a regressor
     • A target is either a label or a value
     • H is the hypothesis space (all h we can choose from)
     • A training set is a subset T of X × Y
     • 𝒯 = 2^(X × Y) (by definition) is the class of all possible training sets
     • Learner λ : 𝒯 → H, so that λ(T) = h
     • ℓ(y, ŷ) is the loss incurred for estimating ŷ when the true prediction is y
     • L_T(h) = (1/N) Σ_{n=1..N} ℓ(y_n, h(x_n)) is the empirical risk of h on T
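     A minimal sketch of these definitions in Python (the toy training set, the zero-one loss, and the threshold predictor are made up for illustration):

        def zero_one_loss(y, y_hat):
            # Classification loss: 1 if the prediction is wrong, 0 otherwise.
            return 0 if y == y_hat else 1

        def empirical_risk(h, T, loss):
            # L_T(h) = (1/N) * sum over (x_n, y_n) in T of loss(y_n, h(x_n)).
            return sum(loss(y, h(x)) for x, y in T) / len(T)

        # Toy training set T ⊂ X × Y, with X ⊆ R and categorical labels Y = {'c', 's'}.
        T = [(0.2, 'c'), (0.4, 'c'), (0.7, 's'), (0.9, 's')]

        # A simple predictor h : X → Y.
        h = lambda x: 's' if x > 0.5 else 'c'

        print(empirical_risk(h, T, zero_one_loss))   # 0.0 on this toy set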

  6. About Dimensionality — H is Typically Parametric
     • For polynomials, h ↔ c
     • We write L_T(c) instead of L_T(h)
     • "Searching H" means "find the parameters": ĉ ∈ arg min_{c ∈ R^m} ‖Ac − b‖²
     • This is common in machine learning: h(x) = h_θ(x), with θ a vector of parameters
     • Abstract view: ĥ ∈ arg min_{h ∈ H} L_T(h)
     • Concrete view: θ̂ ∈ arg min_{θ ∈ R^m} L_T(θ)
     • Minimize a function of real variables, rather than of "functions"
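     The polynomial case can be made concrete with a small least-squares fit, the "concrete view" of searching H (a sketch with synthetic data; only numpy is used):

        import numpy as np

        # c_hat = arg min over c in R^m of ||A c - b||^2, for a degree-k polynomial in one variable.
        rng = np.random.default_rng(0)
        x = rng.uniform(-1, 1, size=50)                                   # N = 50 training inputs
        b = 1.0 - 2.0 * x + 0.5 * x**2 + 0.1 * rng.standard_normal(50)   # noisy targets

        k = 2                                              # polynomial degree
        A = np.vander(x, k + 1)                            # N x m design matrix, m = k + 1
        c_hat, *_ = np.linalg.lstsq(A, b, rcond=None)      # minimizes ||A c - b||^2
        h = lambda x_new: np.polyval(c_hat, x_new)         # the fitted predictor h_c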

  7. About Dimensionality — Curb your Dimensions
     • For polynomials, h_c(x) : X → Y, with x ∈ X ⊆ R^d and c ∈ R^m
     • We saw that d > 1 and degree k > 1 ⇒ m ≫ d
     • Specifically, m(d, k) = C(d + k, k), the binomial coefficient "(d + k) choose k"
     • Things blow up when k and d grow
     • More generally, h_θ(x) : X → Y, with x ∈ X ⊆ R^d and θ ∈ R^m
     • Which dimension(s) do we want to curb? m? d?
     • Both, for different but related reasons
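     A quick check of how fast m(d, k) grows, using only the Python standard library:

        from math import comb

        # m(d, k) = C(d + k, k): number of coefficients of a degree-k polynomial in d variables.
        for d in (1, 2, 5, 10, 20):
            for k in (1, 2, 5, 10):
                print(f"d = {d:2d}, k = {k:2d}: m = {comb(d + k, k)}")
        # e.g. d = 10, k = 5 gives m = 3003, and d = 20, k = 10 gives m = 30045015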

  8. About Dimensionality — Problem with m Large
     • Even just for data fitting, we generally want N ≫ m, i.e., (possibly many) more samples than parameters to estimate
     • For instance, in Ac = b, we want A to have more rows than columns
     • Remember that annotating training data is costly
     • So we want to curb m: we want a small H
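     A small illustration of why N ≫ m matters even for plain data fitting (a sketch; the target function and noise level are made up):

        import numpy as np

        rng = np.random.default_rng(1)

        def training_residual(N, k):
            # Fit a degree-k polynomial to N noisy samples; return ||A c - b|| on the training data.
            x = rng.uniform(-1, 1, size=N)
            b = np.sin(3 * x) + 0.2 * rng.standard_normal(N)
            A = np.vander(x, k + 1)                         # N rows, m = k + 1 columns
            c, *_ = np.linalg.lstsq(A, b, rcond=None)
            return np.linalg.norm(A @ c - b)

        print(training_residual(N=10, k=9))    # N = m: the fit memorizes the noise (residual ~ 0)
        print(training_residual(N=200, k=9))   # N >> m: the residual stays near the noise level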

  9. About Dimensionality — Problems with d Large
     • We do machine learning, not just data fitting!
     • We want h to generalize to new data
     • During training, we would like the learner to see a good sampling of all possible x ("fill X nicely")
     • With large d, this is impossible: the curse of dimensionality
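     A back-of-the-envelope count: covering the unit cube [0, 1]^d with a modest grid of 10 points per axis (an arbitrary choice, just for illustration) already takes 10^d samples:

        for d in (1, 2, 3, 10, 100):
            print(f"d = {d:3d}: {10**d:.3g} grid points needed")
        # d = 10 already needs ten billion samples; d = 100 needs about 1e+100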

  10. Drawings and Intuition in Higher Dimensions — Drawings Help Intuition

  11. Drawings and Intuition in Higher Dimensions — Intuition Often Fails in Many Dimensions
     [Figure: a unit cube of side 1 with an inner cube of side 1 − ε, leaving a gray margin of width ε/2 on each side]
     • Gray parts dominate when d → ∞
     • Distance from center to corners diverges when d → ∞
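     Both claims are easy to check numerically (a sketch; ε = 0.1 is an arbitrary choice):

        # Unit cube [0, 1]^d with an inner cube of side 1 - eps.
        eps = 0.1
        for d in (2, 10, 100, 1000):
            inner_volume = (1 - eps) ** d        # fraction of volume NOT in the gray margin
            center_to_corner = d ** 0.5 / 2      # distance from the center to a corner
            print(f"d = {d:4d}: inner volume = {inner_volume:.3g}, "
                  f"center-to-corner = {center_to_corner:.1f}")
        # The inner volume tends to 0 (the margin dominates) while the corner distance diverges.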

  12. Classification through Regression — Classifiers as Partitions of X
     • X_y = h^{-1}(y) (by definition); these sets partition X (not just T!)
     • Classifier = partition
     • S = h^{-1}(red square), C = h^{-1}(blue circle)
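     A tiny sketch of the partition induced by a classifier (the threshold classifier and the grid of sample points are made up):

        from collections import defaultdict

        h = lambda x: 's' if x > 0.5 else 'c'    # a classifier on X = [0, 1]

        # X_y = h^{-1}(y): every x in X falls in exactly one cell, whether or not it was in T.
        cells = defaultdict(list)
        for x in [i / 10 for i in range(11)]:
            cells[h(x)].append(x)

        print(cells['c'])    # the cell X_c
        print(cells['s'])    # the cell X_s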

  13. Classification through Regression — Classification, Geometry, and Regression
     • Classification partitions X ⊂ R^d into sets
     • How do we represent sets in R^d? How do we work with them?
     • We'll see a couple of ways: nearest-neighbor classifiers, decision trees
     • These methods have a strong geometric flavor
     • Beware of our intuition!
     • Another technique: score-based classifiers, i.e., classification through regression

  14. Classification through Regression — Score-Based Classifiers
     [Figure: a score function s(x), showing the boundary s = 0 and the regions s > 0 and s < 0; adapted from Wei et al., Structural and Multidisciplinary Optimization, 58:831–849, 2018]
     • s = 0 defines the decision boundaries
     • s > 0 and s < 0 define the (two) decision regions

  15. Classification through Regression — Score-Based Classifiers
     • Threshold some score function s(x)
     • Example: 's' (red squares) and 'c' (blue circles)
     • These correspond to two sets S ⊆ X and C = X \ S
     • If we can estimate something like s(x) = P[x ∈ S], then
       h(x) = 's' if s(x) > 1/2, and 'c' otherwise
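     A minimal sketch of such a classifier (the sigmoid score with hand-picked parameters stands in for a learned estimate of P[x ∈ S]; it is only an illustration, not the course's method):

        import numpy as np

        def sigmoid(t):
            return 1.0 / (1.0 + np.exp(-t))

        w, b = np.array([2.0, -1.0]), 0.5        # made-up parameters

        def s(x):
            return sigmoid(b + x @ w)            # score in (0, 1), read as an estimate of P[x in S]

        def h(x):
            return 's' if s(x) > 0.5 else 'c'    # threshold the score at 1/2

        print(h(np.array([1.0, 0.2])), h(np.array([-1.0, 0.5])))   # 's' 'c'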

  16. Classification through Regression — Classification through Regression
     • If you prefer 0 as a threshold, let s(x) = 2 P[x ∈ S] − 1 ∈ [−1, 1], so that
       h(x) = 's' if s(x) > 0, and 'c' otherwise
     • Scores are convenient even without probabilities, because they are easy to work with
     • We implement a classifier h by building a regressor s
     • Example: logistic-regression classifiers

  17. Linear Separability — Linearly Separable Training Sets
     • Some line (hyperplane in R^d) separates C, S
     • Requires a much smaller H
     • Simplest score: s(x) = b + wᵀx; the line is s(x) = 0, and
       h(x) = 's' if s(x) > 0, and 'c' otherwise
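     When the training set really is linearly separable, even the classic perceptron update (used here only as an illustration; it is not mentioned on this slide) finds some b, w with s(x) = b + wᵀx positive on S and negative on C:

        import numpy as np

        # Toy 2-D training set, labels +1 for 's' and -1 for 'c'.
        X = np.array([[2.0, 2.0], [3.0, 1.5], [-1.0, -2.0], [-2.0, -0.5]])
        y = np.array([1, 1, -1, -1])

        w, b = np.zeros(2), 0.0
        for _ in range(100):                     # plenty of passes for this tiny set
            for xn, yn in zip(X, y):
                if yn * (b + xn @ w) <= 0:       # misclassified: nudge the hyperplane
                    w += yn * xn
                    b += yn

        h = lambda x: 's' if b + x @ w > 0 else 'c'
        print([h(xn) for xn in X])               # all four training points are recovered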

  18. Linear Separability — Data Representation?
     • Linear separability is a property of the data in a given representation
     [Figure: the points of S form an annulus of radius r and width Δr around the origin]
     • Xform 1: z = x₁² + x₂² implies x ∈ S ⇔ a ≤ z ≤ b
     • Xform 2: z = |√(x₁² + x₂²) − r| implies linear separability: x ∈ S ⇔ z ≤ Δr
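     A sketch of the second transformation on synthetic annulus data (the radius r, half-width dr, and sample sizes are made up; the code assumes S is the annulus and C lies well inside it):

        import numpy as np

        rng = np.random.default_rng(2)
        r, dr = 2.0, 0.3

        angles = rng.uniform(0, 2 * np.pi, 200)
        radii = np.concatenate([rng.uniform(r - dr, r + dr, 100),     # 100 points in S (the annulus)
                                rng.uniform(0, r - 3 * dr, 100)])     # 100 points in C (inside it)
        x = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
        in_S = np.arange(200) < 100

        # Xform 2: z = |sqrt(x1^2 + x2^2) - r|; one threshold on z now separates S from C.
        z = np.abs(np.sqrt(x[:, 0] ** 2 + x[:, 1] ** 2) - r)
        print(np.all(z[in_S] <= dr), np.all(z[~in_S] > dr))           # True True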
