SLIDE 1

Introduction to Machine Learning

COMPSCI 371D — Machine Learning

SLIDE 2

Outline

1. Classification, Regression, Unsupervised Learning
2. About Dimensionality
3. Drawings and Intuition in Higher Dimensions
4. Classification through Regression
5. Linear Separability

SLIDE 3

About Slides

  • By popular demand, lecture slides will be made available online
  • They will show up just before a lecture starts
  • Slides are grouped by topic, not by lecture
  • Slides are not for studying
  • Class notes and homework assignments are the materials of record

SLIDE 4

Classification, Regression, Unsupervised Learning

Parenthesis: Supervised vs Unsupervised

  • Supervised: Train with (x, y)
  • Classification: hand-written digit recognition
  • Regression: median age of YouTube viewers for each video
  • Unsupervised: Train with x
  • Clustering: color compression
  • Distances matter!
  • We will not cover unsupervised learning

SLIDE 5

Classification, Regression, Unsupervised Learning

Machine Learning Terminology

  • Predictor h : X → Y (the signature of h)
  • X ⊆ R^d is the data space
  • Y (categorical) is the label space for a classifier
  • Y (⊆ R^e) is the value space for a regressor
  • A target is either a label or a value
  • H is the hypothesis space (all h we can choose from)
  • A training set is a subset T of X × Y
  • T := 2^(X×Y) is the class of all possible training sets
  • Learner λ : T → H, so that λ(T) = h
  • ℓ(y, ŷ) is the loss incurred for estimating ŷ when the true target is y
  • L_T(h) = (1/N) Σ_{n=1}^{N} ℓ(y_n, h(x_n)) is the empirical risk of h on T (sketched below)
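To make the last definition concrete, here is a minimal Python sketch (mine, not from the slides) of the empirical risk L_T(h): the average loss of a predictor h over a training set T. The helper names, the toy predictor, and the data are all made up for illustration.

```python
# A minimal sketch of the empirical risk L_T(h) = (1/N) * sum_n loss(y_n, h(x_n)).
def empirical_risk(h, T, loss):
    return sum(loss(y, h(x)) for x, y in T) / len(T)

# Example with the 0-1 loss and a toy classifier on hypothetical data.
zero_one = lambda y, y_hat: 0.0 if y == y_hat else 1.0
h = lambda x: 's' if x[0] > 0 else 'c'
T = [((0.5, 1.0), 's'), ((-0.2, 0.3), 'c'), ((0.1, -0.4), 'c')]
print(empirical_risk(h, T, zero_one))  # fraction of mistakes h makes on T
```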

SLIDE 6

About Dimensionality

H is Typically Parametric

  • For polynomials, h ↔ c
  • We write L_T(c) instead of L_T(h)
  • “Searching H” means “find the parameters”:
    ĉ ∈ arg min_{c ∈ R^m} ‖Ac − b‖²
  • This is common in machine learning: h(x) = h_θ(x), with θ a vector of parameters
  • Abstract view: ĥ ∈ arg min_{h ∈ H} L_T(h)
  • Concrete view: θ̂ ∈ arg min_{θ ∈ R^m} L_T(θ)
  • Minimize a function of real variables, rather than of “functions”
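A small numerical sketch of the concrete view for polynomial fitting (my own illustration, not from the slides): build the matrix A of monomials and solve the least-squares problem min_c ‖Ac − b‖². The data and degree below are made up.

```python
import numpy as np

# Hypothetical 1-D training data (x_n, y_n) and polynomial degree k.
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([1.0, 1.2, 2.1, 3.6, 5.9])
k = 2

# Columns of A are the monomials 1, x, ..., x^k evaluated at the samples.
A = np.vander(x, k + 1, increasing=True)

# Concrete view: c_hat in arg min_c ||A c - y||^2, solved by least squares.
c_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(c_hat)  # estimated polynomial coefficients
```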

SLIDE 7

About Dimensionality

Curb your Dimensions

  • For polynomials, h_c(x) : X → Y with x ∈ X ⊆ R^d and c ∈ R^m
  • We saw that d > 1 and degree k > 1 ⇒ m ≫ d
  • Specifically, m(d, k) = (d + k choose k)
  • Things blow up when k and d grow
  • More generally, h_θ(x) : X → Y with x ∈ X ⊆ R^d and θ ∈ R^m
  • Which dimension(s) do we want to curb? m? d?
  • Both, for different but related reasons
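To see the blow-up numerically, here is a small sketch (mine, not from the slides) that tabulates m(d, k) = (d + k choose k), the number of coefficients of a degree-k polynomial in d variables.

```python
from math import comb

# m(d, k) = C(d + k, k): dimension of the coefficient vector c for a
# polynomial of degree up to k in d variables.
def m(d, k):
    return comb(d + k, k)

for d in (1, 2, 10, 100):
    print(d, [m(d, k) for k in (1, 2, 3, 5)])
# e.g. m(100, 5) = 96,560,646: the parameter count grows very quickly with d and k.
```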

SLIDE 8

About Dimensionality

Problem with m Large

  • Even just for data fitting, we generally want N ≫ m, i.e., (possibly many) more samples than parameters to estimate
  • For instance, in Ac = b, we want A to have more rows than columns
  • Remember that annotating training data is costly
  • So we want to curb m: we want a small H

SLIDE 9

About Dimensionality

Problems with d Large

  • We do machine learning, not just data fitting!
  • We want h to generalize to new data
  • During training, we would like the learner to see a good sampling of all possible x (“fill X nicely”)
  • With large d, this is impossible: the curse of dimensionality (see the sketch below)
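A back-of-the-envelope sketch of this point (my own illustration, not from the slides): even a coarse grid with only 10 samples per axis needs 10^d points to fill [0, 1]^d.

```python
# Samples needed to cover [0, 1]^d with a coarse grid of 10 points per axis:
for d in (1, 2, 3, 10, 20):
    print(d, 10 ** d)
# d = 10 already needs ten billion samples; realistic d makes filling X hopeless.
```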

SLIDE 10

Drawings and Intuition in Higher Dimensions

Drawings Help Intuition

SLIDE 11

Drawings and Intuition in Higher Dimensions

Intuition Often Fails in Many Dimensions

[Figure: a square/cube of side 1 with a slightly smaller concentric one (side labels 1 and 1 − ε/2); the gray margin lies between them]

  • Gray parts dominate when d → ∞
  • Distance from center to corners diverges when d → ∞
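A quick numerical check of both bullets (my own sketch, not from the slides): the fraction of the unit cube's volume in the thin "gray" margin near its boundary tends to 1, and the distance from the center of [0, 1]^d to a corner, √d / 2, diverges.

```python
import math

eps = 0.1  # margin thickness: the inner cube has side 1 - eps
for d in (1, 2, 10, 100, 1000):
    gray_fraction = 1 - (1 - eps) ** d   # volume of the unit cube outside the inner cube
    corner_dist = math.sqrt(d) / 2       # distance from the center of [0, 1]^d to a corner
    print(d, round(gray_fraction, 4), round(corner_dist, 2))
# The gray margin eventually holds almost all the volume, and corner_dist grows without bound.
```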

SLIDE 12

Classification through Regression

Classifiers as Partitions of X

X_y := h⁻¹(y) partitions X (not just T!)

  • Classifier = partition
  • S = h⁻¹(red square), C = h⁻¹(blue circle)
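A minimal illustration of the partition view (mine, not from the slides): group points of X by the label a toy classifier assigns, approximating the cells X_y = h⁻¹(y). The classifier and grid below are made up.

```python
from collections import defaultdict

# Made-up classifier on X ⊆ R^2: 's' inside the unit disk, 'c' outside it.
def h(x):
    return 's' if x[0] ** 2 + x[1] ** 2 <= 1.0 else 'c'

# Sweep a coarse grid over X and collect the (approximate) cells X_y = h^{-1}(y).
grid = [(i / 2 - 2, j / 2 - 2) for i in range(9) for j in range(9)]
cells = defaultdict(list)
for x in grid:
    cells[h(x)].append(x)

print({label: len(points) for label, points in cells.items()})  # the cells cover X and do not overlap
```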

SLIDE 13

Classification through Regression

Classification, Geometry, and Regression

  • Classification partitions X ⊂ R^d into sets
  • How do we represent sets ⊂ R^d? How do we work with them?
  • We’ll see a couple of ways: nearest-neighbor classifier, decision trees
  • These methods have a strong geometric flavor
  • Beware of our intuition!
  • Another technique: score-based classifiers, i.e., classification through regression

SLIDE 14

Classification through Regression

Score-Based Classifiers

[Figure: a score function s over X; the curve s = 0 marks the decision boundary, with regions s > 0 and s < 0 on either side]

[Figure adapted from Wei et al., Structural and Multidisciplinary Optimization, 58:831–849, 2018]

  • s = 0 defines the decision boundaries
  • s > 0 and s < 0 define the (two) decision regions

SLIDE 15

Classification through Regression

Score-Based Classifiers

  • Threshold some score function s(x):
  • Example: 's' (red squares) and 'c' (blue circles)
  • These correspond to two sets S ⊆ X and C = X \ S
  • If we can estimate something like s(x) = P[x ∈ S], then
    h(x) = 's' if s(x) > 1/2, 'c' otherwise
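A minimal sketch of this rule (mine, not from the slides): threshold a probability-like score at 1/2. The score function below is a made-up stand-in for an estimate of P[x ∈ S] that a regressor would normally learn from T.

```python
# Made-up estimate of s(x) = P[x in S] for x in R^2 (a real one would be learned).
def s(x):
    return 1.0 / (1.0 + x[0] ** 2 + x[1] ** 2)

# Classification through regression: threshold the estimated probability at 1/2.
def h(x):
    return 's' if s(x) > 0.5 else 'c'

print(h((0.2, 0.3)), h((2.0, 1.0)))  # 's' near the origin, 'c' far from it
```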

SLIDE 16

Classification through Regression

Classification through Regression

  • If you prefer 0 as a threshold, let
    s(x) = 2 P[x ∈ S] − 1 ∈ [−1, 1], and
    h(x) = 's' if s(x) > 0, 'c' otherwise
  • Scores are convenient even without probabilities, because they are easy to work with
  • We implement a classifier h by building a regressor s
  • Example: logistic-regression classifiers (see the sketch below)
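As a concrete instance, here is a hedged sketch (not from the slides) using scikit-learn's LogisticRegression on made-up 2-D data: the fitted model's decision_function plays the role of the score s(x), and predict is equivalent to thresholding that score at 0.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up training set: label 's' near the origin, 'c' farther out.
X = np.array([[0.1, 0.2], [0.3, -0.1], [-0.2, 0.2],
              [2.0, 1.5], [1.8, -2.1], [-2.2, 1.9]])
y = np.array(['s', 's', 's', 'c', 'c', 'c'])

clf = LogisticRegression().fit(X, y)

# The classifier is a thresholded regressor: decision_function gives the score,
# and predict() returns the class on the positive side of s(x) = 0.
x_new = np.array([[0.0, 0.1], [2.5, 2.5]])
print(clf.decision_function(x_new))  # scores s(x); the decision boundary is s(x) = 0
print(clf.predict(x_new))            # equivalent to thresholding the scores at 0
```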

SLIDE 17

Linear Separability

Linearly Separable Training Sets

  • Some line (hyperplane in R^d) separates C, S
  • Requires much smaller H
  • Simplest score: s(x) = b + wᵀx. The line is s(x) = 0, and
    h(x) = 's' if s(x) > 0, 'c' otherwise
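A minimal sketch of the linear score (mine, not from the slides), with hypothetical parameters w and b; in practice these would be learned. Checking the sign of s on both classes also tells us whether this particular hyperplane separates them.

```python
import numpy as np

# Hypothetical linear-score parameters (normally learned from the training set).
w = np.array([1.0, -0.5])
b = -0.2

def s(x):
    return b + w @ x          # s(x) = b + w^T x

def h(x):
    return 's' if s(x) > 0 else 'c'

# Toy points from the two classes; this hyperplane separates them iff every
# S-point gets a positive score and every C-point a non-positive one.
S_pts = [np.array([1.0, 0.0]), np.array([2.0, 1.0])]
C_pts = [np.array([-1.0, 0.5]), np.array([0.0, 2.0])]
print([h(x) for x in S_pts], [h(x) for x in C_pts])
```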

SLIDE 18

Linear Separability

Data Representation?

  • Linear separability is a property of the data in a given representation

[Figure: a ring-shaped class S of radius r and half-width Δr centered at the origin]

  • Xform 1: z = x₁² + x₂² implies x ∈ S ⇔ a ≤ z ≤ b
  • Xform 2: z = |√(x₁² + x₂²) − r| implies linear separability:
    x ∈ S ⇔ z ≤ Δr
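A small sketch of the two representations (my own, not from the slides) on hypothetical ring data: under Xform 1 the class S maps to an interval [a, b], while under Xform 2 it maps to z ≤ Δr, a single threshold, hence linearly separable.

```python
import math, random

r, dr = 2.0, 0.3   # hypothetical ring: radius r, half-width Δr

def in_S(x):                                   # ground truth: x lies on the ring-shaped class S
    return abs(math.hypot(x[0], x[1]) - r) <= dr

pts = [(random.uniform(-3, 3), random.uniform(-3, 3)) for _ in range(2000)]
S = [p for p in pts if in_S(p)]
C = [p for p in pts if not in_S(p)]

z1 = lambda x: x[0] ** 2 + x[1] ** 2           # Xform 1: S maps to an interval [a, b]
z2 = lambda x: abs(math.hypot(x[0], x[1]) - r) # Xform 2: S maps to z <= Δr

print(min(map(z1, S)), max(map(z1, S)))        # roughly (r - Δr)^2 and (r + Δr)^2
print(max(map(z2, S)) <= dr < min(map(z2, C))) # True: one threshold separates the classes
```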
