(Mainly)
Linear Models
EECS 442 – David Fouhey Fall 2019, University of Michigan
http://web.eecs.umich.edu/~fouhey/teaching/EECS442_F19/
Next Few Classes: Machine Learning (ML) Crash Course
I can't cover everything; if you can, take a full course on it.
ML is useful but incredibly dangerous if misused.
Goal: if you see it later, you'll know what it is.
Useful book (free too!): The Elements of Statistical Learning, Hastie, Tibshirani, Friedman. https://web.stanford.edu/~hastie/ElemStatLearn/
Useful set of data: UCI ML Repository. https://archive.ics.uci.edu/ml/datasets.html
A lot of important and hard lessons summarized: https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf
Goal: convert an input x into a vector y = T(x) that's somehow better.
The function T is learned either from inputs and desired outputs (xi, yi), or just from a set of datapoints (xi),
by finding the T that minimizes or maximizes some objective function or goal.
Input x (feature vector / data point): vector representation of a datapoint; each "feature" represents some aspect of the data.
Output y (label / target): fixed-length vector of desired outputs; each dimension represents some aspect of the label.
Supervised: we are given y. Unsupervised: we are not, and make our own ys.
Input: x in RN (feature vector: Blood pressure, Heart rate, Glucose level, …), e.g. [50, 60, …, 0.2]
Model: f(Wx)
Output: y = [P(Has Diabetes), P(No Diabetes)]
Intuitive objective function: Want correct category to be likely with our model.
Input: x in RN (feature vector: Blood pressure, Heart rate, Glucose level, …), e.g. [50, 60, …, 0.2]
Model: Wx
Output: y = Age
Intuitive objective function: Want our prediction of age to be "close" to true age.
Input: x in RN (feature vector: Blood pressure, Heart rate, Glucose level, …), e.g. [50, 60, …, 0.2]
Model: f(x)
Output: discrete y (unsupervised) = 0/1 membership in each of User group 1, User group 2, …, User group K
Intuitive objective function: Want to find K groups that explain the data we see.
Input: x in RN (feature vector: Blood pressure, Heart rate, Glucose level, …), e.g. [50, 60, …, 0.2]
Model: Wx
Output: continuous y (discovered) = [0.2, 1.3, …, 0.7] over User dimension 1, User dimension 2, …, User dimension K
Intuitive objective function: Want to find K dimensions (often two) that are easier to understand but capture the variance of the data.
Input: x in RN (feature vector: Bought before, Amount, Near billing address, …), e.g. [1, $12, …]
Model: f(Wx)
Output: y = [P(Fraud), P(No Fraud)]
Intuitive objective function: Want correct category to be likely with our model.
Input: x in RN (feature vector: Pixel at (0,0), Pixel at (0,1), …, Pixel at (H-1,W-1))
Model: f(Wx)
Output: y = [P(Cat), P(Dog), P(Bird)]
Intuitive objective function: Want correct category to be likely with our model.
Input: x in RN (feature vector: Count of visual cluster 1, Count of visual cluster 2, …, Count of visual cluster K)
Model: f(Wx)
Output: y = [P(Cat), P(Dog), P(Bird)]
Intuitive objective function: Want correct category to be likely with our model.
Input: x in RN (feature vector: f1(Image), f2(Image), …, fN(Image))
Model: f(Wx)
Output: y = [P(Cat), P(Dog), P(Bird)]
Intuitive objective function: Want correct category to be likely with our model.
The common step: convert the image into a fixed-length feature vector. There are well-designed ways for doing this.
Image credit: Wikipedia
Slide adapted from J. Hays
Supervised (Data+Labels) vs. Unsupervised (Just Data); Discrete Output vs. Continuous Output.
Supervised + Discrete Output: Classification/Categorization.
Categorization/Classification: binning into K mutually-exclusive categories, e.g. output [P(Cat), P(Dog), …, P(Bird)] = [0.9, 0.1, …, 0.0].
Image credit: Wikipedia
Slide adapted from J. Hays
Supervised + Continuous Output: Regression.
Regression: estimating continuous variable(s), e.g. Cat weight = 3.6 kg.
Image credit: Wikipedia
Slide adapted from J. Hays
Unsupervised + Discrete Output: Clustering.
Clustering: given a set of cats, automatically discover clusters or categories.
Image credit: Wikipedia, cattime.com
Slide adapted from J. Hays
Unsupervised + Continuous Output: Dimensionality Reduction.
Dimensionality Reduction: find dimensions that best explain the whole image/input, e.g. cat size in image, location of cat in image.
Image credit: Wikipedia
For ordinary images, this is currently a totally hopeless task. For certain images (e.g., faces), this works reasonably well.
Let's make the world's worst weather model. Data: (x1,y1), (x2,y2), …, (xn,yn). Model: (m,b): y_i = m*x_i + b, or (w): y_i = w^T x_i. Objective function: Σ_i (y_i − w^T x_i)^2
Given latitude (distance above equator), predict temperature by fitting a line
City:         Mexico City | Austin, TX | Ann Arbor | Washington, DC | Panama City
Temp (F):     67 | 62 | 33 | 38 | 83
Latitude (°): 19 | 30 | 42 | 39 | 9
[plot: Temp (y-axis) vs. Latitude (x-axis) for the five cities]
Σ_{i=1}^{n} (y_i − w^T x_i)^2 = ||y − Xw||_2^2

y = [y_1; … ; y_n]  (Output: Temperature)
X = [x_1 1; … ; x_n 1]  (Inputs: Latitude, 1)
w = [m; b]  (Model/Weights: Latitude, "Bias")
Σ_{i=1}^{n} (y_i − w^T x_i)^2 = ||y − Xw||_2^2

Output: Temperature; Inputs: Latitude, 1; Model/Weights: Latitude, "Bias"
w = [m; b]
X = [42 1; … ; 9 1]    y = [33; … ; 83]
Intuitively why do we add a one to the inputs?
Loss function/objective: evaluates correctness. Here: squared L2 norm / sum of squared errors. Training/Learning/Fitting: try to find the model that minimizes the loss.
argmin_w Σ_{i=1}^{n} (w^T x_i − y_i)^2  =  argmin_w ||y − Xw||_2^2
Training (xi,yi): optimal w* is  w* = (X^T X)^{-1} X^T y
Inference (x): predict  w^T x = w_1 x_1 + … + w_F x_F
Training (xi,yi):  argmin_w Σ_{i=1}^{n} (w^T x_i − y_i)^2  =  argmin_w ||y − Xw||_2^2
Testing/Inference: given a new input, what's the prediction?
Temp = w^T [Latitude; 1] = w_1*Latitude + w_2
Data:
City:         Mexico City | Austin, TX | Ann Arbor | Washington, DC | Panama City
Temp (F):     67 | 62 | 33 | 38 | 83
Latitude (°): 19 | 30 | 42 | 39 | 9

Model:
X_{5×2} = [42 1; 39 1; 30 1; 19 1; 9 1]    y_{5×1} = [33; 38; 62; 67; 83]
w_{2×1} = (X^T X)^{-1} X^T y = [−1.47; 97]
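A minimal NumPy sketch of this fit (same data as the table above; the script itself is illustrative, not from the slides):

```python
import numpy as np

# Rows ordered as in the matrix above: Ann Arbor, DC, Austin, Mexico City, Panama City
X = np.array([[42., 1.],
              [39., 1.],
              [30., 1.],
              [19., 1.],
              [9.,  1.]])                  # [latitude, constant 1]
y = np.array([33., 38., 62., 67., 83.])    # temperature (F)

# Closed-form least squares: w* = (X^T X)^{-1} X^T y
w = np.linalg.solve(X.T @ X, X.T @ y)
print(w)        # approximately [-1.47, 97.4]
print(X @ w)    # fitted temperatures for the five cities
```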
EECS 442 weather rule: Temp = -1.47*Latitude + 97
City:            Mexico City | Austin, TX | Ann Arbor | Washington, DC | Panama City
Temp (F):        67 | 62 | 33 | 38 | 83
Latitude (°):    19 | 30 | 42 | 39 | 9
Predicted Temp:  69.1 | 52.9 | 35.3 | 39.7 | 83.8
Error:           2.1 | 9.1 | 2.3 | 1.7 | 0.8
Won't do so well in the Australian market…
Pittsburgh: Temp = -1.47*40 + 97 = 38
Berkeley: Temp = -1.47*38 + 97 = 41
Sydney: Temp = -1.47*(-33) + 97 = 146
Actual Pittsburgh: 45 Actual Berkeley: 53 Actual Sydney: 74
Model + Data:
City:         Ann Arbor | Washington, DC
Temp (F):     33 | 38
Latitude (°): 42 | 39
How well can we predict Ann Arbor and DC, and why?
Temp =
Sydney: Temp = -1.47*-33 + 97 = 146
Model may only work under some conditions (e.g., trained on the northern hemisphere). Model might fit the data too precisely ("overfitting"). Remember: when #datapoints = #params, you can get a perfect fit.
“It’s tough to make predictions, especially about the future”
Nearly any model can predict data it's seen. If your model can't accurately interpret "unseen" data, it's probably not useful.
Training / Test split: fit model parameters on the training set; evaluate on an entirely unseen test set.
If one feature does ok, what about more features!?
City:               Mexico City | Austin, TX | Ann Arbor | Washington, DC | Panama City
Temp (F):           67 | 62 | 33 | 38 | 83
Latitude (deg):     19 | 30 | 42 | 39 | 9
Avg July High (F):  74 | 95 | 83 | 88 | 93
Avg Snowfall:       n/a | 0.6 | 58 | 15 | n/a
X_{5×4}, y_{5×1}: 3 features + a constant-1 feature
All the math works out! In general this is called linear regression.
New EECS 442 Weather Rule: Temp = w1*latitude + w2*(avg July high) + w3*(avg snowfall) + w4*1
w* = (X^T X)^{-1} X^T y,  with Model w_{4×1} and Data X_{5×4}, y_{5×1}
If one feature does ok, what about LOTS of features!?
City:               Mexico City | Austin, TX | Ann Arbor | Washington, DC | Panama City
Temp (F):           67 | 62 | 33 | 38 | 83
Latitude (deg):     19 | 30 | 42 | 39 | 9
% Letter M:         4 | 2 | 100 | 3 | 1
Elevation (ft):     7200 | 489 | 840 | 409 | 7
Avg July High (F):  74 | 95 | 83 | 88 | 93
Day of Year:        45 | 45 | 45 | 45 | 45
Avg Snowfall:       n/a | 0.6 | 58 | 15 | n/a
X_{5×7}, y_{5×1}: 6 features + a constant-1 feature
Data: X_{5×7}, y_{5×1}    Model: w_{7×1}
w* = (X^T X)^{-1} X^T y
X^T X is a 7x7 matrix but is rank deficient (rank 5) and has no inverse. There are an infinite number of solutions.
Exercise for the mathematically-inclined folks: derive what the space of solutions looks like.
Have to express some preference for which of the infinite solutions we want.
Add regularization to objective that prefers some solutions:
Before:  argmin_w ||y − Xw||_2^2   (Loss)
After:   argmin_w ||y − Xw||_2^2 + λ||w||_2^2   (Loss + Regularization)
Loss / Regularization trade-off. Want the model "smaller": pay a penalty for w with a big norm. Intuitive objective: accurate model (low loss) but not too complex (low regularization). λ controls how much of each.
Take ∂/∂w, set to 0, solve:
w* = (X^T X + λI)^{-1} X^T y
X^T X + λI is full-rank (and thus invertible) for λ > 0.
Called lots of things: regularized least-squares, Tikhonov regularization (after Andrey Tikhonov), ridge regression, Bayesian linear regression with a multivariate normal prior.
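A minimal NumPy sketch of the regularized solution (the helper name ridge_fit and the random 5×7 example are illustrative assumptions, not from the slides):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Regularized least squares: w* = (X^T X + lambda*I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Even with more columns than rows (so X^T X is rank deficient),
# X^T X + lambda*I is full rank for lambda > 0 and the solve succeeds.
X = np.random.randn(5, 7)   # e.g. 5 cities, 7 columns of features
y = np.random.randn(5)
w = ridge_fit(X, y, lam=0.1)
```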
Objective:  argmin_w ||y − Xw||_2^2 + λ||w||_2^2   (Loss + Regularization trade-off)
Objective:  argmin_w ||y − Xw||_2^2 + λ||w||_2^2
What happens (and why) if:
λ → 0: ordinary least-squares.
λ → ∞: w = 0.
λ set to something sensible: ?
Training / Test: fit model parameters on the training set; evaluate on an entirely unseen test set.
Training / Validation / Test: fit model parameters on the training set; find hyperparameters by testing on the validation set; evaluate on an entirely unseen test set.
Use the training points to fit w* = (X^T X + λI)^{-1} X^T y; evaluate on the validation points for different λ and pick the best.
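A sketch of that recipe (the split variables, the candidate grid, and the ridge_fit helper are illustrative assumptions):

```python
import numpy as np

def ridge_fit(X, y, lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def pick_lambda(X_train, y_train, X_val, y_val,
                candidates=(0.01, 0.1, 1.0, 10.0, 100.0)):
    """Fit on the training split for each lambda; keep the lambda with the
    lowest squared error on the validation split."""
    best_lam, best_err = None, np.inf
    for lam in candidates:
        w = ridge_fit(X_train, y_train, lam)
        err = np.sum((y_val - X_val @ w) ** 2)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam
```

The test split stays untouched until the very end.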
Start with simplest example: binary classification
Cat or not cat?
Actually: a feature vector representing the image
x = [x1, x2, …, xN]
Rifkin, Yeo, Poggio. Regularized Least Squares Classification (http://cbcl.mit.edu/publications/ps/rlsc.pdf). 2003 Redmon, Divvala, Girshick, Farhadi. You Only Look Once: Unified, Real-Time Object Detection. CVPR 2016.
Treat as regression: xi is image feature; yi is 1 if it’s a cat, 0 if it’s not a cat. Minimize least-squares loss.
Training (xi,yi):  argmin_w Σ_{i=1}^{n} (w^T x_i − y_i)^2
Inference (x):  predict "cat" if w^T x > t (for some threshold t)
Unprincipled in theory, but often effective in practice The reverse (regression via discrete bins) is also common
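A minimal sketch of "classification as regression" under these assumptions (random placeholder features, the ridge solver from earlier, and 0.5 as an illustrative choice for the threshold t):

```python
import numpy as np

def ridge_fit(X, y, lam=0.1):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# X: one image feature vector per row; y: 1.0 if cat, 0.0 if not cat
X = np.random.randn(100, 20)            # placeholder features
y = (np.random.rand(100) > 0.5).astype(float)

w = ridge_fit(X, y)                     # train exactly like regression
t = 0.5                                 # decision threshold (illustrative)
is_cat = (X @ w) > t                    # inference: w^T x > t
```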
Just memorize (as in a Python dictionary) Consider cat/dog/hippo classification.
If this: cat. If this: dog. If this: hippo.
If the test image is exactly the same as a memorized one, the rule applies: if this, then cat.
Where does this go wrong?
Known images x_1, …, x_N with labels (Cat, Dog, …)
Test image x_T: compute distances D(x_1, x_T), …, D(x_N, x_T)
(1) Compute distance between feature vectors, (2) find the nearest, (3) use its label.
Nearest known image is a Cat → Cat!
“Algorithm”
Training (xi,yi):
Memorize training set
Inference (x):
bestDist, prediction = Inf, None
for i in range(N):
    if dist(xi, x) < bestDist:
        bestDist = dist(xi, x)
        prediction = yi
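A runnable version of the same idea, plus the K-neighbor vote used a couple of slides later (Euclidean distance and integer labels are assumptions; the slide leaves the distance unspecified):

```python
import numpy as np

def nn_predict(X_train, y_train, x):
    """1-nearest-neighbor: label of the single closest training point."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance
    return y_train[np.argmin(dists)]

def knn_predict(X_train, y_train, x, k=3):
    """k-nearest-neighbor: majority vote among the k closest points."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = y_train[np.argsort(dists)[:k]]
    return np.bincount(nearest).argmax()          # assumes integer labels
```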
Diagram Credit: Wikipedia
2D datapoints (colors = labels) → 2D predictions (colors = labels)
Take the top K closest points and vote.
Diagram Credit: Wikipedia
What distance? What value for K?
Training / Validation / Test: use the training points for lookup; evaluate on the validation points for different K and distances.
With enough data, 1-NN is guaranteed to be at most 2x worse than the best possible classifier.
Example setup: 3 classes. Model: one weight vector per class.
w_0^T x: big if cat
w_1^T x: big if dog
w_2^T x: big if hippo
Stack together: W_{3×F}, where x is in R^F.
[figure: weight matrix W, one row per class (cat weight vector, dog weight vector, hippo weight vector), multiplied by a feature vector x_j = [56, 231, 24, 2, 1] gives the class scores W x_j (cat score, dog score, hippo score), e.g. 437.9 and 61.95]
Diagram by: Karpathy, Fei-Fei
The weight matrix is a collection of scoring functions, one per class. The prediction is a vector whose jth component is the "score" for the jth class.
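A tiny sketch of that prediction step (the weight values are placeholders; the feature vector is the one from the diagram, with a 1 appended for the bias):

```python
import numpy as np

W = np.random.randn(3, 5)                    # one row per class: cat, dog, hippo
x = np.array([56., 231., 24., 2., 1.])       # image features + constant 1

scores = W @ x                               # jth entry = score for class j
predicted_class = int(np.argmax(scores))     # 0 = cat, 1 = dog, 2 = hippo
```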
What does a linear classifier look like* in 2D?
Diagram credit: Karpathy & Fei-Fei. 12-point font mini-rant: me. *2D is good for vague intuitions, but ML typically deals with at least dozens if not thousands of dimensions. Your intuitions about space and geometry from living in 3D are completely wrong in high dimensions. Never trust people who show you 2D diagrams and write "Intuition" in the slide title. See: On the Surprising Behavior of Distance Metrics in High Dimensional Space. Aggarwal, Hinneburg, Keim. ICDT 2001.
Slide credit: Karpathy & Fei-Fei
CIFAR-10: 32x32x3 images, 10 classes. Each image is turned into a feature vector by unrolling all pixels, and we can inspect the learned models (weights).
Decision rule is wTx. If wi is big, then big values of xi are indicative of the class.
Diagram credit: Karpathy & Fei-Fei
Inference (x):  argmax_k (Wx)_k
(Take the class whose weight vector gives the highest score)
Training (xi,yi):
argmin_W  λ||W||_2^2 + Σ_i Σ_{j ≠ y_i} max(0, (W x_i)_j − (W x_i)_{y_i} + m)
Regularization; summed over all data points; for every class j that's NOT the correct one (y_i): pay no penalty if the prediction for class y_i is bigger than j's by m (the "margin"); otherwise, pay proportional to the score of the wrong class.
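A direct NumPy transcription of this objective (a sketch; the function name and the default λ and margin values are illustrative):

```python
import numpy as np

def multiclass_svm_loss(W, X, y, lam=1e-3, m=1.0):
    """lambda*||W||^2 + sum_i sum_{j != y_i} max(0, (W x_i)_j - (W x_i)_{y_i} + m)."""
    loss = lam * np.sum(W ** 2)
    for x_i, y_i in zip(X, y):
        scores = W @ x_i
        margins = np.maximum(0.0, scores - scores[y_i] + m)
        margins[y_i] = 0.0          # the correct class pays no penalty
        loss += np.sum(margins)
    return loss
```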
Inference (x):  argmax_k (Wx)_k
(Take the class whose weight vector gives the highest score)
Training (xi,yi):
argmin_W  λ||W||_2^2 + Σ_i Σ_{j ≠ y_i} max(0, (W x_i)_j − (W x_i)_{y_i} + m)
How on earth do we optimize this? Hold that thought!
Converting scores to a "probability distribution":
Scores (cat, dog, hippo): [−0.9, 0.4, 0.6]
exp(x): [e^−0.9, e^0.4, e^0.6] = [0.41, 1.49, 1.82], Σ = 3.72
Normalize: [0.11, 0.40, 0.49] = [P(cat), P(dog), P(hippo)]
Generally, P(class j) = exp((Wx)_j) / Σ_k exp((Wx)_k)
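The same computation as the worked example above, in NumPy (a sketch; the numbers match the slide up to rounding):

```python
import numpy as np

scores = np.array([-0.9, 0.4, 0.6])       # cat, dog, hippo scores
exp_scores = np.exp(scores)               # [0.41, 1.49, 1.82]
probs = exp_scores / exp_scores.sum()     # sum is 3.72 -> [0.11, 0.40, 0.49]
```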
Inference (x):  argmax_k (Wx)_k
(Take the class whose weight vector gives the highest score)
P(class j) = exp((Wx)_j) / Σ_k exp((Wx)_k)
Why can we skip the exp / sum-exp thing to make a decision?
Inference (x):  argmax_k (Wx)_k
(Take the class whose weight vector gives the highest score)
Training (xi,yi):
argmin_W  λ||W||_2^2 + Σ_i −log( exp((W x_i)_{y_i}) / Σ_k exp((W x_i)_k) )
Regularization; over all data points; pay a penalty for the negative log-likelihood of the correct class.
P(correct) = 1: no penalty! P(correct) = 0.9: 0.11 penalty. P(correct) = 0.5: 0.69 penalty. P(correct) = 0.05: 3.0 penalty.
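A quick check of those penalty values (illustrative; the penalty is just the negative log of the probability the model assigns to the correct class):

```python
import numpy as np

for p in [1.0, 0.9, 0.5, 0.05]:
    print(p, -np.log(p))    # 0.0, 0.105, 0.693, 3.0 (to two decimals)
```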