
Lecture 1: Linear Regression Princeton University COS 495 - PowerPoint PPT Presentation



  1. Machine Learning Basics Lecture 1: Linear Regression Princeton University COS 495 Instructor: Yingyu Liang

  2. Machine learning basics

  3. What is machine learning? • “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” — Machine Learning, Tom Mitchell, 1997

  4. Example 1: image classification Task: determine if the image is indoor or outdoor Performance measure: probability of misclassification

  5. Example 1: image classification • Experience/Data: images with labels (indoor or outdoor)

  6. Example 1: image classification • A few terminologies • Training data: the images given for learning • Test data: the images to be classified • Binary classification: classify into two classes

  7. Example 1: image classification (multi-class) • ImageNet figure borrowed from vision.stanford.edu

  8. Example 2: clustering images Task: partition the images into 2 groups Performance: similarities within groups Data: a set of images

  9. Example 2: clustering images • A few terminologies • Unlabeled data vs labeled data • Supervised learning vs unsupervised learning

  10. Math formulation • Extract features (color histogram: red, green, blue) to form the feature vector x_i • Label y_i: indoor → 0

  11. Math formulation • Extract features (color histogram: red, green, blue) to form the feature vector x_i • Label y_i: outdoor → 1

  12. Math formulation • Given training data {(x_i, y_i): 1 ≤ i ≤ n} • Find y = f(x) using training data • s.t. f is correct on test data • What kind of functions?

  13. Math formulation • Given training data {(x_i, y_i): 1 ≤ i ≤ n} • Find y = f(x) ∈ 𝓗 using training data • s.t. f is correct on test data • Hypothesis class 𝓗

  14. Math formulation • Given training data {(x_i, y_i): 1 ≤ i ≤ n} • Find y = f(x) ∈ 𝓗 using training data • s.t. f is correct on test data • Connection between training data and test data?

  15. Math formulation • Given training data {(x_i, y_i): 1 ≤ i ≤ n} i.i.d. from distribution D • Find y = f(x) ∈ 𝓗 using training data • s.t. f is correct on test data, also i.i.d. from distribution D • Training and test data have the same distribution • i.i.d.: independently and identically distributed

  16. Math formulation • Given training data {(x_i, y_i): 1 ≤ i ≤ n} i.i.d. from distribution D • Find y = f(x) ∈ 𝓗 using training data • s.t. f is correct on test data, also i.i.d. from distribution D • What kind of performance measure?

  17. Math formulation • Given training data {(x_i, y_i): 1 ≤ i ≤ n} i.i.d. from distribution D • Find y = f(x) ∈ 𝓗 using training data • s.t. the expected loss is small: L(f) = 𝔼_{(x,y)∼D}[l(f, x, y)] • Various loss functions

  18. Math formulation • Given training data {(x_i, y_i): 1 ≤ i ≤ n} i.i.d. from distribution D • Find y = f(x) ∈ 𝓗 using training data • s.t. the expected loss is small: L(f) = 𝔼_{(x,y)∼D}[l(f, x, y)] • Examples of loss functions: • 0-1 loss: l(f, x, y) = 𝕀[f(x) ≠ y] and L(f) = Pr[f(x) ≠ y] • l_2 loss: l(f, x, y) = (f(x) − y)^2 and L(f) = 𝔼(f(x) − y)^2
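These two loss functions can be sketched in a few lines of Python (the function names and the toy threshold predictor are my own, for illustration only):

```python
import numpy as np

def zero_one_loss(f, x, y):
    """0-1 loss: 1 if the prediction is wrong, 0 otherwise."""
    return float(f(x) != y)

def l2_loss(f, x, y):
    """Squared (l_2) loss: (f(x) - y)^2."""
    return (f(x) - y) ** 2

# Toy binary classifier: threshold the first feature.
f = lambda x: int(x[0] > 0.5)

print(zero_one_loss(f, np.array([0.9]), 1))  # correct prediction -> 0.0
print(l2_loss(f, np.array([0.9]), 0))        # wrong prediction -> 1
```

Note that for a 0/1-valued classifier the two losses coincide on each example; they differ once f(x) can take real values.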

  19. Math formulation • Given training data {(x_i, y_i): 1 ≤ i ≤ n} i.i.d. from distribution D • Find y = f(x) ∈ 𝓗 using training data • s.t. the expected loss is small: L(f) = 𝔼_{(x,y)∼D}[l(f, x, y)] • How to use it?

  20. Math formulation • Given training data {(x_i, y_i): 1 ≤ i ≤ n} i.i.d. from distribution D • Find y = f(x) ∈ 𝓗 that minimizes the empirical loss L̂(f) = (1/n) Σ_{i=1}^{n} l(f, x_i, y_i) • s.t. the expected loss L(f) = 𝔼_{(x,y)∼D}[l(f, x, y)] is small
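The empirical loss is just the average per-example loss over the training set. A minimal sketch (function names and the toy predictor are mine):

```python
import numpy as np

def empirical_loss(f, loss, xs, ys):
    """L_hat(f) = (1/n) * sum_i loss(f, x_i, y_i)."""
    return sum(loss(f, x, y) for x, y in zip(xs, ys)) / len(xs)

# l_2 loss and a toy linear predictor f(x) = 2x.
l2 = lambda f, x, y: (f(x) - y) ** 2
f = lambda x: 2 * x

xs = np.array([1.0, 2.0, 3.0])
ys = np.array([2.0, 4.0, 7.0])
# Only the third example errs: ((6 - 7)^2) / 3 = 1/3
print(empirical_loss(f, l2, xs, ys))
```

Minimizing this average over f ∈ 𝓗, rather than the unobservable expected loss, is exactly the "optimization" step of the recipe on the next slide.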

  21. Machine learning 1-2-3 • Collect data and extract features • Build model: choose hypothesis class 𝓗 and loss function l • Optimization: minimize the empirical loss

  22. Wait… • Why handcraft the feature vector x? • Can use prior knowledge to design suitable features • Can computers learn features from the raw images? • Learning features directly from raw images: Representation Learning • Deep Learning ⊆ Representation Learning ⊆ Machine Learning ⊆ Artificial Intelligence

  23. Wait… • Does Machine-Learning-1-2-3 include all approaches? • It includes many, but not all • Our current focus is Machine-Learning-1-2-3

  24. Example: Stock Market Prediction • Stock market (disclaimer: synthetic data / in another parallel universe) • Orange, MacroHard, Ackermann; 2013–2016 • A sliding window over time serves as the input x; such data are non-i.i.d.

  25. Linear regression

  26. Real data: prostate cancer, by Stamey et al. (1989) • y: prostate-specific antigen • (x_1, …, x_8): clinical measures • Figure borrowed from The Elements of Statistical Learning

  27. Linear regression • Given training data {(x_i, y_i): 1 ≤ i ≤ n} i.i.d. from distribution D • Find f_w(x) = w^T x that minimizes L̂(f_w) = (1/n) Σ_{i=1}^{n} (w^T x_i − y_i)^2 • l_2 loss; also called mean square error • Hypothesis class 𝓗: linear functions

  28. Linear regression: optimization • Given training data {(x_i, y_i): 1 ≤ i ≤ n} i.i.d. from distribution D • Find f_w(x) = w^T x that minimizes L̂(f_w) = (1/n) Σ_{i=1}^{n} (w^T x_i − y_i)^2 • Let X be the matrix whose i-th row is x_i^T, and y the vector (y_1, …, y_n)^T • Then L̂(f_w) = (1/n) Σ_{i=1}^{n} (w^T x_i − y_i)^2 = (1/n) ‖Xw − y‖_2^2
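The equivalence between the per-example sum and the matrix-norm form can be sanity-checked numerically with NumPy (random data, any w):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3
X = rng.standard_normal((n, d))   # i-th row is x_i^T
y = rng.standard_normal(n)
w = rng.standard_normal(d)

# (1/n) * sum_i (w^T x_i - y_i)^2, example by example
per_example = np.mean([(w @ X[i] - y[i]) ** 2 for i in range(n)])
# (1/n) * ||Xw - y||_2^2, in one matrix expression
matrix_form = np.linalg.norm(X @ w - y) ** 2 / n

print(np.isclose(per_example, matrix_form))  # True
```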

  29. Linear regression: optimization • Set the gradient to 0 to get the minimizer: • ∇_w L̂(f_w) = ∇_w (1/n) ‖Xw − y‖_2^2 = 0 • ∇_w [(Xw − y)^T (Xw − y)] = 0 • ∇_w [w^T X^T X w − 2 w^T X^T y + y^T y] = 0 • 2 X^T X w − 2 X^T y = 0 • w = (X^T X)^{−1} X^T y
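The closed-form minimizer can be checked on synthetic data; here targets are generated noiselessly from a known w, so the fit recovers it exactly (w_true is my own test value):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 3
X = rng.standard_normal((n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                      # noiseless targets

# w = (X^T X)^{-1} X^T y; numerically, solve the linear system
# rather than forming the explicit inverse.
w = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(w, w_true))  # True
```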

  30. Linear regression: optimization • Algebraic view of the minimizer • If X is invertible, just solve Xw = y and get w = X^{−1} y • But typically X is a tall matrix, so instead solve X^T X w = X^T y • Normal equation: w = (X^T X)^{−1} X^T y
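For a tall X, NumPy's dedicated least-squares solver gives the same minimizer as the normal equations while avoiding the explicit formation of X^T X (which squares the condition number); a quick comparison on random data:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 4))   # tall: more rows than columns
y = rng.standard_normal(100)

w_normal = np.linalg.solve(X.T @ X, X.T @ y)     # normal equations
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # SVD-based solver
print(np.allclose(w_normal, w_lstsq))  # True
```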

  31. Linear regression with bias • Given training data {(x_i, y_i): 1 ≤ i ≤ n} i.i.d. from distribution D • Find f_{w,b}(x) = w^T x + b (with bias term b) to minimize the loss • Reduce to the case without bias: • Let w′ = (w; b) and x′ = (x; 1) • Then f_{w,b}(x) = w^T x + b = (w′)^T x′
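The reduction is implemented by appending a column of ones to X; a small sketch with noiseless data (the true w and b are my own test values):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 80, 2
X = rng.standard_normal((n, d))
y = X @ np.array([1.5, -0.5]) + 2.0   # true bias b = 2.0

# x' = (x; 1) for every example, so the solution is w' = (w; b).
X1 = np.hstack([X, np.ones((n, 1))])
w_prime, *_ = np.linalg.lstsq(X1, y, rcond=None)
w, b = w_prime[:-1], w_prime[-1]
print(np.round(b, 6))  # recovers the bias, 2.0
```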
