Introduction to Machine Learning

  1. Introduction to Machine Learning. Yifeng Tao, School of Computer Science, Carnegie Mellon University

  2. Logistics
     o Course website: http://www.cs.cmu.edu/~yifengt/courses/machine-learning (slides uploaded after lecture)
     o Time: Mon-Fri, 9:50-11:30am lecture, 11:30am-12:00pm discussion
     o Contact: yifengt@cs.cmu.edu

  3. What is machine learning?
     o What are we talking about when we talk about AI and ML? (Venn-style diagram relating artificial intelligence, machine learning, and deep learning)

  4. What is machine learning?
     o (Diagram: machine learning builds on probability, statistics, calculus, and linear algebra, with applications in computer vision, natural language processing, and computational biology)

  5. Where are we?
     o Supervised learning: linear models
     o Kernel machines: SVMs and duality
     o Unsupervised learning: latent space analysis and clustering
     o Supervised learning: decision tree, kNN and model selection
     o Learning theory: generalization and VC dimension
     o Neural network (basics)
     o Deep learning in CV and NLP
     o Probabilistic graphical models
     o Reinforcement learning and its application in clinical text mining
     o Attention mechanism and transfer learning in precision medicine

  6. What's more after introduction?
     o (Diagram: machine learning, surrounded by the related areas of probabilistic graphical models, deep learning, learning theory, and optimization)

  7. What's more after introduction?
     o Supervised learning: linear models
     o Kernel machines: SVMs and duality
       → Optimization
     o Unsupervised learning: latent space analysis and clustering
     o Supervised learning: decision tree, kNN and model selection
     o Learning theory: generalization and VC dimension
       → Statistical machine learning
     o Neural network (basics)
     o Deep learning in CV and NLP
       → Deep learning
     o Probabilistic graphical models

  8. Curriculum for an ML Master/Ph.D. student at CMU
     o 10701 Introduction to Machine Learning: http://www.cs.cmu.edu/~epxing/Class/10701/
     o 36705 Intermediate Statistics: http://www.stat.cmu.edu/~larry/=stat705/
     o 36708 Statistical Machine Learning: http://www.stat.cmu.edu/~larry/=sml/
     o 10725 Convex Optimization: http://www.stat.cmu.edu/~ryantibs/convexopt/
     o 10708 Probabilistic Graphical Models: http://www.cs.cmu.edu/~epxing/Class/10708-17/
     o 10707 Deep Learning: https://deeplearning-cmu-10707.github.io/
     o Books: Bishop, Pattern Recognition and Machine Learning; Goodfellow et al., Deep Learning

  9. Introduction to Machine Learning: Neural network (basics). Yifeng Tao, School of Computer Science, Carnegie Mellon University. Slides adapted from Eric Xing, Maria-Florina Balcan, Russ Salakhutdinov, Matt Gormley

  10. A Recipe for Supervised Learning
     o 1. Given training data
     o 2. Choose each of these: a decision function and a loss function
     o 3. Define the goal and train with SGD (take small steps opposite the gradient); a sketch follows below
     [Slide from Matt Gormley et al.]
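A minimal Python sketch of this recipe, assuming a linear decision function, squared loss, and a fixed learning rate (all illustrative choices, not the slide's specific ones):

```python
import numpy as np

# Illustrative choices (not from the slide): linear decision function, squared loss.
def decision_function(w, x):
    return w @ x

def squared_loss(y_hat, y):
    return 0.5 * (y_hat - y) ** 2

def loss_gradient(w, x, y):
    # d/dw of 0.5 * (w.x - y)^2  =  (w.x - y) * x
    return (decision_function(w, x) - y) * x

def train_sgd(X, Y, lr=0.1, epochs=50):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, y in zip(X, Y):
            w -= lr * loss_gradient(w, x, y)   # small step opposite the gradient
    return w

# Toy usage
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Y = np.array([1.0, 0.0, 1.0])
w = train_sgd(X, Y)
print(w, np.mean([squared_loss(decision_function(w, x), y) for x, y in zip(X, Y)]))
```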

  11. Logistic Regression
     o The prediction rule (see the sketch below)
     o In this case, learning P(y|x) amounts to learning the conditional probability over two Gaussian distributions.
     o Limitation: it can only capture simple data distributions.
     [Slide from Eric Xing et al.]
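A minimal sketch of the prediction rule, assuming the standard sigmoid-of-linear-score form P(y=1|x) = sigmoid(w.x + b); the slide's exact formula is an image and is not reproduced in this transcript:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, b, x):
    # P(y = 1 | x) = sigmoid(w.x + b): the standard logistic regression form
    return sigmoid(w @ x + b)

def predict_label(w, b, x, threshold=0.5):
    # Prediction rule: output 1 when the conditional probability exceeds the threshold
    return int(predict_proba(w, b, x) >= threshold)

# Toy usage with hypothetical parameters
w, b = np.array([2.0, -1.0]), 0.5
x = np.array([1.0, 3.0])
print(predict_proba(w, b, x), predict_label(w, b, x))
```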

  12. Learning highly non-linear functions
     o f: X → y
     o f might be a non-linear function
     o X: continuous or discrete variables
     o y: continuous or discrete variables
     [Slide from Eric Xing et al.]

  13. From biological neuron networks to artificial neural networks
     o Signals propagate through neurons in the brain.
     o Signals propagate through perceptrons in an artificial neural network.
     [Slide from Eric Xing et al.]

  14. Perceptron Algorithm and SVM
     o Perceptron: a simple learning algorithm for supervised classification, analyzed via geometric margins in the 50's [Rosenblatt '57].
     o Like the SVM, it is a linear classifier based on an analysis of margins.
     o Originally introduced in the online learning scenario:
       - the online learning model
       - its guarantees under large margins
     [Slide from Maria-Florina Balcan et al.]

  15. The Online Learning Algorithm
     o Examples arrive sequentially.
     o We need to make a prediction.
     o Afterwards we observe the outcome.
     o For i = 1, 2, ...: (loop body on slide; see the sketch below)
     o Applications:
       - Email classification
       - Recommendation systems
       - Ad placement in a new market
     [Slide from Maria-Florina Balcan et al.]
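A minimal sketch of the online protocol, assuming a generic learner object with hypothetical predict/update methods (names chosen here for illustration):

```python
class OnlineLearner:
    """Hypothetical interface: any learner exposing predict(x) and update(x, y)."""
    def predict(self, x): ...
    def update(self, x, y): ...

def run_online(learner, stream):
    """The online protocol: commit to a prediction, observe the outcome, then adapt."""
    mistakes = 0
    for x, y in stream:                 # examples arrive sequentially
        y_hat = learner.predict(x)      # prediction is made before seeing y
        if y_hat != y:                  # afterwards the true outcome is observed
            mistakes += 1
        learner.update(x, y)            # learner may update after every example
    return mistakes
```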

  16. Linear Separators: Perceptron Algorithm
     o h(x) = w^T x + w_0; if h(x) ≥ 0, label x as +, otherwise label it as −.
     o Set t = 1 and start with the all-zero vector w_1.
     o Given example x, predict positive iff w_t^T x ≥ 0.
     o On a mistake, update as follows (see the sketch below):
       - Mistake on a positive example: w_{t+1} ← w_t + x
       - Mistake on a negative example: w_{t+1} ← w_t − x
     o Natural greedy procedure: if the true label of x is +1 and w_t is incorrect on x, then w_t^T x < 0 and w_{t+1}^T x = w_t^T x + x^T x = w_t^T x + ||x||^2, so w_{t+1} has a better chance of classifying x correctly. Similarly for mistakes on negative examples.
     [Slide from Maria-Florina Balcan et al.]
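A minimal NumPy sketch of the update rule above; folding the bias w_0 into the weight vector via an appended constant feature is an implementation choice made here, not something stated on the slide:

```python
import numpy as np

def perceptron(X, y, epochs=10):
    """Mistake-driven perceptron. X: (n, d) array; y: labels in {+1, -1}."""
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])    # append a constant feature so w_0 lives inside w
    w = np.zeros(d + 1)                     # start with the all-zero vector
    for _ in range(epochs):
        for x, label in zip(Xb, y):
            pred = 1 if w @ x >= 0 else -1  # predict positive iff w.x >= 0
            if pred != label:
                w += label * x              # +x on a positive mistake, -x on a negative one
    return w

# Toy usage: linearly separable points
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
print(perceptron(X, y))
```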

  17. Perceptron: Example and Guarantee
     o Example: (worked example on slide)
     o Guarantee: if the data has margin δ and all points lie inside a ball of radius R, then the Perceptron makes ≤ (R/δ)^2 mistakes.
     o Normalized margin: multiplying all points by 100, or dividing all points by 100, does not change the number of mistakes; the algorithm is invariant to scaling.
     [Slide from Maria-Florina Balcan et al.]

  18. Perceptron: Proof of Mistake Bound
     o Guarantee: if the data has margin δ and all points lie inside a ball of radius R, then the Perceptron makes ≤ (R/δ)^2 mistakes.
     o Proof idea: analyze w_t^T w* and ||w_t||, where w* is the max-margin separator with ||w*|| = 1.
     o Claim 1: w_{t+1}^T w* ≥ w_t^T w* + δ (because the mistaken example x has x^T w* ≥ δ in the direction of its true label).
     o Claim 2: ||w_{t+1}||^2 ≤ ||w_t||^2 + R^2 (by the Pythagorean Theorem).
     o After M mistakes:
       - w_{M+1}^T w* ≥ δM (by Claim 1)
       - ||w_{M+1}|| ≤ R√M (by Claim 2)
       - w_{M+1}^T w* ≤ ||w_{M+1}|| (since w* is unit length)
     o So δM ≤ R√M, and therefore M ≤ (R/δ)^2.
     [Slide from Maria-Florina Balcan et al.]
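The same derivation written compactly (w_t denotes the weights before the t-th mistake, w* the unit-length max-margin separator, R the radius, δ the margin, and M the number of mistakes):

```latex
% Compact form of the mistake-bound argument
\begin{align*}
&\text{Claim 1:}\quad w_{t+1}^\top w^* \;\ge\; w_t^\top w^* + \delta
  \qquad \text{(the mistaken example has margin at least $\delta$ under $w^*$)}\\
&\text{Claim 2:}\quad \|w_{t+1}\|^2 \;\le\; \|w_t\|^2 + R^2
  \qquad \text{(the update $\pm x$ has $\|x\|\le R$ and non-positive inner product with $w_t$)}\\
&\text{After $M$ mistakes:}\quad
  \delta M \;\le\; w_{M+1}^\top w^* \;\le\; \|w_{M+1}\| \;\le\; R\sqrt{M}
  \;\;\Longrightarrow\;\; M \le (R/\delta)^2 .
\end{align*}
```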

  19. Multilayer perceptron (MLP)
     o A simple and basic type of feedforward neural network
     o Contains many perceptrons that are organized into layers
     o MLP "perceptrons" are not perceptrons in the strict sense
     [Slide from Russ Salakhutdinov et al.]
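A minimal sketch of a forward pass through such layered units, assuming sigmoid activations and random weights (illustrative choices, not taken from the slide):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, layers):
    """Forward pass: each layer is a (W, b) pair; each unit computes a weighted sum + nonlinearity."""
    h = x
    for W, b in layers:
        h = sigmoid(W @ h + b)
    return h

# Toy usage: input dim 3 -> one hidden layer of 4 units -> scalar output
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(4, 3)), np.zeros(4)),
          (rng.normal(size=(1, 4)), np.zeros(1))]
print(mlp_forward(np.array([0.5, -1.0, 2.0]), layers))
```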

  20. Artificial Neuron (Perceptron) [Slide from Russ Salakhutdinov et al.]

  21. Artificial Neuron (Perceptron) [Slide from Russ Salakhutdinov et al.]

  22. Activation Function
     o Sigmoid activation function:
       - Squashes the neuron's output between 0 and 1
       - Always positive
       - Bounded
       - Strictly increasing
       - Used in the classification output layer
     o Tanh activation function:
       - Squashes the neuron's output between -1 and 1
       - Bounded
       - Strictly increasing
       - A linear transformation of the sigmoid function
     [Slide from Russ Salakhutdinov et al.]

  23. Activation Function
     o Rectified linear (ReLU) activation:
       - Bounded below by 0 (always non-negative)
       - Tends to produce units with sparse activities
       - Not bounded above
       - Non-decreasing (constant at 0 for negative inputs, increasing for positive inputs)
       - The most widely used activation function
     o Advantages:
       - Biological plausibility
       - Sparse activation
       - Better gradient propagation (sigmoidal activations suffer from vanishing gradients)
     o (Sigmoid, tanh, and ReLU are sketched in code below.)
     [Slide from Russ Salakhutdinov et al.]
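A quick sketch of the three activation functions discussed on these two slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes into (0, 1), always positive

def tanh(z):
    return np.tanh(z)                 # squashes into (-1, 1); tanh(z) = 2*sigmoid(2*z) - 1

def relu(z):
    return np.maximum(0.0, z)         # non-negative, unbounded above, gives sparse activations

z = np.linspace(-3.0, 3.0, 7)
print(sigmoid(z))
print(tanh(z))
print(relu(z))
```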

  24. Activation Function in AlexNet
     o A four-layer convolutional neural network
     o ReLU: solid line; tanh: dashed line
     [Slide from https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf]

  25. Single Hidden Layer MLP [Slide from Russ Salakhutdinov et al.]

  26. Capacity of MLP
     o Consider a single layer neural network
     [Slide from Russ Salakhutdinov et al.]

  27. Capacity of Neural Nets
     o Consider a single layer neural network
     [Slide from Russ Salakhutdinov et al.]

  28. MLP with Multiple Hidden Layers [Slide from Russ Salakhutdinov et al.]

  29. Capacity of Neural Nets
     o Deep learning playground [Slide from https://playground.tensorflow.org]

  30. Training a Neural Network [Slide from Russ Salakhutdinov et al.]

  31. Stochastic Gradient Descent [Slide from Russ Salakhutdinov et al.]
