Introduction to Machine Learning
Yifeng Tao
School of Computer Science
Carnegie Mellon University
Logistics
o Course website: http://www.cs.cmu.edu/~yifengt/courses/machine-learning (slides uploaded after lecture)
o Time: Mon-Fri 9:50-11:30am lecture, 11:30am-12:00pm discussion
o Contact: yifengt@cs.cmu.edu
What is machine learning?
o What are we talking about when we talk about AI and ML?
o Deep learning ⊂ machine learning ⊂ artificial intelligence
What is machine learning?
o Application areas: computer vision, natural language processing, computational biology
o Mathematical foundations: probability, statistics, calculus, linear algebra
Where are we?
o Supervised learning: linear models
o Kernel machines: SVMs and duality
o Unsupervised learning: latent space analysis and clustering
o Supervised learning: decision tree, kNN and model selection
o Learning theory: generalization and VC dimension
o Neural network (basics)
o Deep learning in CV and NLP
o Probabilistic graphical models
o Reinforcement learning and its application in clinical text mining
o Attention mechanism and transfer learning in precision medicine
What’s more after introduction?
o Beyond the machine learning core: probabilistic graphical models, deep learning, learning theory, optimization
What’s more after introduction?
o Supervised learning: linear models
o Kernel machines: SVMs and duality
  → Optimization
o Unsupervised learning: latent space analysis and clustering
o Supervised learning: decision tree, kNN and model selection
o Learning theory: generalization and VC dimension
  → Statistical machine learning
o Neural network (basics)
o Deep learning in CV and NLP
  → Deep learning
o Probabilistic graphical models
Curriculum for an ML Master/Ph.D. student at CMU
o 10701 Introduction to Machine Learning: http://www.cs.cmu.edu/~epxing/Class/10701/
o 36705 Intermediate Statistics: http://www.stat.cmu.edu/~larry/=stat705/
o 36708 Statistical Machine Learning: http://www.stat.cmu.edu/~larry/=sml/
o 10725 Convex Optimization: http://www.stat.cmu.edu/~ryantibs/convexopt/
o 10708 Probabilistic Graphical Models: http://www.cs.cmu.edu/~epxing/Class/10708-17/
o 10707 Deep Learning: https://deeplearning-cmu-10707.github.io/
o Books:
  o Bishop. Pattern Recognition and Machine Learning
  o Goodfellow et al. Deep Learning
Introduction to Machine Learning
Neural network (basics)
Slides adapted from Eric Xing, Maria-Florina Balcan, Russ Salakhutdinov, Matt Gormley
A Recipe for Supervised Learning
o 1. Given training data.
o 2. Choose each of these:
  o Decision function
  o Loss function
o 3. Define the goal and train with SGD (take small steps opposite the gradient).
[Slide from Matt Gormley et al.]
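The recipe above can be sketched end to end for the simplest case, a linear decision function with squared loss trained by SGD (a minimal sketch; the function name and hyperparameters are illustrative, not from the slides):

```python
import numpy as np

def sgd_linear(X, y, lr=0.1, epochs=200, seed=0):
    """Recipe sketch: decision function w.x, squared loss, trained by SGD."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])                  # parameters of the decision function
    for _ in range(epochs):
        for i in rng.permutation(len(X)):     # stochastic: one random example at a time
            grad = (X[i] @ w - y[i]) * X[i]   # gradient of the loss 0.5*(w.x - y)^2
            w -= lr * grad                    # take a small step opposite the gradient
    return w
```

On data generated by a known weight vector, the loop recovers those weights, which is a quick way to check each ingredient of the recipe is wired up correctly.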
Logistic Regression
o The prediction rule:
o In this case, learning P(y|x) amounts to learning a conditional probability over two Gaussian distributions.
o Limitation: can only capture simple data distributions.
[Slide from Eric Xing et al.]
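As a concrete sketch of the standard logistic regression prediction rule (illustrative code, not taken from the slide): predict y = 1 exactly when the sigmoid probability is at least 0.5, which is equivalent to the linear score being non-negative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, x):
    # P(y=1|x) = sigmoid(w.x + b); predict 1 iff this probability >= 0.5,
    # which holds exactly when the linear score w.x + b >= 0.
    return int(sigmoid(w @ x + b) >= 0.5)
```

Note the decision boundary is still linear in x, which is the "only simple data distributions" limitation mentioned above.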
Learning highly non-linear functions
o f: X → y
o f might be a non-linear function
o X: continuous or discrete variables
o y: continuous or discrete variables
[Slide from Eric Xing et al.]
From biological neural networks to artificial neural networks
o Signals propagate through neurons in the brain.
o Signals propagate through perceptrons in an artificial neural network.
[Slide from Eric Xing et al.]
Perceptron Algorithm and SVM
o Perceptron: a simple learning algorithm for supervised classification, analyzed via geometric margins in the 1950s [Rosenblatt ’57].
o Like the SVM, it is a linear classifier based on an analysis of margins.
o Originally introduced in the online learning scenario:
  o Online learning model
  o Its guarantees under large margins
[Slide from Maria-Florina Balcan et al.]
The Online Learning Algorithm
o Examples arrive sequentially.
o We make a prediction; afterwards we observe the outcome.
o For i = 1, 2, ...:
o Applications:
  o Email classification
  o Recommendation systems
  o Ad placement in a new market
[Slide from Maria-Florina Balcan et al.]
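The protocol above can be written as a generic loop (a sketch; `predict` and `update` stand in for whatever learner is plugged in, and here the learner reacts only to mistakes):

```python
def online_learn(stream, predict, update):
    """Online protocol: for each example, predict, observe the outcome, adapt."""
    mistakes = 0
    for x, y in stream:        # examples arrive sequentially
        y_hat = predict(x)     # we make a prediction...
        if y_hat != y:         # ...and afterwards observe the true outcome
            mistakes += 1
            update(x, y)       # this learner adapts only after a mistake
    return mistakes
```

The perceptron on the next slide is exactly one choice of `predict`/`update` for this loop.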
Linear Separators: Perceptron Algorithm
o h(x) = w^T x + w_0; if h(x) ≥ 0, label x as +, otherwise label it as −.
o Set t = 1, start with the all-zero vector w_1.
o Given example x, predict positive iff w_t^T x ≥ 0.
o On a mistake, update as follows:
  o Mistake on a positive: w_{t+1} ← w_t + x
  o Mistake on a negative: w_{t+1} ← w_t − x
o This is a natural greedy procedure: if the true label of x is +1 and w_t is incorrect on x, we have w_t^T x < 0. After the update, w_{t+1}^T x = w_t^T x + x^T x = w_t^T x + ||x||², so w_{t+1} has a better chance of classifying x correctly.
o Similarly for mistakes on negative examples.
[Slide from Maria-Florina Balcan et al.]
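The update rules above translate almost line for line into code (a sketch with labels in {+1, −1}; the multi-pass loop and the `passes` parameter are added here for training to completion and are not part of the single-pass online description):

```python
import numpy as np

def perceptron(X, y, passes=10):
    w = np.zeros(X.shape[1])                   # start with the all-zero vector
    for _ in range(passes):
        for xi, yi in zip(X, y):
            pred = 1 if w @ xi >= 0 else -1    # predict positive iff w.x >= 0
            if pred != yi:
                w += yi * xi                   # +x on a positive mistake, -x on a negative
    return w
```

For simplicity the bias w_0 is folded away (equivalently, append a constant 1 feature to every x).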
Perceptron: Example and Guarantee
o Example:
o Guarantee: if the data has margin δ and all points lie inside a ball of radius R, then the Perceptron makes at most (R/δ)² mistakes.
o Normalized margin: multiplying all points by 100, or dividing all points by 100, does not change the number of mistakes; the algorithm is invariant to scaling.
[Slide from Maria-Florina Balcan et al.]
Perceptron: Proof of Mistake Bound
o Guarantee: if the data has margin δ and all points lie inside a ball of radius R, then the Perceptron makes at most (R/δ)² mistakes.
o Proof idea: analyze w_t^T w* and ||w_t||, where w* is the max-margin separator with ||w*|| = 1.
o Claim 1: w_{t+1}^T w* ≥ w_t^T w* + δ (because y x^T w* ≥ δ for every example (x, y)).
o Claim 2: ||w_{t+1}||² ≤ ||w_t||² + R² (essentially the Pythagorean theorem: the cross term is non-positive on a mistake).
o After M mistakes:
  o w_{M+1}^T w* ≥ δM (by Claim 1)
  o ||w_{M+1}|| ≤ R√M (by Claim 2)
  o w_{M+1}^T w* ≤ ||w_{M+1}|| (since w* is unit length)
o So δM ≤ R√M, hence M ≤ (R/δ)².
[Slide from Maria-Florina Balcan et al.]
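The (R/δ)² bound is easy to sanity-check numerically. In the sketch below (illustrative, not slide code), the dataset is separable by w* = (1, 0) with margin δ = 1 and every point has norm at most R = √10, so the bound says at most 10 mistakes:

```python
import numpy as np

def count_perceptron_mistakes(X, y, max_passes=100):
    """Run the perceptron until a full pass is mistake-free; count mistakes."""
    w = np.zeros(X.shape[1])
    mistakes = 0
    for _ in range(max_passes):
        clean = True
        for xi, yi in zip(X, y):
            if (1 if w @ xi >= 0 else -1) != yi:
                w += yi * xi        # standard perceptron update on a mistake
                mistakes += 1
                clean = False
        if clean:                   # converged: no mistakes in a full pass
            return mistakes
    return mistakes
```

The mistake bound also guarantees termination here: once the total mistake budget (R/δ)² is exhausted, some pass must be clean.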
Multilayer perceptron (MLP)
o A simple and basic type of feedforward neural network
o Contains many perceptrons organized into layers
o MLP “perceptrons” are not perceptrons in the strict sense
[Slide from Russ Salakhutdinov et al.]
Artificial Neuron (Perceptron)
[Slide from Russ Salakhutdinov et al.]
Activation Function
o Sigmoid activation function:
  o Squashes the neuron’s output between 0 and 1
  o Always positive
  o Bounded
  o Strictly increasing
  o Used in classification output layers
o Tanh activation function:
  o Squashes the neuron’s output between -1 and 1
  o Bounded
  o Strictly increasing
  o A linear transformation of the sigmoid function
[Slide from Russ Salakhutdinov et al.]
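The "linear transformation of the sigmoid" claim is the identity tanh(z) = 2·sigmoid(2z) − 1, which rescales the (0, 1) range to (−1, 1). A quick sketch (not slide code) to check it:

```python
import numpy as np

def sigmoid(z):
    # Squashes its input into (0, 1): always positive, bounded, increasing.
    return 1.0 / (1.0 + np.exp(-z))

def tanh_via_sigmoid(z):
    # tanh as a linear transformation of the sigmoid, rescaled into (-1, 1).
    return 2.0 * sigmoid(2.0 * z) - 1.0
```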
Activation Function
o Rectified linear (ReLU) activation:
  o Bounded below by 0 (always non-negative)
  o Tends to produce units with sparse activities
  o Not bounded above
  o Monotonically increasing
  o The most widely used activation function
o Advantages:
  o Biological plausibility
  o Sparse activation
  o Better gradient propagation: avoids the vanishing gradients of sigmoidal activations
[Slide from Russ Salakhutdinov et al.]
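ReLU itself is a one-liner (a sketch): it zeroes out negative inputs, which is exactly where the sparse activations come from, and passes positive inputs through unchanged, so its gradient there is 1 and does not vanish.

```python
import numpy as np

def relu(z):
    # max(0, z): zero for negative inputs (sparse activations),
    # identity for positive inputs (unbounded above, gradient 1).
    return np.maximum(0.0, z)
```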
Activation Function in AlexNet
o A four-layer convolutional neural network
o ReLU: solid line
o Tanh: dashed line
[Slide from https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf]
Single Hidden Layer MLP
[Slide from Russ Salakhutdinov et al.]
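The forward pass of a single-hidden-layer MLP can be sketched in a few lines (tanh hidden units and a sigmoid output for binary classification are assumptions here for concreteness):

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    h = np.tanh(W1 @ x + b1)                      # hidden layer: affine map + nonlinearity
    return 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))   # sigmoid output: probability in (0, 1)
```

With all weights zero, the hidden activations are tanh(0) = 0 and the output is sigmoid(0) = 0.5, a handy smoke test.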
Capacity of MLP
o Consider a single layer neural network
[Slide from Russ Salakhutdinov et al.]
Capacity of Neural Nets
o Consider a single layer neural network
[Slide from Russ Salakhutdinov et al.]
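A classic illustration of this capacity: XOR is not linearly separable, yet one hidden layer with hand-picked weights represents it. The weights below are illustrative (chosen large so the tanh units saturate), not from the slides:

```python
import numpy as np

def xor_net(x1, x2):
    # Hidden unit 1 saturates to +1 when x1 + x2 >= 1 (an OR-like unit);
    # hidden unit 2 saturates to +1 only when x1 + x2 >= 2 (an AND-like unit).
    h1 = np.tanh(10.0 * (x1 + x2 - 0.5))
    h2 = np.tanh(10.0 * (x1 + x2 - 1.5))
    # Output fires when "OR and not AND", i.e. XOR.
    return 1.0 / (1.0 + np.exp(-10.0 * (h1 - h2 - 1.0)))
```

A single linear unit cannot produce this truth table, which is exactly why the hidden layer adds capacity.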
MLP with Multiple Hidden Layers
[Slide from Russ Salakhutdinov et al.]
Capacity of Neural Nets
o Deep learning playground
[Slide from https://playground.tensorflow.org]
Training a Neural Network
[Slide from Russ Salakhutdinov et al.]
Stochastic Gradient Descent
[Slide from Russ Salakhutdinov et al.]
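Putting the pieces together, here is a minimal SGD-with-backpropagation training loop for a single-hidden-layer MLP on XOR. Everything here is an illustrative sketch (function name, hyperparameters, cross-entropy loss, and tanh hidden units are assumptions, not taken from the slides):

```python
import numpy as np

def train_xor(epochs=3000, lr=0.5, hidden=4, seed=0):
    rng = np.random.default_rng(seed)
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    Y = np.array([0., 1., 1., 0.])
    W1 = rng.normal(0.0, 1.0, (hidden, 2)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0, hidden);      b2 = 0.0

    def forward(x):
        h = np.tanh(W1 @ x + b1)                      # hidden layer
        p = 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))      # sigmoid output
        return h, p

    losses = []
    for _ in range(epochs):
        loss = 0.0
        for i in rng.permutation(4):                  # stochastic: one example at a time
            x, y = X[i], Y[i]
            h, p = forward(x)
            loss += -(y * np.log(p) + (1 - y) * np.log(1 - p))
            dz2 = p - y                               # dLoss/dlogit for cross-entropy
            dz1 = (dz2 * W2) * (1.0 - h ** 2)         # backprop through tanh
            W2 -= lr * dz2 * h;           b2 -= lr * dz2
            W1 -= lr * np.outer(dz1, x);  b1 -= lr * dz1
        losses.append(loss / 4.0)
    preds = np.array([forward(x)[1] for x in X])
    return preds, losses
```

Each inner step is exactly the recipe from earlier in the lecture: forward pass, loss, gradients via the chain rule, then a small step opposite the gradient.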