Introduction to Machine Learning
Yifeng Tao
School of Computer Science
Carnegie Mellon University
Logistics
o Course website: http://www.cs.cmu.edu/~yifengt/courses/machine-learning (slides uploaded after lecture)
o Time: Mon-Fri 9:50-11:30am lecture, 11:30am-12:00pm discussion
o Contact: yifengt@cs.cmu.edu
What is machine learning?
o What are we talking about when we talk about AI and ML?
o Deep learning ⊂ machine learning ⊂ artificial intelligence
What is machine learning?
o Application areas: computer vision, natural language processing, computational biology
o Mathematical foundations: probability, statistics, calculus, linear algebra
Where are we?
o Supervised learning: linear models
o Kernel machines: SVMs and duality
o Unsupervised learning: latent space analysis and clustering
o Supervised learning: decision tree, kNN and model selection
o Learning theory: generalization and VC dimension
o Neural network (basics)
o Deep learning in CV and NLP
o Probabilistic graphical models
o Reinforcement learning and its application in clinical text mining
o Attention mechanism and transfer learning in precision medicine
What’s more after introduction?
o Beyond the machine learning core: probabilistic graphical models, deep learning, learning theory, optimization
What’s more after introduction?
o Supervised learning: linear models
o Kernel machines: SVMs and duality
  → Optimization
o Unsupervised learning: latent space analysis and clustering
o Supervised learning: decision tree, kNN and model selection
o Learning theory: generalization and VC dimension
  → Statistical machine learning
o Neural network (basics)
o Deep learning in CV and NLP
  → Deep learning
o Probabilistic graphical models
Curriculum for an ML Master/Ph.D. student at CMU
o 10701 Introduction to Machine Learning: http://www.cs.cmu.edu/~epxing/Class/10701/
o 36705 Intermediate Statistics: http://www.stat.cmu.edu/~larry/=stat705/
o 36708 Statistical Machine Learning: http://www.stat.cmu.edu/~larry/=sml/
o 10725 Convex Optimization: http://www.stat.cmu.edu/~ryantibs/convexopt/
o 10708 Probabilistic Graphical Models: http://www.cs.cmu.edu/~epxing/Class/10708-17/
o 10707 Deep Learning: https://deeplearning-cmu-10707.github.io/
o Books:
  o Bishop. Pattern Recognition and Machine Learning
  o Goodfellow et al. Deep Learning
Introduction to Machine Learning
Neural network (basics)
Slides adapted from Eric Xing, Maria-Florina Balcan, Russ Salakhutdinov, Matt Gormley
A Recipe for Supervised Learning
o 1. Given training data.
o 2. Choose each of these:
  o Decision function
  o Loss function
o 3. Define the goal and train with SGD (take small steps opposite the gradient).
[Slide from Matt Gormley et al.]
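The recipe above can be sketched end to end for the simplest case, a linear decision function with squared loss trained by SGD (a minimal sketch; the function name and hyperparameters are illustrative, not from the slides):

```python
import numpy as np

def sgd_linear(X, y, lr=0.1, epochs=200, seed=0):
    """Recipe sketch: decision function w.x, squared loss, trained by SGD."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])                  # parameters of the decision function
    for _ in range(epochs):
        for i in rng.permutation(len(X)):     # stochastic: one random example at a time
            grad = (X[i] @ w - y[i]) * X[i]   # gradient of the loss 0.5*(w.x - y)^2
            w -= lr * grad                    # take a small step opposite the gradient
    return w
```

On data generated by a known weight vector, the loop recovers those weights, which is a quick way to check each ingredient of the recipe is wired up correctly.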
Logistic Regression
o The prediction rule:
o In this case, learning P(y|x) amounts to learning a conditional probability over two Gaussian distributions.
o Limitation: can only capture simple data distributions.
[Slide from Eric Xing et al.]
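As a concrete sketch of the standard logistic regression prediction rule (illustrative code, not taken from the slide): predict y = 1 exactly when the sigmoid probability is at least 0.5, which is equivalent to the linear score being non-negative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, x):
    # P(y=1|x) = sigmoid(w.x + b); predict 1 iff this probability >= 0.5,
    # which holds exactly when the linear score w.x + b >= 0.
    return int(sigmoid(w @ x + b) >= 0.5)
```

Note the decision boundary is still linear in x, which is the "only simple data distributions" limitation mentioned above.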
Learning highly non-linear functions
o f: X → y
o f might be a non-linear function
o X: continuous or discrete variables
o y: continuous or discrete variables
[Slide from Eric Xing et al.]
From biological neural networks to artificial neural networks
o Signals propagate through neurons in the brain.
o Signals propagate through perceptrons in an artificial neural network.
[Slide from Eric Xing et al.]
Perceptron Algorithm and SVM
o Perceptron: a simple learning algorithm for supervised classification, analyzed via geometric margins in the 1950s [Rosenblatt ’57].
o Like the SVM, it is a linear classifier based on an analysis of margins.
o Originally introduced in the online learning scenario:
  o Online learning model
  o Its guarantees under large margins
[Slide from Maria-Florina Balcan et al.]
The Online Learning Algorithm
o Examples arrive sequentially.
o We make a prediction; afterwards we observe the outcome.
o For i = 1, 2, ...:
o Applications:
  o Email classification
  o Recommendation systems
  o Ad placement in a new market
[Slide from Maria-Florina Balcan et al.]
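The protocol above can be written as a generic loop (a sketch; `predict` and `update` stand in for whatever learner is plugged in, and here the learner reacts only to mistakes):

```python
def online_learn(stream, predict, update):
    """Online protocol: for each example, predict, observe the outcome, adapt."""
    mistakes = 0
    for x, y in stream:        # examples arrive sequentially
        y_hat = predict(x)     # we make a prediction...
        if y_hat != y:         # ...and afterwards observe the true outcome
            mistakes += 1
            update(x, y)       # this learner adapts only after a mistake
    return mistakes
```

The perceptron on the next slide is exactly one choice of `predict`/`update` for this loop.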
Linear Separators: Perceptron Algorithm
o h(x) = w^T x + w_0; if h(x) ≥ 0, label x as +, otherwise label it as −.
o Set t = 1, start with the all-zero vector w_1.
o Given example x, predict positive iff w_t^T x ≥ 0.
o On a mistake, update as follows:
  o Mistake on a positive: w_{t+1} ← w_t + x
  o Mistake on a negative: w_{t+1} ← w_t − x
o This is a natural greedy procedure: if the true label of x is +1 and w_t is incorrect on x, we have w_t^T x < 0. After the update, w_{t+1}^T x = w_t^T x + x^T x = w_t^T x + ||x||², so w_{t+1} has a better chance of classifying x correctly.
o Similarly for mistakes on negative examples.
[Slide from Maria-Florina Balcan et al.]
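The update rules above translate almost line for line into code (a sketch with labels in {+1, −1}; the multi-pass loop and the `passes` parameter are added here for training to completion and are not part of the single-pass online description):

```python
import numpy as np

def perceptron(X, y, passes=10):
    w = np.zeros(X.shape[1])                   # start with the all-zero vector
    for _ in range(passes):
        for xi, yi in zip(X, y):
            pred = 1 if w @ xi >= 0 else -1    # predict positive iff w.x >= 0
            if pred != yi:
                w += yi * xi                   # +x on a positive mistake, -x on a negative
    return w
```

For simplicity the bias w_0 is folded away (equivalently, append a constant 1 feature to every x).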
Perceptron: Example and Guarantee
o Example:
o Guarantee: if the data has margin δ and all points lie inside a ball of radius R, then the Perceptron makes at most (R/δ)² mistakes.
o Normalized margin: multiplying all points by 100, or dividing all points by 100, does not change the number of mistakes; the algorithm is invariant to scaling.
[Slide from Maria-Florina Balcan et al.]
Perceptron: Proof of Mistake Bound
o Guarantee: if the data has margin δ and all points lie inside a ball of radius R, then the Perceptron makes at most (R/δ)² mistakes.
o Proof idea: analyze w_t^T w* and ||w_t||, where w* is the max-margin separator with ||w*|| = 1.
o Claim 1: w_{t+1}^T w* ≥ w_t^T w* + δ (because y x^T w* ≥ δ for every example (x, y)).
o Claim 2: ||w_{t+1}||² ≤ ||w_t||² + R² (essentially the Pythagorean theorem: the cross term is non-positive on a mistake).
o After M mistakes:
  o w_{M+1}^T w* ≥ δM (by Claim 1)
  o ||w_{M+1}|| ≤ R√M (by Claim 2)
  o w_{M+1}^T w* ≤ ||w_{M+1}|| (since w* is unit length)
o So δM ≤ R√M, hence M ≤ (R/δ)².
[Slide from Maria-Florina Balcan et al.]
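The (R/δ)² bound is easy to sanity-check numerically. In the sketch below (illustrative, not slide code), the dataset is separable by w* = (1, 0) with margin δ = 1 and every point has norm at most R = √10, so the bound says at most 10 mistakes:

```python
import numpy as np

def count_perceptron_mistakes(X, y, max_passes=100):
    """Run the perceptron until a full pass is mistake-free; count mistakes."""
    w = np.zeros(X.shape[1])
    mistakes = 0
    for _ in range(max_passes):
        clean = True
        for xi, yi in zip(X, y):
            if (1 if w @ xi >= 0 else -1) != yi:
                w += yi * xi        # standard perceptron update on a mistake
                mistakes += 1
                clean = False
        if clean:                   # converged: no mistakes in a full pass
            return mistakes
    return mistakes
```

The mistake bound also guarantees termination here: once the total mistake budget (R/δ)² is exhausted, some pass must be clean.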
Multilayer perceptron (MLP)
o A simple and basic type of feedforward neural network
o Contains many perceptrons organized into layers
o MLP “perceptrons” are not perceptrons in the strict sense
[Slide from Russ Salakhutdinov et al.]
Artificial Neuron (Perceptron)
[Slide from Russ Salakhutdinov et al.]
Activation Function
o Sigmoid activation function:
  o Squashes the neuron’s output between 0 and 1
  o Always positive
  o Bounded
  o Strictly increasing
  o Used in classification output layers
o Tanh activation function:
  o Squashes the neuron’s output between -1 and 1
  o Bounded
  o Strictly increasing
  o A linear transformation of the sigmoid function
[Slide from Russ Salakhutdinov et al.]
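The "linear transformation of the sigmoid" claim is the identity tanh(z) = 2·sigmoid(2z) − 1, which rescales the (0, 1) range to (−1, 1). A quick sketch (not slide code) to check it:

```python
import numpy as np

def sigmoid(z):
    # Squashes its input into (0, 1): always positive, bounded, increasing.
    return 1.0 / (1.0 + np.exp(-z))

def tanh_via_sigmoid(z):
    # tanh as a linear transformation of the sigmoid, rescaled into (-1, 1).
    return 2.0 * sigmoid(2.0 * z) - 1.0
```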
Activation Function
o Rectified linear (ReLU) activation:
  o Bounded below by 0 (always non-negative)
  o Tends to produce units with sparse activities
  o Not bounded above
  o Monotonically increasing
  o The most widely used activation function
o Advantages:
  o Biological plausibility
  o Sparse activation
  o Better gradient propagation: avoids the vanishing gradients of sigmoidal activations
[Slide from Russ Salakhutdinov et al.]
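ReLU itself is a one-liner (a sketch): it zeroes out negative inputs, which is exactly where the sparse activations come from, and passes positive inputs through unchanged, so its gradient there is 1 and does not vanish.

```python
import numpy as np

def relu(z):
    # max(0, z): zero for negative inputs (sparse activations),
    # identity for positive inputs (unbounded above, gradient 1).
    return np.maximum(0.0, z)
```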
Activation Function in AlexNet
o A four-layer convolutional neural network
o ReLU: solid line
o Tanh: dashed line
[Slide from https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf]
Single Hidden Layer MLP
[Slide from Russ Salakhutdinov et al.]
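The forward pass of a single-hidden-layer MLP can be sketched in a few lines (tanh hidden units and a sigmoid output for binary classification are assumptions here for concreteness):

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    h = np.tanh(W1 @ x + b1)                      # hidden layer: affine map + nonlinearity
    return 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))   # sigmoid output: probability in (0, 1)
```

With all weights zero, the hidden activations are tanh(0) = 0 and the output is sigmoid(0) = 0.5, a handy smoke test.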
Capacity of MLP
o Consider a single layer neural network
[Slide from Russ Salakhutdinov et al.]
Capacity of Neural Nets
o Consider a single layer neural network
[Slide from Russ Salakhutdinov et al.]
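A classic illustration of this capacity: XOR is not linearly separable, yet one hidden layer with hand-picked weights represents it. The weights below are illustrative (chosen large so the tanh units saturate), not from the slides:

```python
import numpy as np

def xor_net(x1, x2):
    # Hidden unit 1 saturates to +1 when x1 + x2 >= 1 (an OR-like unit);
    # hidden unit 2 saturates to +1 only when x1 + x2 >= 2 (an AND-like unit).
    h1 = np.tanh(10.0 * (x1 + x2 - 0.5))
    h2 = np.tanh(10.0 * (x1 + x2 - 1.5))
    # Output fires when "OR and not AND", i.e. XOR.
    return 1.0 / (1.0 + np.exp(-10.0 * (h1 - h2 - 1.0)))
```

A single linear unit cannot produce this truth table, which is exactly why the hidden layer adds capacity.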
MLP with Multiple Hidden Layers
[Slide from Russ Salakhutdinov et al.]
Capacity of Neural Nets
o Deep learning playground
[Slide from https://playground.tensorflow.org]
Training a Neural Network
[Slide from Russ Salakhutdinov et al.]
Stochastic Gradient Descent
[Slide from Russ Salakhutdinov et al.]
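Putting the pieces together, here is a minimal SGD-with-backpropagation training loop for a single-hidden-layer MLP on XOR. Everything here is an illustrative sketch (function name, hyperparameters, cross-entropy loss, and tanh hidden units are assumptions, not taken from the slides):

```python
import numpy as np

def train_xor(epochs=3000, lr=0.5, hidden=4, seed=0):
    rng = np.random.default_rng(seed)
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    Y = np.array([0., 1., 1., 0.])
    W1 = rng.normal(0.0, 1.0, (hidden, 2)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0, hidden);      b2 = 0.0

    def forward(x):
        h = np.tanh(W1 @ x + b1)                      # hidden layer
        p = 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))      # sigmoid output
        return h, p

    losses = []
    for _ in range(epochs):
        loss = 0.0
        for i in rng.permutation(4):                  # stochastic: one example at a time
            x, y = X[i], Y[i]
            h, p = forward(x)
            loss += -(y * np.log(p) + (1 - y) * np.log(1 - p))
            dz2 = p - y                               # dLoss/dlogit for cross-entropy
            dz1 = (dz2 * W2) * (1.0 - h ** 2)         # backprop through tanh
            W2 -= lr * dz2 * h;           b2 -= lr * dz2
            W1 -= lr * np.outer(dz1, x);  b1 -= lr * dz1
        losses.append(loss / 4.0)
    preds = np.array([forward(x)[1] for x in X])
    return preds, losses
```

Each inner step is exactly the recipe from earlier in the lecture: forward pass, loss, gradients via the chain rule, then a small step opposite the gradient.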