SLIDE 1

CS 559: Machine Learning Fundamentals and Applications
12th Set of Notes

Instructor: Philippos Mordohai
Webpage: www.cs.stevens.edu/~mordohai
E-mail: Philippos.Mordohai@stevens.edu
Office: Lieb 215

SLIDE 2

Overview

  • Deep Learning
    – Based on slides by M. Ranzato (mainly), S. Lazebnik, R. Fergus and Q. Zhang
SLIDE 3

Natural Neurons

  • Human recognition of digits
    – visual cortices
    – neuron interaction

SLIDE 4

Recognizing Handwritten Digits

  • How to describe a digit to a computer
    – "a 9 has a loop at the top, and a vertical stroke in the bottom right"
    – Algorithmically difficult to describe various 9s

SLIDE 5

Perceptrons

  • 1950s-1960s: Frank Rosenblatt, inspired by earlier work by Warren McCulloch and Walter Pitts
  • Standard model of artificial neurons

SLIDE 6

Binary Perceptrons

  • Inputs
    – Multiple binary inputs
  • Parameters
    – Thresholds & weights
  • Outputs
    – Thresholded weighted linear combination
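As a concrete illustration (a minimal sketch, not from the slides; the weights and threshold are arbitrary example values), a binary perceptron is simply a thresholded weighted sum of its binary inputs:

```python
import numpy as np

def perceptron(x, w, threshold):
    """Binary perceptron: output 1 if the weighted sum of the
    binary inputs exceeds the threshold, else 0."""
    return int(np.dot(w, x) > threshold)

# arbitrary example values
x = np.array([1, 0])          # binary inputs
w = np.array([0.6, 0.4])      # weights
print(perceptron(x, w, 0.5))  # -> 1, since 0.6*1 + 0.4*0 = 0.6 > 0.5
```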

SLIDE 7

Layered Perceptrons

  • Layered, complex model
    – 1st layer, 2nd layer of perceptrons
  • Perceptron rule
    – Weights, thresholds
  • Similarity to logical functions (NAND)
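For example (a standard textbook illustration, not taken from the slides), a single perceptron with weights -2, -2 and bias 3 computes the NAND function:

```python
def nand_perceptron(x1, x2):
    # fires 0 only when both binary inputs are 1
    return int(-2 * x1 - 2 * x2 + 3 > 0)

print([nand_perceptron(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# -> [1, 1, 1, 0]
```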

SLIDE 8

Sigmoid Neurons

  • Sigmoid neurons
  • Stability
    – Small perturbation, small output change
  • Continuous inputs
  • Continuous outputs
  • Soft thresholds
SLIDE 9

Output Functions

  • Sigmoid neurons
  • Output
  • Sigmoid vs. conventional thresholds
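For reference, the standard sigmoid neuron computes

$$\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = \sum_j w_j x_j + b,$$

a soft version of the hard threshold: small changes in the weights and bias produce small changes in the output.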

SLIDE 10

Smoothness & Differentiability

  • Perturbations and derivatives
  • Continuous function
  • Differentiable
  • Layers
    – Input layers, output layers, hidden layers

SLIDE 11

Layer Structure Design

  • Design of hidden layer
    – Heuristic rules
  • Number of hidden layers vs. computational resources
  • Feedforward network
    – No loops involved

SLIDE 12

Cost Function & Optimization

  • Learning with gradient descent
  • Cost function
    – Euclidean loss
    – Non-negative, smooth, differentiable
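For reference, the Euclidean (quadratic) cost over a training set of $n$ examples is conventionally written

$$C(w, b) = \frac{1}{2n} \sum_{x} \lVert y(x) - a(x) \rVert^2,$$

where $y(x)$ is the target and $a(x)$ the network output; it is non-negative, smooth, and differentiable in the weights and biases.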

SLIDE 13

Cost Function & Optimization

  • Gradient Descent
  • Gradient vector
SLIDE 14

Cost Function & Optimization

  • Extension to multiple dimensions
    – m variables
  • Small change in variable
  • Small change in cost
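For reference, with $m$ variables $v = (v_1, \ldots, v_m)$, a small change in the variables produces a change in the cost of approximately

$$\Delta C \approx \nabla C \cdot \Delta v, \qquad \nabla C = \left( \frac{\partial C}{\partial v_1}, \ldots, \frac{\partial C}{\partial v_m} \right)^{T},$$

so choosing $\Delta v = -\eta \nabla C$ (gradient descent) guarantees $\Delta C \le 0$ for a small enough learning rate $\eta$.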
SLIDE 15

Neural Nets for Computer Vision

Based on tutorials at CVPR 2012 and 2014 by Marc’Aurelio Ranzato

SLIDE 16

Building an Object Recognition System

IDEA: Use data to optimize features for the given task

SLIDE 17

Building an Object Recognition System

What we want: use a parameterized function such that
a) features are computed efficiently
b) features can be trained efficiently

SLIDE 18

Building an Object Recognition System

  • Everything becomes adaptive
  • No distinction between feature extractor and classifier
  • Big non-linear system trained from raw pixels to labels
SLIDE 19

Building an Object Recognition System

Q: How can we build such a highly non-linear system?
A: By combining simple building blocks we can make more and more complex systems.

SLIDE 20

Building a Complicated Function

  • Function composition is at the core of deep learning methods
  • Each "simple function" will have parameters subject to training
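A minimal sketch (my own illustration, with assumed function names) of building a complicated function by composing simple parameterized blocks:

```python
import numpy as np

def relu(v):
    return np.maximum(0, v)

def simple_block(x, W, b):
    # one "simple function": affine map followed by a non-linearity
    return relu(W @ x + b)

def deep_function(x, params):
    """Compose the simple blocks: f(x) = f_L(... f_2(f_1(x)))."""
    h = x
    for W, b in params:      # each block has trainable parameters (W, b)
        h = simple_block(h, W, b)
    return h
```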

SLIDE 21

Implementing a Complicated Function

SLIDE 22

Intuition Behind Deep Neural Nets

SLIDE 23

Intuition Behind Deep Neural Nets

Each black box can have trainable parameters. Their composition makes a highly non-linear system.

SLIDE 24

Intuition Behind Deep Neural Nets

The system produces a hierarchy of features.

SLIDE 25

Intuition Behind Deep Neural Nets

SLIDE 26

Intuition Behind Deep Neural Nets

SLIDE 27

Intuition Behind Deep Neural Nets

SLIDE 28

Key Ideas of Neural Nets

IDEA #1: Learn features from data
IDEA #2: Use differentiable functions that produce features efficiently
IDEA #3: End-to-end learning: no distinction between feature extractor and classifier
IDEA #4: "Deep" architectures: cascade of simpler non-linear modules

SLIDE 29

Key Questions

  • What is the input-output mapping?
  • How are the parameters trained?
  • How computationally expensive is it?
  • How well does it work?
SLIDE 30

Supervised Deep Learning

Marc’Aurelio Ranzato

SLIDE 31

Supervised Learning

  • Training set: {(x_i, y_i), i = 1...P}
    – x_i: i-th input training example
    – y_i: i-th target label
    – P: number of training examples
  • Goal: predict the target label of unseen inputs
SLIDE 32

Supervised Learning Examples

SLIDE 33

Supervised Deep Learning

SLIDE 34

Neural Networks

Assumptions (for the next few slides):

  • The input image is vectorized (disregard the spatial layout of pixels)
  • The target label is discrete (classification)

Question: what class of functions shall we consider to map the input into the output?
Answer: composition of simpler functions.

Follow-up questions: Why not a linear combination? What are the "simpler" functions? What is the interpretation?
Answer: later...

SLIDE 35

Neural Networks: Example

  • x: input
  • h1: 1st layer hidden units
  • h2: 2nd layer hidden units
  • output

Example of a 2 hidden layer neural network (or 4 layer network, counting also input and output)

SLIDE 36

Forward Propagation

Forward propagation is the process of computing the output of the network given its input.

SLIDE 37

Forward Propagation

  • W1: 1st layer weight matrix (weights)
  • b1: 1st layer biases
  • The non-linearity u = max(0, v) is called ReLU in the DL literature.
  • Each output hidden unit takes as input all the units at the previous layer: each such layer is called "fully connected".

SLIDE 38

Rectified Linear Unit (ReLU)

SLIDE 39

Forward Propagation

  • W2: 2nd layer weight matrix (weights)
  • b2: 2nd layer biases

SLIDE 40

Forward Propagation

  • W3: 3rd layer weight matrix (weights)
  • b3: 3rd layer biases
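Putting slides 37-40 together, a minimal numpy sketch (my own illustration) of forward propagation through this 2-hidden-layer, fully connected ReLU network, leaving the output as raw scores:

```python
import numpy as np

def relu(v):
    # ReLU non-linearity u = max(0, v), applied element-wise
    return np.maximum(0, v)

def forward(x, W1, b1, W2, b2, W3, b3):
    """Forward propagation: compute the network output given its input."""
    h1 = relu(W1 @ x + b1)   # 1st hidden layer (fully connected)
    h2 = relu(W2 @ h1 + b2)  # 2nd hidden layer (fully connected)
    o = W3 @ h2 + b3         # output layer scores
    return o
```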

SLIDE 41

Alternative Graphical Representations

SLIDE 42

Interpretation

  • Question: Why can't the mapping between layers be linear?
  • Answer: Because a composition of linear functions is a linear function. The neural network would reduce to (1-layer) logistic regression.
  • Question: What do ReLU layers accomplish?
  • Answer: Piece-wise linear tiling: the mapping is locally linear.
SLIDE 43

Interpretation

  • Question: Why do we need many layers?
  • Answer: When the input has hierarchical structure, the use of a hierarchical architecture is potentially more efficient because intermediate computations can be re-used. DL architectures are also efficient because they use distributed representations which are shared across classes.

SLIDE 44

Interpretation

SLIDE 45

Interpretation

  • Distributed representations
  • Feature sharing
  • Compositionality

SLIDE 46

Interpretation

Question: What does a hidden unit do?
Answer: It can be thought of as a classifier or feature detector.

Question: How many layers? How many hidden units?
Answer: Cross-validation or hyper-parameter search methods are the answer. In general, the wider and the deeper the network, the more complicated the mapping.

Question: How do I set the weight matrices?
Answer: Weight matrices and biases are learned. First, we need to define a measure of quality of the current mapping. Then, we need to define a procedure to adjust the parameters.

SLIDE 47

How Good is a Network

  • Probability of class k given input (softmax)
  • (Per-sample) loss; e.g., negative log-likelihood (good for classification of a small number of classes)
SLIDE 48

Training

  • Learning consists of minimizing the loss (plus some regularization term) w.r.t. the parameters over the whole training set.

Question: How to minimize a complicated function of the parameters?
Answer: Chain rule, a.k.a. backpropagation! That is the procedure to compute gradients of the loss w.r.t. the parameters in a multi-layer neural network.

SLIDE 49

Key Idea: Wiggle to Decrease Loss

  • Let's say we want to decrease the loss by adjusting the entry W1_{i,j}.
  • We could consider a very small ε = 1e-6 and compute:
  • Then update:
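For reference, the standard finite-difference version of this "wiggle" estimates the gradient and takes a small step against it:

$$\frac{\partial L}{\partial W^1_{i,j}} \approx \frac{L(W^1_{i,j} + \epsilon) - L(W^1_{i,j})}{\epsilon}, \qquad W^1_{i,j} \leftarrow W^1_{i,j} - \eta \, \frac{\partial L}{\partial W^1_{i,j}}.$$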
SLIDE 50

Backward Propagation

SLIDE 51

Backward Propagation

SLIDE 52

Backward Propagation
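A minimal numpy sketch (my own illustration, not the slides' exact notation) of the backward pass for the 2-hidden-layer ReLU network above, assuming a softmax output with negative log-likelihood loss and a one-hot target `y`:

```python
import numpy as np

def softmax(o):
    e = np.exp(o - o.max())
    return e / e.sum()

def forward_backward(x, y, W1, b1, W2, b2, W3, b3):
    """One forward + backward pass; returns the loss and all gradients."""
    # forward
    h1 = np.maximum(0, W1 @ x + b1)
    h2 = np.maximum(0, W2 @ h1 + b2)
    p = softmax(W3 @ h2 + b3)
    loss = -np.sum(y * np.log(p))
    # backward: apply the chain rule layer by layer
    do = p - y                            # dL/do for softmax + NLL
    dW3, db3 = np.outer(do, h2), do
    dh2 = W3.T @ do
    dv2 = dh2 * (h2 > 0)                  # gradient through ReLU
    dW2, db2 = np.outer(dv2, h1), dv2
    dh1 = W2.T @ dv2
    dv1 = dh1 * (h1 > 0)
    dW1, db1 = np.outer(dv1, x), dv1
    return loss, (dW1, db1, dW2, db2, dW3, db3)
```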

SLIDE 53

Optimization

Stochastic Gradient Descent, or one of its many variants
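A minimal sketch of the SGD loop (illustrative only; `minibatches` and `grad_loss` are hypothetical placeholders standing in for the data pipeline and the backward pass above):

```python
def sgd(params, data, lr=0.01, epochs=10, batch_size=128):
    for _ in range(epochs):
        for x_batch, y_batch in minibatches(data, batch_size):  # hypothetical helper
            grads = grad_loss(params, x_batch, y_batch)         # backprop on the minibatch
            # update: theta <- theta - lr * dL/dtheta
            params = [p - lr * g for p, g in zip(params, grads)]
    return params
```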

SLIDE 54

Convolutional Neural Networks

Marc’Aurelio Ranzato

SLIDE 55

Fully Connected Layer

SLIDE 56

Locally Connected Layer

SLIDE 57

Convolutional Layer

SLIDE 58

Convolutional Layer

SLIDE 59

Convolutional Layer

SLIDE 60

Convolutional Layer

SLIDE 61

Convolutional Layer

SLIDE 62

Convolutional Layer

SLIDE 63

Convolutional Layer

SLIDE 64

Convolutional Layer

SLIDE 65

Convolutional Layer

SLIDE 66

Convolutional Layer

SLIDE 67

Convolutional Layer

Question: What is the size of the output? What's the computational cost?
Answer: It is proportional to the number of filters and depends on the stride. If kernels have size K×K, the input has size D×D, the stride is 1, and there are M input feature maps and N output feature maps, then:

  • the input has size M×D×D
  • the output has size N×(D-K+1)×(D-K+1)
  • the kernels have M×N×K×K coefficients (which have to be learned)
  • cost: M×K×K×N×(D-K+1)×(D-K+1)

Question: How many feature maps? What's the size of the filters?
Answer: Usually, there are more output feature maps than input feature maps. Convolutional layers can increase the number of hidden units by big factors (and are expensive to compute). The size of the filters has to match the size/scale of the patterns we want to detect (task dependent).
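The same bookkeeping in a small sketch (my own illustration; the example numbers at the bottom are mine, not from the slide):

```python
def conv_layer_stats(D, K, M, N, stride=1):
    """Output size, kernel coefficients, and multiply count for a
    convolutional layer (valid convolution, as in the formulas above)."""
    out = (D - K) // stride + 1
    coeffs = M * N * K * K                # learnable kernel coefficients
    cost = M * K * K * N * out * out      # multiply-accumulates
    return (N, out, out), coeffs, cost

# e.g. a 32x32 RGB input (M=3), 5x5 kernels, N=16 output feature maps
print(conv_layer_stats(D=32, K=5, M=3, N=16))
# -> ((16, 28, 28), 1200, 940800)
```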

SLIDE 68

Key Ideas

  • A standard neural net applied to images:
    – scales quadratically with the size of the input
    – does not leverage stationarity
  • Solution:
    – connect each hidden unit to a small patch of the input
    – share the weights across space
  • This is called a convolutional layer
  • A network with convolutional layers is called a convolutional network

SLIDE 69

Pooling Layer

SLIDE 70

Pooling Layer

SLIDE 71

Pooling Layer

Question: What is the size of the output? What's the computational cost?
Answer: The size of the output depends on the stride between the pools. For instance, if pools do not overlap and have size K×K, and the input has size D×D with M input feature maps, then:

  • the output is M×(D/K)×(D/K)
  • the computational cost is proportional to the size of the input (negligible compared to a convolutional layer)

Question: How should I set the size of the pools?
Answer: It depends on how much "invariant" or robust to distortions we want the representation to be. It is best to pool slowly (via a few stacks of conv-pooling layers).
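A minimal sketch of non-overlapping K×K max pooling (my own illustration, assuming the spatial size divides evenly by K):

```python
import numpy as np

def max_pool(x, K):
    """Non-overlapping K x K max pooling over an (M, D, D) stack of feature maps."""
    M, D, _ = x.shape
    # group each K x K pool and take its maximum -> shape (M, D//K, D//K)
    return x.reshape(M, D // K, K, D // K, K).max(axis=(2, 4))
```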

SLIDE 72

Local Contrast Normalization

SLIDE 73

Local Contrast Normalization

SLIDE 74

Local Contrast Normalization

SLIDE 75

Local Contrast Normalization

SLIDE 76

ConvNets: Typical Stage

SLIDE 77

ConvNets: Typical Architecture

SLIDE 78

ConvNets: Typical Architecture

Conceptually similar to: SIFT → k-means → Pyramid Pooling → SVM
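As one possible rendering of such a typical architecture (my own sketch in PyTorch; the layer sizes are arbitrary example values, and local contrast normalization is omitted):

```python
import torch.nn as nn

def stage(c_in, c_out):
    # one typical ConvNet stage: convolution -> non-linearity -> pooling
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=5),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2),
    )

# typical architecture: a few conv stages, then a fully connected classifier
net = nn.Sequential(
    stage(3, 16),               # 3x32x32 input -> 16x14x14
    stage(16, 32),              # -> 32x5x5
    nn.Flatten(),
    nn.Linear(32 * 5 * 5, 10),  # 10-way classifier (softmax applied in the loss)
)
```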

SLIDE 79

Engineered vs. Learned Features

[Figure: two pipelines compared. Engineered: Image → Feature extraction → Pooling → Classifier → Label. Learned: Image → several Convolution/pool layers → Dense layers → Label.]

Convolutional filters are trained in a supervised manner by back-propagating classification error.

slide credit: S. Lazebnik

SLIDE 80

SIFT Descriptor

[Figure: Image pixels → apply gradient filters → spatial pool (sum) → normalize to unit length → feature vector.]

slide credit: R. Fergus

SLIDE 81

AlexNet

  • Similar framework to LeCun'98 but:
    – Bigger model (7 hidden layers, 650,000 units, 60,000,000 params)
    – More data (10^6 vs. 10^3 images)
    – GPU implementation (50x speedup over CPU)
    – Trained on two GPUs for a week
  • A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," NIPS 2012

SLIDE 82

Input

SLIDE 83

Conv Nets: Examples

  • Pedestrian detection

SLIDE 84

Conv Nets: Examples

  • Scene Parsing

SLIDE 85

Conv Nets: Examples

  • Denoising

SLIDE 86

Conv Nets: Examples

  • Object Detection

SLIDE 87

Conv Nets: Examples

  • Face Verification and Identification (DeepFace)

SLIDE 88

Conv Nets: Examples

  • Regression (DeepPose)