Deep Learning - Theory and Practice Deep Neural Networks 12-03-2020 - - PowerPoint PPT Presentation




SLIDE 1

Deep Learning - Theory and Practice

12-03-2020

Deep Neural Networks

http://leap.ee.iisc.ac.in/sriram/teaching/DL20/ deeplearning.cce2020@gmail.com

SLIDE 2

Logistic Regression

Bishop - PRML book (Chap 4)

❖ 2-class logistic regression ❖ K-class logistic regression ❖ Maximum likelihood solution
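The maximum-likelihood solution for 2-class logistic regression has no closed form, but gradient descent on the cross-entropy error works well. A minimal sketch in NumPy (function names and the toy data are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fit_logistic(X, t, lr=0.1, n_iter=2000):
    """Maximum-likelihood 2-class logistic regression via gradient
    descent on the cross-entropy error."""
    N, D = X.shape
    Phi = np.hstack([np.ones((N, 1)), X])   # prepend a bias feature
    w = np.zeros(D + 1)
    for _ in range(n_iter):
        y = sigmoid(Phi @ w)
        grad = Phi.T @ (y - t)              # gradient of the cross-entropy
        w -= lr * grad / N
    return w

# Toy 1-D linearly separable data (illustrative)
X = np.array([[0.0], [1.0], [2.0], [3.0]])
t = np.array([0, 1, 1, 1]) * np.array([0, 0, 1, 1])  # labels 0,0,1,1
t = np.array([0, 0, 1, 1])
w = fit_logistic(X, t)
y = sigmoid(np.hstack([np.ones((4, 1)), X]) @ w)
```

The gradient `Phi.T @ (y - t)` follows from differentiating the cross-entropy error with respect to `w`, as derived in Bishop's PRML (Chap 4).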

SLIDE 3

Typical Error Surfaces

Typical Error Surface as a function of parameters (weights and biases)

SLIDE 4

Learning with Gradient Descent

Error surface close to a local minimum

SLIDE 5

Learning Using Gradient Descent

SLIDE 6

Parameter Learning

  • Solving a non-convex optimization.
  • Iterative solution.
  • Depends on the initialization.
  • Convergence to a local optimum.
  • Judicious choice of learning rate.
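The role of the learning rate can be seen even on a simple convex surface. In this sketch (all names and constants are illustrative, not from the slides), one rate converges while a slightly larger one diverges along the steep axis of an elongated bowl:

```python
import numpy as np

def gradient_descent(grad, w0, lr, n_iter=100):
    """Plain gradient descent: w <- w - lr * grad(w)."""
    w = np.asarray(w0, dtype=float)
    for _ in range(n_iter):
        w = w - lr * grad(w)
    return w

# Toy convex error surface E(w) = w1^2 + 10*w2^2 (an elongated bowl)
grad = lambda w: np.array([2.0 * w[0], 20.0 * w[1]])

w_good = gradient_descent(grad, [3.0, 3.0], lr=0.05)  # converges to the minimum
w_bad  = gradient_descent(grad, [3.0, 3.0], lr=0.11)  # too large: oscillates and diverges along w2
```

With curvature 20 along `w2`, the update multiplies that coordinate by `(1 - 20*lr)` each step, so any `lr > 0.1` makes the iterates grow without bound; this is the "judicious choice" the slide refers to.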

SLIDE 7

Least Squares versus Logistic Regression

Bishop - PRML book (Chap 4)

SLIDE 8

Least Squares versus Logistic Regression

Bishop - PRML book (Chap 4)

SLIDE 9

Neural Networks

SLIDE 10

Perceptron Algorithm

What if the data is not linearly separable? Perceptron model [McCulloch & Pitts, 1943; Rosenblatt, 1957]: targets are binary classes {-1, +1}; similar to logistic regression.
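A minimal sketch of the perceptron update rule, assuming targets in {-1, +1} (function names and the toy data are illustrative, not from the slides):

```python
import numpy as np

def perceptron(X, t, n_epochs=100):
    """Rosenblatt perceptron: whenever sample n is misclassified,
    update w <- w + t_n * phi_n."""
    N, D = X.shape
    Phi = np.hstack([np.ones((N, 1)), X])  # prepend a bias feature
    w = np.zeros(D + 1)
    for _ in range(n_epochs):
        errors = 0
        for n in range(N):
            if t[n] * (Phi[n] @ w) <= 0:   # misclassified (or on the boundary)
                w += t[n] * Phi[n]
                errors += 1
        if errors == 0:                    # converged: data is separated
            break
    return w

# Toy linearly separable data (illustrative)
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [1.0, 2.0]])
t = np.array([-1, -1, 1, 1])
w = perceptron(X, t)
pred = np.sign(np.hstack([np.ones((4, 1)), X]) @ w)
```

For linearly separable data this loop is guaranteed to terminate (the perceptron convergence theorem); for non-separable data it never stops updating, which motivates the smoother models that follow.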

SLIDE 11

Multi-layer Perceptron

Multi-layer Perceptron [Rumelhart et al., 1986]: the hard thresholding function is replaced by a smooth non-linear function (tanh, sigmoid).

SLIDES 12-17 (figure-only slides; no text extracted)

Neural Networks

Multi-layer Perceptron [Rumelhart et al., 1986]: the hard thresholding function is replaced by a smooth non-linear function (tanh, sigmoid).

  • Useful for classifying data with non-linear boundaries - non-linear class separation can be realized given enough data.
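As an illustration of a non-linear boundary that no single linear unit can realize, this sketch uses a tiny tanh MLP with hand-set (not learned) weights to separate XOR-labelled data; the weights are chosen for illustration only:

```python
import numpy as np

# Hand-set weights for a 2-2-1 tanh MLP that separates XOR data,
# a boundary no single linear unit can realize.
W1 = np.array([[ 4.0,  4.0],
               [-4.0, -4.0]])
b1 = np.array([-2.0, 6.0])
W2 = np.array([4.0, 4.0])
b2 = -4.0

def mlp(x):
    h = np.tanh(W1 @ x + b1)       # hidden layer non-linearity
    return np.tanh(W2 @ h + b2)    # output in (-1, 1)

X = [np.array(p, dtype=float) for p in [(0, 0), (0, 1), (1, 0), (1, 1)]]
t = [-1, 1, 1, -1]                 # XOR labels
pred = [np.sign(mlp(x)) for x in X]
```

Each hidden unit carves one linear half-space; the output unit combines the two, which is exactly the hierarchical composition a single-layer model lacks.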

SLIDE 18

Neural Networks

Types of non-linearities: tanh, sigmoid, ReLU.

Cost functions (computed against the desired outputs): mean squared error, cross-entropy.
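Each of these non-linearities and cost functions is a one-liner in NumPy; a minimal sketch (function names are illustrative, and `y`/`t` denote network outputs and desired outputs respectively):

```python
import numpy as np

# Common non-linearities
tanh    = np.tanh
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
relu    = lambda a: np.maximum(0.0, a)

def mse(y, t):
    """Mean squared error between outputs y and desired outputs t."""
    return 0.5 * np.mean((y - t) ** 2)

def cross_entropy(y, t, eps=1e-12):
    """Cross-entropy for 1-of-K desired outputs t and probabilities y."""
    return -np.sum(t * np.log(y + eps))
```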

SLIDE 19

Learning Posterior Probabilities with NNs

Choice of target function

  • Softmax function for classification.
  • Softmax produces positive values that sum to 1.
  • Allows the interpretation of outputs as posterior probabilities.
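Both properties are easy to verify in code. A numerically stable softmax sketch (the max-shift trick is a standard implementation detail, not from the slides):

```python
import numpy as np

def softmax(a):
    """Softmax over activations a: positive outputs summing to 1,
    interpretable as posterior class probabilities."""
    a = a - np.max(a)              # shift for numerical stability; result unchanged
    e = np.exp(a)
    return e / np.sum(e)

p = softmax(np.array([2.0, 1.0, 0.1]))
```

Subtracting the maximum activation before exponentiating leaves the result unchanged (the shift cancels in the ratio) while preventing overflow for large activations.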

SLIDE 20

Need For Deep Networks

Modeling complex real-world data like speech, images, and text:

  • Single hidden layer networks are too restrictive.
  • They need a large number of units in the hidden layer and must be trained with large amounts of data.
  • They do not generalize well enough.

Networks with multiple hidden layers - deep networks (open questions until 2005):

  • Are these networks trainable?
  • How can we initialize such networks?
  • Will these generalize well or overtrain?
SLIDE 21

Deep Networks Intuition

Neural networks with multiple hidden layers - Deep networks [Hinton, 2006]

SLIDE 22

Neural networks with multiple hidden layers - Deep networks

Deep Networks Intuition

SLIDE 23

Neural networks with multiple hidden layers - deep networks. Deep networks perform hierarchical data abstractions which enable the non-linear separation of complex data samples.
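The hierarchical abstraction can be pictured as repeated non-linear re-representation of the input, one layer at a time. A sketch of the forward pass through a stack of tanh layers (the dimensions and random weights are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def deep_forward(x, layers):
    """Forward pass through a stack of (W, b) layers: each tanh layer
    re-represents (abstracts) the output of the layer below it."""
    h = x
    for W, b in layers:
        h = np.tanh(W @ h + b)
    return h

# A small 4-layer network: 8 -> 16 -> 16 -> 16 -> 2
dims = [8, 16, 16, 16, 2]
layers = [(rng.standard_normal((dims[i + 1], dims[i])) * 0.1,
           np.zeros(dims[i + 1]))
          for i in range(len(dims) - 1)]

y = deep_forward(rng.standard_normal(8), layers)
```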

Deep Networks Intuition

SLIDE 24

Deep Networks

  • Are these networks trainable?
  • Advances in computation and processing.
  • Graphical processing units (GPUs) performing multiple parallel multiply-accumulate operations.
  • Large amounts of supervised data sets.