SLIDE 1

Neural Networks with Google’s TensorFlow

Shuo Zhang, Computational discourse analysis, 11/22/16

SLIDE 2

Overview

  • 1. Neural Networks basics
  • 2. Neural Networks specifics
  • 3. Neural Networks with Google’s TensorFlow
  • 4. Coreference: Singleton classification example
SLIDE 3

Resources

  • Deep learning course (Google) @ Udacity
  • Machine learning course (Stanford, Andrew Ng) @ Coursera
  • Neural Networks course (Geoffrey Hinton) @ Coursera
SLIDE 4
  • 1. NN basics
SLIDE 5

From linear to non-linear classifier

SLIDE 6

Pros and cons of linear models

Pros:

  • Fast
  • Numerically stable
  • Derivative is constant

Cons:

  • Limited to modeling additive features
  • Multiplicative or higher-order features lead to a huge parameter space, not suitable for non-linear mappings

Conclusion: we want to keep the parameters inside linear functions, but still be able to do non-linear mappings efficiently (see the sketch below).
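To see why the non-linearity cannot come from stacking linear functions alone, note that composing two linear maps yields another linear map. A minimal NumPy sketch (mine, not from the slides) of that collapse:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # first "layer"
W2 = rng.normal(size=(2, 4))   # second "layer"
x = rng.normal(size=3)

y_stacked = W2 @ (W1 @ x)      # two stacked linear layers...
y_single = (W2 @ W1) @ x       # ...equal one linear layer with weights W2 @ W1

print(np.allclose(y_stacked, y_single))  # True: stacking adds no power
```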

SLIDE 7

From logistic regression to neural networks

SLIDE 8

Inserting a non-linear layer: Rectified Linear Unit (ReLU)
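A minimal NumPy sketch of the idea (layer sizes are illustrative, not from the slides): a ReLU inserted between two linear layers, which prevents the collapse shown under Slide 6:

```python
import numpy as np

def relu(z):
    # Element-wise max(0, z): zero for negative inputs, identity otherwise.
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # first linear layer
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)  # second linear layer
x = rng.normal(size=3)

h = relu(W1 @ x + b1)  # the inserted non-linearity breaks the linear collapse
y = W2 @ h + b2
print(y)
```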

SLIDE 9

Intuition: how a neural network makes non-linear mappings possible

SLIDE 10

Types of neural networks

  • Feedforward
  • Feedback
  • Self-Organizing Map (SOM)
  • ...
SLIDE 11
  • 2. NN specifics
SLIDE 12

Multinomial logistic regression as the basic unit in NN

SLIDE 13

Softmax – turns the outputs of linear functions into probability vectors
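A minimal NumPy sketch of softmax (mine, not from the slides), using the standard subtract-the-max trick for numerical stability:

```python
import numpy as np

def softmax(logits):
    # Subtracting the max leaves the result unchanged (softmax is invariant
    # to adding a constant to all logits) but avoids overflow in exp.
    z = logits - np.max(logits)
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # ~[0.659 0.242 0.099], sums to 1
```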

SLIDE 14

One-hot encoding
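A short sketch (mine) of one-hot encoding a class label:

```python
import numpy as np

def one_hot(label, num_classes):
    # A vector of zeros with a single 1 at the gold label's index.
    v = np.zeros(num_classes)
    v[label] = 1.0
    return v

print(one_hot(2, 4))  # [0. 0. 1. 0.]
```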

SLIDE 15

Cross entropy – measuring the mismatch between the prediction and the gold label
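A minimal NumPy sketch (mine) of cross entropy between a one-hot gold label L and a softmax prediction S, D(L, S) = -Σᵢ Lᵢ log Sᵢ:

```python
import numpy as np

def cross_entropy(one_hot_label, predicted_probs, eps=1e-12):
    # -sum_i L_i * log(S_i); eps guards against log(0).
    return -np.sum(one_hot_label * np.log(predicted_probs + eps))

gold = np.array([0.0, 1.0, 0.0])
pred = np.array([0.2, 0.7, 0.1])
print(cross_entropy(gold, pred))  # ~0.357; lower means a better prediction
```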

SLIDE 16

Putting it together again
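Putting the three previous pieces together, a rough sketch (names and sizes are mine) of the full forward pass: linear function, softmax, cross entropy against a one-hot gold label:

```python
import numpy as np

def forward_loss(W, b, x, label_index):
    logits = W @ x + b                          # linear function Wx + b
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                        # softmax: probability vector
    # Cross entropy against a one-hot label reduces to -log p[label].
    return -np.log(probs[label_index] + 1e-12)

rng = np.random.default_rng(0)
W, b, x = rng.normal(size=(3, 4)), np.zeros(3), rng.normal(size=4)
print(forward_loss(W, b, x, label_index=1))
```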

SLIDE 17

MLR to NN

SLIDE 18

ReLU – a non-linear activation function to put in the hidden layer

ReLU is one of many choices of a non-linear activation function.

https://en.wikipedia.org/wiki/Activation_function
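Two common alternatives listed on the linked page, sketched in NumPy (ReLU itself appears under Slide 8):

```python
import numpy as np

def sigmoid(x):
    # Logistic function: squashes any real input into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Hyperbolic tangent: squashes into (-1, 1), zero-centered.
    return np.tanh(x)
```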

SLIDE 19

Training a neural network

  • Basically similar to training a linear model: optimize a cost function using a method like gradient descent (for a linear model that cost is convex; for a NN it generally is not)
  • Example cost function for a logistic-based activation

SLIDE 20

Cost function – the same idea applies to a linear classifier or a NN

  • The cost function is a function of the parameters that captures the difference between the predicted and gold labels, so we want to minimize it.
  • How to minimize it? Using gradient descent: at each iteration, adjust the weights.
  • How to adjust the weights? Subtracting the gradient (derivative) moves you toward the minimum (sketched below).
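A minimal one-dimensional sketch (mine; the cost function and learning rate are illustrative) of the subtract-the-gradient loop:

```python
def gradient_descent(grad, w, learning_rate=0.1, steps=100):
    # At each iteration, subtract the gradient scaled by the learning rate.
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

# Minimize cost(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
print(gradient_descent(lambda w: 2 * (w - 3), w=0.0))  # ~3.0, the minimum
```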

SLIDE 21

Gradient descent

  • Keep in mind that W is a matrix, so we need to compute the partial derivative with respect to each element of W, summed over the training examples (a numerical sketch follows).
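A hedged numerical sketch (finite differences, not from the slides) that makes the per-element partial derivatives concrete:

```python
import numpy as np

def numeric_grad(cost, W, eps=1e-6):
    # Partial derivative of cost with respect to each element of W, in turn.
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[idx] += eps
        W_minus[idx] -= eps
        grad[idx] = (cost(W_plus) - cost(W_minus)) / (2 * eps)
    return grad

W = np.array([[1.0, 2.0], [3.0, 4.0]])
print(numeric_grad(lambda M: (M ** 2).sum(), W))  # ~2 * W, as expected
```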

SLIDE 22

Gradient Descent flavors

  • Batch GD: the classic approach; sums the derivatives over all training examples at each iteration to perform one weight update. Very slow but more stable; almost never used today.
  • Stochastic GD: takes only one example at each iteration and uses the gradient computed from that example to adjust the weights. Fast, but less stable behavior.
  • Mini-batch GD: in between; takes a mini-batch of examples (such as 100 to 2000) and sums those examples' derivatives to perform one update. Balances stability and speed (with good results); the most used today (see the sketch below).
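A rough sketch of the mini-batch variant (the batch size, shuffling scheme, and the least-squares gradient in the usage lines are illustrative assumptions, not from the slides):

```python
import numpy as np

def minibatch_gd(X, y, w, grad_fn, learning_rate=0.01, batch_size=128, epochs=10):
    n = len(X)
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        order = rng.permutation(n)                 # shuffle once per epoch
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            # One update from the summed gradient over this mini-batch.
            w = w - learning_rate * grad_fn(w, X[batch], y[batch])
    return w

# Usage with a least-squares gradient: X^T (X w - y), summed over the batch.
X = np.random.default_rng(1).normal(size=(1000, 3))
y = X @ np.array([1.0, -2.0, 0.5])
lsq_grad = lambda w, Xb, yb: Xb.T @ (Xb @ w - yb)
print(minibatch_gd(X, y, np.zeros(3), lsq_grad))  # ~[1.0, -2.0, 0.5]
```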

SLIDE 23

Neural network training: forward and backward propagation

Intuition from the linear classifier – repeat:

  • Compute an output
  • Compute error
  • Adjust weights

(my implementation in Octave)
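The screenshot shows the author's Octave code, which is not reproduced here; below is my rough Python equivalent of the same repeat loop for a single logistic unit (the linear-classifier intuition above; a full network chains the same three steps layer by layer):

```python
import numpy as np

def train(X, y, learning_rate=0.5, epochs=1000):
    rng = np.random.default_rng(0)
    w = rng.normal(size=X.shape[1])
    for _ in range(epochs):
        output = 1.0 / (1.0 + np.exp(-(X @ w)))  # 1. compute an output (sigmoid)
        error = output - y                       # 2. compute the error
        w -= learning_rate * X.T @ error         # 3. adjust the weights
    return w

# First column is a bias term; the target is logical AND of the other two.
X = np.array([[1., 0., 0.], [1., 0., 1.], [1., 1., 0.], [1., 1., 1.]])
y = np.array([0., 0., 0., 1.])
w = train(X, y)
print(1.0 / (1.0 + np.exp(-(X @ w))))  # approaches [0, 0, 0, 1]
```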

SLIDE 24
  • 3. Neural Networks with Google’s TensorFlow

https://www.youtube.com/watch?v=oZikw5k_2FM

SLIDE 25

Setup

https://www.tensorflow.org/versions/r0.11/get_started/os_setup.html
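The linked guide targets TensorFlow r0.11, current at the time of the talk. After installing, a quick sanity check (assuming tf.__version__ is available, as it was in that era):

```python
import tensorflow as tf
print(tf.__version__)  # e.g. "0.11.0" for the release these slides target
```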

SLIDE 26

Get started
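The slide itself is a screenshot; as a stand-in, a minimal example in the graph-and-session style of the r0.11-era API (modern TensorFlow 2 runs eagerly and has no tf.Session):

```python
import tensorflow as tf

# Build a computation graph: nothing is evaluated yet.
a = tf.constant(3.0)
b = tf.constant(4.0)
c = a * b

# Execute the graph inside a session to get a concrete value.
with tf.Session() as sess:
    print(sess.run(c))  # 12.0
```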

SLIDE 27

Hyperparameter tuning (loss curve)

  • Number of hidden nodes
  • Learning rate
  • Batch size
  • Number of steps
  • Overfitting
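A sketch of how these knobs typically appear in a training script; the values below are illustrative defaults in the spirit of the Udacity notebooks, not recommendations from the slides:

```python
# Hypothetical settings mirroring the bullet list above.
num_hidden_nodes = 1024  # width of the hidden ReLU layer
learning_rate = 0.5      # gradient-descent step size; watch the loss curve
batch_size = 128         # examples per mini-batch update
num_steps = 3001         # total update steps
# Overfitting shows up when the training loss keeps falling while the
# validation loss flattens or rises.
```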
SLIDE 28
SLIDE 29

Google Udacity course example: notMNIST

SLIDE 30

Example code for notMNIST dataset (Udacity)

  • https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/udacity
    (This set of IPython notebooks is only a partial implementation, since it is meant to be an assignment to be completed. For a complete implementation, refer to the .ipynb and HTML files I uploaded to the corpling server.)