SLIDE 1

Lecture 18: Anatomy of NN

CS109A Introduction to Data Science

Pavlos Protopapas and Kevin Rader

SLIDE 2

Outline

Anatomy of a NN

Design choices:

  • Activation function
  • Loss function
  • Output units
  • Architecture
SLIDE 3

Outline

Anatomy of a NN

Design choices:

  • Activation function
  • Loss function
  • Output units
  • Architecture
SLIDE 4

Anatomy of an artificial neural network (ANN)

[Diagram: input X → neuron (node) → output Y.]
SLIDE 5

Anatomy of an artificial neural network (ANN)

[Diagram: input X → neuron (node) → output Y. Inside the neuron, an affine transformation produces h, followed by an activation $Z = g(h)$.]

We will talk later about the choice of activation function. So far we have only talked about the sigmoid as an activation function, but there are other choices.

SLIDE 6

Anatomy of an artificial neural network (ANN)

[Diagram: input layer ($X_1$, $X_2$) → hidden layer ($h_1$, $h_2$) → output layer ($\hat{Z}$), with weights $W_{11}, W_{12}, W_{21}, W_{22}$.]

$A_1 = W_1^\top X = W_{11} X_1 + W_{12} X_2 + b_1$,  $h_1 = g(A_1)$

$A_2 = W_2^\top X = W_{21} X_1 + W_{22} X_2 + b_2$,  $h_2 = g(A_2)$

$\hat{Z} = f(h_1, h_2)$ (output function),  $J = \mathcal{L}(\hat{Z}, Z)$ (loss function)

We will talk later about the choice of the output layer and the loss function. So far we have considered the sigmoid as the output and the log-Bernoulli (binary cross-entropy) loss.
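
To make this anatomy concrete, here is a minimal NumPy sketch (mine, not from the slides) of the forward pass for this two-input, two-hidden-node network with a sigmoid activation, a sigmoid output, and the log-Bernoulli (binary cross-entropy) loss; all weight values are made up for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Made-up example values: one observation with two features.
X = np.array([0.5, -1.2])          # input layer (X1, X2)
W = np.array([[0.3, -0.8],         # W[i, j]: weight from input j to hidden node i
              [1.1,  0.4]])
b = np.array([0.1, -0.2])          # biases b1, b2

# Hidden layer: affine transformation followed by the activation g (sigmoid here).
A = W @ X + b                      # A1, A2
h = sigmoid(A)                     # h1 = g(A1), h2 = g(A2)

# Output layer: another affine + sigmoid gives Z_hat, an estimated probability.
w_out, b_out = np.array([0.7, -0.5]), 0.05
Z_hat = sigmoid(w_out @ h + b_out)

# Loss: log-Bernoulli (binary cross-entropy) against the true label Z.
Z = 1.0
J = -(Z * np.log(Z_hat) + (1 - Z) * np.log(1 - Z_hat))
print(Z_hat, J)
```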
SLIDE 7

Anatomy of an artificial neural network (ANN)

[Diagram: input layer ($X_1$, $X_2$) → hidden layer 1 → hidden layer 2 → output layer ($\hat{Z}$), with weight matrices $W_1$ into hidden layer 1 and $W_2$ into hidden layer 2.]

SLIDE 8

Anatomy of an artificial neural network (ANN)

[Diagram: input layer ($X_1$, $X_2$) → hidden layer 1 → … → hidden layer $n$ → output layer ($\hat{Z}$), with weights $W_1$ into the first hidden layer and $W_n$ into the last.]

We will talk later about the choice of the number of layers.

SLIDE 9

Anatomy of an artificial neural network (ANN)

[Diagram: input layer ($X_1$, $X_2$) → hidden layer 1 with 3 nodes → … → hidden layer $n$ with 3 nodes → output layer ($\hat{Z}$).]

SLIDE 10

Anatomy of an artificial neural network (ANN)

[Diagram: input layer ($X_1$, $X_2$) → hidden layer 1 with $m$ nodes → … → hidden layer $n$ with $m$ nodes → output layer ($\hat{Z}$).]

We will talk later about the choice of the number of nodes.

SLIDE 11

Anatomy of an artificial neural network (ANN)

[Diagram: input layer with $d$ inputs → hidden layer 1 with $m$ nodes → … → hidden layer $n$ with $m$ nodes → output layer ($\hat{Z}$).]

The number of inputs $d$ is specified by the data.
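
As a rough illustration of these architecture choices (my own sketch, assuming TensorFlow/Keras is available), the input dimension d comes from the data, while the number of hidden layers and the nodes per layer are design choices; the specific values below are arbitrary.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy data: 500 observations with d = 4 features; d fixes the input size.
X = np.random.rand(500, 4)
y = (X.sum(axis=1) > 2).astype(int)

d = X.shape[1]   # number of inputs: specified by the data
m = 8            # nodes per hidden layer: a design choice
n_hidden = 3     # number of hidden layers: a design choice

model = keras.Sequential([keras.Input(shape=(d,))])
for _ in range(n_hidden):
    model.add(layers.Dense(m, activation="relu"))   # hidden layers
model.add(layers.Dense(1, activation="sigmoid"))    # output layer for a binary target

model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
# model.fit(X, y, epochs=10, batch_size=32)
```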

SLIDE 12

Why layers? Representation

Representation matters!

SLIDE 13

Learning Multiple Components

SLIDE 14

Depth = Repeated Compositions

SLIDE 15

Neural Networks

Hand-written digit recognition: MNIST data
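
For reference (not part of the slides), MNIST ships with Keras, so the data can be loaded and reshaped into input vectors in a few lines; this assumes TensorFlow/Keras is installed.

```python
from tensorflow.keras.datasets import mnist

# 60,000 training and 10,000 test images of handwritten digits (28x28 grayscale).
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape, y_train.shape)   # (60000, 28, 28) (60000,)

# Flatten each image into a 784-dimensional input vector and rescale to [0, 1].
x_train = x_train.reshape(-1, 28 * 28) / 255.0
x_test = x_test.reshape(-1, 28 * 28) / 255.0
```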

SLIDE 16

Depth = Repeated Compositions

SLIDE 17

Outline

Anatomy of a NN

Design choices:

  • Activation function
  • Loss function
  • Output units
  • Architecture
SLIDE 18

Outline

Anatomy of a NN

Design choices:

  • Activation function
  • Loss function
  • Output units
  • Architecture
SLIDE 19

Activation function

$h = g(W^\top X + c)$

The activation function should:

  • Ensure non-linearity
  • Ensure gradients remain large through the hidden units

Common choices are:

  • Sigmoid
  • ReLU, leaky ReLU, generalized ReLU, Maxout
  • Softplus
  • Tanh
  • Swish
SLIDE 20

Activation function

$h = g(W^\top X + c)$

The activation function should:

  • Ensure non-linearity
  • Ensure gradients remain large through the hidden units

Common choices are:

  • Sigmoid
  • ReLU, leaky ReLU, generalized ReLU, Maxout
  • Softplus
  • Tanh
  • Swish
SLIDE 21

Beyond Linear Models

Linear models:

  • Can be fit efficiently (via convex optimization)
  • Limited model capacity

Alternative: $f(x) = w^\top \phi(x)$, where $\phi$ is a non-linear transform.

SLIDE 22

Traditional ML

Manually engineer $\phi$:

  • Domain-specific, enormous human effort

Generic transform (sketched below):

  • Maps to a higher-dimensional space
  • Kernel methods, e.g. RBF kernels
  • Overfitting: does not generalize well to the test set
  • Cannot encode enough prior information
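
A rough sketch (mine, not from the slides) of such a fixed, generic transform: map 1-D inputs through hand-chosen RBF basis functions and fit a linear model on the transformed features. The centers and bandwidth are arbitrary illustrative choices.

```python
import numpy as np

# Toy 1-D regression data.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100)
y = np.sin(x) + 0.1 * rng.standard_normal(100)

# Fixed, manually chosen RBF feature map phi(x): one Gaussian bump per center.
centers = np.linspace(-3, 3, 10)
gamma = 2.0
Phi = np.exp(-gamma * (x[:, None] - centers[None, :]) ** 2)

# Linear model on the transformed features, f(x) = w^T phi(x), fit by least squares.
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
y_hat = Phi @ w
print("training MSE:", np.mean((y - y_hat) ** 2))
```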
SLIDE 23

Deep Learning

  • Directly learn $\phi$:

$f(x; \theta) = w^\top \phi(x; \theta)$

  • where $\theta$ are the parameters of the transform
  • $\phi$ defines the hidden layers (see the sketch below)
  • Non-convex optimization
  • Can encode prior beliefs, generalizes well
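
A minimal sketch (mine, assuming TensorFlow/Keras) of the idea that the hidden layers act as a learned transform φ: train a small network, then read off the last hidden layer's activations as the learned features. The layer sizes and the layer name "phi" are illustrative choices.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy binary-classification data.
X = np.random.rand(500, 4)
y = (X[:, 0] * X[:, 1] > 0.25).astype(int)

# The hidden layers implement a learned transform phi(x; theta);
# the final sigmoid unit is the linear model w^T phi(x; theta).
model = keras.Sequential([
    keras.Input(shape=(4,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(8, activation="relu", name="phi"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(X, y, epochs=5, verbose=0)

# Read off phi(x; theta): the activations of the last hidden layer.
phi = keras.Model(model.input, model.get_layer("phi").output)
features = phi.predict(X, verbose=0)
print(features.shape)   # (500, 8): the learned feature representation
```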
SLIDE 24

Activation function

$h = g(W^\top X + c)$

The activation function should:

  • Ensure non-linearity
  • Ensure gradients remain large through the hidden units

Common choices are:

  • Sigmoid
  • ReLU, leaky ReLU, generalized ReLU, Maxout
  • Softplus
  • Tanh
  • Swish
SLIDE 25

SLIDE 26

ReLU and Softplus

SLIDE 27

Generalized ReLU

Generalization: for $\beta_i > 0$,  $h(x_i, \beta_i) = \max(0, x_i) + \beta_i \min(0, x_i)$

SLIDE 28

Maxout

Max of $k$ linear functions; directly learn the activation function.

$h(x) = \max_{i \in \{1,\dots,k\}} (\beta_i x + \gamma_i)$

SLIDE 29

Swish: A Self-Gated Activation Function

$h(x) = x \cdot \sigma(x)$

Currently, the most successful and widely used activation function is the ReLU. Swish tends to work better than ReLU on deeper models across a number of challenging datasets.
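
For reference, a small NumPy sketch (mine, not from the slides) of the activation functions listed earlier; the leaky/generalized-ReLU slope and the Maxout coefficients are illustrative values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def generalized_relu(x, beta=0.1):
    # beta = 0 gives ReLU; a small fixed beta gives leaky ReLU;
    # a learned per-unit beta gives the generalized (parametric) ReLU.
    return np.maximum(0.0, x) + beta * np.minimum(0.0, x)

def softplus(x):
    return np.log1p(np.exp(x))

def swish(x):
    return x * sigmoid(x)

def maxout(x, betas=(0.5, 1.0, 2.0), gammas=(0.0, 0.1, -0.1)):
    # Max over k linear functions of the input (illustrative coefficients).
    pieces = np.stack([b * x + g for b, g in zip(betas, gammas)])
    return pieces.max(axis=0)

x = np.linspace(-3, 3, 7)
for f in (sigmoid, relu, generalized_relu, softplus, np.tanh, swish, maxout):
    print(f.__name__, np.round(f(x), 3))
```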

SLIDE 30

Outline

Anatomy of a NN

Design choices:

  • Activation function
  • Loss function
  • Output units
  • Architecture
SLIDE 31

Loss Function

Cross-entropy between training data and model distribution (i.e. negative log-likelihood)

$J(W) = -\mathbb{E}_{x, y \sim \hat{p}_{\text{data}}} \log p_{\text{model}}(y \mid x)$

Taking the negative log-likelihood means we do not need to design a separate loss function for each model. The gradient of the cost function must be large enough to guide learning.

SLIDE 32

Loss Function

Example: sigmoid output + squared loss (flat loss surfaces):

$L_{\text{MSE}} = (z - \hat{z})^2 = (z - \sigma(h))^2$

SLIDE 33

Cost Function

Example: sigmoid output + cross-entropy loss:

$L_{\text{CE}}(z, \hat{z}) = -\left\{ z \log \hat{z} + (1 - z)\log(1 - \hat{z}) \right\}$
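
A short NumPy sketch (mine, not from the slides) of why cross-entropy pairs better with a sigmoid output than squared loss: when the prediction is confidently wrong, the squared-loss gradient with respect to the pre-activation h nearly vanishes (the "flat surfaces" above), while the cross-entropy gradient stays large.

```python
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

z = 1.0                                        # true label
h = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])      # pre-activations: very wrong ... very right
z_hat = sigmoid(h)

# Gradients of each loss with respect to the pre-activation h.
grad_mse = 2 * (z_hat - z) * z_hat * (1 - z_hat)   # d/dh (z - sigma(h))^2: vanishes when very wrong
grad_ce = z_hat - z                                # d/dh cross-entropy: stays large when very wrong

for hi, gm, gc in zip(h, grad_mse, grad_ce):
    print(f"h={hi:+.1f}  dMSE/dh={gm:+.4f}  dCE/dh={gc:+.4f}")
```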

SLIDE 34

Design Choices

  • Activation function
  • Loss function
  • Output units
  • Architecture
  • Optimizer

SLIDE 35

Output Units

Output Type | Output Distribution | Output Layer | Cost Function
Binary      |                     |              |

SLIDE 36

Link function

$X \implies \phi(X) = W^\top X \implies \hat{Z} = P(y = 0) = \dfrac{1}{1 + e^{\phi(X)}}$

[Diagram: input X → $\phi(X)$ → output unit $\sigma(\phi)$ → $\hat{Z} = P(y = 0)$.]

SLIDE 37

Output Units

Output Type | Output Distribution | Output Layer | Cost Function
Binary      | Bernoulli           | Sigmoid      | Binary Cross-Entropy

SLIDE 38

Output Units

Output Type | Output Distribution | Output Layer | Cost Function
Binary      | Bernoulli           | Sigmoid      | Binary Cross-Entropy
Discrete    |                     |              |

SLIDE 39

Link function: multi-class problem

[Diagram: input X → $\phi(X)$ → output unit (SoftMax) → $\hat{Z}$.]

$\hat{Z}_k = \dfrac{e^{\phi_k(X)}}{\sum_{j=1}^{K} e^{\phi_j(X)}}$
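
A minimal NumPy sketch (mine, not from the slides) of the softmax output unit; subtracting the maximum score first is a standard numerical-stability trick and does not change the probabilities.

```python
import numpy as np

def softmax(phi):
    # Subtract the max for numerical stability; the probabilities are unchanged.
    e = np.exp(phi - np.max(phi))
    return e / e.sum()

phi = np.array([2.0, 1.0, -1.0])   # illustrative scores phi_k(X) for K = 3 classes
z_hat = softmax(phi)
print(z_hat, z_hat.sum())          # probabilities over the K classes, summing to 1
```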

SLIDE 40

Output Units

Output Type | Output Distribution | Output Layer | Cost Function
Binary      | Bernoulli           | Sigmoid      | Binary Cross-Entropy
Discrete    | Multinoulli         | Softmax      | Cross-Entropy

SLIDE 41

Output Units

Output Type | Output Distribution | Output Layer | Cost Function
Binary      | Bernoulli           | Sigmoid      | Binary Cross-Entropy
Discrete    | Multinoulli         | Softmax      | Cross-Entropy
Continuous  | Gaussian            | Linear       | MSE
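
A minimal Keras sketch (mine, not from the slides; assumes TensorFlow/Keras) of how the table's rows translate into an output layer paired with a matching loss; the hidden-layer width and the helper name make_model are illustrative.

```python
from tensorflow import keras
from tensorflow.keras import layers

def make_model(d, output_type, n_classes=None):
    """Pair the output layer and loss according to the table above."""
    model = keras.Sequential([keras.Input(shape=(d,)),
                              layers.Dense(16, activation="relu")])
    if output_type == "binary":          # Bernoulli output
        model.add(layers.Dense(1, activation="sigmoid"))
        loss = "binary_crossentropy"
    elif output_type == "discrete":      # Multinoulli output over n_classes
        model.add(layers.Dense(n_classes, activation="softmax"))
        loss = "categorical_crossentropy"  # expects one-hot labels
    else:                                # continuous (Gaussian) output
        model.add(layers.Dense(1, activation="linear"))
        loss = "mse"
    model.compile(optimizer="adam", loss=loss)
    return model

clf = make_model(d=10, output_type="binary")
multi = make_model(d=10, output_type="discrete", n_classes=5)
reg = make_model(d=10, output_type="continuous")
```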

SLIDE 42

Output Units

Output Type | Output Distribution | Output Layer | Cost Function
Binary      | Bernoulli           | Sigmoid      | Binary Cross-Entropy
Discrete    | Multinoulli         | Softmax      | Cross-Entropy
Continuous  | Gaussian            | Linear       | MSE
Continuous  | Arbitrary           | –            | GANs
SLIDE 43

Design Choices

  • Activation function
  • Loss function
  • Output units
  • Architecture
  • Optimizer

SLIDE 44

NN in action

SLIDE 45

NN in action

SLIDE 46

NN in action

SLIDE 47

NN in action

…

SLIDE 48

NN in action

SLIDE 49

NN in action

SLIDE 50

NN in action

SLIDE 51

Universal Approximation Theorem

Think of a neural network as function approximation: the data follow $Z = f(x) + \epsilon$, and the network provides an estimate $\hat{f}(x)$, so $Z = \hat{f}(x) + \epsilon$.

One hidden layer is enough to represent an approximation of any function to an arbitrary degree of accuracy. So why go deeper?

  • A shallow net may need (exponentially) more width (see the parameter-count sketch below)
  • A shallow net may overfit more

[Diagram: trading off network width against depth.]
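
To make the width-versus-depth trade-off concrete, a small sketch (mine, not from the slides; assumes TensorFlow/Keras) comparing parameter counts for a wide, shallow network and a deeper, narrower one on the same input size; the widths are arbitrary.

```python
from tensorflow import keras
from tensorflow.keras import layers

def mlp(widths, d=10):
    """Fully connected net with one hidden layer per entry in `widths`."""
    model = keras.Sequential([keras.Input(shape=(d,))])
    for w in widths:
        model.add(layers.Dense(w, activation="relu"))
    model.add(layers.Dense(1, activation="sigmoid"))
    return model

shallow_wide = mlp([512])          # one very wide hidden layer
deep_narrow = mlp([32, 32, 32])    # three narrow hidden layers

print("shallow/wide parameters:", shallow_wide.count_params())
print("deep/narrow parameters:", deep_narrow.count_params())
```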

SLIDE 52

Better Generalization with Depth

(Goodfellow 2017)

SLIDE 53

Large, Shallow Nets Overfit More

(Goodfellow 2017)