Lecture 19: Anatomy of NN
CS109A Introduction to Data Science
SLIDE 1

CS109A Introduction to Data Science

Pavlos Protopapas, Kevin Rader and Chris Tanner

Lecture 19: Anatomy of NN

SLIDE 2

Outline

Anatomy of a NN

Design choices:

  • Activation function
  • Loss function
  • Output units
  • Architecture
SLIDE 4

Anatomy of artificial neural network (ANN)

(Diagram: input X, weight W, a single neuron node, and output Y.)

SLIDE 5

Anatomy of artificial neural network (ANN)

(Diagram: input X feeds a neuron node through weight W; the node applies an affine transformation followed by an activation, $Z = g(h)$, to produce the output Y.)

We will talk later about the choice of activation function. So far we have only talked about the sigmoid as an activation function, but there are other choices.

SLIDE 6

Anatomy of artificial neural network (ANN)

(Diagram: inputs $X_1, X_2$ feed a hidden layer of two nodes with weights $W_1, W_2$; the output node produces $\hat{Y}$. Input layer, hidden layer, output layer.)

$a_1 = W_1^T X = W_{11} X_1 + W_{12} X_2 + W_{10}$,  $h_1 = g(a_1)$
$a_2 = W_2^T X = W_{21} X_1 + W_{22} X_2 + W_{20}$,  $h_2 = g(a_2)$
$\hat{Y} = f(h_1, h_2)$  (output function)
$\mathcal{L}(\hat{Y}, Y)$  (loss function)

We will talk later about the choice of the output layer and the loss function. So far we have considered the sigmoid as the output and the log-Bernoulli (binary cross-entropy) loss.
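To make the forward pass concrete, here is a minimal NumPy sketch of this 2-input, 2-hidden-node network. All weight, bias, and input values are made up for illustration.

```python
# Forward pass of the Slide 6 network: two inputs, two hidden nodes with
# sigmoid activations, and a sigmoid output node. Values are illustrative.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

X = np.array([0.5, -1.2])           # inputs X1, X2
W = np.array([[0.1, 0.4],           # W11, W12
              [-0.3, 0.2]])         # W21, W22
b = np.array([0.05, -0.05])         # biases W10, W20

a = W @ X + b                       # affine transformation: a_i = W_i^T X + b_i
h = sigmoid(a)                      # activations h_i = g(a_i)

w_out, b_out = np.array([0.7, -0.6]), 0.1
y_hat = sigmoid(w_out @ h + b_out)  # output function
print(y_hat)
```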
SLIDE 7

Anatomy of artificial neural network (ANN)

(Diagram: inputs $X_1, X_2$ with weights $W_{11}, W_{12}, W_{21}, W_{22}$; input layer, hidden layer 1, hidden layer 2, output layer producing $\hat{Y}$.)

SLIDE 8

Anatomy of artificial neural network (ANN)

(Diagram: input layer, hidden layers 1 through n, output layer.)

We will talk later about the choice of the number of layers.

SLIDE 9

Anatomy of artificial neural network (ANN)

(Diagram: input layer; hidden layers 1 through n, each with 3 nodes; output layer.)

SLIDE 10

Anatomy of artificial neural network (ANN)

(Diagram: input layer; hidden layers 1 through n, each with m nodes; output layer.)

We will talk later about the choice of the number of nodes.

SLIDE 11

Anatomy of artificial neural network (ANN)

(Diagram: input layer with d inputs; hidden layers 1 through n, each with m nodes; output layer.)

The number of inputs, d, is specified by the data.

SLIDE 12

Anatomy of artificial neural network (ANN)

(Diagram: a fully-connected network with an input layer, hidden layers 1 and 2, and an output layer.)

SLIDE 14

Why layers? Representation

Representation matters!

SLIDE 15

Learning Multiple Components

SLIDE 16

Depth = Repeated Compositions

SLIDE 17

Neural Networks

Hand-written digit recognition: MNIST data

SLIDE 18

Depth = Repeated Compositions

SLIDE 19

Beyond Linear Models

Linear models:

  • Can be fit efficiently (via convex optimization)
  • Limited model capacity

Alternative: $f(x) = w^T \phi(x)$, where $\phi$ is a non-linear transform.
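As a concrete sketch of $f(x) = w^T \phi(x)$ with a *fixed* non-linear transform: below, $\phi$ maps $x$ to Gaussian RBF features. The centers, bandwidth, and toy data are illustrative assumptions; note that fitting $w$ is still an efficient convex (least-squares) problem.

```python
# f(x) = w^T phi(x) with a fixed, hand-chosen non-linear transform.
import numpy as np

centers = np.linspace(-3, 3, 7)     # RBF centers (assumed)
gamma = 1.0                         # bandwidth (assumed)

def phi(x):
    # One Gaussian RBF feature per center.
    return np.exp(-gamma * (x[:, None] - centers[None, :]) ** 2)

x = np.linspace(-3, 3, 50)
y = np.sin(x)                       # toy targets
w, *_ = np.linalg.lstsq(phi(x), y, rcond=None)  # convex fit of w
f = phi(x) @ w                      # f(x) = w^T phi(x)
```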

SLIDE 20

Traditional ML

Manually engineer $\phi$:

  • Domain specific, enormous human effort

Generic transform:

  • Maps to a higher-dimensional space
  • Kernel methods: e.g. RBF kernels
  • Overfitting: does not generalize well to the test set
  • Cannot encode enough prior information
SLIDE 21

Deep Learning

  • Directly learn $\phi$:

$f(x; \theta) = w^T \phi(x; \theta)$

  • $\phi(x; \theta)$ is an automatically learned representation of x
  • For deep networks, $\phi$ is the function learned by the hidden layers of the network
  • $\theta$ are the learned weights
  • Non-convex optimization
  • Can encode prior beliefs, generalizes well

(A Keras sketch of this idea follows below.)
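A minimal Keras sketch of "directly learn $\phi$": the hidden layers play the role of $\phi(x; \theta)$ and the final layer plays $w^T \phi$. The layer sizes and the two-feature input are illustrative assumptions, not from the slides.

```python
# Hidden layers learn phi(x; theta); the last layer is w^T phi.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(2,)),  # phi, layer 1
    tf.keras.layers.Dense(32, activation="relu"),                    # phi, layer 2
    tf.keras.layers.Dense(1, activation="sigmoid"),                  # w^T phi -> output
])
model.compile(optimizer="adam", loss="binary_crossentropy")  # non-convex optimization
```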
SLIDE 22

Outline

Anatomy of a NN

Design choices:

  • Activation function
  • Loss function
  • Output units
  • Architecture
SLIDE 24

Activation function

β„Ž = 𝑔(𝑋/π‘Œ + 𝑐) The activation function should:

  • Provide non-linearity
  • Ensure gradients remain large through hidden unit

Common choices are

  • Sigmoid
  • Relu, leaky ReLU, Generalized ReLU, MaxOut
  • softplus
  • tanh
  • swish
SLIDE 27

Sigmoid (aka Logistic)

Derivative is zero for much of the domain. This leads to β€œvanishing gradients” in backpropagation.

$z = \dfrac{1}{1 + e^{-x}}$

SLIDE 28

Hyperbolic Tangent (Tanh)

Same problem of β€œvanishing gradients” as sigmoid.

$z = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$

SLIDE 29

Rectified Linear Unit (ReLU)

Two major advantages:

  1. No vanishing gradient when x > 0
  2. Provides sparsity (regularization), since z = 0 when x < 0

$z = \max(0, x)$

SLIDE 30

Leaky ReLU

  • Tries to fix the "dying ReLU" problem: the derivative is non-zero everywhere.
  • Some people report success with this form of activation function, but the results are not always consistent.

$z = \max(0, x) + \beta \min(0, x)$, where $\beta$ takes a small value

SLIDE 31

Generalized ReLU

Generalization: for $\beta_i > 0$:  $h(x_i, \beta_i) = \max(0, x_i) + \beta_i \min(0, x_i)$

SLIDE 32

softplus

The logistic sigmoid function is a smooth approximation of the derivative of the rectifier

$z = \log(1 + e^{x})$

SLIDE 33

Maxout

Max of k linear functions: directly learn the activation function.

$h(x) = \max_{i \in \{1, \dots, k\}} \left( \beta_i x + \gamma_i \right)$

SLIDE 34

Swish: A Self-Gated Activation Function

$h(x) = x \, \sigma(x)$

Currently, the most successful and widely-used activation function is the ReLU. Swish tends to work better than ReLU on deeper models across a number of challenging datasets.
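A minimal NumPy sketch of the activations from Slides 27-34, written directly from the formulas above; the leaky-ReLU slope $\beta = 0.01$ is an assumed value.

```python
# NumPy versions of the common activation functions.
import numpy as np

def sigmoid(x):  return 1.0 / (1.0 + np.exp(-x))      # 1 / (1 + e^-x)
def tanh(x):     return np.tanh(x)                    # (e^x - e^-x) / (e^x + e^-x)
def relu(x):     return np.maximum(0.0, x)            # max(0, x)
def leaky_relu(x, beta=0.01):                         # beta is an assumed small slope
    return np.maximum(0.0, x) + beta * np.minimum(0.0, x)
def softplus(x): return np.log1p(np.exp(x))           # log(1 + e^x)
def swish(x):    return x * sigmoid(x)                # x * sigma(x)
```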

SLIDE 35

Outline

Anatomy of a NN

Design choices:

  • Activation function
  • Loss function
  • Output units
  • Architecture
SLIDE 36

Loss Function

Likelihood for a given point: $p(y_i \mid x_i; W)$

Assuming independence, the likelihood for all measurements is:

$L(W; X, Y) = p(Y \mid X; W) = \prod_i p(y_i \mid x_i; W)$

Maximize the likelihood, or equivalently maximize the log-likelihood:

$\log L(W; X, Y) = \sum_i \log p(y_i \mid x_i; W)$

Turn this into a loss function:

$\mathcal{L}(W; X, Y) = -\log L(W; X, Y)$

SLIDE 37

Loss Function

We do not need to design separate loss functions if we follow this simple procedure. Examples:

  • If the distribution is Normal, the likelihood is

$p(y_i \mid x_i; W) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(y_i - \hat{y}_i)^2}{2\sigma^2}}$

$\mathcal{L}(W; X, Y) = \sum_i (y_i - \hat{y}_i)^2$  (up to constants), i.e. MSE.

  • If the distribution is Bernoulli, the likelihood is

$p(y_i \mid x_i; W) = p_i^{y_i} (1 - p_i)^{1 - y_i}$

$\mathcal{L}(W; X, Y) = -\sum_i \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]$, i.e. Cross-Entropy.
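This recipe is easy to check numerically. A small NumPy sketch with made-up targets and predictions: the Bernoulli negative log-likelihood is binary cross-entropy, and the Gaussian negative log-likelihood reduces to squared error up to constants.

```python
# Negative log-likelihoods turned into losses; all values are illustrative.
import numpy as np

y     = np.array([1.0, 0.0, 1.0])    # observed binary targets
p_hat = np.array([0.9, 0.2, 0.7])    # predicted Bernoulli probabilities
bern_nll = -np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))  # = BCE

z     = np.array([2.0, -1.0, 0.5])   # observed continuous targets
z_hat = np.array([1.8, -0.7, 0.9])   # predictions
gauss_nll = np.sum((z - z_hat) ** 2) # Gaussian NLL up to constants = squared error
print(bern_nll, gauss_nll)
```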

SLIDE 38

Design Choices

  • Activation function
  • Loss function
  • Output units
  • Architecture
  • Optimizer


SLIDE 42

Output Units

Output Type | Output Distribution | Output layer | Loss Function
Binary      | Bernoulli           | ?            | Binary Cross Entropy

SLIDE 43

Output unit for binary classification

$X \Longrightarrow \phi(X) \Longrightarrow P(y = 1) = \dfrac{1}{1 + e^{-w^T \phi(X)}}$

OUTPUT UNIT: the sigmoid $\sigma(w^T \phi(X))$ gives $\hat{y} = P(y = 1)$.
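A minimal NumPy sketch of this output unit; `phi_x` and `w` are illustrative placeholders for the features produced by the rest of the network and the output-unit weights.

```python
# Binary output unit: a linear score squashed by a sigmoid gives P(y = 1).
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

phi_x = np.array([0.3, -1.1, 2.0])   # phi(X): output of the hidden layers (assumed)
w     = np.array([0.5, 0.4, -0.2])   # output-unit weights (assumed)
p_y1  = sigmoid(w @ phi_x)           # P(y = 1) = 1 / (1 + e^{-w^T phi(X)})
print(p_y1)
```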

SLIDE 44

Output Units

Output Type | Output Distribution | Output layer | Cost Function
Binary      | Bernoulli           | Sigmoid      | Binary Cross Entropy


SLIDE 48

Output Units

Output Type | Output Distribution | Output layer | Cost Function
Binary      | Bernoulli           | Sigmoid      | Binary Cross Entropy
Discrete    | Multinoulli         | ?            | Cross Entropy

SLIDE 49

Output unit for multi-class classification

$X \Longrightarrow \hat{y} = [P_1, P_2, P_3]$

OUTPUT UNIT

SLIDE 50

SoftMax

$\hat{y}_k = \dfrac{e^{w_k^T \phi(X)}}{\sum_{k'} e^{w_{k'}^T \phi(X)}}$

(Diagram: the rest of the network produces class scores $\phi_k(X)$ for A, B, C; the SoftMax OUTPUT UNIT turns the A, B, C scores into the probabilities of A, B, and C.)
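A minimal NumPy sketch of SoftMax. Subtracting the maximum score before exponentiating is a standard numerical-stability trick that is not on the slide; the score values are made up.

```python
# SoftMax: class scores become probabilities that sum to 1.
import numpy as np

def softmax(scores):
    shifted = scores - np.max(scores)   # avoids overflow in exp (stability trick)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

scores = np.array([2.0, 1.0, 0.1])      # A, B, C scores (illustrative)
probs = softmax(scores)                 # probabilities of A, B, C
print(probs, probs.sum())
```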


SLIDE 53

Output Units

Output Type | Output Distribution | Output layer | Cost Function
Binary      | Bernoulli           | Sigmoid      | Binary Cross Entropy
Discrete    | Multinoulli         | Softmax      | Cross Entropy


SLIDE 57

Output Units

Output Type | Output Distribution | Output layer | Cost Function
Binary      | Bernoulli           | Sigmoid      | Binary Cross Entropy
Discrete    | Multinoulli         | Softmax      | Cross Entropy
Continuous  | Gaussian            | ?            | MSE

SLIDE 58

Output unit for regression

$X \Longrightarrow \phi(X) \Longrightarrow \hat{y} = W^T \phi(X)$

OUTPUT UNIT: a linear layer $W^T \phi(X)$ produces $\hat{y}$.

SLIDE 59

Output Units

Output Type | Output Distribution | Output layer | Cost Function
Binary      | Bernoulli           | Sigmoid      | Binary Cross Entropy
Discrete    | Multinoulli         | Softmax      | Cross Entropy
Continuous  | Gaussian            | Linear       | MSE
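A sketch of how the table's rows map onto Keras output layers and losses; the hidden size and the 10-feature input are assumptions for illustration.

```python
# One output layer + loss pairing per output type from the table.
import tensorflow as tf

def body():
    # Fresh hidden layers each call (shared architecture, assumed sizes).
    return [tf.keras.layers.Dense(16, activation="relu", input_shape=(10,))]

binary = tf.keras.Sequential(body() + [tf.keras.layers.Dense(1, activation="sigmoid")])
binary.compile(loss="binary_crossentropy", optimizer="adam")

discrete = tf.keras.Sequential(body() + [tf.keras.layers.Dense(3, activation="softmax")])
discrete.compile(loss="categorical_crossentropy", optimizer="adam")

continuous = tf.keras.Sequential(body() + [tf.keras.layers.Dense(1, activation="linear")])
continuous.compile(loss="mse", optimizer="adam")
```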

SLIDE 61

Output Units

Output Type | Output Distribution | Output layer | Cost Function
Binary      | Bernoulli           | Sigmoid      | Binary Cross Entropy
Discrete    | Multinoulli         | Softmax      | Cross Entropy
Continuous  | Gaussian            | Linear       | MSE
Continuous  | Arbitrary           | GANs: see Lectures 18-19 in CS109B

SLIDE 62

Loss Function

Example: sigmoid output + squared loss gives flat surfaces (regions of near-zero gradient):

$L_{MSE} = (y - \hat{y})^2 = \left(y - \sigma(x)\right)^2$
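A quick numerical illustration (made-up values) of why these surfaces are flat: when the sigmoid saturates, the squared-loss gradient carries a factor $\hat{y}(1-\hat{y}) \approx 0$, while the cross-entropy gradient (next slide) stays of order 1.

```python
# Gradient of squared loss vs. cross-entropy at a saturated sigmoid unit.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

y, x = 1.0, -5.0                 # true label 1, but a badly saturated score
y_hat = sigmoid(x)

# d/dx (y - sigmoid(x))^2 = -2 (y - y_hat) * y_hat * (1 - y_hat) -> nearly 0
grad_mse = -2 * (y - y_hat) * y_hat * (1 - y_hat)

# d/dx of cross-entropy with a sigmoid output = y_hat - y -> about -1 here
grad_ce = y_hat - y
print(grad_mse, grad_ce)
```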

SLIDE 63

Cost Function

Example: sigmoid output + cross-entropy loss:

$L_{CE}(y, \hat{y}) = -\left[ y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \right]$

SLIDE 64

Design Choices

  • Activation function
  • Loss function
  • Output units
  • Architecture
  • Optimizer

SLIDE 65

NN in action

(Slides 66-71 step through this "NN in action" figure sequence; images not preserved.)

SLIDE 72

Universal Approximation Theorem

Think of a neural network as function approximation: $Y = f(x) + \epsilon$, and the network learns an approximation, NN $\Longrightarrow \hat{f}(x)$.

One hidden layer is enough to represent an approximation of any function to an arbitrary degree of accuracy. So why go deeper?

  • A shallow net may need (exponentially) more width
  • A shallow net may overfit more

(Diagram: a wide, shallow network vs. a narrow, deep network, labeled by width and depth. A parameter-count sketch follows below.)
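A small Keras sketch of the width-vs-depth trade-off contrasted in the diagram; all layer sizes are arbitrary choices, and the point is only to compare parameter counts of a wide-shallow and a narrow-deep model.

```python
# Parameter counts: one very wide hidden layer vs. several narrow ones.
import tensorflow as tf

wide = tf.keras.Sequential([
    tf.keras.layers.Dense(1024, activation="relu", input_shape=(2,)),
    tf.keras.layers.Dense(1),
])
deep = tf.keras.Sequential(
    [tf.keras.layers.Dense(32, activation="relu", input_shape=(2,))] +
    [tf.keras.layers.Dense(32, activation="relu") for _ in range(3)] +
    [tf.keras.layers.Dense(1)]
)
print(wide.count_params(), deep.count_params())  # wide needs far more parameters
```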

SLIDE 73

Better Generalization with Depth

(Goodfellow 2017)

SLIDE 74

Shallow Nets Overfit More

(Goodfellow 2017)

The 3-layer nets perform worse on the test set, even with a similar number of total parameters. The 11-layer net generalizes better on the test set when controlling for the number of parameters. Depth helps, and it is not just because of more parameters.

Don't worry about the word "convolutional"; it's just a special type of neural network, often used for images.


SLIDE 76

Lab time with Pavlos

1. Install Keras or TensorFlow 2.
2. Build the same thing we did for the exercise from Lecture 18, but now with Keras.
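As a starting point, here is a minimal Keras sketch only; the exact Lecture 18 exercise is not reproduced in this transcript, so the toy 1-D regression data below is an assumption.

```python
# Minimal Keras model: a one-hidden-layer regression network on toy data.
import numpy as np
import tensorflow as tf

x = np.linspace(-1, 1, 200).reshape(-1, 1)
y = np.sin(3 * x)                            # assumed toy target

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(1,)),  # hidden layer
    tf.keras.layers.Dense(1, activation="linear"),                   # linear output, MSE
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=50, verbose=0)
```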