CS480/680 Lecture 9 (June 5, 2019): Perceptrons, Neural Networks


SLIDE 1

CS480/680 Lecture 9: June 5, 2019

Perceptrons, Neural Networks
Readings: [D] Chapt. 4, [HTF] Chapt. 11, [B] Sec. 4.1.7, 5.1, [M] Sec. 8.5.4, [RN] Sec. 18.7

CS480/680 Spring 2019, Pascal Poupart, University of Waterloo

SLIDE 2

Outline

  • Neural networks

    – Perceptron
    – Supervised learning algorithms for neural networks

SLIDE 3

Brain

  • Seat of human intelligence
  • Where memory/knowledge resides
  • Responsible for thoughts and decisions
  • Can learn
  • Consists of nerve cells called neurons

SLIDE 4

Neuron

SLIDE 5

Comparison

  • Brain
    – Network of neurons
    – Nerve signals propagate in a neural network
    – Parallel computation
    – Robust (neurons die every day without any impact)

  • Computer
    – Bunch of gates
    – Electrical signals directed by gates
    – Sequential and parallel computation
    – Fragile (if a gate stops working, the computer crashes)

SLIDE 6

Artificial Neural Networks

  • Idea: mimic the brain to do computation
  • Artificial neural network:

    – Nodes (a.k.a. units) correspond to neurons
    – Links correspond to synapses

  • Computation:

    – Numerical signals transmitted between nodes correspond to chemical signals between neurons
    – A node modifying its numerical signal corresponds to a neuron’s firing rate

SLIDE 7

ANN Unit

  • For each unit $j$:
  • Weights: $\bar{w}$
    – $w_{ji}$ is the strength of the link from unit $i$ to unit $j$
    – Input signals $x_i$ are weighted by $w_{ji}$ and linearly combined: $a_j = \sum_i w_{ji} x_i + w_0 = \bar{w}^T \bar{x}$
  • Activation function: $h$
    – Numerical signal produced: $z_j = h(a_j)$
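A minimal sketch of this computation in Python (the names unit_output and h are assumptions for illustration, not from the slides):

```python
import numpy as np

def unit_output(w, w0, x, h):
    """Output of one unit: pre-activation a = w^T x + w_0, then z = h(a)."""
    a = np.dot(w, x) + w0     # a_j = sum_i w_ji * x_i + w_0
    return h(a)               # z_j = h(a_j)

# Example with a sigmoid activation:
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
z = unit_output(np.array([0.5, -0.3]), 0.1, np.array([1.0, 2.0]), sigmoid)
```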

SLIDE 8

ANN Unit

  • Picture

SLIDE 9

Activation Function

  • Should be nonlinear

– Otherwise network is just a linear function

  • Often chosen to mimic firing in neurons

    – Unit should be “active” (output near 1) when fed with the “right” inputs
    – Unit should be “inactive” (output near 0) when fed with the “wrong” inputs

SLIDE 10

Common Activation Functions

  • Threshold
  • Sigmoid
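As a quick sketch, these two activations can be written as follows (assumed numpy-based implementations):

```python
import numpy as np

def threshold(a):
    """Hard threshold: outputs 1 when the pre-activation a is positive, else 0."""
    return np.where(a > 0, 1.0, 0.0)

def sigmoid(a):
    """Smooth, differentiable approximation of the threshold: 1 / (1 + e^-a)."""
    return 1.0 / (1.0 + np.exp(-a))
```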

SLIDE 11

Logic Gates

  • McCulloch and Pitts (1943)
    – Designed ANNs to represent Boolean functions
  • What should the weights of the following units be to code AND, OR, NOT? (one possible assignment is sketched below)
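One possible weight assignment with threshold units (these particular values are an assumption, not the unique solution; any weights with the same sign pattern and separating thresholds work):

```python
import numpy as np

def threshold_unit(w, w0, x):
    """Threshold perceptron: outputs 1 iff w^T x + w0 > 0."""
    return 1.0 if np.dot(w, x) + w0 > 0 else 0.0

# Assumed weights coding the three Boolean functions:
AND = lambda x: threshold_unit(np.array([1.0, 1.0]), -1.5, x)  # fires only when both inputs are 1
OR  = lambda x: threshold_unit(np.array([1.0, 1.0]), -0.5, x)  # fires when at least one input is 1
NOT = lambda x: threshold_unit(np.array([-1.0]), 0.5, x)       # inverts its single input

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((a, b), AND(np.array([a, b])), OR(np.array([a, b])))
```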

SLIDE 12

Network Structures

  • Feed-forward network
    – Directed acyclic graph
    – No internal state
    – Simply computes outputs from inputs

  • Recurrent network
    – Directed cyclic graph
    – Dynamical system with internal states
    – Can memorize information

SLIDE 13

Feed-forward network

  • Simple network with two inputs, one hidden layer of two units, and one output unit

SLIDE 14

Perceptron

  • Single-layer feed-forward network

SLIDE 15

Supervised Learning

  • Given a list of $(\bar{x}, y)$ pairs
  • Train a feed-forward ANN
    – To compute the proper outputs $y$ when fed with inputs $\bar{x}$
    – Consists of adjusting the weights $w_{ji}$
  • Simple learning algorithm for threshold perceptrons

SLIDE 16

Threshold Perceptron Learning

  • Learning is done separately for each unit $j$
    – Since units do not share weights
  • Perceptron learning for unit $j$ (a runnable sketch follows this list):
    – For each $(\bar{x}, y)$ pair do:
      • Case 1: correct output produced
        $\forall i \;\; w_{ji} \leftarrow w_{ji}$
      • Case 2: output produced is 0 instead of 1
        $\forall i \;\; w_{ji} \leftarrow w_{ji} + x_i$
      • Case 3: output produced is 1 instead of 0
        $\forall i \;\; w_{ji} \leftarrow w_{ji} - x_i$
    – Until correct output for all training instances
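A minimal sketch of this procedure for a single unit (assuming each input vector carries a leading 1 for the bias weight $w_0$; names are illustrative):

```python
import numpy as np

def threshold_perceptron_learning(X, y, max_epochs=100):
    """Threshold perceptron learning for a single unit.

    X: (n, d) inputs, each prefixed with a 1 for the bias weight w_0.
    y: (n,) target outputs in {0, 1}.
    """
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        all_correct = True
        for x_n, y_n in zip(X, y):
            out = 1.0 if np.dot(w, x_n) > 0 else 0.0
            if out == y_n:
                continue        # Case 1: correct output, weights unchanged
            elif y_n == 1:
                w = w + x_n     # Case 2: produced 0 instead of 1
            else:
                w = w - x_n     # Case 3: produced 1 instead of 0
            all_correct = False
        if all_correct:         # correct output for all training instances
            return w
    return w                    # may not terminate correctly if data is not linearly separable
```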

SLIDE 17

Threshold Perceptron Learning

  • Dot products: $\bar{x}^T \bar{x} \ge 0$ and $-\bar{x}^T \bar{x} \le 0$
  • Perceptron computes
    – 1 when $\bar{w}^T \bar{x} = \sum_i w_i x_i + w_0 > 0$
    – 0 when $\bar{w}^T \bar{x} = \sum_i w_i x_i + w_0 < 0$
  • If the output should be 1 instead of 0, then
    $\bar{w} \leftarrow \bar{w} + \bar{x}$ since $(\bar{w} + \bar{x})^T \bar{x} \ge \bar{w}^T \bar{x}$
  • If the output should be 0 instead of 1, then
    $\bar{w} \leftarrow \bar{w} - \bar{x}$ since $(\bar{w} - \bar{x})^T \bar{x} \le \bar{w}^T \bar{x}$
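As a worked check with assumed numbers: take $\bar{w} = (0.2, -1)$ and $\bar{x} = (1, 1)$ with target output 1. Then $\bar{w}^T \bar{x} = -0.8 < 0$, so the perceptron outputs 0. After the update, $(\bar{w} + \bar{x})^T \bar{x} = \bar{w}^T \bar{x} + \bar{x}^T \bar{x} = -0.8 + 2 = 1.2$, i.e., the dot product increases by exactly $\bar{x}^T \bar{x} \ge 0$ and this example is now classified correctly.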

SLIDE 18

Alternative Approach

  • Let $y_n \in \{-1, +1\} \;\; \forall n$
  • Let $M = \{(\bar{x}_n, y_n)\}$ be the set of misclassified examples
    – i.e., $y_n \bar{w}^T \bar{x}_n < 0$
  • Find $\bar{w}$ that minimizes the misclassification error
    $E(\bar{w}) = -\sum_{(\bar{x}_n, y_n) \in M} y_n \bar{w}^T \bar{x}_n$
  • Algorithm: gradient descent
    $\bar{w} \leftarrow \bar{w} - \eta \nabla E$
    where the learning rate $\eta$ is the step length (see the sketch below)
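A small numpy sketch of this criterion and its gradient (the function name is an assumption for illustration):

```python
import numpy as np

def perceptron_criterion(w, X, y):
    """E(w) = -sum of y_n * w^T x_n over misclassified examples, with y_n in {-1, +1}."""
    margins = y * (X @ w)           # y_n * w^T x_n for every example
    M = margins < 0                 # the misclassified set
    E = -np.sum(margins[M])
    grad = -(X[M].T @ y[M])         # dE/dw = -sum over M of y_n * x_n
    return E, grad
```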

SLIDE 19

Sequential Gradient Descent

  • Gradient: $\nabla E = -\sum_{(\bar{x}_n, y_n) \in M} y_n \bar{x}_n$
  • Sequential gradient descent:
    – Adjust $\bar{w}$ based on one example $(\bar{x}, y)$ at a time
      $\bar{w} \leftarrow \bar{w} + \eta \, y \, \bar{x}$
  • When $\eta = 1$, we recover the threshold perceptron learning algorithm
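In code, the per-example update might look like this (a sketch; with eta=1 it coincides with the threshold perceptron rule above):

```python
import numpy as np

def sequential_update(w, x_n, y_n, eta=1.0):
    """One step of sequential gradient descent on the perceptron criterion."""
    if y_n * np.dot(w, x_n) < 0:    # only misclassified examples change w
        w = w + eta * y_n * x_n
    return w
```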

SLIDE 20

Threshold Perceptron Hypothesis Space

  • Hypothesis space $h_{\bar{w}}$:
    – All binary classifications with parameters $\bar{w}$ s.t.
      $\bar{w}^T \bar{x} > 0 \rightarrow +1$
      $\bar{w}^T \bar{x} < 0 \rightarrow -1$
  • Since $\bar{w}^T \bar{x}$ is linear in $\bar{w}$, the perceptron is called a linear separator
  • Theorem: threshold perceptron learning converges iff the data is linearly separable

SLIDE 21

Linear Separability

  • Examples (figures): a linearly separable dataset vs. a non-linearly separable dataset

SLIDE 22

Sigmoid Perceptron

  • Represent “soft” linear separators
  • Same hypothesis space as logistic regression

SLIDE 23

Sigmoid Perceptron Learning

  • Possible objectives
    – Minimum squared error:
      $E(\bar{w}) = \frac{1}{2} \sum_n E_n(\bar{w})^2 = \frac{1}{2} \sum_n \left( y_n - \sigma(\bar{w}^T \bar{x}_n) \right)^2$
    – Maximum likelihood
      • Same algorithm as for logistic regression
    – Maximum a posteriori hypothesis
    – Bayesian learning
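For concreteness, the squared-error objective can be written as follows (a sketch; the function name is assumed):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def squared_error(w, X, y):
    """E(w) = 1/2 * sum_n (y_n - sigma(w^T x_n))^2."""
    return 0.5 * np.sum((y - sigmoid(X @ w)) ** 2)
```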

SLIDE 24

Gradient

  • Gradient:
    $\frac{\partial E}{\partial w_i} = \sum_n E_n \frac{\partial E_n}{\partial w_i}$
    $= -\sum_n E_n \, \sigma'(\bar{w}^T \bar{x}_n) \, x_{ni}$
    $= -\sum_n E_n \, \sigma(\bar{w}^T \bar{x}_n) \left( 1 - \sigma(\bar{w}^T \bar{x}_n) \right) x_{ni}$
    Recall that $\sigma' = \sigma(1 - \sigma)$

SLIDE 25

Sequential Gradient Descent

  • Perceptron-Learning(examples, network)
    – Repeat
      • For each $(\bar{x}_n, y_n)$ in examples do:
        $E_n \leftarrow y_n - \sigma(\bar{w}^T \bar{x}_n)$
        $\bar{w} \leftarrow \bar{w} + \eta \, E_n \, \sigma(\bar{w}^T \bar{x}_n) \left( 1 - \sigma(\bar{w}^T \bar{x}_n) \right) \bar{x}_n$
    – Until some stopping criterion is satisfied
    – Return the learnt network
  • N.B. $\eta$ is a learning rate corresponding to the step size in gradient descent (a runnable sketch follows below)
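A minimal sketch of this loop (a fixed epoch budget stands in for “some stopping criterion”; names are illustrative and inputs are assumed to carry a leading 1 for the bias):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sigmoid_perceptron_learning(X, y, eta=0.1, epochs=100):
    """Sequential gradient descent on the squared error of a sigmoid perceptron.

    X: (n, d) inputs with a leading 1 column for the bias; y: (n,) targets in [0, 1].
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):                        # stopping criterion: fixed epoch budget
        for x_n, y_n in zip(X, y):
            s = sigmoid(np.dot(w, x_n))
            E_n = y_n - s                          # per-example error
            w = w + eta * E_n * s * (1 - s) * x_n  # uses sigma' = sigma * (1 - sigma)
    return w
```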

SLIDE 26

Multilayer Networks

  • Adding two sigmoid units with parallel but opposite “cliffs” produces a ridge

SLIDE 27

Multilayer Networks

  • Adding two intersecting ridges (and thresholding) produces a bump, as sketched below
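A small sketch of both constructions (the slopes, offsets, and threshold are assumed values chosen to make the cliffs steep, not values from the slides):

```python
import numpy as np

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

# Two parallel but opposite "cliffs" add up to a ridge along one input dimension.
ridge = lambda x1: sigmoid(10 * (x1 + 1)) + sigmoid(-10 * (x1 - 1)) - 1

# Two intersecting ridges (one per input dimension), thresholded, produce a localized bump.
bump = lambda x1, x2: 1.0 if ridge(x1) + ridge(x2) > 1.5 else 0.0

print(ridge(0.0), ridge(5.0))           # ~1 inside the ridge, ~0 outside
print(bump(0.0, 0.0), bump(5.0, 0.0))   # 1 on the bump, 0 elsewhere
```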

SLIDE 28

Multilayer Networks

  • By tiling bumps of various heights together, we can approximate any function
  • Training algorithm:
    – Back-propagation
    – Essentially sequential gradient descent performed by propagating errors backward into the network
    – Derivation next class

SLIDE 29

Neural Net Applications

  • Neural nets can approximate any function, hence millions of applications:
    – Speech recognition
    – Word embeddings
    – Machine translation
    – Vision-based object recognition
    – Vision-based autonomous driving
    – Etc.
