Neural Networks

July 7, 2005, CS 486/686, University of Waterloo

CS486/686 Lecture Slides (c) 2005 P. Poupart


Outline

  • Neural networks
    – Perceptron
    – Supervised learning algorithms for neural networks
  • Reading: R&N Ch 20.5


Brain

  • Seat of human intelligence
  • Where memory/knowledge resides
  • Responsible for thoughts and decisions
  • Can learn
  • Consists of nerve cells called neurons


Neuron

[Figure: anatomy of a neuron: cell body (soma) with nucleus, dendrites, axon with axonal arborization, and synapses with axons from other cells]


Comparison

  • Brain
    – Network of neurons
    – Nerve signals propagate in a neural network
    – Parallel computation
    – Robust (neurons die every day without any impact)
  • Computer
    – Bunch of gates
    – Electrical signals directed by gates
    – Sequential computation
    – Fragile (if a gate stops working, the computer crashes)


Artificial Neural Networks

  • Idea: mimic the brain to do computation
  • Artificial neural network:
    – Nodes (a.k.a. units) correspond to neurons
    – Links correspond to synapses
  • Computation:
    – Numerical signals transmitted between nodes correspond to chemical signals between neurons
    – Nodes modifying numerical signals correspond to neurons' firing rates


ANN Unit

  • For each unit i:
  • Weights Wj,i
    – Strength of the link from unit j to unit i
    – Input signals aj are weighted by Wj,i and linearly combined: in_i = Σj Wj,i aj
  • Activation function g
    – Numerical signal produced: ai = g(in_i)
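The unit computation above can be sketched in a few lines of Python (a minimal illustration; using the sigmoid as g and the example weights below are assumptions, not values from the slides):

```python
import math

def unit_output(weights, inputs):
    # in_i = sum_j Wj,i * aj : weighted linear combination of the input signals
    in_i = sum(w * a for w, a in zip(weights, inputs))
    # ai = g(in_i), here with the sigmoid as the activation function g
    return 1.0 / (1.0 + math.exp(-in_i))

# two input signals aj = 1.0 and 0.2, with illustrative weights
out = unit_output([0.5, -1.0], [1.0, 0.2])
```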


ANN Unit

[Figure: a single unit i. Input links carry signals aj, each weighted by Wj,i; the input function computes in_i = Σj Wj,i aj; the activation function g produces the output ai = g(in_i), sent along the output links. A bias weight W0,i multiplies a fixed input a0 = −1.]


Activation Function

  • Should be nonlinear
    – Otherwise the network is just a linear function
  • Often chosen to mimic firing in neurons
    – Unit should be “active” (output near 1) when fed with the “right” inputs
    – Unit should be “inactive” (output near 0) when fed with the “wrong” inputs


Common Act ivat ion Funct ions

(a) (b) +1

ini g( ) ini

+1

ini g( ) ini

Thr eshold Sigmoid g(x) = 1/ (1+e-x)
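Both activation functions can be written directly from their definitions (a quick Python sketch):

```python
import math

def threshold(x):
    # hard threshold: output jumps from 0 to 1 at x = 0
    return 1.0 if x > 0 else 0.0

def sigmoid(x):
    # g(x) = 1 / (1 + e^-x): a smooth, differentiable version of the threshold
    return 1.0 / (1.0 + math.exp(-x))
```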


Logic Gates

  • McCulloch and Pitts (1943)
    – Designed ANNs to represent Boolean fns
  • What should be the weights of the following units to code AND, OR, NOT?

[Figure: three threshold units: one with inputs a1, a2 (AND), one with inputs a1, a2 (OR), one with a single input a1 (NOT)]
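One standard choice of weights (an assumed answer to the question above, using the bias input a0 = −1 from the unit diagram and a threshold at 0):

```python
def threshold_unit(weights, inputs):
    # prepend the fixed bias input a0 = -1; fire (output 1) when the
    # weighted sum exceeds 0
    a = [-1.0] + list(inputs)
    return 1 if sum(w * x for w, x in zip(weights, a)) > 0 else 0

# bias weight first: fires iff a1 + a2 > 1.5 (AND), a1 + a2 > 0.5 (OR), a1 < 0.5 (NOT)
AND = lambda a1, a2: threshold_unit([1.5, 1.0, 1.0], [a1, a2])
OR  = lambda a1, a2: threshold_unit([0.5, 1.0, 1.0], [a1, a2])
NOT = lambda a1:     threshold_unit([-0.5, -1.0], [a1])
```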


Network Structures

  • Feed-forward network
    – Directed acyclic graph
    – No internal state
    – Simply computes outputs from inputs
  • Recurrent network
    – Directed cyclic graph
    – Dynamical system with internal states
    – Can memorize information


Feed-forward network

  • Simple network with two inputs, one hidden layer of two units, one output unit

a5 = g(W3,5 a3 + W4,5 a4)
   = g(W3,5 g(W1,3 a1 + W2,3 a2) + W4,5 g(W1,4 a1 + W2,4 a2))

[Figure: units 1 and 2 (inputs) feed units 3 and 4 (hidden) via weights W1,3, W1,4, W2,3, W2,4; units 3 and 4 feed unit 5 (output) via weights W3,5, W4,5]
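The nested expression for a5 can be checked with a short sketch (the weight values below are illustrative assumptions, not from the slides):

```python
import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(a1, a2, W):
    # units 1, 2: inputs; units 3, 4: hidden; unit 5: output
    a3 = g(W[1, 3] * a1 + W[2, 3] * a2)
    a4 = g(W[1, 4] * a1 + W[2, 4] * a2)
    return g(W[3, 5] * a3 + W[4, 5] * a4)

# illustrative weights, keyed by (j, i)
W = {(1, 3): 1.0, (2, 3): -1.0, (1, 4): -1.0, (2, 4): 1.0, (3, 5): 2.0, (4, 5): 2.0}
a5 = forward(0.5, -0.5, W)
```

With these weights a3 = g(1.0) and a4 = g(−1.0), which sum to exactly 1, so a5 = g(2.0).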


Perceptron

  • Single-layer feed-forward network

[Figure: input units connected directly to output units by weights Wj,i]


Supervised Learning

  • Given a list of <input, output> pairs
  • Train a feed-forward ANN
    – To compute the proper outputs when fed with the inputs
    – Consists of adjusting the weights Wj,i
  • Simple learning algorithm for threshold perceptrons


Threshold Perceptron Learning

  • Learning is done separately for each unit
    – Since units do not share weights
  • Perceptron learning for unit i:
    – For each <inputs, output> pair do:
      • Case 1: correct output produced
        – ∀j Wj,i ← Wj,i
      • Case 2: output produced is 0 instead of 1
        – ∀j Wj,i ← Wj,i + aj
      • Case 3: output produced is 1 instead of 0
        – ∀j Wj,i ← Wj,i − aj
    – Until correct output for all training instances
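The three cases translate directly into code. Below is a sketch that learns AND (the dataset and the epoch cap are assumptions for illustration; each input vector starts with the bias input −1):

```python
def predict(W, a):
    # threshold unit: output 1 iff the weighted sum is positive
    return 1 if sum(wj * aj for wj, aj in zip(W, a)) > 0 else 0

def train_threshold_perceptron(examples, n_weights, max_epochs=100):
    W = [0.0] * n_weights
    for _ in range(max_epochs):
        all_correct = True
        for a, y in examples:
            if predict(W, a) == y:
                continue                      # case 1: leave W unchanged
            all_correct = False
            step = 1 if y == 1 else -1        # case 2: add a; case 3: subtract a
            W = [wj + step * aj for wj, aj in zip(W, a)]
        if all_correct:                       # correct on all training instances
            return W
    return W

data = [([-1, 0, 0], 0), ([-1, 0, 1], 0), ([-1, 1, 0], 0), ([-1, 1, 1], 1)]
W = train_threshold_perceptron(data, 3)
```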


Threshold Perceptron Learning

  • Dot products: a●a ≥ 0 and −a●a ≤ 0
  • Perceptron computes
    – 1 when a●W = Σj aj Wj,i > 0
    – 0 when a●W = Σj aj Wj,i < 0
  • If the output should be 1 instead of 0, then
    – W ← W + a, since a●(W+a) = a●W + a●a ≥ a●W
  • If the output should be 0 instead of 1, then
    – W ← W − a, since a●(W−a) = a●W − a●a ≤ a●W


Threshold Perceptron Hypothesis Space

  • Hypothesis space hW:
    – All binary classifications with parameters W s.t.
      • a●W > 0 → 1
      • a●W < 0 → 0
  • Since a●W is linear in W, the perceptron is called a linear separator


Threshold Perceptron Hypothesis Space

  • Are all Boolean gates linearly separable?

[Figure: the four points (I1, I2) ∈ {0,1}² in the plane: (a) I1 and I2, separable by a line; (b) I1 or I2, separable by a line; (c) I1 xor I2, not separable by any line]
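A brute-force search over a small grid of weights illustrates the answer (an illustration, not a proof; the grid and step size are arbitrary choices): AND admits separating weights, while XOR admits none:

```python
from itertools import product

def separates(W, data):
    # threshold unit with bias input -1: output 1 iff the weighted sum > 0
    return all((sum(w * x for w, x in zip(W, [-1] + list(a))) > 0) == bool(y)
               for a, y in data)

AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

grid = [k / 2 for k in range(-10, 11)]        # weights -5.0, -4.5, ..., 5.0
and_hits = [W for W in product(grid, repeat=3) if separates(W, AND_DATA)]
xor_hits = [W for W in product(grid, repeat=3) if separates(W, XOR_DATA)]
```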


Sigmoid Perceptron

  • Represents “soft” linear separators


Sigmoid Perceptron Learning

  • Formulate learning as an optimization search in weight space
    – Since g is differentiable, use gradient descent
  • Minimize the squared error:
    – E = 0.5 Err² = 0.5 (y − hW(x))²
      • x: input
      • y: target output
      • hW(x): computed output


Perceptron Error Gradient

  • E = 0.5 Err² = 0.5 (y − hW(x))²
  • ∂E/∂Wj = Err × ∂Err/∂Wj
           = Err × ∂(y − g(Σj Wj xj))/∂Wj
           = −Err × g′(Σj Wj xj) × xj
  • When g is the sigmoid fn, then g′ = g(1 − g)
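The derivation can be sanity-checked numerically by comparing the analytic gradient against a central finite difference (the example values of W, x, y below are assumptions):

```python
import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

def error(W, x, y):
    # E = 0.5 (y - g(sum_j Wj xj))^2
    return 0.5 * (y - g(sum(wj * xj for wj, xj in zip(W, x)))) ** 2

def analytic_grad(W, x, y, j):
    # dE/dWj = -Err * g'(in) * xj, with g' = g (1 - g) for the sigmoid
    in_ = sum(wj * xj for wj, xj in zip(W, x))
    err = y - g(in_)
    return -err * g(in_) * (1.0 - g(in_)) * x[j]

def numeric_grad(W, x, y, j, h=1e-6):
    Wp, Wm = list(W), list(W)
    Wp[j] += h
    Wm[j] -= h
    return (error(Wp, x, y) - error(Wm, x, y)) / (2.0 * h)

W, x, y = [0.3, -0.8], [1.0, 0.5], 1.0
```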


Perceptron Learning Algorithm

  • Perceptron-Learning(examples, network)
    – Repeat
      • For each e in examples do
        – in ← Σj Wj xj[e]
        – Err ← y[e] − g(in)
        – Wj ← Wj + α × Err × g′(in) × xj[e]
    – Until some stopping criterion is satisfied
    – Return learnt network
  • N.B. α is a learning rate corresponding to the step size in gradient descent
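The pseudocode translates almost line for line into Python (learning OR is an assumed example; the values of α and the epoch count are arbitrary choices, and each input starts with the bias input −1):

```python
import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

def perceptron_learning(examples, n_weights, alpha=0.5, epochs=1000):
    W = [0.0] * n_weights
    for _ in range(epochs):                    # stopping criterion: fixed epoch count
        for x, y in examples:
            in_ = sum(wj * xj for wj, xj in zip(W, x))
            err = y - g(in_)
            gprime = g(in_) * (1.0 - g(in_))   # g' = g (1 - g) for the sigmoid
            W = [wj + alpha * err * gprime * xj for wj, xj in zip(W, x)]
    return W

data = [([-1, 0, 0], 0), ([-1, 0, 1], 1), ([-1, 1, 0], 1), ([-1, 1, 1], 1)]
W = perceptron_learning(data, 3)
```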


Multilayer Feed-forward Neural Networks

  • A perceptron can only represent (soft) linear separators
    – Because it has a single layer
  • With multiple layers, what fns can be represented?
    – Virtually any function!


Multilayer Networks

  • Adding two sigmoid units with parallel but opposite “cliffs” produces a ridge

[Figure: network output plotted over x1, x2 ∈ [−4, 4], forming a ridge with height near 1 along one direction]
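A one-dimensional sketch of the construction (the slope and offset values are assumptions): summing two sigmoid cliffs with opposite slopes and subtracting 1 yields an output near 1 between the cliffs and near 0 outside:

```python
import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

def ridge(x1, slope=5.0, offset=1.0):
    # cliff rising at x1 = -offset plus cliff falling at x1 = +offset;
    # the -1 shifts the sum so the plateau sits near 1 and the tails near 0
    return g(slope * (x1 + offset)) + g(-slope * (x1 - offset)) - 1.0
```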


Multilayer Networks

  • Adding two intersecting ridges (and thresholding) produces a bump

[Figure: network output plotted over x1, x2 ∈ [−4, 4], forming a localized bump with height near 1]


Multilayer Networks

  • By tiling bumps of various heights together, we can approximate any function
  • Training algorithm:
    – Back-propagation
    – Essentially gradient descent performed by propagating errors backward into the network
    – See textbook for derivation
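A minimal back-propagation sketch for a 2-2-1 network like the one in the earlier feed-forward slide (the random initial weights and step size are assumptions; see the textbook for the full derivation):

```python
import math, random

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(W, x):
    # 2 inputs -> 2 hidden units -> 1 output; bias input -1 prepended at each layer
    Wh, Wo = W
    xb = [-1.0] + x
    h = [g(sum(w * xi for w, xi in zip(row, xb))) for row in Wh]
    return h, g(sum(w * hi for w, hi in zip(Wo, [-1.0] + h)))

def backprop_update(W, x, y, alpha=0.1):
    # one gradient step on E = 0.5 (y - out)^2, errors propagated backward
    Wh, Wo = W
    h, out = forward(W, x)
    delta_o = (y - out) * out * (1.0 - out)            # output-layer error term
    delta_h = [hj * (1.0 - hj) * Wo[j + 1] * delta_o   # hidden errors, back through Wo
               for j, hj in enumerate(h)]
    xb, hb = [-1.0] + x, [-1.0] + h
    Wo = [w + alpha * delta_o * hi for w, hi in zip(Wo, hb)]
    Wh = [[w + alpha * dj * xi for w, xi in zip(row, xb)]
          for row, dj in zip(Wh, delta_h)]
    return Wh, Wo

random.seed(0)
W = ([[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)],
     [random.uniform(-1, 1) for _ in range(3)])
```

One update step moves the weights downhill on the squared error for that example.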


Neural Net Applications

  • Neural nets can approximate any function, hence 1000's of applications
    – NETtalk for pronouncing English text
    – Character recognition
    – Paint-quality inspection
    – Vision-based autonomous driving
    – Etc.


Neural Net Drawbacks

  • Common problems:
    – How should we interpret units?
    – How many layers and units should a network have?
    – How do we avoid local optima while training with gradient descent?


Next Class

  • Ensemble learning
  • Russell and Norvig Sect. 18.4