Regression, layered neural networks

- Networks of continuous units
- Regression problems
- Gradient descent, backpropagation of error
- The role of the learning rate
- O…
Backpropagation of Error

convenient calculation of the gradient in multilayer networks (← chain rule)

example: continuous two-layer network with K hidden units
- inputs ξ ∈ ℝ^N
- weights w_k ∈ ℝ^N, k = 1, 2, …, K, and v_k ∈ ℝ
- hidden units σ_k(ξ) = g(w_k · ξ)
- output σ(ξ) = h( Σ_{j=1}^K v_j g(w_j · ξ) )

Exercise: derive ∇_{w_k} E and ∂E/∂v_k (a sketch follows below)

the weights w_k and v_k are used …
– downward for the calculation of hidden states and output
– upward for the calculation of the gradient
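A minimal numpy sketch of the exercise for a single example (ξ, τ), assuming g = tanh, h = identity, and quadratic per-example error e = ½(σ − τ)²; these choices and all names are illustrative, not fixed by the slides:

```python
import numpy as np

# Two-layer network: sigma(xi) = h( sum_j v_j g(w_j . xi) ) with g = tanh,
# h = identity (an assumption), and per-example error e = 0.5 * (sigma - tau)^2.

def forward(xi, W, v):
    """W: (K, N) input-to-hidden weights w_k, v: (K,) hidden-to-output weights."""
    s = W @ xi                 # local potentials s_k = w_k . xi
    sigma_k = np.tanh(s)       # hidden states sigma_k = g(w_k . xi)
    sigma = v @ sigma_k        # output (h = identity)
    return sigma, sigma_k, s

def gradients(xi, tau, W, v):
    """Backpropagation: the forward quantities are reused via the chain rule."""
    sigma, sigma_k, s = forward(xi, W, v)      # downward pass
    delta = sigma - tau                        # de/dsigma for quadratic error
    grad_v = delta * sigma_k                   # de/dv_k = delta * g(w_k . xi)
    gprime = 1.0 - sigma_k ** 2                # g'(s_k) for g = tanh
    grad_W = np.outer(delta * v * gprime, xi)  # de/dw_k = delta * v_k * g'(s_k) * xi
    return grad_W, grad_v                      # upward pass

# one gradient descent step with learning rate eta = 0.1
rng = np.random.default_rng(0)
N, K = 10, 3
W, v = rng.normal(size=(K, N)) / np.sqrt(N), rng.normal(size=K)
xi, tau = rng.normal(size=N), 1.0
gW, gv = gradients(xi, tau, W, v)
W, v = W - 0.1 * gW, v - 0.1 * gv
```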
The role of the learning rate

assume E has a (local) minimum at w*; Taylor expansion in the vicinity:

E(w) ≈ E(w*) + (w − w*)ᵀ ∇E|_{w*} + ½ (w − w*)ᵀ H* (w − w*) + …

the linear term vanishes (∇E|_{w*} = 0), so

E(w) ≈ E(w*) + ½ (w − w*)ᵀ H* (w − w*),   ∇E|_w ≈ H* (w − w*)

with the positive definite Hessian matrix of second derivatives H*_ij = ∂²E / (∂w_i ∂w_j)|_{w*};
H* has only positive eigenvalues λ_i > 0 and orthonormal eigenvectors u_i (all λ_i ≤ λ_max)

gradient descent in the vicinity of w*:   w_t − w* ≡ δ_t = δ_{t−1} − η ∇E|_{w_{t−1}}

δ_t ≈ [ I − η H* ] δ_{t−1} ≈ [ I − η H* ]ᵗ δ_0

expansion in { u_i }:   δ_0 = Σ_i a_i u_i   gives

δ_t ≈ Σ_i a_i [ I − η H* ]ᵗ u_i = Σ_i a_i [ 1 − η λ_i ]ᵗ u_i

with u_jᵀ u_k = δ_jk we obtain   |δ_t|² = Σ_i a_i² [ 1 − η λ_i ]²ᵗ
the iteration approaches the minimum, lim_{t→∞} |δ_t| = 0, iff | 1 − η λ_i | < 1 for all i

condition for (local) convergence:   η < η_max = 2 / λ_max

- η < 1 / λ_max   (1 − η λ_max > 0):   smooth convergence
- 1 / λ_max < η < 2 / λ_max   (−1 < 1 − η λ_max < 0):   oscillatory convergence
- η > η_max = 2 / λ_max   (1 − η λ_max < −1):   divergence
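A minimal numerical illustration of the three regimes, on an assumed two-dimensional quadratic E = ½ δᵀ H* δ with λ_max = 4 (so η_max = 0.5); the rates are chosen to fall into the smooth, oscillatory, and divergent regime:

```python
import numpy as np

# Quadratic bowl E = 0.5 * delta^T H delta near w*; gradient descent gives
# delta_t = [I - eta H] delta_{t-1}. Here lambda_max = 4, so eta_max = 0.5.
H = np.diag([1.0, 4.0])

for eta in (0.10, 0.45, 0.55):      # smooth / oscillatory / divergent
    delta = np.array([1.0, 1.0])    # delta_0 = w_0 - w*
    for t in range(50):
        delta = delta - eta * (H @ delta)
    print(f"eta = {eta:.2f}: |delta_50| = {np.linalg.norm(delta):.3g}")
```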
the above considerations hold only locally; potential problems:

– different local minima can have completely different characteristics (λ_max), which complicates, e.g., the choice of the learning rate far from a minimum
– gradient learning can slow down drastically due to, e.g., plateau states (see below)
some modifications:

momentum:   ∆w_{t+1} = −η ∇E + α ∆w_t   ("keep going"; a sketch follows below)

sophisticated optimization methods: line search procedures, conjugate gradient, second order methods, e.g. Newton's method (the "matrix update" employs H), …

individual learning rates for different weights, examples:
– heuristics: η ∝ 1/N for input-to-hidden, η ∝ 1/K for hidden-to-output weights
– simplified version of the "matrix update" (assume H is approximately diagonal): update each weight w_j with a learning rate η_j ∝ 1 / (∂²E/∂w_j²)
– such learning algorithms realize descent in E as long as ∆w · ∇E < 0

construction of alternative well-behaved cost functions, e.g.

E = Σ_µ { γ (σ − τ)²  if sign(σ) = sign(τ),   (σ − τ)²  if sign(σ) ≠ sign(τ) }

with γ increasing from 0 to 1.
small γ: emphasis on correct sign of the output; large γ: fine tuning of σ
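A sketch of the momentum modification; η, α, and the toy gradient are illustrative choices:

```python
import numpy as np

# Momentum: the new step keeps a fraction alpha of the previous one
# ("keep going"), damping oscillations and speeding up travel along
# shallow directions of E.

def momentum_step(w, dw_prev, grad_E, eta=0.05, alpha=0.9):
    dw = -eta * grad_E(w) + alpha * dw_prev  # Delta w_{t+1} = -eta grad E + alpha Delta w_t
    return w + dw, dw

# usage on a toy quadratic E(w) = 0.5 |w|^2, i.e. grad E = w
w, dw = np.ones(2), np.zeros(2)
for _ in range(100):
    w, dw = momentum_step(w, dw, lambda u: u)
print(np.linalg.norm(w))   # approaches 0
```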
Stochastic approximation (on-line gradient descent)

the cost function E = (1/P) Σ_{µ=1}^P e_µ ≡ ⟨e_µ⟩ is an empirical average over examples

→ simple approximation of ∇E by ∇e_µ, for one example only:

w_{t+1} = w_t + ∆w_t = w_t − η ∇e_µ|_{w_t}

– computationally cheap compared to off-line (batch) gradient descent
– intrinsic noise: fewer problems with local minima, flat regions, etc.

(when) does the procedure converge? behavior close to a (local) minimum w* of E?
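Before the convergence analysis, a minimal sketch of the procedure for an assumed linear toy model (in a network, ∇e_µ would come from backpropagation):

```python
import numpy as np

# On-line gradient descent: each step uses grad e_mu of one randomly drawn
# example instead of the full batch gradient grad E. Toy linear regression
# with e_mu = 0.5 * (w . xi_mu - tau_mu)^2; all settings are illustrative.

rng = np.random.default_rng(1)
P, N, eta = 200, 5, 0.05
XI = rng.normal(size=(P, N))       # inputs xi^mu
TAU = XI @ np.ones(N)              # targets of a (realizable) linear rule
w = np.zeros(N)

for t in range(2000):
    mu = rng.integers(P)                           # draw one example
    grad_e_mu = (w @ XI[mu] - TAU[mu]) * XI[mu]    # gradient of e_mu only
    w = w - eta * grad_e_mu                        # w_{t+1} = w_t - eta grad e_mu
print(np.linalg.norm(w - np.ones(N)))              # close to 0 (realizable case)
```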
averaged learning step:

⟨∆w⟩ = −η ⟨∇e_µ|_w⟩ = −(η/P) Σ_{µ=1}^P ∇e_µ|_w = −η ∇E|_w,   hence ⟨∆w⟩ = 0 for w → w*

averaged squared length of ∆w:   ⟨(∆w)²⟩ = η² ⟨( ∇e_µ|_* )²⟩ > 0   (zero only possible if all e_µ = 0)

for constant rate η > 0:   lim_{t→∞} ⟨(∆w_t)²⟩ > 0   (fluctuations remain non-zero)

convergence in the sense of ⟨(∆w)²⟩ → 0 requires η(t) → 0 for t → ∞,

but   Σ_t η(t) → ∞   together with   Σ_t η(t)² < ∞   is required

satisfied by, e.g., η(t) ∝ 1/t for large t → learning rate schedules, e.g.   η(t) = a / (b + t)
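A toy comparison of a constant rate with the schedule η(t) = a/(b + t) on an assumed noisy quadratic (a, b, and the noise model are arbitrary choices): with the schedule the fluctuations die out, with a constant rate they persist.

```python
import numpy as np

# Noisy toy gradient: grad e_mu = grad E + zero-mean noise, with E = 0.5 w^2.
# The schedule eta(t) = a/(b + t) satisfies sum eta = inf, sum eta^2 < inf.
rng = np.random.default_rng(2)
a, b = 1.0, 10.0

for schedule in ("constant eta = 0.1", "eta(t) = a/(b + t)"):
    w = 5.0
    for t in range(10000):
        eta = 0.1 if schedule.startswith("constant") else a / (b + t)
        w -= eta * (w + rng.normal())   # noisy single-example gradient
    print(schedule, "->", f"|w| = {abs(w):.3g}")
```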
Plateau states

frequent observation: training of multilayer networks is delayed by quasi-stationary plateaus

(S.J. Hanson, in: Y. Chauvin and D.E. Rumelhart (eds.), Backpropagation: Theory, Architectures, and Applications, 1995)
example: a two-layer network trained from reliable, perfectly realizable data by on-line gradient descent (here: matching complexity)

[learning curve: error vs. number of examples P/(KN)]

unspecialized hidden units with w_k ∼ w_o + noise have all obtained some (the same) information about the unknown rule

the network output is invariant under permutations of the hidden units; the perfectly symmetric state corresponds to a flat region (saddle) in E

successful learning requires specialization and can be delayed significantly (P. Riegler, C. Wöhler, …); an illustrative simulation follows below

analysed in depth in the statistical physics community (1990s); the problem was re-discovered in deep learning
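An illustrative simulation of the plateau. All settings are assumptions (g = tanh, matching teacher and student with K = 2, Gaussian inputs, nearly symmetric initialization), and the plateau length depends on them; typically the test error stays almost constant for a long stretch before the hidden units specialize and it drops.

```python
import numpy as np

# Soft-committee student sigma = sum_k g(w_k . xi / sqrt(N)) learns a matching
# teacher by on-line gradient descent on e = 0.5 (sigma - tau)^2. The student
# vectors start as w_0 + small noise (unspecialized), so the dynamics passes
# close to the permutation-symmetric saddle: a quasi-stationary plateau.

rng = np.random.default_rng(3)
N, K, eta = 100, 2, 0.5
B = rng.normal(size=(K, N))                # teacher vectors (the unknown rule)
w0 = 0.01 * rng.normal(size=N)
W = w0 + 1e-3 * rng.normal(size=(K, N))    # w_k ~ w_0 + noise

def output(V, X):                          # network output for a batch of inputs
    return np.tanh(X @ V.T / np.sqrt(N)).sum(axis=1)

X_test = rng.normal(size=(1000, N))        # fixed test set for the learning curve
tau_test = output(B, X_test)

for t in range(200000):
    xi = rng.normal(size=N)
    tau = output(B, xi[None, :])[0]
    s = W @ xi / np.sqrt(N)
    delta = np.tanh(s).sum() - tau
    W -= (eta / np.sqrt(N)) * delta * np.outer(1.0 - np.tanh(s) ** 2, xi)
    if t % 20000 == 0:                     # monitor the learning curve
        err = 0.5 * np.mean((output(W, X_test) - tau_test) ** 2)
        print(t, f"{err:.4f}")
```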