 
              Perception as Signal Processing October 16, 2018
What is theory for?
To answer why? There are two sorts of answer in the context of neuroscience.
Constructive or mechanistic – why is the sky blue?
◮ provides a mechanistic understanding of observations
◮ links structure to function
◮ helps to codify, organise and relate experimental findings
Normative or teleological – why do we see light between 390 and 700 nm?
◮ provides an understanding of the purpose of function
◮ only sensible in the context of evolutionary selection
Sensation and Perception
Two dominant ways of thinking about sensory systems and perception.
Signal processing – falls between normative and mechanistic
◮ a succession of filtering and feature-extraction stages that arrives at a 'detection' or 'recognition' output.
◮ dominated by feed-forward metaphors
◮ temporal processing often limited to integration
◮ some theories may incorporate local recurrence and also feedback for feature selection or attention
◮ behavioural and neural theory is dominated by information-like quantities
Inference – strongly normative
◮ parse sensory input to work out the configuration of the world
◮ fundamental roles for lateral interaction, feedback and dynamical state
◮ behavioural theory is well understood and powerful; neural underpinnings are little understood.
Signal-processing paradigms
1. filtering
2. (efficient) coding
3. feature detection
The eye and retina
Centre-surround receptive fields
Centre-surround models
Centre-surround receptive fields are commonly described by one of two equations, giving the scaled response to a point of light shone at the retinal location $(x, y)$.
A difference-of-Gaussians (DoG) model:
\[
D_{\mathrm{DoG}}(x, y) = \frac{1}{2\pi\sigma_c^2} \exp\left(-\frac{(x - c_x)^2 + (y - c_y)^2}{2\sigma_c^2}\right) - \frac{1}{2\pi\sigma_s^2} \exp\left(-\frac{(x - c_x)^2 + (y - c_y)^2}{2\sigma_s^2}\right)
\]
[Figure: plots of the DoG receptive field.]
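The DoG model can be evaluated directly on a grid. The following is a minimal sketch in Python with NumPy; the function name `dog`, the widths `sigma_c = 1` and `sigma_s = 2`, and the grid extent are illustrative assumptions, not values taken from the slides.

```python
import numpy as np

def dog(x, y, cx=0.0, cy=0.0, sigma_c=1.0, sigma_s=2.0):
    """Difference-of-Gaussians receptive field evaluated at (x, y).

    sigma_c and sigma_s are the centre and surround widths
    (illustrative defaults, assumed for this sketch).
    """
    r2 = (x - cx) ** 2 + (y - cy) ** 2
    centre = np.exp(-r2 / (2 * sigma_c ** 2)) / (2 * np.pi * sigma_c ** 2)
    surround = np.exp(-r2 / (2 * sigma_s ** 2)) / (2 * np.pi * sigma_s ** 2)
    return centre - surround

# Sample the receptive field on a regular grid.
axis = np.linspace(-10.0, 10.0, 201)
X, Y = np.meshgrid(axis, axis)
D = dog(X, Y)

# Both Gaussians are normalised, so the field integrates to roughly zero:
# a uniform stimulus produces almost no response.
spacing = axis[1] - axis[0]
total = D.sum() * spacing ** 2
```

With these assumed widths the field is positive at the centre and negative in the surround, for example at `dog(3.0, 0.0)`.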
Centre-surround models
. . . or a Laplacian-of-Gaussian (LoG) model:
\[
D_{\mathrm{LoG}}(x, y) = -\nabla^2 \left[ \frac{1}{2\pi\sigma^2} \exp\left(-\frac{(x - c_x)^2 + (y - c_y)^2}{2\sigma^2}\right) \right]
\]
[Figure: plots of the LoG receptive field.]
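For reference, the Laplacian in the LoG model can be carried out explicitly. This is a worked expansion rather than something stated on the slide; the shorthand $\rho^2 = (x - c_x)^2 + (y - c_y)^2$ is introduced here for brevity.

```latex
\begin{align*}
G(x, y) &= \frac{1}{2\pi\sigma^2} \exp\left(-\frac{\rho^2}{2\sigma^2}\right),
\qquad \rho^2 = (x - c_x)^2 + (y - c_y)^2 \\
\frac{\partial^2 G}{\partial x^2} &= \left(\frac{(x - c_x)^2}{\sigma^4} - \frac{1}{\sigma^2}\right) G,
\qquad
\frac{\partial^2 G}{\partial y^2} = \left(\frac{(y - c_y)^2}{\sigma^4} - \frac{1}{\sigma^2}\right) G \\
D_{\mathrm{LoG}}(x, y) &= -\nabla^2 G
= \frac{1}{\pi\sigma^4} \left(1 - \frac{\rho^2}{2\sigma^2}\right) \exp\left(-\frac{\rho^2}{2\sigma^2}\right)
\end{align*}
```

The response is positive in the centre and changes sign at $\rho = \sqrt{2}\,\sigma$, which gives the centre-surround shape.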
Linear receptive fields
The linear-like response apparent in the prototypical experiments can be generalised to give a predicted firing rate in response to an arbitrary stimulus $s(x, y)$:
\[
r(c_x, c_y; s(x, y)) = \int dx\, dy\, D_{c_x, c_y}(x, y)\, s(x, y)
\]
The receptive field centres $(c_x, c_y)$ are distributed over visual space. If we let $D()$ represent the RF function centred at 0, instead of at $(c_x, c_y)$, and use the symmetry of the centre-surround RF, $D(-x, -y) = D(x, y)$, we can write:
\[
r(c_x, c_y; s(x, y)) = \int dx\, dy\, D(c_x - x, c_y - y)\, s(x, y)
\]
which looks like a convolution.
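The convolution form can be checked numerically. The sketch below assumes a DoG receptive field with illustrative widths, a random stimulus on a small pixel grid, and periodic (wrap-around) boundaries so that an FFT-based circular convolution can be compared with the direct sum; none of these choices come from the slides.

```python
import numpy as np

def dog(x, y, sigma_c=1.0, sigma_s=2.0):
    """DoG receptive field centred at the origin (illustrative widths)."""
    r2 = x ** 2 + y ** 2
    return (np.exp(-r2 / (2 * sigma_c ** 2)) / (2 * np.pi * sigma_c ** 2)
            - np.exp(-r2 / (2 * sigma_s ** 2)) / (2 * np.pi * sigma_s ** 2))

n = 32
coords = np.arange(n)                      # pixel coordinates, unit spacing
rng = np.random.default_rng(0)
s = rng.standard_normal((n, n))            # arbitrary stimulus s[x, y]

def wrap(d):
    """Map displacements onto [-n/2, n/2) for periodic boundaries."""
    return (d + n // 2) % n - n // 2

def response_direct(cx, cy):
    """r(cx, cy) = sum over x, y of D(cx - x, cy - y) s(x, y)."""
    DX, DY = np.meshgrid(wrap(cx - coords), wrap(cy - coords), indexing="ij")
    return np.sum(dog(DX, DY) * s)

# The same responses for every centre at once, as a circular convolution.
KX, KY = np.meshgrid(wrap(coords), wrap(coords), indexing="ij")
kernel = dog(KX, KY)
r_fft = np.real(np.fft.ifft2(np.fft.fft2(kernel) * np.fft.fft2(s)))
```

Each entry `r_fft[cx, cy]` is the response of the cell centred at that pixel, so a population of identical receptive fields tiling the image acts as one spatial filter.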
Transfer functions
Thus a repeated linear receptive field acts like a spatial filter, and can be characterised by its frequency-domain transfer function. (Indeed, much early visual processing is studied in terms of linear systems theory.)
Transfer functions for both DoG and LoG centre-surround models are bandpass. Taking 1D versions:
[Figure: response against frequency (0 to fmax) in two panels. DoG: centre Gaussian, surround Gaussian and their difference. LoG: Gaussian, second derivative ($\omega^2$) and their product.]
This accentuates mid-range spatial frequencies.
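The bandpass shape can be reproduced from the closed-form transforms of the 1D models. This is a sketch under assumed widths (`sigma_c = 1`, `sigma_s = 2`, `sigma = 1`); it uses the fact that a unit-area Gaussian of width $\sigma$ has transfer function $e^{-\sigma^2\omega^2/2}$ and that a negated second derivative multiplies the transfer function by $\omega^2$.

```python
import numpy as np

sigma_c, sigma_s, sigma = 1.0, 2.0, 1.0    # illustrative widths
omega = np.linspace(0.0, 5.0, 5001)        # angular spatial frequency

# DoG: difference of the centre and surround Gaussian transfer functions.
H_dog = np.exp(-0.5 * (sigma_c * omega) ** 2) - np.exp(-0.5 * (sigma_s * omega) ** 2)

# LoG: Gaussian transfer function multiplied by omega^2.
H_log = omega ** 2 * np.exp(-0.5 * (sigma * omega) ** 2)

# Both vanish at zero frequency and at high frequency, peaking in between.
peak_dog = omega[np.argmax(H_dog)]
peak_log = omega[np.argmax(H_log)]

# Closed-form peak locations, for comparison.
peak_dog_exact = np.sqrt(2 * np.log(sigma_s ** 2 / sigma_c ** 2)
                         / (sigma_s ** 2 - sigma_c ** 2))
peak_log_exact = np.sqrt(2.0) / sigma
```

Widening the surround relative to the centre moves the DoG peak to lower frequencies, since the surround then cancels only the slowest spatial variations.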
Transfer functions
Edge detection
Bandpass filters emphasise edges:
[Figure: three panels: original image, DoG responses, thresholded.]
Orientation selectivity
Linear receptive fields – simple cells
Linear response encoding:
\[
r(t_0, s(x, y, t)) = \int_0^\infty d\tau \int dx\, dy\, s(x, y, t_0 - \tau)\, D(x, y, \tau)
\]
For separable receptive fields:
\[
D(x, y, \tau) = D_s(x, y)\, D_t(\tau)
\]
For simple cells:
\[
D_s = \exp\left(-\frac{(x - c_x)^2}{2\sigma_x^2} - \frac{(y - c_y)^2}{2\sigma_y^2}\right) \cos(kx - \phi)
\]
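The simple-cell spatial profile is a Gabor function and is straightforward to evaluate. The sketch below is illustrative only: the function name `gabor` and all default parameter values are assumptions, and the carrier is written as $\cos(kx - \phi)$ exactly as in the formula above.

```python
import numpy as np

def gabor(x, y, cx=0.0, cy=0.0, sigma_x=1.0, sigma_y=2.0, k=2.0, phi=0.0):
    """Simple-cell spatial receptive field D_s(x, y).

    Gaussian envelope centred at (cx, cy), multiplied by a cosine
    carrier of spatial frequency k and phase phi along x.
    All defaults are illustrative assumptions.
    """
    envelope = np.exp(-(x - cx) ** 2 / (2 * sigma_x ** 2)
                      - (y - cy) ** 2 / (2 * sigma_y ** 2))
    return envelope * np.cos(k * x - phi)

# Sample an even (phi = 0) and an odd (phi = pi/2) receptive field.
axis = np.linspace(-6.0, 6.0, 121)
X, Y = np.meshgrid(axis, axis, indexing="ij")
rf_even = gabor(X, Y, phi=0.0)
rf_odd = gabor(X, Y, phi=np.pi / 2)
```

With the field centred at the origin, `phi = 0` gives a profile that is even in x (a bar detector) and `phi = pi/2` gives one that is odd in x (an edge detector).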
Linear response functions – simple cells
Simple cell orientation selectivity
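2D Fourier Transforms
Again, the best way to look at a filter is in the frequency domain, but now we need a 2D transform. Taking the receptive field centred at the origin with $\phi = 0$:
\[
D(x, y) = \exp\left(-\frac{x^2}{2\sigma_x^2} - \frac{y^2}{2\sigma_y^2}\right) \cos(kx)
\]
\begin{align*}
\tilde{D}(\omega_x, \omega_y)
&= \int dx\, dy\, e^{-i\omega_x x} e^{-i\omega_y y} \exp\left(-\frac{x^2}{2\sigma_x^2} - \frac{y^2}{2\sigma_y^2}\right) \cos(kx) \\
&= \int dx\, e^{-i\omega_x x} e^{-x^2/2\sigma_x^2} \cos(kx) \cdot \int dy\, e^{-i\omega_y y} e^{-y^2/2\sigma_y^2} \\
&= \frac{1}{2\pi} \left[ \sqrt{2\pi}\,\sigma_x e^{-\sigma_x^2\omega_x^2/2} \circ \pi\left[\delta(\omega_x - k) + \delta(\omega_x + k)\right] \right] \cdot \sqrt{2\pi}\,\sigma_y e^{-\sigma_y^2\omega_y^2/2} \\
&= \pi\sigma_x\sigma_y \left[ e^{-\frac{1}{2}\left[(\omega_x - k)^2\sigma_x^2 + \omega_y^2\sigma_y^2\right]} + e^{-\frac{1}{2}\left[(\omega_x + k)^2\sigma_x^2 + \omega_y^2\sigma_y^2\right]} \right]
\end{align*}
Here $\circ$ denotes convolution in $\omega_x$, and the factor $1/2\pi$ comes from the convolution theorem under this transform convention.
From this form it is easy to read off the spatial-frequency tuning and its bandwidth, and the orientation tuning and (for homework) its bandwidth.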
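The two-lobe shape of the transform can be checked by direct numerical integration. The sketch below uses assumed parameter values and compares only the tuning shape, normalised to its value at the preferred frequency $(\omega_x, \omega_y) = (k, 0)$, so the overall constant prefactor plays no role in the comparison.

```python
import numpy as np

sigma_x, sigma_y, k = 1.0, 2.0, 3.0          # illustrative values
x = np.linspace(-12.0, 12.0, 481)
y = np.linspace(-24.0, 24.0, 481)
dx, dy = x[1] - x[0], y[1] - y[0]
X, Y = np.meshgrid(x, y, indexing="ij")
D = np.exp(-X ** 2 / (2 * sigma_x ** 2) - Y ** 2 / (2 * sigma_y ** 2)) * np.cos(k * X)

def ft_numeric(wx, wy):
    """Riemann-sum approximation to the 2D Fourier integral of D."""
    return np.sum(D * np.exp(-1j * (wx * X + wy * Y))) * dx * dy

def tuning_analytic(wx, wy):
    """Sum of two Gaussian lobes centred at (+k, 0) and (-k, 0); prefactor omitted."""
    plus = np.exp(-0.5 * ((wx - k) ** 2 * sigma_x ** 2 + wy ** 2 * sigma_y ** 2))
    minus = np.exp(-0.5 * ((wx + k) ** 2 * sigma_x ** 2 + wy ** 2 * sigma_y ** 2))
    return plus + minus

test_points = [(2.0, 0.3), (3.5, -0.2), (0.0, 0.0)]
peak = ft_numeric(k, 0.0).real
numeric = np.array([ft_numeric(wx, wy).real / peak for wx, wy in test_points])
analytic = np.array([tuning_analytic(wx, wy) / tuning_analytic(k, 0.0)
                     for wx, wy in test_points])
```

The lobe width along $\omega_x$ is $1/\sigma_x$ and along $\omega_y$ is $1/\sigma_y$, so a longer receptive field (larger $\sigma_y$) gives sharper orientation tuning.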
Drifting gratings
\[
s(x, y, t) = G + A \cos(kx - \omega t - \phi)
\]
Separable and inseparable response functions
Separable: motion sensitive, but not direction sensitive.
Inseparable: motion sensitive and direction sensitive.
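The difference can be illustrated with a linear space-time receptive field in one spatial dimension. In the sketch below the envelopes, the frequencies and the helper `grating_amplitude` are all assumptions made for illustration. For a linear filter, the response to a unit-contrast drifting grating is a sinusoid in time whose amplitude is the magnitude of the filter projected onto a complex space-time exponential, which is what the helper computes (up to a constant grid factor).

```python
import numpy as np

k = 2.0                        # spatial frequency (assumed)
omega = 2 * np.pi * 4.0        # temporal frequency, 4 Hz (assumed)
x = np.linspace(-6.0, 6.0, 241)
tau = np.linspace(0.0, 0.4, 401)            # time before the response, in seconds
X, T = np.meshgrid(x, tau, indexing="ij")

space_env = np.exp(-X ** 2 / 2.0)
time_env = np.exp(-(T - 0.15) ** 2 / (2 * 0.05 ** 2))

# Separable: a product of a function of x and a function of tau.
D_sep = space_env * np.cos(k * X) * time_env
# Inseparable: the carrier is tilted in the (x, tau) plane.
D_insep = space_env * time_env * np.cos(k * X + omega * T)

def grating_amplitude(D, direction):
    """Response amplitude to the grating cos(k x - direction * omega * t).

    Substituting t -> t0 - tau in the grating gives the space-time phase
    k x + direction * omega * tau used here.
    """
    phase = k * X + direction * omega * T
    return np.abs(np.sum(D * np.exp(1j * phase)))

sep_right, sep_left = grating_amplitude(D_sep, +1), grating_amplitude(D_sep, -1)
insep_right, insep_left = grating_amplitude(D_insep, +1), grating_amplitude(D_insep, -1)
```

The separable filter responds with equal amplitude to both directions, because its temporal factor is real and so contributes complex-conjugate terms of equal magnitude. The tilted, inseparable filter responds much more strongly to one direction than the other.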
Complex cells
Complex cells are sensitive to orientation, but, supposedly, not phase. One model might be (neglecting time)
\begin{align*}
r(s(x, y)) &= \left[ \int dx\, dy\, s(x, y) \exp\left(-\frac{(x - c_x)^2}{2\sigma_x^2} - \frac{(y - c_y)^2}{2\sigma_y^2}\right) \cos(kx) \right]^2 \\
&\quad + \left[ \int dx\, dy\, s(x, y) \exp\left(-\frac{(x - c_x)^2}{2\sigma_x^2} - \frac{(y - c_y)^2}{2\sigma_y^2}\right) \cos(kx - \pi/2) \right]^2
\end{align*}
But many cells do have some residual phase sensitivity. This is quantified by the $f_1/f_0$ ratio (the response modulation at the grating's temporal frequency relative to the mean response).
Stimulus-response functions (and constructive models) for complex cells are still a matter of debate.
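The phase invariance of this quadrature-pair model can be checked numerically. The sketch below assumes a receptive field centred at the origin, illustrative values for the widths and spatial frequency, and static gratings of varying phase at the preferred frequency.

```python
import numpy as np

sigma_x, sigma_y, k = 1.5, 1.5, 2.0          # illustrative values
axis = np.linspace(-8.0, 8.0, 161)
X, Y = np.meshgrid(axis, axis, indexing="ij")
area = (axis[1] - axis[0]) ** 2

envelope = np.exp(-X ** 2 / (2 * sigma_x ** 2) - Y ** 2 / (2 * sigma_y ** 2))
rf_even = envelope * np.cos(k * X)                 # first subunit
rf_odd = envelope * np.cos(k * X - np.pi / 2)      # quadrature subunit

def simple_response(s, rf):
    """Linear response of one subunit to the stimulus s."""
    return np.sum(s * rf) * area

def complex_response(s):
    """Sum of the squared responses of the two quadrature subunits."""
    return simple_response(s, rf_even) ** 2 + simple_response(s, rf_odd) ** 2

phases = np.linspace(0.0, 2 * np.pi, 16, endpoint=False)
gratings = [np.cos(k * X - p) for p in phases]
simple_r = np.array([simple_response(s, rf_even) for s in gratings])
complex_r = np.array([complex_response(s) for s in gratings])
```

The single subunit's response swings between positive and negative values as the grating phase changes, while the summed squared response stays essentially constant.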
Other V1 responses: surround effects
Other V1 responses
◮ end-stopping (hypercomplex)
◮ blobs and colour
◮ . . .
Signal-processing paradigms
1. filtering
2. (efficient) coding
3. feature detection
Information
What does a neural response tell us about a stimulus?
Shannon theory:
◮ Entropy: bits needed to specify an exact stimulus.
◮ Conditional entropy: bits needed to specify the exact stimulus after we see the response.
◮ (Average mutual) information: the difference (information gained from the response).
◮ Mutual information is bounded by the entropy of the response ⇒ maximum entropy encoding and decorrelation.
Discrimination theory:
◮ How accurately (in squared error) can the stimulus be estimated from the response?
◮ The Cramér-Rao bound relates this to the Fisher information – a differential measure of how much the response distribution changes with the stimulus.
◮ Fisher information can often be optimised directly.
The two are linked by rate-distortion theory and by asymptotic (large population) arguments.
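The Shannon quantities listed above can be computed directly for a small discrete example. The joint distribution below is hypothetical, chosen only so that the response is informative but noisy; the helper name `entropy` is likewise an assumption of this sketch.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a discrete distribution (zeros are skipped)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical joint distribution P(s, r): rows are stimuli, columns are responses.
joint = np.array([[0.30, 0.10, 0.00],
                  [0.05, 0.25, 0.05],
                  [0.00, 0.05, 0.20]])

p_s = joint.sum(axis=1)            # marginal over responses
p_r = joint.sum(axis=0)            # marginal over stimuli

H_S = entropy(p_s)                 # bits needed to specify the stimulus
H_R = entropy(p_r)                 # entropy of the response
H_SR = entropy(joint)              # joint entropy
H_S_given_R = H_SR - H_R           # bits still needed after seeing the response
H_R_given_S = H_SR - H_S           # noise entropy

info = H_S - H_S_given_R           # average mutual information
```

The same value is obtained as `H_R - H_R_given_S`, and it cannot exceed `H_R`, which is why a high-entropy response distribution is needed to carry a lot of information.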
Entropy maximisation
\[
I[S; R] = \underbrace{H[R]}_{\text{marginal entropy}} - \underbrace{H[R \mid S]}_{\text{noise entropy}}
\]
If the noise is small and “constant” (independent of the stimulus), the noise entropy is fixed, so maximising the information amounts to maximising the marginal entropy $H[R]$.
Consider a (rate coding) neuron with $r \in [0, r_{\max}]$.
\[
h(r) = -\int_0^{r_{\max}} dr\, p(r) \log p(r)
\]
To maximise the marginal entropy, we add a Lagrange multiplier ($\mu$) to enforce normalisation and then differentiate:
\[
\frac{\delta}{\delta p(r)} \left[ h(r) - \mu \int_0^{r_{\max}} dr\, p(r) \right] =
\begin{cases}
-\log p(r) - 1 - \mu & r \in [0, r_{\max}] \\
0 & \text{otherwise}
\end{cases}
\]
Setting this to zero gives $\log p(r) = -1 - \mu$, which does not depend on $r$
⇒ $p(r) = \text{const} = 1/r_{\max}$ for $r \in [0, r_{\max}]$
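The result can be checked numerically by comparing the differential entropy of the uniform density with that of other densities on the same interval. The value of `r_max` and the two alternative densities below are arbitrary choices for illustration.

```python
import numpy as np

r_max = 100.0                      # illustrative maximum firing rate
n = 10000
dr = r_max / n
r = (np.arange(n) + 0.5) * dr      # bin midpoints on [0, r_max]

def normalise(f):
    """Scale a non-negative function so that it integrates to one."""
    return f / (np.sum(f) * dr)

def diff_entropy(p):
    """Differential entropy -integral of p log p, in nats."""
    mask = p > 0
    return -np.sum(p[mask] * np.log(p[mask])) * dr

p_uniform = normalise(np.ones(n))
p_exponential = normalise(np.exp(-r / 20.0))   # truncated exponential
p_ramp = normalise(r)                          # linearly increasing density

h_uniform = diff_entropy(p_uniform)
h_exponential = diff_entropy(p_exponential)
h_ramp = diff_entropy(p_ramp)
```

The uniform density attains `log(r_max)` nats, and both alternatives fall below it, consistent with $p(r) = \text{const}$ being the maximum-entropy solution under the range constraint alone.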