Perception as Signal Processing – October 16, 2018 – PowerPoint presentation transcript


slide-1
SLIDE 1

Perception as Signal Processing

October 16, 2018

slide-2
SLIDES 2-5

What is theory for?

To answer why? There are two sorts of answer in the context of neuroscience.

Constructive or mechanistic – why is the sky blue?

◮ provides a mechanistic understanding of observations
◮ links structure to function
◮ helps to codify, organise and relate experimental findings

Normative or teleological – why do we see light between 390 and 700 nm?

◮ provides an understanding of the purpose of function
◮ only sensible in the context of evolutionary selection

slide-6
SLIDE 6

Sensation and Perception

Two dominant ways of thinking about sensory systems and perception.

Signal processing – falls between normative and mechanistic:

◮ a succession of filtering and feature-extraction stages that arrives at a 'detection' or 'recognition' output
◮ dominated by feed-forward metaphors
◮ temporal processing often limited to integration
◮ some theories may incorporate local recurrence and also feedback for feature selection or attention
◮ behavioural and neural theory is dominated by information-like quantities

Inference – strongly normative:

◮ parse sensory input to work out the configuration of the world
◮ fundamental roles for lateral interaction, feedback and dynamical state
◮ behavioural theory is well understood and powerful; neural underpinnings are little understood

slide-7
SLIDE 7

Signal-processing paradigms

1. filtering
2. (efficient) coding
3. feature detection

slide-9
SLIDE 9

The eye and retina

slide-10
SLIDE 10

Centre-surround receptive fields

slide-11
SLIDE 11

Centre-surround models

Centre-surround receptive fields are commonly described by one of two equations, giving the scaled response to a point of light shone at retinal location (x, y). A difference-of-Gaussians (DoG) model:

D_{\mathrm{DoG}}(x, y) = \frac{1}{2\pi\sigma_c^2} \exp\!\left(-\frac{(x - c_x)^2 + (y - c_y)^2}{2\sigma_c^2}\right) - \frac{1}{2\pi\sigma_s^2} \exp\!\left(-\frac{(x - c_x)^2 + (y - c_y)^2}{2\sigma_s^2}\right)

[Figure: 1D cross-sections of the centre and surround Gaussians and the resulting DoG profile.]
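A minimal numerical sketch of the DoG profile above. The widths sigma_c = 1, sigma_s = 2 and the grid are illustrative choices, not values from the lecture; the checks confirm the expected centre-surround structure (positive centre, negative surround, near-zero integral).

```python
import numpy as np

def dog(x, y, sigma_c=1.0, sigma_s=2.0, cx=0.0, cy=0.0):
    """Difference-of-Gaussians centre-surround profile."""
    r2 = (x - cx) ** 2 + (y - cy) ** 2
    centre = np.exp(-r2 / (2 * sigma_c ** 2)) / (2 * np.pi * sigma_c ** 2)
    surround = np.exp(-r2 / (2 * sigma_s ** 2)) / (2 * np.pi * sigma_s ** 2)
    return centre - surround

xs = np.linspace(-10, 10, 201)
X, Y = np.meshgrid(xs, xs)
D = dog(X, Y)

centre_value = D[100, 100]        # response at the RF centre (x = y = 0)
surround_value = dog(0.0, 3.0)    # a point in the inhibitory surround
total = D.sum() * 0.1 * 0.1       # ~ integral over the patch (each Gaussian is normalised)
```

Because both Gaussians are normalised, the DoG integrates to (approximately) zero: the filter rejects uniform illumination.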

slide-12
SLIDE 12

Centre-surround models

. . . or a Laplacian-of-Gaussian (LoG) model:

D_{\mathrm{LoG}}(x, y) = -\nabla^2 \left[\frac{1}{2\pi\sigma^2} \exp\!\left(-\frac{(x - c_x)^2 + (y - c_y)^2}{2\sigma^2}\right)\right]

[Figure: 1D cross-sections of the LoG profile.]

slide-13
SLIDE 13

Linear receptive fields

The linear-like response apparent in the prototypical experiments can be generalised to give a predicted firing rate in response to an arbitrary stimulus s(x, y):

r(c_x, c_y; s(x, y)) = \int dx\, dy\; D_{c_x, c_y}(x, y)\, s(x, y)

The receptive-field centres (c_x, c_y) are distributed over visual space. If we let D() represent the RF function centred at 0, instead of at (c_x, c_y), we can write:

r(c_x, c_y; s(x, y)) = \int dx\, dy\; D(c_x - x, c_y - y)\, s(x, y)

which looks like a convolution.
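A 1D sketch of this identity, with an illustrative Gaussian RF profile: evaluating r(c) = Σ_x D(c − x) s(x) centre by centre gives the same responses as one discrete convolution.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
s = rng.normal(size=N)                 # an arbitrary 1D "stimulus"
x = np.arange(N)

def D(u, sigma=3.0):
    # RF profile centred at 0 (illustrative Gaussian weighting)
    return np.exp(-u ** 2 / (2 * sigma ** 2))

# direct evaluation: r(c) = sum_x D(c - x) s(x), one sum per RF centre c
r_direct = np.array([np.sum(D(c - x) * s) for c in range(N)])

# the same responses as a single discrete convolution
kernel = D(np.arange(-(N - 1), N))     # D evaluated on every possible lag c - x
r_conv = np.convolve(s, kernel)[N - 1:2 * N - 1]
```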

slide-14
SLIDE 14

Transfer functions

Thus a repeated linear receptive field acts like a spatial filter, and can be characterised by its frequency-domain transfer function. (Indeed, much early visual processing is studied in terms of linear systems theory.)

Transfer functions for both DoG and LoG centre-surround models are bandpass. Taking 1D versions:

[Figure, left: centre Gaussian, surround Gaussian and their difference – the DoG frequency response.]
[Figure, right: Gaussian, second derivative (\omega^2) and their product – the LoG frequency response.]

This accentuates mid-range spatial frequencies.
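A quick check of the bandpass claim for the 1D DoG: each normalised spatial Gaussian transforms to a unit-height Gaussian in frequency, so the transfer function is H(ω) = exp(−σ_c²ω²/2) − exp(−σ_s²ω²/2). The widths below are illustrative.

```python
import numpy as np

sigma_c, sigma_s = 1.0, 2.0
omega = np.linspace(0, 5, 501)
H = np.exp(-sigma_c ** 2 * omega ** 2 / 2) - np.exp(-sigma_s ** 2 * omega ** 2 / 2)
peak_freq = omega[np.argmax(H)]   # bandpass: zero at DC, maximum at mid frequencies
```

H vanishes at ω = 0 (no response to uniform light), peaks at an intermediate frequency, and decays again at high frequencies.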

slide-15
SLIDE 15

Transfer functions

slide-16
SLIDE 16

Edge detection

Bandpass filters emphasise edges:

[Figure: original image; DoG responses; thresholded responses.]
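A 1D sketch of why bandpass filtering emphasises edges: filtering a luminance step with a (zero-mean) DoG kernel gives a response concentrated at the step and essentially zero in the flat regions. Kernel widths are illustrative.

```python
import numpy as np

s = np.concatenate([np.zeros(50), np.ones(50)])       # luminance step at index 50
x = np.arange(-10, 11)
dog = (np.exp(-x ** 2 / 2.0) / np.sqrt(2 * np.pi)
       - np.exp(-x ** 2 / 8.0) / np.sqrt(8 * np.pi))  # sigma_c = 1, sigma_s = 2
r = np.convolve(s, dog, mode='same')

edge_response = np.abs(r[45:56]).max()    # near the edge
flat_response = np.abs(r[20:31]).max()    # deep inside the flat region
```

Thresholding |r| then isolates the edge locations, as in the image example above.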

slide-17
SLIDE 17

Orientation selectivity

slide-18
SLIDE 18

Linear receptive fields – simple cells

Linear response encoding:

r(t_0; s(x, y, t)) = \int dx\, dy\, d\tau\; s(x, y, t_0 - \tau)\, D(x, y, \tau)

For separable receptive fields: D(x, y, \tau) = D_s(x, y)\, D_t(\tau). For simple cells:

D_s(x, y) = \exp\!\left(-\frac{(x - c_x)^2}{2\sigma_x^2} - \frac{(y - c_y)^2}{2\sigma_y^2}\right) \cos(kx - \phi)
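A numerical sketch of the orientation selectivity this Gabor-like D_s implies: it responds strongly to a grating at its preferred orientation and frequency, and weakly to the orthogonal grating. The parameters (σ_x = σ_y = 1, k = 2, φ = 0) are illustrative.

```python
import numpy as np

xs = np.linspace(-5, 5, 101)
X, Y = np.meshgrid(xs, xs)
k = 2.0
Ds = np.exp(-(X ** 2 + Y ** 2) / 2.0) * np.cos(k * X)   # Gabor-like simple-cell RF

r_pref = np.sum(Ds * np.cos(k * X))   # grating at the preferred orientation
r_orth = np.sum(Ds * np.cos(k * Y))   # orthogonal grating, same frequency
```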
slide-19
SLIDE 19

Linear response functions – simple cells

slide-20
SLIDE 20

Simple cell orientation selectivity

slide-21
SLIDE 21

2D Fourier Transforms

Again, the best way to look at a filter is in the frequency domain, but now we need a 2D transform.

D(x, y) = \exp\!\left(-\frac{x^2}{2\sigma_x^2} - \frac{y^2}{2\sigma_y^2}\right) \cos(kx - \phi)

\tilde D(\omega_x, \omega_y) = \int dx\, dy\; e^{-i\omega_x x} e^{-i\omega_y y} \exp\!\left(-\frac{x^2}{2\sigma_x^2} - \frac{y^2}{2\sigma_y^2}\right) \cos(kx - \phi)

= \int dx\; e^{-i\omega_x x} e^{-x^2/2\sigma_x^2} \cos(kx - \phi) \;\cdot\; \int dy\; e^{-i\omega_y y} e^{-y^2/2\sigma_y^2}

= \sqrt{2\pi}\,\sigma_x \left[ e^{-\sigma_x^2 \omega_x^2/2} \circ \pi\!\left(e^{-i\phi}\delta(\omega_x - k) + e^{i\phi}\delta(\omega_x + k)\right) \right] \cdot \sqrt{2\pi}\,\sigma_y\, e^{-\sigma_y^2 \omega_y^2/2}

= 2\pi^2 \sigma_x \sigma_y \left( e^{-i\phi} e^{-\frac{1}{2}\left[(\omega_x - k)^2 \sigma_x^2 + \omega_y^2 \sigma_y^2\right]} + e^{i\phi} e^{-\frac{1}{2}\left[(\omega_x + k)^2 \sigma_x^2 + \omega_y^2 \sigma_y^2\right]} \right)

(where \circ denotes convolution). Easy to read off spatial-frequency tuning and bandwidth; orientation tuning and (for homework) bandwidth.
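A numerical check of this result: the power spectrum of a sampled Gabor should peak near (ω_x, ω_y) = (±k, 0). Grid sizes and parameters below are illustrative.

```python
import numpy as np

n, dx = 256, 0.1
xs = (np.arange(n) - n // 2) * dx
X, Y = np.meshgrid(xs, xs)
k, sx, sy = 2.0, 1.0, 1.5
D = np.exp(-X ** 2 / (2 * sx ** 2) - Y ** 2 / (2 * sy ** 2)) * np.cos(k * X)

F = np.abs(np.fft.fftshift(np.fft.fft2(D)))
w = 2 * np.pi * np.fft.fftshift(np.fft.fftfreq(n, d=dx))   # angular frequencies
iy, ix = np.unravel_index(np.argmax(F), F.shape)
peak_wx, peak_wy = w[ix], w[iy]   # location of the spectral peak
```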

slide-22
SLIDE 22

Drifting gratings

s(x, y, t) = G + A cos(kx − ωt − φ)

slide-23
SLIDE 23

Separable and inseparable response functions

Separable: motion sensitive, but not direction sensitive.
Inseparable: motion sensitive and direction sensitive.

slide-24
SLIDE 24

Complex cells

Complex cells are sensitive to orientation but, supposedly, not to phase. One model might be (neglecting time):

r(s(x, y)) = \left[\int dx\, dy\; s(x, y) \exp\!\left(-\frac{(x - c_x)^2}{2\sigma_x^2} - \frac{(y - c_y)^2}{2\sigma_y^2}\right)\cos(kx)\right]^2 + \left[\int dx\, dy\; s(x, y) \exp\!\left(-\frac{(x - c_x)^2}{2\sigma_x^2} - \frac{(y - c_y)^2}{2\sigma_y^2}\right)\cos(kx - \pi/2)\right]^2

But many cells do have some residual phase sensitivity, quantified by the f_1/f_0 ratio. Stimulus-response functions (and constructive models) for complex cells are still a matter of debate.
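A sketch of this "energy model": squaring and summing a quadrature pair of Gabor filters gives a response that is nearly invariant to grating phase, while a single simple-cell filter modulates strongly (even changing sign) with phase. All parameter values are illustrative.

```python
import numpy as np

xs = np.linspace(-5, 5, 101)
X, Y = np.meshgrid(xs, xs)
k = 2.0
env = np.exp(-(X ** 2 + Y ** 2) / 2.0)
g_even = env * np.cos(k * X)
g_odd = env * np.cos(k * X - np.pi / 2)     # quadrature pair

def complex_cell(s):
    return np.sum(g_even * s) ** 2 + np.sum(g_odd * s) ** 2

def simple_cell(s):
    return np.sum(g_even * s)

phases = np.linspace(0, 2 * np.pi, 8, endpoint=False)
rc = np.array([complex_cell(np.cos(k * X - p)) for p in phases])
rs = np.array([simple_cell(np.cos(k * X - p)) for p in phases])
```

rc is nearly constant across phases; rs swings between large positive and large negative values.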

slide-25
SLIDE 25

Other V1 responses: surround effects

slide-26
SLIDE 26

Other V1 responses

◮ end-stopping (hypercomplex)
◮ blobs and colour
◮ . . .

slide-27
SLIDE 27

Signal-processing paradigms

1. filtering
2. (efficient) coding
3. feature detection

slide-28
SLIDE 28

Information

What does a neural response tell us about a stimulus?

Shannon theory:

◮ Entropy: bits needed to specify an exact stimulus.
◮ Conditional entropy: bits needed to specify the exact stimulus after we see the response.
◮ (Average mutual) information: the difference (information gained from the response).
◮ Mutual information is bounded by the entropy of the response ⇒ maximum-entropy encoding and decorrelation.

Discrimination theory:

◮ How accurately (squared error) can the stimulus be estimated from the response?
◮ The Cramér–Rao bound relates this to the Fisher information – a differential measure of how much the response distribution changes with the stimulus.
◮ Fisher information can often be optimised directly.

Linked by rate-distortion theory and by asymptotic (large-population) arguments.

slide-29
SLIDES 29-34

Entropy maximisation

I[S; R] = \underbrace{H[R]}_{\text{marginal entropy}} - \underbrace{H[R \mid S]}_{\text{noise entropy}}

If the noise is small and "constant" ⇒ maximise the marginal entropy H[R].

Consider a (rate-coding) neuron with r \in [0, r_{\max}]:

h(r) = -\int_0^{r_{\max}} dr\; p(r) \log p(r)

To maximise the marginal entropy, we add a Lagrange multiplier (\mu) to enforce normalisation and then differentiate:

\frac{\delta}{\delta p(r)} \left[ h(r) - \mu \int_0^{r_{\max}} dr\; p(r) \right] = -\log p(r) - 1 - \mu \qquad r \in [0, r_{\max}]

\Rightarrow\; p(r) = \text{const for } r \in [0, r_{\max}], \text{ i.e. } p(r) = \begin{cases} 1/r_{\max} & r \in [0, r_{\max}] \\ 0 & \text{otherwise} \end{cases}
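A discretised sanity check of this result: on [0, r_max] the uniform density attains entropy log(r_max), and a peaked density attains less. The exponential comparison density is an illustrative choice.

```python
import numpy as np

def entropy(p, dr):
    p = p / (p.sum() * dr)                    # normalise to a density
    return -np.sum(dr * p * np.log(p))

r_max, n = 10.0, 1000
dr = r_max / n
r = np.linspace(dr / 2, r_max - dr / 2, n)

h_uniform = entropy(np.ones(n), dr)           # should equal log(r_max)
h_peaked = entropy(np.exp(-r / 2.0), dr)      # an exponentially decaying density
```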
slide-35
SLIDE 35

Histogram Equalisation

Suppose r = \tilde s + \eta, where \eta represents a (relatively small) source of noise, and consider a deterministic encoding \tilde s = f(s). How do we ensure that p(r) = 1/r_{\max}?

\frac{1}{r_{\max}} = p(r) \approx p(\tilde s) = \frac{p(s)}{f'(s)} \quad\Rightarrow\quad f'(s) = r_{\max}\, p(s) \quad\Rightarrow\quad f(s) = r_{\max} \int_{-\infty}^{s} ds'\; p(s')

[Figure: stimulus density p(s) and the resulting cumulative encoding \tilde s = f(s).]
slide-36
SLIDE 36

Histogram Equalisation

Laughlin (1981)

slide-37
SLIDES 37-41

Decorrelation at the retina

Atick and Redlich (1992) argued that the retina decorrelates natural spatial statistics. RGCs exhibit roughly linear (centre-surround) processing:

r_a - \langle r_a \rangle = \int dx \underbrace{D_s(x - a)}_{\text{filter}} \underbrace{s(x)}_{\text{stimulus}}

Therefore the correlation (covariance) between cells is

Q_r(a, b) = \left\langle \int dx\, dy\; D_s(x - a)\, D_s(y - b)\, s(x)\, s(y) \right\rangle = \int dx\, dy\; D_s(x - a)\, D_s(y - b) \underbrace{\langle s(x)\, s(y) \rangle}_{Q_s(x, y)}

Using (spatial) stationarity, we can transform to the Fourier domain:

\tilde Q_r(k) = |\tilde D_s(k)|^2\, \tilde Q_s(k)

and thus output decorrelation requires

|\tilde D_s(k)|^2 \propto \frac{1}{\tilde Q_s(k)}
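A sketch of this whitening condition in 1D: drawing a stationary signal with a 1/f²-like power spectrum Q_s and filtering it with |D̃(k)|² = 1/Q̃_s(k) flattens the output spectrum. Parameters and spectral bands are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4096
k = 2 * np.pi * np.fft.fftfreq(n)
Qs = 1.0 / (k ** 2 + 0.1 ** 2)                 # 1/f^2-like input power spectrum

white = rng.normal(size=n)
s = np.real(np.fft.ifft(np.sqrt(Qs) * np.fft.fft(white)))   # correlated signal

Dk = 1.0 / np.sqrt(Qs)                          # whitening filter |D(k)|^2 = 1/Qs
r = np.real(np.fft.ifft(Dk * np.fft.fft(s)))

Ps = np.abs(np.fft.fft(s)) ** 2                 # input spectrum: peaked at low k
Pr = np.abs(np.fft.fft(r)) ** 2                 # output spectrum: flat
low_in, high_in = Ps[1:101].mean(), Ps[2000:2100].mean()
low_out, high_out = Pr[1:101].mean(), Pr[2000:2100].mean()
```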
slide-42
SLIDES 42-45

Decorrelation at the retina

Spatial correlations of natural images fall off as f^{-2}:

\tilde Q_s(k) \propto \frac{1}{|k|^2 + k_0^2}

and the optical filter of the eye introduces (crudely) a low-pass term \propto e^{-\alpha|k|}. So decorrelation requires

|\tilde D_s(k)|^2 \propto \left(|k|^2 + k_0^2\right) e^{-\alpha|k|}

But: not all input is signal. Photodetection introduces noise. Therefore, cascade linear filters:

s + \eta \;\xrightarrow{\;\tilde D_\eta\;}\; \hat s \;\xrightarrow{\;\tilde D_s\;}\; r \qquad\text{with}\qquad \tilde D_\eta(k) = \frac{\tilde Q_s(k)}{\tilde Q_s(k) + \tilde Q_\eta(k)} \quad \text{(Wiener filter)}

Thus the combined RGC filter is predicted to be

|\tilde D_s(k)|\, \tilde D_\eta(k) \propto \sqrt{\left(|k|^2 + k_0^2\right) e^{-\alpha|k|}}\; \frac{\tilde Q_s(k)}{\tilde Q_s(k) + \tilde Q_\eta(k)}
slide-46
SLIDE 46

Decorrelation at the retina

slide-47
SLIDE 47

Decorrelation at the retina

slide-48
SLIDE 48

Tuning curves

We often consider the way that the firing rate of a cell, r, represents a single (possibly multidimensional) stimulus value s: r = f(s). Even if s and r are embedded in time series, we assume:

1. that coding is instantaneous (with a fixed lag), and
2. that r (and therefore s) is constant over a short time ∆.

The function f(s) is known as a tuning curve.

slide-49
SLIDE 49

Tuning curves

Commonly assumed mathematical forms for (1D) tuning curves:

◮ Gaussian: r_0 + r_{\max} \exp\!\left(-\frac{1}{2\sigma^2}(x - x_{\text{pref}})^2\right)
◮ (Thresholded) ramp: r_0 + \Theta(x - x_{\text{thr}})\, \rho \cdot (x - x_{\text{thr}}) (saturating at r_{\max})
◮ Cosine: r_0 + r_{\max} \cos(\theta - \theta_{\text{pref}})
◮ Wrapped Gaussian: r_0 + r_{\max} \sum_n \exp\!\left(-\frac{1}{2\sigma^2}(\theta - \theta_{\text{pref}} - 2\pi n)^2\right)
◮ von Mises ("circular Gaussian"): r_0 + r_{\max} \exp\!\left[\kappa \cos(\theta - \theta_{\text{pref}})\right]
◮ Periodic (grid): f(s) = f_1(\sin(2\pi s / \lambda))
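Two of these forms as functions, with illustrative parameter values; the checks confirm the peak at the preferred value and, for the von Mises curve, periodicity in θ.

```python
import numpy as np

def gaussian_tc(x, r0=2.0, rmax=50.0, xpref=0.0, sigma=0.5):
    return r0 + rmax * np.exp(-(x - xpref) ** 2 / (2 * sigma ** 2))

def von_mises_tc(theta, r0=2.0, rmax=5.0, theta_pref=0.0, kappa=2.0):
    return r0 + rmax * np.exp(kappa * np.cos(theta - theta_pref))

peak = gaussian_tc(0.0)                                     # r0 + rmax at x_pref
wrap = von_mises_tc(0.3) - von_mises_tc(0.3 + 2 * np.pi)    # should vanish (periodic)
```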

slide-50
SLIDE 50

Decoding – the Cricket cercal system

r_a(s) = r_a^{\max} \left[\cos(\theta - \theta_a)\right]_+ = r_a^{\max} \left[c_a^{\mathsf{T}} v\right]_+

with c_1^{\mathsf{T}} c_2 = 0, c_3 = -c_1, c_4 = -c_2. So, writing \tilde r_a = r_a / r_a^{\max}:

\begin{pmatrix} \tilde r_1 - \tilde r_3 \\ \tilde r_2 - \tilde r_4 \end{pmatrix} = \begin{pmatrix} c_1^{\mathsf{T}} \\ c_2^{\mathsf{T}} \end{pmatrix} v

v = \begin{pmatrix} c_1 & c_2 \end{pmatrix} \begin{pmatrix} \tilde r_1 - \tilde r_3 \\ \tilde r_2 - \tilde r_4 \end{pmatrix} = (\tilde r_1 - \tilde r_3)\, c_1 + (\tilde r_2 - \tilde r_4)\, c_2 = \sum_a \tilde r_a c_a

This is called population vector decoding.
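A sketch of this decoder for four cercal-like interneurons with preferred directions at 45°, 135°, 225° and 315° (illustrative, but matching the orthogonal-pair structure above): the population vector recovers a unit wind vector exactly.

```python
import numpy as np

thetas = np.deg2rad([45.0, 135.0, 225.0, 315.0])
C = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)   # rows are the c_a

def responses(v):
    return np.maximum(C @ v, 0.0)            # r~_a = [c_a . v]_+

def pop_vector(r):
    return (r[:, None] * C).sum(axis=0)      # v_hat = sum_a r~_a c_a

theta_wind = 0.3                              # wind direction (radians)
v = np.array([np.cos(theta_wind), np.sin(theta_wind)])
v_hat = pop_vector(responses(v))
theta_hat = np.arctan2(v_hat[1], v_hat[0])
```

Recovery is exact here because the rectification leaves an orthonormal pair of active cells for any direction.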

slide-51
SLIDE 51

Motor cortex (simplified)

Cosine tuning, randomly distributed preferred directions. In general, population vector decoding works for:

◮ cosine tuning
◮ Cartesian or dense (tight) sets of preferred directions

But:

◮ is it optimal?
◮ does it generalise (e.g. to Gaussian tuning curves)?
◮ how accurate is it?

slide-52
SLIDE 52

Measuring the potential quality of a representation

Consider a (one-dimensional) stimulus that takes on continuous values (e.g. angle):

◮ contrast
◮ orientation
◮ motion direction
◮ movement speed

Suppose a neuron fires n spikes in response to stimulus s according to some distribution P(n \mid f(s)\Delta). Given an observation of n, how well can we estimate s?

slide-53
SLIDE 53

Cramér–Rao bound

Suppose the neural response can be described by a probability distribution P(r \mid s). The Fisher information measures how this distribution changes with s:

J(s^*) = \left\langle -\frac{d^2 \log P(r \mid s)}{ds^2} \right\rangle_{s^*} = \left\langle \left( \frac{d \log P(r \mid s)}{ds} \right)^{\!2} \right\rangle_{s^*}

The Cramér–Rao bound states that any unbiased estimator \hat s(\{n_i\}) of s will have the property that

\left\langle (\hat s(\{n_i\}) - s^*)^2 \right\rangle_{\{n_i\} \mid s^*} \ge \frac{1}{J(s^*)}

Thus, the Fisher information gives a lower bound on the variance of any unbiased estimator.

[For estimators with bias b(s^*) = \langle \hat s(\{n_i\}) \rangle - s^* the bound is:

\left\langle (\hat s(\{n_i\}) - s^*)^2 \right\rangle_{\{n_i\} \mid s^*} \ge \frac{(1 + b'(s^*))^2}{J(s^*)} + b^2(s^*)]

The Fisher information is the most common tool used to analyse optimality in populations.

slide-54
SLIDES 54-57

Fisher info and tuning curves

n = r\Delta + \text{noise}; \quad r = f(s)

\Rightarrow\; J(s^*) = \left\langle \left( \frac{d}{ds}\bigg|_{s^*} \log P(n \mid s) \right)^{\!2} \right\rangle_{s^*} = \left\langle \left( \frac{d}{d(r\Delta)}\bigg|_{f(s^*)} \log P(n \mid r\Delta)\; \Delta f'(s^*) \right)^{\!2} \right\rangle_{s^*} = J_{\text{noise}}(r\Delta)\, \Delta^2 f'(s^*)^2

[Figure: firing rate f(s) and Fisher information J(s) plotted against s.]

slide-58
SLIDES 58-65

Fisher info for Poisson neurons

For Poisson neurons

P(n \mid r\Delta) = e^{-r\Delta} \frac{(r\Delta)^n}{n!}

so

J_{\text{noise}}[r\Delta] = \left\langle \left( \frac{d}{d(r\Delta)}\bigg|_{r^*\Delta} \log P(n \mid r\Delta) \right)^{\!2} \right\rangle_{s^*}

= \left\langle \left( \frac{d}{d(r\Delta)}\bigg|_{r^*\Delta} \left[ -r\Delta + n \log r\Delta - \log n! \right] \right)^{\!2} \right\rangle_{s^*}

= \left\langle \left( -1 + \frac{n}{r^*\Delta} \right)^{\!2} \right\rangle_{s^*}

= \frac{\left\langle (n - r^*\Delta)^2 \right\rangle_{s^*}}{(r^*\Delta)^2}

= \frac{r^*\Delta}{(r^*\Delta)^2} = \frac{1}{r^*\Delta}

[Not surprising: \langle n \rangle = r^*\Delta and V[n] = r^*\Delta.]

And, referred back to the stimulus value: J[s^*] = f'(s^*)^2 \Delta / f(s^*).
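A Monte Carlo check of the last step: for Poisson counts, n is an unbiased estimator of r*∆ and its variance equals the Cramér–Rao limit 1/J_noise = r*∆, so the bound is saturated. The rate and window below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
rate, dt = 20.0, 0.5            # illustrative r* and Delta
mu = rate * dt                  # mean count r* Delta (here 10 spikes)

n = rng.poisson(mu, size=200_000)
mean_hat = n.mean()             # unbiased: ~ r* Delta
var_hat = n.var()               # ~ r* Delta = 1 / J_noise
crlb = mu
```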

slide-66
SLIDE 66

Population Fisher Info

Fisher informations for independent random variates add:

J_{\mathbf{n}}(s) = \left\langle -\frac{d^2}{ds^2} \log P(\mathbf{n} \mid s) \right\rangle = \left\langle -\frac{d^2}{ds^2} \sum_a \log P(n_a \mid s) \right\rangle = \sum_a \left\langle -\frac{d^2}{ds^2} \log P(n_a \mid s) \right\rangle = \sum_a J_{n_a}(s)

= \Delta \sum_a \frac{f_a'(s)^2}{f_a(s)} \quad \text{[for Poisson cells]}

slide-67
SLIDE 67

Optimal tuning properties

A considerable amount of work has been done in recent years on finding optimal properties of tuning curves for rate-based population codes. Here, we reproduce one such argument (from Zhang and Sejnowski, 1999).

Consider a population of cells that codes the value of a D-dimensional stimulus, s. Let the a-th cell emit r spikes in an interval τ with a probability distribution that is conditionally independent of the other cells (given s) and has the form

P_a(r \mid s, \tau) = S(r, f^a(s), \tau).

Also let the tuning curve of the a-th cell, f^a(s), be circularly symmetric:

f^a(s) = F \cdot \phi\!\left((\xi^a)^2\right); \qquad (\xi^a)^2 = \sum_i^D (\xi_i^a)^2; \qquad \xi_i^a = \frac{s_i - c_i^a}{\sigma},

where F is a maximal rate and the function φ is monotonically decreasing. The parameters c^a and σ give the centre of the a-th tuning curve and the (common) width.

slide-68
SLIDE 68

Optimal tuning properties

Now, the (ij)-th term in the Fisher information matrix for the a-th cell is (by definition)

J^a_{ij}(s) = E\left[ \frac{\partial}{\partial s_i} \log P_a(r \mid s, \tau)\; \frac{\partial}{\partial s_j} \log P_a(r \mid s, \tau) \right]

Applying the chain rule repeatedly, we find that

\frac{\partial}{\partial s_i} \log P_a(r \mid s, \tau) = \frac{1}{S(r, f^a(s), \tau)} \frac{\partial}{\partial s_i} S(r, f^a(s), \tau)

= \frac{S^{(2)}(r, f^a(s), \tau)}{S(r, f^a(s), \tau)} \frac{\partial}{\partial s_i} f^a(s) \quad \text{(where } S^{(2)} \text{ indicates differentiation with respect to the second argument)}

= \frac{S^{(2)}(r, f^a(s), \tau)}{S(r, f^a(s), \tau)}\, F \phi'\!\left((\xi^a)^2\right) \frac{\partial}{\partial s_i} \sum_i (\xi^a_i)^2

= \frac{S^{(2)}(r, f^a(s), \tau)}{S(r, f^a(s), \tau)}\, F \phi'\!\left((\xi^a)^2\right) \frac{2(s_i - c^a_i)}{\sigma^2}

slide-69
SLIDE 69

Optimal tuning properties

So,

J^a_{ij}(s) = E\left[ \left( \frac{S^{(2)}(r, f^a(s), \tau)}{S(r, f^a(s), \tau)} \right)^{\!2} \right] 4F^2\, \phi'\!\left((\xi^a)^2\right)^2 \frac{(s_i - c^a_i)(s_j - c^a_j)}{\sigma^4} = A_\phi\!\left((\xi^a)^2, F, \tau\right) \frac{(s_i - c^a_i)(s_j - c^a_j)}{\sigma^4}

where the function A_φ does not depend explicitly on σ.

slide-70
SLIDE 70

Optimal tuning properties

We assumed neurons were independent ⇒ Fisher information adds. Approximate the sum by an integral over the tuning-curve centres, assuming a uniform density η of neurons:

J_{ij}(s) = \sum_a J^a_{ij}(s) \approx \int_{-\infty}^{+\infty} dc^a_1 \cdots \int_{-\infty}^{+\infty} dc^a_D\; \eta\, J^a_{ij}(s)

= \int_{-\infty}^{+\infty} dc^a_1 \cdots \int_{-\infty}^{+\infty} dc^a_D\; \eta\, A_\phi\!\left((\xi^a)^2, F, \tau\right) \frac{(s_i - c^a_i)(s_j - c^a_j)}{\sigma^4}

Change variables: c^a_i \to \xi^a_i:

= \int_{-\infty}^{+\infty} \sigma\, d\xi^a_1 \cdots \int_{-\infty}^{+\infty} \sigma\, d\xi^a_D\; \eta\, A_\phi\!\left((\xi^a)^2, F, \tau\right) \frac{\xi^a_i \xi^a_j}{\sigma^2} = \frac{\sigma^D}{\sigma^2}\, \eta \int_{-\infty}^{+\infty} d\xi^a_1 \cdots \int_{-\infty}^{+\infty} d\xi^a_D\; A_\phi\!\left((\xi^a)^2, F, \tau\right) \xi^a_i \xi^a_j

Now, if i ≠ j, the integral is odd in both \xi^a_i and \xi^a_j, and thus vanishes. If i = j, then the integral has some value D \cdot K_\phi(F, \tau, D), independent of σ. Thus J_{ii} = \sigma^{D-2}\, \eta\, D K_\phi(F, \tau, D), and the total Fisher information is proportional to \sigma^{D-2}.

slide-71
SLIDES 71-72

Optimal tuning properties

Thus the optimal tuning width depends on the stimulus dimension through the interplay of two effects:

slope: f'(s) \propto \sigma^{-1} \Rightarrow J^a(s) \propto \sigma^{-2} (per cell)
number of cells: N(s) \propto \sigma^D \Rightarrow J(s) \propto \sigma^{D-2} (population)

◮ D = 1 ⇒ σ → 0 (although a lower limit is encountered when the tuning width falls below the inter-cell spacing)
◮ D = 2 ⇒ J independent of σ
◮ D > 2 ⇒ σ → ∞ (actual limit set by valid stimuli)

◮ If circular symmetry is relaxed to allow different scales in each dimension for different cells, then the solution is a Cartesian code (narrow in one dimension, wide in the others).
◮ The single-bump constraint is essential to the analysis: Fisher information cannot address ambiguity between bumps.
◮ Single coded value – analysing multiple values or distributions is more complex.
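A numerical check of the D = 1 case: with Gaussian tuning curves on a dense grid of centres and Poisson spiking, the population Fisher information J = ∆ Σ_a f'_a(s)²/f_a(s) scales as σ^{D−2} = σ^{−1}, so halving σ doubles J. Parameter values are illustrative.

```python
import numpy as np

def pop_fisher(sigma, F=10.0, spacing=0.05, extent=20.0, dt=1.0):
    c = np.arange(-extent, extent, spacing)       # tuning-curve centres
    f = F * np.exp(-c ** 2 / (2 * sigma ** 2))    # firing rates at s = 0
    fprime = (c / sigma ** 2) * f                 # df/ds evaluated at s = 0
    return dt * np.sum(fprime ** 2 / f)           # Poisson population Fisher info

ratio = pop_fisher(1.0) / pop_fisher(2.0)         # expect ~2 for D = 1
```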

slide-73
SLIDE 73

Signal-processing paradigms

1. filtering
2. (efficient) coding
3. feature detection

slide-74
SLIDE 74

Feature detection and representation

slide-75
SLIDE 75

Trained network models

slide-76
SLIDE 76

Trained network models