
SLIDE 1

Definition of ICA · Measures of nongaussianity · Natural images, sparsity, ICA · Independent subspaces and topography · Image sequences

Gatsby Theoretical Neuroscience Lectures: Non-Gaussian statistics and natural images Parts I-II

Aapo Hyvärinen

Gatsby Unit, University College London

27 Feb 2017

SLIDE 2

Outline

◮ Part I: Theory of ICA
  ◮ Definition and difference to PCA
  ◮ Importance of non-Gaussianity
◮ Part II: Natural images and ICA
  ◮ Application of ICA and sparse coding on natural images
  ◮ Extensions of ICA with dependent components
◮ Part III: Estimation of unnormalized models
  ◮ Motivation by extensions of ICA
  ◮ Score matching
  ◮ Noise-contrastive estimation
◮ Part IV: Recent extensions of ICA and natural image statistics
  ◮ A three-layer model, towards deep learning

SLIDE 3

Part I: Theory of ICA

◮ Definition of ICA as a non-Gaussian generative model
◮ Importance of non-Gaussianity
◮ Fundamental difference to PCA
◮ Estimation by maximization of non-Gaussianity
◮ Measures of non-Gaussianity

SLIDE 4

Problem of blind source separation

There are a number of "source signals". Due to some external circumstances, only linear mixtures of the source signals are observed. The task is to estimate (separate) the original signals!

SLIDE 5

A solution is possible

PCA does not recover the original signals.

SLIDE 6

A solution is possible

PCA does not recover the original signals; use information on statistical independence to recover them:

SLIDE 7

Independent Component Analysis

(Hérault and Jutten, 1984–1991)

◮ Observed random variables x_i are modelled as linear sums of hidden variables:

x_i = \sum_{j=1}^{m} a_{ij} s_j,   i = 1, ..., n    (1)

◮ Mathematical formulation of the blind source separation problem
◮ Not unlike factor analysis
◮ The matrix of the a_{ij} is the parameter matrix, called the "mixing matrix"
◮ The s_j are hidden random variables, called the "independent components" or "source signals"
◮ Problem: estimate both the a_{ij} and the s_j, observing only the x_i
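As a minimal numerical sketch of model (1), the snippet below generates two uniform (hence non-Gaussian) sources and mixes them with an arbitrary, illustrative 2×2 mixing matrix; the specific numbers are assumptions for the example, not part of the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two non-Gaussian (here: uniform, unit-variance) hidden source signals.
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 1000))

# A fixed mixing matrix: the parameters a_ij of the model (arbitrary here).
A = np.array([[1.0, 0.5],
              [0.3, 2.0]])

# Observed variables: each x_i is a linear sum of the hidden s_j.
x = A @ s

print(x.shape)  # (2, 1000)
```

ICA's task is then to recover both `A` and `s` given only `x`.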

SLIDE 8

When can the ICA model be estimated?

◮ Must assume:
  ◮ The s_i are mutually statistically independent
  ◮ The s_i are nongaussian (non-normal)
  ◮ (Optional:) the number of independent components equals the number of observed variables
◮ Then: the mixing matrix and the components can be identified (Comon, 1994)
◮ A very surprising result!

SLIDE 9

Reminder: Principal component analysis

◮ Basic idea: find directions \sum_i w_i x_i of maximum variance
◮ We must constrain the norm of w, \sum_i w_i^2 = 1, otherwise the solution is that the w_i are infinite
◮ For more than one component, find the direction of maximum variance orthogonal to the components previously found
◮ Classic factor analysis has essentially the same idea as PCA: explain maximal variance with a limited number of components
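The points above can be sketched numerically: the principal directions are eigenvectors of the covariance matrix, which automatically satisfy the unit-norm and orthogonality constraints. The toy data below is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated toy data (500 samples of a 2-d variable), mean-centred.
x = rng.standard_normal((500, 2)) @ np.array([[3.0, 0.0], [1.0, 1.0]])
x -= x.mean(axis=0)

# Principal directions are eigenvectors of the covariance matrix;
# the constraint sum_i w_i^2 = 1 is automatic for eigenvectors.
C = np.cov(x, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order

w1 = eigvecs[:, -1]  # direction of maximum variance
w2 = eigvecs[:, -2]  # max variance among directions orthogonal to w1

print(np.var(x @ w1) >= np.var(x @ w2))  # True
print(abs(w1 @ w2) < 1e-10)              # True: orthogonal
```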

SLIDE 10

Comparison of ICA, PCA, and factor analysis

◮ In contrast to PCA and factor analysis, the components really give the original source signals or underlying hidden variables
◮ In PCA and factor analysis, only a subspace is properly determined (although an arbitrary basis is given as output)
◮ Catch: ICA only works when the components are nongaussian
◮ Many psychological or social-science hidden variables (e.g. "intelligence") may be (practically) gaussian, because they are sums of many independent variables (central limit theorem)
◮ But signals measured by sensors are usually quite nongaussian

SLIDE 11

Some examples of nongaussianity

[Figure: three example signals and their histograms]

SLIDE 12

Why classic methods cannot find original components or sources

◮ In PCA and FA: find components y_i which are uncorrelated,

cov(y_i, y_j) = E\{y_i y_j\} - E\{y_i\}E\{y_j\} = 0    (2)

and maximize explained variance (or the variance of the components)
◮ Such methods need only the covariances, cov(x_i, x_j)
◮ However, there are many different component sets that are uncorrelated, because:
  ◮ The number of covariances is ≈ n²/2, due to symmetry
  ◮ So we cannot solve for the n² mixing coefficients: not enough information! ("More variables than equations")

SLIDE 13

Nongaussianity, with independence, gives more information

◮ For independent variables we have

E\{h_1(y_1)h_2(y_2)\} - E\{h_1(y_1)\}E\{h_2(y_2)\} = 0    (3)

◮ For nongaussian variables, nonlinear covariances thus give more information than plain covariances
◮ This is not true for the multivariate gaussian distribution:
  ◮ The distribution is completely determined by the covariances
  ◮ Uncorrelated gaussian variables are independent
◮ ⇒ The ICA model cannot be estimated for gaussian data
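This contrast can be checked numerically. Below, a 45° rotation (an arbitrary, illustrative mixing) is applied to two independent sources. For Gaussian sources the rotated variables remain independent; for uniform sources they stay uncorrelated but become dependent, and a nonlinear covariance (here: covariance of squares, an assumed choice of h) exposes it:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
R = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)  # 45-degree rotation

def sq_cov(y):
    # A simple "nonlinear covariance": covariance of the squares.
    return np.cov(y[0] ** 2, y[1] ** 2)[0, 1]

# Gaussian sources: rotated variables stay uncorrelated AND independent.
yg = R @ rng.standard_normal((2, n))

# Uniform (non-Gaussian) sources: rotated variables are still
# uncorrelated, but dependent -- the nonlinear covariance reveals it.
yu = R @ rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, n))

print(np.cov(yg)[0, 1], sq_cov(yg))  # both close to 0
print(np.cov(yu)[0, 1], sq_cov(yu))  # ~0 and clearly nonzero
```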

SLIDE 14

Whitening as preprocessing for ICA

◮ Whitening is usually done before ICA
◮ Whitening means decorrelation and standardization: E\{xx^T\} = I
◮ After whitening, A can be considered orthogonal:

E\{xx^T\} = I = A E\{ss^T\} A^T = A A^T    (4)

◮ Half of the parameters are thereby estimated! (And there are other technical benefits.)
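A minimal sketch of PCA whitening, assuming illustrative Laplacian sources and an arbitrary mixing matrix: the whitening matrix V = D^{-1/2}E^T is built from the eigendecomposition of the sample covariance, after which the covariance of z = Vx is the identity:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.5], [0.3, 2.0]])
x = A @ rng.laplace(size=(2, 10_000))

# PCA whitening: V = D^{-1/2} E^T from the eigendecomposition of E{xx^T}.
C = (x @ x.T) / x.shape[1]
d, E = np.linalg.eigh(C)
V = np.diag(d ** -0.5) @ E.T
z = V @ x

# After whitening the covariance is the identity, so the remaining
# mixing matrix (VA) is orthogonal: only a rotation is left to estimate.
print(np.round((z @ z.T) / z.shape[1], 6))  # identity matrix
```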

SLIDE 15

Illustration

Two components with uniform distributions: original components, observed mixtures, PCA, ICA. PCA does not find the original coordinates; ICA does!

SLIDE 16

Illustration of problem with gaussian distributions

Original components, observed mixtures, PCA. The distribution after PCA is the same as the distribution before mixing! This is the "factor rotation problem" of classic factor analysis.

SLIDE 17

Basic intuitive principle of ICA estimation

◮ Inspired by the Central Limit Theorem:
  ◮ An average of many independent random variables will have a distribution that is close(r) to gaussian
  ◮ In the limit of an infinite number of random variables, the distribution tends to a gaussian
◮ Consider a linear combination \sum_i w_i x_i = \sum_i q_i s_i
◮ Because of the theorem, \sum_i q_i s_i should be more gaussian than the individual s_i
◮ By maximizing the nongaussianity of \sum_i w_i x_i, we can thus find the s_i
◮ Also known as projection pursuit
◮ Cf. principal component analysis: maximize the variance of \sum_i w_i x_i

SLIDE 18

Illustration of changes in nongaussianity

[Figure: histogram and scatterplot of the original uniform distributions]

[Figure: histogram and scatterplot of the mixtures given by PCA]

SLIDE 19

Sparsity is the dominant form of non-Gaussianity

◮ In natural signals, the fundamental form of non-Gaussianity is sparsity
◮ Sparsity = the probability density has heavy tails and a peak at zero:

[Figure: gaussian vs. sparse probability densities]

◮ (Another form of non-Gaussianity is skewness, i.e. asymmetry)

SLIDE 20

Kurtosis as nongaussianity measure

◮ Problem: how to measure nongaussianity (sparsity)?
◮ Definition:

kurt(x) = E\{x^4\} - 3(E\{x^2\})^2    (5)

◮ If the variance is constrained to unity, this is essentially the 4th moment
◮ Simple algebraic properties because it is a cumulant (for independent s_1, s_2):

kurt(s_1 + s_2) = kurt(s_1) + kurt(s_2)    (6)
kurt(\alpha s_1) = \alpha^4 kurt(s_1)    (7)

◮ Zero for a gaussian RV, non-zero for most nongaussian RVs
◮ Positive vs. negative kurtosis correspond to typical forms of pdf
◮ The variance must be constrained for kurtosis to measure non-Gaussianity
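Definition (5) and the scaling property (7) can be checked directly on samples; the three distributions below (all unit variance) are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

def kurt(x):
    # kurt(x) = E{x^4} - 3 (E{x^2})^2, for zero-mean x
    return np.mean(x ** 4) - 3 * np.mean(x ** 2) ** 2

g = rng.standard_normal(n)                       # gaussian
lap = rng.laplace(scale=1 / np.sqrt(2), size=n)  # laplacian, unit variance
u = rng.uniform(-np.sqrt(3), np.sqrt(3), n)      # uniform, unit variance

print(kurt(g))    # ~ 0
print(kurt(lap))  # ~ +3 (supergaussian)
print(kurt(u))    # ~ -1.2 (subgaussian)

# The scaling property kurt(alpha*s) = alpha^4 kurt(s) holds exactly,
# even on finite samples:
print(np.isclose(kurt(2 * u), 16 * kurt(u)))  # True
```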

SLIDE 21

Illustration of positive and negative kurtosis

Left: Laplacian pdf, positive kurtosis ("supergaussian"). Right: uniform pdf, negative kurtosis ("subgaussian").

SLIDE 22

Why kurtosis is not optimal

◮ Sensitive to outliers: consider a sample of 1000 values with unit variance, in which one value equals 10. The kurtosis then equals at least 10^4/1000 − 3 = 7.
◮ For supergaussian variables, statistical performance is not optimal even without outliers
◮ Other measures of nongaussianity should therefore be considered

SLIDE 23

Differential entropy as nongaussianity measure

◮ Generalization of the ordinary discrete Shannon entropy:

H(x) = E\{-\log p(x)\}    (8)

◮ For fixed variance, maximized by the gaussian distribution
◮ Often normalized to give the negentropy

J(x) = H(x_{gauss}) - H(x)    (9)

◮ Good statistical properties, but computationally difficult

SLIDE 24

Approximation of negentropy

◮ Approximations of negentropy (Hyvärinen, 1998):

J_G(x) = (E\{G(x)\} - E\{G(x_{gauss})\})^2    (10)

where G is a nonquadratic function
◮ Generalization of the (square of the) kurtosis (which corresponds to G(x) = x^4)
◮ A good compromise?
  ◮ Statistical properties not bad (for a suitable choice of G)
  ◮ Computationally simple
◮ Further possibility: if we know the data is sparse, the negative of the L1 norm, −E\{|x|\}, may be enough
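A sketch of approximation (10), assuming the common choice G(x) = log cosh(x) (the specific G and test distributions are illustrative, not prescribed by the slide). The gaussian reference term is itself estimated by sampling:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

def G(x):
    # A common nonquadratic choice: G(x) = log cosh(x).
    return np.log(np.cosh(x))

# E{G(x_gauss)} estimated once from a standard gaussian sample.
gauss_ref = np.mean(G(rng.standard_normal(n)))

def negentropy_approx(x):
    x = (x - x.mean()) / x.std()  # the measure assumes unit variance
    return (np.mean(G(x)) - gauss_ref) ** 2

print(negentropy_approx(rng.standard_normal(n)))  # ~ 0
print(negentropy_approx(rng.laplace(size=n)))     # clearly positive
```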

SLIDE 25

Basic ICA estimation procedure

1. Whiten the data to give z.
2. Set the iteration count i = 1.
3. Take a random vector w_i.
4. Maximize the nongaussianity of w_i^T z, under the constraints \|w_i\| = 1 and w_i^T w_j = 0 for j < i.
5. Increment the iteration count i by 1; go back to step 3.

Alternatively: maximize over all the w_i in parallel, keeping them orthogonal.
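A minimal sketch of this deflation procedure on toy data (two uniform sources and an arbitrary mixing matrix, both assumptions of the example). Step 4 is implemented with the FastICA-style fixed-point update mentioned later in the lecture, using g = tanh:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two uniform (sub-gaussian) sources, linearly mixed.
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 20_000))
x = np.array([[1.0, 0.5], [0.3, 2.0]]) @ s

# Step 1: whiten the data to give z.
d, E = np.linalg.eigh(np.cov(x))
z = np.diag(d ** -0.5) @ E.T @ x

# Steps 2-5: estimate one w_i at a time, maximizing nongaussianity of
# w_i^T z with a FastICA-style fixed-point update (g = tanh), under
# ||w_i|| = 1 and orthogonality to the previously found w_j.
W = []
for _ in range(2):
    w = rng.standard_normal(2)
    w /= np.linalg.norm(w)
    for _ in range(200):
        y = w @ z
        w_new = (z * np.tanh(y)).mean(axis=1) - (1 - np.tanh(y) ** 2).mean() * w
        for wj in W:                    # deflation: enforce w_i^T w_j = 0
            w_new -= (w_new @ wj) * wj
        w_new /= np.linalg.norm(w_new)  # enforce ||w_i|| = 1
        converged = abs(w_new @ w) > 1 - 1e-10
        w = w_new
        if converged:
            break
    W.append(w)

y = np.vstack(W) @ z  # recovered components, up to sign and order
```

The recovered rows of `y` match the original sources up to sign and permutation, the well-known indeterminacies of ICA.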

SLIDE 26

Development of ICA algorithms

◮ Nongaussianity measure: the essential ingredient
  ◮ Kurtosis: global consistency, but nonrobust
  ◮ Differential entropy: statistically justified, but difficult to compute
    ◮ Essentially the same as likelihood (Pham et al, 1992/97) or infomax (Bell and Sejnowski, 1995)
  ◮ Rough approximations of entropy: a compromise
◮ Optimization methods
  ◮ Gradient methods (e.g. natural gradient; Amari et al, 1996)
  ◮ Fast fixed-point algorithm, FastICA (Hyvärinen, 1999)

SLIDE 27

Conclusion: Theory of ICA

◮ ICA is a non-Gaussian linear generative model (a form of factor analysis)
◮ Basic principle: maximize the non-Gaussianity of the components
◮ (Really very different from PCA, which maximizes the variance of the components)
◮ Sparsity is a form of non-Gaussianity prevalent in natural signals
◮ Measures of non-Gaussianity are crucial: kurtosis vs. differential entropy

SLIDE 28

Part II: Natural images and ICA

◮ Natural images have statistical regularities
◮ Statistical models show optimal processing
◮ The basic model is independent component analysis
◮ The components are not really independent
  ⇒ Need and opportunity for better models
◮ Instead of nongaussianity, we could use temporal correlations
◮ A unifying framework: bubbles

SLIDE 29

Linear statistical models of images

[Figure: an image patch written as s_1·(basis image 1) + s_2·(basis image 2) + ··· + s_k·(basis image k)]

◮ Each image (patch) is a linear sum of basis vectors (features)
◮ What are the "best" basis vectors for natural images?

SLIDE 30

The visual cortex of the brain

[Figure: visual pathway — retina → LGN → V1]

Receptive field of a simple cell in V1: [Figure]

SLIDE 31

Sparse coding

◮ Sparse coding means: for a random vector x, find a linear representation

x = As    (11)

such that the components s_i are as sparse (= supergaussian) as possible
◮ Important property: a given data point is represented using only a limited number of "active" (clearly non-zero) components s_i
◮ In contrast to PCA, the active components change from one image patch to another
◮ Cf. the vocabulary of a language, which can describe many different things by combining a small number of active words
◮ Maximizes non-Gaussianity, and is therefore like ICA!

SLIDE 32

ICA / sparse coding of natural images

(Olshausen and Field, 1996; Bell and Sejnowski, 1997)

Features similar to wavelets, Gabor functions, simple cells.


SLIDE 33

Dependence of “independent” components

◮ Components estimated from natural images are not really independent
◮ Next, we model some of the dependencies
◮ Independent subspaces + topographic ICA

SLIDE 34

Correlation of squares

◮ What kind of dependence remains between the components?
◮ Answer: the squares s_i^2 and s_j^2 are correlated inside a subspace
  ⇒ Dependence through variances
◮ Similar to the models by Simoncelli et al on wavelet coefficients, and by Valpola et al on variance sources

[Figure: two signals that are uncorrelated but whose squares are correlated]
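Such variance dependence is easy to simulate: a common "variance signal" (here an assumed uniform modulator) multiplies two otherwise independent gaussian signals, making them uncorrelated yet energy-correlated:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# A common variance signal v modulates two otherwise independent
# gaussian signals, giving dependence through variances.
v = rng.uniform(0.1, 2.0, n)
s1 = v * rng.standard_normal(n)
s2 = v * rng.standard_normal(n)

print(np.corrcoef(s1, s2)[0, 1])            # ~ 0: uncorrelated
print(np.corrcoef(s1 ** 2, s2 ** 2)[0, 1])  # clearly positive
```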

SLIDE 35

Grouping components (Cardoso, 1998; Hyvärinen and Hoyer, 2000)

◮ Assumption: the s_i can be divided into groups (subspaces), such that
  ◮ the s_i in the same group are dependent on each other
  ◮ dependencies between different groups are not allowed
◮ We also need to specify the distributions inside the groups, achieving correlations of squares
◮ Invariant features are given by the norms of the projections on the subspaces ⇒ spherically symmetric distribution inside subspaces
◮ Inside each subspace, minimize the (expected) norm

\sqrt{\sum_{i=1}^{k} (w_i^T x)^2}    (12)

SLIDE 36

Computation of invariant features

[Figure: network computing invariant features — the input I is projected onto the vectors w_1, …, w_8; each projection ⟨w_i, I⟩ is squared, and the squares are summed (Σ) within each subspace]

SLIDE 37

Independent subspaces of natural images

Emergence of phase-invariance, as in complex cells in V1.


SLIDE 38

Topographic ICA (Hyvärinen, Hoyer and Inki, 2001)

◮ Components are arranged on a two-dimensional lattice
◮ Statistical dependency follows the topography: the squares s_i^2 are correlated for nearby components
◮ Each local region is like a subspace

SLIDE 39

Topographic ICA on natural images

Topography similar to what is found in the cortex.


SLIDE 40

Temporally coherent components (Hurri and Hyvärinen, 2003)

◮ In image sequences (video) we can look at temporal correlations
◮ An alternative to nongaussianity
◮ Linear correlations give only Fourier-like receptive fields
◮ We proposed temporal correlations of squares
◮ Similar to source separation using nonstationary variance (Matsuoka et al, 1995)

SLIDE 41

Temporally coherent features on natural image sequences

Features similar to those obtained by ICA.

SLIDE 42

Bubbles: a unifying framework

◮ Correlation of squares both over time and over components
  ⇒ spatiotemporal modulating variance variables
◮ A simple objective can be obtained by pooling squares over space and time:

\sum_{t=1}^{T} \sum_{j=1}^{n} G\left( \sum_{i=1}^{n} \sum_{\tau} h(i,j,\tau)\, (w_i^T x(t-\tau))^2 \right)    (13)

where h(i, j, \tau) is a neighbourhood function and G is a nonlinear function.

SLIDE 43

Illustration of four types of representation

[Figure: four panels (filter position vs. time) illustrating sparse, sparse topographic, sparse temporally coherent, and bubble representations]

SLIDE 44

Conclusion: Natural images and ICA

◮ ICA is a non-Gaussian factor analysis
◮ Basic principle: maximize the non-Gaussianity of the components
◮ Measures of non-Gaussianity are crucial: kurtosis vs. differential entropy
◮ ICA and related models show optimal features for natural images
◮ ICA models basic linear features
◮ Independent subspaces and topographic ICA model basic dependencies or nonlinearities
◮ Temporal coherence is an alternative approach, leading to bubbles