The inconvenience of a single Universe. J.-F. Cardoso, C.N.R.S. - - PowerPoint PPT Presentation

the inconvenience of a single universe
SMART_READER_LITE
LIVE PREVIEW

The inconvenience of a single Universe. J.-F. Cardoso, C.N.R.S. - - PowerPoint PPT Presentation

The inconvenience of a single Universe. J.-F. Cardoso, C.N.R.S. Institut dAstrophysique de Paris - The photons who come in cold - A single Universe Mathematics of Imaging Workshop #2 Institut Henri Poincar, March 2019 . The


slide-1
SLIDE 1

The inconvenience of a single Universe.

J.-F. Cardoso, C.N.R.S. Institut d’Astrophysique de Paris

  • The photons who come in cold
  • A single Universe

Mathematics of Imaging Workshop #2 Institut Henri Poincaré, March 2019

slide-2
SLIDE 2

.

The inconvenience of a single planet

.

slide-3
SLIDE 3

The inconvenience of a single planet There is no planet B.

slide-4
SLIDE 4

.

The inconvenience of a single Universe

.

slide-5
SLIDE 5

.

The photons who come in cold: Big Bang theory

.

slide-6
SLIDE 6

The Planck mission from the European Spatial Agency

slide-7
SLIDE 7
slide-8
SLIDE 8

Cosmic background (and pesky foregrounds)

slide-9
SLIDE 9

Extracting the CMB from Planck frequency channels Color scale: hundreds of micro-Kelvins. Credits: ESA, FRB.

slide-10
SLIDE 10

Multipole decomposition and angular frequencies

  • A spherical field X(θ, φ) can be decomposed into ‘harmonic’ components:

X(θ, φ) =

  • ℓ≥0

X(ℓ)(θ, φ) [θ, φ] = [(co)latitude, longitude] called monopole, dipole, quadrupole, octopole, . . . , multipole, indexed by the (discrete) angular frequency, ℓ = 0, 1, 2, . . ., thusly: = + + + + + · · · X(θ, φ) = X(0)(θ, φ) + X(1)(θ, φ) + X(2)(θ, φ) + X(3)(θ, φ) + · · ·

  • Projection onto the (2ℓ + 1)-dimensional spherically invariant subspaces.
  • Distribution of energy across (angular) scales quantified by

the (empirical) angular spectrum:

  • Cℓ

def

= 1 2ℓ + 1

  • S2 X(ℓ)(θ, φ)2.
slide-11
SLIDE 11

Fourier on the sphere: Spherical harmonic decomposition

  • An ortho-basis for spherical fields: the spherical harmonics Yℓm(θ, φ):

X(θ, φ) =

  • ℓ≥0
  • −ℓ≤m≤ℓ

aℓm Yℓm(θ, φ) ← → aℓm =

  • θ
  • φ Yℓm(θ, φ) X(θ, φ)
  • Multipole decomposition and angular spectrum:

X(ℓ)(θ, φ) =

m=−ℓ

  • m=ℓ

aℓm Yℓm(θ, φ)

  • Cℓ =

1 2ℓ + 1

m=−ℓ

  • m=ℓ

a2

ℓm

slide-12
SLIDE 12

Angular spectrum of the CMB (as measured/fitted by W-MAP)

  • Large scales dominate. We plot:

D(ℓ) = C(ℓ) × ℓ(ℓ + 1)/2π

  • Acoustic peaks !!!
  • One Universe: cosmic variance.

If

  • Cℓ =

1 2ℓ + 1

  • −ℓ≤m≤ℓ

a2

ℓm,

then Var Cℓ/E Cℓ

  • =

2 2ℓ + 1.

slide-13
SLIDE 13

Theoretical angular spectrum of the CMB A cosmological model predicts the angular spectrum

  • f

the CMB as a function of “cosmo- logical parameters”. Examples of the dependence of the spectrum on some parame- ters of the Λ − CDM model.

slide-14
SLIDE 14

Angular spectrum and likelihood (ideally)

  • The spherical harmonic coefficients aℓm of a stationary random field

are uncorrelated with variance Cℓ, defining the angular power spectrum: E

  • aℓm aℓ′m′
  • = Cℓ δℓℓ′ δmm′
  • Thus, for a stationary Gaussian field, the empirical spectrum
  • Cℓ =

1 2ℓ + 1

m=−ℓ

  • m=ℓ

a2

ℓm

is a sufficient statistic since the likelihood then reads: −2 log P(X|{Cℓ}) =

  • ℓ≥0

(2ℓ + 1) Cℓ Cℓ + log Cℓ

  • + cst
  • Also reads like a spectral mismatch:

Cℓ Cℓ + log Cℓ

  • = k

Cℓ Cℓ

  • + cst’

k(u) def = u − log u − 1 ≥ 0

slide-15
SLIDE 15

.

Extracting the CMB

.

slide-16
SLIDE 16

Extracting the CMB from Planck frequency channels How to do it?

slide-17
SLIDE 17

Some requirements for producing a CMB map

  • The method should be robust, accurate and high SNR (obviously).

Special features: data set is expensive and there is ground truth.

  • The method should be linear in the data:
  • 1. It is critical not to introduce non Gaussianity
  • 2. Propagation of simulated individual inputs should be straightforward
  • The method should be able to support wide dynamical ranges, over the sky,
  • ver angular frequencies, across channel frequencies.
  • (Wo.manpower behind the method should be aplenty).
slide-18
SLIDE 18

Wide dynamics over the sky Left: The W-MAP K band. Natural color scale [-200, 130000] µK. Middle: Same map with an equalized color scale. Right: Same map with a color scale adapted to CMB: [-300, 300] µK. Average power as a function of latitude

  • n a log scale for the same map.
slide-19
SLIDE 19

Wide spectral dynamics, SNR variations

10-6 10-4 10-2 100 102 104 1000 2000 30 GHz 44 GHz 70 GHz 100 GHz 143 GHz 217 GHz 353 GHz 545 GHz 857 GHz

S & N angular spectra in Planck channels (unbeamed) for fsky = 0.40.

slide-20
SLIDE 20

And what about the noise ? Local RMS (µK) of the noise in a reconstructed CMB map.

slide-21
SLIDE 21

Foregrounds CMB Gal clusters Gal clusters Galaxies Free-free Synchrotron Dust Noise Various foreground emissions (both galactic and extra-galactic) pile up in front of the CMB. But they do so additively ! Even better, most scale rigidly with frequency: each frequency channel sees a different mixture of each astrophysical emission:

d =

  

d30 . . . d857

   = As + n

Such a linear mixture can be inverted . . . if the mixing matrix A is known. How to find it or do without it ? 1 Trust astrophysics and use parametric models, or 2 Trust your data and the power of statistics.

slide-22
SLIDE 22

Three contamination models based on matrices (or lack thereof)

1 Nine Planck channels modeled as noisy linear mixtures of CMB and 6 (say) “foregrounds”

     

d1 d2 . . . . . . d9

     

=

     

a1 a2 . . . . . . a9 F11 . . . F16 F21 . . . F26 . . . . . . . . . . . . . . . . . . F91 . . . F96

     

×

   

s f1 . . . f6

    +      

n1 n2 . . . . . . n9

     

  • r

d = [ a | F ]

  • s

f

  • + n

2 Interesting limiting case: maximal foreground, no noise, that is, Planck channels modeled as linear mixtures of CMB and 9 − 1 = 8 “foregrounds”

     

d1 d2 . . . . . . d9

     

=

     

a1 a2 . . . . . . a9 F11 . . . . . . F18 F21 . . . . . . F28 . . . . . . . . . . . . . . . . . . . . . . . . F91 . . . . . . F98

     

×

    

s f1 f2 . . . f8

    

  • r

d = [ a | F ]

  • s

f

  • 3 No foreground/noise model at all, but an unstructured contaminant g:

d = as + g.

− A linear combination s =

i widi = w†d of the input channels gives

unbiased CMB if w†a = 1 and perfect foreground rejection if w†F = 0. Actually, we are after Span(F), the foreground subspace.

slide-23
SLIDE 23

Four CMB maps in Planck releases NILC SEVEM SMICA Commander Wavelet space Pixel+Harmonic Harmonic space Pixel space

d = a s + g d = a s + g d = [a|F]

  • s

f

  • + n

d = [a|F(θ)]

  • s

f

  • + n
  • Various filtering schemes (space-dependent, multipole-dependent, or both):

− NILC : Needlet (spherical wavelet) domain ILC. − SEVEM : Pixel domain, internal template fitting. − SMICA : Harmonic domain, ML approach, foreground subspace. − Commander : Pixel domain, Bayesian method with physical foreground models.

slide-24
SLIDE 24

The SMICA method The CMB is statistically independent of the other components. SMICA = SM + ICA = Spectral Matching Independent Component Analysis Do ICA: Blind estimation of the linear model d = [ a | F ]

  • s

f

  • + n,

via spectral matching: if all fields are modeled as Gaussian stationary, then the likelihood is a mismatch between empirical and theoretical spectra. Q: How dare you do that on sky maps so blatantly non Gaussian/non stationary?

slide-25
SLIDE 25

Combining all 9 Planck channels, non parametrically: the ILC 1/ Independent contamination model: Stack the 9 Planck channels into a 9 × 1 data vector d = [d30, d44, . . . , d545, d857]†

d(p) = a s(p) + g(p)

p = 1, . . . , Npix 2/ Linear combination: Estimate the CMB signal s(p) in pixel p by weighting the inputs:

  • s(p) = w†d(p)

Idea: The variance of independent variables add up. Hence. . . Minimize the pixel average (w†d)2p subject to w†a = 1, yielding

w =

  • C−1 a

a† C−1 a

with

  • C = dd†p,

the sample covariance matrix. Coined ILC (Internal Linear Combination) by astrophysicists, a.k.a. MVBF (Min. Variance Beam Former) in array processing, BLUE elsewhere. .

slide-26
SLIDE 26

Is the ILC good enough for Planck data ? ILC looks good: linear, unbiased, min. MSE, very blind, very few assumptions: knowing a (calibration) and the CMB uncorrelated from the rest (very true). However, a simulation result shows poor quality: ← − ILC map

  • n

a ±300µK color scale Error

  • n

a ±50µK color scale − → Two things, at least, need fixing:

  • harmonic dependence and
  • chance correlations.
slide-27
SLIDE 27

Linear filtering in harmonic space Since resolution, noise and foregrounds vary (wildly) in power over channels and angular frequency, the combining weights should depend on ℓ. Harmonic ILC (Tegmark): CMB map synthesized from spherical harmonic coefficients ˆ sℓm, obtained as linear combinations: ˆ sℓm = w†

ℓ dℓm

with, again,

wℓ = C−1

a a†C−1

a Cℓ = Cov(dℓm)

  • At high ℓ, (spectral) matrices Cℓ well

estimated by

Cℓ =

m dℓmd† ℓm/ 2ℓ + 1.

  • At lower ℓ, we need to get smarter to

fight chance correlation.

500 1000 1500 2000 2500 3000 3500

3 2 1 1 2 3

wi(ℓ) µK/µKRJ

i

  • 030

044 070 100 143 217 353 545 857

217 353 143 857 545 100

slide-28
SLIDE 28

ILC coefficients: raw (thin lines) and via SMICA modelling (thick lines) At low ℓ, the empirical covariance matrices

  • Cℓ =

1 2ℓ + 1

  • −ℓ≤m≤ℓ

dℓmd†

ℓm

do not average over enough modes. Chance correlation hurts.

slide-29
SLIDE 29

Chance correlation: the single Universe problem Simplest illustrative example:

d =

  • d1

d2

  • =
  • s + αf

f′

  • =
  • 1
  • s +
  • αf

f′

  • Form a covariance matrix

C = dd† and the ILC estimate will yield

  • s = a†

C−1 d a† C−1 a

= d1 − d1d2 d2

2

· d2 = s − sf′ f′2 · f′

  • Chance corr.

+ α

  • f − ff′

f′2 · f′

  • Non-rigid scaling

What is hitting us harder: chance correlation or non-rigid scaling ? The error due to chance correlation is independent of α. Same error whether α = 0 or α = 1 or α ≫ 1. A mixing model implies f = f′, i.e. rigid scaling. In such a model, chance correlation (a.k.a. ILC bias) dominates the error.

slide-30
SLIDE 30
  • Similarly. . .

In the square noise-free case: d =

  • a F

s

f

  • , the sample covariance is:

dd† =

  • a F

s2 sf† fs ff†

a F

In the absence of chance correlation: sf = 0, linear algebra says:

a†dd†−1F = 0

so that the ILC filter w ∝ dd†−1a would ensure w†F = 0, that is, perfect orthogonality to the foreground subspace. If the sample average sf is not identical to the ensemble average Esf = 0 (chance correlation), then contamination is unavoidable. High accuracy CMB requires good determination of the foreground subspace. How to do it with a single sky ?

slide-31
SLIDE 31

SMICA and the foreground subspace model

  • Planck channels modeled as noisy linear mixtures of CMB and 6 “foregrounds”:

     

d1 d2 . . . . . . d9

     

=

     

a1 a2 . . . . . . a9 F11 . . . F16 F21 . . . F26 . . . . . . . . . . . . . . . . . . F91 . . . F96

     

×

   

s f1 . . . f6

    +      

n1 n2 . . . . . . n9

     

  • r

dℓm = [ a | F ]

  • sℓm

fℓm

  • + nℓm

i.e. foregrounds coherent enough to be confined to a low dimensional subspace, Span(F), but otherwise unconstrained: any spectrum, color, correlation. . . Cov(dℓm) = [ a | F ]

  • Ccmb

Pℓ

  • [ a | F ]†+

  

σ2

1ℓ

. . . . . . ... . . . . . . σ2

9ℓ

   = Cℓ(a, Ccmb

, F, Pℓ, σ2

iℓ).

− A very blind model (all non-zero parameters are free!) but still identifiable.

  • Only Span(F), the foreground subspace, is needed to suppress the foregrounds.

It is jointly determined by all multipoles involved in SMICA likelihood: Gaussian + stationary →

ℓ(2ℓ + 1)

  • trace

CℓCℓ(θ)−1 + log det Cℓ(θ)

  • .
slide-32
SLIDE 32

When the stationary Gaussian likelihood is OK Consider the noise-free, square case : d = [ a | F ]

  • s

f

  • with known a.

For any matrix G, the ‘preprocessor’ T = [ a | G ]−1 ensures [ a | G ]−1 [ a | F ] =

  • 1

α†X

X

  • for some matrix X and vector α.

Applying T to data yields a (presumably) dirty CMB and n − 1 ‘templates’:

  • y

g

  • def

= Td =

  • s + α† Xf

X f

  • Statistical models of foreground not needed: they are deterministically observed.

.

slide-33
SLIDE 33

A likelihood pD(D|A). Two ingredients for a likelihood of d = A

  • s

f

  • :

P

  • s

f

  • = Ps(s) · PF(f)
  • 1) Statistical independence

and

  • y

g

  • def

= Td model =

  • s + α† Xf

X f

  • 2) (Pre-processed) mixing model

. After just a little bit of work: p(d|A) = pS(y − α†g)

  • CMB

· pF(X−1g)| det(X)|−Npix

  • foregrounds

· |T|Npix

  • Constant

. Thus a maximum likelihood solution for the signal of interest is

  • sML = y − ˆ

α†g ˆ α = arg max

α

pS(y − α†g) and this value depends neither on g nor on the contamination model pF(·).

  • Hence, in the high SNR limit, a statistical model is needed only

for the component of interest, the CMB, whenever a is known (calibration).

slide-34
SLIDE 34

More explicitly Do we have a good probability model to use in ˆ α = arg max

α

pS(y − α†g) for the CMB ? Of course! It is trivial in harmonic space: −2 log pS(y − α†g) =

  • m

(yℓm − α†gℓm)2 Cℓ + cst The (trivial) solution corresponds to combining the inputs with weight vector ˆ α =

  • C−1

H a

a† C−1

H a

i.e. an ILC with

  • CH =
  • m

dℓmd†

ℓm/Cℓ

This is also the SMICA solution in the same context: chance correlation is optimally mitigated in the spectral domain.

slide-35
SLIDE 35

Wisdom of the likelihood Two covariance matrices:

  • CH =
  • m

dℓmd†

ℓm/Cℓ

  • CP = dd†p, =
  • m

dℓmd†

ℓm

  • The 1/Cℓ weight equalizes the variance of the harmonic modes of the CMB.
  • The 1/Cℓ weight can be replaced with anything similar.
  • Pixel-based covariance

CP dominated by a small number of effective modes.

slide-36
SLIDE 36

Some orders of magnitude

  • Multipole range 2 ≤ ℓ ≤ 25
  • Galactic foregrounds with gℓ = (2ℓ + 1) ℓ−2.4

Variance decreases by a factor 6.60 with respect to pixel average if optimal weighting wℓ = 1/Cℓ is used. It’s like having 5.60 extra Universes to average over.

  • Variance decreases by a factor 6.55 with respect to pixel average

if suboptimal weighting wℓ = ℓ2 is used.

slide-37
SLIDE 37

Conclusions Trying to make the best out of a single Universe realization ICA: Key idea about the foregrounds : one subspace to rule them all (out). Robustness by doing without a complete foreground model (subspace only). Can be made not too naive statistically for CMB extraction: spectral matching. Targetting the CMB makes modeling foreground distribution irrelevant in the high SNR (large scales, low ℓ) limit. Future: data-driven foreground models when the SNR is not so great.