SLIDE 1 The inconvenience of a single Universe.
J.-F. Cardoso, C.N.R.S. Institut d’Astrophysique de Paris
- The photons who come in cold
- A single Universe
Mathematics of Imaging Workshop #2 Institut Henri Poincaré, March 2019
SLIDE 2
.
The inconvenience of a single planet
.
SLIDE 3
The inconvenience of a single planet There is no planet B.
SLIDE 4
.
The inconvenience of a single Universe
.
SLIDE 5
.
The photons who come in cold: Big Bang theory
.
SLIDE 6
The Planck mission from the European Spatial Agency
SLIDE 7
SLIDE 8
Cosmic background (and pesky foregrounds)
SLIDE 9
Extracting the CMB from Planck frequency channels Color scale: hundreds of micro-Kelvins. Credits: ESA, FRB.
SLIDE 10 Multipole decomposition and angular frequencies
- A spherical field X(θ, φ) can be decomposed into ‘harmonic’ components:
X(θ, φ) =
X(ℓ)(θ, φ) [θ, φ] = [(co)latitude, longitude] called monopole, dipole, quadrupole, octopole, . . . , multipole, indexed by the (discrete) angular frequency, ℓ = 0, 1, 2, . . ., thusly: = + + + + + · · · X(θ, φ) = X(0)(θ, φ) + X(1)(θ, φ) + X(2)(θ, φ) + X(3)(θ, φ) + · · ·
- Projection onto the (2ℓ + 1)-dimensional spherically invariant subspaces.
- Distribution of energy across (angular) scales quantified by
the (empirical) angular spectrum:
def
= 1 2ℓ + 1
SLIDE 11 Fourier on the sphere: Spherical harmonic decomposition
- An ortho-basis for spherical fields: the spherical harmonics Yℓm(θ, φ):
X(θ, φ) =
aℓm Yℓm(θ, φ) ← → aℓm =
- θ
- φ Yℓm(θ, φ) X(θ, φ)
- Multipole decomposition and angular spectrum:
X(ℓ)(θ, φ) =
m=−ℓ
aℓm Yℓm(θ, φ)
1 2ℓ + 1
m=−ℓ
a2
ℓm
SLIDE 12 Angular spectrum of the CMB (as measured/fitted by W-MAP)
- Large scales dominate. We plot:
D(ℓ) = C(ℓ) × ℓ(ℓ + 1)/2π
- Acoustic peaks !!!
- One Universe: cosmic variance.
If
1 2ℓ + 1
a2
ℓm,
then Var Cℓ/E Cℓ
2 2ℓ + 1.
SLIDE 13 Theoretical angular spectrum of the CMB A cosmological model predicts the angular spectrum
the CMB as a function of “cosmo- logical parameters”. Examples of the dependence of the spectrum on some parame- ters of the Λ − CDM model.
SLIDE 14 Angular spectrum and likelihood (ideally)
- The spherical harmonic coefficients aℓm of a stationary random field
are uncorrelated with variance Cℓ, defining the angular power spectrum: E
- aℓm aℓ′m′
- = Cℓ δℓℓ′ δmm′
- Thus, for a stationary Gaussian field, the empirical spectrum
- Cℓ =
1 2ℓ + 1
m=−ℓ
a2
ℓm
is a sufficient statistic since the likelihood then reads: −2 log P(X|{Cℓ}) =
(2ℓ + 1) Cℓ Cℓ + log Cℓ
- + cst
- Also reads like a spectral mismatch:
Cℓ Cℓ + log Cℓ
Cℓ Cℓ
k(u) def = u − log u − 1 ≥ 0
SLIDE 15
.
Extracting the CMB
.
SLIDE 16
Extracting the CMB from Planck frequency channels How to do it?
SLIDE 17 Some requirements for producing a CMB map
- The method should be robust, accurate and high SNR (obviously).
Special features: data set is expensive and there is ground truth.
- The method should be linear in the data:
- 1. It is critical not to introduce non Gaussianity
- 2. Propagation of simulated individual inputs should be straightforward
- The method should be able to support wide dynamical ranges, over the sky,
- ver angular frequencies, across channel frequencies.
- (Wo.manpower behind the method should be aplenty).
SLIDE 18 Wide dynamics over the sky Left: The W-MAP K band. Natural color scale [-200, 130000] µK. Middle: Same map with an equalized color scale. Right: Same map with a color scale adapted to CMB: [-300, 300] µK. Average power as a function of latitude
- n a log scale for the same map.
SLIDE 19 Wide spectral dynamics, SNR variations
10-6 10-4 10-2 100 102 104 1000 2000 30 GHz 44 GHz 70 GHz 100 GHz 143 GHz 217 GHz 353 GHz 545 GHz 857 GHz
S & N angular spectra in Planck channels (unbeamed) for fsky = 0.40.
SLIDE 20
And what about the noise ? Local RMS (µK) of the noise in a reconstructed CMB map.
SLIDE 21
Foregrounds CMB Gal clusters Gal clusters Galaxies Free-free Synchrotron Dust Noise Various foreground emissions (both galactic and extra-galactic) pile up in front of the CMB. But they do so additively ! Even better, most scale rigidly with frequency: each frequency channel sees a different mixture of each astrophysical emission:
d =
d30 . . . d857
= As + n
Such a linear mixture can be inverted . . . if the mixing matrix A is known. How to find it or do without it ? 1 Trust astrophysics and use parametric models, or 2 Trust your data and the power of statistics.
SLIDE 22 Three contamination models based on matrices (or lack thereof)
1 Nine Planck channels modeled as noisy linear mixtures of CMB and 6 (say) “foregrounds”
d1 d2 . . . . . . d9
=
a1 a2 . . . . . . a9 F11 . . . F16 F21 . . . F26 . . . . . . . . . . . . . . . . . . F91 . . . F96
×
s f1 . . . f6
+
n1 n2 . . . . . . n9
d = [ a | F ]
f
2 Interesting limiting case: maximal foreground, no noise, that is, Planck channels modeled as linear mixtures of CMB and 9 − 1 = 8 “foregrounds”
d1 d2 . . . . . . d9
=
a1 a2 . . . . . . a9 F11 . . . . . . F18 F21 . . . . . . F28 . . . . . . . . . . . . . . . . . . . . . . . . F91 . . . . . . F98
×
s f1 f2 . . . f8
d = [ a | F ]
f
- 3 No foreground/noise model at all, but an unstructured contaminant g:
d = as + g.
− A linear combination s =
i widi = w†d of the input channels gives
unbiased CMB if w†a = 1 and perfect foreground rejection if w†F = 0. Actually, we are after Span(F), the foreground subspace.
SLIDE 23 Four CMB maps in Planck releases NILC SEVEM SMICA Commander Wavelet space Pixel+Harmonic Harmonic space Pixel space
d = a s + g d = a s + g d = [a|F]
f
d = [a|F(θ)]
f
- + n
- Various filtering schemes (space-dependent, multipole-dependent, or both):
− NILC : Needlet (spherical wavelet) domain ILC. − SEVEM : Pixel domain, internal template fitting. − SMICA : Harmonic domain, ML approach, foreground subspace. − Commander : Pixel domain, Bayesian method with physical foreground models.
SLIDE 24 The SMICA method The CMB is statistically independent of the other components. SMICA = SM + ICA = Spectral Matching Independent Component Analysis Do ICA: Blind estimation of the linear model d = [ a | F ]
f
via spectral matching: if all fields are modeled as Gaussian stationary, then the likelihood is a mismatch between empirical and theoretical spectra. Q: How dare you do that on sky maps so blatantly non Gaussian/non stationary?
SLIDE 25 Combining all 9 Planck channels, non parametrically: the ILC 1/ Independent contamination model: Stack the 9 Planck channels into a 9 × 1 data vector d = [d30, d44, . . . , d545, d857]†
d(p) = a s(p) + g(p)
p = 1, . . . , Npix 2/ Linear combination: Estimate the CMB signal s(p) in pixel p by weighting the inputs:
Idea: The variance of independent variables add up. Hence. . . Minimize the pixel average (w†d)2p subject to w†a = 1, yielding
w =
a† C−1 a
with
the sample covariance matrix. Coined ILC (Internal Linear Combination) by astrophysicists, a.k.a. MVBF (Min. Variance Beam Former) in array processing, BLUE elsewhere. .
SLIDE 26 Is the ILC good enough for Planck data ? ILC looks good: linear, unbiased, min. MSE, very blind, very few assumptions: knowing a (calibration) and the CMB uncorrelated from the rest (very true). However, a simulation result shows poor quality: ← − ILC map
a ±300µK color scale Error
a ±50µK color scale − → Two things, at least, need fixing:
- harmonic dependence and
- chance correlations.
SLIDE 27 Linear filtering in harmonic space Since resolution, noise and foregrounds vary (wildly) in power over channels and angular frequency, the combining weights should depend on ℓ. Harmonic ILC (Tegmark): CMB map synthesized from spherical harmonic coefficients ˆ sℓm, obtained as linear combinations: ˆ sℓm = w†
ℓ dℓm
with, again,
wℓ = C−1
ℓ
a a†C−1
ℓ
a Cℓ = Cov(dℓm)
- At high ℓ, (spectral) matrices Cℓ well
estimated by
Cℓ =
m dℓmd† ℓm/ 2ℓ + 1.
- At lower ℓ, we need to get smarter to
fight chance correlation.
500 1000 1500 2000 2500 3000 3500
ℓ
3 2 1 1 2 3
wi(ℓ) µK/µKRJ
i
044 070 100 143 217 353 545 857
217 353 143 857 545 100
SLIDE 28 ILC coefficients: raw (thin lines) and via SMICA modelling (thick lines) At low ℓ, the empirical covariance matrices
1 2ℓ + 1
dℓmd†
ℓm
do not average over enough modes. Chance correlation hurts.
SLIDE 29 Chance correlation: the single Universe problem Simplest illustrative example:
d =
d2
f′
f′
C = dd† and the ILC estimate will yield
C−1 d a† C−1 a
= d1 − d1d2 d2
2
· d2 = s − sf′ f′2 · f′
+ α
f′2 · f′
What is hitting us harder: chance correlation or non-rigid scaling ? The error due to chance correlation is independent of α. Same error whether α = 0 or α = 1 or α ≫ 1. A mixing model implies f = f′, i.e. rigid scaling. In such a model, chance correlation (a.k.a. ILC bias) dominates the error.
SLIDE 30
In the square noise-free case: d =
s
f
- , the sample covariance is:
dd† =
s2 sf† fs ff†
a F
†
In the absence of chance correlation: sf = 0, linear algebra says:
a†dd†−1F = 0
so that the ILC filter w ∝ dd†−1a would ensure w†F = 0, that is, perfect orthogonality to the foreground subspace. If the sample average sf is not identical to the ensemble average Esf = 0 (chance correlation), then contamination is unavoidable. High accuracy CMB requires good determination of the foreground subspace. How to do it with a single sky ?
SLIDE 31 SMICA and the foreground subspace model
- Planck channels modeled as noisy linear mixtures of CMB and 6 “foregrounds”:
d1 d2 . . . . . . d9
=
a1 a2 . . . . . . a9 F11 . . . F16 F21 . . . F26 . . . . . . . . . . . . . . . . . . F91 . . . F96
×
s f1 . . . f6
+
n1 n2 . . . . . . n9
dℓm = [ a | F ]
fℓm
i.e. foregrounds coherent enough to be confined to a low dimensional subspace, Span(F), but otherwise unconstrained: any spectrum, color, correlation. . . Cov(dℓm) = [ a | F ]
ℓ
Pℓ
σ2
1ℓ
. . . . . . ... . . . . . . σ2
9ℓ
= Cℓ(a, Ccmb
ℓ
, F, Pℓ, σ2
iℓ).
− A very blind model (all non-zero parameters are free!) but still identifiable.
- Only Span(F), the foreground subspace, is needed to suppress the foregrounds.
It is jointly determined by all multipoles involved in SMICA likelihood: Gaussian + stationary →
ℓ(2ℓ + 1)
CℓCℓ(θ)−1 + log det Cℓ(θ)
SLIDE 32 When the stationary Gaussian likelihood is OK Consider the noise-free, square case : d = [ a | F ]
f
For any matrix G, the ‘preprocessor’ T = [ a | G ]−1 ensures [ a | G ]−1 [ a | F ] =
α†X
X
- for some matrix X and vector α.
Applying T to data yields a (presumably) dirty CMB and n − 1 ‘templates’:
g
= Td =
X f
- Statistical models of foreground not needed: they are deterministically observed.
.
SLIDE 33 A likelihood pD(D|A). Two ingredients for a likelihood of d = A
f
P
f
- = Ps(s) · PF(f)
- 1) Statistical independence
and
g
= Td model =
X f
- 2) (Pre-processed) mixing model
. After just a little bit of work: p(d|A) = pS(y − α†g)
· pF(X−1g)| det(X)|−Npix
· |T|Npix
. Thus a maximum likelihood solution for the signal of interest is
α†g ˆ α = arg max
α
pS(y − α†g) and this value depends neither on g nor on the contamination model pF(·).
- Hence, in the high SNR limit, a statistical model is needed only
for the component of interest, the CMB, whenever a is known (calibration).
SLIDE 34 More explicitly Do we have a good probability model to use in ˆ α = arg max
α
pS(y − α†g) for the CMB ? Of course! It is trivial in harmonic space: −2 log pS(y − α†g) =
(yℓm − α†gℓm)2 Cℓ + cst The (trivial) solution corresponds to combining the inputs with weight vector ˆ α =
H a
a† C−1
H a
i.e. an ILC with
dℓmd†
ℓm/Cℓ
This is also the SMICA solution in the same context: chance correlation is optimally mitigated in the spectral domain.
SLIDE 35 Wisdom of the likelihood Two covariance matrices:
dℓmd†
ℓm/Cℓ
dℓmd†
ℓm
- The 1/Cℓ weight equalizes the variance of the harmonic modes of the CMB.
- The 1/Cℓ weight can be replaced with anything similar.
- Pixel-based covariance
CP dominated by a small number of effective modes.
SLIDE 36 Some orders of magnitude
- Multipole range 2 ≤ ℓ ≤ 25
- Galactic foregrounds with gℓ = (2ℓ + 1) ℓ−2.4
Variance decreases by a factor 6.60 with respect to pixel average if optimal weighting wℓ = 1/Cℓ is used. It’s like having 5.60 extra Universes to average over.
- Variance decreases by a factor 6.55 with respect to pixel average
if suboptimal weighting wℓ = ℓ2 is used.
SLIDE 37
Conclusions Trying to make the best out of a single Universe realization ICA: Key idea about the foregrounds : one subspace to rule them all (out). Robustness by doing without a complete foreground model (subspace only). Can be made not too naive statistically for CMB extraction: spectral matching. Targetting the CMB makes modeling foreground distribution irrelevant in the high SNR (large scales, low ℓ) limit. Future: data-driven foreground models when the SNR is not so great.