Mapmaking

An experiment can be characterized by the data $d_t$ taken at many different times, a pointing matrix $P_{ti}$ relating the data timestream to the underlying signal at pixelized positions indexed by $i$, and a noise matrix $C_{d,tt'}$ characterizing the covariance of the noise in the timestream. A model for the data is then $d_t = P_{ti}\Theta_i + n_t$ (with an implicit sum over the repeated index $i$): the sum of signal plus noise. Here $n_t$ is drawn from a distribution (often Gaussian) with mean zero and covariance $\langle n_t n_{t'} \rangle=C_{d,tt'}$. In its simplest form the pointing matrix ${\bf P}$ contains rows, each corresponding to a particular time, that are all zeroes except for a single one in the column of the pixel observed at that time (see Figure 5). Typically, a pixel will be scanned many times during an experiment, so a given column will have many ones in it, corresponding to the many times that pixel has been observed.
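
As a concrete illustration, the following sketch builds a toy pointing matrix and timestream in Python; the array sizes, signal amplitude, and noise level are arbitrary choices for illustration only, not those of any real experiment.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n_pix, n_time = 8, 100                 # toy map and timestream sizes
theta = rng.normal(0.0, 100.0, n_pix)  # underlying signal Theta_i, one value per pixel

# Pointing matrix: each row has a single 1 in the column of the pixel
# observed at that time; pixels are revisited many times during the scan.
pixel_hit = rng.integers(0, n_pix, n_time)
P = np.zeros((n_time, n_pix))
P[np.arange(n_time), pixel_hit] = 1.0

noise = rng.normal(0.0, 50.0, n_time)  # n_t: zero-mean Gaussian noise
d = P @ theta + noise                  # observed timestream d_t = P_ti Theta_i + n_t
\end{verbatim}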

Given this model, a well-posed question is: what is the optimal estimator of the signal $\Theta_i$, i.e. what is the best way to construct a map? The answer stems from the likelihood function ${\cal L}$, defined as the probability of getting the data given the theory, ${\cal L}\equiv P[{\rm data} \vert {\rm theory}]$. In this case, the theory is the set of parameters $\Theta_i$,

\begin{displaymath}
{\cal L}_\Theta(d_t)
= {1\over (2\pi)^{N_t/2} \sqrt{\det {\bf C}_d}}
\exp\left[ -{1\over 2} \left(d_t - P_{ti}\Theta_i\right) C^{-1}_{d,tt'}
\left(d_{t'} - P_{t'j} \Theta_j\right)\right] \,.
\end{displaymath} (26)

That is, the noise, the difference between the data and the modulated signal, is assumed to be Gaussian with covariance ${\bf C}_d$.
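
For the toy timestream above, the log of the likelihood of Equation (26) can be evaluated directly; the white-noise covariance assumed here is purely illustrative.

\begin{verbatim}
C_d = (50.0 ** 2) * np.eye(n_time)   # assumed white timestream noise covariance
C_d_inv = np.linalg.inv(C_d)

def log_like(theta_candidate):
    # Residual between the data and the modulated signal, d_t - P_ti Theta_i.
    resid = d - P @ theta_candidate
    # Gaussian log-likelihood of Equation (26), including the normalization.
    chi2 = resid @ C_d_inv @ resid
    norm = np.linalg.slogdet(2.0 * np.pi * C_d)[1]
    return -0.5 * (chi2 + norm)
\end{verbatim}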

There are two important theorems useful in the construction of a map and more generally in each step of the data pipeline [Tegmark et al, 1997]. The first is Bayes' Theorem. In this context, it says that $P[\Theta_i\vert d_t]$, the probability that the temperatures are equal to $\Theta_i$ given the data, is proportional to the likelihood function times a prior $P(\Theta_i)$. Thus, with a uniform prior,

\begin{displaymath}
P[\Theta_i\vert d_t] \propto
P[d_t \vert \Theta_i] \equiv {\cal L}_\Theta(d_t) \,,\end{displaymath} (27)

with the normalization constant determined by requiring the integral of the probability over all $\Theta_i$ to be equal to one. The probability on the left is the one of interest. The most likely values of $\Theta_i$ therefore are those which maximize the likelihood function. Since the log of the likelihood function in question, Equation (26), is quadratic in the parameters $\Theta_i$, it is straightforward to find this maximum point. Differentiating the argument of the exponential with respect to $\Theta_i$ and setting to zero leads immediately to the estimator

\begin{displaymath}
\hat\Theta_i = C_{N,ij} P_{jt} C^{-1}_{d,tt'} d_{t'}\,,
\end{displaymath} (28)

where ${\bf C}_N\equiv ({\bf P}^{\rm tr} {\bf C}_{d}^{-1} {\bf P})^{-1}$. As the notation suggests, the mean of the estimator is equal to the actual $\Theta_i$ and the variance is equal to ${\bf C}_N$.
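
In the same toy setup, the estimator of Equation (28) amounts to a few lines of linear algebra (real experiments never form these dense matrices explicitly; see the discussion of complications below).

\begin{verbatim}
# Noise-weight the timestream, project back onto pixels, and normalize by
# C_N = (P^T C_d^-1 P)^-1; reuses d, P, C_d_inv from the sketches above.
F = P.T @ C_d_inv @ P                # = C_N^-1, the map-level inverse covariance
C_N = np.linalg.inv(F)
theta_hat = C_N @ P.T @ C_d_inv @ d  # maximum likelihood map
\end{verbatim}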

The second theorem states that this maximum likelihood estimator is also the minimum variance estimator. The Cramer-Rao inequality says no estimator can measure the $\Theta_i$ with errors smaller than the diagonal elements of ${\bf F}^{-1}$, where the Fisher matrix is defined as

\begin{displaymath}
F_{\Theta,ij} \equiv \left< - {\partial^2 \ln {\cal L}_\Theta \over \partial\Theta_i\partial\Theta_j} \right> \,.
\end{displaymath} (29)

Inspection of Equation (26) shows that, in this case, the Fisher matrix is precisely equal to ${\bf C}_N^{-1}$. Therefore, the Cramer-Rao theorem implies that the estimator of Equation (28) is optimal: it has the smallest possible variance [Tegmark, 1997a]. No information is lost if the map is used in subsequent analysis instead of the timestream data, but a huge factor of compression has been gained. For example, in the recent Boomerang experiment [Netterfield et al, 2001], the timestream contained $2\times 10^8$ numbers, while the map had only $57,000$ pixels: a compression by a factor of $3500$.
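
As a quick consistency check on the toy sketch above, a Monte Carlo over noise realizations verifies that the estimator is unbiased and that its scatter matches ${\bf C}_N$; the number of realizations is arbitrary.

\begin{verbatim}
# Repeat the measurement with fresh noise and compare the scatter of the
# recovered maps with the predicted covariance C_N.
n_sim = 2000
hats = np.empty((n_sim, n_pix))
for s in range(n_sim):
    d_sim = P @ theta + rng.normal(0.0, 50.0, n_time)
    hats[s] = C_N @ P.T @ C_d_inv @ d_sim

print(hats.mean(axis=0) - theta)                            # consistent with zero
print(np.diag(np.cov(hats, rowvar=False)) / np.diag(C_N))   # ratios close to 1
\end{verbatim}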

There are numerous complications that must be dealt with in realistic applications of Equation (28). Perhaps the most difficult is estimation of ${\bf C}_d$, the timestream noise covariance. This typically must be done from the data itself [Ferreira & Jaffe, 2000,Stompor et al, 2001]. Even if ${\bf C}_d$ were known perfectly, evaluation of the map involves inverting ${\bf C}_d$, a process which scales as the number of raw data points cubed. For both of these problems, the assumed stationarity of $C_{d,tt'}$ (it depends only on $t-t'$) is of considerable utility. Iterative techniques to approximate matrix inversion can also assist in this process [Wright et al, 1996]. Another issue which has received much attention is the choice of pixelization. The community has converged on the Healpix pixelization scheme, now freely available.
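
As an illustration of how stationarity and iteration are exploited, the sketch below applies ${\bf C}_d^{-1}$ as a Fourier-space filter (a circulant approximation) and solves $({\bf P}^{\rm tr}{\bf C}_d^{-1}{\bf P})\,\Theta = {\bf P}^{\rm tr}{\bf C}_d^{-1} d$ with a conjugate gradient solver, reusing the toy quantities above; the flat noise power spectrum is an assumption made only to keep the example simple.

\begin{verbatim}
from scipy.sparse.linalg import LinearOperator, cg

# Assumed stationary (here flat/white) noise power spectrum for the rfft modes.
noise_power = np.full(n_time // 2 + 1, 50.0 ** 2)

def apply_Cd_inv(v):
    # For stationary noise, C_d^-1 acting on a vector is a cheap FFT filter.
    return np.fft.irfft(np.fft.rfft(v) / noise_power, n_time)

def apply_F(x):
    # Apply P^T C_d^-1 P without ever forming the dense matrices.
    return P.T @ apply_Cd_inv(P @ x)

A = LinearOperator((n_pix, n_pix), matvec=apply_F)
b = P.T @ apply_Cd_inv(d)
theta_cg, info = cg(A, b)   # info == 0 signals convergence to the ML map
\end{verbatim}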

Perhaps the most dangerous complication arises from astrophysical foregrounds, both within and beyond the Galaxy, the main ones being synchrotron, bremsstrahlung, dust and point source emission. All of the main foregrounds have spectral shapes different from the blackbody shape of the CMB. Modern experiments typically observe at several different frequencies, so a well-posed question is: how can we best extract the CMB signal from the different frequency channels [Bouchet & Gispert, 1999]? The blackbody shape of the CMB relates the signal in all the channels, leaving one free parameter per pixel. Similarly, if the foreground shapes are known, each foreground comes with just one free parameter per pixel. A likelihood function for the data can again be written down and the best estimator for the CMB amplitude determined analytically. In the absence of foregrounds, one would extract the CMB signal by weighting the frequency channels according to inverse noise; when foregrounds are present, the optimal combination of the different frequency maps is a more clever weighting that subtracts out the foreground contribution [Dodelson, 1997]. One can do better still if the pixel-to-pixel correlations of the foregrounds can also be modeled from power spectra [Tegmark & Efstathiou, 1996] or templates derived from external data.
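
Schematically, with a mixing matrix ${\bf A}$ whose columns give the frequency scaling of the CMB (unity in thermodynamic units) and of each foreground, the analytic per-pixel estimator is $({\bf A}^{\rm tr}{\bf N}^{-1}{\bf A})^{-1}{\bf A}^{\rm tr}{\bf N}^{-1} m$. The channel frequencies, power-law dust scaling, and noise levels in the sketch below are invented for illustration and do not correspond to any particular experiment.

\begin{verbatim}
freqs = np.array([100.0, 143.0, 217.0, 353.0])            # GHz, hypothetical channels
dust_shape = (freqs / 353.0) ** 1.6                        # assumed power-law dust scaling
A_mix = np.column_stack([np.ones_like(freqs), dust_shape]) # columns: CMB, dust
N_inv = np.diag(1.0 / np.array([30.0, 20.0, 25.0, 60.0]) ** 2)  # assumed channel noise

# Weights that subtract the foreground while preserving the blackbody CMB signal.
W = np.linalg.inv(A_mix.T @ N_inv @ A_mix) @ A_mix.T @ N_inv

m_pix = np.array([110.0, 105.0, 130.0, 210.0])             # one pixel's channel maps (uK)
cmb_amp, dust_amp = W @ m_pix                               # cleaned CMB and dust amplitudes
\end{verbatim}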

This picture is complicated somewhat because the foreground shapes are not precisely known and vary across the sky, e.g. due to a spatially varying dust temperature. This too can be modeled in the covariance and addressed in the likelihood [Tegmark, 1998,White, 1998]. The resulting cleaned CMB map is obviously noisier than if foregrounds were absent, but the multiple channels keep the degradation manageable. For example, the errors on some cosmological parameters from Planck may degrade by almost a factor of ten compared with the no-foreground case. However, many errors will not degrade at all, and even the degraded parameters will still be determined with unprecedented precision [Knox, 1999,Prunet et al, 2000,Tegmark et al, 2000].

Many foregrounds tend to be highly non-Gaussian and, in particular, well localized in certain regions of the map. These pixels can be removed from the map, as was done for the region around the Galactic disk for COBE. This technique can also be highly effective against point sources. Indeed, even if there is only one frequency channel, external foreground templates set the form of the additional contributions to ${\bf C}_N$, which, when properly included, immunize the remaining operations in the data pipeline against such contaminants [Bond et al, 1998]. The same technique can be used with templates of residual systematics or constraints imposed on the data, arising e.g. from the removal of a dipole.
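
A minimal sketch of this template marginalization on the toy map above: giving the template amplitude an effectively infinite prior variance removes any sensitivity to it in subsequent steps. The template vector here is arbitrary and stands in for an external foreground or systematics map.

\begin{verbatim}
# Add the template outer product to C_N with a very large variance, so later
# likelihood steps are insensitive to the (unknown) template amplitude.
template = rng.normal(size=n_pix)     # assumed external template map
lam = 1.0e6 * C_N.max()               # effectively infinite variance for its amplitude
C_N_marg = C_N + lam * np.outer(template, template)
\end{verbatim}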

