The very large CMB data sets that have begun arriving require new, innovative tools of analysis. The fundamental tool for analyzing CMB data - the likelihood function - has been used since the early days of anisotropy searches [Readhead et al, 1989, Bond et al, 1991, Dodelson & Jubas, 1993]. Brute force likelihood analyses [Tegmark & Bunn, 1995] were performed even on the relatively large COBE data set, with six thousand pixels in its map. Present data sets are a factor of ten larger, and forthcoming ones will be larger by yet another factor of a hundred. The brute force approach, whose run time scales as the cube of the number of pixels, no longer suffices.
In response, analysts have devised a host of techniques that move beyond the early brute force approach. The simplicity of CMB physics - due to linearity - is mirrored in analysis by the apparent Gaussianity of both the signal and many sources of noise. In the Gaussian limit, optimal statistics are easy to identify. These compress the data so that all of the information is retained, but the subsequent analysis - because of the compression - becomes tractable.
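For concreteness, the brute force evaluation that these techniques replace can be sketched as follows, assuming a zero-mean Gaussian data vector with a known pixel covariance matrix. The function names and the toy covariance are illustrative assumptions, not taken from any of the analyses cited above. The Cholesky factorization is the step whose cost grows as the cube of the number of pixels.

```python
import numpy as np


def gaussian_log_likelihood(data, covariance):
    """Log-likelihood of a zero-mean Gaussian data vector.

    `covariance` is the full pixel-pixel covariance (signal plus noise),
    assumed symmetric and positive definite.
    """
    n = data.size
    # Cholesky factorization C = L L^T; this is the O(n^3) step.
    chol = np.linalg.cholesky(covariance)
    # Solve L y = d, so that d^T C^{-1} d = y^T y.
    y = np.linalg.solve(chol, data)
    chi2 = y @ y
    # ln det C = 2 * sum(ln L_ii)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (chi2 + log_det + n * np.log(2.0 * np.pi))


def relative_cost(n_pixels, n_reference=6000):
    """Cost of one likelihood evaluation relative to a reference map size,
    assuming pure cubic scaling with the number of pixels."""
    return (n_pixels / n_reference) ** 3


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200  # toy map size, far smaller than any real data set
    a = rng.normal(size=(n, n))
    cov = a @ a.T + n * np.eye(n)  # hypothetical positive-definite covariance
    d = rng.multivariate_normal(np.zeros(n), cov)
    print("log-likelihood:", gaussian_log_likelihood(d, cov))
    print("cost relative to a 6000-pixel map for 10x the pixels:",
          relative_cost(60000))
```

Under the cubic scaling assumed here, a tenfold increase in pixel count multiplies the cost of each likelihood evaluation by a thousand, which is why the compression steps described in this section matter.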
The Gaussianity of the CMB is not shared by other cosmological systems, since gravitational non-linearities turn an initially Gaussian distribution into a non-Gaussian one. Nonetheless, many of the techniques devised to study the CMB have been proposed for studying the 3D galaxy distribution [Tegmark et al, 1998], the 2D galaxy distribution [Efstathiou & Moody, 2001, Huterer et al, 2001], the Lyman alpha forest [Hui et al, 2001], and the shear field from weak lensing [Hu & White, 2001], among others. Indeed, these techniques are now indispensable, powerful tools for all cosmologists, and we would be remiss not to at least outline them in a discussion of the CMB, the context in which many of them were developed.
Figure: Data pipeline and radical compression. Maps are constructed for each frequency channel from the data timestreams, then combined and cleaned of foreground contamination using spatial (represented here by excising the galaxy) and frequency information. Bandpowers are extracted from the maps and cosmological parameters from the bandpowers. Each step involves a substantial reduction in the number of parameters needed to describe the data, from potentially for the Planck satellite.
Figure 5 summarizes the path from the starting point of the data analysis, a timestream of data points, to the end, the determination of cosmological parameters. Preceding this starting point come the calibration and the removal of systematic errors from the raw data, but because these issues are experiment specific, we do not attempt to cover them here. Each step radically compresses the data by reducing the number of parameters used to describe it. Although this data pipeline and our discussion below are focused on temperature anisotropies, similar steps have been elucidated for polarization [Bunn, 2001, Tegmark & de Oliveira-Costa, 2001, Lewis et al, 2001].