CosmoMC Readme

This version June 2008. Check the web page for the latest version.


See also the CosmoloGUI readme for information about how to make plots from samples using an easy-to-use graphical user interface.

Introduction

CosmoMC is a Fortran 90 Markov-Chain Monte-Carlo (MCMC) engine for exploring cosmological parameter space, together with code for analysing Monte-Carlo samples and importance sampling. The code does brute force (but accurate) theoretical matter power spectrum and Cl calculations with CAMB. See the paper for an introduction, descriptions, and typical results from some pre-WMAP data. It can also be compiled as a generic sampler without using any cosmology codes.

On a multi-processor machine you can start to get good results in a couple of hours. On single processors you'll need to set aside rather longer. You can also run on a cluster.

By default CosmoMC uses a simple Metropolis algorithm, but there are options for slice sampling and more powerful methods for exploring complicated distributions or using the fast/slow parameter sub-spaces. The program takes as inputs estimates of central values and posterior uncertainties of the various parameters. The proposal density can use information about parameter correlations from a supplied covariance matrix: use one if possible as it will significantly improve performance. There is an option to estimate the covariance about the best-fit point (though this doesn't work very reliably), and some covariance matrices are supplied for common sets of default base parameters. If you compile and run with MPI (to run across nodes in a cluster), there is an option to dynamically learn the proposal matrix from the covariance of the post-burn-in samples so far. The MPI option also allows you to terminate the computation automatically when a particular convergence criterion is met. MPI is recommended.
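
As an illustration of the basic idea only, here is a minimal Python sketch of a plain Metropolis sampler whose Gaussian proposal is shaped by a supplied covariance matrix. It is not CosmoMC's Fortran implementation, and CosmoMC's actual proposal scheme differs. The function names, the toy two-parameter target and the step count are all hypothetical. Each output row follows the chain layout of weight, -log(likelihood), then parameters, and the sketch assumes that a rejected proposal simply increments the weight of the current row.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular L with L L^T = cov (cov must be symmetric positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def metropolis(neg_log_like, start, cov, n_steps, seed=0):
    """Plain Metropolis sampler with a Gaussian proposal drawn from `cov`.

    Returns rows of [weight, -log(likelihood), params...]; a rejected
    proposal increments the weight of the current row (an assumption here).
    """
    rng = random.Random(seed)
    L = cholesky(cov)
    n = len(start)
    current = list(start)
    current_like = neg_log_like(current)
    rows = [[1, current_like] + current]
    for _ in range(n_steps):
        # Correlated Gaussian step: L times a vector of unit normal deviates.
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        proposal = [current[i] + sum(L[i][k] * z[k] for k in range(i + 1))
                    for i in range(n)]
        proposal_like = neg_log_like(proposal)
        # Accept with probability min(1, exp(current_like - proposal_like)).
        if (proposal_like <= current_like
                or rng.random() < math.exp(current_like - proposal_like)):
            current, current_like = proposal, proposal_like
            rows.append([1, current_like] + current)
        else:
            rows[-1][0] += 1
    return rows


def toy_neg_log_like(p):
    """Hypothetical target: a 2D Gaussian with unit variances and correlation 0.8."""
    x, y = p
    rho = 0.8
    return (x * x - 2 * rho * x * y + y * y) / (2 * (1 - rho * rho))


toy_cov = [[1.0, 0.8], [0.8, 1.0]]
chain = metropolis(toy_neg_log_like, [0.0, 0.0], toy_cov, 5000)
total_weight = sum(row[0] for row in chain)
```

Passing a diagonal matrix instead of toy_cov makes the same sampler propose along the parameter axes, which typically gives many more rejections (and so larger weights per row) on a strongly correlated target like this one.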

There are two programs supplied: cosmomc and getdist. The first does the actual Monte-Carlo and produces sets of .txt chain files and (optionally) .data output files (the binary .data files include the theoretical CMB power spectra etc.). The "getdist" program analyses the .txt files, calculating statistics, and outputs files for the requested 1D, 2D and 3D plots (and could be used independently of the main cosmomc program). The "cosmomc" program also does post processing on .data files, for example doing importance sampling with new data.

Please e-mail details of any bugs or enhancements to Antony Lewis. If you have any questions please ask in the CosmoCoffee computers and software forum. You can also read answers to other people's questions there.

Downloading and Compiling

If you don't want to use WMAP you can use Makefile_nowmap; there is then no need to link to cfitsio or the WMAP likelihood.

You will need a Fortran 90 (or higher) compiler - you can get free Intel Linux, G95 or GFortran compilers. Please let me know if you find compiler bugs and have specific fixes. You also need to link to LAPACK (for doing matrix diagonalization, etc.) - you may need to edit the Makefile to specify where this is on your system. Intel systems often use Intel's MKL.

Using Visual Fortran there's no need to use the Makefile, just open the project file in the source folder, and set params.ini as the program argument. For Compaq CVF do this under Project, Settings, Debug and set the working directory to ..\. Under Tools, Options, Directories set the include path to [cxml path]/CXML/INCLUDE and lib path to [cxml path]/CXML/LIB. Don't install the 6.6C3 update as it gives compiler errors (6.6B is fine). You then need to add cfitsio files to your project depending on where they are on your system.

To change the l_max which is used for generating Cls you'll need to edit the value in cmbtypes.f90, run "make clean" then "make" to rebuild everything. Note l_max should be 50 larger than the largest l which you need accurately. You can also change matter_power_lnzsteps, the number of redshifts at which matter power spectra are sampled.

The default code includes polarization. You can edit the num_cls parameter in cmbtypes.f90 to include just temperature (num_cls=1), TT, TE and EE (num_cls=3) or TT, TE, EE and BB (num_cls=4). You will need the last option if you are including tensors and using polarized data. You can use temperature-only datasets with num_cls 3 or 4, the only disadvantage being that it will be marginally slower, and the .data files will be substantially larger. For WMAP data you need num_cls = 3 or 4.

See BibTex file for relevant citations.

CosmoMC as a generic sampler

CosmoMC can also be compiled to sample any function you like, without calling any cosmology codes. Use the supplied Makefile_nowmap and set generic_mcmc = .true. in settings.f90. Also in settings.f90 set num_hard to the number of parameters you want to vary, and set num_initpower and num_norm to zero. Write your likelihood function as a function of the array of parameters in GenericLikelihoodFunction (calclike.f90). You don't need CFITSIO or WMAP code installed to do this, but you will still need to compile CAMB.

Running and input parameters

See the supplied params.ini file for a fairly self-explanatory list of input parameters and options. The file_root entry gives the root name for all files produced. Running using MPI on a cluster is recommended if possible as you can automatically handle convergence testing and stopping.


Input Parameters

Output files

Analysing samples and plotting

The getdist program analyses text files produced by the cosmomc program. These are in the format

    weight like param1 param2 param3 ...

The weight gives the number of samples (or importance weight) with these parameters. like gives -log(likelihood). The getdist program could be used completely independently of the cosmomc program.
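
Because the format is plain whitespace-separated text, the chains are easy to read with your own tools. The following Python sketch (not part of CosmoMC or getdist) parses rows in this layout and computes a weighted mean and the best-fit sample. The function names and the three sample rows are made up for illustration.

```python
def read_chain(lines):
    """Parse rows of 'weight like param1 param2 ...' into (weights, likes, params)."""
    weights, likes, params = [], [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        values = [float(f) for f in fields]
        weights.append(values[0])
        likes.append(values[1])
        params.append(values[2:])
    return weights, likes, params


def weighted_mean(weights, params, index):
    """Weighted mean of the parameter in 0-based position `index`."""
    total = sum(weights)
    return sum(w * p[index] for w, p in zip(weights, params)) / total


# Made-up rows for illustration only, not real CosmoMC output.
sample_lines = [
    "2  10.5  0.022  0.11",
    "1  10.1  0.023  0.12",
    "3  10.9  0.021  0.10",
]
weights, likes, params = read_chain(sample_lines)
# like is -log(likelihood), so the best-fit sample has the smallest value.
best_fit = params[likes.index(min(likes))]
mean_first = weighted_mean(weights, params, 0)
```

Any iterable of lines works as input, so an open file object for a .txt chain file can be passed to read_chain directly.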

Run getdist distparams.ini to process the chains specified in the parameter input file distparams.ini. This should be fairly self-explanatory, in the same format as the cosmomc parameter input file. Note that sigma_8 is only computed if you are including LSS data when generating the chain, as computing the matter power spectrum slows things down considerably. You can post-process to compute sigma_8 if you like; see action=1 in the cosmomc input file.

GetDist Parameters

The .ini file comments should explain the other options.

Output Text Files

Plotting

Parameter labels are set in distparams.ini - if any are blank the parameter is ignored. You can also specify which parameters to plot, or if parameters are not specified for the 2D plots or the colour of the 3D plots getdist automatically works out the most correlated variables and uses them. The data files used by SuperMongo and MatLab are output to the plot_data directory.

Convergence diagnostics

The getdist program will output convergence diagnostics, both short summary information when getdist is run, and also more detailed information in the file_root.converge file. When running with MPI the first two of the parameters below can also be calculated when running the chains for comparison with a stopping criterion (see the .ini input file).

Differences between GetDist and MPI run-time statistics

GetDist will cut out ignore_rows from the beginning of each chain, then compute the R statistic using the last half of the remaining samples. The MPI run-time statistic uses the last half of all of the samples. In addition, GetDist will use all the parameters, including derived parameters. If a derived parameter has poor convergence this may show up when running GetDist but not when running the chain (however the eigenvalues of the covariance of means are computed using only base parameters). The run-time values also use thinned samples (by default every one in ten), whereas GetDist will use all of them. GetDist will allow you to use only subsets of the chains.
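
To make the difference in sample selection concrete, here is a hedged Python sketch of a Gelman-Rubin style ratio for a single parameter. The exact definition of the R statistic is not given in this section, so the formula used (variance of the chain means divided by the mean of the chain variances) is an assumption. The function names and the example chains are hypothetical, and sample weights and thinning are ignored for simplicity.

```python
def variance(values):
    """Unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def gelman_rubin_ratio(chains, ignore_rows=0):
    """Variance of chain means divided by the mean of chain variances.

    `chains` is a list of chains, each a list of values of one parameter.
    The first `ignore_rows` entries of each chain are dropped, then only
    the last half of what remains is used.
    """
    halves = []
    for chain in chains:
        kept = chain[ignore_rows:]
        halves.append(kept[len(kept) // 2:])
    means = [sum(h) / len(h) for h in halves]
    within = sum(variance(h) for h in halves) / len(halves)
    between = variance(means)
    return between / within


# Made-up chains: the first two rows of each play the role of burn-in.
chain_a = [9.0, 9.0, 0.0, 2.0, 1.0, 3.0]
chain_b = [9.0, 9.0, 0.0, 2.0, 2.0, 4.0]

getdist_style = gelman_rubin_ratio([chain_a, chain_b], ignore_rows=2)
runtime_style = gelman_rubin_ratio([chain_a, chain_b], ignore_rows=0)
```

With ignore_rows=2 the ratio is taken over the last half of the rows left after the cut, as described for GetDist. With ignore_rows=0 it is taken over the last half of all rows, as described for the run-time statistic.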

Parameterizations

Performance of the MCMC can be improved by using parameters which have a close to Gaussian posterior distribution. The default parameters (which get implicit flat priors) are

  1. ombh2 - the physical baryon density
  2. omch2 - the physical dark matter density
  3. 100*theta - 100*(the ratio of the [approx] sound horizon to the angular diameter distance)
  4. tau - the optical depth
  5. omk - omega_K
  6. nufrac - the fraction of the dark matter energy in the form of massive neutrinos
  7. w - the (assumed constant) equation of state of the dark energy (taken to be quintessence)
  8. n_s - the scalar spectral index
  9. n_t - the tensor spectral index
  10. n_run - the running of the scalar spectral index
  11. ln[10^10 A_s] - A_s is the primordial superhorizon power in the curvature perturbation on 0.05Mpc^{-1} scales (i.e. this is an amplitude parameter)
  12. amp_ratio - the ratio A_t/A_s, where A_t is the primordial power in the transverse traceless part of the metric tensor
  13. A_SZ - the SZ template normalization (assumed to be independent of other parameters)

Parameters like H_0 and Omega_lambda are derived from the above. Using theta rather than H_0 is more efficient as it is much less correlated with other parameters. There is an implicit prior 40 < H_0 < 100. The .txt chain files list derived parameters after the 13 base parameters. By default these are ΩΛ (14), Age/Gyr (15), Ωm (16), σ8 (17), zre (18), r10 (19) and H0 (20). r10 is the ratio of the tensor to scalar Cl at l=10.
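
Since each chain row starts with the weight and like columns, parameter number k sits in the (k+2)-th column of the file. A small Python sketch of the lookup, using the default derived parameter numbers listed above; the dictionary keys, function names and the fake row are hypothetical, and the numbers only apply to the default parameterization.

```python
# Default derived parameters and their parameter numbers, as listed above.
DERIVED = {
    "omega_lambda": 14,
    "age_gyr": 15,
    "omega_m": 16,
    "sigma_8": 17,
    "z_re": 18,
    "r10": 19,
    "H0": 20,
}


def field_index(param_number):
    """0-based field index of parameter `param_number` in a chain row.

    Rows are 'weight like param1 param2 ...', so param1 is field 2.
    """
    return param_number + 1


def get_param(row_fields, name):
    """Pick a derived parameter out of one whitespace-split chain row."""
    return float(row_fields[field_index(DERIVED[name])])


# Made-up row: weight, like, then parameters 1..20 each set to their own number.
fake_row = ["1", "100.0"] + [str(float(k)) for k in range(1, 21)]
h0 = get_param(fake_row, "H0")
```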

Since the program uses a covariance matrix for the parameters, it knows about (or will learn about) linear combination degeneracies. In particular ln[10^10 A_s] - 2*tau is well constrained, since exp(-2tau)A_s determines the overall amplitude of the observed CMB anisotropy (thus the above parameterization explores the tau-A degeneracy efficiently). The supplied covariance matrix will do this even if you add new parameters.
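
The degeneracy direction follows from taking the log: ln[10^10 A_s exp(-2tau)] = ln[10^10 A_s] - 2*tau, so models with the same observed amplitude share the same value of this linear combination. A quick numerical check in Python, with purely illustrative values for the amplitude and the two optical depths (the function names are hypothetical):

```python
import math


def amplitude_param(a_s):
    """The base parameter ln[10^10 A_s]."""
    return math.log(1e10 * a_s)


def well_constrained_combination(a_s, tau):
    """ln[10^10 A_s] - 2*tau, which equals ln[10^10 * A_s * exp(-2*tau)]."""
    return amplitude_param(a_s) - 2.0 * tau


# Illustrative numbers only: two (A_s, tau) pairs sharing the same A_s * exp(-2*tau).
observed_amplitude = 2.0e-9
tau_low, tau_high = 0.05, 0.15
a_s_low = observed_amplitude * math.exp(2.0 * tau_low)
a_s_high = observed_amplitude * math.exp(2.0 * tau_high)

combo_low = well_constrained_combination(a_s_low, tau_low)
combo_high = well_constrained_combination(a_s_high, tau_high)
```

The two pairs differ in both tau and ln[10^10 A_s], yet combo_low and combo_high agree to rounding error, which is why a proposal that knows the covariance can move along this direction cheaply.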

Changing parameters does in principle change the results as each base parameter has a flat prior. However for well constrained parameters this effect is very small. In particular using theta rather than H_0 has a small effect on marginalized results.

The above parameterization does make use of some knowledge about the physics, in particular the (approximate) formula for the sound horizon. Also supplied is a params_H.f90 file which uses H_0, z_re and A_s instead of theta, tau and log(10^10 A_s), which is more generic. Though slower to converge, this may be useful if you want to play around with different extended models - just edit the Makefile to use params_H.f90 instead of params_CMB.f90. Sample input files and covariance matrix along with params_H.f90 are available here. Since the parameters have a different meaning in this parameterization, you should not try to mix .covmat (or other) files with those from the default parameterization. Note this file tends to get out of sync with the latest CosmoMC version.

Hard coded priors

The default installation hard codes a few priors; in some instances you may wish to edit these. There is no prior on the positivity of Omega_Lambda.

Data

The supplied CMB datasets that are used for computing the likelihood are given in *.dataset files in the data directory (these may not be up to date). These are in a standard .ini format, and contain the data points and errors, data name, calibration and beam uncertainties, and window file directory. Code for handling these is in cmbdata.f90. The WMAP data is handled separately as a special case. Various simple priors are encoded in calclike.f90.

There is also built-in support for 2dF and (a few years old) supernovae observations. Adding new data sets should be quite straightforward - you are encouraged to donate anything you add to be used by everyone. See add-ons and datasets.

Programming

The most likely reason to modify the code is to change l_max, num_cls, or matter_power_lnzsteps, all specified in cmbtypes.f90. To change the numbers of parameters you'll need to change the constants in settings.f90. Run "make clean" after changing settings before re-compiling. When adding just one additional parameter it's often easiest to re-interpret one of the default parameters rather than adding in new parameters.

You are encouraged to examine what the code is doing and consider carefully changes you may wish to make. For example, the results can depend on the parameterization. You may also want to use different CAMB modules, e.g. slow-roll parameters for inflation, or use a fast approximator. The main source code files you may want to modify are

Add-ons and extra datasets

Version History

Reference links

Probabilistic Inference Using Markov Chain Monte Carlo Methods

Information Theory, Inference and Learning Algorithms

MCMC Preprint Service

Raftery and Lewis convergence diagnostics

There are also some notes on the proposal density, fast and slow parameters, and slice sampling as used by CosmoMC. See also the BibTex file of CosmoMC references you should cite, along with some references of potential interest.

FAQ

  1. What are the dotted lines on the plots?
    Dotted lines are mean likelihoods of samples, solid lines are marginalized probabilities. For Gaussian distributions they should be the same. For skew distributions, or if chains are poorly converged, they will not be. Sometimes there is a much better fit (giving high mean likelihood) in a small region of parameter space which has very low marginalized probability. There is more discussion in the original paper.
     
  2. What's in the .likestats file produced by getdist?
    These are values for the best-fit sample, and projections of the n-Dimensional confidence region. Usually people only look at the best-fit values. The n-D limits give you some idea of the range of the posterior, and are much more conservative than the marginalized limits.
     
  3. I'm writing a paper "constraints on X". How do I add X to cosmomc?
    Often the easiest thing to do is to re-interpret one of the unused standard parameters, e.g. w, A_T, etc, depending on whether the parameter is "fast" or not (if in doubt, use one of the slow parameters like w or f_nu). You just need to change the CMBToCAMB routine in CMB_Cls_Simple.f90 so that your parameter is correctly fed into CAMB, change limits appropriately in the params.ini file, etc. See the undocumented references to w_is_w in the code for how w can be re-interpreted as a ratio of isocurvature to adiabatic in CAMB's initial conditions. If you need to add more than a couple of parameters you'll probably instead need to edit settings.f90 to increase the number of parameters, and edit the .ini file accordingly. Slow parameters should have numbers lower than fast parameters (i.e. to insert 5 slow parameters, the index of the fast parameters in the .ini file should increase by 5).
     
  4. Why do some chains sometimes appear to get stuck?
    Usually this is because the starting position for the chain is a long way from the best fit region. Since the marginal distributions of e.g. A_s are rather narrow, it can take a while for chains to move into an acceptable region of A_s exp(-2τ). The cure is to check your starting parameter values and start widths (make sure the widths are not too wide), or to use a sampling method that is more robust (e.g. use sampling_method = 4). If you are patient, stuck chains should eventually find a sensible region of parameter space anyway. Occasionally the starting position may be in a corner of parameter space so that prior ranges prevent any reasonable proposed moves. In this case check your starting values and ranges, or just try re-starting the chain (a different random starting position will possibly be OK).
     
  5. How do I simulate futuristic CMB data? See CosmoCoffee. Also if you want to include lensing potential reconstruction, see this page.
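
The distinction in FAQ 1 between marginalized probability and mean likelihood can be sketched from the chain columns alone. The Python below is an illustration, not getdist's actual binning or smoothing: it sums weights per bin for the marginalized curve and averages exp(-like) per bin for the mean likelihood, then scales each curve to a peak of 1. The function name, bin layout and the two sample rows are made up, and at least one sample is assumed to fall inside the binned range.

```python
import math


def binned_curves(weights, likes, values, lo, hi, n_bins):
    """Per-bin marginalized probability and mean likelihood for one parameter.

    Returns two lists, each scaled so its largest entry is 1 (empty bins give 0).
    """
    width = (hi - lo) / n_bins
    weight_sum = [0.0] * n_bins
    like_sum = [0.0] * n_bins
    best = min(likes)  # subtract the best -log(like) to avoid underflow in exp
    for w, nll, v in zip(weights, likes, values):
        b = math.floor((v - lo) / width)
        if 0 <= b < n_bins:
            weight_sum[b] += w
            like_sum[b] += w * math.exp(-(nll - best))
    top = max(weight_sum)
    marginalized = [s / top for s in weight_sum]
    mean_like = [l / s if s > 0 else 0.0 for l, s in zip(like_sum, weight_sum)]
    peak = max(mean_like)
    mean_like = [m / peak for m in mean_like]
    return marginalized, mean_like


# Made-up samples: most weight sits at 0.25, but the better fit is at 0.75.
weights = [5.0, 1.0]
likes = [1.0, 0.0]      # -log(likelihood)
values = [0.25, 0.75]
marginalized, mean_like = binned_curves(weights, likes, values, 0.0, 1.0, 2)
```

Here the marginalized curve peaks in the first bin because that is where the weight is, while the mean likelihood peaks in the second bin because that is where the better fit is. This is the situation in which the dotted and solid lines disagree.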

Feel free to ask questions (and read answers to other people's) on the CosmoCoffee software forum. There is also a FAQ in the CosmoloGUI readme.


Antony Lewis.