On a multi-processor machine you can start to get good results in a couple of hours. On single processors you'll need to set aside rather longer. You can also run on a cluster.
By default CosmoMC uses a simple Metropolis algorithm, but there are options for slice sampling and more powerful methods for exploring complicated distributions or using the fast/slow parameter sub-spaces. The program takes as inputs estimates of central values and posterior uncertainties of the various parameters. The proposal density can use information about parameter correlations from a supplied covariance matrix: use one if possible, as it will significantly improve performance. There is an option to estimate the covariance about the best-fit point (though this doesn't work very reliably), and some covariance matrices are supplied for common sets of default base parameters. If you compile and run with MPI (to run across nodes in a cluster), there is an option to learn the proposal matrix dynamically from the covariance of the post-burn-in samples so far. The MPI option also allows you to terminate the computation automatically when a particular convergence criterion is met. MPI is recommended.
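For example, in the params.ini input file you can point the proposal at a supplied covariance matrix and (when running with MPI) set a convergence stopping tolerance. The entries below are illustrative only, and the exact parameter names may differ between versions:
propose_matrix = params_CMB.covmat
MPI_Converge_Stop = 0.03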
There are two programs supplied: cosmomc and getdist. The first does the actual Monte Carlo and produces sets of .txt chain files and (optionally) .data output files (the binary .data files include the theoretical CMB power spectra etc.). The "getdist" program analyses the .txt files, calculating statistics and outputting files for the requested 1D, 2D and 3D plots (and can be used independently of the main cosmomc program). The "cosmomc" program also does post-processing on .data files, for example doing importance sampling with new data.
Please e-mail details of any bugs or enhancements to Antony Lewis. If you have any questions please ask in the CosmoCoffee computers and software forum. You can also read answers to other people's questions there.
Downloading and Compiling
Using MPI simplifies running several chains and proposal optimization. MPI can be used with OpenMP: generally you want to use OpenMP to use all the shared-memory processors on each node of a cluster, and MPI to run multiple chains on different nodes (the program can also just be run on a single CPU).
You will need a Fortran 90 (or higher) compiler - you can get free Intel Linux, G95 or GFortran compilers. Please let me know if you find compiler bugs and have specific fixes. You also need to link to LAPACK (for doing matrix diagonalization, etc.) - you may need to edit the Makefile to specify where this is on your system. Intel systems often use Intel's MKL.
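For example, the LAPACK link line in the Makefile might look something like the following (illustrative only - the variable name and library paths depend on your system and Makefile version):
LAPACKL = -L/usr/local/lib -llapack -lblas
or, with Intel's MKL, something along the lines of
LAPACKL = -L/opt/intel/mkl/lib -lmkl_lapack -lmkl -lguide -lpthread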
Using Visual Fortran there's no need to use the Makefile; just open the project file in the source folder and set params.ini as the program argument. For Compaq CVF do this under Project, Settings, Debug and set the working directory to ..\. Under Tools, Options, Directories set the include path to [cxml path]/CXML/INCLUDE and the lib path to [cxml path]/CXML/LIB. Don't install the 6.6C3 update as it gives compiler errors (6.6B is fine). You then need to add cfitsio files to your project depending on where they are on your system.
To change the l_max which is used for generating Cls you'll need to edit the value in cmbtypes.f90, run "make clean" then "make" to rebuild everything. Note l_max should be 50 larger than the largest l which you need accurately. You can also change matter_power_lnzsteps, the number of redshifts at which matter power spectra are sampled.
The default code includes polarization. You can edit the num_cls parameter in cmbtypes.f90 to include just temperature (num_cls=1), TT, TE and EE (num_cls=3), or TT, TE, EE and BB (num_cls=4). You will need the last option if you are including tensors and using polarized data. You can use temperature-only datasets with num_cls 3 or 4; the only disadvantages are that it will be marginally slower and the .data files will be substantially larger. For WMAP data you need num_cls = 3 or 4.
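For illustration, the relevant declarations in cmbtypes.f90 look something like this (the values shown are examples only, and names may differ slightly between versions):
  integer, parameter :: num_cls = 3                 !1 = TT only, 3 = TT, TE, EE; 4 also includes BB
  integer, parameter :: lmax = 2100                 !about 50 above the largest l you need accurately
  integer, parameter :: matter_power_lnzsteps = 1   !number of redshifts at which P(k) is sampled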
See the BibTeX file for relevant citations.
CosmoMC as a generic sampler
CosmoMC can also be compiled to sample any function you like, without calling any cosmology codes. Use the supplied Makefile_nowmap and set generic_mcmc = .true. in settings.f90. Also in settings.f90, set num_hard to the number of parameters you want to vary, and set num_initpower and num_norm to zero. Write your likelihood function as a function of the array of parameters in GenericLikelihoodFunction (calclike.f90). You don't need CFITSIO or the WMAP code installed to do this, but you will still need to compile CAMB.
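As a minimal sketch (the exact interface is defined in calclike.f90 and may differ between versions; here it is assumed the parameter values are available as Params%P and the function returns -log(likelihood)):
  function GenericLikelihoodFunction(Params)
    Type(ParamSet) Params
    real GenericLikelihoodFunction
    !Illustrative example only: independent unit Gaussians centred on zero for each varied parameter
    GenericLikelihoodFunction = sum(Params%P(1:num_hard)**2)/2
  end function GenericLikelihoodFunction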
See the supplied params.ini file for a fairly self-explanatory list of input parameters and options. The file_root entry gives the root name for all files produced. Running using MPI on a cluster is recommended if possible as you can automatically handle convergence testing and stopping.
Run the program with
./cosmomc params.ini
The samples will be in file_root.txt, etc. You can start several instances of the program generating separate chains using
./cosmomc params.ini 1
./cosmomc params.ini 2
etc.
In this case samples will be in the files file_root_NN.txt, where NN labels the chain number.
If you compiled with MPI you can run several communicating chains using e.g.
mpirun -np 4 ./cosmomc params.ini
There is also a supplied perl script runMPI.pl that you may be able to adapt for submitting jobs to a PBS queue, running lamboot, etc., e.g.
perl runMPI.pl params 4
to run 4 chains over four nodes using the params.ini parameter file (the script is set up by default for the CITA cluster - edit ppn=2 to the number of CPUs per node you have). A couple of runMPI.pl variations are also supplied (specifically for a couple of Cambridge computers, but may be generally adaptable).
If things go wrong check the .log and any error files in your cosmomc/scripts directory.
Each line of the output .txt chain files has the format
weight like param1 param2 param3 ...
The weight gives the number of samples (or importance weight) with these parameters, and like gives -log(likelihood). The getdist program can be used completely independently of the cosmomc program.
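For example, a chain file line might read (numbers purely illustrative)
3 1234.56 0.0223 0.105 1.041 0.09 ...
meaning that point has multiplicity (weight) 3 and -log(likelihood) = 1234.56, followed by the parameter values.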
Run getdist distparams.ini to process the chains specified in the parameter input file distparams.ini. This should be fairly self-explanatory, and is in the same format as the cosmomc parameter input file. Note that sigma_8 is only computed if you are including LSS data when generating the chain (as computing the matter power spectrum slows things down considerably; you can post-process to compute sigma_8 if you like - see action=1 in the cosmomc input file).
GetDist Parameters
Of course you should also check that you have set num_bins large enough and that your plots are stable to increasing it. Turning off smoothing can also be a useful check.
Output Text Files
Plotting
Parameter labels are set in distparams.ini - if any are blank the parameter is ignored. You can also specify which parameters to plot; if parameters are not specified for the 2D plots or the colour of the 3D plots, getdist automatically works out the most correlated variables and uses them.
The data files used by SuperMongo and MatLab are output to the plot_data directory.
Performance of the MCMC can be improved by using parameters which have a close to Gaussian posterior distribution. The default base parameters (which get implicit flat priors) are those listed in the supplied params.ini file.
Parameters like H_0 and Omega_lambda are derived from the above. Using theta rather than H_0 is more efficient as it is much less correlated with other parameters. There is an implicit prior 40 < H_0 < 100. The .txt chain files list derived parameters after the 13 base parameters. By default these are ΩΛ (14), Age/Gyr (15), Ωm (16), σ8 (17), zre (18), r10 (19) and H0 (20), where r10 is the ratio of the tensor to scalar Cl at l=10.
Since the program uses a covariance matrix for the parameters, it knows about (or will learn about) linear combination degeneracies. In particular ln[10^10 A_s] - 2*tau is well constrained, since exp(-2tau)A_s determines the overall amplitude of the observed CMB anisotropy (thus the above parameterization explores the tau-A degeneracy efficiently). The supplied covariance matrix will do this even if you add new parameters.
Changing parameters does in principle change the results as each base parameter has a flat prior. However for well constrained parameters this effect is very small. In particular using theta rather than H_0 has a small effect on marginalized results.
The above parameterization does make use of some knowledge about the physics, in particular the (approximate) formula for the sound horizon. Also supplied is a params_H.f90 file which uses H_0, z_re and A_s instead of theta, tau and log(10^10 A_s), which is more generic. Though slower to converge, this may be useful if you want to play around with different extended models - just edit the Makefile to use params_H.f90 instead of params_CMB.f90. Sample input files and a covariance matrix to go with params_H.f90 are available here. Since the parameters have a different meaning in this parameterization, you should not try to mix .covmat (or other) files with those from the default parameterization. Note this file tends to get out of sync with the latest CosmoMC version.
The supplied CMB datasets that are used for computing the likelihood are given in *.dataset files in the data directory (these may not be up to date). These are in a standard .ini format, and contain the data points and errors, data name, calibration and beam uncertainties, and window file directory. Code for handling these is in cmbdata.f90. The WMAP data is handled separately as a special case. Various simple priors are encoded in calclike.f90.
The most likely reason to modify the code is to change l_max, num_cls, or matter_power_lnzsteps, all specified in cmbtypes.f90. To change the numbers of parameters you'll need to change the constants in settings.f90. Run "make clean" after changing settings before re-compiling. When adding just one additional parameter it's often easiest to re-interpret one of the default parameters rather than adding in new parameters.
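For illustration, the constants in settings.f90 look something like this (the values are examples only; check the comments in the file for their exact meaning in your version):
  integer, parameter :: num_hard = 7        !number of slow parameters - increase by one to add a new slow parameter
  integer, parameter :: num_initpower = 3   !initial power spectrum (fast) parameters
  integer, parameter :: num_norm = 2        !normalization/amplitude parameters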
You are encouraged to examine what the code is doing and consider carefully changes you may wish to make. For example, the results can depend on the parameterization. You may also want to use different CAMB modules, e.g. slow-roll parameters for inflation, or use a fast approximator. The main source code files you may want to modify are described under Programming below.
The .ini file comments should explain the other options.
Example: since many people get this wrong, here is an illustration of what happens when generating plots from a set of chains from a tensor run (with prior r > 0):
Incorrect result when limits12 is not set.
Correct result when setting limits12=0 N.
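For example, assuming the tensor ratio is parameter 12 in your chains, the corresponding entry in the getdist input file would be
limits12 = 0 N
which tells getdist that the parameter has a hard lower bound at zero and no upper bound.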
GetDist produces a file_root.sm file for use with sm. Run sm < file_root.sm to produce file_root.ps containing a plot of the 1D marginalized posterior distributions.
GetDist produces MatLab '.m' files to do 1D, 2D and 3D plots. Type file_root into a MatLab window set to the directory containing the .m files to produce 1D marginalized plots. You can also run the corresponding .m files for the 2D and 3D plots.
You can use the blue matlab script (in the mscripts directory) to change to a B&W-friendly colormap (see also other colormaps in that directory). To compare two different sets of chains set compare_num=1 in the .ini file, and compare1 to the root name of some chains you have previously run GetDist on.
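For example, to overlay a previously processed run whose chains have root name chains/other_run (an illustrative name), you would set
compare_num = 1
compare1 = chains/other_run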
Some matlab scripts are also supplied for making custom matlab plots using the files produced by GetDist (see also CosmoloGUI). The scripts are in the mscripts directory - you will probably want to add this to your matlab path using e.g. addpath('mscripts'). For example, to overlay filled 2D confidence contours from two different runs:
confid2D('file_root1',8,17,'-k','b');
hold on;
confid2D('file_root2',8,17,'-k','r');
hold off;
show_contours_behind;
This last (optional) command is a supplied script which will show dotted lines lying behind other solid contours. If the last colour argument is omitted, confid2D plots unfilled contours only.
Convergence diagnostics
The getdist program will output convergence diagnostics: short summary information when getdist is run, and more detailed information in the file_root.converge file. When running with MPI, the first two of the statistics below can also be calculated while the chains are running, for comparison with a stopping criterion (see the .ini input file).
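Roughly speaking, the main statistic is a Gelman-Rubin style measure: R - 1 is approximately the variance of the chain means divided by the mean of the chain variances, evaluated for the worst-constrained linear combination of parameters, so values close to zero indicate the chains agree well.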
Differences between GetDist and MPI run-time statistics
GetDist will cut out ignore_rows from the beginning of each chain, then compute the R statistic using the last half of the remaining samples. The MPI run-time statistic uses the last half of all of the samples. In addition, GetDist will use all the parameters, including derived parameters. If a derived parameter has poor convergence this may show up when running GetDist but not when running the chain (however the eigenvalues of covariance of means is computed using only base parameters). The run-time values also use thinned samples (by default every one in ten), whereas GetDist will use all of them. GetDist will allow you to use only subsets of the chains.
Parameterizations
Hard coded priors
The default installation hard codes a few priors; in some instances you may wish to edit these:
There is no prior on the positivity of Omega_Lambda.
Data
There is also built-in support for 2dF and (a few years old) supernovae observations. Adding new data sets should be quite straightforward - you are encouraged to donate anything you add to be used by everyone. See add-ons and datasets.
Programming
params_CMB.f90: This defines what the input variables mean. Change this to use different variables. You can change which parameterization file to use in the Makefile; params_H.f90 is also supplied for using z_re, A_s and H_0 instead of tau, log(A_s) and theta.
cmbtypes.f90: You need to change this file to specify the l_max used. Chains can be generated at low l_max, then post-processed with a compile using a higher l_max. You can also change num_cls, the number of (temperature plus polarization) Cls to compute and store.
settings.f90: This defines the number of parameters and their types. You will need to change this if you use more parameters.
cmbdata.f90: This reads in the CMB .dataset information and computes likelihoods. You may wish to edit this, for example to use likelihood distributions for the band powers, or to compute the likelihood from actual polarized data. This version assumes polarized data points are an arbitrary combination of the raw TT, TE, EE, and BB Cls, as specified in the window files in data/windows. WMAP data is handled as a special case.
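Schematically, each theory band power B_i is computed from the window functions as something like
B_i = sum_l ( W^TT_il C^TT_l + W^TE_il C^TE_l + W^EE_il C^EE_l + W^BB_il C^BB_l )
where the W_il are read from the files in data/windows; the exact conventions and normalizations are defined in cmbdata.f90.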
Analogous to cmbdata, but for matter power spectrum measurements. Reads in generic dataset files, with support for (fixed) covariance matrices.
This is the proposal density and related constants and subroutines. The efficiency of MCMC is quite dependent on the proposal. Fast+slow and fast parameter subspaces are proposed separately. See the notes for a discussion of the proposal density and use of fast and slow parameters.
Routines for generating Cls, matter power spectra and sigma8 from CAMB. Replace this file to use other generators, e.g. a fast approximator like CMBfit, DASH, PICO, etc.
As of May 2008 this uses the UNION supernovae compilation by default (thanks to Anze Slosar). Alternative supernovae_xxx files are also supplied.
SDSS Lyman-alpha data (thanks to Kevork Abazajian). Note this is only tested and likely to be reliable for standard LCDM models; for more general code see the add-ons. You can also replace it with lya.f90 and recompile to use LUQAS data (thanks to Matteo Viel and J. Lesgourgues).
Reads in .data files and re-calculates likelihoods or theory predictions. Unused in MCMC runs.
calclike.f90: Add in calls to other likelihood calculators, etc., here.
Main program that reads in parameters and calls MCMC or post-processing.
The "getdist" program for analysing chains. Write your own importance
weighting function or parameter mapping.Add-ons and extra datasets
Information Theory, Inference and Learning Algorithms
Raftery and Lewis convergence diagnostics
There are also some notes on the proposal density, fast and slow parameters, and slice sampling as used by CosmoMC. See also the BibTeX file of CosmoMC references you should cite, along with some references of potential interest.
FAQ
Feel free to ask questions (and read answers to other people's) on the CosmoCoffee software forum. There is also a FAQ in the CosmoloGUI readme.