Initial import

This commit is contained in:
Guilhem Lavaux 2023-05-29 10:41:03 +02:00
commit 56a50eead3
820 changed files with 192077 additions and 0 deletions

@ -0,0 +1,509 @@
.. _building:
Building
########
Prerequisites
=============
* cmake ≥ 3.13
* automake
* libtool
* pkg-config
* gcc ≥ 7, or Intel compiler (≥ 2018), or Clang (≥ 7)
* wget (to download dependencies; the flag ``--use-predownload`` can be
used to bypass this dependency)
Optional requirements are:
* An `OpenMP <http://www.openmp.org>`_-enabled compiler (with OpenMP >= 2.0)
|a| does not require any preinstalled external libraries; it will download
and compile all necessary dependencies by default.
Python scripts have been tested with the following:
* Python == 3.5
* healpy == 1.10.3 (Guilhem also has a special version of healpy on GitHub `here <https://github.com/glavaux/healpy>`__)
* HDF5Py == 2.7.0
* Numexpr == 2.6.2
* Numba == 0.33.0 - 0.35.0
In addition, the vtktools binding in ares_tools has been used with
ParaView ≥ 5.2. It should be safe to use it to upload data from numpy
arrays into ParaView.
.. _downloading_and_setting_up_for_building:
Downloading and setting up for building
=======================================
The first step to obtain and build ARES is to clone the git repository
from Bitbucket. On some supercomputing systems, it is impossible to access
the internet directly. In that case, make the first clone on your
laptop/workstation and then replicate it on the remote machine. Please
check the next section for more details. If the computer has access to
the internet, this is easy:
.. code:: bash
git clone --recursive git@bitbucket.org:bayesian_lss_team/ares.git
Note that if you forget the ``--recursive`` option, either start from
scratch or run
.. code:: bash
git submodule init; git submodule update
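
To check whether the submodules were actually populated, one option is to
inspect the output of ``git submodule status``, where a leading ``-`` marks a
submodule that has not been initialized. The sketch below is only an
illustration: the helper name ``count_uninitialized_submodules`` is
hypothetical and is not part of the ARES scripts.

.. code:: bash

   # Hypothetical helper (not part of ARES): count submodules that are not
   # initialized. It reads `git submodule status` output on stdin, where
   # uninitialized entries start with '-'.
   count_uninitialized_submodules() {
       # grep -c prints the number of matching lines (0 if none);
       # '|| true' hides grep's non-zero exit status when nothing matches.
       grep -c '^-' || true
   }

A possible use is ``git submodule status --recursive | count_uninitialized_submodules``;
a non-zero count suggests that ``git submodule update --init --recursive`` is still needed.
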
Then you may want to choose the branch that interests you. At the time of
this writing (April 13th, 2021), there are 4 "main" branches:
* main (the bleeding edge variant of ARES)
* release/1.0
* release/2.0alpha
* release/2.1
The :code:`release/*` branches are stable, which means that the existing code
cannot change significantly, notably in ways that alter the API or features.
Bug fixes can still go in there, and exceptionally some late merging of
features. The general advice when starting is to branch against the latest
release, unless you particularly need a feature of :code:`main`. There are of
course many other sub-branches for the different features, as well as
development branches of each member of the collaboration.
Normally you will want to choose . Otherwise you may change branch
by running ``git checkout THE_BRANCH_NAME_THAT_YOU_WANT``. Once you are
on the branch that you want, run the ``get-aquila-modules.sh`` script in
two steps. First, run ``bash get-aquila-modules.sh --clone``, which
clones all the classical Aquila private modules into the ``extra/``
subdirectory. Second, ensure that all branches are set up correctly by
running ``bash get-aquila-modules.sh --branch-set``.
Now that the modules have been cloned and set up, we can move on to
building.
As a word of caution, do not touch the ``.gitmodules`` files. Whenever you
need to make changes, create a new branch in either the main repository
or the modules and work in that branch.
To synchronize the submodules:
.. code:: bash
cd ares
git submodule sync
git submodule update --init --recursive
.. _supercomputer_without_outgoing_access_to_internet:
Supercomputer without outgoing access to internet
=================================================
If the supercomputer does not let you open connections to the
internet (e.g. TGCC in France), things are a bit more complicated. The
first clone of ARES and its modules should be done on your
laptop/workstation. Make it a clean variant, for example:
.. code:: bash
git clone --recursive git@bitbucket.org:bayesian_lss_team/ares.git ares_clean
Then proceed again with
.. code:: bash
bash get-aquila-modules.sh --clone
bash get-aquila-modules.sh --branch-set
bash build.sh --download-deps
Now replicate that tree to the computer:
.. code:: bash
cd ..
rsync -av ares_clean THE_COMPUTER:
You can now proceed as usual for building.
**However**, for updating the git tree later, two special commands are
available in ``get-aquila-modules.sh``. On your laptop/workstation,
run the following from the ares top source directory:
.. code:: bash
bash get-aquila-modules.sh --send-pack THE_COMPUTER ares_clean origin
This will send the content of the current git tree (including the
registered modules in .aquila-modules) from the remote ``origin`` to
remote directory ``ares_clean`` on the computer ``THE_COMPUTER``.
However, the checked out branch will not be merged remotely! A second
operation is required. Now log in to the distant computer and run
.. code:: bash
bash get-aquila-modules.sh --local-merge origin
This will merge all the corresponding branches from origin to the
checked out branches. If everything is OK you should not get any error
message. Errors can happen if you modified the branches in an
incompatible way. In that case you have to fix the git merge in the
usual way (edit files, add them, commit).
.. _the_build.sh_script:
The build.sh script
===================
To help with the building process, there is a script called ``build.sh`` in
the top directory. It ensures that cmake is called correctly with all the
adequate parameters. It also cleans the build directory if needed.
The most basic scenario for building is the following:
.. code:: bash
bash build.sh
bash build.sh --download-deps
cd build
make
Please pay attention to warnings and error messages. The most important ones
are marked in color. Notably, some problems may occur if two different
versions of the same compiler are used for C and C++.
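
One way to catch a C/C++ compiler mismatch before configuring is to compare
what both compilers report with ``-dumpversion``, an option accepted by GCC
and Clang. The following is a minimal sketch under that assumption; the
helper name ``same_compiler_version`` is hypothetical and ``build.sh`` does
not provide it.

.. code:: bash

   # Hypothetical helper (not part of build.sh): succeed only if the C and
   # C++ compilers report the same version through -dumpversion.
   #   $1: C compiler, $2: C++ compiler
   same_compiler_version() {
       local c_version cxx_version
       c_version="$("$1" -dumpversion)" || return 2
       cxx_version="$("$2" -dumpversion)" || return 2
       [ "$c_version" = "$cxx_version" ]
   }

For instance, ``same_compiler_version gcc g++ || echo "C and C++ compiler versions differ"``
could be run before ``bash build.sh --c-compiler gcc --cxx-compiler g++``.
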
The full usage is the following (obtained with ``bash build.sh -h``):
.. code:: text
Ensure the current directory is ARES
This is the build helper. The arguments are the following:
--cmake CMAKE_BINARY instead of searching for cmake in the path,
use the indicated binary
--without-openmp build without openmp support (default with)
--with-mpi build with MPI support (default without)
--c-compiler COMPILER specify the C compiler to use (default to cc)
--cxx-compiler COMPILER specify the CXX compiler to use (default to c++)
--julia JULIA_BINARY specify the full path of julia interpreter
--build-dir DIRECTORY specify the build directory (default to "build/" )
--debug build for full debugging
--no-debug-log remove all the debug output to increase speed on parallel
filesystem.
--perf add timing instructions and report in the log files
--extra-flags FLAGS extra flags to pass to cmake
--download-deps Predownload dependencies
--use-predownload Use the predownloaded dependencies. They must be in
downloads/
--no-predownload Do not use predownloaded dependencies in downloads/
--purge Force purging the build directory without asking
questions.
--native Try to activate all optimizations supported by the
running CPU.
--python[=PATH] Enable the building of the python extension. If PATH
is provided it must point to the executable of your
choice for (e.g `/usr/bin/python3.9`)
--with-julia Build with Julia support (default false)
--hades-python Enable hades-python (implies --python)
--skip-building-tests Do not build all the tests
Advanced usage:
--eclipse Generate for eclipse use
--ninja Use ninja builder
--update-tags Update the TAGS file
--use-system-boost[=PATH] Use the boost install available from the system. This
reduces your footprint but also increases the
possibilities of miscompilation and symbol errors.
--use-system-fftw[=PATH] Same but for FFTW3. We require the prefix path.
--use-system-gsl Same but for GSL
--use-system-eigen=PATH Same but for EIGEN. Here we require the prefix path of
the installation.
--use-system-hdf5[=PATH] Same but for HDF5. Require an HDF5 with C++ support.
The path indicate the prefix path of the installation of HDF5
(e.g. /usr/local or /usr). By default it will use
environment variables to guess it (HDF5_ROOT)
After the configuration, you can further tweak the configuration using ccmake
(if available on your system).
Note that on some superclusters it is not possible to download files
from the internet. You can only push data using SSH, but not run any wget,
curl or git pull. To account for that limitation, there are two options:
``--download-deps`` and ``--use-predownload``. You should run
``bash build.sh --download-deps`` on, e.g., your laptop or workstation and
upload the created ``downloads`` directory into the ARES source tree on the
supercomputer without touching anything inside that directory. Once you
have done that, you can build on the supercomputer login node by adding
the ``--use-predownload`` flag to ``build.sh`` in addition to the others
that you need. If you want to compile with full MPI support, you have to
give ``--with-mpi`` as an argument to ``build.sh``.
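
Before configuring with ``--use-predownload`` on the supercomputer, it can
help to verify that the uploaded ``downloads`` directory is actually there and
not empty. The sketch below is an illustration only: ``predownload_ready`` is
a hypothetical helper, and it does not check that every individual dependency
is present.

.. code:: bash

   # Hypothetical helper (not part of build.sh): succeed if the given
   # downloads directory exists and contains at least one entry.
   predownload_ready() {
       [ -d "$1" ] && [ -n "$(ls -A "$1")" ]
   }

   # Example use on the supercomputer, from the ARES source tree:
   #   predownload_ready downloads && bash build.sh --use-predownload --with-mpi
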
If you have built ARES before grabbing all the extra modules, that is fine:
you can still recover your previous build. Just go to your
build directory and run ``${CMAKE} .``, with ``${CMAKE}`` being the cmake
executable that you used originally. If you did not specify
anything, just use ``cmake``.
A typical successful completion of the configuration ends like this:
.. code:: text
Configuration done.
Move to /home/lavaux/PROJECTS/ares/build and type 'make' now.
Please check the configuration of your MPI C compiler. You may need
to set an environment variable to use the proper compiler.
Some example (for SH/BASH shells):
OpenMPI:
OMPI_CC=cc
OMPI_CXX=c++
export OMPI_CC OMPI_CXX
It tells you that you should move to the build directory (by default it
is a subdirectory called ``build/`` in the root of the ARES source code).
There is a potential pitfall when using some MPI C compilers. They may have
been installed by the system administrator to work by default with
one compiler (for example the Intel C Compiler), although they also work
completely fine with another one (like GCC). In that case you have
to force the MPI C compiler to use the one that you chose with the
indicated environment variables, otherwise you risk generating
inconsistent code and getting errors when building the final binary.
.. code:: bash
cd build ; make
.. note::
* Use make parallelism if possible with the ``-j`` option. The number
  indicates the number of CPU cores to use in parallel to compile all the source
  code. For example, ``make all -j4`` compiles using 4 parallel tasks. We have
  not yet caught all the detailed dependencies, so a failure may occasionally
  happen. Just execute ``make`` again to ensure that everything is in order
  (it should be).
* Use ``make VERBOSE=1`` to see exactly what the compilation is doing
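
The two tips above can be combined in a small shell snippet. This is a sketch
under assumptions: ``make_jobs`` is a hypothetical helper, and ``nproc`` (from
GNU coreutils) may not exist on every system, in which case the helper falls
back to a single job.

.. code:: bash

   # Hypothetical helper: number of parallel jobs to pass to make -j.
   # Uses nproc when available, otherwise falls back to 1.
   make_jobs() {
       if command -v nproc >/dev/null 2>&1; then
           nproc
       else
           echo 1
       fi
   }

   # Example use from the build directory: run a parallel build and, if it
   # fails because of a missed dependency, run make once more.
   #   make -j"$(make_jobs)" || make
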
Upon success of the compilation you will find executables in the ``src/`` subdirectory. Notably::
./src/ares3
.. _git_procedures:
Git procedures
==============
.. _general_checkup_management:
General checkup / management
----------------------------
.. code:: text
bash get-aquila-modules.sh --status
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
This script can be run only by Aquila members.
if your bitbucket login is not accredited the next operations will fail.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Checking GIT status...
Root tree   (branch master) : good. All clear.
Module ares_fg (branch master) : good. All clear.
Module borg    (branch master) : good. All clear.
Module dm_sheet    (branch master) : good. All clear.
Module hades   (branch master) : good. All clear.
Module hmclet  (branch master) : good. All clear.
.. _git_submodules:
Git submodules
--------------
Contents of file 'BASE/ares/.gitmodules'
.. code:: bash
[submodule 'external/cosmotool']
path = external/cosmotool
url = https://bitbucket.org/glavaux/cosmotool.git
.. _frequently_encountered_problems_fep:
Frequently Encountered Problems (FEP)
=====================================
.. _non_linked_files:
Non-linked files
----------------
Problem
~~~~~~~
* Not being able to compile after transferring to a supercluster
* Error as following:
.. figure:: /user/building/Terminal_output.png
:alt: /user/building/Terminal_output.png
:width: 400px
Terminal_output.png
* Complains about not finding cfitsio in external/cfitsio while the
cfitsio is actually in external/cfitsio.
* Folder external/cfitsio:
.. figure:: /user/building/Terminal_output-2.png
:alt: /user/building/Terminal_output-2.png
:width: 400px
Terminal_output-2.png
Solution
~~~~~~~~
Purge all the ``.o`` and ``.a`` files in ``external/cfitsio``, then force a
rebuild of libcfitsio by removing the file
``{BUILD}/external_build/cfitsio-prefix/src/cfitsio-stamp/cfitsio-build``
and running ``make``.
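
The solution above can be scripted as follows. This is a minimal sketch: the
function name ``purge_cfitsio`` is hypothetical, and the two arguments (the
ARES source directory and the build directory written ``{BUILD}`` above) must
be adapted to your setup.

.. code:: bash

   # Hypothetical helper (not part of ARES): remove the cfitsio object files
   # and archives, then delete the stamp file so that make rebuilds libcfitsio.
   #   $1: ARES source directory, $2: build directory ({BUILD} above)
   purge_cfitsio() {
       find "$1/external/cfitsio" -type f \( -name '*.o' -o -name '*.a' \) -delete
       rm -f "$2/external_build/cfitsio-prefix/src/cfitsio-stamp/cfitsio-build"
   }

With the default layout, this could be used from the ARES top directory as
``purge_cfitsio . build`` followed by ``cd build ; make``.
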
MPI_CXX not found
-----------------
Problem
~~~~~~~
MPI_C is found but MPI_CXX is not found by CMake. The output of build.sh
contains something like:
.. code:: bash
-- Found MPI_C: /path/to/libmpi.so (found version "3.1")
-- Could NOT find MPI_CXX (missing: MPI_CXX_WORKS)
-- Found MPI_Fortran: /path/to/libmpi_usempif08.so (found version "3.1")
.. _solution_1:
Solution
~~~~~~~~
You probably have two versions of MPI (the one you intend to use, e.g.
your installation of OpenMPI) and one which pollutes the environment
(e.g. your anaconda). Therefore the compilation of the MPI C++ test
program (``build/CMakeFiles/FindMPI/test_mpi.cpp``) by CMake fails. To
troubleshoot:
* Check the commands that defined your environment variables using
.. code:: bash
set | grep -i MPI
* Check the paths used in ``CPATH``, ``CPP_FLAGS``, etc. for spurious
  MPI headers (e.g. ``mpi.h``)
* Check the file ``build/CMakeFiles/CMakeError.txt`` if it exists
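
The search for spurious headers can be automated. The sketch below assumes
that the relevant directories are listed in a colon-separated variable such as
``CPATH``; the helper name ``find_mpi_headers`` is hypothetical.

.. code:: bash

   # Hypothetical helper: print every mpi.h found in the directories of a
   # colon-separated path list (CPATH by default).
   find_mpi_headers() {
       local path_list="${1-${CPATH-}}" dir
       local IFS=':'
       for dir in $path_list; do
           [ -n "$dir" ] && [ -f "$dir/mpi.h" ] && echo "$dir/mpi.h"
       done
       return 0
   }

Running ``find_mpi_headers`` without arguments inspects ``CPATH``. Several
hits, or a hit inside an anaconda directory, would point to the kind of
conflict described above.
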
.. _building_at_hpc_facilities:
Building at HPC facilities
--------------------------
First, if possible, clone ARES base directory with git on the target
system:
.. code:: bash
git clone git@bitbucket.org:bayesian_lss_team/ares.git
Initialize the submodules:
.. code:: bash
cd ares
git submodule init
git submodule update
Obtain the additional Aquila modules:
.. code:: bash
bash get-aquila-modules.sh --clone
Here, either on your laptop/workstation or on the target system if it
allows outgoing internet connections, you can run the following
command:
.. code:: bash
bash build.sh --download-deps
A typical problem is that some of the dependencies have not been
downloaded correctly. You should check that all dependencies are available
in the ``downloads`` directory. If you downloaded them on your local computer,
you must upload the ``downloads`` directory to the ``ares/downloads``
subdirectory on the target system.
Check which modules are available
.. code:: bash
module avail
Choose the compiler or build environment. Also load the CMake module and
Python3.
**Important note:** The Intel compiler requires basic infrastructure
provided by GCC. The default environment may be very old, so a modern
Intel compiler (19 or 20) would be using old libraries from GCC 4.x. You
have to load the gcc compiler first (gcc > 7.x) and then load the Intel
compiler. You can check the compatibility with ``icc -v``, which shows the
version of gcc that is used by the Intel compiler.
.. _permissions_quota_etc:
Permissions, quota, etc
-----------------------
Some supercomputing facilities have a peculiar quota system. You have to
belong to a group to get access to the full disk quota (e.g. TGCC in
France). You can switch groups using ``newgrp name_of_the_group`` and
execute all commands in the spawned shell.
.. _external_hdf5_not_found:
External HDF5 not found
-----------------------
Problem
~~~~~~~
When running build.sh (particularly with the flag
``--use-system-hdf5``), cmake gives some errors, such as
.. code:: text
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
HDF5_CXX_INCLUDE_DIR (ADVANCED)
CMake Error in libLSS/CMakeLists.txt:
Found relative path while evaluating include directories of "LSS":
"HDF5_CXX_INCLUDE_DIR-NOTFOUND"
Solution
~~~~~~~~
* HDF5 must be compiled with the flags ``--enable-shared`` and
  ``--enable-cxx``.
* The environment variable ``HDF5_ROOT`` must point to the HDF5 prefix
  directory; cmake uses it starting from version 3.12 (see also cmake
  policy CMP0074 and `this commit
  2ebe5e9 <https://bitbucket.org/bayesian_lss_team/ares/commits/2ebe5e9c323e30ece0caa124a0b705f3b1241273>`__).
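
A quick sanity check of an HDF5 installation is to look for its C++ header.
The sketch below assumes that the header ``H5Cpp.h`` sits directly in the
``include`` directory of the prefix, which is not true for every distribution;
the helper name ``hdf5_has_cxx`` is hypothetical.

.. code:: bash

   # Hypothetical helper (not part of build.sh): succeed if the given HDF5
   # prefix provides the C++ header H5Cpp.h in its include directory.
   hdf5_has_cxx() {
       [ -f "$1/include/H5Cpp.h" ]
   }

   # Example use, assuming HDF5 is installed under /usr/local:
   #   export HDF5_ROOT=/usr/local
   #   hdf5_has_cxx "$HDF5_ROOT" && bash build.sh --use-system-hdf5
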
.. include:: building/building_May_2020.inc.rst

@ -0,0 +1,196 @@
Installing BORG for the Aquila meeting (May 2020)
=================================================
This note provides step-by-step instructions for downloading and
installing the BORG software package. The instructions were written
using a MacBook Air running OS X El Capitan. I encourage
readers to modify this description as may be required to install BORG on
a different OS. Please indicate all necessary modifications and which OS
was used.
Some prerequisites
------------------
The total installation will take approximately **7-8 GByte** of disk
space. Software prerequisites:
* cmake ≥ 3.10
* automake
* libtool
* pkg-config
* gcc ≥ 7, or Intel compiler (≥ 2018), or Clang (≥ 7)
* wget (to download dependencies; the flag ``--use-predownload`` can be
  used to bypass this dependency if you have already downloaded the
  required files in the ``downloads`` directory)
Clone the repository from BitBucket
-----------------------------------
To clone the ARES repository execute the following git command in a
console:
``git clone --recursive git@bitbucket.org:bayesian_lss_team/ares.git``
After the clone is successful, change directory to ``ares``
and execute:
.. code:: bash
bash get-aquila-modules.sh --clone
Ensure that the correct branches are set up for the submodules using:
.. code:: bash
bash get-aquila-modules.sh --branch-set
If you want to check the status of the currently checked out ARES and
its modules, please run:
.. code:: bash
bash get-aquila-modules.sh --status
You should see the following output:
.. code:: text
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
This script can be run only by Aquila members.
if your bitbucket login is not accredited the next operations will fail.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Checking GIT status...
Root tree (branch master) : good. All clear.
Module ares_fg (branch master) : good. All clear.
Module borg (branch master) : good. All clear.
Module dm_sheet (branch master) : good. All clear.
Module hades (branch master) : good. All clear.
Module hmclet (branch master) : good. All clear.
Module python (branch master) : good. All clear.
Building BORG
-------------
To save time and bandwidth it is advised to pre-download the
dependencies that will be used as part of the building procedure. You
can do that with
.. code:: bash
bash build.sh --download-deps
That will download a number of ``tar.gz`` files, which are put in the
``downloads/`` folder.
Then you can configure the build itself:
.. code:: bash
bash build.sh --cmake CMAKE_BINARY --c-compiler YOUR_PREFERRED_C_COMPILER --cxx-compiler YOUR_PREFERRED_CXX_COMPILER --use-predownload
Add ``--with-mpi`` to add MPI support. For example (this probably needs
to be adjusted for your computer):
.. code:: bash
bash build.sh --cmake /usr/local/Cellar/cmake/3.17.1/bin/cmake --c-compiler /usr/local/bin/gcc-10 --cxx-compiler /usr/local/bin/g++-10 --use-predownload
Once the configuration is successful you should see a final output similar
to this:
.. code:: text
------------------------------------------------------------------
Configuration done.
Move to /Volumes/EXTERN/software/borg_fresh/ares/build and type 'make' now.
Please check the configuration of your MPI C compiler. You may need
to set an environment variable to use the proper compiler.
Some example (for SH/BASH shells):
- OpenMPI:
OMPI_CC=/usr/local/bin/gcc-9
OMPI_CXX=/usr/local/bin/g++-9
export OMPI_CC OMPI_CXX
------------------------------------------------------------------
It tells you to move to the default build directory using ``cd build``,
after which you can type ``make``. To speed up the compilation you can
use more computing power by adding a ``-j`` option. For example
.. code:: bash
make -j4
will start 4 compilations at once (thus typically keeping 4 cores busy
all the time). Note that the compilation can take some time.
Running a test example
----------------------
The ARES repository comes with some standard examples for LSS analysis.
Here we will use a simple standard unit example for BORG. From your ARES
base directory change to the examples folder:
.. code:: bash
cd examples
We will copy a few files to a temporary directory for executing the run. We
will assume here that ``$SOME_DIRECTORY`` is a directory that you have chosen
for the purpose of this tutorial. Please replace any occurrence of it by the
path of your choice in the scripts below. We will also assume that ``$ARES``
represents the source directory path of the ares tree.
.. code:: bash
mkdir $SOME_DIRECTORY
cp 2mpp-chain.ini $SOME_DIRECTORY
cp completeness_12_5.fits.gz completeness_11_5.fits.gz 2MPP.txt $SOME_DIRECTORY
cd $SOME_DIRECTORY
In the above, we have copied the ini file describing the run, then the
survey mask files and the 2M++ data file for BORG. To start a BORG run, just
execute the following command in the console:
.. code:: bash
$ARES/build/src/hades3 INIT 2mpp-chain.ini
BORG will now execute a simple MCMC. You can interrupt the calculation at
any time. To resume the run you can just type:
.. code:: bash
$ARES/build/src/hades3 RESUME 2mpp-chain.ini
You need on the order of at least 1000 samples to pass the initial
warm-up phase of the sampler. As the execution of the code will consume
about 2GB of your storage, we suggest executing BORG in a directory
with sufficient free disk space.
You can also follow the Aquila tutorial
---------------------------------------
You can find a tutorial on running and analysing a BORG run in the ARES
source tree:
``$ARES/docs/users/building/Aquila_tutorial_0.ipynb``. It is a Jupyter
notebook, so you need a running `Jupyter <https://jupyter.org>`_ instance.
The content of this notebook is also directly accessible through this
`link to the notebook <building/Aquila_tutorial_0.ipynb>`_.
It illustrates how to read and plot some of the data produced by BORG.
Switching to another branch
---------------------------
Follow these steps to switch your ares clone to another branch (starting
from the ``ares/`` directory):
.. code:: bash
git checkout user/fancy_branch
git pull
# (the above step should only be necessary if you are not on a fresh clone and have not pulled recently)
bash get-aquila-modules.sh --branch-set
bash get-aquila-modules.sh --status
# ( verify that it responds with "all clear" on all repos)
bash get-aquila-modules.sh --pull
# ready to build: (make clean optional)
cd build ; make clean ; make

@ -0,0 +1,9 @@
Clusters
########
.. _clusters:
.. include:: clusters/Horizon.inc.rst
.. include:: clusters/Occigen.inc.rst
.. include:: clusters/Imperial_RCS.inc.rst
.. include:: clusters/SNIC.inc.rst

@ -0,0 +1,321 @@
.. _horizon:
Horizon
=======
Compiling and using ARES/BORG on Horizon
----------------------------------------
Modules
~~~~~~~
.. code:: bash
module purge
module load gcc/7.4.0
module load openmpi/3.0.3-ifort-18.0
module load fftw/3.3.8-gnu
module load hdf5/1.10.5-gcc5
module load cmake
module load boost/1.68.0-gcc6
module load gsl/2.5
module load julia/1.1.0
Building
~~~~~~~~
.. code:: bash
bash build.sh --use-predownload --use-system-hdf5 --use-system-gsl --build-dir /data34/lavaux/BUILD_ARES --c-compiler gcc --cxx-compiler g++
Running
~~~~~~~
Jupyter on Horizon
------------------
Jupyter is not yet installed by default on the Horizon cluster, but it
offers a nice remote interface for people:
- with slow and/or unreliable connections,
- who want to manage a notebook that can be annotated directly inline
  with Markdown, and then later converted to html or uploaded to the
  wiki with the figures included,
- who want to use ipyparallel more efficiently.
It is not for:
- people who do not like notebooks for one reason or another.
Installation
~~~~~~~~~~~~
We use Python 3.5 here. Load the following modules:
.. code:: bash
module load intel/16.0-python-3.5.2 gcc/5.3.0
Then we are going to install jupyter locally:
.. code:: bash
pip3.5 install --user jupyter-client==5.0.1 jupyter-contrib-core==0.3.1 jupyter-contrib-nbextensions==0.2.8 jupyter-core==4.3.0 jupyter-highlight-selected-word==0.0.11 jupyter-latex-envs==1.3.8.4 jupyter-nbextensions-configurator==0.2.5
At the moment (22 June 2017), I am using the above versions, but later
versions may well work without problems.
Automatic port forwarding and launch of Jupyter instance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jupyter can be cumbersome to start reliably, automatically and in a
consistent fashion. Guilhem Lavaux has written two scripts (`here <https://www.aquila-consortium.org/wiki/index.php/File:Jupyter_horizon.zip>`__) that
can help in that regard. The first script (``jupyter.sh``) has to be
left in the home directory on Horizon; it helps with starting a new
Jupyter job and reporting where it is located and how to contact it.
The second script (``.horizon-env.sh``) has to be kept on the local
station (i.e. the user's laptop or workstation). It triggers
the opening of ssh tunnels, starts jobs and forwards ports. It should be
loaded from ``.bashrc`` with a command like
``source ${HOME}/.horizon-env.sh``. Once these steps are
taken, several things are possible. First, to start a Jupyter on Horizon
you may run juphorizon. It will give the following output:
.. code:: text
~ $ juphoriz
Forwarding 10000 to b20:8888
Now use your web browser to connect to
`localhost:10000 <https://localhost:10000>`__. You also know that your Jupyter is on
beyond20 (port 8888).
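
For reference, the forwarding reported above (local port 10000 to ``b20:8888``)
corresponds to a standard ssh local port forward. The sketch below only prints
the equivalent command; the helper name ``tunnel_command`` and the login host
alias ``horizon`` are assumptions, and the actual scripts may set up the tunnel
differently.

.. code:: bash

   # Hypothetical helper: print the ssh command that forwards a local port
   # to a Jupyter port on a compute node through a login host.
   #   $1: local port, $2: compute node, $3: remote port, $4: login host
   tunnel_command() {
       printf 'ssh -N -L %s:%s:%s %s\n' "$1" "$2" "$3" "$4"
   }

   # "horizon" is an assumed ssh host alias for the login node.
   tunnel_command 10000 b20 8888 horizon
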
To stop the session do the following:
.. code:: text
~ $ stopjup
Do you confirm that you want to stop the session ? [y/N]
y
Jupyter stopped
If you run it a second time you will get:
.. code:: text
[guilhem@gondor] ~ $ stopjup
Do you confirm that you want to stop the session ? [y/N]
y
No port forwarding indication. Must be down.
which means that the port forwarding information has been cleared out
and the script does not know exactly how to proceed, so it does nothing.
If you still have a job queued on the system, it is your responsibility
to close it off to avoid using a Horizon node for nothing.
Two other commands are available:
- ``shuthorizon`` triggers the shutdown of the tunnel to Horizon.
  Be careful, as no checks are done at the moment: if you have
  port forwardings, they will be cancelled and you will have to set them
  up manually again.
- ``hssh`` opens a new multiplexed ssh connection to Horizon. It
  will not ask for your password, as it uses the multiplexer available
  in ssh. Note that it is not possible to start X11 forwarding
  this way.
IPyParallel
-----------
Now we need to install ipyparallel:
.. code:: bash
pip3.5 install --user ipyparallel
$HOME/.local/bin/ipcluster nbextension enable
Use `this pbs template <https://www.aquila-consortium.org/wiki/index.php/File:Pbs.engine.template.txt>`__.
You have to put several files in your ``$HOME/.ipython/profile_default``:
- `IPCluster configuration <https://www.aquila-consortium.org/wiki/index.php/File:IPython_ipcluster_config_py.txt>`__
  as *ipcluster_config.py*. This file indicates how to interact with
  the computer cluster administration. Notably, it includes a link to
  the aforementioned template for PBS. I have removed all the extra
  untouched configuration options. However, in the original file
  installed by ipyparallel you will find all the other possible knobs.
- `IPController configuration <https://www.aquila-consortium.org/wiki/index.php/File:IPython_ipcontroller_config_py.txt>`__ as
  *ipcontroller_config.py*. This file is used to start up the
  controller aspect which talks to all engines. It is fairly minor as I
  have kept the controller on the login node to talk to engines on
  compute nodes.
- `IPEngine configuration <https://www.aquila-consortium.org/wiki/index.php/File:IPython_ipengine_config_py.txt>`__ as
  *ipengine_config.py*. This file is used to start up the engines on
  compute nodes. The notable option is the one telling the engines to
  listen to any incoming traffic.
The documentation for ipyparallel is available on readthedocs
`here <http://ipyparallel.readthedocs.io/en/6.0.2/>`__.
Once you have put all the files in place you can start a new PBS-backed
kernel:
.. code:: text
$ ipcluster start -n 16
With the above files, that will start one job of 16 cores. If you had
chosen 32, it would have been 2 MPI tasks of 16 cores each, etc.
To start using ipyparallel, open a new Python kernel (either from
ipython, or more conveniently from a Jupyter notebook):
.. code:: python
import ipyparallel as ipp
c = ipp.Client()
Doing this will connect your kernel with a running ipyparallel batch
instance. ``c`` will hold a dispatcher object from which you can
instruct engines what to do.
IPyParallel comes with `magic commands for IPython
<http://ipyparallel.readthedocs.io/en/6.0.2/magics.html>`__. They are
great to dispatch all your commands; however, you must be aware that the
context is different from your main ipython kernel. Any object has to
be transmitted to the remote engine first. Check that page
carefully to learn how to do that.
MPIRUN allocation
-----------------
These are tips provided by Stephane Rouberol for finely specifying the
core/socket association of a given MPI/OpenMP computation.
.. code:: text
# default is bind to *socket*
mpirun -np 40 --report-bindings /bin/true 2>&1 | sed -e 's/.*rank \([[:digit:]]*\) /rank \1 /' -e 's/bound.*://' | sort -n -k2 | sed -e 's/ \([[:digit:]]\) / \1 /'
rank 0 [B/B/B/B/B/B/B/B/B/B][./././././././././.][./././././././././.][./././././././././.]
rank 1 [./././././././././.][B/B/B/B/B/B/B/B/B/B][./././././././././.][./././././././././.]
(...)
.. code:: text
# we can bind to core
mpirun -np 40 --bind-to core --report-bindings /bin/true 2>&1 | sed -e 's/.*rank \([[:digit:]]*\) /rank \1 /' -e 's/bound.*://' | sort -n -k2 | sed -e 's/ \([[:digit:]]\) / \1 /'
rank 0 [B/././././././././.][./././././././././.][./././././././././.][./././././././././.]
rank 1 [./././././././././.][B/././././././././.][./././././././././.][./././././././././.]
(...)
.. code:: text
# we can bind to core + add optimization for nearest-neighbour comms (put neighbouring ranks on the same socket)
mpirun -np 40 --bind-to core -map-by slot:PE=1 --report-bindings /bin/true 2>&1 | sed -e 's/.*rank \([[:digit:]]*\) /rank \1 /' -e 's/bound.*://' | sort -n -k2 | sed -e 's/ \([[:digit:]]\) / \1 /'
rank 0 [B/././././././././.][./././././././././.][./././././././././.][./././././././././.]
rank 1 [./B/./././././././.][./././././././././.][./././././././././.][./././././././././.]
.. code:: text
# -----------------------------------------------------------
# case 2: 1 node, nb of ranks < number of cores (hybrid code)
# -----------------------------------------------------------
beyond08: ~ > mpirun -np 12 -map-by slot:PE=2 --report-bindings /bin/true 2>&1 | sort -n -k 4
[beyond08.iap.fr:34077] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././././././.][./././././././././.][./././././././././.][./././././././././.]
[beyond08.iap.fr:34077] MCW rank 1 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./././././.][./././././././././.][./././././././././.][./././././././././.]
[beyond08.iap.fr:34077] MCW rank 2 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B/./././.][./././././././././.][./././././././././.][./././././././././.]
.. code:: text
beyond08: ~ > mpirun -np 12 -map-by socket:PE=2 --report-bindings /bin/true 2>&1 | sort -n -k 4
[beyond08.iap.fr:34093] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././././././.][./././././././././.][./././././././././.][./././././././././.]
[beyond08.iap.fr:34093] MCW rank 1 bound to socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././././././.][B/B/./././././././.][./././././././././.][./././././././././.]
[beyond08.iap.fr:34093] MCW rank 2 bound to socket 2[core 20[hwt 0]], socket 2[core 21[hwt 0]]: [./././././././././.][./././././././././.][B/B/./././././././.][./././././././././.]
.. code:: text
beyond08: ~ > mpirun -np 12 -map-by socket:PE=2 --rank-by core --report-bindings /bin/true 2>&1 | sort -n -k 4
[beyond08.iap.fr:34108] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././././././.][./././././././././.][./././././././././.][./././././././././.]
[beyond08.iap.fr:34108] MCW rank 1 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./././././.][./././././././././.][./././././././././.][./././././././././.]
[beyond08.iap.fr:34108] MCW rank 2 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B/./././.][./././././././././.][./././././././././.][./././././././././.]
[beyond08.iap.fr:34108] MCW rank 3 bound to socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././././././.][B/B/./././././././.][./././././././././.][./././././././././.]
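The bracketed masks printed by ``--report-bindings`` can be read mechanically: each ``[...]`` group is one socket, each ``/``-separated slot is one core, and ``B`` marks a core the rank is bound to. The following is a minimal Python sketch for checking a layout; the helper name ``parse_binding_mask`` is hypothetical, and it assumes one hardware thread per core, as in the outputs above.

```python
import re

def parse_binding_mask(mask):
    """Return the (socket, core_in_socket) pairs bound in a mask such as
    '[B/B/./.][./././.]' (one hardware thread per core is assumed)."""
    bound = []
    for socket, group in enumerate(re.findall(r"\[([^\]]*)\]", mask)):
        for core, slot in enumerate(group.split("/")):
            if "B" in slot:
                bound.append((socket, core))
    return bound

# Rank 1 of the "-map-by socket:PE=2" run shown above.
mask = ("[./././././././././.][B/B/./././././././.]"
        "[./././././././././.][./././././././././.]")
print(parse_binding_mask(mask))  # [(1, 0), (1, 1)]
```

A rank whose pairs all share the same socket index sits wholly on one CPU socket.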
Fighting the shared node curse
------------------------------
Horizon compute nodes are each made of one motherboard with 4 CPUs
mounted on it. Every CPU can transparently access all the resources of
the node. Unfortunately each memory bank is physically attached to a
preferred CPU. For a typical node with 512 GB of RAM,
each CPU gets 128 GB. If one of the CPUs needs access to physical RAM
hosted by another CPU, then the latency is significantly higher.
The Linux kernel wants to minimize this kind of problem so it will try
hard to relocate the processes so that memory access is not
delocalised, kicking out at the same time any computations already in
progress on that CPU. As a result, computations residing on one CPU
can affect computations on another CPU.
The situation can be even worse if two computations are sharing the same
CPU (each CPU holds N cores, 8 < N < 14). In that case the
computations are fighting for CPU and memory resources. For pure
computation that is generally less of a problem, but this case is not so
frequent on computers designed to handle the analysis of large N-body
simulations.
To summarise, unless you check that your computations sit wholly on one
CPU socket, and allocate resources accordingly, you may suffer catastrophic
performance degradation (I have experienced a factor of 10 at least a few times).
There are ways of avoiding this problem:
- Check the number of cores available on the compute nodes and try your
  best to allocate a single CPU socket. For example, the beyond40cores
  queue is composed of nodes with 4 CPUs of 10 cores each. You should then
  ask PBS for "-l nodes=1:beyond40cores:ppn=10", which will give you 10
  cores, i.e. a whole CPU socket.
- Keep in mind that if you need 256 GB, then in practice you should use
  2 CPU sockets. So allocate 2 N cores (in the previous example, we
  would need 20 cores, even if in the end only one CPU is doing
  computation).
- Use numactl to inspect and enforce the resource allocation. For
  example, typing "numactl -H" on beyond08 gives the following:
.. code:: text
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9
node 0 size: 131039 MB
node 0 free: 605 MB
node 1 cpus: 10 11 12 13 14 15 16 17 18 19
node 1 size: 131072 MB
node 1 free: 99 MB
node 2 cpus: 20 21 22 23 24 25 26 27 28 29
node 2 size: 131072 MB
node 2 free: 103 MB
node 3 cpus: 30 31 32 33 34 35 36 37 38 39
node 3 size: 131072 MB
node 3 free: 108 MB
node distances:
node 0 1 2 3
0: 10 21 30 21
1: 21 10 21 30
2: 30 21 10 21
3: 21 30 21 10
It states that the compute node is composed of 4 "nodes" (=CPU sockets
here). The logical CPUs assigned to each physical CPU are given by "node X
cpus". The first line indicates that the Linux kernel logical CPUs "0 1 2
... 9" are assigned to the physical CPU 0. At the same time the node 0
has "node 0 size" RAM physically attached. The amount of free RAM on
this node is shown by "node 0 free". Finally there is a node distance
matrix. It tells the user how far the nodes are from each other in terms
of communication speed. It can be seen that there may be up to a factor
3 penalty for communication between node 0 and node 2.
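The reading of the ``numactl -H`` output described above can be automated. The sketch below is only an illustration in Python: the helper names ``parse_numactl`` and ``cores_to_request`` are hypothetical, the sample text is an abridged copy of the output above, and the per-socket memory is rounded to its nominal value (128 GB) as an assumption.

```python
import math
import re

# Abridged copy of the "numactl -H" output shown above.
SAMPLE = """\
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9
node 0 size: 131039 MB
node 1 cpus: 10 11 12 13 14 15 16 17 18 19
node 1 size: 131072 MB
node 2 cpus: 20 21 22 23 24 25 26 27 28 29
node 2 size: 131072 MB
node 3 cpus: 30 31 32 33 34 35 36 37 38 39
node 3 size: 131072 MB
node distances:
node 0 1 2 3
0: 10 21 30 21
1: 21 10 21 30
2: 30 21 10 21
3: 21 30 21 10
"""

def parse_numactl(text):
    """Extract logical CPUs, memory size (MB) and distances per NUMA node."""
    cpus, size_mb, dist = {}, {}, {}
    for line in text.splitlines():
        m = re.match(r"node (\d+) cpus:(.*)", line)
        if m:
            cpus[int(m.group(1))] = [int(c) for c in m.group(2).split()]
            continue
        m = re.match(r"node (\d+) size: (\d+) MB", line)
        if m:
            size_mb[int(m.group(1))] = int(m.group(2))
            continue
        m = re.match(r"\s*(\d+):((?:\s+\d+)+)\s*$", line)
        if m:
            dist[int(m.group(1))] = [int(d) for d in m.group(2).split()]
    return cpus, size_mb, dist

def cores_to_request(mem_gb, cpus, size_mb):
    """Cores to ask for so that whole sockets cover mem_gb of RAM."""
    per_socket_gb = round(min(size_mb.values()) / 1024)  # nominal size (assumption)
    per_socket_cores = min(len(c) for c in cpus.values())
    sockets = max(1, math.ceil(mem_gb / per_socket_gb))
    return sockets * per_socket_cores

cpus, size_mb, dist = parse_numactl(SAMPLE)
penalty = max(dist[0]) / dist[0][0]          # worst remote / local distance
print(penalty)                               # 3.0
print(cores_to_request(256, cpus, size_mb))  # 20
```

The two printed values match the statements above: a factor 3 between node 0 and node 2, and 20 cores (two sockets) for a 256 GB job.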
Scratch space
-------------

View file

@ -0,0 +1,223 @@
.. _imperial_rcs:
Imperial RCS
============
This page contains notes on how to compile and run |a| (and extensions) on `Imperial Research Computing Services <https://www.imperial.ac.uk/admin-services/ict/self-service/research-support/rcs/>`_.
.. _gain_access_to_imperial_rcs:
Gain access to Imperial RCS
---------------------------
See `this page <https://www.imperial.ac.uk/admin-services/ict/self-service/research-support/rcs/support/getting-started/>`__.
.. _copy_configuration_files:
Copy configuration files
------------------------
Copy the pre-prepared configuration files into your home directory by cloning:
.. code:: bash
cd ~/
git clone git@bitbucket.org:florent-leclercq/imperialrcs_config.git .bashrc_repo
and typing:
.. code:: bash
cd .bashrc_repo/
bash create_symlinks.bash
source ~/.bashrc
Load compiler and dependencies
------------------------------
Load the following modules (in this order, and **only these** to avoid
conflicts):
.. code:: bash
module purge
module load gcc/8.2.0 git/2.14.3 cmake/3.14.0 intel-suite/2019.4 mpi anaconda3/personal
You can check that no other module is loaded using:
.. code:: bash
module list
.. _prepare_conda_environment:
Prepare conda environment
-------------------------
If it's your first time loading anaconda you will need to run (see `this page <https://www.imperial.ac.uk/admin-services/ict/self-service/research-support/rcs/support/applications/conda/>`__):
.. code:: bash
anaconda-setup
In any case, start from a clean conda environment (with only numpy) to
avoid conflicts between compilers. To do so:
.. code:: bash
conda create -n pyborg numpy
conda activate pyborg
.. _clone_ares_and_additional_packages:
Clone ARES and additional packages
----------------------------------
Clone the repository and additional packages as usual (see :ref:`ARES Building <Building>`):
.. code:: bash
mkdir ~/codes
cd ~/codes
git clone --recursive git@bitbucket.org:bayesian_lss_team/ares.git
cd ares
bash get-aquila-modules.sh --clone
If a particular release or development branch is desired, these
additional lines (for example) must be run:
.. code:: bash
git checkout develop/2.1
bash get-aquila-modules.sh --branch-set develop/2.1
Note that 'git branch' should not be used. Once this is done, check
whether the repository has been properly cloned and whether the
submodules are all on the correct branch (and in a clean state). To do
so, run:
.. code:: bash
bash get-aquila-modules.sh --status
The output will describe whether the cloned modules are able to link to
the original repository.
If the status of the root repository or of a submodule is not fine (for
example, the error could be in cosmotool), try:
.. code:: bash
git submodule update
and check the status of the modules again.
.. _compile_ares:
Compile ARES
------------
Run the ARES build script using:
.. code:: bash
bash build.sh --with-mpi --c-compiler icc --cxx-compiler icpc --python
(for other possible flags, such as the flag to compile BORG python, type
``bash build.sh -h``). Note: for releases <= 2.0, a fortran compiler was
necessary: add ``--f-compiler ifort`` to the line above. One may have to
predownload dependencies for ares: for this, add the
::
--download-deps
flag on the first use of build.sh, and add
::
--use-predownload
on the second (which will then build ares).
Then compile:
.. code:: bash
cd build
make
The 'make' command can be sped up by specifying the number of parallel
jobs, N, used for the compilation:
.. code:: bash
cd build
make -j N
.. _run_ares_example_with_batch_script:
Run ARES example with batch script
----------------------------------
The following batch script (``job_example.bash``) runs the example using
mixed MPI/OpenMP parallelization (2 nodes, 32 cores per node = 16 MPI
processes x 2 OpenMP threads each). Check `this
page <https://www.imperial.ac.uk/admin-services/ict/self-service/research-support/rcs/computing/job-sizing-guidance/>`__
for job sizing on Imperial RCS.
.. code:: bash
#!/bin/bash
# request bash as shell for job
#PBS -S /bin/bash
# queue, parallel environment and number of processors
#PBS -l select=2:ncpus=32:mem=64gb:mpiprocs=16:ompthreads=2
#PBS -l walltime=24:00:00
# joins error and standard outputs
#PBS -j oe
# keep error and standard outputs on the execution host
#PBS -k oe
# forward environment variables
#PBS -V
# define job name
#PBS -N ARES_EXAMPLE
# main commands here
module load gcc/8.2.0 intel-suite/2019.4 mpi
cd ~/codes/ares/examples/
mpiexec ~/codes/ares/build/src/ares3 INIT 2mpp_ares.ini
exit
As per `Imperial
guidance <https://www.imperial.ac.uk/admin-services/ict/self-service/research-support/rcs/computing/high-throughput-computing/configuring-mpi-jobs/>`__,
do not provide any arguments to ``mpiexec`` other than the name of the
program to run.
Submit the job via ``qsub job_example.bash``. The outputs will appear in
``~/codes/ares/examples``.
.. _select_resources_for_more_advanced_runs:
Select resources for more advanced runs
---------------------------------------
The key line in the submission script is
.. code:: bash
#PBS -lselect=N:ncpus=Y:mem=Z:mpiprocs=P:ompthreads=W
to select N nodes of Y cores each (i.e. NxY cores will be allocated to
your job). On each node there will be P MPI ranks and each will be
configured to run W threads. You must have PxW<=Y (PxW=Y in all
practical situations). Using W=2 usually makes sense since most nodes
have hyperthreading (2 logical cores per physical core).
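The constraint PxW<=Y is easy to get wrong when editing a submission script by hand. As an illustration only, the following Python sketch builds the ``select`` line and refuses inconsistent combinations; the helper name ``pbs_select`` is hypothetical and not part of any PBS tool.

```python
def pbs_select(nodes, ncpus, mem, mpiprocs, ompthreads):
    """Build a '#PBS -l select=...' line, enforcing mpiprocs*ompthreads <= ncpus."""
    if mpiprocs * ompthreads > ncpus:
        raise ValueError(
            f"{mpiprocs} MPI ranks x {ompthreads} threads exceed {ncpus} cores per node"
        )
    return (f"#PBS -l select={nodes}:ncpus={ncpus}:mem={mem}"
            f":mpiprocs={mpiprocs}:ompthreads={ompthreads}")

# Same resources as in job_example.bash above.
print(pbs_select(2, 32, "64gb", 16, 2))
```

With these arguments the function reproduces the line used in ``job_example.bash``.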

View file

@ -0,0 +1,89 @@
.. _occigen:
Occigen
=======
Occigen is a CINES-managed supercomputer in France. You need a time
allocation to use it; see https://www.edari.fr
Module setup
------------
Compile with Intel
~~~~~~~~~~~~~~~~~~
.. code:: bash
module purge
module load gcc/8.3.0
module load intel/19.4
# WARNING: openmpi 2.0.4 has a bug with multithreading that causes hangs
module load openmpi-intel-mt/2.0.2
module load intelpython3/2019.3
export OMPI_CC=$(which icc)
export OMPI_CXX=$(which icpc)
Then run:
.. code:: bash
bash build.sh --use-predownload --no-debug-log --perf --native --c-compiler icc --cxx-compiler icpc --f-compiler ifort --with-mpi --build-dir $SCRATCHDIR/ares-build-icc --cmake $HOME/.local/bin/cmake
Compile with gcc
~~~~~~~~~~~~~~~~
.. code:: bash
module purge
module load gcc/8.3.0
# WARNING: openmpi 2.0.4 has a bug with multithreading that causes hangs
module load openmpi/gnu-mt/2.0.2
module load intelpython3/2019.3
export OMPI_CC=$(which gcc)
export OMPI_CXX=$(which g++)
Prerequisite
~~~~~~~~~~~~
Download cmake >= 3.10.
.. code:: bash
wget https://github.com/Kitware/CMake/releases/download/v3.15.5/cmake-3.15.5.tar.gz
Be sure the above modules are loaded and then compile:
.. code:: bash
# unpack the archive downloaded above before entering the source tree
tar -xzf cmake-3.15.5.tar.gz
cd cmake-3.15.5
./configure --prefix=$HOME/.local
nice make
make install
On your laptop run:
.. code:: bash
bash build.sh --download-deps
scp -r downloads occigen:${ARES_ROOT_ON_OCCIGEN}
Build
-----
.. _with_intel:
With intel
~~~~~~~~~~
.. code:: bash
bash build.sh --use-predownload --no-debug-log --perf --native --c-compiler icc --cxx-compiler icpc --f-compiler ifort --with-mpi --build-dir $SCRATCHDIR/ares-build-icc --cmake $HOME/.local/bin/cmake
.. _with_gcc:
With gcc
~~~~~~~~
.. code:: bash
bash build.sh --use-predownload --no-debug-log --perf --native --c-compiler gcc --cxx-compiler g++ --f-compiler gfortran --with-mpi --build-dir $SCRATCHDIR/ares-build-gcc --cmake $HOME/.local/bin/cmake

View file

@ -0,0 +1,80 @@
.. _snic:
SNIC
====
These instructions are for building on Tetralith; other systems may
require variations.
Building at SNIC
----------------
Overview
~~~~~~~~
#. Ask for time
#. Load modules
#. Git clone the repo and get submodules
#. Use build.sh to build
#. Compile the code
#. Cancel remaining time
Detailed Instructions
~~~~~~~~~~~~~~~~~~~~~
1) ::
interactive -N1 --exclusive -t 2:00:00
2) ::
module load git
module load buildenv-gcc/2018a-eb
module load CMake/3.15.2
3) See instructions above
4) ::
bash build.sh --with-mpi --cmake /software/sse/manual/CMake/3.15.2/bin/cmake --c-compiler /software/sse/manual/gcc/8.3.0/nsc1/bin/gcc --cxx-compiler /software/sse/manual/gcc/8.3.0/nsc1/bin/g++ --debug
Note that these paths are NOT the ones from the buildenv (as loaded
before). These are "hidden" in the system and not listed by
"module avail". If you try to compile with the buildenv versions, the
compilation will fail (because those compilers are too old).
5) ::
cd build
make -j
6) Find the jobID in the output of ``squeue -u YOUR_USERNAME``, then cancel the job:
::
scancel JOBID
Running on Tetralith
--------------------
Use the following template:
.. code:: text
#!/bin/bash
####################################
#     ARIS slurm script template   #
#                                  #
# Submit script: sbatch filename   #
#                                  #
####################################
#SBATCH -J NAME_OF_JOB
#SBATCH -t HH:MM:SS
#SBATCH -n NUMBER_OF_MPI_TASKS
#SBATCH -c NUMBER_OF_CORES_PER_TASK (Max is 32)
#SBATCH --output=log.%j.out # Stdout (%j expands to jobId) (KEEP AS IS)
#SBATCH --error=error.%j.err # Stderr (%j expands to jobId) (KEEP AS IS)
#SBATCH --account=PROJECT-ID
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   ## you have to explicitly set this
mpprun ./PATH/TO/HADES3 INIT_OR_RESUME /PATH/TO/CONFIG/FILE.INI

View file

@ -0,0 +1,9 @@
Extra modules
#############
.. _extras:
.. include:: extras/dm_sheet.inc.rst
.. include:: extras/hmclet.inc.rst
.. include:: extras/virbius.inc.rst
.. include:: extras/python.inc.rst

View file

@ -0,0 +1,52 @@
dm_sheet
========
This is a module for ARES/HADES/BORG.
It adds the algorithms **dm_sheet** to compute cosmological fields from
the dark matter phase-space sheet (in particular, density and velocity
fields from tetrahedra formalism).
``borg_forward`` supports the use of dm_sheet when it is available.
Setup
-----
To use this module, clone `the repository <https://bitbucket.org/bayesian_lss_team/dm_sheet/>`_ in $ARES_ROOT/extra/ (where $ARES_ROOT
represents the root source directory of ARES on your computer).
For example, you can do:
.. code:: bash
cd $ARES_ROOT/extra
git clone git@bitbucket.org:/bayesian_lss_team/dm_sheet.git dm_sheet
and :ref:`rebuild <building>`.
Use
---
To use dm_sheet in ``borg_forward``, use the flag ``--dmsheet``. New
fields are then added to the :ref:`output files<outputs>`.
Contributors
------------
The main authors of this module are:
- Florent Leclercq
- Guilhem Lavaux
To add more features, please contact these people, or submit pull
requests.
Additional contributions from:
- James Prideaux-Ghee
References
----------
- T. Abel, O. Hahn, R. Kaehler (2012), Tracing the Dark Matter Sheet in Phase Space, arXiv:1111.3944
- O. Hahn, R. Angulo, T. Abel (2015), The Properties of Cosmic Velocity Fields, arXiv:1404.2280
- F. Leclercq, J. Jasche, G. Lavaux, B. Wandelt, W. Percival (2017), The phase-space structure of nearby dark matter as constrained by the SDSS, arXiv:1601.00093

View file

@ -0,0 +1,109 @@
hmclet
======
Guilhem has developed a much smaller variant of the Hamiltonian Monte
Carlo (HMC) algorithm to jointly sample a limited set of parameters
(typically fewer than 100).
This is **HMCLET**: a small extra HMC framework for |a| that allows sampling a handful of model parameters
together. It provides a self-calibration step to estimate the masses for
the HMC.
Setup
-----
The code is available in the "hmclet" module. To use it, clone this
repository into extra/hmclet in the ARES source tree. For example, you
can do:
.. code:: bash
cd $ARES_SOURCE/extra
git clone https://bitbucket.org/bayesian_lss_team/hmclet.git hmclet
Once it is checked out you can move to the build directory and run
``cmake .``, then ``make`` and you will have the new module compiled.
You can run ``libLSS/tests/test_hmclet`` to check that no error is
triggered and verify the content of "test_sample.h5". It must contain a
chain with 2 parameters: the first one oscillates around 1 with
a variance of 10, and the other oscillates around 4 with a variance of
2.
Use
---
The Little HMC (HMClet, like Applet) framework consists of two classes
in the namespace ``LibLSS::HMCLet``:
- JointPosterior, which is the one acting like a parent to your class
describing the log-posterior,
- SimpleSampler, which is using an instance of JointPosterior to
generate samples using the HMC algorithm.
There is a demonstration (and test case) available in
libLSS/tests/test_hmclet.cpp; please have a look at it.
To use SimpleSampler you have to derive a new class from
JointPosterior and implement three functions:
- ``getNumberOfParameters()`` which returns an integer corresponding to
the number of parameters supported by your posterior
- ``evaluate(parameters)`` which returns the negative of the
  log-posterior (i.e. something like chi2/2)
- ``adjointGradient(parameters, adjoint_gradient)`` which fills the
adjoint gradient vector corresponding to the given parameters.
An example is as follows:
.. code:: cpp
class MyPosterior: virtual public JointPosterior {
public:
  /* Constructor and destructor omitted for brevity */

  // Number of parameters sampled by this posterior
  virtual size_t getNumberOfParameters() const {
    return 1;
  }

  // Negative log-posterior: Gaussian with mean 1 and variance 10
  virtual double evaluate(VectorType const& params) {
    return 0.5 * square(params[0]-1)/10.;
  }

  // Gradient of evaluate() with respect to the parameters
  virtual void adjointGradient(VectorType const& params, VectorType& params_gradient) {
    params_gradient[0] = (params[0]-1)/10.;
  }
};
The above posterior will represent a Gaussian distribution centered on
one, with a variance of 10. It depends on a single parameter.
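When writing your own posterior, it is worth checking that ``adjointGradient`` is really the derivative of ``evaluate``. The following is a Python transcription of the example above (not the C++ API itself) that compares the analytic gradient with a central finite difference; the function names are illustrative.

```python
def evaluate(params):
    """Negative log-posterior of the example: Gaussian, mean 1, variance 10."""
    return 0.5 * (params[0] - 1.0) ** 2 / 10.0

def adjoint_gradient(params):
    """Analytic gradient of evaluate(), as in the C++ example."""
    return [(params[0] - 1.0) / 10.0]

def finite_difference(f, params, i, eps=1e-6):
    """Central finite difference of f with respect to parameter i."""
    up = list(params)
    down = list(params)
    up[i] += eps
    down[i] -= eps
    return (f(up) - f(down)) / (2.0 * eps)

for x in (-3.0, 0.0, 1.0, 7.5):
    analytic = adjoint_gradient([x])[0]
    numeric = finite_difference(evaluate, [x], 0)
    print(x, analytic, numeric)
```

The two columns should agree to within the finite-difference error.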
The sampling would occur like this:
.. code:: cpp
auto posterior = std::make_shared<MyPosterior>();
SimpleSampler sampler(posterior);
/* Calibrate the mass matrix.
* comm: MPI communication
* rgen: Random number generator
* steps: number of steps to attempt for calibration
* init_params: initial parameters to start calibration
* init_step: typical step size to start with
*/
sampler.calibrate(comm, rgen, steps, init_params, init_step);
/* Generate a sample with HMC
* comm: MPI communication
* rgen: Random number generator
* params: current parameter state
*/
sampler.newSample(comm, rgen, init_params);
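To make the role of the gradient and of the mass concrete, here is a minimal one-parameter HMC loop in plain Python for the same Gaussian posterior. It is a sketch under stated assumptions, not the HMCLET implementation: the mass is fixed to 1 instead of being calibrated, and the step size and number of leapfrog steps are chosen by hand so that one trajectory covers roughly a quarter of an oscillation period.

```python
import math
import random

def neg_log_posterior(x):
    return 0.5 * (x - 1.0) ** 2 / 10.0

def gradient(x):
    return (x - 1.0) / 10.0

def hmc_step(x, rng, step=0.5, n_leapfrog=10, mass=1.0):
    """One HMC transition: draw a momentum, integrate, accept or reject."""
    p = rng.gauss(0.0, math.sqrt(mass))
    h_old = neg_log_posterior(x) + 0.5 * p * p / mass
    x_new, p_new = x, p
    # Leapfrog integration of Hamilton's equations
    p_new -= 0.5 * step * gradient(x_new)
    for i in range(n_leapfrog):
        x_new += step * p_new / mass
        if i < n_leapfrog - 1:
            p_new -= step * gradient(x_new)
    p_new -= 0.5 * step * gradient(x_new)
    h_new = neg_log_posterior(x_new) + 0.5 * p_new * p_new / mass
    # Metropolis acceptance on the change of total energy
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return x_new
    return x

rng = random.Random(1234)
x = 0.0
chain = []
for _ in range(4000):
    x = hmc_step(x, rng)
    chain.append(x)

mean = sum(chain) / len(chain)
var = sum((c - mean) ** 2 for c in chain) / len(chain)
print(mean, var)  # should be close to 1 and 10
```

A poorly chosen mass or trajectory length makes the chain mix slowly, which is the motivation for the calibration step provided by the sampler.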
Contributors
------------
- Guilhem Lavaux
- Jens Jasche
You can submit pull requests to the BLSS team admin.

File diff suppressed because one or more lines are too long

View file

@ -0,0 +1,591 @@
Python
======
This page presents the features of the ARES/BORG Python module.
Installation
------------
``bash get-aquila-modules.sh --clone`` automatically retrieves the
module.
Use the ``--python`` flag in ``build.sh`` (see :ref:`building <building>`). The
python package installation is automatic if you run ``make install``: at the end
of the make phase, a python module is installed in the user site-packages
directory and made available to the Python interpreter. If you also need to run
hades with a likelihood defined in python (see :ref:`how to write a likelihood in python
<building_python_likelihood_script>`), then you also need to append
``--hades-python`` when executing ``build.sh``. This requirement will probably
go away later.
.. note::
If you compile with MPI support the Python binding interface
will look for the MPI4PY package. If it is not found, it will just
proceed as usual. However, if it is found, the MPI4PY must have been
compiled with the *same* MPI framework as ARES/BORG. Not doing so will
very likely result in a segmentation fault when importing borg. A
successful import will look like the following:
.. code:: python3
>>> import borg
Initializing console.
[INFO ] libLSS version v2.0.0alpha-47-g7d560cc built-in modules ares_fg;borg;dm_sheet;hades;hmclet;python
[INFO S ] Registered forward models:
[INFO S ] - 2LPT_CIC
[INFO S ] - 2LPT_CIC_OPENMP
[INFO S ] - 2LPT_DOUBLE
[INFO S ] - 2LPT_NGP
[INFO S ] - Downgrade
[INFO S ] - EnforceMass
[INFO S ] - HADES_LOG
[INFO S ] - HADES_PT
[INFO S ] - Haar
[INFO S ] - LPT_CIC
[INFO S ] - LPT_CIC_OPENMP
[INFO S ] - LPT_DOUBLE
[INFO S ] - LPT_NGP
[INFO S ] - PATCH_MODEL
[INFO S ] - PM_CIC
[INFO S ] - PM_CIC_OPENMP
[INFO S ] - PRIMORDIAL
[INFO S ] - PRIMORDIAL_FNL
[INFO S ] - Softplus
[INFO S ] - TRANSFER_EHU
[INFO S ] - Transfer
[INFO S ] - Upgrade
[INFO S ] - bias::BrokenPowerLaw
[INFO S ] - bias::DoubleBrokenPowerLaw
[INFO S ] - bias::EFT
[INFO S ] - bias::EFT_Thresh
[INFO S ] - bias::Linear
[INFO S ] - bias::ManyPower_1^1
[INFO S ] - bias::ManyPower_1^2
[INFO S ] - bias::ManyPower_1^4
[INFO S ] - bias::ManyPower_2^2
[INFO S ] - bias::Noop
[INFO S ] - bias::PowerLaw
[INFO ] Found MPI4PY.
[INFO ] CPU features: MMX [!AVX] [!AVX2] SSE SSE2 [!SSE3] [!SSE4.1] [!SSE4.2]
>>>
As you can see there is a line "Found MPI4PY".
Usage
-----
First step:
.. code:: python
import borg
# This retrieves the console management object
console = borg.console()
# This prints at the STD level
console.print_std("Hello!")
# Reduce verbosity
console.setVerboseLevel(2)
.. _building_your_first_chain:
Building your first chain
-------------------------
The BORG python pipeline closely follows the BORGForwardModel v2 API.
This means that the input is assumed to be Gaussian random numbers with
unit variance in Fourier space. Fortunately the generation of such
numbers is easy:
.. code:: python3
import numpy as np
# Define a physical box (optional for this step, but it will be useful later)
box = borg.forward.BoxModel()
box.L = (200,200,200)
box.N = (64,64,64)
# Generate gaussian random numbers, Fourier transform them, and rescale to ensure unit-variance
ic = np.fft.rfftn(np.random.randn(*box.N))/box.N[0]**(1.5)
In the above code snippet we have also defined a BORG box, which is at
the moment limited to 3d. ``box.L`` is the physical size (in Mpc/h) of
the box in each direction, while ``box.N`` is the grid size. In the
above you see that the Fourier transformed density has been rescaled by
:math:`1/\sqrt{N^3}` (written ``box.N[0]**(1.5)``, which is valid for a
cubic grid). The unnormalised discrete Fourier transform sums
:math:`N^3` unit-variance numbers, so each mode has variance
:math:`N^3`; the rescaling restores unit variance in the Fourier
representation.
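The normalisation can be checked numerically with NumPy alone. The sketch below does not use ``borg``; the grid size and seed are arbitrary choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
N = (32, 32, 32)

# White noise with unit variance in real space
white = rng.standard_normal(N)

# Unnormalised FFT, rescaled by 1/sqrt(N^3); np.prod(N) also covers
# non-cubic grids
ic = np.fft.rfftn(white) / np.sqrt(np.prod(N))

# Average power per Fourier mode, expected to be close to 1
mean_power = np.mean(np.abs(ic) ** 2)
print(ic.shape, mean_power)
```

Without the rescaling the average power would be close to :math:`N^3` instead of 1.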
Now we need to create a new chain object:
.. code:: python3
chain = borg.forward.ChainForwardModel(box)
chain.addModel(borg.forward.models.HermiticEnforcer(box))
We have immediately added an element that enforces the Hermitian
symmetry of the input Fourier density field, i.e. each mode at
:math:`-k` is the complex conjugate of the mode at :math:`k`. This is
not strictly required here as ``ic`` was generated by ``np.fft.rfftn``.
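The symmetry in question can be illustrated with NumPy alone (no ``borg`` call is involved): the full Fourier transform of a real field satisfies :math:`F(-k) = F(k)^*`, whereas an arbitrary complex array does not.

```python
import numpy as np

def is_hermitian(fourier):
    """Check F(-k) == conj(F(k)) on a full (not half-plane) Fourier grid."""
    # Reversing every axis and rolling by one maps index i to (-i) mod N
    mirrored = np.roll(fourier[::-1, ::-1, ::-1], shift=1, axis=(0, 1, 2))
    return bool(np.allclose(fourier, np.conj(mirrored)))

rng = np.random.default_rng(0)
real_field = rng.standard_normal((8, 8, 8))

from_real = np.fft.fftn(real_field)
arbitrary = rng.standard_normal((8, 8, 8)) + 1j * rng.standard_normal((8, 8, 8))

print(is_hermitian(from_real))   # True
print(is_hermitian(arbitrary))   # False
```

Fourier modes that are drawn directly, rather than obtained from a real field, generally need this symmetry to be imposed.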
Our first real element of the chain is the injection of primordial
gravitational potential fluctuations:
.. code:: python3
chain.addModel(borg.forward.models.Primordial(box, 0.1))
This multiplies the input density in Fourier space by a function
:math:`A(k) \propto -k^{n_S/2-2}`. The exact constant of proportionality
depends on :math:`\sigma_8` (or :math:`A_S`), the volume and the Hubble
constant. Note the ``0.1`` which indicates the scale factor at which the
potential is seeded in the chain. The next elements depend on that
number.
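To visualise the shape of this function, here is a NumPy-only sketch that builds the wavenumber grid matching the ``rfftn`` layout and evaluates :math:`-k^{n_S/2-2}`. It is not the ``borg`` implementation: the constant of proportionality is set to 1, the value ``n_s=0.96`` is an assumed illustrative value, the ``k=0`` mode is set to zero, and both function names are hypothetical.

```python
import numpy as np

def k_norm_rfft(N, L):
    """|k| on the grid produced by np.fft.rfftn for a box of size L."""
    kx = 2 * np.pi * np.fft.fftfreq(N[0], d=L[0] / N[0])
    ky = 2 * np.pi * np.fft.fftfreq(N[1], d=L[1] / N[1])
    kz = 2 * np.pi * np.fft.rfftfreq(N[2], d=L[2] / N[2])
    return np.sqrt(kx[:, None, None] ** 2
                   + ky[None, :, None] ** 2
                   + kz[None, None, :] ** 2)

def primordial_shape(k, n_s=0.96):
    """-k^(n_s/2 - 2) with unit proportionality constant; zero at k = 0."""
    amplitude = np.zeros_like(k)
    nonzero = k > 0
    amplitude[nonzero] = -k[nonzero] ** (0.5 * n_s - 2.0)
    return amplitude

k = k_norm_rfft((16, 16, 16), (200.0, 200.0, 200.0))
A = primordial_shape(k)
print(A.shape, k[1, 0, 0], A[1, 0, 0])
```

The negative exponent means that the largest scales (smallest ``k``) receive the largest amplitude.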
The next element is to add a physical transfer function to produce
density fluctuations out of this gravitational potential:
.. code:: python3
chain.addModel(borg.forward.models.EisensteinHu(box))
This is a simple Eisenstein & Hu power spectrum, which does not change
the scale factor of the universe.
Now we need to add a real gravity solver. One simple solver is provided
by "BorgLpt" (BORG 1-Lagrangian Perturbation Theory, also known as
Zel'dovich approximation).
.. code:: python3
lpt = borg.forward.models.BorgLpt(box=box, box_out=box, ai=0.1, af=1.0, supersampling=4)
chain.addModel(lpt)
(**Question from Andrija**: What does the supersampling param control?
The ai and af look intuitive enough, for initial scale factor and final
one essentially controlling the time, but supersampling I don't
understand. Also doing help(borg.forward.models.BorgLpt) didn't help me
much in understanding)
In the above case we keep the object ``lpt`` in the current scope to be
able to access more internal state later.
We can now setup the cosmology:
.. code:: python3
cosmo_par = borg.cosmo.CosmologicalParameters()
cosmo_par.default()
print(repr(cosmo_par))
chain.setCosmoParams(cosmo_par)
We have used some sane defaults for the cosmology in the above. The
values of the parameters are printed using the print statement. All the
elements of the chain are being updated with the last statement. They
try to do this "lazily", i.e. if the cosmology has not changed nothing
will happen (as updating the internal cached state may be very costly).
The model is run with ``chain.forwardModel_v2(ic)``, which goes through
the entire chain. The final density field is not yet produced. To do
this we need to request it explicitly:
.. code:: python3
rho = np.empty(chain.getOutputBoxModel().N)
chain.getDensityFinal(rho)
``rho`` now holds the density contrast of the simulation. In IPython, one
can check a slice using:
.. code:: python3
from matplotlib import pyplot as plt
plt.imshow(rho[:,:,chain.getOutputBoxModel().N[2]//2])
plt.show()
Computing the adjoint gradient
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The evaluation of the adjoint gradient follows the same pattern as for
the forward evaluation. Instead of the pair ``forwardModel_v2`` and
``getDensityFinal``, one must use ``adjointModel_v2`` and
``getAdjointModel``. However keep in mind the shapes of the arrays are
reversed: ``adjointModel_v2`` requires an array according to the output
of the forward model. Thus we have:
.. code:: python3
dlogL_drho = np.empty(chain.getOutputBoxModel().N)
# Here fill up dlogL_drho from the gradient of the likelihood
chain.adjointModel_v2(dlogL_drho)
ic = np.empty(chain.getBoxModel().N)
chain.getAdjointModel(ic)
Note also that we have requested the initial conditions in real
representation (and not Fourier). A Fourier representation may be
requested by providing an adequately sized complex array.
Computing the velocity field
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
BORG comes pre-bundled with velocity field estimators (along with their
adjoint gradients, of course). A very simple estimator is provided by the
CIC density estimator. It requires a particle based simulator to
estimate the velocity field from. Such particle based simulators are for
example BorgLpt, Borg2Lpt or BorgPM. If the types are not compatible, an
exception will be thrown.
The usage is simple, here is an example:
.. code:: python3
vmodel = borg.forward.velocity.CICModel(box, lpt)
out_v = np.empty((3,)+box.N)
vmodel.getVelocityField(out_v)
The first statement creates the velocity field estimator, with the
requested box to be produced and the particle based forward model
``lpt`` (same variable as in the :ref:`section "Building your first chain" <building_your_first_chain>`). The second statement
allocates the required memory. The last statement triggers the
computation. The above statements shall be run after executing
``forwardModel_v2`` on the ``chain`` object.
One can then show a slice (here of the x-component), and check its
compatibility with the density field:
.. code:: python3
plt.imshow(out_v[0,:,:,chain.getOutputBoxModel().N[2]//2])
plt.show()
Computing some bias models directly
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PyBORG has a submodule called "bias" which provides a direct route to
some of the bundled bias models (in C++ those are the generic bias
models). Not all models are linked in though. The usage is relatively
straightforward. We will use EFTBiasDefault as an example:
.. code:: python3
import numpy as np
import borg
boxm = borg.forward.BoxModel()
model = borg.forward.models.HadesLinear(boxm, 0.1, 1.0)
bias_model = borg.bias.EFTBiasDefault(0.1)
density = np.random.normal(size=boxm.N)
biased_density = np.zeros(boxm.N)
params = np.ones(7)
bias_model.compute(model, 1.0, params, density, biased_density)
The example starts by loading the ``borg`` module. Then we just
construct a forward model element for the example using ``HadesLinear``.
In your code that should be a reasonable element that you used to
produce the matter density field. The bias model may try to communicate
directly with that element, so it is good practice to provide a
meaningful one. Then we construct a bias model object
``EFTBiasDefault``. This one has a mandatory argument to specify the
``Lambda`` parameter in that specific model, which we set to
:math:`0.1h \mathrm{Mpc}^{-1}` here. The next steps are just
initialization of the field used for ``bias_model.compute``. As can be
directly inferred from the call the following arguments are required:
- a borg forward model (``model``)
- the value of nmean (though it could be ignored depending on the
specific bias model)
- a 1d numpy array of float64 for the parameters of the model
- the 3d density contrast (``density``)
- the output 3d biased density (``biased_density``)
Running with MPI
----------------
Using MPI requires some care that is not completely handled
automatically.
One may initialize Python with MPI like this:
.. code:: python3
import numpy as np
import borg
from mpi4py import MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
In rank and size you will now have the rank of the current process in
the MPI communicator, and size will hold the total size. Then a typical
initialization chain of forward models may be constructed as indicated
:ref:`there <building_your_first_chain>`. Assuming that chain is such an
object one may query the expected slabs with ``getMPISlice()``
(for the input) and ``getOutputMPISlice()`` (for the output):
.. code:: python3
startN0,localN0,in_N1,in_N2 = chain.getMPISlice()
out_startN0,out_localN0,out_N1,out_N2 = chain.getOutputMPISlice()
These may be used like this:
.. code:: python3
x = np.zeros((localN0,in_N1,in_N2))
if localN0 > 0:
    x[:,:,:] = ref_data[startN0:(startN0+localN0),:,:]
with ``ref_data`` some array that covers the entire box. As you can see,
the ``x`` array requires only the part between startN0 and
startN0+localN0 of that array. In practice that array (``ref_data``) may
not have to exist in memory.
Then ``x`` may be directly provided to ``chain.forwardModel_v2`` as a
first argument. The output density field follows the same rule as the
input density field.
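The slab logic can be pictured without MPI. The sketch below uses a hypothetical helper ``slab`` that splits the first axis as evenly as possible; this splitting rule is an assumption for illustration only, and in a real run the authoritative values are the ones returned by ``getMPISlice()``.

```python
def slab(N0, rank, size):
    """Illustrative even split of N0 planes over `size` ranks.

    Returns (startN0, localN0) for the given rank. This is NOT necessarily
    the partition used by BORG; always trust getMPISlice().
    """
    base, extra = divmod(N0, size)
    local = base + (1 if rank < extra else 0)
    start = rank * base + min(rank, extra)
    return start, local

N0, size = 10, 4
ref_data = list(range(N0))  # stands in for the planes of the full box

pieces = []
for rank in range(size):
    startN0, localN0 = slab(N0, rank, size)
    # Each rank only ever touches its own planes
    pieces.append(ref_data[startN0:startN0 + localN0])

print(pieces)
```

Concatenating the pieces in rank order gives back the full array. With more ranks than planes some ranks get ``localN0 == 0``, which is why the snippet above guards the copy with ``if localN0 > 0``.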
Writing a new forward model
---------------------------
The interface of the forward model in python closely follows the one in
C++. The basic skeleton is given by the following lines of code:
.. code:: python3
import jax
import borg

class MyModel(borg.forward.BaseForwardModel):
    # Constructor
    def __init__(self, box):
        super().__init__(box, box)

    # IO "preferences"
    def getPreferredInput(self):
        return borg.forward.PREFERRED_REAL

    def getPreferredOutput(self):
        return borg.forward.PREFERRED_REAL

    # Forward part
    def forwardModel_v2(self, input_array):
        self.save = jax.numpy.array(input_array)

    def getDensityFinal(self, output_array):
        output_array[:] = self.save**2

    # Adjoint part
    def adjointModel_v2(self, input_ag):
        self.ag = input_ag

    def getAdjointModel(self, output_ag):
        output_ag[:] = 2 * self.ag * self.save
There are four main groups of functions that need to be implemented:
- the constructor. It is crucial that the constructor of the parent is
  explicitly called. Otherwise the interface will not work. The parent
  constructor takes two arguments: the input box (of type
  ``borg.forward.BoxModel``) and the output box (same type).
- the functions providing the "preferred IO" for the forward and adjoint
  functions. In practice the preference is enforced for python. This
  means that the value indicated here will change the kind of arrays
  that are provided to the forward and adjoint part. At the moment two
  types of IO are possible:
  - PREFERRED_REAL: the model wants a 3d real space representation as
    an argument
  - PREFERRED_FOURIER: the model wants a 3d Fourier space
    representation as an argument
- then the forward evaluation part itself has to be implemented in two
pieces: ``forwardModel_v2`` and ``getDensityFinal`` (the latter is optional,
depending on what is put after that model). ``forwardModel_v2`` is expected
to execute the main part of the computation, but this is not strictly
required.
- finally the computation of the adjoint gradient follows the same
pattern as the forward computation. The difference is that the types
and shapes of arrays are reversed. ``input_ag`` has a shape/type
corresponding to the **output** and ``output_ag`` to the **input**.
Finally, as shown above, the input/output arrays use a numpy
interface. They can thus be used in JAX/Tensorflow/whatever. In the
example code above the input array is saved in a jax array and evaluated
later. This is legitimate, though bear in mind that the memory
will not be freed while you retain that reference.
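A common way to validate a hand-written adjoint is the dot-product test: for an input perturbation ``v`` and an output-side vector ``w``, the Jacobian ``J`` of the forward model must satisfy ``<J v, w> = <v, J^T w>``. The sketch below applies it to the same element-wise square as in the skeleton, using plain NumPy functions in place of the BORG classes. The function names, array shapes and random seed are illustrative assumptions, not part of the BORG API.

.. code:: python3

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 4, 4))   # input field
    v = rng.normal(size=x.shape)     # perturbation of the input
    w = rng.normal(size=x.shape)     # adjoint vector on the output side

    def forward(field):
        # Same toy model as the skeleton: element-wise square
        return field**2

    def adjoint(field, ag_out):
        # Transposed Jacobian applied to ag_out
        return 2.0 * ag_out * field

    # J v approximated by a central finite difference along v
    eps = 1e-6
    Jv = (forward(x + eps * v) - forward(x - eps * v)) / (2 * eps)

    lhs = np.sum(Jv * w)               # <J v, w>
    rhs = np.sum(v * adjoint(x, w))    # <v, J^T w>

If ``lhs`` and ``rhs`` disagree beyond the finite-difference error, the adjoint is inconsistent with the forward pass.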
.. _building_python_likelihood_script:
Build a python likelihood script
--------------------------------
Ini file
~~~~~~~~
.. code:: text
[python]
likelihood_path=test_likelihood.py
bias_sampler_type=slice
The hades_python initializers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A typical python likelihood requires three initialization functions. They
must be registered using the helper decorators
``borg.registerGravityBuilder`` (for the forward model),
``borg.registerLikelihoodBuilder`` (for the bias+likelihood part), and
``borg.registerSamplerBuilder`` (for extra sampling strategies).
An example of their use is the following piece of code:
.. code:: python3

    import borg

    @borg.registerGravityBuilder
    def build_gravity_model(state, box):
        global model
        chain = borg.forward.ChainForwardModel(box)
        chain.addModel(borg.forward.models.HermiticEnforcer(box))
        chain.addModel(borg.forward.models.Primordial(box, 1.0))
        chain.addModel(borg.forward.models.EisensteinHu(box))
        model = chain
        return chain

    @borg.registerLikelihoodBuilder
    def build_likelihood(state, info):
        boxm = model.getBoxModel()
        return MyLikelihood(model, boxm.N, boxm.L)

    @borg.registerSamplerBuilder
    def build_sampler(state, info):
        return []

The ``build_gravity_model`` function returns a ``BORGForwardModel`` object, and
takes a ``MarkovState`` and a ``BoxModel`` as parameters. The ``build_likelihood``
function must return a ``Likelihood3d`` object (check
``help(borg.likelihood.Likelihood3d)``). Finally ``build_sampler`` must return a
list of sampler objects.
The forward model elements can be either C++ or Python objects;
both work transparently. Likelihoods may also be written in pure python,
though MPI is still untested at this time (August 2020).
Writing a likelihood
~~~~~~~~~~~~~~~~~~~~
In the previous section we have seen how to build the objects required
by hades_python to analyze data. We have not yet covered how to write a
likelihood in python. Many likelihoods and bias models are already available
from the C++ side, for example ``borg.likelihood.GaussianPassthrough``,
``borg.likelihood.GaussianLinear`` or
``borg.likelihood.PoissonPowerLaw``. To create new ones easily in
python, one has to write a class inheriting from
``borg.likelihood.BaseLikelihood`` and implement a number of functions.
An example of a simple Gaussian likelihood is shown below:
.. code:: python3

    import borg
    import numpy as np

    cons = borg.console()

    # Assumed noise standard deviation (not defined in the original example)
    sigma_noise = 1.0

    def myprint(x):
        # Route messages through the BORG console so they interleave with C++ output
        cons.print_std(x if isinstance(x, str) else repr(x))

    class MyLikelihood(borg.likelihood.BaseLikelihood):
        def __init__(self, fwd, N, L):
            myprint(f" Init {N}, {L}")
            super().__init__(fwd, N, L)

        def initializeLikelihood(self, state):
            myprint("Init likelihood")
            self.data = state['galaxy_data_0']
            # Extra 3d array storing the density field after the forward model
            state.newArray3d("my_density_field", True, self.data.shape[0],
                             self.data.shape[1], self.data.shape[2])

        def updateMetaParameters(self, state):
            cpar = state['cosmology']
            myprint(f"Cosmology is {cpar}")
            self.getForwardModel().setCosmoParams(cpar)

        def generateMockData(self, s_hat, state):
            fwd = self.getForwardModel()
            output = np.zeros(fwd.getOutputBoxModel().N)
            fwd.forwardModel_v2(s_hat)
            fwd.getDensityFinal(output)
            state['galaxy_data_0'][:] = output + np.random.normal(
                size=output.shape) * sigma_noise
            state['my_density_field'][:] = output
            like = ((state['galaxy_data_0'][:] - output)**2).sum() / sigma_noise**2
            myprint(
                f"Initial log_likelihood: {like}, var(s_hat) = {np.var(s_hat)}")

        def logLikelihoodComplex(self, s_hat, gradientIsNext):
            fwd = self.getForwardModel()
            # Output box, as in the other methods
            output = np.zeros(fwd.getOutputBoxModel().N)
            fwd.forwardModel_v2(s_hat)
            fwd.getDensityFinal(output)
            L = 0.5 * ((output - self.data)**2).sum() / sigma_noise**2
            myprint(f"var(s_hat): {np.var(s_hat)}, Call to logLike: {L}")
            return L

        def gradientLikelihoodComplex(self, s_hat):
            fwd = self.getForwardModel()
            output = np.zeros(fwd.getOutputBoxModel().N)
            fwd.forwardModel_v2(s_hat)
            fwd.getDensityFinal(output)
            # Gradient with respect to the model output, then back-propagated
            mygradient = (output - self.data) / sigma_noise**2
            fwd.adjointModel_v2(mygradient)
            mygrad_hat = np.zeros(s_hat.shape, dtype=np.complex128)
            fwd.getAdjointModel(mygrad_hat)
            return mygrad_hat

The function ``myprint`` is a helper to create nice output that streams
correctly with the rest of the C++ code. It is not mandatory but
strongly recommended to use the ``borg.console()`` object as it will
seamlessly integrate with other BORG tools.
We will now look at each function one after the other:
- ``__init__`` is the constructor. It is crucial that the base
constructor is called in the constructor of the new class: it will
not be done implicitly by the python virtual machine. The base
constructor takes a ``BORGForwardModel`` object, and the grid
specifications ``N`` and ``L`` as tuples.
- ``initializeLikelihood`` is called at the initialization of the chain
and before restoration. If you want to store additional fields in the
mcmc, you should allocate them at that moment in the state object. In
the above example, a new 3d array is allocated to store the density
field after the forward model evaluation.
Note that the forward model has to be evaluated in the log likelihood
and its gradient. Though it is in principle required to implement
``logLikelihood`` and ``gradientLikelihood`` (the real counterparts of the
complex functions above), in practice they are not used for the run.
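The chain rule used in ``gradientLikelihoodComplex`` can be checked numerically before plugging a likelihood into a run. The sketch below is pure NumPy and does not call BORG: an element-wise square stands in for the forward model, and ``sigma_noise``, the grid size and the random fields are arbitrary assumptions. It compares the analytic gradient of the Gaussian cost with a central finite difference on one voxel.

.. code:: python3

    import numpy as np

    sigma_noise = 0.5                      # assumed noise level
    rng = np.random.default_rng(1)
    data = rng.normal(size=(4, 4, 4))      # toy "observed" field
    s = rng.normal(size=data.shape)        # toy input field

    def forward(field):
        # Stand-in forward model (not a BORG model)
        return field**2

    def log_like(field):
        # Same expression as in logLikelihoodComplex above
        return 0.5 * np.sum((forward(field) - data)**2) / sigma_noise**2

    def grad_log_like(field):
        # Gradient w.r.t. the model output, pushed through the adjoint
        ag_out = (forward(field) - data) / sigma_noise**2
        return 2.0 * ag_out * field

    # Central finite difference on a single voxel
    idx = (1, 2, 3)
    eps = 1e-5
    ds = np.zeros_like(s)
    ds[idx] = eps
    fd = (log_like(s + ds) - log_like(s - ds)) / (2 * eps)
    analytic = grad_log_like(s)[idx]

The two numbers should agree to several significant digits; a mismatch usually points to an error in the adjoint or to a missing factor in the likelihood gradient.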
More python jupyter tutorials
-----------------------------
.. toctree::
:maxdepth: 1
extras/python-jupyter/PM-tCOLA
* A notebook showcasing tCOLA and its convergence, assessed through :math:`P(k)`, is available here__.
__ extras/python-jupyter/PM-tCOLA.ipynb

View file

@ -0,0 +1,4 @@
virbius
=======
*To be written...*

View file

@ -0,0 +1,10 @@
Inputs
######
.. include:: inputs/Configuration_file_v1.inc.rst
.. include:: inputs/Configuration_file_v2.inc.rst
.. include:: inputs/Configuration_file_v2.1.inc.rst
.. include:: inputs/Create_config-file.inc.rst
.. include:: inputs/Text_catalog_format.inc.rst
.. include:: inputs/HDF5_catalog_format.inc.rst
.. include:: inputs/Radial_selection.inc.rst

View file

@ -0,0 +1,249 @@
.. _configuration_file:
ARES_Configuration_file_v1
==========================
The configuration file for ARES uses the INI file syntax. It is
separated into sections among which three are main sections.
Main sections
-------------
Section [system]
~~~~~~~~~~~~~~~~
- console_output: Holds the prefix filename for all log output files.
- VERBOSE_LEVEL: Set the verbosity level for the console. Files get all
outputs.
- N0: Number of grid elements along the X axis.
- N1: Same for Y axis.
- N2: Same for Z axis.
- L0: Comoving length of the X axis
- L1: Same for Y axis
- L2: Same for Z axis
- corner0: Center of the voxel at the corner of the box in -X
direction, this should be the smallest X value.
- corner1: Same for Y
- corner2: Same for Z
- NUM_MODES: number of bins to represent the power spectrum
- N_MC: Maximum number of markov chain samples to produce in a single
run (**Note:** Used only for *v1*)
- borg_supersampling: Supersampling level of the grid for intermediate
calculations. The number of particles is
N0*N1*N2*borg_supersampling**3
- hades_likelihood: Likelihood to use in HADES run. Can be one of the
following values:
- BORG_POISSON: Use Poisson likelihood
- BORG_POISSON_POWER:
- BORG_VOODOO:
- BORG_VOODOO_MAGIC:
- BORG_LINEAR: ARES likelihood model. Noise is Gaussian with
Variance equal to :math:`S \bar{N}`. Use power law bias.
- BORG_SH:
- BORG_NB: Negative binomial. Broken power law bias.
- Generic framework:
- GAUSSIAN_BROKEN_POWERLAW_BIAS
- GAUSSIAN_MO_WHITE_BIAS: Gaussian noise model, variance is
fitted. Double power law bias
- GAUSSIAN_POWERLAW_BIAS
- GAUSSIAN_2ND_ORDER_BIAS
- GENERIC_POISSON_BROKEN_POWERLAW_BIAS
- GENERIC_GAUSSIAN_LINEAR_BIAS
- GENERIC_GAUSSIAN_MANY_POWER_1^1
- GENERIC_GAUSSIAN_MANY_POWER_1^2
- GENERIC_GAUSSIAN_MANY_POWER_1^4
- GENERIC_POISSON_MANY_POWER_1^1
- GENERIC_POISSON_MANY_POWER_1^2
- GENERIC_POISSON_MANY_POWER_1^4
- hades_forward_model: Forward model to use
- LPT: Lagrangian perturbation theory, ModifiedNGP/Quad final
projection
- 2LPT: Second order Lagrangian perturbation theory,
ModifiedNGP/Quad final projection
- PM: Particle mesh, ModifiedNGP/Quad final projection
- LPT_CIC: Same as LPT, but use CIC for final projection
- 2LPT_CIC: Same as 2LPT, but use CIC for final projection
- PM_CIC: Same as PM, but use CIC for final projection
- HADES_LOG: Use Exponential transform (HADES model) for the forward
model. Preserved mean density is enforced.
- borg_do_rsd: Do redshift space distortion if set to "true".
- projection_model: Specifies which projection to use for the data. No
constraints are enforced on the likelihood, but it should of course be matched
to the value adopted here. The value is inspected in ``src/common/projection.hpp``.
Two projections are available at the moment:
- number_ngp: Nearest-Grid-Point number counting; it just counts the number of galaxies/objects within a voxel.
- luminosity_cic: it weights each object by the value in ``Mgal`` (its luminosity) before doing a CIC projection.
- test_mode: Runs ARES/BORG/HADES in test mode. Data is not used, mock
data is generated on the fly.
- seed_cpower: Set to true to seed the power spectrum with the correct
one according to the cosmology section. Otherwise it is set to a
small fraction of it.
- hades_max_epsilon: Stepsize for the HMC. It is unitless. Good
starting point is around 0.01.
- hades_max_timesteps: Maximum number of timesteps for a single HMC
sample.
- hades_mixing: Number of samples to compute before writing to disk.
- savePeriodicity: This reduces the number of times the restart files
are dumped to the hard drives. This is useful for reducing I/Os, as
restart files are heavy. You can set this to a number that is a
multiple of the number of mcmc steps. For example, 20 tells ares to
dump restart files every 20 mcmc steps.
- mask_precision: Precision to which you want to compute the mask. By
default it is "0.01". This value is not (yet) related to the actual
precision; it scales the internal number of evaluations of the
selection function, so 0.001 will call it 100 times more often. The
advice is not to go below 0.01.
- furious_seeding: if set to true the core sampler will reseed itself
from a system entropy source at each step of the MCMC. That means the
MCMC becomes unpredictable and the seed number is discarded.
- simulation: if set to true switches to N-body simulation analysis.
Additional cuts are possible depending on masses, spins, etc, of
halos.
Likelihoods that use the generic bias framework (currently
GAUSSIAN_MO_WHITE_BIAS) also support the following tags:
- bias_XX_sampler_generic_blocked: if set to true, the XX parameter
of the bias is not sampled. XX varies depending on the
likelihood.
- block_sigma8_sampler: true by default; to sample sigma8 in the
initial conditions, set this to false
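To make the grid-related ``[system]`` options concrete, here is a small sketch that parses a configuration fragment with Python's standard ``configparser`` and derives the voxel size and the particle count ``N0*N1*N2*borg_supersampling**3`` quoted above. Only the key names come from the list above; the numerical values are made up, and ARES itself reads the file with its own parser, not with this code.

.. code:: python3

    import configparser

    # Hypothetical values; only the key names are taken from the documentation.
    ini_text = """
    [system]
    N0 = 64
    N1 = 64
    N2 = 64
    L0 = 500.0
    L1 = 500.0
    L2 = 500.0
    corner0 = -250.0
    corner1 = -250.0
    corner2 = -250.0
    borg_supersampling = 2
    """

    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep the case of keys such as N0 and L0
    # Strip the indentation of the embedded text before parsing
    cfg.read_string("\n".join(line.strip() for line in ini_text.splitlines()))
    system = cfg["system"]

    N = [system.getint(f"N{i}") for i in range(3)]
    L = [system.getfloat(f"L{i}") for i in range(3)]
    supersampling = system.getint("borg_supersampling")

    # Comoving size of one voxel along each axis
    voxel_size = [length / n for length, n in zip(L, N)]
    # Number of particles used by the particle-based forward models
    num_particles = N[0] * N[1] * N[2] * supersampling**3

With these numbers each voxel is 7.8125 length units wide and the forward model would carry 2097152 particles, which gives a quick feeling for the memory cost of raising ``borg_supersampling``.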
Section [run]
~~~~~~~~~~~~~
- NCAT: Number of catalogs. This affects the number of "catalog"
sections.
- SIMULATION: Specify if the input is from simulation. Default is
false.
Section [cosmology]
~~~~~~~~~~~~~~~~~~~
- omega_r: Radiation density
- omega_k: Curvature
- omega_m: Total matter density
- omega_b: Baryonic matter density
- omega_q: Quintessence density
- w: Quintessence equation of state
- wprime: Derivative of the equation of state
- n_s: Slope of the power spectrum of scalar fluctuations
- sigma8: Normalisation of the power spectrum at 8 Mpc/h
- h100: Hubble constant in units of 100 km/s/Mpc
Section [julia]
~~~~~~~~~~~~~~~
- likelihood_path: Path to the julia file describing the likelihood
(i.e. the main entry point for BORG in the likelihood)
- likelihood_module: Name of the julia module holding the likelihood
- bias_sampler_type: slice or hmclet, which sampling strategy to use to
sample the "bias" parameters
- ic_in_julia: true or false, whether the initial condition of the
Markov Chain is set in julia
- hmclet_diagonalMass: whether to use a diagonal or a dense mass matrix
estimated on the fly
- hmclet_burnin: number of steps allowed in "BURN IN" mode. This
depends on the complexity of the likelihood. A few hundred seems
reasonable.
- hmclet_burnin_memory: size of the memory in "BURN IN" mode. Something
like 50 is advocated to be sure it is fairly local but not too noisy.
- hmclet_maxEpsilon: maximum epsilon for the HMC integrator (of order
0.01)
- hmclet_maxNtime: maximum number of timesteps for the HMC integrator
(a few tens, e.g. 20-50)
Catalog sections
----------------
Basic fields
~~~~~~~~~~~~
- datafile: Text filename holding the data
- maskdata: Healpix FITS file with the mask
- radial_selection: Type of selection function, can be either
"schechter", "file" or "piecewise".
- refbias: true if this catalog is a reference for bias. Bias will not
be sampled for it
- bias: Default bias value, also used for mock generation
- nmean: Initial mean galaxy density value, also used for mock
generation
Halo selection
~~~~~~~~~~~~~~
- halo_selection: Specifies how to select the halos from the halo catalog. Can be ``mass``, ``radius``, ``spin`` or ``mixed``. ``mixed`` represents combined cuts and is applied by listing several criteria, e.g. ``halo_selection = mass radius``
- halo_low_mass_cut: this is log10 of mass in the same unit as the
masses of the input text file
- halo_high_mass_cut: same as for halo_low_mass_cut, this is log10 of
mass
- halo_small_radius_cut
- halo_large_radius_cut
- halo_small_spin_cut
- halo_high_spin_cut
Schechter selection function
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- schechter_mstar: Mstar for Schechter function
- schechter_alpha: Power law slope of Schechter function
- schechter_sampling_rate: How many distance points to precompute from
Schechter (e.g. 1000)
- schechter_dmax: Maximum distance to precompute Schechter selection
function
- galaxy_bright_apparent_magnitude_cut: Apparent magnitude where data
and selection must be truncated, bright end.
- galaxy_faint_apparent_magnitude_cut: Same for faint end.
- galaxy_bright_absolute_magnitude_cut: Absolute magnitude cut in data
and selection function, bright end, useful to select different galaxy
populations
- galaxy_faint_absolute_magnitude_cut: Similar but faint end
- zmin: Minimum redshift for galaxy sample, galaxies will be truncated
- zmax: Maximum redshift for galaxy sample, galaxies will be truncated
'File' selection function
~~~~~~~~~~~~~~~~~~~~~~~~~
- radial_file: Text file to load the selection from
The file has the following format. Each line starting with a '#' is a
comment line, and discarded. The first line is a set of three numbers:
'rmin dr N'. Each line that follows must be a number between 0 and 1
giving the selection function at a distance r = rmin + dr \* i, where
'i' is the line number (zero based). Finally 'N' is the number of points
in the text file.
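The format above is simple enough to be read or generated with a few lines of Python. The sketch below is a stand-alone reader written for illustration and is not the parser used by ARES: the function name ``read_radial_selection`` is hypothetical, and tolerating blank lines or comments before the header is an assumption.

.. code:: python3

    import io
    import numpy as np

    def read_radial_selection(stream):
        """Parse the 'rmin dr N' header and the N selection values that follow."""
        # Drop blank lines and comment lines (starting with '#')
        lines = [ln.strip() for ln in stream
                 if ln.strip() and not ln.lstrip().startswith("#")]
        rmin, dr, n = lines[0].split()
        rmin, dr, n = float(rmin), float(dr), int(n)
        values = np.array([float(v) for v in lines[1:1 + n]])
        if len(values) != n:
            raise ValueError(f"expected {n} values, found {len(values)}")
        if np.any((values < 0) | (values > 1)):
            raise ValueError("selection values must lie in [0, 1]")
        # Distance attached to each value: r = rmin + dr * i, i zero based
        r = rmin + dr * np.arange(n)
        return r, values

    example = io.StringIO(
        "# toy radial selection\n"
        "0.0 10.0 4\n"
        "1.0\n"
        "0.8\n"
        "0.5\n"
        "0.1\n"
    )
    r, selection = read_radial_selection(example)

Here the four values are attached to the distances 0, 10, 20 and 30, in the same length unit as the box.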
Two possibilities are offered for adjusting the catalog and the
selection together:
- either you choose not to do anything, and take the whole sample and
the provided selection. Then you need to specify:
- file_dmin: Minimal distance for selection function and data
- file_dmax: same but maximal distance
- no_cut_catalog: set to false, if you do not set this you will get
an error message.
- or you want ares to preprocess the catalog and then you need:
- zmin
- zmax
- galaxy_faint_apparent_magnitude_cut: Apparent magnitude cut, faint end.
- galaxy_bright_absolute_magnitude_cut: Absolute magnitude cut in
data and selection function, bright end, useful to select
different galaxy populations
- galaxy_faint_absolute_magnitude_cut: Similar but faint end
- no_cut_catalog: (not necessary, as it defaults to true)

View file

@ -0,0 +1,378 @@
ARES_Configuration_file_v2.1
============================
The configuration file for ARES uses the INI file syntax. It is
separated into sections among which three are main sections.
Main sections
-------------
Section [system]
~~~~~~~~~~~~~~~~
- console_output: Holds the prefix filename for all log output files.
- VERBOSE_LEVEL: Set the verbosity level for the console. Files get all
outputs.
- N0: Number of grid elements along the X axis.
- N1: Same for Y axis.
- N2: Same for Z axis.
- **Optionally:**
- Ndata0, Ndata1, Ndata2 specifies the same thing as N0, N1, N2 but
for the projection grid of the galaxy positions. This grid must be
different in the case the degrader bias pass is used (see bias
model section)
- L0: Comoving length of the X axis
- L1: Same for Y axis
- L2: Same for Z axis
- corner0: Center of the voxel at the corner of the box in -X
direction, this should be the smallest X value.
- corner1: Same for Y
- corner2: Same for Z
- NUM_MODES: number of bins to represent the power spectrum
- projection_model: Specifies which projection to use for the data. No
constraints are enforced on the likelihood, but it should of course be matched
to the value adopted here. The value is inspected in ``src/common/projection.hpp``.
Two projections are available at the moment:
- number_ngp: Nearest-Grid-Point number counting; it just counts the number of galaxies/objects within a voxel.
- luminosity_cic: it weights each object by the value in ``Mgal`` (its luminosity) before doing a CIC projection.
- test_mode: Runs ARES/BORG/HADES in test mode. Data is not used, mock
data is generated on the fly.
- seed_cpower: Set to true to seed the power spectrum with the correct
one according to the cosmology section. Otherwise it is set to a
small fraction of it.
- savePeriodicity: This reduces the number of times the restart files
are dumped to the hard drives. This is useful for reducing I/Os, as
restart files are heavy. You can set this to a number that is a
multiple of the number of mcmc steps. For example, 20 tells ares to
dump restart files every 20 mcmc steps.
- mask_precision: Precision to which you want to compute the mask. By
default it is "0.01". This value is not (yet) related to the actual
precision; it scales the internal number of evaluations of the
selection function, so 0.001 will call it 100 times more often. The
advice is not to go below 0.01.
- furious_seeding: if set to true the core sampler will reseed itself
from a system entropy source at each step of the MCMC. That means the
MCMC becomes unpredictable and the seed number is discarded.
Section [block_loop]
~~~~~~~~~~~~~~~~~~~~
- hades_sampler_blocked: Prevents the density field from being sampled
Likelihoods that use the generic bias framework (currently
GAUSSIAN_MO_WHITE_BIAS) also support the following tags:
- bias_XX_sampler_generic_blocked: if set to true, the XX parameter
of the bias is not sampled. XX varies depending on the
likelihood. **WARNING: the code has not yet been updated to look for
these variables in [block_loop]; they should still be located in
[system] at the moment.**
- sigma8_sampler_blocked: true by default; to sample sigma8 in the
initial conditions, set this to false
Section [mcmc]
~~~~~~~~~~~~~~
- number_to_generate: Maximum number of markov chain samples to produce
in a single run
- init_random_scaling: This is more specific to HADES. It starts the
MCMC run with a random initial condition, scaled with this number
(default 0.1) compared to the reference initial power spectrum.
- random_ic: true if the initial conditions must be reshuffled before
starting the MCMC sampling, false to keep them at the value generated
by the mock data generator
- scramble_bias: true (default), reset the bias values to some other
values before starting the chain, after generating the mock.
Section [gravity]
~~~~~~~~~~~~~~~~~
- model: Forward model to use
- LPT: Lagrangian perturbation theory, ModifiedNGP/Quad final
projection
- 2LPT: Second order Lagrangian perturbation theory,
ModifiedNGP/Quad final projection
- PM: Particle mesh, ModifiedNGP/Quad final projection
- LPT_CIC: Same as LPT, but use CIC for final projection
- 2LPT_CIC: Same as 2LPT, but use CIC for final projection
- PM_CIC: Same as PM, but use CIC for final projection
- tCOLA: Same as PM_CIC but uses a TCOLA gravity machine. To enable,
specify model=PM_CIC, as above, AND set tCOLA=true.
- HADES_LOG: Use Exponential transform (HADES model) for the forward
model. Preserved mean density is enforced.
- supersampling: Controls the number of particles (supersampling level
of the particle grid with respect to the grid). The number of
particles is N0*N1*N2*supersampling**3
- forcesampling
- a_initial
- a_final
- pm_start_z:
- pm_nsteps:
- part_factor:
- lightcone:
- do_rsd: Do redshift space distortion if set to "true".
Forward model elements can also be chained and have different grid sizes. *"model"* can now be CHAIN, which then needs a specific list of model layers in *"models"*.
Here is an example:
.. code:: text
[gravity]
model=CHAIN
models=PRIMORDIAL,TRANSFER_EHU,LPT_CIC
[gravity_chain_0]
a_final=0.001
[gravity_chain_1]
[gravity_chain_2]
supersampling=2
lightcone=false
do_rsd=false
a_initial=0.001
a_final=1.
part_factor=2.0
mul_out=1
Each element of the chain gets its own configuration section, with the
same content as the global descriptor used previously (see above). Note
that if you use the chain mechanism, you have to be explicit about the
production of the initial conditions power spectrum. As you can see above,
we indicate "PRIMORDIAL,TRANSFER_EHU" to start with a primordial
scale-free gravitational potential, onto which we apply an Eisenstein-Hu
transfer function to form density fluctuations, which are then passed
down to LPT_CIC. Also keep in mind that the scale factors must be
compatible; no checks are run by the code at the moment. ``mul_out``
specifies how much the output grid has to be supersampled for the CIC
(i.e. the CIC grid is produced at mul_out times the initial grid size).
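Since the code does not check the scale factors, a small script can do it before a run is submitted. The sketch below reads the example above with Python's standard ``configparser``, pairs the i-th entry of ``models`` with the section ``gravity_chain_i`` (the pairing suggested by the example), and compares ``a_final`` of the primordial layer with ``a_initial`` of the LPT layer. This check is an illustration written for this page and not a feature of ARES.

.. code:: python3

    import configparser

    ini_text = "\n".join([
        "[gravity]",
        "model=CHAIN",
        "models=PRIMORDIAL,TRANSFER_EHU,LPT_CIC",
        "[gravity_chain_0]",
        "a_final=0.001",
        "[gravity_chain_1]",
        "[gravity_chain_2]",
        "supersampling=2",
        "lightcone=false",
        "do_rsd=false",
        "a_initial=0.001",
        "a_final=1.",
        "part_factor=2.0",
        "mul_out=1",
    ])

    cfg = configparser.ConfigParser()
    cfg.read_string(ini_text)

    # Pair each layer name with the options of its own section
    models = [m.strip() for m in cfg["gravity"]["models"].split(",")]
    chain = []
    for i, name in enumerate(models):
        section = f"gravity_chain_{i}"
        options = dict(cfg[section]) if cfg.has_section(section) else {}
        chain.append((name, options))

    # Hand-made consistency check between the first and the last layer
    a_primordial = float(chain[0][1]["a_final"])
    a_lpt_initial = float(chain[2][1]["a_initial"])
    scale_factors_match = (a_primordial == a_lpt_initial)

In this example both values are 0.001, so ``scale_factors_match`` is true; a mismatch would go unnoticed by the code itself.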
Model 'Primordial'
^^^^^^^^^^^^^^^^^^
Apply a primordial scale free power spectrum on the input. The output is
scaled linearly to a_final.
Model 'Transfer'
^^^^^^^^^^^^^^^^
* **CIC correction**: ``use_invert_cic=true``. The transfer function is the inverse CIC kernel. Option: ``smoother=0.99`` (in units of the grid).
* **Sharp K filter**: ``use_sharpk=true``. The transfer function is a sharp k filter. Option: ``k_max=0.1`` (in h/Mpc).
Model 'Softplus'
^^^^^^^^^^^^^^^^
Apply a softplus transform. Option: ``hardness=1.0``, a parameter that
makes the transition sharper or smoother.
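For intuition about the role of the hardness, the sketch below uses a common parameterization of the softplus function, ``log(1 + exp(hardness * x)) / hardness``. This exact expression is an assumption made for illustration; the formula implemented in the code (possible offsets included) is not given on this page and should be checked in the source.

.. code:: python3

    import numpy as np

    def softplus(x, hardness=1.0):
        # log(1 + exp(hardness * x)) / hardness, evaluated stably with logaddexp
        return np.logaddexp(0.0, hardness * np.asarray(x, dtype=float)) / hardness

    delta = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
    soft = softplus(delta, hardness=1.0)    # smooth transition around zero
    hard = softplus(delta, hardness=50.0)   # close to max(delta, 0)

Under this parameterization the output is always positive, and a larger hardness brings the curve closer to a sharp cut at zero.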
Model 'Downgrade'
^^^^^^^^^^^^^^^^^
(No option)
Section [hades]
~~~~~~~~~~~~~~~
- max_epsilon: Stepsize for the HMC. It is unitless. Good starting
point is around 0.01.
- max_timesteps: Maximum number of timesteps for a single HMC sample.
- mixing: Number of samples to compute before writing to disk.
- algorithm:
- HMC: classical HMC algorithm
- QN-HMC: Quasi-Newton HMC algorithm
- FROZEN-PHASE: Fixed phases. They are not sampled at all; this
algorithm only provides the machinery that allows the other samplers to work.
- phases: if ``algorithm`` is FROZEN-PHASE, you can specify an HDF5
filename here. This file must contain a "phase" array which is
conforming to the setup of the ini.
- noPhasesProvided: if phases is omitted, this one has to be set to
true, otherwise an error is thrown.
- phasesDataKey: this indicates which field to use in the ``phases``
HDF5 file.
- likelihood: Likelihood to use in HADES run. Can be one of the
following values:
- LINEAR: Gaussian likelihood
- BORG_POISSON: Use Poisson likelihood
- Generic framework:
- GAUSSIAN_BROKEN_POWERLAW_BIAS
- GAUSSIAN_MO_WHITE_BIAS: Gaussian noise model, variance is
fitted. Double power law bias
- GAUSSIAN_POWERLAW_BIAS: Power law bias model with a Gaussian
noise model, variance is fitted.
- GAUSSIAN_2ND_ORDER_BIAS
- GENERIC_POISSON_BROKEN_POWERLAW_BIAS: Broken power law bias
model (also called Neyrinck's model), with Poisson noise model
- GENERIC_GAUSSIAN_LINEAR_BIAS: Linear bias model, Gaussian noise
model
- GENERIC_GAUSSIAN_MANY_POWER_1^1
- GENERIC_GAUSSIAN_MANY_POWER_1^2
- GENERIC_GAUSSIAN_MANY_POWER_1^4
- GENERIC_POISSON_MANY_POWER_1^1
- GENERIC_POISSON_MANY_POWER_1^2
- GENERIC_POISSON_MANY_POWER_1^4
- GENERIC_POISSON_POWERLAW_BIAS: simple power law bias model with
Poisson noise model
- GENERIC_POISSON_POWERLAW_BIAS_DEGRADE4: power law bias models
preceded by a degrade pass (N -> N/4 in each direction)
- GENERIC_POISSON_BROKEN_POWERLAW_BIAS_DEGRADE4: broken power law
bias model preceded by a degrade pass (N -> N/4 in each
direction)
- scheme: SI_2A, SI_2B, SI_2C, SI_3A, SI_4B, SI_4C, SI_4D, SI_6A
Section [run]
~~~~~~~~~~~~~
- NCAT: Number of catalogs. This affects the number of "catalog"
sections.
- SIMULATION: Specify if the input is from simulation. Default is
false.
Section [likelihood]
~~~~~~~~~~~~~~~~~~~~
- MainPower_prior_width: Variance of the manypower parameters (except
mean which is always uniform positive)
- EFT_Lambda: Lambda truncation parameter of the EFT bias model
- Options related to robust likelihood. Each patch of a robust likelihood can be sliced in the redshift direction.
There are two options controlling the slicing: the maximum distance "rmax" and the number of slices "slices"
* rmax: Maximum distance accessible during the inference. In practice it is at least the farthest distance of a voxel in the box.
Unit is the one of the box, most generally :math:`h^{-1}` Mpc.
* slices: Number of slices to build in the redshift direction. Each patch will have a depth ~rmax/slices.
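As a rough illustration of the two options, the sketch below splits the range from 0 to ``rmax`` into ``slices`` shells of equal depth and assigns distances to them. The uniform spacing starting at zero is an assumption based on the "depth ~rmax/slices" statement above, and the numbers are arbitrary; the actual patch construction of the robust likelihood may differ.

.. code:: python3

    import numpy as np

    rmax = 600.0   # hypothetical maximum distance, in h^-1 Mpc
    slices = 4     # hypothetical number of slices

    # Edges of the radial shells and their common depth
    edges = np.linspace(0.0, rmax, slices + 1)
    depth = rmax / slices

    def slice_index(r):
        # Index of the shell containing each distance; r == rmax goes to the last one
        return np.minimum((np.asarray(r) / depth).astype(int), slices - 1)

    indices = slice_index([10.0, 160.0, 600.0])

With these values each shell is 150 units deep, and the three test distances fall in shells 0, 1 and 3.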
Section [cosmology]
~~~~~~~~~~~~~~~~~~~
- omega_r: Radiation density
- omega_k: Curvature
- omega_m: Total matter density
- omega_b: Baryonic matter density
- omega_q: Quintessence density
- w: Quintessence equation of state
- wprime: Derivative of the equation of state
- n_s: Slope of the power spectrum of scalar fluctuations
- sigma8: Normalisation of the power spectrum at 8 Mpc/h
- h100: Hubble constant in units of 100 km/s/Mpc
- fnl: primordial non-Gaussianity
Section [julia]
~~~~~~~~~~~~~~~
- likelihood_path: path of the julia code
- likelihood_module: julia module where the likelihood is implemented
- bias_sampler_type: type of sampler for the bias parameters (hmclet,
slice)
- ic_in_julia: whether the initial conditions of the MCMC are coded in
julia or drawn as random numbers
- hmclet_diagonalMass: whether to use a diagonal mass matrix or a full
dense one
- mass_burnin: number of MCMC steps in burnin mode
- mass_burnin_memory: number of MCMC steps to store when in burnin mode
- hmclet_maxEpsilon: maximum epsilon for the leapfrog integrator
(~0.002-0.01 depending on likelihood complexity)
- hmclet_maxNtime: maximum number of steps for the leapfrog integrator
(~50-100)
- hmclet_massScale: amount of momentum reshuffling (0.0 = full, 1.0 =
none, which is bad for MCMC)
- hmclet_correlationLimiter: reduce the correlations in the covariance
matrix by some number. Typically the smaller the number the less
reduction with :math:`\simeq 1` reducing the correlation by 2.
Catalog sections
----------------
Basic fields
~~~~~~~~~~~~
- datafile: Text filename holding the data
- maskdata: Healpix FITS file with the mask
- radial_selection: Type of selection function, can be either
"schechter", "file" or "piecewise".
- refbias: true if this catalog is a reference for bias. Bias will not
be sampled for it
- bias: Default bias value, also used for mock generation
- nmean: Initial mean galaxy density value, also used for mock
generation
Halo selection
~~~~~~~~~~~~~~
- halo_selection: Specifies how to select the halos from the halo catalog. Can be ``mass``, ``radius``, ``spin`` or ``mixed``. ``mixed`` represents combined cuts and is applied by listing several criteria, e.g. ``halo_selection = mass radius``
- halo_low_mass_cut: this is log10 of mass in the same unit as the
masses of the input text file
- halo_high_mass_cut: same as for halo_low_mass_cut, this is log10 of
mass
- halo_small_radius_cut
- halo_large_radius_cut
- halo_small_spin_cut
- halo_high_spin_cut
Schechter selection function
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- schechter_mstar: Mstar for Schechter function
- schechter_alpha: Power law slope of Schechter function
- schechter_sampling_rate: How many distance points to precompute from
Schechter (e.g. 1000)
- schechter_dmax: Maximum distance to precompute Schechter selection
function
- galaxy_bright_apparent_magnitude_cut: Apparent magnitude where data
and selection must be truncated, bright end.
- galaxy_faint_apparent_magnitude_cut: Same for faint end.
- galaxy_bright_absolute_magnitude_cut: Absolute magnitude cut in data
and selection function, bright end, useful to select different galaxy
populations
- galaxy_faint_absolute_magnitude_cut: Similar but faint end
- zmin: Minimum redshift for galaxy sample, galaxies will be truncated
- zmax: Maximum redshift for galaxy sample, galaxies will be truncated
'File' selection function
~~~~~~~~~~~~~~~~~~~~~~~~~
- radial_file: Text file to load the selection from
The file has the following format. Each line starting with a '#' is a
comment line, and discarded. The first line is a set of three numbers:
'rmin dr N'. Each line that follows must be a number between 0 and 1
giving the selection function at a distance r = rmin + dr \* i, where
'i' is the line number (zero based). Finally 'N' is the number of points
in the text file.
Two possibilities are offered for adjusting the catalog and the
selection together:
- either you choose not to do anything, and take the whole sample and
the provided selection. Then you need to specify:
- file_dmin: Minimal distance for selection function and data
- file_dmax: same but maximal distance
- no_cut_catalog: set to false, if you do not set this you will get
an error message.
- or you want ares to preprocess the catalog and then you need:
- zmin
- zmax
- galaxy_faint_apparent_magnitude_cut: Apparent magnitude cut, faint end.
- galaxy_bright_absolute_magnitude_cut: Absolute magnitude cut in
data and selection function, bright end, useful to select
different galaxy populations
- galaxy_faint_absolute_magnitude_cut: Similar but faint end
- no_cut_catalog: (not necessary, as it defaults to true)

View file

@ -0,0 +1,393 @@
ARES_Configuration_file_v2
==========================
The configuration file for ARES uses the INI file syntax. It is
separated into sections among which three are main sections.
Main sections
-------------
Section [system]
~~~~~~~~~~~~~~~~
- console_output: Holds the prefix filename for all log output files.
- VERBOSE_LEVEL: Set the verbosity level for the console. Files get all
outputs. Check inside ``libLSS/tools/log_traits.hpp`` for details.
- **Values**:
- VERBOSE_LEVEL=1 : up to STD level
- VERBOSE_LEVEL=2 : INFO level
- VERBOSE_LEVEL=3 : VERBOSE level
- VERBOSE_LEVEL=4 : DEBUG level
- N0: Number of grid elements along the X axis.
- N1: Same for Y axis.
- N2: Same for Z axis.
- **Optionally:**
- Ndata0, Ndata1, Ndata2 specifies the same thing as N0, N1, N2 but
for the projection grid of the galaxy positions. This grid must be
different in the case the degrader bias pass is used (see bias
model section)
- L0: Comoving length of the X axis
- L1: Same for Y axis
- L2: Same for Z axis
- corner0: Center of the voxel at the corner of the box in -X
direction, this should be the smallest X value.
- corner1: Same for Y
- corner2: Same for Z
- NUM_MODES: number of bins to represent the power spectrum
- projection_model: Specifies which projection to use for the data. No
constraints are enforced on the likelihood, but of course it should be matched
to the value adopted here. The value is inspected in ``src/common/projection.hpp``.
Two models are available at the moment:
- number_ngp: Nearest-Grid-Point number counting; it just counts the number of galaxies/objects within a voxel.
- luminosity_cic: it weights each object by its luminosity, using the value in ``Mgal``, and then does a CIC projection.
- test_mode: Runs ARES/BORG/HADES in test mode. Data is not used, mock
data is generated on the fly.
- seed_cpower: Set to true to seed the power spectrum with the correct
one according to the cosmology section. Otherwise it is set to a
small fraction of it.
- savePeriodicity: This reduces the number of times the restart files
are dumped to the hard drives. This is useful for reducing I/Os, as
restart files are heavy. You can set this to a number that is a
multiple of the number of mcmc steps. For example, 20 tells ares to
dump restart files every 20 mcmc steps.
- mask_precision: Precision to which you want to compute the mask. By
default it is "0.01", which is not related to the actual precision
(unfortunately not yet). It allows scaling the internal number of
evaluation of the selection function. So 0.001 will call it 100 times
more. The advice is not to decrease below 0.01.
- furious_seeding: if set to true the core sampler will reseed itself
from a system entropy source at each step of the MCMC. That means the
MCMC becomes unpredictable and the seed number is discarded.
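As a rough illustration of how the grid options of ``[system]`` fit together, the sketch below parses a small ini fragment with Python's standard ``configparser`` and derives the voxel size :math:`L_i / N_i` along each axis. All values, the ``logares.txt`` file name and the helper names ``read_system`` and ``voxel_sizes`` are made up for this example; they are not taken from ARES.

```python
import configparser

# Hypothetical values, chosen only for illustration.
EXAMPLE_INI = """
[system]
console_output=logares.txt
VERBOSE_LEVEL=2
N0=64
N1=64
N2=64
L0=500.0
L1=500.0
L2=500.0
corner0=-250.0
corner1=-250.0
corner2=-250.0
NUM_MODES=100
test_mode=true
seed_cpower=true
"""


def read_system(text):
    """Return the [system] section with the key case preserved."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep N0, L0, VERBOSE_LEVEL as written
    parser.read_string(text)
    return parser["system"]


def voxel_sizes(system):
    """Comoving voxel size along each axis, L_i / N_i (in box units)."""
    return tuple(
        system.getfloat("L%d" % i) / system.getint("N%d" % i) for i in range(3)
    )


system = read_system(EXAMPLE_INI)
print(voxel_sizes(system))
```

Setting ``optionxform`` matters here because ``configparser`` lowercases option names by default, which would turn ``N0`` into ``n0``.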
Section [block_loop]
~~~~~~~~~~~~~~~~~~~~
- hades_sampler_blocked: Prevents the density field from being sampled.
Likelihoods that use the generic bias framework (currently
GAUSSIAN_MO_WHITE_BIAS) also support the following tags:
- bias_XX_sampler_generic_blocked: if set to true, the XX parameter of
the bias is not sampled. XX varies depending on the
likelihood. **WARNING: the code has not yet been updated to look for
these variables in [block_loop]; they should still be located in
[system] at the moment.**
- **Note:**
Whenever a bias model uses :math:`b_0` to hold the normalization,
inside its header you should set/see ``NmeanIsBias=True``.
Take a look inside ``libLSS/physics/bias/*`` (for example ``linear.hpp``).
- sigma8_sampler_blocked: true by default; set this to false to sample
sigma8 in the initial conditions
Section [mcmc]
~~~~~~~~~~~~~~
- number_to_generate: Maximum number of Markov chain samples to produce
in a single run
- init_random_scaling: This is more specific to HADES. It starts the
MCMC run with a random initial condition, scaled with this number
(default 0.1) compared to the reference initial powerspectrum.
- random_ic: true if ic must be reshuffled before starting the MCMC
sampling, false to keep them at their value generated by the mock
data generator
Section [gravity]
~~~~~~~~~~~~~~~~~
- model: Forward model to use
- LPT: Lagrangian perturbation theory, ModifiedNGP/Quad final
projection
- 2LPT: Second order Lagrangian perturbation theory,
ModifiedNGP/Quad final projection
- PM: Particle mesh, ModifiedNGP/Quad final projection
- LPT_CIC: Same as LPT, but use CIC for final projection
- 2LPT_CIC: Same as 2LPT, but use CIC for final projection
- PM_CIC: Same as PM, but use CIC for final projection
- tCOLA: Same as PM_CIC but uses a TCOLA gravity machine. To enable,
specify model=PM_CIC, as above, AND set tCOLA=true.
- HADES_LOG: Use Exponential transform (HADES model) for the forward model. Preserved mean density is enforced.
- supersampling: Controls the number of particles (supersampling level of the particle grid with respect to the grid). With the supersampling applied along each axis, the number of particles is :math:`N_0 \cdot N_1 \cdot N_2 \cdot \mathrm{supersampling}^3`
- forcesampling : This is the oversampling for computing the gravitational field (and thus the force in the PM). A current rule of thumb is to have forcesampling at least twice supersampling, and supersampling at least two. For tCOLA, the requirements are less stringent.
- **To be checked:** Setup with forcesampling=supersampling.
- a_initial : Scale factor (:math:`a_i`) at which the forward model starts; the scale factor plays the role of time. It should satisfy :math:`10^{-3} \leq a_i \leq 1.0`, with :math:`a_i=10^{-3}` corresponding approximately to the time of the CMB
- a_final : Same as a_initial parameter, but :math:`a_f > a_i`
- pm_start_z: This is relevant only for the PM forward model and represents the starting redshift for the PM simulation.
- pm_nsteps: Relevant only for PM model, see ``extra/borg/libLSS/physics/forwards/borg_multi_pm.cpp``. There are two scalings in the code, controlled with ``LOG_SCALE_STEP``. If ``LOG_SCALE_STEP`` is set to ``False`` then steps are split linearly in :math:`a`. It seems the linear scaling gives better results in tests of :math:`P(k)`.
- part_factor: An option relevant for MPI run. This is the overallocation of particles on each node to allow for moving them in and out of the node. It is required because the density projection needs to have only the relevant particles on the node. If one of them is outside the slab it will cause a failure.
- **Note**: ``part_factor`` is independent of ``forcesampling`` and ``supersampling``. It will likely be larger for smaller boxes (physical length) and for smaller boxes in terms of mesh / grid size. The first case is because particles travel larger distances relative to the size of the box, and the second because there is more shot noise.
- lightcone: See equation 2 from the `SDSS3-BOSS inference paper <https://arxiv.org/pdf/1909.06396.pdf>`_. This option is more relevant for larger boxes.
- do_rsd: Do redshift space distortion if set to ``True``.
- **Note:** The DM particles are shifted directly. This is never the case in observations, for which it is the ensemble of gas particles around a galaxy that is shifted.
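To make the two time-stepping options of the PM model more concrete, here is a minimal sketch of how scale factors could be laid out between the starting redshift and ``a_final``. It only assumes the standard relation :math:`a = 1/(1+z)`. The function name ``scale_factor_steps`` is hypothetical, and whether the code counts ``pm_nsteps`` as intervals or as points is an assumption of this example, not something read from ``borg_multi_pm.cpp``.

```python
import math


def scale_factor_steps(pm_start_z, a_final, pm_nsteps, log_scale=False):
    """Return pm_nsteps + 1 scale factors from a(pm_start_z) to a_final.

    log_scale=False spaces the steps linearly in a,
    log_scale=True spaces them linearly in log(a).
    """
    a_start = 1.0 / (1.0 + pm_start_z)  # scale factor at the starting redshift
    if log_scale:
        log_ratio = math.log(a_final / a_start)
        return [
            a_start * math.exp(log_ratio * i / pm_nsteps)
            for i in range(pm_nsteps + 1)
        ]
    step = (a_final - a_start) / pm_nsteps
    return [a_start + step * i for i in range(pm_nsteps + 1)]


linear = scale_factor_steps(pm_start_z=9.0, a_final=1.0, pm_nsteps=9)
logarithmic = scale_factor_steps(9.0, 1.0, 9, log_scale=True)
print(linear[:3], logarithmic[:3])
```

With logarithmic spacing most steps are spent at early times, whereas linear spacing spends them at late times, where structures are most non-linear.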
Forward model elements can as well be chained and have different grid sizes. *"model"* can now be CHAIN, which then needs a specific list of models in *"models"*.
Here is an example:
.. code:: text
[gravity]
model=CHAIN
models=PRIMORDIAL,TRANSFER_EHU,LPT_CIC
[gravity_chain_0]
a_final=0.001
[gravity_chain_1]
[gravity_chain_2]
supersampling=2
lightcone=false
do_rsd=false
a_initial=0.001
a_final=1.
part_factor=2.0
mul_out=1
Each element of the chain gets its own configuration section, with the same
options as when it was a global descriptor (see above). Note that
if you use the chain mechanism, you have to be explicit about the production of the initial conditions power spectrum.
As you can see above, we indicate "PRIMORDIAL,TRANSFER_EHU" to start with a primordial scale-free gravitational potential,
onto which we apply an Eisenstein-Hu transfer function to form density fluctuations, which are then
passed down to LPT_CIC. Also keep in mind that the scale factors must be compatible, and no checks
are run by the code at the moment. ``mul_out`` specifies by how much the output grid has to be supersampled for the
CIC (i.e. the CIC grid is produced at mul_out times the initial grid size).
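Since the code does not check the scale factors of a chain, a small external check can catch mismatches before a run. The sketch below is one possible way to do it with the standard ``configparser`` module; ``check_chain`` is a hypothetical helper, and it only assumes the ``gravity_chain_N`` section naming used in the example above.

```python
import configparser

CHAIN_INI = """
[gravity]
model=CHAIN
models=PRIMORDIAL,TRANSFER_EHU,LPT_CIC

[gravity_chain_0]
a_final=0.001

[gravity_chain_1]

[gravity_chain_2]
supersampling=2
a_initial=0.001
a_final=1.
"""


def check_chain(text, tol=1e-12):
    """Return a list of (element, problem) tuples; empty if nothing was found."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    models = [m.strip() for m in parser["gravity"]["models"].split(",")]
    problems = []
    current_a = None  # last a_final seen so far along the chain
    for i, name in enumerate(models):
        section = "gravity_chain_%d" % i
        if not parser.has_section(section):
            problems.append((name, "missing section [%s]" % section))
            continue
        a_initial = parser[section].getfloat("a_initial", fallback=None)
        a_final = parser[section].getfloat("a_final", fallback=None)
        if (
            a_initial is not None
            and current_a is not None
            and abs(a_initial - current_a) > tol
        ):
            problems.append(
                (name, "a_initial=%g but previous a_final=%g" % (a_initial, current_a))
            )
        if a_final is not None:
            current_a = a_final
    return problems


print(check_chain(CHAIN_INI))
```

Elements that do not set a scale factor, such as the transfer function here, are simply skipped when tracking the current scale factor.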
Model 'Primordial'
^^^^^^^^^^^^^^^^^^
Apply a primordial scale free power spectrum on the input. The output is
scaled linearly to a_final.
Model 'Transfer'
^^^^^^^^^^^^^^^^
* **CIC correction**: ``use_invert_cic=true``: the transfer function is the inverse of the CIC kernel. ``smoother=0.99`` (in units of the grid)
* **Sharp K filter**: ``use_sharpk=true``: the transfer function is a sharp k filter. ``k_max=0.1`` (in h/Mpc)
Model 'Softplus'
^^^^^^^^^^^^^^^^
Apply a softplus transform. ``hardness=1.0``: parameter making the
transition more or less sharp.
Model 'Downgrade'
^^^^^^^^^^^^^^^^^
(No option)
Section [hades]
~~~~~~~~~~~~~~~
- max_epsilon: Stepsize for the HMC. It is unitless. Good starting
point is around 0.01.
- max_timesteps: Maximum number of timesteps for a single HMC sample.
- mixing: Number of samples to compute before writing to disk.
- algorithm:
- HMC: classical HMC algorithm
- QN-HMC: Quasi-Newton HMC algorithm
- FROZEN-PHASE: Fixed phases. They are not sampled at all, but the
plumbing needed by the other samplers is still provided.
- phases: if ``algorithm`` is FROZEN-PHASE, you can specify an HDF5
filename here. This file must contain a "phase" array which is
conforming to the setup of the ini.
- noPhasesProvided: if phases is omitted, this one has to be set to
true, otherwise an error is thrown.
- phasesDataKey: this indicates which field to use in the ``phases``
HDF5 file.
- likelihood: Likelihood to use in HADES run. Can be either one of
those values:
- LINEAR: Gaussian likelihood
- BORG_POISSON: Use Poisson likelihood
- Generic framework:
- GAUSSIAN_BROKEN_POWERLAW_BIAS
- GAUSSIAN_MO_WHITE_BIAS: Gaussian noise model, variance is
fitted. Double power law bias
- GAUSSIAN_POWERLAW_BIAS: Power law bias model with a Gaussian
noise model, variance is fitted.
- GAUSSIAN_2ND_ORDER_BIAS
- GENERIC_POISSON_BROKEN_POWERLAW_BIAS: Broken power law bias
model (also called Neyrinck's model), with Poisson noise model
- GENERIC_GAUSSIAN_LINEAR_BIAS: Linear bias model, Gaussian noise
model
- GENERIC_GAUSSIAN_MANY_POWER_1^1
- GENERIC_GAUSSIAN_MANY_POWER_1^2
- GENERIC_GAUSSIAN_MANY_POWER_1^4
- GENERIC_POISSON_MANY_POWER_1^1
- GENERIC_POISSON_MANY_POWER_1^2
- GENERIC_POISSON_MANY_POWER_1^4
- GENERIC_POISSON_POWERLAW_BIAS: simple power law bias model with
Poisson noise model
- GENERIC_POISSON_POWERLAW_BIAS_DEGRADE4: power law bias models
preceded by a degrade pass (N -> N/4 in each direction)
- GENERIC_POISSON_BROKEN_POWERLAW_BIAS_DEGRADE4: broken power law
bias model preceded by a degrade pass (N -> N/4 in each
direction)
- scheme: SI_2A, SI_2B, SI_2C, SI_3A, SI_4B, SI_4C, SI_4D, SI_6A
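The sketch below illustrates what ``max_epsilon`` and ``max_timesteps`` bound in a single HMC trajectory. It uses a plain second-order leapfrog on a toy one-dimensional Gaussian target. Drawing the step size and the trajectory length at random below their maxima is an assumption of this example, and none of the function names come from the ARES source; the integrators listed under ``scheme`` are not reproduced here.

```python
import math
import random


def leapfrog(q, p, grad_potential, epsilon, n_steps):
    """Integrate Hamilton's equations (unit mass) with the leapfrog scheme."""
    p -= 0.5 * epsilon * grad_potential(q)  # initial half kick
    for step in range(n_steps):
        q += epsilon * p  # full drift
        if step != n_steps - 1:
            p -= epsilon * grad_potential(q)  # full kick between drifts
    p -= 0.5 * epsilon * grad_potential(q)  # final half kick
    return q, p


def hamiltonian(q, p):
    """Toy Hamiltonian: standard normal target, unit mass."""
    return 0.5 * q * q + 0.5 * p * p


def propose(q, max_epsilon, max_timesteps, rng):
    """Return a proposed position and the energy change of the trajectory."""
    # Assumption: epsilon and the number of steps are drawn below their maxima.
    epsilon = max_epsilon * rng.random()
    n_steps = 1 + rng.randrange(max_timesteps)
    p = rng.gauss(0.0, 1.0)  # fresh momentum for this trajectory
    q_new, p_new = leapfrog(q, p, lambda x: x, epsilon, n_steps)
    delta_h = hamiltonian(q_new, p_new) - hamiltonian(q, p)
    return q_new, delta_h


rng = random.Random(0)
q_new, delta_h = propose(1.0, max_epsilon=0.01, max_timesteps=100, rng=rng)
print(q_new, delta_h)
```

A smaller ``max_epsilon`` keeps the energy error, and hence the rejection rate, low, at the cost of shorter moves per gradient evaluation.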
Section [run]
~~~~~~~~~~~~~
- NCAT: Number of catalogs. This affects the number of "catalog"
sections.
- **Note:** If ``NCAT>1``, the catalogues are assumed to be independent (no double counting of galaxies etc.),
so when the log-likelihood is evaluated, their contributions are simply summed together.
- SIMULATION: Specify if the input is from simulation. Default is
false.
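The note on ``NCAT`` above can be illustrated with a toy Poisson likelihood: for independent catalogs the total log-likelihood is the sum of the per-catalog terms. This is a sketch with made-up counts and hypothetical function names, not the likelihood code of ARES.

```python
import math


def poisson_log_likelihood(counts, expected):
    """Sum of log Poisson probabilities over voxels (expected must be > 0)."""
    return sum(
        n * math.log(lam) - lam - math.lgamma(n + 1)
        for n, lam in zip(counts, expected)
    )


def total_log_likelihood(catalogs):
    """catalogs: iterable of (counts, expected) pairs, one pair per catalog."""
    return sum(poisson_log_likelihood(c, e) for c, e in catalogs)


# Two hypothetical catalogs observed on different voxels.
catalogs = [([0, 1], [1.0, 1.0]), ([2], [2.0])]
print(total_log_likelihood(catalogs))
```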
Section [cosmology]
~~~~~~~~~~~~~~~~~~~
- omega_r: Radiation density
- omega_k: Curvature
- omega_m: Total matter density
- omega_b: Baryonic matter density
- omega_q: Quintessence density
- w: Quintessence equation of state
- wprime: Derivative of the equation of state
- n_s: Slope of the power spectrum of scalar fluctuations
- sigma8: Normalisation of powerspectrum at 8 Mpc/h
- h100: Hubble constant in units of 100 km/s/Mpc
- fnl: primordial non-Gaussianity
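A quick sanity check on a ``[cosmology]`` section is the usual closure relation :math:`\Omega_r + \Omega_m + \Omega_q + \Omega_k = 1`. The sketch below computes the residual with ``configparser``. Whether ARES itself enforces this relation is not stated here, the helper name ``closure_residual`` is hypothetical, and apart from ``h100`` and ``omega_m``, which match the values printed in the outputs tutorial, the numbers are illustrative.

```python
import configparser

# Illustrative values; omega_b is part of omega_m (total matter density).
COSMO_INI = """
[cosmology]
omega_r=0.0
omega_k=0.0
omega_m=0.3175
omega_b=0.049
omega_q=0.6825
w=-1
wprime=0
n_s=0.9624
sigma8=0.8344
h100=0.6711
"""


def closure_residual(text):
    """Return omega_r + omega_m + omega_q + omega_k - 1."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    cosmo = parser["cosmology"]
    keys = ("omega_r", "omega_m", "omega_q", "omega_k")
    return sum(cosmo.getfloat(k) for k in keys) - 1.0


print(closure_residual(COSMO_INI))
```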
Section [likelihood]
~~~~~~~~~~~~~~~~~~~~
- Options related to robust likelihood. Each patch of a robust likelihood can be sliced in the redshift direction.
There are two options controlling the slicing: the maximum distance "rmax" and the number of slices "slices"
- rmax: Maximum distance accessible during the inference. In practice it is at least the farthest distance of a voxel in the box.
Unit is the one of the box, most generally :math:`h^{-1}` Mpc.
- slices: Number of slices to build in the redshift direction. Each patch will have a depth ~rmax/slices.
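As a small worked example of the slicing, the sketch below maps a distance to a slice index, assuming uniform slices of depth ``rmax/slices``. The uniform spacing and the function name ``slice_index`` are assumptions made for illustration.

```python
def slice_index(distance, rmax, slices):
    """Index of the radial slice containing `distance`, or None beyond rmax."""
    if not 0.0 <= distance < rmax:
        return None
    depth = rmax / slices  # depth of each slice, in box units
    return min(int(distance / depth), slices - 1)


# With rmax = 1000 and 10 slices, each slice is 100 deep.
print(slice_index(250.0, rmax=1000.0, slices=10))
```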
Section [julia]
~~~~~~~~~~~~~~~
- likelihood_path: path of the julia code
- likelihood_module: julia module where the likelihood is implemented
- bias_sampler_type: type of sampler for the bias parameters (hmclet,
slice)
- ic_in_julia: whether the initial conditions of the MCMC are coded in
julia or drawn as random numbers
- hmclet_diagonalMass: whether to use a diagonal mass matrix or a full
dense one
- mass_burnin: number of MCMC steps in burnin mode
- mass_burnin_memory: number of MCMC steps to store when in burnin mode
- hmclet_maxEpsilon: maximum epsilon for the leapfrog integrator
(~0.002-0.01 depending on likelihood complexity)
- hmclet_maxNtime: maximum number of steps for the leapfrog integrator
(~50-100)
- hmclet_massScale: amount of momentum reshuffling (0.0 = full, 1.0 =
none, which is bad for MCMC)
- hmclet_correlationLimiter: reduce the correlations in the covariance
matrix by some number. Typically the smaller the number the less
reduction with :math:`\simeq 1` reducing the correlation by 2.
Catalog sections
----------------
Basic fields
~~~~~~~~~~~~
- datafile: Text filename holding the data
- maskdata: Healpix FITS file with the mask
- radial_selection: Type of selection function, can be either
"schechter", "file" or "piecewise".
- refbias: true if this catalog is a reference for bias. Bias will not
be sampled for it
- bias: Default bias value, also used for mock generation
- nmean: Initial mean galaxy density value, also used for mock
generation
Halo selection
~~~~~~~~~~~~~~
- halo_selection: Specifies how to select the halos from the halo catalog. Can be ``mass``, ``radius``, ``spin`` or ``mixed``. ``mixed`` stands for combined cuts, which are applied by listing them, e.g. ``halo_selection = mass radius``
- halo_low_mass_cut: this is log10 of mass in the same unit as the
masses of the input text file
- halo_high_mass_cut: same as for halo_low_mass_cut, this is log10 of
mass
- halo_small_radius_cut
- halo_large_radius_cut
- halo_small_spin_cut
- halo_high_spin_cut
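Because the mass cuts are given as log10 of the mass, it is easy to mix them up with linear masses. The sketch below shows the intended comparison on a plain list. Whether the bounds are inclusive or exclusive in ARES is not documented here, so the half-open interval is an assumption, as is the function name.

```python
import math


def select_by_mass(masses, halo_low_mass_cut, halo_high_mass_cut):
    """Keep halos whose log10(mass) lies in [low, high).

    Masses are in the same unit as in the input file; cuts are log10 values.
    """
    return [
        m
        for m in masses
        if halo_low_mass_cut <= math.log10(m) < halo_high_mass_cut
    ]


# Hypothetical halo masses; only the middle one passes the cut.
print(select_by_mass([1e11, 5e12, 2e14], 12.0, 14.0))
```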
Schechter selection function
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- schechter_mstar: Mstar for Schechter function
- schechter_alpha: Power law slope of Schechter function
- schechter_sampling_rate: How many distance points to precompute from
Schechter (e.g. 1000)
- schechter_dmax: Maximum distance to precompute Schechter selection
function
- galaxy_bright_apparent_magnitude_cut: Apparent magnitude where data
and selection must be truncated, bright end.
- galaxy_faint_apparent_magnitude_cut: Same for faint end.
- galaxy_bright_absolute_magnitude_cut: Absolute magnitude cut in data
and selection function, bright end, useful to select different galaxy
populations
- galaxy_faint_absolute_magnitude_cut: Similar but faint end
- zmin: Minimum redshift for galaxy sample, galaxies will be truncated
- zmax: Maximum redshift for galaxy sample, galaxies will be truncated
'File' selection function
~~~~~~~~~~~~~~~~~~~~~~~~~
- radial_file: Text file to load the selection from
The file has the following format. Each line starting with a '#' is a
comment line, and discarded. The first line is a set of three numbers:
'rmin dr N'. Each line that follows must be a number between 0 and 1
giving the selection function at a distance r = rmin + dr \* i, where
'i' is the line number (zero based). Finally 'N' is the number of points
in the text file.
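The format described above can be produced with a few lines of Python. The following sketch writes the header line ``rmin dr N`` followed by one selection value per line; the function name ``write_radial_selection`` is hypothetical and the values are illustrative.

```python
import io


def write_radial_selection(stream, rmin, dr, values, comment=None):
    """Write a radial selection table: optional comment, 'rmin dr N', values."""
    if comment:
        stream.write("# %s\n" % comment)
    stream.write("%s %s %d\n" % (rmin, dr, len(values)))
    for value in values:
        if not 0.0 <= value <= 1.0:
            raise ValueError("selection values must lie between 0 and 1")
        stream.write("%s\n" % value)


buffer = io.StringIO()
write_radial_selection(buffer, 100, 800, [1, 1, 1, 1, 1], comment="some comment")
print(buffer.getvalue())
```

To write to disk, pass an open text file instead of the ``io.StringIO`` buffer.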
Two possibilities are offered for adjusting the catalog and the
selection together:
- either you choose not to do anything, and take the whole sample and
provided selection. Then you need to specify:
- file_dmin: Minimal distance for selection function and data
- file_dmax: same but maximal distance
- no_cut_catalog: set to false, if you do not set this you will get
an error message.
- or you want ares to preprocess the catalog and then you need:
- zmin
- zmax
- galaxy_faint_apparent_magnitude_cut: Apparent magnitude where data and selection must be truncated, faint end.
- galaxy_bright_absolute_magnitude_cut: Absolute magnitude cut in
data and selection function, bright end, useful to select
different galaxy populations
- galaxy_faint_absolute_magnitude_cut: Similar but faint end
- no_cut_catalog: (not necessary, as it defaults to true)

View file

@ -0,0 +1,19 @@
How to create a config file from python
=======================================
This page is about running the ``gen_subcat_conf.py`` script under
``scripts/ini_generator`` in ares. For an explanation of the config-file itself, see :ref:`here<configuration_file>`.
Config-file for 2M++ and SDSS(MGS)
----------------------------------
The folder containing the scripts and the ini files below is located in ``$SOURCE/scripts/ini_generator``. Steps to generate the config-file are the following:
- Manipulate ``header.ini`` for your needs
- (If needed) alter template files (``template_sdss_main.py``,
``template_2mpp_main.py`` and ``template_2mpp_second.py``) for the cutting and adjusting of data
- To create ini file, run this command:
.. code:: bash
python gen_subcat_conf.py --output NAME_OF_OUTPUT_FILE.ini --configs template_sdss_main.py:template_2mpp_main.py:template_2mpp_second.py --header header.ini

View file

@ -0,0 +1,64 @@
HDF5 catalog format
===================
By passing the following options in the catalog sections of the
:ref:`ini file<configuration_file>`:
- ``dataformat=HDF5``
- ``datakey=KEY``
one can load the data needed for a catalog from an HDF5 file. The data
are taken from the entry "KEY" in the HDF5 file. This allows several
catalogs to be stored at the same time in the same file.
HDF5 catalog format
-------------------
The catalog must have the following columns:
- id (``unsigned long int`` compatible)
- phi (longitude in radians, ``double`` compatible)
- theta (latitude in radians, ``double`` compatible)
- zo (observed redshift, dimensionless, ``double`` compatible)
- m (apparent magnitude, ``double`` compatible)
- M_abs (absolute magnitude, optional, ``double`` compatible)
- z (redshift, optional, ``double`` compatible)
- w (weight, ``double`` compatible, should be 1)
HDF5 halo catalog format
------------------------
- id (``unsigned long int`` compatible)
- Mgal (mass, ``double`` compatible)
- radius (``double`` compatible)
- spin (``double`` compatible)
- posx (x position Mpc, ``double`` compatible)
- posy (y position Mpc, ``double`` compatible)
- posz (z position Mpc, ``double`` compatible)
- vx (velocity x, km/s, ``double`` compatible)
- vy (velocity y, km/s, ``double`` compatible)
- vz (velocity z, km/s, ``double`` compatible)
- w (weight, ``double`` compatible, should be 1)
An example converter can be found hereafter:
.. code:: python
import numpy as np
import h5py as h5
# Column layout of the text file
columns = [("id",int),("Mgal", float),("radius",float),("spin",float),("posx",float),("posy",float),("posz",float),("vx",float),("vy",float),("vz",float)]
# Load text data file
data0 = np.loadtxt("./halo.txt", dtype=columns)
# Build a new one with a weight column
data = np.empty(data0.size, dtype=columns + [("w",float)])
for n in data0.dtype.names:
    data[n] = data0[n]
# Set the weight to one
data['w'] = 1
# Write the hdf5
print("Writing catalog")
with h5.File("halo.h5", mode="w") as f:
    f['data'] = data

View file

@ -0,0 +1,27 @@
Radial selection format
=======================
The file format for radial selection is the following:
- First line is : ``rmin dr numPoints``
- ``rmin`` is the minimal distance of the completeness (the first point
in the following)
- ``dr`` is the space between two samples
- ``numPoints`` is the number of points
- Comment lines start with ``#``
- All following lines are completeness
For example, the following would create a completeness equal to one
between :math:`100 \, \mathrm{Mpc} \, h^{-1}` and :math:`4000 \, \mathrm{Mpc} \, h^{-1}`:
.. code:: text
# some comment
100 800 5
1
1
1
1
1
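For checking such a file by hand, the sketch below parses the lines of a radial selection table and returns the sampled distances together with the completeness values. It assumes the samples sit at ``rmin + dr * i`` with ``i`` starting at zero, as stated in the description of the 'File' selection function in the configuration file documentation. How the code extends the selection between and beyond the samples is not covered here, and ``read_radial_selection`` is a hypothetical helper.

```python
def read_radial_selection(lines):
    """Return (distances, completeness) from the lines of a selection file."""
    rows = [
        line.strip()
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    ]
    rmin, dr, num_points = rows[0].split()
    rmin, dr, num_points = float(rmin), float(dr), int(num_points)
    completeness = [float(value) for value in rows[1 : 1 + num_points]]
    if len(completeness) != num_points:
        raise ValueError("fewer completeness values than announced")
    distances = [rmin + dr * i for i in range(num_points)]
    return distances, completeness


example = ["# some comment", "100 800 5", "1", "1", "1", "1", "1"]
distances, completeness = read_radial_selection(example)
print(distances)
print(completeness)
```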

View file

@ -0,0 +1,34 @@
Text catalog format
===================
It is determined by the function ``loadGalaxySurveyFromText`` in
``libLSS/data/survey_load_txt.hpp`` (ARES git tree)
**[Galaxy Survey]**
For galaxy survey, the standard catalog format includes 7-8 columns. The meaning of each column, from left to right, is listed below.
- galaxy id
- phi: longitude, :math:`0 \leq \phi \leq 2\pi` [rad].
- theta: latitude, :math:`-\pi/2 \leq \theta \leq \pi/2` [rad].
- zo: total observed redshift, to be used with photo-z.
- m: apparent magnitude.
- M_abs: absolute magnitude, not really used as it is derived from
other quantities.
- z: redshift, used to position the galaxies, cosmology is used to
transform this to comoving distance at the moment.
- w: weight, used as a multiplier when creating the grid of galaxy
distribution.
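Note that ``theta`` in this list is a latitude, whereas HEALPix-style tools usually expect a colatitude measured from the north pole. The sketch below shows the conversion and the corresponding unit vector on the sphere. It is a generic illustration of the angle convention and does not reproduce how ARES positions galaxies internally; the function names are hypothetical.

```python
import math


def unit_vector(phi, theta):
    """Unit vector for longitude phi and latitude theta, both in radians."""
    return (
        math.cos(theta) * math.cos(phi),
        math.cos(theta) * math.sin(phi),
        math.sin(theta),
    )


def colatitude(theta):
    """Colatitude (angle from the north pole) for a latitude theta in radians."""
    return 0.5 * math.pi - theta


print(unit_vector(0.0, 0.0))
print(colatitude(0.0))
```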
**[Dark Matter Simulation]**
For Dark Matter simulation, the standard catalog format includes 10
columns. The meaning of each column, from left to right, is listed
below.
- halo id
- halo mass: given in unit of solar mass
- halo radius
- halo spin
- x, y, z: comoving coordinates
- vx, vy, vz: velocities

View file

@ -0,0 +1,96 @@
.. _outputs:
Outputs
#######
hmc_performance.txt
===================
This text file is appended with a new line every time the HMC is used.
Each column has the following meaning:
- epsilon used in the integrator
- number of timesteps
- variation of energy between first and last step (:math:`\Delta H = H_{final} - H_{initial}`). Please note
that you actually want this one to be negative or of order 1, as the acceptance is determined by the probability
:math:`\exp(-\Delta H)`.
- wall seconds taken to do the entire HMC run
- scheme used to integrate
- value of the final hamiltonian
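To get a feeling for the energy column, the following sketch converts a value of :math:`\Delta H` into the standard Metropolis acceptance probability :math:`\min(1, \exp(-\Delta H))`. The function name is chosen for this example only.

```python
import math


def acceptance_probability(delta_h):
    """Metropolis acceptance probability for an energy change delta_h."""
    return 1.0 if delta_h <= 0.0 else math.exp(-delta_h)


for delta_h in (-0.5, 1.0, 10.0):
    print(delta_h, acceptance_probability(delta_h))
```

A negative :math:`\Delta H` is always accepted, a value of order one is accepted about a third of the time, and values of order ten are almost always rejected.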
.. _log_files:
log files
=========
The log files are formatted by libLSS/tools/console.hpp. If you have not
explicitly disabled the debug level, then all the messages emitted by
the code are saved in those files. Otherwise, it is limited to verbose
level. Each line starts with square brackets, with the level of the
message indicated "[LEVEL]". Each new indentation corresponds to a new
subcontext. If timing information were requested at compile time, each
termination of context gives also the time taken in the context itself,
including everything called inside this same context.
.. _restart_files:
restart files
=============
This file gives you access to the relevant information required to
restart an MCMC run, such as the initial configuration. The ares
framework creates one restart file per MPI task. Each file is suffixed
by "_X" where X is the MPI task id. Most of the variables are just the
same from one file to the other. The exceptions are the arrays explicitly
sliced by the MPI parallelization, which are only present slab by slab.
The file contains the following groups:
- galaxy_catalog_0
- galaxy_kecorrection_0
- random_generator
- scalars
The python script "scripts/merge_mpi_restart.py" can merge all these
restart files into a single restart.h5 file. Be aware that it may
consume quite a lot of memory. However it is a required step to allow
the user to change the number of MPI tasks for an existing ARES run. The
MPI run may be resumed with the option "SPECIAL_RESUME" instead of
"RESUME" and it will read restart.h5 to recreate the set of
"restart.h5_XX" files with the new number of MPI tasks.
.. _mcmc_files:
MCMC files
==========
Depending on length of run, a series of mcmc files will be produced with
file names 'mcmc_chainNumber.h5'. All attributes of the file are
contained within the group 'scalars', for example the following for the
basic run in "examples":
- catalog_foreground_coefficient_0
- galaxy_bias_0
- galaxy_nmean_0
- powerspectrum
- s_field
- spectrum_c_eval_counter
For reference, these groups and attributes can be easily searched
through a few lines of python:
.. code:: python
import h5py as h5
# access mcmc file
hf = h5.File("mcmc_0.h5", "r")
# list groups within file
list(hf.keys())
# list attributes within 'scalars' group
list(hf['scalars'].keys())
A tutorial to read and plot basic ARES outputs with python is available :ref:`here <tutorial_ares_basic_outputs>`.
If one wishes to access the MCMC files in C++, functions are available
in CosmoTool and LibLSS: see :ref:`this code tutorial <reading_in_meta_parameters_and_arrays>`.

View file

@ -0,0 +1,9 @@
Postprocessing
##############
.. _postprocessing:
.. include:: postprocessing/Postprocessing_scripts.inc.rst
.. include:: postprocessing/ARES_basic_outputs.inc.rst
.. include:: postprocessing/Diagnostics_ARES_BORG_chains.inc.rst
.. include:: postprocessing/HADES_generate_constrained_simulations.inc.rst

View file

@ -0,0 +1,206 @@
.. _tutorial_ares_basic_outputs:
Tutorial: checking ARES outputs in python
=========================================
We first import numpy (to handle arrays), h5py (to read hdf5 files) and
matplotlib.pyplot (to plot density slices):
.. code:: ipython3
import numpy as np
import h5py as h5
import matplotlib.pyplot as plt
%matplotlib inline
We then load the hdf5 file with h5py:
.. code:: ipython3
fdir="./" # directory to the ARES outputs
isamp=0 # sample number
fname_mcmc=fdir+"mcmc_"+str(isamp)+".h5"
hf=h5.File(fname_mcmc,"r")
We can then list the datasets in the hdf5 file:
.. code:: ipython3
list(hf.keys())
.. code:: text
['scalars']
.. code:: ipython3
list(hf['scalars'].keys())
.. code:: text
['catalog_foreground_coefficient_0',
'galaxy_bias_0',
'galaxy_nmean_0',
'powerspectrum',
's_field',
'spectrum_c_eval_counter']
The density contrast is stored as scalars/s_field:
.. code:: ipython3
density=np.array(hf['scalars/s_field'])
We now plot a slice through the box:
.. code:: ipython3
plt.imshow(density[16,:,:])
.. image:: /user/postprocessing/ARES_basic_outputs_files/ares_basic_outputs_12_1.png
The “restart” files contain a lot of useful information.
.. code:: ipython3
fname_restart=fdir+"restart.h5_0"
hf2=h5.File(fname_restart)
list(hf2.keys())
.. code:: text
['galaxy_catalog_0', 'galaxy_kecorrection_0', 'random_generator', 'scalars']
.. code:: ipython3
list(hf2['scalars'].keys())
.. code:: text
['ARES_version',
'K_MAX',
'K_MIN',
'L0',
'L1',
'L2',
'MCMC_STEP',
'N0',
'N1',
'N2',
'N2_HC',
'N2real',
'NCAT',
'NFOREGROUNDS',
'NUM_MODES',
'adjust_mode_multiplier',
'ares_heat',
'bias_sampler_blocked',
'catalog_foreground_coefficient_0',
'catalog_foreground_maps_0',
'corner0',
'corner1',
'corner2',
'cosmology',
'data_field',
'fourierLocalSize',
'fourierLocalSize1',
'galaxy_bias_0',
'galaxy_bias_ref_0',
'galaxy_data_0',
'galaxy_nmean_0',
'galaxy_schechter_0',
'galaxy_sel_window_0',
'galaxy_selection_info_0',
'galaxy_selection_type_0',
'galaxy_synthetic_sel_window_0',
'growth_factor',
'k_keys',
'k_modes',
'k_nmodes',
'key_counts',
'localN0',
'localN1',
'messenger_field',
'messenger_mask',
'messenger_signal_blocked',
'messenger_tau',
'power_sampler_a_blocked',
'power_sampler_b_blocked',
'power_sampler_c_blocked',
'powerspectrum',
'projection_model',
's_field',
'sampler_b_accepted',
'sampler_b_tried',
'spectrum_c_eval_counter',
'spectrum_c_init_sigma',
'startN0',
'startN1',
'total_foreground_blocked',
'x_field']
There we have in particular cosmological parameters:
.. code:: ipython3
cosmo=np.array(hf2['scalars/cosmology'])
print("h="+str(cosmo['h'][0])+", omega_m="+str(cosmo['omega_m'][0]))
.. code:: text
h=0.6711, omega_m=0.3175
We also have the k modes to plot the power spectrum in our mcmc files:
.. code:: ipython3
k_modes=np.array(hf2['scalars/k_modes'])
The power spectrum is stored in the mcmc files as
scalars/powerspectrum:
.. code:: ipython3
powerspectrum=np.array(hf['scalars/powerspectrum'])
We can now make a plot.
.. code:: ipython3
plt.xlabel("$k$ [$h$/Mpc]")
plt.ylabel(r"$P(k)$ [$(\mathrm{Mpc}/h)^3$]")
plt.title("Power spectrum of the 0th sample")
plt.loglog(k_modes,powerspectrum)
.. image:: /user/postprocessing/ARES_basic_outputs_files/ares_basic_outputs_23_1.png
Finally we close the hdf5 files.
.. code:: ipython3
hf.close()
hf2.close()

Binary file not shown.


Binary file not shown.


View file

@ -0,0 +1,742 @@
Tutorial: diagnostics of ARES/BORG chains
=========================================
What this tutorial covers:
--------------------------
In this tutorial, we will cover how to do some basic plots of a
BORG-run. These plots are useful for monitoring the burn-in progress of
the run and for diagnostics. We will also show how to plot BORG's ability to
sample/infer a specific parameter.
Prerequisites
~~~~~~~~~~~~~
Packages: numpy, h5py, pandas, matplotlib, tqdm.

What is assumed: I won't go into much detail of how the python-code works.
That said, this python-code is probably not the optimal way to do certain
things, and I am sure it can be improved.

BORG-Stuff: You have installed/compiled BORG and managed a first run. We
will be using the data-products (the restart.h5_0-file and mcmc_#.h5-files).
Overview of tutorial - what are we producing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1) Galaxy projections
2) Statistics of the Ensemble density field
3) Burn-in of the powerspectra
4) Correlation matrix of the bias parameters
5) Trace plot and histogram of sampled parameter
6) Correlation length of a parameter
7) Acceptance Rate
8) Animations (gifs) of the density field and galaxy field
Take-aways/Summary - What can be used in the future?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The aim of this tutorial is to provide some tools to view the
data-products that are in the mcmc-files, and to view features of the
chain itself.
Don't forget that this jupyter notebook can be exported to a .py-file!
We import some packages here. Note that we use ares_tools, which is found under ares/scripts/ares_tools/. To get this tutorial to work, move it to the working directory or make it importable some other way (e.g. create a symbolic link or add its location to the Python path).
.. code:: ipython3
import os
import sys
import numpy as np
import h5py as h5
import pandas as pd
from tqdm import tqdm
import ares_tools as at
import matplotlib as mpl
import matplotlib.cm as cm
import matplotlib.pyplot as plt
from matplotlib import gridspec
mpl.rcParams['font.size'] = 15
Here we set our own colormap, which can be fun if you want to customize your plots:
.. code:: ipython3
import matplotlib.colors as mcolors
low = 'indigo'#
midlow = 'darkviolet'#
mid = 'darkgrey'
midhigh = 'gold'#
high = 'goldenrod' #
color_array = [low, midlow, mid, midhigh, high]
my_cmap = mcolors.LinearSegmentedColormap.from_list('my_cmap',color_array)
cm.register_cmap(cmap=my_cmap)
.. code:: ipython3
# LOAD FILES/CHECK FILES
startMC = 0
mcDelta = 1  # step between consecutive mcmc file indices (assumed to be 1)
names = []
PP = []
Fmax = startMC
# Collect the indices of all consecutive mcmc files that are present
while os.path.exists("mcmc_%d.h5" % Fmax):
    names.append(Fmax)
    Fmax += mcDelta
loc_names = list(names)
num = len(names)
print("Number of mcmc-files found: %d" % num)
restarts = []
Gmax = 0
# Collect the indices of all restart files (one per MPI task)
while os.path.exists("restart.h5_%d" % Gmax):
    restarts.append(Gmax)
    Gmax += 1
loc_restarts = list(restarts)
rnum = len(restarts)
print("Number of restart-files found: %d" % rnum)
Load some constants of the run from the restart-file:
.. code:: ipython3
#LOAD THE RESTART-FILE
filepath = "restart.h5_0"
restart_file = h5.File(filepath,'r')
#LOAD CONFIG OF RUN
N = int(restart_file['scalars/N0'][0])
NCAT = int(restart_file['scalars/NCAT'][0])
no_bias_params = (restart_file['scalars/galaxy_bias_0'][:]).shape[0]
restart_file.close()
#PREPARE GALAXY FIELD
gal_field = np.zeros((N,N,N))
restart_dens_field = np.zeros((N,N,N))
#STORE ALL OF THE GALAXIES
for r in np.arange(rnum):
    temp_restart = h5.File('restart.h5_%d' % r,'r')
    # Slab held by this MPI task along the first axis. This assumes that the
    # 'startN0' and 'localN0' scalars of the restart file describe the slab;
    # with a single task the slab is the whole box.
    start = int(temp_restart['scalars/startN0'][0])
    stop = start + int(temp_restart['scalars/localN0'][0])
    for i in np.arange(NCAT):
        gal_field[start:stop,:,:] += temp_restart['scalars/galaxy_data_%d' % i][:]
    restart_dens_field[start:stop,:,:] += temp_restart['scalars/BORG_final_density'][:]
    temp_restart.close()
print('Total number of galaxies: %d' % np.sum(gal_field))
Galaxy projection & ensemble density field: mean and standard deviation
-----------------------------------------------------------------------
In this plot, I have gathered the galaxy projection as well as ensemble
statistics for the density field. The galaxy projection is a sum over
all the galaxies in one direction at a time. We are viewing the input
data (the galaxies) as a whole, which is found in the restart-file. With
the ensemble statistics for the density field, we sum up all of the
reconstructed density fields in the mcmc-files (mcmc_#.h5) and then
compute the mean and the standard deviation of the field in each voxel.
The aims of these plots are to:
1) Check that the galaxy data is fully within the datacube. If the
datacube is misaligned with the galaxy data, we are not using all of
the input data. This may sometimes be intended, but most of the
time we want to avoid this.
2) Check that the reconstructed density fields coincide with the
data-filled regions (i.e., where we have galaxies/data). We expect to
have values distinct from the cosmic mean (usually zero) where we
have data, and values close to the cosmic mean where we do not have
data.
3) Check that we have less variance inside the data-filled regions
than outside the data-filled regions.
.. code:: ipython3
#PREPARE THE ENSEMBLE DENSITY FIELD HOLDER - FOR THE MEAN DENSITY FIELD
dens_fields = np.array(np.full((N,N,N),0),dtype=np.float64)
#COMPUTE THE MEAN-DENSITY FIELD
for idx in tqdm(np.arange(num)):
mcmc_file = h5.File("mcmc_%d.h5" % idx,'r')
temp_field = np.array(mcmc_file['scalars/BORG_final_density'][...],dtype=np.float64)
dens_fields += temp_field
mcmc_file.close()
mean_field = dens_fields/np.float64(num)
#PREPARE THE ENSEMBLE DENSITY FIELD HOLDER - FOR THE STANDARD DEVIATION DENSITY FIELD
dens_fields = np.array(np.full((N,N,N),0),dtype=np.float64)
#COMPUTE THE STANDARD DEVIATION DENSITY FIELD
for idx in tqdm(np.arange(num)):
mcmc_file = h5.File("mcmc_%d.h5" % idx,'r')
temp_field = np.array(mcmc_file['scalars/BORG_final_density'][...],dtype=np.float64)
temp_field -= mean_field
dens_fields += temp_field*temp_field
mcmc_file.close()
std_field = np.sqrt(dens_fields/(num-1))
print(std_field)
#SAVE THE FIELDS
np.savez('projection_fields.npz',mean_field = mean_field,
gal_field = gal_field,
std_field = std_field,
restart_dens_field = restart_dens_field)
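The two loops above read every mcmc-file twice, once for the mean and once for the deviations. As a hedged alternative, a single pass with Welford's online algorithm should give the same per-voxel mean and sample standard deviation. This sketch is not part of the original notebook: ``running_mean_std`` is a hypothetical helper, and synthetic arrays stand in for the ``scalars/BORG_final_density`` fields.

```python
import numpy as np


def running_mean_std(fields):
    """Per-voxel mean and sample standard deviation in one pass (Welford).

    `fields` is any iterable of equally shaped arrays, e.g. a generator
    that opens one mcmc-file at a time and yields its density field.
    """
    count = 0
    mean = None
    m2 = None  # running sum of squared deviations from the current mean
    for field in fields:
        field = np.asarray(field, dtype=np.float64)
        if mean is None:
            mean = np.zeros_like(field)
            m2 = np.zeros_like(field)
        count += 1
        delta = field - mean
        mean += delta / count
        m2 += delta * (field - mean)
    if count < 2:
        raise ValueError("need at least two fields for a standard deviation")
    return mean, np.sqrt(m2 / (count - 1))


# Synthetic stand-in for the density fields stored in the mcmc-files
rng = np.random.default_rng(0)
samples = rng.normal(size=(20, 4, 4, 4))
mean_field, std_field = running_mean_std(samples)
```

Only one field is held in memory at a time, which matters for large grids; the price is a slightly less obvious update rule.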
Here we load the data from the previous step and produce projection plots
.. code:: ipython3
#LOAD DATA FROM THE .NPZ-FILES
data = np.load('projection_fields.npz')
mean_field = data['mean_field']
std_field = data['std_field']
gal_field = data['gal_field']
restart_dens_field = data['restart_dens_field']
#FIRST GALAXY PROJECTION IN THE X-DIRECTION
plt.figure(figsize=(20,20))
print('First subplot')
plt.subplot(3,3,1)
plt.title('Number of galaxies: ' + str(np.sum(gal_field)))
proj_gal_1 = np.sum(gal_field,axis = 0)
im = plt.imshow(np.log(proj_gal_1),cmap=my_cmap)
clim=im.properties()['clim']
plt.colorbar()
plt.xlabel('Z')
plt.ylabel('Y')
#SECOND GALAXY PROJECTION IN THE Y-DIRECTION
print('Second subplot')
plt.subplot(3,3,4)
proj_gal_2 = np.sum(gal_field,axis = 1)
plt.imshow(np.log(proj_gal_2), clim=clim,cmap=my_cmap)
plt.colorbar()
plt.xlabel('Z')
plt.ylabel('X')
#THIRD GALAXY PROJECTION IN THE Z-DIRECTION
print('Third subplot')
plt.subplot(3,3,7)
proj_gal_3 = np.sum(gal_field,axis = 2)
plt.imshow(np.log(proj_gal_3), clim=clim,cmap=my_cmap)
plt.colorbar()
plt.xlabel('Y')
plt.ylabel('X')
#FIRST ENSEMBLE DENSITY MEAN IN THE X-DIRECTION
print('Fourth subplot')
plt.subplot(3,3,2)
plt.title("Ensemble Mean Density field")
proj_dens_1 = np.sum(mean_field,axis = 0)
im2 = plt.imshow(np.log(1+proj_dens_1),cmap=my_cmap)
clim=im2.properties()['clim']
plt.colorbar()
plt.xlabel('Z')
plt.ylabel('Y')
#SECOND ENSEMBLE DENSITY MEAN IN THE Y-DIRECTION
print('Fifth subplot')
plt.subplot(3,3,5)
proj_dens_2 = np.sum(mean_field,axis = 1)
plt.imshow(np.log(1+proj_dens_2), clim=clim,cmap=my_cmap)
plt.colorbar()
plt.xlabel('Z')
plt.ylabel('X')
#THIRD ENSEMBLE DENSITY MEAN IN THE Z-DIRECTION
print('Sixth subplot')
plt.subplot(3,3,8)
proj_dens_3 = np.sum(mean_field,axis = 2)
plt.imshow(np.log(1+proj_dens_3), clim=clim,cmap=my_cmap)
plt.colorbar()
plt.xlabel('Y')
plt.ylabel('X')
#FIRST ENSEMBLE DENSITY STD. DEV. IN THE X-DIRECTION
print('Seventh subplot')
plt.subplot(3,3,3)
plt.title('Ensemble Std. Dev. Dens. f.')
proj_var_1 = np.sum(std_field,axis = 0)
im3 = plt.imshow(np.log(1+proj_var_1),cmap=my_cmap)
clim=im3.properties()['clim']
plt.colorbar()
plt.xlabel('Z')
plt.ylabel('Y')
#SECOND ENSEMBLE DENSITY STD. DEV. IN THE Y-DIRECTION
print('Eighth subplot')
plt.subplot(3,3,6)
proj_var_2 = np.sum(std_field,axis = 1)
plt.imshow(np.log(1+proj_var_2), clim=clim,cmap=my_cmap)
plt.colorbar()
plt.xlabel('Z')
plt.ylabel('X')
#THIRD ENSEMBLE DENSITY STD. DEV. IN THE Z-DIRECTION
print('Ninth subplot')
plt.subplot(3,3,9)
proj_var_3 = np.sum(std_field,axis = 2)
plt.imshow(np.log(1+proj_var_3), clim=clim,cmap=my_cmap)
plt.colorbar()
plt.xlabel('Y')
plt.ylabel('X')
plt.savefig('GalaxyProjection.png')
plt.show()
Burn-in power spectra
---------------------
This step computes and plots the powerspectrum of each mcmc-file
together with the reference (or “true”) powerspectrum. In the bottom
panel, we divide each powerspectrum by the reference powerspectrum, in
order to see how much they deviate.
We expect that the powerspectra of the mcmc-files “rise” throughout the
run to the reference powerspectrum. The colormap is added to more easily
see the different powerspectra of the run.
.. code:: ipython3
# COMPUTE BURN-IN P(k) AND SAVE TO FILE
ss = at.analysis(".")
opts=dict(Nbins=N,range=(0,ss.kmodes.max()))
Pref = ss.rebin_power_spectrum(startMC, **opts)
PP = []
loc_names = list(names)
mcDelta = 1
step_size = 1
print('Computing Burn-In Powerspectra')
for i in tqdm(loc_names[0::step_size]):
    PP.append(ss.compute_power_shat_spectrum(i, **opts))
bins = 0.5*(Pref[2][1:]+Pref[2][:-1])
suffix = 'test'
np.savez("power_%s.npz" % suffix, bins=bins, P=PP, Pref=Pref)
print('File saved!')
Plotting routines
~~~~~~~~~~~~~~~~~
.. code:: ipython3
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
# LOAD DATA
suffix = 'test'
x=np.load("power_%s.npz" % suffix, allow_pickle=True)
sampled_pk = np.array([x['P'][i,0][:] for i in range(len(x['P']))]).transpose()
# PREPARE FIRST SUBPLOT
plt.figure(figsize=(10,10))
gs = gridspec.GridSpec(2, 1, height_ratios=[2, 1])
p = plt.subplot(gs[0])
# PLOT THE BURN-IN POWERSPECTRA
no_burn_ins = (sampled_pk).shape[1]
color_spectrum = iter(my_cmap(np.linspace(0,1,no_burn_ins))); #Here we include the colormap
for j in np.arange(no_burn_ins):
p.loglog(x['bins'], sampled_pk[:,j], color = next(color_spectrum), alpha=0.25)
# PLOT THE REFERENCE POWERSPECTRUM
p.loglog(x['bins'], x['Pref'][0],color='k',lw=0.5,
label = "Reference powerspectrum")
# SOME CONTROL OVER THE AXES
#cond = x['Pref'][0] > 0
#xb = x['bins'][cond]
#p.set_xlim(0.01, 0.2)
#p.set_ylim(1,0.9*1e5)
# LABELLING
plt.xlabel(r'$k \ [h \ \mathrm{Mpc}^{-1} ]$')
plt.ylabel(r'$P(k) \ [\mathrm{Mpc^{3}} \ h^{-3} ]$')
plt.title('Powerspectrum Burn-in for run: ' + suffix)
p.tick_params(bottom = False,labelbottom=False)
plt.legend()
# SET THE COLORBAR MANUALLY
norm = mpl.colors.Normalize(vmin=0,vmax=2)
sm = plt.cm.ScalarMappable(cmap=my_cmap, norm=norm)
sm.set_array([])
cbaxes = inset_axes(p, width="30%", height="3%", loc=6)
cbar = plt.colorbar(sm,cax = cbaxes,orientation="horizontal",
boundaries=np.arange(-0.05,2.1,.1))
cbar.set_ticks([0,1,2])
cbar.set_ticklabels([0,int(no_burn_ins/2),no_burn_ins])
# PREPARE THE SECOND PLOT, THE ERROR PLOT
p2 = plt.subplot(gs[1], sharex = p)
color_spectrum = iter(my_cmap(np.linspace(0,1,no_burn_ins)));
# PLOT ALL THE SAMPLED/RECONSTRUCTED POWERSPECTRA DIVIDED BY THE REFERENCE POWERSPECTRUM
for j in np.arange(no_burn_ins):
p2.plot(x['bins'],sampled_pk[:,j]/(x['Pref'][0]),color = next(color_spectrum),alpha = 0.25)
# PLOT THE REFERENCE PLOT
p2.plot(x['bins'],(x['Pref'][0])/(x['Pref'][0]), color = 'k',lw = 0.5)
# SOME CONTROL OF THE AXES AND LABELLING
p2.set_yscale('linear')
#p2.set_ylim(0,2)
#plt.yticks(np.arange(0.6, 1.6, 0.2))
plt.xlabel(r'$k \ [h \ \mathrm{Mpc}^{-1} ]$')
plt.ylabel(r'$P(k)/P_{\mathrm{ref}}(k) $')
#plt.subplots_adjust(hspace=.0)
plt.savefig("burnin_pk.png")
plt.show()
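The lower panel divides each sampled powerspectrum by the reference. If some bins of the reference are empty (zero), that division produces infinities or NaNs with runtime warnings. A minimal sketch of a guarded ratio, assuming the same layout as ``sampled_pk`` above (bins along the first axis, samples along the second); ``ratio_to_reference`` is a hypothetical helper and the numbers are made up:

```python
import numpy as np


def ratio_to_reference(sampled_pk, pref):
    """Divide each sampled spectrum by the reference, bin by bin.

    sampled_pk: array of shape (n_bins, n_samples)
    pref:       array of shape (n_bins,)
    Bins where the reference is not positive are returned as NaN, so
    matplotlib simply leaves a gap there.
    """
    sampled_pk = np.asarray(sampled_pk, dtype=np.float64)
    pref = np.asarray(pref, dtype=np.float64)
    ratio = np.full(sampled_pk.shape, np.nan)
    valid = pref > 0
    ratio[valid, :] = sampled_pk[valid, :] / pref[valid][:, None]
    return ratio


# Made-up numbers: three bins (the last one empty), two samples
pref = np.array([100.0, 50.0, 0.0])
sampled = np.array([[50.0, 100.0],
                    [25.0, 50.0],
                    [0.0, 0.0]])
ratio = ratio_to_reference(sampled, pref)
```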
Correlation matrix
------------------
Bias parameters are parameters of the galaxy bias model. While these are
treated as nuisance parameters (i.e. they are required for the modelling
procedure but are integrated out as they are not of interest), it is
important to check whether there are internal correlations in the model. If
there are, we run the risk of “overfitting” the
model, e.g. by having several parameters which do not add new
information but give rise to redundancies. A correlation matrix with
small off-diagonal entries suggests independent parameters, which is a good thing.
While I have only used bias parameters in this example, it is a good
idea to add cosmological parameters (which are sampled!) to this matrix.
Thereby, we can detect any unwanted correlations between inferred
parameters and nuisance parameters.
.. code:: ipython3
# CORR-MAT
#A MORE FLEXIBLE WAY TO DO THIS? NOT HARDCODE THE BIAS MODEL OF CHOICE....?
bias_matrix = np.array(np.full((num,NCAT,no_bias_params+1),0),dtype=np.float64)
#num - files
#NCAT - catalogs
#no_bias_params = number of bias parameters
df = pd.DataFrame()
"""
# If you have an array of a sampled parameter (how to get this array, see next section),
# then you can add it to the correlation matrix like below:
df['Name_of_cosmo_param'] = sampled_parameter_array
"""
for i in tqdm(np.arange(num)):
mcmc_file = h5.File("mcmc_%d.h5" % i,'r')
for j in np.arange(NCAT):
for k in np.arange(no_bias_params+1):
if k == 0:
bias_value = mcmc_file['scalars/galaxy_nmean_%d' % j][0]
else:
bias_value = mcmc_file['scalars/galaxy_bias_%d' % j][k-1]
bias_matrix[i,j,k] = bias_value
mcmc_file.close()
for j in np.arange(NCAT):
for k in np.arange(no_bias_params+1):
if k == 0:
column_name = r"$\bar{N}^{%s}$" % j
else:
column_name = (r"$b_{0}^{1}$".format(k,j))
df[column_name]=bias_matrix[:,j,k]
#print(df) #PRINT THE RAW MATRIX
# Save the DataFrame
df.to_csv('bias_matrix.txt', sep=' ', mode='a')
f = plt.figure(figsize=(15,15))
plt.matshow(df.corr(), fignum=f.number, cmap=my_cmap, vmin=-1, vmax=1)
plt.xticks(range(df.shape[1]), df.columns, fontsize=14, rotation=45)
plt.yticks(range(df.shape[1]), df.columns, fontsize=14)
cb = plt.colorbar()
cb.ax.tick_params(labelsize=15)
#plt.title(title, fontsize=30);
plt.savefig('corrmat.png')
plt.show()
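With many catalogs and bias parameters, the correlation matrix becomes hard to read by eye. The following sketch lists the parameter pairs whose absolute Pearson correlation exceeds a threshold. It is an illustration only: ``correlated_pairs`` is a hypothetical helper, the threshold of 0.8 is arbitrary, and a synthetic chain replaces the values read from the mcmc-files.

```python
import numpy as np


def correlated_pairs(samples, names, threshold=0.8):
    """List parameter pairs whose |Pearson correlation| exceeds threshold.

    samples: array of shape (n_samples, n_params), one column per parameter
    names:   list of n_params labels
    """
    corr = np.corrcoef(np.asarray(samples, dtype=np.float64), rowvar=False)
    pairs = []
    n_params = corr.shape[0]
    for a in range(n_params):
        for b in range(a + 1, n_params):
            if abs(corr[a, b]) > threshold:
                pairs.append((names[a], names[b], float(corr[a, b])))
    return pairs


# Synthetic chain: b1 closely follows nmean, b2 is independent
rng = np.random.default_rng(1)
nmean = rng.normal(size=500)
b1 = 2.0 * nmean + 0.1 * rng.normal(size=500)
b2 = rng.normal(size=500)
pairs = correlated_pairs(np.column_stack([nmean, b1, b2]), ["nmean", "b1", "b2"])
```

With the DataFrame built above, the same helper could presumably be called as ``correlated_pairs(df.to_numpy(), list(df.columns))``.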
Trace-histogram
---------------
BORG can infer cosmological parameters and sample these throughout the
run. One way to visualize BORG's constraining power is to use trace
plots and/or histograms. Basically, we gather the sampled values from
each mcmc-file, store them to an array, and plot each value vs. step
number (trace-plot) as well as the histogram of the distribution.
If the “true” value is known (for instance in mock runs), it can be
added and plotted in the example below.
Also note, the example below is done on an array of bias parameters:
change this to an array of a cosmological parameter.
.. code:: ipython3
from matplotlib.patches import Rectangle
def trace_hist(array_of_sampling_parameter,true_param=None, name_of_file='test'):
# =============================================================================
# Compute statistics
# =============================================================================
mean = np.mean(array_of_sampling_parameter)
sigma = np.sqrt(np.var(array_of_sampling_parameter))
xvalues = np.linspace(0,num-1,num)
mean_sampled = mean*np.ones(num)
# =============================================================================
# Trace-plot
# =============================================================================
plt.figure(figsize=(15,10))
ax1 = plt.subplot(2, 1, 1)
plt.plot(xvalues,array_of_sampling_parameter,
label = "Sampled Parameter Values",color = low,)
if true_param is not None:
    sampled_true_line = true_param*np.ones(num)
    plt.plot(xvalues,sampled_true_line,'--',color = midhigh,
             label = "True value of Sampled Parameter")
plt.plot(xvalues,mean_sampled, '-.',color = mid,
         label = "Mean of Sampled Parameter")
plt.xlabel(r'$\mathrm{Counts}$',size=30)
plt.ylabel("Sampled Parameter",size=30,rotation=90)
plt.legend()
# =============================================================================
# Histogram
# =============================================================================
plt.subplot(2,1, 2)
(n, bins, patches) = plt.hist(array_of_sampling_parameter,bins = 'auto',color = low)
samp_line = plt.axvline(mean, color=midhigh, linestyle='-', linewidth=2)
if true_param is not None:
    true_line = plt.axvline(true_param, color=mid, linestyle='--', linewidth=2)
sigma_line = plt.axvline(mean+sigma,color = midlow, linestyle='-', linewidth=2)
plt.axvline(mean-sigma,color = midlow, linestyle='-', linewidth=2)
extra = Rectangle((0, 0), 1, 1, fc="w", fill=False, edgecolor='none', linewidth=0)
if true_param is not None:
    plt.legend([samp_line,true_line,sigma_line,extra, extra, extra],
               ('Sampled','True',
                r'$1\sigma$ Interval',
                r'$N_{total}$: ' + str(num),
                r"$\mu$: "+str(round(mean,3)),
                r"$\sigma$: "+str(round(sigma,3))))
else:
    plt.legend([samp_line,sigma_line,extra, extra, extra],
               ('Sampled',
                r'$1\sigma$ Interval',
                r'$N_{total}$: ' + str(num),
                r"$\mu$: "+str(round(mean,3)),
                r"$\sigma$: "+str(round(sigma,3))))
"""
#HERE WE INCLUDE A SUMMARY STATISTICS STRING IN THE PLOT, OF THE SAMPLED PARAMETER
x_pos = int(-1.5*int(sigma))
summary_string = 'Sampled value = ' + str(round(mean,2)) +'$\pm$'+str(round(sigma,2))
plt.text(x_pos, int(np.sort(n)[-3]), summary_string, fontsize=30)
"""
plt.savefig('trace_hist_%s.png' % name_of_file)
plt.show()
plt.clf()
"""
# Here is an example of how to collect a
# sampled parameter from the mcmc-files
sampled_parameter_array = np.zeros(num)
cosmo_index = 1 #The index of the parameter of interest
for idx in tqdm(np.arange(num)):
mcmc_file = h5.File("mcmc_%d.h5" % idx,'r')
sampled_parameter_array[idx] = mcmc_file['scalars/cosmology'][0][cosmo_index]
mcmc_file.close()
trace_hist(sampled_parameter_array)
"""
trace_hist(bias_matrix[:,1,1])
Correlation length
------------------
This plot demonstrates the correlation length of the chain, i.e. how
many steps it takes for the sampling chain to become uncorrelated with
the initial value. It gives some insight into “how long” the burn-in
procedure is.
.. code:: ipython3
def correlation_length(array_of_sampling_parameter):
# COMPUTES THE CORRELATION LENGTH
autocorr = np.fft.irfft( (
np.abs(np.fft.rfft(
array_of_sampling_parameter - np.mean(array_of_sampling_parameter))) )**2 )
zero_line = np.zeros((autocorr/autocorr[0]).shape)
# PLOT THE CORRELATION LENGTH
fig = plt.figure(figsize = (15,10))
plt.plot(autocorr/autocorr[0],color = low)
plt.plot(zero_line, '--',color = mid)
Fmax=num
mcDelta=1
plt.xlim(0,Fmax/(2*mcDelta))
plt.ylabel(r'$\mathrm{Correlation}$')
plt.xlabel(r'$\mathrm{n \ (Step \ of \ mcmc \ chain)}$')
plt.savefig('corr.png')
plt.show()
# Runs the function on one of the bias-parameters
# -> adjust this call as in the trace-histogram field!
correlation_length(bias_matrix[:,1,1])
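The plot above shows the autocorrelation curve but leaves the correlation length to be read off by eye. As a rough, hedged complement, the sketch below returns the first lag at which the normalised autocorrelation drops below a threshold. It differs from the function above in that the chain is zero-padded before the FFT, which avoids circular wrap-around. The function names and the threshold of 0.1 are assumptions made for illustration, and a synthetic AR(1) chain replaces the sampled parameter.

```python
import numpy as np


def autocorrelation(chain):
    """Normalised autocorrelation of a 1-D chain, for lags 0 .. len(chain)-1.

    The chain is zero-padded to twice its length before the FFT so that
    the result is not contaminated by circular wrap-around.
    """
    x = np.asarray(chain, dtype=np.float64)
    x = x - x.mean()
    n = x.size
    spectrum = np.abs(np.fft.rfft(x, n=2 * n)) ** 2
    acf = np.fft.irfft(spectrum)[:n]
    return acf / acf[0]


def correlation_length_estimate(chain, threshold=0.1):
    """First lag at which the autocorrelation drops below `threshold`."""
    acf = autocorrelation(chain)
    below = np.nonzero(acf < threshold)[0]
    return int(below[0]) if below.size else len(acf)


# Synthetic AR(1) chain, x[t] = 0.9 x[t-1] + noise, whose exact
# autocorrelation is 0.9**lag
rng = np.random.default_rng(2)
noise = rng.normal(size=20000)
chain = np.zeros_like(noise)
for t in range(1, chain.size):
    chain[t] = 0.9 * chain[t - 1] + noise[t]
length = correlation_length_estimate(chain)
```

For a real run, ``chain`` would be one column of ``bias_matrix`` or an array of a sampled cosmological parameter.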
Acceptance rate
---------------
A way to visualize “how well” BORG manages to generate samples. A high
rate of trials suggests that BORG is struggling and requires many runs
to generate a sample. We expect that the acceptance rate is high at the
start of the run then decreases over the course of the burn-in until it
fluctuates around a certain value.
THIS PLOT IS NOT CORRECT YET!
.. code:: ipython3
# ACCEPTANCE-RATE
acc_array = np.full((num),0)
# GET THE ACCEPTANCE COUNTS FROM THE FILES
for i in np.arange(num):
    mcmc_file = h5.File("mcmc_%d.h5" % i,'r')
    acceptance_number = mcmc_file['scalars/hades_accept_count'][0]
    acc_array[i] = acceptance_number
    mcmc_file.close()
# COMPUTE THE MEAN SO THAT IT CAN BE INCLUDED INTO THE PLOT
mean_rate = np.mean(acc_array)
xvalues = np.linspace(0,num-1,num)
mean_acc = mean_rate*np.ones(num)
# PLOT THE FINDINGS
fig = plt.figure(figsize = (15,10))
plt.scatter(xvalues,acc_array,color = low, label = "Acceptance Rate")
plt.plot(xvalues,mean_acc, '-.',color = mid,
label = "Mean Acceptance Rate")
plt.ylabel(r'$\mathrm{Acceptance}$')
plt.xlabel(r'$\mathrm{n \ (Step \ of \ mcmc \ chain)}$')
plt.savefig('acceptance_rate.png')
plt.show()
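A scatter of raw acceptance values is hard to interpret, so a cumulative or moving-window rate may be easier to read. The sketch below assumes an array with one entry per step, equal to 1 if the step was accepted and 0 otherwise. Whether ``hades_accept_count`` stores such a per-step value or a cumulative counter is not stated here; for a cumulative counter, ``np.diff`` would first be needed to recover per-step values. ``acceptance_rates`` is a hypothetical helper and the chain is made up.

```python
import numpy as np


def acceptance_rates(accepted, window=50):
    """Cumulative and moving-window acceptance rate of a chain.

    accepted: 1-D array with one entry per step, 1 if accepted, 0 if not
    window:   number of steps in the moving window
    """
    accepted = np.asarray(accepted, dtype=np.float64)
    steps = np.arange(1, accepted.size + 1)
    cumulative = np.cumsum(accepted) / steps
    kernel = np.ones(window) / window
    # 'valid' keeps only windows that lie entirely inside the chain
    moving = np.convolve(accepted, kernel, mode="valid")
    return cumulative, moving


# Made-up chain: every step accepted at first, every second step later
flags = np.concatenate([np.ones(100), np.tile([1, 0], 100)])
cumulative, moving = acceptance_rates(flags, window=50)
```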
Animations/Gif-generator
------------------------
A fun way to view the data is to use gifs. In this example, I'm slicing
up the density field and the galaxy field (in three different directions
of the data cube), saving each image (with imshow), then adding them to
a gif.
First, we save the slices of the fields to a folder:
.. code:: ipython3
def density_slices(dens_field,catalog):
# CREATE THE DIRECTORY TO SAVE SLICES
os.makedirs(catalog, exist_ok=True)
# STORE THE MAX- AND MIN-POINTS FOR THE COLORBARS -> THIS CAN BE ADJUSTED
dens_max = np.log(1+np.max(dens_field))
dens_min = np.log(1+np.min(dens_field))
# SAVE THE DENSITY SLICES
for i in np.arange(N):
plt.figure(figsize=(20,20))
plt.imshow(np.log(1+dens_field[i,:,:]),
cmap = my_cmap,vmin = dens_min, vmax = dens_max)
plt.title('X-Y Cut')
plt.colorbar()
plt.savefig(catalog+"/slice_X_Y_" + str(i) + ".png")
plt.clf()
plt.imshow(np.log(1+dens_field[:,i,:]),
cmap = my_cmap,vmin = dens_min, vmax = dens_max)
plt.title('X-Z Cut')
plt.colorbar()
plt.savefig(catalog+"/slice_X_Z_" + str(i) + ".png")
plt.clf()
plt.imshow(np.log(1+dens_field[:,:,i]),
cmap = my_cmap,vmin = dens_min, vmax = dens_max)
plt.title('Y-Z Cut')
plt.colorbar()
plt.savefig(catalog+"/slice_Y_Z_" + str(i) + ".png")
plt.clf()
plt.close()
return
# RUN THE FUNCTION FOR THREE DIFFERENT FIELDS
density_slices(restart_dens_field,'dens_slices')
density_slices(gal_field,"gal_slices")
density_slices(mean_field,"mean_slices")
We generate the gifs below
.. code:: ipython3
import imageio
images1 = []
images2 = []
images3 = []
images4 = []
images5 = []
images6 = []
images7 = []
images8 = []
images9 = []
for i in np.arange(N):
images1.append(imageio.imread("gal_slices/slice_X_Z_%d.png" % i))
images2.append(imageio.imread("gal_slices/slice_X_Y_%d.png" % i))
images3.append(imageio.imread("gal_slices/slice_Y_Z_%d.png" % i))
images4.append(imageio.imread("dens_slices/slice_X_Z_%d.png" % i))
images5.append(imageio.imread("dens_slices/slice_X_Y_%d.png" % i))
images6.append(imageio.imread("dens_slices/slice_Y_Z_%d.png" % i))
images7.append(imageio.imread("mean_slices/slice_X_Z_%d.png" % i))
images8.append(imageio.imread("mean_slices/slice_X_Y_%d.png" % i))
images9.append(imageio.imread("mean_slices/slice_Y_Z_%d.png" % i))
imageio.mimsave('gal_X_Z.gif', images1)
imageio.mimsave('gal_X_Y.gif', images2)
imageio.mimsave('gal_Y_Z.gif', images3)
imageio.mimsave('dens_X_Z.gif', images4)
imageio.mimsave('dens_X_Y.gif', images5)
imageio.mimsave('dens_Y_Z.gif', images6)
imageio.mimsave('mean_X_Z.gif', images7)
imageio.mimsave('mean_X_Y.gif', images8)
imageio.mimsave('mean_Y_Z.gif', images9)

View file

@ -0,0 +1,197 @@
Tutorial: generating constrained simulations from HADES
=======================================================
Get the source
--------------
First you have to clone the bitbucket repository
.. code:: text
git@bitbucket.org:bayesian_lss_team/borg_constrained_sims.git
Ensure that the packages h5py and numexpr are installed.
How to run
----------
If you run "python3 gen_ic.py -h" it will print the following help:
.. code:: text
usage: gen_ic.py [-h] --music MUSIC [--simulator SIMULATOR] [--sample SAMPLE]
[--mcmc MCMC] [--output OUTPUT] [--augment AUGMENT]
optional arguments:
-h, --help show this help message and exit
--music MUSIC Path to music executable
--simulator SIMULATOR
Which simulator to target (Gadget,RAMSES,WHITE)
--sample SAMPLE Which sample to consider
--mcmc MCMC Path of the MCMC chain
--output OUTPUT Output directory
--augment AUGMENT Factor by which to augment small scales
All arguments are optional except "music" if it is not available in your
PATH.
The meaning of each argument is the following:
- music: Full path to MUSIC executable
- simulator: Type of simulator that you wish to use. It can either be
- WHITE, if you only want the 'white' noise (i.e. the Gaussian
random number, with variance 1, which are used to generate ICs)
- Gadget, for a gadget simulation with initial conditions as Type 1
- RAMSES, for a ramses simulation (Grafic file format)
- sample: Give the integer id of the sample in the MCMC to be used to
generate ICs.
- output: the output directory for the ICs
- augment: whether to increase resolution by augmenting randomly the
small scales (with unconstrained gaussian random numbers of variance
1). This parameter must be understood as a power of two multiplier to
the base resolution. For example, 'augment 2' on a run at 256 will
yield a simulation at 512. 'augment 4' will yield a simulation at
1024.
Generating initial conditions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*TO BE IMPROVED*
The main script can be found
`here <https://bitbucket.org/bayesian_lss_team/borg_constrained_sims/src/master/>`__,
which generates ICs for one or a small number of steps in the MCMC
chain. You will need all the restart_* files, along with the mcmc_*
files of the step you want to analyse. You also need the Music
executable. Using ``src.bcs``, the default is to generate ICs over the
entire simulation volume, with resolution increased by a factor of
``fac_res`` (i.e. white noise generated up to this scale). If you set
``select_inner_region=True`` then ICs are generated over only the
central half of the simulation volume, which effectively doubles your
resolution. An alternative is to use src.bcs_zoom, which instead zooms
in on the central sphere with radius and resolution as specified in that
script. In this case ``fac_res`` is irrelevant. Besides the properties
of the ellipse, the relevant parameter is the number in levelmax which
is the resolution with which you want to zoom in (e.g. if you start with
a :math:`256^3` grid ``[levelmin=8]``, specifying ``levelmax=11`` will
mean the zoom region starts at :math:`2048^3` resolution). For either
script you can choose to generate ICs for either the Ramses or Gadget
simulators.
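The levels quoted above map to grid sizes through powers of two: a level ``l`` corresponds to :math:`2^l` cells per dimension, as in the example of ``levelmin=8`` for a :math:`256^3` grid. A small sketch of that arithmetic, with hypothetical helper names:

```python
def grid_size(level):
    """Number of cells per dimension for a given refinement level."""
    return 2 ** level


def level_for_grid(n_cells):
    """Refinement level of a grid with n_cells cells per dimension."""
    if n_cells <= 0:
        raise ValueError("grid size must be a positive power of two")
    level = n_cells.bit_length() - 1
    if 2 ** level != n_cells:
        raise ValueError("grid size must be a positive power of two")
    return level


levelmin = level_for_grid(256)   # base grid of the example in the text
levelmax = 11
zoom_cells = grid_size(levelmax)
refinement = zoom_cells // grid_size(levelmin)
```

Here the zoom region is 8 times finer per dimension than the base grid.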
Result
------
Gadget
~~~~~~
You will find a "gadget_param.txt" in the output directory and a file
called ic.gad in the subdirectory "ic". The log of the generation is in
"white_noise/"
Ramses
~~~~~~
Clumpfinding on the fly
^^^^^^^^^^^^^^^^^^^^^^^
There is a merger tree patch in Ramses which does halo-finding and
calculates merger trees as the simulation runs. The code is in
``patch/mergertree`` in the ramses folder where there is also some
documentation. The halos are calculated and linked at each of the
specified outputs of the simulation, so for the merger trees to be
reliable these outputs must be fairly frequent. The most conservative
choice is to have an output every coarse time step. The mergertree patch
is activated by specifying clumpfind=.true. in the run_params block, and
adding a clumpfind_params block to specify the parameters of the
clumpfinding. The extra files that this generates at each output are
halo_* (properties of the halos), clump_* (properties of the clumps,
essentially subhalos; this should include all the halos as well),
mergertree_* (information on the connected halos across the timesteps)
and progenitor_data_* (which links the halos from one step to the
next). If you wish to store the merger tree information more frequently
than the full particles (restart) information, you can hack the code in
``amr/output_amr`` to only output the ``part_*``, ``amr_*`` and
``grav_*`` files on some of the outputs (specified for example by the
scale factor ``aexp``). You can also hack the code in
``patch/mergertree/merger_tree.py`` to remove for example the
``clump_*`` files (if you only want to keep main halos), and/or remove
the ``progenitor_data_*`` files before the preceding snapshot when they
are no longer necessary. Finally, you may wish to concatenate the
remaining files (e.g. ``mergertree_*`` and ``halo_*``) over all the
processors.
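The last step, concatenating the per-processor files, could look like the sketch below. It assumes plain text files with a one-line header that is repeated in every file. The file names in the demonstration are made up, so adapt the pattern to the names your Ramses version actually writes.

```python
import glob
import os
import tempfile


def concatenate_text_files(pattern, output_path, header_lines=1):
    """Concatenate all text files matching `pattern` into `output_path`.

    The header (first `header_lines` lines) is copied from the first
    file only. Returns the number of files that were merged.
    """
    paths = sorted(glob.glob(pattern))
    with open(output_path, "w") as out:
        for index, path in enumerate(paths):
            with open(path) as handle:
                for line_number, line in enumerate(handle):
                    if index > 0 and line_number < header_lines:
                        continue  # skip repeated headers
                    out.write(line)
    return len(paths)


# Self-contained demonstration with two made-up per-processor files
workdir = tempfile.mkdtemp()
for rank, rows in enumerate((["1 0.5\n"], ["2 0.7\n", "3 0.9\n"])):
    with open(os.path.join(workdir, "halo_demo.txt%05d" % (rank + 1)), "w") as f:
        f.write("# id mass\n")
        f.writelines(rows)
merged = os.path.join(workdir, "halo_demo_all.txt")
n_files = concatenate_text_files(os.path.join(workdir, "halo_demo.txt0*"), merged)
with open(merged) as f:
    merged_lines = f.read().splitlines()
```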
Example namelist
^^^^^^^^^^^^^^^^^
.. code:: text
&RUN_PARAMS
cosmo=.true.
pic=.true.
poisson=.true.
hydro=.false.
nrestart=0
nremap=20
nsubcycle=1,1,1,1,20*2
ncontrol=1
clumpfind=.true.
verbose=.false.
debug=.false.
/
&INIT_PARAMS
aexp_ini=0.0142857
filetype='grafic'
initfile(1)='/cosma7/data/dp016/dc-desm1/Ramses_8600/ic/ramses_ic/level_008'
initfile(2)='/cosma7/data/dp016/dc-desm1/Ramses_8600/ic/ramses_ic/level_009'
initfile(3)='/cosma7/data/dp016/dc-desm1/Ramses_8600/ic/ramses_ic/level_010'
initfile(4)='/cosma7/data/dp016/dc-desm1/Ramses_8600/ic/ramses_ic/level_011'
/
&AMR_PARAMS
ngridmax=3500000
npartmax=8000000
levelmin=8
levelmax=19
nexpand=0,0,20*1
/
&REFINE_PARAMS
m_refine=30*8.
mass_cut_refine=2.32831e-10
ivar_refine=0
interpol_var=0
interpol_type=2
/
&CLUMPFIND_PARAMS
!max_past_snapshots=3
relevance_threshold=3 ! define what is noise, what real clump
density_threshold=80 ! rho_c: min density for cell to be in clump
saddle_threshold=200 ! rho_c: max density to be distinct structure
mass_threshold=100 ! keep only clumps with at least this many particles
ivar_clump=0 ! find clumps of mass density
clinfo=.true. ! print more data
unbind=.true. ! do particle unbinding
nmassbins=100 ! 100 mass bins for clump potentials
logbins=.true. ! use log bins to compute clump grav. potential
saddle_pot=.true. ! use strict unbinding definition
iter_properties=.true. ! iterate unbinding
conv_limit=0.01 ! limit when iterated clump properties converge
make_mergertree=.true.
nmost_bound=200
make_mock_galaxies=.false.
/
&OUTPUT_PARAMS
aout=1.
foutput=1
/
White
~~~~~
This is a dummy target whose only output is the whitened initial
conditions.

View file

@ -0,0 +1,189 @@
Postprocessing scripts
======================
ARES Plotting library
---------------------
One repository gathers plotting routines and ready-to-use
programs to postprocess ARES MCMC chains. It is located at
https://bitbucket.org/bayesian_lss_team/ares_visualization/. Please
enrich it at the same time as this page.
show_log_likelihood.py
~~~~~~~~~~~~~~~~~~~~~~
To be run in the directory containing the MCMC chain. Computes the power
spectrum of the initial conditions, binned correctly, for each sample and
stores it in an NPZ file. The output can be used by plot_power.py.
plot_power.py
~~~~~~~~~~~~~
Contrast field in scatter plot
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: python3
import numpy as np
dset_test=np.ones((32,32,32))
def contrast2cic(dset):
Nbox=dset.shape[0]
cic=np.zeros((Nbox,Nbox,Nbox))
min_dset=min(dset.flatten())
for m in range(Nbox):
for k in range(Nbox):
for j in range(Nbox):
d=dset[m,k,j]
cic[m][k][j]=int(np.floor((1+d)/(1+min_dset)))
return cic
cic=contrast2cic(dset_test)
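The triple loop above becomes slow for large grids. A vectorized variant, offered as a sketch, applies the same formula to the whole array at once. The function name is hypothetical, and the explicit check on ``1 + min(dset)`` is an added safeguard that the loop version does not have.

```python
import numpy as np


def contrast2cic_vectorized(dset):
    """Same result as the triple loop, computed on the whole array at once."""
    dset = np.asarray(dset, dtype=np.float64)
    floor_value = 1.0 + dset.min()
    if floor_value <= 0:
        raise ValueError("1 + min(dset) must be positive")
    return np.floor((1.0 + dset) / floor_value)


# Made-up density contrast on an 8^3 grid
rng = np.random.default_rng(3)
contrast = rng.uniform(-0.5, 3.0, size=(8, 8, 8))
cic = contrast2cic_vectorized(contrast)
```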
Acceptance rate
~~~~~~~~~~~~~~~
.. code:: python3

    import matplotlib.pyplot as plt
    import numpy as np
    import h5py

    # latest_mcmc() is defined in the section
    # "Get latest mcmc_%d.h5 file from a BORG run" of this page
    acceptance=[]
    accept=0
    for m in range(latest_mcmc()):
        with h5py.File('mcmc_'+str(m)+'.h5','r') as f1:
            accept=accept+np.array(f1['scalars/hades_accept_count'][0])
        acceptance.append(accept/(m+1))
    plt.plot(acceptance)
    plt.show()
Create gifs
~~~~~~~~~~~
.. code:: python3
import imageio
images = []
filenames=[]
for m in range(64,88):
filenames.append('galaxy_catalogue_0x - slice '+str(m)+'.png')
for filename in filenames:
images.append(imageio.imread(filename))
imageio.mimsave('datax.gif', images)
Scatter plot from galaxy counts in restart.h5
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: python3

    import h5py
    import numpy as np
    import matplotlib.pyplot as plt
    from mpl_toolkits.mplot3d import Axes3D  # registers the 3d projection on older matplotlib

    with h5py.File('restart.h5_0','r') as f:
        data1=np.array(f['scalars/galaxy_data_0'])
    # Loop bounds are taken from the array itself
    N0, N1, N2 = data1.shape
    xgrid=[]
    ygrid=[]
    zgrid=[]
    for m in range(N0):
        for k in range(N1):
            for j in range(N2):
                if data1[m,k,j]!=0:
                    xgrid.append(m)
                    ygrid.append(k)
                    zgrid.append(j)
    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')
    ax.view_init(0, 80)
    ax.scatter(xgrid, ygrid, zgrid,s=1.5,alpha=0.2,c='black')
    plt.show()
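The nested loops that collect the occupied voxels can be replaced by a single call to ``np.nonzero``, which returns the three index arrays directly. A minimal sketch on made-up counts (``occupied_voxels`` is a hypothetical helper):

```python
import numpy as np


def occupied_voxels(counts):
    """Indices (x, y, z) of all voxels holding at least one galaxy."""
    return np.nonzero(np.asarray(counts))


# Made-up galaxy counts on a 4^3 grid
counts = np.zeros((4, 4, 4), dtype=int)
counts[0, 1, 2] = 3
counts[3, 3, 0] = 1
xgrid, ygrid, zgrid = occupied_voxels(counts)
```

The resulting arrays can be passed to ``ax.scatter`` in place of the lists built above.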
Plot data on mask
~~~~~~~~~~~~~~~~~
.. code:: python3
import numpy as np
import healpy as hp
# Import your ra and dec from the data
# Then projscatter wants a specific transform
# wrt what BORG outputs
ra=np.ones(10)
dec=np.ones(10)
corr_dec=-(np.pi/2.0)*np.ones(len(ra))
decmask=corr_dec+dec
corr_ra=np.pi*np.ones(len(ra))
ramask=ra+corr_ra
map='WISExSCOSmask.fits.gz'
mask = hp.read_map(map)
hp.mollview(mask,title='WISE mock')
hp.projscatter(decmask,ramask,s=0.2)
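For comparison, healpy's default angle convention is a colatitude ``theta`` in :math:`[0, \pi]` and a longitude ``phi`` in radians. The sketch below converts RA/Dec given in degrees to that convention. It is a generic conversion with a hypothetical helper name; the offsets used in the block above are specific to that data set and may be deliberate.

```python
import numpy as np


def radec_to_healpix_angles(ra_deg, dec_deg):
    """Convert RA/Dec in degrees to (theta, phi) in radians.

    theta is the colatitude in [0, pi] and phi the longitude in
    [0, 2*pi), the angle convention healpy uses by default.
    """
    ra = np.radians(np.asarray(ra_deg, dtype=np.float64))
    dec = np.radians(np.asarray(dec_deg, dtype=np.float64))
    theta = np.pi / 2.0 - dec
    phi = np.mod(ra, 2.0 * np.pi)
    return theta, phi


theta, phi = radec_to_healpix_angles([0.0, 90.0, 180.0], [90.0, 0.0, -90.0])
```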
Non-plotting scripts
--------------------
Download files from remote server (with authentication):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: python3
from requests.auth import HTTPBasicAuth
import requests
def download_from_URL(o):
URL='https://mysite.com/dir1/dir2/'+'filename_'+str(o)+'.h5'
r = requests.get(URL, auth=HTTPBasicAuth('login', 'password'),allow_redirects=True)
open('downloaded_file_'+str(o)+'.h5', 'wb').write(r.content)
return None
for o in range(10000):
download_from_URL(o)
This works for horizon with the login and password provided in the
corresponding page.
Get latest mcmc_%d.h5 file from a BORG run
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: python3
import os
def latest_mcmc():
strings=[]
for root, dirs, files in os.walk("."):
for file in files:
if file.startswith("mcmc_"):
string=str(os.path.join(root, file))[7:]
string=string.replace('.h5','')
strings.append(int(string))
return max(strings)
But beware: while a run is in progress, use the file before the latest one, so as not to interfere with the writing of the restart files.
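A hedged sketch of a helper that returns the index of the file written before the most recent one. It assumes the mcmc-files sit directly in the given directory and are named ``mcmc_<index>.h5``; ``last_complete_mcmc`` is a hypothetical name, and the demonstration uses empty placeholder files.

```python
import os
import re
import tempfile

MCMC_PATTERN = re.compile(r"^mcmc_(\d+)\.h5$")


def last_complete_mcmc(directory="."):
    """Index of the mcmc-file written before the most recent one.

    Returns None if fewer than two mcmc-files are present.
    """
    indices = sorted(
        int(match.group(1))
        for match in map(MCMC_PATTERN.match, os.listdir(directory))
        if match
    )
    return indices[-2] if len(indices) >= 2 else None


# Demonstration in a temporary directory with made-up, empty files
demo_dir = tempfile.mkdtemp()
for index in (0, 1, 2, 10):
    open(os.path.join(demo_dir, "mcmc_%d.h5" % index), "w").close()
open(os.path.join(demo_dir, "restart.h5_0"), "w").close()
result = last_complete_mcmc(demo_dir)
```

The indices are compared as integers, so ``mcmc_10.h5`` is correctly ranked after ``mcmc_2.h5``.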
Template generator
------------------
Jens Jasche has started a
specific repository that gathers python algorithms to post-process the
BORG density field to create predictive maps for other effects on the
cosmic sky. The effects that have been implemented are the following:
- CMB lensing
- Integrated Sachs-Wolfe effect
- Shapiro Time-delay
The repository is available on bitbucket `here <https://bitbucket.org/jjasche/lss_template_generator/>`__.

View file

@ -0,0 +1,9 @@
Running the executables
#######################
.. _running:
.. include:: running/ARES_Tutorials.inc.rst
.. include:: running/HADES_Tutorials.inc.rst
.. include:: running/BORG_Tutorials.inc.rst
.. include:: running/BORG_with_simulation_data.inc.rst

View file

@ -0,0 +1,337 @@
Running ARES: basic run on 2M++
===============================
Introduction
------------
First, of course, please :ref:`build ARES<building>`. We will call $BUILD
the directory where you built the entire code. By default it is the
subdirectory build of the source directory. If you have
specified a different directory with the argument "--build-dir", then
$BUILD refers to that directory instead. We will also call $SOURCE the
top source directory of ARES, so that ``$SOURCE/README.rst``
points to the README file at the top of the source tree.
After a successful build you should find a binary program in
$BUILD/src/ares3. This is the main ARES3 program. If you type
``$BUILD/src/ares3``, then you should get the following output:
.. code:: text
setupMPI with threads
Initializing console.
[0/1] [DEBUG ] INIT: MPI/FFTW
[STD ]
[STD ] o
[STD ] ,-.|____________________
[STD ] O==+-|(>-------- -- - .>
[STD ] `- |"""""""d88b"""""""""
[STD ] | o d8P 88b
[STD ] | \ 98=, =88
[STD ] | \ 8b _, 88b
[STD ] `._ `. 8`..'888
[STD ] | \--'\ `-8___ __________________________________
[STD ] \`-. \ ARES3
[STD ] `. \ - - / < (c) Jens Jasche 2012 - 2017
[STD ] \ `--- ___/|_-\ Guilhem Lavaux 2014 - 2017
[STD ] |._ _. |_-| __________________________________
[STD ] \ _ _ /.-\
[STD ] | -! . !- || |
[STD ] \ "| ^ |" /\ |
[STD ] =oO)<>(Oo= \ /
[STD ] d8888888b < \
[STD ] d888888888b \_/
[STD ] d888888888b
[STD ]
[STD ] Please acknowledge XXXX
[0/1] [DEBUG ] INIT: FFTW/WISDOM
[0/1] [INFO ] Starting ARES3. rank=0, size=1
[0/1] [INFO ] ARES3 base version c9e74ec93121f9d99a3b2fecb859206b4a8b74a3
[0/1] [ERROR ] ARES3 requires exactly two parameters: INIT or RESUME as first parameter and the configuration file as second parameter.
We will now go through this output step by step:
- First we encounter ``setupMPI with threads``: the code asks
the MPI system to support multithreading for hybrid
parallelization. The console is then initialized, as it needs MPI to
properly choose which file should receive the output.
- After that, the console logs get a prefix ``[R/N]``, with R and N
integers. R is the MPI rank of the task emitting the information, and
N is the total number of MPI tasks. Then there is another ``[ XXXX ]``,
where XXXX indicates the console log level. The amount of output you
get depends on some flags in the configuration file, but by
default you get everything until the ini file is read. Note that the
"STD" level is only printed on the task of rank 0.
- Then ``[0/1] [DEBUG ] INIT: MPI/FFTW`` indicates that the code asks
for the MPI variant of FFTW to be initialized. It means the code is
indeed compiled with FFTW with MPI support.
- The ascii art logo is then shown.
- ``[0/1] [DEBUG ] INIT: FFTW/WISDOM`` indicates that the code attempts
to load existing FFTW wisdom, which speeds up the construction of FFTW plans.
- ``[0/1] [INFO ] Starting ARES3. rank=0, size=1`` reminds you that we
are indeed starting an MPI run.
- ``ARES3 base version XXXX`` gives the git version of the ARES base
git repository used to construct the binary. When reporting an issue,
include this identifier: it shows whether any patch has been applied
relative to other repositories, which makes debugging easier.
- Finally you get an error::
ARES3 requires exactly two parameters: INIT or RESUME as first parameter and the configuration file as second parameter,
which tells you that you need to pass two arguments: the first
one is either "INIT" or "RESUME" (more flags are available, but
they are documented later on) and the second is the parameter file.
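
The prefix structure described above is regular enough to be parsed mechanically, for example to filter a saved log by level. The following is a minimal sketch and not part of ARES: the regular expression and the helper name ``parse_log_line`` are assumptions based only on the sample lines shown in this section.

.. code:: python

    import re

    # Pattern inferred from the sample output above (an assumption, not an
    # official format description): optional "[R/N]" prefix, then "[LEVEL ]".
    LOG_RE = re.compile(
        r"^(?:\[(?P<rank>\d+)/(?P<size>\d+)\]\s*)?"
        r"\[(?P<level>[A-Z ]+?)\s*\]\s?(?P<message>.*)$"
    )


    def parse_log_line(line):
        """Return (rank, size, level, message), or None if the line does not match."""
        match = LOG_RE.match(line)
        if match is None:
            return None
        rank = int(match["rank"]) if match["rank"] is not None else None
        size = int(match["size"]) if match["size"] is not None else None
        return rank, size, match["level"].strip(), match["message"]


    print(parse_log_line("[0/1] [INFO ] Starting ARES3. rank=0, size=1"))
    print(parse_log_line("[STD ] Please acknowledge XXXX"))

Lines without a level tag, such as ``setupMPI with threads``, make the helper return ``None``, so they can be skipped or kept verbatim.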
First run
---------
Now we can proceed with setting up an actual run. You can use the files available in ``$SOURCE/examples/``. There are (as of 27.10.2020)
ini files for running the executables on the given datasets (in this case the 2MPP dataset). Create a directory (e.g.
test_ares/, which we call $TEST_DIR) and proceed as follows:
.. code:: bash
cd $TEST_DIR
$BUILD/src/ares3 INIT $SOURCE/examples/2mpp_ares3.ini
Note that if you are using SLURM, you should execute with ``srun``. With the above options ares3 will start as a single MPI task, and allocate
as many parallel threads as the computer can support. The top of the output is the following (after the splash and the other outputs
described above):
.. code:: text
[0/1] [DEBUG ] Parsing ini file
[0/1] [DEBUG ] Retrieving system tree
[0/1] [DEBUG ] Retrieving run tree
[0/1] [DEBUG ] Creating array which is UNALIGNED
[0/1] [DEBUG ] Creating array which is UNALIGNED
[INFO S ] Got base resolution at 64 x 64 x 64
[INFO S ] Data and model will use the folllowing method: 'Nearest Grid point number count'
[0/1] [INFO ] Initializing 4 threaded random number generators
[0/1] [INFO ] Entering initForegrounds
[0/1] [INFO ] Done
[INFO S ] Entering loadGalaxySurveyCatalog(0)
[STD ] | Reading galaxy survey file '2MPP.txt'
[0/1] [WARNING] | I used a default weight of 1
[0/1] [WARNING] | I used a default weight of 1
[STD ] | Receive 67224 galaxies in total
[INFO S ] | Set the bias to [1]
[INFO S ] | No initial mean density value set, use nmean=1
[INFO S ] | Load sky completeness map 'completeness_11_5.fits.gz'
Again, we will explain some of these lines:
- ``Got base resolution at 64 x 64 x 64`` indicates ARES understands
you want to use a base grid of 64x64x64. In the case of HADES, however,
multiples of this grid may be used.
- ``Data and model will use the folllowing method: 'Nearest Grid point number count'``
indicates that galaxies are going to be binned on the grid.
- ``[0/1] [INFO ] Initializing 4 threaded random number generators``,
we clearly see here that the code is setting itself up to use 4
threads. In particular the random number generator is seeded
appropriately to generate a different sequence on each of the threads.
- ``[STD ] | Reading galaxy survey file '2MPP.txt'`` indicates the data
are being read from the indicated file.
- ``[0/1] [WARNING] | I used a default weight of 1``: this file
lacks the optional last column that gives the
weight, so the weight is set to one by default.
The code then continues to run. All the detailed outputs are sent to
``logares.txt_rank_0``. The last digit indicates the MPI rank of the task, as each
task writes to its own file to avoid synchronization problems. This also
reduces the clutter in the final file.
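
When running with several MPI tasks it can be convenient to collect the per-rank log files programmatically. The sketch below relies only on the file-name pattern quoted above (``logares.txt_rank_<R>``); the helper name ``find_rank_logs`` is hypothetical and not part of ARES.

.. code:: python

    import re
    from pathlib import Path

    # File-name pattern taken from the text above: logares.txt_rank_<R>
    LOG_NAME_RE = re.compile(r"^logares\.txt_rank_(\d+)$")


    def find_rank_logs(directory):
        """Map MPI rank -> log file path for every per-rank log in `directory`."""
        logs = {}
        for path in Path(directory).iterdir():
            match = LOG_NAME_RE.match(path.name)
            if match:
                logs[int(match.group(1))] = path
        return dict(sorted(logs.items()))


    if __name__ == "__main__":
        for rank, path in find_rank_logs(".").items():
            print(rank, path)

Sorting by the integer rank avoids the usual pitfall of lexicographic ordering, where rank 10 would come before rank 2.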
Restarting
----------
If for some reason you have to interrupt the run, it is not a
problem to resume it at the same place. ARES by default saves a
restart file each time an MCMC file is emitted. This can be reduced by
changing the flag "savePeriodicity" to an integer number indicating the
periodicity (e.g. 5 to emit a restart file every 5 MCMC files).
Then you can resume the run using: ``$BUILD/src/ares3 RESUME 2mpp.ini``.
ARES will initialize itself, then reset its internal state using the
values contained in the restart file. Note that there is one restart
file per MPI task (thus the suffix ``_0`` if you are running with only
the multithreaded mode).
Checking the output
-------------------
After some (maybe very long) time, you can check the output files that
have been created by ARES. By default the ini file is set to run for
10,000 samples, so waiting for the end of the run will take possibly
several hours on a classic workstation. The end of the run will conclude
like:
.. code:: text
[STD ] Reached end of the loop. Writing restart file.
[0/1] [INFO ] Cleaning up parallel random number generators
[0/1] [INFO ] Cleaning up Messenger-Signal
[0/1] [INFO ] Cleaning up Powerspectrum sampler (b)
[0/1] [INFO ] Cleaning up Powerspectrum sampler (a)
Looking at the powerspectrum
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Now we are going to set the ``PYTHONPATH`` to ``$SOURCE/scripts``. For example,
if you are using bash you can run the following:
.. code:: bash
PYTHONPATH=$SOURCE/scripts:$PYTHONPATH
export PYTHONPATH
Then we can start analyzing the powerspectrum of the elements of the
chain. You can copy paste the following code in a python file (let's
call it show_powerspectrum.py) and run it with your python3 interpreter
(depending on your installation it can be python3, python3.5, python3.6
or later):
.. code:: python
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import ares_tools as at
chain = at.read_chain_h5(".", ['scalars.powerspectrum'])
meta = at.read_all_h5("restart.h5_0", lazy=True)
fig = plt.figure(1)
ax = fig.add_subplot(111)
ax.loglog(meta.scalars.k_modes, chain['scalars.powerspectrum'].transpose(),color='k',alpha=0.05)
ax.set_xlim(1e-2,1)
ax.set_ylim(1e2,1e6)
ax.set_xlabel('$k$ ($h$ Mpc$^{-1}$)')
ax.set_ylabel('$P(k)$ (($h^{-1}$ Mpc)$^3$)')
fig.savefig("powerspectrum.png")
We will see what each of the most important lines is doing:
- ``import matplotlib`` and ``matplotlib.use('Agg')``: we import matplotlib and enforce the Agg
backend (to avoid needing a real display connection).
- ``import ares_tools as at``: we import the ares_tools analysis scripts.
- ``at.read_chain_h5(...)``: we ask to read the entire chain contained in the current path
(``"."``). We also request the field
``scalars.powerspectrum`` from each file. The result is stored in a
named column array ``chain``. We could have asked to only partially
read the chain using the keywords ``start``, ``end`` or ``step``. Some
help is available using the command ``help(at.read_chain_h5)``.
- ``at.read_all_h5(...)``: we ask to read the entirety of ``restart.h5_0``; however, it
is done lazily (``lazy=True``), meaning the data is not loaded into
memory but only referenced in the file. The fields of
the file are available as recursive objects in ``meta``. For example,
``scalars.k_modes`` is available as the array
``meta.scalars.k_modes``. This array
corresponds to the left edges of the bins of the power spectra contained in
``scalars.powerspectrum``.
- ``ax.loglog(...)``: we plot all the spectra using k_modes on the x-axis and the
content of ``chain['scalars.powerspectrum']`` on the y-axis. The
array is transposed so that we get bins in *k* on the first axis of
the array, and each sample on the second one. This allows us to use only
one call to ``ax.loglog``.
- ``fig.savefig(...)``: we save the result in the given image file.
After this script is run, you will get a plot containing all the sampled
powerspectra in the chain. It is saved in *powerspectrum.png*.
Running this script will typically result in the following plot (here
for 10,000 samples):
.. raw:: html
<center>
.. figure:: /user/running/ARES_Tutorials_files/Powerspectrum_tutorial1_ares.png
:alt: Powerspectrum_tutorial1_ares.png
:width: 400px
running/ARES_Tutorials_files/Powerspectrum_tutorial1_ares.png
.. raw:: html
</center>
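
The reason the power spectrum script transposes the array before its single ``ax.loglog`` call can be seen on synthetic data. The sketch below does not use ``ares_tools``; the array ``spectra`` is a made-up stand-in for ``chain['scalars.powerspectrum']``, assumed to have one row per sample and one column per *k* bin, as implied by the description above.

.. code:: python

    import numpy as np

    # Synthetic stand-in for chain['scalars.powerspectrum']:
    # one row per sample, one column per k bin (shape assumed from the text).
    n_samples, n_bins = 5, 8
    k_modes = np.logspace(-2, 0, n_bins)
    spectra = np.outer(np.linspace(0.9, 1.1, n_samples), 1.0e4 * k_modes**-1.5)

    # Matplotlib draws one curve per *column* of a 2D y array, and the first
    # axis of y must match the length of x. Transposing gives (n_bins, n_samples).
    y = spectra.transpose()
    print(spectra.shape, "->", y.shape)

    # Per-bin summary over the samples, e.g. to draw a mean and an error band.
    mean_pk = spectra.mean(axis=0)
    low_pk, high_pk = np.percentile(spectra, [16, 84], axis=0)

The same per-bin summary could be applied to the real chain array to overlay a mean spectrum on the individual samples.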
Looking at the density field
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Now we can also compute the a posteriori mean and standard deviation per
voxel of the matter density field. The following script does exactly
this:
.. code:: python
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import numpy as np
import ares_tools as at
density = at.read_chain_avg_dev(".", ['scalars.s_field'], slicer=lambda x: x[32,:,:], do_dev=True, step=1)
meta = at.read_all_h5("restart.h5_0", lazy=True)
L = meta.scalars.L0[0]
N = meta.scalars.N0[0]
ix = np.arange(N)*L/(N-1) - 0.5*L
fig = plt.figure(1, figsize=(16,5))
ax = fig.add_subplot(121)
im = ax.pcolormesh(ix[:,None].repeat(N,axis=1), ix[None,:].repeat(N,axis=0), density['scalars.s_field'][0],vmin=-1,vmax=2)
ax.set_aspect('equal')
ax.set_xlim(-L/2,L/2)
ax.set_ylim(-L/2,L/2)
ax.set_title('Mean density')
ax.set_xlabel('$h^{-1}$ Mpc')
ax.set_ylabel('$h^{-1}$ Mpc')
fig.colorbar(im)
ax = fig.add_subplot(122)
im = ax.pcolormesh(ix[:,None].repeat(N,axis=1), ix[None,:].repeat(N,axis=0), density['scalars.s_field'][1],vmin=0,vmax=1.8)
ax.set_aspect('equal')
ax.set_xlim(-L/2,L/2)
ax.set_ylim(-L/2,L/2)
ax.set_xlabel('$h^{-1}$ Mpc')
ax.set_ylabel('$h^{-1}$ Mpc')
ax.set_title('Standard deviation')
fig.colorbar(im)
fig.savefig("density.png")
In this script we introduce ``read_chain_avg_dev``, which
computes the mean and standard deviation without loading the whole chain in
memory. Additionally, the *slicer* argument allows loading only part of
the field. The *step* argument allows for thinning the chain by the
indicated factor. In the above case we do not thin the chain. We also
request the field *scalars.s_field* (which contains the density field)
and take only the plane *x=32*. The returned object is a named-column
object, and *density['scalars.s_field']* is a [2,M0,...] array, with
M0,... being the dimensions returned by the slicer function. The first
slice is the mean field (index 0, used in the first ``pcolormesh`` call) and the second is
the standard deviation (index 1, used in the second ``pcolormesh`` call).
Once the script is run we get the following pictures:
.. raw:: html
<center>
.. figure:: /user/running/ARES_Tutorials_files/Density_tutorial1_ares.png
:alt: Density_tutorial1_ares.png
Density_tutorial1_ares.png
.. raw:: html
</center>
We can see that there are large scale features in the mean field (like
ringing here). Though even in perfect conditions this feature could
occur, this could also indicate a defect in the selection
characterization process.
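
To illustrate how a mean and standard deviation can be accumulated without holding the whole chain in memory, here is a minimal sketch based on Welford's online update. It is an illustration of the general technique only: whether ``read_chain_avg_dev`` uses this exact update or this normalisation (population rather than sample deviation) is not stated in this tutorial, and the function name ``streaming_mean_std`` is hypothetical.

.. code:: python

    import numpy as np


    def streaming_mean_std(fields):
        """Mean and standard deviation over an iterable of equally shaped arrays.

        Uses Welford's online update, so only one field is held at a time.
        """
        count = 0
        mean = None
        m2 = None  # running sum of squared deviations from the mean
        for field in fields:
            field = np.asarray(field, dtype=float)
            if mean is None:
                mean = np.zeros_like(field)
                m2 = np.zeros_like(field)
            count += 1
            delta = field - mean
            mean += delta / count
            m2 += delta * (field - mean)
        if count == 0:
            raise ValueError("no fields supplied")
        return np.stack([mean, np.sqrt(m2 / count)])


    # Synthetic 2D slices generated lazily, standing in for x[32, :, :] of each sample.
    rng = np.random.default_rng(0)
    slices = (rng.normal(size=(4, 4)) for _ in range(100))
    result = streaming_mean_std(slices)
    print(result.shape)  # (2, 4, 4): index 0 is the mean, index 1 the deviation

The returned array mirrors the ``[2,M0,...]`` layout described above, with the mean first and the deviation second.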


View file

@ -0,0 +1,445 @@
Running BORG: a tutorial to perform a cosmological analysis
===========================================================
Downloading and Installing BORG
-------------------------------
This note provides step-by-step instructions for downloading and
installing the BORG software package. These instructions were
written using a MacBook Air running OS X El Capitan. I encourage
readers to modify this description as may be required to install BORG on
a different OS. Please indicate all necessary modifications and which OS
was used.
Some Prerequisites
~~~~~~~~~~~~~~~~~~
- cmake ≥ 3.10
- automake
- libtool
- pkg-config
- gcc ≥ 7, or intel compiler (≥ 2018), or Clang (≥ 7)
- wget (to download dependencies; the flag ``--use-predownload`` can be used to bypass this dependency if you already
have downloaded the required files in the ``downloads`` directory)
Clone the repository from BitBucket
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To clone the ARES repository execute the following git command in a
console:
.. code:: bash
git clone --recursive git@bitbucket.org:bayesian_lss_team/ares.git
After the clone is successful, change directory to ``ares``
and execute:
.. code:: bash
bash get-aquila-modules.sh --clone
Ensure that the correct branches are set up for the submodules using:
.. code:: bash
bash get-aquila-modules.sh --branch-set
If you want to check the status of the currently checked out ARES and
its modules, please run:
.. code:: bash
bash get-aquila-modules.sh --status
You should see the following output:
.. code:: text
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
This script can be run only by Aquila members.
if your bitbucket login is not accredited the next operations will fail.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Checking GIT status...
Root tree (branch master) : good. All clear.
Module ares_fg (branch master) : good. All clear.
Module borg (branch master) : good. All clear.
Module dm_sheet (branch master) : good. All clear.
Module hades (branch master) : good. All clear.
Module hmclet (branch master) : good. All clear.
Module python (branch master) : good. All clear.
Building BORG
~~~~~~~~~~~~~
To save time and bandwidth it is advised to pre-download the
dependencies that will be used as part of the building procedure. You
can do that with
.. code:: bash
bash build.sh --download-deps
That will download a number of tar.gz archives, which are put in the
``downloads/`` folder.
Then you can configure the build itself:
.. code:: bash
bash build.sh --cmake CMAKE_BINARY --c-compiler YOUR_PREFERRED_C_COMPILER --cxx-compiler YOUR_PREFERRED_CXX_COMPILER --use-predownload
For example (this probably needs to be adjusted for your computer):
::
bash build.sh --cmake /usr/local/Cellar/cmake/3.15.5/bin/cmake --c-compiler /usr/local/bin/gcc-9 --cxx-compiler /usr/local/bin/g++-9 --use-predownload
Once the configure is successful you should see a final output similar
to this:
.. code:: text
------------------------------------------------------------------
Configuration done.
Move to /Volumes/EXTERN/software/borg_fresh/ares/build and type 'make' now.
Please check the configuration of your MPI C compiler. You may need
to set an environment variable to use the proper compiler.
Some example (for SH/BASH shells):
- OpenMPI:
OMPI_CC=/usr/local/bin/gcc-9
OMPI_CXX=/usr/local/bin/g++-9
export OMPI_CC OMPI_CXX
------------------------------------------------------------------
It tells you to move to the default build directory using ``cd build``,
after which you can type ``make``. To speed up the compilation you can
use more computing power by adding a ``-j`` option. For example
.. code:: bash
make -j4
will start 4 compilations at once (thus typically keeping 4 cores busy all the
time). Note that the compilation can take some time.
Running a test example
----------------------
The ARES repository comes with some standard examples for LSS analysis.
Here we will use a simple standard unit example for BORG. From your ARES
base directory change to the examples folder:
.. code:: bash
cd examples
To start a BORG run just execute the following code in the console:
.. code:: bash
../build/src/hades3 INIT borg_unit_example.ini
BORG will now execute a simple MCMC. You can interrupt the calculation at any
time. To resume the run you can just type:
.. code:: bash
../build/src/hades3 RESUME borg_unit_example.ini
You need on the order of 1000 samples at least to pass the initial
warm-up phase of the sampler. As the execution of the code will consume
about 2GB of your storage, we suggest executing BORG in a directory
with sufficient free disk space.
Analysing results
-----------------
Now we will look at the outputs generated by the BORG run. Note that
you do not have to wait for the run to complete: you can already
investigate intermediate results while BORG is still running. BORG results are
stored in two major HDF5 files, the restart and the mcmc files. The
restart files contain all information on the state of the Markov Chain
required to resume the Markov Chain if it has been interrupted. The
restart file also contains static information, that will not change
during the run, such as the data, selection functions and masks and
other settings. The mcmc files contain the current state of the Markov
Chain. They are indexed by the current step in the chain, and contain
the current sampled values of density fields, power-spectra, galaxy bias
and cosmological parameters etc.
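
Because the mcmc files are indexed by the step in the chain, it is often useful to list them in numerical order, for instance to find the latest sample written so far. The following sketch assumes the ``mcmc_<index>.h5`` naming used in the examples of this tutorial; the helper name ``list_mcmc_files`` is hypothetical.

.. code:: python

    import re
    from pathlib import Path

    # Naming scheme assumed from the examples in this tutorial: mcmc_<index>.h5
    MCMC_RE = re.compile(r"^mcmc_(\d+)\.h5$")


    def list_mcmc_files(directory):
        """Return [(index, path), ...] sorted by the numerical sample index."""
        found = []
        for path in Path(directory).iterdir():
            match = MCMC_RE.match(path.name)
            if match:
                found.append((int(match.group(1)), path))
        return sorted(found, key=lambda item: item[0])


    if __name__ == "__main__":
        files = list_mcmc_files(".")
        if files:
            print("latest sample:", files[-1][0])

Sorting on the integer index rather than the file name keeps ``mcmc_10.h5`` after ``mcmc_2.h5``.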
Opening files
~~~~~~~~~~~~~
The required python preamble:
.. code:: ipython3
import numpy as np
import h5py as h5
import matplotlib.pyplot as plt
import ares_tools as at
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")
.. parsed-literal::
Skipping VTK tools
Now please indicate the path where you stored your BORG run:
.. code:: ipython3
fdir='../testbed/'
Investigating the restart file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The restart file can be opened by
.. code:: ipython3
hf=h5.File(fdir+'restart.h5_0')
The content of the file can be investigated by listing the keys of the
``scalars`` group:
.. code:: ipython3
list(hf['scalars'].keys())
.. code:: text
['ARES_version',
'BORG_final_density',
'BORG_version',
'BORG_vobs',
'K_MAX',
'K_MIN',
'L0',
'L1',
'L2',
'MCMC_STEP',
'N0',
'N1',
'N2',
'N2_HC',
'N2real',
'NCAT',
'NFOREGROUNDS',
'NUM_MODES',
'Ndata0',
'Ndata1',
'Ndata2',
'adjust_mode_multiplier',
'ares_heat',
'bias_sampler_blocked',
'borg_a_final',
'borg_a_initial',
'catalog_foreground_coefficient_0',
'catalog_foreground_maps_0',
'corner0',
'corner1',
'corner2',
'cosmology',
'forcesampling',
'fourierLocalSize',
'fourierLocalSize1',
'galaxy_bias_0',
'galaxy_bias_ref_0',
'galaxy_data_0',
'galaxy_nmean_0',
'galaxy_sel_window_0',
'galaxy_selection_info_0',
'galaxy_selection_type_0',
'galaxy_synthetic_sel_window_0',
'gravity.do_rsd',
'growth_factor',
'hades_accept_count',
'hades_attempt_count',
'hades_mass',
'hades_sampler_blocked',
'hmc_Elh',
'hmc_Eprior',
'hmc_bad_sample',
'hmc_force_save_final',
'k_keys',
'k_modes',
'k_nmodes',
'key_counts',
'lightcone',
'localN0',
'localN1',
'localNdata0',
'localNdata1',
'localNdata2',
'localNdata3',
'localNdata4',
'localNdata5',
'momentum_field',
'nmean_sampler_blocked',
'part_factor',
'pm_nsteps',
'pm_start_z',
'powerspectrum',
'projection_model',
's_field',
's_hat_field',
'sigma8_sampler_blocked',
'startN0',
'startN1',
'supersampling',
'tCOLA',
'total_foreground_blocked']
For example the input galaxy data can be viewed by:
.. code:: ipython3
data=np.array(hf['scalars/galaxy_data_0'])
#Plot data
fig, (ax1, ax2) = plt.subplots(1, 2,figsize=(16, 8))
ax1.set_title('A Slice through the y-z plane of the data cube')
im1=ax1.imshow(data[16,:,:])
ax2.set_title('A Slice through the x-z plane of the data cube')
im2=ax2.imshow(data[:,16,:])
plt.show()
.. image:: /user/running/BORG_Tutorials_files/BORG_Tutorials_12_0.png
Investigating MCMC files
~~~~~~~~~~~~~~~~~~~~~~~~
MCMC files are indexed by the sample number :math:`i_{samp}`. Each file
can be opened separately. Suppose we want to open the :math:`10`\ th
mcmc file, then:
.. code:: ipython3
isamp=10 # sample number
fname_mcmc=fdir+"mcmc_"+str(isamp)+".h5"
hf=h5.File(fname_mcmc)
Inspect the content of the mcmc files
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code:: ipython3
list(hf['scalars'].keys())
.. code:: text
['BORG_final_density',
'BORG_vobs',
'catalog_foreground_coefficient_0',
'cosmology',
'galaxy_bias_0',
'galaxy_nmean_0',
'hades_accept_count',
'hades_attempt_count',
'hmc_Elh',
'hmc_Eprior',
'powerspectrum',
's_field',
's_hat_field']
Plotting density fields
~~~~~~~~~~~~~~~~~~~~~~~
We can for instance be interested in plotting inferred initial and final
density samples.
.. code:: ipython3
delta_in=np.array(hf['scalars/s_field'])
delta_fi=np.array(hf['scalars/BORG_final_density'])
.. code:: ipython3
fig, (ax1, ax2) = plt.subplots(1, 2,figsize=(16, 8))
ax1.set_title('initial density')
im1=ax1.imshow(delta_in[16,:,:])
ax2.set_title('final density')
im2=ax2.imshow(delta_fi[16,:,:])
plt.show()
.. image:: /user/running/BORG_Tutorials_files/BORG_Tutorials_19_0.png
Plotting the power-spectrum
~~~~~~~~~~~~~~~~~~~~~~~~~~~
The ARES repository provides some routines to analyse the BORG runs. A
particularly useful routine calculates the posterior power-spectra of
inferred initial density fields.
.. code:: ipython3
ss = at.analysis(fdir)
# Nbins is the number of bins used for the power-spectrum binning
opts=dict(Nbins=32,range=(0,ss.kmodes.max()))
# You can choose the sample number
isamp=10
P=ss.compute_power_shat_spectrum(isamp, **opts)
# P[2] is treated as the bin edges; average neighbours to get bin centres
kmode = 0.5*(P[2][1:]+P[2][:-1])
P_k = P[0]
plt.loglog(kmode,P_k)
plt.xlabel('k [h/Mpc]')
plt.ylabel(r'$P(k)$')
plt.show()
.. image:: /user/running/BORG_Tutorials_files/BORG_Tutorials_21_0.png
Monitoring power-spectrum warm-up phase
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Rather than looking just at individual posterior sample power-spectra we
can follow the evolution of power-spectra across the chain. Suppose you
want to monitor the first 100 samples.
.. code:: ipython3
Nsamp=100
PPs=[]
for isamp in np.arange(Nsamp):
    PPs.append(ss.compute_power_shat_spectrum(isamp, **opts))
#plot power-spectra
color_idx = np.linspace(0, 1, Nsamp)
idx=0
for PP in PPs:
    plt.loglog(kmode,PP[0],alpha=0.5,color=plt.cm.cool(color_idx[idx]), lw=1)
    idx=idx+1
plt.xlim([min(kmode),max(kmode)])
plt.show()
.. image:: /user/running/BORG_Tutorials_files/BORG_Tutorials_23_0.png
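
Besides the visual inspection, the drift of the power spectra during warm-up can be quantified by dividing each sample by a reference taken from the end of the chain. This is a minimal sketch on synthetic data; the helper ``drift_from_reference`` and the choice of averaging the last ``n_ref`` samples are assumptions for illustration, not part of ``ares_tools``.

.. code:: python

    import numpy as np


    def drift_from_reference(spectra, n_ref=10):
        """Ratio of each sample spectrum to the mean of the last `n_ref` samples."""
        spectra = np.asarray(spectra, dtype=float)
        reference = spectra[-n_ref:].mean(axis=0)
        return spectra / reference


    # Synthetic example: amplitudes relax from 3x towards 1x of a fixed shape.
    shape_pk = np.array([100.0, 50.0, 20.0, 5.0])
    amplitudes = 1.0 + 2.0 * np.exp(-np.arange(50) / 5.0)
    ratios = drift_from_reference(amplitudes[:, None] * shape_pk)
    print(ratios[0], ratios[-1])

With the list built in the cell above, the input would be something like ``[PP[0] for PP in PPs]``; ratios that stay close to one indicate that the spectra have stopped drifting.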


View file

@ -0,0 +1,97 @@
Running BORG with simulation data
=================================
Pre-run test
------------
Gradient test
~~~~~~~~~~~~~
- Run ``<ARES_REPO_DIR>/build.sh`` with ``--debug``
- Execute ``<BUILD_DIR>/libLSS/tests/test_gradient_<bias_model>``
- Grab ``dump.h5``.
- Plot the analytical and numerical (finite-difference) gradients; you can
use the script in ``<ARES_REPO_DIR>/scripts/check_gradients.py``
- Example:
.. image:: /user/running/BORG_with_simulation_data_files/Gradient_test_for_2nd_order_bias.png
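
The idea behind the gradient test can be sketched on a toy function: an analytical gradient is compared against a central finite-difference estimate. The code below is purely illustrative and independent of ARES; ``toy_function`` and ``toy_gradient`` are made-up stand-ins for a likelihood and its gradient, and nothing here reflects the layout of ``dump.h5``.

.. code:: python

    def numerical_gradient(func, x, eps=1e-6):
        """Central finite-difference gradient of a scalar function of a list."""
        grad = []
        for i in range(len(x)):
            forward = list(x)
            backward = list(x)
            forward[i] += eps
            backward[i] -= eps
            grad.append((func(forward) - func(backward)) / (2.0 * eps))
        return grad


    # Toy function with a known analytical gradient (purely illustrative).
    def toy_function(x):
        return sum(0.5 * xi**2 + xi**3 for xi in x)


    def toy_gradient(x):
        return [xi + 3.0 * xi**2 for xi in x]


    point = [0.3, -1.2, 2.0]
    numeric = numerical_gradient(toy_function, point)
    analytic = toy_gradient(point)
    max_error = max(abs(n - a) for n, a in zip(numeric, analytic))
    print(max_error)

A small maximum difference indicates that the two gradients agree; a systematic mismatch would point at an error in the analytical gradient.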
Setup and tuning
----------------
ARES configuration file and input files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- ARES configuration file:
- Documentation: :ref:`here<configuration_file>`
- Set SIMULATION = True in ARES configuration file,
``<FILENAME>.ini``.
- Set corner0, corner1, corner2 = 0.
- See, for example, `ARES configuration file for BORG runs using
SIMULATION
data <https://datashare.mpcdf.mpg.de/s/wzOJo6XwGDN1bbD>`__
- Halo catalog:
- ASCII format: 10 columns (ID, :math:`M_h`, :math:`R_h`, spin, x, y,
z, :math:`v_x`, :math:`v_y`, :math:`v_z`). See, for example,
`Python scripts to convert AHF output to ASCII catalog for
BORG <https://datashare.mpcdf.mpg.de/s/p0AZJhQEsxFl9M6>`__.
- HDF5 format: similar to above. See, for example, `Python scripts
to convert AHF output to HDF5 catalog for
BORG <https://datashare.mpcdf.mpg.de/s/lEwZDKQGWOsSiYo>`__.
- Trivial HEALPix mask where all pixels are set to 1 (choose an appropriate
NSIDE for your BORG grid resolution).
- Flat selection function in ASCII format. See, for example, `Flat
selection function
file <https://datashare.mpcdf.mpg.de/s/cdBlmHf0PPjuWXx>`__.
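
For the trivial mask listed above, the only quantity that depends on the resolution is the number of pixels, which for a HEALPix map is :math:`12\,\mathrm{NSIDE}^2`. The sketch below builds such a mask as a plain Python list; the helper names are hypothetical, and this sketch only accepts powers of two for NSIDE, which is the conventional choice.

.. code:: python

    def healpix_npix(nside):
        """Number of pixels of a HEALPix map: 12 * nside**2."""
        if nside < 1 or (nside & (nside - 1)) != 0:
            raise ValueError("nside must be a positive power of two")
        return 12 * nside * nside


    def trivial_mask(nside):
        """All-sky mask with every pixel set to 1.0, as a plain Python list."""
        return [1.0] * healpix_npix(nside)


    mask = trivial_mask(64)
    print(len(mask))  # 49152

Such an array can then be written to a FITS file with a HEALPix library of your choice, for example healpy's ``write_map``.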
HMC performance tuning
~~~~~~~~~~~~~~~~~~~~~~
- Grab ``<OUTPUT_DIR>/hmc_performance.txt``.
- Plot :math:`\Delta H` and :math:`|\Delta H|`.
- Tune ``max_epsilon`` and ``max_timestep`` in the ``.ini`` file
accordingly.
- An example of bad HMC performance. The horizontal dashed line denotes
:math:`|\Delta H|=0.5`. Red dots denote negative :math:`\Delta H`:
.. image:: /user/running/BORG_with_simulation_data_files/Bad_HMC.png
- An example of good HMC performance:
.. image:: /user/running/BORG_with_simulation_data_files/Good_HMC.png
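
To relate :math:`\Delta H` to the sampler efficiency, recall that a Hamiltonian Monte Carlo proposal is accepted with probability :math:`\min(1, e^{-\Delta H})`. The following sketch summarises a list of :math:`\Delta H` values; the numbers are made up, the helper names are hypothetical, and no assumption is made here about the column layout of ``hmc_performance.txt``.

.. code:: python

    import math


    def acceptance_probability(delta_h):
        """Metropolis acceptance probability min(1, exp(-delta_h)) of an HMC step."""
        if delta_h <= 0.0:
            return 1.0
        return math.exp(-delta_h)


    def summarize(delta_hs, threshold=0.5):
        """Mean acceptance probability and fraction of steps with |dH| above threshold."""
        n = len(delta_hs)
        mean_accept = sum(acceptance_probability(dh) for dh in delta_hs) / n
        frac_large = sum(abs(dh) > threshold for dh in delta_hs) / n
        return mean_accept, frac_large


    # Made-up values for illustration; real ones would come from the run.
    print(summarize([0.1, -0.2, 0.7, 0.05, 1.5]))

The default threshold of 0.5 matches the dashed line in the plots above; a large fraction of steps beyond it suggests that the integration step is too coarse.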
After-run checks
----------------
Convergence check
~~~~~~~~~~~~~~~~~
- Grab all ``<OUTPUT_DIR>/mcmc_<mcmc_identifier>.h5``.
- Plot :math:`P_{mm, \mathrm{ini}}^s(k)` vs.
:math:`P_{mm, \mathrm{ini}}^{\mathrm{theory}}(k)`.
.. figure:: /user/running/BORG_with_simulation_data_files/Pk_convergence.png
:alt: Pk convergence
BORG_with_simulation_data_files/Pk_convergence.png
Correlation check
~~~~~~~~~~~~~~~~~
- Compute noise residual in each BORG :math:`s`-th sample as
:math:`\vec{\delta}_{\mathrm{res}}^s=\vec{\delta}_{m,\mathrm{ini}}^s-\left\langle\vec{\delta}_{m,\mathrm{ini}}\right\rangle_{s'}`.
- Plot
:math:`r_{\mathrm{residual}}(\Delta s=s'-s)\equiv\frac{\mathrm{Cov}\left(\vec{\delta}_{\mathrm{res}}^s,\,\vec{\delta}_{\mathrm{res}}^{s'}\right)}{\sigma_s \sigma_{s'}}`.
.. figure:: /user/running/BORG_with_simulation_data_files/Residual_correlation_length.png
:alt: Residual correlation length
BORG_with_simulation_data_files/Residual_correlation_length.png
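
The residual correlation defined above can be sketched with NumPy as follows. This is a minimal illustration on a synthetic chain, assuming the samples have been flattened into an array of shape ``(n_samples, n_voxels)``; the function name ``residual_correlation`` is hypothetical, and averaging over all pairs with the same lag is one possible estimator among others.

.. code:: python

    import numpy as np


    def residual_correlation(samples, max_lag):
        """Mean correlation coefficient between residuals separated by a lag.

        `samples` has shape (n_samples, n_voxels); the residual of each sample
        is its difference from the mean over all samples.
        """
        samples = np.asarray(samples, dtype=float)
        residuals = samples - samples.mean(axis=0)
        # Centre and normalise each residual so that dot products are
        # Pearson correlation coefficients over voxels.
        centered = residuals - residuals.mean(axis=1, keepdims=True)
        unit = centered / np.linalg.norm(centered, axis=1, keepdims=True)
        n = len(unit)
        corr = np.empty(max_lag + 1)
        for lag in range(max_lag + 1):
            corr[lag] = np.mean(np.sum(unit[: n - lag] * unit[lag:], axis=1))
        return corr


    # Synthetic chain with a known correlation between successive samples.
    rng = np.random.default_rng(1)
    rho, n_samples, n_voxels = 0.8, 200, 500
    chain = np.empty((n_samples, n_voxels))
    chain[0] = rng.normal(size=n_voxels)
    for s in range(1, n_samples):
        chain[s] = rho * chain[s - 1] + np.sqrt(1 - rho**2) * rng.normal(size=n_voxels)
    corr = residual_correlation(chain, max_lag=10)
    print(corr)

The lag at which the curve drops close to zero gives a rough estimate of the correlation length of the chain.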


View file

@ -0,0 +1,30 @@
Running HADES
=============
Hades3 is built at the same time as ares3. The final binary is located
in ``$BUILD/src/hades3``, which is the main HADES3 program. As before, running
``$BUILD/src/hades3`` without arguments should give the following output:
.. code:: text
setupMPI with threads
Initializing console.
[0/1] [DEBUG ] INIT: MPI/FFTW
[STD ]
[STD ] /\_/\____, ____________________________
[STD ] ,___/\_/\ \ ~ / HADES3
[STD ] \ ~ \ ) XXX
[STD ] XXX / /\_/\___, (c) Jens Jasche 2012 - 2017
[STD ] \o-o/-o-o/ ~ / Guilhem Lavaux 2014 - 2017
[STD ] ) / \ XXX ____________________________
[STD ] _| / \ \_/
[STD ] ,-/ _ \_/ \
[STD ] / ( /____,__| )
[STD ] ( |_ ( ) \) _|
[STD ] _/ _) \ \__/ (_
[STD ] (,-(,(,(,/ \,),),)
[STD ] Please acknowledge XXXX
[0/1] [DEBUG ] INIT: FFTW/WISDOM
[0/1] [INFO ] Starting HADES3. rank=0, size=1
[0/1] [INFO ] ARES3 base version c9e74ec93121f9d99a3b2fecb859206b4a8b74a3
[0/1] [ERROR ] HADES3 requires exactly two parameters: INIT or RESUME as first parameter and the configuration file as second parameter.