better documentation

Martin Reinecke 2019-01-12 12:31:26 +01:00
parent b5eda7fc0a
commit 3a9705e3cc

COMPILE

@@ -1,23 +1,83 @@
Libsharp is configured, compiled and installed using GNU autotools.
The most complicated step for the user is selecting the appropriate compiler
flags (and in some cases the compiler).
Here are a few (hopefully helpful) examples:
GCC, OpenMP, portable executable:
CFLAGS="-std=c99 -O3 -ffast-math -flto -fopenmp" ./configure
GCC, OpenMP, specific optimization for the target CPU:
CFLAGS="-std=c99 -O3 -march=native -ffast-math -flto -fopenmp" ./configure
GCC, no OpenMP, specific optimization for the target CPU:
CFLAGS="-std=c99 -O3 -march=native -ffast-math -flto" ./configure
Clang:
CC=clang CFLAGS="-std=c99 -O3 -march=native -ffast-math -flto -fopenmp" ./configure
MPI support:
CC=mpicc CFLAGS="-DUSE_MPI -std=c99 -O3 -march=native -ffast-math -flto" ./configure
If you have cloned the libsharp repository, you have to run
"autoreconf -i" before starting the configuration, which requires several
GNU developer tools to be available on your system.
When using a release tarball, configuration is done via
[CC=...] [CFLAGS=...] ./configure
The following sections briefly describe possible choices for compilers and
flags.
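The difference between starting from a repository clone and starting from a
release tarball can be sketched as a small shell helper. This is only an
illustration: the function name "libsharp_bootstrap_plan" is hypothetical, and
it assumes that a missing "configure" script is what distinguishes a fresh
clone from an unpacked tarball. It prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper (not part of libsharp): print the setup commands
# for the source tree given as $1.
# Assumption: a fresh clone has no generated "configure" script yet,
# while a release tarball ships one.
libsharp_bootstrap_plan() {
    srcdir="$1"
    if [ ! -f "$srcdir/configure" ]; then
        # Repository clone: generate the build machinery first.
        echo "autoreconf -i"
    fi
    # Both cases: run configure (optionally with CC=... and CFLAGS=...).
    echo "./configure"
}
```

Calling "libsharp_bootstrap_plan ." inside the source directory lists the
steps that still have to be run by hand.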
Fast math
---------
Specifying "-ffast-math" is important for all compilers, since it allows the
compiler to fuse multiplications and additions into FMA instructions, which is
forbidden by the C99 standard. Since FMAs are a central aspect of the algorithm,
they are needed for optimum performance.
If you are calling libsharp from other code which requires strict adherence
to the C99 standard, you should still be able to compile libsharp with
"-ffast-math" without any problems.
Runtime CPU selection with gcc
------------------------------
When using a recent gcc (6.0 and newer) on an x86_64 platform, the build
machinery will compile the time-critical functions for several different
architectures (SSE2, AVX, AVX2, FMA3, FMA4, AVX512F), and the appropriate
implementation will be selected at runtime.
This only happens if you do _not_ explicitly specify a target architecture via
the compiler flags. I.e., please do _not_ specify "-march=native" or
"-mavx" or similar if you want a portable binary that will run
efficiently on different x86_64 CPUs.
If you are compiling libsharp for a particular target CPU only, or if you are
using a different compiler, however, "-march=native" should be used. The
resulting binary will most likely not run on other computers, though.
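The choice between a portable and a CPU-specific build comes down to whether
an architecture flag appears in CFLAGS. A minimal sketch, assuming a
hypothetical helper named "libsharp_cflags" that takes "portable" or "native"
as its first argument and optional extra flags as its second:

```shell
#!/bin/sh
# Hypothetical helper (not part of libsharp): assemble CFLAGS.
# $1: "portable" (no architecture flag) or "native" (adds -march=native)
# $2: optional extra flags, appended unchanged (e.g. "-fopenmp")
libsharp_cflags() {
    flags="-std=c99 -O3"
    # Pin the architecture only for non-portable builds; without it the
    # runtime selection described above stays possible.
    if [ "$1" = "native" ]; then
        flags="$flags -march=native"
    fi
    flags="$flags -ffast-math"
    if [ -n "$2" ]; then
        flags="$flags $2"
    fi
    echo "$flags"
}
```

It could then be used as
CFLAGS="$(libsharp_cflags portable -fopenmp)" ./configure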
OpenMP
------
OpenMP should be switched on for maximum performance, and at runtime
OMP_NUM_THREADS should be set to the number of hardware threads (not physical
cores) of the system.
(Usually this is already the default setting when OMP_NUM_THREADS is not
specified.)
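If OMP_NUM_THREADS has to be set explicitly, the number of hardware threads
can be queried from the system. The following is a sketch only: the function
name "hw_threads" is hypothetical, and it assumes that "getconf" (or, as a
fallback, "nproc --all") is available, which is the case on typical Linux
systems.

```shell
#!/bin/sh
# Hypothetical helper (not part of libsharp): print the number of online
# logical CPUs, i.e. hardware threads rather than physical cores.
hw_threads() {
    # getconf reports online logical CPUs on Linux and macOS.
    n="$(getconf _NPROCESSORS_ONLN 2>/dev/null)" || n=""
    if [ -z "$n" ]; then
        # Fallback; "--all" makes nproc ignore OMP_NUM_THREADS itself.
        n="$(nproc --all 2>/dev/null)" || n=""
    fi
    # Last resort: assume a single thread.
    echo "${n:-1}"
}

OMP_NUM_THREADS="$(hw_threads)"
export OMP_NUM_THREADS
```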
MPI
---
MPI support is enabled by using the MPI compiler (typically "mpicc") _and_
adding the flag "-DUSE_MPI".
When using MPI and OpenMP simultaneously, the product of MPI tasks per node
and OMP_NUM_THREADS should be equal to the number of hardware threads available
on the node. One MPI task per node should result in the best performance.
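The rule for hybrid runs is plain arithmetic: OMP_NUM_THREADS is the number of
hardware threads on the node divided by the number of MPI tasks per node. A
small sketch, with the hypothetical function name "omp_threads_per_task" and
the assumption that the task count divides the thread count evenly:

```shell
#!/bin/sh
# Hypothetical helper (not part of libsharp).
# $1: hardware threads available on the node
# $2: MPI tasks per node
# Prints the OMP_NUM_THREADS value for which
# (tasks per node) * OMP_NUM_THREADS = hardware threads.
omp_threads_per_task() {
    hw="$1"
    tasks="$2"
    if [ "$tasks" -lt 1 ] || [ $((hw % tasks)) -ne 0 ]; then
        echo "tasks per node must evenly divide the hardware thread count" >&2
        return 1
    fi
    echo $((hw / tasks))
}
```

For a node with 16 hardware threads, "omp_threads_per_task 16 1" gives 16 and
"omp_threads_per_task 16 4" gives 4.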
Example configure invocations
=============================
GCC, OpenMP, portable binary:
CFLAGS="-std=c99 -O3 -ffast-math -fopenmp" ./configure
GCC, no OpenMP, portable binary:
CFLAGS="-std=c99 -O3 -ffast-math" ./configure
Clang, OpenMP, nonportable binary:
CC=clang CFLAGS="-std=c99 -O3 -march=native -ffast-math -fopenmp" ./configure
Intel C compiler, OpenMP, nonportable binary:
CC=icc CFLAGS="-std=c99 -O3 -march=native -ffast-math -fopenmp" ./configure
MPI support, nonportable binary:
CC=mpicc CFLAGS="-DUSE_MPI -std=c99 -O3 -march=native -ffast-math" ./configure
Additional GCC flags for pedantic warnings and debugging: