libsharp2/COMPILE
2019-12-06 13:53:27 +01:00

Libsharp2 is configured, compiled and installed using GNU autotools.
If you have cloned the libsharp2 repository, you have to run
"autoreconf -i" before starting the configuration, which requires several
GNU developer tools to be available on your system.
When using a release tarball, configuration is done via

  [CC=...] [CFLAGS=...] ./configure

The following sections briefly describe possible choices for compilers and
flags.
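For a fresh git clone, the complete sequence might look like the following
sketch. It assumes the standard autotools targets "make" and "make install";
the PREFIX location and the helper name build_libsharp2 are illustrative
choices, not part of libsharp2.

```shell
#!/bin/sh
# Sketch of a full build, to be run from the top-level source directory.
# PREFIX is a hypothetical install location chosen for illustration.
PREFIX="${PREFIX:-$HOME/libsharp2-install}"

build_libsharp2() {
    # Only needed for a git clone: generate ./configure first.
    if [ ! -x ./configure ]; then
        autoreconf -i || return 1
    fi
    # Configure, compile and install with a basic set of flags.
    CFLAGS="-std=c99 -O3 -ffast-math" ./configure --prefix="$PREFIX" || return 1
    make || return 1
    make install
}
```

Calling build_libsharp2 inside a release tarball skips the autoreconf step,
because ./configure already exists there.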

Fast math
---------
Specifying "-ffast-math" or "-ffp-contract=fast" is important for all compilers,
since it allows the compiler to fuse multiplications and additions into FMA
instructions, which compilers may not do by default in strict C99 mode (GCC,
for example, disables FMA contraction under "-std=c99"). Since FMAs are a
central aspect of the algorithm, they are needed for optimum performance.

If you are calling libsharp2 from other code which requires strict adherence
to the C99 standard, you should still be able to compile libsharp2 with
"-ffast-math" without any problems.

Runtime CPU selection with gcc and clang
----------------------------------------
When using a recent gcc (6.0 and newer) or a recent clang (successfully tested
with versions 6 and 7) on an x86_64 platform, the build machinery can compile
the time-critical functions for several different architectures (SSE2, AVX,
AVX2, FMA3, FMA4, AVX512F), and the appropriate implementation will be selected
at runtime.
This is enabled by passing "-DMULTIARCH" as part of the CFLAGS.
If this is enabled, please do _not_ specify "-march=native" or
"-mavx" or similar! Such flags apply to the whole binary and defeat the
runtime selection.

If you are compiling libsharp2 for a particular target CPU only, or if you are
using a different compiler, however, "-march=native" should be used. The
resulting binary will most likely not run on other computers, though.
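
To get a rough idea of which code path the runtime selection could pick on a
given machine, the CPU flags can be inspected. The following is a Linux-only
sketch that reads /proc/cpuinfo; the helper names cpu_has and report_isa are
made up for this example, and the script does not query libsharp2 itself.

```shell
#!/bin/sh
# List which of the instruction sets named above the current CPU advertises.

cpu_has() {
    # $1: flag name as spelled in /proc/cpuinfo (FMA3 appears as "fma").
    grep -m1 '^flags' /proc/cpuinfo 2>/dev/null | tr ' ' '\n' | grep -qx "$1"
}

report_isa() {
    for isa in sse2 avx avx2 fma fma4 avx512f; do
        if cpu_has "$isa"; then
            echo "$isa: yes"
        else
            echo "$isa: no"
        fi
    done
}

report_isa
```

On systems without /proc/cpuinfo every entry is reported as "no", so the
output is only meaningful on Linux.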

OpenMP
------
OpenMP is enabled by default if the selected compiler supports it.
It can be disabled by passing "--disable-openmp" to configure.

At runtime, OMP_NUM_THREADS should be set to the number of hardware threads
(not physical cores) of the system.
(Usually this is already the default setting when OMP_NUM_THREADS is not
specified.)
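
If the variable has to be set explicitly, the number of hardware threads can
be queried from the system. This is a minimal sketch, assuming that either
"getconf _NPROCESSORS_ONLN" or GNU "nproc" is available; hw_threads is a
hypothetical helper name.

```shell
#!/bin/sh
# Determine the number of hardware threads (logical CPUs) and export it.
# getconf _NPROCESSORS_ONLN is widely available (glibc, macOS) but is not
# required by POSIX; "nproc --all" from GNU coreutils serves as a fallback.
# Plain "nproc" is avoided because it takes an already set OMP_NUM_THREADS
# into account.
hw_threads() {
    getconf _NPROCESSORS_ONLN 2>/dev/null || nproc --all
}

OMP_NUM_THREADS="$(hw_threads)"
export OMP_NUM_THREADS
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```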

MPI
---
MPI support is enabled by using the MPI compiler (typically "mpicc") _and_
adding the flag "-DUSE_MPI" to the CFLAGS.
When using MPI and OpenMP simultaneously, the product of MPI tasks per node
and OMP_NUM_THREADS should be equal to the number of hardware threads available
on the node. One MPI task per node should result in the best performance.
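
The rule above can be turned into a small calculation. The sketch below
derives OMP_NUM_THREADS from the number of hardware threads and the number of
MPI tasks per node; threads_per_task is a hypothetical helper, and the value
16 is only an example node size.

```shell
#!/bin/sh
# threads_per_task HW_THREADS TASKS_PER_NODE
# Prints the OMP_NUM_THREADS value for which
# TASKS_PER_NODE * OMP_NUM_THREADS equals HW_THREADS.
threads_per_task() {
    hw=$1
    tasks=$2
    if [ "$tasks" -lt 1 ] || [ $((hw % tasks)) -ne 0 ]; then
        echo "tasks per node must evenly divide $hw" >&2
        return 1
    fi
    echo $((hw / tasks))
}

threads_per_task 16 1   # one MPI task per node: prints 16
threads_per_task 16 4   # four MPI tasks per node: prints 4
```

The printed value would then be exported as OMP_NUM_THREADS before the MPI
job is launched.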

Example configure invocations
=============================

GCC, OpenMP, portable binary:
  CFLAGS="-DMULTIARCH -std=c99 -O3 -ffast-math" ./configure

GCC, no OpenMP, portable binary:
  CFLAGS="-DMULTIARCH -std=c99 -O3 -ffast-math" ./configure --disable-openmp

Clang, OpenMP, portable binary:
  CC=clang CFLAGS="-DMULTIARCH -std=c99 -O3 -ffast-math" ./configure

Intel C compiler, OpenMP, nonportable binary:
  CC=icc CFLAGS="-std=c99 -O3 -march=native -ffast-math -D__PURE_INTEL_C99_HEADERS__" ./configure

MPI support, OpenMP, portable binary:
  CC=mpicc CFLAGS="-DUSE_MPI -DMULTIARCH -std=c99 -O3 -ffast-math" ./configure

Additional GCC flags for pedantic warnings and debugging:
  -Wall -Wextra -Wshadow -Wmissing-prototypes -Wfatal-errors -pedantic -g
  -fsanitize=address
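
The choices described in this file can also be combined mechanically. The
following sketch assembles a configure command line from three yes/no
switches; configure_line is a hypothetical helper that only prints the
command and does not run it.

```shell
#!/bin/sh
# configure_line PORTABLE OPENMP MPI   (each argument is "yes" or "no")
# Prints a configure invocation following the rules described above.
configure_line() {
    cflags="-std=c99 -O3 -ffast-math"
    cc=""
    extra=""
    if [ "$1" = yes ]; then
        # Portable binary: runtime CPU selection, no -march=native.
        cflags="-DMULTIARCH $cflags"
    else
        # Binary tuned for the build machine only.
        cflags="$cflags -march=native"
    fi
    # OpenMP is on by default, so only the "no" case needs an option.
    [ "$2" = yes ] || extra=" --disable-openmp"
    if [ "$3" = yes ]; then
        # MPI needs both the MPI compiler wrapper and -DUSE_MPI.
        cc="CC=mpicc "
        cflags="-DUSE_MPI $cflags"
    fi
    echo "${cc}CFLAGS=\"$cflags\" ./configure$extra"
}

configure_line yes no no
configure_line yes yes yes
```

The two calls reproduce the "GCC, no OpenMP, portable binary" and the
"MPI support, OpenMP, portable binary" examples listed above.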