Libsharp2 is configured, compiled and installed using GNU autotools. If you have cloned the libsharp2 repository, you have to run "autoreconf -i" before starting the configuration, which requires several GNU developer tools to be available on your system. When using a release tarball, configuration is done via [CC=...] [CFLAGS=...] ./configure The following sections briefly describe possible choices for compilers and flags. Fast math --------- Specifying "-ffast-math" or "-ffp-contract=fast" is important for all compilers, since it allows the compiler to fuse multiplications and additions into FMA instructions, which is forbidden by the C99 standard. Since FMAs are a central aspect of the algorithm, they are needed for optimum performance. If you are calling libsharp2 from other code which requires strict adherence to the C99 standard, you should still be able to compile libsharp2 with "-ffast-math" without any problems. Runtime CPU selection with gcc and clang ---------------------------------------- When using a recent gcc (6.0 and newer) or a recent clang (successfully tested with versions 6 and 7) on an x86_64 platform, the build machinery can compile the time-critical functions for several different architectures (SSE2, AVX, AVX2, FMA3, FMA4, AVX512F), and the appropriate implementation will be selected at runtime. This is enabled by passing "-DMULTIARCH" as part of the CFLAGS. If this is enabled, please do _not_ specify "-march=native" or "-mtarget=avx" or similar! If you are compiling libsharp2 for a particular target CPU only, or if you are using a different compiler, however, "-march-native" should be used. The resulting binary will most likely not run on other computers, though. OpenMP ------ OpenMP is enabled by default if the selected compiler supports it. It can be disabled at configuration time by specifying "--disable-openmp" at the configure command line. At runtime OMP_NUM_THREADS should be set to the number of hardware threads (not physical cores) of the system. (Usually this is already the default setting when OMP_NUM_THREADS is not specified.) MPI --- MPI support is enabled by using the MPI compiler (typically "mpicc") _and_ adding the flag "-DUSE_MPI". When using MPI and OpenMP simultaneously, the product of MPI tasks per node and OMP_NUM_THREADS should be equal to the number of hardware threads available on the node. One MPI task per node should result in the best performance. Example configure invocations ============================= GCC, OpenMP, portable binary: CFLAGS="-DMULTIARCH -std=c99 -O3 -ffast-math" ./configure GCC, no OpenMP, portable binary: CFLAGS="-DMULTIARCH -std=c99 -O3 -ffast-math" ./configure --disable-openmp Clang, OpenMP, portable binary: CC=clang CFLAGS="-DMULTIARCH -std=c99 -O3 -ffast-math" ./configure Intel C compiler, OpenMP, nonportable binary: CC=icc CFLAGS="-std=c99 -O3 -march=native -ffast-math -D__PURE_INTEL_C99_HEADERS__" ./configure MPI support, OpenMP, portable binary: CC=mpicc CFLAGS="-DUSE_MPI -DMULTIARCH -std=c99 -O3 -ffast-math" ./configure Additional GCC flags for pedantic warning and debugging: -Wall -Wextra -Wshadow -Wmissing-prototypes -Wfatal-errors -pedantic -g -fsanitize=address