diff --git a/COMPILE b/COMPILE
index 8823bd3..8a5f3cd 100644
--- a/COMPILE
+++ b/COMPILE
@@ -1,23 +1,83 @@
 Libsharp is configured, compiled and installed using GNU autotools.
 
-The most complicated step for the user is selecting the appropriate compiler
-flags (and in some cases the compiler).
-Here are a few (hopefully helpful) examples:
+If you have cloned the libsharp repository, you have to run
+"autoreconf -i" before starting the configuration, which requires several
+GNU developer tools to be available on your system.
 
-GCC, OpenMP, portable executable:
-CFLAGS="-std=c99 -O3 -ffast-math -flto -fopenmp" ./configure
+When using a release tarball, configuration is done via
 
-GCC, OpenMP, specific optimization for the target CPU:
-CFLAGS="-std=c99 -O3 -march=native -ffast-math -flto -fopenmp" ./configure
+[CC=...] [CFLAGS=...] ./configure
 
-GCC, no OpenMP, specific optimization for the target CPU:
-CFLAGS="-std=c99 -O3 -march=native -ffast-math -flto" ./configure
+The following sections briefly describe possible choices for compilers and
+flags.
 
-Clang:
-CC=clang CFLAGS="-std=c99 -O3 -march=native -ffast-math -flto -fopenmp" ./configure
 
-MPI support:
-CC=mpicc CFLAGS="-DUSE_MPI -std=c99 -O3 -march=native -ffast-math -flto" ./configure
+Fast math
+---------
+
+Specifying "-ffast-math" is important for all compilers, since it allows the
+compiler to fuse multiplications and additions into FMA instructions, which is
+forbidden by the C99 standard. Since FMAs are a central aspect of the algorithm,
+they are needed for optimum performance.
+
+If you are calling libsharp from other code which requires strict adherence
+to the C99 standard, you should still be able to compile libsharp with
+"-ffast-math" without any problems.
+
+
+Runtime CPU selection with gcc
+------------------------------
+
+When using a recent gcc (6.0 and newer) on an x86_64 platform, the build
+machinery will compile the time-critical functions for several different
+architectures (SSE2, AVX, AVX2, FMA3, FMA4, AVX512F), and the appropriate
+implementation will be selected at runtime.
+This only happens if you do _not_ explicitly specify a target architecture via
+the compiler flags. I.e., please do _not_ specify "-march=native" or
+"-mavx" or similar if you want a portable binary that will run
+efficiently on different x86_64 CPUs.
+If you are compiling libsharp for a particular target CPU only, or if you are
+using a different compiler, however, "-march=native" should be used. The
+resulting binary will most likely not run on other computers, though.
+
+
+OpenMP
+------
+
+OpenMP should be switched on for maximum performance, and at runtime
+OMP_NUM_THREADS should be set to the number of hardware threads (not physical
+cores) of the system.
+(Usually this is already the default setting when OMP_NUM_THREADS is not
+specified.)
+
+
+MPI
+---
+
+MPI support is enabled by using the MPI compiler (typically "mpicc") _and_
+adding the flag "-DUSE_MPI".
+When using MPI and OpenMP simultaneously, the product of MPI tasks per node
+and OMP_NUM_THREADS should be equal to the number of hardware threads available
+on the node. One MPI task per node should result in the best performance.
+
+
+Example configure invocations
+=============================
+
+GCC, OpenMP, portable binary:
+CFLAGS="-std=c99 -O3 -ffast-math -fopenmp" ./configure
+
+GCC, no OpenMP, portable binary:
+CFLAGS="-std=c99 -O3 -ffast-math" ./configure
+
+Clang, OpenMP, nonportable binary:
+CC=clang CFLAGS="-std=c99 -O3 -march=native -ffast-math -fopenmp" ./configure
+
+Intel C compiler, OpenMP, nonportable binary:
+CC=icc CFLAGS="-std=c99 -O3 -march=native -ffast-math -fopenmp" ./configure
+
+MPI support, nonportable binary:
+CC=mpicc CFLAGS="-DUSE_MPI -std=c99 -O3 -march=native -ffast-math" ./configure
 
 Additional GCC flags for pedantic warning and debugging: