better documentation

2019-01-12 12:31:26 +01:00 · 2019-01-12 12:31:26 +01:00 · 3a9705e3cc
commit 3a9705e3cc
parent b5eda7fc0a
1 changed files with 73 additions and 13 deletions
--- a/86
+++ b/86
@ -1,23 +1,83 @@
 Libsharp is configured, compiled and installed using GNU autotools.
-The most complicated step for the user is selecting the appropriate compiler
-flags (and in some cases the compiler).

-Here are a few (hopefully helpful) examples:
+If you have cloned the libsharp repository, you have to run
+"autoreconf -i" before starting the configuration, which requires several
+GNU developer tools to be available on your system.

-GCC, OpenMP, portable executable:
-CFLAGS="-std=c99 -O3 -ffast-math -flto -fopenmp" ./configure
+When using a release tarball, configuration is done via

-GCC, OpenMP, specific optimization for the target CPU:
-CFLAGS="-std=c99 -O3 -march=native -ffast-math -flto -fopenmp" ./configure
+[CC=...] [CFLAGS=...] ./configure

-GCC, no OpenMP, specific optimization for the target CPU:
-CFLAGS="-std=c99 -O3 -march=native -ffast-math -flto" ./configure
+The following sections briefly describe possible choices for compilers and
+flags.

-Clang:
-CC=clang CFLAGS="-std=c99 -O3 -march=native -ffast-math -flto -fopenmp" ./configure

-MPI support:
-CC=mpicc CFLAGS="-DUSE_MPI -std=c99 -O3 -march=native -ffast-math -flto" ./configure
+Fast math
+---------
+
+Specifying "-ffast-math" is important for all compilers, since it allows the
+compiler to fuse multiplications and additions into FMA instructions, which is
+forbidden by the C99 standard. Since FMAs are a central aspect of the algorithm,
+they are needed for optimum performance.
+
+If you are calling libsharp from other code which requires strict adherence
+to the C99 standard, you should still be able to compile libsharp with
+"-ffast-math" without any problems.
+
+
+Runtime CPU selection with gcc
+------------------------------
+
+When using a recent gcc (6.0 and newer) on an x86_64 platform, the build
+machinery will compile the time-critical functions for several different
+architectures (SSE2, AVX, AVX2, FMA3, FMA4, AVX512F), and the appropriate
+implementation will be selected at runtime.
+This only happens if you do _not_ explicitly specify a target architecture via
+the compiler flags. I.e., please do _not_ specify "-march=native" or
+"-mtarget=avx" or similar if you want a portable binary that will run
+efficiently on different x86_64 CPUs.
+If you are compiling libsharp for a particular target CPU only, or if you are
+using a different compiler, however, "-march-native" should be used. The
+resulting binary will most likely not run on other computers, though.
+
+
+OpenMP
+------
+
+OpenMP should be switched on for maximum performance, and at runtime
+OMP_NUM_THREADS should be set to the number of hardware threads (not physical
+cores) of the system.
+(Usually this is  already the default setting when OMP_NUM_THREADS is not
+specified.)
+
+
+MPI
+---
+
+MPI support is enabled by using the MPI compiler (typically "mpicc") _and_
+adding the flag "-DUSE_MPI".
+When using MPI and OpenMP simultaneously, the product of MPI tasks per node
+and OMP_NUM_THREADS should be equal to the number of hardware threads available
+on the node. One MPI task per node should result in the best performance.
+
+
+Example configure invocations
+=============================
+
+GCC, OpenMP, portable binary:
+CFLAGS="-std=c99 -O3 -ffast-math -fopenmp" ./configure
+
+GCC, no OpenMP, portable binary:
+CFLAGS="-std=c99 -O3 -ffast-math" ./configure
+
+Clang, OpenMP, nonportable binary:
+CC=clang CFLAGS="-std=c99 -O3 -march=native -ffast-math -fopenmp" ./configure
+
+Intel C compiler, OpenMP, nonportable binary:
+CC=icc CFLAGS="-std=c99 -O3 -march=native -ffast-math -fopenmp" ./configure
+
+MPI support, nonportable binary:
+CC=mpicc CFLAGS="-DUSE_MPI -std=c99 -O3 -march=native -ffast-math" ./configure

 Additional GCC flags for pedantic warning and debugging: