OpenMP is a convenient directive-based approach to shared-memory work sharing. In GS2 this is currently restricted to distributing loop iterations over a team of threads. Only a small subset of loops is parallelised with OpenMP; however, these should mostly represent the main areas of work in standard simulations.
To compile with OpenMP one can simply pass USE_OPENMP=on to make. We
also support the use of threaded FFTW routines (FFTW3 only), so if
building with FFTs one may need to add an additional library,
-lfftw3_omp, to FFT_LIB. We attempt to handle this automatically in
the Makefile, but it may be necessary to modify this if using a
different library (e.g. MKL). One can then simply do:
make USE_OPENMP=on gs2
to build GS2 with OpenMP support enabled.
The environment variable OMP_NUM_THREADS controls the maximum number
of threads to be used, and this maximum is reported during GS2
initialisation. If unset, this will typically default to the number of
cores on the system. It is usually advisable to set OMP_NUM_THREADS
explicitly. Whilst it is possible to over-subscribe a machine
(i.e. nproc * omp_num_threads > ncpu), this will generally
significantly harm performance.
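The over-subscription condition above can be checked before launching a job. The following is a minimal sketch only: the helper name check_subscription and the example rank, thread, and core counts are hypothetical and not part of GS2.

```shell
#!/bin/sh
# Hypothetical helper (not part of GS2): report whether
# ranks * threads exceeds the number of available cores.
# Arguments: <MPI ranks on the node> <threads per rank> <cores on the node>
check_subscription() {
    ranks=$1
    threads=$2
    cores=$3
    if [ $((ranks * threads)) -gt "$cores" ]; then
        echo "over-subscribed"
    else
        echo "ok"
    fi
}

# Set the thread count explicitly rather than relying on the default.
export OMP_NUM_THREADS=4

# Illustrative numbers: 8 ranks with 4 threads each on a 32-core node.
check_subscription 8 "$OMP_NUM_THREADS" 32
```

With these illustrative numbers the product equals the core count exactly, so the node is fully used but not over-subscribed.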
The optimal choice for OMP_NUM_THREADS depends on both the problem
size and the characteristics of the machine. For example, on Archer2
there are 16 cores per NUMA region (sharing main memory), but
performance often drops considerably when using more than four
threads, as groups of four cores share L3 cache.
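As a rough illustration of how the thread count fixes the number of MPI ranks per node, the sketch below divides the cores on a node by the threads per rank. The value of 128 cores per node is an assumption made for this example and should be replaced with the figure for the machine in use; the helper name ranks_per_node is hypothetical.

```shell
#!/bin/sh
# Assumption for illustration: 128 cores per node. Adjust for your machine.
CORES_PER_NODE=128

# Four threads per rank, so that each rank's threads can share one L3 cache.
export OMP_NUM_THREADS=4

# Hypothetical helper: number of MPI ranks that fill a node without
# over-subscribing it. Fails if the thread count does not divide the cores.
ranks_per_node() {
    cores=$1
    threads=$2
    if [ $((cores % threads)) -ne 0 ]; then
        echo "thread count does not divide core count" >&2
        return 1
    fi
    echo $((cores / threads))
}

RANKS=$(ranks_per_node "$CORES_PER_NODE" "$OMP_NUM_THREADS")
echo "ranks per node: $RANKS, threads per rank: $OMP_NUM_THREADS"
```

The resulting rank and thread counts would then be passed to the job launcher; whether threads are actually pinned to cores sharing a cache depends on the launcher's binding options, which this sketch does not cover.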
It is often recommended to run ulimit -s unlimited prior to
launching OpenMP-enabled executables. In addition, one may need
to set the OMP_STACKSIZE environment variable to ensure that each
thread has a sufficient stack size. Failing to set this can lead to
"stack smashing detected" run-time error messages. This is
particularly important at higher OMP_NUM_THREADS values when the
local problem size is large.
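A typical environment setup before launching might look like the following sketch. The stack size of 512M and the thread count of 4 are illustrative values only, not GS2 recommendations; suitable values depend on the problem size and the machine.

```shell
#!/bin/sh
# Raise the stack limit for the initial thread. This can fail if the
# hard limit set by the system is lower, so report rather than abort.
ulimit -s unlimited 2>/dev/null || echo "could not raise stack limit" >&2

# Stack size for the threads created by the OpenMP runtime.
# 512M is an illustrative value, not a GS2 recommendation.
export OMP_STACKSIZE=512M

# Illustrative thread count.
export OMP_NUM_THREADS=4

echo "OMP_STACKSIZE=$OMP_STACKSIZE OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

The two settings are complementary: ulimit -s affects the initial thread of each process, whereas OMP_STACKSIZE applies to the additional threads created by the OpenMP runtime.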
There are generally two motivations for using OpenMP in GS2: