Heisenbug
heisenbug is a computing cluster of the computational seismology group at LMU. It is an AMD EPYC based machine with 128 cores that can run 256 threads (nearly) simultaneously. It also has 2 GPGPUs (NVIDIA GeForce RTX 3090), which can be used to run the GPU version of SeisSol. The RTX 3090 is a consumer-grade graphics card and therefore has low double-precision throughput. It is thus preferable to compile SeisSol with single precision.
A module integrating all libraries relevant for compiling SeisSol with CUDA and SYCL is available on heisenbug.
It can be discovered at startup after adding the following to ~/.bashrc:
module use /import/exception-dump/ulrich/spack/modules/linux-debian11-zen2
It is then loaded with:
# load the first seissol-env module in the list that was compiled with cuda support
module load $(module avail seissol-env/*-cuda-* | awk '/seissol-env/ {print $1; exit}')
This module has been compiled based on the main branch of https://github.com/SeisSol/seissol-spack-aid with the command:
spack install -j 40 --fresh seissol-env +cuda %gcc@10
spack module tcl refresh $(spack find -d --format "{name}{/hash:5}" seissol-env +cuda)
Install YATeTo GPU backends (i.e., GemmForge and ChainForge) as shown here.
Then clone SeisSol with:
git clone https://github.com/SeisSol/SeisSol.git
cd SeisSol
git submodule update --init --recursive
To compile the GPU version of SeisSol on heisenbug, use the following CMake options:
-DDEVICE_ARCH=sm_86 -DHOST_ARCH=hsw -DDEVICE_BACKEND=cuda -DPRECISION=single -DHIPSYCL_CUDA_PATH=$CUDA_HOME
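As a minimal sketch, the options above could be passed to CMake as follows. The build directory name `build-gpu`, the parallelism `-j 16`, and the `run` dry-run helper are assumptions made for illustration and are not part of the SeisSol documentation. `CUDA_HOME` is expected to be set by the seissol-env module. By default the script only prints the commands, so they can be reviewed first.

```shell
# Options copied from the text above.
CMAKE_OPTS="-DDEVICE_ARCH=sm_86 -DHOST_ARCH=hsw -DDEVICE_BACKEND=cuda -DPRECISION=single -DHIPSYCL_CUDA_PATH=${CUDA_HOME:-}"

# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Configure from the SeisSol checkout into a separate build directory
# (CMAKE_OPTS is left unquoted on purpose so it splits into separate options).
run cmake -S . -B build-gpu $CMAKE_OPTS

# Build; 16 parallel jobs is an arbitrary choice for a shared machine.
run cmake --build build-gpu -j 16
```

Run it from the SeisSol directory with `DRY_RUN=0` to actually configure and build.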
As there is no queuing system on heisenbug, you need to make sure that nobody is running anything on the GPUs.
You can check this by running nvidia-smi (it should return No running processes found).
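The check can also be scripted, for example before launching a long run. The sketch below relies on the `--query-compute-apps` and `--format=csv,noheader` options of nvidia-smi. The helper name `count_gpu_procs` is hypothetical and the messages are illustrative only.

```shell
# Hypothetical helper: count the compute processes in a PID list read
# from stdin (one PID per line, blank lines ignored).
count_gpu_procs() {
  # grep -c prints the number of matching lines; it exits non-zero
  # when the count is 0, hence the "|| true".
  grep -c '[0-9]' || true
}

# Only query the GPUs if nvidia-smi is available on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
  n=$(nvidia-smi --query-compute-apps=pid --format=csv,noheader | count_gpu_procs)
  if [ "$n" -eq 0 ]; then
    echo "GPUs look idle"
  else
    echo "$n compute process(es) already running on the GPUs"
  fi
fi
```

This only reports compute processes that are running at the moment of the query, so it does not replace coordinating with other users.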
To run on one GPU (here with order 4, elastic), use:
export OMP_NUM_THREADS=1
export OMP_PLACES="cores"
export OMP_PROC_BIND=spread
./launch ./SeisSol_RelWithDebInfo_ssm_86_cuda_4_elastic ./parameters.par
launch is a simple bash helper script, generated by CMake in the build directory.
On 2 ranks, use:
# Note that it is possible to increase OMP_NUM_THREADS
# This will speed up the (rare) portions of the code that run only on the CPU, e.g. the wiggle factor calculation
export OMP_NUM_THREADS=1
export OMP_PLACES="cores"
export OMP_PROC_BIND=spread
mpirun -n 2 --map-by ppr:1:numa:pe=2 --report-bindings ./launch ./SeisSol_RelWithDebInfo_ssm_86_cuda_4_elastic ./parameters.par
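If OMP_NUM_THREADS is increased, it is usually sensible to bind the same number of cores to each rank through the `pe=` value of `--map-by`. The following is a sketch under that assumption. The helper name `build_mpirun_cmd` is hypothetical, and it only prints the resulting command line so that it can be inspected before use.

```shell
# Hypothetical helper: print an mpirun command line in which
# OMP_NUM_THREADS matches the number of cores bound to each rank (pe=).
build_mpirun_cmd() {
  ranks=$1           # number of MPI ranks (one per GPU)
  cores_per_rank=$2  # cores bound to each rank, also used as OMP_NUM_THREADS
  echo "OMP_NUM_THREADS=${cores_per_rank} mpirun -n ${ranks} --map-by ppr:1:numa:pe=${cores_per_rank} --report-bindings ./launch ./SeisSol_RelWithDebInfo_ssm_86_cuda_4_elastic ./parameters.par"
}

# Example: 2 ranks with 4 cores (and 4 OpenMP threads) each.
build_mpirun_cmd 2 4
```

The bindings printed by `--report-bindings` at startup show whether each rank actually received the requested cores.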