Build architectures
Currently, SeisSol needs information about the host architecture on which the code is going to run.
Besides setting the necessary compiler tuning flags (usually corresponding to `-march=TARGET_ARCH -mtune=TARGET_ARCH`), this choice also selects the code generators.
CPU architectures
| | Architecture | Notes | CPUs (examples) | Usage |
|---|---|---|---|---|
| | No architecture-specific optimizations | Generates plain x86-64 instructions, without SIMD instructions like SSE/AVX/AMX etc. | | |
| | Intel Nehalem/Westmere architecture | Generates SSE instructions (up to SSE 3). | Intel Xeon v1/v0, Intel Core i3/i5/i7 ??? and 1??? | |
| | Intel Sandy Bridge architecture | Generates AVX instructions. | Intel Xeon v2, Intel Core i3/i5/i7 2??? and 3??? | |
| | Intel Haswell | Generates AVX2 instructions. | Intel Xeon v3, Intel Core i3/i5/i7/i9 4??? to 14??? (mostly), AMD Zen 1 to 3 | Older Intel CPU clusters, clusters with AMD CPUs (up to 2023) |
| | Intel Skylake-X (including Skylake-SP) | Generates AVX-512{F,CD,BW,DQ,VL} instructions. (NOTE: Skylake desktop processors are NOT included here, unless they contain an “X” in their name, e.g. i9 7800X) | Intel Xeon v4 and onward (i.e. including the “metal”-branded Xeons), some Intel Core i9 models (check the Intel database), AMD Zen 4 | Most CPU clusters, e.g. SuperMUC NG (Phase 1), Frontera |
| | Intel Knights Corner (Xeon Phi coprocessor) | Generates Knights Corner-specific instructions. | Intel Xeon Phi coprocessor | (not known anymore) |
| | Intel Knights Landing (Xeon Phi, optionally as coprocessor) | Generates AVX-512{F,CD,PF,ER} instructions. | Intel Xeon Phi coprocessor, as well as | LRZ CoolMUC 3 |
| | AMD Zen 1 | Generates AVX2 instructions. For the libxsmm kernel generator, it is deemed equivalent to | Ryzen 1xxx series | |
| | AMD Zen 2 | Generates AVX2 instructions. For the libxsmm kernel generator, it is deemed equivalent to | Ryzen 3??? series, 7?2?, 8?2? series | LUMI (CPU partition) |
| | AMD Zen 3 | Generates AVX2 instructions. For the libxsmm kernel generator, it is deemed equivalent to | Ryzen 5??? series, 7?3?, 8?3? series | LUMI (GPU partition), Frontier (GPU partition) |
| | AMD Zen 4 | Generates AVX-512 instructions. For the libxsmm kernel generator, it is deemed equivalent to | Ryzen 7?4? series, MI300A | |
| | IBM PowerPC 9 | | | |
| | ARM ThunderX2 (ARM NEON) | | ARM ThunderX2 | Isambard 2 |
| | Fujitsu A64FX (ARM SVE, 512 bits) | | Fujitsu A64FX | Fugaku |
| | Dummy target for AARCH64 (with NEON) | | | |
| | Dummy target for AARCH64, ARM SVE with 128 bits length | Needed e.g. for the Neoverse V2 CPU | | |
| | Dummy target for AARCH64, ARM SVE with 256 bits length | | | |
| | Dummy target for AARCH64, ARM SVE with 512 bits length | | | |
| | Dummy target for AARCH64, ARM SVE with 1024 bits length | | | |
| | Dummy target for AARCH64, ARM SVE with 2048 bits length | | | |
| | Apple M1 CPU | | | |
| | Apple M2 CPU | | | |
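To pick a row from the CPU table, it helps to know which SIMD extensions a machine actually reports. The following is a minimal sketch, assuming a Linux x86 system whose `/proc/cpuinfo` lists feature flags such as `pni` (SSE3), `avx`, `avx2` and the `avx512*` family. The helper names `simd_level` and `read_cpu_flags` are hypothetical and not part of SeisSol; the sketch only reports the instruction-set level and does not map it to a SeisSol architecture setting.

```python
# Hypothetical helpers for illustration only; they are not part of SeisSol.

def simd_level(flags):
    """Return the highest x86 SIMD level from the CPU table that `flags` covers."""
    flags = set(flags)
    # AVX-512 subsets as listed in the table for Skylake-X and Knights Landing.
    skylake_x = {"avx512f", "avx512cd", "avx512bw", "avx512dq", "avx512vl"}
    knights_landing = {"avx512f", "avx512cd", "avx512pf", "avx512er"}
    if skylake_x <= flags:
        return "AVX-512 {F,CD,BW,DQ,VL}"
    if knights_landing <= flags:
        return "AVX-512 {F,CD,PF,ER}"
    if "avx2" in flags:
        return "AVX2"
    if "avx" in flags:
        return "AVX"
    if "pni" in flags:  # Linux reports SSE3 under the name "pni"
        return "SSE3"
    return "no SIMD extensions from the table"


def read_cpu_flags(path="/proc/cpuinfo"):
    """Collect the feature flags of the first CPU listed in a Linux cpuinfo file."""
    with open(path) as handle:
        for line in handle:
            if line.startswith("flags"):
                return line.split(":", 1)[1].split()
    return []  # e.g. on non-x86 systems, which use a different field name
```

On such a system, `simd_level(read_cpu_flags())` gives the level to look up in the Notes column. Keep in mind that the build has to match the compute nodes, which may differ from the login node where the check is run.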
GPU architectures
For GPUs, SeisSol supports two types of memory management:
split: separate host and device buffers, which are synchronized e.g. for I/O
unified: combined host-device buffers, which are transferred by the system as needed
By default, SeisSol uses unified host-device buffers on all systems where the CPU can freely access GPU memory, i.e. the Nvidia Superchips (e.g. GH100, GH200) and the AMD APUs (e.g. MI300A).
In all other cases, split host-device buffers are used by default.
The following architectures are supported:
| | | Architecture | GPUs (examples) | Memory default |
|---|---|---|---|---|
| | | Nvidia Pascal | Nvidia P100 | split |
| | | Nvidia Pascal | Nvidia Geforce 1000 series, Quadro P series | split |
| | | Nvidia Volta | Nvidia V100 | split |
| | | Nvidia Turing | Nvidia Geforce 2000 series, Quadro RTX series | split |
| | | Nvidia Ampere | Nvidia A100 | split |
| | | Nvidia Ampere | Nvidia Geforce 3000 series, Quadro RTX A series | split |
| | | Nvidia Lovelace | Nvidia Geforce 4000 series, Quadro RTX Ada series | split |
| | | Nvidia Hopper | Nvidia H100, H200 | split; unified on GH superchip |
| | | Nvidia Blackwell | Nvidia B100, B200 | split; unified on GB superchip |
| | | AMD GCN 5 (Vega) | AMD Instinct MI25, Radeon RX Vega 56, Radeon RX Vega 64 | split [1] |
| | | AMD GCN 5 (Vega) | AMD Instinct MI50, Radeon VII | split [1] |
| | | AMD CDNA 1 | AMD Instinct MI100X | split [1] |
| | | AMD CDNA 2 | AMD Instinct MI210, MI250X | split [1] |
| | | AMD CDNA 3 | AMD Instinct MI300A, MI300X | split [1]; unified on MI300A |
| | | AMD RDNA 1 | AMD Radeon 5000 series | split [1] |
| | | AMD RDNA 2 | AMD Radeon 6000 series | split [2] |
| | | AMD RDNA 3 | AMD Radeon 7000 series | split [2] |
| | | Intel Ponte Vecchio | Intel Data Center Max 1550 | split |
Sources:
About AMD GPUs: for unified memory to perform well, you will need to set `HSA_XNACK=1`.
For unsupported AMD GPU architectures (e.g. `gfx90c`), you can proceed as follows:
1. Compile for a compatible GPU architecture. In the case of `gfx90c`, your best choice is `gfx900` (or `gfx906`).
2. Run SeisSol with the environment variable `HSA_OVERRIDE_GFX_VERSION` set according to the architecture you compiled against in the previous step. That is, you need to convert `gfxAABC` to a version of the form `AA.B.C`. E.g., if you compiled for `gfx906`, you will need to set `HSA_OVERRIDE_GFX_VERSION=9.0.6`. Letters become numbers, akin to hexadecimal notation, i.e. `gfx90a` becomes 9.0.10.
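The conversion from a `gfx` name to the `AA.B.C` form can be sketched in a few lines of Python. This is a minimal illustration under the rule stated above: the last two characters are read as single hexadecimal digits and everything before them as the major version. The function name `gfx_to_hsa_version` is hypothetical and not part of SeisSol or ROCm.

```python
# Hypothetical helper for illustration only; not part of SeisSol or ROCm.

def gfx_to_hsa_version(gfx_name):
    """Convert a name like 'gfx90a' into the 'AA.B.C' form, e.g. '9.0.10'."""
    if not gfx_name.startswith("gfx") or len(gfx_name) < 6:
        raise ValueError(f"unexpected architecture name: {gfx_name!r}")
    digits = gfx_name[3:]
    # The last two characters are the minor version and the stepping;
    # whatever precedes them is the (decimal) major version.
    major, minor, stepping = digits[:-2], digits[-2], digits[-1]
    # Letters are interpreted like hexadecimal digits, so 'a' becomes 10.
    return f"{int(major)}.{int(minor, 16)}.{int(stepping, 16)}"


print(gfx_to_hsa_version("gfx906"))  # value for HSA_OVERRIDE_GFX_VERSION
```

The returned string is the value to assign to `HSA_OVERRIDE_GFX_VERSION` in the environment from which SeisSol is launched.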