Releases: ROCm/aomp

rocm-5.5.1

24 May 17:25

ROCm release v5.5.1

rocm-5.5.0

01 May 19:50

ROCm release v5.5.0

AOMP Release 17.0-2

28 Apr 22:15

These are the release notes for AOMP 17.0-2. AOMP uses AMD developer modifications to the upstream LLVM development trunk. These differences are managed in a branch called "amd-stg-open", which lives in a mirror of upstream LLVM at https://github.com/RadeonOpenCompute/llvm-project. The amd-stg-open branch changes constantly as AMD merges the upstream development trunk with its internal open development efforts. The AMD modifications are experimental and/or contributions under review for the upstream trunk. AOMP uses a snapshot of amd-stg-open at the commit IDs and dates listed below. AOMP also includes builds of related ROCm components. We call AOMP a "standalone" build because it does not use or require ROCm, with the exception of the kernel module (dkms) and libdrm, which are often part of the Linux distribution. AOMP is isolated from any ROCm installation by installing into /usr/lib/aomp and by using RPATH on its runtime libraries.

For AOMP 17.0-2, the last trunk commit is 921b45a855f09afe99ea9c0c173794ee4ccd5872 on April 27, 2023. The last amd-only commit is ad7b5d7a69c62dab21332cba131054d2b8a713cc on April 26, 2023. These commits form a frozen branch now called "aomp-17.0-2". See https://github.com/RadeonOpenCompute/llvm-project/tree/aomp-17.0-2.

The integrated ROCm components for this AOMP release were built with ROCm 5.4.4 sources.
This is the 3rd AOMP release based on LLVM 17 development.
The changes from 17.0-1 to 17.0-2 include:

  • Changed gpurun to set both GPU_MAX_HW_QUEUES and LIBOMPTARGET_AMDGPU_NUM_HSA_QUEUES to 1 when multiple MPI ranks share a GPU. Each variable is set to 1 only if the caller has not already set it.
  • Added environment variables LIBOMPTARGET_AMDGPU_KERNEL_BUSYWAIT and LIBOMPTARGET_AMDGPU_DATA_BUSYWAIT to control how long to wait in an active state for kernel completion and data transfer completion, respectively. The default is 0, which means waiting indefinitely in a blocked state. If a value is set and the specified timeout expires, the runtime switches to waiting for the signal in a blocked state.
  • Changed run_babelstream.sh to set LIBOMPTARGET_AMDGPU_KERNEL_BUSYWAIT and LIBOMPTARGET_AMDGPU_DATA_BUSYWAIT to improve performance.
  • Fixed the amdgpu nextgen plugin to work for cov5 (code object version 5). The default code object version is cov4.
  • Fixed the amdgpu nextgen plugin to work with OMPT (the OpenMP Tools interface).
  • Fixed the amdgpu nextgen plugin to work with multiple architectures in the same image. Additional patches are needed to support the device clause on a target region so that it offloads to the correct GPU when different architectures from the same vendor are in use.
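
The busy-wait behavior described for LIBOMPTARGET_AMDGPU_KERNEL_BUSYWAIT and LIBOMPTARGET_AMDGPU_DATA_BUSYWAIT (wait actively for a bounded time, then fall back to a blocked wait) can be illustrated with a generic sketch. This is not the plugin's implementation: the function name `wait_for_completion` is hypothetical, `threading.Event` stands in for the completion signal, and the timeout is expressed in seconds purely for illustration, since the notes above do not state the unit.

```python
import threading
import time


def wait_for_completion(done: threading.Event, busywait_seconds: float = 0.0) -> str:
    """Wait until `done` is set and report which phase observed completion.

    A budget of 0 mirrors the documented default: skip the active phase
    and block until the signal arrives.
    """
    if busywait_seconds > 0:
        deadline = time.monotonic() + busywait_seconds
        # Active phase: poll the signal until the budget is used up.
        while time.monotonic() < deadline:
            if done.is_set():
                return "active"
    # Blocked phase: wait for the signal with no timeout.
    done.wait()
    return "blocked"


if __name__ == "__main__":
    signal = threading.Event()
    # Simulate work that completes after 50 ms (an arbitrary stand-in value).
    threading.Timer(0.05, signal.set).start()
    print(wait_for_completion(signal, busywait_seconds=0.5))
```

The trade-off in such a scheme is that active polling reacts faster to short operations but occupies a CPU core while it spins, which is why the budget is bounded.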

AOMP Release 17.0-1

14 Apr 13:57

These are the release notes for AOMP 17.0-1. AOMP uses AMD developer modifications to the upstream LLVM development trunk. These differences are managed in a branch called "amd-stg-open", which lives in a mirror of upstream LLVM at https://github.com/RadeonOpenCompute/llvm-project. The amd-stg-open branch changes constantly as AMD merges the upstream development trunk with its internal open development efforts. The AMD modifications are experimental and/or contributions under review for the upstream trunk. AOMP uses a snapshot of amd-stg-open at the commit IDs and dates listed below. AOMP also includes builds of related ROCm components. We call AOMP a "standalone" build because it does not use or require ROCm, with the exception of the kernel module (dkms) and libdrm, which are often part of the Linux distribution. AOMP is isolated from any ROCm installation by installing into /usr/lib/aomp and by using RPATH on its runtime libraries.

For AOMP 17.0-1, the last trunk commit is 3712dd73a1d50b76624ee6a520be2b1ca94c02ee on April 11th, 2023. The last amd-only commit is 1d8def5772d16c64652d68daac1b12af99fe3770 on April 12th, 2023. These commits form a frozen branch now called "aomp-17.0-1". See https://github.com/RadeonOpenCompute/llvm-project/tree/aomp-17.0-1.

The integrated ROCm components for this AOMP release were built with ROCm 5.4.4 sources.
This is the 2nd AOMP release based on LLVM 17 development.
The changes from 17.0-0 to 17.0-1 include:

  • Switch to the nextgen plugin as the default. This has shown significant performance improvements. To revert to the old plugin, set LIBOMPTARGET_NEXTGEN_PLUGINS=OFF.
  • Switch from hostrpc to hostexec. hostexec is a significant rewrite of hostrpc. The device-side hostexec_invoke is now written in OpenMP for portability to other platforms. The wrappers (stubs) that run a host function are now named hostexec() and hostexec_<ReturnType>(). hostexec also uses a global variable to find the transfer payload buffer instead of AMD implicit kernel args. This will support portability of hostexec, printf, and fprintf to other platforms. The update to this device global is made with the global variable services in the nextgen plugin.
  • An example of using hostexec to run MPI_Send and MPI_Recv in a target region is given. This example demonstrates how library owners can build a supplemental header file that enables transparent host execution of selected library functions within OpenMP target regions, using the same interface as on the host. This eliminates the need for any source changes in user code when host execution from a target region is desired. Before hostexec, users would typically have to end their target region, execute a host-only function, then start another target region. This feature significantly increases the general-purpose computing capabilities of OpenMP on GPGPU platforms.
  • OMPT target support is incomplete with the nextgen plugin. To use OMPT, set the environment variable LIBOMPTARGET_NEXTGEN_PLUGINS=OFF.
  • Set GPU_MAX_HW_QUEUES in gpurun to 1 when there are multiple ranks per GPU. This limits GPU concurrency when the GPU is already shared. The variable should only be set if the caller (of gpurun or mpirun) did not already set it; in other words, gpurun should trust a value set by the user. This will be fixed in the next release. Also, the OpenMP nextgen plugin does not use GPU_MAX_HW_QUEUES. It uses the environment variable LIBOMPTARGET_AMDGPU_NUM_HSA_QUEUES.
  • Critical regions created via the critical directive are now more efficient: by relaxing the semantics of locks and combining that with the use of acquire and release fences we can limit the flushing of the GPU caches to every time the lock is acquired instead of at every lock check.
  • When inlining functions called from the kernel, move the allocas for their arguments into the kernel entry block instead of leaving them at the launch point.
  • Respect environment variable to force synchronous target region executions. Available via OMPX_FORCE_SYNC_REGIONS=1.
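
The behavior that the GPU_MAX_HW_QUEUES item says gpurun should have (default the queue count to 1 when ranks share a GPU, but trust any value the caller already set) can be sketched as follows. This is an illustration of the rule, not gpurun's code; `apply_queue_defaults` and `ranks_per_gpu` are hypothetical names, and the environment is modeled as a plain dictionary.

```python
# The two variable names come from the release notes; everything else
# in this sketch is a hypothetical illustration.
QUEUE_VARS = ("GPU_MAX_HW_QUEUES", "LIBOMPTARGET_AMDGPU_NUM_HSA_QUEUES")


def apply_queue_defaults(env: dict, ranks_per_gpu: int) -> dict:
    """Return a copy of `env` with queue limits defaulted for shared GPUs."""
    result = dict(env)
    if ranks_per_gpu > 1:
        for name in QUEUE_VARS:
            # setdefault leaves a caller-provided value untouched.
            result.setdefault(name, "1")
    return result


if __name__ == "__main__":
    caller_env = {"GPU_MAX_HW_QUEUES": "4"}
    print(apply_queue_defaults(caller_env, ranks_per_gpu=2))
```

Returning a copy rather than mutating the input keeps the caller's environment unchanged, which makes the rule easy to check in isolation.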

Errata:

  • smoke test "schedule" occasionally fails with memory fault or wrong ordering
  • AMD code object version 5 does not work with nextgen plugin. When testing cov5, use LIBOMPTARGET_NEXTGEN_PLUGINS=OFF
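
Several items above call for LIBOMPTARGET_NEXTGEN_PLUGINS=OFF. One way to apply that setting to a single run, without changing the calling environment, is sketched below. The helper name `run_with_legacy_plugin` is hypothetical, and the child process used here is a stand-in that only echoes the variable; in practice the command would be the offloading application.

```python
import os
import subprocess
import sys
from typing import Sequence


def run_with_legacy_plugin(command: Sequence[str]) -> subprocess.CompletedProcess:
    """Run `command` with LIBOMPTARGET_NEXTGEN_PLUGINS=OFF in its environment."""
    env = dict(os.environ)  # copy, so the parent environment is not modified
    env["LIBOMPTARGET_NEXTGEN_PLUGINS"] = "OFF"
    return subprocess.run(list(command), env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in child: prints the variable so the override can be observed.
    child = [
        sys.executable,
        "-c",
        "import os; print(os.environ['LIBOMPTARGET_NEXTGEN_PLUGINS'])",
    ]
    print(run_with_legacy_plugin(child).stdout.strip())
```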

rocm-5.4.4

22 Mar 19:21

ROCm release v5.4.4

AOMP Release 17.0-0

09 Mar 22:00

These are the release notes for AOMP 17.0-0. AOMP uses AMD developer modifications to the upstream LLVM development trunk. These differences are managed in a branch called "amd-stg-open", which lives in a mirror of upstream LLVM at https://github.com/RadeonOpenCompute/llvm-project. The amd-stg-open branch changes constantly as AMD merges the upstream development trunk with its internal open development efforts. The AMD modifications are experimental and/or contributions under review for the upstream trunk. AOMP uses a snapshot of amd-stg-open at the commit IDs and dates listed below. AOMP also includes builds of related ROCm components. We call AOMP a "standalone" build because it does not use or require ROCm, with the exception of the kernel module (dkms) and libdrm, which are often part of the Linux distribution. AOMP is isolated from any ROCm installation by installing into /usr/lib/aomp and by using RPATH on its runtime libraries.

For AOMP 17.0-0, the last trunk commit is bd1f7c417fc04f93de6b7bbf8740351e58a90613 on March 5th, 2023. The last amd-only commit is f16add4badfa0f16d62ba025f0565a9e4475e37e on March 4th, 2023. These commits form a frozen branch now called "aomp-17.0-0". See https://github.com/RadeonOpenCompute/llvm-project/tree/aomp-17.0-0.

The integrated ROCm components for this AOMP release were built with ROCm 5.4.0 sources.
This is the 1st AOMP release based on LLVM 17 development.
The changes from 16.0-3 to 17.0-0 include:

  • Add support for amdclang, amdclang++, and amdflang
  • Updated build scripts for Kokkos (and updated to Kokkos v3.7.00)
  • Support for multiple blocksizes in Xteam reduction (1024 limit).
  • A new execution mode BigJumpLoop for SPMD non-reduction kernels
  • Additional support for OMPT function "translate_time"
  • Added CentOS 9 and SLES 15 SP4 RPMs.
  • SLES 15 SP1 is no longer supported.

Errata:
Smoke test failures:

  • managed_memory: segfault, when 2+ devices are present

Hip example failure:

  • device-lib

OvO:

  • cpp/hierarchical_parallelism/reduction_add-complex_double/target__teams (timeout on gfx908)
  • cpp/hierarchical_parallelism/reduction_add-float/target_teams (timeout on gfx908)

rocm-5.4.3

07 Feb 16:50

ROCm release v5.4.3

rocm-5.4.2

13 Jan 16:15

ROCm release v5.4.2

rocm-5.4.1

15 Dec 16:28

ROCm release v5.4.1

AOMP Release 16.0-3

08 Dec 23:25

These are the release notes for AOMP 16.0-3. This release uses modifications to the LLVM development trunk called the "amd-stg-open" branch. This is found at https://github.com/RadeonOpenCompute/llvm-project. The amd-stg-open branch is constantly changing as AMD merges upstream development trunk with its internal open development efforts. Some AMD modifications are experimental and/or under review for the LLVM upstream mono-repo. The AOMP release is a snapshot of amd-stg-open and supporting repositories to build various components.

For AOMP 16.0-3, the last trunk commit is 11e86868c1a1ee67a1d88ef84b68193d06dc996 on Nov 14, 2022. This is the 4th AOMP release for LLVM 16 development. The last amd-only commit is b642bb5cf84bbbdcc3e8748c5ceeb72c7bb07144 on Dec 2, 2022. This forms a frozen branch now called "aomp-16.0-3". See https://github.com/RadeonOpenCompute/llvm-project/tree/aomp-16.0-3.

AOMP is a "standalone" build of all necessary ROCm components with the exception of the kernel module and libdrm. The non-llvm-project components for this release were built with ROCm 5.4.0 sources.

The changes from 16.0-2 to 16.0-3 include:

  • Build includes gfx90c, gfx1035, and gfx1036.
  • Fix to rocm_agent_enumerator to correctly identify gfx90c.
  • Fix issue #435 "abs undefined within device block".
  • More enhancements to xteam reductions.
  • Ignore map clause option with USM.
  • Additional support for OMPT functions "get_device_time" and "get_record_type".
  • NUM_QUEUES_PER_DEVICE now defaults to 1.
  • Fixed clang-build-select-link to honor -fdisable-host-devmem.
  • Fixed openmp lib-debug build overwriting release libraries/plugins.
  • Updated cmake version to 3.22.1.
  • Added Ubuntu 22.04 package.

Errata:
(potential regressions from 16.0-2):

  • Smoke test failures:
    clang-337336: performance decrease; the test may time out after 1 min (16.0-2 took 30-40 secs).

(potential regressions from 16.0-1):

  • Smoke test failures (issue at -O0):
    clang-ifaces: core dump (gfx908)
    clang-337336: core dump (gfx908)
    clang-325070: core dump (gfx908)

(potential regressions from 16.0-0):

  • Performance decrease with lulesh
  • Performance decrease with Nekbone (performance improved in 16.0-3, but still not at 16.0-0 levels.)
  • Smoke test failures:
    flang-315870: (resolved by building this test case with cov5)
    managed_memory: segfault, when 2+ devices are present
  • Hip example failure:
    device-lib