Commits · 7fa7b0cbd8f8d43c2237b75423cd25e74edde820 · Lorenzo Albano / LLVM bpEVL

Apr 07, 2022

[libomptarget] Add device RTL to regression test dependencies. · 7fa7b0cb

Michael Kruse authored Apr 06, 2022

In a clean build directory, `check-openmp` or `check-libomptarget` will fail because of missing device RTL .bc files. Ensure that the new custom targets `omptarget.devicertl.nvptx` and `omptarget.devicertl.amdgpu` (corresponding to the plugin RTL targets `omptarget.rtl.cuda` and `omptarget.rtl.amdgpu`, respectively) are dependencies of the regression tests.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D123177

7fa7b0cb

Mar 17, 2022
- [AMDGPU] Add gfx90a and gfx940 to get_elf_mach_gfx_name.cpp · e0b9364b
  Stanislav Mekhanoshin authored Mar 02, 2022
```
Differential Revision: https://reviews.llvm.org/D120849
```
  e0b9364b
Mar 06, 2022

[OpenMP][CMake] Clean up the CMake variable `LIBOMPTARGET_LLVM_INCLUDE_DIRS` · 7f7c2c34

Shilei Tian authored Mar 05, 2022

`LIBOMPTARGET_LLVM_INCLUDE_DIRS` is currently checked and included
multiple times redundantly. This patch is simply a cleanup.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D121055

7f7c2c34

Mar 03, 2022
- [AMDGPU] Add gfx1036 target · 84069581
  Aakanksha authored Mar 02, 2022
```
Differential Revision: https://reviews.llvm.org/D120846
```
  84069581
Feb 08, 2022

[OpenMP] Enable new driver tests for AMDGPU · f8ffac59

Joseph Huber authored Feb 08, 2022

This patch enables running the new driver tests for AMDGPU. Previously
this was disabled because some tests failed. This was only because the
new driver tests hadn't been listed as unsupported or expected to fail.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D119240

f8ffac59

Feb 04, 2022

[OpenMP] Completely remove old device runtime · 034adaf5

Joseph Huber authored Feb 03, 2022

This patch completely removes the old OpenMP device runtime. Previously,
the new runtime had the prefix `libomptarget-new-` and the old runtime
was simply called `libomptarget-`. This patch makes the formerly new
runtime the only runtime available. The old runtime's entire project has
been deleted, and all references to the `libomptarget-new` runtime have
been replaced with `libomptarget-`.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D118934

034adaf5

Feb 01, 2022

[OpenMP] Remove new driver tests for AMDGPU · 4d4587d5
Joseph Huber authored Jan 31, 2022
```
Some of the new driver tests are flaky on AMDGPU, remove for now.
```
4d4587d5

[Libomptarget] Run GPU offloading tests using the new driver · 0ac799b5

Joseph Huber authored Jan 31, 2022

This patch adds a new target to the tests to run using the new driver as
the method for generating offloading code.

Depends on D116541

Differential Revision: https://reviews.llvm.org/D118637

0ac799b5

Jan 19, 2022
- [openmp][amdgpu] Disable tests on old runtime, enable tests on new one · ca84c43d
  Jon Chesterfield authored Jan 19, 2022
  
  ca84c43d
- [openmp][amdgpu] Temporarily disable tests on old runtime · e35c8f54
  Jon Chesterfield authored Jan 19, 2022
  
  e35c8f54
Jan 10, 2022

[openmp][amdgpu] Replace unsigned long with uint64_t · a74826d3

Jon Chesterfield authored Jan 10, 2022

Some types need to be 64 bit. Unsigned long is a hazard there.
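The hazard can be illustrated with a short sketch. The helper names `bitWidth` and `packHalves` are hypothetical and not taken from the patch; the point is only that `uint64_t` has a fixed width while `unsigned long` does not.

```cpp
#include <climits>
#include <cstdint>

// Width in bits of a type on the platform this is compiled for.
template <typename T> constexpr unsigned bitWidth() {
  return sizeof(T) * CHAR_BIT;
}

// uint64_t is exactly 64 bits wherever it is defined.
static_assert(bitWidth<uint64_t>() == 64, "uint64_t is always 64 bits");

// unsigned long is only guaranteed to be at least 32 bits. It is 64 bits on
// LP64 targets (typical 64-bit Linux) but 32 bits on LLP64 targets
// (64-bit Windows), so it cannot be relied on to hold a 64-bit value.
static_assert(bitWidth<unsigned long>() >= 32,
              "minimum width guaranteed by the standard");

// Hypothetical helper: pack two 32-bit halves into one 64-bit value.
// With unsigned long as the result type, the shift by 32 would be undefined
// behaviour on a platform where that type is 32 bits wide.
constexpr uint64_t packHalves(uint32_t Hi, uint32_t Lo) {
  return (static_cast<uint64_t>(Hi) << 32) | Lo;
}
```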

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D116963

a74826d3

Dec 17, 2021
- [libomptarget][nfc] Refactor dlwrap.h for easier reuse in D115966 and upcoming patches · 38af5b4f
  Jon Chesterfield authored Dec 17, 2021
  
  38af5b4f
- [openmp][amdgpu][nfc] Mark all external functions extern C to get type checking · 91dfb32f
  Jon Chesterfield authored Dec 17, 2021
  
  91dfb32f
- [OpenMP][libomptarget] Fix __tgt_rtl_run_target_team_region_async API with missing parameter · d3abb04e
  Carlo Bertolli authored Dec 17, 2021
```
I missed the async info parameter in the first version of this API.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115887
```
  d3abb04e
Dec 15, 2021

[OpenMP] Increase opportunity for parallel kernel launch in AMDGPUs: add... · d83dc4c6

Carlo Bertolli authored Dec 15, 2021

[OpenMP] Increase opportunity for parallel kernel launch in AMDGPUs: add multiple HSA queues per device in plugin
This patch extends the AMDGPU plugin for OpenMP target offloading from using a single HSA queue to multiple queues (four in this patch) per device. This enables multiple threads to submit kernel launches to the same GPU concurrently.
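A minimal sketch of how several queues per device might be shared between threads follows. The round-robin policy, `FakeQueue`, and `QueuePool` are assumptions made for illustration; the patch itself may select queues differently, and a real plugin would hold HSA queue handles rather than integers.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Stand-in for an HSA queue handle (hypothetical, for illustration only).
struct FakeQueue {
  int Id;
};

// Hypothetical per-device pool that hands out queues round-robin.
template <std::size_t NumQueues> class QueuePool {
  std::array<FakeQueue, NumQueues> Queues;
  std::atomic<std::size_t> Next{0};

public:
  QueuePool() {
    for (std::size_t I = 0; I < NumQueues; ++I)
      Queues[I].Id = static_cast<int>(I);
  }

  // Safe to call from several threads: the atomic counter gives each caller
  // a distinct ticket, so concurrent launches are spread over the queues.
  FakeQueue &acquire() {
    std::size_t Ticket = Next.fetch_add(1, std::memory_order_relaxed);
    return Queues[Ticket % NumQueues];
  }
};
```

With four queues, as in the patch description, the fifth request wraps around to the first queue again.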

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115771

d83dc4c6

Dec 10, 2021

[OpenMP] Part 2 of At present, amdgpu plugin merges both asynchronous · 28309c54

Carlo Bertolli authored Dec 10, 2021

and synchronous kernel launch implementations into a single
synchronous version. This patch prepares the plugin for asynchronous
implementation by:

- Privatizing actual kernel launch code (valid in both cases) into
  an anonymous namespace base function (submitted at D115267)
- Separating the control flow path of asynchronous and synchronous
  kernel launch functions (this diff)
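
The two steps can be sketched as follows. This is a simulation under stated assumptions, not the plugin's code: `AsyncInfo`, `submit`, `synchronize`, `runRegion`, and `runRegionAsync` are hypothetical names, and deferred execution of a callback stands in for a kernel running on the device.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical record of work that was submitted but not yet waited on.
struct AsyncInfo {
  std::vector<std::function<void()>> Pending;
};

namespace {
// Shared submission code, valid for both paths. The anonymous namespace
// keeps it private to this translation unit.
void submit(AsyncInfo &Info, std::function<void()> Kernel) {
  Info.Pending.push_back(std::move(Kernel));
}
} // namespace

// Completes everything submitted so far (simulated by running it now).
void synchronize(AsyncInfo &Info) {
  for (auto &Work : Info.Pending)
    Work();
  Info.Pending.clear();
}

// Synchronous entry point: submit, then wait before returning.
int runRegion(std::function<void()> Kernel) {
  AsyncInfo Local;
  submit(Local, std::move(Kernel));
  synchronize(Local);
  return 0;
}

// Asynchronous entry point: submit only; the caller synchronizes later.
int runRegionAsync(std::function<void()> Kernel, AsyncInfo &Info) {
  submit(Info, std::move(Kernel));
  return 0;
}
```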

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115273

28309c54

Dec 09, 2021

[OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version · cc8dc5e2

Carlo Bertolli authored Dec 08, 2021

Prepare the amdgpu plugin for asynchronous implementation. This patch switches to using the HSA API for asynchronous memory copy.
Moving away from hsa_memory_copy means that the plugin is responsible for locking and unlocking host memory pointers.
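
One way to keep the lock and unlock calls paired is a scope guard. The sketch below is an assumption about how that responsibility could be structured, not the patch's implementation: `HostLockGuard` is a hypothetical class, and the real HSA lock and unlock calls are replaced by caller-supplied callbacks so the example stays self-contained.

```cpp
#include <functional>
#include <utility>

// Hypothetical RAII guard: locks a host pointer on construction and unlocks
// it on destruction, so every exit path releases the lock.
class HostLockGuard {
  std::function<void(void *)> Unlock;
  void *Ptr;

public:
  HostLockGuard(void *HostPtr, const std::function<void(void *)> &Lock,
                std::function<void(void *)> UnlockFn)
      : Unlock(std::move(UnlockFn)), Ptr(HostPtr) {
    Lock(Ptr); // stand-in for the real host memory lock call
  }

  ~HostLockGuard() { Unlock(Ptr); } // stand-in for the real unlock call

  // Copying would unlock the same pointer twice, so it is disabled.
  HostLockGuard(const HostLockGuard &) = delete;
  HostLockGuard &operator=(const HostLockGuard &) = delete;
};
```

In an asynchronous copy the guard would have to outlive the copy itself, so it would be destroyed only after the completion signal has been waited on.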

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115279

cc8dc5e2

Dec 08, 2021

Revert "[OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version" · 14ff611f
Jon Chesterfield authored Dec 08, 2021
```
This reverts commit 6de698bf.
It didn't build in the dynamic_hsa configuration
```
14ff611f

[OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version · 6de698bf

Carlo Bertolli authored Dec 07, 2021

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115279

6de698bf

Dec 07, 2021

[NFC][OpenMP] Prepare amdgpu plugin for asynchronous implementation of target region launch · d9b1d827

Carlo Bertolli authored Dec 07, 2021

At present, the amdgpu plugin merges both asynchronous and synchronous kernel launch implementations into a single synchronous version.
This patch prepares the plugin for asynchronous implementation by:
- Privatizing actual kernel launch code (valid in both cases) into an anonymous namespace base function

Actual separation of kernel launch code (async vs sync) follows in a later patch.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115267

d9b1d827

[OpenMP][libomptarget] amdgpu plugin adds runpath for dependencies · 21a51ceb

Ye Luo authored Dec 06, 2021

The amdgpu plugin depends on the libhsa-runtime64 library. Add a runpath in case it is not on the LD_LIBRARY_PATH.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115198

21a51ceb

Dec 06, 2021
- [libomptarget] Add cmake variables to disable building the amdgpu or cuda plugins · a05a0c3c
  Jon Chesterfield authored Dec 06, 2021
```
Analogous to the controls on building device runtimes

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D115148
```
  a05a0c3c
- [openmp] Enable tests on new devicertl on amdgpu · 9e08c205
  Jon Chesterfield authored Dec 06, 2021
```
Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D114891
```
  9e08c205
Nov 29, 2021

OpenMP: Correctly query location for amdgpu-arch · 935abeaa

Matt Arsenault authored Nov 19, 2021

This was trying to figure out the build path for amdgpu-arch, and
making assumptions about where it is which were not working on my
system. Whether a standalone build or not, we should have a proper
imported target to get the location from.

935abeaa

Nov 23, 2021

[openmp][amdgpu] Make plugin robust to presence of explicit implicit arguments · ae5348a3

Jon Chesterfield authored Nov 22, 2021

OpenMP (compiler) does not currently request any implicit kernel
arguments. OpenMP (runtime) allocates and initialises a reasonable guess at
the implicit kernel arguments anyway.

This change makes the plugin check the number of explicit arguments, instead
of all arguments, and puts the pointer to hostcall buffer in both the current
location and at the offset expected when implicit arguments are added to the
metadata by D113538.

This is intended to keep things running while fixing the oversight in the
compiler (in D113538). Once that patch lands, and a following one marks
openmp kernels that use printf such that the backend emits an args element
with the right type (instead of hidden_node), the over-allocation can be
removed and the hardcoded 8*e+3 offset replaced with one read from the
.offset of the corresponding metadata element.

Reviewed By: estewart08

Differential Revision: https://reviews.llvm.org/D114274

ae5348a3

Nov 19, 2021

[openmp][amdgpu][nfc] Simplify implicit args handling · 04954824

Jon Chesterfield authored Nov 19, 2021

Removes a +x/-x pair on the only store/load of a variable
and deletes some nearby dead code. Also reduces the size of the implicit
struct to reflect the code currently emitted by clang.

Differential Revision: https://reviews.llvm.org/D114270

04954824

[openmp][amdgpu][nfc] Inline interop_hsa_get_kernel_info into only caller · 9cdaf0b0
Jon Chesterfield authored Nov 19, 2021

9cdaf0b0

Oct 28, 2021

Revert "[libomptarget] Build DeviceRTL for amdgpu" · 6c7b203d
Jon Chesterfield authored Oct 28, 2021
```
 - more tests failing on CI than failed locally when writing this patch

This reverts commit 33427fdb.
```
6c7b203d

[libomptarget] Build DeviceRTL for amdgpu · 33427fdb

Jon Chesterfield authored Oct 28, 2021

Passes same tests as the current deviceRTL. Includes cmake change from D111987.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112227

33427fdb

Oct 23, 2021

[libomptarget] Run GPU offloading tests on both new and old runtime · bf6f955f

Jon Chesterfield authored Oct 22, 2021

Implemented by patching python config instead of modifying all
the tests so that -generic and XFAIL work as usual. Expectation is for
this to be reverted once the old runtime is deleted.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D112225

bf6f955f

Oct 19, 2021

[OpenMP] Remove macro guards for device debugging · b1ce4549

Joseph Huber authored Oct 19, 2021

The plugin currently uses a macro to check if this is a debug build
before assigning the debug kind variable to the device environment
struct. This is being deprecated because the new device runtime does not
maintain separate debug builds and should always be available.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D112083

b1ce4549

Oct 09, 2021
- [libomptarget][amdgpu][NFC] tweak a comment · d022f39d
  Ron Lieberman authored Oct 09, 2021
  
  d022f39d
Oct 07, 2021

[libomptarget] Reapply 2bc4d48a which was accidentally reverted · 1bc3a6e4
Jon Chesterfield authored Oct 07, 2021

1bc3a6e4

[libomptarget] Move device environment to shared header, remove divergence · 0c554a47

Jon Chesterfield authored Oct 07, 2021

Follow on to D110006, related to D110957

Where implementations have diverged this resolves to match the new DeviceRTL

- replaces definitions of this struct in deviceRTL and plugins with include
- changes the dynamic_shared_size field from D110006 to 32 bits
- handles stdint being unavailable in DeviceRTL
- adds a zero initializer for the field to amdgpu
- moves the extern declaration for deviceRTL to target_interface
  (omptarget.h is more natural, but doesn't work due to include order
  with debug.h)
- Renames the fields everywhere to match the LLVM format used in DeviceRTL
- Makes debug_level uint32_t everywhere (previously sometimes int32_t)
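
A shared definition along these lines might look like the sketch below. The struct and field names are illustrative assumptions rather than a copy of the header from the patch, and the sketch includes `<cstdint>` directly, whereas the patch notes that the device runtime has to cope with stdint being unavailable.

```cpp
#include <cstdint>
#include <type_traits>

// Sketch of a device environment struct defined once and included by both
// the host plugins and the device runtime. Names are illustrative.
struct DeviceEnvironment {
  uint32_t DebugKind = 0;      // unsigned 32-bit everywhere
  uint32_t NumDevices = 0;
  uint32_t DeviceNum = 0;
  uint32_t DynamicMemSize = 0; // 32-bit dynamic shared memory size
};

// Host and device must agree on the layout byte for byte, so fixed-width
// fields and a checked size guard against silent divergence.
static_assert(std::is_standard_layout<DeviceEnvironment>::value,
              "layout must be predictable across host and device");
static_assert(sizeof(DeviceEnvironment) == 4 * sizeof(uint32_t),
              "no padding expected between 32-bit fields");
```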

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D111069

0c554a47

Oct 01, 2021
- [libomptarget][amdgpu] Refactor memory pool collection · 05ba9ff6
  Jon Chesterfield authored Oct 01, 2021
  
  05ba9ff6
Sep 30, 2021

[libomptarget] Apply D110029 to amdgpu · b75a7481

Jon Chesterfield authored Sep 30, 2021

Use enum for execution mode.

This is partly a port from ROCm and partly a port from D110029. Attempted to
make the same choices as ROCm as far as comments etc go to reduce the merge
conflicts.

There is some cleanup warranted here - in particular I like the cuda patch
factoring out the comparisons into named variables - but I'd like to leave
that for a follow up patch, keeping this one minimal.
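
As a rough illustration of replacing raw integers with an enum for the execution mode, consider the sketch below. The enumerator names, values, and helper functions are assumptions for this example and are not copied from D110029 or ROCm.

```cpp
#include <cstdint>

// Hypothetical execution mode flags. The combined mode is expressed as the
// bitwise OR of the two basic modes.
enum ExecModeFlags : int8_t {
  ExecModeGeneric = 1 << 0,
  ExecModeSPMD = 1 << 1,
  ExecModeGenericSPMD = ExecModeGeneric | ExecModeSPMD,
};

// Named predicates replace comparisons against magic numbers.
constexpr bool hasSPMD(ExecModeFlags Mode) {
  return (Mode & ExecModeSPMD) != 0;
}

constexpr bool isGenericOnly(ExecModeFlags Mode) {
  return Mode == ExecModeGeneric;
}
```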

Reviewed By: carlo.bertolli

Differential Revision: https://reviews.llvm.org/D110845

b75a7481

Sep 29, 2021

[libomptarget] [amdgpu] After a kernel dispatch packet is published, its... · 62262702

Dhruva Chakrabarti authored Sep 28, 2021

[libomptarget] [amdgpu] After a kernel dispatch packet is published, its contents must not be accessed.

Fixes: SWDEV-275232 (With contributions from Ammar Elwazir, Laurent Morichetti, and Tony Tye)

The current code is racy. After the packet is submitted, the GPU will increment the read index. If this wraps around before the memory is read, the read will refer to a signal from an unrelated packet. This change avoids reading from the packet post-submission.
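
The pattern can be sketched on the host alone. `Packet` and `publish` are simplified, hypothetical stand-ins rather than HSA types; the sketch only shows the ordering of the local copy relative to the publishing store.

```cpp
#include <atomic>
#include <cstdint>

// Simplified stand-in for a dispatch packet in a ring buffer slot.
struct Packet {
  uint64_t CompletionSignal = 0;
  std::atomic<uint16_t> Header{0};
};

// Fills in the packet, publishes it, and returns the signal to wait on.
// The signal is kept in a local variable because, once the header store is
// visible, the consumer may process the slot and the slot may be reused for
// an unrelated packet.
inline uint64_t publish(Packet &Slot, uint64_t Signal, uint16_t ReadyHeader) {
  Slot.CompletionSignal = Signal;
  uint64_t Local = Signal; // taken before publication, never re-read from Slot
  Slot.Header.store(ReadyHeader, std::memory_order_release);
  // From here on, Slot must not be read again.
  return Local;
}
```

The caller then waits on the returned value instead of dereferencing the packet a second time.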

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D110679

62262702

Sep 27, 2021
- [libomptarget][amdgpu] Follow on to D110513, empty kernarg pools are not fatal · 2bc4d48a
  Jon Chesterfield authored Sep 27, 2021
  
  2bc4d48a
- [libomptarget][amdgpu] Report zero devices if plugin construction fails, instead of segv · 738734f6
  Jon Chesterfield authored Sep 27, 2021
  
  738734f6
- [AMDGPU][OpenMP] Add memory pool size check to isValidMemoryPool · b1695c2e
  Pushpinder Singh authored Sep 24, 2021
```
Keeping all the checks in one place for future simplification.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D110513
```
  b1695c2e