- Apr 07, 2022
-
-
Michael Kruse authored
In a clean build directory, `check-openmp` or `check-libomptarget` will fail because of missing device RTL .bc files. Ensure that the new targets new custom targets `omptarget.devicertl.nvptx` and `omptarget.devicertl.amdgpu` (corresponding to the plugin rtl targets `omptarget.rtl.cuda`, respectively `omptarget.rlt.amdgpu` ) are dependencies of the regression tests. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D123177
-
- Mar 17, 2022
-
-
Stanislav Mekhanoshin authored
Differential Revision: https://reviews.llvm.org/D120849
-
- Mar 06, 2022
-
-
Shilei Tian authored
`LIBOMPTARGET_LLVM_INCLUDE_DIRS` is currently checked and included for multiple times redundantly. This patch is simply a clean up. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D121055
-
- Mar 03, 2022
-
-
Aakanksha authored
Differential Revision: https://reviews.llvm.org/D120846
-
- Feb 08, 2022
-
-
Joseph Huber authored
This patch enables running the new driver tests for AMDGPU. Previously this was disabled because some tests failed. This was only because the new driver tests hadn't been listed as unsupported or expected to fail. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D119240
-
- Feb 04, 2022
-
-
Joseph Huber authored
This patch completely removes the old OpenMP device runtime. Previously, the old runtime had the prefix `libomptarget-new-` and the old runtime was simply called `libomptarget-`. This patch makes the formerly new runtime the only runtime available. The entire project has been deleted, and all references to the `libomptarget-new` runtime has been replaced with `libomptarget-`. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D118934
-
- Feb 01, 2022
-
-
Joseph Huber authored
Some of the new driver tests are flaky on AMDGPU, remove for now.
-
Joseph Huber authored
This patch adds a new target to the tests to run using the new driver as the method for generating offloading code. Depends on D116541 Differential Revision: https://reviews.llvm.org/D118637
-
- Jan 19, 2022
-
-
Jon Chesterfield authored
-
Jon Chesterfield authored
-
- Jan 10, 2022
-
-
Jon Chesterfield authored
Some types need to be 64 bit. Unsigned long is a hazard there. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D116963
-
- Dec 17, 2021
-
-
Jon Chesterfield authored
-
Jon Chesterfield authored
-
Carlo Bertolli authored
I missed the async info parameter in the first version of this API. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D115887
-
- Dec 15, 2021
-
-
Carlo Bertolli authored
[OpenMP] Increase opportunity for parallel kernel launch in AMDGPUs: add multiple hsa queue's per device in plugin This patch extends the AMDGPU plugin for OpenMP target offloading from using a single HSA queue to multiple queues (four in this patch) per device. This enables concurrent threads to concurrently submit kernel launches to the same GPU. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D115771
-
- Dec 10, 2021
-
-
Carlo Bertolli authored
and synchronous kernel launch implementations into a single synchronous version. This patch prepares the plugin for asynchronous implementation by: Privatizing actual kernel launch code (valid in both cases) into an anonymous namespace base function (submitted at D115267) - Separating the control flow path of asynchronous and synchronous kernel launch functions** (this diff) Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D115273
-
- Dec 09, 2021
-
-
Carlo Bertolli authored
Prepare amdgpu plugin for asynchronous implementation. This patch switches to using HSA API for asynchronous memory copy. Moving away from hsa_memory_copy means that plugin is responsible for locking/unlocking host memory pointers. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D115279
-
- Dec 08, 2021
-
-
Jon Chesterfield authored
This reverts commit 6de698bf. It didn't build in the dynamic_hsa configuration
-
Carlo Bertolli authored
Prepare amdgpu plugin for asynchronous implementation. This patch switches to using HSA API for asynchronous memory copy. Moving away from hsa_memory_copy means that plugin is responsible for locking/unlocking host memory pointers. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D115279
-
- Dec 07, 2021
-
-
Carlo Bertolli authored
At present, amdgpu plugin merges both asynchronous and synchronous kernel launch implementations into a single synchronous version. This patch prepares the plugin for asynchronous implementation by: - Privatizing actual kernel launch code (valid in both cases) into an anonymous namespace base function Actual separation of kernel launch code (async vs sync) is a following patch. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D115267
-
Ye Luo authored
amdgpu plugin depends on libhsa-runtime64 library. Add runpath in case it is not on the LD_LIBRARY_PATH. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D115198
-
- Dec 06, 2021
-
-
Jon Chesterfield authored
Analogous to the controls on building device runtimes Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D115148
-
Jon Chesterfield authored
Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D114891
-
- Nov 29, 2021
-
-
Matt Arsenault authored
This was trying to figure out the build path for amdgpu-arch, and making assumptions about where it is which were not working on my system. Whether a standalone build or not, we should have a proper imported target to get the location from.
-
- Nov 23, 2021
-
-
Jon Chesterfield authored
OpenMP (compiler) does not currently request any implicit kernel arguments. OpenMP (runtime) allocates and initialises a reasonable guess at the implicit kernel arguments anyway. This change makes the plugin check the number of explicit arguments, instead of all arguments, and puts the pointer to hostcall buffer in both the current location and at the offset expected when implicit arguments are added to the metadata by D113538. This is intended to keep things running while fixing the oversight in the compiler (in D113538). Once that patch lands, and a following one marks openmp kernels that use printf such that the backend emits an args element with the right type (instead of hidden_node), the over-allocation can be removed and the hardcoded 8*e+3 offset replaced with one read from the .offset of the corresponding metadata element. Reviewed By: estewart08 Differential Revision: https://reviews.llvm.org/D114274
-
- Nov 19, 2021
-
-
Jon Chesterfield authored
Removes a +x/-x pair on the only store/load of a variable and deletes some nearby dead code. Also reduces the size of the implicit struct to reflect the code currently emitted by clang. Differential Revision: https://reviews.llvm.org/D114270
-
Jon Chesterfield authored
-
- Oct 28, 2021
-
-
Jon Chesterfield authored
- more tests failing on CI than failed locally when writing this patch This reverts commit 33427fdb.
-
Jon Chesterfield authored
Passes same tests as the current deviceRTL. Includes cmake change from D111987. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112227
-
- Oct 23, 2021
-
-
Jon Chesterfield authored
Implemented by patching python config instead of modifying all the tests so that -generic and XFAIL work as usual. Expectation is for this to be reverted once the old runtime is deleted. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D112225
-
- Oct 19, 2021
-
-
Joseph Huber authored
The plugin currently uses a macro to check if this is a debug built before assigning the debug kind variable to the device environment struct. This is being deprecated because the new device runtime does not maintain separate debug builds and should always be availible. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112083
-
- Oct 09, 2021
-
-
Ron Lieberman authored
-
- Oct 07, 2021
-
-
Jon Chesterfield authored
-
Jon Chesterfield authored
Follow on to D110006, related to D110957 Where implementations have diverged this resolves to match the new DeviceRTL - replaces definitions of this struct in deviceRTL and plugins with include - changes the dynamic_shared_size field from D110006 to 32 bits - handles stdint being unavailable in DeviceRTL - adds a zero initializer for the field to amdgpu - moves the extern declaration for deviceRTL to target_interface (omptarget.h is more natural, but doesn't work due to include order with debug.h) - Renames the fields everywhere to match the LLVM format used in DeviceRTL - Makes debug_level uint32_t everywhere (previously sometimes int32_t) Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D111069
-
- Oct 01, 2021
-
-
Jon Chesterfield authored
-
- Sep 30, 2021
-
-
Jon Chesterfield authored
Use enum for execution mode. This is partly a port from ROCm and partly a port from D110029. Attempted to make the same choices as ROCm as far as comments etc go to reduce the merge conflicts. There is some cleanup warranted here - in particular I like the cuda patch factoring out the comparisons into named variables - but I'd like to leave that for a follow up patch, keeping this one minimal. Reviewed By: carlo.bertolli Differential Revision: https://reviews.llvm.org/D110845
-
- Sep 29, 2021
-
-
Dhruva Chakrabarti authored
[libomptarget] [amdgpu] After a kernel dispatch packet is published, its contents must not be accessed. Fixes: SWDEV-275232 (With contributions from Ammar Elwazir, Laurent Morichetti, and Tony Tye) The current code is racy. After the packet is submitted, the GPU will increment the read index. If this wraps around before the memory is read from it'll refer to a signal from an unrelated packet. Change avoids reading from the packet post-submission. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D110679
-
- Sep 27, 2021
-
-
Jon Chesterfield authored
-
Jon Chesterfield authored
-
Pushpinder Singh authored
Keeping all the checks in one place for future simplification. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D110513
-