- Feb 10, 2022
-
-
Ye Luo authored
Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D119478
-
Shilei Tian authored
This patch refines the logic to determine grid size as previous method can escape the check of whether `CudaBlocksPerGrid` could be greater than the actual hardware limit. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119311
-
- Feb 09, 2022
-
-
Joseph Huber authored
The 'bug49779.cpp' test has been failing recently. This is because the runtime is sufficiently complex when using nested parallelism without optimizations that the CUDA tools cannot statically determine the stack size. Because of this the kernel can exceed the thread stack size and crash. Work around this using the 'LIBOMPTARGET_STACK_SIZE' environment variable and add an FAQ entry for this situation. Fixes #53670 Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D119357
-
- Feb 08, 2022
-
-
Joseph Huber authored
This patch manually adds the runtime include files to the list of dependencies when we build the bitcode runtime library. Previously if only the header was changed we would not recompile the source files. The solution used here isn't optimal because every source file not has a dependency on each header file regardless of if it was actually used by that file. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D119254
-
Joseph Huber authored
This patch enables running the new driver tests for AMDGPU. Previously this was disabled because some tests failed. This was only because the new driver tests hadn't been listed as unsupported or expected to fail. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D119240
-
- Feb 07, 2022
-
-
Joseph Huber authored
This patch replaces the ValueRAII pointer with a default 'nullptr' value. Previously this was initialized as a reference to an existing variable. The use of this variable caused overhead as the compiler could not look through the uses and determine that it was unused if 'Active' was not set. Because of this accesses to the variable would be left in the runtime once compiled. Fixes #53641 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119187
-
- Feb 04, 2022
-
-
Joseph Huber authored
This patch completely removes the old OpenMP device runtime. Previously, the old runtime had the prefix `libomptarget-new-` and the old runtime was simply called `libomptarget-`. This patch makes the formerly new runtime the only runtime available. The entire project has been deleted, and all references to the `libomptarget-new` runtime has been replaced with `libomptarget-`. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D118934
-
Joseph Huber authored
Summary; This test should pass now with AMDGPU. Previously the symbols were hidden and would fail when read.
-
- Feb 01, 2022
-
-
Jon Chesterfield authored
This seems to be the root cause of hangs on amdgpu. Reverting while investigating. This reverts commit 7b9844cc.
-
Jon Chesterfield authored
-
Johannes Doerfert authored
Due to num_threads (probably also other reasons) we cannot assume explicit barriers are always executed by all threads in an aligned fashion. We can optimize them if that property can be proven but that is different.
-
Joseph Huber authored
Some of the new driver tests are flaky on AMDGPU, remove for now.
-
Joseph Huber authored
This patch adds a new target to the tests to run using the new driver as the method for generating offloading code. Depends on D116541 Differential Revision: https://reviews.llvm.org/D118637
-
- Jan 31, 2022
-
-
Joseph Huber authored
This patch changes the error message to instead mention the documentation page for the debugging options provided by libomptarget and the bitcode runtimes. Add some extra information to the documentation to help users more quickly identify debugging resources. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D118626
-
Joseph Huber authored
Reduces the shared memory size used for globalization to 512 bytes from 2048 to reduce the pressure on shared memory. This patch ado adds a debug mesage to indicate when the shared memory was insufficient. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D118625
-
Jon Chesterfield authored
Openmp executables need to find libomp and libomptarget at runtime. This currently requires LD_LIBRARY_PATH or the user to specify rpath. Change that to set the expected location of the openmp libraries in the install tree. Whether rpath means rpath or runpath is system dependent. The attached test shows that the Wl,--disable-new-dtags control interacts correctly with this feature. The implicit rpath field is appended to any user specified ones which is ideal. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D118493
-
Jon Chesterfield authored
Failed some buildbots, bad assumptions about structure of install path This reverts commit a80d5c34.
-
Jon Chesterfield authored
Openmp executables need to find libomp and libomptarget at runtime. This currently requires LD_LIBRARY_PATH or the user to specify rpath. Change that to set the expected location of the openmp libraries in the install tree. Whether rpath means rpath or runpath is system dependent. The attached test shows that the Wl,--disable-new-dtags control interacts correctly with this feature. The implicit rpath field is appended to any user specified ones which is ideal. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D118493
-
- Jan 29, 2022
-
-
Ye Luo authored
Fully respect LIBOMPTARGET_BUILD_NVPTX_BCLIB. There is no CUDA toolchain dependency. Complement D118268. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D118522
-
- Jan 28, 2022
-
-
Ron Lieberman authored
This reverts commit 27c799ec.
-
Joseph Huber authored
If we have a broken assumption we want to print a message to the user. If the assumption is broken by many threads in many teams this can become a problem. To avoid it we use a hash that tracks if a broken assumption has (likely) been printed and avoid printing it again. This is not fool proof and has some caveats that might cause problems in the future (see comment) but it should improve the situation considerably for now. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D112156
-
- Jan 27, 2022
-
-
Johannes Doerfert authored
IdentTy objects are useful for debugging and profiling so we want to keep them around in more places, especially those that have a large impact on performance, e.g., everything related to state. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112494
-
Sri Hari Krishna Narayanan authored
This implements the runtime portion of the interop directive. It expects the frontend and IRBuilder portions to be in place for proper execution. It currently works only for GPUs and has several TODOs that should be addressed going forward. Reviewed By: RaviNarayanaswamy Differential Revision: https://reviews.llvm.org/D106674
-
- Jan 26, 2022
-
-
Jon Chesterfield authored
The old runtime is not tested by CI. Disable the build prior to the llvm-14 branch. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D118268
-
- Jan 21, 2022
-
-
Joseph Huber authored
This patch changes the visibility for all construct in the new device RTL to be hidden by default. This is done after the changes introduced in D117806 changed the visibility from being hidden by default for all device compilations. This asserts that the visibility for the device runtime library will be hidden except for the internal environment variable. This is done to aid optimization and linking of the device library. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D117807
-
- Jan 20, 2022
-
-
Johannes Doerfert authored
In the OpenMC app we saw `omp target update` spending an awful lot of time in the shadow map traversal without ever doing any update there. There are two cases that allow us to avoid the traversal completely. The simplest thing is that small updates cannot (reasonably) contain an attached pointer part. The other case requires to track in the mapping table if an entry might contain an attached pointer as part. Given that we have a single location shadow map entries are created, the latter is actually fairly easy as well. Differential Revision: https://reviews.llvm.org/D113124
-
Johannes Doerfert authored
Atomic handling of map clauses was introduced to comply with the OpenMP standard (see D104418). However, many apps won't need this feature which can be costly in certain situations. To allow for applications to opt-out we now introduce the `LIBOMPTARGET_MAP_FORCE_ATOMIC` environment flag that voids the atomicity guarantee of the standard for map clauses again, shifting the burden to the user. This patch also de-duplicates the code that introduces the events used to enforce atomicity as a cleanup. Differential Revision: https://reviews.llvm.org/D117627
-
Joseph Huber authored
The OpenMP offloading libraries are built with fixed triples and linked in during compile time. This would cause un-helpful errors if the user passed in the wrong expansion of the triple used for the bitcode library. because we only support these triples for OpenMP offloading we can normalize them to the full verion used in the bitcode library. Reviewed By: jdoerfert, JonChesterfield Differential Revision: https://reviews.llvm.org/D117634
-
- Jan 19, 2022
-
-
Jon Chesterfield authored
Previously, we sometimes pass fopenmp-targets=nvptx64-nvidia-cuda-newRTL Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D117715
-
Jon Chesterfield authored
-
Jon Chesterfield authored
-
Jon Chesterfield authored
-
Joseph Huber authored
After the changes in D117362 made variables declared inside of a target declare directive visible outside the plugin, some variables inside the runtime were given visiblity that conflicted with their address space type. This caused problems when shared or local memory was made externally visible. This patch fixes this issue by making these varialbes static within the module, therefore limiting their visibility to being internal. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D117526
-
- Jan 18, 2022
-
-
Joseph Huber authored
Reverting to investigate break on AMDGPU. This reverts commit 0203ff19.
-
Joseph Huber authored
After the changes in D117362 made variables declared inside of a target declare directive visible outside the plugin, some variables inside the runtime were given visiblity that conflicted with their address space type. This caused problems when shared or local memory was made externally visible. This patch fixes this issue by making these varialbes static within the module, therefore limiting their visibility to being internal. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D117526
-
- Jan 17, 2022
-
-
Joseph Huber authored
This patch adds the `cold` attribute to the keepAlive functions in the RTL. This dummy function exists to keep certain RTL calls alive without them being optimized out, but it is never called and can be declared cold. This also helps some erroneous remarks being given on this function because it has weak linkage and cannot be made internal. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D117513
-
- Jan 13, 2022
-
-
Jon Chesterfield authored
Fixes github issues/52910 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D117230
-
Joseph Huber authored
This patch adds the `weak` identifier to the openmp device environment variable. The changes introduced in https://reviews.llvm.org/D117211 result in multiply defined symbols. Because the symbol is potentially included multiple times for each offloading file we will get symbol colisions, and because it needs to have external visiblity it should be weak. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D117231
-
Jon Chesterfield authored
D97446 changed the behaviour of 'used'. Compensate. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D117211
-
- Jan 10, 2022
-
-
Jon Chesterfield authored
Some types need to be 64 bit. Unsigned long is a hazard there. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D116963
-