- Mar 06, 2022
Shilei Tian authored
`LIBOMPTARGET_LLVM_INCLUDE_DIRS` is currently checked and included multiple times redundantly. This patch is simply a cleanup. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D121055
-
- Mar 05, 2022
AndreyChurbanov authored
Differential Revision: https://reviews.llvm.org/D120671
-
- Mar 04, 2022
Joseph Huber authored
Libomptarget uses some shared variables to track certain internal state in the runtime. This causes problems when we have code that contains no OpenMP kernels. These variables are normally initialized upon kernel entry, but if there are no kernels they are never initialized. Currently we load the runtime into each source file when not running in LTO mode, so these variables will be erroneously considered undefined or dead and removed, causing miscompiles. This patch temporarily works around the most obvious case, but others still exhibit this problem. We will need to fix this more soundly later. Fixes #54208. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D121007
-
- Mar 03, 2022
Aakanksha authored
Differential Revision: https://reviews.llvm.org/D120846
-
- Mar 02, 2022
Stanislav Mekhanoshin authored
This is target definition only. Differential Revision: https://reviews.llvm.org/D120688
-
- Mar 01, 2022
Malhar Jajoo authored
Essentially, this removes the `use omp_lib_kinds` statement from the header file and replaces it with an `import` statement, to maintain consistency and to avoid a compilation error when the `omp_lib_kinds.mod` file is not accessible. The `import` is required to access entities in the host scoping unit. Differential Revision: https://reviews.llvm.org/D120707
-
- Feb 23, 2022
Shilei Tian authored
-
Joseph Huber authored
-
- Feb 18, 2022
Carlo Bertolli authored
[OpenMP][libomptarget] Delay restore of shadow pointers in structs to after H2D memory copies are completed When using asynchronous plugin calls, the shadow pointer restore could happen before the D2H copy for the entire struct has completed, effectively leaving a device pointer in a host struct. This patch fixes the problem by delaying restores until after a synchronization happens (target regions) and by synchronizing early (target update). Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119968
-
Joseph Huber authored
The runtime uses thread state values to indicate when we use an ICV or are in nested parallelism. This is done for OpenMP correctness, but it is not needed in the majority of cases. The new flag added is `-fopenmp-assume-no-thread-state`. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D120106
-
- Feb 17, 2022
Shilei Tian authored
`bug49334.cpp` has an issue that causes the flaky results reported in #53730. The root cause is that `BlockedC` is never initialized, yet in `BlockMatMul_TargetNowait` it is directly read and written (via `+=`). Fixes #53730. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D119988
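A minimal host-only sketch of the fix pattern, assuming a hypothetical `blockMatMul` helper rather than the actual test code: because the result is accumulated with `+=`, it must be zero-initialized first.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical blocked matrix multiply C = A * B for square N x N row-major
// matrices with block size BS (N must be a multiple of BS).
std::vector<float> blockMatMul(const std::vector<float> &A,
                               const std::vector<float> &B, std::size_t N,
                               std::size_t BS) {
  assert(A.size() == N * N && B.size() == N * N && BS > 0 && N % BS == 0);
  // C is accumulated with +=, so it has to start at zero. Leaving it
  // uninitialized is the kind of bug described above.
  std::vector<float> C(N * N, 0.0f);
  for (std::size_t I0 = 0; I0 < N; I0 += BS)
    for (std::size_t J0 = 0; J0 < N; J0 += BS)
      for (std::size_t K0 = 0; K0 < N; K0 += BS)
        for (std::size_t I = I0; I < I0 + BS; ++I)
          for (std::size_t J = J0; J < J0 + BS; ++J)
            for (std::size_t K = K0; K < K0 + BS; ++K)
              C[I * N + J] += A[I * N + K] * B[K * N + J];
  return C;
}
```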
-
- Feb 16, 2022
Johannes Doerfert authored
The `IsSPMD` global can only be read by threads other than the main thread *after* initialization is complete. To allow usage of `mapping::getBlockSize` before initialization is done, we can pass the `IsSPMD` state explicitly. This is similar to other APIs that take `IsSPMD` explicitly to avoid such a race, e.g., `mapping::isInitialThreadInLevel0(IsSPMD)` Fixes https://github.com/llvm/llvm-project/issues/53857
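A hypothetical sketch of the API pattern described above: an overload that takes `IsSPMD` as a parameter never reads the global, so it is safe to call before initialization, while a convenience overload reads the global. The namespace, the constants, and the assumption that generic mode sets aside one warp are illustrative only and not taken from the runtime.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {
// Hypothetical global that non-main threads may only read after
// initialization has completed.
bool IsSPMDGlobal = false;

// Assumed values for illustration only.
constexpr std::uint32_t NumHardwareThreads = 128;
constexpr std::uint32_t WarpSize = 32;

// Taking IsSPMD as a parameter avoids reading the global, so this overload is
// safe to call before initialization is complete.
std::uint32_t getBlockSize(bool IsSPMD) {
  // Assumption: in generic mode one warp is set aside for the main thread.
  std::uint32_t Reserved = IsSPMD ? 0 : WarpSize;
  assert(NumHardwareThreads > Reserved);
  return NumHardwareThreads - Reserved;
}

// Convenience overload that reads the global; only valid after initialization.
std::uint32_t getBlockSize() { return getBlockSize(IsSPMDGlobal); }
} // namespace sketch
```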
-
- Feb 15, 2022
Joseph Huber authored
This patch adds a new target to the OpenMP CPU offloading tests, which tests the usage of the new driver for CPU offloading. If this all works, we can transition to the new driver as the default. Depends on D119613 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119736
-
- Feb 14, 2022
Joseph Huber authored
Currently, whenever we compile the device runtime we get the following warning: 'Mapping.cpp:32:32: warning: inline function '_OMP::impl::getGridValue' is not defined [-Wundefined-inline]'. This can be silenced by removing the `constexpr` specifier from this function. Doing this doesn't change the generated bitcode at all but prevents the screen from filling with warnings whenever we build the runtime. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119747
-
Jonathan Peyton authored
Introduce the KMP_COMPILER_ICX macro to represent compilation with the oneAPI compiler. Fix up flag detection and compiler ID detection in CMake; older versions of CMake detect IntelLLVM as Clang. Fix compiler warnings. Fix up many of the tests to have non-empty parallel regions, as empty ones are elided by the oneAPI compiler.
-
- Feb 13, 2022
Shilei Tian authored
This patch fixes an issue where the for loop in `applyToShadowMapEntries` never terminates because `Itr` is not incremented in `CB`. Fixes #53727. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119471
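A minimal sketch of one way to avoid this class of bug, assuming a hypothetical `ShadowMap` type and callback signature: the loop owns the iterator increment, and the callback only inspects the entry and returns whether to continue. This is not the libomptarget code; in particular, with this design the callback must not erase the entry it is given.

```cpp
#include <cassert>
#include <functional>
#include <map>

// Hypothetical stand-in for a shadow-pointer map: host address -> saved value.
using ShadowMap = std::map<int, int>;

// Visits every entry in order. The loop, not the callback, advances the
// iterator, so a callback that never touches `Itr` cannot cause an infinite
// loop. The callback returns false to stop early and must not erase entries.
void applyToEntries(ShadowMap &Map,
                    const std::function<bool(ShadowMap::iterator)> &CB) {
  assert(CB && "callback must be set");
  for (auto Itr = Map.begin(); Itr != Map.end(); ++Itr)
    if (!CB(Itr))
      break;
}
```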
-
- Feb 12, 2022
AndreyChurbanov authored
`__kmp_hidden_helper_threads_num` is set to N+1 if the user requested N threads. Thus the number of worker hidden helper threads corresponds to the user request; the main thread of the helper team is excluded because it does not participate in the actual work. This also fixes a divide-by-0 issue in the code. Fixes #48656 Differential Revision: https://reviews.llvm.org/D119586
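The thread-count arithmetic can be sketched as follows. The helper names are hypothetical, and the division is only an illustration of how a zero worker count turns into a divide-by-zero; the entry does not say which computation in the runtime was affected.

```cpp
#include <cassert>

// Hypothetical: team size when the user asks for `Requested` hidden helper
// worker threads. The extra thread is the team's main thread, which runs no
// tasks.
int hiddenHelperTeamSize(int Requested) {
  assert(Requested > 0);
  return Requested + 1;
}

// Threads that actually run hidden helper tasks.
int hiddenHelperWorkers(int TeamSize) { return TeamSize - 1; }

// Illustrative division by the worker count (ceiling division). If the team
// size were set to the user request instead of N+1, a request of one thread
// would leave zero workers here.
int tasksPerWorker(int NumTasks, int TeamSize) {
  int Workers = hiddenHelperWorkers(TeamSize);
  assert(Workers > 0 && "zero workers would divide by zero");
  return (NumTasks + Workers - 1) / Workers;
}
```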
-
- Feb 11, 2022
AndreyChurbanov authored
Fixed incorrect distribution of iterations between different target regions. Differential Revision: https://reviews.llvm.org/D118393
-
Shilei Tian authored
`bug49334.cpp` directly uses `!=` to compare two floating-point values, which is almost always wrong. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D119485
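A common alternative to `!=` is a tolerance-based comparison. The sketch below combines an absolute and a relative tolerance; the function name and the default tolerances are assumptions that a real test would tune to its expected rounding error.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: treat A and B as equal when their difference is within
// an absolute tolerance (useful near zero) or a relative tolerance scaled by
// the larger magnitude.
bool almostEqual(float A, float B, float RelTol = 1e-5f, float AbsTol = 1e-6f) {
  assert(RelTol >= 0.0f && AbsTol >= 0.0f);
  float Diff = std::fabs(A - B);
  return Diff <= AbsTol ||
         Diff <= RelTol * std::max(std::fabs(A), std::fabs(B));
}
```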
-
Shilei Tian authored
Currently we have a hard team limit, which is set to 65536. No matter whether the device can support more teams or the user requests more teams, any number larger than that hard limit is capped at it when the kernel is launched. This is far below the actual hardware limit. For example, my workstation has a GTX2080, and the hardware limit of the grid size is 2147483647, which is exactly the largest number an `int32_t` can represent. No such limitation is mentioned in the spec. This patch simply removes it. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119313
-
- Feb 10, 2022
Ye Luo authored
Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D119478
-
Shilei Tian authored
This patch refines the logic that determines the grid size, as the previous method could bypass the check of whether `CudaBlocksPerGrid` is greater than the actual hardware limit. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119311
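The grid-size entries above can be summarized by a hypothetical selection function in which every path goes through a single clamp to the device limit, with no separate fixed cap such as 65536. The parameter names and the trip-count heuristic are assumptions for illustration, not the CUDA plugin's actual logic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical grid-size selection. A user request wins; otherwise use one
// block per ThreadsPerBlock loop iterations (rounded up, at least one).
std::int64_t chooseGridSize(std::int64_t UserTeams, std::int64_t TripCount,
                            std::int64_t ThreadsPerBlock,
                            std::int64_t DeviceMaxGrid) {
  assert(ThreadsPerBlock > 0 && DeviceMaxGrid > 0 && TripCount >= 0);
  std::int64_t Blocks =
      UserTeams > 0
          ? UserTeams
          : std::max<std::int64_t>(
                1, (TripCount + ThreadsPerBlock - 1) / ThreadsPerBlock);
  // Single exit through the clamp: no branch can skip the hardware check.
  return std::min(Blocks, DeviceMaxGrid);
}
```

Routing both the user-requested and the derived value through one `std::min` keeps the hardware check in a single place, which is the property the second entry is concerned with.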
-
- Feb 09, 2022
Joseph Huber authored
The 'bug49779.cpp' test has been failing recently. This is because, without optimizations, the runtime is sufficiently complex when using nested parallelism that the CUDA tools cannot statically determine the stack size. Because of this, the kernel can exceed the thread stack size and crash. Work around this using the 'LIBOMPTARGET_STACK_SIZE' environment variable and add an FAQ entry for this situation. Fixes #53670 Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D119357
-
Jonathan Peyton authored
MSVC does not support variable length arrays. Replace with KMP_ALLOCA which is already used in the same file for stack-allocated variables.
-
- Feb 08, 2022
Joseph Huber authored
This patch manually adds the runtime include files to the list of dependencies when we build the bitcode runtime library. Previously, if only a header was changed, we would not recompile the source files. The solution used here isn't optimal because every source file now has a dependency on each header file, regardless of whether it is actually used by that file. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D119254
-
Joseph Huber authored
This patch enables running the new driver tests for AMDGPU. Previously this was disabled because some tests failed. This was only because the new driver tests hadn't been listed as unsupported or expected to fail. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D119240
-
- Feb 07, 2022
Joseph Huber authored
This patch replaces the ValueRAII pointer with a default 'nullptr' value. Previously this was initialized as a reference to an existing variable. The use of this variable caused overhead, as the compiler could not look through the uses and determine that it was unused if 'Active' was not set. Because of this, accesses to the variable would be left in the runtime once compiled. Fixes #53641 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119187
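A hypothetical sketch of an RAII helper in this spirit: the pointer is `nullptr` unless `Active` is set, so an inactive instance holds no reference to a real variable. It is not the runtime's actual `ValueRAII`; the constructor signature is an assumption.

```cpp
#include <cassert>

// Hypothetical RAII helper: when Active, it stores NewValue into V and
// restores the previous value on destruction. When inactive, Ptr stays
// nullptr, so the object refers to no real variable and does nothing.
template <typename T> class ValueRAII {
public:
  ValueRAII(T &V, T NewValue, bool Active)
      : Ptr(Active ? &V : nullptr), Old(Active ? V : T()) {
    if (Ptr)
      *Ptr = NewValue;
  }
  ~ValueRAII() {
    if (Ptr)
      *Ptr = Old;
  }
  ValueRAII(const ValueRAII &) = delete;
  ValueRAII &operator=(const ValueRAII &) = delete;

private:
  T *Ptr = nullptr;
  T Old;
};
```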
-
Igor Kirillov authored
Differential Revision: https://reviews.llvm.org/D118988
-
- Feb 04, 2022
Joseph Huber authored
This patch completely removes the old OpenMP device runtime. Previously, the new runtime had the prefix `libomptarget-new-` and the old runtime was simply called `libomptarget-`. This patch makes the formerly new runtime the only runtime available. The entire old project has been deleted, and all references to the `libomptarget-new` runtime have been replaced with `libomptarget-`. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D118934
-
Joseph Huber authored
Summary: This test should now pass with AMDGPU. Previously the symbols were hidden, and reading them would fail.
-
- Feb 02, 2022
Tom Stellard authored
-
- Feb 01, 2022
Jon Chesterfield authored
This seems to be the root cause of hangs on amdgpu. Reverting while investigating. This reverts commit 7b9844cc.
-
Jon Chesterfield authored
-
Johannes Doerfert authored
Due to num_threads (and probably other reasons), we cannot assume explicit barriers are always executed by all threads in an aligned fashion. We can optimize them if that property can be proven, but that is a different matter.
-
Johannes Doerfert authored
Patch originally by Giorgis Georgakoudis (@ggeorgakoudis), typos and bugs introduced later by me. This patch allows us to remove redundant barriers if they are part of a "consecutive" pair of barriers in a basic block with no impacted memory effect (read or write) in between them. Memory accesses to local (=thread private) or constant memory are allowed to appear. Technically, we could also allow any other memory that is not used to share information between threads, e.g., the result of a malloc that is also not captured. However, it will be easier to do more reasoning once the code is put into an AA. That will also allow us to look through phis/selects reasonably. At that point we should also deal with calls, barriers in different blocks, and other complexities. Differential Revision: https://reviews.llvm.org/D118002
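A toy sketch of the rule described above, on a hypothetical instruction-kind list rather than LLVM IR: within one basic block, a barrier is dropped when the nearest preceding barrier is separated from it only by thread-private or constant memory accesses. Shared-memory accesses and calls are treated as blockers, matching the conservative scope stated in the entry; the enum and function names are illustrative assumptions.

```cpp
#include <cassert>
#include <vector>

// Toy IR: a basic block is a list of instruction kinds. Entirely hypothetical.
enum class Inst { Barrier, LocalAccess, ConstantAccess, SharedAccess, Call };

// Drops a barrier when the previous barrier in the same block is separated
// from it only by thread-private or constant memory accesses.
std::vector<Inst> removeRedundantBarriers(const std::vector<Inst> &BB) {
  std::vector<Inst> Out;
  bool SinceBarrierIsClean = false; // a barrier was seen, with no blocker since
  for (Inst I : BB) {
    switch (I) {
    case Inst::Barrier:
      if (SinceBarrierIsClean)
        continue; // redundant: skip it without emitting
      SinceBarrierIsClean = true;
      break;
    case Inst::LocalAccess:
    case Inst::ConstantAccess:
      break; // not visible to other threads, so it does not block
    case Inst::SharedAccess:
    case Inst::Call: // calls are handled conservatively
      SinceBarrierIsClean = false;
      break;
    }
    Out.push_back(I);
  }
  return Out;
}
```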
-
Joseph Huber authored
Some of the new driver tests are flaky on AMDGPU, remove for now.
-
Joseph Huber authored
This patch adds a new target to the tests to run using the new driver as the method for generating offloading code. Depends on D116541 Differential Revision: https://reviews.llvm.org/D118637
-
- Jan 31, 2022
Joachim Protze authored
Temporary solution for #53467, since debian test machines do not support DWARF v5.
-
Joseph Huber authored
This patch changes the error message to mention the documentation page for the debugging options provided by libomptarget and the bitcode runtimes instead. It also adds some extra information to the documentation to help users identify debugging resources more quickly. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D118626
-
Joseph Huber authored
Reduces the shared memory size used for globalization from 2048 bytes to 512 bytes to reduce pressure on shared memory. This patch also adds a debug message to indicate when the shared memory was insufficient. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D118625
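A hypothetical sketch of a fixed-size stack with a heap fallback and a diagnostic when it runs out. The 512-byte size comes from the entry; the class name, 16-byte alignment, LIFO `pop` contract, and message text are assumptions, and host `malloc`/`fprintf` stand in for whatever the device runtime actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <functional>

// Hypothetical fixed-size stack standing in for the shared-memory region used
// for globalization. Allocations that do not fit fall back to the heap and
// print a diagnostic. Pushes and pops must happen in LIFO order.
class FallbackStack {
public:
  static constexpr std::size_t Size = 512;
  static constexpr std::size_t Align = 16;

  void *push(std::size_t Bytes) {
    std::size_t Rounded = roundUp(Bytes);
    if (Rounded <= Size - Used) {
      void *Ptr = Storage + Used;
      Used += Rounded;
      return Ptr;
    }
    std::fprintf(stderr, "stack exhausted: %zu bytes requested, using heap\n",
                 Bytes);
    return std::malloc(Bytes);
  }

  void pop(void *Ptr, std::size_t Bytes) {
    if (!owns(Ptr)) {
      std::free(Ptr); // came from the heap fallback
      return;
    }
    std::size_t Rounded = roundUp(Bytes);
    assert(Used >= Rounded && "pops must mirror pushes");
    Used -= Rounded;
  }

  bool owns(const void *Ptr) const {
    const unsigned char *P = static_cast<const unsigned char *>(Ptr);
    std::less<const unsigned char *> Less; // total order over pointers
    return !Less(P, Storage) && Less(P, Storage + Size);
  }

private:
  static std::size_t roundUp(std::size_t Bytes) {
    return (Bytes + Align - 1) / Align * Align;
  }

  alignas(Align) unsigned char Storage[Size];
  std::size_t Used = 0;
};
```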
-