- Apr 07, 2022
-
-
Michael Kruse authored
In a clean build directory, `check-openmp` or `check-libomptarget` will fail because of missing device RTL .bc files. Ensure that the new custom targets `omptarget.devicertl.nvptx` and `omptarget.devicertl.amdgpu` (corresponding to the plugin RTL targets `omptarget.rtl.cuda` and `omptarget.rtl.amdgpu`, respectively) are dependencies of the regression tests. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D123177
-
- Apr 06, 2022
-
-
Hansang Bae authored
Silenced compiler warnings after pushing the following change. https://reviews.llvm.org/D122107 Differential Revision: https://reviews.llvm.org/D123233
-
Hansang Bae authored
This change adds support for ompt_callback_dispatch with the new dispatch chunk type introduced in 5.2. Definitions of the new ompt_work_loop types were also added in the header file. Differential Revision: https://reviews.llvm.org/D122107
-
- Mar 31, 2022
-
-
Jonathan Peyton authored
-
Joachim Protze authored
Latest OpenMP spec says parallel_data is NULL for initial/implicit-task-end. We nevertheless need to cleanup the ParallelData here, as there is no other callback for the end of the implicit parallel region. We can use the reference stored in the TaskData. Reviewed By: dreachem Differential Revision: https://reviews.llvm.org/D114005
-
- Mar 29, 2022
-
-
Ron Lieberman authored
Differential Revision: https://reviews.llvm.org/D122658
-
Johannes Doerfert authored
-
Johannes Doerfert authored
-
Johannes Doerfert authored
If we decided to delete a mapping entry we did not act on it right away but first issued and waited for memory copies. In the meantime some other thread might reuse the entry. While there was some logic to avoid colliding on the actual "deletion" part, there were two races happening: 1) The data transfer back of the thread deleting the entry and the data transfer back of the thread taking over the entry raced. 2) The update to the shadow map happened regardless of whether the entry was actually reused by another thread, which left the shadow map in an inconsistent state. To fix both issues we will now update the shadow map and delete the entry only if we are sure the thread is responsible for deletion, that is, no other thread took over the entry and reused it. We also wait for a potential former data transfer from the device to finish before we issue another one that would race with it. Fixes https://github.com/llvm/llvm-project/issues/54216 Differential Revision: https://reviews.llvm.org/D121058
-
Johannes Doerfert authored
-
Johannes Doerfert authored
Inline assembly is scary but we need to support it for the OpenMP GPU device runtime. The new assumption expresses the fact that it may not have call semantics, that is, it will not call another function but simply perform an operation or side-effect. This is important for reachability in the presence of inline assembly. Differential Revision: https://reviews.llvm.org/D109986
-
- Mar 26, 2022
-
-
Shilei Tian authored
As mentioned in the code comments for function `ResourcePoolTy::release`, at some point there could be two identical resources on the two sides of the `Next` mark. This is usually not an issue, except in the following case: 1. Some resources are not returned. 2. We need to iterate the pool and free the elements. That causes a double free, which is the case for the event pool. Since we don't release events held by the data map, it can happen that the `Next` mark is not reset, and we have two identical items in the pool. When the pool is destroyed, we will call `cuEventDestroy` twice on the same event. In the best case, we only observe CUDA errors. In the worst case, it can cause internal failures in CUDART and a subsequent crash. This patch fixes the issue by tracking all resources that have been given out using an `unordered_set`. We don't remove a resource from the set when it is returned. When the pool is destroyed, we merge the pool (a `vector`) and the set. In this way, we can make sure that the set contains all resources allocated from the device. We just need to iterate the set and free the resources accordingly. For now, only the event pool is set to use it. The stream pool does not because we can make sure all streams are returned when the plugin is destroyed. One might wonder why we don't release all events held in the data map. That is because plugins are destroyed *before* `libomptarget`. If we can somehow make the plugin outlast `libomptarget`, life will be much easier. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D122014
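The double-free scenario and the tracking-set fix described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the plugin's code: `FakeDevice`, `TrackedPool`, and the integer `Resource` handle are hypothetical stand-ins for the CUDA driver, the event pool, and `CUevent`, and the sketch ignores thread safety.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a device resource handle such as an event.
using Resource = int;

// Hypothetical device that counts how often each resource is destroyed,
// standing in for calls such as cuEventDestroy.
struct FakeDevice {
  std::vector<int> DestroyCount;
  Resource create() {
    DestroyCount.push_back(0);
    return static_cast<Resource>(DestroyCount.size() - 1);
  }
  void destroy(Resource R) { ++DestroyCount[R]; }
};

class TrackedPool {
  FakeDevice &Dev;
  std::vector<Resource> Pool;         // entries at index >= Next are available
  std::size_t Next = 0;               // the "Next" mark
  std::unordered_set<Resource> Given; // every resource ever handed out

public:
  TrackedPool(FakeDevice &D, std::size_t Size) : Dev(D) {
    for (std::size_t I = 0; I < Size; ++I)
      Pool.push_back(Dev.create());
  }

  Resource acquire() {
    if (Next == Pool.size())
      Pool.push_back(Dev.create());
    Resource R = Pool[Next++];
    Given.insert(R); // never removed, even when R is returned
    return R;
  }

  void release(Resource R) {
    assert(Next > 0 && "release without a matching acquire");
    // This may duplicate an entry that still sits left of the mark.
    Pool[--Next] = R;
  }

  ~TrackedPool() {
    // Merge the pool into the set so every resource appears exactly once,
    // then destroy each resource a single time.
    Given.insert(Pool.begin(), Pool.end());
    for (Resource R : Given)
      Dev.destroy(R);
  }
};
```

With a pool of two resources, acquiring both and returning only the first leaves the vector holding the first resource twice and the second not at all; iterating the vector alone would destroy one resource twice and leak the other, whereas the merged set destroys each exactly once.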
-
Joseph Huber authored
This patch adds the necessary AMDGPU calling convention to the ctor / dtor kernels. These are fundamentally device kernels called by the host on image load. Without this calling convention information the AMDGPU plugin is unable to identify them. Depends on D122504 Fixes #54091 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D122515
-
- Mar 25, 2022
-
-
Johannes Doerfert authored
This reverts commit b9fd8f34 as it accidentally contained a unit test change that is not finished (and unrelated).
-
Johannes Doerfert authored
-
Johannes Doerfert authored
-
Johannes Doerfert authored
This patch solves two problems with the `HostDataToTargetMap` (HDTT map) which caused races and crashes before: 1) Any access to the HDTT map needs to be exclusive. This was not the case for the "dump table" traversals, which could collide with updates by other threads. The new `Accessor` and `ProtectedObject` wrappers will make it hard to introduce similar races in the future. Note that we could allow multiple concurrent read accesses, but that feature can be added to the `Accessor` API later. 2) The elements of the HDTT map were `HostDataToTargetTy` objects, which meant that they could be copied/moved/deleted as the map was changed. However, we sometimes kept pointers to these elements around after we gave up the map lock, which caused potential races again. The new indirection through `HostDataToTargetMapKeyTy` will allow us to modify the map while keeping the (interesting part of the) entries valid. To offset the potential cost we duplicate the ordering key of the entry, which avoids an additional indirect lookup. We should replace more objects with "protected objects" as we go. Differential Revision: https://reviews.llvm.org/D121057
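The exclusive-access idea in point 1 can be sketched as a mutex-guarded object that is reachable only through an RAII accessor. This is a simplified illustration, not the interface added by the patch: the class name `Protected`, the `access()` method, and the `bumpCount` helper are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <mutex>

// Hypothetical wrapper: the wrapped object can only be reached through an
// Accessor, and an Accessor holds the lock for its whole lifetime.
template <typename T> class Protected {
  T Obj;
  std::mutex Mtx;

public:
  class Accessor {
    std::unique_lock<std::mutex> Lock;
    T &Ref;

  public:
    explicit Accessor(Protected &P) : Lock(P.Mtx), Ref(P.Obj) {
      assert(Lock.owns_lock() && "accessor must hold the lock");
    }
    T &operator*() { return Ref; }
    T *operator->() { return &Ref; }
  };

  // Blocks until exclusive access is available.
  Accessor access() { return Accessor(*this); }
};

// Example use: every read or update of the map happens under the lock,
// because there is no way to name the map without an Accessor.
inline int bumpCount(Protected<std::map<int, int>> &Counts, int Key) {
  auto A = Counts.access();
  return ++(*A)[Key];
}
```

Because the lock is tied to the accessor's scope, a traversal such as a table dump cannot run unlocked by accident. Allowing concurrent readers, as the commit message notes, would need a shared lock and a read-only accessor variant.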
-
- Mar 22, 2022
-
-
Joseph Huber authored
The unroll pragma did not work properly because the loop bound was not known when we optimized the runtime, and we then added an "unroll disable" metadata which prevented unrolling later when the bounds were known. For now we manually unroll to make sure up to 16 elements are handled nicely. This helps optimizations to look through the argument passing. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D109164
-
- Mar 17, 2022
-
-
Stanislav Mekhanoshin authored
Differential Revision: https://reviews.llvm.org/D120849
-
- Mar 12, 2022
-
-
Jon Chesterfield authored
-
- Mar 10, 2022
-
-
Shilei Tian authored
-
- Mar 09, 2022
-
-
Shilei Tian authored
Currently we set the context everywhere, but that causes many unnecessary function calls. For example, in the resource pool, if we need to resize the pool, we need to get resources from the allocator. Each call to allocate sets the current context once, which is unnecessary. In this patch, we set the context only in the entry interface functions, if needed. Ideally this would be implemented via RAII, but since `cuCtxSetCurrent` can return an error and we don't use exceptions, we can't stop the execution if RAII fails. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D121322
-
Shilei Tian authored
This patch fixes the issue introduced in 14de0820 and D120089, that if dynamic libraries are used, the `CUmodule` array could be overwritten. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D121308
-
Mark Dewing authored
In SupportAndFAQ.rst, add blank lines before and after a bullet list and sublist. This avoids an "Unexpected indentation" warning. In Runtimes.rst, adjust the suggestion for setting LIBOMPTARGET_INFO. The right shifts are not necessary as the bit mask values are already correct. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119595
-
Johannes Doerfert authored
The modules vector was for some reason special which could lead to it not being of the same size (=num devices). Easiest solution is to treat it like we do all the other vectors.
-
- Mar 08, 2022
-
-
Joseph Huber authored
-
Johannes Doerfert authored
An event pool, similar to the stream pool, needs to be kept per device. For one, events are associated with CUDA contexts, which means we cannot destroy the former after the latter. Also, CUDA documentation states streams and events need to be associated with the same context, which we did not ensure at all. Differential Revision: https://reviews.llvm.org/D120142
-
Johannes Doerfert authored
There are two problems this patch tries to address: 1) We currently free resources in a random order with respect to plugin and libomptarget destruction. This patch should ensure the CUDA plugin is less fragile if something goes wrong during deinitialization. 2) We need to support (hard) pause runtime calls eventually. This patch allows us to free all associated resources, though we cannot reinitialize the device yet. A follow-up patch will associate one event pool per device/context. Differential Revision: https://reviews.llvm.org/D120089
-
Johannes Doerfert authored
Differential Revision: https://reviews.llvm.org/D121060
-
- Mar 07, 2022
-
-
Jonathan Peyton authored
Register constraint switched to "=q" which means very specifically (from https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints) > Any register accessible as rl. In 32-bit mode, a, b, c, and d; in 64-bit mode, any integer register. Older gcc versions (8.x and below) were trying to use esi or edi for the 8 bit flag variable, but it wound up displaying this error in the end: kmp_lock.cpp: In function ‘void __kmp_spin_backoff(kmp_backoff_t*)’: kmp_lock.cpp:2684:1: error: unsupported size for integer register Hence the correct restriction is "=q" instead of "=r". Fixes: https://github.com/llvm/llvm-project/issues/53309 Differential Revision: https://reviews.llvm.org/D120519
-
AndreyChurbanov authored
Before this patch, task priorities were ignored. That was a valid implementation, as the task priority is a hint according to the OpenMP specification. This patch implements a shared list of task deques, one per task priority value, sorted from high to low. Task execution now first checks whether priority tasks are ready for execution and runs those before others; otherwise the usual task execution mechanics apply. Differential Revision: https://reviews.llvm.org/D119676
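The scheduling order described above can be illustrated with one deque per priority value, kept sorted from high to low, plus a regular deque for tasks without a priority. This is a single-threaded sketch under stated assumptions, not the runtime's data structure: `Task`, `TaskQueues`, and the use of `std::map` are hypothetical simplifications, and the real implementation shares its list between threads.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <map>

// Hypothetical task record; a priority of 0 means "no priority given".
struct Task {
  int Id;
  int Priority;
};

class TaskQueues {
  // One deque per priority value, iterated from high to low priority.
  std::map<int, std::deque<Task>, std::greater<int>> PriorityDeques;
  std::deque<Task> Regular; // tasks without a priority

public:
  void push(Task T) {
    if (T.Priority > 0)
      PriorityDeques[T.Priority].push_back(T);
    else
      Regular.push_back(T);
  }

  // Returns false if no task is ready. Priority tasks are served first,
  // highest priority value first, FIFO within one priority value.
  bool pop(Task &Out) {
    if (!PriorityDeques.empty()) {
      auto It = PriorityDeques.begin();
      assert(!It->second.empty() && "empty deques are erased eagerly");
      Out = It->second.front();
      It->second.pop_front();
      if (It->second.empty())
        PriorityDeques.erase(It);
      return true;
    }
    if (Regular.empty())
      return false;
    Out = Regular.front();
    Regular.pop_front();
    return true;
  }
};
```

Pushing tasks with priorities 0, 5, 1, and 5 in that order yields the two priority-5 tasks first (in submission order), then the priority-1 task, and finally the task without a priority.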
-
Johannes Doerfert authored
This reverts commit ff50e81b as it broke the buildbots, see https://reviews.llvm.org/D121060#3362737.
-
Johannes Doerfert authored
Differential Revision: https://reviews.llvm.org/D121060
-
- Mar 06, 2022
-
-
James Beddek authored
Do the same as is done for NetBSD. Some compiler-rt/lib/builtins files call libm functions (e.g. fmaxl, fabs). Linking libomp with --rtlib=compiler-rt references these functions. Downstream report: https://bugs.gentoo.org/816831 Fixes: https://github.com/llvm/llvm-project/issues/51457
-
Shilei Tian authored
`LIBOMPTARGET_LLVM_INCLUDE_DIRS` is currently checked and included multiple times, redundantly. This patch is simply a cleanup. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D121055
-
- Mar 05, 2022
-
-
AndreyChurbanov authored
Differential Revision: https://reviews.llvm.org/D120671
-
- Mar 04, 2022
-
-
Joseph Huber authored
Libomptarget uses some shared variables to track certain internal state in the runtime. This causes problems when we have code that contains no OpenMP kernels. These variables are normally initialized upon kernel entry, but if there are no kernels we will see no initialization. Currently we load the runtime into each source file when not running in LTO mode, so these variables will be erroneously considered undefined or dead and removed, causing miscompiles. This patch temporarily works around the most obvious case, but others still exhibit this problem. We will need to fix this more soundly later. Fixes #54208. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D121007
-
- Mar 03, 2022
-
-
Aakanksha authored
Differential Revision: https://reviews.llvm.org/D120846
-
- Mar 02, 2022
-
-
Stanislav Mekhanoshin authored
This is target definition only. Differential Revision: https://reviews.llvm.org/D120688
-
- Mar 01, 2022
-
-
Malhar Jajoo authored
Replaced the "use omp_lib_kinds" statement in the header file with import, to maintain consistency and to avoid a compilation error in case the omp_lib_kinds.mod file is not accessible. The import is required to access entities in the host scoping unit. Differential Revision: https://reviews.llvm.org/D120707
-