Commits · 7fa7b0cbd8f8d43c2237b75423cd25e74edde820 · Lorenzo Albano / LLVM bpEVL

Apr 07, 2022

[libomptarget] Add device RTL to regression test dependencies. · 7fa7b0cb

Michael Kruse authored Apr 06, 2022

In a clean build directory, `check-openmp` or `check-libomptarget` will fail because of missing device RTL .bc files. Ensure that the new custom targets `omptarget.devicertl.nvptx` and `omptarget.devicertl.amdgpu` (corresponding to the plugin RTL targets `omptarget.rtl.cuda` and `omptarget.rtl.amdgpu`, respectively) are dependencies of the regression tests.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D123177

7fa7b0cb

Mar 29, 2022

[libomptarget] x86 offloading fails map_back_race.cpp intermittently · 95eac472
Ron Lieberman authored Mar 29, 2022
```
Differential Revision: https://reviews.llvm.org/D122658
```
95eac472
[OpenMP] The test does not have check lines · b803f069
Johannes Doerfert authored Mar 29, 2022

b803f069
[OpenMP][FIX] Use clang++ for the C++ test case · b309bdb9
Johannes Doerfert authored Mar 28, 2022

b309bdb9

[OpenMP][FIX] Avoid races in the handling of to be deleted mapping entries · b3161268

Johannes Doerfert authored Mar 02, 2022

If we decided to delete a mapping entry we did not act on it right away
but first issued and waited for memory copies. In the meantime some
other thread might reuse the entry. While there was some logic to avoid
colliding on the actual "deletion" part, there were two races happening:

1) The data transfer back of the thread deleting the entry and
the data transfer back of the thread taking over the entry raced.
2) The update to the shadow map happened regardless of whether the entry was
actually reused by another thread, which left the shadow map in an
inconsistent state.

To fix both issues we will now update the shadow map and delete the
entry only if we are sure the thread is responsible for deletion, hence
no other thread took over the entry and reused it. We also wait for a
potential former data transfer from the device to finish before we issue
another one that would race with it.

Fixes https://github.com/llvm/llvm-project/issues/54216

Differential Revision: https://reviews.llvm.org/D121058

b3161268

[OpenMP][NFC] Add missing virtual destructor to silence warning · ba93e4e3
Johannes Doerfert authored Mar 25, 2022

ba93e4e3

[Attributor][OpenMP] Add assumption for non-call assembly instructions · 7df2eba7

Johannes Doerfert authored Sep 11, 2021

Inline assembly is scary but we need to support it for the OpenMP GPU
device runtime. The new assumption expresses the fact that it may not
have call semantics, that is, it will not call another function but
simply perform an operation or side-effect. This is important for
reachability in the presence of inline assembly.

Differential Revision: https://reviews.llvm.org/D109986

7df2eba7

Mar 26, 2022

[OpenMP][CUDA] Fix potential program crash caused by double free resources · 545fcc3d

Shilei Tian authored Mar 25, 2022

As mentioned in the code comments for the function `ResourcePoolTy::release`,
at some point there could be two identical resources on the two sides of the `Next`
mark. This is usually not an issue, except when both of the following hold:
1. Some resources are not returned.
2. We need to iterate over the pool and free the elements.

That causes a double free, which is the case for the event pool. Since we don't release
events held by the data map, it can happen that the `Next` mark is not reset, and
we have two identical items in the pool. When the pool is destroyed, we call
`cuEventDestroy` twice on the same event. In the best case, we only observe
CUDA errors. In the worst case, it can cause internal failures in CUDART and a
subsequent crash.

This patch fixes the issue by tracking all resources that have been given out in
an `unordered_set`. We don't remove a resource from the set when it is returned. When the pool
is destroyed, we merge the pool (a `vector`) and the set. In this way, we can make
sure that the set contains all resources allocated from the device. We just need
to iterate over the set and free each resource accordingly.

For now, only the event pool is set to use it. The stream pool is not, because we can make
sure all streams are returned when the plugin is destroyed.
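
The following is a minimal sketch of the tracking idea described above, not the actual plugin code. `FakeAllocator`, `TrackedPool`, and the integer `Resource` type are hypothetical stand-ins for the CUDA event allocator and `ResourcePoolTy`. The allocator counts how often each resource is destroyed, so a double free would be observable.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a device resource such as a CUDA event.
using Resource = int;

// Hypothetical allocator that records how often each resource is destroyed.
struct FakeAllocator {
  int NextId = 0;
  std::vector<int> DestroyCount;
  Resource create() {
    DestroyCount.push_back(0);
    return NextId++;
  }
  void destroy(Resource R) { ++DestroyCount[R]; }
};

// Simplified pool: entries before Next are handed out, entries at or after
// Next are available for reuse.
class TrackedPool {
  FakeAllocator &Alloc;
  std::vector<Resource> Pool;
  std::unordered_set<Resource> Given; // every resource ever handed out
  std::size_t Next = 0;

public:
  explicit TrackedPool(FakeAllocator &A) : Alloc(A) {}

  Resource acquire() {
    if (Next == Pool.size())
      Pool.push_back(Alloc.create());
    Resource R = Pool[Next++];
    Given.insert(R); // never removed on release
    return R;
  }

  void release(Resource R) {
    assert(Next > 0 && "release without a matching acquire");
    // May leave the same resource on both sides of the Next mark.
    Pool[--Next] = R;
  }

  // Merge the pool into the set, then free every unique resource once.
  void clear() {
    Given.insert(Pool.begin(), Pool.end());
    for (Resource R : Given)
      Alloc.destroy(R);
    Given.clear();
    Pool.clear();
    Next = 0;
  }
};
```

If two resources are acquired and only the first one is released, the vector ends up holding the first resource twice and the second one not at all. Iterating over the vector alone would destroy one resource twice and leak the other, whereas iterating over the merged set destroys each exactly once.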

One might wonder why we don't release all events held in the data map.
That is because plugins are destroyed *before* `libomptarget`.
If we can somehow make the plugin outlast `libomptarget`, life will be much
easier.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D122014

545fcc3d

[OpenMP] Add AMDGPU calling convention to ctor / dtor functions · 9d3550c5

Joseph Huber authored Mar 25, 2022

This patch adds the necessary AMDGPU calling convention to the ctor /
dtor kernels. These are fundamentally device kernels called by the host
on image load. Without this calling convention information the AMDGPU
plugin is unable to identify them.

Depends on D122504

Fixes #54091

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D122515

9d3550c5

Mar 25, 2022

Revert "[OpenMP][NFC] Add missing virtual destructor to silence warning" · 6c2be885
Johannes Doerfert authored Mar 25, 2022
```
This reverts commit b9fd8f34 as it
accidentally contained a unit test change that is not finished (and
unrelated).
```
6c2be885
[OpenMP][FIX] Repair ExclusiveAccess move semantic snafu · 7dfad948
Johannes Doerfert authored Mar 25, 2022

7dfad948
[OpenMP][NFC] Add missing virtual destructor to silence warning · b9fd8f34
Johannes Doerfert authored Mar 25, 2022

b9fd8f34

[OpenMP][FIX] Ensure exclusive access to the HDTT map · 4e34f061

Johannes Doerfert authored Mar 05, 2022

This patch solves two problems with the `HostDataToTargetMap` (HDTT
map) which caused races and crashes before:

1) Any access to the HDTT map needs to be exclusive access. This was not
the case for the "dump table" traversals that could collide with
updates by other threads. The new `Accessor` and `ProtectedObject`
wrappers will ensure we have a hard time introducing similar races in
the future. Note that we could allow multiple concurrent
read-accesses but that feature can be added to the `Accessor` API
later.
2) The elements of the HDTT map were `HostDataToTargetTy` objects which
meant that they could be copied/moved/deleted as the map was changed.
However, we sometimes kept pointers to these elements around after we
gave up the map lock which caused potential races again. The new
indirection through `HostDataToTargetMapKeyTy` will allow us to
modify the map while keeping the (interesting part of the) entries
valid. To offset potential cost we duplicate the ordering key of the
entry which avoids an additional indirect lookup.

We should replace more objects with "protected objects" as we go.
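
A simplified sketch of the exclusive-access pattern from point 1 could look as follows. It assumes a plain `std::mutex`, and the method name `getExclusiveAccessor` is an assumption made for illustration. The real `Accessor` and `ProtectedObject` wrappers in libomptarget may differ in names and features.

```cpp
#include <cassert>
#include <map>
#include <mutex>

template <typename T> class Accessor;

// Owns the object and its lock. The object is reachable only via Accessor.
template <typename T> class ProtectedObject {
  T Obj;
  std::mutex Mtx;
  friend class Accessor<T>;

public:
  // Hypothetical entry point: blocks until exclusive access is available.
  Accessor<T> getExclusiveAccessor() { return Accessor<T>(*this); }
};

// Holds the lock for as long as it is alive and exposes the object.
template <typename T> class Accessor {
  ProtectedObject<T> &PO;
  std::unique_lock<std::mutex> Lock;

public:
  explicit Accessor(ProtectedObject<T> &P) : PO(P), Lock(P.Mtx) {}
  T &operator*() { return PO.Obj; }
  T *operator->() { return &PO.Obj; }
};
```

Because the wrapped object is private, a traversal such as a table dump cannot touch the map without first constructing an accessor, which is what makes it hard to reintroduce an unlocked access by accident.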

Differential Revision: https://reviews.llvm.org/D121057

4e34f061

Mar 22, 2022

[OpenMP] Manually unroll the argument copy loop · a619072c

Joseph Huber authored Mar 21, 2022

The unroll pragma did not work properly because the loop bound was not known
when we optimized the runtime, and we then added "unroll disable"
metadata, which prevented unrolling later when the bounds were known.
For now we manually unroll to make sure up to 16 elements are handled
nicely. This helps optimizations to look through the argument passing.
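
As an illustration of manual unrolling, the sketch below copies an argument array with a fall-through `switch`. The function name `copyArgs` is hypothetical, and the sketch covers only 4 elements instead of 16 to stay short. It is one way to write such an unrolled copy, not necessarily the form used in the patch.

```cpp
#include <cassert>

// Copy NumArgs pointers from Src to Dst. Counts from 0 to 4 take the
// straight-line switch path; larger counts fall back to a regular loop.
inline void copyArgs(void **Dst, void **Src, int NumArgs) {
  assert(NumArgs >= 0 && "argument count must be non-negative");
  switch (NumArgs) {
  case 4:
    Dst[3] = Src[3];
    [[fallthrough]];
  case 3:
    Dst[2] = Src[2];
    [[fallthrough]];
  case 2:
    Dst[1] = Src[1];
    [[fallthrough]];
  case 1:
    Dst[0] = Src[0];
    [[fallthrough]];
  case 0:
    break;
  default:
    for (int I = 0; I < NumArgs; ++I)
      Dst[I] = Src[I];
  }
}
```

Once the call is inlined with a constant `NumArgs`, the `switch` folds to a fixed sequence of stores, with no loop left for a later pass to unroll.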

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109164

a619072c

Mar 17, 2022
- [AMDGPU] Add gfx90a and gfx940 to get_elf_mach_gfx_name.cpp · e0b9364b
  Stanislav Mekhanoshin authored Mar 02, 2022
```
Differential Revision: https://reviews.llvm.org/D120849
```
  e0b9364b
Mar 12, 2022
- [nfc][openmp] Swap arguments to remove noise from upcoming diff · 75779435
  Jon Chesterfield authored Mar 11, 2022
  
  75779435
Mar 10, 2022
- [OpenMP][CUDA] Fix the check of `setContext` · f6639a42
  Shilei Tian authored Mar 09, 2022
  
  f6639a42
Mar 09, 2022

[OpenMP][CUDA] Avoid calling `cuCtxSetCurrent` redundantly · 39d3283a

Shilei Tian authored Mar 09, 2022

Currently we set the context everywhere, but that causes many
unnecessary function calls. For example, in the resource pool, if we need to
resize the pool, we need to get resources from the allocator. Each call to allocate sets the
current context once, which is unnecessary. In this patch, we set the context
only in the entry interface functions, if needed. Ideally this
would be implemented via RAII, but since `cuCtxSetCurrent` can return an error
and we don't use exceptions, we can't stop the execution if the RAII object fails.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D121322

39d3283a

[OpenMP][CUDA] Fix an issue that multiple `CUmodule` could be overwritten · 5105c7cd

Shilei Tian authored Mar 09, 2022

This patch fixes the issue introduced in 14de0820 and D120089, that
if dynamic libraries are used, the `CUmodule` array could be overwritten.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D121308

5105c7cd

[OpenMP][FIX] Ensure the modules vector is filled as others are · 14de0820

Johannes Doerfert authored Mar 08, 2022

The modules vector was for some reason special which could lead to it
not being of the same size (=num devices). Easiest solution is to treat
it like we do all the other vectors.

14de0820

Mar 08, 2022

[OpenMP][CUDA] Use one event pool per device · 1660288b

Johannes Doerfert authored Feb 18, 2022

An event pool, similar to the stream pool, needs to be kept per device.
For one, events are associated with cuda contexts which means we cannot
destroy the former after the latter. Also, CUDA documentation states
streams and events need to be associated with the same context, which
we did not ensure at all.

Differential Revision: https://reviews.llvm.org/D120142

1660288b

[OpenMP] Allow to explicitly deinitialize device resources · 10aa83ff

Johannes Doerfert authored Feb 17, 2022

There are two problems this patch tries to address:
1) We currently free resources in a random order wrt. plugin and
   libomptarget destruction. This patch should ensure the CUDA plugin
   is less fragile if something during the deinitialization goes wrong.
2) We need to support (hard) pause runtime calls eventually. This patch
   allows us to free all associated resources, though we cannot
   reinitialize the device yet.

Follow up patch will associate one event pool per device/context.

Differential Revision: https://reviews.llvm.org/D120089

10aa83ff

[OpenMP][NFCI] Use RAII lock guards in libomptarget where possible · 307bbd3c
Johannes Doerfert authored Mar 02, 2022
```
Differential Revision: https://reviews.llvm.org/D121060
```
307bbd3c

Mar 07, 2022
- Revert "[OpenMP][NFCI] Use RAII lock guards in libomptarget where possible" · 7ead7e90
  Johannes Doerfert authored Mar 06, 2022
```
This reverts commit ff50e81b as it broke
the buildbots, see https://reviews.llvm.org/D121060#3362737.
```
  7ead7e90
- [OpenMP][NFCI] Use RAII lock guards in libomptarget where possible · ff50e81b
  Johannes Doerfert authored Mar 02, 2022
```
Differential Revision: https://reviews.llvm.org/D121060
```
  ff50e81b
Mar 06, 2022

[OpenMP][CMake] Clean up the CMake variable `LIBOMPTARGET_LLVM_INCLUDE_DIRS` · 7f7c2c34

Shilei Tian authored Mar 05, 2022

`LIBOMPTARGET_LLVM_INCLUDE_DIRS` is currently checked and included
multiple times redundantly. This patch is simply a cleanup.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D121055

7f7c2c34

Mar 04, 2022

[Libomptarget] Work around bug in initialization of libomptarget · e2dcc221

Joseph Huber authored Mar 04, 2022

Libomptarget uses some shared variables to track certain internal state
in the runtime. This causes problems when we have code that contains no
OpenMP kernels. These variables are normally initialized upon kernel
entry, but if there are no kernels we will see no initialization.
Currently we load the runtime into each source file when not running in
LTO mode, so these variables will be erroneously considered undefined or
dead and removed, causing miscompiles. This patch temporarily works
around the most obvious case, but others still exhibit this problem. We
will need to fix this more soundly later.

Fixes #54208.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D121007

e2dcc221

Mar 03, 2022
- [AMDGPU] Add gfx1036 target · 84069581
  Aakanksha authored Mar 02, 2022
```
Differential Revision: https://reviews.llvm.org/D120846
```
  84069581
Mar 02, 2022

[AMDGPU] Add gfx940 target · 2e2e64df

Stanislav Mekhanoshin authored Feb 28, 2022

This is target definition only.

Differential Revision: https://reviews.llvm.org/D120688

2e2e64df

Feb 23, 2022
- [OpenMP][Offloading] Change N back to 256 in bug49334.cpp · 75812e77
  Shilei Tian authored Feb 23, 2022
  
  75812e77
- [Libomptarget][NFC] Fix missing newline in error message · 5dd0c396
  Joseph Huber authored Feb 23, 2022
  
  5dd0c396
Feb 18, 2022

[OpenMP][libomptarget] Delay restore of shadow pointers in structs to after... · 7b731f4d

Carlo Bertolli authored Feb 18, 2022

[OpenMP][libomptarget] Delay restore of shadow pointers in structs to after H2D memory copies are completed

When using asynchronous plugin calls, shadow pointer restore could happen before the D2H copy for the entire struct has completed, effectively leaving a device pointer in a host struct.
This patch fixes the problem by delaying restores until after a synchronization happens (target regions) and by synchronizing early (target update).

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D119968

7b731f4d

[OpenMP] Add flag for disabling thread state in runtime · 0870a4f5

Joseph Huber authored Feb 17, 2022

The runtime uses thread state values to indicate when we use an ICV or
are in nested parallelism. This is done for OpenMP correctness, but it
is not needed in the majority of cases. The new flag added is
`-fopenmp-assume-no-thread-state`.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D120106

0870a4f5

Feb 17, 2022

[OpenMP][Offloading] Fix test case issues in bug49334.cpp · 092a5bb7

Shilei Tian authored Feb 17, 2022

`bug49334.cpp` has an issue that causes the flaky results reported in #53730.
The root cause is that `BlockedC` is never initialized, but in `BlockMatMul_TargetNowait`
it is directly read and written (via `+=`). Fixes #53730.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D119988

092a5bb7

Feb 16, 2022

[OpenMP][FIX] Eliminate race on the IsSPMD global · 57b4c526

Johannes Doerfert authored Feb 14, 2022

The `IsSPMD` global can only be read by threads other than the main
thread *after* initialization is complete. To allow usage of
`mapping::getBlockSize` before initialization is done, we can pass the
`IsSPMD` state explicitly. This is similar to other APIs that take
`IsSPMD` explicitly to avoid such a race, e.g.,
`mapping::isInitialThreadInLevel0(IsSPMD)`

Fixes https://github.com/llvm/llvm-project/issues/53857

57b4c526

Feb 15, 2022

[Libomptarget] Run CPU offloading tests using the new driver · 777039a5

Joseph Huber authored Feb 14, 2022

This patch adds a new target to the OpenMP CPU offloading tests. This
tests the usage of the new driver for CPU offloading. If this all works
then we can move to transition to the new driver as the default.

Depends on D119613

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D119736

777039a5

Feb 14, 2022

[Libomptarget][NFC] Remove constexpr to hide warnings · 48e3dcec

Joseph Huber authored Feb 14, 2022

Currently whenever we compile the device runtime we get the following
'Mapping.cpp:32:32: warning: inline function '_OMP::impl::getGridValue'
is not defined [-Wundefined-inline]' warning. This can be silenced by
removing the constexpr attribute for this function. Doing this doesn't
change the generated bitcode at all but prevents the screen from getting
filled with warnings whenever we build the runtime.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D119747

48e3dcec

Feb 13, 2022

[OpenMP][Offloading] Fix infinite loop in applyToShadowMapEntries · c27f530d

Shilei Tian authored Feb 12, 2022

This patch fixes the issue that the for loop in `applyToShadowMapEntries`
is infinite because `Itr` is not incremented in `CB`. Fixes #53727.
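
The failure mode can be sketched with a hypothetical range traversal over a `std::map`. `ShadowMap` and `applyToEntries` are illustrative names, and the callback here returns `true` to request erasure, which is an assumption made for the example. The point is only that every path through the loop body must advance the iterator.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for the shadow pointer map.
using ShadowMap = std::map<int, int>;

// Visit all entries with keys in [Begin, End). The callback returns true
// if the visited entry should be erased. Returns the number of visits.
template <typename CallbackT>
int applyToEntries(ShadowMap &Map, int Begin, int End, CallbackT CB) {
  int Visited = 0;
  for (auto Itr = Map.lower_bound(Begin);
       Itr != Map.end() && Itr->first < End;) {
    ++Visited;
    if (CB(Itr->first, Itr->second))
      Itr = Map.erase(Itr); // erase returns the next valid iterator
    else
      ++Itr; // without this step the loop would never terminate
  }
  return Visited;
}
```

Leaving the increment out of the `for` header is what allows erasing during traversal, but it also means a missing `++Itr` on the non-erasing path spins forever on the same entry.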

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D119471

c27f530d

Feb 11, 2022

[OpenMP][Offloading] Change the way to compare floating point values in bug49334.cpp · 702a976c

Shilei Tian authored Feb 10, 2022

`bug49334.cpp` directly uses `!=` to compare two floating point values,
which is almost always wrong.
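
A common alternative to an exact comparison is a tolerance-based check such as the sketch below. The helper name `almostEqual` and the default tolerances are assumptions for illustration. The commit message does not say which comparison the test uses after the change.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Returns true if A and B differ by at most AbsTol, or by at most RelTol
// relative to the larger magnitude of the two values.
inline bool almostEqual(double A, double B, double RelTol = 1e-6,
                        double AbsTol = 1e-12) {
  assert(RelTol >= 0 && AbsTol >= 0 && "tolerances must be non-negative");
  const double Diff = std::fabs(A - B);
  const double Scale = std::max(std::fabs(A), std::fabs(B));
  return Diff <= std::max(AbsTol, RelTol * Scale);
}
```

The relative term handles large values, where rounding errors grow with magnitude, and the absolute term handles values close to zero. Suitable tolerances depend on how many operations contributed to each result.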

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D119485

702a976c

[OpenMP][CUDA] Remove the hard team limit · aca33b0b

Shilei Tian authored Feb 10, 2022

Currently we have a hard team limit, which is set to 65536. No matter whether the device can support more teams or the user requests more teams, any value larger than that hard limit is clamped to it when launching the kernel. That is far less than the actual hardware limit. For example, my workstation has a GTX2080, and the hardware limit of the grid size is 2147483647, which is exactly the largest number an `int32_t` can represent. There is no such limitation mentioned in the spec. This patch simply removes it.
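
The effect of the change can be sketched as two clamping functions. The names `numTeamsBefore`, `numTeamsAfter`, and `OldHardTeamLimit` are hypothetical. Only the value 65536 and the described behavior come from the commit message; the plugin's actual launch logic is more involved.

```cpp
#include <algorithm>
#include <cstdint>

// The hard limit named in the commit message.
constexpr int32_t OldHardTeamLimit = 65536;

// Before: the request is clamped by the device limit and the hard limit.
inline int32_t numTeamsBefore(int32_t Requested, int32_t DeviceLimit) {
  return std::min({Requested, DeviceLimit, OldHardTeamLimit});
}

// After: only the limit reported by the device applies.
inline int32_t numTeamsAfter(int32_t Requested, int32_t DeviceLimit) {
  return std::min(Requested, DeviceLimit);
}
```

With a device limit of 2147483647, a request for 1000000 teams yields 65536 under the old rule and 1000000 under the new one.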

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D119313

aca33b0b