Commits · 7fa7b0cbd8f8d43c2237b75423cd25e74edde820 · Lorenzo Albano / LLVM bpEVL

Apr 07, 2022

[libomptarget] Add device RTL to regression test dependencies. · 7fa7b0cb

Michael Kruse authored Apr 06, 2022

In a clean build directory, `check-openmp` or `check-libomptarget` will fail because of missing device RTL .bc files. Ensure that the new custom targets `omptarget.devicertl.nvptx` and `omptarget.devicertl.amdgpu` (corresponding to the plugin RTL targets `omptarget.rtl.cuda` and `omptarget.rtl.amdgpu`, respectively) are dependencies of the regression tests.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D123177

7fa7b0cb

Mar 29, 2022

[Attributor][OpenMP] Add assumption for non-call assembly instructions · 7df2eba7

Johannes Doerfert authored Sep 11, 2021

Inline assembly is scary but we need to support it for the OpenMP GPU
device runtime. The new assumption expresses the fact that it may not
have call semantics, that is, it will not call another function but
simply perform an operation or side-effect. This is important for
reachability in the presence of inline assembly.

Differential Revision: https://reviews.llvm.org/D109986

7df2eba7

Mar 22, 2022

[OpenMP] Manually unroll the argument copy loop · a619072c

Joseph Huber authored Mar 21, 2022

The unroll pragma did not work properly because the loop bound was not
known when we optimized the runtime, and we then added "unroll disable"
metadata, which prevented unrolling later when the bounds were known.
For now we manually unroll to make sure up to 16 elements are handled
nicely. This helps optimizations look through the argument passing.
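
As a rough illustration of manual unrolling, the host-side sketch below copies up to 16 pointer-sized arguments with a fallthrough switch and uses a plain loop for larger counts. The name `copyArgs` and its signature are hypothetical and are not taken from the runtime sources.

```cpp
#include <cstdint>

// Hypothetical helper (not the runtime's actual code): copy NumArgs pointers
// from Src to Dst. Counts of 0 to 16 use straight-line code via fallthrough;
// anything larger takes the generic loop in the default case.
inline void copyArgs(void **Dst, void *const *Src, uint32_t NumArgs) {
  switch (NumArgs) {
  case 16: Dst[15] = Src[15]; [[fallthrough]];
  case 15: Dst[14] = Src[14]; [[fallthrough]];
  case 14: Dst[13] = Src[13]; [[fallthrough]];
  case 13: Dst[12] = Src[12]; [[fallthrough]];
  case 12: Dst[11] = Src[11]; [[fallthrough]];
  case 11: Dst[10] = Src[10]; [[fallthrough]];
  case 10: Dst[9] = Src[9]; [[fallthrough]];
  case 9: Dst[8] = Src[8]; [[fallthrough]];
  case 8: Dst[7] = Src[7]; [[fallthrough]];
  case 7: Dst[6] = Src[6]; [[fallthrough]];
  case 6: Dst[5] = Src[5]; [[fallthrough]];
  case 5: Dst[4] = Src[4]; [[fallthrough]];
  case 4: Dst[3] = Src[3]; [[fallthrough]];
  case 3: Dst[2] = Src[2]; [[fallthrough]];
  case 2: Dst[1] = Src[1]; [[fallthrough]];
  case 1: Dst[0] = Src[0]; [[fallthrough]];
  case 0:
    break;
  default:
    // More than 16 arguments: fall back to an ordinary loop.
    for (uint32_t I = 0; I < NumArgs; ++I)
      Dst[I] = Src[I];
    break;
  }
}
```

Once the argument count is a compile-time constant at the call site, an optimizer can fold the switch to the matching straight-line copies, which is the effect the commit message is after.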

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109164

a619072c

Mar 06, 2022

[OpenMP][CMake] Clean up the CMake variable `LIBOMPTARGET_LLVM_INCLUDE_DIRS` · 7f7c2c34

Shilei Tian authored Mar 05, 2022

`LIBOMPTARGET_LLVM_INCLUDE_DIRS` is currently checked and included
multiple times redundantly. This patch is simply a cleanup.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D121055

7f7c2c34

Mar 04, 2022

[Libomptarget] Work around bug in initialization of libomptarget · e2dcc221

Joseph Huber authored Mar 04, 2022

Libomptarget uses some shared variables to track certain internal state
in the runtime. This causes problems when we have code that contains no
OpenMP kernels. These variables are normally initialized upon kernel
entry, but if there are no kernels we will see no initialization.
Currently we load the runtime into each source file when not running in
LTO mode, so these variables will be erroneously considered undefined or
dead and removed, causing miscompiles. This patch temporarily works
around the most obvious case, but others still exhibit this problem. We
will need to fix this more soundly later.

Fixes #54208.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D121007

e2dcc221

Mar 03, 2022

[AMDGPU] Add gfx1036 target · 84069581

Aakanksha authored Mar 02, 2022

Differential Revision: https://reviews.llvm.org/D120846

84069581

Mar 02, 2022

[AMDGPU] Add gfx940 target · 2e2e64df

Stanislav Mekhanoshin authored Feb 28, 2022

This is target definition only.

Differential Revision: https://reviews.llvm.org/D120688

2e2e64df

Feb 23, 2022

[Libomptarget][NFC] Fix missing newline in error message · 5dd0c396

Joseph Huber authored Feb 23, 2022

5dd0c396

Feb 18, 2022

[OpenMP] Add flag for disabling thread state in runtime · 0870a4f5

Joseph Huber authored Feb 17, 2022

The runtime uses thread state values to indicate when we use an ICV or
are in nested parallelism. This is done for OpenMP correctness, but it is
not needed in the majority of cases. The new flag added is
`-fopenmp-assume-no-thread-state`.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D120106

0870a4f5

Feb 16, 2022

[OpenMP][FIX] Eliminate race on the IsSPMD global · 57b4c526

Johannes Doerfert authored Feb 14, 2022

The `IsSPMD` global can only be read by threads other than the main
thread *after* initialization is complete. To allow usage of
`mapping::getBlockSize` before initialization is done, we can pass the
`IsSPMD` state explicitly. This is similar to other APIs that take
`IsSPMD` explicitly to avoid such a race, e.g.,
`mapping::isInitialThreadInLevel0(IsSPMD)`.
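
To illustrate passing the mode explicitly instead of reading a global, here is a minimal host-side sketch. The namespace `sketch`, the constant `WarpSize`, and the formula (generic mode reserves one warp for the main thread) are assumptions made for illustration; they are not claimed to match the runtime's `mapping::getBlockSize`.

```cpp
#include <cstdint>

namespace sketch {

// Illustrative constant only; real hardware values differ per target.
constexpr uint32_t WarpSize = 32;

// The caller supplies IsSPMD, so no shared global is read. This makes the
// function safe to call before runtime initialization has finished.
// Assumption: in generic (non-SPMD) mode one warp is set aside for the
// main thread and is not counted as part of the worker block.
inline uint32_t getBlockSize(uint32_t HardwareThreads, bool IsSPMD) {
  return IsSPMD ? HardwareThreads : HardwareThreads - WarpSize;
}

} // namespace sketch
```

The design point is that the value becomes a function input rather than shared state, so there is nothing for other threads to race on.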

Fixes https://github.com/llvm/llvm-project/issues/53857

57b4c526

Feb 14, 2022

[Libomptarget][NFC] Remove constexpr to hide warnings · 48e3dcec

Joseph Huber authored Feb 14, 2022

Currently whenever we compile the device runtime we get the following
'Mapping.cpp:32:32: warning: inline function '_OMP::impl::getGridValue'
is not defined [-Wundefined-inline]' warning. This can be silenced by
removing the constexpr attribute for this function. Doing this doesn't
change the generated bitcode at all but prevents the screen from getting
filled with warnings whenever we build the runtime.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D119747

48e3dcec

Feb 10, 2022

[Libomptarget][AMDGCN] add gfx90c target · 59ad9650

Ye Luo authored Feb 10, 2022

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D119478

59ad9650

Feb 08, 2022

[Libomptarget] Add header files as a dependency to CMake target · 99d72ebd

Joseph Huber authored Feb 08, 2022

This patch manually adds the runtime include files to the list of
dependencies when we build the bitcode runtime library. Previously if
only the header was changed we would not recompile the source files.
The solution used here isn't optimal because every source file now has a
dependency on each header file regardless of if it was actually used by
that file.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D119254

99d72ebd

Feb 07, 2022

[Libomptarget] Replace Value RAII with default value · d28051c4

Joseph Huber authored Feb 07, 2022

This patch replaces the ValueRAII pointer with a default 'nullptr'
value. Previously this was initialized as a reference to an existing
variable. The use of this variable caused overhead as the compiler could
not look through the uses and determine that it was unused if 'Active'
was not set. Because of this accesses to the variable would be left in
the runtime once compiled.
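
A simplified sketch of the pattern follows, assuming a guard that stores a pointer which is `nullptr` when inactive. This is an illustration written for this log, not the runtime's `ValueRAII`, whose template parameters and constructor differ.

```cpp
// Simplified stand-in for the idea described above: when Active is false the
// guard holds nullptr and never touches the variable, so a compiler can
// prove the variable is unused on that path.
template <typename Ty> class ValueRAII {
  Ty *Ptr;     // nullptr when the guard is inactive
  Ty OldValue; // value to restore on destruction

public:
  ValueRAII(Ty &V, Ty NewValue, bool Active)
      : Ptr(Active ? &V : nullptr), OldValue(Active ? V : Ty()) {
    if (Ptr)
      *Ptr = NewValue;
  }

  ~ValueRAII() {
    if (Ptr)
      *Ptr = OldValue;
  }

  // A guard owns the restore action, so copying it would restore twice.
  ValueRAII(const ValueRAII &) = delete;
  ValueRAII &operator=(const ValueRAII &) = delete;
};
```

For example, `ValueRAII<int> Guard(Level, 1, /*Active=*/false);` leaves `Level` untouched both on construction and on destruction.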

Fixes #53641

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D119187

d28051c4

Feb 04, 2022

[OpenMP] Completely remove old device runtime · 034adaf5

Joseph Huber authored Feb 03, 2022

This patch completely removes the old OpenMP device runtime. Previously,
the new runtime had the prefix `libomptarget-new-` and the old runtime
was simply called `libomptarget-`. This patch makes the formerly new
runtime the only runtime available. The entire project has been deleted,
and all references to the `libomptarget-new` runtime have been replaced
with `libomptarget-`.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D118934

034adaf5

Feb 01, 2022

Revert "[OpenMP][FIX] Explicit barriers in SPMD mode are not aligned" · f52927c1

Jon Chesterfield authored Feb 01, 2022

This seems to be the root cause of hangs on amdgpu. Reverting while investigating.

This reverts commit 7b9844cc.

f52927c1

[OpenMP][FIX] Explicit barriers in SPMD mode are not aligned · 7b9844cc

Johannes Doerfert authored Jan 26, 2022

Due to num_threads (probably also other reasons) we cannot assume
explicit barriers are always executed by all threads in an aligned
fashion. We can optimize them if that property can be proven but
that is different.

7b9844cc

Jan 31, 2022

[Libomptarget] Reduce shared memory stack size to 512 and add a message when it is exceeded · fd5853da

Joseph Huber authored Jan 31, 2022

Reduces the shared memory size used for globalization to 512 bytes from
2048 to reduce the pressure on shared memory. This patch also adds a
debug message to indicate when the shared memory was insufficient.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D118625

fd5853da

Jan 28, 2022

Revert "[OpenMP] Ensure broken assumptions print once, not thousands of times." · 619f44b0

Ron Lieberman authored Jan 28, 2022

This reverts commit 27c799ec.

619f44b0

[OpenMP] Ensure broken assumptions print once, not thousands of times. · 27c799ec

Joseph Huber authored Jan 27, 2022

If we have a broken assumption we want to print a message to the user.
If the assumption is broken by many threads in many teams this can
become a problem. To avoid it we use a hash that tracks if a broken
assumption has (likely) been printed and avoid printing it again. This
is not foolproof and has some caveats that might cause problems in
the future (see comment) but it should improve the situation
considerably for now.
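
One way to picture the "(likely) printed" tracking is a 64-bit mask indexed by a hash of the message, as in the host-side sketch below. The names `PrintedMask` and `reportOnce` are hypothetical, and the sketch makes no claim about how the patch itself computes or stores the hash.

```cpp
#include <atomic>
#include <cstdint>
#include <cstdio>
#include <functional>
#include <string>

// One bit per hash bucket; a set bit means a message hashing to that bucket
// has (likely) been printed already.
static std::atomic<uint64_t> PrintedMask{0};

// Prints Msg to stderr the first time its bucket is seen and returns true.
// Returns false when the bucket was already marked. Two different messages
// can share a bucket, in which case the second one is suppressed; that is
// the "not foolproof" caveat of any hash-based scheme like this.
inline bool reportOnce(const std::string &Msg) {
  const uint64_t Bit = uint64_t(1) << (std::hash<std::string>{}(Msg) % 64);
  const uint64_t Old = PrintedMask.fetch_or(Bit);
  if (Old & Bit)
    return false;
  std::fprintf(stderr, "%s\n", Msg.c_str());
  return true;
}
```

The atomic `fetch_or` both tests and sets the bit in one step, so when many threads report the same message concurrently only one of them prints.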

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D112156

27c799ec

Jan 27, 2022

[OpenMP][NFCI] Pipe the IdentTy object through more new RT functions · 1e121568

Johannes Doerfert authored Jan 27, 2022

IdentTy objects are useful for debugging and profiling so we want to
keep them around in more places, especially those that have a large
impact on performance, e.g., everything related to state.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D112494

1e121568

Jan 21, 2022

[Libomptarget] Change visibility to hidden for device RTL · 26feef08

Joseph Huber authored Jan 20, 2022

This patch changes the visibility for all constructs in the new device
RTL to be hidden by default. This is done after the changes introduced
in D117806 changed the visibility from being hidden by default for all
device compilations. This asserts that the visibility for the device
runtime library will be hidden except for the internal environment
variable. This is done to aid optimization and linking of the device
library.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D117807

26feef08

Jan 20, 2022

[OpenMP] Expand short versions of OpenMP offloading triples · 28d71860

Joseph Huber authored Jan 19, 2022

The OpenMP offloading libraries are built with fixed triples and linked
in during compile time. This would cause unhelpful errors if the user
passed in the wrong expansion of the triple used for the bitcode
library. Because we only support these triples for OpenMP offloading we
can normalize them to the full version used in the bitcode library.
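
The normalization can be pictured as a small lookup from short to full triple. The function name `expandOffloadTriple` is hypothetical, and the specific mapping shown is an assumption about which full triples the bitcode libraries use; the patch itself is the authority on the exact set.

```cpp
#include <string>

// Hypothetical sketch: expand a short offloading triple to a full one.
// The three mappings are assumed for illustration. Any other input,
// including an already-expanded triple, is returned unchanged.
inline std::string expandOffloadTriple(const std::string &Triple) {
  if (Triple == "nvptx")
    return "nvptx-nvidia-cuda";
  if (Triple == "nvptx64")
    return "nvptx64-nvidia-cuda";
  if (Triple == "amdgcn")
    return "amdgcn-amd-amdhsa";
  return Triple;
}
```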

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D117634

28d71860

Jan 19, 2022

[Libomptarget] Fix external visibility for internal variables · 4863fed9

Joseph Huber authored Jan 17, 2022

After the changes in D117362 made variables declared inside of a target
declare directive visible outside the plugin, some variables inside the
runtime were given visibility that conflicted with their address space
type. This caused problems when shared or local memory was made
externally visible. This patch fixes this issue by making these
variables static within the module, therefore limiting their visibility
to being internal.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D117526

4863fed9

Jan 18, 2022

Revert "[Libomptarget] Fix external visibility for internal variables" · 138cc5a0

Joseph Huber authored Jan 18, 2022

Reverting to investigate break on AMDGPU. This reverts commit
0203ff19.

138cc5a0

[Libomptarget] Fix external visibility for internal variables · 0203ff19

Joseph Huber authored Jan 17, 2022

After the changes in D117362 made variables declared inside of a target
declare directive visible outside the plugin, some variables inside the
runtime were given visibility that conflicted with their address space
type. This caused problems when shared or local memory was made
externally visible. This patch fixes this issue by making these
variables static within the module, therefore limiting their visibility
to being internal.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D117526

0203ff19

Jan 17, 2022

[Libomptarget] Add `cold` to KeepAlive attributes · 4869a22d

Joseph Huber authored Jan 17, 2022

This patch adds the `cold` attribute to the keepAlive functions in the
RTL. This dummy function exists to keep certain RTL calls alive without
them being optimized out, but it is never called and can be declared
cold. This also helps avoid some erroneous remarks being given on this
function because it has weak linkage and cannot be made internal.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D117513

4869a22d

Jan 13, 2022

[openmp][devicertl] Handle missing clang_tool · d53b9795

Jon Chesterfield authored Jan 13, 2022

Fixes github issues/52910

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D117230

d53b9795

[Libomptarget] Fix multiply defined symbol during linking · 4746e38f

Joseph Huber authored Jan 13, 2022

This patch adds the `weak` identifier to the openmp device environment
variable. The changes introduced in https://reviews.llvm.org/D117211
result in multiply defined symbols. Because the symbol is potentially
included multiple times for each offloading file we will get symbol
collisions, and because it needs to have external visibility it should be
weak.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D117231

4746e38f

[openmp] Mark used variables as retain as well · 43956089

Jon Chesterfield authored Jan 13, 2022

D97446 changed the behaviour of 'used'. Compensate.

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D117211

43956089

Dec 27, 2021

[OpenMP][FIX] Change globalization alignment to 16 · 7cdaa5a9

Joseph Huber authored Dec 17, 2021

This patch changes the default alignment from 8 to 16, and encodes this
information in the `__kmpc_alloc_shared` runtime call to communicate it
to the HeapToStack pass. The previous alignment of 8 was not sufficient
for the maximum size of primitive types on 64-bit systems, and needs to
be increased. This reduces the amount of space available in the data
sharing stack, so this implementation will need to be improved later to
include the alignment requirements in the allocation call, and use it
properly in the data sharing stack in the runtime.
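
To see why a larger alignment costs stack space, consider rounding every allocation up to a multiple of the alignment. The helper below is a generic sketch with hypothetical names (`GlobalizationAlignment`, `alignUp`), not the runtime's allocator.

```cpp
#include <cstdint>

// Alignment value taken from the commit message; the names are illustrative.
constexpr uint64_t GlobalizationAlignment = 16;

// Round Bytes up to the next multiple of GlobalizationAlignment.
constexpr uint64_t alignUp(uint64_t Bytes) {
  return (Bytes + GlobalizationAlignment - 1) / GlobalizationAlignment *
         GlobalizationAlignment;
}

// A 4-byte request now occupies 16 bytes instead of the 8 it would take
// under 8-byte alignment, so fewer small allocations fit in the stack.
static_assert(alignUp(4) == 16, "small requests are padded to 16 bytes");
static_assert(alignUp(16) == 16, "aligned sizes are unchanged");
static_assert(alignUp(17) == 32, "one byte over spills into the next slot");
```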

Depends on D115888

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D115971

7cdaa5a9

Dec 09, 2021

[OpenMP][FIX] Pass the num_threads value directly to parallel_51 · bc9c4d72

Joseph Huber authored Dec 09, 2021

The problem with the old scheme is that we would need to keep track of
the "next region" and reset the num_threads value after it. The new RT
doesn't do it and an assertion is triggered. The old RT doesn't do it
either, I haven't tested it but I assume a num_threads clause might
impact multiple parallel regions "accidentally". Further, in SPMD mode
num_threads was simply ignored, for some reason beyond me.

In any case, parallel_51 is designed to take the clause value directly,
so let's do that instead.
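
A minimal sketch of taking the clause value as a parameter is shown below. The function `threadsForRegion`, the convention that values below 1 mean "no usable clause", and the clamp to the block size are all assumptions made for illustration; they are not the runtime's actual logic.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical sketch: the num_threads clause value is an argument of the
// parallel entry point rather than state stored earlier, so nothing needs to
// be reset after the region ends.
// Assumption: NumThreadsClause < 1 means no clause was given.
inline uint32_t threadsForRegion(int32_t NumThreadsClause, uint32_t BlockSize) {
  if (NumThreadsClause < 1)
    return BlockSize;
  return std::min(static_cast<uint32_t>(NumThreadsClause), BlockSize);
}
```

Because the value is scoped to one call, a clause on one parallel region cannot leak into the next region.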

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D113623

bc9c4d72

Nov 30, 2021

[openmp][devicertl] Add a missing loader_uninitialized attribute · 3ab150f6

Jon Chesterfield authored Nov 29, 2021

3ab150f6

Nov 16, 2021

[OpenMP] Fix initializer not working on AMDGPU · 374cd0fb

Joseph Huber authored Nov 15, 2021

The RAII class used for debugging RTL entry used a shared variable to
keep track of the current depth. This used a global initializer, which
isn't supported on AMDGPU. This patch removes the initializer and
instead sets it to zero when the state is initialized in the runtime.

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D113963

374cd0fb

Nov 12, 2021

[OpenMP] Fix main thread barrier for Pascal and amdgpu · c9dfe322

Joel E. Denny authored Nov 12, 2021

Fixes what's left of https://bugs.llvm.org/show_bug.cgi?id=51781.

Reviewed By: jdoerfert, JonChesterfield, tianshilei1992

Differential Revision: https://reviews.llvm.org/D113602

c9dfe322

Nov 10, 2021

[OpenMP] Lower printf to __llvm_omp_vprintf · 27177b82

Jon Chesterfield authored Nov 10, 2021

Extension of D112504. Lower amdgpu printf to `__llvm_omp_vprintf`
which takes the same const char*, void* arguments as cuda vprintf and also
passes the size of the void* alloca which will be needed by a non-stub
implementation of `__llvm_omp_vprintf` for amdgpu.

This removes the amdgpu link error on any printf in a target region in favour
of silently compiling code that doesn't print anything to stdout.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112680

27177b82

Nov 09, 2021

[clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files · 737c4a26

Atmn Patel authored Nov 08, 2021

The existing CGOpenMPRuntimeAMDGCN and CGOpenMPRuntimeNVPTX classes are
just code bloat. By removing them, the codebase gets a bit cleaner.

Reviewed By: jdoerfert, JonChesterfield, tianshilei1992

Differential Revision: https://reviews.llvm.org/D113421

737c4a26

Revert "[clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files" · ef717f38

Atmn Patel authored Nov 09, 2021

This reverts commit 81a7cad2.

ef717f38

[clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files · 81a7cad2

Atmn Patel authored Nov 08, 2021

The existing CGOpenMPRuntimeAMDGCN and CGOpenMPRuntimeNVPTX classes are
just code bloat. By removing them, the codebase gets a bit cleaner.

Reviewed By: jdoerfert, JonChesterfield, tianshilei1992

Differential Revision: https://reviews.llvm.org/D113421

81a7cad2

Nov 08, 2021

Revert "[OpenMP] Lower printf to __llvm_omp_vprintf" · 0fa45d6d

Jon Chesterfield authored Nov 08, 2021

This reverts commit db81d8f6.

0fa45d6d