- Apr 07, 2022
-
-
Michael Kruse authored
In a clean build directory, `check-openmp` or `check-libomptarget` will fail because of missing device RTL .bc files. Ensure that the new custom targets `omptarget.devicertl.nvptx` and `omptarget.devicertl.amdgpu` (corresponding to the plugin RTL targets `omptarget.rtl.cuda` and `omptarget.rtl.amdgpu`, respectively) are dependencies of the regression tests. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D123177
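A minimal CMake sketch of the dependency wiring described here; the device RTL target names come from the commit message, but the `check-libomptarget` usage and the `if(TARGET ...)` guards are assumptions rather than the actual upstream CMakeLists.txt:

```cmake
# Hypothetical sketch: make the regression-test target depend on the device
# RTL bitcode targets so the .bc files exist even in a clean build.
foreach(devicertl_target omptarget.devicertl.nvptx omptarget.devicertl.amdgpu)
  if(TARGET ${devicertl_target})
    # check-libomptarget (and check-openmp, which includes it) now waits
    # for the bitcode libraries before running the tests.
    add_dependencies(check-libomptarget ${devicertl_target})
  endif()
endforeach()
```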
-
- Mar 06, 2022
-
-
Shilei Tian authored
`LIBOMPTARGET_LLVM_INCLUDE_DIRS` is currently checked and included multiple times redundantly. This patch is simply a cleanup. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D121055
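A hedged sketch of the kind of consolidation described, hoisting the check into a single place instead of repeating it per subdirectory; the real CMake files are structured differently:

```cmake
# Hypothetical cleanup: check and add the LLVM include directories once,
# rather than repeating this block in several CMakeLists.txt files.
if(LIBOMPTARGET_LLVM_INCLUDE_DIRS)
  include_directories(${LIBOMPTARGET_LLVM_INCLUDE_DIRS})
else()
  message(WARNING "LIBOMPTARGET_LLVM_INCLUDE_DIRS is not set")
endif()
```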
-
- Mar 03, 2022
-
-
Aakanksha authored
Differential Revision: https://reviews.llvm.org/D120846
-
- Mar 02, 2022
-
-
Stanislav Mekhanoshin authored
This is target definition only. Differential Revision: https://reviews.llvm.org/D120688
-
- Feb 10, 2022
-
-
Ye Luo authored
Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D119478
-
- Feb 08, 2022
-
-
Joseph Huber authored
This patch manually adds the runtime include files to the list of dependencies when we build the bitcode runtime library. Previously, if only a header was changed we would not recompile the source files. The solution used here isn't optimal because every source file now has a dependency on each header file, regardless of whether that header is actually used by the file. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D119254
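A rough sketch of the mechanism, assuming the bitcode objects are produced by `add_custom_command`; `CLANG_TOOL` and `bc_flags` are placeholder names here, not the real variables:

```cmake
# Hypothetical: collect every runtime header and list it as an explicit
# dependency of each per-source bitcode compilation, so that editing a
# header retriggers the build.
file(GLOB include_files ${CMAKE_CURRENT_SOURCE_DIR}/include/*.h)
file(GLOB src_files ${CMAKE_CURRENT_SOURCE_DIR}/src/*.cpp)

foreach(src ${src_files})
  get_filename_component(src_name ${src} NAME_WE)
  add_custom_command(
    OUTPUT ${src_name}.bc
    COMMAND ${CLANG_TOOL} ${bc_flags} -c ${src} -o ${src_name}.bc
    # Every source now depends on every header, even ones it never includes.
    DEPENDS ${src} ${include_files})
endforeach()
```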
-
- Feb 04, 2022
-
-
Joseph Huber authored
This patch completely removes the old OpenMP device runtime. Previously, the new runtime had the prefix `libomptarget-new-` and the old runtime was simply called `libomptarget-`. This patch makes the formerly new runtime the only runtime available. The old runtime's project has been deleted entirely, and all references to the `libomptarget-new` runtime have been replaced with `libomptarget-`. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D118934
-
- Jan 31, 2022
-
-
Joseph Huber authored
Reduces the shared memory size used for globalization from 2048 bytes to 512 bytes to reduce the pressure on shared memory. This patch also adds a debug message to indicate when the shared memory was insufficient. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D118625
-
- Jan 21, 2022
-
-
Joseph Huber authored
This patch changes the visibility of all constructs in the new device RTL to be hidden by default. This is done after the changes introduced in D117806 removed the default hidden visibility for all device compilations. It ensures that the visibility of the device runtime library will be hidden except for the internal environment variable. This is done to aid optimization and linking of the device library. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D117807
-
- Jan 20, 2022
-
-
Joseph Huber authored
The OpenMP offloading libraries are built with fixed triples and linked in at compile time. This would cause unhelpful errors if the user passed in the wrong expansion of the triple used for the bitcode library. Because we only support these triples for OpenMP offloading, we can normalize them to the full version used in the bitcode library. Reviewed By: jdoerfert, JonChesterfield Differential Revision: https://reviews.llvm.org/D117634
-
- Jan 13, 2022
-
-
Jon Chesterfield authored
Fixes github issues/52910 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D117230
-
- Nov 04, 2021
-
-
Johannes Doerfert authored
Reviewed By: carlo.bertolli Differential Revision: https://reviews.llvm.org/D113111
-
- Oct 28, 2021
-
-
Jon Chesterfield authored
Passes the same tests as the current deviceRTL. Includes the cmake change from D111987. CI is showing a different set of pass/fails than local runs; committing this without the tests enabled by default while debugging that difference. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112227
-
Jon Chesterfield authored
More tests were failing on CI than failed locally when writing this patch. This reverts commit 33427fdb.
-
Jon Chesterfield authored
Passes same tests as the current deviceRTL. Includes cmake change from D111987. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112227
-
- Oct 26, 2021
-
-
Jon Chesterfield authored
Essentially moves the foreach over SM integers into a macro and instantiates it for nvptx. NFC in that the macro is not presently instantiated for amdgpu, as the corresponding code doesn't compile yet. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D111987
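A simplified sketch of the restructuring, assuming a macro wraps the previous per-SM loop; the macro name, arguments, and architecture list are illustrative rather than the exact upstream code:

```cmake
# Hypothetical: hoist the per-architecture work into a macro so that it can
# later be instantiated for amdgpu as well as nvptx.
macro(compileDeviceRTLLibrary target_name)
  foreach(arch ${ARGN})
    message(STATUS "Building DeviceRTL for ${target_name} ${arch}")
    # ... per-architecture bitcode compilation rules would go here ...
  endforeach()
endmacro()

# Only nvptx is instantiated for now; the amdgpu code does not compile yet.
compileDeviceRTLLibrary(nvptx sm_35 sm_50 sm_60 sm_70 sm_80)
```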
-
- Oct 21, 2021
-
-
Jon Chesterfield authored
Step towards building the DeviceRTL for amdgpu. Mostly replaces cuda-specific toolchain-finding logic with the generic logic currently found in the amdgpu deviceRTL cmake. Also deletes dead code and changes the default to build on systems without cuda installed, as the library doesn't use cuda and amdgpu-only systems generally won't have cuda installed. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D111983
-
- Oct 07, 2021
-
-
Jon Chesterfield authored
Follow on to D110006, related to D110957. Where implementations have diverged this resolves to match the new DeviceRTL:
- replaces definitions of this struct in deviceRTL and plugins with an include
- changes the dynamic_shared_size field from D110006 to 32 bits
- handles stdint being unavailable in DeviceRTL
- adds a zero initializer for the field to amdgpu
- moves the extern declaration for deviceRTL to target_interface (omptarget.h is more natural, but doesn't work due to include order with debug.h)
- renames the fields everywhere to match the LLVM format used in DeviceRTL
- makes debug_level uint32_t everywhere (previously sometimes int32_t)
Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D111069
-
- Sep 27, 2021
-
-
Michael Kruse authored
Use the in-project clang, llvm-link and opt if available, unless CMake cache variables specify a different compiler. This applies D101265 to the new DeviceRTL's CMakeLists.txt, which was copied before D101265 was applied. Fixes the openmp-offloading-cuda-runtime builder, which had been failing since D110006. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D110251
-
- Aug 23, 2021
-
-
Jon Chesterfield authored
Add include path to the cmakefiles and set the target_impl enums from the llvm constants instead of copying the values. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108391
-
- Aug 20, 2021
-
-
Joachim Protze authored
D107156 and D107320 are not sufficient when OpenMP is built as an LLVM runtime (LLVM_ENABLE_RUNTIMES=openmp) because dependencies only work within the same cmake instance. We could limit the dependency to cases where libomptarget/plugins are really built, but compared to the whole llvm project, building the openmp runtime is negligible, and postponing the build of the OpenMP runtime until after the dependencies are ready seems reasonable. The direct dependency introduced in D107156 and D107320 is necessary for the case where OpenMP is built as an LLVM project (LLVM_ENABLE_PROJECTS=openmp). Differential Revision: https://reviews.llvm.org/D108404
-
- Jul 27, 2021
-
-
Johannes Doerfert authored
The "old" OpenMP GPU device runtime (D14254) has served us well for many years but modernizing it has caused some pain recently. This patch introduces an alternative which is mostly written from scratch embracing OpenMP 5.X, C++, LLVM coding style (where applicable), and conceptual interfaces. This new runtime is opt-in through a clang flag (D106793). The new runtime is currently only build for nvptx and has "-new" in its name. The design is tailored towards middle-end optimizations rather than front-end code generation choices, a trend we already started in the old runtime a while back. In contrast to the old one, state is organized in a simple manner rather than a "smart" one. While this can induce costs it helps optimizations. Our expectation is that the majority of codes can be optimized and a "simple" design is therefore preferable. The new runtime does also avoid users to pay for things they do not use, especially wrt. memory. The unlikely case of nested parallelism is supported but costly to make the more likely case use less resources. The worksharing and reduction implementation have been taken from the old runtime and will be rewritten in the future if necessary. Documentation and debug features are still mostly missing and will be added over time. All external symbols start with `__kmpc` for legacy reasons but should be renamed once we switch over to a single runtime. All internal symbols are placed in appropriate namespaces (anonymous or `_OMP`) to avoid name clashes with user symbols. Differential Revision: https://reviews.llvm.org/D106803
-
- Jul 25, 2021
-
-
Shilei Tian authored
We build `deviceRTLs` with `-O1` by default, which also triggers OpenMPOpt. When the info cache is created, some attributes are removed. As a result, although we mark a few functions `noinline`, they are still inlined when the bitcode library is generated. This can cause an issue in middle-end optimization. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106710
-
- Jul 22, 2021
-
-
Joseph Huber authored
Function internalization can sometimes occur in situations where we want to keep the call sites intact. This patch adds an option to disable function internalization and prevents the device runtime from being internalized while creating the bitcode library. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106438
-
- Jul 18, 2021
-
-
Shilei Tian authored
Currently, when we compile the project in debug mode, `-g` is not added to the compilation flags. The bc files generated in different modes are of different sizes. When using GPU debuggers like `cuda-gdb`, a debug version of the bc lib is expected to provide more information. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D106229
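A minimal sketch of the change, assuming the bitcode compile flags live in a list variable; `bc_flags` and the base flags shown are placeholders, not the actual build flags:

```cmake
# Hypothetical: base flags for compiling the device bitcode library.
set(bc_flags -emit-llvm -O1 -std=c++14)
# Append -g when the project itself is configured as a debug build, so the
# resulting .bc files carry debug info for tools like cuda-gdb.
string(TOUPPER "${CMAKE_BUILD_TYPE}" uppercase_build_type)
if(uppercase_build_type MATCHES "DEBUG")
  list(APPEND bc_flags -g)
endif()
```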
-
- Apr 30, 2021
-
-
Michael Kruse authored
If available, use the clang that is already built in the same project as the CUDA compiler, unless another executable is explicitly specified. This also ensures the generated deviceRTL IR will be consistent with the version of Clang. This patch is required to reliably test OpenMP offloading on a buildbot without either a two-stage build (e.g. with LLVM_ENABLE_RUNTIMES) or a separately installed clang on the worker that would eventually become outdated. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D101265
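A sketch of the selection logic, under the assumption that a cache variable (the hypothetical `LIBOMPTARGET_NVPTX_CUDA_COMPILER` below) can override the choice; the real variable and target names may differ:

```cmake
# Hypothetical: prefer an explicitly specified compiler, then the clang
# target built in this very build tree, then a clang found on PATH.
if(LIBOMPTARGET_NVPTX_CUDA_COMPILER)
  set(cuda_compiler ${LIBOMPTARGET_NVPTX_CUDA_COMPILER})
elseif(TARGET clang)
  # In-project clang: guarantees the bitcode matches this LLVM revision.
  set(cuda_compiler $<TARGET_FILE:clang>)
else()
  find_program(cuda_compiler clang)
endif()
```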
-
- Mar 12, 2021
-
-
Johannes Doerfert authored
The shuffle idiom is implemented differently in our supported targets. To reduce the "target_impl" file we now move the shuffle idiom into its own self-contained header that provides the implementation for AMDGPU and NVPTX. A fallback can be added later on. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D95752
-
- Mar 08, 2021
-
-
Shilei Tian authored
Since D97003, CUDA 9.2 is the minimum requirement for OpenMP offloading on the NVPTX target. We don't need macros in the source code to select the right functions based on the CUDA version, we don't need to compile multiple bitcode libraries of different CUDA versions for each SM, and we don't need to worry about future compatibility with newer CUDA versions. `-target-feature +ptx61` is used in this patch, which corresponds to the highest PTX version that CUDA 9.2 can support. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97198
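For illustration, a hedged one-line sketch of fixing the PTX feature level in the bitcode compile flags instead of deriving it from a detected CUDA version; `bc_flags` is a placeholder name:

```cmake
# Hypothetical: with CUDA 9.2 as the minimum supported version, a single PTX
# feature level can be hard-coded for the bitcode library instead of being
# chosen per installed CUDA version.
list(APPEND bc_flags -Xclang -target-feature -Xclang +ptx61)
```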
-
- Feb 23, 2021
-
-
Shilei Tian authored
[OpenMP][NVPTX] Fixed a compilation error in deviceRTLs caused by an unsupported feature in the release version of LLVM. `ptx71` is not supported in the release version of LLVM yet. As a result, the support for CUDA 11.2 and CUDA 11.1 caused a compilation error, as mentioned in D97004. Since the support in D97004 is just a workaround for the release, and we'll not use it in the near future, using `ptx70` for CUDA 11 is feasible. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97195
-
- Feb 19, 2021
-
-
Joel E. Denny authored
As mentioned in PR#49250, without this patch, ptxas for CUDA 9.1 fails in the following two tests:
- openmp/libomptarget/test/mapping/lambda_mapping.cpp
- openmp/libomptarget/test/offloading/bug49021.cpp

The error looks like:
```
ptxas /tmp/lambda_mapping-081ea9.s, line 828; error : Not a name of any known instruction: 'activemask'
```

The problem is that our cmake script converts CUDA version strings incorrectly: 9.1 becomes 9100, but it should be 9010, as shown in `getCudaVersion` in `clang/lib/Driver/ToolChains/Cuda.cpp`. Thus, `openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu` inadvertently enables `activemask` because it apparently becomes available in 9.2. This patch fixes the conversion. This patch does not fix the other two tests in PR#49250. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D97012
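A small sketch of the corrected conversion, mirroring the `major * 1000 + minor * 10` encoding used by `getCudaVersion`; the variable names are made up for the example:

```cmake
# Hypothetical: convert a CUDA version string like "9.1" into the integer
# 9010 (major * 1000 + minor * 10), matching clang's getCudaVersion(),
# rather than the incorrect 9100.
set(cuda_version_str "9.1")
string(REPLACE "." ";" cuda_version_list ${cuda_version_str})
list(GET cuda_version_list 0 cuda_major)
list(GET cuda_version_list 1 cuda_minor)
math(EXPR cuda_version "${cuda_major} * 1000 + ${cuda_minor} * 10")
message(STATUS "CUDA version macro value: ${cuda_version}")  # prints 9010
```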
-
Shilei Tian authored
CUDA 11.2 and CUDA 11.1 are both available now. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97004
-
- Jan 30, 2021
-
-
Shilei Tian authored
This patch refines the logic to choose compute capabilities via the environment variable `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES`. It supports the following values (all case insensitive):
- "all": build `deviceRTLs` for all supported compute capabilities;
- "auto": only build for the compute capability that is auto detected. Note that this requires CUDA; if CUDA is not found, a CMake fatal error will be raised.
- "xx,yy" or "xx;yy": build for compute capabilities `xx` and `yy`.
If `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES` is not set, it is equivalent to setting it to `all`. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95687
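A condensed sketch of this selection logic; the capability list, the `CUDA_FOUND` check, and the `detected_capability` variable are placeholders for whatever the real CMake uses:

```cmake
# Hypothetical: interpret LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES.
set(all_capabilities 35 37 50 52 60 61 70 75 80)
string(TOLOWER "${LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES}" requested)

if(requested STREQUAL "" OR requested STREQUAL "all")
  set(nvptx_sm_list ${all_capabilities})
elseif(requested STREQUAL "auto")
  if(NOT CUDA_FOUND)
    message(FATAL_ERROR "Cannot auto detect compute capability without CUDA")
  endif()
  set(nvptx_sm_list ${detected_capability})
else()
  # Accept either "xx,yy" or "xx;yy" and turn it into a CMake list.
  string(REPLACE "," ";" nvptx_sm_list "${requested}")
endif()
```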
-
- Jan 28, 2021
-
-
Shilei Tian authored
In the past `-O1` was used when building NVPTX bitcode libraries. After we switched to OpenMP, `-O1` was missing by mistake, leading to a huge performance regression. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D95545
-
- Jan 27, 2021
-
-
Shilei Tian authored
D95466 dropped CUDA to build the NVPTX deviceRTL and enabled the build by default. However, the build requires some libraries that are not available on non-CUDA systems by default, which could break the compilation. This patch disables the build by default. It can be enabled with `LIBOMPTARGET_BUILD_NVPTX_BCLIB=ON`. Reviewed By: kparzysz Differential Revision: https://reviews.llvm.org/D95556
-
Shilei Tian authored
With D94745, we no longer use the CUDA SDK to compile `deviceRTLs`. Therefore, a lot of CMake code in the project is useless. This patch cleans up unnecessary code and also drops the requirement to build the NVPTX `deviceRTLs`. CUDA detection is still used, however, to determine whether we need to involve the tests. Auto detection of the compute capability is enabled by default and can be disabled by setting the CMake variable `LIBOMPTARGET_NVPTX_AUTODETECT_COMPUTE_CAPABILITY=OFF`. If auto detection is enabled and CUDA is also valid, only the bitcode library for the detected version will be built; otherwise, all supported variants will be generated. One drawback of this patch is that we now generate 96 variants of the bitcode library, for a total of 1485 files to be built in a clean build on a non-CUDA system. `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=""` can be used to disable building the NVPTX `deviceRTLs`. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D95466
-
- Jan 26, 2021
-
-
Shilei Tian authored
From this patch (plus some landed patches), `deviceRTLs` is taken as a regular OpenMP program with just `declare target` regions. In this way, ideally, `deviceRTLs` can be written in OpenMP directly. No CUDA, no HIP anymore. (Well, AMD is still working on getting it to work. For now AMDGCN still uses the original way to compile.) However, some target specific functions are still required, but they're no longer written in a target specific language. For example, the CUDA parts have all been refined by replacing CUDA intrinsics and builtins with LLVM/Clang/NVVM intrinsics.

Here's a list of changes in this patch:
1. For NVPTX, `DEVICE` is defined empty in order to make the common parts still work with AMDGCN. Later, once AMDGCN is also available, we will completely remove `DEVICE` or probably some other macros.
2. Shared variables are implemented with the OpenMP allocator, which is defined in `allocator.h`. Again, this feature is not available on AMDGCN, so two macros are redefined properly.
3. The CUDA header `cuda.h` is dropped from the source code. In order to deal with code differences in various CUDA versions, we build one bitcode library for each supported CUDA version. For each CUDA version, the highest PTX version it supports will be used, just as what we currently use for CUDA compilation.
4. Correspondingly, the compiler driver is also updated to support the CUDA version encoded in the name of the bitcode library. Now the bitcode library for NVPTX is named `libomptarget-nvptx-cuda_[cuda_version]-sm_[sm_number].bc`, such as `libomptarget-nvptx-cuda_80-sm_20.bc`.

With this change, there are also multiple features to be expected in the near future:
1. CUDA will be completely dropped when compiling OpenMP. By then, we will also build bitcode libraries for all supported SMs, multiplied by all supported CUDA versions.
2. Atomic operations used in `deviceRTLs` can be replaced by `omp atomic` if the OpenMP 5.1 feature is fully supported. For now, the IR generated is totally wrong.
3. Target specific parts will be wrapped into `declare variant` with the `isa` selector if it can work properly. No target specific macro is needed anymore.
4. (Maybe more...)

Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D94745
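A sketch of how the per-CUDA-version, per-SM naming scheme from item 4 above could be generated on the build side; the version and SM lists are examples only, not the actual supported set:

```cmake
# Hypothetical: one bitcode library per (CUDA version, SM) pair, named
# libomptarget-nvptx-cuda_<version>-sm_<sm>.bc as described above.
set(cuda_versions 92 100 101 102 110)
set(nvptx_sm_list 35 50 60 70 80)

foreach(cuda_version ${cuda_versions})
  foreach(sm ${nvptx_sm_list})
    set(bclib_name "libomptarget-nvptx-cuda_${cuda_version}-sm_${sm}.bc")
    message(STATUS "Will build ${bclib_name}")
    # ... add_custom_command producing ${bclib_name} would go here ...
  endforeach()
endforeach()
```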
-
- Jan 14, 2021
-
-
Shilei Tian authored
The comment said CUDA 9 header files use the `nv_weak` attribute, which `clang` is not yet prepared to handle. That was three years ago and things have changed since. Based on my test, removing the definition doesn't cause any problem on my machine with CUDA 11.1 installed. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D94700
-
Shilei Tian authored
For the NVPTX target, OpenMP provides a static library `libomptarget-nvptx` built by NVCC, and another bitcode library `libomptarget-nvptx-sm_{$sm}.bc` generated by Clang. When compiling an OpenMP program, the `.bc` file will be fed to `clang` in the second run on the program that compiles the target part. Then the generated PTX file will be fed to `ptxas` to generate the object file, and finally the driver invokes `nvlink` to generate the binary, where the static library is appended to the `nvlink` command line. One question is, why do we need two libraries? The only difference is that the static library contains `omp_data.cu` and the bitcode library doesn't. It's unclear why they were implemented in this way, but per D94565, there is no issue if we also include the file in the bitcode library. Therefore, we can safely drop the static library. This patch is about the change in OpenMP. The driver will be updated as well if this patch is accepted. Reviewed By: jdoerfert, JonChesterfield Differential Revision: https://reviews.llvm.org/D94573
-
- Jan 13, 2021
-
-
Jon Chesterfield authored
[libomptarget][nvptx] Include omp_data.cu in bitcode deviceRTL Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D94565
-
- Oct 08, 2020
-
-
Joseph Huber authored
Summary: This patch changes the CMake files for Clang and Libomptarget to query the system for its supported CUDA architecture. This makes it much easier for the user to build optimal code without needing to set the flags manually. This relies on the now-deprecated FindCUDA method in CMake, but full support for architecture detection is only available in CMake >3.18. Reviewers: jdoerfert, ye-luo Subscribers: cfe-commits, guansong, mgorny, openmp-commits, sstefan1, yaxunl Tags: #clang, #OpenMP Differential Revision: https://reviews.llvm.org/D87946
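A sketch of the detection step using the (now deprecated) FindCUDA helpers; how the detected value is then wired into the offloading flags is an assumption here, not taken from the patch:

```cmake
# Hypothetical: ask FindCUDA which architectures the local GPU supports and
# default the offloading architecture to that, so users need not pass the
# CUDA architecture flags by hand.
find_package(CUDA QUIET)
if(CUDA_FOUND)
  # "Auto" detects the architectures of the GPUs present on this machine.
  cuda_select_nvcc_arch_flags(detected_arch_flags "Auto")
  message(STATUS "Detected CUDA arch flags: ${detected_arch_flags}")
else()
  message(STATUS "CUDA not found; falling back to a default architecture")
endif()
```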
-