Commits · a602c2b51dccc756c12e7585c34710aeaef54e0b · Lorenzo Albano / LLVM bpEVL

Oct 21, 2021

[libomptarget][DeviceRTL] Generalise and simplify cmakelists · a602c2b5

Jon Chesterfield authored Oct 21, 2021

Step towards building the DeviceRTL for amdgpu.

Mostly replaces cuda-specific toolchain finding logic with the
generic logic currently found in the amdgpu deviceRTL cmake. Also
deletes dead code and changes the default to build on systems
without cuda installed, as the library doesn't use cuda and the
amdgpu-only systems generally won't have cuda installed.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D111983

a602c2b5

Oct 07, 2021

[libomptarget] Move device environment to shared header, remove divergence · 0c554a47

Jon Chesterfield authored Oct 07, 2021

Follow on to D110006, related to D110957

Where implementations have diverged this resolves to match the new DeviceRTL

- replaces definitions of this struct in deviceRTL and plugins with include
- changes the dynamic_shared_size field from D110006 to 32 bits
- handles stdint being unavailable in DeviceRTL
- adds a zero initializer for the field to amdgpu
- moves the extern declaration for deviceRTL to target_interface
  (omptarget.h is more natural, but doesn't work due to include order
  with debug.h)
- Renames the fields everywhere to match the LLVM format used in DeviceRTL
- Makes debug_level uint32_t everywhere (previously sometimes int32_t)

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D111069

0c554a47

Sep 27, 2021

[OpenMP][CMake] Use in-project clang as CUDA->IR compiler for new DeviceRTL. · 1b242dcc

Michael Kruse authored Sep 27, 2021

Use the in-project clang, llvm-link and opt if available and unless
CMake cache variables specify to use a different compiler. This applies
D101265 to the new DeviceRTL's CMakeLists.txt which was copied before
D101265 was applied.

Fixes the openmp-offloading-cuda-runtime builder which was failing
since D110006.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D110251

1b242dcc

Aug 23, 2021

[openmp] Use llvm GridValues from devicertl · 842f875c

Jon Chesterfield authored Aug 23, 2021

Add include path to the cmakefiles and set the target_impl enums
from the llvm constants instead of copying the values.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108391

842f875c

Aug 20, 2021

[libomptarget][amdcgn] Add build dependency for llvm-link and opt · 4bb36df1

Joachim Protze authored Aug 19, 2021

D107156 and D107320 are not sufficient when OpenMP is built as llvm runtime
(LLVM_ENABLE_RUNTIMES=openmp) because dependencies only work within the same
cmake instance.

We could limit the dependency to cases where libomptarget/plugins are really
built. But compared to the whole llvm project, building openmp runtime is
negligible and postponing the build of OpenMP runtime after the dependencies
are ready seems reasonable.

The direct dependency introduced in D107156 and D107320 is necessary for the
case where OpenMP is built as llvm project (LLVM_ENABLE_PROJECTS=openmp).

Differential Revision: https://reviews.llvm.org/D108404

4bb36df1

Jul 27, 2021

[OpenMP] Prototype opt-in new GPU device RTL · 67ab875f

Johannes Doerfert authored Jul 25, 2021

The "old" OpenMP GPU device runtime (D14254) has served us well for many
years but modernizing it has caused some pain recently. This patch
introduces an alternative which is mostly written from scratch embracing
OpenMP 5.X, C++, LLVM coding style (where applicable), and conceptual
interfaces. This new runtime is opt-in through a clang flag (D106793).
The new runtime is currently only build for nvptx and has "-new" in its
name.

The design is tailored towards middle-end optimizations rather than
front-end code generation choices, a trend we already started in the old
runtime a while back. In contrast to the old one, state is organized in
a simple manner rather than a "smart" one. While this can induce costs
it helps optimizations. Our expectation is that the majority of codes
can be optimized and a "simple" design is therefore preferable. The new
runtime does also avoid users to pay for things they do not use,
especially wrt. memory. The unlikely case of nested parallelism is
supported but costly to make the more likely case use less resources.

The worksharing and reduction implementation have been taken from the
old runtime and will be rewritten in the future if necessary.

Documentation and debug features are still mostly missing and will be
added over time.

All external symbols start with `__kmpc` for legacy reasons but should
be renamed once we switch over to a single runtime. All internal symbols
are placed in appropriate namespaces (anonymous or `_OMP`) to avoid name
clashes with user symbols.

Differential Revision: https://reviews.llvm.org/D106803

67ab875f

Jul 25, 2021

[OpenMP][NVPTX] Disable OpenMPOpt when building deviceRTLs · f1b8fa55

Shilei Tian authored Jul 25, 2021

We build `deviceRTLs` with `-O1` by default, which also triggers OpenMPOpt. When
the info cache is created, some attributes are removed. As a result, although we
mark a few functions `noinline`, they are still inlined when the bitcode library
is generated. This can cause an issue in middle end optimization.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106710

f1b8fa55

Jul 22, 2021

[OpenMP] Add an option to disable function internalization · 4a668604

Joseph Huber authored Jul 21, 2021

Function internalization can sometimes occur in situations where we want to
keep the call sites intact. This patch adds an option to disable function
internalization and prevents the device runtime from being internalized while
creating the bitcode library.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106438

4a668604

Jul 18, 2021

[OpenMP][Offloading] Add -g when compiling deviceRTLs in debug mode · 4357cfc7

Shilei Tian authored Jul 18, 2021

Currently when we compile the project in debug mode, `-g` will not be added to
compilation flag. The bc files generated in different mode are of different size.
When using GPU debuggers like `cuda-gdb`, it is expected to provide more info
with a debug version of bc lib.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D106229

4357cfc7

Apr 30, 2021

[OpenMP][CMake] Use in-project clang as CUDA->IR compiler. · 7308862f

Michael Kruse authored Apr 30, 2021

If available, use the clang that is already built in the same project as
CUDA compiler unless another executable is explicitly defined. This also
ensures the generated deviceRTL IR will be consistent with the version
of Clang.

This patch is required to reliably test OpenMP offloading in a buildbot
without either a two-stage build (e.g. with LLVM_ENABLE_RUNTIMES) or a
separately installed clang on the worker that will eventually become
outdated.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D101265

7308862f

Mar 12, 2021

[OpenMP][DeviceRTL] Extract shuffle idiom and port it to declare variant · 66ba494b

Johannes Doerfert authored Jan 30, 2021

The shuffle idiom is differently implemented in our supported targets.
To reduce the "target_impl" file we now move the shuffle idiom in it's
own self-contained header that provides the implementation for AMDGPU
and NVPTX. A fallback can be added later on.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D95752

66ba494b

Mar 08, 2021

[OpenMP][Clang][NVPTX] Only build one bitcode library for each SM · c41ae246

Shilei Tian authored Mar 08, 2021

In D97003, CUDA 9.2 is the minimum requirement for OpenMP offloading on
NVPTX target. We don't need to have macros in source code to select right functions
based on CUDA version. we don't need to compile multiple bitcode libraries of
different CUDA versions for each SM. We don't need to worry about future
compatibility with newer CUDA version.

`-target-feature +ptx61` is used in this patch, which corresponds to the highest
PTX version that CUDA 9.2 can support.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D97198

c41ae246

Feb 23, 2021

[OpenMP][NVPTX] Fixed a compilation error in deviceRTLs caused by unsupported... · f6c2984a

Shilei Tian authored Feb 23, 2021

[OpenMP][NVPTX] Fixed a compilation error in deviceRTLs caused by unsupported feature in release verion of LLVM

`ptx71` is not supported in release version of LLVM yet. As a result,
the support of CUDA 11.2 and CUDA 11.1 caused a compilation error as mentioned
in D97004. Since the support in D97004 is just a WA for releease, and we'll not
use it in the near future, using `ptx70` for CUDA 11 is feasible.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D97195

f6c2984a

Feb 19, 2021

[OpenMP] Fix nvptx CUDA_VERSION conversion · ef8b3b5f

Joel E. Denny authored Feb 19, 2021

As mentioned in PR#49250, without this patch, ptxas for CUDA 9.1 fails
in the following two tests:

- openmp/libomptarget/test/mapping/lambda_mapping.cpp
- openmp/libomptarget/test/offloading/bug49021.cpp

The error looks like:

```
ptxas /tmp/lambda_mapping-081ea9.s, line 828; error   : Not a name of any known instruction: 'activemask'
```

The problem is that our cmake script converts CUDA version strings
incorrectly: 9.1 becomes 9100, but it should be 9010, as shown in
`getCudaVersion` in `clang/lib/Driver/ToolChains/Cuda.cpp`.  Thus,
`openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu`
inadvertently enables `activemask` because it apparently becomes
available in 9.2.  This patch fixes the conversion.

This patch does not fix the other two tests in PR#49250.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D97012

ef8b3b5f

[OpenMP][NVPTX] Add the support for CUDA 11.2 and CUDA 11.1 · 89827fd4

Shilei Tian authored Feb 18, 2021

CUDA 11.2 and CUDA 11.1 are all available now.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D97004

89827fd4

Jan 30, 2021

[OpenMP][NVPTX] Refined CMake logic to choose compute capabilites · 26d38f6d

Shilei Tian authored Jan 30, 2021

This patch refines the logic to choose compute capabilites via the
environment variable `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES`. It supports the
following values (all case insensitive):
- "all": Build `deviceRTLs` for all supported compute capabilites;
- "auto": Only build for the compute capability auto detected. Note that this
  requires CUDA. If CUDA is not found, a CMake fatal error will be raised.
- "xx,yy" or "xx;yy": Build for compute capabilities `xx` and `yy`.

If `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES` is not set, it is equivalent to set
it to `all`.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95687

26d38f6d

Jan 28, 2021

[OpenMP][NVPTX] Added the missing -O1 when building NVPTX bitcode libraries · 5a64794b

Shilei Tian authored Jan 28, 2021

In the past `-O1` was used when building NVPTX bitcode libraries. After
we switched to OpenMP, `-O1` was missing by mistake, leading to a huge performance
regression.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D95545

5a64794b

Jan 27, 2021

[OpenMP][NVPTX] Disable building NVPTX deviceRTL by default on a non-CUDA system · fb12df4a

Shilei Tian authored Jan 27, 2021

D95466 dropped CUDA to build NVPTX deviceRTL and enabled it by default.
However, the building requires some libraries that are not available on non-CUDA
system by default, which could break the compilation. This patch disabled the
build by default. It can be enabled with `LIBOMPTARGET_BUILD_NVPTX_BCLIB=ON`.

Reviewed By: kparzysz

Differential Revision: https://reviews.llvm.org/D95556

fb12df4a

[OpenMP][NVPTX] Drop dependence on CUDA to build NVPTX `deviceRTLs` · e7535f8f

Shilei Tian authored Jan 26, 2021

With D94745, we no longer use CUDA SDK to compile `deviceRTLs`. Therefore,
many CMake code in the project is useless. This patch cleans up unnecessary code
and also drops the requirement to build NVPTX `deviceRTLs`. CUDA detection is
still being used however to determine whether we need to involve the tests. Auto
detection of compute capability is enabled by default and can be disabled by
setting CMake variable `LIBOMPTARGET_NVPTX_AUTODETECT_COMPUTE_CAPABILITY=OFF`.
If auto detection is enabled, and CUDA is also valid, it will only build the
bitcode library for the detected version; otherwise, all variants supported will
be generated. One drawback of this patch is, we now generate 96 variants of
bitcode library, and totally 1485 files to be built with a clean build on a
non-CUDA system. `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=""` can be used to
disable building NVPTX `deviceRTLs`.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D95466

e7535f8f

Jan 26, 2021

[OpenMP][deviceRTLs] Build the deviceRTLs with OpenMP instead of target dependent language · 7c03f7d7

Shilei Tian authored Jan 26, 2021

From this patch (plus some landed patches), `deviceRTLs` is taken as a regular OpenMP program with just `declare target` regions. In this way, ideally, `deviceRTLs` can be written in OpenMP directly. No CUDA, no HIP anymore. (Well, AMD is still working on getting it work. For now AMDGCN still uses original way to compile) However, some target specific functions are still required, but they're no longer written in target specific language. For example, CUDA parts have all refined by replacing CUDA intrinsic and builtins with LLVM/Clang/NVVM intrinsics.
Here're a list of changes in this patch.
1. For NVPTX, `DEVICE` is defined empty in order to make the common parts still work with AMDGCN. Later once AMDGCN is also available, we will completely remove `DEVICE` or probably some other macros.
2. Shared variable is implemented with OpenMP allocator, which is defined in `allocator.h`. Again, this feature is not available on AMDGCN, so two macros are redefined properly.
3. CUDA header `cuda.h` is dropped in the source code. In order to deal with code difference in various CUDA versions, we build one bitcode library for each supported CUDA version. For each CUDA version, the highest PTX version it supports will be used, just as what we currently use for CUDA compilation.
4. Correspondingly, compiler driver is also updated to support CUDA version encoded in the name of bitcode library. Now the bitcode library for NVPTX is named as `libomptarget-nvptx-cuda_[cuda_version]-sm_[sm_number].bc`, such as `libomptarget-nvptx-cuda_80-sm_20.bc`.

With this change, there are also multiple features to be expected in the near future:
1. CUDA will be completely dropped when compiling OpenMP. By the time, we also build bitcode libraries for all supported SM, multiplied by all supported CUDA version.
2. Atomic operations used in `deviceRTLs` can be replaced by `omp atomic` if OpenMP 5.1 feature is fully supported. For now, the IR generated is totally wrong.
3. Target specific parts will be wrapped into `declare variant` with `isa` selector if it can work properly. No target specific macro is needed anymore.
4. (Maybe more...)

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D94745

7c03f7d7

Jan 14, 2021

[OpenMP] Dropped unnecessary define when compiling deviceRTLs for NVPTX · 64e9e9ae

Shilei Tian authored Jan 14, 2021

The comment said CUDA 9 header files use the `nv_weak` attribute which
`clang` is not yet prepared to handle. It's three years ago and now things have
changed. Based on my test, removing the definition doesn't have any problem on
my machine with CUDA 11.1 installed.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D94700

64e9e9ae

[OpenMP] Drop the static library libomptarget-nvptx · 763c1f99

Shilei Tian authored Jan 14, 2021

For NVPTX target, OpenMP provides a static library `libomptarget-nvptx`
built by NVCC, and another bitcode `libomptarget-nvptx-sm_{$sm}.bc` generated by
Clang. When compiling an OpenMP program, the `.bc` file will be fed to `clang`
in the second run on the program that compiles the target part. Then the generated
PTX file will be fed to `ptxas` to generate the object file, and finally the driver
invokes `nvlink` to generate the binary, where the static library will be appened
to `nvlink`.

One question is, why do we need two libraries? The only difference is, the static
library contains `omp_data.cu` and the bitcode library doesn't. It's unclear why
they were implemented in this way, but per D94565, there is no issue if we also
include the file into the bitcode library. Therefore, we can safely drop the
static library.

This patch is about the change in OpenMP. The driver will be updated as well if
this patch is accepted.

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D94573

763c1f99

Jan 13, 2021

[libomptarget][nvptx] Include omp_data.cu in bitcode deviceRTL · 84e0b14a

Jon Chesterfield authored Jan 13, 2021

[libomptarget][nvptx] Include omp_data.cu in bitcode deviceRTL

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D94565

84e0b14a

Oct 08, 2020

[OpenMP] Change CMake Configuration to Build for Highest CUDA Architecture by Default · d5644099

Joseph Huber authored Oct 07, 2020

Summary:
This patch changes the CMake files for Clang and Libomptarget to query the
system for its supported CUDA architecture. This makes it much easier for the
user to build optimal code without needing to set the flags manually. This
relies on the now deprecated FindCUDA method in CMake, but full support for
architecture detection is only availible in CMake >3.18

Reviewers: jdoerfert ye-luo

Subscribers: cfe-commits guansong mgorny openmp-commits sstefan1 yaxunl

Tags: #clang #OpenMP

Differential Revision: https://reviews.llvm.org/D87946

d5644099

Sep 24, 2020

[OpenMP] cmake option LIBOMPTARGET_NVPTX_MAX_SM for nvptx device RTL · ffd159d8

Ye Luo authored Sep 24, 2020

It allows customizing MAX_SM for non-flagship GPU and reduces graphic memory usage.

In addition, so far the size is hard-coded up to __CUDA_ARCH__ 700 and is already a hassle for 800.
Introduce MAX_SM for 800 and protect future arch

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D88185

ffd159d8

Dec 18, 2019

[libomptarget][nfc] Extract function from data_sharing, move to common · 8adae602

JonChesterfield authored Dec 18, 2019

Summary:
[libomptarget][nfc] Extract function from data_sharing, move to common

Finding the first active thread in the warp is different on nvptx and amdgcn,
mostly due to warp size and the desire for efficiency.

Reviewers: ABataev, jdoerfert, grokos

Reviewed By: jdoerfert

Subscribers: jvesely, mgorny, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D71643

8adae602

Dec 17, 2019

[libomptarget][nfc] Move three files under common, build them for amdgcn · 0c83f8cc

JonChesterfield authored Dec 17, 2019

Summary:
[libomptarget][nfc] Move three files under common, build them for amdgcn

Change to reduction.cu to remove two dead includes, otherwise no code change.

Reviewers: jdoerfert, ABataev, grokos

Reviewed By: jdoerfert

Subscribers: jvesely, mgorny, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D71601

0c83f8cc

[libomptarget][nfc] Move omp locks under target_impl · 3d3e4076

JonChesterfield authored Dec 17, 2019

Summary:
[libomptarget][nfc] Move omp locks under target_impl

These are likely to be target specific, even down to the lock_t which is
correspondingly moved out of interface.h. The alternative is to include
interface.h in target_impl which substantiatially increases the scope of
those symbols.

The current nvptx implementation deadlocks on amdgcn. The preferred
implementation for that arch is still under discussion - this change
leaves declarations in target_impl.

The functions could be inline for nvptx. I'd prefer to keep the internals
hidden in the target_impl translation unit, but will add the (possibly renamed)
macros to target_impl.h if preferred.

Reviewers: ABataev, jdoerfert, grokos

Reviewed By: jdoerfert

Subscribers: jvesely, mgorny, jfb, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D71574

3d3e4076

Dec 06, 2019

[libomptarget][nfc] Move three more files to common · cd90f49d

Jon Chesterfield authored Dec 06, 2019

Summary: [libomptarget][nfc] Move three more files to common

Reviewers: ABataev, jdoerfert, grokos

Reviewed By: ABataev

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D71103

cd90f49d

Dec 05, 2019

[libomptarget][nfc] Move omptarget-nvptx under common · d0b9ed5c

Jon Chesterfield authored Dec 05, 2019

Summary:
[libomptarget][nfc] Move omptarget-nvptx under common

Almost all files depend on require omptarget-nvptx, which no longer
contains any obviously architecture dependent code. Moving it under
common unblocks task/loop for amdgcn, and allows moving other code.

At some point there should probably be a widespread symbol renaming to
replace the nvptx string. I'd prefer to get things working first.

Building this (and task.cu, loop.cu) without a cuda library requires
some more refactoring, e.g. wrap threadfence(), use DEVICE macro more
consistently. Patches for that are orthogonal and will be posted shortly.

Reviewers: jdoerfert, ABataev, grokos

Reviewed By: ABataev

Subscribers: mgorny, fedor.sergeev, jfb, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D71073

d0b9ed5c

Nov 18, 2019

[libomptarget][nfc] Move some source into common from nvptx · 5a4a05d7

Jon Chesterfield authored Nov 18, 2019

Summary:
[libomptarget][nfc] Move some source into common from nvptx

Moves some source that compiles cleanly under amdgcn into a common subdirectory
Includes some non-trivial files and some headers. Keeps the cuda file extension.

The build systems for different architectures seem unlikely to have much in
common. The idea is therefore to set include paths such that files under
common/src compile as if they were under arch/src as the mechanism for sharing.
In particular, files under common/src need to be able to include target_impl.h.

The corresponding -Icommon is left out in favour of explicit includes on the
basis that the it makes it clearer which files under common are used by a given
architecture.

Reviewers: jdoerfert, ABataev, grokos

Reviewed By: ABataev

Subscribers: jfb, mgorny, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D70328

5a4a05d7

Nov 13, 2019

[libomptarget] Move supporti.h to support.cu · fd9fa999

JonChesterfield authored Nov 13, 2019

Summary:
[libomptarget] Move supporti.h to support.cu
Reimplementation of D69652, without the unity build and refactors.
Will need a clean build of libomptarget as the cmakelists changed.

Reviewers: ABataev, jdoerfert

Reviewed By: jdoerfert

Subscribers: mgorny, jfb, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D70131

fd9fa999

Nov 06, 2019

[libomptarget] Revert all improvements to support · 7cea0cea

Jon Chesterfield authored Nov 06, 2019

Summary:
[libomptarget] Revert all improvements to support

The change to unity build for nvcc has broken the build for some developers.
This patch reverts to a known-working state.

There has been some confusion over exactly how the build broke. I think we
have reached a common understanding that the disappearing symbols are from
the bitcode library built by clang. The static archive built by nvcc may show the
same problem. Some of the confusion arose from building the deviceRTL twice
and using one or the other library based on various environmental factors.

I'm pretty sure the problem is clang expanding `__forceinline__` into both `__inline__`
and `attribute(("always_inline"))`. The `__inline__` attribute resolves to linkonce_odr
which is not safe for exporting symbols from translation units.

"always_inline" is the desired semantic for small functions defined in one translation
unit that are intended to be inlined at link time. "inline" is not.

This therefore reintroduces the dependency hazard of supporti.h and some code
duplication, and blocks progress separating deviceRTL into reusable components.

See also D69857, D69859 for attempts at a fix instead of a revert.

Reviewers: ABataev, jdoerfert, grokos, ikitayama, tianshilei1992

Reviewed By: ABataev

Subscribers: mgorny, jfb, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D69885

7cea0cea

Oct 31, 2019

[nfc][libomptarget] Reorganise support header · 764c8420

JonChesterfield authored Oct 31, 2019

Summary:
[nfc][libomptarget] Reorganise support header

All functions defined in support implementation are now declared in support.h
Reordered functions in support implementation to match the sequence in support.h
Added include guards to support.h
Added #include interface to support.h to provide kmp_Ident declaration
Move supporti.h to support.cu and s/INLINE/EXTERN/g
Add remaining includes to support.cu

A minor side effect is to change the name mangling of the support functions to
extern "C". If this matters another macro along the lines of INLINE/EXTERN
can be added - perhaps DEVICE as that's the obvious implementation.

Reviewers: jdoerfert, ABataev, grokos

Reviewed By: jdoerfert

Subscribers: mgorny, jfb, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D69652

764c8420

[libomptarget] Change nvcc compilation to use a unity build · e9f9dfab

Jon Chesterfield authored Oct 31, 2019

Summary:
[libomptarget] Change nvcc compilation to use a unity build

This allows nvcc to inline functions between what would otherwise be distinct
translation units, which in turn removes any runtime cost from implementing
functions in source files (as opposed to inline in headers).

This will then allow the circular dependencies in deviceRTL to be readily
broken and individual components more easily shared between architectures.

Reviewers: ABataev, jdoerfert, grokos, RaviNarayanaswamy, hfinkel, ronlieb, gregrodgers

Reviewed By: jdoerfert

Subscribers: mgorny, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D69489

e9f9dfab

Oct 15, 2019

[libomptarget][nfc] Make interface.h target independent · d69d1aa1

Jon Chesterfield authored Oct 15, 2019

Summary:
[libomptarget][nfc] Make interface.h target independent

Move interface.h under a top level include directory.
Remove #includes to avoid the interface depending on the implementation.

Reviewers: ABataev, jdoerfert, grokos, ronlieb, RaviNarayanaswamy

Reviewed By: jdoerfert

Subscribers: mgorny, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D68615

llvm-svn: 374919

d69d1aa1

Jan 19, 2019

Update more file headers across all of the LLVM projects in the monorepo · 57b08b09

Chandler Carruth authored Jan 19, 2019

to reflect the new license. These used slightly different spellings that
defeated my regular expressions.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351648

57b08b09

Oct 01, 2018

[libomptarget-nvptx] Enable asserts in bclib · a762bfc0

Jonas Hahnfeld authored Oct 01, 2018

If the user requested LIBOMPTARGET_NVPTX_DEBUG, include asserts in
the bitcode library. Everything else will have very unpleasent
effects because asserts will appear when falling back to the static
library libomptarget-nvptx.a.

Differential Revision: https://reviews.llvm.org/D52701

llvm-svn: 343477

a762bfc0

Sep 28, 2018

[libomptarget-nvptx] Add testing infrastructure · 122dbb5d

Jonas Hahnfeld authored Sep 28, 2018

This patch also introduces testing for libomptarget-nvptx
which has been missing until now. I propose to add tests for
all bugs that are fixed in the future.
The target check-libomptarget-nvptx is not run by default because
 - we can't determine if there is a GPU plugged into the system.
 - it will require the latest Clang compiler. Keeping compatibility
   with older releases would prevent testing newer code generation
   developed in trunk.

Differential Revision: https://reviews.llvm.org/D51687

llvm-svn: 343324

122dbb5d

May 25, 2018

[CMake] Unify install path for libraries · 65e0b878

Jonas Hahnfeld authored May 25, 2018

Introduce OPENMP_INSTALL_LIBDIR and use in all install() commands.
This also fixes installation of libomptarget-nvptx that previously
didn't honor {OPENMP,LLVM}_LIBDIR_SUFFIX.

Differential Revision: https://reviews.llvm.org/D47130

llvm-svn: 333284

65e0b878