  1. Apr 07, 2022
    • [libomptarget] Add device RTL to regression test dependencies. · 7fa7b0cb
      Michael Kruse authored
      In a clean build directory, `check-openmp` or `check-libomptarget` will fail because of missing device RTL .bc files. Ensure that the new custom targets `omptarget.devicertl.nvptx` and `omptarget.devicertl.amdgpu` (corresponding to the plugin RTL targets `omptarget.rtl.cuda` and `omptarget.rtl.amdgpu`, respectively) are dependencies of the regression tests.
      
      Reviewed By: JonChesterfield
      
      Differential Revision: https://reviews.llvm.org/D123177
      7fa7b0cb
  2. Mar 06, 2022
  3. Mar 03, 2022
  4. Mar 02, 2022
  5. Feb 10, 2022
  6. Feb 08, 2022
    • [Libomptarget] Add header files as a dependency to CMake target · 99d72ebd
      Joseph Huber authored
      This patch manually adds the runtime include files to the list of
      dependencies when we build the bitcode runtime library. Previously if
      only the header was changed we would not recompile the source files.
      The solution used here isn't optimal because every source file now has a
      dependency on each header file regardless of whether it was actually used by
      that file.
      
      Reviewed By: tianshilei1992
      
      Differential Revision: https://reviews.llvm.org/D119254
      99d72ebd
  7. Feb 04, 2022
    • [OpenMP] Completely remove old device runtime · 034adaf5
      Joseph Huber authored
      This patch completely removes the old OpenMP device runtime. Previously,
      the new runtime had the prefix `libomptarget-new-` and the old runtime
      was simply called `libomptarget-`. This patch makes the formerly new
      runtime the only runtime available. The entire old project has been deleted,
      and all references to the `libomptarget-new` runtime have been replaced
      with `libomptarget-`.
      
      Reviewed By: JonChesterfield
      
      Differential Revision: https://reviews.llvm.org/D118934
      034adaf5
  8. Jan 31, 2022
  9. Jan 21, 2022
    • [Libomptarget] Change visibility to hidden for device RTL · 26feef08
      Joseph Huber authored
      This patch changes the visibility for all constructs in the new device
      RTL to be hidden by default. This is done after the changes introduced
      in D117806 changed the visibility from being hidden by default for all
      device compilations. This asserts that the visibility for the device
      runtime library will be hidden except for the internal environment
      variable. This is done to aid optimization and linking of the device
      library.
      
      Reviewed By: JonChesterfield
      
      Differential Revision: https://reviews.llvm.org/D117807
      26feef08
  10. Jan 20, 2022
    • [OpenMP] Expand short versions of OpenMP offloading triples · 28d71860
      Joseph Huber authored
      The OpenMP offloading libraries are built with fixed triples and linked
      in during compile time. This would cause unhelpful errors if the user
      passed in the wrong expansion of the triple used for the bitcode
      library. Because we only support these triples for OpenMP offloading, we
      can normalize them to the full version used in the bitcode library.
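
The normalization described above can be sketched as a lookup keyed on the architecture component of the triple. This is illustrative Python, not the Clang driver code from the patch; the function name `normalize_offload_triple` and the table `FULL_TRIPLES` are hypothetical, and the specific full triples listed are an assumption about what the bitcode libraries are built with.

```python
# Hypothetical table: architecture component -> full triple assumed to be
# used when the bitcode library is built. Not taken from the patch itself.
FULL_TRIPLES = {
    "nvptx": "nvptx-nvidia-cuda",
    "nvptx64": "nvptx64-nvidia-cuda",
    "amdgcn": "amdgcn-amd-amdhsa",
}


def normalize_offload_triple(triple: str) -> str:
    """Expand a short or differently expanded triple to the assumed full form.

    Triples whose architecture is not in the table are returned unchanged.
    """
    arch = triple.split("-", 1)[0]
    return FULL_TRIPLES.get(arch, triple)


if __name__ == "__main__":
    for short in ("nvptx64", "amdgcn", "x86_64-unknown-linux-gnu"):
        print(short, "->", normalize_offload_triple(short))
```

Keying on the architecture alone means that any user-supplied expansion of a supported architecture resolves to the same library triple, which is the behavior the commit message describes.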
      
      Reviewed By: jdoerfert, JonChesterfield
      
      Differential Revision: https://reviews.llvm.org/D117634
      28d71860
  11. Jan 13, 2022
  12. Nov 04, 2021
  13. Oct 28, 2021
  14. Oct 26, 2021
  15. Oct 21, 2021
    • [libomptarget][DeviceRTL] Generalise and simplify cmakelists · a602c2b5
      Jon Chesterfield authored
      Step towards building the DeviceRTL for amdgpu.
      
      Mostly replaces cuda-specific toolchain finding logic with the
      generic logic currently found in the amdgpu deviceRTL cmake. Also
      deletes dead code and changes the default to build on systems
      without cuda installed, as the library doesn't use cuda and the
      amdgpu-only systems generally won't have cuda installed.
      
      Reviewed By: Meinersbur
      
      Differential Revision: https://reviews.llvm.org/D111983
      a602c2b5
  16. Oct 07, 2021
    • [libomptarget] Move device environment to shared header, remove divergence · 0c554a47
      Jon Chesterfield authored
      Follow on to D110006, related to D110957
      
      Where implementations have diverged this resolves to match the new DeviceRTL
      
      - replaces definitions of this struct in deviceRTL and plugins with include
      - changes the dynamic_shared_size field from D110006 to 32 bits
      - handles stdint being unavailable in DeviceRTL
      - adds a zero initializer for the field to amdgpu
      - moves the extern declaration for deviceRTL to target_interface
        (omptarget.h is more natural, but doesn't work due to include order
        with debug.h)
      - Renames the fields everywhere to match the LLVM format used in DeviceRTL
      - Makes debug_level uint32_t everywhere (previously sometimes int32_t)
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D111069
      0c554a47
  17. Sep 27, 2021
  18. Aug 23, 2021
  19. Aug 20, 2021
    • [libomptarget][amdgcn] Add build dependency for llvm-link and opt · 4bb36df1
      Joachim Protze authored
      D107156 and D107320 are not sufficient when OpenMP is built as llvm runtime
      (LLVM_ENABLE_RUNTIMES=openmp) because dependencies only work within the same
      cmake instance.
      
      We could limit the dependency to cases where libomptarget/plugins are really
      built. But compared to the whole llvm project, building openmp runtime is
      negligible and postponing the build of OpenMP runtime after the dependencies
      are ready seems reasonable.
      
      The direct dependency introduced in D107156 and D107320 is necessary for the
      case where OpenMP is built as llvm project (LLVM_ENABLE_PROJECTS=openmp).
      
      Differential Revision: https://reviews.llvm.org/D108404
      4bb36df1
  20. Jul 27, 2021
    • [OpenMP] Prototype opt-in new GPU device RTL · 67ab875f
      Johannes Doerfert authored
      The "old" OpenMP GPU device runtime (D14254) has served us well for many
      years but modernizing it has caused some pain recently. This patch
      introduces an alternative which is mostly written from scratch embracing
      OpenMP 5.X, C++, LLVM coding style (where applicable), and conceptual
      interfaces. This new runtime is opt-in through a clang flag (D106793).
      The new runtime is currently only built for nvptx and has "-new" in its
      name.
      
      The design is tailored towards middle-end optimizations rather than
      front-end code generation choices, a trend we already started in the old
      runtime a while back. In contrast to the old one, state is organized in
      a simple manner rather than a "smart" one. While this can induce costs
      it helps optimizations. Our expectation is that the majority of codes
      can be optimized and a "simple" design is therefore preferable. The new
      runtime also avoids making users pay for things they do not use,
      especially with respect to memory. The unlikely case of nested parallelism
      is supported but costly, so that the more likely case uses fewer resources.
      
      The worksharing and reduction implementations have been taken from the
      old runtime and will be rewritten in the future if necessary.
      
      Documentation and debug features are still mostly missing and will be
      added over time.
      
      All external symbols start with `__kmpc` for legacy reasons but should
      be renamed once we switch over to a single runtime. All internal symbols
      are placed in appropriate namespaces (anonymous or `_OMP`) to avoid name
      clashes with user symbols.
      
      Differential Revision: https://reviews.llvm.org/D106803
      67ab875f
  21. Jul 25, 2021
  22. Jul 22, 2021
  23. Jul 18, 2021
  24. Apr 30, 2021
    • [OpenMP][CMake] Use in-project clang as CUDA->IR compiler. · 7308862f
      Michael Kruse authored
      If available, use the clang that is already built in the same project as
      the CUDA compiler unless another executable is explicitly defined. This also
      ensures the generated deviceRTL IR will be consistent with the version
      of Clang.
      
      This patch is required to reliably test OpenMP offloading in a buildbot
      without either a two-stage build (e.g. with LLVM_ENABLE_RUNTIMES) or a
      separately installed clang on the worker that will eventually become
      outdated.
      
      Reviewed By: tianshilei1992
      
      Differential Revision: https://reviews.llvm.org/D101265
      7308862f
  25. Mar 12, 2021
  26. Mar 08, 2021
    • [OpenMP][Clang][NVPTX] Only build one bitcode library for each SM · c41ae246
      Shilei Tian authored
      In D97003, CUDA 9.2 became the minimum requirement for OpenMP offloading on
      the NVPTX target. We don't need macros in the source code to select the right
      functions based on the CUDA version. We don't need to compile multiple bitcode
      libraries for different CUDA versions for each SM. We don't need to worry about
      future compatibility with newer CUDA versions.
      
      `-target-feature +ptx61` is used in this patch, which corresponds to the highest
      PTX version that CUDA 9.2 can support.
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D97198
      c41ae246
  27. Feb 23, 2021
    • [OpenMP][NVPTX] Fixed a compilation error in deviceRTLs caused by unsupported... · f6c2984a
      Shilei Tian authored
      [OpenMP][NVPTX] Fixed a compilation error in deviceRTLs caused by unsupported feature in release version of LLVM
      
      `ptx71` is not supported in the release version of LLVM yet. As a result,
      the support for CUDA 11.2 and CUDA 11.1 caused a compilation error, as mentioned
      in D97004. Since the support in D97004 is just a workaround for the release, and
      we'll not use it in the near future, using `ptx70` for CUDA 11 is feasible.
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D97195
      f6c2984a
  28. Feb 19, 2021
    • [OpenMP] Fix nvptx CUDA_VERSION conversion · ef8b3b5f
      Joel E. Denny authored
      As mentioned in PR#49250, without this patch, ptxas for CUDA 9.1 fails
      in the following two tests:
      
      - openmp/libomptarget/test/mapping/lambda_mapping.cpp
      - openmp/libomptarget/test/offloading/bug49021.cpp
      
      The error looks like:
      
      ```
      ptxas /tmp/lambda_mapping-081ea9.s, line 828; error   : Not a name of any known instruction: 'activemask'
      ```
      
      The problem is that our cmake script converts CUDA version strings
      incorrectly: 9.1 becomes 9100, but it should be 9010, as shown in
      `getCudaVersion` in `clang/lib/Driver/ToolChains/Cuda.cpp`.  Thus,
      `openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu`
      inadvertently enables `activemask` because it apparently becomes
      available in 9.2.  This patch fixes the conversion.
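
The intended conversion (9.1 becomes 9010, not 9100) can be sketched as follows. This is an illustrative Python version, not the CMake code from the patch; the function name `cuda_version_to_int` is hypothetical, and the formula `major * 1000 + minor * 10` is inferred from the 9.1 to 9010 example given above.

```python
def cuda_version_to_int(version: str) -> int:
    """Convert a 'major.minor' CUDA version string to its integer encoding.

    Assumed encoding: major * 1000 + minor * 10, so "9.1" maps to 9010.
    A missing minor component is treated as 0.
    """
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major * 1000 + minor * 10


if __name__ == "__main__":
    print(cuda_version_to_int("9.1"))  # 9010
    print(cuda_version_to_int("9.2"))  # 9020
```

With this encoding, 9.1 (9010) compares below 9.2 (9020), so a version guard for a feature introduced in 9.2 is not triggered for 9.1.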
      
      This patch does not fix the other two tests in PR#49250.
      
      Reviewed By: tianshilei1992
      
      Differential Revision: https://reviews.llvm.org/D97012
      ef8b3b5f
    • [OpenMP][NVPTX] Add the support for CUDA 11.2 and CUDA 11.1 · 89827fd4
      Shilei Tian authored
      CUDA 11.2 and CUDA 11.1 are all available now.
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D97004
      89827fd4
  29. Jan 30, 2021
    • [OpenMP][NVPTX] Refined CMake logic to choose compute capabilities · 26d38f6d
      Shilei Tian authored
      This patch refines the logic to choose compute capabilities via the
      environment variable `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES`. It supports the
      following values (all case insensitive):
      - "all": Build `deviceRTLs` for all supported compute capabilities;
      - "auto": Only build for the auto-detected compute capability. Note that this
        requires CUDA. If CUDA is not found, a CMake fatal error will be raised.
      - "xx,yy" or "xx;yy": Build for compute capabilities `xx` and `yy`.
      
      If `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES` is not set, it is equivalent to
      setting it to `all`.
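
The selection rules listed above can be sketched as a small function. This is illustrative Python rather than the CMake logic in the patch; the function name, its parameters, and the use of `RuntimeError` to stand in for a CMake fatal error are all assumptions made for the example.

```python
from typing import List, Optional


def select_compute_capabilities(
    value: Optional[str],
    supported: List[str],
    detected: Optional[str] = None,
) -> List[str]:
    """Resolve a LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES-style setting.

    value:     the raw setting, or None if it is not set
    supported: all compute capabilities the build supports (hypothetical input)
    detected:  the auto-detected capability, or None if CUDA was not found
    """
    # An unset variable is equivalent to "all"; matching is case insensitive.
    setting = "all" if value is None else value.strip().lower()
    if setting == "all":
        return list(supported)
    if setting == "auto":
        if detected is None:
            # Stands in for the CMake fatal error raised when CUDA is missing.
            raise RuntimeError("auto detection requires CUDA, which was not found")
        return [detected]
    # Accept both "xx,yy" and "xx;yy" as separators.
    return [item.strip() for item in setting.replace(";", ",").split(",") if item.strip()]


if __name__ == "__main__":
    supported = ["35", "60", "70"]
    print(select_compute_capabilities(None, supported))
    print(select_compute_capabilities("35;70", supported))
```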
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D95687
      26d38f6d
  30. Jan 28, 2021
  31. Jan 27, 2021
    • [OpenMP][NVPTX] Disable building NVPTX deviceRTL by default on a non-CUDA system · fb12df4a
      Shilei Tian authored
      D95466 dropped the CUDA requirement for building the NVPTX deviceRTL and
      enabled it by default. However, the build requires some libraries that are not
      available on non-CUDA systems by default, which could break the compilation. This
      patch disables the build by default. It can be enabled with `LIBOMPTARGET_BUILD_NVPTX_BCLIB=ON`.
      
      Reviewed By: kparzysz
      
      Differential Revision: https://reviews.llvm.org/D95556
      fb12df4a
    • [OpenMP][NVPTX] Drop dependence on CUDA to build NVPTX `deviceRTLs` · e7535f8f
      Shilei Tian authored
      With D94745, we no longer use the CUDA SDK to compile `deviceRTLs`. Therefore,
      much of the CMake code in the project is unnecessary. This patch cleans up that
      code and also drops the CUDA requirement to build NVPTX `deviceRTLs`. CUDA
      detection is still used, however, to determine whether we need to include the
      tests. Auto detection of compute capability is enabled by default and can be
      disabled by setting the CMake variable
      `LIBOMPTARGET_NVPTX_AUTODETECT_COMPUTE_CAPABILITY=OFF`.
      If auto detection is enabled and CUDA is also valid, only the bitcode library
      for the detected version is built; otherwise, all supported variants are
      generated. One drawback of this patch is that we now generate 96 variants of the
      bitcode library, a total of 1485 files to be built in a clean build on a
      non-CUDA system. `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=""` can be used to
      disable building NVPTX `deviceRTLs`.
      
      Reviewed By: JonChesterfield
      
      Differential Revision: https://reviews.llvm.org/D95466
      e7535f8f
  32. Jan 26, 2021
    • [OpenMP][deviceRTLs] Build the deviceRTLs with OpenMP instead of target dependent language · 7c03f7d7
      Shilei Tian authored
      From this patch on (plus some landed patches), `deviceRTLs` is treated as a regular OpenMP program with just `declare target` regions. In this way, ideally, `deviceRTLs` can be written in OpenMP directly. No CUDA, no HIP anymore. (Well, AMD is still working on getting it to work. For now AMDGCN still uses the original way to compile.) However, some target-specific functions are still required, but they're no longer written in a target-specific language. For example, the CUDA parts have all been refined by replacing CUDA intrinsics and builtins with LLVM/Clang/NVVM intrinsics.
      Here is a list of changes in this patch.
      1. For NVPTX, `DEVICE` is defined empty in order to make the common parts still work with AMDGCN. Later, once AMDGCN is also available, we will completely remove `DEVICE` and probably some other macros.
      2. Shared variables are implemented with the OpenMP allocator, which is defined in `allocator.h`. Again, this feature is not available on AMDGCN, so two macros are redefined properly.
      3. The CUDA header `cuda.h` is dropped from the source code. In order to deal with code differences between CUDA versions, we build one bitcode library for each supported CUDA version. For each CUDA version, the highest PTX version it supports will be used, just as we currently do for CUDA compilation.
      4. Correspondingly, the compiler driver is also updated to support the CUDA version encoded in the name of the bitcode library. The bitcode library for NVPTX is now named `libomptarget-nvptx-cuda_[cuda_version]-sm_[sm_number].bc`, such as `libomptarget-nvptx-cuda_80-sm_20.bc`.
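
The naming scheme in item 4 can be sketched as below. This is illustrative Python, not the driver or CMake code; the function names are hypothetical, and the CUDA version and SM lists in the usage example are made-up inputs rather than the sets the project actually supports.

```python
from itertools import product
from typing import Iterable, List


def bitcode_library_name(cuda_version: str, sm: str) -> str:
    """Build a name of the form libomptarget-nvptx-cuda_[cuda_version]-sm_[sm_number].bc."""
    return f"libomptarget-nvptx-cuda_{cuda_version}-sm_{sm}.bc"


def all_variants(cuda_versions: Iterable[str], sms: Iterable[str]) -> List[str]:
    """One library per (CUDA version, SM) pair, so the count is the product of both lists."""
    return [bitcode_library_name(c, s) for c, s in product(cuda_versions, sms)]


if __name__ == "__main__":
    # Matches the example name given in the commit message.
    print(bitcode_library_name("80", "20"))
    # Hypothetical inputs: 2 CUDA versions x 3 SMs gives 6 libraries.
    for name in all_variants(["80", "92"], ["35", "70", "75"]):
        print(name)
```

Because the variant count is the product of the two lists, adding a supported CUDA version adds one library per SM.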
      
      With this change, there are also multiple features to be expected in the near future:
      1. CUDA will be completely dropped when compiling OpenMP. By then, we will also build bitcode libraries for all supported SMs, multiplied by all supported CUDA versions.
      2. Atomic operations used in `deviceRTLs` can be replaced by `omp atomic` once the OpenMP 5.1 feature is fully supported. For now, the IR generated is totally wrong.
      3. Target-specific parts will be wrapped into `declare variant` with the `isa` selector if it can work properly. No target-specific macro will be needed anymore.
      4. (Maybe more...)
      
      Reviewed By: JonChesterfield
      
      Differential Revision: https://reviews.llvm.org/D94745
      7c03f7d7
  33. Jan 14, 2021
    • [OpenMP] Dropped unnecessary define when compiling deviceRTLs for NVPTX · 64e9e9ae
      Shilei Tian authored
      The comment said CUDA 9 header files use the `nv_weak` attribute, which
      `clang` is not yet prepared to handle. That was three years ago, and things
      have changed since. Based on my test, removing the definition doesn't cause any
      problem on my machine with CUDA 11.1 installed.
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D94700
      64e9e9ae
    • [OpenMP] Drop the static library libomptarget-nvptx · 763c1f99
      Shilei Tian authored
      For the NVPTX target, OpenMP provides a static library `libomptarget-nvptx`
      built by NVCC, and a bitcode library `libomptarget-nvptx-sm_{$sm}.bc` generated
      by Clang. When compiling an OpenMP program, the `.bc` file is fed to `clang`
      in the second run on the program, which compiles the target part. The generated
      PTX file is then fed to `ptxas` to generate the object file, and finally the
      driver invokes `nvlink` to generate the binary, with the static library appended
      to the `nvlink` invocation.
      
      One question is, why do we need two libraries? The only difference is, the static
      library contains `omp_data.cu` and the bitcode library doesn't. It's unclear why
      they were implemented in this way, but per D94565, there is no issue if we also
      include the file into the bitcode library. Therefore, we can safely drop the
      static library.
      
      This patch is about the change in OpenMP. The driver will be updated as well if
      this patch is accepted.
      
      Reviewed By: jdoerfert, JonChesterfield
      
      Differential Revision: https://reviews.llvm.org/D94573
      763c1f99
  34. Jan 13, 2021
  35. Oct 08, 2020
    • [OpenMP] Change CMake Configuration to Build for Highest CUDA Architecture by Default · d5644099
      Joseph Huber authored
      Summary:
      This patch changes the CMake files for Clang and Libomptarget to query the
      system for its supported CUDA architecture. This makes it much easier for the
      user to build optimal code without needing to set the flags manually. This
      relies on the now deprecated FindCUDA method in CMake, but full support for
      architecture detection is only available in CMake >3.18.
      
      Reviewers: jdoerfert ye-luo
      
      Subscribers: cfe-commits guansong mgorny openmp-commits sstefan1 yaxunl
      
      Tags: #clang #OpenMP
      
      Differential Revision: https://reviews.llvm.org/D87946
      d5644099