  1. Aug 20, 2021
    • [libomptarget][amdcgn] Add build dependency for llvm-link and opt · 4bb36df1
      Joachim Protze authored
      D107156 and D107320 are not sufficient when OpenMP is built as llvm runtime
      (LLVM_ENABLE_RUNTIMES=openmp) because dependencies only work within the same
      cmake instance.
      
      We could limit the dependency to cases where libomptarget/plugins are really
      built. But compared to the whole llvm project, the cost of building the openmp
      runtime is negligible, and postponing the build of the OpenMP runtime until
      the dependencies are ready seems reasonable.
      
      The direct dependency introduced in D107156 and D107320 is necessary for the
      case where OpenMP is built as llvm project (LLVM_ENABLE_PROJECTS=openmp).
      
      Differential Revision: https://reviews.llvm.org/D108404
  2. Jul 27, 2021
    • [OpenMP] Prototype opt-in new GPU device RTL · 67ab875f
      Johannes Doerfert authored
      The "old" OpenMP GPU device runtime (D14254) has served us well for many
      years but modernizing it has caused some pain recently. This patch
      introduces an alternative which is mostly written from scratch embracing
      OpenMP 5.X, C++, LLVM coding style (where applicable), and conceptual
      interfaces. This new runtime is opt-in through a clang flag (D106793).
      The new runtime is currently only built for nvptx and has "-new" in its
      name.
      
      The design is tailored towards middle-end optimizations rather than
      front-end code generation choices, a trend we already started in the old
      runtime a while back. In contrast to the old one, state is organized in
      a simple manner rather than a "smart" one. While this can induce costs
      it helps optimizations. Our expectation is that the majority of codes
      can be optimized and a "simple" design is therefore preferable. The new
      runtime also avoids making users pay for things they do not use,
      especially with respect to memory. The unlikely case of nested parallelism
      is supported, but at a higher cost, so that the more likely case uses fewer
      resources.
      
      The worksharing and reduction implementations have been taken from the
      old runtime and will be rewritten in the future if necessary.
      
      Documentation and debug features are still mostly missing and will be
      added over time.
      
      All external symbols start with `__kmpc` for legacy reasons but should
      be renamed once we switch over to a single runtime. All internal symbols
      are placed in appropriate namespaces (anonymous or `_OMP`) to avoid name
      clashes with user symbols.
      
      Differential Revision: https://reviews.llvm.org/D106803
  3. Jul 25, 2021
  4. Jul 22, 2021
  5. Jul 18, 2021
  6. Apr 30, 2021
    • [OpenMP][CMake] Use in-project clang as CUDA->IR compiler. · 7308862f
      Michael Kruse authored
      If available, use the clang that is already built in the same project as the
      CUDA compiler, unless another executable is explicitly defined. This also
      ensures the generated deviceRTL IR will be consistent with the version
      of Clang.
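
The selection order described above can be sketched as follows. This is only an illustration in Python, not the CMake logic from the patch; `pick_cuda_compiler` and both of its parameters are hypothetical names, and what happens when neither compiler is available is an assumption (the sketch simply returns `None`).

```python
from typing import Optional


def pick_cuda_compiler(explicit: Optional[str],
                       in_project_clang: Optional[str]) -> Optional[str]:
    """Return the compiler to use for the CUDA->IR step.

    An explicitly configured executable always wins. Otherwise the clang
    built in the same project is used, if there is one. Returning None when
    neither exists is an assumption made for this sketch.
    """
    if explicit:
        return explicit
    return in_project_clang


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(pick_cuda_compiler(None, "build/bin/clang"))
    print(pick_cuda_compiler("/opt/other/clang", "build/bin/clang"))
```

The point of the ordering is that an explicit user choice is never overridden, while the default stays in sync with the Clang being built.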
      
      This patch is required to reliably test OpenMP offloading in a buildbot
      without either a two-stage build (e.g. with LLVM_ENABLE_RUNTIMES) or a
      separately installed clang on the worker that will eventually become
      outdated.
      
      Reviewed By: tianshilei1992
      
      Differential Revision: https://reviews.llvm.org/D101265
  7. Mar 12, 2021
  8. Mar 08, 2021
    • [OpenMP][Clang][NVPTX] Only build one bitcode library for each SM · c41ae246
      Shilei Tian authored
      With D97003, CUDA 9.2 is the minimum requirement for OpenMP offloading on
      the NVPTX target. We no longer need macros in the source code to select the right
      functions based on the CUDA version, we no longer need to compile multiple bitcode
      libraries for different CUDA versions for each SM, and we no longer need to worry
      about compatibility with newer CUDA versions.
      
      `-target-feature +ptx61` is used in this patch, which corresponds to the highest
      PTX version that CUDA 9.2 can support.
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D97198
  9. Feb 23, 2021
    • [OpenMP][NVPTX] Fixed a compilation error in deviceRTLs caused by unsupported... · f6c2984a
      Shilei Tian authored
      [OpenMP][NVPTX] Fixed a compilation error in deviceRTLs caused by unsupported feature in release version of LLVM
      
      `ptx71` is not yet supported in the release version of LLVM. As a result,
      the support for CUDA 11.2 and CUDA 11.1 caused a compilation error, as mentioned
      in D97004. Since the support in D97004 is just a workaround for the release, and
      we will not use it in the near future, using `ptx70` for CUDA 11 is feasible.
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D97195
  10. Feb 19, 2021
    • [OpenMP] Fix nvptx CUDA_VERSION conversion · ef8b3b5f
      Joel E. Denny authored
      As mentioned in PR#49250, without this patch, ptxas for CUDA 9.1 fails
      in the following two tests:
      
      - openmp/libomptarget/test/mapping/lambda_mapping.cpp
      - openmp/libomptarget/test/offloading/bug49021.cpp
      
      The error looks like:
      
      ```
      ptxas /tmp/lambda_mapping-081ea9.s, line 828; error   : Not a name of any known instruction: 'activemask'
      ```
      
      The problem is that our cmake script converts CUDA version strings
      incorrectly: 9.1 becomes 9100, but it should be 9010, as shown in
      `getCudaVersion` in `clang/lib/Driver/ToolChains/Cuda.cpp`.  Thus,
      `openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu`
      inadvertently enables `activemask` because it apparently becomes
      available in 9.2.  This patch fixes the conversion.
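
The intended conversion can be illustrated with a short Python sketch. It is not the cmake code from the patch; `cuda_version_code` is a hypothetical name, and the only facts taken from the text above are that 9.1 must map to 9010 rather than 9100, that is, major * 1000 + minor * 10.

```python
def cuda_version_code(version: str) -> int:
    """Convert a CUDA version string such as "9.1" to major * 1000 + minor * 10."""
    parts = version.split(".")
    major = int(parts[0])
    # Treat a missing minor component as 0 (an assumption for this sketch).
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major * 1000 + minor * 10


if __name__ == "__main__":
    for v in ("9.1", "9.2", "11.2"):
        print(v, "->", cuda_version_code(v))
```

With this encoding, a check such as "version >= 9020" correctly excludes CUDA 9.1, whereas the wrong value 9100 would pass it.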
      
      This patch does not fix the other two tests in PR#49250.
      
      Reviewed By: tianshilei1992
      
      Differential Revision: https://reviews.llvm.org/D97012
    • [OpenMP][NVPTX] Add the support for CUDA 11.2 and CUDA 11.1 · 89827fd4
      Shilei Tian authored
      CUDA 11.2 and CUDA 11.1 are both available now.
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D97004
  11. Jan 30, 2021
    • [OpenMP][NVPTX] Refined CMake logic to choose compute capabilities · 26d38f6d
      Shilei Tian authored
      This patch refines the logic to choose compute capabilities via the
      environment variable `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES`. It supports the
      following values (all case insensitive):
      - "all": Build `deviceRTLs` for all supported compute capabilities;
      - "auto": Only build for the auto-detected compute capability. Note that this
        requires CUDA. If CUDA is not found, a CMake fatal error will be raised.
      - "xx,yy" or "xx;yy": Build for compute capabilities `xx` and `yy`.
      
      If `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES` is not set, this is equivalent to
      setting it to `all`.
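
The accepted values can be summarized with a minimal Python sketch. It only mirrors the rules listed above and is not the CMake implementation; `select_compute_capabilities`, the `supported` list, and the `detect` callback are hypothetical, and a Python exception stands in for the CMake fatal error.

```python
import re
from typing import Callable, List, Optional


def select_compute_capabilities(value: Optional[str],
                                supported: List[str],
                                detect: Callable[[], Optional[str]]) -> List[str]:
    """Resolve the setting to a list of compute capabilities.

    value     -- the raw setting, or None if it is not set
    supported -- all capabilities the build knows about (hypothetical input)
    detect    -- returns the detected capability, or None without CUDA
    """
    if value is None:
        value = "all"  # not set is equivalent to "all"
    key = value.strip().lower()  # keywords are case insensitive
    if key == "all":
        return list(supported)
    if key == "auto":
        detected = detect()
        if detected is None:
            # Stands in for the CMake fatal error described above.
            raise RuntimeError("CUDA not found: cannot auto-detect compute capability")
        return [detected]
    # Explicit list: both "," and ";" are accepted as separators.
    return [item.strip() for item in re.split(r"[,;]", value) if item.strip()]


if __name__ == "__main__":
    known = ["35", "60", "70"]  # illustrative values only
    print(select_compute_capabilities("60;70", known, lambda: None))
```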
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D95687
  12. Jan 28, 2021
  13. Jan 27, 2021
    • [OpenMP][NVPTX] Disable building NVPTX deviceRTL by default on a non-CUDA system · fb12df4a
      Shilei Tian authored
      D95466 dropped the CUDA requirement for building the NVPTX deviceRTL and enabled
      the build by default. However, the build requires some libraries that are not
      available by default on non-CUDA systems, which could break compilation. This
      patch disables the build by default. It can be enabled with
      `LIBOMPTARGET_BUILD_NVPTX_BCLIB=ON`.
      
      Reviewed By: kparzysz
      
      Differential Revision: https://reviews.llvm.org/D95556
    • [OpenMP][NVPTX] Drop dependence on CUDA to build NVPTX `deviceRTLs` · e7535f8f
      Shilei Tian authored
      With D94745, we no longer use the CUDA SDK to compile `deviceRTLs`. Therefore,
      much of the CMake code in the project is unnecessary. This patch cleans up that
      code and also drops the CUDA requirement for building NVPTX `deviceRTLs`. CUDA
      detection is still used, however, to determine whether to include the tests. Auto
      detection of compute capability is enabled by default and can be disabled by
      setting the CMake variable `LIBOMPTARGET_NVPTX_AUTODETECT_COMPUTE_CAPABILITY=OFF`.
      If auto detection is enabled and CUDA is also valid, only the bitcode library
      for the detected version is built; otherwise, all supported variants are
      generated. One drawback of this patch is that we now generate 96 variants of
      the bitcode library, for a total of 1485 files to build in a clean build on a
      non-CUDA system. `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=""` can be used to
      disable building NVPTX `deviceRTLs`.
      
      Reviewed By: JonChesterfield
      
      Differential Revision: https://reviews.llvm.org/D95466
  14. Jan 26, 2021
    • [OpenMP][deviceRTLs] Build the deviceRTLs with OpenMP instead of target dependent language · 7c03f7d7
      Shilei Tian authored
      Starting with this patch (plus some patches that have already landed), `deviceRTLs` is treated as a regular OpenMP program with just `declare target` regions. In this way, ideally, `deviceRTLs` can be written in OpenMP directly. No CUDA, no HIP anymore. (Well, AMD is still working on getting it to work. For now, AMDGCN still uses the original way to compile.) However, some target-specific functions are still required, but they are no longer written in a target-specific language. For example, the CUDA parts have all been refined by replacing CUDA intrinsics and builtins with LLVM/Clang/NVVM intrinsics.
      Here is a list of changes in this patch:
      1. For NVPTX, `DEVICE` is defined as empty so that the common parts still work with AMDGCN. Later, once AMDGCN is also available, we will completely remove `DEVICE` or probably some other macros.
      2. Shared variables are implemented with the OpenMP allocator, which is defined in `allocator.h`. Again, this feature is not available on AMDGCN, so two macros are redefined properly.
      3. The CUDA header `cuda.h` is dropped from the source code. To deal with code differences across CUDA versions, we build one bitcode library for each supported CUDA version. For each CUDA version, the highest PTX version it supports is used, just as we currently do for CUDA compilation.
      4. Correspondingly, the compiler driver is also updated to support the CUDA version encoded in the name of the bitcode library. The bitcode library for NVPTX is now named `libomptarget-nvptx-cuda_[cuda_version]-sm_[sm_number].bc`, for example `libomptarget-nvptx-cuda_80-sm_20.bc`.
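
The naming scheme in item 4 can be sketched in Python as follows. `bitcode_library_name` is a hypothetical helper, not driver code, and the way the CUDA version is encoded (dropping the dot, so that 8.0 becomes `80`) is an assumption inferred from the single example name given above.

```python
def bitcode_library_name(cuda_version: str, sm: int) -> str:
    """Build the NVPTX bitcode library file name for a CUDA version and SM.

    The "major.minor" -> "majorminor" encoding is inferred from the example
    libomptarget-nvptx-cuda_80-sm_20.bc and may not hold for every version.
    """
    major, minor = cuda_version.split(".")
    return f"libomptarget-nvptx-cuda_{major}{minor}-sm_{sm}.bc"


if __name__ == "__main__":
    print(bitcode_library_name("8.0", 20))
```

Encoding both the CUDA version and the SM number in the file name is what lets the driver pick one matching library out of the full set of variants.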
      
      With this change, several further features can be expected in the near future:
      1. CUDA will be completely dropped when compiling OpenMP. At that point, we will also build bitcode libraries for all supported SMs, multiplied by all supported CUDA versions.
      2. Atomic operations used in `deviceRTLs` can be replaced by `omp atomic` once the OpenMP 5.1 feature is fully supported. For now, the IR generated is totally wrong.
      3. Target-specific parts will be wrapped in `declare variant` with the `isa` selector if it works properly. No target-specific macros will be needed anymore.
      4. (Maybe more...)
      
      Reviewed By: JonChesterfield
      
      Differential Revision: https://reviews.llvm.org/D94745
  15. Jan 14, 2021
    • [OpenMP] Dropped unnecessary define when compiling deviceRTLs for NVPTX · 64e9e9ae
      Shilei Tian authored
      The comment said that CUDA 9 header files use the `nv_weak` attribute, which
      `clang` is not yet prepared to handle. That was three years ago, and things have
      changed since. Based on my test, removing the definition causes no problems on
      my machine with CUDA 11.1 installed.
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D94700
    • [OpenMP] Drop the static library libomptarget-nvptx · 763c1f99
      Shilei Tian authored
      For the NVPTX target, OpenMP provides a static library `libomptarget-nvptx`
      built by NVCC, and a bitcode library `libomptarget-nvptx-sm_{$sm}.bc` generated by
      Clang. When compiling an OpenMP program, the `.bc` file is fed to `clang`
      in the second pass over the program, which compiles the target part. The generated
      PTX file is then fed to `ptxas` to generate the object file, and finally the driver
      invokes `nvlink` to generate the binary, with the static library appended
      to the `nvlink` invocation.
      
      One question is, why do we need two libraries? The only difference is, the static
      library contains `omp_data.cu` and the bitcode library doesn't. It's unclear why
      they were implemented in this way, but per D94565, there is no issue if we also
      include the file into the bitcode library. Therefore, we can safely drop the
      static library.
      
      This patch is about the change in OpenMP. The driver will be updated as well if
      this patch is accepted.
      
      Reviewed By: jdoerfert, JonChesterfield
      
      Differential Revision: https://reviews.llvm.org/D94573
  16. Jan 13, 2021
  17. Oct 08, 2020
    • [OpenMP] Change CMake Configuration to Build for Highest CUDA Architecture by Default · d5644099
      Joseph Huber authored
      Summary:
      This patch changes the CMake files for Clang and Libomptarget to query the
      system for its supported CUDA architecture. This makes it much easier for the
      user to build optimal code without needing to set the flags manually. This
      relies on the now deprecated FindCUDA method in CMake, but full support for
      architecture detection is only available in CMake >3.18.
      
      Reviewers: jdoerfert ye-luo
      
      Subscribers: cfe-commits guansong mgorny openmp-commits sstefan1 yaxunl
      
      Tags: #clang #OpenMP
      
      Differential Revision: https://reviews.llvm.org/D87946
  18. Sep 24, 2020
  19. Dec 18, 2019
  20. Dec 17, 2019
    • [libomptarget][nfc] Move three files under common, build them for amdgcn · 0c83f8cc
      JonChesterfield authored
      Summary:
      [libomptarget][nfc] Move three files under common, build them for amdgcn
      
      Change to reduction.cu to remove two dead includes, otherwise no code change.
      
      Reviewers: jdoerfert, ABataev, grokos
      
      Reviewed By: jdoerfert
      
      Subscribers: jvesely, mgorny, openmp-commits
      
      Tags: #openmp
      
      Differential Revision: https://reviews.llvm.org/D71601
    • [libomptarget][nfc] Move omp locks under target_impl · 3d3e4076
      JonChesterfield authored
      Summary:
      [libomptarget][nfc] Move omp locks under target_impl
      
      These are likely to be target specific, even down to the lock_t which is
      correspondingly moved out of interface.h. The alternative is to include
      interface.h in target_impl, which substantially increases the scope of
      those symbols.
      
      The current nvptx implementation deadlocks on amdgcn. The preferred
      implementation for that arch is still under discussion - this change
      leaves declarations in target_impl.
      
      The functions could be inline for nvptx. I'd prefer to keep the internals
      hidden in the target_impl translation unit, but will add the (possibly renamed)
      macros to target_impl.h if preferred.
      
      Reviewers: ABataev, jdoerfert, grokos
      
      Reviewed By: jdoerfert
      
      Subscribers: jvesely, mgorny, jfb, openmp-commits
      
      Tags: #openmp
      
      Differential Revision: https://reviews.llvm.org/D71574
  21. Dec 06, 2019
  22. Dec 05, 2019
    • [libomptarget][nfc] Move omptarget-nvptx under common · d0b9ed5c
      Jon Chesterfield authored
      Summary:
      [libomptarget][nfc] Move omptarget-nvptx under common
      
      Almost all files depend on omptarget-nvptx, which no longer
      contains any obviously architecture dependent code. Moving it under
      common unblocks task/loop for amdgcn, and allows moving other code.
      
      At some point there should probably be a widespread symbol renaming to
      replace the nvptx string. I'd prefer to get things working first.
      
      Building this (and task.cu, loop.cu) without a cuda library requires
      some more refactoring, e.g. wrap threadfence(), use DEVICE macro more
      consistently. Patches for that are orthogonal and will be posted shortly.
      
      Reviewers: jdoerfert, ABataev, grokos
      
      Reviewed By: ABataev
      
      Subscribers: mgorny, fedor.sergeev, jfb, openmp-commits
      
      Tags: #openmp
      
      Differential Revision: https://reviews.llvm.org/D71073
  23. Nov 18, 2019
    • [libomptarget][nfc] Move some source into common from nvptx · 5a4a05d7
      Jon Chesterfield authored
      Summary:
      [libomptarget][nfc] Move some source into common from nvptx
      
      Moves some source that compiles cleanly under amdgcn into a common subdirectory.
      Includes some non-trivial files and some headers. Keeps the cuda file extension.
      
      The build systems for different architectures seem unlikely to have much in
      common. The idea is therefore to set include paths such that files under
      common/src compile as if they were under arch/src as the mechanism for sharing.
      In particular, files under common/src need to be able to include target_impl.h.
      
      The corresponding -Icommon is left out in favour of explicit includes on the
      basis that it makes it clearer which files under common are used by a given
      architecture.
      
      Reviewers: jdoerfert, ABataev, grokos
      
      Reviewed By: ABataev
      
      Subscribers: jfb, mgorny, openmp-commits
      
      Tags: #openmp
      
      Differential Revision: https://reviews.llvm.org/D70328
  24. Nov 13, 2019
    • [libomptarget] Move supporti.h to support.cu · fd9fa999
      JonChesterfield authored
      Summary:
      [libomptarget] Move supporti.h to support.cu
      Reimplementation of D69652, without the unity build and refactors.
      Will need a clean build of libomptarget as the cmakelists changed.
      
      Reviewers: ABataev, jdoerfert
      
      Reviewed By: jdoerfert
      
      Subscribers: mgorny, jfb, openmp-commits
      
      Tags: #openmp
      
      Differential Revision: https://reviews.llvm.org/D70131
  25. Nov 06, 2019
    • [libomptarget] Revert all improvements to support · 7cea0cea
      Jon Chesterfield authored
      Summary:
      [libomptarget] Revert all improvements to support
      
      The change to unity build for nvcc has broken the build for some developers.
      This patch reverts to a known-working state.
      
      There has been some confusion over exactly how the build broke. I think we
      have reached a common understanding that the disappearing symbols are from
      the bitcode library built by clang. The static archive built by nvcc may show the
      same problem. Some of the confusion arose from building the deviceRTL twice
      and using one or the other library based on various environmental factors.
      
      I'm pretty sure the problem is clang expanding `__forceinline__` into both `__inline__`
      and `attribute(("always_inline"))`. The `__inline__` attribute resolves to linkonce_odr
      which is not safe for exporting symbols from translation units.
      
      "always_inline" is the desired semantic for small functions defined in one translation
      unit that are intended to be inlined at link time. "inline" is not.
      
      This therefore reintroduces the dependency hazard of supporti.h and some code
      duplication, and blocks progress separating deviceRTL into reusable components.
      
      See also D69857, D69859 for attempts at a fix instead of a revert.
      
      Reviewers: ABataev, jdoerfert, grokos, ikitayama, tianshilei1992
      
      Reviewed By: ABataev
      
      Subscribers: mgorny, jfb, openmp-commits
      
      Tags: #openmp
      
      Differential Revision: https://reviews.llvm.org/D69885
  26. Oct 31, 2019
    • [nfc][libomptarget] Reorganise support header · 764c8420
      JonChesterfield authored
      Summary:
      [nfc][libomptarget] Reorganise support header
      
      All functions defined in support implementation are now declared in support.h
      Reordered functions in support implementation to match the sequence in support.h
      Added include guards to support.h
      Added #include interface to support.h to provide kmp_Ident declaration
      Move supporti.h to support.cu and s/INLINE/EXTERN/g
      Add remaining includes to support.cu
      
      A minor side effect is to change the name mangling of the support functions to
      extern "C". If this matters another macro along the lines of INLINE/EXTERN
      can be added - perhaps DEVICE as that's the obvious implementation.
      
      Reviewers: jdoerfert, ABataev, grokos
      
      Reviewed By: jdoerfert
      
      Subscribers: mgorny, jfb, openmp-commits
      
      Tags: #openmp
      
      Differential Revision: https://reviews.llvm.org/D69652
    • [libomptarget] Change nvcc compilation to use a unity build · e9f9dfab
      Jon Chesterfield authored
      Summary:
      [libomptarget] Change nvcc compilation to use a unity build
      
      This allows nvcc to inline functions between what would otherwise be distinct
      translation units, which in turn removes any runtime cost from implementing
      functions in source files (as opposed to inline in headers).
      
      This will then allow the circular dependencies in deviceRTL to be readily
      broken and individual components more easily shared between architectures.
      
      Reviewers: ABataev, jdoerfert, grokos, RaviNarayanaswamy, hfinkel, ronlieb, gregrodgers
      
      Reviewed By: jdoerfert
      
      Subscribers: mgorny, openmp-commits
      
      Tags: #openmp
      
      Differential Revision: https://reviews.llvm.org/D69489
  27. Oct 15, 2019
    • [libomptarget][nfc] Make interface.h target independent · d69d1aa1
      Jon Chesterfield authored
      Summary:
      [libomptarget][nfc] Make interface.h target independent
      
      Move interface.h under a top level include directory.
      Remove #includes to avoid the interface depending on the implementation.
      
      Reviewers: ABataev, jdoerfert, grokos, ronlieb, RaviNarayanaswamy
      
      Reviewed By: jdoerfert
      
      Subscribers: mgorny, openmp-commits
      
      Tags: #openmp
      
      Differential Revision: https://reviews.llvm.org/D68615
      
      llvm-svn: 374919
  28. Jan 19, 2019
    • Update more file headers across all of the LLVM projects in the monorepo · 57b08b09
      Chandler Carruth authored
      to reflect the new license. These used slightly different spellings that
      defeated my regular expressions.
      
      We understand that people may be surprised that we're moving the header
      entirely to discuss the new license. We checked this carefully with the
      Foundation's lawyer and we believe this is the correct approach.
      
      Essentially, all code in the project is now made available by the LLVM
      project under our new license, so you will see that the license headers
      include that license only. Some of our contributors have contributed
      code under our old license, and accordingly, we have retained a copy of
      our old license notice in the top-level files in each project and
      repository.
      
      llvm-svn: 351648
  29. Oct 01, 2018
  30. Sep 28, 2018
    • [libomptarget-nvptx] Add testing infrastructure · 122dbb5d
      Jonas Hahnfeld authored
      This patch also introduces testing for libomptarget-nvptx
      which has been missing until now. I propose to add tests for
      all bugs that are fixed in the future.
      The target check-libomptarget-nvptx is not run by default because
       - we can't determine if there is a GPU plugged into the system.
       - it will require the latest Clang compiler. Keeping compatibility
         with older releases would prevent testing newer code generation
         developed in trunk.
      
      Differential Revision: https://reviews.llvm.org/D51687
      
      llvm-svn: 343324
  31. May 25, 2018
  32. May 16, 2018
  33. Apr 20, 2018
  34. Apr 09, 2018
  35. Apr 03, 2018