  1. Aug 26, 2020
  2. Aug 19, 2020
  3. Aug 16, 2020
    • [OpenMP][CUDA] Keep one kernel list per device, not globally. · 5272d29e
      Johannes Doerfert authored
      Reviewed By: JonChesterfield
      
      Differential Revision: https://reviews.llvm.org/D86039
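A minimal C++ sketch of keeping one kernel list per device rather than a single global list. `KernelRegistry` and the simplified `KernelTy` are hypothetical names for illustration, and the choice of `std::list` (so that the address of each registered kernel stays stable) is an assumption of this sketch, not a description of the actual patch.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a loaded kernel; the plugin's real KernelTy differs.
struct KernelTy {
  std::string Name;
};

// One kernel list per device instead of a single global list.
class KernelRegistry {
public:
  explicit KernelRegistry(int NumDevices) : KernelsPerDevice(NumDevices) {}

  // Adds a kernel to DeviceId's list. The returned pointer stays valid because
  // std::list does not move existing elements when new ones are inserted.
  KernelTy *addKernel(int DeviceId, std::string Name) {
    assert(DeviceId >= 0 &&
           static_cast<std::size_t>(DeviceId) < KernelsPerDevice.size());
    std::list<KernelTy> &Kernels = KernelsPerDevice[DeviceId];
    Kernels.push_back(KernelTy{std::move(Name)});
    return &Kernels.back();
  }

  std::size_t numKernels(int DeviceId) const {
    return KernelsPerDevice[DeviceId].size();
  }

private:
  std::vector<std::list<KernelTy>> KernelsPerDevice;
};
```

Because each device owns its list, kernels loaded for one device never show up when another device's kernels are inspected.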
    • [OpenMP][CUDA] Cache the maximal number of threads per block (per kernel) · aa27cfc1
      Johannes Doerfert authored
      Instead of calling `cuFuncGetAttribute` with
      `CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK` for every kernel invocation,
      we can do it for the first one and cache the result as part of the
      `KernelInfo` struct. The only functional change is that we now expect
      `cuFuncGetAttribute` to succeed and otherwise propagate the error.
      Ignoring any error seems like a slippery slope...
      
      Reviewed By: JonChesterfield
      
      Differential Revision: https://reviews.llvm.org/D86038
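The caching idea can be sketched as follows, assuming a hypothetical `KernelInfo` with a single `MaxThreadsPerBlock` field and a hypothetical `AttributeQuery` callback that stands in for the `cuFuncGetAttribute` call, so the example runs without a GPU. The first call queries and caches the value, later calls reuse it, and a failed query is returned to the caller instead of being ignored.

```cpp
#include <cassert>
#include <functional>

// Hypothetical result codes standing in for CUresult.
enum class Status { Success, Error };

// Hypothetical per-kernel record; the real KernelInfo layout may differ.
struct KernelInfo {
  int MaxThreadsPerBlock = 0; // 0 means "not queried yet"
};

// Stand-in for cuFuncGetAttribute(..., CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK, ...).
using AttributeQuery = std::function<Status(int &Out)>;

// Returns the cached limit, querying only on the first successful call.
// A failed query is propagated rather than silently ignored.
Status getMaxThreadsPerBlock(KernelInfo &KI, const AttributeQuery &Query,
                             int &Out) {
  if (KI.MaxThreadsPerBlock == 0) {
    int Value = 0;
    if (Query(Value) != Status::Success)
      return Status::Error;
    assert(Value > 0 && "driver reported a non-positive thread limit");
    KI.MaxThreadsPerBlock = Value;
  }
  Out = KI.MaxThreadsPerBlock;
  return Status::Success;
}
```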
    • [libomptarget] Implement host plugin for amdgpu · d0b31295
      Jon Chesterfield authored
      Replacement for D71384. Primary difference is inlining the dependency on atmi
      followed by extensive simplification and bugfixes. This is the latest version
      from https://github.com/ROCm-Developer-Tools/amd-llvm-project/tree/aomp12 with
      minor patches and a rename from hsa to amdgpu, on the basis that this can't be
      used by other implementations of hsa without additional work.
      
      This will not build unless the ROCM_DIR variable is passed, so it won't break
      other builds. That variable is used to locate two amdgpu-specific libraries
      that ship as part of rocm:
      libhsakmt at https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface
      libhsa-runtime64 at https://github.com/RadeonOpenCompute/ROCR-Runtime
      These libraries build from source. The build scripts in those repos are for
      shared libraries, but can be adapted to statically link both into this plugin.
      
      There are caveats.
      - This works well enough to run various tests and benchmarks, and will be used
        to support the current clang bring up
      - It is adequately thread safe for the above, but some races remain
      - It is not stylistically correct for llvm, though clang-format has been run on it
      - It has suboptimal memory management and locking strategies
      - The debug printing / error handling is inconsistent
      
      I would like to contribute this pretty much as-is and then improve it in-tree.
      This would be advantageous because the aomp12 branch that was in use for fixing
      this codebase has just been joined with the amd internal rocm dev process.
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D85742
  4. Aug 05, 2020
  5. Jul 07, 2020
  6. Jun 30, 2020
  7. Jun 04, 2020
    • [OpenMP] Improve D2D memcpy to use more efficient driver API · a014fbbc
      Shilei Tian authored
      Summary:
      In the current implementation, a D2D memcpy first copies the data back to the
      host and then copies it from the host to the device. This is very inefficient
      if the device supports D2D memcpy natively, as CUDA does.

      In this patch, a D2D memcpy first tries the natively supported driver API and,
      if that fails, falls back to the original approach. Note that D2D memcpy in
      this scenario covers two cases:
      - Same device: this is a D2D memcpy in CUDA terms.
      - Different devices: this is a peer-to-peer memcpy in CUDA terms.
      My implementation merges these two cases and chooses the appropriate API based
      on the source and destination devices.
      
      Reviewers: jdoerfert, AndreyChurbanov, grokos
      
      Reviewed By: jdoerfert
      
      Subscribers: yaxunl, guansong, sstefan1, openmp-commits
      
      Tags: #openmp
      
      Differential Revision: https://reviews.llvm.org/D80649
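A hedged sketch of the dispatch described in this commit: a copy within one device uses a device-to-device copy, a copy between devices uses a peer-to-peer copy, and either one falls back to staging through host memory. `CopyOps` and `copyDeviceToDevice` are hypothetical; in a CUDA plugin the hooks would presumably wrap driver calls such as `cuMemcpyDtoDAsync` and `cuMemcpyPeerAsync`, but the commit message does not name the exact functions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Which path a copy ended up taking.
enum class CopyPath { SameDevice, PeerToPeer, ViaHost, Failed };

// Hypothetical hooks; in a CUDA plugin these would wrap driver calls.
struct CopyOps {
  // Same-device copy (for example cuMemcpyDtoDAsync).
  std::function<bool(void *Dst, const void *Src, std::size_t Size)> DeviceToDevice;
  // Cross-device copy (for example cuMemcpyPeerAsync).
  std::function<bool(int DstDev, void *Dst, int SrcDev, const void *Src,
                     std::size_t Size)> PeerToPeer;
  // The original fallback: device to host, then host to device.
  std::function<bool(void *Host, int SrcDev, const void *Src, std::size_t Size)>
      DeviceToHost;
  std::function<bool(int DstDev, void *Dst, const void *Host, std::size_t Size)>
      HostToDevice;
};

// Tries the native path that matches the device pair first and falls back to
// staging the data through a host buffer if that fails.
CopyPath copyDeviceToDevice(const CopyOps &Ops, int SrcDev, const void *Src,
                            int DstDev, void *Dst, std::size_t Size) {
  if (SrcDev == DstDev) {
    if (Ops.DeviceToDevice(Dst, Src, Size))
      return CopyPath::SameDevice;
  } else if (Ops.PeerToPeer(DstDev, Dst, SrcDev, Src, Size)) {
    return CopyPath::PeerToPeer;
  }
  std::vector<char> Staging(Size);
  if (Ops.DeviceToHost(Staging.data(), SrcDev, Src, Size) &&
      Ops.HostToDevice(DstDev, Dst, Staging.data(), Size))
    return CopyPath::ViaHost;
  return CopyPath::Failed;
}
```

Passing the operations in as hooks keeps the selection logic testable on a machine without a GPU.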
  8. May 12, 2020
  9. May 03, 2020
  10. Apr 13, 2020
    • [OpenMP] Refined CUDA plugin to put all CUDA operations into class · 4031bb98
      Shilei Tian authored
      Summary: The current implementation mixes everything together, so there is almost no encapsulation. In this patch, all CUDA-related operations are put into a new class, DeviceRTLTy, and only the necessary functions are exposed. In addition, all C++ code now conforms to the LLVM coding standards, while the API functions keep their C style.
      
      Reviewers: jdoerfert
      
      Reviewed By: jdoerfert
      
      Subscribers: jfb, yaxunl, guansong, openmp-commits
      
      Tags: #openmp
      
      Differential Revision: https://reviews.llvm.org/D77951
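A rough sketch of the encapsulation pattern: device state sits inside a class, and the C-style plugin entry points only forward to a single instance. Only the class name `DeviceRTLTy` comes from the commit message; `DeviceDataTy`, the two-device count and the member functions are hypothetical, and the entry-point names are modeled on the libomptarget plugin API (`__tgt_rtl_number_of_devices`, `__tgt_rtl_init_device`).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

namespace {
// Hypothetical per-device state; a real CUDA plugin would keep contexts,
// streams and loaded images here.
struct DeviceDataTy {
  bool Initialized = false;
};

// All device bookkeeping lives behind one class instead of loose globals.
class DeviceRTLTy {
  std::vector<DeviceDataTy> DeviceData;

public:
  explicit DeviceRTLTy(int NumDevices) : DeviceData(NumDevices) {}

  int getNumOfDevices() const { return static_cast<int>(DeviceData.size()); }

  bool isValidDeviceId(int DeviceId) const {
    return DeviceId >= 0 && DeviceId < getNumOfDevices();
  }

  // Returns 0 on success, assumed here to match OFFLOAD_SUCCESS.
  int initDevice(int DeviceId) {
    assert(isValidDeviceId(DeviceId) && "invalid device id");
    DeviceData[DeviceId].Initialized = true;
    return 0;
  }
};

// Hypothetical: a single plugin-wide instance, with two devices assumed.
DeviceRTLTy DeviceRTL(2);
} // namespace

// The exported plugin API stays C-style and only forwards to the class.
extern "C" std::int32_t __tgt_rtl_number_of_devices() {
  return DeviceRTL.getNumOfDevices();
}

extern "C" std::int32_t __tgt_rtl_init_device(std::int32_t DeviceId) {
  return DeviceRTL.initDevice(DeviceId);
}
```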
  11. Apr 11, 2020
    • [OpenMP] Introduce stream pool to make sure the correctness of device synchronization · feed674d
      Shilei Tian authored
      
      Summary: In the previous patch, to optimize performance, we synchronize only once
      per target region, via stream synchronization. However, in extreme situations
      the performance can be bad. Consider the following case: a task that transfers
      a huge amount of data (calling the data transfer function many times) is
      scheduled to the first stream. Then 255 very light tasks are scheduled to the
      remaining 255 streams (there are 256 streams by default). They can finish
      before the first task reaches its synchronization. Next, another very heavy
      task arrives and is scheduled to the first stream again. Now the first task
      finishes its kernel launch and calls stream synchronization. At this point the
      stream already contains two kernels, so the synchronization waits until both
      kernels finish instead of only the first task's kernel.

      In this patch, we introduce a stream pool. After each synchronization, the
      stream is returned to the pool, which ensures that each synchronization waits
      only for the operations it is expected to wait for.
      
      Reviewers: jdoerfert
      
      Reviewed By: jdoerfert
      
      Subscribers: gregrodgers, yaxunl, lildmh, guansong, openmp-commits
      
      Tags: #openmp
      
      Differential Revision: https://reviews.llvm.org/D77412
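A minimal, thread-safe sketch of a stream pool, assuming integer handles in place of real driver streams (`StreamPool` and `StreamTy` are hypothetical names). The property the commit relies on is the release discipline noted in the comments: a stream goes back to the pool only after its synchronization, so no other task's work can be queued on it in between. How the actual patch handles an exhausted pool is not stated in the message; this sketch simply creates a new stream.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical stream handle standing in for a driver stream such as CUstream.
using StreamTy = int;

// A stream is checked out for one target region and put back only after the
// region has synchronized on it, so a later synchronization on that stream
// never has to wait for another task's work.
class StreamPool {
  std::mutex Mtx;
  std::vector<StreamTy> Free;
  StreamTy NextId = 0;

public:
  explicit StreamPool(int InitialSize) {
    for (int I = 0; I < InitialSize; ++I)
      Free.push_back(NextId++);
  }

  // Hands out an idle stream. Assumption: when none is idle, this sketch
  // creates a new one rather than sharing a busy stream.
  StreamTy acquire() {
    std::lock_guard<std::mutex> Lock(Mtx);
    if (Free.empty())
      return NextId++;
    StreamTy S = Free.back();
    Free.pop_back();
    return S;
  }

  // Must be called only after the stream has been synchronized.
  void release(StreamTy S) {
    std::lock_guard<std::mutex> Lock(Mtx);
    assert(std::find(Free.begin(), Free.end(), S) == Free.end() &&
           "stream released twice");
    Free.push_back(S);
  }
};
```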
  12. Apr 10, 2020
  13. Apr 07, 2020
    • [OpenMP] Optimized stream selection by scheduling data mapping for the same target region into a same stream · 32ed2927
      Shilei Tian authored
      
      Summary:
      This patch introduces two things for offloading:
      1. Asynchronous data transfer: these functions have the suffix `_async`. Compared with their synchronous counterparts, they take one extra argument, `__tgt_async_info*`, a new struct with a single field, `void *Identifier`. This struct is used to exchange information between different asynchronous operations. It can be used for stream selection, as in this case, or for operation synchronization, which this patch also does. More uses may follow in the future.
      2. Optimization of stream selection for data mapping. The previous implementation used asynchronous device memory transfers but synchronized after each transfer. However, if kernel A needs four memory copies to the device and two memory copies back to the host, we can schedule these seven operations (four H2D, two D2H, and one kernel launch) into the same stream and synchronize only after the copies from device to host. This avoids the large overhead of synchronizing after each operation.
      
      Reviewers: jdoerfert, ye-luo
      
      Reviewed By: jdoerfert
      
      Subscribers: yaxunl, lildmh, guansong, openmp-commits
      
      Tags: #openmp
      
      Differential Revision: https://reviews.llvm.org/D77005
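A minimal sketch of the stream-selection idea, assuming a fake device whose streams only record the operations queued on them (`FakeDevice`, `FakeStream` and the method names are hypothetical, not the libomptarget API). `__tgt_async_info` follows the commit's description: a single `void *Identifier` field. The first asynchronous operation of a region picks a stream and stores it in `Identifier`; every later operation of that region reuses it, and the region synchronizes once at the end.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// As described in the commit: one opaque field shared by async operations.
struct __tgt_async_info {
  void *Identifier = nullptr;
};

// Hypothetical stream that just records the operations queued on it.
struct FakeStream {
  std::vector<std::string> Ops;
  int Syncs = 0;
};

class FakeDevice {
  std::deque<FakeStream> Streams; // deque keeps element addresses stable
  std::size_t Next = 0;

  // Pick a stream the first time an operation of this region is issued,
  // then reuse it for every later operation carrying the same async info.
  FakeStream &getStream(__tgt_async_info *AI) {
    if (!AI->Identifier) {
      AI->Identifier = &Streams[Next];
      Next = (Next + 1) % Streams.size();
    }
    return *static_cast<FakeStream *>(AI->Identifier);
  }

public:
  explicit FakeDevice(std::size_t N) : Streams(N) { assert(N > 0); }

  void dataSubmitAsync(__tgt_async_info *AI) { getStream(AI).Ops.push_back("H2D"); }
  void runTargetRegionAsync(__tgt_async_info *AI) { getStream(AI).Ops.push_back("kernel"); }
  void dataRetrieveAsync(__tgt_async_info *AI) { getStream(AI).Ops.push_back("D2H"); }

  // One synchronization for the whole region, after the D2H copies.
  // A real runtime would wait on the driver stream here.
  void synchronize(__tgt_async_info *AI) {
    if (AI->Identifier)
      getStream(AI).Syncs++;
  }

  const FakeStream &stream(std::size_t I) const { return Streams[I]; }
};
```

With four H2D copies, one kernel launch and two D2H copies issued through the same `__tgt_async_info`, all seven operations land on one stream and a single synchronization covers them.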
  14. Mar 26, 2020
  15. Mar 20, 2020
  16. Feb 12, 2020
  17. Jan 07, 2020
  18. Nov 28, 2019
  19. Nov 04, 2019
  20. Sep 27, 2019
  21. Aug 05, 2019
  22. Jun 19, 2019
  23. Jun 04, 2019
  24. Jan 19, 2019
    • Update more file headers across all of the LLVM projects in the monorepo · 57b08b09
      Chandler Carruth authored
      to reflect the new license. These used slightly different spellings that
      defeated my regular expressions.
      
      We understand that people may be surprised that we're moving the header
      entirely to discuss the new license. We checked this carefully with the
      Foundation's lawyer and we believe this is the correct approach.
      
      Essentially, all code in the project is now made available by the LLVM
      project under our new license, so you will see that the license headers
      include that license only. Some of our contributors have contributed
      code under our old license, and accordingly, we have retained a copy of
      our old license notice in the top-level files in each project and
      repository.
      
      llvm-svn: 351648
  25. Sep 04, 2018
  26. Jul 18, 2018
    • [libomptarget] Also support several images for elf · bb869f42
      Joachim Protze authored
      In revision r336569 (D49036), libomptarget support for multiple NVIDIA images
      was fixed for the case where a target region resides inside one or more
      libraries as well as in the compiled application. However, the issue is still
      present for ELF images. This fix adds support for multiple ELF images as well.
      
      Patch by Jannis Klinkenberg
      
      Reviewers: protze.joachim, ABataev, grokos
      
      Reviewed By: protze.joachim, ABataev, grokos
      
      Subscribers: openmp-commits
      
      Differential Revision: https://reviews.llvm.org/D49418
      
      llvm-svn: 337355
  27. Jul 09, 2018
  28. May 25, 2018
  29. May 04, 2018
  30. Jan 30, 2018
  31. Nov 29, 2017
  32. Aug 14, 2017
  33. Jun 03, 2017
  34. May 10, 2017
    • [OpenMP] Changes in the plugin interface · 1546d319
      George Rokos authored
      This patch changes the plugin interface so that:
      1) future plugins can take advantage of systems with shared CPU/device storage
      2) instead of using base addresses, target regions are launched by providing target addresses and base offsets explicitly.
      
      Differential revision: https://reviews.llvm.org/D33028
       
      
      llvm-svn: 302663
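A sketch of how a plugin can rebuild kernel arguments from explicit target addresses and offsets, assuming the launch entry point receives them as parallel arrays (as `__tgt_rtl_run_target_region` does with its `void **` and `ptrdiff_t *` parameters). `computeKernelArgs` is a hypothetical helper; the offset is added using `std::intptr_t` arithmetic so that no pointer arithmetic is performed outside an object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Each kernel argument pointer is the mapped target address plus its offset.
std::vector<void *> computeKernelArgs(void **TgtArgs,
                                      const std::ptrdiff_t *TgtOffsets,
                                      std::int32_t ArgNum) {
  assert(ArgNum >= 0 && "negative argument count");
  std::vector<void *> Ptrs(ArgNum);
  for (std::int32_t I = 0; I < ArgNum; ++I)
    Ptrs[I] = reinterpret_cast<void *>(
        reinterpret_cast<std::intptr_t>(TgtArgs[I]) + TgtOffsets[I]);
  return Ptrs;
}
```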
  35. Apr 25, 2017
  36. Mar 22, 2017