  1. Jan 13, 2022
  2. Jan 10, 2022
  3. Jan 06, 2022
    • [OpenMP][Offloading] Fixed a crash caused by dereferencing nullptr · aab62aab
      Shilei Tian authored
      In function `DeviceTy::getTargetPointer`, `Entry` could be `nullptr` because of
      a zero-length array section. We need to check that it is a valid iterator before
      using it.
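
As a minimal sketch of the kind of guard this fix describes (not the actual `DeviceTy::getTargetPointer` code), the C++ below checks that a lookup returned a valid iterator before dereferencing it. `MappingTable`, `MapEntry`, and `bumpRefCount` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical mapping-table entry; not the libomptarget type.
struct MapEntry {
  std::uint64_t RefCount = 0;
};

using MappingTable = std::map<const void *, MapEntry>;

// Increments the reference count for HostPtr if it is mapped.
// Returns false instead of dereferencing an invalid iterator when the lookup
// finds nothing, as could happen for a zero-length array section.
inline bool bumpRefCount(MappingTable &Table, const void *HostPtr) {
  auto It = Table.find(HostPtr);
  if (It == Table.end())
    return false; // no entry: nothing to dereference
  ++It->second.RefCount;
  return true;
}
```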
      
      Reviewed By: ronlieb
      
      Differential Revision: https://reviews.llvm.org/D116716
      aab62aab
    • [OpenMP][Offloading] Fixed data race in libomptarget caused by async data movement · 9584c6fa
      Shilei Tian authored
      Asynchronous data movement can cause a data race on targets that support it.
      Details can be found in [1]. This patch tries to fix this problem by attaching
      an event to the corresponding entry in the data mapping table. Here are the details.
      
      For each issued data movement, a new event is generated and returned to `libomptarget`
      by calling `createEvent`. The event will be attached to the corresponding mapping table
      entry.
      
      For each data mapping lookup, if there is no need for a data movement, the
      attached event has to be inserted into the queue to guarantee that all following
      operations in the queue are executed only after the event is fulfilled.
      
      This design avoids synchronization on the host side.
      
      Note that we are using CUDA terminology here. A similar mechanism is assumed to
      be supported by other targets. Even if a target doesn't support it, it can
      easily be implemented with the following fallback:
      - `Event` can be any kind of flag that has at least two states, 0 and 1.
      - `waitEvent` can simply busy-loop while `Event` is still 0.
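
The fallback described in the two bullets above could look roughly like the following sketch. It assumes a plain atomic flag is acceptable as the event; `FlagEvent` and `fulfillEvent` are hypothetical names, and the `waitEvent` signature shown here is an illustration, not the plugin interface.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical fallback event: a flag with two states,
// 0 (pending) and 1 (fulfilled).
struct FlagEvent {
  std::atomic<int> State{0};
};

// Marks the event as fulfilled, e.g. once the data movement has completed.
inline void fulfillEvent(FlagEvent &E) {
  E.State.store(1, std::memory_order_release);
}

// Busy-loops while the event is still 0, yielding between polls.
inline void waitEvent(const FlagEvent &E) {
  while (E.State.load(std::memory_order_acquire) == 0)
    std::this_thread::yield();
}
```

The release/acquire pair is one way to make writes issued before `fulfillEvent` visible to the thread that returns from `waitEvent`; a real target may need a different mechanism.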
      
      My local test shows that `bug49334.cpp` can pass.
      
      Reference:
      [1] https://bugs.llvm.org/show_bug.cgi?id=49940
      
      Reviewed By: grokos, JonChesterfield, ye-luo
      
      Differential Revision: https://reviews.llvm.org/D104418
      9584c6fa
  4. Dec 30, 2021
  5. Dec 29, 2021
  6. Dec 28, 2021
  7. Dec 27, 2021
    • [OpenMP][FIX] Change globalization alignment to 16 · 7cdaa5a9
      Joseph Huber authored
      This patch changes the default alignment from 8 to 16, and encodes this
      information in the `__kmpc_alloc_shared` runtime call to communicate it
      to the HeapToStack pass. The previous alignment of 8 was not sufficient
      for the maximum size of primitive types on 64-bit systems, and needs to
      be increased. This reduces the amount of space available in the data
      sharing stack, so this implementation will need to be improved later to
      include the alignment requirements in the allocation call, and use it
      properly in the data sharing stack in the runtime.
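
To illustrate why a larger alignment reduces the usable space in a bump-style stack, here is a small sketch that pads allocation sizes to an alignment. `alignUp` and `paddedTotal` are hypothetical helpers, not part of the OpenMP device runtime, and the way the data sharing stack actually lays out allocations is an assumption here.

```cpp
#include <cassert>
#include <cstddef>

// Rounds Size up to the next multiple of Alignment.
// Alignment is assumed to be a power of two.
constexpr std::size_t alignUp(std::size_t Size, std::size_t Alignment) {
  return (Size + Alignment - 1) & ~(Alignment - 1);
}

// Total bytes consumed when each of the N allocations is padded to Alignment.
inline std::size_t paddedTotal(const std::size_t *Sizes, std::size_t N,
                               std::size_t Alignment) {
  std::size_t Total = 0;
  for (std::size_t I = 0; I < N; ++I)
    Total += alignUp(Sizes[I], Alignment);
  return Total;
}
```

Under these assumptions, three allocations of 4, 8, and 24 bytes occupy 40 bytes when padded to 8 but 64 bytes when padded to 16.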
      
      Depends on D115888
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D115971
      7cdaa5a9
    • [OpenMP][Plugin] Introduce generic resource pool · a697a0a4
      Shilei Tian authored
      Currently, CUDA streams are managed by `StreamManagerTy`, which works very well. We
      now need some resources, such as CUDA streams and events, to be held by
      `libomptarget`. It is always good to buffer those resources. More importantly,
      given the way that `libomptarget` and plugins are connected, we cannot be sure
      that plugins are still alive when `libomptarget` is destroyed. As a result,
      resources held by `libomptarget` might not be released correctly. We therefore
      need unified management of all the resources that can be shared between
      `libomptarget` and plugins.
      
      `ResourcePoolTy` is designed to manage one type of resource for one device.
      It works with an allocator that is expected to provide `create` and
      `destroy`. This way, when the plugin is destroyed, we can be sure that
      all resources allocated from the native runtime library are released correctly,
      regardless of whether `libomptarget` has started its own destruction.
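
The pool-plus-allocator idea can be sketched as follows. This is not the `ResourcePoolTy` from the patch: `ResourcePoolSketch`, `acquire`, `release`, and `CountingAllocator` are hypothetical, and the only detail taken from the description is that the allocator provides `create` and `destroy`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical allocator used for illustration; a real one would wrap the
// native runtime's calls for creating and destroying a stream or event.
struct CountingAllocator {
  int Created = 0;
  int Destroyed = 0;
  int create() { return Created++; }
  void destroy(int) { ++Destroyed; }
};

// Pools one type of resource and destroys everything it created when the
// pool itself is destroyed, whether or not a resource was returned.
template <typename T, typename AllocatorT> class ResourcePoolSketch {
  AllocatorT &Allocator;
  std::vector<T> Free; // resources ready to be handed out again
  std::vector<T> All;  // every resource ever created, kept for cleanup

public:
  explicit ResourcePoolSketch(AllocatorT &A) : Allocator(A) {}
  ResourcePoolSketch(const ResourcePoolSketch &) = delete;
  ResourcePoolSketch &operator=(const ResourcePoolSketch &) = delete;

  ~ResourcePoolSketch() {
    for (T &R : All)
      Allocator.destroy(R);
  }

  // Reuses a free resource if one exists, otherwise creates a new one.
  T acquire() {
    if (Free.empty()) {
      T R = Allocator.create();
      All.push_back(R);
      return R;
    }
    T R = Free.back();
    Free.pop_back();
    return R;
  }

  // Returns a resource to the pool for later reuse.
  void release(T R) { Free.push_back(R); }
};
```

Tracking every created resource in the pool, rather than only the free ones, is one way to let the owner of the pool clean up without relying on users to return what they hold.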
      
      Reviewed By: ye-luo
      
      Differential Revision: https://reviews.llvm.org/D111954
      a697a0a4
  8. Dec 17, 2021
  9. Dec 15, 2021
  10. Dec 10, 2021
    • Revert "[OpenMP] Avoid costly shadow map traversals whenever possible" · 8425bde8
      Joseph Huber authored
      This reverts commit 7c8f4e7b.
      Fails a few OpenMP tests, causes a few updates to segfault.
      8425bde8
    • [OpenMP] Avoid costly shadow map traversals whenever possible · 7c8f4e7b
      Joseph Huber authored
      In the OpenMC app we saw `omp target update` spending an awful lot of
      time in the shadow map traversal without ever doing any update there.
      There are two cases that allow us to avoid the traversal completely.
      The simplest is that small updates cannot (reasonably) contain
      an attached pointer part. The other case requires tracking in the
      mapping table whether an entry might contain an attached pointer as part.
      Given that there is a single location where shadow map entries are created,
      the latter is actually fairly easy as well.
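
The two shortcuts can be sketched as a single predicate. This is an illustration of the idea only: `TableEntry`, `MayContainAttachedPointer`, and `needsShadowMapTraversal` are hypothetical names, and using `sizeof(void *)` as the threshold for a "small" update is an assumption rather than something stated in the commit.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical mapping-table entry. The flag would be set at the single
// place where shadow map entries are created.
struct TableEntry {
  std::size_t Size;
  bool MayContainAttachedPointer;
};

// Returns true if an update of UpdateSize bytes has to walk the shadow map.
inline bool needsShadowMapTraversal(const TableEntry &E,
                                    std::size_t UpdateSize) {
  // Case 1: an update smaller than a pointer cannot contain an attached
  // pointer (assumed threshold).
  if (UpdateSize < sizeof(void *))
    return false;
  // Case 2: the entry was never marked as possibly holding an attached pointer.
  return E.MayContainAttachedPointer;
}
```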
      
      Reviewed By: grokos
      
      Differential Revision: https://reviews.llvm.org/D113124
      7c8f4e7b
    • [OpenMP] Part 2 of At present, amdgpu plugin merges both asynchronous · 28309c54
      Carlo Bertolli authored
      and synchronous kernel launch implementations into a single
      synchronous version.  This patch prepares the plugin for asynchronous
      implementation by:
      
      - Privatizing the actual kernel launch code (valid in both cases) into
        an anonymous namespace base function (submitted at D115267)

      - Separating the control flow paths of the asynchronous and synchronous
        kernel launch functions (this diff)
      
      Reviewed By: JonChesterfield
      
      Differential Revision: https://reviews.llvm.org/D115273
      28309c54
    • [OpenMP] Add test for custom state machine if have reduction · 51168ce8
      Joel E. Denny authored
      D113602 broke the custom state machine when a reduction is present, as
      revealed by the reproducer this patch adds to the test suite.  In that
      case, openmp-opts changes the return value to undef in
      `__kmpc_get_warp_size` (which the custom state machine calls as of
      D113602).  Later optimizations then optimize away the custom state
      machine code as if all threads are outside the thread block, so the
      target region does not execute.  D114802 fixed that but didn't add a
      reproducer.
      
      This patch also adds a `__OMP_RTL_ATTRS` entry for
      `__kmpc_get_warp_size` to OMPKinds.def, which D113602 missed.  This
      change does not seem to have any impact on the reduction problem.
      
      Reviewed By: JonChesterfield, jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D113824
      51168ce8
  11. Dec 09, 2021
    • [OpenMP][FIX] Pass the num_threads value directly to parallel_51 · bc9c4d72
      Joseph Huber authored
      The problem with the old scheme is that we would need to keep track of
      the "next region" and reset the num_threads value after it. The new RT
      doesn't do it and an assertion is triggered. The old RT doesn't do it
      either, I haven't tested it but I assume a num_threads clause might
      impact multiple parallel regions "accidentally". Further, in SPMD mode
      num_threads was simply ignored, for some reason beyond me.
      
      In any case, parallel_51 is designed to take the clause value directly,
      so let's do that instead.
      
      Reviewed By: tianshilei1992
      
      Differential Revision: https://reviews.llvm.org/D113623
      bc9c4d72
    • [OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version · cc8dc5e2
      Carlo Bertolli authored
      Prepare the amdgpu plugin for asynchronous implementation. This patch switches to using the HSA API for asynchronous memory copy.
      Moving away from hsa_memory_copy means that the plugin is responsible for locking/unlocking host memory pointers.
      
      Reviewed By: JonChesterfield
      
      Differential Revision: https://reviews.llvm.org/D115279
      cc8dc5e2
  12. Dec 08, 2021
  13. Dec 07, 2021
  14. Dec 06, 2021
  15. Dec 04, 2021
  16. Dec 02, 2021
  17. Nov 30, 2021
  18. Nov 29, 2021
    • OpenMP: Correctly query location for amdgpu-arch · 935abeaa
      Matt Arsenault authored
      This was trying to figure out the build path for amdgpu-arch, and
      making assumptions about where it is which were not working on my
      system. Whether a standalone build or not, we should have a proper
      imported target to get the location from.
      935abeaa
  19. Nov 23, 2021
    • [openmp][amdgpu] Make plugin robust to presence of explicit implicit arguments · ae5348a3
      Jon Chesterfield authored
      OpenMP (compiler) does not currently request any implicit kernel
      arguments. OpenMP (runtime) allocates and initialises a reasonable guess at
      the implicit kernel arguments anyway.
      
      This change makes the plugin check the number of explicit arguments, instead
      of all arguments, and puts the pointer to hostcall buffer in both the current
      location and at the offset expected when implicit arguments are added to the
      metadata by D113538.
      
      This is intended to keep things running while fixing the oversight in the
      compiler (in D113538). Once that patch lands, and a following one marks
      openmp kernels that use printf such that the backend emits an args element
      with the right type (instead of hidden_node), the over-allocation can be
      removed and the hardcoded 8*e+3 offset replaced with one read from the
      .offset of the corresponding metadata element.
      
      Reviewed By: estewart08
      
      Differential Revision: https://reviews.llvm.org/D114274
      ae5348a3
  20. Nov 20, 2021
  21. Nov 19, 2021