  1. Jan 13, 2022
  2. Dec 27, 2021
    • [OpenMP][FIX] Change globalization alignment to 16 · 7cdaa5a9
      Joseph Huber authored
      This patch changes the default alignment from 8 to 16, and encodes this
      information in the `__kmpc_alloc_shared` runtime call to communicate it
      to the HeapToStack pass. The previous alignment of 8 was not sufficient
      for the maximum size of primitive types on 64-bit systems, and needed to
      be increased. This reduces the amount of space available in the data
      sharing stack, so this implementation will need to be improved later to
      include the alignment requirements in the allocation call, and use them
      properly in the data sharing stack in the runtime.
      
      Depends on D115888
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D115971
      7cdaa5a9
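      The space cost of the larger alignment can be illustrated with a minimal
      sketch of a bump-style stack that rounds every allocation up to the
      alignment. `alignUp` and `SharedStack` are hypothetical names used only
      for illustration; they are not the runtime's data sharing stack.

      ```cpp
      #include <cassert>
      #include <cstddef>

      // Round Size up to the next multiple of Alignment (a power of two).
      constexpr std::size_t alignUp(std::size_t Size, std::size_t Alignment) {
        return (Size + Alignment - 1) & ~(Alignment - 1);
      }

      // Hypothetical bump allocator standing in for a data sharing stack.
      struct SharedStack {
        std::size_t Capacity;
        std::size_t Used;

        explicit SharedStack(std::size_t Cap) : Capacity(Cap), Used(0) {}

        // Returns the offset of the new allocation, or Capacity if it does
        // not fit in the remaining space.
        std::size_t push(std::size_t Size, std::size_t Alignment) {
          assert((Alignment & (Alignment - 1)) == 0 &&
                 "alignment must be a power of two");
          std::size_t Aligned = alignUp(Size, Alignment);
          if (Used + Aligned > Capacity)
            return Capacity;
          std::size_t Offset = Used;
          Used += Aligned;
          return Offset;
        }
      };
      ```

      Under these assumptions, four 4-byte values occupy 32 bytes with an
      alignment of 8 but 64 bytes with an alignment of 16.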
  3. Dec 09, 2021
    • [OpenMP][FIX] Pass the num_threads value directly to parallel_51 · bc9c4d72
      Joseph Huber authored
      The problem with the old scheme is that we would need to keep track of
      the "next region" and reset the num_threads value after it. The new RT
      doesn't do that, and an assertion is triggered. The old RT doesn't do it
      either; I haven't tested it, but I assume a num_threads clause might
      "accidentally" impact multiple parallel regions. Further, in SPMD mode
      num_threads was simply ignored, for some reason beyond me.
      
      In any case, parallel_51 is designed to take the clause value directly,
      so let's do that instead.
      
      Reviewed By: tianshilei1992
      
      Differential Revision: https://reviews.llvm.org/D113623
      bc9c4d72
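      The difference between the two schemes can be sketched as follows. All
      names here (`TeamState`, `pushNumThreads`, `parallelOld`, `parallelNew`)
      are hypothetical, and the use of -1 for "no clause" is an assumption of
      this sketch rather than a statement about the runtime's interface.

      ```cpp
      #include <cassert>

      // Hypothetical stand-in for per-team runtime state.
      struct TeamState {
        int MaxThreads;        // upper bound on threads in a region
        int PendingNumThreads; // old scheme: value stored ahead of a region
      };

      // Old scheme: the clause value is stored first and read by the next
      // region. Unless something resets it, later regions see it as well.
      inline void pushNumThreads(TeamState &S, int N) { S.PendingNumThreads = N; }

      inline int parallelOld(const TeamState &S) {
        if (S.PendingNumThreads > 0 && S.PendingNumThreads < S.MaxThreads)
          return S.PendingNumThreads;
        return S.MaxThreads;
      }

      // New scheme: the clause value travels with the call, so there is no
      // state to reset afterwards. -1 means "no clause" in this sketch.
      inline int parallelNew(const TeamState &S, int NumThreadsClause) {
        assert(NumThreadsClause == -1 || NumThreadsClause > 0);
        if (NumThreadsClause > 0 && NumThreadsClause < S.MaxThreads)
          return NumThreadsClause;
        return S.MaxThreads;
      }
      ```

      In the old-style sketch a second region started without a clause still
      sees the stored value, which is the kind of accidental carry-over the
      message speculates about.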
  4. Nov 30, 2021
  5. Nov 16, 2021
    • [OpenMP] Fix initializer not working on AMDGPU · 374cd0fb
      Joseph Huber authored
      The RAII class used for debugging RTL entry used a shared variable to
      keep track of the current depth. This used a global initializer, which
      isn't supported on AMDGPU. This patch removes the initializer and
      instead sets it to zero when the state is initialized in the runtime.
      
      Reviewed By: jdoerfert, JonChesterfield
      
      Differential Revision: https://reviews.llvm.org/D113963
      374cd0fb
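      The pattern described above, a depth counter without an initializer that
      is zeroed during state initialization, might look roughly like this.
      `EntryDepth`, `initState`, and `DebugEntryRAII` are hypothetical names
      for illustration, not the runtime's actual identifiers.

      ```cpp
      #include <cassert>

      namespace sketch {

      // Shared depth counter, deliberately declared without an initializer.
      int EntryDepth;

      // Called once when the runtime state is set up; takes over the job the
      // global initializer used to do.
      void initState() { EntryDepth = 0; }

      // RAII helper that records the nesting depth on entry and restores the
      // counter on exit.
      class DebugEntryRAII {
      public:
        DebugEntryRAII() : Depth(EntryDepth++) {}
        ~DebugEntryRAII() {
          --EntryDepth;
          assert(EntryDepth == Depth && "unbalanced entry/exit");
        }
        int depth() const { return Depth; }

      private:
        int Depth;
      };

      } // namespace sketch
      ```

      On a host compiler the variable would be zero-initialized regardless, so
      the explicit `initState` call only mirrors the pattern; it matters on
      targets where global initializers are not available.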
  6. Nov 12, 2021
  7. Nov 10, 2021
    • [OpenMP] Lower printf to __llvm_omp_vprintf · 27177b82
      Jon Chesterfield authored
      Extension of D112504. Lower amdgpu printf to `__llvm_omp_vprintf`
      which takes the same const char*, void* arguments as cuda vprintf and also
      passes the size of the void* alloca which will be needed by a non-stub
      implementation of `__llvm_omp_vprintf` for amdgpu.
      
      This removes the amdgpu link error on any printf in a target region in favour
      of silently compiling code that doesn't print anything to stdout.
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D112680
      27177b82
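      As a rough illustration of the calling shape described above (a format
      string, a pointer to packed arguments, and the size of that buffer),
      consider the following sketch. `PackedArgs`, `packArg`, and
      `stubVprintf` are hypothetical names, the buffer is packed without
      padding for simplicity, and the stub's return value is arbitrary.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <cstring>

      // Hypothetical argument buffer; a real lowering would choose its own layout.
      struct PackedArgs {
        unsigned char Data[64];
        std::uint32_t Size;
      };

      // Append one argument to the buffer and grow the recorded size.
      template <typename T> void packArg(PackedArgs &Buf, const T &Value) {
        assert(Buf.Size + sizeof(T) <= sizeof(Buf.Data) && "buffer overflow");
        std::memcpy(Buf.Data + Buf.Size, &Value, sizeof(T));
        Buf.Size += static_cast<std::uint32_t>(sizeof(T));
      }

      // Stub in the spirit of the message: it accepts the arguments and
      // prints nothing. A non-stub version would need Size to copy the buffer.
      inline int stubVprintf(const char *Format, void *Args, std::uint32_t Size) {
        (void)Format;
        (void)Args;
        (void)Size;
        return 0;
      }
      ```

      A call such as `printf("%d %f\n", 42, 3.5)` would, in this sketch, pack
      an `int` and a `double` and pass the combined size alongside the buffer.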
  8. Nov 09, 2021
  9. Nov 08, 2021
  10. Nov 04, 2021
  11. Nov 03, 2021
  12. Oct 30, 2021
    • [OpenMP][DeviceRTL] Fixed an issue that causes hang in SU3 · 025f5492
      Shilei Tian authored
      The synchronization at the end of a parallel region does not guarantee that
      all threads have exited the scope. As a result, the assertions right after it
      might be hit, and the `state::assumeInitialState(IsSPMD)` in
      `__kmpc_target_deinit` may not hold either. We can either add a synchronization
      right after the parallel region, or remove the assertions and assumptions. Here
      we choose the first option, as those assertions and assumptions can help
      optimizations.
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D112861
      025f5492
  13. Oct 29, 2021
  14. Oct 28, 2021
  15. Oct 26, 2021
  16. Oct 21, 2021
    • [libomptarget][DeviceRTL] Generalise and simplify cmakelists · a602c2b5
      Jon Chesterfield authored
      Step towards building the DeviceRTL for amdgpu.
      
      Mostly replaces cuda-specific toolchain finding logic with the
      generic logic currently found in the amdgpu deviceRTL cmake. Also
      deletes dead code and changes the default to build on systems
      without cuda installed, as the library doesn't use cuda and the
      amdgpu-only systems generally won't have cuda installed.
      
      Reviewed By: Meinersbur
      
      Differential Revision: https://reviews.llvm.org/D111983
      a602c2b5
  17. Oct 19, 2021
  18. Oct 09, 2021
  19. Oct 08, 2021
    • [Libomptarget] Add an external interface to dynamic shared memory · 208f9005
      Joseph Huber authored
      This patch adds an external interface to access the dynamic shared
      memory buffer in the device runtime. The function introduced is
      `llvm_omp_get_dynamic_shared`. This includes a host-side
      definition that only returns a null pointer so that it can be used when
      host-fallback is enabled without crashing. Support for dynamic shared
      memory was also ported to the old device runtime.
      
      Reviewed By: JonChesterfield
      
      Differential Revision: https://reviews.llvm.org/D110957
      208f9005
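      Because the host-side definition returns a null pointer, callers that
      may run under host fallback need to handle that case. A minimal sketch
      of that handling is shown here; `hostGetDynamicShared` is a hypothetical
      stand-in for the host definition and `selectScratch` is an illustrative
      helper, neither being part of the patch.

      ```cpp
      #include <cassert>

      // Hypothetical stand-in for the host-side definition, which per the
      // message above only returns a null pointer.
      inline void *hostGetDynamicShared() { return nullptr; }

      // Use the dynamic buffer when one is available, otherwise fall back to
      // storage provided by the caller.
      inline int *selectScratch(void *Dynamic, int *Fallback) {
        assert(Fallback != nullptr && "caller must provide fallback storage");
        return Dynamic ? static_cast<int *>(Dynamic) : Fallback;
      }
      ```

      On the host path the helper returns the fallback storage, so code that
      writes through the result does not dereference a null pointer.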