  1. Jul 22, 2021
    • [libomptarget][amdgpu] Implement dlopen of libhsa · 1a965706
      Jon Chesterfield authored
      AMDGPU plugin equivalent of D95155, build without HSA installed locally
      
      Compiles a new file, plugins/amdgpu/dynamic_hsa/hsa.cpp, to an object file that
      exposes the same symbols that the plugin presently uses from hsa. The object
      file contains dlopen of hsa and cached dlsym calls. Also provides header files
      corresponding to the subset that is used.
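
      The dlopen/cached-dlsym pattern can be sketched as follows, assuming a POSIX `<dlfcn.h>`. The class and function names (`LazyLibrary`, `example_hsa_init`, `example_status_t`) and the library file name `libhsa-runtime64.so` are assumptions for illustration. This is not the code in dynamic_hsa/hsa.cpp.

      ```cpp
      #include <cassert>
      #include <dlfcn.h>
      
      // Opens a shared library on first use and keeps the handle, so the
      // program has no link-time dependency on that library.
      class LazyLibrary {
      public:
        explicit LazyLibrary(const char *Path) : Path(Path) {}
      
        // Returns the address of Name, or nullptr if the library or the symbol
        // is missing. dlopen is attempted at most once per LazyLibrary.
        void *lookup(const char *Name) {
          assert(Name != nullptr);
          if (!Tried) {
            Tried = true;
            Handle = dlopen(Path, RTLD_NOW | RTLD_LOCAL);
          }
          return Handle ? dlsym(Handle, Name) : nullptr;
        }
      
        bool loaded() const { return Handle != nullptr; }
      
      private:
        const char *Path;
        void *Handle = nullptr;
        bool Tried = false;
      };
      
      // Hypothetical status type standing in for hsa_status_t from hsa.h.
      enum example_status_t { EXAMPLE_STATUS_SUCCESS = 0, EXAMPLE_STATUS_ERROR = 1 };
      
      // Assumed library file name; it may differ between installations.
      static LazyLibrary HsaLib("libhsa-runtime64.so");
      
      // Hypothetical wrapper: resolves "hsa_init" on the first call and caches
      // the pointer in a function-local static, so later calls skip dlsym.
      example_status_t example_hsa_init() {
        using fn_t = example_status_t (*)();
        static const fn_t Real = reinterpret_cast<fn_t>(HsaLib.lookup("hsa_init"));
        return Real ? Real() : EXAMPLE_STATUS_ERROR;
      }
      ```

      In the real object file, the wrapper would carry the exact name the plugin calls (`hsa_init`), so the plugin's code is unchanged. When the library is absent, the call returns an error instead of the plugin failing to load.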
      
      This is behind a feature flag, LIBOMPTARGET_FORCE_DLOPEN_LIBHSA, default off.
      That allows developers to build against the dlopen/dlsym implementation, e.g.
      while testing this mode.
      
      Enabling it by default will cause this plugin to build on a wider variety of
      machines than it does at present, so it may break some CI builds. That risk can
      be minimised by reviewing the header dependencies of the library and ensuring
      it doesn't use any libraries that are not already used by libomptarget.
      
      The implementation is kept separate from enabling it by default in case the
      latter needs to be rolled back once wider CI results come in.
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D106559
    • [libomptarget][nfc] Improve static assert message in dlwrap · 6e9cd3e9
      Jon Chesterfield authored
      Revision of D102858. Raise the dlwrap arity argument to a template argument
      so that the actual value appears in the error message, e.g. '2 == 1' instead of
      '2 == trait<>::nargs'.
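
      A reduced sketch of the idea follows. It assumes a function-pointer trait like the `trait<...>::nargs` seen in the diagnostics, and `exampleInit` is a hypothetical stand-in for `cuInit`; this is not the dlwrap.h source. Both counts become template parameters, so a failing assertion is reported against the instantiation `verboseAssert<Requested, Required>`, which carries the two numbers.

      ```cpp
      #include <cstddef>
      
      // Counts the parameters of a function-pointer type.
      template <typename F> struct trait;
      template <typename R, typename... Args> struct trait<R (*)(Args...)> {
        static constexpr std::size_t nargs = sizeof...(Args);
      };
      
      // Requested and Required are template arguments, so when the assertion
      // fails the compiler prints their values as part of the instantiation
      // instead of an unevaluated trait<...>::nargs expression.
      template <std::size_t Requested, std::size_t Required>
      constexpr bool verboseAssert() {
        static_assert(Requested == Required, "Arity Error");
        return true;
      }
      
      // Hypothetical stand-in for a driver entry point such as cuInit.
      int exampleInit(unsigned int Flags) { return static_cast<int>(Flags); }
      
      // Compiles, because exampleInit takes exactly one argument.
      static_assert(verboseAssert<1, trait<decltype(&exampleInit)>::nargs>(), "");
      ```

      Changing the `1` in the last line to `2` should make the build fail with a note naming something like `verboseAssert<2UL, 1UL>`, similar to the clang output shown for this commit. The exact wording depends on the compiler.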
      
      Arity higher than it should be:
      Before diff
      ```
      $/plugins/cuda/dynamic_cuda/cuda.cpp:23:1: error:
            static_assert failed due to requirement '2 == trait<cudaError_enum (*)(unsigned int)>::nargs'
            "Arity Error"
      DLWRAP_INTERNAL(cuInit, 2);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~
      ...
      $/include/dlwrap.h:166:3: note: expanded from macro
            'DLWRAP_COMMON'
        static_assert(ARITY == trait<decltype(&SYMBOL)>::nargs, "Arity Error");      \
      ```
      
      After diff
      ```
      In file included from $/plugins/cuda/dynamic_cuda/cuda.cpp:16:
      $/include/dlwrap.h:131:3: error: static_assert failed due to
            requirement '2UL == 1UL' "Arity Error"
        static_assert(Requested == Required, "Arity Error");
        ^             ~~~~~~~~~~~~~~~~~~~~~
      $/plugins/cuda/dynamic_cuda/cuda.cpp:23:1: note: in
            instantiation of function template specialization 'dlwrap::verboseAssert<2UL, 1UL>' requested
            here
      DLWRAP_INTERNAL(cuInit, 2);
      ```
      
      Arity lower than it should be:
      Before diff
      ```
      $/plugins/cuda/dynamic_cuda/cuda.cpp:131:10: error: no
            matching function for call to 'dlwrap_cuInit'
        return dlwrap_cuInit(X);
               ^~~~~~~~~~~~~
      $/plugins/cuda/dynamic_cuda/cuda.cpp:23:1: note: candidate
            function not viable: requires 0 arguments, but 1 was provided
      DLWRAP_INTERNAL(cuInit, 0);
      ```
      
      After diff
      ```
      In file included from $/plugins/cuda/dynamic_cuda/cuda.cpp:16:
      $/include/dlwrap.h:131:3: error: static_assert failed due to
            requirement '0UL == 1UL' "Arity Error"
        static_assert(Requested == Required, "Arity Error");
        ^             ~~~~~~~~~~~~~~~~~~~~~
      $/plugins/cuda/dynamic_cuda/cuda.cpp:23:1: note: in
            instantiation of function template specialization 'dlwrap::verboseAssert<0UL, 1UL>' requested
            here
      DLWRAP_INTERNAL(cuInit, 0);
      ```
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D106543
    • [OpenMP] Fix warnings for uninitialized block counts · a158d366
      Joseph Huber authored
      Summary:
      Fixes some warnings about uninitialized block counts when the execution mode
      is not recognized. This shouldn't happen in practice because the execution
      mode is checked when it's read from the device.
    • [libomptarget][amdgpu][nfc] Drop dead signal pool setup · dc1f6f8b
      Jon Chesterfield authored
      This class is instantiated once in rtl.cpp before hsa_init is
      called. The hsa_signal_create call therefore fails, leaving the pool empty.
      
      This signal pool is a legacy from ATMI, where it was constructed after hsa_init.
      Moving the state into the rtl.cpp global class disabled the initial population
      of the pool without noticeably changing performance. A recheck with a fix that
      allocates the signals after hsa_init also showed no noticeable change in
      performance.
      
      This patch therefore drops the initialisation. The only observable difference
      from main is the removal of a DEBUG_PRINT statement that would have reported
      an initial pool size of zero.
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D106515
    • [OpenMP] Add an option to disable function internalization · 4a668604
      Joseph Huber authored
      Function internalization can sometimes occur in situations where we want to
      keep the call sites intact. This patch adds an option to disable function
      internalization and prevents the device runtime from being internalized while
      creating the bitcode library.
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D106438
    • [Libomptarget] Introduce new main thread ID runtime function · 1684012a
      Joseph Huber authored
      This patch introduces `__kmpc_is_generic_main_thread_id`, which splits the old
      comparison into its own runtime function. This lets that part be folded
      independently, so that when both it and `is_spmd_mode` are folded, the final
      function is folded as well.
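
      A host-side sketch of why the split helps uses simplified stand-ins for the device queries. `isSPMDMode`, `getMainThreadId`, `isGenericMainThreadId`, and `isGenericMainThread` are hypothetical names that mirror the runtime functions, not their real signatures.

      ```cpp
      // Hypothetical runtime state; on a GPU these would be read from the
      // execution mode and the hardware thread id.
      static bool SPMDMode = false;
      static unsigned MainThreadId = 96; // illustrative value only
      
      bool isSPMDMode() { return SPMDMode; }
      unsigned getMainThreadId() { return MainThreadId; }
      
      // The comparison on its own: a compiler that knows Tid and the main
      // thread id can fold this call without knowing the execution mode.
      bool isGenericMainThreadId(unsigned Tid) { return Tid == getMainThreadId(); }
      
      // The combined check. If isSPMDMode() folds to true, the result is false
      // regardless of Tid; if both calls fold, the whole call folds.
      bool isGenericMainThread(unsigned Tid) {
        return !isSPMDMode() && isGenericMainThreadId(Tid);
      }
      ```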
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D106437
    • [OpenMP] Add new execution mode for SPMD execution with Generic semantics · 7d576392
      Joseph Huber authored
      Qualified kernels can be transformed from generic mode to SPMD mode by an
      optimization in OpenMPOpt. This patch introduces a new execution mode to
      mark kernels that have been transformed from generic mode to SPMD mode.
      These kernels execute in SPMD mode but need generic-mode semantics for
      scheduling blocks and threads. Without this, far too few blocks would be
      scheduled for a generic region, because SPMD mode expects the trip count to
      be divided by the number of threads.
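
      A sketch of the scheduling difference follows, assuming the launch code picks the grid size from the execution mode. `ExecMode`, `blocksForLaunch`, and the `DefaultBlocks` fallback are hypothetical, and the plugin's real logic has more cases. Initialising `Blocks` to a default up front also avoids uninitialized block-count warnings like the ones fixed in a158d366.

      ```cpp
      #include <cassert>
      #include <cstdint>
      
      // Illustrative execution modes; GenericSPMD marks a generic kernel that
      // OpenMPOpt has turned into SPMD code but that keeps generic scheduling.
      enum class ExecMode { Generic, SPMD, GenericSPMD, Unknown };
      
      std::uint64_t blocksForLaunch(ExecMode Mode, std::uint64_t TripCount,
                                    std::uint64_t ThreadsPerBlock,
                                    std::uint64_t DefaultBlocks) {
        assert(ThreadsPerBlock > 0);
        // Initialised up front so an unrecognised mode still yields a value.
        std::uint64_t Blocks = DefaultBlocks;
        switch (Mode) {
        case ExecMode::SPMD:
          // Each thread runs one iteration: divide the trip count by the
          // block size, rounding up.
          Blocks = (TripCount + ThreadsPerBlock - 1) / ThreadsPerBlock;
          break;
        case ExecMode::Generic:
        case ExecMode::GenericSPMD:
          // Each block runs one outer iteration: one block per iteration.
          Blocks = TripCount;
          break;
        case ExecMode::Unknown:
          break;
        }
        return Blocks;
      }
      ```

      Take a generic region with 1000 outer iterations and 128 threads per block. SPMD arithmetic would launch 8 blocks where generic semantics need 1000, which is the "far too few blocks" problem described here.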
      
      Reviewed By: ggeorgakoudis
      
      Differential Revision: https://reviews.llvm.org/D106460
    • [OpenMP] Change `__kmpc_free_shared` to include the paired allocation size · 754eb1c2
      Joseph Huber authored
      This patch changes `__kmpc_free_shared` to take an additional argument
      corresponding to the associated allocation's size. This makes it easier to
      implement the allocator in the runtime.
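
      A size argument helps because a simple stack allocator can then release memory without storing a header per allocation. The sketch below assumes allocations are freed in LIFO order. `SharedStack`, `allocShared`, and `freeShared` are hypothetical and not the deviceRTL implementation.

      ```cpp
      #include <cassert>
      #include <cstddef>
      
      // Toy LIFO allocator over a fixed buffer, standing in for a device-side
      // shared-memory pool. Because free receives the size, no per-allocation
      // header is needed: freeing just moves the top of the stack back.
      class SharedStack {
      public:
        void *allocShared(std::size_t Bytes) {
          Bytes = roundUp(Bytes);
          if (Bytes > sizeof(Buffer) - Top)
            return nullptr; // a real runtime might fall back to global memory
          void *Ptr = Buffer + Top;
          Top += Bytes;
          return Ptr;
        }
      
        // Must be called in reverse allocation order, with the size that was
        // passed to the matching allocShared.
        void freeShared(void *Ptr, std::size_t Bytes) {
          Bytes = roundUp(Bytes);
          assert(Bytes <= Top && Ptr == Buffer + (Top - Bytes));
          (void)Ptr; // only inspected by the assert
          Top -= Bytes;
        }
      
        std::size_t used() const { return Top; }
      
      private:
        // Keeps every allocation 16-byte aligned.
        static std::size_t roundUp(std::size_t Bytes) {
          return (Bytes + 15) & ~std::size_t(15);
        }
      
        alignas(16) unsigned char Buffer[1024];
        std::size_t Top = 0;
      };
      ```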
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D106496
  2. Jul 21, 2021
  3. Jul 20, 2021
  4. Jul 19, 2021
  5. Jul 18, 2021
    • [OpenMP][Offloading] Add a CMake argument LIBOMPTARGET_LIT_ARGS to control behavior of libomptarget lit test · 954711ed
      Shilei Tian authored
      By default, `lit` uses all threads to invoke tests, which can easily make GPUs
      run out of memory: most OpenMP offloading tests take about 1GB of GPU memory,
      but a typical GPU only has 4-8GB. This patch introduces a CMake argument,
      `LIBOMPTARGET_LIT_ARGS`, that lets users control the behavior of
      `libomptarget` tests, similar to `LLVM_LIT_ARGS`.
      
      Reviewed By: JonChesterfield
      
      Differential Revision: https://reviews.llvm.org/D106236
    • [OpenMP][Offloading] Add -g when compiling deviceRTLs in debug mode · 4357cfc7
      Shilei Tian authored
      Currently, when the project is compiled in debug mode, `-g` is not added to the
      compilation flags. The bc files generated in the different modes differ in size.
      When using GPU debuggers such as `cuda-gdb`, a debug build of the bc library is
      expected to provide more information.
      
      Reviewed By: JonChesterfield
      
      Differential Revision: https://reviews.llvm.org/D106229
  6. Jul 17, 2021
    • [OpenMP] Codegen aggregate for outlined function captures · e9c7291c
      Giorgis Georgakoudis authored
      Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic, since there is a variable number of arguments to forward to the outlined function; (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) on the number of arguments; and (3) forwarded arguments must be cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating the captured arguments in a struct that is passed to the fork_call.
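
      A host-side sketch of the aggregated-capture calling convention assumes a parallel region that captures `n`, `a`, and `scale`. `Captures`, `outlinedBody`, and `forkCall` are hypothetical names. The toy `forkCall` runs the threads sequentially instead of forking.

      ```cpp
      #include <cassert>
      
      // Hypothetical struct the compiler would emit for a region capturing
      // int n, double *a and double scale.
      struct Captures {
        int N;
        double *A;
        double Scale;
      };
      
      // Outlined body: a single pointer parameter however many variables are
      // captured, so no variadic forwarding and no per-argument pointer casts.
      void outlinedBody(int ThreadId, int NumThreads, void *Args) {
        auto *C = static_cast<Captures *>(Args);
        for (int I = ThreadId; I < C->N; I += NumThreads)
          C->A[I] *= C->Scale;
      }
      
      // Toy fork_call: runs the body once per thread id, one after another.
      // A real runtime would pass Args unchanged to each worker thread.
      void forkCall(int NumThreads, void (*Fn)(int, int, void *), void *Args) {
        assert(NumThreads > 0);
        for (int T = 0; T < NumThreads; ++T)
          Fn(T, NumThreads, Args);
      }
      ```

      Adding a capture only adds a struct field, so the fixed argument limit mentioned in the message no longer applies.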
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D102107
  7. Jul 16, 2021
    • [OpenMP] Add remark documentation to the OpenMP webpage · 16164079
      Joseph Huber authored
      This patch begins adding documentation for each remark emitted by
      `openmp-opt`. This builds on the IDs introduced in D105939 so that users
      can more easily identify each remark in the webpage.
      
      Depends on D105939.
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D106018
    • [NFC][OpenMP][Offloading] Replaced explicit parallel level computation with function `__kmpc_parallel_level` · 97c8f60b
      Shilei Tian authored
      There are two places in the current deviceRTLs that compute the parallel level
      explicitly, which is essentially what `__kmpc_parallel_level` does. Starting from
      D105787, we plan to introduce a series of function call foldings based on
      information that can be deduced at compile time. Computation of the parallel
      level is the next target, and this patch prepares for that optimization.
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D105955
  8. Jul 15, 2021
  9. Jul 13, 2021
    • 424f14f0
      Peyton, Jonathan L authored
    • [OpenMP][NFC] Change comment style to eliminate warnings from GCC · 405eefe4
      Peyton, Jonathan L authored
      The standalone build of the OpenMP runtime with GCC gives -Wcomment
      warnings wherever a backslash-newline is encountered in a // style
      comment. This switches those comments from // style to /* */ style to
      silence the warnings.
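
      The problem can be shown with a small example, assuming GCC or Clang. In a // comment, a backslash immediately before the newline splices the next line into the comment, which -Wcomment reports as a multi-line comment. The function names are illustrative.

      ```cpp
      // A // comment ending in a backslash swallows the following line, so the
      // assignment below never runs (and -Wcomment warns about it).
      int splicedCommentDemo() {
        int Value = 0;
        // ASCII-art edge \
        Value = 1;
        return Value; // 0
      }
      
      // A /* */ comment ends at */, so a backslash inside it is harmless.
      int blockCommentDemo() {
        int Value = 0;
        /* ASCII-art edge \ */
        Value = 1;
        return Value; // 1
      }
      ```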
    • [OpenMP] Minor improvement in task allocation · db635a28
      Hansang Bae authored
      This patch includes a few changes that slightly improve task allocation
      performance. These changes are enough to recover the performance drop
      observed after the hidden helper was introduced.
      
      Differential Revision: https://reviews.llvm.org/D105715
    • [libomp] ompd_init(): fix heap-buffer-overflow when constructing libompd.so path · 4709d9d5
      Roman Lebedev authored
      There is no guarantee that the space allocated in `libname`
      is enough to accommodate the whole `dl_info.dli_fname`,
      because it could e.g. have a suffix such as `.5`.
      That highlights another problem: what should it do about suffixes,
      and should it do anything to resolve symlinks before changing the filename?
      
      ```
      $ LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/lib"  ./src/utilities/rstest/rstest -c /tmp/f49137920.NEF
      dl_info.dli_fname "/usr/local/lib/libomp.so.5"
      strlen(dl_info.dli_fname) 26
      lib_path_length 14
      lib_path_length + 12 26
      =================================================================
      ==30949==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60300000002a at pc 0x000000548648 bp 0x7ffdfa0aa780 sp 0x7ffdfa0a9f40
      WRITE of size 27 at 0x60300000002a thread T0
          #0 0x548647 in strcpy (/home/lebedevri/rawspeed/build-Clang-SANITIZE/src/utilities/rstest/rstest+0x548647)
          #1 0x7fb9e3e3d234 in ompd_init() /repositories/llvm-project/openmp/runtime/src/ompd-specific.cpp:102:5
          #2 0x7fb9e3dcb446 in __kmp_do_serial_initialize() /repositories/llvm-project/openmp/runtime/src/kmp_runtime.cpp:6742:3
          #3 0x7fb9e3dcb40b in __kmp_get_global_thread_id_reg /repositories/llvm-project/openmp/runtime/src/kmp_runtime.cpp:251:7
          #4 0x59e035 in main /home/lebedevri/rawspeed/build-Clang-SANITIZE/../src/utilities/rstest/rstest.cpp:491
          #5 0x7fb9e3762d09 in __libc_start_main csu/../csu/libc-start.c:308:16
          #6 0x4df449 in _start (/home/lebedevri/rawspeed/build-Clang-SANITIZE/src/utilities/rstest/rstest+0x4df449)
      
      0x60300000002a is located 0 bytes to the right of 26-byte region [0x603000000010,0x60300000002a)
      allocated by thread T0 here:
          #0 0x55cc5d in malloc (/home/lebedevri/rawspeed/build-Clang-SANITIZE/src/utilities/rstest/rstest+0x55cc5d)
          #1 0x7fb9e3e3d224 in ompd_init() /repositories/llvm-project/openmp/runtime/src/ompd-specific.cpp:101:17
          #2 0x7fb9e3762d09 in __libc_start_main csu/../csu/libc-start.c:308:16
      
      SUMMARY: AddressSanitizer: heap-buffer-overflow (/home/lebedevri/rawspeed/build-Clang-SANITIZE/src/utilities/rstest/rstest+0x548647) in strcpy
      Shadow bytes around the buggy address:
        0x0c067fff7fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        0x0c067fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        0x0c067fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        0x0c067fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        0x0c067fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      =>0x0c067fff8000: fa fa 00 00 00[02]fa fa fa fa fa fa fa fa fa fa
        0x0c067fff8010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
        0x0c067fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
        0x0c067fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
        0x0c067fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
        0x0c067fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      Shadow byte legend (one shadow byte represents 8 application bytes):
        Addressable:           00
        Partially addressable: 01 02 03 04 05 06 07
        Heap left redzone:       fa
        Freed heap region:       fd
        Stack left redzone:      f1
        Stack mid redzone:       f2
        Stack right redzone:     f3
        Stack after return:      f5
        Stack use after scope:   f8
        Global redzone:          f9
        Global init order:       f6
        Poisoned by user:        f7
        Container overflow:      fc
        Array cookie:            ac
        Intra object redzone:    bb
        ASan internal:           fe
        Left alloca redzone:     ca
        Right alloca redzone:    cb
      ==30949==ABORTING
      Aborted
      ```
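
      The path can be built without a fixed-size buffer, as in the sketch below. It assumes the goal is `libompd.so` in the directory of the loaded libomp. `ompdPathNextTo` is hypothetical, and this is not the fix that landed in ompd-specific.cpp. In the log above, the buffer held 26 bytes (lib_path_length + 12) while strcpy wrote all 27 bytes of dli_fname, including the terminator.

      ```cpp
      #include <string>
      
      // Returns the path of libompd.so in the same directory as the loaded
      // libomp. std::string sizes itself from the input, unlike a buffer sized
      // from the directory length plus a fixed constant. Symlinks are not
      // resolved, and any version suffix on the input (such as ".5") is dropped
      // along with the rest of the file name.
      std::string ompdPathNextTo(const std::string &LoadedLibomp) {
        const std::string::size_type Slash = LoadedLibomp.rfind('/');
        const std::string Dir = Slash == std::string::npos
                                    ? std::string()
                                    : LoadedLibomp.substr(0, Slash + 1);
        return Dir + "libompd.so";
      }
      ```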
    • [libomptarget] Update device pointer only if needed · bb0166dc
      George Rokos authored
      Currently, libomptarget will always perform a host-to-device memory transfer in
      order to update the device pointer of a PTR_AND_OBJ entry. This is not always
      necessary because the device pointer may have been set to the correct pointee
      address already, so we can eliminate the redundant memory transfer.
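
      A simplified model of the check assumes the runtime can remember which pointee address it last wrote for each device pointer. `FakeDevice`, `updateDevicePointer`, and the shadow map are hypothetical, not libomptarget's data structures.

      ```cpp
      #include <cstdint>
      #include <unordered_map>
      
      // Maps a device address to the pointer value stored there (hypothetical).
      using ShadowMap = std::unordered_map<std::uintptr_t, std::uintptr_t>;
      
      // Hypothetical device model: Memory holds what each device pointer slot
      // contains, and Transfers counts simulated host-to-device copies.
      struct FakeDevice {
        ShadowMap Memory;
        int Transfers = 0;
        void write(std::uintptr_t Addr, std::uintptr_t Value) {
          Memory[Addr] = Value;
          ++Transfers;
        }
      };
      
      // For a PTR_AND_OBJ entry, the device copy of the pointer must hold the
      // device address of its pointee. If the shadow copy shows the slot
      // already holds that address, the transfer is skipped.
      void updateDevicePointer(FakeDevice &Dev, ShadowMap &Shadow,
                               std::uintptr_t PtrAddr, std::uintptr_t PointeeAddr) {
        auto It = Shadow.find(PtrAddr);
        if (It != Shadow.end() && It->second == PointeeAddr)
          return; // already correct: no memory transfer needed
        Dev.write(PtrAddr, PointeeAddr);
        Shadow[PtrAddr] = PointeeAddr;
      }
      ```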
    • [libomptarget][devicertl] Remove branches around setting parallelLevel · b6b53ffe
      Jon Chesterfield authored
      Simplifies control flow to allow store/load forwarding.

      This change folds two basic blocks into one, leaving a single store to parallelLevel.
      This is a step towards SPMD kernels in which sufficiently aggressive inlining folds
      the loads from parallelLevel and thus discards the nested parallel handling
      when it is unused.
      
      Transform:
      ```
      int threadId = GetThreadIdInBlock();
      if (threadId == 0) {
        parallelLevel[0] = expr;
      } else if (GetLaneId() == 0) {
        parallelLevel[GetWarpId()] = expr;
      }
      // =>
      if (GetLaneId() == 0) {
        parallelLevel[GetWarpId()] = expr;
      }
      // because
      unsigned GetLaneId() { return GetThreadIdInBlock() & (WARPSIZE - 1);}
      // so whenever threadId == 0, GetLaneId() is also 0.
      ```
      
      That replaces a store in two distinct basic blocks with a single store.
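
      The equivalence can be checked by brute force on the host. The sketch assumes 32-wide warps (AMDGPU wavefronts are typically 64 wide; the argument is the same) and simulates the threads of one block one after another. The helper names are hypothetical.

      ```cpp
      #include <array>
      
      constexpr unsigned WARPSIZE = 32;    // assumed warp width
      constexpr unsigned NumThreads = 128; // assumed block size
      using Levels = std::array<int, NumThreads / WARPSIZE>;
      
      unsigned laneId(unsigned Tid) { return Tid & (WARPSIZE - 1); }
      unsigned warpId(unsigned Tid) { return Tid / WARPSIZE; }
      
      // Original control flow: two branches, each with its own store.
      Levels storeBefore(int Expr) {
        Levels Level{};
        for (unsigned Tid = 0; Tid < NumThreads; ++Tid) {
          if (Tid == 0)
            Level[0] = Expr;
          else if (laneId(Tid) == 0)
            Level[warpId(Tid)] = Expr;
        }
        return Level;
      }
      
      // Simplified control flow: thread 0 has lane 0 and warp 0, so the
      // single remaining branch already covers it.
      Levels storeAfter(int Expr) {
        Levels Level{};
        for (unsigned Tid = 0; Tid < NumThreads; ++Tid)
          if (laneId(Tid) == 0)
            Level[warpId(Tid)] = Expr;
        return Level;
      }
      ```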
      
      A more aggressive follow up is possible if the threads in the warp/wave
      race to write the same value to the same address. This is not done as
      part of this change.
      
      ```
      if (GetLaneId() == 0) {
        parallelLevel[GetWarpId()] = expr;
      }
      // =>
      parallelLevel[GetWarpId()] = expr;
      // because
      unsigned GetWarpId() { return GetThreadIdInBlock() / WARPSIZE; }
      // so GetWarpId will index the same element for every thread in the warp
      // and, because expr is lane-invariant in this case, every lane stores the
      // same value to this unique address
      ```
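
      The follow-up can be modelled the same way, assuming a lane-invariant `Expr` and 32-wide warps. On hardware the lanes store concurrently, but they all write the same value to the same element, so the result does not depend on the order. The function names are hypothetical.

      ```cpp
      #include <array>
      
      constexpr unsigned Warp = 32;     // assumed warp width
      constexpr unsigned Threads = 128; // assumed block size
      using Levels = std::array<int, Threads / Warp>;
      
      // Current form: only lane 0 of each warp stores.
      Levels laneZeroStore(int Expr) {
        Levels L{};
        for (unsigned Tid = 0; Tid < Threads; ++Tid)
          if ((Tid & (Warp - 1)) == 0)
            L[Tid / Warp] = Expr;
        return L;
      }
      
      // Follow-up form: every lane stores the same value to its warp's element.
      Levels allLanesStore(int Expr) {
        Levels L{};
        for (unsigned Tid = 0; Tid < Threads; ++Tid)
          L[Tid / Warp] = Expr;
        return L;
      }
      ```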
      
      Reviewed By: tianshilei1992
      
      Differential Revision: https://reviews.llvm.org/D105699
  10. Jul 12, 2021
    • [OpenMP] Remove TSAN annotations from libomp · 681055ea
      Joachim Protze authored
      The annotations in libomp were never built by default. They are also
      superseded by the annotations that the OMPT tool libarcher.so provides.
      With respect to libarcher, libomp behaves as if libarcher were the last
      element of OMP_TOOL_LIBRARIES, i.e., if no other OMPT tool becomes active,
      libarcher will check whether the OpenMP application is built with TSan.
      
      Since libarcher gets loaded by default, enabling LIBOMP_TSAN_SUPPORT would
      result in redundant annotations for TSan, which slightly differ in details
      and coverage (e.g. task dependencies are not handled well by the annotations
      in libomp).
      
      This patch removes all TSan annotations from the OpenMP runtime code.
      
      Differential Revision: https://reviews.llvm.org/D103767
    • [OpenMP][OMPT] Fix compile-time assertion in ompt-multiplex.h · fedbff75
      Joachim Protze authored
      The compile-time assertion is supposed to prevent a double-free caused by an
      unexpected combination of preprocessor defines passed by an OMPT tool.
      The defines it currently checks are not used, so this patch replaces the check
      with the macros actually used in ompt-multiplex.h.
      
      Reported by: Semih Burak
      
      Differential Revision: https://reviews.llvm.org/D104633
    • [OpenMP] Create and use `__kmpc_is_generic_main_thread` · a7b7b5df
      Johannes Doerfert authored
      In order to fold calls based on high-level knowledge and control flow
      tracking, it helps to expose the information as a runtime call. The
      logic `!SPMD && getTID() == getMasterTID()` was used in various places
      and is now encapsulated in `__kmpc_is_generic_main_thread`. As part of
      this rewrite, we replaced eager computation of arguments with on-demand
      computation. This is especially helpful when the calls can be folded,
      since the arguments then don't need to be computed at all.
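
      A host-side sketch of the difference uses a counter that records how often the thread-id queries run. The queries are assumed to be costly, and all names are illustrative, not the runtime's.

      ```cpp
      // Hypothetical stand-ins for device queries; Queries counts how often
      // they run.
      static int Queries = 0;
      static bool SPMD = true;
      
      unsigned getTID() { ++Queries; return 0; }
      unsigned getMasterTID() { ++Queries; return 96; } // illustrative value
      bool isSPMDMode() { return SPMD; }
      
      // Eager: the caller must compute Tid before the call, even in SPMD mode
      // where the answer does not depend on it.
      bool isGenericMainThreadEager(unsigned Tid) {
        return !isSPMDMode() && Tid == getMasterTID();
      }
      
      // On demand: the thread id is only queried after the mode check, so in
      // SPMD mode (or once the call is folded) no query runs at all.
      bool isGenericMainThreadOnDemand() {
        return !isSPMDMode() && getTID() == getMasterTID();
      }
      ```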
      
      Differential Revision: https://reviews.llvm.org/D105768
    • [OpenMP] Simplify variable sharing and increase shared memory size · 1ab1f04a
      Johannes Doerfert authored
      To avoid malloc/free, up to NUM_SHARED_VARIABLES_IN_SHARED_MEM
      (=64) variables are communicated in dedicated shared memory instead. The
      simplification avoids the need for an "init" and requires a "deinit"
      only if we ever communicate more than NUM_SHARED_VARIABLES_IN_SHARED_MEM
      variables.
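
      A sketch of the fallback scheme uses a fixed array as a stand-in for the dedicated shared memory. `beginSharing`, `endSharing`, and the globals are hypothetical single-team simplifications, not the deviceRTL's functions.

      ```cpp
      #include <cassert>
      #include <cstdlib>
      
      // Mirrors NUM_SHARED_VARIABLES_IN_SHARED_MEM from the commit message.
      constexpr int NumSharedVarsInSharedMem = 64;
      
      // Hypothetical stand-in for the dedicated shared-memory slots.
      static void *SharedMemVars[NumSharedVarsInSharedMem];
      static void **SharedArgs = nullptr;
      
      // Returns room for NumArgs variable pointers: the fixed slots when they
      // are enough, otherwise a heap buffer. No separate "init" call is needed.
      void **beginSharing(int NumArgs) {
        assert(NumArgs >= 0 && SharedArgs == nullptr);
        if (NumArgs <= NumSharedVarsInSharedMem)
          SharedArgs = SharedMemVars;
        else
          SharedArgs = static_cast<void **>(std::malloc(NumArgs * sizeof(void *)));
        return SharedArgs;
      }
      
      // The "deinit" only has work to do if the heap fallback was used.
      void endSharing() {
        if (SharedArgs != SharedMemVars)
          std::free(SharedArgs);
        SharedArgs = nullptr;
      }
      ```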
      
      Differential Revision: https://reviews.llvm.org/D105767
  11. Jul 11, 2021
  12. Jul 10, 2021