  1. Aug 23, 2021
  2. Aug 20, 2021
  3. Aug 19, 2021
  4. Aug 18, 2021
  5. Aug 17, 2021
  6. Aug 16, 2021
  7. Aug 11, 2021
  8. Aug 10, 2021
    • [libomptarget][NFC] Fix compilation issue with GCC · df06ec30
      George Rokos authored
      Removed a redundant assignment from a condition, which caused GCC to emit the following error:
      
      error: operation on ‘MoveData’ may be undefined [-Werror=sequence-point]
    • [OpenMP][NFC] Simplify targetDataEnd conditions for CopyMember · 2ced1f33
      Joel E. Denny authored
      targetDataEnd and targetDataBegin compute CopyMember/copy differently,
      and I don't see why they should.  This patch eliminates one of those
      differences by making a simplifying NFC change to targetDataEnd.
      
      The change is NFC as follows.  The change only affects the case when
      `!UNIFIED_SHARED_MEMORY || HasCloseModifier`.  In that case, the
      following points are always true:
      
      * The value of CopyMember is relevant later only if DelEntry = false.
      * DelEntry = false only if one of the following is true:
          * IsLast = false.  In this case, it's always true that CopyMember
            = false = IsLast.
          * `MEMBER_OF && !PTR_AND_OBJ` is true.  In this case, CopyMember =
            IsLast.
      * Thus, if CopyMember is relevant, CopyMember = IsLast.
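
      The case analysis above can be checked exhaustively on a small boolean
      model. This is a hypothetical sketch and not the libomptarget code. It
      encodes only the facts in the list, for the
      `!UNIFIED_SHARED_MEMORY || HasCloseModifier` case. `CaseSketch`,
      `oldCopyMember`, `delEntry`, and the free input `Other` are invented
      for illustration. DelEntry is taken to be false whenever the list
      allows it, which makes the check as strict as possible.

      ```cpp
      // Simplified boolean model (an assumption, NOT the libomptarget source)
      // of the case `!UNIFIED_SHARED_MEMORY || HasCloseModifier`.
      struct CaseSketch {
        bool IsLast;
        bool MemberOfNotPtrAndObj; // MEMBER_OF && !PTR_AND_OBJ
        bool Other;                // any input the list leaves unconstrained
      };

      // Old CopyMember: false when !IsLast, equal to IsLast when
      // MEMBER_OF && !PTR_AND_OBJ, otherwise unconstrained (modelled by Other).
      bool oldCopyMember(const CaseSketch &C) {
        if (!C.IsLast)
          return false;
        if (C.MemberOfNotPtrAndObj)
          return C.IsLast;
        return C.Other;
      }

      // DelEntry is false exactly when !IsLast or MEMBER_OF && !PTR_AND_OBJ,
      // i.e. in every case the list permits.
      bool delEntry(const CaseSketch &C) {
        return C.IsLast && !C.MemberOfNotPtrAndObj;
      }

      // Enumerates all 2^3 inputs and checks that wherever CopyMember is read
      // (DelEntry == false), replacing it with IsLast changes nothing.
      bool copyMemberSimplificationIsNFC() {
        for (int Bits = 0; Bits < 8; ++Bits) {
          CaseSketch C{(Bits & 1) != 0, (Bits & 2) != 0, (Bits & 4) != 0};
          if (!delEntry(C) && oldCopyMember(C) != C.IsLast)
            return false;
        }
        return true;
      }
      ```

      In this model the old CopyMember can differ from IsLast only when IsLast
      is true and MEMBER_OF && !PTR_AND_OBJ is false. That is exactly when
      DelEntry is true, so CopyMember is never read.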
      
      Reviewed By: grokos
      
      Differential Revision: https://reviews.llvm.org/D105990
  9. Aug 09, 2021
  10. Aug 08, 2021
  11. Aug 07, 2021
  12. Aug 06, 2021
  13. Aug 04, 2021
  14. Aug 03, 2021
  15. Jul 31, 2021
  16. Jul 30, 2021
  17. Jul 29, 2021
    • [OpenMP] libomp: Add new experimental barrier: two-level distributed barrier · d8e4cb91
      Terry Wilmarth authored
      Two-level distributed barrier is a new experimental barrier designed
      for Intel hardware that has better performance in some cases than the
      default hyper barrier.
      
      This barrier is designed to handle fine granularity parallelism where
      barriers are used frequently with little compute and memory access
      between barriers. There is no need to use it for codes with few
      barriers and large granularity compute, or memory intensive
      applications, as little difference will be seen between this barrier
      and the default hyper barrier. This barrier is designed to work
      optimally with a fixed number of threads and has a significant setup
      time, so it should NOT be used in situations where the number of threads
      in a team varies frequently.
      
      The two-level distributed barrier is off by default -- hyper barrier
      is used by default. To use this barrier, you must set all barrier
      patterns to use this type, because it will not work with other barrier
      patterns. Thus, to turn it on, the following settings are required:
      
      KMP_FORKJOIN_BARRIER_PATTERN=dist,dist
      KMP_PLAIN_BARRIER_PATTERN=dist,dist
      KMP_REDUCTION_BARRIER_PATTERN=dist,dist
      
      Branching factors (set with KMP_FORKJOIN_BARRIER, KMP_PLAIN_BARRIER,
      and KMP_REDUCTION_BARRIER) are ignored by the two-level distributed
      barrier.
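
      Because mixing the distributed barrier with other patterns is
      unsupported, a launch script or test harness might want to check the
      three variables before starting an application. The following is a
      hypothetical C++ pre-flight check. `classifyDistBarrier` and
      `DistBarrierConfig` are illustrative names and are not part of libomp.
      The check treats only the exact string `dist,dist` as enabled and does
      not reproduce libomp's own parsing, which may accept other spellings.

      ```cpp
      #include <cstdlib>
      #include <cstring>
      #include <functional>

      // Hypothetical classification of the barrier-pattern settings.
      enum class DistBarrierConfig { Off, On, Mixed };

      // Variable lookup; returns nullptr when the variable is unset.
      using EnvLookup = std::function<const char *(const char *)>;

      // Off:   no pattern variable mentions "dist".
      // On:    all three are exactly "dist,dist", as the commit requires.
      // Mixed: "dist" appears somewhere but not in the complete required form.
      DistBarrierConfig classifyDistBarrier(const EnvLookup &Get) {
        static const char *const Vars[] = {"KMP_FORKJOIN_BARRIER_PATTERN",
                                           "KMP_PLAIN_BARRIER_PATTERN",
                                           "KMP_REDUCTION_BARRIER_PATTERN"};
        bool AnyDist = false;
        bool AllDist = true;
        for (const char *Var : Vars) {
          const char *Value = Get(Var);
          if (Value && std::strstr(Value, "dist"))
            AnyDist = true;
          if (!Value || std::strcmp(Value, "dist,dist") != 0)
            AllDist = false;
        }
        if (AllDist)
          return DistBarrierConfig::On;
        return AnyDist ? DistBarrierConfig::Mixed : DistBarrierConfig::Off;
      }

      // Convenience overload that reads the real process environment.
      DistBarrierConfig classifyDistBarrier() {
        return classifyDistBarrier(
            [](const char *Name) -> const char * { return std::getenv(Name); });
      }
      ```

      For example, setting only KMP_FORKJOIN_BARRIER_PATTERN=dist,dist would
      be reported as Mixed, because the other two patterns still use the
      default.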
      
      Patch fixed for ITTNotify-disabled builds and non-x86 builds.
      
      Co-authored-by: Jonathan Peyton <jonathan.l.peyton@intel.com>
      Co-authored-by: Vladislav Vinogradov <vlad.vinogradov@intel.com>
      
      Differential Revision: https://reviews.llvm.org/D103121
    • [OpenMP][Tools][Tests][NFC] Address flaky archer tests · 4acc2f29
      Joachim Protze authored
      Adding more concurrent threads significantly increases the
      chance that the data race can be observed during testing.
    • Jon Chesterfield
      a90da62a
  18. Jul 28, 2021
    • [OpenMP] Fixing missing variables when CUDA SDK not in system · 88e66fa6
      Jose M Monsalve Diaz authored
      This patch fixes the error reported in D106751. When there is no CUDA SDK
      installed in the system, the build fails due to missing `CU_DEVICE_ATTRIBUTE`
      variables.
      
      Uses the fix suggested by @zsrkmyn.
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D106933
    • [OpenMP][Tool] Introducing the `llvm-omp-device-info` tool · 313c5239
      Jose M Monsalve Diaz authored
      This patch introduces the `llvm-omp-device-info` tool, which uses the
      omptarget library and interface to query the device info from all the
      available devices as seen by OpenMP. This is inspired by PGI's `pgaccelinfo`.
      
      Since omptarget usually requires a description structure with executable
      kernels, I split the initialization of the RTLs and Devices to be able to
      initialize all possible devices and query each of them.
      
      This revision relies on the patch that introduces `print_device_info`.
      
      A limitation is that the order in which the devices are initialized, and
      therefore the corresponding device IDs, are not necessarily the ones seen by OpenMP.
      
      The changes are as follows:
      1. Separate the RTL initialization that was performed in `RegisterLib` into its own `initRTLonce` function.
      2. Create an `initAllRTLs` method that initializes all available RTLs at runtime.
      3. Create the `llvm-deviceinfo.cpp` tool, which uses `omptarget` to query each device and print its information.
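
      The split between one-time per-RTL initialization and "initialize
      everything" can be sketched as follows. This is a hypothetical model,
      not the libomptarget implementation. `RTLSketch`, `RTLRegistrySketch`,
      and all signatures are invented; only the names `initRTLonce` and
      `initAllRTLs` come from the description above. The sketch also assumes
      that global device IDs are assigned in initialization order. Under
      that assumption it reproduces the stated limitation: the IDs a tool
      sees after `initAllRTLs` can differ from those an application sees when
      its images trigger initialization in a different order.

      ```cpp
      #include <cstddef>
      #include <mutex>
      #include <string>
      #include <utility>
      #include <vector>

      // Hypothetical stand-in for a loaded plugin (RTL); not the libomptarget type.
      struct RTLSketch {
        RTLSketch(std::string N, int Devices) : Name(std::move(N)), NumDevices(Devices) {}
        std::string Name;
        int NumDevices;
        bool Initialized = false;
        int FirstDeviceId = -1; // global ID assigned to this RTL's first device
      };

      class RTLRegistrySketch {
      public:
        explicit RTLRegistrySketch(std::vector<RTLSketch> Plugins)
            : RTLs(std::move(Plugins)) {}

        // Initializes one RTL at most once. Global device IDs are handed out in
        // initialization order (an assumption of this sketch).
        void initRTLonce(RTLSketch &R) {
          std::lock_guard<std::mutex> Lock(Mtx);
          if (R.Initialized)
            return;
          R.Initialized = true;
          R.FirstDeviceId = NextDeviceId;
          NextDeviceId += R.NumDevices;
        }

        // Initializes every available RTL, e.g. for a query tool that has no
        // offload image whose registration would trigger initialization.
        void initAllRTLs() {
          for (RTLSketch &R : RTLs)
            initRTLonce(R);
        }

        RTLSketch &get(std::size_t I) { return RTLs[I]; }

        int totalDevices() {
          std::lock_guard<std::mutex> Lock(Mtx);
          return NextDeviceId;
        }

      private:
        std::mutex Mtx;
        std::vector<RTLSketch> RTLs;
        int NextDeviceId = 0;
      };
      ```

      Suppose a registry lists a host plugin with four devices followed by a
      CUDA plugin with one. If the CUDA RTL is initialized first, its device
      gets global ID 0. Calling `initAllRTLs` on a fresh registry would
      instead give the host plugin IDs 0 to 3.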
      
      Example Output:
      ```
      Device (0):
          print_device_info not implemented
      
      Device (1):
          print_device_info not implemented
      
      Device (2):
          print_device_info not implemented
      
      Device (3):
          print_device_info not implemented
      
      Device (4):
          CUDA Driver Version:                11000
          CUDA Device Number:                 0
          Device Name:                        Quadro P1000
          Global Memory Size:                 4236312576 bytes
          Number of Multiprocessors:          5
          Concurrent Copy and Execution:      Yes
          Total Constant Memory:              65536 bytes
          Max Shared Memory per Block:        49152 bytes
          Registers per Block:                65536
          Warp Size:                          32 Threads
          Maximum Threads per Block:          1024
          Maximum Block Dimensions:           1024, 1024, 64
          Maximum Grid Dimensions:            2147483647 x 65535 x 65535
          Maximum Memory Pitch:               2147483647 bytes
          Texture Alignment:                  512 bytes
          Clock Rate:                         1480500 kHz
          Execution Timeout:                  Yes
          Integrated Device:                  No
          Can Map Host Memory:                Yes
          Compute Mode:                       DEFAULT
          Concurrent Kernels:                 Yes
          ECC Enabled:                        No
          Memory Clock Rate:                  2505000 kHz
          Memory Bus Width:                   128 bits
          L2 Cache Size:                      1048576 bytes
          Max Threads Per SMP:                2048
          Async Engines:                      Yes (2)
          Unified Addressing:                 Yes
          Managed Memory:                     Yes
          Concurrent Managed Memory:          Yes
          Preemption Supported:               Yes
          Cooperative Launch:                 Yes
          Multi-Device Boars:                 No
          Compute Capabilities:               61
      ```
      
      Reviewed By: tianshilei1992
      
      Differential Revision: https://reviews.llvm.org/D106752
    • [OpenMP][Libomptarget] Adding `print_device_info` to RTL and `omptarget` · d2f85d09
      Jose M Monsalve Diaz authored
      This patch introduces a function in the device's plugin to print the
      device information. This patch relates to another patch that introduces
      a CLI tool to obtain the device information from the OpenMP library directly.
      It is inspired by PGI's `pgaccelinfo`.
      
      The modifications are as follows:
      1. Introduce the optional `void __tgt_rtl_print_device_info(RTLdevID)` function into the RTL.
      2. Introduce the `bool __tgt_print_device_info(devID)` function into the `omptarget` interface. It returns false if the RTL does not implement `__tgt_rtl_print_device_info`.
      3. Add `bool printDeviceInfo(RTLDevID)` to `DeviceTy`.
      4. Implement `__tgt_rtl_print_device_info` for CUDA, adding additional CUDA Runtime calls.
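
      The optional-hook design in items 1 to 3 can be sketched as follows.
      This is a hypothetical model, not the libomptarget source.
      `PluginSketch`, `DeviceSketch`, and `deviceReport` are invented names.
      The real hook is presumably resolved when the plugin is loaded rather
      than assigned by hand. The sketch shows how treating the entry point as
      optional lets `printDeviceInfo` return false for plugins that lack it.
      A query tool can then turn that into the `print_device_info not
      implemented` line seen in the `llvm-omp-device-info` example output.

      ```cpp
      #include <cstdint>
      #include <sstream>
      #include <string>

      // Hypothetical plugin entry table. The print hook is optional, so a
      // plugin that does not provide it leaves the pointer null.
      struct PluginSketch {
        void (*PrintDeviceInfo)(int32_t RTLDeviceId) = nullptr;
      };

      // Hypothetical device wrapper modelled on the described DeviceTy::printDeviceInfo.
      struct DeviceSketch {
        PluginSketch *RTL;
        int32_t RTLDeviceId;

        // Returns false when the plugin does not implement the hook.
        bool printDeviceInfo() const {
          if (!RTL->PrintDeviceInfo)
            return false;
          RTL->PrintDeviceInfo(RTLDeviceId);
          return true;
        }
      };

      // Tool-side loop producing the "Device (N):" headers and the fallback
      // line. A real tool would write to stdout; a string is returned here so
      // the behavior is easy to check. Output printed by the hook itself is
      // not captured.
      std::string deviceReport(const DeviceSketch *Devices, int NumDevices) {
        std::ostringstream OS;
        for (int I = 0; I < NumDevices; ++I) {
          OS << "Device (" << I << "):\n";
          if (!Devices[I].printDeviceInfo())
            OS << "    print_device_info not implemented\n";
          OS << "\n";
        }
        return OS.str();
      }
      ```

      Keeping the hook optional means existing plugins continue to load
      unchanged. Only plugins that opt in, such as CUDA in this patch,
      produce detailed output.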
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D106751