  1. Dec 27, 2017
  2. Dec 24, 2017
  3. Dec 22, 2017
  4. Dec 21, 2017
  5. Dec 18, 2017
  6. Dec 13, 2017
  7. Dec 08, 2017
    • Use hyperbarrier by default on all architectures · e628ab4c
      Jonas Hahnfeld authored
      All architectures except x86_64 used the linear barrier
      implementation by default, which performs poorly with a larger
      number of threads.
      
      Improvements for PARALLEL overhead (EPCC) with this patch on a Power8
      system (2 sockets x 10 cores x 8 threads, OMP_PLACES=cores):
      
       20 threads:  4.55us -> 3.49us
       40 threads:  8.84us -> 4.06us
       80 threads: 19.18us -> 4.74us
      160 threads: 54.22us -> 6.73us
      
      Differential Revision: https://reviews.llvm.org/D40358
      
      llvm-svn: 320152
    • Fix thread affinity on non-x86 Linux · ce528acf
      Jonas Hahnfeld authored
      To make thread affinity work according to the OpenMP spec, the
      runtime needs information about the hardware topology. On Linux,
      the default way is to parse /proc/cpuinfo, which contains this
      information for x86 machines but (at least) not for AArch64 and
      Power architectures.
      
      Fortunately, a different code path can get that data from sysfs.
      The required kernel patch landed in 2006 for Linux 2.6.16, which
      is safe to assume nowadays (even RHEL 5 had a kernel derived from
      2.6.18, and we are now at RHEL 7!).
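
      As an illustration of reading topology data from sysfs, here is a
      minimal sketch. It assumes the standard per-CPU topology
      attributes (core_id, physical_package_id) under
      /sys/devices/system/cpu; the commit message does not name the
      files the runtime reads, and read_topology_attr is a hypothetical
      helper, not a runtime function.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads one integer topology attribute of a CPU from
 * sysfs. Returns -1 if the file is missing or cannot be parsed. */
int read_topology_attr(int cpu, const char *attr) {
    char path[128];
    int value = -1;

    assert(attr != NULL);
    /* Assumed layout: /sys/devices/system/cpu/cpuN/topology/<attr> */
    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/topology/%s", cpu, attr);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

      Calling read_topology_attr(0, "core_id") and
      read_topology_attr(0, "physical_package_id") would give the core
      and socket of CPU 0 on systems that expose these files.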
      
      Differential Revision: https://reviews.llvm.org/D40357
      
      llvm-svn: 320151
    • Add missing memory barrier for queuing locks · 86c30782
      Jonas Hahnfeld authored
      Without the barrier, I see hangs in the omp_single_copyprivate
      test when compiling in release mode. With debug assertions
      enabled, the assertion `head > 0 && tail > 0` fails.
      
      Differential Revision: https://reviews.llvm.org/D40722
      
      llvm-svn: 320150
  8. Dec 06, 2017
  9. Dec 05, 2017
    • Fix alignment in teams-reduction.c test · 241d1d9e
      Jonas Hahnfeld authored
      The runtime uses the global kmp_critical_name as a lock and
      tries to atomically store a pointer in it. This fails if the
      global is only aligned to 4 bytes, the size of one int32_t
      element. Use a union to ensure the global is aligned to the size
      of a pointer on the current platform.
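
      The union trick can be sketched as follows. This is an
      illustration, not the code from the test: the type name
      aligned_critical_name, the array length of 8, and the helper
      is_pointer_aligned are assumptions made for the example.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical stand-in for the lock name. A plain int32_t array is only
 * guaranteed 4-byte alignment; the pointer member raises the alignment of
 * the whole union to that of a pointer. */
typedef union {
    void *ptr;       /* present only to force pointer alignment */
    int32_t data[8]; /* assumed payload size for this sketch */
} aligned_critical_name;

aligned_critical_name crit;

/* Checked at compile time: the union is at least pointer-aligned. */
static_assert(alignof(aligned_critical_name) >= alignof(void *),
              "union must be aligned like a pointer");

/* Returns 1 if p is suitably aligned for storing a pointer. */
int is_pointer_aligned(const void *p) {
    return ((uintptr_t)p % alignof(void *)) == 0;
}
```

      With this layout, an atomic pointer store into crit does not
      depend on where the linker happens to place an int32_t array.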
      
      llvm-svn: 319811
    • Fix PR30890: Reduction across teams hangs · a4ca525c
      Jonas Hahnfeld authored
      __kmpc_reduce_nowait() correctly swapped the teams for reductions
      in a teams construct. Apply the same logic to __kmpc_reduce() and
      __kmpc_reduce_end().
      
      Differential Revision: https://reviews.llvm.org/D40753
      
      llvm-svn: 319788
  10. Nov 30, 2017
  11. Nov 29, 2017
  12. Nov 25, 2017
  13. Nov 22, 2017
  14. Nov 20, 2017
  15. Nov 17, 2017
    • [OMPT] Fix inaccuracies in worksharing tests · 0924094e
      Jonas Hahnfeld authored
      These tests failed occasionally on my MacBook when there was some
      activity in the background (roughly one in a thousand executions).
      
       * sections.c missed the sorting based on thread ids. This worked
         as long as the master thread finished its section before the
         worker thread started the second one but failed if the master
         thread was put to sleep by the OS.
       * The checks in single.c assumed that the master thread executes
         the single region which works most of the time because it is
         usually faster than the newly spawned worker thread.
      
      Differential Revision: https://reviews.llvm.org/D39853
      
      llvm-svn: 318527
  16. Nov 16, 2017
  17. Nov 11, 2017
    • [OMPT] Provide initialization for Mac OS X · d0ef19ef
      Jonas Hahnfeld authored
      Traditionally, the library had a weak symbol for ompt_start_tool()
      that served as a fallback and disabled OMPT if called. Tools could
      provide their own version and replace the default implementation
      to register callbacks and look up functions. This mechanism has
      worked reasonably well on Linux systems, where this interface was
      initially developed.
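
      The weak-symbol fallback can be sketched like this, assuming a
      GCC- or Clang-compatible compiler. The names example_start_tool,
      example_tool_result and example_runtime_init are hypothetical and
      do not reproduce the real OMPT types or signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type; the real OMPT interface defines its own. */
typedef struct example_tool_result {
    int (*initialize)(void); /* returns nonzero on success */
} example_tool_result;

/* Fallback in the runtime library. It is weak, so a strong definition of
 * the same symbol elsewhere in the link replaces it. Returning NULL means
 * "no tool present". */
__attribute__((weak)) example_tool_result *
example_start_tool(unsigned int version) {
    (void)version;
    return NULL;
}

/* Runtime-side initialization: returns 1 if a tool registered itself. */
int example_runtime_init(unsigned int version) {
    example_tool_result *result = example_start_tool(version);
    if (result == NULL)
        return 0; /* fallback was called: tool support stays disabled */
    assert(result->initialize != NULL); /* a tool must supply a callback */
    return result->initialize() != 0;
}
```

      A tool would supply a non-weak example_start_tool() returning a
      populated result; with no tool linked in, the fallback runs and
      example_runtime_init() returns 0.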
      
      On Darwin / Mac OS X the situation is a bit more complicated, and
      the weak symbol doesn't work out of the box. In my tests, the
      library with the tool needed to link against the OpenMP runtime
      to make the process work. This would effectively mean that a tool
      has to choose a runtime library, whereas one design goal of the
      interface was to allow tools that are agnostic of the runtime.
      
      The solution is to use dlsym() with the argument RTLD_DEFAULT so
      that static implementations of ompt_start_tool() are found in the
      main executable. This works because the linker on Mac OS X includes
      all symbols of an executable in the global symbol table by default.
      To use the same code path on Linux, the application would need to
      be built with -Wl,--export-dynamic. To avoid this restriction, we
      continue to use weak symbols on Linux systems as before.
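
      The dlsym() lookup described above can be sketched as follows.
      The helper find_global_symbol is hypothetical; defining
      _GNU_SOURCE is an assumption needed for RTLD_DEFAULT on glibc,
      and older glibc versions also need -ldl at link time.

```c
#define _GNU_SOURCE /* assumption: exposes RTLD_DEFAULT on glibc */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: searches the global symbol scope of the process,
 * in default library search order, for the given name. Returns NULL if
 * no such symbol is visible. */
void *find_global_symbol(const char *name) {
    assert(name != NULL);
    return dlsym(RTLD_DEFAULT, name);
}
```

      A runtime would look up "ompt_start_tool" this way and convert
      the result to the matching function pointer type before calling
      it. On Linux the lookup only sees symbols of the main executable
      if it was linked with -Wl,--export-dynamic, which is the
      restriction mentioned above.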
      
      Finally, this patch extends the existing test to cover all possible
      ways of initializing the tool as described by the standard. It
      also fixes ompt_finalize() to not call omp_get_thread_num() when
      the library is shut down, which resulted in hangs on Darwin.
      The changes have been tested on Linux to make sure that they pass
      the current tests as well as the newly extended one.
      
      Differential Revision: https://reviews.llvm.org/D39801
      
      llvm-svn: 317980
  18. Nov 10, 2017
  19. Nov 09, 2017