  1. Dec 08, 2016
  2. Nov 21, 2016
  3. Nov 14, 2016
    • Update stats-gathering code · 5375fe82
      Jonathan Peyton authored
      Have developer timers use the partitioning scheme, which also required that
      some redundant developer timers be removed in favor of the already existing
      normal timers. Move per-thread stats initialization to just after global
      thread id assignment, which is as early as possible. Also put all global stats
      initialization code in __kmp_stats_init() and all global stats destruction code
      in __kmp_stats_fini().
      
      Differential Revision: https://reviews.llvm.org/D26361
      
      llvm-svn: 286892
      5375fe82
    • Introduce dynamic affinity dispatch capabilities · 1cdd87ad
      Jonathan Peyton authored
      This set of changes enables the affinity interface (either the preexisting
      native operating system interface or HWLOC) to be selected dynamically at
      runtime initialization. The motivation is that we were seeing performance
      degradations when using HWLOC. This change lets the user fall back to the old
      affinity mechanisms, which on large machines (>64 cores) makes a large
      difference in initialization time.
      
      These changes mostly move affinity code under a small class hierarchy:
      
      KMPAffinity
        class Mask {}
      KMPNativeAffinity : public KMPAffinity
        class Mask : public KMPAffinity::Mask
      KMPHwlocAffinity : public KMPAffinity
        class Mask : public KMPAffinity::Mask
      
      Since all interface functions (for both affinity and the mask implementation)
      are virtual, the implementation can be chosen at runtime initialization.
      
      Differential Revision: https://reviews.llvm.org/D26356
      
      llvm-svn: 286890
      1cdd87ad
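As a rough illustration of the runtime-selected affinity interface described in 1cdd87ad, the sketch below shows a base class with virtual methods and two implementations, one of which is chosen once at initialization. All names here (`AffinityBackend`, `NativeBackend`, `HwlocBackend`, `pick_backend`, `describe`) are hypothetical stand-ins, not the runtime's actual classes, and the boolean switch is an assumption about how the choice might be expressed.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins for the hierarchy named in the commit message;
// the real runtime classes expose many more virtual methods.
class AffinityBackend {
public:
  virtual ~AffinityBackend() = default;
  virtual std::string name() const = 0;
};

class NativeBackend : public AffinityBackend {
public:
  std::string name() const override { return "native"; }
};

class HwlocBackend : public AffinityBackend {
public:
  std::string name() const override { return "hwloc"; }
};

// Pick the implementation once, at initialization. The boolean stands in for
// whatever setting the runtime actually consults (an assumption here).
std::unique_ptr<AffinityBackend> pick_backend(bool use_hwloc) {
  if (use_hwloc)
    return std::unique_ptr<AffinityBackend>(new HwlocBackend());
  return std::unique_ptr<AffinityBackend>(new NativeBackend());
}

// Callers only see the base class, so they never need to know which
// implementation was selected.
std::string describe(const AffinityBackend *backend) {
  assert(backend != nullptr);
  return "affinity via " + backend->name();
}
```

Because every call goes through the base-class pointer, switching implementations costs one decision at startup rather than a rebuild.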
  4. Nov 07, 2016
    • [OpenMP] Enable ThreadSanitizer to check OpenMP programs · 50fed047
      Jonas Hahnfeld authored
      This patch allows ThreadSanitizer (Tsan) to verify OpenMP programs.
      This means Tsan will not report false positives when verifying an OpenMP
      program.
      This patch introduces annotations within the OpenMP runtime module to
      provide information about thread synchronization to the Tsan runtime.
      
      In order to enable Tsan support when building the runtime, you must
      enable the TSAN_SUPPORT option by passing the following definition to CMake:
      
      -DLIBOMP_TSAN_SUPPORT=TRUE
      
      The annotations will be enabled in the main shared library
      (the same mechanism as OMPT).
      
      Patch by Simone Atzeni and Joachim Protze!
      
      Differential Revision: https://reviews.llvm.org/D13072
      
      llvm-svn: 286115
      50fed047
  5. Oct 27, 2016
  6. Oct 26, 2016
  7. Oct 20, 2016
    • [OpenMP] Fix issue with directives used in a macro. · 33515191
      Samuel Antao authored
      Summary:
      If directives are used in a macro, clang complains with:
      ```
      src/projects/openmp/runtime/src/kmp_runtime.c:7486:2: error: embedding a directive within macro arguments has undefined behavior [-Werror,-Wembedded-directive]
      #if KMP_USE_MONITOR
      ```
      
      This patch fixes two occurrences of the issue in `kmp_runtime.cpp`.
      
      Reviewers: tlwilmar, jlpeyton, AndreyChurbanov, Hahnfeld
      
      Subscribers: Hahnfeld, openmp-commits
      
      Differential Revision: https://reviews.llvm.org/D25823
      
      llvm-svn: 284728
      33515191
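The warning fixed in 33515191 arises when a preprocessor conditional appears inside the argument list of a function-like macro. The sketch below shows one common way to restructure such code: hoist the conditional so that it wraps whole macro invocations. `USE_MONITOR` and `LOG_SETTINGS` are hypothetical stand-ins, and this is not necessarily how the patch itself rewrote the two occurrences.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins: LOG_SETTINGS plays the role of a function-like
// macro, and USE_MONITOR plays the role of KMP_USE_MONITOR.
#define USE_MONITOR 0
#define LOG_SETTINGS(a, b) (std::string(a) + "," + std::string(b))

// Problematic form (undefined behavior, an error under -Werror):
//
//   LOG_SETTINGS("blocktime",
//   #if USE_MONITOR
//                "monitor"
//   #else
//                "no-monitor"
//   #endif
//   );

// Portable form: the conditional selects between complete macro invocations.
std::string settings() {
#if USE_MONITOR
  std::string s = LOG_SETTINGS("blocktime", "monitor");
#else
  std::string s = LOG_SETTINGS("blocktime", "no-monitor");
#endif
  assert(!s.empty());
  return s;
}
```

The duplication of the invocation is the price of keeping directives out of macro arguments; when the argument list is long, a small helper macro or variable for the varying argument keeps it manageable.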
  8. Oct 07, 2016
  9. Sep 30, 2016
  10. Sep 27, 2016
    • Disable monitor thread creation by default. · b66d1aab
      Jonathan Peyton authored
      This change set disables creation of the monitor thread by default.  The global
      counter maintained by the monitor thread was replaced by logic that uses system
      time directly, and cyclic yielding on Linux targets was also removed since
      there was no clear benefit to using it. Setting the KMP_USE_MONITOR variable
      to 1 re-enables creation of the monitor thread if it is really needed.
      
      Differential Revision: https://reviews.llvm.org/D24739
      
      llvm-svn: 282507
      b66d1aab
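The idea in b66d1aab of replacing a monitor-maintained counter with direct use of system time can be sketched as follows: a waiting thread records a deadline and compares it against the clock itself, so no helper thread has to tick a shared counter. `SpinDeadline` and the choice of `std::chrono::steady_clock` are assumptions made for illustration; the runtime's actual timing logic is not shown in the commit message.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: remembers when a thread's spin budget runs out and
// decides, from the clock alone, whether that point has been reached.
struct SpinDeadline {
  Clock::time_point deadline;

  SpinDeadline(Clock::time_point start, std::chrono::milliseconds budget)
      : deadline(start + budget) {
    assert(budget.count() >= 0);
  }

  // True once `now` has reached the deadline; no helper thread is involved.
  bool expired(Clock::time_point now) const { return now >= deadline; }
};
```

A waiting loop would call `expired(Clock::now())` between spins and fall back to sleeping once it returns true.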
  11. Sep 14, 2016
    • [OMPT] fix task frame information for gomp interface · 848d6906
      Jonas Hahnfeld authored
      Previous differentials D23305-D23310 changed task frame information management only for the kmp interface, but not for the whole gomp interface. This broke some test cases when building with gcc.
      This patch fixes the broken task frame information for the gomp interface.
      
      Patch by Joachim Protze!
      
      Differential Revision: https://reviews.llvm.org/D24502
      
      llvm-svn: 281468
      848d6906
    • [OMPT] Reset task exit frame when execution is finished · 8a27064e
      Jonas Hahnfeld authored
      The exit address is set when execution of a task is started and should be reset as soon as the execution is finished.
      Especially for the asm implementation of __kmp_invoke_microtask, resetting in this call would be painful, so reset just after the invocation.
      
      The testcase shows the effect of this patch:
      Before, the implicit barriers at the end of an implicit task would see an exit address for the implicit task.
      
      This barrier is a task scheduling point. Thus, any explicit task scheduled there would see an exit, but no reenter address for the implicit task.
      
      Patch by Joachim Protze!
      
      Differential Revision: https://reviews.llvm.org/D23307
      
      llvm-svn: 281465
      8a27064e
    • [OMPT] Align implementation of reenter frame address to latest (frozen) version of OMPT spec · fd0614d8
      Jonas Hahnfeld authored
      The latest OMPT spec changed the semantics of a task's reenter frame to be the application frame that will be entered when the runtime frame drops.
      Before, it was the last frame in the runtime. This doesn't work for some gcc execution paths or even clang generated code for :
      Since there is no runtime frame between the executed task and the encountering task.
      
      The test case compares exit and reenter addresses against addresses captured in application code
      
      Patch by Joachim Protze!
      
      Differential Revision: https://reviews.llvm.org/D23305
      
      llvm-svn: 281464
      fd0614d8
  12. Sep 09, 2016
  13. Sep 02, 2016
  14. Aug 11, 2016
  15. Jul 08, 2016
  16. Jul 04, 2016
  17. Jul 01, 2016
  18. Jun 21, 2016
  19. Jun 16, 2016
    • Bug fix: crash if teams executed on host · 7cf08d42
      Jonathan Peyton authored
      Added an argv array check/allocation for a parallel construct nested directly
      inside the teams construct, because the upcoming Fortran codegen passes
      parameters directly into kmpc_fork_call that are missing from kmpc_fork_teams
      (earlier codegen passed to parallel a subset of the parameters passed to teams,
      and thus no check/allocation was needed).
      
      Patch by Andrey Churbanov
      
      Differential Revision: http://reviews.llvm.org/D21336
      
      llvm-svn: 272935
      7cf08d42
  20. Jun 14, 2016
    • Renaming change: 41 -> 45 and 4.1 -> 4.5 · df6818be
      Jonathan Peyton authored
      OpenMP 4.1 is now OpenMP 4.5.  Any mention of 41 or 4.1 is replaced with
      45 or 4.5.  Also, if the CMake option LIBOMP_OMP_VERSION is 41, CMake warns that
      41 is deprecated and to use 45 instead.
      
      llvm-svn: 272687
      df6818be
  21. Jun 13, 2016
  22. May 31, 2016
    • Offer API for setting number of loop dispatch buffers · 067325f9
      Jonathan Peyton authored
      The problem is the lack of dispatch buffers when thousands of loops with nowait,
      about 10 iterations each, are executed by hundreds of threads. Only 7 dispatch
      buffers are built in, but dozens or hundreds of buffers are needed.
      
      The problem can be fixed by setting KMP_MAX_DISP_BUF to a bigger value. In order
      to give users the same possibility, I changed the build-time control into a
      run-time one, adding an API just in case.
      
      This change adds an environment variable KMP_DISP_NUM_BUFFERS and a new API
      function kmp_set_disp_num_buffers(int num_buffers).
      
      The KMP_DISP_NUM_BUFFERS environment variable works only before serial
      initialization, because during serial initialization we already allocate
      buffers for the hot team, so it is too late to change the number of buffers
      later (or we would need to reallocate buffers for all teams, which sounds too
      complicated). The kmp_set_defaults() routine does not work for this
      environment variable, because it calls serial initialization before reading
      the parameter string. So a new routine, kmp_set_disp_num_buffers(), is created
      so that it can set our internal global variable before the library
      initialization. If both the environment variable and the API are used, the
      environment variable wins.
      
      Differential Revision: http://reviews.llvm.org/D20697
      
      llvm-svn: 271318
      067325f9
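The precedence rule stated in 067325f9 (environment variable over API call, with the built-in count as the fallback) can be modelled in a few lines. `resolve_num_buffers` is a hypothetical helper written for illustration, not a runtime function, and treating non-positive or unparsable values as "unset" is an assumption.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical model of the precedence rule; this is not the runtime's code.
// `env_text` is the raw KMP_DISP_NUM_BUFFERS value (nullptr when unset) and
// `api_value` is what kmp_set_disp_num_buffers() received (0 if never called).
int resolve_num_buffers(const char *env_text, int api_value) {
  assert(api_value >= 0);
  const int builtin_default = 7; // the built-in count mentioned in the commit
  if (env_text != nullptr) {
    int from_env = std::atoi(env_text);
    if (from_env > 0)
      return from_env; // the environment variable wins over the API
  }
  if (api_value > 0)
    return api_value;
  return builtin_default;
}
```

In a real program the API route would be a call to `kmp_set_disp_num_buffers(n)` placed before the first OpenMP construct, since the commit message notes that the buffers are allocated during serial initialization.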
  23. May 26, 2016
    • Fix for OMP_PROC_BIND=spread strategy · 7ba9baef
      Jonathan Peyton authored
      The OMP_PROC_BIND=spread strategy fails to assign the master thread the
      correct place partition after the first parallel region. Other threads in the
      hot team will remember their place_partition, but the master's place partition
      is restored to what it was before entering the parallel region. So when the hot
      team is used for subsequent parallel regions, the master has lost this info.
      This fix calls __kmp_partition_places to update only the master thread's place
      partition in the spread case when there are no other changes to the hot team.
      
      Patch by Terry Wilmarth
      
      Differential Revision: http://reviews.llvm.org/D20539
      
      llvm-svn: 270890
      7ba9baef
    • Make LIBOMP_USE_ITT_NOTIFY a setting that can be enabled or disabled · 7abf9d59
      Jonathan Peyton authored
      On Blue Gene/Q, having LIBOMP_USE_ITT_NOTIFY support compiled into a
      statically-linked binary causes a failure at runtime because dlopen fails.
      This patch changes LIBOMP_USE_ITT_NOTIFY to a cacheable configuration setting
      that can be disabled.
      
      Patch by John Mellor-Crummey
      
      Differential Revision: http://reviews.llvm.org/D20517
      
      llvm-svn: 270884
      7abf9d59
  24. May 23, 2016
  25. May 20, 2016
  26. May 12, 2016
    • Fix team reuse with foreign threads · 2b749b33
      Jonathan Peyton authored
      After hot teams were enabled by default, the library started using levels kept
      in the team structure. The levels are broken when a foreign thread exits and
      puts its team into the pool, which is then re-used by another foreign thread.
      The broken behavior observed is when printing the levels for each new team, one
      gets 1, 2, 1, 2, 1, 2, etc. This makes the library believe that every other
      team is nested which is incorrect. What is wanted is for the levels to be
      1, 1, 1, etc.
      
      Differential Revision: http://reviews.llvm.org/D19980
      
      llvm-svn: 269363
      2b749b33
  27. May 05, 2016
    • [STATS] Use partitioned timer scheme · 11dc82fa
      Jonathan Peyton authored
      This change replaces the current timers with ones that partition time properly.
      The current timers are nested, so that if a new timer, B, starts when the
      current timer, A, is already timing, A's time will include B's. To eliminate
      this problem, the partitioned timers are designed to stop the current timer (A),
      let the new timer run (B), and when the new timer is finished, restart the
      previously running timer (A). With this partitioning of time, a thread's timers
      all sum up to the OMP_worker_thread_life time and can now easily show the
      percentage of time a thread is spending in different parts of the runtime or
      user code.
      
      There is also a new state variable associated with each thread which tells where
      it is executing a task. This corresponds with the timers: OMP_task_*, e.g., if
      time is spent in OMP_task_taskwait, then that thread executed tasks inside a
      #pragma omp taskwait construct.
      
      The changes mostly consist of switching the macros to use the new PARTITIONED_*
      macros, the new partitionedTimers class and its methods, and new state logic.
      
      Differential Revision: http://reviews.llvm.org/D19229
      
      llvm-svn: 268640
      11dc82fa
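The stop/run/restart behavior described in 11dc82fa can be sketched with a small stack of timers in which only the top entry accumulates time. This is a simplified model: the class name `PartitionedTimers`, its methods, and the use of explicit integer ticks instead of a real clock are all assumptions made so the arithmetic is easy to follow.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch of partitioned timing. Time is passed in explicitly
// (as plain ticks) so the behavior is deterministic.
class PartitionedTimers {
  std::vector<std::string> stack_;    // the running timer is at the back
  std::map<std::string, long> total_; // exclusive ticks per timer
  long last_ = 0;                     // tick of the most recent switch

  // Charge the ticks since the last switch to whichever timer is running.
  void charge(long now) {
    assert(now >= last_);
    if (!stack_.empty())
      total_[stack_.back()] += now - last_;
    last_ = now;
  }

public:
  // Pause the running timer (if any) and start `name`.
  void push(const std::string &name, long now) {
    charge(now);
    stack_.push_back(name);
  }

  // Stop the running timer and resume the one it interrupted.
  void pop(long now) {
    assert(!stack_.empty());
    charge(now);
    stack_.pop_back();
  }

  // Exclusive time recorded for `name`, or 0 if it never ran.
  long total(const std::string &name) const {
    auto it = total_.find(name);
    return it == total_.end() ? 0 : it->second;
  }
};
```

For example, starting A at tick 0, starting B at tick 10, stopping B at tick 25 and stopping A at tick 40 leaves A with 25 ticks and B with 15, which together account for the full 40 ticks. With nested timers, A would instead have reported all 40.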
  28. Apr 19, 2016
  29. Apr 18, 2016
    • Fix for pthread_setspecific (TLS and shutdown) problem · f252010f
      Jonathan Peyton authored
      Some codes that use TLS fail intermittently because one thread tries to write
      TLS values after the TLS key has been destroyed by another thread. This happens
      when one thread executes library shutdown (and destroys TLS keys), while another
      thread starts to execute the TLS key destructor routine. Before this change, the
      kmp_init_runtime flag was checked before calling pthread_* TLS functions, but
      this flag is set to FALSE later than the destruction of the TLS keys, which
      leads to failure. The fix is to check kmp_init_gtid instead, as this flag is
      unset *before* the destruction of TLS keys.
      
      Differential Revision: http://reviews.llvm.org/D19022
      
      llvm-svn: 266674
      f252010f
  30. Mar 29, 2016
  31. Mar 02, 2016
    • Add new OpenMP 4.5 doacross loop nest feature · 71909c57
      Jonathan Peyton authored
      From the standard: A doacross loop nest is a loop nest that has cross-iteration
      dependence. An iteration is dependent on one or more lexicographically earlier
      iterations. The ordered clause parameter on a loop directive identifies the
      loop(s) associated with the doacross loop nest.
      
      The init/fini routines allocate/free doacross buffer(s) for each loop for each
      thread.  The wait routine waits for a flag designated by the dependence vector.
      The post routine sets the flag designated by the current iteration vector.  We
      use a similar technique of shared buffer indices that covers up to 7 nowait
      loops executed simultaneously by different threads (the number 7 has no real
      meaning; it is just a heuristic value).  Also, the sizes of structures are kept
      intact by shrinking dummy arrays.
      
      This needs to be put into the OpenMP runtime library in order for the compiler
      team to develop the compiler side of the implementation.
      
      Differential Revision: http://reviews.llvm.org/D17399
      
      llvm-svn: 262532
      71909c57
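For readers unfamiliar with the user-facing side of the feature added in 71909c57, the sketch below shows a one-dimensional doacross loop in OpenMP 4.5 syntax. The `depend(sink: ...)` clause corresponds to the wait on an earlier iteration and `depend(source)` to the post for the current one. The function `prefix_sums` is a hypothetical example, and how a given compiler lowers these pragmas onto the runtime routines is not shown here.

```cpp
#include <cassert>
#include <vector>

// Prefix sums written as a doacross loop: iteration i needs the value
// produced by iteration i - 1. When OpenMP is not enabled the pragmas are
// ignored and the loop simply runs in order, giving the same result.
std::vector<long> prefix_sums(int n) {
  assert(n >= 1);
  std::vector<long> a(n, 0);
  #pragma omp parallel for ordered(1)
  for (int i = 1; i < n; ++i) {
    // Wait until iteration i - 1 has posted its completion.
    #pragma omp ordered depend(sink : i - 1)
    a[i] = a[i - 1] + i;
    // Post: iteration i is finished, so dependents may proceed.
    #pragma omp ordered depend(source)
  }
  return a;
}
```

The sink for the first iteration refers to an iteration outside the loop's range, which OpenMP treats as already satisfied, so no special case is needed.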
  32. Feb 25, 2016
    • Add new OpenMP 4.5 affinity API · 2f7c077b
      Jonathan Peyton authored
      This change introduces the new OpenMP 4.5 affinity API surrounding
      OpenMP Places. There are six new entry points:
      
      Typically called in serial region:
       * omp_get_num_places - returns the number of places available to the execution
             environment in the place list.
       * omp_get_place_num_procs - returns the number of processors available to the
             execution environment in the specified place.
       * omp_get_place_proc_ids - returns the numerical identifiers of the processors
             available to the execution environment in the specified place.
      
      Typically called inside parallel region:
       * omp_get_place_num - returns the place number of the place to which the
             encountering thread is bound.
       * omp_get_partition_num_places - returns the number of places in the place
             partition of the innermost implicit task.
       * omp_get_partition_place_nums - returns the list of place numbers
             corresponding to the places in the place-var ICV of the innermost
             implicit task.
      
      Differential Revision: http://reviews.llvm.org/D17417
      
      llvm-svn: 261915
      2f7c077b
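A short sketch of how the three serial-region entry points listed in 2f7c077b might be combined to enumerate the place list. The helper `list_places` is hypothetical. The fallback stubs exist only so the sketch compiles when OpenMP is not enabled, and their "no places" behavior is an assumption, not something the runtime defines.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Fallback stubs for builds without OpenMP (assumption: report no places).
static int omp_get_num_places() { return 0; }
static int omp_get_place_num_procs(int) { return 0; }
static void omp_get_place_proc_ids(int, int *) {}
#endif

// Collect the processor ids of every place in the place list.
std::vector<std::vector<int>> list_places() {
  std::vector<std::vector<int>> places;
  int num_places = omp_get_num_places();
  for (int p = 0; p < num_places; ++p) {
    int num_procs = omp_get_place_num_procs(p);
    assert(num_procs >= 0);
    std::vector<int> ids(num_procs);
    if (num_procs > 0)
      omp_get_place_proc_ids(p, ids.data()); // fills num_procs entries
    places.push_back(ids);
  }
  return places;
}
```

The place list is typically non-empty only when places have been defined, for example through OMP_PLACES or OMP_PROC_BIND, so an empty result is a normal outcome.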
  33. Feb 09, 2016