  1. May 31, 2016
    • Offer API for setting number of loop dispatch buffers · 067325f9
      Jonathan Peyton authored
      The problem is the lack of dispatch buffers when thousands of loops with nowait,
      about 10 iterations each, are executed by hundreds of threads. Only 7 dispatch
      buffers are built in, but such codes need dozens or hundreds of buffers.
      
      The problem can be fixed by setting KMP_MAX_DISP_BUF to a bigger value at build
      time. To give users the same possibility, I changed the build-time control into a
      run-time one and added an API just in case.
      
      This change adds an environment variable KMP_DISP_NUM_BUFFERS and a new API
      function kmp_set_disp_num_buffers(int num_buffers).
      
      The KMP_DISP_NUM_BUFFERS environment variable works only before serial
      initialization, because serial initialization already allocates buffers for the
      hot team. Changing the number of buffers later would require reallocating
      buffers for all teams, which sounds too complicated. The kmp_set_defaults()
      routine does not work for this environment variable, because it calls serial
      initialization before reading the parameter string. So a new routine,
      kmp_set_disp_num_buffers(), is created so that it can set our internal global
      variable before the library initialization. If both the environment variable
      and the API are used, the environment variable wins.
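      A minimal model of the precedence rules above follows. It is written against
      hypothetical names (disp_num_buffers, set_disp_num_buffers(), serial_initialize())
      rather than the real libomp internals. The API call only takes effect before
      serial initialization. The environment variable is read during serial
      initialization and overrides any value set through the API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the runtime's global and its init flag. */
static int disp_num_buffers = 7; /* built-in default */
static bool serial_initialized = false;

/* API path: ignored once serial initialization has allocated buffers. */
void set_disp_num_buffers(int num_buffers) {
  if (!serial_initialized && num_buffers > 0)
    disp_num_buffers = num_buffers;
}

/* Serial initialization: a valid KMP_DISP_NUM_BUFFERS value overrides the
   API value, then the count is frozen (buffers for the hot team would be
   allocated at this point). */
void serial_initialize(void) {
  const char *env = getenv("KMP_DISP_NUM_BUFFERS");
  if (env != NULL) {
    int n = atoi(env);
    if (n > 0)
      disp_num_buffers = n;
  }
  assert(disp_num_buffers > 0);
  serial_initialized = true;
}
```

      In a real program, the corresponding call is kmp_set_disp_num_buffers(). It has
      to be made before the library initializes, i.e. before the first OpenMP
      construct runs.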
      
      Differential Revision: http://reviews.llvm.org/D20697
      
      llvm-svn: 271318
  2. May 27, 2016
  3. May 26, 2016
    • Fix for OMP_PROC_BIND=spread strategy · 7ba9baef
      Jonathan Peyton authored
      The OMP_PROC_BIND=spread strategy fails to assign the master thread the
      correct place partition after the first parallel region. Other threads in the
      hot team will remember their place_partition, but the master's place partition
      is restored to what it was before entering the parallel region. So when the hot
      team is used for subsequent parallel regions, the master has lost this info.
      This fix calls __kmp_partition_places to update only the master thread's place
      partition in the spread case when there are no other changes to the hot team.
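      A sketch can make "place partition" concrete. Consider OMP_PROC_BIND=spread
      with no more threads than places. The OpenMP specification then splits the
      parent's P places into T consecutive subpartitions of floor(P/T) or ceil(P/T)
      places. The master thread gets the subpartition containing its own place.

      The sketch below makes two assumptions of its own, not necessarily libomp's.
      First, the master starts on place 0, so thread i gets subpartition i. Second,
      the leftover places go to the first subpartitions. With 8 places and 4
      threads, the master should keep places 0-1 in every region that reuses the hot
      team, instead of reverting to the full 0-7 range.

```c
#include <assert.h>

/* Inclusive range of place indices owned by one thread. */
typedef struct {
  int first_place;
  int last_place;
} partition_t;

/* Split places [0, num_places) into num_threads consecutive subpartitions
   (spread, num_threads <= num_places). The first num_places % num_threads
   subpartitions receive one extra place. */
void spread_partition(int num_places, int num_threads, partition_t *out) {
  assert(num_threads > 0 && num_threads <= num_places);
  int base = num_places / num_threads;
  int extra = num_places % num_threads;
  int next = 0;
  for (int i = 0; i < num_threads; i++) {
    int size = base + (i < extra ? 1 : 0);
    out[i].first_place = next;
    out[i].last_place = next + size - 1;
    next += size;
  }
}
```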
      
      Patch by Terry Wilmarth
      
      Differential Revision: http://reviews.llvm.org/D20539
      
      llvm-svn: 270890
    • Make LIBOMP_USE_ITT_NOTIFY a setting that can be enabled or disabled · 7abf9d59
      Jonathan Peyton authored
      On Blue Gene/Q, having LIBOMP_USE_ITT_NOTIFY support compiled into a
      statically-linked binary causes a failure at runtime because dlopen fails.
      This patch changes LIBOMP_USE_ITT_NOTIFY to a cacheable configuration setting
      that can be disabled.
      
      Patch by John Mellor-Crummey
      
      Differential Revision: http://reviews.llvm.org/D20517
      
      llvm-svn: 270884
    • Add a test case for microtask dispatch with many arguments · 0a665a83
      Hal Finkel authored
      This is a cleaned-up version of the test case posted in the D19879 review.
      
      llvm-svn: 270867
    • Add an assembly __kmp_invoke_microtask for ppc64[le] · 91e19a3d
      Hal Finkel authored
      Clang no longer restricts itself to generating microtasks with a small number
      of arguments, and so an assembly implementation is required to prevent hitting
      the parameter limit present in the C implementation. This adds an
      implementation for ppc64[le].
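      Why does the C implementation have a parameter limit? Portable C cannot
      forward an argument list whose length is only known at run time. A C
      dispatcher therefore has to write out one call per argument count.

      The sketch below is a simplified model with hypothetical names, not the libomp
      source. Its microtask type is modelled on the (int *gtid, int *tid, ...) shape.
      It stops at three arguments, whereas an assembly version can copy any number of
      arguments onto the stack.

```c
#include <assert.h>
#include <stdarg.h>

/* Microtask shape: two int pointers followed by shared-variable pointers. */
typedef void (*microtask_t)(int *gtid, int *tid, ...);

/* One explicit call per argument count; returns -1 past the last case. */
int invoke_microtask(microtask_t fn, int gtid, int tid, int argc, void **argv) {
  assert(argc >= 0);
  switch (argc) {
  case 0: fn(&gtid, &tid); break;
  case 1: fn(&gtid, &tid, argv[0]); break;
  case 2: fn(&gtid, &tid, argv[0], argv[1]); break;
  case 3: fn(&gtid, &tid, argv[0], argv[1], argv[2]); break;
  default: return -1; /* the hand-written limit */
  }
  return 0;
}

/* Example microtask: sums the three ints it receives pointers to. */
static int last_sum;
static void sum3(int *gtid, int *tid, ...) {
  va_list ap;
  va_start(ap, tid);
  last_sum = 0;
  for (int i = 0; i < 3; i++)
    last_sum += *(int *)va_arg(ap, void *);
  va_end(ap);
  (void)gtid;
}
```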
      
      llvm-svn: 270821
  4. May 25, 2016
  5. May 23, 2016
  6. May 20, 2016
  7. May 18, 2016
  8. May 17, 2016
  9. May 16, 2016
    • Clean all the mess around KMP_USE_FUTEX and kmp_lock.h · fb043fdf
      Paul Osmialowski authored
      The KMP_USE_FUTEX preprocessor definition in kmp_lock.h is used
      inconsistently throughout the LLVM libomp code.
      
      * some .c files that use this define do not include kmp_lock.h,
        so the code it guards is never compiled
      * some places use architecture-dependent preprocessor
        expressions which effectively disable the use of futexes on
        AArch64; all these places should use
        '#if KMP_USE_FUTEX' instead to avoid further confusion
      * some places use KMP_HAS_FUTEX, which is not defined anywhere;
        KMP_USE_FUTEX should be used instead
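      The pitfall in the first and third bullets comes from how the preprocessor
      treats unknown identifiers. Inside #if, an undefined macro such as
      KMP_HAS_FUTEX evaluates to 0, so the guarded code silently disappears instead
      of failing to compile. The same happens to '#if KMP_USE_FUTEX' in a file that
      never includes the header defining it. In this sketch, the platform condition
      is a placeholder, not the real logic from kmp_lock.h.

```c
#include <assert.h>

/* Stand-in for kmp_lock.h: the single place that decides whether futex
   locks are used. The condition is a simplified placeholder. */
#if defined(__linux__)
#define KMP_USE_FUTEX 1
#else
#define KMP_USE_FUTEX 0
#endif

/* Correct guard: follows the one central definition. */
int futex_path_compiled(void) {
#if KMP_USE_FUTEX
  return 1;
#else
  return 0;
#endif
}

/* Wrong guard: KMP_HAS_FUTEX is never defined, so #if treats it as 0 and
   this branch is dropped on every platform. */
int misspelled_guard_compiled(void) {
#if KMP_HAS_FUTEX
  return 1;
#else
  return 0;
#endif
}

void check_guards(void) {
  assert(futex_path_compiled() == KMP_USE_FUTEX);
  assert(misspelled_guard_compiled() == 0);
}
```

      GCC and Clang can flag such uses with -Wundef, which warns when an undefined
      identifier is evaluated in #if.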
      
      Differential Revision: http://reviews.llvm.org/D19629
      
      llvm-svn: 269642
  10. May 13, 2016
  11. May 12, 2016
    • Fix team reuse with foreign threads · 2b749b33
      Jonathan Peyton authored
      After hot teams were enabled by default, the library started using levels kept
      in the team structure. The levels break when a foreign thread exits and puts
      its team into the pool, which is then reused by another foreign thread. The
      observed symptom: printing the level of each new team gives 1, 2, 1, 2, 1, 2,
      etc. This makes the library believe that every other team is nested, which is
      incorrect. The expected levels are 1, 1, 1, etc.
      
      Differential Revision: http://reviews.llvm.org/D19980
      
      llvm-svn: 269363
    • New hwloc API compatibility · 562a3c2b
      Paul Osmialowski authored
      Differential Revision: http://reviews.llvm.org/D19628
      
      llvm-svn: 269284
    • Restore NULL flag check in __kmp_null_resume_wrapper · 55acbf88
      Hal Finkel authored
      This reverts a presumably unintentional change in:
      
        r268640 - [STATS] Use partitioned timer scheme
      
      and fixes segfaults in an x86_64 debug build of the runtime library.
      
      llvm-svn: 269259
  12. May 07, 2016
  13. May 05, 2016
    • [STATS] Use partitioned timer scheme · 11dc82fa
      Jonathan Peyton authored
      This change replaces the current timers with ones that partition time properly.
      The current timers are nested, so if a new timer, B, starts while the current
      timer, A, is already timing, A's time includes B's. To eliminate this problem,
      the partitioned timers are designed to stop the current timer (A), let the new
      timer (B) run, and when the new timer is finished, restart the previously
      running timer (A). With this partitioning of time, a thread's timers all sum up
      to the OMP_worker_thread_life time. This makes it easy to show the percentage
      of time a thread spends in different parts of the runtime or in user code.
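      The push/pop discipline described above can be sketched with an explicit stack
      of timer ids. Names are hypothetical; this is not libomp's partitionedTimers
      class. Timestamps are passed in by the caller so the arithmetic is easy to
      follow. Starting timer B pauses A, stopping B resumes A, and the per-timer
      totals add up to the elapsed time.

```c
#include <assert.h>
#include <stdint.h>

enum { MAX_TIMERS = 8, MAX_DEPTH = 16 };

typedef struct {
  uint64_t total[MAX_TIMERS]; /* accumulated ticks per timer id */
  int stack[MAX_DEPTH];       /* running timer on top, paused ones below */
  int depth;
  uint64_t started_at;        /* when the top timer last (re)started */
} partitioned_timers_t;

/* Start timer `id` at `now`, charging the elapsed slice to the timer it pauses. */
void timers_push(partitioned_timers_t *t, int id, uint64_t now) {
  assert(t->depth < MAX_DEPTH && id >= 0 && id < MAX_TIMERS);
  if (t->depth > 0)
    t->total[t->stack[t->depth - 1]] += now - t->started_at;
  t->stack[t->depth++] = id;
  t->started_at = now;
}

/* Stop the running timer at `now`; the timer below it resumes from `now`. */
void timers_pop(partitioned_timers_t *t, uint64_t now) {
  assert(t->depth > 0);
  t->total[t->stack[--t->depth]] += now - t->started_at;
  t->started_at = now;
}
```

      For example, suppose A is pushed at 0, B is pushed at 3, B is popped at 7, and
      A is popped at 10. That gives A = 6 and B = 4, which sum to the 10 ticks
      elapsed.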
      
      There is also a new per-thread state variable that records where the thread is
      executing a task. It corresponds to the OMP_task_* timers; e.g., if time is
      spent in OMP_task_taskwait, then that thread executed tasks inside a
      #pragma omp taskwait construct.
      
      Most of the changes convert the timing macros to the new PARTITIONED_* macros
      and add the new partitionedTimers class with its methods and the new state logic.
      
      Differential Revision: http://reviews.llvm.org/D19229
      
      llvm-svn: 268640
  14. May 04, 2016
  15. Apr 25, 2016
  16. Apr 19, 2016
  17. Apr 18, 2016
    • Fix trip count calculation for parallel loops in runtime · 5235a1b6
      Jonathan Peyton authored
      The trip count calculation was incorrect for loops with large bounds. For example,
      for (int i = -2000000000; i < 2000000000; i += 50000000), the trip count
      calculation overflowed (it computed 2,000,000,000 + 2,000,000,000 with signed
      integers) and gave the wrong value. This patch fixes the error in the runtime by
      using unsigned integers instead. There is still a bug in the clang compiler
      component: it warns about overflow in the test case file even though there is
      none. The Intel Compiler does not have this problem, so for now the test case
      is marked XFAIL.
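      Below is a sketch of the unsigned approach for a loop of the form
      for (i = lb; i < ub; i += incr) with incr > 0. The function name is
      hypothetical, and only 32-bit induction variables are handled. Both bounds are
      converted to uint32_t before subtracting, so the difference wraps modulo 2^32.
      That yields the true distance whenever ub > lb, because the distance between
      two int32_t values always fits in 32 unsigned bits.

```c
#include <assert.h>
#include <stdint.h>

/* Trip count of: for (i = lb; i < ub; i += incr), with incr > 0. */
uint32_t trip_count(int32_t lb, int32_t ub, int32_t incr) {
  assert(incr > 0);
  if (ub <= lb)
    return 0;
  /* Modular unsigned subtraction: no signed overflow can occur. */
  uint32_t span = (uint32_t)ub - (uint32_t)lb;
  uint32_t step = (uint32_t)incr;
  /* Ceiling division without computing span + step - 1, which could wrap. */
  return span / step + (span % step != 0);
}
```

      For the example loop, the distance is 4,000,000,000 and the trip count is
      4,000,000,000 / 50,000,000 = 80.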
      
      Differential Revision: http://reviews.llvm.org/D19078
      
      llvm-svn: 266677
    • Runtime support for untied tasks · e6643daa
      Jonathan Peyton authored
      Introduced a counter of the parts of an untied task submitted for execution. The
      counter tracks whether all parts of the task have finished. The compiler should
      itself generate the re-submission of a partially executed untied task before
      exiting each task part, except for the lexically last part.
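      The counting scheme can be modelled with one atomic counter per untied task; all
      names here are hypothetical. The counter starts at 1 for the first part. Each
      compiler-generated re-submission increments it before the current part exits,
      and each finished part decrements it. The task is complete only when a
      decrement brings the counter to zero. This cannot happen early, because a
      re-submission is counted before the part that issued it finishes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical untied-task record: parts submitted but not yet finished. */
typedef struct {
  atomic_int parts_pending;
} untied_task_t;

/* The task is created with its first part already submitted. */
void untied_task_init(untied_task_t *t) { atomic_init(&t->parts_pending, 1); }

/* Re-submission issued before a non-final part exits. */
void untied_part_resubmit(untied_task_t *t) {
  atomic_fetch_add(&t->parts_pending, 1);
}

/* A part finished; returns true if it was the last outstanding part. */
bool untied_part_finished(untied_task_t *t) {
  int before = atomic_fetch_sub(&t->parts_pending, 1);
  assert(before > 0);
  return before == 1;
}
```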
      
      Differential Revision: http://reviews.llvm.org/D19026
      
      llvm-svn: 266675
    • Fix for pthread_setspecific (TLS and shutdown) problem · f252010f
      Jonathan Peyton authored
      Some codes that use TLS fail intermittently because one thread tries to write
      TLS values after the TLS key has been destroyed by another thread. This happens
      when one thread executes library shutdown (and destroys TLS keys), while another
      thread starts to execute the TLS key destructor routine. Before this change, the
      kmp_init_runtime flag was checked before calling pthread_* TLS functions, but
      this flag is set to FALSE only after the TLS keys are destroyed, which leads to
      failure. The fix is to check kmp_init_gtid instead, as this flag is unset
      *before* the TLS keys are destroyed.
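      The ordering argument can be illustrated with POSIX thread-specific data. In
      this sketch, init_gtid and the helper functions are hypothetical stand-ins for
      the runtime's flag and routines. Shutdown clears the flag before deleting the
      key. A writer that sees the flag cleared therefore skips pthread_setspecific()
      instead of using a deleted key. The sketch only shows the ordering; it does
      not by itself make a concurrent check-then-write atomic.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the runtime's TLS key and kmp_init_gtid flag. */
static pthread_key_t gtid_key;
static atomic_bool init_gtid = false;

void runtime_init(void) {
  int rc = pthread_key_create(&gtid_key, NULL);
  assert(rc == 0);
  (void)rc;
  atomic_store(&init_gtid, true);
}

/* Shutdown order matters: clear the flag first, then delete the key. */
void runtime_shutdown(void) {
  atomic_store(&init_gtid, false);
  pthread_key_delete(gtid_key);
}

/* TLS writers consult the flag that is cleared before the key goes away. */
bool set_thread_gtid(void *value) {
  if (!atomic_load(&init_gtid))
    return false; /* runtime is shutting down: skip the write */
  return pthread_setspecific(gtid_key, value) == 0;
}
```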
      
      Differential Revision: http://reviews.llvm.org/D19022
      
      llvm-svn: 266674
    • [STATS] Remove timePair class and unused functions · e2289a42
      Jonathan Peyton authored
      llvm-svn: 266634
    • [STATS] print Total_* stats on their own line · 53eca521
      Jonathan Peyton authored
      llvm-svn: 266633
  18. Apr 14, 2016
    • [ITTNOTIFY] Correct barrier imbalance time in case of tasks · 99ef4d04
      Jonathan Peyton authored
      ittnotify fix for barrier imbalance time when tasks are executed. In the current
      implementation, task execution time is included in the aggregated time on a
      barrier. This fix measures task execution time and corrects the arrive time by
      subtracting the task execution time.
      
      Since __kmp_invoke_task() can also be called outside a barrier, the field
      th.th_bar_arrive_time is used to check whether the function was called at the
      barrier (th.th_bar_arrive_time != 0). For this check to work, th_bar_arrive_time
      is reset to zero right after its value is used on the barrier.
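      Below is a small model of the correction, with hypothetical names and
      caller-supplied timestamps. A zero arrive time means the thread is not at a
      barrier, so tasks run outside a barrier leave it alone. This sketch applies the
      correction by moving the arrive time later by each task's duration. That
      removes the duration from the computed barrier wait.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-thread state; 0 means "not currently at a barrier". */
typedef struct {
  uint64_t bar_arrive_time;
} thread_info_t;

/* Record arrival at a barrier; timestamps must be nonzero. */
void barrier_arrive(thread_info_t *th, uint64_t now) {
  assert(now != 0);
  th->bar_arrive_time = now;
}

/* Called after a task ran from `start` to `end`. Only adjust the arrive
   time if the task was executed while waiting at a barrier. */
void after_task(thread_info_t *th, uint64_t start, uint64_t end) {
  if (th->bar_arrive_time != 0)
    th->bar_arrive_time += end - start;
}

/* Barrier completes: return the corrected wait and reset the sentinel. */
uint64_t barrier_release(thread_info_t *th, uint64_t now) {
  uint64_t wait = now - th->bar_arrive_time;
  th->bar_arrive_time = 0;
  return wait;
}
```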
      
      Differential Revision: http://reviews.llvm.org/D19030
      
      llvm-svn: 266332
    • Exponential back off logic for test-and-set lock · 377aa40d
      Jonathan Peyton authored
      This change adds backoff logic to the test-and-set lock to improve performance
      on contended locks. It uses a simple truncated binary exponential backoff
      function. The default backoff parameters are tuned for x86.
      
      The main backoff logic has a two-loop structure, where each loop is controlled
      by a user-level parameter:
      max_backoff - limits the number of outer loop iterations.
          This parameter should be a power of 2.
      min_ticks - the number of "ticks" in the inner spin-wait loop, which is system
          dependent and can be tuned for your system if you so choose.
          On x86 the "ticks" correspond to the time stamp counter,
          but on other architectures a tick is a timestamp derived
          from gettimeofday().
      
      The user can modify these via the environment variable:
      KMP_SPIN_BACKOFF_PARAMS=max_backoff[,min_ticks]
      Currently, since the default user lock is a queuing lock,
      one would also have to specify KMP_LOCK_KIND=tas to use the test-and-set locks.
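      Below is a sketch of a test-and-set lock with truncated binary exponential
      backoff, using C11 atomic_flag. The names are hypothetical. min_ticks is
      treated as a plain spin count rather than a time stamp counter or
      gettimeofday() reading. The step grows as 1, 3, 7, ... and is truncated by
      masking with max_backoff - 1. This is one way to see why max_backoff should be
      a power of 2.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef struct {
  uint32_t step;        /* outer-loop iterations for the next backoff */
  uint32_t max_backoff; /* cap on step; must be a power of 2 */
  uint32_t min_ticks;   /* inner-loop length; a spin count in this sketch */
} backoff_t;

/* Truncated binary exponential growth: 1, 3, 7, ..., max_backoff - 1. */
static uint32_t backoff_next_step(uint32_t step, uint32_t max_backoff) {
  return ((step << 1) | 1) & (max_backoff - 1);
}

static void backoff_spin(backoff_t *b) {
  for (uint32_t i = b->step; i > 0; i--)
    for (volatile uint32_t t = 0; t < b->min_ticks; t++) {
      /* busy wait; real code would pause and read a clock */
    }
  b->step = backoff_next_step(b->step, b->max_backoff);
}

typedef struct {
  atomic_flag held;
} tas_lock_t;

static void tas_acquire(tas_lock_t *l, uint32_t max_backoff, uint32_t min_ticks) {
  assert(max_backoff != 0 && (max_backoff & (max_backoff - 1)) == 0);
  backoff_t b = {1, max_backoff, min_ticks};
  /* Back off after every failed test-and-set. */
  while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
    backoff_spin(&b);
}

static void tas_release(tas_lock_t *l) {
  atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```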
      
      Differential Revision: http://reviews.llvm.org/D19020
      
      llvm-svn: 266329
  19. Apr 12, 2016