  1. May 05, 2016
    • [STATS] Use partitioned timer scheme · 11dc82fa
      Jonathan Peyton authored
      This change replaces the current timers with ones that partition time properly.
      The current timers are nested, so that if a new timer, B, starts when the
      current timer, A, is already timing, A's time will include B's. To eliminate
      this problem, the partitioned timers are designed to stop the current timer (A),
      let the new timer run (B), and when the new timer is finished, restart the
      previously running timer (A). With this partitioning of time, a thread's timers
      all sum up to the OMP_worker_thread_life time and can now easily show the
      percentage of time a thread is spending in different parts of the runtime or
      user code.
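
      The stop/run/restart behaviour described above can be sketched as a small
      stack of timers. This is only an illustration: the class name, method names,
      and the explicit timestamps passed in are assumptions made for the example,
      not the runtime's actual partitionedTimers interface.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <map>
      #include <string>
      #include <vector>

      // Hypothetical sketch of partitioned timing; names are illustrative and
      // timestamps are passed in explicitly so the behaviour is deterministic.
      class PartitionedTimers {
      public:
        // Start timer `name` at time `now`, pausing whichever timer was running.
        void push(const std::string &name, std::uint64_t now) {
          if (!stack_.empty())
            total_[stack_.back()] += now - last_start_;  // stop A
          stack_.push_back(name);                        // let B run
          last_start_ = now;
        }

        // Stop the running timer at time `now` and resume the one below it.
        void pop(std::uint64_t now) {
          if (stack_.empty()) return;
          total_[stack_.back()] += now - last_start_;
          stack_.pop_back();
          last_start_ = now;  // restart A, if any
        }

        // Accumulated exclusive time for `name` (0 if it never ran).
        std::uint64_t total(const std::string &name) const {
          auto it = total_.find(name);
          return it == total_.end() ? 0 : it->second;
        }

      private:
        std::vector<std::string> stack_;
        std::map<std::string, std::uint64_t> total_;
        std::uint64_t last_start_ = 0;
      };
      ```

      Because each instant is charged to exactly one timer, the per-timer totals
      add up to the length of the outermost interval.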
      
      There is also a new state variable associated with each thread which tells where
      it is executing a task. This corresponds with the timers: OMP_task_*, e.g., if
      time is spent in OMP_task_taskwait, then that thread executed tasks inside a
      #pragma omp taskwait construct.
      
      The changes mostly switch the macros over to the new PARTITIONED_* macros,
      the new partitionedTimers class and its methods, and new state logic.
      
      Differential Revision: http://reviews.llvm.org/D19229
      
      llvm-svn: 268640
      11dc82fa
  2. May 04, 2016
  3. Apr 25, 2016
  4. Apr 19, 2016
  5. Apr 18, 2016
    • Fix trip count calculation for parallel loops in runtime · 5235a1b6
      Jonathan Peyton authored
      The trip count calculation was incorrect for loops with large bounds. For example,
      for (int i = -2000000000; i < 2000000000; i += 50000000), the trip count
      calculation overflowed (trying to calculate 2,000,000,000 + 2,000,000,000 with
      signed integers) and wasn't giving the right value. This patch fixes this error
      in the runtime by using unsigned integers instead. There is still a bug in the
      clang compiler component because it warns that there is overflow in the
      test case file when there isn't. This error isn't there for the Intel Compiler.
      So for now, the test case is designated as XFAIL.
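
      A minimal sketch of the unsigned approach, assuming 32-bit loop bounds and a
      positive stride. The function name trip_count is hypothetical and this is
      not the runtime's code; it only shows why taking the difference in unsigned
      arithmetic avoids the signed overflow described above.

      ```cpp
      #include <cassert>
      #include <cstdint>

      // Trip count of `for (i = lower; i < upper; i += stride)` with stride > 0.
      // The difference is taken in unsigned arithmetic, which wraps modulo 2^32
      // instead of overflowing, so bounds that are far apart stay well defined.
      std::uint32_t trip_count(std::int32_t lower, std::int32_t upper,
                               std::int32_t stride) {
        if (stride <= 0 || lower >= upper) return 0;
        std::uint32_t span =
            static_cast<std::uint32_t>(upper) - static_cast<std::uint32_t>(lower);
        return (span - 1) / static_cast<std::uint32_t>(stride) + 1;
      }
      ```

      For the loop in the example, the span is 4,000,000,000, which does not fit
      in a signed 32-bit integer but does fit in an unsigned one, and the result
      is 80 iterations.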
      
      Differential Revision: http://reviews.llvm.org/D19078
      
      llvm-svn: 266677
      5235a1b6
    • Runtime support for untied tasks · e6643daa
      Jonathan Peyton authored
      Introduced a counter of the parts of an untied task that have been submitted
      for execution. The counter determines whether all parts of the task have
      finished. The compiler is expected to generate the re-submission of a partially
      executed untied task itself, before each task part exits, except for the
      lexically last part.
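
      One way to picture the counter is the sketch below. The struct and method
      names are hypothetical, and the exact increment/decrement points in the
      runtime may differ; the sketch only assumes that a part is counted when it
      is submitted and uncounted when it finishes.

      ```cpp
      #include <atomic>
      #include <cassert>

      // Hypothetical model of an untied task: `parts_in_flight` counts parts that
      // have been submitted but have not yet finished.
      struct UntiedTask {
        std::atomic<int> parts_in_flight{0};

        // Called when a part is (re-)submitted for execution.
        void submit_part() { parts_in_flight.fetch_add(1); }

        // Called when a part finishes; returns true only if no other part is
        // pending, i.e. the whole task is complete.
        bool finish_part() { return parts_in_flight.fetch_sub(1) - 1 == 0; }
      };
      ```

      Since every part except the last re-submits the task before it exits, the
      counter can only drop to zero when the last part finishes.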
      
      Differential Revision: http://reviews.llvm.org/D19026
      
      llvm-svn: 266675
      e6643daa
    • Fix for pthread_setspecific (TLS and shutdown) problem · f252010f
      Jonathan Peyton authored
      Some codes that use TLS fail intermittently because one thread tries to write
      TLS values after the TLS key has been destroyed by another thread. This happens
      when one thread executes library shutdown (and destroys TLS keys), while another
      thread starts to execute the TLS key destructor routine. Before this change, the
      kmp_init_runtime flag was checked before calling pthread_* TLS functions, but
      this flag is set to FALSE later than the destruction of the TLS keys, which
      leads to failure. The fix is to check kmp_init_gtid instead, as this flag is
      unset *before* the destruction of TLS keys.
      
      Differential Revision: http://reviews.llvm.org/D19022
      
      llvm-svn: 266674
      f252010f
    • [STATS] Remove timePair class and unused functions · e2289a42
      Jonathan Peyton authored
      llvm-svn: 266634
      e2289a42
    • [STATS] print Total_* stats on their own line · 53eca521
      Jonathan Peyton authored
      llvm-svn: 266633
      53eca521
  6. Apr 14, 2016
    • [ITTNOTIFY] Correct barrier imbalance time in case of tasks · 99ef4d04
      Jonathan Peyton authored
      ittnotify fix for barrier imbalance time when tasks exist. In the current
      implementation, task execution time is included in the aggregated time on a
      barrier. This fix calculates task execution time and corrects the arrive time
      by subtracting the task execution time.
      
      Since __kmp_invoke_task() can be called from places other than a barrier, the
      field th.th_bar_arrive_time is used to check whether the function was called
      at the barrier (th.th_bar_arrive_time != 0). For this check to work,
      th_bar_arrive_time is set to zero right after the value is used on the barrier.
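
      The correction can be illustrated with plain timestamps. The names below are
      hypothetical, and the sketch assumes the correction is applied by shifting
      the arrive time forward by the task's duration, which removes that duration
      from the measured wait; the runtime's actual bookkeeping may differ.

      ```cpp
      #include <cassert>
      #include <cstdint>

      // Hypothetical per-thread state; only the arrive timestamp is modelled.
      struct ThreadState {
        std::uint64_t bar_arrive_time = 0;  // 0 means "not waiting at a barrier"
      };

      // Run a task that lasts from task_begin to task_end. If the thread is at a
      // barrier, shift its arrive time so the task is not counted as waiting.
      void invoke_task(ThreadState &th, std::uint64_t task_begin,
                       std::uint64_t task_end) {
        if (th.bar_arrive_time != 0)
          th.bar_arrive_time += task_end - task_begin;
      }

      // Wait time reported at barrier release; clears the arrive time after use.
      std::uint64_t barrier_wait_time(ThreadState &th, std::uint64_t release_time) {
        std::uint64_t waited = release_time - th.bar_arrive_time;
        th.bar_arrive_time = 0;
        return waited;
      }
      ```

      A thread that arrives at time 100, runs a task from 120 to 150, and is
      released at 200 then reports 70 units of waiting instead of 100.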
      
      Differential Revision: http://reviews.llvm.org/D19030
      
      llvm-svn: 266332
      99ef4d04
    • Exponential back off logic for test-and-set lock · 377aa40d
      Jonathan Peyton authored
      This change adds back off logic in the test and set lock for better contended
      lock performance. It uses a simple truncated binary exponential back off
      function. The default back off parameters are tuned for x86.
      
      The main back off logic has a two loop structure where each is controlled by a
      user-level parameter:
      max_backoff - limits the outer loop number of iterations.
          This parameter should be a power of 2.
      min_ticks - the inner spin wait loop number of "ticks" which is system
          dependent and should be tuned for your system if you so choose.
          The "ticks" on x86 correspond to the time stamp counter,
          but on other architectures ticks is a timestamp derived
          from gettimeofday().
      
      The user can modify these via the environment variable:
      KMP_SPIN_BACKOFF_PARAMS=max_backoff[,min_ticks]
      Currently, since the default user lock is a queuing lock,
      one would have to also specify KMP_LOCK_KIND=tas to use the test-and-set locks.
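
      A rough sketch of a truncated binary exponential back off with the two-loop
      structure described above. The struct, the function names, and the use of
      std::chrono::steady_clock nanoseconds as "ticks" are assumptions made for
      the example; the runtime uses the time stamp counter on x86.

      ```cpp
      #include <cassert>
      #include <chrono>
      #include <cstdint>

      // Hypothetical backoff state; field names mirror the parameters above.
      struct Backoff {
        std::uint32_t step;         // current number of inner waits
        std::uint32_t max_backoff;  // power of two that caps `step`
        std::uint32_t min_ticks;    // length of one inner wait
      };

      // Roughly double the step and truncate it below max_backoff.
      inline std::uint32_t next_step(std::uint32_t step, std::uint32_t max_backoff) {
        return ((step << 1) | 1u) & (max_backoff - 1u);
      }

      // Spin for `step` inner waits, then grow the step for the next failed attempt.
      // Assumption: one tick is one nanosecond of steady_clock.
      inline void spin_backoff(Backoff &b) {
        using clock = std::chrono::steady_clock;
        for (std::uint32_t i = b.step; i > 0; --i) {
          auto goal = clock::now() + std::chrono::nanoseconds(b.min_ticks);
          while (clock::now() < goal) { /* busy-wait */ }
        }
        b.step = next_step(b.step, b.max_backoff);
      }
      ```

      Masking with max_backoff - 1 only truncates cleanly when max_backoff is a
      power of 2, which is one plausible reason for that requirement. With
      max_backoff = 8 the step sequence is 1, 3, 7, 7, and so on.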
      
      Differential Revision: http://reviews.llvm.org/D19020
      
      llvm-svn: 266329
      377aa40d
  7. Apr 12, 2016
  8. Apr 05, 2016
  9. Apr 04, 2016
    • OMP_WAIT_POLICY changes · 50e8f18b
      Jonathan Peyton authored
      This change makes OMP_WAIT_POLICY=active mean that threads will busy-wait in
      spin loops and virtually never go to sleep. OMP_WAIT_POLICY=passive now means
      that threads will immediately go to sleep inside a spin loop. Previously, the
      only way to specify this behavior was KMP_BLOCKTIME=0 or
      KMP_BLOCKTIME=infinite, but the standard OpenMP environment variable should
      also be able to specify it.
      
      Differential Revision: http://reviews.llvm.org/D18577
      
      llvm-svn: 265339
      50e8f18b
  10. Mar 30, 2016
  11. Mar 29, 2016
  12. Mar 27, 2016
  13. Mar 24, 2016
  14. Mar 23, 2016
    • Fix Visual Studio builds · b7d30cbc
      Jonathan Peyton authored
      Have Visual Studio use MemoryBarrier() instead of _mm_mfence() and remove
      __declspec align attribute from function parameters in kmp_atomic.h
      
      llvm-svn: 264166
      b7d30cbc
  15. Mar 21, 2016
  16. Mar 16, 2016
    • [CMake] Fix Windows build problem for CMake versions < 3.3 · 8a46c067
      Jonathan Peyton authored
      Building libomp using CMake versions < 3.3 caused a link-time error. The
      error occurred because, when assembling z_Windows_NT-586_asm.asm, the
      definitions OMPT_SUPPORT and _M_AMD64|_M_IA32 weren't defined on the command line.
      To fix the problem, the COMPILE_FLAGS property for the assembly file is appended
      to instead of the COMPILE_DEFINITIONS property being set.  For whatever reason, the
      COMPILE_DEFINITIONS property doesn't pick up the definitions for assembly files
      for the older CMake versions.
      
      llvm-svn: 263651
      8a46c067
  17. Mar 15, 2016
  18. Mar 12, 2016
    • Initialize two variables in kmp_tasking. · 11e4c539
      Samuel Antao authored
      Summary:
      Two uninitialized local variables are causing clang to produce warnings:
      
      ```
      ./src/projects/openmp/runtime/src/kmp_tasking.c:3019:5: error: variable 'num_tasks' is used uninitialized whenever switch default is taken [-Werror,-Wsometimes-uninitialized]
          default:
          ^~~~~~~
      ./src/projects/openmp/runtime/src/kmp_tasking.c:3027:21: note: uninitialized use occurs here
          for( i = 0; i < num_tasks; ++i ) {
                          ^~~~~~~~~
      ./src/projects/openmp/runtime/src/kmp_tasking.c:2968:28: note: initialize the variable 'num_tasks' to silence this warning
          kmp_uint64 i, num_tasks, extras;
                                 ^
                                  = 0
      ./src/projects/openmp/runtime/src/kmp_tasking.c:3019:5: error: variable 'extras' is used uninitialized whenever switch default is taken [-Werror,-Wsometimes-uninitialized]
          default:
          ^~~~~~~
      ./src/projects/openmp/runtime/src/kmp_tasking.c:3022:52: note: uninitialized use occurs here
          KMP_DEBUG_ASSERT(tc == num_tasks * grainsize + extras);
                                                         ^~~~~~
      ./src/projects/openmp/runtime/src/kmp_debug.h:62:60: note: expanded from macro 'KMP_DEBUG_ASSERT'
              #define KMP_DEBUG_ASSERT( cond )       KMP_ASSERT( cond )
                                                                 ^
      ./src/projects/openmp/runtime/src/kmp_debug.h:60:51: note: expanded from macro 'KMP_ASSERT'
              #define KMP_ASSERT( cond )             ( (cond) ? 0 : __kmp_debug_assert( #cond, __FILE__, __LINE__ ) )
                                                        ^
      ./src/projects/openmp/runtime/src/kmp_tasking.c:2968:36: note: initialize the variable 'extras' to silence this warning
          kmp_uint64 i, num_tasks, extras;
                                         ^
                                          = 0
      2 errors generated.
      ```
      
      This patch initializes these two variables.
      
      Reviewers: tlwilmar, jlpeyton
      
      Subscribers: tlwilmar, openmp-commits
      
      Differential Revision: http://reviews.llvm.org/D17909
      
      llvm-svn: 263316
      11e4c539
  19. Mar 11, 2016
  20. Mar 03, 2016
  21. Mar 02, 2016
    • Add new OpenMP 4.5 taskloop construct feature · 283a215c
      Jonathan Peyton authored
      From the standard: The taskloop construct specifies that the iterations of one
      or more associated loops will be executed in parallel using OpenMP tasks. The
      iterations are distributed across tasks created by the construct and scheduled
      to be executed.
      
      This initial implementation uses a simple linear task distribution algorithm.
      Later we can add other algorithms to speed up generation of a huge number of
      tasks (e.g., tree-like task generation should be faster).
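
      A linear distribution can be sketched as follows. The helper name
      split_linear and its zero-based inclusive bounds are assumptions for
      illustration; the split follows the relation
      tc == num_tasks * grainsize + extras that appears in the KMP_DEBUG_ASSERT
      quoted in the kmp_tasking.c warning output elsewhere in this log.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <utility>
      #include <vector>

      // Split `tc` iterations (numbered 0..tc-1) into `num_tasks` contiguous
      // chunks. Hypothetical helper; returns inclusive [lower, upper] per task.
      std::vector<std::pair<std::uint64_t, std::uint64_t>>
      split_linear(std::uint64_t tc, std::uint64_t num_tasks) {
        std::vector<std::pair<std::uint64_t, std::uint64_t>> chunks;
        if (tc == 0 || num_tasks == 0) return chunks;
        if (num_tasks > tc) num_tasks = tc;     // never create empty tasks
        std::uint64_t grainsize = tc / num_tasks;
        std::uint64_t extras = tc % num_tasks;  // tc == num_tasks*grainsize + extras
        std::uint64_t lower = 0;
        for (std::uint64_t i = 0; i < num_tasks; ++i) {
          // The first `extras` tasks take one extra iteration each.
          std::uint64_t size = grainsize + (i < extras ? 1 : 0);
          chunks.emplace_back(lower, lower + size - 1);
          lower += size;
        }
        return chunks;
      }
      ```

      The loop creates tasks one after another, so generation cost grows linearly
      with the number of tasks, which is the cost a tree-like scheme would aim to
      reduce.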
      
      This needs to be put into the OpenMP runtime library in order for the
      compiler team to develop the compiler side of the implementation.
      
      Differential Revision: http://reviews.llvm.org/D17404
      
      llvm-svn: 262535
      283a215c
    • Add new OpenMP 4.5 doacross loop nest feature · 71909c57
      Jonathan Peyton authored
      From the standard: A doacross loop nest is a loop nest that has cross-iteration
      dependence. An iteration is dependent on one or more lexicographically earlier
      iterations. The ordered clause parameter on a loop directive identifies the
      loop(s) associated with the doacross loop nest.
      
      The init/fini routines allocate/free doacross buffer(s) for each loop for each
      thread.  The wait routine waits for a flag designated by the dependence vector.
      The post routine sets the flag designated by the current iteration vector.  We
      use a similar technique of shared buffer indices that covers up to 7 nowait
      loops executed simultaneously by different threads (the number 7 has no special
      meaning; it is just a heuristic value).  Also, the sizes of the structures are
      kept intact by shrinking dummy arrays.
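
      The post/wait pairing can be illustrated with a one-dimensional toy version
      that keeps one flag per iteration. The class and its methods are
      hypothetical, and the sketch ignores multi-dimensional dependence vectors
      and the shared buffer indices mentioned above.

      ```cpp
      #include <atomic>
      #include <cassert>
      #include <cstddef>
      #include <memory>
      #include <thread>

      // Hypothetical one-dimensional doacross buffer: one flag per iteration.
      class DoacrossFlags {
      public:
        explicit DoacrossFlags(std::size_t iterations)
            : n_(iterations), done_(new std::atomic<bool>[iterations]) {
          for (std::size_t i = 0; i < n_; ++i) done_[i].store(false);
        }
        // Mark iteration `i` as finished.
        void post(std::size_t i) {
          done_[i].store(true, std::memory_order_release);
        }
        // True once iteration `i` has been posted.
        bool posted(std::size_t i) const {
          return done_[i].load(std::memory_order_acquire);
        }
        // Block until iteration `i` has been posted.
        void wait(std::size_t i) const {
          while (!posted(i)) std::this_thread::yield();
        }

      private:
        std::size_t n_;
        std::unique_ptr<std::atomic<bool>[]> done_;
      };
      ```

      In a loop where iteration i depends on iteration i - 1, each iteration would
      call wait(i - 1) before its dependent work and post(i) after it.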
      
      This needs to be put into the OpenMP runtime library in order for the compiler
      team to develop the compiler side of the implementation.
      
      Differential Revision: http://reviews.llvm.org/D17399
      
      llvm-svn: 262532
      71909c57
  22. Feb 25, 2016
    • Add new OpenMP 4.5 affinity API · 2f7c077b
      Jonathan Peyton authored
      This change introduces the new OpenMP 4.5 affinity API surrounding
      OpenMP Places. There are six new entry points:
      
      Typically called in serial region:
       * omp_get_num_places - returns the number of places available to the execution
             environment in the place list.
       * omp_get_place_num_procs - returns the number of processors available to the
             execution environment in the specified place.
       * omp_get_place_proc_ids - returns the numerical identifiers of the processors
             available to the execution environment in the specified place.
      
      Typically called inside parallel region:
       * omp_get_place_num - returns the place number of the place to which the
             encountering thread is bound.
       * omp_get_partition_num_places - returns the number of places in the place
             partition of the innermost implicit task.
       * omp_get_partition_place_nums - returns the list of place numbers
             corresponding to the places in the place-var ICV of the innermost
             implicit task.
      
      Differential Revision: http://reviews.llvm.org/D17417
      
      llvm-svn: 261915
      2f7c077b
    • Add initial support for OpenMP 4.5 task priority feature · 2851072d
      Jonathan Peyton authored
      The maximum task priority value is read from the environment variable
      OMP_MAX_TASK_PRIORITY, but as of now nothing is done with it.  We just handle
      the environment variable and add the new API omp_get_max_task_priority(), which
      returns that value, or zero if it is not set.
      
      Differential Revision: http://reviews.llvm.org/D17411
      
      llvm-svn: 261908
      2851072d