  1. Dec 08, 2016
  2. Sep 14, 2016
  3. Jun 22, 2016
  4. Jun 14, 2016
    • Renaming change: 41 -> 45 and 4.1 -> 4.5 · df6818be
      Jonathan Peyton authored
      OpenMP 4.1 is now OpenMP 4.5.  Any mention of 41 or 4.1 is replaced with
      45 or 4.5.  Also, if the CMake option LIBOMP_OMP_VERSION is 41, CMake warns that
      41 is deprecated and to use 45 instead.
      
      llvm-svn: 272687
      df6818be
  5. May 31, 2016
    • Offer API for setting number of loop dispatch buffers · 067325f9
      Jonathan Peyton authored
      The problem is a shortage of dispatch buffers when thousands of nowait loops,
      each with about 10 iterations, are executed by hundreds of threads. Only 7
      dispatch buffers are built in, but dozens or hundreds of buffers are needed.
      
      The problem can be fixed by setting KMP_MAX_DISP_BUF to a bigger value. To give
      users the same option, I changed the build-time control into a run-time one
      and added an API as well.
      
      This change adds an environment variable KMP_DISP_NUM_BUFFERS and a new API
      function kmp_set_disp_num_buffers(int num_buffers).
      
      The KMP_DISP_NUM_BUFFERS environment variable takes effect only before serial
      initialization, because serial initialization already allocates buffers for the
      hot team; changing the number of buffers later would require reallocating
      buffers for all teams, which sounds too complicated. The kmp_set_defaults()
      routine does not work for this environment variable, because it calls serial
      initialization before reading the parameter string. So a new routine,
      kmp_set_disp_num_buffers(), is added so that it can set the internal global
      variable before library initialization. If both the environment variable and
      the API are used, the environment variable wins.
      
      Differential Revision: http://reviews.llvm.org/D20697
      
      llvm-svn: 271318
      067325f9
  6. May 26, 2016
  7. May 20, 2016
  8. May 16, 2016
    • Clean all the mess around KMP_USE_FUTEX and kmp_lock.h · fb043fdf
      Paul Osmialowski authored
      The KMP_USE_FUTEX preprocessor definition, defined in kmp_lock.h, is used
      inconsistently throughout the LLVM libomp code.
      
      * Some .c files that use this define do not include kmp_lock.h, so the
        guarded parts of the code are never compiled.
      * Some places use architecture-dependent preprocessor expressions that
        effectively disable futex use on AArch64; all these places should use
        '#if KMP_USE_FUTEX' instead to avoid further confusion.
      * Some places use KMP_HAS_FUTEX, which is not defined anywhere;
        KMP_USE_FUTEX should be used instead.
      
      Differential Revision: http://reviews.llvm.org/D19629
      
      llvm-svn: 269642
      fb043fdf
  9. May 05, 2016
    • [STATS] Use partitioned timer scheme · 11dc82fa
      Jonathan Peyton authored
      This change replaces the current timers with ones that partition time properly.
      The current timers are nested, so that if a new timer, B, starts when the
      current timer, A, is already timing, A's time will include B's. To eliminate
      this problem, the partitioned timers are designed to stop the current timer (A),
      let the new timer run (B), and when the new timer is finished, restart the
      previously running timer (A). With this partitioning of time, a thread's timers
      all sum up to the OMP_worker_thread_life time and can now easily show the
      percentage of time a thread is spending in different parts of the runtime or
      user code.
      
      There is also a new state variable associated with each thread which tells where
      it is executing a task. This corresponds with the timers: OMP_task_*, e.g., if
      time is spent in OMP_task_taskwait, then that thread executed tasks inside a
      #pragma omp taskwait construct.
      
      The changes are mostly switching the macros to the new PARTITIONED_* macros,
      adding the new partitionedTimers class and its methods, and adding new state logic.
      
      Differential Revision: http://reviews.llvm.org/D19229
      
      llvm-svn: 268640
      11dc82fa
  10. Apr 19, 2016
  11. Apr 14, 2016
    • Exponential back off logic for test-and-set lock · 377aa40d
      Jonathan Peyton authored
      This change adds back off logic in the test and set lock for better contended
      lock performance. It uses a simple truncated binary exponential back off
      function. The default back off parameters are tuned for x86.
      
      The main back off logic has a two loop structure where each is controlled by a
      user-level parameter:
      max_backoff - limits the outer loop number of iterations.
          This parameter should be a power of 2.
      min_ticks - the inner spin wait loop number of "ticks" which is system
          dependent and should be tuned for your system if you so choose.
          The "ticks" on x86 correspond to the time stamp counter,
          but on other architectures ticks is a timestamp derived
          from gettimeofday().
      
      The user can modify these via the environment variable:
      KMP_SPIN_BACKOFF_PARAMS=max_backoff[,min_ticks]
      Currently, since the default user lock is a queuing lock,
      one would have to also specify KMP_LOCK_KIND=tas to use the test-and-set locks.
      
      Differential Revision: http://reviews.llvm.org/D19020
      
      llvm-svn: 266329
      377aa40d
  12. Mar 27, 2016
  13. Mar 24, 2016
  14. Mar 23, 2016
    • Fix Visual Studio builds · b7d30cbc
      Jonathan Peyton authored
      Have Visual Studio use MemoryBarrier() instead of _mm_mfence() and remove
      __declspec align attribute from function parameters in kmp_atomic.h
      
      llvm-svn: 264166
      b7d30cbc
  15. Mar 21, 2016
  16. Mar 03, 2016
  17. Mar 02, 2016
    • Add new OpenMP 4.5 doacross loop nest feature · 71909c57
      Jonathan Peyton authored
      From the standard: A doacross loop nest is a loop nest that has cross-iteration
      dependence. An iteration is dependent on one or more lexicographically earlier
      iterations. The ordered clause parameter on a loop directive identifies the
      loop(s) associated with the doacross loop nest.
      
      The init/fini routines allocate/free doacross buffer(s) for each loop for each
      thread.  The wait routine waits for a flag designated by the dependence vector.
      The post routine sets the flag designated by the current iteration vector.  We use
      a similar technique of shared buffer indices that covers up to 7 nowait loops
      executed simultaneously by different threads (the number 7 has no real meaning;
      it is just a heuristic value).  Also, the sizes of structures are kept intact by
      shrinking dummy arrays.
      
      This needs to be put into the OpenMP runtime library in order for the compiler
      team to develop the compiler side of the implementation.
      
      Differential Revision: http://reviews.llvm.org/D17399
      
      llvm-svn: 262532
      71909c57
  18. Dec 23, 2015
    • Fix build error: OMPT_SUPPORT=true was not tested after hinted lock changes · 2c295c4e
      Jonathan Peyton authored
      Recent changes to support dynamic locks didn't consider the code compiled when
      OMPT_SUPPORT=true. As a result, the OMPT support was broken by recent changes
      to nested locks to support dynamic locks. For OMPT to work with dynamic locks,
      they need to provide a return code indicating whether a nested lock acquisition
      was the first or not.
      
      This patch moves the OMPT support for nested locks into the #else case, used when
      dynamic locks are not enabled. New support is needed for dynamic locks. This patch
      fixes the build and leaves a placeholder where the missing OMPT callbacks can be
      added by either the author of the OMPT support for locks or the author of the
      dynamic locking support.
      
      Patch by John Mellor-Crummey
      
      Differential Revision: http://reviews.llvm.org/D15656
      
      llvm-svn: 256314
      2c295c4e
  19. Dec 11, 2015
    • Hinted lock (OpenMP 4.5 feature) Updates/Fixes Part 3 · b87b5813
      Jonathan Peyton authored
      This change set includes all changes to make the code conform to the OMP 4.5 specification:
      
      * Removed hint / hinted_init definitions from include/40 files
      * Hint values are powers of 2 to enable composition (4.5 spec)
      * Hinted lock initialization functions were renamed (4.5 spec)
        kmp_init_lock_hinted -> omp_init_lock_with_hint
        kmp_init_nest_lock_hinted -> omp_init_nest_lock_with_hint
      * __kmpc_critical_section_with_hint was added to support a critical section with
        a hint (4.5 spec)
      * __kmp_map_hint_to_lock was added to convert a hint (possibly a composite) to
        an internal lock type
      * kmpc_init_lock_with_hint and kmpc_init_nest_lock_with_hint were added as
        internal entries for the hinted lock initializers. The previous internal
        functions (__kmp_init*) were moved to kmp_csupport.c and reused in multiple
        places
      * Added the two init functions to dllexports
      * KMP_USE_DYNAMIC_LOCK is turned on if OMP_41_ENABLED is turned on
      
      Differential Revision: http://reviews.llvm.org/D15205
      
      llvm-svn: 255376
      b87b5813
    • Hinted lock (OpenMP 4.5 feature) Updates/Fixes Part 2 · dae13d81
      Jonathan Peyton authored
      * Added a new user TSX lock implementation, RTM. This implementation is a
        light-weight version of the adaptive lock implementation, omitting the
        back-off logic for deciding when to speculate (or not). The fall-back lock is
        still the queuing lock.
      * Changed indirect lock table management. The data for indirect lock management
        was encapsulated in the "kmp_indirect_lock_table_t" type. Also, the lock table
        dimension was changed to 2D (was linear), and each entry is a
        kmp_indirect_lock_t object now (was a pointer to an object).
      * Some clean up in the critical section code
      * Removed the limits of the tuning parameters read from KMP_ADAPTIVE_LOCK_PROPS
      * KMP_USE_DYNAMIC_LOCK=1 also turns on these two switches:
        KMP_USE_TSX, KMP_USE_ADAPTIVE_LOCKS
      
      Differential Revision: http://reviews.llvm.org/D15204
      
      llvm-svn: 255375
      dae13d81
    • Hinted lock (OpenMP 4.5 feature) Updates/Fixes · a03533d3
      Jonathan Peyton authored
      There are going to be two more patches which bring this feature up to date and in line with OpenMP 4.5.
      
      * Renamed jump tables for the lock functions (and some clean up).
      * Renamed some macros to be in KMP_ namespace.
      * Return type of unset functions changed from void to int.
      * Enabled use of _xbegin() et al. intrinsics for accessing TSX instructions.
      
      Differential Revision: http://reviews.llvm.org/D15199
      
      llvm-svn: 255373
      a03533d3
  20. Dec 03, 2015
  21. Oct 16, 2015
    • [OMPT] Add OMPT events for API locking · 0e6d4577
      Jonathan Peyton authored
      This fix implements the following OMPT events for the API locking routines:
      * ompt_event_acquired_lock
      * ompt_event_acquired_nest_lock_first
      * ompt_event_acquired_nest_lock_next
      * ompt_event_init_lock
      * ompt_event_init_nest_lock
      * ompt_event_destroy_lock
      * ompt_event_destroy_nest_lock
      
      For the acquired events, the depth of the lock is required, so a return value
      was added, similar to the return values we already have for the release lock
      routines.
      
      Patch by Tim Cramer
      
      Differential Revision: http://reviews.llvm.org/D13689
      
      llvm-svn: 250526
      0e6d4577
  22. Oct 09, 2015
  23. Oct 08, 2015
    • Added sockets to the syntax of KMP_PLACE_THREADS environment variable. · dd4aa9b6
      Jonathan Peyton authored
      Added (optional) sockets to the syntax of the KMP_PLACE_THREADS environment variable.
      Some limitations:
      * The number of sockets and then optional offset should be specified first (before other parameters).
      * The letter designation is mandatory for sockets and then for other parameters.
      * If number of cores is specified first, then the number of sockets is defaulted to all sockets on the machine; also, the old syntax is partially supported if sockets are skipped.
      * If number of threads per core is specified first, then the number of sockets and cores per socket are defaulted to all sockets and all cores per socket respectively.
      * The number of cores per socket cannot be specified before sockets or after threads per core.
      * The number of threads per core can be specified before or after core-offset (old syntax required it to be before core-offset).
      * The parameter delimiter can be: empty, comma, lower-case x.
      * Spaces are allowed around numbers, letters, and the delimiter.
      Approximate shorthand specification:
      KMP_PLACE_THREADS="[num_sockets(S|s)[[delim]offset(O|o)][delim]][num_cores_per_socket(C|c)[[delim]offset(O|o)][delim]][num_threads_per_core(T|t)]"
      
      Differential Revision: http://reviews.llvm.org/D13175
      
      llvm-svn: 249708
      dd4aa9b6
  24. Sep 21, 2015
    • [OMPT] Simplify control variable logic for OMPT · b68a85d1
      Jonathan Peyton authored
      Prior to this change, OMPT had a status flag ompt_status, which could take
      several values. This was due to an earlier OMPT design that had several levels
      of enablement (ready, disabled, tracking state, tracking callbacks). The
      current OMPT design has OMPT support either on or off.
      This revision replaces ompt_status with a boolean flag ompt_enabled, which 
      simplifies the runtime logic for OMPT.
      
      Patch by John Mellor-Crummey
      
      Differential Revision: http://reviews.llvm.org/D12999
      
      llvm-svn: 248189
      b68a85d1
  25. Aug 31, 2015
  26. Aug 11, 2015
    • Tidy statistics collection · 45be4500
      Jonathan Peyton authored
      This removes some statistics counters and timers which were not used,
      adds new counters and timers for some language features that were not
      monitored previously and separates the counters and timers into those
      which are of interest for investigating user code and those which are
      only of interest to the developer of the runtime itself.
      The runtime developer statistics are now only collected if the
      additional #define KMP_DEVELOPER_STATS is set.
      
      Additional user statistics which are now collected include:
      * Count of nested parallelism (omp parallel inside a parallel region)
      * Count of omp distribute occurrences
      * Count of omp teams occurrences
      * Counts of task related statistics (taskyield, task execution, task
        cancellation, task steal)
      * Values passed to omp_set_num_threads
      * Time spent in omp single and omp master
      
      None of this affects code compiled without stats gathering enabled,
      which is the normal library build mode.
      
      This also fixes the CMake build by linking to the standard c++ library
      when building the stats library as it is a requirement.  The normal library
      does not have this requirement and its link phase is left alone.
      
      Differential Revision: http://reviews.llvm.org/D11759
      
      llvm-svn: 244677
      45be4500
  27. Aug 05, 2015
  28. Jul 21, 2015
    • Fix OMPT support for task frames, parallel regions, and parallel regions + loops · 3fdf3294
      Jonathan Peyton authored
      This patch makes it possible for a performance tool that uses call stack
      unwinding to map implementation-level call stacks from master and worker
      threads into a unified global view. There are several components to this patch.
      
      include/*/ompt.h.var
        Add a new enumeration type that indicates whether the code for a master task
          for a parallel region is invoked by the user program or the runtime system
        Change the signature for OMPT parallel begin/end callbacks to indicate whether
          the master task will be invoked by the program or the runtime system. This
          enables a performance tool using call stack unwinding to handle these two
          cases differently. For this case, a profiler that uses call stack unwinding
          needs to know that the call path prefix for the master task may differ from
          those available within the begin/end callbacks if the program invokes the
          master.
      
      kmp.h
        Change the signature for __kmp_join_call to take an additional parameter
        indicating the fork_context type. This is needed to supply the OMPT parallel
        end callback with information about whether the compiler or the runtime
        invoked the master task for a parallel region.
      
      kmp_csupport.c
        Ensure that the OMPT task frame field reenter_runtime_frame is properly set
          and cleared before and after calls to fork and join threads for a parallel
          region.
        Adjust the code for the new signature for __kmp_join_call.
        Adjust the OMPT parallel begin callback invocations to carry the extra
          parameter indicating whether the program or the runtime invokes the master
          task for a parallel region.
      
      kmp_gsupport.c
        Apply all of the analogous changes described for kmp_csupport.c for the GOMP
          interface
        Add OMPT support for the GOMP combined parallel region + loop API to
          maintain the OMPT task frame field reenter_runtime_frame.
      
      kmp_runtime.c:
        Use the new information passed by __kmp_join_call to adjust the OMPT
          parallel end callback invocations to carry the extra parameter indicating
          whether the program or the runtime invokes the master task for a parallel
          region.
      
      ompt_internal.h:
        Use the flavor of the parallel region API (GNU or Intel) to determine who
          invokes the master task.
      
      Differential Revision: http://reviews.llvm.org/D11259
      
      llvm-svn: 242817
      3fdf3294
  29. Jul 13, 2015
    • Fix some bugs in OMPT support · 122dd76f
      Jonathan Peyton authored
      1.) In kmp_csupport.c, move computation of parameters only needed for OMPT tracing
      inside a conditional to reduce overhead when not receiving ompt_event_master_begin
      callbacks.
      2.) In kmp_gsupport.c, remove a spurious reset of OMPT reenter_runtime_frame (which
      is set in its caller, GOMP_parallel_start); correct the placement of #if OMP_TRACE so
      that state is maintained even if tracing support is not included.
      3.) In z_Linux_util.c, add architecture-independent support for OMPT by setting
      and resetting OMPT's exit_frame_ptr before and after invoking a microtask.
      4.) On the Intel MIC, the loader refuses to retain static symbols in the
      libomp.so shared library, even though tools need them. The loader could not be
      bullied into doing so. To accommodate this, I changed the visibility of OMPT
      placeholder functions to public. This required additions in exports.so.txt and
      adding extern "C" scoping in ompt-general.c so that the public placeholder
      symbols won't be mangled.
      
      Patch by John Mellor-Crummey
      
      Differential Revision: http://reviews.llvm.org/D11062
      
      llvm-svn: 242052
      122dd76f
  30. Jun 08, 2015
  31. Jun 03, 2015
  32. May 23, 2015
  33. May 07, 2015
  34. May 06, 2015