Skip to content
  1. Nov 14, 2016
    • Jonathan Peyton's avatar
      Update stats-gathering code · 5375fe82
      Jonathan Peyton authored
      Have developer timers use partitioning scheme which also required that some
      redundant developer timers be removed in favor of the already existing normal
      timers. Move per thread stats initialization to just after global thread id
      assignment which is as early as possible. Also put all global stats
      initialization code in __kmp_stats_init() and all global stats destruction code
      in __kmp_stats_fini().
      
      Differential Revision: https://reviews.llvm.org/D26361
      
      llvm-svn: 286892
      5375fe82
    • Jonathan Peyton's avatar
      Introduce dynamic affinity dispatch capabilities · 1cdd87ad
      Jonathan Peyton authored
      This set of changes enables the affinity interface (Either the preexisting
      native operating system or HWLOC) to be dynamically set at runtime
      initialization. The point of this change is that we were seeing performance
      degradations when using HWLOC. This allows the user to use the old affinity
      mechanisms which on large machines (>64 cores) makes a large difference in
      initialization time.
      
      These changes mostly move affinity code under a small class hierarchy:
      
      KMPAffinity
        class Mask {}
      KMPNativeAffinity : public KMPAffinity
        class Mask : public KMPAffinity::Mask
      KMPHwlocAffinity
        class Mask : public KMPAffinity::Mask
      
      Since all interface functions (for both affinity and the mask implementation)
      are virtual, the implementation can be chosen at runtime initialization.
      
      Differential Revision: https://reviews.llvm.org/D26356
      
      llvm-svn: 286890
      1cdd87ad
  2. Oct 07, 2016
  3. Sep 27, 2016
    • Jonathan Peyton's avatar
      Disable monitor thread creation by default. · b66d1aab
      Jonathan Peyton authored
      This change set disables creation of the monitor thread by default.  The global
      counter maintained by the monitor thread was replaced by logic that uses system
      time directly, and cyclic yielding on Linux target was also removed since there
      was no clear benefit of using it. Turning on KMP_USE_MONITOR variable (=1)
      enables creation of monitor thread again if it is really necessary for some
      reasons.
      
      Differential Revision: https://reviews.llvm.org/D24739
      
      llvm-svn: 282507
      b66d1aab
  4. Sep 09, 2016
  5. Jun 16, 2016
    • Jonathan Peyton's avatar
      Teach OpenMP Library to use Hwloc on Windows · 0f3c2b92
      Jonathan Peyton authored
      This patch allows a user to enable Hwloc on windows. There are three main
      changes in here:
      1.kmp.h - Move definitions/declarations out of KMP_OS_WINDOWS guard (our windows
                implementation of affinity) because they need to be defined when
                KMP_USE_HWLOC is on as well.
      2.teach __kmp_set_system_affinity, __kmp_get_system_affinity,
              __kmp_get_proc_group, and __kmp_affinity_bind_thread how to use hwloc.
      3.teach CMake how to include hwloc when building Windows
      
      Another minor change in here is to make sure that anything under KMP_USE_HWLOC
      is also guarded by KMP_AFFINITY_SUPPORTED as well. This is to prevent Mac
      builds from requiring anything from Hwloc.
      
      Differential Revision: http://reviews.llvm.org/D21441
      
      llvm-svn: 272951
      0f3c2b92
  6. Jun 14, 2016
    • Jonathan Peyton's avatar
      Renaming change: 41 -> 45 and 4.1 -> 4.5 · df6818be
      Jonathan Peyton authored
      OpenMP 4.1 is now OpenMP 4.5.  Any mention of 41 or 4.1 is replaced with
      45 or 4.5.  Also, if the CMake option LIBOMP_OMP_VERSION is 41, CMake warns that
      41 is deprecated and to use 45 instead.
      
      llvm-svn: 272687
      df6818be
  7. May 31, 2016
    • Jonathan Peyton's avatar
      Offer API for setting number of loop dispatch buffers · 067325f9
      Jonathan Peyton authored
      The problem is the lack of dispatch buffers when thousands of loops with nowait,
      about 10 iterations each, are executed by hundreds of threads. We only have
      built-in 7 dispatch buffers, but there is a need in dozens or hundreds of
      buffers.
      
      The problem can be fixed by setting KMP_MAX_DISP_BUF to bigger value. In order
      to give users same possibility I changed build-time control into run-time one,
      adding API just in case.
      
      This change adds an environment variable KMP_DISP_NUM_BUFFERS and a new API
      function kmp_set_disp_num_buffers(int num_buffers).
      
      The KMP_DISP_NUM_BUFFERS envirable works only before serial initialization,
      because during the serial initialization we already allocate buffers for the hot
      team, so it is too late to change the number of buffers later (or we need to
      reallocate buffers for all teams which sounds too complicated). The
      kmp_set_defaults() routine does not work for this envirable, because it calls
      serial initialization before reading the parameter string. So a new routine,
      kmp_set_disp_num_buffers(), is created so that it can set our internal global
      variable before the library initialization. If both the envirable and API used
      the envirable wins.
      
      Differential Revision: http://reviews.llvm.org/D20697
      
      llvm-svn: 271318
      067325f9
  8. Mar 27, 2016
  9. Feb 25, 2016
  10. Dec 17, 2015
  11. Nov 30, 2015
    • Jonathan Peyton's avatar
      Adding Hwloc library option for affinity mechanism · 01dcf36b
      Jonathan Peyton authored
      These changes allow libhwloc to be used as the topology discovery/affinity
      mechanism for libomp.  It is supported on Unices. The code additions:
      * Canonicalize KMP_CPU_* interface macros so bitmask operations are
        implementation independent and work with both hwloc bitmaps and libomp
        bitmaps.  So there are new KMP_CPU_ALLOC_* and KMP_CPU_ITERATE() macros and
        the like. These are all in kmp.h and appropriately placed.
      * Hwloc topology discovery code in kmp_affinity.cpp. This uses the hwloc
        interface to create a libomp address2os object which the rest of libomp knows
        how to handle already.
      * To build, use -DLIBOMP_USE_HWLOC=on and
        -DLIBOMP_HWLOC_INSTALL_DIR=/path/to/install/dir [default /usr/local]. If CMake
        can't find the library or hwloc.h, then it will tell you and exit.
      
      Differential Revision: http://reviews.llvm.org/D13991
      
      llvm-svn: 254320
      01dcf36b
  12. Nov 04, 2015
  13. Oct 08, 2015
    • Jonathan Peyton's avatar
    • Jonathan Peyton's avatar
      Added sockets to the syntax of KMP_PLACE_THREADS environment variable. · dd4aa9b6
      Jonathan Peyton authored
      Added (optional) sockets to the syntax of the KMP_PLACE_THREADS environment variable.
      Some limitations:
      * The number of sockets and then optional offset should be specified first (before other parameters).
      * The letter designation is mandatory for sockets and then for other parameters.
      * If number of cores is specified first, then the number of sockets is defaulted to all sockets on the machine; also, the old syntax is partially supported if sockets are skipped.
      * If number of threads per core is specified first, then the number of sockets and cores per socket are defaulted to all sockets and all cores per socket respectively.
      * The number of cores per socket cannot be specified before sockets or after threads per core.
      * The number of threads per core can be specified before or after core-offset (old syntax required it to be before core-offset);
      * Parameters delimiter can be: empty, comma, lower-case x;
      * Spaces are allowed around numbers, around letters, around delimiter.
      Approximate shorthand specification:
      KMP_PLACE_THREADS="[num_sockets(S|s)[[delim]offset(O|o)][delim]][num_cores_per_socket(C|c)[[delim]offset(O|o)][delim]][num_threads_per_core(T|t)]"
      
      Differential Revision: http://reviews.llvm.org/D13175
      
      llvm-svn: 249708
      dd4aa9b6
  14. Aug 13, 2015
    • Jonathan Peyton's avatar
      Remove unused KMP_SETVERSION macro · 221104be
      Jonathan Peyton authored
      This macro and the small amount of code along with it are unused and
      can be removed.  The macro is never defined in any build script or source file.
      
      llvm-svn: 244899
      221104be
  15. May 06, 2015
  16. Mar 10, 2015
  17. Feb 20, 2015
  18. Jan 29, 2015
  19. Jan 27, 2015
  20. Oct 07, 2014
    • Jim Cownie's avatar
      I apologise in advance for the size of this check-in. At Intel we do · 4cc4bb4c
      Jim Cownie authored
      understand that this is not friendly, and are working to change our
      internal code-development to make it easier to make development
      features available more frequently and in finer (more functional)
      chunks. Unfortunately we haven't got that in place yet, and unpicking
      this into multiple separate check-ins would be non-trivial, so please
      bear with me on this one. We should be better in the future.
      
      Apologies over, what do we have here?
      
      GGC 4.9 compatibility
      --------------------
      * We have implemented the new entrypoints used by code compiled by GCC
      4.9 to implement the same functionality in gcc 4.8. Therefore code
      compiled with gcc 4.9 that used to work will continue to do so.
      However, there are some other new entrypoints (associated with task
      cancellation) which are not implemented. Therefore user code compiled
      by gcc 4.9 that uses these new features will not link against the LLVM
      runtime. (It remains unclear how to handle those entrypoints, since
      the GCC interface has potentially unpleasant performance implications
      for join barriers even when cancellation is not used)
      
      --- new parallel entry points ---
      new entry points that aren't OpenMP 4.0 related
      These are implemented fully :-
            GOMP_parallel_loop_dynamic()
            GOMP_parallel_loop_guided()
            GOMP_parallel_loop_runtime()
            GOMP_parallel_loop_static()
            GOMP_parallel_sections()
            GOMP_parallel()
      
      --- cancellation entry points ---
      Currently, these only give a runtime error if OMP_CANCELLATION is true
      because our plain barriers don't check for cancellation while waiting
              GOMP_barrier_cancel()
              GOMP_cancel()
              GOMP_cancellation_point()
              GOMP_loop_end_cancel()
              GOMP_sections_end_cancel()
      
      --- taskgroup entry points ---
      These are implemented fully.
            GOMP_taskgroup_start()
            GOMP_taskgroup_end()
      
      --- target entry points ---
      These are empty (as they are in libgomp)
           GOMP_target()
           GOMP_target_data()
           GOMP_target_end_data()
           GOMP_target_update()
           GOMP_teams()
      
      Improvements in Barriers and Fork/Join
      --------------------------------------
      * Barrier and fork/join code is now in its own file (which makes it
      easier to understand and modify).
      * Wait/release code is now templated and in its own file; suspend/resume code is also templated
      * There's a new, hierarchical, barrier, which exploits the
      cache-hierarchy of the Intel(r) Xeon Phi(tm) coprocessor to improve
      fork/join and barrier performance.
      
      ***BEWARE*** the new source files have *not* been added to the legacy
      Cmake build system. If you want to use that fixes wil be required.
      
      Statistics Collection Code
      --------------------------
      * New code has been added to collect application statistics (if this
      is enabled at library compile time; by default it is not). The
      statistics code itself is generally useful, the lightweight timing
      code uses the X86 rdtsc instruction, so will require changes for other
      architectures.
      The intent of this code is not for users to tune their codes but
      rather 
      1) For timing code-paths inside the runtime
      2) For gathering general properties of OpenMP codes to focus attention
      on which OpenMP features are most used. 
      
      Nested Hot Teams
      ----------------
      * The runtime now maintains more state to reduce the overhead of
      creating and destroying inner parallel teams. This improves the
      performance of code that repeatedly uses nested parallelism with the
      same resource allocation. Set the new KMP_HOT_TEAMS_MAX_LEVEL
      envirable to a depth to enable this (and, of course, OMP_NESTED=true
      to enable nested parallelism at all).
      
      Improved Intel(r) VTune(Tm) Amplifier support
      ---------------------------------------------
      * The runtime provides additional information to Vtune via the
      itt_notify interface to allow it to display better OpenMP specific
      analyses of load-imbalance.
      
      Support for OpenMP Composite Statements
      ---------------------------------------
      * Implement new entrypoints required by some of the OpenMP 4.1
      composite statements.
      
      Improved ifdefs
      ---------------
      * More separation of concepts ("Does this platform do X?") from
      platforms ("Are we compiling for platform Y?"), which should simplify
      future porting.
      
      
      ScaleMP* contribution
      ---------------------
      Stack padding to improve the performance in their environment where
      cross-node coherency is managed at the page level.
      
      Redesign of wait and release code
      ---------------------------------
      The code is simplified and performance improved.
      
      Bug Fixes
      ---------
          *Fixes for Windows multiple processor groups.
          *Fix Fortran module build on Linux: offload attribute added.
          *Fix entry names for distribute-parallel-loop construct to be consistent with the compiler codegen.
          *Fix an inconsistent error message for KMP_PLACE_THREADS environment variable.
      
      llvm-svn: 219214
      4cc4bb4c
  21. Aug 07, 2014
  22. Mar 02, 2014
  23. Feb 24, 2014
  24. Dec 23, 2013
    • Jim Cownie's avatar
      For your Christmas hacking pleasure. · 181b4bb3
      Jim Cownie authored
      This release use aligns with Intel(r) Composer XE 2013 SP1 Product Update 2 
      
      New features
      * The library can now be built with clang (though wiht some
        limitations since clang does not support 128 bit floats)
      * Support for Vtune analysis of load imbalance
      * Code contribution from Steven Noonan to build the runtime for ARM*
        architecture processors 
      * First implementation of runtime API for OpenMP cancellation
      
      Bug Fixes
      * Fixed hang on Windows (only) when using KMP_BLOCKTIME=0
      
      llvm-svn: 197914
      181b4bb3
  25. Sep 27, 2013
Loading