  1. Nov 14, 2016
    • Introduce dynamic affinity dispatch capabilities · 1cdd87ad
      Jonathan Peyton authored
      This set of changes enables the affinity interface (either the preexisting
      native operating system interface or HWLOC) to be selected dynamically at
      runtime initialization. The motivation for this change is that we were seeing
      performance degradations when using HWLOC. This lets the user fall back to the
      native affinity mechanism, which on large machines (>64 cores) makes a large
      difference in initialization time.
      
      These changes mostly move affinity code under a small class hierarchy:
      
      KMPAffinity
        class Mask {}
      KMPNativeAffinity : public KMPAffinity
        class Mask : public KMPAffinity::Mask
      KMPHwlocAffinity : public KMPAffinity
        class Mask : public KMPAffinity::Mask
      
      Since all interface functions (for both affinity and the mask implementation)
      are virtual, the implementation can be chosen at runtime initialization.
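
      The following is a minimal sketch of how such a hierarchy allows the choice to be made at runtime. Only the class names come from the commit. The methods, the create_affinity() selector, and the "hwloc" mask are hypothetical. The "hwloc" mask is a growable bit vector that stands in for an hwloc bitmap, so no hwloc dependency is needed.

```cpp
#include <bitset>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Abstract interface: every operation is virtual, so the concrete
// implementation can be picked once at runtime initialization.
class KMPAffinity {
public:
  class Mask {
  public:
    virtual ~Mask() = default;
    virtual void set(int proc) = 0;
    virtual bool is_set(int proc) const = 0;
  };
  virtual ~KMPAffinity() = default;
  virtual std::unique_ptr<Mask> allocate_mask() const = 0;
  virtual std::string name() const = 0;
};

class KMPNativeAffinity : public KMPAffinity {
public:
  class Mask : public KMPAffinity::Mask {
    std::bitset<1024> bits_;  // fixed width, like a native CPU set
  public:
    void set(int proc) override { bits_.set(static_cast<std::size_t>(proc)); }
    bool is_set(int proc) const override {
      return bits_.test(static_cast<std::size_t>(proc));
    }
  };
  std::unique_ptr<KMPAffinity::Mask> allocate_mask() const override {
    return std::make_unique<Mask>();
  }
  std::string name() const override { return "native"; }
};

class KMPHwlocAffinity : public KMPAffinity {
public:
  class Mask : public KMPAffinity::Mask {
    std::vector<bool> bits_;  // grows on demand (stand-in for an hwloc bitmap)
  public:
    void set(int proc) override {
      const auto i = static_cast<std::size_t>(proc);
      if (i >= bits_.size()) bits_.resize(i + 1, false);
      bits_[i] = true;
    }
    bool is_set(int proc) const override {
      const auto i = static_cast<std::size_t>(proc);
      return i < bits_.size() && bits_[i];
    }
  };
  std::unique_ptr<KMPAffinity::Mask> allocate_mask() const override {
    return std::make_unique<Mask>();
  }
  std::string name() const override { return "hwloc"; }
};

// Hypothetical selector, called once during runtime initialization; later
// code only ever talks to the KMPAffinity / KMPAffinity::Mask interfaces.
std::unique_ptr<KMPAffinity> create_affinity(bool use_hwloc) {
  if (use_hwloc) return std::make_unique<KMPHwlocAffinity>();
  return std::make_unique<KMPNativeAffinity>();
}
```

      The cost of this design is one indirect call per mask operation. In exchange, a single binary can switch between implementations without being rebuilt.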
      
      Differential Revision: https://reviews.llvm.org/D26356
      
      llvm-svn: 286890
  2. Oct 27, 2016
  3. Oct 18, 2016
  4. Oct 07, 2016
  5. Sep 27, 2016
    • Disable monitor thread creation by default. · b66d1aab
      Jonathan Peyton authored
      This change set disables creation of the monitor thread by default. The global
      counter maintained by the monitor thread was replaced by logic that uses system
      time directly, and cyclic yielding on Linux targets was also removed since there
      was no clear benefit to using it. Turning on KMP_USE_MONITOR (=1) re-enables
      creation of the monitor thread if it is really needed for some reason.
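
      As a rough sketch of "using system time directly": a waiting thread can compute its own deadline from a monotonic clock instead of comparing against a tick counter that a monitor thread advances. The function name and the blocktime parameter are hypothetical. This is not libomp's code.

```cpp
#include <atomic>
#include <chrono>

// Spin on `released` for at most `blocktime`. The deadline is computed
// locally from std::chrono::steady_clock, so no helper thread is needed to
// maintain a global time counter. Returns true if released in time; a real
// runtime would put the thread to sleep on the false path.
bool spin_until_released_or_timeout(const std::atomic<bool>& released,
                                    std::chrono::milliseconds blocktime) {
  using clock = std::chrono::steady_clock;
  const auto deadline = clock::now() + blocktime;
  while (!released.load(std::memory_order_acquire)) {
    if (clock::now() >= deadline) return false;
  }
  return true;
}
```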
      
      Differential Revision: https://reviews.llvm.org/D24739
      
      llvm-svn: 282507
  6. Sep 12, 2016
    • Fix bitmask upper bounds check · 7c465a5f
      Jonathan Peyton authored
      Rather than checking KMP_CPU_SETSIZE, which doesn't exist when using Hwloc, we
      use the get_max_proc() function, whose result can vary by operating system.
      For example, on Windows with multiple processor groups, the highest possible bit
      in the bitmask may be higher than the number of hardware threads on the machine.
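
      The following sketch illustrates the Windows case under one stated assumption: each processor group occupies its own 64-bit slot in the mask, so a proc id is group * 64 + index. The helper names are hypothetical.

```cpp
// Assumed layout: one 64-bit slot per processor group.
constexpr int kBitsPerGroup = 64;

int proc_id(int group, int index_in_group) {
  return group * kBitsPerGroup + index_in_group;
}

// Hypothetical analogue of get_max_proc(): one past the highest bit that
// can ever be set, fixed by the group layout rather than the thread count.
int get_max_proc(int num_groups) { return num_groups * kBitsPerGroup; }

bool proc_in_bounds(int proc, int num_groups) {
  return proc >= 0 && proc < get_max_proc(num_groups);
}
```

      Suppose a machine has two groups of 36 threads (72 in total). The last thread is proc_id(1, 35) == 99, so a bounds check against 72 would wrongly reject it.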
      
      Differential Revision: https://reviews.llvm.org/D24206
      
      llvm-svn: 281245
  7. Sep 09, 2016
  8. Aug 03, 2016
  9. Jul 11, 2016
  10. Jul 08, 2016
    • Improving EPCC performance when linking with hwloc · 4d3c2130
      Jonathan Peyton authored
      When linking with libhwloc, the ORDERED EPCC test slows down on big
      machines (> 48 cores). Performance analysis showed that cache thrashing
      was occurring, and the padding added in this change helps alleviate it.

      Also, inside the main spin-wait loop in kmp_wait_release.h, references to
      the global shared variables are eliminated by computing a local variable,
      oversubscribed, and checking it instead.
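
      Both ideas are shown in this sketch. A hot flag is padded to its own cache line (64 bytes is an assumed line size), and the spin loop reads the shared globals once into a local oversubscribed flag. The globals g_nth and g_nproc are hypothetical stand-ins for runtime state.

```cpp
#include <atomic>
#include <thread>

// Padding: the flag occupies a whole (assumed 64-byte) cache line, so
// writes to it do not invalidate unrelated neighbouring data.
struct alignas(64) PaddedFlag {
  std::atomic<unsigned> value{0};
};

// Hypothetical stand-ins for shared runtime state.
std::atomic<int> g_nth{2};    // threads in use
std::atomic<int> g_nproc{4};  // available processors

void wait_for(const PaddedFlag& flag, unsigned target) {
  // Read the shared globals once, outside the loop.
  const bool oversubscribed = g_nth.load(std::memory_order_relaxed) >
                              g_nproc.load(std::memory_order_relaxed);
  while (flag.value.load(std::memory_order_acquire) < target) {
    if (oversubscribed) std::this_thread::yield();  // only the local is checked
  }
}
```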
      
      Differential Revision: http://reviews.llvm.org/D22093
      
      llvm-svn: 274894
  11. Jun 16, 2016
    • Teach OpenMP Library to use Hwloc on Windows · 0f3c2b92
      Jonathan Peyton authored
      This patch allows a user to enable Hwloc on Windows. There are three main
      changes in here:
      1. kmp.h: move definitions/declarations out of the KMP_OS_WINDOWS guard (our
         Windows implementation of affinity) because they need to be defined when
         KMP_USE_HWLOC is on as well.
      2. Teach __kmp_set_system_affinity, __kmp_get_system_affinity,
         __kmp_get_proc_group, and __kmp_affinity_bind_thread how to use hwloc.
      3. Teach CMake how to include hwloc when building on Windows.

      Another minor change in here is to make sure that anything under KMP_USE_HWLOC
      is also guarded by KMP_AFFINITY_SUPPORTED. This prevents Mac builds from
      requiring anything from Hwloc.
      
      Differential Revision: http://reviews.llvm.org/D21441
      
      llvm-svn: 272951
  12. Jun 14, 2016
  13. Jun 13, 2016
    • Affinity mask processing improvements · c5304aa3
      Jonathan Peyton authored
      Remove the static specifier from the variable fullMask and remove the
      kmp_get_fullMask() routine. When iterating through procs in a mask, always check
      whether the proc is in fullMask (this check was missing in a few places).
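
      The following sketch shows the iteration rule, using std::bitset as a hypothetical stand-in for the runtime's mask type. A proc is used only if it is in both the mask being walked and the full mask.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

constexpr std::size_t kMaxProcs = 1024;  // assumed mask width
using Mask = std::bitset<kMaxProcs>;

// Collect the procs of `mask` that are also present in `full_mask`
// (the procs actually available to the process).
std::vector<std::size_t> usable_procs(const Mask& mask, const Mask& full_mask) {
  std::vector<std::size_t> procs;
  for (std::size_t i = 0; i < kMaxProcs; ++i) {
    if (!mask.test(i)) continue;
    if (!full_mask.test(i)) continue;  // the fullMask check described above
    procs.push_back(i);
  }
  return procs;
}
```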
      
      Patch by Brian Bliss.
      
      Differential Revision: http://reviews.llvm.org/D21300
      
      llvm-svn: 272589
    • Fix bitmask complement operation · 34c72c47
      Jonathan Peyton authored
      The bitmask complement operation doesn't consider the max proc id, which means
      something like !{0} will be translated to {1,2,3,4,...,600,601,...,1023} on a
      Linux system even though there aren't 600 processors on said system. This
      change ANDs the complemented bitmask with the fullmask so that it only
      contains valid processors.
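
      In bitset terms, the fix amounts to "flip, then AND with the full mask". The sketch assumes a fixed 1024-bit mask, as in the example above.

```cpp
#include <bitset>
#include <cstddef>

constexpr std::size_t kMaskBits = 1024;  // assumed fixed mask width
using Mask = std::bitset<kMaskBits>;

// Complement restricted to processors that exist: ids beyond the
// machine's procs stay clear because full_mask has them clear.
Mask complement_within(const Mask& m, const Mask& full_mask) {
  return ~m & full_mask;
}
```

      On a hypothetical 4-processor machine, the complement of {0} becomes {1,2,3}, not 1023 bits.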
      
      Differential Revision: http://reviews.llvm.org/D21245
      
      llvm-svn: 272561
  14. May 31, 2016
    • Use C++11 atomics for ticket locks implementation · f7cc6aff
      Paul Osmialowski authored
      This patch replaces the use of compiler builtin atomics with
      C++11 atomics in the ticket lock implementation. Ticket locks
      are used in critical places of the runtime, e.g. in the tasking
      mechanism.

      The main motivation for this change was a problem with the
      work-stealing function on the ARM architecture, which suffered
      from a nasty race condition. It turned out that the root cause of
      the problem lay in the way ticket locks were implemented. Changing
      the compiler builtins into C++11 atomics solves the problem.

      Two assertions were added to kmp_tasking.c that help detect
      early symptoms of work stealing going wrong, which was among the
      possible outcomes of the race condition.
      
      Differential Revision: http://reviews.llvm.org/D19878
      
      llvm-svn: 271324
    • Addition of OpenMP 4.5 feature: schedule(simd:static) · ef734799
      Jonathan Peyton authored
      This patch implements the new kmp_sch_static_balanced_chunked schedule kind that
      the compiler will generate when it encounters schedule(simd: static). It just
      adds the new constant and the new switch case in __kmp_for_static_init.
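
      The following is a simplified model of a balanced chunked split, not a transcription of __kmp_for_static_init. Each thread gets one contiguous block: its balanced share, ceil(n / nthreads), rounded up to a multiple of the simd chunk. The model assumes that the chunk is a power of two and that iterations are 0-based and unit-stride.

```cpp
#include <algorithm>
#include <cassert>

struct Range { long lo, hi; };  // half-open [lo, hi); empty when lo == hi

Range balanced_chunked(long n, int nthreads, int tid, long simd_chunk) {
  assert(simd_chunk > 0 && (simd_chunk & (simd_chunk - 1)) == 0);
  long span = (n + nthreads - 1) / nthreads;           // balanced share
  span = (span + simd_chunk - 1) & ~(simd_chunk - 1);  // round up to chunk
  const long lo = std::min(n, span * tid);
  const long hi = std::min(n, lo + span);
  return {lo, hi};
}
```

      For example, with 100 iterations, 4 threads and a simd chunk of 8, the share of 25 rounds up to 32. The threads therefore get [0,32), [32,64), [64,96) and [96,100), and every block except the last starts on a simd-width boundary.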
      
      Patch by Alex Duran.
      
      Differential Revision: http://reviews.llvm.org/D20699
      
      llvm-svn: 271320
    • Avoid deadlock with COI · f4f96956
      Jonathan Peyton authored
      When an asynchronous offload task is completed, COI calls the runtime to queue
      a "destructor task". When the task deques are full, a deadlock arises: the
      OpenMP threads are inside the runtime but cannot progress because the COI
      thread is stuck inside the runtime trying to find a slot in a deque.

      This patch implements a solution in which the task deques are doubled in size
      when a task is being queued from a COI thread.
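
      The following single-threaded sketch shows a ring-buffer deque that doubles its capacity instead of refusing a push. The allow_grow flag is a hypothetical way to model "only pushes from a COI thread grow the deque". Locking is omitted.

```cpp
#include <cstddef>
#include <vector>

template <typename T>
class TaskDeque {
  std::vector<T> buf_;
  std::size_t head_ = 0, size_ = 0;

  void grow() {
    std::vector<T> bigger(buf_.size() * 2);
    for (std::size_t i = 0; i < size_; ++i)  // unwrap the ring in order
      bigger[i] = buf_[(head_ + i) % buf_.size()];
    buf_.swap(bigger);
    head_ = 0;
  }

public:
  explicit TaskDeque(std::size_t cap) : buf_(cap) {}  // cap must be > 0
  std::size_t capacity() const { return buf_.size(); }

  bool push_back(const T& t, bool allow_grow) {
    if (size_ == buf_.size()) {
      if (!allow_grow) return false;  // ordinary caller: deque is full
      grow();                         // e.g. COI thread: double instead
    }
    buf_[(head_ + size_) % buf_.size()] = t;
    ++size_;
    return true;
  }

  bool pop_front(T& out) {
    if (size_ == 0) return false;
    out = buf_[head_];
    head_ = (head_ + 1) % buf_.size();
    --size_;
    return true;
  }
};
```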
      
      Differential Revision: http://reviews.llvm.org/D20733
      
      llvm-svn: 271319
    • Offer API for setting number of loop dispatch buffers · 067325f9
      Jonathan Peyton authored
      The problem is the lack of dispatch buffers when thousands of loops with nowait,
      about 10 iterations each, are executed by hundreds of threads. There are only 7
      built-in dispatch buffers, but dozens or hundreds of buffers may be needed.

      The problem can be fixed by setting KMP_MAX_DISP_BUF to a bigger value at build
      time. To give users the same possibility, I changed the build-time control into a
      run-time one, adding an API just in case.

      This change adds an environment variable KMP_DISP_NUM_BUFFERS and a new API
      function kmp_set_disp_num_buffers(int num_buffers).

      The KMP_DISP_NUM_BUFFERS envirable works only before serial initialization,
      because during the serial initialization we already allocate buffers for the hot
      team, so it is too late to change the number of buffers later (or we would need to
      reallocate buffers for all teams, which sounds too complicated). The
      kmp_set_defaults() routine does not work for this envirable, because it calls
      serial initialization before reading the parameter string. So a new routine,
      kmp_set_disp_num_buffers(), is created so that it can set our internal global
      variable before the library initialization. If both the envirable and the API are
      used, the envirable wins.
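
      The following is a small model of the precedence rules described above. All names are hypothetical. The API value is accepted only before serial initialization, and a value from KMP_DISP_NUM_BUFFERS overrides it. Ignoring non-positive values is an assumption of the sketch.

```cpp
#include <cstdlib>

struct RuntimeModel {
  int disp_num_buffers = 7;  // built-in default mentioned above
  bool serial_initialized = false;

  // Analogue of kmp_set_disp_num_buffers(): too late once buffers exist.
  void set_disp_num_buffers(int n) {
    if (!serial_initialized && n > 0) disp_num_buffers = n;
  }

  // `env` is the KMP_DISP_NUM_BUFFERS value or nullptr; passed in rather
  // than read with std::getenv so the model is easy to test.
  void serial_initialize(const char* env) {
    if (env != nullptr) {
      const int n = std::atoi(env);
      if (n > 0) disp_num_buffers = n;  // the envirable wins
    }
    serial_initialized = true;  // hot-team buffers would be allocated here
  }
};
```

      In practice, this means a program has to call kmp_set_disp_num_buffers() before its first parallel region.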
      
      Differential Revision: http://reviews.llvm.org/D20697
      
      llvm-svn: 271318
  15. May 23, 2016
  16. May 16, 2016
    • Clean all the mess around KMP_USE_FUTEX and kmp_lock.h · fb043fdf
      Paul Osmialowski authored
      The KMP_USE_FUTEX preprocessor definition in kmp_lock.h is used
      inconsistently throughout the LLVM libomp code:

      * some .c files that use this define do not include kmp_lock.h,
        so the guarded code is never compiled
      * some places in the code use architecture-dependent preprocessor
        logic expressions which effectively disable the use of futexes on
        the AArch64 architecture; all these places should use
        '#if KMP_USE_FUTEX' instead to avoid any further confusion
      * some places use KMP_HAS_FUTEX, which is not defined anywhere;
        KMP_USE_FUTEX should be used instead
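
      The first bullet works silently because the preprocessor treats an undefined identifier in #if as 0, so no error is produced. The demonstration below uses a simplified stand-in for kmp_lock.h. The real definition depends on the target, whereas here it is simply 1.

```cpp
// Before the "header": KMP_USE_FUTEX is undefined, so #if sees 0 and the
// futex path is quietly compiled out.
#if KMP_USE_FUTEX
constexpr bool futex_seen_before_header = true;
#else
constexpr bool futex_seen_before_header = false;
#endif

// Stand-in for kmp_lock.h: the single place that decides futex use, so
// other files never repeat architecture checks of their own.
#define KMP_USE_FUTEX 1

#if KMP_USE_FUTEX
constexpr bool futex_seen_after_header = true;
#else
constexpr bool futex_seen_after_header = false;
#endif
```

      Compiling with -Wundef makes GCC and Clang warn about this kind of silent evaluation.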
      
      Differential Revision: http://reviews.llvm.org/D19629
      
      llvm-svn: 269642
  17. May 13, 2016
    • Adding new kmp_aligned_malloc() entry point · f83ae31c
      Jonathan Peyton authored
      This change adds a new entry point,
      kmp_aligned_malloc(size_t size, size_t alignment), which corresponds
      to kmp_malloc() but can also return aligned memory.
      Other allocator routines have been adjusted so that kmp_free() can be used to
      free memory blocks allocated by any kmp_*alloc() routine, including the new
      kmp_aligned_malloc() routine.
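
      One common way to let a single free routine handle both plain and aligned blocks is to store the original std::malloc pointer just before the address returned to the caller. The sketch below uses that technique with hypothetical names. It is not necessarily how libomp does it.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// alignment must be a power of two and at least alignof(void*).
void* my_aligned_malloc(std::size_t size, std::size_t alignment) {
  if (alignment < alignof(void*) || (alignment & (alignment - 1)) != 0)
    return nullptr;
  void* raw = std::malloc(size + alignment + sizeof(void*));
  if (raw == nullptr) return nullptr;
  const std::uintptr_t start =
      reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
  const std::uintptr_t aligned =
      (start + alignment - 1) & ~static_cast<std::uintptr_t>(alignment - 1);
  reinterpret_cast<void**>(aligned)[-1] = raw;  // header: original pointer
  return reinterpret_cast<void*>(aligned);
}

// Plain allocations use the same header, so one free works for both.
void* my_malloc(std::size_t size) {
  return my_aligned_malloc(size, alignof(std::max_align_t));
}

void my_free(void* p) {
  if (p != nullptr) std::free(static_cast<void**>(p)[-1]);
}
```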
      
      Differential Revision: http://reviews.llvm.org/D19814
      
      llvm-svn: 269365
  18. Apr 19, 2016
  19. Apr 18, 2016
    • Runtime support for untied tasks · e6643daa
      Jonathan Peyton authored
      Introduced a counter of the parts of an untied task submitted for execution. The
      counter tracks whether all parts of the task have finished. The compiler
      should generate code that re-submits a partially executed untied task before
      exiting each task part, except for the lexically last part.
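
      The following sketch shows a counter with the semantics described, using hypothetical names. Each submitted part increments the counter and each finished part decrements it. The task is complete only when the count returns to zero.

```cpp
#include <atomic>

struct UntiedTask {
  std::atomic<int> parts_outstanding{0};

  void submit_part() {
    parts_outstanding.fetch_add(1, std::memory_order_relaxed);
  }

  // True if this was the last outstanding part, i.e. the whole task is done.
  bool finish_part() {
    return parts_outstanding.fetch_sub(1, std::memory_order_acq_rel) == 1;
  }
};
```

      A two-part task would behave as follows. It is submitted once. Part 1 re-submits the task before exiting and then finishes, leaving one part outstanding. Part 2 is the lexically last part, so it does not re-submit, and its finish brings the count to zero.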
      
      Differential Revision: http://reviews.llvm.org/D19026
      
      llvm-svn: 266675
  20. Mar 27, 2016
  21. Mar 15, 2016
  22. Mar 02, 2016
    • Add new OpenMP 4.5 taskloop construct feature · 283a215c
      Jonathan Peyton authored
      From the standard: The taskloop construct specifies that the iterations of one
      or more associated loops will be executed in parallel using OpenMP tasks. The
      iterations are distributed across tasks created by the construct and scheduled
      to be executed.
      
      This initial implementation uses a simple linear task distribution algorithm.
      Later we can add other algorithms to speed up generation of a huge number of
      tasks (e.g., tree-like task generation should be faster).
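
      Below is a sketch of a linear split, in which the creating thread produces the tasks one after another. The exact splitting rule is an assumption: the first n % num_tasks tasks receive one extra iteration, so task sizes differ by at most one.

```cpp
#include <vector>

struct Chunk { long lo, hi; };  // half-open [lo, hi)

std::vector<Chunk> linear_split(long n, long num_tasks) {
  std::vector<Chunk> tasks;
  if (n <= 0 || num_tasks <= 0) return tasks;
  if (num_tasks > n) num_tasks = n;  // avoid empty tasks
  const long base = n / num_tasks, extra = n % num_tasks;
  long lo = 0;
  for (long t = 0; t < num_tasks; ++t) {  // O(num_tasks) on one thread
    const long len = base + (t < extra ? 1 : 0);
    tasks.push_back({lo, lo + len});  // a runtime would enqueue a task here
    lo += len;
  }
  return tasks;
}
```

      Because one thread creates all the tasks, creation time grows linearly with the number of tasks. The tree-like generation mentioned above would let several tasks share that work.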
      
      This needs to be put into the OpenMP runtime library in order for the
      compiler team to develop the compiler side of the implementation.
      
      Differential Revision: http://reviews.llvm.org/D17404
      
      llvm-svn: 262535
    • Add new OpenMP 4.5 doacross loop nest feature · 71909c57
      Jonathan Peyton authored
      From the standard: A doacross loop nest is a loop nest that has cross-iteration
      dependence. An iteration is dependent on one or more lexicographically earlier
      iterations. The ordered clause parameter on a loop directive identifies the
      loop(s) associated with the doacross loop nest.
      
      The init/fini routines allocate/free doacross buffer(s) for each loop for each
      thread. The wait routine waits for a flag designated by the dependence vector.
      The post routine sets the flag designated by the current iteration vector. We use
      a similar technique of shared buffer indices that covers up to 7 nowait loops
      executed simultaneously by different threads (the number 7 has no real meaning;
      it is just a heuristic value). Also, structure sizes are kept intact by
      shrinking the dummy arrays.
      
      This needs to be put into the OpenMP runtime library in order for the compiler
      team to develop the compiler side of the implementation.
      
      Differential Revision: http://reviews.llvm.org/D17399
      
      llvm-svn: 262532
  23. Feb 25, 2016
    • Add initial support for OpenMP 4.5 task priority feature · 2851072d
      Jonathan Peyton authored
      The maximum task priority value is read from the envirable OMP_MAX_TASK_PRIORITY,
      but as of now, nothing is done with it. We just handle the environment variable
      and add the new API omp_get_max_task_priority(), which returns that value, or
      zero if it is not set.
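
      The following sketch shows the described behaviour: the variable's value if it is set, and zero otherwise. The helper name is hypothetical. Treating malformed or negative values as "not set" is an assumption of the sketch, not something the commit specifies.

```cpp
#include <climits>
#include <cstdlib>

// env_value: the OMP_MAX_TASK_PRIORITY string, or nullptr if unset.
int parse_max_task_priority(const char* env_value) {
  if (env_value == nullptr) return 0;  // not set
  char* end = nullptr;
  const long v = std::strtol(env_value, &end, 10);
  if (end == env_value || *end != '\0' || v < 0 || v > INT_MAX) return 0;
  return static_cast<int>(v);
}

// What an omp_get_max_task_priority()-style query would then report.
int max_task_priority() {
  return parse_max_task_priority(std::getenv("OMP_MAX_TASK_PRIORITY"));
}
```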
      
      Differential Revision: http://reviews.llvm.org/D17411
      
      llvm-svn: 261908
    • Add new OpenMP 4.5 schedule clause modifiers (monotonic/non-monotonic) feature · ea0fe1df
      Jonathan Peyton authored
      The monotonic/non-monotonic flags are sent to the runtime via the sched_type by
      setting the 30th (non-monotonic) or 29th (monotonic) bit in the sched_type.
      Macros are added to probe if monotonic or non-monotonic is specified
      (SCHEDULE_HAS_[NON]MONOTONIC & SCHEDULE_HAS_NO_MODIFIERS)
      and also to get the base sched_type (SCHEDULE_WITHOUT_MODIFIERS).
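
      The encoding can be sketched as follows, assuming the bit positions are counted from zero (1 << 30 and 1 << 29). The macro names follow the text. The enum is a simplified stand-in, and the base value 35 used in the usage example is illustrative only.

```cpp
#include <cstdint>

enum SchedModifier : std::uint32_t {
  sch_modifier_monotonic = 1u << 29,
  sch_modifier_nonmonotonic = 1u << 30,
};
constexpr std::uint32_t kModifierMask =
    sch_modifier_monotonic | sch_modifier_nonmonotonic;

#define SCHEDULE_HAS_MONOTONIC(s) (((s) & sch_modifier_monotonic) != 0)
#define SCHEDULE_HAS_NONMONOTONIC(s) (((s) & sch_modifier_nonmonotonic) != 0)
#define SCHEDULE_HAS_NO_MODIFIERS(s) (((s) & kModifierMask) == 0)
#define SCHEDULE_WITHOUT_MODIFIERS(s) ((s) & ~kModifierMask)
```

      For example, if a compiler passes base | sch_modifier_nonmonotonic, SCHEDULE_WITHOUT_MODIFIERS recovers base, so code that only understands the base kinds keeps working.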
      
      Currently, nothing is done with the modifiers.
      
      Also, this patch adds some comments on the use of the enumerations in at least
      one place where it is subtle.
      
      Differential Revision: http://reviews.llvm.org/D17406
      
      llvm-svn: 261906
  24. Feb 18, 2016
  25. Feb 12, 2016
    • Fix incorrect task_team in __kmp_give_task · 134f90d5
      Jonathan Peyton authored
      When a target task finishes and tries to access th_task_team from the
      threads in the team where it was created, th_task_team can be NULL or point to
      a different place if that thread has started a nested region that is still
      running. Finding the exact task_team that the threads were using is difficult,
      as it would require unwinding the task_state_memo_stack. So a new field was added
      to the taskdata structure to point to the task_team that was active when the
      task was created.
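
      The fix amounts to a snapshot at creation time. The sketch below uses simplified, hypothetical structures. It records the creator's active task team in the task and reads that record later, instead of reading the thread's current value.

```cpp
struct TaskTeam { int id; };
struct ThreadInfo { TaskTeam* th_task_team; };  // may change or become null
struct TaskData { TaskTeam* td_task_team; };    // the new field

TaskData create_task(const ThreadInfo& creator) {
  return TaskData{creator.th_task_team};  // snapshot at creation time
}

// Used when the finished task must be handed back to its team.
TaskTeam* team_for_give_task(const TaskData& task) {
  return task.td_task_team;
}
```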
      
      llvm-svn: 260615
  26. Jan 29, 2016
    • Fix task dependency performance problem · 7d45451a
      Jonathan Peyton authored
      In http://lists.llvm.org/pipermail/openmp-dev/2015-August/000858.html, a
      performance issue was found with libomp's task dependencies. The task
      dependencies hash table has an issue with collisions: the current table size is
      a power of two, which, combined with the current hash function, causes a large
      number of collisions. Also, the current size (64) is too small for larger
      applications, so the table size is increased.
      
      This patch creates a two level hash table approach for task dependencies. The
      implicit task is considered the "master" or "top-level" task which has a large
      static sized hash table (997), and nested tasks will have smaller hash
      tables (97). Prime numbers were chosen to help reduce collisions.
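
      The following example shows why a prime table size helps with aligned addresses. The identity hash (address mod size) is an assumption chosen for clarity, not libomp's actual hash. With 64-byte-aligned addresses, a 64-entry table maps everything to one bucket. Because 64 is coprime to 97 and to 997, tables of those sizes spread the same addresses over every bucket.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

// Number of distinct buckets hit by `count` addresses spaced `stride`
// bytes apart, using an identity hash modulo the table size.
std::size_t distinct_buckets(std::size_t table_size, std::size_t count,
                             std::uintptr_t stride) {
  std::set<std::size_t> used;
  for (std::size_t k = 0; k < count; ++k) {
    const std::uintptr_t addr = 0x10000 + k * stride;  // aligned addresses
    used.insert(static_cast<std::size_t>(addr % table_size));
  }
  return used.size();
}
```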
      
      Differential Revision: http://reviews.llvm.org/D16640
      
      llvm-svn: 259113
  27. Jan 27, 2016
  28. Jan 05, 2016
  29. Jan 04, 2016
  30. Dec 11, 2015
    • Hinted lock (OpenMP 4.5 feature) Updates/Fixes Part 3 · b87b5813
      Jonathan Peyton authored
      This change set includes all changes to make the code conform to the OMP 4.5 specification:
      
      * Removed hint / hinted_init definitions from include/40 files
      * Hint values are powers of 2 to enable composition (4.5 spec)
      * Hinted lock initialization functions were renamed (4.5 spec)
        kmp_init_lock_hinted -> omp_init_lock_with_hint
        kmp_init_nest_lock_hinted -> omp_init_nest_lock_with_hint
      * __kmpc_critical_section_with_hint was added to support a critical section with
        a hint (4.5 spec)
      * __kmp_map_hint_to_lock was added to convert a hint (possibly a composite) to
        an internal lock type
      * kmpc_init_lock_with_hint and kmpc_init_nest_lock_with_hint were added as
        internal entries for the hinted lock initializers. The previous internal
        functions (__kmp_init*) were moved to kmp_csupport.c and reused in multiple
        places
      * Added the two init functions to dllexports
      * KMP_USE_DYNAMIC_LOCK is turned on if OMP_41_ENABLED is turned on
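
      Powers-of-two hints compose with bitwise OR. The sketch below shows a hint-to-lock mapping using the OpenMP 4.5 omp_lock_hint_t values. The LockKind enum and the mapping policy are hypothetical and do not mirror __kmp_map_hint_to_lock. Contradictory combinations, which the specification does not allow, fall back to a default.

```cpp
#include <cstdint>

// Hint values from the OpenMP 4.5 omp_lock_hint_t enumeration.
enum LockHint : std::uint32_t {
  omp_lock_hint_none = 0,
  omp_lock_hint_uncontended = 1,
  omp_lock_hint_contended = 2,
  omp_lock_hint_nonspeculative = 4,
  omp_lock_hint_speculative = 8,
};

// Hypothetical internal lock kinds.
enum class LockKind { Default, TestAndSet, Queuing, Speculative };

LockKind map_hint_to_lock(std::uint32_t hint) {
  if ((hint & omp_lock_hint_contended) && (hint & omp_lock_hint_uncontended))
    return LockKind::Default;  // contradictory: ignore the hint
  if ((hint & omp_lock_hint_speculative) &&
      (hint & omp_lock_hint_nonspeculative))
    return LockKind::Default;
  if (hint & omp_lock_hint_speculative) return LockKind::Speculative;
  if (hint & omp_lock_hint_contended) return LockKind::Queuing;
  if (hint & omp_lock_hint_uncontended) return LockKind::TestAndSet;
  return LockKind::Default;
}
```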
      
      Differential Revision: http://reviews.llvm.org/D15205
      
      llvm-svn: 255376
  31. Nov 30, 2015
    • Adding Hwloc library option for affinity mechanism · 01dcf36b
      Jonathan Peyton authored
      These changes allow libhwloc to be used as the topology discovery/affinity
      mechanism for libomp.  It is supported on Unices. The code additions:
      * Canonicalize KMP_CPU_* interface macros so bitmask operations are
        implementation independent and work with both hwloc bitmaps and libomp
        bitmaps.  So there are new KMP_CPU_ALLOC_* and KMP_CPU_ITERATE() macros and
        the like. These are all in kmp.h and appropriately placed.
      * Hwloc topology discovery code in kmp_affinity.cpp. This uses the hwloc
        interface to create a libomp address2os object which the rest of libomp knows
        how to handle already.
      * To build, use -DLIBOMP_USE_HWLOC=on and
        -DLIBOMP_HWLOC_INSTALL_DIR=/path/to/install/dir [default /usr/local]. If CMake
        can't find the library or hwloc.h, then it will tell you and exit.
      
      Differential Revision: http://reviews.llvm.org/D13991
      
      llvm-svn: 254320
  32. Nov 04, 2015