  1. Jun 13, 2016
    • Affinity mask processing improvements · c5304aa3
      Jonathan Peyton authored
      Remove static specifier from var fullMask and remove kmp_get_fullMask() routine.
      When iterating through procs in a mask, always check if proc is in fullMask
      (this check was missing in a few places).
      
      Patch by Brian Bliss.
      
      Differential Revision: http://reviews.llvm.org/D21300
      
      llvm-svn: 272589
      c5304aa3
    • Fix bitmask complement operation · 34c72c47
      Jonathan Peyton authored
      The bitmask complement operation doesn't consider the max proc id which means
      something like !{0} will be translated to {1,2,3,4,...,600,601,...,1023} on a
      Linux system even though there aren't 600 processors on said system. This
      change has the complement bitmask and-ed with the fullmask so that it will only
      contain valid processors.
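
The fix can be illustrated with a minimal sketch. It assumes a fixed-width `std::bitset` in place of the runtime's own mask type; `ProcMask`, `make_full_mask` and `complement_in` are hypothetical names and are not libomp's API.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical fixed mask width; 1024 matches the Linux example above.
constexpr std::size_t kMaxProcs = 1024;
using ProcMask = std::bitset<kMaxProcs>;

// Build a mask holding procs 0..num_procs-1, standing in for the full mask.
inline ProcMask make_full_mask(std::size_t num_procs) {
  assert(num_procs <= kMaxProcs);
  ProcMask full;
  for (std::size_t i = 0; i < num_procs; ++i)
    full.set(i);
  return full;
}

// Complement restricted to valid processors: (~mask) & full.
inline ProcMask complement_in(const ProcMask &mask, const ProcMask &full) {
  return ~mask & full;
}
```

On a hypothetical 8-processor machine, the complement of {0} computed this way is {1,...,7} rather than {1,...,1023}.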
      
      Differential Revision: http://reviews.llvm.org/D21245
      
      llvm-svn: 272561
      34c72c47
  2. May 31, 2016
    • Use C++11 atomics for ticket locks implementation · f7cc6aff
      Paul Osmialowski authored
      This patch replaces use of compiler builtin atomics with
      C++11 atomics for ticket locks implementation. Ticket locks
      are used in critical places of the runtime, e.g. in the tasking
      mechanism.
      
      The main reason this change was introduced is the problem
      with work stealing function on ARM architecture which suffered
      from nasty race condition. It turned out that the root cause of
      the problem lies in the way ticket locks are implemented. Changing
      compiler builtins into C++11 atomics solves the problem.
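
For readers unfamiliar with the technique, a ticket lock built on C++11 atomics might look roughly like the sketch below. The class name, member names and the choice of memory orderings are assumptions made for illustration; this is not the code from the patch.

```cpp
#include <atomic>
#include <cassert>

// Minimal ticket lock sketch (hypothetical, not libomp's implementation).
class TicketLock {
  std::atomic<unsigned> next_ticket_{0}; // next ticket to hand out
  std::atomic<unsigned> now_serving_{0}; // ticket currently allowed in

public:
  void lock() {
    // Take a ticket; the acquire load below provides the synchronization.
    const unsigned my_ticket =
        next_ticket_.fetch_add(1, std::memory_order_relaxed);
    // Spin until our ticket is served. Acquire pairs with the release
    // store in unlock(), so the previous holder's writes are visible.
    while (now_serving_.load(std::memory_order_acquire) != my_ticket) {
      // busy-wait
    }
  }

  void unlock() {
    // Only the lock holder writes now_serving_, so a relaxed read is enough.
    const unsigned next =
        now_serving_.load(std::memory_order_relaxed) + 1;
    now_serving_.store(next, std::memory_order_release);
  }
};
```

The explicit acquire/release pairing is the part that matters on weakly ordered architectures such as ARM, where missing barriers can let a thread observe stale data after taking the lock.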
      
      Two assertions were added into kmp_tasking.c which are useful
      for detecting early symptoms of something wrong going on with
      work stealing, which were among the possible outcomes of the
      race condition.
      
      Differential Revision: http://reviews.llvm.org/D19878
      
      llvm-svn: 271324
      f7cc6aff
    • Addition of OpenMP 4.5 feature: schedule(simd:static) · ef734799
      Jonathan Peyton authored
      This patch implements the new kmp_sch_static_balanced_chunked schedule kind that
      the compiler will generate when it encounters schedule(simd: static). It just
      adds the new constant and the new switch case in __kmp_for_static_init.
      
      Patch by Alex Duran.
      
      Differential Revision: http://reviews.llvm.org/D20699
      
      llvm-svn: 271320
      ef734799
    • Avoid deadlock with COI · f4f96956
      Jonathan Peyton authored
      When an asynchronous offload task is completed, COI calls the runtime to queue
      a "destructor task".  When the task deques are full, a dead-lock situation
      arises where the OpenMP threads are inside but cannot progress because the COI
      thread is stuck inside the runtime trying to find a slot in a deque.
      
      This patch implements a solution in which the task deques are doubled in size
      when a task is being queued from a COI thread.
      
      Differential Revision: http://reviews.llvm.org/D20733
      
      llvm-svn: 271319
      f4f96956
    • Offer API for setting number of loop dispatch buffers · 067325f9
      Jonathan Peyton authored
      The problem is a shortage of dispatch buffers when thousands of nowait loops,
      about 10 iterations each, are executed by hundreds of threads. Only 7 dispatch
      buffers are built in, but dozens or hundreds of buffers are needed.
      
      The problem can be fixed by setting KMP_MAX_DISP_BUF to a bigger value. To give
      users the same possibility, I changed the build-time control into a run-time
      one and added an API just in case.
      
      This change adds an environment variable KMP_DISP_NUM_BUFFERS and a new API
      function kmp_set_disp_num_buffers(int num_buffers).
      
      The KMP_DISP_NUM_BUFFERS envirable works only before serial initialization,
      because during the serial initialization we already allocate buffers for the hot
      team, so it is too late to change the number of buffers later (or we need to
      reallocate buffers for all teams which sounds too complicated). The
      kmp_set_defaults() routine does not work for this envirable, because it calls
      serial initialization before reading the parameter string. So a new routine,
      kmp_set_disp_num_buffers(), is created so that it can set our internal global
      variable before the library initialization. If both the envirable and the API
      are used, the envirable wins.
      
      Differential Revision: http://reviews.llvm.org/D20697
      
      llvm-svn: 271318
      067325f9
  3. May 23, 2016
  4. May 16, 2016
    • Clean all the mess around KMP_USE_FUTEX and kmp_lock.h · fb043fdf
      Paul Osmialowski authored
      The KMP_USE_FUTEX preprocessor definition defined in kmp_lock.h is used
      inconsistently throughout the LLVM libomp code.
      
      * some .c files that use this define do not include kmp_lock.h, so the
        guarded parts of the code are never compiled
      * some places in the code use architecture-dependent preprocessor
        logic expressions which effectively disable use of Futex for
        the AArch64 architecture; all these places should use
        '#if KMP_USE_FUTEX' instead to avoid any further confusion
      * some places use KMP_HAS_FUTEX, which is not defined anywhere;
        KMP_USE_FUTEX should be used instead
      
      Differential Revision: http://reviews.llvm.org/D19629
      
      llvm-svn: 269642
      fb043fdf
  5. May 13, 2016
    • Adding new kmp_aligned_malloc() entry point · f83ae31c
      Jonathan Peyton authored
      This change adds a new entry point,
      kmp_aligned_malloc(size_t size, size_t alignment), an entry point corresponding
      to kmp_malloc() but with the capability to return aligned memory as well.
      Other allocator routines have been adjusted so that kmp_free() can be used for
      freeing memory blocks allocated by any kmp_*alloc() routine, including the new
      kmp_aligned_malloc() routine.
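
One common way to let a single free routine release both plain and aligned blocks is to store the raw allocation pointer in a small header just before the address handed to the caller. The sketch below illustrates that idea only; `BlockHeader`, `sketch_aligned_malloc`, `sketch_malloc` and `sketch_free` are hypothetical names, and the real kmp_* allocators may be implemented differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical descriptor stored immediately before every returned pointer.
struct BlockHeader {
  void *raw; // pointer originally returned by std::malloc
};

// Allocate `size` bytes aligned to `alignment` (a nonzero power of two).
inline void *sketch_aligned_malloc(std::size_t size, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  // Room for the payload, the header, and worst-case alignment padding.
  void *raw = std::malloc(size + sizeof(BlockHeader) + alignment);
  if (raw == nullptr)
    return nullptr;
  std::uintptr_t start =
      reinterpret_cast<std::uintptr_t>(raw) + sizeof(BlockHeader);
  std::uintptr_t aligned =
      (start + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
  // Record the raw pointer just below the aligned address. memcpy avoids
  // assuming the header slot itself is suitably aligned.
  BlockHeader header{raw};
  std::memcpy(reinterpret_cast<void *>(aligned - sizeof(BlockHeader)), &header,
              sizeof(header));
  return reinterpret_cast<void *>(aligned);
}

// Plain allocation goes through the same path, so the layout is uniform.
inline void *sketch_malloc(std::size_t size) {
  return sketch_aligned_malloc(size, alignof(std::max_align_t));
}

// Frees blocks from either allocator by reading the header back.
inline void sketch_free(void *ptr) {
  if (ptr == nullptr)
    return;
  BlockHeader header;
  std::memcpy(&header, static_cast<char *>(ptr) - sizeof(BlockHeader),
              sizeof(header));
  std::free(header.raw);
}
```

Because every block carries the same header layout, the free routine does not need to know which allocator produced the pointer.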
      
      Differential Revision: http://reviews.llvm.org/D19814
      
      llvm-svn: 269365
      f83ae31c
  6. Apr 19, 2016
  7. Apr 18, 2016
    • Runtime support for untied tasks · e6643daa
      Jonathan Peyton authored
      Introduced a counter of the parts of an untied task submitted for execution. The
      counter tracks whether all parts of the task have finished. The compiler itself
      should generate re-submission of a partially executed untied task before
      exiting each task part except the lexically last part.
      
      Differential Revision: http://reviews.llvm.org/D19026
      
      llvm-svn: 266675
      e6643daa
  8. Mar 27, 2016
  9. Mar 15, 2016
  10. Mar 02, 2016
    • Add new OpenMP 4.5 taskloop construct feature · 283a215c
      Jonathan Peyton authored
      From the standard: The taskloop construct specifies that the iterations of one
      or more associated loops will be executed in parallel using OpenMP tasks. The
      iterations are distributed across tasks created by the construct and scheduled
      to be executed.
      
      This initial implementation uses a simple linear task distribution algorithm.
      Later we can add other algorithms to speed up generation of a huge number of tasks
      (e.g., tree-like task generation should be faster).
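
As a rough illustration of what a linear distribution means, the sketch below walks the iteration space once and cuts it into contiguous chunks, one per task. `Chunk` and `split_linear` are hypothetical names, and the remainder-handling policy is an assumption; the runtime's actual taskloop code is not shown in this log.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Inclusive iteration range assigned to one task (hypothetical type).
struct Chunk {
  std::int64_t lower;
  std::int64_t upper;
};

// Split [lower, upper] into at most num_tasks contiguous chunks, creating
// them one after another in a single linear pass.
inline std::vector<Chunk> split_linear(std::int64_t lower, std::int64_t upper,
                                       std::int64_t num_tasks) {
  assert(num_tasks > 0 && lower <= upper);
  const std::int64_t trip_count = upper - lower + 1;
  if (num_tasks > trip_count)
    num_tasks = trip_count; // never create empty tasks
  const std::int64_t base = trip_count / num_tasks;
  const std::int64_t extras = trip_count % num_tasks;
  std::vector<Chunk> chunks;
  std::int64_t lo = lower;
  for (std::int64_t i = 0; i < num_tasks; ++i) {
    // The first `extras` chunks take one extra iteration each.
    const std::int64_t len = base + (i < extras ? 1 : 0);
    chunks.push_back({lo, lo + len - 1});
    lo += len;
  }
  return chunks;
}
```

The cost of this approach is that a single thread generates every task, which is why a tree-like scheme could be faster for very large task counts.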
      
      This needs to be put into the OpenMP runtime library in order for the
      compiler team to develop the compiler side of the implementation.
      
      Differential Revision: http://reviews.llvm.org/D17404
      
      llvm-svn: 262535
      283a215c
    • Add new OpenMP 4.5 doacross loop nest feature · 71909c57
      Jonathan Peyton authored
      From the standard: A doacross loop nest is a loop nest that has cross-iteration
      dependence. An iteration is dependent on one or more lexicographically earlier
      iterations. The ordered clause parameter on a loop directive identifies the
      loop(s) associated with the doacross loop nest.
      
      The init/fini routines allocate/free doacross buffer(s) for each loop for each
      thread.  The wait routine waits for a flag designated by the dependence vector.
      The post routine sets the flag designated by current iteration vector.  We use
      a similar technique of shared buffer indices that covers up to 7 nowait loops
      executed simultaneously by different threads (the number 7 has no real meaning,
      it is just a heuristic value).  Also, the sizes of the structures are kept
      intact by reducing dummy arrays.
      
      This needs to be put into the OpenMP runtime library in order for the compiler
      team to develop the compiler side of the implementation.
      
      Differential Revision: http://reviews.llvm.org/D17399
      
      llvm-svn: 262532
      71909c57
  11. Feb 25, 2016
    • Add initial support for OpenMP 4.5 task priority feature · 2851072d
      Jonathan Peyton authored
      The maximum task priority value is read from the envirable OMP_MAX_TASK_PRIORITY.
      But as of now, nothing is done with it.  We just handle the environment variable
      and add the new API omp_get_max_task_priority(), which returns that value or
      zero if it is not set.
      
      Differential Revision: http://reviews.llvm.org/D17411
      
      llvm-svn: 261908
      2851072d
    • Add new OpenMP 4.5 schedule clause modifiers (monotonic/non-monotonic) feature · ea0fe1df
      Jonathan Peyton authored
      The monotonic/non-monotonic flags are sent to the runtime via the sched_type by
      setting the 30th (non-monotonic) or 29th (monotonic) bit in the sched_type.
      Macros are added to probe if monotonic or non-monotonic is specified
      (SCHEDULE_HAS_[NON]MONOTONIC & SCHEDULE_HAS_NO_MODIFIERS)
      and also to get the base sched_type (SCHEDULE_WITHOUT_MODIFIERS).
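
The bit layout described above can be sketched as follows. The constant and function names are hypothetical stand-ins for the macros mentioned in the message, and the sketch assumes "29th" and "30th" mean bit indices 29 and 30 counted from zero.

```cpp
#include <cassert>
#include <cstdint>

// Assumed modifier bits: index 29 for monotonic, index 30 for non-monotonic.
constexpr std::uint32_t kModMonotonic = 1u << 29;
constexpr std::uint32_t kModNonmonotonic = 1u << 30;
constexpr std::uint32_t kModMask = kModMonotonic | kModNonmonotonic;

// Hypothetical equivalents of the probing macros.
constexpr bool has_monotonic(std::uint32_t sched) {
  return (sched & kModMonotonic) != 0;
}
constexpr bool has_nonmonotonic(std::uint32_t sched) {
  return (sched & kModNonmonotonic) != 0;
}
constexpr bool has_no_modifiers(std::uint32_t sched) {
  return (sched & kModMask) == 0;
}

// Strip the modifier bits to recover the base schedule kind.
constexpr std::uint32_t without_modifiers(std::uint32_t sched) {
  return sched & ~kModMask;
}
```

Keeping the modifiers in high bits leaves the existing schedule enumeration values untouched, so code that ignores the modifiers only needs to mask them off first.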
      
      Currently, nothing is done with the modifiers.
      
      Also, this patch adds some comments on the use of the enumerations in at least
      one place where it is subtle.
      
      Differential Revision: http://reviews.llvm.org/D17406
      
      llvm-svn: 261906
      ea0fe1df
  12. Feb 18, 2016
  13. Feb 12, 2016
    • Fix incorrect task_team in __kmp_give_task · 134f90d5
      Jonathan Peyton authored
      When a target task finishes and it tries to access the th_task_team from the
      threads in the team where it was created, th_task_team can be NULL or point to
      a different place when that thread started a nested region that is still
      running. Finding the exact task_team that the threads were using is difficult
      as it would require unwinding the task_state_memo_stack. So a new field was added
      in the taskdata structure to point to the active task_team when the task was
      created.
      
      llvm-svn: 260615
      134f90d5
  14. Jan 29, 2016
    • Fix task dependency performance problem · 7d45451a
      Jonathan Peyton authored
      In: http://lists.llvm.org/pipermail/openmp-dev/2015-August/000858.html, a
      performance issue was found with libomp's task dependencies.  The task
      dependencies hash table has an issue with collisions. The current table size is
      a power of two. This combined with the current hash function causes a large
      number of collisions to occur. Also, the current size (64) is too small for
      larger applications so the table size is increased.
      
      This patch creates a two level hash table approach for task dependencies. The
      implicit task is considered the "master" or "top-level" task which has a large
      static sized hash table (997), and nested tasks will have smaller hash
      tables (97). Prime numbers were chosen to help reduce collisions.
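
The effect of a prime table size can be demonstrated with a toy experiment. The hash below (address modulo table size) is an assumption chosen for clarity and is not libomp's hash function; `toy_hash` and `buckets_used` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy hash: address modulo table size (an assumption, not libomp's hash).
inline std::size_t toy_hash(std::uintptr_t addr, std::size_t table_size) {
  return static_cast<std::size_t>(addr % table_size);
}

// Count how many distinct buckets are hit by `count` addresses spaced
// `stride` bytes apart, starting at `base`.
inline std::size_t buckets_used(std::size_t table_size, std::uintptr_t base,
                                std::uintptr_t stride, std::size_t count) {
  std::vector<bool> used(table_size, false);
  std::size_t distinct = 0;
  for (std::size_t i = 0; i < count; ++i) {
    const std::size_t h = toy_hash(base + i * stride, table_size);
    if (!used[h]) {
      used[h] = true;
      ++distinct;
    }
  }
  return distinct;
}
```

With 50 addresses spaced 64 bytes apart, a 64-entry table under this toy hash puts every address in the same bucket, while a 97-entry table gives each address its own bucket, because 64 and 97 share no common factor.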
      
      Differential Revision: http://reviews.llvm.org/D16640
      
      llvm-svn: 259113
      7d45451a
  15. Jan 27, 2016
  16. Jan 05, 2016
  17. Jan 04, 2016
  18. Dec 11, 2015
    • Hinted lock (OpenMP 4.5 feature) Updates/Fixes Part 3 · b87b5813
      Jonathan Peyton authored
      This change set includes all changes to make the code conform to the OMP 4.5 specification:
      
      * Removed hint / hinted_init definitions from include/40 files
      * Hint values are powers of 2 to enable composition (4.5 spec)
      * Hinted lock initialization functions were renamed (4.5 spec)
        kmp_init_lock_hinted -> omp_init_lock_with_hint
        kmp_init_nest_lock_hinted -> omp_init_nest_lock_with_hint
      * __kmpc_critical_section_with_hint was added to support a critical section with
        a hint (4.5 spec)
      * __kmp_map_hint_to_lock was added to convert a hint (possibly a composite) to
        an internal lock type
      * kmpc_init_lock_with_hint and kmpc_init_nest_lock_with_hint were added as
        internal entries for the hinted lock initializers. The previous internal
        functions (__kmp_init*) were moved to kmp_csupport.c and reused in multiple
        places
      * Added the two init functions to dllexports
      * KMP_USE_DYNAMIC_LOCK is turned on if OMP_41_ENABLED is turned on
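
To show why power-of-two hint values matter, the sketch below combines hints with bitwise OR and maps the composite to a lock kind. The hint values mirror the OpenMP 4.5 lock hint constants (0, 1, 2, 4, 8), but the `LockKind` enumeration and the mapping policy in `map_hint` are assumptions for illustration and do not reproduce __kmp_map_hint_to_lock.

```cpp
#include <cassert>

// Values mirror the OpenMP 4.5 lock hints; the names are local stand-ins.
enum Hint : unsigned {
  kHintNone = 0,
  kHintUncontended = 1,
  kHintContended = 2,
  kHintNonspeculative = 4,
  kHintSpeculative = 8
};

// Hypothetical internal lock kinds.
enum class LockKind { Default, Spin, Queuing, Speculative };

// Map a possibly composite hint to a lock kind (assumed policy).
inline LockKind map_hint(unsigned hint) {
  const bool uncontended = (hint & kHintUncontended) != 0;
  const bool contended = (hint & kHintContended) != 0;
  const bool nonspeculative = (hint & kHintNonspeculative) != 0;
  const bool speculative = (hint & kHintSpeculative) != 0;
  // Contradictory combinations fall back to the default lock.
  if ((uncontended && contended) || (nonspeculative && speculative))
    return LockKind::Default;
  if (speculative)
    return LockKind::Speculative;
  if (contended)
    return LockKind::Queuing;
  if (uncontended)
    return LockKind::Spin;
  return LockKind::Default;
}
```

Since each hint occupies its own bit, a caller can pass, for example, a contended and speculative hint together and the runtime can test each property independently.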
      
      Differential Revision: http://reviews.llvm.org/D15205
      
      llvm-svn: 255376
      b87b5813
  19. Nov 30, 2015
    • Adding Hwloc library option for affinity mechanism · 01dcf36b
      Jonathan Peyton authored
      These changes allow libhwloc to be used as the topology discovery/affinity
      mechanism for libomp.  It is supported on Unices. The code additions:
      * Canonicalize KMP_CPU_* interface macros so bitmask operations are
        implementation independent and work with both hwloc bitmaps and libomp
        bitmaps.  So there are new KMP_CPU_ALLOC_* and KMP_CPU_ITERATE() macros and
        the like. These are all in kmp.h and appropriately placed.
      * Hwloc topology discovery code in kmp_affinity.cpp. This uses the hwloc
        interface to create a libomp address2os object which the rest of libomp knows
        how to handle already.
      * To build, use -DLIBOMP_USE_HWLOC=on and
        -DLIBOMP_HWLOC_INSTALL_DIR=/path/to/install/dir [default /usr/local]. If CMake
        can't find the library or hwloc.h, then it will tell you and exit.
      
      Differential Revision: http://reviews.llvm.org/D13991
      
      llvm-svn: 254320
      01dcf36b
  20. Nov 04, 2015
  21. Oct 19, 2015
    • On FreeBSD, PTHREAD_THREADS_MAX does not fit into an int, leading to · 9b8c353c
      Dimitry Andric authored
      warnings similar to the following:
      
          runtime/src/kmp_global.c:117:35: warning: implicit conversion from
          'unsigned long' to 'int' changes value from 18446744073709551615 to -1
          [-Wconstant-conversion]
          int           __kmp_sys_max_nth = KMP_MAX_NTH;
                        ~~~~~~~~~~~~~~~~~   ^~~~~~~~~~~
          runtime/src/kmp.h:849:34: note: expanded from macro 'KMP_MAX_NTH'
          #    define KMP_MAX_NTH          PTHREAD_THREADS_MAX
                                           ^~~~~~~~~~~~~~~~~~~
      
      Clamp KMP_MAX_NTH to INT_MAX to avoid these warnings.  Also use INT_MAX
      whenever PTHREAD_THREADS_MAX is not defined at all.
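
The clamping can be sketched with the preprocessor as below. `SKETCH_THREADS_MAX` is a hypothetical stand-in for a platform's PTHREAD_THREADS_MAX, and `SKETCH_MAX_NTH` stands in for KMP_MAX_NTH; the exact form used in kmp.h is not reproduced here.

```cpp
#include <cassert>
#include <climits>

// Hypothetical stand-in for a PTHREAD_THREADS_MAX that does not fit in int.
#define SKETCH_THREADS_MAX 18446744073709551615ULL

// Use the platform limit only if it is defined and smaller than INT_MAX;
// otherwise fall back to INT_MAX.
#if defined(SKETCH_THREADS_MAX) && SKETCH_THREADS_MAX < INT_MAX
#define SKETCH_MAX_NTH SKETCH_THREADS_MAX
#else
#define SKETCH_MAX_NTH INT_MAX
#endif

// The initializer now always fits in an int, so no conversion warning.
const int sketch_sys_max_nth = SKETCH_MAX_NTH;
```

The same `#else` branch also covers platforms where the limit macro is not defined at all.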
      
      Differential Revision: http://reviews.llvm.org/D13827
      
      llvm-svn: 250708
      9b8c353c
  22. Oct 09, 2015
    • [OMPT] Initialize task fields only if needed · b401db6d
      Jonathan Peyton authored
      Because __kmp_task_init_ompt is called for every initial task in each thread
      and always generates task IDs, this was a big performance issue on bigger
      systems even without any tool attached.  After changing the initialization
      interface to ompt_tool, we can now rely on already knowing whether a tool is
      attached and OMPT is enabled at this point.
      
      Patch by Jonas Hahnfeld
      
      Differential Revision: http://reviews.llvm.org/D13494
      
      llvm-svn: 249855
      b401db6d
  23. Oct 08, 2015
    • Added sockets to the syntax of KMP_PLACE_THREADS environment variable. · dd4aa9b6
      Jonathan Peyton authored
      Added (optional) sockets to the syntax of the KMP_PLACE_THREADS environment variable.
      Some limitations:
      * The number of sockets and then optional offset should be specified first (before other parameters).
      * The letter designation is mandatory for sockets and then for other parameters.
      * If number of cores is specified first, then the number of sockets is defaulted to all sockets on the machine; also, the old syntax is partially supported if sockets are skipped.
      * If number of threads per core is specified first, then the number of sockets and cores per socket are defaulted to all sockets and all cores per socket respectively.
      * The number of cores per socket cannot be specified before sockets or after threads per core.
      * The number of threads per core can be specified before or after core-offset (old syntax required it to be before core-offset).
      * Parameters delimiter can be: empty, comma, lower-case x.
      * Spaces are allowed around numbers, around letters, around delimiter.
      Approximate shorthand specification:
      KMP_PLACE_THREADS="[num_sockets(S|s)[[delim]offset(O|o)][delim]][num_cores_per_socket(C|c)[[delim]offset(O|o)][delim]][num_threads_per_core(T|t)]"
      
      Differential Revision: http://reviews.llvm.org/D13175
      
      llvm-svn: 249708
      dd4aa9b6
  24. Sep 21, 2015
  25. Sep 10, 2015
  26. Aug 31, 2015
  27. Aug 28, 2015
    • [OpenMP] [CMake] Removing expand-vars.pl in favor of CMake's configure_file() · c0225ca2
      Jonathan Peyton authored
      Currently, the libomp CMake build system uses a Perl script to configure files
      (tools/expand-vars.pl). This patch replaces the use of the Perl script by using
      CMake's configure_file() function. The major changes include:
      1. *.var has every $KMP_* variable changed to @LIBOMP_*@
      2. kmp_config.h.cmake is a new file which contains all the feature macros and
         #cmakedefine lines
      3. Most of the -D lines have been moved from LibompDefinitions.cmake but some
         OS specific MACROs (e.g., _GNU_SOURCE) remain.
      4. All expand-vars.pl related logic is removed from the CMake files.
      
      One important note about this change is that it breaks the old Perl+Makefile
      build system because it can't create kmp_config.h properly.
      
      Differential Revision: http://reviews.llvm.org/D12211
      
      llvm-svn: 246314
      c0225ca2
  28. Aug 13, 2015
    • Remove unused KMP_SETVERSION macro · 221104be
      Jonathan Peyton authored
      This macro and the small amount of code along with it are unused and
      can be removed.  The macro is never defined in any build script or source file.
      
      llvm-svn: 244899
      221104be
  29. Jul 21, 2015
    • Fix OMPT support for task frames, parallel regions, and parallel regions + loops · 3fdf3294
      Jonathan Peyton authored
      This patch makes it possible for a performance tool that uses call stack
      unwinding to map implementation-level call stacks from master and worker
      threads into a unified global view. There are several components to this patch.
      
      include/*/ompt.h.var
        Add a new enumeration type that indicates whether the code for a master task
          for a parallel region is invoked by the user program or the runtime system
        Change the signature for OMPT parallel begin/end callbacks to indicate whether
          the master task will be invoked by the program or the runtime system. This
          enables a performance tool using call stack unwinding to handle these two
          cases differently. For this case, a profiler that uses call stack unwinding
          needs to know that the call path prefix for the master task may differ from
          those available within the begin/end callbacks if the program invokes the
          master.
      
      kmp.h
        Change the signature for __kmp_join_call to take an additional parameter
        indicating the fork_context type. This is needed to supply the OMPT parallel
        end callback with information about whether the compiler or the runtime
        invoked the master task for a parallel region.
      
      kmp_csupport.c
        Ensure that the OMPT task frame field reenter_runtime_frame is properly set
          and cleared before and after calls to fork and join threads for a parallel
          region.
        Adjust the code for the new signature for __kmp_join_call.
        Adjust the OMPT parallel begin callback invocations to carry the extra
          parameter indicating whether the program or the runtime invokes the master
          task for a parallel region.
      
      kmp_gsupport.c
        Apply all of the analogous changes described for kmp_csupport.c for the GOMP
          interface
        Add OMPT support for the GOMP combined parallel region + loop API to
          maintain the OMPT task frame field reenter_runtime_frame.
      
      kmp_runtime.c:
        Use the new information passed by __kmp_join_call to adjust the OMPT
          parallel end callback invocations to carry the extra parameter indicating
          whether the program or the runtime invokes the master task for a parallel
          region.
      
      ompt_internal.h:
        Use the flavor of the parallel region API (GNU or Intel) to determine who
          invokes the master task.
      
      Differential Revision: http://reviews.llvm.org/D11259
      
      llvm-svn: 242817
      3fdf3294
  30. Jul 09, 2015
    • Enable debugger support · 8fbb49ab
      Jonathan Peyton authored
      These changes enable external debuggers to conveniently interface with 
      the LLVM OpenMP Library.  Structures are added which describe the important
      internal structures of the OpenMP Library e.g., teams, threads, etc.
      This feature is turned on by default (CMake variable LIBOMP_USE_DEBUGGER)
      and can be turned off with -DLIBOMP_USE_DEBUGGER=off.
      
      Differential Revision: http://reviews.llvm.org/D10038
      
      llvm-svn: 241832
      8fbb49ab
  31. Jun 04, 2015
    • Fix some sign compare warnings. · 1e7a1ddc
      Jonathan Peyton authored
      This change makes kmp_bstate.old_tid a signed integer instead of an unsigned integer.
      It also defines two new macros, KMP_NSEC_PER_SEC and KMP_USEC_PER_SEC, which let us take
      control of the sign (we want them to be longs).  Also, in kmp_wait_release.h, the byteref()
      function's return type is changed from char to unsigned char.
      
      llvm-svn: 239057
      1e7a1ddc