Commits · 5dee8c43da578307cf3de40ab808e7f15ffeaac8 · Lorenzo Albano / LLVM bpEVL

Nov 14, 2016

Introduce dynamic affinity dispatch capabilities · 1cdd87ad

Jonathan Peyton authored Nov 14, 2016

This set of changes enables the affinity interface (either the preexisting
native operating system interface or HWLOC) to be selected dynamically at
runtime initialization. The motivation is that we were seeing performance
degradations when using HWLOC. This allows the user to fall back to the old
affinity mechanisms, which on large machines (>64 cores) makes a large
difference in initialization time.

These changes mostly move affinity code under a small class hierarchy:

KMPAffinity
  class Mask {}
KMPNativeAffinity : public KMPAffinity
  class Mask : public KMPAffinity::Mask
KMPHwlocAffinity : public KMPAffinity
  class Mask : public KMPAffinity::Mask

Since all interface functions (for both affinity and the mask implementation)
are virtual, the implementation can be chosen at runtime initialization.
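
A minimal sketch of this kind of runtime dispatch is shown below. The names (AffinityApi, NativeLikeAffinity, HwlocLikeAffinity, pick_affinity) are hypothetical, simplified stand-ins, not the libomp classes themselves, and the mask storage is an assumption chosen only to make the two implementations differ.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the base interface described above.
class AffinityApi {
public:
  class Mask {
  public:
    virtual ~Mask() = default;
    virtual void set(int cpu) = 0;
    virtual bool is_set(int cpu) const = 0;
  };
  virtual ~AffinityApi() = default;
  virtual std::unique_ptr<Mask> allocate_mask() const = 0;
  virtual std::string name() const = 0;
};

// "Native" flavor: a fixed 64-bit word (valid for cpu ids 0..63 only).
class NativeLikeAffinity : public AffinityApi {
public:
  class WordMask : public Mask {
    unsigned long long bits_ = 0;
  public:
    void set(int cpu) override { bits_ |= 1ULL << cpu; }
    bool is_set(int cpu) const override { return ((bits_ >> cpu) & 1ULL) != 0; }
  };
  std::unique_ptr<Mask> allocate_mask() const override {
    return std::make_unique<WordMask>();
  }
  std::string name() const override { return "native"; }
};

// "Hwloc-like" flavor: a growable bitmap.
class HwlocLikeAffinity : public AffinityApi {
public:
  class VectorMask : public Mask {
    std::vector<bool> bits_;
  public:
    void set(int cpu) override {
      if (cpu >= static_cast<int>(bits_.size())) bits_.resize(cpu + 1);
      bits_[cpu] = true;
    }
    bool is_set(int cpu) const override {
      return cpu < static_cast<int>(bits_.size()) && bits_[cpu];
    }
  };
  std::unique_ptr<Mask> allocate_mask() const override {
    return std::make_unique<VectorMask>();
  }
  std::string name() const override { return "hwloc-like"; }
};

// The implementation is picked once, at initialization time.
std::unique_ptr<AffinityApi> pick_affinity(bool use_hwloc) {
  if (use_hwloc) return std::make_unique<HwlocLikeAffinity>();
  return std::make_unique<NativeLikeAffinity>();
}
```

Callers only hold AffinityApi and AffinityApi::Mask pointers, so the rest of the code does not need to know which implementation was chosen.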

Differential Revision: https://reviews.llvm.org/D26356

llvm-svn: 286890

1cdd87ad

Oct 27, 2016

Fixed a memory leak related to task dependencies. · df0d75ed

Andrey Churbanov authored Oct 27, 2016

Differential Revision: http://reviews.llvm.org/D25504

Patch by Alex Duran.

llvm-svn: 285283

df0d75ed

Oct 18, 2016

Fix OpenMP 4.0 library build · 0ac7b75f

Jonathan Peyton authored Oct 18, 2016

Patch by Andrey Churbanov

Differential Revision: https://reviews.llvm.org/D25505

llvm-svn: 284499

0ac7b75f

Oct 07, 2016

Code cleanup for the runtime without monitor thread · e1c7c13c

Jonathan Peyton authored Oct 07, 2016

This change removes/disables unnecessary code when monitor thread is not used.

Patch by Hansang Bae

Differential Revision: https://reviews.llvm.org/D25102

llvm-svn: 283577

e1c7c13c

Enable omp_get_schedule() to return static steal type. · a1234cf2

Jonathan Peyton authored Oct 07, 2016

As the code is now, calling omp_get_schedule() when OMP_SCHEDULE=static_steal
will cause an assert.

llvm-svn: 283576

a1234cf2

Sep 27, 2016

Disable monitor thread creation by default. · b66d1aab

Jonathan Peyton authored Sep 27, 2016

This change set disables creation of the monitor thread by default. The global
counter maintained by the monitor thread was replaced by logic that uses system
time directly, and cyclic yielding on Linux targets was also removed since there
was no clear benefit to using it. Setting the KMP_USE_MONITOR variable to 1
re-enables creation of the monitor thread if it is really needed.
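
As a rough illustration of replacing a helper-thread counter with direct time checks, the sketch below tracks a spin deadline against a clock. The BlocktimeDeadline name and the use of std::chrono::steady_clock are assumptions for illustration; the commit does not say which time source the runtime uses.

```cpp
#include <chrono>

// Hypothetical sketch: instead of comparing against a counter that a
// monitor thread increments, a waiting thread records a deadline and
// compares it with the current time on each check.
class BlocktimeDeadline {
  std::chrono::steady_clock::time_point end_;
public:
  explicit BlocktimeDeadline(std::chrono::milliseconds blocktime)
      : end_(std::chrono::steady_clock::now() + blocktime) {}

  // True once the blocktime has elapsed and the thread should stop spinning.
  bool expired() const { return std::chrono::steady_clock::now() >= end_; }
};
```

A waiting loop would construct one deadline when it starts spinning and call expired() between polls.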

Differential Revision: https://reviews.llvm.org/D24739

llvm-svn: 282507

b66d1aab

Sep 12, 2016

Fix bitmask upper bounds check · 7c465a5f

Jonathan Peyton authored Sep 12, 2016

Rather than checking KMP_CPU_SETSIZE, which doesn't exist when using Hwloc, we
use the get_max_proc() function which can vary based on the operating system.
For example on Windows with multiple processor groups, it might be the case that
the highest bit possible in the bitmask is not equal to the number of hardware
threads on the machine but something higher than that.

Differential Revision: https://reviews.llvm.org/D24206

llvm-svn: 281245

7c465a5f

Sep 09, 2016

[OPENMP] Implementation of omp_get_default_device and omp_set_default_device · 28f31b40

George Rokos authored Sep 09, 2016

Implementation of missing OpenMP 4.0 API functions omp_get_default_device and omp_set_default_device.
Also, added support for the environment variable OMP_DEFAULT_DEVICE.

Differential Revision: https://reviews.llvm.org/D23587

llvm-svn: 281065

28f31b40

Aug 03, 2016

Disable KMP_CANCEL_THREADS on Android · 0554d25e

Pirama Arumuga Nainar authored Aug 03, 2016

Summary:
Android does not have pthread_cancel.  Disable KMP_CANCEL_THREADS if
__ANDROID__ is defined.

Subscribers: tberghammer, srhines, openmp-commits, danalbert

Differential Revision: https://reviews.llvm.org/D23029

llvm-svn: 277618

0554d25e

Jul 11, 2016

http://reviews.llvm.org/D22134: Implementation of OpenMP 4.5 nonmonotonic schedule modifier · 429dbc2a

Andrey Churbanov authored Jul 11, 2016

llvm-svn: 275052

429dbc2a

Jul 08, 2016

Improving EPCC performance when linking with hwloc · 4d3c2130

Jonathan Peyton authored Jul 08, 2016

When linking with libhwloc, the ORDERED EPCC test slows down on big
machines (> 48 cores). Performance analysis showed that cache thrashing
was occurring, and this padding helps alleviate the problem.

Also, inside the main spin-wait loop in kmp_wait_release.h, we can eliminate
the references to the global shared variables by creating a local
variable, oversubscribed, and checking that instead.
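
The padding idea can be sketched as follows. This assumes a 64-byte cache line and uses hypothetical type names; it is not the layout used in libomp, only an illustration of keeping frequently written fields on separate cache lines.

```cpp
#include <atomic>
#include <cstddef>

// Assumed cache line size; real hardware may differ.
constexpr std::size_t kCacheLine = 64;

// Each counter is aligned (and therefore padded) to a full cache line,
// so writes to one counter do not invalidate the line holding the other.
struct alignas(kCacheLine) PaddedCounter {
  std::atomic<unsigned> value{0};
};

// Hypothetical pair of fields that different threads update frequently.
struct OrderedStateSketch {
  PaddedCounter iteration;
  PaddedCounter ordered_bumped;
};
```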

Differential Revision: http://reviews.llvm.org/D22093

llvm-svn: 274894

4d3c2130

Jun 16, 2016

Teach OpenMP Library to use Hwloc on Windows · 0f3c2b92

Jonathan Peyton authored Jun 16, 2016

This patch allows a user to enable Hwloc on Windows. There are three main
changes in here:
1. kmp.h - Move definitions/declarations out of the KMP_OS_WINDOWS guard (our
   Windows implementation of affinity) because they need to be defined when
   KMP_USE_HWLOC is on as well.
2. Teach __kmp_set_system_affinity, __kmp_get_system_affinity,
   __kmp_get_proc_group, and __kmp_affinity_bind_thread how to use hwloc.
3. Teach CMake how to include hwloc when building on Windows.

Another minor change in here is to make sure that anything under KMP_USE_HWLOC
is also guarded by KMP_AFFINITY_SUPPORTED. This is to prevent Mac
builds from requiring anything from Hwloc.

Differential Revision: http://reviews.llvm.org/D21441

llvm-svn: 272951

0f3c2b92

Jun 14, 2016

Remove unused wait/release code. · e85ba3f5

Jonathan Peyton authored Jun 14, 2016

Cleanup - unused code removal.
TODO: consider also removing the kmp_wait_64 and kmp_release_64 routines
(replacing them with flag class methods).

Patch by Andrey Churbanov

Differential Revision: http://reviews.llvm.org/D21332

llvm-svn: 272697

e85ba3f5

Renaming change: 41 -> 45 and 4.1 -> 4.5 · df6818be

Jonathan Peyton authored Jun 14, 2016

OpenMP 4.1 is now OpenMP 4.5.  Any mention of 41 or 4.1 is replaced with
45 or 4.5.  Also, if the CMake option LIBOMP_OMP_VERSION is 41, CMake warns that
41 is deprecated and to use 45 instead.

llvm-svn: 272687

df6818be

Jun 13, 2016

Affinity mask processing improvements · c5304aa3

Jonathan Peyton authored Jun 13, 2016

Remove static specifier from var fullMask and remove kmp_get_fullMask() routine.
When iterating through procs in a mask, always check if proc is in fullMask
(this check was missing in a few places).

Patch by Brian Bliss.

Differential Revision: http://reviews.llvm.org/D21300

llvm-svn: 272589

c5304aa3

Fix bitmask complement operation · 34c72c47

Jonathan Peyton authored Jun 13, 2016

The bitmask complement operation doesn't consider the max proc id which means
something like !{0} will be translated to {1,2,3,4,...,600,601,...,1023} on a
Linux system even though there aren't 600 processors on said system. This
change has the complement bitmask and-ed with the fullmask so that it will only
contain valid processors.
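
The fix can be illustrated with a small sketch. It uses std::bitset with an assumed width of 1024 bits (matching the example above) and hypothetical function names; it is not the libomp mask code.

```cpp
#include <bitset>
#include <cstddef>

// Assumed mask width, taken from the 1023 upper bound in the example above.
constexpr std::size_t kMaskBits = 1024;
using CpuMask = std::bitset<kMaskBits>;

// Mask of processors that actually exist: ids 0 .. num_procs - 1.
CpuMask make_full_mask(std::size_t num_procs) {
  CpuMask full;
  for (std::size_t i = 0; i < num_procs && i < kMaskBits; ++i) full.set(i);
  return full;
}

// Old behavior: every bit flips, including ids that are not real processors.
CpuMask naive_complement(const CpuMask &mask) { return ~mask; }

// Fixed behavior: the complement is and-ed with the full mask.
CpuMask bounded_complement(const CpuMask &mask, const CpuMask &full) {
  return ~mask & full;
}
```

On a hypothetical 8-processor machine, the complement of {0} is {1,...,1023} with the naive version and {1,...,7} with the bounded one.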

Differential Revision: http://reviews.llvm.org/D21245

llvm-svn: 272561

34c72c47

May 31, 2016

Use C++11 atomics for ticket locks implementation · f7cc6aff

Paul Osmialowski authored May 31, 2016

This patch replaces the use of compiler builtin atomics with
C++11 atomics in the ticket lock implementation. Ticket locks
are used in critical places of the runtime, e.g. in the tasking
mechanism.

The main reason for this change is a problem with the work
stealing function on the ARM architecture, which suffered
from a nasty race condition. It turned out that the root cause of
the problem lies in the way ticket locks are implemented. Changing
compiler builtins into C++11 atomics solves the problem.

Two assertions were added to kmp_tasking.c. They are useful
for detecting early symptoms of work stealing going wrong,
which were among the possible outcomes of the
race condition.
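
For readers unfamiliar with the technique, a ticket lock built on C++11 atomics can be sketched as below. This is a generic textbook-style sketch with hypothetical names and an assumed choice of memory orderings, not the libomp implementation.

```cpp
#include <atomic>
#include <thread>

// Generic ticket lock sketch: threads take a ticket and wait their turn.
class TicketLockSketch {
  std::atomic<unsigned> next_ticket_{0};
  std::atomic<unsigned> now_serving_{0};

public:
  void lock() {
    // Taking a ticket only needs atomicity; ordering comes from the
    // acquire load below that observes the previous owner's release.
    unsigned my_ticket = next_ticket_.fetch_add(1, std::memory_order_relaxed);
    while (now_serving_.load(std::memory_order_acquire) != my_ticket)
      std::this_thread::yield();
  }

  void unlock() {
    // Only the lock owner writes now_serving_, so a relaxed read is enough;
    // the release store publishes the critical section to the next owner.
    unsigned next = now_serving_.load(std::memory_order_relaxed) + 1;
    now_serving_.store(next, std::memory_order_release);
  }

  unsigned tickets_issued() const {
    return next_ticket_.load(std::memory_order_relaxed);
  }
  unsigned now_serving() const {
    return now_serving_.load(std::memory_order_relaxed);
  }
};
```

The explicit acquire/release pairing is what the language-level atomics make visible; on weakly ordered architectures such as ARM, getting that pairing wrong is a typical source of the kind of race described above.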

Differential Revision: http://reviews.llvm.org/D19878

llvm-svn: 271324

f7cc6aff

Addition of OpenMP 4.5 feature: schedule(simd:static) · ef734799

Jonathan Peyton authored May 31, 2016

This patch implements the new kmp_sch_static_balanced_chunked schedule kind that
the compiler will generate when it encounters schedule(simd: static). It just
adds the new constant and the new switch case in __kmp_for_static_init.

Patch by Alex Duran.

Differential Revision: http://reviews.llvm.org/D20699

llvm-svn: 271320

ef734799

Avoid deadlock with COI · f4f96956

Jonathan Peyton authored May 31, 2016

When an asynchronous offload task is completed, COI calls the runtime to queue
a "destructor task".  When the task deques are full, a deadlock
arises where the OpenMP threads are inside but cannot progress because the COI
thread is stuck inside the runtime trying to find a slot in a deque.

This patch implements the solution where the task deques are doubled in size when
a task is being queued from a COI thread.
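
A sketch of the grow-on-demand idea, using a hypothetical ring-buffer deque of integer task ids (the libomp deque and its locking are not reproduced here):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity ring buffer that may double when allowed.
class TaskDequeSketch {
  std::vector<int> slots_;
  std::size_t head_ = 0;
  std::size_t count_ = 0;

  void grow() {
    // Copy live entries in order into a buffer twice as large.
    std::vector<int> bigger(slots_.size() * 2);
    for (std::size_t i = 0; i < count_; ++i)
      bigger[i] = slots_[(head_ + i) % slots_.size()];
    slots_.swap(bigger);
    head_ = 0;
  }

public:
  explicit TaskDequeSketch(std::size_t capacity)
      : slots_(capacity == 0 ? 1 : capacity) {}

  std::size_t capacity() const { return slots_.size(); }
  std::size_t size() const { return count_; }

  // Ordinary producers pass may_grow = false and must retry when full;
  // a producer that cannot wait (such as the COI thread in the text)
  // passes may_grow = true so a slot is always found.
  bool push(int task, bool may_grow) {
    if (count_ == slots_.size()) {
      if (!may_grow) return false;
      grow();
    }
    slots_[(head_ + count_) % slots_.size()] = task;
    ++count_;
    return true;
  }

  bool pop_front(int &task) {
    if (count_ == 0) return false;
    task = slots_[head_];
    head_ = (head_ + 1) % slots_.size();
    --count_;
    return true;
  }
};
```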

Differential Revision: http://reviews.llvm.org/D20733

llvm-svn: 271319

f4f96956

Offer API for setting number of loop dispatch buffers · 067325f9

Jonathan Peyton authored May 31, 2016

The problem is the lack of dispatch buffers when thousands of loops with nowait,
about 10 iterations each, are executed by hundreds of threads. We only have
7 built-in dispatch buffers, but there is a need for dozens or hundreds of
buffers.

The problem can be fixed by setting KMP_MAX_DISP_BUF to a bigger value. In order
to give users the same possibility, I changed the build-time control into a
run-time one, adding an API just in case.

This change adds an environment variable KMP_DISP_NUM_BUFFERS and a new API
function kmp_set_disp_num_buffers(int num_buffers).

The KMP_DISP_NUM_BUFFERS environment variable works only before serial
initialization, because during serial initialization we already allocate
buffers for the hot team, so it is too late to change the number of buffers
later (or we would need to reallocate buffers for all teams, which sounds too
complicated). The kmp_set_defaults() routine does not work for this environment
variable, because it calls serial initialization before reading the parameter
string. So a new routine, kmp_set_disp_num_buffers(), is created so that it can
set our internal global variable before the library initialization. If both the
environment variable and the API are used, the environment variable wins.

Differential Revision: http://reviews.llvm.org/D20697

llvm-svn: 271318

067325f9

May 23, 2016

Fork performance improvements · b044e4fa

Jonathan Peyton authored May 23, 2016

Most of this is modifications to check for differences before updating data
fields in team struct. There is also some rearrangement of the team struct.
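
The check-before-write pattern can be sketched as below. The helper and struct names are hypothetical; the point is only that skipping a store when the value is unchanged avoids dirtying a cache line that other threads are reading.

```cpp
// Hypothetical helper: write the field only when the value differs.
// Returns true when a store actually happened.
template <typename T>
bool update_if_changed(T &field, const T &value) {
  if (field != value) {
    field = value;
    return true;
  }
  return false;
}

// Hypothetical subset of per-team data refreshed on every fork.
struct TeamSketch {
  int nproc = 0;
  int level = 0;
};

// Returns how many fields were really written.
int refresh_team(TeamSketch &team, int nproc, int level) {
  int writes = 0;
  writes += update_if_changed(team.nproc, nproc) ? 1 : 0;
  writes += update_if_changed(team.level, level) ? 1 : 0;
  return writes;
}
```

When consecutive parallel regions use the same settings, the second call performs no stores at all.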

Patch by Diego Caballero

Differential Revision: http://reviews.llvm.org/D20487

llvm-svn: 270468

b044e4fa

May 16, 2016

Clean all the mess around KMP_USE_FUTEX and kmp_lock.h · fb043fdf

Paul Osmialowski authored May 16, 2016

The KMP_USE_FUTEX preprocessor definition defined in kmp_lock.h is used
inconsistently throughout the LLVM libomp code.

* some .c files that use this define do not include the kmp_lock.h file,
  so in effect the guarded parts of the code are never compiled
* some places in the code use architecture-dependent preprocessor
  logic expressions which effectively disable the use of Futex for
  the AArch64 architecture; all these places should use
  '#if KMP_USE_FUTEX' instead to avoid any further confusion
* some places use KMP_HAS_FUTEX, which is defined nowhere;
  KMP_USE_FUTEX should be used instead

Differential Revision: http://reviews.llvm.org/D19629

llvm-svn: 269642

fb043fdf

May 13, 2016

Adding new kmp_aligned_malloc() entry point · f83ae31c

Jonathan Peyton authored May 12, 2016

This change adds a new entry point,
kmp_aligned_malloc(size_t size, size_t alignment), which corresponds
to kmp_malloc() but can also return aligned memory.
Other allocator routines have been adjusted so that kmp_free() can be used for
freeing memory blocks allocated by any kmp_*alloc() routine, including the new
kmp_aligned_malloc() routine.
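
One common way to let a single free routine release both plain and aligned blocks is to store the original pointer just before the address handed to the caller. The sketch below shows that approach with hypothetical sketch_* names; it is an assumption for illustration and not a description of how libomp lays out its allocations.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Returns memory aligned to `alignment` (which must be a power of two),
// or nullptr on bad arguments or allocation failure.
void *sketch_aligned_malloc(std::size_t size, std::size_t alignment) {
  if (alignment == 0 || (alignment & (alignment - 1)) != 0) return nullptr;
  const std::size_t overhead = alignment + sizeof(void *);
  if (size > SIZE_MAX - overhead) return nullptr; // avoid overflow
  // Room for the payload, worst-case padding, and the saved raw pointer.
  void *raw = std::malloc(size + overhead);
  if (raw == nullptr) return nullptr;
  std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void *);
  std::uintptr_t aligned =
      (start + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
  // Remember where the real allocation begins, right before the user block.
  reinterpret_cast<void **>(aligned)[-1] = raw;
  return reinterpret_cast<void *>(aligned);
}

// The plain allocator goes through the same path, so every block has the
// same hidden header and one free routine works for all of them.
void *sketch_malloc(std::size_t size) {
  return sketch_aligned_malloc(size, alignof(std::max_align_t));
}

void sketch_free(void *ptr) {
  if (ptr != nullptr) std::free(reinterpret_cast<void **>(ptr)[-1]);
}
```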

Differential Revision: http://reviews.llvm.org/D19814

llvm-svn: 269365

f83ae31c

Apr 19, 2016

[ITTNOTIFY] Remove serialized parallel regions from frame notification · a1202bf5

Jonathan Peyton authored Apr 19, 2016

llvm-svn: 266760

a1202bf5

Apr 18, 2016

Runtime support for untied tasks · e6643daa

Jonathan Peyton authored Apr 18, 2016

Introduced a counter of the parts of an untied task submitted for execution. The
counter tracks whether all parts of the task have already finished. The
compiler should itself generate re-submission of a partially executed untied
task before exiting each task part, except for the lexically last part.

Differential Revision: http://reviews.llvm.org/D19026

llvm-svn: 266675

e6643daa

Mar 27, 2016

Fixing the non-x86 build by removing dependence on kmp_cpuid_t · 01bb2406

Hal Finkel authored Mar 27, 2016

The problem is that the definition of kmp_cpuinfo_t contains:

  char       name [3*sizeof (kmp_cpuid_t)]; // CPUID(0x80000002,0x80000003,0x80000004)

and kmp_cpuid_t is only defined when compiling for x86.

Differential Revision: http://reviews.llvm.org/D18245

llvm-svn: 264535

01bb2406

Mar 15, 2016

[STATS] Add header information to stats print out · 6e98d798

Jonathan Peyton authored Mar 15, 2016

This change adds a header to the printout of the statistics which includes the
time, machine name, and processor info if available. This change also includes
some cosmetic changes like using enum casting for timer and counter iteration.

Differential Revision: http://reviews.llvm.org/D18153

llvm-svn: 263580

6e98d798

Mar 02, 2016

Add new OpenMP 4.5 taskloop construct feature · 283a215c

Jonathan Peyton authored Mar 02, 2016

From the standard: The taskloop construct specifies that the iterations of one
or more associated loops will be executed in parallel using OpenMP tasks. The
iterations are distributed across tasks created by the construct and scheduled
to be executed.

This initial implementation uses a simple linear task distribution algorithm.
Later we can add other algorithms to speed up generation of a huge number of
tasks (e.g., tree-like task generation should be faster).
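
A linear distribution can be sketched as a single loop that carves the iteration space into consecutive chunks, one per task. The function below is a hypothetical illustration (inclusive bounds, unit stride, and the remainder spread over the first chunks are all assumptions), not the runtime's taskloop code.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Splits the inclusive range [lb, ub] into at most num_tasks consecutive
// chunks and returns their inclusive bounds in order.
std::vector<std::pair<std::int64_t, std::int64_t>>
split_linear(std::int64_t lb, std::int64_t ub, std::int64_t num_tasks) {
  std::vector<std::pair<std::int64_t, std::int64_t>> chunks;
  if (num_tasks <= 0 || ub < lb) return chunks;
  const std::int64_t total = ub - lb + 1;
  if (num_tasks > total) num_tasks = total; // no empty tasks
  const std::int64_t base = total / num_tasks;
  const std::int64_t extras = total % num_tasks; // first chunks get one more
  std::int64_t lower = lb;
  for (std::int64_t i = 0; i < num_tasks; ++i) {
    const std::int64_t size = base + (i < extras ? 1 : 0);
    chunks.emplace_back(lower, lower + size - 1);
    lower += size;
  }
  return chunks;
}
```

The loop creates tasks one after another from a single thread, which is why the commit notes that a tree-like scheme could generate very large task counts faster.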

This needs to be put into the OpenMP runtime library in order for the
compiler team to develop the compiler side of the implementation.

Differential Revision: http://reviews.llvm.org/D17404

llvm-svn: 262535

283a215c

Add new OpenMP 4.5 doacross loop nest feature · 71909c57

Jonathan Peyton authored Mar 02, 2016

From the standard: A doacross loop nest is a loop nest that has cross-iteration
dependence. An iteration is dependent on one or more lexicographically earlier
iterations. The ordered clause parameter on a loop directive identifies the
loop(s) associated with the doacross loop nest.

The init/fini routines allocate/free doacross buffer(s) for each loop for each
thread. The wait routine waits for a flag designated by the dependence vector.
The post routine sets the flag designated by the current iteration vector. We
use a similar technique of shared buffer indices that covers up to 7 nowait
loops executed simultaneously by different threads (the number 7 has no real
meaning; it is just a heuristic value). Also, the sizes of the structures are
kept intact by shrinking dummy arrays.

This needs to be put into the OpenMP runtime library in order for the compiler
team to develop the compiler side of the implementation.

Differential Revision: http://reviews.llvm.org/D17399

llvm-svn: 262532

71909c57

Feb 25, 2016

Add initial support for OpenMP 4.5 task priority feature · 2851072d

Jonathan Peyton authored Feb 25, 2016

The maximum task priority value is read from the environment variable
OMP_MAX_TASK_PRIORITY. But as of now, nothing is done with it. We just handle
the environment variable and add the new API function
omp_get_max_task_priority(), which returns that value or zero if it is not set.

Differential Revision: http://reviews.llvm.org/D17411

llvm-svn: 261908

2851072d

Add new OpenMP 4.5 schedule clause modifiers (monotonic/non-monotonic) feature · ea0fe1df

Jonathan Peyton authored Feb 25, 2016

The monotonic/non-monotonic flags are sent to the runtime via the sched_type by
setting the 30th (non-monotonic) or 29th (monotonic) bit in the sched_type.
Macros are added to probe if monotonic or non-monotonic is specified
(SCHEDULE_HAS_[NON]MONOTONIC & SCHEDULE_HAS_NO_MODIFIERS)
and also to get the base sched_type (SCHEDULE_WITHOUT_MODIFIERS).

Currently, nothing is done with the modifiers.

Also, this patch adds some comments on the use of the enumerations in at least
one place where it is subtle.
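
The encoding can be sketched as follows. The constant and function names are hypothetical stand-ins for the macros named above, and the sketch assumes the bits are counted from zero (bit 29 for monotonic, bit 30 for non-monotonic).

```cpp
#include <cstdint>

// Hypothetical names; bit positions follow the description above,
// assuming zero-based bit numbering.
constexpr std::uint32_t kModMonotonic = 1u << 29;
constexpr std::uint32_t kModNonmonotonic = 1u << 30;
constexpr std::uint32_t kModMask = kModMonotonic | kModNonmonotonic;

constexpr bool has_monotonic(std::uint32_t sched) {
  return (sched & kModMonotonic) != 0;
}
constexpr bool has_nonmonotonic(std::uint32_t sched) {
  return (sched & kModNonmonotonic) != 0;
}
constexpr bool has_no_modifiers(std::uint32_t sched) {
  return (sched & kModMask) == 0;
}
// Strips the modifier bits to recover the base schedule kind.
constexpr std::uint32_t without_modifiers(std::uint32_t sched) {
  return sched & ~kModMask;
}
```

Because the modifiers live in high bits, the base schedule value can be compared or switched on as before once the modifier bits are masked off.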

Differential Revision: http://reviews.llvm.org/D17406

llvm-svn: 261906

ea0fe1df

Feb 18, 2016

Remove unnecessary semicolons after braces · 95c95c35

Jonathan Peyton authored Feb 18, 2016

llvm-svn: 261249

95c95c35

Feb 12, 2016

Fix incorrect task_team in __kmp_give_task · 134f90d5

Jonathan Peyton authored Feb 11, 2016

When a target task finishes and it tries to access the th_task_team from the
threads in the team where it was created, th_task_team can be NULL or point to
a different place when that thread started a nested region that is still
running. Finding the exact task_team that the threads were using is difficult
as it would require unwinding the task_state_memo_stack. So a new field was added
in the taskdata structure to point to the active task_team when the task was
created.

llvm-svn: 260615

134f90d5

Jan 29, 2016

Fix task dependency performance problem · 7d45451a

Jonathan Peyton authored Jan 28, 2016

In: http://lists.llvm.org/pipermail/openmp-dev/2015-August/000858.html, a
performance issue was found with libomp's task dependencies. The task
dependencies hash table has an issue with collisions. The current table size is
a power of two. This, combined with the current hash function, causes a large
number of collisions to occur. Also, the current size (64) is too small for
larger applications so the table size is increased.

This patch creates a two level hash table approach for task dependencies. The
implicit task is considered the "master" or "top-level" task which has a large
static sized hash table (997), and nested tasks will have smaller hash
tables (97). Prime numbers were chosen to help reduce collisions.
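
The effect of table size on collisions can be shown with a toy experiment. The hash below (address modulo table size) and the synthetic 64-byte-aligned addresses are assumptions chosen for illustration; they are not libomp's hash function or real dependence addresses.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical hash: address modulo table size.
std::size_t bucket_of(std::uintptr_t addr, std::size_t table_size) {
  return static_cast<std::size_t>(addr % table_size);
}

// Synthetic addresses spaced `stride` bytes apart, starting at 0x10000.
std::vector<std::uintptr_t> strided_addresses(std::size_t count,
                                              std::uintptr_t stride) {
  std::vector<std::uintptr_t> addrs;
  for (std::size_t i = 0; i < count; ++i)
    addrs.push_back(0x10000 + i * stride);
  return addrs;
}

// Number of different buckets the addresses land in.
std::size_t distinct_buckets(const std::vector<std::uintptr_t> &addrs,
                             std::size_t table_size) {
  std::set<std::size_t> used;
  for (std::uintptr_t a : addrs) used.insert(bucket_of(a, table_size));
  return used.size();
}
```

With 50 addresses spaced 64 bytes apart, a 64-entry table puts everything in one bucket under this toy hash, while a 97-entry table spreads them over 50 distinct buckets.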

Differential Revision: http://reviews.llvm.org/D16640

llvm-svn: 259113

7d45451a

Jan 27, 2016

Removing extra empty lines · bf89c491

Jonathan Peyton authored Jan 27, 2016

llvm-svn: 258984

bf89c491

Jan 05, 2016

Removed unused __kmp_*_i8 functions. · 32a1ea1b

Jonathan Peyton authored Jan 04, 2016

llvm-svn: 256790

32a1ea1b

Jan 04, 2016

Fix for barrier problem: applications with many parallel regions (2^30) hang · 703d4042

Jonathan Peyton authored Jan 04, 2016

The barrier states type doesn't need to be explicitly set.

llvm-svn: 256778

703d4042

Dec 11, 2015

Hinted lock (OpenMP 4.5 feature) Updates/Fixes Part 3 · b87b5813

Jonathan Peyton authored Dec 11, 2015

This change set includes all changes to make the code conform to the OMP 4.5 specification:

* Removed hint / hinted_init definitions from include/40 files
* Hint values are powers of 2 to enable composition (4.5 spec)
* Hinted lock initialization functions were renamed (4.5 spec)
  kmp_init_lock_hinted -> omp_init_lock_with_hint
  kmp_init_nest_lock_hinted -> omp_init_nest_lock_with_hint
* __kmpc_critical_section_with_hint was added to support a critical section with
  a hint (4.5 spec)
* __kmp_map_hint_to_lock was added to convert a hint (possibly a composite) to
  an internal lock type
* kmpc_init_lock_with_hint and kmpc_init_nest_lock_with_hint were added as
  internal entries for the hinted lock initializers. The previous internal
  functions (__kmp_init*) were moved to kmp_csupport.c and reused in multiple
  places
* Added the two init functions to dllexports
* KMP_USE_DYNAMIC_LOCK is turned on if OMP_41_ENABLED is turned on
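
Because the hint values are powers of two, several hints can be or-ed together and decoded bit by bit. The sketch below uses the numeric values from the OpenMP 4.5 omp_lock_hint_t enumeration, but the LockKind type and the selection policy are hypothetical and only illustrate how a composite hint might be mapped to an internal lock type.

```cpp
// Numeric values follow omp_lock_hint_t in OpenMP 4.5; names are local.
enum LockHint : unsigned {
  kHintNone = 0,
  kHintUncontended = 1,
  kHintContended = 2,
  kHintNonspeculative = 4,
  kHintSpeculative = 8
};

// Hypothetical internal lock kinds.
enum class LockKind { Default, TestAndSet, Queuing, Speculative };

// Hypothetical policy: contradictory hints fall back to the default lock.
LockKind map_hint_to_lock_sketch(unsigned hint) {
  const bool uncontended = (hint & kHintUncontended) != 0;
  const bool contended = (hint & kHintContended) != 0;
  const bool nonspeculative = (hint & kHintNonspeculative) != 0;
  const bool speculative = (hint & kHintSpeculative) != 0;

  if ((uncontended && contended) || (speculative && nonspeculative))
    return LockKind::Default;
  if (contended) return LockKind::Queuing;
  if (speculative) return LockKind::Speculative;
  if (uncontended) return LockKind::TestAndSet;
  return LockKind::Default;
}
```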

Differential Revision: http://reviews.llvm.org/D15205

llvm-svn: 255376

b87b5813

Nov 30, 2015

Adding Hwloc library option for affinity mechanism · 01dcf36b

Jonathan Peyton authored Nov 30, 2015

These changes allow libhwloc to be used as the topology discovery/affinity
mechanism for libomp.  It is supported on Unices. The code additions:
* Canonicalize KMP_CPU_* interface macros so bitmask operations are
  implementation independent and work with both hwloc bitmaps and libomp
  bitmaps.  So there are new KMP_CPU_ALLOC_* and KMP_CPU_ITERATE() macros and
  the like. These are all in kmp.h and appropriately placed.
* Hwloc topology discovery code in kmp_affinity.cpp. This uses the hwloc
  interface to create a libomp address2os object which the rest of libomp knows
  how to handle already.
* To build, use -DLIBOMP_USE_HWLOC=on and
  -DLIBOMP_HWLOC_INSTALL_DIR=/path/to/install/dir [default /usr/local]. If CMake
  can't find the library or hwloc.h, then it will tell you and exit.

Differential Revision: http://reviews.llvm.org/D13991

llvm-svn: 254320

01dcf36b

Nov 04, 2015

Remove some empty lines. · 4505bf68

Jonathan Peyton authored Nov 04, 2015

llvm-svn: 252084

4505bf68