Commits · 7abf9d5927b39abf8e2d29a2edc66fc7f828289c · Lorenzo Albano / LLVM bpEVL

May 26, 2016

Make LIBOMP_USE_ITT_NOTIFY a setting that can be enabled or disabled · 7abf9d59

Jonathan Peyton authored May 26, 2016

On Blue Gene/Q, having LIBOMP_USE_ITT_NOTIFY support compiled into a
statically-linked binary causes a failure at runtime because dlopen fails.
This patch changes LIBOMP_USE_ITT_NOTIFY to a cacheable configuration setting
that can be disabled.

Patch by John Mellor-Crummey

Differential Revision: http://reviews.llvm.org/D20517

llvm-svn: 270884


Add a test case for microtask dispatch with many arguments · 0a665a83
Hal Finkel authored May 26, 2016
```
This is a cleaned-up version of the test case posted in the D19879 review.

llvm-svn: 270867
```

Add an assembly __kmp_invoke_microtask for ppc64[le] · 91e19a3d

Hal Finkel authored May 26, 2016

Clang no longer restricts itself to generating microtasks with a small number
of arguments, and so an assembly implementation is required to prevent hitting
the parameter limit present in the C implementation. This adds an
implementation for ppc64[le].

llvm-svn: 270821


May 25, 2016
- D20525: Use more general function for getting gtid which may be faster than specific one. · 2fd16542
  Andrey Churbanov authored May 25, 2016
```
llvm-svn: 270694
```
May 23, 2016

Fork performance improvements · b044e4fa

Jonathan Peyton authored May 23, 2016

Most of these changes check for differences before updating data
fields in the team struct. There is also some rearrangement of the team struct.
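
The idea of checking before writing can be sketched in C as follows. This is a minimal illustration, not the patch itself: `team_sketch_t`, `CHECK_UPDATE`, and `setup_team` are hypothetical names, and the assumed benefit is that skipping a redundant store avoids dirtying a cache line that other threads may be reading.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a few team fields; not the real team struct. */
typedef struct {
    int nproc;
    int level;
    void *pkfn;
} team_sketch_t;

/* Store `value` into `field` only if it differs.
 * Evaluates to 1 when a write happened and 0 otherwise. */
#define CHECK_UPDATE(field, value) \
    ((field) != (value) ? ((field) = (value), 1) : 0)

/* Re-initialize a reused team and return how many fields were written. */
static int setup_team(team_sketch_t *team, int nproc, int level, void *pkfn) {
    assert(team != NULL);
    int writes = 0;
    writes += CHECK_UPDATE(team->nproc, nproc);
    writes += CHECK_UPDATE(team->level, level);
    writes += CHECK_UPDATE(team->pkfn, pkfn);
    return writes;
}
```

When the same team is forked again with unchanged settings, `setup_team` in this sketch performs no stores at all.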

Patch by Diego Caballero

Differential Revision: http://reviews.llvm.org/D20487

llvm-svn: 270468


Allow unit testing on Windows · 1ab887d4

Jonathan Peyton authored May 23, 2016

These changes allow testing on Windows using clang.exe.
There are two main changes:
1. Only link to -lm when it actually exists on the system
2. Create basic versions of pthread_create() and pthread_join() for Windows.
   They are not POSIX compliant by any stretch but will allow any existing
   and future tests to use pthread_create() and pthread_join() for testing
   interactions of libomp with os threads.

Differential Revision: http://reviews.llvm.org/D20391

llvm-svn: 270464


Changed parameter names in Fortran modules to correspond with OpenMP 4.5 specification · b2b6d4e2
Jonathan Peyton authored May 23, 2016
```
llvm-svn: 270447
```

May 20, 2016
- Remove trailing whitespace in src/ directory · 61118491
  Jonathan Peyton authored May 20, 2016
```
This patch doesn't affect D19878's context.  So D19878 still cleanly applies.

llvm-svn: 270252
```
May 18, 2016
- Remove unnecessary unistd.h header from tests. · aa7d2d78
  Jonathan Peyton authored May 18, 2016
```
llvm-svn: 269987
```
May 17, 2016
- Remove trailing whitespace in files in doc/ directory · 096ccdd3
  Jonathan Peyton authored May 17, 2016
```
llvm-svn: 269842
```
- Remove trailing whitespace from tests · 37310769
  Jonathan Peyton authored May 17, 2016
```
llvm-svn: 269841
```
- Remove trailing whitespace in files in tools/ directory · 0c3a85a3
  Jonathan Peyton authored May 17, 2016
```
llvm-svn: 269837
```
- Remove trailing whitespace in CMake files · 975dabc9
  Jonathan Peyton authored May 17, 2016
```
llvm-svn: 269836
```
- Remove trailing whitespace in READMEs, CREDITS.txt and index.html · 924a6627
  Jonathan Peyton authored May 17, 2016
```
llvm-svn: 269835
```
- [OpenMP Testing] Have lit.py be a valid lit executable · 0e8f0530
  Jonathan Peyton authored May 17, 2016
```
Users can use either llvm-lit (generated during llvm build) or lit.py which
exists in llvm/utils/lit.

llvm-svn: 269774
```
May 16, 2016

Clean all the mess around KMP_USE_FUTEX and kmp_lock.h · fb043fdf

Paul Osmialowski authored May 16, 2016

The KMP_USE_FUTEX preprocessor definition from kmp_lock.h is used
inconsistently throughout the LLVM libomp code.

* some .c files that use this define do not include kmp_lock.h,
  so the guarded parts of the code are never compiled
* some places in the code use architecture-dependent preprocessor
  logic expressions which effectively disable use of Futex for
  the AArch64 architecture; all these places should use
  '#if KMP_USE_FUTEX' instead to avoid further confusion
* some places use KMP_HAS_FUTEX, which is not defined anywhere;
  KMP_USE_FUTEX should be used instead
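
The reason such mistakes go unnoticed is that an undefined identifier inside `#if` silently evaluates to 0. The sketch below shows this under stated assumptions: the `#define` stands in for what kmp_lock.h would provide, and both function names are hypothetical.

```c
/* Hypothetical stand-in for the definition that kmp_lock.h would provide. */
#define KMP_USE_FUTEX 1

/* KMP_HAS_FUTEX is deliberately never defined in this file. */

/* Guarded by a macro that is not defined: the futex branch is dropped
 * without any diagnostic, because the identifier evaluates to 0 in #if. */
static int futex_path_with_undefined_macro(void) {
#if KMP_HAS_FUTEX
    return 1;
#else
    return 0;
#endif
}

/* Guarded by the macro that is actually defined: the futex branch is kept. */
static int futex_path_with_defined_macro(void) {
#if KMP_USE_FUTEX
    return 1;
#else
    return 0;
#endif
}
```

The same silent fallback happens in a file that tests KMP_USE_FUTEX without including the header that defines it. GCC and Clang can flag such cases with `-Wundef`.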

Differential Revision: http://reviews.llvm.org/D19629

llvm-svn: 269642


May 13, 2016

NFC fix indent (relates to my previous commit) · 97ae10c6
Paul Osmialowski authored May 13, 2016
```
llvm-svn: 269443
```

Solve 'Too many args to microtask' problem · 7e5e8684

Paul Osmialowski authored May 13, 2016

This patch solves the 'Too many args to microtask' problem which occurs
while executing the lulesh2.0.3 benchmark on AArch64.

To solve this I had to write an AArch64 assembly version of the
__kmp_invoke_microtask() function, similar to the x86 and x86_64
implementations.

Differential Revision: http://reviews.llvm.org/D19879

llvm-svn: 269399


Adding new kmp_aligned_malloc() entry point · f83ae31c

Jonathan Peyton authored May 12, 2016

This change adds a new entry point,
kmp_aligned_malloc(size_t size, size_t alignment), which corresponds
to kmp_malloc() but can also return aligned memory.
Other allocator routines have been adjusted so that kmp_free() can be used for
freeing memory blocks allocated by any kmp_*alloc() routine, including the new
kmp_aligned_malloc() routine.
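
One common way to let a single free routine release both plain and aligned blocks is to store the address returned by the underlying allocator immediately before the pointer handed to the caller. The sketch below illustrates that technique on top of `malloc()`. It is not libomp's implementation, and the `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Return `size` bytes aligned to `alignment` (a nonzero power of two),
 * or NULL on bad arguments or allocation failure. */
static void *sketch_aligned_malloc(size_t size, size_t alignment) {
    if (alignment == 0 || (alignment & (alignment - 1)) != 0)
        return NULL;
    if (alignment > SIZE_MAX - sizeof(void *) ||
        size > SIZE_MAX - sizeof(void *) - alignment)
        return NULL; /* the padded size would overflow */

    /* Room for the payload, worst-case padding, and the saved raw pointer. */
    void *raw = malloc(size + alignment + sizeof(void *));
    if (raw == NULL)
        return NULL;

    /* Skip one pointer slot, then round up to the requested alignment. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);

    /* Remember the raw block just below the user pointer. */
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* The plain allocator uses the same layout, so one free routine fits both. */
static void *sketch_malloc(size_t size) {
    return sketch_aligned_malloc(size, sizeof(void *));
}

/* Release a block obtained from either allocator above. */
static void sketch_free(void *ptr) {
    if (ptr != NULL)
        free(((void **)ptr)[-1]);
}
```

Because every block carries the same hidden slot, `sketch_free` does not need to know which allocator produced the pointer.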

Differential Revision: http://reviews.llvm.org/D19814

llvm-svn: 269365


May 12, 2016

Fix team reuse with foreign threads · 2b749b33

Jonathan Peyton authored May 12, 2016

After hot teams were enabled by default, the library started using levels kept
in the team structure. The levels break when a foreign thread exits and
puts its team into the pool, which is then re-used by another foreign thread.
The broken behavior shows up when printing the levels for each new team: one
gets 1, 2, 1, 2, 1, 2, etc. This makes the library believe that every other
team is nested, which is incorrect. The levels should be
1, 1, 1, etc.

Differential Revision: http://reviews.llvm.org/D19980

llvm-svn: 269363


New hwloc API compatibility · 562a3c2b
Paul Osmialowski authored May 12, 2016
```
Differential Revision: http://reviews.llvm.org/D19628

llvm-svn: 269284
```

Restore NULL flag check in __kmp_null_resume_wrapper · 55acbf88

Hal Finkel authored May 12, 2016

This reverts a presumably unintentional change in:

  r268640 - [STATS] Use partitioned timer scheme

and fixes segfaults in an x86_64 debug build of the runtime library.

llvm-svn: 269259


May 07, 2016

Fine tuning of TC* macros · 52bef53f

Paul Osmialowski authored May 07, 2016

This patch introduces the following:
* TCI_* and TCD_* macros for increment and decrement
* Fix for invalid use of TCR_8 in one expression

Differential Revision: http://reviews.llvm.org/D19880

llvm-svn: 268826


May 05, 2016

[STATS] Use partitioned timer scheme · 11dc82fa

Jonathan Peyton authored May 05, 2016

This change replaces the current timers with ones that partition time properly.
The current timers are nested, so that if a new timer, B, starts when the
current timer, A, is already timing, A's time will include B's. To eliminate
this problem, the partitioned timers are designed to stop the current timer (A),
let the new timer run (B), and when the new timer is finished, restart the
previously running timer (A). With this partitioning of time, a thread's timers
all sum up to the OMP_worker_thread_life time and can now easily show the
percentage of time a thread is spending in different parts of the runtime or
user code.
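
The stop/run/restart scheme can be sketched as a small stack of timers. This is an illustration under stated assumptions, not the partitionedTimers class from the patch: the `part_timers_t` type and `pt_*` functions are hypothetical, and the clock value is passed in by the caller so the behavior is easy to follow.

```c
#include <assert.h>

enum { MAX_DEPTH = 8, NUM_TIMERS = 4 };

typedef struct {
    long total[NUM_TIMERS]; /* exclusive time accumulated per timer id */
    int stack[MAX_DEPTH];   /* paused timers, with the running one on top */
    int depth;              /* number of entries in `stack` */
    long last;              /* time at which the running timer (re)started */
} part_timers_t;

/* Pause the running timer, if any, and start timer `id` at time `now`. */
static void pt_push(part_timers_t *pt, int id, long now) {
    assert(pt->depth < MAX_DEPTH && id >= 0 && id < NUM_TIMERS);
    if (pt->depth > 0)
        pt->total[pt->stack[pt->depth - 1]] += now - pt->last;
    pt->stack[pt->depth++] = id;
    pt->last = now;
}

/* Stop the running timer at time `now` and resume the one beneath it. */
static void pt_pop(part_timers_t *pt, long now) {
    assert(pt->depth > 0);
    pt->depth--;
    pt->total[pt->stack[pt->depth]] += now - pt->last;
    pt->last = now;
}
```

With timer A pushed at time 0, timer B pushed at 10 and popped at 25, and A popped at 40, this sketch credits A with 25 and B with 15, so the totals add up to the 40 units that elapsed.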

There is also a new state variable associated with each thread which tells where
it is executing a task. This corresponds with the timers: OMP_task_*, e.g., if
time is spent in OMP_task_taskwait, then that thread executed tasks inside a
#pragma omp taskwait construct.

The changes mostly consist of switching the macros to the new PARTITIONED_*
macros, plus the new partitionedTimers class and its methods, and new state logic.

Differential Revision: http://reviews.llvm.org/D19229

llvm-svn: 268640


May 04, 2016
- NFC remove unneeded spaces (test commit) · fedce46b
  Paul Osmialowski authored May 03, 2016
```
llvm-svn: 268462
```
Apr 25, 2016

Remove architecture dependent Hwloc DEBUG section · 8407f5b3

Jonathan Peyton authored Apr 25, 2016

This debug section's functionality can be replicated using the environment
variable KMP_TOPOLOGY_METHOD with different values and KMP_AFFINITY=verbose.

llvm-svn: 267472


Fix buffer problem with printing long Hwloc affinity mask · 1d5487c5

Jonathan Peyton authored Apr 25, 2016

This change has the hwloc_bitmap_list_snprintf() function use the entire buffer
to print the mask.  There is no need to shorten the buffer length by 7.  It only
needs to be shortened by one byte.

llvm-svn: 267470


Apr 19, 2016
- [ITTNOTIFY] Remove serialized parallel regions from frame notification · a1202bf5
  Jonathan Peyton authored Apr 19, 2016
```
llvm-svn: 266760
```
Apr 18, 2016

Fix trip count calculation for parallel loops in runtime · 5235a1b6

Jonathan Peyton authored Apr 18, 2016

The trip count calculation was incorrect for loops with large bounds. For example,
for(int i=-2,000,000,000; i < 2,000,000,000; i+=50000000), the trip count
calculation had overflow (trying to calculate 2,000,000,000 + 2,000,000,000 with
signed integers) and wasn't giving the right value. This patch fixes this error
in the runtime by using unsigned integers instead. There is still a bug in the
clang compiler component because it warns that there is overflow in the
test case file when there isn't. This error isn't there for the Intel Compiler.
So for now, the test case is designated as XFAIL.
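
The effect of switching to unsigned arithmetic can be shown with a small sketch. It assumes a loop of the form `for (i = lb; i < ub; i += st)` with a positive stride. The function name `trip_count` is hypothetical and the code is not taken from the runtime.

```c
#include <assert.h>
#include <stdint.h>

/* Trip count of `for (i = lb; i < ub; i += st)` for st > 0. */
static uint32_t trip_count(int32_t lb, int32_t ub, int32_t st) {
    assert(st > 0);
    if (lb >= ub)
        return 0;
    /* Unsigned subtraction wraps modulo 2^32, so the span is exact even
     * when ub - lb would overflow a signed 32-bit integer. */
    uint32_t span = (uint32_t)ub - (uint32_t)lb;
    return (span - 1) / (uint32_t)st + 1;
}
```

For the loop in the commit message, the span is 4,000,000,000, which does not fit in a signed 32-bit integer but does fit in an unsigned one, and the sketch yields 80 iterations.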

Differential Revision: http://reviews.llvm.org/D19078

llvm-svn: 266677


Runtime support for untied tasks · e6643daa

Jonathan Peyton authored Apr 18, 2016

Introduced a counter of the parts of an untied task that have been submitted
for execution. The counter tracks whether all parts of the task have finished.
The compiler is expected to generate the re-submission of a partially executed
untied task itself, before each task part exits, except for the lexically last part.

Differential Revision: http://reviews.llvm.org/D19026

llvm-svn: 266675


Fix for pthread_setspecific (TLS and shutdown) problem · f252010f

Jonathan Peyton authored Apr 18, 2016

Some codes that use TLS fail intermittently because one thread tries to write
TLS values after the TLS key has been destroyed by another thread. This happens
when one thread executes library shutdown (and destroys TLS keys), while another
thread starts to execute the TLS key destructor routine. Before this change, the
kmp_init_runtime flag was checked before calling pthread_* TLS functions, but
this flag is set to FALSE later than the destruction of the TLS keys, which
leads to failure. The fix is to check kmp_init_gtid instead, as this flag is
unset *before* the destruction of TLS keys.

Differential Revision: http://reviews.llvm.org/D19022

llvm-svn: 266674


[STATS] Remove timePair class and unused functions · e2289a42
Jonathan Peyton authored Apr 18, 2016
```
llvm-svn: 266634
```
[STATS] print Total_* stats on their own line · 53eca521
Jonathan Peyton authored Apr 18, 2016
```
llvm-svn: 266633
```

Apr 14, 2016

[ITTNOTIFY] Correct barrier imbalance time in case of tasks · 99ef4d04

Jonathan Peyton authored Apr 14, 2016

ittnotify fix for barrier imbalance time in case tasks exist. In the current
implementation, task execution time is included into aggregated time on a
barrier. This fix calculates task execution time and corrects the arrive time
by subtracting the task execution time.

Since __kmp_invoke_task() can be called not only at a barrier, the field
th.th_bar_arrive_time is used to check whether the function was called at the
barrier (th.th_bar_arrive_time != 0). For this check to work, th_bar_arrive_time
is set to zero right after the value is used at the barrier.

Differential Revision: http://reviews.llvm.org/D19030

llvm-svn: 266332


Exponential back off logic for test-and-set lock · 377aa40d

Jonathan Peyton authored Apr 14, 2016

This change adds back off logic in the test and set lock for better contended
lock performance. It uses a simple truncated binary exponential back off
function. The default back off parameters are tuned for x86.

The main back off logic has a two loop structure where each is controlled by a
user-level parameter:
max_backoff - limits the outer loop number of iterations.
    This parameter should be a power of 2.
min_ticks - the inner spin wait loop number of "ticks" which is system
    dependent and should be tuned for your system if you so choose.
    The "ticks" on x86 correspond to the time stamp counter,
    but on other architectures ticks is a timestamp derived
    from gettimeofday().

The user can modify these via the environment variable:
KMP_SPIN_BACKOFF_PARAMS=max_backoff[,min_ticks]
Currently, since the default user lock is a queuing lock,
one would have to also specify KMP_LOCK_KIND=tas to use the test-and-set locks.

Differential Revision: http://reviews.llvm.org/D19020

llvm-svn: 266329


Apr 12, 2016
- Add declarations of OpenMP 4.5 target/offload routines to headers · 2e379fc7
  Jonathan Peyton authored Apr 12, 2016
```
All these routines are implemented in the offload library.

llvm-svn: 266120
```
Apr 05, 2016
- [STATS] Remove trailing whitespace in stats source files · 072772bf
  Jonathan Peyton authored Apr 05, 2016
```
llvm-svn: 265437
```
Apr 04, 2016

OMP_WAIT_POLICY changes · 50e8f18b

Jonathan Peyton authored Apr 04, 2016

This change makes OMP_WAIT_POLICY=active mean that threads will busy-wait in
spin loops and virtually never go to sleep. OMP_WAIT_POLICY=passive now means
that threads will immediately go to sleep inside a spin loop. KMP_BLOCKTIME was
the previous mechanism to specify this behavior via KMP_BLOCKTIME=0 or
KMP_BLOCKTIME=infinite, but the standard OpenMP environment variable should
also be able to specify this behavior.
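
The correspondence between the two variables can be sketched as a mapping from the policy string to an effective block time. This is only an illustration of the description above: the function names and the `BLOCKTIME_INFINITE` sentinel are hypothetical, and the sketch ignores how the runtime resolves the case where both variables are set.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical sentinel meaning "never go to sleep". */
#define BLOCKTIME_INFINITE INT_MAX

/* Case-insensitive string equality. */
static int equals_ignore_case(const char *a, const char *b) {
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        ++a;
        ++b;
    }
    return *a == *b;
}

/* Map a wait policy to an effective block time. Returns `fallback`
 * when the policy is unset or not recognized. */
static int blocktime_for_wait_policy(const char *policy, int fallback) {
    assert(fallback >= 0);
    if (policy == NULL)
        return fallback;
    if (equals_ignore_case(policy, "active"))
        return BLOCKTIME_INFINITE; /* like KMP_BLOCKTIME=infinite */
    if (equals_ignore_case(policy, "passive"))
        return 0;                  /* like KMP_BLOCKTIME=0 */
    return fallback;
}
```

A caller could pass the result of `getenv("OMP_WAIT_POLICY")` as the first argument and its own default as the second.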

Differential Revision: http://reviews.llvm.org/D18577

llvm-svn: 265339


Mar 30, 2016

Fix bug when KMP_USE_ADAPTIVE_LOCKS is 0 · 1d46d979

Jonathan Peyton authored Mar 30, 2016

#endif was one line too low.  If KMP_USE_ADAPTIVE_LOCKS is 0,
then queuing locks would incorrectly use drdpa lock mechanism.
This is a fix for https://llvm.org/bugs/show_bug.cgi?id=26649

llvm-svn: 264934


Mar 29, 2016

Fix comment in kmp_wait_release.h · 4cfe93c5

Jonathan Peyton authored Mar 29, 2016

Removed reference to "ref ct" in a comment, as ref_ct no longer exists. Also
moved the comment to where the task_team is about to be tested for NULL.

llvm-svn: 264786
