Commits · df6818bea4f513a5c7adcd203905163d4052cb1b · Lorenzo Albano / LLVM bpEVL

Jun 14, 2016

Renaming change: 41 -> 45 and 4.1 -> 4.5 · df6818be

Jonathan Peyton authored Jun 14, 2016

OpenMP 4.1 is now OpenMP 4.5.  Any mention of 41 or 4.1 is replaced with
45 or 4.5.  Also, if the CMake option LIBOMP_OMP_VERSION is 41, CMake warns that
41 is deprecated and to use 45 instead.

llvm-svn: 272687

df6818be

Jun 13, 2016

Bug fix for Bugzilla bug 26602: Remove function bodies with KMP_ASSERT(0) · e1890e12

Jonathan Peyton authored Jun 13, 2016

Fix for Bugzilla bug https://llvm.org/bugs/show_bug.cgi?id=26602.  Removed
functions whose bodies consisted solely of a KMP_ASSERT(0) statement.  A possible
runtime crash is thus converted into a compile-time error, which is preferable
(the error is detected earlier).

TODO: consider C++11 static_assert as an alternative, which could
improve the diagnostics.

Patch by Andrey Churbanov

Differential Revision: http://reviews.llvm.org/D21304

llvm-svn: 272590

e1890e12

Affinity mask processing improvements · c5304aa3

Jonathan Peyton authored Jun 13, 2016

Remove static specifier from var fullMask and remove kmp_get_fullMask() routine.
When iterating through procs in a mask, always check if proc is in fullMask
(this check was missing in a few places).
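
A minimal sketch of the iteration pattern described above, not the runtime's own code. The `ProcMask` alias, the `kMaxProcs` width and the `valid_procs` helper are assumptions made for illustration; only the idea of testing each proc against the full mask comes from the commit message.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kMaxProcs = 1024;   // hypothetical mask width
using ProcMask = std::bitset<kMaxProcs>;  // hypothetical mask type

// Collect the proc ids set in `mask`, skipping every id that is not
// also set in `full_mask` (the procs that actually exist).
std::vector<std::size_t> valid_procs(const ProcMask& mask,
                                     const ProcMask& full_mask) {
    std::vector<std::size_t> out;
    for (std::size_t proc = 0; proc < kMaxProcs; ++proc) {
        if (!full_mask.test(proc)) continue;  // the check that must not be skipped
        if (mask.test(proc)) out.push_back(proc);
    }
    assert(out.size() <= full_mask.count());
    return out;
}
```

With a full mask of procs 0 to 3, a mask containing 1, 3 and 600 yields only 1 and 3.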

Patch by Brian Bliss.

Differential Revision: http://reviews.llvm.org/D21300

llvm-svn: 272589

c5304aa3

Exclude untied tasks from task stealing constraint · 8cb45c83

Jonathan Peyton authored Jun 13, 2016

If either current_task or new_task is untied then skip task scheduling
constraint checks, because untied tasks are not affected by the task
scheduling constraints.
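
The rule can be sketched as follows. This is an illustration under assumptions, not the runtime's implementation: the `Task` struct and both helper names are hypothetical, and the constraint is simplified to "a tied task may only be scheduled over a tied task it descends from".

```cpp
#include <cassert>

// Hypothetical task descriptor, far simpler than the runtime's own.
struct Task {
    const Task* parent;  // task that created this one, nullptr for a root
    bool tied;           // false for tasks created with the untied clause
};

// True if `ancestor` appears on the parent chain of `task`.
bool is_descendant(const Task* task, const Task* ancestor) {
    assert(task != nullptr && ancestor != nullptr);
    for (const Task* p = task->parent; p != nullptr; p = p->parent) {
        if (p == ancestor) return true;
    }
    return false;
}

// Decide whether `new_task` may be scheduled while `current_task` is
// suspended on this thread.
bool may_schedule(const Task& current_task, const Task& new_task) {
    // If either task is untied, the constraint check is skipped entirely.
    if (!current_task.tied || !new_task.tied) return true;
    // Simplified constraint: the new tied task must descend from the current one.
    return is_descendant(&new_task, &current_task);
}
```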

Differential Revision: http://reviews.llvm.org/D21196

llvm-svn: 272570

8cb45c83

Fix crash when libomp loaded/unloaded multiple times · 93495de2

Jonathan Peyton authored Jun 13, 2016

The problem scenario is the following:
A dynamic library, libfoo.so, depends on libomp.so (it creates a parallel region
and calls some OpenMP functions).  An application has a loop in which it
dynamically loads libfoo.so, calls a function from it, and unloads libfoo.so.
After several loop iterations the application crashes with a message about a
lack of resources:
OMP: Error #34: System unable to allocate necessary resources for OMP thread:

The problem is that pthread_kill() was not followed by pthread_join() when the
thread had already terminated. This patch fixes the problem for both worker and
monitor threads.
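
A small sketch of the pattern, assuming a POSIX system. The `worker` and `spawn_probe_join` names are hypothetical. The point is only that a joinable thread is joined whatever the probe returns, so its resources are released.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>
#include <signal.h>

static void* worker(void*) { return nullptr; }  // exits immediately

// Start a thread, probe it with signal 0, and always join it afterwards.
// Returns 0 on success, otherwise the error code of the failing call.
int spawn_probe_join() {
    pthread_t thread;
    int rc = pthread_create(&thread, nullptr, worker, nullptr);
    if (rc != 0) return rc;
    // Signal 0 sends nothing; it only checks whether the thread can be signalled.
    rc = pthread_kill(thread, 0);
    assert(rc == 0 || rc == ESRCH);
    // Join regardless of the probe result: a terminated but unjoined
    // thread still holds resources until pthread_join() is called.
    return pthread_join(thread, nullptr);
}
```

Calling this in a long loop should keep succeeding, whereas omitting the join would be expected to exhaust thread resources eventually.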

Differential Revision: http://reviews.llvm.org/D21200

llvm-svn: 272567

93495de2

Hwloc refactoring patch · 202a24dd

Jonathan Peyton authored Jun 13, 2016

These changes remove the hwloc_topology_ignore_type function which doesn't exist
in the hwloc 2.0 API. In the existing code, the topology extracted from hwloc
has the cache levels stripped out and then assumes the final stripped topology
follows the typical three-level topology: packages -> cores -> HW threads.
But the code is doing unclean manipulations to determine at what level those
resources are located and also assumes too much about what hwloc is detecting
(there could be intermediate levels in between socket and core for instance).
This new way of extracting the topology doesn't strip out any hardware objects
that hwloc detects. It does not assume the three-level topology and instead
searches for the relevant three levels within the topology for each bit of
information using hwloc interface functions. That is, the three-level topology
subset that our affinity code is interested in is extracted directly from the
hwloc topology tree.

For example, the new __kmp_hwloc_get_nobjs_under_obj function gives the user the
number of cores under a socket reliably without worrying if there are unexpected
objects between the socket object and core object in the hwloc topology
structure. Also, now that all topology information is kept, there are also
possibilities of using the caches/numa nodes to determine more sophisticated
affinity settings in the future.
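
The idea of counting objects under another object without assuming fixed levels can be sketched on a toy tree. This does not use the hwloc API; `Kind`, `Node` and `count_under` are hypothetical stand-ins chosen for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical object kinds; a real topology has more.
enum class Kind { Package, Cache, Core, Thread };

// Hypothetical topology node, not an hwloc object.
struct Node {
    Kind kind;
    std::vector<Node> children;
};

// Count the objects of kind `wanted` below `obj`, however many
// intermediate levels (caches, groups, ...) lie in between.
int count_under(const Node& obj, Kind wanted) {
    int n = 0;
    for (const Node& child : obj.children) {
        if (child.kind == wanted) {
            ++n;                              // found one, do not descend further
        } else {
            n += count_under(child, wanted);  // look through the intermediate level
        }
    }
    return n;
}
```

A package holding a cache with two cores plus one core attached directly reports three cores, regardless of the cache level in between.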

There is also some cleanup code added for the destruction of the
__kmp_hwloc_topology object.

Differential Revision: http://reviews.llvm.org/D21195

llvm-svn: 272565

202a24dd

Fix bitmask complement operation · 34c72c47

Jonathan Peyton authored Jun 13, 2016

The bitmask complement operation doesn't consider the max proc id which means
something like !{0} will be translated to {1,2,3,4,...,600,601,...,1023} on a
Linux system even though there aren't 600 processors on said system. This
change has the complement bitmask and-ed with the fullmask so that it will only
contain valid processors.
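
A minimal sketch of the corrected operation, with a hypothetical `ProcMask` type and a 1024-bit width assumed for illustration:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

constexpr std::size_t kMaxProcs = 1024;   // hypothetical mask width
using ProcMask = std::bitset<kMaxProcs>;  // hypothetical mask type

// Complement `mask`, then AND with `full_mask` so that only procs
// that exist on the system remain set.
ProcMask complement(const ProcMask& mask, const ProcMask& full_mask) {
    return ~mask & full_mask;
}
```

On a four-processor system, the complement of {0} is then {1,2,3} rather than {1,...,1023}.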

Differential Revision: http://reviews.llvm.org/D21245

llvm-svn: 272561

34c72c47

[STATS] Add stats gathering for taskloop construct · 5a299da5
Jonathan Peyton authored Jun 13, 2016
```
llvm-svn: 272560
```
5a299da5

Jun 09, 2016

Fix spelling in comment · b6f0f521
Jonathan Peyton authored Jun 09, 2016
```
llvm-svn: 272291
```
b6f0f521
Revert accidental commit to lit.cfg · 61fdddfd
Jonathan Peyton authored Jun 09, 2016
```
llvm-svn: 272287
```
61fdddfd

Refactor __kmp_execute_tasks_template function · c4c722ac

Jonathan Peyton authored Jun 09, 2016

Refactored __kmp_execute_tasks_template to shorten and remove code redundancy.
The original code for __kmp_execute_tasks_template was very redundant with
large sections of repeated code that needed to be kept consistent, and goto
statements that made the control flow difficult to discern. This refactoring
removes all gotos and redundancy.

Patch by Terry Wilmarth

Differential Revision: http://reviews.llvm.org/D20879

llvm-svn: 272286

c4c722ac

kmp_lock.h: Fix VS2013 build after r271324 · 5b89fbc8

Hans Wennborg authored Jun 09, 2016

MSVC doesn't allow std::atomic<>s in a union since they don't have a trivial
copy constructor. Replacing them with e.g. std::atomic_int works, but that
breaks the GCC build on Linux, because then calls to e.g. std::atomic_load_explicit
fail, as they expect a real std::atomic<> pointer.

Fixing this with an #ifdef to unbreak the build for now.

llvm-svn: 272271

5b89fbc8

Jun 01, 2016

Fine tuning of TC* macros - small followup · 9cc353e2

Paul Osmialowski authored Jun 01, 2016

Since I replaced the no-op TCR_4 with actual code, the compiler complained when building the debug build.
This patch moves the 'cast to int' to the correct place.

Extension to Differential Revision: http://reviews.llvm.org/D19880

llvm-svn: 271377

9cc353e2

May 31, 2016

Use C++11 atomics for ticket locks implementation · f7cc6aff

Paul Osmialowski authored May 31, 2016

This patch replaces the use of compiler builtin atomics with
C++11 atomics in the ticket lock implementation. Ticket locks
are used in critical places of the runtime, e.g. in the tasking
mechanism.

The main reason for this change is a problem with the
work-stealing function on the ARM architecture, which suffered
from a nasty race condition. It turned out that the root cause of
the problem lies in the way ticket locks are implemented. Changing
compiler builtins into C++11 atomics solves the problem.

Two assertions were added to kmp_tasking.c that are useful
for detecting early symptoms of something going wrong with
work stealing, which were among the possible outcomes of the
race condition.
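
For readers unfamiliar with the technique, here is a generic ticket lock written with C++11 atomics. It is a textbook sketch, not the libomp implementation; the class and member names are assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

class TicketLock {
public:
    // Take the next ticket and spin until it is being served.
    void lock() {
        const std::uint32_t my_ticket =
            next_ticket_.fetch_add(1, std::memory_order_relaxed);
        while (now_serving_.load(std::memory_order_acquire) != my_ticket) {
            // busy-wait; a real lock would yield or back off here
        }
    }

    // Succeeds only if nobody holds or waits for the lock.
    bool try_lock() {
        const std::uint32_t serving =
            now_serving_.load(std::memory_order_acquire);
        std::uint32_t expected = serving;
        return next_ticket_.compare_exchange_strong(
            expected, serving + 1,
            std::memory_order_acquire, std::memory_order_relaxed);
    }

    // Hand the lock to the holder of the next ticket.
    void unlock() {
        now_serving_.fetch_add(1, std::memory_order_release);
    }

private:
    std::atomic<std::uint32_t> next_ticket_{0};
    std::atomic<std::uint32_t> now_serving_{0};
};
```

The acquire load in `lock()` pairs with the release increment in `unlock()`, which is the kind of ordering guarantee that is easy to get wrong with hand-picked builtins on weakly ordered architectures.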

Differential Revision: http://reviews.llvm.org/D19878

llvm-svn: 271324

f7cc6aff

Addition of OpenMP 4.5 feature: schedule(simd:static) · ef734799

Jonathan Peyton authored May 31, 2016

This patch implements the new kmp_sch_static_balanced_chunked schedule kind that
the compiler will generate when it encounters schedule(simd: static). It just
adds the new constant and the new switch case in __kmp_for_static_init.
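
The commit message does not spell out how iterations are divided. The sketch below is only an assumption about what a "balanced chunked" split could look like: iterations are shared evenly and each thread's share is rounded up to a multiple of the SIMD width. `IterRange` and `balanced_chunked` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Half-open iteration range [begin, end).
struct IterRange {
    std::int64_t begin;
    std::int64_t end;
};

// Split iterations 0..trip_count-1 among `nth` threads so that every
// thread except possibly the last non-empty one gets a multiple of
// `simd_width` iterations.
IterRange balanced_chunked(std::int64_t trip_count, std::int64_t nth,
                           std::int64_t tid, std::int64_t simd_width) {
    assert(trip_count >= 0 && nth > 0 && simd_width > 0);
    assert(tid >= 0 && tid < nth);
    std::int64_t span = (trip_count + nth - 1) / nth;             // even share, rounded up
    span = ((span + simd_width - 1) / simd_width) * simd_width;   // round up to SIMD multiple
    const std::int64_t begin = std::min(span * tid, trip_count);
    const std::int64_t end = std::min(begin + span, trip_count);
    return IterRange{begin, end};
}
```

Under these assumptions, 100 iterations on 4 threads with a width of 8 give the ranges [0,32), [32,64), [64,96) and [96,100).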

Patch by Alex Duran.

Differential Revision: http://reviews.llvm.org/D20699

llvm-svn: 271320

ef734799

Avoid deadlock with COI · f4f96956

Jonathan Peyton authored May 31, 2016

When an asynchronous offload task is completed, COI calls the runtime to queue
a "destructor task".  When the task deques are full, a deadlock situation
arises where the OpenMP threads are inside but cannot progress because the COI
thread is stuck inside the runtime trying to find a slot in a deque.

This patch implements the solution where the task deques are doubled in size
when a task is being queued from a COI thread.
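
A sketch of a bounded deque that may double its storage on demand. It is illustrative only: `TaskDeque`, its `may_grow` flag and the use of `int` as the task type are assumptions, and the locking a real runtime needs is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class TaskDeque {
public:
    explicit TaskDeque(std::size_t capacity) : slots_(capacity) {
        assert(capacity > 0);
    }

    // Append a task. When the deque is full, either refuse (ordinary
    // callers) or double the storage (callers that must never block).
    bool push(int task, bool may_grow) {
        if (count_ == slots_.size()) {
            if (!may_grow) return false;
            grow();
        }
        slots_[(head_ + count_) % slots_.size()] = task;
        ++count_;
        return true;
    }

    // Remove the oldest task; returns false when the deque is empty.
    bool pop_front(int& task) {
        if (count_ == 0) return false;
        task = slots_[head_];
        head_ = (head_ + 1) % slots_.size();
        --count_;
        return true;
    }

    std::size_t capacity() const { return slots_.size(); }

private:
    // Double the capacity, copying the live entries in queue order.
    void grow() {
        std::vector<int> bigger(slots_.size() * 2);
        for (std::size_t i = 0; i < count_; ++i) {
            bigger[i] = slots_[(head_ + i) % slots_.size()];
        }
        slots_.swap(bigger);
        head_ = 0;
    }

    std::vector<int> slots_;
    std::size_t head_ = 0;
    std::size_t count_ = 0;
};
```

Letting only the special caller grow the deque keeps the usual bounded behavior for everyone else while guaranteeing that caller a slot.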

Differential Revision: http://reviews.llvm.org/D20733

llvm-svn: 271319

f4f96956

Offer API for setting number of loop dispatch buffers · 067325f9

Jonathan Peyton authored May 31, 2016

The problem is a lack of dispatch buffers when thousands of loops with nowait,
about 10 iterations each, are executed by hundreds of threads. We have only
7 built-in dispatch buffers, but dozens or hundreds of buffers are needed.

The problem can be fixed by setting KMP_MAX_DISP_BUF to a bigger value. To give
users the same ability, I changed the build-time control into a run-time one,
adding an API just in case.

This change adds an environment variable KMP_DISP_NUM_BUFFERS and a new API
function kmp_set_disp_num_buffers(int num_buffers).

The KMP_DISP_NUM_BUFFERS environment variable works only before serial
initialization, because during serial initialization we already allocate buffers
for the hot team, so it is too late to change the number of buffers later
(otherwise we would need to reallocate buffers for all teams, which sounds too
complicated). The kmp_set_defaults() routine does not work for this environment
variable, because it calls serial initialization before reading the parameter
string. So a new routine, kmp_set_disp_num_buffers(), is created so that it can
set our internal global variable before the library initialization. If both the
environment variable and the API are used, the environment variable wins.
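
The precedence rules above can be modeled in a few lines. This is a toy model under assumptions, not the library code: `RuntimeModel` and its members are hypothetical, the environment value is passed in as a string instead of being read from the process environment, and ignoring API calls made after initialization is inferred from the explanation above.

```cpp
#include <cassert>
#include <cstdlib>

class RuntimeModel {
public:
    // Stand-in for the API call: only honored before initialization.
    void set_disp_num_buffers(int num_buffers) {
        if (!initialized_ && num_buffers > 0) num_buffers_ = num_buffers;
    }

    // Stand-in for serial initialization. `env_value` is the content of
    // KMP_DISP_NUM_BUFFERS, or nullptr when the variable is unset.
    void initialize(const char* env_value) {
        if (initialized_) return;
        if (env_value != nullptr) {
            const int n = std::atoi(env_value);
            if (n > 0) num_buffers_ = n;  // the environment variable wins
        }
        initialized_ = true;              // buffers are allocated from here on
    }

    int num_buffers() const { return num_buffers_; }

private:
    int num_buffers_ = 7;       // built-in default mentioned above
    bool initialized_ = false;
};
```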

Differential Revision: http://reviews.llvm.org/D20697

llvm-svn: 271318

067325f9

May 27, 2016
- Fix storing the frame pointer for OMP-T during ppc64 microtask dispatch · 49bee007
  Hal Finkel authored May 27, 2016
```
Thanks to John Mellor-Crummey for reporting the omission.

llvm-svn: 271035
```
  49bee007
- Add missing OpenMP 4.5 device entries to stubs library. · 50eae7f8
  Jonathan Peyton authored May 27, 2016
```
llvm-svn: 271006
```
  50eae7f8
May 26, 2016

Fix for OMP_PROC_BIND=spread strategy · 7ba9baef

Jonathan Peyton authored May 26, 2016

The OMP_PROC_BIND=spread strategy fails to assign the master thread the
correct place partition after the first parallel region. Other threads in the
hot team will remember their place_partition, but the master's place partition
is restored to what it was before entering the parallel region. So when the hot
team is used for subsequent parallel regions, the master has lost this info.
This fix calls __kmp_partition_places to update only the master thread's place
partition in the spread case when there are no other changes to the hot team.

Patch by Terry Wilmarth

Differential Revision: http://reviews.llvm.org/D20539

llvm-svn: 270890

7ba9baef

Make LIBOMP_USE_ITT_NOTIFY a setting that can be enabled or disabled · 7abf9d59

Jonathan Peyton authored May 26, 2016

On Blue Gene/Q, having LIBOMP_USE_ITT_NOTIFY support compiled into a
statically-linked binary causes a failure at runtime because dlopen fails.
This patch changes LIBOMP_USE_ITT_NOTIFY to a cacheable configuration setting
that can be disabled.

Patch by John Mellor-Crummey

Differential Revision: http://reviews.llvm.org/D20517

llvm-svn: 270884

7abf9d59

Add a test case for microtask dispatch with many arguments · 0a665a83
Hal Finkel authored May 26, 2016
```
This is a cleaned-up version of the test case posted in the D19879 review.

llvm-svn: 270867
```
0a665a83

Add an assembly __kmp_invoke_microtask for ppc64[le] · 91e19a3d

Hal Finkel authored May 26, 2016

Clang no longer restricts itself to generating microtasks with a small number
of arguments, and so an assembly implementation is required to prevent hitting
the parameter limit present in the C implementation. This adds an
implementation for ppc64[le].

llvm-svn: 270821

91e19a3d

May 25, 2016
- D20525: Use more general function for getting gtid which may be faster than specific one. · 2fd16542
  Andrey Churbanov authored May 25, 2016
```
llvm-svn: 270694
```
  2fd16542
May 23, 2016

Fork performance improvements · b044e4fa

Jonathan Peyton authored May 23, 2016

Most of these changes check for differences before updating data fields in the
team struct. There is also some rearrangement of the team struct.
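
The "check before update" pattern can be sketched as a small helper. The name `check_update` is hypothetical, and the stated motivation (avoiding stores to memory that other threads read) is an assumption, since the commit message does not give the reason.

```cpp
#include <cassert>

// Write `value` into `field` only if it differs. Returns true when a
// store actually happened. Skipping redundant stores presumably avoids
// needlessly invalidating cache lines that other threads are reading.
template <typename T>
bool check_update(T& field, const T& value) {
    if (field == value) return false;
    field = value;
    return true;
}
```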

Patch by Diego Caballero

Differential Revision: http://reviews.llvm.org/D20487

llvm-svn: 270468

b044e4fa

Allow unit testing on Windows · 1ab887d4

Jonathan Peyton authored May 23, 2016

These changes allow testing on Windows using clang.exe.
There are two main changes:
1. Only link to -lm when it actually exists on the system
2. Create basic versions of pthread_create() and pthread_join() for Windows.
   They are not POSIX compliant by any stretch but will allow any existing
   and future tests to use pthread_create() and pthread_join() for testing
   interactions of libomp with os threads.

Differential Revision: http://reviews.llvm.org/D20391

llvm-svn: 270464

1ab887d4

Changed parameter names in Fortran modules to correspond with OpenMP 4.5 specification · b2b6d4e2
Jonathan Peyton authored May 23, 2016
```
llvm-svn: 270447
```
b2b6d4e2

May 20, 2016
- Remove trailing whitespace in src/ directory · 61118491
  Jonathan Peyton authored May 20, 2016
```
This patch doesn't affect D19878's context.  So D19878 still cleanly applies.

llvm-svn: 270252
```
  61118491
May 18, 2016
- Remove unnecessary unistd.h header from tests. · aa7d2d78
  Jonathan Peyton authored May 18, 2016
```
llvm-svn: 269987
```
  aa7d2d78
May 17, 2016
- Remove trailing whitespace in files in doc/ directory · 096ccdd3
  Jonathan Peyton authored May 17, 2016
```
llvm-svn: 269842
```
  096ccdd3
- Remove trailing whitespace from tests · 37310769
  Jonathan Peyton authored May 17, 2016
```
llvm-svn: 269841
```
  37310769
- Remove trailing whitespace in files in tools/ directory · 0c3a85a3
  Jonathan Peyton authored May 17, 2016
```
llvm-svn: 269837
```
  0c3a85a3
- Remove trailing whitespace in CMake files · 975dabc9
  Jonathan Peyton authored May 17, 2016
```
llvm-svn: 269836
```
  975dabc9
- Remove trailing whitespace in READMEs, CREDITS.txt and index.html · 924a6627
  Jonathan Peyton authored May 17, 2016
```
llvm-svn: 269835
```
  924a6627
- [OpenMP Testing] Have lit.py be a valid lit executable · 0e8f0530
  Jonathan Peyton authored May 17, 2016
```
Users can use either llvm-lit (generated during llvm build) or lit.py which
exists in llvm/utils/lit.

llvm-svn: 269774
```
  0e8f0530
May 16, 2016

Clean all the mess around KMP_USE_FUTEX and kmp_lock.h · fb043fdf

Paul Osmialowski authored May 16, 2016

The KMP_USE_FUTEX preprocessor definition defined in kmp_lock.h is used
inconsistently throughout the LLVM libomp code.

* some .c files that use this define do not include kmp_lock.h,
  so in effect the guarded parts of the code are never compiled
* some places in the code use architecture-dependent preprocessor
  logic expressions which effectively disable the use of futex for
  the AArch64 architecture; all these places should use
  '#if KMP_USE_FUTEX' instead to avoid any further confusion
* some places use KMP_HAS_FUTEX, which is defined nowhere;
  KMP_USE_FUTEX should be used instead

Differential Revision: http://reviews.llvm.org/D19629

llvm-svn: 269642

fb043fdf

May 13, 2016

NFC fix indent (relates to my previous commit) · 97ae10c6
Paul Osmialowski authored May 13, 2016
```
llvm-svn: 269443
```
97ae10c6

Solve 'Too many args to microtask' problem · 7e5e8684

Paul Osmialowski authored May 13, 2016

This patch solves the 'Too many args to microtask' problem which occurs
while executing the lulesh2.0.3 benchmark on AArch64.

To solve this I had to write an AArch64 assembly version of the
__kmp_invoke_microtask() function, similar to the x86 and x86_64
implementations.

Differential Revision: http://reviews.llvm.org/D19879

llvm-svn: 269399

7e5e8684

Adding new kmp_aligned_malloc() entry point · f83ae31c

Jonathan Peyton authored May 12, 2016

This change adds a new entry point,
kmp_aligned_malloc(size_t size, size_t alignment), an entry point corresponding
to kmp_malloc() but with the capability to return aligned memory as well.
Other allocator routines have been adjusted so that kmp_free() can be used for
freeing memory blocks allocated by any kmp_*alloc() routine, including the new
kmp_aligned_malloc() routine.
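
One common way to let a single free routine release both plain and aligned blocks is to store the original pointer in a small header just before the returned address. The sketch below shows that technique under assumptions; the `sketch_*` names and `BlockHeader` are hypothetical, and the commit message does not say that libomp's allocator works this way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical header stored immediately before every returned block.
struct BlockHeader {
    void* raw;  // pointer originally returned by std::malloc
};

// Allocate `size` bytes aligned to `alignment` (a power of two).
void* sketch_aligned_malloc(std::size_t size, std::size_t alignment) {
    if (alignment == 0 || (alignment & (alignment - 1)) != 0) return nullptr;
    if (alignment > (SIZE_MAX >> 1)) return nullptr;
    if (alignment < alignof(BlockHeader)) alignment = alignof(BlockHeader);
    const std::size_t extra = alignment + sizeof(BlockHeader);
    if (size > SIZE_MAX - extra) return nullptr;  // would overflow
    void* raw = std::malloc(size + extra);
    if (raw == nullptr) return nullptr;
    // Leave room for the header, then round up to the requested alignment.
    const std::uintptr_t start =
        reinterpret_cast<std::uintptr_t>(raw) + sizeof(BlockHeader);
    const std::uintptr_t aligned =
        (start + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
    BlockHeader* header = reinterpret_cast<BlockHeader*>(aligned) - 1;
    header->raw = raw;
    assert(aligned % alignment == 0);
    return reinterpret_cast<void*>(aligned);
}

// Plain allocation goes through the same path, so one free handles both.
void* sketch_malloc(std::size_t size) {
    return sketch_aligned_malloc(size, alignof(std::max_align_t));
}

// Release a block obtained from either routine above.
void sketch_free(void* block) {
    if (block == nullptr) return;
    BlockHeader* header = static_cast<BlockHeader*>(block) - 1;
    std::free(header->raw);
}
```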

Differential Revision: http://reviews.llvm.org/D19814

llvm-svn: 269365

f83ae31c

May 12, 2016

Fix team reuse with foreign threads · 2b749b33

Jonathan Peyton authored May 12, 2016

After hot teams were enabled by default, the library started using levels kept
in the team structure. The levels are broken when a foreign thread exits and
puts its team into the pool, which is then re-used by another foreign thread.
The broken behavior observed is that, when printing the levels for each new
team, one gets 1, 2, 1, 2, 1, 2, etc. This makes the library believe that every
other team is nested, which is incorrect. What is wanted is for the levels to be
1, 1, 1, etc.

Differential Revision: http://reviews.llvm.org/D19980

llvm-svn: 269363

2b749b33