  1. Aug 02, 2018
  2. Aug 01, 2018
    • [OMPT,tests] Fix taskloop testcase scheduling effects · 935399d2
      Joachim Protze authored
      The taskloop testcase had scheduling effects. Tasks of the taskloop would
      sometimes be scheduled before all tasks were created. The testing is now
      split into two phases. First, the task creation on the master is tested,
      then the scheduling events of the tasks are tested. Thus, the order of
      creation and scheduling events is irrelevant.
      
      Patch by Simon Convent
      
      Reviewed by: protze.joachim, Hahnfeld
      
      Subscribers: openmp-commits
      
      Differential Revision: https://reviews.llvm.org/D50140
      
      llvm-svn: 338580
    • [test] Convert test for PR36720 to c89 · 51fc3cc6
      Jonas Hahnfeld authored
      GCC 4.8.5 defaults to this old C standard. I think we should make the
      tests pass a newer -std=c99|c11 but that's too intrusive for now...
      
      Differential Revision: https://reviews.llvm.org/D50084
      
      llvm-svn: 338490
  3. Jul 30, 2018
    • [OpenMP] Fix tasking + parallel bug · 28226e7d
      Jonathan Peyton authored
      From the bug report, the runtime needs to initialize the nproc variables
      (inside middle init) for each root when the task is encountered; otherwise,
      a segfault can occur.
      
      Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=36720
      
      Differential Revision: https://reviews.llvm.org/D49996
      
      llvm-svn: 338313
    • [OpenMP] Fix new task creation · f729df82
      Gheorghe-Teodor Bercea authored
      Summary:
      When OMPT is not supported, the __kmp_omp_task() function is passed the parameters in the wrong order. This is a fix related to patch D47709.

      Reviewers: Hahnfeld, sconvent, caomhin, jlpeyton
      
      Reviewed By: Hahnfeld
      
      Subscribers: guansong, openmp-commits
      
      Differential Revision: https://reviews.llvm.org/D50001
      
      llvm-svn: 338295
    • [CMake] Disable -Wstringop-overflow · f985f981
      Jonas Hahnfeld authored
      GCC 8 produces false positives with this:
      In file included from <openmp>/src/runtime/src/kmp_os.h:950,
                       from <openmp>/src/runtime/src/kmp.h:78,
                       from <openmp>/src/runtime/src/kmp_environment.cpp:54:
      <openmp>/src/runtime/src/kmp_environment.cpp: In function ‘char* __kmp_env_get(const char*)’:
      <openmp>/src/runtime/src/kmp_safe_c_api.h:52:50: warning: ‘char* strncpy(char*, const char*, size_t)’ specified bound depends on the length of the source argument [-Wstringop-overflow=]
       #define KMP_STRNCPY_S(dst, bsz, src, cnt) strncpy(dst, src, cnt)
                                                 ~~~~~~~^~~~~~~~~~~~~~~
      <openmp>/src/runtime/src/kmp_environment.cpp:97:5: note: in expansion of macro ‘KMP_STRNCPY_S’
           KMP_STRNCPY_S(result, len, value, len);
           ^~~~~~~~~~~~~
      <openmp>/src/runtime/src/kmp_environment.cpp:92:28: note: length computed here
           size_t len = KMP_STRLEN(value) + 1;
      
      This is stupid because result is allocated with KMP_INTERNAL_MALLOC(len),
      so the arguments are correct.
      
      Differential Revision: https://reviews.llvm.org/D49904
      
      llvm-svn: 338283
    • [OpenMP] Add GOMP version symbols for OMP_4.5 API · 284fab19
      Jonathan Peyton authored
      This patch adds the appropriate version symbols to the relevant API functions.
      
      Differential Revision: https://reviews.llvm.org/D49859
      
      llvm-svn: 338281
    • [OpenMP] Implement GOMP doacross compatibility · 369d72db
      Jonathan Peyton authored
      This change introduces GOMP doacross compatibility. There are 12 new interface
      functions, 6 for the long type and 6 for the unsigned long long type:
      GOMP_doacross_post, GOMP_doacross_wait, and GOMP_loop_doacross_[schedule]_start,
      where schedule can be static, dynamic, guided, or runtime.
      
      These functions just translate the parameters if necessary and send them
      to the corresponding kmp function.
      E.g., GOMP_doacross_post() -> __kmpc_doacross_post()
      
      For the GOMP_doacross_post function, there is a template specialization to
      account for when long is a four byte vs an eight byte type. If it is a
      four byte type, then a temporary array has to be created to convert the
      four byte integers into eight byte integers, which is then passed to
      __kmpc_doacross_post(). Because GOMP_doacross_wait uses varargs, it
      always needs a temporary array and does not need template specialization.
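
The widening step described above can be sketched roughly as follows. This is a minimal illustration, not the runtime's code: post_i64() is a hypothetical stand-in for an entry point that expects eight byte counts, and doacross_post() and g_last_posted are names invented for this example. The specialization is keyed on the exact type std::int64_t rather than on sizeof(long), which is a simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a runtime entry point that expects 64-bit
// iteration counts. It records what it receives so the effect is observable.
static std::vector<std::int64_t> g_last_posted;

static void post_i64(const std::int64_t *vec, std::size_t ndims) {
  assert(vec != nullptr || ndims == 0);
  g_last_posted.assign(vec, vec + ndims);
}

// Generic case: the caller's integer type is not std::int64_t (for example a
// four byte long), so each count is widened into a temporary array first.
template <typename T>
void doacross_post(const T *count, std::size_t ndims) {
  std::vector<std::int64_t> widened(count, count + ndims);
  post_i64(widened.data(), ndims);
}

// Specialization: the counts are already std::int64_t, so they are forwarded
// directly without a temporary copy.
template <>
void doacross_post<std::int64_t>(const std::int64_t *count, std::size_t ndims) {
  post_i64(count, ndims);
}
```

Calling doacross_post() with an array of 32-bit counts goes through the temporary array, while an array of std::int64_t is forwarded as is.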
      
      Differential Revision: https://reviews.llvm.org/D49857
      
      llvm-svn: 338280
    • [OpenMP] Fix build errors when building with KMP_DEBUG_ADAPTIVE_LOCKS=1 · 8692e142
      Jonathan Peyton authored
      This change fixes build errors when building a runtime with adaptive lock stats
      enabled. Most of the errors were due to the recent changes in the runtime, but
      it seems that we have not tried to build this debug runtime on Windows for a
      long time.
      
      Patch by Hansang Bae
      
      Differential Revision: https://reviews.llvm.org/D49823
      
      llvm-svn: 338277
    • [OpenMP][Stats] Cleanup stats gathering code · f0682ac4
      Jonathan Peyton authored
      1) Remove unnecessary data from list node structure
      2) Remove timerPair in favor of pushing/popping explicitTimers.
         This way, nested timers will work properly.
      3) Fix #pragma omp critical timers
      4) Add histogram capability
      5) Add KMP_STATS_FILE formatting capability
      6) Have time partitioned into serial & parallel by introducing
         partitionedTimers::exchange(). This also counts the number of serial regions
         in the executable.
      7) Fix up the timers around OMP loops so that scheduling overhead and work are
         both counted correctly.
      8) Fix up the iterations statistics so they count the number of iterations the
         thread receives at each loop scheduling event
      9) Change timers so there is only one RDTSC read per event change
      10) Fix up the outdated comments for the timers
      
      Differential Revision: https://reviews.llvm.org/D49699
      
      llvm-svn: 338276
  4. Jul 27, 2018
    • [OMPT] Fix OMPT callbacks for the taskloop construct and add testcase · cdaefac5
      Joachim Protze authored
      Fix the order of callbacks related to the taskloop construct.
      Add the iteration_count to work callbacks (according to the spec).
      Use kmpc_omp_task() instead of kmp_omp_task() to include OMPT callbacks.
      Add a testcase.
      
      Patch by Simon Convent
      
      Reviewed by: protze.joachim, hbae
      
      Subscribers: openmp-commits
      
      Differential Revision: https://reviews.llvm.org/D47709
      
      llvm-svn: 338146
    • [OMPT] Adapt OMPT callbacks for tasks to handle untied tasks correctly · 86ed6aa6
      Joachim Protze authored
      The ompt/tasks/task_types.c testcase did not test untied tasks properly. Now,
      frame addresses are tested and two scheduling points are added at which the
      task can switch to another thread. Due to scheduling effects, the frame address
      could be NULL.
      
      This required restructuring the way OMPT callbacks are called.
      __ompt_task_finish() now has an extra parameter, whether a task is completed.
      Its invocation has been moved into __kmp_task_finish(). Thus, the order of the
      writes to the frame addresses is not subject to scheduling effects anymore.
      
      Patch by Simon Convent
      
      Reviewed by: protze.joachim, hbae
      
      Subscribers: openmp-commits
      
      Differential Revision: https://reviews.llvm.org/D49181
      
      llvm-svn: 338145
    • [OMPT] Print two more addresses in print_fuzzy_address_block() · f203109e
      Joachim Protze authored
      The two additional outputs are needed to match the return addresses when using the
      Intel Compiler, as it generates more instructions between the fuzzy-printing
      of the address and the runtime call.
      
      Patch by Simon Convent
      
      Reviewed By: protze.joachim, hbae
      
      Differential Revision: https://reviews.llvm.org/D49373
      
      llvm-svn: 338144
  5. Jul 26, 2018
  6. Jul 25, 2018
  7. Jul 23, 2018
  8. Jul 19, 2018
    • Block library shutdown until unreaped threads finish spin-waiting · a764af68
      Jonathan Peyton authored
      This change fixes possibly invalid access to the internal data structure during
      library shutdown.  In a heavily oversubscribed situation, the library shutdown
      sequence can reach the point where resources are deallocated while there still
      exist threads in their final spinning loop.  The added loop in
      __kmp_internal_end() checks if there are such busy-waiting threads and blocks
      the shutdown sequence if that is the case. Two versions of kmp_wait_template()
      are now used to minimize performance impact.
      
      Patch by Hansang Bae
      
      Differential Revision: https://reviews.llvm.org/D49452
      
      llvm-svn: 337486
    • [OpenMP][libomptarget] New map interface: remove translation code and ensure proper alignment of struct members · a0da2468
      George Rokos authored
      
      This patch removes the translation code since this functionality is now implemented in the compiler.
      target_data_begin and target_data_end are also patched to handle some special cases that used to be
      handled by the obsolete translation function, namely ensuring proper alignment of struct members when
      we have partially mapped structs. Mapping a struct from a higher address (i.e. not from its beginning)
      can result in distortion of the alignment for some of its member fields. Padding restores the original
      (proper) alignment.
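
One way to picture the padding step is the sketch below. It is an illustration under stated assumptions, not the libomptarget implementation: MappedRange and pad_to_alignment() are hypothetical names, and the alignment value is passed in by the caller rather than taken from the runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical description of a host range that is about to be mapped.
struct MappedRange {
  std::uintptr_t begin; // address of the first mapped byte
  std::size_t size;     // number of mapped bytes
};

// If the mapping starts at an address that is not a multiple of `alignment`
// (because it starts at a member in the middle of the struct), move the start
// back to the previous aligned address and grow the size by the same amount.
// The members then keep their original offsets modulo the alignment.
MappedRange pad_to_alignment(std::uintptr_t begin, std::size_t size,
                             std::size_t alignment) {
  assert(alignment != 0);
  std::uintptr_t padding = begin % alignment;
  MappedRange padded;
  padded.begin = begin - padding;
  padded.size = size + static_cast<std::size_t>(padding);
  return padded;
}
```

For instance, a 12 byte range that starts at address 0x1004 with an 8 byte alignment becomes a 16 byte range that starts at 0x1000.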
      
      Differential revision: https://reviews.llvm.org/D44186
      
      llvm-svn: 337455
  9. Jul 18, 2018
    • [libomptarget] Also support several images for elf · bb869f42
      Joachim Protze authored
      Revision r336569 (D49036) fixed libomptarget support for multiple NVIDIA
      images when target regions reside in one or more libraries as well as in
      the compiled application. However, the issue is still present for ELF
      images.
      This fix adds support for multiple ELF images as well.
      
      Patch by Jannis Klinkenberg
      
      Reviewers: protze.joachim, ABataev, grokos
      
      Reviewed By: protze.joachim, ABataev, grokos
      
      Subscribers: openmp-commits
      
      Differential Revision: https://reviews.llvm.org/D49418
      
      llvm-svn: 337355
  10. Jul 15, 2018
  11. Jul 13, 2018
  12. Jul 12, 2018
    • [OPENMP, NVPTX] Fix loop boundaries calculation for dynamic loops. · c2c0138a
      Alexey Bataev authored
      Summary:
      This patch fixes the following problems.
      1. Removes unused functions from the omptarget_nvptx_ThreadPrivateContext
      class and simplifies its data members.
      2. Fixes the calculation of loop boundaries for dynamic loops with static
      scheduling.
      3. Introduces saving/restoring of the dynamic loop boundaries to support
      several nested parallel dynamic loops.
      
      Reviewers: grokos
      
      Subscribers: guansong, kkwli0, openmp-commits
      
      Differential Revision: https://reviews.llvm.org/D49241
      
      llvm-svn: 336915
  13. Jul 09, 2018
    • Fix const cast problem introduced in r336563 · dc73f512
      Jonathan Peyton authored
      r336563 eliminated the CCAST() macros, which caused build failures.
      
      llvm-svn: 336586
    • [OpenMP] Fix a few formatting issues · 61d44f18
      Jonathan Peyton authored
      llvm-svn: 336575
    • [OpenMP] Introduce hierarchical scheduling · f6399367
      Jonathan Peyton authored
      This patch introduces the logic implementing hierarchical scheduling.
      First and foremost, hierarchical scheduling is off by default.
      To enable it, use -DLIBOMP_USE_HIER_SCHED=On during CMake's configure stage.
      This work is based on the IWOMP paper:
      "Workstealing and Nested Parallelism in SMP Systems"
      
      Hierarchical scheduling is the layering of OpenMP schedules for different layers
      of the memory hierarchy. One can have multiple layers between the threads and
      the global iterations space. The threads will go up the hierarchy to grab
      iterations, using possibly a different schedule & chunk for each layer.
      
      [ Global iteration space (0-999) ]
      
      (use static)
      [ L1 | L1 | L1 | L1 ]
      
      (use dynamic,1)
      [ T0 T1 | T2 T3 | T4 T5 | T6 T7 ]
      
      In the example shown above, there are 8 threads and 4 L1 caches being targeted.
      If the topology indicates that there are two threads per core, then two
      consecutive threads will share the data of one L1 cache unit. This example
      would have the iteration space (0-999) split statically across the four L1
      caches (so the first L1 would get (0-249), the second would get (250-499), etc).
      Then the threads will use a dynamic,1 schedule to grab iterations from the L1
      cache units. There are currently four supported layers: L1, L2, L3, NUMA
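
The two layers in this example can be sketched as below. This is only an illustration of the idea, assuming a block-wise static split and a shared atomic counter for the dynamic,1 layer. static_block() and L1Unit are hypothetical names and are not taken from kmp_dispatch_hier.h.

```cpp
#include <atomic>
#include <cassert>
#include <utility>

// Static layer: give `unit` (0-based) a contiguous, near-equal block of the
// inclusive iteration range [lo, hi]. Returns the inclusive {first, last}.
std::pair<long, long> static_block(long lo, long hi, int nunits, int unit) {
  assert(nunits > 0 && unit >= 0 && unit < nunits && lo <= hi);
  long total = hi - lo + 1;
  long base = total / nunits;
  long extra = total % nunits; // the first `extra` units get one more iteration
  long first = lo + unit * base + (unit < extra ? unit : extra);
  long size = base + (unit < extra ? 1 : 0);
  return {first, first + size - 1};
}

// Dynamic,1 layer: threads sharing one L1 unit grab one iteration at a time
// from that unit's block through a shared atomic counter.
struct L1Unit {
  std::atomic<long> next;
  long last;

  L1Unit(long first_iter, long last_iter) : next(first_iter), last(last_iter) {}

  // Stores the grabbed iteration in `iter` and returns true, or returns false
  // once the block is exhausted.
  bool grab(long &iter) {
    long i = next.fetch_add(1, std::memory_order_relaxed);
    if (i > last)
      return false;
    iter = i;
    return true;
  }
};
```

With the range 0-999 and four units, static_block() yields 0-249 for the first unit and 250-499 for the second, matching the split described above. The two threads that share a unit would then call grab() on the same L1Unit until it returns false.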
      
      OMP_SCHEDULE can now read a hierarchical schedule with this syntax:
      OMP_SCHEDULE='EXPERIMENTAL LAYER,SCHED[,CHUNK][:LAYER,SCHED[,CHUNK]...]:SCHED,CHUNK'
      OMP_SCHEDULE can still read the normal SCHED,CHUNK syntax from before.
      
      I've kept most of the hierarchical scheduling logic inside kmp_dispatch_hier.h
      to try to keep it separate from the rest of the code.
      
      Differential Revision: https://reviews.llvm.org/D47962
      
      llvm-svn: 336571
    • [OPENMP, NVPTX] Support several images in the executable. · 2622e9e5
      Alexey Bataev authored
      Summary:
      Currently, the CUDA plugin supports loading only a single image, although
      an executable may contain several images if it has target regions inside
      a dynamically loaded library. This patch allows multiple images to be
      loaded.
      
      Reviewers: grokos
      
      Subscribers: guansong, openmp-commits, kkwli0
      
      Differential Revision: https://reviews.llvm.org/D49036
      
      llvm-svn: 336569
    • [OpenMP] Restructure loop code for hierarchical scheduling · 39ada854
      Jonathan Peyton authored
      This patch reorganizes the loop scheduling code in order to allow hierarchical
      scheduling to use it more effectively. In particular, the goal of this patch
      is to separate the algorithmic parts of the scheduling from the thread
      logistics code.
      
      Moves declarations & structures to kmp_dispatch.h for easier access in
      other files.  Extracts the algorithmic part of __kmp_dispatch_init() and
      __kmp_dispatch_next() into __kmp_dispatch_init_algorithm() and
      __kmp_dispatch_next_algorithm(). The thread bookkeeping logic is still kept in
      __kmp_dispatch_init() and __kmp_dispatch_next(). This is done because the
      hierarchical scheduler needs to access the scheduling logic without the
      bookkeeping logic.  To prepare for a new pointer in dispatch_private_info_t, a
      new flags variable is created which stores the ordered and nomerge flags instead
      of them being in two separate variables. This will keep the
      dispatch_private_info_t structure the same size.
      
      Differential Revision: https://reviews.llvm.org/D47961
      
      llvm-svn: 336568
    • [OpenMP] Use C++11 Atomics - barrier, tasking, and lock code · 37e2ef54
      Jonathan Peyton authored
      These are preliminary changes that attempt to use C++11 Atomics in the runtime.
      We are expecting better portability with this change across architectures/OSes.
      Here is the summary of the changes.
      
      Most variables that need synchronization operations were converted to generic
      atomic variables (std::atomic<T>). Variables that are updated with combined CAS
      are packed into a single atomic variable, and partial read/write is done
      through unpacking/packing.
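
The packing approach can be illustrated with the sketch below, which assumes two 32-bit fields stored in one std::atomic<std::uint64_t>. The Pair struct, its head/tail fields, and try_push() are hypothetical and chosen only for this example. They do not correspond to specific runtime variables.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Two hypothetical 32-bit fields that must always change together.
struct Pair {
  std::uint32_t head;
  std::uint32_t tail;
};

// Pack both fields into one 64-bit word: head in the high half, tail in the low.
inline std::uint64_t pack(Pair p) {
  return (static_cast<std::uint64_t>(p.head) << 32) | p.tail;
}

// Unpack a 64-bit word back into its two fields (the "partial read").
inline Pair unpack(std::uint64_t word) {
  Pair p;
  p.head = static_cast<std::uint32_t>(word >> 32);
  p.tail = static_cast<std::uint32_t>(word);
  return p;
}

// Increment tail only while fewer than `capacity` slots are in use
// (tail - head < capacity). The check and the update happen in one CAS on the
// packed word, so no other thread can change head in between.
bool try_push(std::atomic<std::uint64_t> &word, std::uint32_t capacity) {
  std::uint64_t old_word = word.load(std::memory_order_relaxed);
  for (;;) {
    Pair p = unpack(old_word);
    if (p.tail - p.head >= capacity)
      return false;
    ++p.tail;
    if (word.compare_exchange_weak(old_word, pack(p),
                                   std::memory_order_acq_rel,
                                   std::memory_order_relaxed))
      return true;
    // On failure old_word holds the current value, so the loop retries with it.
  }
}
```

A reader that only needs one field loads the whole word and unpacks it. A writer repacks both fields and publishes them with a single compare-and-swap.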
      
      Patch by Hansang Bae
      
      Differential Revision: https://reviews.llvm.org/D47903
      
      llvm-svn: 336563
  14. Jul 06, 2018
  15. Jul 05, 2018
  16. Jul 02, 2018
  17. Jun 29, 2018
  18. Jun 25, 2018
    • [OPENMP, NVPTX] Fixes for NVPTX RTL · 0ac29350
      Alexey Bataev authored
      Summary:
      Patch fixes several problems in the implementation of NVPTX RTL.
      1. Fixes detection of the last iteration for loops with static scheduling, no chunks.
      2. Fixes reductions for the serialized parallel constructs.
      3. Fixes handling of the barriers.
      
      Reviewers: grokos
      
      Reviewed By: grokos
      
      Subscribers: Hahnfeld, guansong, openmp-commits
      
      Differential Revision: https://reviews.llvm.org/D48480
      
      llvm-svn: 335469
  19. Jun 20, 2018
  20. Jun 19, 2018