  1. Jul 09, 2018
    • [OpenMP] Introduce hierarchical scheduling · f6399367
      Jonathan Peyton authored
      This patch introduces the logic implementing hierarchical scheduling.
      First and foremost, hierarchical scheduling is off by default.
      To enable, use -DLIBOMP_USE_HIER_SCHED=On during CMake's configure stage.
      This work is based on the IWOMP paper:
      "Workstealing and Nested Parallelism in SMP Systems"
      
      Hierarchical scheduling is the layering of OpenMP schedules for different layers
      of the memory hierarchy. One can have multiple layers between the threads and
      the global iteration space. The threads will go up the hierarchy to grab
      iterations, possibly using a different schedule and chunk size for each layer.
      
      [ Global iteration space (0-999) ]
      
      (use static)
      [ L1 | L1 | L1 | L1 ]
      
      (use dynamic,1)
      [ T0 T1 | T2 T3 | T4 T5 | T6 T7 ]
      
      In the example shown above, there are 8 threads and 4 L1 caches being targeted.
      If the topology indicates that there are two threads per core, then two
      consecutive threads will share the data of one L1 cache unit. This example
      would have the iteration space (0-999) split statically across the four L1
      caches (so the first L1 would get (0-249), the second would get (250-499), etc).
      Then the threads will use a dynamic,1 schedule to grab iterations from the L1
      cache units. There are currently four supported layers: L1, L2, L3, and NUMA.
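
The two-layer example above can be modelled in a few lines. This is a minimal sketch, not the code in kmp_dispatch_hier.h: `Range`, `split_static` and `grab_dynamic` are hypothetical names, and the even static split is an assumption that matches the (0-249), (250-499) blocks described above.

```cpp
#include <algorithm>
#include <atomic>
#include <vector>

// Half-open iteration range [lo, hi). Hypothetical helper type.
struct Range {
  long lo;
  long hi;
};

// Upper layer ("static"): split [0, total) into num_units contiguous blocks,
// one per layer unit (for example, one per L1 cache). Any remainder is spread
// over the first units; that remainder policy is an assumption of this sketch.
std::vector<Range> split_static(long total, int num_units) {
  std::vector<Range> out;
  const long base = total / num_units;
  const long extra = total % num_units;
  long lo = 0;
  for (int u = 0; u < num_units; ++u) {
    const long size = base + (u < extra ? 1 : 0);
    out.push_back(Range{lo, lo + size});
    lo += size;
  }
  return out;
}

// Lower layer ("dynamic,chunk"): threads sharing one unit atomically claim the
// next chunk from that unit's block. Returns false once the block is exhausted.
bool grab_dynamic(std::atomic<long> &next, long hi, long chunk, Range &got) {
  const long lo = next.fetch_add(chunk);
  if (lo >= hi)
    return false;
  got = Range{lo, std::min(lo + chunk, hi)};
  return true;
}
```

With `split_static(1000, 4)`, threads T0 and T1 would share a counter starting at 0 with limit 250, and each call to `grab_dynamic(..., 1, ...)` hands out a single iteration.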
      
      OMP_SCHEDULE can now read a hierarchical schedule with this syntax:
      OMP_SCHEDULE='EXPERIMENTAL LAYER,SCHED[,CHUNK][:LAYER,SCHED[,CHUNK]...]:SCHED,CHUNK'
      OMP_SCHEDULE can still read the normal SCHED,CHUNK syntax as before.
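
To make the syntax concrete, here is a rough sketch of how such a string could be taken apart. It is not the runtime's parser: `LayerSched`, `split` and `parse_hier_schedule` are hypothetical names, and the sample value `EXPERIMENTAL L1,static:dynamic,1` is constructed from the syntax and diagram above rather than quoted from the patch. The sketch assumes exact upper-case `EXPERIMENTAL `, no extra whitespace, and a numeric chunk.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One entry per layer. The final (thread-level) entry has an empty layer name.
struct LayerSched {
  std::string layer; // e.g. "L1", "L2", "L3", "NUMA", or "" for the threads
  std::string sched; // e.g. "static", "dynamic"
  int chunk;         // 0 when no chunk was given
};

// Split s on sep into its fields.
static std::vector<std::string> split(const std::string &s, char sep) {
  std::vector<std::string> parts;
  std::stringstream ss(s);
  std::string item;
  while (std::getline(ss, item, sep))
    parts.push_back(item);
  return parts;
}

// Returns an empty vector if the value is not in the hierarchical form.
// std::stoi throws on a non-numeric chunk; a real parser would report that.
std::vector<LayerSched> parse_hier_schedule(const std::string &value) {
  const std::string prefix = "EXPERIMENTAL ";
  std::vector<LayerSched> result;
  if (value.compare(0, prefix.size(), prefix) != 0)
    return result;
  const std::vector<std::string> segs = split(value.substr(prefix.size()), ':');
  for (std::size_t i = 0; i < segs.size(); ++i) {
    const std::vector<std::string> f = split(segs[i], ',');
    const bool last = (i + 1 == segs.size());
    LayerSched ls{"", "", 0};
    std::size_t k = 0;
    if (!last) { // every segment except the last starts with a layer name
      if (f.empty())
        return {};
      ls.layer = f[k++];
    }
    if (k >= f.size())
      return {};
    ls.sched = f[k++];
    if (k < f.size())
      ls.chunk = std::stoi(f[k]);
    result.push_back(ls);
  }
  return result;
}
```

A plain value such as `dynamic,1` yields an empty result here, which corresponds to falling back to the ordinary SCHED,CHUNK handling.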
      
      I've kept most of the hierarchical scheduling logic inside kmp_dispatch_hier.h
      to try to keep it separate from the rest of the code.
      
      Differential Revision: https://reviews.llvm.org/D47962
      
      llvm-svn: 336571
    • [OpenMP] Restructure loop code for hierarchical scheduling · 39ada854
      Jonathan Peyton authored
      This patch reorganizes the loop scheduling code in order to allow hierarchical
      scheduling to use it more effectively. In particular, the goal of this patch
      is to separate the algorithmic parts of the scheduling from the thread
      logistics code.
      
      Moves declarations & structures to kmp_dispatch.h for easier access in
      other files.  Extracts the algorithmic part of __kmp_dispatch_init() and
      __kmp_dispatch_next() into __kmp_dispatch_init_algorithm() and
      __kmp_dispatch_next_algorithm(). The thread bookkeeping logic is still kept in
      __kmp_dispatch_init() and __kmp_dispatch_next(). This is done because the
      hierarchical scheduler needs to access the scheduling logic without the
      bookkeeping logic.  To prepare for a new pointer in dispatch_private_info_t, a
      new flags variable is created which stores the ordered and nomerge flags instead
      of them being in two separate variables. This will keep the
      dispatch_private_info_t structure the same size.
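
The size argument can be illustrated with a toy layout. This is only a sketch under stated assumptions: `InfoBefore`, `InfoAfter` and `Flags` are hypothetical stand-ins, not the real dispatch_private_info_t, and the 32-bit field widths are assumed for illustration.

```cpp
#include <cstdint>

// Hypothetical "before" layout: one 32-bit integer per boolean flag.
struct InfoBefore {
  std::int32_t ordered;
  std::int32_t nomerge;
};

// Hypothetical "after" layout: both flags share a single 32-bit word.
struct Flags {
  std::uint32_t ordered : 1;
  std::uint32_t nomerge : 1;
  std::uint32_t unused : 30;
};

struct InfoAfter {
  Flags flags;
  std::int32_t freed; // the 32 bits that packing the flags made available
};

// Packing the flags frees space without growing the structure.
static_assert(sizeof(InfoAfter) == sizeof(InfoBefore),
              "packing the flags must not change the structure size");
```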
      
      Differential Revision: https://reviews.llvm.org/D47961
      
      llvm-svn: 336568
    • [OpenMP] Use C++11 Atomics - barrier, tasking, and lock code · 37e2ef54
      Jonathan Peyton authored
      These are preliminary changes that attempt to use C++11 Atomics in the runtime.
      We are expecting better portability with this change across architectures/OSes.
      Here is the summary of the changes.
      
      Most variables that need synchronization operations were converted to generic
      atomic variables (std::atomic<T>). Variables that are updated with combined CAS
      are packed into a single atomic variable, and partial read/write is done
      through unpacking/packing.
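
The packing technique described above can be sketched as follows. The names (`Pair`, `pack`, `unpack`, `load_tail`, `store_tail`, `cas_pair`) and the choice of two 32-bit fields are assumptions for illustration; the patch's actual variables and memory orderings may differ.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical pair of 32-bit fields that must be updated together.
struct Pair {
  std::uint32_t head;
  std::uint32_t tail;
};

// Pack both fields into one 64-bit word: head in the high half, tail in the low.
inline std::uint64_t pack(Pair p) {
  return (static_cast<std::uint64_t>(p.head) << 32) | p.tail;
}

inline Pair unpack(std::uint64_t v) {
  return Pair{static_cast<std::uint32_t>(v >> 32),
              static_cast<std::uint32_t>(v & 0xffffffffu)};
}

// Partial read: load the whole word, then unpack the field of interest.
inline std::uint32_t load_tail(const std::atomic<std::uint64_t> &a) {
  return unpack(a.load(std::memory_order_acquire)).tail;
}

// Partial write: a CAS loop replaces only tail and preserves whatever head
// holds at the moment the exchange succeeds.
inline void store_tail(std::atomic<std::uint64_t> &a, std::uint32_t tail) {
  std::uint64_t old = a.load(std::memory_order_relaxed);
  std::uint64_t desired;
  do {
    Pair p = unpack(old);
    p.tail = tail;
    desired = pack(p);
  } while (!a.compare_exchange_weak(old, desired, std::memory_order_acq_rel,
                                    std::memory_order_relaxed));
}

// Combined CAS: both fields change only if both still match the expected pair.
inline bool cas_pair(std::atomic<std::uint64_t> &a, Pair expected, Pair desired) {
  std::uint64_t e = pack(expected);
  return a.compare_exchange_strong(e, pack(desired));
}
```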
      
      Patch by Hansang Bae
      
      Differential Revision: https://reviews.llvm.org/D47903
      
      llvm-svn: 336563
  2. Jul 06, 2018
  3. Jul 05, 2018
  4. Jul 02, 2018
  5. Jun 20, 2018
  6. Jun 09, 2018
  7. May 28, 2018
  8. May 27, 2018
    • [OMPT] Fix test parallel/not_enough_threads.c · 3c6595d6
      Jonas Hahnfeld authored
      Upcoming changes to FileCheck will modify CHECK-DAG to not match
      overlapping regions of the input. This test was found to be affected
      because it expects to find four threads to invoke events of type
      ompt_event_implicit_task_begin. It turns out this is wrong because
      OMP_THREAD_LIMIT is set to 2, so there are only two threads. The
      rest of the test got it right so it went unnoticed until now.
      
      (Rewrite test and apply clang-format to it as discussed in the past.)
      
      Differential Revision: https://reviews.llvm.org/D47119
      
      llvm-svn: 333361
  9. May 25, 2018
  10. May 07, 2018
  11. Apr 30, 2018
  12. Apr 19, 2018
  13. Apr 18, 2018
    • [OpenMP] Fix affinity API for KMP_AFFINITY=none|compact|scatter · 1482db9e
      Jonathan Peyton authored
      Currently, the affinity API reports garbage for the initial place list and any
      thread's place lists when using KMP_AFFINITY=none|compact|scatter.
      This patch does two things:
      
      For KMP_AFFINITY=none, creates a one-entry table for the places. This way, the
      initial place list is just a single place with all the proc ids in it. We also
      set the initial place of any thread to 0 instead of KMP_PLACE_ALL so that the
      thread reports that single place (place 0) instead of garbage (-1) when using
      the affinity API.
      
      When non-OMP_PROC_BIND affinity is used
      (including KMP_AFFINITY=compact|scatter), a thread's place list is populated
      correctly. We assume that each thread is assigned to a single place. This is
      implemented in two of the affinity API functions.
      
      Differential Revision: https://reviews.llvm.org/D45527
      
      llvm-svn: 330283
    • Introduce GOMP_taskloop API · 27a677fc
      Jonathan Peyton authored
      This patch introduces GOMP_taskloop to our API. It adds GOMP_4.5 to our
      version symbols. Being a wrapper around __kmpc_taskloop, the function
      creates a task with the loop bounds properly nested in the shareds so that
      the GOMP task thunk will work properly. Also, the firstprivate copy constructors
      are properly handled using the __kmp_gomp_task_dup() auxiliary function.
      
      Currently, only linear spawning of tasks is supported
      for the GOMP_taskloop interface.
      
      Differential Revision: https://reviews.llvm.org/D45327
      
      llvm-svn: 330282
  14. Apr 12, 2018
  15. Mar 30, 2018
  16. Mar 26, 2018
    • Move blocktime_str variable right before its first use · ea82c769
      Jonathan Peyton authored
      llvm-svn: 328575
    • Add summarizeStats.py to tools directory · b6b79ac9
      Jonathan Peyton authored
      The summarizeStats.py script processes raw data provided by the
      instrumented (stats-gathering) OpenMP* runtime library. It provides:
      
      1) A radar chart which plots counters as frequency (per GigaTick) of use within
         the program. The frequencies are plotted as log10; however, values less than
         one are kept as is and shown in red. This was done to help
         visualize the differences better.
      2) Pie charts separating total time as compute and non-compute. The compute and
         non-compute times have their own pie charts showing the constructs that
         contributed to them. The percentages listed are with respect to the total
         time.
      3) '.csv' file with percentage of time spent within the different constructs.
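
The radar-chart scaling rule in item 1 amounts to the small function below. It is a C++ restatement of the rule as described, not an excerpt from summarizeStats.py; `RadarPoint` and `scale_for_radar` are hypothetical names.

```cpp
#include <cmath>

// A plotted value plus a marker telling the chart to draw it in red.
struct RadarPoint {
  double value;
  bool below_one;
};

// Frequencies of at least one are plotted as log10. Smaller values are kept
// unchanged and flagged, since their logarithm would be negative.
RadarPoint scale_for_radar(double freq) {
  if (freq < 1.0)
    return RadarPoint{freq, true};
  return RadarPoint{std::log10(freq), false};
}
```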
      
      The script can be used as:
      $ python $PATH_TO_SCRIPT/summarizeStats.py instrumented1.csv instrumented2.csv
      
      Patch by Taru Doodi
      
      Differential Revision: https://reviews.llvm.org/D41838
      
      llvm-svn: 328568
  17. Mar 22, 2018
  18. Mar 20, 2018
    • Read OMP_TARGET_OFFLOAD and provide API to access ICV · 78f977fc
      Jonathan Peyton authored
      Added settings code to read the OMP_TARGET_OFFLOAD environment variable. Added
      the target-offload-var ICV as __kmp_target_offload, set via OMP_TARGET_OFFLOAD,
      if available, otherwise defaulting to DEFAULT. Valid values for the ICV are
      specified as enum values {0,1,2} for disabled, default, and mandatory. An
      internal API access function __kmpc_get_target_offload is provided.
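
The mapping from the environment variable to the ICV value can be sketched as below. This is not the runtime's settings code: `TargetOffload`, `parse_target_offload` and `read_target_offload_env` are hypothetical names, exact upper-case matching is an assumption, and falling back to the default on an unrecognized value is also an assumption of this sketch.

```cpp
#include <cstdlib>
#include <string>

// ICV values as listed in the commit message: 0, 1, 2.
enum TargetOffload { kDisabled = 0, kDefault = 1, kMandatory = 2 };

// Map the variable's text to an ICV value. A null pointer means the variable
// is unset, in which case the ICV takes its default.
TargetOffload parse_target_offload(const char *value) {
  if (value == nullptr)
    return kDefault;
  const std::string v(value);
  if (v == "DISABLED")
    return kDisabled;
  if (v == "MANDATORY")
    return kMandatory;
  return kDefault; // "DEFAULT", or (by assumption) anything unrecognized
}

// Read the ICV from the process environment.
TargetOffload read_target_offload_env() {
  return parse_target_offload(std::getenv("OMP_TARGET_OFFLOAD"));
}
```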
      
      Patch by Terry Wilmarth
      
      Differential Revision: https://reviews.llvm.org/D44577
      
      llvm-svn: 328046
  19. Mar 19, 2018
  20. Mar 05, 2018
  21. Mar 01, 2018
  22. Feb 28, 2018
    • [OMPT] Fix ompt_get_task_info() and add tests for it · aa2022e7
      Joachim Protze authored
      The thread_num parameter of ompt_get_task_info() was not being used previously,
      but needs to be set.
      
      The print_task_type() function (from the task-types.c testcase) was merged into
      the print_ids() function (in callback.h). Testing of ompt_get_task_info() was
      added to the task-types.c testcase. It was not tested extensively previously.
      
      Differential Revision: https://reviews.llvm.org/D42472
      
      llvm-svn: 326338
    • [OMPT] Fix inconsistent testcases · 4df80bda
      Joachim Protze authored
      The main change of this patch is to insert {{.*}} in current_address=[[RETURN_ADDRESS_END]].
      This is needed to match any of the alternatively printed addresses.
      
      Additionally, clang-format is applied to the two tests.
      
      Differential Revision: https://reviews.llvm.org/D43115
      
      llvm-svn: 326312
  23. Feb 23, 2018
  24. Feb 17, 2018