  1. May 05, 2017
      Revert "[Polly] Added OpenCL Runtime to GPURuntime Library for GPGPU CodeGen" · c1267b9b
      Siddharth Bhat authored
      This reverts commit 17a84e414adb51ee375d14836d4c2a817b191933.
      
      Patches should have been submitted in the order of:
      
      1. D32852
      2. D32854
      3. D32431
      
      I mistakenly pushed D32431(3) first. Reverting to push in the correct
      order.
      
      llvm-svn: 302217
      [Polly] Added OpenCL Runtime to GPURuntime Library for GPGPU CodeGen · 51904ae3
      Siddharth Bhat authored
      Summary:
When compiling for GPU, one can now choose to compile for OpenCL or CUDA,
with the corresponding polly-gpu-runtime flag (libopencl / libcudart). The
GPURuntime library (GPUJIT) has been extended with the OpenCL Runtime library
for that purpose; the library calls matching the option chosen when compiling
are selected accordingly (via different initialization calls).
      
      Additionally, a specific GPU Target architecture can now be chosen with -polly-gpu-arch (only nvptx64 implemented thus far).
      
      Reviewers: grosser, bollu, Meinersbur, etherzhhb, singam-sanjay
      
      Reviewed By: grosser, Meinersbur
      
      Subscribers: singam-sanjay, llvm-commits, pollydev, nemanjai, mgorny, yaxunl, Anastasia
      
      Tags: #polly
      
      Differential Revision: https://reviews.llvm.org/D32431
      
      llvm-svn: 302215
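The backend selection described above can be sketched as a dispatch table keyed by the runtime flag. This is a minimal illustration of the idea, not GPUJIT's actual API; the function and dictionary names are hypothetical.

```python
# Hypothetical sketch: dispatch between CUDA and OpenCL initialization
# based on a flag, in the spirit of -polly-gpu-runtime=libcudart/libopencl.
def init_cuda_runtime():
    # Stand-in for a CUDA-specific initialization call.
    return {"backend": "cuda", "init_call": "initCUDADevice"}

def init_opencl_runtime():
    # Stand-in for an OpenCL-specific initialization call.
    return {"backend": "opencl", "init_call": "initOpenCLDevice"}

RUNTIME_INITIALIZERS = {
    "libcudart": init_cuda_runtime,
    "libopencl": init_opencl_runtime,
}

def init_gpu_runtime(runtime_flag):
    """Pick the initialization call matching the chosen runtime flag."""
    if runtime_flag not in RUNTIME_INITIALIZERS:
        raise ValueError(f"unknown GPU runtime: {runtime_flag}")
    return RUNTIME_INITIALIZERS[runtime_flag]()
```

The point of the table is that all backend-specific behavior is confined to the initialization functions, so the rest of the runtime stays backend-agnostic.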
  2. May 03, 2017
  3. Apr 28, 2017
[Polly] [PPCGCodeGeneration] Add managed memory support to GPU code generation · abed4969
Siddharth Bhat authored
      
      This needs changes to GPURuntime to expose synchronization between host
      and device.
      
1. Needs better function naming; I want a better name than
"getOrCreateManagedDeviceArray".

2. DeviceAllocations is used by both the managed memory and the
non-managed memory path. This exploits the fact that the two code paths
are never run together. I'm not sure if this is the best design decision.
      
      Reviewed by: PhilippSchaad
      
      Tags: #polly
      
      Differential Revision: https://reviews.llvm.org/D32215
      
      llvm-svn: 301640
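The "getOrCreateManagedDeviceArray" idea can be sketched as a cache that hands out one shared allocation per array. This is an illustrative sketch only; a bytearray stands in for a cudaMallocManaged buffer, and the class and method names are hypothetical, not Polly's implementation.

```python
# Illustrative "get or create" cache for managed device arrays.
# Managed memory means host and device share one allocation, so the
# cache must return the same buffer on every lookup of the same array.
class ManagedArrayCache:
    def __init__(self):
        self._device_allocations = {}

    def get_or_create_managed_device_array(self, name, size):
        # Allocate only on first use; reuse afterwards so host and
        # device always observe the same buffer.
        if name not in self._device_allocations:
            self._device_allocations[name] = bytearray(size)
        return self._device_allocations[name]
```

Returning the identical object on repeated calls is what lets host code and generated kernels synchronize through a single buffer.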
  4. Apr 25, 2017
      [PPCGCodeGeneration] Update PPCG Code Generation for OpenCL compatibility · d277feda
      Siddharth Bhat authored
Added a small change to the way pointer arguments are set in the kernel
code generation. The way the pointer is now retrieved specifically requests
that the global address space be annotated. This is necessary if the IR is to
be run through NVPTX to generate OpenCL-compatible PTX.
      
      The changes do not affect the PTX Strings generated for the CUDA target
      (nvptx64-nvidia-cuda), but are necessary for OpenCL (nvptx64-nvidia-nvcl).
      
      Additionally, the data layout has been updated to what the NVPTX Backend requests/recommends.
      
      Contributed-by: Philipp Schaad
      
      Reviewers: Meinersbur, grosser, bollu
      
      Reviewed By: grosser, bollu
      
      Subscribers: jlebar, pollydev, llvm-commits, nemanjai, yaxunl, Anastasia
      
      Tags: #polly
      
      Differential Revision: https://reviews.llvm.org/D32215
      
      llvm-svn: 301299
  5. Mar 07, 2017
  6. Mar 03, 2017
  7. Jan 16, 2017
  8. Nov 13, 2016
  9. Sep 18, 2016
  10. Sep 17, 2016
      GPGPU: Store back non-read-only scalars · 51dfc275
      Tobias Grosser authored
We may generate GPU kernels that store into scalars in case we run some
sequential code on the GPU, because the remaining data is expected to already
be on the GPU. For these kernels it is important not to keep the scalar values
in thread-local registers, but to store them back to the corresponding device
memory objects that back them up.
      
      We currently only store scalars back at the end of a kernel. This is only
      correct if precisely one thread is executed. In case more than one thread may
      be run, we currently invalidate the scop. To support such cases correctly,
      we would need to always load and store back from a corresponding global
      memory slot instead of a thread-local alloca slot.
      
      llvm-svn: 281838
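The load/store-back discipline described above can be sketched with a toy "kernel" that reads a scalar from its device slot into a register, computes, and writes the result back at kernel end. This is a hypothetical illustration; as the message notes, it is only correct when exactly one thread executes the kernel.

```python
# Toy sketch of the scalar store-back rule: a sequential kernel must
# write updated scalars back to device memory, not leave them in a
# thread-local register, so later kernels see the new value.
def run_sequential_kernel(device_memory, name):
    # Load the scalar from its backing device slot into a "register".
    reg = device_memory[name]
    reg += 1  # some sequential computation on the scalar
    # Store back at kernel end; correct only for a single thread.
    device_memory[name] = reg
```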
      GPGPU: Detect read-only scalar arrays ... · fe74a7a1
      Tobias Grosser authored
      and pass these by value rather than by reference.
      
      llvm-svn: 281837
  11. Sep 15, 2016
      GPGPU: Do not assume arrays start at 0 · aaabbbf8
      Tobias Grosser authored
      Our alias checks precisely check that the minimal and maximal accessed elements
      do not overlap in a kernel. Hence, we must ensure that our host <-> device
      transfers do not touch additional memory locations that are not covered in
      the alias check. To ensure this, we make sure that the data we copy for a
      given array is only the data from the smallest element accessed to the largest
      element accessed.
      
      We also adjust the size of the array according to the offset at which the array
      is actually accessed.
      
An interesting result of this is: in case arrays are accessed with negative
subscripts, e.g., A[-100], we automatically allocate and transfer _more_ data to
cover the full array. This is important, as such code indeed exists in the wild.
      
      llvm-svn: 281611
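The transfer window described above (smallest to largest accessed element, including negative subscripts) can be sketched as a small helper. This is an illustration of the rule, not Polly's code; the function name is hypothetical.

```python
# Sketch: compute the host<->device copy window from the smallest and
# largest accessed elements, so the transfer touches exactly the range
# covered by the alias check. Negative subscripts (e.g. A[-100]) push
# the offset below zero and grow the window accordingly.
def transfer_window(accessed_indices, element_size):
    lo, hi = min(accessed_indices), max(accessed_indices)
    offset = lo * element_size           # may be negative
    size = (hi - lo + 1) * element_size  # covers min..max inclusive
    return offset, size
```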
  12. Sep 13, 2016
  13. Sep 12, 2016
      GPGPU: Bail out gracefully in case of invalid IR · 5857b701
      Tobias Grosser authored
      Instead of aborting, we now bail out gracefully in case the kernel IR we
      generate is invalid. This can currently happen in case the SCoP stores
      pointer values, which we model as arrays, as data values into other arrays. In
      this case, the original pointer value is not available on the device and can
      consequently not be stored. As detecting this ahead of time is not so easy, we
      detect these situations after the invalid IR has been generated and bail out.
      
      llvm-svn: 281193
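The generate-then-verify pattern described above can be sketched in a few lines: build the kernel IR, check it afterwards, and fall back to the original host path instead of aborting when it is invalid. The function names are stand-ins, not Polly's API.

```python
# Sketch of graceful bail-out: since invalid kernel IR is hard to
# predict ahead of time, generate first, verify afterwards, and return
# None (keep the original host code path) rather than abort.
def generate_kernel_or_bail(generate, verify):
    kernel_ir = generate()
    if not verify(kernel_ir):
        return None  # bail out gracefully; caller keeps host code
    return kernel_ir
```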
  14. Sep 11, 2016
  15. Aug 10, 2016
      [GPGPU] Ensure arrays where only parts are modified are copied to GPU · d58acf86
      Tobias Grosser authored
To do so, we change the way array extents are computed. Instead of the precise
      set of memory locations accessed, we now compute the extent as the range between
      minimal and maximal address in the first dimension and the full extent defined
      by the sizes of the inner array dimensions.
      
      We also move the computation of the may_persist region after the construction
      of the arrays, as it relies on array information. Without arrays being
      constructed no useful information is computed at all.
      
      llvm-svn: 278212
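The extent rule above (min-to-max range in the outermost dimension times the full extent of the inner dimensions) can be sketched as a small computation. This is illustrative only; the helper name is hypothetical.

```python
# Sketch: the copied region spans the min..max range in the outermost
# dimension multiplied by the full extent of all inner dimensions, so
# partially modified rows are still copied whole.
def copy_extent(min_outer, max_outer, inner_sizes):
    inner = 1
    for s in inner_sizes:
        inner *= s  # full extent of the inner dimensions
    return (max_outer - min_outer + 1) * inner
```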
  16. Aug 09, 2016
      [GPGPU] Support PHI nodes used in GPU kernel · b06ff457
      Tobias Grosser authored
      Ensure the right scalar allocations are used as the host location of data
transfers. For the device code, we clear the allocation cache before device
code generation to be able to generate new device-specific allocations, and
we need to make sure to add back the old host allocations as soon as the
device code generation is finished.
      
      llvm-svn: 278126
      [GPGPU] Use separate basic block for GPU initialization code · 750160e2
      Tobias Grosser authored
This increases the readability of the IR and also clarifies that the GPU
initialization is executed _after_ the scalar initialization, which needs to
happen before the code of the transformed scop is executed.
      
      Besides increased readability, the IR should not change. Specifically, I
      do not expect any changes in program semantics due to this patch.
      
      llvm-svn: 278125
      [tests] Add two missing 'REQUIRES' lines · 77f76788
      Tobias Grosser authored
      llvm-svn: 278104
      [BlockGenerator] Also eliminate dead code not originating from BB · c59b3ce0
      Tobias Grosser authored
After having generated the code for a ScopStmt, we run a simple dead-code
elimination that drops all instructions that are known to be and remain unused.
Until this change, we only considered instructions for dead-code elimination
if they had a corresponding instruction in the original BB that belongs to the
ScopStmt. However, when generating code we do not only copy code from the BB
belonging to a ScopStmt, but also generate code for operands referenced from BB.
After this change, we now also consider code for dead-code elimination that
does not have a corresponding instruction in BB.
      
      This fixes a bug in Polly-ACC where such dead-code referenced CPU code from
      within a GPU kernel, which is possible as we do not guarantee that all variables
      that are used in known-dead-code are moved to the GPU.
      
      llvm-svn: 278103
      [GPGPU] Pass parameters always by using their own type · cf66ef26
      Tobias Grosser authored
      llvm-svn: 278100
  17. Aug 08, 2016
  18. Aug 05, 2016
  19. Aug 04, 2016
  20. Aug 03, 2016
      GPGPU: Mark kernel functions as polly.skip · 629109b6
      Tobias Grosser authored
      Otherwise, we would try to re-optimize them with Polly-ACC and possibly even
      generate kernels that try to offload themselves, which does not work as the
      GPURuntime is not available on the accelerator and also does not make any
      sense.
      
      llvm-svn: 277589
  21. Jul 28, 2016