  1. Jul 22, 2017
  2. Jul 21, 2017
  3. Jul 20, 2017
    • Add an ID field to StackObjects · db78273b
      Matt Arsenault authored
      On AMDGPU, SGPRs are really spilled to another register.
      The spiller creates the spills to new frame index objects,
      which are used as placeholders.
      
      This will eventually be replaced with a reference to a position
      in a VGPR to write to and the frame index deleted. It is
      most likely not a real stack location that can be shared
      with another stack object.
      
      This is a problem when StackSlotColoring decides it should
      combine a frame index used for a normal VGPR spill, which has
      a real stack location, with a frame index used for an SGPR spill.
      
      Add an ID field so that StackSlotColoring has a way
      of knowing the different frame index types are
      incompatible.
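
      As a minimal sketch of the idea, and not the actual LLVM code, the
      check might look like the following. The struct, its field names, and
      the helper canShareSlot are hypothetical stand-ins for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a frame index object; not LLVM's real class.
struct StackObject {
  uint64_t Size;   // size of the object in bytes
  uint8_t StackID; // assumed: 0 = ordinary stack slot, nonzero = target-specific kind
};

// Two frame indexes may be merged into one slot only if they are the
// same kind of object. A placeholder for an SGPR spill (nonzero ID) is
// therefore never combined with a real stack location (ID 0).
inline bool canShareSlot(const StackObject &A, const StackObject &B) {
  return A.StackID == B.StackID;
}
```

      A coloring pass would call such a predicate before merging two
      candidates and skip any pair whose IDs differ.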
      
      llvm-svn: 308673
  4. Jul 18, 2017
  5. Jul 17, 2017
  6. Jul 16, 2017
  7. Jul 15, 2017
    • AMDGPU: Return correct type during argument lowering · b3463555
      Matt Arsenault authored
      The type needs to be cast back to the original argument type.
      Fixes an assert that for some reason is only run when
      using -debug.
      
      Includes an additional combine to avoid test regressions
      from having conversions mixed with multiple Assert[SZ]ext
      nodes. On subtargets where i16 is legal, this was producing an i32
      register with an i16 AssertZExt, truncated to i16 with another i8
      AssertZExt.
      
      t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0
      t3: i16 = truncate t2
      t5: i16 = AssertZext t3, ValueType:ch:i8
      t6: i8 = truncate t5
      t7: i32 = zero_extend t6
      llvm-svn: 308082
  8. Jul 14, 2017
  9. Jul 13, 2017
  10. Jul 12, 2017
    • [AMDGPU] fcanonicalize elimination optimization · 5680b0ca
      Stanislav Mekhanoshin authored
      We use multiplication by 1.0 to flush denormals and quiet sNaNs.
      This multiplication can be omitted if the source of the
      fcanonicalize instruction is known to be flushed/quieted, i.e.
      if it comes from another instruction known to do the normalization
      and we are using IEEE mode to quiet sNaNs.
      
      Differential Revision: https://reviews.llvm.org/D35218
      
      llvm-svn: 307848
    • Enhance synchscope representation · bb80d3e1
      Konstantin Zhuravlyov authored
        OpenCL 2.0 introduces the notion of memory scopes in atomic operations to
        global and local memory. These scopes restrict how synchronization is
        achieved, which can result in improved performance.
      
        This change extends the existing notion of synchronization scopes in LLVM to
        support arbitrary scopes expressed as target-specific strings, in addition to
        the already defined scopes (single thread, system).
      
        The LLVM IR and MIR syntax for expressing synchronization scopes has changed
        to use *syncscope("<scope>")*, where <scope> can be "singlethread" (this
        replaces *singlethread* keyword), or a target-specific name. As before, if
        the scope is not specified, it defaults to CrossThread/System scope.
      
        Implementation details:
          - Mapping from synchronization scope name/string to synchronization scope id
            is stored in LLVM context;
          - CrossThread/System and SingleThread scopes are pre-defined to efficiently
            check for known scopes without comparing strings;
          - Synchronization scope names are stored in SYNC_SCOPE_NAMES_BLOCK in
            the bitcode.
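
        The name-to-id mapping described above can be sketched roughly as
        follows. This is an illustration under stated assumptions, not the
        code from the patch: the class ScopeRegistry, the method getOrInsert,
        the numeric values of the pre-defined ids, and the use of an empty
        name for the system scope are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical per-context registry that interns scope names as small ids.
class ScopeRegistry {
public:
  // Pre-defined scopes get fixed ids so callers can test for them with an
  // integer comparison instead of a string comparison. Values are assumed.
  enum : uint32_t { SingleThread = 0, System = 1 };

  ScopeRegistry() {
    Names["singlethread"] = SingleThread;
    Names[""] = System; // assumed: the default scope has an empty name
  }

  // Returns the id for Name, assigning the next free id on first use.
  uint32_t getOrInsert(const std::string &Name) {
    auto It = Names.find(Name);
    if (It != Names.end())
      return It->second;
    uint32_t ID = static_cast<uint32_t>(Names.size());
    Names.emplace(Name, ID);
    return ID;
  }

private:
  std::unordered_map<std::string, uint32_t> Names;
};
```

        Target-specific names such as "agent" would receive the next unused
        id the first time they are seen and the same id on every later lookup.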
      
      Differential Revision: https://reviews.llvm.org/D21723
      
      llvm-svn: 307722
  11. Jul 10, 2017
  12. Jul 07, 2017
  13. Jul 06, 2017
    • AMDGPU: Add macro fusion schedule DAG mutation · 9aa45f04
      Matt Arsenault authored
      Try to increase opportunities to shrink vcc uses.
      
      llvm-svn: 307313
    • AMDGPU: Remove unnecessary IR from MIR tests · 60b91e0b
      Matt Arsenault authored
      llvm-svn: 307311
    • [AMDGPU] Always use rcp + mul with fast math · 9d7b1c9d
      Stanislav Mekhanoshin authored
      Regardless of relaxation options such as -cl-fast-relaxed-math,
      we are producing rather long code for fdiv via the amdgcn_fdiv_fast
      intrinsic. This intrinsic is used to replace an fdiv with 2.5ulp
      metadata and does not handle denormals, and is thus believed to be fast.
      
      An fdiv instruction can also have a fast math flag, either by itself
      or together with fpmath metadata. Clang used with a relaxation flag
      always produces both the metadata and the fast flag:
      
      %div = fdiv fast float %v, %0, !fpmath !12
      !12 = !{float 2.500000e+00}
      
      The current implementation ignores the fast flag and favors the metadata.
      An instruction with just the fast flag would be lowered to the faster
      rcp + mul, but that never happens in practice because of the combined
      clang and backend behavior described above.
      
      This change allows an "fdiv fast" to be always lowered as rcp + mul.
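
      The resulting priority order can be sketched as below. This is only an
      illustration of the decision described in this message, not the backend
      code: FDivInfo, FDivLowering, and chooseFDivLowering are hypothetical
      names, and the sketch ignores denormal handling and any other
      conditions the real lowering checks.

```cpp
#include <cassert>

// Hypothetical summary of the ways an fdiv could be lowered.
enum class FDivLowering { RcpMul, FastIntrinsic, FullPrecision };

// Hypothetical description of one fdiv instruction.
struct FDivInfo {
  bool HasFastFlag; // the instruction carries the "fast" flag
  float MaxULP;     // value of !fpmath metadata, or 0 if absent (assumed encoding)
};

// The fast flag is checked first, so "fdiv fast" always becomes rcp + mul,
// even when 2.5ulp metadata is also present.
inline FDivLowering chooseFDivLowering(const FDivInfo &D) {
  if (D.HasFastFlag)
    return FDivLowering::RcpMul;
  if (D.MaxULP >= 2.5f)
    return FDivLowering::FastIntrinsic;
  return FDivLowering::FullPrecision;
}
```

      Checking the metadata before the flag would reproduce the old
      behavior, where the clang output shown above never reached the
      rcp + mul path.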
      
      Differential Revision: https://reviews.llvm.org/D34844
      
      llvm-svn: 307308
    • [RegisterCoalescer] Fix for SubRange join unreachable · 7528d4bd
      David Stuttard authored
      Summary:
      During remat, some subranges might end up having invalid segments which caused problems for later
      coalescing.
      
      Added in a check to remove segments that are invalidated as part of the remat.
      
      See http://llvm.org/PR33524
      
      Subscribers: MatzeB, qcolombet
      
      Differential Revision: https://reviews.llvm.org/D34391
      
      llvm-svn: 307247
  14. Jul 04, 2017
  15. Jul 03, 2017
  16. Jun 30, 2017
    • Fix ODR violations due to abuse of LLVM_YAML_IS_(FLOW_)?SEQUENCE_VECTOR · d0c0c134
      Richard Smith authored
      This is a short-term fix for PR33650 aimed at getting the modules build bots green again.
      
      Remove all the places where we use the LLVM_YAML_IS_(FLOW_)?SEQUENCE_VECTOR
      macros to try to locally specialize a global template for a global type. That's
      not how C++ works.
      
      Instead, we now centrally define how to format vectors of fundamental types and
      of strings (std::string and StringRef). We use flow formatting for the former
      cases, since that's the obvious right thing to do; in the latter case, it's
      less clear what the right choice is, but flow formatting is really bad for some
      cases (due to very long strings), so we pick block formatting. (Many of the
      cases that were using flow formatting for strings are improved by this change.)
      
      Other than the flow -> block formatting change for some vectors of strings,
      this should result in no functionality change.
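
      A rough sketch of what "centrally define" means here, assuming a
      single trait that every translation unit sees. The names SeqStyle and
      VectorStyle are hypothetical and are not the YAML I/O traits
      themselves; StringRef is omitted to keep the sketch self-contained.

```cpp
#include <string>
#include <type_traits>

// Hypothetical formatting choice for a sequence.
enum class SeqStyle { Flow, Block };

// Primary template is left undefined: element types with no central
// definition get a compile error rather than a per-file specialization.
template <typename T, typename Enable = void> struct VectorStyle;

// One central definition for all fundamental arithmetic types: flow style.
template <typename T>
struct VectorStyle<T, typename std::enable_if<std::is_arithmetic<T>::value>::type> {
  static constexpr SeqStyle value = SeqStyle::Flow;
};

// One central definition for strings: block style, since long strings
// read poorly in flow style.
template <> struct VectorStyle<std::string> {
  static constexpr SeqStyle value = SeqStyle::Block;
};
```

      Because the definitions live in one place, two source files can no
      longer specialize the same template for the same type in different
      ways.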
      
      Differential Revision: https://reviews.llvm.org/D34907
      
      Corresponding updates to clang, clang-tools-extra, and lld to follow.
      
      llvm-svn: 306878
  17. Jun 28, 2017
  18. Jun 27, 2017