  1. Aug 10, 2021
    • [MemCpyOpt] Optimize MemoryDef insertion · 17db125b
      Nikita Popov authored
      When converting a store into a memset, we currently insert the new
      MemoryDef after the store MemoryDef, which requires all uses to be
      renamed to the new def using a whole block scan. Instead, we can
      insert the new MemoryDef before the store and not rename uses,
      because we know that the location is immediately overwritten, so
      all uses should still refer to the old MemoryDef. Those uses will
      get renamed when the old MemoryDef is actually dropped, which is
      efficient.
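
      As a hand-written sketch (not code from the patch; the function name and
      values are illustrative, and profitability is ignored), the situation
      looks roughly like this, with MemorySSA numbering written out in
      comments:

        define i8 @f(i8* %p) {
          ; 1 = MemoryDef(liveOnEntry)   ; def created for the store
          store i8 0, i8* %p
          ; MemoryUse(1)                 ; this use names def 1
          %v = load i8, i8* %p
          ret i8 %v
        }

      Previously the memset's new MemoryDef was inserted after def 1, which
      required renaming uses like the one above to the new def with a whole
      block scan. Inserting the new def before def 1 instead leaves the use
      naming def 1; it is renamed only when def 1 is dropped together with the
      dead store.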
      
      I expect something similar can be done for some of the other MSSA
      updates in MemCpyOpt. This is an alternative to D107513, at least
      for this particular case.
      
      Differential Revision: https://reviews.llvm.org/D107702
    • [InstCombine] avoid infinite loops from min/max canonicalization · b267d3ce
      Sanjay Patel authored
      The intrinsics have an extra chunk of known bits logic
      compared to the normal cmp+select idiom. That allows
      folding the icmp in each case to something better, but
      that then opposes the canonical form of min/max that
      we try to form for a select.
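
      For reference, here is the canonicalization in question on a
      hand-written example (the type and value names are arbitrary): the
      cmp+select idiom computes the same value as the intrinsic, and
      InstCombine turns the first form into the second:

        define i32 @smin_select(i32 %x, i32 %y) {
          %c = icmp slt i32 %x, %y
          %m = select i1 %c, i32 %x, i32 %y   ; canonicalized to @llvm.smin
          ret i32 %m
        }

        define i32 @smin_intrinsic(i32 %x, i32 %y) {
          %m = call i32 @llvm.smin.i32(i32 %x, i32 %y)
          ret i32 %m
        }

        declare i32 @llvm.smin.i32(i32, i32)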
      
      I'm carving out a narrow exception to preserve all
      existing regression tests while avoiding the inf-loop.
      It seems unlikely that this is the only bug like this
      left, but this should fix:
      https://llvm.org/PR51419
    • [SimplifyCFG] Remove recursion from FoldCondBranchOnPHI. NFCI. · a1783b54
      Carl Ritson authored
      Avoid stack overflow errors on systems with small stack sizes
      by removing recursion in FoldCondBranchOnPHI.
      
      This is a simple change, as the recursive call simply re-invoked the
      function on the same arguments.
      Ideally this would be compiled to a tail call, but there is
      no guarantee.
      
      Reviewed By: lebedev.ri
      
      Differential Revision: https://reviews.llvm.org/D107803
    • [InstCombine] Add more complex folds for extractelement + stepvector · ce394161
      David Sherwood authored
      I have updated cheapToScalarize to also consider the case when
      extracting lanes from a stepvector intrinsic. This required removing
      the existing 'bool IsConstantExtractIndex' and passing in the actual
      index as a Value instead. We do this because we need to know if the
      index is <= known minimum number of elements returned by the stepvector
      intrinsic. Effectively, when extracting lane X from a stepvector we
      know the value returned is also X.
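
      A small hand-written illustration of that last point (the element count
      and lane number are chosen arbitrarily): lane 2 is below the known
      minimum of 4 elements, so the extract can be folded to the constant 2.

        define i32 @extract_lane_2() {
          %step = call <vscale x 4 x i32> @llvm.experimental.stepvector.nxv4i32()
          %lane = extractelement <vscale x 4 x i32> %step, i64 2
          ret i32 %lane   ; simplifies to: ret i32 2
        }

        declare <vscale x 4 x i32> @llvm.experimental.stepvector.nxv4i32()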
      
      New tests added here:
      
        Transforms/InstCombine/vscale_extractelement.ll
      
      Differential Revision: https://reviews.llvm.org/D106358
  2. Aug 06, 2021
    • [InstCombine] reduce vector casting before icmp · 0369714b
      Sanjay Patel authored
      There may be some generalizations (see test comments) of these patterns,
      but this should handle the cases motivated by:
      https://llvm.org/PR51315
      https://llvm.org/PR51259
      
      The backend may want to transform differently, but at least for
      the x86 examples that I looked at, there does not appear to be
      any significant perf diff either way.
    • [CUDA, MemCpyOpt] Add a flag to force-enable memcpyopt and use it for CUDA. · 6a9cf21f
      Artem Belevich authored
      An attempt to enable MemCpyOpt unconditionally in D104801 uncovered the
      fact that there are users that do not expect LLVM to materialize the
      `memset` intrinsic.
      
      While other passes can do that, too, MemCpyOpt triggers it more frequently and
      breaks sanitizers and some downstream users.
      
      For now, introduce a flag to force-enable the pass, and opt in only CUDA
      compilation with the NVPTX back-end.
      
      Differential Revision: https://reviews.llvm.org/D106401
    • [MemCpyOpt] Teach memcpyopt to handle loads from constant memory. · d1cacd59
      Michael Liao authored
      - Loads from constant memory (either explicit loads or the sources of
        memory transfer intrinsics) won't alias any stores; see the sketch
        below.
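
      As an illustrative, hand-written example (the names are made up): the
      intervening store cannot clobber what the load read, because the load
      reads from a constant global.

        @lut = constant [4 x i8] zeroinitializer

        define void @copy_first_byte(i8* %dst, i8* %flag) {
          %src = getelementptr inbounds [4 x i8], [4 x i8]* @lut, i64 0, i64 0
          %v = load i8, i8* %src
          store i8 1, i8* %flag    ; cannot clobber @lut: it is constant memory
          store i8 %v, i8* %dst
          ret void
        }

      MemCpyOpt's clobber checks can therefore look past such stores whenever
      the source of a load or memory transfer intrinsic is constant memory.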
      
      Reviewed By: asbirlea, efriedma
      
      Differential Revision: https://reviews.llvm.org/D107605
    • [LoopVectorize] Improve vectorisation of some intrinsics by treating them as uniform · 3fd96e1b
      David Sherwood authored
      This patch adds more instructions to the Uniforms list, for example certain
      intrinsics that are uniform by definition or whose operands are loop invariant.
      This list includes:
      
        1. The intrinsics 'experimental.noalias.scope.decl' and 'sideeffect', which
        are always uniform by definition.
        2. The intrinsics 'lifetime.start', 'lifetime.end' and 'assume', when
        their input operands are loop invariant, are also uniform.
      
      Also, in VPRecipeBuilder::handleReplication we check if an instruction is
      uniform based purely on whether or not the instruction lives in the Uniforms
      list. However, there are certain cases where calls to some intrinsics can
      be effectively treated as uniform too. Therefore, we now also treat the
      following cases as uniform for scalable vectors:
      
        1. If the 'assume' intrinsic's operand is not loop invariant, then we
        are free to treat it as uniform anyway, since it is only a performance
        hint; we will still get the benefit for the first lane (see the sketch
        after this list).
        2. When the input pointers for 'lifetime.start' and 'lifetime.end' are loop
        variant then for scalable vectors we assume these still ultimately come
        from the broadcast of an alloca. We do not support scalable vectorisation
        of loops containing alloca instructions, hence the alloca itself would
        be invariant. If the pointer does not come from an alloca then the
        intrinsic itself has no effect.
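
      As a hand-written sketch of the first case (the loop and names are made
      up): the assume's condition depends on a value loaded inside the loop,
      so it is not loop invariant, yet when the loop is vectorized (the case
      above targets scalable vectors) the call can still be treated as
      uniform, keeping the assumption only for the first lane.

        declare void @llvm.assume(i1)

        define void @scale(float* %p, i64 %n) {
        entry:
          br label %loop

        loop:
          %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
          %addr = getelementptr inbounds float, float* %p, i64 %i
          %v = load float, float* %addr
          %pos = fcmp ogt float %v, 0.000000e+00
          call void @llvm.assume(i1 %pos)   ; operand is not loop invariant
          %s = fmul float %v, 2.000000e+00
          store float %s, float* %addr
          %i.next = add nuw nsw i64 %i, 1
          %done = icmp eq i64 %i.next, %n
          br i1 %done, label %exit, label %loop

        exit:
          ret void
        }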
      
      I have updated the assume test for fixed width, since we now treat it
      as uniform:
      
        Transforms/LoopVectorize/assume.ll
      
      I've also added new scalable vectorisation tests for other intrinsics:
      
        Transforms/LoopVectorize/scalable-assume.ll
        Transforms/LoopVectorize/scalable-lifetime.ll
        Transforms/LoopVectorize/scalable-noalias-scope-decl.ll
      
      Differential Revision: https://reviews.llvm.org/D107284
    • [FuncSpec] Return changed if function is changed by tryToReplaceWithConstant · 0fd03feb
      Chuanqi Xu authored
      The function may get changed by RunSCCPSolver before specialization. In
      other words, the pass may change the function even when no specialization
      happens. Add a test and a comment to reveal this.
      The pass may also report no change even if the function was changed by
      RunSCCPSolver before specialization, which looks like a potential bug.
      
      Test Plan: check-all
      
      Reviewed By: https://reviews.llvm.org/D107622
      
      Differential Revision: https://reviews.llvm.org/D107622