  1. Jun 17, 2020
  2. Jun 16, 2020
  3. Jun 15, 2020
    • [CodeGenPrepare] Reset the debug location when promoting trunc(s) · c2dccf9d
      Davide Italiano authored
      The promotion machinery in CGP moves instructions retaining
      debug locations. When the transformation is local, this is mostly
      correct, but when instructions are moved cross-BBs, this is not
      always true and causes jumpiness in line tables. This is the first
      of a series of commits. sext(s) and zext(s) need to be treated
      similarly.
      
      Differential Revision:  https://reviews.llvm.org/D81879
      c2dccf9d
    • [IR] Add nocapture & nosync to matrix intrinsics. · 1d33c09f
      Florian Hahn authored
      As suggested in D81472, the load/store intrinsics' pointer arguments can
      be marked as nocapture and all matrix intrinsics as nosync.
      
      This also re-flows the intrinsic definitions, to make them a little more
      concise.
      1d33c09f
    • [DSE,MSSA] Port partial store merging. · 120c0592
      Florian Hahn authored
      Port partial constant store merging logic to MemorySSA-backed DSE. The
      heavy lifting is done by the existing helper function. It is used in
      a context where we have already ensured that the later instruction can
      eliminate the earlier one, if it is a complete overwrite.
      120c0592
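As a rough illustration of what merging partially overlapping constant stores means, the sketch below folds a narrower later constant into a wider earlier one, so that a single store of the merged value has the same effect as both stores. The function name `mergePartialStore`, the fixed 32-bit width, and the little-endian byte layout are assumptions made for this example; this is not the LLVM helper the commit refers to.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not LLVM's API): combine an earlier 32-bit constant
// store with a later, narrower constant store that lands entirely inside it.
// Assumes a little-endian layout, so byteOffset 0 is the least significant byte.
std::uint32_t mergePartialStore(std::uint32_t earlier, std::uint32_t later,
                                unsigned laterBytes, unsigned byteOffset) {
  // The later store must be non-empty and fully contained in the earlier one.
  assert(laterBytes >= 1 && byteOffset + laterBytes <= 4);
  const unsigned shift = byteOffset * 8;
  // Mask covering the bytes written by the later store.
  const std::uint32_t mask =
      (laterBytes == 4) ? 0xFFFFFFFFu : ((1u << (laterBytes * 8)) - 1u);
  // Clear the overwritten bytes, then insert the later value in their place.
  return (earlier & ~(mask << shift)) | ((later & mask) << shift);
}
```

For example, an earlier store of `0x11223344` followed by a one-byte store of `0xAB` at byte offset 1 merges to `0x1122AB44`.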
    • [DSE,MSSA] Delete instructions after printing it. · 8c61f13a
      Florian Hahn authored
      Also enables a now-passing test case, that exposed a crash caused by the
      wrong order.
      8c61f13a
    • [DSE,MSSA] Add additional merging test cases (NFC). · 979720a9
      Florian Hahn authored
      Additional tests added ahead of partial overlapping store merging.
      979720a9
    • [Test] Add an example of unprofitable PR Phi insertion · 9e4f6748
      Max Kazantsev authored
      This test demonstrates weird behavior of SimplifyCFG: it seems that a
      larger block size leads to a worse optimization choice.
      9e4f6748
    • [CostModel] getCFInstrCost in getUserCost. · 2596da31
      Sam Parker authored
      Have BasicTTI call the base implementation so that both agree on the
      default behaviour, with the default being a cost of '1'. This has
      required an X86 specific implementation as it seems to be very
      reliant on those instructions being free. Changes are also made to
      AMDGPU so that their implementations distinguish between cost kinds,
      so that the unrolling isn't affected. PowerPC also has its own
      implementation to prevent changes to the reg-usage vectorizer test.
      
      The cost model test changes now reflect that ret instructions are not
      generally free.
      
      Differential Revision: https://reviews.llvm.org/D79164
      2596da31
    • Revert "Return "[InstCombine] Simplify compare of Phi with constant inputs against a constant"" · 3e39760f
      Sam Parker authored
      This reverts commit 23291b98.
      
      This caused performance regressions.
      3e39760f
    • [NewPM] Avoid redundant CGSCC run for updated SCC · b559535a
      Wenlei He authored
      Summary:
      When an SCC is split due to inlining, we have two mechanisms for reprocessing the updated SCC: the first is UR.UpdatedC,
      which repeatedly reruns the new, current SCC; the second is a worklist for all newly split SCCs. We can avoid rerunning
      the same SCC when it is set to be processed by both mechanisms *back to back*. In pathological cases, such redundant
      reruns could cause exponential size growth due to inlining along cycles, even when there's no SCC mutation and hence
      convergence is not a problem.
      
      Note that it's ok to have an SCC updated and rerun immediately, and also placed in the worklist, if we have actually moved an SCC
      to be topologically "below" the current one due to merging. In that case, we will need to revisit the current SCC after
      those moved SCCs. For that reason, the redundancy avoidance here only targets back-to-back reruns of the same SCC - the
      case described by the now removed FIXME comment.
      
      Reviewers: chandlerc, wmi
      
      Subscribers: llvm-commits, hoy
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D80589
      b559535a
  4. Jun 14, 2020
    • [LAA] Do not set CanDoRT to false for AS that do not need RT checks. · 6176f044
      Florian Hahn authored
      Alternative approach to D80570.
      
      canCheckPtrAtRT already contains checks that figure out for which alias
      sets runtime checks are needed. But it currently sets CanDoRT to false
      for alias sets for which we cannot do RT checks but also do not need
      any.
      
      If we know that we do not need RT checks based on the number of
      reads/writes in the alias set, we can skip processing the AS.
      
      This patch also adds an assertion to ensure that DepCands does not
      contain more than one write from the alias set.
      
      Reviewers: Ayal, anemet, hfinkel, dmgreen
      
      Reviewed By: dmgreen
      
      Differential Revision: https://reviews.llvm.org/D80622
      6176f044
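The idea of deciding from read/write counts whether an alias set needs runtime checks at all can be sketched as below. The reasoning is that a memory conflict involves two accesses, at least one of which is a write, so a set with no writes, or with a single write and no reads, has no pair to check. The function name and the exact thresholds are assumptions for illustration and are not taken from the patch.

```cpp
#include <cassert>

// Hypothetical helper (not LLVM's API): given how many read and write
// pointers an alias set contains, report whether any pair of accesses
// could conflict and therefore need a runtime pointer check.
bool aliasSetNeedsRuntimeCheck(unsigned numReads, unsigned numWrites) {
  // Two writes can conflict with each other; one write can conflict
  // with any read. Reads alone never conflict.
  return numWrites >= 2 || (numWrites >= 1 && numReads >= 1);
}
```

Under this sketch, a read-only alias set such as `aliasSetNeedsRuntimeCheck(3, 0)` can be skipped without affecting whether runtime checks are possible for the loop as a whole.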
    • [LoopUnroll] Allow loops with multiple exiting blocks where loop latch · 5225cd43
      Whitney Tsang authored
      is not necessarily one of them.
      
      Summary: Currently LoopUnrollPass already allows loops with multiple
      exiting blocks, but only when the loop latch is one of the
      exiting blocks.
      When the loop latch is not an exiting block, only a single exiting
      block is supported.
      When possible, the single loop latch or the single exiting block
      terminator is optimized to an unconditional branch in the unrolled loop.
      
      This patch allows loops with multiple exiting blocks even if the loop
      latch is not one of them. However, the optimization of exiting block
      terminator to unconditional branch is not done when there exists more
      than one exiting block.
      Reviewer: dmgreen, Meinersbur, etiotto, fhahn, efriedma, bmahjour
      Reviewed By: efriedma
      Subscribers: hiraditya, zzheng, llvm-commits
      Tag: LLVM
      Differential Revision: https://reviews.llvm.org/D81053
      5225cd43
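For readers less familiar with the terminology, the following is a source-level sketch of a loop shape the summary above describes: two conditions that can leave the loop, and a latch that only branches back to the header. The function name is made up for this example, and how a given compiler lowers it to basic blocks is an assumption, not something the commit states.

```cpp
#include <cassert>

// Hypothetical example: linear search with two early exits.
// At the IR level this would typically give two exiting blocks,
// while the latch (the increment) branches back unconditionally.
int firstIndexOf(const int* data, int n, int key) {
  int i = 0;
  for (;;) {
    if (i >= n)
      return -1;   // exit 1: ran off the end of the array
    if (data[i] == key)
      return i;    // exit 2: found the key
    ++i;           // latch: no exit test here, just the backedge
  }
}
```

Per the commit message, a loop like this can now be unrolled, but its exiting-block terminators are not rewritten into unconditional branches because more than one exiting block exists.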
    • [InstCombine] reassociate FP diff of sums into sum of diffs · b5fb2695
      Sanjay Patel authored
      (a[0] + a[1] + a[2] + a[3]) - (b[0] + b[1] + b[2] + b[3]) -->
      (a[0] - b[0]) + (a[1] - b[1]) + (a[2] - b[2]) + (a[3] - b[3])
      
      This should be the last step in solving PR43953:
      https://bugs.llvm.org/show_bug.cgi?id=43953
      
      We started emitting reduction intrinsics with:
      D80867/ rGe50059f6b6b3
      So it's a relatively easy pattern match now to re-order those ops.
      Also, I have not seen any complaints for the switch to intrinsics
      yet, so I'll propose to remove the "experimental" tag from the
      intrinsics soon.
      
      Differential Revision: https://reviews.llvm.org/D81491
      b5fb2695
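The rewrite shown in the commit message above can be written out as two scalar functions to make the reordering concrete. This is only a sketch with made-up function names; it assumes four-element inputs, and it does not model the IR-level pattern match on reduction intrinsics. Because floating-point addition is not associative, the two forms can differ in the last bits for general inputs, so a compiler may only perform this kind of rewrite when reassociation is permitted (for example via fast-math-style flags).

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;

// Original shape: reduce each vector, then subtract the two sums.
float diffOfSums(const Vec4& a, const Vec4& b) {
  const float sumA = a[0] + a[1] + a[2] + a[3];
  const float sumB = b[0] + b[1] + b[2] + b[3];
  return sumA - sumB;
}

// Reassociated shape: subtract element-wise, then reduce once.
float sumOfDiffs(const Vec4& a, const Vec4& b) {
  return (a[0] - b[0]) + (a[1] - b[1]) + (a[2] - b[2]) + (a[3] - b[3]);
}
```

With inputs whose intermediate results are exactly representable, such as `{1, 2, 3, 4}` and `{0.5, 1.5, 2.5, 3.5}`, both functions return the same value.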
    • [InstCombine] allow undef elements when comparing vector constants for min/max bailout · aeb50448
      Sanjay Patel authored
      This is a hacky, but low-risk fix to avoid the infinite loop in PR46271:
      https://bugs.llvm.org/show_bug.cgi?id=46271
      
      As discussed there, the problem is that FoldOpIntoSelect() can get into a conflict
      with a transform that wants to pull a 'not' op through min/max via
      SimplifyDemandedVectorElts(). We need to relax our matching of min/max to include
      undefined elements in vector constants to avoid that. Alternatively, we could
      improve or cripple the demanded elements analysis, but that could create even
      more problems.
      
      The likely better, safer alternative will be to create min/max intrinsics, so
      we can remove all of the hacks related to min/max matching in instcombine.
      
      Differential Revision: https://reviews.llvm.org/D81698
      aeb50448