  1. Sep 09, 2019
    • [IfConversion] Correctly handle cases where analyzeBranch fails. · 79f0d3a6
      Eli Friedman authored
      If analyzeBranch fails, on some targets, the out parameters point to
      some blocks in the function. But we can't use that information, so make
      sure to clear it out.  (In some places in IfConversion, we assume that
      any block with a TrueBB is analyzable.)
      
      The change to the testcase makes it trigger a bug on builds without this
      fix: IfConvertDiamond tries to perform a followup "merge" operation,
      which isn't legal, and we somehow end up with a branch to a deleted MBB.
      I'm not sure how this doesn't crash the compiler.
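
      A minimal sketch of the pattern this enforces (hypothetical code, not the
      patch itself), assuming the usual analyzeBranch contract where a true
      return value means the branch could not be analyzed:

        void analyzeBranchOrClear(const TargetInstrInfo *TII,
                                  MachineBasicBlock &MBB,
                                  MachineBasicBlock *&TBB,
                                  MachineBasicBlock *&FBB,
                                  SmallVectorImpl<MachineOperand> &Cond) {
          if (TII->analyzeBranch(MBB, TBB, FBB, Cond)) {
            // On failure, some targets leave TBB/FBB pointing at real blocks.
            // A non-null TrueBB is taken to mean "analyzable", so clear the
            // out parameters before anything reads them.
            TBB = nullptr;
            FBB = nullptr;
            Cond.clear();
          }
        }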
      
      Differential Revision: https://reviews.llvm.org/D67306
      
      llvm-svn: 371434
    • [ARM][ParallelDSP] Fix for sext input · c363deb5
      Sam Parker authored
      The incoming accumulator value can be discovered through a sext, in
      which case there will be a mismatch between the input and the result.
      So sign-extend the accumulator input if we're performing a 64-bit mac.
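
      A hedged sketch of the described fix (Acc, Is64BitMac, and InsertPt are
      illustrative names, not the patch's API):

        IRBuilder<> Builder(InsertPt);
        // If we're building a 64-bit mac but the discovered accumulator is
        // only 32 bits wide, sign-extend it so input and result types match.
        if (Is64BitMac && Acc->getType()->isIntegerTy(32))
          Acc = Builder.CreateSExt(Acc, Builder.getInt64Ty());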
      
      Differential Revision: https://reviews.llvm.org/D67220
      
      llvm-svn: 371370
  2. Sep 05, 2019
    • [IfConversion] Fix diamond conversion with unanalyzable branches. · cae1e47f
      Eli Friedman authored
      The code was incorrectly counting the number of identical instructions,
      and therefore tried to predicate an instruction which should not have
      been predicated.  This could have various effects: a compiler crash,
      an assembler failure, a miscompile, or just generating an extra,
      unnecessary instruction.
      
      Instead of depending on TargetInstrInfo::removeBranch, which only
      works on analyzable branches, just remove all branch instructions.
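
      A paraphrased sketch of the approach (not the verbatim patch): erase every
      branch among the block's terminators instead of calling removeBranch:

        for (auto I = MBB.terminators().begin(), E = MBB.terminators().end();
             I != E;) {
          MachineInstr &MI = *I++; // advance before erasing
          if (MI.isBranch())
            MI.eraseFromParent();
        }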
      
      Fixes https://bugs.llvm.org/show_bug.cgi?id=43121 and
      https://bugs.llvm.org/show_bug.cgi?id=41121 .
      
      Differential Revision: https://reviews.llvm.org/D67203
      
      llvm-svn: 371111
    • [LLVM][Alignment] Make functions using log of alignment explicit · aff45e4b
      Guillaume Chatelet authored
      Summary:
      This patch renames functions that take or return alignment as log2, which will help with the transition to llvm::Align.
      The renaming makes it explicit that we deal with log(alignment) instead of a power-of-two alignment (see the illustrative snippet after the list).
      A few renames uncovered dubious assignments:
      
       - `MirParser`/`MirPrinter` were expecting powers of two, but `MachineFunction` and `MachineBasicBlock` were using log2(align). This patch fixes it and updates the documentation.
       - `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power-of-two alignments; internally these values are interpreted as log2(align). This patch updates the documentation.
       - `MachineFunction` exposes `align-all-functions`, also interpreted as a power-of-two alignment; internally this value is interpreted as log2(align). This patch updates the documentation.
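
      An illustrative snippet (not from the patch) of the ambiguity the renaming
      removes: a bare 4 means 2^4 = 16 bytes in the log2 form but 4 bytes in the
      power-of-two form, so names should state which form they use.

        #include <cstdint>

        // With an explicit name there is no doubt which form a value is in:
        // alignmentFromLog2(4) == 16, whereas a plain setAlignment(4) could
        // be read as "4 bytes" by callers of a log2-based API.
        uint32_t alignmentFromLog2(uint32_t logAlign) { return 1u << logAlign; }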
      
      Reviewers: lattner, thegameg, courbet
      
      Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D65945
      
      llvm-svn: 371045
  3. Sep 03, 2019
    • Bug fix on function epilog optimization (ARM backend) · 39bf484d
      Oliver Stannard authored
      To save an 'add sp,#val' instruction by adding registers to the final pop instruction,
      the first register transferred by this pop instruction needs to be found.
      If the function to be optimized has a non-void return value, the operand list contains
      r0 (implicit), which prevents the optimization from taking place.
      Therefore implicit register references should be skipped in the search loop,
      because these registers are never popped from the stack.
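
      A hypothetical sketch of the fixed search loop (PopInst and FirstRegPopped
      are illustrative names):

        unsigned FirstRegPopped = 0;
        for (const MachineOperand &MO : PopInst.operands()) {
          // Implicit operands (such as the implicit r0 use modelling the
          // return value) are never popped from the stack, so skip them.
          if (!MO.isReg() || MO.isImplicit())
            continue;
          FirstRegPopped = MO.getReg();
          break;
        }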
      
      Patch by Rainer Herbertz (rOptimizer)!
      
      Differential revision: https://reviews.llvm.org/D66730
      
      llvm-svn: 370728
  4. Aug 29, 2019
    • Revert [MBP] Disable aggressive loop rotate in plain mode · f9f81289
      Jordan Rupprecht authored
      This reverts r369664 (git commit 51f48295)
      
      It causes many benchmark regressions, internally and in llvm's benchmark suite.
      
      llvm-svn: 370398
    • [DebugInfo] LiveDebugValues: correctly discriminate kinds of variable locations · ca0e4b36
      Jeremy Morse authored
      The missing line added by this patch ensures that only spilt variable
      locations are candidates for being restored from the stack. Otherwise,
      register or constant-value information can be interpreted as a spill
      location, through a union.
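
      A paraphrased, self-contained sketch with hypothetical names showing why
      the added check matters: the union is only meaningful as a spill slot when
      the discriminator says so.

        #include <cstdint>

        struct VarLoc {
          enum LocKind { RegisterLoc, SpillLoc, ImmediateLoc } Kind;
          union { unsigned RegNo; int SpillSlot; int64_t Imm; } Loc;
        };

        // Only spilt locations are restore candidates; without the Kind check
        // a RegisterLoc's RegNo could be misread as a matching SpillSlot.
        bool isRestoreCandidate(const VarLoc &VL, int LoadedSlot) {
          return VL.Kind == VarLoc::SpillLoc && VL.Loc.SpillSlot == LoadedSlot;
        }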
      
      The added regression test replicates a scenario where this occurs: the
      stack load from [rsp] causes the register-location DBG_VALUE to be
      "restored" to rsi, when it should be left alone. See PR43058 for details.
      
      Un-XFAIL a test from a previous patch that was suffering from this bug.
      
      Differential Revision: https://reviews.llvm.org/D66895
      
      llvm-svn: 370334
    • [DebugInfo] LiveDebugValues should always revisit backedges if it skips them · 313d2ce9
      Jeremy Morse authored
      The "join" method in LiveDebugValues does not attempt to join unseen
      predecessor blocks if their out-locations aren't yet initialized; instead
      the block should be re-visited later to see if any locations have changed
      validity. However, because all of the blocks were being "process"'d once
      before "join" saw them, the logic in "join" was actually ignoring
      legitimate out-locations on the first pass through. This meant that some
      invalidated locations were not removed from the head of loops, allowing
      illegal locations to persist.
      
      Fix this by removing the run of "process" before the main join/process loop
      in ExtendRanges. Now the unseen predecessors that "join" skips truly are
      uninitialized, and we come back to the block at a later time to re-run
      "join"; see the @baz function added.
      
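      A self-contained sketch (stand-in types, not the pass's real code) of the
      resulting fixed-point driver: because nothing is "process"'d ahead of time,
      a "join" that skips an uninitialized predecessor is guaranteed to run again
      once that predecessor produces out-locations.

        #include <deque>
        #include <set>
        #include <vector>

        struct Block { std::vector<Block *> succs; };

        void solve(std::deque<Block *> worklist, bool (*join)(Block &),
                   bool (*process)(Block &)) {
          std::set<Block *> visited;
          while (!worklist.empty()) {
            Block *b = worklist.front();
            worklist.pop_front();
            bool first = visited.insert(b).second;
            // join() may ignore predecessors with uninitialized out-locations;
            // that is safe because any change re-queues this block below.
            if ((join(*b) || first) && process(*b))
              for (Block *s : b->succs)
                worklist.push_back(s); // includes loop backedges
          }
        }
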
      This also fixes another fault where stack/register transfers in the entry
      block (or any other block that precedes all loops) were initially ignored
      and then never revisited. The added MIR test checks for this behaviour.
      
      XFail a test that exposes another bug; a fix for this is coming in D66895.
      
      Differential Revision: https://reviews.llvm.org/D66663
      
      llvm-svn: 370328
  5. Aug 28, 2019
    • [ARM][ParallelDSP] Change search for muls · a761ba0f
      Sam Parker authored
      rL369567 reverted a couple of recent changes made to ARMParallelDSP
      because of a miscompilation error: PR43073.
      
      The issue stemmed from an underlying bug that was caused by adding
      muls into a reduction before it was proved that they could be executed
      in parallel with another mul.
      
      Most of the changes here are from the previously reverted commits.
      Additional changes have been made in the following areas (see the sketch
      after the list):
      1) The Search function now doesn't insert any muls into the Reduction
         object. That now happens once the search has successfully finished.
      2) For any muls added into the reduction but that weren't paired, we
         accumulate their values as an input into the smlad.
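
      A hedged sketch of the new flow (MulCandidate, Reduction, and the helper
      names are illustrative, not the pass's actual API):

        SmallVector<MulCandidate, 8> Candidates;
        if (!Search(BB, Candidates))
          return false; // nothing was inserted, so nothing needs undoing

        for (MulCandidate &M : Candidates) {
          if (M.Paired)
            Reduction.addMulPair(M); // proved parallel; becomes the smlad
          else
            // Unpaired muls feed their value into the accumulator input.
            Acc = Builder.CreateAdd(Acc, M.Root);
        }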
      
      Differential Revision: https://reviews.llvm.org/D66660
      
      llvm-svn: 370171
  6. Aug 13, 2019
    • GlobalISel: Add more verifier checks for G_SHUFFLE_VECTOR · 0a04a062
      Matt Arsenault authored
      llvm-svn: 368705
    • GlobalISel: Change representation of shuffle masks · 5af9cf04
      Matt Arsenault authored
      Currently shufflemasks get emitted like any other constant, and you end
      up with a bunch of virtual registers of G_CONSTANT with a
      G_BUILD_VECTOR. The AArch64 selector then asserts on anything that
      doesn't fit this pattern. This isn't an ideal representation; a better
      one should avoid legalization and have fewer opportunities for a
      representational error.
      
      Rather than invent a new shuffle mask operand type, similar to what
      ShuffleVectorSDNode does, just track the original IR Constant mask
      operand. I don't completely like the idea of adding another link to
      the IR, but MIR is already quite dependent on IR constants, and this
      will allow sharing the shuffle mask utility functions with the IR.
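
      A hedged sketch of what the shared representation enables (the operand
      index and surrounding setup are assumptions, not the patch's code):

        // The mask operand refers to the original IR Constant, so the IR
        // utility can decode it for MIR clients too.
        const Constant *Mask = MI.getOperand(3).getShuffleMask();
        SmallVector<int, 16> MaskElts;
        ShuffleVectorInst::getShuffleMask(Mask, MaskElts);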
      
      llvm-svn: 368704
    • Revert r368276 "[TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT" · 5390d25f
      Hans Wennborg authored
      
      This introduced a false positive MemorySanitizer warning about use of
      uninitialized memory in a vectorized crc function in Chromium. That suggests
      maybe something is not right with this transformation. See
      https://crbug.com/992853#c7 for a reproducer.
      
      This also reverts the follow-up commits r368307 and r368308 which
      depended on this.
      
      > This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract.
      >
      > In particular this helps remove some unnecessary scalar->vector->scalar patterns.
      >
      > The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen is to be refactored in the near term and this isn't considered a major issue.
      >
      > Differential Revision: https://reviews.llvm.org/D65887
      
      llvm-svn: 368660
  7. Aug 12, 2019
    • Revert r368339 "[MBP] Disable aggressive loop rotate in plain mode" · a45f301f
      Hans Wennborg authored
      It caused assertions to fire when building Chromium:
      
        lib/CodeGen/LiveDebugValues.cpp:331: bool
        {anonymous}::LiveDebugValues::OpenRangesSet::empty() const: Assertion
        `Vars.empty() == VarLocs.empty() && "open ranges are inconsistent"' failed.
      
      See https://crbug.com/992871#c3 for how to reproduce.
      
      > Patch https://reviews.llvm.org/D43256 introduced a more aggressive loop layout optimization which depends on profile information. If profile information is not available, statically estimated profile information (generated by BranchProbabilityInfo.cpp) is used. If the user program doesn't behave as BranchProbabilityInfo.cpp expects, the layout may be worse.
      >
      > To be conservative, this patch restores the original layout algorithm in plain mode. But users can still try the aggressive layout optimization with -force-precise-rotation-cost=true.
      >
      > Differential Revision: https://reviews.llvm.org/D65673
      
      llvm-svn: 368579
  8. Aug 09, 2019
    • [globalisel] Add G_SEXT_INREG · e9a57c2b
      Daniel Sanders authored
      Summary:
      Targets often have instructions that can sign-extend certain cases faster
      than the equivalent shift-left/arithmetic-shift-right. Such cases can be
      identified by matching a shift-left/shift-right pair but there are some
      issues with this in the context of combines. For example, suppose you can
      sign-extend 8-bit up to 32-bit with a target extend instruction.
        %1:_(s32) = G_SHL %0:_(s32), i32 24 # (I've inlined the G_CONSTANT for brevity)
        %2:_(s32) = G_ASHR %1:_(s32), i32 24
        %3:_(s32) = G_ASHR %2:_(s32), i32 1
      would reasonably combine to:
        %1:_(s32) = G_SHL %0:_(s32), i32 24
        %2:_(s32) = G_ASHR %1:_(s32), i32 25
      which no longer matches the special case. If your shifts and extend are
      equal cost, this would break even as a pair of shifts but if your shift is
      more expensive than the extend then it's cheaper as:
        %2:_(s32) = G_SEXT_INREG %0:_(s32), i32 8
        %3:_(s32) = G_ASHR %2:_(s32), i32 1
      It's possible to match the shift-pair in ISel and emit an extend and ashr.
      However, this is far from the only way to break this shift pair and make
      it hard to match the extends. Another example is that with the right
      known-zeros, this:
        %1:_(s32) = G_SHL %0:_(s32), i32 24
        %2:_(s32) = G_ASHR %1:_(s32), i32 24
        %3:_(s32) = G_MUL %2:_(s32), i32 2
      can become:
        %1:_(s32) = G_SHL %0:_(s32), i32 24
        %2:_(s32) = G_ASHR %1:_(s32), i32 23
      
      All upstream targets have been configured to lower it to the current
      G_SHL,G_ASHR pair but will likely want to make it legal in some cases to
      handle their faster cases.
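
      A hedged sketch of that default lowering with MachineIRBuilder (exact
      helper names and surrounding variables are assumptions):

        // G_SEXT_INREG %src, B  ==>  ashr (shl %src, S), S,  S = width - B
        LLT Ty = MRI.getType(Src);
        unsigned ShAmt = Ty.getScalarSizeInBits() - B;
        auto Amt = MIRBuilder.buildConstant(Ty, ShAmt);
        auto Shl = MIRBuilder.buildShl(Ty, Src, Amt);
        MIRBuilder.buildAShr(Dst, Shl, Amt);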
      
      To follow-up: Provide a way to legalize based on the constant. At the
      moment, I'm thinking that the best way to achieve this is to provide the
      MI in LegalityQuery but that opens the door to breaking core principles
      of the legalizer (legality is not context sensitive). That said, it's
      worth noting that looking at other instructions and acting on that
      information doesn't violate this principle in itself. It's only a
      violation if, at the end of legalization, a pass that checks legality
      without being able to see the context would say an instruction might not be
      legal. That's a fairly subtle distinction so to give a concrete example,
      saying %2 in:
        %1 = G_CONSTANT 16
        %2 = G_SEXT_INREG %0, %1
      is legal is in violation of that principle if the legality of %2 depends
      on %1 being constant and/or being 16. However, legalizing to either:
        %2 = G_SEXT_INREG %0, 16
      or:
        %1 = G_CONSTANT 16
        %2:_(s32) = G_SHL %0, %1
        %3:_(s32) = G_ASHR %2, %1
      depending on whether %1 is constant and 16 does not violate that principle
      since both outputs are genuinely legal.
      
      Reviewers: bogner, aditya_nandakumar, volkan, aemerson, paquette, arsenm
      
      Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, kristof.beyls, javed.absar, hiraditya, jrtc27, atanasyan, Petar.Avramovic, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D61289
      
      llvm-svn: 368487
    • [ARM][ParallelDSP] Replace SExt uses · 0dba791a
      Sam Parker authored
      As loads are combined and widened, we replaced the operands of their
      sext users, whereas we should have been replacing the uses of the sext.
      I've added a load of tests; only a few of them originally caused
      assertion failures, and the rest improve pattern coverage.
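
      An illustrative sketch of the corrected direction (User and WideValue are
      assumptions, not the patch's variables):

        // Redirect the uses *of the sext* to the widened value, instead of
        // rewriting the operands of the sext itself.
        if (auto *SExt = dyn_cast<SExtInst>(User)) {
          SExt->replaceAllUsesWith(WideValue);
          SExt->eraseFromParent();
        }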
      
      Differential Revision: https://reviews.llvm.org/D65740
      
      llvm-svn: 368404
  9. Aug 08, 2019
    • [MBP] Disable aggressive loop rotate in plain mode · 80347c3a
      Guozhi Wei authored
      Patch https://reviews.llvm.org/D43256 introduced a more aggressive loop layout optimization which depends on profile information. If profile information is not available, statically estimated profile information (generated by BranchProbabilityInfo.cpp) is used. If the user program doesn't behave as BranchProbabilityInfo.cpp expects, the layout may be worse.
      
      To be conservative, this patch restores the original layout algorithm in plain mode. But users can still try the aggressive layout optimization with -force-precise-rotation-cost=true.
      
      Differential Revision: https://reviews.llvm.org/D65673
      
      llvm-svn: 368339
    • [TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT · e2e36679
      Simon Pilgrim authored
      
      This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract.
      
      In particular this helps remove some unnecessary scalar->vector->scalar patterns.
      
      The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen is to be refactored in the near term and this isn't considered a major issue.
      
      Differential Revision: https://reviews.llvm.org/D65887
      
      llvm-svn: 368276
  10. Aug 05, 2019
    • Reland: Fix and test inter-procedural register allocation for ARM · 8ed8353f
      Oliver Stannard authored
      Add an explicit construction of the ArrayRef; gcc 5 and earlier don't
      seem to select the ArrayRef constructor which takes a C array when the
      construction is implicit.
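
      An illustrative sketch of the workaround (takesArrayRef is a hypothetical
      callee):

        static const MCPhysReg Regs[] = {ARM::R0, ARM::R1};
        // gcc 5 and earlier may not pick ArrayRef's C-array constructor when
        // the conversion is implicit, so construct the ArrayRef explicitly:
        takesArrayRef(ArrayRef<MCPhysReg>(Regs));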
      
      Original commit message:
      
      - Avoid a crash when IPRA calls ARMFrameLowering::determineCalleeSaves
        with a null RegScavenger. Simply not updating the register scavenger
        is fine because IPRA only cares about the SavedRegs vector; the actual
        code of the function has already been generated at this point.
      - Add a new hook to TargetRegisterInfo to get the set of registers which
        can be clobbered inside a call, even if the compiler can see both
        sides, by linker-generated code.
      
      Differential revision: https://reviews.llvm.org/D64908
      
      llvm-svn: 367819
  11. Aug 02, 2019
    • [IPRA][ARM] Disable no-CSR optimisation for ARM · 4b7239eb
      Oliver Stannard authored
      This optimisation isn't generally profitable for ARM, because we can
      save/restore many registers in the prologue and epilogue using the PUSH
      and POP instructions, but mostly use individual LDR/STR instructions for
      other spills.
      
      Differential revision: https://reviews.llvm.org/D64910
      
      llvm-svn: 367670
    • Fix and test inter-procedural register allocation for ARM · f6b00c27
      Oliver Stannard authored
      - Avoid a crash when IPRA calls ARMFrameLowering::determineCalleeSaves
        with a null RegScavenger. Simply not updating the register scavenger
        is fine because IPRA only cares about the SavedRegs vector; the actual
        code of the function has already been generated at this point.
      - Add a new hook to TargetRegisterInfo to get the set of registers which
        can be clobbered inside a call, even if the compiler can see both
        sides, by linker-generated code.
      
      Differential revision: https://reviews.llvm.org/D64908
      
      llvm-svn: 367669