  1. Jan 20, 2021
  2. Jan 19, 2021
    • Revert "[SLP]Merge reorder and reuse shuffles." · e463bd53
      Alexey Bataev authored
      This reverts commit 438682de to fix a bug that reduced the size of
      the resulting vector for the entry node with multiple users.
      e463bd53
    • [ScalarizeMaskedMemIntrin] Add missing dependency · 7113de30
      Mariya Podchishchaeva authored
      The pass depends on 'TargetTransformInfoWrapperPass', but the
      corresponding call to INITIALIZE_PASS_DEPENDENCY was missing.
      
      Differential Revision: https://reviews.llvm.org/D94916
      7113de30
    • Reapply [InstCombine] Replace one-use select operand based on condition · 21443381
      Nikita Popov authored
      Relative to the original change, this adds a check that the
      instruction on which we're replacing operands is safe to speculatively
      execute, because that's what we're effectively doing. We're executing
      the instruction with the replaced operand, which is fine if it's pure,
      but not fine if it can cause side effects or UB (aka is not speculatable).
      
      Additionally, we cannot (generally) replace operands in phi nodes,
      as these may refer to a different loop iteration. This is also covered
      by the speculation check.
      
      -----
      
      InstCombine already performs a fold where X == Y ? f(X) : Z is
      transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However,
      if f(X) only has one use, then we can always directly replace the
      use inside the instruction. To actually be profitable, limit it to
      the case where Y is a non-expr constant.
      
      This could be further extended to replace uses further up a one-use
      instruction chain, but for now this only looks one level up.
      
      Among other things, this also subsumes D94860.
      
      Differential Revision: https://reviews.llvm.org/D94862
      21443381
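The fold described in this commit can be sketched in C. This is only an illustration, not the InstCombine code: `f`, `select_before`, `select_after`, and the constant 7 are hypothetical names and values chosen for the example. Both arms are computed up front to mimic an IR `select`, which evaluates its operands unconditionally; that is why the replaced instruction must be safe to speculate.

```c
#include <assert.h>

/* Hypothetical pure operation standing in for f. */
int f(int x) { return x * 3 + 1; }

/* Before the fold: X == 7 ? f(X) : Z, with f(X) computed eagerly. */
int select_before(int x, int z) {
    int fx = f(x);
    return x == 7 ? fx : z;
}

/* After the fold: the operand X inside f is replaced by the constant 7. */
int select_after(int x, int z) {
    int fy = f(7);
    return x == 7 ? fy : z;
}

/* Checks that both forms agree on a small range of inputs. */
void check_select_fold(void) {
    for (int x = -100; x <= 100; ++x)
        for (int z = -3; z <= 3; ++z)
            assert(select_before(x, z) == select_after(x, z));
}
```

Because `f` here is pure, evaluating `f(7)` on every path is harmless. If `f` were something like a division whose replaced operand could trigger UB, the eager evaluation would introduce UB on paths where the condition is false, which is the case the speculation check is meant to exclude.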
    • [noalias.decl] Look through llvm.experimental.noalias.scope.decl · 121cac01
      Jeroen Dobbelaere authored
      Just like llvm.assume, there are a lot of cases where we can just ignore llvm.experimental.noalias.scope.decl.
      
      Reviewed By: nikic
      
      Differential Revision: https://reviews.llvm.org/D93042
      121cac01
    • Revert 5238e7b3 "[InstCombine] Replace one-use select operand based on condition" · 58bdfcfa
      Hans Wennborg authored
      This caused a miscompile in Chromium, see comments on the codereview for
      discussion and pointer to a reproducer.
      
      > InstCombine already performs a fold where X == Y ? f(X) : Z is
      > transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However,
      > if f(X) only has one use, then we can always directly replace the
      > use inside the instruction. To actually be profitable, limit it to
      > the case where Y is a non-expr constant.
      >
      > This could be further extended to replace uses further up a one-use
      > instruction chain, but for now this only looks one level up.
      >
      > Among other things, this also subsumes D94860.
      >
      > Differential Revision: https://reviews.llvm.org/D94862
      
      This also reverts the follow-up
      a003f265:
      
      > [llvm] Prevent infinite loop in InstCombine of select statements
      >
      > This fixes an issue where the RHS and LHS of the comparison operation
      > creating the predicate were swapped back and forth forever.
      >
      > Differential Revision: https://reviews.llvm.org/D94934
      58bdfcfa
    • [LoopRotate] Add PrepareForLTO stage, avoid rotating with inline cands. · 83daa497
      Florian Hahn authored
      D84108 exposed a bad interaction between inlining and loop-rotation
      during regular LTO, which is causing notable regressions in at least
      CINT2006/473.astar.
      
      The problem boils down to: we now rotate a loop just before the vectorizer
      which requires duplicating a function call in the preheader when compiling
      the individual files ('prepare for LTO'). But this then prevents further
      inlining of the function during LTO.
      
      This patch tries to resolve this issue by making LoopRotate more
      conservative with respect to rotating loops that have inline-able calls
      during the 'prepare for LTO' stage.
      
      I think this change intuitively improves the current situation in
      general. Loop-rotate tries hard to avoid creating headers that are 'too
      big'. At the moment, it assumes all inlining already happened and the
      cost of duplicating a call is equal to just doing the call. But with LTO,
      inlining also happens during full LTO and it is possible that a previously
      duplicated call is actually a huge function which gets inlined
      during LTO.
      
      From the perspective of LV, not much should change overall. Most loops
      calling user-provided functions won't get vectorized to start with
      (unless we can infer that the function does not touch memory and has no
      other side effects). If we do not inline the 'inline-able' call during
      the LTO stage, we merely delayed loop-rotation & vectorization. If we
      inline during LTO, chances should be very high that the inlined code is
      itself vectorizable or the user call was not vectorizable to start with.
      
      There could of course be scenarios where we inline a sufficiently large
      function with code not profitable to vectorize, which would have been
      vectorized earlier (by scalarizing the call). But even in that case,
      there probably is no big performance impact, because it should be mostly
      down to the cost-model to reject vectorization in that case. And then
      the version with scalarized calls should also not be beneficial. In a way,
      LV should have strictly more information after inlining and make more
      accurate decisions (barring cost-model issues).
      
      There is of course plenty of room for things to go wrong unexpectedly,
      so we need to keep a close eye on actual performance and address any
      follow-up issues.
      
      I took a look at the impact on statistics for
      MultiSource/SPEC2000/SPEC2006. There are a few benchmarks with fewer
      loops rotated, but no change to the number of loops vectorized.
      
      Reviewed By: sanwou01
      
      Differential Revision: https://reviews.llvm.org/D94232
      83daa497
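The call duplication mentioned in this commit can be illustrated with a small C sketch. It is a hand-written analogue of loop rotation, not output from the LoopRotate pass, and `keep_going`, `sum_while`, and `sum_rotated` are hypothetical names. In the rotated form the loop condition, and therefore the call it contains, appears twice: once in the guard before the loop and once in the latch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inline-able callee used as the loop condition. */
bool keep_going(int i, int n) { return i < n; }

/* Original loop: the call sits in the loop header only. */
int sum_while(int n) {
    int s = 0;
    int i = 0;
    while (keep_going(i, n)) {
        s += i;
        ++i;
    }
    return s;
}

/* Rotated loop: the call is duplicated into the guard and the latch. */
int sum_rotated(int n) {
    int s = 0;
    int i = 0;
    if (keep_going(i, n)) {
        do {
            s += i;
            ++i;
        } while (keep_going(i, n));
    }
    return s;
}

/* Checks that both loop shapes compute the same result. */
void check_rotation(void) {
    for (int n = 0; n <= 50; ++n)
        assert(sum_while(n) == sum_rotated(n));
}
```

If `keep_going` were a large function, the rotated form would carry two call sites instead of one, which is the situation the commit says can prevent later inlining during LTO.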
    • [llvm] Prevent infinite loop in InstCombine of select statements · a003f265
      Tres Popp authored
      This fixes an issue where the RHS and LHS of the comparison operation
      creating the predicate were swapped back and forth forever.
      
      Differential Revision: https://reviews.llvm.org/D94934
      a003f265
    • [NFC] Make remaining cost functions in LoopVectorize.cpp use InstructionCost · c3ce2627
      David Sherwood authored
      A previous patch has already changed getInstructionCost to return
      an InstructionCost type. This patch changes the other various
      getXXXCost functions to return an InstructionCost too. This is a
      non-functional change - I've added a few asserts that the costs
      are valid in places where we're selecting between vector call
      and intrinsic costs. However, since we don't yet return invalid
      costs from any of the TTI implementations these asserts should
      not fire.
      
      See this patch for the introduction of the type: https://reviews.llvm.org/D91174
      See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html
      
      Differential Revision: https://reviews.llvm.org/D94065
      c3ce2627
    • Address unused variable warning · 2d89ebd5
      Juneyoung Lee authored
      2d89ebd5
    • [InstCombine,InstSimplify] Optimize select followed by and/or/xor · 0441df94
      Juneyoung Lee authored
      This patch adds the fold `A & (A && B)` -> `A && B` (and similarly for or + logical or).
      
      Also, this patch adds `~(select C, (icmp pred X, Y), const)` -> `select C, (icmp pred' X, Y), ~const`.
      
      Alive2 proof:
      merge_and: https://alive2.llvm.org/ce/z/teMR97
      merge_or: https://alive2.llvm.org/ce/z/b4yZUp
      xor_and: https://alive2.llvm.org/ce/z/_-TXHi
      xor_or: https://alive2.llvm.org/ce/z/2uYx_a
      
      Reviewed By: nikic
      
      Differential Revision: https://reviews.llvm.org/D94861
      0441df94
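The boolean identities behind these folds can be checked exhaustively in C. This is a sketch only: `logical_and` and `logical_or` are hypothetical helpers that spell out the select form, and C cannot model poison, which is the reason the select form exists in IR. The linked Alive2 proofs remain the authoritative check.

```c
#include <assert.h>
#include <stdbool.h>

/* Select form of logical and: A && B is `select A, B, false`. */
bool logical_and(bool a, bool b) { return a ? b : false; }

/* Select form of logical or: A || B is `select A, true, B`. */
bool logical_or(bool a, bool b) { return a ? true : b; }

/* A & (A && B) == A && B, and A | (A || B) == A || B. */
void check_merge_folds(void) {
    for (int a = 0; a <= 1; ++a)
        for (int b = 0; b <= 1; ++b) {
            assert(((bool)a & logical_and(a, b)) == logical_and(a, b));
            assert(((bool)a | logical_or(a, b)) == logical_or(a, b));
        }
}

/* ~(select C, (X < Y), K) == select C, (X >= Y), ~K on 1-bit values. */
void check_not_of_select(void) {
    for (int c = 0; c <= 1; ++c)
        for (int k = 0; k <= 1; ++k)
            for (int x = -2; x <= 2; ++x)
                for (int y = -2; y <= 2; ++y) {
                    bool before = !(c ? (x < y) : (bool)k);
                    bool after = c ? (x >= y) : !k;
                    assert(before == after);
                }
}
```

The signed less-than predicate is just one choice for `pred`; the inverted predicate `pred'` in the commit message corresponds to `>=` here.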
    • [SimplifyCFG] Update SimplifyBranchOnICmpChain to recognize select form of and/or · 395c737d
      Juneyoung Lee authored
      This patch teaches SimplifyCFG::SimplifyBranchOnICmpChain to understand the select form of
      (x == C1 || x == C2 || ...) / (x != C1 && x != C2 && ...) and optimize them into a switch if possible.
      D93065 has more context about the transition, including links to the list of optimizations being updated.
      
      Differential Revision: https://reviews.llvm.org/D93943
      395c737d
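At the source level, the comparison chain and the switch it can become look roughly like the following C sketch. The function names and the constants 1, 4, and 9 are hypothetical; the actual transform operates on IR branches and selects rather than on C code.

```c
#include <assert.h>
#include <stdbool.h>

/* Chain form: x == C1 || x == C2 || x == C3. */
bool in_set_chain(int x) { return x == 1 || x == 4 || x == 9; }

/* Negated chain form: x != C1 && x != C2 && x != C3. */
bool not_in_set_chain(int x) { return x != 1 && x != 4 && x != 9; }

/* Switch form that the chain can be turned into. */
bool in_set_switch(int x) {
    switch (x) {
    case 1:
    case 4:
    case 9:
        return true;
    default:
        return false;
    }
}

/* Checks that the chain forms and the switch form agree. */
void check_chain_vs_switch(void) {
    for (int x = -20; x <= 20; ++x) {
        assert(in_set_chain(x) == in_set_switch(x));
        assert(not_in_set_chain(x) == !in_set_switch(x));
    }
}
```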
  3. Jan 18, 2021
  4. Jan 17, 2021
    • [InstCombine] Transform abs pattern using multiplication to abs intrinsic (PR45691) · ed396212
      Dávid Bolvanský authored
      ```
      unsigned r(int v)
      {
          return (1 | -(v < 0)) * v;
      }
      ```
      
      `r` is equivalent to `abs(v)`.
      
      ```
      define <4 x i8> @src(<4 x i8> %0) {
      %1:
        %2 = ashr <4 x i8> %0, { 31, undef, 31, 31 }
        %3 = or <4 x i8> %2, { 1, 1, 1, undef }
        %4 = mul nsw <4 x i8> %3, %0
        ret <4 x i8> %4
      }
      =>
      define <4 x i8> @tgt(<4 x i8> %0) {
      %1:
        %2 = icmp slt <4 x i8> %0, { 0, 0, 0, 0 }
        %3 = sub nsw <4 x i8> { 0, 0, 0, 0 }, %0
        %4 = select <4 x i1> %2, <4 x i8> %3, <4 x i8> %0
        ret <4 x i8> %4
      }
      Transformation seems to be correct!
      ```
      
      Reviewed By: nikic
      
      Differential Revision: https://reviews.llvm.org/D94874
      ed396212
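The claimed equivalence between `r` and `abs(v)` can be spot-checked in C. This is a sketch that samples a range of inputs rather than proving the identity; `check_abs_pattern` is a hypothetical helper. `INT_MIN` is deliberately skipped because both the multiplication in `r` and `abs(INT_MIN)` overflow, which is undefined behavior in C.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* The pattern from the commit message: multiply v by +1 or -1. */
unsigned r(int v)
{
    return (1 | -(v < 0)) * v;
}

/* Compares r against abs on sampled inputs, avoiding INT_MIN. */
void check_abs_pattern(void)
{
    for (int v = -1000; v <= 1000; ++v)
        assert(r(v) == (unsigned)abs(v));
    assert(r(INT_MAX) == (unsigned)INT_MAX);
    assert(r(INT_MIN + 1) == (unsigned)INT_MAX);
}
```

For negative `v`, `-(v < 0)` is -1, so `1 | -1` is -1 and the product negates `v`; for non-negative `v` the multiplier is 1.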
  5. Jan 16, 2021
    • [InstCombine] Replace one-use select operand based on condition · 5238e7b3
      Nikita Popov authored
      InstCombine already performs a fold where X == Y ? f(X) : Z is
      transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However,
      if f(X) only has one use, then we can always directly replace the
      use inside the instruction. To actually be profitable, limit it to
      the case where Y is a non-expr constant.
      
      This could be further extended to replace uses further up a one-use
      instruction chain, but for now this only looks one level up.
      
      Among other things, this also subsumes D94860.
      
      Differential Revision: https://reviews.llvm.org/D94862
      5238e7b3
    • [SimplifyCFG] markAliveBlocks(): catchswitch: preserve PostDomTree · 32fc3231
      Roman Lebedev authored
      When removing catchpads from a catchswitch, if that removes a successor,
      we need to record that in DomTreeUpdater.
      
      This fixes PostDomTree preservation failure in an existing test.
      This appears to be the single issue that I see in my current test coverage.
      32fc3231
    • [SLP] remove opcode field from reduction data class · 49b96cd9
      Sanjay Patel authored
      This is NFC-intended and another step towards supporting
      intrinsics as reduction candidates.
      
      The remaining bits of the OperationData class do not make
      much sense as-is, so I will try to improve that, but I'm
      trying to take minimal steps because it's still not clear
      how this was intended to work.
      49b96cd9
    • [SLP] fix typos; NFC · fcfcc3cc
      Sanjay Patel authored
      fcfcc3cc