  1. Jul 02, 2018
    • [PowerPC] Don't make it as pre-inc candidate if displacement isn't 4's... · 3b2aa2b4
      QingShan Zhang authored
      [PowerPC] Don't make it a pre-inc candidate if the displacement isn't a multiple of 4 for i64 pre-inc load/store
      
      For the case below, pre-increment preparation considers the bucket a good candidate for pre-increment, but the 64-bit integer load/store update (pre-inc) instructions on PowerPC require the displacement to be DS-form (a multiple of 4). Since that constraint cannot be satisfied, we have to apply fix-ups later. Because the original load/stores below are already well-formed, this makes things worse.
      
      unsigned long long result = 0;
      unsigned long long foo(char *p, unsigned long long n) {
        for (unsigned long long i = 0; i < n; i++) {
          unsigned long long x1 = *(unsigned long long *)(p - 50000 + i);
          unsigned long long x2 = *(unsigned long long *)(p - 61024 + i);
          unsigned long long x3 = *(unsigned long long *)(p - 62048 + i);
          unsigned long long x4 = *(unsigned long long *)(p - 64096 + i);
          result *= x1 * x2 * x3 * x4;
        }
        return result;
      }
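
      As a minimal sketch of that constraint (the helper name here is hypothetical, not the pass's actual code), a displacement is DS-form encodable only when its two low bits are clear:

      #include <stdbool.h>
      #include <stdint.h>

      /* DS-form displacement field: must be a multiple of 4 */
      static bool isDSFormDisplacement(int64_t disp) {
        return (disp & 3) == 0;
      }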
      
      Patch by jedilyn (Kewen Lin).
      
      Differential Revision: https://reviews.llvm.org/D48813 
      
      M    lib/Target/PowerPC/PPCLoopPreIncPrep.cpp
      A    test/CodeGen/PowerPC/preincprep-i64-check.ll
      
      llvm-svn: 336074
  2. Jun 27, 2018
    • [DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zeros · d052de85
      Sanjay Patel authored
      As noted in the D44909 review, the transform from (fptosi+sitofp) to ftrunc 
      can produce -0.0 where the original code does not:
      
      #include <stdio.h>
        
      int main(int argc, char **argv) {
        float x;
        x = -0.8 * argc;
        printf("%f\n", (float)((int)x));
        return 0;
      }
      
      $ clang -O0 -mavx fp.c ; ./a.out 
      0.000000
      $ clang -O1 -mavx fp.c ; ./a.out 
      -0.000000
      
      Ideally, we'd use IR/node flags to predicate the transform, but the IR parser 
      doesn't currently allow fast-math-flags on the cast instructions. So for now, 
      just use the function attribute that corresponds to clang's "-fno-signed-zeros" 
      option.
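
      For reference, a minimal C comparison of the two behaviors (illustrative only; it assumes ftrunc follows truncf semantics):

      #include <math.h>
      #include <stdio.h>

      int main(void) {
        float x = -0.8f;
        printf("%f\n", (float)(int)x); /* cast round-trip: 0.000000   */
        printf("%f\n", truncf(x));     /* trunc keeps sign: -0.000000 */
        return 0;
      }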
      
      Differential Revision: https://reviews.llvm.org/D48085
      
      llvm-svn: 335761
  3. Jun 24, 2018
    • [DAGCombiner] eliminate setcc bool math when input is low-bit of some value · 962ee178
      Sanjay Patel authored
      This patch has the same motivating example as D48466:
      define void @foo(i64 %x, i32 %c.0282.in, i32 %d.0280, i32* %ptr0, i32* %ptr1) {
          %c.0282 = and i32 %c.0282.in, 268435455
          %a16 = lshr i64 32508, %x
          %a17 = and i64 %a16, 1
          %tobool = icmp eq i64 %a17, 0
          %. = select i1 %tobool, i32 1, i32 2
          %.286 = select i1 %tobool, i32 27, i32 26
          %shr97 = lshr i32 %c.0282, %.
          %shl98 = shl i32 %c.0282.in, %.286
          %or99 = or i32 %shr97, %shl98
          %shr100 = lshr i32 %d.0280, %.
          %shl101 = shl i32 %d.0280, %.286
          %or102 = or i32 %shr100, %shl101
          store i32 %or99, i32* %ptr0
          store i32 %or102, i32* %ptr1
          ret void
      }
      
      ...but I'm trying to kill the setcc bool math sooner rather than later.
      
      By matching a larger pattern that includes both the low-bit mask and the trailing add/sub, 
      we can create a universally good fold because we always eliminate the condition code 
      intermediate value.
      
      Here are Alive proofs for these (currently instcombine folds the 'add' variants, but 
      misses the 'sub' patterns):
      https://rise4fun.com/Alive/Gsyp
      
      Name: sub of zext cmp mask
        %a = and i8 %x, 1
        %c = icmp eq i8 %a, 0
        %z = zext i1 %c to i32
        %r = sub i32 C1, %z
        =>
        %optional_cast = zext i8 %a to i32
        %r = add i32 %optional_cast, C1-1
      
      Name: add of zext cmp mask
        %a = and i32 %x, 1
        %c = icmp eq i32 %a, 0
        %z = zext i1 %c to i8
        %r = add i8 %z, C1
        =>
        %optional_cast = trunc i32 %a to i8
        %r = sub i8 C1+1, %optional_cast
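
      These identities can also be sanity-checked exhaustively in C (an illustrative translation with an arbitrary C1; the integer widths differ from the mixed i8/i32 types in the proofs):

      #include <assert.h>
      #include <stdint.h>

      int main(void) {
        const int32_t C1 = 42; /* arbitrary constant */
        for (unsigned x = 0; x < 256; x++) {
          int32_t a = x & 1;
          int32_t z = (a == 0);           /* zext(icmp eq a, 0)   */
          assert(C1 - z == a + (C1 - 1)); /* sub of zext cmp mask */
          assert(z + C1 == (C1 + 1) - a); /* add of zext cmp mask */
        }
        return 0;
      }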
      
      All of the tests look like improvements or neutral to me. But it is possible that x86 
      test+set+bitop is better than what we now show here. I suspect we could do better by 
      adding another fold for the 'sub' variants.
      
      We start with select-of-constant in IR in the larger motivating test, so that's why I 
      included tests with selects. Proofs for those variants:
      https://rise4fun.com/Alive/Bx1
      
      Name: true const is bigger
      Pre: C2 == (C1 + 1)
        %a = and i8 %x, 1
        %c = icmp eq i8 %a, 0
        %r = select i1 %c, i64 C2, i64 C1
        =>
        %z = zext i8 %a to i64
        %r = sub i64 C2, %z
      
      Name: false const is bigger
      Pre: C2 == (C1 + 1)
        %a = and i8 %x, 1
        %c = icmp eq i8 %a, 0
        %r = select i1 %c, i64 C1, i64 C2
        =>
        %z = zext i8 %a to i64
        %r = add i64 C1, %z
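
      The select variants can be checked the same way (illustrative C, with arbitrary constants satisfying the precondition):

      #include <assert.h>
      #include <stdint.h>

      int main(void) {
        const int64_t C1 = 7, C2 = C1 + 1; /* Pre: C2 == C1 + 1 */
        for (unsigned x = 0; x < 256; x++) {
          int64_t a = x & 1;
          assert(((a == 0) ? C2 : C1) == C2 - a); /* true const is bigger  */
          assert(((a == 0) ? C1 : C2) == C1 + a); /* false const is bigger */
        }
        return 0;
      }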
      
      Differential Revision: https://reviews.llvm.org/D48466
      
      llvm-svn: 335433
  4. Jun 21, 2018
    • [DebugInfo] Make sure all DBG_VALUEs' reguse operands have IsDebug property · 42f7bc96
      Mikael Holmen authored
      Summary:
      In some cases, these operands lacked the IsDebug property, which is meant to signal that
      they should not affect codegen. This patch adds a check for this property in the
      MachineVerifier and adds it where it was missing.
      
      This includes refactorings to use MachineInstrBuilder construction functions instead of
      manually setting up the intrinsic everywhere.
      
      Patch by: JesperAntonsson
      
      Reviewers: aprantl, rnk, echristo, javed.absar
      
      Reviewed By: aprantl
      
      Subscribers: qcolombet, sdardis, nemanjai, JDevlieghere, atanasyan, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D48319
      
      llvm-svn: 335214
    • Generalize MergeBlockIntoPredecessor. Replace uses of MergeBasicBlockIntoOnlyPred. · dfd14ade
      Alina Sbirlea authored
      Summary:
      Two utils methods have essentially the same functionality. This is an attempt to merge them into one.
      1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred
      2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor
      
      Prior to the patch:
      1. MergeBasicBlockIntoOnlyPred
      Updates either DomTree or DeferredDominance
      Moves all instructions from Pred to BB, deletes Pred
      Asserts that BB has a single predecessor
      If address was taken, replace the block address with constant 1 (?)
      
      2. MergeBlockIntoPredecessor
      Updates DomTree, LoopInfo and MemoryDependenceResults
      Moves all instructions from BB to Pred, deletes BB
      Returns early if BB doesn't have a single predecessor
      Returns early if BB's address was taken
      
      After the patch:
      Method 2, MergeBlockIntoPredecessor, is intended to become the new default:
      Updates DomTree or DeferredDominance, and LoopInfo and MemoryDependenceResults
      Moves all instructions from BB to Pred, deletes BB
      Returns early if BB doesn't have a single predecessor
      Returns early if BB's address was taken
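
      A toy C sketch of the surviving-block difference (illustrative data structures, not LLVM's API): merging into the predecessor keeps Pred alive and deletes BB, which is why the entry block needs no update when Pred is the entry.

      #include <stdlib.h>

      struct Inst { struct Inst *next; };
      struct Block {
        struct Inst *head, *tail;
      };

      /* Move all instructions from bb to pred, then delete bb. */
      static void merge_into_predecessor(struct Block *pred, struct Block *bb) {
        if (pred->tail)
          pred->tail->next = bb->head;
        else
          pred->head = bb->head;
        if (bb->tail)
          pred->tail = bb->tail;
        free(bb);
      }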
      
      Uses of MergeBasicBlockIntoOnlyPred that need to be replaced:
      
      1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp
      Updated in this patch. No challenges.
      
      2. lib/CodeGen/CodeGenPrepare.cpp
      Updated in this patch.
        i. eliminateFallThrough is straightforward, but I added a temporary array to avoid iterator invalidation.
        ii. The eliminateMostlyEmptyBlock(s) methods also now use a temporary array for blocks.
      Some interesting aspects:
        - Since Pred is not deleted (BB is), the entry block does not need updating.
        - The entry block was being updated with the deleted block in eliminateMostlyEmptyBlock. Added an assert to make it obvious that BB == SinglePred.
        - isMergingEmptyBlockProfitable assumes BB is the one to be deleted.
        - eliminateMostlyEmptyBlock(BB) does not delete BB on one path, it deletes its unique predecessor instead.
        - adding some test owners as subscribers for the interesting tests modified:
          test/CodeGen/X86/avx-cmp.ll
          test/CodeGen/AMDGPU/nested-loop-conditions.ll
          test/CodeGen/AMDGPU/si-annotate-cf.ll
          test/CodeGen/X86/hoist-spill.ll
          test/CodeGen/X86/2006-11-17-IllegalMove.ll
      
      3. lib/Transforms/Scalar/JumpThreading.cpp
      Not covered in this patch. It is the only use case using the DeferredDominance.
      I would defer to Brian Rzycki to make this replacement.
      
      Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar
      
      Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr, kbarton, RKSimon, wmi, arsenm, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D48202
      
      llvm-svn: 335183
  5. Jun 07, 2018
    • [PowerPC] avoid unprofitable Repl32 flag in BitPermutationSelector · 01ef4c2c
      Hiroshi Inoue authored
      BitPermutationSelector sets the Repl32 flag for bit groups which can (potentially) benefit from 32-bit rotate-and-mask instructions with bit replication, i.e. rlwinm/rlwimi copies the lower 32 bits into the upper 32 bits on 64-bit PowerPC before rotation.
      However, enforcing a 32-bit instruction sometimes results in redundant generated code.
      For example, the following simple code is compiled into rotldi + rlwimi, while it could be compiled into a single rldimi instruction if the Repl32 flag were not set on the bit group for (a & 0xFFFFFFFF).
      
      uint64_t func(uint64_t a, uint64_t b) {
      	return (a & 0xFFFFFFFF) | (b << 32) ;
      }
      
      To avoid this problem, this patch checks the potential benefit of the Repl32 flag before setting it. If a bit group does not require rotation (i.e. RLAmt == 0) and won't be merged into another group, we do not benefit from the Repl32 flag on this group.
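
      For intuition, a C model of what the single rldimi achieves in the example above (a sketch of the semantics, not an ISA-exact encoding):

      #include <stdint.h>

      /* Rotate b left by 32, then insert it into a under a mask:
         equivalent to (a & 0xFFFFFFFF) | (b << 32). */
      uint64_t rldimi_like(uint64_t a, uint64_t b) {
        uint64_t rot = (b << 32) | (b >> 32);
        uint64_t mask = 0xFFFFFFFF00000000ull;
        return (rot & mask) | (a & ~mask);
      }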
      
      Differential Revision: https://reviews.llvm.org/D47867
      
      llvm-svn: 334195
  6. Jun 06, 2018
    • guard fsqrt with fmf sub flags · cc1c4b69
      Michael Berg authored
      Summary:
      This change uses fmf subflags, as well as the global unsafe flag, to guard optimizations. These changes originated from D46483.
      This patch covers only the fsqrt case.
      
      
      Reviewers: spatel, hfinkel, arsenm
      
      Reviewed By: spatel
      
      Subscribers: hfinkel, wdng, andrew.w.kaylor, wristow, efriedma, nemanjai
      
      Differential Revision: https://reviews.llvm.org/D47749
      
      llvm-svn: 334113
  7. Jun 05, 2018
    • guard fneg with fmf sub flags · 96925fe0
      Michael Berg authored
      Summary: This change uses fmf subflags, as well as the global unsafe flag, to guard optimizations. These changes originated from D46483.
      
      Reviewers: spatel, hfinkel
      
      Reviewed By: spatel
      
      Subscribers: nemanjai
      
      Differential Revision: https://reviews.llvm.org/D47389
      
      llvm-svn: 334037
    • NFC: adding baseline fneg case for fmf · 8f6d6c81
      Michael Berg authored
      llvm-svn: 334035
    • [PowerPC] reduce rotate in BitPermutationSelector · 955655f5
      Hiroshi Inoue authored
      BitPermutationSelector builds the output value with repeated rotate-and-mask instructions on the input registers.
      Here, we may avoid one rotate instruction if we start building from an input register that does not require rotation.
      
      For example, in the test case bitfieldinsert.ll, the selector first rotates r4 left by 8 bits and then inserts some bits from r5 without rotation.
      This can be done by a single rlwimi instruction, which rotates r4 by 8 bits and inserts its bits into r5.
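
      A C sketch of the pattern shape (the mask and shift are illustrative; the exact IR in bitfieldinsert.ll may differ):

      #include <stdint.h>

      /* Rotate a left by 8, then insert it into b under a mask:
         the shape a single rlwimi can implement. */
      uint32_t insert_rotated(uint32_t a, uint32_t b) {
        uint32_t rot = (a << 8) | (a >> 24);
        uint32_t mask = 0x0000ff00u;
        return (rot & mask) | (b & ~mask);
      }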
      
      This patch adds a check on rotation amounts to the comparator used in sorting, so that an input requiring no rotation is processed first.
      
      Differential Revision: https://reviews.llvm.org/D47765
      
      llvm-svn: 334011
  8. May 24, 2018
    • [PowerPC] Remove the match pattern in the definition of LXSDX/STXSDX · f4ec6782
      Lei Huang authored
      The match pattern in the definition of LXSDX is xoaddr, so the pseudo
      instruction XFLOADf64 never gets selected. XFLOADf64 expands to LXSDX/LFDX
      post-RA based on register pressure. To avoid the ambiguity, we need to remove
      the select pattern for LXSDX, as was done for LXSD. STXSDX has the same issue.
      
      Patch by Qing Shan Zhang (steven.zhang).
      
      Differential Revision: https://reviews.llvm.org/D47178
      
      llvm-svn: 333150
  9. May 17, 2018
    • [PowerPC] preserve test intent by removing undef · 354842ab
      Sanjay Patel authored
      We need to clean up the DAG floating-point undef logic.
      This process is similar to how we handled integer undef
      logic in D43141.
      
      And as we did there, I'm trying to reduce the patch by
      changing tests that would probably become meaningless
      once we correct FP undef folding.
      
      llvm-svn: 332549