  1. Oct 09, 2005
  2. Oct 07, 2005
  3. Oct 03, 2005
    • Chris Lattner's avatar
      Make IVUseShouldUsePostIncValue more aggressive when the use is a PHI. · f07a587c
      Chris Lattner authored
      In particular, it should realize that PHI nodes use their values in the pred
      block, not in the PHI block itself.  This change turns our em3d loop from this:
      
      _test:
              cmpwi cr0, r4, 0
              bgt cr0, LBB_test_2     ; entry.no_exit_crit_edge
      LBB_test_1:     ; entry.loopexit_crit_edge
              li r2, 0
              b LBB_test_6    ; loopexit
      LBB_test_2:     ; entry.no_exit_crit_edge
              li r6, 0
      LBB_test_3:     ; no_exit
              or r2, r6, r6
              lwz r6, 0(r3)
              cmpw cr0, r6, r5
              beq cr0, LBB_test_6     ; loopexit
      LBB_test_4:     ; endif
              addi r3, r3, 4
              addi r6, r2, 1
              cmpw cr0, r6, r4
              blt cr0, LBB_test_3     ; no_exit
      LBB_test_5:     ; endif.loopexit.loopexit_crit_edge
              addi r3, r2, 1
              blr
      LBB_test_6:     ; loopexit
              or r3, r2, r2
              blr
      
      into:
      
      _test:
              cmpwi cr0, r4, 0
              bgt cr0, LBB_test_2     ; entry.no_exit_crit_edge
      LBB_test_1:     ; entry.loopexit_crit_edge
              li r2, 0
              b LBB_test_5    ; loopexit
      LBB_test_2:     ; entry.no_exit_crit_edge
              li r6, 0
      LBB_test_3:     ; no_exit
              lwz r2, 0(r3)
              cmpw cr0, r2, r5
              or r2, r6, r6
              beq cr0, LBB_test_5     ; loopexit
      LBB_test_4:     ; endif
              addi r3, r3, 4
              addi r6, r6, 1
              cmpw cr0, r6, r4
              or r2, r6, r6
              blt cr0, LBB_test_3     ; no_exit
      LBB_test_5:     ; loopexit
              or r3, r2, r2
              blr
      
      
      Unfortunately, this is actually worse code, because the register coalescer
      is getting confused somehow.  If it were doing its job right, it could turn the
      code into this:
      
      _test:
              cmpwi cr0, r4, 0
              bgt cr0, LBB_test_2     ; entry.no_exit_crit_edge
      LBB_test_1:     ; entry.loopexit_crit_edge
              li r6, 0
              b LBB_test_5    ; loopexit
      LBB_test_2:     ; entry.no_exit_crit_edge
              li r6, 0
      LBB_test_3:     ; no_exit
              lwz r2, 0(r3)
              cmpw cr0, r2, r5
              beq cr0, LBB_test_5     ; loopexit
      LBB_test_4:     ; endif
              addi r3, r3, 4
              addi r6, r6, 1
              cmpw cr0, r6, r4
              blt cr0, LBB_test_3     ; no_exit
      LBB_test_5:     ; loopexit
              or r3, r6, r6
              blr
      
      ... which I'll work on next. :)
      
      llvm-svn: 23604
      f07a587c
    • Chris Lattner's avatar
      Refactor some code into a function · e4ed42a4
      Chris Lattner authored
      llvm-svn: 23603
      e4ed42a4
    • Chris Lattner's avatar
      This break is bogus and I have no idea why it was there. · 360928db
      Chris Lattner authored
      Basically, it prevents memoizing code when IV's are used by PHI nodes outside
      of loops.  In a simple example, we were getting this code before (note that
      r6 and r7 are isomorphic IV's):
      
              li r6, 0
              or r7, r6, r6
      LBB_test_3:     ; no_exit
              lwz r2, 0(r3)
              cmpw cr0, r2, r5
              or r2, r7, r7
              beq cr0, LBB_test_5     ; loopexit
      LBB_test_4:     ; endif
              addi r2, r7, 1
              addi r7, r7, 1
              addi r3, r3, 4
              addi r6, r6, 1
              cmpw cr0, r6, r4
              blt cr0, LBB_test_3     ; no_exit
      
      Now we get:
      
              li r6, 0
      LBB_test_3:     ; no_exit
              or r2, r6, r6
              lwz r6, 0(r3)
              cmpw cr0, r6, r5
              beq cr0, LBB_test_6     ; loopexit
      LBB_test_4:     ; endif
              addi r3, r3, 4
              addi r6, r2, 1
              cmpw cr0, r6, r4
              blt cr0, LBB_test_3     ; no_exit
      
      This was noticed in em3d.
      
      llvm-svn: 23602
      360928db
    • Chris Lattner's avatar
      When checking if we should move a split edge block outside of a loop, · 8fcce170
      Chris Lattner authored
      check the pre-split pred, not the post-split pred.  This was causing us
      to make the wrong decision in some cases, leaving the critical edge block
      in the loop.
      
      llvm-svn: 23601
      8fcce170
  4. Oct 01, 2005
  5. Sep 29, 2005
  6. Sep 28, 2005
  7. Sep 27, 2005
    • Chris Lattner's avatar
      Avoid spilling stack slots... to stack slots. · e285f5ed
      Chris Lattner authored
      llvm-svn: 23478
      e285f5ed
    • Chris Lattner's avatar
      Completely rewrite 'correct' EH support. · 87eb2493
      Chris Lattner authored
      This changes how setjmp insertion is performed so that it happens at most once
      per function that contains an invoke, instead of once per invoke in the function.
      This patch has the following perks:
      
      1. It fixes PR631, which complains about slowness.
      2. It fixes PR240, which complains about non-volatile vars being live across
         setjmp/longjmps.
      3. It improves (but does not fix) the jmpbuf alignment issue on Itanium by not
         forcing the jmpbufs to always be 8 bytes off the alignment of the structure.
      4. It speeds up 253.perlbmk from 338s to 13.70s (a 25x improvement!), making us
         now about 4% faster than GCC.
      
      Further improvements are also possible.
      
      llvm-svn: 23477
      87eb2493
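The once-per-function idea in commit 87eb2493 can be pictured in plain C. This is only an illustrative sketch under stated assumptions, not the code the pass emits: `current_handler`, `may_throw`, and `run_two_calls` are hypothetical names, and recording a call-site number before each call so that a single `setjmp` can serve every call is an assumption about the design. The counter is declared `volatile` because it changes between `setjmp` and `longjmp`, which is the kind of hazard perk 2 refers to.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical unwind target: the innermost active jump buffer. */
static jmp_buf *current_handler;

/* Hypothetical callee that "throws" by jumping to the active handler. */
static void may_throw(int fail) {
    if (fail)
        longjmp(*current_handler, 1);
}

/* One setjmp covers both calls; `site` records which call was in flight.
   Returns 0 if neither call unwound, else the number of the call that did. */
int run_two_calls(int fail_first, int fail_second) {
    jmp_buf buf;
    jmp_buf *saved = current_handler;   /* not modified after setjmp */
    volatile int site = 0;              /* modified after setjmp, so volatile */
    int result;

    current_handler = &buf;
    if (setjmp(buf) == 0) {
        site = 1;
        may_throw(fail_first);
        site = 2;
        may_throw(fail_second);
        result = 0;
    } else {
        result = site;                  /* dispatch on the recorded call site */
    }
    current_handler = saved;
    return result;
}

/* Small self-check of the three possible outcomes. */
void check_run_two_calls(void) {
    assert(run_two_calls(0, 0) == 0);
    assert(run_two_calls(1, 0) == 1);
    assert(run_two_calls(0, 1) == 2);
}
```

A per-call scheme would need one `setjmp` in front of each `may_throw` call; here the cost is paid once per function no matter how many calls it contains.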
    • Chris Lattner's avatar
      Make the pass name simpler · 92233d21
      Chris Lattner authored
      llvm-svn: 23476
      92233d21
  8. Sep 26, 2005
  9. Sep 25, 2005
  10. Sep 18, 2005
    • Chris Lattner's avatar
      Refactor this code a bit and make it more general. · b4b2530a
      Chris Lattner authored
      This now compiles:
      struct S { unsigned int i : 6, j : 11, k : 15; } b;
      void plus2 (unsigned int x) { b.j += x; }
      
      to:
      
      _plus2:
              lis r2, ha16(L_b$non_lazy_ptr)
              lwz r2, lo16(L_b$non_lazy_ptr)(r2)
              lwz r4, 0(r2)
              slwi r3, r3, 6
              add r3, r4, r3
              rlwimi r3, r4, 0, 26, 14
              stw r3, 0(r2)
              blr
      
      
      instead of:
      
      _plus2:
              lis r2, ha16(L_b$non_lazy_ptr)
              lwz r2, lo16(L_b$non_lazy_ptr)(r2)
              lwz r4, 0(r2)
              rlwinm r5, r4, 26, 21, 31
              add r3, r5, r3
              rlwimi r4, r3, 6, 15, 25
              stw r4, 0(r2)
              blr
      
      by eliminating an 'and'.
      
      I'm pretty sure this is as small as we can go :)
      
      llvm-svn: 23386
      b4b2530a
    • Chris Lattner's avatar
      Compile · 797dee77
      Chris Lattner authored
      struct S { unsigned int i : 6, j : 11, k : 15; } b;
      void plus2 (unsigned int x) {
        b.j += x;
      }
      
      to:
      
      plus2:
              mov %EAX, DWORD PTR [b]
              mov %ECX, %EAX
              and %ECX, 131008
              mov %EDX, DWORD PTR [%ESP + 4]
              shl %EDX, 6
              add %EDX, %ECX
              and %EDX, 131008
              and %EAX, -131009
              or %EDX, %EAX
              mov DWORD PTR [b], %EDX
              ret
      
      instead of:
      
      plus2:
              mov %EAX, DWORD PTR [b]
              mov %ECX, %EAX
              shr %ECX, 6
              and %ECX, 2047
              add %ECX, DWORD PTR [%ESP + 4]
              shl %ECX, 6
              and %ECX, 131008
              and %EAX, -131009
              or %ECX, %EAX
              mov DWORD PTR [b], %ECX
              ret
      
      llvm-svn: 23385
      797dee77
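The two x86 sequences in commit 797dee77 can be mirrored on a raw 32-bit word in C. This is a sketch, assuming the layout the generated code implies (field `j` occupies bits 6..16, mask 131008); the function names are hypothetical. It checks that adding the pre-shifted operand to the masked field gives the same word as shifting the field down, adding, and shifting back up.

```c
#include <assert.h>
#include <stdint.h>

#define J_MASK 131008u   /* 2047 << 6: bits 6..16, the assumed position of j */

/* Old sequence: extract j, add x, shift back into place, merge. */
uint32_t add_j_old(uint32_t word, uint32_t x) {
    uint32_t field = (word >> 6) & 2047u;
    field = ((field + x) << 6) & J_MASK;
    return field | (word & ~J_MASK);
}

/* New sequence: mask j in place, add x shifted into position, merge. */
uint32_t add_j_new(uint32_t word, uint32_t x) {
    uint32_t sum = ((x << 6) + (word & J_MASK)) & J_MASK;
    return sum | (word & ~J_MASK);
}

/* Compare both forms over a handful of sample words and addends. */
void check_add_j(void) {
    const uint32_t samples[] = { 0u, 1u, 63u, 64u, J_MASK, 0xFFFFFFFFu, 0x12345678u };
    const unsigned n = sizeof samples / sizeof samples[0];
    for (unsigned i = 0; i < n; i++)
        for (unsigned k = 0; k < n; k++)
            assert(add_j_old(samples[i], samples[k]) == add_j_new(samples[i], samples[k]));
}
```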
    • Chris Lattner's avatar
      Generalize this transform, using MaskedValueIsZero, allowing us to compile: · 01f56c68
      Chris Lattner authored
      struct S { unsigned int i : 6, j : 11, k : 15; } b;
      void plus3 (unsigned int x) { b.k += x; }
      
      to:
      
      plus3:
              mov %EAX, DWORD PTR [%ESP + 4]
              shl %EAX, 17
              add DWORD PTR [b], %EAX
              ret
      
      instead of:
      
      plus3:
              mov %EAX, DWORD PTR [%ESP + 4]
              shl %EAX, 17
              mov %ECX, DWORD PTR [b]
              add %EAX, %ECX
              and %EAX, -131072
              and %ECX, 131071
              or %ECX, %EAX
              mov DWORD PTR [b], %ECX
              ret
      
      llvm-svn: 23384
      01f56c68
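Why the masking in commit 01f56c68 can be dropped is easy to check in C. This is a sketch, assuming `k` sits in the top 15 bits of the word (bits 17..31), as the shift by 17 suggests; the function names are hypothetical. Because `x << 17` has zeros in its low 17 bits, the add cannot disturb or carry out of the lower fields, and any overflow of `k` falls off the top of the word.

```c
#include <assert.h>
#include <stdint.h>

/* Old sequence: add, then re-merge the untouched low 17 bits. */
uint32_t add_k_masked(uint32_t word, uint32_t x) {
    uint32_t sum = (x << 17) + word;
    uint32_t high = sum & 0xFFFE0000u;   /* and %EAX, -131072 */
    uint32_t low = word & 0x0001FFFFu;   /* and %ECX, 131071 */
    return low | high;
}

/* New sequence: a single add on the whole word. */
uint32_t add_k_direct(uint32_t word, uint32_t x) {
    return word + (x << 17);
}

/* Compare both forms over a handful of sample words and addends. */
void check_add_k(void) {
    const uint32_t samples[] = { 0u, 1u, 0x1FFFFu, 0x20000u, 0xFFFE0000u, 0xFFFFFFFFu };
    const unsigned n = sizeof samples / sizeof samples[0];
    for (unsigned i = 0; i < n; i++)
        for (unsigned k = 0; k < n; k++)
            assert(add_k_masked(samples[i], samples[k]) == add_k_direct(samples[i], samples[k]));
}
```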
    • Chris Lattner's avatar
      Fix typo · 4ebc8ab4
      Chris Lattner authored
      llvm-svn: 23383
      4ebc8ab4
    • Chris Lattner's avatar
      Remove unintentionally committed code · e5b23a6d
      Chris Lattner authored
      llvm-svn: 23382
      e5b23a6d
    • Chris Lattner's avatar
      implement shift.ll:test25. This compiles: · 27cb9dbd
      Chris Lattner authored
      struct S { unsigned int i : 6, j : 11, k : 15; } b;
      void plus3 (unsigned int x) {
        b.k += x;
      }
      
      to:
      
      _plus3:
              lis r2, ha16(L_b$non_lazy_ptr)
              lwz r2, lo16(L_b$non_lazy_ptr)(r2)
              lwz r3, 0(r2)
              rlwinm r4, r3, 0, 0, 14
              add r4, r4, r3
              rlwimi r4, r3, 0, 15, 31
              stw r4, 0(r2)
              blr
      
      instead of:
      
      _plus3:
              lis r2, ha16(L_b$non_lazy_ptr)
              lwz r2, lo16(L_b$non_lazy_ptr)(r2)
              lwz r4, 0(r2)
              srwi r5, r4, 17
              add r3, r5, r3
              slwi r3, r3, 17
              rlwimi r3, r4, 0, 15, 31
              stw r3, 0(r2)
              blr
      
      llvm-svn: 23381
      27cb9dbd
    • Chris Lattner's avatar
      Implement add.ll:test29. Codegening: · af517574
      Chris Lattner authored
      struct S { unsigned int i : 6, j : 11, k : 15; } b;
      void plus1 (unsigned int x) {
        b.i += x;
      }
      
      as:
      _plus1:
              lis r2, ha16(L_b$non_lazy_ptr)
              lwz r2, lo16(L_b$non_lazy_ptr)(r2)
              lwz r4, 0(r2)
              add r3, r4, r3
              rlwimi r3, r4, 0, 0, 25
              stw r3, 0(r2)
              blr
      
      instead of:
      
      _plus1:
              lis r2, ha16(L_b$non_lazy_ptr)
              lwz r2, lo16(L_b$non_lazy_ptr)(r2)
              lwz r4, 0(r2)
              rlwinm r5, r4, 0, 26, 31
              add r3, r5, r3
              rlwimi r3, r4, 0, 0, 25
              stw r3, 0(r2)
              blr
      
      llvm-svn: 23379
      af517574
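The instruction removed in commit af517574 is the mask applied to the field before the add. A C sketch of why it is redundant, assuming `i` occupies the low 6 bits of the word as the generated code implies (function names are hypothetical): the low 6 bits of a sum depend only on the low 6 bits of its operands, and everything above them is overwritten by the final insert anyway.

```c
#include <assert.h>
#include <stdint.h>

#define I_MASK 63u   /* low 6 bits: the assumed position of field i */

/* Old sequence: mask the field out first, add, then insert the old high bits. */
uint32_t add_i_old(uint32_t word, uint32_t x) {
    uint32_t sum = (word & I_MASK) + x;
    return (sum & I_MASK) | (word & ~I_MASK);
}

/* New sequence: add to the whole word, then insert the old high bits. */
uint32_t add_i_new(uint32_t word, uint32_t x) {
    uint32_t sum = word + x;
    return (sum & I_MASK) | (word & ~I_MASK);
}

/* Compare both forms over a handful of sample words and addends. */
void check_add_i(void) {
    const uint32_t samples[] = { 0u, 1u, 62u, 63u, 64u, 0xFFFFFFC0u, 0xFFFFFFFFu };
    const unsigned n = sizeof samples / sizeof samples[0];
    for (unsigned a = 0; a < n; a++)
        for (unsigned b = 0; b < n; b++)
            assert(add_i_old(samples[a], samples[b]) == add_i_new(samples[a], samples[b]));
}
```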
    • Chris Lattner's avatar
      remove debug output · 027eaf01
      Chris Lattner authored
      llvm-svn: 23377
      027eaf01
    • Chris Lattner's avatar
      Implement or.ll:test21. This teaches instcombine to be able to turn this: · 15212989
      Chris Lattner authored
      struct {
         unsigned int bit0:1;
         unsigned int ubyte:31;
      } sdata;
      
      void foo() {
        sdata.ubyte++;
      }
      
      into this:
      
      foo:
              add DWORD PTR [sdata], 2
              ret
      
      instead of this:
      
      foo:
              mov %EAX, DWORD PTR [sdata]
              mov %ECX, %EAX
              add %ECX, 2
              and %ECX, -2
              and %EAX, 1
              or %EAX, %ECX
              mov DWORD PTR [sdata], %EAX
              ret
      
      llvm-svn: 23376
      15212989
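The or.ll:test21 result can be sanity-checked in C. This is a sketch, assuming `bit0` is stored in bit 0 and `ubyte` in bits 1..31, which is what the `add ..., 2` in the output implies; the function names are hypothetical. Incrementing `ubyte` adds 2 to the raw word, and adding 2 never changes bit 0, so the mask-and-merge steps have no effect.

```c
#include <assert.h>
#include <stdint.h>

/* Old sequence: add, clear bit 0 of the sum, then OR the original bit 0 back in. */
uint32_t inc_field_masked(uint32_t word) {
    uint32_t bumped = (word + 2u) & ~1u;   /* and %ECX, -2 */
    uint32_t kept = word & 1u;             /* and %EAX, 1 */
    return kept | bumped;
}

/* New sequence: a single add on the whole word. */
uint32_t inc_field_direct(uint32_t word) {
    return word + 2u;
}

/* Compare both forms, including the wrap-around cases at the top of the range. */
void check_inc_field(void) {
    const uint32_t samples[] = { 0u, 1u, 2u, 3u, 0x7FFFFFFFu, 0xFFFFFFFEu, 0xFFFFFFFFu };
    const unsigned n = sizeof samples / sizeof samples[0];
    for (unsigned i = 0; i < n; i++)
        assert(inc_field_masked(samples[i]) == inc_field_direct(samples[i]));
}
```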
  11. Sep 14, 2005
  12. Sep 13, 2005
  13. Sep 12, 2005
    • Chris Lattner's avatar
      Fix a regression from last night, which caused this pass to create invalid · 8048b85e
      Chris Lattner authored
      code for IV uses outside of loops that are not dominated by the latch block.
      We should only convert these uses to use the post-inc value if they ARE
      dominated by the latch block.
      
      Also use a new LoopInfo method to simplify some code.
      
      This fixes Transforms/LoopStrengthReduce/2005-09-12-UsesOutOutsideOfLoop.ll
      
      llvm-svn: 23318
      8048b85e
    • Chris Lattner's avatar
      _test: · a6764839
      Chris Lattner authored
              li r2, 0
      LBB_test_1:     ; no_exit.2
              li r5, 0
              stw r5, 0(r3)
              addi r2, r2, 1
              addi r3, r3, 4
              cmpwi cr0, r2, 701
              blt cr0, LBB_test_1     ; no_exit.2
      LBB_test_2:     ; loopexit.2.loopexit
              addi r2, r2, 1
              stw r2, 0(r4)
              blr
      
      Uses of IV's outside of the loop should use the post-incremented version
      of the IV, not the pre-incremented version.  This helps many loops (e.g. in sixtrack)
      which used to generate code like this (this is the code from the
      dont-hoist-simple-loop-constants.ll testcase):
      
      _test:
              li r2, 0                 **** IV starts at 0
      LBB_test_1:     ; no_exit.2
              or r5, r2, r2            **** Copy for loop exit
              li r2, 0
              stw r2, 0(r3)
              addi r3, r3, 4
              addi r2, r5, 1
              addi r6, r5, 2           **** IV+2
              cmpwi cr0, r6, 701
              blt cr0, LBB_test_1     ; no_exit.2
      LBB_test_2:     ; loopexit.2.loopexit
              addi r2, r5, 2       ****  IV+2
              stw r2, 0(r4)
              blr
      
      We now generate code like this:
      
      _test:
              li r2, 1               *** IV starts at 1
      LBB_test_1:     ; no_exit.2
              li r5, 0
              stw r5, 0(r3)
              addi r2, r2, 1
              addi r3, r3, 4
              cmpwi cr0, r2, 701     *** IV.postinc + 0
              blt cr0, LBB_test_1
      LBB_test_2:     ; loopexit.2.loopexit
              stw r2, 0(r4)          *** IV.postinc + 0
              blr
      
      llvm-svn: 23313
      a6764839
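The before/after loops in commit a6764839 can be paraphrased in C. This is a hypothetical reconstruction of the loop shape from the assembly, not the testcase source, and the function names are made up. The first form keeps a copy of the pre-incremented IV alive for the exit value; the second starts the IV one higher and uses the post-incremented value both for the exit test and after the loop.

```c
#include <assert.h>

/* Pre-increment form: `old` is the extra copy kept for the loop exit. */
int fill_preinc(int *p) {
    int iv = 0, old;
    do {
        old = iv;          /* copy for loop exit */
        *p++ = 0;
        iv = old + 1;
    } while (old + 2 < 701);   /* compare IV+2 */
    return old + 2;            /* IV+2 recomputed after the loop */
}

/* Post-increment form: the exit test and the use after the loop share one value. */
int fill_postinc(int *p) {
    int iv = 1;                /* IV starts at 1 */
    do {
        *p++ = 0;
        iv = iv + 1;
    } while (iv < 701);        /* IV.postinc + 0 */
    return iv;                 /* IV.postinc + 0 */
}

/* Both forms should clear the same 700 elements and return the same exit value. */
void check_fill(void) {
    int a[701], b[701];
    for (int i = 0; i < 701; i++) { a[i] = 1; b[i] = 1; }
    assert(fill_preinc(a) == fill_postinc(b));
    for (int i = 0; i < 701; i++)
        assert(a[i] == b[i]);
}
```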
  14. Sep 10, 2005
    • Chris Lattner's avatar
      implement Transforms/LoopStrengthReduce/dont-hoist-simple-loop-constants.ll. · 530fe6ab
      Chris Lattner authored
      We used to emit this code for it:
      
      _test:
              li r2, 1     ;; Value tying up a register for the whole loop
              li r5, 0
      LBB_test_1:     ; no_exit.2
              or r6, r5, r5
              li r5, 0
              stw r5, 0(r3)
              addi r5, r6, 1
              addi r3, r3, 4
              add r7, r2, r5  ;; should be addi r7, r5, 1
              cmpwi cr0, r7, 701
              blt cr0, LBB_test_1     ; no_exit.2
      LBB_test_2:     ; loopexit.2.loopexit
              addi r2, r6, 2
              stw r2, 0(r4)
              blr
      
      now we emit this:
      
      _test:
              li r2, 0
      LBB_test_1:     ; no_exit.2
              or r5, r2, r2
              li r2, 0
              stw r2, 0(r3)
              addi r3, r3, 4
              addi r2, r5, 1
              addi r6, r5, 2   ;; whoa, fold those adds!
              cmpwi cr0, r6, 701
              blt cr0, LBB_test_1     ; no_exit.2
      LBB_test_2:     ; loopexit.2.loopexit
              addi r2, r5, 2
              stw r2, 0(r4)
              blr
      
      More improvements coming.
      
      llvm-svn: 23306
      530fe6ab
  15. Sep 02, 2005
  16. Aug 24, 2005
  17. Aug 17, 2005