  1. Aug 16, 2005
  2. Aug 15, 2005
  3. Aug 14, 2005
    • Nate Begeman's avatar
      Fix last night's PPC32 regressions by · d5e739dc
      Nate Begeman authored
      1. Not selecting the false value of a select_cc in the false arm, which
         isn't legal for nested selects.
      2. Actually returning the node we created and Legalized in the FP_TO_UINT
         Expander.
      
      llvm-svn: 22789
      d5e739dc
    • Nate Begeman's avatar
      Fix last night's X86 regressions by putting code for SSE in the if(SSE) · e5394d45
      Nate Begeman authored
      block.  nur.
      
      llvm-svn: 22788
      e5394d45
    • Andrew Lenharth's avatar
      only build .a on alpha · ed072338
      Andrew Lenharth authored
      llvm-svn: 22787
      ed072338
    • Nate Begeman's avatar
      Fix FP_TO_UINT with Scalar SSE2 now that the legalizer can handle it. We · 4d959f66
      Nate Begeman authored
      now generate the relatively good code sequences:
      unsigned short foo(float a) { return a; }
      _foo:
              movss 4(%esp), %xmm0
              cvttss2si %xmm0, %eax
              movzwl %ax, %eax
              ret
      
      and
      unsigned bar(float a) { return a; }
      _bar:
              movss .CPI_bar_0, %xmm0
              movss 4(%esp), %xmm1
              movapd %xmm1, %xmm2
              subss %xmm0, %xmm2
              cvttss2si %xmm2, %eax
              xorl $-2147483648, %eax
              cvttss2si %xmm1, %ecx
              ucomiss %xmm0, %xmm1
              cmovb %ecx, %eax
              ret
      
      llvm-svn: 22786
      4d959f66
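The `_bar` sequence in the commit above can be read as the following C sketch. It is only an illustration, not LLVM code: the helper name `fp_to_uint_via_sint` is hypothetical, the constant at `.CPI_bar_0` is assumed to be 2^31, and the input is assumed to lie in the range [0, 2^32).

```c
#include <stdint.h>

/* Hypothetical helper (not part of LLVM): convert float to uint32_t using
   only float -> int32_t conversions, mirroring the _bar sequence.
   Assumes 0 <= a < 2^32; other inputs are undefined in C. */
uint32_t fp_to_uint_via_sint(float a) {
    /* Assumed value of the constant-pool entry .CPI_bar_0. */
    const float two31 = 2147483648.0f;

    if (a < two31) {
        /* Value fits in a signed 32-bit integer: convert directly
           (the result that cmovb selects). */
        return (uint32_t)(int32_t)a;
    }
    /* Otherwise bias the value into signed range (subss), convert
       (cvttss2si), and restore the top bit (xorl $-2147483648). */
    return (uint32_t)(int32_t)(a - two31) ^ 0x80000000u;
}
```

The assembly computes both conversions unconditionally and picks one with `cmovb`; the branch here is only for readability.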
    • Nate Begeman's avatar
      Teach the legalizer how to legalize FP_TO_UINT. · 36853ee1
      Nate Begeman authored
      Teach the legalizer to promote FP_TO_UINT to FP_TO_SINT if the wider
        FP_TO_UINT is also illegal.  This allows us on PPC to codegen
        unsigned short foo(float a) { return a; }
      
      as:
      _foo:
      .LBB_foo_0:     ; entry
              fctiwz f0, f1
              stfd f0, -8(r1)
              lwz r2, -4(r1)
              rlwinm r3, r2, 0, 16, 31
              blr
      
      instead of:
      _foo:
      .LBB_foo_0:     ; entry
              fctiwz f0, f1
              stfd f0, -8(r1)
              lwz r2, -4(r1)
              lis r3, ha16(.CPI_foo_0)
              lfs f0, lo16(.CPI_foo_0)(r3)
              fcmpu cr0, f1, f0
              blt .LBB_foo_2  ; entry
      .LBB_foo_1:     ; entry
              fsubs f0, f1, f0
              fctiwz f0, f0
              stfd f0, -16(r1)
              lwz r2, -12(r1)
              xoris r2, r2, 32768
      .LBB_foo_2:     ; entry
              rlwinm r3, r2, 0, 16, 31
              blr
      
      llvm-svn: 22785
      36853ee1
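A rough C rendering of the promotion described in the commit above, assuming a 16-bit result type: every `unsigned short` value fits in a signed 32-bit integer, so a signed conversion followed by a mask is enough. The helper name `fp_to_u16_via_sint` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper (not part of LLVM): float -> uint16_t through a
   signed 32-bit conversion. Assumes the truncated value of a lies in
   [0, 65535], so it always fits in int32_t. */
uint16_t fp_to_u16_via_sint(float a) {
    int32_t wide = (int32_t)a;          /* FP_TO_SINT at the wider type (fctiwz) */
    return (uint16_t)(wide & 0xFFFF);   /* keep the low 16 bits (rlwinm ..., 16, 31) */
}
```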
    • Nate Begeman's avatar
      Make FP_TO_UINT Illegal. This allows us to generate significantly better · 83f6b98c
      Nate Begeman authored
      codegen for FP_TO_UINT by using the legalizer's SELECT variant.
      
      Implement a codegen improvement for SELECT_CC, selecting the false node in
      the MBB that feeds the phi node.  This allows us to codegen:
      void foo(int *a, int b, int c) { int d = (a < b) ? 5 : 9; *a = d; }
      as:
      _foo:
              li r2, 5
              cmpw cr0, r4, r3
              bgt .LBB_foo_2  ; entry
      .LBB_foo_1:     ; entry
              li r2, 9
      .LBB_foo_2:     ; entry
              stw r2, 0(r3)
              blr
      
      instead of:
      _foo:
              li r2, 5
              li r5, 9
              cmpw cr0, r4, r3
              bgt .LBB_foo_2  ; entry
      .LBB_foo_1:     ; entry
              or r2, r5, r5
      .LBB_foo_2:     ; entry
              stw r2, 0(r3)
              blr
      
      llvm-svn: 22784
      83f6b98c
  4. Aug 13, 2005
    • Andrew Lenharth's avatar
      Testing a variable before it is defined doesn't work so well. It is a fairly... · 107a0a76
      Andrew Lenharth authored
      Testing a variable before it is defined doesn't work so well.  It is a fairly small thing, so just let everyone build the .a file
      
      llvm-svn: 22783
      107a0a76
    • Chris Lattner's avatar
      Ooops, don't forget to clear this. The real inner loop is now: · 47d3ec35
      Chris Lattner authored
      .LBB_foo_3:     ; no_exit.1
              lfd f2, 0(r9)
              lfd f3, 8(r9)
              fmul f4, f1, f2
              fmadd f4, f0, f3, f4
              stfd f4, 8(r9)
              fmul f3, f1, f3
              fmsub f2, f0, f2, f3
              stfd f2, 0(r9)
              addi r9, r9, 16
              addi r8, r8, 1
              cmpw cr0, r8, r4
              ble .LBB_foo_3  ; no_exit.1
      
      llvm-svn: 22782
      47d3ec35
    • Chris Lattner's avatar
      Recursively scan scev expressions for common subexpressions. This allows us · 5949d490
      Chris Lattner authored
      to handle nested loops much better, for example, by being able to tell that
      these two expressions:
      
      {( 8 + ( 16 * ( 1 +  %Tmp11 +  %Tmp12)) +  %c_),+,( 16 *  %Tmp12)}<loopentry.1>
      
      {(( 16 * ( 1 +  %Tmp11 +  %Tmp12)) +  %c_),+,( 16 *  %Tmp12)}<loopentry.1>
      
      have the following common part that can be shared:
      {(( 16 * ( 1 +  %Tmp11 +  %Tmp12)) +  %c_),+,( 16 *  %Tmp12)}<loopentry.1>
      
      This allows us to codegen an important inner loop in 168.wupwise as:
      
      .LBB_foo_4:     ; no_exit.1
              lfd f2, 16(r9)
              fmul f3, f0, f2
              fmul f2, f1, f2
              fadd f4, f3, f2
              stfd f4, 8(r9)
              fsub f2, f3, f2
              stfd f2, 16(r9)
              addi r8, r8, 1
              addi r9, r9, 16
              cmpw cr0, r8, r4
              ble .LBB_foo_4  ; no_exit.1
      
      instead of:
      
      .LBB_foo_3:     ; no_exit.1
              lfdx f2, r6, r9
              add r10, r6, r9
              lfd f3, 8(r10)
              fmul f4, f1, f2
              fmadd f4, f0, f3, f4
              stfd f4, 8(r10)
              fmul f3, f1, f3
              fmsub f2, f0, f2, f3
              stfdx f2, r6, r9
              addi r9, r9, 16
              addi r8, r8, 1
              cmpw cr0, r8, r4
              ble .LBB_foo_3  ; no_exit.1
      
      llvm-svn: 22781
      5949d490
    • Nate Begeman's avatar
      Remove an unnecessary argument to SimplifySelectCC and add an additional · dc3154ec
      Nate Begeman authored
      assert when creating a select_cc node.
      
      llvm-svn: 22780
      dc3154ec
    • Nate Begeman's avatar
      Fix the fabs regression on x86 by abstracting the select_cc optimization · b6651e81
      Nate Begeman authored
      out into SimplifySelectCC.  This allows both ISD::SELECT and ISD::SELECT_CC
      to use the same set of simplifying folds.
      
      llvm-svn: 22779
      b6651e81
    • Nate Begeman's avatar
      Remove support for 64b PPC, it's been broken for a long time. It'll be · a22bf778
      Nate Begeman authored
      back once a DAG->DAG ISel exists.
      
      llvm-svn: 22778
      a22bf778
    • Andrew Lenharth's avatar
      Fix oversized GOT problem with gcc-4 on alpha · 6b62b479
      Andrew Lenharth authored
      llvm-svn: 22777
      6b62b479
    • Chris Lattner's avatar
      Teach SplitCriticalEdge to update LoopInfo if it is alive. This fixes · 89c1dfc7
      Chris Lattner authored
      a problem in LoopStrengthReduction, where it would split critical edges
      and then confuse itself with outdated loop information.
      
      llvm-svn: 22776
      89c1dfc7
    • Chris Lattner's avatar
      remove dead code. The exit block list is computed on demand, thus does not · 79396539
      Chris Lattner authored
      need to be updated.  This code is a relic from when it did.
      
      llvm-svn: 22775
      79396539
    • Chris Lattner's avatar
      implement a couple of simple shift foldings. · 21381e84
      Chris Lattner authored
      e.g.  (X & 7) >> 3   -> 0
      
      llvm-svn: 22774
      21381e84
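The example fold in the commit above, `(X & 7) >> 3 -> 0`, works because the mask leaves no set bits at or above the shift amount. A minimal sketch of that check, using a hypothetical helper name and assuming 32-bit unsigned values with a logical right shift:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not LLVM's implementation): returns true when
   (x & mask) >> shamt is zero for every 32-bit x, i.e. when the mask has
   no bits set at position shamt or higher. */
bool and_then_shr_is_zero(uint32_t mask, unsigned shamt) {
    return shamt < 32 && (mask >> shamt) == 0;
}
```

With a mask of 7 and a shift of 3 the check succeeds; with a shift of 2 it fails, since bit 2 of the mask survives the shift.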
    • Jim Laskey's avatar
      · 35960708
      Jim Laskey authored
      Fix for 2005-08-12-rlwimi-crash.ll.  Make allowance for masks being shifted to
      zero.
      
      llvm-svn: 22773
      35960708
    • Jim Laskey's avatar
      · 461edda7
      Jim Laskey authored
      Added test cases to guarantee use of ORC and ANDC.
      
      llvm-svn: 22772
      461edda7
    • Jim Laskey's avatar
      · a5687006
      Jim Laskey authored
      1. This change handles the cases of (~x)&y and x&(~y) yielding ANDC, and
         (~x)|y and x|(~y) yielding ORC.
      
      llvm-svn: 22771
      a5687006
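For reference, the two PowerPC operations named in the commit above can be written in C as follows. The function names are chosen here for illustration only; the second operand is the one that gets complemented.

```c
#include <stdint.h>

/* AND with complement: a & ~b. Covers x&(~y) as andc(x, y)
   and (~x)&y as andc(y, x). */
uint32_t andc(uint32_t a, uint32_t b) {
    return a & ~b;
}

/* OR with complement: a | ~b. Covers x|(~y) as orc(x, y)
   and (~x)|y as orc(y, x). */
uint32_t orc(uint32_t a, uint32_t b) {
    return a | ~b;
}
```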
    • Chris Lattner's avatar
      testcase that crashed the ppc backend, distilled from crafty · f6a762ad
      Chris Lattner authored
      llvm-svn: 22770
      f6a762ad
    • Chris Lattner's avatar
      When splitting critical edges, make sure not to leave the new block in the · 8447b495
      Chris Lattner authored
      middle of the loop.  This turns a critical loop in gzip into this:
      
      .LBB_test_1:    ; loopentry
              or r27, r28, r28
              add r28, r3, r27
              lhz r28, 3(r28)
              add r26, r4, r27
              lhz r26, 3(r26)
              cmpw cr0, r28, r26
              bne .LBB_test_8 ; loopentry.loopexit_crit_edge
      .LBB_test_2:    ; shortcirc_next.0
              add r28, r3, r27
              lhz r28, 5(r28)
              add r26, r4, r27
              lhz r26, 5(r26)
              cmpw cr0, r28, r26
              bne .LBB_test_7 ; shortcirc_next.0.loopexit_crit_edge
      .LBB_test_3:    ; shortcirc_next.1
              add r28, r3, r27
              lhz r28, 7(r28)
              add r26, r4, r27
              lhz r26, 7(r26)
              cmpw cr0, r28, r26
              bne .LBB_test_6 ; shortcirc_next.1.loopexit_crit_edge
      .LBB_test_4:    ; shortcirc_next.2
              add r28, r3, r27
              lhz r26, 9(r28)
              add r28, r4, r27
              lhz r25, 9(r28)
              addi r28, r27, 8
              cmpw cr7, r26, r25
              mfcr r26, 1
              rlwinm r26, r26, 31, 31, 31
              add r25, r8, r27
              cmpw cr7, r25, r7
              mfcr r25, 1
              rlwinm r25, r25, 29, 31, 31
              and. r26, r26, r25
              bne .LBB_test_1 ; loopentry
      
      instead of this:
      
      .LBB_test_1:    ; loopentry
              or r27, r28, r28
              add r28, r3, r27
              lhz r28, 3(r28)
              add r26, r4, r27
              lhz r26, 3(r26)
              cmpw cr0, r28, r26
              beq .LBB_test_3 ; shortcirc_next.0
      .LBB_test_2:    ; loopentry.loopexit_crit_edge
              add r2, r30, r27
              add r8, r29, r27
              b .LBB_test_9   ; loopexit
      .LBB_test_3:    ; shortcirc_next.0
              add r28, r3, r27
              lhz r28, 5(r28)
              add r26, r4, r27
              lhz r26, 5(r26)
              cmpw cr0, r28, r26
              beq .LBB_test_5 ; shortcirc_next.1
      .LBB_test_4:    ; shortcirc_next.0.loopexit_crit_edge
              add r2, r11, r27
              add r8, r12, r27
              b .LBB_test_9   ; loopexit
      .LBB_test_5:    ; shortcirc_next.1
              add r28, r3, r27
              lhz r28, 7(r28)
              add r26, r4, r27
              lhz r26, 7(r26)
              cmpw cr0, r28, r26
              beq .LBB_test_7 ; shortcirc_next.2
      .LBB_test_6:    ; shortcirc_next.1.loopexit_crit_edge
              add r2, r9, r27
              add r8, r10, r27
              b .LBB_test_9   ; loopexit
      .LBB_test_7:    ; shortcirc_next.2
              add r28, r3, r27
              lhz r26, 9(r28)
              add r28, r4, r27
              lhz r25, 9(r28)
              addi r28, r27, 8
              cmpw cr7, r26, r25
              mfcr r26, 1
              rlwinm r26, r26, 31, 31, 31
              add r25, r8, r27
              cmpw cr7, r25, r7
              mfcr r25, 1
              rlwinm r25, r25, 29, 31, 31
              and. r26, r26, r25
              bne .LBB_test_1 ; loopentry
      
      Next up, improve the code for the loop.
      
      llvm-svn: 22769
      8447b495
    • Chris Lattner's avatar
      Add a helper method · e09bbc80
      Chris Lattner authored
      llvm-svn: 22768
      e09bbc80
    • Chris Lattner's avatar
      add a helper method · 1344253e
      Chris Lattner authored
      llvm-svn: 22767
      1344253e
    • Chris Lattner's avatar
      Fix a FIXME: if we are inserting code for a PHI argument, split the critical · 4fec86d3
      Chris Lattner authored
      edge so that the code is not always executed for both operands.  This
      prevents LSR from inserting code into loops whose exit blocks contain
      PHI uses of IV expressions (which are outside of loops).  On gzip, for
      example, we turn this ugly code:
      
      .LBB_test_1:    ; loopentry
              add r27, r3, r28
              lhz r27, 3(r27)
              add r26, r4, r28
              lhz r26, 3(r26)
              add r25, r30, r28    ;; Only live if exiting the loop
              add r24, r29, r28    ;; Only live if exiting the loop
              cmpw cr0, r27, r26
              bne .LBB_test_5 ; loopexit
      
      into this:
      
      .LBB_test_1:    ; loopentry
              or r27, r28, r28
              add r28, r3, r27
              lhz r28, 3(r28)
              add r26, r4, r27
              lhz r26, 3(r26)
              cmpw cr0, r28, r26
              beq .LBB_test_3 ; shortcirc_next.0
      .LBB_test_2:    ; loopentry.loopexit_crit_edge
              add r2, r30, r27
              add r8, r29, r27
              b .LBB_test_9   ; loopexit
      .LBB_test_3:    ; shortcirc_next.0
              ...
              blt .LBB_test_1
      
      Next step: get the block out of the loop so that the loop is all
      fall-throughs again.
      
      llvm-svn: 22766
      4fec86d3
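The idea behind the commit above can be sketched on a toy control-flow graph. This is an illustration under simplifying assumptions, not LLVM's SplitCriticalEdge: the `Cfg` type, the fixed array sizes, and all function names are hypothetical. An edge is treated as critical when its source has more than one successor and its destination has more than one predecessor; splitting it inserts a new block on that edge, which gives code for a PHI argument a place to live that runs only when that edge is taken.

```c
#include <stdbool.h>

#define MAX_BLOCKS 16
#define MAX_SUCCS 2

/* Toy CFG: block i has nsucc[i] successors stored in succ[i][]. */
typedef struct {
    int nblocks;
    int nsucc[MAX_BLOCKS];
    int succ[MAX_BLOCKS][MAX_SUCCS];
} Cfg;

/* Count the edges that enter block b. */
static int pred_count(const Cfg *g, int b) {
    int n = 0;
    for (int i = 0; i < g->nblocks; i++)
        for (int j = 0; j < g->nsucc[i]; j++)
            if (g->succ[i][j] == b)
                n++;
    return n;
}

/* The edge from -> succ[from][idx] is critical when the source has several
   successors and the destination has several predecessors. */
bool is_critical_edge(const Cfg *g, int from, int idx) {
    return g->nsucc[from] > 1 && pred_count(g, g->succ[from][idx]) > 1;
}

/* Split a critical edge by routing it through a fresh block.
   Returns the new block's index, or -1 if nothing was split. */
int split_critical_edge(Cfg *g, int from, int idx) {
    if (!is_critical_edge(g, from, idx) || g->nblocks >= MAX_BLOCKS)
        return -1;
    int nb = g->nblocks++;
    g->nsucc[nb] = 1;
    g->succ[nb][0] = g->succ[from][idx];  /* new block falls through to old target */
    g->succ[from][idx] = nb;              /* source now branches to the new block */
    return nb;
}
```

In the gzip example, the new block plays the role of `loopentry.loopexit_crit_edge`: the two `add` instructions that are only live on loop exit move there instead of executing on every iteration.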
  5. Aug 12, 2005