  1. Mar 12, 2009
    • Move 3 "(add (select cc, 0, c), x) -> (select cc, x, (add x, c))" · 4147f08e
      Chris Lattner authored
      related transformations out of target-specific dag combine into the
      ARM backend.  These were added by Evan in r37685 with no testcases
      and only seem to help ARM (e.g. test/CodeGen/ARM/select_xform.ll).
      
      Add some simple X86-specific (for now) DAG combines that turn things
      like cond ? 8 : 0  -> (zext(cond) << 3).  This happens frequently
      with the recently added constant-pool ("cp") constant select optimization, but is a
      very general xform.  For example, we now compile the second example
      in const-select.ll to:
      
      _test:
              movsd   LCPI2_0, %xmm0
              ucomisd 8(%esp), %xmm0
              seta    %al
              movzbl  %al, %eax
              movl    4(%esp), %ecx
              movsbl  (%ecx,%eax,4), %eax
              ret
      
      instead of:
      
      _test:
              movl    4(%esp), %eax
              leal    4(%eax), %ecx
              movsd   LCPI2_0, %xmm0
              ucomisd 8(%esp), %xmm0
              cmovbe  %eax, %ecx
              movsbl  (%ecx), %eax
              ret
      
      This passes MultiSource and DejaGNU.
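
      A minimal sketch of the equivalence the combine exploits (hypothetical
      function names; the real transform runs on SelectionDAG nodes, not C++
      source):

      	// For a boolean cond, (cond ? 8 : 0) == ((int)cond << 3),
      	// since cond is exactly 0 or 1 and 8 == 1 << 3.
      	int sel(bool cond)   { return cond ? 8 : 0; }
      	int xform(bool cond) { return (int)cond << 3; }  // no branch, no cmov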
      
      llvm-svn: 66779
    • improve comment. · a492d29c
      Chris Lattner authored
      llvm-svn: 66778
    • On x86, if the only use of an i64 load is an i64 store, generate a pair of... · ef0b7cc2
      Evan Cheng authored
      On x86, if the only use of an i64 load is an i64 store, generate a double load and store pair instead.
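
      A sketch of the eligible pattern in source terms (hypothetical names;
      the actual match is on i64 load/store DAG nodes):

      	// On x86-32 an i64 copy was two 32-bit load/store pairs; with
      	// this change it can become one f64 load and store (e.g. movsd
      	// when SSE2 is available), halving the memory operations.
      	void copy64(long long *dst, const long long *src) {
      	    *dst = *src;
      	}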
      
      llvm-svn: 66776
    • Revert r66024. The JIT encoding for CALLpcrel32 is wrong -- see PR3773, and the · 5637df37
      Dan Gohman authored
      assembly text output uses an indirect call ("call *") instead of a direct call.
      
      llvm-svn: 66735
  2. Mar 07, 2009
    • Introduce new linkage types linkonce_odr, weak_odr, common_odr · 12da8ce3
      Duncan Sands authored
      and extern_weak_odr.  These are the same as the non-odr versions,
      except that they indicate that the global will only be overridden
      by an *equivalent* global.  In C, a function with weak linkage can
      be overridden by a function which behaves completely differently.
      This means that IP passes have to skip weak functions, since any
      deductions made from the function definition might be wrong: the
      definition could be replaced by something completely different at
      link time.  This is not allowed in C++, thanks to the ODR
      (One Definition Rule): if a function is replaced by another at
      link-time, then the new function must be the same as the original
      function.  If a language knows that a function or other global can
      only be overridden by an equivalent global, it can give it the
      weak_odr linkage type, and the optimizers will understand that it
      is alright to make deductions based on the function body.  The
      code generators, on the other hand, map weak and weak_odr linkage
      to the same thing.
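
      A sketch of the two source-level situations described above
      (hypothetical names; weak spelled with the GCC attribute):

      	// C-style weak: the linker may substitute a definition that
      	// behaves completely differently, so IP passes must skip it.
      	extern "C" __attribute__((weak)) int f() { return 1; }

      	// C++ inline: any other definition must be identical under the
      	// ODR, so it can be emitted as linkonce_odr/weak_odr and the
      	// optimizers may make deductions from its body.
      	inline int g() { return 2; }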
      
      llvm-svn: 66339
    • Arithmetic instructions don't set the EFLAGS OF and CF bits · ff659b5b
      Dan Gohman authored
      the same way the "test" instruction does in overflow cases,
      so eliminating the test is only safe when those bits aren't
      needed, as is the case for COND_E and COND_NE, or if it
      can be proven that no overflow will occur. For now, just
      restrict the optimization to COND_E and COND_NE and don't
      do any overflow analysis.
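
      A sketch of the safe and unsafe cases in source terms (hypothetical
      names):

      	// eq/ne read only ZF, and "add" sets ZF exactly as "test" would,
      	// so the test can be dropped:
      	bool eq(int a, int b) { return (a + b) == 0; }  // add; je
      	// signed order checks also read OF and SF; "test" always clears
      	// OF while "add" sets it on overflow, so the test must stay:
      	bool gt(int a, int b) { return (a + b) > 0; }   // add; test; jg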
      
      llvm-svn: 66318
  3. Feb 27, 2009
    • Refactor TLS code and add some tests. The tests and expected results are: · 000421ea
      Rafael Espindola authored
       pic |  declaration | linkage  | visibility | tests                   | TLS model
      
      !pic |  declaration | external | default    | tls1.ll     tls2.ll     | local exec
       pic |  declaration | external | default    | tls1-pic.ll tls2-pic.ll | general dynamic
      !pic | !declaration | external | default    | tls3.ll     tls4.ll     | initial exec
       pic | !declaration | external | default    | tls3-pic.ll tls4-pic.ll | general dynamic
      
      !pic |  declaration | external | hidden     | tls7.ll     tls8.ll     | local exec
       pic |  declaration | external | hidden     | X                       | local dynamic
      !pic | !declaration | external | hidden     | tls9.ll     tls10.ll    | local exec
       pic | !declaration | external | hidden     | X                       | local dynamic
      
      !pic |  declaration | internal | default    | tls5.ll     tls6.ll     | local exec
       pic |  declaration | internal | default    | X                       | local dynamic
      
      The combinations marked with an X have no tests yet, since the local dynamic model is not implemented.
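
      A rough source-level mirror of two of the rows above (hypothetical
      names; the actual tests are .ll files, built with and without -fPIC):

      	extern __thread int a;       // declaration, external, default:
      	                             // local exec when !pic, general
      	                             // dynamic when pic
      	static __thread int b = 0;   // internal, default: local exec when
      	                             // !pic; pic would need local dynamic (X)
      	int *pa() { return &a; }
      	int *pb() { return &b; }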
      
      llvm-svn: 65632
  4. Feb 23, 2009
    • Fast-isel can't do TLS yet, so it should fall back to SDISel · 318d7376
      Dan Gohman authored
      if it sees TLS addresses.
      
      llvm-svn: 65341
    • Only v1i64 (i.e. __m64) is returned via RAX / RDX. · 9f8fddee
      Evan Cheng authored
      llvm-svn: 65313
    • Generate better code for v8i16 shuffles on SSE2 · e684da3e
      Nate Begeman authored
      Generate better code for v16i8 shuffles on SSE2 (avoids stack)
      Generate pshufb for v8i16 and v16i8 shuffles on SSSE3, where it needs fewer uops.
      Document the shuffle matching logic and add some FIXMEs for further
        cleanups later.
      New tests that test the above.
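
      The SSSE3 win comes from pshufb's table-lookup semantics: each byte of
      the mask selects one source byte, or zero if the mask byte's high bit
      is set, so an arbitrary byte permutation is a single shuffle. A sketch
      via the corresponding intrinsic (hypothetical function; this
      byte-swap-within-words pattern is what _shuf1 below appears to compute
      with its rolw $8 sequence):

      	#include <tmmintrin.h>  // SSSE3: _mm_shuffle_epi8 == pshufb
      	// dst[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0f]
      	__m128i swap_bytes_in_words(__m128i v) {
      	    const __m128i mask = _mm_set_epi8(14, 15, 12, 13, 10, 11, 8, 9,
      	                                      6, 7, 4, 5, 2, 3, 0, 1);
      	    return _mm_shuffle_epi8(v, mask);
      	}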
      
      Examples:
      
      New:
      _shuf2:
      	pextrw	$7, %xmm0, %eax
      	punpcklqdq	%xmm1, %xmm0
      	pshuflw	$128, %xmm0, %xmm0
      	pinsrw	$2, %eax, %xmm0
      
      Old:
      _shuf2:
      	pextrw	$2, %xmm0, %eax
      	pextrw	$7, %xmm0, %ecx
      	pinsrw	$2, %ecx, %xmm0
      	pinsrw	$3, %eax, %xmm0
      	movd	%xmm1, %eax
      	pinsrw	$4, %eax, %xmm0
      	ret
      
      =========
      
      New:
      _shuf4:
      	punpcklqdq	%xmm1, %xmm0
      	pshufb	LCPI1_0, %xmm0
      
      Old:
      _shuf4:
      	pextrw	$3, %xmm0, %eax
      	movsd	%xmm1, %xmm0
      	pextrw	$3, %xmm1, %ecx
      	pinsrw	$4, %ecx, %xmm0
      	pinsrw	$5, %eax, %xmm0
      
      =========
      
      New:
      _shuf1:
      	pushl	%ebx
      	pushl	%edi
      	pushl	%esi
      	pextrw	$1, %xmm0, %eax
      	rolw	$8, %ax
      	movd	%xmm0, %ecx
      	rolw	$8, %cx
      	pextrw	$5, %xmm0, %edx
      	pextrw	$4, %xmm0, %esi
      	pextrw	$3, %xmm0, %edi
      	pextrw	$2, %xmm0, %ebx
      	movaps	%xmm0, %xmm1
      	pinsrw	$0, %ecx, %xmm1
      	pinsrw	$1, %eax, %xmm1
      	rolw	$8, %bx
      	pinsrw	$2, %ebx, %xmm1
      	rolw	$8, %di
      	pinsrw	$3, %edi, %xmm1
      	rolw	$8, %si
      	pinsrw	$4, %esi, %xmm1
      	rolw	$8, %dx
      	pinsrw	$5, %edx, %xmm1
      	pextrw	$7, %xmm0, %eax
      	rolw	$8, %ax
      	movaps	%xmm1, %xmm0
      	pinsrw	$7, %eax, %xmm0
      	popl	%esi
      	popl	%edi
      	popl	%ebx
      	ret
      
      Old:
      _shuf1:
      	subl	$252, %esp
      	movaps	%xmm0, (%esp)
      	movaps	%xmm0, 16(%esp)
      	movaps	%xmm0, 32(%esp)
      	movaps	%xmm0, 48(%esp)
      	movaps	%xmm0, 64(%esp)
      	movaps	%xmm0, 80(%esp)
      	movaps	%xmm0, 96(%esp)
      	movaps	%xmm0, 224(%esp)
      	movaps	%xmm0, 208(%esp)
      	movaps	%xmm0, 192(%esp)
      	movaps	%xmm0, 176(%esp)
      	movaps	%xmm0, 160(%esp)
      	movaps	%xmm0, 144(%esp)
      	movaps	%xmm0, 128(%esp)
      	movaps	%xmm0, 112(%esp)
      	movzbl	14(%esp), %eax
      	movd	%eax, %xmm1
      	movzbl	22(%esp), %eax
      	movd	%eax, %xmm2
      	punpcklbw	%xmm1, %xmm2
      	movzbl	42(%esp), %eax
      	movd	%eax, %xmm1
      	movzbl	50(%esp), %eax
      	movd	%eax, %xmm3
      	punpcklbw	%xmm1, %xmm3
      	punpcklbw	%xmm2, %xmm3
      	movzbl	77(%esp), %eax
      	movd	%eax, %xmm1
      	movzbl	84(%esp), %eax
      	movd	%eax, %xmm2
      	punpcklbw	%xmm1, %xmm2
      	movzbl	104(%esp), %eax
      	movd	%eax, %xmm1
      	punpcklbw	%xmm1, %xmm0
      	punpcklbw	%xmm2, %xmm0
      	movaps	%xmm0, %xmm1
      	punpcklbw	%xmm3, %xmm1
      	movzbl	127(%esp), %eax
      	movd	%eax, %xmm0
      	movzbl	135(%esp), %eax
      	movd	%eax, %xmm2
      	punpcklbw	%xmm0, %xmm2
      	movzbl	155(%esp), %eax
      	movd	%eax, %xmm0
      	movzbl	163(%esp), %eax
      	movd	%eax, %xmm3
      	punpcklbw	%xmm0, %xmm3
      	punpcklbw	%xmm2, %xmm3
      	movzbl	188(%esp), %eax
      	movd	%eax, %xmm0
      	movzbl	197(%esp), %eax
      	movd	%eax, %xmm2
      	punpcklbw	%xmm0, %xmm2
      	movzbl	217(%esp), %eax
      	movd	%eax, %xmm4
      	movzbl	225(%esp), %eax
      	movd	%eax, %xmm0
      	punpcklbw	%xmm4, %xmm0
      	punpcklbw	%xmm2, %xmm0
      	punpcklbw	%xmm3, %xmm0
      	punpcklbw	%xmm1, %xmm0
      	addl	$252, %esp
      	ret
      
      llvm-svn: 65311
    • Introduce the BuildVectorSDNode class that encapsulates the ISD::BUILD_VECTOR · 9d31aca6
      Scott Michel authored
      instruction. The class also consolidates the code for detecting constant
      splats that's shared across the PowerPC and CellSPU backends (and might be
      useful for other backends). Also introduces SelectionDAG::getBUILD_VECTOR()
      for generating new BUILD_VECTOR nodes.
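
      A sketch of the intended use, assuming the 2009-era API (the
      isConstantSplat parameter list is approximate):

      	#include "llvm/CodeGen/SelectionDAG.h"
      	using namespace llvm;

      	// Returns true and fills SplatBits when every defined element of
      	// the BUILD_VECTOR is the same constant.
      	static bool isSplat(SDNode *N, APInt &SplatBits) {
      	    BuildVectorSDNode *BV = dyn_cast<BuildVectorSDNode>(N);
      	    if (!BV) return false;
      	    APInt SplatUndef;
      	    unsigned SplatBitSize;
      	    bool HasAnyUndefs;
      	    return BV->isConstantSplat(SplatBits, SplatUndef, SplatBitSize,
      	                               HasAnyUndefs);
      	}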
      
      llvm-svn: 65296