  1. May 23, 2006
      Better way to check for vararg. · 7068a93c
      Evan Cheng authored
      llvm-svn: 28440
      Remove PreprocessCCCArguments and PreprocessFastCCArguments now that · 17e734f0
      Evan Cheng authored
      FORMAL_ARGUMENTS nodes include a token operand.
      
      llvm-svn: 28439
      Implement an annoying part of the Darwin/X86 abi: the callee of a struct · 8be5be81
      Chris Lattner authored
      return argument pops the hidden struct pointer if present, not the caller.
      
      For example, in this testcase:
      
      struct X { int D, E, F, G; };
      struct X bar() {
        struct X a;
        a.D = 0;
        a.E = 1;
        a.F = 2;
        a.G = 3;
        return a;
      }
      void foo(struct X *P) {
        *P = bar();
      }
      
      We used to emit:
      
      _foo:
              subl $28, %esp
              movl 32(%esp), %eax
              movl %eax, (%esp)
              call _bar
              addl $28, %esp
              ret
      _bar:
              movl 4(%esp), %eax
              movl $0, (%eax)
              movl $1, 4(%eax)
              movl $2, 8(%eax)
              movl $3, 12(%eax)
              ret
      
      This is correct on Linux/X86 but not Darwin/X86.  With this patch, we now
      emit:
      
      _foo:
              subl $28, %esp
              movl 32(%esp), %eax
              movl %eax, (%esp)
              call _bar
      ***     addl $24, %esp
              ret
      _bar:
              movl 4(%esp), %eax
              movl $0, (%eax)
              movl $1, 4(%eax)
              movl $2, 8(%eax)
              movl $3, 12(%eax)
      ***     ret $4
      
      For the record, GCC emits (which is functionally equivalent to our new code):
      
      _bar:
              movl    4(%esp), %eax
              movl    $3, 12(%eax)
              movl    $2, 8(%eax)
              movl    $1, 4(%eax)
              movl    $0, (%eax)
              ret     $4
      _foo:
              pushl   %esi
              subl    $40, %esp
              movl    48(%esp), %esi
              leal    16(%esp), %eax
              movl    %eax, (%esp)
              call    _bar
              subl    $4, %esp
              movl    16(%esp), %eax
              movl    %eax, (%esi)
              movl    20(%esp), %eax
              movl    %eax, 4(%esi)
              movl    24(%esp), %eax
              movl    %eax, 8(%esi)
              movl    28(%esp), %eax
              movl    %eax, 12(%esi)
              addl    $40, %esp
              popl    %esi
              ret
      
      This fixes SingleSource/Benchmarks/CoyoteBench/fftbench with LLC and the
      JIT, and fixes the X86-backend portion of PR729.  The CBE still needs to
      be updated.
      
      llvm-svn: 28438
      JumpTable support! What this represents is working asm and jit support for · 4ca2ea5b
      Nate Begeman authored
      x86 and ppc for 100% dense switch statements when relocations are non-PIC.
      This support will be extended and enhanced in the coming days to support
      PIC, and less dense forms of jump tables.
      
      llvm-svn: 27947
      Don't do all the lowering stuff for 2-wide build_vector's. Also, minor optimization for shuffle of undef. · e728efdf
      Evan Cheng authored
      
      llvm-svn: 27946
      Fix a performance regression. Use {p}shuf* when there are only two distinct elements in a build_vector. · 16ef94f4
      Evan Cheng authored
      
      llvm-svn: 27945
      Revamp build_vector lowering to take advantage of movss and movd instructions. · 14215c36
      Evan Cheng authored
      movd always clears the top 96 bits, and movss does so when it is loading the
      value from memory.
      The net result is that codegen for 4-wide shuffles is much improved. It is near
      optimal if one or more elements are zero, e.g.
      
      __m128i test(int a, int b) {
        return _mm_set_epi32(0, 0, b, a);
      }
      
      compiles to
      
      _test:
      	movd 8(%esp), %xmm1
      	movd 4(%esp), %xmm0
      	punpckldq %xmm1, %xmm0
      	ret
      
      Compare this to gcc:
      
      _test:
      	subl	$12, %esp
      	movd	20(%esp), %xmm0
      	movd	16(%esp), %xmm1
      	punpckldq	%xmm0, %xmm1
      	movq	%xmm1, %xmm0
      	movhps	LC0, %xmm0
      	addl	$12, %esp
      	ret
      
      or icc:
      
      _test:
              movd      4(%esp), %xmm0                                #5.10
              movd      8(%esp), %xmm3                                #5.10
              xorl      %eax, %eax                                    #5.10
              movd      %eax, %xmm1                                   #5.10
              punpckldq %xmm1, %xmm0                                  #5.10
              movd      %eax, %xmm2                                   #5.10
              punpckldq %xmm2, %xmm3                                  #5.10
              punpckldq %xmm3, %xmm0                                  #5.10
              ret                                                     #5.10
      
      There is still room for improvement; for example, the FP variant of the above:
      
      __m128 test(float a, float b) {
        return _mm_set_ps(0.0, 0.0, b, a);
      }
      
      _test:
      	movss 8(%esp), %xmm1
      	movss 4(%esp), %xmm0
      	unpcklps %xmm1, %xmm0
      	xorps %xmm1, %xmm1
      	movlhps %xmm1, %xmm0
      	ret
      
      The xorps and movlhps are unnecessary. Eliminating them will require a post-legalizer optimization.
      
      llvm-svn: 27939