  1. Jan 02, 2011
      Allow loop-idiom to run on multiple BB loops, but still only scan the loop · ddf58010
      Chris Lattner authored
      header for now for memset/memcpy opportunities.  It turns out that loop-rotate
      is successfully rotating loops but *DOESN'T MERGE THE BLOCKS*, turning "for
      loops" into two-basic-block loops that loop-idiom was ignoring.
      
      With this fix, we form many *many* more memcpy and memsets than before, including
      on the "history" loops in the viterbi benchmark, which look like this:
      
              for (j=0; j<MAX_history; ++j) {
                history_new[i][j+1] = history[2*i][j];
              }
      
      Transforming these loops into memcpy's speeds up the viterbi benchmark from
      11.98s to 3.55s on my machine.  Woo.
      
      llvm-svn: 122685
  5. Dec 19, 2010
      recognize an unsigned add with overflow idiom into uadd. · 5e0c0c72
      Chris Lattner authored
      This resolves a README entry and technically resolves PR4916,
      but we still get poor code for the testcase in that PR because
      GVN isn't CSE'ing uadd with add, filed as PR8817.
      
      Previously we got:
      
      _test7:                                 ## @test7
      	addq	%rsi, %rdi
      	cmpq	%rdi, %rsi
      	movl	$42, %eax
      	cmovaq	%rsi, %rax
      	ret
      
      Now we get:
      
      _test7:                                 ## @test7
      	addq	%rsi, %rdi
      	movl	$42, %eax
      	cmovbq	%rsi, %rax
      	ret
      
      llvm-svn: 122182
  19. Jul 08, 2010
      Teach instcombine to transform · 2321e6a4
      Benjamin Kramer authored
      (X >s -1) ? C1 : C2 and (X <s  0) ? C2 : C1
      into ((X >>s 31) & (C2 - C1)) + C1, avoiding the conditional.
      
      This optimization could be extended to handle non-constant C1 and C2, but
      we'd better stay conservative for now to avoid code-size bloat.
      
      for
      int sel(int n) {
        return n >= 0 ? 60 : 100;
      }
      
      we now generate
        sarl  $31, %edi
        andl  $40, %edi
        leal  60(%rdi), %eax
      
      instead of
        testl %edi, %edi
        movl  $60, %ecx
        movl  $100, %eax
        cmovnsl %ecx, %eax
      
      llvm-svn: 107866
  27. Apr 15, 2010
      Implement rdar://7860110 (also in target/readme.txt) narrowing · 4041ab6e
      Chris Lattner authored
      a load/or/and/store sequence into a narrower store when it is
      safe.  Daniel tells me that clang will start producing this sort
      of thing with bitfields, and this does trigger a few dozen times
      on 176.gcc produced by llvm-gcc even now.
      
      This compiles code like CodeGen/X86/2009-05-28-DAGCombineCrash.ll 
      into:
      
              movl    %eax, 36(%rdi)
      
      instead of:
      
              movl    $4294967295, %eax       ## imm = 0xFFFFFFFF
              andq    32(%rdi), %rax
              shlq    $32, %rcx
              addq    %rax, %rcx
              movq    %rcx, 32(%rdi)
      
      and each of the testcases into a single store.  Each of them used
      to compile into craziness like this:
      
      _test4:
      	movl	$65535, %eax            ## imm = 0xFFFF
      	andl	(%rdi), %eax
      	shll	$16, %esi
      	addl	%eax, %esi
      	movl	%esi, (%rdi)
      	ret
      
      llvm-svn: 101343