- Jan 07, 2012
-
-
Evan Cheng authored
opportunities that only present themselves after late optimizations such as tail duplication, e.g.

    ## BB#1:
        movl %eax, %ecx
        movl %ecx, %eax
        ret

The register allocator also leaves some of them around (due to false dependencies between copies from phi-elimination, etc.). This required some changes in codegen passes. The post-RA scheduler and the pseudo-instruction expansion passes have been moved after branch folding and tail merging. They previously ran before branch folding because it did not always update block live-ins. That's fixed now. The pass change makes sense independently, since we want to properly schedule instructions after branch folding / tail duplication. rdar://10428165 rdar://10640363 llvm-svn: 147716
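The redundant-copy pattern quoted in this message can be illustrated with a toy model. The sketch below is not LLVM's pass. `Inst` and `removeRedundantCopies` are hypothetical names invented for illustration, registers are plain strings, and sub-register aliasing, calls, and implicit defs are all ignored. It only shows the basic idea of tracking which copies are still valid within one block and dropping a copy whose destination already holds the value.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified instruction: either a register copy or an
// arbitrary instruction that may define one register.
struct Inst {
    bool isCopy;
    std::string dst;  // register written (empty if none)
    std::string src;  // register read by a copy (empty otherwise)
};

// Removes copies that are provably no-ops within a single block.
std::vector<Inst> removeRedundantCopies(const std::vector<Inst>& block) {
    // avail[d] == s means "d currently holds the same value as s".
    std::map<std::string, std::string> avail;
    std::vector<Inst> out;
    for (const Inst& I : block) {
        if (I.isCopy) {
            assert(!I.dst.empty() && !I.src.empty());
            auto back = avail.find(I.src);  // src was copied from dst earlier
            auto same = avail.find(I.dst);  // the identical copy was seen earlier
            if ((back != avail.end() && back->second == I.dst) ||
                (same != avail.end() && same->second == I.src))
                continue;  // dst already holds the value: drop the copy
        }
        if (!I.dst.empty()) {
            // Writing I.dst invalidates every record that mentions it.
            for (auto it = avail.begin(); it != avail.end();) {
                if (it->first == I.dst || it->second == I.dst)
                    it = avail.erase(it);
                else
                    ++it;
            }
        }
        if (I.isCopy)
            avail[I.dst] = I.src;
        out.push_back(I);
    }
    return out;
}
```

With the three-instruction block from the message (copy eax to ecx, copy ecx back to eax, ret), the second copy is dropped. If some other instruction redefines eax between the two copies, both copies are kept.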
-
Evan Cheng authored
exposed with an upcoming change that would delete the copy to the return register because there is no use! It's amazing anything works. llvm-svn: 147715
-
Jakob Stoklund Olesen authored
This eliminates a lot of constant pool entries for -O0 builds of code with many global variable accesses. This speeds up -O0 codegen of consumer-typeset by 2x because the constant island pass no longer has to look at thousands of constant pool entries. <rdar://problem/10629774> llvm-svn: 147712
-
Andrew Trick authored
llvm-svn: 147711
-
Andrew Trick authored
llvm-svn: 147709
-
Andrew Trick authored
llvm-svn: 147707
-
Eric Christopher authored
Fixes rdar://10614894 llvm-svn: 147704
-
Andrew Trick authored
llvm-svn: 147703
-
Andrew Trick authored
llvm-svn: 147700
-
Chad Rosier authored
llvm-svn: 147696
-
Eric Christopher authored
to bleed from the eyes. llvm-svn: 147695
-
Eric Christopher authored
llvm-svn: 147694
-
Eric Christopher authored
llvm-svn: 147693
-
- Jan 06, 2012
-
-
Jakob Stoklund Olesen authored
Experiments show this to be a small speedup for modern ARM cores. llvm-svn: 147689
-
Andrew Trick authored
llvm-svn: 147686
-
Jakob Stoklund Olesen authored
llvm-svn: 147685
-
Andrew Trick authored
llvm-svn: 147683
-
Andrew Trick authored
llvm-svn: 147682
-
Chad Rosier authored
llvm-svn: 147679
-
Chad Rosier authored
llvm-svn: 147676
-
Chad Rosier authored
llvm-svn: 147675
-
Eric Christopher authored
lldb testsuite. rdar://10652330 llvm-svn: 147673
-
Kostya Serebryany authored
llvm-svn: 147667
-
Eli Bendersky authored
llvm-svn: 147654
-
Eric Christopher authored
the debug type accelerator tables to contain the tag and a flag stating whether or not a compound type is a complete type. rdar://10652330 llvm-svn: 147651
-
Dan Gohman authored
present in the bottom of the CFG triangle, as the transformation isn't ever valuable if the branch can't be eliminated. Also, unify some heuristics between SimplifyCFG's multiple if-converters, for consistency. This fixes rdar://10627242. llvm-svn: 147630
-
Eli Friedman authored
PR11705, part 2: globalopt shouldn't put inttoptr/ptrtoint operations into global initializers if there's an implied extension or truncation. llvm-svn: 147625
-
Rafael Espindola authored
System V Application Binary Interface. This lets us use -fvisibility-inlines-hidden with LTO. Fixes PR11697. llvm-svn: 147624
-
- Jan 05, 2012
-
-
Dan Gohman authored
code can incorrectly move the load across a store. This never happens in practice today, but only because the current heuristics accidentally preclude it. llvm-svn: 147623
-
Benjamin Kramer authored
llvm-svn: 147618
-
Nick Lewycky authored
Eliminate the dead test for it on each loop iteration. No functionality change. llvm-svn: 147616
-
Rafael Espindola authored
llvm-svn: 147615
-
Danil Malyshev authored
A small refactoring of JIT/MCJIT::getPointerToNamedFunction() so that it can be called through the base class. llvm-svn: 147610
-
Sebastian Pop authored
llvm-svn: 147608
-
Chandler Carruth authored
llvm-svn: 147605
-
Chandler Carruth authored
a combined-away node and the result of the combine isn't substantially smaller than the input, it's just canonicalized. This is the first part of a significant (7%) performance gain for Snappy's hot decompression loop. llvm-svn: 147604
-
Craig Topper authored
llvm-svn: 147602
-
Victor Umansky authored
Peephole optimization of ptest-conditioned branches on X86. Combines the instruction sequences generated by the ptestz/ptestc intrinsics into a ptest+jcc pair for SSE and AVX. Testing: passed 'make check', including LIT tests for all sequences being handled (both SSE and AVX). Reviewers: Evan Cheng, David Blaikie, Bruno Lopes, Elena Demikhovsky, Chad Rosier, Anton Korobeynikov. llvm-svn: 147601
-
Andrew Trick authored
Minor postra scheduler cleanup. It could result in more precise antidependence latency on ARM in exceedingly rare cases. llvm-svn: 147594
-
Bill Wendling authored
This small bit of ASM code is sufficient to do what the old algorithm did:

    movq      %rax, %xmm0
    punpckldq (c0), %xmm0   // c0: (uint4){ 0x43300000U, 0x45300000U, 0U, 0U }
    subpd     (c1), %xmm0   // c1: (double2){ 0x1.0p52, 0x1.0p52 * 0x1.0p32 }
    #ifdef __SSE3__
    haddpd    %xmm0, %xmm0
    #else
    pshufd    $0x4e, %xmm0, %xmm1
    addpd     %xmm1, %xmm0
    #endif

It's arguably faster. One caveat: the 'haddpd' instruction isn't very fast on all processors. <rdar://problem/7719814> llvm-svn: 147593
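The message does not say what the sequence computes. Reading the constants, it appears to convert an unsigned 64-bit integer to a double. The following scalar sketch models that reading and is an interpretation, not code from the commit. The function names `u64_to_double` and `self_check` are made up for illustration, and IEEE-754 doubles with round-to-nearest are assumed.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Scalar model of the SSE2 sequence (an interpretation, see lead-in).
double u64_to_double(std::uint64_t x) {
    // punpckldq: pair each 32-bit half of x with an exponent word from c0.
    // 0x43300000 in the high word encodes 2^52, so the low half lands in
    // the mantissa with unit weight; 0x45300000 encodes 2^84, so the high
    // half lands in the mantissa with weight 2^32.
    std::uint64_t lo_bits = (std::uint64_t{0x43300000u} << 32) | (x & 0xffffffffu);
    std::uint64_t hi_bits = (std::uint64_t{0x45300000u} << 32) | (x >> 32);

    double lo, hi;
    std::memcpy(&lo, &lo_bits, sizeof lo);  // lo == 2^52 + low32(x)
    std::memcpy(&hi, &hi_bits, sizeof hi);  // hi == 2^84 + high32(x) * 2^32

    // subpd: subtract the biases from c1; both subtractions are exact.
    lo -= std::ldexp(1.0, 52);
    hi -= std::ldexp(1.0, 84);

    // haddpd (or pshufd + addpd): one addition, rounded once.
    return hi + lo;
}

// Small usage example: compare against the compiler's own conversion.
void self_check() {
    const std::uint64_t samples[] = {0u, 1u, 0xffffffffu, std::uint64_t{1} << 32,
                                     12345678901234567890ull, UINT64_MAX};
    for (std::uint64_t v : samples)
        assert(u64_to_double(v) == static_cast<double>(v));
}
```

Because each half is recovered exactly and the only rounding happens in the final addition, the result should match a correctly rounded conversion under the stated assumptions.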
-