- Jun 15, 2012
-
-
Rafael Espindola authored
Patch extracted from a larger one by the PaX team. I added the testcases and tightened error handling a bit. llvm-svn: 158523
-
Duncan Sands authored
example degenerate phi nodes and binops that use themselves in unreachable code. Thanks to Charles Davis for the testcase that uncovered this can of worms. llvm-svn: 158508
-
Pete Cooper authored
Recommit r158407: Allow SROA to look at a vector type and see if the offset is out of range to be replaced with a scalar access. Now with additional fix and test for indexing into a vector inside a struct llvm-svn: 158479
-
Rafael Espindola authored
globaldce. Globaldce was already removing linkonce globals, but globalopt was not. llvm-svn: 158476
-
- Jun 14, 2012
-
-
Akira Hatanaka authored
being used by Mips16 or Micro Mips 2. clean up a few lines too long encountered Patch by Reed Kotler. llvm-svn: 158470
-
Akira Hatanaka authored
the last instruction of a basic block. llvm-svn: 158468
-
Pete Cooper authored
This reverts commit 12c1f86ffa731e2952c80d2cc577000c96b8962c. llvm-svn: 158462
-
Pete Cooper authored
Recommit r158407: Allow SROA to look at a vector type and see if the offset is out of range to be replaced with a scalar access. Now with additional fix and test for indexing into a vector inside a struct llvm-svn: 158454
-
Richard Barton authored
llvm-svn: 158445
-
Manman Ren authored
Sorry that I accidently checked in this file with my previous commit. llvm-svn: 158442
-
Manman Ren authored
uno && ueq was converted to ueq, it should be converted to uno. llvm-svn: 158441
-
Akira Hatanaka authored
llvm-svn: 158438
-
Akira Hatanaka authored
llvm-svn: 158435
-
- Jun 13, 2012
-
-
Akira Hatanaka authored
pattern: (add v0, (add v1, abs_lo(tjt))) => (add (add v0, v1), abs_lo(tjt)) "tjt" is a TargetJumpTable node. llvm-svn: 158419
-
Akira Hatanaka authored
llvm-svn: 158414
-
Akira Hatanaka authored
llvm-svn: 158410
-
Richard Osborne authored
llvm-svn: 158409
-
Pete Cooper authored
Revert "Allow SROA to look at a vector type and see if the offset is out of range to be replaced with a scalar access" This reverts commit 51786e0aaec76b973205066bd44f7f427b21969f. llvm-svn: 158408
-
Pete Cooper authored
Allow SROA to look at a vector type and see if the offset is out of range to be replaced with a scalar access llvm-svn: 158407
-
Duncan Sands authored
combine to the absorbing element. Thanks to nbjoerg on IRC for pointing this out. llvm-svn: 158399
-
Craig Topper authored
Fix intrinsics for XOP frczss/sd instructions. These instructions only take one source register and zero the upper bits of the destination rather than preserving them. llvm-svn: 158396
-
Manman Ren authored
This patch extends FoldBranchToCommonDest to fold unconditional branches. For unconditional branches, we fold them if it is easy to update the phi nodes in the common successors. rdar://10554090 llvm-svn: 158392
-
Akira Hatanaka authored
until this directive is pushed in gas to open source fsf Patch by Reed Kotler. llvm-svn: 158381
-
Andrew Trick authored
For store->load dependencies that may alias, we should always use TrueMemOrderLatency, which may eventually become a subtarget hook. In effect, we should guarantee at least TrueMemOrderLatency on at least one DAG path from a store to a may-alias load. This should fix the standard mode as well as -enable-aa-sched-mi". llvm-svn: 158380
-
- Jun 12, 2012
-
-
Duncan Sands authored
POD type, causing memory corruption when mapping to APInts with bitwidth > 64. Merge another crash testcase into crash.ll while there. llvm-svn: 158369
-
Chad Rosier authored
Patch by Jush Lu <jush.msn@gmail.com>. llvm-svn: 158368
-
Duncan Sands authored
topologies, it is quite possible for a leaf node to have huge multiplicity, for example: x0 = x*x, x1 = x0*x0, x2 = x1*x1, ... rapidly gives a value which is x raised to a vast power (the multiplicity, or weight, of x). This patch fixes the computation of weights by correctly computing them no matter how big they are, rather than just overflowing and getting a wrong value. It turns out that the weight for a value never needs more bits to represent than the value itself, so it is enough to represent weights as APInts of the same bitwidth and do the right overflow-avoiding dance steps when computing weights. As a side-effect it reduces the number of multiplies needed in some cases of large powers. While there, in view of external uses (eg by the vectorizer) I made LinearizeExprTree static, pushing the rank computation out into users. This is progress towards fixing PR13021. llvm-svn: 158358
-
- Jun 11, 2012
-
-
Jakob Stoklund Olesen authored
The test is really checking the prolog/epilog load/store multiple formation. llvm-svn: 158328
-
Jakob Stoklund Olesen authored
Patch by James Benton! llvm-svn: 158316
-
Bill Wendling authored
We turned off the CMN instruction because it had semantics which we weren't getting correct. If we are comparing with an immediate, then it's okay to use the CMN instruction. <rdar://problem/7569620> llvm-svn: 158302
-
- Jun 10, 2012
-
-
Benjamin Kramer authored
This saves a cast, and zext is more expensive on platforms with subreg support than trunc is. This occurs in the BSD implementation of memchr(3), see PR12750. On the synthetic benchmark from that bug stupid_memchr and bsd_memchr have the same performance now when not inlining either function. stupid_memchr: 323.0us bsd_memchr: 321.0us memchr: 479.0us where memchr is the llvm-gcc compiled bsd_memchr from osx lion's libc. When inlining is enabled bsd_memchr still regresses down to llvm-gcc memchr time, I haven't fully understood the issue yet, something is grossly mangling the loop after inlining. llvm-svn: 158297
-
Hal Finkel authored
Over the entire test-suite, this has an insignificantly negative average performance impact, but reduces some of the worst slowdowns from the anti-dep. change (r158294). Largest speedups: SingleSource/Benchmarks/Stanford/Quicksort - 28% SingleSource/Benchmarks/Stanford/Towers - 24% SingleSource/Benchmarks/Shootout-C++/matrix - 23% MultiSource/Benchmarks/SciMark2-C/scimark2 - 19% MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - 15% (matrix and automotive-bitcount were both in the top-5 slowdown list from the anti-dep. change) Largest slowdowns: MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28% MultiSource/Benchmarks/mediabench/gsm/toast/toast - 26% MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan - 21% SingleSource/Benchmarks/CoyoteBench/lpbench - 20% MultiSource/Applications/d/make_dparser - 16% llvm-svn: 158296
-
Nadav Rotem authored
Patch by Michael Kuperstein. llvm-svn: 158295
-
Hal Finkel authored
The PPC64 backend had patterns for i32 <-> i64 extensions and truncations that would leave self-moves in the final assembly. Replacing those patterns with ones based on the SUBREG builtins yields better-looking code. Thanks to Jakob and Owen for their suggestions in this matter. llvm-svn: 158283
-
- Jun 09, 2012
-
-
Craig Topper authored
llvm-svn: 158278
-
Hal Finkel authored
Tail merging had been disabled on PPC because it would disturb bundling decisions made during pre-RA scheduling on the 970 cores. Now, however, all bundling decisions are made during post-RA scheduling, and tail merging is generally beneficial (the average test-suite speedup is insignificantly positive). Largest test-suite speedups: MultiSource/Benchmarks/mediabench/gsm/toast/toast - 30% MultiSource/Benchmarks/BitBench/uuencode/uuencode - 23% SingleSource/Benchmarks/Shootout-C++/ary - 21% SingleSource/Benchmarks/Stanford/Queens - 17% Largest slowdowns: MultiSource/Benchmarks/MiBench/security-sha/security-sha - 24% MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 22% MultiSource/Applications/JM/ldecod/ldecod - 14% MultiSource/Benchmarks/mediabench/g721/g721encode/encode - 9% This is improved by using full (instead of just critical) anti-dependency breaking, but doing so still causes miscompiles and so cannot yet be enabled by default. llvm-svn: 158259
-
Jakob Stoklund Olesen authored
The fast register allocator is not supposed to work in the optimizing pipeline. It doesn't make sense to compute live intervals, run full copy coalescing, and then run RAFast. Fast register allocation in the optimizing pipeline is better done by RABasic. llvm-svn: 158242
-
Nuno Lopes authored
-%a + 42 into 42 - %a previously we were emitting: -(%a + 42) This fixes the infinite loop in PR12338. The generated code is still not perfect, though. Will work on that next llvm-svn: 158237
-
- Jun 08, 2012
-
-
Hal Finkel authored
Thanks to Jakob's help, this now causes no new test suite failures! Over the entire test suite, this gives an average 1% speedup. The largest speedups are: SingleSource/Benchmarks/Misc/pi - 108% SingleSource/Benchmarks/CoyoteBench/lpbench - 54% MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail - 50% SingleSource/Benchmarks/Shootout/ary3 - 32% SingleSource/Benchmarks/Shootout-C++/matrix - 30% The largest slowdowns are: MultiSource/Benchmarks/mediabench/gsm/toast/toast - -30% MultiSource/Benchmarks/Prolangs-C/bison/mybison - -25% MultiSource/Benchmarks/BitBench/uuencode/uuencode - -22% MultiSource/Applications/d/make_dparser - -14% SingleSource/Benchmarks/Shootout-C++/ary - -13% In light of these slowdowns, additional profiling work is obviously needed! llvm-svn: 158223
-
Manman Ren authored
llvm-svn: 158218
-