Commits · 768b41c17afc1950813088b2950a562d0c7db3ca · Roger Ferrer / llvm-epi-0.8

Jun 15, 2012

Factor macro argument parsing into helper methods and add support for .irp. · 768b41c1
Rafael Espindola authored Jun 15, 2012
```
Patch extracted from a larger one by the PaX team. I added the testcases
and tightened error handling a bit.

llvm-svn: 158523
```
768b41c1

Fix issues (infinite loop and/or crash) with self-referential instructions, for · 7838603f

Duncan Sands authored Jun 15, 2012

example degenerate phi nodes and binops that use themselves in unreachable code.
Thanks to Charles Davis for the testcase that uncovered this can of worms.

llvm-svn: 158508

7838603f

Recommit r158407: Allow SROA to look at a vector type and see if the offset is... · 1d1fa728

Pete Cooper authored Jun 14, 2012

Recommit r158407: Allow SROA to look at a vector type and see if the offset is out of range to be replaced with a scalar access.  Now with additional fix and test for indexing into a vector inside a struct

llvm-svn: 158479

1d1fa728

Implement the isSafeToDiscardIfUnused predicate and use it in globalopt and · def1b09b
Rafael Espindola authored Jun 14, 2012
```
globaldce. Globaldce was already removing linkonce globals, but globalopt was
not.

llvm-svn: 158476
```
def1b09b

Jun 14, 2012
- 1. introduce MipsPat in place of Pat in order to exclude those from · d8ab16b8
  Akira Hatanaka authored Jun 14, 2012
```
being used by Mips16 or Micro Mips
2. clean up a few lines too long encountered

Patch by Reed Kotler.

llvm-svn: 158470
```
  d8ab16b8
- Make machine verifier check the first instruction of the last bundle instead of · 1b420ac4
  Akira Hatanaka authored Jun 14, 2012
```
the last instruction of a basic block.

llvm-svn: 158468
```
  1b420ac4
- Revert r158454: Allow SROA to look at a vector type... Its breaking the vectorise buildbot · 5d19452f
  Pete Cooper authored Jun 14, 2012
```
This reverts commit 12c1f86ffa731e2952c80d2cc577000c96b8962c.

llvm-svn: 158462
```
  5d19452f
- Recommit r158407: Allow SROA to look at a vector type and see if the offset is... · a7e6d58a
  Pete Cooper authored Jun 14, 2012
```
Recommit r158407: Allow SROA to look at a vector type and see if the offset is out of range to be replaced with a scalar access.  Now with additional fix and test for indexing into a vector inside a struct

llvm-svn: 158454
```
  a7e6d58a
- Replace assertion failure for badly formatted CPS instrution with error message. · b0ec375b
  Richard Barton authored Jun 14, 2012
```
llvm-svn: 158445
```
  b0ec375b
- Revert: test/CodeGen/ARM/iabs.ll in r158441 · 2764301a
  Manman Ren authored Jun 14, 2012
```
Sorry that I accidently checked in this file with my previous commit.

llvm-svn: 158442
```
  2764301a
- InstCombine: fix a bug when combining (fcmp cc0 x, y) && (fcmp cc1 x, y). · c2bc2d10
  Manman Ren authored Jun 14, 2012
```
uno && ueq was converted to ueq, it should be converted to uno.

llvm-svn: 158441
```
  c2bc2d10
- Test case for MIPS long branch pass. · c6496e2c
  Akira Hatanaka authored Jun 14, 2012
```
llvm-svn: 158438
```
  c6496e2c
- Fix test cases. · 843aca93
  Akira Hatanaka authored Jun 14, 2012
```
llvm-svn: 158435
```
  843aca93
Jun 13, 2012

Implement a DAGCombine in MipsISelLowering.cpp which transforms the following · df5205ef
Akira Hatanaka authored Jun 13, 2012
```
pattern:

(add v0, (add v1, abs_lo(tjt))) => (add (add v0, v1), abs_lo(tjt))

"tjt" is a TargetJumpTable node. 

llvm-svn: 158419
```
df5205ef
Set a higher value for maxStoresPerMemcpy in MipsISelLowering.cpp. · 1daf8c2a
Akira Hatanaka authored Jun 13, 2012
```
llvm-svn: 158414
```
1daf8c2a
Implement fastcc calling convention for MIPS. · f0273603
Akira Hatanaka authored Jun 13, 2012
```
llvm-svn: 158410
```
f0273603
Fix pattern for MKMSK instruction. · ab7d788e
Richard Osborne authored Jun 13, 2012
```
llvm-svn: 158409
```
ab7d788e

Revert "Allow SROA to look at a vector type and see if the offset is out of... · e2fe8097

Pete Cooper authored Jun 13, 2012

Revert "Allow SROA to look at a vector type and see if the offset is out of range to be replaced with a scalar access"

This reverts commit 51786e0aaec76b973205066bd44f7f427b21969f.

llvm-svn: 158408

e2fe8097

Allow SROA to look at a vector type and see if the offset is out of range to... · e1d4e8b5
Pete Cooper authored Jun 13, 2012
```
Allow SROA to look at a vector type and see if the offset is out of range to be replaced with a scalar access

llvm-svn: 158407
```
e1d4e8b5
It is possible for several constants which aren't individually absorbing to · 409d8ae1
Duncan Sands authored Jun 13, 2012
```
combine to the absorbing element.  Thanks to nbjoerg on IRC for pointing this 
out.

llvm-svn: 158399
```
409d8ae1

Fix intrinsics for XOP frczss/sd instructions. These instructions only take... · 71dc02d6

Craig Topper authored Jun 13, 2012

Fix intrinsics for XOP frczss/sd instructions. These instructions only take one source register and zero the upper bits of the destination rather than preserving them.

llvm-svn: 158396

71dc02d6

SimplifyCFG: fold unconditional branch to its predecessor if profitable. · d33f4efb

Manman Ren authored Jun 13, 2012

This patch extends FoldBranchToCommonDest to fold unconditional branches.
For unconditional branches, we fold them if it is easy to update the phi nodes 
in the common successors.

rdar://10554090

llvm-svn: 158392

d33f4efb

disable use of directive .set nomicromips · 5fa54123

Akira Hatanaka authored Jun 13, 2012

until this directive is pushed in gas to open source fsf

Patch by Reed Kotler.

llvm-svn: 158381

5fa54123

sched: fix latency of memory dependence chain edges for consistency. · 344fb64f

Andrew Trick authored Jun 13, 2012

For store->load dependencies that may alias, we should always use
TrueMemOrderLatency, which may eventually become a subtarget hook. In
effect, we should guarantee at least TrueMemOrderLatency on at least
one DAG path from a store to a may-alias load.

This should fix the standard mode as well as -enable-aa-sched-mi".

llvm-svn: 158380

344fb64f

Jun 12, 2012

Use std::map rather than SmallMap because SmallMap assumes that the value has · 67cd5919

Duncan Sands authored Jun 12, 2012

POD type, causing memory corruption when mapping to APInts with bitwidth > 64.
Merge another crash testcase into crash.ll while there.

llvm-svn: 158369

67cd5919

[arm-fast-isel] Add support for -arm-long-calls. · c6916f88
Chad Rosier authored Jun 12, 2012
```
Patch by Jush Lu <jush.msn@gmail.com>.

llvm-svn: 158368
```
c6916f88

Now that Reassociate's LinearizeExprTree can look through arbitrary expression · d7aeefeb

Duncan Sands authored Jun 12, 2012

topologies, it is quite possible for a leaf node to have huge multiplicity, for
example: x0 = x*x, x1 = x0*x0, x2 = x1*x1, ... rapidly gives a value which is x
raised to a vast power (the multiplicity, or weight, of x). This patch fixes
the computation of weights by correctly computing them no matter how big they
are, rather than just overflowing and getting a wrong value. It turns out that
the weight for a value never needs more bits to represent than the value itself,
so it is enough to represent weights as APInts of the same bitwidth and do the
right overflow-avoiding dance steps when computing weights. As a side-effect it
reduces the number of multiplies needed in some cases of large powers. While
there, in view of external uses (eg by the vectorizer) I made LinearizeExprTree
static, pushing the rank computation out into users. This is progress towards
fixing PR13021.

llvm-svn: 158358

d7aeefeb

Jun 11, 2012

Fix test that depends on register allocation. · e782fa64
Jakob Stoklund Olesen authored Jun 11, 2012
```
The test is really checking the prolog/epilog load/store multiple
formation.

llvm-svn: 158328
```
e782fa64
Fix test case to work on ARM. · 4e287774
Jakob Stoklund Olesen authored Jun 11, 2012
```
Patch by James Benton!

llvm-svn: 158316
```
4e287774

Re-enable the CMN instruction. · 4b79647a

Bill Wendling authored Jun 11, 2012

We turned off the CMN instruction because it had semantics which we weren't
getting correct. If we are comparing with an immediate, then it's okay to use
the CMN instruction.
<rdar://problem/7569620>

llvm-svn: 158302

4b79647a

Jun 10, 2012

InstCombine: Turn (zext A) == (B & (1<<X)-1) into A == (trunc B), narrowing the compare. · 8b8a7697

Benjamin Kramer authored Jun 10, 2012

This saves a cast, and zext is more expensive on platforms with subreg support
than trunc is. This occurs in the BSD implementation of memchr(3), see PR12750.
On the synthetic benchmark from that bug stupid_memchr and bsd_memchr have the
same performance now when not inlining either function.

stupid_memchr: 323.0us
bsd_memchr: 321.0us
memchr: 479.0us

where memchr is the llvm-gcc compiled bsd_memchr from osx lion's libc. When
inlining is enabled bsd_memchr still regresses down to llvm-gcc memchr time,
I haven't fully understood the issue yet, something is grossly mangling the
loop after inlining.

llvm-svn: 158297

8b8a7697

Enable ILP scheduling for all nodes by default on PPC. · 4e9f1a85

Hal Finkel authored Jun 10, 2012

Over the entire test-suite, this has an insignificantly negative average
performance impact, but reduces some of the worst slowdowns from the
anti-dep. change (r158294).

Largest speedups:
SingleSource/Benchmarks/Stanford/Quicksort - 28%
SingleSource/Benchmarks/Stanford/Towers - 24%
SingleSource/Benchmarks/Shootout-C++/matrix - 23%
MultiSource/Benchmarks/SciMark2-C/scimark2 - 19%
MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - 15%
(matrix and automotive-bitcount were both in the top-5 slowdown list from the
anti-dep. change)

Largest slowdowns:
MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28%
MultiSource/Benchmarks/mediabench/gsm/toast/toast - 26%
MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan - 21%
SingleSource/Benchmarks/CoyoteBench/lpbench - 20%
MultiSource/Applications/d/make_dparser - 16%

llvm-svn: 158296

4e9f1a85

Add AutoUpgrade support for the SSE4 ptest intrinsics. · 17ee58a7
Nadav Rotem authored Jun 10, 2012
```
Patch by Michael Kuperstein.

llvm-svn: 158295
```
17ee58a7

Improve ext/trunc patterns on PPC64. · 2edfbddc

Hal Finkel authored Jun 09, 2012

The PPC64 backend had patterns for i32 <-> i64 extensions and truncations that
would leave self-moves in the final assembly. Replacing those patterns with ones
based on the SUBREG builtins yields better-looking code.

Thanks to Jakob and Owen for their suggestions in this matter.

llvm-svn: 158283

2edfbddc

Jun 09, 2012

Replace XOP vpcom intrinsics with fewer intrinsics that take the immediate as an argument. · 3352ba55
Craig Topper authored Jun 09, 2012
```
llvm-svn: 158278
```
3352ba55

Enable tail merging on PPC. · eb50c2d4

Hal Finkel authored Jun 09, 2012

Tail merging had been disabled on PPC because it would disturb bundling decisions
made during pre-RA scheduling on the 970 cores. Now, however, all bundling decisions
are made during post-RA scheduling, and tail merging is generally beneficial (the
average test-suite speedup is insignificantly positive).

Largest test-suite speedups:
MultiSource/Benchmarks/mediabench/gsm/toast/toast - 30%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - 23%
SingleSource/Benchmarks/Shootout-C++/ary - 21%
SingleSource/Benchmarks/Stanford/Queens - 17%

Largest slowdowns:
MultiSource/Benchmarks/MiBench/security-sha/security-sha - 24%
MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 22%
MultiSource/Applications/JM/ldecod/ldecod - 14%
MultiSource/Benchmarks/mediabench/g721/g721encode/encode - 9%

This is improved by using full (instead of just critical) anti-dependency breaking,
but doing so still causes miscompiles and so cannot yet be enabled by default.

llvm-svn: 158259

eb50c2d4

Don't run RAFast in the optimizing regalloc pipeline. · 33a1b416

Jakob Stoklund Olesen authored Jun 08, 2012

The fast register allocator is not supposed to work in the optimizing
pipeline. It doesn't make sense to compute live intervals, run full copy
coalescing, and then run RAFast.

Fast register allocation in the optimizing pipeline is better done by
RABasic.

llvm-svn: 158242

33a1b416

canonicalize: · 2710f1b0

Nuno Lopes authored Jun 08, 2012

-%a + 42
into
42 - %a

previously we were emitting:
-(%a + 42)

This fixes the infinite loop in PR12338. The generated code is still not perfect, though.
Will work on that next

llvm-svn: 158237

2710f1b0

Jun 08, 2012

Enable PPC CTR loop formation by default. · c6b5debb

Hal Finkel authored Jun 08, 2012

Thanks to Jakob's help, this now causes no new test suite failures!

Over the entire test suite, this gives an average 1% speedup. The largest speedups are:
SingleSource/Benchmarks/Misc/pi - 108%
SingleSource/Benchmarks/CoyoteBench/lpbench - 54%
MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail - 50%
SingleSource/Benchmarks/Shootout/ary3 - 32%
SingleSource/Benchmarks/Shootout-C++/matrix - 30%

The largest slowdowns are:
MultiSource/Benchmarks/mediabench/gsm/toast/toast - -30%
MultiSource/Benchmarks/Prolangs-C/bison/mybison - -25%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - -22%
MultiSource/Applications/d/make_dparser - -14%
SingleSource/Benchmarks/Shootout-C++/ary - -13%

In light of these slowdowns, additional profiling work is obviously needed!

llvm-svn: 158223

c6b5debb

Test case for r158160 · bf86b295
Manman Ren authored Jun 08, 2012
```
llvm-svn: 158218
```
bf86b295