Commits · 73f3d33dbbf2797d662eb12107b0ed5a8b91267f · Roger Ferrer / llvm-epi-0.8

Dec 05, 2013

Check hint registers for interference only once before evictions · 73f3d33d
Aditya Nandakumar authored Dec 05, 2013
```
llvm-svn: 196536
```
Implemented vget/vset_lane_f16 intrinsics · 6b0a8c50
Ana Pazos authored Dec 05, 2013
```
llvm-svn: 196533
```
llvm-cov: Changed extension from .llcov to .gcov. · 9af3938b
Yuchen Wu authored Dec 05, 2013
```
llvm-svn: 196530
```
Revert part of GCC warning fix to fix debug build. · 79d55f5c
Matt Arsenault authored Dec 05, 2013
```
The typedef is used inside the DEBUG(), and apparently can't be moved
inside of it.

llvm-svn: 196528
```
Fix minor GCC warnings. · c44a3ff6
Matt Arsenault authored Dec 05, 2013
```
Unused typedefs and unused variables.

llvm-svn: 196526
```

Change std::deque => std::vector. No functionality change. · 2bf0173b

Michael Gottesman authored Dec 05, 2013

There is no reason to use std::deque here over std::vector. Given the
performance differences between the two, it makes sense to change deque to
vector.

llvm-svn: 196524

Fix non-deterministic behavior. · cdbde3aa

Rafael Espindola authored Dec 05, 2013

We use CSEBlocks to initialize a worklist:

SmallVector<BasicBlock *, 8> CSEWorkList(CSEBlocks.begin(), CSEBlocks.end());

so it must have a deterministic order.

llvm-svn: 196520
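
A minimal C++ sketch of why the container backing CSEBlocks matters. The commit message does not say which container it switched to; the `OrderedSet` class below is a hypothetical stand-in in the spirit of `llvm::SetVector`, pairing a hash set (for membership) with a vector (for iteration order). A worklist built from its `begin()`/`end()` therefore comes out in insertion order, whereas iterating a set keyed on pointer values can change from run to run as addresses change.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical insertion-ordered set (not LLVM's SetVector implementation):
// the hash set answers "seen already?", the vector fixes iteration order.
template <typename T> class OrderedSet {
public:
  // Returns true if Value was newly inserted; duplicates are ignored.
  bool insert(const T &Value) {
    if (!Seen.insert(Value).second)
      return false;
    Order.push_back(Value);
    return true;
  }
  typename std::vector<T>::const_iterator begin() const { return Order.begin(); }
  typename std::vector<T>::const_iterator end() const { return Order.end(); }

private:
  std::unordered_set<T> Seen;
  std::vector<T> Order;
};

// Build a worklist from begin()/end(), as the commit does with CSEBlocks.
// Its order depends only on insertion order, never on hashing or addresses.
template <typename T>
std::vector<T> makeWorkList(const OrderedSet<T> &Blocks) {
  return std::vector<T>(Blocks.begin(), Blocks.end());
}
```

Inserting 30, 10, 30, 20 yields the worklist {30, 10, 20} on every run.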

Rename DwarfUnits to DwarfFile to help avoid some naming confusion. · f8194853
Eric Christopher authored Dec 05, 2013
```
llvm-svn: 196519
```

MI-Sched: Model "reserved" processor resources. · 5a22df49

Andrew Trick authored Dec 05, 2013

This allows a target to use MI-Sched as an in-order scheduler that
will model strict resource conflicts without defining a processor
itinerary. Instead, the target can now use the new per-operand machine
model and define in-order resources with BufferSize=0. For example,
this would allow restricting the type of operations that can be formed
into a dispatch group. (Normally NumMicroOps is sufficient to enforce
dispatch groups).

If the intent is to model latency in an in-order pipeline, as opposed to
resource conflicts, then a resource with BufferSize=1 should be
defined instead.

This feature is only casually tested as there are no in-tree targets
using it yet. However, Hal will be experimenting with POWER7.

llvm-svn: 196517
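
A toy C++ model of what a reserved (BufferSize=0) resource means for in-order issue. It is a hypothetical simplification, not MI-Sched's actual bookkeeping: one unit, at most one issue per cycle, and each op holding the unit for `Cycles` cycles. An op that needs the unit while it is still held must wait, and because issue is in order, everything behind it waits too. It does not model the BufferSize=1 latency case.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy op: how long it keeps the single reserved unit busy.
struct ToyOp {
  int Cycles;
};

// Returns the issue cycle of each op, issued strictly in order on one
// unbuffered unit, with at most one issue per cycle.
std::vector<int> issueCycles(const std::vector<ToyOp> &Ops) {
  std::vector<int> Issue;
  int Cycle = 0;      // earliest cycle the next op could issue
  int UnitFreeAt = 0; // first cycle at which the unit is free again
  for (const ToyOp &Op : Ops) {
    if (Cycle < UnitFreeAt)
      Cycle = UnitFreeAt; // resource conflict: stall, no buffering
    Issue.push_back(Cycle);
    UnitFreeAt = Cycle + Op.Cycles;
    ++Cycle; // the next op can issue no earlier than the next cycle
  }
  return Issue;
}
```

With ops holding the unit for 2, 1 and 3 cycles, they issue at cycles 0, 2 and 3: the second op stalls one cycle behind the first.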

MI-Sched: handle latency of in-order operations with the new machine model. · 880e573d

Andrew Trick authored Dec 05, 2013

The per-operand machine model allows the target to define "unbuffered"
processor resources. This change is a quick, cheap way to model stalls
caused by the latency of operations that use such resources. This only
applies when the processor's micro-op buffer size is non-zero
(Out-of-Order). We can't precisely model in-order stalls during
out-of-order execution, but this is an easy and effective
heuristic. It benefits cortex-a9 scheduling when using the new
machine model, which is not yet on by default.

MI-Sched for armv7 was evaluated on Swift (and was not enabled only because
of a performance bug related to predication). However, we never
evaluated Cortex-A9 performance on MI-Sched in its current form. This
change adds MI-Sched functionality to reach performance goals on
A9. The only remaining change is to allow MI-Sched to run as a PostRA
pass.

I evaluated performance using the following options, to estimate the performance impact once MI-Sched is the default on armv7:
-mcpu=cortex-a9 -disable-post-ra -misched-bench -scheditins=false

For a simple saxpy loop I see a 1.7x speedup. Here are the llvm-testsuite results:
(min run time over 2 runs, filtering tiny changes)

Speedups:
| Benchmark                                  | Change  |
|--------------------------------------------|---------|
| Benchmarks/BenchmarkGame/recursive         |  52.39% |
| Benchmarks/VersaBench/beamformer           |  20.80% |
| Benchmarks/Misc/pi                         |  19.97% |
| Benchmarks/Misc/mandel-2                   |  19.95% |
| SPEC/CFP2000/188.ammp                      |  18.72% |
| Benchmarks/McCat/08-main/main              |  18.58% |
| Benchmarks/Misc-C++/Large/sphereflake      |  18.46% |
| Benchmarks/Olden/power                     |  17.11% |
| Benchmarks/Misc-C++/mandel-text            |  16.47% |
| Benchmarks/Misc/oourafft                   |  15.94% |
| Benchmarks/Misc/flops-7                    |  14.99% |
| Benchmarks/FreeBench/distray               |  14.26% |
| SPEC/CFP2006/470.lbm                       |  14.00% |
| mediabench/mpeg2/mpeg2dec/mpeg2decode      |  12.28% |
| Benchmarks/SmallPT/smallpt                 |  10.36% |
| Benchmarks/Misc-C++/Large/ray              |   8.97% |
| Benchmarks/Misc/fp-convert                 |   8.75% |
| Benchmarks/Olden/perimeter                 |   7.10% |
| Benchmarks/Bullet/bullet                   |   7.03% |
| Benchmarks/Misc/mandel                     |   6.75% |
| Benchmarks/Olden/voronoi                   |   6.26% |
| Benchmarks/Misc/flops-8                    |   5.77% |
| Benchmarks/Misc/matmul_f64_4x4             |   5.19% |
| Benchmarks/MiBench/security-rijndael       |   5.15% |
| Benchmarks/Misc/flops-6                    |   5.10% |
| Benchmarks/Olden/tsp                       |   4.46% |
| Benchmarks/MiBench/consumer-lame           |   4.28% |
| Benchmarks/Misc/flops-5                    |   4.27% |
| Benchmarks/mafft/pairlocalalign            |   4.19% |
| Benchmarks/Misc/himenobmtxpa               |   4.07% |
| Benchmarks/Misc/lowercase                  |   4.06% |
| SPEC/CFP2006/433.milc                      |   3.99% |
| Benchmarks/tramp3d-v4                      |   3.79% |
| Benchmarks/FreeBench/pifft                 |   3.66% |
| Benchmarks/Ptrdist/ks                      |   3.21% |
| Benchmarks/Adobe-C++/loop_unroll           |   3.12% |
| SPEC/CINT2000/175.vpr                      |   3.12% |
| Benchmarks/nbench                          |   2.98% |
| SPEC/CFP2000/183.equake                    |   2.91% |
| Benchmarks/Misc/perlin                     |   2.85% |
| Benchmarks/Misc/flops-1                    |   2.82% |
| Benchmarks/Misc-C++-EH/spirit              |   2.80% |
| Benchmarks/Misc/flops-2                    |   2.77% |
| Benchmarks/NPB-serial/is                   |   2.42% |
| Benchmarks/ASC_Sequoia/CrystalMk           |   2.33% |
| Benchmarks/BenchmarkGame/n-body            |   2.28% |
| Benchmarks/SciMark2-C/scimark2             |   2.27% |
| Benchmarks/Olden/bh                        |   2.03% |
| skidmarks10/skidmarks                      |   1.81% |
| Benchmarks/Misc/flops                      |   1.72% |

Slowdowns:
| Benchmark                                  | Change  |
|--------------------------------------------|---------|
| Benchmarks/llubenchmark/llu                | -14.14% |
| Benchmarks/Polybench/stencils/seidel-2d    |  -5.67% |
| Benchmarks/Adobe-C++/functionobjects       |  -5.25% |
| Benchmarks/Misc-C++/oopack_v1p8            |  -5.00% |
| Benchmarks/Shootout/hash                   |  -2.35% |
| Benchmarks/Prolangs-C++/ocean              |  -2.01% |
| Benchmarks/Polybench/medley/floyd-warshall |  -1.98% |
| Polybench/linear-algebra/kernels/3mm       |  -1.95% |
| Benchmarks/McCat/09-vor/vor                |  -1.68% |

llvm-svn: 196516

Fix the A9 machine model. VTRN writes two registers. · ff199a4b
Andrew Trick authored Dec 05, 2013
```
llvm-svn: 196514
```
comment typo and reformat · bb1247b9
Andrew Trick authored Dec 05, 2013
```
llvm-svn: 196513
```
Add a default constructor to get deterministic behavior. · 4cc2b873
Rafael Espindola authored Dec 05, 2013
```
Should fix the msan and valgrind bots.

llvm-svn: 196509
```
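
A hypothetical C++ illustration of the kind of bug this fixes; the commit message does not name the class, so `LaneInfo` is invented. Without a user-provided default constructor, an automatic object of an aggregate type like this has indeterminate member values after `LaneInfo X;`, and reading them is what MemorySanitizer and Valgrind flag. A constructor that initializes every member removes that source of non-determinism.

```cpp
#include <cassert>

// Hypothetical type; the default constructor gives every member a
// defined value, so default-constructed objects behave the same on
// every run.
struct LaneInfo {
  unsigned Lane;
  bool Valid;
  LaneInfo() : Lane(0), Valid(false) {}
  LaneInfo(unsigned L, bool V) : Lane(L), Valid(V) {}
};
```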

SLPVectorizer: An in-tree vectorized entry cannot also be a scalar external use · 7ee53cac

Arnold Schwaighofer authored Dec 05, 2013

We were creating external uses for scalar values in MustGather entries that also
had a ScalarToTreeEntry (that is, they are also present in a vectorized tuple). This
meant we would keep a value 'alive' both as a scalar and in vectorized form, causing havoc.
This is not necessary, because when we create a MustGather vector we explicitly
create external-use entries for the insertelement instructions of the
MustGather vector elements.

Fixes PR18129.

radar://15582184

llvm-svn: 196508

[tsan] fix PR18146: sometimes a variable written into vptr could have an... · 2460c3fc

Kostya Serebryany authored Dec 05, 2013

[tsan] fix PR18146: sometimes a variable written into vptr could have an integer type (after other optimizations)

llvm-svn: 196507

[NVPTX] Fix off-by-one error when creating the VT list for an SDNode · 4459717b
Justin Holewinski authored Dec 05, 2013
```
llvm-svn: 196503
```

[mips] Small code generation improvement for conditional operator (select) · a6beac1a

Matheus Almeida authored Dec 05, 2013

in case the operands are constants and their difference is |1|.
It should be possible in those cases to rematerialize the result using
MIPS's slt and similar instructions.

The small update to some of the tests in cmov.ll, sel1c.ll and sel2c.ll was needed
because otherwise the optimization implemented in this patch would have been triggered
(the difference between the operands was 1), and that would have changed the semantics
of the tests.

llvm-svn: 196498
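
A C++ illustration of the identity the change relies on; `selectViaSetCC` is a hypothetical name, and this is not the actual `performSELECTCombine` code. When the two constant operands of `select(Cond, T, F)` differ by exactly 1, the result is `F` plus or minus the 0/1 value of the condition, which an `slt`-style instruction can produce directly, so no conditional move is needed.

```cpp
#include <cassert>
#include <cstdint>

// Computes Cond ? T : F, using only the 0/1 condition value plus an
// add or subtract when T and F differ by exactly 1.
int32_t selectViaSetCC(bool Cond, int32_t T, int32_t F) {
  int32_t CondBit = Cond ? 1 : 0; // e.g. the 0/1 result of an slt
  // Compute the difference in 64 bits so extreme inputs cannot overflow.
  int64_t Diff = static_cast<int64_t>(T) - F;
  if (Diff == 1)
    return F + CondBit; // T == F + 1, so Cond ? F + 1 : F
  if (Diff == -1)
    return F - CondBit; // T == F - 1, so Cond ? F - 1 : F
  return Cond ? T : F;  // general case: keep an ordinary select
}
```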

[mips] Add some comments related to the optimization performed in performSELECTCombine. · a611c0f4
Matheus Almeida authored Dec 05, 2013
```
The structure of the code was slightly modified so that the next patch is easier to read/review.

No functional changes.

llvm-svn: 196496
```

[mips][msa] Fix issue with immediate fields of LD/ST instructions · 6b59c449

Matheus Almeida authored Dec 05, 2013

not being correctly encoded/decoded.
In more detail, immediate fields of LD/ST instructions should be
divided/multiplied by the size of the data format before encoding and
after decoding, respectively.

llvm-svn: 196494
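
A minimal C++ sketch of the scaling described above. The function names are hypothetical, and the signed 10-bit width of the immediate field is an assumption made for illustration. The point is only that the encoded field counts elements of the data format, while the assembly-level offset counts bytes.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kImmBits = 10; // assumed width of the signed immediate field

// Encode: divide the byte offset by the element size (1, 2, 4 or 8).
// Returns false if the offset is misaligned or does not fit the field.
bool encodeOffset(int32_t ByteOffset, int32_t ElemSize, int32_t &Field) {
  if (ByteOffset % ElemSize != 0)
    return false;
  int32_t Scaled = ByteOffset / ElemSize;
  const int32_t Min = -(1 << (kImmBits - 1));    // -512
  const int32_t Max = (1 << (kImmBits - 1)) - 1; // 511
  if (Scaled < Min || Scaled > Max)
    return false;
  Field = Scaled;
  return true;
}

// Decode: multiply the field value back by the element size.
int32_t decodeOffset(int32_t Field, int32_t ElemSize) {
  return Field * ElemSize;
}
```

For example, a byte offset of 64 with 8-byte elements encodes as 8 and decodes back to 64; an offset of 6 with 4-byte elements is rejected as misaligned.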

ARM: fix yet another stack-folding bug · e4def5e2

Tim Northover authored Dec 05, 2013

We were trying to fold the stack adjustment into the wrong instruction in the
situation where the entire basic block was epilogue code. Really, it can only
ever be valid to do the folding precisely where the "add sp, ..." would be
placed so there's no need for a separate iterator to track that.

Should fix PR18136.

llvm-svn: 196493

DwarfDebug/DwarfUnit: Push abbreviation structures down into DwarfUnits to reduce duplication · 0504cdaf
David Blaikie authored Dec 05, 2013
```
llvm-svn: 196479
```
Use isIntrinsic() instead of checking for "llvm." · a68c9adc
Matt Arsenault authored Dec 05, 2013
```
llvm-svn: 196473
```

Remove the isImplicitlyPrivate argument of getNameWithPrefix. · 117b20c4

Rafael Espindola authored Dec 05, 2013

The purpose of getSymbolWithGlobalValueBase is to create the name of a new symbol
based on the name of an existing GV. Assert that, and then remove the last call
that passes true for isImplicitlyPrivate.

This gives the mangler API a 1:1 mapping from GV to names, which is what we
need to drop the mangler dependency on the target (and use an extended
datalayout instead).

llvm-svn: 196472

Correct word hyphenations · f907b891

Alp Toker authored Dec 05, 2013

This patch tries to avoid unrelated changes other than fixing a few
hyphen-related ambiguities and contractions in nearby lines.

llvm-svn: 196471

Hide the stub created for MO_ExternalSymbol too. · 01d19d02

Rafael Espindola authored Dec 05, 2013

Given

declare void @llvm.memset.p0i8.i32(i8* nocapture, i8, i32, i32, i1)
declare void @foo()
define void @bar() {
  call void @foo()
  call void @llvm.memset.p0i8.i32(i8* null, i8 0, i32 188, i32 1, i1 false)
  ret void
}

We used to produce

L_foo$stub:
        .indirect_symbol        _foo
        .ascii  "\364\364\364\364\364"

_memset$stub:
        .indirect_symbol        _memset
        .ascii  "\364\364\364\364\364"

We now produce a private stub for memset too.

Stubs are not needed with recent linkers, but we still produce them for darwin8.

Thanks to David Fang for confirming that gcc used to do this too.

llvm-svn: 196468

R600/SI: Add comments for number of used registers. · 89cc49fe
Matt Arsenault authored Dec 05, 2013
```
llvm-svn: 196467
```

Try harder to get consistent floating-point results. · d50dbc78

Rafael Espindola authored Dec 05, 2013

This just extends the existing hack. It should be enough to get a reproducible bootstrap
on 32-bit systems.

I will open a bug to track getting a real fix for this.

llvm-svn: 196462

For AArch64, add missing register cost calculation for big value types like v4i64 and v8i64. · 65d8e342
Jiangning Liu authored Dec 05, 2013
```
llvm-svn: 196456
```

DwarfDebug: Avoid unnecessary abbreviation lookup when emitting DIEs · ff3ab2c2

David Blaikie authored Dec 05, 2013

DIEs already contain direct references to their DIEAbbrev; use that
instead of looking it up by index.

llvm-svn: 196446

DwarfDebug: Remove trivial function wrapper · 9a0b4029
David Blaikie authored Dec 05, 2013
```
llvm-svn: 196445
```
80-column. · b9a69f61
Eric Christopher authored Dec 05, 2013
```
llvm-svn: 196442
```
Remove special handling for DW_AT_ranges support by constructing the · c31fe2de
Eric Christopher authored Dec 05, 2013
```
values with the correct behavior.

llvm-svn: 196441
```

[mc] Fix ELF st_other flag. · ee36595c

Logan Chien authored Dec 05, 2013

ELF_Other_Weakref and ELF_Other_ThumbFunc seem to be LLVM-internal
ELF symbol flags. These should not be emitted to the
object file.

This commit defines ELF_STO_Shift for the target-defined
flags for st_other, and increases the value of
ELF_Other_Shift to 16.

llvm-svn: 196440
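
A hypothetical C++ sketch of the layout idea. Apart from the standard ELF visibility value `STV_HIDDEN` (2), the names and values below are illustrative, not LLVM's. Bits destined for the one-byte `st_other` field stay in the low byte, while LLVM-internal flags such as a weakref marker start at bit 16, mirroring the move of `ELF_Other_Shift` to 16, so masking to the low byte cannot leak them into the object file.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t STV_HIDDEN = 2;      // ELF symbol visibility value
constexpr uint32_t kInternalShift = 16; // hypothetical, like ELF_Other_Shift
constexpr uint32_t kInternalWeakref = 1u << kInternalShift; // internal only

// Value written to the symbol table's one-byte st_other field: only the
// low byte survives, so internal flags at bit 16 and above are dropped.
uint8_t stOtherByte(uint32_t SymbolFlags) {
  return static_cast<uint8_t>(SymbolFlags & 0xffu);
}
```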

Fix comment. · 1c70b679
Eric Christopher authored Dec 05, 2013
```
llvm-svn: 196437
```
Add AVX512 patterns for v16i32 broadcast and v2i64 zero extend load. · 30bbb214
Cameron McInally authored Dec 05, 2013
```
Patch by Aleksey Bader.

llvm-svn: 196435
```
Fix typo. · 67c0bfea
Eric Christopher authored Dec 04, 2013
```
llvm-svn: 196434
```
DwarfUnit: Correct comment by generalizing over all units, not just compilation units. · 6896e190
David Blaikie authored Dec 04, 2013
```
Code review feedback on r196394 by Paul Robinson.

llvm-svn: 196433
```

Fix a bug in darwin's 32-bit X86 handling of evaluating fixups. · 86496a45

Kevin Enderby authored Dec 04, 2013

This happens when it would use a scattered relocation entry but falls back to a
normal relocation entry because the FixupOffset is more than 24 bits.

The bug is in X86MachObjectWriter::RecordScatteredRelocation(), which
changes the reference parameter FixedValue but then returns false to indicate
it did not create a scattered relocation entry. The fix is simply to save the
original value of the parameter FixedValue at the start of the method and
restore it if we return false in that case.

rdar://15526046

llvm-svn: 196432
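
A hypothetical C++ reduction of the bug pattern and its fix; this is not the real `X86MachObjectWriter::RecordScatteredRelocation`, and `recordScattered` and its parameters are invented. The helper updates an in/out parameter, then finds that the fixup offset does not fit in the 24-bit address field of a scattered entry and returns false so the caller can fall back to a normal relocation. Saving the original value on entry and restoring it before returning false keeps the caller's `FixedValue` intact.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if a scattered entry would be emitted; on false the caller
// falls back to a normal relocation and expects FixedValue unchanged.
bool recordScattered(uint32_t FixupOffset, uint64_t &FixedValue,
                     uint64_t SymbolAddress) {
  const uint64_t OriginalFixedValue = FixedValue; // the fix: remember input
  FixedValue += SymbolAddress; // adjustment meant only for scattered entries
  if (FixupOffset > 0xffffff) {      // offset does not fit in 24 bits
    FixedValue = OriginalFixedValue; // the fix: undo before bailing out
    return false;
  }
  // ... a scattered relocation entry would be emitted here ...
  return true;
}
```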

Update comment. · ad10cb51
Eric Christopher authored Dec 04, 2013
```
llvm-svn: 196431
```
Update comment. · 5d008fed
Eric Christopher authored Dec 04, 2013
```
llvm-svn: 196430
```