- Dec 05, 2013
-
-
Yi Jiang authored
llvm-svn: 196544
-
Ana Pazos authored
llvm-svn: 196533
-
Andrew Trick authored
The per-operand machine model allows the target to define "unbuffered" processor resources. This change is a quick, cheap way to model stalls caused by the latency of operations that use such resources. This only applies when the processor's micro-op buffer size is non-zero (out-of-order). We can't precisely model in-order stalls during out-of-order execution, but this is an easy and effective heuristic. It benefits Cortex-A9 scheduling when using the new machine model, which is not yet on by default.

MI-Sched for armv7 was evaluated on Swift (and only not enabled because of a performance bug related to predication). However, we never evaluated Cortex-A9 performance on MI-Sched in its current form. This change adds MI-Sched functionality to reach performance goals on A9. The only remaining change is to allow MI-Sched to run as a PostRA pass.

I evaluated performance using a set of options to estimate the performance impact once MI-Sched is the default on armv7:

    -mcpu=cortex-a9 -disable-post-ra -misched-bench -scheditins=false

For a simple saxpy loop I see a 1.7x speedup. Here are the llvm-testsuite results (min run time over 2 runs, filtering tiny changes):

Speedups:
| Benchmarks/BenchmarkGame/recursive | 52.39% |
| Benchmarks/VersaBench/beamformer | 20.80% |
| Benchmarks/Misc/pi | 19.97% |
| Benchmarks/Misc/mandel-2 | 19.95% |
| SPEC/CFP2000/188.ammp | 18.72% |
| Benchmarks/McCat/08-main/main | 18.58% |
| Benchmarks/Misc-C++/Large/sphereflake | 18.46% |
| Benchmarks/Olden/power | 17.11% |
| Benchmarks/Misc-C++/mandel-text | 16.47% |
| Benchmarks/Misc/oourafft | 15.94% |
| Benchmarks/Misc/flops-7 | 14.99% |
| Benchmarks/FreeBench/distray | 14.26% |
| SPEC/CFP2006/470.lbm | 14.00% |
| mediabench/mpeg2/mpeg2dec/mpeg2decode | 12.28% |
| Benchmarks/SmallPT/smallpt | 10.36% |
| Benchmarks/Misc-C++/Large/ray | 8.97% |
| Benchmarks/Misc/fp-convert | 8.75% |
| Benchmarks/Olden/perimeter | 7.10% |
| Benchmarks/Bullet/bullet | 7.03% |
| Benchmarks/Misc/mandel | 6.75% |
| Benchmarks/Olden/voronoi | 6.26% |
| Benchmarks/Misc/flops-8 | 5.77% |
| Benchmarks/Misc/matmul_f64_4x4 | 5.19% |
| Benchmarks/MiBench/security-rijndael | 5.15% |
| Benchmarks/Misc/flops-6 | 5.10% |
| Benchmarks/Olden/tsp | 4.46% |
| Benchmarks/MiBench/consumer-lame | 4.28% |
| Benchmarks/Misc/flops-5 | 4.27% |
| Benchmarks/mafft/pairlocalalign | 4.19% |
| Benchmarks/Misc/himenobmtxpa | 4.07% |
| Benchmarks/Misc/lowercase | 4.06% |
| SPEC/CFP2006/433.milc | 3.99% |
| Benchmarks/tramp3d-v4 | 3.79% |
| Benchmarks/FreeBench/pifft | 3.66% |
| Benchmarks/Ptrdist/ks | 3.21% |
| Benchmarks/Adobe-C++/loop_unroll | 3.12% |
| SPEC/CINT2000/175.vpr | 3.12% |
| Benchmarks/nbench | 2.98% |
| SPEC/CFP2000/183.equake | 2.91% |
| Benchmarks/Misc/perlin | 2.85% |
| Benchmarks/Misc/flops-1 | 2.82% |
| Benchmarks/Misc-C++-EH/spirit | 2.80% |
| Benchmarks/Misc/flops-2 | 2.77% |
| Benchmarks/NPB-serial/is | 2.42% |
| Benchmarks/ASC_Sequoia/CrystalMk | 2.33% |
| Benchmarks/BenchmarkGame/n-body | 2.28% |
| Benchmarks/SciMark2-C/scimark2 | 2.27% |
| Benchmarks/Olden/bh | 2.03% |
| skidmarks10/skidmarks | 1.81% |
| Benchmarks/Misc/flops | 1.72% |

Slowdowns:
| Benchmarks/llubenchmark/llu | -14.14% |
| Benchmarks/Polybench/stencils/seidel-2d | -5.67% |
| Benchmarks/Adobe-C++/functionobjects | -5.25% |
| Benchmarks/Misc-C++/oopack_v1p8 | -5.00% |
| Benchmarks/Shootout/hash | -2.35% |
| Benchmarks/Prolangs-C++/ocean | -2.01% |
| Benchmarks/Polybench/medley/floyd-warshall | -1.98% |
| Polybench/linear-algebra/kernels/3mm | -1.95% |
| Benchmarks/McCat/09-vor/vor | -1.68% |

llvm-svn: 196516
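The heuristic is simple enough to sketch in C++. The names below are hypothetical, not MachineScheduler's actual internals: an instruction that uses an unbuffered resource cannot issue until that resource frees up, and the check only applies when the micro-op buffer size is non-zero.

    #include <algorithm>
    #include <vector>

    struct SUnitInfo {
      unsigned ReadyCycle;                  // cycle when operands are ready
      std::vector<unsigned> UsedResources;  // indices of resources it occupies
    };

    struct SchedZone {
      unsigned CurrCycle = 0;
      unsigned MicroOpBufferSize = 0;       // 0 => in-order model
      std::vector<bool> Unbuffered;         // per-resource: BufferSize == 0?
      std::vector<unsigned> NextFreeCycle;  // cycle each resource frees up

      // Earliest cycle at which SU can issue without stalling an
      // unbuffered resource; only meaningful for out-of-order CPUs.
      unsigned earliestIssueCycle(const SUnitInfo &SU) const {
        unsigned Cycle = std::max(CurrCycle, SU.ReadyCycle);
        if (MicroOpBufferSize == 0)
          return Cycle; // in-order stalls are handled by the scoreboard
        for (unsigned R : SU.UsedResources)
          if (Unbuffered[R])
            Cycle = std::max(Cycle, NextFreeCycle[R]);
        return Cycle;
      }
    };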
-
Andrew Trick authored
llvm-svn: 196514
-
Rafael Espindola authored
Should fix the msan and valgrind bots. llvm-svn: 196509
-
Justin Holewinski authored
llvm-svn: 196503
-
Matheus Almeida authored
In case the operands are constants and their difference is |1|, it should be possible to rematerialize the result using MIPS's slt and similar instructions. The small update to some of the tests in cmov.ll, sel1c.ll and sel2c.ll was needed because otherwise the optimization implemented in this patch would have been triggered (the difference between the operands was 1), and that would have changed the semantics of the tests. llvm-svn: 196498
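For illustration, here is a hypothetical example of the pattern (not taken from the patch): because slt already materializes the comparison result as 0 or 1, a select between two constants differing by 1 reduces to the compare plus an add.

    // For (a < b) ? 5 : 4, a MIPS backend can emit:
    //   slt   $t0, $a0, $a1   # t0 = (a < b) ? 1 : 0
    //   addiu $v0, $t0, 4     # 1 + 4 = 5, or 0 + 4 = 4
    // rather than a branch or conditional-move sequence.
    int selectConstDiffOne(int a, int b) { return (a < b) ? 5 : 4; }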
-
Matheus Almeida authored
The structure of the code was slightly modified so that the next patch is easier to read/review. No functional changes. llvm-svn: 196496
-
Matheus Almeida authored
not being correctly encoded/decoded. In more detail, immediate fields of LD/ST instructions should be divided/multiplied by the size of the data format before encoding and after decoding, respectively. llvm-svn: 196494
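A minimal sketch of the scaling described above, with made-up helper names rather than the MIPS backend's actual functions:

    #include <cassert>
    #include <cstdint>

    // Encoding divides the byte offset by the data-format size;
    // decoding multiplies it back.
    int64_t encodeMSAMemOffset(int64_t ByteOffset, unsigned EltSizeInBytes) {
      assert(ByteOffset % EltSizeInBytes == 0 && "offset must be a multiple of the element size");
      return ByteOffset / EltSizeInBytes; // value stored in the imm field
    }

    int64_t decodeMSAMemOffset(int64_t ImmField, unsigned EltSizeInBytes) {
      return ImmField * EltSizeInBytes;   // byte offset the instruction uses
    }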
-
Tim Northover authored
We were trying to fold the stack adjustment into the wrong instruction in the situation where the entire basic block was epilogue code. Really, it can only ever be valid to do the folding precisely where the "add sp, ..." would be placed, so there's no need for a separate iterator to track that. Should fix PR18136. llvm-svn: 196493
-
Rafael Espindola authored
getSymbolWithGlobalValueBase's use is to create the name of a new symbol based on the name of an existing GV. Assert that, and then remove the last call that passes true to isImplicitlyPrivate. This gives the mangler API a 1:1 mapping from GV to names, which is what we need to drop the mangler's dependency on the target (and use an extended datalayout instead). llvm-svn: 196472
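As a toy model of the 1:1 mapping (illustrative only, not LLVM's actual signatures): every GV has exactly one mangled name, and derived symbols append a suffix to that single base name instead of re-mangling with different flags.

    #include <string>

    // One base name per GlobalValue; derived symbols are suffix-appended.
    std::string nameWithGlobalValueBase(const std::string &MangledGVName,
                                        const std::string &Suffix) {
      return MangledGVName + Suffix; // e.g. "_foo" + "$stub" -> "_foo$stub"
    }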
-
Alp Toker authored
This patch tries to avoid unrelated changes other than fixing a few hyphen-related ambiguities and contractions in nearby lines. llvm-svn: 196471
-
Rafael Espindola authored
Given

    declare void @llvm.memset.p0i8.i32(i8* nocapture, i8, i32, i32, i1)
    declare void @foo()
    define void @bar() {
      call void @foo()
      call void @llvm.memset.p0i8.i32(i8* null, i8 0, i32 188, i32 1, i1 false)
      ret void
    }

we used to produce

    L_foo$stub:
            .indirect_symbol _foo
            .ascii "\364\364\364\364\364"

    _memset$stub:
            .indirect_symbol _memset
            .ascii "\364\364\364\364\364"

We now produce a private stub for memset too. Stubs are not needed with recent linkers, but we still produce them for darwin8. Thanks to David Fang for confirming that gcc used to do this too. llvm-svn: 196468
-
Matt Arsenault authored
llvm-svn: 196467
-
Jiangning Liu authored
llvm-svn: 196456
-
Cameron McInally authored
Patch by Aleksey Bader. llvm-svn: 196435
-
Kevin Enderby authored
This fixes the case where a scattered relocation entry would be used but the writer falls back to a normal relocation entry because the FixupOffset is more than 24 bits. The bug is in X86MachObjectWriter::RecordScatteredRelocation(), which changes the reference parameter FixedValue but then returns false to indicate it did not create a scattered relocation entry. The fix is simply to save the original value of the FixedValue parameter at the start of the method and restore it if we are returning false in that case. rdar://15526046 llvm-svn: 196432
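The shape of the fix, as a sketch with hypothetical names (the real code is X86MachObjectWriter::RecordScatteredRelocation): save the in/out parameter up front and restore it on the fallback path.

    #include <cstdint>

    bool recordScatteredRelocation(uint32_t FixupOffset, uint64_t &FixedValue) {
      const uint64_t OriginalFixedValue = FixedValue;
      FixedValue += 8; // stand-in for the adjustments the real method makes
      if (FixupOffset > 0xFFFFFF) {       // offset doesn't fit in 24 bits
        FixedValue = OriginalFixedValue;  // undo before declining
        return false;                     // caller emits a normal relocation
      }
      // ... emit the scattered relocation entry here ...
      return true;
    }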
-
- Dec 04, 2013
-
-
David Peixotto authored
ARM symbol variants are written with parens instead of @, like this:

    .word __GLOBAL_I_a(target1)

This commit adds support for parsing these symbol variants in expressions. We introduce a new flag to MCAsmInfo that indicates the parser should use parens to parse the symbol variant. The expression parser is modified to look for symbol variants using parens instead of @ when the corresponding MCAsmInfo flag is true. The MCAsmInfo parens flag is enabled only for ARM on ELF.

By adding this flag to MCAsmInfo, we are able to get rid of redundant ARM-specific symbol variants and use the generic variants instead (e.g. VK_GOT instead of VK_ARM_GOT). We use the new UseParensForSymbolVariant attribute in MCAsmInfo to correctly print the symbol variants for ARM. To achieve this we need to keep a handle to the MCAsmInfo in the MCSymbolRefExpr class that we can check when printing the symbol variant.

Updated tests (changed the case of the symbol variant to match the generic kind):
test/CodeGen/ARM/tls-models.ll
test/CodeGen/ARM/tls1.ll
test/CodeGen/ARM/tls2.ll
test/CodeGen/Thumb2/tls1.ll
test/CodeGen/Thumb2/tls2.ll

PR18080 llvm-svn: 196424
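The printing side can be pictured with this illustrative sketch (the struct and function are approximations of the MCAsmInfo flag, not the verbatim LLVM API): ELF/ARM writes "sym(variant)", everything else "sym@variant".

    #include <ostream>
    #include <string>

    struct AsmInfo { bool UseParensForSymbolVariant = false; };

    void printSymbolVariant(std::ostream &OS, const AsmInfo &MAI,
                            const std::string &Sym, const std::string &Variant) {
      if (MAI.UseParensForSymbolVariant)
        OS << Sym << '(' << Variant << ')'; // e.g. __GLOBAL_I_a(target1)
      else
        OS << Sym << '@' << Variant;        // e.g. __GLOBAL_I_a@GOT
    }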
-
Cameron McInally authored
llvm-svn: 196393
-
Michael Liao authored
- No test case, as there is no calling convention that preserves only YMM31/ZMM31. llvm-svn: 196391
-
Chad Rosier authored
VFP4. Patch by Daniel Stewart! llvm-svn: 196390
-
Cameron McInally authored
Suppress '(x < y) ? a : 0 -> (x < y) & a' transform on X86 architectures with dedicated mask registers. Patch by Aleksey Bader. llvm-svn: 196386
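As a scalar illustration of the transform being suppressed (hypothetical, not the patch's code): a compare yields 0 or 1, so negating it gives an all-zeros or all-ones mask and the select can become an AND. On targets with dedicated mask registers (AVX-512), the select form is kept instead.

    int selectForm(int x, int y, int a) { return (x < y) ? a : 0; }
    int maskedForm(int x, int y, int a) { return -(x < y) & a; } // same result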
-
Kevin Qin authored
llvm-svn: 196362
-
Kevin Qin authored
llvm-svn: 196360
-
Juergen Ributzka authored
llvm-svn: 196334
-
Reed Kotler authored
This completes the basic port of ARM constant islands to Mips16. More testing, code review, and cleanup are in order, but basically everything seems to be working. A bug in gas is preventing some of the runtime testing, but I hope to resolve this soon. llvm-svn: 196331
-
- Dec 03, 2013
-
-
Rafael Espindola authored
Unlike msvc, when handling a thiscall + sret, gcc will:
* put the sret in %ecx
* put the this pointer in (%esp)
This fixes, for example, calling stringstream::str. llvm-svn: 196312
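For instance, the cited failure involves a thiscall method that returns a struct by value, which introduces a hidden sret pointer:

    #include <sstream>
    #include <string>

    // stringstream::str() returns a std::string by value; on 32-bit x86
    // with the GCC ABI this is a thiscall + sret call, so the sret
    // pointer travels in %ecx and 'this' sits at (%esp).
    std::string snapshot(std::stringstream &SS) {
      return SS.str();
    }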
-
James Molloy authored
Testcase added. llvm-svn: 196269
-
Richard Sandiford authored
The backend converts 64-bit ORs into subreg moves if the upper 32 bits of one operand and the low 32 bits of the other are known to be zero. It then tries to peel away redundant ANDs from the upper 32 bits. Since AND masks are canonicalized to exclude known-zero bits, the test ORs the mask and the known-zero bits together before checking for redundancy. The problem was that it was using the wrong node when checking for known-zero bits, so could drop ANDs that were still needed. llvm-svn: 196267
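The redundancy test can be pictured as follows (an illustrative sketch, not the backend's exact code): an AND of the upper 32 bits is redundant only if every bit it would clear is already known to be zero, and since masks are canonicalized to exclude known-zero bits, those bits must be ORed back in first. Querying the known-zero bits of the wrong node made this return true for ANDs that were still needed.

    #include <cstdint>

    bool upperAndIsRedundant(uint32_t Mask, uint32_t KnownZero) {
      return (Mask | KnownZero) == 0xFFFFFFFFu; // all upper bits covered
    }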
-
Michael Liao authored
- The fix to PR17631 covers part of the cases where 'vzeroupper' should not be issued before a 'call' insn. There are other cases, not limited to the epilogue, where helper calls are inserted. These helper calls do not follow the standard calling convention and won't clobber any YMM registers. (So far, all standard calling conventions clobber some or all of the YMM registers.) This patch enhances the previous fix to cover more cases where 'vzeroupper' should not be inserted, by checking whether the function call clobbers any YMM registers and skipping the insertion if it does not. llvm-svn: 196261
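A toy model of the added check, under the assumption that a call's clobbers can be read off as a mask over the 16 YMM registers (this is not the pass's real code):

    #include <bitset>

    // 'vzeroupper' is only needed before calls that clobber at least
    // one YMM register; YMM-preserving helper calls are skipped.
    using YMMClobberMask = std::bitset<16>;

    bool needsVZeroUpperBefore(const YMMClobberMask &CallClobbers) {
      return CallClobbers.any();
    }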
-
Hao Liu authored
E.g. int64x1_t vcvt_s64_f64(float64x1_t a) -> FCVTZS Dd, Dn llvm-svn: 196210
-
Hao Liu authored
E.g. float64x1_t vadd_f64(float64x1_t a, float64x1_t b) -> FADD Dd, Dn, Dm. llvm-svn: 196208
-
NAKAMURA Takumi authored
llvm-svn: 196203
-
Hao Liu authored
E.g. "float32_t vaddv_f32(float32x2_t a)" to be matched into "faddp s0, v1.2s". llvm-svn: 196198
-
Jiangning Liu authored
llvm-svn: 196192
-
Jiangning Liu authored
llvm-svn: 196190
-
Rafael Espindola authored
These targets have special asm printers that don't use these. llvm-svn: 196187
-
Hal Finkel authored
PPCScoreboardHazardRecognizer was a subclass of ScoreboardHazardRecognizer which did only one thing: filtered out nodes in EmitInstruction for which DAG->getInstrDesc(SU) returned NULL. This used to be the case for PPC pseudo instructions. As far as I can tell, this is no longer true, and so we can use ScoreboardHazardRecognizer directly. llvm-svn: 196171
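A toy model of the one behavior the deleted subclass added, per the description above (a sketch, not the verbatim source): skip units that have no instruction descriptor, then defer to the base scoreboard logic.

    struct SUnit {
      const void *InstrDesc; // stand-in for DAG->getInstrDesc(SU)
    };

    struct ScoreboardHazardRecognizer {
      virtual ~ScoreboardHazardRecognizer() {}
      virtual void EmitInstruction(SUnit *) { /* advance the scoreboard */ }
    };

    struct PPCScoreboardHazardRecognizer : ScoreboardHazardRecognizer {
      void EmitInstruction(SUnit *SU) override {
        if (!SU->InstrDesc)
          return; // PPC pseudos used to lack an InstrDesc; they no longer do
        ScoreboardHazardRecognizer::EmitInstruction(SU);
      }
    };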
-
Rafael Espindola authored
No functionality change. llvm-svn: 196170
-
Rafael Espindola authored
llvm-svn: 196169
-