Commits · 01cfa94212a1e352145319554047aea3a637d709 · Lorenzo Albano / LLVM bpEVL

Dec 05, 2013

Apply transformation on OS X 10.9+ and iOS 7.0+: pow(10, x) -> __exp10(x) · 01cfa942
Yi Jiang authored Dec 05, 2013
```
llvm-svn: 196544
```
01cfa942
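
The platform gate named in the commit title above can be pictured as a plain version check. This is only a sketch: `TargetOS`, `hasExp10` and its parameters are hypothetical names chosen for illustration, not LLVM's actual API, and the real check in the patch is not shown in this log.

```cpp
// Hypothetical OS tag; LLVM's own triple handling is not shown in this log.
enum class TargetOS { MacOSX, IOS, Other };

// Returns true when, per the commit title, pow(10, x) may be rewritten to
// __exp10(x): OS X 10.9 or later, or iOS 7.0 or later.
constexpr bool hasExp10(TargetOS os, unsigned major, unsigned minor) {
    return os == TargetOS::MacOSX ? (major > 10 || (major == 10 && minor >= 9))
         : os == TargetOS::IOS    ? (major >= 7)
                                  : false;
}
```

With this sketch, `hasExp10(TargetOS::MacOSX, 10, 8)` is false, so the call would be left as pow(10, x).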

Renato Golin authored Dec 05, 2013

Test is platform independent, but I don't want to force vector-width, or
that could spoil the pragma test.

llvm-svn: 196539

e593fea5

Add #pragma vectorize enable/disable to LLVM · 729a3ae9

Renato Golin authored Dec 05, 2013

The intended behaviour is to force vectorization on the presence
of the flag (either turn on or off), and to continue the behaviour
as expected in its absence. Tests were added to make sure that all
cases are covered in opt. No tests were added in other tools with
the assumption that they should use the PassManagerBuilder in the
same way.

This patch also removes the outdated -late-vectorize flag, which was
on by default and not helping much.

The pragma metadata is being attached to the same place as other loop
metadata, but nothing forbids one from attaching it to a function
(to enable #pragma optimize) or basic blocks (to hint the basic-block
vectorizers), etc. The logic should be the same all around.

Patches to Clang to produce the metadata will be produced after the
initial implementation is agreed upon and committed. Patches to other
vectorizers (such as SLP and BB) will be added once we're happy with
the pass manager changes.

llvm-svn: 196537

729a3ae9
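
The behaviour described in the pragma commit above (force on, force off, or fall through to the default) amounts to a three-way decision. A minimal sketch, assuming a hypothetical `VectorizeHint` enum and `shouldVectorize` helper; the actual metadata encoding and PassManagerBuilder wiring are not shown in this log.

```cpp
// Hypothetical tri-state hint; the real metadata representation is not shown here.
enum class VectorizeHint { Unspecified, Enable, Disable };

// An explicit hint wins in either direction; without one, the existing
// default decision is kept unchanged.
constexpr bool shouldVectorize(VectorizeHint hint, bool defaultDecision) {
    return hint == VectorizeHint::Enable  ? true
         : hint == VectorizeHint::Disable ? false
                                          : defaultDecision;
}
```

The point of the third state is that loops without the pragma behave exactly as they did before the patch.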

llvm-cov: Changed extension from .llcov to .gcov. · 9af3938b
Yuchen Wu authored Dec 05, 2013
```
llvm-svn: 196530
```
9af3938b

MI-Sched: handle latency of in-order operations with the new machine model. · 880e573d

Andrew Trick authored Dec 05, 2013

The per-operand machine model allows the target to define "unbuffered"
processor resources. This change is a quick, cheap way to model stalls
caused by the latency of operations that use such resources. This only
applies when the processor's micro-op buffer size is non-zero
(Out-of-Order). We can't precisely model in-order stalls during
out-of-order execution, but this is an easy and effective
heuristic. It benefits cortex-a9 scheduling when using the new
machine model, which is not yet on by default.

MI-Sched for armv7 was evaluated on Swift (and was left disabled only because
of a performance bug related to predication). However, we never
evaluated Cortex-A9 performance on MI-Sched in its current form. This
change adds MI-Sched functionality to reach performance goals on
A9. The only remaining change is to allow MI-Sched to run as a PostRA
pass.

I evaluated performance using a set of options to estimate the performance impact once MI sched is default on armv7:
-mcpu=cortex-a9 -disable-post-ra -misched-bench -scheditins=false

For a simple saxpy loop I see a 1.7x speedup. Here are the llvm-testsuite results:
(min run time over 2 runs, filtering tiny changes)

Speedups:
| Benchmarks/BenchmarkGame/recursive         |  52.39% |
| Benchmarks/VersaBench/beamformer           |  20.80% |
| Benchmarks/Misc/pi                         |  19.97% |
| Benchmarks/Misc/mandel-2                   |  19.95% |
| SPEC/CFP2000/188.ammp                      |  18.72% |
| Benchmarks/McCat/08-main/main              |  18.58% |
| Benchmarks/Misc-C++/Large/sphereflake      |  18.46% |
| Benchmarks/Olden/power                     |  17.11% |
| Benchmarks/Misc-C++/mandel-text            |  16.47% |
| Benchmarks/Misc/oourafft                   |  15.94% |
| Benchmarks/Misc/flops-7                    |  14.99% |
| Benchmarks/FreeBench/distray               |  14.26% |
| SPEC/CFP2006/470.lbm                       |  14.00% |
| mediabench/mpeg2/mpeg2dec/mpeg2decode      |  12.28% |
| Benchmarks/SmallPT/smallpt                 |  10.36% |
| Benchmarks/Misc-C++/Large/ray              |   8.97% |
| Benchmarks/Misc/fp-convert                 |   8.75% |
| Benchmarks/Olden/perimeter                 |   7.10% |
| Benchmarks/Bullet/bullet                   |   7.03% |
| Benchmarks/Misc/mandel                     |   6.75% |
| Benchmarks/Olden/voronoi                   |   6.26% |
| Benchmarks/Misc/flops-8                    |   5.77% |
| Benchmarks/Misc/matmul_f64_4x4             |   5.19% |
| Benchmarks/MiBench/security-rijndael       |   5.15% |
| Benchmarks/Misc/flops-6                    |   5.10% |
| Benchmarks/Olden/tsp                       |   4.46% |
| Benchmarks/MiBench/consumer-lame           |   4.28% |
| Benchmarks/Misc/flops-5                    |   4.27% |
| Benchmarks/mafft/pairlocalalign            |   4.19% |
| Benchmarks/Misc/himenobmtxpa               |   4.07% |
| Benchmarks/Misc/lowercase                  |   4.06% |
| SPEC/CFP2006/433.milc                      |   3.99% |
| Benchmarks/tramp3d-v4                      |   3.79% |
| Benchmarks/FreeBench/pifft                 |   3.66% |
| Benchmarks/Ptrdist/ks                      |   3.21% |
| Benchmarks/Adobe-C++/loop_unroll           |   3.12% |
| SPEC/CINT2000/175.vpr                      |   3.12% |
| Benchmarks/nbench                          |   2.98% |
| SPEC/CFP2000/183.equake                    |   2.91% |
| Benchmarks/Misc/perlin                     |   2.85% |
| Benchmarks/Misc/flops-1                    |   2.82% |
| Benchmarks/Misc-C++-EH/spirit              |   2.80% |
| Benchmarks/Misc/flops-2                    |   2.77% |
| Benchmarks/NPB-serial/is                   |   2.42% |
| Benchmarks/ASC_Sequoia/CrystalMk           |   2.33% |
| Benchmarks/BenchmarkGame/n-body            |   2.28% |
| Benchmarks/SciMark2-C/scimark2             |   2.27% |
| Benchmarks/Olden/bh                        |   2.03% |
| skidmarks10/skidmarks                      |   1.81% |
| Benchmarks/Misc/flops                      |   1.72% |

Slowdowns:
| Benchmarks/llubenchmark/llu                | -14.14% |
| Benchmarks/Polybench/stencils/seidel-2d    |  -5.67% |
| Benchmarks/Adobe-C++/functionobjects       |  -5.25% |
| Benchmarks/Misc-C++/oopack_v1p8            |  -5.00% |
| Benchmarks/Shootout/hash                   |  -2.35% |
| Benchmarks/Prolangs-C++/ocean              |  -2.01% |
| Benchmarks/Polybench/medley/floyd-warshall |  -1.98% |
| Polybench/linear-algebra/kernels/3mm       |  -1.95% |
| Benchmarks/McCat/09-vor/vor                |  -1.68% |

llvm-svn: 196516

880e573d

SLPVectorizer: An in-tree vectorized entry cannot also be a scalar external use · 7ee53cac

Arnold Schwaighofer authored Dec 05, 2013

We were creating external uses for scalar values in MustGather entries that also
had a ScalarToTreeEntry (they are also present in a vectorized tuple). This
meant we would keep a value 'alive' both as a scalar and as a vector, causing havoc.
This is not necessary because when we create a MustGather vector we explicitly
create external uses entries for the insertelement instructions of the
MustGather vector elements.

Fixes PR18129.

radar://15582184

llvm-svn: 196508

7ee53cac

[tsan] fix PR18146: sometimes a variable written into vptr could have an... · 2460c3fc

Kostya Serebryany authored Dec 05, 2013

[tsan] fix PR18146: sometimes a variable written into vptr could have an integer type (after other optimizations)

llvm-svn: 196507

2460c3fc

[NVPTX] Fix off-by-one error when creating the VT list for an SDNode · 4459717b
Justin Holewinski authored Dec 05, 2013
```
llvm-svn: 196503
```
4459717b

[mips] Small code generation improvement for conditional operator (select) · a6beac1a

Matheus Almeida authored Dec 05, 2013

in case the operands are constants and their difference is |1|.
It should be possible in those cases to rematerialize the result using
MIPS's slt and similar instructions.

The small update to some of the tests in cmov.ll, sel1c.ll and sel2c.ll was needed
otherwise the optimization implemented in this patch would have been triggered
(difference between the operands was 1) and that would have changed the semantics
of the tests.

llvm-svn: 196498

a6beac1a
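
The idea behind the select improvement above can be illustrated with a small identity: when the two constant operands differ by 1, the select equals the smaller constant plus the 0/1 result of a set-on-less-than. This is a sketch under that reading; `slt`, `selectForm` and `sltForm` are hypothetical helper names, and the exact patterns matched by the patch are not shown in this log.

```cpp
#include <cstdint>

// Models a set-on-less-than style compare: yields 1 if a < b, else 0.
constexpr int32_t slt(int32_t a, int32_t b) { return a < b ? 1 : 0; }

// The original conditional form with constants c + 1 and c.
constexpr int32_t selectForm(int32_t a, int32_t b, int32_t c) {
    return a < b ? c + 1 : c;
}

// The branch-free form: the compare result is added to the smaller constant.
constexpr int32_t sltForm(int32_t a, int32_t b, int32_t c) {
    return c + slt(a, b);
}
```

Both forms agree for any `c` where `c + 1` does not overflow.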

[mips][msa] Fix issue with immediate fields of LD/ST instructions · 6b59c449

Matheus Almeida authored Dec 05, 2013

not being correctly encoded/decoded.
In more detail, immediate fields of LD/ST instructions should be
divided/multiplied by the size of the data format before encoding and
after decoding, respectively.

llvm-svn: 196494

6b59c449
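
The scaling rule in the MSA commit above (divide before encoding, multiply after decoding) can be sketched as follows. The helper names and the divisibility check are assumptions made for illustration; field widths and the real encoder hooks are not shown in this log.

```cpp
#include <cstdint>

// A byte offset is representable only if it is a multiple of the element size.
constexpr bool isEncodable(int32_t byteOffset, int32_t elemSize) {
    return elemSize > 0 && byteOffset % elemSize == 0;
}

// Before encoding: the immediate field holds the offset in units of elements.
constexpr int32_t encodeImm(int32_t byteOffset, int32_t elemSize) {
    return byteOffset / elemSize;
}

// After decoding: scale the field back up to a byte offset.
constexpr int32_t decodeImm(int32_t field, int32_t elemSize) {
    return field * elemSize;
}
```

For a 4-byte data format, a byte offset of 16 would be stored as 4 and read back as 16.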

ARM: fix yet another stack-folding bug · e4def5e2

Tim Northover authored Dec 05, 2013

We were trying to fold the stack adjustment into the wrong instruction in the
situation where the entire basic-block was epilogue code. Really, it can only
ever be valid to do the folding precisely where the "add sp, ..." would be
placed so there's no need for a separate iterator to track that.

Should fix PR18136.

llvm-svn: 196493

e4def5e2

Correct word hyphenations · f907b891

Alp Toker authored Dec 05, 2013

This patch tries to avoid unrelated changes other than fixing a few
hyphen-related ambiguities and contractions in nearby lines.

llvm-svn: 196471

f907b891

Hide the stub created for MO_ExternalSymbol too. · 01d19d02

Rafael Espindola authored Dec 05, 2013

given

declare void @llvm.memset.p0i8.i32(i8* nocapture, i8, i32, i32, i1)
declare void @foo()
define void @bar() {
  call void @foo()
  call void @llvm.memset.p0i8.i32(i8* null, i8 0, i32 188, i32 1, i1 false)
  ret void
}

We used to produce

L_foo$stub:
        .indirect_symbol        _foo
        .ascii  "\364\364\364\364\364"

_memset$stub:
        .indirect_symbol        _memset
        .ascii  "\364\364\364\364\364"

We now produce a private stub for memset too.

Stubs are not needed with recent linkers, but we still produce them for darwin8.

Thanks to David Fang for confirming that gcc used to do this too.

llvm-svn: 196468

01d19d02

R600/SI: Add comments for number of used registers. · 89cc49fe
Matt Arsenault authored Dec 05, 2013
```
llvm-svn: 196467
```
89cc49fe
Move llvm/test/MC/ELF/thumb-st_other.s to test/MC/ARM. · 57b20a7e
NAKAMURA Takumi authored Dec 05, 2013
```
llvm-svn: 196457
```
57b20a7e
For AArch64, add missing register cost calculation for big value types like v4i64 and v8i64. · 65d8e342
Jiangning Liu authored Dec 05, 2013
```
llvm-svn: 196456
```
65d8e342
Add FileCheck statements for r196435. · 164097a6
Cameron McInally authored Dec 05, 2013
```
llvm-svn: 196449
```
164097a6
Make these two tests resilient in the face of compile unit size · c4dd56b9
Eric Christopher authored Dec 05, 2013
```
changes.

llvm-svn: 196444
```
c4dd56b9

[mc] Fix ELF st_other flag. · ee36595c

Logan Chien authored Dec 05, 2013

ELF_Other_Weakref and ELF_Other_ThumbFunc seem to be LLVM
internal ELF symbol flags.  These should not be emitted to
the object file.

This commit defines ELF_STO_Shift for the target-defined
flags for st_other, and increases the value of
ELF_Other_Shift to 16.

llvm-svn: 196440

ee36595c

Add AVX512 patterns for v16i32 broadcast and v2i64 zero extend load. · 30bbb214
Cameron McInally authored Dec 05, 2013
```
Patch by Aleksey Bader.

llvm-svn: 196435
```
30bbb214

Fix a bug in darwin's 32-bit X86 handling of evaluating fixups. · 86496a45

Kevin Enderby authored Dec 04, 2013

Where it would use a scattered relocation entry but falls back to a
normal relocation entry because the FixupOffset is more than 24-bits.

The bug is in the X86MachObjectWriter::RecordScatteredRelocation() where
it changes reference parameter FixedValue but then returns false to indicate
it did not create a scattered relocation entry.  The fix is simply to save the
original value of the parameter FixedValue at the start of the method and
restore it if we are returning false in that case.

rdar://15526046

llvm-svn: 196432

86496a45
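
The fix described above is a save-and-restore of an in/out parameter on the failure path. A minimal sketch of that pattern, assuming C++14 or later: `recordScattered` is a hypothetical stand-in for the real method, and the adjustment applied to `fixedValue` is a placeholder, not the actual relocation arithmetic.

```cpp
#include <cstdint>

// Hypothetical stand-in for the scattered-relocation routine.
constexpr bool recordScattered(uint32_t fixupOffset, uint64_t &fixedValue) {
    const uint64_t saved = fixedValue;  // remember the caller's value on entry
    fixedValue += 0x10;                 // placeholder for the scattered adjustment
    if (fixupOffset > 0xffffffu) {      // offset does not fit in 24 bits
        fixedValue = saved;             // undo the change before reporting failure
        return false;                   // caller falls back to a normal relocation
    }
    return true;
}

// Convenience wrapper showing what the caller sees after one attempt.
constexpr uint64_t valueAfterAttempt(uint32_t fixupOffset, uint64_t fixedValue) {
    (void)recordScattered(fixupOffset, fixedValue);
    return fixedValue;
}
```

Without the restore, the fallback path would start from an already modified value.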

Dec 04, 2013

Add support for parsing ARM symbol variants on ELF targets · 8ad70b35

David Peixotto authored Dec 04, 2013

ARM symbol variants are written with parens instead of @ like this:

  .word __GLOBAL_I_a(target1)

This commit adds support for parsing these symbol variants in
expressions. We introduce a new flag to MCAsmInfo that indicates the
parser should use parens to parse the symbol variant. The expression
parser is modified to look for symbol variants using parens instead
of @ when the corresponding MCAsmInfo flag is true.

The MCAsmInfo parens flag is enabled only for ARM on ELF.

By adding this flag to MCAsmInfo, we are able to get rid of
redundant ARM-specific symbol variants and use the generic variants
instead (e.g. VK_GOT instead of VK_ARM_GOT). We use the new
UseParensForSymbolVariant attribute in MCAsmInfo to correctly print
the symbol variants for ARM.

To achieve this we need to keep a handle to the MCAsmInfo in the
MCSymbolRefExpr class that we can check when printing the symbol
variant.

Updated Tests:
  Changed case of symbol variant to match the generic kind.
  test/CodeGen/ARM/tls-models.ll
  test/CodeGen/ARM/tls1.ll
  test/CodeGen/ARM/tls2.ll
  test/CodeGen/Thumb2/tls1.ll
  test/CodeGen/Thumb2/tls2.ll

PR18080

llvm-svn: 196424

8ad70b35

DebugInfo: Improve test to use llvm-dwarfdump · 6a439adf
David Blaikie authored Dec 04, 2013
```
llvm-svn: 196396
```
6a439adf
Test fix for r196394 · 2deb6fd6
David Blaikie authored Dec 04, 2013
```
llvm-svn: 196395
```
2deb6fd6
Fix assembly syntax for AVX512 vector blend instructions. · cbb51dac
Cameron McInally authored Dec 04, 2013
```
llvm-svn: 196393
```
cbb51dac

Suppress '(x < y) ? a : 0 -> (x < y) & a' transform on X86 architectures with... · c5f420e1

Cameron McInally authored Dec 04, 2013

Suppress '(x < y) ? a : 0 -> (x < y) & a' transform on X86 architectures with dedicated mask registers.

Patch by Aleksey Bader.

llvm-svn: 196386

c5f420e1
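
The transform named in the commit above is valid when the compare yields an all-ones or all-zeros value of the same width as the data, which the sketch below models on a single 32-bit lane. Presumably it is suppressed on targets with dedicated mask registers because the compare result there is a mask bit rather than a full-width value; that reading is an inference, not something stated in the log. All function names are hypothetical.

```cpp
#include <cstdint>

// Models a vector-style compare on one lane: all ones if x < y, else zero.
constexpr uint32_t cmpMask(int32_t x, int32_t y) {
    return x < y ? 0xffffffffu : 0u;
}

// The select form: (x < y) ? a : 0.
constexpr uint32_t selectForm(int32_t x, int32_t y, uint32_t a) {
    return x < y ? a : 0u;
}

// The AND form: (x < y) & a, using the full-width compare mask.
constexpr uint32_t andForm(int32_t x, int32_t y, uint32_t a) {
    return cmpMask(x, y) & a;
}
```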

Un-revert r196358: "llvm-cov: Added support for function checksums." · 87a24d5c
Daniel Jasper authored Dec 04, 2013
```
And add the proper fix.

llvm-svn: 196367
```
87a24d5c

Revert r196358: "llvm-cov: Added support for function checksums." · c176b5d1

Daniel Jasper authored Dec 04, 2013

This currently breaks clang/test/CodeGen/code-coverage.c. The root cause
is that the newly introduced access to Funcs[j] is out of bounds.

llvm-svn: 196365

c176b5d1

[AArch64 Neon] Add ACLE intrinsic vceqz_f64. · afd095de
Kevin Qin authored Dec 04, 2013
```
llvm-svn: 196362
```
afd095de
[AArch64 NEON] Add missing compare intrinsics. · f9832e8d
Kevin Qin authored Dec 04, 2013
```
llvm-svn: 196360
```
f9832e8d

llvm-cov: Added support for function checksums. · 06655f35

Yuchen Wu authored Dec 04, 2013

The function checksums are hashed from the concatenation of the function
name and line number.

llvm-svn: 196358

06655f35
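
As an illustration of hashing "the concatenation of the function name and line number", the sketch below feeds the name bytes followed by the decimal digits of the line into 32-bit FNV-1a. The choice of FNV-1a is an assumption made purely for illustration, since the log does not say which hash llvm-cov uses; `functionChecksum` and `fnv1aStep` are hypothetical names. Requires C++14 or later.

```cpp
#include <cstdint>

// One FNV-1a step: xor in a byte, then multiply by the 32-bit FNV prime.
constexpr uint32_t fnv1aStep(uint32_t h, unsigned char b) {
    return (h ^ b) * 16777619u;
}

// Hashes name followed by the decimal text of line (illustrative hash only).
constexpr uint32_t functionChecksum(const char *name, uint32_t line) {
    uint32_t h = 2166136261u;  // FNV-1a 32-bit offset basis
    for (const char *p = name; *p != '\0'; ++p)
        h = fnv1aStep(h, static_cast<unsigned char>(*p));

    // Collect the decimal digits of line, least significant first.
    char digits[10] = {};
    int n = 0;
    do {
        digits[n++] = static_cast<char>('0' + line % 10);
        line /= 10;
    } while (line != 0);

    // Feed them most significant first, as they would appear in text.
    while (n > 0)
        h = fnv1aStep(h, static_cast<unsigned char>(digits[--n]));
    return h;
}
```

The same name and line always give the same checksum, while changing either the name or the line changes the bytes that are hashed.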

Produce deterministic coff files. · 9201e6b5
Rafael Espindola authored Dec 04, 2013
```
llvm-svn: 196341
```
9201e6b5
Add -mcpu=core2 to all llc invocations in this test. · 757ae647
Rafael Espindola authored Dec 04, 2013
```
Should fix the atom buildbot.

llvm-svn: 196340
```
757ae647
[Stackmap] Specify the triple and cpu to fix the unit test. · c4c9b371
Juergen Ributzka authored Dec 04, 2013
```
llvm-svn: 196339
```
c4c9b371
[Stackmap] Emit multi-byte nops for X86. · 17e0d9ee
Juergen Ributzka authored Dec 04, 2013
```
llvm-svn: 196334
```
17e0d9ee

final patch for very long conditional branches for mips16 constant islands. · 59975c2c

Reed Kotler authored Dec 03, 2013

this completes the basic port of ARM constant islands to Mips16.
More testing, code review, cleanup is in order but basically everything
seems to be working. A bug in gas is preventing some of the runtime
testing but I hope to resolve this soon.

llvm-svn: 196331

59975c2c

check-llvm: Ask llvm-config about assertion mode, instead of llc. · 303f0f5a
NAKAMURA Takumi authored Dec 03, 2013
```
Add --assertion-mode to llvm-config. It emits ON or OFF according to NDEBUG.

llvm-svn: 196329
```
303f0f5a
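
A minimal sketch of how an ON/OFF string can be derived from NDEBUG, as the commit above describes for --assertion-mode. `assertionMode` is a hypothetical name and this is not llvm-config's actual source.

```cpp
// Reports "OFF" when NDEBUG is defined (assertions compiled out), else "ON".
constexpr const char *assertionMode() {
#ifdef NDEBUG
    return "OFF";
#else
    return "ON";
#endif
}
```

Because the value is fixed at build time, a test harness can query the tool once instead of probing another binary's behaviour.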

Dec 03, 2013

Use CHECK-LABEL to make this test more strict. · f4d34153
Rafael Espindola authored Dec 03, 2013
```
llvm-svn: 196321
```
f4d34153

Fix mingw32 thiscall + sret. · 0a2baf8e

Rafael Espindola authored Dec 03, 2013

Unlike msvc, when handling a thiscall + sret gcc will
* Put the sret in %ecx
* Put the this pointer in (%esp)

This fixes, for example, calling stringstream::str.

llvm-svn: 196312

0a2baf8e

llvm-cov: Another fix to llvm-cov test. · c8e0f81a

Yuchen Wu authored Dec 03, 2013

Copy all test files to temporary directory, not just test.* files. Tests
didn't fail because the missing files occurred in XFAILS.

llvm-svn: 196305

c8e0f81a