Commits · fbfd173447b64d9d73c44cf47bafeb68e8139e5a · Roger Ferrer / llvm-epi

Oct 19, 2016

[Thumb-1] Synthesize TBB/TBH instructions to make use of compressed jump tables · fbfd1734

James Molloy authored Oct 19, 2016

The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1; however, it is possible to synthesize them from a sequence of other instructions.

It turns out this sequence is so short that it's almost never a loss for performance and is ALWAYS a significant win for code size.

TBB example:
Before: lsls r0, r0, #2    After: add  r0, pc
        adr  r1, .LJTI0_0         ldrb r0, [r0, #6]
        ldr  r0, [r0, r1]         lsls r0, r0, #1
        mov  pc, r0               add  pc, r0
  => No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4.

The only case that can increase dynamic instruction count is the TBH case:

Before: lsls r0, r4, #2    After: lsls r4, r4, #1
        adr  r1, .LJTI0_0         add  r4, pc
        ldr  r0, [r0, r1]         ldrh r4, [r4, #6]
        mov  pc, r0               lsls r4, r4, #1
                                  add  pc, r4
  => 1 more instruction in prologue. Jump table shrunk by a factor of 2.

So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!)
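
The size arithmetic behind the two examples can be sketched as follows. This is a toy model rather than the LLVM implementation: the function name `jump_table_bytes` and the sample offsets are made up for illustration. It assumes only that an uncompressed table stores one 32-bit address per case, while the compressed tables store halved offsets in one or two bytes (the `lsls ..., #1` in the sequences above doubles them back).

```python
def jump_table_bytes(offsets):
    """Return (entry_kind, table_size_in_bytes) for a list of branch-target
    byte offsets measured from some base. Offsets are halved before being
    stored, because Thumb instructions are 2-byte aligned."""
    halved = []
    for off in offsets:
        if off % 2:
            raise ValueError("Thumb targets must be 2-byte aligned")
        halved.append(off // 2)
    if max(halved) < 1 << 8:
        return "byte", len(halved) * 1      # TBB-style: one byte per entry
    if max(halved) < 1 << 16:
        return "halfword", len(halved) * 2  # TBH-style: two bytes per entry
    return "word", len(halved) * 4          # fall back to full addresses


offsets = [8, 20, 36, 64]                   # hypothetical case-block offsets
word_size = 4 * len(offsets)                # uncompressed: 4 bytes per entry
kind, size = jump_table_bytes(offsets)
print(kind, size, word_size // size)        # kind, compressed size, shrink factor
```

With every halved offset below 256 the table needs one byte per entry, which matches the factor of 4 quoted for the TBB case; once an offset no longer fits, the model falls back to halfwords and the factor drops to 2.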

llvm-svn: 284580

fbfd1734

[DAGCombiner] Just call isConstOrConstSplat directly. NFCI. · 7dcb6e57

Simon Pilgrim authored Oct 19, 2016

This will get the same ConstantSDNode scalar or vector splat value as the current separate dyn_cast<ConstantSDNode> / isVector() approach.

llvm-svn: 284578

7dcb6e57

Fix line endings · 9122ac99
Simon Pilgrim authored Oct 19, 2016
```
llvm-svn: 284576
```
9122ac99

[DAGCombine] Generalize distributeTruncateThroughAnd to work with any... · b2ca2505

Simon Pilgrim authored Oct 19, 2016

[DAGCombine] Generalize distributeTruncateThroughAnd to work with any non-opaque constant or constant vector
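
The identity such a combine relies on is that truncation distributes over bitwise AND. A quick check in Python, modelling fixed-width integers with masks (the helpers `trunc` and `check` and the sample values are illustrative and not part of LLVM):

```python
def trunc(value, bits):
    """Keep only the low `bits` bits, like an integer truncate."""
    return value & ((1 << bits) - 1)


def check(x, c, bits):
    # trunc(and(x, c)) versus and(trunc(x), trunc(c))
    return trunc(x & c, bits) == (trunc(x, bits) & trunc(c, bits))


samples = [
    (0x123456789ABCDEF0, 0x00FF00FF00FF00FF),
    (0xFFFFFFFFFFFFFFFF, 0x1F),
]
print(all(check(x, c, 32) for x, c in samples))
```

For a constant vector the same identity holds lane by lane, which is presumably why the combine can be extended from scalar constants to constant vectors.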

llvm-svn: 284574

b2ca2505

Revert of r284571 because of failing tests. · 3f5111d3
Sjoerd Meijer authored Oct 19, 2016
```
llvm-svn: 284572
```
3f5111d3

Checking FP function attribute values and adding more build attribute tests. · a3187792

Sjoerd Meijer authored Oct 19, 2016

This renames the function for checking FP function attribute values and also
adds more build attribute tests (which are in separate files because build
attributes are set per file).

Differential Revision: https://reviews.llvm.org/D25625

llvm-svn: 284571

a3187792

DenseSet: Appease msc18 to define derived constructors explicitly. · da9dc6ad

NAKAMURA Takumi authored Oct 19, 2016

msc18 doesn't recognize "using BaseT::BaseT;"

llvm\include\llvm/ADT/DenseSet.h(213) : error C2875: using-declaration causes a multiple declaration of 'BaseT'
llvm\include\llvm/ADT/DenseSet.h(214) : see reference to class template instantiation 'llvm::DenseSet<ValueT,ValueInfoT>' being compiled
llvm\include\llvm/ADT/DenseSet.h(231) : error C2875: using-declaration causes a multiple declaration of 'BaseT'
llvm\include\llvm/ADT/DenseSet.h(232) : see reference to class template instantiation 'llvm::SmallDenseSet<ValueT,InlineBuckets,ValueInfoT>' being compiled

llvm-svn: 284570

da9dc6ad

[AVX-512] Teach isel lowering that a subvector broadcast being inserted into... · a4dc340c

Craig Topper authored Oct 19, 2016

[AVX-512] Teach isel lowering that a subvector broadcast being inserted into both halves of a 512-bit vector can be combined into a larger subvector broadcast.

Summary:
This allows us to create broadcasts of 128-bit vector loads into 512-bit vectors.

New patterns added to support 8-bit and 16-bit vector types and v2f64/v2i64->v8f64/v8i64 without DQI instructions.

There are also fallback patterns for when the load can't be folded. These patterns are a little complex, as we first need to insert the lower 128 bits into the second 128 bits using a zmm subvector insert instruction. We need to use a zmm insert in case VLX isn't available. We then use another zmm subvector insert to take those 256 bits and insert them into the upper bits. Since we used a zmm insert to create the 256 bits, we also need to do an extract_subreg to get just the lower 256 bits to pass to the second insert.

The outer insert for the fallback patterns should have its type correct because eventually we should also support masked operations here. So we need a DQI and a NoDQI version of the v16f32/v16i32 patterns.
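
The equivalence behind the combine can be illustrated with plain lists standing in for vectors. This is only a lane-level model under the assumption of 32-bit lanes; `broadcast` and `insert_both_halves` are hypothetical helper names, not LLVM or intrinsic names.

```python
def broadcast(subvector, total_lanes):
    """Repeat `subvector` until it fills `total_lanes` lanes."""
    assert total_lanes % len(subvector) == 0
    return subvector * (total_lanes // len(subvector))


def insert_both_halves(half):
    """Build a vector whose lower and upper halves are both `half`."""
    return half + half


x128 = [10, 11, 12, 13]                             # four 32-bit lanes = 128 bits
two_step = insert_both_halves(broadcast(x128, 8))   # 128->256, then into both halves of 512
one_step = broadcast(x128, 16)                      # direct 128->512 broadcast
print(two_step == one_step)
```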

Reviewers: RKSimon, delena, igorb

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25651

llvm-svn: 284567

a4dc340c

Update the section.ll to fix non-x86 failure. · 4b41571d
Dehao Chen authored Oct 19, 2016
```
llvm-svn: 284566
```
4b41571d

Revert r284545 again as the regression in ppc still exists. There is bug in... · 95fc4314

Dehao Chen authored Oct 19, 2016

Revert r284545 again as the regression in ppc still exists. There is a bug in MBPI exposed by the patch.

Also update the section.ll to fix non-x86 failure.

llvm-svn: 284563

95fc4314

[asan] Replace std::to_string with llvm::to_string · 490fda33
Vitaly Buka authored Oct 19, 2016
```
llvm-svn: 284557
```
490fda33

[libFuzzer] extend -print_coverage to also print uncovered lines, functions, and files. · 95b1a434

Kostya Serebryany authored Oct 19, 2016

Example of output:
COVERAGE:
COVERED: in DSO2(int) /pathto/DSO2.cpp:6
COVERED: in DSO2(int) /pathto/DSO2.cpp:8
COVERED: in DSO1(int) /pathto/DSO1.cpp:6
COVERED: in DSO1(int) /pathto/DSO1.cpp:8
COVERED: in LLVMFuzzerTestOneInput /pathto/DSOTestMain.cpp:16
COVERED: in LLVMFuzzerTestOneInput /pathto/DSOTestMain.cpp:19
COVERED: in LLVMFuzzerTestOneInput /pathto/DSOTestMain.cpp:25
COVERED: in LLVMFuzzerTestOneInput /pathto/DSOTestMain.cpp:26
MODULE_WITH_COVERAGE: /pathto/libLLVMFuzzer-DSO1.so
UNCOVERED_LINE: in DSO1(int) /pathto/DSO1.cpp:9
UNCOVERED_FUNC: in Uncovered1()
MODULE_WITH_COVERAGE: /pathto/libLLVMFuzzer-DSO2.so
UNCOVERED_LINE: in DSO2(int) /pathto/DSO2.cpp:9
UNCOVERED_FUNC: in Uncovered2()
MODULE_WITH_COVERAGE: /pathto/LLVMFuzzer-DSOTest
UNCOVERED_LINE: in LLVMFuzzerTestOneInput /pathto/DSOTestMain.cpp:21
UNCOVERED_LINE: in LLVMFuzzerTestOneInput /pathto/DSOTestMain.cpp:27
UNCOVERED_FILE: /pathto/DSOTestExtra.cpp
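
A small script can turn output of this shape into a per-module summary. The sketch below is not part of libFuzzer; the function name `summarize` is made up, and it assumes that each `UNCOVERED_*` entry belongs to the most recent `MODULE_WITH_COVERAGE` line, which the sample suggests but the commit message does not state.

```python
from collections import defaultdict


def summarize(report):
    """Count COVERED lines and group UNCOVERED_* entries by module."""
    covered = 0
    uncovered = defaultdict(list)
    module = None
    for line in report.splitlines():
        tag, _, rest = line.partition(": ")
        if tag == "COVERED":
            covered += 1
        elif tag == "MODULE_WITH_COVERAGE":
            module = rest
        elif tag.startswith("UNCOVERED_"):
            uncovered[module].append((tag, rest))
    return covered, dict(uncovered)


# A shortened excerpt of the sample output above.
report = """COVERAGE:
COVERED: in DSO1(int) /pathto/DSO1.cpp:6
MODULE_WITH_COVERAGE: /pathto/libLLVMFuzzer-DSO1.so
UNCOVERED_LINE: in DSO1(int) /pathto/DSO1.cpp:9
UNCOVERED_FUNC: in Uncovered1()
MODULE_WITH_COVERAGE: /pathto/LLVMFuzzer-DSOTest
UNCOVERED_FILE: /pathto/DSOTestExtra.cpp"""

covered, uncovered = summarize(report)
print(covered, sorted(uncovered))
```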

Several things are not perfect here:
* we are using objdump+awk instead of sancov because sancov does not support DSOs yet.
* this breaks in the presence of ASAN_OPTIONS=strip_path_prefix=...
  (need to implement another API to get the module name by PC)

llvm-svn: 284554

95b1a434

[asan] Simplify calculation of stack frame layout extraction calculation of... · 5910a925

Vitaly Buka authored Oct 18, 2016

[asan] Simplify calculation of stack frame layout by extracting calculation of stack description into a separate function.

Reviewers: eugenis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25754

llvm-svn: 284547

5910a925

[asan] Append line number to variable name if line is available and in the... · d88e5201

Vitaly Buka authored Oct 18, 2016

[asan] Append line number to variable name if line is available and in the same file as the function.

PR30498

Reviewers: eugenis

Differential Revision: https://reviews.llvm.org/D25715

llvm-svn: 284546

d88e5201

Using branch probability to guide critical edge splitting. · f8ac3d26

Dehao Chen authored Oct 18, 2016

Summary:
The original heuristic for breaking a critical edge during machine sink is relatively conservative: when only one instruction is sinkable to the critical edge, the machine sink pass is unlikely to break the edge. This leads to many speculative instructions being executed at runtime. However, with profile info, we can model the benefit of splitting: if the critical edge has a 50% taken rate, it would always be beneficial to split the critical edge to avoid the speculated runtime instructions. This patch uses profile info to guide critical edge splitting in the machine sink pass.

The performance impact on speccpu2006 on Intel sandybridge machines:

spec/2006/fp/C++/444.namd 25.3 +0.26%
spec/2006/fp/C++/447.dealII 45.96 -0.10%
spec/2006/fp/C++/450.soplex 41.97 +1.49%
spec/2006/fp/C++/453.povray 36.83 -0.96%
spec/2006/fp/C/433.milc 23.81 +0.32%
spec/2006/fp/C/470.lbm 41.17 +0.34%
spec/2006/fp/C/482.sphinx3 48.13 +0.69%
spec/2006/int/C++/471.omnetpp 22.45 +3.25%
spec/2006/int/C++/473.astar 21.35 -2.06%
spec/2006/int/C++/483.xalancbmk 36.02 -2.39%
spec/2006/int/C/400.perlbench 33.7 -0.17%
spec/2006/int/C/401.bzip2 22.9 +0.52%
spec/2006/int/C/403.gcc 32.42 -0.54%
spec/2006/int/C/429.mcf 39.59 +0.19%
spec/2006/int/C/445.gobmk 26.98 -0.00%
spec/2006/int/C/456.hmmer 24.52 -0.18%
spec/2006/int/C/458.sjeng 28.26 +0.02%
spec/2006/int/C/462.libquantum 55.44 +3.74%
spec/2006/int/C/464.h264ref 46.67 -0.39%

geometric mean +0.20%

Manually checked 473 and 471 to verify the diff is in the noise range.

Reviewers: rengolin, davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24818

llvm-svn: 284545

f8ac3d26

revert r284541. · 62d0e64e
Dehao Chen authored Oct 18, 2016
```
llvm-svn: 284544
```
62d0e64e

Oct 18, 2016

Conditionally eliminate library calls where the result value is not used · 1c0e9b97

Rong Xu authored Oct 18, 2016

Summary:
This pass shrink-wraps a condition to some library calls where the call
result is not used. For example:
   sqrt(val);
 is transformed to
   if (val < 0)
     sqrt(val);
Even if the result of the library call is not used, the compiler cannot
safely delete the call because the function can set errno on error
conditions.
Note that in many functions, the error condition depends solely on the incoming
parameter. In this optimization, we generate the condition that can lead to
errno being set and use it to shrink-wrap the call. Since the chance of hitting
the error condition is low, the runtime call is effectively eliminated.
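
The effect can be imitated in a few lines of Python. This is a behavioural model only, not the pass itself: `lib_sqrt` is a stand-in for C's `sqrt`, `errno_value` is a stand-in for `errno`, and the names `original`, `shrink_wrapped` and `run` are invented for the illustration.

```python
import math
from errno import EDOM

calls = 0        # how many times the "library call" actually ran
errno_value = 0  # stand-in for C's errno


def lib_sqrt(val):
    """Toy sqrt that reports a domain error through errno_value."""
    global calls, errno_value
    calls += 1
    if val < 0:
        errno_value = EDOM
        return math.nan
    return math.sqrt(val)


def original(val):
    lib_sqrt(val)        # result unused, but the call may set errno


def shrink_wrapped(val):
    if val < 0:          # only inputs that can set errno reach the call
        lib_sqrt(val)


def run(fn, inputs):
    """Return (errno_value after the run, number of library calls made)."""
    global calls, errno_value
    calls, errno_value = 0, 0
    for v in inputs:
        fn(v)
    return errno_value, calls


inputs = [4.0, 9.0, 2.25, -1.0]
print(run(original, inputs), run(shrink_wrapped, inputs))
```

Both versions leave the same error state behind, but the shrink-wrapped one reaches the call only for the single negative input.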

These partially dead calls are usually the result of the C++ abstraction penalty
exposed by inlining. This optimization hits 108 times in 19 C/C++ programs
in SPEC2006.

Reviewers: hfinkel, mehdi_amini, davidxl

Subscribers: modocache, mgorny, mehdi_amini, xur, llvm-commits, beanz

Differential Revision: https://reviews.llvm.org/D24414

llvm-svn: 284542

1c0e9b97

Using branch probability to guide critical edge splitting. · ea62ae98

Dehao Chen authored Oct 18, 2016

The performance impact on speccpu2006 on Intel sandybridge machines:

geometric mean +0.20%

Manually checked 473 and 471 to verify the diff is in the noise range.

Reviewers: rengolin, davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24818

llvm-svn: 284541

ea62ae98

dwarfdump: add space missing from the type unit header description · 69494a98
David Blaikie authored Oct 18, 2016
```
llvm-svn: 284540
```
69494a98
dwarfdump: Include the name in the unit description, even in non-summarized mode · e4c3915a
David Blaikie authored Oct 18, 2016
```
(accidentally removed this from my previous change when I was rejecting
some clang-format formatting... )

llvm-svn: 284539
```
e4c3915a
Add target for test to fix regression introduced by r284533. · 83033e0b
Dehao Chen authored Oct 18, 2016
```
llvm-svn: 284538
```
83033e0b

dwarfdump: -summarize-types: print a short summary (unqualified type name,... · 50cc27ec

David Blaikie authored Oct 18, 2016

dwarfdump: -summarize-types: print a short summary (unqualified type name, hash, length) of type units rather than dumping contents

This is just a quick utility handy for getting rough summaries of types
in a given object or dwo file. I've been using it to investigate the
amount of type info redundancy across a project build, for example.

llvm-svn: 284537

50cc27ec

Improve ARM lowering for "icmp <2 x i64> eq". · c0a717ba

Eli Friedman authored Oct 18, 2016

The custom lowering is pretty straightforward: basically, just AND
together the two halves of a <4 x i32> compare.
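
The idea can be modelled lane by lane in Python. This sketch shows only the logic; how the two halves are paired up in real NEON code is not modelled, and all function names are illustrative.

```python
def split_i64(v):
    """Split a 64-bit value into (low, high) 32-bit halves."""
    return v & 0xFFFFFFFF, (v >> 32) & 0xFFFFFFFF


def icmp_eq_v4i32(a, b):
    """Lane-wise compare; each lane is all-ones (true) or zero (false)."""
    return [0xFFFFFFFF if x == y else 0 for x, y in zip(a, b)]


def icmp_eq_v2i64(a, b):
    # Reinterpret <2 x i64> as <4 x i32>.
    a32 = [h for v in a for h in split_i64(v)]
    b32 = [h for v in b for h in split_i64(v)]
    m = icmp_eq_v4i32(a32, b32)
    # AND each 32-bit result with its partner half, so a 64-bit lane is
    # true only if both of its halves matched.
    out = []
    for i in range(0, 4, 2):
        both = m[i] & m[i + 1]
        out.append(both | (both << 32))
    return out


a = [0x1111111122222222, 0x3333333344444444]
b = [0x1111111122222222, 0x3333333399999999]
print([hex(x) for x in icmp_eq_v2i64(a, b)])
```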

Differential Revision: https://reviews.llvm.org/D25713

llvm-svn: 284536

c0a717ba

[GVN] Consistently use division instead of shift. NFCI. · 36efa684
Davide Italiano authored Oct 18, 2016
```
This is in line with other places of GVN (e.g. load coercion
logic).

llvm-svn: 284535
```
36efa684
[GVN] Remove dead code. NFC. · 64cd985e
Davide Italiano authored Oct 18, 2016
```
llvm-svn: 284534
```
64cd985e

Use profile info to set function section prefix to group hot/cold functions. · 302b69c9

Dehao Chen authored Oct 18, 2016

Summary:
The original implementation is in r261607, which was reverted in r269726 to accommodate the ProfileSummaryInfo analysis pass. The new implementation:
1. add a new metadata for function section prefix
2. query against ProfileSummaryInfo in CGP to set the correct section prefix for each function
3. output the section prefix set by CGP

Reviewers: davidxl, eraman

Subscribers: vsk, llvm-commits

Differential Revision: https://reviews.llvm.org/D24989

llvm-svn: 284533

302b69c9

[AArch64] Fix test triplet · 4dd6c68d
Evandro Menezes authored Oct 18, 2016
```
llvm-svn: 284532
```
4dd6c68d

[AArch64] Avoid materializing 0.0 when generating FP SELECT · ce8d6015

Evandro Menezes authored Oct 18, 2016

Transform `a == 0.0 ? 0.0 : x` to `a == 0.0 ? a : x` and `a != 0.0 ? x : 0.0`
to `a != 0.0 ? x : a` to avoid materializing 0.0 for FCSEL, since it does not
have to be materialized beforehand for FCMP, as it has a form that has 0.0
as an implicit operand.
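
The two rewrites can be checked on ordinary values with a short Python model (the function names are invented for this illustration). The last lines also show a signed-zero corner case: `-0.0 == 0.0` is true, so the rewritten form returns `-0.0` where the original returns `+0.0`. The commit message does not say how that case is handled.

```python
import math


def select_original_eq(a, x):
    return 0.0 if a == 0.0 else x


def select_rewritten_eq(a, x):
    return a if a == 0.0 else x      # reuse `a` instead of a fresh 0.0


def select_original_ne(a, x):
    return x if a != 0.0 else 0.0


def select_rewritten_ne(a, x):
    return x if a != 0.0 else a


values = [0.0, 1.5, -3.0, math.inf]
same = all(
    select_original_eq(a, 7.0) == select_rewritten_eq(a, 7.0)
    and select_original_ne(a, 7.0) == select_rewritten_ne(a, 7.0)
    for a in values
)
print(same)

# Signed zero: the results compare equal but differ in sign.
print(math.copysign(1.0, select_original_eq(-0.0, 7.0)),
      math.copysign(1.0, select_rewritten_eq(-0.0, 7.0)))
```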

Differential Revision: https://reviews.llvm.org/D24808

llvm-svn: 284531

ce8d6015

One more additional error check for invalid Mach-O files for a · 89baf99c

Kevin Enderby authored Oct 18, 2016

load command that uses the MachO::linkedit_data_command
type, which is not used in llvm libObject code but is used in llvm tool code.

This is for the LC_CODE_SIGNATURE load command.

llvm-svn: 284529

89baf99c

GlobalISel: translate the @llvm.objectsize intrinsic. · 6e904300
Tim Northover authored Oct 18, 2016
```
llvm-svn: 284527
```
6e904300

GlobalISel: select small binary operations on AArch64. · 55782222

Tim Northover authored Oct 18, 2016

AArch64 actually supports many 8-bit operations under the definition used by
GlobalISel: the designated information-carrying bits of a GPR32 get the right
value if you just use the normal 32-bit instruction.
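
The underlying arithmetic fact is that the low 8 bits of a 32-bit add depend only on the low 8 bits of the inputs. A minimal check, with made-up register contents and helper names:

```python
MASK32 = 0xFFFFFFFF


def add32(a, b):
    """32-bit register add: wraps modulo 2**32."""
    return (a + b) & MASK32


def add8_via_add32(reg_a, reg_b):
    """Do an 8-bit add with the 32-bit instruction, then read the low byte."""
    return add32(reg_a, reg_b) & 0xFF


# The upper 24 bits of the inputs hold arbitrary junk; only bits 0..7 matter.
reg_a = 0xDEADBE00 | 0xF0
reg_b = 0x12345600 | 0x25
expected = (0xF0 + 0x25) & 0xFF      # true 8-bit wrap-around result
print(add8_via_add32(reg_a, reg_b) == expected)
```

The same reasoning applies to subtraction, multiplication and the bitwise operations, but not to operations such as right shifts or division, where high bits flow into the low ones.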

llvm-svn: 284526

55782222

GlobalISel: translate memcpy intrinsics. · 3f18603c
Tim Northover authored Oct 18, 2016
```
llvm-svn: 284525
```
3f18603c

Remove unused typedef · 0f7f030c

Mandeep Singh Grang authored Oct 18, 2016

Summary: Unused: typedef SmallSetVector<RegionT *, 4> RegionSet

Reviewers: MatzeB, grosser

Subscribers: zinob

Differential Revision: https://reviews.llvm.org/D25744

llvm-svn: 284524

0f7f030c

GlobalISel: support floating-point constants on AArch64. · 4494d698
Tim Northover authored Oct 18, 2016
```
Patch from Ahmed Bougacha.

llvm-svn: 284523
```
4494d698
[Hexagon] Handle block live-ins with lane masks in HexagonBlockRanges · 5bb417be
Krzysztof Parzyszek authored Oct 18, 2016
```
llvm-svn: 284522
```
5bb417be
Reduce global namespace pollution. NFC. · 4c2582ad
Benjamin Kramer authored Oct 18, 2016
```
llvm-svn: 284521
```
4c2582ad
[esan] Remove global variable. · ee042234
Benjamin Kramer authored Oct 18, 2016
```
It's not thread safe and completely unnecessary.

llvm-svn: 284520
```
ee042234

[X86][SSE] Added vector lshr/shl combine tests · 476560aa

Simon Pilgrim authored Oct 18, 2016

This doesn't cover all combines in DAGCombiner::visitSRL/visitSHL yet, but identifies several cases where we fail to combine vectors (or non-splatted vectors).

llvm-svn: 284518

476560aa

[InterleavedAccessPass] Remove global variable. · 1e425c9f

Benjamin Kramer authored Oct 18, 2016

This is a threading hazard and rightfully complained about by tsan. No
functionality change.

llvm-svn: 284515

1e425c9f

[libFuzzer] detect leaks after every run when executing fixed inputs (./fuzzer... · bb59ef77
Kostya Serebryany authored Oct 18, 2016
```
[libFuzzer] detect leaks after every run when executing fixed inputs (./fuzzer -runs=1000000 my-file)

llvm-svn: 284514
```
bb59ef77