Commits · 59cb7782cb5afa594b2dd3ed37ffc5e8264a7f4c · Roger Ferrer / llvm-epi

Apr 19, 2017

Allow suppressing host and target info in VersionPrinter · 59cb7782

Xin Tong authored Apr 19, 2017

Summary:
VersionPrinter by default outputs information about the Host CPU
and Default target. Printing this information requires linking in
a large amount of data, such as supported target triples as C
strings, which in turn bloats the binary size.

Add a new CMake option, LLVM_VERSION_PRINTER_SHOW_HOST_TARGET_INFO,
which controls printing of the host and target info. This allows
the target triple names to be dead-code stripped. This is a nice
win for LLVM clients that wish to minimize their binary size, such
as graphics drivers.

By default this is ON, so there is no change in the default behavior.
Clients who wish to suppress this printing can do so by setting this
option to off via CMake.

A test app on Linux that uses ParseCommandLineOptions() shows a binary
size reduction of 23KB (from 149K to 126K) for a Release build, and 24KB
(from 135K to 111K) in a MinSizeRel build.
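
A minimal sketch of how a compile-time switch of this kind can gate the output. The macro name is the CMake option named above; `printVersion` and the placeholder strings are hypothetical stand-ins, not the actual LLVM implementation.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Assumption: the build system defines this macro as 0 or 1. Default to ON
// here so the sketch compiles on its own.
#ifndef LLVM_VERSION_PRINTER_SHOW_HOST_TARGET_INFO
#define LLVM_VERSION_PRINTER_SHOW_HOST_TARGET_INFO 1
#endif

// Hypothetical stand-in for the version printer.
std::string printVersion() {
  std::ostringstream OS;
  OS << "LLVM version (sketch)\n";
#if LLVM_VERSION_PRINTER_SHOW_HOST_TARGET_INFO
  // Only this branch refers to host/target data, so with the option OFF
  // nothing references that data and the linker is free to strip it.
  OS << "  Default target: <triple>\n";
  OS << "  Host CPU: <cpu>\n";
#endif
  return OS.str();
}
```

With the macro set to 0, the two extra lines are simply not compiled in.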

Reviewers: klimek, beanz, bogner, chandlerc, compnerd

Reviewed By: compnerd

Patch by pammon (Peter Ammon)!

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30904

llvm-svn: 300630

59cb7782

[AVR] Fix the build · eb24b850
Dylan McKay authored Apr 18, 2017
```
'PointerSize' was renamed to 'CodePointerSize'.

llvm-svn: 300629
```
eb24b850

[XRay][tools] Add option to llvm-xray extract to symbolize functions · 918802be

Dean Michael Berris authored Apr 18, 2017

Summary:
If the symbol names are available in the binary, this allows us to
provide the function name in the YAML output.

Reviewers: dblaikie, pelikan

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32153

llvm-svn: 300624

918802be

[ConstantRange] Optimize APInt creation in getSignedMax/getSignedMin. · 88c64f32

Craig Topper authored Apr 18, 2017

We were creating an APInt at the top of these methods that isn't always returned. For ranges wider than 64 bits this results in an allocation and deallocation when it's not used.

In getSignedMax we were creating Upper-1 to use in a compare and then creating it again for a return value. The compiler is unable to determine that these can be shared. So help it out and create the Upper-1 in a temporary that can be reused.

This provides a little compile time improvement.
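
The "create it once and reuse it" part can be illustrated with a toy type. This is only a sketch: `Wide`, `minusOne`, and both functions are hypothetical, and the comparison does not reproduce the real ConstantRange logic. The counter stands in for the allocation each temporary would cost at widths above 64 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a wide integer. NumTemporaries models the
// allocation that each Upper-1 temporary would cost.
static int NumTemporaries = 0;

struct Wide {
  int64_t V;
};

Wide minusOne(const Wide &X) {
  ++NumTemporaries;
  return Wide{X.V - 1};
}

// Before: Upper-1 is built once for the compare and again for the return.
Wide maxIfOrderedTwice(const Wide &Lower, const Wide &Upper,
                       const Wide &Fallback) {
  if (Lower.V <= minusOne(Upper).V)
    return minusOne(Upper);
  return Fallback;
}

// After: Upper-1 is built once and the same temporary is reused.
Wide maxIfOrderedOnce(const Wide &Lower, const Wide &Upper,
                      const Wide &Fallback) {
  Wide UpperMinusOne = minusOne(Upper);
  if (Lower.V <= UpperMinusOne.V)
    return UpperMinusOne;
  return Fallback;
}
```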

llvm-svn: 300621

88c64f32

[x86] add tests for potential andn optimization; NFC · ff981f92
Sanjay Patel authored Apr 18, 2017
```
llvm-svn: 300617
```
ff981f92
Fix crash in AttributeList::addAttributes, add test · fe64c013
Reid Kleckner authored Apr 18, 2017
```
llvm-svn: 300614
```
fe64c013
Add a getPointerOperandType() helper to LoadInst and StoreInst; NFC · f09c1e34
Sanjoy Das authored Apr 18, 2017
```
I will use this in a later change.

llvm-svn: 300613
```
f09c1e34

Apr 18, 2017

[MemoryBuiltins] Add isMallocOrCallocLikeFn so BasicAA can check for both at the same time · 09bb760b

Craig Topper authored Apr 18, 2017

BasicAA wants to know if a function is either a malloc-like or a calloc-like function. Currently we have to check both separately. This means both calls check if it's an intrinsic, query TLI, check the nobuiltin attribute, scan the AllocationFnData, etc.

This patch adds an isMallocOrCallocLikeFn so we can go through all of the checks once per call.

This also changes the one other location I saw that called both together.
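
A rough sketch of the idea, assuming the classification is the expensive part and the kinds can be combined as a bitmask. The `classify` helper, the string-based callee, and the counter are hypothetical. They only model "run all the checks once" and are not the MemoryBuiltins API.

```cpp
#include <cassert>
#include <string>

enum AllocKind : unsigned {
  NotAlloc = 0,
  MallocLike = 1u << 0,
  CallocLike = 1u << 1
};

static int NumLookups = 0;

// Hypothetical stand-in for the expensive part (intrinsic check, TLI query,
// attribute check, table scan).
unsigned classify(const std::string &Callee) {
  ++NumLookups;
  if (Callee == "malloc")
    return MallocLike;
  if (Callee == "calloc")
    return CallocLike;
  return NotAlloc;
}

bool isKind(const std::string &Callee, unsigned Mask) {
  return (classify(Callee) & Mask) != 0;
}

bool isMallocLikeFn(const std::string &Callee) {
  return isKind(Callee, MallocLike);
}

bool isCallocLikeFn(const std::string &Callee) {
  return isKind(Callee, CallocLike);
}

// One classification answers both questions.
bool isMallocOrCallocLikeFn(const std::string &Callee) {
  return isKind(Callee, MallocLike | CallocLike);
}
```

For a calloc-like callee, `isMallocLikeFn(C) || isCallocLikeFn(C)` classifies twice, while the combined query classifies once.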

Differential Revision: https://reviews.llvm.org/D32188

llvm-svn: 300608

09bb760b

[LoopReroll] Prefer hasNUses/hasNUses or more as they're cheaper. NFCI. · 80fe987b
Davide Italiano authored Apr 18, 2017
```
llvm-svn: 300607
```
80fe987b
DAG: Make mayBeEmittedAsTailCall parameter const · 3138075d
Matt Arsenault authored Apr 18, 2017
```
llvm-svn: 300603
```
3138075d
Fix typo · aa31dce3
Matt Arsenault authored Apr 18, 2017
```
llvm-svn: 300597
```
aa31dce3
AMDGPU: Make MFI fields private · 161e2b42
Matt Arsenault authored Apr 18, 2017
```
llvm-svn: 300596
```
161e2b42
[MemoryBuiltins] Use ImmutableCallSite instead of CallSite to remove a... · eae6db0e
Craig Topper authored Apr 18, 2017
```
[MemoryBuiltins] Use ImmutableCallSite instead of CallSite to remove a const_cast and const correct. NFCI

llvm-svn: 300585
```
eae6db0e

NewGVN: Fix memory congruence verification. The return true should be a return... · 9d0042b4

Daniel Berlin authored Apr 18, 2017

NewGVN: Fix memory congruence verification. The return true should be a return false. Merge the appropriate if statements so it doesn't happen again.

llvm-svn: 300584

9d0042b4

[X86] Keep EXTRACT_VECTOR_ELT result type as f128 for Android x86_64. · 877923a8

Chih-Hung Hsieh authored Apr 18, 2017

Android x86_64 target uses f128 type and stores f128 values in %xmm* registers.
SoftenFloatRes_EXTRACT_VECTOR_ELT should not convert result value
from f128 to i128.

Differential Revision: http://reviews.llvm.org/D32102

llvm-svn: 300583

877923a8

[APInt] Inline the single word case of lshrInPlace similar to what we do for <<=. · ae8bd67d
Craig Topper authored Apr 18, 2017
```
llvm-svn: 300577
```
ae8bd67d
[X86][SSE] Add scheduling latency/throughput tests for (most) SSE1 instructions · 9398649f
Simon Pilgrim authored Apr 18, 2017
```
llvm-svn: 300576
```
9398649f

[SLP vectorizer] Allow phi node reordering in tryToVectorizeList. · 76aba5f6

Easwaran Raman authored Apr 18, 2017

In tryToVectorizeList, under a very limited circumstance (when entered
from tryToVectorizePair), the values may be reordered (swapped) and the
SLP tree is built with the new order. This extends that to the case when
starting from phis in vectorizeChainsInBlock when there are exactly two
phis. The textual order of phi nodes shouldn't really matter. Without
this change, the loop body in the accompanying test case is fully vectorized
when we swap the order of the phis but not with this order. While this
doesn't solve the phi-ordering problem in a general way (for more than 2
phis), this is a simple fix that piggybacks on an existing mechanism and
is useful in cases like multiplying two complex numbers.

Differential revision: https://reviews.llvm.org/D32065

llvm-svn: 300574

76aba5f6

[X86] Use for-range loop. NFCI. · e8ad1da4
Simon Pilgrim authored Apr 18, 2017
```
llvm-svn: 300567
```
e8ad1da4

[APInt] Use lshrInPlace to replace lshr where possible · fc947bcf

Craig Topper authored Apr 18, 2017

This patch uses lshrInPlace to replace code where the object that lshr is called on is being overwritten with the result.

This adds an lshrInPlace(const APInt &) version as well.
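
To show why the in-place form is cheaper when the result overwrites the operand, here is a sketch on a toy multi-word integer. `BigBits` is hypothetical and much simpler than APInt. It only illustrates that `X = X.lshr(N)` builds a second word array while `X.lshrInPlace(N)` does not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical multi-word unsigned integer, least significant word first.
struct BigBits {
  std::vector<uint64_t> Words;

  // Shifts the value right by Amt bits, filling with zeros, reusing storage.
  void lshrInPlace(unsigned Amt) {
    const std::size_t WordShift = Amt / 64;
    const unsigned BitShift = Amt % 64;
    const std::size_t N = Words.size();
    // Ascending order is safe: iteration I only reads indices >= I.
    for (std::size_t I = 0; I < N; ++I) {
      const std::size_t Src = I + WordShift;
      const uint64_t Lo = Src < N ? Words[Src] : 0;
      const uint64_t Hi = Src + 1 < N ? Words[Src + 1] : 0;
      Words[I] = BitShift == 0 ? Lo : (Lo >> BitShift) | (Hi << (64 - BitShift));
    }
  }

  // Returns a shifted copy. The copy is the extra cost the in-place form
  // avoids when the caller would overwrite *this anyway.
  BigBits lshr(unsigned Amt) const {
    BigBits Result = *this;
    Result.lshrInPlace(Amt);
    return Result;
  }
};
```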

Differential Revision: https://reviews.llvm.org/D32155

llvm-svn: 300566

fc947bcf

NewGVN: Don't waste time value numbering unreachable blocks · ec9deb7f
Daniel Berlin authored Apr 18, 2017
```
llvm-svn: 300565
```
ec9deb7f

[DAG] Improve store merge candidate pruning. · 855ef456

Nirav Dave authored Apr 18, 2017

Remove non-consecutive stores from store merge candidate search as
they cannot be merged and will prevent us from finding subsequent
mergeable store cases.
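
One way to picture the pruning, as a sketch under assumptions: candidates are represented only by their byte offsets from a common base, already sorted, with a fixed element size. The function name and this representation are hypothetical, not the DAGCombiner code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Drops leading candidates that are not immediately followed by a store at
// the next consecutive address. Returns false (and clears the list) if no
// consecutive pair remains.
bool pruneLeadingNonConsecutive(std::vector<int64_t> &Offsets,
                                int64_t ElemSize) {
  assert(ElemSize > 0 && "element size must be positive");
  std::size_t Start = 0;
  while (Start + 1 < Offsets.size() &&
         Offsets[Start] + ElemSize != Offsets[Start + 1])
    ++Start;
  if (Start + 1 >= Offsets.size()) {
    Offsets.clear();
    return false;
  }
  Offsets.erase(Offsets.begin(),
                Offsets.begin() + static_cast<std::ptrdiff_t>(Start));
  return true;
}
```

With 4-byte stores at offsets 0, 16, 20, 24, the store at offset 0 is dropped so that the run starting at 16 can be considered.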

Reviewers: jyknight, bogner, javed.absar, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32086

llvm-svn: 300561

855ef456

Add base-index-based store merge test · e50544cd
Nirav Dave authored Apr 18, 2017
```
llvm-svn: 300559
```
e50544cd
LoopRerollPass: Prefer Value::hasOneUse() over Value::getNumUses(). NFC. · d942397e
Zvi Rackover authored Apr 18, 2017
```
getNumUses() can be more expensive as it iterates over all of the list's elements.

llvm-svn: 300558
```
d942397e
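
The cost difference can be sketched on a plain singly linked list. This assumes the use list is only traversable front to back; the free functions and the step counter are hypothetical and only model the traversal cost.

```cpp
#include <cassert>
#include <forward_list>

static int Steps = 0; // counts list nodes visited

// Walks the whole list to count its elements.
unsigned getNumUses(const std::forward_list<int> &Uses) {
  unsigned N = 0;
  for (auto I = Uses.begin(), E = Uses.end(); I != E; ++I) {
    ++Steps;
    ++N;
  }
  return N;
}

// Visits at most N nodes, then checks whether the list ends there.
bool hasNUses(const std::forward_list<int> &Uses, unsigned N) {
  auto I = Uses.begin(), E = Uses.end();
  for (; N; --N, ++I) {
    if (I == E)
      return false; // too few uses
    ++Steps;
  }
  return I == E;
}

bool hasOneUse(const std::forward_list<int> &Uses) {
  return hasNUses(Uses, 1);
}
```

On a list with many uses, `getNumUses(Uses) == 1` visits every node, while `hasOneUse(Uses)` stops after the first.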

[LV] Cache block mask values · fb1d915a

Gil Rapaport authored Apr 18, 2017

This patch is part of D28975's breakdown.

Add caching for block masks similar to the cache already used for edge masks,
replacing generation per user with reusing the first generated value which
dominates all uses.
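
The caching pattern itself is ordinary memoization. A sketch, with blocks identified by name and mask values by an integer id (both hypothetical). It does not model the dominance requirement mentioned above.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical mask builder: generates a mask for a block once and hands
// out the cached value on later requests.
struct MaskBuilder {
  std::map<std::string, int> BlockMaskCache; // block name -> mask value id
  int NumGenerated = 0;

  int createBlockInMask(const std::string &Block) {
    auto It = BlockMaskCache.find(Block);
    if (It != BlockMaskCache.end())
      return It->second; // reuse the first generated value
    int Mask = ++NumGenerated; // stands in for emitting the mask computation
    BlockMaskCache[Block] = Mask;
    return Mask;
  }
};
```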

Differential Revision: https://reviews.llvm.org/D32054

llvm-svn: 300557

fb1d915a

[ConstantRange] fix doxygen comment formatting; NFC · 78d163c7
Sanjay Patel authored Apr 18, 2017
```
llvm-svn: 300554
```
78d163c7

Make globalaa-retained.ll test catch more cases. · 95fc6441

Nikolai Bozhenov authored Apr 18, 2017

Summary:
* Add checks for store. That is needed because GlobalsAA is called
  twice in the current pipeline with different sets of Function passes
  following it. However, the loads are eliminated using instcombine
  which happens everywhere. On the other hand, DeadStoreElimination is
  performed only once so by checking for store we'll be able to catch
  more cases when GlobalsAA is invalidated unintentionally.
* Add empty function above/below the test so that we don't depend on
  the relative order of instcombine/dead-store-elimination and the
  pass that invalidates the analysis (inside the same
  FunctionPassManager).

Reviewers: kristof.beyls

Reviewed By: kristof.beyls

Subscribers: llvm-commits, n.bozhenov

Differential Revision: https://reviews.llvm.org/D32015
Patch by Andrei Elovikov <andrei.elovikov@intel.com>

llvm-svn: 300553

95fc6441

[GVNHoist] Mark GlobalsAA as preserved by GVNHoist. · 9e4a1c39

Nikolai Bozhenov authored Apr 18, 2017

Reviewers: sebpop, hiraditya

Reviewed By: sebpop

Subscribers: n.bozhenov, llvm-commits

Differential Revision: https://reviews.llvm.org/D32158
Patch by Andrei Elovikov <andrei.elovikov@intel.com>

llvm-svn: 300552

9e4a1c39

Add store Merge test. · b9776849
Nirav Dave authored Apr 18, 2017
```
llvm-svn: 300551
```
b9776849

[ARM] Add hardware build attributes in assembler · 7ad2e8aa

Oliver Stannard authored Apr 18, 2017

In the assembler, we should emit build attributes based on the target
selected with command-line options. This matches the GNU assembler's
behaviour. We only do this for build attributes which describe the
hardware that is expected to be available, not the ones that describe
ABI compatibility.

This is done by moving some of the attribute emission code to
ARMTargetStreamer, so that it can be shared between the assembly and
code-generation code paths. Since the assembler only creates a
MCSubtargetInfo, not an ARMSubtarget, the code had to be changed to
check raw features, and not use the convenience functions in
ARMSubtarget.

If different attributes are later specified using the .eabi_attribute
directive, then they will take precedence, as happens when the same
.eabi_attribute is specified twice.

This must be enabled by an option, because we don't want to do this when
parsing inline assembly. The attributes would match the ones emitted at
the start of the file, so wouldn't actually change the emitted object
file, but the extra directives would be added to every inline assembly
block when emitting assembly, which we'd like to avoid.

The majority of the changes in the build-attributes.ll test are just
re-ordering the directives, because the hardware attributes are now
emitted before the ABI ones. However, I did fix one bug which I spotted:
Tag_CPU_arch_profile was not being emitted for v6M.

Differential revision: https://reviews.llvm.org/D31812

llvm-svn: 300547

7ad2e8aa

[ARM] GlobalISel: Add support for G_SUB · a3a0cccb

Diana Picus authored Apr 18, 2017

Support G_SUB throughout the GlobalISel pipeline. It is exactly the same
as G_ADD, nothing fancy.

llvm-svn: 300546

a3a0cccb

[SampleProfile] Don't assert when printing the DebugLoc of a branch. NFC. · 517e3fc3
Andrea Di Biagio authored Apr 18, 2017
```
llvm-svn: 300544
```
517e3fc3

[SampleProfile] Skip intrinsic calls when visiting callsites in InlineHotFunctions. · e3edef09

Andrea Di Biagio authored Apr 18, 2017

Before this patch, we always called method 'findCalleeFunctionSamples()' on
intrinsic calls. However, intrinsic calls like llvm.dbg.value() are not viable
candidates for obvious reasons.

No functional change intended.

Differential Revision: https://reviews.llvm.org/D32008

llvm-svn: 300541

e3edef09

Revert "[GlobalISel] Support vector-of-pointers in LLT" · a4e79cca

Kristof Beyls authored Apr 18, 2017

This reverts r300535 and r300537.
The newly added tests in test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll
produce slightly different code when LLVM is built with different compilers.
E.g., depending on the compiler LLVM is built with, either one of the following
can be produced:

remark: <unknown>:0:0: unable to legalize instruction: %vreg0<def>(p0) = G_EXTRACT_VECTOR_ELT %vreg1, %vreg2; (in function: vector_of_pointers_extractelement)
remark: <unknown>:0:0: unable to legalize instruction: %vreg2<def>(p0) = G_EXTRACT_VECTOR_ELT %vreg1, %vreg0; (in function: vector_of_pointers_extractelement)

Non-determinism like this is clearly a bad thing, so reverting this until
I can find and fix the root cause of the non-determinism.

llvm-svn: 300538

a4e79cca

Fix gcc build after r300535. · c10e6250
Kristof Beyls authored Apr 18, 2017
```
llvm-svn: 300537
```
c10e6250

[ARM] Check for correct HW div when lowering divmod · e2626bb7

Diana Picus authored Apr 18, 2017

For subtargets that use the custom lowering for divmod, e.g. gnueabi,
we used to check if the subtarget has hardware divide and then lower to
a div-mul-sub sequence if true, or to a libcall if false.

However, judging by the usage of hasDivide vs hasDivideInARMMode, it
seems that hasDivide only refers to Thumb. For instance, in the
ARMTargetLowering constructor, the code that specifies whether to use
libcalls for (S|U)DIV looks like this:

bool hasDivide = Subtarget->isThumb() ? Subtarget->hasDivide()
                                      : Subtarget->hasDivideInARMMode();

In the case of divmod for arm-gnueabi, using only hasDivide() to
determine what to do means that instead of lowering to __aeabi_idivmod
to get the remainder, we lower to div-mul-sub and then further lower the
div to __aeabi_idiv. Even worse, if we have hardware divide in ARM but
not in Thumb, we generate a libcall instead of using it (this is not an
issue in practice since AFAICT none of the cores that we support have
hardware divide in ARM but not Thumb).

This patch fixes the code dealing with custom lowering to take into
account the mode (Thumb or ARM) when deciding whether or not hardware
division is available.
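
A small model of the mode-aware decision described above. The `SubtargetModel` struct and its field names are hypothetical. Only the Thumb/ARM selection mirrors the snippet quoted from the ARMTargetLowering constructor.

```cpp
#include <cassert>

// Hypothetical subtarget description.
struct SubtargetModel {
  bool IsThumb;
  bool HasDivideInThumbMode;
  bool HasDivideInARMMode;
};

enum class DivModLowering { DivMulSub, Libcall };

// Pick the hardware-divide flag that matches the current mode.
bool hasHardwareDivide(const SubtargetModel &ST) {
  return ST.IsThumb ? ST.HasDivideInThumbMode : ST.HasDivideInARMMode;
}

// Use div-mul-sub only when division is available in the current mode;
// otherwise fall back to the divmod libcall.
DivModLowering lowerDivMod(const SubtargetModel &ST) {
  return hasHardwareDivide(ST) ? DivModLowering::DivMulSub
                               : DivModLowering::Libcall;
}
```

In ARM mode with divide available only in Thumb, this picks the libcall instead of a div-mul-sub sequence whose div would itself become a libcall.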

Differential Revision: https://reviews.llvm.org/D32005

llvm-svn: 300536

e2626bb7

[GlobalISel] Support vector-of-pointers in LLT · fb73eb03

Kristof Beyls authored Apr 18, 2017

This fixes PR32471.

As comment 10 on that bug report highlights
(https://bugs.llvm.org//show_bug.cgi?id=32471#c10), there are quite a
few different defendable design tradeoffs that could be made, including
not representing pointers at all in LLT.

I decided to go for representing vector-of-pointer as a concept in LLT,
while keeping the size of the LLT type 64 bits (this is an increase from
48 bits before). My rationale for keeping pointers explicit is that on
some targets it is probably very handy to have the distinction between
pointer and non-pointer (e.g. 68K has a different register bank for
pointers IIRC). If we keep a scalar pointer, it probably is easiest to
also have a vector-of-pointers to keep LLT relatively conceptually clean
and orthogonal, while we don't have a very strong reason to break that
orthogonality. Once we gain more experience on the use of LLT, we can
of course reconsider this direction.

Rejecting vector-of-pointer types in the IRTranslator is also an option
to avoid the crash reported in PR32471, but that is only a very
short-term solution; it also needs quite a few code tweaks in places,
and is probably fragile. Therefore I didn't consider this the best
option.

llvm-svn: 300535

fb73eb03

test commit · d6fe0db8
Leslie Zhai authored Apr 18, 2017
```
llvm-svn: 300532
```
d6fe0db8

[APInt] Cleanup the reverseBits slow case a little. · 9eaef075

Craig Topper authored Apr 18, 2017

Use lshrInPlace. Use single bit extract and operator|=(uint64_t) to avoid a few temporary APInts.
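
The bit-at-a-time loop can be sketched on a plain 64-bit word. This is an illustration under that simplification, not the APInt code: the shifts and the single-bit OR all update existing variables, so no temporaries are created.

```cpp
#include <cassert>
#include <cstdint>

// Reverses the low Width bits of Val (1 <= Width <= 64).
uint64_t reverseBits(uint64_t Val, unsigned Width) {
  assert(Width >= 1 && Width <= 64 && "width out of range");
  uint64_t Reversed = 0;
  for (unsigned I = 0; I < Width; ++I) {
    Reversed <<= 1;      // make room for the next bit
    Reversed |= Val & 1; // single-bit extract, OR-ed in place
    Val >>= 1;           // logical shift right in place
  }
  return Reversed;
}
```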

llvm-svn: 300527

9eaef075

[APInt] Make operator<<= shift in place. Improve the implementation of... · a8a4f0db

Craig Topper authored Apr 18, 2017

[APInt] Make operator<<= shift in place. Improve the implementation of tcShiftLeft and use it to implement operator<<=.

llvm-svn: 300526

a8a4f0db