Commits · fa7278f18f7e91ad2dc4a0e687d0c67e427b608c · Roger Ferrer / llvm-epi

Apr 26, 2014

Trivial test commit. · fa7278f1
Dan Liew authored Apr 26, 2014
```
llvm-svn: 207328
```
fa7278f1
Convert SelectionDAG::getNode methods to use ArrayRef<SDValue>. · 48d114be
Craig Topper authored Apr 26, 2014
```
llvm-svn: 207327
```
48d114be

Remove an unused version of getMemIntrinsicNode and getNode. Additionally,... · 963c5d5e

Craig Topper authored Apr 26, 2014

Remove an unused version of getMemIntrinsicNode and getNode. Additionally, these were calling makeVTList with the pointers passed in, which were unlikely to belong to SelectionDAG and likely would have just been stack pointers.

llvm-svn: 207326

963c5d5e

Include C++ source for debug info test case committed in r207323 · 9c34526c
David Blaikie authored Apr 26, 2014
```
llvm-svn: 207324
```
9c34526c

DWARF Type Units: Avoid emitting type units under fission if the type requires an address. · e12b49a6

David Blaikie authored Apr 26, 2014

Since there's no way to ensure the type unit in the .dwo and the type
unit skeleton in the .o are correlated, this cannot work.

This implementation is a bit inefficient for a few reasons, called out
in comments.

llvm-svn: 207323

e12b49a6

Print X86ISD::PMULDQ nodes properly in debug output. · c2ad8f3e
Benjamin Kramer authored Apr 26, 2014
```
llvm-svn: 207322
```
c2ad8f3e

DwarfDebug: Minor refactoring around type unit construction · f3de2ab4

David Blaikie authored Apr 26, 2014

Sink the addition of the declaration attribute down to where the
signature is added, so that if the signature is not added, neither is the
declaration attribute. (This will come in handy when aborting type unit
construction to instead emit the type into the CU directly in some
cases.)

Pull out type unit identifier hashing to simplify the function a
little, since it will be getting longer.

llvm-svn: 207321

f3de2ab4

X86TTI: i16/i32 vector divs with a constant (splat) divisor are reasonably cheap now. · 7c372272
Benjamin Kramer authored Apr 26, 2014
```
Turn vectorization back on.

llvm-svn: 207320
```
7c372272
X86: Lower SMUL_LOHI of v4i32 to pmuldq when SSE4.1 is available. · 6d2dff61
Benjamin Kramer authored Apr 26, 2014
```
llvm-svn: 207318
```
6d2dff61

X86: Add patterns for MULHU/MULHS of v8i16 and v16i16. · c9827ab1

Benjamin Kramer authored Apr 26, 2014

This gets us pretty code for divs of i16 vectors. Turn the existing
intrinsics into the corresponding nodes.

llvm-svn: 207317

c9827ab1

Rip out X86-specific vector SDIV lowering, make the corresponding DAGCombiner... · ad016870
Benjamin Kramer authored Apr 26, 2014
```
Rip out X86-specific vector SDIV lowering, make the corresponding DAGCombiner transform work on vectors.

llvm-svn: 207316
```
ad016870

DAGCombiner: Turn divs of vector splats into vectorized multiplications. · 4dae598b

Benjamin Kramer authored Apr 26, 2014

Otherwise the legalizer would just scalarize everything. Support for
mulhi in the targets isn't that great yet so on most targets we get
exactly the same scalarized output. Add a test for x86 vector udiv.

I had to disable the mulhi nodes on ARM because there aren't any patterns
for it. As far as I know ARM has instructions for getting the high part of
a multiply so this should be fixed.

llvm-svn: 207315

4dae598b
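
To illustrate the idea behind turning a division by a splat constant into a multiplication, here is a minimal sketch for unsigned 32-bit lanes and the divisor 10. It is not the DAGCombiner code: the names `udiv10_mulhi`, `udiv10_splat`, and `V4` are hypothetical, and the magic constant 0xCCCCCCCD (the ceiling of 2^35 / 10) is the standard one for this divisor only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: x / 10 for unsigned 32-bit x, computed as the high
// half of a widening multiply followed by a small shift.
// 0xCCCCCCCD == ceil(2^35 / 10), which is exact for every 32-bit input.
inline uint32_t udiv10_mulhi(uint32_t x) {
  uint64_t wide = static_cast<uint64_t>(x) * 0xCCCCCCCDull;
  uint32_t hi = static_cast<uint32_t>(wide >> 32); // the "mulhi" part
  uint32_t q = hi >> 3;                            // remaining shift: 35 - 32
  assert(q == x / 10 && "multiply-high result must match plain division");
  return q;
}

using V4 = std::array<uint32_t, 4>; // stand-in for a v4i32 value

// Divide every lane by the same (splat) constant without any div instruction.
inline V4 udiv10_splat(const V4 &v) {
  V4 out{};
  for (std::size_t i = 0; i < v.size(); ++i)
    out[i] = udiv10_mulhi(v[i]);
  return out;
}
```

Because every lane uses the same constant, the multiply and the shift can each be a single vector operation on targets that have a vector multiply-high, which is why the commit message ties the benefit to mulhi support in the target.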

X86: Custom lower v4i32 UMUL_LOHI into 2 pmuludqs. · 29139d5c
Benjamin Kramer authored Apr 26, 2014
```
Test will follow soon.

llvm-svn: 207314
```
29139d5c
Revert r206749 till a final decision about the intrinsics is made. · 1a97a7bc
Michael Zolotukhin authored Apr 26, 2014
```
llvm-svn: 207313
```
1a97a7bc

[LCG] Rather than removing nodes from the SCC entry set when we process · 90821c2a

Chandler Carruth authored Apr 26, 2014

them, just skip over any DFS-numbered nodes when finding the next root
of a DFS. This allows the entry set to just be a vector as we populate
it from a uniqued source. It also removes the possibility for a linear
scan of the entry set to actually do the removal which can make things
go quadratic if we get unlucky.

llvm-svn: 207312

90821c2a
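
The approach described above (skip already-numbered nodes when picking the next DFS root, rather than erasing them from the entry set) can be sketched with a textbook recursive Tarjan SCC walk. This is an assumption-laden illustration, not the LazyCallGraph code, which is iterative; `SCCFinder` and its members are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Textbook recursive Tarjan SCC finder over an adjacency list.
struct SCCFinder {
  const std::vector<std::vector<int>> &adj;
  std::vector<int> dfsNum, lowLink; // 0 means "not yet visited"
  std::vector<bool> onStack;
  std::vector<int> stack;
  std::vector<std::vector<int>> sccs;
  int next = 1;

  explicit SCCFinder(const std::vector<std::vector<int>> &g)
      : adj(g), dfsNum(g.size(), 0), lowLink(g.size(), 0),
        onStack(g.size(), false) {}

  void visit(int n) {
    dfsNum[n] = lowLink[n] = next++;
    stack.push_back(n);
    onStack[n] = true;
    for (int m : adj[n]) {
      if (dfsNum[m] == 0) {
        visit(m);
        lowLink[n] = std::min(lowLink[n], lowLink[m]);
      } else if (onStack[m]) {
        lowLink[n] = std::min(lowLink[n], dfsNum[m]);
      }
    }
    if (lowLink[n] == dfsNum[n]) { // n is the root of an SCC: pop it off
      std::vector<int> scc;
      int m;
      do {
        m = stack.back();
        stack.pop_back();
        onStack[m] = false;
        scc.push_back(m);
      } while (m != n);
      sccs.push_back(scc);
    }
  }

  // The entry set is a plain vector. Nodes already reached by an earlier
  // DFS are skipped by checking their DFS number; nothing is ever erased,
  // so there is no linear scan per removal.
  void run(const std::vector<int> &entries) {
    for (int root : entries) {
      assert(root >= 0 && static_cast<std::size_t>(root) < adj.size());
      if (dfsNum[root] == 0)
        visit(root);
    }
  }
};
```

The skip check is a constant-time array lookup, so walking the entry vector stays linear in its size no matter how many entries were already covered by an earlier root.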

[LCG] Rotate the full SCC finding algorithm to avoid round-trips through · 5e2d70b9

Chandler Carruth authored Apr 26, 2014

the DFS stack for leaves in the call graph. As mentioned in my previous
commit, this is particularly interesting for graphs which have high fan
out but low connectivity resulting in many leaves. For such graphs, this
can remove a large % of the DFS stack traffic even though it doesn't
make the stack much smaller.

It's a bit easier to formulate this for the full algorithm because that
one stops completely for each SCC. For example, I was able to directly
eliminate the "Recurse" boolean used to continue an outer loop from the
inner loop.

llvm-svn: 207311

5e2d70b9

[LCG] Hoist the main DFS loop out of the edge removal function. This · aca48d04

Chandler Carruth authored Apr 26, 2014

makes working through the worklist much cleaner, and makes it possible
to avoid the 'bool-to-continue-the-outer-loop' hack. Not a huge
difference, but I think this is approaching as polished as I can make
it.

llvm-svn: 207310

aca48d04

RecursivelyDeleteTriviallyDeadInstructions() could remove · af7a87d2

Gerolf Hoflehner authored Apr 26, 2014

more than 1 instruction. The caller needs to be aware of this
and adjust instruction iterators accordingly.

rdar://16679376

Repaired r207302.

llvm-svn: 207309

af7a87d2
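
The iterator hazard described above can be shown with a small standard-library analogue. This is only a sketch using `std::list<int>`; `eraseWithDeadTail` and `prune` are hypothetical stand-ins, and nothing here reflects the actual signature of RecursivelyDeleteTriviallyDeadInstructions().

```cpp
#include <cassert>
#include <list>

// Hypothetical stand-in for a helper that may erase more than one element:
// it removes *it and every 0 that immediately follows it, then reports
// where iteration can safely resume.
inline std::list<int>::iterator
eraseWithDeadTail(std::list<int> &l, std::list<int>::iterator it) {
  assert(it != l.end() && "cannot erase the end iterator");
  it = l.erase(it);
  while (it != l.end() && *it == 0)
    it = l.erase(it);
  return it;
}

// Remove every negative entry together with the zeros trailing it.
inline void prune(std::list<int> &l) {
  for (auto it = l.begin(); it != l.end();) {
    if (*it < 0)
      it = eraseWithDeadTail(l, it); // resume from what the helper reports
    else
      ++it;
  }
}
```

A caller that saved `std::next(it)` before calling the helper would be left with a dangling iterator whenever the helper also erased that next element, which is the situation the commit message warns about.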

Restore CloneFunction.cpp which got accidentally · 1da7cbd5
Gerolf Hoflehner authored Apr 26, 2014
```
overwritten by previous backout of r207303

llvm-svn: 207308
```
1da7cbd5

[LCG] In the incremental SCC re-formation, lift the node currently being · 680af7a7

Chandler Carruth authored Apr 26, 2014

processed in the DFS out of the stack completely. Keep it exclusively in
a variable. Re-shuffle some code structure to make this easier. This can
have a very dramatic effect in some cases because call graphs tend to
look like a high fan-out spanning tree. As a consequence, there are
a large number of leaf nodes in the graph, and this technique causes
leaf nodes to never even go into the stack. While this only reduces the
max depth by 1, it may cause the total number of round trips through the
stack to drop by a lot.

Now, most of this isn't really relevant for the incremental version. =]
But I wanted to prototype it first here as this variant is in some ways more
complex. As long as I can get the code factored well here, I'll next
make the primary walk look the same. There are several refactorings this
exposes I think.

llvm-svn: 207306

680af7a7

[LCG] Special case the removal of self edges. These don't impact the SCC · a7205b61

Chandler Carruth authored Apr 26, 2014

graph in any way because we don't track edges in the SCC graph, just
nodes. This also lets us add a nice assert about the invariant that
we're working on at least a certain number of nodes within the SCC.

llvm-svn: 207305

a7205b61

[DAG] During DAG legalization keep opaque constants even after expanding. · a6bda8ba

Juergen Ributzka authored Apr 26, 2014

The included test case would return incorrect results, because the expansion
of a shift with a constant shift amount of 0 would generate undefined behavior.

This is because ExpandShiftByConstant assumes that all shifts by constants with
a value of 0 have already been optimized away. This doesn't happen for opaque
constants and usually this isn't a problem, because opaque constants won't take
this code path - they are not supposed to. In the case that the opaque constant
has to be expanded by the legalizer, the legalizer would drop the opaque flag.
In this case we hit the limitations of ExpandShiftByConstant and create incorrect
code.

This commit fixes the legalizer by not dropping the opaque flag when expanding
opaque constants and by adding an assertion to ExpandShiftByConstant to catch this
unsupported case in the future.

This fixes <rdar://problem/16718472>

llvm-svn: 207304

a6bda8ba
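
Why a shift amount of 0 is dangerous when a wide shift is expanded into narrower pieces can be seen in a small sketch. This is not ExpandShiftByConstant itself; `Halves` and `expandShl64` are hypothetical names, and the example assumes a 64-bit left shift split into two 32-bit halves.

```cpp
#include <cassert>
#include <cstdint>

struct Halves {
  uint32_t lo, hi; // low and high 32 bits of a 64-bit value
};

// Expand a 64-bit logical shift left by a known amount into 32-bit pieces.
inline Halves expandShl64(Halves v, unsigned amt) {
  assert(amt < 64 && "shift amount out of range");
  if (amt == 0)
    return v; // without this case, v.lo >> (32 - amt) below would shift a
              // 32-bit value by 32, which is undefined behavior
  if (amt >= 32)
    return {0u, v.lo << (amt - 32)};
  return {v.lo << amt, (v.hi << amt) | (v.lo >> (32 - amt))};
}
```

An expansion that assumes "amount 0 was already folded away" can drop the first branch, and that assumption is exactly what stops holding once a constant that is meant to stay opaque reaches this path.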

Revert commit r207302 since build failures · c46e9b04
Gerolf Hoflehner authored Apr 26, 2014
```
have been reported.

llvm-svn: 207303
```
c46e9b04

RecursivelyDeleteTriviallyDeadInstructions() could remove · 34210108

Gerolf Hoflehner authored Apr 26, 2014

more than 1 instruction. The caller needs to be aware of this
and adjust instruction iterators accordingly.

rdar://16679376

llvm-svn: 207302

34210108

[X86] Implement TargetLowering::getScalingFactorCost hook. · ea18933d

Quentin Colombet authored Apr 26, 2014

Scaling factors are not free on X86 because every "complex" addressing mode
breaks the related instruction into 2 allocations instead of 1.

<rdar://problem/16730541>

llvm-svn: 207301

ea18933d

[LCG] Refactor the duplicated code I added in my last commit here into · 8f92d6db

Chandler Carruth authored Apr 26, 2014

a helper function. Also factor the other two places where we did the
same thing into the helper function. =] Much cleaner this way. NFC.

llvm-svn: 207300

8f92d6db

[InstCombine][X86] Teach how to fold calls to SSE2/AVX2 packed logical shift · 8cc9059c

Andrea Di Biagio authored Apr 26, 2014

right intrinsics.

A packed logical shift right with a shift count bigger than or equal to the
element size always produces a zero vector. In all other cases, it can be
safely replaced by a 'lshr' instruction.

llvm-svn: 207299

8cc9059c
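
The folding rule stated above can be modeled in a few lines. This is a scalar sketch of the described semantics for 32-bit lanes only, not the InstCombine code; `V4i32` and `packedLshr` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V4i32 = std::array<uint32_t, 4>; // stand-in for a vector of four 32-bit lanes

// Packed logical shift right with one shared count:
//  - count >= element size (32 here): every lane becomes zero
//  - otherwise: behaves like an ordinary per-lane logical shift right
inline V4i32 packedLshr(const V4i32 &v, uint64_t count) {
  V4i32 out{}; // value-initialized to all zeros
  if (count >= 32)
    return out;
  assert(count < 32 && "in-range count keeps the C++ shift well defined");
  for (std::size_t i = 0; i < v.size(); ++i)
    out[i] = v[i] >> count;
  return out;
}
```

The two branches correspond to the two folds in the commit message: a constant zero vector for oversized counts, and a plain `lshr` for everything else.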

Add missing include guards and missing #include, found by modules build. · 8d039e44
Richard Smith authored Apr 26, 2014
```
llvm-svn: 207298
```
8d039e44
Appease the almighty buildbots. · d71f110f
Filipe Cabecinhas authored Apr 26, 2014
```
llvm-svn: 207295
```
d71f110f

Optimization for certain shufflevector by using insertps. · 363b570d

Filipe Cabecinhas authored Apr 25, 2014

Summary:
If we're doing a v4f32/v4i32 shuffle on x86 with SSE4.1, we can lower
certain shufflevectors to an insertps instruction:
When most of the shufflevector result's elements come from one vector (and
keep their index), and one element comes from another vector or a memory
operand.

Added tests for insertps optimizations on shufflevector.
Added support and tests for v4i32 vector optimization.

Reviewers: nadav

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3475

llvm-svn: 207291

363b570d
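
The shuffle shape described in the summary (all lanes but one stay in place from one vector, one lane comes from the other) can be recognized with a simple mask check. The sketch below is an assumption-based illustration, not the X86 lowering code: `InsertMatch` and `matchSingleInsert` are hypothetical, mask values 0-3 are taken to index the first vector and 4-7 the second, and only the case where the first vector is the base is handled.

```cpp
#include <array>
#include <cassert>

struct InsertMatch {
  bool matched; // true if the mask has the single-insert shape
  int dstLane;  // lane of the result that receives the inserted element
  int srcLane;  // lane of the second vector that supplies it
};

// Match a 4-lane shuffle mask where three lanes keep their index from the
// first vector and exactly one lane is taken from the second vector.
inline InsertMatch matchSingleInsert(const std::array<int, 4> &mask) {
  InsertMatch result = {false, -1, -1};
  for (int i = 0; i < 4; ++i) {
    assert(mask[i] >= 0 && mask[i] < 8 && "mask index out of range");
    if (mask[i] == i)
      continue; // lane kept in place from the first vector
    if (mask[i] < 4 || result.matched)
      return InsertMatch{false, -1, -1}; // moved lane, or a second insert
    result = InsertMatch{true, i, mask[i] - 4};
  }
  return result;
}
```

For example, the mask {0, 1, 6, 3} matches with destination lane 2 and source lane 2, while the identity mask {0, 1, 2, 3} does not match because nothing is inserted.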

Revert "blockfreq: Approximate irreducible control flow" · 42292cea

Duncan P. N. Exon Smith authored Apr 25, 2014

This reverts commit r207286.  It causes an ICE on the
cmake-llvm-x86_64-linux buildbot [1]:

    llvm/lib/Analysis/BlockFrequencyInfo.cpp: In lambda function:
    llvm/lib/Analysis/BlockFrequencyInfo.cpp:182:1: internal compiler error: in get_expr_operands, at tree-ssa-operands.c:1035

[1]: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/12093/steps/build_llvm/logs/stdio

llvm-svn: 207287

42292cea

blockfreq: Approximate irreducible control flow · 384d0e8a

Duncan P. N. Exon Smith authored Apr 25, 2014

Previously, irreducible backedges were ignored.  With this commit,
irreducible SCCs are discovered on the fly, and modelled as loops with
multiple headers.

This approximation specifies the headers of irreducible sub-SCCs as its
entry blocks and all nodes that are targets of a backedge within it
(excluding backedges within true sub-loops).  Block frequency
calculations act as if we insert a new block that intercepts all the
edges to the headers.  All backedges and entries to the irreducible SCC
point to this imaginary block.  This imaginary block has an edge (with
even probability) to each header block.

The result is now reasonable enough that I've added a number of
testcases for irreducible control flow.  I've outlined in
`BlockFrequencyInfoImpl.h` ways to improve the approximation.

<rdar://problem/14292693>

llvm-svn: 207286

384d0e8a

Unbreak the gdb buildbot by not lowering dbg.declare intrinsics for arrays. · 232897fe
Adrian Prantl authored Apr 25, 2014
```
llvm-svn: 207284
```
232897fe
Make sure that rangelists are also relative to the compile unit · ece0e90e
Eric Christopher authored Apr 25, 2014
```
low_pc similar to location lists.

Fixes PR19563

llvm-svn: 207283
```
ece0e90e

R600: Fix function name printing in LowerCall · de1c3410

Matt Arsenault authored Apr 25, 2014

v2: Check both ExternalSymbol and GlobalAddress

Patch by: Jan Vesely <jan.vesely@rutgers.edu>

llvm-svn: 207282

de1c3410

DwarfAccelTable: Store the string symbol in the accelerator table to avoid duplicate lookup. · 772ab8ae

David Blaikie authored Apr 25, 2014

This also avoids the need for subtly side-effecting calls to manifest
strings in the string table at the point where items are added to the
accelerator tables.

llvm-svn: 207281

772ab8ae

Apr 25, 2014

Add an -mattr option to the gold plugin to support subtarget features in LTO · fd1bc602

Tom Roeder authored Apr 25, 2014

This adds support for an -mattr option to the gold plugin and to llvm-lto. This
allows the caller to specify details of the subtarget architecture, like +aes,
or +ssse3 on x86.  Note that this requires a change to the include/llvm-c/lto.h
interface: it adds a function lto_codegen_set_attr and it increments the
version of the interface.

llvm-svn: 207279

fd1bc602

Fix missing include · b54d0f40
Alexey Samsonov authored Apr 25, 2014
```
llvm-svn: 207278
```
b54d0f40

Encapsulate the DWARF string pool in a separate type. · daefdbf3

David Blaikie authored Apr 25, 2014

Pulls out some more code from some of the rather monolithic DWARF
classes. Unlike the address table, the string table won't move up into
DwarfDebug - each DWARF file has its own string table (but there can be
only one address table).

llvm-svn: 207277

daefdbf3

[DWARF parser] Cleanup code in DWARFDebugAranges. · 001ecd9a
Alexey Samsonov authored Apr 25, 2014
```
No functionality change.

llvm-svn: 207276
```
001ecd9a