  Feb 06, 2014
    • [CodeGenPrepare] Move away sign extensions that get in the way of addressing mode · 3a4bf040
      Quentin Colombet authored
      
      Basically the idea is to transform code like this:
      %idx = add nsw i32 %a, 1
      %sextidx = sext i32 %idx to i64
      %gep = getelementptr i8* %myArray, i64 %sextidx
      %val = load i8* %gep
      
      Into:
      %sexta = sext i32 %a to i64
      %idx = add nsw i64 %sexta, 1
      %gep = getelementptr i8* %myArray, i64 %idx
      %val = load i8* %gep
      
      That way the computation can be folded into the addressing mode.
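
      As a source-level illustration (hypothetical example; the exact IR depends
      on the frontend and target), the pattern above is what a 32-bit signed
      index into an array produces on a 64-bit target:

      // 'a + 1' is computed in 32 bits and then sign-extended for the address
      // computation; signed overflow being undefined is what puts 'nsw' on the
      // add and makes hoisting the sext above it legal.
      char loadNext(char *myArray, int a) {
        return myArray[a + 1];
      }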
      
      This transformation is done as part of the addressing mode matcher.
      If the matching fails (not profitable, addressing mode not legal, etc.), the
      matcher will revert the related promotions.
      
      <rdar://problem/15519855>
      
      llvm-svn: 200947
    • R600/SI: Add a MUBUF store pattern for Reg+Imm offsets · e2367945
      Tom Stellard authored
      llvm-svn: 200935
    • R600/SI: Add a MUBUF store pattern for Imm offsets · 2937cbc0
      Tom Stellard authored
      llvm-svn: 200934
    • R600/SI: Add a MUBUF load pattern for Reg+Imm offsets · 11624bc5
      Tom Stellard authored
      llvm-svn: 200933
    • R600/SI: Use immediate offsets for SMRD instructions whenever possible · 044e418f
      Tom Stellard authored
      There was a problem with the old pattern that caused us to copy some
      larger immediates into registers when we could have encoded them
      in the instruction.
      
      llvm-svn: 200932
    • X86: add costs for 64-bit vector ext/trunc & rebalance · f0e21616
      Tim Northover authored
      The most important part of this is probably adding any cost at all for
      operations like zext <8 x i8> to <8 x i32>. Before, they were being
      recorded as extremely costly (24, I believe), which made LLVM fall back
      to a 4-wide vectorisation of a loop.
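
      For reference, the kind of loop affected is one whose widening needs that
      zext (illustrative example, not taken from the commit):

      // Summing bytes into 32-bit accumulators: vectorising this 8 or 16 wide
      // requires extending each <N x i8> load to <N x i32> before the add.
      void sumBytes(const unsigned char *in, unsigned *out, int n) {
        for (int i = 0; i < n; ++i)
          out[i] += in[i];
      }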
      
      It also rebalances the values for sext, zext and trunc. Lacking any
      other sane metric that might work across CPU microarchitectures, I went
      with instruction counts. This seems to be in reasonable accord with the
      rest of the table (sitofp, ...), though no doubt at least one value is
      sub-optimal for some bizarre reason.
      
      Finally, separate AVX and AVX2 values are provided where appropriate.
      The CodeGen is quite different in many cases.
      
      rdar://problem/15981990
      
      llvm-svn: 200928
    • Nick Lewycky · 99384949
    • [PM] Add a new "lazy" call graph analysis pass for the new pass manager. · bf71a34e
      Chandler Carruth authored
      The primary motivation for this pass is to separate the call graph
      analysis used by the new pass manager's CGSCC pass management from the
      existing call graph analysis pass. That analysis pass is (somewhat
      unfortunately) over-constrained by the existing CallGraphSCCPassManager
      requirements. Those requirements make it *really* hard to cleanly layer
      the needed functionality for the new pass manager on top of the existing
      analysis.
      
      However, there are also a bunch of things that the pass manager would
      specifically benefit from doing differently from the existing call graph
      analysis, and this new implementation tries to address several of them:
      
      - Be lazy about scanning function definitions. The existing pass eagerly
        scans the entire module to build the initial graph. This new pass is
        significantly lazier, and I plan to push this even further to
        maximize locality during CGSCC walks. (A rough sketch of this lazy
        population appears after the list.)
      - Don't use a single synthetic node to partition functions with an
        indirect call from functions whose address is taken. This node creates
        a huge choke-point which would preclude good parallelization across
        the fan-out of the SCC graph when we get to the point of looking at
        such changes to LLVM.
      - Use a memory dense and lightweight representation of the call graph
        rather than value handles and tracking call instructions. This will
        require explicit update calls instead of some updates working
        transparently, but should end up being significantly more efficient.
        The explicit update calls ended up being needed in many cases for the
        existing call graph so we don't really lose anything.
      - Doesn't explicitly model SCCs and thus doesn't provide an "identity"
        for an SCC which is stable across updates. This is essential for the
        new pass manager to work correctly.
      - Only form the graph necessary for traversing all of the functions in
        an SCC-friendly order. This is a much simpler graph structure and
        should be more memory dense. It does limit the ways in which it is
        appropriate to use this analysis. I wish I had a better name than
        "call graph". I've commented extensively on this aspect.
      
      This is still very much a WIP, in fact it is really just the initial
      bits. But it is about the fourth version of the initial bits that I've
      implemented with each of the others running into really frustrating
      problems. This looks like it will actually work and I'd like to split the
      actual complexity across commits for the sake of my reviewers. =] The
      rest of the implementation along with lots of wiring will follow
      somewhat more rapidly now that there is a good path forward.
      
      Naturally, this doesn't impact any of the existing optimizer. This code
      is specific to the new pass manager.
      
      A bunch of thanks are deserved for the various folks that have helped
      with the design of this, especially Nick Lewycky who actually sat with
      me to go through the fundamentals of the final version here.
      
      llvm-svn: 200903
    • [DAG] Don't pull the binary operation through the shift if the operands have opaque constants. · fa0eba6c
      Juergen Ributzka authored
      During DAGCombine visitShiftByConstant assumes that certain binary operations
      with only constant operands can always be folded successfully. This is no longer
      true when the constant is opaque. This commit fixes visitShiftByConstant by not
      performing the optimization for opaque constants. Otherwise we would end up in
      an infinite DAGCombine loop.
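
      For reference, the fold visitShiftByConstant performs amounts to the
      reassociation below (sketched at the source level; the real transform
      operates on SelectionDAG nodes and the constant is illustrative):

      #include <cstdint>

      // (x + c1) << c2  becomes  (x << c2) + (c1 << c2), which is only a win if
      // (c1 << c2) folds to a plain constant -- no longer the case once c1 has
      // been made opaque, hence skipping the fold to avoid re-combining forever.
      uint64_t before(uint64_t x) { return (x + 0x12345) << 3; }
      uint64_t after(uint64_t x)  { return (x << 3) + (0x12345ull << 3); }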
      
      llvm-svn: 200900
    • Set default of inlinecold-threshold to 225. · d4612449
      Manman Ren authored
      225 is the default value of inline-threshold. This change will make sure
      we have the same inlining behavior as prior to r200886.
      
      As Chandler points out, even though we don't have code in our testing
      suite that uses the cold attribute, there are larger applications that do
      use the cold attribute.
      
      r200886 + this commit intend to keep the same behavior as prior to r200886.
      We can later on tune the inlinecold-threshold.
      
      The main purpose of r200886 is to help performance of instrumentation-based
      PGO before we actually hook up the inliner with analysis passes such as BPI
      and BFI. For instrumentation-based PGO, we try to increase inlining of hot
      functions and reduce inlining of cold functions by setting inlinecold-threshold.
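
      For context, a "cold function" here is one carrying the cold attribute,
      e.g. (illustrative source, not taken from the commit):

      #include <cstdio>

      // The inliner evaluates functions marked cold against inlinecold-threshold
      // rather than inline-threshold; after this change both default to 225.
      __attribute__((cold)) void reportError(const char *msg) {
        std::fprintf(stderr, "error: %s\n", msg);
      }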
      
      Another option suggested by Chandler is to use a boolean flag that controls
      whether we should use OptSizeThreshold for cold functions. The default value
      of the boolean flag should not change the current behavior. But it gives us
      less freedom in controlling inlining of cold functions.
      
      llvm-svn: 200898
    • Update the X86 assembler for .intel_syntax to accept the << and >> bitwise operators. · d6b10713
      Kevin Enderby authored
      
      rdar://15975725
      
      llvm-svn: 200896
    • Disable most IR-level transform passes on functions marked 'optnone'. · af4e64d0
      Paul Robinson authored
      Ideally only those transform passes that run at -O0 remain enabled;
      in reality we get as close as we reasonably can.
      Passes are responsible for disabling themselves; it's not the job of
      the pass manager to do it for them.
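
      As a sketch of what "disabling themselves" looks like inside a pass
      (illustrative only; the pass name is hypothetical and the exact check the
      real passes use may differ):

      #include "llvm/IR/Attributes.h"
      #include "llvm/IR/Function.h"
      #include "llvm/Pass.h"
      using namespace llvm;

      namespace {
      struct MyTransform : public FunctionPass {
        static char ID;
        MyTransform() : FunctionPass(ID) {}

        bool runOnFunction(Function &F) override {
          // Bail out before touching anything if the function opted out of
          // optimization; the pass manager does not perform this check for us.
          if (F.hasFnAttribute(Attribute::OptimizeNone))
            return false;
          // ... the actual transformation would go here ...
          return false;
        }
      };
      }
      char MyTransform::ID = 0;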
      
      llvm-svn: 200892