Commits · efbcf4943c268b6e5d6cf093b3560d989d4bddec · Roger Ferrer / llvm-epi-0.8

Feb 06, 2014

Yet another patch to reduce compile time for small programs: · efbcf494

Puyan Lotfi authored Feb 06, 2014

The aim in this patch is to reduce work that VirtRegRewriter needs to do when
telling MachineRegisterInfo which physregs are in use. Up until now
VirtRegRewriter::rewrite has been doing rewriting and populating def info and
then proceeding to set whether a physreg is used based on this info for every
physreg that the target provides. This can be expensive when a target has an
unusually high number of supported physregs, and is a noticeable chunk of
compile time for small programs on such targets.

So to reduce compile time, this patch simply adds the use of a SparseSet to the
rewrite function that is used to flag each physreg that is encountered in a
MachineFunction. Afterward, rather than iterating over the set of all physregs
for a given target to set the physregs used in MachineRegisterInfo, the new way
is to iterate over the set of physregs that were actually encountered and set
in the SparseSet. This improves compile time because the existing rewrite
function was iterating over all MachineOperands already, and because the
iterations afterward to setPhysRegUsed are reduced by use of the SparseSet data.

llvm-svn: 200919

efbcf494
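
The idea in the commit above can be sketched outside of LLVM as follows. This is a minimal illustration, not the actual patch: `RegSparseSet` and `collectUsedRegs` are hypothetical names, and the class is a simplified stand-in rather than `llvm::SparseSet`. It records each register number the first time it is seen, so the later "mark as used" step only visits registers that actually occurred.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a sparse set keyed by register number.
// It is not llvm::SparseSet; it only shows the general technique.
class RegSparseSet {
  std::vector<unsigned> Dense;  // registers inserted so far, in first-seen order
  std::vector<unsigned> Sparse; // Sparse[Reg] = index of Reg in Dense, if present

public:
  explicit RegSparseSet(unsigned Universe) : Sparse(Universe, 0) {}

  bool contains(unsigned Reg) const {
    assert(Reg < Sparse.size() && "register number out of range");
    unsigned Idx = Sparse[Reg];
    return Idx < Dense.size() && Dense[Idx] == Reg;
  }

  void insert(unsigned Reg) {
    if (contains(Reg))
      return;
    Sparse[Reg] = static_cast<unsigned>(Dense.size());
    Dense.push_back(Reg);
  }

  const std::vector<unsigned> &members() const { return Dense; }
};

// Flag each register that appears among the operands and return only those.
// A caller would then loop over the result instead of over every register
// in [0, NumPhysRegs).
std::vector<unsigned> collectUsedRegs(unsigned NumPhysRegs,
                                      const std::vector<unsigned> &OperandRegs) {
  RegSparseSet Seen(NumPhysRegs);
  for (unsigned Reg : OperandRegs)
    Seen.insert(Reg);
  return Seen.members();
}
```

With a universe of 1000 registers and operands `{5, 900, 5, 42, 900}`, the follow-up loop touches three registers rather than a thousand.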

X86: deduplicate V[SZ]EXT_MOVL and V[SZ]EXT nodes · 546b57b0

Tim Northover authored Feb 06, 2014

I believe VZEXT_MOVL means "zero all vector elements except the first" (and
should have identical input & output types) whereas VZEXT means "zero extend
each element of a vector (discarding higher elements if necessary)".

For example:
    (v4i32 (vzext (v16i8 ...)))

should zero extend the low 4 bytes of the incoming vector to 32 bits,
discarding higher bytes.

However, somewhere in the past, these two concepts had become confused, even
leading to a nonsensical VSEXT_MOVL.

This re-merges the nodes where appropriate (all VSEXT_MOVL -> VSEXT, VZEXT_MOVL
-> VZEXT when it's an actual extension).

rdar://problem/15981990

llvm-svn: 200918

546b57b0
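
The distinction drawn in the commit above can be modelled on plain arrays. This is only a sketch of the semantics as the message describes them (the author hedges with "I believe"); `vzext` and `vzextMovl` are hypothetical helper names, not LLVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Models VZEXT: zero extend the low NumOut elements to 32 bits each and
// discard any higher input elements.
std::vector<uint32_t> vzext(const std::vector<uint8_t> &In, std::size_t NumOut) {
  assert(NumOut <= In.size() && "cannot produce more elements than the input has");
  std::vector<uint32_t> Out(NumOut);
  for (std::size_t I = 0; I < NumOut; ++I)
    Out[I] = In[I]; // widening an unsigned byte zero extends it
  return Out;
}

// Models VZEXT_MOVL: same input and output type, element 0 is kept and all
// other elements become zero.
std::vector<uint32_t> vzextMovl(std::vector<uint32_t> In) {
  assert(!In.empty() && "need at least one element");
  for (std::size_t I = 1; I < In.size(); ++I)
    In[I] = 0;
  return In;
}
```

Calling `vzext` on a 16-byte input with `NumOut == 4` corresponds to the `(v4i32 (vzext (v16i8 ...)))` example in the message.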

The following patch's purpose is to reduce compile time for compilation of small · 5eb10048

Puyan Lotfi authored Feb 06, 2014

programs on targets with large register files. The root of the compile time
overhead was in the use of llvm::SmallVector to hold PhysRegEntries, which
resulted in slow-down from calling llvm::SmallVector::assign(N, 0). In contrast
std::vector uses the faster __platform_bzero to zero out primitive buffers when
assign is called, while SmallVector uses an iterator.

The fix for this was simply to replace the SmallVector with a dynamically
allocated buffer and to initialize or reinitialize the buffer based on the
total registers that the target architecture requires. The changes support
cases where a pass manager may be reused for different targets, and note that
the PhysRegEntries is allocated using calloc mainly for good form, and also to
quiet tools like Valgrind (see comments for more info on this).

There is an rdar to track the fact that SmallVector doesn't have platform
specific speedup optimizations inside of it for things like this, and I'll
create a bugzilla entry at some point soon as well.

TL;DR: This fix replaces the expensive llvm::SmallVector<unsigned
char>::assign(N, 0) with a call to calloc for N bytes which is much faster
because SmallVector's assign uses iterators.

llvm-svn: 200917

5eb10048
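
A rough sketch of the buffer handling described in the commit above, under stated assumptions: `PhysRegEntryBuffer` and its members are hypothetical names, and the choice to keep the existing buffer when the register count is unchanged is an assumption about the patch, not something the message spells out.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical holder for one byte per physical register.
class PhysRegEntryBuffer {
  unsigned char *Entries = nullptr;
  unsigned NumEntries = 0;

public:
  PhysRegEntryBuffer() = default;
  PhysRegEntryBuffer(const PhysRegEntryBuffer &) = delete;
  PhysRegEntryBuffer &operator=(const PhysRegEntryBuffer &) = delete;
  ~PhysRegEntryBuffer() { std::free(Entries); }

  // (Re)initialize for a target with NumRegs registers. The buffer is only
  // reallocated when the count changes, for example when the same object is
  // reused for a different target. calloc returns zeroed memory, so no
  // element-by-element fill is needed.
  void reset(unsigned NumRegs) {
    if (NumRegs == NumEntries)
      return;
    std::free(Entries);
    Entries = static_cast<unsigned char *>(std::calloc(NumRegs, 1));
    assert((Entries || NumRegs == 0) && "allocation failed");
    NumEntries = NumRegs;
  }

  unsigned size() const { return NumEntries; }

  unsigned char &operator[](unsigned Reg) {
    assert(Reg < NumEntries && "register number out of range");
    return Entries[Reg];
  }
};
```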

This small change reduces compile time for small programs on targets that have · 12ae04bd

Puyan Lotfi authored Feb 06, 2014

large register files. The omission of Queries.clear() is perfectly safe because
LiveIntervalUnion::Query doesn't contain any data that needs freeing and
because LiveRegMatrix::runOnFunction happens to reset the OwningArrayPtr
holding Queries every time it is run, so there's no need to zero out the
queries either. Not having to do this for very large numbers of physregs
is a noticeable constant cost reduction in compilation of small programs.

llvm-svn: 200913

12ae04bd

A memcpy out of a fresh alloca is a no-op, delete it. Patch by Patrick Walton! · 99384949

Nick Lewycky authored Feb 06, 2014

llvm-svn: 200907

99384949

Delete all of the CodeGenInstructions from CodeGenTarget destructor. · f1aab450

Craig Topper authored Feb 06, 2014

llvm-svn: 200906

f1aab450

[PM] Fix horrible typos that somehow didn't cause a failure in a C++11 · d1ba2efb

Chandler Carruth authored Feb 06, 2014

build but spectacularly changed behavior of the C++98 build. =]

This shows my one problem with not having unittests -- basic API
expectations aren't well exercised by the integration tests because they
*happen* to not come up, even though they might later. I'll probably add
a basic unittest to complement the integration testing later, but
I wanted to revive the bots.

llvm-svn: 200905

d1ba2efb

[PM] Add a new "lazy" call graph analysis pass for the new pass manager. · bf71a34e

Chandler Carruth authored Feb 06, 2014

The primary motivation for this pass is to separate the call graph
analysis used by the new pass manager's CGSCC pass management from the
existing call graph analysis pass. That analysis pass is (somewhat
unfortunately) over-constrained by the existing CallGraphSCCPassManager
requirements. Those requirements make it *really* hard to cleanly layer
the needed functionality for the new pass manager on top of the existing
analysis.

However, there are also a bunch of things that the pass manager would
specifically benefit from doing differently from the existing call graph
analysis, and this new implementation tries to address several of them:

- Be lazy about scanning function definitions. The existing pass eagerly
  scans the entire module to build the initial graph. This new pass is
  significantly more lazy, and I plan to push this even further to
  maximize locality during CGSCC walks.
- Don't use a single synthetic node to partition functions with an
  indirect call from functions whose address is taken. This node creates
  a huge choke-point which would preclude good parallelization across
  the fanout of the SCC graph when we got to the point of looking at
  such changes to LLVM.
- Use a memory dense and lightweight representation of the call graph
  rather than value handles and tracking call instructions. This will
  require explicit update calls instead of some updates working
  transparently, but should end up being significantly more efficient.
  The explicit update calls ended up being needed in many cases for the
  existing call graph so we don't really lose anything.
- Doesn't explicitly model SCCs and thus doesn't provide an "identity"
  for an SCC which is stable across updates. This is essential for the
  new pass manager to work correctly.
- Only form the graph necessary for traversing all of the functions in
  an SCC friendly order. This is a much simpler graph structure and
  should be more memory dense. It does limit the ways in which it is
  appropriate to use this analysis. I wish I had a better name than
  "call graph". I've commented extensively on this aspect.

This is still very much a WIP, in fact it is really just the initial
bits. But it is about the fourth version of the initial bits that I've
implemented with each of the others running into really frustrating
problems. This looks like it will actually work and I'd like to split the
actual complexity across commits for the sake of my reviewers. =] The
rest of the implementation along with lots of wiring will follow
somewhat more rapidly now that there is a good path forward.

Naturally, this doesn't impact any of the existing optimizer. This code
is specific to the new pass manager.

A bunch of thanks are deserved for the various folks that have helped
with the design of this, especially Nick Lewycky who actually sat with
me to go through the fundamentals of the final version here.

llvm-svn: 200903

bf71a34e
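
The "be lazy about scanning function definitions" point in the commit above can be illustrated with a toy graph. This is a sketch of the general idea only; `LazyCallGraphSketch`, its `Scanner` callback, and the string-keyed representation are all hypothetical and do not reflect the data structures in the actual pass.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy call graph that scans a function body for callees only the first
// time that function's edges are requested.
class LazyCallGraphSketch {
public:
  // Hypothetical callback that returns the callees of one function.
  using Scanner = std::function<std::vector<std::string>(const std::string &)>;

  explicit LazyCallGraphSketch(Scanner S) : Scan(std::move(S)) {
    assert(Scan && "a scanner callback is required");
  }

  const std::vector<std::string> &callees(const std::string &Fn) {
    auto It = Edges.find(Fn);
    if (It == Edges.end()) {
      // First request for this function: scan it now and cache the result.
      It = Edges.emplace(Fn, Scan(Fn)).first;
      ++NumScans;
    }
    return It->second;
  }

  unsigned scans() const { return NumScans; }

private:
  Scanner Scan;
  std::map<std::string, std::vector<std::string>> Edges;
  unsigned NumScans = 0;
};
```

Functions that are never reached during a walk are never scanned, which is the contrast with eagerly building the graph for the whole module.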

[PM] Back out one hunk of the patch in r200901 that was *supposed* to go · e309d376

Chandler Carruth authored Feb 06, 2014

in my next patch. Sorry for the breakage.

llvm-svn: 200902

e309d376

[PM] Wire up the analysis managers in the opt driver. This isn't really · c68d0824

Chandler Carruth authored Feb 06, 2014

necessary until we add analyses to the driver, but I have such an
analysis ready and wanted to split this out. This is actually exercised
by the existing tests of the new pass manager as the analysis managers
are cross-checked and validated by the function and module managers.

llvm-svn: 200901

c68d0824

[DAG] Don't pull the binary operation through the shift if the operands have opaque constants. · fa0eba6c

Juergen Ributzka authored Feb 06, 2014

During DAGCombine visitShiftByConstant assumes that certain binary operations
with only constant operands can always be folded successfully. This is no longer
true when the constant is opaque. This commit fixes visitShiftByConstant by not
performing the optimization for opaque constants. Otherwise we would end up in
an infinite DAGCombine loop.

llvm-svn: 200900

fa0eba6c

Set default of inlinecold-threshold to 225. · d4612449

Manman Ren authored Feb 06, 2014

225 is the default value of inline-threshold. This change will make sure
we have the same inlining behavior as prior to r200886.

As Chandler points out, even though we don't have code in our testing
suite that uses the cold attribute, there are larger applications that do
use the cold attribute.

r200886 + this commit intend to keep the same behavior as prior to r200886.
We can later on tune the inlinecold-threshold.

The main purpose of r200886 is to help performance of instrumentation based
PGO before we actually hook up inliner with analysis passes such as BPI and BFI.
For instrumentation based PGO, we try to increase inlining of hot functions and
reduce inlining of cold functions by setting inlinecold-threshold.

Another option suggested by Chandler is to use a boolean flag that controls
if we should use OptSizeThreshold for cold functions. The default value
of the boolean flag should not change the current behavior. But it gives us
less freedom in controlling inlining of cold functions.

llvm-svn: 200898

d4612449
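
A small sketch of the threshold selection discussed in the commit above, assuming the rule from r200886 that the cold attribute is honoured only when it would decrease the threshold. `pickInlineThreshold` is a hypothetical helper, not the inliner's actual code.

```cpp
#include <cassert>

// Hypothetical helper: choose the inline threshold for one call site.
int pickInlineThreshold(int CurrentThreshold, bool CalleeIsCold,
                        int ColdThreshold) {
  int Result = CurrentThreshold;
  if (CalleeIsCold && ColdThreshold < CurrentThreshold)
    Result = ColdThreshold;
  assert(Result <= CurrentThreshold &&
         "the cold attribute must never raise the threshold");
  return Result;
}
```

With both values at 225, `pickInlineThreshold(225, true, 225)` returns 225, which is why this default keeps the behavior from before r200886. A lower cold threshold (75 is used purely as an example value) would only affect callees marked cold.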

Update the X86 assembler for .intel_syntax to accept · d6b10713

Kevin Enderby authored Feb 06, 2014

the << and >> bitwise operators.

rdar://15975725

llvm-svn: 200896

d6b10713

don't set HasReliableSymbolDifference for ELF. · 6a383f9a

Rafael Espindola authored Feb 06, 2014

It is only used in MachObjectWriter.cpp. Another leftover from early days
of ELF in MC.

llvm-svn: 200895

6a383f9a

doesSectionRequireSymbols is meaningless on ELF, remove. · 12f04984

Rafael Espindola authored Feb 06, 2014

This is a nop. doesSectionRequireSymbols is only used from
isSymbolLinkerVisible. The only use of isSymbolLinkerVisible from ELF was in

if (!Asm.isSymbolLinkerVisible(Symbol) && !Symbol.isUndefined())
  return false;

if (Symbol.isTemporary())
  return false;

If the symbol is a temporary this code returns false and it is irrelevant if
we take the first if or not. If the symbol is not a temporary,
Asm.isSymbolLinkerVisible returns true without ever calling
doesSectionRequireSymbols.

This was a horrible leftover from when support for ELF was first added.

llvm-svn: 200894

12f04984
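
The argument in the commit above can be checked by brute force over a tiny model. This is a sketch under the premise stated in the message (a non-temporary symbol is always reported as linker visible); `SymbolModel` and the function names are hypothetical, and "returns true" stands in for "falls through both checks".

```cpp
#include <cassert>

// Hypothetical three-flag model of a symbol.
struct SymbolModel {
  bool Temporary;
  bool Undefined;
  bool LinkerVisible; // models the result of Asm.isSymbolLinkerVisible(Symbol)
};

// The two checks as quoted in the commit message.
bool passesBothChecks(const SymbolModel &S) {
  if (!S.LinkerVisible && !S.Undefined)
    return false;
  if (S.Temporary)
    return false;
  return true;
}

// The claim: the outcome depends only on whether the symbol is temporary.
bool passesTemporaryCheckOnly(const SymbolModel &S) { return !S.Temporary; }

// Enumerate every flag combination allowed by the premise, assert that both
// versions agree, and return how many combinations were checked.
unsigned checkedCombinations() {
  unsigned Checked = 0;
  for (int Bits = 0; Bits < 8; ++Bits) {
    SymbolModel S{(Bits & 1) != 0, (Bits & 2) != 0, (Bits & 4) != 0};
    if (!S.Temporary && !S.LinkerVisible)
      continue; // excluded by the premise from the commit message
    assert(passesBothChecks(S) == passesTemporaryCheckOnly(S));
    ++Checked;
  }
  return Checked;
}
```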

Disable most IR-level transform passes on functions marked 'optnone'. · af4e64d0

Paul Robinson authored Feb 06, 2014

Ideally only those transform passes that run at -O0 would remain enabled;
in reality we get as close as we reasonably can.
Passes are responsible for disabling themselves; it's not the job of
the pass manager to do it for them.

llvm-svn: 200892

af4e64d0
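
The "passes disable themselves" convention from the commit above might look like this in miniature. Everything here is hypothetical (`FunctionModel`, `runSimplifyPass`, the instruction counter); it only shows the shape of an early exit inside the pass rather than in the pass manager.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a function being optimized.
struct FunctionModel {
  std::string Name;
  bool OptNone;
  int NumInstructions;
};

// Hypothetical transform pass. Returns true if it changed the function.
bool runSimplifyPass(FunctionModel &F) {
  if (F.OptNone)
    return false; // the pass opts out on its own; the manager still calls it
  assert(F.NumInstructions >= 0 && "instruction count cannot be negative");
  if (F.NumInstructions <= 1)
    return false;
  F.NumInstructions -= 1; // stand-in for a real transformation
  return true;
}
```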

Just returning false is the default. · 4998280f

Rafael Espindola authored Feb 06, 2014

llvm-svn: 200890

4998280f

Pass address space to allowsUnalignedMemoryAccesses · 1b55dd9a

Matt Arsenault authored Feb 05, 2014

llvm-svn: 200888

1b55dd9a

Add address space argument to allowsUnalignedMemoryAccess. · 25793a3f

Matt Arsenault authored Feb 05, 2014

On R600, some address spaces have more strict alignment
requirements than others.

llvm-svn: 200887

25793a3f

Feb 05, 2014

Inliner uses a smaller inline threshold for callees with cold attribute. · e8781b1a

Manman Ren authored Feb 05, 2014

Added command line option inlinecold-threshold to set threshold for inlining
functions with cold attribute. Listen to the cold attribute when it would
decrease the inline threshold.

llvm-svn: 200886

e8781b1a

Fix layering StringRef copy using BumpPtrAllocator. · 4d6d9812

Nick Kledzik authored Feb 05, 2014

Now to copy a string into a BumpPtrAllocator and get a StringRef to the copy:

   StringRef myCopy = myStr.copy(myAllocator);

llvm-svn: 200885

4d6d9812
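
For readers unfamiliar with the pattern in the commit above, here is a self-contained sketch of copying a string into a bump-style arena. `Arena` and `StrRef` are hypothetical, simplified stand-ins and not `llvm::BumpPtrAllocator` or `llvm::StringRef`; only the shape of the `copy` call mirrors the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical minimal bump allocator: memory is carved out of fixed-size
// slabs and released all at once when the arena is destroyed.
class Arena {
  static const std::size_t SlabSize = 4096;
  std::vector<std::unique_ptr<char[]>> Slabs;
  std::size_t Used = 0; // bytes consumed in the newest slab

public:
  char *allocate(std::size_t N) {
    assert(N <= SlabSize && "request larger than one slab");
    if (Slabs.empty() || Used + N > SlabSize) {
      Slabs.emplace_back(new char[SlabSize]);
      Used = 0;
    }
    char *P = Slabs.back().get() + Used;
    Used += N;
    return P;
  }
};

// Hypothetical non-owning string reference.
struct StrRef {
  const char *Data;
  std::size_t Length;

  // Copy the bytes into the arena and return a reference to the copy.
  StrRef copy(Arena &A) const {
    char *Mem = A.allocate(Length);
    if (Length != 0)
      std::memcpy(Mem, Data, Length);
    return StrRef{Mem, Length};
  }
};
```

The returned reference stays valid for as long as the arena lives, independent of the buffer the original string pointed into.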

[RegAlloc] Add a last chance recoloring mechanism when everything else failed to · 87769713

Quentin Colombet authored Feb 05, 2014

find a register.

The idea is to choose a color for the variable that cannot be allocated and
recolor its interferences around. Unlike the current register allocation scheme,
it is allowed to change the color of an already assigned (but maybe not
splittable or spillable) live interval while propagating this change to its
neighbors.
In other words, there are two things that may help find an available color:
- Already assigned variables (RS_Done) can be recolored to a different color.
- The recoloring allows catching solutions that need to touch more than just
  the neighbors of the current allocated variable.

E.g.,
vA can use {R1, R2    }
vB can use {    R2, R3}
vC can use {R1        }
Where vA, vB, and vC cannot be split anymore (they are reloads for instance) and
they all interfere.

vA is assigned R1
vB is assigned R2
vC tries to evict vA but vA is already done.
=> Regular register allocation heuristic fails.

Last chance recoloring kicks in:
vC acts as if vA was evicted => vC uses R1.
vC is marked as fixed.
vA needs to find a color.
None are available.
vA cannot evict vC: vC is a fixed virtual register now.
vA acts as if vB was evicted => vA uses R2.
vB needs to find a color.
R3 is available.
Recoloring => vC = R1, vA = R2, vB = R3.

<rdar://problem/15947839>

llvm-svn: 200883

87769713
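
The worked example in the commit above can be reproduced with a small recursive sketch. This is an illustration of the idea only, not the allocator's implementation: `RecolorProblem`, `recolor`, the string-keyed maps, and the `Depth` limit are all assumptions made for the example.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical description of a tiny allocation problem.
struct RecolorProblem {
  std::map<std::string, std::vector<int>> Allowed;     // usable registers
  std::map<std::string, std::set<std::string>> Interf; // interference edges
  std::map<std::string, int> Color;                    // current assignment
};

// Try to give V a register, recoloring interfering neighbors if needed.
// Fixed holds variables whose color was chosen on this recoloring path and
// must not be changed again. Depth bounds the recursion.
bool recolor(RecolorProblem &P, const std::string &V,
             std::set<std::string> &Fixed, unsigned Depth) {
  assert(P.Allowed.count(V) && "unknown variable");
  if (Depth == 0)
    return false;
  for (int Reg : P.Allowed[V]) {
    // Collect the neighbors that currently hold Reg.
    std::vector<std::string> Victims;
    bool Blocked = false;
    for (const std::string &N : P.Interf[V]) {
      auto It = P.Color.find(N);
      if (It == P.Color.end() || It->second != Reg)
        continue;
      if (Fixed.count(N)) { // a fixed neighbor cannot be evicted
        Blocked = true;
        break;
      }
      Victims.push_back(N);
    }
    if (Blocked)
      continue;

    std::map<std::string, int> SavedColor = P.Color;
    std::set<std::string> SavedFixed = Fixed;
    for (const std::string &N : Victims)
      P.Color.erase(N); // act as if the neighbor was evicted
    P.Color[V] = Reg;
    Fixed.insert(V);

    bool Ok = true;
    for (const std::string &N : Victims)
      if (!recolor(P, N, Fixed, Depth - 1)) {
        Ok = false;
        break;
      }
    if (Ok)
      return true;
    P.Color = SavedColor; // roll back and try the next register
    Fixed = SavedFixed;
  }
  return false;
}
```

Starting from vA = R1 and vB = R2 and calling `recolor` for vC walks the same steps as the trace in the message: vC takes R1, vA is pushed to R2, and vB moves to R3.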

[PM] Don't require analysis results to be const in the new pass manager. · eedf9fca

Chandler Carruth authored Feb 05, 2014

I think this was just over-eagerness on my part. The analysis results
need to often be non-const because they need to (in some cases at least)
be updated by the transformation pass in order to remain correct. It
also makes lazy analyses (a common case) needlessly annoying to write in
order to make their entire state mutable.

llvm-svn: 200881

eedf9fca

Remove support for not using .loc directives. · b4eec1da

Rafael Espindola authored Feb 05, 2014

Clang itself was not using this. The only way to access it was via llc.

llvm-svn: 200862

b4eec1da

Revert "Fix an invalid check for duplicate option categories." · 0bca63a3

Rafael Espindola authored Feb 05, 2014

This reverts commit r200853.

It was causing clang/Analysis/checker-plugins.c to crash.

llvm-svn: 200858

0bca63a3

[mips] Add NaCl target and forbid indexed loads and stores for it · 9725016a

Petar Jovanovic authored Feb 05, 2014

This patch adds NaCl target for Mips. It also forbids indexed loads and
stores if the target is NaCl.

Patch by Sasa Stankovic.

Differential Revision: http://llvm-reviews.chandlerc.com/D2690

llvm-svn: 200855

9725016a

Fix an invalid check for duplicate option categories. · e88421b6

Alexander Kornienko authored Feb 05, 2014

Summary:
The check performed in the comparator is invalid, as some STL
implementations enforce strict weak ordering by calling the comparator with the
same value. This check was also in a wrong place: the assertion would only fire
when -help was used. The new check is performed each time the category is
registered (we are not going to have thousands of them, so it's fine to do it in
O(N^2)).

Reviewers: jordan_rose

Reviewed By: jordan_rose

CC: cfe-commits, alexmc

Differential Revision: http://llvm-reviews.chandlerc.com/D2699

llvm-svn: 200853

e88421b6
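
A sketch of the approach described in the commit above: keep the sort comparator a plain strict weak ordering and do the duplicate check when a category is registered. `OptionCategory`, `CategoryRegistry`, and `categoryLess` are hypothetical simplified names, and returning `false` on a duplicate (rather than asserting) is a choice made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical simplified category.
struct OptionCategory {
  std::string Name;
};

class CategoryRegistry {
  std::vector<const OptionCategory *> Categories;

public:
  // Linear scan on every registration, so O(N^2) overall. That is fine for
  // the small number of categories expected. Returns false on a duplicate.
  bool registerCategory(const OptionCategory *Cat) {
    assert(Cat && "null category");
    for (const OptionCategory *Existing : Categories)
      if (Existing->Name == Cat->Name)
        return false;
    Categories.push_back(Cat);
    return true;
  }

  std::size_t size() const { return Categories.size(); }
};

// Comparator intended for sorting only. It must tolerate being called with
// the same element on both sides and return false in that case.
bool categoryLess(const OptionCategory *A, const OptionCategory *B) {
  return A->Name < B->Name;
}
```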

mips: XFAIL non-extern-addend-smallcodemodel test · f3873878

Petar Jovanovic authored Feb 05, 2014

Small code model (and default reloc model) set Reloc::PIC_ in this test,
and PIC is not yet supported in MCJIT for MIPS.

llvm-svn: 200852

f3873878

AVX-512: optimized icmp -> sext -> icmp pattern · 0b79be8a

Elena Demikhovsky authored Feb 05, 2014

llvm-svn: 200849

0b79be8a

Test commit · 0394c1e6

Alon Mishne authored Feb 05, 2014

llvm-svn: 200843

0394c1e6

ARM: Resolve thumb_bl fixup in same MCFragment. · d5c48aa3

Logan Chien authored Feb 05, 2014

In Thumb1 mode, a bl instruction might be selected for branches between
basic blocks in the function if the offset is greater than 2KB.
However, this might cause a SEGV because the destination symbol
is not marked as a thumb function and the execution mode will be reset
to ARM mode.

Since we are sure that these symbols are in the same data fragment, we
can simply resolve these local symbols, and don't emit any relocation
information for this bl instruction.

llvm-svn: 200842

d5c48aa3

AVX-512: fixed a bug in EVEX encoding (the bug appeared after r200624) · a38114c4

Elena Demikhovsky authored Feb 05, 2014

llvm-svn: 200837

a38114c4

R600/SI: Add pattern for zero-extending i1 to i32 · 5d26fdfc

Michel Danzer authored Feb 05, 2014

Fixes opencl-example if_* tests with radeonsi.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74469

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>

llvm-svn: 200830

5d26fdfc

Shrink the size of CodeGenInstruction a little bit by using bitfields. 32 bools seemed excessive. · bc9486be

Craig Topper authored Feb 05, 2014

llvm-svn: 200829

bc9486be
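
The size effect of the change above can be seen with a reduced example. The two structs below are hypothetical and use eight illustrative flags rather than the real field list; exact sizes are implementation-defined, so the code only relies on the packed form not being larger.

```cpp
#include <cassert>
#include <cstddef>

// One full bool object per flag.
struct PlainFlags {
  bool isReturn, isBranch, isCall, isCompare;
  bool mayLoad, mayStore, isCommutable, isTerminator;
};

// The same flags as one-bit bitfields.
struct PackedFlags {
  bool isReturn : 1;
  bool isBranch : 1;
  bool isCall : 1;
  bool isCompare : 1;
  bool mayLoad : 1;
  bool mayStore : 1;
  bool isCommutable : 1;
  bool isTerminator : 1;
};

// Bytes saved per object by switching to bitfields on this implementation.
std::size_t bytesSaved() {
  assert(sizeof(PackedFlags) <= sizeof(PlainFlags) &&
         "bitfields should not make the struct larger");
  return sizeof(PlainFlags) - sizeof(PackedFlags);
}
```

On common compilers `PlainFlags` occupies eight bytes and `PackedFlags` one, and the flags are read and written with the same syntax in both cases.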

Get rid of a vector copy by just making a pointer out of the reference... · 4c6129af

Craig Topper authored Feb 05, 2014

Get rid of a vector copy by just making a pointer out of the reference returned by getInstructionsByEnumValue instead of assigning it to a new vector.

llvm-svn: 200828

4c6129af
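
The difference the commit above removes can be shown in a few lines. `InstrTable` and its accessor are hypothetical stand-ins for the class that returns a vector by reference; the point is only that initializing a new vector from the reference copies every element, while taking its address does not.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical owner of a list that is handed out by const reference.
class InstrTable {
  std::vector<int> Instrs;

public:
  explicit InstrTable(std::vector<int> I) : Instrs(std::move(I)) {}
  const std::vector<int> &getInstructions() const { return Instrs; }
};

// Copies the whole vector just to read one element.
int firstViaCopy(const InstrTable &T) {
  std::vector<int> Copy = T.getInstructions();
  assert(!Copy.empty() && "table must not be empty");
  return Copy.front();
}

// Reads through a pointer to the vector the table already owns.
int firstViaPointer(const InstrTable &T) {
  const std::vector<int> *Instrs = &T.getInstructions();
  assert(!Instrs->empty() && "table must not be empty");
  return Instrs->front();
}
```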

Fix a vector that was passed by value instead of reference. · 65efcb46

Craig Topper authored Feb 05, 2014

llvm-svn: 200827

65efcb46

ARM: Enable use of relocation type tlsldo in debug info for tls data. · 382c1405

Kai Nacke authored Feb 05, 2014

This fixes PR18554.

Reviewers: Renato Golin, Keith Walker

llvm-svn: 200826

382c1405

Fix a doxygen comment referencing the wrong method name. · 1129d452

Craig Topper authored Feb 05, 2014

llvm-svn: 200825

1129d452

Move matching for x86 BMI BLSI/BLSMSK/BLSR instructions to isel patterns... · 7ee16384

Craig Topper authored Feb 05, 2014

Move matching for x86 BMI BLSI/BLSMSK/BLSR instructions to isel patterns instead of DAG combine. This weakens the ability to fold loads with them because we aren't able to match patterns that load the same thing twice. But maybe we should fix that if we care. The peephole optimizer will be able to fold some loads in its absence.

llvm-svn: 200824

7ee16384
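
For reference, the three BMI instructions named in the commit above compute the following bit identities, written here as plain C++ functions. Each expression mentions `X` twice, which is the connection to the "load the same thing twice" remark. The function names and the `recombine` helper are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// BLSI: isolate the lowest set bit.
uint32_t blsi(uint32_t X) { return X & (0u - X); }

// BLSMSK: mask of all bits up to and including the lowest set bit.
uint32_t blsmsk(uint32_t X) { return X ^ (X - 1u); }

// BLSR: clear the lowest set bit.
uint32_t blsr(uint32_t X) { return X & (X - 1u); }

// The isolated bit and the value with that bit cleared rebuild X.
uint32_t recombine(uint32_t X) {
  assert((blsi(X) & blsr(X)) == 0 && "the two parts never overlap");
  return blsi(X) | blsr(X);
}
```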

AVX-512: Added intrinsic for cvtph2ps. · a30e4376

Elena Demikhovsky authored Feb 05, 2014

Added VPTESTNM instruction.
Added a pattern to vselect (lit tests will follow).

llvm-svn: 200823

a30e4376