Commits · efbcf4943c268b6e5d6cf093b3560d989d4bddec · Roger Ferrer / llvm-epi-0.8

Feb 06, 2014

Yet another patch to reduce compile time for small programs: · efbcf494

Puyan Lotfi authored Feb 06, 2014

The aim in this patch is to reduce work that VirtRegRewriter needs to do when
telling MachineRegisterInfo which physregs are in use. Up until now
VirtRegRewriter::rewrite has been doing rewriting and populating def info and
then proceeding to set whether a physreg is used based on this info for every
physreg that the target provides. This can be expensive when a target has an
unusually high number of supported physregs, and is a noticeable chunk of
compile time for small programs on such targets.

So to reduce compile time, this patch simply adds the use of a SparseSet to the
rewrite function that is used to flag each physreg that is encountered in a
MachineFunction. Afterward, rather than iterating over the set of all physregs
for a given target to set the physregs used in MachineRegisterInfo, the new way
is to iterate over the set of physregs that were actually encountered and set
in the SparseSet. This improves compile time because the existing rewrite
function was iterating over all MachineOperands already, and because the
iterations afterward to setPhysRegUsed are reduced by use of the SparseSet data.
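
The idea can be sketched outside of LLVM as follows. This is a minimal illustration, not the patch itself: PhysRegSparseSet and markUsedPhysRegs are hypothetical names, and the sparse set here is a simplified stand-in for the real SparseSet.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a sparse set keyed by physical register number.
// Iterating members() visits only registers that were inserted, not the
// whole register universe.
class PhysRegSparseSet {
public:
  explicit PhysRegSparseSet(unsigned universe) : sparse_(universe, 0) {}

  // Returns true if reg was newly inserted.
  bool insert(unsigned reg) {
    if (contains(reg))
      return false;
    sparse_[reg] = static_cast<unsigned>(dense_.size());
    dense_.push_back(reg);
    return true;
  }

  bool contains(unsigned reg) const {
    assert(reg < sparse_.size() && "register outside the universe");
    unsigned idx = sparse_[reg];
    return idx < dense_.size() && dense_[idx] == reg;
  }

  const std::vector<unsigned> &members() const { return dense_; }

private:
  std::vector<unsigned> sparse_; // reg -> index into dense_
  std::vector<unsigned> dense_;  // registers actually encountered
};

// `operands` models the physregs seen while walking the operands; `used`
// models the per-register "used" flags. Only encountered registers are
// visited when setting flags. Returns how many flags were set.
unsigned markUsedPhysRegs(const std::vector<unsigned> &operands,
                          std::vector<bool> &used) {
  PhysRegSparseSet seen(static_cast<unsigned>(used.size()));
  for (unsigned reg : operands)
    seen.insert(reg);
  for (unsigned reg : seen.members())
    used[reg] = true;
  return static_cast<unsigned>(seen.members().size());
}
```

With a universe of 1000 registers and three distinct registers encountered, the second loop runs three times rather than 1000.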

llvm-svn: 200919

efbcf494

The following patch's purpose is to reduce compile time for compilation of small · 5eb10048

Puyan Lotfi authored Feb 06, 2014

programs on targets with large register files. The root of the compile time
overhead was in the use of llvm::SmallVector to hold PhysRegEntries, which
resulted in slow-down from calling llvm::SmallVector::assign(N, 0). In contrast
std::vector uses the faster __platform_bzero to zero out primitive buffers when
assign is called, while SmallVector uses an iterator.

The fix for this was simply to replace the SmallVector with a dynamically
allocated buffer and to initialize or reinitialize the buffer based on the
total registers that the target architecture requires. The changes support
cases where a pass manager may be reused for different targets, and note that
the PhysRegEntries is allocated using calloc mainly for good form, and also to
quiet tools like Valgrind (see comments for more info on this).
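
A rough sketch of that kind of buffer, under stated assumptions: PhysRegEntryBuffer is a hypothetical name, and the choice to keep old contents when the register count is unchanged is an assumption of this sketch, not something the message confirms.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical holder for one byte of state per physical register.
class PhysRegEntryBuffer {
public:
  PhysRegEntryBuffer() = default;
  PhysRegEntryBuffer(const PhysRegEntryBuffer &) = delete;
  PhysRegEntryBuffer &operator=(const PhysRegEntryBuffer &) = delete;
  ~PhysRegEntryBuffer() { std::free(entries_); }

  // Reallocates only when the register count changes, for example when the
  // same object is reused for a different target. calloc returns zeroed
  // memory, so no separate element-by-element fill is needed.
  // Assumption: callers tolerate stale contents when the count is unchanged.
  void init(unsigned numRegs) {
    if (numRegs == count_)
      return;
    std::free(entries_);
    entries_ = static_cast<unsigned char *>(std::calloc(numRegs, 1));
    assert((entries_ || numRegs == 0) && "allocation failed");
    count_ = numRegs;
  }

  unsigned char &operator[](unsigned reg) {
    assert(reg < count_ && "register out of range");
    return entries_[reg];
  }

  unsigned size() const { return count_; }

private:
  unsigned char *entries_ = nullptr;
  unsigned count_ = 0;
};
```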

There is an rdar to track the fact that SmallVector doesn't have platform
specific speedup optimizations inside of it for things like this, and I'll
create a bugzilla entry at some point soon as well.

TL;DR: This fix replaces the expensive llvm::SmallVector<unsigned
char>::assign(N, 0) with a call to calloc for N bytes which is much faster
because SmallVector's assign uses iterators.

llvm-svn: 200917

5eb10048

This small change reduces compile time for small programs on targets that have · 12ae04bd

Puyan Lotfi authored Feb 06, 2014

large register files. The omission of Queries.clear() is perfectly safe because
LiveIntervalUnion::Query doesn't contain any data that needs freeing and
because LiveRegMatrix::runOnFunction happens to reset the OwningArrayPtr
holding Queries every time it is run, so there's no need to zero out the
queries either. Not having to do this for very large numbers of physregs
is a noticeable constant cost reduction in compilation of small programs.

llvm-svn: 200913

12ae04bd

[DAG] Don't pull the binary operation through the shift if the operands have opaque constants. · fa0eba6c

Juergen Ributzka authored Feb 06, 2014

During DAGCombine visitShiftByConstant assumes that certain binary operations
with only constant operands can always be folded successfully. This is no longer
true when the constant is opaque. This commit fixes visitShiftByConstant by not
performing the optimization for opaque constants. Otherwise we would end up in
an infinite DAGCombine loop.

llvm-svn: 200900

fa0eba6c

Pass address space to allowsUnalignedMemoryAccesses · 1b55dd9a
Matt Arsenault authored Feb 05, 2014
```
llvm-svn: 200888
```
1b55dd9a
Add address space argument to allowsUnalignedMemoryAccess. · 25793a3f
Matt Arsenault authored Feb 05, 2014
```
On R600, some address spaces have more strict alignment
requirements than others.

llvm-svn: 200887
```
25793a3f

Feb 05, 2014

[RegAlloc] Add a last chance recoloring mechanism when everything else failed to · 87769713

Quentin Colombet authored Feb 05, 2014

find a register.

The idea is to choose a color for the variable that cannot be allocated and
recolor its interferences around. Unlike the current register allocation scheme,
it is allowed to change the color of an already assigned (but maybe not
splittable or spillable) live interval while propagating this change to its
neighbors.
In other words, there are two things that may help find an available color:
- Already assigned variables (RS_Done) can be recolored to a different color.
- The recoloring makes it possible to catch solutions that need to touch more
  than just the neighbors of the currently allocated variable.

E.g.,
vA can use {R1, R2    }
vB can use {    R2, R3}
vC can use {R1        }
Where vA, vB, and vC cannot be split anymore (they are reloads for instance) and
they all interfere.

vA is assigned R1
vB is assigned R2
vC tries to evict vA but vA is already done.
=> Regular register allocation heuristic fails.

Last chance recoloring kicks in:
vC does as if vA was evicted => vC uses R1.
vC is marked as fixed.
vA needs to find a color.
None are available.
vA cannot evict vC: vC is a fixed virtual register now.
vA does as if vB was evicted => vA uses R2.
vB needs to find a color.
R3 is available.
Recoloring => vC = R1, vA = R2, vB = R3.
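
The walk-through above can be reproduced with a toy model. This is only a sketch of the idea, not the code in RegAllocGreedy: the Allocator struct and its members are hypothetical, and real limits on search depth or interference count are left out.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy model: every virtual register has a list of allowed physical registers
// and a list of interfering virtual registers. All names are hypothetical.
struct Allocator {
  std::vector<std::vector<int>> allowed;    // vreg -> candidate physregs
  std::vector<std::vector<int>> interferes; // vreg -> neighbouring vregs
  std::vector<int> color;                   // vreg -> physreg, -1 if none
  std::set<int> fixed;                      // vregs that may not be evicted

  // Neighbours of v that currently hold physreg r.
  std::vector<int> holders(int v, int r) const {
    std::vector<int> out;
    for (int u : interferes[v])
      if (color[u] == r)
        out.push_back(u);
    return out;
  }

  // Regular path: take any allowed register that no neighbour is using.
  bool assignFree(int v) {
    for (int r : allowed[v])
      if (holders(v, r).empty()) {
        color[v] = r;
        return true;
      }
    return false;
  }

  // Last-chance path: take a register as if its holders were evicted, mark v
  // as fixed, then find new colors for the holders, recursing if needed.
  bool recolor(int v) {
    for (int r : allowed[v]) {
      std::vector<int> evicted = holders(v, r);
      bool blocked = false;
      for (int u : evicted)
        blocked = blocked || fixed.count(u) != 0;
      if (blocked)
        continue; // a fixed vreg cannot be evicted

      std::vector<int> saved = color; // snapshot for rollback
      for (int u : evicted)
        color[u] = -1;
      color[v] = r;
      fixed.insert(v);
      bool ok = true;
      for (int u : evicted)
        if (!assignFree(u) && !recolor(u)) {
          ok = false;
          break;
        }
      fixed.erase(v);
      if (ok)
        return true;
      color = saved; // undo this attempt and try the next register
    }
    return false;
  }
};
```

Encoding vA, vB, vC as 0, 1, 2 and R1, R2, R3 as 1, 2, 3, starting from vA = R1 and vB = R2, calling recolor for vC ends with vC = R1, vA = R2, vB = R3, matching the trace.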

<rdar://problem/15947839>

llvm-svn: 200883

87769713

Remove support for not using .loc directives. · b4eec1da
Rafael Espindola authored Feb 05, 2014
```
Clang itself was not using this. The only way to access it was via llc.

llvm-svn: 200862
```
b4eec1da
Add CheckChildInteger to ISelMatcher operations. Removes nearly 2000 bytes from X86 matcher table. · 7ca1d180
Craig Topper authored Feb 05, 2014
```
llvm-svn: 200821
```
7ca1d180

Feb 04, 2014

Use the default values. · 7b514969
Rafael Espindola authored Feb 04, 2014
```
llvm-svn: 200781
```
7b514969
RegAllocGreedy.cpp: Use a simpler value as Hysteresis, to suppress -mfpmath-dependent behavior. · a71003ae
NAKAMURA Takumi authored Feb 04, 2014
```
llvm-svn: 200738
```
a71003ae

DebugInfo: Remove some unneeded conditionals now that DIBuilder no longer... · 5e390e4d

David Blaikie authored Feb 04, 2014

DebugInfo: Remove some unneeded conditionals now that DIBuilder no longer emits zero-length arrays as {i32 0}

A bunch of test cases needed to be cleaned up for this, many my fault -
when implementing imported modules I updated test cases by simply
duplicating the prior metadata field - which wasn't always the empty
metadata entry.

llvm-svn: 200731

5e390e4d

Feb 03, 2014

Expand vector bswap in LegalizeVectorOps · 5c968d94

Hal Finkel authored Feb 03, 2014

ISD::BSWAP was missing from the list of node types that should be expanded
element-wise.
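
As an illustration of what element-wise expansion means here, a small sketch that models a vector of 32-bit lanes as a std::vector. The function names are hypothetical and this is not the legalizer code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Byte-swap a single 32-bit lane.
inline std::uint32_t bswap32(std::uint32_t x) {
  return (x >> 24) | ((x >> 8) & 0x0000FF00u) | ((x << 8) & 0x00FF0000u) |
         (x << 24);
}

// "Expand" a vector bswap by applying the scalar operation lane by lane.
std::vector<std::uint32_t>
expandVectorBswap(const std::vector<std::uint32_t> &v) {
  std::vector<std::uint32_t> out;
  out.reserve(v.size());
  for (std::uint32_t lane : v)
    out.push_back(bswap32(lane));
  assert(out.size() == v.size());
  return out;
}
```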

llvm-svn: 200705

5c968d94

Feb 01, 2014

Remove some unused #includes · fc49d198
Eli Bendersky authored Feb 01, 2014
```
llvm-svn: 200611
```
fc49d198

[stackprotector] Implement the sspstrong rules for stack layout. · 24c7f063

Josh Magee authored Feb 01, 2014

This changes the PrologueEpilogInserter and LocalStackSlotAllocation passes to
follow the extended stack layout rules for sspstrong and sspreq.

The sspstrong layout rules are:
 1. Large arrays and structures containing large arrays (>= ssp-buffer-size)
are closest to the stack protector.
 2. Small arrays and structures containing small arrays (< ssp-buffer-size) are
2nd closest to the protector.
 3. Variables that have had their address taken are 3rd closest to the
protector.
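
The ordering implied by these rules can be sketched as a stable sort on a per-object category. SSPKind, StackObject and orderForProtector are hypothetical names for this illustration, and the None category for all remaining objects is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical classification mirroring the three rules; a lower value means
// the object is placed closer to the stack protector.
enum class SSPKind { LargeArray = 0, SmallArray = 1, AddrOf = 2, None = 3 };

struct StackObject {
  std::string name;
  SSPKind kind;
};

// Returns the objects ordered from closest to farthest from the protector.
// stable_sort keeps the original relative order within one category.
std::vector<StackObject> orderForProtector(std::vector<StackObject> objs) {
  std::stable_sort(objs.begin(), objs.end(),
                   [](const StackObject &a, const StackObject &b) {
                     return static_cast<int>(a.kind) < static_cast<int>(b.kind);
                   });
  return objs;
}
```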


Differential Revision: http://llvm-reviews.chandlerc.com/D2546

llvm-svn: 200601

24c7f063

Implement inalloca codegen for x86 with the new inalloca design · f5b76518

Reid Kleckner authored Jan 31, 2014

Calls with inalloca are lowered by skipping all stores for arguments
passed in memory and the initial stack adjustment to allocate argument
memory.

Now the frontend is responsible for the memory layout, and the backend
doesn't have to do any work.  As a result these changes are pretty
minimal.

Reviewers: echristo

Differential Revision: http://llvm-reviews.chandlerc.com/D2637

llvm-svn: 200596

f5b76518

Don't put non-static allocas in the static alloca map · dfbed59c

Reid Kleckner authored Jan 31, 2014

Allocas marked inalloca are never static, but we were trying to put them
into the static alloca map if they were in the entry block.  Also add an
assertion in x86 fastisel.

llvm-svn: 200593

dfbed59c

Remove a redundant call to hasRawTextSupport. · 499a748b
Rafael Espindola authored Jan 31, 2014
```
The code path it was guarding was already using emitRawComment.

llvm-svn: 200591
```
499a748b

Jan 31, 2014

If we're not producing DWARF accel tables, don't waste memory · 3878a781
Paul Robinson authored Jan 31, 2014
```
keeping track of those entries.

llvm-svn: 200572
```
3878a781

Add support for DW_FORM_flag and DW_FORM_flag_present to the DIE hashing · 4b1cf580

Eric Christopher authored Jan 31, 2014

algorithm. Sink the 'A' + Attribute hash into each form so we don't
have to check for valid forms before deciding whether to hash; this
lets the default case return without doing anything.

llvm-svn: 200571

4b1cf580

DebugInfo: Flag type unit references as declarations · 322d79b4

David Blaikie authored Jan 31, 2014

This ensures DWARF consumers don't confuse these references for
definitions. I'd argue it might be nice to improve debuggers so we don't
need this, but it's just one field in an abbreviation anyway - so it
doesn't seem worth the fight.

llvm-svn: 200569

322d79b4

This patch teaches the DAGCombiner how to fold insert_subvector nodes · 413a6cb4

Manman Ren authored Jan 31, 2014

when the input is a concat_vectors and the insert replaces one of the
concat halves:

Lower half: fold (insert_subvector (concat_vectors X, Y), Z) ->
(concat_vectors Z, Y)
Upper half: fold (insert_subvector (concat_vectors X, Y), Z) ->
(concat_vectors X, Z)

This can be seen with the following IR:

define <8 x float> @lower_half(<4 x float> %v1, <4 x float> %v2, <4 x float> %v3) {
  %1 = shufflevector <4 x float> %v1, <4 x float> %v2, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  %2 = tail call <8 x float> @llvm.x86.avx.vinsertf128.ps.256(<8 x float> %1, <4 x float> %v3, i8 0)
  ret <8 x float> %2
}

The vinsertf128 intrinsic is converted into an insert_subvector node
in SelectionDAGBuilder.cpp.

Using AVX, without the patch this generates two vinsertf128 instructions:

vinsertf128 $1, %xmm1, %ymm0, %ymm0
vinsertf128 $0, %xmm2, %ymm0, %ymm0

With the patch this is optimized into:

vinsertf128 $1, %xmm1, %ymm2, %ymm0
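
Why the fold is valid can be checked with a small model that treats vectors as plain arrays of lanes. This is only a sketch; concatVectors and insertSubvector are hypothetical helpers, not DAG nodes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<float>;

// Models concat_vectors: the lanes of x followed by the lanes of y.
Vec concatVectors(const Vec &x, const Vec &y) {
  Vec r(x);
  r.insert(r.end(), y.begin(), y.end());
  return r;
}

// Models insert_subvector: overwrites sub.size() lanes of base, starting at
// lane idx.
Vec insertSubvector(Vec base, const Vec &sub, std::size_t idx) {
  assert(idx + sub.size() <= base.size() && "subvector does not fit");
  std::copy(sub.begin(), sub.end(),
            base.begin() + static_cast<std::ptrdiff_t>(idx));
  return base;
}
```

Inserting Z at lane 0 of (concat X, Y) gives the same lanes as (concat Z, Y), and inserting at the start of the upper half gives (concat X, Z), so the intermediate insert is unnecessary.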

Patch by Robert Lougher.

llvm-svn: 200506

413a6cb4

DAGCombine should not produce ISD::OR nodes after operation legalization if they're not legal. · 60a4678c
Owen Anderson authored Jan 31, 2014
```
llvm-svn: 200503
```
60a4678c

PGO branch weight: update edge weights in SelectionDAGBuilder. · 4ece7452

Manman Ren authored Jan 31, 2014

When converting from "or + br" to two branches, or converting from
"and + br" to two branches, we correctly update the edge weights of
the two branches.
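
To make the constraint concrete, here is a sketch for the "or + br" case. The message does not say which weights the patch picks, so the assignment below is an assumption: it is one choice that keeps the overall probability of reaching the true successor unchanged. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct EdgeWeights {
  std::uint64_t taken;    // weight of the edge to the true successor
  std::uint64_t notTaken; // weight of the other edge
};

struct SplitWeights {
  EdgeWeights first;  // branch on the first operand; falls through to second
  EdgeWeights second; // branch on the second operand
};

// For "br (a | b), T, F" with weights {A, B}, assign {A, A + 2B} to the first
// branch and {A, 2B} to the second. (Assumed choice, see the lead-in.)
SplitWeights splitOrBranch(EdgeWeights w) {
  return {{w.taken, w.taken + 2 * w.notTaken}, {w.taken, 2 * w.notTaken}};
}

// Checks P(T) after the split against A / (A + B) using integer arithmetic:
// P(T) = t1/(t1+n1) + n1/(t1+n1) * t2/(t2+n2).
bool preservesTakenProbability(EdgeWeights w, SplitWeights s) {
  std::uint64_t num = s.first.taken * (s.second.taken + s.second.notTaken) +
                      s.first.notTaken * s.second.taken;
  std::uint64_t den = (s.first.taken + s.first.notTaken) *
                      (s.second.taken + s.second.notTaken);
  return num * (w.taken + w.notTaken) == w.taken * den;
}
```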

The previous attempt at r200431 was reverted at r200434 because of
two testing case failures. I modified my patch a little, but forgot
to re-run "make check-all".

Testing case CodeGen/ARM/lsr-unfolded-offset.ll is updated because of
the patch's impact on branch probability which causes changes in
spill placement.

llvm-svn: 200502

4ece7452

Jan 30, 2014

[Stackmaps] Record the stack size of each function that contains a stackmap/patchpoint intrinsic. · fb4d6482
Juergen Ributzka authored Jan 30, 2014
```
Re-applying the patch, but this time without using AsmPrinter methods.

Reviewed by Andy

llvm-svn: 200481
```
fb4d6482

Revert "[Stackmaps] Record the stack size of each function that contains a... · f6f0ce90

Juergen Ributzka authored Jan 30, 2014

Revert "[Stackmaps] Record the stack size of each function that contains a stackmap/patchpoint intrinsic."

This reverts commit r200444 to unbreak buildbots.

llvm-svn: 200445

f6f0ce90

[Stackmaps] Record the stack size of each function that contains a stackmap/patchpoint intrinsic. · aece7583
Juergen Ributzka authored Jan 30, 2014
```
Reviewed by Andy

llvm-svn: 200444
```
aece7583
Reland r200340 - 'Add line table debug info to COFF files when using a win32 triple' · f166f6c8
Timur Iskhodzhanov authored Jan 30, 2014
```
This incorporates a couple of fixes reviewed at http://llvm-reviews.chandlerc.com/D2651

llvm-svn: 200440
```
f166f6c8
Revert r200431 due to bot failures. · 7407e0e3
Manman Ren authored Jan 30, 2014
```
llvm-svn: 200434
```
7407e0e3

PGO branch weight: update edge weights in SelectionDAGBuilder. · 104e0c80

Manman Ren authored Jan 30, 2014

When converting from "or + br" to two branches, or converting from
"and + br" to two branches, we correctly update the edge weights of
the two branches.

llvm-svn: 200431

104e0c80

PGO branch weight: update edge weights in IfConverter. · b681918d

Manman Ren authored Jan 29, 2014

This commit only handles IfConvertTriangle. To update edge weights
of a successor, one interface is added to MachineBasicBlock:
/// Set successor weight of a given iterator.
setSuccWeight(succ_iterator I, uint32_t weight)

An existing test case, test/CodeGen/Thumb2/v8_IT_5.ll, is updated: now that
we correctly update the edge weights, the cold block is placed at the end of
the function and we jump to the cold block.

llvm-svn: 200428

b681918d

Move range handling for a function to endFunction rather than · 1a972150
Eric Christopher authored Jan 29, 2014
```
when we create the subprogram DIE.

llvm-svn: 200426
```
1a972150

Jan 29, 2014

If we use DW_AT_ranges we need to specify a base address that ranges · 8873adaa

Eric Christopher authored Jan 29, 2014

are relative to in the compile unit. Currently let's just use 0...

Thanks to Greg Clayton for the catch!

llvm-svn: 200425

8873adaa

Turn on CU ranges if we've got multiple compile units in the same · fb8dd008

Eric Christopher authored Jan 29, 2014

module since there's no range guarantee that we could make given
output order. This also fixes up the testcases that have multiple
CUs to have the correct range offset.

llvm-svn: 200422

fb8dd008

Make the compile unit map a MapVector so that we can assume a stable · 179fba19
Eric Christopher authored Jan 29, 2014
```
output ordering.

llvm-svn: 200421
```
179fba19
Fix formatting of comment. · 95531b69
Eric Christopher authored Jan 29, 2014
```
llvm-svn: 200420
```
95531b69

Enable EHABI by default · 8cea6e8f

Renato Golin authored Jan 29, 2014

After all the hard work to implement the EHABI and with the test-suite
passing, it's time to turn it on by default and allow users to
disable it as a work-around while we fix the eventual bugs that show
up.

This commit also removes the -arm-enable-ehabi-descriptors option, since we
want the tables to be printed every time the EHABI is turned on
for non-Darwin ARM targets.

Although MCJIT EHABI is not working yet (needs linking with the right
libraries), this commit also fixes some relocations on MCJIT regarding
the EH tables/lib calls, and updates some tests to avoid using EH tables
when none are needed.

The EH tests in the test-suite that were previously disabled on ARM
now pass with these changes, so a follow-up commit on the test-suite
will re-enable them.

llvm-svn: 200388

8cea6e8f

Revert r200340, "Add line table debug info to COFF files when using a win32 triple." · b366f01f
NAKAMURA Takumi authored Jan 29, 2014
```
It was incompatible with --target=i686-win32.

llvm-svn: 200375
```
b366f01f
Change MCStreamer EmitInstruction interface to take subtarget info · e6c13e4a
David Woodhouse authored Jan 28, 2014
```
llvm-svn: 200345
```
e6c13e4a

Jan 28, 2014
Add line table debug info to COFF files when using a win32 triple. · 2c659648
Timur Iskhodzhanov authored Jan 28, 2014
```
Reviewed at http://llvm-reviews.chandlerc.com/D2232

llvm-svn: 200340
```
2c659648