  1. Aug 20, 2013
    • Richard Sandiford's avatar
      [SystemZ] Use SRST to optimize memchr · 6f6d5516
      Richard Sandiford authored
      SystemZTargetLowering::emitStringWrapper() previously loaded the character
      into R0 before the loop and made R0 live on entry.  I'd forgotten that
      allocatable registers weren't allowed to be live across blocks at this stage,
      and it confused LiveVariables enough to cause a miscompilation of f3 in
      memchr-02.ll.
      
      This patch instead loads R0 in the loop and leaves LICM to hoist it
      after RA.  This is actually what I'd tried originally, but I went for
      the manual optimisation after noticing that R0 often wasn't being hoisted.
      This bug forced me to go back and look at why; the underlying problem
      is now fixed as r188774.
      
      We should also try to optimize null checks so that they test the CC result
      of the SRST directly.  The select between null and the SRST GPR result could
      then usually be deleted as dead.
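
      A minimal illustration of the pattern in question (plain C++, not
      from the patch): the null check below currently tests the GPR result
      that was selected from CC, when it could test SRST's CC directly,
      letting the select die as dead.

          #include <cstring>

          // Does buf (of length len) contain the byte ch?  SystemZ lowers
          // the memchr call to an SRST loop; the != nullptr test could
          // reuse SRST's CC instead of re-checking the selected GPR result.
          bool contains(const char *buf, std::size_t len, int ch) {
            return std::memchr(buf, ch, len) != nullptr;
          }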
      
      llvm-svn: 188779
      6f6d5516
    • Richard Sandiford's avatar
      Fix overly pessimistic shortcut in post-RA MachineLICM · 96aa93d5
      Richard Sandiford authored
      Post-RA LICM keeps three sets of registers: PhysRegDefs, PhysRegClobbers
      and TermRegs.  When it sees a definition of R, it adds all aliases of R
      to the corresponding set, so that a later membership test needs to check
      only a single register rather than all of its aliases.  E.g. the final
      candidate loop just has:
      
          unsigned Def = Candidates[i].Def;
          if (!PhysRegClobbers.test(Def) && ...) {
      
      to test whether register Def is multiply defined.
      
      However, there was also a shortcut in ProcessMI to make sure we didn't
      add candidates if we already knew that they would fail the final test.
      This shortcut was more pessimistic than the final one because it
      checked whether _any alias_ of the defined register was multiply defined.
      This is too conservative for targets that define register pairs.
      E.g. on z, R0 and R1 are sometimes used as a pair, so there is a
      128-bit register that aliases both R0 and R1.  If a loop used
      R0 and R1 independently, and the definition of R0 came first,
      we would be able to hoist the R0 assignment (because that used
      the final test quoted above) but not the R1 assignment (because
      that meant we had two definitions of the paired R0/R1 register
      and would fail the shortcut in ProcessMI).
      
      This patch just uses the same check for the ProcessMI shortcut as
      we use in the final candidate loop.
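
      A self-contained toy model of the difference (the three-register file
      and alias table are invented for illustration; the real pass uses
      BitVectors over the target's physical registers):

          #include <bitset>
          #include <cassert>
          #include <initializer_list>
          #include <vector>

          // Toy register file: R0=0, R1=1, P01=2 (a 128-bit pair register
          // aliasing both R0 and R1, as on z).
          static const std::vector<std::vector<int>> AliasesOf = {
              {0, 2}, {1, 2}, {0, 1, 2}};

          int main() {
            // A loop body defines R0 and then R1.  Recording a def marks
            // all aliases; a re-definition marks the register clobbered.
            std::bitset<3> Defs, Clobbers;
            for (int Def : {0, 1}) {
              for (int A : AliasesOf[Def]) {
                if (Defs.test(A))
                  Clobbers.set(A);
                Defs.set(A);
              }
              // Old ProcessMI shortcut: reject if *any alias* is clobbered.
              bool OldReject = false;
              for (int A : AliasesOf[Def])
                OldReject |= Clobbers.test(A);
              // Final-loop test (and the new shortcut): only Def matters.
              bool NewReject = Clobbers.test(Def);
              // R0 passes both tests; R1 is wrongly rejected by the old
              // shortcut (P01 looks multiply defined) but not the new one.
              assert(Def == 0 ? !OldReject && !NewReject
                              : OldReject && !NewReject);
            }
          }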
      
      llvm-svn: 188774
      96aa93d5
    • Michael Gottesman's avatar
      [stackprotector] Small cleanup. · dc985ef0
      Michael Gottesman authored
      llvm-svn: 188772
      dc985ef0
    • Michael Gottesman's avatar
      [stackprotector] Small Bit of computation hoisting. · 76c44be1
      Michael Gottesman authored
      llvm-svn: 188771
      76c44be1
    • Michael Gottesman's avatar
      [stackprotector] Added significantly longer comment to FindPotentialTailCall... · 1977d15e
      Michael Gottesman authored
      [stackprotector] Added significantly longer comment to FindPotentialTailCall to make clear its relationship to llvm::isInTailCallPosition.
      
      llvm-svn: 188770
      1977d15e
    • Michael Gottesman's avatar
      Removed trailing whitespace. · 62c5d714
      Michael Gottesman authored
      llvm-svn: 188769
      62c5d714
    • Michael Gottesman's avatar
      [stackprotector] Removed stale TODO. · 56e246b1
      Michael Gottesman authored
      llvm-svn: 188768
      56e246b1
    • Michael Gottesman's avatar
      [stackprotector] Refactor out the end of isInTailCallPosition into the... · ce0e4c26
      Michael Gottesman authored
      [stackprotector] Refactor out the end of isInTailCallPosition into the function returnTypeIsEligibleForTailCall.
      
      This allows me to use returnTypeIsEligibleForTailCall in the stack protector pass.
      
      rdar://13935163
      
      llvm-svn: 188765
      ce0e4c26
    • Michael Gottesman's avatar
      Remove unused variables that crept in. · f7e1203d
      Michael Gottesman authored
      llvm-svn: 188761
      f7e1203d
    • Michael Gottesman's avatar
      Teach selectiondag how to handle the stackprotectorcheck intrinsic. · b27f0f1f
      Michael Gottesman authored
      Previously, generation of stack protectors was done exclusively in the
      pre-SelectionDAG Codegen LLVM IR Pass "Stack Protector". This necessitated
      splitting basic blocks at the IR level to create the success/failure basic
      blocks in the tail of the basic block in question. As a result of this,
      calls that would have qualified for the sibling call optimization were no
      longer eligible for optimization since said calls were no longer right in
      the "tail position" (i.e. the immediate predecessor of a ReturnInst
      instruction).
      
      Then it was noticed that since the sibling call optimization causes the
      callee to reuse the caller's stack, if we could delay the generation of
      the stack protector check until later in CodeGen after the sibling call
      decision was made, we could get both the tail call optimization and the stack
      protector check!
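
      For concreteness, the kind of code at stake (hypothetical functions,
      not from the patch):

          extern int helper(int);

          // With -fstack-protector, the IR-level pass used to split this
          // block to insert the guard check, pushing the call out of tail
          // position.  Delaying the check into CodeGen, after the sibcall
          // decision, lets the call stay a sibling call while the guard is
          // still checked before it.
          int caller(int x) {
            char buf[64];         // large enough to trigger a protector
            (void)buf;
            return helper(x + 1); // sibcall candidate: right before the ret
          }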
      
      A few goals in solving this problem were:
      
        1. Preserve the architecture independence of stack protector generation.
      
        2. Preserve the normal IR level stack protector check for platforms like
           OpenBSD for which we support platform specific stack protector
           generation.
      
      The main problem that guided the present solution is that one cannot
      solve this problem in an architecture-independent manner at the IR level
      only. This is because:
      
        1. The decision on whether or not to perform a sibling call on certain
           platforms (for instance i386) requires lower-level information about
           available registers that cannot be known at the IR level.
      
        2. Even if the previous point were not true, the decision on whether to
           perform a tail call is made in LowerCallTo in SelectionDAG, which
           runs after the Stack Protector pass. As a result, one would need to
           put the relevant callinst into the stack protector check success
           basic block (where the return inst is placed) and then move it back
           before the stack protector check later, at SelectionDAG/MI time, if
           the tail call optimization failed. The MI level option was nixed
           immediately since it would require platform-specific pattern
           matching. The SelectionDAG level option was nixed because
           SelectionDAG only processes one IR level basic block at a time,
           implying one could not create a DAG Combine to move the callinst.
      
      To get around this problem a few things were realized:
      
        1. While one cannot handle multiple IR level basic blocks at the
           SelectionDAG level, one can generate multiple machine basic blocks
           for one IR level basic block. This is how we handle bit tests and
           switches.
      
        2. At the MI level, tail calls are represented via a special return
           MIInst called "tcreturn". Thus if we know the basic block in which we
           wish to insert the stack protector check, we get the correct behavior
           by always inserting the stack protector check right before the return
           statement. This is a "magical transformation" since no matter where
           the stack protector check intrinsic is, we always insert the stack
           protector check code at the end of the BB.
      
      Given the aforementioned constraints, the following solution was devised:
      
        1. On platforms that do not support SelectionDAG stack protector check
           generation, allow for the normal IR level stack protector check
           generation to continue.
      
        2. On platforms that do support SelectionDAG stack protector check
           generation:
      
          a. Use the IR level stack protector pass, reusing the logic already
             therein, to decide whether a stack protector is required and
             which BB to insert the stack protector check in. If we wish to
             generate a stack protector check in a basic block, we place a
             special IR intrinsic called llvm.stackprotectorcheck right before
             the BB's returninst, or, if there is a callinst that could
             potentially be sibling call optimized, before that callinst (see
             the sketch after this list).
      
          b. Then when a BB with said intrinsic is processed, we codegen the BB
             normally via SelectBasicBlock. In said process, when we visit the
             stack protector check, we do not actually emit anything into the
             BB. Instead, we just initialize the stack protector descriptor
             class (which involves stashing information/creating the success
             mbb and the failure mbb if we have not created one for this
             function yet) and export the guard variable that we are going to
             compare.
      
          c. After we finish selecting the basic block, in FinishBasicBlock if
             the StackProtectorDescriptor attached to the SelectionDAGBuilder is
             initialized, we first find a splice point in the parent basic block
             before the terminator and then splice the terminator of said basic
             block into the success basic block. Then we code-gen a new tail for
             the parent basic block consisting of the two loads, the comparison,
             and finally two branches to the success/failure basic blocks. We
             conclude by code-gening the failure basic block if we have not
             code-gened it already (all stack protector checks we generate in
             the same function use the same failure basic block).
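
      A hedged sketch of step 2a above (compiles only inside an LLVM tree of
      this era; the exact signature of llvm.stackprotectorcheck and the
      helper name are assumptions made for illustration):

          #include "llvm/IR/IRBuilder.h"
          #include "llvm/IR/Intrinsics.h"
          #include "llvm/IR/Module.h"
          using namespace llvm;

          // Drop the intrinsic right before the chosen insertion point
          // (the returninst, or a potential sibling call), handing it the
          // guard slot; SelectionDAG expands it as described in (b)-(c).
          static void placeStackProtectorCheck(Instruction *InsertPt,
                                               Value *GuardSlot) {
            Module *M = InsertPt->getParent()->getParent()->getParent();
            Function *Check =
                Intrinsic::getDeclaration(M, Intrinsic::stackprotectorcheck);
            IRBuilder<> B(InsertPt);
            B.CreateCall(Check, GuardSlot);
          }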
      
      llvm-svn: 188755
      b27f0f1f
    • Hal Finkel's avatar
      Add a llvm.copysign intrinsic · 0c5c01aa
      Hal Finkel authored
      This adds an llvm.copysign intrinsic; we already have Libfunc recognition for
      copysign (which is turned into the FCOPYSIGN SDAG node). In order to
      autovectorize calls to copysign in the loop vectorizer, we need a corresponding
      intrinsic as well.
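
      For instance (an illustrative loop, not from the patch), each scalar
      call below can now be vectorized because it maps onto the new
      intrinsic:

          #include <cmath>

          // Each std::copysign call is recognized as the copysign libfunc,
          // mapped to llvm.copysign, and lowered to an FCOPYSIGN node; the
          // vector intrinsic lets the loop vectorizer widen the whole loop.
          void applySigns(double *out, const double *mag,
                          const double *sgn, int n) {
            for (int i = 0; i < n; ++i)
              out[i] = std::copysign(mag[i], sgn[i]);
          }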
      
      In addition to the expected changes to the language reference, the loop
      vectorizer, BasicTTI, and the SDAG builder (the intrinsic is transformed into
      an FCOPYSIGN node, just like the function call), this also adds FCOPYSIGN to a
      few lists in LegalizeVector{Ops,Types} so that vector copysigns can be
      expanded.
      
      In TargetLoweringBase::initActions, I've made the default action for FCOPYSIGN
      be Expand for vector types. This seems correct for all in-tree targets, and I
      think it is the right thing to do because, previously, there was no way to
      generate vector-valued FCOPYSIGN nodes (and most targets don't specify an action for
      vector-typed FCOPYSIGN).
      
      llvm-svn: 188728
      0c5c01aa
  2. Aug 19, 2013
    • Eric Christopher's avatar
      Use less verbose code and update comments. · 574b5c88
      Eric Christopher authored
      llvm-svn: 188711
      574b5c88
    • Eric Christopher's avatar
      Turn on pubnames by default on linux. · 7da24888
      Eric Christopher authored
      Until gdb supports the new accelerator tables, we should add the
      pubnames section so that gdb_index can be generated from gold
      at link time. On darwin we already emit the accelerator tables
      and so don't need to worry about pubnames.
      
      llvm-svn: 188708
      7da24888
    • Paul Redmond's avatar
      Improve the widening of integral binary vector operations · 62f840f4
      Paul Redmond authored
      - split WidenVecRes_Binary into WidenVecRes_Binary and WidenVecRes_BinaryCanTrap
        - WidenVecRes_BinaryCanTrap preserves the original behaviour for operations
          that can trap
        - WidenVecRes_Binary simply widens the operation and improves codegen for
          3-element vectors by allowing widening and promotion on x86 (matches the
          behaviour of unary and ternary operation widening)
      - use WidenVecRes_Binary for operations on integers.
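
      A toy picture of why the CanTrap split matters (invented example; the
      real code operates on SelectionDAG nodes in LegalizeVectorTypes):

          #include <array>

          using V4 = std::array<int, 4>; // <3 x i32> widened to <4 x i32>

          // Widening an add to x86's 4 lanes is safe: the junk fourth lane
          // cannot fault and its result is never read back.  A widened
          // divide could trap on that lane, which is why trapping
          // operations keep the original, more careful lowering.
          V4 widenedAdd(V4 a, V4 b) {
            V4 r{};
            for (int i = 0; i < 4; ++i)
              r[i] = a[i] + b[i];
            return r;
          }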
      
      Reviewed by: nrotem
      
      llvm-svn: 188699
      62f840f4
    • Hal Finkel's avatar
      Add ExpandFloatOp_FCOPYSIGN to handle ppcf128-related expansions · e4eb7818
      Hal Finkel authored
      We had previously been asserting when faced with an FCOPYSIGN f64, ppcf128 node
      because there was no way to expand the FCOPYSIGN node. Because ppcf128 is the
      sum of two doubles, and the first double must have the larger magnitude, we
      can take the sign from the first double. As a result, in addition to fixing the
      crash, this is also an optimization.
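
      In scalar terms (a minimal illustration, not the DAG expansion
      itself):

          #include <cmath>

          // A ppcf128 value is Hi + Lo with |Hi| >= |Lo|, so its sign is
          // Hi's sign; expanding FCOPYSIGN f64,ppcf128 therefore only
          // needs the high double.
          double copysignFromPPCF128(double x, double hi, double lo) {
            (void)lo; // the low part never decides the sign
            return std::copysign(x, hi);
          }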
      
      llvm-svn: 188655
      e4eb7818
    • David Blaikie's avatar
      DebugInfo: don't emit zero-length names for parameters · 715528be
      David Blaikie authored
      We check this in many/all other cases; it seems we just missed this one.
      Perhaps it'd be worth unifying this so we never emit zero-length
      DW_AT_names.
      
      llvm-svn: 188649
      715528be
  3. Aug 17, 2013
    • Jim Grosbach's avatar
      ARM: Fix more fast-isel verifier failures. · 06c2a681
      Jim Grosbach authored
      Teach the generic instruction selection helper functions to constrain
      the register classes of their input operands. For non-physical register
      references, the generic code needs to be careful not to mess that up
      when replacing references to result registers. As the comment on
      MachineRegisterInfo::replaceRegWith() indicates, it's important to call
      constrainRegClass() first.
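
      The pattern boils down to this hedged in-tree sketch (the helper name
      is invented; compiles only against LLVM headers):

          #include "llvm/CodeGen/MachineRegisterInfo.h"
          using namespace llvm;

          // Before folding away NewReg in favour of OldReg, narrow
          // OldReg's class so every instruction that referenced NewReg can
          // still encode its operand; only then is replaceRegWith() safe.
          static bool replaceCarefully(MachineRegisterInfo &MRI,
                                       unsigned OldReg, unsigned NewReg) {
            if (!MRI.constrainRegClass(OldReg, MRI.getRegClass(NewReg)))
              return false; // incompatible classes: don't replace
            MRI.replaceRegWith(NewReg, OldReg);
            return true;
          }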
      
      rdar://12594152
      
      llvm-svn: 188593
      06c2a681