Commits · 8234d40843d545e80d98147e30e2917169d30840 · Roger Ferrer / llvm-epi-0.8

Aug 01, 2013
- Only enable SLP-vectorization on O3 builds. · 9153b387
  Nadav Rotem authored Aug 01, 2013
```
llvm-svn: 187595
```
- 80-col · 25f15358
  Nadav Rotem authored Jul 31, 2013
```
llvm-svn: 187535
```
Jul 31, 2013
- Preserve fast-math flags when folding (fsub x, (fneg y)) to (fadd x, y). · c7be519d
  Owen Anderson authored Jul 30, 2013
```
llvm-svn: 187462
```
Jul 30, 2013

Change behavior of calling bitcasted alias functions. · cacbb237

Matt Arsenault authored Jul 30, 2013

It will now only convert the arguments / return value and call
the underlying function if the types are able to be bitcasted.
This avoids using fp<->int conversions that would occur before.

llvm-svn: 187444

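
The distinction this commit relies on, reinterpreting bits versus converting a value, can be illustrated outside of LLVM. The following is a minimal sketch, assuming IEEE-754 single-precision floats; the function names are hypothetical and are not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret the bits of a float as a 32-bit integer, which is what a
// bitcast does: bytes are copied unchanged, no numeric conversion happens.
inline std::uint32_t bitcastToU32(float f) {
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "a bitcast needs source and destination of equal size");
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return bits;
}

// Numeric fp->int conversion: the value is truncated toward zero and the
// resulting bit pattern is unrelated to the float's encoding.
inline std::int32_t convertToI32(float f) {
  return static_cast<std::int32_t>(f);
}

// Small self-check showing that the two operations disagree on the same input.
inline void demoBitcastVsConvert() {
  assert(bitcastToU32(1.0f) == 0x3F800000u); // IEEE-754 encoding of 1.0f
  assert(convertToI32(1.0f) == 1);           // numeric value of 1.0f
}
```

Calling a function through a mismatched type with a value conversion would hand the callee `1` where the caller passed the bit pattern `0x3F800000`, which is the kind of surprise the commit message says it now avoids.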

Jul 29, 2013

SLPVectorizer: update the debug location for the new instructions. · d9c74cc6
Nadav Rotem authored Jul 29, 2013
```
llvm-svn: 187363
```

Teach the AllocaPromoter which is wrapped around the SSAUpdater · cd7c8cdf

Chandler Carruth authored Jul 29, 2013

infrastructure to do promotion without a domtree the same smarts about
looking through GEPs, bitcasts, etc., that I just taught mem2reg about.
This way, if SROA chooses to promote an alloca which still has some
noisy instructions this code can cope with them.

I've not used as principled of an approach here for two reasons:
1) This code doesn't really need it as we were already set up to zip
   through the instructions used by the alloca.
2) I view the code here as more of a hack, and hopefully a temporary one.

The SSAUpdater path in SROA is a real sore point for me. It doesn't make
a lot of architectural sense for many reasons:
- We're likely to end up needing the domtree anyways in a subsequent
  pass, so why not compute it earlier and use it.
- In the future we'll likely end up needing the domtree for parts of the
  inliner itself.
- If we need to we could teach the inliner to preserve the domtree. Part
  of the re-work of the pass manager will allow this to be very powerful
  even in large SCCs with many functions.
- Ultimately, computing a domtree has gotten significantly faster since
  the original SSAUpdater-using code went into ScalarRepl. We no longer
  use domfrontiers, and much of domtree is lazily done based on queries
  rather than eagerly.
- At this point keeping the SSAUpdater-based promotion saves a total of
  0.7% on a build of the 'opt' tool for me. That's not a lot of
  performance given the complexity!

So I'm leaving this a bit ugly in the hope that eventually we just
remove all of this nonsense.

I can't even readily test this because this code isn't reachable except
through SROA. When I re-instate the patch that fast-tracks allocas
already suitable for promotion, I'll add a testcase there that failed
before this change. Before that, SROA will fix any test case I give it.

llvm-svn: 187347

Don't vectorize when the attribute NoImplicitFloat is used. · 750e42cb
Nadav Rotem authored Jul 29, 2013
```
llvm-svn: 187340
```
Fix -Wdocumentation warnings. · caa776be
Rafael Espindola authored Jul 28, 2013
```
llvm-svn: 187336
```

Update comments for SSAUpdater to use the modern doxygen comment · 6b55dbea

Chandler Carruth authored Jul 28, 2013

standards for LLVM. Remove duplicated comments on the interface from the
implementation file (implementation comments are left there of course).
Also clean up, re-word, and fix a few typos and errors in the comments
spotted along the way.

This is in preparation for changes to these files and to keep the
uninteresting tidying in a separate commit.

llvm-svn: 187335

Jul 28, 2013

Temporarily revert r187323 until I update SSAUpdater to match mem2reg. · d31370e0
Chandler Carruth authored Jul 28, 2013
```
I forgot that we had two totally independent things here. :: sigh ::

llvm-svn: 187327
```

Now that mem2reg understands how to cope with a slightly wider set of · 9d96100f

Chandler Carruth authored Jul 28, 2013

uses of an alloca, we can pre-compute promotability while analyzing an
alloca for splitting in SROA. That lets us short-circuit the common case
of a bunch of trivially promotable allocas. This cuts 20% to 30% off the
run time of SROA for typical frontend-generated IR sequences I'm seeing.
It gets the new SROA to within 20% of ScalarRepl for such code. My
current benchmark for these numbers is PR15412, but it fits the general
pattern of IR emitted by Clang so it should be widely applicable.

llvm-svn: 187323

Thread DataLayout through the callers and into mem2reg. This will be · d5b806a2

Chandler Carruth authored Jul 28, 2013

useful in a subsequent patch, but causes an unfortunate amount of noise,
so I pulled it out into a separate patch.

llvm-svn: 187322

Update the comment · 3e50c689
Nadav Rotem authored Jul 27, 2013
```
llvm-svn: 187316
```

Jul 27, 2013

Don't use all the #ifdefs to hide the stats counters and instead rely on · 8e3c4dc5

Chandler Carruth authored Jul 27, 2013

their being optimized out in debug mode. Realistically, this just isn't
going to be the slow part anyways. This also fixes unused variable
warnings that are breaking LLD build bots. =/ I didn't see these at
first, and kept losing track of the fact that they were broken.

llvm-svn: 187297

Merge the removal of dead instructions and lifetime markers with the · e8f5812a

Chandler Carruth authored Jul 27, 2013

analysis of the alloca. We don't need to visit all the users twice for
this. We build up a kill list during the analysis and then just process
it afterward. This recovers the tiny bit of performance lost by moving
to the visitor based analysis system as it removes one entire use-list
walk from mem2reg. In some cases, this is now faster than mem2reg was
previously.

llvm-svn: 187296

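
The "build a kill list during the analysis, then process it afterward" idea can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyUser`, `ToyAnalysis`, and both functions are hypothetical and do not mirror the real mem2reg data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for one user of an alloca (hypothetical, not an LLVM type).
struct ToyUser {
  std::string name;
  bool isDead;          // e.g. a dead instruction or a lifetime marker
  bool blocksPromotion; // a use that the promoter cannot handle
};

struct ToyAnalysis {
  bool promotable = true;
  std::vector<std::string> killList; // filled in during the single walk
};

// One walk over the users: decide promotability and remember what to erase.
inline ToyAnalysis analyzeUsers(const std::vector<ToyUser>& users) {
  ToyAnalysis result;
  for (const ToyUser& u : users) {
    if (u.isDead) {
      result.killList.push_back(u.name);
      continue;
    }
    if (u.blocksPromotion)
      result.promotable = false;
  }
  return result;
}

// Afterwards, erase exactly the recorded users; the walk itself is not
// repeated to rediscover which users are dead.
inline void processKillList(std::vector<ToyUser>& users,
                            const ToyAnalysis& analysis) {
  for (const std::string& name : analysis.killList) {
    auto it = std::find_if(users.begin(), users.end(),
                           [&](const ToyUser& u) { return u.name == name; });
    assert(it != users.end() && "kill list entry must name an existing user");
    users.erase(it);
  }
}
```

The point of the design is that the decision about which users are dead is made once, while the users are already being inspected for promotability.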

Reimplement isPotentiallyReachable to make nocapture deduction much stronger. · 0b68245e

Nick Lewycky authored Jul 27, 2013

Adds unit tests for it too.

Split BasicBlockUtils into an analysis-half and a transforms-half, and put the
analysis bits into a new Analysis/CFG.{h,cpp}. Promote isPotentiallyReachable
into llvm::isPotentiallyReachable and move it into Analysis/CFG.

llvm-svn: 187283

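
For readers unfamiliar with the query, a reachability check over a control-flow graph can be sketched as a breadth-first search. This is a simplified illustration, not the implementation added by this commit; `ToyCFG` and `isReachable` are hypothetical names, and blocks are represented as plain indices.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// successors[b] lists the blocks that block b can branch to.
using ToyCFG = std::vector<std::vector<std::size_t>>;

// Returns true if some path of CFG edges leads from `from` to `to`.
// A block is treated as trivially reaching itself in this sketch.
inline bool isReachable(const ToyCFG& cfg, std::size_t from, std::size_t to) {
  assert(from < cfg.size() && to < cfg.size() && "block index out of range");
  std::vector<bool> visited(cfg.size(), false);
  std::queue<std::size_t> pending;
  pending.push(from);
  visited[from] = true;
  while (!pending.empty()) {
    const std::size_t block = pending.front();
    pending.pop();
    if (block == to)
      return true;
    for (std::size_t succ : cfg[block]) {
      if (!visited[succ]) {
        visited[succ] = true;
        pending.push(succ);
      }
    }
  }
  return false;
}
```

A "potentially reachable" query may answer true conservatively; the sketch above is exact on its toy graph and only shows the shape of the question being asked.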

SimplifyCFG: Use parallel-and and parallel-or mode to consolidate branch conditions · 8b1e021e

Tom Stellard authored Jul 27, 2013

Merge consecutive if-regions if they contain identical statements.
Both transformations reduce the number of branches. The transformation
is guarded by a target-hook, and is currently enabled only for +R600,
but the correctness has been tested on X86 target using a variety of
CPU benchmarks.

Patch by: Mei Ye

llvm-svn: 187278

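
One way to picture consolidating branch conditions is to compare short-circuit evaluation, which needs a branch per condition, with evaluating both conditions and combining them before a single branch. The sketch below is an assumption about the general idea rather than the pass itself; all function names are hypothetical, and the rewrite is only valid because the conditions here have no side effects.

```cpp
#include <cassert>

// Short-circuit form: conceptually one branch on `a`, then another on `b`.
inline int shortCircuitAnd(bool a, bool b) {
  int result = 0;
  if (a && b)
    result = 1;
  return result;
}

// Consolidated form: both conditions are evaluated and combined, then a
// single branch tests the combined value.
inline int parallelAnd(bool a, bool b) {
  int result = 0;
  if (a & b)
    result = 1;
  return result;
}

inline int shortCircuitOr(bool a, bool b) {
  int result = 0;
  if (a || b)
    result = 1;
  return result;
}

inline int parallelOr(bool a, bool b) {
  int result = 0;
  if (a | b)
    result = 1;
  return result;
}

// Exhaustively confirm the two forms agree for side-effect-free conditions.
inline void checkConsolidationIsEquivalent() {
  for (int a = 0; a < 2; ++a) {
    for (int b = 0; b < 2; ++b) {
      assert(shortCircuitAnd(a != 0, b != 0) == parallelAnd(a != 0, b != 0));
      assert(shortCircuitOr(a != 0, b != 0) == parallelOr(a != 0, b != 0));
    }
  }
}
```

Whether trading a branch for an unconditional evaluation pays off depends on the target, which is consistent with the commit guarding the transformation behind a target hook.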

SLP Vectorizer: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize. · cfd40da9

Nadav Rotem authored Jul 26, 2013

llvm-svn: 187267

SLP Vectorizer: Disable the vectorization of non power of two chains, such as <3 x float>, because we don't have a good cost model for these types. · 9ce0f779

Nadav Rotem authored Jul 26, 2013

llvm-svn: 187265

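
A power-of-two test of the kind this restriction implies is a one-liner. The sketch below is illustrative only; `isPowerOfTwo` and `isCandidateChainLength` are hypothetical names and not the code from the commit.

```cpp
#include <cassert>
#include <cstddef>

// A power of two has exactly one bit set, so clearing the lowest set bit
// with n & (n - 1) leaves zero. Zero itself is not a power of two.
inline bool isPowerOfTwo(std::size_t n) {
  return n != 0 && (n & (n - 1)) == 0;
}

// Hypothetical gate: only chains with a power-of-two element count are
// considered, so a <3 x float> chain is rejected and a <4 x float> accepted.
inline bool isCandidateChainLength(std::size_t numElements) {
  return isPowerOfTwo(numElements);
}

inline void demoChainLengths() {
  assert(isCandidateChainLength(4));
  assert(!isCandidateChainLength(3));
}
```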

Fix variable name. · d6d4da09
Owen Anderson authored Jul 26, 2013
```
llvm-svn: 187253
```

Jul 26, 2013

When InstCombine tries to fold away (fsub x, (fneg y)) into (fadd x, y), it is · e37c2e4d
Owen Anderson authored Jul 26, 2013
```
also worthwhile for it to look through FP extensions and truncations, whose
application commutes with fneg.

llvm-svn: 187249
```
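
The algebra behind this fold can be checked numerically. The following is a small sketch, assuming IEEE-754 arithmetic with the default round-to-nearest mode and finite, non-NaN inputs; the helper names are hypothetical.

```cpp
#include <cassert>

// Widening float -> double is exact and negation only flips the sign bit,
// so negating before or after the extension gives the same value.
inline bool fnegCommutesWithExt(float x) {
  return -static_cast<double>(x) == static_cast<double>(-x);
}

// Round-to-nearest is symmetric around zero, so negating before or after
// the truncation double -> float also gives the same value.
inline bool fnegCommutesWithTrunc(double x) {
  return -static_cast<float>(x) == static_cast<float>(-x);
}

// The fold itself: x - (-y) computes the same value as x + y.
inline bool fsubOfFnegEqualsFadd(double x, double y) {
  return x - (-y) == x + y;
}

inline void demoFnegIdentities() {
  assert(fnegCommutesWithExt(1.5f));
  assert(fnegCommutesWithTrunc(0.1)); // 0.1 is inexact in float, still symmetric
  assert(fsubOfFnegEqualsFadd(0.1, 0.2));
}
```

Because the negation can be moved across the extension or truncation, a pattern such as `fsub x, (fpext (fneg y))` can be treated like `fadd x, (fpext y)`, which is the case the commit message describes.
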
Correct case of m_UIToFp to m_UIToFP to match instruction name, add m_SIToFP for consistency. · 4ef13872
Stephen Lin authored Jul 26, 2013
```
llvm-svn: 187225
```

Re-implement the analysis of uses in mem2reg to be significantly more · 9af38fc2

Chandler Carruth authored Jul 26, 2013

robust. It now uses an InstVisitor and worklist to actually walk the
uses of the Alloca transitively and detect the pattern which we can
directly promote: loads & stores of the whole alloca and instructions we
can completely ignore.

Also, with this new implementation teach both the predicate for testing
whether we can promote and the promotion engine itself to use the same
code so we no longer have strange divergence between the two code paths.

I've added some silly test cases to demonstrate that we can handle
slightly more degenerate code patterns now. See below for why this
is even interesting.

Performance impact: roughly 1% regression in the performance of SROA or
ScalarRepl on a large C++-ish test case where most of the allocas are
basically ready for promotion. The reason is silly redundant
work that I've left FIXMEs for and which I'll address in the next
commit. I wanted to separate this commit as it changes the behavior.
Once the redundant work in removing the dead uses of the alloca is
fixed, this code appears to be faster than the old version. =]

So why is this useful? Because the previous requirement for promotion
required a *specific* visit pattern of the uses of the alloca to verify:
we *had* to look for no more than 1 intervening use. The end goal is to
have SROA automatically detect when an alloca is already promotable and
directly hand it to the mem2reg machinery rather than trying to
partition and rewrite it. This is a 25% or more performance improvement
for SROA, and a significant chunk of the delta between it and
ScalarRepl. To get there, we need to make mem2reg actually capable of
promoting allocas which *look* promotable to SROA without having SROA do
tons of work to massage the code into just the right form.

This is actually the tip of the iceberg. There are tremendous potential
savings we can realize here by de-duplicating work between mem2reg and
SROA.

llvm-svn: 187191

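
The transitive, worklist-driven walk described in this commit can be sketched on a toy use graph. Everything here is hypothetical: `ToyKind`, `ToyInst`, and `looksPromotable` are illustration-only names, and the choice of which instruction kinds count as "look through" or "ignorable" is an assumption, not the real mem2reg classification.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class ToyKind { Alloca, Load, Store, Cast, LifetimeMarker, Call };

struct ToyInst {
  ToyKind kind;
  std::vector<std::size_t> users; // indices of instructions that use this one
};

// Walk the uses of `root` transitively. Loads and stores are accepted,
// lifetime markers are ignored, casts are looked through, and anything
// else makes the alloca unpromotable in this toy model.
inline bool looksPromotable(const std::vector<ToyInst>& insts,
                            std::size_t root) {
  assert(root < insts.size() && "root index out of range");
  std::vector<bool> seen(insts.size(), false);
  std::vector<std::size_t> worklist{root};
  seen[root] = true;
  while (!worklist.empty()) {
    const std::size_t cur = worklist.back();
    worklist.pop_back();
    for (std::size_t u : insts[cur].users) {
      assert(u < insts.size() && "user index out of range");
      if (seen[u])
        continue;
      seen[u] = true;
      switch (insts[u].kind) {
      case ToyKind::Load:
      case ToyKind::Store:
      case ToyKind::LifetimeMarker:
        break; // acceptable leaf use, nothing further to inspect
      case ToyKind::Cast:
        worklist.push_back(u); // look through: its users matter too
        break;
      default:
        return false; // any other use blocks promotion
      }
    }
  }
  return true;
}
```

Using one walk for both the "can we promote?" predicate and the promotion itself is what removes the divergence between the two code paths that the message mentions.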

[PowerPC] Support powerpc64le as a syntax-checking target. · 0a9170d9

Bill Schmidt authored Jul 26, 2013

This patch provides basic support for powerpc64le as an LLVM target.
However, use of this target will not actually generate little-endian
code.  Instead, use of the target will cause the correct little-endian
built-in defines to be generated, so that code that tests for
__LITTLE_ENDIAN__, for example, will be correctly parsed for
syntax-only testing.  Code generation will otherwise be the same as
powerpc64 (big-endian), for now.

The patch leaves open the possibility of creating a little-endian
PowerPC64 back end, but there is no immediate intent to create such a
thing.

The LLVM portions of this patch simply add ppc64le coverage everywhere
that ppc64 coverage currently exists.  There is nothing of any import
worth testing until such time as little-endian code generation is
implemented.  In the corresponding Clang patch, there is a new test
case variant to ensure that correct built-in defines for little-endian
code are generated.

llvm-svn: 187179

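
Code that "tests for `__LITTLE_ENDIAN__`" typically looks like the sketch below. The function names are hypothetical, and whether either macro is predefined depends on the compiler and target, so the sketch also allows for neither being defined and includes a runtime probe for comparison.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Compile-time view: 1 if the compiler predefines __LITTLE_ENDIAN__,
// 0 if it predefines __BIG_ENDIAN__, -1 if it predefines neither.
inline int macroEndianness() {
#if defined(__LITTLE_ENDIAN__)
  return 1;
#elif defined(__BIG_ENDIAN__)
  return 0;
#else
  return -1;
#endif
}

// Runtime view: inspect the first byte of a known 32-bit value.
inline bool runtimeIsLittleEndian() {
  const std::uint32_t probe = 1;
  unsigned char firstByte = 0;
  std::memcpy(&firstByte, &probe, 1);
  return firstByte == 1;
}

// When a macro is defined, it should agree with what the hardware does.
inline void checkEndiannessConsistency() {
  if (macroEndianness() == 1)
    assert(runtimeIsLittleEndian());
  if (macroEndianness() == 0)
    assert(!runtimeIsLittleEndian());
}
```

For a syntax-only target as described above, only the preprocessor branch matters: the defines steer which code is parsed, while generated code is unchanged.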

Jul 25, 2013

Respect llvm.used in Internalize. · 17600e29

Rafael Espindola authored Jul 25, 2013

The language reference says that:

"If a symbol appears in the @llvm.used list, then the compiler,
assembler, and linker are required to treat the symbol as if there is
a reference to the symbol that it cannot see"

Since even the linker cannot see the reference, we must assume that
the reference can be using the symbol table. For example, a user can add
__attribute__((used)) to a debug helper function like dump and use it from
a debugger.

llvm-svn: 187103

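
The debugger scenario mentioned above might look like the following sketch. It assumes a GCC- or Clang-compatible compiler, since `__attribute__((used))` is an extension; `dumpValue` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdio>

// Nothing in the program calls this helper, so without the attribute the
// compiler would be free to drop it. The attribute asks for it to be kept
// so that it can still be invoked by name from a debugger.
__attribute__((used)) static int dumpValue(const int* value) {
  assert(value != nullptr && "dumpValue needs a valid pointer");
  return std::printf("value = %d\n", *value);
}
```

From a debugger session, one could then evaluate a call such as `dumpValue(&x)` even though no call site exists in the source, which is why a pass like Internalize must not treat the symbol as unreferenced.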

Check that TD isn't NULL before dereferencing it down this path. · 5b15037f
Nick Lewycky authored Jul 25, 2013
```
llvm-svn: 187099
```
Make these methods const correct. · ec2375fb
Rafael Espindola authored Jul 25, 2013
```
Thanks to Nick Lewycky for noticing it.

llvm-svn: 187098
```

Jul 24, 2013

TRE: Move class into anonymous namespace. · 328da33d
Benjamin Kramer authored Jul 24, 2013
```
While there shrink a dangerously large SmallPtrSet.

llvm-svn: 187050
```

Fix a problem I introduced in r187029 where we would over-eagerly · 58e25d39

Chandler Carruth authored Jul 24, 2013

schedule an alloca for another iteration in SROA. This only showed up
with a mixture of promotable and unpromotable selects and phis. Added
a test case for this.

llvm-svn: 187031

Fix PR16687 where we were incorrectly promoting an alloca that had · 83ea195d

Chandler Carruth authored Jul 24, 2013

pending speculation for a phi node. The problem here is that we were
using growth of the speculation set as an indicator of whether
speculation would occur, and if the phi node is already in the set we
don't see it grow. This is a symptom of the fact that this signal is
a total hack.

Unfortunately, I couldn't really come up with a non-hacky way of
signaling that promotion remains valid *after* speculation occurs, such
that we only speculate when all else looks good for promotion. In the
end, I went with at least a much more explicit approach of doing the
work of queuing inside the phi and select processing and setting
a preposterously named flag to convey that we're in the special state of
requiring speculating before promotion.

Thanks to Richard Trieu and Nick Lewycky for the excellent work reducing
a testcase for this from a pretty giant, nasty assert in a big
application. =] The testcase was excellent.

llvm-svn: 187029

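
The failure mode described here, inferring state from whether a set grew, is easy to reproduce in isolation. The sketch below is a generic illustration with hypothetical names (`pendingBySetGrowth`, `PromotionState`, `requiresSpeculation`); it is not the SROA code.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Fragile signal: conclude that speculation is pending only if inserting
// the phi made the set larger. A phi that is already queued goes unnoticed.
inline bool pendingBySetGrowth(std::set<int>& speculatable, int phiId) {
  const std::size_t before = speculatable.size();
  speculatable.insert(phiId);
  return speculatable.size() > before;
}

// Explicit signal: record the state in a flag at the point where the phi
// is queued, regardless of whether the insertion was a duplicate.
struct PromotionState {
  std::set<int> speculatable;
  bool requiresSpeculation = false;

  void queuePhi(int phiId) {
    speculatable.insert(phiId);
    requiresSpeculation = true;
  }
};

inline void demoSetGrowthPitfall() {
  std::set<int> queued;
  assert(pendingBySetGrowth(queued, 1));  // first insert: set grows
  assert(!pendingBySetGrowth(queued, 1)); // duplicate: signal is lost
}
```

The explicit flag costs one extra field but keeps the "speculation is still required" fact independent of container bookkeeping.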

Fix spelling · f64212b2
Matt Arsenault authored Jul 23, 2013
```
llvm-svn: 186997
```

Jul 23, 2013
- Remove extraneous null statement. No functionality change! · 6ab9d936
  Nick Lewycky authored Jul 22, 2013
```
llvm-svn: 186893
```
- Use switch instead of if. No functionality change. · d4d94065
  Jakub Staszak authored Jul 22, 2013
```
llvm-svn: 186892
```
- Remove trailing spaces. · 8e1a6e7d
  Jakub Staszak authored Jul 22, 2013
```
llvm-svn: 186890
```
- When we vectorize across multiple basic blocks we may vectorize PHINodes that create a cycle. We already break the cycle on phi-nodes, but arithmetic operations are still duplicated. This patch adds code that checks if the operation that we are vectorizing was vectorized during the visit of the operands and uses this value if it can. · cf0dcdc7
  Nadav Rotem authored Jul 22, 2013
```
llvm-svn: 186883
```
- OldPtr is llvm::Instruction. Remove unneeded cast<>. · cb132fac
  Jakub Staszak authored Jul 22, 2013
```
llvm-svn: 186880
```
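
The reuse check described in cf0dcdc7 earlier in this list (look up the operation again after its operands have been visited) can be sketched on a two-node cycle. This is a toy model under stated assumptions: `ToyNode` and `ToyVectorizer` are hypothetical, and integer ids stand in for the vector values that would be created.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

struct ToyNode {
  bool isPhi;
  std::vector<std::size_t> operands; // indices of operand nodes
};

struct ToyVectorizer {
  const std::vector<ToyNode>& nodes;
  std::map<std::size_t, int> vectorized; // scalar node -> vector value id
  int created = 0;                       // how many vector values were built

  explicit ToyVectorizer(const std::vector<ToyNode>& n) : nodes(n) {}

  int vectorize(std::size_t n) {
    assert(n < nodes.size() && "node index out of range");
    auto it = vectorized.find(n);
    if (it != vectorized.end())
      return it->second;

    if (nodes[n].isPhi) {
      // Break cycles: register the phi before visiting its operands.
      const int id = created++;
      vectorized[n] = id;
      for (std::size_t op : nodes[n].operands)
        vectorize(op);
      return id;
    }

    for (std::size_t op : nodes[n].operands)
      vectorize(op);

    // Visiting the operands may have vectorized this node through a cycle.
    // Reuse that value instead of creating a duplicate.
    it = vectorized.find(n);
    if (it != vectorized.end())
      return it->second;

    const int id = created++;
    vectorized[n] = id;
    return id;
  }
};
```

With an add that feeds a phi which in turn feeds the add, the second lookup is what keeps the add from being emitted twice: the sketch creates two values, where dropping the check would create three.
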
Jul 22, 2013

Change tabs to spaces. · 6b36db08
Jakub Staszak authored Jul 22, 2013
```
llvm-svn: 186877
```
Fix spelling and grammar · fb183238
Matt Arsenault authored Jul 22, 2013
```
llvm-svn: 186858
```

Fix an obvious typo in the loop vectorizer where the cost model uses the wrong variable. The variable BlockCost is ignored. · 8c45d4b2

Nadav Rotem authored Jul 22, 2013

We don't have tests for the effect of if-conversion loops because it requires a big test (that includes if-converted loops) and it is difficult to find and balance a loop to do the right thing.

llvm-svn: 186845

Delete unused helper functions. · d7ff88a8
Nadav Rotem authored Jul 22, 2013
```
llvm-svn: 186808
```