Commits · 99c46b980f006737ebe7d6965ffd87e8e40b35ea · Roger Ferrer / llvm-epi-0.8

Jun 24, 2013

Revert "LoopVectorize: Use the dependence test utility class" · 58ca945f

Arnold Schwaighofer authored Jun 24, 2013

This reverts commit cbfa1ca993363ca5c4dbf6c913abc957c584cbac.

We are seeing a stage2 and stage3 miscompare on some dragonegg bots.

llvm-svn: 184690

[APFloat] Rename llvm::exponent_t => llvm::APFloat::ExponentType. · 9dc98338

Michael Gottesman authored Jun 24, 2013

exponent_t is only used internally in APFloat and no exponent_t values are
exposed via the APFloat API. Given that, there is no reason to clutter the
llvm namespace with the type. The rename also makes it clearer that
exponent_t is associated with APFloat.

llvm-svn: 184686

LoopVectorize: Use the dependence test utility class · b914a7e2

Arnold Schwaighofer authored Jun 24, 2013

We no longer need alias analysis: the cases that alias analysis would
handle are now handled as accesses with a large dependence distance.

We can now vectorize loops with simple constant dependence distances.

  for (i = 8; i < 256; ++i) {
    a[i] = a[i+4] * a[i+8];
  }

  for (i = 8; i < 256; ++i) {
    a[i] = a[i-4] * a[i-8];
  }
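
To make the two loops concrete, here is a minimal C++ sketch, assuming a single array, constant offsets, and loads that execute before the store in source order. `maxSafeVF` is a hypothetical helper, not LLVM's dependence checker. In the first loop every load reads ahead of the store, so no vectorization factor is ruled out; in the second loop `a[i-4]` and `a[i-8]` are true dependences with distances 4 and 8, so VF may not exceed 4.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

// Hypothetical model of "a[i] = f(a[i + k1], a[i + k2], ...)" where every
// load a[i + k] executes before the store to a[i] in source order.
// Returns the largest VF that preserves scalar semantics without
// reordering accesses; INT_MAX means "no limit from dependences".
int maxSafeVF(const std::vector<int> &loadOffsets) {
  int maxVF = INT_MAX;
  for (int k : loadOffsets) {
    // k > 0: the load reads an element that a later iteration stores
    // (anti-dependence). Vector loads precede vector stores within an
    // iteration, so any VF is safe.
    // k == 0: read then write of the same element in the same iteration,
    // which a vector load followed by a vector store preserves.
    if (k >= 0)
      continue;
    // k < 0: true dependence with distance -k. The value read in
    // iteration i is stored in iteration i + k, which must fall into an
    // earlier vector iteration, so VF may not exceed the distance.
    maxVF = std::min(maxVF, -k);
  }
  assert(maxVF > 0);
  return maxVF;
}
```

For the two loops above, `maxSafeVF({4, 8})` imposes no limit and `maxSafeVF({-4, -8})` returns 4.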

We would be able to vectorize about 200 more loops in the test suite now (in
many cases the cost model instructs us not to). Results on x86-64 are a wash.

I have seen one degradation, in ammp. Interestingly, the function in which we
now vectorize a loop is never executed, so we are probably seeing instruction
cache effects. There is a 2% improvement in h264ref, and a few TSVC loop
kernels speed up.

radar://13681598

llvm-svn: 184685

LoopVectorize: Add utility class for checking dependency among accesses · d5179767

Arnold Schwaighofer authored Jun 24, 2013

This class checks dependences by subtracting two Scalar Evolution access
functions, allowing us to catch very simple linear dependences.

The checker assumes source order in determining whether vectorization is safe.
We currently don't reorder accesses.
Positive true dependences need to be a multiple of VF; otherwise we impede
store-load forwarding.
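
A minimal sketch of the multiple-of-VF rule, assuming distances are measured in elements, candidate VFs are powers of two, and accesses stay in source order. `isAcceptableTrueDependence` and `pickVF` are hypothetical names. The `d % vf == 0` test is the requirement stated above; the `d >= vf` test is an added assumption covering plain correctness.

```cpp
#include <cassert>
#include <vector>

// Hypothetical check for one positive true (read-after-write) dependence
// of distance d under vectorization factor vf:
//   d >= vf     : the stored value comes from an earlier vector iteration
//   d % vf == 0 : each vector load lines up with exactly one earlier
//                 vector store instead of straddling two, which keeps
//                 store-to-load forwarding possible
bool isAcceptableTrueDependence(unsigned d, unsigned vf) {
  assert(vf > 0 && "vectorization factor must be positive");
  return d >= vf && d % vf == 0;
}

// Tries VFs from maxVF downwards (maxVF assumed to be a power of two) and
// returns the first one every distance accepts; 1 means stay scalar.
unsigned pickVF(const std::vector<unsigned> &trueDepDistances,
                unsigned maxVF) {
  for (unsigned vf = maxVF; vf > 1; vf /= 2) {
    bool ok = true;
    for (unsigned d : trueDepDistances)
      ok = ok && isAcceptableTrueDependence(d, vf);
    if (ok)
      return vf;
  }
  return 1;
}
```

For the distances 4 and 8 of the `a[i-4] * a[i-8]` loop, `pickVF({4, 8}, 8)` returns 4, while a single distance of 6 only admits VF 2.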

llvm-svn: 184684

LoopVectorize: Add utility class for building sets of dependent accesses · d5741969

Arnold Schwaighofer authored Jun 24, 2013

Sets of dependent accesses are built by unioning sets based on underlying
objects. This class will be used by the upcoming dependence checker.
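
One way to build such sets is a union-find structure keyed by underlying object, sketched here in C++. The class name, the use of strings to stand in for underlying objects, and the assumption that one access may have several underlying objects (e.g. a select between two base pointers) are illustrative only, not the LLVM class.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: accesses that share any underlying object end up in
// the same set, using union-find over access indices.
class DependentAccessSets {
  std::vector<int> parent;                         // union-find forest
  std::map<std::string, int> firstAccessOfObject;  // object -> some access

  int find(int x) {
    while (parent[x] != x) {
      parent[x] = parent[parent[x]];  // path halving
      x = parent[x];
    }
    return x;
  }

public:
  // Registers an access with its possible underlying objects and returns
  // the access index.
  int addAccess(const std::vector<std::string> &underlyingObjects) {
    int id = static_cast<int>(parent.size());
    parent.push_back(id);
    for (const std::string &obj : underlyingObjects) {
      auto [it, inserted] = firstAccessOfObject.emplace(obj, id);
      if (!inserted)
        parent[find(id)] = find(it->second);  // union the two sets
    }
    return id;
  }

  bool inSameSet(int a, int b) {
    assert(a >= 0 && static_cast<std::size_t>(a) < parent.size());
    assert(b >= 0 && static_cast<std::size_t>(b) < parent.size());
    return find(a) == find(b);
  }
};
```

An access whose pointer may be based on either `A` or `B` joins the sets of both, so accesses to `A` and `B` are then checked against each other.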

llvm-svn: 184683

SLP Vectorizer: Add support for vectorizing parts of the tree. · 210e86d7

Nadav Rotem authored Jun 24, 2013

Until now we detected the vectorizable tree and evaluated the cost of the
entire tree. With this patch we can decide to trim out branches of the tree
that are not profitable to vectorize.

Also, increase the max depth from 6 to 12. In the worst possible case, where all
of the code is made of diamond-shaped graphs, this can bring the cost to 2**10,
but diamonds are not very common.
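
The trimming decision can be pictured as a small cost recursion. This C++ sketch is a hypothetical model: the `SLPNode` fields and the "vector minus scalar" cost convention are assumptions, not the SLP vectorizer's data structures. Each operand subtree is either vectorized or kept scalar and gathered, whichever is cheaper, and anything at or beyond `maxDepth` is always gathered.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical SLP tree node. Costs are "vector minus scalar", so a
// negative value means vectorizing pays off.
struct SLPNode {
  int vectorCost;                 // cost of vectorizing this bundle
  int gatherCost;                 // cost of keeping it scalar and building
                                  // a vector from the scalars for the user
  std::vector<SLPNode> operands;  // operand bundles feeding this one
};

// Cheapest cost of an operand subtree: vectorize it (recursing into its
// operands) or trim it by gathering. At maxDepth we always gather.
int bestSubtreeCost(const SLPNode &node, unsigned depth, unsigned maxDepth) {
  if (depth >= maxDepth)
    return node.gatherCost;
  int vectorized = node.vectorCost;
  for (const SLPNode &op : node.operands)
    vectorized += bestSubtreeCost(op, depth + 1, maxDepth);
  return std::min(vectorized, node.gatherCost);
}

// Cost of vectorizing from the root (the root itself is always vectorized);
// the tree is worth vectorizing if the result is negative.
int treeCost(const SLPNode &root, unsigned maxDepth) {
  assert(maxDepth > 0 && "the root must be within the depth limit");
  int cost = root.vectorCost;
  for (const SLPNode &op : root.operands)
    cost += bestSubtreeCost(op, 1, maxDepth);
  return cost;
}
```

With a root of vector cost -4, a profitable operand (vector cost -2) and an unprofitable one (vector cost 5, gather cost 1), trimming the unprofitable branch gives -5 instead of the untrimmed -1.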

llvm-svn: 184681

Fix tail merging to assign the (more) correct BasicBlock when splitting. · 97a1d7c4

Andrew Trick authored Jun 24, 2013

This makes it possible to write unit tests that are less susceptible
to minor code motion, particularly copy placement. block-placement.ll
covers this case with -pre-RA-sched=source, which will soon be the
default. One incorrectly named block is already fixed, but without
this fix, enabling new coalescing and scheduling would cause more
failures.

llvm-svn: 184680

Jun 23, 2013

SLP Vectorizer: Fix a bug in the code that does CSE on the generated gather sequences. · 0323925d

Nadav Rotem authored Jun 23, 2013

Make sure that we don't replace and RAUW two sequences if one does not dominate the other.

llvm-svn: 184674

SLP Vectorizer: Erase instructions outside the vectorizeTree method. · 78428401

Nadav Rotem authored Jun 23, 2013

The RAII builder location guard is saving a reference to instructions, so we can't erase instructions during vectorization.

llvm-svn: 184671

DebugInfo: PR14404: Avoid truncating 64 bit values into 32 bits for ULEB128/SLEB128 generation · 5acff7e6

David Blaikie authored Jun 23, 2013

llvm-svn: 184669

Add MI-Sched support for x86 macro fusion. · 47740deb

Andrew Trick authored Jun 23, 2013

This is an awful implementation of the target hook. But we don't have
abstractions yet for common machine ops, and I don't see any quick way
to make it table-driven.

llvm-svn: 184664

SLP Vectorizer: Implement a simple CSE optimization for the gather sequences. · eb65e67e

Nadav Rotem authored Jun 23, 2013

llvm-svn: 184660

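
The PR14404 entry concerns keeping 64-bit values intact when emitting LEB128. As a standalone sketch (not LLVM's code; the function names are hypothetical), encoders that carry `uint64_t`/`int64_t` all the way through cannot truncate large values:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Unsigned LEB128: 7 bits per byte, low bits first, high bit set on every
// byte except the last.
std::vector<uint8_t> encodeULEB128(uint64_t value) {
  std::vector<uint8_t> out;
  do {
    uint8_t byte = static_cast<uint8_t>(value & 0x7f);
    value >>= 7;
    if (value != 0)
      byte |= 0x80;  // more bytes follow
    out.push_back(byte);
  } while (value != 0);
  return out;
}

// Signed LEB128: stop once the remaining bits are all copies of the sign
// bit (bit 6) of the last emitted byte. Right-shifting a negative value is
// arithmetic in practice and guaranteed to be so since C++20.
std::vector<uint8_t> encodeSLEB128(int64_t value) {
  std::vector<uint8_t> out;
  bool more = true;
  while (more) {
    uint8_t byte = static_cast<uint8_t>(value & 0x7f);
    value >>= 7;
    if ((value == 0 && !(byte & 0x40)) || (value == -1 && (byte & 0x40)))
      more = false;
    else
      byte |= 0x80;
    out.push_back(byte);
  }
  assert(!out.empty());
  return out;
}
```

For example, `encodeULEB128(1ULL << 32)` produces five bytes; squeezing the value through a 32-bit intermediate first would have turned it into 0 and a single byte.
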
Jun 22, 2013

SLP Vectorizer: Implement multi-block slp-vectorization. · 80de0a28

Nadav Rotem authored Jun 22, 2013

Rewrote the SLP-vectorization as a whole-function vectorization pass. It is now able to vectorize chains across multiple basic blocks.
It still does not vectorize PHIs, but this should be easy to do now that we scan the entire function.
I removed the support for extracting values from trees.
We are now able to vectorize more programs, but there are some serious regressions in many workloads (such as flops-6 and mandel-2).

llvm-svn: 184647

DebugInfo: Support (using GNU extensions) for template template parameters and parameter packs · 2b380232

David Blaikie authored Jun 22, 2013

llvm-svn: 184643

The getRegForInlineAsmConstraint function should only accept MVT value types. · 295bd43a

Chad Rosier authored Jun 22, 2013

llvm-svn: 184642

Revert "FunctionAttrs: Merge attributes once instead of doing it for every argument." · 40d7f354

Benjamin Kramer authored Jun 22, 2013

It doesn't work as I intended it to. This reverts commit r184638.

llvm-svn: 184641

FunctionAttrs: Merge attributes once instead of doing it for every argument. · 76b7bd0e

Benjamin Kramer authored Jun 22, 2013

It has become an expensive operation. No functionality change.

llvm-svn: 184638

[yaml2obj][ELF] Make symbol table top-level key. · 82177573

Sean Silva authored Jun 22, 2013

Although in reality the symbol table in ELF resides in a section, the
standard requires that there be no more than one SHT_SYMTAB. To enforce
this constraint, it is cleaner to group all the symbols under a
top-level `Symbols` key on the object file.
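
As a rough illustration of the constraint, this C++ check rejects an object with more than one SHT_SYMTAB section. `SHT_SYMTAB` is 2 in the ELF specification; the `SectionHeader` struct and `hasAtMostOneSymtab` are hypothetical stand-ins, not yaml2obj code. A single top-level `Symbols` key is meant to make this invalid case unrepresentable in the YAML in the first place.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// sh_type value for a symbol table section, from the ELF specification.
constexpr uint32_t SHT_SYMTAB = 2;

// Hypothetical stand-in for an ELF section header; only sh_type matters.
struct SectionHeader {
  uint32_t sh_type;
};

// True if the object contains no more than one SHT_SYMTAB section.
bool hasAtMostOneSymtab(const std::vector<SectionHeader> &sections) {
  unsigned count = 0;
  for (const SectionHeader &s : sections)
    if (s.sh_type == SHT_SYMTAB)
      ++count;
  return count <= 1;
}
```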

llvm-svn: 184627

Prevent LiveRangeEdit from deleting bundled instructions. · cbd7305d

Andrew Trick authored Jun 22, 2013

We have no targets on trunk that bundle before regalloc. However, we
have been advertising regalloc as bundle safe for use with out-of-tree
targets. We need to at least contain the parts of the code that are
still unsafe.

llvm-svn: 184620

DebugInfo: Don't lose unreferenced non-trivial by-value parameters · 97c6c5bd

David Blaikie authored Jun 21, 2013

A FastISel optimization was causing us to emit no information for such
parameters, and when they go missing we end up emitting a different
function type. By avoiding that shortcut we not only get types correct
(very important) but also location information (handy), even if it's
only live at the start of a function and may be clobbered later.

Reviewed/discussion by Evan Cheng & Dan Gohman.

llvm-svn: 184604

Jun 21, 2013

[objc-arc-opts] Make IsTrackingImpreciseReleases a const method. · 9799cf7f

Michael Gottesman authored Jun 21, 2013

Thanks to Bill Wendling for pointing this out!

llvm-svn: 184593

Improve the time it takes to generate DWARF for assembly source files · 0fd064c1

Kevin Enderby authored Jun 21, 2013

that have been run through the 'C' pre-processor.

The implementation of SrcMgr.FindLineNumber() is slow, but acceptable when
it can use its cache, i.e. when it is called repeatedly with an SMLoc that
is forward of the previous call.

When generating DWARF for assembly source files that have been run
through the 'C' pre-processor, we need to calculate the logical line
number based on the last parsed cpp hash file line comment. The current
code calls SrcMgr.FindLineNumber() twice to do this, which defeats its
cache and results in very slow compile times:

% time /Volumes/SandBox/build-llvm/Debug+Asserts/bin/llvm-mc -triple thumbv7-apple-ios -filetype=obj -o /tmp/x.o mscorlib.dll.E -g
672.542u 0.299s 11:13.15 99.9%	0+0k 0+2io 2106pf+0w

So we save the info from the last parsed cpp hash file line comment
to avoid making the second call to SrcMgr.FindLineNumber() in most cases,
and end up with compile times like:

% time /Volumes/SandBox/build-llvm/Debug+Asserts/bin/llvm-mc -triple thumbv7-apple-ios -filetype=obj -o /tmp/x.o mscorlib.dll.E -g
3.404u 0.104s 0:03.80 92.1%	0+0k 0+3io 2105pf+0w
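
The caching behaviour described above can be sketched as follows. This is a hypothetical, simplified model (`ForwardLineCache` is not SourceMgr's implementation): the cache only helps when queries move forward, so a lookup at an earlier location rescans from the start of the buffer. The commit sidesteps that backward lookup in most cases by saving the line information from the last parsed cpp hash comment.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical line-number lookup that caches the last (offset, line)
// pair and can only resume scanning forward from it.
class ForwardLineCache {
  const std::string &buffer;  // must outlive the cache
  std::size_t lastOffset = 0;
  unsigned lastLine = 1;

public:
  std::size_t charsScanned = 0;  // instrumentation for the example

  explicit ForwardLineCache(const std::string &buf) : buffer(buf) {}

  unsigned findLineNumber(std::size_t offset) {
    assert(offset <= buffer.size());
    if (offset < lastOffset) {  // moving backwards: the cache is useless
      lastOffset = 0;
      lastLine = 1;
    }
    for (std::size_t i = lastOffset; i < offset; ++i, ++charsScanned)
      if (buffer[i] == '\n')
        ++lastLine;
    lastOffset = offset;
    return lastLine;
  }
};
```

Alternating between a forward location and an earlier one makes every call rescan from the beginning, which is how two calls per location can turn a linear pass into a quadratic one.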

rdar://14156934

llvm-svn: 184592

Revert "BlockFrequency: Saturate at 1 instead of 0 when multiplying a frequency with a branch probability." · bfb84d0b

Benjamin Kramer authored Jun 21, 2013

This reverts commit r184584. Breaks PPC selfhost.

llvm-svn: 184590

[objc-arc-opts] Now that PtrState.RRI is encapsulated in PtrState, make PtrState.RRI private and delete the TODO. · e3943d05

Michael Gottesman authored Jun 21, 2013

llvm-svn: 184587

[objc-arc-opts] Encapsulated PtrState.RRI.{Calls,ReverseInsertPts} into several methods on PtrState. · 4f6ef117

Michael Gottesman authored Jun 21, 2013

llvm-svn: 184586

BlockFrequency: Saturate at 1 instead of 0 when multiplying a frequency with a branch probability. · bd0f1079

Benjamin Kramer authored Jun 21, 2013

Zero is used by BlockFrequencyInfo as a special "don't know" value. It also
acts as a sink for frequencies, because further multiplications can never
move a frequency off zero.

This recovers a 10% regression on MultiSource/Benchmarks/7zip. A zero frequency
was propagated into an inner loop causing excessive spilling.
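
A hypothetical C++ sketch of the saturating multiply. The change was reverted in r184590 (see the revert entry above), so this illustrates only the idea: the branch probability is represented as `n/d`, and a nonzero product that would round down to 0 is clamped to 1 so it cannot fall into the "don't know" value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: scale a block frequency by the probability n/d, clamping a
// nonzero result that rounds down to 0 up to 1. Assumes freq * n fits in
// 64 bits; a real implementation would need wider intermediate arithmetic.
uint64_t scaleFrequency(uint64_t freq, uint32_t n, uint32_t d) {
  assert(d != 0 && n <= d && "probability must be in [0, 1]");
  uint64_t scaled = freq * n / d;
  if (scaled == 0 && freq != 0 && n != 0)
    return 1;  // saturate at 1 instead of collapsing to 0
  return scaled;
}
```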

PR16402.

llvm-svn: 184584

[objcarcopts] Encapsulated PtrState.RRI.IsTrackingImpreciseRelease() => PtrState.IsTrackingImpreciseRelease(). · f0401181

Michael Gottesman authored Jun 21, 2013

llvm-svn: 184583

[objcarcopts] Encapsulate PtrState.RRI.CFGHazardAfflicted via methods PtrState.{IsCFGHazardAfflicted,SetCFGHazardAfflicted}. · 2f294597

Michael Gottesman authored Jun 21, 2013

llvm-svn: 184582

[NVPTX] Add support for selecting CUDA vs OCL mode based on triple · b6e6cd35

Justin Holewinski authored Jun 21, 2013

IR for CUDA should use "nvptx[64]-nvidia-cuda", and IR for NV OpenCL should use "nvptx[64]-nvidia-nvcl"

llvm-svn: 184579
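
A minimal sketch of choosing the mode from the OS component of the triple, assuming the plain `arch-vendor-os` layout of the two examples. `NVPTXMode` and `modeFromTriple` are hypothetical; LLVM itself parses triples with its `Triple` class rather than ad hoc string splitting.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class NVPTXMode { CUDA, NVCL, Unknown };

// Hypothetical: read the third dash-separated component of the triple,
// e.g. "cuda" in "nvptx64-nvidia-cuda", and map it to a mode.
NVPTXMode modeFromTriple(const std::string &triple) {
  std::size_t first = triple.find('-');
  if (first == std::string::npos)
    return NVPTXMode::Unknown;
  std::size_t second = triple.find('-', first + 1);
  if (second == std::string::npos)
    return NVPTXMode::Unknown;
  std::string os = triple.substr(second + 1);
  if (os == "cuda")
    return NVPTXMode::CUDA;
  if (os == "nvcl")
    return NVPTXMode::NVCL;
  return NVPTXMode::Unknown;
}
```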

Fix PR16360 · 62ebfd87

Michael Liao authored Jun 21, 2013

When (srl (anyextend x), c) is folded into (anyextend (srl x, c)), the
high bits are not cleared. Add an 'and' to clear them.
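
A bit-level model of the problem, assuming an `i8` value extended to `i32` and a shift amount 0 < c < 8. The function names are hypothetical, the unspecified bits produced by ANY_EXTEND are modelled by a `garbage` argument, and the mask `UINT32_MAX >> c` is our assumption about what the added 'and' must clear: the top `c` bits that the wide shift guaranteed to be zero.

```cpp
#include <cassert>
#include <cstdint>

// ANY_EXTEND from i8 to i32: bits 8..31 are unspecified; `garbage`
// models whatever they happen to hold.
uint32_t anyExt(uint8_t x, uint32_t garbage) {
  return (garbage & 0xFFFFFF00u) | x;
}

// (srl (anyextend x), c): the 32-bit logical shift makes the top c bits
// of the result zero.
uint32_t srlOfAnyExt(uint8_t x, uint32_t garbage, unsigned c) {
  assert(c > 0 && c < 8);
  return anyExt(x, garbage) >> c;
}

// (anyextend (srl x, c)) alone: the top bits now come from the extension
// and are unspecified, so the guaranteed zeros are lost.
uint32_t anyExtOfSrl(uint8_t x, uint32_t garbage, unsigned c) {
  assert(c > 0 && c < 8);
  return anyExt(static_cast<uint8_t>(x >> c), garbage);
}

// (and (anyextend (srl x, c)), UINT32_MAX >> c) restores the zero top bits.
uint32_t anyExtOfSrlMasked(uint8_t x, uint32_t garbage, unsigned c) {
  return anyExtOfSrl(x, garbage, c) & (UINT32_MAX >> c);
}
```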

llvm-svn: 184575

Update physreg live intervals during remat. · 5749b8be

Andrew Trick authored Jun 21, 2013

llvm-svn: 184574

Added -precompute-phys-liveness for testing LiveIntervals updates. · 8d02e917

Andrew Trick authored Jun 21, 2013

llvm-svn: 184573

Handle more cases in LiveRangeEdit::eliminateDeadDefs. · 6b9c49a2

Andrew Trick authored Jun 21, 2013

Live intervals for dead physregs may be created during coalescing. We
need to update these in the event that their instruction goes away.

crash.ll is the unit test that catches it when MI sched is enabled on
X86.

llvm-svn: 184572

Refactor LiveRangeEdit::eliminateDeadDefs. · 530fc1f4

Andrew Trick authored Jun 21, 2013

I want to add logic to handle more cases.

llvm-svn: 184571

whitespace · 7df3f017

Andrew Trick authored Jun 21, 2013

llvm-svn: 184570

Fix a -join-globalcopies bug; handle undef operands. · 714aec02

Andrew Trick authored Jun 21, 2013

llvm-svn: 184569

Modify the -join-globalcopies option (off by default). · 75961ecc

Andrew Trick authored Jun 21, 2013

Always coalesce in forward order to propagate rematerialization.
I'm fixing this option so I can enable it by default soon.

llvm-svn: 184568

Make rematerialization in the coalescer less sensitive to LRG order. · 3a851a27

Andrew Trick authored Jun 21, 2013

llvm-svn: 184567

Fix IMULX machine model. Multiple def operands require multiple SchedWrites. · 7201f4f7

Andrew Trick authored Jun 21, 2013

llvm-svn: 184566

MI-Sched: cleanup DEBUG output. · b55db58e

Andrew Trick authored Jun 21, 2013

llvm-svn: 184565