Commits · f989929cf0c0f0c59fcdaaefc9ba2ebf552a5898 · Roger Ferrer / llvm-epi-0.8

Jun 24, 2013

[APFloat] Added support for parsing float strings which contain {inf,-inf,NaN,-NaN}. · 40e8a187
Michael Gottesman authored Jun 24, 2013
```
llvm-svn: 184713
```
40e8a187
[APFloat] Added make{Zero,Inf} methods and implemented get{Zero,Inf} on top of them. · c4facdf3
Michael Gottesman authored Jun 24, 2013
```
llvm-svn: 184712
```
c4facdf3

[APFloat] Removed an assert from significandParts() which says that one can... · f0e8cd1a

Michael Gottesman authored Jun 24, 2013

[APFloat] Removed an assert from significandParts() which says that one can only access the significand of FiniteNonZero/NaN floats.

The method significandParts() is a helper method meant to ease access to
APFloat's significand by allowing the user to not need to be aware of whether or
not the APFloat is using memory allocated in the instance itself or in an
external array.

This assert says that one can only access the significand of FiniteNonZero/NaN
floats. This makes it cumbersome and more importantly dangerous when one wishes
to zero out the significand of a zero/infinity value since one will have to deal
with the aforementioned quandary related to how the memory in APFloat is
allocated.

llvm-svn: 184711

f0e8cd1a

[APFloat] Rename macro convolve => PackCategoriesIntoKey so that it is clear... · 9b877e18

Michael Gottesman authored Jun 24, 2013

[APFloat] Rename macro convolve => PackCategoriesIntoKey so that it is clear what APFloat is actually using said macro for.

In the context of APFloat, seeing a macro called convolve suggests that APFloat
is using said value in some sort of convolution somewhere in the source code.
This is misleading.

I also added a documentation comment to the macro.

llvm-svn: 184710

9b877e18

ARM: check predicate bits for thumb instructions · 8449c0d5

Amaury de la Vieuville authored Jun 24, 2013

When encoded as Thumb, VFP instructions and VMOV/VDUP between scalar and
core registers must have their predicate bits set to 0b1110.

llvm-svn: 184707

8449c0d5

ARM: rGPR is meant to be unpredictable, not undefined · 8175bda3
Amaury de la Vieuville authored Jun 24, 2013
```
llvm-svn: 184706
```
8175bda3

Temporarily enable MI-Sched on X86. · 5a1e0af8

Andrew Trick authored Jun 24, 2013

Sorry for the unit test churn. I'll try to make the change permanent
next time.

llvm-svn: 184705

5a1e0af8

ARM: fix thumb1 nop decoding · f2f00b4e

Amaury de la Vieuville authored Jun 24, 2013

In thumb1, NOP is a pseudo-instruction equivalent to mov r8, r8.
However the disassembler should not use this alias.

llvm-svn: 184703

f2f00b4e

ARM: fix IT decoding · 2f0ac8d9
Amaury de la Vieuville authored Jun 24, 2013
```
mask == 0 -> UNPRED

llvm-svn: 184702
```
2f0ac8d9
ARM: enable decoding of pc-relative PLD/PLI · 4b6c076d
Amaury de la Vieuville authored Jun 24, 2013
```
llvm-svn: 184701
```
4b6c076d

Add a flag to defer vectorization into a phase after the inliner and its · 08e1b874

Chandler Carruth authored Jun 24, 2013

CGSCC pass manager. This should insulate the inlining decisions from the
vectorization decisions, however it may have both compile time and code
size problems so it is just an experimental option right now.

Adding this based on a discussion with Arnold. It seems at least worth
having this flag so that we can both run some experiments to see if
this strategy is workable. It may solve some of the regressions seen
with the loop vectorizer.

llvm-svn: 184698

08e1b874

Revert "LoopVectorize: Use the dependence test utility class" · 58ca945f

Arnold Schwaighofer authored Jun 24, 2013

This reverts commit cbfa1ca993363ca5c4dbf6c913abc957c584cbac.

We are seeing a stage2 and stage3 miscompare on some dragonegg bots.

llvm-svn: 184690

58ca945f

[APFloat] Rename llvm::exponent_t => llvm::APFloat::ExponentType. · 9dc98338

Michael Gottesman authored Jun 24, 2013

exponent_t is only used internally in APFloat and no exponent_t values are
exposed via the APFloat API. In light of such conditions it does not make any
sense to gum up the llvm namespace with said type. Plus it makes it clearer that
exponent_t is associated with APFloat.

llvm-svn: 184686

9dc98338

LoopVectorize: Use the dependence test utility class · b914a7e2

Arnold Schwaighofer authored Jun 24, 2013

We now no longer need alias analysis - the cases that alias analysis would
handle are now handled as accesses with a large dependence distance.

We can now vectorize loops with simple constant dependence distances.

  for (i = 8; i < 256; ++i) {
    a[i] = a[i+4] * a[i+8];
  }

  for (i = 8; i < 256; ++i) {
    a[i] = a[i-4] * a[i-8];
  }

We would be able to vectorize about 200 more loops (in many cases the cost model
instructs us not to) in the test suite now. Results on x86-64 are a wash.

I have seen one degradation in ammp. Interestingly, the function in which we
now vectorize a loop is never executed so we probably see some instruction
cache effects. There is a 2% improvement in h264ref. One or two TSVC loop
kernels speed up.

radar://13681598

llvm-svn: 184685

b914a7e2

LoopVectorize: Add utility class for checking dependency among accesses · d5179767

Arnold Schwaighofer authored Jun 24, 2013

This class checks dependences by subtracting two Scalar Evolution access
functions allowing us to catch very simple linear dependences.

The checker assumes source order in determining whether vectorization is safe.
We currently don't reorder accesses.
Positive true dependencies need to be a multiple of VF otherwise we impede
store-load forwarding.

llvm-svn: 184684

d5179767
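
The distance reasoning in the two LoopVectorize entries above can be illustrated with a minimal sketch. It is not the checker added by the commit: it assumes a loop body of the form `a[i + storeOffset] = f(a[i + loadOffset])` with unit stride, one element type, and the load preceding the store in source order. All names are hypothetical, and offsets and distances are counted in elements.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helpers for illustration only; none of these names are LLVM's.
// Sentinel meaning "the dependence places no limit on the vector width".
constexpr std::int64_t kUnbounded = std::numeric_limits<std::int64_t>::max();

// Subtract the two linear access functions (i + storeOffset) and
// (i + loadOffset); the induction variable cancels, leaving a constant.
std::int64_t dependenceDistance(std::int64_t storeOffset,
                                std::int64_t loadOffset) {
    return storeOffset - loadOffset;
}

// A positive distance d means a value stored in iteration i is loaded again
// in iteration i + d, so at most d iterations may run as one vector.
// A zero or negative distance does not constrain the width in this model,
// because every load of a vector iteration is issued before its stores.
std::int64_t maxSafeVF(std::int64_t distance) {
    return distance > 0 ? distance : kUnbounded;
}

bool isSafeToVectorize(std::int64_t distance, std::int64_t vf) {
    return vf <= maxSafeVF(distance);
}

// The commit message notes that positive true dependences should be a
// multiple of VF, otherwise store-load forwarding is impeded.
// Assumes vf >= 1.
bool mayImpedeStoreLoadForwarding(std::int64_t distance, std::int64_t vf) {
    return distance > 0 && distance % vf != 0;
}
```

Under these assumptions, `a[i] = a[i-4] * a[i-8]` has distances 4 and 8, so the smaller one limits the width to 4, while `a[i] = a[i+4] * a[i+8]` has negative distances and is not limited by them.
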

LoopVectorize: Add utility class for building sets of dependent accesses · d5741969

Arnold Schwaighofer authored Jun 24, 2013

Sets of dependent accesses are built by unioning sets based on underlying
objects. This class will be used by the upcoming dependence checker.

llvm-svn: 184683

d5741969
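
As a rough illustration of "unioning sets based on underlying objects", here is a small union-find sketch. It is an assumption-laden stand-in, not the class from the commit: accesses are identified by index, underlying objects by made-up string names, and `DisjointSets` and `groupByUnderlyingObject` are hypothetical.

```cpp
#include <cstddef>
#include <numeric>
#include <string>
#include <unordered_map>
#include <vector>

// Minimal union-find over access indices (hypothetical, for illustration).
class DisjointSets {
public:
    explicit DisjointSets(std::size_t n) : parent_(n) {
        // Every access starts in its own singleton set.
        std::iota(parent_.begin(), parent_.end(), std::size_t{0});
    }

    // Returns the representative of x's set, shortening paths on the way.
    std::size_t find(std::size_t x) {
        while (parent_[x] != x) {
            parent_[x] = parent_[parent_[x]];  // path halving
            x = parent_[x];
        }
        return x;
    }

    void unite(std::size_t a, std::size_t b) { parent_[find(a)] = find(b); }

private:
    std::vector<std::size_t> parent_;
};

// accesses[i] lists the names of the objects access i may be based on.
// Two accesses end up in the same set if they share an underlying object,
// directly or through a chain of other accesses.
DisjointSets groupByUnderlyingObject(
    const std::vector<std::vector<std::string>>& accesses) {
    DisjointSets sets(accesses.size());
    std::unordered_map<std::string, std::size_t> firstUser;
    for (std::size_t i = 0; i < accesses.size(); ++i) {
        for (const std::string& object : accesses[i]) {
            auto result = firstUser.emplace(object, i);
            if (!result.second) {
                // Object seen before: merge with the access that used it first.
                sets.unite(i, result.first->second);
            }
        }
    }
    return sets;
}
```

Only accesses that land in the same set would then need a pairwise dependence check; accesses in different sets are based on disjoint objects in this model.
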

SLP Vectorizer: Add support for vectorizing parts of the tree. · 210e86d7

Nadav Rotem authored Jun 24, 2013

Until now we detected the vectorizable tree and evaluated the cost of the
entire tree.  With this patch we can decide to trim out branches of the tree
that are not profitable to vectorize.

Also, increase the max depth from 6 to 12. In the worst possible case, where all
of the code is made of diamond-shaped graphs, this can bring the cost to 2**10,
but diamonds are not very common.

llvm-svn: 184681

210e86d7

Fix tail merging to assign the (more) correct BasicBlock when splitting. · 97a1d7c4

Andrew Trick authored Jun 24, 2013

This makes it possible to write unit tests that are less susceptible
to minor code motion, particularly copy placement. block-placement.ll
covers this case with -pre-RA-sched=source which will soon be
default. One incorrectly named block is already fixed, but without
this fix, enabling new coalescing and scheduling would cause more
failures.

llvm-svn: 184680

97a1d7c4

Jun 23, 2013
- SLP Vectorizer: Fix a bug in the code that does CSE on the generated gather sequences. · 0323925d
  Nadav Rotem authored Jun 23, 2013
```
Make sure that we don't replace and RAUW two sequences if one does not dominate the other.

llvm-svn: 184674
```
  0323925d
- SLP Vectorizer: Erase instructions outside the vectorizeTree method. · 78428401
  Nadav Rotem authored Jun 23, 2013
```
The RAII builder location guard is saving a reference to instructions, so we can't erase instructions during vectorization.

llvm-svn: 184671
```
  78428401
- DebugInfo: PR14404: Avoid truncating 64 bit values into 32 bits for ULEB128/SLEB128 generation · 5acff7e6
  David Blaikie authored Jun 23, 2013
```
llvm-svn: 184669
```
  5acff7e6
- Add MI-Sched support for x86 macro fusion. · 47740deb
  Andrew Trick authored Jun 23, 2013
```
This is an awful implementation of the target hook. But we don't have
abstractions yet for common machine ops, and I don't see any quick way
to make it table-driven.

llvm-svn: 184664
```
  47740deb
- SLP Vectorizer: Implement a simple CSE optimization for the gather sequences. · eb65e67e
  Nadav Rotem authored Jun 23, 2013
```
llvm-svn: 184660
```
  eb65e67e
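
The 5acff7e6 entry above concerns 64-bit values being truncated to 32 bits before ULEB128/SLEB128 generation. As a minimal sketch of why the input width matters, the standalone encoders below take full 64-bit values. They are hypothetical helpers written for this illustration, not LLVM's own routines.

```cpp
#include <cstdint>
#include <vector>

// Unsigned LEB128: emit 7 bits per byte, least significant group first,
// setting the high bit on every byte except the last.
std::vector<std::uint8_t> encodeULEB128(std::uint64_t value) {
    std::vector<std::uint8_t> out;
    do {
        std::uint8_t byte = static_cast<std::uint8_t>(value & 0x7f);
        value >>= 7;
        if (value != 0) {
            byte |= 0x80;  // more bytes follow
        }
        out.push_back(byte);
    } while (value != 0);
    return out;
}

// Signed LEB128: stop once the remaining value is pure sign extension of
// the last emitted group. Relies on >> being an arithmetic shift for
// negative values, which is guaranteed from C++20 and the usual behaviour
// of mainstream compilers before that.
std::vector<std::uint8_t> encodeSLEB128(std::int64_t value) {
    std::vector<std::uint8_t> out;
    bool more = true;
    while (more) {
        std::uint8_t byte = static_cast<std::uint8_t>(value & 0x7f);
        value >>= 7;
        const bool signBit = (byte & 0x40) != 0;
        if ((value == 0 && !signBit) || (value == -1 && signBit)) {
            more = false;
        } else {
            byte |= 0x80;
        }
        out.push_back(byte);
    }
    return out;
}
```

With a 64-bit parameter, `encodeULEB128(1ULL << 32)` yields five bytes ending in `0x10`. If the same value were first narrowed to 32 bits, it would become 0 and encode as the single byte `0x00`.
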
Jun 22, 2013

SLP Vectorizer: Implement multi-block slp-vectorization. · 80de0a28

Nadav Rotem authored Jun 22, 2013

Rewrote the SLP-vectorization as a whole-function vectorization pass. It is now able to vectorize chains across multiple basic blocks.
It still does not vectorize PHIs, but this should be easy to do now that we scan the entire function.
I removed the support for extracting values from trees.
We are now able to vectorize more programs, but there are some serious regressions in many workloads (such as flops-6 and mandel-2).

llvm-svn: 184647

80de0a28

DebugInfo: Support (using GNU extensions) for template template parameters and parameter packs · 2b380232
David Blaikie authored Jun 22, 2013
```
llvm-svn: 184643
```
2b380232
The getRegForInlineAsmConstraint function should only accept MVT value types. · 295bd43a
Chad Rosier authored Jun 22, 2013
```
llvm-svn: 184642
```
295bd43a
Revert "FunctionAttrs: Merge attributes once instead of doing it for every argument." · 40d7f354
Benjamin Kramer authored Jun 22, 2013
```
It doesn't work as I intended it to.  This reverts commit r184638.

llvm-svn: 184641
```
40d7f354
FunctionAttrs: Merge attributes once instead of doing it for every argument. · 76b7bd0e
Benjamin Kramer authored Jun 22, 2013
```
It has become an expensive operation. No functionality change.

llvm-svn: 184638
```
76b7bd0e

[yaml2obj][ELF] Make symbol table top-level key. · 82177573

Sean Silva authored Jun 22, 2013

Although in reality the symbol table in ELF resides in a section, the
standard requires that there be no more than one SHT_SYMTAB. To enforce
this constraint, it is cleaner to group all the symbols under a
top-level `Symbols` key on the object file.

llvm-svn: 184627

82177573

Prevent LiveRangeEdit from deleting bundled instructions. · cbd7305d

Andrew Trick authored Jun 22, 2013

We have no targets on trunk that bundle before regalloc. However, we
have been advertising regalloc as bundle safe for use with out-of-tree
targets. We need to at least contain the parts of the code that are
still unsafe.

llvm-svn: 184620

cbd7305d

DebugInfo: Don't lose unreferenced non-trivial by-value parameters · 97c6c5bd

David Blaikie authored Jun 21, 2013

A FastISel optimization was causing us to emit no information for such
parameters & when they go missing we end up emitting a different
function type. By avoiding that shortcut we not only get types correct
(very important) but also location information (handy) - even if it's
only live at the start of a function & may be clobbered later.

Reviewed/discussion by Evan Cheng & Dan Gohman.

llvm-svn: 184604

97c6c5bd

Jun 21, 2013

[objc-arc-opts] Make IsTrackingImpreciseReleases a const method. · 9799cf7f
Michael Gottesman authored Jun 21, 2013
```
Thanks to Bill Wendling for pointing this out!

llvm-svn: 184593
```
9799cf7f

Improve the time it takes to generate dwarf for assembly source files · 0fd064c1

Kevin Enderby authored Jun 21, 2013

that have been run through the 'C' pre-processor.

The implementation of SrcMgr.FindLineNumber() is slow but OK if
it uses its cache when called multiple times with an SMLoc that is
forward of the previous call.

In the case of generating dwarf for assembly source files that have
been run through the 'C' pre-processor we need to calculate the
logical line number based on the last parsed cpp hash file line
comment.  The current code calls SrcMgr.FindLineNumber() twice to do
this, which defeats its cache and results in very slow compile
times:

% time /Volumes/SandBox/build-llvm/Debug+Asserts/bin/llvm-mc -triple thumbv7-apple-ios -filetype=obj -o /tmp/x.o mscorlib.dll.E -g
672.542u 0.299s 11:13.15 99.9%	0+0k 0+2io 2106pf+0w

So we save the info from the last parsed cpp hash file line comment
to avoid making the second call to SrcMgr.FindLineNumber() most times
and end up with compile times like:

% time /Volumes/SandBox/build-llvm/Debug+Asserts/bin/llvm-mc -triple thumbv7-apple-ios -filetype=obj -o /tmp/x.o mscorlib.dll.E -g
3.404u 0.104s 0:03.80 92.1%	0+0k 0+3io 2105pf+0w

rdar://14156934

llvm-svn: 184592

0fd064c1
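
The cache behaviour described in the 0fd064c1 entry above can be modelled with a toy line counter. This is a sketch under stated assumptions, not SrcMgr's implementation: `ForwardLineCounter` is a hypothetical class that remembers only the last queried offset, so forward queries are cheap and any backward query forces a rescan from the start.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical toy model of a forward-only line-number cache.
class ForwardLineCounter {
public:
    explicit ForwardLineCounter(std::string text) : text_(std::move(text)) {}

    // Returns the 1-based line number of the byte at offset pos.
    unsigned lineAt(std::size_t pos) {
        if (pos > text_.size()) {
            pos = text_.size();
        }
        if (pos < lastPos_) {
            // Backward query: the cached position is of no use, start over.
            lastPos_ = 0;
            lastLine_ = 1;
        }
        for (std::size_t i = lastPos_; i < pos; ++i) {
            if (text_[i] == '\n') {
                ++lastLine_;
            }
            ++scanned_;
        }
        lastPos_ = pos;
        return lastLine_;
    }

    // Total characters examined so far; a rough measure of the work done.
    std::size_t charactersScanned() const { return scanned_; }

private:
    std::string text_;
    std::size_t lastPos_ = 0;
    unsigned lastLine_ = 1;
    std::size_t scanned_ = 0;
};
```

In this model, alternating between the current location and an earlier one (such as the last cpp hash line comment) rescans from the start of the buffer on every backward query. Saving the earlier result, as the commit message describes, leaves only forward queries.
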

Revert "BlockFrequency: Saturate at 1 instead of 0 when multiplying a... · bfb84d0b

Benjamin Kramer authored Jun 21, 2013

Revert "BlockFrequency: Saturate at 1 instead of 0 when multiplying a frequency with a branch probability."

This reverts commit r184584. Breaks PPC selfhost.

llvm-svn: 184590

bfb84d0b

[objc-arc-opts] Now that PtrState.RRI is encapsulated in PtrState, make... · e3943d05

Michael Gottesman authored Jun 21, 2013

[objc-arc-opts] Now that PtrState.RRI is encapsulated in PtrState, make PtrState.RRI private and delete the TODO.

llvm-svn: 184587

e3943d05

[objc-arc-opts] Encapsulated PtrState.RRI.{Calls,ReverseInsertPts} into... · 4f6ef117
Michael Gottesman authored Jun 21, 2013
```
[objc-arc-opts] Encapsulated PtrState.RRI.{Calls,ReverseInsertPts} into several methods on PtrState.

llvm-svn: 184586
```
4f6ef117

BlockFrequency: Saturate at 1 instead of 0 when multiplying a frequency with a branch probability. · bd0f1079

Benjamin Kramer authored Jun 21, 2013

Zero is used by BlockFrequencyInfo as a special "don't know" value. It also
acts as a sink for frequencies, as you can't ever get off a zero frequency with
more multiplies.

This recovers a 10% regression on MultiSource/Benchmarks/7zip. A zero frequency
was propagated into an inner loop causing excessive spilling.

PR16402.

llvm-svn: 184584

bd0f1079
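
A minimal sketch of the saturation idea in the bd0f1079 entry above follows (note that the bfb84d0b entry reverts that commit). It is not the BlockFrequency code: `scaleFrequency` is a hypothetical function, the probability is passed as a plain numerator/denominator pair, and clamping every zero result to 1, including for a zero input, is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: multiply a frequency by the probability num/den,
// rounding down, but never return 0.
std::uint64_t scaleFrequency(std::uint64_t freq, std::uint32_t num,
                             std::uint32_t den) {
    assert(den != 0 && num <= den && "probability must lie in [0, 1]");

    // Split freq as q * den + r so that no intermediate product overflows:
    // q * num <= freq, and r * num < den * num fits in 64 bits because both
    // factors are 32-bit values.
    const std::uint64_t q = freq / den;
    const std::uint64_t r = freq % den;
    const std::uint64_t scaled = q * num + (r * num) / den;

    // Saturate at 1 instead of 0, so that later multiplications can still
    // move the frequency away from the bottom value.
    return scaled == 0 ? 1 : scaled;
}
```

For example, scaling a frequency of 3 by 1/4 rounds down to 0 with plain integer arithmetic; this sketch returns 1 instead.
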

[objcarcopts] Encapsulated PtrState.RRI.IsTrackingImpreciseRelease() =>... · f0401181
Michael Gottesman authored Jun 21, 2013
```
[objcarcopts] Encapsulated PtrState.RRI.IsTrackingImpreciseRelease() => PtrState.IsTrackingImpreciseRelease().

llvm-svn: 184583
```
f0401181

[objcarcopts] Encapsulate PtrState.RRI.CFGHazardAfflicted via methods... · 2f294597

Michael Gottesman authored Jun 21, 2013

[objcarcopts] Encapsulate PtrState.RRI.CFGHazardAfflicted via methods PtrState.{IsCFGHazardAfflicted,SetCFGHazardAfflicted}.

llvm-svn: 184582

2f294597

[NVPTX] Add support for selecting CUDA vs OCL mode based on triple · b6e6cd35
Justin Holewinski authored Jun 21, 2013
```
IR for CUDA should use "nvptx[64]-nvidia-cuda", and IR for NV OpenCL should use "nvptx[64]-nvidia-nvcl"

llvm-svn: 184579
```
b6e6cd35