Commits · e6a6d9ae07540e99bed9abd28690579ef6116855 · Roger Ferrer / llvm-epi-0.8

Dec 11, 2012

Some enhancements for memcpy / memset inline expansion. · 79e2ca90

Evan Cheng authored Dec 10, 2012

1. Teach it to use overlapping unaligned load / store to copy / set the trailing
   bytes. e.g. On x86, use two pairs of movups / movaps for 17 - 31 byte copies.
2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is. e.g.
   x86 and ARM.
3. When memcpy from a constant string, do *not* replace the load with a constant
   if it's not possible to materialize an integer immediate with a single
   instruction (required a new target hook: TLI.isIntImmLegal()).
4. Use unaligned load / stores more aggressively if target hooks indicate they
   are "fast".
5. Update ARM target hooks to use unaligned load / stores. e.g. vld1.8 / vst1.8.
   Also increase the threshold to something reasonable (8 for memset, 4 pairs
   for memcpy).

This significantly improves Dhrystone, up to 50% on ARM iOS devices.
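
The overlapping-access idea in item 1 can be illustrated with a minimal C++ sketch. It is not the code LLVM emits: `copy16to32` is a hypothetical helper, and the 16-byte `memcpy` calls stand in for the unaligned vector loads and stores a target would use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not LLVM code): copy n bytes, 16 <= n <= 32, using two
// fixed-size 16-byte blocks. The second block is anchored at the end of the
// buffer, so it overlaps the first block whenever n < 32.
inline void copy16to32(unsigned char *dst, const unsigned char *src,
                       std::size_t n) {
  assert(n >= 16 && n <= 32 && "sketch only handles 16..32 bytes");
  unsigned char head[16], tail[16];
  std::memcpy(head, src, 16);            // first 16 bytes
  std::memcpy(tail, src + (n - 16), 16); // last 16 bytes, may overlap head
  std::memcpy(dst, head, 16);
  std::memcpy(dst + (n - 16), tail, 16); // rewrites the overlap with the same bytes
}
```

Both loads happen before either store, so no byte-by-byte tail loop is needed for any length in the range.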

rdar://12760078

llvm-svn: 169791

79e2ca90

Optimistically analyse Phi cycles · edd62b14

Arnold Schwaighofer authored Dec 10, 2012

Analyse Phis under the starting assumption that they are NoAlias. Recursively
look at their inputs.
If they MayAlias/MustAlias there must be an input that makes them so.
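
A rough sketch of the optimistic strategy, on a toy value graph rather than LLVM's IR. `ToyValue`, `PairSet` and `mayAlias` are hypothetical names. The point shown is that a query already in progress is assumed NoAlias, so a phi cycle can only report an alias if some non-cyclic input does.

```cpp
#include <set>
#include <utility>
#include <vector>

// Toy IR: a value is either a base pointer naming a distinct object, or a phi
// merging other values. Hypothetical model, not LLVM's classes.
struct ToyValue {
  int object;              // >= 0 for a base pointer, unused for a phi
  std::vector<int> inputs; // non-empty for a phi
  bool isPhi() const { return !inputs.empty(); }
};

using PairSet = std::set<std::pair<int, int>>;

// Returns true if values a and b may alias.
inline bool mayAlias(const std::vector<ToyValue> &vals, int a, int b,
                     PairSet &inProgress) {
  if (!vals[a].isPhi() && !vals[b].isPhi())
    return vals[a].object == vals[b].object;
  if (!vals[a].isPhi())
    std::swap(a, b); // make `a` the phi
  // Optimistic step: a query that is already being analysed is assumed
  // NoAlias, which is what lets the walk terminate on phi cycles.
  if (!inProgress.insert(std::make_pair(a, b)).second)
    return false;
  for (int in : vals[a].inputs)
    if (mayAlias(vals, in, b, inProgress))
      return true; // some input makes the phi alias
  return false;
}
```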

Addresses bug 14351.

llvm-svn: 169788

edd62b14

Dec 10, 2012

Defer call to InitSections until after MCContext has been initialized. If · 517fc8b2

Lang Hames authored Dec 10, 2012

InitSections is called before the MCContext is initialized it could cause
duplicate temporary symbols to be emitted later (after context initialization
resets the temporary label counter).

llvm-svn: 169785

517fc8b2

Rearrange vars and make comments more obvious. · 0aa4a670
Eric Christopher authored Dec 10, 2012
```
llvm-svn: 169780
```
0aa4a670
Remove blank line at top of file. · 81d091ee
Eric Christopher authored Dec 10, 2012
```
llvm-svn: 169779
```
81d091ee
Fix a coding style nit. · 200dd760
Eric Christopher authored Dec 10, 2012
```
llvm-svn: 169776
```
200dd760
Enable the loop vectorizer only on O2 and above. (Still disabled by default) · 36cdd826
Nadav Rotem authored Dec 10, 2012
```
llvm-svn: 169774
```
36cdd826
LegalizeDAG: Allow type promotion of scalar loads · 30e2aa50
Tom Stellard authored Dec 10, 2012
```
llvm-svn: 169773
```
30e2aa50
LegalizeDAG: Allow type promotion for scalar stores · b785bd77
Tom Stellard authored Dec 10, 2012
```
llvm-svn: 169772
```
b785bd77
Split the LoopVectorizer into H and CPP. · 07df5ac1
Nadav Rotem authored Dec 10, 2012
```
llvm-svn: 169771
```
07df5ac1
Cleanup formatting, comments and naming. · 4c7296fd
Eli Bendersky authored Dec 10, 2012
```
llvm-svn: 169762
```
4c7296fd
[mips] Set HWEncoding field of registers. Delete function · 5d6faed1
Akira Hatanaka authored Dec 10, 2012
```
getMipsRegisterNumbering and use MCRegisterInfo::getEncodingValue instead.

llvm-svn: 169760
```
5d6faed1
Use the somewhat semantic term "split dwarf"; it more closely matches what's · cdf218d6
Eric Christopher authored Dec 10, 2012
```
going on and makes a lot of the terminology in comments make more sense.

llvm-svn: 169758
```
cdf218d6
Delete the FissionCU. · 8afd7b60
Eric Christopher authored Dec 10, 2012
```
llvm-svn: 169757
```
8afd7b60
Reorder fission variables. · d79f5480
Eric Christopher authored Dec 10, 2012
```
llvm-svn: 169756
```
d79f5480

Don't use a red zone for code coverage if the user specified `-mno-red-zone'. · 74f334e4

Bill Wendling authored Dec 10, 2012

The `-mno-red-zone' flag wasn't being propagated to the functions that code
coverage generates. This allowed some of them to use the red zone when that
wasn't allowed.
<rdar://problem/12843084>

llvm-svn: 169754

74f334e4

Add support for reverse induction variables. For example: · 7b5b55c1
Nadav Rotem authored Dec 10, 2012
```
while (i--)
 sum+=A[i];

llvm-svn: 169752
```
7b5b55c1
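
To make the reverse induction example above concrete, here is a hand-written sketch of what processing such a loop four elements at a time could look like. It is an illustration only: the function names and the width of 4 are assumptions, and this is not the vectorizer's actual output.

```cpp
#include <cstddef>

// Scalar form from the commit message: i counts down from n-1 to 0.
inline int sumReverseScalar(const int *A, std::size_t n) {
  int sum = 0;
  std::size_t i = n;
  while (i--)
    sum += A[i];
  return sum;
}

// Blocked form: each step handles the four elements just below the current
// index, accumulating into four independent lanes.
inline int sumReverseBlocked(const int *A, std::size_t n) {
  int lanes[4] = {0, 0, 0, 0};
  std::size_t i = n;
  while (i >= 4) {
    i -= 4;
    for (int l = 0; l < 4; ++l)
      lanes[l] += A[i + l];
  }
  int sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
  while (i--) // scalar remainder for the lowest indices
    sum += A[i];
  return sum;
}
```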

This patch adds statistics for other non-DWARF fragments emitted by · c01322ee

Eli Bendersky authored Dec 10, 2012

the assembler. This is useful in order to know how the numbers add up,
since in particular the Align fragments account for a non-trivial
portion of the emitted fragments (especially on -O0 which sets
relax-all).

llvm-svn: 169747

c01322ee

Use GetUnderlyingObjects in misched · 66859ae0

Hal Finkel authored Dec 10, 2012

misched used GetUnderlyingObject in order to break false load/store
dependencies, and the -enable-aa-sched-mi feature similarly relied on
GetUnderlyingObject in order to ensure it is safe to use the aliasing analysis.
Unfortunately, GetUnderlyingObject does not recurse through phi nodes, and so
(especially due to LSR) all of these mechanisms failed for
induction-variable-dependent loads and stores inside loops.

This change replaces uses of GetUnderlyingObject with GetUnderlyingObjects
(which will recurse through phi and select instructions) in misched.
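
The difference between the two walks can be sketched on a toy pointer graph. This is a simplified model with hypothetical types (`Kind`, `PtrNode`), not the LLVM implementation. The single-object walk stops at a phi, while the multi-object walk keeps a worklist and continues through phi and select operands.

```cpp
#include <set>
#include <vector>

// Toy pointer-producing node (hypothetical model, not LLVM's IR classes).
enum class Kind { Object, Gep, Phi, Select };
struct PtrNode {
  Kind kind;
  std::vector<int> operands; // Gep: base pointer; Phi/Select: incoming pointers
};

// Single-object walk: strips GEPs but gives up at a phi or select.
inline int underlyingObject(const std::vector<PtrNode> &g, int v) {
  while (g[v].kind == Kind::Gep)
    v = g[v].operands[0];
  return v;
}

// Multi-object walk: continues through phi and select nodes and collects
// every object that can be reached.
inline std::set<int> underlyingObjects(const std::vector<PtrNode> &g, int v) {
  std::set<int> objects, visited;
  std::vector<int> worklist{v};
  while (!worklist.empty()) {
    int cur = underlyingObject(g, worklist.back());
    worklist.pop_back();
    if (!visited.insert(cur).second)
      continue; // already handled, e.g. a loop-carried phi
    if (g[cur].kind == Kind::Phi || g[cur].kind == Kind::Select)
      for (int op : g[cur].operands)
        worklist.push_back(op);
    else
      objects.insert(cur);
  }
  return objects;
}
```

For an induction-variable pointer of the form `p = phi(base, gep(p))`, the first walk returns the phi itself, whereas the second returns `base`.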

Andy reviewed, tested and simplified this patch; Thanks!

llvm-svn: 169744

66859ae0

Revert "Make '-mtune=x86_64' assume fast unaligned memory accesses." · 867c7bff
Chandler Carruth authored Dec 10, 2012
```
Accidental commit... git svn betrayed me. Sorry for the noise.

llvm-svn: 169741
```
867c7bff

Make '-mtune=x86_64' assume fast unaligned memory accesses. · 7eaa45c7

Chandler Carruth authored Dec 10, 2012

Summary:
Not all chips targeted by x86_64 have this feature, but a dramatically
increasing number do. Specifying a chip-specific tuning parameter will
continue to turn the feature on or off as appropriate for that
particular chip, but the generic flag should try to achieve the best
performance on the most widely available hardware. Today, the number of
chips with fast UA access dwarfs those without in the x86-64 space.

Note that this also brings LLVM's code generation for this '-march' flag
more in line with that of modern GCCs.

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D195

llvm-svn: 169740

7eaa45c7

Fix a typo in my previous commit -- bloomfield is 0x1A not 0x2A. · 17f25c4e
Chandler Carruth authored Dec 10, 2012
```
Thanks to the PaX folks for noticing in review! We need some tests here,
any suggestions welcome...

llvm-svn: 169739
```
17f25c4e

Address a FIXME and update the fast unaligned memory feature for newer · 0f585581

Chandler Carruth authored Dec 10, 2012

Intel chips.

The model number rules were determined by inspecting Intel's
documentation for their newer chip model numbers. My understanding is
that all of the newer Intel chips have fast unaligned memory access, but
if anyone is concerned about a particular chip, just shout.

No tests updated; it's not clear we have dedicated tests for the chips'
various features, but if anyone would like tests (or can point me at
some existing ones), I'm happy to oblige.

llvm-svn: 169730

0f585581

Add a new visitor for walking the uses of a pointer value. · e41e7b79

Chandler Carruth authored Dec 10, 2012

This visitor provides infrastructure for recursively traversing the
use-graph of a pointer-producing instruction like an alloca or a malloc.
It maintains a worklist of uses to visit, so it can handle very deep
recursions. It automatically looks through instructions which simply
translate one pointer to another (bitcasts and GEPs). It tracks the
offset relative to the original pointer as long as that offset remains
constant and exposes it during the visit as an APInt offset. Finally, it
performs conservative escape analysis.
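
The behaviour described above can be approximated with a small worklist sketch. All types and names here (`UseKind`, `Inst`, `walkPointerUses`) are hypothetical, and the offset is a plain 64-bit integer instead of an APInt. The real visitor's interface is not shown in this commit message.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Toy instruction that uses a pointer (hypothetical model).
enum class UseKind { Bitcast, Gep, Load, Store, Call };
struct Inst {
  UseKind kind;
  std::int64_t gepOffset; // constant byte offset, only meaningful for Gep
  std::vector<int> users; // instructions that use this instruction's result
};

using Access = std::pair<int, std::int64_t>; // (instruction id, offset)

struct WalkResult {
  std::vector<Access> accesses; // loads/stores with offset from the root
  bool escaped = false;         // set when the pointer reaches a call
};

inline WalkResult walkPointerUses(const std::vector<Inst> &insts,
                                  const std::vector<int> &rootUsers) {
  WalkResult result;
  std::vector<bool> seen(insts.size(), false);
  std::vector<Access> worklist; // explicit worklist, so no deep recursion
  for (int u : rootUsers)
    worklist.push_back(Access(u, 0));
  while (!worklist.empty()) {
    Access item = worklist.back();
    worklist.pop_back();
    int id = item.first;
    if (seen[id])
      continue;
    seen[id] = true;
    const Inst &inst = insts[id];
    switch (inst.kind) {
    case UseKind::Bitcast:
    case UseKind::Gep: {
      // Look through pointer-to-pointer translations, adjusting the offset.
      std::int64_t off =
          item.second + (inst.kind == UseKind::Gep ? inst.gepOffset : 0);
      for (int u : inst.users)
        worklist.push_back(Access(u, off));
      break;
    }
    case UseKind::Load:
    case UseKind::Store:
      result.accesses.push_back(item);
      break;
    case UseKind::Call:
      result.escaped = true; // conservative: any call is treated as an escape
      break;
    }
  }
  return result;
}
```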

However, currently it has some limitations that should be addressed
going forward:
1) It doesn't handle vectors of pointers.
2) It doesn't provide a cheaper visitor when the constant offset
   tracking isn't needed.
3) It doesn't support non-instruction pointer values.

The current functionality is exactly what is required to implement the
SROA pointer-use visitors in terms of this one, rather than in terms of
their own ad-hoc base visitor, which was always very poorly specified.
SROA has been converted to use this, and the code there deleted which
this utility now provides.

Technically speaking, using this new visitor allows SROA to handle a few
more cases than it previously did. It is now more aggressive in ignoring
chains of instructions which look like they would defeat SROA, but in
fact do not because they never result in a read or write of memory.
While this is "neat", it shouldn't be interesting for real programs as
any such chains should have been removed by other passes long before we
get to SROA. As a consequence, I've not added any tests for these
features -- it shouldn't be part of SROA's contract to perform such
heroics.

The goal is to extend the functionality of this visitor going forward,
and re-use it from passes like ASan that can benefit from doing
a detailed walk of the uses of a pointer.

Thanks to Ben Kramer for the code review rounds and lots of help
reviewing and debugging this patch.

llvm-svn: 169728

e41e7b79

Teach DAG combine to handle vector add/sub with vectors of all 0s. · d8005db4
Craig Topper authored Dec 10, 2012
```
llvm-svn: 169727
```
d8005db4
[CMake] Update dependencies to intrinsics_gen corresponding to r169711. · 6b819c5f
NAKAMURA Takumi authored Dec 10, 2012
```
llvm-svn: 169724
```
6b819c5f

Fix PR14548: SROA was crashing on a mixture of i1 and i8 loads and stores. · e45f4658

Chandler Carruth authored Dec 10, 2012

When SROA was evaluating a mixture of i1 and i8 loads and stores, in
just a particular case, it would tickle a latent bug where we compared
bits to bytes rather than bits to bits. As a consequence of the latent
bug, we would allow integers through which were not byte-size multiples,
a situation the later rewriting code was never intended to handle.
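
The class of bug described here, comparing a size in bits against a size in bytes, can be illustrated as follows. Both functions are hypothetical: the commit message does not show SROA's actual check, so this is only a sketch of how mixed units can let an i1 through.

```cpp
#include <cstdint>

// Illustration of the mistake: a bit count is compared with a byte count.
// An i1 (1 bit) then looks like an exact match for a 1-byte slice.
inline bool mixedUnitsCheck(std::uint64_t typeBits, std::uint64_t sliceBytes) {
  return typeBits == sliceBytes;
}

// Unit-consistent version: reject integers that are not byte-size multiples,
// then compare bits to bits.
inline bool integerCoversSlice(std::uint64_t typeBits,
                               std::uint64_t sliceBytes) {
  if (typeBits % 8 != 0)
    return false;
  return typeBits == sliceBytes * 8;
}
```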

In release builds this could trigger all manner of oddities, but the
reported issue in PR14548 was forming invalid bitcast instructions.

The only downside of this fix is that it makes it more clear that SROA
in its current form is not capable of handling mixed i1 and i8 loads and
stores. Sometimes with the previous code this would work by luck, but
usually it would crash, so I'm not terribly worried. I'll watch the LNT
numbers just to be sure.

llvm-svn: 169719

e45f4658

Dec 09, 2012
- Reorganize FastMathFlags to be a wrapper around unsigned, and streamline some interfaces. · 65f1435a
  Michael Ilseman authored Dec 09, 2012
```
llvm-svn: 169712
```
  65f1435a
- LoopVectorize: support vectorizing intrinsic calls · 2adb13c1
  Paul Redmond authored Dec 09, 2012
```
- added function to VectorTargetTransformInfo to query cost of intrinsics
- vectorize trivially vectorizable intrinsic calls such as sin, cos, log, etc.

Reviewed by: Nadav

llvm-svn: 169711
```
  2adb13c1
- Have the bitcode reader/writer just use FPMathOperator's fast math enum directly · 6d2ffa18
  Michael Ilseman authored Dec 09, 2012
```
llvm-svn: 169710
```
  6d2ffa18
- test commit. · f7cd6b39
  Paul Redmond authored Dec 09, 2012
```
llvm-svn: 169709
```
  f7cd6b39
- Use m_OneUse pattern instead of hasOneUse() method. · 8432185e
  Jakub Staszak authored Dec 09, 2012
```
No functionality change.

llvm-svn: 169703
```
  8432185e
- Remove trailing spaces. · 538e3861
  Jakub Staszak authored Dec 09, 2012
```
llvm-svn: 169701
```
  538e3861
- Switch SROA to pop Uses off the back of its visitors' queues. · 93ff2447
  Chandler Carruth authored Dec 09, 2012
```
This will more closely match the behavior of the new PtrUseVisitor that
I am adding. Hopefully this will not change the actual behavior in any
way, but making the processing order more similar should help in debugging.

llvm-svn: 169697
```
  93ff2447
- Remove extra blank line. · 5ea3bdd7
  Craig Topper authored Dec 09, 2012
```
llvm-svn: 169692
```
  5ea3bdd7
- Re-enable population count loop idiom recognition · 95de7c37
  Shuxin Yang authored Dec 09, 2012
```
- fix a bug which caused a segfault.
- add two test cases which were causing crashes

llvm-svn: 169687
```
  95de7c37
Dec 08, 2012

Teach DAG combine to handle vector logical operations with vectors of all 1s... · a183ddb0

Craig Topper authored Dec 08, 2012

Teach DAG combine to handle vector logical operations with vectors of all 1s or all 0s. These cases can show up when vectors are split for legalizing. Fix some tests that were dependent on these cases not being combined.
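
The identities involved are the usual bitwise ones, applied per lane. The sketch below uses a hypothetical 4-lane vector type and a made-up `foldLogical` helper rather than SelectionDAG nodes. It only shows which AND/OR cases can be folded without computing the operation.

```cpp
#include <array>
#include <cstdint>

using Vec4 = std::array<std::uint32_t, 4>; // toy 4 x i32 vector

inline bool isAllZeros(const Vec4 &v) {
  for (std::uint32_t lane : v)
    if (lane != 0)
      return false;
  return true;
}

inline bool isAllOnes(const Vec4 &v) {
  for (std::uint32_t lane : v)
    if (lane != 0xFFFFFFFFu)
      return false;
  return true;
}

enum class Op { And, Or };

// Returns true and writes `out` when `x op c` folds to one of its operands.
inline bool foldLogical(Op op, const Vec4 &x, const Vec4 &c, Vec4 &out) {
  if (isAllZeros(c)) {
    out = (op == Op::And) ? c : x; // x & 0 == 0,  x | 0 == x
    return true;
  }
  if (isAllOnes(c)) {
    out = (op == Op::And) ? x : c; // x & ~0 == x, x | ~0 == ~0
    return true;
  }
  return false; // no fold; the operation has to be kept
}
```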

llvm-svn: 169684

a183ddb0

Revert the patches adding a popcount loop idiom recognition pass. · 91e47532

Chandler Carruth authored Dec 08, 2012

There are still bugs in this pass, as well as other issues that are
being worked on, but the bugs are crashers that occur pretty easily in
the wild. Test cases have been sent to the original commit's review
thread.

This reverts the commits:
  r169671: Fix a logic error.
  r169604: Move the popcnt tests to an X86 subdirectory.
  r168931: Initial commit adding the pass.
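
For context, a population-count loop of the kind such a pass would presumably look for is the classic clear-lowest-set-bit loop. The commit messages here do not show the exact pattern, so treat this as an assumed example. `popcountLoop` is an illustrative name.

```cpp
#include <cstdint>

// Counts set bits by clearing the lowest set bit on each iteration.
// A recognizer could replace the whole loop with one popcount operation.
inline int popcountLoop(std::uint32_t x) {
  int count = 0;
  while (x) {
    ++count;
    x &= x - 1; // clears the lowest set bit
  }
  return count;
}
```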

llvm-svn: 169683

91e47532

Simplify code. Sort includes. No functionality change. · f242d8c3
Benjamin Kramer authored Dec 08, 2012
```
llvm-svn: 169676
```
f242d8c3
Fix an inadvertent typo error. · 9c5c9764
Shuxin Yang authored Dec 08, 2012
```
llvm-svn: 169671
```
9c5c9764