- Dec 11, 2012
- NAKAMURA Takumi authored
  llvm-svn: 169817
- Jyotsna Verma authored
  llvm-svn: 169814
- Nadav Rotem authored
  llvm-svn: 169813
- Rafael Espindola authored
  llvm-svn: 169812
- Evan Cheng authored
  llvm-svn: 169811
- Chad Rosier authored
  llvm-svn: 169803
- Chandler Carruth authored
  try to reduce the width of this load, and would end up transforming:

      (truncate (lshr (sextload i48 <ptr> as i64), 32) to i32)

  to

      (truncate (zextload i32 <ptr+4> as i64) to i32)

  We lost the sext attached to the load while building the narrower i32 load, and replaced it with a zext because lshr always zext's the results. Instead, bail out of this combine when there is a conflict between a sextload and a zext narrowing. The rest of the DAG combiner still optimizes the code down to the proper single instruction:

      movswl 6(...),%eax

  which is exactly what we wanted. Previously we read past the end *and* missed the sign extension:

      movl 6(...), %eax

  llvm-svn: 169802
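  For context, a minimal C++ sketch of the kind of access pattern involved (the buffer layout and function name are illustrative, not taken from the original test case): sign-extend a 48-bit value, then read its upper bits, which a backend can narrow to a single 16-bit sign-extending load (movswl).

  ```cpp
  #include <cstdint>
  #include <cstring>

  // Illustrative only: load 48 bits, sign-extend to 64, take bits [32, 48).
  // The correct narrowing is a sign-extending 16-bit load of buf[4..5];
  // the bug described above produced a zero-extending 32-bit load instead.
  int32_t upper_bits(const unsigned char* buf) {
    int64_t v = 0;
    std::memcpy(&v, buf, 6);  // sextload i48 (little-endian host assumed)
    v = static_cast<int64_t>(static_cast<uint64_t>(v) << 16) >> 16; // sign-extend i48 -> i64
    return static_cast<int32_t>(v >> 32);  // the lshr + truncate from the message
  }
  ```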
- Paul Redmond authored
  This test case uses -mcpu=corei7 so it belongs in CodeGen/X86.
  Reviewed by: Nadav
  llvm-svn: 169801
- Bill Wendling authored
  llvm-svn: 169798
- Chad Rosier authored
  This shouldn't affect codegen for -O0 compiles, as tail call markers are not emitted in unoptimized compiles. Testing with the external/internal nightly test suite reveals no change in compile-time performance. Testing with -O1, -O2, and -O3 with fast-isel enabled did not cause any compile-time or execution-time failures. All tests were performed on my x86 machine. I'll monitor our ARM testers to ensure no regressions occur there.

  In an upcoming clang patch I will be marking the objc_autoreleaseReturnValue and objc_retainAutoreleaseReturnValue as tail calls unconditionally. While it's theoretically true that this is just an optimization, it's an optimization that we very much want to happen even at -O0, or else ARC applications become substantially harder to debug.

  Part of rdar://12553082
  llvm-svn: 169796
- Eric Christopher authored
  controls each of the abbreviation sets (only a single one at the moment) and computes offsets separately as well for each set of DIEs. No real functional change; the ordering of abbreviations for the skeleton CU changed, but only because we're computing in a separate order. Fix the testcase not to care.
  llvm-svn: 169793
- Evan Cheng authored
  1. Teach it to use overlapping unaligned load / store to copy / set the trailing bytes, e.g. on x86, use two pairs of movups / movaps for 17 - 31 byte copies (a sketch of this trick follows below).
  2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is, e.g. x86 and ARM.
  3. When memcpy'ing from a constant string, do *not* replace the load with a constant if it's not possible to materialize an integer immediate with a single instruction (required a new target hook: TLI.isIntImmLegal()).
  4. Use unaligned load / stores more aggressively if target hooks indicate they are "fast".
  5. Update ARM target hooks to use unaligned load / stores, e.g. vld1.8 / vst1.8. Also increase the threshold to something reasonable (8 for memset, 4 pairs for memcpy).

  This significantly improves Dhrystone, up to 50% on ARM iOS devices.

  rdar://12760078
  llvm-svn: 169791
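  Point 1 is the classic overlapping-copy trick. A minimal C++ sketch of the idea (the function name and the fixed 16-byte width are illustrative; the backend applies this at the SelectionDAG level, not in source):

  ```cpp
  #include <cstddef>
  #include <cstring>

  // Copy n bytes (17 <= n <= 31) with exactly two 16-byte unaligned
  // copies. The second copy starts at n - 16, so the two regions
  // overlap rather than falling back to byte-sized tail code.
  void copy17to31(void* dst, const void* src, std::size_t n) {
    std::memcpy(dst, src, 16);
    std::memcpy(static_cast<char*>(dst) + (n - 16),
                static_cast<const char*>(src) + (n - 16), 16);
  }
  ```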
- Arnold Schwaighofer authored
  Analyse PHIs under the starting assumption that they are NoAlias. Recursively look at their inputs; if they MayAlias/MustAlias, there must be an input that makes them so. Addresses bug 14351.
  llvm-svn: 169788
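  A rough, self-contained C++ sketch of that recursion, using stand-in types rather than the real BasicAliasAnalysis entry points (all names here are hypothetical):

  ```cpp
  #include <vector>

  enum AliasResult { NoAlias, MayAlias, MustAlias };

  struct Value { int id; };               // stand-in for an IR value
  using Phi = std::vector<const Value*>;  // a PHI = its incoming values

  // `alias` is whatever pairwise query the analysis already provides.
  // Start optimistically from NoAlias; a single MayAlias/MustAlias
  // input decides the result for the whole PHI.
  AliasResult aliasPhi(const Phi& phi, const Value& v,
                       AliasResult (*alias)(const Value&, const Value&)) {
    for (const Value* in : phi) {
      AliasResult r = alias(*in, v);
      if (r != NoAlias)
        return r;
    }
    return NoAlias;
  }
  ```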
- Dec 10, 2012
- Lang Hames authored
  If InitSections is called before the MCContext is initialized, it could cause duplicate temporary symbols to be emitted later (after context initialization resets the temporary label counter).
  llvm-svn: 169785
- Anshuman Dasgupta authored
  beyond array bounds. No test case since I cannot reproduce an ICE with this bug. According to Carlos -- the bug reporter -- a segfault occurs only when LLVM is compiled with a specific version of GCC.
  llvm-svn: 169783
- Eric Christopher authored
  llvm-svn: 169780
- Eric Christopher authored
  llvm-svn: 169779
- Eric Christopher authored
  llvm-svn: 169776
- Nadav Rotem authored
  llvm-svn: 169774
- Tom Stellard authored
  llvm-svn: 169773
- Tom Stellard authored
  llvm-svn: 169772
- Nadav Rotem authored
  llvm-svn: 169771
- Bill Wendling authored
  The linker will call `lto_codegen_add_must_preserve_symbol' on all globals that should be kept around. The linker will pretend that a dylib is being created.
  <rdar://problem/12528059>
  llvm-svn: 169770
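  For reference, a hedged sketch of how a linker might drive that libLTO call (the loop and function name are illustrative; only lto_codegen_add_must_preserve_symbol itself is from the C API in llvm-c/lto.h):

  ```cpp
  #include <llvm-c/lto.h>

  // Tell the code generator to keep every symbol the linker still needs,
  // so internalization/dead-stripping cannot remove them.
  void preserveSymbols(lto_code_gen_t cg, const char** names, unsigned n) {
    for (unsigned i = 0; i < n; ++i)
      lto_codegen_add_must_preserve_symbol(cg, names[i]);
  }
  ```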
- Eli Bendersky authored
  llvm-svn: 169764
- Eli Bendersky authored
  llvm-svn: 169762
- Akira Hatanaka authored
  Remove getMipsRegisterNumbering and use MCRegisterInfo::getEncodingValue instead.
  llvm-svn: 169760
- Eric Christopher authored
  going on and makes a lot of the terminology in comments make more sense.
  llvm-svn: 169758
- Eric Christopher authored
  llvm-svn: 169757
- Eric Christopher authored
  llvm-svn: 169756
Bill Wendling authored
The `-mno-red-zone' flag wasn't being propagated to the functions that code coverage generates. This allowed some of them to use the red zone when that wasn't allowed. <rdar://problem/12843084> llvm-svn: 169754
- Nadav Rotem authored
  while (i--) sum += A[i];
  llvm-svn: 169752
- Jim Grosbach authored
  If the local checkout does not have 'git svn' references set up, don't try to use 'git svn' for version information.
  llvm-svn: 169749
- Eli Bendersky authored
  the assembler. This is useful in order to know how the numbers add up, since in particular the Align fragments account for a non-trivial portion of the emitted fragments (especially on -O0, which sets relax-all).
  llvm-svn: 169747
- Hal Finkel authored
  misched used GetUnderlyingObject in order to break false load/store dependencies, and the -enable-aa-sched-mi feature similarly relied on GetUnderlyingObject in order to ensure it is safe to use the aliasing analysis. Unfortunately, GetUnderlyingObject does not recurse through phi nodes, and so (especially due to LSR) all of these mechanisms failed for induction-variable-dependent loads and stores inside loops.

  This change replaces uses of GetUnderlyingObject with GetUnderlyingObjects (which will recurse through phi and select instructions) in misched. Andy reviewed, tested and simplified this patch; thanks!
  llvm-svn: 169744
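  The singular/plural difference in a nutshell, sketched in C++ with stand-in types (this is not the real llvm::GetUnderlyingObjects signature): the plural walk keeps going when it meets a phi or select, collecting every possible base object.

  ```cpp
  #include <set>
  #include <vector>

  // Stand-in IR node: no inputs => a base object (e.g. an alloca);
  // one input, merges == false => a GEP/bitcast-like pass-through;
  // merges == true => a phi/select over several pointers.
  struct Node {
    std::vector<const Node*> inputs;
    bool merges = false;
  };

  // Singular walk: stops as soon as it hits a phi/select.
  const Node* underlyingObject(const Node* n) {
    while (n->inputs.size() == 1 && !n->merges)
      n = n->inputs[0];
    return n;
  }

  // Plural walk: recurses through phi/select, gathering all bases.
  void underlyingObjects(const Node* n, std::set<const Node*>& out) {
    if (n->inputs.empty()) { out.insert(n); return; }
    for (const Node* in : n->inputs)
      underlyingObjects(in, out);
  }
  ```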
- Sean Silva authored
  PR14343
  llvm-svn: 169742
- Chandler Carruth authored
  Accidental commit... git svn betrayed me. Sorry for the noise.
  llvm-svn: 169741
- Chandler Carruth authored
  Summary: Not all chips targeted by x86_64 have this feature, but a dramatically increasing number do. Specifying a chip-specific tuning parameter will continue to turn the feature on or off as appropriate for that particular chip, but the generic flag should try to achieve the best performance on the most widely available hardware. Today, the number of chips with fast UA access dwarfs those without in the x86-64 space. Note that this also brings LLVM's code generation for this '-march' flag more in line with that of modern GCCs.

  CC: llvm-commits
  Differential Revision: http://llvm-reviews.chandlerc.com/D195
  llvm-svn: 169740
- Chandler Carruth authored
  Thanks to the PaX folks for noticing in review! We need some tests here; any suggestions welcome...
  llvm-svn: 169739
- Chandler Carruth authored
  Intel chips. The model number rules were determined by inspecting Intel's documentation for their newer chip model numbers. My understanding is that all of the newer Intel chips have fast unaligned memory access, but if anyone is concerned about a particular chip, just shout. No tests updated; it's not clear we have dedicated tests for the chips' various features, but if anyone would like tests (or can point me at some existing ones), I'm happy to oblige.
  llvm-svn: 169730
- Chandler Carruth authored
  This visitor provides infrastructure for recursively traversing the use-graph of a pointer-producing instruction like an alloca or a malloc. It maintains a worklist of uses to visit, so it can handle very deep recursions. It automatically looks through instructions which simply translate one pointer to another (bitcasts and GEPs). It tracks the offset relative to the original pointer as long as that offset remains constant and exposes it during the visit as an APInt offset. Finally, it performs conservative escape analysis.

  However, currently it has some limitations that should be addressed going forward:
  1) It doesn't handle vectors of pointers.
  2) It doesn't provide a cheaper visitor when the constant offset tracking isn't needed.
  3) It doesn't support non-instruction pointer values.

  The current functionality is exactly what is required to implement the SROA pointer-use visitors in terms of this one, rather than in terms of their own ad-hoc base visitor, which was always very poorly specified. SROA has been converted to use this, and the code there which this utility now provides has been deleted. Technically speaking, using this new visitor allows SROA to handle a few more cases than it previously did. It is now more aggressive in ignoring chains of instructions which look like they would defeat SROA, but in fact do not because they never result in a read or write of memory. While this is "neat", it shouldn't be interesting for real programs, as any such chains should have been removed by other passes long before we get to SROA. As a consequence, I've not added any tests for these features -- it shouldn't be part of SROA's contract to perform such heroics.

  The goal is to extend the functionality of this visitor going forward, and re-use it from passes like ASan that can benefit from doing a detailed walk of the uses of a pointer. A sketch of the worklist pattern follows below.

  Thanks to Ben Kramer for the code review rounds and lots of help reviewing and debugging this patch.
  llvm-svn: 169728
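  A highly simplified, self-contained C++ sketch of that worklist pattern (stand-in types; the real PtrUseVisitor is a CRTP class over LLVM's use lists):

  ```cpp
  #include <cstdint>
  #include <utility>
  #include <vector>

  // Stand-in for a pointer use. A use either produces a derived pointer
  // at a constant byte offset (bitcast => 0, constant GEP => its offset)
  // or is a leaf such as a load, store, or escaping call.
  struct Use {
    std::vector<Use*> users;      // uses of the pointer this use produces
    bool derivesPointer = false;
    std::int64_t offset = 0;      // meaningful when derivesPointer is true
  };

  // Worklist traversal: no native recursion, so arbitrarily deep
  // use-graphs are fine. Each entry carries the offset accumulated
  // from the original pointer so far.
  void visitPtrUses(Use* root, void (*visitLeaf)(Use*, std::int64_t)) {
    std::vector<std::pair<Use*, std::int64_t>> worklist{{root, 0}};
    while (!worklist.empty()) {
      auto [u, off] = worklist.back();
      worklist.pop_back();
      if (!u->derivesPointer) { visitLeaf(u, off); continue; }
      for (Use* user : u->users)
        worklist.emplace_back(user, off + u->offset);
    }
  }
  ```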