Commits · 8a49d049e17b1e0916aa4c1b662444b4b6da3405 · Roger Ferrer / llvm-epi-0.8

Apr 09, 2012

Add a hook to turn on the internalize pass through the LTO interface. · 8a49d049
Bill Wendling authored Apr 09, 2012
```
llvm-svn: 154306
```
Replace some explicit checks with asserts for conditions that should never happen. · 5894fe43
Craig Topper authored Apr 09, 2012
```
llvm-svn: 154305
```

Cleanup and relax a restriction on the matching of global offsets into · 3779ac10

Chandler Carruth authored Apr 09, 2012

x86 addressing modes. This allows PIE-based TLS offsets to fit directly
into an addressing mode immediate offset, which is the last remaining
code quality issue from PR12380. With this patch, that PR is completely
fixed.

To understand why this patch is correct to match these offsets into
addressing mode immediates, break it down by cases:
1) 32-bit is trivially correct, and unmodified here.
2) 64-bit non-small mode is unchanged and never matches.
3) 64-bit small PIC code which is RIP-relative is handled specially in
the match to try to fit RIP into the base register. If it fails, it
now early exits. This behavior is unchanged by the patch.
4) 64-bit small non-PIC code which is not RIP-relative continues to work
as it did before. These immediates are safe because the ABI ensures
they fit in small mode. This behavior is unchanged.
5) 64-bit small PIC code which is *not* using RIP-relative addressing.
This is the only case changed by the patch, and the primary place you
see it is in TLS, either the win64 section offset TLS or Linux
local-exec TLS model in a PIC compilation. Here the ABI again ensures
that the immediates fit because we are in small mode, and any other
operations required due to the PIC relocation model have been handled
externally to the Wrapper node (extra loads, etc., are made around the
wrapper node in ISelLowering).

I've tested this as much as I can comparing it with GCC's output, and
everything appears safe. I discussed this with Anton and it made sense
to him at least at face value. That said, if there are issues with PIC
code after this patch, yell and we can revert it.

llvm-svn: 154304
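
All five cases come down to one question: does the final displacement still fit in the field that ordinary x86-64 memory operands (ModRM/SIB forms) provide, a 32-bit immediate that the CPU sign-extends to 64 bits? A minimal sketch of that check follows; `fitsInDisp32` and `tryFoldOffset` are hypothetical helpers for illustration, not LLVM's matcher code.

```cpp
#include <cassert>
#include <cstdint>

// True if Value can be encoded as a sign-extended 32-bit displacement.
bool fitsInDisp32(std::int64_t Value) {
  return Value >= INT32_MIN && Value <= INT32_MAX;
}

// Hypothetical helper: fold a constant Offset (for example a TLS offset
// behind a wrapper node) into an addressing-mode displacement. Disp is only
// updated when the combined value is still encodable; otherwise the caller
// has to materialize the offset some other way.
bool tryFoldOffset(std::int64_t &Disp, std::int64_t Offset) {
  // Requiring both inputs to be 32-bit values first guarantees that the
  // sum below cannot overflow int64_t.
  if (!fitsInDisp32(Disp) || !fitsInDisp32(Offset))
    return false;
  std::int64_t Sum = Disp + Offset;
  if (!fitsInDisp32(Sum))
    return false;
  Disp = Sum;
  return true;
}
```

In small code model the ABI is what guarantees `fitsInDisp32` holds for symbol offsets, which is why the patch can fold them without a separate check.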

Fold 15 tiny test cases into a single file that implements the · 84b83426

Chandler Carruth authored Apr 09, 2012

comprehensive testing of TLS codegen for x86. Convert all of the ones
that were still using grep to use FileCheck. Remove some redundancies
between them.

Perhaps most interestingly, expand the test cases so that they actually
fully list the instruction snippet being tested. TLS operations are
*very* narrowly defined, and so these seem reasonably stable. More
importantly, the existing test cases already were crazy fine-grained,
expecting specific registers to be allocated. This just clarifies that
no *other* instructions are expected, and fills in some crucial gaps
that weren't being tested at all.

This will make any subsequent changes to TLS much more clear during
review.

llvm-svn: 154303

Remove definedAtomsBegin() and co. so that C++11 range based for loops can be used · 467209b1
Nick Kledzik authored Apr 09, 2012
```
llvm-svn: 154302
```
Rename referencesBegin() to begin() so that C++11 range based for loops can be used · 062a98cf
Nick Kledzik authored Apr 08, 2012
```
llvm-svn: 154301
```
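
Range-based `for` finds its iterators by calling `begin()` and `end()` (as members, or as free functions found by argument-dependent lookup), so accessors named `referencesBegin()` or `definedAtomsBegin()` cannot be used with it. The sketch below shows the renamed shape using hypothetical `Reference` and `DefinedAtom` types; they are not the actual classes these two commits touched.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types, for illustration only.
struct Reference {
  std::string Target;
};

class DefinedAtom {
public:
  explicit DefinedAtom(std::vector<Reference> InitialRefs)
      : Refs(std::move(InitialRefs)) {}

  // Naming these begin()/end() is what lets `for (const Reference &R : Atom)`
  // compile; a referencesBegin()/referencesEnd() pair would not be found.
  std::vector<Reference>::const_iterator begin() const { return Refs.begin(); }
  std::vector<Reference>::const_iterator end() const { return Refs.end(); }

private:
  std::vector<Reference> Refs;
};

// Count the references from Atom that point at Name.
std::size_t countReferencesTo(const DefinedAtom &Atom, const std::string &Name) {
  std::size_t Count = 0;
  for (const Reference &R : Atom)
    if (R.Target == Name)
      ++Count;
  return Count;
}
```
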
Optimize code a bit. No functional change intended. · 6148fe65
Craig Topper authored Apr 08, 2012
```
llvm-svn: 154299
```

Apr 08, 2012

Wire up -fpie and -fPIE to LLVM's newly added TargetOptions. No test · 097d019c

Chandler Carruth authored Apr 08, 2012

case as we don't currently have any way of dumping target options or
otherwise observing this. Another small step toward fixing PR12380. With
this we generate TLS accesses using the static model instead of the
dynamic model, but we're still generating suboptimal code under the
mistaken assumption that the TLS offset might be greater than 2^32, and
therefore not viable as an immediate offset of a segment register.

llvm-svn: 154298

Silence sign-compare warning. · bb6ff087
Benjamin Kramer authored Apr 08, 2012
```
llvm-svn: 154297
```

Only have codegen turn fdiv by a constant into fmul by the reciprocal · 2f1dc381

Duncan Sands authored Apr 08, 2012

when -ffast-math, i.e. don't just always do it if the reciprocal can
be formed exactly.  There is already an IR level transform that does
that, and it does it more carefully.

llvm-svn: 154296

Simplify code that tries to do vector extracts for shuffles when the mask... · c8e2d91a

Craig Topper authored Apr 08, 2012

Simplify code that tries to do vector extracts for shuffles when the mask width and the input vector widths don't match. No need to check the min and max are in range before calculating the start index. The range check after having the start index is sufficient. Also no need to check for an extract from the beginning differently.

llvm-svn: 154295
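
The simplification amounts to computing the candidate start index first and then doing a single range check on it, with a start of 0 needing no special case. The sketch below is a hypothetical single-input version of that idea; `findExtractStart` and its parameters are illustrative, and the real DAG code also deals with two inputs and undefined mask elements.

```cpp
#include <cassert>

// Hypothetical helper. A shuffle mask with MaskWidth elements reads lanes
// MinIdx..MaxIdx (inclusive) of a source vector with SrcWidth lanes. If all
// of those lanes fall inside one MaskWidth-aligned chunk of the source, the
// shuffle can read from a subvector extracted at Start instead.
bool findExtractStart(unsigned MinIdx, unsigned MaxIdx, unsigned MaskWidth,
                      unsigned SrcWidth, unsigned &Start) {
  assert(MaskWidth != 0 && MinIdx <= MaxIdx && "invalid lane range");
  // Round the smallest index down to a chunk boundary first...
  unsigned Candidate = (MinIdx / MaskWidth) * MaskWidth;
  // ...then one check covers everything: the largest index must be in the
  // same chunk, and the chunk must lie inside the source vector.
  if (MaxIdx >= Candidate + MaskWidth || Candidate + MaskWidth > SrcWidth)
    return false;
  Start = Candidate;
  return true;
}
```
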

Teach LLVM about a PIE option which, when enabled on top of PIC, makes · ede4a8aa

Chandler Carruth authored Apr 08, 2012

optimizations which are valid for position independent code being linked
into a single executable, but not for such code being linked into
a shared library.

I discussed the design of this with Eric Christopher, and the decision
was to support an optional bit rather than a completely separate
relocation model. Fundamentally, this is still PIC relocation, it's just
that certain optimizations are only valid under a PIC relocation model
when the resulting code won't be in a shared library. The simplest path
to here is to expose a single bit option in the TargetOptions. If folks
have different/better designs, I'm all ears. =]

I've included the first optimization based upon this: changing TLS
models to the *Exec models when PIE is enabled. This is the LLVM
component of PR12380 and is all of the hard work.

llvm-svn: 154294
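
The dynamic TLS models locate a variable through a runtime call, while the exec models compute its address from the thread pointer plus an offset that is known at link time (local-exec) or loaded from the GOT (initial-exec). That is valid for the main executable but not for an arbitrary shared library, which is why a PIE bit on top of PIC unlocks the *Exec models. The sketch below is a simplified, hypothetical decision function illustrating that reasoning; it is not LLVM's actual selection logic.

```cpp
#include <cassert>

enum class TLSModel { GeneralDynamic, LocalDynamic, InitialExec, LocalExec };

// Simplified, hypothetical model selection.
//   IsPIC         - compiled with the PIC relocation model
//   IsPIE         - the PIC code will be linked into an executable
//   IsLocal       - the variable cannot be referenced from another module
//   IsDefinedHere - the variable is defined in this module
TLSModel chooseTLSModel(bool IsPIC, bool IsPIE, bool IsLocal,
                        bool IsDefinedHere) {
  if (IsPIC && !IsPIE) {
    // The code may end up in a shared library loaded at any time, so the
    // TLS block has to be found at run time.
    return IsLocal ? TLSModel::LocalDynamic : TLSModel::GeneralDynamic;
  }
  // Non-PIC code and PIE code both end up in the main executable.
  return IsDefinedHere ? TLSModel::LocalExec : TLSModel::InitialExec;
}
```
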

Move the TLSModel information into the TargetMachine rather than hiding · 16f0ebcb

Chandler Carruth authored Apr 08, 2012

in TargetLowering. There was already a FIXME about this location being
odd. The interface is simplified as a consequence. This will also make
it easier to change TLS models when compiling with PIE.

llvm-svn: 154292

Teach Clang about PIE compilations. This is the first step of PR12380. · c0c0455f

Chandler Carruth authored Apr 08, 2012

First, this patch cleans up the parsing of the PIC and PIE family of
options in the driver. The existing logic failed to claim arguments all
over the place resulting in kludges that marked the options as unused.
Instead actually walk all of the arguments and claim them properly.

We now treat -f{,no-}{pic,PIC,pie,PIE} as a single set, accepting the
last one on the command line. Previously there were lots of ordering bugs
that could creep in due to the nature of the parsing. Let me know if
folks would like weird things such as "-fPIE -fno-pic" to turn on PIE,
but disable full PIC. This doesn't make any sense to me, but we could in
theory support it.

Options that seem to have intentional "trump" status (-static, -mkernel,
etc) continue to do so and are commented as such.

Next, a -pie-level flag is threaded into the frontend, rigged to
a language option, and handled in the preprocessor, setting up the appropriate
defines. We'll now have the correct defines when compiling with -fpie.

The one place outside of the preprocessor that was inspecting the PIC
level (as opposed to the relocation model, which is set and handled
separately, yay!) is in the GNU ObjC runtime. I changed it to exactly
preserve existing behavior. If folks want to change its behavior in the
face of PIE, they can do that in a separate patch.

Essentially the only functionality changed here is the preprocessor
defines and bug-fixes to the argument management.

Tests have been updated and extended to test all of this a bit more
thoroughly.

llvm-svn: 154291
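
Treating `-f{,no-}{pic,PIC,pie,PIE}` as a single set where the last one wins can be modeled as one pass over the arguments that keeps overwriting a single state. The sketch below is hypothetical (`PICSettings` and `parsePICOptions` are illustrative, not Clang driver code). It assumes that `-fpie`/`-fPIE` also imply PIC level 1/2, that every `-fno-` form clears both, and it does not model options with "trump" status such as `-static` or `-mkernel`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical result of parsing the PIC/PIE family of flags.
struct PICSettings {
  unsigned PICLevel; // 0 = no PIC, 1 = -fpic/-fpie, 2 = -fPIC/-fPIE
  bool PIE;
};

// The last flag from the family wins; unrelated arguments are ignored.
PICSettings parsePICOptions(const std::vector<std::string> &Args) {
  PICSettings S{0, false};
  for (const std::string &A : Args) {
    if (A == "-fpic")
      S = {1, false};
    else if (A == "-fPIC")
      S = {2, false};
    else if (A == "-fpie")
      S = {1, true};
    else if (A == "-fPIE")
      S = {2, true};
    else if (A == "-fno-pic" || A == "-fno-PIC" || A == "-fno-pie" ||
             A == "-fno-PIE")
      S = {0, false};
  }
  return S;
}
```

Under this model, "-fPIE -fno-pic" simply turns everything off, which matches the behavior the commit message describes as the default.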

Rephrase the preprocessor test to directly use CC1 and not bother · 4e973379

Chandler Carruth authored Apr 08, 2012

testing any of the strange driver behavior. We already have some tiny
tests for the driver behavior, and I'm going to expand them greatly in
the next commit.

llvm-svn: 154290

FileCheck-ize this test. · f4725b46
Chandler Carruth authored Apr 08, 2012
```
llvm-svn: 154289
```

EngineBuilder::create is expected to take ownership of the TargetMachine... · 25a3d816

Benjamin Kramer authored Apr 08, 2012

EngineBuilder::create is expected to take ownership of the TargetMachine passed to it. Delete it on error or when we create an interpreter that doesn't need it.

llvm-svn: 154288
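
The contract here is that the builder owns the `TargetMachine` from the moment it is passed in, so every exit path must release it. The commit does this with explicit deletes; the sketch below expresses the same contract with `std::unique_ptr` instead, using hypothetical stand-in types (`TargetMachine`, `Engine`, and `createEngine` here are illustrative, not LLVM's ExecutionEngine API).

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Stand-in that counts live instances so leaks are observable.
struct TargetMachine {
  static int Live;
  TargetMachine() { ++Live; }
  ~TargetMachine() { --Live; }
};
int TargetMachine::Live = 0;

struct Engine {
  std::unique_ptr<TargetMachine> TM; // null for the interpreter
  bool IsInterpreter;
};

// Taking the TargetMachine by unique_ptr makes the ownership transfer
// explicit: on every path it is either moved into the engine or destroyed
// together with the parameter.
std::unique_ptr<Engine> createEngine(std::unique_ptr<TargetMachine> TM,
                                     bool WantJIT, bool JITAvailable) {
  if (WantJIT && JITAvailable)
    return std::unique_ptr<Engine>(new Engine{std::move(TM), false});
  if (!WantJIT) // The interpreter does not need a TargetMachine.
    return std::unique_ptr<Engine>(new Engine{nullptr, true});
  return nullptr; // Error path: no engine, and TM is freed automatically.
}
```
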

Remove an over zealous assert. The assert was trying to catch places · bed1abf9

Chandler Carruth authored Apr 08, 2012

where a chain outside of the loop block-set ended up in the worklist for
scheduling as part of the contiguous loop. However, asserting the first
block in the chain is in the loop-set isn't a valid check -- we may be
forced to drag a chain into the worklist due to one block in the chain
being part of the loop even though the first block is *not* in the loop.
This occurs when we have been forced to form a chain early due to
un-analyzable branches.

No test case here as I have no idea how to even begin reducing one, and
it will be hopelessly fragile. We have to somehow end up with a loop
header of an inner loop which is a successor of a basic block with an
unanalyzable pair of branch instructions. Ow. Self-host triggers it so
it is unlikely it will regress.

This at least gets block placement back to passing selfhost and the test
suite. There are still a lot of slowdowns that I don't like coming out of
block placement, although there are now also a lot of speedups. =[ I'm
seeing swings in both directions up to 10%. I'm going to try to find
time to dig into this and see if we can turn this on for 3.1 as it does
a really good job of cleaning up after some loops that degraded with the
inliner changes.

llvm-svn: 154287

Add a debug-only 'dump' method to the BlockChain structure to ease · 49158908

Chandler Carruth authored Apr 08, 2012

debugging.

llvm-svn: 154286

Teach InstCombine to nuke a common alloca pattern -- an alloca which has · f82b0e2d

Chandler Carruth authored Apr 08, 2012

GEPs, bit casts, and stores reaching it but no other instructions. These
often show up during the iterative processing of the inliner, SROA, and
DCE. Once we hit this point, we can completely remove the alloca. These
were actually showing up in the final, fully optimized code in a bunch
of inliner tests I've been working on, and notably they show up after
LLVM finishes optimizing away all function calls involved in
hash_combine(a, b).

llvm-svn: 154285
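
Such an alloca is removable because nothing ever reads the memory: every use is either an address computation (GEP, bitcast) whose own uses obey the same rule, or a store *into* the allocation. The sketch below checks that condition on a toy use graph; `Inst`, `Kind`, `addOperand`, and `isOnlyWrittenTo` are hypothetical, and this is not InstCombine's code.

```cpp
#include <cassert>
#include <vector>

enum class Kind { Alloca, GEP, BitCast, Store, Load, Call };

// Toy instruction: Operands are what it uses, Users are what uses it.
// For a Store, Operands[0] is the stored value and Operands[1] the address,
// following LLVM's operand order for store.
struct Inst {
  Kind K;
  std::vector<Inst *> Operands;
  std::vector<Inst *> Users;
};

void addOperand(Inst &User, Inst &Op) {
  User.Operands.push_back(&Op);
  Op.Users.push_back(&User);
}

// True if memory reachable through Ptr is only ever written, never read and
// never allowed to escape, so Ptr and its whole user tree could be deleted.
bool isOnlyWrittenTo(const Inst &Ptr) {
  for (const Inst *U : Ptr.Users) {
    switch (U->K) {
    case Kind::GEP:
    case Kind::BitCast:
      // Derived pointers must obey the same rule.
      if (!isOnlyWrittenTo(*U))
        return false;
      break;
    case Kind::Store:
      // Storing *through* Ptr is fine; storing Ptr itself lets it escape.
      if (U->Operands.size() != 2 || U->Operands[0] == &Ptr ||
          U->Operands[1] != &Ptr)
        return false;
      break;
    default:
      // Loads, calls, and anything else may observe the memory.
      return false;
    }
  }
  return true;
}
```
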

AVX2: Build splat vectors by broadcasting a scalar from the constant pool. · 82609df6

Nadav Rotem authored Apr 08, 2012

Previously we used three instructions to broadcast an immediate value into a
vector register.
On Sandybridge we continue to load the broadcasted value from the constant pool.

llvm-svn: 154284

Remove old 'grep' lines. · 8c783d41
Bill Wendling authored Apr 08, 2012
```
llvm-svn: 154283
```
Formatting changes. Don't put spaces in front of some code, which only makes it look 'off'. · ccf11090
Bill Wendling authored Apr 08, 2012
```
llvm-svn: 154282
```
FileCheckize these testcases. · 57f8e5eb
Bill Wendling authored Apr 08, 2012
```
llvm-svn: 154281
```

Remove the 'Parent' pointer from the MDNodeOperand class. · 5c0068f8

Bill Wendling authored Apr 08, 2012

An MDNode has a list of MDNodeOperands allocated directly after it as part of
its allocation. Therefore, the Parent of the MDNodeOperands can be found by
walking back through the operands to the beginning of that list. Mark the first
operand's value pointer as being the 'first' operand so that we know where the
beginning of said list is.

This saves a *lot* of space during LTO with -O0 -g flags.

llvm-svn: 154280
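
Because the operands live in the same allocation as the node, immediately after it, the owner can be recovered from any operand by walking backwards to the operand flagged as first and stepping over the node header. The sketch below shows that layout trick with hypothetical `Node`/`Operand` types; a plain `bool` stands in for the flag, whereas the commit stores it in spare bits of the operand's value pointer, so this version drops the `Parent` pointer but not the flag's space.

```cpp
#include <cassert>
#include <new>

// Hypothetical stand-ins for MDNodeOperand and MDNode.
struct Operand {
  int Value;
  bool IsFirst; // the commit keeps this flag in spare pointer bits instead
};

// alignas keeps sizeof(Node) a multiple of Operand's alignment, so the
// operand array can start right after the node header.
struct alignas(Operand) Node {
  unsigned NumOperands;

  Operand *operands() { return reinterpret_cast<Operand *>(this + 1); }

  // Allocate the header and N operands as one block of memory.
  static Node *create(unsigned N) {
    void *Mem = ::operator new(sizeof(Node) + N * sizeof(Operand));
    Node *Result = new (Mem) Node{N};
    for (unsigned I = 0; I != N; ++I)
      new (Result->operands() + I) Operand{static_cast<int>(I), I == 0};
    return Result;
  }

  static void destroy(Node *N) {
    N->~Node();
    ::operator delete(N);
  }
};

// Recover the owning node without a Parent pointer.
Node *getParent(Operand *Op) {
  while (!Op->IsFirst)
    --Op;
  return reinterpret_cast<Node *>(reinterpret_cast<char *>(Op) - sizeof(Node));
}
```
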

Allow subclasses of the ValueHandleBase to store information as part of the · 9b2503a0

Bill Wendling authored Apr 08, 2012

value pointer by making the value pointer into a pointer-int pair with 2 bits
available for flags.

llvm-svn: 154279

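
Pointers to objects aligned to at least 4 bytes always have their two low bits clear, so those bits can carry a small integer alongside the address. The sketch below is a minimal, hypothetical `TaggedPointer` in the spirit of LLVM's `PointerIntPair`; it is not that class's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Minimal pointer/int pair sketch.
template <typename T, unsigned IntBits>
class TaggedPointer {
  static_assert(alignof(T) >= (1u << IntBits),
                "T's alignment must leave IntBits low bits free");
  static constexpr std::uintptr_t IntMask =
      (static_cast<std::uintptr_t>(1) << IntBits) - 1;
  std::uintptr_t Bits = 0;

public:
  TaggedPointer() = default;
  TaggedPointer(T *Ptr, unsigned Int) {
    setPointer(Ptr);
    setInt(Int);
  }

  T *getPointer() const { return reinterpret_cast<T *>(Bits & ~IntMask); }
  unsigned getInt() const { return static_cast<unsigned>(Bits & IntMask); }

  void setPointer(T *Ptr) {
    std::uintptr_t PtrBits = reinterpret_cast<std::uintptr_t>(Ptr);
    assert((PtrBits & IntMask) == 0 && "pointer is not aligned enough");
    Bits = PtrBits | (Bits & IntMask); // keep the current flag bits
  }

  void setInt(unsigned Int) {
    assert(Int <= IntMask && "value does not fit in the spare bits");
    Bits = (Bits & ~IntMask) | Int;
  }
};
```

Usage: `TaggedPointer<int, 2> P(&X, 3);` stores the address of `X` and the value 3 in a single word.
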
Don't forget to evaluate the subexpression in a null pointer cast. If we're · 4051ff76

Richard Smith authored Apr 08, 2012

converting from std::nullptr_t, the subexpression might have side-effects.

llvm-svn: 154278

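
In C++, converting an expression of type `std::nullptr_t` to a pointer type always yields a null pointer, but the expression itself must still be evaluated. The small program below illustrates the language rule this fix respects; it is not Clang code, and `makeNull` is a hypothetical function with a visible side effect.

```cpp
#include <cassert>
#include <cstddef>

int SideEffects = 0;

// Returns std::nullptr_t, but also does observable work.
std::nullptr_t makeNull() {
  ++SideEffects;
  return nullptr;
}

// The result is known to be a null pointer, yet makeNull() must still run.
int *convertToPointer() { return makeNull(); }
```
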
[docs] Add more open projects. · d73a53f1
Michael J. Spencer authored Apr 08, 2012
```
llvm-svn: 154277
```
[docs] Add documentation todos. · 00d9e87c
Michael J. Spencer authored Apr 08, 2012
```
llvm-svn: 154276
```
[docs] Make the index page ReST based instead of html based. · d01c8fe7
Michael J. Spencer authored Apr 08, 2012
```
llvm-svn: 154275
```
[docs] Add open projects page that includes the TODO.txt files. · f9bc125c
Michael J. Spencer authored Apr 07, 2012
```
llvm-svn: 154274
```

ext_reserved_user_defined_literal must not default to Error in MicrosoftMode.... · 7ebc4c19

Francois Pichet authored Apr 07, 2012

ext_reserved_user_defined_literal must not default to Error in MicrosoftMode. Hence create ext_ms_reserved_user_defined_literal that doesn't default to Error; otherwise MSVC headers won't parse.

Fixes PR12383.

llvm-svn: 154273

Turn avx2 vinserti128 intrinsic calls into INSERT_SUBVECTOR DAG nodes and... · d024cef2

Craig Topper authored Apr 07, 2012

Turn avx2 vinserti128 intrinsic calls into INSERT_SUBVECTOR DAG nodes and remove patterns for selecting the intrinsic. The same was already done for avx1.

llvm-svn: 154272

MIPS: Pass -mabi option to the assembler when compiling for MIPS targets. · 571d7bde
Simon Atanasyan authored Apr 07, 2012
```
llvm-svn: 154270
```
MIPS: Move code that calculates CPU and ABI names to a separate function so it can be reused later. · 3b7589a6
Simon Atanasyan authored Apr 07, 2012
```
llvm-svn: 154269
```

Apr 07, 2012

Move vinsertf128 patterns near the instruction definitions. Add... · aa9aab5a

Craig Topper authored Apr 07, 2012

Move vinsertf128 patterns near the instruction definitions. Add AddedComplexity to AVX2 vextracti128 patterns to give them priority over the integer versions of vextractf128 patterns.

llvm-svn: 154268

Remove 'else' after 'if' that ends in return. · e09d1c5c
Craig Topper authored Apr 07, 2012
```
llvm-svn: 154267
```

1. Remove the part of r153848 which optimizes shuffle-of-shuffle into a new · 71d07ae5

Nadav Rotem authored Apr 07, 2012

   shuffle node because it could introduce new shuffle nodes that were not
   supported efficiently by the target.

2. Add a more restrictive shuffle-of-shuffle optimization for cases where the
   second shuffle reverses the transformation of the first shuffle.

llvm-svn: 154266
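
A shuffle-of-shuffle can be dropped entirely when the second shuffle reverses the first: lane `i` of the result reads lane `Second[i]` of the first shuffle, which itself reads source lane `First[Second[i]]`, so the pair is a no-op exactly when that composition is the identity. The sketch below checks this for single-source masks of equal width; the function names are hypothetical, and negative entries stand for undefined lanes, which this conservative version refuses to fold.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Apply a single-source shuffle: Result[i] = Src[Mask[i]].
// Every Mask entry must be a valid index into Src.
std::vector<int> applyShuffle(const std::vector<int> &Src,
                              const std::vector<int> &Mask) {
  std::vector<int> Result;
  Result.reserve(Mask.size());
  for (int Idx : Mask)
    Result.push_back(Src[static_cast<std::size_t>(Idx)]);
  return Result;
}

// True if shuffling by First and then by Second reproduces the original
// vector, i.e. First[Second[i]] == i for every lane.
bool secondUndoesFirst(const std::vector<int> &First,
                       const std::vector<int> &Second) {
  if (First.size() != Second.size())
    return false;
  for (std::size_t I = 0; I != Second.size(); ++I) {
    int Idx = Second[I];
    if (Idx < 0 || static_cast<std::size_t>(Idx) >= First.size())
      return false; // undefined or out-of-range lane: don't fold
    if (First[static_cast<std::size_t>(Idx)] != static_cast<int>(I))
      return false;
  }
  return true;
}
```
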

Convert floating point division by a constant into multiplication by the · 5f8397a9

Duncan Sands authored Apr 07, 2012

reciprocal if converting to the reciprocal is exact.  Do it even if inexact
if -ffast-math.  This substantially speeds up ac.f90 from the polyhedron
benchmarks.

llvm-svn: 154265
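
Dividing by `C` and multiplying by `1/C` give bit-identical results whenever `1/C` is exactly representable, because both operations then round the same real value; for binary floating point that means `C` is a power of two. The sketch below is a hypothetical check on `double` (`getExactReciprocal` is an illustrative name, not an LLVM API). As a conservative assumption, it also rejects reciprocals that would overflow or be subnormal.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: if 1/C is exactly representable as a normal double,
// store it in Recip and return true. Then X / C == X * Recip for every X.
bool getExactReciprocal(double C, double &Recip) {
  if (!std::isfinite(C) || C == 0.0)
    return false;
  int Exp;
  // C == Mantissa * 2^Exp with |Mantissa| in [0.5, 1).
  double Mantissa = std::frexp(C, &Exp);
  if (std::fabs(Mantissa) != 0.5)
    return false; // not a power of two, so 1/C is not exactly representable
  double R = 1.0 / C;
  if (!std::isnormal(R))
    return false; // reciprocal would overflow or be subnormal
  Recip = R;
  return true;
}
```
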


Perform partial SROA on the helper hashing structure. I really wish the · 75a1cf32

Chandler Carruth authored Apr 07, 2012

optimizers could do this for us, but expecting partial SROA of classes
with template methods through cloning is probably expecting too much
heroics. With this change, the begin/end pointer pairs which indicate
the status of each loop iteration are actually passed directly into each
layer of the combine_data calls, and the inliner has a chance to see
when most of the combine_data function could be deleted by inlining.
Similarly for 'length'.

We have to be careful to limit the places where in/out reference
parameters are used, as those will also prevent the inliner and optimizers
from properly propagating constants.

With this change, LLVM is able to fully inline and unroll the hash
computation of small sets of values, such as two or three pointers.
These now decompose into essentially straight-line code with no loops or
function calls.

There is still one code quality problem to be solved with the hashing --
LLVM is failing to nuke the alloca. It removes all loads from the
alloca, leaving only lifetime intrinsics and dead(!!) stores to the
alloca. =/ Very unfortunate.

llvm-svn: 154264
