Commits · ec96cd0690e7f28769fbc2bf7e2c2ba90e84178c · Roger Ferrer / llvm-epi-0.8

Apr 10, 2012
- Test case for PR12495. · ec96cd06
  Lang Hames authored Apr 09, 2012
```
llvm-svn: 154359
```
  ec96cd06
Apr 09, 2012

Pattern match a setcc of boolean value with 0 as a truncate. · 8f62b324
Rafael Espindola authored Apr 09, 2012
```
llvm-svn: 154322
```
8f62b324
Lower some x86 shuffle sequences to the vblend family of instructions. · fb7e2ae5
Nadav Rotem authored Apr 09, 2012
```
llvm-svn: 154313
```
fb7e2ae5
Fix a bug in the lowering of broadcasts: ConstantPools need to use the target pointer type. · b801ca39
Nadav Rotem authored Apr 09, 2012
```
Move NormalizeVectorShuffle and LowerVectorBroadcast into X86TargetLowering.

llvm-svn: 154310
```
b801ca39

Cleanup and relax a restriction on the matching of global offsets into · 3779ac10

Chandler Carruth authored Apr 09, 2012

x86 addressing modes. This allows PIE-based TLS offsets to fit directly
into an addressing mode immediate offset, which is the last remaining
code quality issue from PR12380. With this patch, that PR is completely
fixed.

To understand why this patch is correct to match these offsets into
addressing mode immediates, break it down by cases:
1) 32-bit is trivially correct, and unmodified here.
2) 64-bit non-small mode is unchanged and never matches.
3) 64-bit small PIC code which is RIP-relative is handled specially in
the match to try to fit RIP into the base register. If it fails, it
now early exits. This behavior is unchanged by the patch.
4) 64-bit small non-PIC code which is not RIP-relative continues to work
as it did before. These immediates are safe because the
ABI ensures they fit in small mode. This behavior is unchanged.
5) 64-bit small PIC code which is *not* using RIP-relative addressing.
This is the only case changed by the patch, and the primary place you
see it is in TLS, either the win64 section offset TLS or Linux
local-exec TLS model in a PIC compilation. Here the ABI again ensures
that the immediates fit because we are in small mode, and any other
operations required due to the PIC relocation model have been handled
externally to the Wrapper node (extra loads etc are made around the
wrapper node in ISelLowering).

I've tested this as much as I can comparing it with GCC's output, and
everything appears safe. I discussed this with Anton and it made sense
to him at least at face value. That said, if there are issues with PIC
code after this patch, yell and we can revert it.

llvm-svn: 154304

3779ac10

Fold 15 tiny test cases into a single file that implements the · 84b83426

Chandler Carruth authored Apr 09, 2012

comprehensive testing of TLS codegen for x86. Convert all of the ones
that were still using grep to use FileCheck. Remove some redundancies
between them.

Perhaps most interestingly, expand the test cases so that they actually
fully list the instruction snippet being tested. TLS operations are
*very* narrowly defined, and so these seem reasonably stable. More
importantly, the existing test cases already were crazy fine grained,
expecting specific registers to be allocated. This just clarifies that
no *other* instructions are expected, and fills in some crucial gaps
that weren't being tested at all.

This will make any subsequent changes to TLS much more clear during
review.

llvm-svn: 154303

84b83426

Apr 08, 2012

Only have codegen turn fdiv by a constant into fmul by the reciprocal · 2f1dc381

Duncan Sands authored Apr 08, 2012

when -ffast-math, i.e. don't just always do it if the reciprocal can
be formed exactly.  There is already an IR level transform that does
that, and it does it more carefully.

llvm-svn: 154296

2f1dc381

Teach LLVM about a PIE option which, when enabled on top of PIC, makes · ede4a8aa

Chandler Carruth authored Apr 08, 2012

optimizations which are valid for position independent code being linked
into a single executable, but not for such code being linked into
a shared library.

I discussed the design of this with Eric Christopher, and the decision
was to support an optional bit rather than a completely separate
relocation model. Fundamentally, this is still PIC relocation, it's just
that certain optimizations are only valid under a PIC relocation model
when the resulting code won't be in a shared library. The simplest path
to here is to expose a single bit option in the TargetOptions. If folks
have different/better designs, I'm all ears. =]

I've included the first optimization based upon this: changing TLS
models to the *Exec models when PIE is enabled. This is the LLVM
component of PR12380 and is all of the hard work.
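
A minimal sketch of the TLS model choice described above, assuming the usual four ELF TLS models. The function name and the `is_local` flag (meaning "the symbol is known to be defined in this module") are hypothetical simplifications, not LLVM's actual interface:

```python
def select_tls_model(is_pic: bool, is_pie: bool, is_local: bool) -> str:
    """Pick a TLS access model (illustrative only, not LLVM's code)."""
    if is_pic and not is_pie:
        # The code may end up in a shared library, so only the
        # dynamic models are safe.
        return "local-dynamic" if is_local else "general-dynamic"
    # Non-PIC code, or PIC with the PIE bit set: the code is linked into
    # a single executable, so the *Exec models can be used.
    return "local-exec" if is_local else "initial-exec"
```

With the PIE bit set on top of PIC, `select_tls_model(True, True, True)` yields `"local-exec"` where plain PIC would yield `"local-dynamic"`.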

llvm-svn: 154294

ede4a8aa

AVX2: Build splat vectors by broadcasting a scalar from the constant pool. · 82609df6

Nadav Rotem authored Apr 08, 2012

Previously we used three instructions to broadcast an immediate value into a
vector register.
On Sandybridge we continue to load the broadcasted value from the constant pool.

llvm-svn: 154284

82609df6

Apr 07, 2012

1. Remove the part of r153848 which optimizes shuffle-of-shuffle into a new · 71d07ae5

Nadav Rotem authored Apr 07, 2012

   shuffle node because it could introduce new shuffle nodes that were not
   supported efficiently by the target.

2. Add a more restrictive shuffle-of-shuffle optimization for cases where the
   second shuffle reverses the transformation of the first shuffle.
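
The "second shuffle reverses the first" condition in item 2 can be sketched on plain lists. This is an illustrative model of single-source shuffle masks, with hypothetical helper names; it is not the DAG combiner code:

```python
def shuffle(vec, mask):
    """Single-source shuffle: result[i] = vec[mask[i]]."""
    return [vec[m] for m in mask]


def second_undoes_first(first_mask, second_mask):
    """True when applying first_mask and then second_mask is the identity."""
    # shuffle(shuffle(v, first), second)[i] == v[first[second[i]]]
    composed = [first_mask[m] for m in second_mask]
    return composed == list(range(len(first_mask)))
```

For example, `[2, 0, 3, 1]` followed by `[1, 3, 0, 2]` composes to the identity mask, so the pair of shuffles can be dropped.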

llvm-svn: 154266

71d07ae5

Convert floating point division by a constant into multiplication by the · 5f8397a9

Duncan Sands authored Apr 07, 2012

reciprocal if converting to the reciprocal is exact.  Do it even if inexact
if -ffast-math.  This substantially speeds up ac.f90 from the polyhedron
benchmarks.
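
As a rough illustration of the "exact reciprocal" condition, the following sketch checks exactness with rational arithmetic on IEEE doubles. The function names are hypothetical, and the real transform may apply further conditions that are not modeled here:

```python
import math
from fractions import Fraction


def reciprocal_is_exact(c: float) -> bool:
    """Return True when 1/c is exactly representable as a double."""
    if c == 0.0 or not math.isfinite(c):
        return False
    r = 1.0 / c
    if not math.isfinite(r):
        return False
    # Fraction(float) converts exactly, so this tests r * c == 1
    # without any rounding.
    return Fraction(r) * Fraction(c) == 1


def fdiv_by_constant(x: float, c: float, fast_math: bool = False) -> float:
    """Compute x / c (c nonzero), using a multiply only when allowed."""
    if fast_math or reciprocal_is_exact(c):
        return x * (1.0 / c)
    return x / c
```

In this model the reciprocal is exact for powers of two such as 4.0 or 0.5, and inexact for constants such as 3.0 or 0.1.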

llvm-svn: 154265

5f8397a9

Make the test for r154235 more platform-independent with a shorter · 78fce432
Alexis Hunt authored Apr 07, 2012
```
string.

llvm-svn: 154243
```
78fce432

Output UTF-8-encoded characters as identifier characters into assembly · 0235f684

Alexis Hunt authored Apr 07, 2012

by default.

This is a behaviour configurable in the MCAsmInfo. I've decided to turn
it on by default in (possibly optimistic) hopes that most assemblers are
reasonably sane. If this proves a problem, switching the default seems
reasonable.

I'm not sure if this is the opportune place to test, but it seemed good
to make sure it was tested somewhere.

llvm-svn: 154235

0235f684

Apr 06, 2012

Test case for PR12413 · bdc9f071
Craig Topper authored Apr 06, 2012
```
llvm-svn: 154172
```
bdc9f071

Allow 256-bit shuffles to be split if a 128-bit lane contains elements from a... · 447417c9

Craig Topper authored Apr 06, 2012

Allow 256-bit shuffles to be split if a 128-bit lane contains elements from a single source. This is a rewrite of the 256-bit shuffle splitting code based on similar code from legalize types. Fixes PR12413.

llvm-svn: 154166

447417c9

Apr 05, 2012

Don't break the IV update in TLI::SimplifySetCC(). · 37492eac

Jakob Stoklund Olesen authored Apr 05, 2012

LSR always tries to make the ICmp in the loop latch use the incremented
induction variable. This allows the induction variable to be kept in a
single register.

When the induction variable limit is equal to the stride,
SimplifySetCC() would break LSR's hard work by transforming:

   (icmp (add iv, stride), stride) --> (cmp iv, 0)

This forced us to use lea for the IV update, preventing the simpler
incl+cmp.
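
The two compares give the same answer; the issue is only which value they read. A small sketch, assuming 32-bit wraparound arithmetic and hypothetical function names:

```python
MASK32 = (1 << 32) - 1


def latch_cmp_incremented(iv: int, stride: int, limit: int) -> bool:
    """Compare the incremented IV against the limit (the form LSR prefers)."""
    return ((iv + stride) & MASK32) == limit


def latch_cmp_simplified(iv: int) -> bool:
    """The form SimplifySetCC() produced when limit == stride."""
    return iv == 0
```

When `limit == stride`, both functions agree for every 32-bit `iv`, but the second one reads the pre-increment value, so the old and new IV values are both live at the compare.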

<rdar://problem/7643606>
<rdar://problem/11184260>

llvm-svn: 154119

37492eac

Apr 03, 2012
- Add an additional testcase which checks ops with multiple users. · 269703f9
  Nadav Rotem authored Apr 03, 2012
```
llvm-svn: 153939
```
  269703f9
- Allocate virtual registers in ascending order. · 291007b0
  Jakob Stoklund Olesen authored Apr 02, 2012
```
This is just the fallback tie-breaker ordering, the main allocation
order is still descending size.

Patch by Shamil Kurmangaleev!

llvm-svn: 153904
```
  291007b0
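
The ordering described in the commit above can be pictured as a two-part sort key. This is only a sketch: the `(vreg_number, size)` tuples and the function name are hypothetical, and the real allocator does not work on a plain sorted list:

```python
def allocation_order(vregs):
    """Order (vreg_number, size) pairs for allocation.

    Main key: descending size. Tie-breaker: ascending register number.
    """
    return sorted(vregs, key=lambda v: (-v[1], v[0]))
```

Two registers of equal size are therefore visited lowest number first, while a larger one still goes ahead of both.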
Apr 02, 2012

Optimizing swizzles of complex shuffles may generate additional complex shuffles. · 702f0807

Nadav Rotem authored Apr 02, 2012

Do not try to optimize swizzles of shuffles if the source shuffle has more than
a single user, except when the source shuffle is also a swizzle.

llvm-svn: 153864

702f0807

Apr 01, 2012

This commit contains a few changes that had to go in together. · b0783508

Nadav Rotem authored Apr 01, 2012

1. Simplify xor/and/or (bitcast(A), bitcast(B)) -> bitcast(op (A,B))
   (and also scalar_to_vector).

2. Xor/and/or are indifferent to the swizzle operation (shuffle of one src).
   Simplify xor/and/or (shuff(A), shuff(B)) -> shuff(op (A, B))

3. Optimize swizzles of shuffles:  shuff(shuff(x, y), undef) -> shuff(x, y).

4. Fix an X86ISelLowering optimization which was very bitcast-sensitive.

Code which was previously compiled to this:

    movd    (%rsi), %xmm0
    movdqa  .LCPI0_0(%rip), %xmm2
    pshufb  %xmm2, %xmm0
    movd    (%rdi), %xmm1
    pshufb  %xmm2, %xmm1
    pxor    %xmm0, %xmm1
    pshufb  .LCPI0_1(%rip), %xmm1
    movd    %xmm1, (%rdi)
    ret

Now compiles to this:

    movl    (%rsi), %eax
    xorl    %eax, (%rdi)
    ret
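
Items 2 and 3 of the list above can be checked on plain lists. This is an illustrative model with hypothetical helper names, in which a two-source mask indexes the concatenation of `x` and `y`; it is not the combiner implementation:

```python
def swizzle(vec, mask):
    """Shuffle of one source: result[i] = vec[mask[i]]."""
    return [vec[m] for m in mask]


def shuffle2(x, y, mask):
    """Shuffle of two sources: mask indexes the concatenation x + y."""
    src = x + y
    return [src[m] for m in mask]


def vxor(a, b):
    """Element-wise xor of two equal-length vectors."""
    return [p ^ q for p, q in zip(a, b)]


def fold_swizzle_of_shuffle(inner_mask, swizzle_mask):
    """Mask for shuff(shuff(x, y), undef) rewritten as one shuff(x, y)."""
    return [inner_mask[m] for m in swizzle_mask]
```

Because xor works element by element, applying the same swizzle to both operands and then xoring gives the same result as xoring first and swizzling once.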

llvm-svn: 153848

b0783508

Mar 31, 2012

Add a triple to the test. · 77242fa7
Rafael Espindola authored Mar 31, 2012
```
llvm-svn: 153818
```
77242fa7

Teach CodeGen's version of computeMaskedBits to understand the range metadata. · 80c540e6

Rafael Espindola authored Mar 31, 2012

This is the CodeGen equivalent of r153747. I tested that there is no noticeable
performance difference with any combination of -O0/-O2/-g when compiling
gcc as a single compilation unit.
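
To illustrate what range information gives a known-bits analysis, here is a minimal sketch for a single non-wrapping unsigned range `[lo, hi)`. The function name is hypothetical, and real range metadata can be more general than this model:

```python
def known_zero_from_range(lo: int, hi: int, bit_width: int) -> int:
    """Mask of bits known to be zero for a value in the range [lo, hi)."""
    if not (0 <= lo < hi <= (1 << bit_width)):
        raise ValueError("expected a non-wrapping range within bit_width bits")
    # Every value in the range is <= hi - 1, so no bit above the top
    # bit of hi - 1 can be set.
    used_bits = (hi - 1).bit_length()
    all_ones = (1 << bit_width) - 1
    low_ones = (1 << used_bits) - 1
    return all_ones & ~low_ones
```

For a load annotated with the range `[0, 2)`, every bit except bit 0 is known to be zero.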

llvm-svn: 153817

80c540e6

Mar 30, 2012
- Testcase for r153710. · afe7ec70
  Bill Wendling authored Mar 30, 2012
```
llvm-svn: 153711
```
  afe7ec70
Mar 29, 2012

The shuffle scheduler is only available in asserts build - make misched-new.ll · dd1211b4
Lang Hames authored Mar 29, 2012
```
testcase require asserts.

llvm-svn: 153687
```
dd1211b4
Make x86 REP_MOV* and REP_STO instructions use the correct operand sizes in 64-bit mode. · 5569ce7d
Lang Hames authored Mar 29, 2012
```
llvm-svn: 153680
```
5569ce7d

For X86, change load/dec-or-inc/store into dec-or-inc, respectively. · 68d59e8a

Joel Jones authored Mar 29, 2012

This is a code change to add support for changing instruction sequences of the form:

  load
  inc/dec of 8/16/32/64 bits
  store

into the appropriate X86 inc/dec through memory instruction:

  inc[qlwb] / dec[qlwb]

The checks that were in X86DAGToDAGISel::Select(SDNode *Node)>>ISD::STORE have been extracted to isLoadIncOrDecStore and reworked to use the better
named wrappers for getOperand(unsigned) (e.g. getOffset()) and replaced Chain.getNode() with LoadNode.  The comments have also been expanded.
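
The load/inc-or-dec/store fold can be pictured as a peephole over a toy instruction list. This is only a sketch: the tuple encoding and function name are hypothetical, and the real change works on SelectionDAG nodes with its own chain and use checks:

```python
def fold_load_incdec_store(instrs):
    """Fold ("load", r, addr), ("inc"|"dec", r), ("store", r, addr)
    into a single ("inc_mem"|"dec_mem", addr) entry."""
    out = []
    i = 0
    while i < len(instrs):
        w = instrs[i:i + 3]
        if (len(w) == 3
                and w[0][0] == "load"
                and w[1][0] in ("inc", "dec")
                and w[2][0] == "store"
                and w[0][1] == w[1][1] == w[2][1]   # same register
                and w[0][2] == w[2][2]              # same address
                # Conservative: the register must not be mentioned later.
                and not any(w[0][1] in ins[1:] for ins in instrs[i + 3:])):
            out.append((w[1][0] + "_mem", w[0][2]))
            i += 3
        else:
            out.append(instrs[i])
            i += 1
    return out
```

A sequence that loads from one address and stores to another is left untouched, since it is not a read-modify-write of a single location.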

llvm-svn: 153635

68d59e8a

Reverted to revision 153616 to unblock build · b474099e
Joel Jones authored Mar 29, 2012
```
llvm-svn: 153623
```
b474099e

For X86, change load/dec-or-inc/store into dec-or-inc, respectively. · b88c81fe

Joel Jones authored Mar 29, 2012

This is a code change to add support for changing instruction sequences of the form:

  load
  inc/dec of 8/16/32/64 bits
  store

into the appropriate X86 inc/dec through memory instruction:

  inc[qlwb] / dec[qlwb]

The checks that were in X86DAGToDAGISel::Select(SDNode *Node)>>ISD::STORE have been extracted to isLoadIncOrDecStore and reworked to use the better
named wrappers for getOperand(unsigned) (e.g. getOffset()) and replaced Chain.getNode() with LoadNode.  The comments have also been expanded.

llvm-svn: 153617

b88c81fe

Mar 27, 2012
- Add a test for the previous commit. Also, remove two tests that were · d8abaf3f
  Eric Christopher authored Mar 27, 2012
```
testing a) the wrong behavior or b) something that I'm already testing
in the new test.

llvm-svn: 153525
```
  d8abaf3f
- Post-ra LICM should take care not to hoist an instruction that would clobber a · 7fede873
  Evan Cheng authored Mar 27, 2012
```
register that's read by the preheader terminator.

rdar://11095580

llvm-svn: 153492
```
  7fede873
Mar 25, 2012

Continue cleanup of LIT, getting rid of the remaining artifacts from dejagnu · f3308605

Eli Bendersky authored Mar 25, 2012

* Removed test/lib/llvm.exp - it is no longer needed 
* Deleted the dg.exp reading code from test/lit.cfg. There are no dg.exp files
  left in the test suite so this code is no longer required. test/lit.cfg is
  now much shorter and clearer 
* Removed a lot of duplicate code in lit.local.cfg files that need access to
  the root configuration, by adding a "root" attribute to the TestingConfig
  object. This attribute is dynamically computed to provide the same
  information as was previously provided by the custom getRoot functions. 
* Documented the config.root attribute in docs/CommandGuide/lit.pod
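
The dynamically computed "root" attribute can be sketched as a property that walks a parent chain. The class below is a simplified stand-in with hypothetical names, not lit's actual TestingConfig:

```python
class ConfigSketch:
    """Toy configuration object with a parent link (illustrative only)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    @property
    def root(self):
        """Walk up the parent chain and return the top-level config."""
        cfg = self
        while cfg.parent is not None:
            cfg = cfg.parent
        return cfg
```

A local configuration can then read shared settings through `config.root` instead of defining its own lookup helper.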

llvm-svn: 153408

f3308605

Mar 22, 2012
- Remove -enable-lsr-nested in time for 3.1. · d97b83e3
  Andrew Trick authored Mar 22, 2012
```
Test cases have been removed but attached to open PR12330.

llvm-svn: 153286
```
  d97b83e3
Mar 21, 2012
- misched: tag a few XFAILs that I plan to fix · 267b57de
  Andrew Trick authored Mar 21, 2012
```
llvm-svn: 153222
```
  267b57de
Mar 20, 2012

[avx] Add patterns for combining vextractf128 + vmovaps/vmovups/vmovdqu to · 41069173

Chad Rosier authored Mar 20, 2012

vextractf128 with 128-bit mem dest.

Combines

	vextractf128 $0, %ymm0, %xmm0
	vmovaps %xmm0, (%rdi)

to

    vextractf128 $0, %ymm0, (%rdi)

rdar://11082570

llvm-svn: 153139

41069173

[avx] Move the vextractf128 patterns closer to the vextractf128 def. Remove · 5a601126
Chad Rosier authored Mar 20, 2012
```
whitespace from test case.  No functional change intended.

llvm-svn: 153103
```
5a601126
Fix test. · 58a7c9fd
Chad Rosier authored Mar 20, 2012
```
llvm-svn: 153095
```
58a7c9fd

[avx] Adjust the VINSERTF128rm pattern to allow for unaligned loads. · 07a4cb93

Chad Rosier authored Mar 20, 2012

This allows sequences such as

    vmovups 16(%rdi), %xmm0
    vinsertf128 $1, %xmm0, %ymm0, %ymm0

to be combined into

    vinsertf128 $1, 16(%rdi), %ymm0, %ymm0

rdar://11076953

llvm-svn: 153092

07a4cb93

It's possible to have a constant expression whose size is quite big (e.g., · 7315c4b9

Bill Wendling authored Mar 20, 2012

i128). In that case, we may not be able to print out the MCExpr as an
expression. For instance, we could have an MCExpr like this:

    0xBEEF0000BEEF0000 | (0xBEEF0000BEEF0000 << 64)

The MCExpr printer handles sizes up to 64 bits, but this expression would
require 128 bits. In this situation, try to evaluate the constant expression and
emit its value in 64-bit chunks.
<rdar://problem/11070338>
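
Splitting an evaluated wide constant into 64-bit pieces can be sketched as follows. The function name is hypothetical, and emitting the low chunk first is an assumption; the actual order depends on the target:

```python
def split_into_64bit_chunks(value: int, bit_width: int):
    """Split an unsigned constant into 64-bit chunks, low chunk first."""
    if bit_width <= 0 or bit_width % 64 != 0:
        raise ValueError("bit_width must be a positive multiple of 64")
    mask = (1 << 64) - 1
    return [(value >> shift) & mask for shift in range(0, bit_width, 64)]
```

For the i128 expression above, this yields two chunks, each equal to 0xBEEF0000BEEF0000.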

llvm-svn: 153081

7315c4b9

Mar 19, 2012

This patch adds X86 instruction itineraries for non-pseudo opcodes in · 48ccc4df

Preston Gurd authored Mar 19, 2012

X86InstrCompiler.td.
 
It also adds -mcpu=generic to the legalize-shift-64.ll test so the test
will pass if run on an Intel Atom CPU, which would otherwise
produce an instruction schedule which differs from that which the test expects.

llvm-svn: 153033

48ccc4df

Mar 15, 2012

When optimizing certain BUILD_VECTOR nodes into other BUILD_VECTOR nodes, add... · 6fd1d32c

Nadav Rotem authored Mar 15, 2012

When optimizing certain BUILD_VECTOR nodes into other BUILD_VECTOR nodes, add the new node into the work list because there is a potential for further optimizations.

llvm-svn: 152784

6fd1d32c