Commits · 9cce299cc8d504b5d2fff5e24f72ea5ef751b5b1 · Roger Ferrer / llvm-epi-0.8

Apr 29, 2009

spillPhysRegAroundRegDefsUses() may have invalidated iterators stored in... · 9cce299c
Evan Cheng authored Apr 29, 2009
```
spillPhysRegAroundRegDefsUses() may have invalidated iterators stored in fixed_ IntervalPtrs. Reset them.

llvm-svn: 70378
```
9cce299c

Bill Wendling authored Apr 29, 2009

Massive check in. This changes the "-fast" flag to "-O#" in llc. If you want to
use the old behavior, the flag is -O0. This change allows for finer-grained
control over which optimizations are run at different -O levels.

Most of this work was pretty mechanical. The majority of the fixes came from
verifying that a "fast" variable wasn't used anymore. The JIT still uses a
"Fast" flag. I'll change the JIT with a follow-up patch.

llvm-svn: 70343

084669a1

Apr 28, 2009
- Properly print 'P' modifier on inline asm memory operands. · dac88bae
  Anton Korobeynikov authored Apr 28, 2009
  
  This should fix PR3379 and PR4064. Patch inspired by Edwin Török! llvm-svn: 70328
  dac88bae
- Fix PR4034. Bug in LiveInterval::join when it's compacting new valno's. · 7e09994c
  Evan Cheng authored Apr 28, 2009
  
  llvm-svn: 70291
  7e09994c
- Fix for PR4051. When the 2address pass deletes an instruction, update kill info when necessary. · c3884be9
  Evan Cheng authored Apr 28, 2009
  
  llvm-svn: 70279
  c3884be9
- r70270 isn't ready yet. Back this out. Sorry for the noise. · 56f2987a
  Bill Wendling authored Apr 28, 2009
  
  llvm-svn: 70275
  56f2987a
- Massive check in. This changes the "-fast" flag to "-O#" in llc. If you want to · d0ae1594
  Bill Wendling authored Apr 28, 2009
  
  use the old behavior, the flag is -O0. This change allows for finer-grained control over which optimizations are run at different -O levels. Most of this work was pretty mechanical. The majority of the fixes came from verifying that a "fast" variable wasn't used anymore. The JIT still uses a "Fast" flag. I'm not 100% sure if it's necessary to change it there... llvm-svn: 70270
  d0ae1594
Apr 27, 2009

Fix PR4076. Correctly create live interval of physical register with two-address update. · 093e4c57
Evan Cheng authored Apr 27, 2009
```
llvm-svn: 70245
```
093e4c57
Permit ChangeCompareStride to rewrite a comparison when the factor · e99f9826
Dan Gohman authored Apr 27, 2009
```
between the comparison's iv stride and the candidate stride is
exactly -1.

llvm-svn: 70244
```
e99f9826

Teach getZeroExtendExpr and getSignExtendExpr to use trip-count · 76466373

Dan Gohman authored Apr 27, 2009

information to simplify [sz]ext({a,+,b}) to {zext(a),+,[zs]ext(b)},
as appropriate.

These functions and the trip count code each call into the other, so
this requires careful handling to avoid infinite recursion. During
the initial trip count computation, conservative SCEVs are used,
which are subsequently discarded once the trip count is actually
known.

Among other benefits, this change lets LSR automatically eliminate
some unnecessary zext-inreg and sext-inreg operations where the
operand is an induction variable.

llvm-svn: 70241

76466373
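
The role of the trip count in this simplification can be illustrated with a small standalone sketch. It is not ScalarEvolution code: `narrowAddRec`, `wideAddRec`, and `zextDistributes` are hypothetical names, the types are fixed at 8 and 32 bits for brevity, and the check is done by brute force rather than by the symbolic reasoning the commit describes. The point it shows is that zext({a,+,b}) matches {zext(a),+,zext(b)} only while the narrow recurrence does not wrap within the loop's iterations.

```cpp
#include <cstdint>

// Value of the 8-bit add recurrence {a,+,b} on iteration i (wraps modulo 256).
std::uint8_t narrowAddRec(std::uint8_t a, std::uint8_t b, std::uint32_t i) {
  return static_cast<std::uint8_t>(a + b * i);
}

// Value of the widened recurrence {zext(a),+,zext(b)} on iteration i.
std::uint32_t wideAddRec(std::uint8_t a, std::uint8_t b, std::uint32_t i) {
  return std::uint32_t{a} + std::uint32_t{b} * i;
}

// True when zext({a,+,b}) equals {zext(a),+,zext(b)} on every iteration
// 0 .. tripCount-1, i.e. the narrow recurrence never wraps inside the loop.
bool zextDistributes(std::uint8_t a, std::uint8_t b, std::uint32_t tripCount) {
  for (std::uint32_t i = 0; i < tripCount; ++i)
    if (std::uint32_t{narrowAddRec(a, b, i)} != wideAddRec(a, b, i))
      return false;
  return true;
}
```

For example, `{250,+,2}` can be widened for a trip count of 3 (values 250, 252, 254) but not for a trip count of 4, where the 8-bit value wraps to 0.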

2nd attempt, fixing SSE4.1 issues and implementing feedback from duncan. · 8d6d4b92

Nate Begeman authored Apr 27, 2009

PR2957

ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask.  A value of -1 represents UNDEF.

In addition to eliminating the creation of illegal BUILD_VECTORS just to 
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.

llvm-svn: 70225

8d6d4b92
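
The mask representation described in this commit can be pictured with a small standalone sketch. It is not LLVM's implementation: `kUndef`, `isValidMask`, and `canonicalizeSingleInput` are hypothetical names, and the canonicalization shown (folding second-input indices onto the first input when both inputs are the same vector) is only one example of the kind of normalization a plain integer mask makes easy.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the commit's convention: -1 marks an UNDEF lane.
constexpr int kUndef = -1;

// A mask over two N-element inputs is valid when every entry is either
// kUndef or an index into the concatenation of both inputs (0 .. 2N-1).
bool isValidMask(const std::vector<int> &mask) {
  const int n = static_cast<int>(mask.size());
  for (int m : mask)
    if (m != kUndef && (m < 0 || m >= 2 * n))
      return false;
  return true;
}

// If both shuffle inputs are the same vector, indices N..2N-1 select the same
// elements as 0..N-1, so fold them down. UNDEF lanes are left untouched.
std::vector<int> canonicalizeSingleInput(std::vector<int> mask) {
  assert(isValidMask(mask) && "mask entries must be -1 or in [0, 2N)");
  const int n = static_cast<int>(mask.size());
  for (int &m : mask)
    if (m != kUndef && m >= n)
      m -= n;
  return mask;
}
```

With a four-lane mask `{4, 1, kUndef, 7}` and identical inputs, the sketch yields `{0, 1, kUndef, 3}`.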

Fix PR4056. It's possible a physical register def is dead if its implicit use... · 0f85bd36
Evan Cheng authored Apr 27, 2009
```
Fix PR4056. It's possible a physical register def is dead if its implicit use is deleted by two-address pass.

llvm-svn: 70213
```
0f85bd36
Fix the syntax for a PR number in a test. · 3266391e
Dan Gohman authored Apr 27, 2009
```
llvm-svn: 70208
```
3266391e

When transforming sext(trunc(load(x))) into sext(smaller load(x)), · be36f5cc

Dan Gohman authored Apr 27, 2009

the trunc is directly replaced with the smaller load, so don't
try to create a new sext node. This fixes PR4050.

llvm-svn: 70179

be36f5cc

Apr 25, 2009

Do not share a single unknown val# for all the live ranges merged into a... · 362acf8a

Evan Cheng authored Apr 25, 2009

Do not share a single unknown val# for all the live ranges merged into a physical sub-register live interval. When the coalescer is merging a clobbered virtual register live interval into a physical register live interval, give each virtual register val# a separate val# in the physical register live interval. Otherwise, the coalescer would lose track of the definition information it needs to make correct coalescing decisions.

llvm-svn: 70026

362acf8a

Apr 24, 2009

Fix PR 4004 by including the call to __tls_get_addr in X86tlsaddr. This is not · c1396a23
Rafael Espindola authored Apr 24, 2009
```
very elegant, but neither is the tls specification :-(

llvm-svn: 69968
```
c1396a23
Revert 69952. Causes testsuite failures on linux x86-64. · b93db668
Rafael Espindola authored Apr 24, 2009
```
llvm-svn: 69967
```
b93db668

PR2957 · bb881d66

Nate Begeman authored Apr 24, 2009

ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask. A value of -1 represents UNDEF.

In addition to eliminating the creation of illegal BUILD_VECTORS just to
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.

A clean up of x86 shuffle code, and some canonicalizing in DAGCombiner is next.

llvm-svn: 69952

bb881d66

Apr 23, 2009
- Explicitly pass -tailcallopt=false to these tests so that they · 723f175b
  Dan Gohman authored Apr 23, 2009
  
  work as intended no matter what the default setting of that option is. llvm-svn: 69911
  723f175b
Apr 22, 2009

It has finally happened. Spiller is now using live interval info. · 1a99a5f5

Evan Cheng authored Apr 21, 2009

This fixes a very subtle bug. A vr defined by an implicit_def is allowed to overlap with any register since it doesn't actually modify anything. However, if it's used as a two-address use, its live range can be extended and it can be spilled. The spiller must take care not to emit a reload for the vn number that's defined by the implicit_def. This is both a correctness and performance issue.

llvm-svn: 69743

1a99a5f5

Apr 20, 2009

Added a linearscan register allocation optimization. When the register... · d67efaa8

Evan Cheng authored Apr 20, 2009

Added a linearscan register allocation optimization. When the register allocator spills an interval with multiple uses in the same basic block, it creates a different virtual register for each of the reloads. e.g.

        %reg1498<def> = MOV32rm %reg1024, 1, %reg0, 12, %reg0, Mem:LD(4,4) [sunkaddr39 + 0]
        %reg1506<def> = MOV32rm %reg1024, 1, %reg0, 8, %reg0, Mem:LD(4,4) [sunkaddr42 + 0]
        %reg1486<def> = MOV32rr %reg1506
        %reg1486<def> = XOR32rr %reg1486, %reg1498, %EFLAGS<imp-def,dead>
        %reg1510<def> = MOV32rm %reg1024, 1, %reg0, 4, %reg0, Mem:LD(4,4) [sunkaddr45 + 0]

=>

        %reg1498<def> = MOV32rm %reg2036, 1, %reg0, 12, %reg0, Mem:LD(4,4) [sunkaddr39 + 0]
        %reg1506<def> = MOV32rm %reg2037, 1, %reg0, 8, %reg0, Mem:LD(4,4) [sunkaddr42 + 0]
        %reg1486<def> = MOV32rr %reg1506
        %reg1486<def> = XOR32rr %reg1486, %reg1498, %EFLAGS<imp-def,dead>
        %reg1510<def> = MOV32rm %reg2038, 1, %reg0, 4, %reg0, Mem:LD(4,4) [sunkaddr45 + 0]

From linearscan's point of view, reg2036, reg2037, and reg2038 are separate registers, and each is "killed" after a single use. The reloaded register is available and is often clobbered right away, e.g. in this case reg1498 is allocated EAX while reg2036 is allocated RAX. This means we end up with multiple reloads from the same stack slot in the same basic block.

Now linearscan recognizes that there are other reloads from the same SS in the same BB, so it will "downgrade" RAX (and its aliases) after reg2036 is allocated until the next reload (reg2037) is done. This greatly increases the likelihood that reloads from the SS are reused.

This speeds up sha1 from OpenSSL by 5.8%. It is also an across the board win for SPEC2000 and 2006.

llvm-svn: 69585

d67efaa8

Apr 18, 2009

Adjust XFAIL syntax, maybe that will help. The other · 2f6263fe
Dale Johannesen authored Apr 18, 2009
```
way worked for me...

llvm-svn: 69414
```
2f6263fe

patch 69408 breaks this by removing the opportunity · e34fb6b5

Dale Johannesen authored Apr 18, 2009

for the optimization it's testing to kick in (although
it improves the code, getting rid of all spills).
I don't understand the optimization well enough to
rescue the test, so XFAILing.

llvm-svn: 69409

e34fb6b5

Apr 17, 2009

For general dynamic TLS access we must use · 355fe12c

Rafael Espindola authored Apr 17, 2009

leaq	foo@TLSGD(%rip), %rdi

as part of the instruction sequence. Using a register other than %rdi and then
copying it to %rdi is not valid.

llvm-svn: 69350

355fe12c

Teach spiller to unfold instructions which modref spill slot when a scratch · b96a1082

Evan Cheng authored Apr 17, 2009

register is available and when it's profitable.

e.g.
     xorq  %r12<kill>, %r13
     addq  %rax, -184(%rbp)
     addq  %r13, -184(%rbp)
==>
     xorq  %r12<kill>, %r13
     movq  -184(%rbp), %r12
     addq  %rax, %r12
     addq  %r13, %r12
     movq  %r12, -184(%rbp)

Two more instructions, but fewer memory accesses. It can also open up
opportunities for more optimizations.

llvm-svn: 69341

b96a1082
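
The trade-off in this commit (more instructions, fewer memory accesses) can be made concrete with a toy model. This is only a sketch under simplifying assumptions: `SpillSlot`, `addFolded`, and `addUnfolded` are hypothetical names, and each read-modify-write instruction is counted as one load plus one store of the slot.

```cpp
#include <cstdint>

// A single spill slot that counts how often it is touched.
struct SpillSlot {
  std::int64_t value = 0;
  int accesses = 0;
  std::int64_t load() { ++accesses; return value; }
  void store(std::int64_t v) { ++accesses; value = v; }
};

// Folded form: each `addq reg, slot` is a read-modify-write of memory.
void addFolded(SpillSlot &slot, std::int64_t rax, std::int64_t r13) {
  slot.store(slot.load() + rax);
  slot.store(slot.load() + r13);
}

// Unfolded form: load once into a scratch register, add twice, store once.
void addUnfolded(SpillSlot &slot, std::int64_t rax, std::int64_t r13) {
  std::int64_t scratch = slot.load();
  scratch += rax;
  scratch += r13;
  slot.store(scratch);
}
```

Both forms leave the same value in the slot; in this model the folded form touches memory four times and the unfolded form twice.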

Apr 16, 2009

fix PR3995. A scale must be 1, 2, 4 or 8. · 5e42177a
Rafael Espindola authored Apr 16, 2009
```
llvm-svn: 69284
```
5e42177a
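
As a reminder of the constraint behind this fix, x86 addressing encodes the index scale in two bits, so only the four values named in the commit title are representable. The helpers below are a hypothetical sketch (`isLegalX86Scale` and `effectiveAddress` are illustrative names, not LLVM functions).

```cpp
#include <cassert>
#include <cstdint>

// Only scales 1, 2, 4, and 8 can be encoded in an x86 address.
bool isLegalX86Scale(unsigned scale) {
  return scale == 1 || scale == 2 || scale == 4 || scale == 8;
}

// Effective address = base + index * scale + displacement.
std::uint64_t effectiveAddress(std::uint64_t base, std::uint64_t index,
                               unsigned scale, std::int64_t disp) {
  assert(isLegalX86Scale(scale) && "scale must be 1, 2, 4 or 8");
  return base + index * scale + static_cast<std::uint64_t>(disp);
}
```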

Expand GEPs in ScalarEvolution expressions. SCEV expressions can now · 0a40ad93

Dan Gohman authored Apr 16, 2009

have pointer types, though in contrast to C pointer types, SCEV
addition is never implicitly scaled. This not only eliminates the
need for special code like IndVars' EliminatePointerRecurrence
and LSR's own GEP expansion code, it also does a better job because
it lets the normal optimizations handle pointer expressions just
like integer expressions.

Also, since LLVM IR GEPs can't directly index into multi-dimensional
VLAs, moving the GEP analysis out of client code and into the SCEV
framework makes it easier for clients to handle multi-dimensional
VLAs the same way as other arrays.

Some existing regression tests show improved optimization.
test/CodeGen/ARM/2007-03-13-InstrSched.ll in particular improved to
the point where if-conversion started kicking in; I turned it off
for this test to preserve the intent of the test.

llvm-svn: 69258

0a40ad93
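
The remark that SCEV addition is "never implicitly scaled" can be contrasted with C-style pointer arithmetic in a short sketch. The function names are hypothetical, and the code only models the arithmetic; it says nothing about how ScalarEvolution represents these expressions internally. It also assumes the usual flat address model, where converting a pointer to `std::uintptr_t` preserves byte distances.

```cpp
#include <cstddef>
#include <cstdint>

// C/C++ pointer arithmetic: `base + i` implicitly advances by i * sizeof(T).
template <typename T>
std::uintptr_t cStyleAdd(const T *base, std::size_t i) {
  return reinterpret_cast<std::uintptr_t>(base + i);
}

// Unscaled addition: the element size must appear as an explicit multiply;
// the add itself never scales its operands.
std::uintptr_t unscaledAdd(std::uintptr_t base, std::size_t i,
                           std::size_t elemSize) {
  return base + i * elemSize;
}
```

For an `int` array `arr`, `cStyleAdd(arr, 3)` and `unscaledAdd(reinterpret_cast<std::uintptr_t>(arr), 3, sizeof(int))` name the same address; the second form simply makes the multiply visible.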

Apr 15, 2009

Fix the RUN lines so that this test actually tests. · 86e43e79
Dan Gohman authored Apr 14, 2009
```
llvm-svn: 69096
```
86e43e79
For the h-register addressing-mode trick, use the correct value for · 62f44986
Dan Gohman authored Apr 14, 2009
```
any non-address uses of the address value. This fixes 186.crafty.

llvm-svn: 69094
```
62f44986

When the result of an EXTRACT_SUBREG, INSERT_SUBREG, or SUBREG_TO_REG · e5cd1fcd

Dan Gohman authored Apr 14, 2009

operator is used by a CopyToReg to export the value to a different
block, don't reuse the CopyToReg's register for the subreg operation
result if the register isn't precisely the right class for the
subreg operation.

Also, rename the h-registers.ll test, now that there is more
than one.

llvm-svn: 69087

e5cd1fcd

Apr 14, 2009

Some of the GR8_NOREX registers are only available in 64-bit mode. · dfbbf5c0
Evan Cheng authored Apr 14, 2009
```
llvm-svn: 69049
```
dfbbf5c0

Fix PR3934 part 2. findOnlyInterestingUse() was not setting IsCopy and... · 9787183b

Evan Cheng authored Apr 14, 2009

Fix PR3934 part 2. findOnlyInterestingUse() was not setting IsCopy and IsDstPhys which are returned by value and used by callee. This happened to work on the earlier test cases because of a logic error in the caller side.

llvm-svn: 69006

9787183b

Apr 13, 2009

PR3934: Fix a bogus two-address pass assertion. · f0843803
Evan Cheng authored Apr 13, 2009
```
llvm-svn: 68979
```
f0843803

Implement x86 h-register extract support. · 57d6bd36

Dan Gohman authored Apr 13, 2009

 - Add patterns for h-register extract, which avoids a shift and mask,
   and in some cases a temporary register.
 - Add address-mode matching for turning (X>>(8-n))&(255<<n), where
   n is a valid address-mode scale value, into an h-register extract
   and a scaled-offset address.
 - Replace X86's MOV32to32_ and related instructions with the new
   target-independent COPY_TO_SUBREG instruction.

On x86-64 there are complicated constraints on h registers, and
CodeGen doesn't currently provide a high-level way to express all of them,
so they are handled with a bunch of special code. This code currently only
supports extracts where the result is used by a zero-extend or a store,
though these are fairly common.

These transformations are not always beneficial; since there are only
4 h registers, they sometimes require extra move instructions, and
this sometimes increases register pressure because it can force out
values that would otherwise be in one of those registers. However,
this appears to be relatively uncommon.

llvm-svn: 68962

57d6bd36
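
The address-mode pattern in the second bullet of this commit relies on a bit identity that is easy to check in isolation. The sketch below uses hypothetical helper names and 32-bit unsigned values: for n in 0..3 (scales 1, 2, 4, 8), `(X >> (8 - n)) & (255 << n)` equals bits 8..15 of X shifted left by n, which is what an h-register extract followed by a scaled index computes.

```cpp
#include <cassert>
#include <cstdint>

// Left side of the pattern from the commit message: (X >> (8 - n)) & (255 << n).
std::uint32_t shiftAndMask(std::uint32_t x, unsigned n) {
  assert(n <= 3 && "n must correspond to a scale of 1, 2, 4 or 8");
  return (x >> (8 - n)) & (std::uint32_t{255} << n);
}

// Right side: extract bits 8..15 (the byte an h register such as %ah holds
// for %eax) and scale the result by 2^n, as a scaled-index address would.
std::uint32_t hExtractScaled(std::uint32_t x, unsigned n) {
  assert(n <= 3 && "n must correspond to a scale of 1, 2, 4 or 8");
  return ((x >> 8) & 255u) << n;
}
```

For `X = 0x12345678` and `n = 2`, both sides evaluate to `0x158` (the byte `0x56` times 4).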

X86-64 TLS support for local exec and initial exec. · 6d6c6043
Rafael Espindola authored Apr 13, 2009
```
llvm-svn: 68947
```
6d6c6043
In X86DAGToDAGISel::MatchWrapper, if base or index are set, avoid matching · 7186f20a
Rafael Espindola authored Apr 12, 2009
```
only if symbolic addresses are RIP relatives.

llvm-svn: 68924
```
7186f20a

Apr 12, 2009
- Add tests for the parts of X86-64 TLS that are already implemented. · e4bd8904
  Rafael Espindola authored Apr 12, 2009
  
  llvm-svn: 68901
  e4bd8904
- fix a cross-block fastisel crash handling overflow intrinsics. · ce6bcf08
  Chris Lattner authored Apr 12, 2009
  
  See comment for details. This fixes rdar://6772169 llvm-svn: 68890
  ce6bcf08
Apr 10, 2009

Don't fold a load if the other operand is a TLS address. · bb834f09

Rafael Espindola authored Apr 10, 2009

With this we generate

movl    %gs:0, %eax
leal    i@NTPOFF(%eax), %eax

instead of

movl    $i@NTPOFF, %eax
addl    %gs:0, %eax

llvm-svn: 68778

bb834f09

Apr 09, 2009
- reg0 references are not real registers. This fixes a crash on the · a725028d
  Chris Lattner authored Apr 09, 2009
  
  attached testcase. llvm-svn: 68712
  a725028d