Commits · 5938650b134da627a808888b673c62b10eb797cb · Roger Ferrer / llvm-epi-0.8

Nov 15, 2011

Properly qualify AVX2 specific parts of execution dependency table. Also... · 05baa85f

Craig Topper authored Nov 15, 2011

Properly qualify AVX2 specific parts of execution dependency table. Also enable converting between 256-bit PS/PD operations when AVX1 is enabled. Fixes PR11370.

llvm-svn: 144622

05baa85f

Add vmov.f32 to materialize f32 immediate splats which cannot be handled by · 7ca4b6eb
Evan Cheng authored Nov 15, 2011
```
integer variants. rdar://10437054

llvm-svn: 144608
```
7ca4b6eb
ARM parsing datatype suffix variants for fixed-writeback VLD1/VST1 instructions. · 29cdcda8
Jim Grosbach authored Nov 15, 2011
```
rdar://10435076

llvm-svn: 144606
```
29cdcda8
Move WEAK marking to the declaration. · 6804d270
Nick Lewycky authored Nov 15, 2011
```
llvm-svn: 144603
```
6804d270

Break false dependencies before partial register updates. · f8ad336b

Jakob Stoklund Olesen authored Nov 15, 2011

Two new TargetInstrInfo hooks let the target tell ExecutionDepsFix
about instructions whose partial register updates cause unwanted false
dependencies.

The ExecutionDepsFix pass will break the false dependencies if the
updated register was written in the previous N instructions.

The small loop added to sse-domains.ll runs twice as fast with
dependency-breaking instructions inserted.

llvm-svn: 144602

f8ad336b
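
To make the "written in the previous N instructions" rule above concrete, here is a minimal sketch. It is not the ExecutionDepsFix code: `RegAgeTracker`, `noteDef`, `advance`, and `shouldBreakDependency` are hypothetical names, and registers are modeled as plain indices.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the idea; not the actual ExecutionDepsFix pass.
struct RegAgeTracker {
  // Index of the instruction that last wrote each register, or -1 if unknown.
  std::vector<int> LastDef;
  // Index of the instruction currently being visited.
  int CurInstr = 0;

  explicit RegAgeTracker(unsigned NumRegs) : LastDef(NumRegs, -1) {}

  // Record that the current instruction writes Reg.
  void noteDef(unsigned Reg) {
    assert(Reg < LastDef.size() && "register index out of range");
    LastDef[Reg] = CurInstr;
  }

  // Move on to the next instruction.
  void advance() { ++CurInstr; }

  // A partial update of Reg implicitly depends on its old value. If that
  // value was written within the previous Clearance instructions, the
  // dependency is likely to stall, so breaking it is worthwhile.
  bool shouldBreakDependency(unsigned Reg, int Clearance) const {
    if (LastDef[Reg] < 0)
      return false; // No known recent write.
    return CurInstr - LastDef[Reg] <= Clearance;
  }
};
```

In this sketch the clearance N would come from the target; the commit message says the target reports it through the new hooks but does not name them here.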

Track register ages more accurately. · 543bef6e

Jakob Stoklund Olesen authored Nov 15, 2011

Keep track of the last instruction to define each register individually
instead of per DomainValue.  This lets us track more accurately when a
register was last written.

Also track register ages across basic blocks.  When entering a new
basic block, use the least stale predecessor def as a worst case
estimate for register age.

The register age is used to arbitrate between conflicting domains. The
most recently defined register wins.

llvm-svn: 144601

543bef6e
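
The cross-block merge and the arbitration rule described above can be sketched as follows. This is an illustration under assumptions, not the pass itself: an "age" is modeled as the number of instructions since a register was last written, and `mergePredecessorAges` and `pickWinner` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Age = number of instructions since the register was last written.
using RegAges = std::vector<int>;

// On entry to a block, take for each register the smallest age seen at the
// exit of any predecessor (the least stale def). All vectors are assumed to
// have the same length.
RegAges mergePredecessorAges(const std::vector<RegAges> &PredExitAges) {
  assert(!PredExitAges.empty() && "need at least one predecessor");
  RegAges Entry = PredExitAges.front();
  for (const RegAges &Pred : PredExitAges)
    for (std::size_t R = 0; R < Entry.size(); ++R)
      Entry[R] = std::min(Entry[R], Pred[R]);
  return Entry;
}

// When two registers disagree about the domain, the most recently defined
// one (smallest age) wins. Ties go to A.
unsigned pickWinner(const RegAges &Ages, unsigned A, unsigned B) {
  return Ages[A] <= Ages[B] ? A : B;
}
```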

Fix linking for some users who already have tsan enabled code and are trying to · b2489b74
Nick Lewycky authored Nov 15, 2011
```
link it against llvm code, by making our definitions weak. "Some users."

llvm-svn: 144596
```
b2489b74
ARM parsing datatype suffix variants for non-writeback VST1 instructions. · a498af2b
Jim Grosbach authored Nov 14, 2011
```
rdar://10435076

llvm-svn: 144593
```
a498af2b
ARM parsing datatype suffix variants for non-writeback VLD1 instructions. · 72838a03
Jim Grosbach authored Nov 14, 2011
```
rdar://10435076

llvm-svn: 144592
```
72838a03
Add explanatory comment. · 750de7a3
Jim Grosbach authored Nov 14, 2011
```
llvm-svn: 144589
```
750de7a3

Split out the plain '.{8|16|32|64}' suffix handling. · 9c2d9d59

Jim Grosbach authored Nov 14, 2011

Make it easier to deal with aliases for instructions that do require a suffix
but accept more specific variants of the same size.

llvm-svn: 144588

9c2d9d59

ARM parsing optional datatype suffix for VAND/VEOR/VORR instructions. · 3d6c0e0b
Jim Grosbach authored Nov 14, 2011
```
rdar://10435076

llvm-svn: 144587
```
3d6c0e0b

Supporting inline memmove isn't going to be worthwhile. The only way to avoid · 057b6d34

Chad Rosier authored Nov 14, 2011

violating a dependency is to emit all loads prior to stores.  This would likely
cause a great deal of spillage offsetting any potential gains.

llvm-svn: 144585

057b6d34
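
A small sketch of why the loads-before-stores ordering matters for memmove. `move4` is a hypothetical fixed-size helper, not code from the commit; it only shows the ordering constraint the message describes.

```cpp
#include <cstdint>

// Hypothetical 4-byte move that stays correct when Dst and Src overlap.
void move4(std::uint8_t *Dst, const std::uint8_t *Src) {
  // All loads first: every source byte is held in its own temporary.
  std::uint8_t T0 = Src[0], T1 = Src[1], T2 = Src[2], T3 = Src[3];
  // Stores only afterwards, so no store can clobber a byte not yet read.
  Dst[0] = T0;
  Dst[1] = T1;
  Dst[2] = T2;
  Dst[3] = T3;
}
```

Every byte has to stay live in a temporary until the first store. For larger sizes the number of live temporaries would exceed the available registers, which is the spilling concern raised in the message.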

ARM VLDR/VSTR instructions don't need a size suffix. · 3e2c6f38

Jim Grosbach authored Nov 14, 2011

Canonicalize on the non-suffixed form, but continue to accept assembly that
has any correctly sized type suffix.

llvm-svn: 144583

3e2c6f38

Nov 14, 2011

Refactor capture tracking (which already had a couple flags for whether returns · 7013a19e

Nick Lewycky authored Nov 14, 2011

and stores capture) to permit the caller to see each capture point and decide
whether to continue looking.

Use this inside memdep to do an analysis that basicaa won't do. This lets us
solve another devirtualization case, fixing PR8908!

llvm-svn: 144580

7013a19e
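
The shape of that refactoring, where the caller sees each capture point and decides whether to keep looking, might look roughly like the sketch below. `UseKind`, `Use`, and `forEachCapture` are hypothetical stand-ins rather than LLVM's types, and which uses count as capturing is simplified.

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-ins for IR uses; not LLVM's classes.
enum class UseKind { Load, Store, Return, Call };
struct Use {
  UseKind Kind;
  int Id;
};

// Walk the uses of a pointer. For every use that may capture it, ask the
// callback; the callback returns true to stop the walk. The function
// returns true if the walk was stopped early.
bool forEachCapture(const std::vector<Use> &Uses,
                    const std::function<bool(const Use &)> &OnCapture) {
  for (const Use &U : Uses) {
    bool MayCapture = U.Kind == UseKind::Store ||
                      U.Kind == UseKind::Return || U.Kind == UseKind::Call;
    if (MayCapture && OnCapture(U))
      return true;
  }
  return false;
}
```

A client that only cares about one kind of capture can ignore the others and stop at the first relevant one, instead of being limited to fixed flags.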

Add support for inlining small memcpys. · ab7223e9
Chad Rosier authored Nov 14, 2011
```
rdar://10412592

llvm-svn: 144578
```
ab7223e9
Fix a performance regression from r144565. Positive offsets were being lowered · 45110fdf
Chad Rosier authored Nov 14, 2011
```
into registers, rather than encoded directly in the load/store.

llvm-svn: 144576
```
45110fdf
ARM assembly parsing type suffix options for VLDR/VSTR. · 7996b157
Jim Grosbach authored Nov 14, 2011
```
rdar://10435076

llvm-svn: 144575
```
7996b157
Avoid dereferencing off the beginning of lists. · f2fc508d
Evan Cheng authored Nov 14, 2011
```
llvm-svn: 144569
```
f2fc508d

At -O0, multiple uses of a virtual register in the same BB are being marked · 28ffb7e4

Evan Cheng authored Nov 14, 2011

"kill". This looks like a bug upstream. Since that's going to take some time
to understand, loosen the assertion and disable the optimization when
multiple kills are seen.

llvm-svn: 144568

28ffb7e4

Add support for tsan annotations (thread sanitizer, a valgrind-based tool). · fe856110

Nick Lewycky authored Nov 14, 2011

These annotations are disabled entirely when either ENABLE_THREADS is off, or
building a release build. When enabled, they add calls to functions with no
statements to ManagedStatic's getters.

Use these annotations to inform tsan that the race used inside ManagedStatic
initialization is actually benign. Thanks to Kostya Serebryany for helping
write this patch!

llvm-svn: 144567

fe856110

Add a missing pattern for X86ISD::MOVLPD. rdar://10436044 · fb13d32b
Evan Cheng authored Nov 14, 2011
```
llvm-svn: 144566
```
fb13d32b
Add support for Thumb load/stores with negative offsets. · adfd200b
Chad Rosier authored Nov 14, 2011
```
rdar://10412592

llvm-svn: 144565
```
adfd200b
Unbreak Release builds. · 319904cc
Benjamin Kramer authored Nov 14, 2011
```
llvm-svn: 144560
```
319904cc

Teach two-address pass to re-schedule two-address instructions (or the kill · 30f44ad7

Evan Cheng authored Nov 14, 2011

instructions of the two-address operands) in order to avoid inserting copies.
This fixes the few regressions introduced when the two-address hack was
disabled (without regressing the improvements).
rdar://10422688

llvm-svn: 144559

30f44ad7

Changed SSE4/AVX <2 x i64> extract and insert ops to be Custom lowered · 890e02e8

Pete Cooper authored Nov 14, 2011

The constant idx case is still done in tablegen, but other cases are then expanded.

Fixes <rdar://problem/10435460>

llvm-svn: 144557

890e02e8

Fold ConstantVector::isAllOnesValue into Constant::isAllOnesValue and simplify it. · 42d098e1
Benjamin Kramer authored Nov 14, 2011
```
llvm-svn: 144555
```
42d098e1
32-to-64-bit extended load. · f93b3f46
Akira Hatanaka authored Nov 14, 2011
```
llvm-svn: 144554
```
f93b3f46

AnalyzeCallOperands function for N32/64. · 0b8bc004

Akira Hatanaka authored Nov 14, 2011

N32/64 places all variable arguments in integer registers (or on the stack),
regardless of their types, but follows the calling convention of non-vaarg
functions when handling fixed arguments.

llvm-svn: 144553

0b8bc004
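
A simplified sketch of the rule stated above. It is not the Mips backend code: `ArgClass`, `Arg`, and `assignArgs` are hypothetical, and the model assumes one slot per argument and eight register slots.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of N32/64 argument placement.
enum class ArgClass { IntReg, FPReg, Stack };

struct Arg {
  bool IsFloat; // Floating-point type?
  bool IsFixed; // Named (fixed) parameter rather than a variable argument?
};

// Assumption: each argument takes exactly one slot and the first
// NumRegSlots slots are passed in registers.
std::vector<ArgClass> assignArgs(const std::vector<Arg> &Args,
                                 std::size_t NumRegSlots = 8) {
  std::vector<ArgClass> Out;
  Out.reserve(Args.size());
  for (std::size_t Slot = 0; Slot < Args.size(); ++Slot) {
    const Arg &A = Args[Slot];
    if (Slot >= NumRegSlots)
      Out.push_back(ArgClass::Stack);
    else if (A.IsFloat && A.IsFixed)
      Out.push_back(ArgClass::FPReg); // Fixed FP argument: normal convention.
    else
      Out.push_back(ArgClass::IntReg); // Integers and all variable arguments.
  }
  return Out;
}
```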

Modify LowerFormalArguments to correctly handle vaarg arguments for Mips64. · 52359363
Akira Hatanaka authored Nov 14, 2011
```
llvm-svn: 144552
```
52359363
PTX: Let LLVM use loads/stores for all mem* intrinsics, instead of relying on... · 33a51902
Justin Holewinski authored Nov 14, 2011
```
PTX: Let LLVM use loads/stores for all mem* intrinsics, instead of relying on custom implementations.

llvm-svn: 144551
```
33a51902

Remove variable that keeps the size of area used to save byval or variable · d673cfe0

Akira Hatanaka authored Nov 14, 2011

argument registers on the callee's stack frame, along with functions that set
and get it.
    
It is not necessary to add the size of this area when computing stack size in
emitPrologue, since it has already been accounted for in
PEI::calculateFrameObjectOffsets.

llvm-svn: 144549

d673cfe0

Fix early-clobber handling in shrinkToUses. · 7e6004a3
Jakob Stoklund Olesen authored Nov 14, 2011
```
I broke this in r144515; it affected most ARM testers.

<rdar://problem/10441389>

llvm-svn: 144547
```
7e6004a3
Disable generation of compact unwind encodings. <rdar://problem/10441578> · 8d1c7dbd
Bob Wilson authored Nov 14, 2011
```
This still seems to be causing some failures.  It needs more testing before
it gets enabled again.

llvm-svn: 144543
```
8d1c7dbd
Tidy up. 80 column. · ee201fae
Jim Grosbach authored Nov 14, 2011
```
llvm-svn: 144538
```
ee201fae
Make headers standalone, move a virtual method out of line. · d00e94e8
Benjamin Kramer authored Nov 14, 2011
```
llvm-svn: 144536
```
d00e94e8

It helps to deallocate memory as well as allocate it. =] This actually · fd9b4d98

Chandler Carruth authored Nov 14, 2011

cleans up all the chains allocated during the processing of each
function so that for very large inputs we don't just grow memory usage
without bound.

llvm-svn: 144533

fd9b4d98

Remove an over-eager assert that was firing on one of the ARM regression · 0a31d149

Chandler Carruth authored Nov 14, 2011

tests when I forcibly enabled block placement.

It is apparently possible for an unanalyzable block to fall through to
a non-loop block. I don't actually believe this is correct, I believe
that 'canFallThrough' is returning true needlessly for the code
construct, and I've left a bit of a FIXME on the verification code to
try to track down why this is coming up.

Anyways, removing the assert doesn't degrade the correctness of the algorithm.

llvm-svn: 144532

0a31d149

Begin chipping away at one of the biggest quadratic-ish behaviors in · 0af6a0bb

Chandler Carruth authored Nov 14, 2011

this pass. We're leaving already merged blocks on the worklist, and
scanning them again and again only to determine each time through that
indeed they aren't viable. We can instead remove them once we're going
to have to scan the worklist. This is the easy way to implement removing
them. If this remains on the profile (as I somewhat suspect it will), we
can get a lot more clever here, as the worklist's order is essentially
irrelevant. We can use swapping and fold the two loops to reduce
overhead even when there are many blocks on the worklist but only a few
of them are removed.

llvm-svn: 144531

0af6a0bb
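
The two strategies mentioned above, the easy removal pass and the order-insensitive swap-based one, can be sketched like this. `Block`, `pruneWorklist`, and `pruneWorklistUnordered` are hypothetical names; the real pass works on machine basic blocks and chains.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a block that may already have been merged.
struct Block {
  int Id;
  bool Merged;
};

// Easy version: drop all merged blocks in one pass, keeping the order.
void pruneWorklist(std::vector<Block *> &Worklist) {
  Worklist.erase(std::remove_if(Worklist.begin(), Worklist.end(),
                                [](const Block *B) { return B->Merged; }),
                 Worklist.end());
}

// Since the worklist order is essentially irrelevant, a merged entry can be
// overwritten with the last element and popped, which avoids shifting.
void pruneWorklistUnordered(std::vector<Block *> &Worklist) {
  for (std::size_t I = 0; I < Worklist.size();) {
    if (Worklist[I]->Merged) {
      Worklist[I] = Worklist.back();
      Worklist.pop_back();
    } else {
      ++I;
    }
  }
}
```

The swap-based loop could also be folded into the scan that consumes the worklist, which is the direction the message hints at.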

Under the hood, MBPI is doing a linear scan of every successor every · 84cd44c7

Chandler Carruth authored Nov 14, 2011

time it is queried to compute the probability of a single successor.
This makes computing the probability of every successor of a block in
sequence... really really slow. ;] This switches to a linear walk of the
successors rather than a quadratic one. One of several quadratic
behaviors slowing this pass down.

I'm not really thrilled with moving the sum code into the public
interface of MBPI, but I don't (at the moment) have ideas for a better
interface. The direction I'm thinking in for a better interface is to
have MBPI actually retain much more state and make *all* of these
queries cheap. That's a lot of work, and would require invasive changes.
Until then, this seems like the least bad (i.e., least quadratic)
solution. Suggestions welcome.

llvm-svn: 144530

84cd44c7
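
The quadratic-versus-linear difference described above comes down to how often the successor weights are summed. A minimal sketch, assuming plain integer edge weights and `double` probabilities (the real MBPI interface differs; `probabilityOf` and `allProbabilities` are hypothetical):

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Per-query version: every call re-sums all weights, so asking for each of
// N successors in turn costs O(N^2) overall.
double probabilityOf(const std::vector<std::uint32_t> &Weights, std::size_t I) {
  std::uint64_t Sum =
      std::accumulate(Weights.begin(), Weights.end(), std::uint64_t{0});
  return Sum ? double(Weights[I]) / double(Sum) : 0.0;
}

// Linear version: sum once, then reuse the sum for every successor.
std::vector<double> allProbabilities(const std::vector<std::uint32_t> &Weights) {
  std::uint64_t Sum =
      std::accumulate(Weights.begin(), Weights.end(), std::uint64_t{0});
  std::vector<double> Probs;
  Probs.reserve(Weights.size());
  for (std::uint32_t W : Weights)
    Probs.push_back(Sum ? double(W) / double(Sum) : 0.0);
  return Probs;
}
```

Exposing the sum to the caller is what the message means by moving the sum code into the public interface.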