Commits · 8d5e128bd48a9c481150251c9761ce7838e65e87 · Roger Ferrer / llvm-epi-0.8

Aug 20, 2013

Add an option which permits the user to specify using a bitmask, that various · d8f33625

Reed Kotler authored Aug 20, 2013

functions be compiled as mips32, without having to add attributes. This
is useful in certain situations where you don't want to have to edit the
function attributes in the source. For now it's only an option used for
the compiler developers when debugging the mips16 port.

llvm-svn: 188826

d8f33625

[mips] Guard micromips instructions with predicate InMicroMips. Also, fix · a43b56d9
Akira Hatanaka authored Aug 20, 2013
```
assembler predicate HasStdEnc so that it is false when the target is micromips.

llvm-svn: 188824
```
a43b56d9

ARM: Fix fast-isel copy/paste-o. · 71a78f96

Jim Grosbach authored Aug 20, 2013

Update testcase to be more careful about checking register
values. While regexes are general goodness for these sorts of
testcases, in this example, the registers are constrained by
the calling convention, so we can and should check their
explicit values.

rdar://14779513

llvm-svn: 188819

71a78f96

AVX-512: Added more patterns for VMOVSS, VMOVSD, VMOVD, VMOVQ · 540d5825
Elena Demikhovsky authored Aug 20, 2013
```
llvm-svn: 188786
```
540d5825

[mips][msa] Removed fcge, fcgt, fsge, fsgt · 4260527f

Daniel Sanders authored Aug 20, 2013

These instructions were present in a draft spec but were removed before
publication.

llvm-svn: 188782

4260527f

[SystemZ] Update README · 2bf7b8cc
Richard Sandiford authored Aug 20, 2013
```
We now use MVST, CLST and SRST for the obvious cases.

llvm-svn: 188781
```
2bf7b8cc

[SystemZ] Use SRST to optimize memchr · 6f6d5516

Richard Sandiford authored Aug 20, 2013

SystemZTargetLowering::emitStringWrapper() previously loaded the character
into R0 before the loop and made R0 live on entry.  I'd forgotten that
allocatable registers weren't allowed to be live across blocks at this stage,
and it confused LiveVariables enough to cause a miscompilation of f3 in
memchr-02.ll.

This patch instead loads R0 in the loop and leaves LICM to hoist it
after RA.  This is actually what I'd tried originally, but I went for
the manual optimisation after noticing that R0 often wasn't being hoisted.
This bug forced me to go back and look at why, now fixed as r188774.

We should also try to optimize null checks so that they test the CC result
of the SRST directly.  The select between null and the SRST GPR result could
then usually be deleted as dead.

llvm-svn: 188779

6f6d5516

[mips][msa] Added insve · f2a0f1d1
Daniel Sanders authored Aug 20, 2013
```
llvm-svn: 188777
```
f2a0f1d1

ARM: implement some simple f64 materializations. · f79c3a5a

Tim Northover authored Aug 20, 2013

Previously we used a const-pool load for virtually all 64-bit floating values.
Actually, we can get quite a few common values (including 0.0, 1.0) via "vmov"
instructions of one stripe or another.

llvm-svn: 188773

f79c3a5a
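As a rough illustration of which f64 values can avoid a const-pool load, the sketch below checks whether a double fits the 8-bit VFP floating-point immediate form (sign, 3-bit exponent, 4-bit mantissa). The function name `encodeVfpImm8` is hypothetical and this is not the code from the commit. It covers only the VFP immediate form, so 0.0 is rejected here and would need a different "vmov" variant.

```cpp
#include <cstdint>
#include <cstring>

// Sketch only: returns the 8-bit VFP immediate encoding for `value`,
// or -1 if the value does not fit this form.
// Encodable values have the shape (-1)^s * (16 + m)/16 * 2^e,
// with m in [0, 15] and e in [-3, 4].
int encodeVfpImm8(double value) {
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);

    const std::uint64_t sign = bits >> 63;
    const int exponent = static_cast<int>((bits >> 52) & 0x7ff) - 1023;
    const std::uint64_t mantissa = bits & 0xfffffffffffffULL; // low 52 bits

    // Only the top 4 of the 52 mantissa bits may be set.
    if (mantissa & 0xffffffffffffULL)
        return -1;

    // Only 3 bits of exponent are available, covering [-3, 4].
    if (exponent < -3 || exponent > 4)
        return -1;

    // The exponent field stores (exponent + 3) with its top bit inverted.
    const std::uint64_t expField =
        (static_cast<std::uint64_t>(exponent + 3) & 0x7) ^ 0x4;

    return static_cast<int>((sign << 7) | (expField << 4) | (mantissa >> 48));
}
```

With this sketch, `encodeVfpImm8(1.0)` yields `0x70`, while `encodeVfpImm8(0.1)` and `encodeVfpImm8(0.0)` both yield `-1`.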

[mips][msa] Added and.v, bmnz.v, bmz.v, bsel.v, nor.v, or.v, xor.v · 869bdad9
Daniel Sanders authored Aug 20, 2013
```
llvm-svn: 188767
```
869bdad9
Fix formatting. No functional change. · 7a8cf010
Craig Topper authored Aug 20, 2013
```
llvm-svn: 188746
```
7a8cf010
Add AVX-512 and related features to the CPUID detection code. · e13a066c
Craig Topper authored Aug 20, 2013
```
llvm-svn: 188745
```
e13a066c

Move AVX and non-AVX replication inside a couple multiclasses to avoid... · fd2b3892

Craig Topper authored Aug 20, 2013

Move AVX and non-AVX replication inside a couple multiclasses to avoid repeating each instruction for both individually.

llvm-svn: 188743

fd2b3892

[PowerPC] More refactoring prior to real PPC emitPrologue/Epilogue changes. · f381afc9

Bill Schmidt authored Aug 20, 2013

(Patch committed on behalf of Mark Minich, whose log entry follows.)

This is a continuation of the refactorings performed in svn rev 188573
(see that rev's comments for more detail).

This is my stage 2 refactoring: I combined the emitPrologue() &
emitEpilogue() PPC32 & PPC64 code into a single flow, simplifying a
lot of the code since in essence the PPC32 & PPC64 code generation
logic is the same, only the instruction forms are different (in most
cases). This simplification is necessary because my functional changes
(yet to come) add significant complexity, and without the
simplification of my stage 2 refactoring, the overall complexity of
both emitPrologue() & emitEpilogue() would have become almost
intractable for most mortal programmers (like me).

This submission was intended to be a pure refactoring (no functional
changes whatsoever). However, in the process of combining the PPC32 &
PPC64 flows, I spotted a difference that I believe is a bug (see svn
rev 186478 line 863, or svn rev 188573 line 888): This line appears to
be restoring the BP with the original FP content, not the original BP
content. When I merged the 32-bit and 64-bit code, I used the
corresponding code from the 64-bit flow, which I believe uses the
correct offset (BPOffset) for this operation.

llvm-svn: 188741

f381afc9
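The refactoring pattern described above (one flow, with the PPC32/PPC64 differences captured in constants chosen up front) might look roughly like the sketch below. Every name here (`FrameOps`, `selectFrameOps`, `emitPrologueSketch`) is hypothetical, the instructions are plain strings rather than real opcodes, and the sketch does not model the actual ordering or safety rules of the real prologue.

```cpp
#include <string>
#include <vector>

// Hypothetical bundle of the per-target differences, chosen once up front.
struct FrameOps {
    const char* storeInst;   // store of a register-sized value
    const char* moveFromLR;  // copy of the link register into a GPR
    int wordSize;            // register width in bytes
};

// Pick the instruction forms once, instead of branching throughout the flow.
FrameOps selectFrameOps(bool isPPC64) {
    return isPPC64 ? FrameOps{"std", "mflr", 8} : FrameOps{"stw", "mflr", 4};
}

// A single flow shared by both targets; only the constants differ.
// `lrOffset` is supplied by the caller because it depends on the ABI.
std::vector<std::string> emitPrologueSketch(bool isPPC64, int lrOffset) {
    const FrameOps ops = selectFrameOps(isPPC64);
    std::vector<std::string> out;
    out.push_back(std::string(ops.moveFromLR) + " r0");
    out.push_back(std::string(ops.storeInst) + " r0, " +
                  std::to_string(lrOffset) + "(r1)");
    return out;
}
```

The point of the design is that a later functional change only has to be written once, in the shared flow, rather than twice in near-duplicate 32-bit and 64-bit branches.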

[Sparc] Use HWEncoding instead of unused Num field in Sparc register... · f625773b

Venkatraman Govindaraju authored Aug 20, 2013

[Sparc] Use HWEncoding instead of unused Num field in Sparc register definitions. Also, correct the definitions of RETL and RET instructions.

llvm-svn: 188738

f625773b

Add a llvm.copysign intrinsic · 0c5c01aa

Hal Finkel authored Aug 19, 2013

This adds a llvm.copysign intrinsic. We already have Libfunc recognition for
copysign (which is turned into the FCOPYSIGN SDAG node). In order to
autovectorize calls to copysign in the loop vectorizer, we need a corresponding
intrinsic as well.

In addition to the expected changes to the language reference, the loop
vectorizer, BasicTTI, and the SDAG builder (the intrinsic is transformed into
an FCOPYSIGN node, just like the function call), this also adds FCOPYSIGN to a
few lists in LegalizeVector{Ops,Types} so that vector copysigns can be
expanded.

In TargetLoweringBase::initActions, I've made the default action for FCOPYSIGN
be Expand for vector types. This seems correct for all in-tree targets, and I
think is the right thing to do because, previously, there was no way to generate
vector-valued FCOPYSIGN nodes (and most targets don't specify an action for
vector-typed FCOPYSIGN).

llvm-svn: 188728

0c5c01aa

Don't form PPC CTR-based loops around a copysignl call · 1cf48ab8

Hal Finkel authored Aug 19, 2013

copysign/copysignf never become function calls (because the SDAG expansion code
does not lower to the corresponding function call, but rather directly
implements the associated logic), but copysignl almost always is lowered into a
call to the requested libm function (and, thus, might clobber CTR).

llvm-svn: 188727

1cf48ab8

Aug 19, 2013

[mips] Fix instruction definitions that were incorrectly marked as code-gen-only. · ff7beb17
Akira Hatanaka authored Aug 19, 2013
```
llvm-svn: 188690
```
ff7beb17

Thumb2 add immediate alias for SP · 4a9df8a7

Mihai Popa authored Aug 19, 2013

The Thumb2 add immediate is in fact defined for SP. The manual is misleading because it points to a different section for add immediate with SP; however, the encoding is the same as for add immediate with register, only with the SP operand hard-coded. As such, add immediate with SP and add immediate with register can safely be treated as the same instruction.

All the patch does is adjust a register constraint on an instruction alias.

llvm-svn: 188676

4a9df8a7

AVX-512: added arithmetic and logical operations. · 1490c5eb

Elena Demikhovsky authored Aug 19, 2013

ADD, SUB, MUL integer and FP types. OR, AND, XOR.
Added embedded broadcast form for these instructions.

llvm-svn: 188673

1490c5eb

[SystemZ] Add negative integer absolute (load negative) · 784a5803

Richard Sandiford authored Aug 19, 2013

For now this matches the equivalent of (neg (abs ...)), which did hit a few
times in projects/test-suite.  We should probably also match cases where
absolute-like selects are used with reversed arguments.

llvm-svn: 188671

784a5803
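As a source-level illustration of the shapes discussed above, the sketch below shows a function in the (neg (abs ...)) form and a second one written as an absolute-like select with the arms swapped. Both function names are hypothetical, and reading "reversed arguments" as the second form is an assumption, not something the commit spells out.

```cpp
#include <cstdint>

// The (neg (abs x)) shape: take the absolute value, then negate it.
// Like std::abs, the inner negation overflows for INT64_MIN, so callers
// must not pass that value.
std::int64_t negAbsViaAbs(std::int64_t x) {
    const std::int64_t absValue = x < 0 ? -x : x;
    return -absValue;
}

// An absolute-like select with the arms swapped: the result is always
// non-positive, and no intermediate abs is formed.
std::int64_t negAbsViaSelect(std::int64_t x) {
    return x < 0 ? x : -x;
}
```

Both functions return the same result for every input where the first one is well defined.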

[SystemZ] Add integer absolute (load positive) · 4b897054
Richard Sandiford authored Aug 19, 2013
```
llvm-svn: 188670
```
4b897054

[SystemZ] Add support for sibling calls · 709bda66

Richard Sandiford authored Aug 19, 2013

This first cut is pretty conservative. The final argument register (R6)
is call-saved, so we would need to make sure that the R6 argument to a
sibling call is the same as the R6 argument to the calling function,
which seems worth keeping as a separate patch.

Saying that integer truncations are free means that we no longer
use the extending instructions LGF and LLGF for spills in int-conv-09.ll
and int-conv-10.ll. Instead we treat the registers as 64 bits wide and
truncate them to 32 bits where necessary. I think it's unlikely we'd
use LGF and LLGF for spills in other situations for the same reason,
so I'm removing the tests rather than replacing them. The associated
code is generic and applies to many more instructions than just
LGF and LLGF, so there is no corresponding code removal.

llvm-svn: 188669

709bda66

Add the PPC fcpsgn instruction · dbc78e1f

Hal Finkel authored Aug 19, 2013

Modern PPC cores support a floating-point copysign instruction, and we can use
this to lower the FCOPYSIGN node (which is created from calls to the libm
copysign function). A couple of extra patterns are necessary because the
operand types of FCOPYSIGN need not agree.

llvm-svn: 188653

dbc78e1f

Aug 18, 2013

ARM: make sure we keep inline asm operands tied. · 55349a29
Tim Northover authored Aug 18, 2013
```
When patching inlineasm nodes to use GPRPair for 64-bit values, we
were dropping the information that two operands were tied, which
effectively broke the live-interval of vregs affected.

llvm-svn: 188643
```
55349a29

AVX-512: Added VMOVD, VMOVQ, VMOVSS, VMOVSD instructions. · 3ce8dbba
Elena Demikhovsky authored Aug 18, 2013
```
llvm-svn: 188637
```
3ce8dbba

Make more of the lowering helpers static. Also use MVT instead of EVT in a couple places. · e6861c9c
Craig Topper authored Aug 18, 2013
```
llvm-svn: 188629
```
e6861c9c

Remove unused stdio.h includes · 8b2a3d1f
Dmitri Gribenko authored Aug 18, 2013
```
llvm-svn: 188626
```
8b2a3d1f

Aug 17, 2013

R600: Fix possible use of an uninitialized variable · 59ed08b2
Tom Stellard authored Aug 17, 2013
```
Spotted by Nick Lewycky!

llvm-svn: 188599
```
59ed08b2
R600: Expand vector FRINT ops · b249b757
Tom Stellard authored Aug 16, 2013
```
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 188598
```
b249b757
R600: Expand vector FFLOOR ops · ad3aff24
Tom Stellard authored Aug 16, 2013
```
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 188597
```
ad3aff24
R600: Expand vector float operations for both SI and R600 · a92ff879
Tom Stellard authored Aug 16, 2013
```
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 188596
```
a92ff879
ARM: Properly constrain comparison fastisel register classes. · d7866790
Jim Grosbach authored Aug 16, 2013
```
Ongoing 'make the verifier happy' improvements to ARM fast-isel.

rdar://12594152

llvm-svn: 188595
```
d7866790

ARM: Fast-isel register class constrain for extends. · 3fa74910

Jim Grosbach authored Aug 16, 2013

Properly constrain the operand register class for instructions used
in [sz]ext expansion. Update more tests to use the verifier now that
we're getting the register classes correct.

rdar://12594152

llvm-svn: 188594

3fa74910

ARM: Fix more fast-isel verifier failures. · 06c2a681

Jim Grosbach authored Aug 16, 2013

Teach the generic instruction selection helper functions to constrain
the register classes of their input operands. For non-physical register
references, the generic code needs to be careful not to mess that up
when replacing references to result registers. As the comment indicates
for MachineRegisterInfo::replaceRegWith(), it's important to call
constrainRegClass() first.

rdar://12594152

llvm-svn: 188593

06c2a681

ARM: Clean up fast-isel machine verifier errors. · d69f3ed9

Jim Grosbach authored Aug 16, 2013

Lots of machine verifier errors result from using a plain GPR regclass
for incoming argument copies. A more restrictive rGPR class is more
appropriate since it more accurately represents what's happening, plus
it lines up better with isel later on so the verifier is happier.
Reduces the number of ARM fast-isel tests not running with the verifier
enabled by over half.

rdar://12594152

llvm-svn: 188592

d69f3ed9

Fix a subtle difference between running clang vs llc for mips16. · 0eae85fb

Reed Kotler authored Aug 16, 2013

This regards how mips16 is viewed. It's not really a target type but
there has always been a target for it in the td files. It's more properly
-mcpu=mips32 -mattr=+mips16. This is how clang treats it but we have
always had the -mcpu=mips16 which I probably should delete now but it will
require updating all the .ll test cases for mips16. In this case it changed
how we decide if we have a count bits instruction and whether instruction
lowering should then expand ctlz. Now that we have dual mode compilation,
-mattr=+mips16 really just indicates the initial processor mode that
we are compiling for. (It is also possible to have -mcpu=64 -mattr=+mips16
but as far as I know, nobody has even built such a processor, though there
is an architecture manual for this).

llvm-svn: 188586

0eae85fb
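To show what "expand ctlz" means when no count-bits instruction is available, here is a minimal sketch of a 32-bit count-leading-zeros built from compares and shifts. The name `countLeadingZeros32` is hypothetical, and this binary-search form is only one possible expansion; it is not claimed to be the sequence the LLVM legalizer actually emits.

```cpp
#include <cstdint>

// Sketch only: counts leading zero bits of a 32-bit value without a
// dedicated hardware instruction, narrowing the search by halves.
unsigned countLeadingZeros32(std::uint32_t x) {
    if (x == 0)
        return 32;

    unsigned n = 0;
    if ((x & 0xFFFF0000u) == 0) { n += 16; x <<= 16; }
    if ((x & 0xFF000000u) == 0) { n += 8;  x <<= 8;  }
    if ((x & 0xF0000000u) == 0) { n += 4;  x <<= 4;  }
    if ((x & 0xC0000000u) == 0) { n += 2;  x <<= 2;  }
    if ((x & 0x80000000u) == 0) { n += 1; }
    return n;
}
```

A target with a count-bits instruction can replace this whole sequence with one instruction, which is why the decision about whether the instruction exists has to agree between clang and llc.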

Aug 16, 2013

[PowerPC] Preparatory refactoring for making prologue and epilogue · 8893a3d1

Bill Schmidt authored Aug 16, 2013

safe on PPC32 SVR4 ABI

[Patch and following text by Mark Minich; committing on his behalf.]

There are FIXME's in PowerPC/PPCFrameLowering.cpp, method
PPCFrameLowering::emitPrologue() related to "negative offsets of R1"
on PPC32 SVR4. They're true, but the real issue is that on PPC32 SVR4
(and any ABI without a Red Zone), no spills may be made until after
the stackframe is claimed, which also includes the LR spill which is
at a positive offset. The same problem exists in emitEpilogue(),
though there's no FIXME for it. I intend to fix this issue, making
LLVM-compiled code finally safe for use on SVR4/EABI/e500 32-bit
platforms (including in particular, OS-free embedded systems & kernel
code, where interrupts may share the same stack as user code).

In preparation for making these changes, to make the diffs for the
functional changes less cluttered, I am providing the non-functional
refactorings in two stages:

Stage 1 does some minor fluffy refactorings to pull multiple method
calls up into a single bool, creating named bools for repeated uses of
obscure logic, moving some code up earlier because either stage 2 or
my final version will require it earlier, and rewording/adding some
comments. My stage 1 changes can be characterized as primarily fluffy
cleanup, the purpose of which may be unclear until the stage 2 or
final changes are made.

My stage 2 refactorings combine the separate PPC32 & PPC64 logic,
which is currently performed by largely duplicate code, into a single
flow, with the differences handled by a group of constants initialized
early in the methods.

This submission is for my stage 1 changes. There should be no
functional changes whatsoever; this is a pure refactoring.

llvm-svn: 188573

8893a3d1

R600/SI: Add pattern for xor of i1 · 8522270d

Michel Danzer authored Aug 16, 2013



Fixes two recent piglit regressions with radeonsi.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188559

8522270d

R600/SI: Fix broken encoding of DS_WRITE_B32 · 20680b1c

Michel Danzer authored Aug 16, 2013



The logic in SIInsertWaits::getHwCounts() only really made sense for SMRD
instructions, and trying to shoehorn it into handling DS_WRITE_B32 caused
it to corrupt the encoding of that by clobbering the first operand with
the second one.

Undo that damage and only apply the SMRD logic to that.

Fixes some derivatives-related piglit regressions with radeonsi.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188558

20680b1c