- Aug 21, 2013
-
-
Craig Topper authored
Synchronize VEX JIT encoding code with the MCJIT version. Fix a bug in the MCJIT code where CurOp was being incremented even if the operand it was pointing at wasn't used. This may only matter if there are any EVEX_K instructions that aren't VEX_4V. llvm-svn: 188868
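For illustration only, here is a minimal standalone sketch of the kind of cursor bug described above, with a made-up operand layout rather than the real X86 MC emitter:

    // Illustrative sketch only, not the actual X86 MC emitter code. It shows the kind
    // of cursor bug described above: CurOp being advanced even when the operand it
    // points at (here, an optional mask) is not consumed, so later operands are read
    // off by one.
    #include <cstdio>
    #include <vector>

    struct Operand { int Val; };

    static void encode(const std::vector<Operand> &Ops, bool HasMaskOperand) {
      unsigned CurOp = 0;
      int Dst = Ops[CurOp++].Val;      // destination register, always consumed

      int Mask = -1;
      if (HasMaskOperand)
        Mask = Ops[CurOp].Val;         // mask register, only present on masked forms
      ++CurOp;                         // BUG: advances even when no mask operand exists;
                                       // the fix is to increment only inside the 'if'.

      int Src = Ops[CurOp].Val;        // with the bug, this reads the operand after the source
      std::printf("dst=%d mask=%d src=%d\n", Dst, Mask, Src);
    }

    int main() {
      // Operand list for an unmasked instruction: dst, src, immediate.
      std::vector<Operand> Ops = {{0}, {1}, {42}};
      encode(Ops, /*HasMaskOperand=*/false);   // prints src=42 (the immediate slot) instead of 1
    }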
-
Nadav Rotem authored
In LLVM, FMA3 operands are dst, src1, src2, src3; however, dst is not encoded since it is always the same register as src1. This was causing the encoding of the operands to be off by one. Patch by Chris Bieneman. llvm-svn: 188866
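As a hedged illustration (hypothetical operand list, not the actual FMA3 encoder), the indexing looks roughly like this once dst is recognised as tied to src1:

    // Tiny standalone sketch of the indexing issue described above (hypothetical operand
    // list, not the real X86 encoder): dst is tied to src1 and carries no encoding, so
    // the encoder has to start at src1 or every field ends up off by one.
    #include <cstdio>
    #include <vector>

    int main() {
      // Register numbers for: dst, src1, src2, src3. For FMA3, dst == src1.
      std::vector<int> Ops = {0, 0, 1, 2};

      unsigned CurOp = 0;
      ++CurOp;                 // skip dst: it is implicit in src1 and is not encoded
      int Src1 = Ops[CurOp++];
      int Src2 = Ops[CurOp++];
      int Src3 = Ops[CurOp++];
      std::printf("encoded fields: src1=xmm%d src2=xmm%d src3=xmm%d\n", Src1, Src2, Src3);
    }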
-
Craig Topper authored
Rename mattr names for AVX-512 from avx-512 -> avx512f, avx-512-pfi -> avx512pf, avx-512-cdi -> avx512cd, avx-512-eri -> avx512er. This matches better with official docs and what gcc patches appear to be using. I didn't touch the has* functions or the feature flag names to avoid changing the td and lowering files while commits are still happening. llvm-svn: 188859
-
NAKAMURA Takumi authored
I suppose all "lli -use-mcjit i686-*" tests should require GOT (and be expected to fail). llvm-svn: 188856
-
Jakub Staszak authored
llvm-svn: 188852
-
Akira Hatanaka authored
llvm-svn: 188851
-
Bill Wendling authored
There are situations which can affect the correctness (or at least expectation) of the gcov output. For instance, if a call to __gcov_flush() occurs within a block before the execution count is registered and then the program aborts in some way, then that block will not be marked as executed. This is not normally what the user expects. If we move the code that registers a block's execution to the beginning of the block, we can catch these types of situations. PR16893 llvm-svn: 188849
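A small sketch of the scenario described, assuming the file is compiled with --coverage so that the gcov runtime provides __gcov_flush (the user functions here are made up):

    // Sketch of the scenario described above (assumes compilation with --coverage so
    // the gcov runtime provides __gcov_flush; function names are hypothetical).
    #include <cstdlib>

    extern "C" void __gcov_flush(void);   // writes out the counters accumulated so far

    void fatal_shutdown() {
      __gcov_flush();   // flush coverage data before the process dies
      std::abort();     // if this block's execution were only counted after this point,
                        // the flushed .gcda data would show the block as never executed
    }

    int main() {
      fatal_shutdown();
    }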
-
Akira Hatanaka authored
llvm-svn: 188848
-
Akira Hatanaka authored
size of floating point registers is 64-bit. Test case will be added when support for mfhc1 and mthc1 is added. llvm-svn: 188847
-
Akira Hatanaka authored
llvm-svn: 188845
-
Jakub Staszak authored
llvm-svn: 188844
-
Akira Hatanaka authored
point registers. We will need this register class later when we add definitions for instructions mfhc1 and mthc1. Also, remove sub-register indices sub_fpeven and sub_fpodd and use sub_lo and sub_hi instead. llvm-svn: 188842
-
- Aug 20, 2013
-
-
Arnold Schwaighofer authored
When the SLP vectorizer changes the instructions in a basic block, update the iterator by restarting the traversal of that basic block. Patch by Yi Jiang! Fixes PR16899. llvm-svn: 188832
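The fix follows the general "restart the traversal after mutating the container" pattern; a generic standalone sketch of that pattern (not the SLP vectorizer sources):

    // Generic sketch of the "restart the traversal after mutation" pattern mentioned
    // above (standalone illustration, not the SLP vectorizer code). Erasing elements
    // while iterating can leave the iterator stale, so we start over whenever the
    // list changes.
    #include <cstdio>
    #include <list>

    static bool tryToTransform(std::list<int> &BB, std::list<int>::iterator It) {
      if (*It % 2 == 0) {          // stand-in for "this instruction was transformed away"
        BB.erase(It);              // mutates the container; iterators into it may be stale
        return true;
      }
      return false;
    }

    int main() {
      std::list<int> BB = {1, 2, 3, 4, 5};
      bool Changed = true;
      while (Changed) {
        Changed = false;
        for (auto It = BB.begin(); It != BB.end(); ++It) {
          if (tryToTransform(BB, It)) {
            Changed = true;
            break;                 // restart the basic-block traversal from the beginning
          }
        }
      }
      for (int V : BB) std::printf("%d ", V);
      std::printf("\n");           // prints: 1 3 5
    }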
-
Matt Arsenault authored
llvm-svn: 188831
-
Akira Hatanaka authored
load/store instructions defined. Previously, we were defining load/store instructions for each pointer size (32 and 64-bit), but now we need just one definition. llvm-svn: 188830
-
Reed Kotler authored
functions be compiled as mips32, without having to add attributes. This is useful in certain situations where you don't want to have to edit the function attributes in the source. For now it's only an option used by compiler developers when debugging the mips16 port. llvm-svn: 188826
-
Akira Hatanaka authored
assembler predicate HasStdEnc so that it is false when the target is micromips. llvm-svn: 188824
-
Jim Grosbach authored
Update testcase to be more careful about checking register values. While regexes are general goodness for these sorts of testcases, in this example, the registers are constrained by the calling convention, so we can and should check their explicit values. rdar://14779513 llvm-svn: 188819
-
Vladimir Medic authored
llvm-svn: 188798
-
Elena Demikhovsky authored
llvm-svn: 188786
-
Daniel Sanders authored
These instructions were present in a draft spec but were removed before publication. llvm-svn: 188782
-
Richard Sandiford authored
We now use MVST, CLST and SRST for the obvious cases. llvm-svn: 188781
-
Richard Sandiford authored
SystemZTargetLowering::emitStringWrapper() previously loaded the character into R0 before the loop and made R0 live on entry. I'd forgotten that allocatable registers weren't allowed to be live across blocks at this stage, and it confused LiveVariables enough to cause a miscompilation of f3 in memchr-02.ll. This patch instead loads R0 in the loop and leaves LICM to hoist it after RA. This is actually what I'd tried originally, but I went for the manual optimisation after noticing that R0 often wasn't being hoisted. This bug forced me to go back and look at why, now fixed as r188774. We should also try to optimize null checks so that they test the CC result of the SRST directly. The select between null and the SRST GPR result could then usually be deleted as dead. llvm-svn: 188779
-
Benjamin Kramer authored
llvm-svn: 188778
-
Daniel Sanders authored
llvm-svn: 188777
-
Richard Sandiford authored
Post-RA LICM keeps three sets of registers: PhysRegDefs, PhysRegClobbers and TermRegs. When it sees a definition of R it adds all aliases of R to the corresponding set, so that when it needs to test for membership it only needs to test a single register, rather than worrying about aliases there too. E.g. the final candidate loop just has:

    unsigned Def = Candidates[i].Def;
    if (!PhysRegClobbers.test(Def) && ...) {

to test whether register Def is multiply defined.

However, there was also a shortcut in ProcessMI to make sure we didn't add candidates if we already knew that they would fail the final test. This shortcut was more pessimistic than the final one because it checked whether _any alias_ of the defined register was multiply defined. This is too conservative for targets that define register pairs. E.g. on z, R0 and R1 are sometimes used as a pair, so there is a 128-bit register that aliases both R0 and R1. If a loop used R0 and R1 independently, and the definition of R0 came first, we would be able to hoist the R0 assignment (because that used the final test quoted above) but not the R1 assignment (because that meant we had two definitions of the paired R0/R1 register and would fail the shortcut in ProcessMI).

This patch just uses the same check for the ProcessMI shortcut as we use in the final candidate loop. llvm-svn: 188774
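A toy model of the difference, purely illustrative and not the MachineLICM sources, with made-up register and set names:

    // Toy model of the difference described above (illustrative only, not the MachineLICM
    // sources). On z, a 128-bit pair register aliases both R0 and R1, so a check based on
    // "was any alias of this register already defined?" wrongly rejects the second of two
    // independent definitions.
    #include <cstdio>
    #include <map>
    #include <set>
    #include <string>
    #include <vector>

    using Reg = std::string;

    int main() {
      // Alias sets: the pair register R0_R1 overlaps both R0 and R1.
      std::map<Reg, std::vector<Reg>> Aliases = {
          {"R0", {"R0", "R0_R1"}},
          {"R1", {"R1", "R0_R1"}},
      };

      // The loop defines R0 first and then R1, independently.
      std::vector<Reg> Defs = {"R0", "R1"};

      std::set<Reg> Seen;                       // plays the role of PhysRegDefs
      for (const Reg &R : Defs) {
        bool OldShortcutRejects = false;        // old ProcessMI shortcut: any alias already defined?
        for (const Reg &A : Aliases[R])
          if (Seen.count(A)) OldShortcutRejects = true;

        bool FinalTestRejects = Seen.count(R);  // final candidate test: only R itself matters

        std::printf("%s: old shortcut rejects=%d, final test rejects=%d\n",
                    R.c_str(), OldShortcutRejects, FinalTestRejects);

        for (const Reg &A : Aliases[R])         // record this def for every alias of R
          Seen.insert(A);
      }
      // Output: R0 passes both checks; R1 passes the final test but the old shortcut
      // rejects it, because the shared pair register R0_R1 was already recorded when
      // R0 was defined.
    }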
-
Tim Northover authored
Previously we used a const-pool load for virtually all 64-bit floating values. Actually, we can get quite a few common values (including 0.0, 1.0) via "vmov" instructions of one stripe or another. llvm-svn: 188773
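The immediate-form vmov.f64 value set is commonly described as +/- (16..31)/16 * 2^n with n in [-3, 4]; below is a rough standalone checker of that set. It is an assumption-laden sketch, not the ARM backend's own predicate, and 0.0 in particular is not in this set and would have to come from a different vmov form.

    // Rough standalone checker for the value set usually quoted for ARM's vmov.f64
    // floating-point immediate: +/- (16..31)/16 * 2^n with n in [-3, 4].
    #include <cmath>
    #include <cstdio>

    static bool isVMovFPImm(double D) {
      double Mag = std::fabs(D);
      for (int Exp = -3; Exp <= 4; ++Exp)
        for (int Mant = 16; Mant <= 31; ++Mant)
          if (Mag == std::ldexp(Mant / 16.0, Exp))   // both sides are exact doubles
            return true;
      return false;
    }

    int main() {
      const double Tests[] = {1.0, 0.5, 3.0, 31.0, 0.1, 1e6};
      for (double D : Tests)
        std::printf("%g -> %s\n", D, isVMovFPImm(D) ? "vmov immediate" : "needs another strategy");
      // Expected: 1.0, 0.5, 3.0 and 31.0 qualify; 0.1 and 1e6 do not.
    }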
-
Michael Gottesman authored
llvm-svn: 188772
-
Michael Gottesman authored
llvm-svn: 188771
-
Michael Gottesman authored
[stackprotector] Added significantly longer comment to FindPotentialTailCall to make clear its relationship to llvm::isInTailCallPosition. llvm-svn: 188770
-
Michael Gottesman authored
llvm-svn: 188769
-
Michael Gottesman authored
llvm-svn: 188768
-
Daniel Sanders authored
llvm-svn: 188767
-
Michael Gottesman authored
rdar://13935163 llvm-svn: 188766
-
Michael Gottesman authored
[stackprotector] Refactor out the end of isInTailCallPosition into the function returnTypeIsEligibleForTailCall. This allows me to use returnTypeIsEligibleForTailCall in the stack protector pass. rdar://13935163 llvm-svn: 188765
-
Michael Gottesman authored
llvm-svn: 188761
-
Michael Gottesman authored
Previously, generation of stack protectors was done exclusively in the pre-SelectionDAG Codegen LLVM IR Pass "Stack Protector". This necessitated splitting basic blocks at the IR level to create the success/failure basic blocks in the tail of the basic block in question. As a result of this, calls that would have qualified for the sibling call optimization were no longer eligible for optimization since said calls were no longer right in the "tail position" (i.e. the immediate predecessor of a ReturnInst instruction).

Then it was noticed that since the sibling call optimization causes the callee to reuse the caller's stack, if we could delay the generation of the stack protector check until later in CodeGen after the sibling call decision was made, we get both the tail call optimization and the stack protector check!

A few goals in solving this problem were:

1. Preserve the architecture independence of stack protector generation.
2. Preserve the normal IR level stack protector check for platforms like OpenBSD for which we support platform specific stack protector generation.

The main problem that guided the present solution is that one can not solve this problem in an architecture independent manner at the IR level only. This is because:

1. The decision on whether or not to perform a sibling call on certain platforms (for instance i386) requires lower level information related to available registers that can not be known at the IR level.
2. Even if the previous point were not true, the decision on whether to perform a tail call is done in LowerCallTo in SelectionDAG, which occurs after the Stack Protector Pass. As a result, one would need to put the relevant callinst into the stack protector check success basic block (where the return inst is placed) and then move it back later at SelectionDAG/MI time before the stack protector check if the tail call optimization failed. The MI level option was nixed immediately since it would require platform specific pattern matching. The SelectionDAG level option was nixed because SelectionDAG only processes one IR level basic block at a time, implying one could not create a DAG Combine to move the callinst.

To get around this problem a few things were realized:

1. While one can not handle multiple IR level basic blocks at the SelectionDAG level, one can generate multiple machine basic blocks for one IR level basic block. This is how we handle bit tests and switches.
2. At the MI level, tail calls are represented via a special return MIInst called "tcreturn". Thus if we know the basic block in which we wish to insert the stack protector check, we get the correct behavior by always inserting the stack protector check right before the return statement. This is a "magical transformation" since no matter where the stack protector check intrinsic is, we always insert the stack protector check code at the end of the BB.

Given the aforementioned constraints, the following solution was devised:

1. On platforms that do not support SelectionDAG stack protector check generation, allow for the normal IR level stack protector check generation to continue.
2. On platforms that do support SelectionDAG stack protector check generation:
   a. Use the IR level stack protector pass to decide if a stack protector is required/which BB we insert the stack protector check in by reusing the logic already therein. If we wish to generate a stack protector check in a basic block, we place a special IR intrinsic called llvm.stackprotectorcheck right before the BB's returninst or, if there is a callinst that could potentially be sibling call optimized, before the call inst.
   b. Then when a BB with said intrinsic is processed, we codegen the BB normally via SelectBasicBlock. In said process, when we visit the stack protector check, we do not actually emit anything into the BB. Instead, we just initialize the stack protector descriptor class (which involves stashing information/creating the success mbb and the failure mbb if we have not created one for this function yet) and export the guard variable that we are going to compare.
   c. After we finish selecting the basic block, in FinishBasicBlock, if the StackProtectorDescriptor attached to the SelectionDAGBuilder is initialized, we first find a splice point in the parent basic block before the terminator and then splice the terminator of said basic block into the success basic block. Then we code-gen a new tail for the parent basic block consisting of the two loads, the comparison, and finally two branches to the success/failure basic blocks. We conclude by code-gening the failure basic block if we have not code-gened it already (all stack protector checks we generate in the same function use the same failure basic block).

llvm-svn: 188755
-
Craig Topper authored
llvm-svn: 188746
-
Craig Topper authored
llvm-svn: 188745
-
Craig Topper authored
Move AVX and non-AVX replication inside a couple multiclasses to avoid repeating each instruction for both individually. llvm-svn: 188743
-