- May 19, 2013
-
Venkatraman Govindaraju authored
[Sparc] Rearrange integer registers' allocation order so that the register allocator will use I and G registers before L and O registers. Also, enable registers %g2-%g4 to be used in applications, and %g5 in 64-bit mode. llvm-svn: 182219
-
Jakob Stoklund Olesen authored
llvm-svn: 182216
-
- May 18, 2013
-
Hal Finkel authored
We don't need to reject all inline asm as using the counter register (most of it does not). Only asm that explicitly clobbers the counter register needs to prevent the transformation. llvm-svn: 182191
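The check this implies can be sketched with LLVM's inline asm constraint API (the helper name is hypothetical; the actual pass code may be structured differently):

    #include "llvm/IR/InlineAsm.h"

    // Hypothetical helper: only inline asm that explicitly names the
    // counter register as a clobber has to block the CTR-loop transform.
    static bool asmClobbersCTR(const llvm::InlineAsm *IA) {
      llvm::InlineAsm::ConstraintInfoVector CIV = IA->ParseConstraints();
      for (unsigned i = 0, ie = CIV.size(); i != ie; ++i) {
        const llvm::InlineAsm::ConstraintInfo &C = CIV[i];
        if (C.Type != llvm::InlineAsm::isClobber)
          continue;
        for (unsigned j = 0, je = C.Codes.size(); j != je; ++j)
          if (C.Codes[j] == "{ctr}")
            return true; // explicit CTR clobber: reject the transform
      }
      return false; // asm that leaves CTR alone is fine
    }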
-
Tim Northover authored
llvm-svn: 182190
-
David Majnemer authored
The peephole tries to reorder MOV32r0 instructions so that they come before the instruction that modifies EFLAGS. The problem is that the peephole does not consider the case where the instruction that modifies EFLAGS also depends on the previous state of EFLAGS. Instead, walk backwards until we find an instruction that defines EFLAGS but does not use it. If we find such an instruction, insert the MOV32r0 before it; if we cannot find one, skip the optimization. llvm-svn: 182184
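A sketch of the backward scan (MovPos and MBB are illustrative names; the real peephole's iterator handling differs):

    // Scan backwards from the MOV32r0 for an instruction that defines
    // EFLAGS without also reading it. That point is safe: EFLAGS is
    // overwritten there regardless of its prior value.
    static bool findSafePoint(MachineBasicBlock &MBB,
                              MachineBasicBlock::iterator MovPos) {
      MachineBasicBlock::reverse_iterator I(MovPos), E = MBB.rend();
      for (; I != E; ++I)
        if (I->definesRegister(X86::EFLAGS) &&
            !I->readsRegister(X86::EFLAGS))
          return true; // insert the MOV32r0 before *I
      return false;    // no safe point found: skip the optimization
    }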
-
Matt Arsenault authored
llvm-svn: 182180
-
JF Bastien authored
This patch matches GCC behavior: the code used to allow unaligned load/store on ARM only for v6+ Darwin; it will now allow unaligned load/store for v6+ Darwin as well as for v7+ on Linux and NaCl. The distinction is made because v6 doesn't guarantee support (but LLVM assumes that Apple controls hardware+kernel and therefore has conformant v6 CPUs), whereas v7 does provide this guarantee (and Linux/NaCl behave sanely). The patch keeps the -arm-strict-align command line option and adds -arm-no-strict-align. They behave similarly to GCC's -mstrict-align and -mno-strict-align. I originally encountered this discrepancy in FastISel tests which expect unaligned load/store generation. Overall this should slightly improve performance in most cases because of reduced I$ pressure. llvm-svn: 182175
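The resulting policy can be summarized by a predicate like the following sketch (the helper is illustrative; method names follow ARMSubtarget conventions rather than quoting the patch):

    // Illustrative summary of when unaligned load/store is allowed.
    bool allowsUnalignedMem(const ARMSubtarget &ST) {
      if (ST.isTargetDarwin())
        return ST.hasV6Ops();  // Apple's v6 hardware/kernel assumed conformant
      if (ST.isTargetLinux() || ST.isTargetNaCl())
        return ST.hasV7Ops();  // v7 architecturally guarantees support
      return false;            // elsewhere, keep strict alignment
    }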
-
Rafael Espindola authored
The errors were: "non-constant-expression cannot be narrowed from type 'int64_t' (aka 'long') to 'uint32_t' (aka 'unsigned int') in initializer list" and "non-constant-expression cannot be narrowed from type 'long' to 'uint32_t' (aka 'unsigned int') in initializer list". llvm-svn: 182168
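For reference, a minimal reproduction of this class of C++11 error and its usual fix (the names below are made up for illustration):

    #include <cstdint>

    int64_t getOffset(); // some non-constant value

    // C++11 list-initialization rejects narrowing a non-constant int64_t
    // to uint32_t:
    //   uint32_t bad[] = { getOffset() };  // error: cannot be narrowed
    // An explicit cast makes the truncation intentional and well-formed:
    uint32_t good[] = { static_cast<uint32_t>(getOffset()) };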
-
- May 17, 2013
-
Vincent Lejeune authored
It fixes a bug uncovered by the dot4 patch where the register class of the int_load_input use was ignored. llvm-svn: 182130
-
Vincent Lejeune authored
llvm-svn: 182129
-
Vincent Lejeune authored
It should increase PV substitution opportunities and lower GPR usage (pending computation paths are "flushed" sooner). llvm-svn: 182128
-
Vincent Lejeune authored
llvm-svn: 182127
-
Vincent Lejeune authored
Dot4 now uses 8 scalar operands instead of 2 vector ones, which allows the register coalescer to remove some unneeded COPYs. This patch also defines some structures/functions that can be used to handle every vector instruction (CUBE, Cayman special instructions...) in a similar fashion. llvm-svn: 182126
-
Vincent Lejeune authored
llvm-svn: 182125
-
Vincent Lejeune authored
Almost all instructions that take a 128-bit reg as input (fetch, export...) have the ability to swizzle their argument and output. Instead of printing a default swizzle for each 128-bit reg, rename T*.XYZW to T* and let instructions print potentially optimized swizzles themselves. llvm-svn: 182124
-
Vincent Lejeune authored
llvm-svn: 182123
-
Vincent Lejeune authored
llvm-svn: 182122
-
Vincent Lejeune authored
llvm-svn: 182121
-
Tom Stellard authored
Reviewed-by: Vincent Lejeune <vljn@ovi.com>
https://bugs.freedesktop.org/show_bug.cgi?id=64193
https://bugs.freedesktop.org/show_bug.cgi?id=64257
https://bugs.freedesktop.org/show_bug.cgi?id=64320
NOTE: This is a candidate for the 3.3 branch.
llvm-svn: 182113
-
Tom Stellard authored
llvm-svn: 182112
-
Venkatraman Govindaraju authored
This is to generate correct frame setup code when the function has variable-sized allocas. llvm-svn: 182108
-
Benjamin Kramer authored
Shuffles that only move an element into position 0 of the vector are common in the output of the loop vectorizer and often generate suboptimal code when SSSE3 is not available. Lower them to vector shifts if possible. We still prefer palignr over psrldq because it has higher throughput on Sandy Bridge. llvm-svn: 182102
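As a standalone illustration of the pattern (intrinsics rather than the backend code itself): moving the top lane of a v4i32 into position 0 with zero fill is exactly a whole-register byte shift, available on plain SSE2:

    #include <emmintrin.h> // SSE2: _mm_srli_si128 == psrldq

    // shufflevector <4 x i32> %v, <4 x i32> zeroinitializer,
    //               <4 x i32> <i32 3, i32 4, i32 4, i32 4>
    // i.e. lane 3 of %v moves to lane 0, all other lanes become zero.
    __m128i lane3_to_lane0(__m128i v) {
      return _mm_srli_si128(v, 12); // 12-byte logical shift of the register
    }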
-
Ulrich Weigand authored
[PowerPC] Fix hi/lo encoding in old-style code emitter

This patch implements the equivalent change to r182091/r182092 in the old-style code emitter. Instead of having two separate 16-bit immediate encoding routines depending on the instruction, this patch introduces a single encoder that checks the machine operand flags to decide whether the low or high half of a symbol address is required. Since now both encoders make no further distinction between "symbolLo" and "symbolHi", the .td operand can now use a single getS16ImmEncoding method.

Tested by running the old-style JIT tests on 32-bit Linux.

llvm-svn: 182097
-
Ulrich Weigand authored
[PowerPC] Merge/rename PPC fixup types

Now that fixup_ppc_ha16 and fixup_ppc_lo16 are being treated exactly the same everywhere, it no longer makes sense to have two fixup types. This patch merges them both into a single type fixup_ppc_half16, and renames fixup_ppc_lo16_ds to fixup_ppc_half16ds for consistency. (The half16 and half16ds names are taken from the description of relocation types in the PowerPC ABI.)

No change in code generation expected.

llvm-svn: 182092
-
Ulrich Weigand authored
[PowerPC] Fix processing of ha16/lo16 fixups

The current PowerPC MC back end distinguishes between fixup_ppc_ha16 and fixup_ppc_lo16, which are determined by the instruction the fixup applies to, and uses this distinction to decide whether a fixup ought to resolve to the high or the low part of a symbol address.

This isn't quite correct, however. It is valid (if unusual) assembler to use, e.g.

    li 1, symbol@ha

or

    lis 1, symbol@l

Whether the high or the low part of the address is used depends solely on the @ suffix, not on the instruction. In addition, both

    li 1, symbol

and

    lis 1, symbol

are valid, assuming the symbol address fits into 16 bits; again, both will then refer to the actual symbol value (so li will load the value itself, while lis will load the value shifted by 16).

To fix this, two places need to be adapted. If the fixup cannot be resolved at assembler time, a relocation needs to be emitted via PPCELFObjectWriter::getRelocType. This routine already looks at the VK_ type to determine the relocation. The only problem is that it will reject any _LO modifier in a ha16 fixup and vice versa. This is simply incorrect; any of those modifiers ought to be accepted for either fixup type.

If the fixup *can* be resolved at assembler time, adjustFixupValue currently selects the high bits of the symbol value if the fixup type is ha16. Again, this is incorrect; see the lis 1, symbol example above.

Now, in theory we'd have to respect a VK_ modifier here. However, in fact common code never even attempts to resolve symbol references using any nontrivial VK_ modifier at assembler time; it will always fall back to emitting a reloc and letting the linker handle it. If this ever changes, presumably there'd have to be a target callback to resolve VK_ modifiers. We'd then have to handle @ha etc. there.

llvm-svn: 182091
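With the merged fixup types from r182092, the assembler-time path can reduce to a sketch like this (a simplification of the adjusted adjustFixupValue, not a verbatim copy):

    // Once @ha/@l selection is folded into the fixup value by the generic
    // expression evaluator, the target hook only masks to field size.
    static uint64_t adjustFixupValue(unsigned Kind, uint64_t Value) {
      switch (Kind) {
      default:
        return Value;
      case PPC::fixup_ppc_half16:
        return Value & 0xffff; // plain 16-bit immediate field
      case PPC::fixup_ppc_half16ds:
        return Value & 0xfffc; // 14-bit field, low two bits implied zero
      }
    }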
-
Benjamin Kramer authored
llvm-svn: 182086
-
Christian König authored
This is a candidate for the stable branch.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=64694
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 182084
-
Venkatraman Govindaraju authored
llvm-svn: 182063
-
- May 16, 2013
-
Akira Hatanaka authored
Previously, three instructions were needed:

    trunc.w.s $f0, $f2
    mfc1 $4, $f0
    sw $4, 0($2)

Now we need only two:

    trunc.w.s $f0, $f2
    swc1 $f0, 0($2)

llvm-svn: 182053
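The source-level pattern that now lowers to the shorter sequence is simply a float-to-int conversion whose only use is a store:

    // After this change the truncated value is stored straight from the
    // FPU register with swc1 instead of round-tripping through a GPR.
    void store_trunc(int *p, float f) {
      *p = (int)f; // trunc.w.s + swc1
    }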
-
Rafael Espindola authored
Now that we have good testing, remove addFrameMove and create cfi instructions directly. llvm-svn: 182052
-
Akira Hatanaka authored
llvm-svn: 182050
-
Jack Carter authored
This patch removes the alias definition for addiu $rs,$imm and instead uses the TwoOperandAliasConstraint field in the ArithLogicI instruction class. This way all instructions that inherit the ArithLogicI class have the same macro defined. Usage examples are added to the test files. Patch by Vladimir Medic. llvm-svn: 182048
-
Jack Carter authored
llvm-svn: 182047
-
Hal Finkel authored
Some IR-level instructions (such as FP <-> i64 conversions) are not chained w.r.t. the mtctr intrinsic and yet may become function calls that clobber the counter register. At the selection-DAG level, these might be reordered with the mtctr intrinsic, causing miscompiles. To avoid this situation, if an existing preheader has instructions that might use the counter register, create a new preheader for the mtctr intrinsic. This extra block will be remerged with the old preheader at the MI level, but will prevent unwanted reordering at the selection-DAG level. llvm-svn: 182045
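A sketch of the split using the generic utility (mightUseCTR is a hypothetical stand-in for the pass's actual check, and the fragment assumes it runs inside an IR-level loop pass):

    #include "llvm/Transforms/Utils/BasicBlockUtils.h"

    // If the existing preheader contains anything that might become a
    // CTR-clobbering call (FP <-> i64 conversions, etc.), give the mtctr
    // setup a fresh block of its own. The blocks merge again at the MI
    // level, but the DAG scheduler can no longer reorder across them.
    BasicBlock *Preheader = L->getLoopPreheader();
    if (mightUseCTR(Preheader)) // hypothetical helper
      Preheader = SplitBlock(Preheader, Preheader->getTerminator(), this);
    // ... emit the CTR setup into the (possibly new) Preheader ...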
-
Akira Hatanaka authored
llvm-svn: 182044
-
Akira Hatanaka authored
invalid instruction sequence. Rather than emitting an int-to-FP move instruction and an int-to-FP conversion instruction during instruction selection, we emit a pseudo instruction which gets expanded post-RA. Without this change, register allocation can possibly insert a floating point register move instruction between the two instructions, which is not valid according to the ISA manual:

    mtc1 $f4, $4     # int-to-FP move instruction
    mov.s $f2, $f4   # move contents of $f4 to $f2
    cvt.s.w $f0, $f2 # int-to-FP conversion

llvm-svn: 182042
-
Jack Carter authored
This patch adds bnez and beqz instructions which represent alias definitions for bne and beq instructions as follows:

    bnez $rs,$imm => bne $rs,$zero,$imm
    beqz $rs,$imm => beq $rs,$zero,$imm

The corresponding test cases are added. Patch by Vladimir Medic. llvm-svn: 182040
-
Akira Hatanaka authored
llvm-svn: 182036
-
Akira Hatanaka authored
llvm-svn: 182035
-
Ulrich Weigand authored
[PowerPC] Use true offset value in "memrix" machine operands

This is the second part of the change to always return "true" offset values from getPreIndexedAddressParts, tackling the case of "memrix" type operands.

This is about instructions like LD/STD that only have a 14-bit field to encode immediate offsets, which are implicitly extended by two zero bits by the machine, so that in effect we can access 16-bit offsets as long as they are a multiple of 4. The PowerPC back end currently handles such instructions by carrying the 14-bit value (as it will get encoded into the actual machine instructions) in the machine operand fields for such instructions. This means that those values are in fact not the true offset, but rather the offset divided by 4 (and then truncated to an unsigned 14-bit value).

Like in the case fixed in r182012, this makes common code operations on such offset values not work as expected. Furthermore, there doesn't really appear to be any strong reason why we should encode machine operands this way.

This patch therefore changes the encoding of "memrix" type machine operands to simply contain the "true" offset value as a signed immediate value, while enforcing the rules that it must fit in a 16-bit signed value and must also be a multiple of 4. This change must be made simultaneously in all places that access machine operands of this type. However, just about all those changes make the code simpler; in many cases we can now just share the same code for memri and memrix operands.

llvm-svn: 182032
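A minimal sketch of the invariant under the new scheme (the helper name is illustrative, not the actual encoder entry point):

    #include <cassert>
    #include "llvm/Support/MathExtras.h"

    // The machine operand now carries the true byte offset; the
    // divide-by-4 happens only when the 14-bit DS-form field is encoded.
    static uint64_t encodeMemRIXOffset(int64_t Offset) {
      assert(llvm::isInt<16>(Offset) && (Offset & 3) == 0 &&
             "memrix offset must be a 16-bit signed multiple of 4");
      return (uint64_t(Offset) >> 2) & 0x3fff; // hardware appends two 0 bits
    }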
-