- May 20, 2013
-
-
Justin Holewinski authored
llvm-svn: 182297
-
Hal Finkel authored
As the pairing of this instruction form with the bdnz/bdz branches is now enforced by the verification pass, make it clear from the name that these are used only for counter-based loops. No functionality change intended. llvm-svn: 182296
-
Hal Finkel authored
When asserts are enabled, this adds a verification pass for PPC counter-loop formation. Unfortunately, without sacrificing code quality, there is no better way of forming counter-based loops except at the (late) IR level. This means that we need to recognize, at the IR level, anything which might turn into a function call (or indirect branch). Because this is currently a finite set of things, and because SelectionDAG lowering is basic-block local, this can be done. Nevertheless, it is fragile, and failure results in a miscompile. This verification pass checks that all (reachable) counter-based branches are dominated by a loop mtctr instruction, and that no instructions in between clobber the counter register. If these conditions are not satisfied, then an ICE will be triggered. In short, this is to help us sleep better at night. llvm-svn: 182295
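The invariant being checked can be modeled with a small standalone sketch (hypothetical types and names; the real pass works on machine instructions and also handles control flow between blocks):

```cpp
#include <vector>

// Toy model: within a straight-line instruction sequence, every
// counter-based branch must be preceded by an mtctr, with no
// intervening instruction that clobbers the counter register.
enum class Kind { MTCTR, ClobbersCTR, CTRBranch, Other };

static bool verifyCTRUse(const std::vector<Kind> &Seq) {
  bool CTRValid = false;                  // seen a live mtctr?
  for (Kind K : Seq) {
    switch (K) {
    case Kind::MTCTR:       CTRValid = true;  break;
    case Kind::ClobbersCTR: CTRValid = false; break;
    case Kind::CTRBranch:
      if (!CTRValid)
        return false;                     // the real pass triggers an ICE here
      break;
    case Kind::Other:       break;
    }
  }
  return true;
}
```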
-
Benjamin Kramer authored
R600TextureIntrinsicsReplacer.cpp:232: warning: the address of ‘ArgsType’ will always evaluate as ‘true’. This doesn't have any effect on the output, as a vararg intrinsic behaves the same way as a non-vararg one. llvm-svn: 182293
-
Tom Stellard authored
This will simplify the instructions and also the pattern definitions. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 182288
-
Tom Stellard authored
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 182287
-
Tom Stellard authored
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 182286
-
Tom Stellard authored
The hardware supports rotr and not rotl. llvm-svn: 182285
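For reference, the reason this lowering is always possible: a rotate-left is just a rotate-right by the complementary amount. A minimal standalone sketch (not the backend's pattern definitions):

```cpp
#include <cstdint>

// rotr is what the hardware provides; rotl can always be lowered onto it:
//   rotl(x, n) == rotr(x, 32 - n)   (amounts taken modulo 32)
static uint32_t rotr32(uint32_t x, uint32_t n) {
  n &= 31;                                  // keep the amount in range
  return (x >> n) | (x << ((32 - n) & 31));
}

static uint32_t rotl32(uint32_t x, uint32_t n) {
  return rotr32(x, 32 - (n & 31));          // rewrite rotl as rotr
}
```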
-
Tom Stellard authored
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 182284
-
Tom Stellard authored
This makes it possible to reorder the operands without breaking the encoding. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 182283
-
Tom Stellard authored
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 182282
-
Mihai Popa authored
VSTn instructions have a number of encoding constraints which are not implemented. I have added these using wrapper methods around the original custom decoder. (Incidentally, this is a huge, poorly written method that should be cleaned up; I have left it as is since the changes would be much too hard to review.) llvm-svn: 182281
-
Mihai Popa authored
Q registers are encoded in fields of the same length as D registers. As there are half as many Q registers, the ARM reference manual mandates that the least significant bit be zeroed out; failure to do so should result in an undefined instruction. With this change, test/MC/Disassembler/ARM/invalid-VQADD-arm.txt passes (XFAIL removed). llvm-svn: 182279
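The added constraint boils down to a one-bit check on the register field. A hypothetical helper for illustration (the actual change wraps the existing custom decoder instead):

```cpp
#include <cstdint>

enum class DecodeStatus { Success, Fail };

// A Q register is encoded in a D-register-sized field, but there are only
// half as many Q registers, so the low bit of the field must be zero;
// an odd value is an undefined instruction.
static DecodeStatus checkQRegField(uint32_t Field) {
  if (Field & 1)
    return DecodeStatus::Fail;   // odd encodings cannot name a Q register
  return DecodeStatus::Success;  // Q register index is Field / 2
}
```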
-
Richard Sandiford authored
Before this change, the SystemZ backend would use BRCL for all branches and only consider shortening them to BRC when generating an object file. E.g. a branch on equal would use the JGE alias of BRCL in assembly output, but might be shortened to the JE alias of BRC in ELF output. This was a useful first step, but it had two problems:

(1) The z assembler isn't traditionally supposed to perform branch shortening or branch relaxation. We followed this rule by not relaxing branches in assembler input, but that meant that generating assembly code and then assembling it would not produce the same result as going directly to object code; the former would give long branches everywhere, whereas the latter would use short branches where possible.

(2) Other useful branches, like COMPARE AND BRANCH, do not have long forms. We would need to do something else before supporting them. (Although COMPARE AND BRANCH does not change the condition codes, the plan is to model COMPARE AND BRANCH as a CC-clobbering instruction during codegen, so that we can safely lower it to a separate compare and long branch where necessary. This is not a valid transformation for the assembler proper to make.)

This patch therefore moves branch relaxation to a pre-emit pass. For now, calls are still shortened from BRASL to BRAS by the assembler, although this too is not really the traditional behaviour.

The first test takes about 1.5s to run, and there are likely to be more tests in this vein once further branch types are added. The feeling on IRC was that 1.5s is a bit much for a single test, so I've restricted it to SystemZ hosts for now.

The patch exposes (and fixes) some typos in the main CodeGen/SystemZ tests. A later patch will remove the {{g}}s from that directory. llvm-svn: 182274
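The central question the pre-emit pass asks is whether a branch's displacement still fits the short form. A sketch of that check under stated assumptions (BRC's RI-format displacement is a 16-bit signed halfword count; the helper name is hypothetical):

```cpp
#include <cstdint>

// BRC encodes its target as a 16-bit signed displacement counted in
// 2-byte halfwords, so the reachable byte range is [-65536, 65534].
static bool fitsInShortBranch(int64_t ByteDistance) {
  return ByteDistance % 2 == 0 &&
         ByteDistance >= -65536 && ByteDistance <= 65534;
}
```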
-
Justin Holewinski authored
This converter currently only handles global variables in address space 0. These variables are promoted to address space 1 (global memory), and all uses are updated to point to the result of a cvta.global instruction on the new variable. The motivation is that address space 0 global variables are illegal, since we cannot declare variables in the generic address space. Instead, we place the variables in address space 1 and explicitly convert the pointer back to address space 0. This is primarily intended to help new users who expect to be able to place global variables in the default address space. llvm-svn: 182254
-
Justin Holewinski authored
[NVPTX] Fix i1 kernel parameters and global variables. ABI rules say we need to use .u8 for i1 parameters for kernels. llvm-svn: 182253
-
Stepan Dyatkovskiy authored
Introduction: when the stack alignment is 8 and the GPR part of a parameter is not a multiple of 8 bytes, we add padding to the GPR part so that the part's last byte is recovered at address K*8-1. We need this because the remaining (stack) part of the parameter starts at address K*8, and the "GPRs head" must attach to it without gaps:

Stack:
|---- 8 bytes block ----| |---- 8 bytes block ----| |---- 8 bytes...
[ [padding] [GPRs head] ] [ ------ Tail passed via stack ------ ...

Fix: note that once we have added padding, we need to correct the offsets of *all* arguments that come after the padded one. That is why we need this fix: argument offsets were never corrected before this patch. See the new test cases included in the patch.

We also don't need to insert padding for byval parameters that are stored entirely in GPRs. Only the last byval parameter needs padding, and only when it extends beyond the GPRs and the stack alignment is 8. The stack area allocated for recovered byval parameters must still satisfy the "size mod 8 == 0" restriction.

This patch reduces stack usage in some cases: we can shrink the ArgRegsSaveArea, since inner N*4-byte byval parameters may be "packed" with alignment 4 in some cases. llvm-svn: 182237
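The padding amount follows directly from the rule above. A small illustrative helper (hypothetical name; stack alignment of 8 assumed):

```cpp
// A GPR head of Size bytes needs (Align - Size % Align) % Align bytes of
// padding in front of it so that its last byte lands at address K*8 - 1.
static unsigned byvalHeadPadding(unsigned Size, unsigned Align = 8) {
  unsigned Rem = Size % Align;
  return Rem ? Align - Rem : 0;   // e.g. a 12-byte head gets 4 bytes
}
```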
-
Jakob Stoklund Olesen authored
llvm-svn: 182229
-
Jakob Stoklund Olesen authored
llvm-svn: 182228
-
Jakob Stoklund Olesen authored
llvm-svn: 182227
-
Benjamin Kramer authored
llvm-svn: 182226
-
- May 19, 2013
-
-
Jakob Stoklund Olesen authored
The hardwired physreg doesn't work on tied operands like on MOVXCC. Add a README note to fix this later. llvm-svn: 182225
-
Jakob Stoklund Olesen authored
llvm-svn: 182224
-
Jakob Stoklund Olesen authored
llvm-svn: 182222
-
Jakob Stoklund Olesen authored
Also clean up the arguments to all the MOVCC instructions so the operands always are (true-val, false-val, cond-code). llvm-svn: 182221
-
Venkatraman Govindaraju authored
[Sparc] Rearrange the integer registers' allocation order so that the register allocator will use I and G registers before L and O registers. Also, enable registers %g2-%g4 to be used in applications, and %g5 in 64-bit mode. llvm-svn: 182219
-
Jakob Stoklund Olesen authored
llvm-svn: 182216
-
- May 18, 2013
-
-
Hal Finkel authored
We don't need to reject all inline asm as using the counter register (most of it does not). Only asm that explicitly clobbers the counter register needs to prevent the transformation. llvm-svn: 182191
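Conceptually, the new check reduces to looking for an explicit counter-register clobber in the asm's constraints (a hypothetical helper; the pass inspects the parsed constraint list rather than a raw string):

```cpp
#include <string>

// Only inline asm that explicitly clobbers the counter register, e.g. via
// a "~{ctr}" constraint, should block counter-loop formation.
static bool clobbersCTR(const std::string &Constraints) {
  return Constraints.find("~{ctr}") != std::string::npos;
}
```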
-
Tim Northover authored
llvm-svn: 182190
-
David Majnemer authored
The peephole tries to reorder MOV32r0 instructions such that they are before the instruction that modifies EFLAGS. The problem is that the peephole does not consider the case where the instruction that modifies EFLAGS also depends on the previous state of EFLAGS. Instead, walk backwards until we find an instruction that has a def for EFLAGS but does not have a use. If we find such an instruction, insert the MOV32r0 before it. If we cannot find such an instruction, skip the optimization. llvm-svn: 182184
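A simplified, single-block sketch of that backward walk (assumes MachineInstr's definesRegister/readsRegister helpers; the EFLAGS register number is passed in to keep the sketch target-neutral):

```cpp
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineInstr.h"

using namespace llvm;

// Walk backwards from I looking for an instruction that defines EFLAGS
// without reading it; inserting the MOV32r0 before such an instruction is
// safe because its EFLAGS clobber is immediately overwritten. Returning
// MBB.end() means no safe point was found and the optimization is skipped.
static MachineBasicBlock::iterator
findSafeInsertionPoint(MachineBasicBlock &MBB, MachineBasicBlock::iterator I,
                       unsigned EFLAGS) {
  while (I != MBB.begin()) {
    --I;
    if (I->definesRegister(EFLAGS)) {
      if (I->readsRegister(EFLAGS))
        return MBB.end();  // reads and writes EFLAGS (e.g. ADC): give up
      return I;            // defines EFLAGS without reading it: safe point
    }
  }
  return MBB.end();
}
```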
-
Matt Arsenault authored
llvm-svn: 182180
-
JF Bastien authored
This patch matches GCC behavior: the code used to allow unaligned load/store on ARM only for v6+ Darwin; it will now allow unaligned load/store for v6+ Darwin as well as for v7+ on Linux and NaCl. The distinction is made because v6 doesn't guarantee support (but LLVM assumes that Apple controls hardware+kernel and therefore has conformant v6 CPUs), whereas v7 does provide this guarantee (and Linux/NaCl behave sanely). The patch keeps the -arm-strict-align command line option and adds -arm-no-strict-align; they behave similarly to GCC's -mstrict-align and -mno-strict-align. I originally encountered this discrepancy in FastIsel tests which expect unaligned load/store generation. Overall this should slightly improve performance in most cases because of reduced I$ pressure. llvm-svn: 182175
-
Rafael Espindola authored
The errors were: "non-constant-expression cannot be narrowed from type 'int64_t' (aka 'long') to 'uint32_t' (aka 'unsigned int') in initializer list" and "non-constant-expression cannot be narrowed from type 'long' to 'uint32_t' (aka 'unsigned int') in initializer list". llvm-svn: 182168
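The underlying rule is C++11's ban on implicitly narrowing a non-constant expression inside a braced initializer list; a minimal reproduction (not the original code):

```cpp
#include <cstdint>

void example(int64_t v) {
  // uint32_t a[] = {v};   // error: non-constant-expression cannot be
  //                       // narrowed from 'int64_t' to 'uint32_t'
  uint32_t a[] = {static_cast<uint32_t>(v)};  // explicit cast compiles
  (void)a;
}
```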
-
- May 17, 2013
-
-
Vincent Lejeune authored
It fixes a bug, uncovered by the dot4 patch, where the register class of an int_load_input use was ignored. llvm-svn: 182130
-
Vincent Lejeune authored
llvm-svn: 182129
-
Vincent Lejeune authored
It should increase PV substitution opportunities and lower GPR usage (pending computation paths are "flushed" sooner). llvm-svn: 182128
-
Vincent Lejeune authored
llvm-svn: 182127
-
Vincent Lejeune authored
Dot4 now uses 8 scalar operands instead of 2 vector ones, which allows the register coalescer to remove some unneeded COPYs. This patch also defines some structures/functions that can be used to handle every vector instruction (CUBE, Cayman special instructions...) in a similar fashion. llvm-svn: 182126
-
Vincent Lejeune authored
llvm-svn: 182125
-
Vincent Lejeune authored
Almost all instructions that take a 128-bit register as input (fetch, export...) have the ability to swizzle their argument and output. Instead of printing a default swizzle for each 128-bit register, rename T*.XYZW to T* and let instructions print potentially optimized swizzles themselves. llvm-svn: 182124
-