- May 20, 2013
-
-
Justin Holewinski authored
llvm-svn: 182297
-
Hal Finkel authored
As the pairing of this instruction form with the bdnz/bdz branches is now enforced by the verification pass, make it clear from the name that these are used only for counter-based loops. No functionality change intended. llvm-svn: 182296
-
Hal Finkel authored
When asserts are enabled, this adds a verification pass for PPC counter-loop formation. Unfortunately, without sacrificing code quality, there is no better way of forming counter-based loops except at the (late) IR level. This means that we need to recognize, at the IR level, anything which might turn into a function call (or indirect branch). Because this is currently a finite set of things, and because SelectionDAG lowering is basic-block local, this can be done. Nevertheless, it is fragile, and failure results in a miscompile. This verification pass checks that all (reachable) counter-based branches are dominated by a loop mtctr instruction, and that no instructions in between clobber the counter register. If these conditions are not satisfied, then an ICE will be triggered. In short, this is to help us sleep better at night. llvm-svn: 182295
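The invariant being checked can be modeled with a small standalone sketch (hypothetical types and names; the real pass works on machine instructions and also handles control flow between blocks):

```cpp
#include <vector>

// Toy model: within a straight-line instruction sequence, every
// counter-based branch must be preceded by an mtctr, with no
// intervening instruction that clobbers the counter register.
enum class Kind { MTCTR, ClobbersCTR, CTRBranch, Other };

static bool verifyCTRUse(const std::vector<Kind> &Seq) {
  bool CTRValid = false;                  // seen a live mtctr?
  for (Kind K : Seq) {
    switch (K) {
    case Kind::MTCTR:       CTRValid = true;  break;
    case Kind::ClobbersCTR: CTRValid = false; break;
    case Kind::CTRBranch:
      if (!CTRValid)
        return false;                     // the real pass triggers an ICE here
      break;
    case Kind::Other:       break;
    }
  }
  return true;
}
```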
-
Benjamin Kramer authored
R600TextureIntrinsicsReplacer.cpp:232: warning: the address of ‘ArgsType’ will always evaluate as ‘true’. This doesn't have any effect on the output, as a vararg intrinsic behaves the same way as a non-vararg one. llvm-svn: 182293
-
Tom Stellard authored
This will simplify the instructions and also the pattern definitions. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 182288
-
Tom Stellard authored
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 182287
-
Tom Stellard authored
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 182286
-
Tom Stellard authored
The hardware supports rotr and not rotl. llvm-svn: 182285
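For reference, the reason this lowering is always possible: a rotate-left is just a rotate-right by the complementary amount. A minimal standalone sketch (not the backend's pattern definitions):

```cpp
#include <cstdint>

// rotr is what the hardware provides; rotl can always be lowered onto it:
//   rotl(x, n) == rotr(x, 32 - n)   (amounts taken modulo 32)
static uint32_t rotr32(uint32_t x, uint32_t n) {
  n &= 31;                                  // keep the amount in range
  return (x >> n) | (x << ((32 - n) & 31));
}

static uint32_t rotl32(uint32_t x, uint32_t n) {
  return rotr32(x, 32 - (n & 31));          // rewrite rotl as rotr
}
```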
-
Tom Stellard authored
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 182284
-
Tom Stellard authored
This makes it possible to reorder the operands without breaking the encoding. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 182283
-
Tom Stellard authored
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 182282
-
Mihai Popa authored
VSTn instructions have a number of encoding constraints which are not implemented. I have added these using wrapper methods around the original custom decoder. (Incidentally, this is a huge, poorly written method that should be cleaned up; I have left it as is since the changes would be much too hard to review.) llvm-svn: 182281
-
Mihai Popa authored
Q registers are encoded in fields of the same length as D registers. As there are half as many Q registers, the ARM reference manual mandates that the least significant bit be zeroed out; failure to do so should result in an undefined instruction. With this change, test/MC/Disassembler/ARM/invalid-VQADD-arm.txt passes (XFAIL removed). llvm-svn: 182279
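The added constraint boils down to a one-bit check on the register field. A hypothetical helper for illustration (the actual change wraps the existing custom decoder instead):

```cpp
#include <cstdint>

enum class DecodeStatus { Success, Fail };

// A Q register is encoded in a D-register-sized field, but there are only
// half as many Q registers, so the low bit of the field must be zero;
// an odd value is an undefined instruction.
static DecodeStatus checkQRegField(uint32_t Field) {
  if (Field & 1)
    return DecodeStatus::Fail;   // odd encodings cannot name a Q register
  return DecodeStatus::Success;  // Q register index is Field / 2
}
```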
-
Richard Sandiford authored
Before this change, the SystemZ backend would use BRCL for all branches and only consider shortening them to BRC when generating an object file. E.g. a branch on equal would use the JGE alias of BRCL in assembly output, but might be shortened to the JE alias of BRC in ELF output. This was a useful first step, but it had two problems:

(1) The z assembler isn't traditionally supposed to perform branch shortening or branch relaxation. We followed this rule by not relaxing branches in assembler input, but that meant that generating assembly code and then assembling it would not produce the same result as going directly to object code; the former would give long branches everywhere, whereas the latter would use short branches where possible.

(2) Other useful branches, like COMPARE AND BRANCH, do not have long forms. We would need to do something else before supporting them. (Although COMPARE AND BRANCH does not change the condition codes, the plan is to model COMPARE AND BRANCH as a CC-clobbering instruction during codegen, so that we can safely lower it to a separate compare and long branch where necessary. This is not a valid transformation for the assembler proper to make.)

This patch therefore moves branch relaxation to a pre-emit pass. For now, calls are still shortened from BRASL to BRAS by the assembler, although this too is not really the traditional behaviour.

The first test takes about 1.5s to run, and there are likely to be more tests in this vein once further branch types are added. The feeling on IRC was that 1.5s is a bit much for a single test, so I've restricted it to SystemZ hosts for now.

The patch exposes (and fixes) some typos in the main CodeGen/SystemZ tests. A later patch will remove the {{g}}s from that directory. llvm-svn: 182274
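The central question the pre-emit pass asks is whether a branch's displacement still fits the short form. A sketch of that check under stated assumptions (BRC's RI-format displacement is a 16-bit signed halfword count; the helper name is hypothetical):

```cpp
#include <cstdint>

// BRC encodes its target as a 16-bit signed displacement counted in
// 2-byte halfwords, so the reachable byte range is [-65536, 65534].
static bool fitsInShortBranch(int64_t ByteDistance) {
  return ByteDistance % 2 == 0 &&
         ByteDistance >= -65536 && ByteDistance <= 65534;
}
```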
-
Justin Holewinski authored
This converter currently only handles global variables in address space 0. These variables are promoted to address space 1 (global memory), and all uses are updated to point to the result of a cvta.global instruction on the new variable. The motivation is that address space 0 global variables are illegal, since we cannot declare variables in the generic address space. Instead, we place the variables in address space 1 and explicitly convert the pointer back to address space 0. This is primarily intended to help new users who expect to be able to place global variables in the default address space. llvm-svn: 182254
-
Justin Holewinski authored
[NVPTX] Fix i1 kernel parameters and global variables. ABI rules say we need to use .u8 for i1 parameters for kernels. llvm-svn: 182253
-
Stepan Dyatkovskiy authored
Introduction: when the stack alignment is 8 and the GPR part of a parameter is not a multiple of 8 bytes, we add padding to the GPR part so that the part's last byte is recovered at address K*8-1. We need this because the remaining (stack) part of the parameter starts at address K*8, and the "GPRs head" must attach to it without gaps:

Stack:
|---- 8 bytes block ----| |---- 8 bytes block ----| |---- 8 bytes...
[ [padding] [GPRs head] ] [ ------ Tail passed via stack ------ ...

Fix: note that once we have added padding, we need to correct the offsets of *all* arguments that come after the padded one. That is why we need this fix: argument offsets were never corrected before this patch. See the new test cases included in the patch.

We also don't need to insert padding for byval parameters that are stored entirely in GPRs. Only the last byval parameter needs padding, and only when it extends beyond the GPRs and the stack alignment is 8. The stack area allocated for recovered byval parameters must still satisfy the "size mod 8 == 0" restriction.

This patch reduces stack usage in some cases: we can shrink the ArgRegsSaveArea, since inner N*4-byte byval parameters may be "packed" with alignment 4 in some cases. llvm-svn: 182237
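The padding amount follows directly from the rule above. A small illustrative helper (hypothetical name; stack alignment of 8 assumed):

```cpp
// A GPR head of Size bytes needs (Align - Size % Align) % Align bytes of
// padding in front of it so that its last byte lands at address K*8 - 1.
static unsigned byvalHeadPadding(unsigned Size, unsigned Align = 8) {
  unsigned Rem = Size % Align;
  return Rem ? Align - Rem : 0;   // e.g. a 12-byte head gets 4 bytes
}
```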
-
Jakob Stoklund Olesen authored
llvm-svn: 182229
-
Jakob Stoklund Olesen authored
llvm-svn: 182228
-
Jakob Stoklund Olesen authored
llvm-svn: 182227
-
Benjamin Kramer authored
llvm-svn: 182226
-
- May 19, 2013
-
-
Jakob Stoklund Olesen authored
The hardwired physreg doesn't work on tied operands like on MOVXCC. Add a README note to fix this later. llvm-svn: 182225
-
Jakob Stoklund Olesen authored
llvm-svn: 182224
-
Jakob Stoklund Olesen authored
llvm-svn: 182222
-
Jakob Stoklund Olesen authored
Also clean up the arguments to all the MOVCC instructions so the operands always are (true-val, false-val, cond-code). llvm-svn: 182221
-
Venkatraman Govindaraju authored
[Sparc] Rearrange the integer registers' allocation order so that the register allocator will use I and G registers before L and O registers. Also, enable registers %g2-%g4 to be used in applications, and %g5 in 64-bit mode. llvm-svn: 182219
-
Jakob Stoklund Olesen authored
llvm-svn: 182216
-
- May 18, 2013
-
-
Hal Finkel authored
We don't need to reject all inline asm as using the counter register (most of it does not). Only asm that explicitly clobbers the counter register needs to prevent the transformation. llvm-svn: 182191
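Conceptually, the new check reduces to looking for an explicit counter-register clobber in the asm's constraints (a hypothetical helper; the pass inspects the parsed constraint list rather than a raw string):

```cpp
#include <string>

// Only inline asm that explicitly clobbers the counter register, e.g. via
// a "~{ctr}" constraint, should block counter-loop formation.
static bool clobbersCTR(const std::string &Constraints) {
  return Constraints.find("~{ctr}") != std::string::npos;
}
```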
-
Tim Northover authored
llvm-svn: 182190
-
David Majnemer authored
The peephole tries to reorder MOV32r0 instructions such that they are before the instruction that modifies EFLAGS. The problem is that the peephole does not consider the case where the instruction that modifies EFLAGS also depends on the previous state of EFLAGS. Instead, walk backwards until we find an instruction that has a def for EFLAGS but does not have a use. If we find such an instruction, insert the MOV32r0 before it. If we cannot find such an instruction, skip the optimization. llvm-svn: 182184
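A simplified, single-block sketch of that backward walk (assumes MachineInstr's definesRegister/readsRegister helpers; the EFLAGS register number is passed in to keep the sketch target-neutral):

```cpp
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineInstr.h"

using namespace llvm;

// Walk backwards from I looking for an instruction that defines EFLAGS
// without reading it; inserting the MOV32r0 before such an instruction is
// safe because its EFLAGS clobber is immediately overwritten. Returning
// MBB.end() means no safe point was found and the optimization is skipped.
static MachineBasicBlock::iterator
findSafeInsertionPoint(MachineBasicBlock &MBB, MachineBasicBlock::iterator I,
                       unsigned EFLAGS) {
  while (I != MBB.begin()) {
    --I;
    if (I->definesRegister(EFLAGS)) {
      if (I->readsRegister(EFLAGS))
        return MBB.end();  // reads and writes EFLAGS (e.g. ADC): give up
      return I;            // defines EFLAGS without reading it: safe point
    }
  }
  return MBB.end();
}
```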
-
Matt Arsenault authored
llvm-svn: 182180
-
JF Bastien authored
This patch matches GCC behavior: the code used to allow unaligned load/store on ARM only for v6+ Darwin; it will now allow unaligned load/store for v6+ Darwin as well as for v7+ on Linux and NaCl. The distinction is made because v6 doesn't guarantee support (but LLVM assumes that Apple controls hardware+kernel and therefore has conformant v6 CPUs), whereas v7 does provide this guarantee (and Linux/NaCl behave sanely). The patch keeps the -arm-strict-align command line option and adds -arm-no-strict-align; they behave similarly to GCC's -mstrict-align and -mno-strict-align. I originally encountered this discrepancy in FastIsel tests which expect unaligned load/store generation. Overall this should slightly improve performance in most cases because of reduced I$ pressure. llvm-svn: 182175
-
Rafael Espindola authored
The errors were: "non-constant-expression cannot be narrowed from type 'int64_t' (aka 'long') to 'uint32_t' (aka 'unsigned int') in initializer list" and "non-constant-expression cannot be narrowed from type 'long' to 'uint32_t' (aka 'unsigned int') in initializer list". llvm-svn: 182168
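The underlying rule is C++11's ban on implicitly narrowing a non-constant expression inside a braced initializer list; a minimal reproduction (not the original code):

```cpp
#include <cstdint>

void example(int64_t v) {
  // uint32_t a[] = {v};   // error: non-constant-expression cannot be
  //                       // narrowed from 'int64_t' to 'uint32_t'
  uint32_t a[] = {static_cast<uint32_t>(v)};  // explicit cast compiles
  (void)a;
}
```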
-
- May 17, 2013
-
-
Vincent Lejeune authored
It fixes a bug, uncovered by the dot4 patch, where the register class of an int_load_input use was ignored. llvm-svn: 182130
-
Vincent Lejeune authored
llvm-svn: 182129
-
Vincent Lejeune authored
It should increase PV substitution opportunities and lower GPR usage (pending computation paths are "flushed" sooner). llvm-svn: 182128
-
Vincent Lejeune authored
llvm-svn: 182127
-
Vincent Lejeune authored
Dot4 now uses 8 scalar operands instead of 2 vector ones, which allows the register coalescer to remove some unneeded COPYs. This patch also defines some structures/functions that can be used to handle every vector instruction (CUBE, Cayman special instructions...) in a similar fashion. llvm-svn: 182126
-
Vincent Lejeune authored
llvm-svn: 182125
-
Vincent Lejeune authored
Almost all instructions that take a 128-bit register as input (fetch, export...) have the ability to swizzle their argument and output. Instead of printing a default swizzle for each 128-bit register, rename T*.XYZW to T* and let instructions print potentially optimized swizzles themselves. llvm-svn: 182124
-