- May 16, 2013
-
-
Akira Hatanaka authored
invalid instruction sequence. Rather than emitting an int-to-FP move instruction and an int-to-FP conversion instruction during instruction selection, we emit a pseudo instruction which gets expanded post-RA. Without this change, register allocation can possibly insert a floating point register move instruction between the two instructions, which is not valid according to the ISA manual:

    mtc1    $f4, $4    # int-to-fp move instruction
    mov.s   $f2, $f4   # move contents of $f4 to $f2
    cvt.s.w $f0, $f2   # int-to-fp conversion

llvm-svn: 182042
-
Jack Carter authored
This patch adds bnez and beqz instructions which represent alias definitions for bne and beq instructions as follows: bnez $rs,$imm => bne $rs,$zero,$imm beqz $rs,$imm => beq $rs,$zero,$imm The corresponding test cases are added. Patch by Vladimir Medic llvm-svn: 182040
-
Akira Hatanaka authored
llvm-svn: 182036
-
Akira Hatanaka authored
llvm-svn: 182035
-
Ulrich Weigand authored
[PowerPC] Use true offset value in "memrix" machine operands This is the second part of the change to always return "true" offset values from getPreIndexedAddressParts, tackling the case of "memrix" type operands. This is about instructions like LD/STD that only have a 14-bit field to encode immediate offsets, which are implicitly extended by two zero bits by the machine, so that in effect we can access 16-bit offsets as long as they are a multiple of 4. The PowerPC back end currently handles such instructions by carrying the 14-bit value (as it will get encoded into the actual machine instructions) in the machine operand fields for such instructions. This means that those values are in fact not the true offset, but rather the offset divided by 4 (and then truncated to an unsigned 14-bit value). Like in the case fixed in r182012, this makes common code operations on such offset values not work as expected. Furthermore, there doesn't really appear to be any strong reason why we should encode machine operands this way. This patch therefore changes the encoding of "memrix" type machine operands to simply contain the "true" offset value as a signed immediate value, while enforcing the rules that it must fit in a 16-bit signed value and must also be a multiple of 4. This change must be made simultaneously in all places that access machine operands of this type. However, just about all those changes make the code simpler; in many cases we can now just share the same code for memri and memrix operands. llvm-svn: 182032
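As a rough standalone sketch of the rule described above - not the back end's actual code, and with hypothetical function names - the new operand convention only has to enforce two checks, with the divide-by-4 deferred to encoding time:

    #include <cstdint>

    // Standalone model of the "memrix" displacement rule: the operand now
    // carries the true offset, which must fit a signed 16-bit value and be a
    // multiple of 4 (the hardware appends two implicit zero bits).
    bool isValidMemrixOffset(int64_t Offset) {
      bool FitsSigned16 = Offset >= -32768 && Offset <= 32767;
      bool MultipleOf4  = (Offset & 3) == 0;
      return FitsSigned16 && MultipleOf4;
    }

    // Only at instruction-encoding time is the value shrunk into the 14-bit
    // field, instead of being stored pre-divided in the machine operand.
    int64_t encodedMemrixField(int64_t Offset) { return Offset / 4; }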
-
Hal Finkel authored
On PPC32, i64 FP conversions are implemented using runtime calls (which clobber the counter register). These must be excluded. llvm-svn: 182023
-
Aaron Ballman authored
llvm-svn: 182018
-
Rafael Espindola authored
Without a PROLOG_LABEL present, the cfi instructions are never printed. llvm-svn: 182016
-
Ulrich Weigand authored
[PowerPC] Report true displacement value from getPreIndexedAddressParts DAGCombiner::CombineToPreIndexedLoadStore calls a target routine to decompose a memory address into a base/offset pair. It expects the offset (if constant) to be the true displacement value in order to perform optional additional optimizations; in particular, to convert other uses of the original pointer into uses of the new base pointer after pre-increment. The PowerPC implementation of getPreIndexedAddressParts, however, simply calls SelectAddressRegImm, which returns a TargetConstant. This value is appropriate for encoding into the instruction, but it is not always usable as a true displacement value:
- Its type is always MVT::i32, even on 64-bit, where addresses ought to be i64; this causes the optimization to simply always fail on 64-bit due to this line in DAGCombiner:
    // FIXME: In some cases, we can be smarter about this.
    if (Op1.getValueType() != Offset.getValueType()) {
- Its value is truncated to an unsigned 16-bit value if negative. This causes the above optimization to generate wrong code.
This patch fixes both problems by simply returning the true displacement value (in its original type). This doesn't affect any other user of the displacement. llvm-svn: 182012
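A tiny standalone illustration of the second problem - not the actual DAG code, just modeling the truncation with plain integers - shows why a value squeezed into an unsigned 16-bit field cannot serve as the true displacement:

    #include <cassert>
    #include <cstdint>

    // Model of the old behavior: a negative displacement truncated to an
    // unsigned 16-bit immediate no longer round-trips to its original value.
    int64_t truncateToUImm16(int64_t Disp) { return Disp & 0xFFFF; }

    int main() {
      int64_t Disp = -8;                      // true displacement
      int64_t Old  = truncateToUImm16(Disp);  // 65528: fine for encoding only
      assert(Old != Disp);                    // unusable as a "true" offset
      int64_t New  = Disp;                    // the patch returns this instead
      assert(New == -8);
      return 0;
    }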
-
Richard Sandiford authored
llvm-svn: 182007
-
Patrik Hagglund authored
-Wunused-but-set-variable. Leftover from r181979. llvm-svn: 181993
-
Rafael Espindola authored
llvm-svn: 181982
-
Rafael Espindola authored
getExceptionHandlingType is not ExceptionHandling::DwarfCFI on xcore, so getFrameInstructions is never called. There is no point creating cfi instructions if they are never used. llvm-svn: 181979
-
Rafael Espindola authored
llvm-svn: 181975
-
Reed Kotler authored
This creates stubs that help Mips32 functions call Mips16 functions which have floating point parameters that are normally passed in floating point registers. llvm-svn: 181972
-
Derek Schuff authored
This reverts r181898. llvm-svn: 181944
-
Rafael Espindola authored
llvm-svn: 181941
-
Hal Finkel authored
Trying to unbreak the VS build by copying some undef code from Utils/LowerInvoke.cpp. llvm-svn: 181938
-
David Majnemer authored
Increase the number of instructions LLVM recognizes as setting the ZF flag. This allows us to remove test instructions that redundantly recalculate the flag. llvm-svn: 181937
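A minimal standalone sketch of the premise (generic opcode names, not LLVM's tables): many arithmetic and logical instructions already set ZF from their result, so a following "test %reg, %reg" that only feeds a zero/non-zero branch is redundant.

    // Model of the property the peephole relies on: which operations update
    // ZF as a side effect of producing their result.
    enum class Op { And, Or, Xor, Add, Sub, Lea, Mov };

    bool setsZFFromResult(Op O) {
      switch (O) {
      case Op::And: case Op::Or: case Op::Xor:
      case Op::Add: case Op::Sub:
        return true;   // result and ZF agree; a trailing TEST adds nothing
      case Op::Lea: case Op::Mov:
        return false;  // these never touch EFLAGS, so TEST is still needed
      }
      return false;
    }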
-
- May 15, 2013
-
-
Hal Finkel authored
The old PPCCTRLoops pass, like the Hexagon pass version from which it was derived, could only handle some simple loops in canonical form. We cannot directly adapt the new Hexagon hardware loops pass, however, because the Hexagon pass contains a fundamental assumption that non-constant-trip-count loops will contain a guard, and this is not always true (the result being that incorrect negative counts can be generated). With this commit, we replace the pass with a late IR-level pass which makes use of SE to calculate the backedge-taken counts and safely generate the loop-count expressions (including any necessary max() parts). This IR level pass inserts custom intrinsics that are lowered into the desired decrement-and-branch instructions. The most fragile part of this new implementation is that interfering uses of the counter register must be detected on the IR level (and, on PPC, this also includes any indirect branches in addition to function calls). Also, to make all of this work, we need a variant of the mtctr instruction that is marked as having side effects. Without this, machine-code level CSE, DCE, etc. illegally transform the resulting code. Hopefully, this can be improved in the future. This new pass is smaller than the original (and much smaller than the new Hexagon hardware loops pass), and can handle many additional cases correctly. In addition, the preheader-creation code has been copied from LoopSimplify, and after we decide on where it belongs, this code will be refactored so that it can be explicitly shared (making this implementation even smaller). The new test-case files ctrloop-{le,lt,ne}.ll have been adapted from tests for the new Hexagon pass. There are a few classes of loops that this pass does not transform (noted by FIXMEs in the files), but these deficiencies can be addressed within the SE infrastructure (thus helping many other passes as well). llvm-svn: 181927
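The shape the pass ultimately produces can be sketched with a small standalone C++ model (not the pass itself, and with the counter modeled as a plain variable): the trip count is computed once - conceptually by ScalarEvolution from the backedge-taken count - and the latch becomes a single decrement-and-branch.

    #include <cstdint>
    #include <vector>

    // Standalone model of a CTR-style loop: load the trip count into the
    // counter ("mtctr"), then decrement and branch while non-zero ("bdnz").
    void ctrStyleLoop(std::vector<int> &Data) {
      uint64_t TripCount = Data.size();   // backedge-taken count + 1
      if (TripCount == 0)
        return;                           // a zero count must never reach the latch
      uint64_t Ctr = TripCount;           // models "mtctr"
      uint64_t I = 0;
      do {
        Data[I] += 1;                     // loop body
        ++I;
      } while (--Ctr != 0);               // models "bdnz"
    }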
-
Rafael Espindola authored
We want the order to be deterministic on all platforms. NAKAMURA Takumi fixed that in r181864. This patch is just two small cleanups:
* Move the function to the cpp file. It is only passed to array_pod_sort.
* Remove the ppc implementation which is now redundant.
llvm-svn: 181910
-
NAKAMURA Takumi authored
llvm-svn: 181907
-
NAKAMURA Takumi authored
llvm-svn: 181906
-
Derek Schuff authored
This patch matches GCC behavior: the code used to allow unaligned load/store on ARM only for v6+ Darwin; it will now allow unaligned load/store for v6+ Darwin as well as for v7+ on other targets. The distinction is made because v6 doesn't guarantee support (but LLVM assumes that Apple controls hardware and kernel and therefore ships conformant v6 CPUs), whereas v7 does provide this guarantee (and Linux behaves sanely). Overall this should slightly improve performance in most cases because of reduced I$ pressure. Patch by JF Bastien llvm-svn: 181897
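As a hedged sketch of the policy (the helper name is hypothetical, not the actual subtarget hook), the decision reduces to a single predicate over OS and architecture version:

    // Standalone model: unaligned load/store is permitted from v6 on Darwin
    // (Apple-controlled hardware and kernel) and from v7 everywhere else,
    // where the architecture itself guarantees support.
    bool allowsUnalignedMem(bool IsDarwin, unsigned ArchVersion) {
      return IsDarwin ? ArchVersion >= 6 : ArchVersion >= 7;
    }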
-
Ulrich Weigand authored
[PowerPC] Remove need for adjustFixupOffset hack Now that applyFixup understands differently-sized fixups, we can define fixup_ppc_lo16/fixup_ppc_lo16_ds/fixup_ppc_ha16 to properly be 2-byte fixups, applied at an offset of 2 relative to the start of the instruction text. This has the benefit that if we actually need to generate a real relocation record, its address will come out correctly automatically, without having to fiddle with the offset in adjustFixupOffset. Tested on both 64-bit and 32-bit PowerPC, using external and integrated assembler. llvm-svn: 181894
-
Richard Sandiford authored
Thanks to Ulrich Weigand for noticing that this instruction was missing. llvm-svn: 181893
-
Ulrich Weigand authored
[PowerPC] Correctly handle fixups of other than 4 byte size The PPCAsmBackend::applyFixup routine handles the case where a fixup can be resolved within the same object file. However, this routine is currently hard-coded to assume the size of any fixup is always exactly 4 bytes. This is sort of correct for fixups on instruction text, though it only works because several of what really would be 2-byte fixups are presented as 4-byte fixups instead (requiring another hack in PPCELFObjectWriter::adjustFixupOffset to clean it up). However, this assumption breaks down completely for fixups on data, which legitimately can be of any size (1, 2, 4, or 8). This patch makes applyFixup aware of fixups of varying sizes, introducing a new helper routine getFixupKindNumBytes (along the lines of what the ARM back end does). Note that in order to handle fixups of size 8, we also need to fix the return type of adjustFixupValue to uint64_t to avoid truncation. Tested on both 64-bit and 32-bit PowerPC, using external and integrated assembler. llvm-svn: 181891
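A standalone sketch of the mechanism (simplified buffer handling, hypothetical names - not PPCAsmBackend itself): once the byte width of a fixup is known, applying it is the same loop for 1-, 2-, 4- and 8-byte fixups.

    #include <cstddef>
    #include <cstdint>

    // Model of a size-aware applyFixup: OR the value's bytes into the data
    // at the fixup offset, most-significant byte first (big-endian PowerPC).
    // NumBytes plays the role of a getFixupKindNumBytes-style helper result.
    void applyFixupBytes(uint8_t *Data, size_t DataSize, size_t Offset,
                         uint64_t Value, unsigned NumBytes) {
      if (Offset + NumBytes > DataSize)
        return;                       // the real code would assert instead
      for (unsigned i = 0; i != NumBytes; ++i)
        Data[Offset + i] |= uint8_t(Value >> ((NumBytes - 1 - i) * 8));
    }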
-
Richard Sandiford authored
Based on an analysis by Ulrich Weigand. llvm-svn: 181882
-
Arnold Schwaighofer authored
The transformation happening here is that we want to turn "mul(ext(X), ext(X))" into "vmull(X, X)", stripping off the extension. We have to make sure that X still has a valid vector type - possibly recreating an extension to a smaller type. In the case of an extload of a memory type smaller than 64 bits, we used to create an ext(load()). The problem with doing this - instead of recreating an extload - is that an illegal type is exposed. This patch fixes this by creating extloads instead of ext(load()) sequences. Fixes PR15970. radar://13871383 llvm-svn: 181842
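A standalone source-level illustration of the pattern involved (hypothetical function, not from the patch): the narrow value is loaded, widened, and multiplied by itself, which is exactly the mul(ext(X), ext(X)) shape that should become a single widening multiply once the extension is folded into the load as an extload.

    #include <cstddef>
    #include <cstdint>

    // Each element is loaded as 8 bits, widened to 16 bits, and squared;
    // the widening multiply (vmull) can consume the narrow load directly.
    void widenAndSquare(const uint8_t *X, uint16_t *Out, size_t N) {
      for (size_t I = 0; I != N; ++I) {
        uint16_t Wide = X[I];      // ext(load X)
        Out[I] = Wide * Wide;      // mul(ext(X), ext(X)) -> vmull(X, X)
      }
    }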
-
- May 14, 2013
-
-
Bill Schmidt authored
Instruction added at request of Roman Divacky. Tested via asm-parser. llvm-svn: 181821
-
Jyotsna Verma authored
where possible. llvm-svn: 181817
-
Eric Christopher authored
a somewhat randomly chosen CPU that will minimize CPU-specific differences on bots. llvm-svn: 181814
-
Eric Christopher authored
It's causing failures on the atom bot. llvm-svn: 181812
-
Eric Christopher authored
Patch by Andrea DiBiagio. llvm-svn: 181809
-
Jyotsna Verma authored
llvm-svn: 181805
-
Jyotsna Verma authored
llvm-svn: 181803
-
Bill Schmidt authored
The changes to CR spill handling missed a case for 32-bit PowerPC. The code in PPCFrameLowering::processFunctionBeforeFrameFinalized() checks whether CR spill has occurred using a flag in the function info. This flag is only set by storeRegToStackSlot and loadRegFromStackSlot. spillCalleeSavedRegisters does not call storeRegToStackSlot, but instead produces the MI directly. Thus we don't see that the CR is spilled when assigning frame offsets, and the CR spill ends up colliding with some other location (generally the FP slot). This patch sets the flag in spillCalleeSavedRegisters for PPC32 so that the CR spill is properly detected and gets its own slot in the stack frame. llvm-svn: 181800
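A standalone sketch of the fix (simplified types and hypothetical names, not the PPC code itself): the spill path that emits the callee-saved stores directly must also set the flag that frame-offset assignment later consults.

    // Model of the bookkeeping: the flag was previously set only in
    // storeRegToStackSlot, so a CR spill emitted directly in
    // spillCalleeSavedRegisters went unnoticed and collided with the FP slot.
    struct FuncInfoModel {
      bool SpillsCR = false;   // consulted when frame offsets are assigned
    };

    void spillCalleeSavedRegs(FuncInfoModel &FI, bool SpillsCRField,
                              bool Is32Bit) {
      // ... emit the actual spill code for each callee-saved register ...
      if (Is32Bit && SpillsCRField)
        FI.SpillsCR = true;    // reserve a dedicated slot for the CR spill
    }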
-
Jyotsna Verma authored
llvm-svn: 181797
-
Tom Stellard authored
Patch by: Alex Deucher
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
NOTE: This is a candidate for the 3.3 branch. llvm-svn: 181792
-
Richard Sandiford authored
llvm-svn: 181777
-