- Aug 21, 2014
-
Jon Roelofs authored
On pre-v6 hardware, 'MOV lo, lo' gives undefined results, so such copies need to be avoided. This patch trades performance for simplicity and a quick implementation... As they say: correctness first, then performance. See http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075998.html for a few ideas on how to make this better. llvm-svn: 216138
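For low-to-low copies on such cores, a correctness-first lowering might look like the sketch below (registers hypothetical, and not necessarily the exact sequence the patch emits; when the flags are dead, a flag-setting 'movs r0, r1' would also do):

push {r1}    @ spill the source low register
pop  {r0}    @ reload into the destination: r0 = r1, CPSR preserved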
-
- Aug 20, 2014
-
Quentin Colombet authored
the isRegSequence property. This is a follow-up to r215394 and r215404, which respectively introduce the isRegSequence property and use it for ARM. Thanks to the property introduced by the previous commits, this patch is able to optimize the following sequence:

vmov d0, r2, r3
vmov d1, r0, r1
vmov r0, s0
vmov r1, s2
udiv r0, r1, r0
vmov r1, s1
vmov r2, s3
udiv r1, r2, r1
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov r0, r1, d16
bx lr

into:

udiv r0, r0, r2
udiv r1, r1, r3
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov r0, r1, d16
bx lr

This patch refactors how the copy optimizations are done in the peephole optimizer. Prior to this patch, we had one copy-related optimization that replaced a copy or bitcast with a generic copy that is more suitable in terms of register file. With this patch, the peephole optimizer features two copy-related optimizations:

1. One for rewriting generic copies to generic copies: PeepholeOptimizer::optimizeCoalescableCopy.
2. One for replacing non-generic copies with generic copies: PeepholeOptimizer::optimizeUncoalescableCopy.

The goals of these two optimizations are slightly different: the first rewrites the operands of the instruction, whereas the second kills off the non-generic instruction and replaces it with a (sequence of) generic instruction(s).

Both optimizations rely on the ValueTracker introduced in r212100. The ValueTracker has been refactored to use the information from the TargetInstrInfo for non-generic instructions. As part of the refactoring, we switched the tracking from the index of the definition to the actual register (virtual or physical). This change provides better consistency with register-related APIs and eases the use of the TargetInstrInfo.

Moreover, this patch introduces a new helper class, CopyRewriter, used to ease the rewriting of generic copies (i.e., #1).

Finally, this patch adds a dead code elimination pass right after the peephole optimizer to get rid of dead code that may appear after rewriting.

This is related to <rdar://problem/12702965>. Review: http://reviews.llvm.org/D4874 llvm-svn: 216088
-
Yi Kong authored
LLVM generates an illegal `rbit r0, #352` instruction for the rbit intrinsic. According to the ARM ARM, rbit only takes a register argument, not an immediate; the correct form is rbit <Rd>, <Rm>. The bug was originally introduced in r211057. Differential Revision: http://reviews.llvm.org/D4980 llvm-svn: 216064
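Concretely (the register choices below are hypothetical):

rbit r0, r1      @ valid: RBIT takes its source in a register
rbit r0, #352    @ invalid: RBIT has no immediate form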
-
- Aug 19, 2014
-
Juergen Ributzka authored
Note: This was originally reverted to track down a buildbot error. This commit exposed a latent bug that was fixed in r215753. Therefore it is reapplied without any modifications. I ran it through SPEC2k and SPEC2k6 for AArch64 and it didn't introduce any new regressions.

Original commit message: This changes the order in which FastISel tries to materialize a constant. Originally it would try a simple target-independent approach first, which can lead to the generation of inefficient code. On X86 this would result in the use of movabsq to materialize any 64-bit integer constant - even for simple and small values such as 0 and 1. Some very odd floating-point materializations could be observed too. On AArch64 it would materialize the constant 0 in a register even though the architecture has an actual "zero" register. On ARM it would generate unnecessary mov instructions or fail to use mvn.

This change simply swaps the order and always asks the target first whether it wants to materialize the constant. This doesn't fix all the issues mentioned above, but it enables the targets to implement such optimizations. Related to <rdar://problem/17420988>. llvm-svn: 216006
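To make the ARM case concrete, a hedged sketch (the constant is chosen purely for illustration):

mvn r0, #15     @ r0 = 0xFFFFFFF0 in a single instruction
@ without asking the target first, the same value could end up as a
@ literal-pool load or a multi-instruction mov/orr sequence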
-
- Aug 18, 2014
-
Oliver Stannard authored
Externally-defined functions with weak linkage should not be tail-called on ARM or AArch64, as the AAELF spec requires normal calls to undefined weak functions to be replaced with a NOP or jump to the next instruction. The behaviour of branch instructions in this situation (as used for tail calls) is implementation-defined, so we cannot rely on the linker replacing the tail call with a return. llvm-svn: 215890
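A rough illustration of the constraint described above (symbol name hypothetical):

bl   weak_fn    @ normal call: if weak_fn stays undefined, the linker
                @ may rewrite this to a NOP and execution falls through
b    weak_fn    @ tail call: the branch has no defined rewrite; a NOP
                @ here would run the following code instead of returning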
-
Saleem Abdulrasool authored
The set of functions defined in the RTABI was separated for no real reason. This brings us closer to proper utilisation of the functions defined by the RTABI. It also sets the ground for correctly emitting function calls to AEABI functions on all AEABI-conforming platforms. The pre-existing lie about the behaviour of __ldivmod and __uldivmod is propagated, as fixing it is beyond the scope of this change. The changes to the test are due to the fact that we now use the divmod functions, which return both the quotient and the remainder, and thus we no longer need to invoke two functions on Linux (making it closer to EABI's behaviour). llvm-svn: 215862
-
- Aug 15, 2014
-
Chad Rosier authored
Phabricator Revision: http://reviews.llvm.org/D4935 llvm-svn: 215772
-
Juergen Ributzka authored
Thanks Jim for finding this. llvm-svn: 215733
-
Juergen Ributzka authored
FastEmit_i won't always succeed in materializing an i32 constant; it can simply fail. That would trigger a fall-back to SelectionDAG, which is really not necessary. This fix first falls back to a constant-pool load to materialize the constant before giving up for good. This fixes <rdar://problem/18022633>. llvm-svn: 215682
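When direct emission fails, a literal-pool load (the label and value below are hypothetical) is still far cheaper than restarting the whole selection in SelectionDAG:

ldr  r0, .LCPI0_0       @ materialize the constant from the literal pool
...
.LCPI0_0:
.long 0x12345678        @ not encodable as an ARM modified immediate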
-
- Aug 14, 2014
-
Juergen Ributzka authored
This reverts:
r215595 "[FastISel][X86] Add large code model support for materializing floating-point constants."
r215594 "[FastISel][X86] Use XOR to materialize the "0" value."
r215593 "[FastISel][X86] Emit more efficient instructions for integer constant materialization."
r215591 "[FastISel][AArch64] Make use of the zero register when possible."
r215588 "[FastISel] Let the target decide first if it wants to materialize a constant."
r215582 "[FastISel][AArch64] Cleanup constant materialization code. NFCI."
llvm-svn: 215673
-
Sanjay Patel authored
This patch allows a vector fneg of a bitcasted integer value to be optimized in the same way that we already optimize a scalar fneg. If the integer variable is a constant, we can precompute the result and not require any logic ops. This patch is very similar to a fabs patch committed at r214892. Differential Revision: http://reviews.llvm.org/D4852 llvm-svn: 215646
-
Juergen Ributzka authored
This changes the order in which FastISel tries to materialize a constant. Originally it would try a simple target-independent approach first, which can lead to the generation of inefficient code. On X86 this would result in the use of movabsq to materialize any 64-bit integer constant - even for simple and small values such as 0 and 1. Some very odd floating-point materializations could be observed too. On AArch64 it would materialize the constant 0 in a register even though the architecture has an actual "zero" register. On ARM it would generate unnecessary mov instructions or fail to use mvn. This change simply swaps the order and always asks the target first whether it wants to materialize the constant. This doesn't fix all the issues mentioned above, but it enables the targets to implement such optimizations. Related to <rdar://problem/17420988>. llvm-svn: 215588
-
- Aug 13, 2014
-
Juergen Ributzka authored
This change is also in preparation for a future change to make sure that the constant materialization uses MOVT/MOVW when available and not a load from the constant pool. llvm-svn: 215584
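For reference, the MOVW/MOVT idiom that the future change aims to prefer over a constant-pool load (value hypothetical):

movw r0, #0x5678     @ r0 = 0x00005678 (bottom halfword)
movt r0, #0x1234     @ r0 = 0x12345678 (top halfword)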
-
- Aug 11, 2014
-
Saleem Abdulrasool authored
For many Thumb-1 register-register instructions, setting the CPSR is not permitted inside an IT block. We would not correctly flag those instructions. The previous change to identify this scenario was insufficient, as it did not actually catch all the instances. The current list is formed by manual inspection of the ARMv6-M ARM.

The change to the Thumb2 IT block test is due to the fact that the new, more stringent checking of the MIs results in the If Conversion pass being prevented from executing (since not all the instructions in the BB are predicable). This results in code gen changes.

Thanks to Tim Northover for pointing out that the previous patch was insufficient and hinting that the v6-M ARM would be much easier to work from than the v7 or v8! llvm-svn: 215382
-
Sanjay Patel authored
Add a missing RUN line in the ARM codegen test for fneg ops. We should also explicitly specify +/-neonfp. The bug was introduced at r99570 when use of "-arm-use-neon-fp" was removed. Differential Revision: http://reviews.llvm.org/D4846 llvm-svn: 215377
-
Oliver Stannard authored
By default, LLVM uses the "C" calling convention for all runtime library functions. The half-precision FP conversion functions use the soft-float calling convention, and are needed for some targets which use the hard-float convention by default, so must have their calling convention explicitly set. llvm-svn: 215348
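A hedged sketch of what this means in practice (__aeabi_h2f is the EABI helper name; the constant is illustrative):

movw  r0, #0x3c00        @ half-precision 1.0 passed in a core register
bl    __aeabi_h2f        @ soft-float convention: result returns in r0
vmov  s0, r0             @ a hard-float caller moves it into the FP file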
-
Saleem Abdulrasool authored
The ARM ARM states that CPSR may not be updated by a MUL in Thumb mode. Due to the ordering of the Thumb2 Size Reduction and If Conversion passes, we would end up generating a Thumb MULS inside an IT block.

The If Conversion pass uses the TTI isPredicable method to ensure that it can transform a Basic Block. However, because we only check for IT handling on Thumb2 functions, we may miss some cases. Even then, it only validates that the CPSR is not *live*, rather than that it is not accessed. This corrects the handling for that particular case, since the same restriction does not hold on the vast majority of the instructions.

This does prevent the If Conversion optimization from kicking in in certain cases, but generating correct code is more valuable. Addresses PR20555. llvm-svn: 215328
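A hedged illustration (operands hypothetical):

it     eq
muleq  r0, r1, r0    @ fine: the predicated MUL leaves CPSR untouched
@ the old pass ordering could instead produce a flag-setting MULS in
@ this position, which the ARM ARM does not permit inside an IT block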
-
- Aug 08, 2014
-
Adrian Prantl authored
Thanks to dblaikie for pointing this out! llvm-svn: 215166
-
Adrian Prantl authored
llvm-svn: 215160
-
- Aug 07, 2014
-
Akira Hatanaka authored
BranchFolderPass was not correctly setting the basic block branch weights when tail-merging created or merged blocks. This patch recomputes the weights of tail-merged blocks using the following formula:

branch_weight(merged block to successor j) =
    sum(block_frequency(bb) * branch_probability(bb -> j))

where bb is a block that is in the set of merged blocks. <rdar://problem/16256423> llvm-svn: 215135
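A small worked example under assumed numbers: if blocks bb1 (frequency 8, probability 0.75 of branching to j) and bb2 (frequency 4, probability 0.5) are tail-merged, the merged block's weight to successor j is 8 * 0.75 + 4 * 0.5 = 8.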
-
- Aug 06, 2014
-
Tim Northover authored
Particularly on MachO, we were generating "blx _dest" instructions on M-class CPUs, which don't actually exist. They happen to get fixed up by the linker into valid "bl _dest" instructions (which is why such a massive issue has remained largely undetected), but we shouldn't rely on that. llvm-svn: 214959
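Roughly, the difference (using the symbol from the message above):

blx _dest    @ invalid: BLX (immediate) does not exist on M-class cores
bl  _dest    @ correct; M-class has no ARM state to exchange to anyway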
-
Tim Northover authored
llvm-svn: 214958
-
David Blaikie authored
This was coming in weird debug info that had variables (and hence debug_locs) but was in GMLT mode (because it was missing the 13th field of the compile_unit metadata), so no ranges were constructed. We should always have at least one range for any CU with a debug_loc in it, because the range should cover the debug_loc. The assertion just ensures that the "!= 1" range case inside the subsequent loop doesn't get entered for the case where there are no ranges at all, which should never reach here in the first place. llvm-svn: 214939
-
David Blaikie authored
DebugInfo: Fix a bunch of tests that, owing to their compile_unit metadata not including a 13th field, had some subtle behavior. Without the 13th field, the "emission kind" field defaults to 0, which is not equal to either of the values of the emission kind enum (1 == full debug info, 2 == line tables only). In this particular instance, the comparison with "FullDebugInfo" was done when adding elements to the ranges list, so for these test cases no values were added to the ranges list.

This got weirder when emitting debug_loc entries: the addresses should be relative to the range of the CU if the CU has only one range (the reasonable assumption is that if we're emitting debug_loc lists for a CU, that CU has at least one range - but due to the above situation, it has zero), so the ranges were emitted relative to the start of the section rather than relative to the start of the CU's singular range.

Fix these tests by accounting for the difference in the description of debug_loc entries (in some cases making the test ignorant to these differences, in others adding the extra label difference expression, etc.) or the presence/absence of high/low_pc on the CU, and add the 13th field to their CUs to enable proper "full debug info" emission here.

In a future commit I'll fix up a bunch of other test cases that are not so rigorously depending on this behavior, but still doing similarly weird things due to the missing 13th field. llvm-svn: 214937
-
- Aug 05, 2014
-
Jon Roelofs authored
This reverts r214893, re-applying r214881 with the test case relaxed a bit to satiate the build bots.

POP on armv4t cannot be used to change thumb state (unlike later non-M-class architectures), therefore we need a different return sequence that uses 'bx' instead:

POP {r3}
ADD sp, #offset
BX r3

This patch also fixes an issue where the return value in r3 would get clobbered for functions that return 128 bits of data. In that case, we generate this sequence instead:

MOV ip, r3
POP {r3}
ADD sp, #offset
MOV lr, r3
MOV r3, ip
BX lr

http://reviews.llvm.org/D4748 llvm-svn: 214928
-
Sanjay Patel authored
1. Added ':' to CHECK-LABELs
2. Added more CHECKs
3. Added CHECK-NEXTs
4. Added verbose hex immediate comments to CHECKs
llvm-svn: 214921
-
Jon Roelofs authored
llvm-svn: 214893
-
Sanjay Patel authored
Allow vector fabs operations on bitcasted constant integer values to be optimized in the same way that we already optimize scalar fabs. So for code like this:

%bitcast = bitcast i64 18446744069414584320 to <2 x float> ; 0xFFFF_FFFF_0000_0000
%fabs = call <2 x float> @llvm.fabs.v2f32(<2 x float> %bitcast)
%ret = bitcast <2 x float> %fabs to i64

Instead of generating something like this:

movabsq (constant pool load of mask for sign bits)
vmovq (move from integer register to vector/fp register)
vandps (mask off sign bits)
vmovq (move vector/fp register back to integer return register)

We should generate:

mov (put constant value in return register)

I have also removed a redundant clause in the first 'if' statement:

N0.getOperand(0).getValueType().isInteger()

is the same thing as:

IntVT.isInteger()

Testcases for x86 and ARM added to existing files that deal with vector fabs. One existing testcase for x86 was removed because it is no longer ideal.

For more background, please see http://reviews.llvm.org/D4770 and http://llvm.org/bugs/show_bug.cgi?id=20354.

Differential Revision: http://reviews.llvm.org/D4785 llvm-svn: 214892
-
Jon Roelofs authored
POP on armv4t cannot be used to change thumb state (unlike later non-M-class architectures), therefore we need a different return sequence that uses 'bx' instead:

POP {r3}
ADD sp, #offset
BX r3

This patch also fixes an issue where the return value in r3 would get clobbered for functions that return 128 bits of data. In that case, we generate this sequence instead:

MOV ip, r3
POP {r3}
ADD sp, #offset
MOV lr, r3
MOV r3, ip
BX lr

http://reviews.llvm.org/D4748 llvm-svn: 214881
-
David Blaikie authored
It's a bit of a tradeoff, since llvm-dwarfdump doesn't print the name of the global symbol being used as an address in the addressing mode, but this avoids the dependence on hardcoded set labels that keep changing (5+ commits over the last few years that each update the set label as it changes due to other, unrelated differences in output).

This could instead have been changed to match the set name and then match the name in the string pool, but that would present other issues (needing to skip over the sets that weren't of interest, etc.), and checking that the addresses (granted, without relocations applied - so it's not the whole story) match in the two variable location descriptions seems sufficient and fairly stable here.

There are a few other tests with similar label dependence that I'll update soonish. llvm-svn: 214878
-
- Aug 02, 2014
-
Akira Hatanaka authored
expanding pseudo LOAD_STACK_GUARD using instructions that are normally used in PIC mode. This patch fixes the bug. <rdar://problem/17886592> llvm-svn: 214614
-
- Aug 01, 2014
-
Juergen Ributzka authored
This is a followup patch for r214366, which added the same behavior to the AArch64 and X86 FastISel code. This fix reproduces the already existing behavior of SelectionDAG in FastISel. llvm-svn: 214531
-
- Jul 31, 2014
-
Rafael Espindola authored
Before this patch we had:

@a = weak global ...

but:

@b = alias weak ...

The patch changes aliases to look more like global variables.

Looking at some really old code suggests that the reason was that the old bison-based parser had a reduction for alias linkages and another one for global variable linkages. Putting the alias first avoided the reduce/reduce conflict.

The days of the old .ll parser are long gone. The new one parses just "linkage", and a later check is responsible for deciding if a linkage is valid in a given context. llvm-svn: 214355
-
- Jul 29, 2014
-
Tim Northover authored
ARM does actually define the name for this conversion, so we should use it on "-eabi" platforms. llvm-svn: 214176
-
Tim Northover authored
We need to make sure we use the softened version of all appropriate operands in the libcall, or things go horribly wrong. This may entail actually executing a 1-stage softening. llvm-svn: 214175
-
- Jul 25, 2014
-
Akira Hatanaka authored
address of the stack guard was being spilled to the stack. Previously the address of the stack guard would get spilled to the stack if it was impossible to keep it in a register. This patch introduces a new target-independent node and pseudo instruction which gets expanded post-RA to a sequence of instructions that load the stack guard value. The register allocator can now just rematerialize the value when it can't keep it in a register. <rdar://problem/12475629> llvm-svn: 213967
-
David Blaikie authored
* Add CUs to the named CU node
* Add missing DW_TAG_subprogram nodes
* Add llvm::Functions to the DW_TAG_subprogram nodes

This cleans up the tests so that they don't break under a soon-to-be-made change that is more strict about such things. llvm-svn: 213951
-
Amara Emerson authored
Patch by Ben Foster! Differential Revision: http://reviews.llvm.org/D4657 llvm-svn: 213944
-
NAKAMURA Takumi authored
llvm-svn: 213933
-
NAKAMURA Takumi authored
It sometimes confuses FileCheck. Consider the case where the path contains 'stmib'. :) llvm-svn: 213932
-