- Dec 12, 2014
Charlie Turner authored
The __fp16 type is unconditionally exposed. Since -mfp16-format is not yet supported, there is no user switch to change this behaviour. This build attribute should capture the default behaviour of the compiler, which is to expose the IEEE 754 version of __fp16. When -mfp16-format is supported, that will be the way to control the value of this build attribute.

Change-Id: I8a46641ff0fd2ef8ad0af5f482a6d1af2ac3f6b0
llvm-svn: 224115
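A minimal sketch of the corresponding check, in the style of the build-attributes tests (my reconstruction, not the commit's actual test: in the ARM ABI addenda, Tag_ABI_FP_16bit_format is tag 38, and value 1 denotes the IEEE 754 format):

    ; RUN: llc < %s -mtriple=armv7-linux-gnueabi | FileCheck %s
    ; CHECK: .eabi_attribute 38, 1
    define void @fn() {
      ret void
    }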
- Dec 11, 2014
Tim Northover authored
Quite a major error here: the expansions for the Pseudos with and without folded load were mixed up. Fortunately it only affects ARM mode, when not using movw/movt, on Darwin. I'm guessing no-one actually uses that combination.

llvm-svn: 223986
- Dec 10, 2014
Ahmed Bougacha authored
We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD when the base pointer is incremented after the load/store. We can do the same thing for generic load/stores.

Note that we can only combine the first load/store+adds pair in a sequence (as might be generated for a v16f32 load for instance), because other combines turn the base pointer addition chain (each computing the address of the next load, from the address of the last load) into independent additions (common base pointer + this load's offset).

Differential Revision: http://reviews.llvm.org/D6585

llvm-svn: 223862
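A sketch of the kind of pattern this now catches (hypothetical function names, 2014-era IR syntax as used elsewhere in this log): a vector load whose base pointer is bumped by the access size immediately afterwards, which can become a single post-incrementing vld1 instead of a load plus a separate add:

    define <4 x float> @load_then_bump(<4 x float>* %p, <4 x float>** %out) {
      %val = load <4 x float>* %p, align 8
      ; The increment (16 bytes) matches the access size, so this pair can
      ; be selected as "vld1.32 {d16, d17}, [r0]!".
      %p.next = getelementptr <4 x float>* %p, i32 1
      store <4 x float>* %p.next, <4 x float>** %out
      ret <4 x float> %val
    }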
- Dec 09, 2014
Ahmed Bougacha authored
llvm-svn: 223841
Ahmed Bougacha authored
It was missing from the VLD1/VST1 handling logic, even though the corresponding instructions exist (same form as v2i64). In preparation for a future patch.

llvm-svn: 223832
- Dec 05, 2014
Charlie Turner authored
The test file test/CodeGen/ARM/build-attributes.ll was missing several floating-point build attribute tests. The intention of this commit is that for each CPU / architecture currently tested, there are now tests that make sure the following attributes are sufficiently checked:

* Tag_ABI_FP_rounding
* Tag_ABI_FP_denormal
* Tag_ABI_FP_exceptions
* Tag_ABI_FP_user_exceptions
* Tag_ABI_FP_number_model

Also in this commit, the -unsafe-fp-math flag has been augmented with the full suite of flags Clang sends to LLVM when you pass -ffast-math to Clang. That is, -unsafe-fp-math has been changed to:

    -enable-unsafe-fp-math -disable-fp-elim -enable-no-infs-fp-math -enable-no-nans-fp-math -fp-contract=fast

Change-Id: I35d766076bcbbf09021021c0a534bf8bf9a32dfc
llvm-svn: 223454
- Dec 04, 2014
Jon Roelofs authored
There are a couple of issues with indirect calls on thumbv4t. First, the most 'obvious' instruction, 'blx', isn't available until v5t. And second, the next-most-obvious sequence, 'mov lr, pc; bx rN', doesn't do the right thing in Thumb code because the saved-off pc has its Thumb bit cleared, so when the callee returns we end up in ARM mode.... yuck.

The solution is to 'bl' to a nearby landing pad with a 'bx rN' in it.

We could cut down on code size by sharing the landing pads between call sites that are close enough, but for the moment let's do correctness first and look at performance later.

Patch by: Iain Sandoe
http://reviews.llvm.org/D6519

llvm-svn: 223380
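A rough sketch of the shape of the fix (label and register choice hypothetical): the call is routed through a local 'bl', which sets lr with the Thumb bit set, to a pad holding the actual 'bx':

    define void @call_fp(void ()* %fp) {
      call void %fp()
      ret void
    }

    ; Expected thumbv4t lowering, roughly (%fp arrives in r0 under the AAPCS):
    ;         bl    .Lcall_pad    @ lr = address of next insn, Thumb bit set
    ;         ...                 @ the callee's "bx lr" returns here
    ; .Lcall_pad:
    ;         bx    r0            @ jump to the function pointer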
- Dec 03, 2014
Charlie Turner authored
LLVM understands a -enable-sign-dependent-rounding-fp-math codegen option. When the user has specified this option, the Tag_ABI_FP_rounding attribute should be emitted with value 1. As far as I can tell, this option does not currently disable transformations and optimizations that assume default floating-point rounding behavior, but the intention should be recorded in the build attributes regardless of what the compiler actually does with the intention.

Change-Id: If838578df3dc652b6f2796b8d152545674bcb30e
llvm-svn: 223218
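A minimal check in the style of test/CodeGen/ARM/build-attributes.ll (my reconstruction; Tag_ABI_FP_rounding is tag 19 in the ABI addenda, where 0 means round-to-nearest only and 1 means the code may use dynamic rounding modes):

    ; RUN: llc < %s -mtriple=armv7-linux-gnueabi \
    ; RUN:     -enable-sign-dependent-rounding-fp-math | FileCheck %s
    ; CHECK: .eabi_attribute 19, 1
    define float @add(float %a, float %b) {
      %sum = fadd float %a, %b
      ret float %sum
    }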
Charlie Turner authored
Change-Id: I051866d073fc6ce87ce3e693a3762da6d81f4393
llvm-svn: 223217
- Dec 02, 2014
Charlie Turner authored
The default ARM floating-point mode does not support IEEE 754 mode exactly. Of relevance to this patch is that input denormals are flushed to zero. The way in which they're flushed to zero depends on the architecture:

* For VFPv2, it is implementation defined as to whether the sign of zero is preserved.
* For VFPv3 and above, the sign of zero is always preserved when a denormal is flushed to zero.

When FP support has been disabled, the strategy taken by this patch is to assume the software support will mirror the behaviour of the hardware support for the target *if it existed*. That is, for architectures which can only have VFPv2, it is assumed the software will flush to positive zero. For later architectures it is assumed the software will flush to zero preserving sign.

Change-Id: Icc5928633ba222a4ba3ca8c0df44a440445865fd
llvm-svn: 223110
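A sketch of the resulting distinction (again my reconstruction, including the flag choice: Tag_ABI_FP_denormal is tag 20, with 0 meaning denormals may be flushed to positive zero and 2 meaning the sign is preserved when flushing):

    ; RUN: llc < %s -mtriple=armv5-linux-gnueabi -mattr=+vfp2 \
    ; RUN:     -enable-unsafe-fp-math | FileCheck %s --check-prefix=VFP2
    ; RUN: llc < %s -mtriple=armv7-linux-gnueabi -mattr=+vfp3 \
    ; RUN:     -enable-unsafe-fp-math | FileCheck %s --check-prefix=VFP3
    ; VFP2: .eabi_attribute 20, 0
    ; VFP3: .eabi_attribute 20, 2
    define double @mul(double %a, double %b) {
      %prod = fmul double %a, %b
      ret double %prod
    }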
- Dec 01, 2014
Reid Kleckner authored
Previously we just used "cc 10" in the .ll files, but that isn't very human readable.

llvm-svn: 223076
Tim Northover authored
Patch by Ben Gamari.

llvm-svn: 223055
- Nov 27, 2014
Charlie Turner authored
The string data for string-valued build attributes were being unconditionally uppercased. There is no mention in the ARM ABI addenda about case conventions, so it's technically implementation defined as to whether the data are capitalised in some way or not. However, there are good reasons not to capitalise the data:

* It's less work.
* Some vendors may legitimately have case-sensitive checks for these attributes which would fail on LLVM-generated object files.
* There could be locale issues with uppercasing.

The original reasons for uppercasing appear to have stemmed from an old CodeSourcery toolchain behaviour, see http://comments.gmane.org/gmane.comp.compilers.llvm.cvs/87133

With this patch, the emitted object file no longer capitalises string data; it is encoded as seen in the assembly source.

Change-Id: Ibe20dd6e60d2773d57ff72a78470839033aa5538
llvm-svn: 222882
- Nov 17, 2014
Renato Golin authored
The triple parser should only accept existing architecture names when the triple starts with armv, armebv, thumbv or thumbebv.

Patch by Gabor Ballabas.

llvm-svn: 222129
Oliver Stannard authored
This was motivated by a bug which caused code like this to be miscompiled:

    declare void @take_ptr(i8*)
    define void @test() {
      %addr1.32 = alloca i8
      %addr2.32 = alloca i32, i32 1028
      call void @take_ptr(i8* %addr1.32)
      ret void
    }

This was emitting the following assembly to get the value of %addr1.32:

    add r0, sp, #1020
    add r0, r0, #8

However, "add r0, r0, #8" is not a valid Thumb1 instruction, and this could not be assembled. The generated object file contained this, resulting in r0 holding SP+8 rather than SP+1028:

    add r0, sp, #1020
    add r0, sp, #8

This function looked like it could have caused miscompilations for other combinations of registers and offsets (though I don't think it is currently called with these), and the heuristic it used did not match the emitted code in all cases.

llvm-svn: 222125
Oliver Stannard authored
Some optimisations in DAGCombiner cause miscompilations for targets that use TargetLowering::UndefinedBooleanContent, because they assume that the results of a SELECT_CC node are boolean values, and can be safely ANDed, ORed and XORed. These optimisations are only valid for targets that use ZeroOrOneBooleanContent or ZeroOrNegativeOneBooleanContent.

This is a follow-up to D6210/r221693.

llvm-svn: 222123
- Nov 14, 2014
Tim Northover authored
We used to track quite a few "adjusted" offsets through the FrameLowering code to account for changes in the prologue instructions as we went, and allow the emission of correct CFA annotations. However, we were missing a couple of cases and the code was almost impenetrable. It's easier to just add any stack-adjusting instruction to a list and emit them together.

llvm-svn: 222057
Tim Northover authored
When we folded the DPR alignment gap into a push, we weren't noting the extra distance from the beginning of the push to the FP, and so FP ended up pointing at an incorrect offset. The .cfi_def_cfa_offset directives are still wrong in this case, but I think that can be improved by refactoring.

llvm-svn: 222056
Tim Northover authored
The test's DWARF stubs were there just to trigger the emission of .cfi directives. Fortunately, the NetBSD ABI already demands proper DWARF unwind info, so it's easier to just use that triple.

llvm-svn: 222055
- Nov 13, 2014
Tim Northover authored
We were using a naive heuristic to determine whether a basic block already had an unconditional branch at the end. This mostly corresponded to reality (assuming branches got optimised) because there's not much point in a branch to the next block, but could go wrong.

llvm-svn: 221904
Tim Northover authored
Creating tests for the ConstantIslands pass is very difficult, since it depends on precise layout details. Having the ability to precisely inject a number of bytes into the stream helps greatly.

llvm-svn: 221903
- Nov 11, 2014
Tom Roeder authored
This commit adds a new pass that can inject checks before indirect calls to make sure that these calls target known locations. It supports three types of checks and, at compile time, it can take the name of a custom function to call when an indirect call check fails. The default failure function ignores the error and continues.

This pass incidentally moves the function JumpInstrTables::transformType from private to public and makes it static (with a new argument that specifies the table type to use); this is so that the CFI code can transform function types at call sites to determine which jump-instruction table to use for the check at that site.

Also, this removes support for jumptables in ARM, pending further performance analysis and discussion.

Review: http://reviews.llvm.org/D4167

llvm-svn: 221708
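Conceptually (a hand-written sketch rather than this pass's actual output, with all names hypothetical), the inserted guard turns a bare indirect call into a membership test plus a failure handler:

    declare void @known_target()
    declare void @on_cfi_failure()    ; hypothetical custom failure function

    define void @caller(void ()* %fp) {
      ; Check the pointer against the set of allowed targets (here just one).
      %ok = icmp eq void ()* %fp, @known_target
      br i1 %ok, label %checked, label %failed
    checked:
      call void %fp()
      ret void
    failed:
      call void @on_cfi_failure()
      ret void
    }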
Oliver Stannard authored
LLVM replaces the SelectionDAG pattern (xor (set_cc cc x y) 1) with (set_cc !cc x y), which is only correct when the xor has type i1. Instead, we should check that the constant operand to the xor is all ones.

llvm-svn: 221693
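A sketch of the distinction (hypothetical functions): for i1 the constant 1 is all ones, so the xor really is a logical not of the compare; for wider booleans, xor with 1 only flips bit 0, which is an inversion only if the setcc result is known to be exactly 0 or 1:

    ; Safe: "true" is all ones for i1, so this may become an inverted setcc.
    define i1 @inv_ok(i32 %x, i32 %y) {
      %cmp = icmp eq i32 %x, %y
      %not = xor i1 %cmp, true
      ret i1 %not
    }

    ; After type legalisation the boolean below is a wider value whose
    ; non-zero representation is target-defined; folding the xor into an
    ; inverted compare assumes it is exactly 0 or 1.
    define i32 @inv_widened(i32 %x, i32 %y) {
      %cmp = icmp eq i32 %x, %y
      %bool = zext i1 %cmp to i32
      %flip = xor i32 %bool, 1
      ret i32 %flip
    }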
- Nov 06, 2014
Lang Hames authored
This test case was never actually testing the trivial spiller: the -spiller option has not been hooked up for a while now.

llvm-svn: 221475
Rafael Espindola authored
llvm-svn: 221459
- Nov 05, 2014
Tim Northover authored
We currently try to push an even number of registers to preserve 8-byte alignment during a function's prologue, but only when the stack alignment is precisely 8. Many of the reasons for doing it apply also when that alignment is > 8 (the extra store is often free, and can save another stack adjustment, though less frequently for 16-byte stack alignment).

llvm-svn: 221321
Tim Northover authored
We were making an attempt to do this by adding an extra callee-saved GPR (so that there was an even number in the list), but when that failed we went ahead and pushed anyway. This had a couple of potential issues:

+ The .cfi directives we emit misplaced dN because they were based on PrologEpilogInserter's calculation.
+ Unaligned stores can be less efficient.
+ Unaligned stores can actually fault (likely only an issue in niche cases, but possible).

This adds a final explicit stack adjustment if all other options fail, so that the actual locations of the registers match up with where they should be.

llvm-svn: 221320
- Nov 03, 2014
Charlie Turner authored
This CPU definition is redundant. The Cortex-A9 is defined as supporting multiprocessing extensions. Remove its definition and update appropriate tests.

LLVM defines both a cortex-a9 CPU and a cortex-a9-mp CPU. The only difference between the two CPU definitions in ARM.td is that cortex-a9-mp contains the feature FeatureMP for multiprocessing extensions. This is redundant since the Cortex-A9 is defined as having multiprocessing extensions in the TRMs. armcc also defines the Cortex-A9 as having multiprocessing extensions by default.

Change-Id: Ifcadaa6c322be0a33d9d2a39cfdd7da1d75981a7
llvm-svn: 221166
- Oct 31, 2014
Quentin Colombet authored
This patch adds an optimization in CodeGenPrepare to move an extractelement right before a store when the target can combine them. The optimization may promote any scalar operations to vector operations in the way to make that possible.

** Context **

Some targets use different register files for vector and scalar operations. This means that transitioning from one domain to another may incur a copy from one register file to another. These copies are not coalescable and may be expensive. For example, according to the scheduling model, on cortex-A8 a vector to GPR move is 20 cycles.

** Motivating Example **

Let us consider an example:

    define void @foo(<2 x i32>* %addr1, i32* %dest) {
      %in1 = load <2 x i32>* %addr1, align 8
      %extract = extractelement <2 x i32> %in1, i32 1
      %out = or i32 %extract, 1
      store i32 %out, i32* %dest, align 4
      ret void
    }

As it is, this IR generates the following assembly on armv7:

    vldr d16, [r0]            @ vector load
    vmov.32 r0, d16[1]        @ cross-register-file copy: 20 cycles
    orr r0, r0, #1            @ scalar bitwise or
    str r0, [r1]              @ scalar store
    bx lr

Whereas we could generate much faster code:

    vldr d16, [r0]            @ vector load
    vorr.i32 d16, #0x1        @ vector bitwise or
    vst1.32 {d16[1]}, [r1:32] @ vector extract + store
    bx lr

Half of the computation made in the vector is useless, but this allows us to get rid of the expensive cross-register-file copy.

** Proposed Solution **

To avoid this cross-register-copy penalty, we promote the scalar operations to vector operations. The penalty will be removed if we manage to promote the whole chain of computation in the vector domain. Currently, we do that only when the chain of computation ends with a store and the target is able to combine an extract with a store.

Stores are the most likely candidates, because other instructions produce values that would need to be promoted and so, extracted at some point[1]. Moreover, it is customary that targets feature stores that perform a vector extract (see AArch64 and X86 for instance).

The proposed implementation relies on the TargetTransformInfo to decide whether or not it is beneficial to promote a chain of computation in the vector domain. Unfortunately, this interface is rather inaccurate for this level of detail, and although this optimization may be beneficial for X86 and AArch64, the inaccuracy will lead to the optimization being too aggressive. Basically in TargetTransformInfo, everything that is legal has a cost of 1, whereas, even if a vector type is legal, usually a vector operation is slightly more expensive than its scalar counterpart. That will lead to too many promotions that may not be counterbalanced by the saving of the cross-register-file copy. For instance, on AArch64 this penalty is just 4 cycles.

For now, the optimization is just enabled for ARM prior to v8, since those processors have a larger penalty on cross-register-file copies, and the scope is limited to basic blocks. Because of these two factors, we limit the effects of the inaccuracy. Indeed, I did not want to build up a fancy cost model with block frequency and everything on top of that.

[1] We can imagine targets that can combine an extractelement with other instructions than just stores. If we want to go in that direction, the current interfaces must be augmented and, moreover, I think this becomes a global isel problem.

Differential Revision: http://reviews.llvm.org/D5921

<rdar://problem/14170854>

llvm-svn: 220978
- Oct 30, 2014
Tim Northover authored
It should be on for every target that supports unaligned accesses (e.g. not v6m).

Patch by Charlie Turner.

llvm-svn: 220912
- Oct 27, 2014
Oliver Stannard authored
Currently, the ARM backend will select the VMAXNM and VMINNM instructions for these C expressions:

    (a < b) ? a : b
    (a > b) ? a : b

but not these expressions:

    (a > b) ? b : a
    (a < b) ? b : a

This patch allows all of these expressions to be matched.

llvm-svn: 220671
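A sketch of a newly matched form (hypothetical function; assumes an FP-ARMv8 target, plus whatever fast-math flags the backend requires for NaN-sensitive selects):

    ; (a > b) ? b : a is also a minimum, and can now select vminnm.f32:
    define float @min_swapped(float %a, float %b) {
      %cmp = fcmp ogt float %a, %b
      %min = select i1 %cmp, float %b, float %a
      ret float %min
    }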
- Oct 23, 2014
Renato Golin authored
This updates the check for a double-precision zero floating-point constant to allow use of the instruction form with an immediate value rather than a temporary register. Currently "a == 0.0", where "a" is of "double" type, generates:

    vmov.i32 d16, #0x0
    vcmpe.f64 d0, d16

With this change it becomes:

    vcmpe.f64 d0, #0

Patch by Sergey Dmitrouk.

llvm-svn: 220486
Akira Hatanaka authored
This commit enables using movt/movw to load the stack guard address:

    movw r0, :lower16:(L_g3$non_lazy_ptr-(LPC0_0+8))
    movt r0, :upper16:(L_g3$non_lazy_ptr-(LPC0_0+8))
    ldr r0, [pc, r0]

Previously a pc-relative load was emitted:

    ldr r0, LCPI0_0
    ldr r0, [pc, r0]

rdar://problem/18740489

llvm-svn: 220470
- Oct 20, 2014
Tim Northover authored
The previous code had a few problems, motivating the choices here.

1. It could create instructions clobbering CPSR, but the incoming MachineInstr didn't reflect this. A potential source of corruption. This is why the patch has a new PseudoInst for before lowering.
2. Similarly, there was some code to handle the incoming instruction not being ARMCC::AL, but this would have caused massive problems if it was actually invoked when a complex offset needing more than one instruction was requested.
3. It wasn't designed to handle unaligned pointers (or offsets). These should probably be minimised anyway, but the code needs to deal with them properly regardless.
4. It had some rather dubious ad-hoc code to avoid calling emitThumbRegPlusImmediate, a function which should be designed to do precisely this job.

We seem to cover the common cases correctly now, and hopefully can enhance emitThumbRegPlusImmediate to handle any extra optimisations we need to add in future.

llvm-svn: 220236
Oliver Stannard authored
The current instruction selection patterns for SMULW[BT] and SMLAW[BT] are incorrect. These instructions multiply a 32-bit and a 16-bit value (both signed) and return the top 32 bits of the 48-bit result. This preserves the 16 bits of overflow, whereas the patterns they currently match truncate the result to 16 bits then sign extend.

To select these instructions, we would need to match an ISD::SMUL_LOHI, a sign extend, two shifts and an or. There is no way to match SMUL_LOHI in an instruction pattern as it defines multiple values, so this would have to be done in C++. I have raised http://llvm.org/bugs/show_bug.cgi?id=21297 to cover allowing correct selection of these instructions.

This fixes http://llvm.org/bugs/show_bug.cgi?id=19396

llvm-svn: 220196
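For reference, the semantics the commit describes, modelled over i64 (a hand-written illustration, not a selection pattern):

    ; smulwb r0, r0, r1: multiply a 32-bit value by the sign-extended bottom
    ; half of another, returning bits [47:16] of the 48-bit product.
    define i32 @smulwb_semantics(i32 %a, i32 %b) {
      %a.64 = sext i32 %a to i64
      %b.lo = trunc i32 %b to i16
      %b.64 = sext i16 %b.lo to i64
      %prod = mul i64 %a.64, %b.64   ; at most 48 significant bits, exact in i64
      %hi = ashr i64 %prod, 16       ; drop the low 16, keeping the overflow bits
      %res = trunc i64 %hi to i32
      ret i32 %res
    }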
- Oct 15, 2014
Tim Northover authored
Thumb1 has legitimate reasons for preferring 32-bit alignment of types i1/i8/i16, since the 16-bit encoding of "add rD, sp, #imm" requires #imm to be a multiple of 4. However, this is a trade-off between code size and RAM usage; the DataLayout string is not the best place to represent it even if desired.

So this patch removes the extra Thumb requirements, hopefully making ARM and Thumb completely compatible in this respect.

llvm-svn: 219734
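For context, the entries in question look roughly like this (my recollection of the era's Thumb DataLayout fragment, so treat the exact string as an assumption): each small integer type carried a 32-bit preferred alignment, which this patch drops.

    ; Before (Thumb): ABI alignment, then a 32-bit preferred alignment:
    ;     i1:8:32-i8:8:32-i16:16:32
    ; After: the types revert to their natural (ABI) alignment only,
    ; matching ARM mode.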
Tim Northover authored
There's no hard requirement on LLVM to align local variables to 32 bits, so the Thumb1 frame handling needs to be able to deal with variables that are only naturally aligned, without falling over.

llvm-svn: 219733
- Oct 14, 2014
Tim Northover authored
Before, ARM and Thumb mode code had different preferred alignments, which could lead to some rather unexpected results. There's justification for reducing it from the default 64 bits (wasted space), but I don't think there is for going below 32 bits.

There's no actual ABI change here, just to reassure people.

llvm-svn: 219719
- Oct 13, 2014
Renato Golin authored
Patch by Matthew Wahab.

llvm-svn: 219606
- Oct 08, 2014
Renato Golin authored
Patch by Charlie Turner.

llvm-svn: 219301