- Jul 13, 2013
-
Akira Hatanaka authored
llvm-svn: 186222
-
Eric Christopher authored
llvm-svn: 186212
-
- Jul 12, 2013
-
Benjamin Kramer authored
llvm-svn: 186196
-
Arnold Schwaighofer authored
radar://14351991 llvm-svn: 186189
-
Arnold Schwaighofer authored
Fixes a 35% degradation compared to unvectorized code in MiBench/automotive-susan and an equally serious regression on a private image processing benchmark. radar://14351991 llvm-svn: 186188
-
Arnold Schwaighofer authored
Address calculation for gather/scatter in vectorized code can incur a significant cost, making vectorization unprofitable. Add infrastructure to model this cost. Tests and cost models for targets will be in follow-up commits. radar://14351991 llvm-svn: 186187
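A minimal sketch of the kind of hook such infrastructure implies; the name, signature, and heuristic below are illustrative assumptions, not the actual TargetTransformInfo interface added by this commit:

```cpp
#include <cassert>

// Hypothetical cost hook: a consecutive vector access computes one
// address, while a scalarized gather/scatter computes one (often
// non-trivial) address per lane, which can erase the vectorization win.
unsigned getAddressComputationCost(unsigned NumLanes, bool IsGatherScatter) {
  assert(NumLanes >= 1 && "expected at least one lane");
  return IsGatherScatter ? NumLanes : 1;
}
```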
-
Tom Stellard authored
Patch by: Niels Ole Salscheider Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 186182
-
Tom Stellard authored
Patch by: Niels Ole Salscheider Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 186181
-
Tom Stellard authored
Patch by: Niels Ole Salscheider Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 186180
-
Tom Stellard authored
Patch by: Niels Ole Salscheider Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 186179
-
Tom Stellard authored
Patch by: Niels Ole Salscheider Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 186178
-
Tom Stellard authored
Patch by: Niels Ole Salscheider Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 186177
-
Benjamin Kramer authored
In particular: movsbw %al, %ax --> cbtw; movswl %ax, %eax --> cwtl; movslq %eax, %rax --> cltq. According to Intel's manual these have the same performance characteristics but come with a smaller encoding. llvm-svn: 186174
-
Stephen Lin authored
Patch by Andrea Di Biagio llvm-svn: 186165
-
Vladimir Medic authored
llvm-svn: 186151
-
Richard Sandiford authored
Normal (sext (setcc ...)) sequences are optimised into (select_cc ..., -1, 0) by DAGCombiner::visitSIGN_EXTEND. However, this is deliberately not done for vectors, and after vector type legalization we have (sext_inreg (setcc ...)) instead. I wondered about trying to extend DAGCombiner to handle this case too, but it seemed to be a loss on some other targets I tried, even those for which SETCC isn't "legal" and SELECT_CC is. llvm-svn: 186149
-
Richard Sandiford authored
GPR and FPR constraints like "{r2}" and "{f2}" weren't handled correctly because the name-to-regno mapping depends on the value type and (because of that) the internal names in RegStrings are not the same as the AsmName. CC constraints like "{cc}" didn't work either because there was no associated register class. llvm-svn: 186148
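For context, a hedged example of how such constraints arise from source code. This uses the GNU explicit-register-variable extension with a SystemZ instruction; it is an illustration rather than a test from the commit, and the register-name spelling accepted by asm("...") can vary by toolchain:

```cpp
#include <cstdint>

// At the LLVM IR level this becomes inline asm with constraints like
// "{r2}" and "{r3}", the exact form this commit fixes for SystemZ.
int64_t add_in_r2(int64_t a, int64_t b) {
  register int64_t x asm("r2") = a; // pin value to GPR %r2
  register int64_t y asm("r3") = b; // pin value to GPR %r3
  // AGR is SystemZ's 64-bit register add; it sets the condition code,
  // hence the "cc" clobber (the "{cc}" constraint case in the commit).
  asm("agr %0, %1" : "+r"(x) : "r"(y) : "cc");
  return x;
}
```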
-
Richard Sandiford authored
If the source of these instructions is spilled we should load the destination. If the destination is spilled we should store the source. llvm-svn: 186147
-
Charles Davis authored
This patch adds explicit calling convention types for the Win64 and System V/x86-64 ABIs. This allows code to override the default and use the Win64 convention on a target that wants to use SysV (and vice versa). This is needed to implement the `ms_abi` and `sysv_abi` GNU attributes. llvm-svn: 186144
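A small example of what the new conventions enable at the source level, via the GNU attributes the commit mentions (illustrative; actual codegen depends on the target's default ABI):

```cpp
// With explicit calling-convention types, either ABI can be requested
// regardless of the target's default.
__attribute__((ms_abi)) long long win64_sum(long long a, long long b) {
  return a + b; // arguments arrive in RCX/RDX per the Win64 convention
}

__attribute__((sysv_abi)) long long sysv_sum(long long a, long long b) {
  return a + b; // arguments arrive in RDI/RSI per the System V convention
}
```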
-
- Jul 11, 2013
-
Hal Finkel authored
We had patterns to match v4i32 immAllZerosV -> V_SET0, but not patterns for v8i16 (which occurs in the test case) or v16i8. The same was true for V_SETALLONES (so I added the associated patterns for those as well). Another bug found by llvm-stress. llvm-svn: 186108
-
Hal Finkel authored
This fixes a bug (found by csmith) at -O0 where we attempt to create an RLWIMI with an out-of-range operand. Most uses of the isRunOfOnes function are guarded by a condition that the value is not zero. This was not true in two places, and in both places a zero input would result in an out-of-range MB value (= 32). To fix this, isRunOfOnes now returns false on a zero input (and I've removed one now-redundant guard). llvm-svn: 186101
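A hedged sketch of a run-of-ones test with the zero check applied. This is a simplified stand-in for PowerPC's isRunOfOnes, which additionally computes the MB/ME mask bounds:

```cpp
#include <cstdint>

// True iff Val contains exactly one contiguous block of 1 bits,
// possibly wrapping around from bit 31 to bit 0 (sketch, not the
// in-tree implementation).
bool isRunOfOnes32(uint32_t Val) {
  if (Val == 0)
    return false; // the fix: a zero input has no run of ones at all
  uint32_t Lowest = Val & ~(Val - 1);   // isolate the lowest set bit
  if (((Val + Lowest) & Val) == 0)      // adding it clears a contiguous run
    return true;
  uint32_t Inv = ~Val;                  // a wrapped run of ones in Val is
  uint32_t InvLowest = Inv & ~(Inv - 1);// a non-wrapped run of ones in ~Val
  return ((Inv + InvLowest) & Inv) == 0;
}
```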
-
Richard Sandiford authored
Extend r186072 to handle shifts and ANDs. llvm-svn: 186073
-
Richard Sandiford authored
RISBG can handle some ANDs for which no AND IMMEDIATE exists. It also acts as a three-operand AND for some cases where an AND IMMEDIATE could be used instead. It might be worth adding a pass to replace RISBG with AND IMMEDIATE in cases where the register operands end up being the same and where AND IMMEDIATE is smaller. llvm-svn: 186072
-
Richard Sandiford authored
RISBG has three 8-bit operands (I3, I4 and I5). I'd originally restricted all three to 6 bits, since that's the only range we intended to use at the time. However, the top bit of I4 acts as a "zero" flag for RISBG, while the top bit of I3 acts as a "test" flag for RNSBG & co. This patch therefore allows them to have the full 8-bit range. I've left the fifth operand as a 6-bit value for now since the upper 2 bits have no defined meaning. llvm-svn: 186070
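To make the operand roles concrete, here is a hedged C++ model of RISBG written from this description; treat it as an approximation, not a statement of the z/Architecture definition:

```cpp
#include <bit>      // std::rotl (C++20)
#include <cstdint>

// Rotate-then-insert-selected-bits model. Bit 0 is the leftmost (most
// significant) bit, following z/Architecture numbering. The top bit of
// I3 (the "test" flag used by RNSBG & co.) is not modeled here.
uint64_t risbg(uint64_t R1, uint64_t R2, unsigned I3, unsigned I4,
               unsigned I5) {
  bool ZeroUnselected = (I4 & 0x80) != 0; // top bit of I4: the "zero" flag
  unsigned Start = I3 & 63, End = I4 & 63, Rot = I5 & 63;
  uint64_t Rotated = std::rotl(R2, static_cast<int>(Rot));
  // Select bits Start..End of the rotated value (wrapping if Start > End).
  uint64_t FromStart = ~uint64_t{0} >> Start;     // bits Start..63
  uint64_t UpToEnd   = ~uint64_t{0} << (63 - End);// bits 0..End
  uint64_t Mask = Start <= End ? (FromStart & UpToEnd)
                               : (FromStart | UpToEnd);
  uint64_t Kept = ZeroUnselected ? 0 : (R1 & ~Mask);
  return Kept | (Rotated & Mask);
}
```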
-
- Jul 10, 2013
-
Aaron Ballman authored
llvm-svn: 186017
-
Craig Topper authored
llvm-svn: 186013
-
Michel Danzer authored
Enough for the radeonsi driver to use it for calculating derivatives. Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 186012
-
Michel Danzer authored
lit test coverage to follow in the next commit. Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 186011
-
Michel Danzer authored
Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 186010
-
Michel Danzer authored
Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 186009
-
Michel Danzer authored
Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 186008
-
Hal Finkel authored
In discussing this change with Bill Schmidt, it was decided that the original comment about negative FIs was incorrect. We'll still exclude them for now, but now with a more-accurate explanation. llvm-svn: 186005
-
Vladimir Medic authored
llvm-svn: 186000
-
Vladimir Medic authored
llvm-svn: 185999
-
Stephen Lin authored
llvm-svn: 185995
-
Stephen Lin authored
Currently ARM is the only backend that supports FMA instructions (for at least some subtargets) but does not implement this virtual, so FMAs are never generated except from explicit fma intrinsic calls. Apparently this is because it supports both fused (one rounding step) and unfused (two rounding steps) multiply + add instructions. This patch clarifies that this is the case without changing behavior, by implementing the virtual function to simply return false, as the default TargetLoweringBase version does. It is possible that some CPUs perform the fused version faster than the unfused version and vice versa, so the function implementation should be revisited if hard data is found. llvm-svn: 185994
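A hedged sketch of the override described; the exact name and signature at this revision may differ slightly (modern trees call the hook isFMAFasterThanFMulAndFAdd), and the surrounding LLVM declarations are assumed:

```cpp
#include "llvm/CodeGen/ValueTypes.h" // EVT (LLVM header, assumed available)

// Sketch: ARM explicitly declines to claim FMA is faster, matching the
// TargetLoweringBase default rather than changing behavior.
bool ARMTargetLowering::isFMAFasterThanFMulAndFAdd(EVT VT) const {
  // VFPv4's VFMA fuses (one rounding step) while VMLA does not; without
  // per-CPU timing data, conservatively say no so ISD::FMA is only
  // formed from explicit fma intrinsic calls.
  return false;
}
```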
-
Jim Grosbach authored
Propagate the fix from r185712 to Thumb2 codegen as well. The original commit message applies here too: a "pkhtb x, x, y asr #num" uses the lower 16 bits of "y asr #num" and packs them into the bottom half of "x". An arithmetic and a logical shift are only equivalent in this context if the shift amount is 16: we would be shifting ones into the bottom 16 bits instead of zeros if "y" is negative. rdar://14338767 llvm-svn: 185982
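A small self-contained check of the shift-equivalence claim (illustrative, not from the commit; assumes the usual arithmetic behavior of >> on negative signed values, which is implementation-defined in C++):

```cpp
#include <cstdint>
#include <cstdio>
#include <initializer_list>

// For a negative y, the low 16 bits of (y asr #num) and (y lsr #num)
// agree at num == 16 but diverge once sign bits reach the bottom half.
int main() {
  int32_t y = INT32_MIN | 0x1234; // a negative input
  for (unsigned num : {16u, 20u}) {
    uint32_t asr = static_cast<uint32_t>(y >> num); // arithmetic shift
    uint32_t lsr = static_cast<uint32_t>(y) >> num; // logical shift
    std::printf("num=%u asr&0xffff=%04x lsr&0xffff=%04x\n", num,
                static_cast<unsigned>(asr & 0xffff),
                static_cast<unsigned>(lsr & 0xffff));
    // Prints equal halves for num=16 and differing halves for num=20,
    // where ASR has shifted ones into the bottom 16 bits.
  }
  return 0;
}
```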
-
- Jul 09, 2013
-
Bill Schmidt authored
A more complete example of the bug in PR16556 was recently provided, showing that the previous fix was not sufficient. The previous fix is reverted herein. The real problem is that ReplaceNodeResults() uses LowerFP_TO_INT as custom lowering for FP_TO_SINT during type legalization, without checking whether the input type is handled by that routine. LowerFP_TO_INT requires the input to be f32 or f64, so we fail when the input is ppcf128. I'm leaving the test case from the initial fix (r185821) in place, and adding the new test as another crash-only check. llvm-svn: 185959
-
Stephen Lin authored
This patch updates the in-tree implementations of TargetLoweringBase::isFMAFasterThanMulAndAdd in order to resolve the following issues with fmuladd (i.e. optional FMA) intrinsics: 1. On X86(-64) targets, ISD::FMA nodes are formed when lowering fmuladd intrinsics even if the subtarget does not support FMA instructions, leading to laughably bad code generation in some situations. 2. On AArch64 targets, ISD::FMA nodes are formed for operations on fp128, resulting in a call to a software fp128 FMA implementation. 3. On PowerPC targets, FMAs are not generated from fmuladd intrinsics on types like v2f32, v8f32, v4f64, etc., even though they promote, split, scalarize, etc. to types that support hardware FMAs. The function has also been slightly renamed for consistency and to force a merge/build conflict for any out-of-tree target implementing it. To resolve, see comments and fixed in-tree examples. llvm-svn: 185956
-
Ulrich Weigand authored
[PowerPC] Revert r185476 and fix up TLS variant kinds. In the commit message to r185476 I wrote: >The PowerPC-specific modifiers VK_PPC_TLSGD and VK_PPC_TLSLD >correspond exactly to the generic modifiers VK_TLSGD and VK_TLSLD. >This causes some confusion with the asm parser, since VK_PPC_TLSGD >is output as @tlsgd, which is then read back in as VK_TLSGD. > >To avoid this confusion, this patch removes the PowerPC-specific >modifiers and uses the generic modifiers throughout. (The only >drawback is that the generic modifiers are printed in upper case >while the usual convention on PowerPC is to use lower-case modifiers. >But this is just a cosmetic issue.) This was unfortunately incorrect: there is in fact another, serious drawback to using the default VK_TLSLD/VK_TLSGD variant kinds. Using these causes ELFObjectWriter::RelocNeedsGOT to return true, which in turn causes the ELFObjectWriter to emit an undefined reference to _GLOBAL_OFFSET_TABLE_. This is a problem on powerpc64, because it uses the TOC instead of the GOT, and the linker does not provide _GLOBAL_OFFSET_TABLE_, so the symbol remains undefined. This means shared libraries using TLS built with the integrated assembler are currently broken. While the whole RelocNeedsGOT / _GLOBAL_OFFSET_TABLE_ situation probably ought to be properly fixed at some point, for now I'm simply reverting the r185476 commit. This in turn exposes the breakage in the asm parser's handling of @tlsgd/@tlsld that the original check-in was intended to fix. To avoid that regression, I'm also adding a different fix for the problem: while common code now parses @tlsgd as VK_TLSGD, a special hack in the asm parser translates it to the platform-specific VK_PPC_TLSGD that the back-end now expects. While this is not really pretty, it's self-contained and shouldn't hurt anything else for now. Once the underlying problem is fixed, this hack can be reverted again. llvm-svn: 185945
-