- Mar 10, 2014
-
-
Benjamin Kramer authored
MemCpyOpt: When merging memsets also merge the trivial case of two memsets with the same destination. The testcase is from PR19092, but I think the bug described there is actually a clang issue. llvm-svn: 203489
-
Evan Cheng authored
optimize a call to a llvm intrinsic to something that involves a call to a C library call, make sure it sets the right calling convention on the call. e.g.

    extern double pow(double, double);
    double t(double x) { return pow(10, x); }

Compiles to something like this for AAPCS-VFP:

    define arm_aapcs_vfpcc double @t(double %x) #0 {
    entry:
      %0 = call double @llvm.pow.f64(double 1.000000e+01, double %x)
      ret double %0
    }
    declare double @llvm.pow.f64(double, double) #1

Simplify libcall (part of instcombine) will turn the above into:

    define arm_aapcs_vfpcc double @t(double %x) #0 {
    entry:
      %__exp10 = call double @__exp10(double %x) #1
      ret double %__exp10
    }
    declare double @__exp10(double)

The pre-instcombine code works because calls to LLVM builtins are special. Instruction selection will choose the right calling convention for the call. However, the code after instcombine is wrong. The call to __exp10 will use the C calling convention.

I can think of 3 options to fix this.
1. Make "C" calling convention just work since the target should know what CC is being used. This doesn't work because each function can use different CC with the "pcs" attribute.
2. Have Clang add the right CC keyword on the calls to LLVM builtin. This will work but it doesn't match the LLVM IR specification which states these are "Standard C Library Intrinsics".
3. Fix simplify libcall so the resulting calls to the C routines will have the proper CC keyword. e.g. %__exp10 = call arm_aapcs_vfpcc double @__exp10(double %x) #1 This works and is the solution I implemented here.

Both solutions #2 and #3 would work. After carefully considering the pros and cons, I decided to implement #3 for the following reasons.
1. It doesn't change the "spec" of the intrinsics.
2. It's a self-contained fix.

There are a couple of potential downsides.
1. There could be other places in the optimizer that are broken in the same way and are not addressed by this.
2. There could be other calling conventions that need to be propagated by simplify-libcall that are not handled.

But for now, this is the fix that I'm most comfortable with. llvm-svn: 203488
-
Eli Bendersky authored
[forgot to 'svn add' before committing r203483] llvm-svn: 203485
-
Sasa Stankovic authored
* Add masking instructions before loads and stores (in MC layer). * Add masking instructions after SP changes (in MC layer). * Forbid loads, stores and SP changes in delay slots (in MI layer). Differential Revision: http://llvm-reviews.chandlerc.com/D2904 llvm-svn: 203484
-
Adam Nemet authored
llvm-svn: 203472
-
Reed Kotler authored
llvm-svn: 203469
-
JF Bastien authored
llvm-svn: 203468
-
Matheus Almeida authored
llvm-svn: 203459
-
Tim Northover authored
The function was making too many assumptions about its input: 1. The NEON_VDUP optimisation was far too aggressive, assuming (I think) that the input would always be BUILD_VECTOR. 2. We were treating most unknown concats as legal (by returning Op rather than SDValue()). I think only concats of pairs of vectors are actually legal. http://llvm.org/PR19094 llvm-svn: 203450
-
Venkatraman Govindaraju authored
llvm-svn: 203424
-
- Mar 09, 2014
-
-
NAKAMURA Takumi authored
It choked i686 stage2. llvm-svn: 203386
-
David Majnemer authored
The grammar for LLVM IR is not well specified in any document but seems to obey the following rules: - Attributes which have parenthesized arguments are never preceded by commas. This form of attribute is the only one which ever has optional arguments. However, not all of these attributes support optional arguments: 'thread_local' supports an optional argument but 'addrspace' does not. Interestingly, 'addrspace' is documented as being a "qualifier". What constitutes a qualifier? I cannot find a definition. - Some attributes use a space between the keyword and the value. Examples of this form are 'align' and 'section'. These are always preceded by a comma. - Otherwise, the attribute has no argument. These attributes do not have a preceding comma. Sometimes an attribute goes before the instruction, between the instruction and its type, or after its type. 'atomicrmw' has 'volatile' between the instruction and the type while 'call' has 'tail' preceding the instruction. With all this in mind, it seems most consistent for 'inalloca' on an 'alloca' instruction to occur between the instruction and the type. Unlike the current formulation, there would be no preceding comma. The combination 'alloca inalloca' doesn't look particularly appetizing; perhaps a better spelling of 'inalloca' is down the road. llvm-svn: 203376
-
- Mar 08, 2014
-
-
Adam Nemet authored
llvm-svn: 203361
-
David Blaikie authored
llvm-svn: 203337
-
David Blaikie authored
Will fix this harder in a moment. llvm-svn: 203329
-
David Blaikie authored
Suggested by Adrian Prantl in code review for r203187 llvm-svn: 203323
-
Eric Christopher authored
Add a testcase based on sret.cpp where we can now hash the entire compile unit. llvm-svn: 203319
-
Adam Nemet authored
This is the new idiom: x<<(y&31) | x>>((0-y)&31) which is recognized as: x ROTL (y&31) The change refines matchRotateSub. In Neg & (OpSize - 1) == (OpSize - Pos) & (OpSize - 1), if Pos is Pos' & (OpSize - 1) we can just use Pos' instead of Pos. llvm-svn: 203315
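The idiom above can be checked outside the compiler. The following is a minimal Python sketch, not code from the patch: it models 32-bit unsigned arithmetic by hand, and the names `rotl_idiom`, `rotl_reference`, `OP_SIZE` and `MASK` are hypothetical helpers introduced only for illustration.

```python
OP_SIZE = 32                 # operand width in bits (assumed 32, matching the "& 31" in the idiom)
MASK = (1 << OP_SIZE) - 1    # keeps Python's unbounded ints within 32 bits


def rotl_idiom(x, y):
    """Evaluate x<<(y&31) | x>>((0-y)&31) on 32-bit unsigned values."""
    x &= MASK
    left = (x << (y & (OP_SIZE - 1))) & MASK
    right = x >> ((0 - y) & (OP_SIZE - 1))
    return left | right


def rotl_reference(x, y):
    """Plain rotate-left by y modulo the operand size, for comparison."""
    x &= MASK
    n = y % OP_SIZE
    return ((x << n) | (x >> (OP_SIZE - n))) & MASK
```

Because both shift amounts are masked, neither shift is ever out of range, which is what makes the pattern safe to write in C and still recognizable as a single rotate. The same masking is why `(0 - y) & 31` and `(32 - (y & 31)) & 31` agree for every `y`.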
-
Arnold Schwaighofer authored
be split and the result type widened. When the condition of a vselect has to be split it makes no sense widening the vselect and thereby widening the condition. We end up in an endless loop of widening (vselect result type) and splitting (condition mask type) doing this. Instead, split both the condition and the vselect and widen the result. I ran this over the test suite with i686 and mattr=+sse and saw no regressions. Fixes PR18036. llvm-svn: 203311
-
Adrian Prantl authored
horrible/fragile. rdar://problem/16264854 llvm-svn: 203309
-
- Mar 07, 2014
-
-
Sasa Stankovic authored
llvm-svn: 203298
-
David Blaikie authored
Suggested by Adrian Prantl in code review for r203187. llvm-svn: 203296
-
David Blaikie authored
llvm-svn: 203295
-
Tom Stellard authored
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 203281
-
Tom Stellard authored
These are sometimes created by the shrink to boolean optimization in the globalopt pass. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 203280
-
David Blaikie authored
Code review feedback to r203187 from Oliver Stannard. Thanks! llvm-svn: 203256
-
Duncan P. N. Exon Smith authored
Be case-insensitive when processing .unreq directives. Patch by Lin Zuojian! llvm-svn: 203251
-
Tim Northover authored
This helps the instruction selector to lower an i64 * i64 -> i128 multiplication into a single instruction on targets which support it. Patch by Manuel Jacob. llvm-svn: 203230
-
Tim Northover authored
Sequences of insertelement/extractelements are sometimes used to build vectors; this code tries to put them back together into shuffles, but could only produce completely uniform shuffle types (<N x T> from two <N x T> sources). This should allow shuffles with different numbers of elements on the input and output sides as well. llvm-svn: 203229
-
Rafael Espindola authored
The old system was fairly convoluted: * A temporary label was created. * A single PROLOG_LABEL was created with it. * A few MCCFIInstructions were created with the same label. The semantics were that the cfi instructions were mapped to the PROLOG_LABEL via the temporary label. The output position was that of the PROLOG_LABEL. The temporary label itself was used only for doing the mapping. The new CFI_INSTRUCTION has a 1:1 mapping to MCCFIInstructions and points to one by holding an index into the CFI instructions of this function. I did consider removing MMI.getFrameInstructions completely and having CFI_INSTRUCTION own a MCCFIInstruction, but MCCFIInstructions have non-trivial constructors and destructors and are somewhat big, so this setup is probably better. The net result is that we don't create temporary labels that are never used. llvm-svn: 203204
-
Karthik Bhat authored
llvm-svn: 203198
-
David Blaikie authored
llvm-svn: 203192
-
David Blaikie authored
This removes a relocation from each subprogram, reducing link times, etc. llvm-svn: 203187
-
David Blaikie authored
llvm-svn: 203186
-
David Blaikie authored
llvm-svn: 203184
-
- Mar 06, 2014
-
-
Rafael Espindola authored
Clang now uses llvm.compiler.used for these cases. llvm-svn: 203174
-
Rafael Espindola authored
llvm-svn: 203173
-
Andrea Di Biagio authored
This patch teaches the DAGCombiner how to fold a binary OR between two shufflevectors into a single shufflevector when possible. The rules are: 1. fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf A, B, Mask1) 2. fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf B, A, Mask2) The DAGCombiner can take advantage of the fact that OR is commutative and compute two possible shuffle masks (Mask1 and Mask2) for the resulting shuffle node. Before folding a dag according to either rule 1 or 2, DAGCombiner verifies that the resulting shuffle mask is legal for the target. DAGCombiner first tries to fold according to rule 1; if that is not possible, it tries rule 2. If both Mask1 and Mask2 are illegal then we conservatively don't fold the OR instruction. llvm-svn: 203156
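The two rules can be illustrated with a small Python sketch of how Mask1 and Mask2 might be derived. This is an illustration under stated assumptions, not the code in the patch: V_0 is taken to be an all-zero vector, mask indices `0..N-1` select from the first shuffle operand and `N..2N-1` from V_0, a negative index means undef, and the fold is abandoned whenever a lane is not "one real element OR'd with one zero". The function name `fold_or_of_shuffles` and the `is_legal` callback (standing in for the target's mask legality check) are hypothetical.

```python
def fold_or_of_shuffles(mask_a, mask_b, is_legal=lambda mask: True):
    """Try to fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) into one shuffle.

    Returns (first_operand, second_operand, mask) or None if no fold applies.
    """
    n = len(mask_a)
    mask1, mask2 = [], []
    for ma, mb in zip(mask_a, mask_b):
        if ma < 0 and mb < 0:
            # Both lanes undef: keep the lane undef in the result (assumption).
            mask1.append(-1)
            mask2.append(-1)
            continue
        from_a = 0 <= ma < n   # lane takes a real element of A
        from_b = 0 <= mb < n   # lane takes a real element of B
        # Exactly one side must contribute a real element; the other must be zero.
        if ma < 0 or mb < 0 or from_a == from_b:
            return None
        mask1.append(ma if from_a else mb + n)   # operands ordered (A, B)
        mask2.append(mb if from_b else ma + n)   # operands ordered (B, A)
    if is_legal(mask1):
        return ("A", "B", mask1)
    if is_legal(mask2):
        return ("B", "A", mask2)
    return None   # both masks illegal: conservatively do not fold
```

For example, with four lanes, `MA = [0, 4, 2, 4]` and `MB = [4, 1, 4, 3]` give `Mask1 = [0, 5, 2, 7]` for operand order (A, B) and `Mask2 = [4, 1, 6, 3]` for (B, A); the second is only used when the legality callback rejects the first.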
-
Rafael Espindola authored
Despite the name, n_type contains not only the type of the symbol but also whether it is extern or private extern. llvm-svn: 203154
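As a rough illustration of why n_type is more than a type, the following Python sketch splits the byte into its fields using the mask values from the Mach-O `nlist.h` header (`N_STAB`, `N_PEXT`, `N_TYPE`, `N_EXT`). The helper `decode_n_type` and its dictionary layout are hypothetical, not part of LLVM.

```python
# Bit masks for the Mach-O n_type field, as defined in <mach-o/nlist.h>.
N_STAB = 0xE0   # any of these bits set: symbolic debugging entry
N_PEXT = 0x10   # private external symbol bit
N_TYPE = 0x0E   # the actual symbol type bits
N_EXT = 0x01    # external symbol bit


def decode_n_type(n_type):
    """Split an n_type byte into its separate fields."""
    return {
        "stab": bool(n_type & N_STAB),
        "private_extern": bool(n_type & N_PEXT),
        "type": n_type & N_TYPE,
        "extern": bool(n_type & N_EXT),
    }
```

A value of `0x0F`, for instance, decodes to type `0x0E` with the extern bit set, so comparing the raw byte against a type constant without masking would miss it.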
-
Matt Arsenault authored
This appears to only be working for global loads. Private and local break for other reasons. llvm-svn: 203135
-