- Mar 10, 2014
-
-
Benjamin Kramer authored
MemCpyOpt: When merging memsets also merge the trivial case of two memsets with the same destination. The testcase is from PR19092, but I think the bug described there is actually a clang issue. llvm-svn: 203489
-
Evan Cheng authored
optimize a call to a llvm intrinsic to something that invovles a call to a C library call, make sure it sets the right calling convention on the call. e.g. extern double pow(double, double); double t(double x) { return pow(10, x); } Compiles to something like this for AAPCS-VFP: define arm_aapcs_vfpcc double @t(double %x) #0 { entry: %0 = call double @llvm.pow.f64(double 1.000000e+01, double %x) ret double %0 } declare double @llvm.pow.f64(double, double) #1 Simplify libcall (part of instcombine) will turn the above into: define arm_aapcs_vfpcc double @t(double %x) #0 { entry: %__exp10 = call double @__exp10(double %x) #1 ret double %__exp10 } declare double @__exp10(double) The pre-instcombine code works because calls to LLVM builtins are special. Instruction selection will chose the right calling convention for the call. However, the code after instcombine is wrong. The call to __exp10 will use the C calling convention. I can think of 3 options to fix this. 1. Make "C" calling convention just work since the target should know what CC is being used. This doesn't work because each function can use different CC with the "pcs" attribute. 2. Have Clang add the right CC keyword on the calls to LLVM builtin. This will work but it doesn't match the LLVM IR specification which states these are "Standard C Library Intrinsics". 3. Fix simplify libcall so the resulting calls to the C routines will have the proper CC keyword. e.g. %__exp10 = call arm_aapcs_vfpcc double @__exp10(double %x) #1 This works and is the solution I implemented here. Both solutions #2 and #3 would work. After carefully considering the pros and cons, I decided to implement #3 for the following reasons. 1. It doesn't change the "spec" of the intrinsics. 2. It's a self-contained fix. There are a couple of potential downsides. 1. There could be other places in the optimizer that is broken in the same way that's not addressed by this. 2. There could be other calling conventions that need to be propagated by simplify-libcall that's not handled. But for now, this is the fix that I'm most comfortable with. llvm-svn: 203488
-
- Mar 09, 2014
-
-
NAKAMURA Takumi authored
It choked i686 stage2. llvm-svn: 203386
-
David Majnemer authored
The grammar for LLVM IR is not well specified in any document but seems to obey the following rules: - Attributes which have parenthesized arguments are never preceded by commas. This form of attribute is the only one which ever has optional arguments. However, not all of these attributes support optional arguments: 'thread_local' supports an optional argument but 'addrspace' does not. Interestingly, 'addrspace' is documented as being a "qualifier". What constitutes a qualifier? I cannot find a definition. - Some attributes use a space between the keyword and the value. Examples of this form are 'align' and 'section'. These are always preceded by a comma. - Otherwise, the attribute has no argument. These attributes do not have a preceding comma. Sometimes an attribute goes before the instruction, between the instruction and it's type, or after it's type. 'atomicrmw' has 'volatile' between the instruction and the type while 'call' has 'tail' preceding the instruction. With all this in mind, it seems most consistent for 'inalloca' on an 'inalloca' instruction to occur before between the instruction and the type. Unlike the current formulation, there would be no preceding comma. The combination 'alloca inalloca' doesn't look particularly appetizing, perhaps a better spelling of 'inalloca' is down the road. llvm-svn: 203376
-
- Mar 07, 2014
-
-
Tim Northover authored
This helps the instruction selector to lower an i64 * i64 -> i128 multiplication into a single instruction on targets which support it. Patch by Manuel Jacob. llvm-svn: 203230
-
Tim Northover authored
Sequences of insertelement/extractelements are sometimes used to build vectorsr; this code tries to put them back together into shuffles, but could only produce a completely uniform shuffle types (<N x T> from two <N x T> sources). This should allow shuffles with different numbers of elements on the input and output sides as well. llvm-svn: 203229
-
Karthik Bhat authored
llvm-svn: 203198
-
- Mar 06, 2014
-
-
Karthik Bhat authored
llvm-svn: 203076
-
Raul E. Silvera authored
are operations that do not access memory but may be sensitive to floating-point environment changes. LLVM does not attempt to model FP environment changes, so this was unnecessarily conservative and was getting on the way of some optimizations, in particular SLP vectorization. llvm-svn: 203037
-
- Mar 05, 2014
-
-
Arnold Schwaighofer authored
Fixes PR19045. llvm-svn: 203008
-
Benjamin Kramer authored
llvm-svn: 202997
-
Raul E. Silvera authored
llvm-svn: 202924
-
Matt Arsenault authored
llvm-svn: 202914
-
- Mar 03, 2014
-
-
Diego Novillo authored
DWARF discriminators are used to distinguish multiple control flow paths on the same source location. When this happens, instructions across basic block boundaries will share the same debug location. This pass detects this situation and creates a new lexical scope to one of the two instructions. This lexical scope is a child scope of the original and contains a new discriminator value. This discriminator is then picked up from MCObjectStreamer::EmitDwarfLocDirective to be written on the object file. This fixes http://llvm.org/bugs/show_bug.cgi?id=18270. llvm-svn: 202752
-
- Feb 27, 2014
-
-
Eric Christopher authored
and update everything accordingly. This can be used to conditionalize the amount of output in the backend based on the amount of debug requested/metadata emission scheme by a front end (e.g. clang). Paired with a commit to clang. llvm-svn: 202332
-
- Feb 26, 2014
-
-
Nico Rieck authored
llvm-svn: 202308
-
Reid Kleckner authored
We should apply fastcc whenever profitable. We can expand this list, but there are lots of conventions with performance implications that we don't want to change. Differential Revision: http://llvm-reviews.chandlerc.com/D2705 llvm-svn: 202293
-
Nico Rieck authored
llvm-svn: 202291
-
Andrew Trick authored
Patch by Michael Zolotukhin! llvm-svn: 202273
-
Chandler Carruth authored
address spaces. This isn't really a correctness issue (the values are truncated) but its much cleaner. Patch by Matt Arsenault! llvm-svn: 202252
-
Chandler Carruth authored
the default. Based on the patch by Matt Arsenault, D1764! I switched one place to use the more direct pointer type to compute the desired address space, and I reworked the memcpy rewriting section to reflect significant refactorings that this patch helped inspire. Thanks to several of the folks who helped review and improve the patch as well. llvm-svn: 202247
-
Chandler Carruth authored
to work independently for the slice side and the other side. This allows us to only compute the minimum of the two when we actually rewrite to a memcpy that needs to take the minimum, and preserve higher alignment for one side or the other when rewriting to loads and stores. This fix was inspired by seeing the result of some refactoring that makes addrspace handling better. llvm-svn: 202242
-
Chandler Carruth authored
checking in SROA. The primary change is to just rely on uge for checking that the offset is within the allocation size. This removes the explicit checks against isNegative which were terribly error prone (including the reversed logic that led to PR18615) and prevented us from supporting stack allocations larger than half the address space.... Ok, so maybe the latter isn't *common* but it's a silly restriction to have. Also, we used to try to support a PHI node which loaded from before the start of the allocation if any of the loaded bytes were within the allocation. This doesn't make any sense, we have never really supported loading or storing *before* the allocation starts. The simplified logic just doesn't care. We continue to allow loading past the end of the allocation in part to support cases where there is a PHI and some loads are larger than others and the larger ones reach past the end of the allocation. We could solve this a different and more conservative way, but I'm still somewhat paranoid about this. llvm-svn: 202224
-
- Feb 25, 2014
-
-
Chandler Carruth authored
ordering. The fundamental problem that we're hitting here is that the use-def chain ordering is *itself* not a stable thing to be relying on in the rewriting for SROA. Further, we use a non-stable sort over the slices to arrange them based on the section of the alloca they're operating on. With a debugging STL implementation (or different implementations in stage2 and stage3) this can cause stage2 != stage3. The specific aspect of this problem fixed in this commit deals with the rewriting and load-speculation around PHIs and Selects. This, like many other aspects of the use-rewriting in SROA, is really part of the "strong SSA-formation" that is doen by SROA where it works very hard to canonicalize loads and stores in *just* the right way to satisfy the needs of mem2reg[1]. When we have a select (or a PHI) with 2 uses of the same alloca, we test that loads downstream of the select are speculatable around it twice. If only one of the operands to the select needs to be rewritten, then if we get lucky we rewrite that one first and the select is immediately speculatable. This can cause the order of operand visitation, and thus the order of slices to be rewritten, to change an alloca from promotable to non-promotable and vice versa. The fix is to defer all of the speculation until *after* the rewrite phase is done. Once we've rewritten everything, we can accurately test for whether speculation will work (once, instead of twice!) and the order ceases to matter. This also happens to simplify the other subtlety of speculation -- we need to *not* speculate anything unless the result of speculating will make the alloca fully promotable by mem2reg. I had a previous attempt at simplifying this, but it was still pretty horrible. There is actually already a *really* nice test case for this in basictest.ll, but on multiple STL implementations and inputs, we just got "lucky". Fortunately, the test case is very small and we can essentially build it in exactly the opposite way to get reasonable coverage in both directions even from normal STL implementations. llvm-svn: 202092
-
- Feb 24, 2014
-
-
Arnold Schwaighofer authored
Vectorize sequential stores of a broadcasted value. 5% on eon. radar://16124699 llvm-svn: 202067
-
- Feb 21, 2014
-
-
Nick Lewycky authored
Make sure that value handle users see the transformation of an indirect call to a direct call. This is important for the CallGraph iteration. Patch by Björn Steinbrink! llvm-svn: 201822
-
- Feb 19, 2014
-
-
Tim Northover authored
If LLVM is built without X86 as a supported target then the test would mysteriously fail. llvm-svn: 201668
-
Tim Northover authored
Just a wild stab in the dark really, but in the absence of any ability to reproduce the problem... llvm-svn: 201658
-
Tim Northover authored
On x86, shifting a vector by a scalar is significantly cheaper than shifting a vector by another fully general vector. Unfortunately, because SelectionDAG operates on just one basic block at a time, the shufflevector instruction that reveals whether the right-hand side of a shift *is* really a scalar is often not visible to CodeGen when it's needed. This adds another handler to CodeGenPrepare, to sink any useful shufflevector instructions down to the basic block where they're used, predicated on a target hook (since on other architectures, doing so will often just introduce extra real work). rdar://problem/16063505 llvm-svn: 201655
-
- Feb 17, 2014
-
-
Gerolf Hoflehner authored
fix for null VectorizedValue assertion in the SLP Vectorizer (in function vectorizeTree()). radar://16064178 llvm-svn: 201501
-
- Feb 16, 2014
-
-
Arnold Schwaighofer authored
During LSR of one loop we can run into a situation where we have to expand the start of a recurrence of a loop induction variable in this loop. This start value is a value derived of the induction variable of a preceeding loop. SCEV has cannonicalized this value to a different recurrence than the recurrence of the preceeding loop's induction variable (the type and/or step direction) has changed). When we come to instantiate this SCEV we created a second induction variable in this preceeding loop. This patch tries to base such derived induction variables of the preceeding loop's induction variable. This helps twolf on arm and seems to help scimark2 on x86. Reapply with a fix for the case of a value derived from a pointer. radar://15970709 llvm-svn: 201496
-
Nico Rieck authored
llvm-svn: 201479
-
- Feb 15, 2014
-
-
Arnold Schwaighofer authored
This reverts commit r201465. It broke an arm bot. llvm-svn: 201466
-
Arnold Schwaighofer authored
During LSR of one loop we can run into a situation where we have to expand the start of a recurrence of a loop induction variable in this loop. This start value is a value derived of the induction variable of a preceeding loop. SCEV has cannonicalized this value to a different recurrence than the recurrence of the preceeding loop's induction variable (the type and/or step direction) has changed). When we come to instantiate this SCEV we created a second induction variable in this preceeding loop. This patch tries to base such derived induction variables of the preceeding loop's induction variable. This helps twolf on arm and seems to help scimark2 on x86. radar://15970709 llvm-svn: 201465
-
- Feb 14, 2014
-
-
Matt Arsenault authored
Makes addrspacecast (gep) do addrspacecast (gep) instead. llvm-svn: 201376
-
- Feb 13, 2014
-
-
Daniel Sanders authored
Re-commit: Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call Summary: AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for targets with mature MC support. Such targets will always parse the inline assembly (even when emitting assembly). Targets without mature MC support continue to use EmitRawText() for assembly output. The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler to parse inline assembly (even when emitting assembly output). UseIntegratedAs is set to true for targets that consider any failure to parse valid assembly to be a bug. Target specific subclasses generally enable the integrated assembler in their constructor. The default value can be overridden with -no-integrated-as. All tests that rely on inline assembly supporting invalid assembly (for example, those that use mnemonics such as 'foo' or 'hello world') have been updated to disable the integrated assembler. Changes since review (and last commit attempt): - Fixed test failures that were missed due to configuration of local build. (fixes crash.ll and a couple others). - Fixed tests that happened to pass because the local build was on X86 (should fix 2007-12-17-InvokeAsm.ll) - mature-mc-support.ll's should no longer require all targets to be compiled. (should fix ARM and PPC buildbots) - Object output (-filetype=obj and similar) now forces the integrated assembler to be enabled regardless of default setting or -no-integrated-as. (should fix SystemZ buildbots) Reviewers: rafael Reviewed By: rafael CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D2686 llvm-svn: 201333
-
Reid Kleckner authored
As defined in LangRef, aliases do not have sections. However, LLVM's GlobalAlias class inherits from GlobalValue, which means we can read and set its section. We should probably ban that as a separate change, since it doesn't make much sense for an alias to have a section that differs from its aliasee. Fixes PR18757, where the section was being lost on the global in code from Clang like: extern "C" { __attribute__((used, section("CUSTOM"))) static int in_custom_section; } Reviewers: rafael.espindola Differential Revision: http://llvm-reviews.chandlerc.com/D2758 llvm-svn: 201286
-
Owen Anderson authored
logical operations on the i1's driving them. This is a bad idea for every target I can think of (confirmed with micro tests on all of: x86-64, ARM, AArch64, Mips, and PowerPC) because it forces the i1 to be materialized into a general purpose register, whereas consuming it directly into a select generally allows it to exist only transiently in a predicate or flags register. Chandler ran a set of performance tests with this change, and reported no measurable change on x86-64. llvm-svn: 201275
-
- Feb 12, 2014
-
-
Daniel Sanders authored
Revert r201237+r201238: Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call It introduced multiple test failures in the buildbots. llvm-svn: 201241
-
Daniel Sanders authored
Summary: AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for targets with mature MC support. Such targets will always parse the inline assembly (even when emitting assembly). Targets without mature MC support continue to use EmitRawText() for assembly output. The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler to parse inline assembly (even when emitting assembly output). UseIntegratedAs is set to true for targets that consider any failure to parse valid assembly to be a bug. Target specific subclasses generally enable the integrated assembler in their constructor. The default value can be overridden with -no-integrated-as. All tests that rely on inline assembly supporting invalid assembly (for example, those that use mnemonics such as 'foo' or 'hello world') have been updated to disable the integrated assembler. Reviewers: rafael Reviewed By: rafael CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D2686 llvm-svn: 201237
-