- Feb 10, 2015
-
Bradley Smith authored
llvm-svn: 228696
-
Simon Pilgrim authored
Added most of the missing vector folding patterns for AVX2 (as well as fixing the vpermpd and vpermq patterns). Differential Revision: http://reviews.llvm.org/D7492 llvm-svn: 228688
-
Simon Pilgrim authored
This patch adds the complete AMD Bulldozer XOP instruction set to the memory folding pattern tables for stack folding, etc. Note: many of the XOP instructions have multiple table entries, as they can fold loads from different sources. Differential Revision: http://reviews.llvm.org/D7484 llvm-svn: 228685
-
Andrea Di Biagio authored
This patch teaches X86FastISel how to select AVX instructions for scalar float/double convert operations. Before this patch, X86FastISel always selected legacy SSE instructions for FPExt (from float to double) and FPTrunc (from double to float). For example:

  define double @foo(float %f) {
    %conv = fpext float %f to double
    ret double %conv
  }

Before (with -mattr=+avx -fast-isel), X86FastISel selected a CVTSS2SDrr, which is legacy SSE:

  cvtss2sd %xmm0, %xmm0

With this patch, X86FastISel selects a VCVTSS2SDrr instead:

  vcvtss2sd %xmm0, %xmm0, %xmm0

Added test fast-isel-fptrunc-fpext.ll to check both the register-register and the register-memory float/double conversion variants. Differential Revision: http://reviews.llvm.org/D7438 llvm-svn: 228682
-
Nick Lewycky authored
llvm-svn: 228657
-
Chandler Carruth authored
Avoid forming illegal nodes when folding bitcasts of constants. We can't fold things and then check after the fact whether the fold was legal. Once we have formed the DAG node, arbitrary other nodes may have been collapsed onto it, and there is no easy way to go back. Instead, we need to test for the specific folding cases we're interested in and ensure those are legal first. In theory this could make the fold less powerful for bitcasting from an integer to some vector type, but AFAICT that can't actually happen in the SDAG, so it's fine. Now, we *only* whitelist specific int->fp and fp->int bitcasts for post-legalization folding. I've added the test case from the PR. (Also, as a note, this does not appear to be in 3.6; no backport needed.) llvm-svn: 228656
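To illustrate why the legality check has to come first, here is a minimal hash-consing sketch in C (purely illustrative; the table, names, and legality predicate are hypothetical, not LLVM's code). Once a node is interned into a uniquing table, other users may already refer to it, so a fold can't be created speculatively and then rejected:

    #include <stdio.h>
    #include <string.h>

    #define MAX_NODES 16
    static const char *table[MAX_NODES];
    static int count = 0;

    /* Interns a node description: returns an existing identical node if
       present, else records a new one. After this returns, other users may
       hold the index, so the node can no longer be taken back. */
    static int get_or_create(const char *desc) {
        for (int i = 0; i < count; ++i)
            if (strcmp(table[i], desc) == 0)
                return i; /* collapsed onto an existing node */
        table[count] = desc;
        return count++;
    }

    /* Stand-in legality predicate: whitelist only the bitcast folds. */
    static int is_legal_fold(const char *desc) {
        return strstr(desc, "bitcast") != NULL;
    }

    int main(void) {
        const char *fold = "bitcast i64 -> f64";
        /* Correct order: test legality first, then create the folded node. */
        if (is_legal_fold(fold))
            printf("folded as node %d\n", get_or_create(fold));
        else
            printf("fold rejected before any node was created\n");
        return 0;
    }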
-
David Majnemer authored
Win64 has specific constraints on what valid prologues and epilogues look like. These constraints are borne of the flexibility and descriptiveness of Win64's unwind opcodes. Prologues previously emitted by LLVM could not be represented by the unwind opcodes, preventing operations that rely on stack unwinding from working correctly. Differential Revision: http://reviews.llvm.org/D7520 llvm-svn: 228641
-
- Feb 09, 2015
-
Colin LeMahieu authored
llvm-svn: 228602
-
Sanjay Patel authored
llvm-svn: 228581
-
Kit Barton authored
Add the following new vector instructions: veqv (vector equivalence), vnand, and vorc. I increased the AddedComplexity for these instructions to 500 to ensure they are generated instead of other VSX instructions. Phabricator review: http://reviews.llvm.org/D7469 llvm-svn: 228580
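For reference, the bitwise semantics of these three operations can be modeled on scalars; a minimal C sketch (the helper names are mine, not LLVM's; semantics per the Power ISA):

    #include <stdint.h>

    /* veqv: equivalence (XNOR); vnand: NAND; vorc: OR with complement. */
    static inline uint64_t eqv(uint64_t a, uint64_t b)  { return ~(a ^ b); }
    static inline uint64_t nand(uint64_t a, uint64_t b) { return ~(a & b); }
    static inline uint64_t orc(uint64_t a, uint64_t b)  { return a | ~b; }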
-
Akira Hatanaka authored
Fix a bug where the instruction to reload an invoke's value was inserted into the wrong basic block. This would happen when the result of an invoke was used by a phi instruction in the invoke's normal destination block: the reload instruction would get inserted before the critical edge was split and the new basic block (which is the correct insertion point for the reload) was created. This commit fixes the bug by splitting the critical edge before any of the reload instructions are inserted. Also, hoist the code which computes the insertion point up to the only place that needs that computation. rdar://problem/15978721 llvm-svn: 228566
-
- Feb 08, 2015
-
Sanjay Patel authored
llvm-svn: 228546
-
Sanjay Patel authored
llvm-svn: 228545
-
Sanjay Patel authored
llvm-svn: 228544
-
Sanjay Patel authored
llvm-svn: 228541
-
Sanjay Patel authored
llvm-svn: 228536
-
Sanjay Patel authored
llvm-svn: 228535
-
Sanjay Patel authored
llvm-svn: 228534
-
Sanjay Patel authored
llvm-svn: 228532
-
Simon Pilgrim authored
llvm-svn: 228528
-
Tim Northover authored
While various DAG combines try to guarantee that a vector SETCC operation will have the same output size as input, there's nothing intrinsic to either creation or LegalizeTypes that actually guarantees it, so the function needs to be ready to handle a mismatch. Fortunately this is easy enough to handle: just extend or truncate the naturally compared result. I couldn't reproduce the failure in other backends that I know have SIMD, so it's probably only an issue for these two due to shared heritage. Should fix PR21645. llvm-svn: 228518
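The fix-up is cheap because a SETCC mask element is all-ones or all-zeros; roughly, in scalar C terms (widths chosen for illustration only):

    #include <stdint.h>

    /* Sign-extending widens a mask and truncation narrows it; both
       preserve the all-ones/all-zeros property of a SETCC result. */
    static int64_t widen_mask(int32_t m)  { return (int64_t)m; }
    static int16_t narrow_mask(int32_t m) { return (int16_t)m; }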
-
Simon Pilgrim authored
This adds tests for the remaining AVX2 instructions that currently support memory folding. llvm-svn: 228513
-
- Feb 07, 2015
-
Simon Pilgrim authored
llvm-svn: 228509
-
Simon Pilgrim authored
llvm-svn: 228506
-
Simon Pilgrim authored
General boolean instructions (AND, ANDN, OR, XOR) need to use a specific domain instruction (and not just the default). llvm-svn: 228495
-
Simon Pilgrim authored
llvm-svn: 228494
-
David Majnemer authored
COFF section flags are not idempotent:
  'rd' will make a read-write section because 'd' implies write
  'dr' will make a read-only section because 'r' disables write
llvm-svn: 228490
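A tiny left-to-right parser makes the non-idempotence concrete (a hypothetical sketch of the behavior described above, not LLVM's actual flag handling):

    #include <stdio.h>

    struct flags { int read, write; };

    /* Flags apply left to right: 'd' (data) implies write,
       'r' (read) disables write, so "rd" != "dr". */
    static struct flags parse(const char *s) {
        struct flags f = {0, 0};
        for (; *s; ++s) {
            if (*s == 'd') { f.read = 1; f.write = 1; }
            else if (*s == 'r') { f.read = 1; f.write = 0; }
        }
        return f;
    }

    int main(void) {
        printf("rd: write=%d\n", parse("rd").write); /* 1: read-write */
        printf("dr: write=%d\n", parse("dr").write); /* 0: read-only  */
        return 0;
    }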
-
Hal Finkel authored
If a loop predecessor has an invoke as its terminator, and the return value from that invoke is used to determine the loop iteration space, then we can't insert a computation based on that value in the loop predecessor prior to the terminator (oops). If there's such an invoke, or just no predecessor for that matter, insert a new loop preheader. llvm-svn: 228488
-
Hal Finkel authored
llvm-svn: 228467
-
Simon Pilgrim authored
llvm-svn: 228462
-
Hal Finkel authored
Unfortunately, even with the workaround of disabling the linker TLS optimizations in Clang restored (which has already been done), this still breaks self-hosting on my P7 machine (-O3 -DNDEBUG -mcpu=native). Bill is currently working on an alternate implementation to address the TLS issue in a way that also fully elides the linker bug (which, unfortunately, this approach did not fully do), so I'm reverting this now. llvm-svn: 228460
-
- Feb 06, 2015
-
Reid Kleckner authored
Fixes PR22488 llvm-svn: 228411
-
Michel Danzer authored
Reviewed-by: Tom Stellard <tom@stellard.net> llvm-svn: 228374
-
Michel Danzer authored
Doesn't seem necessary anymore. I think this was mostly compensating for not enabling WQM for texture sampling instructions. v2: Add test coverage. Reviewed-by: Tom Stellard <tom@stellard.net> llvm-svn: 228373
-
Michel Danzer authored
If whole quad mode isn't enabled for these, the level of detail is calculated incorrectly for pixels along diagonal triangle edges, causing artifacts. v2: Use a TSFlag instead of lots of switch cases. v3: Add test coverage. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88642 Reviewed-by: Tom Stellard <tom@stellard.net> llvm-svn: 228372
-
Matthias Braun authored
Avoid the creation of select instructions, which can result in different scheduling of the selects. I also added a bunch of additional volatile stores. Those avoid a CodeGen problem (bug?) where normalizing and denormalizing the control flow moves all shift instructions into the first block, where ISel can't match them together with the cmps. llvm-svn: 228362
-
Matthias Braun authored
Use FileCheck, make it more consistent and do not rely on unoptimized or(cmp,cmp) getting combined for max to be matched. llvm-svn: 228361
-
- Feb 05, 2015
-
Colin LeMahieu authored
llvm-svn: 228347
-
Hal Finkel authored
PowerPC supports pre-increment load/store instructions (except for Altivec/VSX vector loads/stores). Using these on embedded cores can be very important, but most loops are not naturally set up to use them. We can often change that, however, by placing loops into a non-canonical form. Generically, this means transforming loops like this:

  for (int i = 0; i < n; ++i)
    array[i] = c;

to look like this:

  T *p = &array[-1];
  for (int i = 0; i < n; ++i)
    *++p = c;

The key point is that the addresses accessed are pulled into dedicated PHIs and "pre-decremented" in the loop preheader. This allows the use of pre-increment load/store instructions without loop peeling. A target-specific, late IR-level pass (running post-LSR), PPCLoopPreIncPrep, is introduced to perform this transformation. I've used this code out-of-tree to generate code for the PPC A2 for over a year. Somewhat to my surprise, running the test suite + externals on a P7 with this transformation enabled showed no performance regressions and one speedup:

  External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk -2.32514% +/- 1.03736%

So I'm going to enable it on everything for now. I was surprised by this because, on the POWER cores, these pre-increment load/store instructions are cracked (and, thus, harder to schedule effectively). But seeing no regressions, and feeling that it is generally easier to split instructions apart late than it is to combine them late, this might be the better approach regardless. llvm-svn: 228328
-
Hal Finkel authored
PowerPC supports pre-increment floating-point load/store instructions, both r+r and r+i, and we had patterns for them, but they were not marked as legal. Mark them as legal (and add a test case). llvm-svn: 228327
-