- Aug 04, 2014
-
Chandler Carruth authored
patterns of v16i8 shuffles. This implements one of the more important FIXMEs for the SSE2 support in the new shuffle lowering. We now generate the optimal shuffle sequence for truncate-derived shuffles which show up essentially everywhere. Unfortunately, this exposes a weakness in other parts of the shuffle logic -- we can no longer form PSHUFB here. I'll add the necessary support for that and other things in a subsequent commit. llvm-svn: 214702
-
Kevin Qin authored
This commit broke "make check" for several hours, so get it reverted. llvm-svn: 214697
-
Chandler Carruth authored
I spent some time looking into a better or more principled way to handle this, for example by detecting arbitrary "unneeded" ORs... But really, there wasn't any point. Rather than adding more stages and logic later on to fix it, we just shouldn't build blatantly wrong code this late in the pipeline in the first place. Avoiding it is simple enough. llvm-svn: 214680
-
Chandler Carruth authored
Fundamentally, there isn't a really portable way to test the constant pool contents. Instead, pin this test to the bare-metal triple. This also makes it a 64-bit triple which allows us to only match a single constant pool rather than two. It can also just hard code the '.' prefix as the format should be stable now that it has a fixed triple. Finally, I've switched it to use CHECK-NEXT to be more precise in the instruction sequence expected and to use variables rather than hard coding decisions by the register allocator. llvm-svn: 214679
-
Sanjay Patel authored
llvm-svn: 214674
-
Sanjay Patel authored
This is intended to be the minimal change needed to fix PR20354 ( http://llvm.org/bugs/show_bug.cgi?id=20354 ). The check for a vector operation was wrong; we need to check that the fabs itself is not a vector operation. This patch will not generate the optimal code. A constant pool load and 'and' op will be generated instead of just returning a value that we can calculate in advance (as we do for the scalar case). I've put a 'TODO' comment for that here and expect to have that patch ready soon. There is a very similar optimization that we can do in visitFNEG, so I've put another 'TODO' there and expect to have another patch for that too. llvm-svn: 214670
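For context, the scalar path already folds this by clearing the IEEE-754 sign bit, and the TODO above amounts to doing the same per element for a vector constant. A minimal standalone C++ sketch of that fold, not the DAGCombiner code itself:

    #include <cstdint>
    #include <cstring>
    #include <vector>

    // fabs only clears the sign bit, so a constant operand can be computed
    // at compile time instead of emitting a constant-pool load plus an 'and'.
    float foldFAbs(float V) {
      uint32_t Bits;
      std::memcpy(&Bits, &V, sizeof(Bits));
      Bits &= 0x7FFFFFFFu;                  // drop the sign bit
      std::memcpy(&V, &Bits, sizeof(V));
      return V;
    }

    // For a vector constant the same fold applies lane by lane.
    std::vector<float> foldFAbs(const std::vector<float> &Lanes) {
      std::vector<float> Out;
      for (float L : Lanes)
        Out.push_back(foldFAbs(L));
      return Out;
    }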
-
Gerolf Hoflehner authored
sequence - AArch64 target support. This patch turns off madd/msub generation in the DAGCombiner and generates them in the MachineCombiner instead. It replaces the original code sequence with the combined sequence when it is beneficial to do so. When there is no machine model support, it always generates the madd/msub instruction. This is true also when the objective is to optimize for code size: when the combined sequence is shorter, it is always chosen and not evaluated further. When there is a machine model, the combined instruction sequence is evaluated for critical path and resource length using machine trace metrics, and the original code sequence is replaced when the combined one is determined to be faster. rdar://16319955 llvm-svn: 214669
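A rough standalone sketch of the selection policy described above; the struct, field names, and flags are illustrative, and the real pass obtains its numbers from machine trace metrics rather than from caller-supplied values:

    // Hypothetical per-sequence metrics; in the real pass these come from
    // machine trace metrics, not from constants filled in by the caller.
    struct SeqMetrics {
      unsigned CriticalPath;   // longest dependence chain, in cycles
      unsigned ResourceLength; // issue-resource usage along the trace
    };

    // Prefer the combined madd/msub sequence when there is no machine model,
    // when optimizing for size and the combined sequence is shorter, or when
    // the model says it is no worse on either metric (a stand-in for the
    // "determined to be faster" check above).
    bool shouldUseCombined(const SeqMetrics &Original, const SeqMetrics &Combined,
                           bool HasMachineModel, bool OptForSize,
                           unsigned OrigLen, unsigned CombinedLen) {
      if (!HasMachineModel)
        return true;
      if (OptForSize && CombinedLen < OrigLen)
        return true;
      return Combined.CriticalPath <= Original.CriticalPath &&
             Combined.ResourceLength <= Original.ResourceLength;
    }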
-
- Aug 03, 2014
-
Matt Arsenault authored
This slipped in in r214467, so something like V_MOV_B32_e32 v0, ... is now printed with 2 spaces between the instruction name and first operand. llvm-svn: 214660
-
- Aug 02, 2014
-
James Molloy authored
llvm-svn: 214637
-
James Molloy authored
[AArch64] Teach DAGCombiner that converting two consecutive loads into a vector load is not a good transform when paired loads are available. The combiner was creating Q-register loads and stores, which then had to be spilled because there are no callee-save Q registers! llvm-svn: 214634
-
Chandler Carruth authored
Darwin x86 asm comment prefix designed to work around GAS on that platform. That makes the comment-matching of the test much more stable. llvm-svn: 214629
-
Chandler Carruth authored
lowering with a small addition to it and adding PSHUFB combining. There is one obvious place in the new vector shuffle lowering where we should form PSHUFBs directly: when without them we will unpack a vector of i8s across two different registers and do a potentially 4-way blend as i16s only to re-pack them into i8s afterward. This is the crazy expensive fallback path for i8 shuffles and we can just directly use pshufb here as it will always be cheaper (the unpack and pack are two instructions so even a single shuffle between them hits our three instruction limit for forming PSHUFB). However, this doesn't generate very good code in many cases, and it leaves a bunch of common patterns not using PSHUFB. So this patch also adds support for extracting a shuffle mask from PSHUFB in the X86 lowering code, and uses it to handle PSHUFBs in the recursive shuffle combining. This allows us to combine through them, combine multiple ones together, and generally produce sufficiently high quality code. Extracting the PSHUFB mask is annoyingly complex because it could be either pre-legalization or post-legalization. At least this doesn't have to deal with re-materialized constants. =] I've added decode routines to handle the different patterns that show up at this level and we dispatch through them as appropriate. The two primary test cases are updated. For the v16 test case there is still a lot of room for improvement. Since I was going through it systematically I left behind a bunch of FIXME lines that I'm hoping to turn into ALL lines by the end of this. llvm-svn: 214628
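For reference, the core of the mask extraction reduces to PSHUFB's per-byte semantics: if bit 7 of a control byte is set the destination byte is zeroed, otherwise the low four bits select a source byte. A simplified standalone sketch that ignores the pre-/post-legalization constant plumbing mentioned above (the -1 sentinel for zeroed lanes is a choice of this sketch, not LLVM's convention):

    #include <cstdint>
    #include <vector>

    // Decode a 16-byte PSHUFB control vector into shuffle indices.
    std::vector<int> decodePSHUFBMask(const uint8_t (&Control)[16]) {
      std::vector<int> Mask;
      for (int i = 0; i < 16; ++i) {
        if (Control[i] & 0x80)
          Mask.push_back(-1);                // bit 7 set: zero this lane
        else
          Mask.push_back(Control[i] & 0x0F); // low 4 bits pick the source byte
      }
      return Mask;
    }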
-
Chandler Carruth authored
of normally binary shuffle instructions like PUNPCKL and MOVLHPS. This detects cases where a single register is used for both operands making the shuffle behave in a unary way. We detect this and adjust the mask to use the unary form which allows the existing DAG combine for shuffle instructions to actually work at all. As a consequence, this uncovered a number of obvious bugs in the existing DAG combine which are fixed. It also now canonicalizes several shuffles even with the existing lowering. These typically are trying to match the shuffle to the domain of the input where before we only really modeled them with the floating point variants. All of the cases which change to an integer shuffle here have something in the integer domain, so there are no more or fewer domain crosses here AFAICT. Technically, it might be better to go from a GPR directly to the floating point domain, but detecting floating point *outputs* despite integer inputs is a lot more code and seems unlikely to be worthwhile in practice. If folks are seeing domain-crossing regressions here though, let me know and I can hack something up to fix it. Also as a consequence, a bunch of missed opportunities to form pshufb now can be formed. Notably, splats of i8s now form pshufb. Interestingly, this improves the existing splat lowering too. We go from 3 instructions to 1. Yes, we may tie up a register, but it seems very likely to be worth it, especially if splatting the 0th byte (the common case) as then we can use a zeroed register as the mask. llvm-svn: 214625
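The unary-form detection can be pictured as a small mask rewrite; the sketch below is illustrative only (the names are made up and the real code inspects the shuffle's operands rather than taking a flag):

    #include <vector>

    // When a nominally two-input shuffle uses the same register for both
    // operands, remap mask entries that refer to the second operand back onto
    // the first so the shuffle can be treated, and combined, as unary.
    std::vector<int> canonicalizeUnaryMask(std::vector<int> Mask, unsigned NumElts,
                                           bool OperandsAreSameRegister) {
      if (!OperandsAreSameRegister)
        return Mask;
      for (int &M : Mask)
        if (M >= 0 && static_cast<unsigned>(M) >= NumElts)
          M -= NumElts; // lane i of operand 1 is lane i of operand 0 here
      return Mask;
    }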
-
Akira Hatanaka authored
expanding pseudo LOAD_STACK_GUARD using instructions that are normally used in PIC mode. This patch fixes the bug. <rdar://problem/17886592> llvm-svn: 214614
-
Matt Arsenault authored
llvm-svn: 214612
-
Chandler Carruth authored
makes a mess of the lit output when they ultimately fail. The 2012-10-02-DAGCycle test is really frustrating because the *only* explanation for what it is testing is a rdar link. I would really rather that rdar links (which are not public or part of the open source project) were not committed to the source code. Regardless, the actual problem *must* be described as the rdar link is completely opaque. The fact that this test didn't check for any particular output further exacerbates the inability of any other developer to debug failures. The mem-promote-integers test has nice comments and *seems* to be a great test for our lowering... except that we don't actually check that any of the generated code is correct or matches some pattern. We just avoid crashing. It would be great to go back and populate this test with the actual expectations. llvm-svn: 214605
-
Akira Hatanaka authored
Stop using ST registers for function returns and inline-asm instructions and use FP registers instead. This allows removing a large amount of code in the stackifier pass that was needed to track register liveness and handle copies between ST and FP registers and function calls returning floating point values. It also fixes a bug which manifests when an ST register defined by an inline-asm instruction was live across another inline-asm instruction, as shown in the following sequence of machine instructions:
    1. INLINEASM <es:frndint> $0:[regdef], %ST0<imp-def,tied5>
    2. INLINEASM <es:fldcw $0>
    3. %FP0<def> = COPY %ST0
<rdar://problem/16952634> llvm-svn: 214580
-
NAKAMURA Takumi authored
llvm/test/CodeGen/Mips/cconv/arguments-varargs.ll: Add explicit -mtriple=(mips|mipsel)-linux on 4 lines. llvm-svn: 214578
-
- Aug 01, 2014
-
Tom Stellard authored
This reverts commit r214566. I did not mean to commit this yet. llvm-svn: 214572
-
Reid Kleckner authored
If the symbol comes from an external DSO, it apparently requires indirection through a register. llvm-svn: 214571
-
Tom Stellard authored
SI doesn't use REGISTER_LOAD anymore, but it was still hitting this code path for 8-bit and 16-bit private loads. llvm-svn: 214566
-
Reid Kleckner authored
This is consistent with how we parse them in a standalone .s file, and inline assembly shouldn't differ. This fixes errors about requiring more registers than available in cases like this:
    void f();
    void __declspec(naked) g() {
      __asm pusha
      __asm call f
      __asm popa
      __asm ret
    }
There are no registers available to pass the address of 'f' into the asm blob. The asm should now directly call 'f'. Tests will land in Clang shortly. llvm-svn: 214550
-
Juergen Ributzka authored
Fold simple offsets into the memory operation:
    add x0, x0, #8
    ldr x0, [x0]
    -->
    ldr x0, [x0, #8]
Fixes <rdar://problem/17887945>. llvm-svn: 214545
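As a rough model of the fold (not the actual AArch64 FastISel code): the add's immediate is absorbed into the load's offset only when the combined offset still fits the addressing mode. The particular bound below, a non-negative offset that is a multiple of the access size and fits a 12-bit scaled field, is an assumption of this sketch:

    #include <cstdint>

    struct Address {
      unsigned BaseReg;
      int64_t Offset;
    };

    // Try to fold "add base, #AddImm" into a load/store using that base.
    bool tryFoldAddOffset(Address &Addr, int64_t AddImm, unsigned AccessBytes) {
      int64_t NewOffset = Addr.Offset + AddImm;
      bool Scaled = (NewOffset % AccessBytes) == 0;
      bool InRange = NewOffset >= 0 && (NewOffset / AccessBytes) < (1 << 12);
      if (!Scaled || !InRange)
        return false;
      Addr.Offset = NewOffset; // e.g. ldr x0, [x0] becomes ldr x0, [x0, #8]
      return true;
    }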
-
Juergen Ributzka authored
Add branch weights to branch instructions, so that the following passes can optimize based on it (i.e. basic block ordering). Fixes <rdar://problem/17887137>. llvm-svn: 214537
-
Philip Reames authored
This change adds code to explicitly mark a function which requires runtime stack realignment as not having a fixed frame size in the StackMap section. As it happens, this is not actually a functional change. The size that would be reported without the check is also "-1", but as far as I can tell, that's an accident. The code change makes this explicit. Note: There's a separate bug in handling of stackmaps and patchpoints in functions which need dynamic frame realignment. The current code assumes that offsets can be calculated from RBP, but realigned frames must use RSP. (There's a variable gap between RBP and the spill slots.) This change set does not address that issue. Reviewers: atrick, ributzka Differential Revision: http://reviews.llvm.org/D4572 llvm-svn: 214534
-
Juergen Ributzka authored
This is a followup patch for r214366, which added the same behavior to the AArch64 and X86 FastISel code. This fix reproduces the already existing behavior of SelectionDAG in FastISel. llvm-svn: 214531
-
Matt Arsenault authored
Remove -CHECKs, use multiple prefixes, name values, also test the @llvm.fabs version llvm-svn: 214525
-
Chad Rosier authored
llvm-svn: 214521
-
Chad Rosier authored
The tbz/tbnz checks the sign bit to convert
    op   w1, w1, w10
    cmp  w1, #0
    b.lt .LBB0_0
to
    op   w1, w1, w10
    tbnz w1, #31, .LBB0_0
Differential Revision: http://reviews.llvm.org/D4440 llvm-svn: 214518
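The rewrite rests on a two's-complement fact: for a signed 32-bit value, "less than zero" is exactly "bit 31 is set", which is what tbnz w1, #31 tests. A trivial standalone illustration of that equivalence:

    #include <cstdint>

    // Both functions agree for every int32_t input.
    bool isNegativeViaCompare(int32_t X) { return X < 0; }
    bool isNegativeViaSignBit(int32_t X) {
      return (static_cast<uint32_t>(X) >> 31) & 1; // the bit tbnz #31 examines
    }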
-
Daniel Sanders authored
Summary: Big-endian mode was not correctly adjusting the offset for types smaller than an ABI slot. Fixes PR19612 Reviewers: dsanders Reviewed By: dsanders Subscribers: sstankovic, llvm-commits Differential Revision: http://reviews.llvm.org/D4556 llvm-svn: 214493
-
Juergen Ributzka authored
ADDS and SUBS cannot encode negative immediates or immediates larger than 12 bits. This fix checks whether the immediate version can be used under these constraints and whether we can convert ADDS to SUBS or vice versa to support negative immediates. Also update the test cases to test the immediate versions. llvm-svn: 214470
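A standalone sketch of the encoding check; the unsigned 12-bit value, optionally shifted left by 12, is the standard AArch64 add/sub immediate form, but the helper names and flow here are illustrative rather than the FastISel code itself:

    #include <cstdint>

    // True if Imm fits the add/sub immediate encoding (12 bits, or 12 bits
    // shifted left by 12).
    bool isLegalAddSubImm(uint64_t Imm) {
      return (Imm & ~0xFFFULL) == 0 || (Imm & ~0xFFF000ULL) == 0;
    }

    // A negative immediate can still be selected by flipping ADDS <-> SUBS
    // and encoding the magnitude instead.
    bool canSelectAddSubImm(int64_t Imm, bool &FlipOpcode) {
      FlipOpcode = Imm < 0;
      uint64_t Magnitude = FlipOpcode ? ~static_cast<uint64_t>(Imm) + 1
                                      : static_cast<uint64_t>(Imm);
      return isLegalAddSubImm(Magnitude);
    }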
-
Hal Finkel authored
When generating unaligned vector loads, we need to search for other loads or stores nearby offset by one vector width. If we find one, then we know that we can safely generate another aligned load at that address. Otherwise, we must generate the next load using an offset of the vector width minus one byte (so we don't read off the end of the allocation if the base unaligned address happened to be aligned at runtime). We had previously done this using only other vector loads and stores, but did not consider the PowerPC-specific vector load/store intrinsics. Now we'll also consider vector intrinsics. By itself, this change is a feature enhancement, but is a necessary step toward fixing the underlying problem behind PR19991. llvm-svn: 214469
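To make the offset rule concrete, here is a tiny standalone sketch of the address choice described above; the 16-byte vector width and the names are assumptions for illustration, not the PowerPC lowering code:

    #include <cstdint>

    // If another access exactly one vector width away is known to exist, the
    // second load can safely be taken at base + 16; otherwise use base + 15 so
    // the load cannot run past the end of the allocation when the base turns
    // out to be aligned at runtime.
    uint64_t secondUnalignedLoadAddr(uint64_t Base, bool HaveAccessOneVecAway) {
      const uint64_t VecWidth = 16; // assumes 128-bit Altivec vectors
      return HaveAccessOneVecAway ? Base + VecWidth : Base + (VecWidth - 1);
    }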
-
Tom Stellard authored
Abs/neg folding has moved out of foldOperands and into the instruction selection phase using complex patterns. As a consequence of this change, we now prefer to select the 64-bit encoding for most instructions and the modifier operands have been dropped from integer VOP3 instructions. llvm-svn: 214467
-
Tom Stellard authored
This will prevent us from using extra MOV instructions once we prefer selecting 64-bit instructions. llvm-svn: 214464
-
Tom Stellard authored
We were commuting the instruction but still shrinking it using the original opcode. NOTE: This is a candidate for the 3.5 branch. llvm-svn: 214463
-
Jan Vesely authored
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
llvm-svn: 214451
-
- Jul 31, 2014
-
Will Schmidt authored
This is a follow-up to the activity in the bug at http://llvm.org/bugs/show_bug.cgi?id=18663 . The underlying issue has to do with how the KILL pseudo-instruction is handled. I defer to Hal/Jakob/Uli for additional details and background. This will disable the (bad?) assert, add an associated fixme comment, and add a pair of tests. The code change and the pr18663-2.ll test are copied from the referenced bug. That test does not immediately fail in my environment, but I have added the pr18663.ll test which does. (Comment from Hal) to provide everyone else with some context, this assert was not bad when it was written. At that time, we only generated KILL pseudo instructions around subregister copies. This logic, unfortunately, had its own problems. In r199797, the relevant logic in MachineCopyPropagation was replaced to generate KILLs for other kinds of copies too. This change in semantics broke this now-problematic assumption in AggressiveAntiDepBreaker. The AggressiveAntiDepBreaker really needs a proper cleanup to deal with the change, but removing the assert (which just allows the function to return false) is a safe conservative behavior, and should do for the time being. llvm-svn: 214429
-
Hal Finkel authored
It seems that when I fixed this, almost exactly a year ago, I did not quite do it correctly. When we have duplicate block predecessors, we can indeed not have different incoming values for the same block, but we *must* have duplicate entries. So, instead of skipping the duplicates, we explicitly add the duplicate incoming values. Fixes PR20442. llvm-svn: 214423
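A minimal standalone model of the rule being fixed (names and types are illustrative, not the actual PHI-construction API): the incoming list must mirror the predecessor edge list one-to-one, duplicates included:

    #include <map>
    #include <vector>

    struct IncomingEntry { int PredBlockID; int ValueID; };

    // One incoming entry per predecessor edge: a block that appears twice in
    // the predecessor list contributes two identical entries rather than
    // being skipped as a duplicate.
    std::vector<IncomingEntry>
    buildPHIEntries(const std::vector<int> &PredBlocks,
                    const std::map<int, int> &ValueForBlock) {
      std::vector<IncomingEntry> Entries;
      for (int Pred : PredBlocks) // deliberately not uniqued
        Entries.push_back({Pred, ValueForBlock.at(Pred)});
      return Entries;
    }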
-
Juergen Ributzka authored
Fixes <rdar://problem/17867078>. llvm-svn: 214389
-
Juergen Ributzka authored
Fixes <rdar://problem/17867067>. llvm-svn: 214388
-