- Jul 29, 2011
-
-
Bruno Cardoso Lopes authored
generation to always catch the weird cases. llvm-svn: 136453
-
Bruno Cardoso Lopes authored
llvm-svn: 136451
-
- Jul 28, 2011
-
-
Bruno Cardoso Lopes authored
using vextractf128. This will reduce the number of issued instructions for several AVX codes. llvm-svn: 136323
-
Bruno Cardoso Lopes authored
Take advantage of the fact that the 128-bit vpxor zeros the higher part and use it. This also fixes PR10491. llvm-svn: 136321
-
Bruno Cardoso Lopes authored
a convert pattern close to the instruction definition. llvm-svn: 136320
-
- Jul 27, 2011
-
-
Bruno Cardoso Lopes authored
usage of the shuffle bitmask. Both work in 128-bit lanes without crossing, but in the former the mask of the high part is the same as the one used by the low part, while in the latter both lanes have independent masks. Handle this properly and add support for vpermilpd. llvm-svn: 136200
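The "independent masks" case can be illustrated with a small model of the immediate form of vpermilpd on a 256-bit vector. This is a minimal sketch, assuming the usual ISA description in which bit i of the immediate selects the low or high 64-bit element of the lane that destination element i belongs to. The function name `vpermilpd_imm` is hypothetical and is not taken from the commit.

```python
def vpermilpd_imm(src, imm8):
    """Model the immediate form of vpermilpd on four 64-bit elements.

    Assumption: bit i of imm8 picks element 0 or 1 of the 128-bit lane
    that holds destination element i. Elements never cross lanes.
    """
    assert len(src) == 4
    out = []
    for i in range(4):
        lane_base = (i // 2) * 2      # 0 for the low lane, 2 for the high lane
        sel = (imm8 >> i) & 1         # one selector bit per destination element
        out.append(src[lane_base + sel])
    return out


# Low lane keeps its order (bits 0b10), high lane is swapped (bits 0b01).
print(vpermilpd_imm([0, 1, 2, 3], 0b0110))
```

Because bits 0-1 and bits 2-3 are decoded separately, the two lanes can use different permutations from a single immediate.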
-
Devang Patel authored
It is quite possible that an inlined function body is split into multiple chunks of consecutive instructions, but there is no way to describe this in the .debug_inline accelerator table used by gdb. However, describe non-contiguous ranges of an inlined function body appropriately using AT_range of the DW_TAG_inlined_subroutine debug info entry. llvm-svn: 136196
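The idea of describing a split function body as a list of ranges can be sketched as follows. This is an illustrative model only, not the code from the commit: instructions are represented by integer positions, and the hypothetical helper `contiguous_ranges` groups them into half-open `[low, high)` ranges, one per chunk.

```python
def contiguous_ranges(positions):
    """Group instruction positions into half-open [low, high) ranges.

    Hypothetical helper: each run of consecutive positions becomes one
    range, so a body split into several chunks yields several ranges.
    """
    ranges = []
    for pos in sorted(set(positions)):
        if ranges and pos == ranges[-1][1]:
            ranges[-1][1] = pos + 1          # extend the current chunk
        else:
            ranges.append([pos, pos + 1])    # start a new chunk
    return [tuple(r) for r in ranges]


# An inlined body occupying positions 10-12 and 20-21 gives two ranges.
print(contiguous_ranges([10, 11, 12, 20, 21]))
```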
-
Jakob Stoklund Olesen authored
These copies would coalesce easily, but the resulting value would be defined by a deleted instruction. Now we also remove the undefined value number from the destination register. This fixes PR10503. llvm-svn: 136174
-
Benjamin Kramer authored
llvm-svn: 136170
-
Benjamin Kramer authored
On x86 we can't encode an immediate LHS of a sub directly. If the RHS comes from a XOR with a constant we can fold the negation into the xor and add one to the immediate of the sub. Then we can turn the sub into an add, which can be commuted and encoded efficiently. This code is generated for __builtin_clz and friends. llvm-svn: 136167
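The rewrite described above rests on two's-complement identities: `-y == ~y + 1` and `~(x ^ K) == x ^ ~K`, so `C - (x ^ K) == (x ^ ~K) + (C + 1)`. The following is a minimal sketch that checks this on 32-bit values. The function names and the 32-bit width are assumptions chosen for illustration, not part of the commit.

```python
MASK = 0xFFFFFFFF  # model 32-bit wraparound arithmetic


def sub_form(c, k, x):
    """Original shape: immediate on the left of a sub, RHS is an xor."""
    return (c - (x ^ k)) & MASK


def add_form(c, k, x):
    """Rewritten shape: negation folded into the xor constant, immediate + 1."""
    return ((x ^ (~k & MASK)) + c + 1) & MASK


# Example in the style of 32 - clz(v), where clz is computed as bsr ^ 31.
print(sub_form(32, 31, 5), add_form(32, 31, 5))
```

Both forms wrap identically modulo 2^32, which is why the sub can be replaced by an add that commutes and takes an immediate operand.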
-
Bruno Cardoso Lopes authored
different from the previous 128-bit because they work in lanes. Update a few comments and add testcases. llvm-svn: 136157
-
- Jul 26, 2011
-
-
Eli Friedman authored
Prevent x86-specific DAGCombine from creating nodes with illegal type (which could not be selected). Fixes a minor isel issue that was breaking the testcase from r136130. llvm-svn: 136148
-
Eli Friedman authored
llvm-svn: 136131
-
Eli Friedman authored
llvm-svn: 136130
-
Bruno Cardoso Lopes authored
llvm-svn: 136051
-
Bruno Cardoso Lopes authored
This also fixes PR10452 llvm-svn: 136004
-
Bruno Cardoso Lopes authored
shuffle before inserting on a 256-bit vector. - Add AVX versions of movd/movq instructions - Introduce a few COPY patterns to match insert_subvector instructions. This turns a trivial insert_subvector instruction into a register copy, coalescing the xmm into a ymm and avoiding emitting one more instruction. llvm-svn: 136002
-
Eli Friedman authored
llvm-svn: 135995
-
Eli Friedman authored
llvm-svn: 135993
-
- Jul 25, 2011
-
-
Eli Friedman authored
Addresses PR10466, although the crash from that PR only triggers in cases where DAGCombine misses optimizing a shuffle. llvm-svn: 135980
-
- Jul 24, 2011
-
-
Jakob Stoklund Olesen authored
This fixes PR10463. A two-address instruction with an <undef> use operand was incorrectly rewritten so the def and use no longer used the same register, violating the tie constraint. Fix this by always rewriting <undef> operands with the register a def operand would use. llvm-svn: 135885
-
- Jul 22, 2011
-
-
Bruno Cardoso Lopes authored
llvm-svn: 135802
-
Bruno Cardoso Lopes authored
load folding logic llvm-svn: 135801
-
Rafael Espindola authored
too. Patch by Jeff Muizelaar. llvm-svn: 135789
-
Bruno Cardoso Lopes authored
and was actually very wrong, fix it and make it simpler. Also remove the ConcatVectors function, which is unused now. - Fix an introduction of useless nodes in r126664 and r126264. The VUNPCKL* should never be introduced because we don't want duplicate nodes for 128-bit AVX and non-AVX modes; the actual instruction difference only exists during isel, but not for target-specific DAG nodes. We only introduce V* target nodes when there is no 128-bit version already there. - Fix a fragile test and make it more useful. llvm-svn: 135729
-
Bruno Cardoso Lopes authored
llvm-svn: 135728
-
Bruno Cardoso Lopes authored
vxorps + vinsertf128 pair of instructions llvm-svn: 135727
-
- Jul 21, 2011
-
-
Bruno Cardoso Lopes authored
- Add more bitcasts for v16i16 - Since 135661 and 135662 already added the splat logic, just add one more splat test for v16i16 llvm-svn: 135663
-
Bruno Cardoso Lopes authored
instruction introduced in AVX, which can operate on 128 and 256-bit vectors. It considers a 256-bit vector as two independent 128-bit lanes. It can permute any 32 or 64-bit elements inside a lane, and restricts the second lane to have the same permutation as the first one. With the improved splat support introduced earlier today, adding codegen for this instruction enables more efficient 256-bit code.

Instead of:

  vextractf128 $0, %ymm0, %xmm0
  punpcklbw %xmm0, %xmm0
  punpckhbw %xmm0, %xmm0
  vinsertf128 $0, %xmm0, %ymm0, %ymm1
  vinsertf128 $1, %xmm0, %ymm1, %ymm0
  vextractf128 $1, %ymm0, %xmm1
  shufps $1, %xmm1, %xmm1
  movss %xmm1, 28(%rsp)
  movss %xmm1, 24(%rsp)
  movss %xmm1, 20(%rsp)
  movss %xmm1, 16(%rsp)
  vextractf128 $0, %ymm0, %xmm0
  shufps $1, %xmm0, %xmm0
  movss %xmm0, 12(%rsp)
  movss %xmm0, 8(%rsp)
  movss %xmm0, 4(%rsp)
  movss %xmm0, (%rsp)
  vmovaps (%rsp), %ymm0

We get:

  vextractf128 $0, %ymm0, %xmm0
  punpcklbw %xmm0, %xmm0
  punpckhbw %xmm0, %xmm0
  vinsertf128 $0, %xmm0, %ymm0, %ymm1
  vinsertf128 $1, %xmm0, %ymm1, %ymm0
  vpermilps $85, %ymm0, %ymm0

llvm-svn: 135662
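To see why `vpermilps $85` replaces the whole store-and-reload sequence, the immediate form can be modeled on eight 32-bit elements. This is a minimal sketch, assuming the usual ISA description in which the immediate holds four 2-bit selectors that are applied identically to both 128-bit lanes. The function name `vpermilps_imm` is hypothetical.

```python
def vpermilps_imm(src, imm8):
    """Model the immediate form of vpermilps on eight 32-bit elements.

    Assumption: imm8 holds four 2-bit selectors, and the same selectors
    are applied to each 128-bit lane. Elements never cross lanes.
    """
    assert len(src) == 8
    out = []
    for lane_base in (0, 4):                  # low lane, then high lane
        for i in range(4):
            sel = (imm8 >> (2 * i)) & 0b11    # selector for element i of the lane
            out.append(src[lane_base + sel])
    return out


# 85 == 0b01010101: every selector is 1, so each lane is filled with its
# own element 1, i.e. a per-lane splat.
print(vpermilps_imm(list(range(8)), 85))
```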
-
- Jul 20, 2011
-
-
Devang Patel authored
While emitting a constant value, look through derived types and use the underlying basic type to determine the size and signedness of the constant value. llvm-svn: 135627
-
Eli Friedman authored
llvm-svn: 135595
-
Eric Christopher authored
llvm-svn: 135562
-
Evan Cheng authored
llvm-svn: 135535
-
- Jul 19, 2011
-
-
Devang Patel authored
Revert r135423. llvm-svn: 135454
-
- Jul 18, 2011
-
-
Devang Patel authored
During bottom-up fast-isel, instructions emitted to materialize registers are at the top of the basic block and do not have a debug location. This may misguide the debugger while entering the basic block, and sometimes the debugger provides a semi-useful view of the current location to the developer by picking up the previous known location as the current location. Assign a sensible location to the first instruction in a basic block, if it does not have a location derived from the source file, so that the debugger can provide a meaningful user experience to developers in edge cases. [take 2] llvm-svn: 135423
-
Bruno Cardoso Lopes authored
llvm-svn: 135404
-
Nick Lewycky authored
llvm-svn: 135379
-
- Jul 16, 2011
-
-
Bruno Cardoso Lopes authored
llvm-svn: 135332
-
Bruno Cardoso Lopes authored
1) Make non-legal 256-bit loads be promoted to v4i64. This lets us canonicalize the loads and handle things the same way we handle them for 128-bit registers. Despite what one of the removed comments explained, the load promotion would not mess with VPERM; it's only a matter of doing the appropriate bitcasts when this instruction comes to be introduced. Also make LOAD v8i32 legal. 2) Doing 1) exposed two bugs: - v4i64 was being promoted to itself for several opcodes (introduced in r124447 by David Greene), causing endless recursion and the stack to explode. - there was no support for allOnes BUILD_VECTORs and ANDNP would fail to match because it was generating early target constant pools during lowering. 3) The testcases are already checked in; doing 1) exposed the bugs in the current testcases. 4) Tidy up code to be more clear and explicit about AVX. llvm-svn: 135313
-
- Jul 14, 2011
-
-
Eric Christopher authored
when determining validity of matching constraint. Allow i1 types access to the GR8 reg class for x86. Fixes PR10352 and rdar://9777108 llvm-svn: 135180
-