- Jan 09, 2012
-
-
Craig Topper authored
llvm-svn: 147758
-
- Jan 08, 2012
-
-
Victor Umansky authored
llvm-svn: 147748
-
- Jan 07, 2012
-
-
Craig Topper authored
llvm-svn: 147739
-
Benjamin Kramer authored
llvm-svn: 147738
-
Craig Topper authored
llvm-svn: 147734
-
Eric Christopher authored
Fixes rdar://10614894 llvm-svn: 147704
-
- Jan 05, 2012
-
-
Craig Topper authored
llvm-svn: 147602
-
Victor Umansky authored
Peephole optimization of ptest-conditioned branch in X86 arch. Performs instruction combining of sequences generated by ptestz/ptestc intrinsics to ptest+jcc pair for SSE and AVX. Testing: passed 'make check' including LIT tests for all sequences being handled (both SSE and AVX) Reviewers: Evan Cheng, David Blaikie, Bruno Lopes, Elena Demikhovsky, Chad Rosier, Anton Korobeynikov llvm-svn: 147601
-
Bill Wendling authored
This small bit of ASM code is sufficient to do what the old algorithm did: movq %rax, %xmm0 punpckldq (c0), %xmm0 // c0: (uint4){ 0x43300000U, 0x45300000U, 0U, 0U } subpd (c1), %xmm0 // c1: (double2){ 0x1.0p52, 0x1.0p52 * 0x1.0p32 } #ifdef __SSE3__ haddpd %xmm0, %xmm0 #else pshufd $0x4e, %xmm0, %xmm1 addpd %xmm1, %xmm0 #endif It's arguably faster. One caveat, the 'haddpd' instruction isn't very fast on all processors. <rdar://problem/7719814> llvm-svn: 147593
-
- Jan 04, 2012
-
-
Benjamin Kramer authored
llvm-svn: 147553
-
Evan Cheng authored
(x > y) ? x : y => (x >= y) ? x : y So for something like (x - y) > 0 : (x - y) ? 0 It will be (x - y) >= 0 : (x - y) ? 0 This makes is possible to test sign-bit and eliminate a comparison against zero. e.g. subl %esi, %edi testl %edi, %edi movl $0, %eax cmovgl %edi, %eax => xorl %eax, %eax subl %esi, $edi cmovsl %eax, %edi rdar://10633221 llvm-svn: 147512
-
Chad Rosier authored
llvm-svn: 147495
-
- Jan 03, 2012
-
-
Nadav Rotem authored
llvm-svn: 147485
-
Chad Rosier authored
then a vxorps + vinsertf128 pair if the original vector came from a load. rdar://10594409 llvm-svn: 147481
-
Devang Patel authored
llvm-svn: 147453
-
- Jan 02, 2012
-
-
Craig Topper authored
Miscellaneous shuffle lowering cleanup. No functional changes. Primarily converting the indexing loops to unsigned to be consistent across functions. llvm-svn: 147430
-
Craig Topper authored
Make CanXFormVExtractWithShuffleIntoLoad reject loads with multiple uses. Also make it return false if there's not even a load at all. This makes the code better match the code in DAGCombiner that it tries to match. These two changes prevent some cases where vector_shuffles were making it to instruction selection and causing the older shuffle selection code to be triggered. Also needed to fix a bad pattern that this change exposed. This is the first step towards getting rid of the old shuffle selection support. No test cases yet because there's no way to tell whether a shuffle was handled in the legalize stage or at instruction selection. llvm-svn: 147428
-
Nadav Rotem authored
Optimize the sequence blend(sign_extend(x)) to blend(shl(x)) since SSE blend instructions only look at the highest bit. llvm-svn: 147426
-
- Jan 01, 2012
-
-
Craig Topper authored
llvm-svn: 147411
-
Craig Topper authored
Fix sfence, lfence, mfence, and clflush to be able to be selected when AVX is enabled. Fix monitor and mwait to require SSE3 or AVX, previously they worked even if SSE3 was disabled. Make prefetch instructions not set the execution domain since they don't use XMM registers. llvm-svn: 147409
-
Benjamin Kramer authored
llvm-svn: 147404
-
Craig Topper authored
llvm-svn: 147394
-
Craig Topper authored
llvm-svn: 147393
-
Craig Topper authored
Fix typo in a SHUFPD and VSHUFPD pattern that prevented SHUFPD/VSHUFPD with a load from being selected. llvm-svn: 147392
-
- Dec 30, 2011
-
-
Craig Topper authored
Make FMA4 imply AVX so that YMM registers would be available. Necessitates removing from Bulldozer CPU types since it would enable AVX code generation implicitly. Also make SSE4A imply SSE3. Without some level of SSE implied, XMM registers wouldn't be legal. llvm-svn: 147369
-
Craig Topper authored
llvm-svn: 147368
-
Craig Topper authored
llvm-svn: 147367
-
Craig Topper authored
Separate the concept of having memory access in operand 4 from the concept of having the W bit set for XOP instructons. Removes ORing W-bits in the encoder and will similarly simplify the disassembler implementation. llvm-svn: 147366
-
Craig Topper authored
llvm-svn: 147365
-
Craig Topper authored
llvm-svn: 147364
-
Craig Topper authored
Change FMA4 memory forms to use memopv* instead of alignedloadv*. No need to force alignment on these instructions. Add a couple testcases for memory forms. llvm-svn: 147361
-
Craig Topper authored
Fix load size for FMA4 SS/SD instructions. They need to use f32 and f64 size, but with the special handling to be compatible with the intrinsic expecting a vector. Similar handling is already used elsewhere. llvm-svn: 147360
-
- Dec 29, 2011
-
-
Craig Topper authored
llvm-svn: 147353
-
Craig Topper authored
llvm-svn: 147351
-
Craig Topper authored
Make FMA3 imply AVX needs to be enabled. Particularly because 256-bit types aren't valid unless AVX is enabled. llvm-svn: 147349
-
Craig Topper authored
llvm-svn: 147348
-
Craig Topper authored
llvm-svn: 147347
-
Craig Topper authored
Mark non-VEX forms of PCLMUL instructions as requiring SSE2 to be enabled along with CLMUL. That's required for the XMM registers to be valid for integer data. Doesn't change any behavior since the CLMUL instructions don't have patterns yet. llvm-svn: 147345
-
Craig Topper authored
Mark non-VEX forms of AES instructions as requiring SSE2 to be enabled along with AES. Since that's required for the XMM registers to be valid for integer data. Doesn't change any behavior though since you can't use an intrinsic with an illegal type anyway. Just makes it consistent with the VEX forms. llvm-svn: 147344
-
Craig Topper authored
Remove the separate explicit AES instruction patterns. They are equivalent to the patterns specified by the instructions. Also remove unnecessary bitconverts from the AES patterns. llvm-svn: 147342
-