- Sep 03, 2010
-
Jakob Stoklund Olesen authored
llvm-svn: 112921
-
Anton Korobeynikov authored
Patch by Cameron Esfahani! llvm-svn: 112902
-
Bruno Cardoso Lopes authored
llvm-svn: 112896
-
Anton Korobeynikov authored
llvm-svn: 112885
-
Anton Korobeynikov authored
Patch by Jan Sjodin! llvm-svn: 112875
-
- Sep 02, 2010
-
Bruno Cardoso Lopes authored
Move decoding of insertps back to avoid unused warnings in x86 isel lowering, and fix movlhps/movhlps to decode 4-element shuffles. llvm-svn: 112869
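For reference, a minimal self-contained sketch of those two masks, using the common convention that indices 0-3 select from the first operand and 4-7 from the second (helper names here are illustrative, not the actual LLVM decoder):

```cpp
#include <cstdio>
#include <vector>

// Illustrative only: MOVLHPS keeps the low two floats of the first operand
// and appends the low two of the second; MOVHLPS takes the high two of the
// second operand followed by the high two of the first.
std::vector<int> decodeMOVLHPSMask() { return {0, 1, 4, 5}; }
std::vector<int> decodeMOVHLPSMask() { return {6, 7, 2, 3}; }

int main() {
  for (int I : decodeMOVLHPSMask()) std::printf("%d ", I); // 0 1 4 5
  std::printf("\n");
  for (int I : decodeMOVHLPSMask()) std::printf("%d ", I); // 6 7 2 3
  std::printf("\n");
}
```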
-
Dan Gohman authored
there are clearly no stores between the load and the store. This fixes the miscompile reported as PR7833. This breaks the test/CodeGen/X86/narrow_op-2.ll optimization, which is safe, but awkward to prove safe. Move it to X86's README.txt. llvm-svn: 112861
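Loosely, the hazard being guarded against looks like the following sketch (an assumption for illustration; this is not the PR7833 test case):

```cpp
#include <cstdint>

// Illustrative only. Narrowing "store (or (load p), C), p" to a store of
// just the changed bytes is only safe if nothing can write *p between the
// load and the store.
void example(uint32_t *p, uint32_t *q) {
  uint32_t v = *p;      // wide load
  *q = 0;               // if q aliases p, this intervening store matters:
  *p = v | 0xFF000000;  // the full-width store writes back v's old low
                        // bytes, while a narrowed 1-byte store would leave
                        // the bytes just written through q in place
}
```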
-
Bruno Cardoso Lopes authored
Move x86-specific shuffle mask decoding to its own header; it's also going to be used elsewhere. Also trim trailing whitespace. llvm-svn: 112846
-
Bruno Cardoso Lopes authored
llvm-svn: 112806
-
Bruno Cardoso Lopes authored
llvm-svn: 112805
-
Bruno Cardoso Lopes authored
llvm-svn: 112804
-
Bruno Cardoso Lopes authored
llvm-svn: 112799
-
Bruno Cardoso Lopes authored
llvm-svn: 112760
-
Bruno Cardoso Lopes authored
check more strict, breaking some cases not checked in the testsuite, but also exposing some foldings not done before, as in this example:
  movaps (%rdi), %xmm0
  movaps (%rax), %xmm1
  movaps %xmm0, %xmm2
  movss  %xmm1, %xmm2
  shufps $36, %xmm2, %xmm0
which is now generated as:
  movaps (%rdi), %xmm0
  movaps %xmm0, %xmm1
  movlps (%rax), %xmm1
  shufps $36, %xmm1, %xmm0
llvm-svn: 112753
-
- Sep 01, 2010
-
Bruno Cardoso Lopes authored
Use movlps, movlpd, movss and movsd specific nodes instead of pattern matching with movlp pattern fragment llvm-svn: 112694
-
Bruno Cardoso Lopes authored
llvm-svn: 112689
-
Bruno Cardoso Lopes authored
llvm-svn: 112687
-
Bruno Cardoso Lopes authored
llvm-svn: 112661
-
Bruno Cardoso Lopes authored
llvm-svn: 112657
-
- Aug 31, 2010
-
Jakob Stoklund Olesen authored
No CCR virtual registers should exist, and %EFLAGS is used in ways that can surprise RegAllocFast. llvm-svn: 112650
-
Bruno Cardoso Lopes authored
llvm-svn: 112644
-
Bruno Cardoso Lopes authored
llvm-svn: 112642
-
Bruno Cardoso Lopes authored
Use X86ISD::MOVSS and MOVSD to represent the movl mask pattern, and fix the handling of those nodes when searching for scalars inside vector shuffles. llvm-svn: 112570
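For context, a small sketch of the "movl" mask these nodes stand for (toy code under the usual two-operand shuffle-index convention, not the LLVM representation):

```cpp
#include <vector>

// Illustrative only: the "movl" shuffle mask takes lane 0 from the second
// operand and every other lane from the first. For 4 x f32 this is
// <4,1,2,3>; for 2 x f64 it is <2,1>.
std::vector<int> movlMask(unsigned NumElts) {
  std::vector<int> Mask;
  Mask.push_back(static_cast<int>(NumElts)); // lane 0 from operand 2
  for (unsigned I = 1; I < NumElts; ++I)
    Mask.push_back(static_cast<int>(I));     // remaining lanes from operand 1
  return Mask;
}
```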
-
- Aug 29, 2010
-
Eli Friedman authored
llvm-svn: 112411
-
Chris Lattner authored
llvm-svn: 112397
-
- Aug 28, 2010
-
Chris Lattner authored
times. This patch causes llc and llvm-mc (which both default to verbose-asm) to print out comments after a few common shuffle instructions that indicate the shuffle mask, e.g.:
  insertps $113, %xmm3, %xmm0 ## xmm0 = zero,xmm0[1,2],xmm3[1]
  unpcklps %xmm1, %xmm0       ## xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
  pshufd   $1, %xmm1, %xmm1   ## xmm1 = xmm1[1,0,0,0]
This is carefully factored to keep the information extraction (of the shuffle mask) separate from the printing logic. I plan to move the extraction part out somewhere else at some point for other parts of the x86 backend that want to introspect on the behavior of shuffles.
llvm-svn: 112387
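A toy sketch of the printing half (a hypothetical helper, not the actual extraction or printing code; the real output also merges adjacent lanes from the same source into forms like xmm0[1,2]):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Illustrative only: render a decoded shuffle mask as a readable comment.
// -1 marks a zeroed lane; 0..NumElts-1 index Src1, NumElts.. index Src2.
std::string maskComment(const std::vector<int> &Mask, unsigned NumElts,
                        const std::string &Src1, const std::string &Src2) {
  std::string S;
  for (std::size_t I = 0; I != Mask.size(); ++I) {
    if (I) S += ",";
    int M = Mask[I];
    if (M < 0)
      S += "zero";
    else if (static_cast<unsigned>(M) < NumElts)
      S += Src1 + "[" + std::to_string(M) + "]";
    else
      S += Src2 + "[" + std::to_string(M - NumElts) + "]";
  }
  return S;
}
// maskComment({-1, 1, 2, 5}, 4, "xmm0", "xmm3")
//   -> "zero,xmm0[1],xmm0[2],xmm3[1]"
```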
-
Chris Lattner authored
insertp[sd] $0, which is a noop. Before:
  _f32:                        ## @f32
    pshufd   $1, %xmm1, %xmm2
    pshufd   $1, %xmm0, %xmm3
    addss    %xmm2, %xmm3
    addss    %xmm1, %xmm0
                               ## kill: XMM0<def> XMM0<kill> XMM0<def>
    insertps $0, %xmm0, %xmm0
    insertps $16, %xmm3, %xmm0
    ret
after:
  _f32:                        ## @f32
    movdqa   %xmm0, %xmm2
    addss    %xmm1, %xmm2
    pshufd   $1, %xmm1, %xmm1
    pshufd   $1, %xmm0, %xmm3
    addss    %xmm1, %xmm3
    movdqa   %xmm2, %xmm0
    insertps $16, %xmm3, %xmm0
    ret
The extra movs are due to a random (poor) scheduling decision.
llvm-svn: 112379
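For context, a sketch of why an immediate of 0 makes insertps with identical source and destination a no-op, based on the documented INSERTPS immediate layout (the helper is illustrative, not LLVM code):

```cpp
#include <cstdint>

// Illustrative only. In the INSERTPS immediate, bits [7:6] pick the source
// lane, bits [5:4] the destination lane, and bits [3:0] zero individual
// result lanes. With imm == 0 and identical source and destination
// registers, nothing changes.
bool insertpsIsNoop(uint8_t Imm, bool SrcEqualsDst) {
  unsigned SrcLane  = (Imm >> 6) & 0x3;
  unsigned DstLane  = (Imm >> 4) & 0x3;
  unsigned ZeroMask = Imm & 0xF;
  return SrcEqualsDst && SrcLane == DstLane && ZeroMask == 0;
}
```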
-
Chris Lattner authored
when the top elements of a vector are undefined. This happens all the time for X86-64 ABI stuff because only the low 2 elements of a 4-element vector are defined. For example, on:
  _Complex float f32(_Complex float A, _Complex float B) { return A+B; }
we used to produce (with SSE2; SSE4.1+ uses insertps):
  _f32:                        ## @f32
    movdqa   %xmm0, %xmm2
    addss    %xmm1, %xmm2
    pshufd   $16, %xmm2, %xmm2
    pshufd   $1, %xmm1, %xmm1
    pshufd   $1, %xmm0, %xmm0
    addss    %xmm1, %xmm0
    pshufd   $16, %xmm0, %xmm1
    movdqa   %xmm2, %xmm0
    unpcklps %xmm1, %xmm0
    ret
We now produce:
  _f32:                        ## @f32
    movdqa   %xmm0, %xmm2
    addss    %xmm1, %xmm2
    pshufd   $1, %xmm1, %xmm1
    pshufd   $1, %xmm0, %xmm3
    addss    %xmm1, %xmm3
    movaps   %xmm2, %xmm0
    unpcklps %xmm3, %xmm0
    ret
This implements rdar://8368414. llvm-svn: 112378
-
Chris Lattner authored
a new EltStride variable instead of reusing NumElems variable for a non-obvious purpose. No functionality change. llvm-svn: 112377
-
Bruno Cardoso Lopes authored
Also teach this logic how to handle target specific shuffles if needed, this is necessary while searching recursively for zeroed scalar elements in vector shuffle operands. llvm-svn: 112348
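A toy model of that recursive search (illustrative types and names, not the LLVM implementation):

```cpp
#include <vector>

// Illustrative only: a node is either a leaf vector with per-lane
// "known zero" flags, or a shuffle of two operands with a per-lane mask
// (indices >= NumElts select operand 2, -1 means undef).
struct Node {
  bool IsShuffle = false;
  std::vector<bool> ZeroLanes;            // leaf: lanes known to be zero
  const Node *Ops[2] = {nullptr, nullptr};
  std::vector<int> Mask;                  // shuffle: per-lane source index
};

// Follow lane Idx through nested shuffles to decide whether the scalar
// that ends up there is known to be zero.
bool laneIsZero(const Node &N, unsigned Idx) {
  if (!N.IsShuffle)
    return N.ZeroLanes[Idx];
  int M = N.Mask[Idx];
  if (M < 0)
    return false;                         // undef: conservatively unknown
  unsigned NumElts = N.Mask.size();
  const Node &Op = *N.Ops[static_cast<unsigned>(M) < NumElts ? 0 : 1];
  return laneIsZero(Op, static_cast<unsigned>(M) % NumElts);
}
```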
-
- Aug 27, 2010
-
Anton Korobeynikov authored
value should be copied to the corresponding shadow reg as well. Patch by Cameron Esfahani! llvm-svn: 112262
-
Daniel Dunbar authored
X86: Fix an encoding issue with LOCK_ADD64mr, which could lead to very hard to find miscompiles with the integrated assembler. llvm-svn: 112250
-
Jim Grosbach authored
to try to re-use scavenged frame index reference registers. rdar://8277890 llvm-svn: 112241
-
- Aug 26, 2010
-
Bruno Cardoso Lopes authored
llvm-svn: 112218
-
Bob Wilson authored
llvm-svn: 112202
-
Chris Lattner authored
llvm-svn: 112171
-
Chris Lattner authored
apparently try to support. llvm-svn: 112168
-
Bruno Cardoso Lopes authored
llvm-svn: 112128
-
Chris Lattner authored
llvm-svn: 112109
-
- Aug 25, 2010
-
Bruno Cardoso Lopes authored
llvm-svn: 112090
-