Commits · afa12db8a631e9b7a58c1baa8ce5fd2c711971ad · Roger Ferrer / llvm-epi-0.8

Aug 20, 2009
- Fixed PCMPESTRM128 to have opcode 0x60 instead of 0x62, as specified by the · 46bb77f2
  Sean Callanan authored Aug 20, 2009
```
Intel documentation.

llvm-svn: 79554
```
  46bb77f2
Aug 19, 2009

Implement sse4.2 string/text processing instructions: · 9fe912de

Eric Christopher authored Aug 18, 2009

Add patterns and instruction encoding information.
Add custom lowering to deal with hardwired return register of
uncertain type (xmm0).

llvm-svn: 79377

9fe912de

Aug 12, 2009

Add 'isCodeGenOnly' bit to Instruction .td records. · c4f8ea4c

Daniel Dunbar authored Aug 11, 2009

 - Used to mark fake instructions which don't correspond to an actual machine
   instruction (or are duplicates of a real instruction). This is to be used for
   "special cases" in the .td files, which should be ignored by things like the
   assembler and disassembler. We still need a good solution to handle pervasive
   duplication, like with the Int_ instructions.

 - Set the bit on fake "mov 0" style instructions, which allows turning an
   assembler matcher warning into a hard error.

 - -2 FIXMEs.

llvm-svn: 78731

c4f8ea4c

Aug 10, 2009
- Fix up whitespace, remove commented out code. · 458c9173
  Eric Christopher authored Aug 10, 2009
```
llvm-svn: 78600
```
  458c9173
- llvm-mc/AsmMatcher: Change assembler parser match classes to their own record · 17410a4b
  Daniel Dunbar authored Aug 10, 2009
```
structure.

llvm-svn: 78581
```
  17410a4b
Aug 09, 2009
- Extend comment on ParserMatchClass .td field, and add some missing · 447c4ab9
  Daniel Dunbar authored Aug 09, 2009
```
classes for X86.

llvm-svn: 78524
```
  447c4ab9
Aug 08, 2009

Add crc32 instruction and intrinsics. Add a new class of prefix · 7dfa9f2e

Eric Christopher authored Aug 08, 2009

bytes for F2 0F 38 and propagate. Add a FIXME for a set
of possibilities which correspond to intrinsics already used.

New test.
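
For readers unfamiliar with what the instruction computes, here is a minimal software sketch, not code from this commit. It assumes the SSE4.2 `crc32` instruction accumulates a CRC over the Castagnoli polynomial (CRC-32C) in bit-reflected form; the function names `crc32c_u8` and `crc32c` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Bit-at-a-time model of one byte-wide accumulation step.
// 0x82F63B78 is the bit-reflected CRC-32C (Castagnoli) polynomial.
inline std::uint32_t crc32c_u8(std::uint32_t crc, std::uint8_t byte) {
  crc ^= byte;
  for (int bit = 0; bit < 8; ++bit)
    crc = (crc >> 1) ^ ((crc & 1u) ? 0x82F63B78u : 0u);
  return crc;
}

// Hypothetical wrapper: the all-ones initial value and the final inversion
// are conventions of the CRC-32C checksum, not something the step applies.
inline std::uint32_t crc32c(const void *data, std::size_t len) {
  const auto *p = static_cast<const std::uint8_t *>(data);
  std::uint32_t crc = 0xFFFFFFFFu;
  for (std::size_t i = 0; i < len; ++i)
    crc = crc32c_u8(crc, p[i]);
  return ~crc;
}
```

With this wrapper, the nine ASCII bytes `"123456789"` give the standard CRC-32C check value `0xE3069283`.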

llvm-svn: 78508

7dfa9f2e

Jul 31, 2009
- Whitespace and 80-col cleanup. · 45d71851
  Eric Christopher authored Jul 31, 2009
```
llvm-svn: 77718
```
  45d71851
Jul 30, 2009

Add a new register class to describe operands that can't be SP, · 49a6f16b

Dan Gohman authored Jul 30, 2009

due to x86 encoding restrictions. This is currently off by default
because it may cause code quality regressions. This is for PR4572.

llvm-svn: 77565

49a6f16b

Jul 29, 2009
- Add support for gcc __builtin_ia32_ptest{z,c,nzc} intrinsics. Lower · f7802a33
  Eric Christopher authored Jul 29, 2009
```
to ptest instruction plus setcc. Revamp ptest instruction. Add test.

llvm-svn: 77407
```
  f7802a33
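
The "ptest plus setcc" lowering can be pictured with a small model. This is a sketch under the usual definition of `ptest` (ZF is set when `a & b` is all zeros, CF is set when `~a & b` is all zeros); the `Vec128` and `PtestFlags` types and the function names are illustrative, not LLVM or GCC identifiers.

```cpp
#include <cstdint>

// A 128-bit value modelled as two 64-bit halves (illustrative type).
struct Vec128 { std::uint64_t lo, hi; };

// The two flags that ptest defines.
struct PtestFlags { bool zf, cf; };

// ZF = ((a & b) == 0), CF = ((~a & b) == 0), over all 128 bits.
constexpr PtestFlags ptest(Vec128 a, Vec128 b) {
  return PtestFlags{((a.lo & b.lo) | (a.hi & b.hi)) == 0,
                    ((~a.lo & b.lo) | (~a.hi & b.hi)) == 0};
}

// Each builtin reads one condition out of the flags, which is the job
// of the setcc that follows the ptest.
constexpr int testz(Vec128 a, Vec128 b) { return ptest(a, b).zf ? 1 : 0; }
constexpr int testc(Vec128 a, Vec128 b) { return ptest(a, b).cf ? 1 : 0; }
constexpr int testnzc(Vec128 a, Vec128 b) {
  return (!ptest(a, b).zf && !ptest(a, b).cf) ? 1 : 0;
}
```

One flag-producing operation serves all three builtins; only the condition that is read back differs.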
Jul 24, 2009

Update insertps handling based on feedback. Move to a v4f32 style · f37ea3ad

Eric Christopher authored Jul 24, 2009

to support vector arguments and scalar arguments correctly. Update
lowering and fix comment to refer to pinsr* instead of insertps.

llvm-svn: 76921

f37ea3ad

Jul 23, 2009
- Support insertps via the intrinsic and add a couple of simple · b1b77ca8
  Eric Christopher authored Jul 23, 2009
```
testcases to make sure it's being generated.

llvm-svn: 76843
```
  b1b77ca8
Jun 19, 2009
- Fix for PR2484: add an SSE1 pattern for a shuffle we normally prefer to · 2fc939c8
  Eli Friedman authored Jun 19, 2009
```
handle with an SSE2 instruction.

llvm-svn: 73760
```
  2fc939c8
Jun 06, 2009
- Fix an obvious typo. · 868bd6ab
  Eli Friedman authored Jun 06, 2009
```
llvm-svn: 72987
```
  868bd6ab
May 29, 2009

The MONITOR and MWAIT instructions have insufficient information for · 2e09bd3d

Bill Wendling authored May 28, 2009

decoding. Essentially, they both map to the same column in the "opcode
extensions for one- and two-byte opcodes" table in the x86 manual. The RawFrm
complicates decoding this.

Instead, use opcode 0x01, prefix 0x01, and form MRM1r. Then have the code
emitter special case these, a la [SML]FENCE.
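
To illustrate why the two instructions collide in the opcode-extension table, here is a decoding sketch. It assumes the documented encodings `0F 01 C8` (MONITOR) and `0F 01 C9` (MWAIT); the `ModRM` struct, the `Op` enum, and `classify0F01` are hypothetical names, not part of the LLVM code emitter.

```cpp
#include <cstdint>

// The three fields of a ModRM byte.
struct ModRM { unsigned mod, reg, rm; };

constexpr ModRM splitModRM(std::uint8_t b) {
  return ModRM{(b >> 6) & 3u, (b >> 3) & 7u, b & 7u};
}

enum class Op { Monitor, Mwait, Other };

// Classify the byte that follows 0F 01. Both instructions have mod = 3 and
// reg = 1 (the "/1" extension), so only the rm field tells them apart.
inline Op classify0F01(std::uint8_t modrm) {
  const ModRM f = splitModRM(modrm);
  if (f.mod != 3 || f.reg != 1)
    return Op::Other;
  if (f.rm == 0)
    return Op::Monitor;
  if (f.rm == 1)
    return Op::Mwait;
  return Op::Other;
}
```

Because neither instruction takes an explicit register operand, the rm field acts as an opcode bit rather than a register number here, which is consistent with the special-casing the message describes.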

llvm-svn: 72556

2e09bd3d

May 28, 2009
- Fix MOVMSKPDrr encoding. · cc3ae1f2
  Evan Cheng authored May 28, 2009
```
llvm-svn: 72535
```
  cc3ae1f2
- Fix PSIGND encoding bug. Patch by Sean Callanan. · 60618fe4
  Evan Cheng authored May 28, 2009
```
llvm-svn: 72534
```
  60618fe4
- "The instructions MMX_PSADBWrm and MMX_PSADBWrr have opcode 0b11100000 (e0), but · 0feb0e60
  Bill Wendling authored May 28, 2009
```
the Intel manual (screenshot) says it should be 0b11110110 (f6).  The existing
encoding causes a disassembly conflict with MMX_PAVGBrm, which really should be
0f e0."

Patch by Sean Callanan!

llvm-svn: 72508
```
  0feb0e60
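
The binary and hexadecimal values quoted in the message can be checked mechanically. A small sketch using C++14 binary literals; the constant names are made up for illustration.

```cpp
#include <cstdint>

// Second opcode byte (after the 0x0F escape), as quoted in the message.
constexpr std::uint8_t kPavgbOpcode  = 0b11100000;  // PAVGB
constexpr std::uint8_t kPsadbwOpcode = 0b11110110;  // PSADBW

static_assert(kPavgbOpcode == 0xE0, "0b11100000 is 0xE0");
static_assert(kPsadbwOpcode == 0xF6, "0b11110110 is 0xF6");
static_assert(kPavgbOpcode != kPsadbwOpcode,
              "distinct opcodes keep the disassembly unambiguous");
```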
May 27, 2009
- Fix sfence jit encoding. Patch by Sean Callanan. · 4db1631a
  Evan Cheng authored May 27, 2009
```
llvm-svn: 72488
```
  4db1631a
May 12, 2009
- 80 col violations. · b41de478
  Evan Cheng authored May 12, 2009
```
llvm-svn: 71582
```
  b41de478
Apr 30, 2009
- Fix infinite recursion in the C++ code which handles movddup by making it unnecessary. · 7e6e3527
  Nate Begeman authored Apr 29, 2009
```
llvm-svn: 70425
```
  7e6e3527
Apr 27, 2009

2nd attempt, fixing SSE4.1 issues and implementing feedback from Duncan. · 8d6d4b92

Nate Begeman authored Apr 27, 2009

PR2957

ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask.  A value of -1 represents UNDEF.

In addition to eliminating the creation of illegal BUILD_VECTORS just to 
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.
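
The integer-mask representation can be sketched outside of SelectionDAG. This is a stand-alone model, not the `ShuffleVectorSDNode` API: `applyShuffle` and `undefFill` are hypothetical, and the convention that indices `[0, N)` pick from the first input and `[N, 2N)` from the second is stated here as an assumption of the model.

```cpp
#include <cstddef>
#include <vector>

// Model of a two-input shuffle over int elements.
//   mask[i] in [0, N)   selects a[mask[i]]
//   mask[i] in [N, 2N)  selects b[mask[i] - N]
//   mask[i] == -1       means the lane is undefined (UNDEF)
// Both inputs are assumed to have the same length N.
inline std::vector<int> applyShuffle(const std::vector<int> &a,
                                     const std::vector<int> &b,
                                     const std::vector<int> &mask,
                                     int undefFill = 0) {
  const int n = static_cast<int>(a.size());
  std::vector<int> out;
  out.reserve(mask.size());
  for (int m : mask) {
    if (m < 0)
      out.push_back(undefFill);  // any value is acceptable for an undef lane
    else if (m < n)
      out.push_back(a[static_cast<std::size_t>(m)]);
    else
      out.push_back(b[static_cast<std::size_t>(m - n)]);
  }
  return out;
}
```

For example, the mask `{0, 5, -1, 3}` over `{1, 2, 3, 4}` and `{5, 6, 7, 8}` yields `{1, 6, <undef>, 4}`. Keeping the mask as plain integers means no constant nodes have to be created just to describe it.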

llvm-svn: 70225

8d6d4b92

Apr 24, 2009

Revert 69952. Causes testsuite failures on linux x86-64. · b93db668
Rafael Espindola authored Apr 24, 2009
```
llvm-svn: 69967
```
b93db668

PR2957 · bb881d66

Nate Begeman authored Apr 24, 2009

ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask. A value of -1 represents UNDEF.

In addition to eliminating the creation of illegal BUILD_VECTORS just to
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.

A clean up of x86 shuffle code, and some canonicalizing in DAGCombiner is next.

llvm-svn: 69952

bb881d66

Apr 08, 2009

Re-apply 68552. · 3b2df10c

Rafael Espindola authored Apr 08, 2009

Tested by bootstrapping llvm-gcc and using that to build llvm.

llvm-svn: 68645

3b2df10c

Temporarily revert r68552. This was causing a failure in the self-hosting LLVM · 4aa25b79

Bill Wendling authored Apr 07, 2009

builds.

--- Reverse-merging (from foreign repository) r68552 into '.':
U    test/CodeGen/X86/tls8.ll
U    test/CodeGen/X86/tls10.ll
U    test/CodeGen/X86/tls2.ll
U    test/CodeGen/X86/tls6.ll
U    lib/Target/X86/X86Instr64bit.td
U    lib/Target/X86/X86InstrSSE.td
U    lib/Target/X86/X86InstrInfo.td
U    lib/Target/X86/X86RegisterInfo.cpp
U    lib/Target/X86/X86ISelLowering.cpp
U    lib/Target/X86/X86CodeEmitter.cpp
U    lib/Target/X86/X86FastISel.cpp
U    lib/Target/X86/X86InstrInfo.h
U    lib/Target/X86/X86ISelDAGToDAG.cpp
U    lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.cpp
U    lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.cpp
U    lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.h
U    lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.h
U    lib/Target/X86/X86ISelLowering.h
U    lib/Target/X86/X86InstrInfo.cpp
U    lib/Target/X86/X86InstrBuilder.h
U    lib/Target/X86/X86RegisterInfo.td

llvm-svn: 68560

4aa25b79

Apr 07, 2009

Reduce code duplication on the TLS implementation. · 1edda067

Rafael Espindola authored Apr 07, 2009

This introduces a small regression on the generated code
quality in the case we are just computing addresses, not
loading values.

Will work on it and on X86-64 support.

llvm-svn: 68552

1edda067

Feb 26, 2009

ADDS{D|S}rr_Int and MULS{D|S}rr_Int are not commutable. The users of these... · 40abb7b5

Evan Cheng authored Feb 26, 2009

ADDS{D|S}rr_Int and MULS{D|S}rr_Int are not commutable. The users of these intrinsics expect that the high bits will not be modified.
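
A scalar model shows why swapping the operands is not safe. This sketch assumes the usual semantics of the scalar SSE intrinsics (lane 0 is computed, the remaining lanes are copied from the first operand); `V4` and `addScalarLow` are illustrative names.

```cpp
#include <array>

using V4 = std::array<float, 4>;

// Model of an addss-style intrinsic: lane 0 is a[0] + b[0],
// lanes 1..3 are passed through unchanged from the first operand.
inline V4 addScalarLow(const V4 &a, const V4 &b) {
  V4 r = a;
  r[0] = a[0] + b[0];
  return r;
}
```

With `a = {1, 2, 3, 4}` and `b = {10, 20, 30, 40}`, `addScalarLow(a, b)` is `{11, 2, 3, 4}` while `addScalarLow(b, a)` is `{11, 20, 30, 40}`: the low lane agrees, the high lanes do not.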

llvm-svn: 65499

40abb7b5

Feb 23, 2009

Generate better code for v8i16 shuffles on SSE2 · e684da3e

Nate Begeman authored Feb 23, 2009

Generate better code for v16i8 shuffles on SSE2 (avoids stack)
Generate pshufb for v8i16 and v16i8 shuffles on SSSE3 where it is fewer uops.
Document the shuffle matching logic and add some FIXMEs for later further
  cleanups.
New tests that test the above.

Examples:

New:
_shuf2:
	pextrw	$7, %xmm0, %eax
	punpcklqdq	%xmm1, %xmm0
	pshuflw	$128, %xmm0, %xmm0
	pinsrw	$2, %eax, %xmm0

Old:
_shuf2:
	pextrw	$2, %xmm0, %eax
	pextrw	$7, %xmm0, %ecx
	pinsrw	$2, %ecx, %xmm0
	pinsrw	$3, %eax, %xmm0
	movd	%xmm1, %eax
	pinsrw	$4, %eax, %xmm0
	ret

=========

New:
_shuf4:
	punpcklqdq	%xmm1, %xmm0
	pshufb	LCPI1_0, %xmm0

Old:
_shuf4:
	pextrw	$3, %xmm0, %eax
	movsd	%xmm1, %xmm0
	pextrw	$3, %xmm1, %ecx
	pinsrw	$4, %ecx, %xmm0
	pinsrw	$5, %eax, %xmm0

========

New:
_shuf1:
	pushl	%ebx
	pushl	%edi
	pushl	%esi
	pextrw	$1, %xmm0, %eax
	rolw	$8, %ax
	movd	%xmm0, %ecx
	rolw	$8, %cx
	pextrw	$5, %xmm0, %edx
	pextrw	$4, %xmm0, %esi
	pextrw	$3, %xmm0, %edi
	pextrw	$2, %xmm0, %ebx
	movaps	%xmm0, %xmm1
	pinsrw	$0, %ecx, %xmm1
	pinsrw	$1, %eax, %xmm1
	rolw	$8, %bx
	pinsrw	$2, %ebx, %xmm1
	rolw	$8, %di
	pinsrw	$3, %edi, %xmm1
	rolw	$8, %si
	pinsrw	$4, %esi, %xmm1
	rolw	$8, %dx
	pinsrw	$5, %edx, %xmm1
	pextrw	$7, %xmm0, %eax
	rolw	$8, %ax
	movaps	%xmm1, %xmm0
	pinsrw	$7, %eax, %xmm0
	popl	%esi
	popl	%edi
	popl	%ebx
	ret

Old:
_shuf1:
	subl	$252, %esp
	movaps	%xmm0, (%esp)
	movaps	%xmm0, 16(%esp)
	movaps	%xmm0, 32(%esp)
	movaps	%xmm0, 48(%esp)
	movaps	%xmm0, 64(%esp)
	movaps	%xmm0, 80(%esp)
	movaps	%xmm0, 96(%esp)
	movaps	%xmm0, 224(%esp)
	movaps	%xmm0, 208(%esp)
	movaps	%xmm0, 192(%esp)
	movaps	%xmm0, 176(%esp)
	movaps	%xmm0, 160(%esp)
	movaps	%xmm0, 144(%esp)
	movaps	%xmm0, 128(%esp)
	movaps	%xmm0, 112(%esp)
	movzbl	14(%esp), %eax
	movd	%eax, %xmm1
	movzbl	22(%esp), %eax
	movd	%eax, %xmm2
	punpcklbw	%xmm1, %xmm2
	movzbl	42(%esp), %eax
	movd	%eax, %xmm1
	movzbl	50(%esp), %eax
	movd	%eax, %xmm3
	punpcklbw	%xmm1, %xmm3
	punpcklbw	%xmm2, %xmm3
	movzbl	77(%esp), %eax
	movd	%eax, %xmm1
	movzbl	84(%esp), %eax
	movd	%eax, %xmm2
	punpcklbw	%xmm1, %xmm2
	movzbl	104(%esp), %eax
	movd	%eax, %xmm1
	punpcklbw	%xmm1, %xmm0
	punpcklbw	%xmm2, %xmm0
	movaps	%xmm0, %xmm1
	punpcklbw	%xmm3, %xmm1
	movzbl	127(%esp), %eax
	movd	%eax, %xmm0
	movzbl	135(%esp), %eax
	movd	%eax, %xmm2
	punpcklbw	%xmm0, %xmm2
	movzbl	155(%esp), %eax
	movd	%eax, %xmm0
	movzbl	163(%esp), %eax
	movd	%eax, %xmm3
	punpcklbw	%xmm0, %xmm3
	punpcklbw	%xmm2, %xmm3
	movzbl	188(%esp), %eax
	movd	%eax, %xmm0
	movzbl	197(%esp), %eax
	movd	%eax, %xmm2
	punpcklbw	%xmm0, %xmm2
	movzbl	217(%esp), %eax
	movd	%eax, %xmm4
	movzbl	225(%esp), %eax
	movd	%eax, %xmm0
	punpcklbw	%xmm4, %xmm0
	punpcklbw	%xmm2, %xmm0
	punpcklbw	%xmm3, %xmm0
	punpcklbw	%xmm1, %xmm0
	addl	$252, %esp
	ret
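
For reference, the byte-level behaviour of `pshufb` that makes it attractive for v16i8 and v8i16 shuffles can be modelled in a few lines. This is a sketch of the instruction's documented semantics, not of the lowering code; `Bytes16`, `pshufbModel`, and `swapBytesInWordsControl` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Bytes16 = std::array<std::uint8_t, 16>;

// pshufb model: for each output byte, a control byte with its high bit set
// produces zero; otherwise the low four bits index into the source.
inline Bytes16 pshufbModel(const Bytes16 &src, const Bytes16 &ctl) {
  Bytes16 out{};
  for (std::size_t i = 0; i < 16; ++i)
    out[i] = (ctl[i] & 0x80) ? std::uint8_t{0} : src[ctl[i] & 0x0F];
  return out;
}

// Example control vector: swap the two bytes of every 16-bit word, the kind
// of per-byte permutation that otherwise needs an extract, rotate, and
// insert for each word.
inline Bytes16 swapBytesInWordsControl() {
  Bytes16 ctl{};
  for (std::size_t i = 0; i < 16; ++i)
    ctl[i] = static_cast<std::uint8_t>(i ^ 1);
  return ctl;
}
```

A single control constant (compare `LCPI1_0` in the `_shuf4` listing) can therefore express an arbitrary byte permutation of one register.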

llvm-svn: 65311

e684da3e

Feb 10, 2009
- Handle llvm.x86.sse2.maskmov.dqu in 64-bit. · 589a5394
  Evan Cheng authored Feb 10, 2009
```
llvm-svn: 64240
```
  589a5394
Feb 05, 2009
- A few more isAsCheapAsAMove. · 64fdacc2
  Evan Cheng authored Feb 05, 2009
```
llvm-svn: 63852
```
  64fdacc2
Jan 28, 2009

The memory alignment requirement on some of the mov{h|l}p{d|s} patterns is... · f31f2888

Evan Cheng authored Jan 28, 2009

The memory alignment requirement on some of the mov{h|l}p{d|s} patterns is 16-byte. That is overly strict. These instructions read / write f64 memory locations without any alignment requirement.

llvm-svn: 63195

f31f2888

Jan 09, 2009
- Whitespace and other minor adjustments to make SSE instructions have · e907a0a5
  Dan Gohman authored Jan 09, 2009
```
the same formatting as their corresponding SSE2 instructions, for
consistency.

llvm-svn: 61971
```
  e907a0a5
Dec 18, 2008
- Fixed x86 code generation of multiply for v2i64. It was incorrect for SSE4.1. · 998fd29c
  Mon P Wang authored Dec 18, 2008
```
llvm-svn: 61211
```
  998fd29c
Dec 03, 2008

Rename isSimpleLoad to canFoldAsLoad, to better reflect its meaning. · 69cc2cbb
Dan Gohman authored Dec 03, 2008
```
llvm-svn: 60487
```
69cc2cbb

Mark x86's V_SET0 and V_SETALLONES with isSimpleLoad, and teach X86's · cc78cdf2

Dan Gohman authored Dec 03, 2008

foldMemoryOperand how to "fold" them, by converting them into constant-pool
loads. When they aren't folded, they use xorps/cmpeqd, but for example when
register pressure is high, they may now be folded as memory operands, which
reduces register pressure.

Also, mark V_SET0 isAsCheapAsAMove so that two-address-elimination will
remat it instead of copying zeros around (V_SETALLONES was already marked).

llvm-svn: 60461

cc78cdf2

Oct 17, 2008

Fix lfence and mfence encoding. These look like MRM5r and MRM6r instructions... · 27c37022

Evan Cheng authored Oct 17, 2008

Fix lfence and mfence encoding. These look like MRM5r and MRM6r instructions except they do not have any operands. The RegModRM byte is encoded with register number 0.
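
The resulting byte can be worked out directly. A minimal sketch, assuming the documented encodings `0F AE E8` for lfence and `0F AE F0` for mfence; `modRMReg` is a hypothetical helper, not the code emitter's function.

```cpp
#include <cstdint>

// Build a ModRM byte in register form: mod = 11b, the opcode extension
// ("/digit") in the reg field, and a register number in the rm field.
constexpr std::uint8_t modRMReg(unsigned regField, unsigned rm) {
  return static_cast<std::uint8_t>(0xC0u | ((regField & 7u) << 3) | (rm & 7u));
}

// MRM5r and MRM6r with register number 0 give the fence encodings.
static_assert(modRMReg(5, 0) == 0xE8, "lfence is 0F AE E8");
static_assert(modRMReg(6, 0) == 0xF0, "mfence is 0F AE F0");
```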

llvm-svn: 57692

27c37022

Oct 16, 2008
- Fix the predicate for memop64 to be a regular load, not just · 6bae5268
  Dan Gohman authored Oct 16, 2008
```
an unindexed load.

llvm-svn: 57612
```
  6bae5268
Oct 15, 2008

Now that predicates can be composed, simplify several of · 29ad4397

Dan Gohman authored Oct 15, 2008

the predicates by extending simple predicates to create
more complex predicates instead of duplicating the logic
for the simple predicates.

This doesn't reduce much redundancy in DAGISelEmitter.cpp's
generated source yet; that will require improvements to
DAGISelEmitter.cpp's instruction sorting, to make it more
effectively group nodes with similar predicates together.

llvm-svn: 57565

29ad4397

Oct 11, 2008

Fix SSE4.1 roundss, roundsd. While the instructions have · 05b54c2a

Dale Johannesen authored Oct 10, 2008

the same pattern as roundpd/roundps, the Intel compiler 
builtins do not:  rounds* has an extra operand.  Fixes
gcc.target/i386/sse4_1-rounds[sd]-[1234].c
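
The difference in operand count can be seen in a scalar model. This sketch leaves out the rounding-mode immediate and uses `std::nearbyint`, which follows the current floating-point rounding mode; `V4`, `roundPacked`, and `roundScalar` are illustrative names, not the builtins themselves.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

using V4 = std::array<float, 4>;

// Packed form (roundps-style): one vector input, every lane is rounded.
inline V4 roundPacked(const V4 &a) {
  V4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = std::nearbyint(a[i]);
  return r;
}

// Scalar form (roundss-style): two vector inputs. Lane 0 is the rounded
// low element of b; lanes 1..3 are copied from a. This second vector
// input is the extra operand.
inline V4 roundScalar(const V4 &a, const V4 &b) {
  V4 r = a;
  r[0] = std::nearbyint(b[0]);
  return r;
}
```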

llvm-svn: 57370

05b54c2a