Commits · cf0843ed93d7b83c97e3e5b20ee06c638e138bb6 · Roger Ferrer / llvm-epi-0.8

Mar 19, 2010

Fixed the encoding problems of the crc32 instructions. All had the Operand size · cf0843ed

Kevin Enderby authored Mar 19, 2010

override prefix and only the r/m16 forms should have had that.  Also for variant
one, the AT&T syntax, added suffixes to all forms.  Also added the missing
64-bit form for 'CRC32 r64, r/m8'.  Plus added test cases for all forms and
tweaked one test case to add the needed suffixes.

llvm-svn: 98980

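As a rough illustration of the prefix rule this commit describes, the sketch below hand-assembles the register-to-register forms of crc32. The helper name `encode_crc32_rr` and its parameters are hypothetical and are not part of LLVM. It assumes the usual encoding F2 [REX.W] 0F 38 F0/F1 /r, with the 0x66 operand-size override emitted only for the r/m16 form, and it handles only registers 0-7 so that no REX.R or REX.B bits are needed.

```python
# Hypothetical helper for illustration only; not an LLVM API.
# (destination width, source width) pairs that crc32 accepts.
VALID_FORMS = {(32, 8), (32, 16), (32, 32), (64, 8), (64, 64)}


def encode_crc32_rr(dst_bits, src_bits, dst_reg, src_reg):
    """Encode 'crc32 src, dst' for two registers numbered 0-7."""
    if (dst_bits, src_bits) not in VALID_FORMS:
        raise ValueError("no such crc32 form")
    if not (0 <= dst_reg <= 7 and 0 <= src_reg <= 7):
        raise ValueError("only registers 0-7 are handled in this sketch")

    out = []
    if src_bits == 16:
        out.append(0x66)  # operand-size override: r/m16 form only
    out.append(0xF2)      # mandatory prefix for crc32
    if dst_bits == 64:
        out.append(0x48)  # REX.W selects the 64-bit destination forms
    out += [0x0F, 0x38]
    out.append(0xF0 if src_bits == 8 else 0xF1)  # F0 = r/m8, F1 = wider
    out.append(0xC0 | (dst_reg << 3) | src_reg)  # ModRM, mod=11 (reg-reg)
    return bytes(out)
```

For example, `encode_crc32_rr(32, 16, 0, 1).hex()` gives `'66f20f38f1c1'`, while the r/m32 form of the same operands has no 0x66 byte.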

Now that tblgen can handle matching implicit defs of instructions · 83facb08

Chris Lattner authored Mar 19, 2010

to input patterns, we can fix X86ISD::CMP and X86ISD::BT as taking
two inputs (which have to be the same type) and *returning an i32*.
This is how the SDNodes get made in the graph, but we weren't able
to model it this way due to deficiencies in the pattern language.

Now we can change things like this:

 def UCOM_FpIr80: FpI_<(outs), (ins RFP80:$lhs, RFP80:$rhs), CompareFP,
-                  [(X86cmp RFP80:$lhs, RFP80:$rhs),
-                   (implicit EFLAGS)]>; // CC = ST(0) cmp ST(i)
+                  [(set EFLAGS, (X86cmp RFP80:$lhs, RFP80:$rhs))]>;

and fix terrible crimes like this:

-def : Pat<(parallel (X86cmp GR8:$src1, 0), (implicit EFLAGS)),
+def : Pat<(X86cmp GR8:$src1, 0),
           (TEST8rr GR8:$src1, GR8:$src1)>;

This relies on matching the result of TEST8rr (which is EFLAGS, which is
an implicit def) to the result of X86cmp, an i32.

llvm-svn: 98903


Mar 15, 2010
- fix a few more ambiguous types. · 26e62737
  Chris Lattner authored Mar 15, 2010
```
llvm-svn: 98531
```
Mar 08, 2010
- fix some more ambiguous patterns, remove another nontemporalstore · d8045649
  Chris Lattner authored Mar 08, 2010
```
pattern which is broken (source and address swapped).

llvm-svn: 97958
```
- remove a non-temporal store pattern which is not tested and · ca8d590c
  Chris Lattner authored Mar 08, 2010
```
could never have matched because the operand list was backwards.

llvm-svn: 97933
```
Feb 28, 2010

Implement XMM subregs. · bdd6405f

Dan Gohman authored Feb 28, 2010

Extracting the low element of a vector is now done with EXTRACT_SUBREG,
and the zero-extension performed by load movss is now modeled with
SUBREG_TO_REG, and so on.

Register-to-register movss and movsd are no longer considered copies;
they are two-address instructions which insert a scalar into a vector.

llvm-svn: 97354


The mayHaveSideEffects flag is no longer used. · 8c5d683a
Dan Gohman authored Feb 27, 2010
```
llvm-svn: 97348
```

Feb 26, 2010
- Delete a bunch of redundant predicates. · 9300486d
  Dan Gohman authored Feb 26, 2010
```
llvm-svn: 97201
```
Feb 23, 2010
- remove a bunch of dead named arguments in input patterns, · d1708923
  Chris Lattner authored Feb 23, 2010
```
though some look dubious afaict, these are all ok.

llvm-svn: 96899
```
Feb 18, 2010
- add a missing type cast. · fd47c797
  Chris Lattner authored Feb 18, 2010
```
llvm-svn: 96574
```
Feb 16, 2010

· 9641d068

David Greene authored Feb 16, 2010

Add support for emitting non-temporal stores for DAGs marked
non-temporal.  Fix from r96241 for botched encoding of MOVNTDQ.

Add documentation for !nontemporal metadata.

Add a simpler movnt testcase.

llvm-svn: 96386


Feb 15, 2010
- revert r96241. It breaks two regression tests, isn't documented, · bcbaaba5
  Chris Lattner authored Feb 15, 2010
```
and the testcase needs improvement.

llvm-svn: 96265
```
- · 63cedef7
  David Greene authored Feb 15, 2010
```
Add support for emitting non-temporal stores for DAGs marked
non-temporal.

llvm-svn: 96241
```
Feb 13, 2010
- Remove special cases for [LM]FENCE, MONITOR and MWAIT from · 064e9263
  Chris Lattner authored Feb 12, 2010
```
encoder and decoder by using new MRM_ forms.

llvm-svn: 96048
```
Feb 12, 2010

Add a missing pattern for movhps so that we get: · c780af64

Nate Begeman authored Feb 12, 2010

movq	(%ecx,%edx,2), %xmm2
movhps	(%ecx,%eax,2), %xmm2

rather than:

movq	(%eax,%edx,2), %xmm2
movq	(%eax,%ebx,2), %xmm3
movlhps	%xmm3, %xmm2

Testcase forthcoming.

llvm-svn: 95948

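To see why the two sequences in the movhps commit above are interchangeable, the sketch below models an XMM register as a `(low64, high64)` tuple and emulates the three instructions involved. The function names are hypothetical stand-ins for the instruction semantics (movq from memory zeroes the high half, movhps from memory writes only the high half, movlhps copies the source's low half into the destination's high half); nothing here is LLVM code.

```python
# Hypothetical lane-level model: a register is a (low64, high64) tuple.

def movq_load(mem64):
    """movq mem, xmm: load 64 bits into the low half, zero the high half."""
    return (mem64, 0)


def movhps_load(reg, mem64):
    """movhps mem, xmm: load 64 bits into the high half, keep the low half."""
    return (reg[0], mem64)


def movlhps(dst, src):
    """movlhps src, dst: copy the low half of src into the high half of dst."""
    return (dst[0], src[0])


def short_sequence(a, b):
    # movq + movhps: two instructions, one register.
    return movhps_load(movq_load(a), b)


def long_sequence(a, b):
    # movq + movq + movlhps: three instructions, two registers.
    return movlhps(movq_load(a), movq_load(b))
```

Both `short_sequence(a, b)` and `long_sequence(a, b)` produce `(a, b)`; the shorter form saves an instruction and a scratch register.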

Feb 10, 2010
- Fix the encoding of the movntdqa X86 instruction. It was missing the 0x66 · a7c1d6cf
  Kevin Enderby authored Feb 10, 2010
```
prefix which is part of the opcode encoding.

llvm-svn: 95729
```
Feb 05, 2010
- really kill off the last MRMInitReg inst, remove logic from encoder. · 86bd1942
  Chris Lattner authored Feb 05, 2010
```
llvm-svn: 95437
```
- lower the last of the MRMInitReg instructions in MCInstLower. · e96d534c
  Chris Lattner authored Feb 05, 2010
```
llvm-svn: 95435
```
Jan 11, 2010

· 206351a1

David Greene authored Jan 11, 2010

Implement a feature (-vector-unaligned-mem) to allow targets to
ignore alignment requirements for SIMD memory operands.  This
is useful on architectures like the AMD 10h that do not trap on
unaligned references if a status bit is twiddled at startup time.

llvm-svn: 93151


Dec 22, 2009

Remove target attribute break-sse-dep. Instead, do not fold load into sse partial update instructions unless optimizing for size. · 71d7eaa8

Evan Cheng authored Dec 22, 2009

llvm-svn: 91910


Dec 18, 2009

On recent Intel microarchitectures, folding loads into some unary SSE instructions can · 4cf30b72

Evan Cheng authored Dec 18, 2009

be non-optimal. To be precise, we should avoid folding loads if the instructions
only update part of the destination register, and the non-updated part is not
needed. e.g. cvtss2sd, sqrtss. Unfolding the load from these instructions breaks
the partial register dependency and it can improve performance. e.g.

movss (%rdi), %xmm0
cvtss2sd %xmm0, %xmm0

instead of
cvtss2sd (%rdi), %xmm0

An alternative method to break dependency is to clear the register first. e.g.
xorps %xmm0, %xmm0
cvtss2sd (%rdi), %xmm0

llvm-svn: 91672


Instruction fixes, added instructions, and AsmString changes in the · 04d8cb74

Sean Callanan authored Dec 18, 2009

X86 instruction tables.

Also (while I was at it) cleaned up the X86 tables, removing tabs and
80-column violations.

This patch was reviewed by Chris Lattner, but please let me know if
there are any problems.

* X86*.td
	Removed tabs and fixed 80-column violations

* X86Instr64bit.td
	(IRET, POPCNT, BT_, LSL, SWPGS, PUSH_S, POP_S, L_S, SMSW)
		Added
	(CALL, CMOV) Added qualifiers
	(JMP) Added PC-relative jump instruction
	(POPFQ/PUSHFQ) Added qualifiers; renamed PUSHFQ to indicate
		that it is 64-bit only (ambiguous since it has no
		REX prefix)
	(MOV) Added rr form going the other way, which is encoded
		differently
	(MOV) Changed immediates to offsets, which is more correct;
		also fixed MOV64o64a to have a 64-bit offset
	(MOV) Fixed qualifiers
	(MOV) Added debug-register and control-register moves
	(MOVZX) Added more forms
	(ADC, SUB, SBB, AND, OR, XOR) Added reverse forms, which
		(as with MOV) are encoded differently
	(ROL) Made REX.W required
	(BT) Uncommented mr form for disassembly only
	(CVT__2__) Added several missing non-intrinsic forms
	(LXADD, XCHG) Reordered operands to make more sense for
		MRMSrcMem
	(XCHG) Added register-to-register forms
	(XADD, CMPXCHG, XCHG) Added non-locked forms
* X86InstrSSE.td
	(CVTSS2SI, COMISS, CVTTPS2DQ, CVTPS2PD, CVTPD2PS, MOVQ)
		Added
* X86InstrFPStack.td
	(COM_FST0, COMP_FST0, COM_FI, COM_FIP, FFREE, FNCLEX, FNOP,
	 FXAM, FLDL2T, FLDL2E, FLDPI, FLDLG2, FLDLN2, F2XM1, FYL2X,
	 FPTAN, FPATAN, FXTRACT, FPREM1, FDECSTP, FINCSTP, FPREM,
	 FYL2XP1, FSINCOS, FRNDINT, FSCALE, FCOMPP, FXSAVE,
	 FXRSTOR)
		Added
	(FCOM, FCOMP) Added qualifiers
	(FSTENV, FSAVE, FSTSW) Fixed opcode names
	(FNSTSW) Added implicit register operand
* X86InstrInfo.td
	(opaque512mem) Added for FXSAVE/FXRSTOR
	(offset8, offset16, offset32, offset64) Added for MOV
	(NOOPW, IRET, POPCNT, IN, BTC, BTR, BTS, LSL, INVLPG, STR,
	 LTR, PUSHFS, PUSHGS, POPFS, POPGS, LDS, LSS, LES, LFS,
	 LGS, VERR, VERW, SGDT, SIDT, SLDT, LGDT, LIDT, LLDT,
	 LODSD, OUTSB, OUTSW, OUTSD, HLT, RSM, FNINIT, CLC, STC,
	 CLI, STI, CLD, STD, CMC, CLTS, XLAT, WRMSR, RDMSR, RDPMC,
	 SMSW, LMSW, CPUID, INVD, WBINVD, INVEPT, INVVPID, VMCALL,
	 VMCLEAR, VMLAUNCH, VMRESUME, VMPTRLD, VMPTRST, VMREAD,
	 VMWRITE, VMXOFF, VMXON) Added
	(NOOPL, POPF, POPFD, PUSHF, PUSHFD) Added qualifier
	(JO, JNO, JB, JAE, JE, JNE, JBE, JA, JS, JNS, JP, JNP, JL,
	 JGE, JLE, JG, JCXZ) Added 32-bit forms
	(MOV) Changed some immediate forms to offset forms
	(MOV) Added reversed reg-reg forms, which are encoded
		differently
	(MOV) Added debug-register and control-register moves
	(CMOV) Added qualifiers
	(AND, OR, XOR, ADC, SUB, SBB) Added reverse forms, like MOV
	(BT) Uncommented memory-register forms for disassembler
	(MOVSX, MOVZX) Added forms
	(XCHG, LXADD) Made operand order make sense for MRMSrcMem
	(XCHG) Added register-register forms
	(XADD, CMPXCHG) Added unlocked forms
* X86InstrMMX.td
	(MMX_MOVD, MMX_MOVQ) Added forms
* X86InstrInfo.cpp: Changed PUSHFQ to PUSHFQ64 to reflect table
	change

* X86RegisterInfo.td: Added debug and control register sets
* x86-64-pic-3.ll: Fixed testcase to reflect call qualifier
* peep-test-3.ll: Fixed testcase to reflect test qualifier
* cmov.ll: Fixed testcase to reflect cmov qualifier
* loop-blocks.ll: Fixed testcase to reflect call qualifier
* x86-64-pic-11.ll: Fixed testcase to reflect call qualifier
* 2009-11-04-SubregCoalescingBug.ll: Fixed testcase to reflect call
  qualifier
* x86-64-pic-2.ll: Fixed testcase to reflect call qualifier
* live-out-reg-info.ll: Fixed testcase to reflect test qualifier
* tail-opts.ll: Fixed testcase to reflect call qualifiers
* x86-64-pic-10.ll: Fixed testcase to reflect call qualifier
* bss-pagealigned.ll: Fixed testcase to reflect call qualifier
* x86-64-pic-1.ll: Fixed testcase to reflect call qualifier
* widen_load-1.ll: Fixed testcase to reflect call qualifier

llvm-svn: 91638


Dec 09, 2009

Optimize splat of a scalar load into a shuffle of a vector load when it's legal. e.g. · 493b882f

Evan Cheng authored Dec 09, 2009

vector_shuffle (scalar_to_vector (i32 load (ptr + 4))), undef, <0, 0, 0, 0>
=>
vector_shuffle (v4i32 load ptr), undef, <1, 1, 1, 1>

iff ptr is 16-byte aligned (or can be made into 16-byte aligned).

llvm-svn: 90984

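The index arithmetic behind the splat rewrite above can be sketched as follows. This is a simplified model, not the DAG combine itself: `splat_lane_for_offset` and its parameters are hypothetical names, and it only covers a scalar that lies inside the single vector starting at an aligned `ptr`.

```python
# Hypothetical model of the lane computation; not LLVM code.

def splat_lane_for_offset(byte_offset, elem_bytes=4, vec_bytes=16, base_align=16):
    """Return the lane to splat after widening the scalar load, or None.

    The scalar at ptr + byte_offset becomes lane byte_offset / elem_bytes
    of a full vector load from ptr, provided ptr is vector-aligned and the
    scalar lies entirely inside that vector.
    """
    if base_align % vec_bytes != 0:
        return None  # ptr is not known to be aligned to the vector size
    if byte_offset < 0 or byte_offset % elem_bytes != 0:
        return None  # scalar does not start on a lane boundary
    if byte_offset + elem_bytes > vec_bytes:
        return None  # scalar falls outside the vector loaded from ptr
    return byte_offset // elem_bytes


def splat_mask(lane, num_elems=4):
    """Shuffle mask that broadcasts one lane, e.g. <1, 1, 1, 1>."""
    return [lane] * num_elems
```

With the commit's example, `splat_lane_for_offset(4)` is `1`, so the mask changes from `<0, 0, 0, 0>` on the scalar to `<1, 1, 1, 1>` on the vector load.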

Nov 20, 2009

Recommitting PALIGNR shift width fixes. · c1f532e9

Sean Callanan authored Nov 20, 2009

Thanks to Daniel Dunbar for fixing clang intrinsics:
  http://llvm.org/viewvc/llvm-project?view=rev&revision=89499

llvm-svn: 89500


Reverting PALIGNR fix until I figure out how this · 19d92728
Sean Callanan authored Nov 20, 2009
```
broke the Clang testsuite.

llvm-svn: 89495
```

Fixed PALIGNR to take 8-bit rotations in all cases. · fbed1301

Sean Callanan authored Nov 20, 2009

Also fixed the corresponding testcase, and the PALIGNR
  intrinsic (tested for correctness with llvm-gcc).

llvm-svn: 89491


Nov 17, 2009
- Re-apply 89011. It's not to be blamed. · 5392cc9d
  Evan Cheng authored Nov 17, 2009
```
llvm-svn: 89081
```
- Revert 89011. Buildbot thinks it might be breaking stuff. · 05938e81
  Evan Cheng authored Nov 17, 2009
```
llvm-svn: 89076
```
- A few more instructions that should be marked re-materializable. · ce28f6f4
  Evan Cheng authored Nov 17, 2009
```
llvm-svn: 89011
```
Nov 16, 2009

Check memoperand alignment instead of checking stack alignment. · f25ef4ff

Evan Cheng authored Nov 16, 2009

- Check memoperand alignment instead of checking stack alignment. Most load / store folding instructions do not reference spill stack slots.
- Mark MOVUPSrm re-materializable.

llvm-svn: 88974


Nov 08, 2009

x86 vector shuffle cleanup/fixes: · 3a313df6

Nate Begeman authored Nov 07, 2009

1. rename the movhp patfrag to movlhps, since that's what it actually matches
2. eliminate the bogus movhps load and store patterns, they were incorrect. The load transforms are already handled (correctly) by shufps/unpack.
3. revert a recent test change to its correct form.

llvm-svn: 86415


Nov 07, 2009

Fix a couple of shuffle patterns to use movhlps instead · bd05185e

Eric Christopher authored Nov 07, 2009

of movhps as the constraint.  This changes the resulting optimizations,
so the testcases are updated as appropriate as well.

llvm-svn: 86360


Oct 29, 2009

Rename usesCustomDAGSchedInserter to usesCustomInserter, and update a · 453d64c9

Dan Gohman authored Oct 29, 2009

bunch of associated comments, because it doesn't have anything to do
with DAGs or scheduling. This is another step in decoupling MachineInstr
emitting from scheduling.

llvm-svn: 85517


Oct 28, 2009
- The X86 palignr intrinsics' immediate field is in bits; ISel must transform it into bytes. · f64db3e1
  Evan Cheng authored Oct 28, 2009
```
llvm-svn: 85379
```
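A minimal sketch of the bits-to-bytes conversion mentioned in the palignr commit above, together with a byte-level model of what the instruction computes. Both function names are hypothetical, and the model assumes the 128-bit form: the destination and source are concatenated (destination in the high half), shifted right by the byte count, and the low 16 bytes are kept.

```python
# Hypothetical helpers for illustration; not LLVM code.

def palignr_bits_to_bytes(imm_bits):
    """Convert an intrinsic-level shift given in bits to the byte count
    that the instruction's immediate expects."""
    if imm_bits < 0 or imm_bits % 8 != 0:
        raise ValueError("shift must be a non-negative multiple of 8 bits")
    return imm_bits // 8


def palignr(dst, src, imm_bytes):
    """Model 128-bit palignr on byte lists (index 0 = least significant).

    dst:src is treated as one 32-byte value with src in the low half; it
    is shifted right by imm_bytes and the low 16 bytes are returned.
    """
    concat = list(src) + list(dst)
    shifted = concat[imm_bytes:] + [0] * imm_bytes  # zero-fill from the top
    return shifted[:16]
```

For instance, a 32-bit shift becomes an immediate of 4, and `palignr(list(range(16, 32)), list(range(16)), 4)` returns the bytes 4 through 19.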
Oct 19, 2009
- Add support for matching shuffle patterns with palignr. · 18df82a2
  Nate Begeman authored Oct 19, 2009
```
llvm-svn: 84459
```
Sep 21, 2009
- Add support for rematerializing FsFLD0SS and FsFLD0SD as constant-pool · 69499b13
  Dan Gohman authored Sep 21, 2009
```
loads in order to reduce register pressure.

llvm-svn: 82470
```
Sep 16, 2009

Added a variety of floating-point and SSE instructions. · e739ac89

Sean Callanan authored Sep 16, 2009

None of these have patterns (they're for the
disassembler).

Many of the floating-point instructions will probably
be rolled into definitions that have patterns, and may
eventually be superseded by mdefs.  So I put them
together and left a comment.

llvm-svn: 81979


Aug 20, 2009
- Fixed PCMPESTRM128 to have opcode 0x60 instead of 0x62, as specified by the · 46bb77f2
  Sean Callanan authored Aug 20, 2009
```
Intel documentation.

llvm-svn: 79554
```
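For context on the opcode fix above, the four SSE4.2 string instructions share the 66 0F 3A escape and differ only in the final opcode byte, which makes 0x60 and 0x62 easy to confuse. The sketch below tabulates those bytes as documented by Intel and encodes the register-to-register form; the table and function names are hypothetical and only XMM0-XMM7 are handled.

```python
# Opcode byte that follows the 66 0F 3A escape for each instruction.
SSE42_STRING_OPCODES = {
    "pcmpestrm": 0x60,
    "pcmpestri": 0x61,
    "pcmpistrm": 0x62,
    "pcmpistri": 0x63,
}


def encode_sse42_string_rr(mnemonic, xmm_reg, xmm_rm, imm8):
    """Encode '<mnemonic> xmm_reg, xmm_rm, imm8' for XMM0-XMM7.

    Hypothetical helper for illustration only; not an LLVM API.
    """
    if not (0 <= xmm_reg <= 7 and 0 <= xmm_rm <= 7):
        raise ValueError("only XMM0-XMM7 are handled in this sketch")
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("imm8 must fit in one byte")
    modrm = 0xC0 | (xmm_reg << 3) | xmm_rm  # mod=11 (reg-reg)
    opcode = SSE42_STRING_OPCODES[mnemonic]
    return bytes([0x66, 0x0F, 0x3A, opcode, modrm, imm8])
```

With the corrected opcode, `encode_sse42_string_rr("pcmpestrm", 1, 2, 0x0C).hex()` is `'660f3a60ca0c'`; using 0x62 there would instead describe pcmpistrm.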
Aug 19, 2009

Implement sse4.2 string/text processing instructions: · 9fe912de

Eric Christopher authored Aug 18, 2009

Add patterns and instruction encoding information.
Add custom lowering to deal with hardwired return register of
uncertain type (xmm0).

llvm-svn: 79377


Aug 12, 2009

Add 'isCodeGenOnly' bit to Instruction .td records. · c4f8ea4c

Daniel Dunbar authored Aug 11, 2009

 - Used to mark fake instructions which don't correspond to an actual machine
   instruction (or are duplicates of a real instruction). This is to be used for
   "special cases" in the .td files, which should be ignored by things like the
   assembler and disassembler. We still need a good solution to handle pervasive
   duplication, like with the Int_ instructions.

 - Set the bit on fake "mov 0" style instructions, which allows turning an
   assembler matcher warning into a hard error.

 - -2 FIXMEs.

llvm-svn: 78731
