  1. Jul 21, 2013
  2. Jun 01, 2013
    • X86: change MOV64ri64i32 into MOV32ri64 · 3a1fd4c0
      Tim Northover authored
      The MOV64ri64i32 instruction required hacky MCInst lowering because it
      was allocated as setting a GR64, but the eventual instruction ("movl")
      only set a GR32. This converts it into a so-called "MOV32ri64" which
      still accepts an (appropriate) 64-bit immediate but defines a GR32.
      This is then converted to the full GR64 by a SUBREG_TO_REG operation,
      thus keeping everyone happy.
      
      This fixes a typo in the opcode field of the original patch, which
      should make the legacy JIT work again (and adds a test for that problem).
      
      llvm-svn: 183068
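
      As a hypothetical illustration (not part of the commit message): a 64-bit
      constant whose value fits in an unsigned 32-bit immediate can be
      materialized with a plain "movl", relying on the implicit zero-extension
      of 32-bit writes:

          #include <stdint.h>

          uint64_t big_constant(void) {
            /* 0x80000000 does not fit a sign-extended 32-bit immediate
               (a "movq $imm32" would produce 0xFFFFFFFF80000000), but it
               does fit an unsigned 32-bit immediate, so codegen can emit
               "movl $2147483648, %eax"; the 32-bit write implicitly zeroes
               bits 63:32 of %rax. */
            return 0x80000000ULL;
          }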
    • Temporarily Revert "X86: change MOV64ri64i32 into MOV32ri64" as it · e1e57e5e
      Eric Christopher authored
      seems to have caused PR16192 and other JIT-related failures.
      
      llvm-svn: 183059
  3. May 31, 2013
    • X86: change MOV64ri64i32 into MOV32ri64 · d4736d67
      Tim Northover authored
      The MOV64ri64i32 instruction required hacky MCInst lowering because it was
      allocated as setting a GR64, but the eventual instruction ("movl") only set a
      GR32. This converts it into a so-called "MOV32ri64" which still accepts an
      (appropriate) 64-bit immediate but defines a GR32. This is then converted to
      the full GR64 by a SUBREG_TO_REG operation, thus keeping everyone happy.
      
      llvm-svn: 182991
  4. May 30, 2013
    • X86: use sub-register sequences for MOV*r0 operations · 64ec0ff4
      Tim Northover authored
      Instead of having a bunch of separate MOV8r0, MOV16r0, ... pseudo-instructions,
      it's better to use a single MOV32r0 (which will expand to "xorl %reg, %reg")
      and obtain other sizes with EXTRACT_SUBREG and SUBREG_TO_REG. The encoding is
      smaller and partial register updates can sometimes be avoided.
      
      Until recently, though, this sequence was a barrier to rematerialization.
      That should now be fixed, so it's an appropriate time to make the change.
      
      llvm-svn: 182928
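
      As a hypothetical source-level sketch (assuming an x86-64 target): zeroing
      a register of any width can go through the single 32-bit idiom:

          #include <stdint.h>

          uint16_t zero16(void) {
            /* Typically emitted as "xorl %eax, %eax"; the 16-bit result is
               just the %ax sub-register of %eax. The xor encoding is short
               and breaks dependencies on the old register value. */
            return 0;
          }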
    • X86: change zext moves to use sub-register infrastructure. · 04eb4234
      Tim Northover authored
      32-bit writes on amd64 zero out the high bits of the corresponding 64-bit
      register. LLVM makes use of this for zero-extension, but until now relied on
      custom MCLowering and other code to fix up instructions. Now that we have
      proper handling of sub-registers, this can be done by creating SUBREG_TO_REG
      instructions at selection-time.
      
      Should be no change in functionality.
      
      llvm-svn: 182921
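
      A minimal, hypothetical example of the implicit zero-extension involved:

          #include <stdint.h>

          uint64_t widen(uint32_t x) {
            /* Typically a single "movl %edi, %eax": the 32-bit write zeroes
               bits 63:32 of %rax, so no separate zero-extension instruction
               is needed; the GR32 def is wrapped in a SUBREG_TO_REG to form
               the GR64 value. */
            return x;
          }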
  5. Mar 26, 2013
  6. Mar 19, 2013
    • Annotate X86InstrCompiler.td with SchedRW lists. · 9bd6b8bd
      Jakob Stoklund Olesen authored
      Add a new WriteZero SchedWrite type for the common dependency-breaking
      instructions that clear a register.
      
      llvm-svn: 177442
    • Remove an invalid and unnecessary Pat pattern from the X86 backend: · 80d9ad39
      Ulrich Weigand authored
        def : Pat<(load (i64 (X86Wrapper tglobaltlsaddr :$dst))),
                  (MOV64rm tglobaltlsaddr :$dst)>;
      
      This pattern is invalid because the MOV64rm instruction expects a
      source operand of type "i64mem", which is a subclass of X86MemOperand
      and thus actually consists of five MI operands, but the Pat provides
      only a single MI operand ("tglobaltlsaddr" matches an SDnode of
      type ISD::TargetGlobalTLSAddress and provides a single output).
      
      Thus, if the pattern were ever matched, subsequent uses of the MOV64rm
      instruction pattern would access uninitialized memory.  In addition,
      with the TableGen patch I'm about to check in, this would actually be
      reported as a build-time error.
      
      Fortunately, the pattern does in fact never match, for at least two
      independent reasons.
      
      First, the code generator actually never generates a pattern of the
      form (load (X86Wrapper (tglobaltlsaddr))).  For most combinations of
      TLS and code models, (tglobaltlsaddr) represents just an offset that
      needs to be added to some base register, so it is never directly
      dereferenced.  The only exception is the initial-exec model, where
      (tglobaltlsaddr) refers to the (pc-relative) address of a GOT slot,
      which *is* in fact directly dereferenced: but in that case, the
      X86WrapperRIP node is used, not X86Wrapper, so the Pat doesn't match.
      
      Second, even if some patterns along those lines *were* ever generated,
      we should not need an extra Pat pattern to match it.  Instead, the
      original MOV64rm instruction pattern ought to match directly, since
      it uses an "addr" operand, which is implemented via the SelectAddr
      C++ routine; this routine is supposed to accept the full range of
      input DAGs that may be implemented by a single mov instruction,
      including those cases involving ISD::TargetGlobalTLSAddress (and
      actually does so e.g. in the initial-exec case as above).
      
      To avoid build breaks (due to the above-mentioned error) after the
      TableGen patch is checked in, I'm removing this Pat here.
      
      llvm-svn: 177426
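
      For context, a hedged sketch (hypothetical code, not the commit's
      testcase) of the initial-exec case described above, where the pc-relative
      GOT slot holding the TLS offset *is* directly dereferenced and is selected
      via X86WrapperRIP:

          /* Hypothetical example: initial-exec TLS access on x86-64. */
          static __thread int counter __attribute__((tls_model("initial-exec")));

          int next(void) {
            /* Typical codegen:
                 movq counter@GOTTPOFF(%rip), %rax   # load TLS offset from GOT
                 movl %fs:(%rax), %eax               # thread pointer + offset */
            return counter++;
          }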
  7. Feb 23, 2013
  8. Jan 22, 2013
    • Fix an issue of pseudo atomic instruction DAG schedule · 3dffc5e2
      Michael Liao authored
      - Add the list of physical registers clobbered by pseudo atomic insts
        Physical registers are clobbered when pseudo atomic instructions are
        expanded. Add them to the clobber list to prevent the DAG scheduler
        from mis-scheduling them after these insns are declared side-effect free.
      - Add a test case from Michael Kuperstein <michael.m.kuperstein@intel.com>
      
      llvm-svn: 173200
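
      A hypothetical C example of an operation that goes through such a pseudo
      instruction (expanded into a compare-exchange loop whose CMPXCHG
      implicitly uses and defines EAX/RAX and clobbers EFLAGS):

          /* Hypothetical example: atomic nand has no single x86 instruction,
             so it is lowered to a cmpxchg spin-loop when the pseudo is
             expanded. */
          long fetch_nand(long *p, long v) {
            return __sync_fetch_and_nand(p, v);
          }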
  9. Jan 07, 2013
  10. Dec 27, 2012
  11. Oct 16, 2012
    • Add __builtin_setjmp/_longjmp support in X86 backend · 97bf363a
      Michael Liao authored
      - Besides being used in SjLj exception handling, __builtin_setjmp/__longjmp
        is also used as a light-weight replacement for setjmp/longjmp to implement
        continuations, user-level threading, etc. The support added in this patch
        ONLY addresses this usage and is NOT intended to support SjLj exception
        handling, as zero-cost DWARF exception handling is used by default on
        X86.
      
      llvm-svn: 165989
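
      A minimal sketch of the non-EH usage this patch targets (hypothetical
      example; the builtins require the 5-word buffer shown):

          /* Hypothetical example: __builtin_setjmp/__builtin_longjmp used as a
             light-weight control transfer rather than for exception handling. */
          static void *ctx[5];

          void bail(void) {
            __builtin_longjmp(ctx, 1);   /* the second argument must be 1 */
          }

          int run(void) {
            if (__builtin_setjmp(ctx))
              return 1;                  /* resumed here via bail() */
            bail();
            return 0;                    /* never reached */
          }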
  12. Oct 07, 2012
  13. Oct 05, 2012
  14. Sep 26, 2012
  15. Sep 22, 2012
  16. Sep 21, 2012
  17. Sep 20, 2012
    • Re-work X86 code generation of atomic ops with spin-loop · 3237662b
      Michael Liao authored
      - Rewrite/merge pseudo-atomic instruction emitters to address the
        following issues:
        * Reduce one unnecessary load in spin-loop
      
          previously the spin-loop looked like
      
              thisMBB:
              newMBB:
                ld  t1 = [bitinstr.addr]
                op  t2 = t1, [bitinstr.val]
                not t3 = t2  (if Invert)
                mov EAX = t1
                lcs dest = [bitinstr.addr], t3  [EAX is implicit]
                bz  newMBB
                fallthrough -->nextMBB
      
          the 'ld' at the beginning of newMBB should be lifted out of the loop
          as lcs (or CMPXCHG on x86) will load the current memory value into
          EAX. This loop is refined as:
      
              thisMBB:
                EAX = LOAD [MI.addr]
              mainMBB:
                t1 = OP [MI.val], EAX
                LCMPXCHG [MI.addr], t1, [EAX is implicitly used & defined]
                JNE mainMBB
              sinkMBB:
      
        * Remove immopc since, so far, all pseudo-atomic instructions have
          all-register forms only; there is no immediate operand.
      
        * Remove unnecessary attributes/modifiers in pseudo-atomic instruction
          td
      
        * Fix issues in PR13458
      
      - Add comprehensive tests on atomic ops on various data types.
        NOTE: Some of them are turned off due to missing functionality.
      
      - Revise tests due to the new spin-loop generated.
      
      llvm-svn: 164281
  18. Sep 13, 2012
  19. Jun 01, 2012
    • Implement the local-dynamic TLS model for x86 (PR3985) · 789acfb6
      Hans Wennborg authored
      This implements codegen support for accesses to thread-local variables
      using the local-dynamic model, and adds a clean-up pass so that the base
      address for the TLS block can be re-used between local-dynamic accesses on
      an execution path.
      
      llvm-svn: 157818
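
      A hypothetical example of the access pattern the clean-up pass improves:
      two local-dynamic accesses on one execution path can share a single
      __tls_get_addr call that computes the TLS block base:

          /* Hypothetical example: compiled with -fPIC, both variables live in
             this module's TLS block, so one base address can serve both. */
          static __thread int x __attribute__((tls_model("local-dynamic")));
          static __thread int y __attribute__((tls_model("local-dynamic")));

          int sum(void) {
            return x + y;
          }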
  20. May 09, 2012
  21. May 07, 2012
    • X86: optimization for -(x != 0) · ef4e0479
      Manman Ren authored
      This patch will optimize -(x != 0) on X86
      FROM 
      cmpl	$0x01,%edi
      sbbl	%eax,%eax
      notl	%eax
      TO
      negl %edi
      sbbl %eax, %eax
      
      In order to generate negl, I added patterns in Target/X86/X86InstrCompiler.td:
      def : Pat<(X86sub_flag 0, GR32:$src), (NEG32r GR32:$src)>;
      
      rdar: 10961709
      llvm-svn: 156312
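
      A minimal reproduction of the source pattern (hypothetical example):

          int mask_if_nonzero(int x) {
            /* "negl %edi" sets the carry flag iff x != 0, so the following
               "sbbl %eax, %eax" materializes -1 when x != 0 and 0 otherwise,
               which is exactly -(x != 0). */
            return -(x != 0);
          }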
  22. Apr 04, 2012
    • Always compute all the bits in ComputeMaskedBits. · ba0a6cab
      Rafael Espindola authored
      This allows us to keep passing reduced masks to SimplifyDemandedBits, but
      still know all the bits if SimplifyDemandedBits fails. This lets instcombine
      simplify cases like the one in the included testcase.
      
      llvm-svn: 154011
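
      As a hedged illustration (hypothetical, not the commit's testcase) of why
      knowing all the bits helps even when SimplifyDemandedBits gives up:

          unsigned all_bits_known(unsigned x) {
            /* (x | 0xFF) has its low 8 bits known to be ones, so masking with
               0xFF yields a value whose every bit is known; with complete
               known-bits information, instcombine can fold this to 0xFF. */
            return (x | 0xFF) & 0xFF;
          }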
  23. Mar 29, 2012
  24. Mar 19, 2012
  25. Feb 24, 2012
  26. Feb 16, 2012
  27. Jan 16, 2012
  28. Jan 12, 2012
  29. Dec 24, 2011
    • Switch the lowering of CTLZ_ZERO_UNDEF from a .td pattern back to the · 7e9453e9
      Chandler Carruth authored
      X86ISelLowering C++ code. Because this is lowered via an xor wrapped
      around a bsr, we want the dagcombine which runs after isel lowering to
      have a chance to clean things up. In particular, it is very common to
      see code which looks like:
      
        (sizeof(x)*8 - 1) ^ __builtin_clz(x)
      
      This expression computes the index of the most significant bit of 'x'.
      That's actually the value computed directly by the 'bsr' instruction, but
      if we match it too late, we'll get completely redundant xor instructions.
      
      The more naive code for the above (subtracting rather than using an xor)
      still isn't handled correctly due to the dagcombine getting confused.
      
      Also, while here fix an issue spotted by inspection: we should have been
      expanding the zero-undef variants to the normal variants when there is
      an 'lzcnt' instruction. Do so, and test for this. We don't want to
      generate unnecessary 'bsr' instructions.
      
      These two changes fix some regressions in encoding and decoding
      benchmarks. However, there is still a *lot* to improve on in this
      type of code.
      
      llvm-svn: 147244
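
      The quoted pattern as a compilable sketch (hypothetical function name):

          unsigned msb_index(unsigned x) {
            /* For x != 0 this is exactly the value 'bsr' computes, so matching
               it before dagcombine lets codegen emit a lone bsr with no
               redundant xor instructions. */
            return (sizeof(x) * 8 - 1) ^ __builtin_clz(x);
          }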
  30. Dec 20, 2011
    • Begin teaching the X86 target how to efficiently codegen patterns that · 24680c24
      Chandler Carruth authored
      use the zero-undefined variants of CTTZ and CTLZ. These are just simple
      patterns for now; there is more to be done to make real-world code using
      these constructs optimize and codegen properly on X86.
      
      The existing tests are spiffed up to check that we no longer generate
      unnecessary cmov instructions, and that we generate the very important
      'xor' that transforms the bsr result (the index of the most significant
      one bit) into the number of leading (most significant) zero bits. They
      also now check that when the variant with a defined zero result is used,
      the cmov is still produced.
      
      llvm-svn: 146974
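
      A hypothetical pair showing the two variants the tests distinguish:

          unsigned clz_undef_at_zero(unsigned x) {
            /* Maps to CTLZ_ZERO_UNDEF: the caller guarantees x != 0, so no
               cmov is needed to guard the zero case. */
            return __builtin_clz(x);
          }

          unsigned clz_defined_at_zero(unsigned x) {
            /* Maps to plain CTLZ: the zero case is defined, so a cmov (or a
               branch) selecting 32 is still expected. */
            return x ? __builtin_clz(x) : 32;
          }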
  31. Oct 26, 2011