- Jul 21, 2013
-
-
Craig Topper authored
llvm-svn: 186787
-
- Jun 01, 2013
-
-
Tim Northover authored
The MOV64ri64i32 instruction required hacky MCInst lowering because it was allocated as setting a GR64, but the eventual instruction ("movl") only set a GR32. This converts it into a so-called "MOV32ri64" which still accepts a (appropriate) 64-bit immediate but defines a GR32. This is then converted to the full GR64 by a SUBREG_TO_REG operation, thus keeping everyone happy. This fixes a typo in the opcode field of the original patch, which should make the legact JIT work again (& adds test for that problem). llvm-svn: 183068
-
Eric Christopher authored
seems to have caused PR16192 and other JIT related failures. llvm-svn: 183059
-
- May 31, 2013
-
-
Tim Northover authored
The MOV64ri64i32 instruction required hacky MCInst lowering because it was allocated as setting a GR64, but the eventual instruction ("movl") only set a GR32. This converts it into a so-called "MOV32ri64" which still accepts a (appropriate) 64-bit immediate but defines a GR32. This is then converted to the full GR64 by a SUBREG_TO_REG operation, thus keeping everyone happy. llvm-svn: 182991
-
- May 30, 2013
-
-
Tim Northover authored
Instead of having a bunch of separate MOV8r0, MOV16r0, ... pseudo-instructions, it's better to use a single MOV32r0 (which will expand to "xorl %reg, %reg") and obtain other sizes with EXTRACT_SUBREG and SUBREG_TO_REG. The encoding is smaller and partial register updates can sometimes be avoided. Until recently, this sequence was a barrier to rematerialization though. That should now be fixed so it's an appropriate time to make the change. llvm-svn: 182928
-
Tim Northover authored
32-bit writes on amd64 zero out the high bits of the corresponding 64-bit register. LLVM makes use of this for zero-extension, but until now relied on custom MCLowering and other code to fixup instructions. Now we have proper handling of sub-registers, this can be done by creating SUBREG_TO_REG instructions at selection-time. Should be no change in functionality. llvm-svn: 182921
-
- Mar 26, 2013
-
-
Jakob Stoklund Olesen authored
llvm-svn: 177936
-
- Mar 19, 2013
-
-
Jakob Stoklund Olesen authored
Add a new WriteZero SchedWrite type for the common dependency-breaking instructions that clear a register. llvm-svn: 177442
-
Ulrich Weigand authored
def : Pat<(load (i64 (X86Wrapper tglobaltlsaddr :$dst))), (MOV64rm tglobaltlsaddr :$dst)>; This pattern is invalid because the MOV64rm instruction expects a source operand of type "i64mem", which is a subclass of X86MemOperand and thus actually consists of five MI operands, but the Pat provides only a single MI operand ("tglobaltlsaddr" matches an SDnode of type ISD::TargetGlobalTLSAddress and provides a single output). Thus, if the pattern were ever matched, subsequent uses of the MOV64rm instruction pattern would access uninitialized memory. In addition, with the TableGen patch I'm about to check in, this would actually be reported as a build-time error. Fortunately, the pattern does in fact never match, for at least two independent reasons. First, the code generator actually never generates a pattern of the form (load (X86Wrapper (tglobaltlsaddr))). For most combinations of TLS and code models, (tglobaltlsaddr) represents just an offset that needs to be added to some base register, so it is never directly dereferenced. The only exception is the initial-exec model, where (tglobaltlsaddr) refers to the (pc-relative) address of a GOT slot, which *is* in fact directly dereferenced: but in that case, the X86WrapperRIP node is used, not X86Wrapper, so the Pat doesn't match. Second, even if some patterns along those lines *were* ever generated, we should not need an extra Pat pattern to match it. Instead, the original MOV64rm instruction pattern ought to match directly, since it uses an "addr" operand, which is implemented via the SelectAddr C++ routine; this routine is supposed to accept the full range of input DAGs that may be implemented by a single mov instruction, including those cases involving ISD::TargetGlobalTLSAddress (and actually does so e.g. in the initial-exec case as above). To avoid build breaks (due to the above-mentioned error) after the TableGen patch is checked in, I'm removing this Pat here. llvm-svn: 177426
-
- Feb 23, 2013
-
-
Benjamin Kramer authored
Fixes PR15115. llvm-svn: 175962
-
- Jan 22, 2013
-
-
Michael Liao authored
- Add list of physical registers clobbered in pseudo atomic insts Physical registers are clobbered when pseudo atomic instructions are expanded. Add them in clobber list to prevent DAG scheduler to mis-schedule them after these insns are declared side-effect free. - Add test case from Michael Kuperstein <michael.m.kuperstein@intel.com> llvm-svn: 173200
-
- Jan 07, 2013
-
-
Craig Topper authored
llvm-svn: 171696
-
- Dec 27, 2012
-
-
Craig Topper authored
llvm-svn: 171122
-
- Oct 16, 2012
-
-
Michael Liao authored
- Besides used in SjLj exception handling, __builtin_setjmp/__longjmp is also used as a light-weight replacement of setjmp/longjmp which are used to implementation continuation, user-level threading, and etc. The support added in this patch ONLY addresses this usage and is NOT intended to support SjLj exception handling as zero-cost DWARF exception handling is used by default in X86. llvm-svn: 165989
-
- Oct 07, 2012
-
-
Benjamin Kramer authored
Otherwise it will try to use SSE patterns and fail horribly if sse is disabled. Fixes PR14035. llvm-svn: 165377
-
- Oct 05, 2012
-
-
Craig Topper authored
llvm-svn: 165303
-
Craig Topper authored
Move expansion of SETB_C(8/16/32/64)r from MCInstLower to ExpandPostRAPseudos and mark them as pseudos in the td file. llvm-svn: 165302
-
- Sep 26, 2012
-
-
Michael Liao authored
- Instead of embedding 'lock' into each mnemonic of atomic instructions except 'xchg', we teach X86 assembly printer to output 'lock' prefix similar to or consistent with code emitter. llvm-svn: 164659
-
- Sep 22, 2012
-
-
Michael Liao authored
llvm-svn: 164453
-
Michael Liao authored
llvm-svn: 164452
-
- Sep 21, 2012
-
-
Michael Liao authored
llvm-svn: 164372
-
Michael Liao authored
- Rewirte most atomic instructions in templates for both better maintenance and future extensions, such as HLE in TSX. llvm-svn: 164357
-
- Sep 20, 2012
-
-
Michael Liao authored
- Rewrite/merge pseudo-atomic instruction emitters to address the following issue: * Reduce one unnecessary load in spin-loop previously the spin-loop looks like thisMBB: newMBB: ld t1 = [bitinstr.addr] op t2 = t1, [bitinstr.val] not t3 = t2 (if Invert) mov EAX = t1 lcs dest = [bitinstr.addr], t3 [EAX is implicit] bz newMBB fallthrough -->nextMBB the 'ld' at the beginning of newMBB should be lift out of the loop as lcs (or CMPXCHG on x86) will load the current memory value into EAX. This loop is refined as: thisMBB: EAX = LOAD [MI.addr] mainMBB: t1 = OP [MI.val], EAX LCMPXCHG [MI.addr], t1, [EAX is implicitly used & defined] JNE mainMBB sinkMBB: * Remove immopc as, so far, all pseudo-atomic instructions has all-register form only, there is no immedidate operand. * Remove unnecessary attributes/modifiers in pseudo-atomic instruction td * Fix issues in PR13458 - Add comprehensive tests on atomic ops on various data types. NOTE: Some of them are turned off due to missing functionality. - Revise tests due to the new spin-loop generated. llvm-svn: 164281
-
- Sep 13, 2012
-
-
Jakob Stoklund Olesen authored
Add a PatFrag to match X86tcret using 6 fixed registers or less. This avoids folding loads into TCRETURNmi64 using 7 or more volatile registers. <rdar://problem/12282281> llvm-svn: 163819
-
Jakob Stoklund Olesen authored
The patch caused "Wrong topological sorting" assertions. llvm-svn: 163810
-
Jakob Stoklund Olesen authored
We don't have enough GR64_TC registers when calling a varargs function with 6 arguments. Since %al holds the number of vector registers used, only %r11 is available as a scratch register. This means that addressing modes using both base and index registers can't be folded into TCRETURNmi64. <rdar://problem/12282281> llvm-svn: 163761
-
- Jun 01, 2012
-
-
Hans Wennborg authored
This implements codegen support for accesses to thread-local variables using the local-dynamic model, and adds a clean-up pass so that the base address for the TLS block can be re-used between local-dynamic access on an execution path. llvm-svn: 157818
-
- May 09, 2012
-
-
Jakob Stoklund Olesen authored
The getPointerRegClass() hook will return GR32_TC, or whatever is appropriate for the current function. Patch by Yiannis Tsiouris! llvm-svn: 156459
-
- May 07, 2012
-
-
Manman Ren authored
This patch will optimize -(x != 0) on X86 FROM cmpl $0x01,%edi sbbl %eax,%eax notl %eax TO negl %edi sbbl %eax %eax In order to generate negl, I added patterns in Target/X86/X86InstrCompiler.td: def : Pat<(X86sub_flag 0, GR32:$src), (NEG32r GR32:$src)>; rdar: 10961709 llvm-svn: 156312
-
- Apr 04, 2012
-
-
Rafael Espindola authored
This allows us to keep passing reduced masks to SimplifyDemandedBits, but know about all the bits if SimplifyDemandedBits fails. This allows instcombine to simplify cases like the one in the included testcase. llvm-svn: 154011
-
- Mar 29, 2012
-
-
Lang Hames authored
llvm-svn: 153680
-
- Mar 19, 2012
-
-
Preston Gurd authored
X86InstrCompiler.td. It also adds –mcpu-generic to the legalize-shift-64.ll test so the test will pass if run on an Intel Atom CPU, which would otherwise produce an instruction schedule which differs from that which the test expects. llvm-svn: 153033
-
- Feb 24, 2012
-
-
Michael J. Spencer authored
used by the Win32 _ftol2 runtime function. Patch by Joe Groff! llvm-svn: 151382
-
- Feb 16, 2012
-
-
Jakob Stoklund Olesen authored
The different calling conventions and call-preserved registers are represented with regmask operands that are added dynamically. llvm-svn: 150708
-
- Jan 16, 2012
-
-
Eli Friedman authored
llvm-svn: 148240
-
Eli Friedman authored
llvm-svn: 148239
-
- Jan 12, 2012
-
-
Benjamin Kramer authored
X86: Generalize the x << (y & const) optimization to also catch masks with more set bits set than 31 or 63. llvm-svn: 148024
-
- Dec 24, 2011
-
-
Chandler Carruth authored
X86ISelLowering C++ code. Because this is lowered via an xor wrapped around a bsr, we want the dagcombine which runs after isel lowering to have a chance to clean things up. In particular, it is very common to see code which looks like: (sizeof(x)*8 - 1) ^ __builtin_clz(x) Which is trying to compute the most significant bit of 'x'. That's actually the value computed directly by the 'bsr' instruction, but if we match it too late, we'll get completely redundant xor instructions. The more naive code for the above (subtracting rather than using an xor) still isn't handled correctly due to the dagcombine getting confused. Also, while here fix an issue spotted by inspection: we should have been expanding the zero-undef variants to the normal variants when there is an 'lzcnt' instruction. Do so, and test for this. We don't want to generate unnecessary 'bsr' instructions. These two changes fix some regressions in encoding and decoding benchmarks. However, there is still a *lot* to be improve on in this type of code. llvm-svn: 147244
-
- Dec 20, 2011
-
-
Chandler Carruth authored
use the zero-undefined variants of CTTZ and CTLZ. These are just simple patterns for now, there is more to be done to make real world code using these constructs be optimized and codegen'ed properly on X86. The existing tests are spiffed up to check that we no longer generate unnecessary cmov instructions, and that we generate the very important 'xor' to transform bsr which counts the index of the most significant one bit to the number of leading (most significant) zero bits. Also they now check that when the variant with defined zero result is used, the cmov is still produced. llvm-svn: 146974
-
- Oct 26, 2011
-
-
Rafael Espindola authored
Patch by Sanjoy Das. llvm-svn: 143064
-