  1. Jul 21, 2013
  2. Jun 01, 2013
    • X86: change MOV64ri64i32 into MOV32ri64 · 3a1fd4c0
      Tim Northover authored
      The MOV64ri64i32 instruction required hacky MCInst lowering because it
      was allocated as setting a GR64, but the eventual instruction ("movl")
      only set a GR32. This converts it into a so-called "MOV32ri64" which
      still accepts an (appropriate) 64-bit immediate but defines a GR32.
      This is then converted to the full GR64 by a SUBREG_TO_REG operation,
      thus keeping everyone happy.
      
      This fixes a typo in the opcode field of the original patch, which
      should make the legacy JIT work again (and adds a test for that problem).
      
      llvm-svn: 183068
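
      As a hypothetical illustration (not part of the commit message): a 64-bit
      constant whose value fits in an unsigned 32-bit immediate can be
      materialized with a plain "movl", relying on the implicit zero-extension
      of 32-bit writes:

          #include <stdint.h>

          uint64_t big_constant(void) {
            /* 0x80000000 does not fit a sign-extended 32-bit immediate
               (a "movq $imm32" would produce 0xFFFFFFFF80000000), but it
               does fit an unsigned 32-bit immediate, so codegen can emit
               "movl $2147483648, %eax"; the 32-bit write implicitly zeroes
               bits 63:32 of %rax. */
            return 0x80000000ULL;
          }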
    • Temporarily Revert "X86: change MOV64ri64i32 into MOV32ri64" as it · e1e57e5e
      Eric Christopher authored
      seems to have caused PR16192 and other JIT-related failures.
      
      llvm-svn: 183059
  3. May 31, 2013
    • X86: change MOV64ri64i32 into MOV32ri64 · d4736d67
      Tim Northover authored
      The MOV64ri64i32 instruction required hacky MCInst lowering because it was
      allocated as setting a GR64, but the eventual instruction ("movl") only set a
      GR32. This converts it into a so-called "MOV32ri64" which still accepts an
      (appropriate) 64-bit immediate but defines a GR32. This is then converted to
      the full GR64 by a SUBREG_TO_REG operation, thus keeping everyone happy.
      
      llvm-svn: 182991
  4. May 30, 2013
    • X86: use sub-register sequences for MOV*r0 operations · 64ec0ff4
      Tim Northover authored
      Instead of having a bunch of separate MOV8r0, MOV16r0, ... pseudo-instructions,
      it's better to use a single MOV32r0 (which will expand to "xorl %reg, %reg")
      and obtain other sizes with EXTRACT_SUBREG and SUBREG_TO_REG. The encoding is
      smaller and partial register updates can sometimes be avoided.
      
      Until recently, though, this sequence was a barrier to rematerialization.
      That should now be fixed, so it's an appropriate time to make the change.
      
      llvm-svn: 182928
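
      As a hypothetical source-level sketch (assuming an x86-64 target): zeroing
      a register of any width can go through the single 32-bit idiom:

          #include <stdint.h>

          uint16_t zero16(void) {
            /* Typically emitted as "xorl %eax, %eax"; the 16-bit result is
               just the %ax sub-register of %eax. The xor encoding is short
               and breaks dependencies on the old register value. */
            return 0;
          }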
    • X86: change zext moves to use sub-register infrastructure. · 04eb4234
      Tim Northover authored
      32-bit writes on amd64 zero out the high bits of the corresponding 64-bit
      register. LLVM makes use of this for zero-extension, but until now relied on
      custom MCLowering and other code to fix up instructions. Now that we have
      proper handling of sub-registers, this can be done by creating SUBREG_TO_REG
      instructions at selection-time.
      
      Should be no change in functionality.
      
      llvm-svn: 182921
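
      A minimal, hypothetical example of the implicit zero-extension involved:

          #include <stdint.h>

          uint64_t widen(uint32_t x) {
            /* Typically a single "movl %edi, %eax": the 32-bit write zeroes
               bits 63:32 of %rax, so no separate zero-extension instruction
               is needed; the GR32 def is wrapped in a SUBREG_TO_REG to form
               the GR64 value. */
            return x;
          }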
  5. Mar 26, 2013
  6. Mar 19, 2013
    • Annotate X86InstrCompiler.td with SchedRW lists. · 9bd6b8bd
      Jakob Stoklund Olesen authored
      Add a new WriteZero SchedWrite type for the common dependency-breaking
      instructions that clear a register.
      
      llvm-svn: 177442
    • Remove an invalid and unnecessary Pat pattern from the X86 backend: · 80d9ad39
      Ulrich Weigand authored
        def : Pat<(load (i64 (X86Wrapper tglobaltlsaddr :$dst))),
                  (MOV64rm tglobaltlsaddr :$dst)>;
      
      This pattern is invalid because the MOV64rm instruction expects a
      source operand of type "i64mem", which is a subclass of X86MemOperand
      and thus actually consists of five MI operands, but the Pat provides
      only a single MI operand ("tglobaltlsaddr" matches an SDnode of
      type ISD::TargetGlobalTLSAddress and provides a single output).
      
      Thus, if the pattern were ever matched, subsequent uses of the MOV64rm
      instruction pattern would access uninitialized memory.  In addition,
      with the TableGen patch I'm about to check in, this would actually be
      reported as a build-time error.
      
      Fortunately, the pattern does in fact never match, for at least two
      independent reasons.
      
      First, the code generator actually never generates a pattern of the
      form (load (X86Wrapper (tglobaltlsaddr))).  For most combinations of
      TLS and code models, (tglobaltlsaddr) represents just an offset that
      needs to be added to some base register, so it is never directly
      dereferenced.  The only exception is the initial-exec model, where
      (tglobaltlsaddr) refers to the (pc-relative) address of a GOT slot,
      which *is* in fact directly dereferenced: but in that case, the
      X86WrapperRIP node is used, not X86Wrapper, so the Pat doesn't match.
      
      Second, even if some patterns along those lines *were* ever generated,
      we should not need an extra Pat pattern to match it.  Instead, the
      original MOV64rm instruction pattern ought to match directly, since
      it uses an "addr" operand, which is implemented via the SelectAddr
      C++ routine; this routine is supposed to accept the full range of
      input DAGs that may be implemented by a single mov instruction,
      including those cases involving ISD::TargetGlobalTLSAddress (and
      actually does so e.g. in the initial-exec case as above).
      
      To avoid build breaks (due to the above-mentioned error) after the
      TableGen patch is checked in, I'm removing this Pat here.
      
      llvm-svn: 177426
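
      For context, a hedged sketch (hypothetical code, not the commit's
      testcase) of the initial-exec case described above, where the pc-relative
      GOT slot holding the TLS offset *is* directly dereferenced and is selected
      via X86WrapperRIP:

          /* Hypothetical example: initial-exec TLS access on x86-64. */
          static __thread int counter __attribute__((tls_model("initial-exec")));

          int next(void) {
            /* Typical codegen:
                 movq counter@GOTTPOFF(%rip), %rax   # load TLS offset from GOT
                 movl %fs:(%rax), %eax               # thread pointer + offset */
            return counter++;
          }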
  7. Feb 23, 2013
  8. Jan 22, 2013
    • Fix an issue of pseudo atomic instruction DAG schedule · 3dffc5e2
      Michael Liao authored
      - Add the list of physical registers clobbered by pseudo atomic insts
        Physical registers are clobbered when pseudo atomic instructions are
        expanded. Add them to the clobber list to prevent the DAG scheduler
        from mis-scheduling them after these insns are declared side-effect free.
      - Add a test case from Michael Kuperstein <michael.m.kuperstein@intel.com>
      
      llvm-svn: 173200
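
      A hypothetical C example of an operation that goes through such a pseudo
      instruction (expanded into a compare-exchange loop whose CMPXCHG
      implicitly uses and defines EAX/RAX and clobbers EFLAGS):

          /* Hypothetical example: atomic nand has no single x86 instruction,
             so it is lowered to a cmpxchg spin-loop when the pseudo is
             expanded. */
          long fetch_nand(long *p, long v) {
            return __sync_fetch_and_nand(p, v);
          }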
  9. Jan 07, 2013
  10. Dec 27, 2012
  11. Oct 16, 2012
    • Add __builtin_setjmp/_longjmp support in X86 backend · 97bf363a
      Michael Liao authored
      - Besides being used in SjLj exception handling, __builtin_setjmp/__longjmp
        is also used as a light-weight replacement for setjmp/longjmp to implement
        continuations, user-level threading, etc. The support added in this patch
        ONLY addresses this usage and is NOT intended to support SjLj exception
        handling, as zero-cost DWARF exception handling is used by default on
        X86.
      
      llvm-svn: 165989
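
      A minimal sketch of the non-EH usage this patch targets (hypothetical
      example; the builtins require the 5-word buffer shown):

          /* Hypothetical example: __builtin_setjmp/__builtin_longjmp used as a
             light-weight control transfer rather than for exception handling. */
          static void *ctx[5];

          void bail(void) {
            __builtin_longjmp(ctx, 1);   /* the second argument must be 1 */
          }

          int run(void) {
            if (__builtin_setjmp(ctx))
              return 1;                  /* resumed here via bail() */
            bail();
            return 0;                    /* never reached */
          }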
  12. Oct 07, 2012
  13. Oct 05, 2012
  14. Sep 26, 2012
  15. Sep 22, 2012
  16. Sep 21, 2012
  17. Sep 20, 2012
    • Re-work X86 code generation of atomic ops with spin-loop · 3237662b
      Michael Liao authored
      - Rewrite/merge pseudo-atomic instruction emitters to address the
        following issues:
        * Reduce one unnecessary load in spin-loop
      
          previously the spin-loop looked like
      
              thisMBB:
              newMBB:
                ld  t1 = [bitinstr.addr]
                op  t2 = t1, [bitinstr.val]
                not t3 = t2  (if Invert)
                mov EAX = t1
                lcs dest = [bitinstr.addr], t3  [EAX is implicit]
                bz  newMBB
                fallthrough -->nextMBB
      
          the 'ld' at the beginning of newMBB should be lifted out of the loop
          as lcs (or CMPXCHG on x86) will load the current memory value into
          EAX. This loop is refined as:
      
              thisMBB:
                EAX = LOAD [MI.addr]
              mainMBB:
                t1 = OP [MI.val], EAX
                LCMPXCHG [MI.addr], t1, [EAX is implicitly used & defined]
                JNE mainMBB
              sinkMBB:
      
        * Remove immopc since, so far, all pseudo-atomic instructions have
          all-register forms only; there is no immediate operand.
      
        * Remove unnecessary attributes/modifiers in pseudo-atomic instruction
          td
      
        * Fix issues in PR13458
      
      - Add comprehensive tests on atomic ops on various data types.
        NOTE: Some of them are turned off due to missing functionality.
      
      - Revise tests due to the new spin-loop generated.
      
      llvm-svn: 164281
  18. Sep 13, 2012
  19. Jun 01, 2012
    • Implement the local-dynamic TLS model for x86 (PR3985) · 789acfb6
      Hans Wennborg authored
      This implements codegen support for accesses to thread-local variables
      using the local-dynamic model, and adds a clean-up pass so that the base
      address for the TLS block can be re-used between local-dynamic accesses on
      an execution path.
      
      llvm-svn: 157818
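
      A hypothetical example of the access pattern the clean-up pass improves:
      two local-dynamic accesses on one execution path can share a single
      __tls_get_addr call that computes the TLS block base:

          /* Hypothetical example: compiled with -fPIC, both variables live in
             this module's TLS block, so one base address can serve both. */
          static __thread int x __attribute__((tls_model("local-dynamic")));
          static __thread int y __attribute__((tls_model("local-dynamic")));

          int sum(void) {
            return x + y;
          }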
  20. May 09, 2012
  21. May 07, 2012
    • X86: optimization for -(x != 0) · ef4e0479
      Manman Ren authored
      This patch will optimize -(x != 0) on X86
      FROM 
      cmpl	$0x01,%edi
      sbbl	%eax,%eax
      notl	%eax
      TO
      negl %edi
      sbbl %eax, %eax
      
      In order to generate negl, I added patterns in Target/X86/X86InstrCompiler.td:
      def : Pat<(X86sub_flag 0, GR32:$src), (NEG32r GR32:$src)>;
      
      rdar: 10961709
      llvm-svn: 156312
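
      A minimal reproduction of the source pattern (hypothetical example):

          int mask_if_nonzero(int x) {
            /* "negl %edi" sets the carry flag iff x != 0, so the following
               "sbbl %eax, %eax" materializes -1 when x != 0 and 0 otherwise,
               which is exactly -(x != 0). */
            return -(x != 0);
          }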
  22. Apr 04, 2012
    • Always compute all the bits in ComputeMaskedBits. · ba0a6cab
      Rafael Espindola authored
      This allows us to keep passing reduced masks to SimplifyDemandedBits, but
      still know all the bits if SimplifyDemandedBits fails. This lets instcombine
      simplify cases like the one in the included testcase.
      
      llvm-svn: 154011
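
      As a hedged illustration (hypothetical, not the commit's testcase) of why
      knowing all the bits helps even when SimplifyDemandedBits gives up:

          unsigned all_bits_known(unsigned x) {
            /* (x | 0xFF) has its low 8 bits known to be ones, so masking with
               0xFF yields a value whose every bit is known; with complete
               known-bits information, instcombine can fold this to 0xFF. */
            return (x | 0xFF) & 0xFF;
          }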
  23. Mar 29, 2012
  24. Mar 19, 2012
  25. Feb 24, 2012
  26. Feb 16, 2012
  27. Jan 16, 2012
  28. Jan 12, 2012
  29. Dec 24, 2011
    • Switch the lowering of CTLZ_ZERO_UNDEF from a .td pattern back to the · 7e9453e9
      Chandler Carruth authored
      X86ISelLowering C++ code. Because this is lowered via an xor wrapped
      around a bsr, we want the dagcombine which runs after isel lowering to
      have a chance to clean things up. In particular, it is very common to
      see code which looks like:
      
        (sizeof(x)*8 - 1) ^ __builtin_clz(x)
      
      This expression computes the index of the most significant bit of 'x'.
      That's actually the value computed directly by the 'bsr' instruction, but
      if we match it too late, we'll get completely redundant xor instructions.
      
      The more naive code for the above (subtracting rather than using an xor)
      still isn't handled correctly due to the dagcombine getting confused.
      
      Also, while here fix an issue spotted by inspection: we should have been
      expanding the zero-undef variants to the normal variants when there is
      an 'lzcnt' instruction. Do so, and test for this. We don't want to
      generate unnecessary 'bsr' instructions.
      
      These two changes fix some regressions in encoding and decoding
      benchmarks. However, there is still a *lot* to improve on in this
      type of code.
      
      llvm-svn: 147244
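
      The quoted pattern as a compilable sketch (hypothetical function name):

          unsigned msb_index(unsigned x) {
            /* For x != 0 this is exactly the value 'bsr' computes, so matching
               it before dagcombine lets codegen emit a lone bsr with no
               redundant xor instructions. */
            return (sizeof(x) * 8 - 1) ^ __builtin_clz(x);
          }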
  30. Dec 20, 2011
    • Begin teaching the X86 target how to efficiently codegen patterns that · 24680c24
      Chandler Carruth authored
      use the zero-undefined variants of CTTZ and CTLZ. These are just simple
      patterns for now; there is more to be done to make real-world code using
      these constructs optimize and codegen properly on X86.
      
      The existing tests are spiffed up to check that we no longer generate
      unnecessary cmov instructions, and that we generate the very important
      'xor' that transforms the bsr result (the index of the most significant
      one bit) into the number of leading (most significant) zero bits. They
      also now check that when the variant with a defined zero result is used,
      the cmov is still produced.
      
      llvm-svn: 146974
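
      A hypothetical pair showing the two variants the tests distinguish:

          unsigned clz_undef_at_zero(unsigned x) {
            /* Maps to CTLZ_ZERO_UNDEF: the caller guarantees x != 0, so no
               cmov is needed to guard the zero case. */
            return __builtin_clz(x);
          }

          unsigned clz_defined_at_zero(unsigned x) {
            /* Maps to plain CTLZ: the zero case is defined, so a cmov (or a
               branch) selecting 32 is still expected. */
            return x ? __builtin_clz(x) : 32;
          }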
  31. Oct 26, 2011