Skip to content
  1. Nov 15, 2011
  2. Nov 03, 2011
  3. Oct 29, 2011
  4. Oct 28, 2011
    • Dan Gohman's avatar
      Reapply r143177 and r143179 (reverting r143188), with scheduler · 73057ad2
      Dan Gohman authored
      fixes: Use a separate register, instead of SP, as the
      calling-convention resource, to avoid spurious conflicts with
      actual uses of SP. Also, fix unscheduling of calling sequences,
      which can be triggered by pseudo-two-address dependencies.
      
      llvm-svn: 143206
      73057ad2
    • Duncan Sands's avatar
      Speculatively disable Dan's commits 143177 and 143179 to see if · 225a7037
      Duncan Sands authored
      it fixes the dragonegg self-host (it looks like gcc is miscompiled).
      Original commit messages:
      Eliminate LegalizeOps' LegalizedNodes map and have it just call RAUW
      on every node as it legalizes them. This makes it easier to use
      hasOneUse() heuristics, since unneeded nodes can be removed from the
      DAG earlier.
      
      Make LegalizeOps visit the DAG in an operands-last order. It previously
      used operands-first, because LegalizeTypes has to go operands-first, and
      LegalizeTypes used to be part of LegalizeOps, but they're now split.
      The operands-last order is more natural for several legalization tasks.
      For example, it allows lowering code for nodes with floating-point or
      vector constants to see those constants directly instead of seeing the
      lowered form (often constant-pool loads). This makes some things
      somewhat more complicated today, though it ought to allow things to be
      simpler in the future. It also fixes some bugs exposed by Legalizing
      using RAUW aggressively.
      
      Remove the part of LegalizeOps that attempted to patch up invalid chain
      operands on libcalls generated by LegalizeTypes, since it doesn't work
      with the new LegalizeOps traversal order. Instead, define what
      LegalizeTypes is doing to be correct, and transfer the responsibility
      of keeping calls from having overlapping calling sequences into the
      scheduler.
      
      Teach the scheduler to model callseq_begin/end pairs as having a
      physical register definition/use to prevent calls from having
      overlapping calling sequences. This is also somewhat complicated, though
      there are ways it might be simplified in the future.
      
      This addresses rdar://9816668, rdar://10043614, rdar://8434668, and others.
      Please direct high-level questions about this patch to management.
      
      Delete #if 0 code accidentally left in.
      
      llvm-svn: 143188
      225a7037
    • Dan Gohman's avatar
      Eliminate LegalizeOps' LegalizedNodes map and have it just call RAUW · 4db3f7dd
      Dan Gohman authored
      on every node as it legalizes them. This makes it easier to use
      hasOneUse() heuristics, since unneeded nodes can be removed from the
      DAG earlier.
      
      Make LegalizeOps visit the DAG in an operands-last order. It previously
      used operands-first, because LegalizeTypes has to go operands-first, and
      LegalizeTypes used to be part of LegalizeOps, but they're now split.
      The operands-last order is more natural for several legalization tasks.
      For example, it allows lowering code for nodes with floating-point or
      vector constants to see those constants directly instead of seeing the
      lowered form (often constant-pool loads). This makes some things
      somewhat more complicated today, though it ought to allow things to be
      simpler in the future. It also fixes some bugs exposed by Legalizing
      using RAUW aggressively.
      
      Remove the part of LegalizeOps that attempted to patch up invalid chain
      operands on libcalls generated by LegalizeTypes, since it doesn't work
      with the new LegalizeOps traversal order. Instead, define what
      LegalizeTypes is doing to be correct, and transfer the responsibility
      of keeping calls from having overlapping calling sequences into the
      scheduler.
      
      Teach the scheduler to model callseq_begin/end pairs as having a
      physical register definition/use to prevent calls from having
      overlapping calling sequences. This is also somewhat complicated, though
      there are ways it might be simplified in the future.
      
      This addresses rdar://9816668, rdar://10043614, rdar://8434668, and others.
      Please direct high-level questions about this patch to management.
      
      llvm-svn: 143177
      4db3f7dd
  5. Oct 08, 2011
    • Jakob Stoklund Olesen's avatar
      Add TEST8ri_NOREX pseudo to constrain sub_8bit_hi copies. · 729abd36
      Jakob Stoklund Olesen authored
      In 64-bit mode, sub_8bit_hi sub-registers can only be used by NOREX
      instructions. The COPY created from the EXTRACT_SUBREG DAG node cannot
      target all GR8 registers, only those in GR8_NOREX.
      
      TO enforce this, we ensure that all instructions using the
      EXTRACT_SUBREG are GR8_NOREX constrained.
      
      This fixes PR11088.
      
      llvm-svn: 141499
      729abd36
  6. Aug 01, 2011
  7. Jul 13, 2011
  8. Jul 02, 2011
  9. Jun 30, 2011
  10. May 20, 2011
  11. May 17, 2011
  12. May 11, 2011
  13. Apr 23, 2011
  14. Apr 22, 2011
    • Benjamin Kramer's avatar
      X86: Try to use a smaller encoding by transforming (X << C1) & C2 into (X &... · 4c816247
      Benjamin Kramer authored
      X86: Try to use a smaller encoding by transforming (X << C1) & C2 into (X & (C2 >> C1)) & C1. (Part of PR5039)
      
      This tends to happen a lot with bitfield code generated by clang. A simple example for x86_64 is
      uint64_t foo(uint64_t x) { return (x&1) << 42; }
      which used to compile into bloated code:
      	shlq	$42, %rdi               ## encoding: [0x48,0xc1,0xe7,0x2a]
      	movabsq	$4398046511104, %rax    ## encoding: [0x48,0xb8,0x00,0x00,0x00,0x00,0x00,0x04,0x00,0x00]
      	andq	%rdi, %rax              ## encoding: [0x48,0x21,0xf8]
      	ret                             ## encoding: [0xc3]
      
      with this patch we can fold the immediate into the and:
      	andq	$1, %rdi                ## encoding: [0x48,0x83,0xe7,0x01]
      	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
      	shlq	$42, %rax               ## encoding: [0x48,0xc1,0xe0,0x2a]
      	ret                             ## encoding: [0xc3]
      
      It's possible to save another byte by using 'andl' instead of 'andq' but I currently see no way of doing
      that without making this code even more complicated. See the TODOs in the code.
      
      llvm-svn: 129990
      4c816247
  15. Feb 16, 2011
  16. Feb 13, 2011
    • Chris Lattner's avatar
      Enhance ComputeMaskedBits to know that aligned frameindexes · 46c01a30
      Chris Lattner authored
      have their low bits set to zero.  This allows us to optimize
      out explicit stack alignment code like in stack-align.ll:test4 when
      it is redundant.
      
      Doing this causes the code generator to start turning FI+cst into
      FI|cst all over the place, which is general goodness (that is the
      canonical form) except that various pieces of the code generator
      don't handle OR aggressively.  Fix this by introducing a new
      SelectionDAG::isBaseWithConstantOffset predicate, and using it
      in places that are looking for ADD(X,CST).  The ARM backend in
      particular was missing a lot of addressing mode folding opportunities
      around OR.
      
      llvm-svn: 125470
      46c01a30
  17. Jan 27, 2011
  18. Jan 16, 2011
  19. Jan 14, 2011
  20. Jan 06, 2011
  21. Dec 21, 2010
  22. Dec 05, 2010
    • Chris Lattner's avatar
      it turns out that when ".with.overflow" intrinsics were added to the X86 · 364bb0a0
      Chris Lattner authored
      backend that they were all implemented except umul.  This one fell back
      to the default implementation that did a hi/lo multiply and compared the
      top.  Fix this to check the overflow flag that the 'mul' instruction
      sets, so we can avoid an explicit test.  Now we compile:
      
      void *func(long count) {
            return new int[count];
      }
      
      into:
      
      __Z4funcl:                              ## @_Z4funcl
      	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
      	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
      	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
      	seto	%cl                     ## encoding: [0x0f,0x90,0xc1]
      	testb	%cl, %cl                ## encoding: [0x84,0xc9]
      	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
      	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
      	jmp	__Znam                  ## TAILCALL
      
      instead of:
      
      __Z4funcl:                              ## @_Z4funcl
      	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
      	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
      	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
      	testq	%rdx, %rdx              ## encoding: [0x48,0x85,0xd2]
      	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
      	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
      	jmp	__Znam                  ## TAILCALL
      
      Other than the silly seto+test, this is using the o bit directly, so it's going in the right
      direction.
      
      llvm-svn: 120935
      364bb0a0
  23. Oct 27, 2010
  24. Oct 06, 2010
  25. Sep 22, 2010
Loading