  1. Jan 11, 2012
      Hoist the logic to transform shift+mask combinations into sub-register · 51d3076b
      Chandler Carruth authored
      extracts and scaled addressing modes into its own helper function. No
      functionality changed here, just hoisting and layout fixes falling out
      of that hoisting.
      
      llvm-svn: 147937
      Teach the X86 instruction selection to do some heroic transforms to · 55b2cdee
      Chandler Carruth authored
      detect a pattern which can be implemented with a small 'shl' embedded in
      the addressing mode scale. This happens in real code as follows:
      
        unsigned x = my_accelerator_table[input >> 11];
      
      Here we have some lookup table that we look into using the high bits of
      'input'. Each entry in the table is 4 bytes, which means this
      implicitly gets turned into (once lowered out of a GEP):
      
        *(unsigned*)((char*)my_accelerator_table + ((input >> 11) << 2));
      
      The shift right followed by a shift left is canonicalized to a smaller
      shift right plus a mask of the low bits. That hides the shift left,
      which is exactly what the x86 addressing mode scale is designed to
      support. We now detect
      masks of this form, and produce the longer shift right followed by the
      proper addressing mode. In addition to saving a (rather large)
      instruction, this also reduces stalls in Intel chips on benchmarks I've
      measured.
      
      In order for all of this to work, one part of the DAG needs to be
      canonicalized *still further* than it currently is. This involves
      removing pointless 'trunc' nodes between a zextload and a zext. Without
      that, we end up generating spurious masks and hiding the pattern.
      
      llvm-svn: 147936
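      A minimal stand-alone sketch of the transform described above; the table
      size and the exact instructions in the comment are illustrative, not
      taken from the patch or its tests:
      
        /* Illustrative sketch of the pattern this commit matches.  GEP
         * lowering produces (input >> 11) << 2, which DAG canonicalization
         * rewrites as (input >> 9) & ~3, hiding the scale.  With this patch
         * the masked form is recognized and selected roughly as
         *     shrl  $11, %edi
         *     movl  my_accelerator_table(,%rdi,4), %eax
         * instead of a shift, an explicit mask, and an unscaled load. */
        extern unsigned my_accelerator_table[1 << 21];
      
        unsigned lookup(unsigned input) {
          return my_accelerator_table[input >> 11];
        }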
  2. Jan 09, 2012
      Don't rely on the fact that shift values are never very large, and thus · c16622da
      Chandler Carruth authored
      this subtraction will result in small negative numbers at worst, which
      become very large positive numbers on assignment and are thus caught by
      the <=4 check on the next line. The >0 check was clearly intended to
      catch these as negative numbers.
      
      Spotted by inspection, and impossible to trigger given the shift widths
      that can be used.
      
      llvm-svn: 147773
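      A tiny stand-alone illustration of the hazard this commit describes; the
      variable names are made up, not from the patch:
      
        #include <stdio.h>
      
        int main(void) {
          unsigned a = 3, b = 5;
          unsigned diff = a - b;      /* "-2" wraps to 4294967294 on assignment */
          printf("%u\n", diff);       /* prints 4294967294 */
          printf("%d\n", diff > 0);   /* 1: the >0 check never sees a negative */
          printf("%d\n", diff <= 4);  /* 0: only the upper bound rejects it */
          return 0;
        }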
  3. Nov 16, 2011
  4. Nov 15, 2011
  5. Nov 03, 2011
  6. Oct 29, 2011
  7. Oct 28, 2011
      Reapply r143177 and r143179 (reverting r143188), with scheduler · 73057ad2
      Dan Gohman authored
      fixes: Use a separate register, instead of SP, as the
      calling-convention resource, to avoid spurious conflicts with
      actual uses of SP. Also, fix unscheduling of calling sequences,
      which can be triggered by pseudo-two-address dependencies.
      
      llvm-svn: 143206
      Speculatively disable Dan's commits 143177 and 143179 to see if · 225a7037
      Duncan Sands authored
      it fixes the dragonegg self-host (it looks like gcc is miscompiled).
      Original commit messages:
      Eliminate LegalizeOps' LegalizedNodes map and have it just call RAUW
      on every node as it legalizes them. This makes it easier to use
      hasOneUse() heuristics, since unneeded nodes can be removed from the
      DAG earlier.
      
      Make LegalizeOps visit the DAG in an operands-last order. It previously
      used operands-first, because LegalizeTypes has to go operands-first, and
      LegalizeTypes used to be part of LegalizeOps, but they're now split.
      The operands-last order is more natural for several legalization tasks.
      For example, it allows lowering code for nodes with floating-point or
      vector constants to see those constants directly instead of seeing the
      lowered form (often constant-pool loads). This makes some things
      somewhat more complicated today, though it ought to allow things to be
      simpler in the future. It also fixes some bugs exposed by Legalizing
      using RAUW aggressively.
      
      Remove the part of LegalizeOps that attempted to patch up invalid chain
      operands on libcalls generated by LegalizeTypes, since it doesn't work
      with the new LegalizeOps traversal order. Instead, define what
      LegalizeTypes is doing to be correct, and transfer the responsibility
      of keeping calls from having overlapping calling sequences into the
      scheduler.
      
      Teach the scheduler to model callseq_begin/end pairs as having a
      physical register definition/use to prevent calls from having
      overlapping calling sequences. This is also somewhat complicated, though
      there are ways it might be simplified in the future.
      
      This addresses rdar://9816668, rdar://10043614, rdar://8434668, and others.
      Please direct high-level questions about this patch to management.
      
      Delete #if 0 code accidentally left in.
      
      llvm-svn: 143188
      Eliminate LegalizeOps' LegalizedNodes map and have it just call RAUW · 4db3f7dd
      Dan Gohman authored
      on every node as it legalizes them. This makes it easier to use
      hasOneUse() heuristics, since unneeded nodes can be removed from the
      DAG earlier.
      
      Make LegalizeOps visit the DAG in an operands-last order. It previously
      used operands-first, because LegalizeTypes has to go operands-first, and
      LegalizeTypes used to be part of LegalizeOps, but they're now split.
      The operands-last order is more natural for several legalization tasks.
      For example, it allows lowering code for nodes with floating-point or
      vector constants to see those constants directly instead of seeing the
      lowered form (often constant-pool loads). This makes some things
      somewhat more complicated today, though it ought to allow things to be
      simpler in the future. It also fixes some bugs exposed by Legalizing
      using RAUW aggressively.
      
      Remove the part of LegalizeOps that attempted to patch up invalid chain
      operands on libcalls generated by LegalizeTypes, since it doesn't work
      with the new LegalizeOps traversal order. Instead, define what
      LegalizeTypes is doing to be correct, and transfer the responsibility
      of keeping calls from having overlapping calling sequences into the
      scheduler.
      
      Teach the scheduler to model callseq_begin/end pairs as having a
      physical register definition/use to prevent calls from having
      overlapping calling sequences. This is also somewhat complicated, though
      there are ways it might be simplified in the future.
      
      This addresses rdar://9816668, rdar://10043614, rdar://8434668, and others.
      Please direct high-level questions about this patch to management.
      
      llvm-svn: 143177
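      A toy sketch of the traversal-order point above, using a made-up node
      type rather than the SelectionDAG API: visiting a node before its
      operands ("operands-last") means its lowering still sees the original
      constant operand, whereas visiting operands first may already have
      rewritten it (e.g. into a constant-pool load).
      
        #include <stdio.h>
      
        typedef struct Node { const char *op; struct Node *lhs, *rhs; } Node;
      
        static void legalize(const Node *n) { printf("legalize %s\n", n->op); }
      
        /* operands-first: operands before their users (what LegalizeTypes needs) */
        static void operands_first(const Node *n) {
          if (!n) return;
          operands_first(n->lhs);
          operands_first(n->rhs);
          legalize(n);
        }
      
        /* operands-last: a node before its operands (the new LegalizeOps order) */
        static void operands_last(const Node *n) {
          if (!n) return;
          legalize(n);
          operands_last(n->lhs);
          operands_last(n->rhs);
        }
      
        int main(void) {
          Node cst = {"fp-constant", 0, 0}, x = {"x", 0, 0};
          Node add = {"fadd", &x, &cst};
          operands_first(&add); /* fp-constant handled before fadd sees it */
          operands_last(&add);  /* fadd handled while its constant is untouched */
          return 0;
        }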
  8. Oct 08, 2011
      Add TEST8ri_NOREX pseudo to constrain sub_8bit_hi copies. · 729abd36
      Jakob Stoklund Olesen authored
      In 64-bit mode, sub_8bit_hi sub-registers can only be used by NOREX
      instructions. The COPY created from the EXTRACT_SUBREG DAG node cannot
      target all GR8 registers, only those in GR8_NOREX.
      
      To enforce this, we ensure that all instructions using the
      EXTRACT_SUBREG are constrained to GR8_NOREX.
      
      This fixes PR11088.
      
      llvm-svn: 141499
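      For context, a hypothetical example (not from the patch) of the kind of
      source that produces a sub_8bit_hi extract; in 64-bit mode the
      %ah/%bh/%ch/%dh registers it maps to are only encodable in instructions
      without a REX prefix, hence the GR8_NOREX constraint.
      
        /* Reading bits 8-15 of a wider value is the typical source of a
         * sub_8bit_hi extract (e.g. %ah). */
        unsigned char second_byte(unsigned short x) {
          return (unsigned char)(x >> 8);
        }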
  9. Aug 01, 2011
  10. Jul 13, 2011
  11. Jul 02, 2011
  12. Jun 30, 2011
  13. May 20, 2011
  14. May 17, 2011
  15. May 11, 2011
  16. Apr 23, 2011
  17. Apr 22, 2011
      X86: Try to use a smaller encoding by transforming (X << C1) & C2 into (X & (C2 >> C1)) << C1. (Part of PR5039) · 4c816247
      Benjamin Kramer authored
      
      This tends to happen a lot with bitfield code generated by clang. A simple example for x86_64 is
      uint64_t foo(uint64_t x) { return (x&1) << 42; }
      which used to compile into bloated code:
      	shlq	$42, %rdi               ## encoding: [0x48,0xc1,0xe7,0x2a]
      	movabsq	$4398046511104, %rax    ## encoding: [0x48,0xb8,0x00,0x00,0x00,0x00,0x00,0x04,0x00,0x00]
      	andq	%rdi, %rax              ## encoding: [0x48,0x21,0xf8]
      	ret                             ## encoding: [0xc3]
      
      with this patch we can fold the immediate into the and:
      	andq	$1, %rdi                ## encoding: [0x48,0x83,0xe7,0x01]
      	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
      	shlq	$42, %rax               ## encoding: [0x48,0xc1,0xe0,0x2a]
      	ret                             ## encoding: [0xc3]
      
      It's possible to save another byte by using 'andl' instead of 'andq' but I currently see no way of doing
      that without making this code even more complicated. See the TODOs in the code.
      
      llvm-svn: 129990
  18. Feb 16, 2011
  19. Feb 13, 2011
      Enhance ComputeMaskedBits to know that aligned frameindexes · 46c01a30
      Chris Lattner authored
      have their low bits set to zero.  This allows us to optimize
      out explicit stack alignment code like in stack-align.ll:test4 when
      it is redundant.
      
      Doing this causes the code generator to start turning FI+cst into
      FI|cst all over the place, which is general goodness (that is the
      canonical form) except that various pieces of the code generator
      don't handle OR aggressively.  Fix this by introducing a new
      SelectionDAG::isBaseWithConstantOffset predicate, and using it
      in places that are looking for ADD(X,CST).  The ARM backend in
      particular was missing a lot of addressing mode folding opportunities
      around OR.
      
      llvm-svn: 125470
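      A small demonstration of the identity being exploited: when the base's
      low bits are known to be zero (alignment) and the constant offset fits
      entirely in those bits, add and or compute the same address because no
      carry can cross between them.
      
        #include <assert.h>
        #include <stdint.h>
      
        int main(void) {
          uintptr_t base = 0x1000;  /* 16-byte aligned: low four bits are zero */
          for (uintptr_t off = 0; off < 16; ++off)
            assert((base + off) == (base | off));
          return 0;
        }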
  20. Jan 27, 2011
  21. Jan 16, 2011
  22. Jan 14, 2011
  23. Jan 06, 2011
  24. Dec 21, 2010
  25. Dec 05, 2010
      it turns out that when ".with.overflow" intrinsics were added to the X86 · 364bb0a0
      Chris Lattner authored
      backend, they were all implemented except umul. This one fell back
      to the default implementation, which did a hi/lo multiply and tested the
      high half. Fix this to check the overflow flag that the 'mul' instruction
      sets, so we can avoid an explicit test.  Now we compile:
      
      void *func(long count) {
            return new int[count];
      }
      
      into:
      
      __Z4funcl:                              ## @_Z4funcl
      	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
      	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
      	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
      	seto	%cl                     ## encoding: [0x0f,0x90,0xc1]
      	testb	%cl, %cl                ## encoding: [0x84,0xc9]
      	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
      	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
      	jmp	__Znam                  ## TAILCALL
      
      instead of:
      
      __Z4funcl:                              ## @_Z4funcl
      	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
      	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
      	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
      	testq	%rdx, %rdx              ## encoding: [0x48,0x85,0xd2]
      	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
      	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
      	jmp	__Znam                  ## TAILCALL
      
      Other than the silly seto+test, this is using the overflow bit directly,
      so it's going in the right direction.
      
      llvm-svn: 120935
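      As a rough C-level analogue of the check 'new int[count]' needs (using
      the GCC/Clang overflow builtin as a stand-in for the LLVM intrinsic, and
      malloc in place of operator new[]; the function name is made up):
      
        #include <stdlib.h>
      
        void *alloc_ints(unsigned long count) {
          unsigned long bytes;
          /* umul.with.overflow: multiply and report whether it wrapped */
          if (__builtin_umull_overflow(count, sizeof(int), &bytes))
            bytes = (unsigned long)-1; /* mirror the -1 sentinel passed to __Znam */
          return malloc(bytes);
        }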
  26. Oct 27, 2010
  27. Oct 06, 2010
  28. Sep 22, 2010