  1. Mar 27, 2009
  2. Mar 13, 2009
    • Fix some significant problems with constant pools that resulted in unnecessary... · 1fb8aedd
      Evan Cheng authored
      Fix some significant problems with constant pools that resulted in unnecessary padding between constant pool entries, larger-than-necessary alignments (e.g. 8-byte alignment for .literal4 sections), and potentially other issues.
      
      1. The ConstantPoolSDNode alignment field is the log2 value of the alignment requirement. This is not consistent with other SDNode variants.
      2. The MachineConstantPool alignment field is also a log2 value.
      3. However, some places create ConstantPoolSDNodes with the alignment value rather than its log2 value. This creates entries with artificially large alignments, e.g. 256 for SSE vector values.
      4. Constant pool entry offsets are computed when the entries are created. However, the asm printer groups entries by section, so those offsets are no longer valid, yet the asm printer still uses them to determine the size of the padding between entries.
      5. The asm printer uses an expensive data structure, multimap, to track constant pool entries by section.
      6. The asm printer iterates over a SmallPtrSet when it's emitting constant pool entries. This is non-deterministic.
      
      
      Solutions:
      1. The ConstantPoolSDNode alignment field is changed to keep a non-log2 value.
      2. The MachineConstantPool alignment field is also changed to keep a non-log2 value.
      3. Functions that create ConstantPool nodes now pass in non-log2 alignments.
      4. MachineConstantPoolEntry no longer keeps an offset field. It's replaced with an alignment field. Offsets are not computed when constant pool entries are created. They are computed on the fly in the asm printer and the JIT.
      5. The asm printer uses a cheaper data structure to group constant pool entries.
      6. The asm printer computes entry offsets after grouping is done.
      7. Change the JIT code to compute entry offsets on the fly.
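
The idea behind solutions 4, 6, and 7 (keep only a byte alignment per entry and derive offsets once the final emission order is known) can be sketched as follows. This is a minimal illustration, not the code from the commit: `PoolEntry`, `alignTo`, and `computeOffsets` are hypothetical names, and the entry is assumed to carry just a size and a non-log2 alignment.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a constant pool entry. It stores a size and an
// alignment in bytes (not log2) and deliberately has no offset field.
struct PoolEntry {
  uint64_t SizeInBytes;
  unsigned Alignment; // in bytes; must be a power of two
};

// Round Offset up to the next multiple of Align.
inline uint64_t alignTo(uint64_t Offset, unsigned Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 && "alignment must be a power of two");
  return (Offset + Align - 1) & ~uint64_t(Align - 1);
}

// Compute each entry's offset on the fly, in the order the entries will
// actually be emitted (for example, after grouping them by section).
inline std::vector<uint64_t> computeOffsets(const std::vector<PoolEntry> &Entries) {
  std::vector<uint64_t> Offsets;
  uint64_t Offset = 0;
  for (const PoolEntry &E : Entries) {
    Offset = alignTo(Offset, E.Alignment); // padding is inserted only here
    Offsets.push_back(Offset);
    Offset += E.SizeInBytes;
  }
  return Offsets;
}
```

Because the padding depends only on the emission order, two adjacent 4-byte entries with 4-byte alignment end up at offsets 0 and 4 with no padding between them, whatever order they were created in.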
      
      llvm-svn: 66875
  3. Mar 04, 2009
  4. Feb 22, 2009
  5. Feb 18, 2009
  6. Feb 13, 2009
  7. Feb 11, 2009
  8. Feb 10, 2009
  9. Feb 09, 2009
  10. Feb 06, 2009
  11. Feb 03, 2009
  12. Jan 20, 2009
  13. Jan 15, 2009
  14. Jan 09, 2009
  15. Jan 07, 2009
  16. Jan 05, 2009
  17. Dec 23, 2008
  18. Dec 18, 2008
  19. Dec 05, 2008
    • Reason #3 from 60595 doesn't hold true. If we can fold a PIC load from... · 43c08918
      Evan Cheng authored
      Reason #3 from 60595 doesn't hold true. If we can fold a PIC load from constpool into a use, the rewrite happens at the time of the spill (not in VirtRegMap). Later on, if GlobalBaseReg is spilled, the spiller can see that the use reads GlobalBaseReg and do the right thing.
      
      llvm-svn: 60596
    • Effectively undo 60461 in PIC mode which simply transform V_SET0 /... · fd8c4d59
      Evan Cheng authored
      Effectively undo 60461 in PIC mode. That change simply transforms V_SET0 / V_SETALLONES into a load from constpool in order to fold it into restores. This is not safe to do when the PIC base is being used, for a number of reasons:
      1. GlobalBaseReg may have been spilled.
      2. It may not be live at the use.
      3. The spiller doesn't know this is happening, so it won't prevent GlobalBaseReg from being spilled later (that by itself is a nasty hack; it's needed because we don't insert the reload until later).
      
      llvm-svn: 60595
  20. Dec 03, 2008
  21. Dec 02, 2008
  22. Nov 26, 2008
  23. Nov 18, 2008
  24. Oct 27, 2008
  25. Oct 25, 2008
  26. Oct 21, 2008
    • Optimized FCMP_OEQ and FCMP_UNE for x86. · 97d95d6d
      Dan Gohman authored
      Where previously LLVM might emit code like this:
      
              ucomisd %xmm1, %xmm0
              setne   %al
              setp    %cl
              orb     %al, %cl
              jne     .LBB4_2
      
      it now emits this:
      
              ucomisd %xmm1, %xmm0
              jne     .LBB4_2
              jp      .LBB4_2
      
      It has fewer instructions and uses fewer registers, but it does
      have more branches. And in the case that this code is followed by
      a non-fallthrough edge, it may be followed by a jmp instruction,
      resulting in three branch instructions in sequence. Some effort
      is made to avoid this situation.
      
      To achieve this, X86ISelLowering.cpp now recognizes FCMP_OEQ and
      FCMP_UNE in lowered form, and replaces them with code that emits
      two branches, except in the case where it would require converting
      a fall-through edge to an explicit branch.
      
      Also, X86InstrInfo.cpp's branch analysis and transform code now
      knows how to handle blocks with multiple conditional branches. It
      uses loops instead of having fixed checks for up to two
      instructions. It can now analyze and transform code generated
      from FCMP_OEQ and FCMP_UNE.
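
To see why the two instruction sequences above are interchangeable, the flag results of ucomisd can be modelled directly. The sketch below is illustrative only: `Flags`, `ucomisdFlags`, `branchTakenOld`, `branchTakenNew`, and `sequencesAgree` are hypothetical names, and the example is read here as the FCMP_UNE case (branch when the operands are unordered or not equal), which is an interpretation of the listing rather than something the message states.

```cpp
#include <cassert>
#include <cmath>

// Status flags written by ucomisd:
//   unordered (either operand is NaN): ZF=1, PF=1, CF=1
//   equal:                             ZF=1, PF=0, CF=0
//   first operand less than second:    ZF=0, PF=0, CF=1
//   first operand greater than second: ZF=0, PF=0, CF=0
struct Flags {
  bool ZF;
  bool PF;
  bool CF;
};

inline Flags ucomisdFlags(double A, double B) {
  if (std::isnan(A) || std::isnan(B))
    return {true, true, true};
  if (A == B)
    return {true, false, false};
  if (A < B)
    return {false, false, true};
  return {false, false, false};
}

// Old sequence: setne %al; setp %cl; orb %al, %cl; jne
inline bool branchTakenOld(Flags F) {
  bool AL = !F.ZF; // setne
  bool CL = F.PF;  // setp
  CL = CL || AL;   // orb; leaves ZF clear when the result is nonzero
  return CL;       // jne is taken when the orb result is nonzero
}

// New sequence: jne; jp (both to the same target)
inline bool branchTakenNew(Flags F) {
  if (!F.ZF)
    return true; // jne
  return F.PF;   // jp
}

// Returns whether the branch is taken, after checking that both sequences
// agree with each other for the given operands.
inline bool sequencesAgree(double A, double B) {
  Flags F = ucomisdFlags(A, B);
  assert(branchTakenOld(F) == branchTakenNew(F));
  return branchTakenNew(F);
}
```

In C++, `A != B` on doubles is itself an unordered-or-not-equal comparison, so `sequencesAgree(A, B)` should match `A != B` for every input, including NaN operands.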
      
      llvm-svn: 57873
    • When the coalescer is doing rematerializing, have it remove · c835458d
      Dan Gohman authored
      the copy instruction from the instruction list before asking the
      target to create the new instruction. This gets the old instruction
      out of the way so that it doesn't interfere with the target's
      rematerialization code. In the case of x86, this helps it find
      more cases where EFLAGS is not live.
      
      Also, in X86InstrInfo.cpp, teach isSafeToClobberEFLAGS to check
      to see if it reached the end of the block after scanning each
      instruction, instead of just before. This lets it notice when the
      end of the block is only two instructions away, without doing any
      additional scanning.
      
      These changes allow rematerialization to clobber EFLAGS in more
      cases, for example using xor instead of mov to set the return value
      to zero in the included testcase.
      
      llvm-svn: 57872
  27. Oct 17, 2008
    • Define patterns for shld and shrd that match immediate · a39b0a1f
      Dan Gohman authored
      shift counts, and patterns that match dynamic shift counts
      when the subtract is obscured by a truncate node.
      
      Add DAGCombiner support for recognizing rotate patterns
      when the shift counts are defined by truncate nodes.
      
      Fix and simplify the code for commuting shld and shrd
      instructions to work even when the given instruction doesn't
      have a parent, and when the caller needs a new instruction.
      
      These changes allow LLVM to use the shld, shrd, rol, and ror
      instructions on x86 to replace equivalent code using two
      shifts and an or in many more cases.
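
For reference, the "two shifts and an or" idioms that map onto these instructions look roughly like the following at the source level. This is a sketch for 32-bit operands; the function names are hypothetical, and whether a given compiler version actually selects rol, ror, shld, or shrd for them is not guaranteed by this example.

```cpp
#include <cassert>
#include <cstdint>

// Rotate left: the bits shifted out at the top re-enter at the bottom (rol).
inline uint32_t rotl32(uint32_t X, unsigned N) {
  N &= 31;
  return (X << N) | (X >> ((32 - N) & 31));
}

// Rotate right: the bits shifted out at the bottom re-enter at the top (ror).
inline uint32_t rotr32(uint32_t X, unsigned N) {
  N &= 31;
  return (X >> N) | (X << ((32 - N) & 31));
}

// Double-precision shift left (shld): shift Hi left by N and fill the
// vacated low bits from the top of Lo.
inline uint32_t shld32(uint32_t Hi, uint32_t Lo, unsigned N) {
  assert(N < 32 && "shift count out of range");
  if (N == 0)
    return Hi; // avoid the undefined 32-bit shift by 32
  return (Hi << N) | (Lo >> (32 - N));
}

// Double-precision shift right (shrd): shift Lo right by N and fill the
// vacated high bits from the bottom of Hi.
inline uint32_t shrd32(uint32_t Lo, uint32_t Hi, unsigned N) {
  assert(N < 32 && "shift count out of range");
  if (N == 0)
    return Lo;
  return (Lo >> N) | (Hi << (32 - N));
}
```

The complementary count `32 - N` is the subtract the message refers to; when the count is narrowed first, that subtract ends up behind a truncate node.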
      
      llvm-svn: 57662