Skip to content
  1. Mar 13, 2009
    • Rafael Espindola's avatar
      Improve sext and zext of TLS variables. · 71144973
      Rafael Espindola authored
      llvm-svn: 66922
      71144973
    • Chris Lattner's avatar
      generalize this code so that fast isel handles integer truncates to i1, which · 3fb71c8f
      Chris Lattner authored
      codegen to the same thing as integer truncates to i8 (the top bits are 
      just undefined).  This implements rdar://6667338
      
      llvm-svn: 66902
      3fb71c8f
    • Bill Wendling's avatar
      These instructions have special lowering that may lower them to SSE · 798fd56d
      Bill Wendling authored
      instructions. Prevent that if we don't want implicit uses of SSE.
      
      llvm-svn: 66877
      798fd56d
    • Evan Cheng's avatar
      Fix some significant problems with constant pools that resulted in unnecessary... · 1fb8aedd
      Evan Cheng authored
      Fix some significant problems with constant pools that resulted in unnecessary paddings between constant pool entries, larger than necessary alignments (e.g. 8 byte alignment for .literal4 sections), and potentially other issues.
      
      1. ConstantPoolSDNode alignment field is log2 value of the alignment requirement. This is not consistent with other SDNode variants.
      2. MachineConstantPool alignment field is also a log2 value.
      3. However, some places are creating ConstantPoolSDNode with alignment value rather than log2 values. This creates entries with artificially large alignments, e.g. 256 for SSE vector values.
      4. Constant pool entry offsets are computed when they are created. However, asm printer group them by sections. That means the offsets are no longer valid. However, asm printer uses them to determine size of padding between entries.
      5. Asm printer uses expensive data structure multimap to track constant pool entries by sections.
      6. Asm printer iterate over SmallPtrSet when it's emitting constant pool entries. This is non-deterministic.
      
      
      Solutions:
      1. ConstantPoolSDNode alignment field is changed to keep non-log2 value.
      2. MachineConstantPool alignment field is also changed to keep non-log2 value.
      3. Functions that create ConstantPool nodes are passing in non-log2 alignments.
      4. MachineConstantPoolEntry no longer keeps an offset field. It's replaced with an alignment field. Offsets are not computed when constant pool entries are created. They are computed on the fly in asm printer and JIT.
      5. Asm printer uses cheaper data structure to group constant pool entries.
      6. Asm printer compute entry offsets after grouping is done.
      7. Change JIT code to compute entry offsets on the fly.
      
      llvm-svn: 66875
      1fb8aedd
    • Chris Lattner's avatar
      generalize the previous code to use the full generality of LEA · 99cc1337
      Chris Lattner authored
      for i32/i64 expressions (we could also do i16 on cpus where
      i16 lea is fast, but I didn't add this).  On the example, we now
      generate:
      
      _test:
      	movl	4(%esp), %eax
      	cmpl	$42, (%eax)
      	setl	%al
      	movzbl	%al, %eax
      	leal	4(%eax,%eax,8), %eax
      	ret
      
      instead of:
      
      _test:
      	movl	4(%esp), %eax
      	cmpl	$41, (%eax)
      	movl	$4, %ecx
      	movl	$13, %eax
      	cmovg	%ecx, %eax
      	ret
      
      llvm-svn: 66869
      99cc1337
    • Chris Lattner's avatar
      optimize the case of cond ? 42 : 41 and friends. This compiles the · 4be6df5d
      Chris Lattner authored
      example to:
      
      _test:
      	movl	4(%esp), %eax
      	cmpl	$41, (%eax)
      	setg	%al
      	movzbl	%al, %eax
      	orl	$4294967294, %eax
      	ret
      
      instead of:
      
              movl    4(%esp), %eax
              cmpl    $41, (%eax)
      	movl	$4294967294, %ecx
      	movl	$4294967295, %eax
      	cmova	%ecx, %eax
      	ret
      
      which is smaller in code size and faster. rdar://6668608
      
      llvm-svn: 66868
      4be6df5d
    • Dan Gohman's avatar
      Enhance address-mode folding of ISD::ADD to handle cases where the · a1d92423
      Dan Gohman authored
      operands can't both be fully folded at the same time. For example,
      in the included testcase, a global variable is being added with
      an add of two values. The global variable wants RIP-relative
      addressing, so it can't share the address with another base
      register, but it's still possible to fold the initial add.
      
      llvm-svn: 66865
      a1d92423
  2. Mar 12, 2009
  3. Mar 11, 2009
  4. Mar 10, 2009
  5. Mar 08, 2009
  6. Mar 07, 2009
    • Duncan Sands's avatar
      Introduce new linkage types linkonce_odr, weak_odr, common_odr · 12da8ce3
      Duncan Sands authored
      and extern_weak_odr.  These are the same as the non-odr versions,
      except that they indicate that the global will only be overridden
      by an *equivalent* global.  In C, a function with weak linkage can
      be overridden by a function which behaves completely differently.
      This means that IP passes have to skip weak functions, since any
      deductions made from the function definition might be wrong, since
      the definition could be replaced by something completely different
      at link time.   This is not allowed in C++, thanks to the ODR
      (One-Definition-Rule): if a function is replaced by another at
      link-time, then the new function must be the same as the original
      function.  If a language knows that a function or other global can
      only be overridden by an equivalent global, it can give it the
      weak_odr linkage type, and the optimizers will understand that it
      is alright to make deductions based on the function body.  The
      code generators on the other hand map weak and weak_odr linkage
      to the same thing.
      
      llvm-svn: 66339
      12da8ce3
    • Dan Gohman's avatar
      Arithmetic instructions don't set EFLAGS bits OF and CF bits · ff659b5b
      Dan Gohman authored
      the same say the "test" instruction does in overflow cases,
      so eliminating the test is only safe when those bits aren't
      needed, as is the case for COND_E and COND_NE, or if it
      can be proven that no overflow will occur. For now, just
      restrict the optimization to COND_E and COND_NE and don't
      do any overflow analysis.
      
      llvm-svn: 66318
      ff659b5b
  7. Mar 05, 2009
  8. Mar 04, 2009
  9. Mar 03, 2009
  10. Feb 28, 2009
  11. Feb 27, 2009
    • Rafael Espindola's avatar
      Refactor TLS code and add some tests. The tests and expected results are: · 000421ea
      Rafael Espindola authored
       pic |  declaration | linkage  | visibility |
      
      !pic |  declaration | external | default    | tls1.ll     tls2.ll     | local exec
       pic |  declaration | external | default    | tls1-pic.ll tls2-pic.ll | general dynamic
      !pic | !declaration | external | default    | tls3.ll     tls4.ll     | initial exec
       pic | !declaration | external | default    | tls3-pic.ll tls4-pic.ll | general dynamic
      
      !pic |  declaration | external | hidden     | tls7.ll     tls8.ll     | local exec
       pic |  declaration | external | hidden     | X                       | local dynamic
      !pic | !declaration | external | hidden     | tls9.ll     tls10.ll    | local exec
       pic | !declaration | external | hidden     | X                       | local dynamic
      
      !pic |  declaration | internal | default    | tls5.ll     tls6.ll     | local exec
       pic |  declaration | internal | default    | X                       | local dynamic
      
      The ones marked with an X have not been implemented since local dynamic is not implemented.
      
      llvm-svn: 65632
      000421ea
  12. Feb 26, 2009
Loading