  Dec 08, 2012
      Revert the patches adding a popcount loop idiom recognition pass. · 91e47532
      Chandler Carruth authored
      There are still bugs in this pass, as well as other issues that are
      being worked on, but the bugs are crashers that occur pretty easily in
      the wild. Test cases have been sent to the original commit's review
      thread.
      
      This reverts the commits:
        r169671: Fix a logic error.
        r169604: Move the popcnt tests to an X86 subdirectory.
        r168931: Initial commit adding the pass.
      
      llvm-svn: 169683
  Dec 06, 2012
      Replace r169459 with something safer. Rather than having computeMaskedBits · 9ec512d7
      Evan Cheng authored
      understand the target's implementation of any_extend / extload, just generate
      zero_extend in place of any_extend for liveouts when the target knows the
      zero_extend will be implicit (e.g. ARM ldrb / ldrh) or folded (e.g. x86 movz).
      
      rdar://12771555
      
      llvm-svn: 169536
      Let targets provide hooks that compute known zero and ones for any_extend · 5213139f
      Evan Cheng authored
      and extloads. If they are implemented as a zero-extend, or implicitly
      zero-extend, then this can enable more demanded-bits optimizations, e.g.:
      
      define void @foo(i16* %ptr, i32 %a) nounwind {
      entry:
        %tmp1 = icmp ult i32 %a, 100
        br i1 %tmp1, label %bb1, label %bb2
      bb1:
        %tmp2 = load i16* %ptr, align 2
        br label %bb2
      bb2:
        %tmp3 = phi i16 [ 0, %entry ], [ %tmp2, %bb1 ]
        %cmp = icmp ult i16 %tmp3, 24
        br i1 %cmp, label %bb3, label %exit
      bb3:
        call void @bar() nounwind
        br label %exit
      exit:
        ret void
      }
      
      Before this change, this compiled to the following:
              push    {lr}
              mov     r2, #0
              cmp     r1, #99
              bhi     LBB0_2
      @ BB#1:                                 @ %bb1
              ldrh    r2, [r0]
      LBB0_2:                                 @ %bb2
              uxth    r0, r2
              cmp     r0, #23
              bhi     LBB0_4
      @ BB#3:                                 @ %bb3
              bl      _bar
      LBB0_4:                                 @ %exit
              pop     {lr}
              bx      lr
      
      The uxth is not needed since ldrh implicitly zero-extends the high bits. With
      this change it is eliminated.
      
      rdar://12771555
      
      llvm-svn: 169459
  Nov 29, 2012
      rdar://12100355 (part 1) · abcc3704
      Shuxin Yang authored
      This revision attempts to recognize the following population-count pattern:

       while (a) { c++; ... ; a &= a - 1; ... }

      where <c> and <a> could be used multiple times in the loop body.
      
      TODO: On X86-64 and ARM, __builtin_ctpop() is not expanded to an efficient
      instruction sequence; this needs to be improved in following commits.

      Reviewed by Nadav; much appreciated!
      
      llvm-svn: 168931
  Nov 08, 2012
      Add support for RTM from the TSX extension · 73cffddb
      Michael Liao authored
      - Add RTM code generation support through 3 X86 intrinsics:
        xbegin()/xend() to start/end a transaction region, and xabort() to abort a
        transaction region
      
      llvm-svn: 167573
  Oct 19, 2012
      Lower BUILD_VECTOR to SHUFFLE + INSERT_VECTOR_ELT for X86 · 4b7ccfca
      Michael Liao authored
      - If INSERT_VECTOR_ELT is supported (above SSE2, either natively or by a
        custom sequence of legal insns), transform BUILD_VECTOR into SHUFFLE +
        INSERT_VECTOR_ELT if most elements can be built from a SHUFFLE, with few
        (so far 1) elements being inserted.
      
      llvm-svn: 166288
  Oct 16, 2012
      Support v8f32 to v8i8/v8i16 conversion through custom lowering · 02ca3454
      Michael Liao authored
      - Add custom FP_TO_SINT on v8i16 (and v8i8, which is legalized as v8i16 due
        to vector element-wise widening) to reduce DAG-combiner overhead added in
        the X86 backend.
      
      llvm-svn: 166036
      Add __builtin_setjmp/_longjmp support in X86 backend · 97bf363a
      Michael Liao authored
      - Besides its use in SjLj exception handling, __builtin_setjmp/__builtin_longjmp
        is also used as a lightweight replacement for setjmp/longjmp, e.g. to
        implement continuations, user-level threading, etc. The support added in
        this patch ONLY addresses this usage and is NOT intended to support SjLj
        exception handling, as zero-cost DWARF exception handling is used by
        default on X86.
      
      llvm-svn: 165989
  Oct 10, 2012
      Add support for FP_ROUND from v2f64 to v2f32 · e999b865
      Michael Liao authored
      - Due to the constraint in ISD::FP_ROUND that vector element counts must
        match, rounding from v2f64 to v4f32 (after legalization from v2f32) is
        scalarized. Add a customized v2f32 widening to convert it into a
        target-specific X86ISD::VFPROUND to work around this constraint.
      
      llvm-svn: 165631
      Add alternative support for FP_EXTEND from v2f32 to v2f64 · effae0c8
      Michael Liao authored
      - Due to the constraint in ISD::FP_EXTEND that vector element counts must
        match, extending from v2f32 to v2f64 is scalarized. Add a customized
        v2f32 widening to convert it into a target-specific X86ISD::VFPEXT to
        work around this constraint. This patch also reverts a previous attempt
        to fix this issue by recovering the scalarized ISD::FP_EXTEND pattern,
        and thus significantly reduces the overhead of supporting
        non-power-of-2 vector FP extends.
      
      llvm-svn: 165625