Skip to content
  1. Oct 11, 2012
  2. Oct 10, 2012
    • Nadav Rotem's avatar
      Patch by Shuxin Yang <shuxin.llvm@gmail.com>. · 17418964
      Nadav Rotem authored
      Original message:
      
      The attached is the fix to radar://11663049. The optimization can be outlined by following rules:
      
         (select (x != c), e, c) -> select (x != c), e, x),
         (select (x == c), c, e) -> select (x == c), x, e)
      where the <c> is an integer constant.
      
       The reason for this change is that : on x86, conditional-move-from-constant needs two instructions;
      however, conditional-move-from-register need only one instruction.
      
        While the LowerSELECT() sounds to be the most convenient place for this optimization, it turns out to be a bad place. The reason is that by replacing the constant <c> with a symbolic value, it obscure some instruction-combining opportunities which would otherwise be very easy to spot. For that reason, I have to postpone the change to last instruction-combining phase.
      
        The change passes the test of "make check-all -C <build-root/test" and "make -C project/test-suite/SingleSource".
      
      llvm-svn: 165661
      17418964
    • Michael Liao's avatar
      Add support for FP_ROUND from v2f64 to v2f32 · e999b865
      Michael Liao authored
      - Due to the current matching vector elements constraints in
        ISD::FP_ROUND, rounding from v2f64 to v4f32 (after legalization from
        v2f32) is scalarized. Add a customized v2f32 widening to convert it
        into a target-specific X86ISD::VFPROUND to work around this
        constraints.
      
      llvm-svn: 165631
      e999b865
    • Michael Liao's avatar
      Add alternative support for FP_ROUND from v2f32 to v2f64 · effae0c8
      Michael Liao authored
      - Due to the current matching vector elements constraints in ISD::FP_EXTEND,
        rounding from v2f32 to v2f64 is scalarized. Add a customized v2f32 widening
        to convert it into a target-specific X86ISD::VFPEXT to work around this
        constraints. This patch also reverts a previous attempt to fix this issue by
        recovering the scalarized ISD::FP_EXTEND pattern and thus significantly
        reduces the overhead of supporting non-power-2 vector FP extend.
      
      llvm-svn: 165625
      effae0c8
    • Evan Cheng's avatar
  3. Oct 09, 2012
  4. Oct 08, 2012
  5. Oct 04, 2012
  6. Sep 30, 2012
  7. Sep 26, 2012
  8. Sep 25, 2012
  9. Sep 21, 2012
  10. Sep 20, 2012
    • Michael Liao's avatar
      Re-work X86 code generation of atomic ops with spin-loop · 3237662b
      Michael Liao authored
      - Rewrite/merge pseudo-atomic instruction emitters to address the
        following issue:
        * Reduce one unnecessary load in spin-loop
      
          previously the spin-loop looks like
      
              thisMBB:
              newMBB:
                ld  t1 = [bitinstr.addr]
                op  t2 = t1, [bitinstr.val]
                not t3 = t2  (if Invert)
                mov EAX = t1
                lcs dest = [bitinstr.addr], t3  [EAX is implicit]
                bz  newMBB
                fallthrough -->nextMBB
      
          the 'ld' at the beginning of newMBB should be lift out of the loop
          as lcs (or CMPXCHG on x86) will load the current memory value into
          EAX. This loop is refined as:
      
              thisMBB:
                EAX = LOAD [MI.addr]
              mainMBB:
                t1 = OP [MI.val], EAX
                LCMPXCHG [MI.addr], t1, [EAX is implicitly used & defined]
                JNE mainMBB
              sinkMBB:
      
        * Remove immopc as, so far, all pseudo-atomic instructions has
          all-register form only, there is no immedidate operand.
      
        * Remove unnecessary attributes/modifiers in pseudo-atomic instruction
          td
      
        * Fix issues in PR13458
      
      - Add comprehensive tests on atomic ops on various data types.
        NOTE: Some of them are turned off due to missing functionality.
      
      - Revise tests due to the new spin-loop generated.
      
      llvm-svn: 164281
      3237662b
  11. Sep 15, 2012
  12. Sep 13, 2012
  13. Sep 12, 2012
    • Michael Liao's avatar
      Fix PR11985 · abb87d48
      Michael Liao authored
          
      - BlockAddress has no support of BA + offset form and there is no way to
        propagate that offset into machine operand;
      - Add BA + offset support and a new interface 'getTargetBlockAddress' to
        simplify target block address forming;
      - All targets are modified to use new interface and X86 backend is enhanced to
        support BA + offset addressing.
      
      llvm-svn: 163743
      abb87d48
    • Craig Topper's avatar
      Indentation fixes. No functional change. · ad495964
      Craig Topper authored
      llvm-svn: 163682
      ad495964
  14. Sep 11, 2012
  15. Sep 10, 2012
  16. Sep 08, 2012
  17. Sep 06, 2012
  18. Sep 04, 2012
    • Preston Gurd's avatar
      Generic Bypass Slow Div · cdf540d5
      Preston Gurd authored
      - CodeGenPrepare pass for identifying div/rem ops
      - Backend specifies the type mapping using addBypassSlowDivType
      - Enabled only for Intel Atom with O2 32-bit -> 8-bit
      - Replace IDIV with instructions which test its value and use DIVB if the value
      is positive and less than 256.
      - In the case when the quotient and remainder of a divide are used a DIV
      and a REM instruction will be present in the IR. In the non-Atom case
      they are both lowered to IDIVs and CSE removes the redundant IDIV instruction,
      using the quotient and remainder from the first IDIV. However,
      due to this optimization CSE is not able to eliminate redundant
      IDIV instructions because they are located in different basic blocks.
      This is overcome by calculating both the quotient (DIV) and remainder (REM)
      in each basic block that is inserted by the optimization and reusing the result
      values when a subsequent DIV or REM instruction uses the same operands.
      - Test cases check for the presents of the optimization when calculating
      either the quotient, remainder,  or both.
      
      Patch by Tyler Nowicki!
      
      llvm-svn: 163150
      cdf540d5
    • Elena Demikhovsky's avatar
      This patch optimizes shuffle instruction - generates 2 instructions instead of 4. · cbe99bbb
      Elena Demikhovsky authored
      Since this specific shuffle is widely used in many workloads we have ~10% performance on them.
      
      shufflevector <8 x float> %A, <8 x float> %B, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14>
      
      vmovaps (%rdx), %ymm0
      vshufps $8, %ymm0, %ymm0, %ymm0
      vmovaps (%rcx), %ymm1
      vshufps $8, %ymm0, %ymm1, %ymm1
      vunpcklps       %ymm0, %ymm1, %ymm0
      
      vmovaps (%rcx), %ymm0
      vmovsldup       (%rdx), %ymm1
      vblendps        $85, %ymm0, %ymm1, %ymm0
      
      llvm-svn: 163134
      cbe99bbb
Loading