  1. Aug 25, 2016
  2. Aug 24, 2016
    • CodeGen: If Convert blocks that would form a diamond when tail-merged. · a8c7371d
      Kyle Butt authored
      The following function currently relies on tail-merging for
      if-conversion to succeed. The common tail of cond_true and cond_false is
      extracted, which forms a diamond pattern that can be if-converted
      successfully.
      
      If this block does not get extracted, either because tail-merging is
      disabled or the threshold is higher, we should still recognize this
      pattern and if-convert it.
      
      Fixed a regression in the original commit: branches must be un-reversed
      after being reversed, or other conversions go awry.
      
      define i32 @t2(i32 %a, i32 %b) nounwind {
      entry:
              %tmp1434 = icmp eq i32 %a, %b           ; <i1> [#uses=1]
              br i1 %tmp1434, label %bb17, label %bb.outer
      
      bb.outer:               ; preds = %cond_false, %entry
              %b_addr.021.0.ph = phi i32 [ %b, %entry ], [ %tmp10, %cond_false ]
              %a_addr.026.0.ph = phi i32 [ %a, %entry ], [ %a_addr.026.0, %cond_false ]
              br label %bb
      
      bb:             ; preds = %cond_true, %bb.outer
              %indvar = phi i32 [ 0, %bb.outer ], [ %indvar.next, %cond_true ]
              %tmp. = sub i32 0, %b_addr.021.0.ph
              %tmp.40 = mul i32 %indvar, %tmp.
              %a_addr.026.0 = add i32 %tmp.40, %a_addr.026.0.ph
              %tmp3 = icmp sgt i32 %a_addr.026.0, %b_addr.021.0.ph
              br i1 %tmp3, label %cond_true, label %cond_false
      
      cond_true:              ; preds = %bb
              %tmp7 = sub i32 %a_addr.026.0, %b_addr.021.0.ph
              %tmp1437 = icmp eq i32 %tmp7, %b_addr.021.0.ph
              %indvar.next = add i32 %indvar, 1
              br i1 %tmp1437, label %bb17, label %bb
      
      cond_false:             ; preds = %bb
              %tmp10 = sub i32 %b_addr.021.0.ph, %a_addr.026.0
              %tmp14 = icmp eq i32 %a_addr.026.0, %tmp10
              br i1 %tmp14, label %bb17, label %bb.outer
      
      bb17:           ; preds = %cond_false, %cond_true, %entry
              %a_addr.026.1 = phi i32 [ %a, %entry ], [ %tmp7, %cond_true ], [ %a_addr.026.0, %cond_false ]
              ret i32 %a_addr.026.1
      }
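
For orientation, the IR above appears to be a subtraction-based GCD loop. The C below is a rough equivalent reconstructed by hand from the IR; it is an assumption and not part of the commit. It makes the diamond visible: the two arms of the `if` are the blocks that get predicated, and the loop test they both fall into is the common tail.

```c
/* Hand-written approximation of @t2 (an assumption, not from the commit).
   Intended for positive inputs; other inputs may not terminate. */
int t2(int a, int b) {
    while (a != b) {
        if (a > b)
            a -= b;   /* cond_true arm */
        else
            b -= a;   /* cond_false arm */
    }
    return a;         /* bb17 */
}
```

Read this way, the `ite le` / `suble` / `subgt` sequence in the if-converted output corresponds to the two arms, and the trailing `cmp` / `bne` to the shared loop test.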
      
      Without tail-merging or diamond-tail if conversion:
      LBB1_1:                                 @ %bb
                                              @ =>This Inner Loop Header: Depth=1
              cmp     r0, r1
              ble     LBB1_3
      @ BB#2:                                 @ %cond_true
                                              @   in Loop: Header=BB1_1 Depth=1
              subs    r0, r0, r1
              cmp     r1, r0
              it      ne
              cmpne   r0, r1
              bgt     LBB1_4
      LBB1_3:                                 @ %cond_false
                                              @   in Loop: Header=BB1_1 Depth=1
              subs    r1, r1, r0
              cmp     r1, r0
              bne     LBB1_1
      LBB1_4:                                 @ %bb17
              bx      lr
      
      With diamond-tail if conversion, but without tail-merging:
      @ BB#0:                                 @ %entry
              cmp     r0, r1
              it      eq
              bxeq    lr
      LBB1_1:                                 @ %bb
                                              @ =>This Inner Loop Header: Depth=1
              cmp     r0, r1
              ite     le
              suble   r1, r1, r0
              subgt   r0, r0, r1
              cmp     r1, r0
              bne     LBB1_1
      @ BB#2:                                 @ %bb17
              bx      lr
      
      llvm-svn: 279671
    • IfConversion: Rescan diamonds. · 6262ca34
      Kyle Butt authored
      The cost of predicating a diamond is only the instructions that are not shared
      between the two branches. Additionally, if a predicate-clobbering instruction
      occurs in the shared portion of the branches (e.g. a cond move), it may still
      be possible to if-convert the sub-CFG. This change handles these two facts by
      rescanning the non-shared portion of a diamond sub-CFG to recalculate both the
      predication cost and whether both blocks are predicate-clobbering.
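
The cost rule can be illustrated with a toy model. In the sketch below each instruction is reduced to an opaque integer ID, and equal IDs stand for identical instructions. The helper name `unshared_cost` and the array representation are hypothetical and are not LLVM's actual data structures or API. It trims the common prefix and common suffix of the two branches and counts what remains.

```c
#include <stddef.h>

/* Toy model only: t and f hold instruction IDs for the true and false
   branches. Returns the number of instructions not shared between them,
   after discarding the common prefix and the common suffix. */
size_t unshared_cost(const int *t, size_t nt, const int *f, size_t nf) {
    size_t prefix = 0;
    while (prefix < nt && prefix < nf && t[prefix] == f[prefix])
        prefix++;

    /* Do not let the suffix overlap the prefix already counted. */
    size_t suffix = 0;
    while (suffix < nt - prefix && suffix < nf - prefix &&
           t[nt - 1 - suffix] == f[nf - 1 - suffix])
        suffix++;

    return (nt - prefix - suffix) + (nf - prefix - suffix);
}
```

Under this model, branches `{1, 2, 3, 9}` and `{1, 4, 9}` share one leading and one trailing instruction, so only the three middle instructions would count toward the predication cost.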
      
      Fixed 2 bugs before recommitting. Branch instructions must be compared and found
      identical before diamond conversion. Also, predicate-clobbering instructions in
      the shared prefix disqualify a potential diamond conversion. Includes tests
      for both.
      
      llvm-svn: 279670
    • ARM: don't diagnose cbz/cbnz to Thumb functions. · 9c3633f5
      Tim Northover authored
      A branch distance to a Thumb function shouldn't be forced to be odd for
      CBZ/CBNZ instructions because, assuming it's within range, it's going to be a
      valid, even offset.
      
      llvm-svn: 279665
    • AMDGCN/SI: Implement readlane/readfirstlane intrinsics · 75f0968b
      Changpeng Fang authored
      Summary:
        This patch implements readlane/readfirstlane intrinsics.
      TODO: need to define a new register class to consider the case
      that the source could be a vector register or M0.
      
      Reviewed by:
        arsenm and tstellarAMD
      
      Differential Revision:
        http://reviews.llvm.org/D22489
      
      llvm-svn: 279660
    • Use isTargetMachO instead of isTargetDarwin. · 70c6a397
      Rafael Espindola authored
      llvm-svn: 279655
    • [X86][SSE] Add MINSD/MAXSD/MINSS/MAXSS intrinsic scalar load folding support · e14653e1
      Simon Pilgrim authored
      These are no different in load behaviour from the existing ADD/SUB/MUL/DIV scalar ops but were missing from isNonFoldablePartialRegisterLoad.
      
      llvm-svn: 279652