  2. Oct 18, 2016
      Using branch probability to guide critical edge splitting. · ea62ae98
      Dehao Chen authored
      Summary:
      The original heuristic for breaking critical edges during machine sinking is relatively conservative: when there is only one instruction sinkable to the critical edge, the machine sink pass will likely not break the critical edge. This leads to many speculative instructions executed at runtime. However, with profile info, we can model the splitting benefit: if the critical edge has a 50% taken rate, it is always beneficial to split the critical edge to avoid the speculatively executed runtime instructions. This patch uses the profile to guide critical edge splitting in the machine sink pass.
      
      The performance impact on speccpu2006 on Intel sandybridge machines:
      
      spec/2006/fp/C++/444.namd                  25.3  +0.26%
      spec/2006/fp/C++/447.dealII               45.96  -0.10%
      spec/2006/fp/C++/450.soplex               41.97  +1.49%
      spec/2006/fp/C++/453.povray               36.83  -0.96%
      spec/2006/fp/C/433.milc                   23.81  +0.32%
      spec/2006/fp/C/470.lbm                    41.17  +0.34%
      spec/2006/fp/C/482.sphinx3                48.13  +0.69%
      spec/2006/int/C++/471.omnetpp             22.45  +3.25%
      spec/2006/int/C++/473.astar               21.35  -2.06%
      spec/2006/int/C++/483.xalancbmk           36.02  -2.39%
      spec/2006/int/C/400.perlbench              33.7  -0.17%
      spec/2006/int/C/401.bzip2                  22.9  +0.52%
      spec/2006/int/C/403.gcc                   32.42  -0.54%
      spec/2006/int/C/429.mcf                   39.59  +0.19%
      spec/2006/int/C/445.gobmk                 26.98  -0.00%
      spec/2006/int/C/456.hmmer                 24.52  -0.18%
      spec/2006/int/C/458.sjeng                 28.26  +0.02%
      spec/2006/int/C/462.libquantum            55.44  +3.74%
      spec/2006/int/C/464.h264ref               46.67  -0.39%
      
      geometric mean                                   +0.20%
      
      Manually checked 473 and 471 to verify the diff is in the noise range.
      
      Reviewers: rengolin, davidxl
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D24818
      
      llvm-svn: 284541
  3. Aug 12, 2016
      Recommit 'Remove the restriction that MachineSinking is now stopped by · 7e103d92
      Wei Mi authored
      "insert_subreg, subreg_to_reg, and reg_sequence" instructions' after
      adjusting some unittest checks.
      
      This is to solve PR28852. The restriction was added in 2010 to enable better
      register coalescing. We assumed that it is no longer necessary, and testing
      results on x86 supported that assumption.

      We will look closely at any performance impact this brings and are prepared
      to help analyze performance problems found on other architectures.
      
      Differential Revision: https://reviews.llvm.org/D23210
      
      llvm-svn: 278466
  5. Jul 09, 2016
      VirtRegMap: Replace some identity copies with KILL instructions. · 152e7c8b
      Matthias Braun authored
      An identity COPY like this:
         %AL = COPY %AL, %EAX<imp-def>
      has no semantic effect, but encodes liveness information: Further users
      of %EAX only depend on this instruction even though it does not define
      the full register.
      
      Replace the COPY with a KILL instruction in those cases to maintain this
      liveness information. (This reverts a small part of r238588 but this
      time adds a comment explaining why a KILL instruction is useful).
      
      llvm-svn: 274952
  7. Nov 19, 2015
      [CGP] despeculate expensive cttz/ctlz intrinsics · 4699b8ab
      Sanjay Patel authored
      This is another step towards allowing SimplifyCFG to speculate harder, but then have 
      CGP clean things up if the target doesn't like it.
      
      Previous patches in this series:
      http://reviews.llvm.org/D12882
      http://reviews.llvm.org/D13297
      
      D13297 should catch most expensive ops, but speculation of cttz/ctlz requires special
      handling because of weirdness in the intrinsic definition for handling a zero input 
      (that definition can probably be blamed on x86).
      
      For example, if we have the usual speculated-by-select expensive op pattern like this:
      
        %tobool = icmp eq i64 %A, 0
        %0 = tail call i64 @llvm.cttz.i64(i64 %A, i1 true)   ; is_zero_undef == true
        %cond = select i1 %tobool, i64 64, i64 %0
        ret i64 %cond
      
      There's an instcombine that will turn it into:
      
        %0 = tail call i64 @llvm.cttz.i64(i64 %A, i1 false)   ; is_zero_undef == false
      
      This CGP patch is looking for that case and despeculating it back into:
      
        entry:
          %tobool = icmp eq i64 %A, 0
          br i1 %tobool, label %cond.end, label %cond.true
      
        cond.true:
          %0 = tail call i64 @llvm.cttz.i64(i64 %A, i1 true)    ; is_zero_undef == true
          br label %cond.end
      
        cond.end:
          %cond = phi i64 [ %0, %cond.true ], [ 64, %entry ]
          ret i64 %cond
      
      This unfortunately may lead to poorer codegen (see the changes in the existing x86 test), 
      but if we increase speculation in SimplifyCFG (the next step in this patch series), then
      we should avoid those kinds of cases in the first place.
      
      The need for this patch was originally mentioned here:
      http://reviews.llvm.org/D7506
      with follow-up here:
      http://reviews.llvm.org/D7554
      
      Differential Revision: http://reviews.llvm.org/D14630
      
      llvm-svn: 253573
  9. Jul 14, 2013
      Mass update to CodeGen tests to use CHECK-LABEL for labels corresponding to... · d24ab20e
      Stephen Lin authored
      Mass update to CodeGen tests to use CHECK-LABEL for labels corresponding to function definitions for more informative error messages. No functionality change and all updated tests passed locally.
      
      This update was done with the following bash script:
      
        find test/CodeGen -name "*.ll" | \
        while read NAME; do
          echo "$NAME"
          if ! grep -q "^; *RUN: *llc.*debug" $NAME; then
            TEMP=`mktemp -t temp`
            cp $NAME $TEMP
            sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
            while read FUNC; do
              sed -i '' "s/;\(.*\)\([A-Za-z0-9_-]*\):\( *\)$FUNC: *\$/;\1\2-LABEL:\3$FUNC:/g" $TEMP
            done
            sed -i '' "s/;\(.*\)-LABEL-LABEL:/;\1-LABEL:/" $TEMP
            sed -i '' "s/;\(.*\)-NEXT-LABEL:/;\1-NEXT:/" $TEMP
            sed -i '' "s/;\(.*\)-NOT-LABEL:/;\1-NOT:/" $TEMP
            sed -i '' "s/;\(.*\)-DAG-LABEL:/;\1-DAG:/" $TEMP
            mv $TEMP $NAME
          fi
        done
      
      llvm-svn: 186280
  10. Dec 24, 2011
      Use standard promotion for i8 CTTZ nodes and i8 CTLZ nodes when the · a3d54fe0
      Chandler Carruth authored
      LZCNT instructions are available. Force promotion to i32 to get
      a smaller encoding since the fix-ups necessary are just as complex for
      either promoted type.
      
      We can't do standard promotion for CTLZ when lowering through BSR
      because it results in poor code surrounding the 'xor' at the end of this
      instruction. Essentially, if we promote the entire CTLZ node to i32, we
      end up doing the xor on a 32-bit CTLZ implementation, and then
      subtracting appropriately to get back to an i8 value. Instead, our
      custom logic just uses the knowledge of the incoming size to compute
      a perfect xor. I'd love to know of a way to fix this, but so far I'm
      drawing a blank. I suspect the legalizer could be more clever and/or it
      could collude with the DAG combiner, but how... ;]
      
      llvm-svn: 147251
      Add systematic testing for cttz as well, and fix the bug I spotted by · 38ce2445
      Chandler Carruth authored
      inspection earlier.
      
      llvm-svn: 147250
    • Chandler Carruth · 103ca80f
      Tidy up this rather crufty test. Put the declarations at the top to make · 44cf0722
      Chandler Carruth authored
      my C-brain happy. Remove the unnecessary bits of pedantic IR fluff like
      nounwind. Remove stray 'uses' comments. Name things semantically rather
      than tN so that adding a new test in the middle doesn't cause pain, and
      so that new tests can be grouped semantically.
      
      This exposes how little systematic testing is going on here. I noticed
      this by finding several bugs via inspection and wondering why this test
      wasn't catching any of them. =[
      
      llvm-svn: 147248
      Switch the lowering of CTLZ_ZERO_UNDEF from a .td pattern back to the · 7e9453e9
      Chandler Carruth authored
      X86ISelLowering C++ code. Because this is lowered via an xor wrapped
      around a bsr, we want the dagcombine which runs after isel lowering to
      have a chance to clean things up. In particular, it is very common to
      see code which looks like:
      
        (sizeof(x)*8 - 1) ^ __builtin_clz(x)
      
      Which is trying to compute the most significant bit of 'x'. That's
      actually the value computed directly by the 'bsr' instruction, but if we
      match it too late, we'll get completely redundant xor instructions.
      
      The more naive code for the above (subtracting rather than using an xor)
      still isn't handled correctly due to the dagcombine getting confused.
      
      Also, while here fix an issue spotted by inspection: we should have been
      expanding the zero-undef variants to the normal variants when there is
      an 'lzcnt' instruction. Do so, and test for this. We don't want to
      generate unnecessary 'bsr' instructions.
      
      These two changes fix some regressions in encoding and decoding
      benchmarks. However, there is still a *lot* to be improved on in this
      type of code.
      
      llvm-svn: 147244
  11. Dec 20, 2011
      Begin teaching the X86 target how to efficiently codegen patterns that · 24680c24
      Chandler Carruth authored
      use the zero-undefined variants of CTTZ and CTLZ. These are just simple
      patterns for now, there is more to be done to make real world code using
      these constructs be optimized and codegen'ed properly on X86.
      
      The existing tests are spiffed up to check that we no longer generate
      unnecessary cmov instructions, and that we generate the very important
      'xor' to transform bsr which counts the index of the most significant
      one bit to the number of leading (most significant) zero bits. Also they
      now check that when the variant with defined zero result is used, the
      cmov is still produced.
      
      llvm-svn: 146974
  12. Dec 12, 2011
      Manually upgrade the test suite to specify the flag to cttz and ctlz. · 6b0e34c4
      Chandler Carruth authored
      I followed three heuristics for deciding whether to set 'true' or
      'false':
      
      - Everything target independent got 'true' as that is the expected
        common output of the GCC builtins.
      - If the target arch only has one way of implementing this operation,
        set the flag in the way that exercises the most of codegen. For most
        architectures this is also the likely path from a GCC builtin, with
        'true' being set. It will (eventually) require lowering away that
        difference, and then lowering to the architecture's operation.
      - Otherwise, set the flag differently depending on which target
        operation should be tested.
      
      Let me know if anyone has any issue with this pattern or would like
      specific tests of another form. This should allow the x86 codegen to
      just iteratively improve as I teach the backend how to differentiate
      between the two forms, and everything else should remain exactly the
      same.
      
      llvm-svn: 146370