  2. Oct 18, 2016
      Using branch probability to guide critical edge splitting. · ea62ae98
      Dehao Chen authored
      Summary:
      The original heuristic for breaking critical edges during machine sinking is relatively conservative: when there is only one instruction sinkable to the critical edge, the machine sink pass will likely not break the critical edge. This leads to many speculative instructions executed at runtime. However, with profile info, we can model the splitting benefit: if the critical edge has a 50% taken rate, it is always beneficial to split the critical edge to avoid the speculatively executed runtime instructions. This patch uses the profile to guide critical edge splitting in the machine sink pass.
      
      The performance impact on speccpu2006 on Intel sandybridge machines:
      
      spec/2006/fp/C++/444.namd                  25.3  +0.26%
      spec/2006/fp/C++/447.dealII               45.96  -0.10%
      spec/2006/fp/C++/450.soplex               41.97  +1.49%
      spec/2006/fp/C++/453.povray               36.83  -0.96%
      spec/2006/fp/C/433.milc                   23.81  +0.32%
      spec/2006/fp/C/470.lbm                    41.17  +0.34%
      spec/2006/fp/C/482.sphinx3                48.13  +0.69%
      spec/2006/int/C++/471.omnetpp             22.45  +3.25%
      spec/2006/int/C++/473.astar               21.35  -2.06%
      spec/2006/int/C++/483.xalancbmk           36.02  -2.39%
      spec/2006/int/C/400.perlbench              33.7  -0.17%
      spec/2006/int/C/401.bzip2                  22.9  +0.52%
      spec/2006/int/C/403.gcc                   32.42  -0.54%
      spec/2006/int/C/429.mcf                   39.59  +0.19%
      spec/2006/int/C/445.gobmk                 26.98  -0.00%
      spec/2006/int/C/456.hmmer                 24.52  -0.18%
      spec/2006/int/C/458.sjeng                 28.26  +0.02%
      spec/2006/int/C/462.libquantum            55.44  +3.74%
      spec/2006/int/C/464.h264ref               46.67  -0.39%
      
      geometric mean                                   +0.20%
      
      Manually checked 473 and 471 to verify the diff is in the noise range.
      
      Reviewers: rengolin, davidxl
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D24818
      
      llvm-svn: 284541
  3. Aug 12, 2016
      Recommit 'Remove the restriction that MachineSinking is now stopped by · 7e103d92
      Wei Mi authored
      "insert_subreg, subreg_to_reg, and reg_sequence" instructions' after
      adjusting some unittest checks.
      
      This is to solve PR28852. The restriction was added in 2010 to enable better
      register coalescing. We assumed that it is no longer necessary, and testing
      results on x86 supported that assumption.

      We will look closely at any performance impact this brings and are prepared
      to help analyze performance problems found on other architectures.
      
      Differential Revision: https://reviews.llvm.org/D23210
      
      llvm-svn: 278466
  5. Jul 09, 2016
      VirtRegMap: Replace some identity copies with KILL instructions. · 152e7c8b
      Matthias Braun authored
      An identity COPY like this:
         %AL = COPY %AL, %EAX<imp-def>
      has no semantic effect, but encodes liveness information: Further users
      of %EAX only depend on this instruction even though it does not define
      the full register.
      
      Replace the COPY with a KILL instruction in those cases to maintain this
      liveness information. (This reverts a small part of r238588 but this
      time adds a comment explaining why a KILL instruction is useful).
      
      llvm-svn: 274952
  7. Nov 19, 2015
      [CGP] despeculate expensive cttz/ctlz intrinsics · 4699b8ab
      Sanjay Patel authored
      This is another step towards allowing SimplifyCFG to speculate harder, but then have 
      CGP clean things up if the target doesn't like it.
      
      Previous patches in this series:
      http://reviews.llvm.org/D12882
      http://reviews.llvm.org/D13297
      
      D13297 should catch most expensive ops, but speculation of cttz/ctlz requires special
      handling because of weirdness in the intrinsic definition for handling a zero input 
      (that definition can probably be blamed on x86).
      
      For example, if we have the usual speculated-by-select expensive op pattern like this:
      
        %tobool = icmp eq i64 %A, 0
        %0 = tail call i64 @llvm.cttz.i64(i64 %A, i1 true)   ; is_zero_undef == true
        %cond = select i1 %tobool, i64 64, i64 %0
        ret i64 %cond
      
      There's an instcombine that will turn it into:
      
        %0 = tail call i64 @llvm.cttz.i64(i64 %A, i1 false)   ; is_zero_undef == false
      
      This CGP patch is looking for that case and despeculating it back into:
      
        entry:
          %tobool = icmp eq i64 %A, 0
          br i1 %tobool, label %cond.end, label %cond.true
      
        cond.true:
          %0 = tail call i64 @llvm.cttz.i64(i64 %A, i1 true)    ; is_zero_undef == true
          br label %cond.end
      
        cond.end:
          %cond = phi i64 [ %0, %cond.true ], [ 64, %entry ]
          ret i64 %cond
      
      This unfortunately may lead to poorer codegen (see the changes in the existing x86 test), 
      but if we increase speculation in SimplifyCFG (the next step in this patch series), then
      we should avoid those kinds of cases in the first place.
      
      The need for this patch was originally mentioned here:
      http://reviews.llvm.org/D7506
      with follow-up here:
      http://reviews.llvm.org/D7554
      
      Differential Revision: http://reviews.llvm.org/D14630
      
      llvm-svn: 253573
  9. Jul 14, 2013
      Mass update to CodeGen tests to use CHECK-LABEL for labels corresponding to... · d24ab20e
      Stephen Lin authored
      Mass update to CodeGen tests to use CHECK-LABEL for labels corresponding to function definitions for more informative error messages. No functionality change and all updated tests passed locally.
      
      This update was done with the following bash script:
      
        find test/CodeGen -name "*.ll" | \
        while read NAME; do
          echo "$NAME"
          if ! grep -q "^; *RUN: *llc.*debug" $NAME; then
            TEMP=`mktemp -t temp`
            cp $NAME $TEMP
            sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
            while read FUNC; do
              sed -i '' "s/;\(.*\)\([A-Za-z0-9_-]*\):\( *\)$FUNC: *\$/;\1\2-LABEL:\3$FUNC:/g" $TEMP
            done
            sed -i '' "s/;\(.*\)-LABEL-LABEL:/;\1-LABEL:/" $TEMP
            sed -i '' "s/;\(.*\)-NEXT-LABEL:/;\1-NEXT:/" $TEMP
            sed -i '' "s/;\(.*\)-NOT-LABEL:/;\1-NOT:/" $TEMP
            sed -i '' "s/;\(.*\)-DAG-LABEL:/;\1-DAG:/" $TEMP
            mv $TEMP $NAME
          fi
        done
      
      llvm-svn: 186280
  10. Dec 24, 2011
      Use standard promotion for i8 CTTZ nodes and i8 CTLZ nodes when the · a3d54fe0
      Chandler Carruth authored
      LZCNT instructions are available. Force promotion to i32 to get
      a smaller encoding since the fix-ups necessary are just as complex for
      either promoted type.
      
      We can't do standard promotion for CTLZ when lowering through BSR
      because it results in poor code surrounding the 'xor' at the end of this
      instruction. Essentially, if we promote the entire CTLZ node to i32, we
      end up doing the xor on a 32-bit CTLZ implementation, and then
      subtracting appropriately to get back to an i8 value. Instead, our
      custom logic just uses the knowledge of the incoming size to compute
      a perfect xor. I'd love to know of a way to fix this, but so far I'm
      drawing a blank. I suspect the legalizer could be more clever and/or it
      could collude with the DAG combiner, but how... ;]
      
      llvm-svn: 147251
      Add systematic testing for cttz as well, and fix the bug I spotted by · 38ce2445
      Chandler Carruth authored
      inspection earlier.
      
      llvm-svn: 147250
    • Chandler Carruth · 103ca80f
      Tidy up this rather crufty test. Put the declarations at the top to make · 44cf0722
      Chandler Carruth authored
      my C-brain happy. Remove the unnecessary bits of pedantic IR fluff like
      nounwind. Remove stray 'uses' comments. Name things semantically rather
      than tN so that adding a new test in the middle doesn't cause pain, and
      so that new tests can be grouped semantically.
      
      This exposes how little systematic testing is going on here. I noticed
      this by finding several bugs via inspection and wondering why this test
      wasn't catching any of them. =[
      
      llvm-svn: 147248
      Switch the lowering of CTLZ_ZERO_UNDEF from a .td pattern back to the · 7e9453e9
      Chandler Carruth authored
      X86ISelLowering C++ code. Because this is lowered via an xor wrapped
      around a bsr, we want the dagcombine which runs after isel lowering to
      have a chance to clean things up. In particular, it is very common to
      see code which looks like:
      
        (sizeof(x)*8 - 1) ^ __builtin_clz(x)
      
      Which is trying to compute the most significant bit of 'x'. That's
      actually the value computed directly by the 'bsr' instruction, but if we
      match it too late, we'll get completely redundant xor instructions.
      
      The more naive code for the above (subtracting rather than using an xor)
      still isn't handled correctly due to the dagcombine getting confused.
      
      Also, while here fix an issue spotted by inspection: we should have been
      expanding the zero-undef variants to the normal variants when there is
      an 'lzcnt' instruction. Do so, and test for this. We don't want to
      generate unnecessary 'bsr' instructions.
      
      These two changes fix some regressions in encoding and decoding
      benchmarks. However, there is still a *lot* to be improved on in this
      type of code.
      
      llvm-svn: 147244
  11. Dec 20, 2011
      Begin teaching the X86 target how to efficiently codegen patterns that · 24680c24
      Chandler Carruth authored
      use the zero-undefined variants of CTTZ and CTLZ. These are just simple
      patterns for now, there is more to be done to make real world code using
      these constructs be optimized and codegen'ed properly on X86.
      
      The existing tests are spiffed up to check that we no longer generate
      unnecessary cmov instructions, and that we generate the very important
      'xor' to transform bsr which counts the index of the most significant
      one bit to the number of leading (most significant) zero bits. Also they
      now check that when the variant with defined zero result is used, the
      cmov is still produced.
      
      llvm-svn: 146974
  12. Dec 12, 2011
      Manually upgrade the test suite to specify the flag to cttz and ctlz. · 6b0e34c4
      Chandler Carruth authored
      I followed three heuristics for deciding whether to set 'true' or
      'false':
      
      - Everything target independent got 'true' as that is the expected
        common output of the GCC builtins.
      - If the target arch only has one way of implementing this operation,
        set the flag in the way that exercises the most of codegen. For most
        architectures this is also the likely path from a GCC builtin, with
        'true' being set. It will (eventually) require lowering away that
        difference, and then lowering to the architecture's operation.
      - Otherwise, set the flag differently depending on which target
        operation should be tested.
      
      Let me know if anyone has any issue with this pattern or would like
      specific tests of another form. This should allow the x86 codegen to
      just iteratively improve as I teach the backend how to differentiate
      between the two forms, and everything else should remain exactly the
      same.
      
      llvm-svn: 146370