  1. Oct 18, 2016
    • Using branch probability to guide critical edge splitting. · ea62ae98
      Dehao Chen authored
      Summary:
      The original heuristic for breaking critical edges during machine sinking is relatively conservative: when there is only one instruction sinkable to the critical edge, the machine sink pass will likely not break the edge, leaving many speculative instructions to be executed at runtime. With profile info, however, we can model the benefit of splitting: if the critical edge has a 50% taken rate, it is always beneficial to split it and avoid the speculatively executed instructions. This patch uses profile data to guide critical edge splitting in the machine sink pass.
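
      As a rough illustration, the decision can be modeled as weighing the
      work wasted by speculation against the cost of the block the split
      introduces. This is a hedged sketch with assumed names and cost model,
      not the actual MachineSink code:

      #include <cstdint>

      // Instructions sunk into the split block run only when the edge is
      // taken; left in the source block they run unconditionally, so
      // (1 - EdgeProb) of their executions are wasted speculation.
      static bool splitIsProfitable(double EdgeProb, uint64_t SrcFreq,
                                    unsigned NumSinkable, double SplitCost) {
        double WastedWork = (1.0 - EdgeProb) * SrcFreq * NumSinkable;
        return WastedWork > SplitCost;  // at a ~50% taken rate this wins
      }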
      
      The performance impact on SPEC CPU2006 on Intel Sandy Bridge machines:
      
      spec/2006/fp/C++/444.namd                  25.3  +0.26%
      spec/2006/fp/C++/447.dealII               45.96  -0.10%
      spec/2006/fp/C++/450.soplex               41.97  +1.49%
      spec/2006/fp/C++/453.povray               36.83  -0.96%
      spec/2006/fp/C/433.milc                   23.81  +0.32%
      spec/2006/fp/C/470.lbm                    41.17  +0.34%
      spec/2006/fp/C/482.sphinx3                48.13  +0.69%
      spec/2006/int/C++/471.omnetpp             22.45  +3.25%
      spec/2006/int/C++/473.astar               21.35  -2.06%
      spec/2006/int/C++/483.xalancbmk           36.02  -2.39%
      spec/2006/int/C/400.perlbench              33.7  -0.17%
      spec/2006/int/C/401.bzip2                  22.9  +0.52%
      spec/2006/int/C/403.gcc                   32.42  -0.54%
      spec/2006/int/C/429.mcf                   39.59  +0.19%
      spec/2006/int/C/445.gobmk                 26.98  -0.00%
      spec/2006/int/C/456.hmmer                 24.52  -0.18%
      spec/2006/int/C/458.sjeng                 28.26  +0.02%
      spec/2006/int/C/462.libquantum            55.44  +3.74%
      spec/2006/int/C/464.h264ref               46.67  -0.39%
      
      geometric mean                                   +0.20%
      
      Manually checked 473 and 471 to verify the diff is in the noise range.
      
      Reviewers: rengolin, davidxl
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D24818
      
      llvm-svn: 284541
  2. Oct 11, 2016
    • Codegen: Tail-duplicate during placement. · 0846e56e
      Kyle Butt authored
      The tail duplication pass uses an assumed layout when making duplication
      decisions. This is fine, but passes up duplication opportunities that
      may arise when blocks are outlined. Because we want the updated CFG to
      affect subsequent placement decisions, this change must occur during
      placement.
      
      In order to achieve this goal, TailDuplicationPass is split into a
      utility class, TailDuplicator, and the pass itself. The pass delegates
      nearly everything to the TailDuplicator object, except for looping over
      the blocks in a function. This allows the same code to be used for tail
      duplication in both places.
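
      A rough sketch of the resulting structure (signatures simplified and
      the iteration details assumed; the real interfaces live in
      TailDuplicator):

      #include "llvm/ADT/STLExtras.h"
      #include "llvm/CodeGen/TailDuplicator.h"
      using namespace llvm;

      // The pass keeps only the loop over blocks; all duplication logic
      // lives in the reusable TailDuplicator, which MachineBlockPlacement
      // can now also drive during layout.
      static bool runTailDupPass(MachineFunction &MF, TailDuplicator &TD) {
        bool Changed = false;
        for (MachineBasicBlock &MBB : llvm::make_early_inc_range(MF))
          Changed |= TD.tailDuplicateAndUpdate(/*IsSimple=*/false, &MBB);
        return Changed;
      }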
      
      This change, in concert with outlining optional branches, allows
      triangle-shaped code to perform much better, especially when the
      taken/untaken branches are correlated, as it creates a second spine
      when the tests are small enough.
      
      The issue from the previous rollback has been fixed, and a new test was
      added for that case. It was a worklist/scheduling/tail-duplication
      interaction in layout.
      
      The issue from the second rollback has been fixed, with two additional
      tests. It was an interaction between tail merging, loop info, and tail
      duplication on loops that share a header block.
      
      The issue with early tail duplication of blocks that branch to a
      fallthrough predecessor is fixed, with test case
      tail-dup-branch-to-fallthrough.ll.
      
      Differential revision: https://reviews.llvm.org/D18226
      
      llvm-svn: 283934
    • Revert "Codegen: Tail-duplicate during placement." · 0c42dc47
      Daniel Jasper authored
      This reverts commit r283842.
      
      test/CodeGen/X86/tail-dup-repeat.ll causes an llc crash in our
      internal testing. I'll share a link with you.
      
      llvm-svn: 283857
    • Codegen: Tail-duplicate during placement. · ae068a32
      Kyle Butt authored
      The tail duplication pass uses an assumed layout when making duplication
      decisions. This is fine, but passes up duplication opportunities that
      may arise when blocks are outlined. Because we want the updated CFG to
      affect subsequent placement decisions, this change must occur during
      placement.
      
      In order to achieve this goal, TailDuplicationPass is split into a
      utility class, TailDuplicator, and the pass itself. The pass delegates
      nearly everything to the TailDuplicator object, except for looping over
      the blocks in a function. This allows the same code to be used for tail
      duplication in both places.
      
      This change, in concert with outlining optional branches, allows
      triangle-shaped code to perform much better, especially when the
      taken/untaken branches are correlated, as it creates a second spine
      when the tests are small enough.
      
      The issue from the previous rollback has been fixed, and a new test was
      added for that case. It was a worklist/scheduling/tail-duplication
      interaction in layout.
      
      The issue from the second rollback has been fixed, with two additional
      tests. It was an interaction between tail merging, loop info, and tail
      duplication on loops that share a header block.
      
      The issue with early tail duplication of blocks that branch to a
      fallthrough predecessor is fixed, with test case
      tail-dup-branch-to-fallthrough.ll.
      
      Differential revision: https://reviews.llvm.org/D18226
      
      llvm-svn: 283842
  3. Oct 08, 2016
    • Revert "Codegen: Tail-duplicate during placement." · 2facd194
      Kyle Butt authored
      This reverts commit 71c312652c10f1855b28d06697c08d47e7a243e4.
      
      llvm-svn: 283647
    • Codegen: Tail-duplicate during placement. · 37e676d8
      Kyle Butt authored
      The tail duplication pass uses an assumed layout when making duplication
      decisions. This is fine, but passes up duplication opportunities that
      may arise when blocks are outlined. Because we want the updated CFG to
      affect subsequent placement decisions, this change must occur during
      placement.
      
      In order to achieve this goal, TailDuplicationPass is split into a
      utility class, TailDuplicator, and the pass itself. The pass delegates
      nearly everything to the TailDuplicator object, except for looping over
      the blocks in a function. This allows the same code to be used for tail
      duplication in both places.
      
      This change, in concert with outlining optional branches, allows
      triangle-shaped code to perform much better, especially when the
      taken/untaken branches are correlated, as it creates a second spine
      when the tests are small enough.
      
      The issue from the previous rollback has been fixed, and a new test was
      added for that case. It was a worklist/scheduling/tail-duplication
      interaction in layout.
      
      The issue from the second rollback has been fixed, with two additional
      tests. It was an interaction between tail merging, loop info, and tail
      duplication on loops that share a header block.
      
      Differential revision: https://reviews.llvm.org/D18226
      
      llvm-svn: 283619
  4. Oct 05, 2016
    • Revert "Codegen: Tail-duplicate during placement." · 25ac35d8
      Kyle Butt authored
      This reverts commit 062ace9764953e9769142c1099281a345f9b6bdc.
      
      An issue with loop info and block removal was revealed by polly.
      I already have a fix for it in another patch; I'll re-roll this
      together with that fix and a test case.
      
      llvm-svn: 283292
    • Codegen: Tail-duplicate during placement. · adabac2d
      Kyle Butt authored
      The tail duplication pass uses an assumed layout when making duplication
      decisions. This is fine, but passes up duplication opportunities that
      may arise when blocks are outlined. Because we want the updated CFG to
      affect subsequent placement decisions, this change must occur during
      placement.
      
      In order to achieve this goal, TailDuplicationPass is split into a
      utility class, TailDuplicator, and the pass itself. The pass delegates
      nearly everything to the TailDuplicator object, except for looping over
      the blocks in a function. This allows the same code to be used for tail
      duplication in both places.
      
      This change, in concert with outlining optional branches, allows
      triangle-shaped code to perform much better, especially when the
      taken/untaken branches are correlated, as it creates a second spine
      when the tests are small enough.
      
      The issue from the previous rollback has been fixed, and a new test was
      added for that case as well.
      
      Differential revision: https://reviews.llvm.org/D18226
      
      llvm-svn: 283274
  5. Oct 04, 2016
    • Revert "Codegen: Tail-duplicate during placement." · 3ffb8529
      Kyle Butt authored
      This reverts commit ff234efbe23528e4f4c80c78057b920a51f434b2.
      
      This was causing crashes on the AArch64 build.
      
      llvm-svn: 283172
    • Codegen: Tail-duplicate during placement. · 396bfdd7
      Kyle Butt authored
      The tail duplication pass uses an assumed layout when making duplication
      decisions. This is fine, but passes up duplication opportunities that
      may arise when blocks are outlined. Because we want the updated CFG to
      affect subsequent placement decisions, this change must occur during
      placement.
      
      In order to achieve this goal, TailDuplicationPass is split into a
      utility class, TailDuplicator, and the pass itself. The pass delegates
      nearly everything to the TailDuplicator object, except for looping over
      the blocks in a function. This allows the same code to be used for tail
      duplication in both places.
      
      This change, in concert with outlining optional branches, allows
      triangle-shaped code to perform much better, especially when the
      taken/untaken branches are correlated, as it creates a second spine
      when the tests are small enough.
      
      llvm-svn: 283164
  6. Jul 29, 2016
    • Codegen: MachineBlockPlacement Improve probability layout. · 02d8d054
      Kyle Butt authored
      The following pattern was being laid out poorly:
      
                    A
                   / \
                  B   C
                 / \ / \
                D   E   ? (Doesn't matter)
      
      Where A->B is far more likely than A->C, and prob(B->D) = prob(B->E)
      
      The current algorithm gives:
      A,B,C,E (D goes on worklist)
      
      It does this even if C has a frequency count of 0. This patch
      adjusts the layout calculation so that if freq(B->E) >> freq(C->E),
      we go ahead and lay out E rather than C. Fallthrough half the time
      is better than fallthrough never, or fallthrough very rarely. The
      resulting layout is:
      
      A,B,E (C and D are in a worklist)
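
      A minimal sketch of the adjusted comparison; the names and the
      "much hotter" ratio below are assumptions for illustration, not the
      patch's actual constants:

      #include <cstdint>

      // Lay out E after B even though E's other predecessor C is not
      // placed yet, provided the placed edge B->E dominates the unplaced
      // edge C->E in frequency.
      static bool layoutDespiteUnplacedPred(uint64_t FreqPlacedEdge,
                                            uint64_t FreqUnplacedEdge) {
        const unsigned Ratio = 4;  // assumed threshold for ">>"
        return FreqPlacedEdge > FreqUnplacedEdge * Ratio;
      }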
      
      llvm-svn: 277187
  7. Jun 08, 2016
    • Revive http://reviews.llvm.org/D12778 to handle forward-hot-prob and... · 769219b1
      Dehao Chen authored
      Revive http://reviews.llvm.org/D12778 to handle forward-hot-prob and backward-hot-prob consistently.
      
      Summary:
      Consider the following diamond CFG:
      
        A
       / \
      B   C
       \ /
        D
      
      Suppose A->B and A->C have probabilities 81% and 19%. In block-placement, A->B is called a hot edge and the final placement should be ABDC. However, the current implementation outputs ABCD. This is because when choosing the next block of B, it checks if Freq(C->D) > Freq(B->D) * 20%, which is true (if Freq(A) = 100, then Freq(B->D) = 81, Freq(C->D) = 19, and 19 > 81*20%=16.2). Actually, we should use 25% instead of 20% as the probability here, so that we have 19 < 81*25%=20.25, and the desired ABDC layout will be generated.
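
      The arithmetic above can be checked with a small self-contained
      program; the function name and HotProb parameter are illustrative,
      not the pass's actual identifiers:

      #include <cstdint>
      #include <cstdio>

      // C->D is allowed to break the B->D fallthrough only if it is
      // hotter than Freq(B->D) * HotProb; the patch effectively raises
      // HotProb from 20% to 25% so the 81%/19% diamond lays out as ABDC.
      static bool breaksFallthrough(uint64_t FreqCD, uint64_t FreqBD,
                                    double HotProb) {
        return FreqCD > FreqBD * HotProb;
      }

      int main() {
        // Freq(A) = 100, Prob(A->B) = 81%, Prob(A->C) = 19%.
        std::printf("20%%: %d\n", breaksFallthrough(19, 81, 0.20)); // 1 -> ABCD
        std::printf("25%%: %d\n", breaksFallthrough(19, 81, 0.25)); // 0 -> ABDC
      }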
      
      Reviewers: djasper, davidxl
      
      Subscribers: llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D20989
      
      llvm-svn: 272203
  8. Apr 07, 2016
    • Do not select EhPad BB in MachineBlockPlacement when there is regular BB to schedule · c53ad4f3
      Amaury Sechet authored
      Summary:
      EHPad BBs are not entered the classic way and therefore do not need to be placed after their predecessors. This patch makes sure EHPad BBs are not chosen amongst successors to form chains, and that they are selected only as a last resort when choosing the best candidate.
      
      EHPads are scheduled in reverse probability order so that they flow into each other naturally.
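
      A hypothetical sketch of that selection order (the Block type and
      its fields are assumed for illustration):

      #include <vector>

      struct Block { bool IsEHPad; double Freq; };

      // Regular blocks always win; an EH pad is returned only as a last
      // resort, hottest first, so pads flow into each other naturally.
      static Block *pickBestCandidate(const std::vector<Block *> &Candidates) {
        Block *BestNormal = nullptr, *BestPad = nullptr;
        for (Block *B : Candidates) {
          Block *&Best = B->IsEHPad ? BestPad : BestNormal;
          if (!Best || B->Freq > Best->Freq)
            Best = B;
        }
        return BestNormal ? BestNormal : BestPad;
      }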
      
      Reviewers: chandlerc, majnemer, rafael, MatzeB, escha, silvas
      
      Subscribers: llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D17625
      
      llvm-svn: 265726
  9. Apr 05, 2016
    • Don't delete empty preheaders in CodeGenPrepare if it would create a critical edge · d3fb38ca
      Chuang-Yu Cheng authored
      Presently, CodeGenPrepare deletes all nearly empty (only phi and branch)
      basic blocks. This pass can delete loop preheaders, which frequently creates
      critical edges. A preheader can be a convenient place to spill registers to
      the stack. If the entrance to a loop body is a critical edge, then spills
      may occur in the loop body rather than immediately before it. This patch
      protects loop preheaders from deletion in CodeGenPrepare even if they are
      nearly empty.
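
      A minimal sketch of the guard, assuming a helper along these lines
      (not the actual CodeGenPrepare code):

      #include "llvm/Analysis/LoopInfo.h"
      #include "llvm/IR/BasicBlock.h"
      using namespace llvm;

      // Skip empty-block elimination when LoopInfo identifies the block
      // as a loop preheader: deleting it would funnel its predecessors
      // straight into the loop header, creating a critical edge.
      static bool isLoopPreheader(BasicBlock *BB, LoopInfo &LI) {
        if (BasicBlock *Succ = BB->getSingleSuccessor())
          if (Loop *L = LI.getLoopFor(Succ))
            return L->getHeader() == Succ && L->getLoopPreheader() == BB;
        return false;
      }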
      
      Since the patch alters the CFG, it affects a large number of test cases.
      In most cases, the changes are merely cosmetic (basic blocks have different
      names or instruction orders change slightly). I am somewhat concerned about
      the test/CodeGen/Mips/brdelayslot.ll test case. If the loop preheader is not
      deleted, then the MIPS backend does not take advantage of a branch delay
      slot. Consequently, I would like some close review by a MIPS expert.
      
      The patch also partially subsumes D16893 from George Burgess IV. George
      correctly notes that CodeGenPrepare does not actually preserve the dominator
      tree. I think the dominator tree was usually not valid when CodeGenPrepare
      ran, but I am using LoopInfo to mark preheaders, so the dominator tree is
      now always valid before CodeGenPrepare.
      
      Author: Tom Jablin (tjablin)
      Reviewers: hfinkel george.burgess.iv vkalintiris dsanders kbarton cycheng
      
      http://reviews.llvm.org/D16984
      
      llvm-svn: 265397
  10. Mar 23, 2016
    • Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed. · 94710840
      Cong Hou authored
      Currently, AnalyzeBranch() fails on non-equality comparisons between
      floating point values on X86 (see
      https://llvm.org/bugs/show_bug.cgi?id=23875). This is because the function
      can modify the branch by reversing the conditional jump and removing the
      unconditional jump if there is a proper fall-through. However, in the case
      of a non-equality comparison between floating point values, this can make
      the branch "unanalyzable". Consider the following case:
      
      jne .BB1
      jp  .BB1
      jmp .BB2
      .BB1:
      ...
      .BB2:
      ...
      
      AnalyzeBranch() will reverse "jp .BB1" to "jnp .BB2" and then "jmp .BB2" will be
      removed:
      
      jne .BB1
      jnp .BB2
      .BB1:
      ...
      .BB2:
      ...
      
      However, AnalyzeBranch() cannot analyze this branch anymore as there are two
      conditional jumps with different targets. This may disable some optimizations
      like block-placement: in this case the fall-through behavior is enforced even if
      the fall-through block is very cold, which is suboptimal.
      
      This optimization is also done in the block-placement pass, which means
      we can remove it from AnalyzeBranch(). However, X86::COND_NE_OR_P and
      X86::COND_NP_OR_E are currently not reversible: there are no defined
      negation conditions for them.
      
      In order to reverse them, this patch defines two new CondCodes,
      X86::COND_E_AND_NP and X86::COND_P_AND_NE, and defines how to synthesize
      instructions for them. Only the second conditional jump is reversed here;
      this is valid because we only need these codes for the "unconditional
      jump removal" optimization.
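
      Sketched, the new reversal mapping looks like this (an illustrative
      switch; the real table lives in the X86 backend, and the X86 headers
      defining X86::CondCode are assumed):

      #include "llvm/Support/ErrorHandling.h"

      // Negation distributes by De Morgan's law:
      // !(NE || P) == (E && NP) and !(NP || E) == (P && NE).
      static X86::CondCode reverseCondition(X86::CondCode CC) {
        switch (CC) {
        case X86::COND_NE_OR_P:  return X86::COND_E_AND_NP;
        case X86::COND_NP_OR_E:  return X86::COND_P_AND_NE;
        case X86::COND_E_AND_NP: return X86::COND_NE_OR_P;
        case X86::COND_P_AND_NE: return X86::COND_NP_OR_E;
        default: llvm_unreachable("not a combined FP condition");
        }
      }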
      
      
      Differential Revision: http://reviews.llvm.org/D11393
      
      llvm-svn: 264199
  11. Jan 26, 2016
    • Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed. · 551a57f7
      Cong Hou authored
      Currently, AnalyzeBranch() fails on non-equality comparisons between
      floating point values on X86 (see
      https://llvm.org/bugs/show_bug.cgi?id=23875). This is because the function
      can modify the branch by reversing the conditional jump and removing the
      unconditional jump if there is a proper fall-through. However, in the case
      of a non-equality comparison between floating point values, this can make
      the branch "unanalyzable". Consider the following case:
      
      jne .BB1
      jp  .BB1
      jmp .BB2
      .BB1:
      ...
      .BB2:
      ...
      
      AnalyzeBranch() will reverse "jp .BB1" to "jnp .BB2" and then "jmp .BB2" will be
      removed:
      
      jne .BB1
      jnp .BB2
      .BB1:
      ...
      .BB2:
      ...
      
      However, AnalyzeBranch() cannot analyze this branch anymore as there are two
      conditional jumps with different targets. This may disable some optimizations
      like block-placement: in this case the fall-through behavior is enforced even if
      the fall-through block is very cold, which is suboptimal.
      
      This optimization is also done in the block-placement pass, which means
      we can remove it from AnalyzeBranch(). However, X86::COND_NE_OR_P and
      X86::COND_NP_OR_E are currently not reversible: there are no defined
      negation conditions for them.
      
      In order to reverse them, this patch defines two new CondCodes,
      X86::COND_E_AND_NP and X86::COND_P_AND_NE, and defines how to synthesize
      instructions for them. Only the second conditional jump is reversed here;
      this is valid because we only need these codes for the "unconditional
      jump removal" optimization.
      
      
      Differential Revision: http://reviews.llvm.org/D11393
      
      llvm-svn: 258847
    • [MC] Use .p2align instead of .align · 61d15ae4
      Dan Gohman authored
      For historic reasons, the behavior of .align differs between targets.
      Fortunately, there are alternatives, .p2align and .balign, which make the
      interpretation of the parameter explicit and behave consistently across
      targets: .p2align 4 and .balign 16 both request 16-byte alignment
      everywhere, whereas .align 4 means 16 bytes on some targets and 4 bytes
      on others.
      
      This patch teaches MC to use .p2align instead of .align, so that people reading
      code for multiple architectures don't have to remember which way each platform
      does its .align directive.
      
      Differential Revision: http://reviews.llvm.org/D16549
      
      llvm-svn: 258750
  12. Jun 17, 2015
    • Move the personality function from LandingPadInst to Function · 7fddeccb
      David Majnemer authored
      The personality routine currently lives in the LandingPadInst.
      
      This isn't desirable because:
      - All LandingPadInsts in the same function must have the same
        personality routine.  This means that each LandingPadInst beyond the
        first has an operand which produces no additional information.
      
      - There is ongoing work to introduce EH IR constructs other than
        LandingPadInst.  Moving the personality routine off of any one
        particular Instruction and onto the parent function seems a lot better
        than having N different places a personality function can sneak onto an
        exceptional function.
      
      Differential Revision: http://reviews.llvm.org/D10429
      
      llvm-svn: 239940
  13. Mar 05, 2015
    • [MBP] Revert r231238 which attempted to fix a nasty bug where MBP is · af7e99f2
      Chandler Carruth authored
      just arbitrarily interleaving unrelated control flows once they get
      moved "out-of-line" (both outside of natural CFG ordering and with
      diamonds that cannot be fully laid out by chaining fallthrough edges).
      
      This easy solution doesn't work in practice, and it isn't just a small
      bug. It looks like a very different strategy will be required. I'm
      working on that now, and it'll again go behind some flag so that
      everyone can experiment and make sure it is working well for them.
      
      llvm-svn: 231332
  14. Mar 04, 2015
    • [MBP] Fix a really horrible bug in MachineBlockPlacement, but behind · 9a53fbe2
      Chandler Carruth authored
      a flag for now.
      
      First off, thanks to Daniel Jasper for really pointing out the issue
      here. It's been here forever (at least, I think it was there when
      I first wrote this code) without getting really noticed or fixed.
      
      The key problem is what happens when two reasonably common patterns
      happen at the same time: we outline multiple cold regions of code, and
      those regions in turn have diamonds or other CFGs for which we can't
      just topologically lay them out. Consider some C code that looks like:
      
        if (a1()) { if (b1()) c1(); else d1(); f1(); }
        if (a2()) { if (b2()) c2(); else d2(); f2(); }
        done();
      
      Now consider the case where a1() and a2() are unlikely to be true. In
      that case, we might lay out the first part of the function like:
      
        a1, a2, done;
      
      And then we will be out of successors in which to build the chain. We go
      to find the best block to continue the chain with, which is perfectly
      reasonable here, and find "b1" let's say. Laying out successors gets us
      to:
      
        a1, a2, done; b1, c1;
      
      At this point, we will refuse to lay out the successor to c1 (f1)
      because there are still un-placed predecessors of f1 and we want to try
      to preserve the CFG structure. So we go get the next best block, d1.
      
      ... wait for it ...
      
      Except that the next best block *isn't* d1. It is b2! d1 is waaay down
      inside these conditionals. It is much less important than b2. Except
      that this is exactly what we didn't want. If we keep going we get the
      entire set of the rest of the CFG *interleaved*!!!
      
        a1, a2, done; b1, c1; b2, c2; d1, f1; d2, f2;
      
      So we clearly need a better strategy here. =] My current favorite
      strategy is to actually try to place the block whose predecessor is
      closest. This very simply ensures that we unwind these kinds of CFGs the
      way that is natural and fitting, and should minimize the number of cache
      lines instructions are spread across.
      
      It also happens to be *dead simple*. It's like the data structure was
      specifically set up for this use case or something. We only push blocks
      onto the work list when the last predecessor for them is placed into the
      chain. So the back of the worklist *is* the nearest next block.
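
      A tiny sketch of that property (generic types; hypothetical rather
      than the actual MachineBlockPlacement code):

      #include <vector>

      struct Block;  // stands in for MachineBasicBlock

      // Blocks enter the worklist exactly when their last predecessor is
      // placed, so the back of the list is the block whose predecessors
      // were placed most recently -- the "nearest" candidate.
      static Block *pickNearest(std::vector<Block *> &WorkList) {
        if (WorkList.empty())
          return nullptr;
        Block *B = WorkList.back();
        WorkList.pop_back();
        return B;
      }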
      
      Unfortunately, a change like this is going to cause *soooo* many
      benchmarks to swing wildly. So for now I'm adding this under a flag so
      that we and others can validate that this is fixing the problems
      described, that it seems possible to enable, and hopefully that it fixes
      more of our problems long term.
      
      llvm-svn: 231238
  15. Feb 27, 2015
    • [opaque pointer type] Add textual IR support for explicit type parameter to load instruction · a79ac14f
      David Blaikie authored
      Essentially the same as the GEP change in r230786.
      
      A similar migration script can be used to update test cases, though a few more
      test case improvements/changes were required this time around: (r229269-r229278)
      
      # usage: python update.py < test.ll > test.ll.new
      import fileinput  # unused; kept from the original script
      import sys
      import re

      # Group 1 ends with the pointee type (group 2); the substitution repeats
      # it, turning "load <ty>*" into "load <ty>, <ty>*".
      pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")

      for line in sys.stdin:
        sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))
      
      Reviewers: rafael, dexonsmith, grosser
      
      Differential Revision: http://reviews.llvm.org/D7649
      
      llvm-svn: 230794
    • [opaque pointer type] Add textual IR support for explicit type parameter to... · 79e6c749
      David Blaikie authored
      [opaque pointer type] Add textual IR support for explicit type parameter to getelementptr instruction
      
      One of several parallel first steps to remove the target type of pointers,
      replacing them with a single opaque pointer type.
      
      This adds an explicit type parameter to the gep instruction so that when the
      first parameter becomes an opaque pointer type, the type to gep through is
      still available to the instructions.
      
      * This doesn't modify gep operators, only instructions (operators will be
        handled separately)
      
      * Textual IR changes only. Bitcode (including upgrade) and changing the
        in-memory representation will be in separate changes.
      
      * geps of vectors are transformed as:
          getelementptr <4 x float*> %x, ...
        ->getelementptr float, <4 x float*> %x, ...
        Then, once the opaque pointer type is introduced, this will ultimately look
        like:
          getelementptr float, <4 x ptr> %x
        with the unambiguous interpretation that it is a vector of pointers to float.
      
      * address spaces remain on the pointer, not the type:
          getelementptr float addrspace(1)* %x
        ->getelementptr float, float addrspace(1)* %x
        Then, eventually:
          getelementptr float, ptr addrspace(1) %x
      
      Importantly, the massive amount of test case churn has been automated by
      the same crappy python code. I had to manually update a few test cases
      that wouldn't fit the script's model (r228970,r229196,r229197,r229198).
      The python script just massages stdin and writes the result to stdout;
      I then wrapped that in a shell script to handle replacing files, then
      used the usual find+xargs to migrate all the files.
      
      update.py:
      import fileinput
      import sys
      import re
      
      ibrep = re.compile(r"(^.*?[^%\w]getelementptr inbounds )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
      normrep = re.compile(       r"(^.*?[^%\w]getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
      
      def conv(match, line):
        if not match:
          return line
        line = match.groups()[0]
        if len(match.groups()[5]) == 0:
          line += match.groups()[2]
        line += match.groups()[3]
        line += ", "
        line += match.groups()[1]
        line += "\n"
        return line
      
      for line in sys.stdin:
        if line.find("getelementptr ") == line.find("getelementptr inbounds"):
          if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("):
            line = conv(re.match(ibrep, line), line)
        elif line.find("getelementptr ") != line.find("getelementptr ("):
          line = conv(re.match(normrep, line), line)
        sys.stdout.write(line)
      
      apply.sh:
      for name in "$@"
      do
        python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
        rm -f "$name.tmp"
      done
      
      The actual commands:
      From llvm/src:
      find test/ -name '*.ll' | xargs ./apply.sh
      From llvm/src/tools/clang:
      find test/ -name '*.mm' -o -name '*.m' -o -name '*.cpp' -o -name '*.c' | xargs -I '{}' ../../apply.sh "{}"
      From llvm/src/tools/polly:
      find test/ -name '*.ll' | xargs ./apply.sh
      
      After that, check-all (with llvm, clang, clang-tools-extra, lld,
      compiler-rt, and polly all checked out).
      
      The extra 'rm' in the apply.sh script is due to a few files in clang's test
      suite using interesting unicode stuff that my python script was throwing
      exceptions on. None of those files needed to be migrated, so it seemed
      sufficient to ignore those cases.
      
      Reviewers: rafael, dexonsmith, grosser
      
      Differential Revision: http://reviews.llvm.org/D7636
      
      llvm-svn: 230786
  16. Dec 15, 2014
    • IR: Make metadata typeless in assembly · be7ea19b
      Duncan P. N. Exon Smith authored
      Now that `Metadata` is typeless, reflect that in the assembly.  These
      are the matching assembly changes for the metadata/value split in
      r223802.
      
        - Only use the `metadata` type when referencing metadata from a call
          intrinsic -- i.e., only when it's used as a `Value`.
      
        - Stop pretending that `ValueAsMetadata` is wrapped in an `MDNode`
          when referencing it from call intrinsics.
      
      So, assembly like this:
      
          define void @foo(i32 %v) {
            call void @llvm.foo(metadata !{i32 %v}, metadata !0)
            call void @llvm.foo(metadata !{i32 7}, metadata !0)
            call void @llvm.foo(metadata !1, metadata !0)
            call void @llvm.foo(metadata !3, metadata !0)
            call void @llvm.foo(metadata !{metadata !3}, metadata !0)
            ret void, !bar !2
          }
          !0 = metadata !{metadata !2}
          !1 = metadata !{i32* @global}
          !2 = metadata !{metadata !3}
          !3 = metadata !{}
      
      turns into this:
      
          define void @foo(i32 %v) {
            call void @llvm.foo(metadata i32 %v, metadata !0)
            call void @llvm.foo(metadata i32 7, metadata !0)
            call void @llvm.foo(metadata i32* @global, metadata !0)
            call void @llvm.foo(metadata !3, metadata !0)
            call void @llvm.foo(metadata !{!3}, metadata !0)
            ret void, !bar !2
          }
          !0 = !{!2}
          !1 = !{i32* @global}
          !2 = !{!3}
          !3 = !{}
      
      I wrote an upgrade script that handled almost all of the tests in llvm
      and many of the tests in cfe (even handling many `CHECK` lines).  I've
      attached it (or will attach it in a moment if you're speedy) to PR21532
      to help everyone update their out-of-tree testcases.
      
      This is part of PR21532.
      
      llvm-svn: 224257
  17. Jul 23, 2014
    • [SDAG] Make the DAGCombine worklist not grow endlessly due to duplicate · 9a0051cd
      Chandler Carruth authored
      insertions.
      
      The old behavior could cause arbitrarily bad memory usage in the DAG
      combiner if there was heavy traffic of adding nodes already on the
      worklist to it. This commit switches the DAG combine worklist to work
      the same way as the instcombine worklist where we null-out removed
      entries and only add new entries to the worklist. My measurements of
      codegen time shows slight improvement. The memory utilization is
      unsurprisingly dominated by other factors (the IR and DAG itself
      I suspect).
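
      A sketch of that worklist discipline (generic node type and container
      choices assumed; not the DAGCombiner's actual members):

      #include <cstddef>
      #include <unordered_map>
      #include <vector>

      struct Node;

      class CombineWorklist {
        std::vector<Node *> Worklist;
        std::unordered_map<Node *, size_t> Pos;

      public:
        void add(Node *N) {
          // Reject duplicates instead of letting the list grow.
          if (Pos.insert({N, Worklist.size()}).second)
            Worklist.push_back(N);
        }
        void remove(Node *N) {
          auto It = Pos.find(N);
          if (It == Pos.end())
            return;
          Worklist[It->second] = nullptr;  // null out, don't erase
          Pos.erase(It);
        }
        Node *pop() {
          // Skip slots that remove() nulled out.
          while (!Worklist.empty()) {
            Node *N = Worklist.back();
            Worklist.pop_back();
            if (N) {
              Pos.erase(N);
              return N;
            }
          }
          return nullptr;
        }
      };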
      
      This change results in subtle, frustrating churn in the particular order
      in which DAG combines are applied which causes a number of minor
      regressions where we fail to match a pattern previously matched by
      accident. AFAICT, all of these should be using AddToWorklist directly
      or should be written in a less brittle way. None of the changes seem
      drastically bad, and a few of the changes seem distinctly better.
      
      A major change required to make this work is to significantly harden the
      way in which the DAG combiner handle nodes which become dead
      (zero-uses). Previously, we relied on the ability to "priority-bump"
      them on the combine worklist to achieve recursive deletion of these
      nodes and ensure that the frontier of remaining live nodes all were
      added to the worklist. Instead, I've introduced a routine to just
      implement that precise logic with no indirection. It is a significantly
      simpler operation than that of the combiner worklist proper. I suspect
      this will also fix some other problems with the combiner.
      
      I think the x86 changes are really minor and uninteresting, but the
      avx512 change at least is hiding a "regression" (despite the test case
      being just noise, not testing some performance invariant) that might be
      looked into. Not sure if any of the others impact specific "important"
      code paths, but they didn't look terribly interesting to me, or the
      changes were really minor. The consensus in review is to fix any
      regressions that show up after the fact here.
      
      Thanks to the other reviewers for checking the output on other
      architectures. There is a specific regression on ARM that Tim already
      has a fix prepped to commit.
      
      Differential Revision: http://reviews.llvm.org/D4616
      
      llvm-svn: 213727
  18. Jan 24, 2014
    • Fix known typos · cb402911
      Alp Toker authored
      Sweep the codebase for common typos. Includes some changes to visible function
      names that were misspelt.
      
      llvm-svn: 200018
  19. Jul 13, 2013
    • Convert CodeGen/*/*.ll tests to use the new CHECK-LABEL for easier debugging.... · f799e3f9
      Stephen Lin authored
      Convert CodeGen/*/*.ll tests to use the new CHECK-LABEL for easier debugging. No functionality change and all tests pass after conversion.
      
      This was done with the following sed invocation to catch label lines demarcating function boundaries:
          sed -i '' "s/^;\( *\)\([A-Z0-9_]*\):\( *\)test\([A-Za-z0-9_-]*\):\( *\)$/;\1\2-LABEL:\3test\4:\5/g" test/CodeGen/*/*.ll
      which was written conservatively to avoid false positives rather than false negatives. I scanned through all the changes and everything looks correct.
      
      llvm-svn: 186258
  20. Jun 24, 2013
    • Fix tail merging to assign the (more) correct BasicBlock when splitting. · 97a1d7c4
      Andrew Trick authored
      This makes it possible to write unit tests that are less susceptible
      to minor code motion, particularly copy placement. block-placement.ll
      covers this case with -pre-RA-sched=source which will soon be
      default. One incorrectly named block is already fixed, but without
      this fix, enabling new coalescing and scheduling would cause more
      failures.
      
      llvm-svn: 184680
  21. May 24, 2013
    • Add a new function attribute 'cold' to functions. · c6399539
      Diego Novillo authored
      Other than recognizing the attribute, the patch does little else.
      It changes the branch probability analyzer so that edges into
      blocks postdominated by a cold function are given low weight.
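
      For illustration, usage through the existing GNU attribute syntax
      (a hypothetical example, not taken from the patch):

      // The block containing the call is postdominated by a cold
      // function, so the edge into it gets a low branch weight.
      __attribute__((cold)) void fatal_error(const char *msg);

      void check(int err) {
        if (err)
          fatal_error("boom");  // considered unlikely
      }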
      
      Added analysis and code generation tests.  Added documentation for the
      new attribute.
      
      llvm-svn: 182638
  22. Apr 16, 2012
    • Flip the new block-placement pass to be on by default. · 4190b507
      Chandler Carruth authored
      This is mostly to test the waters. I'd like to get results from FNT
      build bots and other bots running on non-x86 platforms.
      
      This feature has been pretty heavily tested over the last few months by
      me, and it fixes several of the execution time regressions caused by the
      inlining work by preventing inlining decisions from radically impacting
      block layout.
      
      I've seen very large improvements in yacr2 and ackermann benchmarks,
      along with the expected noise across all of the benchmark suite whenever
      code layout changes. I've analyzed all of the regressions and fixed
      them, or found them to be impossible to fix. See my email to llvmdev for
      more details.
      
      I'd like for this to be in 3.1 as it complements the inliner changes,
      but if any failures are showing up or anyone has concerns, it is just
      a flag flip and so can be easily turned off.
      
      I'm switching it on tonight to try and get at least one run through
      various folks' performance suites in case SPEC or something else has
      serious issues with it. I'll watch bots and revert if anything shows up.
      
      llvm-svn: 154816
    • Add a somewhat hacky heuristic to do something different from whole-loop · 8c0b41d6
      Chandler Carruth authored
      rotation. When there is a loop backedge which is an unconditional
      branch, we will end up with a branch somewhere no matter what. Try
      placing this backedge in a fallthrough position above the loop header as
      that will definitely remove at least one branch from the loop iteration,
      where whole loop rotation may not.
      
      I haven't seen any benchmarks where this is important but loop-blocks.ll
      tests for it, and so this will be covered when I flip the default.
      
      llvm-svn: 154812
    • Tweak the loop rotation logic to check whether the loop is naturally · 8c74c7b1
      Chandler Carruth authored
      laid out in a form with a fallthrough into the header and a fallthrough
      out of the bottom. In that case, leave the loop alone because any
      rotation will introduce unnecessary branches. If either side looks like
      it will require an explicit branch, then the rotation won't add any; do
      it to ensure the branch occurs outside of the loop (if possible) and to
      maximize the benefit of the fallthrough at the bottom.
      
      llvm-svn: 154806
    • Rewrite how machine block placement handles loop rotation. · ccc7e42b
      Chandler Carruth authored
      This is a complex change that resulted from a great deal of
      experimentation with several different benchmarks. The one which proved
      the most useful is included as a test case, but I don't know that it
      captures all of the relevant changes, as I didn't have specific
      regression tests for each, they were more the result of reasoning about
      what the old algorithm would possibly do wrong. I'm also failing at the
      moment to craft more targeted regression tests for these changes, if
      anyone has ideas, it would be welcome.
      
      The first big thing broken with the old algorithm is the idea that we
      can take a basic block which has a loop-exiting successor and a looping
      successor and use the looping successor as the layout top in order to
      get that particular block to be the bottom of the loop after layout.
      This happens to work in many cases, but not in all.
      
      The second big thing broken was that we didn't try to select the exit
      which fell into the nearest enclosing loop (to which we exit at all). As
      a consequence, even if the rotation worked perfectly, it would result in
      one of two bad layouts. Either the bottom of the loop would get
      fallthrough, skipping across a nearer enclosing loop and thereby making
      it discontiguous, or it would be forced to take an explicit jump over
      the nearest enclosing loop to reach its successor. The point of the
      rotation is to get fallthrough, so we need it to fallthrough to the
      nearest loop it can.
      
      The fix to the first issue is to actually layout the loop from the loop
      header, and then rotate the loop such that the correct exiting edge can
      be a fallthrough edge. This is actually much easier than I anticipated
      because we can handle all the hard parts of finding a viable rotation
      before we do the layout. We just store that, and then rotate after
      layout is finished. No inner loops get split across the post-rotation
      backedge because we check for them when selecting the rotation.
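
      A hedged sketch of the post-layout rotation step (generic types; the
      viability check is assumed to have happened before layout):

      #include <algorithm>
      #include <cstddef>
      #include <vector>

      struct Block;

      // Chain is laid out from the loop header. Rotating so the chosen
      // exiting block becomes last lets its exit edge fall through.
      static void rotateLoopChain(std::vector<Block *> &Chain,
                                  size_t ExitingIdx) {
        std::rotate(Chain.begin(), Chain.begin() + ExitingIdx + 1,
                    Chain.end());
      }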
      
      That fix exposed a latent problem with our exiting block selection --
      we should allow the backedge to point into the middle of some inner-loop
      chain, as there is no real penalty to it; the whole point is that it
      *won't* be a fallthrough edge. This may have blocked the rotation
      entirely in some cases. I have no idea and no test case, as I've never
      seen it in practice; it was just noticed by inspection.
      
      Finally, all of these fixes, and studying the loops they produce,
      highlighted another problem: in rotating loops like this, we sometimes
      fail to align the destination of these backwards jumping edges. Fix this
      by actually walking the backwards edges rather than relying on loopinfo.
      
      This fixes regressions on heapsort if block placement is enabled as well
      as lots of other cases where the previous logic would introduce an
      abundance of unnecessary branches into the execution.
      
      llvm-svn: 154783
  23. Nov 27, 2011
    • Take two on rotating the block ordering of loops. My previous attempt · 03adbd46
      Chandler Carruth authored
      was centered around the premise of laying out a loop in a chain, and
      then rotating that chain. This is good for preserving contiguous layout,
      but bad for actually making sane rotations. In order to keep it safe,
      I had to essentially make it impossible to rotate deeply nested loops.
      The information needed to correctly reason about a deeply nested loop is
      actually available -- *before* we layout the loop. We know the inner
      loops are already fused into chains, etc. We lose information the moment
      we actually lay out the loop.
      
      The solution was the other alternative for this algorithm I discussed
      with Benjamin and some others: rather than rotating the loop
      after-the-fact, try to pick a profitable starting block for the loop's
      layout, and then use our existing layout logic. I was worried about the
      complexity of this "pick" step, but it turns out such complexity is
      needed to handle all the important cases I keep teasing out of benchmarks.
      
      This is, I'm afraid, a bit of a work-in-progress. It is still
      misbehaving on some likely important cases I'm investigating in Olden.
      It also isn't really tested. I'm going to try to craft some interesting
      nested-loop test cases, but it's likely to be extremely time consuming
      and I don't want to go there until I'm sure I'm testing the correct
      behavior. Sadly I can't come up with a way of getting simple,
      fine-grained test cases for this logic. We need complex loop structures
      to
      even trigger much of it.
      
      llvm-svn: 145183