Skip to content
  • Chandler Carruth's avatar
    [MBP] Fix a really horrible bug in MachineBlockPlacement, but behind · 9a53fbe2
    Chandler Carruth authored
    a flag for now.
    
    First off, thanks to Daniel Jasper for really pointing out the issue
    here. It's been here forever (at least, I think it was there when
    I first wrote this code) without getting really noticed or fixed.
    
    The key problem is what happens when two reasonably common patterns
    happen at the same time: we outline multiple cold regions of code, and
    those regions in turn have diamonds or other CFGs for which we can't
    just topologically lay them out. Consider some C code that looks like:
    
      if (a1()) { if (b1()) c1(); else d1(); f1(); }
      if (a2()) { if (b2()) c2(); else d2(); f2(); }
      done();
    
    Now consider the case where a1() and a2() are unlikely to be true. In
    that case, we might lay out the first part of the function like:
    
      a1, a2, done;
    
    And then we will be out of successors in which to build the chain. We go
    to find the best block to continue the chain with, which is perfectly
    reasonable here, and find "b1" let's say. Laying out successors gets us
    to:
    
      a1, a2, done; b1, c1;
    
    At this point, we will refuse to lay out the successor to c1 (f1)
    because there are still un-placed predecessors of f1 and we want to try
    to preserve the CFG structure. So we go get the next best block, d1.
    
    ... wait for it ...
    
    Except that the next best block *isn't* d1. It is b2! d1 is waaay down
    inside these conditionals. It is much less important than b2. Except
    that this is exactly what we didn't want. If we keep going we get the
    entire set of the rest of the CFG *interleaved*!!!
    
      a1, a2, done; b1, c1; b2, c2; d1, f1; d2, f2;
    
    So we clearly need a better strategy here. =] My current favorite
    strategy is to actually try to place the block whose predecessor is
    closest. This very simply ensures that we unwind these kinds of CFGs the
    way that is natural and fitting, and should minimize the number of cache
    lines instructions are spread across.
    
    It also happens to be *dead simple*. It's like the datastructure was
    specifically set up for this use case or something. We only push blocks
    onto the work list when the last predecessor for them is placed into the
    chain. So the back of the worklist *is* the nearest next block.
    
    Unfortunately, a change like this is going to cause *soooo* many
    benchmarks to swing wildly. So for now I'm adding this under a flag so
    that we and others can validate that this is fixing the problems
    described, that it seems possible to enable, and hopefully that it fixes
    more of our problems long term.
    
    llvm-svn: 231238
    9a53fbe2
Loading