    Fix a defect in code-layout pass, improving Benchmarks/Olden/em3d/em3d by about 30% · 8b8fd217
    Shuxin Yang authored
    (4.58s vs 3.2s on an oldish Mac Tower). 
    
      The corresponding source is excerpted below. The loop accounts for about 90% of execution time.
      --------------------
        cat -n test-suite/MultiSource/Benchmarks/Olden/em3d/make_graph.c
         90 
         91         for (k=0; k<j; k++)
         92           if (other_node == cur_node->to_nodes[k]) break;
    
      The defective layout is sketched below, where the two branches need to be swapped.
      ------------------------------------------------------------------------
          L:
             ...
          if (cond) goto out-of-loop
          goto L
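
The two layouts can be written out in C with explicit gotos. This is only a sketch of the control-flow shape: the function names are hypothetical, the node pointers from make_graph.c are replaced by plain ints to keep it self-contained, and an optimizing compiler is free to lay out either version differently.

```c
/* Defective shape: each iteration ends with a conditional branch out of
 * the loop followed by an unconditional branch back to the top. */
int find_two_branches(const int *to_nodes, int j, int other_node) {
    int k = 0;
    if (k >= j) goto out;
L:
    if (other_node == to_nodes[k]) goto out;
    k++;
    if (k >= j) goto out;   /* if (cond) goto out-of-loop */
    goto L;                 /* goto L */
out:
    return k;
}

/* Swapped shape: the condition is inverted so that the back edge is the
 * conditional branch and the loop exit is a plain fall-through. */
int find_one_branch(const int *to_nodes, int j, int other_node) {
    int k = 0;
    if (k >= j) goto out;
L:
    if (other_node == to_nodes[k]) goto out;
    k++;
    if (k < j) goto L;      /* if (!cond) goto L */
out:
    return k;
}
```

Both functions return the index of the first match, or j when there is none; only the number of branches at the bottom of the loop differs.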
    
      While this code sequence is defective, I don't understand why it costs 1/3 of 
    execution time. CPU event profiling indicates that the poor layout does not increase
    branch mispredictions; it doesn't increase stall cycles at all, and it doesn't 
    prevent the CPU from detecting the loop (i.e. the Loop-Stream-Detector seems to be working fine
    as well)... 
    
       The root cause of the problem is that the layout pass calls AnalyzeBranch() 
    on a basic block that has not been updated to reflect its current layout.
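
To illustrate why stale layout information matters, here is a toy model in C. It is not LLVM code: the struct, the function, and the block ids are all hypothetical, and the commit message does not show the actual fix. It only shows one way a decision made against an outdated layout can leave two branches where one would do.

```c
/* Hypothetical toy model; none of these names are LLVM APIs. */
struct block {
    int cond_target;   /* block id reached when the condition is true  */
    int other_target;  /* block id reached when the condition is false */
};

/* Number of branch instructions needed at the end of `b` when the block
 * with id `layout_next` is placed immediately after it. */
int branches_needed(const struct block *b, int layout_next) {
    if (layout_next == b->other_target)
        return 1;  /* keep the condition, fall through when it is false */
    if (layout_next == b->cond_target)
        return 1;  /* invert the condition, fall through when it is true */
    return 2;      /* conditional branch plus unconditional branch */
}
```

For a loop block with id 1 whose exit block has id 2, a query with the current layout (the exit placed next) needs one branch, while a query with an outdated successor needs two, matching the defective sequence sketched earlier.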
    
    rdar://13966341
    
    llvm-svn: 183174