  11. May 27, 2016
    • Avoid doing binary search. · 406b469d
      Rui Ueyama authored
      MergedInputSection::getOffset is the busiest function in LLD if string
      merging is enabled and input files have lots of mergeable sections.
      That is usually the case when creating an executable with debug info,
      so it is pretty common.
      
      It is slow because it has to do fairly complex
      computations. For non-mergeable sections, section contents are
      contiguous in the output, so in order to compute an output offset,
      we only have to add the output section's base address to an input
      offset. But for mergeable strings, section contents are split for
      merging, so they are not contiguous. We've got to do some lookups.
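The difference between the two cases can be sketched in a few lines of C++. This is only an illustration: `Piece`, `contiguousOffset`, and `examplePieces` are hypothetical names, not LLD's actual types, and the three-string layout is an assumed example.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical record for one piece of a mergeable input section.
struct Piece {
  uint64_t InputOff;  // where the piece starts in the input section
  uint64_t OutputOff; // where its merged copy lives in the output section
};

// Non-mergeable section: contents stay contiguous in the output,
// so a single addition gives the output offset.
uint64_t contiguousOffset(uint64_t OutSecOff, uint64_t InputOff) {
  return OutSecOff + InputOff;
}

// Mergeable section (assumed example): the input holds "foo\0", "bar\0",
// "foo\0" at offsets 0, 4 and 8. After merging, both "foo" pieces share
// output offset 0, so no single base address maps input to output.
std::vector<Piece> examplePieces() {
  return {{0, 0}, {4, 4}, {8, 0}};
}
```

In the second case, each input offset has to be mapped through the piece that contains it, which is the lookup this commit is about.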
      
      We used to do binary search on the list of section pieces.
      I think it is slow because it's hostile to branch prediction.
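A binary search of this kind might look like the sketch below. It is not the code that was removed from LLD. `Piece` and `lookupBinarySearch` are hypothetical names, and the sketch assumes the pieces are sorted by input offset and that the first piece starts at or before the queried offset.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical record for one piece of a mergeable input section.
struct Piece {
  uint64_t InputOff;  // start of the piece in the input section
  uint64_t OutputOff; // start of the merged copy in the output section
};

// Precondition (assumed): Pieces is non-empty, sorted by InputOff,
// and Pieces[0].InputOff <= Off.
uint64_t lookupBinarySearch(const std::vector<Piece> &Pieces, uint64_t Off) {
  // Find the first piece that starts strictly after Off ...
  auto It = std::upper_bound(
      Pieces.begin(), Pieces.end(), Off,
      [](uint64_t V, const Piece &P) { return V < P.InputOff; });
  // ... so the piece just before it is the one containing Off.
  const Piece &P = *(It - 1);
  return P.OutputOff + (Off - P.InputOff);
}
```

Every step of the search branches on a comparison whose outcome depends on the data, which fits the suspicion above that the search is hard on the branch predictor.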
      
      This patch replaces it with a hash table lookup. It seems to be working
      pretty well. Below is "perf stat -r10" output when linking clang
      with debug info. In this case, this patch speeds up the link by about 4%.
      
      Before:
      
             6584.153205 task-clock (msec)         #    1.001 CPUs utilized            ( +-  0.09% )
                     238 context-switches          #    0.036 K/sec                    ( +-  6.59% )
                       0 cpu-migrations            #    0.000 K/sec                    ( +- 50.92% )
               1,067,675 page-faults               #    0.162 M/sec                    ( +-  0.15% )
          18,369,931,470 cycles                    #    2.790 GHz                      ( +-  0.09% )
           9,640,680,143 stalled-cycles-frontend   #   52.48% frontend cycles idle     ( +-  0.18% )
         <not supported> stalled-cycles-backend
          21,206,747,787 instructions              #    1.15  insns per cycle
                                                   #    0.45  stalled cycles per insn  ( +-  0.04% )
           3,817,398,032 branches                  #  579.786 M/sec                    ( +-  0.04% )
             132,787,249 branch-misses             #    3.48% of all branches          ( +-  0.02% )
      
             6.579106511 seconds time elapsed                                          ( +-  0.09% )
      
      After:
      
             6312.317533 task-clock (msec)         #    1.001 CPUs utilized            ( +-  0.19% )
                     221 context-switches          #    0.035 K/sec                    ( +-  4.11% )
                       1 cpu-migrations            #    0.000 K/sec                    ( +- 45.21% )
               1,280,775 page-faults               #    0.203 M/sec                    ( +-  0.37% )
          17,611,539,150 cycles                    #    2.790 GHz                      ( +-  0.19% )
          10,285,148,569 stalled-cycles-frontend   #   58.40% frontend cycles idle     ( +-  0.30% )
         <not supported> stalled-cycles-backend
          18,794,779,900 instructions              #    1.07  insns per cycle
                                                   #    0.55  stalled cycles per insn  ( +-  0.03% )
           3,287,450,865 branches                  #  520.799 M/sec                    ( +-  0.03% )
              72,259,605 branch-misses             #    2.20% of all branches          ( +-  0.01% )
      
             6.307411828 seconds time elapsed                                          ( +-  0.19% )
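The hash table lookup that replaces the binary search could be sketched as follows. This is a sketch under stated assumptions, not the patch itself. `OffsetIndex` and `Piece` are hypothetical names. `std::unordered_map` stands in for whatever hash table the patch actually uses. The table is assumed to be keyed by the input offset at which each piece starts, and the fallback for offsets in the middle of a piece is also an assumption.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical record for one piece of a mergeable input section.
struct Piece {
  uint64_t InputOff;  // start of the piece in the input section
  uint64_t OutputOff; // start of the merged copy in the output section
};

class OffsetIndex {
public:
  // Sorted must be non-empty and ordered by InputOff (assumed).
  explicit OffsetIndex(std::vector<Piece> Sorted) : Pieces(std::move(Sorted)) {
    for (const Piece &P : Pieces)
      StartToOutput[P.InputOff] = P.OutputOff;
  }

  uint64_t lookup(uint64_t Off) const {
    // Fast path: Off is exactly the start of a piece.
    auto Hit = StartToOutput.find(Off);
    if (Hit != StartToOutput.end())
      return Hit->second;
    // Slow path (assumed): Off points into the middle of a piece,
    // so fall back to searching the sorted piece list.
    auto It = std::upper_bound(
        Pieces.begin(), Pieces.end(), Off,
        [](uint64_t V, const Piece &P) { return V < P.InputOff; });
    const Piece &P = *(It - 1);
    return P.OutputOff + (Off - P.InputOff);
  }

private:
  std::vector<Piece> Pieces;
  std::unordered_map<uint64_t, uint64_t> StartToOutput;
};
```

A table like this trades memory for fewer data-dependent branches. That would be consistent with the numbers above, where branch misses drop from about 133 million to about 72 million while page faults rise by roughly 20%.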
      
      Differential Revision: http://reviews.llvm.org/D20645
      
      llvm-svn: 270999
    • Update LLD for D20550. · 5079f3b7
      Peter Collingbourne authored
      Differential Revision: http://reviews.llvm.org/D20704
      
      llvm-svn: 270968