Commits · 9be88629d54b7a6c19104dc5d88ce4020035f7a6 · Lorenzo Albano / LLVM bpEVL

May 27, 2016

Rui Ueyama authored May 27, 2016

MergedInputSection::getOffset is the busiest function in LLD if string
merging is enabled and input files have lots of mergeable sections.
This is usually the case when creating an executable with debug info,
so it is pretty common.

It is slow because it has to do fairly complex computations.
For non-mergeable sections, section contents are
contiguous in output, so in order to compute an output offset,
we only have to add the output section's base address to an input
offset. But for mergeable strings, section contents are split for
merging, so they are not contiguous. We've got to do some lookups.

We used to do binary search on the list of section pieces.
I think it is slow because it's hostile to branch prediction.

This patch replaces it with a hash table lookup. It seems to be working
pretty well. Below is "perf stat -r10" output when linking clang
with debug info. In this case this patch speeds up linking by about 4%.

Before:

       6584.153205 task-clock (msec)         #    1.001 CPUs utilized            ( +-  0.09% )
               238 context-switches          #    0.036 K/sec                    ( +-  6.59% )
                 0 cpu-migrations            #    0.000 K/sec                    ( +- 50.92% )
         1,067,675 page-faults               #    0.162 M/sec                    ( +-  0.15% )
    18,369,931,470 cycles                    #    2.790 GHz                      ( +-  0.09% )
     9,640,680,143 stalled-cycles-frontend   #   52.48% frontend cycles idle     ( +-  0.18% )
   <not supported> stalled-cycles-backend
    21,206,747,787 instructions              #    1.15  insns per cycle
                                             #    0.45  stalled cycles per insn  ( +-  0.04% )
     3,817,398,032 branches                  #  579.786 M/sec                    ( +-  0.04% )
       132,787,249 branch-misses             #    3.48% of all branches          ( +-  0.02% )

       6.579106511 seconds time elapsed                                          ( +-  0.09% )

After:

       6312.317533 task-clock (msec)         #    1.001 CPUs utilized            ( +-  0.19% )
               221 context-switches          #    0.035 K/sec                    ( +-  4.11% )
                 1 cpu-migrations            #    0.000 K/sec                    ( +- 45.21% )
         1,280,775 page-faults               #    0.203 M/sec                    ( +-  0.37% )
    17,611,539,150 cycles                    #    2.790 GHz                      ( +-  0.19% )
    10,285,148,569 stalled-cycles-frontend   #   58.40% frontend cycles idle     ( +-  0.30% )
   <not supported> stalled-cycles-backend
    18,794,779,900 instructions              #    1.07  insns per cycle
                                             #    0.55  stalled cycles per insn  ( +-  0.03% )
     3,287,450,865 branches                  #  520.799 M/sec                    ( +-  0.03% )
        72,259,605 branch-misses             #    2.20% of all branches          ( +-  0.01% )

       6.307411828 seconds time elapsed                                          ( +-  0.19% )
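
The lookup change described above can be sketched roughly as follows. This
is a minimal illustration, not the code from the patch: Piece,
MergeSectionSketch and OffsetMap are hypothetical names, and
std::unordered_map stands in for whatever hash table the patch actually
uses. The sketch caches results in the hash table and falls back to binary
search on a miss, which is an assumption about the design.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for one piece of a mergeable section.
struct Piece {
  uint64_t InputOff;  // where the piece starts in the input section
  uint64_t OutputOff; // where the merged copy starts in the output section
};

class MergeSectionSketch {
public:
  // Pieces must be sorted by InputOff.
  explicit MergeSectionSketch(std::vector<Piece> P) : Pieces(std::move(P)) {}

  // Old approach: binary search for the piece that contains Off.
  uint64_t getOffsetBinarySearch(uint64_t Off) const {
    assert(!Pieces.empty() && Off >= Pieces.front().InputOff);
    auto It = std::upper_bound(
        Pieces.begin(), Pieces.end(), Off,
        [](uint64_t O, const Piece &P) { return O < P.InputOff; });
    const Piece &P = *std::prev(It);
    return P.OutputOff + (Off - P.InputOff);
  }

  // New approach: hash table lookup keyed by input offset. A miss falls
  // back to binary search and remembers the answer for the next query.
  uint64_t getOffset(uint64_t Off) {
    auto It = OffsetMap.find(Off);
    if (It != OffsetMap.end())
      return It->second;
    uint64_t Out = getOffsetBinarySearch(Off);
    OffsetMap.emplace(Off, Out);
    return Out;
  }

private:
  std::vector<Piece> Pieces;
  std::unordered_map<uint64_t, uint64_t> OffsetMap;
};
```

Repeated queries for the same input offset then cost one hash probe instead
of a chain of hard-to-predict comparisons.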

Differential Revision: http://reviews.llvm.org/D20645

llvm-svn: 270999

406b469d

Update LLD for D20550. · 5079f3b7
Peter Collingbourne authored May 27, 2016
```
Differential Revision: http://reviews.llvm.org/D20704

llvm-svn: 270968
```
5079f3b7
Make -L description a bit more precise. · 8ef190c7
Sean Silva authored May 27, 2016
```
llvm-svn: 270966
```
8ef190c7
Explain a bit better what --start-lib and --end-lib do. · 3b536d09
Sean Silva authored May 27, 2016
```
llvm-svn: 270965
```
3b536d09
Add a help description for --threads to avoid confusion. · 688fade4
Sean Silva authored May 27, 2016
```
llvm-svn: 270964
```
688fade4

--threads is a flag, not a number · 2c1a9da8

Sean Silva authored May 27, 2016

We would previously accept `--threads=4`, but this option just turns on
threading and does not specify a number of threads.

I ran into this by accident because I was passing `--threads=<n>` but
the number didn't seem to affect anything.

llvm-svn: 270963

2c1a9da8

May 26, 2016

[ELF][MIPS] Handle section symbol points to the .MIPS.options / .reginfo section · 84bb355c

Simon Atanasyan authored May 26, 2016

MIPS .reginfo and .MIPS.options sections are consumed by the linker, and
the linker produces a single output section. But it is possible that
input files contain a section symbol that points to the corresponding input
section. When generating relocatable output we need to write
such symbols to the output file.

Fixes bug 27878.

Differential Revision: http://reviews.llvm.org/D20688

llvm-svn: 270910

84bb355c

Removed redundant argument. NFC. · a8f9cf18
George Rimar authored May 26, 2016
```
llvm-svn: 270847
```
a8f9cf18

May 25, 2016

[ELF] - Added support for jmp/call relaxations when... · 95433df1

George Rimar authored May 25, 2016

[ELF] - Added support for jmp/call relaxations when R_X86_64_GOTPCRELX/R_X86_64_REX_GOTPCRELX are used.

D15779 introduced basic approach to support new relaxations.
This patch implements relaxations for jmp and call instructions,
described in System V Application Binary Interface AMD64 Architecture Processor 
Supplement Draft Version 0.99.8 (https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-r249.pdf, 
B.2 "Optimize GOTPCRELX Relocations").
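
As a rough illustration of what such a relaxation does at the byte level,
here is a minimal sketch. relaxCallOrJmp and DispShift are hypothetical
names, not LLD's API. It assumes Loc points at the 4-byte displacement the
relocation applies to, so the opcode and ModRM bytes sit at Loc[-2] and
Loc[-1]. The opcode bytes used (ff 15 / ff 25 for the indirect forms,
67 e8 for "addr32 call", e9 for "jmp", 90 for "nop") are standard x86-64
encodings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rewrites an indirect call/jmp through the GOT into a direct one.
// Returns false if the bytes do not match a relaxable instruction.
// On success, DispShift says where the 32-bit displacement now starts
// relative to Loc; the caller still has to write a PC-relative value there.
bool relaxCallOrJmp(uint8_t *Loc, std::ptrdiff_t &DispShift) {
  assert(Loc != nullptr);
  if (Loc[-2] != 0xff)
    return false;
  if (Loc[-1] == 0x15) {
    // call *foo@GOTPCREL(%rip)  ->  addr32 call foo
    Loc[-2] = 0x67; // addr32 prefix keeps the instruction 6 bytes long
    Loc[-1] = 0xe8; // call rel32
    DispShift = 0;
    return true;
  }
  if (Loc[-1] == 0x25) {
    // jmp *foo@GOTPCREL(%rip)  ->  jmp foo; nop
    Loc[-2] = 0xe9; // jmp rel32, displacement now starts one byte earlier
    Loc[3] = 0x90;  // nop fills the byte that is left over
    DispShift = -1;
    return true;
  }
  return false;
}
```

In the jmp case the displacement moves back by one byte, so the value
written there has to be adjusted accordingly.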

Differential revision: http://reviews.llvm.org/D20622

llvm-svn: 270721

95433df1

Make SectionPiece 8 bytes smaller on LP64. · d8849274

Rui Ueyama authored May 25, 2016

This patch makes the SectionPiece class 8 bytes smaller on platforms
on which pointer size is 8 bytes. Sean suggested in a post-commit
review for r270340 that this could make a difference, and it
actually does. Time to link clang (with debug info) improved from
6.725 seconds to 6.589 seconds, or by about 2%.
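
One common way to get this kind of saving is to fold a boolean into a
bit-field so that it no longer costs a whole padded word. The sketch below
only illustrates that technique; the struct names and fields are
hypothetical and are not claimed to match the actual SectionPiece layout
before or after this patch.

```cpp
#include <cassert>
#include <cstdint>

// A separate bool after two 8-byte fields is padded out to the struct's
// alignment, so on LP64 this is typically 24 bytes.
struct PieceUnpacked {
  uint64_t InputOff;
  uint64_t OutputOff;
  bool Live;
};

// Stealing one bit from OutputOff for the flag removes the padded word,
// so on LP64 this is typically 16 bytes.
struct PiecePacked {
  uint64_t InputOff;
  uint64_t OutputOff : 63;
  uint64_t Live : 1;
};

static_assert(sizeof(PiecePacked) < sizeof(PieceUnpacked),
              "packing the flag should shrink the struct");

// Hypothetical helper: the output offset must fit in 63 bits.
inline PiecePacked makePiece(uint64_t In, uint64_t Out, bool Live) {
  assert(Out < (uint64_t(1) << 63));
  PiecePacked P;
  P.InputOff = In;
  P.OutputOff = Out;
  P.Live = Live;
  return P;
}
```

With millions of pieces in a debug-info link, a smaller element means
fewer cache lines touched per lookup.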

Differential Revision: http://reviews.llvm.org/D20613

llvm-svn: 270717

d8849274

Do not ignore --no_ctors_in_init_array flag. · 1795f782
Rui Ueyama authored May 25, 2016
```
That flag is probably too dangerous to ignore silently.

llvm-svn: 270711
```
1795f782

ELF: Handle empty CIE augmentation string · 594e06b8

Ed Maste authored May 25, 2016

"A zero length string indicates that no augmentation data is present."

The FreeBSD/mips toolchain (GCC 4.2.1) generates .debug_frame sections
containing CIE records that have an empty augmentation string.
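
A minimal sketch of handling this case, assuming a simplified view of a
CIE in which the buffer starts at the version byte and the NUL-terminated
augmentation string follows immediately. readAugmentation and
hasAugmentationData are hypothetical helpers, not LLD functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Data points at the CIE version byte; the augmentation string follows.
// An empty string is a valid result and must not be treated as an error.
std::string readAugmentation(const uint8_t *Data, size_t Size) {
  assert(Size >= 2 && "need a version byte and a terminator");
  const uint8_t *Begin = Data + 1;
  const uint8_t *End = Data + Size;
  const uint8_t *P = Begin;
  while (P != End && *P != 0)
    ++P;
  assert(P != End && "augmentation string is not NUL-terminated");
  return std::string(reinterpret_cast<const char *>(Begin), P - Begin);
}

// Augmentation data is only present when the string starts with 'z';
// a zero-length string means there is none to parse.
bool hasAugmentationData(const std::string &Aug) {
  return !Aug.empty() && Aug[0] == 'z';
}
```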

Differential Revision: http://reviews.llvm.org/D19928

llvm-svn: 270706

594e06b8

[ELF] - Implemented optimization for R_X86_64_GOTPCREL relocation. · 5c33b91b

George Rimar authored May 25, 2016

System V Application Binary Interface AMD64 Architecture Processor Supplement Draft Version 0.99.8 
(https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-r249.pdf, B.2 "Optimize GOTPCRELX Relocations")
introduces possible relaxations for R_X86_64_GOTPCRELX and R_X86_64_REX_GOTPCRELX.

This patch implements the following relaxation:
mov foo@GOTPCREL(%rip), %reg => lea foo(%rip), %reg
and also opens the door to implementing all the other ones.
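
At the byte level the mov-to-lea rewrite is a one-byte opcode change. The
sketch below illustrates it under the assumption that Loc points at the
4-byte displacement the relocation applies to; relaxMovToLea is a
hypothetical name, not the function added by this patch.

```cpp
#include <cassert>
#include <cstdint>

// mov foo@GOTPCREL(%rip), %reg  is  [REX] 8b ModRM disp32
// lea foo(%rip), %reg           is  [REX] 8d ModRM disp32
// Only the opcode byte differs; the REX prefix and ModRM stay as they are.
bool relaxMovToLea(uint8_t *Loc) {
  assert(Loc != nullptr);
  const uint8_t Opcode = Loc[-2];
  const uint8_t ModRm = Loc[-1];
  // Require a RIP-relative operand: mod == 00 and r/m == 101.
  if (Opcode != 0x8b || (ModRm & 0xc7) != 0x05)
    return false;
  Loc[-2] = 0x8d;
  return true;
}
```

After the rewrite the displacement has to be computed against the symbol's
own address rather than its GOT entry.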

Implementation was suggested by Rafael Ávila de Espíndola, with a few additions and test cases by myself.

Differential revision: http://reviews.llvm.org/D15779

llvm-svn: 270705

5c33b91b

Really define --export-dynamic-symbol= as an alias to --export-dynamic-symbol. · c789b631
Rui Ueyama authored May 25, 2016
```
Thanks to Sean for pointing it out.

llvm-svn: 270660
```
c789b631
Fix comment. · 02fcf11a
Rui Ueyama authored May 25, 2016
```
llvm-svn: 270659
```
02fcf11a
Reduce code duplication. · e66f45c6
Rui Ueyama authored May 25, 2016
```
llvm-svn: 270657
```
e66f45c6
Add `static` to a file-scope function. · 2487f192
Rui Ueyama authored May 25, 2016
```
llvm-svn: 270652
```
2487f192
Add a few options for compatibility with GNU. · dadda2fe
Rui Ueyama authored May 25, 2016
```
llvm-svn: 270651
```
dadda2fe

May 24, 2016

Create Relocations.cpp and move scanRelocs there. · 0fcdc730

Rui Ueyama authored May 24, 2016

scanReloc and the functions on which scanReloc depends are in total
more than 600 lines of code. Since scanReloc does not depend on Writer,
it is better to move it into a separate file.

Differential Revision: http://reviews.llvm.org/D20554

llvm-svn: 270606

0fcdc730

Use range loop. · 5ee9e7fd
Rafael Espindola authored May 24, 2016
```
Thanks to Rui for the suggestion.

llvm-svn: 270601
```
5ee9e7fd
Fix a wrong assumption. · 1f5696f9
Rafael Espindola authored May 24, 2016
```
llvm-svn: 270573
```
1f5696f9

Do not start over relocation search from beginning. · 19ccffe4

Rui Ueyama authored May 24, 2016

This patch addresses a post-commit review for r270325. r270325
introduced a getReloc function that searches for a relocation in a
given range. It always started searching from the beginning of the
relocation vector, so it was slower than before. Previously, we took
advantage of the fact that the relocations are sorted. This patch
restores that.
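
The idea of resuming the search can be sketched as follows. This is an
illustration only: Reloc and findReloc are hypothetical, simplified
stand-ins, and the sketch assumes both that the relocations are sorted by
offset and that ranges are queried in increasing order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified relocation record.
struct Reloc {
  uint64_t Offset;
};

// Returns the index of the first relocation inside [Begin, Begin + Size),
// or Rels.size() if there is none. Cursor persists across calls, so the
// scan resumes where the previous query stopped instead of at index 0.
size_t findReloc(const std::vector<Reloc> &Rels, size_t &Cursor,
                 uint64_t Begin, uint64_t Size) {
  assert(Cursor <= Rels.size());
  // Skip relocations that lie before the requested range.
  while (Cursor != Rels.size() && Rels[Cursor].Offset < Begin)
    ++Cursor;
  if (Cursor == Rels.size() || Rels[Cursor].Offset >= Begin + Size)
    return Rels.size();
  return Cursor;
}
```

Because the cursor only moves forward, a full pass over all ranges visits
each relocation at most once.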

llvm-svn: 270572

19ccffe4

Handle terminator .eh_frame when creating the index. · 820f4bb9
Rafael Espindola authored May 24, 2016
```
llvm-svn: 270568
```
820f4bb9
Fix crash in .eh_frame marker section. · bfffa94e
Rafael Espindola authored May 24, 2016
```
llvm-svn: 270563
```
bfffa94e
Simplify. Thanks to Rui for the suggestion. · 29da3e35
Rafael Espindola authored May 24, 2016
```
llvm-svn: 270555
```
29da3e35
Revert "Simplify. Thanks to Rui for the suggestion." · fe3a2f1b
Rafael Espindola authored May 24, 2016
```
This reverts commit r270551.

Sorry, I committed the wrong branch :-(

llvm-svn: 270554
```
fe3a2f1b
Simplify. Thanks to Rui for the suggestion. · dba64b8e
Rafael Espindola authored May 24, 2016
```
llvm-svn: 270551
```
dba64b8e

Inline SymbolBody::init. NFC. · 70595aae

Rui Ueyama authored May 24, 2016

I think this function was too short to be an independent function.

llvm-svn: 270534

70595aae

Do not pass the symbol table. NFC. · ace4f90c

Rui Ueyama authored May 24, 2016

Since the symbol table is a singleton class and globally accessible,
we don't need to pass it around.

llvm-svn: 270533

ace4f90c

Rename EHInputSection -> EhInputSection. · 0b9a9036
Rui Ueyama authored May 24, 2016
```
llvm-svn: 270532
```
0b9a9036
Simplify. NFC. · 151ff307
Rui Ueyama authored May 24, 2016
```
llvm-svn: 270531
```
151ff307
Make scanReloc and related functions non-member functions. · 022d8e8a
Rui Ueyama authored May 24, 2016
```
scanReloc does not depend on Writer, so it doesn't have to be
in the class.

llvm-svn: 270530
```
022d8e8a

Remove Writer::ensureBss(). · afa35a2a

Rui Ueyama authored May 24, 2016

Previously, we created a .bss section when needed. We had a function
ensureBss() for that purpose. Turned out that was error-prone
because it was easy to forget to call that function before accessing
the .bss section.

This patch always creates the .bss section. The section is added to the
output only when it's not empty.

llvm-svn: 270527

afa35a2a

Create a new file EhFrame.cpp and move code to read .eh_frame there. · f5febef2
Rui Ueyama authored May 24, 2016
```
llvm-svn: 270526
```
f5febef2

Reject zero-sized symbols when creating copy relocations. · 98843087

Rui Ueyama authored May 24, 2016

Copy relocations are relocations to copy data from DSOs to
executable's .bss segment at runtime. It doesn't make sense to
create such relocations for zero-sized symbols.

GNU linkers don't agree with each other. ld rejects such a
relocation/symbol pair. gold doesn't reject it, but doesn't create
a copy relocation either. I took the former approach because
I don't think the latter is what users want.

llvm-svn: 270525

98843087

Use range-based for. · b7eda28a
Rui Ueyama authored May 24, 2016
```
llvm-svn: 270523
```
b7eda28a
Make getFdeEncoding a non-member function. · 6de2e682
Rui Ueyama authored May 24, 2016
```
This function does not depend on EhOutputSection class.

llvm-svn: 270522
```
6de2e682

May 23, 2016

Remove dead code. · fa2f307c

Rui Ueyama authored May 23, 2016

The dead declarations made MSVC warn on explicit template
instantiations of the classes.

llvm-svn: 270471

fa2f307c

Do not split mergeable sections if they are gc'ed. · b91bf1a9

Rui Ueyama authored May 23, 2016

Previously, mergeable sections' constructors did more than just
set member variables; they split section contents into small
pieces. That is not always a computationally cheap task because if
the section is a mergeable string section, it needs to scan the
entire section to split it at NUL characters.

If a section would be thrown away by GC, that cost ended up
being a waste of time. It is going to be a larger problem if the
section is compressed -- the whole time to uncompress it and
split it up is going to be a waste.

Luckily, we can defer section splitting until after GC. We just have
to remember which offsets are in use during GC and apply that later.
This patch implements it.
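
The deferral described above can be sketched like this. It is a minimal
illustration under stated assumptions, not the patch itself:
LazyMergeSection, markLive, split and Piece are hypothetical names, and a
std::set is used to remember live offsets purely for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical piece of a mergeable string section.
struct Piece {
  size_t InputOff;
  size_t Size; // includes the terminating NUL
  bool Live;
};

class LazyMergeSection {
public:
  // The constructor only stores the contents; no scanning happens here.
  explicit LazyMergeSection(std::string Contents)
      : Data(std::move(Contents)) {}

  // Called during GC: just remember that this offset is referenced.
  void markLive(size_t Off) {
    assert(Off < Data.size());
    LiveOffsets.insert(Off);
  }

  // Called after GC, and only for sections that survived: split at NUL
  // characters and mark a piece live if any recorded offset falls in it.
  std::vector<Piece> split() const {
    std::vector<Piece> Pieces;
    auto Next = LiveOffsets.begin();
    for (size_t Off = 0; Off < Data.size();) {
      size_t End = Data.find('\0', Off);
      End = (End == std::string::npos) ? Data.size() : End + 1;
      bool Live = false;
      while (Next != LiveOffsets.end() && *Next < End) {
        Live = true;
        ++Next;
      }
      Pieces.push_back({Off, End - Off, Live});
      Off = End;
    }
    return Pieces;
  }

private:
  std::string Data;
  std::set<size_t> LiveOffsets;
};
```

A section that GC discards never has split() called on it, so the scan
(and any decompression that would precede it) is skipped entirely.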

Differential Revision: http://reviews.llvm.org/D20516

llvm-svn: 270455

b91bf1a9

Fix typos. · 2ab3d208
Rui Ueyama authored May 23, 2016
```
llvm-svn: 270451
```
2ab3d208