- May 27, 2016
-
-
Rui Ueyama authored
MergedInputSection::getOffset is the busiest function in LLD when string merging is enabled and input files have lots of mergeable sections. That is usually the case when creating an executable with debug info, so it is pretty common.

It is slow because it has to do fairly complex computations. For non-mergeable sections, section contents are contiguous in the output, so to compute an output offset we only have to add the output section's base address to an input offset. For mergeable strings, section contents are split for merging, so they are not contiguous, and we have to do some lookups.

We used to do a binary search on the list of section pieces. It is slow, I think because it is hostile to branch prediction. This patch replaces it with a hash table lookup, which seems to work pretty well.

Below is "perf stat -r10" output when linking clang with debug info. In this case the patch speeds up linking by about 4%.

Before:

       6584.153205 task-clock (msec)         #    1.001 CPUs utilized            ( +-  0.09% )
               238 context-switches          #    0.036 K/sec                    ( +-  6.59% )
                 0 cpu-migrations            #    0.000 K/sec                    ( +- 50.92% )
         1,067,675 page-faults               #    0.162 M/sec                    ( +-  0.15% )
    18,369,931,470 cycles                    #    2.790 GHz                      ( +-  0.09% )
     9,640,680,143 stalled-cycles-frontend   #   52.48% frontend cycles idle     ( +-  0.18% )
   <not supported> stalled-cycles-backend
    21,206,747,787 instructions              #    1.15  insns per cycle
                                             #    0.45  stalled cycles per insn  ( +-  0.04% )
     3,817,398,032 branches                  #  579.786 M/sec                    ( +-  0.04% )
       132,787,249 branch-misses             #    3.48% of all branches          ( +-  0.02% )

       6.579106511 seconds time elapsed                                          ( +-  0.09% )

After:

       6312.317533 task-clock (msec)         #    1.001 CPUs utilized            ( +-  0.19% )
               221 context-switches          #    0.035 K/sec                    ( +-  4.11% )
                 1 cpu-migrations            #    0.000 K/sec                    ( +- 45.21% )
         1,280,775 page-faults               #    0.203 M/sec                    ( +-  0.37% )
    17,611,539,150 cycles                    #    2.790 GHz                      ( +-  0.19% )
    10,285,148,569 stalled-cycles-frontend   #   58.40% frontend cycles idle     ( +-  0.30% )
   <not supported> stalled-cycles-backend
    18,794,779,900 instructions              #    1.07  insns per cycle
                                             #    0.55  stalled cycles per insn  ( +-  0.03% )
     3,287,450,865 branches                  #  520.799 M/sec                    ( +-  0.03% )
        72,259,605 branch-misses             #    2.20% of all branches          ( +-  0.01% )

       6.307411828 seconds time elapsed                                          ( +-  0.19% )

Differential Revision: http://reviews.llvm.org/D20645
llvm-svn: 270999
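The difference between the two lookup strategies can be illustrated with a minimal sketch. This is not the code from the patch: `Piece`, `PieceIndex`, `lookupBinary` and `lookupHashed` are hypothetical names, `std::unordered_map` stands in for whatever hash table the patch uses, and the sketch assumes pieces are sorted by input offset and that most queries hit the start of a piece.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a section piece: where it starts in the input
// section and where it ended up in the output section after merging.
struct Piece {
  uint64_t inputOff;
  uint64_t outputOff;
};

class PieceIndex {
public:
  // Assumption: `pieces` is sorted by inputOff.
  explicit PieceIndex(std::vector<Piece> pieces) : pieces_(std::move(pieces)) {
    for (const Piece &p : pieces_)
      byStart_.emplace(p.inputOff, p.outputOff);
  }

  // Binary search: find the last piece that starts at or before `off`,
  // then add the distance into that piece.
  uint64_t lookupBinary(uint64_t off) const {
    auto it = std::upper_bound(
        pieces_.begin(), pieces_.end(), off,
        [](uint64_t v, const Piece &p) { return v < p.inputOff; });
    assert(it != pieces_.begin() && "offset precedes the first piece");
    --it;
    return it->outputOff + (off - it->inputOff);
  }

  // Hash lookup: one probe when `off` is the start of a piece; otherwise
  // fall back to the binary search above.
  uint64_t lookupHashed(uint64_t off) const {
    auto it = byStart_.find(off);
    if (it != byStart_.end())
      return it->second;
    return lookupBinary(off);
  }

private:
  std::vector<Piece> pieces_;
  std::unordered_map<uint64_t, uint64_t> byStart_;
};
```

The hash table costs extra memory per piece, which is consistent with the higher page-fault count in the "After" numbers, but it replaces a chain of data-dependent branches with a single probe in the common case.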
-
Peter Collingbourne authored
Differential Revision: http://reviews.llvm.org/D20704 llvm-svn: 270968
-
Sean Silva authored
llvm-svn: 270966
-
Sean Silva authored
llvm-svn: 270965
-
Sean Silva authored
llvm-svn: 270964
-
Sean Silva authored
We would previously accept `--threads=4`, but this option just turns on threading and does not specify a number of threads. I ran into this by accident because I was passing `--threads=<n>` but the number didn't seem to affect anything. llvm-svn: 270963
-
- May 26, 2016
-
-
Simon Atanasyan authored
MIPS .reginfo and .MIPS.options sections are consumed by the linker, and the linker produces a single output section. But input files may contain a section symbol that points to the corresponding input section. When generating relocatable output, we need to write such symbols to the output file. Fixes bug 27878. Differential Revision: http://reviews.llvm.org/D20688 llvm-svn: 270910
-
George Rimar authored
llvm-svn: 270847
-
- May 25, 2016
-
-
George Rimar authored
[ELF] - Added support for jmp/call relaxations when R_X86_64_GOTPCRELX/R_X86_64_REX_GOTPCRELX are used. D15779 introduced the basic approach for supporting the new relaxations. This patch implements the relaxations for jmp and call instructions described in the System V Application Binary Interface AMD64 Architecture Processor Supplement, Draft Version 0.99.8 (https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-r249.pdf, B.2 "Optimize GOTPCRELX Relocations"). Differential revision: http://reviews.llvm.org/D20622 llvm-svn: 270721
-
Rui Ueyama authored
This patch makes the SectionPiece class 8 bytes smaller on platforms where the pointer size is 8 bytes. Sean suggested in a post-commit review for r270340 that this could make a difference, and it actually does. Time to link clang (with debug info) improved from 6.725 seconds to 6.589 seconds, or by about 2%. Differential Revision: http://reviews.llvm.org/D20613 llvm-svn: 270717
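One common way to shave a word off a small record is to fold a boolean into spare bits of a neighbouring field. The sketch below shows that general technique only; the field names and layout are assumptions and are not taken from the actual SectionPiece definition.

```cpp
#include <cstdint>

// Hypothetical "before" layout: the trailing bool forces padding up to the
// struct's alignment, so this is typically 24 bytes on LP64 platforms.
struct PieceWide {
  uint64_t inputOff;
  uint64_t outputOff;
  bool live;
};

// Hypothetical "after" layout: the liveness flag borrows one bit from the
// output offset, so the struct is typically 16 bytes on LP64 platforms.
struct PiecePacked {
  uint64_t inputOff;
  uint64_t outputOff : 63;
  uint64_t live : 1;
};

// The packed form is never larger than the wide one.
static_assert(sizeof(PiecePacked) < sizeof(PieceWide),
              "packing the flag should shrink the struct");
```

When there is one such record per string in every mergeable section, saving a word per record reduces the memory the linker has to touch, which is one plausible reason a change like this shows up in link time.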
-
Rui Ueyama authored
That flag is probably too dangerous to ignore silently. llvm-svn: 270711
-
Ed Maste authored
"A zero length string indicates that no augmentation data is present." The FreeBSD/mips toolchain (GCC 4.2.1) generates .debug_frame sections containing CIE records that have an empty augmentation string. Differential Revision: http://reviews.llvm.org/D19928 llvm-svn: 270706
-
George Rimar authored
System V Application Binary Interface AMD64 Architecture Processor Supplement, Draft Version 0.99.8 (https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-r249.pdf, B.2 "Optimize GOTPCRELX Relocations") introduces possible relaxations for R_X86_64_GOTPCRELX and R_X86_64_REX_GOTPCRELX. This patch implements the following relaxation: mov foo@GOTPCREL(%rip), %reg => lea foo(%rip), %reg. It also opens the door to implementing all the other ones. The implementation was suggested by Rafael Ávila de Espíndola, with a few additions and test cases by myself. Differential revision: http://reviews.llvm.org/D15779 llvm-svn: 270705
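At the byte level, the mov-to-lea rewrite only changes the opcode: a 64-bit `mov` load with a RIP-relative operand uses opcode 0x8b, and the matching `lea` uses 0x8d with the same ModRM byte. The following is a minimal sketch of that rewrite, not the code from this patch. `relaxMovToLea` is a hypothetical helper, `relocOffset` is assumed to be the offset of the 32-bit displacement field, and the caller is assumed to have already checked that the symbol is defined locally and to patch the displacement so that it refers to `foo` itself rather than its GOT entry.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Rewrites "mov foo@GOTPCREL(%rip), %reg" into "lea foo(%rip), %reg" in place.
// Returns true if the instruction matched and was rewritten.
bool relaxMovToLea(std::vector<uint8_t> &code, size_t relocOffset) {
  // Need an opcode byte and a ModRM byte before the displacement,
  // and four displacement bytes after it.
  if (relocOffset < 2 || relocOffset + 4 > code.size())
    return false;

  uint8_t &opcode = code[relocOffset - 2];
  uint8_t modrm = code[relocOffset - 1];

  // 0x8b is "mov r/m, reg" (a load from memory into a register).
  if (opcode != 0x8b)
    return false;
  // mod == 00 and r/m == 101 selects RIP-relative addressing;
  // the reg field (bits 3..5) can be anything.
  if ((modrm & 0xc7) != 0x05)
    return false;

  opcode = 0x8d; // lea: compute the address instead of loading from it
  return true;
}
```

For example, the bytes `48 8b 05 <disp32>` (mov into %rax) become `48 8d 05 <disp32>`; the REX prefix and the ModRM byte are untouched.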
-
Rui Ueyama authored
Thanks to Sean for pointing it out. llvm-svn: 270660
-
Rui Ueyama authored
llvm-svn: 270659
-
Rui Ueyama authored
llvm-svn: 270657
-
Rui Ueyama authored
llvm-svn: 270652
-
Rui Ueyama authored
llvm-svn: 270651
-
- May 24, 2016
-
-
Rui Ueyama authored
scanReloc and the functions on which scanReloc depends are in total more than 600 lines of code. Since scanReloc does not depend on Writer, it is better to move it into a separate file. Differential Revision: http://reviews.llvm.org/D20554 llvm-svn: 270606
-
Rafael Espindola authored
Thanks to Rui for the suggestion. llvm-svn: 270601
-
Rafael Espindola authored
llvm-svn: 270573
-
Rui Ueyama authored
This patch addresses a post-commit review for r270325. r270325 introduced a getReloc function that searches for a relocation in a given range. It always started searching from the beginning of the relocation vector, so it was slower than before. Previously, we exploited the fact that the relocations are sorted. This patch restores that. llvm-svn: 270572
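The idea of exploiting sortedness can be sketched as a cursor that remembers where the previous search stopped. This is an illustration under stated assumptions, not the getReloc implementation: `Reloc` and `RelocCursor` are hypothetical names, relocations are assumed to be sorted by offset, and queries are assumed to arrive with non-decreasing `begin` values.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a relocation record.
struct Reloc {
  uint64_t offset;
};

class RelocCursor {
public:
  explicit RelocCursor(const std::vector<Reloc> &relocs) : relocs_(relocs) {}

  // Returns the first relocation with begin <= offset < end, or nullptr.
  // The scan resumes from the previous position instead of from index 0,
  // so a full pass over all ranges is linear in the number of relocations.
  const Reloc *find(uint64_t begin, uint64_t end) {
    while (next_ < relocs_.size() && relocs_[next_].offset < begin)
      ++next_;
    if (next_ < relocs_.size() && relocs_[next_].offset < end)
      return &relocs_[next_];
    return nullptr;
  }

private:
  const std::vector<Reloc> &relocs_;
  size_t next_ = 0; // index of the first relocation not yet skipped
};
```

Restarting from index 0 on every query would make the same pass quadratic in the worst case, which matches the slowdown described above.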
-
Rafael Espindola authored
llvm-svn: 270568
-
Rafael Espindola authored
llvm-svn: 270563
-
Rafael Espindola authored
llvm-svn: 270555
-
Rafael Espindola authored
This reverts commit r270551. Sorry, I committed the wrong branch :-( llvm-svn: 270554
-
Rafael Espindola authored
llvm-svn: 270551
-
Rui Ueyama authored
I think this function was too short to be an independent function. llvm-svn: 270534
-
Rui Ueyama authored
Since the symbol table is a singleton class and globally accessible, we don't need to pass it around. llvm-svn: 270533
-
Rui Ueyama authored
llvm-svn: 270532
-
Rui Ueyama authored
llvm-svn: 270531
-
Rui Ueyama authored
scanReloc does not depend on Writer, so it doesn't have to be in the class. llvm-svn: 270530
-
Rui Ueyama authored
Previously, we created a .bss section when needed. We had a function ensureBss() for that purpose. It turned out that this was error-prone because it was easy to forget to call that function before accessing the .bss section. This patch always creates the BSS section. The section is added to the output only when it is not empty. llvm-svn: 270527
-
Rui Ueyama authored
llvm-svn: 270526
-
Rui Ueyama authored
Copy relocations are relocations that copy data from DSOs to the executable's .bss segment at runtime. It doesn't make sense to create such relocations for zero-sized symbols. GNU linkers don't agree with each other: ld rejects such a relocation/symbol pair, while gold doesn't reject it but doesn't create a copy relocation either. I took the former approach because I don't think the latter is what the user wants. llvm-svn: 270525
-
Rui Ueyama authored
llvm-svn: 270523
-
Rui Ueyama authored
This function does not depend on EhOutputSection class. llvm-svn: 270522
-
- May 23, 2016
-
-
Rui Ueyama authored
The dead declarations made MSVC warn on explicit template instantiations of the classes. llvm-svn: 270471
-
Rui Ueyama authored
Previously, the constructors of mergeable sections did more than just set member variables; they also split section contents into small pieces. That is not always a computationally cheap task: if the section is a mergeable string section, the entire section has to be scanned to split it at NUL characters. If a section would be thrown away by GC, that cost ended up being a waste of time. It is going to be a larger problem if the section is compressed, because the whole time spent uncompressing it and splitting it up is wasted. Luckily, we can defer section splitting until after GC. We just have to remember which offsets are in use during GC and apply that later. This patch implements it. Differential Revision: http://reviews.llvm.org/D20516 llvm-svn: 270455
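The deferral described above can be sketched roughly as follows. This is only an illustration of the idea, not the patch itself: `LazyMergeSection`, `StringPiece`, `markLive` and `split` are hypothetical names, and the sketch assumes a string section whose pieces are NUL-terminated.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical piece record: a NUL-terminated string inside the section.
struct StringPiece {
  size_t inputOff;
  size_t size; // includes the terminating NUL, if present
  bool live;
};

class LazyMergeSection {
public:
  // The constructor only stores the data; no scanning happens here.
  explicit LazyMergeSection(std::string data) : data_(std::move(data)) {}

  // Called during GC: just remember that this offset is referenced.
  void markLive(size_t off) { liveOffsets_.insert(off); }

  // Called after GC, and only for sections that survived it.
  std::vector<StringPiece> split() const {
    std::vector<StringPiece> pieces;
    size_t start = 0;
    while (start < data_.size()) {
      size_t nul = data_.find('\0', start);
      size_t end = (nul == std::string::npos) ? data_.size() : nul + 1;
      pieces.push_back({start, end - start, false});
      start = end;
    }
    // Apply the recorded offsets. Both sequences are sorted, so a single
    // forward walk is enough.
    auto it = liveOffsets_.begin();
    for (StringPiece &p : pieces) {
      while (it != liveOffsets_.end() && *it < p.inputOff + p.size) {
        p.live = true;
        ++it;
      }
    }
    return pieces;
  }

private:
  std::string data_;
  std::set<size_t> liveOffsets_;
};
```

A section that GC discards never has `split()` called on it, so the scan for NUL characters (and, for compressed input, the decompression that would precede it) is skipped entirely.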
-
Rui Ueyama authored
llvm-svn: 270451
-