- Dec 01, 2016
-
-
Sean Silva authored
In various places in LLD's hot loops, we have expressions of the form "E == R_FOO || E == R_BAR || ..." (E is a RelExpr). Some of these expressions are quite long, and even though they usually go just a very small number of ways and so should be well predicted, they can still occupy branch predictor resources harming other parts of the code, or they won't be predicted well if they overflow branch predictor resources or if the branches are too dense and the branch predictor can't track them all (the compiler can in theory avoid this, at a cost in text size). And some of these expressions are so large and executed so frequently that even when well-predicted they probably still have a nontrivial cost. This speedup should be pretty portable. The cost of these simple bit tests is independent of: - the target we are linking for - the distribution of RelExpr's for a given link (which can depend on how the input files were compiled) - what compiler was used to compile LLD (it is just a simple bit test; hopefully the compiler gets it right!) - adding new target-dependent relocations (e.g. needsPlt doesn't pay any extra cost checking R_PPC_PLT_OPD on x86-64 builds) I did some rough measurements on clang-fsds and this patch gives over about 4% speedup for a regular -O1 link, about 2.5% for -O3 --gc-sections and over 5% for -O0. Sorry, I don't have my current machine set up for doing really accurate measurements right now. This also is just a bit cleaner. Thanks for Joerg for suggesting for this approach. Differential Revision: https://reviews.llvm.org/D27156 llvm-svn: 288314
-
Rui Ueyama authored
- Rename currentBuffer -> getCurrentMB to start it with verb. - Simplify containsString. - Add llvm_unreachable at end of getCurrentMB. llvm-svn: 288310
-
Rui Ueyama authored
llvm-svn: 288309
-
Rui Ueyama authored
This patch also renames currentLocation getCurrentLocation. llvm-svn: 288308
-
Rui Ueyama authored
llvm-svn: 288306
-
- Nov 30, 2016
-
-
Rui Ueyama authored
Previously, on each iteration in ICF, we scan the entire vector of input sections to find boundaries of groups having the same ID. This patch changes the algorithm so that we now have a vector of ranges. Each range contains a starting index and an ending index of the group. So we no longer have to search boundaries on each iteration. Performance-wise, this seems neutral. Instead of searching boundaries, we now have to maintain ranges. But I think this is more readable than the previous implementation. Moreover, this makes easy to parallelize the main loop of ICF, which I'll do in a follow-up patch. llvm-svn: 288228
-
- Nov 29, 2016
-
-
Rui Ueyama authored
This patch is to avoid an implicit conversion from const char * to StringRefZ, to make it apparent where we are using StringRefZ. llvm-svn: 288182
-
Rui Ueyama authored
StringRefZ is a class to represent a null-terminated string. String length is computed lazily, so it's more efficient than StringRef to represent strings in string table. The motivation of defining this new class is to merge functions that only differ in string types; we have many constructors that takes `const char *` or `StringRef`. With StringRefZ, we can merge them. Differential Revision: https://reviews.llvm.org/D27037 llvm-svn: 288172
-
Peter Smith authored
The module index dynamic relocation R_ARM_DTPMOD32 is always 1 for an executable. When static linking and when we know that we are not a shared object we can resolve the module index relocation statically. The logic in handleNoRelaxTlsRelocation remains the same for Mips as it has its own custom GOT writing code. For ARM we add the module index relocation to the GOT when it can be resolved statically. In addition the type of the RelExpr for the static resolution of TlsGotRel should be R_TLS and not R_ABS as we need to include the size of the thread control block in the calculation. Addresses the TLS part of PR30218. Differential revision: https://reviews.llvm.org/D27213 llvm-svn: 288153
-
George Rimar authored
When -O0 is specified, we do not do section merging. Though before this patch several sections were generated instead of single, what is useless. Differential revision: https://reviews.llvm.org/D27041 llvm-svn: 288151
-
George Rimar authored
This change continues what was started by D27040 Now all allocatable synthetics should be available from script side. Differential revision: https://reviews.llvm.org/D27131 llvm-svn: 288150
-
Simon Atanasyan authored
llvm-svn: 288137
-
Simon Atanasyan authored
llvm-svn: 288130
-
Simon Atanasyan authored
The MipsGotSection::getPageEntryOffset calculates index of GOT entry with a "page" address. Previously this method changes the state of MipsGotSection because it modifies PageIndexMap field. That leads to the unpredictable results if getPageEntryOffset called from multiple threads. The patch makes getPageEntryOffset constant. To do so it calculates GOT entry index but does not update PageIndexMap field. Later in the MipsGotSection::writeTo method linker calculates "page" addresses and writes them to the output. llvm-svn: 288129
-
Simon Atanasyan authored
llvm-svn: 288128
-
Simon Atanasyan authored
If output section which referenced by R_MIPS_GOT_PAGE or R_MIPS_GOT16 relocations is small (less that 0x10000 bytes) and occupies two adjacent 0xffff-bytes pages, current formula gives incorrect number of required "page" GOT entries. The problem is that in time of calculation we do not know the section address and so we cannot calculate number of 0xffff-bytes pages exactly. This patch fix the formula. Now it gives a correct number of pages in the worst case when "small" section intersects 0xffff-bytes page boundary. From the other side, sometimes it adds one more redundant GOT entry for each output section. But usually number of output sections referenced by GOT relocations is small. llvm-svn: 288127
-
George Rimar authored
-N (-omagic) Set the text and data sections to be readable and writable. Also, do not page-align the data segment. Differential revision: https://reviews.llvm.org/D26888 llvm-svn: 288123
-
Eugene Leviant authored
Differential revision: https://reviews.llvm.org/D27097 llvm-svn: 288114
-
Rafael Espindola authored
Right now we just remember a SymbolBody for each got entry and duplicate a bit of logic to decide what value, if any, should be written for that SymbolBody. With ARM there will be more complicated values, and it seems better to just use the relocation code to fill the got entries. This makes it clear that each entry is filled by the dynamic linker or by the static linker. llvm-svn: 288107
-
Rafael Espindola authored
llvm-svn: 288102
-
- Nov 28, 2016
-
-
George Rimar authored
That unifies handling cases when we have SECTIONS and when -no-rosegment is given in compareSectionsNonScript() Now Config->SingleRoRx is used for check, testcase is provided. llvm-svn: 288022
-
George Rimar authored
Previously Config->SingleRoRx was set in createFiles() and used HasSections. This change moves it to readConfigs at place of common flags handling, and adds logic that sets this flag separatelly from ScriptParser if SECTIONS present. llvm-svn: 288021
-
George Rimar authored
--no-rosegment: Do not put read-only non-executable sections in their own segment Differential revision: https://reviews.llvm.org/D26889 llvm-svn: 288020
-
Eugene Leviant authored
Differential revision: https://reviews.llvm.org/D27108 llvm-svn: 288019
-
Rafael Espindola authored
Unfortunatelly PT_ARM_EXIDX is special. There is no way to create it from linker scripts, so we have to create it even if PHDRS is used. This matches bfd and is required for the lld output to survive bfd's strip. llvm-svn: 288012
-
- Nov 27, 2016
-
-
Rui Ueyama authored
When we iterate over numbers as opposed to iterable elements, parallel_for fits better than parallel_for_each. llvm-svn: 288002
-
Rafael Espindola authored
Unfortunatelly some scripts look like kernphys = ... . = .... and the expectation in that every orphan section is after the assignment. llvm-svn: 287996
-
Rafael Espindola authored
This is an horrible special case, but seems to match bfd's behaviour and is important for avoiding placing an orphan section before the expected start of the file. llvm-svn: 287994
-
- Nov 26, 2016
-
-
Rui Ueyama authored
They return new vectors, but at the same time they mutate other vectors, so returning values doesn't make much sense. We should just mutate two vectors. llvm-svn: 287979
-
Rui Ueyama authored
Config->ColorDiagnostics was of type enum before. Now it is just a boolean flag. Thanks Rafael for suggestion. llvm-svn: 287978
-
Rui Ueyama authored
llvm-svn: 287977
-
Rafael Espindola authored
This matches the behaviour of bfd ld. Using 0 was causing problems with strip, which would remove these sections. llvm-svn: 287969
-
Davide Italiano authored
llvm-svn: 287967
-
- Nov 25, 2016
-
-
Rui Ueyama authored
llvm-svn: 287951
-
Rui Ueyama authored
llvm-svn: 287950
-
Rui Ueyama authored
-color-diagnostics=auto is default because that's the same as Clang's default. When color is enabled, error or warning messages are colored like this. error: <bold>ld.lld</bold> <red>error:</red> foo.o: no such file warning: <bold>ld.lld</bold> <magenta>warning:</magenta> foo.o: no such file Differential Revision: https://reviews.llvm.org/D27117 llvm-svn: 287949
-
Rui Ueyama authored
llvm-svn: 287948
-
Rui Ueyama authored
Uncompressing section contents and spliting mergeable section contents into smaller chunks are heavy tasks. They scan entire section contents and do CPU-intensive tasks such as uncompressing zlib-compressed data or computing a hash value for each section piece. Luckily, these tasks are independent to each other, so we can do that in parallel_for_each. The number of input sections is large (as opposed to the number of output sections), so there's a large parallelism here. Actually the current design to call uncompress() and splitIntoPieces() in batch was chosen with doing this in mind. Basically what we need to do here is to replace `for` with `parallel_for_each`. It seems this patch improves latency significantly if linked programs contain debug info (which in turn contain lots of mergeable strings.) For example, the latency to link Clang (debug build) improved by 20% on my machine as shown below. Note that ld.gold took 19.2 seconds to do the same thing. Before: 30801.782712 task-clock (msec) # 3.652 CPUs utilized ( +- 2.59% ) 104,084 context-switches # 0.003 M/sec ( +- 1.02% ) 5,063 cpu-migrations # 0.164 K/sec ( +- 13.66% ) 2,528,130 page-faults # 0.082 M/sec ( +- 0.47% ) 85,317,809,130 cycles # 2.770 GHz ( +- 2.62% ) 67,352,463,373 stalled-cycles-frontend # 78.94% frontend cycles idle ( +- 3.06% ) <not supported> stalled-cycles-backend 44,295,945,493 instructions # 0.52 insns per cycle # 1.52 stalled cycles per insn ( +- 0.44% ) 8,572,384,877 branches # 278.308 M/sec ( +- 0.66% ) 141,806,726 branch-misses # 1.65% of all branches ( +- 0.13% ) 8.433424003 seconds time elapsed ( +- 1.20% ) After: 35523.764575 task-clock (msec) # 5.265 CPUs utilized ( +- 2.67% ) 159,107 context-switches # 0.004 M/sec ( +- 0.48% ) 8,123 cpu-migrations # 0.229 K/sec ( +- 23.34% ) 2,372,483 page-faults # 0.067 M/sec ( +- 0.36% ) 98,395,342,152 cycles # 2.770 GHz ( +- 2.62% ) 79,294,670,125 stalled-cycles-frontend # 80.59% frontend cycles idle ( +- 3.03% ) <not supported> stalled-cycles-backend 46,274,151,813 instructions # 0.47 insns per cycle # 1.71 stalled cycles per insn ( +- 0.47% ) 8,987,621,670 branches # 253.003 M/sec ( +- 0.60% ) 148,900,624 branch-misses # 1.66% of all branches ( +- 0.27% ) 6.747548004 seconds time elapsed ( +- 0.40% ) llvm-svn: 287946
-
Rui Ueyama authored
llvm-svn: 287945
-
Rui Ueyama authored
llvm-svn: 287944
-