Commits · 2eed75926c411e14cfe4d5d4a612da092034e927 · Lorenzo Albano / LLVM bpEVL

Dec 01, 2016

Add `isRelExprOneOf` helper · 2eed7592

Sean Silva authored Dec 01, 2016

In various places in LLD's hot loops, we have expressions of the form
"E == R_FOO || E == R_BAR || ..." (E is a RelExpr).

Some of these expressions are quite long, and even though they usually go just
a very small number of ways and so should be well predicted, they can still
occupy branch predictor resources harming other parts of the code, or they
won't be predicted well if they overflow branch predictor resources or if the
branches are too dense and the branch predictor can't track them all (the
compiler can in theory avoid this, at a cost in text size). And some of these
expressions are so large and executed so frequently that even when
well-predicted they probably still have a nontrivial cost.

This speedup should be pretty portable. The cost of these simple bit tests is
independent of:

- the target we are linking for
- the distribution of RelExpr's for a given link (which can depend on how the
  input files were compiled)
- what compiler was used to compile LLD (it is just a simple bit test;
  hopefully the compiler gets it right!)
- adding new target-dependent relocations (e.g. needsPlt doesn't pay any extra
  cost checking R_PPC_PLT_OPD on x86-64 builds)
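
As a rough illustration of the bit-test idea described above, here is a minimal sketch. The enum members below are a hypothetical subset chosen for the example, and `buildMask` is an illustrative helper name; the sketch only assumes that every RelExpr value is below 64 so that it fits in a 64-bit mask.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of relocation expression kinds, for illustration only.
// The approach assumes every enumerator is smaller than 64.
enum RelExpr { R_ABS, R_GOT, R_GOT_PC, R_PC, R_PLT, R_PLT_PC, R_TLS };

// Base case: an empty list contributes no bits.
constexpr uint64_t buildMask() { return 0; }

// Recursive case: set the bit for Head, then OR in the bits of the rest.
template <typename... Tail>
constexpr uint64_t buildMask(int Head, Tail... Rest) {
  return (uint64_t(1) << Head) | buildMask(Rest...);
}

// Returns true if Expr is one of Exprs. The mask is a compile-time constant,
// so the test is one shift and one AND however long the list is.
template <RelExpr... Exprs> bool isRelExprOneOf(RelExpr Expr) {
  assert(static_cast<unsigned>(Expr) < 64 && "RelExpr too large for mask");
  constexpr uint64_t Mask = buildMask(Exprs...);
  return ((uint64_t(1) << Expr) & Mask) != 0;
}

// Stands in for an expression such as "E == R_PLT || E == R_PLT_PC".
bool needsPlt(RelExpr E) { return isRelExprOneOf<R_PLT, R_PLT_PC>(E); }
```

Adding another enumerator to the template argument list changes only the constant mask, not the number of branches executed at run time.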

I did some rough measurements on clang-fsds and this patch gives a speedup of
slightly over 4% for a regular -O1 link, about 2.5% for -O3 --gc-sections and over 5%
for -O0. Sorry, I don't have my current machine set up for doing really
accurate measurements right now.

This is also just a bit cleaner. Thanks to Joerg for suggesting
this approach.

Differential Revision: https://reviews.llvm.org/D27156

llvm-svn: 288314

2eed7592

Simplify ScriptParser. · 10091b0a

Rui Ueyama authored Dec 01, 2016

 - Rename currentBuffer -> getCurrentMB so that it starts with a verb.
 - Simplify containsString.
 - Add llvm_unreachable at end of getCurrentMB.

llvm-svn: 288310

10091b0a

Do not name a variable Ret which is not a return value. · 3cd22d31
Rui Ueyama authored Dec 01, 2016
```
llvm-svn: 288309
```
3cd22d31
Make get{Line,Column}Number members of StringParser. · b5f1c3ec
Rui Ueyama authored Dec 01, 2016
```
This patch also renames currentLocation to getCurrentLocation.

llvm-svn: 288308
```
b5f1c3ec
Split getPos into getLineNumber and getColumnNumber. · 50fb8274
Rui Ueyama authored Dec 01, 2016
```
llvm-svn: 288306
```
50fb8274

Nov 30, 2016

Change how we manage groups in ICF. · 9dedfb1f

Rui Ueyama authored Nov 30, 2016

Previously, on each iteration in ICF, we scanned the entire vector of
input sections to find the boundaries of groups having the same ID.

This patch changes the algorithm so that we now have a vector of ranges.
Each range contains the starting index and the ending index of a group.
So we no longer have to search for boundaries on each iteration.

Performance-wise, this seems neutral. Instead of searching for boundaries,
we now have to maintain ranges. But I think this is more readable
than the previous implementation.

Moreover, this makes it easy to parallelize the main loop of ICF,
which I'll do in a follow-up patch.
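
To make the "vector of ranges" idea concrete, here is a minimal sketch. The `Range` struct and `buildRanges` function are hypothetical names, and the sketch assumes half-open ranges over a vector of group IDs in which equal IDs are already adjacent; the patch's actual data structures may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical half-open index range [Begin, End) into the section vector.
struct Range {
  size_t Begin;
  size_t End;
};

// Computes the group boundaries once, assuming equal IDs are adjacent.
// Later iterations can then walk Ranges instead of rescanning IDs.
std::vector<Range> buildRanges(const std::vector<uint32_t> &IDs) {
  std::vector<Range> Ranges;
  size_t Begin = 0;
  while (Begin < IDs.size()) {
    size_t End = Begin + 1;
    while (End < IDs.size() && IDs[End] == IDs[Begin])
      ++End;
    Ranges.push_back({Begin, End});
    Begin = End;
  }
  return Ranges;
}
```

Because each range covers a disjoint slice of the vector, ranges can in principle be processed independently, which is what makes a parallel main loop plausible.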

llvm-svn: 288228

9dedfb1f

Nov 29, 2016

Use StringRefZ explicitly instead of const char *. · 84e65a7c

Rui Ueyama authored Nov 29, 2016

This patch is to avoid an implicit conversion from const char * to
StringRefZ, to make it apparent where we are using StringRefZ.

llvm-svn: 288182

84e65a7c

Introduce StringRefZ class to represent null-terminated strings. · a13efc2a

Rui Ueyama authored Nov 29, 2016

StringRefZ is a class to represent a null-terminated string. The string
length is computed lazily, so it's more efficient than StringRef for
representing strings in a string table.

The motivation for defining this new class is to merge functions
that differ only in string types; we have many constructors that take
`const char *` or `StringRef`. With StringRefZ, we can merge them.
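
The lazy-length idea can be sketched as follows. `StringRefZSketch` and all of its members are hypothetical and written only for illustration; this is not the class added by the patch, just one way such a type could defer the `strlen` call until the length is actually needed.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a null-terminated string reference whose
// length is computed on first use and then cached.
class StringRefZSketch {
public:
  // Length unknown: strlen is deferred until size() is called.
  explicit StringRefZSketch(const char *S) : Data(S) {}
  // Length already known: nothing is ever recomputed.
  StringRefZSketch(const char *S, size_t N) : Data(S), Size(N) {}

  bool hasCachedSize() const { return Size != Unknown; }

  size_t size() const {
    if (Size == Unknown)
      Size = std::strlen(Data);
    return Size;
  }

  std::string str() const { return std::string(Data, size()); }

private:
  static constexpr size_t Unknown = static_cast<size_t>(-1);
  const char *Data;
  mutable size_t Size = Unknown;
};
```

With both constructors on one type, a function that previously needed a `const char *` overload and a `StringRef` overload could take this single type instead.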

Differential Revision: https://reviews.llvm.org/D27037

llvm-svn: 288172

a13efc2a

[ELF] Add support for static TLS to ARM · de3e7388

Peter Smith authored Nov 29, 2016

The module index dynamic relocation R_ARM_DTPMOD32 is always 1 for an
executable. When static linking and when we know that we are not a shared
object we can resolve the module index relocation statically.
    
The logic in handleNoRelaxTlsRelocation remains the same for Mips as it
has its own custom GOT writing code. For ARM we add the module index
relocation to the GOT when it can be resolved statically.
    
In addition the type of the RelExpr for the static resolution of TlsGotRel
should be R_TLS and not R_ABS as we need to include the size of
the thread control block in the calculation.
    
Addresses the TLS part of PR30218.

Differential revision: https://reviews.llvm.org/D27213

llvm-svn: 288153

de3e7388

[ELF] - Disable emitting multiple output sections when merging is disabled. · 9b3ae73f

George Rimar authored Nov 29, 2016

When -O0 is specified, we do not do section merging.
Before this patch, however, several output sections were generated instead
of a single one, which is useless.

Differential revision: https://reviews.llvm.org/D27041

llvm-svn: 288151

9b3ae73f

[ELF] - Add support for processing the remaining allocatable synthetic sections from linker scripts. · 3fb5a6dc

George Rimar authored Nov 29, 2016

This change continues what was started by D27040.
Now all allocatable synthetic sections should be available from the script side.

Differential revision: https://reviews.llvm.org/D27131

llvm-svn: 288150

3fb5a6dc

[ELF][MIPS] Make the order of GOT page address entries stable · 160bf723
Simon Atanasyan authored Nov 29, 2016
```
llvm-svn: 288137
```
160bf723
[ELF][MIPS] Restore Config->Threads for MIPS targets · 9705ff74
Simon Atanasyan authored Nov 29, 2016
```
llvm-svn: 288130
```
9705ff74

[ELF][MIPS] Do not change MipsGotSection state in the getPageEntryOffset method · 9fae3b8a

Simon Atanasyan authored Nov 29, 2016

MipsGotSection::getPageEntryOffset calculates the index of the GOT entry
holding a "page" address. Previously this method changed the state
of MipsGotSection because it modified the PageIndexMap field. That led
to unpredictable results if getPageEntryOffset was called from multiple threads.

The patch makes getPageEntryOffset const. To do so, it calculates the GOT
entry index but does not update the PageIndexMap field. Later, in the
MipsGotSection::writeTo method, the linker calculates "page" addresses and
writes them to the output.

llvm-svn: 288129

9fae3b8a

[ELF][MIPS] Replace the magic number of GOT header entries by constant. NFC · a0efc426
Simon Atanasyan authored Nov 29, 2016
```
llvm-svn: 288128
```
a0efc426

[ELF][MIPS] Fix calculation of GOT "page address" entries number · 80f3d9ce

Simon Atanasyan authored Nov 29, 2016

If an output section referenced by R_MIPS_GOT_PAGE or R_MIPS_GOT16
relocations is small (less than 0x10000 bytes) and occupies two adjacent
0xffff-byte pages, the current formula gives an incorrect number of required "page"
GOT entries. The problem is that at the time of the calculation we do not know
the section address, so we cannot calculate the number of 0xffff-byte
pages exactly.

This patch fixes the formula. It now gives the correct number of pages in
the worst case, when a "small" section crosses a 0xffff-byte page
boundary. On the other hand, it sometimes adds one redundant GOT
entry for each output section. But usually the number of output sections
referenced by GOT relocations is small.
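
A worst-case bound of this kind can be sketched as below. The function name is hypothetical, and the exact expression is an assumption made for illustration: it takes the number of 0xffff-byte pages needed to cover the section, rounded up, and adds one entry for the case where the section crosses a page boundary. The formula in the patch itself may be written differently.

```cpp
#include <cstdint>

// Hypothetical upper bound on the number of "page" GOT entries needed for
// an output section of the given size when its address is not yet known.
// Assumption: ceil(Size / 0xffff) pages, plus one extra entry in case the
// section crosses a page boundary.
constexpr uint64_t mipsPageCountUpperBound(uint64_t Size) {
  return (Size + 0xfffe) / 0xffff + 1;
}
```

Under this assumption a one-byte section is given two entries, which is the redundancy mentioned above: only one is used unless the section happens to cross a boundary.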

llvm-svn: 288127

80f3d9ce

[ELF] - Implemented -N (-omagic) command line option. · 595a763f

George Rimar authored Nov 29, 2016

-N (-omagic)
  Set the text and data sections to be readable and writable. 
  Also, do not page-align the data segment.

Differential revision: https://reviews.llvm.org/D26888

llvm-svn: 288123

595a763f

[ELF] Refactor target error messages · 84569e6c
Eugene Leviant authored Nov 29, 2016
```
Differential revision: https://reviews.llvm.org/D27097

llvm-svn: 288114
```
84569e6c

Use relocations to fill statically known got entries. · f1e24531

Rafael Espindola authored Nov 29, 2016

Right now we just remember a SymbolBody for each got entry and
duplicate a bit of logic to decide what value, if any, should be
written for that SymbolBody.

With ARM there will be more complicated values, and it seems better to
just use the relocation code to fill the got entries. This makes it
clear that each entry is filled by the dynamic linker or by the static
linker.

llvm-svn: 288107

f1e24531

Sort. NFC. · d3b32df3
Rafael Espindola authored Nov 29, 2016
```
llvm-svn: 288102
```
d3b32df3

Nov 28, 2016

[ELF] - Do not put non exec sections first when -no-rosegment · 1642c5d8

George Rimar authored Nov 28, 2016

This unifies the handling of the cases where we have SECTIONS and where
-no-rosegment is given in compareSectionsNonScript().

Config->SingleRoRx is now used for the check, and a test case is provided.

llvm-svn: 288022

1642c5d8

[ELF] - Set Config->SingleRoRx differently. NFC. · 18a30962

George Rimar authored Nov 28, 2016

Previously Config->SingleRoRx was set in
createFiles() and used HasSections.

This change moves it to readConfigs, where the
common flags are handled, and adds logic that sets
this flag separately from ScriptParser if SECTIONS is present.

llvm-svn: 288021

18a30962

[ELF] - Implemented -no-rosegment. · 63bf0110

George Rimar authored Nov 28, 2016

--no-rosegment: Do not put read-only non-executable sections in their own segment

Differential revision: https://reviews.llvm.org/D26889

llvm-svn: 288020

63bf0110

[ELF] Print file:line for 'undefined section' errors · ed30ce7a
Eugene Leviant authored Nov 28, 2016
```
Differential revision: https://reviews.llvm.org/D27108

llvm-svn: 288019
```
ed30ce7a

Always create a PT_ARM_EXIDX if needed. · 8e67000f

Rafael Espindola authored Nov 28, 2016

Unfortunately PT_ARM_EXIDX is special. There is no way to create it
from linker scripts, so we have to create it even if PHDRS is used.

This matches bfd and is required for the lld output to survive bfd's strip.

llvm-svn: 288012

8e67000f

Nov 27, 2016

Add parallel_for and use it where appropriate. · 1dd86a66

Rui Ueyama authored Nov 27, 2016

When we iterate over numbers as opposed to iterable elements,
parallel_for fits better than parallel_for_each.
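
The difference can be illustrated with a small index-based loop. `parallelFor` below is a hypothetical helper built on `std::thread`, written only to show the shape of such an interface; it is not the implementation added by this patch, and it assumes the callback is safe to run concurrently for different indices.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical index-based parallel loop: calls Fn(I) for every I in
// [Begin, End), splitting the index space into one chunk per worker thread.
void parallelFor(size_t Begin, size_t End,
                 const std::function<void(size_t)> &Fn) {
  if (Begin >= End)
    return;
  size_t NumThreads =
      std::max<size_t>(1, std::thread::hardware_concurrency());
  NumThreads = std::min(NumThreads, End - Begin);
  size_t Chunk = (End - Begin + NumThreads - 1) / NumThreads;

  std::vector<std::thread> Workers;
  for (size_t Lo = Begin; Lo < End; Lo += Chunk) {
    size_t Hi = std::min(End, Lo + Chunk);
    Workers.emplace_back([Lo, Hi, &Fn] {
      for (size_t I = Lo; I < Hi; ++I)
        Fn(I);
    });
  }
  for (std::thread &T : Workers)
    T.join();
}
```

With an interface like this the caller passes the bounds directly, rather than first building a container of indices just to iterate over it.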

llvm-svn: 288002

1dd86a66

Also skip regular symbol assignment at the start of a script. · 5fcc99c2

Rafael Espindola authored Nov 27, 2016

Unfortunately some scripts look like

kernphys = ...
. = ....

and the expectation is that every orphan section is placed after the
assignment.

llvm-svn: 287996

5fcc99c2

Don't put an orphan before the first . assignment. · 7fe4ec9b

Rafael Espindola authored Nov 27, 2016

This is a horrible special case, but seems to match bfd's behaviour
and is important for avoiding placing an orphan section before the
expected start of the file.

llvm-svn: 287994

7fe4ec9b

Nov 26, 2016
- Change return types of split{Non,}Strings. · e8a077ba
  Rui Ueyama authored Nov 26, 2016
```
They return new vectors, but at the same time they mutate other vectors,
so returning values doesn't make much sense. We should just mutate two
vectors.

llvm-svn: 287979
```
  e8a077ba
- Make getColorDiagnostics return a boolean value instead of an enum. · 72b1ee25
  Rui Ueyama authored Nov 26, 2016
```
Config->ColorDiagnostics was of type enum before. Now it is just a
boolean flag. Thanks to Rafael for the suggestion.

llvm-svn: 287978
```
  72b1ee25
- Split MergeOutputSection::finalize. · 1880bbed
  Rui Ueyama authored Nov 26, 2016
```
llvm-svn: 287977
```
  1880bbed
- Create sections with just assignments as SHT_NOBITS. · f93b8c29
  Rafael Espindola authored Nov 26, 2016
```
This matches the behaviour of bfd ld. Using 0 was causing problems
with strip, which would remove these sections.

llvm-svn: 287969
```
  f93b8c29
- [ELF] Be compliant with LLVM and rename Lto into LTO. NFCI. · 3bfa081a
  Davide Italiano authored Nov 26, 2016
```
llvm-svn: 287967
```
  3bfa081a
Nov 25, 2016

Fix typo. · 1df93169
Rui Ueyama authored Nov 25, 2016
```
llvm-svn: 287951
```
1df93169
Do not print out ARGV0 in white because it's unreadable on white background. · c01321c6
Rui Ueyama authored Nov 25, 2016
```
llvm-svn: 287950
```
c01321c6

Support -color-diagnostics={auto,always,never}. · 8c8818a5

Rui Ueyama authored Nov 25, 2016

-color-diagnostics=auto is default because that's the same as
Clang's default. When color is enabled, error or warning messages
are colored like this.

  error:
  <bold>ld.lld</bold> <red>error:</red> foo.o: no such file

  warning:
  <bold>ld.lld</bold> <magenta>warning:</magenta> foo.o: no such file
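
The behaviour described above can be sketched roughly as follows. Both functions are hypothetical and are not taken from the patch; the sketch assumes that "auto" means "use color only when the output is a terminal" and uses standard ANSI escape sequences for bold, red and magenta.

```cpp
#include <string>

// Hypothetical helper: decides whether to emit color for a given option
// value. IsTerminal says whether the diagnostics stream is a terminal.
// Unknown values fall back to no color here; a real driver would more
// likely report an error.
bool useColor(const std::string &Mode, bool IsTerminal) {
  if (Mode == "always")
    return true;
  if (Mode == "never")
    return false;
  if (Mode == "auto")
    return IsTerminal;
  return false;
}

// Hypothetical formatter following the layout shown above: the program
// name in bold, "error:" in red or "warning:" in magenta, then the message.
std::string formatDiag(const std::string &Prog, const std::string &Kind,
                       const std::string &Msg, bool Color) {
  if (!Color)
    return Prog + " " + Kind + ": " + Msg;
  const std::string Bold = "\033[1m", Reset = "\033[0m";
  const std::string Hue = Kind == "error" ? "\033[31m" : "\033[35m";
  return Bold + Prog + Reset + " " + Hue + Kind + ":" + Reset + " " + Msg;
}
```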

Differential Revision: https://reviews.llvm.org/D27117

llvm-svn: 287949

8c8818a5

We shouldn't call parallel_for_each if -no-thread is given. · 60666414
Rui Ueyama authored Nov 25, 2016
```
llvm-svn: 287948
```
60666414

Parallelize uncompress() and splitIntoPieces(). · 2555952b

Rui Ueyama authored Nov 25, 2016

Uncompressing section contents and splitting mergeable section contents
into smaller chunks are heavy tasks. They scan entire section contents
and do CPU-intensive work such as uncompressing zlib-compressed data
or computing a hash value for each section piece.

Luckily, these tasks are independent of each other, so we can do them
with parallel_for_each. The number of input sections is large (as opposed
to the number of output sections), so there is a lot of parallelism here.

Actually, the current design of calling uncompress() and splitIntoPieces()
in batch was chosen with this in mind. Basically all we need to
do here is to replace `for` with `parallel_for_each`.

It seems this patch improves latency significantly if linked programs
contain debug info (which in turn contain lots of mergeable strings.)
For example, the latency to link Clang (debug build) improved by 20% on
my machine as shown below. Note that ld.gold took 19.2 seconds to do
the same thing.

Before:
    30801.782712 task-clock (msec)         #    3.652 CPUs utilized            ( +-  2.59% )
         104,084 context-switches          #    0.003 M/sec                    ( +-  1.02% )
           5,063 cpu-migrations            #    0.164 K/sec                    ( +- 13.66% )
       2,528,130 page-faults               #    0.082 M/sec                    ( +-  0.47% )
  85,317,809,130 cycles                    #    2.770 GHz                      ( +-  2.62% )
  67,352,463,373 stalled-cycles-frontend   #   78.94% frontend cycles idle     ( +-  3.06% )
 <not supported> stalled-cycles-backend
  44,295,945,493 instructions              #    0.52  insns per cycle
                                           #    1.52  stalled cycles per insn  ( +-  0.44% )
   8,572,384,877 branches                  #  278.308 M/sec                    ( +-  0.66% )
     141,806,726 branch-misses             #    1.65% of all branches          ( +-  0.13% )

     8.433424003 seconds time elapsed                                          ( +-  1.20% )

After:
    35523.764575 task-clock (msec)         #    5.265 CPUs utilized            ( +-  2.67% )
         159,107 context-switches          #    0.004 M/sec                    ( +-  0.48% )
           8,123 cpu-migrations            #    0.229 K/sec                    ( +- 23.34% )
       2,372,483 page-faults               #    0.067 M/sec                    ( +-  0.36% )
  98,395,342,152 cycles                    #    2.770 GHz                      ( +-  2.62% )
  79,294,670,125 stalled-cycles-frontend   #   80.59% frontend cycles idle     ( +-  3.03% )
 <not supported> stalled-cycles-backend
  46,274,151,813 instructions              #    0.47  insns per cycle
                                           #    1.71  stalled cycles per insn  ( +-  0.47% )
   8,987,621,670 branches                  #  253.003 M/sec                    ( +-  0.60% )
     148,900,624 branch-misses             #    1.66% of all branches          ( +-  0.27% )

     6.747548004 seconds time elapsed                                          ( +-  0.40% )

llvm-svn: 287946

2555952b

Move typedefs inside a class definition. · 623b36e3
Rui Ueyama authored Nov 25, 2016
```
llvm-svn: 287945
```
623b36e3
Remove a parameter from ScriptParser. · 22375f24
Rui Ueyama authored Nov 25, 2016
```
llvm-svn: 287944
```
22375f24