Commits · e5e320de06ef90192aca13f5c2145499919edeaf · Lorenzo Albano / LLVM bpEVL

Feb 28, 2017

ELF ICF: Merge only functions. · 13ed0b69

Rui Ueyama authored Feb 28, 2017

Previously, LLD merged all read-only sections. So the following
program prints out "true" if -icf=all is specified.

  #include <stdio.h>
  static const int foo = 1;
  static const int bar = 1;
  int main() { printf("%s\n", &foo == &bar ? "true" : "false"); }

This is somewhat counter-intuitive, and it actually caused nasty issues.
One example is https://bugs.chromium.org/p/chromium/issues/detail?id=682773#c24.

This patch changes how ICF works. Now ICF merges only functions
(i.e. executable sections).
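
A minimal sketch of such a filter, assuming eligibility is decided from the ELF section flags alone. The bit values are the standard ELF SHF_* flags; the constant names and isEligibleForICF are hypothetical, and the extra "allocated" and "not writable" checks are assumptions rather than something this commit message states.

```cpp
#include <cstdint>

// Standard ELF section flag bits (values from the ELF specification).
constexpr uint64_t kShfWrite = 0x1;      // SHF_WRITE
constexpr uint64_t kShfAlloc = 0x2;      // SHF_ALLOC
constexpr uint64_t kShfExecInstr = 0x4;  // SHF_EXECINSTR

// Hypothetical predicate: only allocated, executable, read-only sections
// are candidates for folding. Read-only data is no longer merged.
bool isEligibleForICF(uint64_t Flags) {
  return (Flags & kShfAlloc) && (Flags & kShfExecInstr) &&
         !(Flags & kShfWrite);
}
```

With a filter like this, the sections holding foo and bar in the example above are never candidates, so their addresses stay distinct.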

Differential Revision: https://reviews.llvm.org/D30365

llvm-svn: 296534

De-template DefinedRegular. · 80474a26
Rui Ueyama authored Feb 28, 2017
```
Differential Revision: https://reviews.llvm.org/D30348

llvm-svn: 296508
```

Feb 27, 2017

Move SymbolTable<ELFT>::Sections out of the class. · 536a2670

Rui Ueyama authored Feb 27, 2017

The list of all input sections was defined in the SymbolTable class for
historical reasons. The list itself is not a template. However, because
SymbolTable class is a template, we needed to pass around ELFT to access
the list. This patch moves the list out of the class so that it doesn't
need ELFT.
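
A small sketch of the difference, using hypothetical stand-in types (ELF64LE, SymbolTableBefore, AllInputSections are illustrative names, not LLD's). It only shows why a non-template list is easier to reach than a static member of a class template.

```cpp
#include <cstddef>
#include <vector>

struct InputSectionBase { int Id; };  // stand-in for the real class
struct ELF64LE {};                    // stand-in for an ELFT parameter

// Before: the list is a member of a class template, so every user has to
// name ELFT even though the element type does not depend on it.
template <class ELFT> struct SymbolTableBefore {
  static std::vector<InputSectionBase *> Sections;
};
template <class ELFT>
std::vector<InputSectionBase *> SymbolTableBefore<ELFT>::Sections;

template <class ELFT> std::size_t countBefore() {
  return SymbolTableBefore<ELFT>::Sections.size();
}

// After: a plain variable that non-template code can use directly.
std::vector<InputSectionBase *> AllInputSections;

std::size_t countAfter() { return AllInputSections.size(); }
```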

llvm-svn: 296309

Feb 23, 2017

Make InputSection a class. NFC. · 774ea7d0

Rafael Espindola authored Feb 23, 2017

With the current design, an InputSection is basically anything that
goes directly in an OutputSection. That includes plain input sections
but also synthetic sections, so this should probably not be a
template.

llvm-svn: 295993

Convert InputSectionBase to a class. · b4c9b81a

Rafael Espindola authored Feb 23, 2017

Removing this template is not a big win by itself, but opens the way
for removing more templates.

llvm-svn: 295923

Jan 20, 2017

ELF: Fix ICF crash on absolute symbol relocations. · dbd8d9b5

Peter Collingbourne authored Jan 20, 2017

If two sections contained relocations to absolute symbols with the same
value we would crash when trying to access their sections. Add a check that
both symbols point to sections before accessing their sections, and treat
absolute symbols as equal if their values are equal.
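
The check can be sketched as follows. Symbol, Section, and sameTarget are hypothetical stand-ins, and treating "one absolute, one section-relative" as unequal is an assumption of this sketch rather than something the message spells out.

```cpp
#include <cstdint>

struct Section { unsigned Color; };  // stand-in for an input section

struct Symbol {
  const Section *Sec;  // nullptr models an absolute symbol
  uint64_t Value;
};

// Compares two relocation targets without dereferencing a missing section.
bool sameTarget(const Symbol &A, const Symbol &B) {
  if (&A == &B)
    return true;
  // If either symbol is absolute, both must be, and their values must match.
  if (!A.Sec || !B.Sec)
    return !A.Sec && !B.Sec && A.Value == B.Value;
  // Both symbols point to sections, so it is now safe to look at them.
  return A.Value == B.Value && A.Sec->Color == B.Sec->Color;
}
```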

Differential Revision: https://reviews.llvm.org/D28935

llvm-svn: 292578

Jan 15, 2017
- Fix typo. · c9df1725
  Rui Ueyama authored Jan 15, 2017
```
llvm-svn: 292044
```
Dec 05, 2016
- Use "equivalence class" instead of "color" to describe the concept in ICF. · fcd3fa83
  Rui Ueyama authored Dec 05, 2016
```
Also add a citation to GNU gold safe ICF paper.

Differential Revision: https://reviews.llvm.org/D27398

llvm-svn: 288684
```
- Simplify ICF alignment handling. · 5cb712ed
  Rui Ueyama authored Dec 05, 2016
```
llvm-svn: 288630
```
Dec 04, 2016

Re-implement the optimization that I removed in r288527. · 045d8281

Rui Ueyama authored Dec 04, 2016

I removed a wrong optimization for ICF in r288527. Sean Silva suggested
in a post-commit review that the correct algorithm can be implemented
easily. This patch implements it.

llvm-svn: 288620

Dec 03, 2016
- Factor out common code to a header. · 244a435a
  Rui Ueyama authored Dec 03, 2016
```
llvm-svn: 288599
```
Dec 02, 2016

Remove a wrong performance optimization. · 5419861a

Rui Ueyama authored Dec 02, 2016

This is a hack for single-threaded execution. We use Color[0] and
Color[1] alternately on each iteration. The optimization looked at
the next slot, as opposed to the current slot, to get recent results
early. It turns out that the assumption is wrong: the next slot does
not always hold the most recent values and may instead hold stale
values from the previous iteration. This patch removes that
performance hack.

llvm-svn: 288527

Removed a wrong assertion about non-colorable sections. · 83ec681a

Rui Ueyama authored Dec 02, 2016

The assertion asserted that colorable sections can never have
a reference to non-colorable sections, but that was simply wrong.
They can have references to non-colorable sections. If that's the
case, referenced sections must be the same in terms of pointer
comparison.
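
A minimal sketch of that rule, with a hypothetical Section type and helper name. Colorable sections compare by color; a non-colorable section is only ever equal to itself.

```cpp
struct Section {
  bool Colorable;  // whether the section takes part in ICF
  unsigned Color;  // meaningful only when Colorable is true
};

// Decides whether two referenced sections count as the same target.
bool sameReferencedSection(const Section *X, const Section *Y) {
  if (X == Y)
    return true;  // pointer equality always suffices
  if (!X->Colorable || !Y->Colorable)
    return false;  // non-colorable sections must be the very same object
  return X->Color == Y->Color;
}
```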

llvm-svn: 288511

Fix the worst-case performance of ICF. · 1b6bab01

Rui Ueyama authored Dec 02, 2016

r288228 seems to have regressed ICF performance in some cases in which
a lot of sections are actually mergeable. In r288228, I made a change
to create a Range object for each new color group. So every time we
split a group, we allocated and added a new group to a list of groups.

This patch essentially reverted r288228 with an improvement to
parallelize the original algorithm.

Now the ICF main loop is entirely allocation-free and lock-free.

Just like pre-r288228, we search for group boundaries by linear scan
instead of managing the information using the Range class. r288228 was
performance-neutral, and so is this patch.
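
The linear scan can be sketched like this, assuming sections with the same color are already adjacent in the vector. findBoundary and groupStarts are hypothetical helper names; the scan allocates nothing per group beyond the result vector used here for illustration.

```cpp
#include <cstddef>
#include <vector>

// Returns the index one past the group that starts at Begin.
// Precondition: Begin < Colors.size().
std::size_t findBoundary(const std::vector<unsigned> &Colors,
                         std::size_t Begin) {
  std::size_t I = Begin + 1;
  while (I < Colors.size() && Colors[I] == Colors[Begin])
    ++I;
  return I;
}

// Walks the vector group by group and records where each group starts.
std::vector<std::size_t> groupStarts(const std::vector<unsigned> &Colors) {
  std::vector<std::size_t> Starts;
  for (std::size_t Begin = 0; Begin < Colors.size();
       Begin = findBoundary(Colors, Begin))
    Starts.push_back(Begin);
  return Starts;
}
```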

I confirmed that this produces the exact same result as before
using chromium and clang as tests.

llvm-svn: 288480

Fix undefined behavior. · 395859bd

Rui Ueyama authored Dec 02, 2016

New items can be added to Ranges here, and that invalidates
an iterator that previously pointed to the end of the vector.
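
One common way to avoid this class of bug is to loop by index over the original size instead of caching an end iterator. The sketch below illustrates the pattern with a hypothetical Range type and splitOnce function; it is not the actual fix from this commit.

```cpp
#include <cstddef>
#include <vector>

struct Range { std::size_t Begin, End; };

// Splits every range longer than one element in half, appending the second
// half to the same vector. Returns the number of ranges added.
std::size_t splitOnce(std::vector<Range> &Ranges) {
  // Capture the size up front. Indices stay valid across push_back,
  // whereas a saved end() iterator would be invalidated by reallocation.
  const std::size_t N = Ranges.size();
  for (std::size_t I = 0; I < N; ++I) {
    Range R = Ranges[I];  // copy: a reference could dangle after push_back
    if (R.End - R.Begin < 2)
      continue;
    std::size_t Mid = R.Begin + (R.End - R.Begin) / 2;
    Ranges[I].End = Mid;
    Ranges.push_back({Mid, R.End});
  }
  return Ranges.size() - N;
}
```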

llvm-svn: 288443

Dec 01, 2016

Add an assert instead of ignoring an impossible condition. · a6cd5fe4
Rui Ueyama authored Dec 01, 2016
```
llvm-svn: 288419
```
Updates file comments and variable names. · 91ae861a
Rui Ueyama authored Dec 01, 2016
```
Use "color" instead of "group id" to describe the ICF algorithm.

llvm-svn: 288409
```

Parallelize ICF to make LLD's ICF really fast. · c1835319

Rui Ueyama authored Dec 01, 2016

ICF is short for Identical Code Folding. It is a size optimization that
identifies two or more functions that happen to have the same contents
and merges them. It usually reduces output size by a few percent.

ICF is slow because it is a computationally intensive process. I tried
to parallelize it before but failed because I couldn't make a
parallelized version produce consistent outputs. Although it didn't
create broken executables, every invocation of the linker generated
slightly different output, and I couldn't figure out why.

I think I now understand what was going on, and also came up with a
simple algorithm to fix it. This patch implements it.

The result is very exciting. Chromium, for example, has 780,662 input
sections, of which 20,774 are reducible by ICF. LLD previously took
7.980 seconds for ICF. Now it finishes in 1.065 seconds.

As a result, LLD can now link a Chromium binary (output size 1.59 GB)
in 10.28 seconds on my machine with ICF enabled. Compared to gold
which takes 40.94 seconds to do the same thing, this is an amazing
number.

From here, I'll describe what we are doing for ICF, what the
previous problem was, and what I did in this patch.

In ICF, two sections are considered identical if they have the same
section flags, section data, and relocations. Relocations are tricky,
because two relocations are considered the same if they have the same
relocation type, values, and if they point to the same section _in
terms of ICF_.

Here is an example. If foo and bar defined below are compiled to the
same machine instructions, ICF can (and should) merge the two,
although their relocations point to each other.

  void foo() { bar(); }
  void bar() { foo(); }

This is not an easy problem to solve.

What we are doing in LLD is some sort of coloring algorithm. We color
non-identical sections using different colors repeatedly, and sections
in the same color when the algorithm terminates are considered
identical. Here are the details:

  1. First, we color all sections using their hash values of section
  types, section contents, and numbers of relocations. At this moment,
  relocation targets are not taken into account. We just color
  sections that apparently differ in different colors.

  2. Next, for each color C, we visit sections having color C to see
  if their relocations are the same. Relocations are considered equal
  if their targets have the same color. We then recolor sections that
  have different relocation targets in new colors.

  3. If we recolor some section in step 2, relocations that were
  previously pointing to the same color targets may now be pointing to
  different colors. Therefore, repeat 2 until a convergence is
  obtained.
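
The three steps above can be sketched sequentially as follows. This is an illustration under simplifying assumptions, not LLD's implementation: Section and colorSections are hypothetical, relocation targets are plain indices, and ContentHash is treated as an exact summary of the section (a real linker would also compare the contents).

```cpp
#include <cstddef>
#include <map>
#include <vector>

struct Section {
  unsigned ContentHash;              // hash of type, contents, reloc count
  std::vector<std::size_t> Targets;  // indices of relocation targets
};

// Returns one color per section; equal colors mean "considered identical".
std::vector<unsigned> colorSections(const std::vector<Section> &Secs) {
  // Step 1: initial colors ignore relocation targets.
  std::vector<unsigned> Color(Secs.size());
  for (std::size_t I = 0; I < Secs.size(); ++I)
    Color[I] = Secs[I].ContentHash;

  std::size_t PrevClasses = 0;
  for (;;) {
    // Step 2: a section's signature is its color plus its targets' colors.
    // Sections with different signatures receive different new colors.
    std::map<std::vector<unsigned>, unsigned> Ids;
    std::vector<unsigned> Next(Secs.size());
    for (std::size_t I = 0; I < Secs.size(); ++I) {
      std::vector<unsigned> Sig{Color[I]};
      for (std::size_t T : Secs[I].Targets)
        Sig.push_back(Color[T]);
      unsigned Fresh = static_cast<unsigned>(Ids.size());
      Next[I] = Ids.emplace(Sig, Fresh).first->second;
    }
    // Step 3: recoloring only ever splits classes, so an unchanged number
    // of classes means the coloring has converged.
    if (Ids.size() == PrevClasses)
      return Next;
    PrevClasses = Ids.size();
    Color = Next;
  }
}
```

For the foo/bar example above, both functions start with the same color and each one's only relocation points at that same color, so they never split and end up in one class.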

Step 2 is a heavy operation. For Chromium, the first iteration of step
2 takes 2.882 seconds, and the second iteration takes 1.038 seconds,
and in total it needs 23 iterations.

Parallelizing step 1 is easy because we can color each section
independently. This patch does that.

Parallelizing step 2 is tricky. We could work on each color
independently, but we cannot recolor sections in place, because it
will break the invariant that two possibly-identical sections must
have the same color at any moment.

Consider sections S1, S2, S3, S4 in the same color C, where S1 and S2
are identical, S3 and S4 are identical, but S2 and S3 are not. Thread
A is about to recolor S1 and S2 in C'. After thread A recolors S1 in
C', but before it recolors S2 in C', another thread B might observe S1
and S2. Thread B will then conclude that S1 and S2 are different, and
it will wrongly split its own sections into smaller groups.
Over-splitting doesn't produce broken results, but it loses a chance to
merge some identical sections. That was the cause of the nondeterminism.

To fix the problem, I made sections have two colors, namely current
color and next color. At the beginning of each iteration, both colors
are the same. Each thread reads from current color and writes to next
color. In this way, we prevent threads from reading partial
results. After each iteration, we flip current and next.
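
A sketch of the two-slot scheme, under stated assumptions: Section, sameTargets, splitGroup, and refine are hypothetical names, sections of the same color are adjacent in the vector, and a new color is simply the index where a subgroup starts. To stay dependency-free the sketch processes groups one after another. Each splitGroup call reads only slot Cur and writes only slot Cur ^ 1 (plus its own slice of the vector), which is the property that would allow the calls to be handed to separate threads.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Section {
  unsigned Color[2];               // [Cur] is read, [Cur ^ 1] is written
  std::vector<Section *> Targets;  // relocation targets
};

// Compares relocation targets using only the current slot.
static bool sameTargets(const Section *A, const Section *B, int Cur) {
  return std::equal(A->Targets.begin(), A->Targets.end(),
                    B->Targets.begin(), B->Targets.end(),
                    [Cur](const Section *X, const Section *Y) {
                      return X->Color[Cur] == Y->Color[Cur];
                    });
}

// Splits the group V[Begin, End) into subgroups with equal targets and
// writes the new colors to the next slot. Returns true if it split.
static bool splitGroup(std::vector<Section *> &V, std::size_t Begin,
                       std::size_t End, int Cur) {
  bool Split = false;
  while (Begin < End) {
    Section *Pivot = V[Begin];
    auto Mid = std::stable_partition(
        V.begin() + Begin + 1, V.begin() + End,
        [=](Section *S) { return sameTargets(Pivot, S, Cur); });
    std::size_t MidIdx = static_cast<std::size_t>(Mid - V.begin());
    for (std::size_t I = Begin; I < MidIdx; ++I)
      V[I]->Color[Cur ^ 1] = static_cast<unsigned>(Begin);
    Split |= MidIdx != End;
    Begin = MidIdx;
  }
  return Split;
}

// Refines until no group splits. Returns the slot holding the final colors.
int refine(std::vector<Section *> &V) {
  int Cur = 0;
  for (bool Changed = true; Changed; Cur ^= 1) {  // flip after each pass
    Changed = false;
    for (std::size_t Begin = 0; Begin < V.size();) {
      std::size_t End = Begin + 1;
      while (End < V.size() && V[End]->Color[Cur] == V[Begin]->Color[Cur])
        ++End;
      Changed |= splitGroup(V, Begin, End, Cur);  // independent per group
      Begin = End;
    }
  }
  return Cur;
}
```

Because no call ever writes the slot that other calls read, a half-recolored group such as S1 and S2 in the scenario above is never observable during a pass.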

This is a very simple solution and is implemented in less than 50
lines of code.

I tested this patch with Chromium and confirmed that this parallelized
ICF produces the same output as the non-parallelized one.

Differential Revision: https://reviews.llvm.org/D27247

llvm-svn: 288373

Nov 30, 2016

Change how we manage groups in ICF. · 9dedfb1f

Rui Ueyama authored Nov 30, 2016

Previously, on each iteration in ICF, we scanned the entire vector of
input sections to find the boundaries of groups having the same ID.

This patch changes the algorithm so that we now have a vector of ranges.
Each range contains a starting index and an ending index of the group.
So we no longer have to search boundaries on each iteration.

Performance-wise, this seems neutral. Instead of searching boundaries,
we now have to maintain ranges. But I think this is more readable
than the previous implementation.

Moreover, this makes it easy to parallelize the main loop of ICF,
which I'll do in a follow-up patch.

llvm-svn: 288228

Nov 21, 2016
- Update comments. · 7bed9eec
  Rui Ueyama authored Nov 20, 2016
```
llvm-svn: 287509
```
Nov 20, 2016

Use auto for obvious types. · 9f8cb730
Rui Ueyama authored Nov 20, 2016
```
llvm-svn: 287481
```

Do not expose ICF class from the file. · bd1f0630

Rui Ueyama authored Nov 20, 2016

This patch also uses file-scope functions instead of class member functions.

Now that the ICF class is not visible from outside, the InputSection class
can no longer be a "friend" of it. So I removed the friend relation
and just made it expose those features publicly.

llvm-svn: 287480

Refactor ICF. · e2dfbc17

Rui Ueyama authored Nov 19, 2016

In order to use forEachGroup in the final loop in ICF::run,
I changed some function parameter types.

llvm-svn: 287466

Nov 19, 2016
- Use std::equal instead of hand-written loops. · a05134e8
  Rui Ueyama authored Nov 19, 2016
```
llvm-svn: 287460
```
Nov 10, 2016

Parse relocations only once. · 9f0c4bb7

Rafael Espindola authored Nov 10, 2016

Relocations were the last thing for which we were storing a raw
section pointer and parsing on demand.

With this patch we parse it only once and store a pointer to the
actual data.

The patch also changes where we store it. It is now in
InputSectionBase. Not all sections have relocations, but most do and
this simplifies the logic. It also means that we now only support one
relocation section per section. Given that this constraint is
maintained even with -r in gold, bfd, and lld, I think it is OK.

llvm-svn: 286459

Nov 09, 2016
- Add a convenience getObj method. NFC. · 77dbe9a4
  Rafael Espindola authored Nov 09, 2016
```
llvm-svn: 286370
```
Nov 08, 2016
- Don't add null and discarded sections to the global list. · 8f9026ba
  Rafael Espindola authored Nov 08, 2016
```
Avoids having to skip them multiple times.

llvm-svn: 286261
```
Nov 05, 2016

Create a vector containing all input sections. · 8c6a5aaf

Rui Ueyama authored Nov 05, 2016

Previously, we used this piece of code to iterate over all input sections.

  for (elf::ObjectFile<ELFT> *F : Symtab.getObjectFiles())
    for (InputSectionBase<ELFT> *S : F->getSections())

It turned out that this mechanism doesn't work well with synthetic
input sections because synthetic input sections don't belong to any
input file.

This patch defines a vector that contains all input sections including
synthetic ones.
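
The problem and the fix can be illustrated with stand-in types (InputSection, ObjectFile, countViaFiles, and collectAll are hypothetical and much simpler than the real classes). Walking the files only reaches sections that some file owns, while a single list can hold synthetic sections as well.

```cpp
#include <cstddef>
#include <vector>

struct InputSection { bool Synthetic; };
struct ObjectFile { std::vector<InputSection *> Sections; };

// Old approach: nested loops over files, which cannot see synthetic sections.
std::size_t countViaFiles(const std::vector<ObjectFile *> &Files) {
  std::size_t N = 0;
  for (const ObjectFile *F : Files)
    N += F->Sections.size();
  return N;
}

// New approach: one flat list that holds file-backed and synthetic sections.
std::vector<InputSection *>
collectAll(const std::vector<ObjectFile *> &Files,
           const std::vector<InputSection *> &Synthetic) {
  std::vector<InputSection *> All;
  for (const ObjectFile *F : Files)
    All.insert(All.end(), F->Sections.begin(), F->Sections.end());
  All.insert(All.end(), Synthetic.begin(), Synthetic.end());
  return All;
}
```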

llvm-svn: 286051

Nov 03, 2016
- Now that the ELFFile constructor does nothing, create it when needed. · e19abab9
  Rafael Espindola authored Nov 03, 2016
```
This avoids duplicating the buffer in InputFile.

llvm-svn: 285965
```
- Update for llvm change. · 454fe154
  Rafael Espindola authored Nov 03, 2016
```
llvm-svn: 285956
```
Oct 26, 2016
- Delete trivial getters. NFC. · 1854a8eb
  Rafael Espindola authored Oct 26, 2016
```
llvm-svn: 285190
```
Oct 25, 2016

Delete getSectionHdr. · 58139d17

Rafael Espindola authored Oct 25, 2016

We were fairly inconsistent as to what information should be accessed
with getSectionHdr and what information (like alignment) was stored
elsewhere.

Now all section info has a dedicated getter. The code is also a bit
more compact.

llvm-svn: 285079

Sep 14, 2016

Simplify InputFile ownership management. · 38dbd3ee

Rui Ueyama authored Sep 14, 2016

Previously, all input files were owned by the symbol table.
Files were created at various places, such as the Driver, the lazy
symbols, or the bitcode compiler, and the ownership of new files
was transferred to the symbol table using std::unique_ptr.
All input files were then freed when the symbol table was freed,
which is on program exit.

I think we don't have to transfer ownership just to free all
instances at once on exit.

In this patch, all instances are automatically collected to a
vector and freed on exit. In this way, we no longer have to
use std::unique_ptr.
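
One way such a scheme could look is sketched below. filePool and makeFile are hypothetical names, and the sketch keeps std::unique_ptr as an internal detail of the pool, which may differ from how the patch stores instances. The point is that callers only ever see plain pointers and never transfer ownership.

```cpp
#include <memory>
#include <utility>
#include <vector>

struct InputFile { virtual ~InputFile() = default; };
struct ObjectFile : InputFile {
  explicit ObjectFile(int Id) : Id(Id) {}
  int Id;
};

// Holds every file ever created; its contents are destroyed at program exit
// when the function-local static is torn down.
inline std::vector<std::unique_ptr<InputFile>> &filePool() {
  static std::vector<std::unique_ptr<InputFile>> Pool;
  return Pool;
}

// Creates a file, registers it in the pool, and hands back a raw pointer.
template <class T, class... Args> T *makeFile(Args &&...A) {
  auto Owner = std::make_unique<T>(std::forward<Args>(A)...);
  T *Raw = Owner.get();
  filePool().push_back(std::move(Owner));
  return Raw;
}
```

A call such as makeFile<ObjectFile>(7) can then be made from the driver or any other place that creates files, with no ownership hand-off at the call site.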

Differential Revision: https://reviews.llvm.org/D24493

llvm-svn: 281425

Sep 12, 2016
- Store an ArrayRef for Data in InputSectionData. · c7e1e034
  Rafael Espindola authored Sep 12, 2016
```
llvm-svn: 281210
```
Sep 08, 2016

Compute section names only once. · 042a3f20

Rafael Espindola authored Sep 08, 2016

This simplifies error handling as there is now only one place in the
code that needs to consider the possibility that the name is
corrupted. Before, we would do it on every access.

llvm-svn: 280937

Aug 22, 2016

[ELF] ICF should respect section alignment · 901948a2

Petr Hosek authored Aug 22, 2016

When performing ICF, we have to respect the alignment requirement
of each section within each group.
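
A plausible way to honor this, shown as a sketch with hypothetical names (Section, Repl, foldGroup), is to raise the surviving section's alignment to the strictest alignment found in its group, so that every folded section's requirement is still met.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Section {
  uint32_t Alignment;
  Section *Repl;  // section that replaces this one after folding, if any
};

// Folds Others into Keep and makes Keep satisfy every member's alignment.
void foldGroup(Section &Keep, const std::vector<Section *> &Others) {
  for (Section *S : Others) {
    Keep.Alignment = std::max(Keep.Alignment, S->Alignment);
    S->Repl = &Keep;
  }
}
```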

Differential Revision: https://reviews.llvm.org/D23732

llvm-svn: 279456

May 02, 2016
- Do not pass Symtab to markLive/doICF since Symtab is globally accessible. · 4f8d21f3
  Rui Ueyama authored May 02, 2016
```
llvm-svn: 268286
```
Apr 27, 2016
- ELF: Move code to where it is used, and related cleanups. NFC. · 676c7cd1
  Peter Collingbourne authored Apr 26, 2016
```
Differential Revision: http://reviews.llvm.org/D19490

llvm-svn: 267637
```
Apr 26, 2016
- Call repl in getSymbolBody. NFC. · 6c75238a
  Rafael Espindola authored Apr 26, 2016
```
Every caller was doing it.

llvm-svn: 267603
```
Apr 05, 2016
- Update for llvm change. · 0f7ccc3d
  Rafael Espindola authored Apr 05, 2016
```
llvm-svn: 265404
```