  1. Dec 02, 2020
    • Rebase: Merge BOLT codebase in monorepo · 1c5d3a05
      Amir Ayupov authored
      Summary:
      This commit is the first step in rebasing all of BOLT
      history in the LLVM monorepo. It also solves trivial build issues
      by updating BOLT codebase to use current LLVM. There is still work
      left in rebasing some BOLT features and in making sure everything
      is working as intended.
      
      History has been rewritten to put BOLT in the /bolt folder, as
      opposed to /tools/llvm-bolt.
      
      (cherry picked from FBD33289252)
  2. Jan 30, 2021
  3. Jan 28, 2021
  4. Jan 21, 2021
  5. Jan 20, 2021
  6. Jan 11, 2021
    • [PERF2BOLT] Relax segment matching requirements · 0de92b83
      Rafael Auler authored
      Summary:
      When looking at perf.data's available binaries and their
      respective mmap'ed segments, match them with the input binary by
      looking at both aligned and non-aligned addresses. If we suppose
      the alignment is the mmap'ed page size, we may miss some cases and
      perf2bolt will refuse to proceed because it failed to match the
      input binary with a process recorded in perf.data.
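
The relaxed check can be pictured with a small Python sketch. This is not perf2bolt's C++ code: `segment_matches` and `align_down` are hypothetical helpers, and the 4 KiB default page size is an assumption. The real matching logic may look at more fields than the address.

```python
def align_down(addr: int, alignment: int) -> int:
    """Round addr down to a multiple of alignment (a power of two)."""
    return addr & ~(alignment - 1)


def segment_matches(mmap_addr: int, segment_addr: int,
                    page_size: int = 0x1000) -> bool:
    """Accept either the exact segment address or its page-aligned form.

    Hypothetical helper: comparing only against the page-aligned address
    would reject an mmap entry recorded at the unaligned address.
    """
    return mmap_addr in (segment_addr, align_down(segment_addr, page_size))
```

With these definitions, an mmap entry at `0x400123` and one at `0x400000` would both match a segment at `0x400123`.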
      
      (cherry picked from FBD25732673)
  7. Dec 30, 2020
    • [BOLT] Add threshold options for lite mode · e3898d59
      Rafael Auler authored
      Summary:
      Add options for trading processing speed for binary performance.
      
        -lite-threshold-pct=<uint>
          Threshold (in percent) for selecting functions to process in lite
          mode. Higher threshold means fewer functions to process.
          E.g., a threshold of 90 means only the top 10 percent of functions
          with a profile will be processed.
      
        -lite-threshold-count=<uint>
          Similar to '-lite-threshold-pct' but specify threshold using
          absolute function call count. I.e. limit processing to functions
          executed at least the specified number of times.
      
        -no-scan
          Do not scan cold functions for external references (may result in
          slower binary).
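
The two threshold options can be read as two selection rules over per-function execution counts. The Python sketch below is illustrative only and is not BOLT's implementation. The function names, the `exec_counts` mapping, and rounding the kept fraction up are all assumptions.

```python
import math


def select_by_pct(exec_counts: dict[str, int], threshold_pct: int) -> set[str]:
    """Keep the hottest (100 - threshold_pct) percent of profiled functions."""
    profiled = [f for f, c in exec_counts.items() if c > 0]
    profiled.sort(key=lambda f: exec_counts[f], reverse=True)
    # Assumption: round up so a non-empty profile never selects zero
    # functions unless the threshold is 100.
    keep = math.ceil(len(profiled) * (100 - threshold_pct) / 100)
    return set(profiled[:keep])


def select_by_count(exec_counts: dict[str, int], threshold_count: int) -> set[str]:
    """Keep functions executed at least threshold_count times."""
    return {f for f, c in exec_counts.items() if c >= threshold_count}
```

With ten profiled functions, a percentage threshold of 90 keeps only the single hottest one under these assumptions.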
      
      (cherry picked from FBD24739092)
  8. Dec 09, 2020
  9. Dec 04, 2020
    • [BOLT] Fix shrinkwrapping bug when changing frame alignment · d2f68039
      Rafael Auler authored
      Summary:
      This fixes a bug with shrink wrapping when trying to move
      push-pops in a function where we are not allowed to modify the
      stack layout for alignment reasons. In this bug, we failed to
      propagate alignment requirement upwards in the call graph from
      function A to B when: (1) there is a cycle in the call graph and
      (2) the distance from A to B is greater than 1 in the call graph
      and (3) there is a node in the path from A to B, not including
      A or B, that does not access parameters in the stack.
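
A generic way to propagate a property through a call graph that contains cycles is a worklist over reverse edges. The Python sketch below illustrates that idea and is not the shrink-wrapping code itself. It assumes that "upwards" means from callee to caller, and `propagate_alignment` and its `callers` map are hypothetical names.

```python
from collections import deque


def propagate_alignment(callers: dict[str, list[str]],
                        seeds: set[str]) -> set[str]:
    """Return every function that inherits the alignment requirement.

    callers maps a function to the functions that call it (reverse call
    graph edges). Each function enters the worklist at most once, so cycles
    terminate, and propagation does not stop at intermediate nodes.
    """
    required = set(seeds)
    worklist = deque(seeds)
    while worklist:
        func = worklist.popleft()
        for caller in callers.get(func, []):
            if caller not in required:
                required.add(caller)
                worklist.append(caller)
    return required
```

In this sketch an intermediate node is always marked and traversed regardless of its own properties, which is what lets the requirement reach callers more than one edge away.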
      
      (cherry picked from FBD25315977)
  10. Nov 20, 2020
    • Inject instrumentation's global dtor on MachO · e067f2ad
      Alexander Shaposhnikov authored
      Summary:
      This diff is a preparation for dumping the profile generated by BOLT's instrumentation on MachO.
      
      1/  Function "bolt_instr_fini" is placed into the predefined section "__fini"
      
      2/ In the instrumentation pass we create a symbol "bolt_instr_fini" and
      replace the last global destructor with it.
      
      This is a temporary solution; in the future we need to register bolt_instr_fini in addition to the existing destructors, without dropping the last one.
      
      (cherry picked from FBD25071864)
  11. Nov 19, 2020
  12. Nov 18, 2020
  13. Nov 17, 2020
  14. Nov 16, 2020
    • [BOLT] Fix data race while running split functions pass · 7eaf63a1
      Maksim Panchenko authored
      Summary:
      In BinaryContext::calculateEmittedSize(), after the temporary code
      emission, we have to perform a cleanup and mark all symbols used
      during the emission as undefined and unregistered (so that we can emit
      them again later). The cleanup happens even for symbols that were
      referenced but not defined by the emitted code.
      
      If all emitted symbols are local, there is no risk that one thread will
      define a symbol while some other thread will undefine it in its cleanup
      code. Such behavior is expected as local symbols can only be referenced
      within the containing function and each function is processed in one
      thread. However, secondary entry points have associated global symbols
      and if we emit them, then it is possible for a thread to undefine
      a symbol while the other thread had defined it and was in the process of
      emitting the fragment with it. In such case, a data race may happen and
      the thread that contains the definition of the symbol may define it
      twice causing a redefinition error.
      
      To avoid the data race, we skip the emission of secondary entry global
      symbols when emitting code used only for the size estimation.
      
      (cherry picked from FBD24986007)
  15. Nov 14, 2020
    • a new version of hfsort+ · 1e9b7330
      Sergey Pupyrev authored
      Summary:
      A faster and better version of function reordering:
      - fixed a bug where some computed probabilities were negative;
      - replaced an O(n^2) loop with a priority queue to find candidate chains to merge.
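
The second point describes a common pattern: instead of rescanning all candidate pairs for the best merge on every step, keep them in a priority queue and discard stale entries lazily. The Python sketch below shows that pattern only. It is not the hfsort+ algorithm, and `merge_greedily` and the `gain` callback are hypothetical.

```python
import heapq
from itertools import count


def merge_greedily(chains, gain):
    """Repeatedly merge the ordered pair of chains with the highest gain.

    chains: dict mapping an integer chain id to a list of function names.
    gain:   callable(first, second) -> float; a hypothetical score for
            placing chain `second` right after chain `first`.
    """
    alive = dict(chains)
    new_id = count(max(alive, default=-1) + 1)
    heap = []  # entries are (-gain, id_a, id_b) because heapq is a min-heap

    def push_pair(a, b):
        g = gain(alive[a], alive[b])
        if g > 0:
            heapq.heappush(heap, (-g, a, b))

    ids = list(alive)
    for a in ids:
        for b in ids:
            if a != b:
                push_pair(a, b)

    while heap:
        _, a, b = heapq.heappop(heap)
        if a not in alive or b not in alive:
            continue  # stale entry: one side was already merged away
        merged_id = next(new_id)
        alive[merged_id] = alive.pop(a) + alive.pop(b)
        for other in list(alive):
            if other != merged_id:
                push_pair(merged_id, other)
                push_pair(other, merged_id)
    return alive
```

Each selection step costs a heap pop rather than a scan over all pairs. The initial seeding here still visits every pair, which a real implementation would likely restrict to chains that are connected in the call graph.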
      
      (cherry picked from FBD24571208)
  16. Nov 12, 2020
    • [BOLT] Support jump tables in split fragments with entries pointing back to parent functions · 6401af89
      Amir Ayupov authored
      Summary:
      Support jump tables belonging to split fragments with entries
      pointing back to parent functions.
      When skipping such families of functions, make sure to start from the
      topmost fragment so that all of its fragments are ignored.
      
      (cherry picked from FBD24907438)
    • [BOLT] Add invalid offset for a JT entry pointing to a fragment · e8234b3b
      Amir Ayupov authored
      Summary:
      During jump table identification, register an invalid offset for jump table
      entries pointing to function fragments.
      These invalid offsets have no effect other than padding the jump
      table size, calculated as `max(OffsetEntries, Entries)`.
      The correct jump table size is required in strict mode (enabled by default
      in aggregation mode by `perf2bolt`) to account for all PC-relative
      relocations in data.
      Functions containing these jump tables with invalid offsets are
      marked to be ignored immediately afterwards in
      `populateJumpTables`.
      
      (cherry picked from FBD24897464)
    • [BOLT] Debug logging in analyzeJumpTable · 157129b7
      Amir Ayupov authored
      Summary:
      Added debug logging in/around `analyzeJumpTable`:
      - Dump jump table entries as they are being processed:
      ```
      BOLT-DEBUG: analyzeJumpTable in read_encoded_value_with_base/2(*2)
        * Checking 0x428ff40 -> OK: real entry
        * Checking 0x428ff44 -> OK: real entry
        * Checking 0x428ff48 -> OK: real entry
        * Checking 0x428ff4c -> OK: real entry
        * Checking 0x428ff50 -> OK: real entry
        * Checking 0x428ff54 -> OK: address in split fragment
        * Checking 0x428ff58 -> OK: address in split fragment
        * Checking 0x428ff5c -> OK: address in split fragment
        * Checking 0x428ff60 -> OK: address in split fragment
        * Checking 0x428ff64 -> OK: real entry
        * Checking 0x428ff68 -> OK: real entry
        * Checking 0x428ff6c -> OK: real entry
        * Checking 0x428ff70 -> OK: real entry
      BOLT-DEBUG: analyzeJumpTable in classify_object_over_fdes/1(*2)
        * Checking 0x428ff74 -> OK: real entry
        ...
      ```
      - Dump skipped functions:
      ```
      Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2) family
      Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2)
      Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2.cold.3/1(*2)
      Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode family
      Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode
      Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.cold.4/1(*2)
      ```
      - Dump values of unclaimed PC-relative relocations in data.
      
      (cherry picked from FBD24898172)
  17. Nov 11, 2020
  18. Nov 10, 2020
    • [BOLT] Disable DynoStats printing after SCTC · e54d3897
      Amir Ayupov authored
      Summary:
      Introduce new BinaryFunction flag `IsCanonicalCFG`, which gets
      unset by SCTC pass. Make DynoStats collection conditional on this
      new flag.
      SCTC leaves CFG in a state where branch counters of BBs with tail
      calls/conditional tail calls are not available (except via annotations,
      which get stripped by `lower-annotations`). Without branch
      counters, DynoStats are invalid.
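
The gating described here amounts to a per-function flag that one pass clears and a later consumer checks. A minimal Python sketch of that shape follows. The class and function names are hypothetical stand-ins, and it assumes the flag is cleared only for functions that SCTC actually processed.

```python
from dataclasses import dataclass


@dataclass
class Function:
    name: str
    is_canonical_cfg: bool = True  # stands in for the IsCanonicalCFG flag


def run_sctc(func: Function) -> None:
    """Pretend to run SCTC: branch counters are unreliable afterwards."""
    func.is_canonical_cfg = False


def functions_for_dyno_stats(functions: list[Function]) -> list[str]:
    """Only functions whose CFG is still canonical contribute to the stats."""
    return [f.name for f in functions if f.is_canonical_cfg]
```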
      
      (cherry picked from FBD24558050)
  19. Nov 09, 2020
    • Improve cold fragment name matching · c36b7168
      Amir Ayupov authored
      Summary:
      Fix cold fragment name matching by replacing the existing
      regexes `.*\.cold\..*` and `.*\.cold`
      with a single combined regex, `.*\.cold(\.\d)?`,
      applied to the restored name (with BOLT-added suffixes stripped).
      
      This allows matching names like "execute_stack_op.cold/1", which
      previously weren't recognized.
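
The effect of matching against the restored name can be checked with a short Python sketch. The combined pattern is the one quoted above. The suffix-stripping pattern and the use of full-match semantics are assumptions, inferred from names such as `/1` and `/1(*2)` in nearby commit messages, and BOLT's own C++ code may differ.

```python
import re

# Assumption: BOLT-added suffixes look like "/1" or "/1(*2)".
BOLT_SUFFIX = re.compile(r"/\d+(\(\*\d+\))?$")
# Combined pattern quoted in the commit message.
COLD_FRAGMENT = re.compile(r".*\.cold(\.\d)?")


def restore_name(name: str) -> str:
    """Strip the assumed BOLT-added suffix from a function name."""
    return BOLT_SUFFIX.sub("", name)


def is_cold_fragment(name: str) -> bool:
    """Match the whole restored name against the combined pattern."""
    return COLD_FRAGMENT.fullmatch(restore_name(name)) is not None
```

Under these assumptions `execute_stack_op.cold/1` is recognized because the `/1` suffix is removed before matching. Note that the pattern as quoted allows a single digit after `.cold.`.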
      
      (cherry picked from FBD24804880)
  20. Nov 06, 2020
    • Lost in rebase: call registerFragment with a reference to TargetBF · f86a78a4
      Amir Ayupov authored
      Summary: Fixes broken build due to a lost dereferencing
      
      (cherry picked from FBD24799948)
    • Conservatively handle jump tables in split functions · 2b09d672
      Amir Ayupov authored
      Summary:
      - Allow jump table entries to point to locations inside the function and its fragments.
      The reasoning is that jump table identification stops at the first entry that belongs to a function other than the one originally referencing the jump table. This assumption is invalid for jump tables with entries pointing to both the parent function and its cold fragments, and it leads to the "unclaimed PC-relative relocations" assertion.
      
      - Add a fragment identification heuristic based on a function name regex and contiguous jump table entries.
      Currently, the parent-to-fragment relationship is set up based on interprocedural references, i.e. direct references from the parent function. These references don't include references through a jump table.
      Additionally, some fragments are only reachable through a jump table. In that case, in order to fully consume the jump table, add the parent-to-fragment relationship during `analyzeJumpTable` using the following heuristics:
        1. The fragment is identified as such based on its name (contains a `.cold.` part), but
        2. The parent function is not set (no direct interprocedural references to that fragment), and
        3. The fragment has a name of the form <parent>.cold(.\d+)
      
      - For split functions with jump table entries spanning the parent and its fragments, mark the parent and all fragments as ignored.
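
The name-based part of the heuristic can be sketched in Python as a lookup that derives the parent name from the fragment name. This is an illustration rather than the code in `analyzeJumpTable`. `find_parent` and `known_functions` are hypothetical, and treating the numeric suffix as optional is an assumption.

```python
import re

# Assumption: names look like "<parent>.cold" or "<parent>.cold.<N>".
FRAGMENT_NAME = re.compile(r"(?P<parent>.+)\.cold(\.\d+)?")


def find_parent(fragment: str, known_functions: set[str]):
    """Return the parent's name if it can be derived and exists, else None."""
    match = FRAGMENT_NAME.fullmatch(fragment)
    if match is None:
        return None
    parent = match.group("parent")
    return parent if parent in known_functions else None
```

For example, `foo.part.2.cold.3` would resolve to `foo.part.2` when that function is known, which mirrors the shape of the names in the "Skipping ... family" log lines.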
      
      (cherry picked from FBD24456904)
    • processInterproceduralReferences: record references to cold fragments as entry points · dc48354f
      Amir Ayupov authored
      Summary:
      For interprocedural references to fragments, record them as
      fragment entry points. Not registering these entry points leads to
      UCE removing the blocks and "Undefined temporary symbol"
      assertion.
      
      (cherry picked from FBD24511281)
    • Extract BinaryContext::registerFragment · 54522877
      Amir Ayupov authored
      Summary: Extract registerFragment so that it can be reused when adding fragments reachable only through jump tables.
      
      (cherry picked from FBD24656651)
  21. Nov 05, 2020
    • [BOLT][PR] Handle TLS relocations on AArch64 · 58460460
      Vladislav Khmelevsky authored
      Summary:
      Some of the TLS relocations, like R_AARCH64_TLSDESC_ADR_PAGE21, must be
      handled by BOLT and should not be skipped by the removed condition. Some
      of the TLS relocations, like R_AARCH64_TLS_TPREL64, could really be skipped
      here, but AFAIU this condition was added as part of BOLT's self-optimization, so
      to prevent future problems my suggestion is not to add another condition
      like "isTLS(RType) && isTLSRelocatable(RType)", but just to remove it, since
      the absence of this condition should not break any other TLS relocation.
      Vladislav Khmelevsky,
      Advanced Software Technology Lab, Huawei
      
      Pull Request resolved: https://github.com/facebookincubator/BOLT/pull/103
      GitHub Author: Vladislav Khmelevsky <Vladislav.Khmelevskyi@huawei.com>
      
      (cherry picked from FBD24745928)
  22. Nov 04, 2020
    • [BOLT] Fix C++ exceptions for shared objects · 4f4239ce
      Maksim Panchenko authored
      Summary:
      Fix several issues to make C++ exceptions work in shared objects:
        * Set MCObjectFileInfo PIC type based on the input binary type.
        * Support indirect (DW_EH_PE_indirect) encoding while writing
          exception Type Table.
        * Use different LPStart value and landing pad encoding for .so's.
        * Disable splitting of exception-handling code for .so's because of
          the new encoding.
      
      (cherry picked from FBD24698765)
  23. Nov 03, 2020
    • [BOLT] Remove threaded EliminateUnreachableBlock version · c1bb4dcb
      Rafael Auler authored
      Summary:
      EliminateUnreachableBlocks has a data race because it depends
      on BinaryContext::computeCodeSize. computeCodeSize supports independent
      Emitters, enabling a lock-free execution. Unfortunately, that is almost
      as expensive as the lock. Removing the boilerplate code for
      parallelization of this pass turned out to be the best alternative: no
      races and slightly better execution time for HHVM.
      
      (cherry picked from FBD24716250)
  24. Oct 30, 2020
    • [BOLT] Please sanitizers · 37921b48
      Rafael Auler authored
      Summary:
      In BinaryContext, we had StringRef holding a reference to
      an r-value std::string. This triggers clang's address sanitizer
      warnings. In MCPlusBuilder we had a left shift overflowing a type,
      which is undefined behavior. Similarly, in CallGraph, we had a hash
      function shifting a negative value, which is also UB. The last two
      trigger the UB sanitizer.
      
      (cherry picked from FBD24661045)
    • [DOCS] Add instrumentation instructions to README · 3e78082c
      Rafael Auler authored
      Summary: Add basic instructions on how to instrument a binary.
      
      (cherry picked from FBD24660183)
  25. Oct 31, 2020
  26. Oct 23, 2020
    • [BOLT] Always keep dynamic symbols defined · 6b185ccc
      Maksim Panchenko authored
      Summary:
      Some symbols in .dynsym will be erroneously marked as belonging to a
      non-allocatable section that BOLT can remove. In that case, keep the
      original invalid index for such symbols instead of setting the UNDEF
      index.
      
      (cherry picked from FBD24488677)
  27. Oct 22, 2020
    • Add pass number to dot dump filename · 5f2f96c4
      Amir Ayupov authored
      Summary:
      Change .dot dumps filename format from
        <function>-<passname>.dot
      to
        <function>-<passidx>_<passname>.dot
      This change helps navigate dumps by making the pass order explicit.
      Example:
        execute_stack_op.cold.6-1(*2)-00_build-cfg.dot
        execute_stack_op.cold.6-1(*2)-01_validate-internal-calls.dot
        execute_stack_op.cold.6-1(*2)-02_strip-rep-ret.dot
        ...
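
The new naming scheme is easy to reproduce for scripting around the dumps. The Python helper below is a hypothetical sketch. The two-digit zero padding is inferred from the examples above rather than stated in the commit message.

```python
def dot_dump_filename(function: str, pass_idx: int, pass_name: str) -> str:
    """Build '<function>-<passidx>_<passname>.dot'.

    Assumption: the pass index is zero-padded to two digits, as in the
    example file names.
    """
    return f"{function}-{pass_idx:02d}_{pass_name}.dot"
```

Because the index is zero-padded, a plain lexicographic sort of the file names follows pass order for up to 100 passes.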
      
      (cherry picked from FBD24452903)
  28. Oct 21, 2020
    • [BOLT] Fix PatchEntries pass · d91add0b
      Maksim Panchenko authored
      Summary:
      While refactoring the pass, I removed the important transactional
      property of the patching process. Restore it.
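
One common reading of "transactional" for a patching step is all-or-nothing: validate every pending patch first and write only if all of them can be applied. The Python sketch below illustrates that reading on a byte buffer. It is an assumption about the intent, not the PatchEntries code, and `apply_patches` is a hypothetical helper.

```python
def apply_patches(code: bytearray, patches: list[tuple[int, bytes]]) -> bool:
    """Apply every (offset, data) patch, or none of them.

    Returns False and leaves `code` untouched if any patch would fall
    outside the buffer.
    """
    for offset, data in patches:
        if offset < 0 or offset + len(data) > len(code):
            return False  # validation failed before any write happened
    for offset, data in patches:
        code[offset:offset + len(data)] = data
    return True
```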
      
      (cherry picked from FBD24440214)
  29. Oct 18, 2020
  30. Oct 17, 2020
    • [BOLT] Ignore __hot_start, __hot_end from input · e4396c41
      Rafael Auler authored
      Summary:
      When -hot-text is on, do not read __hot_start and __hot_end
      from input (inserted by a linker script with the intent of ordering
      functions). These symbols can confuse BOLT into creating a function
      with one of these names, depending on the address where the symbol
      lands, and we will then assert on a symbol redefinition when trying
      to emit our own __hot_start/__hot_end.
      
      (cherry picked from FBD24366636)