- Dec 02, 2020
-
-
Amir Ayupov authored
Summary: This commit is the first step in rebasing all of BOLT history in the LLVM monorepo. It also solves trivial build issues by updating BOLT codebase to use current LLVM. There is still work left in rebasing some BOLT features and in making sure everything is working as intended. History has been rewritten to put BOLT in the /bolt folder, as opposed to /tools/llvm-bolt. (cherry picked from FBD33289252)
-
- Jan 30, 2021
-
-
Alexander Shaposhnikov authored
Summary: Add support for reading profile on Mach-O. (cherry picked from FBD25777049)
-
- Jan 28, 2021
-
-
Alexander Shaposhnikov authored
Summary: Add support for dumping profile on MacOS. (cherry picked from FBD25751363)
-
Alexander Shaposhnikov authored
Summary: Add support for dumping counters on MacOS (cherry picked from FBD25750516)
-
Alexander Shaposhnikov authored
Summary: 1. Add support for __literal16 section in the instrumentation runtime library for MacOS. 2. Fix emitting __counters section. (cherry picked from FBD25746342)
-
Sergey Pupyrev authored
Summary: A few minor updates in block reordering: (1) some refactoring to improve readability; (2) an optimized chain splitting strategy to improve the quality of the layout and the performance of the algorithm. (cherry picked from FBD25126220)
-
- Jan 21, 2021
-
-
Alexander Shaposhnikov authored
Summary: Enable intToStr et al. in the runtime library for MacOS. (cherry picked from FBD25745358)
-
- Jan 20, 2021
-
-
Alexander Shaposhnikov authored
Summary: Use size_t for the first parameter of operator new. https://en.cppreference.com/w/cpp/memory/new/operator_new (cherry picked from FBD25750921)
-
Amir Ayupov authored
(cherry picked from FBD25966797)
-
- Jan 11, 2021
-
-
Rafael Auler authored
Summary: When looking at perf.data's available binaries and their respective mmap'ed segments, match them with the input binary by looking at both aligned and non-aligned addresses. If we suppose the alignment is the mmap'ed page size, we may miss some cases and perf2bolt will refuse to proceed because it failed to match the input binary with a process recorded in perf.data. (cherry picked from FBD25732673)
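The matching rule described above can be sketched as follows. This is a minimal illustration, not perf2bolt's code: the function names and the 4 KiB page size are assumptions made for the example.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages


def align_down(addr, alignment=PAGE_SIZE):
    """Round addr down to a multiple of alignment (a power of two)."""
    return addr & ~(alignment - 1)


def segment_matches(mmap_addr, segment_addr, alignment=PAGE_SIZE):
    """Accept the mapping if it equals the segment address as-is
    or the segment address rounded down to the page boundary."""
    return mmap_addr in (segment_addr, align_down(segment_addr, alignment))


print(segment_matches(0x400000, 0x400123))  # aligned match
print(segment_matches(0x400123, 0x400123))  # non-aligned (exact) match
```

Checking both forms avoids rejecting a profile only because the recorded mapping was not rounded the way the tool assumed.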
-
- Dec 30, 2020
-
-
Rafael Auler authored
Summary: Add options for trading processing speed for binary performance.
  -lite-threshold-pct=<uint>
      Threshold (in percent) for selecting functions to process in lite mode. A higher threshold means fewer functions to process. E.g., a threshold of 90 means only the top 10 percent of functions with profile will be processed.
  -lite-threshold-count=<uint>
      Similar to '-lite-threshold-pct', but specifies the threshold using an absolute function call count, i.e., limits processing to functions executed at least the specified number of times.
  -no-scan
      Do not scan cold functions for external references (may result in a slower binary).
(cherry picked from FBD24739092)
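As a rough illustration of the two thresholds, the sketch below selects functions from a dictionary of execution counts. The function names and the rounding rule are assumptions for this example; BOLT's actual selection logic may differ.

```python
def select_by_pct(exec_counts, threshold_pct):
    """Keep the top (100 - threshold_pct) percent of profiled functions."""
    ranked = sorted(
        (name for name, count in exec_counts.items() if count > 0),
        key=lambda name: exec_counts[name],
        reverse=True,
    )
    # Round up so that a non-empty profile never selects zero functions
    # unless the threshold is 100.
    keep = (len(ranked) * (100 - threshold_pct) + 99) // 100
    return ranked[:keep]


def select_by_count(exec_counts, min_count):
    """Keep functions executed at least min_count times."""
    return [name for name, count in exec_counts.items() if count >= min_count]


counts = {f"f{i}": i for i in range(1, 11)}  # f1 runs once, f10 ten times
print(select_by_pct(counts, 90))
print(select_by_count(counts, 8))
```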
-
- Dec 09, 2020
-
-
Rafael Auler authored
Summary: Test mistakenly used -debug output, which makes it fail on no-asserts build. (cherry picked from FBD25399449)
-
- Dec 04, 2020
-
-
Rafael Auler authored
Summary: This fixes a bug with shrink wrapping when trying to move push-pops in a function where we are not allowed to modify the stack layout for alignment reasons. In this bug, we failed to propagate alignment requirement upwards in the call graph from function A to B when: (1) there is a cycle in the call graph and (2) the distance from A to B is greater than 1 in the call graph and (3) there is a node in the path from A to B, not including A or B, that does not access parameters in the stack. (cherry picked from FBD25315977)
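One generic way to propagate a property upwards through a call graph that may contain cycles is a worklist over callers. The sketch below is only an illustration of that idea under simplifying assumptions (the property spreads unconditionally to every transitive caller); it is not BOLT's shrink-wrapping analysis, and all names are hypothetical.

```python
from collections import deque


def propagate_to_callers(callers, seeds):
    """Return every function that can reach a seed function.

    callers maps a callee name to the names of its direct callers.
    The visited set makes the walk terminate on cyclic graphs.
    """
    marked = set(seeds)
    worklist = deque(seeds)
    while worklist:
        func = worklist.popleft()
        for caller in callers.get(func, ()):
            if caller not in marked:
                marked.add(caller)
                worklist.append(caller)
    return marked


# B calls M, M calls A, and M also calls B (a cycle). C is unrelated.
callers = {"A": ["M"], "M": ["B"], "B": ["M"], "C": []}
print(sorted(propagate_to_callers(callers, ["A"])))
```

Here the requirement of A reaches B at distance 2 even though the intermediate node M sits on a cycle.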
-
- Nov 20, 2020
-
-
Alexander Shaposhnikov authored
Summary: This diff is a preparation for dumping the profile generated by BOLT's instrumentation on MachO. 1/ Function "bolt_instr_fini" is placed into the predefined section "__fini". 2/ In the instrumentation pass we create a symbol "bolt_instr_fini" and replace the last global destructor with it. This is a temporary solution; in the future we need to register bolt_instr_fini in addition to the existing destructors without dropping the last one. (cherry picked from FBD25071864)
-
- Nov 19, 2020
-
-
Alexander Shaposhnikov authored
Summary: Refactor syscall wrappers for OSX. (cherry picked from FBD25084642)
-
- Nov 18, 2020
-
-
Amir Ayupov authored
Summary: Fix corner case of insertion of updated CFI with unset `PrevBB`. Handle it in the same way as inserting past hot-cold split point. (cherry picked from FBD24943911)
-
- Nov 17, 2020
-
-
Alexander Shaposhnikov authored
Summary: Link the instrumentation runtime on OSX. (cherry picked from FBD24390019)
-
- Nov 16, 2020
-
-
Maksim Panchenko authored
Summary: In BinaryContext::calculateEmittedSize(), after the temporary code emission, we have to perform a cleanup and mark all symbols used during the emission as undefined and unregistered (so that we can emit them again later). The cleanup happens even for symbols that were referenced but not defined by the emitted code. If all emitted symbols are local, there is no risk that one thread will define a symbol while another thread undefines it in its cleanup code. Such behavior is expected, as local symbols can only be referenced within the containing function and each function is processed in one thread. However, secondary entry points have associated global symbols, and if we emit them, it is possible for one thread to undefine a symbol while another thread has defined it and is in the process of emitting the fragment with it. In such a case, a data race may happen, and the thread that contains the definition of the symbol may define it twice, causing a redefinition error. To avoid the data race, we skip the emission of secondary entry global symbols when emitting code used only for size estimation. (cherry picked from FBD24986007)
-
- Nov 14, 2020
-
-
Sergey Pupyrev authored
Summary: A faster and better version of function reordering: (1) fixed a bug where some computed probabilities were negative; (2) replaced an O(n^2) loop with a priority queue to find candidate chains to merge. (cherry picked from FBD24571208)
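The general technique of replacing a quadratic "scan all pairs for the best merge" loop with a priority queue can be sketched as below. This is a generic greedy merge with lazy invalidation of stale heap entries, written for illustration only; the gain model (summed edge weights) and all names are assumptions, not BOLT's function reordering algorithm.

```python
import heapq


def greedy_merge(weights):
    """Greedily merge chains, always taking the heaviest remaining edge.

    weights maps (a, b) pairs of chain ids to the gain of merging them.
    """
    adj = {}
    for (a, b), w in weights.items():
        adj.setdefault(a, {})[b] = w
        adj.setdefault(b, {})[a] = w
    chains = {c: [c] for c in adj}
    version = {c: 0 for c in adj}  # bumped whenever a chain changes
    heap = [(-w, a, b, 0, 0) for (a, b), w in weights.items()]
    heapq.heapify(heap)

    while heap:
        _, a, b, va, vb = heapq.heappop(heap)
        stale = (a not in chains or b not in chains
                 or version[a] != va or version[b] != vb)
        if stale:
            continue  # entry refers to a chain that no longer exists as-is
        chains[a].extend(chains.pop(b))  # merge b into a
        version[a] += 1
        del adj[a][b]
        for n, w in adj.pop(b).items():
            if n == a:
                continue
            del adj[n][b]
            adj[a][n] = adj[a].get(n, 0) + w
            adj[n][a] = adj[a][n]
        for n, w in adj[a].items():  # re-queue edges of the merged chain
            heapq.heappush(heap, (-w, a, n, version[a], version[n]))
    return list(chains.values())


print(greedy_merge({("A", "B"): 5, ("B", "C"): 3, ("C", "D"): 4}))
```

Each pop costs O(log n), and outdated candidates are discarded when popped instead of being searched for and removed eagerly.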
-
- Nov 12, 2020
-
-
Amir Ayupov authored
Summary: Support jump tables belonging to split fragments with entries pointing back to parent functions. While skipping such families of functions, make sure to use the topmost fragment to ignore its fragments. (cherry picked from FBD24907438)
-
Amir Ayupov authored
Summary: During jump table identification, register an invalid offset for jump table entries pointing to function fragments. These invalid offsets have no effect other than padding the jump table size, calculated as `max(OffsetEntries, Entries)`. The correct jump table size is required in strict mode (enabled by default in aggregation mode by `perf2bolt`) to account for all PC-relative relocations in data. Functions containing these jump tables with invalid offsets are marked to be ignored immediately afterwards in `populateJumpTables`. (cherry picked from FBD24897464)
-
Amir Ayupov authored
Summary: Added debug logging in/around `analyzeJumpTable`:
- Dump jump table entries as they are being processed:
```
BOLT-DEBUG: analyzeJumpTable in read_encoded_value_with_base/2(*2)
  * Checking 0x428ff40 -> OK: real entry
  * Checking 0x428ff44 -> OK: real entry
  * Checking 0x428ff48 -> OK: real entry
  * Checking 0x428ff4c -> OK: real entry
  * Checking 0x428ff50 -> OK: real entry
  * Checking 0x428ff54 -> OK: address in split fragment
  * Checking 0x428ff58 -> OK: address in split fragment
  * Checking 0x428ff5c -> OK: address in split fragment
  * Checking 0x428ff60 -> OK: address in split fragment
  * Checking 0x428ff64 -> OK: real entry
  * Checking 0x428ff68 -> OK: real entry
  * Checking 0x428ff6c -> OK: real entry
  * Checking 0x428ff70 -> OK: real entry
BOLT-DEBUG: analyzeJumpTable in classify_object_over_fdes/1(*2)
  * Checking 0x428ff74 -> OK: real entry
...
```
- Dump skipped functions:
```
Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2) family
Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2/1(*2)
Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.part.2.cold.3/1(*2)
Skipping _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode family
Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode
Ignoring _ZNK6icu_676number4impl12RoundingImpl5applyERNS1_15DecimalQuantityER10UErrorCode.cold.4/1(*2)
```
- Dump values of unclaimed PC-relative relocations in data.
(cherry picked from FBD24898172)
-
- Nov 11, 2020
-
-
Amir Ayupov authored
Summary: Minimized test case while preserving the CFG subgraph with an issue (cherry picked from FBD24871063)
-
- Nov 10, 2020
-
-
Amir Ayupov authored
Summary: Introduce new BinaryFunction flag `IsCanonicalCFG`, which gets unset by SCTC pass. Make DynoStats collection conditional on this new flag. SCTC leaves CFG in a state where branch counters of BBs with tail calls/conditional tail calls are not available (except via annotations, which get stripped by `lower-annotations`). Without branch counters, DynoStats are invalid. (cherry picked from FBD24558050)
-
- Nov 09, 2020
-
-
Amir Ayupov authored
Summary: Fix cold fragment name matching by replacing the existing regexes `.*\.cold\..*` and `.*\.cold` with a single combined regex `.*\.cold(\.\d)?`, applied to the restored name (with BOLT-added suffixes stripped). This allows matching names like "execute_stack_op.cold/1", which previously weren't recognized. (cherry picked from FBD24804880)
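The effect of the combined regex can be checked with a small sketch. Two points here are assumptions made for the example: that BOLT-added suffixes begin at the first "/" in the name, and that the regex is applied as a full match. The helper names are hypothetical.

```python
import re

# The combined regex quoted in the commit summary.
COLD_FRAGMENT_RE = re.compile(r".*\.cold(\.\d)?")


def restore_name(name):
    """Strip BOLT-added suffixes (assumed here to start at the first '/')."""
    return name.split("/", 1)[0]


def is_cold_fragment(name):
    """Full-match the restored name against the combined regex."""
    return COLD_FRAGMENT_RE.fullmatch(restore_name(name)) is not None


for candidate in ("execute_stack_op.cold/1",
                  "execute_stack_op.cold.6/1(*2)",
                  "execute_stack_op/1"):
    print(candidate, is_cold_fragment(candidate))
```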
-
- Nov 06, 2020
-
-
Amir Ayupov authored
Summary: Fixes a broken build caused by a lost dereference. (cherry picked from FBD24799948)
-
Amir Ayupov authored
Summary:
- Allow jump table entries to point to locations inside the function and its fragments. The reasoning is that jump table identification stops at an entry that belongs to a function different from the one originally referencing the jump table. This assumption is invalid for jump tables with entries pointing to both the parent function and cold fragments, leading to the "unclaimed PC-relative relocations" assertion.
- Add a fragment identification heuristic based on function name regex and contiguous jump table entries. Currently, the parent-to-fragment relationship is set up based on interprocedural references, i.e. direct references from the parent function. These references don't include references through a jump table. Additionally, some fragments are only reachable through a jump table. In that case, in order to fully consume the jump table, add the parent-to-fragment relationship during `analyzeJumpTable` using the following heuristics:
  1. Fragment is identified as such based on name (contains `.cold.` part), but
  2. Parent function is not set (no direct interprocedural references to that fragment), and
  3. Fragment has a name of the form `<parent>.cold(.\d+)`.
- For split functions with jump table entries spanning parent and fragments, mark parent and all fragments as ignored.
(cherry picked from FBD24456904)
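The name-based part of the heuristic can be illustrated by recovering a candidate parent name from a fragment name. This is a sketch under assumptions: the pattern treats the numeric suffix as optional, names are taken with BOLT-added suffixes already stripped, and `parent_name` is a hypothetical helper rather than a BOLT function.

```python
import re

# "<parent>.cold" optionally followed by ".<digits>".
FRAGMENT_RE = re.compile(r"(?P<parent>.+)\.cold(\.\d+)?")


def parent_name(fragment_name):
    """Return the candidate parent name, or None if the name is not a fragment."""
    match = FRAGMENT_RE.fullmatch(fragment_name)
    return match.group("parent") if match else None


print(parent_name("foo.cold.4"))
print(parent_name("foo.part.2.cold.3"))
print(parent_name("foo"))
```

A caller would still need to confirm that the candidate parent exists and that the fragment has no parent set yet, as the conditions above require.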
-
Amir Ayupov authored
Summary: For interprocedural references to fragments, record them as fragment entry points. Not registering these entry points leads to UCE removing the blocks and "Undefined temporary symbol" assertion. (cherry picked from FBD24511281)
-
Amir Ayupov authored
Summary: Factor out registerFragment so that it can be reused when adding fragments reachable only through jump tables. (cherry picked from FBD24656651)
-
- Nov 05, 2020
-
-
Vladislav Khmelevsky authored
Summary: Some of the TLS relocations, like R_AARCH64_TLSDESC_ADR_PAGE21, must be handled by BOLT and should not be skipped by the removed condition. Some of the TLS relocations, like R_AARCH64_TLS_TPREL64, could really be skipped here, but AFAIU this condition was added as part of optimizing BOLT with itself. To prevent future problems here, my suggestion is not to add another condition like "isTLS(RType) && isTLSRelocatable(RType)", but just to remove it, since the absence of this condition should not break any other TLS relocation. Vladislav Khmelevsky, Advanced Software Technology Lab, Huawei. Pull Request resolved: https://github.com/facebookincubator/BOLT/pull/103 GitHub Author: Vladislav Khmelevsky <Vladislav.Khmelevskyi@huawei.com> (cherry picked from FBD24745928)
-
- Nov 04, 2020
-
-
Maksim Panchenko authored
Summary: Fix several issues to make C++ exceptions work in shared objects:
  * Set MCObjectFileInfo PIC type based on the input binary type.
  * Support indirect (DW_EH_PE_indirect) encoding while writing exception Type Table.
  * Use different LPStart value and landing pad encoding for .so's.
  * Disable splitting of exception-handling code for .so's because of the new encoding.
(cherry picked from FBD24698765)
-
- Nov 03, 2020
-
-
Rafael Auler authored
Summary: EliminateUnreachableBlocks has a data race because it depends on BinaryContext::computeCodeSize. computeCodeSize supports independent Emitters, enabling lock-free execution. Unfortunately, that is almost as expensive as the lock. Removing the boilerplate code for parallelization of this pass turned out to be the best alternative: no races and slightly better execution time for HHVM. (cherry picked from FBD24716250)
-
- Oct 30, 2020
-
-
Rafael Auler authored
Summary: In BinaryContext, we had a StringRef holding a reference to an r-value std::string. This triggers clang's address sanitizer warnings. In MCPlusBuilder we had a left shift overflowing a type, which is undefined behavior. Similarly, in CallGraph, we had a hash function shifting a negative value, which is also UB. The last two trigger the UB sanitizer. (cherry picked from FBD24661045)
-
Rafael Auler authored
Summary: Add basic instructions on how to instrument a binary. (cherry picked from FBD24660183)
-
- Oct 31, 2020
-
-
Rafael Auler authored
Summary: Only use dump() method under DEBUG() macro. (cherry picked from FBD24666481)
-
- Oct 23, 2020
-
-
Maksim Panchenko authored
Summary: Some symbols in .dynsym will be erroneously marked as belonging to a non-allocatable section that BOLT can remove. In that case, keep the original invalid index for such symbols instead of setting the UNDEF index. (cherry picked from FBD24488677)
-
- Oct 22, 2020
-
-
Amir Ayupov authored
Summary: Change the .dot dump filename format from <function>-<passname>.dot to <function>-<passidx>_<passname>.dot. This change helps navigate dumps by making the pass order explicit. Example:
  execute_stack_op.cold.6-1(*2)-00_build-cfg.dot
  execute_stack_op.cold.6-1(*2)-01_validate-internal-calls.dot
  execute_stack_op.cold.6-1(*2)-02_strip-rep-ret.dot
  ...
(cherry picked from FBD24452903)
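The new naming scheme can be reproduced with a short sketch. The two-digit zero padding is read off the example filenames; replacing "/" with "-" in the function name is an assumption inferred from those same names, and `dot_dump_name` is a hypothetical helper.

```python
def dot_dump_name(function_name, pass_index, pass_name):
    """Build '<function>-<passidx>_<passname>.dot' with a zero-padded index."""
    safe_name = function_name.replace("/", "-")  # assumption: '/' is not kept in filenames
    return f"{safe_name}-{pass_index:02d}_{pass_name}.dot"


passes = ["build-cfg", "validate-internal-calls", "strip-rep-ret"]
for index, name in enumerate(passes):
    print(dot_dump_name("execute_stack_op.cold.6/1(*2)", index, name))
```

With the index first, a plain lexicographic directory listing shows the dumps in pass order.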
-
- Oct 21, 2020
-
-
Maksim Panchenko authored
Summary: While refactoring the pass, I removed the important transactional property of the patching process. Restore it. (cherry picked from FBD24440214)
-
- Oct 18, 2020
-
-
Maksim Panchenko authored
Summary: When optimizing input with relocations, make it faster and less memory-hungry with lite mode. (cherry picked from FBD24374241)
-
- Oct 17, 2020
-
-
Rafael Auler authored
Summary: When -hot-text is on, do not read __hot_start and __hot_end from the input (inserted by a linker script with the intent of ordering functions). Depending on the address at which the symbol lands, this can confuse BOLT into creating a function with this name, and we will then assert with a symbol redefinition when trying to emit our own __hot_start/__hot_end. (cherry picked from FBD24366636)
-