[BOLT] Decoder cache friendly alignment wrt Intel JCC Erratum
Summary: This diff ports reviews.llvm.org/D70157 to our LLVM tree, which makes the integrated assembler able to align X86 control-flow changing instructions in a way to reduce the performance impact of the ucode update on Intel processors that implement the JCC erratum mitigation. See white paper "Mitigations for Jump Conditional Code Erratum" by Intel published November 2019. To port this patch, I changed classifySecondInstInMacroFusion to analyze instruction opcodes directly instead of analyzing the CondCond operand (in more recent versions of LLVM, all conditional branches share the same opcode, but with a different conditional operand). I also pulled to our tree Alignment.h as a dependency, and the macroop analyzing helpers. x86-align-branch-boundary and -x86-align-branch are the two flags that control nop insertion to avoid disabling the decoder cache, following the original patch. In BOLT, I added the flag x86-align-branch-boundary-hot-only to request the alignment to only be applied to hot code, which is turned on by default. The reason is because such alignment is expensive to perform on large modules, but if we limit it to hot code, the relaxation pass runtime becomes tolerable. (cherry picked from FBD19828850)
Loading
Please sign in to comment