    Enable register mask operands for x86 calls. · 8a450cb2
    Jakob Stoklund Olesen authored
    Call instructions no longer have a list of 43 call-clobbered registers.
    Instead, they get a single register mask operand with a bit vector of
    call-preserved registers.
    
    This saves a lot of memory, 42 x 32 bytes = 1344 bytes per call
    instruction, and it speeds up building call instructions because those
    43 imp-def operands no longer need to be added to use-def lists (and
    then removed, shifted, and re-added for each explicit call operand).
    
    Passes like LiveVariables, LiveIntervals, RAGreedy, PEI, and
    BranchFolding are significantly faster because they can deal with call
    clobbers in bulk.
    
    Overall, clang -O2 is between 0% and 8% faster, uniformly distributed
    depending on call density in the compiled code.  Debug builds using
    clang -O0 are 0% - 3% faster.
    
    I have verified that this patch doesn't change the assembly generated
    for the LLVM nightly test suite when building with -disable-copyprop
    and -disable-branch-fold.
    
    Branch folding behaves slightly differently in a few cases because call
    instructions have different hash values now.
    
    Copy propagation flushes its data structures when it crosses a register
    mask operand. This causes it to leave a few dead copies behind, on the
    order of 20 instructions across the entire nightly test suite, including
    SPEC. Fixing this properly would require the pass to use different data
    structures.
    
    llvm-svn: 150638