Skip to content
Commit c4d7460e authored by Bill Nell's avatar Bill Nell Committed by Maksim Panchenko
Browse files

[BOLT] Improve ICP for virtual method calls and jump tables using value profiling.

Summary:
Use value profiling data to remove the method pointer loads from vtables when doing ICP at virtual function and jump table callsites.

The basic process is the following:
1. Work backwards from the callsite to find the most recent def of the call register.
2. Work back from the call register def to find the instruction where the vtable is loaded.
3. Find out of there is any value profiling data associated with the vtable load.  If so, record all these addresses as potential vtables + method offsets.
4. Since the addresses extracted by #3 will be vtable + method offset, we need to figure out the method offset in order to determine the actual vtable base address.  At this point I virtually execute all the instructions that occur between #3 and #2 that touch the method pointer register.  The result of this execution should be the method offset.
5. Fetch the actual method address from the appropriate data section containing the vtable using the computed method offset.  Make sure that this address maps to an actual function symbol.
6. Try to associate a vtable pointer with each target address in SymTargets.  If every target has a vtable, then this is almost certainly a virtual method callsite.
7. Use the vtable address when generating the promoted call code.  It's basically the same as regular ICP code except that the compare is against the vtable and not the method pointer.  Additionally, the instructions to load up the method are dumped into the cold call block.

For jump tables, the basic idea is the same.  I use the memory profiling data to find the hottest slots in the jumptable and then use that information to compute the indices of the hottest entries. We can then compare the index register to the hot index values and avoid the load from the jump table.

Note: I'm assuming the whole call is in a single BB.  According to @rafaelauler, this isn't always the case on ARM.    This also isn't always the case on X86 either.  If there are non-trivial arguments that are passed by value, there could be branches in between the setup and the call.  I'm going to leave fixing this until later since it makes things a bit more complicated.

I've also fixed a bug where ICP was introducing a conditional tail call.  I made sure that SCTC fixes these up afterwards.  I have no idea why I made it introduce a CTC in the first place.

(cherry picked from FBD6120768)
parent 1475c4da
Loading
Loading
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please to comment