[AVR] Custom lower 32-bit shift instructions
32-bit shift instructions were previously expanded using the default SelectionDAG expander, which meant it used 16-bit constant shifts and ORed them together. This works, but is far from optimal. I've optimized 32-bit shifts on AVR using a custom inserter. This is done using three new pseudo-instructions that take the upper and lower bits of the value in two separate 16-bit registers and outputs two 16-bit registers. This is the first commit in a series. When completed, shift instructions will take around 31% less instructions on average for constant 32-bit shifts, and is in all cases equal or better than the old behavior. It also tends to match or outperform avr-gcc: the only cases where avr-gcc does better is when it uses a loop to shift, or when the LLVM register allocator inserts some unnecessary movs. But it even outperforms avr-gcc in some cases where avr-gcc does not use a loop. As a side effect, non-constant 32-bit shifts also become more efficient. For some real-world differences: the build of compiler-rt I use in TinyGo becomes 2.7% smaller and the build of picolibc I use becomes 0.9% smaller. I think picolibc is a better representation of real-world code, but even a ~1% reduction in code size is really significant. The current patch just lays the groundwork. The result is actually a regression in code size. Later patches will use this as a basis to optimize these shift instructions. Differential Revision: https://reviews.llvm.org/D140569
Loading
Please sign in to comment