[CodeGen] Improve ExpandMemCmp for more efficient non-register aligned sizes handling (#70469)
* Enhanced the logic of ExpandMemCmp pass to merge contiguous subsequences in LoadSequence, based on sizes allowed in `AllowedTailExpansions`. * This enhancement seeks to minimize the number of basic blocks and produce optimized code when using memcmp with non-register aligned sizes. * Enable this feature for AArch64 with memcmp sizes modulo 8 equal to 3, 5, and 6. Reapplication of #69942 after fixing a bug
Loading
Please sign in to comment