[LV] Use VScaleForTuning to allow wider epilogue VFs.
When the main loop is e.g. VF=vscale x 1 and the epilogue VF cannot be any smaller, the vectorizer should try to estimate how many lanes are executed at runtime and allow a suitable fixed-width VF to be chosen. It can use VScaleForTuning to figure out what a suitable fixed-width VF could be. For the case where the main loop VF is VF=vscale x 1, and VScaleForTuning=8, it could still choose an epilogue VF upto VF=4. This was a bit tricky to test, so this patch also introduces a wrapper function to get 'VScaleForTuning' by also considering vscale_range. If min and max are equal, then that will be the vscale we compile for. It makes little sense to tune for a different width if the code will not be portable for other widths. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D118709
Loading
Please sign in to comment