[x86] enable fast sqrtss/sqrtps tuning for AMD Zen cores
As discussed in D118534, all of the recent AMD CPUs have relatively fast (<14 cycle latency) "sqrtss" and "sqrtps" instructions: https://uops.info/table.html?search=sqrtps&cb_lat=on&cb_tp=on&cb_SNB=on&cb_SKL=on&cb_ZENp=on&cb_ZEN2=on&cb_ZEN3=on&cb_measurements=on&cb_avx=on&cb_sse=on So we should set this tuning flag to alter codegen of plain "sqrt(X)" expansion (as opposed to reciprocal-sqrt - there is other test coverage for that pattern). The expansion is both slower and less accurate than the hardware instruction. Differential Revision: https://reviews.llvm.org/D119001
Loading
Please sign in to comment