[X86] WriteFShuffle256 shuffles aren't microcoded in the llvm sense
znver1/2 might have poor throughput for crosslane shuffles but they don't consume 100 cycles of resources I think there was a misunderstanding between the AMD definition of microcoding (more than 2-3 uops) and LLVM (here be dragons - impossible to approximately model the instruction) This is more yak shaving to come from D103695 - this time working out why codegen involving broadcasts gives such weird numbers
Loading
Please sign in to comment