[AVX512] Bring back vector-shuffle lowering support through broadcasts
After the commit at rev219046, lowering of 512-bit broadcasts became non-optimal. Most of the tests for broadcasting and embedded broadcasting were changed by that commit, and they no longer produce efficient code.

The example below is taken from that commit's changes (it is the first test in test/CodeGen/X86/avx512-vbroadcast.ll):

define <16 x i32> @_inreg16xi32(i32 %a) {
; CHECK-LABEL: _inreg16xi32:
; CHECK: ## BB#0:
-; CHECK-NEXT: vpbroadcastd %edi, %zmm0
+; CHECK-NEXT: vmovd %edi, %xmm0
+; CHECK-NEXT: vpbroadcastd %xmm0, %ymm0
+; CHECK-NEXT: vinserti64x4 $1, %ymm0, %zmm0, %zmm0
; CHECK-NEXT: retq
%b = insertelement <16 x i32> undef, i32 %a, i32 0
%c = shufflevector <16 x i32> %b, <16 x i32> undef, <16 x i32> zeroinitializer
ret <16 x i32> %c
}

Here, a 256-bit broadcast followed by an insert was generated instead of a single 512-bit broadcast.

In this patch I:
1) Added vector-shuffle lowering through broadcasts.
2) Removed asserts and branches such as the following, because they are incorrect:
   assert(Subtarget->hasDQI() && "We can only lower v8i64 with AVX-512-DQI");
3) Fixed the lowering tests.

llvm-svn: 220774
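For illustration only, the kind of check that lets a shuffle be lowered to a broadcast can be sketched in standalone C++. This is a minimal sketch and not the code in X86ISelLowering.cpp: the helper name isBroadcastOfElementZero and the use of std::vector<int> for the mask are assumptions made for this example. A negative mask entry is assumed to model an undef lane.

```cpp
#include <vector>

// Hypothetical helper, not part of the LLVM API: returns true when every
// defined lane of a shuffle mask selects element 0 of the first input.
// That is the pattern in the test above (a zeroinitializer mask), which a
// single broadcast instruction can implement.
// A negative entry models an undef lane, which may take any value.
bool isBroadcastOfElementZero(const std::vector<int> &mask) {
  bool sawDefinedLane = false;
  for (int lane : mask) {
    if (lane < 0)
      continue;            // undef lane: compatible with any broadcast
    if (lane != 0)
      return false;        // some lane reads a different element
    sawDefinedLane = true;
  }
  // A fully undef mask carries no broadcast information, so reject it.
  return sawDefinedLane;
}
```

With the 16-lane all-zero mask from the test above, this helper returns true, which corresponds to the case where a single 512-bit vpbroadcastd is the expected output.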
Changed files:
- llvm/lib/Target/X86/X86ISelLowering.cpp (7 additions, 8 deletions)
- llvm/lib/Target/X86/X86InstrAVX512.td (10 additions, 0 deletions)
- llvm/test/CodeGen/X86/avx512-arith.ll (1 addition, 4 deletions)
- llvm/test/CodeGen/X86/avx512-vbroadcast.ll (6 additions, 16 deletions)
- llvm/test/CodeGen/X86/avx512-vec-cmp.ll (6 additions, 18 deletions)
- llvm/test/CodeGen/X86/vector-shuffle-512-v8.ll (2 additions, 4 deletions)