[X86] Add some reduction add test cases that show sub-optimal code on avx2 and later.
For v4i8 and v8i8 reductions that start with a load, we end up shifting the data in the scalar domain and then copying it to the vector domain a second time with a broadcast, even though we already copied it there once. It would be better to just shuffle it in the vector domain.

llvm-svn: 368544
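
For reference, here is a minimal scalar sketch of what a v4i8 or v8i8 add reduction that starts with a load computes. It only pins down the semantics (a wrapping sum of the i8 lanes); it is not the test IR added by this commit, nor the code the X86 backend emits. The function names `reduce_add_v4i8` and `reduce_add_v8i8` are hypothetical and chosen for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: wrapping (mod 256) sum of the four i8 lanes at p. */
uint8_t reduce_add_v4i8(const uint8_t *p) {
    uint32_t packed;
    memcpy(&packed, p, sizeof packed);      /* a single 32-bit load */
    uint32_t sum = 0;
    for (int lane = 0; lane < 4; ++lane)
        sum += (packed >> (8 * lane)) & 0xFF;  /* extract and add each lane */
    return (uint8_t)sum;                    /* truncate back to i8 */
}

/* Hypothetical helper: the same reduction over eight i8 lanes. */
uint8_t reduce_add_v8i8(const uint8_t *p) {
    uint64_t packed;
    memcpy(&packed, p, sizeof packed);      /* a single 64-bit load */
    uint32_t sum = 0;
    for (int lane = 0; lane < 8; ++lane)
        sum += (uint32_t)((packed >> (8 * lane)) & 0xFF);
    return (uint8_t)sum;
}
```

Because every lane is added, the result does not depend on byte order. The codegen question in this commit is only where the per-lane work happens after the initial load: in scalar registers followed by a second copy into a vector register, or entirely in the vector domain via shuffles.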