  1. Nov 27, 2017
  2. Nov 26, 2017
  3. Nov 25, 2017
      [X86] Add separate intrinsics for scalar FMA4 instructions. · e485631c
      Craig Topper authored
      Summary:
      These instructions zero the non-scalar part of the lower 128 bits, which makes them different from the FMA3 instructions, which pass through the non-scalar part of the lower 128 bits.
      
      I've only added fmadd because we should be able to derive all other variants using operand negation in the intrinsic header, as we do for AVX512.
      
      I think there are still some missed negate folding opportunities with the FMA4 instructions in light of this behavior difference that I hadn't noticed before.
      
      I've split the tests so that we can use different intrinsics for scalar testing between the two. I just copied the tests, split the RUN lines, and swapped out the scalar intrinsics.
      
      fma4-fneg-combine.ll is a new test to make sure we negate the fma4 intrinsics correctly, though there are a couple of TODOs in it.
      
      Reviewers: RKSimon, spatel
      
      Reviewed By: RKSimon
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D39851
      
      llvm-svn: 318984
      [X86] Don't report gather is legal on Skylake CPUs when AVX2/AVX512 is... · ea37e201
      Craig Topper authored
      [X86] Don't report gather is legal on Skylake CPUs when AVX2/AVX512 is disabled. Allow gather on SKX/CNL/ICL when AVX512 is disabled by using AVX2 instructions.
      
      Summary:
      This adds a new fast-gather feature bit covering all CPUs that support fast gather, which we can use independently of whether the AVX512 feature is enabled. I'm only using this new bit to qualify AVX2 codegen. AVX512 still implicitly assumes fast gather, to keep tests working and to match the scatter behavior.
      
      Test command lines have been added for these two cases.
      
      Reviewers: magabari, delena, RKSimon, zvi
      
      Reviewed By: RKSimon
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D40282
      
      llvm-svn: 318983
      Add BTVER2 sched support for SHLD/SHRD. · 198720d3
      Andrew V. Tischenko authored
      Differential Revision: https://reviews.llvm.org/D40124
      
      llvm-svn: 318977
      [X86] Support folding to andnps with SSE1 only. · c1b32691
      Craig Topper authored
      With SSE1 only, we emit FAND and FXOR nodes for v4f32.
      
      llvm-svn: 318968
      [X86] Add some early DAG combines to turn v4i32 AND/OR/XOR into FAND/FOR/FXOR... · 5b85df86
      Craig Topper authored
      [X86] Add some early DAG combines to turn v4i32 AND/OR/XOR into FAND/FOR/FXOR when only SSE1 is available.
      
      v4i32 isn't a legal type with SSE1 only and would otherwise end up getting scalarized.
      
      This isn't completely ideal as it doesn't handle cases like v8i32 that would get split to v4i32. But it at least helps with code written using the clang intrinsic header.
      
      llvm-svn: 318967
  4. Nov 24, 2017
  5. Nov 23, 2017
  6. Nov 22, 2017
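The FMA4 commit (e485631c) turns on a lane-level difference between the scalar FMA3 and FMA4 forms, and on deriving fmsub/fnmadd/fnmsub from fmadd by negating operands. The C sketch below models one XMM register as four float lanes, with lane 0 as the scalar. The type and function names are hypothetical, the multiply-add is modeled unfused for simplicity (real FMA rounds once), and this is not the code in clang's intrinsic headers.

```c
#include <assert.h>

/* Hypothetical model of one 128-bit XMM register: four float lanes, lane 0 is the scalar. */
typedef struct { float v[4]; } xmm_ps;

/* FMA3-style scalar fmadd: lane 0 = a*b + c; lanes 1-3 are copied from a,
   matching the documented behavior of _mm_fmadd_ss. Unfused model. */
static xmm_ps fma3_fmadd_ss(xmm_ps a, xmm_ps b, xmm_ps c) {
    xmm_ps r = a;
    r.v[0] = a.v[0] * b.v[0] + c.v[0];
    return r;
}

/* FMA4-style scalar fmadd (VFMADDSS): lane 0 = a*b + c; lanes 1-3 are zeroed. */
static xmm_ps fma4_fmadd_ss(xmm_ps a, xmm_ps b, xmm_ps c) {
    xmm_ps r = {{0.0f, 0.0f, 0.0f, 0.0f}};
    r.v[0] = a.v[0] * b.v[0] + c.v[0];
    return r;
}

/* Negate every lane of a register. */
static xmm_ps neg(xmm_ps x) {
    for (int i = 0; i < 4; ++i) x.v[i] = -x.v[i];
    return x;
}

/* Hypothetical helpers: the other FMA4 scalar forms expressed via fmadd
   plus operand negation. b is negated rather than a, so the same recipe
   would also leave FMA3's pass-through lanes (taken from a) untouched. */
static xmm_ps fma4_fmsub_ss(xmm_ps a, xmm_ps b, xmm_ps c)  { return fma4_fmadd_ss(a, b, neg(c)); }      /*  a*b - c   */
static xmm_ps fma4_fnmadd_ss(xmm_ps a, xmm_ps b, xmm_ps c) { return fma4_fmadd_ss(a, neg(b), c); }      /* -(a*b) + c */
static xmm_ps fma4_fnmsub_ss(xmm_ps a, xmm_ps b, xmm_ps c) { return fma4_fmadd_ss(a, neg(b), neg(c)); } /* -(a*b) - c */
```

With FMA4 zeroing lanes 1-3 regardless, negating any operand gives the same upper lanes; with FMA3 pass-through, negating a would also flip the copied lanes, which is why the helpers negate b and c instead.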
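The two SSE1 commits (c1b32691, 5b85df86) rely on ANDPS/ORPS/XORPS/ANDNPS operating purely on bits: with SSE1 only there are no integer vector logic instructions, so a v4i32 AND/OR/XOR can be expressed as FAND/FOR/FXOR without changing the result bits, and ANDNPS computes (~a) & b. The C sketch below models this per lane; all names are hypothetical and it is not LLVM's DAG-combine code. Bit patterns travel through float values by value, so it assumes an ABI (such as x86-64 SSE) that moves floats without altering bits; the checks avoid NaN patterns for that reason.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret between a float and its 32-bit pattern without aliasing UB. */
static uint32_t bits_of(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float float_of(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Per-lane models of ANDPS / ORPS / XORPS (the FAND / FOR / FXOR nodes):
   only the raw bits are combined, never the floating-point values. */
static float fand(float a, float b) { return float_of(bits_of(a) & bits_of(b)); }
static float f_or(float a, float b) { return float_of(bits_of(a) | bits_of(b)); }
static float fxor(float a, float b) { return float_of(bits_of(a) ^ bits_of(b)); }

/* ANDNPS: (~a) & b, the same bits as FXOR(a, all-ones) followed by FAND. */
static float fandn(float a, float b) { return float_of(~bits_of(a) & bits_of(b)); }

/* Hypothetical model of a v4i32 bitwise op lowered through the float domain.
   A real FAND is one vector instruction; the loop just models its four lanes. */
typedef float (*fp_logic_op)(float, float);
static void v4i32_via_fp(fp_logic_op op, const uint32_t x[4], const uint32_t y[4], uint32_t out[4]) {
    for (int i = 0; i < 4; ++i)
        out[i] = bits_of(op(float_of(x[i]), float_of(y[i])));
}
```

Calling `v4i32_via_fp(fand, x, y, out)` leaves `out[i] == (x[i] & y[i])` in every lane, which is the property that makes the integer-to-float rewrite safe.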