  1. Jun 12, 2017
  2. Jun 11, 2017
      [x86] use vperm2f128 rather than vinsertf128 when there's a chance to fold a 32-byte load · dcbfbb11
      Sanjay Patel authored
      I was looking closer at the x86 test diffs in D33866, and the first change seems like it 
      shouldn't happen in the first place. So this patch will resolve that.
      
      According to Agner's tables and the AMD docs, vperm2f128 and vinsertf128 have identical
      timing on any given CPU model, so we should be able to interchange them without affecting
      perf. But as some of the diffs here show, using vperm2f128 allows load folding, so
      we should take that opportunity to reduce code size and register pressure.
      
      A secondary advantage is making AVX1 and AVX2 codegen more similar. Given that vperm2f128 
      was introduced with AVX1, we should be selecting it in all of the same situations that we 
      would with AVX2. If there's some reason that an AVX1 CPU would not want to use this 
      instruction, that should be fixed up in a later pass.
      
      Differential Revision: https://reviews.llvm.org/D33938
      
      llvm-svn: 305171
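
The interchangeability claimed in the commit message can be illustrated with a small lane-level model. This is only a sketch, not code from the patch: it models a 256-bit register as a Python list of 8 floats, and the helper names `vinsertf128` and `vperm2f128` are hypothetical stand-ins for the instructions, following their documented immediate encodings. It shows that inserting the low 128 bits of a 256-bit value can also be expressed as a two-source lane permute, which is the form that can take the full 32-byte value as its second (memory) operand.

```python
# Lane-level model of two AVX instructions (illustrative sketch only).
# A ymm register is a list of 8 floats; a 128-bit lane is 4 floats.

def vinsertf128(src1, src2_xmm, imm8):
    """Copy src1 and replace the 128-bit lane chosen by imm8 bit 0 with src2_xmm."""
    lo, hi = list(src1[:4]), list(src1[4:])
    if imm8 & 1:
        hi = list(src2_xmm)
    else:
        lo = list(src2_xmm)
    return lo + hi

def vperm2f128(src1, src2, imm8):
    """Build each result lane from one of the four source lanes, or zero it."""
    lanes = [src1[:4], src1[4:], src2[:4], src2[4:]]

    def select(ctrl):
        # Bit 3 of each 4-bit control field zeroes the lane;
        # bits 1:0 pick src1.lo, src1.hi, src2.lo, or src2.hi.
        if ctrl & 0x8:
            return [0.0] * 4
        return list(lanes[ctrl & 0x3])

    return select(imm8 & 0xF) + select((imm8 >> 4) & 0xF)

x = [float(i) for i in range(8)]        # value already in a register
y = [float(10 + i) for i in range(8)]   # stands in for a 32-byte load

# Insert the low half of y into the high lane of x.
insert_hi = vinsertf128(x, y[:4], 1)
perm_hi = vperm2f128(x, y, 0x20)

# Insert the low half of y into the low lane of x.
insert_lo = vinsertf128(x, y[:4], 0)
perm_lo = vperm2f128(x, y, 0x12)

print(insert_hi == perm_hi, insert_lo == perm_lo)
```

In this model the immediates 0x20 and 0x12 reproduce the two insert cases; whether the compiler actually selects them in a given situation depends on the lowering logic in the patch itself.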
  3. Jun 10, 2017
  4. Jun 09, 2017
  5. Jun 08, 2017
  6. Jun 07, 2017