  1. Jun 28, 2014
    • [x86] Add handling for splat-like widenings of v16i8 shuffles. · 887c2c34
      Chandler Carruth authored
      These show up really frequently, not the least with actual splats. =] We
      lowered these quite badly before. The new code path tries to widen i8
      shuffles to i16 shuffles in a splat-like way. There are still some
      inefficiencies in our i16 splat logic though, so we aren't really done
      here.
      
      Also, for certain patterns (bit of a gather-and-splat) we still
      generate pretty silly code, and I've left a fixme for addressing it.
      However, I'm not actually worried about this code pattern as much. The
      old shuffle lowering generates a 29 instruction monstrosity for it that
      should execute much more slowly.
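
The widening described in the first paragraph can be illustrated with a small check on the shuffle mask. This is only a sketch of one plausible reading of "splat-like" (every adjacent pair of byte lanes reads the same source byte, or is undef), not the code from the commit. The names `widenSplatLike` and `kUndefLane`, and the use of -1 for an undef lane, are assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>

// Assumption: -1 marks an undef mask lane.
constexpr int kUndefLane = -1;

// Hypothetical helper. Returns true if each adjacent pair of byte lanes in a
// 16-lane mask reads a single source byte (undef lanes match anything). On
// success, `wide[i]` holds the source byte that fills both halves of i16
// lane i, or kUndefLane if both halves were undef.
bool widenSplatLike(const std::array<int, 16>& mask, std::array<int, 8>& wide) {
  for (std::size_t i = 0; i < 8; ++i) {
    const int lo = mask[2 * i];
    const int hi = mask[2 * i + 1];
    // Two defined lanes that disagree cannot share one duplicated byte.
    if (lo != kUndefLane && hi != kUndefLane && lo != hi)
      return false;
    wide[i] = (lo != kUndefLane) ? lo : hi;
  }
  return true;
}
```

Under this reading, a true byte splat such as sixteen copies of lane 5 widens trivially, while the identity mask 0, 1, 2, ... does not, because each pair reads two different bytes.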
      
      llvm-svn: 211974
  2. Jun 27, 2014
    • [x86] Fix a miscompile in the new shuffle lowering uncovered by a bootstrap. · dd6470a9
      Chandler Carruth authored
      
      I managed to mis-remember how PACKUS worked on x86, and was using undef
      for the high bytes instead of zero. The fix is fairly obvious.
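
To see why the high bytes matter, here is a scalar model of PACKUSWB, the byte-packing form of PACKUS. It is a sketch for illustration, not code from the commit, and the function names are made up. The instruction treats each 16-bit lane as a signed value and saturates it to the unsigned 8-bit range, so the low byte only passes through unchanged when the high byte is zero.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Unsigned saturation of one signed 16-bit lane to 8 bits.
inline std::uint8_t saturateToU8(std::int16_t v) {
  if (v < 0) return 0;      // negative values clamp to 0
  if (v > 255) return 255;  // anything above 0xFF clamps to 0xFF
  return static_cast<std::uint8_t>(v);
}

// Scalar model of PACKUSWB (hypothetical helper name): lanes of `a` fill
// result bytes 0-7 and lanes of `b` fill result bytes 8-15.
inline std::array<std::uint8_t, 16> packuswb(
    const std::array<std::int16_t, 8>& a,
    const std::array<std::int16_t, 8>& b) {
  std::array<std::uint8_t, 16> out{};
  for (std::size_t i = 0; i < 8; ++i) {
    out[i] = saturateToU8(a[i]);
    out[i + 8] = saturateToU8(b[i]);
  }
  return out;
}
```

A lane holding 0x0042 packs to 0x42, but 0x0142 packs to 0xFF and 0x8042 (negative as a signed value) packs to 0x00. Leaving the high bytes undef therefore lets the packed result be anything.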
      
      llvm-svn: 211922
    • [x86] Begin a significant overhaul of how vector lowering is done in the x86 backend. · 83860cfc
      Chandler Carruth authored
      
      This sketches out a new code path for vector lowering, hidden behind an
      off-by-default flag while it is under development. The fundamental idea
      behind the new code path is to aggressively break down the problem space
      in ways that ease selecting the odd set of instructions available on
      x86, and carefully avoid scalarizing code even when forced to use older
      ISAs. Notably, this starts off restricting itself to SSE2 and implements
      the complete vector shuffle and blend space for 128-bit vectors in SSE2
      without scalarizing. The plan is to layer ISA extensions on top of this,
      so that we can bail out of the complex SSE2 lowering and opt for
      a cheaper, specialized instruction (or set of instructions) where one
      exists. It also needs to be generalized to AVX and AVX512 vector widths.
      
      Currently, this does a decent but not perfect job for SSE2. There are
      some specific shortcomings that I plan to address:
      - We need a peephole combine to fold together shuffles where possible.
        There are cases where a previous shuffle could be modified slightly to
        arrange for elements to be in the correct position and a later shuffle
        eliminated. Doing this eagerly added quite a bit of complexity, and
        so my plan is to combine away these redundancies afterward.
      - There are a lot more clever ways to use unpck and pack that need to be
        added. This is essential for real world shuffles as it turns out...
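
The shuffle folding mentioned in the first item rests on a simple identity: two back-to-back single-input shuffles equal one shuffle whose mask is the composition of the two. The sketch below shows that composition only. It is not the planned combine itself, and the names `composeMasks` and `kUndef`, and the use of -1 for undef lanes, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: -1 marks an undef mask lane.
constexpr int kUndef = -1;

// Hypothetical helper. Returns the mask M such that
//   shuffle(x, M) == shuffle(shuffle(x, inner), outer)
// for single-input shuffles. Undef lanes in either mask stay undef.
std::vector<int> composeMasks(const std::vector<int>& inner,
                              const std::vector<int>& outer) {
  std::vector<int> combined;
  combined.reserve(outer.size());
  for (int idx : outer) {
    if (idx == kUndef) {
      combined.push_back(kUndef);
      continue;
    }
    assert(idx >= 0 && static_cast<std::size_t>(idx) < inner.size());
    // Lane `idx` of the intermediate vector came from inner[idx] of x.
    combined.push_back(inner[static_cast<std::size_t>(idx)]);
  }
  return combined;
}
```

For example, composing the pair-swapping mask 1, 0, 3, 2 with itself gives the identity mask 0, 1, 2, 3, in which case both shuffles could be dropped.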
      
      Once SSE2 is polished a bit I should be able to get interesting numbers
      on performance improvements on benchmarks conducive to vectorization.
      All of this will be off by default until it is functionally equivalent
      to the existing lowering, of course.
      
      Differential Revision: http://reviews.llvm.org/D4225
      
      llvm-svn: 211888