    Costmodel: Add support for horizontal vector reductions · cae8735a
    Arnold Schwaighofer authored
    Upcoming SLP vectorization improvements will need to estimate the cost
    of horizontal reductions. Add infrastructure to support this.
    
    We model reductions as a series of (shufflevector,add) tuples ultimately
    followed by an extractelement. For example, for an add-reduction of <4 x float>
    we could generate the following sequence:
    
     (v0, v1, v2, v3)
       \   \  /  /
         \  \  /
           +  +
    
     (v0+v2, v1+v3, undef, undef)
        \      /
     ((v0+v2) + (v1+v3), undef, undef, undef)
    
     %rdx.shuf = shufflevector <4 x float> %rdx, <4 x float> undef,
                               <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
     %bin.rdx = fadd <4 x float> %rdx, %rdx.shuf
     %rdx.shuf7 = shufflevector <4 x float> %bin.rdx, <4 x float> undef,
                              <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
     %bin.rdx8 = fadd <4 x float> %bin.rdx, %rdx.shuf7
     %r = extractelement <4 x float> %bin.rdx8, i32 0
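
The sequence above can be modelled outside LLVM. The Python below is only a
sketch, not LLVM code: `shufflevector`, `fadd` and `splitting_reduce` are
hypothetical helpers that mimic the IR semantics, with `None` standing in for
`undef`, and the vector length is assumed to be a power of two. Each level
shuffles the upper half of the live lanes onto the lower half and adds.

```python
from typing import List, Optional

Lane = Optional[float]  # None stands in for an undef lane


def shufflevector(vec: List[Lane], mask: List[Optional[int]]) -> List[Lane]:
    """Pick lanes of vec by index; a None mask entry yields an undef lane."""
    return [None if i is None else vec[i] for i in mask]


def fadd(a: List[Lane], b: List[Lane]) -> List[Lane]:
    """Lane-wise add; an undef operand makes the result lane undef."""
    return [None if x is None or y is None else x + y for x, y in zip(a, b)]


def splitting_reduce(vec: List[float]) -> float:
    """Add-reduce vec by repeatedly folding the upper half onto the lower half."""
    n = len(vec)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two length"
    rdx: List[Lane] = list(vec)
    width = n // 2
    while width >= 1:
        # Mask <width, ..., 2*width-1, undef, ...>, e.g. <2, 3, undef, undef>.
        mask = [width + i if i < width else None for i in range(n)]
        rdx = fadd(rdx, shufflevector(rdx, mask))
        width //= 2
    result = rdx[0]  # corresponds to: extractelement ..., i32 0
    assert result is not None
    return result
```

For a 4-lane input the loop runs twice, with masks `<2, 3, undef, undef>` and
`<1, undef, undef, undef>`, matching the two shufflevector instructions above.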
    
    This commit adds a cost model interface "getReductionCost(Opcode, Ty, Pairwise)"
    that allows clients to ask for the cost of such a reduction, since a backend
    might generate code that is cheaper than the summed cost of the individual
    instructions. This interface is exercised by the CostModel analysis pass, which
    looks for reduction patterns like the one above, starting at extractelements,
    and calls the cost model interface when it sees a matching sequence.
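
To make the "individual instructions summed up" baseline concrete, here is a
rough sketch of how such a sum could be computed. It is not the implementation
in this commit: `generic_reduction_cost` and its unit costs are hypothetical,
and it assumes a power-of-two lane count, one shuffle per level for the form
shown above, and two shuffles per level when the Pairwise flag is set.

```python
def generic_reduction_cost(num_lanes: int, pairwise: bool,
                           shuffle_cost: int = 1, arith_cost: int = 1,
                           extract_cost: int = 1) -> int:
    """Sum per-instruction costs of a shuffle/add reduction tree (illustrative)."""
    assert num_lanes > 0 and num_lanes & (num_lanes - 1) == 0, \
        "sketch assumes a power-of-two lane count"
    levels = num_lanes.bit_length() - 1        # log2(num_lanes) reduction levels
    shuffles_per_level = 2 if pairwise else 1  # assumed shape of each level
    per_level = shuffles_per_level * shuffle_cost + arith_cost
    return levels * per_level + extract_cost   # one final extractelement
```

With unit costs, a <4 x float> reduction of the form above sums to 5 (two
shuffles, two adds, one extract). A backend that can do better would report a
lower number through the interface instead of this sum.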
    
    We will also support a second form, the pairwise reduction, which is well
    supported on common architectures (haddps, vpadd, faddp).
    
     (v0, v1, v2, v3)
      \   /    \  /
     (v0+v1, v2+v3, undef, undef)
        \     /
     ((v0+v1)+(v2+v3), undef, undef, undef)
    
      %rdx.shuf.0.0 = shufflevector <4 x float> %rdx, <4 x float> undef,
            <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>
      %rdx.shuf.0.1 = shufflevector <4 x float> %rdx, <4 x float> undef,
            <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
      %bin.rdx.0 = fadd <4 x float> %rdx.shuf.0.0, %rdx.shuf.0.1
      %rdx.shuf.1.0 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,
            <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
      %rdx.shuf.1.1 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,
            <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
      %bin.rdx.1 = fadd <4 x float> %rdx.shuf.1.0, %rdx.shuf.1.1
      %r = extractelement <4 x float> %bin.rdx.1, i32 0
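
The pairwise sequence can be modelled the same way. Again this is only a
sketch, not LLVM code: `shufflevector`, `fadd` and `pairwise_reduce` are
hypothetical helpers, `None` stands in for `undef`, and a power-of-two length
is assumed. Each level gathers the even and the odd live lanes with two
shuffles and adds them.

```python
from typing import List, Optional

Lane = Optional[float]  # None stands in for an undef lane


def shufflevector(vec: List[Lane], mask: List[Optional[int]]) -> List[Lane]:
    """Pick lanes of vec by index; a None mask entry yields an undef lane."""
    return [None if i is None else vec[i] for i in mask]


def fadd(a: List[Lane], b: List[Lane]) -> List[Lane]:
    """Lane-wise add; an undef operand makes the result lane undef."""
    return [None if x is None or y is None else x + y for x, y in zip(a, b)]


def pairwise_reduce(vec: List[float]) -> float:
    """Add-reduce vec by summing adjacent lane pairs at every level."""
    n = len(vec)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two length"
    rdx: List[Lane] = list(vec)
    live = n  # number of defined lanes at the front of rdx
    while live > 1:
        half = live // 2
        # Masks <0, 2, ..., undef, ...> and <1, 3, ..., undef, ...>.
        even = [2 * i if i < half else None for i in range(n)]
        odd = [2 * i + 1 if i < half else None for i in range(n)]
        rdx = fadd(shufflevector(rdx, even), shufflevector(rdx, odd))
        live = half
    result = rdx[0]  # corresponds to: extractelement ..., i32 0
    assert result is not None
    return result
```

For a 4-lane input this computes (v0+v1)+(v2+v3), whereas the first form
computes (v0+v2)+(v1+v3). Because floating-point addition is not associative,
the two orders may differ in the last bits for some inputs.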
    
    llvm-svn: 190876