  1. Aug 09, 2013
  2. Jul 12, 2013
  3. Jun 27, 2013
  4. Jun 25, 2013
  5. Jun 18, 2013
  6. Apr 17, 2013
  7. Apr 08, 2013
    • X86 cost model: Model cost for uitofp and sitofp on SSE2 · f47d2d7f
      Arnold Schwaighofer authored
      The costs are overfitted so that I can still use the legalization factor.
      
      For example, when compiled with SSE2, the following kernel has about half the
      throughput vectorized as it has unvectorized. Before this patch we would vectorize it.
      
      unsigned short A[1024];
      double B[1024];
      void f() {
        int i;
        for (i = 0; i < 1024; ++i) {
          B[i] = (double) A[i];
        }
      }
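The decision behind this can be sketched as a small cost comparison. This is only an illustration, not the code in the patch: the function names and all numbers are hypothetical. It assumes a scheme in which a vector instruction's cost is a table cost for a legal-width vector multiplied by a legalization factor (the number of legal registers the type is split into), and in which vectorizing pays off only when the vector cost per element is below the scalar cost.

```c
#include <stdbool.h>

/* Hypothetical helper: cost of one vector instruction, modeled as the
 * table cost for a legal-width vector times the number of legal
 * registers the requested type is split into (the legalization factor). */
unsigned vector_instr_cost(unsigned table_cost, unsigned legalization_factor) {
    return table_cost * legalization_factor;
}

/* Vectorizing with `vf` lanes is worthwhile only if the vector cost per
 * element is lower than the scalar cost. The comparison is
 * cross-multiplied so that no division is needed. */
bool vectorization_pays_off(unsigned scalar_cost, unsigned vector_cost,
                            unsigned vf) {
    return vector_cost < scalar_cost * vf;
}
```

With made-up numbers, a scalar conversion costing 1 and an 8-lane conversion costing `vector_instr_cost(4, 4)` fail the check, so the loop would stay scalar.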
      
      radar://13599001
      
      llvm-svn: 179033
      f47d2d7f
  8. Apr 05, 2013
    • X86 cost model: Differentiate cost for vector shifts of constants · 44f902ed
      Arnold Schwaighofer authored
      SSE2 has efficient support for shifts by a scalar. My previous change, which made
      shifts expensive, did not take this into account and marked all shifts as expensive.
      This would prevent vectorization from happening where it is actually beneficial.
      
      With this change we differentiate between shifts by a constant and other shifts.
      
      radar://13576547
      
      llvm-svn: 178808
      44f902ed
    • CostModel: Add parameter to instruction cost to further classify operand values · b9773871
      Arnold Schwaighofer authored
      On certain architectures we can support an efficient vectorized version of an
      instruction if an operand value is uniform (splat) or a constant scalar.
      An example of this is a vector shift on x86.
      
      We can efficiently support
      
      for (i = 0 ; i < ; i += 4)
        w[0:3] = v[0:3] << <2, 2, 2, 2>
      
      but not
      
      for (i = 0; i < ; i += 4)
        w[0:3] = v[0:3] << x[0:3]
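Written as plain scalar C, the two loop shapes above look roughly like this. The function names and the explicit length parameter are assumptions made for the sketch. The shift amount in the second loop is masked so that the C code has defined behavior for any input.

```c
#include <stddef.h>
#include <stdint.h>

/* Every element is shifted by the same constant: once vectorized, the
 * shift amount is a uniform constant (splat) operand. */
void shift_by_constant(uint32_t *w, const uint32_t *v, size_t n) {
    for (size_t i = 0; i < n; ++i)
        w[i] = v[i] << 2;
}

/* Every element has its own shift amount: once vectorized, the shift
 * amount is an arbitrary vector operand. */
void shift_by_vector(uint32_t *w, const uint32_t *v, const uint32_t *x,
                     size_t n) {
    for (size_t i = 0; i < n; ++i)
        w[i] = v[i] << (x[i] & 31u); /* mask keeps the shift below 32 */
}
```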
      
      This patch adds a parameter to getArithmeticInstrCost to further qualify operand
      values as uniform or uniform constant.
      
      Targets can then choose to return a different cost for instructions with such
      operand values.
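A target-side hook of this kind could be sketched as follows. The enum, the function name and the cost values are hypothetical and chosen only for illustration. They are not the signature of getArithmeticInstrCost or the numbers used by any target.

```c
/* Hypothetical classification of an operand value. */
enum operand_kind {
    OPERAND_ANY,              /* nothing is known about the operand */
    OPERAND_UNIFORM,          /* same value in every lane (splat) */
    OPERAND_UNIFORM_CONSTANT  /* same compile-time constant in every lane */
};

/* Hypothetical target hook: a vector shift is cheap when the shift
 * amount is a uniform constant and expensive otherwise. The numbers
 * are placeholders. */
unsigned vector_shift_cost(enum operand_kind shift_amount_kind) {
    switch (shift_amount_kind) {
    case OPERAND_UNIFORM_CONSTANT:
        return 1;
    case OPERAND_UNIFORM:
    case OPERAND_ANY:
    default:
        return 20;
    }
}
```

A target without a cheap form would simply ignore the extra parameter and return the same cost for every kind.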
      
      A follow-up commit will test this feature on x86.
      
      radar://13576547
      
      llvm-svn: 178807
      b9773871
  9. Apr 03, 2013
  10. Apr 01, 2013
  11. Mar 20, 2013
    • Correct cost model for vector shift on AVX2 · 70dd7f99
      Michael Liao authored
      - After moving the logic that recognizes vector shifts with a scalar amount
        from DAG combining into DAG lowering, we declare all vector shifts as
        custom-lowered, even where the vector shift is legal on AVX. As a result,
        the cost model needs special tuning to identify these legal cases.
      
      llvm-svn: 177586
      70dd7f99
  12. Mar 19, 2013
  13. Mar 02, 2013
    • X86 cost model: Adjust cost for custom lowered vector multiplies · 20ef54f4
      Arnold Schwaighofer authored
      This matters, for example, in the following matrix multiply:
      
      int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
        int i, j, k, val;
        for (i=0; i<rows; i++) {
          for (j=0; j<cols; j++) {
            val = 0;
            for (k=0; k<cols; k++) {
              val += m1[i][k] * m2[k][j];
            }
            m3[i][j] = val;
          }
        }
        return(m3);
      }
      
      Taken from the test-suite benchmark Shootout.
      
      We estimate the cost of the multiply to be 2 while we generate 9 instructions
      for it and end up being quite a bit slower than the scalar version (48% on my
      machine).
      
      Also, properly differentiate between AVX1 and AVX2. On AVX1 we still split the
      vector into two 128-bit halves and handle the subvector muls as above, with 9
      instructions.
      Only on AVX2 will we have a cost of 9 for v4i64.
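As a worked example of the distinction, the following sketch computes a v4i64 multiply cost from the figures in this message. It assumes that each 128-bit half is charged the full 9 when the vector is split, and it ignores any cost for splitting and recombining the halves. The function name is hypothetical and the real cost table may use different values.

```c
#include <stdbool.h>

/* Cost charged for one custom-lowered multiply, taken from the
 * instruction count quoted in the message. */
enum { CUSTOM_MUL_COST = 9 };

/* Hypothetical helper: cost of a v4i64 multiply. Without AVX2 the
 * 256-bit vector is assumed to be split into two 128-bit halves that
 * are each charged the full multiply cost. */
unsigned mul_v4i64_cost(bool has_avx2) {
    unsigned pieces = has_avx2 ? 1u : 2u;
    return pieces * CUSTOM_MUL_COST;
}
```

Under these assumptions the model charges 18 without AVX2 and 9 with it, compared with the estimate of 2 mentioned above.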
      
      I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an
      add instead of a mul because with a mul we no longer vectorize. I verified
      that the mul is indeed more expensive when vectorized, using 3
      kernels:
      
      for (i ...)
        r += a[i] * 3;
      for (i ...)
        m1[i] = m1[i] * 3; // This matches the test case in avx1.ll
      and a matrix multiply.
      
      In each case the vectorized version was considerably slower.
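For reference, the first two kernels can be written out as complete C functions along these lines. The function names, the element type and the explicit length parameter are assumptions, since the message only gives the loop bodies.

```c
#include <stddef.h>

/* Reduction kernel: r += a[i] * 3 */
int sum_times_three(const int *a, size_t n) {
    int r = 0;
    for (size_t i = 0; i < n; ++i)
        r += a[i] * 3;
    return r;
}

/* In-place kernel: m1[i] = m1[i] * 3 (the shape used in avx1.ll) */
void scale_by_three(int *m1, size_t n) {
    for (size_t i = 0; i < n; ++i)
        m1[i] = m1[i] * 3;
}
```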
      
      radar://13304919
      
      llvm-svn: 176403
      20ef54f4
  14. Feb 20, 2013
    • I optimized the following patterns: · 0ccdd131
      Elena Demikhovsky authored
       sext <4 x i1> to <4 x i64>
       sext <4 x i8> to <4 x i64>
       sext <4 x i16> to <4 x i64>
       
      I'm running Combine on SIGN_EXTEND_IN_REG and reverting SEXT patterns:
       (sext_in_reg (v4i64 anyext (v4i32 x )), ExtraVT) -> (v4i64 sext (v4i32 sext_in_reg (v4i32 x , ExtraVT)))
       
       The sext_in_reg (v4i32 x) may be lowered to shl+sar operations.
       The "sar" operation does not exist for 64-bit elements, so lowering sext_in_reg (v4i64 x) has no vector solution.
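The shl+sar idea can be illustrated on a single 32-bit lane. This is a scalar sketch with a hypothetical function name, not the lowering code itself. It assumes a compiler where converting an out-of-range value to `int32_t` wraps and where right-shifting a negative value is an arithmetic shift, which the C standard leaves implementation-defined.

```c
#include <stdint.h>

/* Sign-extend the low 8 bits of a 32-bit lane in place:
 * shift left so bit 7 becomes the sign bit (shl), then shift right
 * arithmetically to copy it into the upper 24 bits (sar). */
int32_t sext_in_reg_i8(int32_t x) {
    uint32_t shifted = (uint32_t)x << 24; /* shl: discard upper 24 bits */
    return (int32_t)shifted >> 24;        /* sar: assumed arithmetic shift */
}
```

For example, an input whose low byte is 0xFF yields -1, and an input whose low byte is 0x7F yields 127.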
      
      I also added the cost of these operations to the AVX cost table.
      
      llvm-svn: 175619
      0ccdd131
  15. Jan 25, 2013
  16. Jan 20, 2013
  17. Jan 16, 2013
  18. Jan 09, 2013
  19. Jan 07, 2013
    • Fix the enumerator names for ShuffleKind to match the coding standards, · 2109f47d
      Chandler Carruth authored
      and make its comments doxygen comments.
      
      llvm-svn: 171688
      2109f47d
    • Make the popcnt support enums and methods have more clear names and · 50a36cd1
      Chandler Carruth authored
      follow the coding conventions regarding enumerating a set of "kinds" of
      things.
      
      llvm-svn: 171687
      50a36cd1
    • Move TargetTransformInfo to live under the Analysis library. This no · d3e73556
      Chandler Carruth authored
      longer would violate any dependency layering and it is in fact an
      analysis. =]
      
      llvm-svn: 171686
      d3e73556
    • Switch TargetTransformInfo from an immutable analysis pass that requires · 664e354d
      Chandler Carruth authored
      a TargetMachine to construct (and thus isn't always available), to an
      analysis group that supports layered implementations much like
      AliasAnalysis does. This is a pretty massive change, with a few parts
      that I was unable to easily separate (sorry), so I'll walk through it.
      
      The first step of this conversion was to make TargetTransformInfo an
      analysis group, and to sink the nonce implementations in
      ScalarTargetTransformInfo and VectorTargetTransformInfo into
      a NoTargetTransformInfo pass. This allows other passes to add a hard
      requirement on TTI, and assume they will always get at least one
      implementation.
      
      The TargetTransformInfo analysis group leverages the delegation chaining
      trick that AliasAnalysis uses, where the base class for the analysis
      group delegates to the previous analysis *pass*, allowing all but the
      NoFoo analysis passes to only implement the parts of the interfaces they
      support. It also introduces a new trick where each pass in the group
      retains a pointer to the top-most pass that has been initialized. This
      allows passes to implement one API in terms of another API and benefit
      when some other pass above them in the stack has more precise results
      for the second API.
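The two tricks described above can be modeled in a few lines of C. This is a simplified sketch, not the pass-manager machinery: the `Layer` struct, the hook names and the cost values are all hypothetical. A NULL hook means "not implemented here, ask the previous layer", and every layer keeps a pointer to the top of the stack so that one query can be answered in terms of another.

```c
#include <stddef.h>

typedef struct Layer Layer;
struct Layer {
    Layer *prev;                    /* next layer down; NULL at the bottom */
    Layer *top;                     /* top-most layer of the stack */
    int (*add_cost)(const Layer *); /* NULL: delegate to prev */
    int (*mul_cost)(const Layer *); /* NULL: delegate to prev */
};

/* Delegation chaining: walk down until some layer implements the hook.
 * The bottom layer implements everything, so the walk terminates. */
int query_add_cost(const Layer *l) {
    while (l->add_cost == NULL) l = l->prev;
    return l->add_cost(l);
}
int query_mul_cost(const Layer *l) {
    while (l->mul_cost == NULL) l = l->prev;
    return l->mul_cost(l);
}

/* Bottom layer (the "NoFoo" role): conservative answers for everything. */
static int none_add(const Layer *self) { (void)self; return 1; }
static int none_mul(const Layer *self) { (void)self; return 1; }

/* Middle layer: implements mul in terms of add, asking the top of the
 * stack so that a more precise add cost from above is picked up. */
static int basic_mul(const Layer *self) {
    return 4 * query_add_cost(self->top);
}

/* Target layer: only knows a better add cost. */
static int target_add(const Layer *self) { (void)self; return 3; }

/* Link the layers bottom-to-top and record the top in each of them. */
void stack_layers(Layer **layers, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        layers[i]->prev = (i == 0) ? NULL : layers[i - 1];
        layers[i]->top = layers[n - 1];
    }
}

/* Build the stack with or without the target layer and query mul. */
int demo_mul_cost(int with_target) {
    Layer none   = { NULL, NULL, none_add,   none_mul  };
    Layer basic  = { NULL, NULL, NULL,       basic_mul };
    Layer target = { NULL, NULL, target_add, NULL      };
    Layer *stack[3] = { &none, &basic, &target };
    size_t n = with_target ? 3 : 2;
    stack_layers(stack, n);
    return query_mul_cost(stack[n - 1]);
}
```

With the target layer on top, the mul query falls through to the middle layer, which then sees the target's add cost through `top`. Without it, the same query falls back to the bottom layer's add cost.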
      
      The second step of this conversion is to create a pass that implements
      the TargetTransformInfo analysis using the target-independent
      abstractions in the code generator. This replaces the
      ScalarTargetTransformImpl and VectorTargetTransformImpl classes in
      lib/Target with a single pass in lib/CodeGen called
      BasicTargetTransformInfo. This class actually provides most of the TTI
      functionality, basing it upon the TargetLowering abstraction and other
      information in the target independent code generator.
      
      The third step of the conversion adds support to all TargetMachines to
      register custom analysis passes. This allows building those passes with
      access to TargetLowering or other target-specific classes, and it also
      allows each target to customize the set of analysis passes desired in
      the pass manager. The baseline LLVMTargetMachine implements this
      interface to add the BasicTTI pass to the pass manager, and all of the
      tools that want to support target-aware TTI passes call this routine on
      whatever target machine they end up with to add the appropriate passes.
      
      The fourth step of the conversion created target-specific TTI analysis
      passes for the X86 and ARM backends. These passes contain the custom
      logic that was previously in their extensions of the
      ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces.
      I separated them into their own file, as now all of the interface bits
      are private and they just expose a function to create the pass itself.
      Then I extended these target machines to set up a custom set of analysis
      passes, first adding BasicTTI as a fallback, and then adding their
      customized TTI implementations.
      
      The fourth step required logic that was shared between the target
      independent layer and the specific targets to move to a different
      interface, as they no longer derive from each other. As a consequence,
      helper functions were added to TargetLowering representing the common
      logic needed both in the target implementation and the codegen
      implementation of the TTI pass. While technically this is the only
      change that could have been committed separately, it would have been
      a nightmare to extract.
      
      The final step of the conversion was just to delete all the old
      boilerplate. This got rid of the ScalarTargetTransformInfo and
      VectorTargetTransformInfo classes, all of the support in all of the
      targets for producing instances of them, and all of the support in the
      tools for manually constructing a pass based around them.
      
      Now that TTI is a relatively normal analysis group, two things become
      straightforward. First, we can sink it into lib/Analysis which is a more
      natural layer for it to live in. Second, clients of this interface can
      depend on it *always* being available which will simplify their code and
      behavior. These (and other) simplifications will follow in subsequent
      commits; this one is clearly big enough.
      
      Finally, I'm very aware that much of the comments and documentation
      needs to be updated. As soon as I had this working, and plausibly well
      commented, I wanted to get it committed and in front of the build bots.
      I'll be doing a few passes over documentation later if it sticks.
      
      Commits to update DragonEgg and Clang will be made presently.
      
      llvm-svn: 171681
      664e354d