  1. Mar 19, 2013
  2. Mar 18, 2013
  3. Mar 16, 2013
  4. Mar 15, 2013
  5. Mar 14, 2013
  6. Mar 11, 2013
  7. Mar 08, 2013
  8. Mar 07, 2013
  9. Mar 06, 2013
    • Fix PR15355 · da22b30b
      Michael Liao authored
      - Clear the 'mayStore' flag when loading from the atomic variable before
        the spin loop
      - Clear the kill flag on the registers forming the address of that atomic
        variable, since they go from a single use to multiple uses
      - Don't use a physical register as a live-in register in a BB (neither
        entry nor landing pad); copy it into a virtual register instead
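      The spin loop mentioned above is a compare-and-swap retry loop: the
      current value is loaded once before the loop, a new value is computed,
      and the compare-exchange retries until no other thread intervened. A
      minimal C11 sketch of that shape, using a hypothetical helper
      `atomic_fetch_nand_u32` (fetch-and-NAND is assumed here only as an
      example of an operation x86 has no single instruction for). It models
      the source-level semantics, not the MachineInstr expansion the patch
      actually fixes:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: atomic fetch-and-NAND built from a
   compare-exchange spin loop. Returns the value seen before the update. */
static uint32_t atomic_fetch_nand_u32(_Atomic uint32_t *p, uint32_t v) {
    /* Load the atomic variable once, before entering the spin loop. */
    uint32_t old = atomic_load_explicit(p, memory_order_relaxed);
    uint32_t desired;
    do {
        desired = ~(old & v);
        /* On failure, 'old' is refreshed with the current value and we retry. */
    } while (!atomic_compare_exchange_weak_explicit(
                 p, &old, desired, memory_order_seq_cst, memory_order_relaxed));
    return old;
}
```

      The initial load can be relaxed because the compare-exchange itself
      validates that the value has not changed.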
      
      (patch by Cameron Zwarich)
      
      llvm-svn: 176538
  10. Mar 05, 2013
    • The current X86 NOP padding uses one long NOP followed by the remainder in · 4c8979cd
      David Sehr authored
      one-byte NOPs.  If the processor actually executes those NOPs, as it sometimes
      does with aligned bundling, this can have a performance impact.  In my
      micro-benchmarks, run on a single machine, a 15-byte NOP followed by twelve
      one-byte NOPs is about 20% slower than a 15-byte NOP followed by a 12-byte
      NOP.  This patch changes NOP emission to emit as many 15-byte NOPs (the
      maximum length) as possible, followed by at most one shorter NOP.
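      The new padding policy is a simple greedy split. The sketch below, with
      a hypothetical function `plan_nop_padding`, only computes the NOP
      lengths; it does not produce actual x86 NOP encodings, and the 15-byte
      limit is taken from the description above:

```c
#include <assert.h>
#include <stddef.h>

/* Longest NOP used by the padding scheme described above. */
#define MAX_NOP_LEN 15

/* Split 'count' bytes of padding into NOP lengths: as many 15-byte NOPs
   as possible, then at most one shorter NOP for the remainder.
   Writes the lengths to 'out' (capacity 'cap') and returns how many. */
static size_t plan_nop_padding(size_t count, size_t *out, size_t cap) {
    size_t n = 0;
    while (count > 0) {
        size_t len = count < MAX_NOP_LEN ? count : MAX_NOP_LEN;
        assert(n < cap); /* caller must provide enough room */
        out[n++] = len;
        count -= len;
    }
    return n;
}
```

      For 27 bytes of padding this plans a 15-byte NOP followed by a 12-byte
      NOP, the faster of the two layouts in the benchmark above.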
      
      llvm-svn: 176464
  11. Mar 04, 2013
    • Bypass Slow Divides · 485296d1
      Preston Gurd authored
      * Only apply the divide bypass optimization when not optimizing for size.
      * Fixed a bug caused by creating the constant 0 with type Int32; the
        dividend's type is now used to generate the constant instead.
      * For Atom x86-64, apply the divide bypass to use 16-bit divides instead
        of 64-bit divides when the operand values are small enough.
      * Added lit tests for 64-bit divide bypass.
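      The bypass inserts a run-time check: if the upper bits of both operands
      are zero, a narrower and faster divide gives the same quotient. A minimal
      C sketch of the unsigned 64-to-16-bit case, with a hypothetical
      `bypass_udiv64`; signed division and the remainder are left out, and a C
      compiler is not guaranteed to emit a 16-bit `div` for the fast path, so
      this only illustrates the check itself:

```c
#include <assert.h>
#include <stdint.h>

/* If both operands of a 64-bit unsigned division fit in 16 bits,
   a 16-bit divide yields the same quotient as the full-width one. */
static uint64_t bypass_udiv64(uint64_t a, uint64_t b) {
    assert(b != 0); /* division by zero is undefined on either path */
    if (((a | b) >> 16) == 0) {
        /* Fast path: no bits set above bit 15 in either operand. */
        return (uint16_t)a / (uint16_t)b;
    }
    /* Slow path: full 64-bit divide. */
    return a / b;
}
```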
      
      Patch by Tyler Nowicki!
      
      llvm-svn: 176442
  12. Mar 02, 2013
    • X86 cost model: Adjust cost for custom lowered vector multiplies · 20ef54f4
      Arnold Schwaighofer authored
      This matters, for example, in the following matrix multiply:
      
      int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
        int i, j, k, val;
        for (i=0; i<rows; i++) {
          for (j=0; j<cols; j++) {
            val = 0;
            for (k=0; k<cols; k++) {
              val += m1[i][k] * m2[k][j];
            }
            m3[i][j] = val;
          }
        }
        return(m3);
      }
      
      Taken from the test-suite benchmark Shootout.
      
      We were estimating the cost of the multiply at 2, while we generate 9
      instructions for it, and the vectorized code ends up quite a bit slower
      than the scalar version (48% on my machine).

      Also, properly differentiate between AVX1 and AVX2. On AVX1 we still split
      the vector into two 128-bit halves and handle each subvector multiply as
      above with 9 instructions. Only on AVX2 will v4i64 have a cost of 9.
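      SSE and AVX2 have no native 64-bit lane multiply, so a v2i64 multiply is
      presumably lowered along the lines of the identity below: three 32x32->64
      multiplies (`pmuludq`) plus shifts and adds. A scalar C model of that
      identity, with a hypothetical `mul64_from_mul32`; the exact instruction
      sequence LLVM emits may differ:

```c
#include <stdint.h>

/* Low 64 bits of a 64x64-bit product, built only from 32x32->64-bit
   multiplies, shifts, and adds:
   a*b mod 2^64 = lo(a)*lo(b) + ((lo(a)*hi(b) + hi(a)*lo(b)) << 32) */
static uint64_t mul64_from_mul32(uint64_t a, uint64_t b) {
    uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;
    uint64_t lo_lo = a_lo * b_lo;                 /* 32x32->64 multiply */
    uint64_t cross = a_lo * b_hi + a_hi * b_lo;   /* two more multiplies */
    /* Only the low 32 bits of 'cross' survive the shift, so wraparound
       in the sum above is harmless; hi(a)*hi(b) falls outside 64 bits. */
    return lo_lo + (cross << 32);
}
```

      Counting the per-lane operations (three multiplies, the shifts that
      extract the high halves, the shift of the cross term, and the adds)
      shows why a cost of 2 badly underestimates this lowering.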
      
      I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an
      add instead of a mul, because with a mul we no longer vectorize. I did
      verify that the mul would indeed be more expensive when vectorized, using
      three kernels:
      
      for (i ...)
        r += a[i] * 3;
      for (i ...)
        m1[i] = m1[i] * 3; // This matches the test case in avx1.ll
      and a matrix multiply.
      
      In each case the vectorized version was considerably slower.
      
      radar://13304919
      
      llvm-svn: 176403
  13. Mar 01, 2013
    • Fix PR10475 · 6af16fc3
      Michael Liao authored
      - ISD::SHL/SRL/SRA must have either both scalar or both vector operands,
        but TLI.getShiftAmountTy() so far only returns a scalar type. As a
        result, backend logic that relies on this invariant breaks.
      - Rename the original TLI.getShiftAmountTy() to
        TLI.getScalarShiftAmountTy() and redefine TLI.getShiftAmountTy() to
        return the target-specified scalar type or the same vector type as the
        first operand.
      - Fix most TICG logic that assumes TLI.getShiftAmountTy() is a simple
        scalar type.
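      At the source level, GCC and Clang vector extensions already allow both
      operands of `<<` to be vectors, with each lane shifted by its own amount;
      this is the kind of shift for which a scalar shift-amount type does not
      fit. A small sketch (the names `v4u32` and `shl_per_lane` are
      hypothetical); roughly, such a shift reaches SelectionDAG as an ISD::SHL
      whose amount operand has the same vector type as the shifted value:

```c
#include <assert.h>
#include <stdint.h>

/* GCC/Clang vector extension: four 32-bit unsigned lanes. */
typedef uint32_t v4u32 __attribute__((vector_size(16)));

/* Both operands are vectors: lane i of 'value' is shifted by lane i
   of 'amount'. */
static v4u32 shl_per_lane(v4u32 value, v4u32 amount) {
    for (int i = 0; i < 4; i++)
        assert(amount[i] < 32); /* shifting by >= lane width is undefined */
    return value << amount;
}
```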
      
      llvm-svn: 176364
    • GCC thinks that this variable might be used uninitialized (it isn't). · 2cb41d37
      Duncan Sands authored
      llvm-svn: 176341
  14. Feb 28, 2013
  15. Feb 27, 2013
  16. Feb 26, 2013
  17. Feb 25, 2013
  18. Feb 24, 2013
  19. Feb 23, 2013
  20. Feb 22, 2013
  21. Feb 21, 2013
    • Move the eliminateCallFramePseudoInstr method from TargetRegisterInfo · 8da87163
      Eli Bendersky authored
      to TargetFrameLowering, where it belongs. Incidentally, this allows us
      to delete some duplicated (and slightly different!) code in TRI.
      
      There are potentially other layering problems that can be cleaned up
      as a result, or in a similar manner.
      
      The refactoring was OK'd by Anton Korobeynikov on llvmdev.
      
      Note: this touches the target interfaces, so out-of-tree targets may
      be affected.
      
      llvm-svn: 175788