  1. Apr 18, 2013
  2. Apr 17, 2013
  3. Apr 16, 2013
  4. Apr 13, 2013
    • X86 machine model: reduce SandyBridge and Haswell ILPWindow. · f7fd6b9e
      Andrew Trick authored
      The initial values were arbitrary. I want them to be more
      conservative. This represents the number of latency cycles hidden by
      OOO execution. In practice, I think it should be within a small factor
      of the complex floating point operation latency so the scheduler can
      make some attempt to hide latency even for smallish blocks.
      
      These are by no means the best values, just a starting point for
      tuning heuristics. Some benchmarks such as TSVC run faster with this
      lower value for SandyBridge. I haven't run anything on Haswell, but
      it shouldn't be 2x SB.
      
      llvm-svn: 179450
      f7fd6b9e
    • Catch another case where SD fails to propagate node order. · 52b8387f
      Andrew Trick authored
      I need to handle this for the test case in my following scheduler
      commit.
      
      Work is already under way to redesign the mechanism for node order
      propagation because this case-by-case approach is unmaintainable.
      
      llvm-svn: 179448
      52b8387f
    • [ms-inline asm] Simplify the logic by using parsePrimaryExpr. No functional · 43554eed
      Chad Rosier authored
      change intended.  Test case previously added in r178568.
      Part of rdar://13611297
      
      llvm-svn: 179425
      43554eed
  5. Apr 12, 2013
  6. Apr 11, 2013
    • [ms-inline asm] Remove brackets from around a symbol reference in the target · 8fb83300
      Chad Rosier authored
      specific logic.  This makes the code much less fragile.  Test case coming on the
      clang side in a moment.
      rdar://13634327
      
      llvm-svn: 179323
      8fb83300
    • Optimize vector select from all 0s or all 1s · 55658d42
      Michael Liao authored
      As packed comparisons in AVX/SSE produce all 0s or all 1s in each SIMD lane,
      a vector select can be simplified to AND/OR, or removed entirely, if one or
      both of the values being selected are all 0s or all 1s.
      
      llvm-svn: 179267
      55658d42
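
The identities behind this simplification can be sketched in scalar C, with one `uint32_t` standing in for one SIMD lane. This is only an illustration under that assumption, not the code from the patch; `lane_select` and `check_select_identities` are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Models a select on one lane: a non-zero mask picks t, a zero mask picks f.
 * A packed comparison only ever produces 0 or 0xFFFFFFFF in a lane. */
static uint32_t lane_select(uint32_t mask, uint32_t t, uint32_t f) {
    return mask ? t : f;
}

/* Checks the rewrites for both possible mask values of a single lane. */
static void check_select_identities(uint32_t a, uint32_t b) {
    const uint32_t masks[2] = {0u, 0xFFFFFFFFu};
    for (int i = 0; i < 2; ++i) {
        uint32_t m = masks[i];
        /* False value is all 0s: the select becomes an AND. */
        assert(lane_select(m, a, 0u) == (m & a));
        /* True value is all 1s: the select becomes an OR. */
        assert(lane_select(m, 0xFFFFFFFFu, b) == (m | b));
        /* True value all 1s and false value all 0s: the select is the mask. */
        assert(lane_select(m, 0xFFFFFFFFu, 0u) == m);
    }
}
```

The rewrites only hold because the mask is restricted to all 0s or all 1s; for an arbitrary non-zero mask, `m & a` would not equal `a`.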
    • Add CLAC/STAC instruction encoding/decoding support · 95d94403
      Michael Liao authored
      As these two instructions (part of the SMAP extension rather than AVX) are
      privileged, special-purpose instructions, they are only expected to be used
      in inline assembly.
      
      llvm-svn: 179266
      95d94403
    • Enhance bool simplification in X86 to handle more cases · f7bf8705
      Michael Liao authored
      This patch is a revision of a patch from Victor Umansky
      <victor.umansky@intel.com>. More cases are handled in X86's bool
      simplification, namely:
      - SETCC_CARRY
      - value is truncated to i1 with AND
      
      As a by-product, PR5443 is also fixed.
      
      llvm-svn: 179265
      f7bf8705
    • MC: Support COFF image-relative MCSymbolRefs · 1da4529b
      Nico Rieck authored
      Add support for the COFF relocation types IMAGE_REL_I386_DIR32NB and
      IMAGE_REL_AMD64_ADDR32NB for 32- and 64-bit respectively. These are
      similar to normal 4-byte relocations except that they do not include
      the base address of the image.
      
      Image-relative relocations are used for debug information (32-bit) and
      SEH unwind tables (64-bit).
      
      A new MCSymbolRef variant called 'VK_COFF_IMGREL32' is introduced to
      specify such relocations. For AT&T assembly, this variant can be accessed
      using the symbol suffix '@imgrel'.
      
      llvm-svn: 179240
      1da4529b
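
The difference between a normal 4-byte relocation and an image-relative one can be sketched as simple address arithmetic. This is a minimal illustration, not MC or linker code; the function names and the example base address are assumptions made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Value an absolute relocation resolves to: image base plus the symbol's
 * offset within the image (its RVA). Hypothetical helper. */
static uint64_t absolute_value(uint64_t image_base, uint32_t rva) {
    return image_base + rva;
}

/* Value an image-relative 32-bit relocation resolves to: the symbol's
 * address with the image base subtracted. Hypothetical helper. */
static uint32_t imgrel32_value(uint64_t image_base, uint64_t symbol_va) {
    return (uint32_t)(symbol_va - image_base);
}

/* The image-relative value stays the same when the image is rebased,
 * while the absolute value moves with the base. */
static void check_imgrel_is_base_independent(void) {
    const uint64_t base = 0x140000000ULL;   /* example base, an assumption */
    const uint64_t rebased = base + 0x10000ULL;
    const uint32_t rva = 0x1234u;
    assert(imgrel32_value(base, absolute_value(base, rva)) == rva);
    assert(imgrel32_value(rebased, absolute_value(rebased, rva)) == rva);
    assert(absolute_value(base, rva) != absolute_value(rebased, rva));
}
```

Because the stored value is independent of where the image is loaded, it fits in 32 bits even for 64-bit images, which is consistent with its use in the SEH unwind tables mentioned above.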
  7. Apr 10, 2013
  8. Apr 09, 2013
  9. Apr 08, 2013
    • X86 cost model: Model cost for uitofp and sitofp on SSE2 · f47d2d7f
      Arnold Schwaighofer authored
      The costs are overfitted so that I can still use the legalization factor.
      
      For example, when compiled with SSE2, the following kernel has about half the
      throughput vectorized as it has unvectorized. Before this patch we would
      vectorize it.
      
      unsigned short A[1024];
      double B[1024];
      void f() {
        int i;
        for (i = 0; i < 1024; ++i) {
          B[i] = (double) A[i];
        }
      }
      
      radar://13599001
      
      llvm-svn: 179033
      f47d2d7f