  5. Sep 01, 2016
    • 
      [SelectionDAG] Generate vector_shuffle nodes for undersized result vector sizes · 5f17d08f
      Michael Kuperstein authored
      Prior to this, we could generate a vector_shuffle from an IR shuffle when the
      size of the result was exactly the sum of the sizes of the input vectors.
      If the output vector was narrower - e.g. a <12 x i8> being formed by a shuffle
      with two <8 x i8> inputs - we would lower the shuffle to a sequence of extracts
      and inserts.
      
      Instead, we can form a larger vector_shuffle, and then extract a subvector
      of the right size - e.g. shuffle the two <8 x i8> inputs into a <16 x i8>
      and then extract a <12 x i8>.
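
       The widening-then-extract step can be sketched in plain Python (a conceptual model of the DAG nodes, not LLVM code; the element values and mask below are made up for illustration):

```python
# Conceptual model of the new lowering: shuffle two <8 x i8> inputs
# into a <16 x i8> vector, then extract a <12 x i8> subvector.
def shuffle(a, b, mask):
    """vector_shuffle: pick lanes from the concatenation of a and b."""
    src = a + b
    return [src[i] for i in mask]

t1 = list(range(0, 8))        # first  <8 x i8> input
t2 = list(range(8, 16))       # second <8 x i8> input

# A 16-lane mask interleaving the two inputs (illustrative only).
mask16 = [0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15]
wide = shuffle(t1, t2, mask16)   # the <16 x i8> vector_shuffle
narrow = wide[:12]               # extract_subvector -> <12 x i8>
print(narrow)                    # → [0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13]
```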
      
      This also includes a target-specific X86 combine that in the presence of
      AVX2 combines:
      (vector_shuffle <mask> (concat_vectors t1, undef)
                             (concat_vectors t2, undef))
      into:
      (vector_shuffle <mask> (concat_vectors t1, t2), undef)
      in cases where this allows us to form VPERMD/VPERMQ.
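
       In terms of lanes, the combine amounts to remapping mask indices (a rough Python model; the lane numbering and the -1-for-undef convention are assumptions for illustration, not taken from the patch):

```python
# Rough model of the X86 combine:
#   shuffle(concat(t1, undef), concat(t2, undef), mask)
#     -> shuffle(concat(t1, t2), undef, mask')
# n is the lane width of t1/t2.  In the old index space each operand
# spans 2n lanes, so t2 occupies indices 2n..3n-1; in the combined
# operand concat(t1, t2) those lanes move down to n..2n-1.
def remap_mask(mask, n):
    out = []
    for i in mask:
        if i < n:                  # lane from t1: unchanged
            out.append(i)
        elif 2 * n <= i < 3 * n:   # lane from t2: shift down by n
            out.append(i - n)
        else:                      # lane hit an undef half
            out.append(-1)         # -1 marks an undef lane
    return out

old_mask = [0, 8, 1, 9, 2, 10, 3, 11]   # illustrative interleave, n = 4
print(remap_mask(old_mask, 4))          # → [0, 4, 1, 5, 2, 6, 3, 7]
```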
      
      (This is not a separate commit, as that pattern does not appear without
      the DAGBuilder change.)
      
      llvm-svn: 280418
    • 
      [X86] Loosen memory folding requirements for cvtdq2pd and cvtps2pd instructions. · cde38b6a
      Andrey Turetskiy authored
       According to the spec, the cvtdq2pd and cvtps2pd instructions don't require the memory operand to be aligned
       to 16 bytes. This patch removes that requirement from the memory folding table.
      
      Differential Revision: https://reviews.llvm.org/D23919
      
      llvm-svn: 280402
    • 
      [DAGCombine] Don't fold a trunc if it feeds an anyext · 65bc3c89
      Michael Kuperstein authored
      Legalization tends to create anyext(trunc) patterns. This should always be
      combined - into either a single trunc, a single ext, or nothing if the
      types match exactly. But if we happen to combine the trunc first, we may pull
      the trunc away from the anyext or make it implicit (e.g. the truncate(extract)
      -> extract(bitcast) fold).
      
      To prevent this, we can avoid doing the fold, similarly to how we already handle
      fpround(fpextend).
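
       The desired end states of the combine can be modelled in a few lines (a simplified sketch in terms of bit widths, not actual DAGCombiner code):

```python
# Simplified model: what anyext(trunc(x)) should combine to,
# based only on the source width and the final width.
def combine_anyext_trunc(src_bits, mid_bits, dst_bits):
    """x: src_bits -> trunc to mid_bits -> anyext to dst_bits."""
    assert mid_bits < src_bits and dst_bits > mid_bits
    if dst_bits == src_bits:
        return "x"                # types match exactly: fold to nothing
    if dst_bits < src_bits:
        return "trunc(x)"         # net effect is a single trunc
    return "anyext(x)"            # net effect is a single ext

print(combine_anyext_trunc(32, 16, 32))  # → x
print(combine_anyext_trunc(64, 16, 32))  # → trunc(x)
print(combine_anyext_trunc(16, 8, 32))   # → anyext(x)
```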
      
      Differential Revision: https://reviews.llvm.org/D23893
      
      llvm-svn: 280386
    • 
      Optimized FMA intrinsic + FNEG, like · 4d7738df
      Elena Demikhovsky authored
      -(a*b+c)
      
      and FNEG + FMA, like
      a*b-c or (-a)*b+c.
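
       The algebraic identities behind these folds are easy to check numerically (a plain-Python check; the unfused `fma` helper is a stand-in for the hardware operation, not code from the patch):

```python
# The folds rest on these identities (a fused op differs from the
# unfused form only by performing one rounding instead of two):
#   -(a*b + c)  ==  (-a)*b + (-c)    negated FMA result
#   (-a)*b + c  ==  c - a*b          FMA with a negated input
#   a*b - c     ==  FMA(a, b, -c)
def fma(a, b, c):
    """Unfused stand-in for a hardware fused multiply-add."""
    return a * b + c

a, b, c = 3.0, 5.0, 7.0
assert -(a * b + c) == fma(-a, b, -c)   # -(a*b+c)
assert (-a) * b + c == c - a * b        # FNEG on an input
assert a * b - c == fma(a, b, -c)       # a*b-c as a single FMA
```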
      
      The bug description is here: https://llvm.org/bugs/show_bug.cgi?id=28892
      
      Differential Revision: https://reviews.llvm.org/D23313
      
      llvm-svn: 280368
    • 
      [XRay] Detect and emit sleds for sibling/tail calls · e8ae5baa
      Dean Michael Berris authored
      Summary:
      This change promotes the 'isTailCall(...)' member function to
      TargetInstrInfo as a query interface for determining on a per-target
      basis whether a given MachineInstr is a tail call instruction. We build
      upon this in the XRay instrumentation pass to emit special sleds for
      tail call optimisations, where we emit the correct kind of sled.
      
      The tail call sleds look like a mix between the function entry and
      function exit sleds. Form-wise, the sled comes before the "jmp"
      instruction that implements the tail call similar to how we do it for
      the function entry sled. Functionally, because we know this is a tail
      call, it behaves much like an exit sled -- i.e. at runtime we may use
      the exit trampolines instead of a different kind of trampoline.
      
      A follow-up change to recognise these sleds will be done in compiler-rt,
      so that we can start intercepting these initially as exits, but also
      have the option to have different log entries to more accurately reflect
      that this is actually a tail call.
      
      Reviewers: echristo, rSerge, majnemer
      
      Subscribers: mehdi_amini, dberris, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D23986
      
      llvm-svn: 280334
  6. Aug 31, 2016
    • 
      GlobalISel: use G_TYPE to annotate physregs with a type. · 11a23546
      Tim Northover authored
      More preparation for dropping source types from MachineInstrs: registers coming
      out of already-selected code (i.e. non-generic instructions) don't have a type,
      but that information is needed, so we must add it manually.
      
      This is done via a new G_TYPE instruction.
      
      llvm-svn: 280292
    • 
      [statepoints][experimental] Add support for live-in semantics of values in deopt bundles · 2b1084ac
      Philip Reames authored
      This is a first step towards supporting deopt value lowering and reporting entirely with the register allocator. I hope to build on this in the near future to support live-on-return semantics, but I have a use case which allows me to test and investigate code quality with just the live-in semantics so I've chosen to start there. For those curious, my use case is our implementation of the "__llvm_deoptimize" function we bind to @llvm.deoptimize. I'm choosing not to hard code that fact in the patch and instead make it configurable via function attributes.
      
      The basic approach here is modelled on what is done for the "Live In" values on stackmaps and patchpoints. (A secondary goal here is to remove one of the last barriers to merging the pseudo instructions.) We start by adding the operands directly to the STATEPOINT SDNode. Once we've lowered to MI, we extend the remat logic used by the register allocator to fold virtual register uses into StackMap::Indirect entries as needed. This does rely on the fact that the register allocator rematerializes. If it didn't rematerialize along some code path, we could end up with more vregs than physical registers and fail to allocate.
      
      Today, we *only* fold in the register allocator. This can create some weird effects when combined with arguments passed on the stack because we don't fold them appropriately. I have an idea how to fix that, but it needs this patch in place to work on that effectively. (There's some weird interaction with the scheduler as well, more investigation needed.)
      
      My near term plan is to land this patch off-by-default, experiment in my local tree to identify any correctness issues and then start fixing codegen problems one by one as I find them. Once I have the live-in lowering fully working (both correctness and code quality), I'm hoping to move on to the live-on-return semantics. Note: I don't have any *known* miscompiles with this patch enabled, but I'm pretty sure I'll find at least a couple. Thus, the "experimental" tag and the fact it's off by default.
      
      Differential Revision: https://reviews.llvm.org/D24000
      
      llvm-svn: 280250
    • 
      [X86][SSE] Improve awareness of (v)cvtpd2ps implicit zeroing of upper 64-bits of xmm result · 6199b4fd
      Simon Pilgrim authored
      Associate x86_sse2_cvtpd2ps with X86ISD::VFPROUND to avoid inserting unnecessary zeroing shuffles.
      
      Differential Revision: https://reviews.llvm.org/D23797
      
      llvm-svn: 280249
    • 
      [X86][SSE] Improve awareness of fptrunc implicit zeroing of upper 64-bits of xmm result · 7b09af19
      Simon Pilgrim authored
      Add patterns to avoid inserting unnecessary zeroing shuffles when lowering fptrunc to (v)cvtpd2ps.
      
      Differential Revision: https://reviews.llvm.org/D23797
      
      llvm-svn: 280214
    • 
      [AVX-512] Add patterns to select masked logical operations if the select has a floating point type. · 8f6827c9
      Craig Topper authored
      This is needed in order to replace the masked floating point logical op intrinsics with native IR.
      
      llvm-svn: 280195
    • 
      [AVX-512] Add test cases for masked floating point logic operations with... · 0f8fb476
      Craig Topper authored
      [AVX-512] Add test cases for masked floating point logic operations with bitcasts between the logic ops and the select. We don't currently select masked operations for these cases.
      
      Test cases taken from optimized clang output after trying to convert the masked floating point logical op intrinsics to native IR.
      
      llvm-svn: 280194
    • 
      [X86] Regenerate a test using update_llc_test_checks.py. · de8b1a00
      Craig Topper authored
      llvm-svn: 280193
    • 
      [XRay] Support multiple return instructions in a single basic block · 047669f1
      Dean Michael Berris authored
      Add a .mir test to catch this case, and fix the xray-instrumentation
      pass to handle it appropriately.
      
      llvm-svn: 280192