  1. Apr 29, 2019
    • Fix additional cases of more than two dashes for options in tests. · 54dbcfe5
      Don Hinton authored
      llvm-svn: 359484
      54dbcfe5
    • [DAG] Refactor DAGCombiner::ReassociateOps · 82099457
      Bjorn Pettersson authored
      Summary:
      Extract the logic for doing reassociations
      from DAGCombiner::reassociateOps into a helper
      function DAGCombiner::reassociateOpsCommutative,
      and use that helper to trigger reassociation
      on the original operand order, or the commuted
      operand order.
      
      Codegen is not identical since the operand order will
      be different when doing the reassociations for the
      commuted case. That causes some unfortunate churn in
      some test cases. Apart from that this should be NFC.
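
A minimal sketch of the control flow described above, not the actual DAGCombiner code. A helper attempts one reassociation pattern on a given operand order, and a wrapper tries the original order first and the commuted order second. The `Term` type and both function names are hypothetical stand-ins, and the only pattern modelled is (x + c1) + c2 --> x + (c1 + c2).

```cpp
#include <cassert>

// Hypothetical toy operand: either a pure constant (HasVar == false) or
// "variable VarId plus constant Offset".
struct Term {
  bool HasVar;
  int VarId;
  int Offset;
};

// Tries the pattern (x + c1) + c2 --> x + (c1 + c2) for one fixed operand
// order: N0 must carry the variable, N1 must be a pure constant.
bool reassociateCommutative(const Term &N0, const Term &N1, Term &Out) {
  if (N0.HasVar && !N1.HasVar) {
    Out = Term{true, N0.VarId, N0.Offset + N1.Offset};
    return true;
  }
  return false;
}

// Wrapper: original operand order first, then the commuted order.
bool reassociate(const Term &N0, const Term &N1, Term &Out) {
  return reassociateCommutative(N0, N1, Out) ||
         reassociateCommutative(N1, N0, Out);
}
```

Because the commuted attempt builds its result from swapped operands, the output of a real combiner can differ in operand order, which is the source of the test churn mentioned above.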
      
      Reviewers: spatel, craig.topper, tstellar
      
      Reviewed By: spatel
      
      Subscribers: dmgreen, dschuff, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D61199
      
      llvm-svn: 359476
      82099457
    • [ARM] Add bitcast/extract_subvec. of fp16 vectors · d95abb17
      Diogo N. Sampaio authored
      Summary:
      This patch adds some basic operations for fp16
      vectors, such as bitcast from fp16 to i16,
      required to perform extract_subvector (also added
      here) and extract_element.
      
      Reviewers: SjoerdMeijer, DavidSpickett, t.p.northover, ostannard
      
      Reviewed By: ostannard
      
      Subscribers: javed.absar, kristof.beyls, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D60618
      
      llvm-svn: 359433
      d95abb17
    • [ARM] Add v4f16 and v8f16 types to the CallingConv · 2078eb74
      Diogo N. Sampaio authored
      Summary:
      The Procedure Call Standard for the Arm Architecture
      states that float16x4_t and float16x8_t behave just
      as uint16x4_t and uint16x8_t for argument passing.
      This patch adds the fp16 vectors to the
      ARMCallingConv.td file.
      
      Reviewers: miyuki, ostannard
      
      Reviewed By: ostannard
      
      Subscribers: ostannard, javed.absar, kristof.beyls, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D60720
      
      llvm-svn: 359431
      2078eb74
  2. Apr 26, 2019
    • [AsmPrinter] refactor to support %c w/ GlobalAddress' · 7ab164c4
      Nick Desaulniers authored
      Summary:
      Targets like ARM, MSP430, PPC, and SystemZ have complex behavior when
      printing the address of a MachineOperand::MO_GlobalAddress. Move that
      handling into a new overridden method in each subclass. A virtual
      method was added to the base class for handling the generic case.
      
      Refactors a few subclasses to support the target independent %a, %c, and
      %n.
      
      The patch also contains small cleanups for AVRAsmPrinter and
      SystemZAsmPrinter.
      
      It seems that NVPTXTargetLowering may be missing some logic to
      transform GlobalAddressSDNodes so that
      TargetLowering::LowerAsmOperandForConstraint can handle them with "i"
      extended inline assembly constraints.
      
      Fixes:
      - https://bugs.llvm.org/show_bug.cgi?id=41402
      - https://github.com/ClangBuiltLinux/linux/issues/449
      
      Reviewers: echristo, void
      
      Reviewed By: void
      
      Subscribers: void, craig.topper, jholewinski, dschuff, jyknight, dylanmckay, sdardis, nemanjai, javed.absar, sbc100, jgravelle-google, eraman, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, jrtc27, atanasyan, jsji, llvm-commits, kees, tpimh, nathanchance, peter.smith, srhines
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D60887
      
      llvm-svn: 359337
      7ab164c4
  3. Apr 23, 2019
  4. Apr 17, 2019
  5. Apr 16, 2019
    • Re-commit r357452: SimplifyCFG SinkCommonCodeFromPredecessors: Also sink... · 21eb771d
      Hans Wennborg authored
      Re-commit r357452: SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)
      
      The original commit caused false positives from AddressSanitizer's
      use-after-scope checks, which have now been fixed in r358478.
      
      > The code was previously checking that candidates for sinking had exactly
      > one use or were a store instruction (which can't have uses). This meant
      > we could sink call instructions only if they had a use.
      >
      > That limitation seemed a bit arbitrary, so this patch changes it to
      > "instruction has zero or one use" which seems more natural and removes
      > the need to special-case stores.
      >
      > Differential revision: https://reviews.llvm.org/D59936
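
The change in the sinking condition quoted above can be sketched as two predicates. This is an illustrative model only: the `Inst` struct and the function names are hypothetical, not the SimplifyCFG code.

```cpp
#include <cassert>

// Hypothetical model of an instruction: whether it is a store and how many
// uses its result has (a store has no result, so NumUses is 0 for stores).
struct Inst {
  bool IsStore;
  unsigned NumUses;
};

// Old rule: exactly one use, or a store as a special case.
bool canSinkOld(const Inst &I) { return I.IsStore || I.NumUses == 1; }

// New rule: zero or one use. Stores no longer need a special case, and
// calls whose result is unused now qualify.
bool canSinkNew(const Inst &I) { return I.NumUses <= 1; }
```

A call with an unused result (`Inst{false, 0}`) is rejected by the old rule and accepted by the new one, while stores and single-use instructions are treated the same by both.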
      
      llvm-svn: 358483
      21eb771d
  6. Apr 15, 2019
    • [GlobalISel] Enable CSE in the IRTranslator & legalizer for -O0 with constants only. · 946b1246
      Amara Emerson authored
      Other opcodes shouldn't be CSE'd until we can be sure debug info quality won't
      be degraded.
      
      This change also improves the IRTranslator so that in most places, but not all,
      it creates constants using the MIRBuilder directly instead of first creating a
      new destination vreg and then creating a constant. By doing this, the
      buildConstant() method can just return the vreg of an existing G_CONSTANT
      instead of having to create a COPY from it.
      
      I measured a 0.2% improvement in compile time and a 0.9% improvement in code
      size at -O0 ARM64.
      
      Compile time:
      Program                                        base   cse    diff
      test-suite...ark/tramp3d-v4/tramp3d-v4.test     9.04   9.12  0.8%
      test-suite...Mark/mafft/pairlocalalign.test     2.68   2.66 -0.7%
      test-suite...-typeset/consumer-typeset.test     5.53   5.51 -0.4%
      test-suite :: CTMark/lencod/lencod.test         5.30   5.28 -0.3%
      test-suite :: CTMark/Bullet/bullet.test        25.82  25.76 -0.2%
      test-suite...:: CTMark/ClamAV/clamscan.test     6.92   6.90 -0.2%
      test-suite...TMark/7zip/7zip-benchmark.test    34.24  34.17 -0.2%
      test-suite :: CTMark/SPASS/SPASS.test           6.25   6.24 -0.1%
      test-suite...:: CTMark/sqlite3/sqlite3.test     1.66   1.66 -0.1%
      test-suite :: CTMark/kimwitu++/kc.test         13.61  13.60 -0.0%
      Geomean difference                                          -0.2%
      
      Code size:
      Program                                        base     cse      diff
      test-suite...-typeset/consumer-typeset.test    1315632  1266480 -3.7%
      test-suite...:: CTMark/ClamAV/clamscan.test    1313892  1297508 -1.2%
      test-suite :: CTMark/lencod/lencod.test        1439504  1423112 -1.1%
      test-suite...TMark/7zip/7zip-benchmark.test    2936980  2904172 -1.1%
      test-suite :: CTMark/Bullet/bullet.test        3478276  3445460 -0.9%
      test-suite...ark/tramp3d-v4/tramp3d-v4.test    8082868  8033492 -0.6%
      test-suite :: CTMark/kimwitu++/kc.test         3870380  3853972 -0.4%
      test-suite :: CTMark/SPASS/SPASS.test          1434904  1434896 -0.0%
      test-suite...Mark/mafft/pairlocalalign.test    764528   764528   0.0%
      test-suite...:: CTMark/sqlite3/sqlite3.test    782092   782092   0.0%
      Geomean difference                                              -0.9%
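
For readers unfamiliar with the "Geomean difference" rows, the sketch below shows one common way such a figure is computed: the geometric mean of the per-benchmark ratios (cse / base), expressed as a percentage change. That the tables above were produced exactly this way is an assumption, and the function name is hypothetical. Recomputing from the rounded values shown will not necessarily reproduce the last digit.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Geometric mean of the ratios New[i] / Base[i], returned as a percentage
// change relative to the baseline. Assumes all inputs are positive.
double geomeanDiffPercent(const std::vector<double> &Base,
                          const std::vector<double> &New) {
  assert(Base.size() == New.size() && !Base.empty());
  double LogSum = 0.0;
  for (std::size_t I = 0; I < Base.size(); ++I)
    LogSum += std::log(New[I] / Base[I]);
  // exp(mean of logs) is the geometric mean of the ratios.
  return (std::exp(LogSum / static_cast<double>(Base.size())) - 1.0) * 100.0;
}
```

Unlike an arithmetic mean of the percentages, the geometric mean treats a 2x slowdown and a 2x speedup as cancelling out.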
      
      Differential Revision: https://reviews.llvm.org/D60580
      
      llvm-svn: 358369
      946b1246
  7. Apr 12, 2019
    • [DAGCombiner] narrow shuffle of concatenated vectors · 5e4ad39a
      Sanjay Patel authored
      // shuffle (concat X, undef), (concat Y, undef), Mask -->
      // concat (shuffle X, Y, Mask0), (shuffle X, Y, Mask1)
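
A rough sketch of how the wide mask could be split into Mask0 and Mask1 for the transform above. It is a model of the index arithmetic only, under the assumption that X and Y each have half as many elements as the wide shuffle operands and that -1 denotes an undef lane; the function name is hypothetical and this is not the DAGCombiner implementation.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Wide operands are (concat X, undef) and (concat Y, undef), each with
// NumElts lanes, so wide indices 0..Half-1 select X and wide indices
// NumElts..NumElts+Half-1 select Y. Every other index selects undef.
// In the narrow shuffle of X and Y, X occupies 0..Half-1 and Y occupies
// Half..2*Half-1.
std::pair<std::vector<int>, std::vector<int>>
splitConcatMask(const std::vector<int> &Mask) {
  assert(Mask.size() % 2 == 0 && "expected an even number of lanes");
  const int NumElts = static_cast<int>(Mask.size());
  const int Half = NumElts / 2;
  std::vector<int> Narrow;
  for (int Idx : Mask) {
    if (Idx >= 0 && Idx < Half)
      Narrow.push_back(Idx);            // lane of X keeps its index
    else if (Idx >= NumElts && Idx < NumElts + Half)
      Narrow.push_back(Idx - Half);     // lane of Y: Idx - NumElts + Half
    else
      Narrow.push_back(-1);             // undef lane
  }
  // First half of the remapped mask feeds the low narrow shuffle (Mask0),
  // second half feeds the high one (Mask1).
  return {std::vector<int>(Narrow.begin(), Narrow.begin() + Half),
          std::vector<int>(Narrow.begin() + Half, Narrow.end())};
}
```

With 4-lane wide operands, the interleaving mask {0, 4, 1, 5} splits into Mask0 = {0, 2} and Mask1 = {1, 3}, each usable by a 2-lane shuffle of X and Y.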
      
      The ARM changes with 'vtrn' and narrowed 'vuzp' are improvements.
      
      The x86 changes look neutral or better. There's one test with an
      extra instruction, but that could be reversed for a subtarget with
      the right attributes. But by default, we want to avoid the 256-bit
      op when possible (in my motivating benchmark, a handful of ymm ops
      sprinkled into a sequence of xmm ops are triggering frequency
      throttling on Haswell resulting in significantly worse perf).
      
      Differential Revision: https://reviews.llvm.org/D60545
      
      llvm-svn: 358291
      5e4ad39a
  8. Apr 10, 2019
    • Revert rL357745: [SelectionDAG] Compute known bits of CopyFromReg · 0861c87b
      David Green authored
      Certain optimisations from ConstantHoisting and CGP rely on Selection DAG not
      seeing through to the constant in other blocks. Revert this patch while we come
      up with a better way to handle that.
      
      I will try to follow this up with some better tests.
      
      llvm-svn: 358113
      0861c87b
    • [ARM] [FIX] Add missing f16 vector operations lowering · 651463e4
      Diogo N. Sampaio authored
      Summary:
      Add the missing <8xhalf> shufflevector pattern for the concat_vector DAG node.
      Also allow <8xhalf> and <4xhalf> vldup1 operations.
      
      These instructions are required for v8.2a fp16 lowering of vmul_n_f16, vmulq_n_f16 and vmulq_lane_f16 intrinsics.
      
      Reviewers: olista01, pbarrio, LukeGeeson, efriedma
      
      Reviewed By: efriedma
      
      Subscribers: efriedma, javed.absar, kristof.beyls, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D60319
      
      llvm-svn: 358081
      651463e4
    • [ARM GlobalISel] Select G_FCONSTANT for VFP3 · b6e83b98
      Diana Picus authored
      Make it possible to TableGen code for FCONSTS and FCONSTD.
      
      We need to make two changes to the TableGen descriptions of vfp_f32imm
      and vfp_f64imm respectively:
      * add GISelPredicateCode to check that the immediate fits in 8 bits;
      * extract the SDNodeXForms into separate definitions and create a
      GISDNodeXFormEquiv and a custom renderer function for each of them.
      
      There's a lot of boilerplate to get the actual value of the immediate,
      but it basically just boils down to calling ARM_AM::getFP32Imm or
      ARM_AM::getFP64Imm.
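
As background for the "fits in 8 bits" predicate, the sketch below checks whether a value is representable in the VFP 8-bit floating-point immediate form, which covers values of the shape +/- (n / 16) * 2^e with n in 16..31 and e in -3..4. It is an independent illustration written for this note, not the code of ARM_AM::getFP32Imm or ARM_AM::getFP64Imm, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Returns true if V can be written as +/- (n / 16) * 2^e with an integer
// n in [16, 31] and an exponent e in [-3, 4].
bool fitsVFPImm8(double V) {
  if (V == 0.0 || !std::isfinite(V))
    return false;
  int Exp = 0;
  // |V| = Frac * 2^Exp with Frac in [0.5, 1).
  const double Frac = std::frexp(std::fabs(V), &Exp);
  // Renormalise so the significand is in [1, 2): |V| = (2 * Frac) * 2^E.
  const int E = Exp - 1;
  if (E < -3 || E > 4)
    return false;
  // Only four fraction bits are available, so (2 * Frac) * 16 must be an
  // integer; it is automatically in the range [16, 32).
  const double Scaled = Frac * 32.0;
  return Scaled == std::floor(Scaled);
}
```

Values such as 1.0, 0.125 and 31.0 pass the check, while 0.0, 0.1 and 32.0 do not and would need another materialisation strategy, such as a constant pool load.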
      
      llvm-svn: 358063
      b6e83b98
    • [ARM GlobalISel] Select G_FCONSTANT into pools · 3533ad68
      Diana Picus authored
      Put all floating point constants into constant pools and load their
      values from there.
      
      llvm-svn: 358062
      3533ad68
    • [ARM GlobalISel] Map G_FCONSTANT · 165846b0
      Diana Picus authored
      llvm-svn: 358061
      165846b0
  9. Apr 09, 2019
  10. Apr 05, 2019
    • [SelectionDAG] Add fcmp UNDEF handling to SelectionDAG::FoldSetCC · 17586cda
      Simon Pilgrim authored
      Second half of PR40800. This patch adds DAG undef handling to fcmp instructions to match the behavior in llvm::ConstantFoldCompareInstruction, which permits constant folding of vector comparisons where some elements had been reduced to UNDEF (by SimplifyDemandedVectorElts etc.).
      
      This involves a lot of tweaking to reduced tests, as bugpoint loves to reduce fcmp arguments to undef.
      
      Differential Revision: https://reviews.llvm.org/D60006
      
      llvm-svn: 357765
      17586cda
    • [SelectionDAG] Compute known bits of CopyFromReg · 0376ac1d
      Piotr Sobczak authored
      Summary:
      Teach SelectionDAG how to compute known bits of ISD::CopyFromReg if
      the virtual reg used has one def only.
      
      This can be particularly useful when calling isBaseWithConstantOffset()
      with the ISD::CopyFromReg argument, as more optimizations may get enabled
      in the result.
      
      Also add a missing truncation on X86, found by testing of this patch.
      
      Change-Id: Id1c9fceec862d118c54a5b53adf72ada5d6daefa
      
      Reviewers: bogner, craig.topper, RKSimon
      
      Reviewed By: RKSimon
      
      Subscribers: lebedev.ri, nemanjai, jvesely, nhaehnle, javed.absar, jsji, jdoerfert, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D59535
      
      llvm-svn: 357745
      0376ac1d
  11. Apr 04, 2019
  12. Apr 03, 2019
    • [DAGCombiner] loosen restrictions for moving shuffles after vector binop · 00dae6b2
      Sanjay Patel authored
      There are 3 changes to make this correspond to the same transform in instcombine:
      1. Remove the legality check - we can't create anything less legal than we started with.
      2. Ease the use restriction, so we only bail out if both operands have >1 use.
      3. Ease the use restriction for binops with a repeated operand (eg, mul x, x).
      
      As discussed in D60150, there's a scalarization opportunity that will be made
      easier by allowing this transform more generally.
      
      llvm-svn: 357580
      00dae6b2
  13. Apr 02, 2019
  14. Mar 29, 2019
  15. Mar 28, 2019
    • [ARM GlobalISel] Run regbankselect test for Thumb. NFCI · 13ef0c53
      Diana Picus authored
      This should just work, since ARM mode and Thumb2 mode are at the same
      level of support now and should map the same to GPR and FPR.
      
      llvm-svn: 357159
      13ef0c53
    • [ARM GlobalISel] Fix G_STORE with s1 · 52495c47
      Diana Picus authored
      G_STORE for 1-bit values uses a STRBi12, which stores the whole byte.
      Zero out the undefined bits before writing.
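
In scalar terms, the fix amounts to masking the register down to its single defined bit before the byte store. The sketch below is only a model of that arithmetic; the function name is hypothetical and no claim is made about which instruction the selector emits for the masking.

```cpp
#include <cassert>
#include <cstdint>

// An s1 value lives in a 32-bit register whose upper 31 bits are undefined.
// A byte store writes bits 0..7, so bits 1..7 must be cleared first.
std::uint8_t byteForS1Store(std::uint32_t Reg) {
  return static_cast<std::uint8_t>(Reg & 0x1u);
}
```

For example, a register holding 0xFFFFFFFE represents the s1 value 0, and the byte written to memory is 0x00 rather than 0xFE.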
      
      llvm-svn: 357154
      52495c47
    • [ARM GlobalISel] Fix selection of G_SELECT · 4d512df3
      Diana Picus authored
      G_SELECT uses a 1-bit scalar for the condition, and is currently
      implemented with a plain CMPri against 0. This means that values such as
      0x1110 are interpreted as true, when instead the higher bits should be
      treated as undefined and therefore ignored. Replace the CMPri with a
      TSTri against 0x1, which performs an implicit AND, yielding the expected
      result.
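
The difference between the two condition tests can be modelled in a few lines. The function names below are hypothetical; they mirror the semantics of comparing against 0 versus testing bit 0, not the selected machine code.

```cpp
#include <cassert>
#include <cstdint>

// Old behaviour: compare the whole register against 0, so any set bit,
// including an undefined high bit, makes the condition true.
bool conditionViaCmp(std::uint32_t Reg) { return Reg != 0; }

// New behaviour: test against 0x1, i.e. AND with 1 and check the result,
// so only the defined low bit decides.
bool conditionViaTst(std::uint32_t Reg) { return (Reg & 0x1u) != 0; }
```

For the value 0x1110 from the commit message, `conditionViaCmp` yields true while `conditionViaTst` yields false, because bit 0 is clear.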
      
      llvm-svn: 357153
      4d512df3
  16. Mar 27, 2019
  17. Mar 26, 2019
    • [DAG] Avoid smart constructor-based dangling nodes. · a28c5145
      Nirav Dave authored
      Various SelectionDAG non-combine operations (e.g. the getNode smart
      constructor and legalization) may leave dangling nodes by applying
      optimizations or not fully pruning unused result values. This can
      result in nodes that are never added to the worklist and therefore
      cannot be pruned.
      
      Add a node inserter as the current node deleter to make sure such
      nodes have the chance of being pruned.
      
      Many minor changes, mostly positive.
      
      llvm-svn: 356996
      a28c5145
  18. Mar 25, 2019
    • [ARM GlobalISel] 64-bit memops should be aligned · 254b11a0
      Diana Picus authored
      We currently use only VLDR/VSTR for all 64-bit loads/stores, so the
      memory operands must be word-aligned. Mark aligned operations as legal
      and narrow non-aligned ones to 32 bits.
      
      While we're here, also mark non-power-of-2 loads/stores as unsupported.
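
The rules in this commit message can be summarised as a small decision function. This is a sketch, not the ARMLegalizerInfo code: the `Action` enum and function names are hypothetical, and treating every remaining power-of-2 case as legal is an assumption made to keep the example short.

```cpp
#include <cassert>

enum class Action { Legal, NarrowTo32, Unsupported };

bool isPowerOf2(unsigned V) { return V != 0 && (V & (V - 1)) == 0; }

// SizeInBits is the memory access size, AlignInBytes its known alignment.
Action classifyMemOp(unsigned SizeInBits, unsigned AlignInBytes) {
  assert(AlignInBytes != 0 && "alignment must be nonzero");
  // Non-power-of-2 loads/stores are marked unsupported.
  if (!isPowerOf2(SizeInBits))
    return Action::Unsupported;
  // 64-bit accesses go through VLDR/VSTR and need word (4-byte) alignment;
  // anything less aligned is narrowed to 32-bit accesses.
  if (SizeInBits == 64 && AlignInBytes % 4 != 0)
    return Action::NarrowTo32;
  // Assumption of this sketch: all other cases are treated as legal.
  return Action::Legal;
}
```

Under these rules a 64-bit load with 2-byte alignment is narrowed, while the same load with 4-byte or 8-byte alignment is left alone.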
      
      llvm-svn: 356872
      254b11a0
  19. Mar 22, 2019
  20. Mar 20, 2019
    • [ARM] Eliminate redundant "mov rN, sp" instructions in Thumb1. · 638be660
      Eli Friedman authored
      This takes sequences like "mov r4, sp; str r0, [r4]", and optimizes them
      to something like "str r0, [sp]".
      
      For regular stack variables, this optimization was already implemented:
      we lower loads and stores using frame indexes, which are expanded later.
      However, when constructing a call frame for a call with more than four
      arguments, the existing optimization doesn't apply.  We need to use
      stores which are actually relative to the current value of sp, and don't
      have an associated frame index.
      
      This patch adds a special case to handle that construct.  At the DAG
      level, this is an ISD::STORE where the address is a CopyFromReg from SP
      (plus a small constant offset).
      
      This applies only to Thumb1: in Thumb2 or ARM mode, a regular store
      instruction can access SP directly, so the COPY gets eliminated by
      existing code.
      
      The change to ARMDAGToDAGISel::SelectThumbAddrModeSP is a related
      cleanup: we shouldn't pretend that it can select anything other than
      frame indexes.
      
      Differential Revision: https://reviews.llvm.org/D59568
      
      llvm-svn: 356601
      638be660
  21. Mar 19, 2019
    • RegAllocFast: Remove early selection loop, the spill calculation will report... · c2e35a6f
      Matt Arsenault authored
      RegAllocFast: Remove early selection loop, the spill calculation will report cost 0 anyway for free regs
      
      The 2nd loop calculates spill costs but reports free registers as cost
      0 anyway, so there is little benefit from having a separate early
      loop.
      
      Surprisingly this is not NFC, as many registers are marked regDisabled
      so the first loop often picks up later registers unnecessarily instead
      of the first one available in the allocation order...
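
Why a single cost loop suffices can be seen in a simplified model: if free registers already report cost 0, scanning the allocation order for the cheapest candidate finds them without a separate early pass. The sketch below assumes the per-register spill costs have been computed beforehand; the function name is hypothetical and the register states of the real allocator (such as regDisabled) are deliberately not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// SpillCost[i] is the cost of taking the i-th register in allocation order,
// with 0 meaning the register can be used without spilling anything.
// Returns the index of the chosen register, or -1 if there are no candidates.
int pickRegister(const std::vector<unsigned> &SpillCost) {
  int Best = -1;
  unsigned BestCost = std::numeric_limits<unsigned>::max();
  for (std::size_t I = 0; I < SpillCost.size(); ++I) {
    // A zero-cost register cannot be beaten, so take the first one found.
    if (SpillCost[I] == 0)
      return static_cast<int>(I);
    if (SpillCost[I] < BestCost) {
      BestCost = SpillCost[I];
      Best = static_cast<int>(I);
    }
  }
  return Best;
}
```

In this model the first zero-cost register in allocation order always wins, which matches the behaviour the commit message describes as preferable to picking a later register.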
      
      Patch by Matthias Braun
      
      llvm-svn: 356499
      c2e35a6f