  1. Mar 18, 2020
    • Florian Hahn's avatar
      [VPlan] Use underlying value for printing, if available. · e6a74803
      Florian Hahn authored
      When an underlying value is available, we can use its name for
      printing, as discussed in D73078.
      
      Reviewers: rengolin, hsaito, Ayal, gilr
      
      Reviewed By: Ayal
      
      Differential Revision: https://reviews.llvm.org/D76200
      e6a74803
    • Simon Tatham's avatar
      [ARM,MVE] Add intrinsics for the VQDMLAD family. · e13d153c
      Simon Tatham authored
      Summary:
      This is another set of instructions too complicated to be sensibly
      expressed in IR by anything short of a target-specific intrinsic.
      Given input vectors a,b, the instruction generates intermediate values
      2*(a[0]*b[0]+a[1]*b[1]), 2*(a[2]*b[2]+a[3]*b[3]), etc; takes the high
      half of each double-width value, and overwrites half the lanes in the
      output vector c, which you therefore have to provide the input value
      of. Optionally you can swap the elements of b so that the terms are things
      like a[0]*b[1]+a[1]*b[0]; optionally you can round to nearest when
      taking the high half; and optionally you can take the difference
      rather than sum of the two products. Finally, saturation is applied
      when converting back to a single-width vector lane.
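
The arithmetic described above can be sketched as a scalar model of one pair of 16-bit lanes. This is only an illustration of the commit message, not the intrinsic's definition: the function name `qdmladh_pair`, its parameters, and the order of subtraction in the "difference" variant are assumptions, and the choice of which output lanes get overwritten is deliberately left out.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical scalar model of one lane pair (16-bit lanes assumed).
// Returns the saturated high half of 2*(a0*b0 +/- a1*b1).
int16_t qdmladh_pair(int16_t a0, int16_t a1, int16_t b0, int16_t b1,
                     bool exchange, bool round, bool subtract) {
  if (exchange)
    std::swap(b0, b1);                  // products become a0*b1 and a1*b0
  const int64_t p0 = int64_t(a0) * b0;  // widen before multiplying
  const int64_t p1 = int64_t(a1) * b1;
  // Assumption: the "difference" variant computes p0 - p1.
  int64_t doubled = 2 * (subtract ? p0 - p1 : p0 + p1);
  if (round)
    doubled += int64_t(1) << 15;        // round to nearest before truncating
  // High half of the double-width value; assumes an arithmetic right shift
  // for negative values (guaranteed from C++20, typical before that).
  const int64_t high = doubled >> 16;
  // Saturate when narrowing back to a single-width lane.
  return int16_t(std::min<int64_t>(INT16_MAX,
                                   std::max<int64_t>(INT16_MIN, high)));
}
```

Saturation only matters at the extremes: with all four inputs equal to -32768 the doubled sum is 2^32, whose high half does not fit in 16 bits.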
      
      Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
      
      Reviewed By: miyuki
      
      Subscribers: kristof.beyls, hiraditya, cfe-commits
      
      Tags: #clang
      
      Differential Revision: https://reviews.llvm.org/D76359
      e13d153c
    • Nico Weber's avatar
      Revert "[Syntax] Build template declaration nodes" · 881f5b5a
      Nico Weber authored
      This reverts commit dd128268.
      Breaks tests on Windows, see https://reviews.llvm.org/D76346#1929208
      881f5b5a
    • Guillaume Chatelet's avatar
      [libc] Adding memcpy implementation for x86_64 · 04a309dd
      Guillaume Chatelet authored
      Summary:
      The patch is not ready yet and is here to discuss a few options:
       - How do we customize the implementation? (i.e. how to define `kRepMovsBSize`),
       - How do we specify custom compilation flags? (We'd need `-fno-builtin-memcpy` to be passed in),
       - How do we build? We may want to test in debug but build the libc with `-march=native` for instance,
       - Clang has a brand new builtin `__builtin_memcpy_inline` which makes the implementation easy and efficient, but:
         - If we compile with `gcc` or `msvc` we can't use it, resorting to less efficient code generation,
         - With gcc we can use `__builtin_memcpy` but then we'd need a postprocess step to check that the final assembly does not contain a call to `memcpy` (unlikely but allowed),
         - For msvc we'd need to rely on the compiler optimization passes.
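
The compiler-dependent fallback discussed in the last bullet could look roughly like the sketch below. It is not the code from the patch: the helper name `copyFixed` is hypothetical, and it assumes `__has_builtin` can be used to probe for `__builtin_memcpy_inline` (defined to 0 where the compiler lacks `__has_builtin`).

```cpp
#include <cstddef>

#ifndef __has_builtin
#define __has_builtin(x) 0  // compilers without __has_builtin take the fallbacks
#endif

// Hypothetical helper: copy exactly N bytes, N known at compile time.
template <std::size_t N>
inline void copyFixed(char *dst, const char *src) {
#if __has_builtin(__builtin_memcpy_inline)
  // Clang: the size must be a constant; never lowered to a memcpy call.
  __builtin_memcpy_inline(dst, src, N);
#elif defined(__GNUC__)
  // GCC: usually expanded inline, but a call to memcpy is still allowed.
  __builtin_memcpy(dst, src, N);
#else
  // Other compilers: plain loop, relying on the optimizer.
  for (std::size_t i = 0; i < N; ++i)
    dst[i] = src[i];
#endif
}
```

On the GCC path the generated assembly would still have to be checked for stray `memcpy` calls, as the bullet notes.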
      
      Reviewers: sivachandra, abrachet
      
      Subscribers: mgorny, MaskRay, tschuett, libc-commits, courbet
      
      Tags: #libc-project
      
      Differential Revision: https://reviews.llvm.org/D74397
      04a309dd
    • Nico Weber's avatar
      642a424b
    • Sam Parker's avatar
      [NFC][PowerPC] Update test · fc2a5ef9
      Sam Parker authored
      Run the update script on one of the loop unroll tests.
      fc2a5ef9
    • Matt Arsenault's avatar
      AMDGPU: Initial, crude support for indirect calls · 4ea1baf6
      Matt Arsenault authored
      This isn't really usable, and requires using the
      -amdgpu-fixed-function-abi flag to work.
      
      Assumes a uniform call target, and will hit a verifier error if the
      call target ends up in a VGPR. Also doesn't attempt to do anything
      sensible for the reported register/stack usage.
      4ea1baf6
    • Matt Arsenault's avatar
      Reapply "AMDGPU/GlobalISel: Fully handle 0 dmask case during legalize" · ea4597ee
      Matt Arsenault authored
      This reverts commit 9bca8fc4.
      
      Rearrange handling to avoid changing the instruction in the case where
      it's going to be erased and replaced with undef.
      ea4597ee
    • Piotr Sobczak's avatar
      [AMDGPU] Fix AMDGPUUnifyDivergentExitNodes · d1a7bfca
      Piotr Sobczak authored
      Summary:
      For the case where "done" bits on existing exports are removed
      by unifyReturnBlockSet(), unify all return blocks - even the
      uniformly reached ones. We do not want to end up with a non-unified,
      uniformly reached block containing a normal export with the "done"
      bit cleared.
      
      That case is believed to be rare - possible with infinite loops
      in pixel shaders.
      
      This is a fix for D71192.
      
      Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D76364
      d1a7bfca
    • Nico Weber's avatar
      Reland "[gn build] (manually) port 8b409eab" · 9f981e9a
      Nico Weber authored
      This reverts commit 4060016f
      and re-merges c5b81466.
      9f981e9a
    • Marcel Hlopko's avatar
      [Syntax] Build template declaration nodes · dd128268
      Marcel Hlopko authored
      Summary:
      Copy of https://reviews.llvm.org/D72334, submitting with Ilya's permission.
      
      Handles template declaration of all kinds.
      
      Also builds template declaration nodes for specializations and explicit
      instantiations of classes.
      
      Some missing things will be addressed in the follow-up patches:
      
      specializations of functions and variables,
      template parameters.
      
      Reviewers: gribozavr2
      
      Reviewed By: gribozavr2
      
      Subscribers: cfe-commits
      
      Tags: #clang
      
      Differential Revision: https://reviews.llvm.org/D76346
      dd128268
    • Sander de Smalen's avatar
      [InstCombine] GEPOperator::accumulateConstantOffset does not support scalable vectors · ef64ba83
      Sander de Smalen authored
      Avoid transforming:
      
       %0 = bitcast i8* %base to <vscale x 16 x i8>*
       %1 = getelementptr <vscale x 16 x i8>, <vscale x 16 x i8>* %0, i64 1
      
      into:
      
       %0 = getelementptr i8, i8* %base, i64 16
       %1 = bitcast i8* %0 to <vscale x 16 x i8>*
      
      Reviewers: efriedma, ctetreau
      
      Reviewed By: efriedma
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D76236
      ef64ba83
    • Chris Bowler's avatar
      [PowerPC][AIX] Implement by-val caller arguments in a single register. · c2186647
      Chris Bowler authored
      This is the first of a series of patches that adds caller support for
      by-value arguments. This patch adds support for arguments that are passed in a
      single GPR.
      
      There are 3 limitation cases:
      - The by-value argument is larger than a single register.
      - There are no remaining GPRs even though the by-value argument would
        otherwise fit in a single GPR.
      - The by-value argument requires alignment greater than register width.
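
The three limitation cases amount to a simple predicate. The sketch below only illustrates the list above and is not the code in the PowerPC backend; `ByValArg`, `fitsSingleGPR` and their parameters are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical description of a by-value argument.
struct ByValArg {
  uint64_t sizeBytes;
  uint64_t alignBytes;
};

// True only when none of the three limitation cases applies.
bool fitsSingleGPR(const ByValArg &arg, unsigned freeGPRs,
                   uint64_t regWidthBytes) {
  if (arg.sizeBytes > regWidthBytes)
    return false;  // larger than a single register
  if (freeGPRs == 0)
    return false;  // would fit, but no GPR is left
  if (arg.alignBytes > regWidthBytes)
    return false;  // alignment greater than register width
  return true;
}
```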
      
      Future patches will be required to add support for these cases as well
      as for the callee handling (in LowerFormalArguments_AIX) that
      corresponds to this work.
      
      Differential Revision: https://reviews.llvm.org/D75863
      c2186647
    • Jan Kratochvil's avatar
      [lldb] [testsuite] Enable forgotten -gsplit-dwarf for 2 testfiles · 3481062b
      Jan Kratochvil authored
      D63643 added these testfiles but some of the %t4dwo and %t5dwo builds
      are the same as corresponding %t4 and %t5 builds. Fortunately the
      testcases do PASS.
      
      After just adding -gsplit-dwarf, both of these skeleton files:
        tools/lldb/test/SymbolFile/DWARF/Output/debug-types-expressions.test.tmp4dwo
        tools/lldb/test/SymbolFile/DWARF/Output/debug-types-expressions.test.tmp5dwo
      
      were referencing this one non-skeleton file:
        tools/lldb/test/SymbolFile/DWARF/debug-types-expressions.dwo
      
      Surprisingly, this does not affect the other test, debug-types-basic.test,
      probably because it compiles to .o and then links it, whereas
      debug-types-expressions.test compiles directly to an executable.
      
      So fixed that while keeping the direct executable compilation.
      
      Differential Revision: https://reviews.llvm.org/D76316
      3481062b
    • Simon Pilgrim's avatar
      [InstCombine][X86] Add additional demandedelts style test for in-range... · 24c2e613
      Simon Pilgrim authored
      [InstCombine][X86] Add additional demandedelts style test for in-range variable per-element shift amounts (PR40391)
      
      If we've shuffled the shift amount some of the (undemanded) elements may have become undef - this should be handled by the missing support in PR36319.
      24c2e613
    • Mehdi Amini's avatar
      Fix `warning: extra ‘;’` (NFC) · 4d506da9
      Mehdi Amini authored
      4d506da9
    • Mehdi Amini's avatar
      Fix build with gcc 7.5 by adding a "redundant move" · f3e297d9
      Mehdi Amini authored
      The constructor of Expected<T> expects a T&&, but gcc-7.5 does not
      infer an rvalue in this context apparently.
      f3e297d9
    • Roman Lebedev's avatar
      [NFCI][SCEV] Avoid recursion in SCEVExpander::isHighCostExpansion*() · 85334b03
      Roman Lebedev authored
      Summary:
      As noted in [[ https://bugs.llvm.org/show_bug.cgi?id=45201 | PR45201 ]],
      [[ https://bugs.llvm.org/show_bug.cgi?id=10090 | PR10090 ]] SCEV doesn't
      always avoid recursive algorithms, and that causes issues with
      large expression depths and/or smaller stack sizes.
      
      In `SCEVExpander::isHighCostExpansion*()` case, the refactoring to avoid
      recursion is rather idiomatic. We simply need to place the root expr
      into a vector, and iterate over vector elements accounting for the cost
      of each one, adding new exprs at the end of the vector,
      thus achieving recursion-less traversal.
      
      The order in which we will visit exprs doesn't matter here,
      so we will be fine with the most basic approach of using SmallVector
      and inserting/extracting from the back, which incidentally is the same
      depth-first traversal that we were doing previously recursively.
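
The worklist pattern described above can be sketched on a toy expression tree. This is not the `SCEVExpander` code: `Expr`, `isHighCost` and the budget parameter are hypothetical stand-ins.

```cpp
#include <vector>

// Hypothetical expression node: its own cost plus operand sub-expressions.
struct Expr {
  int cost;
  std::vector<const Expr *> operands;
};

// Returns true once the accumulated cost exceeds the budget.
// No recursion: pending expressions live in an explicit vector.
bool isHighCost(const Expr &root, int budget) {
  std::vector<const Expr *> worklist{&root};
  int remaining = budget;
  while (!worklist.empty()) {
    const Expr *e = worklist.back();  // extract from the back ...
    worklist.pop_back();
    remaining -= e->cost;
    if (remaining < 0)
      return true;                    // budget exhausted, stop early
    for (const Expr *op : e->operands)
      worklist.push_back(op);         // ... and insert at the back
  }
  return false;
}
```

Stack depth no longer grows with expression depth; only the heap-allocated worklist does.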
      
      Reviewers: mkazantsev, reames, wmi, ekatz
      
      Reviewed By: mkazantsev
      
      Subscribers: hiraditya, javed.absar, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D76273
      85334b03
    • Oliver Stannard's avatar
      [IPRA][ARM] Spill extra registers at -Oz · 73cea83a
      Oliver Stannard authored
      When optimising for code size at the expense of performance, it is often
      worth saving and restoring some of r0-r3, if IPRA will be able to take
      advantage of them. This doesn't cost any extra code size if we already
      have a PUSH/POP pair, and increases the number of available registers
      across any calls to the function.
      
      We already have an optimisation which tries to fold the subtract/add of the
      SP into the PUSH/POP by using extra registers, which somewhat conflicts
      with this. I've made the new optimisation less aggressive in cases where
      the existing one is likely to trigger, which gives better results than
      either of these optimisations by themselves.
      
      Differential revision: https://reviews.llvm.org/D69936
      73cea83a
    • Guillaume Chatelet's avatar
      [Alignment][NFC] Deprecate getMaxAlignment · d000655a
      Guillaume Chatelet authored
      Summary:
      This patch is part of a series to introduce an Alignment type.
      See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
      See this patch for the introduction of the type: https://reviews.llvm.org/D64790
      
      Reviewers: courbet
      
      Subscribers: jholewinski, arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D76348
      d000655a
    • Danila Malyutin's avatar
      2aaafaf5
    • Michael Liao's avatar
      [hip] Revise `GlobalDecl` constructors. NFC. · 4cf01ed7
      Michael Liao authored
      Summary:
      - https://reviews.llvm.org/D68578 revises the `GlobalDecl` constructors
        to ensure all GPU kernels have `ReferenceKenelKind` initialized
        properly with an explicit constructor and static one. But, there are
        lots of places using the implicit constructor triggering the assertion
        on non-GPU kernels. That's found in compilation of many tests and
        workloads.
      - Fixing all of them may change more code and, more importantly, all of
        them assume the default kernel reference kind. This patch changes
        that constructor to tell `CUDAGlobalAttr` and construct `GlobalDecl`
        properly.
      
      Reviewers: yaxunl
      
      Subscribers: cfe-commits
      
      Tags: #clang
      
      Differential Revision: https://reviews.llvm.org/D76344
      4cf01ed7
    • Oliver Stannard's avatar
      [ARM] Track epilogue instructions with FrameDestroy flag (NFC) · 6739805e
      Oliver Stannard authored
      Rather than trying to work out which instructions are part of the
      epilogue by examining them, we can just mark them with the FrameDestroy
      flag, like we do in the AArch64 backend.
      6739805e
    • Kazuaki Ishizaki's avatar
      [mlir] NFC: Fix trivial typos in documents · a8901a03
      Kazuaki Ishizaki authored
      Fix trivial typos
      
      Reviewers: mravishankar, antiagainst, ftynse
      
      Reviewed By: ftynse
      
      Subscribers: ftynse, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, Joonsoo, bader, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D76347
      a8901a03
    • Med Ismail Bennani's avatar
      [lldb/Target] Support more than 2 symbols in StackFrameRecognizer · db31e2e1
      Med Ismail Bennani authored
      This patch changes the way the StackFrame Recognizers match a certain
      frame.
      
      Until now, recognizers could be registered with a function
      name but also an alternate symbol.
      This change is motivated by a test failure for the Assert frame
      recognizer on Linux. Depending on the version of the libc, the abort
      function (triggered by an assertion), could have more than two
      signatures (i.e. `raise`, `__GI_raise` and `gsignal`).
      
      Instead of only checking the default symbol name and the alternate one,
      lldb will iterate over a list of symbols to match against.
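
Matching against a list instead of a name plus one alternate can be sketched as follows. This is not lldb's API: `matchesAnySymbol` is a hypothetical helper, and the symbol names are the ones quoted above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: does the frame's symbol appear in the registered list?
bool matchesAnySymbol(const std::string &frameSymbol,
                      const std::vector<std::string> &symbols) {
  return std::find(symbols.begin(), symbols.end(), frameSymbol) !=
         symbols.end();
}
```

A recognizer registered with `{"raise", "__GI_raise", "gsignal"}` would then match whichever name the installed libc uses.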
      
      rdar://60386577
      
      Differential Revision: https://reviews.llvm.org/D76188
      
      
      
      Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>
      db31e2e1
    • Alexey Bataev's avatar
      [OPENMP50]Codegen for detach clause. · b09cce07
      Alexey Bataev authored
      Implemented codegen for detach clause in task directives.
      b09cce07
    • Francesco Petrogalli's avatar
      [llvm][SVE] Addressing mode for FF/NF loads. · 9bdcd9bf
      Francesco Petrogalli authored
      Summary:
      This patch adds addressing mode computation for the following SVE
      instructions:
      
      * ldff1{s}<T1> { <Zt>.<T2> }, <Pg>/Z, [<Xn|SP>{, <Xm>{, lsl #imm}}]
      * ldnf1{s}<T1> { <Zt>.<T2> }, <Pg>/Z, [<Xn|SP>{, #<imm>, mul vl}]
      
      Reviewers: andwar, sdesmalen, rengolin, efriedma
      
      Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D76209
      9bdcd9bf
    • Sander de Smalen's avatar
      [AArch64][SVE] Change pointer type of nontemporal load/store intrinsics · 4788ca45
      Sander de Smalen authored
      Summary:
      This fixes a discrepancy between the non-temporal loads/store
      intrinsics and other SVE load intrinsics (such as nf/ff), so
      that Clang can use the same code to generate these intrinsics.
      
      Reviewers: andwar, kmclaughlin, rengolin, efriedma
      
      Reviewed By: efriedma
      
      Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D76237
      4788ca45
    • Danila Malyutin's avatar
      Fix possible assertion when using PBQP with debug info · 940ba146
      Danila Malyutin authored
      Skip debug instructions before calling functions not expecting them.
      In particular, LIS.getInstructionIndex(*mi) would fail if mi was a debug instr.
      
      Differential Revision: https://reviews.llvm.org/D76129
      940ba146
    • David Stenberg's avatar
      [DebugInfo] Fix multi-byte entry values in call site values · a0a3a9c5
      David Stenberg authored
      Summary:
      In D67768/D67492 I added support for entry values having blocks larger
      than one byte, but I now noticed that the DIE implementation I added there
      was broken. The takeNodes() function, that moves the entry value block
      from a temporary buffer to the output buffer, would destroy the input
      iterator when transferring the first node, meaning that only that node
      was moved.
      
      In practice, this meant that when emitting a call site value using a
      DW_OP_entry_value operation with a DWARF register number larger than 31,
      that multi-byte DW_OP_regx expression would be truncated.
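
A register number above 31 yields a multi-byte expression because DWARF only has single-byte opcodes for registers 0 to 31 (`DW_OP_reg0` through `DW_OP_reg31`); anything larger needs `DW_OP_regx` followed by a ULEB128 operand. The sketch below illustrates that encoding only. It is not the LLVM DIE code, and `encodeDwarfReg` is a hypothetical name.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical encoder for a DWARF register location operation.
std::vector<uint8_t> encodeDwarfReg(uint32_t reg) {
  const uint8_t DW_OP_reg0 = 0x50;  // DW_OP_reg0..DW_OP_reg31: one byte each
  const uint8_t DW_OP_regx = 0x90;  // followed by a ULEB128 register number
  if (reg <= 31)
    return {uint8_t(DW_OP_reg0 + reg)};
  std::vector<uint8_t> out{DW_OP_regx};
  do {                              // ULEB128: 7 bits per byte, low bits first
    uint8_t byte = reg & 0x7f;
    reg >>= 7;
    if (reg != 0)
      byte |= 0x80;                 // continuation bit: more bytes follow
    out.push_back(byte);
  } while (reg != 0);
  return out;
}
```

If only the first node of such an expression is moved, the opcode byte survives but its operand is lost, which matches the truncation described above.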
      
      Reviewers: djtodoro, aprantl, vsk
      
      Reviewed By: djtodoro
      
      Subscribers: llvm-commits
      
      Tags: #debug-info, #llvm
      
      Differential Revision: https://reviews.llvm.org/D76279
      a0a3a9c5
    • Simon Pilgrim's avatar
      [InstCombine][X86] simplifyX86varShift - convert variable in-range per-element... · f4e495a1
      Simon Pilgrim authored
      [InstCombine][X86] simplifyX86varShift - convert variable in-range per-element shift amounts to generic shifts (PR40391)
      
      AVX2/AVX512 per-element shifts can be replaced with generic shifts if the shift amounts are guaranteed to be in-range (upper bits are known zero).
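
The in-range condition matters because the x86 per-element logical shifts yield zero for an out-of-range amount, whereas a generic IR `shl` by the bit width or more is poison. Below is a scalar sketch of one 32-bit lane, with hypothetical function names and a known-zero bitmask standing in for the compiler's known-bits analysis.

```cpp
#include <cassert>
#include <cstdint>

// Model of one 32-bit lane of an x86 variable logical left shift:
// out-of-range amounts produce zero.
uint32_t x86VarShlLane(uint32_t value, uint32_t amount) {
  return amount < 32 ? value << amount : 0u;
}

// Model of a generic shift: only defined for in-range amounts.
uint32_t genericShlLane(uint32_t value, uint32_t amount) {
  assert(amount < 32 && "out-of-range generic shift is not defined");
  return value << amount;
}

// Hypothetical check: the amount is in range if every bit above the low
// five is known to be zero.
bool amountKnownInRange(uint32_t knownZeroMask) {
  return (knownZeroMask & ~31u) == ~31u;
}
```

Whenever `amountKnownInRange` holds, the two lane models agree for every value, so the replacement is safe.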
      f4e495a1
    • Sander de Smalen's avatar
      Reland D75470 [SVE] Auto-generate builtins and header for svld1. · c5b81466
      Sander de Smalen authored
      Reworked the patch to avoid sharing a header (SVETypeFlags.h) between
      include/clang/Basic and utils/TableGen/SveEmitter.cpp. Now the patch
      generates the enum/flags which is included in TargetBuiltins.h.
      
      Also renamed one of the SveEmitter options to be in line with MVE.
      
      Summary:
      
      This is a first patch in a series for the SveEmitter to generate the arm_sve.h
      header file and builtins.
      
      I've tried to strip down this patch as best I could, but there
      are still a few changes that are not necessarily exercised by the load intrinsics
      in this patch, mostly around the SVEType class which has some common logic to
      represent types from a type and prototype string. I thought it didn't make
      much sense to remove that from this patch and split it up.
      c5b81466
    • Simon Tatham's avatar
      [ARM,MVE] Add intrinsics for the VQDMLAH family. · 928776de
      Simon Tatham authored
      Summary:
      These are complicated integer multiply+add instructions with extra
      saturation, taking the high half of a double-width product, and
      optional rounding. There's no sensible way to represent that in
      standard IR, so I've converted the clang builtins directly to
      target-specific intrinsics.
      
      Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
      
      Reviewed By: miyuki
      
      Subscribers: kristof.beyls, hiraditya, cfe-commits
      
      Tags: #clang
      
      Differential Revision: https://reviews.llvm.org/D76123
      928776de
    • Simon Tatham's avatar
      [ARM,MVE] Add intrinsics and isel for MVE integer VMLA. · 28c5d97b
      Simon Tatham authored
      Summary:
      These instructions compute multiply+add in integers, with one of the
      operands being a splat of a scalar. (VMLA and VMLAS differ in whether
      the splat operand is a multiplier or the addend.)
      
      I've represented these in IR using existing standard IR operations for
      the unpredicated forms. The predicated forms are done with target-
      specific intrinsics, as usual.
      
      When operating on n-bit vector lanes, only the bottom n bits of the
      i32 scalar operand are used. So we have to tell that to isel lowering,
      to allow it to remove a pointless sign- or zero-extension instruction
      on that input register. That's done in `PerformIntrinsicCombine`, but
      first I had to enable `PerformIntrinsicCombine` for MVE targets
      (previously all the intrinsics it handled were for NEON), and make it
      a method of `ARMTargetLowering` so that it can get at
      `SimplifyDemandedBits`.
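
The claim that only the bottom n bits of the scalar matter follows from the lane arithmetic being modulo 2^n. Here is a scalar sketch for 8-bit lanes; the name `vmlaLane8` is hypothetical, and it models the variant where the splat is the multiplier.

```cpp
#include <cstdint>

// Hypothetical model of one 8-bit lane: acc + x * scalar, truncated to
// 8 bits. The arithmetic is done in unsigned 32-bit, so it wraps.
uint8_t vmlaLane8(uint8_t acc, uint8_t x, uint32_t scalar) {
  return static_cast<uint8_t>(acc + x * scalar);
}
```

Passing 0x80 zero-extended (0x00000080) or sign-extended (0xFFFFFF80) gives the same lane result, which is why the extension instruction feeding the scalar can be dropped.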
      
      Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
      
      Reviewed By: dmgreen
      
      Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits
      
      Tags: #clang
      
      Differential Revision: https://reviews.llvm.org/D76122
      28c5d97b
    • serge-sans-paille's avatar
      Fix ac1d23ed interaction with gold plugin · 8d019cda
      serge-sans-paille authored
      Correctly set RelocationModel, thanks @modocache for spotting this.
      
      Related to differential revision: https://reviews.llvm.org/D75579
      8d019cda
    • Simon Pilgrim's avatar
      [InstCombine][X86] Tests for variable but in-range per-element shift amounts (PR40391) · cda2b076
      Simon Pilgrim authored
      These shifts are masked to be in-range so we should be able to replace them with generic shifts.
      cda2b076