  1. Sep 21, 2021
    • Jonas Paulsson's avatar
      [SystemZ] Emit EXRL target instructions before text section is ended. · a48b43f9
      Jonas Paulsson authored
      SystemZ adds the EXRL target instructions at the end of each file. This must
      be done before debug info emission since that may end the text section, and
      therefore this is now done in emitConstantPools() (instead of in
      emitEndOfAsmFile).
      
      Review: Ulrich Weigand
      
      Differential Revision: https://reviews.llvm.org/D109513
      a48b43f9
    • Nicholas Guy's avatar
      [AArch64] Improve schedule modelling on the Cortex-A55 · 9e4d7267
      Nicholas Guy authored
      Enables the FuseAddress feature in the Cortex-A55 scheduling model
      
      Differential Revision: https://reviews.llvm.org/D109323
      9e4d7267
    • Simon Pilgrim's avatar
      [InstCombine] foldConstantInsEltIntoShuffle - bail if we fail to find constant element (PR51824) · fc8f1e44
      Simon Pilgrim authored
      If getAggregateElement() returns null for any element, early out as otherwise we will assert when creating a new constant vector
      
      Fixes PR51824 + OSS-Fuzz: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38057
      fc8f1e44
    • Simon Pilgrim's avatar
      [CodeGen] SelectionDAGBuilder - Use const-ref iterator in for-range loops. NFCI. · 20b58855
      Simon Pilgrim authored
      Avoid unnecessary copies, reported by MSVC static analyzer.
      20b58855
    • Simon Pilgrim's avatar
      RewriteStatepointsForGC - Use const-ref iterator in for-range loops. NFCI. · f5d23d36
      Simon Pilgrim authored
      Avoid unnecessary copies, reported by MSVC static analyzer.
      f5d23d36
    • Simon Pilgrim's avatar
      [CodeGen] SDDbgValue::getSDNodes() - use const-ref to avoid unnecessary copies. NFCI. · 0f83456c
      Simon Pilgrim authored
      Reported by MSVC static analyzer.
      0f83456c
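
      The three NFCI commits above all replace by-value loop variables with const references. The sketch below is not taken from those patches; it uses a hypothetical `Tracked` type that counts its own copy constructions to show why the by-value form costs one copy per iteration.

      ```c++
      #include <cassert>
      #include <cstddef>
      #include <string>
      #include <utility>
      #include <vector>

      // Hypothetical element type that counts its own copy constructions.
      struct Tracked {
        std::string Name;
        explicit Tracked(std::string N) : Name(std::move(N)) {}
        Tracked(const Tracked &Other) : Name(Other.Name) { ++copyCount(); }
        static int &copyCount() {
          static int Count = 0;
          return Count;
        }
      };

      // By-value loop variable: each iteration copy-constructs one element.
      int copiesWithByValueLoop(const std::vector<Tracked> &Items) {
        Tracked::copyCount() = 0;
        std::size_t TotalLength = 0;
        for (Tracked Item : Items)
          TotalLength += Item.Name.size();
        (void)TotalLength;
        return Tracked::copyCount();
      }

      // Const-reference loop variable: elements are read in place, no copies.
      int copiesWithConstRefLoop(const std::vector<Tracked> &Items) {
        Tracked::copyCount() = 0;
        std::size_t TotalLength = 0;
        for (const Tracked &Item : Items)
          TotalLength += Item.Name.size();
        (void)TotalLength;
        return Tracked::copyCount();
      }
      ```

      For a vector of N elements, the first function reports N copies and the second reports none.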
    • Dmitry Vyukov's avatar
      tsan: simplify thread context setting · 9d7b7350
      Dmitry Vyukov authored
      Currently we set thr->tctx after OnStarted callback
      taking thread registry mutex again and searching for the context.
      But OnStarted already runs under the thread registry mutex
      and has access to the context, so set it in the OnStarted.
      This makes code simpler and faster.
      
      Depends on D110132.
      
      Reviewed By: melver
      
      Differential Revision: https://reviews.llvm.org/D110133
      9d7b7350
    • Dmitry Vyukov's avatar
      tsan: rearrange thread state callbacks (NFC) · 908256b0
      Dmitry Vyukov authored
      Thread state functions are split into 2 parts:
      tsan entry function (e.g. ThreadStart) and thread registry
      state change callback (e.g. OnStart). Currently these
      pairs of functions are located far from each other and
      in reverse order. This makes it hard to read and follow the logic.
      Reorder the code so that OnFoo directly follows ThreadFoo.
      No other code changes.
      
      Reviewed By: melver
      
      Differential Revision: https://reviews.llvm.org/D110132
      908256b0
    • Dmitry Vyukov's avatar
      tsan: fix debug format strings · 6fe35ef4
      Dmitry Vyukov authored
      Some of the DPrintf's currently produce -Wformat warnings if enabled.
      Fix these format strings.
      
      Reviewed By: melver
      
      Differential Revision: https://reviews.llvm.org/D110131
      6fe35ef4
    • Jay Foad's avatar
      [AMDGPU] Prefer fmac over fma when selecting FMA_W_CHAIN · 598bebea
      Jay Foad authored
      FMA_W_CHAIN is used when lowering fdiv f32. Prefer to select it to fmac
      if there are no source modifiers, just like we do for other mad/mac and
      fma/fmac cases.
      
      Differential Revision: https://reviews.llvm.org/D110074
      598bebea
    • Jay Foad's avatar
      [AMDGPU] Prefer v_fmac over v_fma only when no source modifiers are used · 86dcb592
      Jay Foad authored
      v_fmac with source modifiers forces VOP3 encoding, but it is strictly
      better to use the VOP3-only v_fma instead, because $dst and $src2 are
      not tied so it gives the register allocator more freedom and avoids a
      copy in some cases.
      
      This is the same strategy we already use for v_mad vs v_mac and
      v_fma_legacy vs v_fmac_legacy.
      
      Differential Revision: https://reviews.llvm.org/D110070
      86dcb592
    • Max Kazantsev's avatar
      [SCEV] Use isAvailableAtLoopEntry in the asserts · cd166fb2
      Max Kazantsev authored
      This is what is supposed to be there.
      cd166fb2
    • Petar Avramovic's avatar
      GlobalISel/Utils: Refactor constant splat match functions · 8bc71856
      Petar Avramovic authored
      Add generic helper function that matches constant splat. It has option to
      match constant splat with undef (some elements can be undef but not all).
      Add util function and matcher for G_FCONSTANT splat.
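
      As a rough illustration of "constant splat with undef", the sketch below works on a plain vector of lanes rather than on GlobalISel build-vector instructions. `Lane` and `matchConstantSplat` are hypothetical names chosen for this example and do not reflect the signatures added by the patch.

      ```c++
      #include <cassert>
      #include <vector>

      // Hypothetical lane: Known == false models an undef element.
      struct Lane {
        bool Known;
        long Value;
      };

      // Returns true and sets Splat when every defined lane holds the same
      // constant. With AllowUndef, undef lanes are skipped, but at least one
      // lane must be defined ("some elements can be undef but not all").
      bool matchConstantSplat(const std::vector<Lane> &Lanes, bool AllowUndef,
                              long &Splat) {
        bool Found = false;
        long Candidate = 0;
        for (const Lane &L : Lanes) {
          if (!L.Known) {
            if (!AllowUndef)
              return false;
            continue;
          }
          if (Found && L.Value != Candidate)
            return false;
          Candidate = L.Value;
          Found = true;
        }
        if (Found)
          Splat = Candidate;
        return Found;
      }
      ```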
      
      Differential Revision: https://reviews.llvm.org/D104410
      8bc71856
    • Max Kazantsev's avatar
      [SCEV] Add some asserts on availability of arguments of isLoopEntryGuardedByCond · 4d5d7254
      Max Kazantsev authored
      The logic in howManyLessThans is fishy. It first checks invariance of
      RHS, and then uses OrigRHS as argument for isLoopEntryGuardedByCond, which
      is, strictly speaking, a different thing. We are seeing a very rare intermittent
      failure of availability checks, and it looks like this precondition is
      sometimes broken. Until we can figure out what's going on, add asserts
      that all involved values that may possibly go to isLoopEntryGuardedByCond
      are available at loop entry.
      
      If either of these asserts fails (OrigRHS is the most likely suspect), it
      means that the logic here is flawed.
      4d5d7254
    • David Stenberg's avatar
      [LowerConstantIntrinsics] Fix heap-use-after-free bug in worklist · 7b4cc09b
      David Stenberg authored
      This fixes PR51730, a heap-use-after-free bug in
      replaceConditionalBranchesOnConstant().
      
      With the attached reproducer we were left with a function looking
      something like this after replaceAndRecursivelySimplify():
      
        [...]
      
        cont2.i:
          br i1 %.not1.i, label %handler.type_mismatch3.i, label %cont4.i
      
        handler.type_mismatch3.i:
          %3 = phi i1 [ %2, %cont2.thread.i ], [ false, %cont2.i ]
          unreachable
      
        cont4.i:
          unreachable
      
        [...]
      
      with both the branch instruction and PHI node being in the worklist. As
      a result of replacing the branch instruction with an unconditional
      branch, the PHI node in %handler.type_mismatch3.i would be removed. This
      then resulted in a heap-use-after-free bug due to accessing that removed
      PHI node in the next worklist iteration.
      
      This is solved by using a value handle worklist. I am unsure if this
      is the most idiomatic solution. Another solution could have been to
      produce a worklist just containing the interesting branch instructions,
      but I thought that it perhaps was a bit cleaner to keep all worklist
      filtering in the loop that does the rewrites.
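
      The failure mode and the fix can be sketched outside LLVM. The example below is an analogy only: it uses `std::weak_ptr` where the patch uses LLVM value handles, and `Node` and `runWorklist` are hypothetical names. Visiting the branch deletes the PHI, and the handle lets the next iteration notice that instead of touching freed memory.

      ```c++
      #include <cassert>
      #include <memory>
      #include <string>
      #include <utility>
      #include <vector>

      // Hypothetical stand-in for an IR instruction; not llvm::Instruction.
      struct Node {
        std::string Name;
        explicit Node(std::string N) : Name(std::move(N)) {}
      };

      // Runs a two-entry worklist where rewriting "br" deletes "phi".
      // Returns the names of the nodes that were actually visited.
      std::vector<std::string> runWorklist() {
        std::vector<std::shared_ptr<Node>> Owner;
        Owner.push_back(std::make_shared<Node>("br"));
        Owner.push_back(std::make_shared<Node>("phi"));

        // The worklist holds weak handles instead of raw pointers.
        std::vector<std::weak_ptr<Node>> Worklist;
        for (const std::shared_ptr<Node> &N : Owner)
          Worklist.push_back(N);

        std::vector<std::string> Visited;
        for (const std::weak_ptr<Node> &Handle : Worklist) {
          std::shared_ptr<Node> Live = Handle.lock();
          if (!Live)
            continue; // Deleted by an earlier rewrite; skip it safely.
          Visited.push_back(Live->Name);
          if (Live->Name == "br")
            Owner.pop_back(); // Rewriting the branch destroys "phi".
        }
        return Visited;
      }
      ```

      With raw pointers in the worklist, the second iteration would dereference the destroyed node; with handles, only "br" is visited.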
      
      Reviewed By: lebedev.ri
      
      Differential Revision: https://reviews.llvm.org/D109221
      7b4cc09b
    • Justas Janickas's avatar
      [OpenCL] Test case for C++ for OpenCL 2021 in OpenCL C header test · 57b8b5c1
      Justas Janickas authored
      RUN line representing C++ for OpenCL 2021 added to the test. This
      should have been done as part of earlier commit fb321c2e but
      was missed during rebasing.
      
      Differential Revision: https://reviews.llvm.org/D109492
      57b8b5c1
    • Uday Bondhugula's avatar
      [MLIR] NFC. gpu.launch op argument const folder cleanup · 5c77ed03
      Uday Bondhugula authored
      NFC updates to gpu.launch op argument const folder.
      
      Differential Revision: https://reviews.llvm.org/D110136
      5c77ed03
    • Andrzej Warzynski's avatar
      [flang][docs] Document plugin limitations · 7e7484a8
      Andrzej Warzynski authored
      This was extracted from the discussion on
      https://reviews.llvm.org/D108283.
      
      Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>
      
      Differential Revision: https://reviews.llvm.org/D109871
      7e7484a8
    • Sylvestre Ledru's avatar
      Add CMAKE_BUILD_TYPE to the list of BOOTSTRAP_DEFAULT_PASSTHROUGH variables · eccd477c
      Sylvestre Ledru authored
      When building clang in stage2, when -DCMAKE_BUILD_TYPE=RelWithDebInfo is set,
      the developer can expect that the stage2 clang is built using the same mode.
      Especially as performance is much worse in debug mode.
      (Principle of least astonishment)
      
      Differential Revision: https://reviews.llvm.org/D53014
      eccd477c
    • Cullen Rhodes's avatar
      [PowerPC] NFC: Remove unused tblgen template args · b23d22f7
      Cullen Rhodes authored
      Identified in D109359.
      
      Reviewed By: nemanjai
      
      Differential Revision: https://reviews.llvm.org/D109715
      b23d22f7
    • Morten Borup Petersen's avatar
      [MLIR][SCF] Add for-to-while loop transformation pass · 032cb165
      Morten Borup Petersen authored
      This pass transforms SCF.ForOp operations to SCF.WhileOp. The For loop condition is placed in the 'before' region of the while operation, and induction variable incrementation + the loop body in the 'after' region. The loop carried values of the while op are the induction variable (IV) of the for-loop + any iter_args specified for the for-loop.
      Any 'yield' ops in the for-loop are rewritten to additionally yield the (incremented) induction variable.
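
      The shape of the rewrite can be sketched in plain C++ rather than MLIR. This is an analogy under simplifying assumptions (a positive step and a single iter_arg, here `Sum`), and the function names are made up for the example: the condition is evaluated on its own first, and the body plus IV increment then produce the next carried values.

      ```c++
      #include <cassert>

      // Reference form: a counted loop with one loop-carried value, Sum.
      long sumWithFor(long Lb, long Ub, long Step) {
        assert(Step > 0 && "sketch assumes a positive step");
        long Sum = 0;
        for (long Iv = Lb; Iv < Ub; Iv += Step)
          Sum += Iv;
        return Sum;
      }

      // While form: the loop-carried values are (Iv, Sum).
      long sumWithWhile(long Lb, long Ub, long Step) {
        assert(Step > 0 && "sketch assumes a positive step");
        long Iv = Lb;
        long Sum = 0;
        while (true) {
          // 'before' region: only the loop condition is computed here.
          bool KeepGoing = Iv < Ub;
          if (!KeepGoing)
            break;
          // 'after' region: loop body plus IV increment, then "yield" both.
          long NextSum = Sum + Iv;
          long NextIv = Iv + Step;
          Iv = NextIv;
          Sum = NextSum;
        }
        return Sum;
      }
      ```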
      
      This transformation is useful for passes where we want to consider structured control flow solely on the basis of a loop body and the computation of a loop condition. As an example, when doing high-level synthesis in CIRCT, the incrementation of an IV in a for-loop is "just another part" of a circuit datapath, and what we really care about is the distinction between our datapath and our control logic (the condition variable).
      
      Differential Revision: https://reviews.llvm.org/D108454
      032cb165
    • Pavel Labath's avatar
      [lldb] Speculative fix to TestGuiExpandThreadsTree · 791b6ebc
      Pavel Labath authored
      This test relies on being able to unwind from an arbitrary place inside
      libc. While I am not sure this is the cause of the observed flakiness,
      it is known that we are not able to unwind correctly from some places in
      (linux) libc.
      
      This patch adds additional synchronization to ensure that the inferior
      is in the main function (instead of pthread guts) when lldb tries to
      unwind it. At the very least, it should make the test runs more
      predictable/repeatable.
      791b6ebc
    • Kunwar Shaanjeet Singh Grover's avatar
      [MLIR] Add mergeLocalIds and mergeSymbolIds · 0d12c991
      Kunwar Shaanjeet Singh Grover authored
      This patch adds mergeLocalIds and mergeSymbolIds as public functions
      for FlatAffineConstraints and FlatAffineValueConstraints respectively.
      
      mergeLocalIds is also required to support divisions in intersection,
      subtraction, equality checks, and complement for PresburgerSet.
      
      This patch is part of a series of patches aimed at generalizing affine
      dependence analysis.
      
      Reviewed By: bondhugula
      
      Differential Revision: https://reviews.llvm.org/D110045
      0d12c991
    • Nathan Ridge's avatar
      [clangd] Deduplicate inlay hints · d87d1aa0
      Nathan Ridge authored
      Duplicates can sometimes appear due to e.g. explicit template
      instantiations
      
      Differential Revision: https://reviews.llvm.org/D110051
      d87d1aa0
    • Amara Emerson's avatar
      [GlobalISel][Legalizer] Use ArtifactValueFinder first for unmerge combines before trying others. · cc65e08f
      Amara Emerson authored
      This is motivated by a pathological compile time issue during unmerge combining.
      
      We should be able to use the AVF to do simplification. However AMDGPU
      has a lot of codegen changes which I'm not sure how to evaluate.
      
      Differential Revision: https://reviews.llvm.org/D109748
      cc65e08f
    • Evgeniy Brevnov's avatar
      [DSE][NFC] Rename Later->Killing, Earlier->Dead · 129cf336
      Evgeniy Brevnov authored
      First (and biggest) change is to use "Killing/Dead" in place of "Later/Earlier" base for names in DSE. For example, [Maybe]DeadLoc - is a location killed by KillingI instruction. I believe such names are more descriptive and easy to understand than current ones.
      
      Second, there are inconsistencies in naming where different names are used for the same thing. Fixed that too.
      
      Third, reordered parameters of isPartialOverwrite, tryToMergePartialOverlappingStores, isOverwrite to make them consistent between each other. This greatly reduces potential mistakes.
      
      Reviewed By: fhahn
      
      Differential Revision: https://reviews.llvm.org/D106947
      129cf336
    • Amara Emerson's avatar
      [GlobalISel][Legalizer] Don't use eraseFromParentAndMarkDBGValuesForRemoval() for some artifacts. · 7091a7f7
      Amara Emerson authored
      For artifacts excluding G_TRUNC/G_SEXT, which have IR counterparts, we don't
      seem to have debug users of defs. However, in the legalizer we're always calling
      MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval() which is expensive.
      In some rare cases, this contributes significantly to unreasonably long compile
      times when we have lots of artifact combiner activity.
      
      To verify this, I added asserts to that function when it actually replaced a debug
      use operand with undef for these artifacts. On CTMark with both -O0 and -Os and
      debug info enabled, I didn't see a single case where it triggered.
      
      In my measurements I saw around a 0.5% geomean compile-time improvement on -g -O0
      for AArch64 with this change.
      
      Differential Revision: https://reviews.llvm.org/D109750
      7091a7f7
    • Max Kazantsev's avatar
      [SCEV] Generalize implication when signedness of FoundPred doesn't matter · 2c7d5fbc
      Max Kazantsev authored
      The implication logic for two values that are both negative or non-negative
      says that it doesn't matter whether their predicate is signed or unsigned,
      but only flips unsigned into signed for further inference. This patch adds
      support for flipping a signed predicate into unsigned as well.
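
      The underlying fact can be checked exhaustively on 8-bit values. The sketch below is independent of SCEV and uses made-up function names: whenever two values have the same sign, signed and unsigned "less than" give the same answer, so a predicate can be flipped in either direction.

      ```c++
      #include <cassert>
      #include <cstdint>

      // Returns true when signed and unsigned "less than" agree for every
      // pair of 8-bit values whose sign bits match.
      bool signednessIrrelevantWhenSignsMatch() {
        for (int A = -128; A <= 127; ++A) {
          for (int B = -128; B <= 127; ++B) {
            bool SameSign = (A < 0) == (B < 0);
            if (!SameSign)
              continue;
            bool SignedLess = A < B;
            bool UnsignedLess =
                static_cast<std::uint8_t>(A) < static_cast<std::uint8_t>(B);
            if (SignedLess != UnsignedLess)
              return false;
          }
        }
        return true;
      }

      // Counter-check: returns true when the two predicates disagree.
      bool predicatesDisagree(std::int8_t A, std::int8_t B) {
        bool SignedLess = A < B;
        bool UnsignedLess =
            static_cast<std::uint8_t>(A) < static_cast<std::uint8_t>(B);
        return SignedLess != UnsignedLess;
      }
      ```

      With mixed signs the predicates can disagree, for example for -1 and 1, which is why the sign precondition matters.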
      
      Differential Revision: https://reviews.llvm.org/D109959
      Reviewed By: nikic
      2c7d5fbc
    • Yonghong Song's avatar
      BPF: make 32bit register spill with 64bit alignment · ea72b031
      Yonghong Song authored
      In llvm, for non-alu32 mode, the stack alignment is 64bit so only one
      64bit spill per 64bit slot. For alu32 mode, the stack alignment
      is 32bit, so it is possible to have two 32bit spills per
      64bit slot.
      
      Currently, bpf kernel verifier does not preserve register states
      for 32bit spills. That is, one 32bit register may hold a constant
      value or a bounded range before spill. After reload from the
      stack, the information is lost and sometimes this may cause
      verifier failure. For 64bit register spill, the verifier
      indeed tries to preserve the register state for reloading.
      
      The current verifier can be modestly changed to handle one
      32bit spill per 64bit stack slot with state-preserving reload.
      Handling two 32bit spills per 64bit stack slot will require
      substantial changes.
      
      This patch changes stack alignment for alu32 to be 64bit.
      This way, for any 64bit slot in alu32 mode, only one
      32bit or 64bit register value can be saved. Together
      with the previously mentioned verifier enhancement, 32bit
      spills can be handled with state preservation.
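
      The effect of the alignment change can be sketched with a toy layout model. This is not LLVM's frame lowering: `alignUp` and `maxSpillsPerSlot` are hypothetical helpers, spills are simply placed back to back, and each spill is aligned to the larger of its size and the stack alignment.

      ```c++
      #include <algorithm>
      #include <cassert>
      #include <cstddef>
      #include <vector>

      // Rounds Offset up to the next multiple of Align (a power of two).
      std::size_t alignUp(std::size_t Offset, std::size_t Align) {
        assert(Align != 0 && (Align & (Align - 1)) == 0);
        return (Offset + Align - 1) & ~(Align - 1);
      }

      // Places spills of the given byte sizes one after another and returns
      // the largest number of spills that share a single 8-byte stack slot.
      std::size_t maxSpillsPerSlot(const std::vector<std::size_t> &Sizes,
                                   std::size_t StackAlign) {
        std::vector<std::size_t> PerSlot;
        std::size_t Offset = 0;
        std::size_t Max = 0;
        for (std::size_t Size : Sizes) {
          Offset = alignUp(Offset, std::max(Size, StackAlign));
          std::size_t Slot = Offset / 8;
          if (PerSlot.size() <= Slot)
            PerSlot.resize(Slot + 1, 0);
          Max = std::max(Max, ++PerSlot[Slot]);
          Offset += Size;
        }
        return Max;
      }
      ```

      In this model, spill sizes {4, 4, 8, 4} share a slot under 4-byte alignment, while 8-byte alignment leaves at most one spill per slot.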
      
      Note that llvm stack slot coalescing
      seems to do only adjacent packing, which may leave some holes
      in the stack. For example,
         stack slot 8   <== 8 bytes
         stack slot 4   <== 8 bytes with 4 byte hole
         stack slot 8   <== 8 bytes
         stack slot 4   <== 4 bytes
      
      Differential Revision: https://reviews.llvm.org/D109073
      ea72b031
    • Chris Lattner's avatar
      [OpAsmParser] Add a parseCommaSeparatedList helper and beef up Delimeter. · 58abc8c3
      Chris Lattner authored
      Lots of custom ops have hand-rolled comma-delimited parsing loops, as does
      the MLIR parser itself.  Provides a standard interface for doing this that
      is less error prone and less boilerplate.
      
      While here, extend Delimiter to support <> and {} delimited sequences as
      well (I have a use for <> in CIRCT specifically).
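
      To show the kind of boilerplate such a helper removes, here is a self-contained sketch over a plain string. `ListParser` and `parseIntList` are hypothetical and only borrow the idea; they are not the OpAsmParser interface, whose actual signature should be taken from the patch.

      ```c++
      #include <cassert>
      #include <cctype>
      #include <cstddef>
      #include <functional>
      #include <string>
      #include <utility>
      #include <vector>

      // Hypothetical mini-parser; only the list-parsing shape mirrors the commit.
      class ListParser {
      public:
        explicit ListParser(std::string Input) : Text(std::move(Input)) {}

        // Consumes Expected, after skipping spaces, if it is the next character.
        bool consumeIf(char Expected) {
          skipSpaces();
          if (Pos < Text.size() && Text[Pos] == Expected) {
            ++Pos;
            return true;
          }
          return false;
        }

        // Parses an unsigned decimal integer into Result.
        bool parseInteger(int &Result) {
          skipSpaces();
          if (Pos >= Text.size() || !isDigit(Text[Pos]))
            return false;
          Result = 0;
          while (Pos < Text.size() && isDigit(Text[Pos]))
            Result = Result * 10 + (Text[Pos++] - '0');
          return true;
        }

        // Parses Open, zero or more comma-separated elements, then Close.
        // ParseElement parses a single element and returns false on error.
        bool parseCommaSeparatedList(char Open, char Close,
                                     const std::function<bool()> &ParseElement) {
          if (!consumeIf(Open))
            return false;
          if (consumeIf(Close))
            return true; // Empty list.
          do {
            if (!ParseElement())
              return false;
          } while (consumeIf(','));
          return consumeIf(Close);
        }

      private:
        static bool isDigit(char C) {
          return std::isdigit(static_cast<unsigned char>(C)) != 0;
        }
        void skipSpaces() {
          while (Pos < Text.size() &&
                 std::isspace(static_cast<unsigned char>(Text[Pos])))
            ++Pos;
        }
        std::string Text;
        std::size_t Pos = 0;
      };

      // Parses a delimited integer list such as "<1, 2, 3>" into Out.
      bool parseIntList(const std::string &Input, char Open, char Close,
                        std::vector<int> &Out) {
        ListParser P(Input);
        return P.parseCommaSeparatedList(Open, Close, [&]() -> bool {
          int Value = 0;
          if (!P.parseInteger(Value))
            return false;
          Out.push_back(Value);
          return true;
        });
      }
      ```

      The caller supplies only the per-element callback; the comma and delimiter handling lives in one place.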
      
      Differential Revision: https://reviews.llvm.org/D110122
      58abc8c3
    • Max Kazantsev's avatar
      [SimplifyCFG] Redirect switch cases that lead to UB into an unreachable block · 073b254c
      Max Kazantsev authored
      When following a case of a switch instruction is guaranteed to lead to
      UB, we can safely break these edges and redirect those cases into a newly
      created unreachable block. As a result, the CFG becomes simpler and we can
      remove some of the PHI inputs to make further analyses easier.
      
      Patch by Dmitry Bakunevich!
      
      Differential Revision: https://reviews.llvm.org/D109428
      Reviewed By: lebedev.ri
      073b254c
    • Michael Kruse's avatar
      [Polly] Don't generate inter-iteration noalias metadata. · cad9f98a
      Michael Kruse authored
      This metadata was intended to mark all accesses within an iteration to be pairwise non-aliasing, in this case because every memory of a base pointer is touched (read or write) at most once. This is typical for 'sweeps' over all data. The stated motivation from D30606 is to ensure that unrolled iterations are considered non-aliasing.
      
      The implementation had multiple issues:
      
       * The structure of the noalias metadata was malformed. D110026 added a check in the verifier for this metadata, and the tests have been failing since then.
      
       * This is not true for the outer loops of the BLIS matrix multiplication, where it was being inserted. Each element of A, B, C is accessed multiple times, as often as the loop not used as an index is iterating.
      
       * Scopes were added to SecondLevelOtherAliasScopeList (used for the !noalias scope list) on-the-fly when another SCEV was seen. This meant that previously visited instructions would not be updated with alias scopes that are only seen later, missing out those SCEVs they should not be aliasing with.
      
       * Since the !noalias scope list would ideally consist of all other SCEVs for this base pointer, we might quickly run into scalability issues. Especially after unrolling there would probably be at least one SCEV per instruction and unroll instance.
      
       * The inter-iteration noalias base pointer was not removed after leaving the loop marked with it, effectively marking everything after it to noalias as well.
      
      A solution I considered was to mark each instruction as non-aliasing with its own scope. The instruction itself would obviously alias itself, but such a construction might also be considered invalid. Duplicating the instruction (e.g. due to speculation) would mark the instruction non-aliasing with its clone. I don't want to go into this territory, especially since the original motivation of determining unrolled instances as noalias based on SCEV is what scev-aa does as well.
      
      This effectively reverts D30606 and D35761.
      cad9f98a
    • Kazu Hirata's avatar
      [llvm] Use make_early_inc_range (NFC) · 85b4b21c
      Kazu Hirata authored
      85b4b21c
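
      The commit above has no description. make_early_inc_range is an LLVM ADT helper that advances the iterator before the loop body runs, so the body may erase the current element. The sketch below spells out the same pattern by hand on a `std::list`; `eraseEvens` is a made-up example and is not code from the patch.

      ```c++
      #include <cassert>
      #include <list>

      // Erases every even value from Values while iterating over it.
      void eraseEvens(std::list<int> &Values) {
        for (auto It = Values.begin(), End = Values.end(); It != End;) {
          auto Current = It++; // Advance first, as an early-increment range does.
          if (*Current % 2 == 0)
            Values.erase(Current); // Invalidates only Current, not It.
        }
      }
      ```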
    • River Riddle's avatar
      [mlir] Refactor ElementsAttr into an AttrInterface · d80d3a35
      River Riddle authored
      This revision refactors ElementsAttr into an Attribute Interface.
      This enables a common interface with which to interact with
      element attributes, without needing to modify the builtin
      dialect. It also removes a majority (if not all?) of the need for
      the current OpaqueElementsAttr, which was originally intended as
      a way to opaquely represent data that was not representable by
      the other builtin constructs.
      
      The new ElementsAttr interface not only allows for users to
      natively represent their data in the way that best suits them,
      it also allows for efficient opaque access and iteration of the
      underlying data. Attributes using the ElementsAttr interface
      can directly expose support for interacting with the held
      elements using any C++ data type they claim to support. For
      example, DenseIntOrFpElementsAttr supports iteration using
      various native C++ integer/float data types, as well as
      APInt/APFloat, and more. ElementsAttr instances that refer to
      DenseIntOrFpElementsAttr can use all of these data types for
      iteration:
      
      ```c++
      DenseIntOrFpElementsAttr intElementsAttr = ...;
      
      ElementsAttr attr = intElementsAttr;
      for (uint64_t value : attr.getValues<uint64_t>())
        ...;
      for (APInt value : attr.getValues<APInt>())
        ...;
      for (IntegerAttr value : attr.getValues<IntegerAttr>())
        ...;
      ```
      
      ElementsAttr also supports failable range/iterator access,
      allowing for selective code paths depending on data type
      support:
      
      ```c++
      ElementsAttr attr = ...;
      if (auto range = attr.tryGetValues<uint64_t>()) {
        for (uint64_t value : *range)
          ...;
      }
      ```
      
      Differential Revision: https://reviews.llvm.org/D109190
      d80d3a35
    • River Riddle's avatar
      [mlir] Add value_begin/value_end methods to DenseElementsAttr · 0cb5d7fc
      River Riddle authored
      Currently DenseElementsAttr only exposes the ability to get the full range of values for a given type T, but there are many situations where we just want the beginning/end iterator. This revision adds proper value_begin/value_end methods for all of the supported T types, and also cleans up a bit of the interface.
      
      Differential Revision: https://reviews.llvm.org/D104173
      0cb5d7fc
    • River Riddle's avatar
      [mlir] Tighten verification of SparseElementsAttr · 4f21152a
      River Riddle authored
      SparseElementsAttr currently does not perform any verification on construction, with the only verification existing within the parser. This revision moves the parser verification to SparseElementsAttr, and also adds additional verification for when a sparse index is not valid.
      
      Differential Revision: https://reviews.llvm.org/D109189
      4f21152a
    • Stella Laurenzo's avatar
      [mlir][python] Forward _OperationBase _CAPIPtr to the Operation. · 1fb2e842
      Stella Laurenzo authored
      * ODS generated operations extend _OperationBase and without this, cannot be marshalled to CAPI functions.
      * No test case updates: this kind of interop is quite hard to verify with in-tree tests.
      
      Differential Revision: https://reviews.llvm.org/D110030
      1fb2e842