  1. Mar 10, 2021
    • Mauri Mustonen's avatar
      [VPlan] Support to widen select instructions in VPlan native path · 0de8aeae
      Mauri Mustonen authored
      Add support for widening select instructions in the VPlan native path by using the correct recipe when such instructions are encountered. The inner loop vectorizer already uses this recipe.
      
      Previously, select instructions were handled by the wrong recipe, which resulted in unreachable-instruction errors such as https://bugs.llvm.org/show_bug.cgi?id=48139.
      
      Reviewed By: fhahn
      
      Differential Revision: https://reviews.llvm.org/D97136
      0de8aeae
    • Christudasan Devadasan's avatar
      GlobalISel: Try to combine G_[SU]DIV and G_[SU]REM · 4c6ab48f
      Christudasan Devadasan authored
      It is good to have a combined `divrem` instruction when the
      `div` and `rem` are computed from identical input operands.
      Some targets can lower them through a single expansion that
      computes both division and remainder. This produces fewer
      instructions than expanding them individually.
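As a source-level illustration of the idea (not the GlobalISel combine itself), the following minimal sketch computes both results from identical operands with a single division. The names `DivRem` and `divRem` are hypothetical and exist only for this example.

```cpp
#include <cstdint>

// Hypothetical result type: quotient and remainder from one computation.
struct DivRem {
  int32_t Quot;
  int32_t Rem;
};

// Computes both results from the same operands. The remainder is derived
// from the quotient, so only one division is performed.
// Precondition: B != 0 and not (A == INT32_MIN && B == -1).
inline DivRem divRem(int32_t A, int32_t B) {
  int32_t Q = A / B;     // truncating division
  int32_t R = A - Q * B; // remainder reuses the quotient
  return {Q, R};
}
```

A caller that needs both `A / B` and `A % B` can use `divRem(A, B)` once instead of expanding the two operations separately.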
      
      Reviewed By: arsenm, paquette
      
      Differential Revision: https://reviews.llvm.org/D96013
      4c6ab48f
    • Wei Mi's avatar
      [SampleFDO] Support enabling -funique-internal-linkage-name. · ee35784a
      Wei Mi authored
      Now that the -funique-internal-linkage-name flag is available, we want to
      flip it on by default, since it is beneficial to have separate sample
      profiles for different internal symbols with the same name. In preparation,
      we want to avoid regressions caused by the flip.
      
      When we flip -funique-internal-linkage-name on, the profile is collected
      from a binary built without -funique-internal-linkage-name, so it has no uniq
      suffix, but the IR in the optimized build contains the suffix. This kind of
      mismatch may introduce a transient regression.
      
      To avoid such a mismatch, we introduce a NameTable section flag indicating
      whether any name in the profile contains the uniq suffix. The compiler
      decides whether to keep the uniq suffix during name canonicalization
      depending on the NameTable section flag. The flag is only available for the
      extbinary format. For other formats, the compiler keeps the uniq suffix by
      default, so they will only experience a transient regression right after
      -funique-internal-linkage-name is flipped.
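The canonicalization decision described above can be sketched roughly as follows. This is a simplified illustration, not the code from the patch: the helper `canonicalName`, its boolean parameter, and the exact spelling of the suffix marker are assumptions made for the example.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Assumed spelling of the uniq suffix marker; treat it as an assumption.
constexpr std::string_view UniqMarker = ".__uniq.";

// Hypothetical helper: when the profile contains no names with the uniq
// suffix, drop the suffix from the IR name so both sides match. When the
// profile does contain such names, keep the name unchanged.
inline std::string canonicalName(std::string_view Name,
                                 bool ProfileHasUniqSuffix) {
  if (ProfileHasUniqSuffix)
    return std::string(Name);
  std::size_t Pos = Name.find(UniqMarker);
  // substr(0, npos) returns the whole name when no marker is present.
  return std::string(Name.substr(0, Pos));
}
```

With a profile collected before the flip, `canonicalName("foo.__uniq.123", false)` maps the IR name back to the unsuffixed profile name.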
      
      Another type of regression is caused by places where we failed to call
      getCanonicalFnName. Those places are fixed.
      
      Differential Revision: https://reviews.llvm.org/D96932
      ee35784a
  2. Mar 09, 2021
  3. Mar 08, 2021
    • Stephen Tozer's avatar
      Fix 2: [DebugInfo] Support DIArgList in DbgVariableIntrinsic · 57a0e0d4
      Stephen Tozer authored
      Changes to function calls in LocalTest resulted in comparisons between
      unsigned values and signed literals; the latter have been updated to be
      unsigned to prevent this warning.
      57a0e0d4
    • gbtozers's avatar
      [DebugInfo] Support DIArgList in DbgVariableIntrinsic · e5d958c4
      gbtozers authored
      This patch updates DbgVariableIntrinsics to support use of a DIArgList for the
      location operand, resulting in a significant change to its interface. This patch
      does not update all IR passes to support multiple location operands in a
      dbg.value; the only change is to update the DbgVariableIntrinsic interface and
      its uses. All code outside of the intrinsic classes assumes that an intrinsic
      will always have exactly one location operand; they will still support
      DIArgLists, but only if they contain exactly one Value.
      
      Among other changes, the setOperand and setArgOperand functions in
      DbgVariableIntrinsic have been made private. This is to prevent code from
      setting the operands of these intrinsics directly, which could easily result in
      incorrect/invalid operands being set. This does not prevent these functions from
      being called on a debug intrinsic at all, as they can still be called on any
      CallInst pointer; it is assumed that any code directly setting the operands on a
      generic call instruction is doing so safely. The intention for making these
      functions private is to prevent DIArgLists from being overwritten by code that's
      naively trying to replace one of the Values it points to, and also to fail fast
      if a DbgVariableIntrinsic is updated to use a DIArgList without a valid
      corresponding DIExpression.
      e5d958c4
  4. Mar 05, 2021
    • gbtozers's avatar
      [DebugInfo] Add DIArgList MD to store multiple values in DbgVariableIntrinsics · 65600cb2
      gbtozers authored
      This patch adds a new metadata node, DIArgList, which contains a list of SSA
      values. This node is in many ways similar in function to the existing
      ValueAsMetadata node, with the difference being that it tracks a list instead of
      a single value. Internally, it uses ValueAsMetadata to track the individual
      values, but there is also a reasonable amount of DIArgList-specific
      value-tracking logic on top of that. Similar to ValueAsMetadata, it is a special
      case in parsing and printing due to the fact that it requires a function state
      (as it may reference function-local values).
      
      This patch should not result in any immediate functional change; it allows for
      DIArgLists to be parsed and printed, but debug variable intrinsics do not yet
      recognize them as a valid argument (outside of parsing).
      
      Differential Revision: https://reviews.llvm.org/D88175
      65600cb2
  5. Mar 04, 2021
    • Petar Avramovic's avatar
      Reland [GlobalISel] Start using vectors in GISelKnownBits · d7834556
      Petar Avramovic authored
      This is a recommit of 4c8fb7dd.
      The MIR in one unit test had mismatched types.
      
      For vectors we consider a bit as known if it is the same for all demanded
      vector elements (all elements by default). KnownBits BitWidth for vector
      type is size of vector element. Add support for G_BUILD_VECTOR.
      This allows combines of urem_pow2_to_mask in pre-legalizer combiner.
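The rule for vectors can be illustrated with a small stand-alone sketch. The `Known` struct below is a simplified stand-in for `llvm::KnownBits` on 8-bit elements, and both function names are hypothetical. The second function only shows the arithmetic identity behind a urem-by-power-of-two to mask rewrite.

```cpp
#include <cstdint>
#include <vector>

// Simplified stand-in for known-bits information on 8-bit elements.
struct Known {
  uint8_t Zero; // bits known to be 0
  uint8_t One;  // bits known to be 1
};

// Known bits of a constant vector: a bit counts as known only if it has the
// same value in every element. Expects at least one element.
inline Known knownBitsOfConstVector(const std::vector<uint8_t> &Elts) {
  Known K{0xFF, 0xFF};
  for (uint8_t E : Elts) {
    K.Zero &= static_cast<uint8_t>(~E); // keep bits that are 0 everywhere
    K.One &= E;                         // keep bits that are 1 everywhere
  }
  return K;
}

// For a power-of-two divisor, X urem Pow2 equals X & (Pow2 - 1).
inline uint8_t uremPow2(uint8_t X, uint8_t Pow2) {
  return static_cast<uint8_t>(X & (Pow2 - 1));
}
```

For the elements `0b0101` and `0b0111`, bit 1 differs between the two, so it is reported as unknown while all other bits are known.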
      
      Differential Revision: https://reviews.llvm.org/D96122
      d7834556
    • Nicolas Guillemot's avatar
      Revert "[Support] Add raw_ostream_iterator: ostream_iterator for raw_ostream" · 6b8cf735
      Nicolas Guillemot authored
      This reverts commit 7479a2e0.
      
      This commit causes compile errors on clang-x64-windows-msvc, so I'm
      reverting the patch for now.
      
      For reference, the error in question is:
      
      ```
      error C2280: 'llvm::raw_ostream_iterator<char,char>
      &llvm::raw_ostream_iterator<char,char>::operator =(const
      llvm::raw_ostream_iterator<char,char> &)': attempting to reference a deleted
      function
      
      note: compiler has generated 'llvm::raw_ostream_iterator<char,char>::operator ='
      here
      
      note: 'llvm::raw_ostream_iterator<char,char>
      &llvm::raw_ostream_iterator<char,char>::operator =(const
      llvm::raw_ostream_iterator<char,char> &)': function was implicitly deleted
      because 'llvm::raw_ostream_iterator<char,char>' has a data member
      'llvm::raw_ostream_iterator<char,char>::OutStream' of reference type
      ```
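The error can be reproduced in isolation. In the sketch below, `Stream`, `RefIter`, and `PtrIter` are hypothetical stand-ins, not the reverted LLVM class. It shows that a reference data member implicitly deletes copy assignment, and that holding a pointer instead keeps the type copy-assignable. Whether the patch was later fixed this way is not stated here.

```cpp
#include <type_traits>

// Hypothetical stand-in for an output stream type.
struct Stream {};

// A reference member cannot be reseated, so the compiler implicitly deletes
// the copy assignment operator of this type.
struct RefIter {
  Stream &Out;
};

// Holding a pointer instead keeps the type copy-assignable.
struct PtrIter {
  Stream *Out;
};

static_assert(!std::is_copy_assignable<RefIter>::value,
              "reference member deletes copy assignment");
static_assert(std::is_copy_assignable<PtrIter>::value,
              "pointer member keeps copy assignment");
```

Code that assigns one iterator to another needs the second form, which is presumably where the MSVC build ran into the deleted operator.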
      6b8cf735
    • Nicolas Guillemot's avatar
      [Support] Add raw_ostream_iterator: ostream_iterator for raw_ostream · 7479a2e0
      Nicolas Guillemot authored
      Adds a class `raw_ostream_iterator` that behaves like
      std::ostream_iterator, but can be used with raw_ostream.
      This is useful for using raw_ostream with std algorithms.
      
      For example, it can be used to output std containers as follows:
      
      ```
      std::vector<int> V = { 1, 2, 3 };
      std::copy(V.begin(), V.end(), raw_ostream_iterator<int>(outs(), ", "));
      // Output: "1, 2, 3, "
      ```
      
      The API tries to follow std::ostream_iterator as closely as is
      practically possible.
      
      Reviewed By: dblaikie, mkitzan
      
      Differential Revision: https://reviews.llvm.org/D78795
      7479a2e0
    • Daniel Sanders's avatar
      [mir] Fix confusing MIR when MMO's value is nullptr but offset is non-zero · 9fc2be6f
      Daniel Sanders authored
      :: (store 1 + 4, addrspace 1)
      ->
      :: (store 1 into undef + 4, addrspace 1)
      
      An offset without a base isn't terribly useful, but it's convenient to update
      the offset without checking the value, for example when breaking apart
      stores into smaller units.
      
      Differential Revision: https://reviews.llvm.org/D97812
      9fc2be6f
    • Xiangling Liao's avatar
      [CMake][AIX] Adjust plugin library extension used on AIX · e9f9ec83
      Xiangling Liao authored
      As stated in the CMake manual, we are supposed to use MODULE rules to generate
      plugin libraries:
      
      "MODULE libraries are plugins that are not linked into other targets but may be
      loaded dynamically at runtime using dlopen-like functionality"
      
      Besides, LLVM's plugin infrastructure fits with the AIX treatment of .so
      shared objects more than it fits with the AIX treatment of .a library archives
      (which may contain shared objects).
      
      Differential revision: https://reviews.llvm.org/D96282
      e9f9ec83
    • Sanjay Patel's avatar
      [Analysis][LoopVectorize] rename "Unsafe" variables/methods; NFC · 36a489d1
      Sanjay Patel authored
      Similar to b3a33553, but this shows a TODO and a potential
      miscompile is already present.
      
      We are tracking an FP instruction that does *not* have FMF (reassoc)
      properties, so calling that "Unsafe" seems opposite of the common
      reading.
      
      I also removed one getter method by rolling the null check into
      the access. Further simplification may be possible.
      
      The motivation is to clean up the interactions between FMF and
      function-level attributes in these classes and their callers.
      
      The new test shows that there is an existing bug somewhere in
      the callers. We assumed that the original code was fully 'fast'
      and so we produced IR with 'fast' even though it was just 'reassoc'.
      36a489d1
    • Nico Weber's avatar
      Revert "[GlobalISel] Start using vectors in GISelKnownBits" · 4b101536
      Nico Weber authored
      This reverts commit 4c8fb7dd.
      Breaks check-llvm everywhere, see https://reviews.llvm.org/D96122
      4b101536
    • Petar Avramovic's avatar
      [GlobalISel] Start using vectors in GISelKnownBits · 4c8fb7dd
      Petar Avramovic authored
      For vectors we consider a bit as known if it is the same for all demanded
      vector elements (all elements by default). KnownBits BitWidth for vector
      type is size of vector element. Add support for G_BUILD_VECTOR.
      This allows combines of urem_pow2_to_mask in pre-legalizer combiner.
      
      Differential Revision: https://reviews.llvm.org/D96122
      4c8fb7dd
    • David Green's avatar
      [ARM] Remove new ARMSelectionDAGTest unittest. · 098aea95
      David Green authored
      This removes the unit test from a968e7b8 as it reportedly causes
      some link problems. It can be reinstated once the issues are understood
      and sorted out.
      098aea95
    • Martin Storsjö's avatar
      1bdb6366
    • David Green's avatar
      [ARM] KnownBits for CSINC/CSNEG/CSINV · a968e7b8
      David Green authored
      This adds some simple known bits handling for the three CSINC/NEG/INV
      instructions. From the operands known bits we can compute the common
      bits of the first operand and incremented/negated/inverted second
      operand. The first, especially CSINC ZR, ZR, comes up a fair amount in the
      tests. The others are rarer, so a unit test for them is added.
      
      Differential Revision: https://reviews.llvm.org/D97788
      a968e7b8
  6. Mar 03, 2021
    • Piotr Sobczak's avatar
      [AMDGPU] Rename amdgcn_wwm to amdgcn_strict_wwm · c3ce7bae
      Piotr Sobczak authored
       * Introduce the new intrinsic amdgcn_strict_wwm
       * Deprecate the old intrinsic amdgcn_wwm
      
      The change is done for consistency as the "strict"
      prefix will become an important, distinguishing factor
      between amdgcn_wqm and amdgcn_strictwqm in the future.
      
      The "strict" prefix indicates that inactive lanes do not
      take part in control flow, specifically an inactive lane
      enabled by a strict mode will always be enabled irrespective
      of control flow decisions.
      
      The amdgcn_wwm will be removed, but doing so in two steps
      gives users time to switch to the new name at their own pace.
      
      Reviewed By: critson
      
      Differential Revision: https://reviews.llvm.org/D96257
      c3ce7bae
  7. Mar 02, 2021
    • Matt Arsenault's avatar
      GlobalISel: Merge and cleanup more AMDGPU call lowering code · fd82cbcf
      Matt Arsenault authored
      This merges more AMDGPU ABI lowering code into the generic call
      lowering. Start cleaning up by factoring away more of the pack/unpack
      logic into the buildCopy{To|From}Parts functions. These could use more
      improvement, and the SelectionDAG versions are significantly more
      complex, and we'll eventually have to emulate all of those cases too.
      
      This is mostly NFC, but does result in some minor instruction
      reordering. It also removes some of the limitations with mismatched
      sizes the old code had. However, similarly to the merge on the input,
      this is forcing gfx6/gfx7 to use the gfx8+ ABI (which is what we
      actually want, but SelectionDAG is stuck using the weird emergent
      ABI).
      
      This also changes the load/store size for stack passed EVTs for
      AArch64, which makes it consistent with the DAG behavior.
      fd82cbcf
    • dfukalov's avatar
      [AA] Cache (optionally) estimated PartialAlias offsets. · 6e967834
      dfukalov authored
      For two clobbering loads where one loaded object is fully contained in the
      other, `BasicAAResult::aliasGEP` returns just `PartialAlias`. That result is
      the more general case of partial overlap, so it says nothing about the
      actual overlapping sizes.
      
      AA users such as GVN and DSE have no functionality to estimate aliasing of GEPs
      with non-constant offsets. The change stores estimated relative offsets so they
      can be used further.
      
      Reviewed By: nikic
      
      Differential Revision: https://reviews.llvm.org/D93529
      6e967834
    • Tim Northover's avatar
      AArch64: report fp16 arithmetic is present for apple-a11 CPU. · 888c5c24
      Tim Northover authored
      AArch64.td got it right, but the target-parser dropped it, leading to missing
      feature flags in Clang.
      888c5c24
    • Stefan Gränitz's avatar
      [Orc] Fix a file header (NFC) · b66b73be
      Stefan Gränitz authored
      b66b73be
  8. Feb 26, 2021
  9. Feb 24, 2021
  10. Feb 23, 2021
    • Heejin Ahn's avatar
      [WebAssembly] Fix incorrect grouping and sorting of exceptions · ea8c6375
      Heejin Ahn authored
      This CL is not big but contains changes that span multiple analyses and
      passes. This description is very long because it tries to explain basics
      on what each pass/analysis does and why we need this change on top of
      that. Please feel free to skip parts that are not necessary for your
      understanding.
      
      ---
      
      `WasmEHFuncInfo` contains the mapping of <EH pad, the EH pad's next
      unwind destination>. The value (unwind dest) here is where an exception
      should end up when it is not caught by the key (EH pad). We record this
      info in WasmEHPrepare to fix catch mismatches, because the CFG itself
      does not have this info. A CFG only contains BBs and
      predecessor-successor relationships between them, but in `WasmEHFuncInfo`
      the unwind destination BB is not necessarily a successor of the key EH
      pad BB. Their relationship can be intuitively explained by this C++ code
      snippet:
      ```
      try {
        try {
          foo();
        } catch (int) { // EH pad
          ...
        }
      } catch (...) {   // unwind destination
      }
      ```
      So when `foo()` throws, it goes to `catch (int)` first. But if it is not
      caught by it, it ends up in the next unwind destination `catch (...)`.
      This unwind destination is what you see in `catchswitch`'s
      `unwind label %bb` part.
      
      ---
      
      `WebAssemblyExceptionInfo` groups exceptions so that they can be sorted
      continuously together in CFGSort, as we do for loops. What this analysis
      does is very simple: it creates a single `WebAssemblyException` per EH
      pad, and all BBs that are dominated by that EH pad are included in this
      exception. We also identify subexception relationships in this way: if
      EHPad A dominates EHPad B, EHPad B's exception is a subexception of
      EHPad A's exception.
      
      This simple rule turns out to be incorrect in some cases. In
      `WasmEHFuncInfo`, if EHPad A's unwind destination is EHPad B, it means
      semantically EHPad B should not be included in EHPad A's exception,
      because it does not make sense to rethrow/delegate to an inner scope.
      This is what happened in CFGStackify as a result of this:
      ```
      try
        try
        catch
          ...   <- %dest_bb is among here!
        end
      delegate %dest_bb
      ```
      
      So this patch adds a phase in `WebAssemblyExceptionInfo::recalculate` to
      make sure exceptions' unwind destinations are not subexceptions of
      their unwind sources in `WasmEHFuncInfo`.
      
      But this alone does not prevent `dest_bb` in the example above from
      being sorted within the inner `catch`'s exception, even if its exception
      is not a subexception of that `catch`'s exception anymore, because of
      how CFGSort works, which will be explained below.
      
      ---
      
      CFGSort places BBs within the same `SortRegion` (loop or exception)
      continuously together so they can be demarcated with `loop`-`end_loop`
      or `catch`-`end_try` in CFGStackify.
      
      `SortRegion` is a wrapper for one of `MachineLoop` or
      `WebAssemblyException`. `SortRegionInfo` already does some complicated
      things because there are discrepancies between those two data structures.
      `WebAssemblyException` is what we control, and it is defined as an EH
      pad as its header and BBs dominated by the header as its BBs (with a
      newly added exception of unwind destinations explained in the previous
      paragraph). But `MachineLoop` is an LLVM data structure and uses the
      standard loop detection algorithm. So by the algorithm, a loop contains
      the BBs that 1. are dominated by the loop header and 2. have a path back
      to its header.
      Because of the second condition, many BBs that are dominated by the loop
      header are not included in the loop. So BBs that contain `return` or
      branches to outside of the loop are not technically included in
      `MachineLoop`, but they can be sorted together with the loop with no
      problem.
      
      Maybe to relax the condition, in CFGSort, when we are in a `SortRegion`
      we allow sorting of not only BBs that belong to the current innermost
      region but also BBs that are dominated by the current region header.
      (This was written this way from the first version written by Dan, when
      only loops existed.) But now, we have cases in exceptions when EHPad B
      is the unwind destination for EHPad A, even if EHPad B is dominated by
      EHPad A it should not be included in EHPad A's exception, and should not
      be sorted within EHPad A.
      
      One way to make things work, at least correctly, is to change the
      `dominates` condition to a `contains` condition for `SortRegion` when
      sorting BBs, but this will change compilation results for existing non-EH
      code and I can't be sure it will not degrade performance or code size. I
      think it will degrade performance because it will force many BBs dominated
      by a loop, which don't have a path back to the header, to be placed after
      the loop, and it will likely create more branches and blocks.
      
      So this does a little hacky check when adding BBs to `Preferred` list:
      (`Preferred` list is a ready list. CFGSort maintains ready list in two
      priority queues: `Preferred` and `Ready`. I'm not very sure why, but it
      was written that way from the beginning. BBs are first added to
      `Preferred` list and then some of them are pushed to `Ready` list, so
      here we only need to guard condition for `Preferred` list.)
      
      When adding a BB to `Preferred` list, we check if that BB is an unwind
      destination of another BB. To do this, this adds the reverse mapping,
      `UnwindDestToSrc`, and getter methods to `WasmEHFuncInfo`. And if the BB
      is an unwind destination, it checks if the current stack of regions
      (`Entries`) contains its source BB by traversing the stack backwards. If
      we find its unwind source in there, we add the BB to its `Deferred`
      list, to make sure that unwind destination BB is added to `Preferred`
      list only after that region with the unwind source BB is sorted and
      popped from the stack.
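The deferral logic described above can be modelled with a much simplified sketch. Basic blocks are plain integers here, and `Entry`, `Sorter`, `addReady`, and `popRegion` are hypothetical names for this illustration. Only `UnwindDestToSrc`, `Entries`, `Preferred`, and `Deferred` echo names from the description, and the real pass keeps more state than this.

```cpp
#include <map>
#include <set>
#include <utility>
#include <vector>

// One open region on the stack, with the unwind destinations that must wait
// until the region has been fully sorted.
struct Entry {
  std::set<int> Blocks;      // BBs contained in this region
  std::vector<int> Deferred; // unwind destination BBs held back
};

struct Sorter {
  std::map<int, int> UnwindDestToSrc; // reverse mapping: dest BB -> source BB
  std::vector<Entry> Entries;         // stack of open regions, innermost last
  std::vector<int> Preferred;         // ready list

  // Called when BB becomes ready to be sorted.
  void addReady(int BB) {
    auto It = UnwindDestToSrc.find(BB);
    if (It != UnwindDestToSrc.end()) {
      // Traverse the region stack backwards, looking for the unwind source.
      for (auto E = Entries.rbegin(); E != Entries.rend(); ++E) {
        if (E->Blocks.count(It->second)) {
          E->Deferred.push_back(BB); // hold back until this region is popped
          return;
        }
      }
    }
    Preferred.push_back(BB);
  }

  // Called when the innermost region is sorted: release its deferred BBs.
  void popRegion() {
    Entry Done = std::move(Entries.back());
    Entries.pop_back();
    for (int BB : Done.Deferred)
      Preferred.push_back(BB);
  }
};
```

In this model, an unwind destination whose source lies in an open region reaches `Preferred` only after `popRegion` has removed that region from the stack.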
      
      ---
      
      This does not contain a new test that crashes because of this bug, but
      this fix changes the result for one of the existing test cases. This test
      case didn't crash because it fortunately didn't contain a `delegate` to
      the incorrectly placed unwind destination BB.
      
      Fixes https://github.com/emscripten-core/emscripten/issues/13514.
      
      Reviewed By: dschuff, tlively
      
      Differential Revision: https://reviews.llvm.org/D97247
      ea8c6375
    • Alexey Lapshin's avatar
      [Support] Add reserve() method to the raw_ostream. · 875b3b2c
      Alexey Lapshin authored
      If the resulting size of the output stream is already known,
      the space for the stream data can be preallocated in some
      cases. For example, raw_string_ostream could preallocate the
      space for the target string, which avoids reallocations
      while writing into the stream.
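The benefit can be shown with plain `std::string`, which is the kind of target buffer the message mentions. This sketch does not use the new raw_ostream method; `appendsWithoutRealloc` is a hypothetical helper that checks that no reallocation happens once the final size has been reserved up front.

```cpp
#include <cstddef>
#include <string>

// Reserves space for N characters, appends N characters one at a time, and
// reports whether the capacity stayed the same during the writes.
inline bool appendsWithoutRealloc(std::size_t N) {
  std::string Buf;
  Buf.reserve(N); // capacity is now at least N
  const std::size_t Cap = Buf.capacity();
  for (std::size_t I = 0; I < N; ++I)
    Buf.push_back('x');
  return Buf.capacity() == Cap && Buf.size() == N;
}
```

Without the `reserve` call, the same loop may grow the buffer several times as it fills.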
      
      Differential Revision: https://reviews.llvm.org/D91693
      875b3b2c
    • Cassie Jones's avatar
      [GlobalISel] Implement narrowScalar for SADDE/SSUBE/UADDE/USUBE · 8f956a5e
      Cassie Jones authored
      Reviewed By: arsenm
      
      Differential Revision: https://reviews.llvm.org/D96673
      8f956a5e
    • Cassie Jones's avatar
      [GlobalISel] Implement narrowScalar for SADDO/SSUBO · e1532649
      Cassie Jones authored
      Reviewed By: arsenm
      
      Differential Revision: https://reviews.llvm.org/D96672
      e1532649
    • Cassie Jones's avatar
      [GlobalISel] Implement narrowScalar for UADDO/USUBO · c63b33b7
      Cassie Jones authored
      Reviewed By: arsenm
      
      Differential Revision: https://reviews.llvm.org/D96671
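As a rough picture of what narrowing an add-with-overflow means, the sketch below builds a 64-bit unsigned add-with-overflow from two 32-bit add-with-carry steps. This is plain C++ for illustration, not the legalizer code. All type and function names are hypothetical.

```cpp
#include <cstdint>

struct AddResult32 {
  uint32_t Sum;
  bool CarryOut;
};

// 32-bit unsigned add with carry-in and carry-out.
inline AddResult32 uadde32(uint32_t A, uint32_t B, bool CarryIn) {
  uint64_t Wide = uint64_t(A) + B + (CarryIn ? 1 : 0);
  return {static_cast<uint32_t>(Wide), (Wide >> 32) != 0};
}

struct AddResult64 {
  uint64_t Sum;
  bool Overflow;
};

// 64-bit unsigned add-with-overflow built from two 32-bit halves: the low
// half produces a carry that feeds the high half, and the carry out of the
// high half is the overflow flag of the wide operation.
inline AddResult64 uaddo64Narrowed(uint64_t A, uint64_t B) {
  AddResult32 Lo = uadde32(uint32_t(A), uint32_t(B), false);
  AddResult32 Hi = uadde32(uint32_t(A >> 32), uint32_t(B >> 32), Lo.CarryOut);
  return {(uint64_t(Hi.Sum) << 32) | Lo.Sum, Hi.CarryOut};
}
```

The subtract variants follow the same shape, with a borrow taking the place of the carry.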
      c63b33b7