  1. Feb 23, 2021
    • Heejin Ahn's avatar
      [WebAssembly] Fix incorrect grouping and sorting of exceptions · ea8c6375
      Heejin Ahn authored
      This CL is not big but contains changes that span multiple analyses and
      passes. This description is very long because it tries to explain basics
      on what each pass/analysis does and why we need this change on top of
      that. Please feel free to skip parts that are not necessary for your
      understanding.
      
      ---
      
      `WasmEHFuncInfo` contains the mapping of <EH pad, the EH pad's next
      unwind destination>. The value (unwind dest) here is where an exception
      should end up when it is not caught by the key (EH pad). We record this
      info in WasmEHPrepare to fix catch mismatches, because the CFG itself
      does not have this info. A CFG only contains BBs and
      predecessor-successor relationship between them, but in `WasmEHFuncInfo`
      the unwind destination BB is not necessarily a successor of the key EH
      pad BB. Their relationship can be intuitively explained by this C++ code
      snippet:
      ```
      try {
        try {
          foo();
        } catch (int) { // EH pad
          ...
        }
      } catch (...) {   // unwind destination
      }
      ```
      So when `foo()` throws, it goes to `catch (int)` first. But if it is not
      caught by it, it ends up in the next unwind destination `catch (...)`.
      This unwind destination is what you see in `catchswitch`'s
      `unwind label %bb` part.
      
      ---
      
      `WebAssemblyExceptionInfo` groups exceptions so that they can be sorted
      continuously together in CFGSort, as we do for loops. What this analysis
      does is very simple: it creates a single `WebAssemblyException` per EH
      pad, and all BBs that are dominated by that EH pad are included in this
      exception. We also identify subexception relationship in this way: if
      EHPad A dominates EHPad B, EHPad B's exception is a subexception of
      EHPad A's exception.
      
      This simple rule turns out to be incorrect in some cases. In
      `WasmEHFuncInfo`, if EHPad A's unwind destination is EHPad B, it means
      semantically EHPad B should not be included in EHPad A's exception,
      because it does not make sense to rethrow/delegate to an inner scope.
      This is what happened in CFGStackify as a result of this:
      ```
      try
        try
        catch
          ...   <- %dest_bb is among here!
        end
      delegate %dest_bb
      ```
      
      So this patch adds a phase in `WebAssemblyExceptionInfo::recalculate` to
      make sure exceptions' unwind destinations are not subexceptions of
      their unwind sources in `WasmEHFuncInfo`.
      
      But this alone does not prevent `dest_bb` in the example above from
      being sorted within the inner `catch`'s exception, even if its exception
      is not a subexception of that `catch`'s exception anymore, because of
      how CFGSort works, which will be explained below.
      
      ---
      
      CFGSort places BBs within the same `SortRegion` (loop or exception)
      continuously together so they can be demarcated with `loop`-`end_loop`
      or `catch`-`end_try` in CFGStackify.
      
      `SortRegion` is a wrapper for one of `MachineLoop` or
      `WebAssemblyException`. `SortRegionInfo` already does some complicated
      things because there are discrepancies between those two data structures.
      `WebAssemblyException` is what we control, and it is defined as an EH
      pad as its header and BBs dominated by the header as its BBs (with a
      newly added exception of unwind destinations explained in the previous
      paragraph). But `MachineLoop` is an LLVM data structure and uses the
      standard loop detection algorithm. By that algorithm, a loop consists of
      BBs that 1. are dominated by the loop header and 2. have a path back to
      its header.
      Because of the second condition, many BBs that are dominated by the loop
      header are not included in the loop. So BBs that contain `return` or
      branches to outside of the loop are not technically included in
      `MachineLoop`, but they can be sorted together with the loop with no
      problem.
      
      Maybe to relax the condition, in CFGSort, when we are in a `SortRegion`
      we allow sorting of not only BBs that belong to the current innermost
      region but also BBs that are dominated by the current region header.
      (This was written this way from the first version written by Dan, when
      only loops existed.) But now, with exceptions, we have cases where EHPad B
      is the unwind destination for EHPad A: even if EHPad B is dominated by
      EHPad A, it should not be included in EHPad A's exception, and should not
      be sorted within EHPad A.
      
      One way to make things work, at least correctly, is to change `dominates`
      condition to `contains` condition for `SortRegion` when sorting BBs, but
      this will change compilation results for existing non-EH code and I
      can't be sure it will not degrade performance or code size. I think it
      will degrade performance because it will force many BBs dominated by a
      loop, which don't have the path back to the header, to be placed after
      the loop and it will likely create more branches and blocks.
      
      So this does a little hacky check when adding BBs to `Preferred` list:
      (`Preferred` list is a ready list. CFGSort maintains ready list in two
      priority queues: `Preferred` and `Ready`. I'm not very sure why, but it
      was written that way from the beginning. BBs are first added to
      `Preferred` list and then some of them are pushed to `Ready` list, so
      here we only need to guard condition for `Preferred` list.)
      
      When adding a BB to `Preferred` list, we check if that BB is an unwind
      destination of another BB. To do this, this adds the reverse mapping,
      `UnwindDestToSrc`, and getter methods to `WasmEHFuncInfo`. And if the BB
      is an unwind destination, it checks if the current stack of regions
      (`Entries`) contains its source BB by traversing the stack backwards. If
      we find its unwind source in there, we add the BB to its `Deferred`
      list, to make sure that unwind destination BB is added to `Preferred`
      list only after that region with the unwind source BB is sorted and
      popped from the stack.
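
      The reverse lookup and the stack check described above can be sketched as
      follows. This is a minimal illustration and not the code in the patch:
      BBs are plain `int`s, and `UnwindInfo`, `Entry`, and `findDeferringEntry`
      are hypothetical names. It also assumes that one unwind destination may
      have several sources.
      ```cpp
      #include <cassert>
      #include <map>
      #include <set>
      #include <vector>

      // Hypothetical stand-in for WasmEHFuncInfo; BBs are plain ints here.
      struct UnwindInfo {
        std::map<int, int> SrcToUnwindDest;            // EH pad -> unwind dest
        std::map<int, std::set<int>> UnwindDestToSrcs; // reverse mapping

        void setUnwindDest(int Src, int Dest) {
          assert(Src != Dest && "an EH pad cannot unwind to itself");
          SrcToUnwindDest[Src] = Dest;
          UnwindDestToSrcs[Dest].insert(Src);
        }
      };

      // Hypothetical region entry: the BBs of one region being sorted.
      struct Entry {
        std::set<int> Blocks;
      };

      // Walks the region stack backwards. Returns the index of the innermost
      // entry that contains an unwind source of BB, or -1 if BB is not an
      // unwind destination of anything on the stack.
      int findDeferringEntry(const UnwindInfo &Info,
                             const std::vector<Entry> &Entries, int BB) {
        auto It = Info.UnwindDestToSrcs.find(BB);
        if (It == Info.UnwindDestToSrcs.end())
          return -1;
        for (int I = static_cast<int>(Entries.size()) - 1; I >= 0; --I)
          for (int Src : It->second)
            if (Entries[I].Blocks.count(Src))
              return I;
        return -1;
      }
      ```
      In this sketch a result of -1 means the BB can be added to `Preferred`
      right away; any other result names the region whose `Deferred` list
      would hold the BB until that region is popped.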
      
      ---
      
      This does not contain a new test that crashes because of this bug, but
      this fix changes the result for one of the existing test cases. This test
      case didn't crash because it fortunately didn't contain `delegate` to
      the incorrectly placed unwind destination BB.
      
      Fixes https://github.com/emscripten-core/emscripten/issues/13514.
      
      Reviewed By: dschuff, tlively
      
      Differential Revision: https://reviews.llvm.org/D97247
      ea8c6375
    • Daniel Hwang's avatar
      [scan-build-py] Add sarif-html support in scan-build-py · 97a304cc
      Daniel Hwang authored
      Update scan-build-py to be able to trigger sarif-html output format in clang static analyzer.
      
      NOTE: testcase `test_sarif_and_html_creates_sarif_and_html_reports` will fail if the default clang does not have change https://reviews.llvm.org/D96389 . This can be remediated by pointing the default clang in arguments.py to a locally built clang. I was unable to figure out where these particular tests for scan-build-py are being invoked (aside from manually), so any help there would be greatly appreciated.
      
      Reviewed By: aabbaabb, xazax.hun
      
      Differential Revision: https://reviews.llvm.org/D96570
      97a304cc
    • Amara Emerson's avatar
      Fix a range-loop-analysis warning. · 4691405b
      Amara Emerson authored
      4691405b
    • Heejin Ahn's avatar
      [WebAssembly] Disable wasm.lsda() optimization in WasmEHPrepare · 445f4e74
      Heejin Ahn authored
      In every catchpad except `catch (...)`, we add a call to
      `_Unwind_CallPersonality`, which is a wrapper to call the personality
      function. (In most other Itanium-based architectures the call is done
      from libunwind, but in wasm we don't have control over the VM.)
      Because the personality function is called to figure out whether the
      current exception is a type we should catch, such as `int` or
      `SomeClass&`, `catch (...)` does not need the personality function call.
      For the same reason, all cleanuppads don't need it.
      
      When we call `_Unwind_CallPersonality`, we store some necessary info in
      a data structure called `__wasm_lpad_context` of type
      `_Unwind_LandingPadContext`, which is defined in the wasm's port of
      libunwind in Emscripten. Also the personality wrapper function returns
      some info (selector and the caught pointer) in that data structure, so
      it is used as a medium for communication.
      
      One piece of info we need to store is the address of the LSDA info for the
      current function. `wasm.lsda()` intrinsic returns that address. (This
      intrinsic will be lowered to a symbol that points to the LSDA address.)
      The simplest thing is to call `wasm.lsda()` every time we need to call
      `_Unwind_CallPersonality` and store that info in `__wasm_lpad_context`
      data structure. But we tried to be better than that (D77423 and some
      more previous CLs), so if catchpad A dominates catchpad B and catchpad A
      is not `catch (...)`, we didn't insert `wasm.lsda()` call in catchpad B,
      thinking that the LSDA address is the same for a single function and we
      already visited catchpad A and `__wasm_lpad_context.lsda` field would
      already have that value.
      
      But this can be incorrect if there is a call to another function, which
      also can have the personality function and LSDA, between catchpad A and
      catchpad B, because `__wasm_lpad_context` is a globally defined
      structure and the callee function will overwrite its `lsda` field.
      
      So in this CL we don't try to do any optimizations on adding
      `wasm.lsda()` call; we store the result of `wasm.lsda()` every time we
      call `_Unwind_CallPersonality`. We can do some complicated analysis,
      like checking if there is a function call between the dominating
      catchpad and the current catchpad, but at this time it seems overkill.
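
      The failure mode can be reproduced in miniature. The sketch below is an
      illustration under stated assumptions, not the WasmEHPrepare code:
      `Ctx` stands in for the global `__wasm_lpad_context`, the two dummy
      objects stand in for per-function LSDA tables, and all function names
      are hypothetical.
      ```cpp
      #include <cassert>

      // Hypothetical stand-in for the global __wasm_lpad_context.
      struct LandingPadContext {
        const void *lsda = nullptr;
      };
      static LandingPadContext Ctx;

      // Two distinct dummy objects standing in for per-function LSDA tables.
      static const char CallerLSDA = 0;
      static const char CalleeLSDA = 0;

      // Stand-in for the personality wrapper: reports the LSDA it would use.
      const void *callPersonality() {
        assert(Ctx.lsda && "the LSDA must be stored before the call");
        return Ctx.lsda;
      }

      // A callee with its own catchpad: it stores its own LSDA in the global.
      void callee() {
        Ctx.lsda = &CalleeLSDA;
        callPersonality();
      }

      // Returns the LSDA that catchpad B of the caller ends up using.
      const void *lsdaSeenAtCatchpadB(bool SkipStoreInDominatedPad) {
        Ctx.lsda = &CallerLSDA; // catchpad A stores the caller's LSDA
        callPersonality();
        callee();               // runs between A and B, overwrites Ctx.lsda
        if (!SkipStoreInDominatedPad)
          Ctx.lsda = &CallerLSDA; // catchpad B stores it again (this CL)
        return callPersonality();
      }
      ```
      With the store skipped, catchpad B consults the callee's table; with the
      store repeated, it sees the caller's table again.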
      
      This deletes three tests because they all tested `wasm.lsda()` call
      optimization.
      
      Fixes https://github.com/emscripten-core/emscripten/issues/13548.
      
      Reviewed By: tlively
      
      Differential Revision: https://reviews.llvm.org/D97309
      445f4e74
    • River Riddle's avatar
      [mlir][Inliner] Use llvm::parallelForEach instead of llvm::parallelTransformReduce · abd3c6f2
      River Riddle authored
      llvm::parallelTransformReduce does not schedule work on the caller thread, which becomes very costly for
      the inliner where a majority of SCCs are small, often ~1 element. The switch to llvm::parallelForEach solves this,
      and also aligns the implementation with the PassManager (which realistically should share the same implementation).
      
      This change dropped compile time on an internal benchmark by ~1 second (25%).
      
      Differential Revision: https://reviews.llvm.org/D96086
      abd3c6f2
    • River Riddle's avatar
      [mlir] Refactor InterfaceMap to use a sorted vector of interfaces, as opposed to a DenseMap · 65a3197a
      River Riddle authored
      A majority of operations have a very small number of interfaces, which means that the cost of using a hash map is generally larger for interface lookups than just a binary search. In the future when there are a number of operations with large amounts of interfaces, we can switch to a hybrid approach that optimizes lookups based on the number of interfaces. For now, however, a binary search is the best approach.
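
      The idea of a sorted vector with binary search can be sketched as below.
      This is a simplified illustration, not MLIR's `InterfaceMap`: interface
      IDs are plain `int`s, concepts are opaque pointers, and
      `SortedInterfaceMap` is a hypothetical name.
      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <utility>
      #include <vector>

      // Hypothetical entry type: (interface ID, opaque concept pointer).
      using InterfaceEntry = std::pair<int, const void *>;

      class SortedInterfaceMap {
        std::vector<InterfaceEntry> Interfaces; // kept sorted by ID

      public:
        explicit SortedInterfaceMap(std::vector<InterfaceEntry> Elts)
            : Interfaces(std::move(Elts)) {
          std::sort(Interfaces.begin(), Interfaces.end(),
                    [](const InterfaceEntry &L, const InterfaceEntry &R) {
                      return L.first < R.first;
                    });
          auto SameID = [](const InterfaceEntry &L, const InterfaceEntry &R) {
            return L.first == R.first;
          };
          (void)SameID; // unused when assertions are disabled
          assert(std::adjacent_find(Interfaces.begin(), Interfaces.end(),
                                    SameID) == Interfaces.end() &&
                 "duplicate interface ID");
        }

        // Binary search over the sorted vector; nullptr if ID is absent.
        const void *lookup(int ID) const {
          auto It = std::lower_bound(
              Interfaces.begin(), Interfaces.end(), ID,
              [](const InterfaceEntry &E, int Key) { return E.first < Key; });
          return (It != Interfaces.end() && It->first == ID) ? It->second
                                                             : nullptr;
        }
      };
      ```
      For the handful of entries a typical operation has, the lookup touches
      one small contiguous buffer and computes no hash.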
      
      This dropped compile time on a largish TF MLIR module by 20% (half a second).
      
      Differential Revision: https://reviews.llvm.org/D96085
      65a3197a
    • David Green's avatar
      8fa2bbae
    • Matt Arsenault's avatar
      AMDGPU: Use aligned vgprs/agprs in gfx90a mir tests · e844f24a
      Matt Arsenault authored
      These would fail a verifier check in a future change.
      e844f24a
    • David Crook's avatar
      [SEMA] Added warn_decl_shadow support for structured bindings · 039f79c7
      David Crook authored
      https://bugs.llvm.org/show_bug.cgi?id=40858
      
      CheckShadow is now called for each binding in the structured binding to make sure it does not shadow any other variable in scope. This does use a custom implementation of getShadowedDeclaration though because a BindingDecl is not a VarDecl.

      Added a few unit tests for this. In theory, all the other shadow unit tests should be duplicated for the structured binding variables too, but it is probably not worth it as they use common code. The MyTuple and std interface code has been copied from live-bindings-test.cpp.
      
      Reviewed By: rsmith
      
      Differential Revision: https://reviews.llvm.org/D96147
      039f79c7
    • zero9178's avatar
      [Driver][Windows] Support per-target runtimes dir layout for profile instr generate · 7f9d5d6e
      zero9178 authored
      When targeting an MSVC triple, --dependent-lib with the name of the clang runtime library for profiling is added to the command line args. In its current implementation, clang_rt.profile-<ARCH> is chosen as the name. When building a distribution using LLVM_ENABLE_PER_TARGET_RUNTIME_DIR this fails, because the runtime file names do not have an architecture suffix.
      
      This patch refactors getCompilerRT and getCompilerRTBasename to always consider per-target runtime directories. getCompilerRTBasename now simply returns the filename component of the path found by getCompilerRT
      
      Differential Revision: https://reviews.llvm.org/D96638
      7f9d5d6e
    • Jorge Gorbe Moya's avatar
      Defer the decision whether to use the CU or TU index until after reading the unit header. · 979ca1c0
      Jorge Gorbe Moya authored
      In DWARF v4 compile units go in .debug_info and type units go in
      .debug_types. However, in v5 both kinds of units are in .debug_info.
      Therefore we can't decide whether to use the CU or TU index just by
      looking at which section we're reading from. We have to wait until we
      have read the unit type from the header.
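
      A minimal sketch of the deferred decision, assuming simplified types:
      `UnitHeader`, `Section`, `IndexKind`, and `chooseIndex` are hypothetical
      names for illustration, while the two unit type codes are the values
      defined by DWARF v5.
      ```cpp
      #include <cassert>
      #include <cstdint>

      enum class Section { DebugInfo, DebugTypes };
      enum class IndexKind { CU, TU };

      // Unit type codes defined by DWARF v5.
      constexpr std::uint8_t DW_UT_type = 0x02;
      constexpr std::uint8_t DW_UT_split_type = 0x06;

      // Hypothetical, simplified view of an already-parsed unit header.
      struct UnitHeader {
        unsigned Version;
        Section Sec;
        std::uint8_t UnitType; // only meaningful for version 5 and later
      };

      // Picks the index only after the header (and its unit type) is known.
      IndexKind chooseIndex(const UnitHeader &H) {
        if (H.Version < 5) // v4: the section alone tells CUs and TUs apart
          return H.Sec == Section::DebugTypes ? IndexKind::TU : IndexKind::CU;
        assert(H.Sec == Section::DebugInfo && "v5 units all live in .debug_info");
        return (H.UnitType == DW_UT_type || H.UnitType == DW_UT_split_type)
                   ? IndexKind::TU
                   : IndexKind::CU;
      }
      ```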
      
      Differential Revision: https://reviews.llvm.org/D96194
      979ca1c0
    • Aart Bik's avatar
      [mlir][sparse] incorporate vector index into address computation · 17fa9198
      Aart Bik authored
      When computing dense address, a vectorized index must be accounted
      for properly. This bug was formerly undetected because we get 0 * prev + i
      in most cases, which folds away the scalar part. Now it works for all cases.
      
      Reviewed By: bixia
      
      Differential Revision: https://reviews.llvm.org/D97317
      17fa9198
    • Eric Schweitz's avatar
      [flang][fir][NFC] remove dead code · 67406947
      Eric Schweitz authored
      Removes unused function from FatalError.h.
      
      Differential revision: https://reviews.llvm.org/D97328
      67406947
    • Matthew Voss's avatar
      [llvm-profdata] Emit Error when Invalid MemOpSize Section is Created by llvm-profdata · 6da7d314
      Matthew Voss authored
      Under certain (currently unknown) conditions, llvm-profdata is outputting
      profiles that have two consecutive entries in the MemOPSize section for the
      value 0. This causes the PGOMemOPSizeOpt pass to output an invalid switch
      instruction with two cases for 0. As mentioned, we’re not quite sure what’s
      causing this to happen, but this patch prevents llvm-profdata from outputting a
      profile that has this problem and gives an error with a request for a
      reproducible.
      
      Differential Revision: https://reviews.llvm.org/D92074
      6da7d314
    • David Green's avatar
      [AArch64] Introduce UDOT/SDOT DAG nodes · f51b3de4
      David Green authored
      This is used to lower UDOT/SDOT instructions, as opposed to relying on
      the intrinsic. Subsequent optimizations will be able to optimize them
      more cleanly based on these nodes.
      f51b3de4
    • Lang Hames's avatar
      Revert "[docs][ORC] Fix section title and reference." · 479db97a
      Lang Hames authored
      This reverts commit 6e1affe7, which caused an
      error on the Sphinx doc bot.
      479db97a
    • Craig Topper's avatar
      [RISCV] Use a different constant in one of the smulo test cases to avoid... · 5e233ff1
      Craig Topper authored
      [RISCV] Use a different constant in one of the smulo test cases to avoid converting the mul to an add.
      5e233ff1
    • Jessica Paquette's avatar
      Recommit "[AArch64][GlobalISel] Match G_SHUFFLE_VECTOR -> insert elt + extract elt" · ef1f7f1d
      Jessica Paquette authored
      Attempted fix for the added test failing.
      
      https://lab.llvm.org/buildbot/#/builders/104/builds/2355/steps/5/logs/stdio
      
      I can't reproduce the failure anywhere, so I'm going to guess that passing a
      std::function as MatchInfo is sketchy in this context.
      
      Switch it to a std::tuple and hope for the best.
      ef1f7f1d
    • Amara Emerson's avatar
      [AArch64][GlobalISel] Lower G_USUBSAT and G_UADDSAT for scalars. · 939b5ce7
      Amara Emerson authored
      We have some missing optimization counterparts to LowerXALUO, but it's a start.
      939b5ce7
    • Florian Hahn's avatar
      [AArch64] Regenerate check lines for neon-compare-instructions.ll. · fd03e359
      Florian Hahn authored
      Auto-generate tests so they can be updated more easily, e.g. for D97303.
      fd03e359
    • Andrei Elovikov's avatar
      [NFC][VPlan] Use VPUser to store block's predicate · 3605b873
      Andrei Elovikov authored
      Reviewed By: fhahn
      
      Differential Revision: https://reviews.llvm.org/D96529
      3605b873
    • Florian Hahn's avatar
      [LV] Ensure fixNonInductionPHIs uses a valid insertion point. · de40423c
      Florian Hahn authored
      In some cases, Builder's insertion point may be invalidated before using
      it in VPTransformState::get. Make sure the insertion point is
      up-to-date.
      
      This should fix various sanitizer errors, like
      https://lab.llvm.org/buildbot/#/builders/5/builds/4933/steps/9/logs/stdio
      de40423c
    • Nathan James's avatar
      [clang-tidy] Add cppcoreguidelines-prefer-member-initializer to ReleaseNotes · 2af5275f
      Nathan James authored
      Following a discussion about the current state of this check on the 12.X branch, it was decided to purge the check as it wasn't in a fit-to-release state, see https://llvm.org/PR49318.
      This check has since had some of those issues addressed and should be good for the next release cycle now, pending any more bug reports about it.
      
      Reviewed By: aaron.ballman
      
      Differential Revision: https://reviews.llvm.org/D97275
      2af5275f
    • Simon Pilgrim's avatar
      [InstSimplify] Handle nsw shl -> poison patterns · 1020d161
      Simon Pilgrim authored
      Pulled out from D90479 - this recognises invalid nsw shl patterns with signbit changes that result in poison.
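
      For concrete values, the rule behind this can be sketched as below. This
      only illustrates the LangRef semantics of `shl nsw` on `i32` (the result
      is poison if any shifted-out bit disagrees with the resulting sign bit);
      it is not the known-bits reasoning used in the patch, and
      `shlNswIsPoison` is a hypothetical helper.
      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <limits>

      // True if `shl nsw i32 X, Amt` would be poison because the shift changes
      // the sign, i.e. X * 2^Amt does not fit in a signed 32-bit integer.
      bool shlNswIsPoison(std::int32_t X, unsigned Amt) {
        assert(Amt < 32 && "an oversized shift amount is a separate case");
        std::int64_t Wide = static_cast<std::int64_t>(X) * (std::int64_t{1} << Amt);
        return Wide < std::numeric_limits<std::int32_t>::min() ||
               Wide > std::numeric_limits<std::int32_t>::max();
      }
      ```
      For example, shifting 1 left by 31 flips the sign bit and is poison,
      while shifting -1 left by 31 yields the minimum value and is fine.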
      
      Differential Revision: https://reviews.llvm.org/D97305
      1020d161
    • Stanislav Mekhanoshin's avatar
      [AMDGPU] Set threshold for regbanks reassign pass · d1b92c91
      Stanislav Mekhanoshin authored
      This is to limit compile time. I did experiments with some
      inputs and found that compile time stays reasonable for this
      pass if we have less than 100000 virtual registers and then
      starts to explode somewhere between 100000 and 150000.
      
      Differential Revision: https://reviews.llvm.org/D97218
      d1b92c91
    • Andrzej Warzynski's avatar
      [flang][test] Share all driver test dirs between `f18` and `flang-new` · 5e54bef4
      Andrzej Warzynski authored
      Originally, when we added the new driver, we created dedicated test
      directories for `flang-new`. This way we separated the tests for the
      `throwaway` and the new driver.
      
      As we are increasing test coverage and starting to share tests between
      the two drivers, it makes sense to share all directories and instead
      rely on:
      ```
      ! REQUIRES: new-flang-driver
      ```
      to mark tests as exclusively for the new driver.
      
      Differential Revision: https://reviews.llvm.org/D97207
      5e54bef4
    • Shilei Tian's avatar
      [OpenMP][NVPTX] Fixed a compilation error in deviceRTLs caused by unsupported... · f6c2984a
      Shilei Tian authored
      [OpenMP][NVPTX] Fixed a compilation error in deviceRTLs caused by unsupported feature in release version of LLVM
      
      `ptx71` is not supported in release version of LLVM yet. As a result,
      the support of CUDA 11.2 and CUDA 11.1 caused a compilation error as mentioned
      in D97004. Since the support in D97004 is just a workaround for release, and we'll not
      use it in the near future, using `ptx70` for CUDA 11 is feasible.
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D97195
      f6c2984a
    • Adam Straw's avatar
      make Affine parallel and yield ops MemRefsNormalizable · af8adea1
      Adam Straw authored
      Affine parallel ops may contain and yield results from MemRefsNormalizable ops in the loop body.  Thus, both affine.parallel and affine.yield should have the MemRefsNormalizable trait.
      
      Reviewed By: bondhugula
      
      Differential Revision: https://reviews.llvm.org/D96821
      af8adea1
    • Simon Pilgrim's avatar
      [InstructionSimplify] SimplifyShift - rename shift amount KnownBits. NFCI. · 18b9fc48
      Simon Pilgrim authored
      As suggested on D97305.
      18b9fc48
    • Duncan P. N. Exon Smith's avatar
      Revert "Module: Use FileEntryRef and DirectoryEntryRef in Umbrella, Header, and DirectoryName, NFC" · 64d8c781
      Duncan P. N. Exon Smith authored
      This (mostly) reverts 32c501dd.  Hit a
      case where this causes a behaviour change, perhaps the same root cause
      that triggered the revert of a40db550 in
      7799ef71.
      
      (The API changes in DirectoryEntry.h have NOT been reverted as a number
      of subsequent commits depend on those.)
      
      https://reviews.llvm.org/D90497#2582166
      64d8c781
    • Craig Topper's avatar
      [LegalizeIntegerTypes] Improve ExpandIntRes_SADDSUBO codegen on targets without SADDO/SSUBO. · eb165090
      Craig Topper authored
      This code creates 3 setccs that need to be expanded. It was
      creating a sign bit test as setge X, 0 which is non-canonical.
      Canonical would be setgt X, -1. This misses the special case in
      IntegerExpandSetCCOperands for sign bit tests that assumes
      canonical form. If we don't hit this special case we end up
      with a multipart setcc instead of just checking the sign of
      the high part.
      
      To fix this I've reversed the polarity of all of the setccs to
      setlt X, 0 which is canonical. The rest of the logic should
      still work. This seems to produce better code on RISCV which
      lacks a setgt instruction.
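
      The sign-test logic can be sketched on a 64-bit value split into 32-bit
      halves. This is an illustration of the idea and not the legalizer code;
      `isNegativeViaHighPart` and `saddOverflows` are hypothetical helpers, and
      the unsigned-to-signed casts assume the usual two's complement
      conversion.
      ```cpp
      #include <cassert>
      #include <cstdint>

      // Sign test in "setlt X, 0" form, looking only at the high 32-bit half.
      bool isNegativeViaHighPart(std::int64_t X) {
        std::int32_t Hi =
            static_cast<std::int32_t>(static_cast<std::uint64_t>(X) >> 32);
        assert((Hi < 0) == (X < 0) && "the high part carries the sign bit");
        return Hi < 0;
      }

      // Signed-add overflow from three sign tests: the operands have the same
      // sign and the wrapped sum has a different one.
      bool saddOverflows(std::int64_t A, std::int64_t B) {
        std::int64_t Sum = static_cast<std::int64_t>(
            static_cast<std::uint64_t>(A) + static_cast<std::uint64_t>(B));
        bool ANeg = isNegativeViaHighPart(A);
        bool BNeg = isNegativeViaHighPart(B);
        bool SumNeg = isNegativeViaHighPart(Sum);
        return ANeg == BNeg && SumNeg != ANeg;
      }
      ```
      Because each test is "less than zero", only the high half of each value
      needs to be inspected, which is the special case the canonical form is
      meant to hit.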
      
      This probably still isn't the best code sequence we could use here.
      
      Reviewed By: RKSimon
      
      Differential Revision: https://reviews.llvm.org/D97181
      eb165090
    • Nick Desaulniers's avatar
      [THUMB2] add .w suffixes for ldr/str (immediate) T4 · 1e204ac7
      Nick Desaulniers authored
      The Linux kernel when built with CONFIG_THUMB2_KERNEL makes use of these
      instructions with immediate operands and wide encodings.
      
      These are the T4 variants of the following sections from the Arm ARM.
      F5.1.72 LDR (immediate)
      F5.1.229 STR (immediate)
      
      I wasn't able to represent these simple aliases using t2InstAlias due to
      the Constraints on the non-suffixed existing instructions, which results
      in some manual parsing logic needing to be added.
      
      F1.2 Standard assembler syntax fields
      describes the use of the .w (wide) vs .n (narrow) encoding suffix.
      
      Link: https://bugs.llvm.org/show_bug.cgi?id=49118
      Link: https://github.com/ClangBuiltLinux/linux/issues/1296
      
      
      Reported-by: Stefan Agner <stefan@agner.ch>
      Reported-by: Arnd Bergmann <arnd@kernel.org>
      Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
      
      Reviewed By: DavidSpickett
      
      Differential Revision: https://reviews.llvm.org/D96632
      1e204ac7
    • Emily Shi's avatar
      [darwin] use new crash reporter api · 956c90d3
      Emily Shi authored
      Add support for the new crash reporter api if the headers are available. Falls back to the old API if they are not available. This change was based on [[ https://github.com/llvm/llvm-project/blob/0164d546d2691c439fc06c8fff126224276c2d02/llvm/lib/Support/PrettyStackTrace.cpp#L111 | /llvm/lib/Support/PrettyStackTrace.cpp ]]
      
      There is a lit test for this behavior here: https://reviews.llvm.org/D96737 but it is not included in this diff because it is potentially flaky.
      
      rdar://69767688
      
      Reviewed By: delcypher, yln
      
      Committed by Dan Liew on behalf of Emily Shi.
      
      Differential Revision: https://reviews.llvm.org/D96830
      956c90d3
    • Emily Shi's avatar
      [darwin][asan] add test for application specific information in crash logs · b6099fa5
      Emily Shi authored
      Added a lit test that finds its corresponding crash log and checks to make sure it has asan output under `Application Specific Information`.
      
      This required adding two python commands:
      - `get_pid_from_output`: takes the output from the asan instrumentation and parses out the process ID
      - `print_crashreport_for_pid`: takes in the pid of the process and the file name of the binary that was run and prints the contents of the corresponding crash log.
      
      This test was added in preparation for changing the integration with crash reporter from the old api to the new api, which is implemented in a subsequent commit.
      
      rdar://69767688
      
      Reviewed By: delcypher
      
      Committed by Dan Liew on behalf of Emily Shi.
      
      Differential Revision: https://reviews.llvm.org/D96737
      b6099fa5
    • Jay Foad's avatar
    • Dave Lee's avatar
      [lldb] Add deref support and tests to shared_ptr synthetic · 0ac42fd2
      Dave Lee authored
      Add `frame variable` dereference support to libc++ `std::shared_ptr`.
      
      This change allows for commands like `v *thing_sp` and `v thing_sp->m_id`. These
      commands now work the same way they do with raw pointers. This is done by adding an
      unaccounted for child member named `$$dereference$$`.
      
      Also, add API tests for `std::shared_ptr`, previously there were none.
      
      Differential Revision: https://reviews.llvm.org/D97165
      0ac42fd2
    • Florian Hahn's avatar
      Revert "[LV] Allow tryToCreateWidenRecipe to return a VPValue, use for blends." · 437f0bbc
      Florian Hahn authored
      This reverts commit 4efa097e, because
      some of the compilers used for some bots do not support automatic
      conversions to PointerUnion.
      437f0bbc
    • Florian Hahn's avatar
      [LV] Allow tryToCreateWidenRecipe to return a VPValue, use for blends. · 4efa097e
      Florian Hahn authored
      Generalize the return value of tryToCreateWidenRecipe to return either a
      newly created recipe or an existing VPValue. Use this to avoid creating
      unnecessary VPBlendRecipes.
      
      Fixes PR44800.
      4efa097e
    • Nicolai Hähnle's avatar
      [AMDGPU][SelectionDAG] Don't combine uniform multiplies to MUL_[UI]24 · 52bc2e75
      Nicolai Hähnle authored
      Prefer to keep uniform (non-divergent) multiplies on the scalar ALU when
      possible. This significantly improves some game cases by eliminating
      v_readfirstlane instructions when the result feeds into a scalar
      operation, like the address calculation for a scalar load or store.
      
      Since isDivergent is only an approximation of whether a value is in
      SGPRs, it can potentially regress some situations where a uniform value
      ends up in a VGPR. These should be rare in real code, although the test
      changes do contain a number of examples.
      
      Most of the test changes are just using s_mul instead of v_mul/mad which
      is generally better for both register pressure and latency (at least on
      GFX10 where sgpr pressure doesn't affect occupancy and vector ALU
      instructions have significantly longer latency than scalar ALU). Some
      R600 tests now use MULLO_INT instead of MUL_UINT24.
      
      GlobalISel appears to handle more scenarios in the desirable way,
      although it can also be thrown off and fails to select the 24-bit
      multiplies in some cases.
      
      An alternative solution that was considered and rejected was to allow
      selecting MUL_[UI]24 to S_MUL_I32. I've rejected this because the
      definition of those SD operations is don't-care on the most significant 8 bits,
      and this fact is used in some combines via SimplifyDemandedBits.
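
      The don't-care property can be illustrated as follows. This sketch
      assumes that the unsigned 24-bit multiply reads only the low 24 bits of
      each operand and returns the low 32 bits of the product; `mulU24`,
      `mul32`, and `topBitsAreDontCare` are hypothetical helpers, with `mul32`
      standing in for a plain 32-bit multiply such as S_MUL_I32.
      ```cpp
      #include <cassert>
      #include <cstdint>

      // Assumed MUL_U24 semantics: only the low 24 bits of each operand count.
      std::uint32_t mulU24(std::uint32_t A, std::uint32_t B) {
        return (A & 0xFFFFFFu) * (B & 0xFFFFFFu);
      }

      // A plain 32-bit multiply that reads all 32 bits of its operands.
      std::uint32_t mul32(std::uint32_t A, std::uint32_t B) { return A * B; }

      // True if garbage placed in the top 8 bits of A leaves Mul(A, B) unchanged.
      bool topBitsAreDontCare(std::uint32_t (*Mul)(std::uint32_t, std::uint32_t),
                              std::uint32_t A, std::uint32_t B,
                              std::uint32_t Garbage) {
        assert(Garbage <= 0xFFu && "only the most significant 8 bits are varied");
        return Mul(A | (Garbage << 24), B) == Mul(A, B);
      }
      ```
      A combine that has already dropped the masking of the top 8 bits relies
      on the first behaviour, so selecting the plain multiply afterwards could
      change the result.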
      
      Based on a patch by Nicolai Hähnle.
      
      Differential Revision: https://reviews.llvm.org/D97063
      52bc2e75
    • Juneyoung Lee's avatar
      [JumpThreading] Update computeValueKnownInPredecessors to recognize logical and/or patterns · 19c2e129
      Juneyoung Lee authored
      This allows JumpThreading's computeValueKnownInPredecessors to
      recognize select form of and/or patterns as well.
      19c2e129