  1. Sep 21, 2021
  2. Sep 20, 2021
    • [IR] Add helper to convert offset to GEP indices · dd022656
      Nikita Popov authored
      We implement logic to convert a byte offset into a sequence of GEP
      indices for that offset in a number of places. This patch adds a
      DataLayout::getGEPIndicesForOffset() method, which implements the
      core logic. I've updated SROA, ConstantFolding and InstCombine to
      use it, and there are a few more places where it looks relevant.
      
      Differential Revision: https://reviews.llvm.org/D110043
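The offset-to-indices conversion described in this commit can be illustrated with a toy model. This is a minimal sketch, not the LLVM implementation: the `Scalar`, `Array`, and `Struct` classes and the function `gep_indices_for_offset` are hypothetical names invented for illustration, and the sketch assumes non-zero type sizes and that every struct has a field at offset 0.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Scalar:
    size: int  # size in bytes


@dataclass(frozen=True)
class Array:
    elem: "Ty"
    count: int


@dataclass(frozen=True)
class Struct:
    fields: Tuple[Tuple[int, "Ty"], ...]  # (byte offset, field type), sorted by offset
    size: int  # total size in bytes, including padding


Ty = Union[Scalar, Array, Struct]


def size_of(ty: Ty) -> int:
    """Size in bytes of a toy type."""
    if isinstance(ty, Array):
        return size_of(ty.elem) * ty.count
    return ty.size


def gep_indices_for_offset(ty: Ty, offset: int) -> Tuple[List[int], Ty, int]:
    """Return (indices, innermost type reached, leftover byte offset)."""
    # The first index steps over whole objects of the outer type.
    indices = [offset // size_of(ty)]
    offset %= size_of(ty)
    # Descend into aggregates until the offset is consumed or a scalar is hit.
    while offset != 0:
        if isinstance(ty, Array):
            elem_size = size_of(ty.elem)
            indices.append(offset // elem_size)
            offset %= elem_size
            ty = ty.elem
        elif isinstance(ty, Struct):
            # Pick the last field that starts at or before the offset.
            index = max(i for i, (start, _) in enumerate(ty.fields) if start <= offset)
            start, field_ty = ty.fields[index]
            indices.append(index)
            offset -= start
            ty = field_ty
        else:
            break  # scalar reached: the leftover offset cannot be indexed
    return indices, ty, offset


# Toy layout: struct { 4-byte scalar; array of four 2-byte scalars }, 12 bytes.
S = Struct(fields=((0, Scalar(4)), (4, Array(Scalar(2), 4))), size=12)
```

With this layout, byte offset 18 maps to indices `[1, 1, 1]` with nothing left over, while byte offset 5 lands in the middle of a 2-byte element and leaves a remainder of 1 that the caller has to handle separately.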
    • [DSE] Add additional tests to cover review comments. · 963d3a22
      Florian Hahn authored
      Adds additional tests following comments from D109844.
      
      Also removes unused in.ptr arguments and places in the call tests that
      used loads instead of a getval call.
    • [SLP]Improve graph reordering. · bc69dd62
      Alexey Bataev authored
      Reworked the reordering algorithm. Originally, the compiler just tried to
      detect the most common order in the reorderable nodes (loads, stores,
      extractelements, extractvalues) and then fully rebuilt the graph in
      the best order. This was not efficient, since it required extra
      memory and time for building/rebuilding the tree and doubled the use of the
      scheduling budget, which could lead to missed vectorization due to
      exhausted scheduling resources.
      
      The patch provides a two-step approach to the graph reordering problem. All
      reordering is now done in place: it does not require deleting or
      rebuilding the tree, it just rotates the scalars/orders/reuses masks in
      the graph nodes.
      
      The first step (top-to-bottom) rotates the whole graph, similarly to the previous
      implementation. The compiler counts the most used orders of
      the graph nodes with the same vectorization factor and then rotates the
      subgraph with the given vectorization factor to the most used order, if
      it is not empty. It then repeats the same procedure for the subgraphs with
      smaller vectorization factors. We can do this because we still need
      to reshuffle the smaller subgraph when building operands for the graph
      nodes with a larger vectorization factor, so we can rotate just the subgraph,
      not the whole graph.
      
      The second step (bottom-to-top) scans through the leaves and tries to
      detect the users of the leaves which can be reordered. If the leaves can
      be reordered in the best fashion, they are reordered, and their users too.
      This removes double shuffles to the same ordering of the operands in
      many cases by reordering the user operations instead. It also moves
      the final shuffles closer to the top of the graph and in many cases
      removes an extra shuffle, because the same procedure is repeated
      and we can again merge some reordering masks and reorder user nodes
      instead of the operands.
      
      Also, the patch improves the cost model for gathering of loads, which improves
      the x264 benchmark in some cases.
      
      Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264,
      +3% for 508.namd, and improves most of the other benchmarks.
      Compile and link times are almost the same, though in some cases they
      should be better (we're not doing an extra instruction scheduling
      anymore), and we may vectorize more code for large basic blocks again
      because of the saved scheduling budget.
      
      Differential Revision: https://reviews.llvm.org/D105020
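The counting part of the first (top-to-bottom) step can be sketched as follows. This is a simplified illustration under stated assumptions, not the SLP vectorizer's code: the function name `pick_orders` and the representation of a node as a `(vectorization factor, order)` pair are hypothetical, and an empty tuple is assumed to stand for "no reordering needed".

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Order = Tuple[int, ...]  # () is assumed to mean "already in natural order"


def pick_orders(nodes: List[Tuple[int, Order]]) -> List[Tuple[int, Order]]:
    """For each vectorization factor, largest first, pick the most used order.

    Factors whose most used order is empty are skipped, mirroring the
    "if it is not empty" condition in the commit message.
    """
    by_vf: Dict[int, Counter] = defaultdict(Counter)
    for vf, order in nodes:
        by_vf[vf][order] += 1

    chosen = []
    for vf in sorted(by_vf, reverse=True):
        best, _count = by_vf[vf].most_common(1)[0]
        if best:  # a non-empty order means the subgraph would be rotated
            chosen.append((vf, best))
    return chosen


# Hypothetical graph: three nodes with factor 4, three nodes with factor 2.
example_nodes = [
    (4, (1, 0, 3, 2)),
    (4, (1, 0, 3, 2)),
    (4, ()),
    (2, ()),
    (2, ()),
    (2, (1, 0)),
]
```

In this example only the factor-4 subgraph would be rotated, because the most used order among the factor-2 nodes is the empty one.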
    • [Analysis] Add support for vscale in computeKnownBitsFromOperator · f988f680
      David Sherwood authored
      In ValueTracking.cpp we use a function called
      computeKnownBitsFromOperator to determine the known bits of a value.
      For the vscale intrinsic, if the function contains the vscale_range
      attribute, we can use the maximum and minimum values of vscale to
      determine some known zero and one bits. This should help to improve
      code quality by allowing certain optimisations to take place.
      
      Tests added here:
      
        Transforms/InstCombine/icmp-vscale.ll
      
      Differential Revision: https://reviews.llvm.org/D109883
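One way the vscale bounds can yield known bits is sketched below. This is an illustration of the idea, not the code in ValueTracking.cpp: the function name `known_bits_for_vscale` is hypothetical, and the sketch assumes that a maximum of 0 means "no upper bound", in which case nothing is known.

```python
from typing import Tuple


def known_bits_for_vscale(bit_width: int, vmin: int, vmax: int) -> Tuple[int, int]:
    """Return (known_zero, known_one) bit masks for a value in [vmin, vmax]."""
    mask = (1 << bit_width) - 1
    if vmax == 0:
        # Assumption of this sketch: 0 encodes an unbounded maximum.
        return 0, 0
    if vmin == vmax:
        # The value is a compile-time constant, so every bit is known.
        return (~vmin) & mask, vmin & mask
    # Every bit above the highest set bit of the maximum must be zero.
    first_zero_high_bit = vmax.bit_length()
    known_zero = mask & ~((1 << first_zero_high_bit) - 1)
    return known_zero, 0
```

For a 64-bit value with bounds 1 and 16, bits 5 and above are known to be zero, which is the kind of fact that lets a comparison against a larger constant fold away.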
    • [NFC] Add assert and test showing that revert of D109596 wasn't justified · e9d34c54
      Max Kazantsev authored
      All IndVars transforms require LCSSA and LoopSimplify form as a prerequisite
      and rely on it. Added a test that shows that this actually holds.
    • Revert "Revert "[IndVars] Replace PHIs if loop exits on 1st iteration"" · 471217cf
      Max Kazantsev authored
      This reverts commit 6fec6552.
      
      The patch was reverted on the incorrect claim that it may break LCSSA form
      when the loop is not in simplify form. All IndVars transforms ensure that
      the loop is in simplify and LCSSA form, so if it wasn't broken before this
      transform, it will also not be broken after it.
    • [SCEV] Support negative values in signed/unsigned predicate reasoning · def15c5f
      Max Kazantsev authored
      There is a piece of logic that uses the fact that signed and unsigned
      versions of the same predicate are equivalent when both values are
      non-negative. It's also true when both of them are negative.
      
      Differential Revision: https://reviews.llvm.org/D109957
      Reviewed By: nikic
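The claim that signed and unsigned comparisons agree whenever both operands have the same sign can be checked exhaustively on a small bit width. This is a standalone sanity check, not SCEV code; the 8-bit width and all helper names are choices made for this sketch.

```python
from itertools import product

BITS = 8  # small width chosen so the check can be exhaustive


def to_signed(x: int) -> int:
    """Interpret an unsigned BITS-wide value as two's complement."""
    return x - (1 << BITS) if x & (1 << (BITS - 1)) else x


def same_sign(a: int, b: int) -> bool:
    """True if both values are non-negative or both are negative."""
    return (to_signed(a) < 0) == (to_signed(b) < 0)


def predicates_agree(a: int, b: int) -> bool:
    """True if unsigned a < b and signed a < b give the same answer."""
    return (a < b) == (to_signed(a) < to_signed(b))


def check_all() -> bool:
    """Check every same-sign pair of BITS-wide values."""
    values = range(1 << BITS)
    return all(
        predicates_agree(a, b)
        for a, b in product(values, repeat=2)
        if same_sign(a, b)
    )
```

The restriction to same-sign pairs matters: for the bit patterns 1 and 0xFF (which is -1 when signed), the unsigned comparison says 1 is smaller while the signed comparison says it is larger.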
  3. Sep 19, 2021
  4. Sep 18, 2021
  5. Sep 17, 2021
  6. Sep 16, 2021