  1. Oct 24, 2020
  2. Oct 23, 2020
    • Artur Pilipenko's avatar
      GC-parseable element atomic memcpy/memmove · 6ec2c5e4
      Artur Pilipenko authored
      This change introduces a GC-parseable lowering for element atomic
      memcpy/memmove intrinsics. This way the runtime can provide an
      implementation which can take a safepoint during the copy operation.
      
      See "GC-parseable element atomic memcpy/memmove" thread on llvm-dev
      for the background and details:
      https://groups.google.com/g/llvm-dev/c/NnENHzmX-b8/m/3PyN8Y2pCAAJ
      
      Differential Revision: https://reviews.llvm.org/D88861
      6ec2c5e4
    • Nick Desaulniers's avatar
      [IR] add fn attr for no_stack_protector; prevent inlining on mismatch · b7926ce6
      Nick Desaulniers authored
      It's currently ambiguous in IR whether the source language explicitly
      did not want a stack protector (in C, via function attribute
      no_stack_protector) or doesn't care for any given function.
      
      It's common that code that manipulates the stack via inline assembly or
      that has to set up its own stack canary (such as the Linux kernel) would
      like to avoid stack protectors in certain functions. In this case, we've
      been bitten by numerous bugs where a callee with a stack protector is
      inlined into an __attribute__((__no_stack_protector__)) caller, which
      generally breaks the caller's assumptions about not having a stack
      protector. LTO exacerbates the issue.
      
      While developers can avoid this by putting all no_stack_protector
      functions in one translation unit together and compiling those with
      -fno-stack-protector, it's generally not very ergonomic or as
      ergonomic as a function attribute, and still doesn't work for LTO. See also:
      https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/
      https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u
      
      Typically, when inlining a callee into a caller, the caller will be
      upgraded in its level of stack protection (see adjustCallerSSPLevel()).
      By adding an explicit attribute in the IR when the function attribute is
      used in the source language, we can now identify such cases and prevent
      inlining.  Block inlining when one of the callee and caller has `nossp`
      and the other has `ssp`, `sspstrong`, or `sspreq`.
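
      As a rough illustration (not the code from this patch), the blocking
      rule can be modelled as a small predicate. The enum and function names
      below are hypothetical stand-ins for the IR attributes:

      ```cpp
      // Hypothetical stand-ins for the IR function attributes named above.
      enum class StackProtect { None, NoSSP, SSP, SSPStrong, SSPReq };

      // True if the function explicitly requests some level of stack protection.
      bool wantsStackProtector(StackProtect A) {
        return A == StackProtect::SSP || A == StackProtect::SSPStrong ||
               A == StackProtect::SSPReq;
      }

      // Inlining is blocked when one side is nossp and the other side has
      // ssp, sspstrong, or sspreq. Every other pairing is allowed in this sketch.
      bool sspCompatibleForInlining(StackProtect Caller, StackProtect Callee) {
        if (Caller == StackProtect::NoSSP && wantsStackProtector(Callee))
          return false;
        if (Callee == StackProtect::NoSSP && wantsStackProtector(Caller))
          return false;
        return true;
      }
      ```

      The check is symmetric on purpose: the mismatch matters regardless of
      which side carries `nossp`.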
      
      Fixes pr/47479.
      
      Reviewed By: void
      
      Differential Revision: https://reviews.llvm.org/D87956
      b7926ce6
    • Chen Zheng's avatar
      [LSR] ignore profitable chain when reg num is not major cost. · 1e0b6c1d
      Chen Zheng authored
      Reviewed By: samparker
      
      Differential Revision: https://reviews.llvm.org/D89665
      1e0b6c1d
    • Simon Pilgrim's avatar
      [InstCombine] matchBSwapOrBitReverse - expose bswap/bitreverse matching flags. · 1cab3bf0
      Simon Pilgrim authored
      matchBSwapOrBitReverse was hardcoded to just match bswaps - we're going to need to expose the ability to match bitreverse as well, so make this part of the function call.
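
      For reference, a minimal sketch of what the two operations compute on a
      32-bit value. These helpers are illustrative only and are not the
      matcher itself; their names are made up for this example:

      ```cpp
      #include <cstdint>

      // Reverses the order of the four bytes (what a bswap computes).
      std::uint32_t byteSwap32(std::uint32_t X) {
        return (X >> 24) | ((X >> 8) & 0x0000FF00u) |
               ((X << 8) & 0x00FF0000u) | (X << 24);
      }

      // Reverses the order of all 32 bits (what a bitreverse computes).
      std::uint32_t bitReverse32(std::uint32_t X) {
        std::uint32_t R = 0;
        for (int I = 0; I < 32; ++I) {
          R = (R << 1) | (X & 1u); // Move the lowest bit of X into R.
          X >>= 1;
        }
        return R;
      }
      ```

      Shift-and-mask sequences like these are the kind of pattern such a
      matcher looks for.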
      1cab3bf0
    • Simon Pilgrim's avatar
      [InstCombine] Rename InstCombinerImpl::matchBSwap to matchBSwapOrBitReverse. NFCI. · 19a13bf5
      Simon Pilgrim authored
      This matches bswap and bitreverse intrinsics, so we should make that clear in the function name.
      19a13bf5
    • OCHyams's avatar
      [mem2reg] Remove dbg.values describing contents of dead allocas · fea067bd
      OCHyams authored
      This patch copies @vsk's fix to instcombine from D85555 over to mem2reg. The
      motivation and rationale are exactly the same: When mem2reg removes an alloca,
      it erases the dbg.{addr,declare} instructions which refer to the alloca. It
      would be better to instead remove all debug intrinsics which describe the
      contents of the dead alloca, namely all dbg.value(<dead alloca>, ...,
      DW_OP_deref)'s.
      
      As far as I can tell, prior to D80264 these `dbg.value+deref`s would have been
      silently dropped instead of being made `undef`, so we're just returning to
      previous behaviour with these patches.
      
      Testing:
      `llvm-lit llvm/test` and `ninja check-clang` gave no unexpected failures. Added
      3 tests, each of which covers a dbg.value deletion path in mem2reg:
        mem2reg-promote-alloca-1.ll
        mem2reg-promote-alloca-2.ll
        mem2reg-promote-alloca-3.ll
      The first is based on the dexter test inlining.c from D89543. This patch also
      improves the debugging experience for loop.c from D89543, which suffers
      similarly after arg promotion instead of inlining.
      fea067bd
    • Caroline Concatto's avatar
      [SVE]Clarify TypeSize comparisons in llvm/lib/Transforms · 24156364
      Caroline Concatto authored
      Use isKnownXY comparators when one of the operands can be a
      scalable vector, and getFixedSize() in all other cases.
      
      This patch also fixes bugs around getPrimitiveSizeInBits by using
      getFixedSize() near the places with the TypeSize comparison.
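
      To illustrate why a plain `<` is not enough for scalable sizes, here is
      a simplified standalone model of a "known less than" comparison. The
      `SizeModel` struct is an assumption made for this sketch and is not the
      real TypeSize class:

      ```cpp
      #include <cstdint>

      // Hypothetical model: the size is MinValue bits, multiplied by an
      // unknown runtime factor vscale >= 1 when Scalable is true.
      struct SizeModel {
        std::uint64_t MinValue;
        bool Scalable;
      };

      // True only if LHS < RHS holds for every possible vscale.
      bool isKnownLT(SizeModel LHS, SizeModel RHS) {
        // If LHS is fixed, or both sides scale by the same factor, comparing
        // the minimum values is sufficient.
        if (!LHS.Scalable || RHS.Scalable)
          return LHS.MinValue < RHS.MinValue;
        // A scalable LHS can grow past any fixed RHS, so nothing is known.
        return false;
      }
      ```

      Note that "not known to be less than" does not imply "known to be
      greater or equal", which is why the comparison has to be spelled out.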
      
      Differential Revision: https://reviews.llvm.org/D89703
      24156364
    • Max Kazantsev's avatar
      [SCEV][NFC] Cache symbolic max exit count · 6e574abf
      Max Kazantsev authored
      We want to have a caching version of symbolic BE exit count
      rather than recomputing it every time we need it.
      
      Differential Revision: https://reviews.llvm.org/D89954
      Reviewed By: nikic, efriedma
      6e574abf
    • Arthur Eubanks's avatar
      [Inliner] Run always-inliner in inliner-wrapper · 0291e2c9
      Arthur Eubanks authored
      An alwaysinline function may not get inlined in inliner-wrapper due to
      the inlining order.
      
      Previously for the following, the inliner would first inline @a() into @b(),
      
      ```
      define void @a() {
      entry:
        call void @b()
        ret void
      }
      
      define void @b() alwaysinline {
      entry:
        br label %for.cond
      
      for.cond:
        call void @a()
        br label %for.cond
      }
      ```
      
      making @b() recursive and unable to be inlined into @a(), ending at
      
      ```
      define void @a() {
      entry:
        call void @b()
        ret void
      }
      
      define void @b() alwaysinline {
      entry:
        br label %for.cond
      
      for.cond:
        call void @b()
        br label %for.cond
      }
      ```
      
      Running always-inliner first makes sure that we respect alwaysinline in more cases.
      
      Fixes https://bugs.llvm.org/show_bug.cgi?id=46945.
      
      Reviewed By: davidxl, rnk
      
      Differential Revision: https://reviews.llvm.org/D86988
      0291e2c9
  3. Oct 22, 2020
    • Vedant Kumar's avatar
      Revert "[CodeExtractor] Don't create bitcasts when inserting lifetime markers (NFCI)" · 099bffe7
      Vedant Kumar authored
      This reverts commit 26ee8aff.
      
      It's necessary to insert a bitcast of the pointer operand of a lifetime
      marker if it has an opaque pointer type.
      
      rdar://70560161
      099bffe7
    • Arthur Eubanks's avatar
      Port -instnamer to NPM · 92d9a386
      Arthur Eubanks authored
      Some clang tests use this.
      
      Reviewed By: akhuang
      
      Differential Revision: https://reviews.llvm.org/D89931
      92d9a386
    • Layton Kifer's avatar
      [InstCombine][NFC] Use ConstantExpr::getBinOpIdentity · d49911c2
      Layton Kifer authored
      Delete duplicate implementation getSelectFoldableConstant and
      replace with ConstantExpr::getBinOpIdentity.
      
      Differential Revision: https://reviews.llvm.org/D89839
      d49911c2
    • Nikita Popov's avatar
      [MemCpyOpt] Move GEP during call slot optimization · 3e375431
      Nikita Popov authored
      When performing a call slot optimization to a GEP destination, it
      will currently usually fail, because the GEP is directly before the
      memcpy and as such does not dominate the call. We should move it
      above the call if that satisfies the domination requirement.
      
      I think that a constant-index GEP is the only useful thing to move
      here, as otherwise isDereferenceablePointer couldn't look through
      it anyway. As such I'm not trying to generalize this further.
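
      A source-level sketch of the idea, assuming the usual legality
      conditions for call slot optimization hold. The struct and function
      names are invented for illustration; the real transform works on IR:

      ```cpp
      #include <cstring>

      struct Packet {
        int Header;
        int Payload[4];
      };

      // Callee that initializes the buffer it is given.
      void fillPayload(int *Out) {
        for (int I = 0; I < 4; ++I)
          Out[I] = I * 10;
      }

      // Before: the callee writes to a temporary, which is then copied.
      // The address of P->Payload (a constant-index GEP in IR) is only
      // computed right before the memcpy, i.e. after the call.
      void buildViaTemporary(Packet *P) {
        int Tmp[4];
        fillPayload(Tmp);
        std::memcpy(P->Payload, Tmp, sizeof(Tmp));
      }

      // After (conceptually): the destination address is computed above the
      // call, so the callee can write into the destination directly.
      void buildDirect(Packet *P) {
        int *Dest = P->Payload;
        fillPayload(Dest);
      }
      ```

      Both functions leave the same bytes in `Payload`; the second one avoids
      the temporary and the copy.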
      
      Differential Revision: https://reviews.llvm.org/D89623
      3e375431
    • Ettore Tiotto's avatar
      [NFC][PartialInliner]: Clean up code · e6521ce0
      Ettore Tiotto authored
      Make member functions const where possible, use LLVM_DEBUG to print debug traces
      rather than a custom option, pass by reference to avoid null checking, ...
      
      Reviewed By: fhann
      
      Differential Revision: https://reviews.llvm.org/D89895
      e6521ce0
    • Vedant Kumar's avatar
      [InstCombine] Remove dbg.values describing contents of dead allocas · 3419252a
      Vedant Kumar authored
      When InstCombine removes an alloca, it erases the dbg.{addr,declare}
      instructions which refer to the alloca. It would be better to instead
      remove all debug intrinsics which describe the contents of the dead
      alloca, namely all dbg.value(<dead alloca>, ..., DW_OP_deref)'s.
      
      This effectively undoes work performed in an InstCombine run earlier in
      the pipeline by LowerDbgDeclare, which inserts DW_OP_deref dbg.values
      before CallInst users of an alloca. The motivating example looks like:
      
      ```
        define void @foo(i32 %0) {
          %a = alloca i32              ; This alloca is erased.
          store i32 %0, i32* %a
          dbg.value(i32 %0, "arg0")    ; This dbg.value survives.
          dbg.value(i32* %a, "arg0", DW_OP_deref)
          call void @trivially_inlinable_no_op(i32* %a)
          ret void
        }
      ```
      
      If the DW_OP_deref dbg.value is not erased, it becomes dbg.value(undef)
      after inlining, making "arg0" unavailable. But we already have dbg.value
      descriptions of the alloca's value (from LowerDbgDeclare), so the
      DW_OP_deref dbg.value cannot serve its purpose of describing an
      initialization of the alloca by some callee. It invalidates other useful
      dbg.values, causing large gaps in location coverage, so we should delete
      it (even though doing so may cause stale dbg.values to appear, if
      there's a dead store to `%a` in @trivially_inlinable_no_op).
      
      OTOH, it wouldn't be correct to delete all dbg.value descriptions of an
      alloca. Note that it's possible to describe a variable that takes on
      different pointer values, e.g.:
      
      ```
        void use(int *);
        void t(int a, int b) {
          int *local = &a;     // dbg.value(i32* %a.addr, "local")
          local = &b;          // dbg.value(i32* undef, "local")
          use(&a);             //           (note: %b.addr is optimized out)
          local = &a;          // dbg.value(i32* %a.addr, "local")
        }
      ```
      
      In this example, the alloca for "b" is erased, but we need to describe
      the value of "local" as <unavailable> before the call to "use". This
      prevents "local" from appearing to be equal to "&a" at the callsite.
      
      rdar://66592859
      
      Differential Revision: https://reviews.llvm.org/D85555
      3419252a
    • Serguei Katkov's avatar
      [IRCE] consolidate profitability check · 75d0e0cd
      Serguei Katkov authored
      Use BFI if it is available and BPI otherwise.
      This is a promised follow-up after D89541.
      
      Reviewers: ebrevnov, mkazantsev
      Reviewed By: ebrevnov
      Subscribers: llvm-commits
      Differential Revision: https://reviews.llvm.org/D89773
      75d0e0cd
    • Zequan Wu's avatar
      Revert "Revert "SimplifyCFG: Clean up optforfuzzing implementation"" · 2f293411
      Zequan Wu authored
      This reverts commit 716f7636.
      2f293411
    • Zequan Wu's avatar
      Revert "SimplifyCFG: Clean up optforfuzzing implementation" · 716f7636
      Zequan Wu authored
      See discussion: https://reviews.llvm.org/D89590
      This reverts commit cdd006ee.
      716f7636
  4. Oct 21, 2020
  5. Oct 20, 2020
    • Nicolai Hähnle's avatar
      DomTree: Extract (mostly) read-only logic into type-erased base classes · 848a68a0
      Nicolai Hähnle authored
      Avoid having to instantiate and compile a subset of the dominator tree logic
      separately for each node type. More importantly, this allows generic
      algorithms to be built on top of dominator trees without writing them as
      templates -- such algorithms can now use opaque CfgBlockRef and
      CfgInterface instead.
      
      A type-erased implementation of dominator trees could be written in
      terms of CfgInterface as well, but doing so would change the current
      trade-off: it would slightly reduce code size at the cost of a slight
      runtime overhead.
      
      This patch does not change the trade-off, as it only does type-erasure
      where basic blocks can be treated in a fully opaque way, i.e. it only
      moves methods that don't require iteration over CFG successors and
      predecessors.
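
      A minimal sketch of the layering, under simplifying assumptions: the
      class names are hypothetical, and the dominance query walks idom links
      rather than using whatever numbering the real tree relies on. The point
      is only that the read-only logic lives in a non-template base:

      ```cpp
      #include <unordered_map>

      // Non-template base: read-only queries over opaque block keys.
      class DomTreeBaseSketch {
      protected:
        // Maps each opaque block key to the key of its immediate dominator.
        std::unordered_map<const void *, const void *> IDom;

      public:
        // Walks idom links upward from B; needs no knowledge of the block type.
        bool dominates(const void *A, const void *B) const {
          for (const void *N = B; N != nullptr;) {
            if (N == A)
              return true;
            auto It = IDom.find(N);
            N = (It == IDom.end()) ? nullptr : It->second;
          }
          return false;
        }
      };

      // Thin typed wrapper: the only part instantiated per block type.
      template <typename BlockT>
      class DomTreeSketch : public DomTreeBaseSketch {
      public:
        void setIDom(const BlockT *B, const BlockT *Dom) { IDom[B] = Dom; }

        bool dominates(const BlockT *A, const BlockT *B) const {
          return DomTreeBaseSketch::dominates(static_cast<const void *>(A),
                                              static_cast<const void *>(B));
        }
      };
      ```

      Only the wrapper is compiled per node type; the walk itself is compiled
      once.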
      
      v5:
      - rename generic_{begin,end,children} back without the generic_ prefix
        and refer explicitly to base class methods in NewGVN, which wants to
        mutate the order of dominator tree node children directly
      
      v6:
      - style change: iDom -> idom; it's arguable whether this is really
        invalid, since it is actually standard camelCase, but clang-tidy
        complains about it so... *shrug*
      - rename {to,from}Generic -> {wrap,unwrap}Ref
      
      Change-Id: Ib860dc04cf8bb093d8ed00be7def40d662213672
      
      Differential Revision: https://reviews.llvm.org/D83089
      848a68a0
    • Ta-Wei Tu's avatar
      [NPM] port -unify-loop-exits to NPM · 529ecd19
      Ta-Wei Tu authored
      Reviewed By: aeubanks
      
      Differential Revision: https://reviews.llvm.org/D89774
      529ecd19
    • Ta-Wei Tu's avatar
      [NPM] Port -mergereturn to NPM · 59286b36
      Ta-Wei Tu authored
      Reviewed By: aeubanks
      
      Differential Revision: https://reviews.llvm.org/D89781
      59286b36
    • Florian Hahn's avatar
      [DSE] Do not scan users of memory terminators for further reads. · 2e580102
      Florian Hahn authored
      isMemTerminator checks if the current def is a memory terminator that
      terminates the memory pointed to by DefLoc. We do not have to add any of
      their users to the worklist, because the follow-on users cannot read the
      memory in question.
      
      This leads to more stores eliminated in the presence of lifetime calls.
      Previously we added the users of those intrinsics to the worklist,
      limiting elimination.
      
      In terms of removed stores, this gives a nice boost on some benchmarks
      (MultiSource/SPEC2000/SPEC2006 on X86 with -flto -O3):
      
      Same hash: 205 (filtered out)
      Remaining: 32
      Metric: dse.NumFastStores
      
      Program                                          base   patch   diff
       test-suite...000/197.parser/197.parser.test     4.00    8.00  100.0%
       test-suite...rolangs-C++/family/family.test     4.00    7.00  75.0%
       test-suite...marks/7zip/7zip-benchmark.test   1722.00 2189.00 27.1%
       test-suite...CFP2000/177.mesa/177.mesa.test    30.00   38.00  26.7%
       test-suite :: External/Nurbs/nurbs.test        44.00   49.00  11.4%
       test-suite...lications/sqlite3/sqlite3.test   115.00  128.00  11.3%
       test-suite...006/447.dealII/447.dealII.test   2715.00 3013.00 11.0%
       test-suite...ProxyApps-C++/CLAMR/CLAMR.test   237.00  261.00  10.1%
       test-suite...tions/lambda-0.1.3/lambda.test    40.00   44.00  10.0%
       test-suite...3.xalancbmk/483.xalancbmk.test   1366.00 1475.00  8.0%
       test-suite...abench/jpeg/jpeg-6a/cjpeg.test    13.00   14.00   7.7%
       test-suite...oxyApps-C++/miniFE/miniFE.test    43.00   46.00   7.0%
       test-suite...lications/ClamAV/clamscan.test   230.00  246.00   7.0%
       test-suite...006/450.soplex/450.soplex.test   284.00  299.00   5.3%
       test-suite...nsumer-jpeg/consumer-jpeg.test    21.00   22.00   4.8%
      2e580102
    • Florian Hahn's avatar
      [DSE] Bail out from getLocForWriteEx if call is not argmemonly/inacc_mem. · 6439fde6
      Florian Hahn authored
      This change should currently not have any impact, but guards against
      further inconsistencies between MemoryLocation and function attributes.
      6439fde6
    • Simon Pilgrim's avatar
      [InstCombine] Add or((icmp ult/ule (A + C1), C3), (icmp ult/ule (A + C2), C3))... · e372a5f8
      Simon Pilgrim authored
      [InstCombine] Add or((icmp ult/ule (A + C1), C3), (icmp ult/ule (A + C2), C3)) uniform vector support
      
      Reapplied rGa704d8238c86 with a check for integer/integer vector types to prevent matching with pointer types.
      e372a5f8
    • Nicolai Hähnle's avatar
      Introduce CfgTraits abstraction · c0cdd22c
      Nicolai Hähnle authored
      The CfgTraits abstraction simplifies writing algorithms that are
      generic over the type of CFG, and enables writing such algorithms
      as regular non-template code that operates on opaque references
      to CFG blocks and values.
      
      Implementations of CfgTraits provide operations on the concrete
      CFG types, e.g. `IrCfgTraits::BlockRef` is `BasicBlock *`.
      
      CfgInterface is an abstract base class which provides operations
      on opaque types CfgBlockRef and CfgValueRef. Those opaque types
      encapsulate a `void *`, but the meaning depends on the concrete
      CFG type. For example, MachineCfgTraits -- for use with MachineIR
      in SSA form -- encodes a Register inside CfgValueRef. Converting
      between concrete references and opaque/generic ones is done by
      CfgTraits::{fromGeneric,toGeneric}. Convenience methods
      CfgTraits::{un}wrap{Iterator,Range} are available as well.
      
      Writing algorithms in terms of CfgInterface adds some overhead
      (virtual method calls, plus in some cases it removes the
      opportunity to inline iterators), but can be much more convenient
      since generic algorithms can be written as non-templates.
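
      A toy version of the pattern, to make the trade-off concrete. All type
      names here (`ToyBlock`, `ToyCfgTraits`, and so on) are invented for this
      sketch and only mirror the shape of the abstraction described above:

      ```cpp
      #include <cstddef>
      #include <string>
      #include <vector>

      // Hypothetical opaque block reference: it only carries a void *.
      struct OpaqueBlockRef {
        void *Ptr;
      };

      // Hypothetical concrete block type for this sketch.
      struct ToyBlock {
        std::string Name;
        std::vector<ToyBlock *> Succs;
      };

      // Traits for the concrete CFG: convert between concrete and opaque refs.
      struct ToyCfgTraits {
        using BlockRef = ToyBlock *;
        static OpaqueBlockRef wrapRef(BlockRef B) { return OpaqueBlockRef{B}; }
        static BlockRef unwrapRef(OpaqueBlockRef R) {
          return static_cast<ToyBlock *>(R.Ptr);
        }
      };

      // Abstract interface: generic algorithms see only opaque references.
      class ToyCfgInterface {
      public:
        virtual ~ToyCfgInterface() = default;
        virtual std::vector<OpaqueBlockRef> successors(OpaqueBlockRef B) const = 0;
        virtual std::string name(OpaqueBlockRef B) const = 0;
      };

      // Implementation that forwards to the concrete CFG through the traits.
      class ToyCfgInterfaceImpl final : public ToyCfgInterface {
      public:
        std::vector<OpaqueBlockRef> successors(OpaqueBlockRef B) const override {
          std::vector<OpaqueBlockRef> Result;
          for (ToyBlock *S : ToyCfgTraits::unwrapRef(B)->Succs)
            Result.push_back(ToyCfgTraits::wrapRef(S));
          return Result;
        }
        std::string name(OpaqueBlockRef B) const override {
          return ToyCfgTraits::unwrapRef(B)->Name;
        }
      };

      // A non-template algorithm: it pays for virtual calls, but is compiled once.
      std::size_t countSuccessors(const ToyCfgInterface &Cfg, OpaqueBlockRef B) {
        return Cfg.successors(B).size();
      }
      ```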
      
      This patch adds implementations of CfgTraits for all CFGs on
      which dominator trees are calculated, so that the dominator
      tree can be ported to this machinery. Only IrCfgTraits (LLVM IR)
      and MachineCfgTraits (Machine IR in SSA form) are complete, the
      other implementations are limited to the absolute minimum
      required to make the upcoming dominator tree changes work.
      
      v5:
      - fix MachineCfgTraits::blockdef_iterator and allow it to iterate over
        the instructions in a bundle
      - use MachineBasicBlock::printName
      
      v6:
      - implement predecessors/successors for all CfgTraits implementations
      - fix error in unwrapRange
      - rename toGeneric/fromGeneric into wrapRef/unwrapRef to have naming
        that is consistent with {wrap,unwrap}{Iterator,Range}
      - use getVRegDef instead of getUniqueVRegDef
      
      v7:
      - std::forward fix in wrapping_iterator
      - fix typos
      
      v8:
      - cleanup operators on CfgOpaqueType
      - address other review comments
      
      Change-Id: Ia75f4f268fded33fca11218a7d578c9aec1f3f4d
      
      Differential Revision: https://reviews.llvm.org/D83088
      c0cdd22c
    • Atmn Patel's avatar
      [IR] Adds mustprogress as a LLVM IR attribute · 595c6156
      Atmn Patel authored
      This adds the LLVM IR attribute `mustprogress` as defined in LangRef through D86233. This attribute will be applied to functions in languages like C++ where forward progress is guaranteed. Functions without this attribute are not required to make progress.
      
      Reviewed By: nikic
      
      Differential Revision: https://reviews.llvm.org/D85393
      595c6156
    • Serguei Katkov's avatar
      [IRCE] Do not transform if loop has small number of iterations · 38799975
      Serguei Katkov authored
      IRCE has some overhead for runtime checks, and when the number of iterations is small
      the overhead can outweigh the benefit from optimizations.
      
      This CL uses the BlockFrequencyInfo of the pre-header and header to estimate the
      number of loop iterations. If it is less than irce-min-estimated-iters we do not transform the loop.
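
      The estimate can be sketched as a ratio of the two block frequencies.
      The function below is a hypothetical helper, not the code from this CL,
      and its handling of zero frequencies is an assumption of the sketch:

      ```cpp
      #include <cstdint>

      // Returns true if the loop looks hot enough to be worth transforming.
      // HeaderFreq / PreheaderFreq approximates iterations per loop entry.
      bool hasEnoughEstimatedIters(std::uint64_t HeaderFreq,
                                   std::uint64_t PreheaderFreq,
                                   std::uint64_t MinEstimatedIters) {
        // Assumption: without usable frequency data, do not block the transform.
        if (HeaderFreq == 0 || PreheaderFreq == 0)
          return true;
        return HeaderFreq / PreheaderFreq >= MinEstimatedIters;
      }
      ```

      For example, a header frequency of 1000 against a pre-header frequency
      of 10 suggests roughly 100 iterations per entry.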
      
      A more complex cost model would probably be better, but for simplicity this seems to be enough.
      
      BFI usage is added only for the new pass manager, and the patch tries to use it efficiently.
      
      Reviewers: ebrevnov, dantrushin, asbirlea, mkazantsev
      Reviewed By: mkazantsev
      Subscribers: llvm-commits, fhahn
      Differential Revision: https://reviews.llvm.org/D89541
      38799975
    • Jordan Rupprecht's avatar
      [NFC] Inline assertion-only variable · 8a377f1e
      Jordan Rupprecht authored
      8a377f1e
  6. Oct 19, 2020
    • Roman Lebedev's avatar
      [NFCI][SCEV] Always refer to enum SCEVTypes as enum, not integer · e0567582
      Roman Lebedev authored
      The main tricky thing here is forward-declaring the enum:
      we have to specify its underlying data type.
      
      In particular, this avoids the danger of switching over the SCEVTypes,
      but actually switching over an integer, and not being notified
      when some case is not handled.
      
      I have updated most of such switches to be exhaustive and not have
      a default case, where it's pretty obvious to be the intent,
      however not all of them.
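
      A small self-contained sketch of both points, using a made-up enum
      (`ExprKind`) rather than the real SCEVTypes:

      ```cpp
      // An unscoped enum can only be forward-declared when its underlying
      // type is spelled out.
      enum ExprKind : unsigned short;

      // A declaration may now refer to the enum type before its definition.
      const char *kindName(ExprKind K);

      enum ExprKind : unsigned short { KindConstant, KindAdd, KindMul };

      const char *kindName(ExprKind K) {
        // No default case: switching over the enum type (not an integer) lets
        // the compiler warn, e.g. with -Wswitch, when an enumerator is missing.
        switch (K) {
        case KindConstant:
          return "constant";
        case KindAdd:
          return "add";
        case KindMul:
          return "mul";
        }
        return "unknown"; // Only reached for values outside the enumerators.
      }
      ```

      Had `K` been passed as a plain integer, adding a new enumerator would
      not produce any diagnostic at the switch.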
      e0567582