  1. Jan 19, 2019
  2. Jan 18, 2019
    • Florian Hahn's avatar
      [LCSSA] Skip blocks in sub-loops when scanning for uses. · be7cbe3f
      Florian Hahn authored
      Summary:
      Scanning blocks in sub-loops for uses is unnecessary, as they were
      already handled while dealing with the containing sub-loop.
      
      This speeds up LCSSA for highly nested loops. For the test case in PR37202, it
      halves the time spent in LCSSA. In cases where we cannot skip
      any blocks, the additional lookup should be negligible.
      
      Time-passes without this patch for test case from PR37202:
      
        Total Execution Time: 48.5505 seconds (48.5511 wall clock)
      
         ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
        10.0822 ( 21.0%)   0.1406 ( 27.0%)  10.2228 ( 21.1%)  10.2228 ( 21.1%)  Loop-Closed SSA Form Pass
        10.0417 ( 20.9%)   0.1467 ( 28.2%)  10.1884 ( 21.0%)  10.1890 ( 21.0%)  Loop-Closed SSA Form Pass #2
         4.2703 (  8.9%)   0.0040 (  0.8%)   4.2742 (  8.8%)   4.2742 (  8.8%)  Unswitch loops
         2.7376 (  5.7%)   0.0229 (  4.4%)   2.7605 (  5.7%)   2.7611 (  5.7%)  Loop-Closed SSA Form Pass #5
         2.7332 (  5.7%)   0.0214 (  4.1%)   2.7546 (  5.7%)   2.7546 (  5.7%)  Loop-Closed SSA Form Pass #3
         2.7088 (  5.6%)   0.0230 (  4.4%)   2.7319 (  5.6%)   2.7324 (  5.6%)  Loop-Closed SSA Form Pass #4
         2.6855 (  5.6%)   0.0236 (  4.5%)   2.7091 (  5.6%)   2.7090 (  5.6%)  Loop-Closed SSA Form Pass #6
         2.1648 (  4.5%)   0.0018 (  0.4%)   2.1666 (  4.5%)   2.1664 (  4.5%)  Unroll loops
         1.8371 (  3.8%)   0.0009 (  0.2%)   1.8379 (  3.8%)   1.8380 (  3.8%)  Value Propagation
         1.8149 (  3.8%)   0.0021 (  0.4%)   1.8170 (  3.7%)   1.8169 (  3.7%)  Loop Invariant Code Motion
         1.6755 (  3.5%)   0.0226 (  4.3%)   1.6981 (  3.5%)   1.6980 (  3.5%)  Loop-Closed SSA Form Pass #7
      
      Time-passes with this patch for test case from PR37202:
      
        Total Execution Time: 29.9285 seconds (29.9276 wall clock)
      
         ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
         5.2786 ( 17.7%)   0.0021 (  1.2%)   5.2806 ( 17.6%)   5.2808 ( 17.6%)  Unswitch loops
         4.3739 ( 14.7%)   0.0303 ( 18.1%)   4.4042 ( 14.7%)   4.4042 ( 14.7%)  Loop-Closed SSA Form Pass
         4.2658 ( 14.3%)   0.0192 ( 11.5%)   4.2850 ( 14.3%)   4.2851 ( 14.3%)  Loop-Closed SSA Form Pass #2
         2.2307 (  7.5%)   0.0013 (  0.8%)   2.2320 (  7.5%)   2.2318 (  7.5%)  Loop Invariant Code Motion
         2.0888 (  7.0%)   0.0012 (  0.7%)   2.0900 (  7.0%)   2.0897 (  7.0%)  Unroll loops
         1.6761 (  5.6%)   0.0013 (  0.8%)   1.6774 (  5.6%)   1.6774 (  5.6%)  Value Propagation
         1.3686 (  4.6%)   0.0029 (  1.8%)   1.3716 (  4.6%)   1.3714 (  4.6%)  Induction Variable Simplification
         1.1457 (  3.8%)   0.0010 (  0.6%)   1.1468 (  3.8%)   1.1468 (  3.8%)  Loop-Closed SSA Form Pass #4
         1.1384 (  3.8%)   0.0005 (  0.3%)   1.1389 (  3.8%)   1.1389 (  3.8%)  Loop-Closed SSA Form Pass #6
         1.1360 (  3.8%)   0.0027 (  1.6%)   1.1387 (  3.8%)   1.1387 (  3.8%)  Loop-Closed SSA Form Pass #5
         1.1331 (  3.8%)   0.0010 (  0.6%)   1.1341 (  3.8%)   1.1340 (  3.8%)  Loop-Closed SSA Form Pass #3
      
      Reviewers: davide, efriedma, mzolotukhin
      
      Reviewed By: davide, efriedma
      
      Subscribers: hiraditya, dmgreen, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D56848
      
      llvm-svn: 351567
      be7cbe3f
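
      A minimal, self-contained sketch of the idea in this commit, not the actual LCSSA code. `ToyLoop`, `blocksToScan`, and the `InnermostLoop` map are hypothetical stand-ins. The sketch assumes sub-loops were processed first, so when scanning a parent loop only the blocks whose innermost loop is that parent need to be visited.

      ```cpp
      #include <map>
      #include <string>
      #include <vector>

      // Hypothetical toy model: a loop lists all of its blocks by name,
      // including the blocks that belong to its sub-loops.
      struct ToyLoop {
        std::vector<std::string> Blocks;
      };

      // Return the blocks of L that still need scanning for uses: those whose
      // innermost containing loop is L itself. Blocks owned by a sub-loop are
      // skipped because they were handled together with that sub-loop.
      std::vector<std::string>
      blocksToScan(const ToyLoop &L,
                   const std::map<std::string, const ToyLoop *> &InnermostLoop) {
        std::vector<std::string> Result;
        for (const std::string &BB : L.Blocks) {
          auto It = InnermostLoop.find(BB);
          if (It != InnermostLoop.end() && It->second != &L)
            continue; // Part of a sub-loop: skip it.
          Result.push_back(BB);
        }
        return Result;
      }
      ```

      The extra map lookup per block is the "additional lookup" the message calls negligible when no block can be skipped.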
    • Max Kazantsev's avatar
      b4cd50be
  3. Jan 17, 2019
    • Vedant Kumar's avatar
      [HotColdSplit] Allow outlining with live outputs · f529b507
      Vedant Kumar authored
      Prior to r348205, extracting code regions with live output values was
      disabled because of a miscompilation (PR39433). Lift the restriction as
      PR39433 has been addressed.
      
      Tested on LNT+externals, on a run of check-llvm in a stage2 build, and
      with a full build of iOS (with hot/cold splitting enabled).
      
      As a drive-by, remove an errant TODO.
      
      llvm-svn: 351492
      f529b507
    • Vedant Kumar's avatar
      [HotColdSplit] Consider resume instructions to be cold · b70e20db
      Vedant Kumar authored
      Resuming exception unwinding is roughly as unlikely as throwing an
      exception.
      
      Tested on LNT+externals (in particular, the C++ EH regression tests
      provide end-to-end test coverage), as well as with a full build of iOS.
      
      llvm-svn: 351491
      b70e20db
    • Vedant Kumar's avatar
      [HotColdSplit] Relax requirement that the cold sink block be extractable · 4541be06
      Vedant Kumar authored
      Relaxing this requirement creates opportunities to split code dominated
      by an EH pad.
      
      Tested on LNT+externals.
      
      llvm-svn: 351483
      4541be06
    • Vedant Kumar's avatar
      [HotColdSplit] Simplify tests by lowering their splitting thresholds · 32a014d0
      Vedant Kumar authored
      This gets rid of the brittle/mysterious calls to @sink()/@sideeffect()
      peppered throughout the test cases. They are no longer needed to force
      splitting to occur.
      
      llvm-svn: 351480
      32a014d0
    • Wei Mi's avatar
      [SampleFDO] Skip profile reading when flattened profile used in ThinLTO postlink · 3bcccdfe
      Wei Mi authored
      If the sample profile has no inlining hierarchy information included, we say
      the sample profile is flattened. For a flattened profile, in the ThinLTO postlink
      phase, SampleProfileLoader's hot function inlining and profile annotation will
      do nothing, so it is better to skip reading the profile and running
      the sample profile loader pass. This reduces compile time when
      the flattened profile is huge.
      
      Differential Revision: https://reviews.llvm.org/D54819
      
      llvm-svn: 351476
      3bcccdfe
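
      A rough sketch of the decision described in this commit, using a toy profile model. `ToyFunctionSamples`, `isFlattened`, and `shouldRunSampleLoader` are hypothetical names; how the real loader detects a flattened profile is not stated in the message, so the check below is an assumption.

      ```cpp
      #include <map>
      #include <string>
      #include <vector>

      // Hypothetical toy model of a sample profile entry: body samples plus
      // optional nested profiles for callees that were inlined into it.
      struct ToyFunctionSamples {
        unsigned long TotalSamples = 0;
        std::vector<ToyFunctionSamples> InlinedCallees;
      };

      using ToyProfile = std::map<std::string, ToyFunctionSamples>;

      // A profile counts as flattened when no function carries any inlining
      // hierarchy information.
      bool isFlattened(const ToyProfile &Profile) {
        for (const auto &Entry : Profile)
          if (!Entry.second.InlinedCallees.empty())
            return false;
        return true;
      }

      // Skip reading the profile and running the loader only when both
      // conditions hold: ThinLTO postlink phase and a flattened profile.
      bool shouldRunSampleLoader(const ToyProfile &Profile, bool IsThinLTOPostLink) {
        return !(IsThinLTOPostLink && isFlattened(Profile));
      }
      ```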
    • Reid Kleckner's avatar
      [InstCombine] Don't sink dynamic allocas · edd653bc
      Reid Kleckner authored
      Summary:
      InstCombine's sinking algorithm only thinks about memory. It doesn't
      think about non-memory constraints like stack object lifetime. It can
      sink dynamic allocas across a stacksave call, which may be used with
      stackrestore, which can incorrectly reduce the lifetime of the dynamic
      alloca.
      
      Fixes PR40365
      
      Reviewers: hfinkel, efriedma
      
      Subscribers: hiraditya, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D56872
      
      llvm-svn: 351475
      edd653bc
    • Teresa Johnson's avatar
      Revert "[ThinLTO] Add summary entries for index-based WPD" · 8d86f1ba
      Teresa Johnson authored
      Mistaken commit of something still under review!
      
      This reverts commit r351453.
      
      llvm-svn: 351455
      8d86f1ba
    • Teresa Johnson's avatar
      [ThinLTO] Add summary entries for index-based WPD · 4fcf3b16
      Teresa Johnson authored
      Summary:
      If LTOUnit splitting is disabled, the module summary analysis computes
      the summary information necessary to perform single implementation
      devirtualization during the thin link with the index and no IR. The
      information collected from the regular LTO IR in the current hybrid WPD
      algorithm is summarized, including:
      1) For vtable definitions, record the function pointers and their offset
      within the vtable initializer (subsumes the information collected from
      IR by tryFindVirtualCallTargets).
      2) A record for each type metadata summarizing the vtable definitions
      decorated with that metadata (subsumes the TypeIdentiferMap collected
      from IR).
      
      Also added are the necessary bitcode records, and the corresponding
      assembly support.
      
      The index-based WPD will be sent as a follow-on.
      
      Depends on D53890.
      
      Reviewers: pcc
      
      Subscribers: mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D54815
      
      llvm-svn: 351453
      4fcf3b16
    • Max Kazantsev's avatar
      [LoopSimplifyCFG] Form LCSSA when a parent loop becomes a sibling · 61a8d3fb
      Max Kazantsev authored
      During the transforms in LoopSimplifyCFG, when we remove a dead exiting edge, the
      parent loop may stop being reachable from the child loop, and therefore they become
      siblings. If the former child loop had uses of some values from its former parent loop,
      now such uses will require LCSSA Phis, even if they weren't needed before. So we must
      form LCSSA for all loops that stopped being ancestors of the current loop in this case.
      
      Differential Revision: https://reviews.llvm.org/D56144
      Reviewed By: fedor.sergeev
      
      llvm-svn: 351434
      61a8d3fb
    • Max Kazantsev's avatar
      [LoopSimplifyCFG] Fix order of deletion of complex dead subloops · 8b134169
      Max Kazantsev authored
      Function `DeleteDeadBlock` requires that all predecessors of a block
      being deleted have already been deleted, with the exception of a
      single-block loop. When we use it for removal of dead subloops that
      contain more than one block, we may not fulfill this requirement and
      fail an assertion.
      
      This patch replaces invocation of `DeleteDeadBlock` with a generalized
      version `DeleteDeadBlocks` that is able to deal with multiple dead blocks,
      even if they contain some cycles.
      
      Differential Revision: https://reviews.llvm.org/D56121
      Reviewed By: fedor.sergeev
      
      llvm-svn: 351433
      8b134169
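
      To illustrate why a multi-block dead subloop breaks the single-block precondition, here is a small sketch on a toy CFG. `ToyCFG`, `hasOtherPredecessors`, and this `deleteDeadBlocks` are hypothetical and only mirror the idea in the message; the real utility operates on LLVM IR and may differ in detail.

      ```cpp
      #include <map>
      #include <set>

      // Hypothetical toy CFG: block id -> set of successor block ids.
      using ToyCFG = std::map<int, std::set<int>>;

      // True if some block other than BB itself still branches to BB. A
      // one-at-a-time deletion would require this to be false.
      bool hasOtherPredecessors(const ToyCFG &CFG, int BB) {
        for (const auto &Entry : CFG)
          if (Entry.first != BB && Entry.second.count(BB))
            return true;
        return false;
      }

      // Delete a whole set of dead blocks, cycles included. First drop every
      // edge leaving a dead block, then erase the blocks. After the first
      // phase no dead block has a predecessor inside the dead set, so the
      // erasure order no longer matters.
      // Assumes no live block still branches into the dead set.
      void deleteDeadBlocks(ToyCFG &CFG, const std::set<int> &Dead) {
        for (int BB : Dead) {
          auto It = CFG.find(BB);
          if (It != CFG.end())
            It->second.clear();
        }
        for (int BB : Dead)
          CFG.erase(BB);
      }
      ```

      With a dead two-block cycle `1 -> 2 -> 1`, each block is a predecessor of the other, so neither can be deleted first under the single-block rule, while the batched version handles both.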
    • Max Kazantsev's avatar
      [NFC] Factor out some local vars · ee613085
      Max Kazantsev authored
      llvm-svn: 351416
      ee613085
    • Vedant Kumar's avatar
      [MergeFunc] Prevent silent miscompile of vararg functions · a9906c1e
      Vedant Kumar authored
      The function merging pass miscompiles identical vararg functions. The
      forwarding thunk it emits doesn't forward the full variable-length list
      of arguments. Disable merging for vararg functions for now.
      
      I've filed llvm.org/PR40345 to track the issue.
      
      rdar://47326238
      
      llvm-svn: 351411
      a9906c1e
    • Vedant Kumar's avatar
      [FunctionComparator] Consider tail call kinds · e21ab221
      Vedant Kumar authored
      Essentially, do not treat `call` and `musttail call` as the same thing.
      
      As a drive-by, fold CallInst and InvokeInst handling together using the
      CallSite helper.
      
      Differential Revision: https://reviews.llvm.org/D56815
      
      llvm-svn: 351405
      e21ab221
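
      A simplified sketch of what "consider tail call kinds" means for a comparator. `ToyCall`, `cmpNumbers`, and `cmpCalls` are hypothetical and stand in for the real comparison logic; the point is only that two calls to the same callee compare as different when their tail call kinds differ.

      ```cpp
      // Hypothetical toy model of the properties compared for a call.
      enum class ToyTailCallKind { None, Tail, MustTail, NoTail };

      struct ToyCall {
        int CalleeId;
        ToyTailCallKind TCK;
      };

      // Three-way comparison: returns -1, 0, or 1.
      int cmpNumbers(int L, int R) { return L < R ? -1 : (L > R ? 1 : 0); }

      // Compare the callee first, then the tail call kind, so that `call` and
      // `musttail call` of the same function are not treated as equal.
      int cmpCalls(const ToyCall &L, const ToyCall &R) {
        if (int Res = cmpNumbers(L.CalleeId, R.CalleeId))
          return Res;
        return cmpNumbers(static_cast<int>(L.TCK), static_cast<int>(R.TCK));
      }
      ```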
    • Wei Mi's avatar
      Fix a mistake in rL351392. · 79c4408a
      Wei Mi authored
      PGOInstrGen should be initialized to "" instead of false.
      
      llvm-svn: 351397
      79c4408a
    • Wei Mi's avatar
      [PGO] Make pgo related options in opt more consistent. · c876e3d4
      Wei Mi authored
      Currently we have PGO options defined in PassManagerBuilder.cpp only for
      instrumentation PGO, but not for sample PGO. We also have PGO options defined
      in NewPMDriver.cpp in opt, only for the new pass manager but for all kinds of
      PGO. The two sets of options are inconsistent.
      
      To make the options more consistent and make writing tests easier, this
      patch lets the old pass manager share the same PGO options with the new pass
      manager in opt, and removes the options in PassManagerBuilder.cpp.
      
      Differential Revision: https://reviews.llvm.org/D56749
      
      llvm-svn: 351392
      c876e3d4
  4. Jan 16, 2019
    • Alexey Bataev's avatar
      [SLP] Fix PR40310: The reduction nodes should stay scalar. · 18809a6b
      Alexey Bataev authored
      Summary:
      Sometimes the SLP vectorizer tries to vectorize the horizontal reduction
      nodes during regular vectorization. This may happen inside loops
      when there are vectorizable PHIs. The patch fixes this by checking whether
      the node is a reduction node; if so, it must not be vectorized and must
      be gathered instead.
      
      Reviewers: RKSimon, spatel, hfinkel, fedor.sergeev
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D56783
      
      llvm-svn: 351349
      18809a6b
    • Gabor Buella's avatar
      Assertion in isAllocaPromotable due to extra bitcast goes into lifetime marker · 3ec170c8
      Gabor Buella authored
      For the given test, SROA detects a possible replacement and creates a correct alloca. SROA then adds lifetime markers for this new alloca. The function getNewAllocaSlicePtr tries to deduce the pointer type from the original alloca, which is split, to use it later in the lifetime intrinsic.
      
      For the test we ended up with the following code (rA is the initial alloca [10 x float], which is split, and rA.sroa.0.0 is a new split allocation):
      
      ```
      %rA.sroa.0.0.rA.sroa_cast = bitcast i32* %rA.sroa.0 to [10 x float]*    ; <----- extra bitcast; this one causes the assertion
      %5 = bitcast [10 x float]* %rA.sroa.0.0.rA.sroa_cast to i8*
      call void @llvm.lifetime.start.p0i8(i64 4, i8* %5)
      ```
      
      The isAllocaPromotable code assumes that a user of an alloca may reach a lifetime marker through a bitcast, but only through a single bitcast to i8*. In the test the bitcast is not to i8*, so the check returns false and the assertion fires.
      
      As we are creating a pointer that will be used only in lifetime markers, the proposed fix is to create a bitcast to i8* immediately, avoiding the extra bitcast.
      
      The test is greatly simplified to just reproduce the assertion.
      
      Author: Igor Tsimbalist <igor.v.tsimbalist@intel.com>
      
      Reviewers: chandlerc, craig.topper
      
      Reviewed By: chandlerc
      
      Differential Revision: https://reviews.llvm.org/D55934
      
      llvm-svn: 351325
      3ec170c8
    • Philip Pfaffe's avatar
      [MSan] Apply the ctor creation scheme of TSan · 81101de5
      Philip Pfaffe authored
      Summary: To avoid adding an extern function to the global ctors list, apply the changes of D56538 also to MSan.
      
      Reviewers: chandlerc, vitalybuka, fedor.sergeev, leonardchan
      
      Subscribers: hiraditya, bollu, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D56734
      
      llvm-svn: 351322
      81101de5
    • Philip Pfaffe's avatar
      [NewPM][TSan] Reiterate the TSan port · 685c76d7
      Philip Pfaffe authored
      Summary:
      Second iteration of D56433 which got reverted in rL350719. The problem
      in the previous version was that we dropped the thunk calling the tsan init
      function. The new version keeps the thunk which should appease dyld, but is not
      actually OK wrt. the current semantics of function passes. Hence, add a
      helper to insert the functions only the first time. The helper
      allows hooking into the insertion to be able to append them to the
      global ctors list.
      
      Reviewers: chandlerc, vitalybuka, fedor.sergeev, leonardchan
      
      Subscribers: hiraditya, bollu, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D56538
      
      llvm-svn: 351314
      685c76d7
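
      The "insert only the first time, with a hook" pattern described in this commit can be sketched on a toy module as follows. `ToyModule` and `getOrCreateCtor` are hypothetical names for illustration and are not the helper added by the patch.

      ```cpp
      #include <functional>
      #include <map>
      #include <string>
      #include <vector>

      // Hypothetical toy module: declared functions plus a global ctors list.
      struct ToyModule {
        std::map<std::string, int> Functions; // name -> id
        std::vector<std::string> GlobalCtors;
      };

      // Return the ctor's id, creating it only if it does not exist yet. The
      // OnCreate hook runs exactly once, at creation time, so a pass that is
      // invoked once per function does not register the ctor repeatedly.
      int getOrCreateCtor(
          ToyModule &M, const std::string &Name,
          const std::function<void(ToyModule &, const std::string &)> &OnCreate) {
        auto It = M.Functions.find(Name);
        if (It != M.Functions.end())
          return It->second; // Already inserted by an earlier invocation.
        int Id = static_cast<int>(M.Functions.size());
        M.Functions.emplace(Name, Id);
        OnCreate(M, Name);
        return Id;
      }
      ```

      A caller would pass a hook that appends the new function to `GlobalCtors`, which corresponds to the "hooking into the insertion" step in the message.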
    • Tom Stellard's avatar
      Only promote args when function attributes are compatible · 3d36e5c3
      Tom Stellard authored
      Summary:
      Check to make sure that the caller and the callee have compatible
      function attributes before promoting arguments.  This uses the same
      TargetTransformInfo queries that are used to determine if attributes
      are compatible for inlining.
      
      The goal here is to avoid breaking ABI when a called function's ABI
      depends on a target feature that is not enabled in the caller.
      
      This is a very conservative fix for PR37358.  Ideally we would have a more
      sophisticated check for ABI compatibility rather than checking if the
      attributes are compatible for inlining.
      
      Reviewers: echristo, chandlerc, eli.friedman, craig.topper
      
      Reviewed By: echristo, chandlerc
      
      Subscribers: nikic, xbolva00, rkruppe, alexcrichton, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D53554
      
      llvm-svn: 351296
      3d36e5c3
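
      A conservative check of this kind might look like the sketch below. The subset rule in `areFeaturesCompatible` is an assumption made for illustration; the actual compatibility query is target-specific and is not spelled out in the message. All names here are hypothetical.

      ```cpp
      #include <algorithm>
      #include <set>
      #include <string>
      #include <vector>

      using FeatureSet = std::set<std::string>;

      // Assumed rule for this sketch: every target feature the callee relies
      // on must also be enabled in the caller.
      bool areFeaturesCompatible(const FeatureSet &Caller, const FeatureSet &Callee) {
        return std::includes(Caller.begin(), Caller.end(),
                             Callee.begin(), Callee.end());
      }

      // Promote arguments only if the callee is compatible with every caller.
      bool mayPromoteArguments(const std::vector<FeatureSet> &Callers,
                               const FeatureSet &Callee) {
        return std::all_of(Callers.begin(), Callers.end(),
                           [&](const FeatureSet &Caller) {
                             return areFeaturesCompatible(Caller, Callee);
                           });
      }
      ```

      A single incompatible caller is enough to block promotion, which matches the "very conservative" wording of the commit.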
    • Serguei Katkov's avatar
      [InstCombine]Avoid introduction of unaligned mem access · a5b0e558
      Serguei Katkov authored
      InstCombine is able to transform a mem transfer intrinsic into a single store or a load/store pair.
      This might generate an unaligned atomic load/store, which the backend later
      transforms into a libcall. That is not an evident gain, so it is better to keep the intrinsic as is
      and handle it in the backend.
      
      Reviewers: reames, anna, apilipenko, mkazantsev
      Reviewed By: reames
      Subscribers: t.p.northover, jfb, llvm-commits
      Differential Revision: https://reviews.llvm.org/D56582
      
      llvm-svn: 351295
      a5b0e558
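
      The guard described in this commit can be approximated by the predicate below. `mayLowerToLoadStore` is a hypothetical helper, and the size limits it applies are assumptions for the sketch; only the alignment condition for atomic transfers comes from the message.

      ```cpp
      #include <cstdint>

      // Hypothetical decision helper: may a small memory transfer of Size
      // bytes be rewritten as a single load/store pair?
      bool mayLowerToLoadStore(std::uint64_t Size, std::uint64_t DstAlign,
                               std::uint64_t SrcAlign, bool IsAtomic) {
        // Assumed limit: only power-of-two sizes up to 8 bytes map onto one
        // scalar access.
        if (Size == 0 || Size > 8 || (Size & (Size - 1)) != 0)
          return false;
        // An atomic access aligned to less than its size would end up as a
        // libcall in the backend, so keep the intrinsic in that case.
        if (IsAtomic && (DstAlign < Size || SrcAlign < Size))
          return false;
        return true;
      }
      ```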
  5. Jan 15, 2019
  6. Jan 14, 2019