  1. Jan 19, 2019
    • Chandler Carruth's avatar
      Convert two more files that were using Windows line endings and remove · 4a50956c
      Chandler Carruth authored
      a stray single '\r' from one file. These are the last line ending issues
      I can find in the files containing parts of LLVM's file headers.
      
      llvm-svn: 351634
      4a50956c
    • Johannes Doerfert's avatar
      Enable IPConstantPropagation to work with abstract call sites · 36872b5d
      Johannes Doerfert authored
      This modification of the currently unused inter-procedural constant
      propagation pass (IPConstantPropagation) shows how abstract call sites
      enable optimization of callback calls alongside direct and indirect
      calls. Through minimal changes, mostly dealing with the partial mapping
      of callbacks, inter-procedural constant propagation was enabled for
      callbacks, e.g., OpenMP runtime calls or pthread_create.
      
      Differential Revision: https://reviews.llvm.org/D56447
      
      llvm-svn: 351628
      36872b5d
    • Johannes Doerfert's avatar
      AbstractCallSite -- A unified interface for (in)direct and callback calls · 18251842
      Johannes Doerfert authored
        An abstract call site is a wrapper that allows direct, indirect, and
        callback calls to be treated the same. If an abstract call site
        represents a direct or indirect call site, it behaves like a
        stripped-down version of a normal call site object. The abstract call
        site can also represent a callback call, that is, the fact that the
        initially called function (=broker) may invoke a third one (=callback
        callee). In this case, the abstract call site hides the middleman,
        i.e., the broker function. The result is a representation of the
        callback call, inside the broker, but in the context of the original
        instruction that invoked the broker.
      
        Again, there are up to three functions involved when we talk about
        callback call sites. The caller (1), which invokes the broker
        function. The broker function (2), that may or may not invoke the
        callback callee. And finally the callback callee (3), which is the
        target of the callback call.
      
        The abstract call site will handle the mapping from parameters to
        arguments depending on the semantics of the broker function. However,
        it is important to note that the mapping is often partial. Thus, some
        arguments of the call/invoke instruction are mapped to parameters of
        the callee while others are not. At the same time, arguments of the
        callback callee might be unknown, thus "null" if queried.
      
        This patch also introduces !callback metadata, which describes how a
        callback broker maps from parameters to arguments. This metadata is
        directly created by clang for known broker functions, provided through
        source code attributes by the user, or later deduced by analyses.
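The partial parameter-to-argument mapping described above can be illustrated with a small toy model. This is only a sketch and not the LLVM API: `CallbackEncoding` and `getCallbackArgument` are hypothetical names, and the encoding (callee position plus one broker-argument index per callback parameter, with -1 for "unknown") is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy model (hypothetical, not the LLVM API): describes how a broker
// forwards its own call arguments to the callback callee.
struct CallbackEncoding {
  // Index of the broker argument that holds the callback callee.
  std::size_t CalleeArgIdx;
  // For each callback parameter: index of the broker argument that is
  // passed through, or -1 if the value is unknown at the call site.
  std::vector<int> ParamToArg;
};

// Returns the broker-call argument that feeds callback parameter ParamIdx,
// or std::nullopt if the mapping does not cover that parameter.
template <typename Arg>
std::optional<Arg> getCallbackArgument(const CallbackEncoding &Enc,
                                       const std::vector<Arg> &CallArgs,
                                       std::size_t ParamIdx) {
  assert(ParamIdx < Enc.ParamToArg.size() && "no such callback parameter");
  int ArgIdx = Enc.ParamToArg[ParamIdx];
  if (ArgIdx < 0 || static_cast<std::size_t>(ArgIdx) >= CallArgs.size())
    return std::nullopt;
  return CallArgs[static_cast<std::size_t>(ArgIdx)];
}
```

For a pthread_create-like broker, the callee sits at argument index 2 and its single parameter is fed from argument index 3, so the encoding would be `CallbackEncoding{2, {3}}`. A parameter marked -1 models the "null if queried" case.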
      
      For motivation and additional information please see the corresponding
      talk (slides/video)
        https://llvm.org/devmtg/2018-10/talk-abstracts.html#talk20
      as well as the LCPC paper
        http://compilers.cs.uni-saarland.de/people/doerfert/par_opt_lcpc18.pdf
      
      Differential Revision: https://reviews.llvm.org/D54498
      
      llvm-svn: 351627
      18251842
    • Roman Tereshin's avatar
      Reapply "[CGP] Check for existing inttotpr before creating new one" · a0383d6c
      Roman Tereshin authored
      Original commit: r351582
      
      llvm-svn: 351626
      a0383d6c
    • Vedant Kumar's avatar
      [MergeFunc] Allow merging identical vararg functions using aliases · b537b946
      Vedant Kumar authored
      Thanks to Nikita Popov for pointing out this missed case.
      
      This is a follow-up to r351411, which disabled function merging for
      vararg functions outright due to a miscompile (see llvm.org/PR40345).
      
      Differential Revision: https://reviews.llvm.org/D56865
      
      llvm-svn: 351624
      b537b946
    • Vedant Kumar's avatar
      [HotColdSplit] Mark inherently cold functions as such · b755a2df
      Vedant Kumar authored
      If an inherently cold function is found, mark it as cold. For now this
      means applying the `cold` and `minsize` attributes.
      
      As a drive-by, revisit and clean up the criteria for considering a
      function for splitting. Add tests.
      
      llvm-svn: 351623
      b755a2df
    • Vedant Kumar's avatar
      [HotColdSplit] Remove a set which tracked split functions (NFC) · 4de1962b
      Vedant Kumar authored
      Use the begin/end iterator idiom to avoid visiting split functions,
      instead of doing a set lookup.
      
      llvm-svn: 351622
      4de1962b
    • Vedant Kumar's avatar
      [CodeExtractor] Emit lifetime markers around reloads of outputs · 17d9f14b
      Vedant Kumar authored
      CodeExtractor permits extracting a region of blocks from a function even
      when values defined within the region are used outside of it.
      
      This is typically done by creating an alloca in the original function
      and reloading the alloca after a call to the extracted function.
      
      Wrap the reload in lifetime start/end markers to promote stack coloring.
      
      Suggested by Sergei Kachkov!
      
      Differential Revision: https://reviews.llvm.org/D56045
      
      llvm-svn: 351621
      17d9f14b
    • Roman Tereshin's avatar
      Revert "Reapply "[CGP] Check for existing inttotpr before creating new one"" · 022bf3e8
      Roman Tereshin authored
      This reverts commit r351618.
      
      Compiler RT + ASAN tests are failing for PowerPC. Not sure
      how I would reproduce these on macOS, so reverting (again)
      until I do.
      
      llvm-svn: 351619
      022bf3e8
    • Roman Tereshin's avatar
      Reapply "[CGP] Check for existing inttotpr before creating new one" · dd6f9f68
      Roman Tereshin authored
      Original commit: r351582
      
      llvm-svn: 351618
      dd6f9f68
    • Amara Emerson's avatar
      Revert r351584: "GlobalISel: Verify g_zextload and g_sextload" · d5015edb
      Amara Emerson authored
      This new assertion triggered on the AArch64 GlobalISel bots. Reverting while it's being investigated.
      
      llvm-svn: 351617
      d5015edb
    • Reid Kleckner's avatar
      [X86] Deduplicate static calling convention helpers for code size, NFC · 38f9900a
      Reid Kleckner authored
      Summary:
      Right now we include ${TGT}GenCallingConv.inc once per each instruction
      selection method implemented by ${TGT}:
      - ${TGT}ISelLowering.cpp
      - ${TGT}CallLowering.cpp
      - ${TGT}FastISel.cpp
      
      Instead, add a mechanism to tablegen for marking a particular convention
      as "External", which causes tablegen to emit into the ::llvm namespace,
      instead of as a static helper. This allows us to provide a header to
      forward declare it, so we can simply call the function from all the
      places it is referenced. Typically the calling convention analyzer is
      called indirectly, so it doesn't benefit from inlining.
      
      This saves a bit of final binary size, but mostly just saves object file
      size:
      
      before  after   diff   artifact
      12852K  12492K  -360K  X86ISelLowering.cpp.obj
      4640K   4280K   -360K  X86FastISel.cpp.obj
      1704K   2092K   +388K  X86CallingConv.cpp.obj
      52448K  52336K  -112K  llc.exe
      
      I didn't collect before numbers for X86CallLowering.cpp.obj, which is
      for GlobalISel, but we should save 360K there as well.
      
      This patch applies the strategy to the X86 backend, but there is no
      reason it couldn't be applied to the other backends that implement
      multiple ISel strategies, like AArch64.
      
      Reviewers: craig.topper, hfinkel, efriedma
      
      Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D56883
      
      llvm-svn: 351616
      38f9900a
    • Rui Ueyama's avatar
      Remove F_modify flag from FileOutputBuffer. · 8e7600dc
      Rui Ueyama authored
      This code is dead. There is no use of the feature in the entire LLVM codebase.
      
      Differential Revision: https://reviews.llvm.org/D56939
      
      llvm-svn: 351613
      8e7600dc
  2. Jan 18, 2019
    • Matt Arsenault's avatar
      AMDGPU/GlobalISel: Legalize more types for select · 96e47014
      Matt Arsenault authored
      llvm-svn: 351599
      96e47014
    • Roman Tereshin's avatar
      Revert "[CGP] Check for existing inttotpr before creating new one" · 86ac5326
      Roman Tereshin authored
      This reverts commit r351582.
      
      Bots are failing. Reverting this to fix and re-commit later.
      
      llvm-svn: 351598
      86ac5326
    • Matt Arsenault's avatar
      AMDGPU/GlobalISel: Legalize illegal g_constant · 4599159a
      Matt Arsenault authored
      llvm-svn: 351596
      4599159a
    • Matt Arsenault's avatar
      GlobalISel: Verify G_BITCAST · bd3a5b29
      Matt Arsenault authored
      llvm-svn: 351594
      bd3a5b29
    • Matt Arsenault's avatar
      GlobalISel: Verify G_ICMP/G_FCMP vector types · 215c4f68
      Matt Arsenault authored
      llvm-svn: 351591
      215c4f68
    • Matt Arsenault's avatar
      AMDGPU: Remove llvm.SI.load.const · 85af701e
      Matt Arsenault authored
      It's taken 3 years, but now all of the old AMDGPU and SI intrinsics
      are finally gone
      
      llvm-svn: 351586
      85af701e
    • Matt Arsenault's avatar
      GlobalISel: Verify g_zextload and g_sextload · f67ae611
      Matt Arsenault authored
      llvm-svn: 351584
      f67ae611
    • Craig Topper's avatar
      [X86] Lower avx512f scatter intrinsics to X86MaskedScatterSDNode instead of... · 08d3d32e
      Craig Topper authored
      [X86] Lower avx512f scatter intrinsics to X86MaskedScatterSDNode instead of going directly to MachineSDNode.
      
      This sends these intrinsics through isel in a much more normal way. This should allow addressing mode matching in isel to make better use of the displacement field.
      
      llvm-svn: 351583
      08d3d32e
    • Roman Tereshin's avatar
      [CGP] Check for existing inttotpr before creating new one · 85a0467a
      Roman Tereshin authored
      Make sure CodeGenPrepare doesn't emit multiple inttoptr instructions of
      the same integer value while sinking address computations, but rather
      CSEs them on the fly: excessive inttoptr's confuse SCEV into thinking
      that related pointers have nothing to do with each other.
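The idea of reusing an existing cast instead of emitting a duplicate can be sketched with a lookup keyed on the source value. This is a stand-in only: the commit message does not say how the existing inttoptr is found, and `IntValue`, `PtrCast`, and `CastCache` are hypothetical types invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

// Hypothetical stand-ins for an integer value and a cast of that value.
struct IntValue { int Id; };
struct PtrCast { const IntValue *Source; };

// Hands out at most one cast per source value (assumed caching scheme).
class CastCache {
  std::map<const IntValue *, PtrCast *> Existing;
  std::vector<std::unique_ptr<PtrCast>> Storage;

public:
  PtrCast *getOrCreate(const IntValue &V) {
    auto It = Existing.find(&V);
    if (It != Existing.end())
      return It->second; // Reuse the cast created earlier.
    Storage.push_back(std::make_unique<PtrCast>(PtrCast{&V}));
    return Existing[&V] = Storage.back().get();
  }

  // Number of distinct casts created so far.
  std::size_t numCasts() const { return Storage.size(); }
};
```

Because every request for the same integer value yields the same cast object, later analyses see one pointer rather than several unrelated ones.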
      
      This problem blocks LoadStoreVectorizer from vectorizing some of the
      loads / stores in a downstream target.
      
      Reviewed By: hfinkel
      
      Differential Revision: https://reviews.llvm.org/D56838
      
      llvm-svn: 351582
      85a0467a
    • Bjorn Pettersson's avatar
      [SelectionDAG] Updates for -dag-dump-verbose · d4023bd2
      Bjorn Pettersson authored
      Summary:
      This patch makes some changes related to -dag-dump-verbose.
      Main use case has been when debugging how SelectionDAG is
      dealing with debug info (SDDbgValue nodes).
      
      1) We now print the number of DbgValues that are mapped to each
         SDNode.
      2) Removed duplicated printing of DebugLoc (nowadays DebugLoc is
         printed also when not using -dag-dump-verbose).
      3) Renamed SDDbgValue::dump to SDDbgValue::print, and added a
         new SDDbgValue::dump that will start a new line after calling
         print.
      4) SDDbgValue::print now prints "Order", and it also prints
         some additional information when kind is CONST/FRAMEIX/VREG.
      5) SelectionDAG::dump() now dumps all SDDbgValue nodes after
         the list of SDNodes (both "regular" and "ByVal" SDDbgValue:s).
         Invalidated nodes are not printed.
      6) Prohibit inline printing of SDNode operands that have SDDbgValue
         nodes associated with them.
      
      Reviewers: jmorse, aprantl
      
      Reviewed By: aprantl
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D56793
      
      llvm-svn: 351581
      d4023bd2
    • Sanjin Sijaric's avatar
      Fix the buildbot issue introduced by r351421 · eaa421d1
      Sanjin Sijaric authored
      The EXPENSIVE_CHECK x86_64 Windows buildbot is failing due to this change. Fix
      the map access.
      
      llvm-svn: 351577
      eaa421d1
    • Mandeep Singh Grang's avatar
      [GlobalISel] Change to range-based invocation of llvm::sort · f0e99779
      Mandeep Singh Grang authored
      llvm-svn: 351574
      f0e99779
    • Florian Hahn's avatar
      [SelectionDAG] Split very large token factors for chained stores to 64k chunks. · dc4e1547
      Florian Hahn authored
      Similar to D55073. Without this change, the DAG combiner crashes on code
      with more than 64k of stores in a single basic block that form parallelizable
      chains.
      
      No test case, as it would be a very large IR file.
      
      Reviewed By: RKSimon
      
      Differential Revision: https://reviews.llvm.org/D56740
      
      llvm-svn: 351571
      dc4e1547
    • Craig Topper's avatar
      [X86] Lower avx2/avx512f gather intrinsics to X86MaskedGatherSDNode instead of... · b9d4461f
      Craig Topper authored
      [X86] Lower avx2/avx512f gather intrinsics to X86MaskedGatherSDNode instead of going directly to MachineSDNode.
      
      This sends these intrinsics through isel in a much more normal way. This should allow addressing mode matching in isel to make better use of the displacement field.
      
      Differential Revision: https://reviews.llvm.org/D56827
      
      llvm-svn: 351570
      b9d4461f
    • Florian Hahn's avatar
      [LCSSA] Skip blocks in sub-loops when scanning for uses. · be7cbe3f
      Florian Hahn authored
      Summary:
      Scanning blocks in sub-loops for uses is unnecessary, as they were
      already handled while dealing with the containing sub-loop.
      
      This speeds up LCSSA for highly nested loops. For the test case in PR37202, it
      halves the time spent in LCSSA. In cases where we won't be able to skip
      any blocks, the additional lookup should be negligible.
      
      Time-passes without this patch for test case from PR37202:
      
        Total Execution Time: 48.5505 seconds (48.5511 wall clock)
      
         ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
        10.0822 ( 21.0%)   0.1406 ( 27.0%)  10.2228 ( 21.1%)  10.2228 ( 21.1%)  Loop-Closed SSA Form Pass
        10.0417 ( 20.9%)   0.1467 ( 28.2%)  10.1884 ( 21.0%)  10.1890 ( 21.0%)  Loop-Closed SSA Form Pass #2
         4.2703 (  8.9%)   0.0040 (  0.8%)   4.2742 (  8.8%)   4.2742 (  8.8%)  Unswitch loops
         2.7376 (  5.7%)   0.0229 (  4.4%)   2.7605 (  5.7%)   2.7611 (  5.7%)  Loop-Closed SSA Form Pass #5
         2.7332 (  5.7%)   0.0214 (  4.1%)   2.7546 (  5.7%)   2.7546 (  5.7%)  Loop-Closed SSA Form Pass #3
         2.7088 (  5.6%)   0.0230 (  4.4%)   2.7319 (  5.6%)   2.7324 (  5.6%)  Loop-Closed SSA Form Pass #4
         2.6855 (  5.6%)   0.0236 (  4.5%)   2.7091 (  5.6%)   2.7090 (  5.6%)  Loop-Closed SSA Form Pass #6
         2.1648 (  4.5%)   0.0018 (  0.4%)   2.1666 (  4.5%)   2.1664 (  4.5%)  Unroll loops
         1.8371 (  3.8%)   0.0009 (  0.2%)   1.8379 (  3.8%)   1.8380 (  3.8%)  Value Propagation
         1.8149 (  3.8%)   0.0021 (  0.4%)   1.8170 (  3.7%)   1.8169 (  3.7%)  Loop Invariant Code Motion
         1.6755 (  3.5%)   0.0226 (  4.3%)   1.6981 (  3.5%)   1.6980 (  3.5%)  Loop-Closed SSA Form Pass #7
      
      Time-passes with this patch
      
        Total Execution Time: 29.9285 seconds (29.9276 wall clock)
      
         ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
         5.2786 ( 17.7%)   0.0021 (  1.2%)   5.2806 ( 17.6%)   5.2808 ( 17.6%)  Unswitch loops
         4.3739 ( 14.7%)   0.0303 ( 18.1%)   4.4042 ( 14.7%)   4.4042 ( 14.7%)  Loop-Closed SSA Form Pass
         4.2658 ( 14.3%)   0.0192 ( 11.5%)   4.2850 ( 14.3%)   4.2851 ( 14.3%)  Loop-Closed SSA Form Pass #2
         2.2307 (  7.5%)   0.0013 (  0.8%)   2.2320 (  7.5%)   2.2318 (  7.5%)  Loop Invariant Code Motion
         2.0888 (  7.0%)   0.0012 (  0.7%)   2.0900 (  7.0%)   2.0897 (  7.0%)  Unroll loops
         1.6761 (  5.6%)   0.0013 (  0.8%)   1.6774 (  5.6%)   1.6774 (  5.6%)  Value Propagation
         1.3686 (  4.6%)   0.0029 (  1.8%)   1.3716 (  4.6%)   1.3714 (  4.6%)  Induction Variable Simplification
         1.1457 (  3.8%)   0.0010 (  0.6%)   1.1468 (  3.8%)   1.1468 (  3.8%)  Loop-Closed SSA Form Pass #4
         1.1384 (  3.8%)   0.0005 (  0.3%)   1.1389 (  3.8%)   1.1389 (  3.8%)  Loop-Closed SSA Form Pass #6
         1.1360 (  3.8%)   0.0027 (  1.6%)   1.1387 (  3.8%)   1.1387 (  3.8%)  Loop-Closed SSA Form Pass #5
         1.1331 (  3.8%)   0.0010 (  0.6%)   1.1341 (  3.8%)   1.1340 (  3.8%)  Loop-Closed SSA Form Pass #3
      
      Reviewers: davide, efriedma, mzolotukhin
      
      Reviewed By: davide, efriedma
      
      Subscribers: hiraditya, dmgreen, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D56848
      
      llvm-svn: 351567
      be7cbe3f
    • Neil Henning's avatar
      [AMDGPU] Add some missing always-uniform values. · 3ed09f8e
      Neil Henning authored
      This commit adds some missing intrinsics into the isAlwaysUniform list
      for the AMDGPU backend.
      
      Differential Revision: https://reviews.llvm.org/D56845
      
      llvm-svn: 351562
      3ed09f8e
    • Nirav Dave's avatar
      [SelectionDAGBuilder] Cleanup InlineAsm Output generation. NFCI. · 0a45bf0e
      Nirav Dave authored
      Defer inline asm's output fixup work until after we've generated the
      inline asm node itself. Remove StoresToEmit, IndirectStoresToEmit, and
      RetValRegs in favor of using ConstraintOperands.
      
      llvm-svn: 351558
      0a45bf0e
    • Sanjay Patel's avatar
      [x86] simplify code for SDValue.getOperand(); NFC · b6c91a1a
      Sanjay Patel authored
      llvm-svn: 351557
      b6c91a1a
    • Dmitry Preobrazhensky's avatar
      [AMDGPU][MC][GFX8+][DISASSEMBLER] Corrected 1/2pi value for 64-bit operands · 6bc26aaa
      Dmitry Preobrazhensky authored
      See bug 39332: https://bugs.llvm.org/show_bug.cgi?id=39332
      
      Reviewers: artem.tamazov, arsenm
      
      Differential Revision: https://reviews.llvm.org/D56794
      
      llvm-svn: 351555
      6bc26aaa
    • Florian Hahn's avatar
      [SelectionDAG] Add getTokenFactor, which splits nodes with > 64k operands. · d2c733b4
      Florian Hahn authored
      This functionality is required at multiple places which potentially
      create large operand lists, like SelectionDAGBuilder or DAGCombiner.
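A minimal sketch of the splitting idea, assuming a toy graph rather than SelectionDAG: while the operand list is longer than the limit, the trailing `Limit` operands are folded into a nested join node, which then takes their place. `Node`, `Graph`, `makeNode`, and `makeJoin` are hypothetical names, and the real function may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy node: only records its operands.
struct Node {
  std::vector<const Node *> Operands;
};

class Graph {
  std::vector<std::unique_ptr<Node>> Nodes; // Owns every node.

public:
  const Node *makeNode(std::vector<const Node *> Ops = {}) {
    Nodes.push_back(std::make_unique<Node>(Node{std::move(Ops)}));
    return Nodes.back().get();
  }

  // Joins Vals into one node without ever exceeding Limit operands per node.
  const Node *makeJoin(std::vector<const Node *> Vals, std::size_t Limit) {
    assert(Limit >= 2 && "need room for a nested join plus one more operand");
    while (Vals.size() > Limit) {
      // Fold the last Limit operands into a nested join node.
      auto SliceBegin =
          Vals.end() - static_cast<std::ptrdiff_t>(Limit);
      std::vector<const Node *> Slice(SliceBegin, Vals.end());
      Vals.erase(SliceBegin, Vals.end());
      Vals.push_back(makeNode(std::move(Slice)));
    }
    return makeNode(std::move(Vals));
  }
};
```

Each loop iteration shrinks the list by `Limit - 1` entries, so the loop terminates for any limit of at least two. With 10 leaves and a limit of 4, the result is a chain of three join nodes with four operands each.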
      
      Differential Revision: https://reviews.llvm.org/D56739
      
      llvm-svn: 351552
      d2c733b4
    • James Henderson's avatar
      Add __[_[_]]Z demangling to new common demangle function · f5356944
      James Henderson authored
      This is a follow-up to r351448. It adds support for other _*Z extensions
      of Itanium mangling to the heuristic of the newly available demangle
      function.
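A prefix heuristic of this kind might look like the sketch below. It assumes that the plain `_Z` prefix was already accepted by the earlier revision and that the accepted forms are one to four underscores followed by `Z`; `looksLikeItaniumMangledName` is a hypothetical name, and the check does no actual demangling.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Returns true if Name starts with 1 to 4 underscores followed by 'Z'.
// Prefix check only; it does not validate the rest of the mangled name.
bool looksLikeItaniumMangledName(std::string_view Name) {
  std::size_t Underscores = Name.find_first_not_of('_');
  return Underscores != std::string_view::npos && Underscores >= 1 &&
         Underscores <= 4 && Name[Underscores] == 'Z';
}
```

Under these assumptions, `_Z3foov` and `____Z3foov` pass the check, while `Z3foov` and `_____Z3foov` do not.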
      
      Reviewed by: erik.pilkington, rupprecht, grimar
      
      Differential Revision: https://reviews.llvm.org/D56855
      
      llvm-svn: 351551
      f5356944
    • Dmitry Preobrazhensky's avatar
      [AMDGPU][MC] Disabled use of 2 different literals with SOP2/SOPC instructions · 61105bab
      Dmitry Preobrazhensky authored
      See bug 39319: https://bugs.llvm.org/show_bug.cgi?id=39319
      
      Reviewers: artem.tamazov, arsenm, rampitec
      
      Differential Revision: https://reviews.llvm.org/D56847
      
      llvm-svn: 351549
      61105bab
    • Pavel Labath's avatar
      [ADT] Add streaming operators for llvm::Optional · 47e9a21d
      Pavel Labath authored
      Summary:
      The operators simply print the underlying value or "None".
      
      The trickier part of this patch is making sure the streaming operators
      work even in unit tests (which was my primary motivation, though I can
      also see them being useful elsewhere). Since the stream operator was a
      template, implicit conversions did not kick in, and our gtest glue code
      was explicitly introducing an implicit conversion to make sure other
      implicit conversions do not kick in :P. I resolve that by specializing
      llvm_gtest::StreamSwitch for llvm::Optional<T>.
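The "print the underlying value or None" behavior can be sketched with standard-library types. This is an analogy only: `std::optional` and `std::ostream` stand in for llvm::Optional and LLVM's stream class, and `toString` is a hypothetical helper added so the result is easy to inspect.

```cpp
#include <cassert>
#include <optional>
#include <ostream>
#include <sstream>
#include <string>

// Streams the contained value, or the text "None" if there is none.
template <typename T>
std::ostream &operator<<(std::ostream &OS, const std::optional<T> &O) {
  if (O)
    OS << *O;
  else
    OS << "None";
  return OS;
}

// Hypothetical helper: renders an optional through the operator above.
template <typename T> std::string toString(const std::optional<T> &O) {
  std::ostringstream OS;
  OS << O;
  return OS.str();
}
```

Since the operator is a template, it only matches when the argument already is an optional; a value that would need an implicit conversion to an optional does not select it, which is the kind of friction with test glue code that the summary describes.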
      
      Reviewers: sammccall, dblaikie
      
      Reviewed By: sammccall
      
      Subscribers: mgorny, dexonsmith, kristina, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D56795
      
      llvm-svn: 351548
      47e9a21d
    • Dylan McKay's avatar
      [AVR] Fix codegen bug in 16-bit loads · 77364be4
      Dylan McKay authored
      Prior to this patch, the AVR::LDWRdPtr instruction was always lowered to
      instructions of this pattern:
      
          ld  $GPR8, [PTR:XYZ]+
          ld  $GPR8, [PTR]+1
      
      This has a problem; the [PTR] is incremented in-place once, but never
      decremented.
      
      Future uses of the same pointer will use the now clobbered value,
      leading to the pointer being incorrect by an offset of one.
      
      This patch modifies the expansion code of the LDWRdPtr pseudo
      instruction so that the pointer variable is not silently clobbered in
      future uses in the same live range.
      
      Patch by Keshav Kini.
      
      llvm-svn: 351544
      77364be4
    • Florian Hahn's avatar
      [SelectionDAG] Add static getMaxNumOperands function to SDNode. · 1b817723
      Florian Hahn authored
      Summary:
      Use this helper to make sure we use the same value at various places.
      This will likely be needed at more places where we currently crash
      because we use more operands than possible.
      
      Also makes it easier to change in the future.
      
      Reviewers: RKSimon, craig.topper, efriedma, aemerson
      
      Reviewed By: RKSimon
      
      Subscribers: hiraditya, arsenm, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D56859
      
      llvm-svn: 351537
      1b817723
    • Shiva Chen's avatar
      [ScheduleDAGRRList] Do not preschedule the node has ADJCALLSTACKDOWN parent · e84c729a
      Shiva Chen authored
      We should not preschedule a node that has an ADJCALLSTACKDOWN parent.
      Otherwise, during bottom-up scheduling, ADJCALLSTACKDOWN and
      ADJCALLSTACKUP may hold CallResource too long and prevent other
      calls from being scheduled. If there is no other available node
      to schedule, the scheduler will try to rename the register by
      creating a copy to avoid the conflict, which will fail because
      CallResource is not a real physical register.
      
      llvm-svn: 351527
      e84c729a
    • Dylan McKay's avatar
      [AVR] Rewrite the CBRRdK instruction as an alias of ANDIRdK · d770da98
      Dylan McKay authored
      The CBR instruction is just an ANDI instruction with the immediate
      complemented.
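The equivalence can be checked with plain 8-bit arithmetic. The sketch below models only the register result, not the status flags or the instruction encoding, and the function names `andi` and `cbr` are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// ANDI: bitwise AND of an 8-bit register value with an 8-bit immediate.
constexpr std::uint8_t andi(std::uint8_t Rd, std::uint8_t K) {
  return static_cast<std::uint8_t>(Rd & K);
}

// CBR: clears the bits selected by K, i.e. ANDI with the complemented mask.
constexpr std::uint8_t cbr(std::uint8_t Rd, std::uint8_t K) {
  return andi(Rd, static_cast<std::uint8_t>(~K));
}
```

For example, `cbr(0xAA, 0x0A)` clears bits 1 and 3 of 0xAA and yields 0xA0, the same value as `andi(0xAA, 0xF5)`.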
      
      Because of this, prior to this change TableGen would warn due to a
      decoding conflict.
      
      This commit fixes the existing compilation warning:
      
        ===============
        [423/492] Building AVRGenDisassemblerTables.inc...
        Decoding Conflict:
                        0111............
                        01..............
                        ................
                ANDIRdK 0111____________
                CBRRdK 0111____________
        ================
      
      After this commit, there are no more decoding conflicts in the AVR
      backend's instruction definitions.
      
      Thanks to Eli F for pointing me toward `t2_so_imm_not` as an example of
      how to perform a complement in an instruction alias.
      
      Fixes BugZilla PR38802.
      
      llvm-svn: 351526
      d770da98