  1. Jan 29, 2019
    • [IPCP] Don't crash due to arg count/type mismatch between caller/callee · d014d576
      Bjorn Pettersson authored
      Summary:
      This patch avoids an assert in IPConstantPropagation when
      there is an argument count/type mismatch between the caller and
      the callee.
      
      While this is actually UB at the C level (clang emits a warning),
      the IR verifier seems to accept it. I'm not sure what other
      frontends/languages might think about this, so simply bailing out
      to avoid hitting an assert (in CallSiteBase<>::getArgOperand or
      Value::doRAUW) seems like a simple solution.
      
      The problem is exposed by the fact that AbstractCallSites will look
      through a bitcast at the callee position of a call/invoke.
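
      A minimal IR sketch (names made up for illustration) of the kind of
      mismatched call this now bails out on:
      ```
      define internal i32 @callee(i32 %a, i32 %b) {
        ret i32 %a
      }

      define void @caller() {
        ; @callee is reached through a bitcast with a mismatched argument count
        call void bitcast (i32 (i32, i32)* @callee to void (i32)*)(i32 0)
        ret void
      }
      ```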
      
      Reviewers: jdoerfert, reames, efriedma
      
      Reviewed By: jdoerfert, efriedma
      
      Subscribers: eli.friedman, efriedma, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D57052
      
      llvm-svn: 352469
    • Demanded elements support for vector GEPs · 6c5341bc
      Philip Reames authored
      GEPs can produce either scalar or vector results. If we're extracting only a subset of the vector lanes, simplifying the operands is helpful in eliminating redundant computation, and (eventually) allowing further optimizations.
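
      A rough IR sketch of the situation (names hypothetical): only lane 0 of the
      vector GEP result is used, so only lane 0 of its vector operands is demanded.
      ```
      define i32* @demanded_lane(<4 x i32*> %bases, <4 x i64> %idxs) {
        %gep = getelementptr i32, <4 x i32*> %bases, <4 x i64> %idxs
        %p = extractelement <4 x i32*> %gep, i32 0
        ret i32* %p
      }
      ```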
      
      Differential Revision: https://reviews.llvm.org/D57177
      
      llvm-svn: 352440
  2. Jan 28, 2019
    • [ThinLTO] Refine reachability check to fix compile time increase · 5b2f6a1b
      Teresa Johnson authored
      Summary:
      A recent fix to the ThinLTO whole program dead code elimination (D56117)
      increased the thin link time on a large MSAN'ed binary by 2x.
      It's likely that the time increased elsewhere, but was more noticeable
      here since it was already large and ended up timing out.
      
      That change made it so we would repeatedly scan all copies of linkonce
      symbols for liveness every time they were encountered during the graph
      traversal. This was needed since we only mark one copy of an aliasee as
      live when we encounter a live alias. This patch fixes the issue in a
      more efficient manner by simply proactively visiting the aliasee (thus
      marking all copies live) when we encounter a live alias.
      
      Two notes: One, this requires a hash table lookup (finding the aliasee
      summary in the index based on aliasee GUID). However, the impact of this
      seems to be small compared to the original pre-D56117 thin link time. It
      could be addressed if we keep the aliasee ValueInfo in the alias summary
      instead of the aliasee GUID, which I am exploring in a separate patch.
      
      Second, we only populate the aliasee GUID field when reading summaries
      from bitcode (whether we are reading individual summaries and merging on
      the fly to form the compiled index, or reading in a serialized combined
      index). Thankfully, that's currently the only way we can get to this
      code as we don't yet support reading summaries from LLVM assembly
      directly into a tool that performs the thin link (they must be converted
      to bitcode first). I added a FIXME, however I have the fix under test
      already. The easiest fix is to simply populate this field always, which
      isn't hard, but more likely the change I am exploring to store the
      ValueInfo instead as described above will subsume this. I don't want to
      hold up the regression fix for this though.
      
      Reviewers: trentxintong
      
      Subscribers: mehdi_amini, inglorion, dexonsmith, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D57203
      
      llvm-svn: 352438
    • [CodeExtractor] Add support for the `swifterror` attribute · 1c3694a4
      Vedant Kumar authored
      When passing a `swifterror` argument or alloca as an input to an
      extraction region, mark the input parameter `swifterror`.
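
      A rough IR sketch of the expected result, with hypothetical names: the
      swifterror alloca in the original function is passed to the extracted
      function through a parameter that is itself marked `swifterror`.
      ```
      %swift.error = type opaque

      define void @foo() {
      entry:
        %err = alloca swifterror %swift.error*, align 8
        call void @foo.cold.1(%swift.error** swifterror %err)
        ret void
      }

      define internal void @foo.cold.1(%swift.error** swifterror %err) {
        ret void
      }
      ```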
      
      llvm-svn: 352408
    • [SimpleLoopUnswitch] Early check exit for trivial unswitch with MemorySSA. · 93210870
      Alina Sbirlea authored
      Summary:
      If MemorySSA is available, we can skip checking all instructions if the block has any Defs
      (volatile loads are also Defs).
      We still need to check all instructions for "canThrow", even if no Defs are found.
      
      Reviewers: chandlerc
      
      Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D57129
      
      llvm-svn: 352393
  3. Jan 25, 2019
    • Revert rL352238. · a34bcbf3
      Alina Sbirlea authored
      llvm-svn: 352241
    • [WarnMissedTransforms] Set default to 1. · 890a8e57
      Alina Sbirlea authored
      Summary:
      Set default value for retrieved attributes to 1, since the check is against 1.
      Eliminates the warning noise generated when the attributes are not present.
      
      Reviewers: sanjoy
      
      Subscribers: jlebar, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D57253
      
      llvm-svn: 352238
    • [HotColdSplit] Introduce a cost model to control splitting behavior · db3f9774
      Vedant Kumar authored
      The main goal of the model is to avoid *increasing* function size, as
      that would eradicate any memory locality benefits from splitting. This
      happens when:
      
        - There are too many inputs or outputs to the cold region. Argument
          materialization and reloads of outputs have a cost.
      
        - The cold region has too many distinct exit blocks, causing a large
          switch to be formed in the caller.
      
        - The code size cost of the split code is less than the cost of a
          set-up call.
      
      A secondary goal is to prevent excessive overall binary size growth.
      
      With the cost model in place, I experimented to find a splitting
      threshold that works well in practice. To make warm & cold code easily
      separable for analysis purposes, I moved split functions to a "cold"
      section. I experimented with thresholds between [0, 4] and set the
      default to the threshold which minimized geomean __text size.
      
      Experiment data from building LNT+externals for X86 (N = 639 programs,
      all sizes in bytes):
      
      | Configuration | __text geom size | __cold geom size | TEXT geom size |
      | **-Os**       | 1736.3           | 0, n=0           | 10961.6        |
      | -Os, thresh=0 | 1740.53          | 124.482, n=134   | 11014          |
      | -Os, thresh=1 | 1734.79          | 57.8781, n=90    | 10978.6        |
      | -Os, thresh=2 | ** 1733.85 **    | 65.6604, n=61    | 10977.6        |
      | -Os, thresh=3 | 1733.85          | 65.3071, n=61    | 10977.6        |
      | -Os, thresh=4 | 1735.08          | 67.5156, n=54    | 10965.7        |
      | **-Oz**       | 1554.4           | 0, n=0           | 10153          |
      | -Oz, thresh=2 | ** 1552.2 **     | 65.633, n=61     | 10176          |
      | **-O3**       | 2563.37          | 0, n=0           | 13105.4        |
      | -O3, thresh=2 | ** 2559.49 **    | 71.1072, n=61    | 13162.4        |
      
      Picking thresh=2 reduces the geomean __text section size by 0.14% at
      -Os, -Oz, and -O3 and causes ~0.2% growth in the TEXT segment. Note that
      TEXT size is page-aligned, whereas section sizes are byte-aligned.
      
      Experiment data from building LNT+externals for ARM64 (N = 558 programs,
      all sizes in bytes):
      
      | Configuration | __text geom size | __cold geom size | TEXT geom size |
      | **-Os**       | 1763.96          | 0, n=0           | 42934.9        |
      | -Os, thresh=2 | ** 1760.9 **     | 76.6755, n=61    | 42934.9        |
      
      Picking thresh=2 reduces the geomean __text section size by 0.17% at
      -Os and causes no growth in the TEXT segment.
      
      Measurements were done with D57082 (r352080) applied.
      
      Differential Revision: https://reviews.llvm.org/D57125
      
      llvm-svn: 352228
    • [LoopSimplifyCFG] Fix inconsistency in blocks in loop markup · 38cd9acb
      Max Kazantsev authored
      Second part of D57095, for the same reason, just in another place. We never
      fold branches that are not immediately in the current loop, but this check
      was missing in `IsEdgeLive`. As a result, it may think that an edge in a subloop
      is dead while it is actually live. In the current state this is only a pessimization.
      
      Differential Revision: https://reviews.llvm.org/D57147
      Reviewed By: rupprecht	
      
      llvm-svn: 352170
    • [HotColdSplit] Describe the pass in more detail, NFC · 9d70f2b9
      Vedant Kumar authored
      llvm-svn: 352161
    • [HotColdSplit] Split more aggressively before/after cold invokes · 65de025d
      Vedant Kumar authored
      While a cold invoke itself and its unwind destination can't be
      extracted, code which unconditionally executes before/after the invoke
      may still be profitable to extract.
      
      With cost model changes from D57125 applied, this gives a 3.5% increase
      in split text across LNT+externals on arm64 at -Os.
      
      llvm-svn: 352160
    • hwasan: If we split the entry block, move static allocas back into the entry block. · 1a8acfb7
      Peter Collingbourne authored
      Otherwise they are treated as dynamic allocas, which ends up increasing
      code size significantly. This reduces size of Chromium base_unittests
      by 2MB (6.7%).
      
      Differential Revision: https://reviews.llvm.org/D57205
      
      llvm-svn: 352152
  4. Jan 24, 2019
    • Fix a compiler error introduced in r352093. · b9613a39
      Haojian Wu authored
      llvm-svn: 352098
    • [LICM] Cleanup duplicated code. [NFCI] · 0a436720
      Alina Sbirlea authored
      llvm-svn: 352093
    • [MemorySSA +LICM CFHoist] Solve PR40317. · 52f6e2a1
      Alina Sbirlea authored
      Summary:
      MemorySSA needs updating each time an instruction is moved.
      LICM and control flow hoisting re-hoist instructions, thus needing another update when those instructions are moved again.
      Pending cleanup: the MSSA update is duplicated; it should be moved inside moveInstructionBefore.
      
      Reviewers: jnspaulsson
      
      Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D57176
      
      llvm-svn: 352092
    • [HotColdSplit] Move splitting earlier in the pipeline · ef1ebed1
      Vedant Kumar authored
      Performing splitting early has several advantages:
      
        - Inhibiting inlining of cold code early improves code size. Compared
          to scheduling splitting at the end of the pipeline, this cuts code
          size growth in half within the iOS shared cache (0.69% to 0.34%).
      
        - Inhibiting inlining of cold code improves compile time. There's no
          need to inline split cold functions, or to inline as much *within*
          those split functions as they are marked `minsize`.
      
        - During LTO, extra work is only done in the pre-link step. Less code
          must be inlined during cross-module inlining.
      
      An additional motivation here is that the most common cold regions
      identified by the static/conservative splitting heuristic (a) can be
      found before inlining and (b) do not grow after inlining. E.g.
      __assert_fail, os_log_error.
      
      The disadvantages are:
      
        - Some opportunities for splitting out cold code may be missed. This
          gap can potentially be narrowed by adding a worklist algorithm to the
          splitting pass.
      
        - Some opportunities to reduce code size may be lost (e.g. store
          sinking, when one side of the CFG diamond is split). This does not
          outweigh the code size benefits of splitting earlier.
      
      On net, splitting early in the pipeline has substantial code size
      benefits, and no major effects on memory locality or performance. We
      measured memory locality using ktrace data, and consistently found that
      10% fewer pages were needed to capture 95% of text page faults in key
      iOS benchmarks. We measured performance on frequency-stabilized iOS
      devices using LNT+externals.
      
      This reverses course on the decision made to schedule splitting late in
      r344869 (D53437).
      
      Differential Revision: https://reviews.llvm.org/D57082
      
      llvm-svn: 352080
    • b62e9dc4
      Julian Lettner authored
    • [RS4GC] Be slightly less conservative for gep vector_base, scalar_idx · 4d683ee7
      Philip Reames authored
      After submitting https://reviews.llvm.org/D57138, I realized it was slightly more conservative than needed. The scalar indices don't appear to be a problem on a vector gep; we even had a test for that.
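
      For reference, a sketch of the pattern in question (types and names
      hypothetical): a GEP with a vector of bases and a scalar index, where the
      scalar index is simply splat across all lanes.
      ```
      define <2 x i64 addrspace(1)*> @vec_base_scalar_idx(<2 x i64 addrspace(1)*> %bases) gc "statepoint-example" {
        %gep = getelementptr i64, <2 x i64 addrspace(1)*> %bases, i64 1
        ret <2 x i64 addrspace(1)*> %gep
      }
      ```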
      
      Differential Revision: https://reviews.llvm.org/D57161
      
      llvm-svn: 352061
    • [RS4GC] Avoid crashing on gep scalar_base, vector_idx · a657510e
      Philip Reames authored
      This is an alternative to https://reviews.llvm.org/D57103. After discussion, we decided to check this in as a temporary workaround, and pursue a true fix under the original thread.
      
      The issue at hand is that the base rewriting algorithm doesn't consider the fact that GEPs can turn a scalar input into a vector of outputs. We had handling for scalar GEPs and fully vector GEPs (i.e. all vector operands), but not the scalar-base + vector-index forms. A true fix here requires treating GEP analogously to extractelement or shufflevector.
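
      A sketch of the scalar-base + vector-index form (types and names
      hypothetical), which turns a scalar input into a vector of pointers:
      ```
      define <2 x i64 addrspace(1)*> @scalar_base_vector_idx(i64 addrspace(1)* %base, <2 x i64> %idxs) gc "statepoint-example" {
        %gep = getelementptr i64, i64 addrspace(1)* %base, <2 x i64> %idxs
        ret <2 x i64 addrspace(1)*> %gep
      }
      ```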
      
      This patch is merely a workaround. It simply hides the crash at the cost of some ugly code gen for this presumably very rare pattern.
      
      Differential Revision: https://reviews.llvm.org/D57138
      
      llvm-svn: 352059
    • Revert "[HotColdSplitting] Get DT and PDT from the pass manager." · bed7f9ea
      Florian Hahn authored
      This reverts commit a6982414 (llvm-svn: 352036),
      because it causes a memory leak in the pass manager. Failing bot
      
      http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/10351/steps/check-llvm%20asan/logs/stdio
      
      llvm-svn: 352041
    • [HotColdSplitting] Get DT and PDT from the pass manager. · a6982414
      Florian Hahn authored
      Instead of manually computing DT and PDT, we can get them from the pass
      manager, which ideally has them already cached. With the new pass
      manager, we could even preserve DT/PDT on a per function basis in a
      module pass.
      
      I think this also addresses the TODO about re-using the computed DTs for
      BFI. IIUC, GetBFI will fetch the DT from the pass manager, and we will
      fetch the cached version later.
      
      Reviewers: vsk, hiraditya, tejohnson, thegameg, sebpop
      
      Reviewed By: vsk
      
      Differential Revision: https://reviews.llvm.org/D57092
      
      llvm-svn: 352036
    • [LoopSimplifyCFG] Fix inconsistency in live blocks markup · 56515a2c
      Max Kazantsev authored
      When we choose whether or not we should mark a block as dead, we have
      inconsistent logic in the markup of live blocks.
      - We take candidate IF its terminator branches on constant AND it is immediately
        in current loop;
      - We mark successor live IF its terminator doesn't branch by constant OR it branches
        by constant and the successor is its always taken block.
      
      What we are missing here is that when the terminator branches on a constant but is
      not taken as a candidate because it is not immediately in the current loop, we will
      mark only one (always taken) successor as live. Therefore, we do NOT do the actual
      folding but also do NOT mark one of the successors as live. So the result of the
      markup is wrong in this case, and we may then hit various asserts.
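
      A rough, hypothetical IR illustration of the problematic shape: a branch on
      a constant that sits in a subloop of the loop currently being processed.
      ```
      define void @f(i1 %inner.cond, i1 %outer.cond) {
      entry:
        br label %outer.header

      outer.header:                                   ; current loop being processed
        br label %inner.header

      inner.header:                                   ; subloop block: this constant branch is
        br i1 true, label %inner.latch, label %side   ; never folded here, yet both successors
                                                      ; must be marked live, not just %inner.latch

      side:
        br label %inner.latch

      inner.latch:
        br i1 %inner.cond, label %inner.header, label %outer.latch

      outer.latch:
        br i1 %outer.cond, label %outer.header, label %exit

      exit:
        ret void
      }
      ```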
      
      Thanks to Jordan Rupprecht for reporting this!
      
      Differential Revision: https://reviews.llvm.org/D57095
      Reviewed By: rupprecht
      
      llvm-svn: 352024
    • [Sanitizers] UBSan unreachable incompatible with ASan in the presence of `noreturn` calls · cea84ab9
      Julian Lettner authored
      Summary:
      UBSan wants to detect when unreachable code is actually reached, so it
      adds instrumentation before every `unreachable` instruction. However,
      the optimizer will remove code after calls to functions marked with
      `noreturn`. To avoid this UBSan removes `noreturn` from both the call
      instruction as well as from the function itself. Unfortunately, ASan
      relies on this annotation to unpoison the stack by inserting calls to
      `__asan_handle_no_return` before `noreturn` functions. This is important
      for functions that do not return but access the stack memory, e.g.,
      unwinder functions *like* `longjmp` (`longjmp` itself is actually
      "double-proofed" via its interceptor). The result is that when ASan and
      UBSan are combined, the `noreturn` attributes are missing and ASan
      cannot unpoison the stack, so it has false positives when stack
      unwinding is used.
      
      Changes:
        # UBSan now adds the `expect_noreturn` attribute whenever it removes
          the `noreturn` attribute from a function
        # ASan additionally checks for the presence of this attribute
      
      Generated code:
      ```
      call void @__asan_handle_no_return    // Additionally inserted to avoid false positives
      call void @longjmp
      call void @__asan_handle_no_return
      call void @__ubsan_handle_builtin_unreachable
      unreachable
      ```
      
      The second call to `__asan_handle_no_return` is redundant. This will be
      cleaned up in a follow-up patch.
      
      rdar://problem/40723397
      
      Reviewers: delcypher, eugenis
      
      Tags: #sanitizers
      
      Differential Revision: https://reviews.llvm.org/D56624
      
      llvm-svn: 352003
    • Update entry count for cold calls · d2eeb251
      David Callahan authored
      Summary:
      Profile sample files include the number of times each entry or inlined
      call site is sampled. This is translated into the entry count metadata
      on functions.
      
      When sample data is being read, if a call site that was inlined
      in the sample program is considered cold and not inlined, then
      the entry count of the out-of-line functions does not reflect
      the current compilation.
      
      In this patch, we note call sites where the function was not inlined
      and as a last action of the sample profile loading, we update the
      called function's entry count to reflect the calls from these
      call sites which are not included in the profile file.
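
      For reference, the entry count being updated is the function-level !prof
      metadata; a minimal sketch with made-up values:
      ```
      define void @callee() !prof !0 {
        ret void
      }

      !0 = !{!"function_entry_count", i64 200}
      ```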
      
      Reviewers: danielcdh, wmi, Kader, modocache
      
      Reviewed By: wmi
      
      Subscribers: davidxl, eraman, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D52845
      
      llvm-svn: 352001
    • [llvm] Clarify responsibility of some of DILocation discriminator APIs · ec026302
      Mircea Trofin authored
      Summary:
      Renamed setBaseDiscriminator to cloneWithBaseDiscriminator, to match
      similar APIs. Also changed its behavior to copy over the other
      discriminator components, instead of eliding them.
      
      Renamed cloneWithDuplicationFactor to
      cloneByMultiplyingDuplicationFactor, which more closely matches what
      this API does.
      
      Reviewers: dblaikie, wmi
      
      Reviewed By: dblaikie
      
      Subscribers: zzheng, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D56220
      
      llvm-svn: 351996
  5. Jan 23, 2019
    • [LV][VPlan] Change to implement VPlan based predication for VPlan-native path · 4e4ecae0
      Hideki Saito authored
      
      Context: Patch Series #2 for outer loop vectorization support in LV
      using VPlan. (RFC:
      http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html).
      
      Patch series #2 checks that inner loops are still trivially lock-step
      among all vector elements. Non-loop branches are blindly assumed to be
      divergent.
      
      Changes here implement a VPlan-based predication algorithm to compute
      predicates for blocks that need predication. Predicates are computed
      for the VPLoop region in reverse post order. A block's predicate is
      computed as the OR of the masks of all incoming edges. The mask for an
      incoming edge is computed as the AND of the predecessor block's predicate
      and either the predecessor's Condition bit or NOT(Condition bit), depending
      on whether the edge from the predecessor block to the current block is the
      true or the false edge.
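
      Purely as an illustration (this is not the VPlan representation itself), the
      mask computation amounts to the following i1 logic:
      ```
      define i1 @block_predicate(i1 %pred.p1, i1 %cond.p1, i1 %pred.p2, i1 %cond.p2) {
        ; mask of the true edge from predecessor P1: P1's predicate AND P1's condition
        %mask.p1 = and i1 %pred.p1, %cond.p1
        ; mask of the false edge from predecessor P2: P2's predicate AND NOT(P2's condition)
        %cond.p2.not = xor i1 %cond.p2, true
        %mask.p2 = and i1 %pred.p2, %cond.p2.not
        ; the block's predicate is the OR of the masks of all incoming edges
        %pred.b = or i1 %mask.p1, %mask.p2
        ret i1 %pred.b
      }
      ```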
      
      Reviewers: fhahn, rengolin, hsaito, dcaballe
      
      Reviewed By: fhahn
      
      Patch by Satish Guggilla, thanks!
      
      Differential Revision: https://reviews.llvm.org/D53349
      
      llvm-svn: 351990
    • hwasan: Read shadow address from ifunc if we don't need a frame record. · 020ce3f0
      Peter Collingbourne authored
      This saves a cbz+cold call in the interceptor ABI, as well as a realign
      in both ABIs, trading off a dcache entry against some branch predictor
      entries and some code size.
      
      Unfortunately the functionality is hidden behind a flag because ifunc is
      known to be broken on static binaries on Android.
      
      Differential Revision: https://reviews.llvm.org/D57084
      
      llvm-svn: 351989
    • [HotColdSplitting] Remove unused SSAUpdater.h include (NFC). · 68cea130
      Florian Hahn authored
      llvm-svn: 351945
    • [IRCE] Support narrow latch condition for wide range checks · d9aee3c0
      Max Kazantsev authored
      This patch relaxes the restriction that the types of the latch condition
      and the range check must match: it now allows handling wide range checks
      against a narrow latch condition. The motivating example is the
      following:
      
        int N = ...
        for (long i = 0; (int) i < N; i++) {
          if (i >= length) deopt;
        }
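
      In IR terms (a rough sketch, names hypothetical), the latch condition is
      computed on a truncated, narrow value of the induction variable while the
      range check uses the wide induction variable:
      ```
      define void @test(i32 %N, i64 %length) {
      entry:
        br label %loop

      loop:
        %iv = phi i64 [ 0, %entry ], [ %iv.next, %backedge ]
        %in.bounds = icmp ult i64 %iv, %length     ; wide (i64) range check
        br i1 %in.bounds, label %backedge, label %deopt

      deopt:
        ret void

      backedge:
        %iv.next = add nsw i64 %iv, 1
        %narrow = trunc i64 %iv.next to i32
        %latch.cond = icmp slt i32 %narrow, %N     ; narrow (i32) latch condition
        br i1 %latch.cond, label %loop, label %exit

      exit:
        ret void
      }
      ```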
      
      In this patch, the option that enables this support is turned off by
      default. We'll wait until it is switched to true.
      
      Differential Revision: https://reviews.llvm.org/D56837
      Reviewed By: reames
      
      llvm-svn: 351926
    • hwasan: Move memory access checks into small outlined functions on aarch64. · 73078ecd
      Peter Collingbourne authored
      Each hwasan check requires emitting a small piece of code like this:
      https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html#memory-accesses
      
      The problem with this is that these code blocks typically bloat code
      size significantly.
      
      An obvious solution is to outline these blocks of code. In fact, this
      has already been implemented under the -hwasan-instrument-with-calls
      flag. However, as currently implemented this has a number of problems:
      - The functions use the same calling convention as regular C functions.
        This means that the backend must spill all temporary registers as
        required by the platform's C calling convention, even though the
        check only needs two registers on the hot path.
      - The functions take the address to be checked in a fixed register,
        which increases register pressure.
      Both of these factors can diminish the code size effect and increase
      the performance hit of -hwasan-instrument-with-calls.
      
      The solution that this patch implements is to involve the aarch64
      backend in outlining the checks. An intrinsic and pseudo-instruction
      are created to represent a hwasan check. The pseudo-instruction
      is register allocated like any other instruction, and we allow the
      register allocator to select almost any register for the address to
      check. A particular combination of (register selection, type of check)
      triggers the creation in the backend of a function to handle the check
      for specifically that pair. The resulting functions are deduplicated by
      the linker. The pseudo-instruction (really the function) is specified
      to preserve all registers except for the registers that the AAPCS
      specifies may be clobbered by a call.
      
      To measure the code size and performance effect of this change, I
      took a number of measurements using Chromium for Android on aarch64,
      comparing a browser with inlined checks (the baseline) against a
      browser with outlined checks.
      
      Code size: Size of .text decreases from 243897420 to 171619972 bytes,
      or a 30% decrease.
      
      Performance: Using Chromium's blink_perf.layout microbenchmarks I
      measured a median performance regression of 6.24%.
      
      The fact that a perf/size tradeoff is evident here suggests that
      we might want to make the new behaviour conditional on -Os/-Oz.
      But for now I've enabled it unconditionally, my reasoning being that
      hwasan users typically expect a relatively large perf hit, and ~6%
      isn't really adding much. We may want to revisit this decision in
      the future, though.
      
      I also tried experimenting with varying the number of registers
      selectable by the hwasan check pseudo-instruction (which would result
      in fewer variants being created), on the hypothesis that creating
      fewer variants of the function would expose another perf/size tradeoff
      by reducing icache pressure from the check functions at the cost of
      register pressure. Although I did observe a code size increase with
      fewer registers, I did not observe a strong correlation between the
      number of registers and the performance of the resulting browser on the
      microbenchmarks, so I conclude that we might as well use ~all registers
      to get the maximum code size improvement. My results are below:
      
      Regs | .text size | Perf hit
      -----+------------+---------
      ~all | 171619972  | 6.24%
        16 | 171765192  | 7.03%
         8 | 172917788  | 5.82%
         4 | 177054016  | 6.89%
      
      Differential Revision: https://reviews.llvm.org/D56954
      
      llvm-svn: 351920
  6. Jan 22, 2019
  7. Jan 20, 2019
    • Replace llvm::isPodLike<...> by llvm::is_trivially_copyable<...> · be88539b
      Serge Guelton authored
      As noted in https://bugs.llvm.org/show_bug.cgi?id=36651, the specialization for
      isPodLike<std::pair<...>> did not match the expectation of
      std::is_trivially_copyable which makes the memcpy optimization invalid.
      
      This patch renames the llvm::isPodLike trait into llvm::is_trivially_copyable.
      Unfortunately std::is_trivially_copyable is not portable across compiler / STL
      versions. So a portable version is provided too.
      
      Note that the following specializations were invalid:
      
          std::pair<T0, T1>
          llvm::Optional<T>
      
      Tests have been added to assert that the former specializations are respected by the
      standard usage of llvm::is_trivially_copyable, and that when a decent version
      of std::is_trivially_copyable is available, llvm::is_trivially_copyable is
      compared to std::is_trivially_copyable.
      
      As of this patch, llvm::Optional is no longer considered trivially copyable,
      even if T is. This is to be fixed in a later patch, as it has an impact on a
      long-running bug (see r347004)
      
      Note that GCC warns about this UB, but this got silenced by https://reviews.llvm.org/D50296.
      
      Differential Revision: https://reviews.llvm.org/D54472
      
      llvm-svn: 351701
    • [X86] Auto upgrade VPCOM/VPCOMU intrinsics to generic integer comparisons · e1143c13
      Simon Pilgrim authored
      This causes a couple of changes in the upgrade tests, as signed/unsigned eq/ne are equivalent and we constant fold true/false codes; these changes are the same as what we already do for avx512 cmp/ucmp.
      
      Noticed while cleaning up vector integer comparison costs for PR40376.
      
      llvm-svn: 351697
    • [ConstantMerge] Factor out check for un-mergeable globals, NFC · 857cacd9
      Vedant Kumar authored
      llvm-svn: 351671