  1. May 05, 2020
    • [Inlining] Teach shouldBeDeferred to take the total cost into account · e8984fe6
      Kazu Hirata authored
      Summary:
      This patch teaches shouldBeDeferred to take into account the total
      cost of inlining.
      
      Suppose we have a call hierarchy {A1,A2,A3,...}->B->C.  (Each of A1,
      A2, A3, ... calls B, which in turn calls C.)
      
      Without this patch, shouldBeDeferred essentially returns true if
      
        TotalSecondaryCost < IC.getCost()
      
      where TotalSecondaryCost is the total cost of inlining B into As.
      This means that if B is a small wrapper function, for example, it would
      get inlined into all of As.  In turn, C gets inlined into all of As.
      In other words, shouldBeDeferred ignores the cost of inlining C into
      each of As.
      
      This patch adds an option, inline-deferral-scale, to replace the
      expression above with:
      
        TotalCost < Allowance
      
      where
      
      - TotalCost is TotalSecondaryCost + IC.getCost() * # of As, and
      - Allowance is IC.getCost() * Scale
      
      For now, the new option defaults to -1, disabling the new scheme.
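
      For illustration, a minimal self-contained sketch of the comparison
      described above (names and types are hypothetical, not the actual LLVM
      code):
      ```
      // Hypothetical sketch: InlineCost stands for IC.getCost(), NumCallers
      // for the number of As, Scale for the -inline-deferral-scale value.
      bool shouldBeDeferredSketch(long TotalSecondaryCost, long InlineCost,
                                  unsigned NumCallers, int Scale) {
        if (Scale < 0)                              // default -1: old behavior
          return TotalSecondaryCost < InlineCost;
        long TotalCost = TotalSecondaryCost + InlineCost * (long)NumCallers;
        long Allowance = InlineCost * (long)Scale;
        return TotalCost < Allowance;
      }
      ```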
      
      Reviewers: davidxl
      
      Subscribers: eraman, hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D79138
      e8984fe6
    • [SLP] add another bailout for load-combine patterns · 86dfbc67
      Sanjay Patel authored
      This builds on the or-reduction bailout that was added with D67841.
      We still do not have IR-level load combining, although that could
      be a target-specific enhancement for -vector-combiner.
      
      The heuristic is narrowly defined to catch the motivating case from
      PR39538:
      https://bugs.llvm.org/show_bug.cgi?id=39538
      ...while preserving existing functionality.
      
      That is, there's a test of pure load/zext/store at
      llvm/test/Transforms/SLPVectorizer/X86/cast.ll that is not modified by
      this patch.  That's why the logic differs by requiring the 'or'
      instructions.  The chances that vectorization would actually help a
      memory-bound sequence like that seem small, but it looks nicer with:
      
        vpmovzxwd	(%rsi), %xmm0
        vmovdqu	%xmm0, (%rdi)
      
      rather than:
      
        movzwl	(%rsi), %eax
        movl	%eax, (%rdi)
        ...
      
      In the motivating test, we avoid creating a vector mess that is
      unrecoverable in the backend.  Previously, SLP vectorization produced:
      
        movzbl (%rdi), %eax
        vmovd %eax, %xmm0
        movzbl 1(%rdi), %eax
        vmovd %eax, %xmm1
        movzbl 2(%rdi), %eax
        vpinsrb $4, 4(%rdi), %xmm0, %xmm0
        vpinsrb $8, 8(%rdi), %xmm0, %xmm0
        vpinsrb $12, 12(%rdi), %xmm0, %xmm0
        vmovd %eax, %xmm2
        movzbl 3(%rdi), %eax
        vpinsrb $1, 5(%rdi), %xmm1, %xmm1
        vpinsrb $2, 9(%rdi), %xmm1, %xmm1
        vpinsrb $3, 13(%rdi), %xmm1, %xmm1
        vpslld $24, %xmm0, %xmm0
        vpmovzxbd %xmm1, %xmm1 # xmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero,xmm1[2],zero,zero,zero,xmm1[3],zero,zero,zero
        vpslld $16, %xmm1, %xmm1
        vpor %xmm0, %xmm1, %xmm0
        vpinsrb $1, 6(%rdi), %xmm2, %xmm1
        vmovd %eax, %xmm2
        vpinsrb $2, 10(%rdi), %xmm1, %xmm1
        vpinsrb $3, 14(%rdi), %xmm1, %xmm1
        vpinsrb $1, 7(%rdi), %xmm2, %xmm2
        vpinsrb $2, 11(%rdi), %xmm2, %xmm2
        vpmovzxbd %xmm1, %xmm1 # xmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero,xmm1[2],zero,zero,zero,xmm1[3],zero,zero,zero
        vpinsrb $3, 15(%rdi), %xmm2, %xmm2
        vpslld $8, %xmm1, %xmm1
        vpmovzxbd %xmm2, %xmm2 # xmm2 = xmm2[0],zero,zero,zero,xmm2[1],zero,zero,zero,xmm2[2],zero,zero,zero,xmm2[3],zero,zero,zero
        vpor %xmm2, %xmm1, %xmm1
        vpor %xmm1, %xmm0, %xmm0
        vmovdqu %xmm0, (%rsi)

      With the bailout, SDAG forms the expected bswap instructions after load
      combining:

        movl	(%rdi), %eax
        movl	4(%rdi), %ecx
        movl	8(%rdi), %edx
        movbel	%eax, (%rsi)
        movbel	%ecx, 4(%rsi)
        movl	12(%rdi), %ecx
        movbel	%edx, 8(%rsi)
        movbel	%ecx, 12(%rsi)
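
      For context, a hedged sketch (not a test from the patch) of the kind of
      source-level load/zext/shl/or chain that load combining can turn into a
      single 32-bit load plus a byte swap (movbe), which is what the new
      bailout tries to leave intact for the backend:
      ```
      #include <cstdint>
      // Bytes are loaded, zero-extended, shifted and or'ed together; the
      // backend can often recognize this as one load followed by a bswap.
      uint32_t load_be32(const uint8_t *p) {
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
      }
      ```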
      
      Differential Revision: https://reviews.llvm.org/D78997
      86dfbc67
    • [TTI] getScalarizationOverhead - use explicit VectorType operand · 4e3c0055
      Simon Pilgrim authored
      getScalarizationOverhead is only ever called with vectors (and we already had a load of cast<VectorType> calls immediately inside the functions).
      
      Followup to D78357
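
      A minimal sketch of the kind of signature change described (simplified
      and hypothetical names, not the exact TTI declarations):
      ```
      namespace llvm { class Type; class VectorType; }
      // Before (simplified): a plain Type* was accepted and cast internally.
      unsigned getScalarizationOverheadOld(llvm::Type *Ty, bool Insert, bool Extract);
      // After (simplified): the operand is an explicit VectorType*, so the
      // cast<VectorType> moves to the callers.
      unsigned getScalarizationOverheadNew(llvm::VectorType *Ty, bool Insert, bool Extract);
      ```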
      
      Reviewed By: @samparker
      
      Differential Revision: https://reviews.llvm.org/D79341
      4e3c0055
    • Remove unnecessary check for inalloca in IPConstantPropagation · d056c0c7
      Arthur Eubanks authored
      Summary:
      This was added in https://reviews.llvm.org/D2449, but I'm not sure it's
      necessary since an inalloca value is never a Constant (should be an
      AllocaInst).
      
      Reviewers: hans, rnk
      
      Subscribers: hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D79350
      d056c0c7
    • [InstCombine] Allow denormal C in pow(C,y) -> exp2(log2(C)*y) · 22829ab5
      Jay Foad authored
      We check that C is finite and strictly positive, but there's no need to
      check that it's normal too. exp2 should be just as accurate on denormals
      as pow is.
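
      A small self-contained check of the identity being relied on, with a
      denormal C (illustrative only, not part of the patch):
      ```
      #include <cmath>
      #include <cstdio>
      int main() {
        double C = 1e-310; // subnormal (denormal) double, finite and > 0
        double y = 0.5;
        // pow(C, y) and exp2(log2(C) * y) agree to within rounding error.
        std::printf("%.17g\n%.17g\n", std::pow(C, y),
                    std::exp2(std::log2(C) * y));
      }
      ```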
      
      Differential Revision: https://reviews.llvm.org/D79413
      22829ab5
    • [LSR] Don't require register reuse under postinc · 146d44c2
      David Green authored
      LSR has some logic that tries to aggressively reuse registers in
      formulae.  This can lead to sub-optimal decisions in complex loops where
      the backend is trying to use shouldFavorPostInc.  This disables the
      re-use in those situations.
      
      Differential Revision: https://reviews.llvm.org/D79301
      146d44c2
    • [InstCombine] Remove hasOneUse check for pow(C,x) -> exp2(log2(C)*x) · fa2783d7
      Jay Foad authored
      I don't think there's any good reason not to do this transformation when
      the pow has multiple uses.
      
      Differential Revision: https://reviews.llvm.org/D79407
      fa2783d7
    • [InstCombine] Fold or(zext(bswap(x)),shl(zext(bswap(y)),bw/2)) -> bswap(or(zext(x),shl(zext(y), bw/2))) · 5c91aa66
      Simon Pilgrim authored
      
      This adds a general combine that can be used to fold:
      
        or(zext(OP(x)), shl(zext(OP(y)),bw/2))
      -->
        OP(or(zext(x), shl(zext(y),bw/2)))
      
      This allows us to widen 'concat-able' style or+zext patterns.  I've just set this up for BSWAP, but we could use this for other similar ops (BITREVERSE, for instance).

      We already do something similar for bitop(bswap(x),bswap(y)) --> bswap(bitop(x,y)).
      
      Fixes PR45715
      
      Reviewed By: @lebedev.ri
      
      Differential Revision: https://reviews.llvm.org/D79041
      5c91aa66
    • [NFC][CostModel] Add TargetCostKind to relevant APIs · 40574fef
      Sam Parker authored
      Make the kind of cost explicit throughout the cost model which,
      apart from making the cost clear, will allow the generic parts to
      calculate better costs. It will also allow some backends to
      approximate and correlate the different costs if they wish. Another
      benefit is that it will also help simplify the cost model around
      immediate and intrinsic costs, where we currently have multiple APIs.
      
      RFC thread:
      http://lists.llvm.org/pipermail/llvm-dev/2020-April/141263.html
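
      A minimal self-contained model of the API shape this introduces
      (modeled loosely on TargetTransformInfo's cost-kind enum; the function
      below is a stand-in, not the real TTI interface):
      ```
      // The kind of cost a query is asking for is now an explicit parameter.
      enum TargetCostKind {
        TCK_RecipThroughput, // reciprocal-throughput style costs (vectorizers)
        TCK_Latency,         // instruction latency
        TCK_CodeSize,        // code size
        TCK_SizeAndLatency   // combined size-and-latency heuristic
      };
      struct Instruction; // stand-in
      int getInstructionCost(const Instruction *I, TargetCostKind Kind);
      ```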
      
      Differential Revision: https://reviews.llvm.org/D79002
      40574fef
    • [SanitizerCoverage] Replace the unconditional store with a load, then a conditional store. · 08032e71
      Pratyai Mazumder authored
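      A minimal sketch of the idea in C++ (illustrative only; the pass emits
      the equivalent IR, and the one-byte guard type here is an assumption):
      ```
      #include <cstdint>
      // Before: every execution dirties the coverage byte unconditionally.
      void cover_before(uint8_t *guard) { *guard = 1; }
      // After: load first, store only if the byte is not already set.
      void cover_after(uint8_t *guard) {
        if (*guard == 0)
          *guard = 1;
      }
      ```
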
      Reviewers: vitalybuka, kcc
      
      Subscribers: hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D79392
      08032e71
    • [CallGraphUpdater] Removed references to callees when deleting function · f637334d
      Sergey Dmitriev authored
      Summary: Otherwise we can get unaccounted references to call graph nodes.
      
      Reviewers: jdoerfert, sstefan1
      
      Reviewed By: jdoerfert
      
      Subscribers: hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D79382
      f637334d
  2. May 04, 2020
    • [llvm][dfsan][NFC] Factor out fcn initialization · 8d8fda49
      Zola Bridges authored
      Summary:
      Moving these function initializations into separate functions makes it easier
      to read the runOnModule function. There is also precedent in the sanitizer code:
      asan has a function ModuleAddressSanitizer::initializeCallbacks(Module &M). I
      thought it made sense to break the initializations into two sets: one for the
      compiler runtime functions and one for the event callbacks.
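
      A hedged sketch of the refactoring pattern described above (class and
      function names here are hypothetical, not dfsan's actual ones):
      ```
      struct Module; // stand-in for llvm::Module
      struct DataFlowSanitizerish {
        void initializeRuntimeFunctions(Module &M) { /* declare runtime fns */ }
        void initializeEventCallbacks(Module &M)   { /* declare event callbacks */ }
        bool runOnModule(Module &M) {
          initializeRuntimeFunctions(M); // compiler runtime functions
          initializeEventCallbacks(M);   // event callbacks
          // ...instrumentation proper stays uncluttered...
          return true;
        }
      };
      ```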
      
      Tested with: check-all
      
      Reviewed By: morehouse
      
      Differential Revision: https://reviews.llvm.org/D79307
      8d8fda49
    • [InstCombine] Fold (mul(abs(x),abs(x))) -> (mul(x,x)) (PR39476) · 94006143
      Simon Pilgrim authored
      This patch adds support for discarding integer absolutes (abs + nabs variants) from self-multiplications.
      
      ABS Alive2: http://volta.cs.utah.edu:8080/z/rwcc8W
      NABS Alive2: http://volta.cs.utah.edu:8080/z/jZXUwQ
      
      This is an InstCombine version of D79304 - I'm not sure yet if we'll need that after this.
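
      A small illustration of the underlying identity (C++ for exposition
      only; the patch itself matches the IR-level abs/nabs patterns):
      ```
      #include <cstdlib>
      // For all x where abs is defined, |x| * |x| == x * x (and likewise for
      // the nabs variant), so the abs can be dropped from a self-multiply.
      int square_via_abs(int x) { return std::abs(x) * std::abs(x); }
      int square(int x)         { return x * x; }
      ```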
      
      Reviewed By: @lebedev.ri and @xbolva00
      
      Differential Revision: https://reviews.llvm.org/D79319
      94006143
    • [SLC] Allow llvm.pow(x,2.0) -> x*x etc even if no pow() lib func · e737847b
      Jay Foad authored
      optimizePow does not create any new calls to pow, so it should work
      regardless of whether the pow library function is available. This allows
      it to optimize the llvm.pow intrinsic on targets with no math library.
      
      Based on a patch by Tim Renouf.
      
      Differential Revision: https://reviews.llvm.org/D68231
      e737847b
    • [SCCP] Re-use pushToWorkList in pushToWorkListMsg (NFC). · 935685f4
      Florian Hahn authored
      There's no need to duplicate the logic to push to the different
      work-lists.
      935685f4
    • [Attributor][NFC] Replace the nested AAMap with a key pair · 14cb0bdf
      Johannes Doerfert authored
      No functional change is intended.
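
      A self-contained illustration of the data-structure change (the key and
      value types here are stand-ins, not the Attributor's):
      ```
      #include <map>
      #include <utility>
      struct AbstractAttribute;     // stand-in
      using PositionKey = unsigned; // stand-in for an IRPosition identifier
      using AttrKind    = unsigned; // stand-in for the abstract attribute kind
      // Before: a map of maps, probed twice per lookup.
      using NestedAAMap =
          std::map<PositionKey, std::map<AttrKind, AbstractAttribute *>>;
      // After: a single map keyed by the (position, kind) pair.
      using FlatAAMap =
          std::map<std::pair<PositionKey, AttrKind>, AbstractAttribute *>;
      ```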
      
      ---
      
      Single run of the Attributor module and then CGSCC pass (oldPM)
      for SPASS/clause.c (~10k LLVM-IR loc):
      
      Before:
      ```
      calls to allocation functions: 512375 (362871/s)
      temporary memory allocations: 98746 (69933/s)
      peak heap memory consumption: 22.54MB
      peak RSS (including heaptrack overhead): 106.78MB
      total memory leaked: 269.10KB
      ```
      
      After:
      ```
      calls to allocation functions: 509833 (338534/s)
      temporary memory allocations: 98902 (65671/s)
      peak heap memory consumption: 18.71MB
      peak RSS (including heaptrack overhead): 103.00MB
      total memory leaked: 269.10KB
      ```
      
      Difference:
      ```
      calls to allocation functions: -2542 (-27042/s)
      temporary memory allocations: 156 (1659/s)
      peak heap memory consumption: -3.83MB
      peak RSS (including heaptrack overhead): 0B
      total memory leaked: 0B
      ```
      14cb0bdf
    • [Attributor] Remember only necessary dependences · 95e0d28b
      Johannes Doerfert authored
      Before, we eagerly put dependences into the QueryMap as soon as we
      encountered them (via `Attributor::getAAFor<>` or
      `Attributor::recordDependence`). Now we wait to see whether the
      dependence is useful, that is, whether the target is not already in a
      fixpoint state at the end of the update. If the target has already
      reached a fixpoint, there is no need to record the dependence at all.

      Due to the abstraction via `Attributor::updateAA` we now also treat the
      very first update (during attribute creation) the same way as subsequent
      updates.

      Finally, this resolves the problematic usage of QueriedNonFixAA.
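
      A hedged pseudo-C++ sketch of the policy (names and data structures are
      hypothetical, not the Attributor's):
      ```
      #include <vector>
      struct AA { bool isAtFixpoint() const; };
      struct Dependence { AA *Queried; AA *Dependent; };
      void recordDependence(const Dependence &D);
      // At the end of an update, keep only dependences whose queried AA can
      // still change; an AA already at a fixpoint never triggers a re-update.
      void flushDependences(const std::vector<Dependence> &SeenDuringUpdate) {
        for (const Dependence &D : SeenDuringUpdate)
          if (!D.Queried->isAtFixpoint())
            recordDependence(D);
      }
      ```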
      
      ---
      
      Single run of the Attributor module and then CGSCC pass (oldPM)
      for SPASS/clause.c (~10k LLVM-IR loc):
      
      Before:
      ```
      calls to allocation functions: 554675 (389245/s)
      temporary memory allocations: 101574 (71280/s)
      peak heap memory consumption: 28.46MB
      peak RSS (including heaptrack overhead): 116.26MB
      total memory leaked: 269.10KB
      ```
      
      After:
      ```
      calls to allocation functions: 512465 (345559/s)
      temporary memory allocations: 98832 (66643/s)
      peak heap memory consumption: 22.54MB
      peak RSS (including heaptrack overhead): 106.58MB
      total memory leaked: 269.10KB
      ```
      
      Difference:
      ```
      calls to allocation functions: -42210 (-727758/s)
      temporary memory allocations: -2742 (-47275/s)
      peak heap memory consumption: -5.92MB
      peak RSS (including heaptrack overhead): 0B
      total memory leaked: 0B
      ```
      95e0d28b
    • [Attributor] Initialize "value attributes" w/ must-be-executed-context info · 231026a5
      Johannes Doerfert authored
      Attributes that only depend on the value (=bit pattern) can be
      initialized from uses in the must-be-executed-context (MBEC). We
      previously used `AAComposeTwoGenericDeduction` and
      `AAFromMustBeExecutedContext` to do this for some positions of these
      attributes but not for all. This was fairly complicated and also
      problematic because we ran it in every `updateImpl` call even though we
      only use known information. The new implementation removes
      `AAComposeTwoGenericDeduction`* and `AAFromMustBeExecutedContext` in
      favor of a simple interface, `AddInformation::fromMBEContext(...)`, which
      we call from the `initialize` methods of the "value attribute" `Impl`
      classes, e.g. `AANonNullImpl::initialize`.
      
      There can be two types of test changes:
        1) Artifacts where we miss some information that was known before a
           global fixpoint was reached and was therefore available in an
           update but not at the beginning.
        2) Deductions for values we did not derive via the MBEC before, or
           which were not found because `AAFromMustBeExecutedContext::updateImpl`
           was never invoked.
      
      * An improved version of AAComposeTwoGenericDeduction can be found in
        D78718. Once we find a new use case that implementation will be able
        to handle "generic" AAs better.
      
      ---
      
      Single run of the Attributor module and then CGSCC pass (oldPM)
      for SPASS/clause.c (~10k LLVM-IR loc):
      
      Before:
      ```
      calls to allocation functions: 468428 (328952/s)
      temporary memory allocations: 77480 (54410/s)
      peak heap memory consumption: 32.71MB
      peak RSS (including heaptrack overhead): 122.46MB
      total memory leaked: 269.10KB
      ```
      
      After:
      ```
      calls to allocation functions: 554720 (351310/s)
      temporary memory allocations: 101650 (64376/s)
      peak heap memory consumption: 28.46MB
      peak RSS (including heaptrack overhead): 116.75MB
      total memory leaked: 269.10KB
      ```
      
      Difference:
      ```
      calls to allocation functions: 86292 (556722/s)
      temporary memory allocations: 24170 (155935/s)
      peak heap memory consumption: -4.25MB
      peak RSS (including heaptrack overhead): 0B
      total memory leaked: 0B
      ```
      
      Reviewed By: uenoku
      
      Differential Revision: https://reviews.llvm.org/D78719
      231026a5
    • [Attributor][NFC] Proactively ask for `nocapture` on call site arguments · 2f97b8b8
      Johannes Doerfert authored
      This minimizes test noise later on and is in line with other attributes
      we derive proactively.
      2f97b8b8
  3. May 03, 2020
    • [Attributor] Bitcast constant to the returned value type if it has different type · 0f70f733
      Sergey Dmitriev authored
      Reviewers: jdoerfert, sstefan1, uenoku
      
      Reviewed By: jdoerfert
      
      Subscribers: hiraditya, uenoku, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D79277
      0f70f733
    • [ICP] Handling must tail calls in indirect call promotion · 911e06f5
      Hongtao Yu authored
      Per the IR convention, a musttail call must immediately precede a ret, with at most an optional bitcast in between. This was violated by the indirect call promotion optimization, which could result in IR like:
      
          ; <label>:2192:
            br i1 %2198, label %2199, label %2201, !dbg !226012, !prof !229483
      
          ; <label>:2199:                                   ; preds = %2192
            musttail call fastcc void @foo(i8* %2195), !dbg !226012
            br label %2202, !dbg !226012
      
          ; <label>:2201:                                   ; preds = %2192
            musttail call fastcc void %2197(i8* %2195), !dbg !226012
            br label %2202, !dbg !226012
      
          ; <label>:2202:                                   ; preds = %605, %2201, %2199
            ret void, !dbg !229485
      
      This change fixes that by emitting the return statement together with the promoted indirect call. The generated code looks like:
      
          ; <label>:2192:
            br i1 %2198, label %2199, label %2201, !dbg !226012, !prof !229483
      
          ; <label>:2199:                                   ; preds = %2192
            musttail call fastcc void @foo(i8* %2195), !dbg !226012
            ret void, !dbg !229485
      
          ; <label>:2201:                                   ; preds = %2192
            musttail call fastcc void %2197(i8* %2195), !dbg !226012
            ret void, !dbg !229485
      
      Differential Revision: https://reviews.llvm.org/D79258
      911e06f5
    • [llvm][NFC] Inliner: factor cost and reporting out of inlining process · bec4ab95
      Mircea Trofin authored
      Summary:
      This factors cost and reporting out of the inlining workflow, thus
      making it easier to reuse when driving inlining from the upcoming
      InliningAdvisor.
      
      Depends on: D79215
      
      Reviewers: davidxl, echristo
      
      Subscribers: eraman, hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D79275
      bec4ab95
    • bbdfcf8f
      Florian Hahn authored
    • [Attributor][NFC] Encode IRPositions in the bits of a single pointer · 8228153f
      Johannes Doerfert authored
      This reduces memory consumption for IRPositions by eliminating the
      vtable pointer and the `KindOrArgNo` integer. Since each abstract
      attribute has an associated IRPosition, the 12-16 bytes we save add up
      quickly.
      
      No functional change is intended.
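
      A self-contained illustration of the encoding trick (the kind enum and
      bit budget here are made up; the real IRPosition has its own kinds):
      ```
      #include <cassert>
      #include <cstdint>
      enum PosKind : uintptr_t { PK_Function = 0, PK_Returned = 1,
                                 PK_Argument = 2, PK_CallSite = 3 };
      // Pack the kind into the low, alignment-guaranteed-zero bits of a
      // pointer so the whole position fits in one pointer-sized value
      // (no vtable pointer, no extra integer member).
      class PackedPosition {
        uintptr_t Bits;
      public:
        PackedPosition(void *Anchor, PosKind K)
            : Bits(reinterpret_cast<uintptr_t>(Anchor) | uintptr_t(K)) {
          assert((reinterpret_cast<uintptr_t>(Anchor) & 0x3) == 0 &&
                 "anchor must be at least 4-byte aligned");
        }
        void *anchor() const { return reinterpret_cast<void *>(Bits & ~uintptr_t(0x3)); }
        PosKind kind() const { return PosKind(Bits & 0x3); }
      };
      ```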
      
      ---
      
      Single run of the Attributor module and then CGSCC pass (oldPM)
      for SPASS/clause.c (~10k LLVM-IR loc):
      
      Before:
      ```
      calls to allocation functions: 469545 (260135/s)
      temporary memory allocations: 77137 (42735/s)
      peak heap memory consumption: 30.50MB
      peak RSS (including heaptrack overhead): 119.50MB
      total memory leaked: 269.07KB
      ```
      
      After:
      ```
      calls to allocation functions: 468999 (274108/s)
      temporary memory allocations: 77002 (45004/s)
      peak heap memory consumption: 28.83MB
      peak RSS (including heaptrack overhead): 118.05MB
      total memory leaked: 269.07KB
      ```
      
      Difference:
      ```
      calls to allocation functions: -546 (5808/s)
      temporary memory allocations: -135 (1436/s)
      peak heap memory consumption: -1.67MB
      peak RSS (including heaptrack overhead): 0B
      total memory leaked: 0B
      ```
      
      ---
      
      CTMark 15 runs
      
      Metric: compile_time
      
      Program                                        lhs    rhs    diff
       test-suite...:: CTMark/sqlite3/sqlite3.test    25.07  24.09 -3.9%
       test-suite...Mark/mafft/pairlocalalign.test    14.58  14.14 -3.0%
       test-suite...-typeset/consumer-typeset.test    21.78  21.58 -0.9%
       test-suite :: CTMark/SPASS/SPASS.test          21.95  22.03  0.4%
       test-suite :: CTMark/lencod/lencod.test        25.43  25.50  0.3%
       test-suite...ark/tramp3d-v4/tramp3d-v4.test    23.88  23.83 -0.2%
       test-suite...TMark/7zip/7zip-benchmark.test    60.24  60.11 -0.2%
       test-suite :: CTMark/kimwitu++/kc.test         15.69  15.69 -0.0%
       test-suite...:: CTMark/ClamAV/clamscan.test    25.43  25.42 -0.0%
       test-suite :: CTMark/Bullet/bullet.test        37.63  37.62 -0.0%
       Geomean difference                                          -0.8%
      
      ---
      
      Reviewed By: lebedev.ri
      
      Differential Revision: https://reviews.llvm.org/D78722
      8228153f
    • [Attributor][NFC] Let AbstractAttribute be an IRPosition · 6bf16ee4
      Johannes Doerfert authored
      Since every AbstractAttribute so far, and for the foreseeable future,
      corresponds to a single IRPosition we can simplify the class structure.
      We already did this for IRAttribute but there is no reason to stop
      there.
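
      A minimal sketch of the class-structure simplification (hypothetical
      shapes, not the real declarations):
      ```
      struct IRPosition { /* position data */ };
      // Before (roughly): the attribute carried its position as a member.
      struct AbstractAttributeOld { IRPosition Pos; };
      // After: the attribute simply is an IRPosition.
      struct AbstractAttributeNew : IRPosition {};
      ```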
      6bf16ee4