  1. Nov 08, 2019
    • Gil Rapaport's avatar
      Revert "[LV] Apply sink-after & interleave-groups as VPlan transformations (NFCI)" · 9f08ce0d
      Gil Rapaport authored
      This reverts commit 11ed1c02 - causes an assert failure.
      9f08ce0d
    • Nikita Popov's avatar
      Reapply [LVI] Normalize pointer behavior · 885a05f4
      Nikita Popov authored
      Fix cache invalidation by not guarding the dereferenced pointer cache
      erasure by SeenBlocks. SeenBlocks is only populated when actually
      caching a value in the block, which doesn't necessarily have to happen
      just because dereferenced pointers were calculated.
      
      -----
      
      Related to D69686. As noted there, LVI currently behaves differently
      for integer and pointer values: For integers, the block value is always
      valid inside the basic block, while for pointers it is only valid at
      the end of the basic block. I believe the integer behavior is the
      correct one, and CVP relies on it via its getConstantRange() uses.
      
      The reason for the special pointer behavior is that LVI checks whether
      a pointer is dereferenced in a given basic block and marks it as
      non-null in that case. Of course, this information is valid only after
      the dereferencing instruction, or in conservative approximation,
      at the end of the block.
      
      This patch changes the treatment of dereferenceability: Instead of
      including it inside the block value, we treat it as something
      similar to an assume (it essentially is a non-nullness assume) and
      incorporate this information in intersectAssumeOrGuardBlockValueConstantRange()
      if the context instruction is the terminator of the basic block.
      This happens either when determining an edge-value internally in LVI,
      or when a terminator was explicitly passed to getValueAt(). The latter
      case makes this change not fully NFC, because we can now fold
      terminator icmps based on the dereferenceability information in the
      same block. This is the reason why I changed one JumpThreading test
      (it would optimize the condition away without the change).
      
      Of course, we do not want to recompute dereferenceability on each
      intersectAssume call, so we need a new cache for this. The
      dereferenceability analysis requires walking the entire basic block
      and computing underlying objects of all memory operands. This was
      previously done separately for each queried pointer value. In the
      new implementation (both because this makes the caching simpler,
      and because it is faster), I instead only walk the full BB once and
      cache all the dereferenced pointers. So the traversal is now performed
      only once per BB, instead of once per queried pointer value.
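      
      A minimal sketch of the "walk the block once, cache every dereferenced
      pointer" idea described above. ToyInst, ToyBlock and DerefCache are
      hypothetical stand-ins, not LLVM types, and underlying-object
      computation is reduced to a plain name lookup. The unconditional
      eraseBlock() is meant to mirror the invalidation fix mentioned at the
      top of this message, under the same simplifying assumptions.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy stand-ins for an instruction and a basic block (hypothetical types).
struct ToyInst {
  bool AccessesMemory; // true for loads/stores
  std::string Pointer; // name of the pointer the instruction dereferences
};
struct ToyBlock {
  int Id;
  std::vector<ToyInst> Insts;
};

class DerefCache {
  // Block id -> all pointers dereferenced anywhere in that block.
  std::unordered_map<int, std::set<std::string>> Cache;
  int Walks = 0;

public:
  // True if Ptr is dereferenced somewhere in BB, so it may be treated as
  // non-null at the block's terminator.
  bool isDereferencedIn(const ToyBlock &BB, const std::string &Ptr) {
    auto It = Cache.find(BB.Id);
    if (It == Cache.end()) {
      // First query for this block: walk it once and record every pointer.
      ++Walks;
      std::set<std::string> Ptrs;
      for (const ToyInst &I : BB.Insts)
        if (I.AccessesMemory)
          Ptrs.insert(I.Pointer);
      It = Cache.emplace(BB.Id, std::move(Ptrs)).first;
    }
    return It->second.count(Ptr) != 0;
  }

  // Drop the entry unconditionally when the block is erased or modified.
  void eraseBlock(int Id) { Cache.erase(Id); }

  // Number of full block traversals performed so far.
  int numWalks() const { return Walks; }
};
```

      Querying several pointers in the same block triggers a single
      traversal; a second traversal happens only after eraseBlock().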
      
      I think the overall model now makes more sense than before, and there
      will be no more pitfalls due to differing integer/pointer behavior.
      
      Differential Revision: https://reviews.llvm.org/D69914
      885a05f4
    • Jan Korous's avatar
      [clang] Add VFS support for sanitizers' blacklists · 590f279c
      Jan Korous authored
      Differential Revision: https://reviews.llvm.org/D69648
      590f279c
    • Tom Stellard's avatar
      [cmake] Remove LLVM_{BUILD,LINK}_LLVM_DYLIB options on Windows · 3ffbf972
      Tom Stellard authored
      Summary: The options aren't supported so they can be removed.
      
      Reviewers: beanz, smeenai, compnerd
      
      Reviewed By: compnerd
      
      Subscribers: mgorny, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D69877
      3ffbf972
    • evgeny's avatar
      [ThinLTO] Fix bug when importing writeonly variables · 7f92d66f
      evgeny authored
      Patch enables import of write-only variables with non-trivial initializers
      to fix linker errors. Initializers of imported variables are converted to
      'zeroinitializer' to avoid promotion of referenced objects.
      
      Differential revision: https://reviews.llvm.org/D70006
      7f92d66f
    • Tom Stellard's avatar
      [cmake] Remove SVN support from VersionFromVCS.cmake · caad2170
      Tom Stellard authored
      Reviewers: phosek
      
      Subscribers: mgorny, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D69682
      caad2170
    • Kazu Hirata's avatar
      [JumpThreading] Fix a comment typo (NFC) · 9aff5e1c
      Kazu Hirata authored
      Reviewers: kazu
      
      Subscribers: hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D70013
      9aff5e1c
    • Nikita Popov's avatar
      Revert "[LVI] Normalize pointer behavior" · 43ae5f43
      Nikita Popov authored
      This reverts commit 15bc4dc9.
      
      clang-cmake-x86_64-sde-avx512-linux buildbot reported quite a few
      compile-time regressions in test-suite, will investigate.
      43ae5f43
    • Nikita Popov's avatar
      [LVI] Normalize pointer behavior · 15bc4dc9
      Nikita Popov authored
      Related to D69686. As noted there, LVI currently behaves differently
      for integer and pointer values: For integers, the block value is always
      valid inside the basic block, while for pointers it is only valid at
      the end of the basic block. I believe the integer behavior is the
      correct one, and CVP relies on it via its getConstantRange() uses.
      
      The reason for the special pointer behavior is that LVI checks whether
      a pointer is dereferenced in a given basic block and marks it as
      non-null in that case. Of course, this information is valid only after
      the dereferencing instruction, or in conservative approximation,
      at the end of the block.
      
      This patch changes the treatment of dereferenceability: Instead of
      including it inside the block value, we treat it as something
      similar to an assume (it essentially is a non-nullness assume) and
      incorporate this information in intersectAssumeOrGuardBlockValueConstantRange()
      if the context instruction is the terminator of the basic block.
      This happens either when determining an edge-value internally in LVI,
      or when a terminator was explicitly passed to getValueAt(). The latter
      case makes this change not fully NFC, because we can now fold
      terminator icmps based on the dereferenceability information in the
      same block. This is the reason why I changed one JumpThreading test
      (it would optimize the condition away without the change).
      
      Of course, we do not want to recompute dereferenceability on each
      intersectAssume call, so we need a new cache for this. The
      dereferenceability analysis requires walking the entire basic block
      and computing underlying objects of all memory operands. This was
      previously done separately for each queried pointer value. In the
      new implementation (both because this makes the caching simpler,
      and because it is faster), I instead only walk the full BB once and
      cache all the dereferenced pointers. So the traversal is now performed
      only once per BB, instead of once per queried pointer value.
      
      I think the overall model now makes more sense than before, and there
      will be no more pitfalls due to differing integer/pointer behavior.
      
      Differential Revision: https://reviews.llvm.org/D69914
      15bc4dc9
    • Simon Pilgrim's avatar
      CrashRecoveryContextCleanup - fix uninitialized variable warnings. NFCI. · 24d507f4
      Simon Pilgrim authored
      Remove default values from constructor.
      24d507f4
    • Philip Reames's avatar
      [LICM] Support hoisting of dynamic allocas out of loops · 8d22100f
      Philip Reames authored
      This patch implements a correct, but not terribly useful, transform. In particular, if we have a dynamic alloca in a loop which is guaranteed to execute, and provably not captured, we hoist the alloca out of the loop. The capture tracking is needed so that we can prove that each previous stack region dies before the next one is allocated. The transform decreases the amount of stack allocation needed by a linear factor (e.g. the iteration count of the loop).
      
      Now, I really hope no one is actually using dynamic allocas. As such, why this patch?
      
      Well, the actual problem I'm hoping to make progress on is allocation hoisting. There's a large draft patch out for review (https://reviews.llvm.org/D60056), and this patch was the smallest chunk of testable functionality I could come up with which takes a step vaguely in that direction.
      
      Once this is in, it makes motivating the changes to capture tracking mentioned in TODOs testable. After that, I hope to extend this to trivial malloc free regions (i.e. free dominating all loop exits) and allocation functions for GCed languages.
      
      Differential Revision: https://reviews.llvm.org/D69227
      8d22100f
    • Philip Reames's avatar
      [LICM] Hoisting of widenable conditions out of loops · 787dba7a
      Philip Reames authored
      The change itself is straightforward and obvious, but ... there's an existing test checking for exactly the opposite. Both Artur and I think this is simply conservatism in the initial implementation.  If anyone bisects a problem to this, a counterexample will be very interesting.
      
      Differential Revision: https://reviews.llvm.org/D69907
      787dba7a
    • Tim Renouf's avatar
      [CostModel] Fixed isExtractSubvectorMask for undef index off end · 0703db39
      Tim Renouf authored
      ShuffleVectorInst::isExtractSubvectorMask, introduced in
        [CostModel] Add SK_ExtractSubvector handling to getInstructionThroughput (PR39368)
      
      erroneously thought that
      %340 = shufflevector <4 x float> %339, <4 x float> undef, <3 x i32> <i32 2, i32 3, i32 undef>
      
      is a subvector extract, even though it goes off the end of the parent
      vector with the undef index. That then caused an assert in
      BasicTTIImplBase::getExtractSubvectorOverhead.
      
      This commit fixes that, by not considering the above a subvector
      extract.
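      
      A minimal sketch of the bounds check described above, assuming undef
      lanes are encoded as -1 and only the first source vector is used.
      isExtractSubvectorMaskSketch is a hypothetical helper written for
      illustration; it is not the code in ShuffleVectorInst.

```cpp
#include <cassert>
#include <vector>

// Returns true and sets Index if Mask selects a contiguous run of lanes
// that lies entirely inside a source vector of NumSrcElts lanes.
// Undef lanes are -1 and match any position.
bool isExtractSubvectorMaskSketch(const std::vector<int> &Mask,
                                  int NumSrcElts, int &Index) {
  const int NumMaskElts = static_cast<int>(Mask.size());
  // The result must be strictly smaller than the source.
  if (NumSrcElts <= NumMaskElts)
    return false;

  int SubIndex = -1;
  for (int I = 0; I != NumMaskElts; ++I) {
    const int M = Mask[I];
    if (M < 0)
      continue; // undef lane: no constraint on the start offset
    if (M >= NumSrcElts)
      return false; // sketch only handles the first source vector
    const int Offset = M - I;
    if (Offset < 0)
      return false; // run would start before lane 0
    if (SubIndex >= 0 && SubIndex != Offset)
      return false; // defined lanes disagree on the start offset
    SubIndex = Offset;
  }

  // All-undef masks are not treated as extracts, and the run, including
  // trailing undef lanes, must not go off the end of the source vector.
  if (SubIndex < 0 || SubIndex + NumMaskElts > NumSrcElts)
    return false;
  Index = SubIndex;
  return true;
}
```

      With this check the mask <2, 3, undef> over a 4-element source is
      rejected, because a 3-lane run starting at lane 2 would need lane 4.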
      
      Differential Revision: https://reviews.llvm.org/D70005
      
      Change-Id: I87b8b00b24bef19ffc9a1b82ef4eca3b8a246eaf
      0703db39
    • Yi-Hong Lyu's avatar
      [PowerPC] Remove redundant CRSET/CRUNSET in custom lowering of known CR bit spills · a3db9c08
      Yi-Hong Lyu authored
      We lower known CR bit spills (CRSET/CRUNSET) to load and spill the known value
      but forgot to remove the redundant spills.
      
      e.g., This sequence was used to spill a CRUNSET:
          crclr   4*cr5+lt
          mfocrf  r3,4
          rlwinm  r3,r3,20,0,0
          stw     r3,132(r1)
      
      Custom lowering of known CR bit spills lowers it to:
          crxor 4*cr5+lt, 4*cr5+lt, 4*cr5+lt
          li  r3,0
          stw r3,132(r1)
      
      crxor is redundant if there is no use of 4*cr5+lt, so we should remove it.
      
      Differential revision: https://reviews.llvm.org/D67722
      a3db9c08
    • Simon Pilgrim's avatar
      raw_ostream - fix static analyzer warnings. NFCI. · 9ee76ab3
      Simon Pilgrim authored
       - uninitialized variables
       - make BufferKind a scoped enum class
      9ee76ab3
    • Roman Lebedev's avatar
      7dddfa2a
    • Roman Lebedev's avatar
      [ConstantRange] Add umul_sat()/smul_sat() methods · 5a9fd76d
      Roman Lebedev authored
      Summary:
      To be used in `ConstantRange::mulWithNoOverflow()`,
      may in future be useful for when saturating shift/mul ops are added.
      
      These are precise as far as I can tell.
      
      I initially thought I would need `APInt::[us]mul_sat()` for these,
      but it turned out much simpler to do what `ConstantRange::multiply()`
      does - perform multiplication in twice the bitwidth, and then truncate.
      Though here we want saturating signed truncation.
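      
      The "multiply in twice the bitwidth, then truncate with saturation"
      approach can be illustrated on plain scalars. This is only a sketch on
      32-bit integers; smulSat32 and umulSat32 are hypothetical names and
      this is not the ConstantRange implementation, which works on ranges.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Signed saturating multiply: widen to 64 bits, multiply exactly, then
// clamp the result into the signed 32-bit range.
int32_t smulSat32(int32_t A, int32_t B) {
  const int64_t Wide = static_cast<int64_t>(A) * static_cast<int64_t>(B);
  if (Wide > std::numeric_limits<int32_t>::max())
    return std::numeric_limits<int32_t>::max();
  if (Wide < std::numeric_limits<int32_t>::min())
    return std::numeric_limits<int32_t>::min();
  return static_cast<int32_t>(Wide);
}

// Unsigned saturating multiply: same idea, clamping only at the top.
uint32_t umulSat32(uint32_t A, uint32_t B) {
  const uint64_t Wide = static_cast<uint64_t>(A) * static_cast<uint64_t>(B);
  if (Wide > std::numeric_limits<uint32_t>::max())
    return std::numeric_limits<uint32_t>::max();
  return static_cast<uint32_t>(Wide);
}
```

      Because the product of two 32-bit values always fits in 64 bits, the
      wide multiplication itself can never overflow.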
      
      Reviewers: nikic, reames, spatel
      
      Reviewed By: nikic
      
      Subscribers: hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D69994
      5a9fd76d
    • Roman Lebedev's avatar
      [APInt] Add saturating truncation methods · 9ca363d8
      Roman Lebedev authored
      Summary:
      The signed one is needed for implementation of `ConstantRange::smul_sat()`,
      unsigned is for completeness only.
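      
      A rough sketch of what saturating truncation to a narrower width means,
      written on 64-bit scalars. truncSSatSketch and truncUSatSketch are
      hypothetical helpers for illustration and assume 1 <= Bits < 64; they
      are not the APInt methods themselves.

```cpp
#include <cassert>
#include <cstdint>

// Clamp V into the range of a signed integer that is Bits wide.
int64_t truncSSatSketch(int64_t V, unsigned Bits) {
  assert(Bits >= 1 && Bits < 64 && "sketch only supports 1..63 bits");
  const int64_t Max = (int64_t{1} << (Bits - 1)) - 1;
  const int64_t Min = -Max - 1;
  if (V > Max)
    return Max;
  if (V < Min)
    return Min;
  return V;
}

// Clamp V into the range of an unsigned integer that is Bits wide.
uint64_t truncUSatSketch(uint64_t V, unsigned Bits) {
  assert(Bits >= 1 && Bits < 64 && "sketch only supports 1..63 bits");
  const uint64_t Max = (uint64_t{1} << Bits) - 1;
  return V > Max ? Max : V;
}
```

      Values already representable in the narrow width pass through
      unchanged; everything else is pinned to the nearest bound.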
      
      Reviewers: nikic, RKSimon, spatel
      
      Reviewed By: nikic
      
      Subscribers: hiraditya, dexonsmith, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D69993
      9ca363d8
    • Simon Pilgrim's avatar
      OutputStream - fix static analyzer warnings. NFCI. · 43eeaa14
      Simon Pilgrim authored
       - uninitialized variables
       - make getBufferCapacity() const
      43eeaa14
    • Simon Pilgrim's avatar
      b2a1593f
    • Simon Pilgrim's avatar
      483ed646
    • Aditya Kumar's avatar
      [llvm-xray] Add AArch64 to llvm-xray extract · 1d321434
      Aditya Kumar authored
      This required adding support for resolving R_AARCH64_ABS64 relocations to
      get accurate addresses for function names to resolve.
      
      Authored by: ianlevesque (Ian Levesque)
      Reviewers: dberris, phosek, smeenai, tetsuo-cpp
      Differential Revision: https://reviews.llvm.org/D69967
      1d321434
    • LLVM GN Syncbot's avatar
      gn build: Merge 0dc0572b · f96de257
      LLVM GN Syncbot authored
      f96de257
    • Jason Liu's avatar
      [XCOFF][AIX] Differentiate usage of label symbol and csect symbol · 0dc0572b
      Jason Liu authored
      Summary:
      Previously, we used symbols to represent labels and csects interchangeably, and that could be a problem.
      There are cases where we need to add a storage mapping class to the symbol if that symbol is actually the name of a csect, but it is hard to figure out whether that symbol is a label or a csect.
      
      This patch intends to do the following:
          1. Construct a QualName (a name that includes the storage mapping class)
             MCSymbolXCOFF for every MCSectionXCOFF.
          2. Keep a pointer to that QualName inside of MCSectionXCOFF.
          3. Use that QualName whenever we need a symbol that refers to that
             MCSectionXCOFF.
          4. Adapt the snowball effect from the above changes in
             XCOFFObjectWriter.cpp.
      
      Reviewers: xingxue, DiggerLin, sfertile, daltenty, hubert.reinterpretcast
      
      Reviewed By: DiggerLin, daltenty
      
      Subscribers: wuzish, nemanjai, mgorny, hiraditya, kbarton, jsji, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D69633
      0dc0572b
    • Gil Rapaport's avatar
      [LV] Apply sink-after & interleave-groups as VPlan transformations (NFCI) · 11ed1c02
      Gil Rapaport authored
      This recommits 100e797a (reverted in
      009e0326 for failing an assert). While the
      root cause was independently reverted in eaff3004,
      this commit includes a LIT test to make sure IVDescriptor's SinkAfter logic does not
      try to sink branch instructions.
      11ed1c02
    • Simon Pilgrim's avatar
      BinaryStream - fix static analyzer warnings. NFCI. · ef459ded
      Simon Pilgrim authored
       - uninitialized variables
       - documentation warnings
       - shadow variable names
      ef459ded
    • Djordje Todorovic's avatar
      Reland: [TII] Use optional destination and source pair as a return value; NFC · 8d2ccd1a
      Djordje Todorovic authored
      Refactor usage of isCopyInstrImpl, isCopyInstr and isAddImmediate methods
      to return optional machine operand pair of destination and source
      registers.
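      
      The shape of such an interface can be sketched as follows. This
      assumes C++17 and uses std::optional as a stand-in for whatever
      optional type the patch uses; ToyOperand, ToyInstr,
      DestSourcePairSketch and isCopyInstrSketch are hypothetical names,
      not the TargetInstrInfo API.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Toy stand-ins for a machine operand and a machine instruction.
struct ToyOperand {
  unsigned Reg;
};
struct ToyInstr {
  std::string Opcode;
  ToyOperand Dst;
  ToyOperand Src;
};

// Destination and source returned together, so callers no longer need
// two out-parameters plus a separate bool result.
struct DestSourcePairSketch {
  const ToyOperand *Destination;
  const ToyOperand *Source;
};

// Returns the (destination, source) pair if MI is a plain copy, or
// std::nullopt otherwise.
std::optional<DestSourcePairSketch> isCopyInstrSketch(const ToyInstr &MI) {
  if (MI.Opcode != "COPY")
    return std::nullopt;
  return DestSourcePairSketch{&MI.Dst, &MI.Src};
}
```

      A caller can then test and unpack the result in one step, for example
      with `if (auto Pair = isCopyInstrSketch(MI))`.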
      
      Patch by Nikola Prica
      
      Differential Revision: https://reviews.llvm.org/D69622
      8d2ccd1a
    • Russell Gallop's avatar
      [cmake] Enable thin lto cache when building with lld-link · 0a8bd77e
      Russell Gallop authored
      This was enabled for other platforms. Added option for Windows/lld-link.
      
      Differential Revision: https://reviews.llvm.org/D69941
      0a8bd77e
    • Hans Wennborg's avatar
      Revert d91ed80e "[codeview] Reference types in type parent scopes" · ff3b5134
      Hans Wennborg authored
      This triggered asserts in the Chromium build, see https://crbug.com/1022729 for
      details and reproducer.
      
      > Without this change, when a nested tag type of any kind (enum, class,
      > struct, union) is used as a variable type, it is emitted without
      > emitting the parent type. In CodeView, parent types point to their inner
      > types, and inner types do not point back to their parents. We already
      > walk over all of the parent scopes to build the fully qualified name.
      > This change simply requests their type indices as we go along to ensure
      > they are all emitted.
      >
      > Fixes PR43905
      >
      > Reviewers: akhuang, amccarth
      >
      > Differential Revision: https://reviews.llvm.org/D69924
      ff3b5134
    • Sanne Wouda's avatar
      [RAGreedy] Enable -consider-local-interval-cost for AArch64 · f649f24d
      Sanne Wouda authored
      Summary:
      The greedy register allocator occasionally decides to insert a large number of
      unnecessary copies, see below for an example.  The -consider-local-interval-cost
      option (which X86 already enables by default) fixes this.  We enable this option
      for AArch64 only after receiving feedback that this change is not beneficial for
      PowerPC.
      
      We evaluated the impact of this change on compile time, code size and
      performance benchmarks.
      
      This option has a small impact on compile time, measured on CTMark. A 0.1%
      geomean regression on -O1 and -O2, and 0.2% geomean for -O3, with at most 0.5%
      on individual benchmarks.
      
      The effect on both code size and performance on AArch64 for the LLVM test suite
      is nil on the geomean with individual outliers (ignoring short exec_times)
      between:
      
                       best     worst
        size..text     -3.3%    +0.0%
        exec_time      -5.8%    +2.3%
      
      On SPEC CPU® 2017 (compiled for AArch64) there is a minor reduction (-0.2% at
      most) in code size on some benchmarks, with a tiny movement (-0.01%) on the
      geomean.  Neither intrate nor fprate show any change in performance.
      
      This patch makes the following changes.
      
      - For the AArch64 target, enableAdvancedRASplitCost() now returns true.
      
      - Ensures that -consider-local-interval-cost=false can disable the new
        behaviour if necessary.
      
      This matrix multiply example:
      
         $ cat test.c
         long A[8][8];
         long B[8][8];
         long C[8][8];
      
         void run_test() {
           for (int k = 0; k < 8; k++) {
             for (int i = 0; i < 8; i++) {
               for (int j = 0; j < 8; j++) {
                 C[i][j] += A[i][k] * B[k][j];
               }
             }
           }
         }
      
      results in the following generated code on AArch64:
      
        $ clang --target=aarch64-arm-none-eabi -O3 -S test.c -o -
        [...]
                                              // %for.cond1.preheader
                                              // =>This Inner Loop Header: Depth=1
              add     x14, x11, x9
              str     q0, [sp, #16]           // 16-byte Folded Spill
              ldr     q0, [x14]
              mov     v2.16b, v15.16b
              mov     v15.16b, v14.16b
              mov     v14.16b, v13.16b
              mov     v13.16b, v12.16b
              mov     v12.16b, v11.16b
              mov     v11.16b, v10.16b
              mov     v10.16b, v9.16b
              mov     v9.16b, v8.16b
              mov     v8.16b, v31.16b
              mov     v31.16b, v30.16b
              mov     v30.16b, v29.16b
              mov     v29.16b, v28.16b
              mov     v28.16b, v27.16b
              mov     v27.16b, v26.16b
              mov     v26.16b, v25.16b
              mov     v25.16b, v24.16b
              mov     v24.16b, v23.16b
              mov     v23.16b, v22.16b
              mov     v22.16b, v21.16b
              mov     v21.16b, v20.16b
              mov     v20.16b, v19.16b
              mov     v19.16b, v18.16b
              mov     v18.16b, v17.16b
              mov     v17.16b, v16.16b
              mov     v16.16b, v7.16b
              mov     v7.16b, v6.16b
              mov     v6.16b, v5.16b
              mov     v5.16b, v4.16b
              mov     v4.16b, v3.16b
              mov     v3.16b, v1.16b
              mov     x12, v0.d[1]
              fmov    x15, d0
              ldp     q1, q0, [x14, #16]
              ldur    x1, [x10, #-256]
              ldur    x2, [x10, #-192]
              add     x9, x9, #64             // =64
              mov     x13, v1.d[1]
              fmov    x16, d1
              ldr     q1, [x14, #48]
              mul     x3, x15, x1
              mov     x14, v0.d[1]
              fmov    x17, d0
              mov     x18, v1.d[1]
              fmov    x0, d1
              mov     v1.16b, v3.16b
              mov     v3.16b, v4.16b
              mov     v4.16b, v5.16b
              mov     v5.16b, v6.16b
              mov     v6.16b, v7.16b
              mov     v7.16b, v16.16b
              mov     v16.16b, v17.16b
              mov     v17.16b, v18.16b
              mov     v18.16b, v19.16b
              mov     v19.16b, v20.16b
              mov     v20.16b, v21.16b
              mov     v21.16b, v22.16b
              mov     v22.16b, v23.16b
              mov     v23.16b, v24.16b
              mov     v24.16b, v25.16b
              mov     v25.16b, v26.16b
              mov     v26.16b, v27.16b
              mov     v27.16b, v28.16b
              mov     v28.16b, v29.16b
              mov     v29.16b, v30.16b
              mov     v30.16b, v31.16b
              mov     v31.16b, v8.16b
              mov     v8.16b, v9.16b
              mov     v9.16b, v10.16b
              mov     v10.16b, v11.16b
              mov     v11.16b, v12.16b
              mov     v12.16b, v13.16b
              mov     v13.16b, v14.16b
              mov     v14.16b, v15.16b
              mov     v15.16b, v2.16b
              ldr     q2, [sp]                // 16-byte Folded Reload
              fmov    d0, x3
              mul     x3, x12, x1
        [...]
      
      With -consider-local-interval-cost the same section of code results in the
      following:
      
        $ clang --target=aarch64-arm-none-eabi -mllvm -consider-local-interval-cost -O3 -S test.c -o -
        [...]
        .LBB0_1:                              // %for.cond1.preheader
                                              // =>This Inner Loop Header: Depth=1
              add     x14, x11, x9
              ldp     q0, q1, [x14]
              ldur    x1, [x10, #-256]
              ldur    x2, [x10, #-192]
              add     x9, x9, #64             // =64
              mov     x12, v0.d[1]
              fmov    x15, d0
              mov     x13, v1.d[1]
              fmov    x16, d1
              ldp     q0, q1, [x14, #32]
              mul     x3, x15, x1
              cmp     x9, #512                // =512
              mov     x14, v0.d[1]
              fmov    x17, d0
              fmov    d0, x3
              mul     x3, x12, x1
        [...]
      
      Reviewers: SjoerdMeijer, samparker, dmgreen, qcolombet
      
      Reviewed By: dmgreen
      
      Subscribers: ZhangKang, jsji, wuzish, ppc-slack, lkail, steven.zhang, MatzeB, qcolombet, kristof.beyls, hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D69437
      f649f24d
    • Roger Ferrer's avatar
      [RISCV] Fix evaluation of %pcrel_lo · 41449c58
      Roger Ferrer authored
      The following testcase
      
        function:
        .Lpcrel_label1:
        	auipc	a0, %pcrel_hi(other_function)
        	addi	a1, a0, %pcrel_lo(.Lpcrel_label1)
        	.p2align	2          # Causes a new fragment to be emitted
      
        	.type	other_function,@function
        other_function:
        	ret
      
      exposes an odd behaviour in which only the %pcrel_hi relocation is
      evaluated but not the %pcrel_lo.
      
        $ llvm-mc -triple riscv64 -filetype obj t.s | llvm-objdump  -d -r -
      
        <stdin>:	file format ELF64-riscv
      
        Disassembly of section .text:
        0000000000000000 function:
               0:	17 05 00 00	auipc	a0, 0
               4:	93 05 05 00	mv	a1, a0
        		0000000000000004:  R_RISCV_PCREL_LO12_I	other_function+4
      
        0000000000000008 other_function:
               8:	67 80 00 00	ret
      
      The reason seems to be that in RISCVAsmBackend::shouldForceRelocation we
      only consider the fragment but in RISCVMCExpr::evaluatePCRelLo we
      consider the section. This usually works but there are cases where the
      section may still be the same but the fragment may be another one. In
      that case we end up forcing a %pcrel_lo relocation without any %pcrel_hi.
      
      This patch makes RISCVAsmBackend::shouldForceRelocation use the section,
      if any, to determine if the relocation must be forced or not.
      
      Differential Revision: https://reviews.llvm.org/D60657
      41449c58
    • Daniil Suchkov's avatar
      [NFC][IndVarS] Adjust a comment · 7b9f5401
      Daniil Suchkov authored
      (test commit)
      7b9f5401