  1. Nov 08, 2019
    • Gil Rapaport's avatar
      [LV] Apply sink-after & interleave-groups as VPlan transformations (NFCI) · 11ed1c02
      Gil Rapaport authored
      This recommits 100e797a (reverted in
      009e0326 for failing an assert). While the
      root cause was independently reverted in eaff3004,
      this commit includes a LIT test to make sure IVDescriptor's SinkAfter logic does
      not try to sink branch instructions.
      11ed1c02
    • Simon Pilgrim's avatar
      BinaryStream - fix static analyzer warnings. NFCI. · ef459ded
      Simon Pilgrim authored
       - uninitialized variables
       - documentation warnings
       - shadow variable names
      ef459ded
    • Djordje Todorovic's avatar
      Reland: [TII] Use optional destination and source pair as a return value; NFC · 8d2ccd1a
      Djordje Todorovic authored
      Refactor usage of isCopyInstrImpl, isCopyInstr and isAddImmediate methods
      to return optional machine operand pair of destination and source
      registers.
      
      Patch by Nikola Prica
      
      Differential Revision: https://reviews.llvm.org/D69622
      8d2ccd1a
    • Russell Gallop's avatar
      [cmake] Enable thin lto cache when building with lld-link · 0a8bd77e
      Russell Gallop authored
      This was already enabled for other platforms; this adds an option for Windows/lld-link.
      
      Differential Revision: https://reviews.llvm.org/D69941
      0a8bd77e
    • Hans Wennborg's avatar
      Revert d91ed80e "[codeview] Reference types in type parent scopes" · ff3b5134
      Hans Wennborg authored
      This triggered asserts in the Chromium build, see https://crbug.com/1022729 for
      details and reproducer.
      
      > Without this change, when a nested tag type of any kind (enum, class,
      > struct, union) is used as a variable type, it is emitted without
      > emitting the parent type. In CodeView, parent types point to their inner
      > types, and inner types do not point back to their parents. We already
      > walk over all of the parent scopes to build the fully qualified name.
      > This change simply requests their type indices as we go along to ensure
      > they are all emitted.
      >
      > Fixes PR43905
      >
      > Reviewers: akhuang, amccarth
      >
      > Differential Revision: https://reviews.llvm.org/D69924
      ff3b5134
    • Sanne Wouda's avatar
      [RAGreedy] Enable -consider-local-interval-cost for AArch64 · f649f24d
      Sanne Wouda authored
      Summary:
      The greedy register allocator occasionally decides to insert a large number of
      unnecessary copies, see below for an example.  The -consider-local-interval-cost
      option (which X86 already enables by default) fixes this.  We enable this option
      for AArch64 only after receiving feedback that this change is not beneficial for
      PowerPC.
      
      We evaluated the impact of this change on compile time, code size and
      performance benchmarks.
      
      This option has a small impact on compile time, measured on CTMark. A 0.1%
      geomean regression on -O1 and -O2, and 0.2% geomean for -O3, with at most 0.5%
      on individual benchmarks.
      
      The effect on both code size and performance on AArch64 for the LLVM test suite
      is nil on the geomean with individual outliers (ignoring short exec_times)
      between:
      
                       best     worst
        size..text     -3.3%    +0.0%
        exec_time      -5.8%    +2.3%
      
      On SPEC CPU® 2017 (compiled for AArch64) there is a minor reduction (-0.2% at
      most) in code size on some benchmarks, with a tiny movement (-0.01%) on the
      geomean.  Neither intrate nor fprate show any change in performance.
      
      This patch makes the following changes.
      
      - For the AArch64 target, enableAdvancedRASplitCost() now returns true.
      
      - Ensures that -consider-local-interval-cost=false can disable the new
        behaviour if necessary.
      
      This matrix multiply example:
      
         $ cat test.c
         long A[8][8];
         long B[8][8];
         long C[8][8];

         void run_test() {
           for (int k = 0; k < 8; k++) {
             for (int i = 0; i < 8; i++) {
               for (int j = 0; j < 8; j++) {
                 C[i][j] += A[i][k] * B[k][j];
               }
             }
           }
         }
      
      results in the following generated code on AArch64:
      
        $ clang --target=aarch64-arm-none-eabi -O3 -S test.c -o -
        [...]
                                              // %for.cond1.preheader
                                              // =>This Inner Loop Header: Depth=1
              add     x14, x11, x9
              str     q0, [sp, #16]           // 16-byte Folded Spill
              ldr     q0, [x14]
              mov     v2.16b, v15.16b
              mov     v15.16b, v14.16b
              mov     v14.16b, v13.16b
              mov     v13.16b, v12.16b
              mov     v12.16b, v11.16b
              mov     v11.16b, v10.16b
              mov     v10.16b, v9.16b
              mov     v9.16b, v8.16b
              mov     v8.16b, v31.16b
              mov     v31.16b, v30.16b
              mov     v30.16b, v29.16b
              mov     v29.16b, v28.16b
              mov     v28.16b, v27.16b
              mov     v27.16b, v26.16b
              mov     v26.16b, v25.16b
              mov     v25.16b, v24.16b
              mov     v24.16b, v23.16b
              mov     v23.16b, v22.16b
              mov     v22.16b, v21.16b
              mov     v21.16b, v20.16b
              mov     v20.16b, v19.16b
              mov     v19.16b, v18.16b
              mov     v18.16b, v17.16b
              mov     v17.16b, v16.16b
              mov     v16.16b, v7.16b
              mov     v7.16b, v6.16b
              mov     v6.16b, v5.16b
              mov     v5.16b, v4.16b
              mov     v4.16b, v3.16b
              mov     v3.16b, v1.16b
              mov     x12, v0.d[1]
              fmov    x15, d0
              ldp     q1, q0, [x14, #16]
              ldur    x1, [x10, #-256]
              ldur    x2, [x10, #-192]
              add     x9, x9, #64             // =64
              mov     x13, v1.d[1]
              fmov    x16, d1
              ldr     q1, [x14, #48]
              mul     x3, x15, x1
              mov     x14, v0.d[1]
              fmov    x17, d0
              mov     x18, v1.d[1]
              fmov    x0, d1
              mov     v1.16b, v3.16b
              mov     v3.16b, v4.16b
              mov     v4.16b, v5.16b
              mov     v5.16b, v6.16b
              mov     v6.16b, v7.16b
              mov     v7.16b, v16.16b
              mov     v16.16b, v17.16b
              mov     v17.16b, v18.16b
              mov     v18.16b, v19.16b
              mov     v19.16b, v20.16b
              mov     v20.16b, v21.16b
              mov     v21.16b, v22.16b
              mov     v22.16b, v23.16b
              mov     v23.16b, v24.16b
              mov     v24.16b, v25.16b
              mov     v25.16b, v26.16b
              mov     v26.16b, v27.16b
              mov     v27.16b, v28.16b
              mov     v28.16b, v29.16b
              mov     v29.16b, v30.16b
              mov     v30.16b, v31.16b
              mov     v31.16b, v8.16b
              mov     v8.16b, v9.16b
              mov     v9.16b, v10.16b
              mov     v10.16b, v11.16b
              mov     v11.16b, v12.16b
              mov     v12.16b, v13.16b
              mov     v13.16b, v14.16b
              mov     v14.16b, v15.16b
              mov     v15.16b, v2.16b
              ldr     q2, [sp]                // 16-byte Folded Reload
              fmov    d0, x3
              mul     x3, x12, x1
        [...]
      
      With -consider-local-interval-cost the same section of code results in the
      following:
      
        $ clang --target=aarch64-arm-none-eabi -mllvm -consider-local-interval-cost -O3 -S test.c -o -
        [...]
        .LBB0_1:                              // %for.cond1.preheader
                                              // =>This Inner Loop Header: Depth=1
              add     x14, x11, x9
              ldp     q0, q1, [x14]
              ldur    x1, [x10, #-256]
              ldur    x2, [x10, #-192]
              add     x9, x9, #64             // =64
              mov     x12, v0.d[1]
              fmov    x15, d0
              mov     x13, v1.d[1]
              fmov    x16, d1
              ldp     q0, q1, [x14, #32]
              mul     x3, x15, x1
              cmp     x9, #512                // =512
              mov     x14, v0.d[1]
              fmov    x17, d0
              fmov    d0, x3
              mul     x3, x12, x1
        [...]
      
      Reviewers: SjoerdMeijer, samparker, dmgreen, qcolombet
      
      Reviewed By: dmgreen
      
      Subscribers: ZhangKang, jsji, wuzish, ppc-slack, lkail, steven.zhang, MatzeB, qcolombet, kristof.beyls, hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D69437
      f649f24d
    • Roger Ferrer's avatar
      [RISCV] Fix evaluation of %pcrel_lo · 41449c58
      Roger Ferrer authored
      The following testcase
      
        function:
        .Lpcrel_label1:
        	auipc	a0, %pcrel_hi(other_function)
        	addi	a1, a0, %pcrel_lo(.Lpcrel_label1)
        	.p2align	2          # Causes a new fragment to be emitted
      
        	.type	other_function,@function
        other_function:
        	ret
      
      exposes an odd behaviour in which only the %pcrel_hi relocation is
      evaluated but not the %pcrel_lo.
      
        $ llvm-mc -triple riscv64 -filetype obj t.s | llvm-objdump  -d -r -
      
        <stdin>:	file format ELF64-riscv
      
        Disassembly of section .text:
        0000000000000000 function:
               0:	17 05 00 00	auipc	a0, 0
               4:	93 05 05 00	mv	a1, a0
        		0000000000000004:  R_RISCV_PCREL_LO12_I	other_function+4
      
        0000000000000008 other_function:
               8:	67 80 00 00	ret
      
      The reason seems to be that in RISCVAsmBackend::shouldForceRelocation we
      only consider the fragment but in RISCVMCExpr::evaluatePCRelLo we
      consider the section. This usually works, but there are cases where the
      section is still the same while the fragment is a different one. In
      that case we end up forcing a %pcrel_lo relocation without any %pcrel_hi.
      
      This patch makes RISCVAsmBackend::shouldForceRelocation use the section,
      if any, to determine if the relocation must be forced or not.
      
      Differential Revision: https://reviews.llvm.org/D60657
      41449c58
    • Daniil Suchkov's avatar
      [NFC][IndVarS] Adjust a comment · 7b9f5401
      Daniil Suchkov authored
      (test commit)
      7b9f5401
    • Roman Lebedev's avatar
      [CR] ConstantRange::sshl_sat(): check signedness of the min/max, not ranges · 72a21ad6
      Roman Lebedev authored
      This was pointed out in review,
      but I forgot to stage this change into the commit itself.
      72a21ad6
    • Roman Lebedev's avatar
      [ConstantRange] Add `ushl_sat()`/`sshl_sat()` methods. · e0ea842b
      Roman Lebedev authored
      Summary:
      To be used in `ConstantRange::shlWithNoOverflow()`,
      may in future be useful for when saturating shift/mul ops are added.
      
      Unlike `ConstantRange::shl()`, these are precise.
      
      Reviewers: nikic, spatel, reames
      
      Reviewed By: nikic
      
      Subscribers: hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D69960
      e0ea842b
    • Yonghong Song's avatar
      [BPF] turn on -mattr=+alu32 for cpu version v3 and later · 6b8baf30
      Yonghong Song authored
      -mattr=+alu32 has shown good performance compared to builds without the attribute.
      Based on discussion at
        https://lore.kernel.org/bpf/1ec37838-966f-ec0b-5223-ca9b6eb0860d@fb.com/T/#t
      cpu version v3 should support -mattr=+alu32.
      This patch enables alu32 if the cpu version is v3, whether specified by the
      user or probed by LLVM.
      
      Differential Revision: https://reviews.llvm.org/D69957
      6b8baf30
    • Nemanja Ivanovic's avatar
      [PowerPC] Option for enabling absolute jumptables with command line · 9af28400
      Nemanja Ivanovic authored
      This option allows the user to specify the use of absolute jumptables instead
      of relative ones, which are the default on most PPC subtargets.
      
      Patch by Kamauu Bridgeman
      
      Differential revision: https://reviews.llvm.org/D69108
      9af28400
    • Shu-Chun Weng's avatar
      [llvm/test] Update test comments · 79367983
      Shu-Chun Weng authored
      79367983
    • Fangrui Song's avatar
    • Craig Topper's avatar
      [InstCombine] Don't transform bitcasts between x86_mmx and v1i64 into insertelement/extractelement · 6749dc34
      Craig Topper authored
      x86_mmx is conceptually a vector already. Don't introduce an extra conversion between it and scalar i64.
      
      I'm using VectorType::isValidElementType which checks for floating point, integer, and pointers to hopefully make this more readable than just blacklisting x86_mmx.
      
      Differential Revision: https://reviews.llvm.org/D69964
      6749dc34
    • Sanjay Patel's avatar
      2f32da3d
  2. Nov 07, 2019