  1. Nov 24, 2018
  2. Nov 23, 2018
    • [InstCombine] Simplify funnel shift with zero/undef operand to shift · 6e81d421
      Nikita Popov authored
      The following simplifications are implemented:
      
       * `fshl(X, 0, C) -> shl X, C%BW`
       * `fshl(X, undef, C) -> shl X, C%BW` (assuming undef = 0)
       * `fshl(0, X, C) -> lshr X, BW-C%BW`
       * `fshl(undef, X, C) -> lshr X, BW-C%BW` (assuming undef = 0)
       * `fshr(X, 0, C) -> shl X, BW-C%BW`
       * `fshr(X, undef, C) -> shl X, BW-C%BW` (assuming undef = 0)
       * `fshr(0, X, C) -> lshr X, C%BW`
       * `fshr(undef, X, C) -> lshr X, C%BW` (assuming undef = 0)
      
      The simplification is only performed if the shift amount C is constant,
      because we can explicitly compute C%BW and BW-C%BW in this case.
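
The identities listed above can be checked exhaustively on a small bit width. The following is a minimal Python sketch, not the InstCombine code: the helper names and the 8-bit width are assumptions chosen for illustration. The two folds that shift by `BW-C%BW` are only checked when `C%BW` is non-zero, since a shift by the full bit width is out of range in LLVM IR.

```python
BW = 8                      # assumed bit width for the exhaustive check
MASK = (1 << BW) - 1

def fshl(a, b, c):
    """Funnel shift left: concatenate a:b, shift left by c % BW, keep the high BW bits."""
    s = c % BW
    return ((((a << BW) | b) << s) >> BW) & MASK

def fshr(a, b, c):
    """Funnel shift right: concatenate a:b, shift right by c % BW, keep the low BW bits."""
    s = c % BW
    return (((a << BW) | b) >> s) & MASK

def shl(x, n):
    """Logical shift left, truncated to BW bits."""
    return (x << n) & MASK

def lshr(x, n):
    """Logical shift right on an unsigned BW-bit value."""
    return x >> n

def check_identities():
    """Compare each fold against the funnel-shift model for all x and several c."""
    for x in range(1 << BW):
        for c in range(3 * BW):
            s = c % BW
            assert fshl(x, 0, c) == shl(x, s)
            assert fshr(0, x, c) == lshr(x, s)
            if s != 0:  # BW - s would equal BW otherwise, an out-of-range shift
                assert fshl(0, x, c) == lshr(x, BW - s)
                assert fshr(x, 0, c) == shl(x, BW - s)
    return True
```

Calling `check_identities()` walks every 8-bit value with shift amounts below, equal to, and above the bit width, which covers the `C%BW` reduction the commit message relies on.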
      
      Differential Revision: https://reviews.llvm.org/D54778
      
      llvm-svn: 347505
    • [TableGen] Emit more variant transitions · 079bf4b7
      Evandro Menezes authored
      `llvm-mca` relies on the predicates to be based on `MCSchedPredicate` in order
      to resolve the scheduling for variant instructions.  Otherwise, it aborts
      the building of the instruction model early.
      
      However, the scheduling model emitter in `TableGen` gives up too soon, unless
      all processors use only such predicates.
      
      In order to allow more processors to be used with `llvm-mca`, this patch
      emits scheduling transitions if any processor uses these predicates.  The
      transition emitted for the processors using legacy predicates is the one
      specified with `NoSchedPred`, which is based on `MCSchedPredicate`.
      
      Preferably, `llvm-mca` should instead assume a reasonable default when a
      variant transition is not based on `MCSchedPredicate` for a given processor.
      This issue should be revisited in the future.
      
      Differential revision: https://reviews.llvm.org/D54648
      
      llvm-svn: 347504
    • [llvm-mca] Refactor some of the logic in InstrBuilder, and add a verifyOperands method. · 7e32cc83
      Andrea Di Biagio authored
      With this change, InstrBuilder emits an error if the MCInst sequence contains an
      instruction with a variadic opcode, and a non-zero number of variadic operands.
      
      Currently we don't know how to correctly analyze variadic opcodes. The problem
      with variadic operands is that there is no information for them in the opcode
      descriptor (i.e. MCInstrDesc). That means, we don't know which variadic operands
      are defs, and which are uses.
      
      In the future, we could conservatively assume that any extra register
      operand is both a register use and a register definition.
      
      This patch fixes a subtle bug in the evaluation of read/write operands for ARM
      VLD1 with implicit index update. Added test vld1-index-update.s
      
      llvm-svn: 347503
    • [DAG] consolidate shift simplifications · 7e119c04
      Sanjay Patel authored
      ...and use them to avoid creating obviously undef values as
      discussed in the post-commit thread for r347478.
      
      The diffs in vector div/rem show that we were missing real
      optimizations by creating bogus shift nodes.
      
      llvm-svn: 347502
    • [x86] make test immune to oversized shift simplification · e0cc8763
      Sanjay Patel authored
      I'm not sure if this actually preserves the original intent
      of this test, but if we leave it as-is, the -1 (oversized)
      shift should be folded to undef and allow deleting half
      of the output.
      
      llvm-svn: 347501
    • Revert r347490 as it breaks address sanitizer builds · 6db3a6a4
      Luke Cheeseman authored
      llvm-svn: 347499
    • [ARM][AsmParser] Improve debug printing of parsed asm operands · 173bc2bb
      Oliver Stannard authored
      In ARMOperand::print:
      - Print human-readable register names, instead of numbers.
      - Print the correct names for IT condition masks (these were in the wrong order
        before).
      - Print all parts of memory operands, not just the base register.
      
      This makes the output of llvm-mc -show-inst-operands more readable.
      
      Differential revision: https://reviews.llvm.org/D54850
      
      llvm-svn: 347494
    • [llvm-mca][View] Improved Retire Control Unit Statistics. · 07a8255a
      Andrea Di Biagio authored
      RetireControlUnitStatistics now reports extra information about the ROB and the
      avg/maximum number of entries consumed over the entire simulation.
      
      Example:
        Retire Control Unit - number of cycles where we saw N instructions retired:
        [# retired], [# cycles]
         0,           109  (17.9%)
         1,           102  (16.7%)
         2,           399  (65.4%)
      
        Total ROB Entries:                64
        Max Used ROB Entries:             35  ( 54.7% )
        Average Used ROB Entries per cy:  32  ( 50.0% )
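
The percentages in the example report follow directly from the raw counts. The sketch below recomputes them in Python; the function and variable names are assumptions for illustration and are not taken from llvm-mca.

```python
def percent(part, total):
    """Return part/total as a percentage rounded to one decimal place."""
    return round(100.0 * part / total, 1)

# Cycle counts from the example report, keyed by instructions retired per cycle.
cycles_by_retired = {0: 109, 1: 102, 2: 399}
total_cycles = sum(cycles_by_retired.values())

# Share of simulated cycles in which N instructions retired.
shares = {n: percent(c, total_cycles) for n, c in cycles_by_retired.items()}

# ROB usage relative to the 64 total entries in the example.
rob_max = percent(35, 64)
rob_avg = percent(32, 64)
```

The three cycle counts sum to 610, so each share in the first table is a fraction of 610 cycles, while the ROB figures are fractions of the 64 available entries.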
      
      Documentation in llvm/docs/CommandGuide/llvm-mca.rst has been updated to
      reflect this change.
      
      llvm-svn: 347493
    • Attempt to fix buildbot after r347489 · 972e3480
      Eugene Leviant authored
      llvm-svn: 347492
    • Revert r343341 · d6dbd641
      Luke Cheeseman authored
      - Cannot reproduce the build failure locally and the build logs have
        been deleted.
      
      llvm-svn: 347490
    • [ThinLTO] Assembly representation of ReadOnly attribute · 009d833a
      Eugene Leviant authored
      Differential revision: https://reviews.llvm.org/D54754
      
      llvm-svn: 347489
    • [ARM][NFC] codegen tests cleanup: remove dangling check prefixes · fc448cfd
      Sjoerd Meijer authored
      I am working on making FileCheck stricter (in D54769 and D53710) so that it
      issues diagnostics when there's something wrong with tests.
      
      This is a cleanup for dangling prefixes in the ARM codegen tests, e.g.:
      
      --check-prefixes=A,B
      
      where A occurs in the check file, but B doesn't. This can be innocent if A does
      all the required checking, but can also be a bug in that test if it results in
      the test actually not checking anything (if A for example only checks a common
      label). Test CodeGen/ARM/smml.ll is such an example.
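
The kind of dangling prefix described above can be detected with a simple scan. This is a rough Python sketch under stated assumptions: it is not FileCheck's implementation or the check proposed in D54769/D53710, the function names are hypothetical, and it only recognizes a fixed set of directive suffixes.

```python
import re

# Directive suffixes this sketch recognizes after a prefix (an assumed subset).
SUFFIXES = r"(?:-(?:NEXT|SAME|NOT|DAG|LABEL|EMPTY))?"

def declared_prefixes(text):
    """Collect prefixes named via -check-prefix / --check-prefixes on RUN lines."""
    prefixes = set()
    for line in text.splitlines():
        if "RUN:" not in line:
            continue
        for m in re.finditer(r"-check-prefix(?:es)?[= ]([\w,-]+)", line):
            prefixes.update(p for p in m.group(1).split(",") if p)
    return prefixes

def dangling_prefixes(text):
    """Return declared prefixes that no check directive in the file uses."""
    body = "\n".join(l for l in text.splitlines() if "RUN:" not in l)
    return sorted(
        p for p in declared_prefixes(text)
        if not re.search(r"(?<![\w-])" + re.escape(p) + SUFFIXES + ":", body)
    )

EXAMPLE = """\
; RUN: llc < %s | FileCheck %s --check-prefixes=A,B
; A-LABEL: f:
define void @f() { ret void }
"""
```

For `EXAMPLE`, prefix `A` only checks a label and `B` is never used, which is the situation the commit message warns about.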
      
      Differential Revision: https://reviews.llvm.org/D54842
      
      llvm-svn: 347487
    • Disable LoopSimplifyCFG terminator folding by default · e1c2dc27
      Max Kazantsev authored
      llvm-svn: 347486
    • [LoopSimplifyCFG] Don't delete LCSSA Phis · cb8e2403
      Max Kazantsev authored
      When removing edges, we also update Phi inputs and may end up removing
      a Phi if it has only one input. We should not do it for edges that leave the current
      loop because these Phis are LCSSA Phis and need to be preserved.
      
      Thanks @dmgreen for finding this!
      
      Differential Revision: https://reviews.llvm.org/D54841
      
      llvm-svn: 347484
    • [NFC] Add verification flags to tests · a10c1c74
      Max Kazantsev authored
      llvm-svn: 347483
    • [LegalizeVectorTypes] Don't use SplitVecOp_TruncateHelper if we're heading... · 0ec17884
      Craig Topper authored
      [LegalizeVectorTypes] Don't use SplitVecOp_TruncateHelper if we're heading towards scalarizing the type.
      
      This code takes a truncate, fp_to_int, or int_to_fp with a legal result type and an input type that needs to be split and enlarges the elements in the result type before doing the split. It then inserts a follow-up truncate or fp_round after concatenating the two halves back together.
      
      But if the input type of the original op is being split on its way to ultimately being scalarized we're just going to end up building a vector from scalars and then truncating or rounding it in the vector register. Seems kind of silly to enlarge the result element type of the operation only to end up with scalar code and then building a vector with large elements only to make the elements smaller again in the vector register. Seems better to just try to get away producing smaller result types in the scalarized code.
      
      The X86 test case that changes is a pretty contrived test case that exists because of a bug we used to have in our AVG matching code. I think the code is better now, but it's not realistic anyway.
      
      llvm-svn: 347482
    • [Object] Also treat STB_GNU_UNIQUE symbols as exported to other DSO · 32ebd731
      Fangrui Song authored
      All of STB_GLOBAL/STB_WEAK/STB_GNU_UNIQUE are treated as export symbols, see:
      
      glibc/elf/dl-lookup.c:do_lookup_x
      musl/ldso/dynlink.c OK_BINDS
      
      Though ld.so does not read binding, the currently used STV_DEFAULT or STV_PROTECTED is a good emulation of linker behavior.
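
As a rough illustration of the rule described above, the Python sketch below classifies a symbol from its ELF `st_info` and `st_other` fields. The numeric constants are the standard ELF values; the function name and its exact shape are assumptions for illustration, not the code in LLVM's Object library.

```python
# Standard ELF symbol binding and visibility values.
STB_LOCAL, STB_GLOBAL, STB_WEAK, STB_GNU_UNIQUE = 0, 1, 2, 10
STV_DEFAULT, STV_HIDDEN, STV_PROTECTED = 0, 2, 3

def is_exported_to_other_dso(st_info, st_other):
    """Hypothetical predicate: exported binding combined with exported visibility."""
    binding = st_info >> 4          # upper four bits of st_info hold the binding
    visibility = st_other & 0x3     # lower two bits of st_other hold the visibility
    return (binding in (STB_GLOBAL, STB_WEAK, STB_GNU_UNIQUE)
            and visibility in (STV_DEFAULT, STV_PROTECTED))
```

With this predicate, a `STB_GNU_UNIQUE` symbol with default visibility is treated the same way as a global or weak one, while local or hidden symbols are not.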
      
      llvm-svn: 347481
  3. Nov 22, 2018
    • [LegalizeVectorTypes] Have SplitVecOp_TruncateHelper fall back to... · b2397633
      Craig Topper authored
      [LegalizeVectorTypes] Have SplitVecOp_TruncateHelper fall back to SplitVecOp_UnaryOp if splitting the output type would be a legal type.
      
      SplitVecOp_TruncateHelper tries to introduce a multilevel truncate to avoid scalarization. But if splitting the result type would still be a legal type we don't need to do that.
      
      The comment block at the top of the function implied that this was already implemented. I looked back through the history and it doesn't look to have ever been checked.
      
      llvm-svn: 347479
    • [DAGCombiner] form 'not' ops ahead of shifts (PR39657) · 3e800192
      Sanjay Patel authored
      We fail to canonicalize IR this way (prefer 'not' ops to arbitrary 'xor'),
      but that would not matter without this patch because DAGCombiner was 
      reversing that transform. I think we need this transform in the backend 
      regardless of what happens in IR to catch cases where the shift-xor 
      is formed late from GEP or other ops.
      
      https://rise4fun.com/Alive/NC1
      
        Name: shl
        Pre: (-1 << C2) == C1
        %shl = shl i8 %x, C2
        %r = xor i8 %shl, C1
        =>
        %not = xor i8 %x, -1
        %r = shl i8 %not, C2
        
        Name: shr
        Pre: (-1 u>> C2) == C1
        %sh = lshr i8 %x, C2
        %r = xor i8 %sh, C1
        =>
        %not = xor i8 %x, -1
        %r = lshr i8 %not, C2
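
The two Alive transforms above can also be confirmed by brute force over all `i8` values. The following Python sketch is an independent check, not the Alive proof and not the DAGCombiner code; the function names are assumptions for illustration.

```python
MASK = 0xFF  # i8, as in the Alive examples

def check_shl(c2):
    """xor(shl(x, C2), C1) == shl(not(x), C2) whenever C1 == (-1 << C2)."""
    c1 = (MASK << c2) & MASK
    return all(
        (((x << c2) & MASK) ^ c1) == (((x ^ MASK) << c2) & MASK)
        for x in range(256)
    )

def check_lshr(c2):
    """xor(lshr(x, C2), C1) == lshr(not(x), C2) whenever C1 == (-1 u>> C2)."""
    c1 = MASK >> c2
    return all(
        ((x >> c2) ^ c1) == ((x ^ MASK) >> c2)
        for x in range(256)
    )

def check_all():
    """Run both checks for every in-range shift amount of an 8-bit value."""
    return all(check_shl(c) and check_lshr(c) for c in range(8))
```

Both checks rely on the same observation: xor-ing with the shifted all-ones mask flips exactly the bits that survive the shift, which is what shifting the inverted value does.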
      
      https://bugs.llvm.org/show_bug.cgi?id=39657
      
      llvm-svn: 347478
    • Reland test/MC/Mips/reloc-directive-label-offset.s · b2c4d668
      Vladimir Stefanovic authored
      The test was reverted because it failed on
      llvm-clang-x86_64-expensive-checks-win builder, and that was because
      -DEXPENSIVE_CHECKS adds randomness to llvm::sort(), affecting the order of
      relocation table entries.
      Modified the test to not have two relocations at the same offset.
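
The failure mode described above can be illustrated with a small Python model: if the input is reordered before a sort that compares only offsets, entries with equal offsets may come out in either order. The entry names below are placeholders, and the helper is an assumption for illustration rather than anything from LLVM.

```python
from itertools import permutations

def possible_orders(entries, key):
    """All results of sorting by `key` after an arbitrary reordering of `entries`."""
    return {tuple(sorted(p, key=key)) for p in permutations(entries)}

def by_offset(entry):
    """Sort key that looks at the offset only, ignoring the rest of the entry."""
    return entry[0]

# Hypothetical relocation entries as (offset, name) pairs.
clashing = [(0, "a"), (4, "b"), (4, "c")]   # two entries share offset 4
distinct = [(0, "a"), (4, "b"), (8, "c")]   # every offset is unique
```

`possible_orders(clashing, by_offset)` contains two different sequences, while `possible_orders(distinct, by_offset)` contains exactly one, which is why giving each relocation its own offset makes the test output deterministic.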
      
      llvm-svn: 347476
    • [llvm-mca] LSUnit: use a SmallSet to model load/store queues. NFCI · 840f0326
      Andrea Di Biagio authored
      Also, try to minimize the number of queries to the memory queues to speed up the
      analysis.
      
      On average, this change gives a small 2% speedup. For memcpy-like kernels, the
      speedup is up to 5.5%.
      
      llvm-svn: 347469
    • [llvm-mca] Use a SmallVector instead of std::vector to track register reads/writes. NFCI · 13e1d207
      Andrea Di Biagio authored
      This avoids a heap allocation most of the time.
      This patch gives a small but consistent 3% speedup on a release build (up to ~5%
      on a debug build).
      
      llvm-svn: 347464
    • [llvm-mca] Fix an invalid memory read introduced by r346487. · 1cb8a3c6
      Andrea Di Biagio authored
      This patch fixes an invalid memory read introduced by r346487.
      Before this patch, partial register write had to query the latency of the
      dependent full register write by calling a method on the full write descriptor.
      However, if the full write is from an already retired instruction, chances are
      that the EntryStage already reclaimed its memory.
      In some partial register write tests, valgrind was reporting an invalid
      memory read.
      
      This change fixes the invalid memory access problem. Writes are now responsible
      for tracking dependent partial register writes, and notify them when the
      instruction is issued.
      That means, partial register writes no longer need to query their associated
      full write to check when they are ready to execute.
      
      Added test X86/BtVer2/partial-reg-update-7.s
      
      llvm-svn: 347459
    • [NFC] Assert that all blocks staying in loop are live · b565e609
      Max Kazantsev authored
      llvm-svn: 347458
    • [NFC] Ensure deterministic order of dead exit blocks · 56a24430
      Max Kazantsev authored
      llvm-svn: 347457
    • [AArch64] Fix SelectionDAG infinite loop for v1i64 SCALAR_TO_VECTOR · d6e0ebea
      John Brawn authored
      A consequence of r347274 is that SCALAR_TO_VECTOR can be converted into
      BUILD_VECTOR by SimplifyDemandedBits, but LowerBUILD_VECTOR can turn
      BUILD_VECTOR into SCALAR_TO_VECTOR so we get an infinite loop.
      
      Fix this by making LowerBUILD_VECTOR not do this transformation for those
      vectors that would get transformed back, i.e. BUILD_VECTOR of a single-element
      constant vector. Doing that means we get a DUP, which we then need to recognise
      in ISel as a copy.
      
      llvm-svn: 347456
    • [NFC] Simplify code by using standard exit blocks collection · d9f59f8c
      Max Kazantsev authored
      llvm-svn: 347454
    • [TI removal] Leverage the fact that TerminatorInst is gone to create · e429c794
      Chandler Carruth authored
      a normal base class that provides all common "call" functionality.
      
      This merges two complex CRTP mixins for the common "call" logic and
      common operand bundle logic into a single, normal base class of
      `CallInst` and `InvokeInst`. Going forward, users can typically
      `dyn_cast<CallBase>` and use the resulting API. No more need for the
      `CallSite` wrapper. I'm planning to migrate current usage of the wrapper
      to directly use the base class and then it can be removed, but those are
      simpler and much more incremental steps. The big change is to introduce
      this abstraction into the type system.
      
      I've tried to do some basic simplifications of the APIs that I couldn't
      really help but touch as part of this:
      - I've tried to organize the attribute API and bundle API into groups to
        make understanding the API of `CallBase` easier. Without this,
        I wasn't able to navigate the API sanely for all of the ways I needed
        to modify it.
      - I've added what seem like more clear and consistent APIs for getting
        at the called operand. These ended up being especially useful to
        consolidate the *numerous* duplicated code paths trying to do this.
      - I've largely reworked the organization and implementation of the APIs
        for computing the argument operands as they needed to change to work
        with the new subclass approach.
      
      To minimize any cost associated with this abstraction, I've moved the
      operand layout in memory to store the called operand last. This makes
      its position relative to the end of the operand array the same,
      regardless of the subclass. It should make it much cheaper to reference
      from the `CallBase` abstraction, and this is likely one of the most
      frequent things to query.
      
      We do still pay one abstraction penalty here: we have to branch to
      determine whether there are 0 or 2 extra operands when computing the end
      of the argument operand sequence. However, that seems both rare and
      should optimize well. I've implemented this in a way specifically
      designed to allow it to optimize fairly well. If this shows up in
      profiles, we can add overrides of the relevant methods to the subclasses
      that bypass this penalty. It seems very unlikely that this will be an
      issue as the code was *already* dealing with an ever present abstraction
      of whether or not there are operand bundles, so this isn't the first
      branch to go into the computation.
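
The layout idea in the two paragraphs above can be modelled in a few lines of Python. This is a simplified sketch, not the `CallBase` implementation: it ignores operand bundles, represents operands as strings, and assumes the two extra operands are the destinations carried by an invoke.

```python
def called_operand(operands):
    """The callee sits last, so its position relative to the end never changes."""
    return operands[-1]

def argument_operands(operands, is_invoke):
    """Return the argument operands, skipping the callee and any extra operands."""
    # The single branch discussed above: 2 extra operands for an invoke, 0 for a call.
    extra = 2 if is_invoke else 0
    end = len(operands) - 1 - extra
    return operands[:end]

# Hypothetical operand lists for a call and an invoke of the same callee.
call_ops = ["%a", "%b", "@callee"]
invoke_ops = ["%a", "%b", "%normal", "%unwind", "@callee"]
```

Looking up the callee needs no knowledge of the subclass, while finding the end of the argument list costs one branch on the instruction kind.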
      
      I've tried to remove as much of the obvious vestigial API surface of the
      old CRTP implementation as I could, but I suspect there is further
      cleanup that should now be possible, especially around the operand
      bundle APIs. I'm leaving all of that for future work in this patch as
      enough things are changing here as-is.
      
      One thing that made this harder for me to reason about and debug was the
      pervasive use of unsigned values in subtraction and other arithmetic
      computations. I had to debug more than one unintentional wrap. I've
      switched a few of these to use `int` which seems substantially simpler,
      but I've held back from doing this more broadly to avoid creating
      confusing divergence within a single class's API.
      
      I also worked to remove all of the magic numbers used to index into
      operands, putting them behind named constants or putting them into
      a single method with a comment and strictly using the method elsewhere.
      This was necessary to be able to re-layout the operands as discussed
      above.
      
      Thanks to Ben for reviewing this (somewhat large and awkward) patch!
      
      Differential Revision: https://reviews.llvm.org/D54788
      
      llvm-svn: 347452
    • Revert r343473 "Move llvm util dependencies from clang-tools-extra to add_lit_target." · 36f48c55
      Haojian Wu authored
      Summary:
      It causes the test tools `FileCheck`, `count`, and `not` to be built blindly; these
      dependencies should move back to clang-tools-extra.
      
      Reviewers: mgorny
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D54797
      
      llvm-svn: 347448
    • [ARM GlobalISel] Add test for BFC. NFCI · 6b376557
      Diana Picus authored
      r334871 has made it possible for TableGen'erated code to select BFC, but
      it has not added a test for it on the ARM side. Add it now to make sure
      we don't introduce regressions if we ever change anything about that
      rule.
      
      llvm-svn: 347447
    • [SystemZTTIImpl] Give correct cost values for vector bswap intrinsics. · 96782c2c
      Jonas Paulsson authored
      Implement getIntrinsicInstrCost() and return costs reflecting that bswap can
      be done with a vperm per vector register.
      
      Review: Ulrich Weigand
      https://reviews.llvm.org/D54789
      
      llvm-svn: 347445
    • [llvm-size] Use empty() and range-based for loop. NFC · 64449e6f
      Fangrui Song authored
      llvm-svn: 347441
    • [llvm-mca] Add test case (NFC) · d0792170
      Evandro Menezes authored
      Add test case that will serve as the base for D54820.
      
      llvm-svn: 347440