  1. Jul 09, 2018
    • [VPlan][LV] Introduce condition bit in VPBlockBase · d0953014
      Diego Caballero authored
      This patch introduces a VPValue in VPBlockBase to represent the condition
      bit that is used as successor selector when a block has multiple successors.
      This information wasn't necessary until now, when we are about to introduce
      outer loop vectorization support in VPlan code gen.
      
      Reviewers: fhahn, rengolin, mkuper, hfinkel, mssimpso
      
      Reviewed By: fhahn
      
      Differential Revision: https://reviews.llvm.org/D48814
      
      llvm-svn: 336554
    • [AArch64][SVE] Asm: Support for CNT(B|H|W|D) and CNTP instructions. · d3efb59f
      Sander de Smalen authored
      This patch adds support for the following instructions:
      
        CNTB CNTH - Determine the number of active elements implied by
        CNTW CNTD   the named predicate constant, multiplied by an
                    immediate, e.g.
      
                      cnth x0, vl8, #16
      
        CNTP      - Count active predicate elements, e.g.
                      cntp  x0, p0, p1.b
      
                    counts the number of active elements in p1, predicated
                    by p0, and stores the result in x0.
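
As a rough illustration of the two counting behaviours described above, here is a small Python model. It is only a sketch: the function names, the lane-list representation of predicates, and the 128-bit default vector length are assumptions made for the example, not part of the patch.

```python
# Illustrative model only; names and the 128-bit default are assumptions.
ELEMENT_BITS = {"b": 8, "h": 16, "w": 32, "d": 64}


def cnt(element, pattern_vl, multiplier, vector_bits=128):
    """Model CNT{B,H,W,D} with a fixed-length VLn predicate pattern."""
    lanes = vector_bits // ELEMENT_BITS[element]
    # A VLn pattern implies n elements if the vector holds that many, else none.
    implied = pattern_vl if pattern_vl <= lanes else 0
    return implied * multiplier


def cntp(governing, source):
    """Model CNTP: count lanes that are active in both predicates."""
    return sum(1 for g, s in zip(governing, source) if g and s)


# cnth x0, vl8, #16 on a 128-bit vector: 8 halfword lanes, times 16.
x0 = cnt("h", 8, 16)
# cntp x0, p0, p1.b with p0 = 1101 and p1 = 1011 (lane 0 first).
active = cntp([1, 1, 0, 1], [1, 0, 1, 1])
```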
      
      llvm-svn: 336552
    • [CVP] Handle calls with void return value. No need to create CVPLattice state for it. · b467233d
      Xin Tong authored
      Summary:
      Tests: 10
      Metric: compile_time
      
      Program                                         unpatch-result  patch-result diff
      
      Bullet/bullet                                  32.39           30.54        -5.7%
      SPASS/SPASS                                    18.14           17.25        -4.9%
      mafft/pairlocalalign                           12.10           11.64        -3.8%
      ClamAV/clamscan                                19.21           19.63         2.2%
      7zip/7zip-benchmark                            49.55           48.85        -1.4%
      kimwitu++/kc                                   15.68           15.87         1.2%
      lencod/lencod                                  21.13           21.34         1.0%
      consumer-typeset/consumer-typeset              13.65           13.62        -0.2%
      tramp3d-v4/tramp3d-v4                          29.88           29.92         0.1%
      sqlite3/sqlite3                                18.48           18.46        -0.1%
             unpatch-result  patch-result       diff
      count  10.000000       10.000000     10.000000
      mean   23.022000       22.712400    -0.011671
      std    11.362831       11.094183     0.027338
      min    12.104000       11.640000    -0.057298
      25%    16.299000       16.214000    -0.032282
      50%    18.844000       19.048000    -0.001350
      75%    27.689000       27.774000     0.007752
      max    49.552000       48.852000     0.021861
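
The diff column appears to be the relative change patch/unpatch - 1. A minimal Python sketch that recomputes it from the rounded table values is shown here; the variable names are made up for the example, and because the summary statistics seem to come from unrounded timings, the recomputed mean only matches to about three decimal places.

```python
from statistics import mean

# (unpatched, patched) compile times copied from the table above.
results = {
    "Bullet/bullet": (32.39, 30.54),
    "SPASS/SPASS": (18.14, 17.25),
    "mafft/pairlocalalign": (12.10, 11.64),
    "ClamAV/clamscan": (19.21, 19.63),
    "7zip/7zip-benchmark": (49.55, 48.85),
    "kimwitu++/kc": (15.68, 15.87),
    "lencod/lencod": (21.13, 21.34),
    "consumer-typeset/consumer-typeset": (13.65, 13.62),
    "tramp3d-v4/tramp3d-v4": (29.88, 29.92),
    "sqlite3/sqlite3": (18.48, 18.46),
}


def relative_diff(before, after):
    """Relative change; negative means the patched build compiled faster."""
    return after / before - 1.0


diffs = {name: relative_diff(b, a) for name, (b, a) in results.items()}
mean_diff = mean(list(diffs.values()))
```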
      
      I also tested this pass in isolation by concatenating all the code from the
      llvm/lib/Analysis/ folder and running clang -g followed by opt. I get close to a
      20% speedup for the pass. I expect the majority of the gain comes from skipping
      the dbg intrinsics.
      
      Before patch (opt -time-passes -called-value-propagation):
      ============
      ===-------------------------------------------------------------------------===
       ... Pass execution timing report ...
      ===-------------------------------------------------------------------------===
       Total Execution Time: 3.8303 seconds (3.8279 wall clock)
      
       ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
       2.0768 ( 57.3%) 0.0990 ( 48.0%) 2.1757 ( 56.8%) 2.1757 ( 56.8%) Bitcode Writer
       0.8444 ( 23.3%) 0.0600 ( 29.1%) 0.9044 ( 23.6%) 0.9044 ( 23.6%) Called Value Propagation
       0.7031 ( 19.4%) 0.0472 ( 22.9%) 0.7502 ( 19.6%) 0.7478 ( 19.5%) Module Verifier
       3.6242 (100.0%) 0.2062 (100.0%) 3.8303 (100.0%) 3.8279 (100.0%) Total
      
      After patch (opt -time-passes -called-value-propagation):
      ============
      ===-------------------------------------------------------------------------===
       ... Pass execution timing report ...
      ===-------------------------------------------------------------------------===
       Total Execution Time: 3.6605 seconds (3.6579 wall clock)
      
       ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
       2.0716 ( 59.7%) 0.0990 ( 52.5%) 2.1705 ( 59.3%) 2.1706 ( 59.3%) Bitcode Writer
       0.7144 ( 20.6%) 0.0300 ( 15.9%) 0.7444 ( 20.3%) 0.7444 ( 20.4%) Called Value Propagation
       0.6859 ( 19.8%) 0.0596 ( 31.6%) 0.7455 ( 20.4%) 0.7429 ( 20.3%) Module Verifier
       3.4719 (100.0%) 0.1886 (100.0%) 3.6605 (100.0%) 3.6579 (100.0%) Total
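
For reference, the Called Value Propagation rows of the two reports can be compared as below. This is a small sketch with illustrative names; whether "close to 20%" refers to the reduction in time or to the old/new ratio is an assumption either way, so both are computed.

```python
# Called Value Propagation timings (seconds) copied from the two reports.
before = {"user": 0.8444, "wall": 0.9044}
after = {"user": 0.7144, "wall": 0.7444}


def time_reduction(old, new):
    """Fraction of the original time that was saved."""
    return (old - new) / old


def speedup(old, new):
    """How much faster the new run is, as a fraction of the new time."""
    return old / new - 1.0


wall_reduction = time_reduction(before["wall"], after["wall"])
wall_speedup = speedup(before["wall"], after["wall"])
user_reduction = time_reduction(before["user"], after["user"])
```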
      
      Reviewers: davide, mssimpso
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D49078
      
      llvm-svn: 336551
    • [Power9] Add __float128 support for compare operations · 3d76326d
      Stefan Pintilie authored
      Added handling for the select f128.
      
      Differential Revision: https://reviews.llvm.org/D48294
      
      llvm-svn: 336548
    • [AArch64][SVE] Asm: Support for remaining shift instructions. · 813b21e3
      Sander de Smalen authored
      This patch completes support for shifts, which include:
      - LSL   - Logical Shift Left
      - LSLR  - Logical Shift Left, Reversed form
      - LSR   - Logical Shift Right
      - LSRR  - Logical Shift Right, Reversed form
      - ASR   - Arithmetic Shift Right
      - ASRR  - Arithmetic Shift Right, Reversed form
      - ASRD  - Arithmetic Shift Right for Divide
      
      In the following variants:
      
      - Predicated shift by immediate - ASR, LSL, LSR, ASRD
        e.g.
          asr z0.h, p0/m, z0.h, #1
      
        (active lanes of z0 shifted by #1)
      
      - Unpredicated shift by immediate - ASR, LSL*, LSR*
        e.g.
          asr z0.h, z1.h, #1
      
        (all lanes of z1 shifted by #1, stored in z0)
      
      - Predicated shift by vector - ASR, LSL*, LSR*
        e.g.
          asr z0.h, p0/m, z0.h, z1.h
      
        (active lanes of z0 shifted by z1, stored in z0)
      
      - Predicated shift by vector, reversed form - ASRR, LSLR, LSRR
        e.g.
          lslr z0.h, p0/m, z0.h, z1.h
      
        (active lanes of z1 shifted by z0, stored in z0)
      
      - Predicated shift left/right by wide vector - ASR, LSL, LSR
        e.g.
          lsl z0.h, p0/m, z0.h, z1.d
      
        (active lanes of z0 shifted by wide elements of vector z1)
      
      - Unpredicated shift left/right by wide vector - ASR, LSL, LSR
        e.g.
          lsl z0.h, z1.h, z2.d
      
        (all lanes of z1 shifted by wide elements of z2, stored in z0)
      
      *Variants added in previous patches.
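
The difference between the predicated and the reversed forms can be sketched with a small Python lane model. This is an illustration under assumptions, not the assembler's implementation: vectors are plain lists of unsigned 16-bit lane values, and all helper names are invented for the example.

```python
ELEM_BITS = 16                      # models the .h (halfword) element size
MASK = (1 << ELEM_BITS) - 1


def to_signed(x):
    """Reinterpret an unsigned lane value as a signed 16-bit integer."""
    x &= MASK
    return x - (1 << ELEM_BITS) if x >> (ELEM_BITS - 1) else x


def lsl(value, amount):
    """Logical shift left; shifting by the element size or more gives 0."""
    return (value << amount) & MASK if amount < ELEM_BITS else 0


def asr(value, amount):
    """Arithmetic shift right; large amounts saturate to a sign fill."""
    return (to_signed(value) >> min(amount, ELEM_BITS - 1)) & MASK


def predicated(op, pred, zdn, zm):
    """op z0, p0/m, z0, z1: active lanes get op(z0, z1), inactive lanes keep z0."""
    return [op(a, b) if p else a for p, a, b in zip(pred, zdn, zm)]


def predicated_reversed(op, pred, zdn, zm):
    """Reversed form: active lanes get op(z1, z0), so z1 is the value shifted."""
    return [op(b, a) if p else a for p, a, b in zip(pred, zdn, zm)]


asr_result = predicated(asr, [1, 0, 1, 1], [0x8000, 0x8000, 4, 0xFFFF], [1, 1, 1, 20])
lslr_result = predicated_reversed(lsl, [1, 1, 0], [1, 2, 3], [0x0001, 0x4001, 7])
```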
      
      llvm-svn: 336547
    • [InstCombine] fix shuffle-of-binops transform to avoid poison/undef · 5bd36644
      Sanjay Patel authored
      As noted in D48987, there are many different ways for this transform to go wrong.
      In particular, the poison potential for shifts means we have to be more careful with those ops.
      I added tests to make that behavior visible for all of the different cases that I could find.
      
      This is a partial fix. To make this review easier, I did not make changes for the single binop
      pattern (handled in foldSelectShuffleWith1Binop()). I also left out some potential optimizations
      noted with TODO comments. I'll follow up once we're confident that things are correct here.
      
      The goal is to correct all marked FIXME tests to either avoid the shuffle transform or do it safely.
      
      Note that distinguishing when the shuffle mask contains undefs and using getBinOpIdentity() allows 
      for some improvements to div/rem patterns, so there are wins along with the missed opportunities 
      and fixes.
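
To make the basic shape of the transform concrete, the following Python sketch checks the simplest case: a select shuffle of two adds that share an operand equals a single add with the shuffled constants. It deliberately models only fully defined lanes and masks; the undef and poison cases that this patch addresses are not represented, and the helper names are invented for the example.

```python
def select_shuffle(a, b, mask):
    """Select shuffle: lane i takes a[i] when mask[i] == i, or b[i] when mask[i] == i + n."""
    n = len(a)
    assert all(m in (i, i + n) for i, m in enumerate(mask)), "not a select mask"
    return [a[m] if m < n else b[m - n] for m in mask]


def add_vec(x, c):
    """Lane-wise 32-bit wrapping add."""
    return [(xi + ci) & 0xFFFFFFFF for xi, ci in zip(x, c)]


x = [1, 2, 3, 4]
c0 = [10, 20, 30, 40]
c1 = [100, 200, 300, 400]
mask = [0, 5, 2, 7]

# shuffle (add X, C0), (add X, C1), mask
before = select_shuffle(add_vec(x, c0), add_vec(x, c1), mask)
# add X, (shuffle C0, C1, mask)
after = add_vec(x, select_shuffle(c0, c1, mask))
```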
      
      Differential Revision: https://reviews.llvm.org/D49047
      
      llvm-svn: 336546
    • [mips] Addition of the [d]rem and [d]remu instructions · 0a23998f
      Stefan Maksimovic authored
      Related to http://reviews.llvm.org/D15772
      Depends on http://reviews.llvm.org/D16889
      Adds [D]REM[U] instructions.
      
      Patch By: Srdjan Obucina
      Contributions from: Simon Dardis
      
      Differential Revision: https://reviews.llvm.org/D17036
      
      llvm-svn: 336545
    • [AArch64][SVE] Asm: Support for TBL instruction. · 54077dcf
      Sander de Smalen authored
      Support for SVE's TBL instruction for programmable table
      lookup/permute using vector of element indices, e.g.
      
        tbl  z0.d, { z1.d }, z2.d
      
      stores elements from z1, indexed by elements from z2, into z0.
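
A minimal Python model of this lookup, assuming vectors are plain lists and that an out-of-range index produces zero (as the SVE architecture specifies for TBL); the function name is chosen for the example.

```python
def tbl(table, indices):
    """Model TBL: result[i] = table[indices[i]], or 0 if the index is out of range."""
    n = len(table)
    return [table[i] if 0 <= i < n else 0 for i in indices]


# tbl z0.d, { z1.d }, z2.d with z1 = [10, 20, 30, 40] and z2 = [3, 0, 7, 1]
z0 = tbl([10, 20, 30, 40], [3, 0, 7, 1])
```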
      
      llvm-svn: 336544
    • [llvm-mca] report an error if the assembly sequence contains an unsupported instruction. · 88347796
      Andrea Di Biagio authored
      This is a short-term fix for PR38093.
      For now, we llvm::report_fatal_error if the instruction builder finds an
      unsupported instruction in the instruction stream.
      
      We need to revisit this fix once we start addressing PR38101.
      Essentially, we need a better framework for error handling.
      
      llvm-svn: 336543
    • [Support] Allow JSON serialization of Optional<T> for supported T. · 7e4234fc
      Sam McCall authored
      This is ported from r333881 to JSON's new home.
      
      llvm-svn: 336542
    • [Support] Make JSON handle doubles and int64s losslessly · d93eaeb7
      Sam McCall authored
      Summary:
      This patch adds a new "integer" ValueType, and renames Number -> Double.
      This allows us to preserve the full precision of int64_t when parsing integers
      from the wire, or constructing from an integer.
      The API is unchanged, other than giving asInteger() a clearer contract.
      
      In addition, always output doubles with enough precision that parsing will
      reconstruct the same double.
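
The two precision issues can be demonstrated outside LLVM. The sketch below uses Python's own json module and float type purely as a stand-in; it does not exercise llvm::json, and the variable names are illustrative.

```python
import json

# 2**53 + 1 fits in an int64 but has no exact double representation.
big = 2**53 + 1
as_double = float(big)               # what a double-only number type would store
lost = int(as_double) != big

# A JSON layer with a separate integer type preserves the value.
round_tripped = json.loads(json.dumps(big))

# Printing a double with too few digits does not parse back to the same
# value; 17 significant digits are always enough for an IEEE-754 double.
x = 0.1 + 0.2
short = float(format(x, ".15g"))
full = float(format(x, ".17g"))
```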
      
      Reviewers: simon_tatham
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D46209
      
      llvm-svn: 336541
    • [Support] Fix GCC compile after r336534 · fd75fc50
      Sam McCall authored
      llvm-svn: 336537
    • [PM/Unswitch] Fix a nasty bug in the new PM's unswitch introduced in · ed296543
      Chandler Carruth authored
      r335553 with the non-trivial unswitching of switches.
      
      The code correctly updated most aspects of the CFG and analyses, but
      missed some crucial aspects:
      1) When multiple cases have the same successor, we unswitch that
         a single time and replace the switch with a direct branch. The CFG
         here is correct, but the target of this direct branch may have had
         a PHI node with multiple entries in it.
      2) When we still have to clone a successor of the switch into an
         unswitched copy of the loop, we'll delete potentially multiple edges
         entering this successor, not just one.
      3) We also have to delete multiple edges entering the successors in the
         original loop when they have to be retained.
      4) When the "retained successor" *also* occurs as a case successor, we
         just hit assertion failures everywhere. This doesn't happen very easily
         because it's always valid to simply drop the case -- the retained
         successor for switches is always the default successor. However, it
         is likely possible through some contrivance of different loop passes,
         unrolling, and simplifying for this to occur in practice and
         certainly there is nothing "invalid" about the IR so this pass needs
         to handle it.
      5) In the case of #4, we also will replace these multiple edges with
         a direct branch much like in #1 and need to collapse the entries in
         any PHI nodes to a single entry.
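
The PHI collapsing mentioned in #1 and #5 can be pictured with a toy model. This Python sketch is not the pass's code: a PHI node is represented as a list of (value, predecessor) pairs, and the function and block names are hypothetical.

```python
def collapse_phi_entries(incoming, pred):
    """Keep only the first (value, block) entry coming from `pred`.

    Models what has to happen when several switch edges into a block are
    replaced by one direct branch: the PHI must end up with a single entry
    for that predecessor. Entries from other blocks are left untouched.
    """
    seen = False
    result = []
    for value, block in incoming:
        if block == pred:
            if seen:
                continue            # duplicate edge from the same predecessor
            seen = True
        result.append((value, block))
    return result


# Three switch cases from "switch.bb" all targeted this block.
phi = [("%a", "switch.bb"), ("%b", "other.bb"), ("%a", "switch.bb"), ("%a", "switch.bb")]
collapsed = collapse_phi_entries(phi, "switch.bb")
```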
      
      All of this stems from the delightful fact that the same successor can
      show up in multiple parts of the switch terminator, and each of these
      is considered a distinct edge for the purpose of PHI nodes (and
      iterating the successors and predecessors) but not for unswitching
      itself, the dominator tree, or many other things. For the record,
      I intensely dislike this "feature" of the IR in large part because of
      the complexity it causes in passes like this. We already have a ton of
      logic building sets and handling duplicates, and we just had to add
      a bunch more.
      
      I've added a complex test case that covers all five of the above failure
      modes. I've also added a variation on it where #4 and #5 occur in a loop
      exit, adding fun where we have an LCSSA PHI node with "multiple entries"
      despite having dedicated exits. There were no additional issues found by
      this, but it seems a useful corner case to cover with testing.
      
      One thing that working on all of this code has made painfully clear for
      me as well is how amazingly inefficient our PHI node representation is
      (in terms of the in-memory data structures and the APIs used to update
      them). This code has truly marvelous complexity bounds because every
      time we remove an entry from a PHI node we do a linear scan to find it
      and then a linear update to the data structure to remove it. We could in
      theory batch all of the PHI node updates into a single linear walk of
      the operands making this much more efficient, but the APIs fight hard
      against this and the fact that we have to handle duplicates in the
      peculiar manner we do (removing all but one in some cases) makes even
      implementing that very tedious and annoying. Anyways, none of this is
      new here or specific to loop unswitching. All code in LLVM that updates
      PHI node operands suffers from these problems.
      
      llvm-svn: 336536
    • Lift JSON library from clang-tools-extra/clangd to llvm/Support. · 6be38247
      Sam McCall authored
      Summary:
      This consists of four main parts:
       - a type json::Expr representing JSON values of dynamic kind, which can be
         composed, inspected, and modified
       - a JSON parser from string -> json::Expr
       - a JSON printer from json::Expr -> string, with optional pretty-printing
       - a convention for mapping json::Expr <=> native types (fromJSON/toJSON)
         Mapping functions are provided for primitives (e.g. int, vector) and the
         ObjectMapper helper helps implement fromJSON for struct/object types.
      
      Based on clangd's usage, a couple of places I'd appreciate review attention:
       - fromJSON returns only bool. A richer error-signaling mechanism may be useful
         to provide useful messages, or let recursive fromJSONs (containers/structs)
         do careful error recovery.
       - should json::obj always be explicitly written (like json::ary)?
       - there's no streaming parse API. I suspect there are some simple wins like
         a callback API where the document is a long array, and each element is small.
         But this can probably be bolted on easily when we see the need.
      
      Reviewers: bkramer, labath
      
      Subscribers: mgorny, ilya-biryukov, ioeric, MaskRay, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D45753
      
      llvm-svn: 336534
    • [AArch64][SVE] Asm: Support for ADR instruction. · c69944c6
      Sander de Smalen authored
      Supporting various addressing modes:
      - adr z0.s, [z0.s, z0.s]
      - adr z0.s, [z0.s, z0.s, lsl #<shift>]
      - adr z0.d, [z0.d, z0.d]
      - adr z0.d, [z0.d, z0.d, lsl #<shift>]
      - adr z0.d, [z0.d, z0.d, uxtw #<shift>]
      - adr z0.d, [z0.d, z0.d, sxtw #<shift>]
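
Each of these forms computes a per-lane address of the shape base + (extend(offset) << shift). The Python sketch below models that arithmetic under assumptions: vectors are lists of unsigned lane values, and the function name and its keyword parameters are invented for the example.

```python
def adr(base, offset, elem_bits, extend=None, shift=0):
    """Model SVE ADR: per-lane base + (extend(offset) << shift), wrapped to the lane size."""
    mask = (1 << elem_bits) - 1
    result = []
    for b, o in zip(base, offset):
        if extend == "uxtw":
            o &= 0xFFFFFFFF                 # zero-extend the low 32 bits
        elif extend == "sxtw":
            o &= 0xFFFFFFFF
            if o & 0x80000000:              # sign-extend the low 32 bits
                o -= 1 << 32
        result.append((b + (o << shift)) & mask)
    return result


# adr z0.s, [z0.s, z0.s, lsl #2]
packed = adr([0x1000], [3], 32, shift=2)
# adr z0.d, [z0.d, z0.d, uxtw #1]: the upper half of the offset is ignored
unsigned = adr([0x1000], [0x1_0000_0002], 64, extend="uxtw", shift=1)
# adr z0.d, [z0.d, z0.d, sxtw #3]: the low word 0xFFFFFFFF means -1
signed = adr([0x1000], [0xFFFFFFFF], 64, extend="sxtw", shift=3)
```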
      
      Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar
      
      Reviewed By: SjoerdMeijer
      
      Differential Revision: https://reviews.llvm.org/D48870
      
      llvm-svn: 336533
    • [AArch64][SVE] Asm: Support for UZP and TRN instructions. · bd513b42
      Sander de Smalen authored
      This patch adds support for:
        UZP1  Concatenate even elements from two vectors
        UZP2  Concatenate  odd elements from two vectors
        TRN1  Interleave  even elements from two vectors
        TRN2  Interleave   odd elements from two vectors
      
      With variants for both data and predicate vectors, e.g.
        uzp1    z0.b, z1.b, z2.b
        trn2    p0.s, p1.s, p2.s
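
The four permutes are easy to confuse, so here is a small Python model of the lane movement. Vectors are plain lists and the function names simply mirror the mnemonics; this is an illustration, not the assembler's code.

```python
def uzp1(zn, zm):
    """Concatenate the even-numbered elements of zn and zm."""
    return zn[0::2] + zm[0::2]


def uzp2(zn, zm):
    """Concatenate the odd-numbered elements of zn and zm."""
    return zn[1::2] + zm[1::2]


def trn1(zn, zm):
    """Interleave the even-numbered elements of zn and zm."""
    out = []
    for i in range(0, len(zn), 2):
        out += [zn[i], zm[i]]
    return out


def trn2(zn, zm):
    """Interleave the odd-numbered elements of zn and zm."""
    out = []
    for i in range(1, len(zn), 2):
        out += [zn[i], zm[i]]
    return out


zn = [0, 1, 2, 3]
zm = [10, 11, 12, 13]
```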
      
      llvm-svn: 336531
    • [AccelTable] Provide abstraction for emitting DWARF5 accelerator tables. · 5e810a87
      Jonas Devlieghere authored
      When emitting the DWARF accelerator tables from dsymutil, we don't have
      a DwarfDebug instance and we use a custom class to represent Dwarf
      compile units. This patch adds an interface AccelTableWriterInfo to
      abstract these from the Dwarf5AccelTableWriter, so we can have a custom
      implementation for this in dsymutil.
      
      Differential revision: https://reviews.llvm.org/D49031
      
      llvm-svn: 336529
    • [AccelTable] Dwarf5AccelTableEmitter -> Writer (NFC) · e60ca777
      Jonas Devlieghere authored
      Renames Dwarf5AccelTableEmitter to Dwarf5AccelTableWriter as suggested
      in D49031.
      
      llvm-svn: 336525
    • [PGOMemOPSize] Preserve the DominatorTree · 9e1e0c7b
      Chijun Sima authored
      Summary:
      PGOMemOPSize only modifies CFG in a couple of places; thus we can preserve the DominatorTree with little effort.
      When optimizing SQLite with -O3, this patch reduces the number of nodes traversed by DFS by 3.8% and the number of times DominatorTreeBase::recalculation is called by 5.7%.
      
      Reviewers: kuhar, davide, dmgreen
      
      Reviewed By: dmgreen
      
      Subscribers: mzolotukhin, vsk, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D48914
      
      llvm-svn: 336522
    • [X86] Improve the message for some asserts. Remove an if that is guaranteed true by said asserts. · b8145ec6
      Craig Topper authored
      This replaces some asserts in lowerV2F64VectorShuffle with the similar asserts from lowerVIF64VectorShuffle which are more readable. The original asserts mentioned a blend, but there's no guarantee that it is a blend.
      
      Also remove an if that the asserts prove is always true. Mask[0] is always less than 2 and Mask[1] is always at least 2. Therefore (Mask[0] >= 2) + (Mask[1] >= 2) == 1 must always be true.
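
The claim can be checked exhaustively over the values the asserts allow. In this short Python sketch, including -1 for Mask[0] is an assumption (it stands in for an undefined lane); the conclusion does not depend on it.

```python
from itertools import product

# Mask[0] < 2 (with -1 assumed to encode an undef lane) and 2 <= Mask[1] < 4.
mask0_values = range(-1, 2)
mask1_values = range(2, 4)

sums = [(m0 >= 2) + (m1 >= 2) for m0, m1 in product(mask0_values, mask1_values)]
always_one = all(s == 1 for s in sums)
```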
      
      llvm-svn: 336517
    • [X86] Remove an AddedComplexity line that seems unnecessary. · c98c675f
      Craig Topper authored
      It only existed on the SSE and AVX versions. The AVX512 version didn't have it.

      I checked the generated table and this didn't seem necessary to create a match preference.
      
      llvm-svn: 336516
  2. Jul 08, 2018
  3. Jul 07, 2018