  1. Aug 20, 2017
  2. Aug 19, 2017
    • Victor Leschuk's avatar
      Set init value for ScalarEvolution::BackedgeTakenInfo::MaxOrZero · ee3292c5
      Victor Leschuk authored
      Otherwise it can be used uninitialized in move ctor.
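
      A minimal sketch of the failure mode and the fix, using a hypothetical
      `TakenInfo` type rather than the real `BackedgeTakenInfo`. The move
      constructor copies the flag, so without the default member initializer
      it could read an indeterminate value:

      ```cpp
      #include <cassert>
      #include <utility>

      // Hypothetical stand-in for a class carrying a flag; not LLVM's actual type.
      struct TakenInfo {
        bool MaxOrZero = false; // default member initializer: never indeterminate

        TakenInfo() = default;
        // The move constructor reads Other.MaxOrZero, which is why the source
        // object must always hold an initialized value.
        TakenInfo(TakenInfo &&Other) : MaxOrZero(Other.MaxOrZero) {}
      };

      // Moves a default-constructed object and returns the flag seen by the target.
      bool movedFlag() {
        TakenInfo A;
        TakenInfo B(std::move(A));
        return B.MaxOrZero;
      }

      // Small usage check: the moved-to object sees the initialized value.
      void demoMovedFlag() { assert(!movedFlag()); }
      ```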
      
      llvm-svn: 311262
      ee3292c5
    • Martin Storsjö's avatar
      [ARM] Factorize the calculation of WhichResult in isV*Mask. NFC. · d606d2a6
      Martin Storsjö authored
      Differential Revision: https://reviews.llvm.org/D36930
      
      llvm-svn: 311260
      d606d2a6
    • Martin Storsjö's avatar
      [ARM] Check the right order for halves of VZIP/VUZP if both parts are used · 91522ffa
      Martin Storsjö authored
      This is the exact same fix as in SVN r247254. In that commit, the fix was
      applied only for isVTRNMask and isVTRN_v_undef_Mask, but the same issue
      is present for VZIP/VUZP as well.
      
      This fixes PR33921.
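
      A simplified sketch of the ordering check for a zip mask that uses both
      results. The helper names are hypothetical, undef lanes are not
      handled, and this is not the code in the ARM backend. The first half
      must be result 0 and the second half result 1:

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <vector>

      // Returns 0 or 1 if Half (N entries) is the low or high result of zipping
      // two N-element inputs, or -1 otherwise. Entries index the concatenation
      // of both inputs, so the second input's lanes are N..2N-1.
      int zipResult(const std::vector<int> &Half) {
        const int N = static_cast<int>(Half.size());
        if (N == 0 || N % 2 != 0)
          return -1;
        for (int Which = 0; Which < 2; ++Which) {
          bool Match = true;
          for (int I = 0; I < N; I += 2) {
            const int Lane = I / 2 + Which * (N / 2);
            if (Half[I] != Lane || Half[I + 1] != Lane + N) {
              Match = false;
              break;
            }
          }
          if (Match)
            return Which;
        }
        return -1;
      }

      // A double-length mask is accepted only when its halves appear in the
      // order (result 0, result 1); swapped halves are rejected.
      bool isZipOfBothHalves(const std::vector<int> &Mask) {
        if (Mask.size() % 2 != 0)
          return false;
        const auto N = static_cast<std::ptrdiff_t>(Mask.size() / 2);
        const std::vector<int> Lo(Mask.begin(), Mask.begin() + N);
        const std::vector<int> Hi(Mask.begin() + N, Mask.end());
        return zipResult(Lo) == 0 && zipResult(Hi) == 1;
      }

      // Small usage check with 4-element inputs.
      void demoZip() {
        const std::vector<int> InOrder{0, 4, 1, 5, 2, 6, 3, 7};
        const std::vector<int> Swapped{2, 6, 3, 7, 0, 4, 1, 5};
        assert(isZipOfBothHalves(InOrder));
        assert(!isZipOfBothHalves(Swapped));
      }
      ```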
      
      Differential Revision: https://reviews.llvm.org/D36899
      
      llvm-svn: 311258
      91522ffa
    • Teresa Johnson's avatar
      Fix bot failures by requiring x86 target · b225ad05
      Teresa Johnson authored
      The tests added in r311254 require a target triple since they are
      running through code generation. Fix bot failures by requiring
      an x86 target.
      
      llvm-svn: 311257
      b225ad05
    • Konstantin Zhuravlyov's avatar
      AMDGPU/NFC: Reorder functions in SIMemoryLegalizer: · 89377c44
      Konstantin Zhuravlyov authored
        - Move *load* functions before *atomic* functions
        - Move *store* functions before *atomic* functions
      
      llvm-svn: 311256
      89377c44
    • Jatin Bhateja's avatar
      [DAGCombiner] Extending pattern detection for vector shuffle. · 6b4c2056
      Jatin Bhateja authored
      Summary:
      If all the operands of a BUILD_VECTOR extract elements from the same vector, then split the
      vector efficiently based on the maximum vector access index.

      Reviewers: zvi, delena, RKSimon, thakis

      Reviewed By: RKSimon

      Subscribers: chandlerc, eladcohen, llvm-commits

      Differential Revision: https://reviews.llvm.org/D35788
      
      llvm-svn: 311255
      6b4c2056
    • Teresa Johnson's avatar
      [ThinLTO] Fix ThinLTO crash · 73305f82
      Teresa Johnson authored
      Summary:
      Follow up to fix in r311023, which fixed the case where the combined
      index is written to disk. The same samplePGO logic exists for the
      in-memory index when computing imports, so we need to filter out
      GlobalVariable summaries there too.
      
      Reviewers: davidxl
      
      Subscribers: inglorion, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D36919
      
      llvm-svn: 311254
      73305f82
    • Craig Topper's avatar
      [X86] Remove an unnecessary alignment restriction from MOVDDUP pattern. · 6e70f7cd
      Craig Topper authored
      The SSE MOVDDUP instruction only loads 64 bits with no alignment restriction.
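
      A scalar model of the semantics described above, as a sketch only: the
      function name `loadDup` is hypothetical and this is not the selection
      pattern itself. It reads one 64-bit value from an arbitrarily aligned
      address and duplicates it into both lanes:

      ```cpp
      #include <array>
      #include <cassert>
      #include <cstring>

      // Reads a single 64-bit double from Src and returns it in both lanes.
      // std::memcpy imposes no alignment requirement on Src.
      std::array<double, 2> loadDup(const unsigned char *Src) {
        assert(Src != nullptr);
        double V;
        std::memcpy(&V, Src, sizeof(V));
        return {V, V};
      }
      ```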
      
      llvm-svn: 311253
      6e70f7cd
    • Jatin Bhateja's avatar
      Revert rL311247: To rectify commit message. · 66f7958e
      Jatin Bhateja authored
      Summary: This reverts commit rL311247.
      
      Differential Revision: https://reviews.llvm.org/D36927
      
      llvm-svn: 311252
      66f7958e
    • Jatin Bhateja's avatar
      Merge branch 'arcpatch-D35788' · 6f0d0d23
      Jatin Bhateja authored
      llvm-svn: 311247
      6f0d0d23
    • Jatin Bhateja's avatar
      Revert rL311242 "Extension of shuffle vector pattern detection, updating post rebase." · 1c568637
      Jatin Bhateja authored
      Summary:
      
      This reverts commit rL311242.
      
      Differential Revision: https://reviews.llvm.org/D36924
      
      llvm-svn: 311246
      1c568637
    • Jatin Bhateja's avatar
      Extension of shuffle vector pattern detection, updating post rebase. · 313f97dd
      Jatin Bhateja authored
      llvm-svn: 311242
      313f97dd
    • Victor Leschuk's avatar
      revert failing test · ee7d232a
      Victor Leschuk authored
      llvm-svn: 311238
      ee7d232a
    • Victor Leschuk's avatar
      Add temporary test to verify that win10 builder hangs on error · ba0954c4
      Victor Leschuk authored
      llvm-svn: 311236
      ba0954c4
    • Victor Leschuk's avatar
      Temporarily mark lit :: shtest-format as unsupported on windows · 59dc64f3
      Victor Leschuk authored
      When run manually it fails, but when run under buildbot it causes a hang.
      
      llvm-svn: 311230
      59dc64f3
    • Chandler Carruth's avatar
      [Inliner] Fix a nasty bug when inlining a non-recursive trace of · 4f3aa29a
      Chandler Carruth authored
      a function into itself.
      
      We tried to fix this before in r306495 but that got reverted as the
      assert was actually hit.
      
      This fixes the original bug (which we seem to have lost track of with
      the revert) by blocking a second remapping when the function being
      inlined is also the caller and the remapping could succeed but
      erroneously.
      
      The included test case would actually load from an inlined copy of the
      alloca before this change, failing to load the stored value and
      miscompiling.
      
      Many thanks to Richard Smith for diagnosing a user miscompile to this
      bug, and to Kyle for the first attempt and initial analysis and David Li
      for remembering the issue and how to fix it and suggesting the patch.
      I'm just stitching it together and landing it. =]
      
      llvm-svn: 311229
      4f3aa29a
    • Chandler Carruth's avatar
      [Inliner] Clean up a test case a bit to make it more clear what is being · 2a80fddf
      Chandler Carruth authored
      tested and why.
      
      llvm-svn: 311228
      2a80fddf
    • Chandler Carruth's avatar
      [SLP] Fix an unused variable warning in non-asserts builds. · 1f821259
      Chandler Carruth authored
      llvm-svn: 311227
      1f821259
    • Chandler Carruth's avatar
      [x86] Teach the cmov converter to aggressively convert cmovs with memory · 93a64552
      Chandler Carruth authored
      operands into control flow.
      
      We have periodically seen performance problems with cmov where one
      operand comes from memory. On modern x86 processors with strong branch
      predictors and speculative execution, this tends to be much better done
      with a branch than cmov. We routinely see cmov stalling while the load
      is completed rather than continuing, and if there are subsequent
      branches, they cannot be speculated in turn.
      
      Also, in many (even simple) cases, macro fusion causes the control flow
      version to be fewer uops.
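
      As a source-level sketch of the two shapes being compared (function
      names are hypothetical, and a compiler remains free to lower either
      function either way), the select form always performs the load, while
      the branch form loads only on the path that needs it:

      ```cpp
      #include <cassert>
      #include <cstdint>

      // Select form: the load always executes and feeds the select, so a
      // cmov-style lowering has to wait for the loaded value.
      std::uint64_t selectForm(const std::uint64_t *P, bool UseLoad,
                               std::uint64_t Other) {
        assert(P != nullptr);
        const std::uint64_t Loaded = *P;
        return UseLoad ? Loaded : Other;
      }

      // Branch form: the load is only reached when UseLoad is true, which
      // leaves the decision to the branch predictor.
      std::uint64_t branchForm(const std::uint64_t *P, bool UseLoad,
                               std::uint64_t Other) {
        if (UseLoad) {
          assert(P != nullptr);
          return *P;
        }
        return Other;
      }
      ```

      Both functions return the same value for every input with a valid
      pointer; only the dependency on the load differs.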
      
      Consider the IACA output for the initial sequence of code in a very hot
      function in one of our internal benchmarks that motivates this, and notice the
      micro-op reduction provided.
      Before, SNB:
      ```
      Throughput Analysis Report
      --------------------------
      Block Throughput: 2.20 Cycles       Throughput Bottleneck: Port1
      
      | Num Of |              Ports pressure in cycles               |    |
      |  Uops  |  0  - DV  |  1  |  2  -  D  |  3  -  D  |  4  |  5  |    |
      ---------------------------------------------------------------------
      |   1    |           | 1.0 |           |           |     |     | CP | mov rcx, rdi
      |   0*   |           |     |           |           |     |     |    | xor edi, edi
      |   2^   | 0.1       | 0.6 | 0.5   0.5 | 0.5   0.5 |     | 0.4 | CP | cmp byte ptr [rsi+0xf], 0xf
      |   1    |           |     | 0.5   0.5 | 0.5   0.5 |     |     |    | mov rax, qword ptr [rsi]
      |   3    | 1.8       | 0.6 |           |           |     | 0.6 | CP | cmovbe rax, rdi
      |   2^   |           |     | 0.5   0.5 | 0.5   0.5 |     | 1.0 |    | cmp byte ptr [rcx+0xf], 0x10
      |   0F   |           |     |           |           |     |     |    | jb 0xf
      Total Num Of Uops: 9
      ```
      After, SNB:
      ```
      Throughput Analysis Report
      --------------------------
      Block Throughput: 2.00 Cycles       Throughput Bottleneck: Port5
      
      | Num Of |              Ports pressure in cycles               |    |
      |  Uops  |  0  - DV  |  1  |  2  -  D  |  3  -  D  |  4  |  5  |    |
      ---------------------------------------------------------------------
      |   1    | 0.5       | 0.5 |           |           |     |     |    | mov rax, rdi
      |   0*   |           |     |           |           |     |     |    | xor edi, edi
      |   2^   | 0.5       | 0.5 | 1.0   1.0 |           |     |     |    | cmp byte ptr [rsi+0xf], 0xf
      |   1    | 0.5       | 0.5 |           |           |     |     |    | mov ecx, 0x0
      |   1    |           |     |           |           |     | 1.0 | CP | jnbe 0x39
      |   2^   |           |     |           | 1.0   1.0 |     | 1.0 | CP | cmp byte ptr [rax+0xf], 0x10
      |   0F   |           |     |           |           |     |     |    | jnb 0x3c
      Total Num Of Uops: 7
      ```
      The difference even manifests in a throughput cycle rate difference on Haswell.
      Before, HSW:
      ```
      Throughput Analysis Report
      --------------------------
      Block Throughput: 2.00 Cycles       Throughput Bottleneck: FrontEnd
      
      | Num Of |                    Ports pressure in cycles                     |    |
      |  Uops  |  0  - DV  |  1  |  2  -  D  |  3  -  D  |  4  |  5  |  6  |  7  |    |
      ---------------------------------------------------------------------------------
      |   0*   |           |     |           |           |     |     |     |     |    | mov rcx, rdi
      |   0*   |           |     |           |           |     |     |     |     |    | xor edi, edi
      |   2^   |           |     | 0.5   0.5 | 0.5   0.5 |     | 1.0 |     |     |    | cmp byte ptr [rsi+0xf], 0xf
      |   1    |           |     | 0.5   0.5 | 0.5   0.5 |     |     |     |     |    | mov rax, qword ptr [rsi]
      |   3    | 1.0       | 1.0 |           |           |     |     | 1.0 |     |    | cmovbe rax, rdi
      |   2^   | 0.5       |     | 0.5   0.5 | 0.5   0.5 |     |     | 0.5 |     |    | cmp byte ptr [rcx+0xf], 0x10
      |   0F   |           |     |           |           |     |     |     |     |    | jb 0xf
      Total Num Of Uops: 8
      ```
      After, HSW:
      ```
      Throughput Analysis Report
      --------------------------
      Block Throughput: 1.50 Cycles       Throughput Bottleneck: FrontEnd
      
      | Num Of |                    Ports pressure in cycles                     |    |
      |  Uops  |  0  - DV  |  1  |  2  -  D  |  3  -  D  |  4  |  5  |  6  |  7  |    |
      ---------------------------------------------------------------------------------
      |   0*   |           |     |           |           |     |     |     |     |    | mov rax, rdi
      |   0*   |           |     |           |           |     |     |     |     |    | xor edi, edi
      |   2^   |           |     | 1.0   1.0 |           |     | 1.0 |     |     |    | cmp byte ptr [rsi+0xf], 0xf
      |   1    |           | 1.0 |           |           |     |     |     |     |    | mov ecx, 0x0
      |   1    |           |     |           |           |     |     | 1.0 |     |    | jnbe 0x39
      |   2^   | 1.0       |     |           | 1.0   1.0 |     |     |     |     |    | cmp byte ptr [rax+0xf], 0x10
      |   0F   |           |     |           |           |     |     |     |     |    | jnb 0x3c
      Total Num Of Uops: 6
      ```
      
      Note that this cannot be usefully restricted to inner loops. Much of the
      hot code we see hitting this is not in an inner loop or not in a loop at
      all. The optimization still remains effective and indeed critical for
      some of our code.
      
      I have run a suite of internal benchmarks with this change. I saw a few
      very significant improvements and a very few minor regressions,
      but overall this change rarely has a significant effect. However, the
      improvements were very significant, and in quite important routines
      responsible for a great deal of our C++ CPU cycles. The gains pretty
      clearly outweigh the regressions for us.
      
      I also ran the test-suite and SPEC2006. Only 11 binaries changed at all
      and none of them showed any regressions.
      
      Amjad Aboud at Intel also ran this over their benchmarks and saw no
      regressions.
      
      Differential Revision: https://reviews.llvm.org/D36858
      
      llvm-svn: 311226
      93a64552
    • Chandler Carruth's avatar
      [x86] Refactor the CMOV conversion pass to be more flexible. · e3b3547e
      Chandler Carruth authored
      The primary thing that this accomplishes is to allow future re-use of
      these routines in more contexts and clarify the behavior w.r.t. loops.
      For example, if handling outer loops is desirable, doing so in
      an inside-out order becomes straightforward because it walks the loop
      nest itself (rather than walking the function's basic blocks) and
      de-couples the CMOV rewriting from the loop structure as there isn't
      actually anything loop-specific about this transformation.
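
      A minimal sketch of an inside-out walk over a loop nest, using a
      hypothetical `LoopNode` tree rather than LLVM's loop classes. A
      post-order traversal visits every sub-loop before its parent:

      ```cpp
      #include <cassert>
      #include <string>
      #include <vector>

      // Hypothetical loop-nest node; LLVM's real loop classes are not modeled.
      struct LoopNode {
        std::string Name;
        std::vector<LoopNode> SubLoops;
      };

      // Post-order walk: every sub-loop is appended before its parent.
      void visitInsideOut(const LoopNode &L, std::vector<std::string> &Order) {
        for (const LoopNode &Sub : L.SubLoops)
          visitInsideOut(Sub, Order);
        Order.push_back(L.Name);
      }

      // Convenience wrapper returning the visit order for one nest.
      std::vector<std::string> insideOutOrder(const LoopNode &Root) {
        std::vector<std::string> Order;
        visitInsideOut(Root, Order);
        return Order;
      }

      // Small usage check: the inner loop is visited before the outer one.
      void demoInsideOut() {
        const LoopNode Nest{"outer", {LoopNode{"inner", {}}}};
        assert(insideOutOrder(Nest).front() == "inner");
      }
      ```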
      
      This patch should be essentially a no-op. It potentially changes the
      order in which we visit the inner loops, but otherwise should merely set
      the stage for subsequent changes.
      
      Differential Revision: https://reviews.llvm.org/D36783
      
      llvm-svn: 311225
      e3b3547e
    • Dinar Temirbulatov's avatar
      [SLPVectorizer] Tighten up VLeft, VRight declaration, remove unnecessary testcase test/Transforms/SLPVectorizer/X86/reorder.ll, NFCI. · 7aff8cfa
      Dinar Temirbulatov authored
      
      llvm-svn: 311223
      7aff8cfa
    • Dinar Temirbulatov's avatar
      [SLPVectorizer] Add opcode parameter to reorderAltShuffleOperands, reorderInputsAccordingToOpcode functions. · e3ce1b45
      Dinar Temirbulatov authored
      
      Reviewers: mkuper, RKSimon, ABataev, mzolotukhin, spatel, filcab
      
      Subscribers: llvm-commits, rengolin
      
      Differential Revision: https://reviews.llvm.org/D36766
      
      llvm-svn: 311221
      e3ce1b45
    • Matthias Braun's avatar
      ARMRegisterInfo: Define more ssub indexes; NFC · 91bd3ad1
      Matthias Braun authored
      This doesn't really change anything as Tablegen would have inferred
      those indices anyway; defining them gives us shorter names that are
      easier to read while debugging (i.e. "ssub_4" rather than
      "dsub2_then_ssub_0")
      
      llvm-svn: 311218
      91bd3ad1
    • Adrian Prantl's avatar
      Filter out non-constant DIGlobalVariableExpressions reachable via the CU · 2116dd36
      Adrian Prantl authored
      They won't affect the DWARF output, but they will mess with the
      sorting of the fragments. This fixes the crash reported in PR34159.
      
      https://bugs.llvm.org/show_bug.cgi?id=34159
      
      llvm-svn: 311217
      2116dd36
    • Eric Beckmann's avatar
      llvm-mt: Merge manifest namespaces. · 91d8af53
      Eric Beckmann authored
      mt.exe performs a tree merge where certain element nodes are combined
      into one.  This introduces the possibility of xml namespaces conflicting
      with each other.  The original mt.exe has a hierarchy whereby certain
      namespace names can override others, and nodes that would then end up in
      ambiguous namespaces have their namespaces explicitly defined.  This
      patch handles this merging process.
      
      llvm-svn: 311215
      91d8af53
    • Eugene Zelenko's avatar
      [Analysis] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). · be709f2c
      Eugene Zelenko authored
      
      llvm-svn: 311212
      be709f2c
    • Xinliang David Li's avatar
      Fix comment /NFC · 0d07f9d6
      Xinliang David Li authored
      llvm-svn: 311209
      0d07f9d6
    • Xinliang David Li's avatar
      [Profile] backward propagate profile info in JumpThreading · 709ffe17
      Xinliang David Li authored
      Differential Revision: http://reviews.llvm.org/D36864
      
      llvm-svn: 311208
      709ffe17
    • Amjad Aboud's avatar
      88ffa3af
    • Max Kazantsev's avatar
      [IRCE] Fix buggy behavior in Clamp · 0aaf8c16
      Max Kazantsev authored
      The Clamp function was too optimistic when choosing between signed and unsigned min/max functions for its calculations.
      In fact, `!IsSignedPredicate` guarantees that `Smallest` and `Greatest` can be compared safely using unsigned
      predicates, but we did not check this for `S`, which can in theory be negative.

      This patch makes Clamp use signed min/max when it fails to prove that `S` is non-negative,
      and it adds a test where this situation could lead to incorrectly computed conditions.
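
      The hazard can be illustrated on plain 32-bit integers. This is a
      sketch with hypothetical helper names, not the SCEV-based code in
      IRCE: with non-negative bounds, an unsigned clamp treats a negative
      `S` as a very large value and returns the upper bound, while a signed
      clamp returns the lower bound:

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstdint>

      // Clamp S into [Smallest, Greatest] using unsigned comparisons. This is
      // only sound when all three values are known to be non-negative.
      std::int32_t clampUnsigned(std::int32_t S, std::int32_t Smallest,
                                 std::int32_t Greatest) {
        assert(0 <= Smallest && Smallest <= Greatest);
        auto U = [](std::int32_t V) { return static_cast<std::uint32_t>(V); };
        const std::uint32_t R =
            std::max(U(Smallest), std::min(U(S), U(Greatest)));
        return static_cast<std::int32_t>(R);
      }

      // Clamp S into [Smallest, Greatest] using signed comparisons.
      std::int32_t clampSigned(std::int32_t S, std::int32_t Smallest,
                               std::int32_t Greatest) {
        assert(Smallest <= Greatest);
        return std::max(Smallest, std::min(S, Greatest));
      }
      ```

      For `S = -1` with bounds `[0, 10]`, the unsigned version yields 10 and
      the signed version yields 0; for any `S` already inside the bounds the
      two agree.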
      
      Differential Revision: https://reviews.llvm.org/D36873
      
      llvm-svn: 311205
      0aaf8c16
  3. Aug 18, 2017