  1. Aug 30, 2019
    • Duncan P. N. Exon Smith's avatar
      FileManager: Remove ShouldCloseOpenFile argument from getBufferForFile, NFC · 122705b9
      Duncan P. N. Exon Smith authored
      Remove this dead code.  We always close it.
      
      llvm-svn: 370488
      122705b9
    • Bob Haarman's avatar
      [lld-link] implement -start-lib and -end-lib · fd7569c8
      Bob Haarman authored
      Summary:
      This implements -start-lib and -end-lib flags for lld-link, analogous
      to the similarly named options in ld.lld. Object files after
      -start-lib are included in the link only when needed to resolve
      undefined symbols. The -end-lib flag goes back to the normal behavior
      of always including object files in the link. This mimics the
      semantics of static libraries, but without needing to actually create
      the archive file.
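
The lazy-inclusion rule described above can be sketched with a toy model. This is only an illustration under stated assumptions: `ObjectFile`, `selectObjects`, and the symbol lists are hypothetical names, not lld's data structures, and lld's real symbol resolution is considerably more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model of an object file: only symbol names.
struct ObjectFile {
  std::string name;
  std::vector<std::string> defined;
  std::vector<std::string> undefined;
  bool lazy;  // true if the file appeared between -start-lib and -end-lib
};

// Returns the names of the objects that end up in the link, in input order.
std::vector<std::string> selectObjects(const std::vector<ObjectFile>& inputs) {
  std::vector<bool> included(inputs.size(), false);
  std::set<std::string> wanted;    // symbols referenced by included objects
  std::set<std::string> resolved;  // symbols defined by included objects

  auto addObject = [&](std::size_t i) {
    assert(!included[i] && "an object is added to the link at most once");
    included[i] = true;
    wanted.insert(inputs[i].undefined.begin(), inputs[i].undefined.end());
    resolved.insert(inputs[i].defined.begin(), inputs[i].defined.end());
  };

  // Regular objects are always part of the link.
  for (std::size_t i = 0; i < inputs.size(); ++i)
    if (!inputs[i].lazy) addObject(i);

  // Pull in lazy objects while one of them defines a still-unresolved symbol.
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t i = 0; i < inputs.size(); ++i) {
      if (included[i]) continue;
      for (const std::string& sym : inputs[i].defined) {
        if (wanted.count(sym) && !resolved.count(sym)) {
          addObject(i);
          changed = true;
          break;
        }
      }
    }
  }

  std::vector<std::string> result;
  for (std::size_t i = 0; i < inputs.size(); ++i)
    if (included[i]) result.push_back(inputs[i].name);
  return result;
}
```

In this model, a lazy object that defines nothing the link needs is simply left out, which is the archive-like behavior the summary describes.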
      
      Reviewers: ruiu, smeenai, MaskRay
      
      Reviewed By: ruiu, MaskRay
      
      Subscribers: akhuang, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D66848
      
      llvm-svn: 370487
      fd7569c8
    • Whitney Tsang's avatar
      [INSTRUCTIONS] Add support of const for getLoadStorePointerOperand() and · b8a35649
      Whitney Tsang authored
      getLoadStorePointerOperand().
      Reviewer: hsaito, sebpop, reames, hfinkel, mkuper, bogner, haicheng,
      arsenm, lattner, chandlerc, grosser, rengolin
      Reviewed By: reames
      Subscribers: wdng, llvm-commits, bmahjour
      Tag: LLVM
      Differential Revision: https://reviews.llvm.org/D66595
      
      llvm-svn: 370486
      b8a35649
    • Johannes Doerfert's avatar
      [Attributor] Fix: do not pretend to preserve the CFG · 659a8707
      Johannes Doerfert authored
      llvm-svn: 370485
      659a8707
    • Craig Topper's avatar
      [X86] Merge X86InstrInfo::loadRegFromAddr/storeRegToAddr into their only call site. · 66f03ba1
      Craig Topper authored
      I'm looking at unfolding broadcast loads on AVX512 which will
      require refactoring this code to select broadcast opcodes instead
      of regular load/stores in some cases. Merging them to avoid
      further complicating their interfaces.
      
      llvm-svn: 370484
      66f03ba1
    • Jonas Devlieghere's avatar
      [lit] Fix my earlier bogus fix to not set DYLD_LIBRARY_PATH with Asan. · a053ae0f
      Jonas Devlieghere authored
      My follow-up commit to mess with DYLD_LIBRARY_PATH was bogus for two
      reasons:
      
       - The condition was inverted.
       - We were checking the OS's environment, instead of the config's.
      
      Two wrongs don't make a right, but the second mistake meant that the
      sanitizer bot passed.
      
      llvm-svn: 370483
      a053ae0f
    • Johan Vikstrom's avatar
      [clangd] Add highlighting for macro expansions. · becbdc66
      Johan Vikstrom authored
      Summary: https://github.com/clangd/clangd/issues/134
      
      Reviewers: hokein, ilya-biryukov
      
      Subscribers: MaskRay, jkorous, arphaman, kadircet, cfe-commits
      
      Tags: #clang
      
      Differential Revision: https://reviews.llvm.org/D66995
      
      llvm-svn: 370482
      becbdc66
    • Nandor Licker's avatar
      Revert [Clang Interpreter] Initial patch for the constexpr interpreter · 5c8b94a6
      Nandor Licker authored
      This reverts r370476 (git commit a5590950)
      
      llvm-svn: 370481
      5c8b94a6
    • Johannes Doerfert's avatar
      [Attributor] Use existing function information for the call site · 3fac668d
      Johannes Doerfert authored
      Summary:
      Instead of recomputing information for call sites we now use the
      function information directly. This is always valid and once we have
      call site specific information we can improve here.
      
      This patch also bootstraps attributes that are created on-demand through
      an initial update call. Information that is known will then directly be
      available in the new attribute without causing an iteration delay.
      
      The tests show how this improves the iteration count.
      
      Reviewers: sstefan1, uenoku
      
      Subscribers: hiraditya, bollu, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D66781
      
      llvm-svn: 370480
      3fac668d
    • Johannes Doerfert's avatar
      [Attributor] Manifest load/store alignment generally · 81df452d
      Johannes Doerfert authored
      Summary:
      Any pointer could have load/store users, not only floating ones, so we
      move the manifest logic for alignment into the AAAlignImpl class.
      
      Reviewers: uenoku, sstefan1
      
      Subscribers: hiraditya, bollu, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D66922
      
      llvm-svn: 370479
      81df452d
    • Simon Pilgrim's avatar
      [DAGCombine] visitVSELECT - remove duplicate getOperand calls. NFCI. · c2fed1dc
      Simon Pilgrim authored
      llvm-svn: 370478
      c2fed1dc
    • Nandor Licker's avatar
      [Clang Interpreter] Initial patch for the constexpr interpreter · a5590950
      Nandor Licker authored
      Summary:
      This patch introduces the skeleton of the constexpr interpreter,
      capable of evaluating simple constexpr functions consisting of
      if statements. The interpreter is described in more detail in the
      RFC. Further patches will add more features.
      
      Reviewers: Bigcheese, jfb, rsmith
      
      Subscribers: bruno, uenoku, ldionne, Tyker, thegameg, tschuett, dexonsmith, mgorny, cfe-commits
      
      Tags: #clang
      
      Differential Revision: https://reviews.llvm.org/D64146
      
      llvm-svn: 370476
      a5590950
    • Piotr Sobczak's avatar
      [InstCombine][AMDGPU] Simplify tbuffer loads · 67b97946
      Piotr Sobczak authored
      Summary: Add missing tbuffer load intrinsics in SimplifyDemandedVectorElts.
      
      Reviewers: arsenm, nhaehnle
      
      Reviewed By: arsenm
      
      Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D66926
      
      llvm-svn: 370475
      67b97946
    • Sid Manning's avatar
      [llvm-nm] Small fix to Expected<StringRef> · aa0e8f96
      Sid Manning authored
      Differential Revision: https://reviews.llvm.org/D66976
      
      llvm-svn: 370474
      aa0e8f96
    • Johan Vikstrom's avatar
      [clangd] Added highlighting for structured bindings. · 268f45bf
      Johan Vikstrom authored
      Summary: Structured bindings are represented by BindingDecls, and the decls that a DeclRefExpr to a binding points to are those BindingDecls. So this adds an additional if statement in the addToken function to highlight them.
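
For readers unfamiliar with the construct, here is a small C++17 sketch of the kind of source this highlighting applies to. The function names `divmod` and `roundTrip` are made up for illustration; the point is only that `q` and `r` are bindings, not ordinary variables.

```cpp
#include <cassert>
#include <utility>

// Returns quotient and remainder as a pair.
std::pair<int, int> divmod(int a, int b) {
  assert(b != 0 && "division by zero");
  return {a / b, a % b};
}

// `q` and `r` are introduced by a structured binding declaration. In Clang's
// AST each of them is a BindingDecl, and the uses below are DeclRefExprs
// that refer to those BindingDecls.
int roundTrip(int a, int b) {
  auto [q, r] = divmod(a, b);
  return q * b + r;
}
```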
      
      Reviewers: hokein, ilya-biryukov
      
      Subscribers: MaskRay, jkorous, arphaman, kadircet, cfe-commits
      
      Tags: #clang
      
      Differential Revision: https://reviews.llvm.org/D66738
      
      llvm-svn: 370473
      268f45bf
    • George Rimar's avatar
      [yaml2obj][obj2yaml] - Use a single "Other" field instead of "Other", "Visibility" and "StOther". · 4e71702c
      George Rimar authored
      Currently we can encode the 'st_other' field of a symbol using 3 fields.
      'Visibility' is used to encode STV_* values.
      'Other' is used to encode everything except the visibility, but it can't handle arbitrary values.
      'StOther' is used to encode arbitrary values when 'Visibility'/'Other' are not helpful enough.
      
      The 'st_other' field is used to encode symbol visibility and platform-dependent
      flags and values. The difficulty in encoding it is that it consists of a Visibility part
      (STV_* values), which is an enumeration, and the Other part, which differs between
      platforms and is inconsistent.
      
      For MIPS the Other part contains flags for all STO_MIPS_* values except STO_MIPS_MIPS16.
      (As the comment in ELFDumper says: "Someones in their infinite wisdom decided to make
      STO_MIPS_MIPS16 flag overlapped with other ST_MIPS_xxx flags."...)
      
      And for PPC64 the Other part might actually encode any value.
      
      This patch implements custom logic for handling the st_other and removes
      'Visibility' and 'StOther' fields.
      
      Here is an example of a new YAML style this patch allows:
      
      - Name:  foo
        Other: [ 0x4 ]
      - Name:  bar
        Other: [ STV_PROTECTED, 4 ]
      - Name:  zed
        Other: [ STV_PROTECTED, STO_MIPS_OPTIONAL, 0xf8 ]
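
As a rough illustration of what these lists encode, the sketch below packs a visibility value and the remaining bits into one st_other byte. It assumes the entries of an `Other` list are combined with bitwise OR; `encodeStOther` and `visibilityOf` are hypothetical helpers, not yaml2obj functions. The STV_* values and the two-bit visibility mask follow the ELF specification, and STO_MIPS_OPTIONAL is 0x04.

```cpp
#include <cassert>
#include <cstdint>

// ELF symbol visibility values, stored in the low two bits of st_other.
constexpr std::uint8_t STV_DEFAULT = 0;
constexpr std::uint8_t STV_INTERNAL = 1;
constexpr std::uint8_t STV_HIDDEN = 2;
constexpr std::uint8_t STV_PROTECTED = 3;
// One MIPS-specific st_other flag.
constexpr std::uint8_t STO_MIPS_OPTIONAL = 0x04;

constexpr std::uint8_t VisibilityMask = 0x3;

// Combines a visibility value with the platform-specific upper bits.
std::uint8_t encodeStOther(std::uint8_t visibility, std::uint8_t otherBits) {
  assert(visibility <= STV_PROTECTED && "visibility occupies two bits");
  assert((otherBits & VisibilityMask) == 0 && "other bits must not overlap");
  return static_cast<std::uint8_t>(visibility | otherBits);
}

// Extracts the visibility part of an st_other byte.
std::uint8_t visibilityOf(std::uint8_t stOther) {
  return static_cast<std::uint8_t>(stOther & VisibilityMask);
}
```

Under that assumption, the three symbols above would get st_other values 0x04, 0x07 and 0xff respectively.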
      
      Differential revision: https://reviews.llvm.org/D66886
      
      llvm-svn: 370472
      4e71702c
    • Simon Pilgrim's avatar
      [DAGCombine] visitVSELECT - use getShiftAmountTy for shift amounts. · 33676696
      Simon Pilgrim authored
      llvm-svn: 370471
      33676696
    • Simon Pilgrim's avatar
      [DAGCombine] visitMULHS - use getScalarValueSizeInBits() to make safe for vector types. · 8e1989e7
      Simon Pilgrim authored
      This is hidden behind a (scalar-only) isOneConstant(N1) check at the moment, but once we get around to adding vector support we need to ensure we're dealing with the scalar bitwidth, not the total.
      
      llvm-svn: 370468
      8e1989e7
    • Simon Atanasyan's avatar
      [mips] Merge common checks under the same check prefix. NFC · 68f73bf2
      Simon Atanasyan authored
      llvm-svn: 370467
      68f73bf2
    • Luis Marques's avatar
      [RISCV] Fix a couple of tests' CHECKs · c2b3d527
      Luis Marques authored
      llvm-svn: 370466
      c2b3d527
    • Haojian Wu's avatar
      Remove an extra ";", NFC. · ed170c9b
      Haojian Wu authored
      llvm-svn: 370465
      ed170c9b
    • Amaury Sechet's avatar
      [X86] Add tests for rotate matching. NFC · 485760f4
      Amaury Sechet authored
      llvm-svn: 370464
      485760f4
    • Bjorn Pettersson's avatar
      [CodeGen] Introduce MachineBasicBlock::replacePhiUsesWith helper and use it. NFC · 22714592
      Bjorn Pettersson authored
      Summary:
      Found a couple of places in the code where all the PHI nodes
      of an MBB are updated, replacing references to one MBB with
      references to another MBB.
      
      This patch simply refactors the code to use a common helper
      (MachineBasicBlock::replacePhiUsesWith) for such PHI node
      updates.
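
The update pattern being factored out can be sketched on a toy data model. `Block`, `PhiOperand`, `PhiNode` and the free function below are hypothetical stand-ins: the real helper is a member of MachineBasicBlock and works on machine instructions, so this only shows the shape of the operation.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for a basic block and a PHI node.
struct Block {
  int id;
};

struct PhiOperand {
  int value;          // incoming value (simplified to an int)
  const Block* pred;  // predecessor block the value comes from
};

struct PhiNode {
  std::vector<PhiOperand> operands;
};

// Rewrites every PHI operand that names `oldPred` so that it names `newPred`.
// Returns how many operands were rewritten.
int replacePhiUsesWith(std::vector<PhiNode>& phis, const Block* oldPred,
                       const Block* newPred) {
  assert(oldPred != nullptr && newPred != nullptr);
  int rewritten = 0;
  for (PhiNode& phi : phis) {
    for (PhiOperand& op : phi.operands) {
      if (op.pred == oldPred) {
        op.pred = newPred;
        ++rewritten;
      }
    }
  }
  return rewritten;
}
```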
      
      Reviewers: t.p.northover, arsenm, uabelho
      
      Subscribers: wdng, hiraditya, jsji, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D66750
      
      llvm-svn: 370463
      22714592
    • Pavel Labath's avatar
      [dotest] Finish removing -q · 9bad6639
      Pavel Labath authored
      One usage of this option remained, and caused dotest to error out if one
      happened to pass the -v flag.
      
      llvm-svn: 370462
      9bad6639
    • Gabor Marton's avatar
      [ASTImporter] Do not look up lambda classes · e3e83d70
      Gabor Marton authored
      Summary:
      Consider this code:
      ```
            void f() {
              auto L0 = [](){};
              auto L1 = [](){};
            }
      
      ```
      First we import `L0`, then `L1`. Currently we end up with only one
      CXXRecordDecl for the two different lambdas, which is a problem if
      the bodies of their op() differ. This happens because when we import
      `L1`, lookup finds the existing `L0`, and since they are structurally
      equivalent we just map the imported L0 to be the counterpart of L1.
      
      We have the same problem in this case:
      ```
            template <typename F0, typename F1>
            void f(F0 L0 = [](){}, F1 L1 = [](){}) {}
      
      ```
      
      In StructuralEquivalenceContext we could distinguish lambdas only by
      their source location in these cases. But the lambdas are actually
      structurally equivalent; they differ only by their source location.
      
      Thus, the solution is to disable lookup completely if the decl in
      the "from" context is a lambda.
      However, that could cause other problems: what if the lambda is defined
      in a header file and included in several TUs? I think we'd end up with
      as many duplicates as there are includes. I think we could live with that,
      because the lambda classes are TU-local anyway; we cannot access
      them from another TU.
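
The language rule behind this can be checked directly: two lambda expressions always have distinct closure types, even when they are textually identical. The following is a minimal standalone sketch; `sameType` and `closureTypesDiffer` are illustrative names only.

```cpp
#include <cassert>
#include <type_traits>

// True if both arguments have exactly the same type.
template <typename A, typename B>
bool sameType(const A&, const B&) {
  return std::is_same<A, B>::value;
}

// Two textually identical lambdas still get two distinct closure types,
// which is why merging their CXXRecordDecls on import is wrong.
bool closureTypesDiffer() {
  auto L0 = []() {};
  auto L1 = []() {};
  auto L0Copy = L0;
  assert(sameType(L0, L0Copy));  // copying keeps the closure type
  return !sameType(L0, L1);
}
```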
      
      Reviewers: a_sidorin, a.sidorin, shafik
      
      Subscribers: rnkovacs, dkrupp, Szelethus, gamesh411, cfe-commits
      
      Tags: #clang
      
      Differential Revision: https://reviews.llvm.org/D66348
      
      llvm-svn: 370461
      e3e83d70
    • Simon Pilgrim's avatar
      [DAGCombine] visitMULHS/visitMULHU - isBuildVectorAllZeros doesn't mean node is all zeros · 7cbf823f
      Simon Pilgrim authored
      Return a proper zero vector, just in case some elements are undef.
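
To see why the distinction matters, here is a toy model in which an undef lane is represented by an empty `std::optional`. The names `Lane`, `isAllZerosOrUndef` and `foldMulHighByZero` are hypothetical and only mirror the idea; this is not the SelectionDAG API.

```cpp
#include <cassert>
#include <optional>
#include <vector>

using Lane = std::optional<int>;  // std::nullopt models an undef lane
using Vector = std::vector<Lane>;

// Accepts vectors whose lanes are all zero or undef (but not all undef),
// so a `true` result does not mean every lane is literally zero.
bool isAllZerosOrUndef(const Vector& v) {
  bool sawZero = false;
  for (const Lane& lane : v) {
    if (!lane) continue;  // undef lanes are tolerated
    if (*lane != 0) return false;
    sawZero = true;
  }
  return sawZero;
}

// Folding a multiply-high by such a vector must produce real zeros.
// Returning `rhs` itself would leak its undef lanes into the result.
Vector foldMulHighByZero(const Vector& rhs) {
  assert(isAllZerosOrUndef(rhs));
  return Vector(rhs.size(), Lane(0));
}
```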
      
      Noticed by inspection after dealing with a similar issue in PR43159.
      
      llvm-svn: 370460
      7cbf823f
    • Simon Pilgrim's avatar
      Fix Wdocumentation warning. NFCI. · 01a3c25c
      Simon Pilgrim authored
      llvm-svn: 370459
      01a3c25c
    • Chris Jackson's avatar
      [llvm-objcopy] Allow the visibility of symbols created by --binary and · fa1fe937
      Chris Jackson authored
      --add-symbol to be specified with --new-symbol-visibility
      
      llvm-svn: 370458
      fa1fe937
    • Balazs Keri's avatar
      [ASTImporter] Propagate errors during import of overridden methods. · b4fd7d42
      Balazs Keri authored
      Summary:
      If importing overridden methods fails for a method, it can be seen
      incorrectly as non-virtual. To avoid this inconsistency, the method
      is marked with an import error to prevent later use of it.
      
      Reviewers: martong, a.sidorin, shafik, a_sidorin
      
      Reviewed By: martong, shafik
      
      Subscribers: rnkovacs, dkrupp, Szelethus, gamesh411, cfe-commits
      
      Tags: #clang
      
      Differential Revision: https://reviews.llvm.org/D66933
      
      llvm-svn: 370457
      b4fd7d42
    • Hideto Ueno's avatar
      [Attributor] Implement AANoAliasCallSiteArgument initialization · 6381b143
      Hideto Ueno authored
      Summary: This patch adds an appropriate `initialize` method for `AANoAliasCallSiteArgument`.
      
      Reviewers: jdoerfert, sstefan1
      
      Reviewed By: jdoerfert
      
      Subscribers: hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D66927
      
      llvm-svn: 370456
      6381b143
    • Shaurya Gupta's avatar
      [Clangd] ExtractFunction Added checks for broken control flow · 3b08a61f
      Shaurya Gupta authored
      Summary:
      - Added checks for broken control flow
      - Added unittests
      
      Reviewers: sammccall, kadircet
      
      Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, cfe-commits
      
      Tags: #clang
      
      Differential Revision: https://reviews.llvm.org/D66732
      
      llvm-svn: 370455
      3b08a61f
    • Roman Lebedev's avatar
      [LoopIdiomRecognize] BCmp loop idiom recognition · 5c9f3cfe
      Roman Lebedev authored
      Summary:
      @mclow.lists brought this issue up in IRC.
      It is a reasonably common problem to compare two values for equality.
      Those may be just some integers, strings or arrays of integers.
      
      In C, there are the `memcmp()` and `bcmp()` functions.
      In C++, there is the `std::equal()` algorithm.
      One can also write that function manually.
      
      libstdc++'s `std::equal()` is specialized to directly call `memcmp()` for
      various types, but not `std::byte` from C++2a. https://godbolt.org/z/mx2ejJ
      
      libc++ does not do anything like that; it simply relies on plain C++
      `operator==()`. https://godbolt.org/z/er0Zwf (GOOD!)
      
      So there is likely a performance opportunity here.
      Let's compare the performance of a naive `std::equal()` (no `memcmp()`) with one that
      uses `memcmp()` (in this case, compiled with a modified compiler). {F8768213}
      
      ```
      #include <algorithm>
      #include <cassert>
      #include <cmath>
      #include <cstdint>
      #include <iterator>
      #include <limits>
      #include <random>
      #include <type_traits>
      #include <utility>
      #include <vector>
      
      #include "benchmark/benchmark.h"
      
      template <class T>
      bool equal(T* a, T* a_end, T* b) noexcept {
        for (; a != a_end; ++a, ++b) {
          if (*a != *b) return false;
        }
        return true;
      }
      
      template <typename T>
      std::vector<T> getVectorOfRandomNumbers(size_t count) {
        std::random_device rd;
        std::mt19937 gen(rd());
        std::uniform_int_distribution<T> dis(std::numeric_limits<T>::min(),
                                             std::numeric_limits<T>::max());
        std::vector<T> v;
        v.reserve(count);
        std::generate_n(std::back_inserter(v), count,
                        [&dis, &gen]() { return dis(gen); });
        assert(v.size() == count);
        return v;
      }
      
      struct Identical {
        template <typename T>
        static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
          auto Tmp = getVectorOfRandomNumbers<T>(count);
          return std::make_pair(Tmp, std::move(Tmp));
        }
      };
      
      struct InequalHalfway {
        template <typename T>
        static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
          auto V0 = getVectorOfRandomNumbers<T>(count);
          auto V1 = V0;
          V1[V1.size() / size_t(2)]++;  // just change the value.
          return std::make_pair(std::move(V0), std::move(V1));
        }
      };
      
      template <class T, class Gen>
      void BM_bcmp(benchmark::State& state) {
        const size_t Length = state.range(0);
      
        const std::pair<std::vector<T>, std::vector<T>> Data =
            Gen::template Gen<T>(Length);
        const std::vector<T>& a = Data.first;
        const std::vector<T>& b = Data.second;
        assert(a.size() == Length && b.size() == a.size());
      
        benchmark::ClobberMemory();
        benchmark::DoNotOptimize(a);
        benchmark::DoNotOptimize(a.data());
        benchmark::DoNotOptimize(b);
        benchmark::DoNotOptimize(b.data());
      
        for (auto _ : state) {
          const bool is_equal = equal(a.data(), a.data() + a.size(), b.data());
          benchmark::DoNotOptimize(is_equal);
        }
        state.SetComplexityN(Length);
        state.counters["eltcnt"] =
            benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariant);
        state.counters["eltcnt/sec"] =
            benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariantRate);
        const size_t BytesRead = 2 * sizeof(T) * Length;
        state.counters["bytes_read/iteration"] =
            benchmark::Counter(BytesRead, benchmark::Counter::kDefaults,
                               benchmark::Counter::OneK::kIs1024);
        state.counters["bytes_read/sec"] = benchmark::Counter(
            BytesRead, benchmark::Counter::kIsIterationInvariantRate,
            benchmark::Counter::OneK::kIs1024);
      }
      
      template <typename T>
      static void CustomArguments(benchmark::internal::Benchmark* b) {
        const size_t L2SizeBytes = []() {
          for (const benchmark::CPUInfo::CacheInfo& I :
               benchmark::CPUInfo::Get().caches) {
            if (I.level == 2) return I.size;
          }
          return 0;
        }();
        // What is the largest range we can check to always fit within given L2 cache?
        const size_t MaxLen = L2SizeBytes / /*total bufs*/ 2 /
                              /*maximal elt size*/ sizeof(T) / /*safety margin*/ 2;
        b->RangeMultiplier(2)->Range(1, MaxLen)->Complexity(benchmark::oN);
      }
      
      BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, Identical)
          ->Apply(CustomArguments<uint8_t>);
      BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, Identical)
          ->Apply(CustomArguments<uint16_t>);
      BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, Identical)
          ->Apply(CustomArguments<uint32_t>);
      BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, Identical)
          ->Apply(CustomArguments<uint64_t>);
      
      BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, InequalHalfway)
          ->Apply(CustomArguments<uint8_t>);
      BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, InequalHalfway)
          ->Apply(CustomArguments<uint16_t>);
      BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, InequalHalfway)
          ->Apply(CustomArguments<uint32_t>);
      BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, InequalHalfway)
          ->Apply(CustomArguments<uint64_t>);
      ```
      {F8768210}
      ```
      $ ~/src/googlebenchmark/tools/compare.py --no-utest benchmarks build-{old,new}/test/llvm-bcmp-bench
      RUNNING: build-old/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpb6PEUx
      2019-04-25 21:17:11
      Running build-old/test/llvm-bcmp-bench
      Run on (8 X 4000 MHz CPU s)
      CPU Caches:
        L1 Data 16K (x8)
        L1 Instruction 64K (x4)
        L2 Unified 2048K (x4)
        L3 Unified 8192K (x1)
      Load Average: 0.65, 3.90, 4.14
      ---------------------------------------------------------------------------------------------------
      Benchmark                                         Time             CPU   Iterations UserCounters...
      ---------------------------------------------------------------------------------------------------
      <...>
      BM_bcmp<uint8_t, Identical>/512000           432131 ns       432101 ns         1613 bytes_read/iteration=1000k bytes_read/sec=2.20706G/s eltcnt=825.856M eltcnt/sec=1.18491G/s
      BM_bcmp<uint8_t, Identical>_BigO               0.86 N          0.86 N
      BM_bcmp<uint8_t, Identical>_RMS                   8 %             8 %
      <...>
      BM_bcmp<uint16_t, Identical>/256000          161408 ns       161409 ns         4027 bytes_read/iteration=1000k bytes_read/sec=5.90843G/s eltcnt=1030.91M eltcnt/sec=1.58603G/s
      BM_bcmp<uint16_t, Identical>_BigO              0.67 N          0.67 N
      BM_bcmp<uint16_t, Identical>_RMS                 25 %            25 %
      <...>
      BM_bcmp<uint32_t, Identical>/128000           81497 ns        81488 ns         8415 bytes_read/iteration=1000k bytes_read/sec=11.7032G/s eltcnt=1077.12M eltcnt/sec=1.57078G/s
      BM_bcmp<uint32_t, Identical>_BigO              0.71 N          0.71 N
      BM_bcmp<uint32_t, Identical>_RMS                 42 %            42 %
      <...>
      BM_bcmp<uint64_t, Identical>/64000            50138 ns        50138 ns        10909 bytes_read/iteration=1000k bytes_read/sec=19.0209G/s eltcnt=698.176M eltcnt/sec=1.27647G/s
      BM_bcmp<uint64_t, Identical>_BigO              0.84 N          0.84 N
      BM_bcmp<uint64_t, Identical>_RMS                 27 %            27 %
      <...>
      BM_bcmp<uint8_t, InequalHalfway>/512000      192405 ns       192392 ns         3638 bytes_read/iteration=1000k bytes_read/sec=4.95694G/s eltcnt=1.86266G eltcnt/sec=2.66124G/s
      BM_bcmp<uint8_t, InequalHalfway>_BigO          0.38 N          0.38 N
      BM_bcmp<uint8_t, InequalHalfway>_RMS              3 %             3 %
      <...>
      BM_bcmp<uint16_t, InequalHalfway>/256000     127858 ns       127860 ns         5477 bytes_read/iteration=1000k bytes_read/sec=7.45873G/s eltcnt=1.40211G eltcnt/sec=2.00219G/s
      BM_bcmp<uint16_t, InequalHalfway>_BigO         0.50 N          0.50 N
      BM_bcmp<uint16_t, InequalHalfway>_RMS             0 %             0 %
      <...>
      BM_bcmp<uint32_t, InequalHalfway>/128000      49140 ns        49140 ns        14281 bytes_read/iteration=1000k bytes_read/sec=19.4072G/s eltcnt=1.82797G eltcnt/sec=2.60478G/s
      BM_bcmp<uint32_t, InequalHalfway>_BigO         0.40 N          0.40 N
      BM_bcmp<uint32_t, InequalHalfway>_RMS            18 %            18 %
      <...>
      BM_bcmp<uint64_t, InequalHalfway>/64000       32101 ns        32099 ns        21786 bytes_read/iteration=1000k bytes_read/sec=29.7101G/s eltcnt=1.3943G eltcnt/sec=1.99381G/s
      BM_bcmp<uint64_t, InequalHalfway>_BigO         0.50 N          0.50 N
      BM_bcmp<uint64_t, InequalHalfway>_RMS             1 %             1 %
      RUNNING: build-new/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpQ46PP0
      2019-04-25 21:19:29
      Running build-new/test/llvm-bcmp-bench
      Run on (8 X 4000 MHz CPU s)
      CPU Caches:
        L1 Data 16K (x8)
        L1 Instruction 64K (x4)
        L2 Unified 2048K (x4)
        L3 Unified 8192K (x1)
      Load Average: 1.01, 2.85, 3.71
      ---------------------------------------------------------------------------------------------------
      Benchmark                                         Time             CPU   Iterations UserCounters...
      ---------------------------------------------------------------------------------------------------
      <...>
      BM_bcmp<uint8_t, Identical>/512000            18593 ns        18590 ns        37565 bytes_read/iteration=1000k bytes_read/sec=51.2991G/s eltcnt=19.2333G eltcnt/sec=27.541G/s
      BM_bcmp<uint8_t, Identical>_BigO               0.04 N          0.04 N
      BM_bcmp<uint8_t, Identical>_RMS                  37 %            37 %
      <...>
      BM_bcmp<uint16_t, Identical>/256000           18950 ns        18948 ns        37223 bytes_read/iteration=1000k bytes_read/sec=50.3324G/s eltcnt=9.52909G eltcnt/sec=13.511G/s
      BM_bcmp<uint16_t, Identical>_BigO              0.08 N          0.08 N
      BM_bcmp<uint16_t, Identical>_RMS                 34 %            34 %
      <...>
      BM_bcmp<uint32_t, Identical>/128000           18627 ns        18627 ns        37895 bytes_read/iteration=1000k bytes_read/sec=51.198G/s eltcnt=4.85056G eltcnt/sec=6.87168G/s
      BM_bcmp<uint32_t, Identical>_BigO              0.16 N          0.16 N
      BM_bcmp<uint32_t, Identical>_RMS                 35 %            35 %
      <...>
      BM_bcmp<uint64_t, Identical>/64000            18855 ns        18855 ns        37458 bytes_read/iteration=1000k bytes_read/sec=50.5791G/s eltcnt=2.39731G eltcnt/sec=3.3943G/s
      BM_bcmp<uint64_t, Identical>_BigO              0.32 N          0.32 N
      BM_bcmp<uint64_t, Identical>_RMS                 33 %            33 %
      <...>
      BM_bcmp<uint8_t, InequalHalfway>/512000        9570 ns         9569 ns        73500 bytes_read/iteration=1000k bytes_read/sec=99.6601G/s eltcnt=37.632G eltcnt/sec=53.5046G/s
      BM_bcmp<uint8_t, InequalHalfway>_BigO          0.02 N          0.02 N
      BM_bcmp<uint8_t, InequalHalfway>_RMS             29 %            29 %
      <...>
      BM_bcmp<uint16_t, InequalHalfway>/256000       9547 ns         9547 ns        74343 bytes_read/iteration=1000k bytes_read/sec=99.8971G/s eltcnt=19.0318G eltcnt/sec=26.8159G/s
      BM_bcmp<uint16_t, InequalHalfway>_BigO         0.04 N          0.04 N
      BM_bcmp<uint16_t, InequalHalfway>_RMS            29 %            29 %
      <...>
      BM_bcmp<uint32_t, InequalHalfway>/128000       9396 ns         9394 ns        73521 bytes_read/iteration=1000k bytes_read/sec=101.518G/s eltcnt=9.41069G eltcnt/sec=13.6255G/s
      BM_bcmp<uint32_t, InequalHalfway>_BigO         0.08 N          0.08 N
      BM_bcmp<uint32_t, InequalHalfway>_RMS            30 %            30 %
      <...>
      BM_bcmp<uint64_t, InequalHalfway>/64000        9499 ns         9498 ns        73802 bytes_read/iteration=1000k bytes_read/sec=100.405G/s eltcnt=4.72333G eltcnt/sec=6.73808G/s
      BM_bcmp<uint64_t, InequalHalfway>_BigO         0.16 N          0.16 N
      BM_bcmp<uint64_t, InequalHalfway>_RMS            28 %            28 %
      Comparing build-old/test/llvm-bcmp-bench to build-new/test/llvm-bcmp-bench
      Benchmark                                                  Time             CPU      Time Old      Time New       CPU Old       CPU New
      ---------------------------------------------------------------------------------------------------------------------------------------
      <...>
      BM_bcmp<uint8_t, Identical>/512000                      -0.9570         -0.9570        432131         18593        432101         18590
      <...>
      BM_bcmp<uint16_t, Identical>/256000                     -0.8826         -0.8826        161408         18950        161409         18948
      <...>
      BM_bcmp<uint32_t, Identical>/128000                     -0.7714         -0.7714         81497         18627         81488         18627
      <...>
      BM_bcmp<uint64_t, Identical>/64000                      -0.6239         -0.6239         50138         18855         50138         18855
      <...>
      BM_bcmp<uint8_t, InequalHalfway>/512000                 -0.9503         -0.9503        192405          9570        192392          9569
      <...>
      BM_bcmp<uint16_t, InequalHalfway>/256000                -0.9253         -0.9253        127858          9547        127860          9547
      <...>
      BM_bcmp<uint32_t, InequalHalfway>/128000                -0.8088         -0.8088         49140          9396         49140          9394
      <...>
      BM_bcmp<uint64_t, InequalHalfway>/64000                 -0.7041         -0.7041         32101          9499         32099          9498
      ```
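
The Time and CPU columns of the comparison table read as relative changes, (new - old) / old, so a value near -1 means the new build is much faster. A small sketch of that arithmetic follows; the function names are illustrative and are not part of compare.py.

```cpp
#include <cassert>

// Relative change as shown in the comparison table: (new - old) / old.
double relativeChange(double oldTime, double newTime) {
  assert(oldTime > 0.0);
  return (newTime - oldTime) / oldTime;
}

// The same comparison expressed as a speedup factor: old / new.
double speedup(double oldTime, double newTime) {
  assert(newTime > 0.0);
  return oldTime / newTime;
}
```

For the first row, `relativeChange(432131, 18593)` comes out at about -0.957, which matches the -0.9570 printed for `BM_bcmp<uint8_t, Identical>/512000`.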
      
      What can we tell from the benchmark?
      * Performance of the naive equality check somewhat improves with element size,
        maxing out at eltcnt/sec=1.58603G/s for uint16_t, or bytes_read/sec=19.0209G/s
        for uint64_t. I think that instability implies performance problems.
      * Performance of the `memcmp()`-aware benchmark always maxes out at around
        bytes_read/sec=51.2991G/s for every type. That is 2.6x the throughput of the
        naive variant!
      * The eltcnt/sec metric for the `memcmp()`-aware benchmark maxes out at
        eltcnt/sec=27.541G/s for uint8_t (was: eltcnt/sec=1.18491G/s, so 24x) and
        decreases linearly with element size.
        For uint64_t, it's ~4x+ the elements/second.
      * The call is obviously more costly than the loop for small element counts.
        As can be seen from the full output {F8768210}, `memcmp()` is almost
        universally worse, independent of the element size (and thus buffer size), when
        the element count is less than 8.
      
      So all in all, the bcmp idiom does indeed offer untapped performance headroom.
      This diff implements said idiom recognition. I think reasonable test
      coverage is present, but do tell if anything obvious is missing.
      
      Now, quality. This does build and pass the test-suite, at least
      without any non-bundled elements. {F8768216} {F8768217}
      This transform fires 91 times:
      ```
      $ /build/test-suite/utils/compare.py -m loop-idiom.NumBCmp result-new.json
      Tests: 1149
      Metric: loop-idiom.NumBCmp
      
      Program                                         result-new
      
      MultiSourc...Benchmarks/7zip/7zip-benchmark    79.00
      MultiSource/Applications/d/make_dparser         3.00
      SingleSource/UnitTests/vla                      2.00
      MultiSource/Applications/Burg/burg              1.00
      MultiSourc.../Applications/JM/lencod/lencod     1.00
      MultiSource/Applications/lemon/lemon            1.00
      MultiSource/Benchmarks/Bullet/bullet            1.00
      MultiSourc...e/Benchmarks/MallocBench/gs/gs     1.00
      MultiSourc...gs-C/TimberWolfMC/timberwolfmc     1.00
      MultiSourc...Prolangs-C/simulator/simulator     1.00
      ```
      I'm not sure what's going on with SingleSource/UnitTests/vla.test yet; I did not look.
      The size changes are:
      ```
      $ /build/test-suite/utils/compare.py -m size..text result-{old,new}.json --filter-hash
      Tests: 1149
      Same hash: 907 (filtered out)
      Remaining: 242
      Metric: size..text
      
      Program                                        result-old result-new diff
      test-suite...ingleSource/UnitTests/vla.test   753.00     833.00     10.6%
      test-suite...marks/7zip/7zip-benchmark.test   1001697.00 966657.00  -3.5%
      test-suite...ngs-C/simulator/simulator.test   32369.00   32321.00   -0.1%
      test-suite...plications/d/make_dparser.test   89585.00   89505.00   -0.1%
      test-suite...ce/Applications/Burg/burg.test   40817.00   40785.00   -0.1%
      test-suite.../Applications/lemon/lemon.test   47281.00   47249.00   -0.1%
      test-suite...TimberWolfMC/timberwolfmc.test   250065.00  250113.00   0.0%
      test-suite...chmarks/MallocBench/gs/gs.test   149889.00  149873.00  -0.0%
      test-suite...ications/JM/lencod/lencod.test   769585.00  769569.00  -0.0%
      test-suite.../Benchmarks/Bullet/bullet.test   770049.00  770049.00   0.0%
      test-suite...HMARK_ANISTROPIC_DIFFUSION/128    NaN        NaN        nan%
      test-suite...HMARK_ANISTROPIC_DIFFUSION/256    NaN        NaN        nan%
      test-suite...CHMARK_ANISTROPIC_DIFFUSION/64    NaN        NaN        nan%
      test-suite...CHMARK_ANISTROPIC_DIFFUSION/32    NaN        NaN        nan%
      test-suite...ENCHMARK_BILATERAL_FILTER/64/4    NaN        NaN        nan%
      Geomean difference                                                   nan%
               result-old    result-new       diff
      count  1.000000e+01  10.00000      10.000000
      mean   3.152090e+05  311695.40000  0.006749
      std    3.790398e+05  372091.42232  0.036605
      min    7.530000e+02  833.00000    -0.034981
      25%    4.243300e+04  42401.00000  -0.000866
      50%    1.197370e+05  119689.00000 -0.000392
      75%    6.397050e+05  639705.00000 -0.000005
      max    1.001697e+06  966657.00000  0.106242
      ```
      
      I don't have timings though.
      
      And now to the code. The basic idea is to completely replace the whole loop.
      If we can't fully kill it, don't transform.
      I have left one or two comments in the code, so hopefully it can be understood.
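
      As a rough illustration of the loop shape in question (a sketch only, not
      taken from the patch or its tests; the function names `equalLoop` and
      `equalCall` are hypothetical), an element-by-element equality loop with an
      early exit computes the same answer as a single `memcmp()` call whose
      result is only tested against zero:

      ```cpp
      #include <cstddef>
      #include <cstdint>
      #include <cstring>

      // Element-by-element equality loop with an early exit: the kind of loop
      // that could be replaced as a whole by one library call.
      bool equalLoop(const std::uint8_t *a, const std::uint8_t *b, std::size_t n) {
        for (std::size_t i = 0; i != n; ++i)
          if (a[i] != b[i])
            return false;
        return true;
      }

      // The same result with a single call. Only equality matters here, so the
      // ordering information that memcmp() also reports is ignored.
      bool equalCall(const std::uint8_t *a, const std::uint8_t *b, std::size_t n) {
        return std::memcmp(a, b, n) == 0;
      }
      ```

      Both functions agree for every length, including zero; which one is faster
      depends on the element count, as the benchmark notes above suggest.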
      
      Also, there are a few TODOs that I have left for follow-ups:
      * widening of `memcmp()`/`bcmp()`
      * step smaller than the comparison size
      * Metadata propagation
      * more than two blocks as long as there is still a single backedge?
      * ???
      
      Reviewers: reames, fhahn, mkazantsev, chandlerc, craig.topper, courbet
      
      Reviewed By: courbet
      
      Subscribers: hiraditya, xbolva00, nikic, jfb, gchatelet, courbet, llvm-commits, mclow.lists
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D61144
      
      llvm-svn: 370454
      5c9f3cfe
    • Roman Lebedev's avatar
      [NFC] SCEVExpander: add SetCurrentDebugLocation() / getCurrentDebugLocation() wrappers · 09e4ac1a
      Roman Lebedev authored
      Summary:
      The internal `Builder` is private, which means there is
      currently no way to set the debuginfo locations for `SCEVExpander`.
      This only adds the wrappers, but does not use them anywhere.
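
      A minimal sketch of the forwarding-wrapper pattern described here, using
      stand-in types (`ToyDebugLoc`, `ToyBuilder` and `ToyExpander` are
      hypothetical and are not the LLVM classes; the real signatures may differ):

      ```cpp
      // Stand-in for a debug location; not the LLVM type.
      struct ToyDebugLoc {
        unsigned Line;
      };

      // Stand-in for a builder that tracks the current debug location.
      class ToyBuilder {
        ToyDebugLoc Loc{};

      public:
        void SetCurrentDebugLocation(ToyDebugLoc L) { Loc = L; }
        ToyDebugLoc getCurrentDebugLocation() const { return Loc; }
      };

      // The expander keeps its builder private, so callers can reach the debug
      // location only through the two forwarding wrappers.
      class ToyExpander {
        ToyBuilder Builder;

      public:
        void SetCurrentDebugLocation(ToyDebugLoc L) {
          Builder.SetCurrentDebugLocation(L);
        }
        ToyDebugLoc getCurrentDebugLocation() const {
          return Builder.getCurrentDebugLocation();
        }
      };
      ```

      The wrappers add no behavior of their own; they only expose state that the
      private member already tracks.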
      
      Reviewers: mkazantsev, sanjoy, gberry, jyknight, dneilson
      
      Reviewed By: sanjoy
      
      Subscribers: javed.absar, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D61007
      
      llvm-svn: 370453
      09e4ac1a
    • Johan Vikstrom's avatar
      [clangd] Collecting main file macro expansion locations in ParsedAST. · 84b4c4a4
      Johan Vikstrom authored
      Summary: TokenBuffer does not collect macro expansions inside macro arguments, which is needed for semantic highlighting. Therefore, this collects macro expansions in the main file in a PPCallback when building the ParsedAST instead.
      
      Reviewers: hokein, ilya-biryukov
      
      Subscribers: MaskRay, jkorous, arphaman, kadircet, cfe-commits
      
      Tags: #clang
      
      Differential Revision: https://reviews.llvm.org/D66928
      
      llvm-svn: 370452
      84b4c4a4
    • Dmitri Gribenko's avatar
      [Tooling] Migrated APIs that take ownership of objects to unique_ptr · b22804b3
      Dmitri Gribenko authored
      Subscribers: jkorous, arphaman, kadircet, cfe-commits
      
      Tags: #clang
      
      Differential Revision: https://reviews.llvm.org/D66960
      
      llvm-svn: 370451
      b22804b3
    • Pavel Labath's avatar
      dotest: improvements to the pexpect tests · 12a7e6c0
      Pavel Labath authored
      Summary:
      While working on r370054, I've found it frustrating that the test output
      was completely unhelpful in case of failures. Therefore I've decided to
      improve that. In this I reuse the PExpectTest class, which was one of
      our mechanisms for running pexpect tests, but which has gotten orphaned
      in the meantime.
      
      I've replaced the existing send methods with an "expect" method, which
      I've tried to design so that it has a similar interface to the expect
      method in regular non-pexpect dotest tests (as it essentially does
      something very similar). I've kept the ability to dump the transcript of
      the pexpect communication to stdout in the "trace" mode, as that is a
      very handy way to figure out what the test is doing. I've also removed
      the "expect_string" method used in the existing tests -- I've found this
      to be unhelpful because it hides the message that would be normally
      displayed by the EOF exception. Although verbose, this message includes
      some important information, like what strings we were searching for,
      what were the last bits of lldb output, etc. I've also beefed up the
      class to automatically disable the debug info test duplication, and
      auto-skip tests when the host platform does not support pexpect.
      
      This patch ports TestMultilineCompletion and TestIOHandlerCompletion to
      the new class. It also deletes TestFormats as it is not testing anything
      (definitely not formats) -- it was committed with the test code
      commented out (r228207), and then the testing code was deleted in
      r356000.
      
      Reviewers: teemperor, JDevlieghere, davide
      
      Subscribers: aprantl, lldb-commits
      
      Differential Revision: https://reviews.llvm.org/D66954
      
      llvm-svn: 370449
      12a7e6c0
    • David Stenberg's avatar
      [LiveDebugValues] Insert entry values after bundles · b35d4699
      David Stenberg authored
      Summary:
      Change LiveDebugValues so that it inserts entry values after the bundle
      which contains the clobbering instruction. Previously it would insert
      the debug value after the bundle head using insertAfter(), breaking the
      bundle.
      
      Reviewers: djtodoro, NikolaPrica, aprantl, vsk
      
      Reviewed By: vsk
      
      Subscribers: hiraditya, llvm-commits
      
      Tags: #debug-info, #llvm
      
      Differential Revision: https://reviews.llvm.org/D66888
      
      llvm-svn: 370448
      b35d4699
    • Haojian Wu's avatar
      [clangd] Add .vscode-test to .gitignore. · 0491d13c
      Haojian Wu authored
      Reviewers: jvikstrom
      
      Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, cfe-commits
      
      Tags: #clang
      
      Differential Revision: https://reviews.llvm.org/D66949
      
      llvm-svn: 370446
      0491d13c
    • Alexander Potapenko's avatar
      [CodeGen]: fix error message for "=r" asm constraint · 57b87322
      Alexander Potapenko authored
      Summary:
      Nico Weber reported that the following code:
        char buf[9];
        asm("" : "=r" (buf));
      
      yields the "impossible constraint in asm: can't store struct into a register"
      error message, although |buf| is not a struct (see
      http://crbug.com/999160).
      
      Make the error message more generic and add a test for it.
      Also make sure other tests in x86_64-PR42672.c check for the full error
      message.
      
      Reviewers: eli.friedman, thakis
      
      Subscribers: cfe-commits
      
      Tags: #clang
      
      Differential Revision: https://reviews.llvm.org/D66948
      
      llvm-svn: 370444
      57b87322
    • Sven van Haastregt's avatar
      vim: add `immarg` keyword · fd66c8bf
      Sven van Haastregt authored
      The `immarg` attribute was added in r355981.
      
      llvm-svn: 370443
      fd66c8bf