  1. Aug 30, 2019
    • Puyan Lotfi's avatar
      [llvm-ifs][IFS] llvm Interface Stubs merging + object file generation tool. · d719c506
      Puyan Lotfi authored
      This tool merges interface stub files to produce a merged interface stub file
      or a stub library. Currently, for stub library generation, it can produce an
      ELF .so stub file, or a TBD file (experimental). It will be used by the clang
      -emit-interface-stubs compilation pipeline to merge and assemble the per-CU
      stub files into a stub library.
      
      The new IFS format is as follows:
      
      --- !experimental-ifs-v1
      IfsVersion:      1.0
      Triple:          <llvm triple>
      ObjectFileFormat: <ELF | TBD>
      Symbols:
        _ZSymbolName: { Type: <type>, etc... }
      ...
      
      Differential Revision: https://reviews.llvm.org/D66405
      
      llvm-svn: 370499
      d719c506
    • Simon Pilgrim's avatar
      [DAGCombine] ReduceLoadWidth - remove duplicate SDLoc. NFCI. · 3be7081a
      Simon Pilgrim authored
      SDLoc(N0) and SDLoc(cast<LoadSDNode>(N0)) should be equivalent.
      
      llvm-svn: 370498
      3be7081a
    • Simon Pilgrim's avatar
      [TargetLowering] SimplifyDemandedBits ADD/SUB/MUL - correctly inherit... · 2d1e0899
      Simon Pilgrim authored
      [TargetLowering] SimplifyDemandedBits ADD/SUB/MUL - correctly inherit SDNodeFlags from the original node.
      
      Just disable NSW/NUW flags. This matches what we're already doing for the other situations for these nodes; it was just missed for the demanded constant case.
      
      Noticed by inspection - confirmed in offline discussion with @spatel. I've checked we have test coverage in the x86 extract-bits.ll and extract-lowbits.ll tests
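
      To illustrate why the wrap flags cannot simply be inherited in this case, here is a
      minimal 8-bit sketch. It assumes the constant operand is rewritten in its undemanded
      high bits (15 becomes -1, i.e. 255 unsigned, when only the low four bits are
      demanded). The helper names are made up for illustration and are not LLVM APIs.

      ```cpp
      #include <cstdint>

      // Illustrative helpers (hypothetical names, not LLVM APIs).

      // True if a + b does not fit in a signed 8-bit integer.
      constexpr bool signedAddWraps8(int a, int b) {
        return a + b > INT8_MAX || a + b < INT8_MIN;
      }

      // True if a + b does not fit in an unsigned 8-bit integer (a, b in [0, 255]).
      constexpr bool unsignedAddWraps8(int a, int b) {
        return a + b > UINT8_MAX;
      }

      // Low four bits of the sum; in this sketch these are the only demanded bits.
      constexpr int demandedBitsOfAdd(int a, int b) {
        return (a + b) & 0x0F;
      }
      ```

      With x = 1, `add nuw x, 15` does not wrap but `add x, 255` does; with x = -128,
      `add nsw x, 15` does not wrap but `add x, -1` does. The demanded low bits agree in
      both cases, so only the flags become invalid.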
      
      llvm-svn: 370497
      2d1e0899
    • Matt Arsenault's avatar
      GlobalISel: Fix missing pass dependency · 466ec2d5
      Matt Arsenault authored
      llvm-svn: 370496
      466ec2d5
    • Craig Topper's avatar
      [X86] Pass v32i16/v64i8 in zmm registers on KNL target. · 18e8d02e
      Craig Topper authored
      gcc and icc pass these types in zmm registers.
      
      This patch implements a quick hack to override the register
      type before calling convention handling to one that is legal.
      Longer term we might want to do something similar to 256-bit
      integer registers on AVX1 where we just split all the operations.
      
      Fixes PR42957
      
      Differential Revision: https://reviews.llvm.org/D66708
      
      llvm-svn: 370495
      18e8d02e
    • Craig Topper's avatar
      [ValueTypes] Add v16f16 and v32f16 to EVT::getEVTString and Tablegen's getEnumName · 30ddd2ab
      Craig Topper authored
      Missed these when I added the enum entries.
      
      llvm-svn: 370494
      30ddd2ab
    • Nico Weber's avatar
      gn build: Merge r370490 · 9976a5bc
      Nico Weber authored
      llvm-svn: 370492
      9976a5bc
    • Evgeniy Stepanov's avatar
      MemTag: unchecked load/store optimization. · 04647f5e
      Evgeniy Stepanov authored
      Summary:
      MTE allows memory access to bypass tag check iff the address argument
      is [SP, #imm]. This change takes advantage of this to demote uses of
      tagged addresses to regular FrameIndex operands, reducing register
      pressure in large functions.
      
      MO_TAGGED target flag is used to signal that the FrameIndex operand
      refers to memory that might be tagged, and needs to be handled with
      care. Such operand must be lowered to [SP, #imm] directly, without a
      scratch register.
      
      The transformation pass attempts to predict when the offset will be
      out of range and disable the optimization.
      AArch64RegisterInfo::eliminateFrameIndex has an escape hatch in case
      this prediction has been wrong, but it is quite inefficient and should
      be avoided.
      
      Reviewers: pcc, vitalybuka, ostannard
      
      Subscribers: mgorny, javed.absar, kristof.beyls, hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D66457
      
      llvm-svn: 370490
      04647f5e
    • Simon Pilgrim's avatar
      ab8cb1a3
    • Whitney Tsang's avatar
      [INSTRUCTIONS] Add support of const for getLoadStorePointerOperand() and · b8a35649
      Whitney Tsang authored
      getLoadStorePointerOperand().
      Reviewer: hsaito, sebpop, reames, hfinkel, mkuper, bogner, haicheng,
      arsenm, lattner, chandlerc, grosser, rengolin
      Reviewed By: reames
      Subscribers: wdng, llvm-commits, bmahjour
      Tag: LLVM
      Differential Revision: https://reviews.llvm.org/D66595
      
      llvm-svn: 370486
      b8a35649
    • Johannes Doerfert's avatar
      [Attributor] Fix: do not pretend to preserve the CFG · 659a8707
      Johannes Doerfert authored
      llvm-svn: 370485
      659a8707
    • Craig Topper's avatar
      [X86] Merge X86InstrInfo::loadRegFromAddr/storeRegToAddr into their only call site. · 66f03ba1
      Craig Topper authored
      I'm looking at unfolding broadcast loads on AVX512 which will
      require refactoring this code to select broadcast opcodes instead
      of regular load/stores in some cases. Merging them to avoid
      further complicating their interfaces.
      
      llvm-svn: 370484
      66f03ba1
    • Johannes Doerfert's avatar
      [Attributor] Use existing function information for the call site · 3fac668d
      Johannes Doerfert authored
      Summary:
      Instead of recomputing information for call sites we now use the
      function information directly. This is always valid and once we have
      call site specific information we can improve here.
      
      This patch also bootstraps attributes that are created on-demand through
      an initial update call. Information that is known will then directly be
      available in the new attribute without causing an iteration delay.
      
      The tests show how this improves the iteration count.
      
      Reviewers: sstefan1, uenoku
      
      Subscribers: hiraditya, bollu, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D66781
      
      llvm-svn: 370480
      3fac668d
    • Johannes Doerfert's avatar
      [Attributor] Manifest load/store alignment generally · 81df452d
      Johannes Doerfert authored
      Summary:
      Any pointer could have load/store users, not only floating ones, so we
      move the manifest logic for alignment into the AAAlignImpl class.
      
      Reviewers: uenoku, sstefan1
      
      Subscribers: hiraditya, bollu, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D66922
      
      llvm-svn: 370479
      81df452d
    • Simon Pilgrim's avatar
      [DAGCombine] visitVSELECT - remove duplicate getOperand calls. NFCI. · c2fed1dc
      Simon Pilgrim authored
      llvm-svn: 370478
      c2fed1dc
    • Piotr Sobczak's avatar
      [InstCombine][AMDGPU] Simplify tbuffer loads · 67b97946
      Piotr Sobczak authored
      Summary: Add missing tbuffer load intrinsics in SimplifyDemandedVectorElts.
      
      Reviewers: arsenm, nhaehnle
      
      Reviewed By: arsenm
      
      Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D66926
      
      llvm-svn: 370475
      67b97946
    • Sid Manning's avatar
      [llvm-nm] Small fix to Expected<StringRef> · aa0e8f96
      Sid Manning authored
      Differential Revision: https://reviews.llvm.org/D66976
      
      llvm-svn: 370474
      aa0e8f96
    • George Rimar's avatar
      [yaml2obj][obj2yaml] - Use a single "Other" field instead of "Other", "Visibility" and "StOther". · 4e71702c
      George Rimar authored
      Currently we can encode the 'st_other' field of a symbol using 3 fields.
      'Visibility' is used to encode STV_* values.
      'Other' is used to encode everything except the visibility, but it can't handle arbitrary values.
      'StOther' is used to encode arbitrary values when 'Visibility'/'Other' are not helpful enough.

      The 'st_other' field is used to encode symbol visibility and platform-dependent
      flags and values. The difficulty in encoding it is that it consists of a Visibility part
      (STV_* values), which are enumeration values, and the Other part, which is different and inconsistent.
      
      For MIPS the Other part contains flags for all STO_MIPS_* values except STO_MIPS_MIPS16.
      (Like comment in ELFDumper says: "Someones in their infinite wisdom decided to make
      STO_MIPS_MIPS16 flag overlapped with other ST_MIPS_xxx flags."...)
      
      And for PPC64 the Other part might actually encode any value.
      
      This patch implements custom logic for handling the st_other and removes
      'Visibility' and 'StOther' fields.
      
      Here is an example of a new YAML style this patch allows:
      
      - Name:  foo
        Other: [ 0x4 ]
      - Name:  bar
        Other: [ STV_PROTECTED, 4 ]
      - Name:  zed
        Other: [ STV_PROTECTED, STO_MIPS_OPTIONAL, 0xf8 ]
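
      As a rough sketch of how such a list could collapse into the single byte-sized
      `st_other` value, assume the listed entries are simply OR-ed together. The two
      constants below carry the standard ELF values; `combineOther` and `visibilityOf`
      are hypothetical helpers for illustration, not the yaml2obj implementation.

      ```cpp
      #include <cstdint>
      #include <initializer_list>

      // Standard ELF value for protected visibility (visibility is the low two bits).
      constexpr std::uint8_t STV_PROTECTED = 3;

      // MIPS-specific st_other flag.
      constexpr std::uint8_t STO_MIPS_OPTIONAL = 0x04;

      // Hypothetical helper: OR all listed entries into one st_other byte.
      constexpr std::uint8_t combineOther(std::initializer_list<std::uint8_t> Parts) {
        std::uint8_t Result = 0;
        for (std::uint8_t P : Parts)
          Result = static_cast<std::uint8_t>(Result | P);
        return Result;
      }

      // Hypothetical helper: extract the visibility part (STV_*) of st_other.
      constexpr std::uint8_t visibilityOf(std::uint8_t StOther) {
        return static_cast<std::uint8_t>(StOther & 0x3);
      }
      ```

      Under that assumption the three symbols above would get the st_other values 0x04,
      0x07 and 0xff respectively.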
      
      Differential revision: https://reviews.llvm.org/D66886
      
      llvm-svn: 370472
      4e71702c
    • Simon Pilgrim's avatar
      [DAGCombine] visitVSELECT - use getShiftAmountTy for shift amounts. · 33676696
      Simon Pilgrim authored
      llvm-svn: 370471
      33676696
    • Simon Pilgrim's avatar
      [DAGCombine] visitMULHS - use getScalarValueSizeInBits() to make safe for vector types. · 8e1989e7
      Simon Pilgrim authored
      This is hidden behind a (scalar-only) isOneConstant(N1) check at the moment, but once we get around to adding vector support we need to ensure we're dealing with the scalar bitwidth, not the total.
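
      For context, a small scalar sketch of the arithmetic involved: multiplying by one
      and taking the high half yields only sign bits, which equals an arithmetic shift
      right by the scalar bit width minus one. The names are illustrative, i32 stands in
      for the element type, and right-shifting a negative value is assumed to be an
      arithmetic shift (guaranteed since C++20, true in practice before that).

      ```cpp
      #include <cstdint>

      // High 32 bits of the signed 64-bit product, i.e. a multiply-high on i32.
      constexpr std::int32_t mulhs32(std::int32_t A, std::int32_t B) {
        return static_cast<std::int32_t>(
            (static_cast<std::int64_t>(A) * static_cast<std::int64_t>(B)) >> 32);
      }

      // Arithmetic shift right by (scalar bit width - 1): every bit is the sign bit.
      constexpr std::int32_t signSplat32(std::int32_t X) { return X >> 31; }

      // Scalar width of one i32 lane versus the total width of a 4-lane vector.
      constexpr unsigned ScalarBits = 32;
      constexpr unsigned TotalBitsV4 = 4 * ScalarBits;
      ```

      For a 4 x i32 vector the total width gives a shift amount of 127, which is out of
      range for a 32-bit lane; the scalar width gives the intended 31.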
      
      llvm-svn: 370468
      8e1989e7
    • Simon Atanasyan's avatar
      [mips] Merge common checks under the same check prefix. NFC · 68f73bf2
      Simon Atanasyan authored
      llvm-svn: 370467
      68f73bf2
    • Luis Marques's avatar
      [RISCV] Fix a couple of tests' CHECKs · c2b3d527
      Luis Marques authored
      llvm-svn: 370466
      c2b3d527
    • Haojian Wu's avatar
      Remove an extra ";", NFC. · ed170c9b
      Haojian Wu authored
      llvm-svn: 370465
      ed170c9b
    • Amaury Sechet's avatar
      [X86] Add tests for rotate matching. NFC · 485760f4
      Amaury Sechet authored
      llvm-svn: 370464
      485760f4
    • Bjorn Pettersson's avatar
      [CodeGen] Introduce MachineBasicBlock::replacePhiUsesWith helper and use it. NFC · 22714592
      Bjorn Pettersson authored
      Summary:
      Found a couple of places in the code where all the PHI nodes
      of an MBB are updated, replacing references to one MBB with
      references to another MBB.
      
      This patch simply refactors the code to use a common helper
      (MachineBasicBlock::replacePhiUsesWith) for such PHI node
      updates.
      
      Reviewers: t.p.northover, arsenm, uabelho
      
      Subscribers: wdng, hiraditya, jsji, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D66750
      
      llvm-svn: 370463
      22714592
    • Simon Pilgrim's avatar
      [DAGCombine] visitMULHS/visitMULHU - isBuildVectorAllZeros doesn't mean node is all zeros · 7cbf823f
      Simon Pilgrim authored
      Return a proper zero vector, just in case some elements are undef.
      
      Noticed by inspection after dealing with a similar issue in PR43159.
      
      llvm-svn: 370460
      7cbf823f
    • Simon Pilgrim's avatar
      Fix Wdocumentation warning. NFCI. · 01a3c25c
      Simon Pilgrim authored
      llvm-svn: 370459
      01a3c25c
    • Chris Jackson's avatar
      [llvm-objcopy] Allow the visibility of symbols created by --binary and · fa1fe937
      Chris Jackson authored
      --add-symbol to be specified with --new-symbol-visibility
      
      llvm-svn: 370458
      fa1fe937
    • Hideto Ueno's avatar
      [Attributor] Implement AANoAliasCallSiteArgument initialization · 6381b143
      Hideto Ueno authored
      Summary: This patch adds an appropriate `initialize` method for `AANoAliasCallSiteArgument`.
      
      Reviewers: jdoerfert, sstefan1
      
      Reviewed By: jdoerfert
      
      Subscribers: hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D66927
      
      llvm-svn: 370456
      6381b143
    • Roman Lebedev's avatar
      [LoopIdiomRecognize] BCmp loop idiom recognition · 5c9f3cfe
      Roman Lebedev authored
      Summary:
      @mclow.lists brought this issue up in IRC.
      It is a reasonably common problem to compare two values for equality.
      Those may be just some integers, strings or arrays of integers.

      In C, there are the `memcmp()` and `bcmp()` functions.
      In C++, there is the `std::equal()` algorithm.
      One can also write that function manually.

      libstdc++'s `std::equal()` is specialized to directly call `memcmp()` for
      various types, but not `std::byte` from C++2a. https://godbolt.org/z/mx2ejJ

      libc++ does not do anything like that; it simply relies on plain C++
      `operator==()`. https://godbolt.org/z/er0Zwf (GOOD!)

      So there are likely some performance opportunities.
      Let's compare performance of naive `std::equal()` (no `memcmp()`) with one that
      is using `memcmp()` (in this case, compiled with modified compiler). {F8768213}
      
      ```
      #include <algorithm>
      #include <cassert>
      #include <cmath>
      #include <cstdint>
      #include <iterator>
      #include <limits>
      #include <random>
      #include <type_traits>
      #include <utility>
      #include <vector>
      
      #include "benchmark/benchmark.h"
      
      template <class T>
      bool equal(T* a, T* a_end, T* b) noexcept {
        for (; a != a_end; ++a, ++b) {
          if (*a != *b) return false;
        }
        return true;
      }
      
      template <typename T>
      std::vector<T> getVectorOfRandomNumbers(size_t count) {
        std::random_device rd;
        std::mt19937 gen(rd());
        std::uniform_int_distribution<T> dis(std::numeric_limits<T>::min(),
                                             std::numeric_limits<T>::max());
        std::vector<T> v;
        v.reserve(count);
        std::generate_n(std::back_inserter(v), count,
                        [&dis, &gen]() { return dis(gen); });
        assert(v.size() == count);
        return v;
      }
      
      struct Identical {
        template <typename T>
        static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
          auto Tmp = getVectorOfRandomNumbers<T>(count);
          return std::make_pair(Tmp, std::move(Tmp));
        }
      };
      
      struct InequalHalfway {
        template <typename T>
        static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
          auto V0 = getVectorOfRandomNumbers<T>(count);
          auto V1 = V0;
          V1[V1.size() / size_t(2)]++;  // just change the value.
          return std::make_pair(std::move(V0), std::move(V1));
        }
      };
      
      template <class T, class Gen>
      void BM_bcmp(benchmark::State& state) {
        const size_t Length = state.range(0);
      
        const std::pair<std::vector<T>, std::vector<T>> Data =
            Gen::template Gen<T>(Length);
        const std::vector<T>& a = Data.first;
        const std::vector<T>& b = Data.second;
        assert(a.size() == Length && b.size() == a.size());
      
        benchmark::ClobberMemory();
        benchmark::DoNotOptimize(a);
        benchmark::DoNotOptimize(a.data());
        benchmark::DoNotOptimize(b);
        benchmark::DoNotOptimize(b.data());
      
        for (auto _ : state) {
          const bool is_equal = equal(a.data(), a.data() + a.size(), b.data());
          benchmark::DoNotOptimize(is_equal);
        }
        state.SetComplexityN(Length);
        state.counters["eltcnt"] =
            benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariant);
        state.counters["eltcnt/sec"] =
            benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariantRate);
        const size_t BytesRead = 2 * sizeof(T) * Length;
        state.counters["bytes_read/iteration"] =
            benchmark::Counter(BytesRead, benchmark::Counter::kDefaults,
                               benchmark::Counter::OneK::kIs1024);
        state.counters["bytes_read/sec"] = benchmark::Counter(
            BytesRead, benchmark::Counter::kIsIterationInvariantRate,
            benchmark::Counter::OneK::kIs1024);
      }
      
      template <typename T>
      static void CustomArguments(benchmark::internal::Benchmark* b) {
        const size_t L2SizeBytes = []() {
          for (const benchmark::CPUInfo::CacheInfo& I :
               benchmark::CPUInfo::Get().caches) {
            if (I.level == 2) return I.size;
          }
          return 0;
        }();
        // What is the largest range we can check to always fit within given L2 cache?
        const size_t MaxLen = L2SizeBytes / /*total bufs*/ 2 /
                              /*maximal elt size*/ sizeof(T) / /*safety margin*/ 2;
        b->RangeMultiplier(2)->Range(1, MaxLen)->Complexity(benchmark::oN);
      }
      
      BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, Identical)
          ->Apply(CustomArguments<uint8_t>);
      BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, Identical)
          ->Apply(CustomArguments<uint16_t>);
      BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, Identical)
          ->Apply(CustomArguments<uint32_t>);
      BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, Identical)
          ->Apply(CustomArguments<uint64_t>);
      
      BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, InequalHalfway)
          ->Apply(CustomArguments<uint8_t>);
      BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, InequalHalfway)
          ->Apply(CustomArguments<uint16_t>);
      BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, InequalHalfway)
          ->Apply(CustomArguments<uint32_t>);
      BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, InequalHalfway)
          ->Apply(CustomArguments<uint64_t>);
      ```
      {F8768210}
      ```
      $ ~/src/googlebenchmark/tools/compare.py --no-utest benchmarks build-{old,new}/test/llvm-bcmp-bench
      RUNNING: build-old/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpb6PEUx
      2019-04-25 21:17:11
      Running build-old/test/llvm-bcmp-bench
      Run on (8 X 4000 MHz CPU s)
      CPU Caches:
        L1 Data 16K (x8)
        L1 Instruction 64K (x4)
        L2 Unified 2048K (x4)
        L3 Unified 8192K (x1)
      Load Average: 0.65, 3.90, 4.14
      ---------------------------------------------------------------------------------------------------
      Benchmark                                         Time             CPU   Iterations UserCounters...
      ---------------------------------------------------------------------------------------------------
      <...>
      BM_bcmp<uint8_t, Identical>/512000           432131 ns       432101 ns         1613 bytes_read/iteration=1000k bytes_read/sec=2.20706G/s eltcnt=825.856M eltcnt/sec=1.18491G/s
      BM_bcmp<uint8_t, Identical>_BigO               0.86 N          0.86 N
      BM_bcmp<uint8_t, Identical>_RMS                   8 %             8 %
      <...>
      BM_bcmp<uint16_t, Identical>/256000          161408 ns       161409 ns         4027 bytes_read/iteration=1000k bytes_read/sec=5.90843G/s eltcnt=1030.91M eltcnt/sec=1.58603G/s
      BM_bcmp<uint16_t, Identical>_BigO              0.67 N          0.67 N
      BM_bcmp<uint16_t, Identical>_RMS                 25 %            25 %
      <...>
      BM_bcmp<uint32_t, Identical>/128000           81497 ns        81488 ns         8415 bytes_read/iteration=1000k bytes_read/sec=11.7032G/s eltcnt=1077.12M eltcnt/sec=1.57078G/s
      BM_bcmp<uint32_t, Identical>_BigO              0.71 N          0.71 N
      BM_bcmp<uint32_t, Identical>_RMS                 42 %            42 %
      <...>
      BM_bcmp<uint64_t, Identical>/64000            50138 ns        50138 ns        10909 bytes_read/iteration=1000k bytes_read/sec=19.0209G/s eltcnt=698.176M eltcnt/sec=1.27647G/s
      BM_bcmp<uint64_t, Identical>_BigO              0.84 N          0.84 N
      BM_bcmp<uint64_t, Identical>_RMS                 27 %            27 %
      <...>
      BM_bcmp<uint8_t, InequalHalfway>/512000      192405 ns       192392 ns         3638 bytes_read/iteration=1000k bytes_read/sec=4.95694G/s eltcnt=1.86266G eltcnt/sec=2.66124G/s
      BM_bcmp<uint8_t, InequalHalfway>_BigO          0.38 N          0.38 N
      BM_bcmp<uint8_t, InequalHalfway>_RMS              3 %             3 %
      <...>
      BM_bcmp<uint16_t, InequalHalfway>/256000     127858 ns       127860 ns         5477 bytes_read/iteration=1000k bytes_read/sec=7.45873G/s eltcnt=1.40211G eltcnt/sec=2.00219G/s
      BM_bcmp<uint16_t, InequalHalfway>_BigO         0.50 N          0.50 N
      BM_bcmp<uint16_t, InequalHalfway>_RMS             0 %             0 %
      <...>
      BM_bcmp<uint32_t, InequalHalfway>/128000      49140 ns        49140 ns        14281 bytes_read/iteration=1000k bytes_read/sec=19.4072G/s eltcnt=1.82797G eltcnt/sec=2.60478G/s
      BM_bcmp<uint32_t, InequalHalfway>_BigO         0.40 N          0.40 N
      BM_bcmp<uint32_t, InequalHalfway>_RMS            18 %            18 %
      <...>
      BM_bcmp<uint64_t, InequalHalfway>/64000       32101 ns        32099 ns        21786 bytes_read/iteration=1000k bytes_read/sec=29.7101G/s eltcnt=1.3943G eltcnt/sec=1.99381G/s
      BM_bcmp<uint64_t, InequalHalfway>_BigO         0.50 N          0.50 N
      BM_bcmp<uint64_t, InequalHalfway>_RMS             1 %             1 %
      RUNNING: build-new/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpQ46PP0
      2019-04-25 21:19:29
      Running build-new/test/llvm-bcmp-bench
      Run on (8 X 4000 MHz CPU s)
      CPU Caches:
        L1 Data 16K (x8)
        L1 Instruction 64K (x4)
        L2 Unified 2048K (x4)
        L3 Unified 8192K (x1)
      Load Average: 1.01, 2.85, 3.71
      ---------------------------------------------------------------------------------------------------
      Benchmark                                         Time             CPU   Iterations UserCounters...
      ---------------------------------------------------------------------------------------------------
      <...>
      BM_bcmp<uint8_t, Identical>/512000            18593 ns        18590 ns        37565 bytes_read/iteration=1000k bytes_read/sec=51.2991G/s eltcnt=19.2333G eltcnt/sec=27.541G/s
      BM_bcmp<uint8_t, Identical>_BigO               0.04 N          0.04 N
      BM_bcmp<uint8_t, Identical>_RMS                  37 %            37 %
      <...>
      BM_bcmp<uint16_t, Identical>/256000           18950 ns        18948 ns        37223 bytes_read/iteration=1000k bytes_read/sec=50.3324G/s eltcnt=9.52909G eltcnt/sec=13.511G/s
      BM_bcmp<uint16_t, Identical>_BigO              0.08 N          0.08 N
      BM_bcmp<uint16_t, Identical>_RMS                 34 %            34 %
      <...>
      BM_bcmp<uint32_t, Identical>/128000           18627 ns        18627 ns        37895 bytes_read/iteration=1000k bytes_read/sec=51.198G/s eltcnt=4.85056G eltcnt/sec=6.87168G/s
      BM_bcmp<uint32_t, Identical>_BigO              0.16 N          0.16 N
      BM_bcmp<uint32_t, Identical>_RMS                 35 %            35 %
      <...>
      BM_bcmp<uint64_t, Identical>/64000            18855 ns        18855 ns        37458 bytes_read/iteration=1000k bytes_read/sec=50.5791G/s eltcnt=2.39731G eltcnt/sec=3.3943G/s
      BM_bcmp<uint64_t, Identical>_BigO              0.32 N          0.32 N
      BM_bcmp<uint64_t, Identical>_RMS                 33 %            33 %
      <...>
      BM_bcmp<uint8_t, InequalHalfway>/512000        9570 ns         9569 ns        73500 bytes_read/iteration=1000k bytes_read/sec=99.6601G/s eltcnt=37.632G eltcnt/sec=53.5046G/s
      BM_bcmp<uint8_t, InequalHalfway>_BigO          0.02 N          0.02 N
      BM_bcmp<uint8_t, InequalHalfway>_RMS             29 %            29 %
      <...>
      BM_bcmp<uint16_t, InequalHalfway>/256000       9547 ns         9547 ns        74343 bytes_read/iteration=1000k bytes_read/sec=99.8971G/s eltcnt=19.0318G eltcnt/sec=26.8159G/s
      BM_bcmp<uint16_t, InequalHalfway>_BigO         0.04 N          0.04 N
      BM_bcmp<uint16_t, InequalHalfway>_RMS            29 %            29 %
      <...>
      BM_bcmp<uint32_t, InequalHalfway>/128000       9396 ns         9394 ns        73521 bytes_read/iteration=1000k bytes_read/sec=101.518G/s eltcnt=9.41069G eltcnt/sec=13.6255G/s
      BM_bcmp<uint32_t, InequalHalfway>_BigO         0.08 N          0.08 N
      BM_bcmp<uint32_t, InequalHalfway>_RMS            30 %            30 %
      <...>
      BM_bcmp<uint64_t, InequalHalfway>/64000        9499 ns         9498 ns        73802 bytes_read/iteration=1000k bytes_read/sec=100.405G/s eltcnt=4.72333G eltcnt/sec=6.73808G/s
      BM_bcmp<uint64_t, InequalHalfway>_BigO         0.16 N          0.16 N
      BM_bcmp<uint64_t, InequalHalfway>_RMS            28 %            28 %
      Comparing build-old/test/llvm-bcmp-bench to build-new/test/llvm-bcmp-bench
      Benchmark                                                  Time             CPU      Time Old      Time New       CPU Old       CPU New
      ---------------------------------------------------------------------------------------------------------------------------------------
      <...>
      BM_bcmp<uint8_t, Identical>/512000                      -0.9570         -0.9570        432131         18593        432101         18590
      <...>
      BM_bcmp<uint16_t, Identical>/256000                     -0.8826         -0.8826        161408         18950        161409         18948
      <...>
      BM_bcmp<uint32_t, Identical>/128000                     -0.7714         -0.7714         81497         18627         81488         18627
      <...>
      BM_bcmp<uint64_t, Identical>/64000                      -0.6239         -0.6239         50138         18855         50138         18855
      <...>
      BM_bcmp<uint8_t, InequalHalfway>/512000                 -0.9503         -0.9503        192405          9570        192392          9569
      <...>
      BM_bcmp<uint16_t, InequalHalfway>/256000                -0.9253         -0.9253        127858          9547        127860          9547
      <...>
      BM_bcmp<uint32_t, InequalHalfway>/128000                -0.8088         -0.8088         49140          9396         49140          9394
      <...>
      BM_bcmp<uint64_t, InequalHalfway>/64000                 -0.7041         -0.7041         32101          9499         32099          9498
      ```
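
      The Time and CPU columns of the comparison are relative changes. Assuming they are
      computed as (new - old) / old, which is consistent with the numbers shown, a small
      sketch reproduces the first row. The helper names are made up for illustration.

      ```cpp
      // Hypothetical helper: relative change of New versus Old.
      constexpr double relativeChange(double Old, double New) {
        return (New - Old) / Old;
      }

      // Hypothetical helper: how many times shorter the New time is than the Old time.
      constexpr double speedup(double Old, double New) {
        return Old / New;
      }
      ```

      For `BM_bcmp<uint8_t, Identical>/512000`, `relativeChange(432131, 18593)` is about
      -0.957, matching the table, and `speedup(432131, 18593)` is a little over 23.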
      
      What can we tell from the benchmark?
      * Performance of naive equality check somewhat improves with element size,
        maxing out at eltcnt/sec=1.58603G/s for uint16_t, or bytes_read/sec=19.0209G/s
        for uint64_t. I think that instability implies performance problems.
      * Performance of `memcmp()`-aware benchmark always maxes out at around
        bytes_read/sec=51.2991G/s for every type. That is 2.6x the throughput of the
        naive variant!
      * eltcnt/sec metric for the `memcmp()`-aware benchmark maxes out at
        eltcnt/sec=27.541G/s for uint8_t (was: eltcnt/sec=1.18491G/s, so 24x) and
        linearly decreases with element size.
        For uint64_t, it's ~4x+ the elements/second.
      * The call is obviously more expensive than the loop for small element counts.
        As can be seen from the full output {F8768210}, `memcmp()` is almost
        universally worse, independent of the element size (and thus buffer size), when
        the element count is less than 8.
      
      So all in all, the bcmp idiom does indeed offer untapped performance headroom.
      This diff implements said idiom recognition. I think reasonable test
      coverage is present, but do tell if anything obvious is missing.
      
      Now, quality. This builds and passes the test-suite, at least
      without any non-bundled elements. {F8768216} {F8768217}
      This transform fires 91 times:
      ```
      $ /build/test-suite/utils/compare.py -m loop-idiom.NumBCmp result-new.json
      Tests: 1149
      Metric: loop-idiom.NumBCmp
      
      Program                                         result-new
      
      MultiSourc...Benchmarks/7zip/7zip-benchmark    79.00
      MultiSource/Applications/d/make_dparser         3.00
      SingleSource/UnitTests/vla                      2.00
      MultiSource/Applications/Burg/burg              1.00
      MultiSourc.../Applications/JM/lencod/lencod     1.00
      MultiSource/Applications/lemon/lemon            1.00
      MultiSource/Benchmarks/Bullet/bullet            1.00
      MultiSourc...e/Benchmarks/MallocBench/gs/gs     1.00
      MultiSourc...gs-C/TimberWolfMC/timberwolfmc     1.00
      MultiSourc...Prolangs-C/simulator/simulator     1.00
      ```
      The size changes are:
      I'm not sure what's going on with SingleSource/UnitTests/vla.test yet; I have not looked.
      ```
      $ /build/test-suite/utils/compare.py -m size..text result-{old,new}.json --filter-hash
      Tests: 1149
      Same hash: 907 (filtered out)
      Remaining: 242
      Metric: size..text
      
      Program                                        result-old result-new diff
      test-suite...ingleSource/UnitTests/vla.test   753.00     833.00     10.6%
      test-suite...marks/7zip/7zip-benchmark.test   1001697.00 966657.00  -3.5%
      test-suite...ngs-C/simulator/simulator.test   32369.00   32321.00   -0.1%
      test-suite...plications/d/make_dparser.test   89585.00   89505.00   -0.1%
      test-suite...ce/Applications/Burg/burg.test   40817.00   40785.00   -0.1%
      test-suite.../Applications/lemon/lemon.test   47281.00   47249.00   -0.1%
      test-suite...TimberWolfMC/timberwolfmc.test   250065.00  250113.00   0.0%
      test-suite...chmarks/MallocBench/gs/gs.test   149889.00  149873.00  -0.0%
      test-suite...ications/JM/lencod/lencod.test   769585.00  769569.00  -0.0%
      test-suite.../Benchmarks/Bullet/bullet.test   770049.00  770049.00   0.0%
      test-suite...HMARK_ANISTROPIC_DIFFUSION/128    NaN        NaN        nan%
      test-suite...HMARK_ANISTROPIC_DIFFUSION/256    NaN        NaN        nan%
      test-suite...CHMARK_ANISTROPIC_DIFFUSION/64    NaN        NaN        nan%
      test-suite...CHMARK_ANISTROPIC_DIFFUSION/32    NaN        NaN        nan%
      test-suite...ENCHMARK_BILATERAL_FILTER/64/4    NaN        NaN        nan%
      Geomean difference                                                   nan%
               result-old    result-new       diff
      count  1.000000e+01  10.00000      10.000000
      mean   3.152090e+05  311695.40000  0.006749
      std    3.790398e+05  372091.42232  0.036605
      min    7.530000e+02  833.00000    -0.034981
      25%    4.243300e+04  42401.00000  -0.000866
      50%    1.197370e+05  119689.00000 -0.000392
      75%    6.397050e+05  639705.00000 -0.000005
      max    1.001697e+06  966657.00000  0.106242
      ```
      
      I don't have timings though.
      
      And now to the code. The basic idea is to completely replace the whole loop.
      If we can't fully kill it, don't transform.
      I have left one or two comments in the code, so hopefully it can be understood.
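
      In source terms, the loop shape being targeted and its replacement look roughly like
      the following sketch. The function names are illustrative and this is not the pass's
      code: the pass works on LLVM IR and has to prove legality, which the sketch glosses
      over by assuming both buffers are readable for all `N` bytes. `std::memcmp() == 0`
      stands in for `bcmp()`, since only equality is needed.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstring>

      // The naive byte-wise equality loop with an early exit on the first mismatch.
      inline bool equalLoop(const unsigned char *A, const unsigned char *B,
                            std::size_t N) {
        for (std::size_t I = 0; I != N; ++I)
          if (A[I] != B[I])
            return false;
        return true;
      }

      // The same result expressed as a single library call; the loop is gone.
      inline bool equalLibcall(const unsigned char *A, const unsigned char *B,
                               std::size_t N) {
        assert((N == 0 || (A && B)) && "buffers must be valid for N bytes");
        return N == 0 || std::memcmp(A, B, N) == 0;
      }
      ```

      Both functions return the same answer for any two readable buffers of length `N`;
      the second leaves no loop behind, which is the "fully kill it" condition above.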
      
      Also, there are a few TODOs that I have left for follow-ups:
      * widening of `memcmp()`/`bcmp()`
      * step smaller than the comparison size
      * Metadata propagation
      * more than two blocks as long as there is still a single backedge?
      * ???
      
      Reviewers: reames, fhahn, mkazantsev, chandlerc, craig.topper, courbet
      
      Reviewed By: courbet
      
      Subscribers: hiraditya, xbolva00, nikic, jfb, gchatelet, courbet, llvm-commits, mclow.lists
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D61144
      
      llvm-svn: 370454
      5c9f3cfe
    • Roman Lebedev's avatar
      [NFC] SCEVExpander: add SetCurrentDebugLocation() / getCurrentDebugLocation() wrappers · 09e4ac1a
      Roman Lebedev authored
      Summary:
      The internal `Builder` is private, which means there is
      currently no way to set the debuginfo locations for `SCEVExpander`.
      This only adds the wrappers, but does not use them anywhere.
      
      Reviewers: mkazantsev, sanjoy, gberry, jyknight, dneilson
      
      Reviewed By: sanjoy
      
      Subscribers: javed.absar, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D61007
      
      llvm-svn: 370453
      09e4ac1a
    • David Stenberg's avatar
      [LiveDebugValues] Insert entry values after bundles · b35d4699
      David Stenberg authored
      Summary:
      Change LiveDebugValues so that it inserts entry values after the bundle
      which contains the clobbering instruction. Previously it would insert
      the debug value after the bundle head using insertAfter(), breaking the
      bundle.
      
      Reviewers: djtodoro, NikolaPrica, aprantl, vsk
      
      Reviewed By: vsk
      
      Subscribers: hiraditya, llvm-commits
      
      Tags: #debug-info, #llvm
      
      Differential Revision: https://reviews.llvm.org/D66888
      
      llvm-svn: 370448
      b35d4699
    • Sven van Haastregt's avatar
      vim: add `immarg` keyword · fd66c8bf
      Sven van Haastregt authored
      The `immarg` attribute was added in r355981.
      
      llvm-svn: 370443
      fd66c8bf
    • Nico Weber's avatar
      gn build: Merge r370441 · 629f9215
      Nico Weber authored
      llvm-svn: 370442
      629f9215
    • Dmitri Gribenko's avatar
      [ADT] Removed VariadicFunction · 4fc0d3bd
      Dmitri Gribenko authored
      Summary:
      It is not used. It uses macro-based unrolling instead of variadic
      templates, so it is not idiomatic anymore, and therefore it is a
      questionable API to keep "just in case".
      
      Subscribers: mgorny, dmgreen, dexonsmith, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D66961
      
      llvm-svn: 370441
      4fc0d3bd
    • Martin Storsjö's avatar
      [LLD] [COFF] Support merging resource object files · 3d3a9b3b
      Martin Storsjö authored
      Extend WindowsResourceParser to support using a ResourceSectionRef for
      loading resources from an object file.
      
      Only allow merging resource object files in mingw mode; keep the
      existing error on multiple resource objects in link mode.
      
      If there is only one resource object file and no .res resources,
      don't parse and recreate the .rsrc section, but just link it in without
      inspecting it. This allows users to produce any .rsrc section (outside
      of what the parser supports), just like before. (I don't have a specific
      need for this, but it reduces the risk of this new feature.)
      
      Separate out the .rsrc section chunks in InputFiles.cpp, and only include
      them in the list of section chunks to link if we've determined that there
      was only a single resource object. (We need to keep other chunks from
      those object files, as they can legitimately contain other sections as
      well, in addition to .rsrc section chunks.)
      
      Differential Revision: https://reviews.llvm.org/D66824
      
      llvm-svn: 370436
      3d3a9b3b
    • Martin Storsjö's avatar
      [WindowsResource] Remove use of global variables in WindowsResourceParser · d8d63ff2
      Martin Storsjö authored
      Instead of updating a global variable counter for the next index of
      strings and data blobs, pass along a reference to actual data/string
      vectors and let the TreeNode insertion methods add their data/strings to
      the vectors when a new entry is needed.
      
      Additionally, if the resource tree had duplicates that were ignored
      with -force:multipleres in lld, we no longer store all versions of the
      duplicated resource data; we now keep only the one that actually ends
      up referenced.
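
      A minimal sketch of the pattern described above, assuming a simplified
      tree keyed by integer IDs. `ResourceTree`, `addData` and `indexOf` are
      hypothetical names, not the real WindowsResourceParser API, and keeping
      the first duplicate is an assumption made for illustration. The point
      is that the index comes from the size of a vector passed in by
      reference, and that data is appended only when a new entry is created.

      ```cpp
      #include <cstddef>
      #include <map>
      #include <string>
      #include <vector>

      // Hypothetical, simplified stand-in for a resource tree level.
      class ResourceTree {
      public:
        // Returns true if a new entry was created. The caller owns Data; no
        // global counter is involved, the next index is simply Data.size().
        bool addData(int Id, const std::string &Blob,
                     std::vector<std::string> &Data) {
          if (IdToIndex.count(Id))
            return false; // Duplicate: nothing is appended to Data.
          IdToIndex.emplace(Id, Data.size());
          Data.push_back(Blob);
          return true;
        }

        // Index into the caller's data vector for an existing entry.
        std::size_t indexOf(int Id) const { return IdToIndex.at(Id); }

      private:
        std::map<int, std::size_t> IdToIndex;
      };
      ```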
      
      Differential Revision: https://reviews.llvm.org/D66823
      
      llvm-svn: 370435
      d8d63ff2
    • Martin Storsjö's avatar
      e62d5682
    • Martin Storsjö's avatar
      [COFF] Add a ResourceSectionRef method for getting resource contents · 94382217
      Martin Storsjö authored
      This allows llvm-readobj to print the contents of each resource
      when printing resources from an object file or executable, like it
      already does for plain .res files.
      
      This requires providing the whole COFFObjectFile to ResourceSectionRef.
      
      This supports both object files and executables. For executables,
      the DataRVA field is used as is to look up the right section.
      
      For object files, we would ideally need to finish linking them
      and fix up all relocations to know what the DataRVA field would end up
      being. In practice, the only thing that makes sense for an RVA field
      is an ADDR32NB relocation. Thus, find a relocation pointing at this
      field, verify that it has the expected type, locate the symbol it
      points at, look up the section the symbol points at, and read from the
      right offset in that section.
      
      This works both for GNU windres object files (which use one single
      .rsrc section, with all relocations against the base of the .rsrc
      section, with the original value of the DataRVA field being the
      offset of the data from the beginning of the .rsrc section) and
      cvtres object files (with two separate .rsrc$01 and .rsrc$02 sections,
      and one symbol per data entry, with the original pre-relocated DataRVA
      field being set to zero).
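
      The object file lookup described above can be sketched roughly as
      follows. All types and names here (`Relocation`, `Symbol`, `Section`,
      `readResourceData`, `RelAddr32NB`) are hypothetical simplifications,
      not the COFFObjectFile or ResourceSectionRef API, and the relocation
      type constant is a placeholder since the real value depends on the
      machine type. Adding the symbol value to the pre-relocation field value
      is an assumption that covers both layouts mentioned: a section-base
      symbol plus an offset in the field, or a per-entry symbol with a zero
      field.

      ```cpp
      #include <cstddef>
      #include <cstdint>
      #include <vector>

      // Hypothetical, simplified stand-ins for COFF structures.
      struct Relocation {
        std::uint32_t Offset;      // Where in .rsrc the relocation applies.
        std::uint16_t Type;        // Relocation type.
        std::uint32_t SymbolIndex; // Index into the symbol table.
      };
      struct Symbol {
        std::size_t SectionIndex; // Section the symbol is defined in.
        std::uint32_t Value;      // Offset of the symbol in that section.
      };
      struct Section {
        std::vector<std::uint8_t> Contents;
      };

      // Placeholder for the ADDR32NB relocation type of the target machine.
      constexpr std::uint16_t RelAddr32NB = 3;

      // Finds the relocation at FieldOffset, checks its type, follows the
      // symbol to its section and copies Size bytes into Out.
      bool readResourceData(const std::vector<Relocation> &Relocs,
                            const std::vector<Symbol> &Symbols,
                            const std::vector<Section> &Sections,
                            std::uint32_t FieldOffset,
                            std::uint32_t FieldValue, std::uint32_t Size,
                            std::vector<std::uint8_t> &Out) {
        for (const Relocation &R : Relocs) {
          if (R.Offset != FieldOffset)
            continue;
          if (R.Type != RelAddr32NB || R.SymbolIndex >= Symbols.size())
            return false; // Unexpected relocation type or bad symbol.
          const Symbol &Sym = Symbols[R.SymbolIndex];
          if (Sym.SectionIndex >= Sections.size())
            return false;
          const std::vector<std::uint8_t> &C =
              Sections[Sym.SectionIndex].Contents;
          std::uint64_t Start = std::uint64_t(Sym.Value) + FieldValue;
          if (Start + Size > C.size())
            return false; // Data would run past the end of the section.
          auto First = C.begin() + static_cast<std::ptrdiff_t>(Start);
          Out.assign(First, First + static_cast<std::ptrdiff_t>(Size));
          return true;
        }
        return false; // No relocation points at this field.
      }
      ```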
      
      Differential Revision: https://reviews.llvm.org/D66820
      
      llvm-svn: 370433
      94382217
    • Petar Avramovic's avatar
      [MIPS GlobalISel] Lower uitofp · e96892a8
      Petar Avramovic authored
      Add custom lowering for G_UITOFP for MIPS32.
      
      Differential Revision: https://reviews.llvm.org/D66930
      
      llvm-svn: 370432
      e96892a8