  1. Aug 30, 2019
    • Craig Topper's avatar
      [X86] Pass v32i16/v64i8 in zmm registers on KNL target. · 18e8d02e
      Craig Topper authored
      gcc and icc pass these types in zmm registers.
      
      This patch implements a quick hack to override the register
      type before calling convention handling to one that is legal.
      Longer term we might want to do something similar to 256-bit
      integer registers on AVX1 where we just split all the operations.
      
      Fixes PR42957
      
      Differential Revision: https://reviews.llvm.org/D66708
      
      llvm-svn: 370495
      18e8d02e
    • Evgeniy Stepanov's avatar
      MemTag: unchecked load/store optimization. · 04647f5e
      Evgeniy Stepanov authored
      Summary:
      MTE allows memory access to bypass tag check iff the address argument
      is [SP, #imm]. This change takes advantage of this to demote uses of
      tagged addresses to regular FrameIndex operands, reducing register
      pressure in large functions.
      
      MO_TAGGED target flag is used to signal that the FrameIndex operand
      refers to memory that might be tagged, and needs to be handled with
      care. Such operand must be lowered to [SP, #imm] directly, without a
      scratch register.
      
      The transformation pass attempts to predict when the offset will be
      out of range and disable the optimization.
      AArch64RegisterInfo::eliminateFrameIndex has an escape hatch in case
      this prediction has been wrong, but it is quite inefficient and should
      be avoided.
      
      Reviewers: pcc, vitalybuka, ostannard
      
      Subscribers: mgorny, javed.absar, kristof.beyls, hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D66457
      
      llvm-svn: 370490
      04647f5e
    • Johannes Doerfert's avatar
      [Attributor] Use existing function information for the call site · 3fac668d
      Johannes Doerfert authored
      Summary:
      Instead of recomputing information for call sites we now use the
      function information directly. This is always valid and once we have
      call site specific information we can improve here.
      
      This patch also bootstraps attributes that are created on-demand through
      an initial update call. Information that is known will then directly be
      available in the new attribute without causing an iteration delay.
      
      The tests show how this improves the iteration count.
      
      Reviewers: sstefan1, uenoku
      
      Subscribers: hiraditya, bollu, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D66781
      
      llvm-svn: 370480
      3fac668d
    • Johannes Doerfert's avatar
      [Attributor] Manifest load/store alignment generally · 81df452d
      Johannes Doerfert authored
      Summary:
      Any pointer could have load/store users, not only floating ones, so we
      move the manifest logic for alignment into the AAAlignImpl class.
      
      Reviewers: uenoku, sstefan1
      
      Subscribers: hiraditya, bollu, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D66922
      
      llvm-svn: 370479
      81df452d
    • Piotr Sobczak's avatar
      [InstCombine][AMDGPU] Simplify tbuffer loads · 67b97946
      Piotr Sobczak authored
      Summary: Add missing tbuffer loads intrinsics in SimplifyDemandedVectorElts.
      
      Reviewers: arsenm, nhaehnle
      
      Reviewed By: arsenm
      
      Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D66926
      
      llvm-svn: 370475
      67b97946
    • George Rimar's avatar
      [yaml2obj][obj2yaml] - Use a single "Other" field instead of "Other", "Visibility" and "StOther". · 4e71702c
      George Rimar authored
      Currently we can encode the 'st_other' field of a symbol using 3 fields.
      'Visibility' is used to encode STV_* values.
      'Other' is used to encode everything except the visibility, but it can't handle arbitrary values.
      'StOther' is used to encode arbitrary values when 'Visibility'/'Other' are not helpful enough.
      
      The 'st_other' field encodes symbol visibility and platform-dependent
      flags and values. It is hard to encode because it consists of a Visibility part (STV_* values),
      which is an enumeration, and the Other part, which differs between platforms and is inconsistent.
      
      For MIPS the Other part contains flags for all STO_MIPS_* values except STO_MIPS_MIPS16.
      (As the comment in ELFDumper says: "Someones in their infinite wisdom decided to make
      STO_MIPS_MIPS16 flag overlapped with other ST_MIPS_xxx flags."...)
      
      And for PPC64 the Other part might actually encode any value.
      
      This patch implements custom logic for handling the st_other and removes
      'Visibility' and 'StOther' fields.
      
      Here is an example of a new YAML style this patch allows:
      
      - Name:  foo
        Other: [ 0x4 ]
      - Name:  bar
        Other: [ STV_PROTECTED, 4 ]
      - Name:  zed
        Other: [ STV_PROTECTED, STO_MIPS_OPTIONAL, 0xf8 ]
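
      As a rough illustration of what a flat `Other` list could encode, the
      sketch below folds the listed values into the single `st_other` byte and
      reads the visibility back from its low two bits. The helpers
      `packStOther` and `visibilityOf` are hypothetical, and the assumption that
      list entries are simply ORed together is an assumption, not something
      taken from the patch. The constant values follow the ELF specification
      (STV_PROTECTED) and the MIPS ABI (STO_MIPS_OPTIONAL).

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <initializer_list>

      // Visibility value from the ELF specification (low two bits of st_other).
      constexpr std::uint8_t STV_PROTECTED = 3;
      // MIPS-specific flag stored in the remaining bits of st_other.
      constexpr std::uint8_t STO_MIPS_OPTIONAL = 0x04;

      // Hypothetical helper: fold the entries of an "Other: [ ... ]" list into
      // one st_other byte. Assumes the entries are combined with bitwise OR.
      inline std::uint8_t packStOther(std::initializer_list<std::uint8_t> parts) {
        std::uint8_t value = 0;
        for (std::uint8_t part : parts)
          value = static_cast<std::uint8_t>(value | part);
        return value;
      }

      // Hypothetical helper: the visibility occupies the low two bits.
      inline std::uint8_t visibilityOf(std::uint8_t stOther) {
        return static_cast<std::uint8_t>(stOther & 0x3);
      }
      ```

      Under that assumption, the `zed` entry above would pack to 0xff, with
      `visibilityOf(0xff)` giving STV_PROTECTED.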
      
      Differential revision: https://reviews.llvm.org/D66886
      
      llvm-svn: 370472
      4e71702c
    • Simon Atanasyan's avatar
      [mips] Merge common checkings under the same check prefix. NFC · 68f73bf2
      Simon Atanasyan authored
      llvm-svn: 370467
      68f73bf2
    • Luis Marques's avatar
      [RISCV] Fix a couple of tests' CHECKs · c2b3d527
      Luis Marques authored
      llvm-svn: 370466
      c2b3d527
    • Amaury Sechet's avatar
      [X86] Add tests for rotate matching. NFC · 485760f4
      Amaury Sechet authored
      llvm-svn: 370464
      485760f4
    • Chris Jackson's avatar
      [llvm-objcopy] Allow the visibility of symbols created by --binary and · fa1fe937
      Chris Jackson authored
      --add-symbol to be specified with --new-symbol-visibility
      
      llvm-svn: 370458
      fa1fe937
    • Hideto Ueno's avatar
      [Attributor] Implement AANoAliasCallSiteArgument initialization · 6381b143
      Hideto Ueno authored
      Summary: This patch adds an appropriate `initialize` method for `AANoAliasCallSiteArgument`.
      
      Reviewers: jdoerfert, sstefan1
      
      Reviewed By: jdoerfert
      
      Subscribers: hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D66927
      
      llvm-svn: 370456
      6381b143
    • Roman Lebedev's avatar
      [LoopIdiomRecognize] BCmp loop idiom recognition · 5c9f3cfe
      Roman Lebedev authored
      Summary:
      @mclow.lists brought this issue up in IRC.
      Comparing two values for equality is a reasonably common problem.
      Those may be just some integers, strings or arrays of integers.
      
      In C, there are the `memcmp()` and `bcmp()` functions.
      In C++, there is the `std::equal()` algorithm.
      One can also write that function manually.
      
      libstdc++'s `std::equal()` is specialized to directly call `memcmp()` for
      various types, but not `std::byte` from C++2a. https://godbolt.org/z/mx2ejJ
      
      libc++ does not do anything like that; it simply relies on plain C++
      `operator==()`. https://godbolt.org/z/er0Zwf (GOOD!)
      
      So there is likely a performance opportunity here.
      Let's compare the performance of naive `std::equal()` (no `memcmp()`) with one that
      uses `memcmp()` (in this case, compiled with a modified compiler). {F8768213}
      
      ```
      #include <algorithm>
      #include <cassert>
      #include <cmath>
      #include <cstdint>
      #include <iterator>
      #include <limits>
      #include <random>
      #include <type_traits>
      #include <utility>
      #include <vector>
      
      #include "benchmark/benchmark.h"
      
      template <class T>
      bool equal(T* a, T* a_end, T* b) noexcept {
        for (; a != a_end; ++a, ++b) {
          if (*a != *b) return false;
        }
        return true;
      }
      
      template <typename T>
      std::vector<T> getVectorOfRandomNumbers(size_t count) {
        std::random_device rd;
        std::mt19937 gen(rd());
        std::uniform_int_distribution<T> dis(std::numeric_limits<T>::min(),
                                             std::numeric_limits<T>::max());
        std::vector<T> v;
        v.reserve(count);
        std::generate_n(std::back_inserter(v), count,
                        [&dis, &gen]() { return dis(gen); });
        assert(v.size() == count);
        return v;
      }
      
      struct Identical {
        template <typename T>
        static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
          auto Tmp = getVectorOfRandomNumbers<T>(count);
          return std::make_pair(Tmp, std::move(Tmp));
        }
      };
      
      struct InequalHalfway {
        template <typename T>
        static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
          auto V0 = getVectorOfRandomNumbers<T>(count);
          auto V1 = V0;
          V1[V1.size() / size_t(2)]++;  // just change the value.
          return std::make_pair(std::move(V0), std::move(V1));
        }
      };
      
      template <class T, class Gen>
      void BM_bcmp(benchmark::State& state) {
        const size_t Length = state.range(0);
      
        const std::pair<std::vector<T>, std::vector<T>> Data =
            Gen::template Gen<T>(Length);
        const std::vector<T>& a = Data.first;
        const std::vector<T>& b = Data.second;
        assert(a.size() == Length && b.size() == a.size());
      
        benchmark::ClobberMemory();
        benchmark::DoNotOptimize(a);
        benchmark::DoNotOptimize(a.data());
        benchmark::DoNotOptimize(b);
        benchmark::DoNotOptimize(b.data());
      
        for (auto _ : state) {
          const bool is_equal = equal(a.data(), a.data() + a.size(), b.data());
          benchmark::DoNotOptimize(is_equal);
        }
        state.SetComplexityN(Length);
        state.counters["eltcnt"] =
            benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariant);
        state.counters["eltcnt/sec"] =
            benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariantRate);
        const size_t BytesRead = 2 * sizeof(T) * Length;
        state.counters["bytes_read/iteration"] =
            benchmark::Counter(BytesRead, benchmark::Counter::kDefaults,
                               benchmark::Counter::OneK::kIs1024);
        state.counters["bytes_read/sec"] = benchmark::Counter(
            BytesRead, benchmark::Counter::kIsIterationInvariantRate,
            benchmark::Counter::OneK::kIs1024);
      }
      
      template <typename T>
      static void CustomArguments(benchmark::internal::Benchmark* b) {
        const size_t L2SizeBytes = []() {
          for (const benchmark::CPUInfo::CacheInfo& I :
               benchmark::CPUInfo::Get().caches) {
            if (I.level == 2) return I.size;
          }
          return 0;
        }();
        // What is the largest range we can check to always fit within given L2 cache?
        const size_t MaxLen = L2SizeBytes / /*total bufs*/ 2 /
                              /*maximal elt size*/ sizeof(T) / /*safety margin*/ 2;
        b->RangeMultiplier(2)->Range(1, MaxLen)->Complexity(benchmark::oN);
      }
      
      BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, Identical)
          ->Apply(CustomArguments<uint8_t>);
      BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, Identical)
          ->Apply(CustomArguments<uint16_t>);
      BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, Identical)
          ->Apply(CustomArguments<uint32_t>);
      BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, Identical)
          ->Apply(CustomArguments<uint64_t>);
      
      BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, InequalHalfway)
          ->Apply(CustomArguments<uint8_t>);
      BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, InequalHalfway)
          ->Apply(CustomArguments<uint16_t>);
      BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, InequalHalfway)
          ->Apply(CustomArguments<uint32_t>);
      BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, InequalHalfway)
          ->Apply(CustomArguments<uint64_t>);
      ```
      {F8768210}
      ```
      $ ~/src/googlebenchmark/tools/compare.py --no-utest benchmarks build-{old,new}/test/llvm-bcmp-bench
      RUNNING: build-old/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpb6PEUx
      2019-04-25 21:17:11
      Running build-old/test/llvm-bcmp-bench
      Run on (8 X 4000 MHz CPU s)
      CPU Caches:
        L1 Data 16K (x8)
        L1 Instruction 64K (x4)
        L2 Unified 2048K (x4)
        L3 Unified 8192K (x1)
      Load Average: 0.65, 3.90, 4.14
      ---------------------------------------------------------------------------------------------------
      Benchmark                                         Time             CPU   Iterations UserCounters...
      ---------------------------------------------------------------------------------------------------
      <...>
      BM_bcmp<uint8_t, Identical>/512000           432131 ns       432101 ns         1613 bytes_read/iteration=1000k bytes_read/sec=2.20706G/s eltcnt=825.856M eltcnt/sec=1.18491G/s
      BM_bcmp<uint8_t, Identical>_BigO               0.86 N          0.86 N
      BM_bcmp<uint8_t, Identical>_RMS                   8 %             8 %
      <...>
      BM_bcmp<uint16_t, Identical>/256000          161408 ns       161409 ns         4027 bytes_read/iteration=1000k bytes_read/sec=5.90843G/s eltcnt=1030.91M eltcnt/sec=1.58603G/s
      BM_bcmp<uint16_t, Identical>_BigO              0.67 N          0.67 N
      BM_bcmp<uint16_t, Identical>_RMS                 25 %            25 %
      <...>
      BM_bcmp<uint32_t, Identical>/128000           81497 ns        81488 ns         8415 bytes_read/iteration=1000k bytes_read/sec=11.7032G/s eltcnt=1077.12M eltcnt/sec=1.57078G/s
      BM_bcmp<uint32_t, Identical>_BigO              0.71 N          0.71 N
      BM_bcmp<uint32_t, Identical>_RMS                 42 %            42 %
      <...>
      BM_bcmp<uint64_t, Identical>/64000            50138 ns        50138 ns        10909 bytes_read/iteration=1000k bytes_read/sec=19.0209G/s eltcnt=698.176M eltcnt/sec=1.27647G/s
      BM_bcmp<uint64_t, Identical>_BigO              0.84 N          0.84 N
      BM_bcmp<uint64_t, Identical>_RMS                 27 %            27 %
      <...>
      BM_bcmp<uint8_t, InequalHalfway>/512000      192405 ns       192392 ns         3638 bytes_read/iteration=1000k bytes_read/sec=4.95694G/s eltcnt=1.86266G eltcnt/sec=2.66124G/s
      BM_bcmp<uint8_t, InequalHalfway>_BigO          0.38 N          0.38 N
      BM_bcmp<uint8_t, InequalHalfway>_RMS              3 %             3 %
      <...>
      BM_bcmp<uint16_t, InequalHalfway>/256000     127858 ns       127860 ns         5477 bytes_read/iteration=1000k bytes_read/sec=7.45873G/s eltcnt=1.40211G eltcnt/sec=2.00219G/s
      BM_bcmp<uint16_t, InequalHalfway>_BigO         0.50 N          0.50 N
      BM_bcmp<uint16_t, InequalHalfway>_RMS             0 %             0 %
      <...>
      BM_bcmp<uint32_t, InequalHalfway>/128000      49140 ns        49140 ns        14281 bytes_read/iteration=1000k bytes_read/sec=19.4072G/s eltcnt=1.82797G eltcnt/sec=2.60478G/s
      BM_bcmp<uint32_t, InequalHalfway>_BigO         0.40 N          0.40 N
      BM_bcmp<uint32_t, InequalHalfway>_RMS            18 %            18 %
      <...>
      BM_bcmp<uint64_t, InequalHalfway>/64000       32101 ns        32099 ns        21786 bytes_read/iteration=1000k bytes_read/sec=29.7101G/s eltcnt=1.3943G eltcnt/sec=1.99381G/s
      BM_bcmp<uint64_t, InequalHalfway>_BigO         0.50 N          0.50 N
      BM_bcmp<uint64_t, InequalHalfway>_RMS             1 %             1 %
      RUNNING: build-new/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpQ46PP0
      2019-04-25 21:19:29
      Running build-new/test/llvm-bcmp-bench
      Run on (8 X 4000 MHz CPU s)
      CPU Caches:
        L1 Data 16K (x8)
        L1 Instruction 64K (x4)
        L2 Unified 2048K (x4)
        L3 Unified 8192K (x1)
      Load Average: 1.01, 2.85, 3.71
      ---------------------------------------------------------------------------------------------------
      Benchmark                                         Time             CPU   Iterations UserCounters...
      ---------------------------------------------------------------------------------------------------
      <...>
      BM_bcmp<uint8_t, Identical>/512000            18593 ns        18590 ns        37565 bytes_read/iteration=1000k bytes_read/sec=51.2991G/s eltcnt=19.2333G eltcnt/sec=27.541G/s
      BM_bcmp<uint8_t, Identical>_BigO               0.04 N          0.04 N
      BM_bcmp<uint8_t, Identical>_RMS                  37 %            37 %
      <...>
      BM_bcmp<uint16_t, Identical>/256000           18950 ns        18948 ns        37223 bytes_read/iteration=1000k bytes_read/sec=50.3324G/s eltcnt=9.52909G eltcnt/sec=13.511G/s
      BM_bcmp<uint16_t, Identical>_BigO              0.08 N          0.08 N
      BM_bcmp<uint16_t, Identical>_RMS                 34 %            34 %
      <...>
      BM_bcmp<uint32_t, Identical>/128000           18627 ns        18627 ns        37895 bytes_read/iteration=1000k bytes_read/sec=51.198G/s eltcnt=4.85056G eltcnt/sec=6.87168G/s
      BM_bcmp<uint32_t, Identical>_BigO              0.16 N          0.16 N
      BM_bcmp<uint32_t, Identical>_RMS                 35 %            35 %
      <...>
      BM_bcmp<uint64_t, Identical>/64000            18855 ns        18855 ns        37458 bytes_read/iteration=1000k bytes_read/sec=50.5791G/s eltcnt=2.39731G eltcnt/sec=3.3943G/s
      BM_bcmp<uint64_t, Identical>_BigO              0.32 N          0.32 N
      BM_bcmp<uint64_t, Identical>_RMS                 33 %            33 %
      <...>
      BM_bcmp<uint8_t, InequalHalfway>/512000        9570 ns         9569 ns        73500 bytes_read/iteration=1000k bytes_read/sec=99.6601G/s eltcnt=37.632G eltcnt/sec=53.5046G/s
      BM_bcmp<uint8_t, InequalHalfway>_BigO          0.02 N          0.02 N
      BM_bcmp<uint8_t, InequalHalfway>_RMS             29 %            29 %
      <...>
      BM_bcmp<uint16_t, InequalHalfway>/256000       9547 ns         9547 ns        74343 bytes_read/iteration=1000k bytes_read/sec=99.8971G/s eltcnt=19.0318G eltcnt/sec=26.8159G/s
      BM_bcmp<uint16_t, InequalHalfway>_BigO         0.04 N          0.04 N
      BM_bcmp<uint16_t, InequalHalfway>_RMS            29 %            29 %
      <...>
      BM_bcmp<uint32_t, InequalHalfway>/128000       9396 ns         9394 ns        73521 bytes_read/iteration=1000k bytes_read/sec=101.518G/s eltcnt=9.41069G eltcnt/sec=13.6255G/s
      BM_bcmp<uint32_t, InequalHalfway>_BigO         0.08 N          0.08 N
      BM_bcmp<uint32_t, InequalHalfway>_RMS            30 %            30 %
      <...>
      BM_bcmp<uint64_t, InequalHalfway>/64000        9499 ns         9498 ns        73802 bytes_read/iteration=1000k bytes_read/sec=100.405G/s eltcnt=4.72333G eltcnt/sec=6.73808G/s
      BM_bcmp<uint64_t, InequalHalfway>_BigO         0.16 N          0.16 N
      BM_bcmp<uint64_t, InequalHalfway>_RMS            28 %            28 %
      Comparing build-old/test/llvm-bcmp-bench to build-new/test/llvm-bcmp-bench
      Benchmark                                                  Time             CPU      Time Old      Time New       CPU Old       CPU New
      ---------------------------------------------------------------------------------------------------------------------------------------
      <...>
      BM_bcmp<uint8_t, Identical>/512000                      -0.9570         -0.9570        432131         18593        432101         18590
      <...>
      BM_bcmp<uint16_t, Identical>/256000                     -0.8826         -0.8826        161408         18950        161409         18948
      <...>
      BM_bcmp<uint32_t, Identical>/128000                     -0.7714         -0.7714         81497         18627         81488         18627
      <...>
      BM_bcmp<uint64_t, Identical>/64000                      -0.6239         -0.6239         50138         18855         50138         18855
      <...>
      BM_bcmp<uint8_t, InequalHalfway>/512000                 -0.9503         -0.9503        192405          9570        192392          9569
      <...>
      BM_bcmp<uint16_t, InequalHalfway>/256000                -0.9253         -0.9253        127858          9547        127860          9547
      <...>
      BM_bcmp<uint32_t, InequalHalfway>/128000                -0.8088         -0.8088         49140          9396         49140          9394
      <...>
      BM_bcmp<uint64_t, InequalHalfway>/64000                 -0.7041         -0.7041         32101          9499         32099          9498
      ```
      
      What can we tell from the benchmark?
      * Performance of the naive equality check improves somewhat with element size,
        maxing out at eltcnt/sec=1.58603G/s for uint16_t, or bytes_read/sec=19.0209G/s
        for uint64_t. I think that instability implies performance problems.
      * Performance of `memcmp()`-aware benchmark always maxes out at around
        bytes_read/sec=51.2991G/s for every type. That is 2.6x the throughput of the
        naive variant!
      * eltcnt/sec metric for the `memcmp()`-aware benchmark maxes out at
        eltcnt/sec=27.541G/s for uint8_t (was: eltcnt/sec=1.18491G/s, so 24x) and
        linearly decreases with element size.
        For uint64_t, it's ~4x+ the elements/second.
      * The call is obviously more expensive than the loop for small element counts.
        As can be seen from the full output {F8768210}, `memcmp()` is almost
        universally worse, independent of the element size (and thus buffer size), when
        the element count is less than 8.
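
      For reading the comparison table, here is a minimal sketch of the
      arithmetic behind the Time and CPU columns, assuming compare.py reports
      the relative change `(new - old) / |old|`. The function names are
      hypothetical; the sample numbers in the note below are the uint8_t
      Identical row from the table.

      ```cpp
      #include <cassert>
      #include <cmath>

      // Relative change between two timings: negative means the new build is
      // faster. Assumed to match what the comparison tool prints.
      inline double relativeChange(double oldTime, double newTime) {
        assert(oldTime != 0.0);
        return (newTime - oldTime) / std::fabs(oldTime);
      }

      // The same comparison expressed as a speedup factor.
      inline double speedup(double oldTime, double newTime) {
        assert(newTime != 0.0);
        return oldTime / newTime;
      }
      ```

      With old = 432131 ns and new = 18593 ns, `relativeChange` comes out at
      about -0.957, matching the -0.9570 in the table, and `speedup` is a bit
      over 23.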
      
      So all in all, the bcmp idiom does indeed offer untapped performance headroom.
      This diff implements said idiom recognition. I think reasonable test
      coverage is present, but do tell if anything obvious is missing.
      
      Now, quality. This builds and passes the test-suite, at least
      without any non-bundled elements. {F8768216} {F8768217}
      This transform fires 91 times:
      ```
      $ /build/test-suite/utils/compare.py -m loop-idiom.NumBCmp result-new.json
      Tests: 1149
      Metric: loop-idiom.NumBCmp
      
      Program                                         result-new
      
      MultiSourc...Benchmarks/7zip/7zip-benchmark    79.00
      MultiSource/Applications/d/make_dparser         3.00
      SingleSource/UnitTests/vla                      2.00
      MultiSource/Applications/Burg/burg              1.00
      MultiSourc.../Applications/JM/lencod/lencod     1.00
      MultiSource/Applications/lemon/lemon            1.00
      MultiSource/Benchmarks/Bullet/bullet            1.00
      MultiSourc...e/Benchmarks/MallocBench/gs/gs     1.00
      MultiSourc...gs-C/TimberWolfMC/timberwolfmc     1.00
      MultiSourc...Prolangs-C/simulator/simulator     1.00
      ```
      The size changes are shown below.
      I have not yet looked into what is going on with SingleSource/UnitTests/vla.test.
      ```
      $ /build/test-suite/utils/compare.py -m size..text result-{old,new}.json --filter-hash
      Tests: 1149
      Same hash: 907 (filtered out)
      Remaining: 242
      Metric: size..text
      
      Program                                        result-old result-new diff
      test-suite...ingleSource/UnitTests/vla.test   753.00     833.00     10.6%
      test-suite...marks/7zip/7zip-benchmark.test   1001697.00 966657.00  -3.5%
      test-suite...ngs-C/simulator/simulator.test   32369.00   32321.00   -0.1%
      test-suite...plications/d/make_dparser.test   89585.00   89505.00   -0.1%
      test-suite...ce/Applications/Burg/burg.test   40817.00   40785.00   -0.1%
      test-suite.../Applications/lemon/lemon.test   47281.00   47249.00   -0.1%
      test-suite...TimberWolfMC/timberwolfmc.test   250065.00  250113.00   0.0%
      test-suite...chmarks/MallocBench/gs/gs.test   149889.00  149873.00  -0.0%
      test-suite...ications/JM/lencod/lencod.test   769585.00  769569.00  -0.0%
      test-suite.../Benchmarks/Bullet/bullet.test   770049.00  770049.00   0.0%
      test-suite...HMARK_ANISTROPIC_DIFFUSION/128    NaN        NaN        nan%
      test-suite...HMARK_ANISTROPIC_DIFFUSION/256    NaN        NaN        nan%
      test-suite...CHMARK_ANISTROPIC_DIFFUSION/64    NaN        NaN        nan%
      test-suite...CHMARK_ANISTROPIC_DIFFUSION/32    NaN        NaN        nan%
      test-suite...ENCHMARK_BILATERAL_FILTER/64/4    NaN        NaN        nan%
      Geomean difference                                                   nan%
               result-old    result-new       diff
      count  1.000000e+01  10.00000      10.000000
      mean   3.152090e+05  311695.40000  0.006749
      std    3.790398e+05  372091.42232  0.036605
      min    7.530000e+02  833.00000    -0.034981
      25%    4.243300e+04  42401.00000  -0.000866
      50%    1.197370e+05  119689.00000 -0.000392
      75%    6.397050e+05  639705.00000 -0.000005
      max    1.001697e+06  966657.00000  0.106242
      ```
      
      I don't have timings though.
      
      And now to the code. The basic idea is to completely replace the whole loop.
      If we can't fully kill it, don't transform.
      I have left one or two comments in the code, so hopefully it can be understood.
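
      To make the idea concrete, here is a hand-written sketch of the kind of
      loop being recognized next to the single library call it is equivalent
      to. This is an illustration at the source level, not the pass itself;
      the names `equalLoop` and `equalBytes` are hypothetical, and
      `std::memcmp` is used as a portable stand-in for `bcmp()`. The rewrite
      is only valid when equality of `T` is bytewise, which is why the sketch
      restricts itself to integral types.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <cstring>
      #include <type_traits>

      // Element-wise comparison with early exit, as in the benchmark above.
      template <class T>
      bool equalLoop(const T* a, const T* aEnd, const T* b) noexcept {
        for (; a != aEnd; ++a, ++b) {
          if (*a != *b) return false;
        }
        return true;
      }

      // The whole loop replaced by one call over the same byte range.
      template <class T>
      bool equalBytes(const T* a, const T* aEnd, const T* b) noexcept {
        static_assert(std::is_integral<T>::value,
                      "bytewise equality is only assumed for integral types");
        const std::size_t bytes = static_cast<std::size_t>(aEnd - a) * sizeof(T);
        return std::memcmp(a, b, bytes) == 0;
      }
      ```

      Both functions return the same result for any pair of equally sized
      integer buffers; only the equal/not-equal answer is needed, so the
      ordering information `memcmp()` also provides is ignored.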
      
      Also, there are a few TODOs that I have left for follow-ups:
      * widening of `memcmp()`/`bcmp()`
      * step smaller than the comparison size
      * Metadata propagation
      * more than two blocks as long as there is still a single backedge?
      * ???
      
      Reviewers: reames, fhahn, mkazantsev, chandlerc, craig.topper, courbet
      
      Reviewed By: courbet
      
      Subscribers: hiraditya, xbolva00, nikic, jfb, gchatelet, courbet, llvm-commits, mclow.lists
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D61144
      
      llvm-svn: 370454
      5c9f3cfe
    • David Stenberg's avatar
      [LiveDebugValues] Insert entry values after bundles · b35d4699
      David Stenberg authored
      Summary:
      Change LiveDebugValues so that it inserts entry values after the bundle
      which contains the clobbering instruction. Previously it would insert
      the debug value after the bundle head using insertAfter(), breaking the
      bundle.
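
      A simplified model of the insertion-point choice, using a hypothetical
      `Instr` struct rather than LLVM's MachineInstr API: starting from the
      clobbering instruction, skip forward over every instruction that is
      still part of the same bundle, and insert only once the bundle has
      ended.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <vector>

      // Hypothetical stand-in for a machine instruction in a basic block.
      struct Instr {
        // True if this instruction belongs to the same bundle as the
        // instruction immediately before it.
        bool bundledWithPred;
      };

      // Returns the index at which a new instruction must be inserted so that
      // it lands after the whole bundle containing block[clobber].
      inline std::size_t insertPointAfterBundle(const std::vector<Instr>& block,
                                                std::size_t clobber) {
        assert(clobber < block.size());
        std::size_t next = clobber + 1;
        while (next < block.size() && block[next].bundledWithPred) ++next;
        return next;
      }
      ```

      Inserting at `clobber + 1` unconditionally corresponds to the old
      behaviour described above, which would split a bundle whenever the
      clobbering instruction is not its last member.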
      
      Reviewers: djtodoro, NikolaPrica, aprantl, vsk
      
      Reviewed By: vsk
      
      Subscribers: hiraditya, llvm-commits
      
      Tags: #debug-info, #llvm
      
      Differential Revision: https://reviews.llvm.org/D66888
      
      llvm-svn: 370448
      b35d4699
    • Martin Storsjö's avatar
      [COFF] Add a ResourceSectionRef method for getting resource contents · 94382217
      Martin Storsjö authored
      This allows llvm-readobj to print the contents of each resource
      when printing resources from an object file or executable, like it
      already does for plain .res files.
      
      This requires providing the whole COFFObjectFile to ResourceSectionRef.
      
      This supports both object files and executables. For executables,
      the DataRVA field is used as is to look up the right section.
      
      For object files, ideally we would need to complete linking of them
      and fix up all relocations to know what the DataRVA field would end up
      being. In practice, the only thing that makes sense for an RVA field
      is an ADDR32NB relocation. Thus, find a relocation pointing at this
      field, verify that it has the expected type, locate the symbol it
      points at, look up the section the symbol points at, and read from the
      right offset in that section.
      
      This works both for GNU windres object files (which use one single
      .rsrc section, with all relocations against the base of the .rsrc
      section, with the original value of the DataRVA field being the
      offset of the data from the beginning of the .rsrc section) and
      cvtres object files (with two separate .rsrc$01 and .rsrc$02 sections,
      and one symbol per data entry, with the original pre-relocated DataRVA
      field being set to zero).
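
      One way to see why a single lookup covers both layouts is the sketch
      below. It assumes the data offset is the target symbol's offset within
      its section plus the original content of the DataRVA field; the struct
      and function names are hypothetical and do not come from the patch.

      ```cpp
      #include <cassert>
      #include <cstdint>

      // Hypothetical summary of what was found for one DataRVA field.
      struct ResolvedDataRef {
        std::uint32_t symbolOffsetInSection;  // where the relocation's symbol points
        std::uint32_t originalFieldValue;     // pre-relocation content of DataRVA
      };

      // Assumed rule: read the resource data at symbol offset + field value,
      // within the section the symbol belongs to.
      inline std::uint32_t dataOffsetInSection(const ResolvedDataRef& ref) {
        return ref.symbolOffsetInSection + ref.originalFieldValue;
      }
      ```

      In the windres style the symbol sits at the section base, so the first
      term is zero and the field value carries the offset. In the cvtres
      style the field value is zero and the per-entry symbol carries the
      offset.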
      
      Differential Revision: https://reviews.llvm.org/D66820
      
      llvm-svn: 370433
      94382217
    • Petar Avramovic's avatar
      [MIPS GlobalISel] Lower uitofp · e96892a8
      Petar Avramovic authored
      Add custom lowering for G_UITOFP for MIPS32.
      
      Differential Revision: https://reviews.llvm.org/D66930
      
      llvm-svn: 370432
      e96892a8
    • Petar Avramovic's avatar
      [MIPS GlobalISel] Lower fptoui · 6412b565
      Petar Avramovic authored
      Add lowering for G_FPTOUI. The algorithm is similar to the SDAG version
      in TargetLowering::expandFP_TO_UINT.
      Lower G_FPTOUI for MIPS32.
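
      The general shape of such a lowering can be sketched in plain C++: an
      unsigned conversion built from a signed one by handling inputs at or
      above 2^31 separately. This is a sketch of the common technique as I
      understand it, not the GlobalISel code from this patch; the function
      name is hypothetical, and inputs outside [0, 2^32) are assumed not to
      occur.

      ```cpp
      #include <cassert>
      #include <cstdint>

      // float -> uint32_t using only a float -> int32_t conversion.
      inline std::uint32_t fpToUint32ViaSigned(float x) {
        assert(x >= 0.0f && x < 4294967296.0f);
        const float twoPow31 = 2147483648.0f;
        if (x < twoPow31) {
          // Small values fit in the signed range and convert directly.
          return static_cast<std::uint32_t>(static_cast<std::int32_t>(x));
        }
        // Large values: subtract 2^31 so the signed conversion is in range,
        // then restore the top bit with an XOR.
        const std::int32_t low = static_cast<std::int32_t>(x - twoPow31);
        return static_cast<std::uint32_t>(low) ^ 0x80000000u;
      }
      ```

      The subtraction is exact for every float in [2^31, 2^32), so the large
      branch loses no precision beyond what the float already lacks.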
      
      Differential Revision: https://reviews.llvm.org/D66929
      
      llvm-svn: 370431
      6412b565
    • Dan Gohman's avatar
      [CodeGen] Fix lowering for returning the result of an extractvalue · 8cfeeaf9
      Dan Gohman authored
      When the number of return values exceeds the number of registers available,
      SelectionDAGBuilder::visitRet transforms a function's return to use a
      pointer to a buffer to hold return values. When the returned value is an
      operator such as extractvalue, the value may have a non-zero result number.
      Add that number to the indexing when obtaining the values to store.
      
      This fixes https://bugs.llvm.org/show_bug.cgi?id=43132.
      
      Differential Revision: https://reviews.llvm.org/D66978
      
      llvm-svn: 370430
      8cfeeaf9
    • Jinsong Ji's avatar
      [PowerPC][NFC] Use -mtriple in RUN line, remove target triple in tls.ll · 54a1ad5b
      Jinsong Ji authored
      To avoid confusion, especially when -mtriple is also added for PPC32.
      
      llvm-svn: 370427
      54a1ad5b
    • Fangrui Song's avatar
      [PPC32] Emit R_PPC_GOT_TPREL16 instead R_PPC_GOT_TPREL16_LO · 7704b543
      Fangrui Song authored
      Unlike ppc64, which has ADDISgotTprelHA+LDgotTprelL pairs,
      ppc32 just uses LDgotTprelL32, so it does not make much sense to use
      _LO without a paired _HA.
      
      Emit R_PPC_GOT_TPREL16 instead of R_PPC_GOT_TPREL16_LO to match GCC and
      get better linker relocation checks. Note that R_PPC_GOT_TPREL16_{HA,LO}
      don't have good linker support:
      
      (a) lld does not support R_PPC_GOT_TPREL16_{HA,LO}.
      (b) Top of tree ld.bfd does not support R_PPC_GOT_REL16_HA Initial-Exec -> Local-Exec relaxation:
      
        // a.o
        addis 3, 3, tsd_tls@got@tprel@ha
        lwz 3, tsd_tls@got@tprel@l(3)
        add 3, 3, tsd_tls@tls
        // b.o
        .section .tdata,"awT"; .globl tsd_tls; tsd_tls:
      
        // ld/ld-new a.o b.o
        internal error, aborting at ../../bfd/elf32-ppc.c:7952 in ppc_elf_relocate_section
      
      Reviewed By: adalava
      
      Differential Revision: https://reviews.llvm.org/D66925
      
      llvm-svn: 370426
      7704b543
    • Dan Gohman's avatar
      [WebAssembly] Make __attribute__((used)) not imply export. · da84b688
      Dan Gohman authored
      Add a WASM_SYMBOL_NO_STRIP flag, so that __attribute__((used)) doesn't
      need to imply exporting. When targeting Emscripten, have
      WASM_SYMBOL_NO_STRIP imply exporting.
      
      Differential Revision: https://reviews.llvm.org/D62542
      
      llvm-svn: 370415
      da84b688
    • Philip Reames's avatar
      [Tests] Precommit a few cases where we're missing opportunities for block... · 452e5647
      Philip Reames authored
      [Tests] Precommit a few cases where we're missing opportunities for block local simplifications off assumes.
      
      llvm-svn: 370414
      452e5647
  2. Aug 29, 2019