  1. Dec 17, 2020
    • dfukalov's avatar
      [NFC] Reduce include files dependency and AA header cleanup (part 2). · 9ed8e0ca
      dfukalov authored
      Continuing work started in https://reviews.llvm.org/D92489:
      
      Removed a bunch of includes from "AliasAnalysis.h" and "LoopPassManager.h".
      
      Reviewed By: RKSimon
      
      Differential Revision: https://reviews.llvm.org/D92852
      9ed8e0ca
    • Simon Pilgrim's avatar
      [X86] Remove extract_subvector(subv_broadcast_load()) fold. · 931e66bd
      Simon Pilgrim authored
      This was needed in an earlier version of D92645, but isn't now - and I've just noticed that it was potentially flawed depending on the relevant widths of the broadcasted and extracted subvectors.
      931e66bd
    • Krasimir Georgiev's avatar
      e71a4cc2
    • Barry Revzin's avatar
      Make LLVM build in C++20 mode · 92310454
      Barry Revzin authored
      Part of the <=> changes in C++20 make certain patterns of writing equality
      operators ambiguous with themselves (sorry!).
      This patch goes through and adjusts all the comparison operators such that
      they should work in both C++17 and C++20 modes. It also makes two other small
      C++20-specific changes (adding a constructor to a type that ceases to be an
      aggregate, and adding casts from u8 literals which no longer have type
      const char*).
      
      There were four categories of errors that this review fixes.
      Here are canonical examples of them, ordered from most to least common:
      
      // 1) Missing const
      namespace missing_const {
          struct A {
          #ifndef FIXED
              bool operator==(A const&);
          #else
              bool operator==(A const&) const;
          #endif
          };
      
          bool a = A{} == A{}; // error
      }
      
      // 2) Type mismatch on CRTP
      namespace crtp_mismatch {
          template <typename Derived>
          struct Base {
          #ifndef FIXED
              bool operator==(Derived const&) const;
          #else
              // in one case changed to taking Base const&
              friend bool operator==(Derived const&, Derived const&);
          #endif
          };
      
          struct D : Base<D> { };
      
          bool b = D{} == D{}; // error
      }
      
      // 3) iterator/const_iterator with only mixed comparison
      namespace iter_const_iter {
          template <bool Const>
          struct iterator {
              using const_iterator = iterator<true>;
      
              iterator();
      
              template <bool B, std::enable_if_t<(Const && !B), int> = 0>
              iterator(iterator<B> const&);
      
          #ifndef FIXED
              bool operator==(const_iterator const&) const;
          #else
              friend bool operator==(iterator const&, iterator const&);
          #endif
          };
      
          bool c = iterator<false>{} == iterator<false>{} // error
                || iterator<false>{} == iterator<true>{}
                || iterator<true>{} == iterator<false>{}
                || iterator<true>{} == iterator<true>{};
      }
      
      // 4) Same-type comparison but only have mixed-type operator
      namespace ambiguous_choice {
          enum Color { Red };
      
          struct C {
              C();
              C(Color);
              operator Color() const;
              bool operator==(Color) const;
              friend bool operator==(C, C);
          };
      
          bool c = C{} == C{}; // error
          bool d = C{} == Red;
      }
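      
      The four examples above cover the comparison operators but not the u8
      literal change. A minimal sketch of that kind of cast follows, assuming a
      hypothetical helper `utf8Literal` that is not taken from the patch. In
      C++20 a `u8"..."` literal has type `const char8_t[N]`, so code that wants
      a `const char *` needs an explicit cast, and the same cast is still valid
      in C++17.
      
      ```cpp
      #include <cstring>
      
      // Hypothetical helper for illustration; not a function from the patch.
      // C++17: u8"..." is const char[N]. C++20: it is const char8_t[N].
      // The explicit cast compiles in both modes.
      inline const char *utf8Literal() {
        return reinterpret_cast<const char *>(u8"caf\u00e9");
      }
      
      // Byte length: 'c', 'a', 'f', plus the two-byte UTF-8 encoding of U+00E9.
      inline std::size_t utf8LiteralLength() { return std::strlen(utf8Literal()); }
      ```
      
      Reading the bytes through `const char *` is fine in either mode, since
      `char` may alias any object representation.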
      
      Differential revision: https://reviews.llvm.org/D78938
      92310454
    • Simon Pilgrim's avatar
      [X86] Add X86ISD::SUBV_BROADCAST_LOAD and begin removing X86ISD::SUBV_BROADCAST (PR38969) · cdb692ee
      Simon Pilgrim authored
      Subvector broadcasts are only load instructions, yet X86ISD::SUBV_BROADCAST treats them more generally, requiring a lot of fallback tablegen patterns.
      
      This initial patch replaces constant vector lowering inside lowerBuildVectorAsBroadcast with direct X86ISD::SUBV_BROADCAST_LOAD loads which helps us merge a number of equivalent loads/broadcasts.
      
      As well as general plumbing/analysis additions for SUBV_BROADCAST_LOAD, I needed to wrap SelectionDAG::makeEquivalentMemoryOrdering so it can handle result chains from non generic LoadSDNode nodes.
      
      Later patches will continue to replace X86ISD::SUBV_BROADCAST usage.
      
      Differential Revision: https://reviews.llvm.org/D92645
      cdb692ee
    • Florian Hahn's avatar
      [InstCombine] Preserve !annotation for newly created instructions. · eba09a2d
      Florian Hahn authored
      When replacing an instruction with !annotation with a newly created
      replacement, add the !annotation metadata to the replacement.
      
      This mostly covers cases where the new instructions are created using
      the ::Create helpers. Instructions created by IRBuilder will be handled
      by D91444.
      
      Reviewed By: thegameg
      
      Differential Revision: https://reviews.llvm.org/D93399
      eba09a2d
    • QingShan Zhang's avatar
      Expand the fp_to_int/int_to_fp/fp_round/fp_extend as libcall for fp128 · ebdd20f4
      QingShan Zhang authored
      X86 and AArch64 expand these operations to libcalls inside the target, and
      PowerPC also wants to expand them to libcalls for P8. This patch proposes an
      implementation in the legalizer to share the logic, and removes the
      X86/AArch64 code to avoid duplication.
      
      Reviewed By: Craig Topper
      
      Differential Revision: https://reviews.llvm.org/D91331
      ebdd20f4
    • Fangrui Song's avatar
      Use basic_string::find(char) instead of basic_string::find(const char *s, size_type pos=0) · c70f3686
      Fangrui Song authored
      Many of these call sites (those on StringRef) cannot be detected by the clang-tidy check performance-faster-string-find.
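      
      A minimal sketch of the two overloads on `std::string`, with hypothetical
      helper names chosen for illustration. Both return the same position for
      a one-character pattern; the `char` overload simply avoids treating the
      pattern as a NUL-terminated string.
      
      ```cpp
      #include <string>
      
      // Single-character overload: basic_string::find(char).
      inline std::size_t findDot(const std::string &S) { return S.find('.'); }
      
      // C-string overload: basic_string::find(const char *s, size_type pos = 0).
      inline std::size_t findDotViaCString(const std::string &S) {
        return S.find(".");
      }
      ```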
      c70f3686
    • Wei Mi's avatar
      [NFC][SampleFDO] Preparation to support multiple sections with the same type · a906e3ec
      Wei Mi authored
      in ExtBinary format.
      
      Currently the ExtBinary format doesn't support multiple sections with the same
      type in the profile. We add that support in this patch. Previously we used the
      section type to identify a section uniquely. Now we introduce a LayoutIndex in
      the SecHdrTableEntry and use the LayoutIndex to locate the target section. The
      allocations of NameTable and FuncOffsetTable are adjusted accordingly.
      
      Currently it works as an NFC because it won't change anything for the current layout.
      The test for multiple sections support will be included in another patch where a
      new type of profile containing multiple sections with the same type is
      introduced.
      
      Differential Revision: https://reviews.llvm.org/D93254
      a906e3ec
    • Xiang1 Zhang's avatar
      [Debugify] Support checking Machine IR debug info · 39584ae5
      Xiang1 Zhang authored
      Add mir-check-debug pass to check MIR-level debug info.
      
      For IR-level, LLVM currently has debugify + check-debugify to generate
      and check debug IR. Much like the IR-level pass debugify, mir-debugify
      inserts sequentially increasing line locations to each MachineInstr in a
      Module, but there is no equivalent MIR-level check-debugify pass, so this
      patch adds one as "mir-check-debug".
      
      Reviewed By: djtodoro
      
      Differential Revision: https://reviews.llvm.org/D91595
      39584ae5
    • Kazu Hirata's avatar
      [GCN] Remove unused function handleNewInstruction (NFC) · 4ad5b634
      Kazu Hirata authored
      The function was added without a user on Dec 22, 2016 in commit
      7e274e02. It seems to have been unused since then.
      4ad5b634
    • Kazu Hirata's avatar
      [IR, CodeGen] Use llvm::is_contained (NFC) · 5501b929
      Kazu Hirata authored
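      
      The commit carries no description. As a rough illustration of the kind of
      cleanup the title suggests, the sketch below uses a stand-in template,
      `isContainedSketch`, so that it compiles without LLVM headers. It is an
      assumption for illustration only; LLVM code would call
      `llvm::is_contained` from `llvm/ADT/STLExtras.h` instead.
      
      ```cpp
      #include <algorithm>
      #include <iterator>
      #include <vector>
      
      // Stand-in for illustration only, mirroring the range-based membership
      // test that llvm::is_contained provides.
      template <typename Range, typename Element>
      bool isContainedSketch(const Range &R, const Element &E) {
        return std::find(std::begin(R), std::end(R), E) != std::end(R);
      }
      
      // The longer iterator-pair spelling that such a cleanup replaces.
      inline bool hasValueManual(const std::vector<int> &V, int X) {
        return std::find(V.begin(), V.end(), X) != V.end();
      }
      ```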
      5501b929
    • Xiang1 Zhang's avatar
      Revert "[Debugify] Support checking Machine IR debug info" · 1e42ad9d
      Xiang1 Zhang authored
      This reverts commit 50aaa8c2.
      1e42ad9d
    • Hsiangkai Wang's avatar
      [RISCV] Define vector widening mul intrinsics. · a5e4a513
      Hsiangkai Wang authored
      
      
      Define vector widening mul intrinsics and lower them to V instructions.
      
      We worked with @rogfer01 from BSC to come up with this patch.
      
      Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
      Co-Authored-by: Hsiangkai Wang <kai.wang@sifive.com>
      
      Differential Revision: https://reviews.llvm.org/D93381
      a5e4a513
    • Hsiangkai Wang's avatar
      [RISCV] Define vector mul/div/rem intrinsics. · dd5281e7
      Hsiangkai Wang authored
      
      
      Define vector mul/div/rem intrinsics and lower them to V instructions.
      
      We worked with @rogfer01 from BSC to come up with this patch.
      
      Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
      Co-Authored-by: Hsiangkai Wang <kai.wang@sifive.com>
      
      Differential Revision: https://reviews.llvm.org/D93380
      dd5281e7
    • Hsiangkai Wang's avatar
      [RISCV] V does not imply F. · f03609b5
      Hsiangkai Wang authored
      If users want to use vector floating point instructions, they additionally
      need to specify the 'F' extension.
      
      Differential Revision: https://reviews.llvm.org/D93282
      f03609b5
    • Matt Arsenault's avatar
      AMDGPU: Remove SGPRSpillVGPRDefinedSet hack · f3337367
      Matt Arsenault authored
      These VGPRs should be reserved and therefore do not need "correct"
      liveness. They should not have undef uses, which can still cause
      issues.
      f3337367
    • Zakk Chen's avatar
      [RISCV] Define vle/vse intrinsics. · c1d6d461
      Zakk Chen authored
      
      
      Define vle/vse intrinsics and lower to V instructions.
      
      We worked with @rogfer01 from BSC to come up with this patch.
      
      Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
      Co-Authored-by: Zakk Chen <zakk.chen@sifive.com>
      
      Reviewed By: craig.topper
      
      Differential Revision: https://reviews.llvm.org/D93359
      c1d6d461
    • Xiang1 Zhang's avatar
      [Debugify] Support checking Machine IR debug info · 50aaa8c2
      Xiang1 Zhang authored
      Add mir-check-debug pass to check MIR-level debug info.
      
      For IR-level, LLVM currently has debugify + check-debugify to generate
      and check debug IR. Much like the IR-level pass debugify, mir-debugify
      inserts sequentially increasing line locations to each MachineInstr in a
      Module, but there is no equivalent MIR-level check-debugify pass, so this
      patch adds one as "mir-check-debug".
      
      Reviewed By: djtodoro
      
      Differential Revision: https://reviews.llvm.org/D91595
      50aaa8c2
    • Hongtao Yu's avatar
      [CSSPGO] Consume pseudo-probe-based AutoFDO profile · ac068e01
      Hongtao Yu authored
      This change enables pseudo-probe-based sample counts to be consumed by the sample profile loader under the regular `-fprofile-sample-use` switch with minimal adjustments to the existing sample file formats. After the counts are imported, a probe helper, aka, a `PseudoProbeManager` object, is automatically launched to verify the CFG checksum of every function in the current compilation against the corresponding checksum from the profile. Mismatched checksums will cause a function profile to be skipped. A `SampleProfileProber` pass is scheduled before any of the `SampleProfileLoader` instances so that the CFG checksums as well as probe mappings are available during the profile loading time. The `PseudoProbeManager` object is set up right after the profile reading is done. In the future a CFG-based fuzzy matching could be done in `PseudoProbeManager`.
      
      Samples will be applied only to pseudo probe instructions as well as probed callsites once the checksum verification goes through. Those instructions are processed in the same way that regular instructions would be processed in the line-number-based scenario. In other words, a function is processed in a regular way as if it was reduced to just containing pseudo probes (block probes and callsites).
      
      **Adjustment to profile format**
      
      A CFG checksum field is being added to the existing AutoFDO profile formats. So far only the text format and the extended binary format are supported. For the text format, a new line like
      ```
      !CFGChecksum: 12345
      ```
      is added to the end of the body sample lines. For the extended binary profile format, we introduce a metadata section to store the checksum map from function names to their CFG checksums.
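      
      A minimal sketch of reading such a text-format line and applying the
      checksum check described earlier. The helpers `parseCFGChecksumLine` and
      `shouldUseProfile` are hypothetical names for illustration; this is not
      the actual LLVM profile reader, and it assumes the checksum is written
      as a plain decimal number as in the example line.
      
      ```cpp
      #include <cstdint>
      #include <optional>
      #include <string>
      
      // Hypothetical helper, not the real reader. Returns the checksum if Line
      // looks like "!CFGChecksum: <decimal>", ignoring leading whitespace.
      // Anything after the digits is ignored; overflow is not checked.
      inline std::optional<std::uint64_t>
      parseCFGChecksumLine(const std::string &Line) {
        const std::string Prefix = "!CFGChecksum:";
        std::size_t Start = Line.find_first_not_of(" \t");
        if (Start == std::string::npos ||
            Line.compare(Start, Prefix.size(), Prefix) != 0)
          return std::nullopt;
        std::size_t Pos = Line.find_first_not_of(" \t", Start + Prefix.size());
        if (Pos == std::string::npos)
          return std::nullopt;
        std::uint64_t Value = 0;
        bool SawDigit = false;
        for (; Pos < Line.size() && Line[Pos] >= '0' && Line[Pos] <= '9'; ++Pos) {
          Value = Value * 10 + static_cast<std::uint64_t>(Line[Pos] - '0');
          SawDigit = true;
        }
        if (!SawDigit)
          return std::nullopt;
        return Value;
      }
      
      // A function profile is applied only when the checksum stored in the
      // profile matches the one computed for the current compilation.
      inline bool shouldUseProfile(std::uint64_t ProfileChecksum,
                                   std::uint64_t CurrentChecksum) {
        return ProfileChecksum == CurrentChecksum;
      }
      ```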
      
      Differential Revision: https://reviews.llvm.org/D92347
      ac068e01
    • Guozhi Wei's avatar
      [MBP] Add whole chain to BlockFilterSet instead of individual BB · 687e80be
      Guozhi Wei authored
      Currently we add an individual BB to BlockFilterSet if its frequency satisfies
      
        LoopFreq / Freq <= LoopToColdBlockRatio
      
      LoopFreq is the edge frequency from outside the loop to the loop header.
      LoopToColdBlockRatio is a command line parameter.
      
      This doesn't make sense since we always lay out whole chains, not individual BBs.
      
      It may also cause a tricky problem. Sometimes the LoopFreq of an inner loop
      is smaller than the LoopFreq of the outer loop. So a BB can be in the
      BlockFilterSet of the inner loop, but not in the BlockFilterSet of the outer
      loop, like .cold in the test case. So it is added to the chain of the inner
      loop. When working on the outer loop, .cold is not added to BlockFilterSet, so
      the edge to its successor .problem is not counted in UnscheduledPredecessors of
      the .problem chain. But other blocks in the inner loop are added to
      BlockFilterSet, so the whole inner loop chain can be laid out, and
      markChainSuccessors is called to decrease UnscheduledPredecessors of the
      following chains. markChainSuccessors calls markBlockSuccessors for every BB,
      even if it is not in BlockFilterSet, like .cold, so the .problem chain's
      UnscheduledPredecessors is decreased, but this edge was not counted in
      fillWorkLists, so the .problem chain's UnscheduledPredecessors becomes 0 when
      it still has an unscheduled predecessor .pred! This causes problems in the
      various successor BB selection algorithms that follow.
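      
      A minimal sketch of chain-level filtering using the ratio test above. The
      types and the function `collectBlockFilterSet` are hypothetical and do not
      use the real MachineBlockPlacement data structures. It also assumes that a
      chain's frequency is the sum of its blocks' frequencies, which the message
      does not state.
      
      ```cpp
      #include <cstdint>
      #include <set>
      #include <string>
      #include <vector>
      
      // Hypothetical stand-ins for a basic block and a chain of blocks.
      struct Block {
        std::string Name;
        std::uint64_t Freq;
      };
      using Chain = std::vector<Block>;
      
      // Adds every block of a chain to the filter set, or none of them.
      // Assumption: the chain frequency is the sum of its block frequencies.
      inline std::set<std::string>
      collectBlockFilterSet(const std::vector<Chain> &Chains,
                            std::uint64_t LoopFreq,
                            std::uint64_t LoopToColdBlockRatio) {
        std::set<std::string> Filter;
        for (const Chain &C : Chains) {
          std::uint64_t ChainFreq = 0;
          for (const Block &B : C)
            ChainFreq += B.Freq;
          // Skip chains that are too cold relative to the loop entry frequency.
          if (ChainFreq == 0 || LoopFreq / ChainFreq > LoopToColdBlockRatio)
            continue;
          for (const Block &B : C)
            Filter.insert(B.Name);
        }
        return Filter;
      }
      ```
      
      With this shape a cold block that already sits in a hot chain is kept
      together with that chain, so the filter set and the chain layout cannot
      disagree about it.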
      
      Differential Revision: https://reviews.llvm.org/D89088
      687e80be
    • alex-t's avatar
      Disable Jump Threading for the targets with divergent control flow · 35ec3ff7
      alex-t authored
      Details: Jump Threading does not make sense for targets with divergent CF
               since they do not use branch prediction for speculative execution.
               Also, in the high-level IR there is not enough information to conclude whether a branch is divergent or uniform.
               This may cause errors in further CF lowering.
      
      Reviewed By: rampitec
      
      Differential Revision: https://reviews.llvm.org/D93302
      35ec3ff7
  2. Dec 16, 2020