  1. Dec 02, 2020
    • [CSSPGO] Pseudo probes for function calls. · 24d4291c
      Hongtao Yu authored
      An indirect call site needs to be probed for its potential call targets. With CSSPGO a direct call also needs a probe so that a calling context can be represented by a stack of callsite probes. Unlike pseudo probes for basic blocks, which take the form of standalone intrinsic call instructions, pseudo probes for callsites have to be attached to the call instruction itself, so a separate instruction would not work.
      
      One possible way of attaching a probe to a call instruction is to use a special metadata that carries information about the probe. The special metadata will have to make its way through the optimization pipeline down to object emission. This requires additional effort to maintain the metadata in various places. Given that the `!dbg` metadata is a first-class metadata and has all essential support in place, leveraging the `!dbg` metadata as a channel to encode pseudo probe information is probably the easiest solution.
      
      With the requirement of not inflating `!dbg` metadata that is allocated for almost every instruction, we found that the 32-bit DWARF discriminator field, which mainly serves AutoFDO, can be reused for pseudo probes. DWARF discriminators distinguish identical source locations between instructions, and with pseudo probes such support is not required. In this change we are using the discriminator field to encode the ID and type of a callsite probe, and the encoded value will be unpacked and consumed right before object emission. When a callsite is inlined, the callsite discriminator field will go with the inlined instructions. The `!dbg` metadata of an inlined instruction is in the form of a scope stack. The top of the stack is the instruction's original `!dbg` metadata and the bottom of the stack is for the original callsite of the top-level inliner. Except for the top of the stack, all other elements of the stack actually refer to the nested inlined callsites, whose discriminator fields (each of which actually represents a callsite probe) can be used together to represent the inline context of an inlined PseudoProbeInst or CallInst.
      
      To avoid collision with the baseline AutoFDO in various places that handle DWARF discriminators where a check against the `-pseudo-probe-for-profiling` switch is not available, a special encoding scheme is used to tell apart a pseudo probe discriminator from a regular discriminator. For the regular discriminator, if all of the lowest 3 bits are non-zero (that is, set), it means the discriminator is basically empty and all higher 29 bits can be reserved for pseudo probe use.
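
      The encoding scheme described above can be sketched as follows. This is a minimal illustration, not the code from the patch: the marker value `0x7` follows from the text (all three lowest bits set), but the split of the upper 29 bits into a 16-bit probe ID and a 3-bit probe type, and all names used here (`packProbeData`, `isPseudoProbeDiscriminator`, `extractProbeId`, `extractProbeType`), are assumptions made for this example.

      ```cpp
      #include <cassert>
      #include <cstdint>

      // Marker: all three lowest bits set means "pseudo probe discriminator".
      constexpr std::uint32_t kProbeMarker = 0x7;
      // Hypothetical layout of the upper 29 bits (an assumption for this sketch):
      // bits 3..18 hold the probe ID, bits 19..21 hold the probe type.
      constexpr std::uint32_t kIdShift = 3;
      constexpr std::uint32_t kIdMask = 0xFFFF;
      constexpr std::uint32_t kTypeShift = 19;
      constexpr std::uint32_t kTypeMask = 0x7;

      // Pack a probe ID and type into a 32-bit discriminator value.
      std::uint32_t packProbeData(std::uint32_t id, std::uint32_t type) {
        assert(id <= kIdMask && "probe ID does not fit in the assumed ID field");
        assert(type <= kTypeMask && "probe type does not fit in the assumed field");
        return (id << kIdShift) | (type << kTypeShift) | kProbeMarker;
      }

      // A value is treated as a pseudo probe only if all three low bits are set.
      bool isPseudoProbeDiscriminator(std::uint32_t d) {
        return (d & kProbeMarker) == kProbeMarker;
      }

      std::uint32_t extractProbeId(std::uint32_t d) {
        return (d >> kIdShift) & kIdMask;
      }

      std::uint32_t extractProbeType(std::uint32_t d) {
        return (d >> kTypeShift) & kTypeMask;
      }
      ```

      Code that only understands regular discriminators can then test the three low bits to decide whether to ignore a value, without consulting the `-pseudo-probe-for-profiling` switch.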
      
      Callsite pseudo probes are inserted in `SampleProfileProbePass` and a target-independent MIR pass `PseudoProbeInserter` is added to unpack the probe ID/type from `!dbg`.
      
      Note that with this work the switch -debug-info-for-profiling will not work with -pseudo-probe-for-profiling anymore. They cannot be used at the same time.
      
      Reviewed By: wmi
      
      Differential Revision: https://reviews.llvm.org/D91756
    • [OpenMPIRBuilder] forward arguments as pointers to outlined function · 240dd924
      Alex Zinenko authored
      OpenMPIRBuilder::createParallel outlines the body region of the parallel
      construct into a new function that accepts any value previously defined outside
      the region as a function argument. This function is called back by OpenMP
      runtime function __kmpc_fork_call, which expects trailing arguments to be
      pointers. If the region uses a value that is not of a pointer type, e.g. a
      struct, the produced code would be invalid. In such cases, make createParallel
      emit IR that stores the value on the stack and passes the pointer to the outlined
      function instead. The outlined function then loads the value back and uses it as
      normal.
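
      The store-then-load pattern described above can be illustrated in plain C++. This is only a sketch of the idea: `forkCallLike` is a hypothetical stand-in for a runtime entry point that forwards pointer arguments to a callback (the real `__kmpc_fork_call` has a different, variadic signature), and `Point`, `outlinedBody`, and `runRegion` are names invented for this example.

      ```cpp
      #include <cassert>

      // Hypothetical stand-in for a runtime that only forwards pointer arguments.
      using OutlinedFn = void (*)(void *arg);
      void forkCallLike(OutlinedFn fn, void *arg) { fn(arg); }

      // A non-pointer value used inside the region.
      struct Point {
        int x;
        int y;
      };

      int result = 0;

      // Outlined body: receives a pointer, loads the value back, then uses it.
      void outlinedBody(void *arg) {
        Point p = *static_cast<Point *>(arg);
        result = p.x + p.y;
      }

      int runRegion(Point value) {
        Point slot = value;                 // store the value on the caller's stack
        forkCallLike(outlinedBody, &slot);  // pass a pointer instead of the struct
        return result;
      }
      ```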
      
      Reviewed By: jdoerfert, llitchev
      
      Differential Revision: https://reviews.llvm.org/D92189
  2. Nov 30, 2020
    • [llvm][inliner] Reuse the inliner pass to implement 'always inliner' · 5fe10263
      Mircea Trofin authored
      Enable performing mandatory inlinings upfront, by reusing the same logic
      as the full inliner, instead of the AlwaysInliner. This has the
      following benefits:
      - reduce code duplication - one inliner codebase
      - open the opportunity to help the full inliner by performing additional
      function passes after the mandatory inlinings, but before the full
      inliner. Performing the mandatory inlinings first simplifies the problem
      the full inliner needs to solve: fewer call sites, more contextualization, and,
      depending on the additional function optimization passes run between the
      2 inliners, higher accuracy of cost models / decision policies.
      
      Note that this patch does not yet enable much in terms of post-always
      inline function optimization.
      
      Differential Revision: https://reviews.llvm.org/D91567
    • [CSSPGO] Pseudo probe instrumentation pass · 64fa8cce
      Hongtao Yu authored
      This change introduces a pseudo probe instrumentation pass for block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.
      
      Given the following LLVM IR:
      
      ```
      define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
      bb0:
         %cmp = icmp eq i32 %x, 0
         br i1 %cmp, label %bb1, label %bb2
      bb1:
         br label %bb3
      bb2:
         br label %bb3
      bb3:
         ret void
      }
      ```
      
      The instrumented IR will look like below. Note that each llvm.pseudoprobe intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID.
      
      ```
      define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
      bb0:
         %cmp = icmp eq i32 %x, 0
         call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
         br i1 %cmp, label %bb1, label %bb2
      bb1:
         call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
         br label %bb3
      bb2:
         call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
         br label %bb3
      bb3:
         call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
         ret void
      }
      ```
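
      The numbering visible in the instrumented IR (one probe per block, IDs counting up from 1, all sharing the owner function's GUID) can be mimicked with a small helper. This is a toy sketch rather than the pass itself: `makeBlockProbes` is a hypothetical name, the GUID is taken as an input instead of being computed, and the output is plain text in the shape of the calls shown above.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <string>
      #include <vector>

      // Returns one probe call per basic block, in block order, with IDs from 1.
      std::vector<std::string> makeBlockProbes(std::uint64_t functionGuid,
                                               std::size_t numBlocks) {
        std::vector<std::string> probes;
        for (std::size_t id = 1; id <= numBlocks; ++id) {
          probes.push_back("call void @llvm.pseudoprobe(i64 " +
                           std::to_string(functionGuid) + ", i64 " +
                           std::to_string(id) + ")");
        }
        return probes;
      }
      ```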
      
      Reviewed By: wmi
      
      Differential Revision: https://reviews.llvm.org/D86499
  3. Nov 28, 2020
    • Revert "[IRSim][IROutliner] Adding the extraction basics for the IROutliner." · a8a43b63
      Andrew Litteken authored
      Reverting commit due to address sanitizer errors.
      
      > Extracting the similar regions is the first step in the IROutliner.
      > 
      > Using the IRSimilarityIdentifier, we collect the SimilarityGroups and
      > sort them by how many instructions will be removed.  Each
      > IRSimilarityCandidate is used to define an OutlinableRegion.  Each
      > region is ordered by their occurrence in the Module and the regions that
      > are not compatible with previously outlined regions are discarded.
      > 
      > Each region is then extracted with the CodeExtractor into its own
      > function.
      > 
      > We test that we extract correctly in:
      > test/Transforms/IROutliner/extraction.ll
      > test/Transforms/IROutliner/address-taken.ll
      > test/Transforms/IROutliner/outlining-same-globals.ll
      > test/Transforms/IROutliner/outlining-same-constants.ll
      > test/Transforms/IROutliner/outlining-different-structure.ll
      > 
      > Reviewers: paquette, jroelofs, yroux
      > 
      > Differential Revision: https://reviews.llvm.org/D86975
      
      This reverts commit bf899e89.
    • [IRSim][IROutliner] Adding the extraction basics for the IROutliner. · bf899e89
      Andrew Litteken authored
      Extracting the similar regions is the first step in the IROutliner.
      
      Using the IRSimilarityIdentifier, we collect the SimilarityGroups and
      sort them by how many instructions will be removed.  Each
      IRSimilarityCandidate is used to define an OutlinableRegion.  Each
      region is ordered by their occurrence in the Module and the regions that
      are not compatible with previously outlined regions are discarded.
      
      Each region is then extracted with the CodeExtractor into its own
      function.
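
      The selection order described above (most profitable groups first, regions within a group in module order, incompatible regions dropped) can be sketched like this. The sketch makes several assumptions that are not taken from the patch: regions are modelled as instruction index ranges, "not compatible" is simplified to "overlaps an already chosen region", the benefit estimate is a simple count, and the names `Region`, `Group`, `removedInstructions`, and `selectRegions` are invented for this example.

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstddef>
      #include <vector>

      struct Region {
        std::size_t start;   // index of the first instruction in the module
        std::size_t length;  // number of instructions in the region
      };
      using Group = std::vector<Region>;  // regions considered similar

      // Assumed benefit estimate: all copies but one are removed.
      std::size_t removedInstructions(const Group &g) {
        return g.empty() ? 0 : (g.size() - 1) * g.front().length;
      }

      bool overlaps(const Region &a, const Region &b) {
        return a.start < b.start + b.length && b.start < a.start + a.length;
      }

      std::vector<Region> selectRegions(std::vector<Group> groups) {
        // Most profitable groups first.
        std::stable_sort(groups.begin(), groups.end(),
                         [](const Group &a, const Group &b) {
                           return removedInstructions(a) > removedInstructions(b);
                         });
        std::vector<Region> chosen;
        for (Group &g : groups) {
          // Within a group, visit regions in module order.
          std::sort(g.begin(), g.end(), [](const Region &a, const Region &b) {
            return a.start < b.start;
          });
          for (const Region &r : g) {
            bool clash = std::any_of(
                chosen.begin(), chosen.end(),
                [&](const Region &c) { return overlaps(r, c); });
            if (!clash)
              chosen.push_back(r);  // keep only regions that do not clash
          }
        }
        return chosen;
      }
      ```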
      
      We test that we extract correctly in:
      test/Transforms/IROutliner/extraction.ll
      test/Transforms/IROutliner/address-taken.ll
      test/Transforms/IROutliner/outlining-same-globals.ll
      test/Transforms/IROutliner/outlining-same-constants.ll
      test/Transforms/IROutliner/outlining-different-structure.ll
      
      Reviewers: paquette, jroelofs, yroux
      
      Differential Revision: https://reviews.llvm.org/D86975
  4. Nov 26, 2020
    • [AA] Split up LocationSize::unknown() · 4df8efce
      Nikita Popov authored
      Currently, we have some confusion in the codebase regarding the
      meaning of LocationSize::unknown(): Some parts (including most of
      BasicAA) assume that LocationSize::unknown() only allows accesses
      after the base pointer. Some parts (various callers of AA) assume
      that LocationSize::unknown() allows accesses both before and after
      the base pointer (but within the underlying object).
      
      This patch splits up LocationSize::unknown() into
      LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer()
      to make this completely unambiguous. I tried my best to determine
      which one is appropriate for all the existing uses.
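
      The difference between the two new states can be shown with a toy model. This sketch is not LLVM's `LocationSize`: `ToyLocationSize` and `mayAccess` are hypothetical, offsets are byte offsets relative to the base pointer, and the bounds of the underlying object are ignored for simplicity.

      ```cpp
      #include <cassert>
      #include <cstdint>

      class ToyLocationSize {
        enum class Kind { Precise, AfterPointer, BeforeOrAfterPointer };

        ToyLocationSize(Kind k, std::uint64_t b) : kind_(k), bytes_(b) {}

        Kind kind_;
        std::uint64_t bytes_;

      public:
        static ToyLocationSize precise(std::uint64_t bytes) {
          return ToyLocationSize(Kind::Precise, bytes);
        }
        static ToyLocationSize afterPointer() {
          return ToyLocationSize(Kind::AfterPointer, 0);
        }
        static ToyLocationSize beforeOrAfterPointer() {
          return ToyLocationSize(Kind::BeforeOrAfterPointer, 0);
        }

        // May an access touch the byte at `offset` relative to the base pointer?
        bool mayAccess(std::int64_t offset) const {
          switch (kind_) {
          case Kind::Precise:
            return offset >= 0 && static_cast<std::uint64_t>(offset) < bytes_;
          case Kind::AfterPointer:
            return offset >= 0;  // unknown extent, but never before the pointer
          case Kind::BeforeOrAfterPointer:
            return true;         // unknown extent in both directions
          }
          return true;  // not reached; keeps compilers quiet
        }
      };
      ```

      With a single "unknown" state, a caller that meant the second behaviour and an analysis that assumed the first would disagree about negative offsets, which is the ambiguity the patch removes.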
      
      The test changes in cs-cs.ll in particular illustrate a previously
      clearly incorrect AA result: We were effectively assuming that
      argmemonly functions were only allowed to access their arguments
      after the passed pointer, but not before it. I'm pretty sure that
      this was not intentional, and it's certainly not specified by
      LangRef that way.
      
      Differential Revision: https://reviews.llvm.org/D91649
  5. Nov 25, 2020
    • [PassManager] Run Induction Variable Simplification pass *after* Recognize... · a8d74517
      Roman Lebedev authored
      [PassManager] Run Induction Variable Simplification pass *after* Recognize loop idioms pass, not before
      
      Currently, `-indvars` runs first, and then immediately after `-loop-idiom` does.
      I'm not really sure if `-loop-idiom` requires `-indvars` to run beforehand,
      but I'm *very* sure that `-indvars` requires `-loop-idiom` to run afterwards,
      as it can be seen in the phase-ordering test.
      
      LoopIdiom runs on two types of loops: countable ones, and uncountable ones.
      For uncountable ones, IndVars obviously didn't make any change to them,
      since they are uncountable, so for them the order should be irrelevant.
      For countable ones, well, they should have been countable before IndVars
      for IndVars to make any change to them, and since SCEV is used on them,
      it shouldn't matter if IndVars have already canonicalized them.
      So I don't really see why we'd want the current ordering.
      
      Should this cause issues, it will give us a reproducer test case
      that shows flaws in this logic, and we then could adjust accordingly.
      
      While this is quite likely beneficial in-the-wild already,
      it's a required part for the full motivational pattern
      behind `left-shift-until-bittest` loop idiom (D91038).
      
      Reviewed By: dmgreen
      
      Differential Revision: https://reviews.llvm.org/D91800
  6. Nov 24, 2020
    • [ThinLTO/WPD] Enable -wholeprogramdevirt-skip in ThinLTO backends · 6e4c1cf2
      Teresa Johnson authored
      Previously this option could be used to skip devirtualizations of the
      given functions in regular LTO and in the ThinLTO indexing step. This
      change allows them to be skipped in the backend as well, which is useful
      when debugging WPD in a distributed ThinLTO backend.
      
      Differential Revision: https://reviews.llvm.org/D91812
    • [FunctionAttrs][NPM] Fix handling of convergent · 932e4f88
      Arthur Eubanks authored
      The legacy pass didn't properly detect indirect calls.
      
      We can still remove the convergent attribute when there are indirect
      calls. The LangRef says:
      
      > When it appears on a call/invoke, the convergent attribute indicates
      that we should treat the call as though we’re calling a convergent
      function. This is particularly useful on indirect calls; without this we
      may treat such calls as though the target is non-convergent.
      
      So don't skip handling of convergent when there are unknown calls.
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D89826
  7. Nov 23, 2020
  8. Nov 20, 2020
  9. Nov 19, 2020
  10. Nov 18, 2020
  11. Nov 16, 2020
  12. Nov 14, 2020
  13. Nov 13, 2020
    • [AlwaysInliner] Call mergeAttributesForInlining after inlining · a20220d2
      Guozhi Wei authored
      As in inlineCallIfPossible and InlinerPass, mergeAttributesForInlining should be
      called after inlining to merge the callee's attributes into the caller. But it is
      not called in AlwaysInliner, which leaves the caller's attributes inconsistent
      with the inlined code.
      
      The attached test case demonstrates that attribute "min-legal-vector-width"="512" is
      not merged into caller without this patch, and it causes failure in SelectionDAG
      when lowering the inlined AVX512 intrinsic.
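
      The kind of merge that was being skipped can be pictured with a toy example. This is not `mergeAttributesForInlining`: attributes are modelled as a string-to-integer map, `mergeMinLegalVectorWidth` is a hypothetical helper, and the rule shown (when both functions carry the attribute, the caller ends up with the larger width) is a simplifying assumption. Cases where either side lacks the attribute are deliberately left untouched here and may be handled differently in LLVM.

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <map>
      #include <string>

      using Attrs = std::map<std::string, unsigned>;

      // Toy rule (assumption): if both caller and callee carry the attribute,
      // raise the caller's width to at least the callee's width.
      void mergeMinLegalVectorWidth(Attrs &caller, const Attrs &callee) {
        const std::string key = "min-legal-vector-width";
        auto callerIt = caller.find(key);
        auto calleeIt = callee.find(key);
        if (callerIt == caller.end() || calleeIt == callee.end())
          return;  // not modelled in this sketch
        callerIt->second = std::max(callerIt->second, calleeIt->second);
      }
      ```

      Without a step like this, a caller that now contains 512-bit code would still advertise a smaller width, which matches the inconsistency the commit describes.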
      
      Differential Revision: https://reviews.llvm.org/D91446
    • llvmbuildectomy - compatibility with ocaml bindings · 95537f45
      serge-sans-paille authored
      Use exact component name in add_ocaml_library.
      Make expand_topologically compatible with new architecture.
      Fix quoting in is_llvm_target_library.
      Fix LLVMipo component name.
      Write release note.
    • Add !annotation metadata and remarks pass. · 8bb63479
      Florian Hahn authored
      This patch adds a new !annotation metadata kind which can be used to
      attach annotation strings to instructions.
      
      It also adds a new pass that emits summary remarks per function with the
      counts for each annotation kind.
      
      The intended use case for this new metadata is annotating
      'interesting' instructions, and the remarks should provide additional
      insight into transformations applied to a program.
      
      To motivate this, consider these specific questions we would like to get answered:
      
      * How many stores added for automatic variable initialization remain after optimizations? Where are they?
      * How many runtime checks inserted by a frontend could be eliminated? Where are the ones that did not get eliminated?
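
      A per-function summary of the kind described above boils down to counting annotation strings. The following is a toy sketch, not the remarks pass: instructions are modelled as plain structs, `summarize` is a hypothetical helper, and the annotation strings used in any example input are made up.

      ```cpp
      #include <cassert>
      #include <map>
      #include <string>
      #include <vector>

      struct Instruction {
        std::string function;                  // name of the owning function
        std::vector<std::string> annotations;  // strings attached to the instruction
      };

      // function name -> annotation string -> number of annotated instructions
      using Summary = std::map<std::string, std::map<std::string, unsigned>>;

      Summary summarize(const std::vector<Instruction> &insts) {
        Summary s;
        for (const Instruction &inst : insts)
          for (const std::string &a : inst.annotations)
            ++s[inst.function][a];
        return s;
      }
      ```

      Running such a count after optimization answers the "how many remain" half of the questions; answering "where are they" would additionally need source locations, which this sketch omits.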
      
      Discussed on llvm-dev as part of 'RFC: Combining Annotation Metadata and Remarks'
      (http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html)
      
      Reviewed By: thegameg, jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D91188
    • llvmbuildectomy - replace llvm-build by plain cmake · 9218ff50
      serge-sans-paille authored
      No longer rely on an external tool to build the llvm component layout.
      
      Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
      introduce `add_llvm_component_group` to accurately describe component behavior.
      
      These functions store extra properties in the created targets. These properties
      are processed once all components are defined to resolve library dependencies
      and produce the header expected by llvm-config.
      
      Differential Revision: https://reviews.llvm.org/D90848
    • [NFC] Removed unused variable · b9406121
      Arthur Eubanks authored
      Obsolete as of https://reviews.llvm.org/D91046.
  14. Nov 11, 2020
    • [CGSCC][Inliner] Handle new non-trivial edges in updateCGAndAnalysisManagerForPass · d9cbceb0
      Arthur Eubanks authored
      Previously the inliner did a bit of a hack by adding ref edges for all
      new edges introduced by performing an inline before calling
      updateCGAndAnalysisManagerForPass(). This was because
      updateCGAndAnalysisManagerForPass() didn't handle new non-trivial call
      edges.
      
      This adds handling of non-trivial call edges to
      updateCGAndAnalysisManagerForPass().  The inliner called
      updateCGAndAnalysisManagerForFunctionPass() since it was handling adding
      newly introduced edges (so updateCGAndAnalysisManagerForPass() would
      only have to handle promotion), but now it needs to call
      updateCGAndAnalysisManagerForCGSCCPass() since
      updateCGAndAnalysisManagerForPass() is now handling the new call edges
      and function passes cannot add new edges.
      
      We follow the previous path of adding trivial ref edges then letting promotion
      handle changing the ref edges to call edges and the CGSCC updates. So
      this still does not allow adding call edges that result in an addition
      of a non-trivial ref edge.
      
      This is in preparation for better detecting devirtualization. Previously
      since the inliner itself would add ref edges,
      updateCGAndAnalysisManagerForPass() would think that promotion and thus
      devirtualization had happened after any sort of inlining.
      
      Reviewed By: asbirlea
      
      Differential Revision: https://reviews.llvm.org/D91046
  15. Nov 10, 2020
    • [LoopFlatten] Run it earlier, just before IndVarSimplify · 2ef47910
      Sjoerd Meijer authored
      This is a prep step for widening induction variables in LoopFlatten if this is
      possible (D90640), to avoid having to perform certain overflow checks. Since
      IndVarSimplify may already widen induction variables, we want to run
      LoopFlatten just before IndVarSimplify. This is a minor reshuffle as both
      passes already ran close to each other.
      
      Differential Revision: https://reviews.llvm.org/D90402
    • Add loop distribution to the LTO pipeline · dd03881b
      Sanne Wouda authored
      The LoopDistribute pass is missing from the LTO pipeline, so
      -enable-loop-distribute has no effect during post-link. The pre-link
      loop distribution doesn't seem to survive the LTO pipeline either.
      
      With this patch (and -flto -mllvm -enable-loop-distribute) we see a 43%
      uplift on SPEC 2006 hmmer for AArch64. The rest of SPECINT 2006 is
      unaffected.
      
      Differential Revision: https://reviews.llvm.org/D89896
    • [OMPIRBuilder] Start 'Create' methods with lower case. NFC. · e5dba2d7
      Michael Kruse authored
      For consistency with the IRBuilder, OpenMPIRBuilder has method names starting with 'Create'. However, the LLVM coding style has method names starting with lower case letters, as all other OpenMPIRBuilder methods already do. The clang-tidy configuration used by Phabricator also warns about the naming violation, adding noise to the reviews.
      
      This patch renames all `OpenMPIRBuilder::CreateXYZ` methods to `OpenMPIRBuilder::createXYZ`, and updates all in-tree callers.
      
      I tested check-llvm, check-clang, check-mlir and check-flang to ensure that I did not miss a caller.
      
      Reviewed By: mehdi_amini, fghanim
      
      Differential Revision: https://reviews.llvm.org/D91109
  16. Nov 03, 2020
  17. Nov 02, 2020
  18. Oct 31, 2020
  19. Oct 30, 2020
  20. Oct 29, 2020
  21. Oct 28, 2020