  1. Dec 12, 2020
    • David Green's avatar
      [LV] Fix scalar cost for tail predicated loops · ab97c9bd
      David Green authored
      When it comes to the scalar cost of any predicated block, the loop
      vectorizer by default regards this predication as a sign that it is
      looking at an if-conversion and divides the scalar cost of the block by
      2, assuming it would only be executed half the time. This however makes
      no sense if the predication has been introduced to tail predicate the
      loop.
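
      A minimal sketch of the rule described above, assuming a heavily simplified cost model. `BlockInfo`, `kReciprocalPredBlockProb` and `scalarBlockCost` are hypothetical names used for illustration only, not the vectorizer's actual API:

      ```cpp
      #include <cassert>

      // Hypothetical, simplified model of a block's scalar cost.
      struct BlockInfo {
        unsigned RawScalarCost; // summed cost of the block's scalar instructions
        bool NeedsPredication;  // the block executes under a predicate
      };

      // Assumed divisor: an if-converted block is taken to run half the time.
      constexpr unsigned kReciprocalPredBlockProb = 2;

      unsigned scalarBlockCost(const BlockInfo &B, bool TailFoldedByMasking) {
        // Predication that only exists to fold the loop tail does not make the
        // block conditional in the scalar loop, so its cost is not discounted.
        if (B.NeedsPredication && !TailFoldedByMasking)
          return B.RawScalarCost / kReciprocalPredBlockProb;
        return B.RawScalarCost;
      }

      // Usage: the same predicated block is costed differently in the two cases.
      void checkScalarBlockCost() {
        assert(scalarBlockCost(BlockInfo{10, true}, false) == 5);
        assert(scalarBlockCost(BlockInfo{10, true}, true) == 10);
      }
      ```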
      
      Original patch by Anna Welker
      
      Differential Revision: https://reviews.llvm.org/D86452
      ab97c9bd
  2. Dec 11, 2020
    • Fangrui Song's avatar
      Migrate deprecated DebugLoc::get to DILocation::get · b5ad32ef
      Fangrui Song authored
      This migrates all LLVM (except Kaleidoscope and
      CodeGen/StackProtector.cpp) DebugLoc::get to DILocation::get.
      
      The CodeGen/StackProtector.cpp usage may have a nullptr Scope
      and can trigger an assertion failure, so I don't migrate it.
      
      Reviewed By: #debug-info, dblaikie
      
      Differential Revision: https://reviews.llvm.org/D93087
      b5ad32ef
    • Marco Elver's avatar
      [KernelAddressSanitizer] Fix globals exclusion for indirect aliases · c28b18af
      Marco Elver authored
      GlobalAlias::getAliasee() may not always point directly to a
      GlobalVariable. In such cases, try to find the canonical GlobalVariable
      that the alias refers to.
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/1208
      
      Reviewed By: dvyukov, nickdesaulniers
      
      Differential Revision: https://reviews.llvm.org/D92846
      c28b18af
    • David Sherwood's avatar
      [Support] Introduce a new InstructionCost class · 9b76160e
      David Sherwood authored
      This is the first in a series of patches that attempts to migrate
      existing cost instructions to return a new InstructionCost class
      in place of a simple integer. This new class is intended to be
      as light-weight and simple as possible, with a full range of
      arithmetic and comparison operators that largely mirror the same
      sets of operations on basic types, such as integers. The main
      advantage to using an InstructionCost is that it can encode a
      particular cost state in addition to a value. The initial
      implementation only has two states - Normal and Invalid - but these
      could be expanded over time if necessary. An invalid state can
      be used to represent an unknown cost or an instruction that is
      prohibitively expensive.
      
      This patch adds the new class and changes the getInstructionCost
      interface to return the new class. Other cost functions, such as
      getUserCost, etc., will be migrated in future patches as I believe
      this to be less disruptive. One benefit of this new class is that
      it provides a way to unify many of the magic costs in the codebase
      where the cost is set to a deliberately high number to prevent
      optimisations taking place, e.g. vectorization. It also provides
      a route to represent the extremely high, and unknown, cost of
      scalarization of scalable vectors, which is not currently supported.
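
      To make the idea of a cost that carries a state concrete, here is a small stand-in class. `SketchCost` is a hypothetical illustration and not the InstructionCost class added by the patch; in particular, the ordering of valid costs before invalid ones is an assumption of this sketch:

      ```cpp
      #include <cassert>

      // Simplified stand-in for a cost with a Valid/Invalid state.
      class SketchCost {
      public:
        enum State { Valid, Invalid };

        SketchCost(int V = 0) : Value(V), St(Valid) {}
        static SketchCost getInvalid() {
          SketchCost C;
          C.St = Invalid;
          return C;
        }

        bool isValid() const { return St == Valid; }
        int getValue() const {
          assert(isValid() && "an invalid cost has no meaningful value");
          return Value;
        }

        // Arithmetic propagates the Invalid state: known + unknown is unknown.
        SketchCost &operator+=(const SketchCost &RHS) {
          if (RHS.St == Invalid)
            St = Invalid;
          Value += RHS.Value;
          return *this;
        }
        friend SketchCost operator+(SketchCost L, const SketchCost &R) {
          L += R;
          return L;
        }

        // Assumed ordering: any valid cost compares cheaper than an invalid one.
        friend bool operator<(const SketchCost &L, const SketchCost &R) {
          if (L.St != R.St)
            return L.St < R.St;
          return L.Value < R.Value;
        }

      private:
        int Value;
        State St;
      };
      ```

      With a type like this, a client can compare costs as usual and never picks an invalid candidate over a valid one, which is the role the magic "very high" numbers play today.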
      
      Differential Revision: https://reviews.llvm.org/D91174
      9b76160e
    • Hongtao Yu's avatar
      [CSSPGO] Pseudo probe encoding and emission. · 705a4c14
      Hongtao Yu authored
      This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s
      
      Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections. The probe-specific sections do not need to be loaded into memory at execution time, so they incur no runtime overhead.
      
      **ELF object emission**
      
      The binary data to emit are organized as two ELF sections, i.e., the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each of which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as module-level metadata during compilation and is serialized into the object file during object emission.

      Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication. A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and discarded too. In contrast, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool.

      The format of the `.pseudo_probe_desc` section looks like:
      
      ```
      .section   .pseudo_probe_desc,"",@progbits
      .quad   6309742469962978389  // Func GUID
      .quad   4294967295           // Func Hash
      .byte   9                    // Length of func name
      .ascii  "_Z5funcAi"          // Func name
      .quad   7102633082150537521
      .quad   138828622701
      .byte   12
      .ascii  "_Z8funcLeafi"
      .quad   446061515086924981
      .quad   4294967295
      .byte   9
      .ascii  "_Z5funcBi"
      .quad   -2016976694713209516
      .quad   72617220756
      .byte   7
      .ascii  "_Z3fibi"
      ```
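
      The record layout in the listing above (GUID, hash, name length, name) could be serialized roughly as follows. This is only a sketch: `appendU64LE` and `appendDescriptor` are hypothetical helpers, and little-endian `.quad` output is assumed, as on x86-64:

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <string>
      #include <vector>

      // Appends a 64-bit value in little-endian byte order (assumed target order).
      void appendU64LE(std::vector<uint8_t> &Out, uint64_t V) {
        for (int I = 0; I < 8; ++I)
          Out.push_back(static_cast<uint8_t>(V >> (8 * I)));
      }

      // Appends one function descriptor: GUID, hash, name length, name bytes.
      void appendDescriptor(std::vector<uint8_t> &Out, uint64_t Guid, uint64_t Hash,
                            const std::string &Name) {
        // The listing stores the name length in a single .byte.
        assert(Name.size() <= 255 && "name length must fit in one byte");
        appendU64LE(Out, Guid);
        appendU64LE(Out, Hash);
        Out.push_back(static_cast<uint8_t>(Name.size()));
        Out.insert(Out.end(), Name.begin(), Name.end());
      }
      ```

      Serializing the first record of the listing this way yields 8 + 8 + 1 + 9 = 26 bytes.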
      
      For each `.pseudo_probe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e., a function with a code entry in the `.text` section). A function record has the following format:
      
      ```
      FUNCTION BODY (one for each outlined function present in the text section)
          GUID (uint64)
              GUID of the function
          NPROBES (ULEB128)
              Number of probes originating from this function.
          NUM_INLINED_FUNCTIONS (ULEB128)
              Number of callees inlined into this function, aka number of
              first-level inlinees
          PROBE RECORDS
              A list of NPROBES entries. Each entry contains:
                INDEX (ULEB128)
                TYPE (uint4)
                  0 - block probe, 1 - indirect call, 2 - direct call
                ATTRIBUTE (uint3)
                  reserved
                ADDRESS_TYPE (uint1)
                  0 - code address, 1 - address delta
                CODE_ADDRESS (uint64 or ULEB128)
                  code address or address delta, depending on ADDRESS_TYPE
          INLINED FUNCTION RECORDS
              A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined
              callees.  Each record contains:
                INLINE SITE
                  GUID of the inlinee (uint64)
                  ID of the callsite probe (ULEB128)
                FUNCTION BODY
                  A FUNCTION BODY entry describing the inlined function.
      ```
      
      To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. For each outlined function, the probe emitter builds an inline tree, in the form of a trie, based on the debug metadata. The tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node, and the tree path from the root down to that node is the inline context of the probes. Emission walks the whole tree top-down recursively. The probes of a tree node are emitted together with their direct parent edge. Since a pseudo probe corresponds to a real code address, the address is encoded as a delta from the previous probe, except for the first probe, to save space. Variable-length integer encoding, aka LEB128, is used for the address delta and probe index.
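
      For readers unfamiliar with the encoding, the unsigned LEB128 scheme used for fields such as INDEX and NPROBES can be sketched as below. The function names are illustrative and are not taken from the patch:

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <vector>

      // ULEB128: 7 payload bits per byte, high bit set on all but the last byte.
      void encodeULEB128(uint64_t Value, std::vector<uint8_t> &Out) {
        do {
          uint8_t Byte = Value & 0x7f;
          Value >>= 7;
          if (Value != 0)
            Byte |= 0x80; // more bytes follow
          Out.push_back(Byte);
        } while (Value != 0);
      }

      // Decodes one value starting at Pos and advances Pos past it.
      uint64_t decodeULEB128(const std::vector<uint8_t> &In, size_t &Pos) {
        uint64_t Result = 0;
        unsigned Shift = 0;
        while (true) {
          assert(Pos < In.size() && "truncated ULEB128");
          assert(Shift < 64 && "value does not fit in 64 bits");
          uint8_t Byte = In[Pos++];
          Result |= static_cast<uint64_t>(Byte & 0x7f) << Shift;
          if ((Byte & 0x80) == 0)
            return Result;
          Shift += 7;
        }
      }
      ```

      Small values such as a probe index of 2 take a single byte, which is where the size savings over fixed-width fields come from.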
      
      **Assembling**
      
      Alternatively, pseudo probes can be printed as assembly directives. This keeps the assembly code readable and also shows how optimizations and pseudo probes affect each other, which is especially helpful for diff-time assembly analysis.

      A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudo_probe` section in the object file.

      An example assembly listing looks like:
      
      ```
      foo2: # @foo2
      # %bb.0: # %bb0
      pushq %rax
      testl %edi, %edi
      .pseudoprobe 837061429793323041 1 0 0
      je .LBB1_1
      # %bb.2: # %bb2
      .pseudoprobe 837061429793323041 6 2 0
      callq foo
      .pseudoprobe 837061429793323041 3 0 0
      .pseudoprobe 837061429793323041 4 0 0
      popq %rax
      retq
      .LBB1_1: # %bb1
      .pseudoprobe 837061429793323041 5 1 0
      callq *%rsi
      .pseudoprobe 837061429793323041 2 0 0
      .pseudoprobe 837061429793323041 4 0 0
      popq %rax
      retq
      # -- End function
      .section .pseudo_probe_desc,"",@progbits
      .quad 6699318081062747564
      .quad 72617220756
      .byte 3
      .ascii "foo"
      .quad 837061429793323041
      .quad 281547593931412
      .byte 4
      .ascii "foo2"
      ```
      
      With inlining turned on, the assembly may look different around %bb2 with an inlined probe:
      
      ```
      # %bb.2:                                # %bb2
      .pseudoprobe    837061429793323041 3 0
      .pseudoprobe    6699318081062747564 1 0 @ 837061429793323041:6
      .pseudoprobe    837061429793323041 4 0
      popq    %rax
      retq
      ```
      
      **Disassembling**
      
      We have a disassembling tool (llvm-profgen) that can display disassembly alongside pseudo probes. So far it only supports ELF executable files.
      
      An example disassembly looks like:
      
      ```
      00000000002011a0 <foo2>:
        2011a0: 50                    push   rax
        2011a1: 85 ff                 test   edi,edi
        [Probe]:  FUNC: foo2  Index: 1  Type: Block
        2011a3: 74 02                 je     2011a7 <foo2+0x7>
        [Probe]:  FUNC: foo2  Index: 3  Type: Block
        [Probe]:  FUNC: foo2  Index: 4  Type: Block
        [Probe]:  FUNC: foo   Index: 1  Type: Block  Inlined: @ foo2:6
        2011a5: 58                    pop    rax
        2011a6: c3                    ret
        [Probe]:  FUNC: foo2  Index: 2  Type: Block
        2011a7: bf 01 00 00 00        mov    edi,0x1
        [Probe]:  FUNC: foo2  Index: 5  Type: IndirectCall
        2011ac: ff d6                 call   rsi
        [Probe]:  FUNC: foo2  Index: 4  Type: Block
        2011ae: 58                    pop    rax
        2011af: c3                    ret
      ```
      
      Reviewed By: wmi
      
      Differential Revision: https://reviews.llvm.org/D91878
      705a4c14
    • Mitch Phillips's avatar
      Revert "[CSSPGO] Pseudo probe encoding and emission." · 7ead5f5a
      Mitch Phillips authored
      This reverts commit b035513c.
      
      Reason: Broke the ASan buildbots:
        http://lab.llvm.org:8011/#/builders/5/builds/2269
      7ead5f5a
  3. Dec 10, 2020
    • Zequan Wu's avatar
      b5216b29
    • Sanjay Patel's avatar
      [InstCombine] avoid crash sinking to unreachable block · 4f051fe3
      Sanjay Patel authored
      The test is reduced from the example in D82005.
      
      Similar to 94f6d365, the test here would assert in
      the DomTree when we tried to convert a select to a
      phi with an unreachable block operand.
      
      We may want to add some kind of guard code in DomTree
      itself to avoid this sort of problem.
      4f051fe3
    • Sanjay Patel's avatar
      [VectorCombine] improve readability; NFC · 12b684ae
      Sanjay Patel authored
      If we are going to allow adjusting the pointer for GEPs,
      rearranging the code a bit will make it easier to follow.
      12b684ae
    • Hongtao Yu's avatar
      [CSSPGO] Pseudo probe encoding and emission. · b035513c
      Hongtao Yu authored
      This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s
      
      Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections. The probe-specific sections do not need to be loaded into memory at execution time, so they incur no runtime overhead.
      
      **ELF object emission**
      
      The binary data to emit are organized as two ELF sections, i.e., the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each of which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as module-level metadata during compilation and is serialized into the object file during object emission.

      Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication. A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and discarded too. In contrast, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool.

      The format of the `.pseudo_probe_desc` section looks like:
      
      ```
      .section   .pseudo_probe_desc,"",@progbits
      .quad   6309742469962978389  // Func GUID
      .quad   4294967295           // Func Hash
      .byte   9                    // Length of func name
      .ascii  "_Z5funcAi"          // Func name
      .quad   7102633082150537521
      .quad   138828622701
      .byte   12
      .ascii  "_Z8funcLeafi"
      .quad   446061515086924981
      .quad   4294967295
      .byte   9
      .ascii  "_Z5funcBi"
      .quad   -2016976694713209516
      .quad   72617220756
      .byte   7
      .ascii  "_Z3fibi"
      ```
      
      For each `.pseudo_probe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e., a function with a code entry in the `.text` section). A function record has the following format:
      
      ```
      FUNCTION BODY (one for each outlined function present in the text section)
          GUID (uint64)
              GUID of the function
          NPROBES (ULEB128)
              Number of probes originating from this function.
          NUM_INLINED_FUNCTIONS (ULEB128)
              Number of callees inlined into this function, aka number of
              first-level inlinees
          PROBE RECORDS
              A list of NPROBES entries. Each entry contains:
                INDEX (ULEB128)
                TYPE (uint4)
                  0 - block probe, 1 - indirect call, 2 - direct call
                ATTRIBUTE (uint3)
                  reserved
                ADDRESS_TYPE (uint1)
                  0 - code address, 1 - address delta
                CODE_ADDRESS (uint64 or ULEB128)
                  code address or address delta, depending on ADDRESS_TYPE
          INLINED FUNCTION RECORDS
              A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined
              callees.  Each record contains:
                INLINE SITE
                  GUID of the inlinee (uint64)
                  ID of the callsite probe (ULEB128)
                FUNCTION BODY
                  A FUNCTION BODY entry describing the inlined function.
      ```
      
      To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. For each outlined function, the probe emitter builds an inline tree, in the form of a trie, based on the debug metadata. The tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node, and the tree path from the root down to that node is the inline context of the probes. Emission walks the whole tree top-down recursively. The probes of a tree node are emitted together with their direct parent edge. Since a pseudo probe corresponds to a real code address, the address is encoded as a delta from the previous probe, except for the first probe, to save space. Variable-length integer encoding, aka LEB128, is used for the address delta and probe index.
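
      The address-delta scheme mentioned at the end of the paragraph above might look like the following sketch. `EncodedAddress`, `encodeAddresses` and `decodeAddresses` are hypothetical names, and the sketch assumes probes are visited in non-decreasing address order so that every delta is non-negative:

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <vector>

      // Hypothetical in-memory form of one encoded address field.
      struct EncodedAddress {
        bool IsDelta;   // mirrors ADDRESS_TYPE: false = code address, true = delta
        uint64_t Value; // absolute address, or distance from the previous probe
      };

      std::vector<EncodedAddress> encodeAddresses(const std::vector<uint64_t> &Addrs) {
        std::vector<EncodedAddress> Out;
        for (size_t I = 0; I < Addrs.size(); ++I) {
          if (I == 0) {
            Out.push_back({false, Addrs[I]}); // first probe keeps its full address
            continue;
          }
          // Assumption of this sketch: addresses never decrease.
          assert(Addrs[I] >= Addrs[I - 1] && "sketch assumes sorted addresses");
          Out.push_back({true, Addrs[I] - Addrs[I - 1]});
        }
        return Out;
      }

      std::vector<uint64_t> decodeAddresses(const std::vector<EncodedAddress> &In) {
        std::vector<uint64_t> Out;
        uint64_t Prev = 0;
        for (const EncodedAddress &E : In) {
          Prev = E.IsDelta ? Prev + E.Value : E.Value;
          Out.push_back(Prev);
        }
        return Out;
      }
      ```

      For probe addresses a few bytes apart, such as 0x2011a3 followed by 0x2011a5, the delta is 2 and fits in one LEB128 byte instead of a full 64-bit address.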
      
      **Assembling**
      
      Alternatively, pseudo probes can be printed as assembly directives. This keeps the assembly code readable and also shows how optimizations and pseudo probes affect each other, which is especially helpful for diff-time assembly analysis.

      A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudo_probe` section in the object file.

      An example assembly listing looks like:
      
      ```
      foo2: # @foo2
      # %bb.0: # %bb0
      pushq %rax
      testl %edi, %edi
      .pseudoprobe 837061429793323041 1 0 0
      je .LBB1_1
      # %bb.2: # %bb2
      .pseudoprobe 837061429793323041 6 2 0
      callq foo
      .pseudoprobe 837061429793323041 3 0 0
      .pseudoprobe 837061429793323041 4 0 0
      popq %rax
      retq
      .LBB1_1: # %bb1
      .pseudoprobe 837061429793323041 5 1 0
      callq *%rsi
      .pseudoprobe 837061429793323041 2 0 0
      .pseudoprobe 837061429793323041 4 0 0
      popq %rax
      retq
      # -- End function
      .section .pseudo_probe_desc,"",@progbits
      .quad 6699318081062747564
      .quad 72617220756
      .byte 3
      .ascii "foo"
      .quad 837061429793323041
      .quad 281547593931412
      .byte 4
      .ascii "foo2"
      ```
      
      With inlining turned on, the assembly may look different around %bb2 with an inlined probe:
      
      ```
      # %bb.2:                                # %bb2
      .pseudoprobe    837061429793323041 3 0
      .pseudoprobe    6699318081062747564 1 0 @ 837061429793323041:6
      .pseudoprobe    837061429793323041 4 0
      popq    %rax
      retq
      ```
      
      **Disassembling**
      
      We have a disassembling tool (llvm-profgen) that can display disassembly alongside pseudo probes. So far it only supports ELF executable files.
      
      An example disassembly looks like:
      
      ```
      00000000002011a0 <foo2>:
        2011a0: 50                    push   rax
        2011a1: 85 ff                 test   edi,edi
        [Probe]:  FUNC: foo2  Index: 1  Type: Block
        2011a3: 74 02                 je     2011a7 <foo2+0x7>
        [Probe]:  FUNC: foo2  Index: 3  Type: Block
        [Probe]:  FUNC: foo2  Index: 4  Type: Block
        [Probe]:  FUNC: foo   Index: 1  Type: Block  Inlined: @ foo2:6
        2011a5: 58                    pop    rax
        2011a6: c3                    ret
        [Probe]:  FUNC: foo2  Index: 2  Type: Block
        2011a7: bf 01 00 00 00        mov    edi,0x1
        [Probe]:  FUNC: foo2  Index: 5  Type: IndirectCall
        2011ac: ff d6                 call   rsi
        [Probe]:  FUNC: foo2  Index: 4  Type: Block
        2011ae: 58                    pop    rax
        2011af: c3                    ret
      ```
      
      Reviewed By: wmi
      
      Differential Revision: https://reviews.llvm.org/D91878
      b035513c
    • Jun Ma's avatar
      [TruncInstCombine] Remove scalable vector restriction · 137674f8
      Jun Ma authored
      Differential Revision: https://reviews.llvm.org/D92819
      137674f8
  4. Dec 09, 2020
    • Jianzhou Zhao's avatar
      [dfsan] Track field/index-level shadow values in variables · ea981165
      Jianzhou Zhao authored
      *************
      * The problem
      *************
      See motivation examples in compiler-rt/test/dfsan/pair.cpp. The current
      DFSan always uses a 16bit shadow value for a variable with any type by
      combining all shadow values of all bytes of the variable. So it cannot
      distinguish two fields of a struct: each field's shadow value equals the
      combined shadow value of all fields. This introduces an overtaint issue.
      
      Consider a parsing function
      
         std::pair<char*, int> get_token(char* p);
      
      where p points to a buffer to parse, the returned pair includes the next
      token and the pointer to the position in the buffer after the token.
      
      If the token is tainted, then both the returned pointer and int are
      tainted. If the parser keeps on using get_token for the rest of the
      parsing, all the following outputs are tainted because of the tainted
      pointer.

      This CL is the first change to address the issue.
      
      **************************
      * The proposed improvement
      **************************
      Eventually all fields and indices have their own shadow values in
      variables and memory.
      
      For example, variables with type {i1, i3}, [2 x i1], {[2 x i4], i8},
      [2 x {i1, i1}] have shadow values with type {i16, i16}, [2 x i16],
      {[2 x i16], i16}, [2 x {i16, i16}] correspondingly; variables with
      primary type still have shadow values i16.
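
      The mapping from an original type to its shadow type can be sketched as a recursive walk that keeps the aggregate shape and replaces every primary type with i16. The `Ty` model and the helper names below are hypothetical and far simpler than LLVM's type hierarchy:

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <memory>
      #include <string>
      #include <vector>

      // Tiny hypothetical type model: a primary type, a struct, or an array.
      struct Ty {
        enum Kind { Primary, Struct, Array } K;
        std::string Name;                       // Primary only, e.g. "i3"
        std::vector<std::shared_ptr<Ty>> Elems; // struct fields or the array element
        unsigned Count;                         // Array only
      };
      using TyPtr = std::shared_ptr<Ty>;

      TyPtr prim(const std::string &Name) {
        return std::make_shared<Ty>(Ty{Ty::Primary, Name, {}, 0});
      }
      TyPtr strct(std::vector<TyPtr> Fields) {
        return std::make_shared<Ty>(Ty{Ty::Struct, "", Fields, 0});
      }
      TyPtr arr(unsigned N, TyPtr Elem) {
        assert(Elem && "an array needs an element type");
        return std::make_shared<Ty>(Ty{Ty::Array, "", {Elem}, N});
      }

      // Primary types map to the 16-bit shadow; aggregates keep their shape.
      std::string shadowTypeOf(const Ty &T) {
        switch (T.K) {
        case Ty::Primary:
          return "i16";
        case Ty::Array:
          return "[" + std::to_string(T.Count) + " x " +
                 shadowTypeOf(*T.Elems.at(0)) + "]";
        case Ty::Struct: {
          std::string S = "{";
          for (size_t I = 0; I < T.Elems.size(); ++I) {
            if (I != 0)
              S += ", ";
            S += shadowTypeOf(*T.Elems[I]);
          }
          return S + "}";
        }
        }
        return ""; // not reached for a well-formed Ty
      }
      ```

      Applied to the four example types above, this walk produces the four shadow types listed.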
      
      ***************************
      * A potential implementation plan
      ***************************
      
      The idea is to adopt the change incrementally.
      
      1) This CL
      Support field-level accuracy at variables/args/ret in TLS mode;
      load/store/alloca still use combined shadow values.
      
      After the alloca promotion and SSA construction phases (>=-O1), we
      assume alloca and memory operations are reduced. So if struct
      variables do not relate to memory, their tracking is accurate at
      field level.
      
      2) Support field-level accuracy at alloca
      3) Support field-level accuracy at load/store
      
      These two should make O0 and real memory access work.
      
      4) Support vector if necessary.
      5) Support Args mode if necessary.
      6) Support passing more accurate shadow values via custom functions if
      necessary.
      
      ***************
      * About this CL.
      ***************
      The CL did the following
      
      1) extended TLS arg/ret to work with aggregate types. This is similar
      to what MSan does.
      
      2) implemented how to map between an original type/value/zero-const to
      its shadow type/value/zero-const.
      
      3) extended (insert|extract)value to use field/index-level propagation.
      
      4) for other instructions, propagation rules are combining inputs by or.
      The CL converts between aggregate and primary shadow values in these
      cases.
      
      5) Custom function interfaces also need such a conversion because
      all existing custom functions use i16. It is unclear whether custom
      functions need more accurate shadow propagation yet.
      
      6) Added test cases for aggregate type related cases.
      
      Reviewed-by: morehouse
      
      Differential Revision: https://reviews.llvm.org/D92261
      ea981165
    • Sanjay Patel's avatar
      [VectorCombine] allow peeking through an extractelt when creating a vector load · b2ef2640
      Sanjay Patel authored
      This is an enhancement to load vectorization that is motivated by
      a pattern in https://llvm.org/PR16739.
      Unfortunately, it's still not enough to make a difference there.
      We will have to handle multi-use cases in some better way to avoid
      creating multiple overlapping loads.
      
      Differential Revision: https://reviews.llvm.org/D92858
      b2ef2640
    • Roman Lebedev's avatar
      [InstCombine] canonicalizeSaturatedAdd(): last fold is only valid for strict comparison (PR48390) · e6f2a79d
      Roman Lebedev authored
      We could create uadd.sat under incorrect circumstances
      if a select with -1 as the false value was canonicalized
      by swapping the T/F values. Unlike the other transforms
      in the same function, it is not invariant to equality.
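
      A brute-force check over 8-bit values illustrates why equality matters. This sketch assumes the fold in question is the wrap-around overflow check `((X + Y) u< X) ? -1 : (X + Y)`; the helper names are illustrative and unrelated to InstCombine's code:

      ```cpp
      #include <cassert>
      #include <cstdint>

      // Reference semantics of an 8-bit unsigned saturating add.
      uint8_t uaddSat(uint8_t X, uint8_t Y) {
        unsigned Sum = unsigned(X) + unsigned(Y);
        return Sum > 0xFF ? uint8_t(0xFF) : uint8_t(Sum);
      }

      // Select with a strict wrap-around check: ((X + Y) u< X) ? -1 : (X + Y).
      uint8_t selectStrict(uint8_t X, uint8_t Y) {
        uint8_t Sum = uint8_t(X + Y);
        return Sum < X ? uint8_t(0xFF) : Sum;
      }

      // Same select with a non-strict check: ((X + Y) u<= X) ? -1 : (X + Y).
      uint8_t selectNonStrict(uint8_t X, uint8_t Y) {
        uint8_t Sum = uint8_t(X + Y);
        return Sum <= X ? uint8_t(0xFF) : Sum;
      }

      // Counts the input pairs on which F disagrees with uaddSat.
      unsigned countMismatches(uint8_t (*F)(uint8_t, uint8_t)) {
        unsigned N = 0;
        for (unsigned X = 0; X < 256; ++X)
          for (unsigned Y = 0; Y < 256; ++Y)
            if (F(uint8_t(X), uint8_t(Y)) != uaddSat(uint8_t(X), uint8_t(Y)))
              ++N;
        return N;
      }

      void checkFolds() {
        assert(countMismatches(selectStrict) == 0);
        // Y == 0 makes the sum equal to X, so the non-strict form saturates wrongly.
        assert(selectNonStrict(7, 0) != uaddSat(7, 0));
      }
      ```

      The strict form agrees with the saturating add on every input, while the non-strict form returns -1 whenever Y is 0 and X is not already the maximum value.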
      
      Some alive proofs: https://alive2.llvm.org/ce/z/emmKKL
      
      Based on original patch by David Green!
      
      Fixes https://bugs.llvm.org/show_bug.cgi?id=48390
      
      Differential Revision: https://reviews.llvm.org/D92717
      e6f2a79d
    • Anton Afanasyev's avatar
      [SLP] Use the width of value truncated just before storing · e5bf2e89
      Anton Afanasyev authored
      For store chain vectorization, we choose the size of vector
      elements so that we fit within the minimum and maximum vector register
      size for the given number of elements. This patch corrects the vector
      element size by using the width of the value truncated just before
      storing instead of the width of the value stored.
      
      Fixes PR46983
      
      Differential Revision: https://reviews.llvm.org/D92824
      e5bf2e89
    • Sander de Smalen's avatar
      [LoopVectorizer][SVE] Vectorize a simple loop with a scalable VF. · d568cff6
      Sander de Smalen authored
      * Steps are scaled by `vscale`, a runtime value.
      * Changes to circumvent the cost-model for now (temporary)
        so that the cost-model can be implemented separately.
      
      This can vectorize the following loop [1]:
      
         void loop(int N, double *a, double *b) {
           #pragma clang loop vectorize_width(4, scalable)
           for (int i = 0; i < N; i++) {
             a[i] = b[i] + 1.0;
           }
         }
      
      [1] This source-level example is based on the pragma proposed
      separately in D89031. This patch only implements the LLVM part.
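
      To illustrate what a step scaled by `vscale` means, here is a scalar model of the loop above. `loopModel` and its `VScale` parameter are hypothetical: the real transformation emits vector IR, and `vscale` is a runtime value fixed by the hardware rather than a function argument:

      ```cpp
      #include <cassert>

      // Scalar model: each "vector" iteration covers 4 * VScale elements,
      // matching vectorize_width(4, scalable) in the source example.
      void loopModel(int N, double *A, const double *B, unsigned VScale) {
        assert(VScale >= 1 && "vscale is a positive runtime constant");
        const int Step = 4 * static_cast<int>(VScale);
        int I = 0;
        // Model of the vector body: the induction variable advances by Step.
        for (; I + Step <= N; I += Step)
          for (int Lane = 0; Lane < Step; ++Lane)
            A[I + Lane] = B[I + Lane] + 1.0;
        // Scalar remainder for the elements that do not fill a whole vector.
        for (; I < N; ++I)
          A[I] = B[I] + 1.0;
      }
      ```

      The result is the same for any `VScale`; only the number of elements handled per iteration of the main loop changes.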
      
      Reviewed By: dmgreen
      
      Differential Revision: https://reviews.llvm.org/D91077
      d568cff6
    • Sander de Smalen's avatar
      [LoopVectorizer] NFC: Remove unnecessary asserts that VF cannot be scalable. · adc37145
      Sander de Smalen authored
      This patch removes a number of asserts that VF is not scalable, even though
      the code where this assert lives does nothing that prevents VF being scalable.
      
      Reviewed By: dmgreen
      
      Differential Revision: https://reviews.llvm.org/D91060
      adc37145
    • Joe Ellis's avatar
      [SelectionDAG] Add llvm.vector.{extract,insert} intrinsics · 80c33de2
      Joe Ellis authored
      This commit adds two new intrinsics.
      
      - llvm.experimental.vector.insert: used to insert a vector into another
        vector starting at a given index.
      
      - llvm.experimental.vector.extract: used to extract a subvector from a
        larger vector starting from a given index.
      
      The codegen work for these intrinsics has already been completed; this
      commit is simply exposing the existing ISD nodes to LLVM IR.
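
      The behaviour of the two intrinsics can be modelled on plain arrays as follows. This sketch covers fixed-length vectors only, and `extractSubvector` and `insertSubvector` are illustrative helpers rather than LLVM APIs:

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <vector>

      // Models extracting Len consecutive elements of Vec starting at Idx.
      std::vector<int> extractSubvector(const std::vector<int> &Vec, size_t Idx,
                                        size_t Len) {
        assert(Idx + Len <= Vec.size() && "subvector must lie inside the source");
        return std::vector<int>(Vec.begin() + Idx, Vec.begin() + Idx + Len);
      }

      // Models overwriting part of Vec with Sub starting at Idx.
      std::vector<int> insertSubvector(std::vector<int> Vec,
                                       const std::vector<int> &Sub, size_t Idx) {
        assert(Idx + Sub.size() <= Vec.size() && "subvector must fit");
        for (size_t I = 0; I < Sub.size(); ++I)
          Vec[Idx + I] = Sub[I];
        return Vec;
      }
      ```

      For example, extracting four elements at index 4 from an eight-element vector returns its upper half, and inserting a two-element vector at index 2 overwrites elements 2 and 3.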
      
      Reviewed By: cameron.mcinally
      
      Differential Revision: https://reviews.llvm.org/D91362
      80c33de2
    • Philip Reames's avatar
      [indvars] Common a bit of code [NFC] · 5171b7b4
      Philip Reames authored
      5171b7b4
  5. Dec 08, 2020
  6. Dec 07, 2020
  7. Dec 06, 2020
    • Wenlei He's avatar
      [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining · 6b989a17
      Wenlei He authored
      This change adds the context-sensitive sample PGO infrastructure described in the CSSPGO RFC (https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s). It introduces an abstraction between the input profile and the profile loader that queries the input profile for functions. Specifically, there's now the notion of base profile and context profile, and they are managed by the new SampleContextTracker for adjusting and merging profiles based on inline decisions. It works with the top-down profile-guided inliner in the profile loader (https://reviews.llvm.org/D70655) for better inlining with specialization and better post-inline profile fidelity. In the future, we can also expose this infrastructure to the CGSCC inliner so that it can take advantage of context-sensitive profiles. This change is the consumption part of the context-sensitive profile (the generation part is in this stack: https://reviews.llvm.org/D89707). We've seen good results internally in conjunction with Pseudo-probe (https://reviews.llvm.org/D86193). Patches for integration with Pseudo-probe are coming up soon.
      
      Currently the new infrastructure kicks in when the input profile contains the new context-sensitive profile; otherwise it is a no-op and does not affect existing AutoFDO.
      
      **Interface**
      
      There're two sets of interfaces for query and tracking respectively exposed from SampleContextTracker. For query, now instead of simply getting a profile from input for a function, we can explicitly query base profile or context profile for given call path of a function. For tracking, there're separate APIs for marking context profile as inlined, or promoting and merging not inlined context profile.
      
      - Query base profile (`getBaseSamplesFor`)
      Base profile is the merged synthetic profile for function's CFG profile from any outstanding (not inlined) context. We can query base profile by function.
      
      - Query context profile (`getContextSamplesFor`)
      Context profile is a function's CFG profile for a given calling context. We can query context profile by context string.
      
      - Track inlined context profile (`markContextSamplesInlined`)
      When a function is inlined for given calling context, we need to mark the context profile for that context as inlined. This is to make sure we don't include inlined context profile when synthesizing base profile for that inlined function.
      
      - Track not-inlined context profile (`promoteMergeContextSamplesTree`)
      When a function is not inlined for a given calling context, we need to promote the context profile tree so the not-inlined context becomes a top-level context. This preserves the sub-contexts under that function so later inline decisions for that not-inlined function will still have context profiles for its call tree. Note that profiles will be merged if needed when promoting a context profile tree if any of the nodes already exists at its promoted destination.
      
      **Implementation**
      
      Implementation-wise, `SampleContext` is created as an abstraction for context. Currently it's a string for the call path, and we can later optimize it to something more efficient, e.g. a context id. Each `SampleContext` also has a `ContextState` indicating whether it's a raw context profile from input, whether it's inlined or merged, and whether it's a synthetic profile created by the compiler. Each `FunctionSamples` now has a `SampleContext` that tells whether it's a base profile or a context profile, and for a context profile what the context and state are.
      
      On top of the above context representation, a custom trie is implemented to track and manage context profiles. Specifically, `SampleContextTracker` encapsulates a trie with `ContextTrieNode` as its node type. Each node of the trie represents a frame in a calling context, so the path from the root to a node represents a valid calling context. We also track `FunctionSamples` for each node, so this trie can serve context profile queries efficiently. Accordingly, context profile tree promotion becomes moving a subtree to be directly under the root of the entire tree, merging nodes of the subtree if the move encounters existing nodes.
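
      A miniature version of such a context trie, with query and promote-and-merge operations, might look like this. `TrieNode` and `ContextTrie` are hypothetical stand-ins, a plain sample count replaces `FunctionSamples`, and nothing here reflects the actual `SampleContextTracker` interface:

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <map>
      #include <memory>
      #include <string>
      #include <utility>
      #include <vector>

      struct TrieNode {
        uint64_t Samples = 0; // stand-in for FunctionSamples
        std::map<std::string, std::unique_ptr<TrieNode>> Children; // frame -> subtree
      };

      class ContextTrie {
      public:
        // Adds samples for a context given as frames from outermost caller to leaf.
        void addContext(const std::vector<std::string> &Path, uint64_t Samples) {
          assert(!Path.empty() && "a context needs at least one frame");
          TrieNode *N = &Root;
          for (const std::string &Frame : Path)
            N = &child(*N, Frame);
          N->Samples += Samples;
        }

        // Returns the samples recorded for exactly this context, or 0 if absent.
        uint64_t query(const std::vector<std::string> &Path) const {
          const TrieNode *N = &Root;
          for (const std::string &Frame : Path) {
            auto It = N->Children.find(Frame);
            if (It == N->Children.end())
              return 0;
            N = It->second.get();
          }
          return N->Samples;
        }

        // Moves the subtree at Path directly under the root, merging sample
        // counts into any nodes that already exist at the destination.
        void promote(const std::vector<std::string> &Path) {
          assert(Path.size() >= 2 && "top-level contexts are already promoted");
          TrieNode *Parent = &Root;
          for (size_t I = 0; I + 1 < Path.size(); ++I) {
            auto It = Parent->Children.find(Path[I]);
            if (It == Parent->Children.end())
              return; // nothing to promote
            Parent = It->second.get();
          }
          auto It = Parent->Children.find(Path.back());
          if (It == Parent->Children.end())
            return;
          std::unique_ptr<TrieNode> Sub = std::move(It->second);
          Parent->Children.erase(It);
          merge(child(Root, Path.back()), *Sub);
        }

      private:
        static TrieNode &child(TrieNode &N, const std::string &Frame) {
          std::unique_ptr<TrieNode> &Slot = N.Children[Frame];
          if (!Slot)
            Slot.reset(new TrieNode());
          return *Slot;
        }
        static void merge(TrieNode &Dst, TrieNode &Src) {
          Dst.Samples += Src.Samples;
          for (auto &KV : Src.Children)
            merge(child(Dst, KV.first), *KV.second);
        }
        TrieNode Root;
      };
      ```

      Promoting the context `main -> foo` detaches that subtree from `main` and folds its counts into the top-level `foo` node and its children.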
      
      **Integration**
      
      `SampleContextTracker` is now also integrated with AutoFDO, `SampleProfileReader` and `SampleProfileLoader`. When we detect that the input profile contains a context-sensitive profile, `SampleContextTracker` is used to track profiles, and all profile queries go to `SampleContextTracker` instead of `SampleProfileReader` automatically. Tracking APIs are called automatically for each inline decision from `SampleProfileLoader`.
      
      Differential Revision: https://reviews.llvm.org/D90125
      6b989a17
    • Kazu Hirata's avatar
      [InstCombine] Remove replacePointer (NFC) · ddb002d7
      Kazu Hirata authored
      The declaration was introduced on Feb 10, 2017 in commit
      ba01ed00 without a corresponding
      definition.
      ddb002d7
    • Fangrui Song's avatar
      [MemProf] Make __memprof_shadow_memory_dynamic_address dso_local in static relocation model · 204d0d51
      Fangrui Song authored
      The x86-64 backend currently has a bug that uses the wrong register for the GOTPCREL reference.
      The program will crash without the dso_local specifier.
      204d0d51