  1. Jun 01, 2020
    • Hiroshi Yamauchi's avatar
      [PGO] Improve the working set size heuristics under the partial sample PGO. · 6c27c61d
      Hiroshi Yamauchi authored
      Summary:
      The working set size heuristics (ProfileSummaryInfo::hasHugeWorkingSetSize)
      under the partial sample PGO may not be accurate because the profile is partial
      and the number of hot profile counters in the ProfileSummary may not reflect the
      actual working set size of the program being compiled.
      
      To improve this, the (approximated) ratio of the number of profile counters
      of the program being compiled to the number of profile counters in the partial
      sample profile is computed (which is called the partial profile ratio) and the
      working set size of the profile is scaled by this ratio to reflect the working
      set size of the program being compiled and used for the working set size
      heuristics.
      
      The partial profile ratio is approximated based on the number of the basic
      blocks in the program and the NumCounts field in the ProfileSummary and computed
      through the thin LTO indexing. This means that there is the limitation that the
      scaled working set size is available to the thin LTO post link passes only.
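
      A minimal sketch of the scaling described above, not the code from this
      patch. The helper names, the threshold parameter, and the numbers in the
      usage note are hypothetical; only the idea (scale the profile's hot counter
      count by the ratio of program size to profile size) comes from the summary.

      ```cpp
      #include <cassert>
      #include <cstdint>

      // Hypothetical helper: approximate the partial profile ratio as
      // (basic blocks in the program) / (counters in the partial profile).
      double approxPartialProfileRatio(std::uint64_t NumBlocksInProgram,
                                       std::uint64_t NumCountsInProfile) {
        assert(NumCountsInProfile != 0 && "profile must contain counters");
        return static_cast<double>(NumBlocksInProgram) /
               static_cast<double>(NumCountsInProfile);
      }

      // Hypothetical helper: scale the profile's hot counter count by the
      // ratio and compare the result against a working set size threshold.
      bool hasHugeScaledWorkingSet(std::uint64_t NumHotCounts, double Ratio,
                                   std::uint64_t Threshold) {
        auto Scaled = static_cast<std::uint64_t>(
            static_cast<double>(NumHotCounts) * Ratio);
        return Scaled > Threshold;
      }
      ```

      With 4000 blocks in the program and 1000 counters in the profile, the ratio
      is 4.0, so 5000 hot counters are treated as 20000 when compared against the
      threshold.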
      
      Reviewers: davidxl
      
      Subscribers: mgorny, eraman, hiraditya, steven_wu, dexonsmith, arphaman, dang, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D79831
      6c27c61d
  2. May 29, 2020
  3. May 27, 2020
  4. May 26, 2020
  5. May 25, 2020
  6. May 24, 2020
    • Sanjay Patel's avatar
      [Pass Manager] remove EarlyCSE as clean-up for VectorCombine · 57bb4787
      Sanjay Patel authored
      EarlyCSE was added with D75145, but the motivating test is
      not regressed by removing the extra pass now. That might be
      because VectorCombine altered the way it processes instructions,
      or it might be from (re)moving VectorCombine in the pipeline.
      
      The extra round of EarlyCSE appears to cost approximately
      0.26% in compile-time as discussed in D80236, so we need some
      evidence to justify its inclusion here, but we do not have
      that (yet).
      
      I suspect that between SLP and VectorCombine, we are creating
      patterns that InstCombine and/or codegen are not prepared for,
      but we will need to reduce those examples and include them as
      PhaseOrdering and/or test-suite benchmarks.
      57bb4787
  7. May 23, 2020
    • Craig Topper's avatar
      [Align] Remove operations on MaybeAlign that asserted that it had a defined value. · 7392820f
      Craig Topper authored
      If the caller needs to be responsible for making sure the MaybeAlign
      has a value, then we should just make the caller convert it to an Align
      with operator*.
      
      I explicitly deleted the relational comparison operators that
      were being inherited from Optional. It's unclear what a comparison
      of two MaybeAligns should mean when one is defined and the other
      isn't, so make the caller responsible for defining the behavior.
      
      I left the ==/!= operators from Optional. That exposed a weird
      quirk: ==/!= between Align and MaybeAlign used to require the
      MaybeAlign to be defined, but now we use the operator== from
      Optional that takes an Optional and the Value.
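
      A rough illustration of the calling pattern, using std::optional as a
      stand-in for llvm::Optional and a simplified Align. Both types and the
      isAtLeast helper are assumptions made for this sketch, not the LLVM
      classes themselves.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <optional>

      // Simplified stand-in for llvm::Align: a non-zero power-of-two byte count.
      struct Align {
        std::uint64_t Bytes;
        explicit Align(std::uint64_t B) : Bytes(B) {
          assert(B != 0 && (B & (B - 1)) == 0 && "alignment must be a power of 2");
        }
      };
      inline bool operator==(Align L, Align R) { return L.Bytes == R.Bytes; }

      // Stand-in for llvm::MaybeAlign: an alignment that may be undefined.
      using MaybeAlign = std::optional<Align>;

      // Hypothetical caller: it decides what an undefined alignment means
      // (here: 1 byte) and converts to Align with operator* before comparing.
      bool isAtLeast(MaybeAlign MA, Align Required) {
        Align Known = MA ? *MA : Align(1);
        return Known.Bytes >= Required.Bytes;
      }
      ```

      The equality operator that takes an optional and a value treats an
      undefined MaybeAlign as unequal to any Align instead of asserting.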
      
      Differential Revision: https://reviews.llvm.org/D80455
      7392820f
  8. May 22, 2020
    • Sanjay Patel's avatar
      [VectorCombine] position pass after SLP in the optimization pipeline rather than before · 6438ea45
      Sanjay Patel authored
      There are 2 known problem patterns shown in the test diffs here:
      vector horizontal ops (an x86 specialization) and vector reductions.
      
      SLP has greater ability to match and fold those than vector-combine,
      so let SLP have first chance at that.
      
      This is a quick fix while we continue to improve vector-combine and
      possibly canonicalize to reduction intrinsics.
      
      In the longer term, we should improve matching of these patterns
      because if they were created in the "bad" forms shown here, then we
      would miss optimizing them.
      
      I'm not sure what is happening with alias analysis on the addsub test.
      The old pass manager now shows an extra line for that, and we see an
      improvement that comes from SLP vectorizing a store. I don't know
      what's missing with the new pass manager to make that happen.
      Strangely, I can't reproduce the behavior if I compile from C++ with
      clang and invoke the new PM with "-fexperimental-new-pass-manager".
      
      Differential Revision: https://reviews.llvm.org/D80236
      6438ea45
  9. May 21, 2020
    • Eli Friedman's avatar
      Make Value::getPointerAlignment() return an Align, not a MaybeAlign. · f26bdb53
      Eli Friedman authored
      If we don't know anything about the alignment of a pointer, Align(1) is
      still correct: all pointers are at least 1-byte aligned.
      
      Included in this patch is a bugfix for an issue discovered during this
      cleanup: pointers with "dereferenceable" attributes/metadata were
      assumed to be aligned according to the type of the pointer.  This
      wasn't intentional, as far as I can tell, so Loads.cpp was fixed to
      stop making this assumption. Frontends may need to be updated.  I
      updated clang's handling of C++ references, and added a release note for
      this.
      
      Differential Revision: https://reviews.llvm.org/D80072
      f26bdb53
  10. May 20, 2020
    • Arthur Eubanks's avatar
      Reland [X86] Codegen for preallocated · 8a887556
      Arthur Eubanks authored
      See https://reviews.llvm.org/D74651 for the preallocated IR constructs
      and LangRef changes.
      
      In X86TargetLowering::LowerCall(), if a call is preallocated, record
      each argument's offset from the stack pointer and the total stack
      adjustment. Associate the call Value with an integer index. Store the
      info in X86MachineFunctionInfo with the integer index as the key.
      
      This adds two new target independent ISDOpcodes and two new target
      dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.
      
      The setup ISelDAG node takes in a chain and outputs a chain and a
      SrcValue of the preallocated call Value. It is lowered to a target
      dependent node with the SrcValue replaced with the integer index key by
      looking in X86MachineFunctionInfo. In
      X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
      %esp adjustment, the exact amount determined by looking in
      X86MachineFunctionInfo with the integer index key.
      
      The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
      call Value, and the arg index int constant. It produces a chain and the
      pointer to the arg. It is lowered to a target dependent node with the
      SrcValue replaced with the integer index key by looking in
      X86MachineFunctionInfo. In
      X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
      lea of the stack pointer plus an offset determined by looking in
      X86MachineFunctionInfo with the integer index key.
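
      The bookkeeping described above can be pictured with the following sketch.
      The class and method names are hypothetical and a plain pointer stands in
      for the call Value; this is not the X86MachineFunctionInfo interface.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <unordered_map>
      #include <utility>
      #include <vector>

      // Per-call record: total stack adjustment and each argument's offset
      // from the stack pointer.
      struct PreallocatedInfo {
        std::size_t StackSize = 0;
        std::vector<std::size_t> ArgOffsets;
      };

      class PreallocatedTable {
        std::unordered_map<const void *, std::size_t> IdForCall;
        std::vector<PreallocatedInfo> Infos;

      public:
        // Return the integer index for a call, assigning a new one on first use.
        std::size_t getId(const void *CallValue) {
          auto [It, Inserted] = IdForCall.try_emplace(CallValue, Infos.size());
          if (Inserted)
            Infos.emplace_back();
          return It->second;
        }

        // Called while lowering the call: remember the layout under the index.
        void record(std::size_t Id, std::size_t StackSize,
                    std::vector<std::size_t> ArgOffsets) {
          assert(Id < Infos.size() && "unknown preallocated id");
          Infos[Id].StackSize = StackSize;
          Infos[Id].ArgOffsets = std::move(ArgOffsets);
        }

        // Looked up when lowering the setup node (stack adjustment).
        std::size_t stackSize(std::size_t Id) const {
          assert(Id < Infos.size() && "unknown preallocated id");
          return Infos[Id].StackSize;
        }

        // Looked up when lowering an arg node (offset for the lea).
        std::size_t argOffset(std::size_t Id, std::size_t ArgIdx) const {
          assert(Id < Infos.size() && ArgIdx < Infos[Id].ArgOffsets.size());
          return Infos[Id].ArgOffsets[ArgIdx];
        }
      };
      ```

      Keying by a small integer means the later lowering steps only need the
      index, not the original call Value.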
      
      Force any function containing a preallocated call to use the frame
      pointer.
      
      Does not yet handle a setup without a call, or a conditional call.
      Does not yet handle musttail. That requires a LangRef change first.
      
      Tried to look at all references to inalloca and see if they apply to
      preallocated. I've made preallocated versions of tests testing inalloca
      whenever possible and when they make sense (e.g. not alloca related,
      inalloca edge cases).
      
      Aside from the tests added here, I checked that this codegen produces
      correct code for something like
      
      ```
      struct A {
              A();
              A(A&&);
              ~A();
      };
      
      void bar() {
              foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
      }
      ```
      
      by replacing the inalloca version of the .ll file with the appropriate
      preallocated code. Running the executable produces the same results as
      using the current inalloca implementation.
      
      Reverted due to unexpectedly passing tests, added REQUIRES: asserts for reland.
      
      Subscribers: hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D77689
      8a887556
    • Arthur Eubanks's avatar
      Revert "[X86] Codegen for preallocated" · b8cbff51
      Arthur Eubanks authored
      This reverts commit 810567dc.
      
      Some tests are unexpectedly passing
      b8cbff51
    • Arthur Eubanks's avatar
      [X86] Codegen for preallocated · 810567dc
      Arthur Eubanks authored
      See https://reviews.llvm.org/D74651 for the preallocated IR constructs
      and LangRef changes.
      
      In X86TargetLowering::LowerCall(), if a call is preallocated, record
      each argument's offset from the stack pointer and the total stack
      adjustment. Associate the call Value with an integer index. Store the
      info in X86MachineFunctionInfo with the integer index as the key.
      
      This adds two new target independent ISDOpcodes and two new target
      dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.
      
      The setup ISelDAG node takes in a chain and outputs a chain and a
      SrcValue of the preallocated call Value. It is lowered to a target
      dependent node with the SrcValue replaced with the integer index key by
      looking in X86MachineFunctionInfo. In
      X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
      %esp adjustment, the exact amount determined by looking in
      X86MachineFunctionInfo with the integer index key.
      
      The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
      call Value, and the arg index int constant. It produces a chain and the
      pointer to the arg. It is lowered to a target dependent node with the
      SrcValue replaced with the integer index key by looking in
      X86MachineFunctionInfo. In
      X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
      lea of the stack pointer plus an offset determined by looking in
      X86MachineFunctionInfo with the integer index key.
      
      Force any function containing a preallocated call to use the frame
      pointer.
      
      Does not yet handle a setup without a call, or a conditional call.
      Does not yet handle musttail. That requires a LangRef change first.
      
      Tried to look at all references to inalloca and see if they apply to
      preallocated. I've made preallocated versions of tests testing inalloca
      whenever possible and when they make sense (e.g. not alloca related,
      inalloca edge cases).
      
      Aside from the tests added here, I checked that this codegen produces
      correct code for something like
      
      ```
      struct A {
              A();
              A(A&&);
              ~A();
      };
      
      void bar() {
              foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
      }
      ```
      
      by replacing the inalloca version of the .ll file with the appropriate
      preallocated code. Running the executable produces the same results as
      using the current inalloca implementation.
      
      Subscribers: hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D77689
      810567dc
  11. May 18, 2020
  12. May 16, 2020
  13. May 15, 2020
  14. May 14, 2020
  15. May 13, 2020
  16. May 12, 2020
  17. May 11, 2020
    • Johannes Doerfert's avatar
      [Attributor][FIX] Disallow function signature rewrite for casted calls · 8d94d3c3
      Johannes Doerfert authored
      We will now ensure the return type of the called function is the type
      of all call sites we are going to rewrite. This avoids a problem
      partially fixed by D79680. The part that was not covered is a use of
      this "weird" casted call site (see `@func3` in `misc_crash.ll`).
      
      misc_crash.ll checks are auto-generated now.
      8d94d3c3
    • Johannes Doerfert's avatar
      [Attributor] Make AAIsDead dependences optional to prevent top state · c115a78f
      Johannes Doerfert authored
      We should never give up on AAIsDead as it guards other AAs from
      unreachable code (in which SSA properties are meaningless). We did
      however use required dependences on some queries in AAIsDead which
      caused us to invalidate AAIsDead if the queried AA got invalidated.
      We now use optional dependences instead. The bug that exposed this is
      added to the liveness.ll test and other test changes show the impact.
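
      A toy model of the difference between the two dependence kinds, under
      stated assumptions: the Node type and the invalidate function are
      hypothetical, and marking optional dependents for a later update is an
      assumption of this sketch rather than something stated in the message.

      ```cpp
      #include <cassert>
      #include <utility>
      #include <vector>

      enum class DepClass { Required, Optional };

      // Hypothetical stand-in for an abstract attribute in a dependence graph.
      struct Node {
        bool Valid = true;
        bool NeedsUpdate = false;
        // Attributes that queried this one, with the kind of each dependence.
        std::vector<std::pair<Node *, DepClass>> Dependents;
      };

      // Invalidate N. Required dependents are invalidated as well; optional
      // dependents stay valid and are only marked for another update.
      void invalidate(Node &N) {
        if (!N.Valid)
          return; // Already handled; this also stops cycles.
        N.Valid = false;
        for (auto [Dep, Kind] : N.Dependents) {
          assert(Dep && "dependents must be non-null");
          if (Kind == DepClass::Required)
            invalidate(*Dep);
          else
            Dep->NeedsUpdate = true;
        }
      }
      ```

      In this model, a liveness node that depends only optionally on a queried
      node survives when the queried node is invalidated.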
      
      Bug report by @sdmitriev.
      c115a78f
    • Johannes Doerfert's avatar
      [Attributor] Force update of "newly live" abstract attributes · c86fd333
      Johannes Doerfert authored
      During an update of AAIsDead, new instructions become live. If we query
      information from them, the result is often just the initial state, e.g.,
      for call site `noreturn` and `nounwind`. We will now trigger an update
      for cached attributes during the AAIsDead update, though other AAs might
      later use the same API.
      c86fd333
    • Mircea Trofin's avatar
      [llvm][NFC] Move inlining decision-related APIs in InliningAdvisor. · 48fa355e
      Mircea Trofin authored
      Summary: Factoring out in preparation for https://reviews.llvm.org/D79042
      
      Reviewers: dblaikie, davidxl
      
      Subscribers: mgorny, eraman, hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D79613
      48fa355e
    • Sergey Dmitriev's avatar
      [Attributor] Fix for a crash on RAUW when rewriting function signature · 3df40007
      Sergey Dmitriev authored
      Reviewers: jdoerfert, sstefan1, uenoku
      
      Reviewed By: uenoku
      
      Subscribers: hiraditya, uenoku, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D79680
      3df40007
    • OCHyams's avatar
      [NFC][DwarfDebug] Add test for variables with a single location which · da100de0
      OCHyams authored
      don't span their entire scope.
      
      The previous commit (6d1c40c1) is an older version of the test.
      
      Reviewed By: aprantl, vsk
      
      Differential Revision: https://reviews.llvm.org/D79573
      da100de0
    • Xun Li's avatar
      Remove an unused Module param · 44e5aaf9
      Xun Li authored
      Summary:
      In D65848 the function getFuncNameInModule was refactored to no longer use module.
      This diff removes the parameter and renames the function to avoid confusion.
      
      Reviewers: wenlei, wmi, davidxl
      
      Reviewed By: wenlei
      
      Subscribers: hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D79310
      44e5aaf9
    • Johannes Doerfert's avatar
      [Attributor] Merge the query set into AbstractAttribute · 3a8740bd
      Johannes Doerfert authored
      The old QuerriedAAs contained two vectors, one for required and one for
      optional dependences (=queries). We now use a single vector and encode
      the kind directly in the pointer.
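
      One common way to encode a kind in a pointer is to store it in the low bit,
      which is free when the pointee is at least 2-byte aligned. The sketch below
      illustrates that general technique only; the type names are hypothetical
      and it is not claimed to match the representation used in this patch.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <vector>

      // Hypothetical stand-in for the Attributor's abstract attribute class.
      struct AbstractAttribute {
        int Id;
      };
      static_assert(alignof(AbstractAttribute) >= 2,
                    "need a free low bit for the tag");

      enum class DepKind : std::uintptr_t { Required = 0, Optional = 1 };

      // A pointer to an attribute with the dependence kind in its low bit.
      class TaggedDependence {
        std::uintptr_t Bits;

      public:
        TaggedDependence(AbstractAttribute *AA, DepKind Kind)
            : Bits(reinterpret_cast<std::uintptr_t>(AA) |
                   static_cast<std::uintptr_t>(Kind)) {
          assert((reinterpret_cast<std::uintptr_t>(AA) & 1u) == 0 &&
                 "pointer must be at least 2-byte aligned");
        }

        // Mask off the tag bit to recover the original pointer.
        AbstractAttribute *getPointer() const {
          return reinterpret_cast<AbstractAttribute *>(Bits &
                                                       ~std::uintptr_t(1));
        }

        DepKind getKind() const { return static_cast<DepKind>(Bits & 1u); }
      };

      // A single list replaces separate "required" and "optional" vectors.
      using DependenceList = std::vector<TaggedDependence>;
      ```

      Each entry is the size of one pointer, so one list carries the same
      information that previously needed two containers.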
      
      This reduces memory consumption and makes the connection between
      abstract attributes and their dependences clearer.
      
      No functional change is intended; changes in the test are due to a
      different order in the query map. Neither the old nor the new order is
      in any way special.
      
      ---
      
      Single run of the Attributor module and then CGSCC pass (oldPM)
      for SPASS/clause.c (~10k LLVM-IR loc):
      
      Before:
      ```
      calls to allocation functions: 543734 (329735/s)
      temporary memory allocations: 105895 (64217/s)
      peak heap memory consumption: 19.19MB
      peak RSS (including heaptrack overhead): 102.26MB
      total memory leaked: 269.10KB
      ```
      
      After:
      ```
      calls to allocation functions: 513292 (341511/s)
      temporary memory allocations: 106028 (70544/s)
      peak heap memory consumption: 13.35MB
      peak RSS (including heaptrack overhead): 95.64MB
      total memory leaked: 269.10KB
      ```
      
      Difference:
      ```
      calls to allocation functions: -30442 (208506/s)
      temporary memory allocations: 133 (-910/s)
      peak heap memory consumption: -5.84MB
      peak RSS (including heaptrack overhead): 0B
      total memory leaked: 0B
      ```
      
      ---
      
      Reviewed By: uenoku
      
      Differential Revision: https://reviews.llvm.org/D78729
      3a8740bd
    • Johannes Doerfert's avatar
      [Attributor][FIX] Carefully handle/ignore/forget `argmemonly` · 5e06b251
      Johannes Doerfert authored
      When we have an existing `argmemonly` or `inaccessiblemem_or_argmemonly`
      we used to "know" that information. However, interprocedural constant
      propagation can invalidate these attributes. We now ignore and remove
      these attributes for internal functions (which may be affected by IP
      constant propagation), if we are deriving new attributes for the
      function.
      5e06b251
    • Johannes Doerfert's avatar
      [Attributor] Use "simplify to constant" in genericValueTraversal · 713ee3aa
      Johannes Doerfert authored
      As we replace values with constants interprocedurally, we also need to
      do this "look-through" step during the generic value traversal or we
      would derive properties from replaced values. While this is often not
      problematic, it is when we use the "kind" of a value for reasoning,
      e.g., accesses to arguments allow `argmemonly`.
      713ee3aa
    • Johannes Doerfert's avatar
      [Attributor] Ignore illegal accesses to `null` · 513ac6e9
      Johannes Doerfert authored
      When we categorize a pointer value we bailed at `null` before. If we
      know `null` is not a valid memory location we can ignore it as there
      won't be an access at all.
      513ac6e9
    • Johannes Doerfert's avatar
      [Attributor] Use existing helpers to determine IR facts · 31c03b92
      Johannes Doerfert authored
      We now use getPointerDereferenceableBytes to determine `nonnull` and
      `dereferenceable` facts from the IR. We also use getPointerAlignment in
      AAAlign for the same reason. The latter can interfere with callbacks so
      we do restrict it to non-function-pointers for now.
      31c03b92
    • Johannes Doerfert's avatar
      a9ee8b49
  18. May 08, 2020
    • Johannes Doerfert's avatar
      [Attributor][FIX] Record dependences for assumed dead abstract attributes · edf03914
      Johannes Doerfert authored
      In a recent patch we introduced a problem with abstract attributes that
      were assumed dead at some point. Since `Attributor::updateAA` was
      introduced in 95e0d28b, we did not
      remember the dependence on the liveness AA when an abstract attribute
      was assumed dead and therefore not updated.
      
      Explicit reproducer added in liveness.ll.
      
      ---
      
      Single run of the Attributor module and then CGSCC pass (oldPM)
      for SPASS/clause.c (~10k LLVM-IR loc):
      
      Before:
      ```
      calls to allocation functions: 509242 (345483/s)
      temporary memory allocations: 98666 (66937/s)
      peak heap memory consumption: 18.60MB
      peak RSS (including heaptrack overhead): 103.29MB
      total memory leaked: 269.10KB
      ```
      
      After:
      ```
      calls to allocation functions: 529332 (355494/s)
      temporary memory allocations: 102107 (68574/s)
      peak heap memory consumption: 19.40MB
      peak RSS (including heaptrack overhead): 102.79MB
      total memory leaked: 269.10KB
      ```
      
      Difference:
      ```
      calls to allocation functions: 20090 (1339333/s)
      temporary memory allocations: 3441 (229400/s)
      peak heap memory consumption: 801.45KB
      peak RSS (including heaptrack overhead): 0B
      total memory leaked: 0B
      ```
      edf03914
    • Johannes Doerfert's avatar
      675334da