  1. Oct 13, 2015
  2. Oct 12, 2015
    • Update the branch weight metadata in JumpThreading pass. · 3320bcd8
      Cong Hou authored
      In JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB). 
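
      Here is a minimal C sketch of the scaling rule. The helper name `scaleThreadedEdgeWeight` and the use of plain integer frequencies are assumptions made for illustration. This is not the LLVM BlockFrequencyInfo or branch-weight metadata API. The sketch assumes that all of the PredBB->BB flow continued to SuccBB, which is what allows the edge to be threaded. It also ignores overflow of the intermediate product.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Hypothetical helper: after threading PredBB->BB->SuccBB, the flow that
         used to enter BB from PredBB no longer reaches SuccBB through BB, so the
         weight on BB->SuccBB is scaled by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB). */
      static uint64_t scaleThreadedEdgeWeight(uint64_t weight,
                                              uint64_t freqPredToBB,
                                              uint64_t freqBBToSucc) {
        if (freqBBToSucc == 0)
          return weight; /* no frequency data: leave the weight unchanged */
        if (freqPredToBB >= freqBBToSucc)
          return 0; /* all of the edge's flow came from PredBB */
        /* weight * (1 - p/q) == weight * (q - p) / q, computed in integers. */
        return weight * (freqBBToSucc - freqPredToBB) / freqBBToSucc;
      }
      ```

      For example, a weight of 100 with Freq(PredBB->BB) = 30 and Freq(BB->SuccBB) = 60 is scaled by 1 - 30/60, which gives 50.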
      
      Differential revision: http://reviews.llvm.org/D10979
      
      llvm-svn: 250089
    • GlobalOpt does not treat externally_initialized globals correctly · 939724cd
      Oliver Stannard authored
      GlobalOpt currently merges stores into the initialisers of internal,
      externally_initialized globals, but should not do so as the value of the global
      may change between the initialiser and any code in the module being run.
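
      The rule reduces to a small predicate. In the hypothetical C sketch below, `struct GlobalInfo` and `canMergeStoresIntoInitializer` are illustrative names and do not come from GlobalOpt's code. The sketch captures only the condition stated above: an externally_initialized global must keep its stores even when it is internal.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Hypothetical summary of the properties of a global variable. */
      struct GlobalInfo {
        bool isInternal;            /* internal linkage: all uses are visible */
        bool externallyInitialized; /* value may change before module code runs */
      };

      /* Folding stores into the initializer is only valid if nothing outside the
         module can change the global's value between its initializer and the
         first code in the module that runs. */
      static bool canMergeStoresIntoInitializer(const struct GlobalInfo *g) {
        return g->isInternal && !g->externallyInitialized;
      }
      ```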
      
      llvm-svn: 250035
    • [LoopVectorize] Shrink integer operations into the smallest type possible · 55d633bd
      James Molloy authored
      C semantics force sub-int-sized values (e.g. i8, i16) to be promoted to int
      type (e.g. i32) whenever arithmetic is performed on them.
      
      For targets with native i8 or i16 operations, usually InstCombine can shrink
      the arithmetic type down again. However InstCombine refuses to create illegal
      types, so for targets without i8 or i16 registers, the lengthening and
      shrinking remains.
      
      Most SIMD ISAs (e.g. NEON) however support vectors of i8 or i16 even when
      their scalar equivalents do not, so during vectorization it is important to
      remove these lengthens and truncates when deciding the profitability of
      vectorization.
      
      The algorithm this uses starts at truncs and icmps, trawling their use-def
      chains until they terminate or instructions outside the loop are found (or
      unsafe instructions like inttoptr casts are found). If the use-def chains
      starting from different root instructions (truncs/icmps) meet, they are
      unioned. The demanded bits of each node in the graph are ORed together to form
      an overall mask of the demanded bits in the entire graph. The minimum bitwidth
      that graph can be truncated to is the bitwidth minus the number of leading
      zeroes in the overall mask.
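
      The bitwidth computation can be sketched in C with a union-find over graph nodes. This is a simplified, hypothetical model. `struct BitGraph`, `unionNodes` and `minBitwidth` are illustrative names. The demanded-bits masks are passed in as input instead of being computed by a DemandedBits analysis. Every value is assumed to be 32 bits wide. Any further rounding or legality checks that the vectorizer performs are not modeled.

      ```c
      #include <assert.h>
      #include <stdint.h>

      #define MAX_NODES 32

      /* Hypothetical use-def graph: union-find parents plus the demanded-bits
         mask for each node (supplied by the caller in this sketch). */
      struct BitGraph {
        int parent[MAX_NODES];
        uint32_t demanded[MAX_NODES];
        int n;
      };

      static void initGraph(struct BitGraph *g, int n, const uint32_t *masks) {
        assert(n <= MAX_NODES);
        g->n = n;
        for (int i = 0; i < n; ++i) {
          g->parent[i] = i;
          g->demanded[i] = masks[i];
        }
      }

      static int findRoot(struct BitGraph *g, int x) {
        while (g->parent[x] != x) {
          g->parent[x] = g->parent[g->parent[x]]; /* path halving */
          x = g->parent[x];
        }
        return x;
      }

      /* Use-def chains from different roots that meet are unioned. */
      static void unionNodes(struct BitGraph *g, int a, int b) {
        int ra = findRoot(g, a), rb = findRoot(g, b);
        if (ra != rb)
          g->parent[rb] = ra;
      }

      static int countLeadingZeros32(uint32_t x) {
        int n = 0;
        for (uint32_t bit = 0x80000000u; bit != 0 && !(x & bit); bit >>= 1)
          ++n;
        return n;
      }

      /* OR the demanded masks of every node in the set that contains `node`,
         then subtract the leading zeros of that mask from the 32-bit width. */
      static int minBitwidth(struct BitGraph *g, int node) {
        int root = findRoot(g, node);
        uint32_t mask = 0;
        for (int i = 0; i < g->n; ++i)
          if (findRoot(g, i) == root)
            mask |= g->demanded[i];
        return 32 - countLeadingZeros32(mask);
      }
      ```

      Suppose two unioned nodes have masks 0xFF and 0x3FF. The shared mask is 0x3FF, which has 22 leading zeros, so both nodes can be narrowed to 10 bits. An unconnected node with mask 0xF can be narrowed to 4 bits.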
      
      The intention is that this algorithm should "first do no harm", so it will
      never insert extra cast instructions. This is why the use-def graphs are
      unioned, so that subgraphs with different minimum bitwidths do not need casts
      inserted between them.
      
      This algorithm works hard to reduce compile time impact. DemandedBits are only
      queried if there are extends of illegal types and if a truncate to an illegal
      type is seen. In the general case, this results in a simple linear scan of the
      instructions in the loop.
      
      No non-noise compile time impact was seen on a clang bootstrap build.
      
      llvm-svn: 250032
  3. Oct 11, 2015
  4. Oct 10, 2015
  5. Oct 09, 2015
    • Generalize convergent check to handle invokes as well as calls. · 97ca0f3f
      Owen Anderson authored
      llvm-svn: 249892
    • Teach LoopUnswitch not to perform non-trivial unswitching on loops containing convergent operations. · 2c9978b1
      Owen Anderson authored
      
      Doing so could cause the post-unswitching convergent ops to be
      control-dependent on the unswitch condition where they were not before.
      This check could be refined to allow unswitching where the convergent
      operation was already control-dependent on the unswitch condition.
      
      llvm-svn: 249874
    • Refine the definition of convergent to only disallow the addition of new control dependencies. · d95b08a0
      Owen Anderson authored
      This covers the common case of operations that cannot be sunk.
      Operations that cannot be hoisted should already be handled properly via
      the safe-to-speculate rules and mechanisms.
      
      llvm-svn: 249865
    • Make HeaderLineno a local variable. · 41dc5a6e
      Dehao Chen authored
      http://reviews.llvm.org/D13576
      
      Because we are using a hierarchical profile, there is no need to keep HeaderLineno as a member variable. Each level of the inline stack has its own header lineno, and each level should use the header lineno of its own inline stack level instead of the one of the actual symbol.
      
      llvm-svn: 249848
    • [MemCpyOpt] Fix wrong merging of adjacent nontemporal stores into memset calls. · 99493df2
      Andrea Di Biagio authored
      Pass MemCpyOpt doesn't check if a store instruction is nontemporal.
      As a consequence, adjacent nontemporal stores are always merged into a
      memset call.
      
      Example:
      
      ;;;
      define void @foo(<4 x float>* nocapture %p) {
      entry:
        store <4 x float> zeroinitializer, <4 x float>* %p, align 16, !nontemporal !0
        %p1 = getelementptr inbounds <4 x float>, <4 x float>* %p, i64 1
        store <4 x float> zeroinitializer, <4 x float>* %p1, align 16, !nontemporal !0
        ret void
      }
      
      !0 = !{i32 1}
      ;;;
      
      In this example, the two nontemporal stores are combined into a memset of zero,
      which does not preserve the nontemporal hint. Later on, the backend (tested on an
      x86-64 corei7) expands that memset call into a sequence of two normal 16-byte
      aligned vector stores.
      
      opt -memcpyopt example.ll -S -o - | llc -mcpu=corei7 -o -
      
      Before:
        xorps  %xmm0, %xmm0
        movaps  %xmm0, 16(%rdi)
        movaps  %xmm0, (%rdi)
      
      With this patch, we no longer merge nontemporal stores into calls to memset.
      In this example, llc correctly expands the two stores into two movntps:
        xorps  %xmm0, %xmm0
        movntps %xmm0, 16(%rdi)
        movntps %xmm0, (%rdi)
      
      In theory, we could extend the usage of !nontemporal metadata to memcpy/memset
      calls. However a change like that would only have the effect of forcing the
      backend to expand !nontemporal memsets back to sequences of store instructions.
      A memset library call would not have exactly the same semantics as a builtin
      !nontemporal memset call. So, SelectionDAG will have to conservatively expand
      it back to a sequence of !nontemporal stores (effectively undoing the merging).
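
      A hedged sketch of the merge check follows. `struct StoreInfo` and `canMergeIntoMemset` are hypothetical names. The sketch requires stores in ascending, contiguous order, which is stricter than the range tracking that MemCpyOpt actually performs. It only shows where a nontemporal check rejects the merge.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Hypothetical summary of a store seen while scanning a basic block. */
      struct StoreInfo {
        long offset;        /* byte offset from the common base pointer */
        long size;          /* number of bytes written */
        unsigned char byte; /* the repeated byte, when isByteSplat is set */
        bool isByteSplat;   /* stored value is one byte repeated (e.g. zero) */
        bool nontemporal;   /* store carries !nontemporal metadata */
      };

      /* Can stores[0..n) become one memset? They must all splat the same byte
         over contiguous bytes, and none may be nontemporal, because a memset
         would drop the hint. */
      static bool canMergeIntoMemset(const struct StoreInfo *stores, size_t n) {
        if (n < 2)
          return false; /* nothing to merge */
        for (size_t i = 0; i < n; ++i) {
          if (stores[i].nontemporal || !stores[i].isByteSplat)
            return false;
          if (stores[i].byte != stores[0].byte)
            return false;
          if (i > 0 &&
              stores[i].offset != stores[i - 1].offset + stores[i - 1].size)
            return false;
        }
        return true;
      }
      ```

      In the example above, two zero stores at offsets 0 and 16 could be merged if they were ordinary stores. The !nontemporal flag on either store blocks the merge.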
      
      Differential Revision: http://reviews.llvm.org/D13519
      
      llvm-svn: 249820
    • [EarlyCSE] Address post commit review for r249523. · 859b2ac0
      Arnaud A. de Grandmaison authored
      llvm-svn: 249814
    • [RS4GC] Refactoring to make a later change easier, NFCI · 3c520a12
      Sanjoy Das authored
      Summary:
      These non-semantic changes will help make a later change adding
      support for deopt operand bundles more streamlined.
      
      Reviewers: reames, swaroop.sridhar
      
      Subscribers: sanjoy, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D13491
      
      llvm-svn: 249779
    • [PlaceSafepoints] Extract out `callsGCLeafFunction`, NFC · c21a05a3
      Sanjoy Das authored
      Summary:
      This will be used in a later change to RewriteStatepointsForGC.
      
      Reviewers: reames, swaroop.sridhar
      
      Subscribers: llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D13490
      
      llvm-svn: 249777
    • [RS4GC] Don't copy ADTs unnecessarily, NFCI · 1ede5367
      Sanjoy Das authored
      Summary: Use `const auto &` instead of `auto` in `makeStatepointExplicit`.
      
      Reviewers: reames, swaroop.sridhar
      
      Subscribers: llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D13454
      
      llvm-svn: 249776
  6. Oct 08, 2015
  7. Oct 07, 2015
    • [RS4GC] Use AssertingVH for RematerializedValueMapTy, NFCI · 40bdd041
      Sanjoy Das authored
      Reviewers: reames, swaroop.sridhar
      
      Subscribers: llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D13489
      
      llvm-svn: 249620
    • [IndVars] Preserve LCSSA in `eliminateIdentitySCEV` · 0015e5a0
      Sanjoy Das authored
      Summary:
      After r249211, SCEV can see through some LCSSA phis.  Add a
      `replacementPreservesLCSSAForm` check before replacing uses of these phi
      nodes with a simplified use of the induction variable to avoid breaking
      LCSSA.
      
      Fixes PR25047.
      
      Depends on D13460.
      
      Reviewers: atrick, hfinkel
      
      Subscribers: llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D13461
      
      llvm-svn: 249575
    • [EarlyCSE] Fix handling of target memory intrinsics for CSE'ing loads. · a6178a17
      Arnaud A. de Grandmaison authored
      Summary:
      Some target intrinsics can access multiple elements, using the pointer as a
      base address (e.g. AArch64 ld4). When trying to CSE such instructions,
      it must be checked that the available value comes from a compatible instruction
      because the pointer is not enough to discriminate whether the value is
      correct.
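
      The compatibility check can be shown with a small lookup table. In this hypothetical C sketch, `struct AvailableValue`, `findAvailable` and the integer `matchingId` that tags the kind of memory access stand in for EarlyCSE's internal bookkeeping. The only point is that the lookup key includes the access kind as well as the pointer.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Hypothetical record of a value made available by an earlier memory
         instruction. matchingId identifies the kind of access, e.g. a plain
         load versus a structured multi-element load such as AArch64 ld4. */
      struct AvailableValue {
        const void *ptr;
        int matchingId;
        int value;
        bool valid;
      };

      /* Find an earlier value for a new access to `ptr`. Matching the pointer
         alone is not enough: an intrinsic that reads several elements from the
         same base address yields a different value, so the kinds must match. */
      static bool findAvailable(const struct AvailableValue *table, size_t n,
                                const void *ptr, int matchingId, int *out) {
        for (size_t i = 0; i < n; ++i) {
          if (table[i].valid && table[i].ptr == ptr &&
              table[i].matchingId == matchingId) {
            *out = table[i].value;
            return true;
          }
        }
        return false;
      }
      ```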
      
      Reviewers: ssijaric
      
      Subscribers: mcrosier, llvm-commits, aemerson
      
      Differential Revision: http://reviews.llvm.org/D13475
      
      llvm-svn: 249523
    • [RS4GC] Remove an unnecessary assert & related variables · 60bf3db1
      Sanjoy Das authored
      I don't think this assert adds much value, and removing it and related
      variables avoids an "unused variable" warning in release builds.
      
      llvm-svn: 249511
    • [RS4GC] Cosmetic cleanup, NFC · b40bd1a9
      Sanjoy Das authored
      Summary:
      A series of cosmetic cleanup changes to RewriteStatepointsForGC:
      
        - Rename variables to LLVM style
        - Remove some redundant asserts
        - Remove an unused `Pass *` parameter
        - Remove unnecessary variables
        - Use C++11 idioms where applicable
        - Pass CallSite by value, not reference
      
      Reviewers: reames, swaroop.sridhar
      
      Subscribers: llvm-commits, sanjoy
      
      Differential Revision: http://reviews.llvm.org/D13370
      
      llvm-svn: 249508
    • InstCombine: Fold comparisons between unguessable allocas and other pointers · f1f36517
      Hans Wennborg authored
      This will allow us to optimize code such as:
      
        int f(int *p) {
          int x;
          return p == &x;
        }
      
      as well as:
      
        int *allocate(void);
        int f() {
          int x;
          int *p = allocate();
          return p == &x;
        }
      
      The folding can only be done under certain circumstances. Even though p and &x
      cannot alias, the comparison must still return true if the pointer
      representations are equal. If a user successfully generates a p that's a
      correct guess for &x, comparison should return true even though p is an invalid
      pointer.
      
      This patch argues that if the address of the alloca isn't observable outside the
      function, the function can act as-if the address is impossible to guess from the
      outside. The tricky part is keeping the act consistent: if we fold p == &x to
      false in one place, we must make sure to fold any other comparisons based on
      those pointers similarly. To ensure that, we only fold when &x is involved
      exactly once in comparison instructions.
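
      The two conditions can be written as a check over the uses of &x. This C sketch is hypothetical. `enum UseKind` and `canFoldAllocaCompare` are illustrative names, and real escape analysis is far more detailed than a single `USE_ESCAPES` tag.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Hypothetical classification of each use of the alloca's address &x. */
      enum UseKind {
        USE_COMPARE, /* &x is an operand of a pointer comparison */
        USE_ACCESS,  /* a load or store through &x (does not leak the address) */
        USE_ESCAPES  /* &x is stored, passed out, or otherwise observable */
      };

      /* Fold `p == &x` to false only if &x never escapes, so outside code has no
         way to learn it, and &x feeds exactly one comparison, so no other
         comparison can give a contradictory answer. */
      static bool canFoldAllocaCompare(const enum UseKind *uses, size_t n) {
        size_t compares = 0;
        for (size_t i = 0; i < n; ++i) {
          if (uses[i] == USE_ESCAPES)
            return false;
          if (uses[i] == USE_COMPARE)
            ++compares;
        }
        return compares == 1;
      }
      ```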
      
      Differential Revision: http://reviews.llvm.org/D13358
      
      llvm-svn: 249490
    • Fix Clang-tidy modernize-use-nullptr warnings in source directories and generated files; other minor cleanups. · 083ca9bb
      Hans Wennborg authored
      
      Patch by Eugene Zelenko!
      
      Differential Revision: http://reviews.llvm.org/D13321
      
      llvm-svn: 249482
  8. Oct 06, 2015
    • [IndVars] Don't break dominance in `eliminateIdentitySCEV` · 5c8bead4
      Sanjoy Das authored
      Summary:
      After r249211, `getSCEV(X) == getSCEV(Y)` does not guarantee that X and
      Y are related in the dominator tree, even if X is an operand to Y (I've
      included a toy example in comments, and a real example as a test case).
      
      This commit changes `SimplifyIndVar` to require a `DominatorTree`.  I
      don't think this is a problem because `ScalarEvolution` requires it
      anyway.
      
      Fixes PR25051.
      
      Depends on D13459.
      
      Reviewers: atrick, hfinkel
      
      Subscribers: joker.eph, llvm-commits, sanjoy
      
      Differential Revision: http://reviews.llvm.org/D13460
      
      llvm-svn: 249471
    • [IndVars] Extract out eliminateIdentitySCEV, NFC · 088bb0ea
      Sanjoy Das authored
      Summary:
      Reflow a comment while at it.
      
      Subscribers: llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D13459
      
      llvm-svn: 249470
    • [WinEH] Recognize CoreCLR personality function · 2afea543
      Joseph Tremoulet authored
      Summary:
       - Add CoreCLR to if/else ladders and switches as appropriate.
       - Rename isMSVCEHPersonality to isFuncletEHPersonality to better
         reflect what it captures.
      
      Reviewers: majnemer, andrew.w.kaylor, rnk
      
      Subscribers: pgavlin, AndyAyers, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D13449
      
      llvm-svn: 249455
    • [EarlyCSE] Constify ParseMemoryInst methods (NFC). · 6fd488b1
      Arnaud A. de Grandmaison authored
      llvm-svn: 249400
    • [InstCombine] Teach SimplifyDemandedVectorElts how to handle ConstantVector select masks with ConstantExpr elements (PR24922) · 40f59e44
      Andrea Di Biagio authored
      
      If the mask of a select instruction is a ConstantVector, method
      SimplifyDemandedVectorElts iterates over the mask elements to identify which
      values are selected from the select inputs.
      
      Before this patch, method SimplifyDemandedVectorElts always used method
      Constant::isNullValue() to check if a value in the mask was zero. Unfortunately
      that method always returns false when called on a ConstantExpr.
      
      This patch fixes the problem in SimplifyDemandedVectorElts by adding an explicit
      check for ConstantExpr values. Now, if a value in the mask is a ConstantExpr, we
      avoid calling isNullValue() on it.
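
      The effect of the fix can be modeled in C. In this hypothetical sketch, `enum MaskElt` encodes each mask element, and `demandedSelectLanes` returns bitmasks of the demanded lanes for the two select operands. An element that is a ConstantExpr is never classified as zero or non-zero, so its lane stays demanded in both operands.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Hypothetical encoding of one element of a constant select mask. */
      enum MaskElt { MASK_ZERO, MASK_NONZERO, MASK_CONSTEXPR };

      /* For `select mask, LHS, RHS`, compute which lanes of LHS and RHS are
         demanded, given the lanes demanded of the select itself (n <= 32). */
      static void demandedSelectLanes(const enum MaskElt *mask, int n,
                                      uint32_t demanded, uint32_t *lhs,
                                      uint32_t *rhs) {
        assert(n <= 32);
        *lhs = 0;
        *rhs = 0;
        for (int i = 0; i < n; ++i) {
          uint32_t bit = 1u << i;
          if (!(demanded & bit))
            continue;
          if (mask[i] == MASK_CONSTEXPR) {
            /* Unknown at compile time: keep the lane demanded in both inputs. */
            *lhs |= bit;
            *rhs |= bit;
          } else if (mask[i] == MASK_ZERO) {
            *rhs |= bit; /* a false lane selects the second operand */
          } else {
            *lhs |= bit; /* a true lane selects the first operand */
          }
        }
      }
      ```

      With mask elements {zero, non-zero, ConstantExpr} and all three lanes demanded, the first operand is demanded in lanes 1 and 2 (0x6) and the second operand in lanes 0 and 2 (0x5).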
      
      Fixes PR24922.
      
      Differential Revision: http://reviews.llvm.org/D13219
      
      llvm-svn: 249390
  9. Oct 05, 2015
  10. Oct 03, 2015
    • invariant.group handling in GVN · dc9b2cfc
      Piotr Padlewski authored
      The most important part required to make clang
      devirtualization work ( ͡°͜ʖ ͡°).
      The code is able to find non-local dependencies, but unfortunately,
      because the caller can only handle local dependencies, I had to add
      some restrictions to look for dependencies only in the same BB.
      
      http://reviews.llvm.org/D12992
      
      llvm-svn: 249196
  11. Oct 02, 2015
  12. Oct 01, 2015
    • [InstCombine] Remove trivially empty lifetime start/end ranges. · 849f3bf8
      Arnaud A. de Grandmaison authored
      Summary:
      Some passes may open up opportunities for optimizations, leaving empty
      lifetime start/end ranges. For example, with the following code:
      
          void foo(char *, char *);
          void bar(int Size, bool flag) {
            for (int i = 0; i < Size; ++i) {
              char text[1];
              char buff[1];
              if (flag)
                foo(text, buff); // BBFoo
            }
          }
      
      the loop unswitch pass will create 2 versions of the loop, one with
      flag==true, and the other one with flag==false, but always leaving
      the BBFoo basic block, with lifetime ranges covering the scope of the for
      loop. Simplify CFG will then remove BBFoo in the case where flag==false,
      but will leave the lifetime markers.
      
      This patch teaches InstCombine to remove trivially empty lifetime marker
      ranges, that is ranges ending right after they were started (ignoring
      debug info or other lifetime markers in the range).
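
      The scan can be sketched over a flattened basic block. This is a hypothetical C model and not InstCombine's code. `struct Inst`, the `enum InstKind` values and `removeEmptyLifetimeRanges` are illustrative names. For each lifetime end, the sketch walks backwards and skips debug info and markers for other objects. If it reaches the matching lifetime start before any other instruction, it erases the pair.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Hypothetical flattened block: an instruction kind plus the stack object
         a lifetime marker refers to (-1 for non-markers). */
      enum InstKind { LIFETIME_START, LIFETIME_END, DEBUG_INFO, OTHER };

      struct Inst {
        enum InstKind kind;
        int object;
        bool erased;
      };

      /* Returns the number of markers erased. */
      static int removeEmptyLifetimeRanges(struct Inst *insts, int n) {
        int removed = 0;
        for (int i = 0; i < n; ++i) {
          if (insts[i].kind != LIFETIME_END || insts[i].erased)
            continue;
          for (int j = i - 1; j >= 0; --j) {
            if (insts[j].erased || insts[j].kind == DEBUG_INFO)
              continue; /* ignored inside the range */
            if (insts[j].kind == LIFETIME_START &&
                insts[j].object == insts[i].object) {
              insts[i].erased = true; /* trivially empty range: drop both */
              insts[j].erased = true;
              removed += 2;
              break;
            }
            if (insts[j].kind == OTHER)
              break; /* a real instruction sits inside the range */
            /* a marker for a different object: keep scanning */
          }
        }
        return removed;
      }
      ```

      In the loop above, once BBFoo has been removed, the markers for `text` and `buff` have nothing but other markers between them, so the sketch erases both pairs.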
      
      This fixes PR24598: excessive compile time after r234581.
      
      Reviewers: reames, chandlerc
      
      Subscribers: llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D13305
      
      llvm-svn: 249018
    • [NaryReassociate] SeenExprs records WeakVH · df1a1b11
      Jingyue Wu authored
      Summary:
      The instructions SeenExprs records may be deleted during rewriting.
      FindClosestMatchingDominator should ignore these deleted instructions.
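
      The idea behind the fix can be modeled with nullable slots. In this hypothetical C sketch, a NULL entry in `seen` plays the role of a WeakVH that was cleared when its instruction was deleted. The `dominatesQuery` flag stands in for a dominator-tree query. `findClosestMatchingDominator` here is a simplified stand-in and not the pass's function.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Hypothetical instruction record. */
      struct Instr {
        int id;
        bool dominatesQuery; /* stand-in for DT->dominates(candidate, query) */
      };

      /* Seen expressions, oldest first. A deleted instruction leaves a NULL
         slot, just as a weak handle is nulled out when its value is deleted.
         Search from the most recent entry for a live, dominating candidate. */
      static struct Instr *findClosestMatchingDominator(struct Instr **seen,
                                                        size_t n) {
        for (size_t i = n; i-- > 0;) {
          if (seen[i] == NULL)
            continue; /* deleted during rewriting: ignore it */
          if (seen[i]->dominatesQuery)
            return seen[i];
        }
        return NULL;
      }
      ```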
      
      Fixes PR24301.
      
      Reviewers: grosser
      
      Subscribers: grosser, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D13315
      
      llvm-svn: 248983
    • Dehao Chen's avatar
      7c41dd64
  13. Sep 30, 2015
    • [SLP] Don't vectorize loads of non-packed types (like i1, i2). · fc783e91
      Michael Zolotukhin authored
      Summary:
      Given an array of i2 elements, 4 consecutive scalar loads will be lowered to
      i8-sized loads and thus will access 4 consecutive bytes in memory. If we
      vectorize these loads into a single <4 x i2> load, it'll access only 1 byte in
      memory. Hence, we should prohibit vectorization in such cases.
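
      The byte arithmetic above can be checked with a few lines of C. This is a hypothetical sketch. It assumes that each scalar load occupies its bit width rounded up to whole bytes and ignores alignment padding. `isPackedType`, `scalarLoadsBytes` and `vectorLoadBytes` are illustrative names, not SLP vectorizer functions.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Bits one scalar occupies in memory: its width rounded up to bytes. */
      static unsigned storeSizeInBits(unsigned bits) {
        return ((bits + 7) / 8) * 8;
      }

      /* A type is "packed" if each element fills its memory slot exactly. */
      static bool isPackedType(unsigned bits) {
        return storeSizeInBits(bits) == bits;
      }

      /* Bytes touched by n separate scalar loads of an iBITS type. */
      static unsigned scalarLoadsBytes(unsigned bits, unsigned n) {
        return n * storeSizeInBits(bits) / 8;
      }

      /* Bytes touched by one <n x iBITS> vector load, elements back to back. */
      static unsigned vectorLoadBytes(unsigned bits, unsigned n) {
        return (n * bits + 7) / 8;
      }
      ```

      For i2, four scalar loads touch 4 bytes, while one <4 x i2> load touches 1 byte, so `isPackedType(2)` is false. For i32, both come to 16 bytes.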
      
      PS: Initial patch was proposed by Arnold.
      
      Reviewers: aschwaighofer, nadav, hfinkel
      
      Subscribers: llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D13277
      
      llvm-svn: 248943
    • Fix debug info with SafeStack. · f608111d
      Evgeniy Stepanov authored
      llvm-svn: 248933
    • DeadCodeElimination: rewrite to be faster · b0c6d917
      Fiona Glaser authored
      Same strategy as simplifyInstructionsInBlock. ~1/3 less time
      on my test suite. This pass doesn't have many in-tree users,
      but getting rid of an O(N^2) worst case and making it cleaner
      should at least make it a viable alternative to ADCE, since
      it's now consistently somewhat faster.
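
      The sketch below shows a generic worklist formulation of dead code elimination in C. It is an assumption about the general strategy and not the pass's code. `struct Node` and `eliminateDeadCode` are hypothetical names, and instructions are modeled as indices with operand lists and use counts. Each instruction is deleted at most once, so the block is not rescanned after every deletion.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      #define MAX_INSTS 16
      #define MAX_OPS 2

      /* Hypothetical instruction: operand indices (-1 for none), a use count,
         and a flag for side effects that keep it alive regardless of uses. */
      struct Node {
        int ops[MAX_OPS];
        int numUses;
        bool hasSideEffects;
        bool dead;
      };

      static bool isTriviallyDead(const struct Node *n) {
        return !n->dead && n->numUses == 0 && !n->hasSideEffects;
      }

      /* Seed the worklist with every trivially dead instruction. Deleting one
         decrements its operands' use counts and pushes any operand that has just
         become dead. Returns the number of instructions deleted. */
      static int eliminateDeadCode(struct Node *insts, int n) {
        assert(n <= MAX_INSTS);
        int worklist[MAX_INSTS];
        int top = 0, removed = 0;
        for (int i = 0; i < n; ++i)
          if (isTriviallyDead(&insts[i]))
            worklist[top++] = i;
        while (top > 0) {
          int i = worklist[--top];
          if (insts[i].dead)
            continue;
          insts[i].dead = true;
          ++removed;
          for (int k = 0; k < MAX_OPS; ++k) {
            int op = insts[i].ops[k];
            if (op < 0)
              continue;
            --insts[op].numUses;
            if (isTriviallyDead(&insts[op]))
              worklist[top++] = op; /* each node reaches zero uses only once */
          }
        }
        return removed;
      }
      ```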
      
      llvm-svn: 248927