  1. Jun 08, 2013
    • Shuxin Yang's avatar
      Fix an assertion in MemCpyOpt pass. · bd254f26
      Shuxin Yang authored
        The MemCpyOpt pass is capable of optimizing:
            callee(&S); copy N bytes from S to D.
          into:
            callee(&D);
      subject to some legality constraints. 
      
        The assertion is triggered when the compiler tries to evaluate "sizeof(typeof(D))",
      while D is an opaque-typed, 'sret' formal argument of function being compiled.
      i.e. the signature of the func being compiled is something like this:
        T caller(...,%opaque* noalias nocapture sret %D, ...)
      
        The fix is that when we come across such a situation, instead of calling some
      utility functions to get the size of D's type (which will crash), we simply
      assume D has at least N bytes, as implied by the copy instruction.
      
      rdar://14073661 
      
      llvm-svn: 183584
      bd254f26
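At the source level, the rewrite described in bd254f26 can be pictured with the minimal sketch below. `Payload`, `callee`, `fill_before`, `fill_after`, and `same_result` are hypothetical names chosen for illustration; the real pass works on LLVM IR and applies its own legality checks, which are not modelled here.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical aggregate standing in for the types of S and D.
struct Payload {
  int values[4];
};

// Hypothetical callee that fully initializes the object it is handed.
void callee(Payload *out) {
  for (int i = 0; i < 4; ++i)
    out->values[i] = i * 10;
}

// Before: the callee fills a temporary S, then N bytes are copied from S to D.
void fill_before(Payload *d) {
  Payload s;
  callee(&s);
  std::memcpy(d, &s, sizeof(Payload));
}

// After: the callee writes straight into D, so the temporary and the copy
// are no longer needed.
void fill_after(Payload *d) {
  callee(d);
}

// Compares the two versions byte for byte.
bool same_result() {
  Payload a, b;
  fill_before(&a);
  fill_after(&b);
  return std::memcmp(&a, &b, sizeof(Payload)) == 0;
}
```

The rewrite is only valid if D is at least N bytes long. When D's type is opaque its size cannot be queried, which is why the fix takes the N bytes of the copy as the lower bound instead.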
  2. Jun 04, 2013
    • David Majnemer's avatar
      IndVarSimplify: check if loop invariant expansion can trap · 29130c5e
      David Majnemer authored
      IndVarSimplify is willing to move divide instructions outside of their
      loop bodies if they are invariant of the loop.  However, it may not be
      safe to expand them if we do not know if they can trap.
      
      Instead, check to see if it is not safe to expand the instruction and
      skip the expansion.
      
      This fixes PR16041.
      
      Testcase by Rafael Ávila de Espíndola.
      
      llvm-svn: 183239
      29130c5e
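The hazard behind 29130c5e can be illustrated with a small sketch. The function names are hypothetical. The guarded variant only shows what a safe hoist would have to preserve; the commit itself takes the simpler route of skipping the expansion when it cannot prove safety.

```cpp
#include <cassert>

// Original loop: the division only executes when the body runs, so a call
// with n == 0 and divisor == 0 never divides at all.
int sum_in_loop(int n, int dividend, int divisor) {
  int sum = 0;
  for (int i = 0; i < n; ++i)
    sum += dividend / divisor;  // loop-invariant, but may trap
  return sum;
}

// A hoisted form that stays safe: the invariant divide is evaluated outside
// the loop, but only once the loop is known to run at least once.
// Hoisting it above the n <= 0 check would divide by zero for (0, x, 0).
int sum_hoisted_guarded(int n, int dividend, int divisor) {
  if (n <= 0)
    return 0;
  int quotient = dividend / divisor;  // the original loop would divide too
  int sum = 0;
  for (int i = 0; i < n; ++i)
    sum += quotient;
  return sum;
}
```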
  3. May 31, 2013
    • Quentin Colombet's avatar
      Loop Strength Reduce: Scaling factor cost. · bf490d4a
      Quentin Colombet authored
      Account for the cost of scaling factor in Loop Strength Reduce when rating the
      formulae. This uses a target hook.
      
      The default implementation of the hook is: if the addressing mode is legal, the
      scaling factor is free.
      
      <rdar://problem/13806271>
      
      llvm-svn: 183045
      bf490d4a
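The default policy described in bf490d4a can be sketched as a toy model. `AddrMode`, `is_legal_addressing_mode`, and `scaling_factor_cost` are hypothetical stand-ins rather than LLVM's actual declarations. The set of legal scales and the -1 result for an illegal mode are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified addressing mode:
// base_reg + scale * index_reg + offset.
struct AddrMode {
  bool has_base_reg;
  std::int64_t scale;   // 0 means there is no scaled index register
  std::int64_t offset;
};

// Hypothetical target that accepts scales of 0, 1, 2, 4, and 8 only.
bool is_legal_addressing_mode(const AddrMode &am) {
  switch (am.scale) {
  case 0: case 1: case 2: case 4: case 8:
    return true;
  default:
    return false;
  }
}

// Default policy from the commit message: a legal addressing mode means the
// scaling factor is free (cost 0). The negative value for an illegal mode is
// an assumption made for this sketch.
int scaling_factor_cost(const AddrMode &am) {
  return is_legal_addressing_mode(am) ? 0 : -1;
}
```

A target where scaled accesses are legal but slower would override the hook to return a positive cost, which lets the formula rating prefer unscaled alternatives.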
    • Quentin Colombet's avatar
      Modify how the formulae are rated in Loop Strength Reduce. · 8aa7abe2
      Quentin Colombet authored
      Namely, check if the target allows folding more than one register into the
      addressing mode and, if so, adjust the cost accordingly.
      
      Prior to this commit, reg1 + scale * reg2 accesses were artificially preferred
      to reg1 + reg2 accesses. Indeed, the cost model wrongly assumed that reg1 + reg2
      needs a temporary register for the computation, whereas it was correctly
      estimated for reg1 + scale * reg2.
      
      <rdar://problem/13973908>
      
      llvm-svn: 183021
      8aa7abe2
  4. May 25, 2013
  5. May 09, 2013
  6. May 08, 2013
  7. May 06, 2013
    • Andrew Trick's avatar
      Rotate multi-exit loops even if the latch was simplified. · 9c72b071
      Andrew Trick authored
      Test case by Michele Scandale!
      
      Fixes PR10293: Load not hoisted out of loop with multiple exits.
      
      There are a few regressions with this patch, now tracked by
      rdar:13817079, and a roughly equal number of improvements. The
      regressions are almost certainly bad luck because LoopRotate has very
      little idea of whether rotation is profitable. Doing better requires a
      more comprehensive solution.
      
      This checkin is a quick fix that lacks generality (PR10293 has
      a counter-example). But it trivially fixes the case in PR10293 without
      interfering with other cases, and it does satisfy the criteria that
      LoopRotate is a loop canonicalization pass that should avoid
      heuristics and special cases.
      
      I can think of two approaches that would probably be better in
      the long run. Ultimately they may both make sense.
      
      (1) LoopRotate should check that the current header would make a good
      loop guard, and that the loop does not already have a sufficient
      guard. The artificial SimplifiedLoopLatch check would be unnecessary,
      and the design would be more general and canonical. Two difficulties:
      
      - We need a strong guarantee that we won't endlessly rotate, so the
        analysis would need to be precise in order to avoid the
        SimplifiedLoopLatch precondition.
      
      - Analyses like this are usually based on SCEV, which we don't want to
        rely on.
      
      (2) Rotate on-demand in late loop passes. This could even be done by
      shoving the loop back on the queue after the optimization that needs
      it. This could work well when we find LICM opportunities in
      multi-branch loops. This requires some work, and it doesn't really
      solve the problem of SCEV wanting a loop guard before the analysis.
      
      llvm-svn: 181230
      9c72b071
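For readers unfamiliar with loop rotation, the source-level sketch below shows the shape of the transformation on a loop with two exits. The function names and the `kSample` array are hypothetical; LoopRotate itself operates on the IR control-flow graph, not on C++ source.

```cpp
#include <cassert>

// Hypothetical sample data for trying out the two functions.
const int kSample[] = {3, 5, -1, 7};

// Unrotated form: the exit test sits in the loop header, and the loop has a
// second exit through the break.
int find_first_negative(const int *data, int n) {
  int i = 0;
  while (i < n) {
    if (data[i] < 0)
      break;
    ++i;
  }
  return i;
}

// Rotated form: the header test is duplicated as a guard in front of the
// loop, and the loop itself becomes a do-while that tests at the bottom.
int find_first_negative_rotated(const int *data, int n) {
  int i = 0;
  if (i < n) {  // loop guard
    do {
      if (data[i] < 0)
        break;
      ++i;
    } while (i < n);
  }
  return i;
}
```

Once the guard has passed, the body is known to execute at least once, which is what makes it possible to hoist loop-invariant loads in front of the loop.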
  8. May 03, 2013
    • Shuxin Yang's avatar
      Decompose GVN::processNonLocalLoad() (about 400 LOC) into smaller helper... · 637b9beb
      Shuxin Yang authored
      Decompose GVN::processNonLocalLoad() (about 400 LOC) into smaller helper functions. No functional change. 
      
      This function consists of the following steps:
         1. Collect dependent memory accesses.
         2. Analyze availability.
         3. Perform full redundancy elimination, or 
         4. Perform PRE, depending on the availability.
      
       Steps 2, 3 and 4 are now moved to three helper routines.
      
      llvm-svn: 181047
      637b9beb
  9. May 02, 2013
    • Shuxin Yang's avatar
      [GV] Remove dead code which is really difficult to decipher. · af2c3ddf
      Shuxin Yang authored
      It actually took me a couple of hours of trying to make sense of it,
      only to find that it is dead code.  I guess the original author used
      "allSingleSucc" to indicate whether there are any critical edges emanating
      from some blocks, and tried to perform code motion (actually speculation)
      in the presence of these critical edges; but later on he/she changed their mind
      and decided to perform edge-splitting first.
      
      llvm-svn: 180951
      af2c3ddf
  10. May 01, 2013
    • Filip Pizlo's avatar
      This patch breaks up Wrap.h so that it does not have to include all of · dec20e43
      Filip Pizlo authored
      the things, and renames it to CBindingWrapping.h.  I also moved 
      CBindingWrapping.h into Support/.
      
      This new file just contains the macros for defining different wrap/unwrap 
      methods.
      
      The calls to those macros, as well as any custom wrap/unwrap definitions 
      (like for array of Values for example), are put into corresponding C++ 
      headers.
      
      Doing this required some #include surgery, since some .cpp files relied 
      on the fact that including Wrap.h implicitly caused the inclusion of a 
      bunch of other things.
      
      This also now means that the C++ headers will include their corresponding 
      C API headers; for example Value.h must include llvm-c/Core.h.  I think 
      this is harmless, since the C API headers contain just external function 
      declarations and some C types, so I don't believe there should be any 
      nasty dependency issues here.
      
      llvm-svn: 180881
      dec20e43
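The kind of macro the commit message refers to can be sketched as follows. `Widget`, `WidgetRef`, `DEFINE_WRAP_UNWRAP`, `widget_get_id`, and `round_trip_id` are hypothetical names for illustration; this is not the content of CBindingWrapping.h.

```cpp
#include <cassert>

// Hypothetical C++ class and its opaque C handle, standing in for a pair
// such as a C++ IR class and its C API reference type.
class Widget {
public:
  explicit Widget(int id) : id_(id) {}
  int id() const { return id_; }

private:
  int id_;
};
typedef struct OpaqueWidget *WidgetRef;

// Illustrative macro: one invocation defines a matching wrap/unwrap pair
// that converts between the C++ pointer and the opaque C handle.
#define DEFINE_WRAP_UNWRAP(ty, ref)                                   \
  inline ty *unwrap(ref p) { return reinterpret_cast<ty *>(p); }      \
  inline ref wrap(const ty *p) {                                      \
    return reinterpret_cast<ref>(const_cast<ty *>(p));                \
  }

// The invocation lives next to the C++ class it belongs to, which mirrors
// the split described in the commit message.
DEFINE_WRAP_UNWRAP(Widget, WidgetRef)

// C-style accessor implemented on top of unwrap().
int widget_get_id(WidgetRef w) { return unwrap(w)->id(); }

// Wraps a C++ object, passes it through the C-style API, and reads it back.
int round_trip_id(int id) {
  Widget w(id);
  return widget_get_id(wrap(&w));
}
```

Keeping only the macro definition in the shared header means that including it no longer pulls in every wrapped class, at the cost of each C++ header having to see the matching C handle type.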
    • Nadav Rotem's avatar
      SROA: Generate selects instead of shuffles when blending values because this... · 1e211913
      Nadav Rotem authored
      SROA: Generate selects instead of shuffles when blending values because this is the canonical form.
      Shuffles are more difficult to lower and we usually don't touch them, while we do optimize selects more often.
      
      llvm-svn: 180875
      1e211913
  11. Apr 27, 2013
  12. Apr 23, 2013
  13. Apr 22, 2013
  14. Apr 21, 2013
  15. Apr 18, 2013
  16. Apr 15, 2013
  17. Apr 09, 2013
    • Shuxin Yang's avatar
      Redo the fix Benjamin Kramer committed in r178793 about iterator invalidation in Reassociate. · 331f01dc
      Shuxin Yang authored
      I brazenly think this change is slightly simpler than r178793 because: 
        - no "state" in functor
        - "OpndPtrs[i]" looks simpler than "&Opnds[OpndIndices[i]]" 
      
        While I can reproduce the problem in Valgrind, it is rather difficult to come up
      with a standalone test case. The reason is that when an iterator is invalidated,
      the stale elements are not yet clobbered by nonsense data, so the
      optimizer can still proceed successfully. 
      
        Thanks to Benjamin for fixing this bug and generously providing the test case.
      
      llvm-svn: 179062
      331f01dc
  18. Apr 07, 2013
    • Chandler Carruth's avatar
      Fix PR15674 (and PR15603): a SROA think-o. · 0e8a52d1
      Chandler Carruth authored
      The fix for PR14972 in r177055 introduced a real think-o in the *store*
      side, likely because I was much more focused on the load side. While we
      can arbitrarily widen (or narrow) a loaded value, we can't arbitrarily
      widen a value to be stored, as that changes the width of memory access!
      Lock down the code path in the store rewriting which would do this to
      only handle the intended circumstance.
      
      All of the existing tests continue to pass, and I've added a test from
      the PR.
      
      llvm-svn: 178974
      0e8a52d1
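The asymmetry between loads and stores in 0e8a52d1 can be shown on a plain byte buffer. This is a minimal sketch with hypothetical function names that uses `std::memcpy` in place of IR loads and stores. It assumes the extra bytes touched by the wide accesses are valid memory.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Widening a load: read four bytes, then keep only the two bytes that were
// originally requested. The observable value matches the narrow load.
bool widened_load_matches_narrow() {
  const unsigned char buf[4] = {0xAA, 0xBB, 0xCC, 0xDD};
  std::uint16_t narrow;
  std::memcpy(&narrow, buf, sizeof narrow);
  std::uint32_t wide;
  std::memcpy(&wide, buf, sizeof wide);
  std::uint16_t narrowed_back;
  std::memcpy(&narrowed_back, &wide, sizeof narrowed_back);
  return narrow == narrowed_back;
}

// Narrow store: only bytes 0 and 1 are written, so bytes 2 and 3 survive.
bool neighbours_survive_narrow_store() {
  unsigned char buf[4] = {0xAA, 0xBB, 0xCC, 0xDD};
  const std::uint16_t value = 0x1234;
  std::memcpy(buf, &value, sizeof value);
  return buf[2] == 0xCC && buf[3] == 0xDD;
}

// Widened store: the same value zero-extended to 32 bits writes all four
// bytes, so the neighbouring bytes are clobbered.
bool neighbours_survive_widened_store() {
  unsigned char buf[4] = {0xAA, 0xBB, 0xCC, 0xDD};
  const std::uint32_t widened = 0x1234;
  std::memcpy(buf, &widened, sizeof widened);
  return buf[2] == 0xCC && buf[3] == 0xDD;
}
```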
  19. Apr 05, 2013
    • Shuxin Yang's avatar
      Disable the optimization about promoting vector-element-access with symbolic index. · 95adf525
      Shuxin Yang authored
      This optimization is unstable at the moment:
        1) it blocks us on a very important application
        2) PR15200
        3) test6 and test7 in test/Transforms/ScalarRepl/dynamic-vector-gep.ll
           (the CHECK lines compare the output against a wrong result)
      
         I personally believe this optimization should not have any impact on
      autovectorized code, as the auto-vectorizer is supposed to emit gather/scatter
      in the "right" way.  Although in theory downstream optimizers might reveal 
      some gather/scatter optimization opportunities, the chance is quite slim.
      
         For hand-crafted vectorized code, in terms of redundancy elimination,
      load-CSE, copy-propagation and DSE can collectively achieve the same result,
      but in a much simpler way. Moreover, these optimizers are able to 
      improve the code in an incremental way; in contrast, SROA is an all-or-nothing
      approach. However, SROA might win slightly in stack size, as it tries to find 
      a stretch of memory that tightly covers the area accessed by the dynamic index.
      
       rdar://13174884
       PR15200
      
      llvm-svn: 178912
      95adf525
  20. Apr 04, 2013
    • Benjamin Kramer's avatar
      Reassociate: Avoid iterator invalidation. · dd67654a
      Benjamin Kramer authored
      OpndPtrs stored pointers into the Opnd vector that become invalid when the
      vector grows. Store indices instead. Sadly I only have a large testcase that
      only triggers under valgrind, so I didn't include it.
      
      llvm-svn: 178793
      dd67654a
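The underlying C++ pitfall is general: growing a `std::vector` may reallocate its storage, which invalidates every pointer taken into it earlier. The sketch below uses hypothetical names and is not the Reassociate code. It shows the index-based approach, plus another common way to stay safe, which is to take pointers only after the vector has stopped growing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unsafe pattern (shown only as a comment): taking &opnds.back() while
// opnds is still growing leaves dangling pointers after a reallocation.
//
//   ptrs.push_back(&opnds.back());  // may dangle after the next push_back

// Safe: record indices, which stay meaningful however often the vector
// reallocates.
int sum_via_indices() {
  std::vector<int> opnds;
  std::vector<std::size_t> indices;
  for (int i = 0; i < 1000; ++i) {
    opnds.push_back(i);
    indices.push_back(opnds.size() - 1);
  }
  int sum = 0;
  for (std::size_t idx : indices)
    sum += opnds[idx];
  return sum;
}

// Also safe: fill the vector completely first, then take pointers. They
// remain valid as long as the vector is not resized afterwards.
int sum_via_late_pointers() {
  std::vector<int> opnds;
  for (int i = 0; i < 1000; ++i)
    opnds.push_back(i);
  std::vector<int *> ptrs;
  for (int &value : opnds)
    ptrs.push_back(&value);
  int sum = 0;
  for (int *p : ptrs)
    sum += *p;
  return sum;
}
```

Bugs of this kind often pass ordinary tests because the freed storage still holds the old values for a while, which is consistent with the observation that the problem only showed up under Valgrind.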
  21. Apr 01, 2013
  22. Mar 30, 2013
    • Shuxin Yang's avatar
      Implement XOR reassociation. It is based on following rules: · 7b0c94e2
      Shuxin Yang authored
        rule 1: (x | c1) ^ c2 => (x & ~c1) ^ (c1^c2),
           only useful when c1=c2
        rule 2: (x & c1) ^ (x & c2) = (x & (c1^c2))
        rule 3: (x | c1) ^ (x | c2) = (x & c3) ^ c3 where c3 = c1 ^ c2
        rule 4: (x | c1) ^ (x & c2) => (x & c3) ^ c1, where c3 = ~c1 ^ c2
      
       It reduces an application's size (in terms of # of instructions) by 8.9%.
       Reviewed by Pete Cooper. Thanks a lot!
      
       rdar://13212115  
      
      llvm-svn: 178409
      7b0c94e2
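The four identities listed in 7b0c94e2 can be checked by brute force. The sketch below tests them for every combination of 8-bit values of x, c1, and c2; `xor_rules_hold` is a hypothetical helper name, and the check says nothing about how the pass applies the rules.

```cpp
#include <cassert>

// Exhaustively checks the four XOR reassociation rules over all 8-bit
// values of x, c1, and c2. Returns false on the first counterexample.
bool xor_rules_hold() {
  for (unsigned x = 0; x < 256; ++x) {
    for (unsigned c1 = 0; c1 < 256; ++c1) {
      for (unsigned c2 = 0; c2 < 256; ++c2) {
        // rule 1: (x | c1) ^ c2 == (x & ~c1) ^ (c1 ^ c2)
        if (((x | c1) ^ c2) != ((x & ~c1) ^ (c1 ^ c2)))
          return false;
        // rule 2: (x & c1) ^ (x & c2) == x & (c1 ^ c2)
        if (((x & c1) ^ (x & c2)) != (x & (c1 ^ c2)))
          return false;
        // rule 3: (x | c1) ^ (x | c2) == (x & c3) ^ c3, with c3 = c1 ^ c2
        const unsigned c3 = c1 ^ c2;
        if (((x | c1) ^ (x | c2)) != ((x & c3) ^ c3))
          return false;
        // rule 4: (x | c1) ^ (x & c2) == (x & c4) ^ c1, with c4 = ~c1 ^ c2
        const unsigned c4 = ~c1 ^ c2;
        if (((x | c1) ^ (x & c2)) != ((x & c4) ^ c1))
          return false;
      }
    }
  }
  return true;
}
```

All four rules follow from the fact that `x | c` equals `(x & ~c) ^ c`, because the two operands of that XOR have no bits in common.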
  23. Mar 24, 2013
  24. Mar 21, 2013
    • Chandler Carruth's avatar
      [SROA] Prefix names using a custom IRBuilder inserter. · 34f0c7fc
      Chandler Carruth authored
      The key part of this is ensuring that name prefixes remain in a Twine
      form until we get to a point where we can nuke them under NDEBUG. This
      is tricky using the old APIs as they played fast and loose with Twine,
      which is prone to serious error. The inserter is much cleaner as it is
      actually in the call stack leading to the setName call, and so has
      a good opportunity to prepend the prefix.
      
      This matters more than you might imagine because most runs over an
      alloca find a single partition, and rewrite 3 or 4 instructions
      referring to it. As a consequence doing this lazily and exclusively with
      Twine allows the optimizer to delete more of it and shaves another 2% to
      3% off of the release build's SROA run time for PR15412. I also think
      the APIs are cleaner, and the use of Twine is more reliable, so
      I consider it a win-win despite the churn required to reach this state.
      
      llvm-svn: 177631
      34f0c7fc
    • Meador Inge's avatar
      simplify-libcalls: Removed unused variable · cf691565
      Meador Inge authored
      The 'Modified' variable should have been removed from SimplifyLibCalls
      in r177619, but was missed.  This commit removes it.
      
      llvm-svn: 177622
      cf691565
    • Meador Inge's avatar
      Move library call prototype attribute inference to functionattrs · 6b6a161c
      Meador Inge authored
      The simplify-libcalls pass implemented a doInitialization hook to infer
      function prototype attributes for well-known functions.  Given that the
      simplify-libcalls pass is going away *and* that the functionattrs pass
      is already in place to deduce function attributes, I am moving this logic
      to the functionattrs pass.  This approach was discussed during patch
      review:
      http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20121126/157465.html.
      
      llvm-svn: 177619
      6b6a161c
  25. Mar 20, 2013
  26. Mar 19, 2013
  27. Mar 18, 2013