  May 06, 2013
    • InstCombine: (X ^ signbit) + C -> X + (signbit ^ C) · 70f286d9
      David Majnemer authored
      llvm-svn: 181249
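      Why this fold is sound (an illustrative check, not part of the commit):
      in wraparound unsigned arithmetic, flipping the sign bit is congruent to
      adding it, so the XOR can migrate onto the constant:

      #include <cassert>
      #include <cstdint>

      int main() {
        const uint32_t signbit = 0x80000000u;
        const uint32_t vals[] = {0u, 1u, 0x7fffffffu, 0x80000000u, 0xdeadbeefu};
        for (uint32_t x : vals)
          for (uint32_t c : vals)
            // x ^ signbit == x + signbit (mod 2^32), and likewise for c,
            // so both sides equal x + c + signbit (mod 2^32).
            assert((x ^ signbit) + c == x + (signbit ^ c));
        return 0;
      }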
    • Rotate multi-exit loops even if the latch was simplified. · 9c72b071
      Andrew Trick authored
      Test case by Michele Scandale!
      
      Fixes PR10293: Load not hoisted out of loop with multiple exits.
      
      There are a few regressions with this patch, now tracked by
      rdar:13817079, and a roughly equal number of improvements. The
      regressions are almost certainly bad luck because LoopRotate has very
      little idea of whether rotation is profitable. Doing better requires a
      more comprehensive solution.
      
      This checkin is a quick fix that lacks generality (PR10293 has
      a counter-example). But it trivially fixes the case in PR10293 without
      interfering with other cases, and it does satisfy the criterion that
      LoopRotate is a loop canonicalization pass that should avoid
      heuristics and special cases.
      
      I can think of two approaches that would probably be better in
      the long run. Ultimately they may both make sense.
      
      (1) LoopRotate should check that the current header would make a good
      loop guard, and that the loop does not already have a sufficient
      guard. The artificial SimplifiedLoopLatch check would be unnecessary,
      and the design would be more general and canonical. Two difficulties:
      
      - We need a strong guarantee that we won't endlessly rotate, so the
        analysis would need to be precise in order to avoid the
        SimplifiedLoopLatch precondition.
      
      - Analyses like this are usually based on SCEV, which we don't want to
        rely on.
      
      (2) Rotate on-demand in late loop passes. This could even be done by
      shoving the loop back on the queue after the optimization that needs
      it. This could work well when we find LICM opportunities in
      multi-branch loops. This requires some work, and it doesn't really
      solve the problem of SCEV wanting a loop guard before the analysis.
      
      llvm-svn: 181230
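      For context, a minimal sketch of what rotation buys LICM (illustrative
      C++ only; this is not the PR10293 test case):

      // Before rotation: a top-tested loop. *p cannot be hoisted because
      // the body may never execute, so the load would be speculative.
      void before(int *a, const int *p, int n) {
        int i = 0;
        while (i < n)
          a[i++] += *p;
      }

      // After rotation: a guard plus a bottom-tested loop. The body is
      // known to execute at least once, so *p can be loaded once up front.
      void after(int *a, const int *p, int n) {
        int i = 0;
        if (i < n) {
          const int v = *p; // hoistable by LICM
          do {
            a[i++] += v;
          } while (i < n);
        }
      }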
    • Provide InstCombines for the following 3 cases: · 3e4fc3ef
      Jean-Luc Duprat authored
      A * (1 - (uitofp i1 C)) -> select C, 0, A
      B * (uitofp i1 C) -> select C, B, 0
      (select C, 0, A) + (select C, B, 0) -> select C, B, A
      
      These come up in code that has been hand-optimized from a select to a linear blend, 
      on platforms where that may have mattered. We want to undo such changes 
      with the following transform:
      A*(1 - uitofp i1 C) + B*(uitofp i1 C) -> select C, B, A
      
      llvm-svn: 181216
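      The source pattern being undone looks like this in C++ (an illustrative
      sketch; the function names are invented, and the equivalence ignores
      NaN/Inf corner cases of the multiply form):

      // Hand-optimized linear blend: multiply by 0.0/1.0 instead of branching.
      float blend_mul(float a, float b, bool c) {
        float fc = (float)c;              // uitofp i1 C
        return a * (1.0f - fc) + b * fc;  // A*(1 - fc) + B*fc
      }

      // What the combined transforms recover: a plain select.
      float blend_select(float a, float b, bool c) {
        return c ? b : a;                 // select C, B, A
      }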
    • Update the comment to mention that we use TTI. · 632b25b7
      Nadav Rotem authored
      llvm-svn: 181178
    • Revert r164763 because it introduces new shuffles. · c70ef4e9
      Nadav Rotem authored
      Thanks Nick Lewycky for pointing this out.
      
      llvm-svn: 181177
    • Fix const merging when an alias of a const is llvm.used. · c229a4ff
      Rafael Espindola authored
      We used to disable constant merging not only if a constant is llvm.used, but
      also if an alias of a constant is llvm.used. This change fixes that.
      
      llvm-svn: 181175
  May 03, 2013
    • Decompose GVN::processNonLocalLoad() (about 400 LOC) into smaller helper... · 637b9beb
      Shuxin Yang authored
      Decompose GVN::processNonLocalLoad() (about 400 LOC) into smaller helper functions. No functional change.
      
      This function consists of the following steps:
         1. Collect dependent memory accesses.
         2. Analyze availability.
         3. Perform full redundancy elimination, or
         4. Perform PRE, depending on the availability.

      Steps 2, 3 and 4 are now moved to three helper routines.
      
      llvm-svn: 181047
    • LoopVectorizer: Add support for if-conversion of PHINodes with 3+ incoming values. · 4ce060b3
      Nadav Rotem authored
      By supporting the vectorization of PHINodes with more than two incoming values, we can now if-convert more complex nested if statements.
      
      We can now vectorize this loop:
      
      int foo(int *A, int *B, int n) {
        for (int i=0; i < n; i++) {
          int x = 9;
          if (A[i] > B[i]) {
            if (A[i] > 19) {
              x = 3;
            } else if (B[i] < 4) {
              x = 4;
            } else {
              x = 5;
            }
          }
          A[i] = x;
        }
      }
      
      llvm-svn: 181037
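      Conceptually, if-conversion flattens that control flow into selects. A
      scalar sketch of the result (illustrative only, not the IR the
      vectorizer emits):

      void foo_ifconverted(int *A, int *B, int n) {
        for (int i = 0; i < n; i++) {
          // The inner if/else-if/else collapses into a select chain...
          int t = (A[i] > 19) ? 3 : ((B[i] < 4) ? 4 : 5);
          // ...and the PHI with 3+ incoming values becomes one more select,
          // which vectorizes as a vector select.
          A[i] = (A[i] > B[i]) ? t : 9;
        }
      }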
  May 02, 2013
    • [GV] Remove dead code which is really difficult to decipher. · af2c3ddf
      Shuxin Yang authored
      It actually took me a couple of hours to make sense of this code, only
      to find it is dead.  I guess the original author used "allSingleSucc"
      to indicate whether any critical edges emanate from some blocks, and
      tried to perform code motion (actually speculation) in the presence of
      these critical edges, but later changed his/her mind and decided to
      perform edge-splitting first.
      
      llvm-svn: 180951
  May 01, 2013
    • This patch breaks up Wrap.h so that it does not have to include all of · dec20e43
      Filip Pizlo authored
      the things, and renames it to CBindingWrapping.h.  I also moved 
      CBindingWrapping.h into Support/.
      
      This new file just contains the macros for defining different wrap/unwrap 
      methods.
      
      The calls to those macros, as well as any custom wrap/unwrap definitions
      (like for arrays of Values, for example), are put into the corresponding
      C++ headers.
      
      Doing this required some #include surgery, since some .cpp files relied 
      on the fact that including Wrap.h implicitly caused the inclusion of a 
      bunch of other things.
      
      This also now means that the C++ headers will include their corresponding 
      C API headers; for example Value.h must include llvm-c/Core.h.  I think 
      this is harmless, since the C API headers contain just external function 
      declarations and some C types, so I don't believe there should be any 
      nasty dependency issues here.
      
      llvm-svn: 180881
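      In spirit, the wrap/unwrap macros are reinterpret_cast shims between a
      C++ type and its opaque C reference type. A simplified, self-contained
      sketch (the type names here are stand-ins, not the real LLVM ones; see
      CBindingWrapping.h for the actual macros):

      typedef struct OpaqueHandle *CValueRef; // stand-in for e.g. LLVMValueRef
      class CxxValue {};                      // stand-in for e.g. llvm::Value

      #define DEFINE_SIMPLE_CONVERSION_FUNCTIONS(ty, ref)    \
        inline ty *unwrap(ref P) {                           \
          return reinterpret_cast<ty *>(P);                  \
        }                                                    \
        inline ref wrap(const ty *P) {                       \
          return reinterpret_cast<ref>(const_cast<ty *>(P)); \
        }

      DEFINE_SIMPLE_CONVERSION_FUNCTIONS(CxxValue, CValueRef)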
    • SROA: Generate selects instead of shuffles when blending values because this... · 1e211913
      Nadav Rotem authored
      SROA: Generate selects instead of shuffles when blending values because this is the canonical form.
      Shuffles are more difficult to lower and we usually don't touch them, while we do optimize selects more often.
      
      llvm-svn: 180875
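      Either form expresses a lane-wise blend under a constant mask; a
      scalarized sketch of the shared semantics (illustrative C++, not IR):

      #include <array>

      // Blending two 4-lane values with a constant mask: this is what both a
      // shufflevector (with a constant mask) and a vector select express.
      std::array<int, 4> blend(const std::array<int, 4> &a,
                               const std::array<int, 4> &b) {
        constexpr bool mask[4] = {true, false, true, false}; // constant mask
        std::array<int, 4> r{};
        for (int i = 0; i < 4; ++i)
          r[i] = mask[i] ? a[i] : b[i]; // the "select" form of the blend
        return r;
      }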
    • Revert "InstCombine: Fold more shuffles of shuffles." · d11584a7
      Jim Grosbach authored
      This reverts commit r180802
      
      There's ongoing discussion about whether this is the right place to make
      this transformation. Reverting for now while we figure it out.
      
      llvm-svn: 180834
    • Fix a use after free. RI is freed before the call to getDebugLoc(). To · 624c2ebc
      Richard Trieu authored
      prevent this, capture the location before RI is freed.
      
      llvm-svn: 180824
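      The shape of the fix, as a self-contained sketch (Inst and getDebugLoc
      here are illustrative stand-ins, not the actual LLVM classes):

      #include <iostream>
      #include <memory>
      #include <string>

      struct Inst {
        std::string Loc;
        const std::string &getDebugLoc() const { return Loc; }
      };

      int main() {
        auto RI = std::make_unique<Inst>(Inst{"file.c:42"});
        std::string Loc = RI->getDebugLoc(); // capture while RI is alive
        RI.reset();                          // RI is freed here
        std::cout << Loc << "\n";            // safe: uses the copy, not RI
        return 0;
      }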
  Apr 29, 2013
    • SimplifyCFG: If convert single conditional stores · 474df6d3
      Arnold Schwaighofer authored
      This resurrects r179957, but adds code that makes sure we don't touch
      atomic/volatile stores:
      
      This transformation will transform a conditional store with a preceding
      unconditional store to the same location:
      
       a[i] = X
       may-alias with a[i] load
       if (cond)
         a[i] = Y
      
      into an unconditional store.
      
       a[i] = X
       may-alias with a[i] load
       tmp = cond ? Y : X;
       a[i] = tmp
      
      We assume that on average the cost of a mispredicted branch is going to be
      higher than the cost of a second store to the same location, and that the
      secondary benefits of creating a bigger basic block for other optimizations
      to work on outweigh the potential case where the branch would be correctly
      predicted and the cost of executing the second store would be noticeably
      reflected in performance.
      
      hmmer's execution time improves by 30% on an imac12,2 on ref data sets.
      With this change we are on par with gcc's performance (gcc also performs
      this transformation). There was a 1.2% performance improvement on an ARM
      Swift chip. Other tests in the test-suite+external seem to be mostly
      unaffected in my experiments: this optimization was triggered on 41 tests
      such that the executable was different before/after the patch. Only 1 of
      those tests (dealII) was reproducibly below 100% (by about .4%). Given
      that hmmer benefits so much, I believe this to be a fair trade-off.
      
      llvm-svn: 180731
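      The effect of the rewrite, sketched in C++ (the shape of the
      transformation only, not the SimplifyCFG implementation):

      // Before: the second store is conditional, so there is a branch
      // that can mispredict.
      void store_before(int *a, int i, bool cond, int X, int Y) {
        a[i] = X;
        if (cond)
          a[i] = Y;
      }

      // After: one unconditional store of a selected value; no branch.
      void store_after(int *a, int i, bool cond, int X, int Y) {
        a[i] = X;
        int tmp = cond ? Y : X;
        a[i] = tmp;
      }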