  1. Apr 16, 2013
  2. Apr 15, 2013
  3. Apr 14, 2013
  4. Apr 13, 2013
    • GlobalDCE: Fix an oversight in my last commit that could lead to crashes. · adc1727c
      Benjamin Kramer authored
      There is a Constant with non-constant operands: blockaddress.
      
      llvm-svn: 179460
      adc1727c
    • Fix a scalability issue with complex ConstantExprs. · 89ca4bc6
      Benjamin Kramer authored
      This is basically the same fix in three different places. We use a set to avoid
      walking the whole tree of a big ConstantExpr multiple times.
      
      For example: (select cmp, (add big_expr 1), (add big_expr 2))
      We don't want to visit big_expr twice here, since it may consist of thousands
      of nodes.
      
      The testcase exercises this by creating an insanely large ConstantExpr out of
      a loop. It's questionable whether the optimizer should ever create those, but
      this can be triggered with real C code. Fixes PR15714.
      
      llvm-svn: 179458
      89ca4bc6
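The visited-set idea in the commit above can be sketched in isolation. This is a minimal illustration, not the LLVM code: `Node`, `countNaiveVisits`, `countUniqueVisits`, and `buildSharedChain` are hypothetical names, and `Node` is only a stand-in for a constant expression with operands.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a constant expression node; not an LLVM class.
struct Node {
  std::vector<const Node *> operands;
};

// Naive recursive walk: a shared subtree is re-walked once per use.
std::size_t countNaiveVisits(const Node *n) {
  std::size_t visits = 1;
  for (const Node *op : n->operands)
    visits += countNaiveVisits(op);
  return visits;
}

// Walk with a visited set: each distinct node is processed exactly once.
std::size_t countUniqueVisits(const Node *root) {
  std::unordered_set<const Node *> visited;
  std::vector<const Node *> worklist{root};
  std::size_t visits = 0;
  while (!worklist.empty()) {
    const Node *n = worklist.back();
    worklist.pop_back();
    if (!visited.insert(n).second)
      continue; // already reached through another use
    ++visits;
    for (const Node *op : n->operands)
      worklist.push_back(op);
  }
  return visits;
}

// Builds a chain in which every level uses the previous level twice,
// similar in shape to (add prev, prev). `storage` owns the nodes.
const Node *buildSharedChain(std::vector<Node> &storage, std::size_t depth) {
  storage.clear();
  storage.reserve(depth + 1); // no reallocation, so pointers stay valid
  storage.push_back(Node{});
  for (std::size_t i = 0; i < depth; ++i) {
    const Node *prev = &storage.back();
    storage.push_back(Node{{prev, prev}});
  }
  return &storage.back();
}
```

For a chain of depth d, the naive walk makes 2^(d+1) - 1 visits while the set-based walk makes d + 1, which is why sharing turns a linear-size expression into exponential work without the set.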
  5. Apr 12, 2013
  6. Apr 11, 2013
    • Optimize icmp involving addition better · b81cd63c
      David Majnemer authored
      Allows LLVM to optimize sequences like the following:
      
      %add = add nsw i32 %x, 1
      %cmp = icmp sgt i32 %add, %y
      
      into:
      
      %cmp = icmp sge i32 %x, %y
      
      as well as:
      
      %add1 = add nsw i32 %x, 20
      %add2 = add nsw i32 %y, 57
      %cmp = icmp sge i32 %add1, %add2
      
      into:
      
      %add = add nsw i32 %y, 37
      %cmp = icmp sle i32 %add, %x
      
      llvm-svn: 179316
      b81cd63c
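Both folds in the commit above rely on the `nsw` flag, that is, on the additions not overflowing in the signed sense. A brute-force check over 8-bit values illustrates this. It is a sketch under the assumption that i8 behaves like i32 for this purpose; all function names are hypothetical.

```cpp
#include <cstdint>

// True if v is representable as a signed 8-bit integer.
bool fitsInI8(int v) { return v >= INT8_MIN && v <= INT8_MAX; }

// Checks (x + 1 >s y) == (x >=s y) for every pair where x + 1 does not
// overflow i8, mirroring the nsw precondition.
bool addOneFoldHolds() {
  for (int x = INT8_MIN; x <= INT8_MAX; ++x)
    for (int y = INT8_MIN; y <= INT8_MAX; ++y) {
      if (!fitsInI8(x + 1))
        continue; // nsw violated, so the fold has nothing to preserve
      if ((x + 1 > y) != (x >= y))
        return false;
    }
  return true;
}

// Checks (x + 20 >=s y + 57) == (y + 37 <=s x) when neither add overflows.
bool twoAddFoldHolds() {
  for (int x = INT8_MIN; x <= INT8_MAX; ++x)
    for (int y = INT8_MIN; y <= INT8_MAX; ++y) {
      if (!fitsInI8(x + 20) || !fitsInI8(y + 57))
        continue;
      if ((x + 20 >= y + 57) != (y + 37 <= x))
        return false;
    }
  return true;
}

// Shows why nsw matters: if 127 + 1 wraps to -128, the first fold is wrong.
bool foldFailsWhenAddWraps() {
  int x = INT8_MAX, y = 0;
  int wrapped = x + 1 - 256; // two's complement wraparound in 8 bits
  return (wrapped > y) != (x >= y);
}
```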
    • Fix for wrong instcombine on vector insert/extract · a95f8749
      Benjamin Kramer authored
      When trying to collapse sequences of insertelement/extractelement
      instructions into single shuffle instructions, there is one specific
      case where the Instruction Combiner wrongly updates the resulting
      Mask of shuffle indexes.
      
      The problem is in function CollectShuffleElements.
      
      If we have a sequence of insert/extract element instructions
      like the one below:
      
        %tmp1 = extractelement <4 x float> %LHS, i32 0
        %tmp2 = insertelement <4 x float> %RHS, float %tmp1, i32 1
        %tmp3 = extractelement <4 x float> %RHS, i32 2
        %tmp4 = insertelement <4 x float> %tmp2, float %tmp3, i32 3
      
      Where:
        . %RHS will have a mask of [4,5,6,7]
        . %LHS will have a mask of [0,1,2,3]
      
      The Mask of shuffle indexes is wrongly computed to [4,1,6,7]
      instead of [4,0,6,7].
      When analyzing %tmp2 in order to compute the Mask for the
      resulting shuffle instruction, the algorithm forgets to update
      the mask index at position 1 with the index associated with the
      element extracted from %LHS by instruction %tmp1.
      
      Patch by Andrea DiBiagio!
      
      llvm-svn: 179291
      a95f8749
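The mask bookkeeping described in the commit above can be modeled with plain arrays. This is a minimal sketch of the indexing convention only, not the InstCombine code; `Mask`, `insertExtracted`, and `applyShuffle` are hypothetical names.

```cpp
#include <array>

using Mask = std::array<int, 4>;

// Identity masks for two <4 x float> inputs, as in the commit message:
// indexes 0..3 name lanes of %LHS, indexes 4..7 name lanes of %RHS.
constexpr Mask kLHSMask = {0, 1, 2, 3};
constexpr Mask kRHSMask = {4, 5, 6, 7};

// Models `insertelement Base, (extractelement Src, srcLane), dstLane`.
// The destination lane must take the index of the *extracted* lane,
// not the index of the position being written.
Mask insertExtracted(Mask base, const Mask &src, int srcLane, int dstLane) {
  base[dstLane] = src[srcLane];
  return base;
}

// Applies a mask to concrete inputs, like a shufflevector would.
std::array<float, 4> applyShuffle(const std::array<float, 4> &lhs,
                                  const std::array<float, 4> &rhs,
                                  const Mask &mask) {
  std::array<float, 4> out{};
  for (int i = 0; i < 4; ++i)
    out[i] = mask[i] < 4 ? lhs[mask[i]] : rhs[mask[i] - 4];
  return out;
}
```

With these helpers, `insertExtracted(kRHSMask, kLHSMask, 0, 1)` models %tmp2 and yields [4,0,6,7]. The buggy result [4,1,6,7] would select lane 1 of %LHS instead of lane 0.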
    • Add missing colons to check lines. · b50682e1
      Benjamin Kramer authored
      llvm-svn: 179277
      b50682e1
    • FileCheckize a bunch of tests. · 3960c1cd
      Benjamin Kramer authored
      llvm-svn: 179276
      3960c1cd
  7. Apr 10, 2013
  8. Apr 09, 2013
  9. Apr 07, 2013
    • Fix PR15674 (and PR15603): a SROA think-o. · 0e8a52d1
      Chandler Carruth authored
      The fix for PR14972 in r177055 introduced a real think-o in the *store*
      side, likely because I was much more focused on the load side. While we
      can arbitrarily widen (or narrow) a loaded value, we can't arbitrarily
      widen a value to be stored, as that changes the width of the memory access!
      Lock down the code path in the store rewriting which would do this to
      only handle the intended circumstance.
      
      All of the existing tests continue to pass, and I've added a test from
      the PR.
      
      llvm-svn: 178974
      0e8a52d1
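The asymmetry between loads and stores noted in the commit above can be shown on a small byte buffer. This is only an illustration of why a widened store is observable; it does not reproduce the SROA rewriting logic, and `storeNarrow` and `storeWidened` are hypothetical names.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

using Bytes = std::array<std::uint8_t, 4>;

// Stores a 16-bit value at offset 0; bytes 2 and 3 are left untouched.
Bytes storeNarrow(Bytes mem, std::uint16_t v) {
  std::memcpy(mem.data(), &v, sizeof v);
  return mem;
}

// Zero-extends the value to 32 bits before storing it. All four bytes
// are now written, so neighbouring data is overwritten.
Bytes storeWidened(Bytes mem, std::uint16_t v) {
  std::uint32_t wide = v;
  std::memcpy(mem.data(), &wide, sizeof wide);
  return mem;
}
```

A widened load followed by a truncation only reads extra bytes and then discards them, whereas the widened store above changes memory that the original program never wrote.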
  10. Apr 06, 2013
    • An objc_retain can serve as a use for a different pointer. · 31ba23aa
      Michael Gottesman authored
      This is the counterpart to commit r160637, except it performs the action
      in the bottom-up portion of the data flow analysis.
      
      llvm-svn: 178922
      31ba23aa
    • Properly model precise lifetime when given an incomplete dataflow sequence. · 1d8d2577
      Michael Gottesman authored
      The normal dataflow sequence in the ARC optimizer consists of the following
      states:
      
          Retain -> CanRelease -> Use -> Release
      
      Before this patch, the optimizer stored the uses that determine the lifetime of
      the retainable object pointer when it hit a retain in the bottom-up pass or a
      release in the top-down pass. This is correct for an imprecise lifetime
      scenario, since what we are trying to do is remove retains/releases while
      making sure that no ``CanRelease'' (which is usually a call) deallocates the
      given pointer before we get to the ``Use'' (since that would cause a segfault).
      
      If we are considering the precise lifetime scenario though, this is not
      correct. In such a situation, we *DO* care about the previous sequence, but
      additionally, we wish to track the uses resulting from the following incomplete
      sequences:
      
        Retain -> CanRelease -> Release   (TopDown)
        Retain <- Use <- Release          (BottomUp)
      
      *NOTE* This patch looks large, but most of it consists of updating
      test cases. Additionally, this fix exposed another bug. I removed
      the test case that expressed said bug and will recommit it with the fix
      in a little bit.
      
      llvm-svn: 178921
      1d8d2577
  11. Apr 05, 2013
    • Disable the optimization about promoting vector-element-access with symbolic index. · 95adf525
      Shuxin Yang authored
      This optimization is unstable at the moment:
        1) it blocks us on a very important application
        2) PR15200
        3) test6 and test7 in test/Transforms/ScalarRepl/dynamic-vector-gep.ll
           (the CHECK commands compare the output against a wrong result)
      
      I personally believe this optimization should not have any impact on
      autovectorized code, as the auto-vectorizer is supposed to emit gather/scatter
      in a "right" way. Although in theory downstream optimizers might reveal
      some gather/scatter optimization opportunities, the chance is quite slim.
      
      For hand-crafted vectorized code, in terms of redundancy elimination,
      load-CSE, copy-propagation and DSE can collectively achieve the same result,
      but in a much simpler way. Moreover, these optimizers are able to improve the
      code incrementally; in contrast, SROA is an all-or-none approach. However,
      SROA might win slightly in stack size, as it tries to find a stretch of
      memory that tightly covers the area accessed by the dynamic index.
      
       rdar://13174884
       PR15200
      
      llvm-svn: 178912
      95adf525
    • LoopVectorizer: Pass OperandValueKind information to the cost model · df6f67ed
      Arnold Schwaighofer authored
      Pass down the fact that an operand is going to be a vector of constants.
      
      This should bring back the performance of MultiSource/Benchmarks/PAQ8p/paq8p
      on x86. It had degraded to scalar performance due to my previous shift cost
      change that made all shifts expensive on x86.
      
      radar://13576547
      
      llvm-svn: 178809
      df6f67ed
  12. Apr 03, 2013
    • Remove an optimization where we were changing an objc_autorelease into an objc_autoreleaseReturnValue. · b8c88365
      Michael Gottesman authored
      
      The semantics of ARC imply that a pointer passed into an objc_autorelease
      must live until some point (potentially down the stack) where an
      autorelease pool is popped. On the other hand, an
      objc_autoreleaseReturnValue just signifies that the object must live
      at least until the end of the given function.
      
      Thus objc_autorelease is stronger than objc_autoreleaseReturnValue in
      terms of the semantics of ARC*. Performing the given strength reduction
      without any knowledge of how it relates to the autorelease pool pop
      that is further up the stack therefore violates the semantics of ARC.
      
      *Even though objc_autoreleaseReturnValue is more computationally
      expensive if you know that no RV optimization will occur.
      
      llvm-svn: 178612
      b8c88365
  13. Apr 02, 2013
    • Use a worklist to avoid a sneaky iterator invalidation. · 88d06c3b
      Bill Wendling authored
      The iterator could be invalidated when it's recursively deleting a whole bunch
      of constant expressions in a constant initializer.
      
      Note: This was only reproducible if `opt' was run on a `.bc' file. If `opt' was
      run on a `.ll' file, it wouldn't crash. This is why the test first pushes the
      `.ll' file through `llvm-as' before feeding it to `opt'.
      
      PR15440
      
      llvm-svn: 178531
      88d06c3b
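The worklist pattern named in the commit above can be sketched on a toy use graph. This is an assumption-laden model rather than the actual pass: `Graph`, `useCount`, and `eraseDeadFrom` are hypothetical, and an integer id stands in for a constant.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical model: id -> operand ids. An entry is dead once no other
// entry lists it as an operand.
using Graph = std::map<int, std::vector<int>>;

// Counts how many operand slots across the graph refer to `id`.
std::size_t useCount(const Graph &g, int id) {
  std::size_t n = 0;
  for (const auto &entry : g)
    for (int op : entry.second)
      if (op == id)
        ++n;
  return n;
}

// Erases `root` if it is unused, then any operands that became dead as a
// result. Pending ids live on an explicit worklist, so no iterator into
// `g` is held across an erase.
std::size_t eraseDeadFrom(Graph &g, int root) {
  std::vector<int> worklist{root};
  std::size_t erased = 0;
  while (!worklist.empty()) {
    int id = worklist.back();
    worklist.pop_back();
    auto it = g.find(id);
    if (it == g.end() || useCount(g, id) != 0)
      continue; // already gone, or still in use
    std::vector<int> operands = std::move(it->second);
    g.erase(it);
    ++erased;
    for (int op : operands)
      worklist.push_back(op);
  }
  return erased;
}
```

The design point is that each lookup is done fresh by id after the previous erase has finished, instead of advancing an iterator that a recursive deletion may have invalidated.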
  14. Apr 01, 2013
  15. Mar 30, 2013
    • Implement XOR reassociation. It is based on the following rules: · 7b0c94e2
      Shuxin Yang authored
        rule 1: (x | c1) ^ c2 => (x & ~c1) ^ (c1^c2),
           only useful when c1=c2
        rule 2: (x & c1) ^ (x & c2) = (x & (c1^c2))
        rule 3: (x | c1) ^ (x | c2) = (x & c3) ^ c3 where c3 = c1 ^ c2
        rule 4: (x | c1) ^ (x & c2) => (x & c3) ^ c1, where c3 = ~c1 ^ c2
      
       It reduces an application's size (in terms of # of instructions) by 8.9%.
       Reviewed by Pete Cooper. Thanks a lot!
      
       rdar://13212115  
      
      llvm-svn: 178409
      7b0c94e2
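The four rules in the commit above are bitwise identities, so they can be checked exhaustively on a narrow type. The sketch below tests every combination of 8-bit x, c1, and c2; `checkXorRules` is a hypothetical name and the check says nothing about how the Reassociate pass applies the rules.

```cpp
// Exhaustively checks the four XOR reassociation rules over all 8-bit
// values of x, c1 and c2. Returns true if every rule holds everywhere.
bool checkXorRules() {
  const unsigned mask = 0xFF; // keeps ~c1 within 8 bits
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned c1 = 0; c1 < 256; ++c1)
      for (unsigned c2 = 0; c2 < 256; ++c2) {
        unsigned notC1 = ~c1 & mask;
        // rule 1: (x | c1) ^ c2 == (x & ~c1) ^ (c1 ^ c2)
        if (((x | c1) ^ c2) != ((x & notC1) ^ (c1 ^ c2)))
          return false;
        // rule 2: (x & c1) ^ (x & c2) == x & (c1 ^ c2)
        if (((x & c1) ^ (x & c2)) != (x & (c1 ^ c2)))
          return false;
        // rule 3: (x | c1) ^ (x | c2) == (x & c3) ^ c3, c3 = c1 ^ c2
        unsigned c3 = c1 ^ c2;
        if (((x | c1) ^ (x | c2)) != ((x & c3) ^ c3))
          return false;
        // rule 4: (x | c1) ^ (x & c2) == (x & c4) ^ c1, c4 = ~c1 ^ c2
        unsigned c4 = notC1 ^ c2;
        if (((x | c1) ^ (x & c2)) != ((x & c4) ^ c1))
          return false;
      }
  return true;
}
```

Rule 1 follows from (x | c1) == (x & ~c1) ^ c1, since the two XOR operands share no set bits. When c1 equals c2 the trailing constant becomes zero, which is the case the commit calls useful.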
  16. Mar 29, 2013
    • Updated test0 of retain-not-declared.ll to reflect the fact that objc-arc-expand runs before objc-arc/objc-arc-contract. · 94128300
      Michael Gottesman authored
      
      Specifically, objc-arc-expand will make sure that the
      objc_retainAutoreleasedReturnValue, objc_autoreleaseReturnValue, and ret
      will all have %call as an argument.
      
      llvm-svn: 178382
      94128300