  1. Apr 08, 2012
  2. Apr 07, 2012
    • Move vinsertf128 patterns near the instruction definitions. Add... · aa9aab5a
      Craig Topper authored
      Move vinsertf128 patterns near the instruction definitions. Add AddedComplexity to AVX2 vextracti128 patterns to give them priority over the integer versions of vextractf128 patterns.
      
      llvm-svn: 154268
      aa9aab5a
    • Remove 'else' after 'if' that ends in return. · e09d1c5c
      Craig Topper authored
      llvm-svn: 154267
      e09d1c5c
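The cleanup named in this commit title can be illustrated with a minimal sketch. The commit itself touched C++ sources; the Python functions below (`classify_with_else`, `classify`) are hypothetical names made up for illustration and are not taken from the LLVM tree.

```python
def classify_with_else(n):
    # Before: the 'else' is redundant because the 'if' branch always returns.
    if n < 0:
        return "negative"
    else:
        return "non-negative"


def classify(n):
    # After: the early return makes the 'else' unnecessary, so the
    # remaining code drops one level of nesting.
    if n < 0:
        return "negative"
    return "non-negative"
```

Both versions return the same result for every input; only the nesting differs.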
    • 1. Remove the part of r153848 which optimizes shuffle-of-shuffle into a new · 71d07ae5
      Nadav Rotem authored
         shuffle node because it could introduce new shuffle nodes that were not
         supported efficiently by the target.
      
      2. Add a more restrictive shuffle-of-shuffle optimization for cases where the
         second shuffle reverses the transformation of the first shuffle.
      
      llvm-svn: 154266
      71d07ae5
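The restricted case in item 2, where the second shuffle reverses the first, can be sketched as follows. This is a simplified model and not the LLVM implementation: it assumes single-operand shuffles of equal width with no undefined lanes, and the helper names (`apply_shuffle`, `compose_masks`, `second_undoes_first`) are hypothetical.

```python
def apply_shuffle(vec, mask):
    # Lane i of the result takes element mask[i] of the input vector.
    return [vec[i] for i in mask]


def compose_masks(inner, outer):
    # Shuffling by 'inner' and then by 'outer' is equivalent to one
    # shuffle whose lane i reads source element inner[outer[i]].
    return [inner[i] for i in outer]


def second_undoes_first(inner, outer):
    # The pair can be replaced by the original vector only when the
    # composed mask is the identity permutation.
    return compose_masks(inner, outer) == list(range(len(inner)))
```

For example, with `inner = [2, 0, 3, 1]` and `outer = [1, 3, 0, 2]` the composed mask is the identity, so both shuffles cancel and no new shuffle node is needed.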
    • Convert floating point division by a constant into multiplication by the · 5f8397a9
      Duncan Sands authored
      reciprocal if converting to the reciprocal is exact.  Do it even if inexact
      if -ffast-math.  This substantially speeds up ac.f90 from the polyhedron
      benchmarks.
      
      llvm-svn: 154265
      5f8397a9
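The exactness condition described in this commit can be illustrated with a small sketch. It is not the LLVM code, which works on IR constants; it only shows, for Python floats (IEEE doubles), when `1/c` is exactly representable. The names `exact_reciprocal` and `divide_by_constant` and the `fast_math` parameter are assumptions made for illustration.

```python
import math
from fractions import Fraction


def exact_reciprocal(c):
    """Return 1/c if that value is exactly representable as a float, else None."""
    if c == 0.0 or not math.isfinite(c):
        return None
    r = 1.0 / c
    if r == 0.0 or not math.isfinite(r):
        return None
    # Fraction(float) converts without rounding, so this compares the
    # true rational values of r and c.
    if Fraction(r) * Fraction(c) == 1:
        return r
    return None


def divide_by_constant(x, c, fast_math=False):
    # Multiply by the reciprocal when doing so cannot change the result.
    r = exact_reciprocal(c)
    if r is not None:
        return x * r
    # Under the assumed fast-math mode, accept a rounded reciprocal.
    if fast_math:
        return x * (1.0 / c)
    return x / c
```

In binary floating point the check succeeds for powers of two such as 4.0 (reciprocal 0.25) and fails for constants such as 3.0, whose reciprocal has to be rounded.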
    • Perform partial SROA on the helper hashing structure. I really wish the · 75a1cf32
      Chandler Carruth authored
      optimizers could do this for us, but expecting partial SROA of classes
      with template methods through cloning is probably expecting too much
      heroics. With this change, the begin/end pointer pairs which indicate
      the status of each loop iteration are actually passed directly into each
      layer of the combine_data calls, and the inliner has a chance to see
      when most of the combine_data function could be deleted by inlining.
      Similarly for 'length'.
      
      We have to be careful to limit the places where in/out reference
      parameters are used, as those will also prevent the inliner / optimizers
      from properly propagating constants.
      
      With this change, LLVM is able to fully inline and unroll the hash
      computation of small sets of values, such as two or three pointers.
      These now decompose into essentially straight-line code with no loops or
      function calls.
      
      There is still one code quality problem to be solved with the hashing --
      LLVM is failing to nuke the alloca. It removes all loads from the
      alloca, leaving only lifetime intrinsics and dead(!!) stores to the
      alloca. =/ Very unfortunate.
      
      llvm-svn: 154264
      75a1cf32
    • Fix ValueTracking to conclude that debug intrinsics are safe to · 28192c93
      Chandler Carruth authored
      speculate. Without this, loop rotate (among many other places) would
      suddenly stop working in the presence of debug info. I found this
      looking at loop rotate, and have augmented its tests with a reduction
      out of a very hot loop in yacr2 where failing to do this rotation costs
      sometimes more than 10% in runtime performance, perturbing numerous
      downstream optimizations.
      
      This should have no impact on performance without debug info, but the
      change in performance when debug info is enabled can be extreme. As
      a consequence (and this is how I got to this yak) any profiling of
      performance problems should be treated with deep suspicion -- they may
      have been wildly inaccurate if debug info was enabled for profiling. =/
      Just a heads up.
      
      llvm-svn: 154263
      28192c93
    • SCEV: When expanding a GEP the final addition to the base pointer has NUW but not NSW. · e1f4ca1b
      Benjamin Kramer authored
      Found by inspection.
      
      llvm-svn: 154262
      e1f4ca1b
    • Fix Thumb __builtin_longjmp with integrated assembler. <rdar://problem/11203543> · 6f9be7e2
      Bob Wilson authored
      The tLDRr instruction with the last register operand set to the zero register
      prints in assembly as if no register was specified, and the assembler encodes
      it as a tLDRi instruction with a zero immediate.  With the integrated assembler,
      that zero register gets emitted as "r0", so we get "ldr rx, [ry, r0]" which
      is broken.  Emit the instruction as tLDRi with a zero immediate.  I don't
      know if there's a good way to write a testcase for this.  Suggestions welcome.
      
      Opportunities for follow-up work:
      1) The asm printer should complain if a non-optional register operand is set
         to the zero register, instead of silently dropping it.
      2) The integrated assembler should complain in the same situation, instead of
         silently emitting the operand as "r0".
      
      llvm-svn: 154261
      6f9be7e2
    • Rewritten expandRegion to clarify the intention and improve · ed986ab6
      Hongbin Zheng authored
        performance, patched by Johannes Doerfert <johannes@jdoerfert.de>.
      
      llvm-svn: 154260
      ed986ab6
    • ScopDetection: Add some comments to function "expandRegion". · 3a2d6035
      Hongbin Zheng authored
      llvm-svn: 154259
      3a2d6035
    • Speed up SCoP detection time by checking the exit of the region first, · 94868e6c
      Hongbin Zheng authored
        patched by Johannes Doerfert <johannes@jdoerfert.de>.
      
      llvm-svn: 154258
      94868e6c