  1. Jun 15, 2012
  2. Jun 14, 2012
  3. Jun 13, 2012
  4. Jun 12, 2012
    • Use std::map rather than SmallMap because SmallMap assumes that the value has · 67cd5919
      Duncan Sands authored
      POD type, causing memory corruption when mapping to APInts with bitwidth > 64.
      Merge another crash testcase into crash.ll while there.
      
      llvm-svn: 158369
    • [arm-fast-isel] Add support for -arm-long-calls. · c6916f88
      Chad Rosier authored
      Patch by Jush Lu <jush.msn@gmail.com>.
      
      llvm-svn: 158368
    • Now that Reassociate's LinearizeExprTree can look through arbitrary expression · d7aeefeb
      Duncan Sands authored
      topologies, it is quite possible for a leaf node to have huge multiplicity, for
      example: x0 = x*x, x1 = x0*x0, x2 = x1*x1, ... rapidly gives a value which is x
      raised to a vast power (the multiplicity, or weight, of x).  This patch fixes
      the computation of weights by correctly computing them no matter how big they
      are, rather than just overflowing and getting a wrong value.  It turns out that
      the weight for a value never needs more bits to represent than the value itself,
      so it is enough to represent weights as APInts of the same bitwidth and do the
      right overflow-avoiding dance steps when computing weights.  As a side-effect it
      reduces the number of multiplies needed in some cases of large powers.  While
      there, in view of external uses (e.g. by the vectorizer), I made LinearizeExprTree
      static, pushing the rank computation out into users.  This is progress towards
      fixing PR13021.
      
      llvm-svn: 158358
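The blow-up described above is easy to see concretely: each squaring doubles the weight of the leaf x, so after n squarings the weight is 2^n, which overflows any fixed-width counter almost immediately. A minimal sketch in plain Python (an illustration of the arithmetic, not the LLVM code; Python's unbounded ints play the role of APInt):

```python
# Each squaring in the chain x0 = x*x, x1 = x0*x0, ... doubles the
# multiplicity (weight) of the leaf x: after n squarings the value is
# x ** (2 ** n).

def leaf_weight(num_squarings):
    """Weight of x after the chain x0 = x*x, x1 = x0*x0, ... (n steps)."""
    weight = 1
    for _ in range(num_squarings):
        weight *= 2  # squaring the current value doubles x's multiplicity
    return weight

# Verify against direct evaluation for a small case.
x = 3
value = x
for _ in range(5):
    value = value * value
assert value == x ** leaf_weight(5)

# 64 squarings already produce a weight that no 64-bit integer can hold.
assert leaf_weight(64) == 2 ** 64
```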
  5. Jun 11, 2012
  6. Jun 10, 2012
    • InstCombine: Turn (zext A) == (B & (1<<X)-1) into A == (trunc B), narrowing the compare. · 8b8a7697
      Benjamin Kramer authored
      This saves a cast, and zext is more expensive on platforms with subreg support
      than trunc is. This occurs in the BSD implementation of memchr(3); see PR12750.
      On the synthetic benchmark from that bug stupid_memchr and bsd_memchr have the
      same performance now when not inlining either function.
      
      stupid_memchr: 323.0us
      bsd_memchr: 321.0us
      memchr: 479.0us
      
      where memchr is the llvm-gcc-compiled bsd_memchr from OS X Lion's libc. When
      inlining is enabled, bsd_memchr still regresses down to llvm-gcc memchr time.
      I haven't fully understood the issue yet; something is grossly mangling the
      loop after inlining.
      
      llvm-svn: 158297
    • Enable ILP scheduling for all nodes by default on PPC. · 4e9f1a85
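The equivalence behind this fold can be checked exhaustively for small bit widths. A quick Python model (the helper names here are mine, not InstCombine's):

```python
# Model of the fold: (zext A) == (B & ((1 << X) - 1))  -->  A == (trunc B),
# where A is an X-bit value. Masking B to its low X bits and comparing at
# the wide width gives the same answer as truncating B and comparing at
# the narrow width, so the zext of A (and the mask) can be dropped.

def zext(value, bits):
    # zero-extension of an unsigned value is numerically a no-op
    return value & ((1 << bits) - 1)

def trunc(value, bits):
    # truncation keeps only the low `bits` bits
    return value & ((1 << bits) - 1)

def cmp_before(a, b, x):
    return zext(a, x) == (b & ((1 << x) - 1))

def cmp_after(a, b, x):
    return a == trunc(b, x)

# Exhaustive check: A is 4-bit, B is 8-bit.
X = 4
for a in range(1 << X):
    for b in range(1 << 8):
        assert cmp_before(a, b, X) == cmp_after(a, b, X)
```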
      Hal Finkel authored
      Over the entire test-suite, this has an insignificantly negative average
      performance impact, but reduces some of the worst slowdowns from the
      anti-dep. change (r158294).
      
      Largest speedups:
      SingleSource/Benchmarks/Stanford/Quicksort - 28%
      SingleSource/Benchmarks/Stanford/Towers - 24%
      SingleSource/Benchmarks/Shootout-C++/matrix - 23%
      MultiSource/Benchmarks/SciMark2-C/scimark2 - 19%
      MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - 15%
      (matrix and automotive-bitcount were both in the top-5 slowdown list from the
      anti-dep. change)
      
      Largest slowdowns:
      MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28%
      MultiSource/Benchmarks/mediabench/gsm/toast/toast - 26%
      MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan - 21%
      SingleSource/Benchmarks/CoyoteBench/lpbench - 20%
      MultiSource/Applications/d/make_dparser - 16%
      
      llvm-svn: 158296
    • Add AutoUpgrade support for the SSE4 ptest intrinsics. · 17ee58a7
      Nadav Rotem authored
      Patch by Michael Kuperstein.
      
      llvm-svn: 158295
    • Improve ext/trunc patterns on PPC64. · 2edfbddc
      Hal Finkel authored
      The PPC64 backend had patterns for i32 <-> i64 extensions and truncations that
      would leave self-moves in the final assembly. Replacing those patterns with ones
      based on the SUBREG builtins yields better-looking code.
      
      Thanks to Jakob and Owen for their suggestions in this matter.
      
      llvm-svn: 158283
  7. Jun 09, 2012
    • Enable tail merging on PPC. · eb50c2d4
      Hal Finkel authored
      Tail merging had been disabled on PPC because it would disturb bundling decisions
      made during pre-RA scheduling on the 970 cores. Now, however, all bundling decisions
      are made during post-RA scheduling, and tail merging is generally beneficial (the
      average test-suite speedup is insignificantly positive).
      
      Largest test-suite speedups:
      MultiSource/Benchmarks/mediabench/gsm/toast/toast - 30%
      MultiSource/Benchmarks/BitBench/uuencode/uuencode - 23%
      SingleSource/Benchmarks/Shootout-C++/ary - 21%
      SingleSource/Benchmarks/Stanford/Queens - 17%
      
      Largest slowdowns:
      MultiSource/Benchmarks/MiBench/security-sha/security-sha - 24%
      MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 22%
      MultiSource/Applications/JM/ldecod/ldecod - 14%
      MultiSource/Benchmarks/mediabench/g721/g721encode/encode - 9%
      
      This is improved by using full (instead of just critical) anti-dependency breaking,
      but doing so still causes miscompiles and so cannot yet be enabled by default.
      
      llvm-svn: 158259
    • Don't run RAFast in the optimizing regalloc pipeline. · 33a1b416
      Jakob Stoklund Olesen authored
      The fast register allocator is not supposed to work in the optimizing
      pipeline. It doesn't make sense to compute live intervals, run full copy
      coalescing, and then run RAFast.
      
      Fast register allocation in the optimizing pipeline is better done by
      RABasic.
      
      llvm-svn: 158242
    • canonicalize: · 2710f1b0
      Nuno Lopes authored
      -%a + 42
      into
      42 - %a
      
      previously we were emitting:
      -(%a + 42)
      
      This fixes the infinite loop in PR12338. The generated code is still not perfect, though.
      Will work on that next.
      
      llvm-svn: 158237
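The two forms compute the same value; the canonicalization only settles on the subtraction shape, which (per the commit) is what stops the rewrite loop of PR12338. A trivial check in plain Python:

```python
# -%a + 42 and 42 - %a are the same value; the subtraction form is the
# canonical one the pass now emits (fixing the infinite loop in PR12338,
# per the commit message).

def canonical(a, c=42):
    return c - a

for a in range(-1000, 1000):
    assert -a + 42 == canonical(a)
```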
  8. Jun 08, 2012
    • Enable PPC CTR loop formation by default. · c6b5debb
      Hal Finkel authored
      Thanks to Jakob's help, this now causes no new test suite failures!
      
      Over the entire test suite, this gives an average 1% speedup. The largest speedups are:
      SingleSource/Benchmarks/Misc/pi - 108%
      SingleSource/Benchmarks/CoyoteBench/lpbench - 54%
      MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail - 50%
      SingleSource/Benchmarks/Shootout/ary3 - 32%
      SingleSource/Benchmarks/Shootout-C++/matrix - 30%
      
      The largest slowdowns are:
      MultiSource/Benchmarks/mediabench/gsm/toast/toast - 30%
      MultiSource/Benchmarks/Prolangs-C/bison/mybison - 25%
      MultiSource/Benchmarks/BitBench/uuencode/uuencode - 22%
      MultiSource/Applications/d/make_dparser - 14%
      SingleSource/Benchmarks/Shootout-C++/ary - 13%
      
      In light of these slowdowns, additional profiling work is obviously needed!
      
      llvm-svn: 158223
    • Test case for r158160 · bf86b295
      Manman Ren authored
      llvm-svn: 158218