  1. Jun 12, 2012
  2. Jun 11, 2012
  3. Jun 10, 2012
    • InstCombine: Turn (zext A) == (B & (1<<X)-1) into A == (trunc B), narrowing the compare. · 8b8a7697
      Benjamin Kramer authored
      This saves a cast, and zext is more expensive than trunc on platforms with
      subreg support. The pattern occurs in the BSD implementation of memchr(3); see
      PR12750. On the synthetic benchmark from that bug, stupid_memchr and bsd_memchr
      now perform the same when neither function is inlined.
      
      stupid_memchr: 323.0us
      bsd_memchr: 321.0us
      memchr: 479.0us
      
      where memchr is the llvm-gcc compiled bsd_memchr from OS X Lion's libc. When
      inlining is enabled, bsd_memchr still regresses to the llvm-gcc memchr time.
      I haven't fully understood the issue yet; something is grossly mangling the
      loop after inlining.
      
      llvm-svn: 158297
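The identity behind the InstCombine change above can be sketched in C++. This is only an illustration of why the two comparisons agree, not the pass's actual implementation. It assumes A is 8 bits wide, B is 32 bits wide, and X equals A's bit width (so the mask is 0xFF); the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Wide form: (zext A) == (B & ((1 << X) - 1)), assuming A is 8 bits and X == 8.
bool wide_compare(std::uint8_t a, std::uint32_t b) {
    const std::uint32_t mask = (std::uint32_t{1} << 8) - 1;  // 0xFF
    return static_cast<std::uint32_t>(a) == (b & mask);
}

// Narrow form: A == (trunc B), compared at A's width.
bool narrow_compare(std::uint8_t a, std::uint32_t b) {
    return a == static_cast<std::uint8_t>(b);
}

// Counts inputs where the two forms disagree, over every 8-bit A and every
// low byte of B, combined with a few hypothetical high-bit patterns for B.
int count_mismatches() {
    const std::uint32_t highs[] = {0x00000000u, 0x00000100u,
                                   0x12345600u, 0xFFFFFF00u};
    int mismatches = 0;
    for (std::uint32_t a = 0; a < 256; ++a) {
        for (std::uint32_t low = 0; low < 256; ++low) {
            for (std::uint32_t high : highs) {
                const std::uint32_t b = high | low;
                const std::uint8_t a8 = static_cast<std::uint8_t>(a);
                if (wide_compare(a8, b) != narrow_compare(a8, b)) {
                    ++mismatches;
                }
            }
        }
    }
    return mismatches;
}

// Convenience check: aborts (in builds without NDEBUG) if the forms ever differ.
void self_check() {
    assert(count_mismatches() == 0);
}
```

The high bits of B never affect either result: the mask discards them in the wide form and the truncation discards them in the narrow form, which is why the compare can be done at A's width.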
    • Enable ILP scheduling for all nodes by default on PPC. · 4e9f1a85
      Hal Finkel authored
      Over the entire test-suite, this has a negative but insignificant average
      performance impact, yet it reduces some of the worst slowdowns from the
      anti-dep. change (r158294).
      
      Largest speedups:
      SingleSource/Benchmarks/Stanford/Quicksort - 28%
      SingleSource/Benchmarks/Stanford/Towers - 24%
      SingleSource/Benchmarks/Shootout-C++/matrix - 23%
      MultiSource/Benchmarks/SciMark2-C/scimark2 - 19%
      MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - 15%
      (matrix and automotive-bitcount were both in the top-5 slowdown list from the
      anti-dep. change)
      
      Largest slowdowns:
      MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28%
      MultiSource/Benchmarks/mediabench/gsm/toast/toast - 26%
      MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan - 21%
      SingleSource/Benchmarks/CoyoteBench/lpbench - 20%
      MultiSource/Applications/d/make_dparser - 16%
      
      llvm-svn: 158296
    • Add AutoUpgrade support for the SSE4 ptest intrinsics. · 17ee58a7
      Nadav Rotem authored
      Patch by Michael Kuperstein.
      
      llvm-svn: 158295
    • Use critical anti-dep. breaking on all PPC targets, but also add other register classes. · a8100281
      Hal Finkel authored
      Using 'all' instead of 'critical' would be better because it would make it easier to
      satisfy the bundling constraints, but, as noted in the FIXME, that is currently not
      possible with the CRs.
      
      This yields an average 1% speedup over the entire test suite (on Power 7). Largest speedups:
      SingleSource/Benchmarks/Shootout-C++/moments - 40%
      MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28%
      SingleSource/Benchmarks/BenchmarkGame/nsieve-bits - 26%
      SingleSource/Benchmarks/McGill/misr - 23%
      MultiSource/Applications/JM/ldecod/ldecod - 22%
      
      Largest slowdowns:
      SingleSource/Benchmarks/Shootout-C++/matrix - -29%
      SingleSource/Benchmarks/Shootout-C++/ary3 - -22%
      MultiSource/Benchmarks/BitBench/uuencode/uuencode - -18%
      SingleSource/Benchmarks/Shootout-C++/ary - -17%
      MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - -15%
      
      llvm-svn: 158294
    • PR12964: __int128 and unsigned __int128 are promoted integral types, be sure to · 521ecc1f
      Richard Smith authored
      consider them when enumerating builtin operator candidates.
      
      llvm-svn: 158293
    • Add XOP vprot* instruction intrinsics · a3c5fbf5
      Craig Topper authored
      llvm-svn: 158292
    • Add intrinsics for immediate form of XOP vprot instructions. Use i128mem instead of f128mem for integer XOP instructions. · 7afe343b
      Craig Topper authored
      
      llvm-svn: 158291