  1. Dec 02, 2013
  2. Dec 01, 2013
  3. Nov 30, 2013
    • Add a scheduling model (with itinerary) for the PPC POWER7 · 42daeae9
      Hal Finkel authored
      This adds a scheduling model for the POWER7 (P7) core, and enables the
      machine-instruction scheduler when targeting the P7. Scheduling for the P7,
      like earlier out-of-order (ooo) PPC cores, requires considering both
      dispatch-group hazards and functional-unit resources and latencies. These are both modeled in a
      combined itinerary. Dispatch group formation is still handled by the post-RA
      scheduler (which still needs to be updated for the P7, but nevertheless does a
      pretty good job).
      
      One interesting aspect of this change is that I've also enabled the use of
      alias analysis (AA) during CodeGen for the P7 (just as it is for the embedded
      cores). The benchmark results seem to support this decision (see below), and
      while this is normally useful for in-order cores, and not for ooo cores like
      the P7, I think that the dispatch slot hazards are enough like in-order
      resources to make AA useful.
      
      Test suite significant performance differences (where negative is a speedup,
      and positive is a regression) vs. the current situation:
      
      MultiSource/Benchmarks/BitBench/drop3/drop3
        with AA: N/A
        without AA: -28.7614% +/- 19.8356%
      (significantly against AA)
      
      MultiSource/Benchmarks/FreeBench/neural/neural
        with AA: -17.7406% +/- 11.2712%
        without AA: N/A
      (significantly in favor of AA)
      
      MultiSource/Benchmarks/SciMark2-C/scimark2
        with AA: -11.2079% +/- 1.80543%
        without AA: -11.3263% +/- 2.79651%
      
      MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt
        with AA: -41.8649% +/- 17.0053%
        without AA: -34.5256% +/- 23.7072%
      
      MultiSource/Benchmarks/mafft/pairlocalalign
        with AA: 25.3016% +/- 17.8614%
        without AA: 38.6629% +/- 14.9391%
      (significantly in favor of AA)
      
      MultiSource/Benchmarks/sim/sim
        with AA: N/A
        without AA: 13.4844% +/- 7.18195%
      (significantly in favor of AA)
      
      SingleSource/Benchmarks/BenchmarkGame/Large/fasta
        with AA: 15.0664% +/- 6.70216%
        without AA: 12.7747% +/- 8.43043%
      
      SingleSource/Benchmarks/BenchmarkGame/puzzle
        with AA: 82.2713% +/- 26.3567%
        without AA: 75.7525% +/- 41.1842%
      
      SingleSource/Benchmarks/Misc/flops-2
        with AA: -37.1621% +/- 20.7964%
        without AA: -35.2342% +/- 20.2999%
      (significantly in favor of AA)
      
      These are 99.5% confidence intervals from 5 runs per configuration. Regarding
      the choice to turn on AA during CodeGen, of these results, four seem
      significantly in favor of using AA, and one seems significantly against. I'm
      not making this decision based on these numbers alone, but these results
      seem consistent with results I have from other tests, and so I think that, on
      balance, using AA is a win.
      
      llvm-svn: 195981
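
The way the figures above can be read is sketched below in Python: an entry counts as significant when its interval (mean plus or minus the half-width) excludes zero, and its sign then says whether it is a speedup or a regression. The numbers are copied from the commit message. The names `RESULTS`, `bounds`, `excludes_zero`, and `describe` are hypothetical, and treating "N/A" as "no significant change reported" is an assumption rather than something the message states.

```python
# (mean % change, interval half-width) copied from the commit message.
# None stands for an "N/A" entry; reading N/A as "no significant change
# reported" is an assumption made for this sketch.
RESULTS = {
    "drop3": {"with_aa": None, "without_aa": (-28.7614, 19.8356)},
    "neural": {"with_aa": (-17.7406, 11.2712), "without_aa": None},
    "mafft": {"with_aa": (25.3016, 17.8614), "without_aa": (38.6629, 14.9391)},
    "sim": {"with_aa": None, "without_aa": (13.4844, 7.18195)},
}


def bounds(entry):
    """Return the (low, high) ends of the interval mean +/- half-width."""
    mean, half_width = entry
    return (mean - half_width, mean + half_width)


def excludes_zero(entry):
    """True when the whole interval lies on one side of zero."""
    if entry is None:
        return False
    low, high = bounds(entry)
    return low > 0 or high < 0


def describe(entry):
    """Negative means faster, positive means slower (as in the message)."""
    if not excludes_zero(entry):
        return "no significant change"
    return "speedup" if entry[0] < 0 else "regression"


if __name__ == "__main__":
    for name, configs in RESULTS.items():
        for config, entry in configs.items():
            print(f"{name:8s} {config:10s} {describe(entry)}")
```

This only classifies each configuration against the baseline; it does not by itself decide whether the two configurations differ from each other.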
    • Split some PPC itinerary classes · 46402a42
      Hal Finkel authored
      In preparation for adding scheduling definitions for the POWER7, split some PPC
      itinerary classes so that the P7's latencies and hazards can be better
      described. For the most part, this means differentiating indexed from
      non-indexed pre-increment loads and stores. Also, differentiate single- from
      double-precision sqrt.
      
      No functionality change intended (except for a more-specific latency for
      single-precision sqrt on the A2).
      
      llvm-svn: 195980
    • Convert a PPC test from grep to FileCheck · ca93e472
      Hal Finkel authored
      Convert this test to FileCheck, and improve it to check for the instructions it
      is trying to exclude instead of checking for register use (especially because
      grepping for r1 can be thrown off, for example, by a use of r12).
      
      llvm-svn: 195979
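
The grep pitfall mentioned in this commit can be illustrated with a short Python sketch. The assembly line is hypothetical, and the sketch only shows why a plain substring search for `r1` is fragile; the commit itself fixes the test by checking for instructions instead of registers.

```python
import re

# Hypothetical assembly line: it uses r12 and r3, but never r1.
line = "mr r12, r3"

# A plain substring search for "r1" also matches the start of "r12".
naive_hit = re.search(r"r1", line) is not None

# Word boundaries keep "r1" from matching inside "r12".
bounded_hit = re.search(r"\br1\b", line) is not None

if __name__ == "__main__":
    print("naive search matched:  ", naive_hit)
    print("bounded search matched:", bounded_hit)
```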
    • Desensitize a couple of PPC regression tests · 2651f973
      Hal Finkel authored
      Use CHECK-DAG to make these regression tests more resilient against changes in
      instruction scheduling.
      
      llvm-svn: 195978
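
Why order-insensitive checks survive scheduling changes can be shown with a deliberately simplified Python model. This is not how FileCheck is implemented: patterns here are plain substrings matched per line, and the function names and instruction strings are hypothetical.

```python
def matches_in_order(output_lines, patterns):
    """Ordered model: each pattern must match on a line after the previous match."""
    pos = 0
    for pattern in patterns:
        for i in range(pos, len(output_lines)):
            if pattern in output_lines[i]:
                pos = i + 1
                break
        else:
            return False
    return True


def matches_any_order(output_lines, patterns):
    """Unordered model: each pattern must match somewhere, in any order."""
    return all(any(p in line for line in output_lines) for p in patterns)


# Two hypothetical schedules of the same two independent instructions.
schedule_a = ["lwz 3, 0(4)", "addi 5, 5, 1"]
schedule_b = ["addi 5, 5, 1", "lwz 3, 0(4)"]
patterns = ["lwz", "addi"]

if __name__ == "__main__":
    for name, sched in (("a", schedule_a), ("b", schedule_b)):
        print(name, matches_in_order(sched, patterns),
              matches_any_order(sched, patterns))
```

The ordered check passes for only one of the two schedules, while the unordered check accepts both, which is the resilience the commit is after.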
    • Update the cpu specified on some PPC regression tests · 2b655bb2
      Hal Finkel authored
      Some of these tests did not specify a cpu but were also sensitive to
      instruction scheduling and/or register assignment choices. A few other
      similarly sensitive tests specified a cpu (often the POWER7), and while the P7
      currently uses the default model for PPC64, this will soon change. For those
      tests which should not really be cpu-dependent anyway, the cpu is set to the
      generic 'ppc64'.
      
      llvm-svn: 195977
    • Test case for issue with microMIPS long branch. · 47248671
      Zoran Jovanovic authored
      llvm-svn: 195976
    • Fixed issue with microMIPS long branch. · 9d86e26e
      Zoran Jovanovic authored
      llvm-svn: 195975
    • [mips][msa] MSA loads and stores have a 10-bit offset. Account for this when lowering FrameIndex. · 7fd68d60
      Daniel Sanders authored
      This prevents the compiler from emitting invalid ld.[bhwd]'s and st.[bhwd]'s
      when the stack frame is between 512 and 32,768 bytes in size.
      
      llvm-svn: 195973
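
The 512 and 32,768 byte bounds in this commit line up with the ranges of signed 10-bit and signed 16-bit immediates, as the Python sketch below shows. It assumes the offsets are signed, that ordinary MIPS loads and stores take a 16-bit offset, and it ignores any scaling of the MSA offset by element size; the helper name `fits_signed` is hypothetical.

```python
def fits_signed(value, bits):
    """True when value is representable as a signed two's-complement
    immediate of the given width."""
    low = -(1 << (bits - 1))
    high = (1 << (bits - 1)) - 1
    return low <= value <= high


if __name__ == "__main__":
    for offset in (511, 512, 32767, 32768):
        print(offset,
              "10-bit:", fits_signed(offset, 10),
              "16-bit:", fits_signed(offset, 16))
```

Offsets from 512 up to 32,767 fit a 16-bit field but not a 10-bit one, so a lowering that only checks the 16-bit range would accept offsets that such a 10-bit field cannot encode.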
    • [mips][msa] A small refactor to reduce patch noise in my next commit · 71534147
      Daniel Sanders authored
      No functional change. An if-statement has been split into two nested if-statements.
      
      llvm-svn: 195972
    • Force CPU type to unbreak unit tests on Haswell machines. · 5b6234dc
      Juergen Ributzka authored
      llvm-svn: 195971
  4. Nov 29, 2013
    • Part 1 of 3 patches that completes very long conditional branches · ad450f23
      Reed Kotler authored
      in constant islands for Mips16. We introduce JalB16 as a synonym
      for Jal16. It makes it easier to read and is also necessary because
      Jal16 is a call instruction but JalB16 is being used as a branch.
      Various parts of LLVM will not work properly even in this late stage of
      the backend if we use what was declared as a call instruction to function
      as a branch. For one, basic block labels may not get emitted in some
      situations. 
      
      llvm-svn: 195968
    • Revert revision 195965. · 1bc3cce0
      Zoran Jovanovic authored
      llvm-svn: 195967
    • mips: XFAIL llvm-cov test · e3e940d8
      Petar Jovanovic authored
      XFAIL llvm-cov.test for MIPS until big-endian issues are fixed for llvm-cov.
      The test does pass on MIPS little-endian.
      
      llvm-svn: 195966
    • Fixed issue with microMIPS long branch. · ff2a40ce
      Zoran Jovanovic authored
      llvm-svn: 195965
    • Adjust PPC A2 input operand latencies · 1df3205e
      Hal Finkel authored
      On the PPC A2, instructions are only issued after their input operands are
      ready. Model this by specifying that input operands are read at dispatch (0
      cycles after issue). This changes all input operand latencies from 1 to 0.
      
      Significant test-suite performance changes (these are 99.5% confidence
      intervals on 6 runs for both before and after):
      
      speedups:
      MultiSource/Benchmarks/sim/sim
      	-1.21915% +/- 0.175063%
      MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt
      	-1.23946% +/- 1.05133%
      SingleSource/Benchmarks/Misc/flops-2
      	-1.24237% +/- 0.681362%
      MultiSource/Applications/JM/lencod/lencod
      	-1.33992% +/- 0.757498%
      MultiSource/Benchmarks/TSVC/InductionVariable-flt/InductionVariable-flt
      	-1.51802% +/- 1.21468%
      MultiSource/Benchmarks/TSVC/GlobalDataFlow-flt/GlobalDataFlow-flt
      	-2.18818% +/- 1.28605%
      MultiSource/Benchmarks/TSVC/Packing-flt/Packing-flt
      	-2.21977% +/- 1.19499%
      SingleSource/Benchmarks/BenchmarkGame/spectral-norm
      	-2.29822% +/- 0.671871%
      MultiSource/Benchmarks/TSVC/Packing-dbl/Packing-dbl
      	-2.40975% +/- 0.355931%
      SingleSource/Benchmarks/Misc/fp-convert
      	-2.41899% +/- 1.04751%
      MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl
      	-2.50349% +/- 0.126765%
      SingleSource/Benchmarks/Misc/flops-3
      	-3.00214% +/- 0.700795%
      MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt
      	-3.56995% +/- 3.2929%
      MultiSource/Applications/sgefa/sgefa
      	-4.24908% +/- 2.00413%
      MultiSource/Benchmarks/ASC_Sequoia/IRSmk/IRSmk
      	-18.1294% +/- 3.96489%
      
      regressions:
      MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl
      	1.03249% +/- 0.178547%
      MultiSource/Applications/hexxagon/hexxagon
      	1.16597% +/- 0.285235%
      MultiSource/Benchmarks/TSVC/IndirectAddressing-flt/IndirectAddressing-flt
      	1.39576% +/- 1.07855%
      SingleSource/Benchmarks/Misc-C++/stepanov_v1p2
      	1.71539% +/- 0.173182%
      MultiSource/Benchmarks/Fhourstones-3.1/fhourstones3.1
      	1.90013% +/- 0.866472%
      MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-dbl
      	2.39854% +/- 1.05914%
      MultiSource/Benchmarks/TSVC/ControlFlow-dbl/ControlFlow-dbl
      	2.4402% +/- 0.817904%
      MultiSource/Benchmarks/TSVC/LoopRestructuring-dbl/LoopRestructuring-dbl
      	5.87997% +/- 3.3172%
      MultiSource/Benchmarks/Trimaran/netbench-crc/netbench-crc
      	9.02643% +/- 5.79591%
      MultiSource/Benchmarks/VersaBench/bmm/bmm
      	10.3517% +/- 1.227%
      
      Obviously, there are data points on both sides of this; but I think, overall,
      this supports making the change.
      
      llvm-svn: 195951
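
The effect of moving the A2 input-operand read cycle from 1 to 0 can be sketched with a little arithmetic in Python. This assumes the itinerary convention in which the latency between a producer and a consumer is the producer's result cycle minus the consumer's read cycle, plus one; that convention is my understanding, not something stated in the commit. The function name and the producer cycle of 6 are hypothetical, and any forwarding adjustment is left out.

```python
def operand_latency(def_cycle, use_cycle):
    """Latency seen by the scheduler between a producer and a consumer,
    assuming latency = def cycle - use cycle + 1 (forwarding ignored)."""
    return def_cycle - use_cycle + 1


# Hypothetical producer whose result becomes available in cycle 6.
DEF_CYCLE = 6

before = operand_latency(DEF_CYCLE, use_cycle=1)  # operands read 1 cycle after issue
after = operand_latency(DEF_CYCLE, use_cycle=0)   # operands read at dispatch

if __name__ == "__main__":
    print("modeled latency before:", before)
    print("modeled latency after: ", after)
```

Under this assumption, every producer-to-consumer distance grows by one cycle in the model, which pushes the scheduler to place dependent instructions further apart.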