Commits · be336295cd0415c69d929ad61a0c9bf6f2608cea · Roger Ferrer / llvm-epi-0.8

Jun 08, 2012

Remove the TODO statement in the PPC README re: CTR loops · 41e6fd1d

Hal Finkel authored Jun 08, 2012

As Chris points out, this can now be removed!

TODO: check if the associated section on viterbi's inner loop can also be removed.
llvm-svn: 158224


Enable PPC CTR loop formation by default. · c6b5debb

Hal Finkel authored Jun 08, 2012

Thanks to Jakob's help, this now causes no new test suite failures!

Over the entire test suite, this gives an average 1% speedup. The largest speedups are:
SingleSource/Benchmarks/Misc/pi - 108%
SingleSource/Benchmarks/CoyoteBench/lpbench - 54%
MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail - 50%
SingleSource/Benchmarks/Shootout/ary3 - 32%
SingleSource/Benchmarks/Shootout-C++/matrix - 30%

The largest slowdowns are:
MultiSource/Benchmarks/mediabench/gsm/toast/toast - -30%
MultiSource/Benchmarks/Prolangs-C/bison/mybison - -25%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - -22%
MultiSource/Applications/d/make_dparser - -14%
SingleSource/Benchmarks/Shootout-C++/ary - -13%

In light of these slowdowns, additional profiling work is obviously needed!

llvm-svn: 158223


Mark the PPC CTRRC and CTRRC8 register classes as non-allocatable. · 3d32ad3a

Hal Finkel authored Jun 08, 2012

Marking these classes as non-allocatable allows CTR loop generation to
work correctly with the block placement passes, etc. These register
classes are currently used only by some unused TCRETURN patterns.
In future cleanup, these will be removed.

Thanks again to Jakob for suggesting this fix to the CTR loop problem!

llvm-svn: 158221


Enable optimization for integer ABS on X86 if Subtarget has CMOV. · 6bc2d270
Manman Ren authored Jun 08, 2012
```
llvm-svn: 158220
```

Fix Target->Codegen dependence. · 596af1b0

Andrew Trick authored Jun 08, 2012

Bulk move of TargetInstrInfo implementation into
TargetInstrInfoImpl. This is dirty because the code isn't part of
TargetInstrInfoImpl class, nor should it be, because the methods are
not target hooks. However, it's the current mechanism for keeping
libTarget useful outside the backend. You'll get a not-so-nice link
error if you invoke a TargetInstrInfo method that depends on CodeGen.

The TargetInstrInfoImpl class should probably be removed since it
doesn't really solve this problem.

To really fix this, we probably need separate interfaces for the
CodeGen/nonCodeGen sides of TargetInstrInfo.

llvm-svn: 158212


Disable the PPC CTR-Loops pass by default. · 821e0012

Hal Finkel authored Jun 08, 2012

The pass itself works well, but something in the Machine* infrastructure
does not understand terminators which define registers. Without the ability
to use the block-placement pass, etc. this causes performance regressions (and
so is turned off by default). Turning off the analysis turns off the problems
with the Machine* infrastructure.

llvm-svn: 158206


Fix a bug in the new PPC CTR-Loops pass. · 8b01503e

Hal Finkel authored Jun 08, 2012

The code which tests for an induction operation cannot assume that any
ADDI instruction will have a register operand because the operand could
also be a frame index; for example:
    %vreg16<def> = ADDI8 <fi#0>, 0; G8RC:%vreg16

llvm-svn: 158205

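The operand-kind check described in this fix can be sketched as follows. This is a minimal illustration only: `OperandKind`, `Operand`, `Instr`, and `isInductionAdd` are hypothetical stand-ins invented for the sketch, not LLVM's real classes or the pass's actual code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for machine operands; not LLVM's real classes.
enum class OperandKind { Register, Immediate, FrameIndex };

struct Operand {
    OperandKind kind;
    int value;  // register number, immediate value, or frame index slot
};

struct Instr {
    std::vector<Operand> operands;  // operands[0] is the defined register
};

// Returns true only when an ADDI-style instruction adds an immediate to ivReg.
// The source operand's kind is checked before its value is read as a register,
// because it may be a frame index such as <fi#0>.
bool isInductionAdd(const Instr &mi, int ivReg) {
    if (mi.operands.size() < 3)
        return false;
    const Operand &src = mi.operands[1];
    if (src.kind != OperandKind::Register)
        return false;  // frame index or immediate: not an induction update
    return src.value == ivReg &&
           mi.operands[2].kind == OperandKind::Immediate;
}
```

In this model, an instruction shaped like the `ADDI8 <fi#0>, 0` example is rejected instead of being misread as a register add.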

Add the PPCCTRLoops pass: a PPC machine-code-level optimization pass to form... · 96c2d4d9

Hal Finkel authored Jun 08, 2012

Add the PPCCTRLoops pass: a PPC machine-code-level optimization pass to form CTR-based loop branching code.

This pass is derived from the Hexagon HardwareLoops pass. The only significant enhancement over the Hexagon
pass is that PPCCTRLoops will also attempt to delete the replaced add and compare operations if they are
no longer otherwise used. Also, invalid preheader DebugLoc is not used.

llvm-svn: 158204

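As a rough source-level picture of what CTR-based loop branching buys, the sketch below contrasts a loop driven by an induction variable and a compare with a count-down loop of the kind that `mtctr`/`bdnz` express. The function names are made up for illustration, and the real pass operates on machine instructions, not C++ source.

```cpp
#include <cassert>

// Original shape: an induction variable is incremented and compared each trip.
long sumWithInduction(const long *data, long n) {
    long sum = 0;
    for (long i = 0; i < n; ++i)  // add + compare + conditional branch
        sum += data[i];
    return sum;
}

// CTR-style shape: the trip count is loaded into a counter once (like mtctr),
// and each trip decrements it and branches while it is non-zero (like bdnz).
long sumWithCounter(const long *data, long n) {
    long sum = 0;
    if (n <= 0)
        return sum;  // the loop body must not run for an empty range
    unsigned long ctr = static_cast<unsigned long>(n);
    const long *p = data;
    do {
        sum += *p++;
    } while (--ctr != 0);  // decrement and branch if not zero
    return sum;
}
```

Once the counter drives the branch, the add and compare on the induction variable become candidates for deletion if nothing else uses them, which is the enhancement the commit message mentions.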

X86: optimize generated code for integer ABS · 2cdc8afc

Manman Ren authored Jun 07, 2012

This patch will generate the following for integer ABS:
      movl    %edi, %eax
      negl    %eax
      cmovll  %edi, %eax
INSTEAD OF
      movl    %edi, %ecx
      sarl    $31, %ecx
      leal    (%rdi,%rcx), %eax
      xorl    %ecx, %eax

There exists a target-independent DAG combine for integer ABS, which converts
integer ABS to sar+add+xor. For X86, we match this pattern back to neg+cmov. 
This is implemented in PerformXorCombine.

rdar://10695237

llvm-svn: 158175

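The two instruction sequences compute the same value. A small C++ model of each, assuming 32-bit two's-complement integers and an arithmetic right shift for signed values, is sketched below; the function names are illustrative and do not appear in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Models the sar + add + xor form. The add and xor are done on unsigned
// values so the sketch has no signed overflow.
int32_t absSarAddXor(int32_t x) {
    int32_t sign = x >> 31;  // 0 for non-negative x, -1 for negative x
    uint32_t sum = static_cast<uint32_t>(x) + static_cast<uint32_t>(sign);
    return static_cast<int32_t>(sum ^ static_cast<uint32_t>(sign));
}

// Models the neg + cmov form: negate, then keep the original value when the
// flags of (0 - x) say "less", which happens exactly when x is positive.
int32_t absNegCmov(int32_t x) {
    int32_t negated = static_cast<int32_t>(0u - static_cast<uint32_t>(x));
    return x > 0 ? x : negated;
}
```

Both forms leave `INT32_MIN` unchanged, since its magnitude is not representable in 32 bits.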

Jun 07, 2012
- Do not optimize the used bits of the x86 vselect condition operand, when the... · bbd40f67
  Nadav Rotem authored Jun 07, 2012
```
Do not optimize the used bits of the x86 vselect condition operand, when the condition operand is a vector of 1-bit predicates.
This may happen on MIC devices.

llvm-svn: 158168
```
- Continue factoring computeOperandLatency. Use it for ARM hasHighOperandLatency. · a5d24ca4
  Andrew Trick authored Jun 07, 2012
```
llvm-svn: 158164
```
- ARM getOperandLatency rewrite. · 5b1cadf9
  Andrew Trick authored Jun 07, 2012
```
Match expectations of the new latency API. Cleanup and make the logic consistent.

llvm-svn: 158163
```
- ARM getOperandLatency should return -1 for unknown, consistent with API · 3564bdfa
  Andrew Trick authored Jun 07, 2012
```
llvm-svn: 158162
```
- Fix ARM getInstrLatency logic to work with the current API. · fb1a74c2
  Andrew Trick authored Jun 07, 2012
```
llvm-svn: 158161
```
- PR13046: we can't replace usage of SUB with CMP in the lowering phase. · 746e4859
  Manman Ren authored Jun 07, 2012
```
It will cause an assertion failure later on.

llvm-svn: 158160
```
- Use a base register instead of an index register with the local dynamic model. · 55d1145b
  Rafael Espindola authored Jun 07, 2012
```
Fixes pr13048.

llvm-svn: 158158
```
- X86: replace SUB with CMP if possible · ae02c5a9
  Manman Ren authored Jun 07, 2012
```
This patch will optimize the following
    movq    %rdi, %rax
    subq    %rsi, %rax
    cmovsq  %rsi, %rdi
    movq    %rdi, %rax
to
    cmpq    %rsi, %rdi
    cmovsq  %rsi, %rdi
    movq    %rdi, %rax

Perform this optimization if the actual result of SUB is not used.

rdar: 11540023
llvm-svn: 158126
```
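The reasoning behind the SUB-to-CMP replacement is that CMP sets the same flags as SUB but discards the difference, so a conditional move that reads only the flags cannot tell them apart. The sketch below models this with a hypothetical `Flags` struct and helper functions; it is a toy model of the sign and zero flags only, not X86 backend code.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of the two flags relevant here.
struct Flags {
    bool sign;  // top bit of the result is set
    bool zero;  // result is zero
};

struct SubResult {
    uint64_t value;
    Flags flags;
};

// SUB: produces the difference and sets flags from it.
SubResult sub(uint64_t a, uint64_t b) {
    uint64_t r = a - b;  // wraps modulo 2^64, like the hardware
    return {r, {(r >> 63) != 0, r == 0}};
}

// CMP: sets the same flags as SUB but throws the difference away.
Flags cmp(uint64_t a, uint64_t b) {
    return sub(a, b).flags;
}

// cmovs-style select driven by SUB; the difference itself is never used.
uint64_t selectWithSub(uint64_t a, uint64_t b) {
    SubResult r = sub(a, b);
    return r.flags.sign ? b : a;
}

// The same select driven by CMP; no register is needed for the difference.
uint64_t selectWithCmp(uint64_t a, uint64_t b) {
    return cmp(a, b).sign ? b : a;
}
```

Because the difference is dead, dropping it also removes the extra `movq` that kept the first operand alive for the subtraction.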
- Revert r157755. · 9c964181
  Manman Ren authored Jun 06, 2012
```
The commit is intended to fix rdar://11540023.
It is implemented as part of peephole optimization. We can actually implement
this in the SelectionDAG lowering phase.

llvm-svn: 158122
```
Jun 06, 2012
- Round 2 of dead private variable removal. · 009b1c1c
  Benjamin Kramer authored Jun 06, 2012
```
LLVM is now -Wunused-private-field clean except for
- lib/MC/MCDisassembler/Disassembler.h. Not sure why it keeps all those inaccessible fields.
- gtest.

llvm-svn: 158096
```
- Remove unused private fields found by clang's new -Wunused-private-field. · 628a39fa
  Benjamin Kramer authored Jun 06, 2012
```
There are some that I didn't remove this round because they looked like
obvious stubs. There are dead variables in gtest too; they should be
fixed upstream.

llvm-svn: 158090
```
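For readers unfamiliar with the warning, the following is a minimal sketch of the kind of code `-Wunused-private-field` is meant to catch. The `Counter` class and its members are invented for illustration and are not taken from the LLVM tree.

```cpp
#include <cassert>

class Counter {
public:
    explicit Counter(int start) : value_(start) {}

    // Returns the current value, then advances it.
    int next() { return value_++; }

private:
    int value_;
    int scratch_;  // never read or written by any member: dead private state
                   // that clang is expected to report under
                   // -Wunused-private-field
};
```

Removing `scratch_` changes no behavior, which is why cleanups of this kind are safe to do in bulk.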
- Add support for dynamic stack realignment in the presence of dynamic allocas on · 5d6f01ad
  Chad Rosier authored Jun 06, 2012
```
X86.
rdar://11496434

llvm-svn: 158087
```
- Correct decoder for T1 conditional B encoding · f1ef87dd
  Richard Barton authored Jun 06, 2012
```
llvm-svn: 158055
```
- Mark several instructions SSE2 instead of SSE3 as they should be. · bf2409e8
  Craig Topper authored Jun 06, 2012
```
llvm-svn: 158049
```
Jun 05, 2012
- misched: API for minimum vs. expected latency. · 4544606c
  Andrew Trick authored Jun 05, 2012
```
Minimum latency determines per-cycle scheduling groups.
Expected latency determines critical path and cost.

llvm-svn: 158021
```
- Fix header file include order in NVPTX backend NV_CONTRIB · 572a3a2c
  Yuan Lin authored Jun 05, 2012
```
llvm-svn: 158013
```
- PPC32 uses R2 as the TLS register. Fix the copy and paste. · c856653f
  Roman Divacky authored Jun 05, 2012
```
llvm-svn: 158004
```
- X86 itinerary properties. · 39a99140
  Andrew Trick authored Jun 05, 2012
```
llvm-svn: 157981
```
- ARM itinerary properties. · b2680c71
  Andrew Trick authored Jun 05, 2012
```
llvm-svn: 157980
```
- misched: Added MultiIssueItineraries. · 73d7736b
  Andrew Trick authored Jun 05, 2012
```
This allows a subtarget to explicitly specify the issue width and
other properties without providing pipeline stage details for every
instruction.

llvm-svn: 157979
```
- whitespace · 515f1317
  Andrew Trick authored Jun 05, 2012
```
llvm-svn: 157976
```
- Revert commit r157966 · 7f2ac7a2
  Joel Jones authored Jun 05, 2012
```
llvm-svn: 157972
```
- This change handles another case for generating the bic instruction · d08534f8
  Joel Jones authored Jun 04, 2012
```
when a compile time constant is known.  This occurs when implicitly zero 
extending function arguments from 16 bits to 32 bits.

<rdar://problem/11481151>

llvm-svn: 157966
```
Jun 04, 2012
- Fix a bug in MipsTargetLowering::LowerLOAD. A shift-right-logical node is · 6734685f
  Akira Hatanaka authored Jun 04, 2012
```
inserted after the shift-left-logical node.

llvm-svn: 157937
```
- Implement local-exec TLS on PowerPC. · e3f15c98
  Roman Divacky authored Jun 04, 2012
```
llvm-svn: 157935
```
- MIPS TLS: use the model selected by TargetMachine::getTLSModel(). · 245917b5
  Hans Wennborg authored Jun 04, 2012
```
This was mostly done already in r156162, but I missed one place.

llvm-svn: 157929
```
- Better comments for TLS-related X86 MachineOperand flags. · 09610f3e
  Hans Wennborg authored Jun 04, 2012
```
llvm-svn: 157920
```
- Add intrinsic forms for FMA instructions to opcode folding tables. · c6ac4cef
  Craig Topper authored Jun 04, 2012
```
llvm-svn: 157917
```
- Add VFMADDSUB and VFMSUBADD FMA instructions to folding tables. Also add 213... · 3cb14301
  Craig Topper authored Jun 04, 2012
```
Add VFMADDSUB and VFMSUBADD FMA instructions to folding tables. Also add 213 forms of scalar FMA instructions.

llvm-svn: 157914
```
- Fix a copy-and-paste duplication error in the PPC 440 and A2 schedules (no functionality change). · 1de9bf01
  Hal Finkel authored Jun 04, 2012
```
llvm-svn: 157912
```
- Enable generating PPC pre-increment (r+imm) instructions by default. · 595817ee
  Hal Finkel authored Jun 04, 2012
```
It seems that this no longer causes test suite failures on PPC64 (after r157159),
and often gives a performance benefit, so it can be enabled by default.

llvm-svn: 157911
```