- Apr 22, 2011
-
-
Benjamin Kramer authored
On x86 this allows folding a load into the cmp, greatly reducing register pressure.

  movzbl (%rdi), %eax
  cmpl   $47, %eax

->

  cmpb   $47, (%rdi)

This shaves 8k off gcc.o on i386. I'll leave applying the patch in README.txt to Chris :)

llvm-svn: 130005
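For illustration only (a hypothetical example, not part of the commit), the pattern typically comes from source that loads a byte just to compare it against an immediate, which previously produced the movzbl + cmpl pair shown above:

  // 47 is ASCII '/'; with the load folded, this can become the single
  // cmpb $47, (%rdi) from the example above.
  bool startsWithSlash(const char *p) {
    return *p == '/';
  }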
-
Devang Patel authored
llvm-svn: 129995
-
Benjamin Kramer authored
X86: Try to use a smaller encoding by transforming (X << C1) & C2 into (X & (C2 >> C1)) << C1. (Part of PR5039)

This tends to happen a lot with bitfield code generated by clang. A simple example for x86_64 is

  uint64_t foo(uint64_t x) { return (x&1) << 42; }

which used to compile into bloated code:

  shlq    $42, %rdi               ## encoding: [0x48,0xc1,0xe7,0x2a]
  movabsq $4398046511104, %rax    ## encoding: [0x48,0xb8,0x00,0x00,0x00,0x00,0x00,0x04,0x00,0x00]
  andq    %rdi, %rax              ## encoding: [0x48,0x21,0xf8]
  ret                             ## encoding: [0xc3]

with this patch we can fold the immediate into the and:

  andq    $1, %rdi                ## encoding: [0x48,0x83,0xe7,0x01]
  movq    %rdi, %rax              ## encoding: [0x48,0x89,0xf8]
  shlq    $42, %rax               ## encoding: [0x48,0xc1,0xe0,0x2a]
  ret                             ## encoding: [0xc3]

It's possible to save another byte by using 'andl' instead of 'andq', but I currently see no way of doing that without making this code even more complicated. See the TODOs in the code.

llvm-svn: 129990
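A quick sanity check of the identity behind the rewrite (an illustrative snippet, not part of the patch): (x << C1) & C2 equals (x & (C2 >> C1)) << C1 for unsigned values, because x << C1 already has its low C1 bits clear.

  #include <cassert>
  #include <cstdint>

  int main() {
    const unsigned C1 = 42;
    const uint64_t C2 = 1ULL << 42;   // the mask from the foo() example above
    for (uint64_t x = 0; x < 1024; ++x)
      assert(((x << C1) & C2) == ((x & (C2 >> C1)) << C1));
    return 0;
  }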
-
Evan Cheng authored
  add <rd>, sp, #<imm8>
  ldr <rd>, [sp, #<imm8>]

When the offset from sp is a multiple of 4 and in the range 0-1020. This saves code size by utilizing 16-bit instructions.

rdar://9321541
llvm-svn: 129971
-
Rafael Espindola authored
X8664_ELFTargetObjectFile::getFDEEncoding to match reality. llvm-svn: 129959
-
Rafael Espindola authored
llvm-svn: 129955
-
Devang Patel authored
llvm-svn: 129952
-
Devang Patel authored
llvm-svn: 129947
-
- Apr 21, 2011
-
-
Devang Patel authored
llvm-svn: 129922
-
Justin Holewinski authored
llvm-svn: 129913
-
Che-Liang Chiou authored
This patch depends on the prior fix r129908, which switches to std::find, rather than std::binary_search, on an unordered array.

Patch by Dan Bailey

llvm-svn: 129909
-
Che-Liang Chiou authored
llvm-svn: 129908
-
Evan Cheng authored
llvm-svn: 129884
-
- Apr 20, 2011
-
-
Jakob Stoklund Olesen authored
On the x86-64 and thumb2 targets, some registers are more expensive to encode than others in the same register class.

Add a CostPerUse field to the TableGen register description, and make it available from TRI->getCostPerUse. This represents the cost of a REX prefix or a 32-bit instruction encoding required by choosing a high register.

Teach the greedy register allocator to prefer cheap registers for busy live ranges (as indicated by spill weight).

llvm-svn: 129864
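A toy sketch of the allocation-order idea (names and structure are assumptions for illustration; this is not the LLVM allocator code):

  #include <algorithm>
  #include <vector>

  struct Candidate {
    unsigned Reg;
    unsigned CostPerUse;   // e.g. nonzero for registers that need a REX prefix
  };

  // Prefer cheap-to-encode registers, but only for busy live ranges; for cold
  // ranges the encoding cost is not worth constraining the allocator.
  void orderByCost(std::vector<Candidate> &Cands, float SpillWeight,
                   float BusyThreshold) {
    if (SpillWeight <= BusyThreshold)
      return;
    std::stable_sort(Cands.begin(), Cands.end(),
                     [](const Candidate &A, const Candidate &B) {
                       return A.CostPerUse < B.CostPerUse;
                     });
  }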
-
-
-
Justin Holewinski authored
used by Clang. To help Clang integration, the PTX target has been split into two targets: ptx32 and ptx64, depending on the desired pointer size.

- Add GCCBuiltin class to all intrinsics
- Split PTX target into ptx32 and ptx64

llvm-svn: 129851
-
Che-Liang Chiou authored
Patched by Dan Bailey llvm-svn: 129848
-
Che-Liang Chiou authored
Patched by Dan Bailey llvm-svn: 129847
-
Che-Liang Chiou authored
Patched by Dan Bailey llvm-svn: 129846
-
Nick Lewycky authored
llvm is built with unsigned chars where an immediate such as 0xff would be zero extended to 64-bits, turning "cmp $0xff,%eax" into "cmp $0xffffffffffffffff,%eax". llvm-svn: 129845
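For reference, a small standalone illustration of the underlying char-signedness behavior (not the patched code): the same 0xff byte widens differently depending on whether plain char is signed or unsigned on the build host.

  #include <cstdint>
  #include <cstdio>

  int main() {
    signed char   sc = static_cast<signed char>(0xff);   // -1 (two's complement)
    unsigned char uc = 0xff;                              // 255 on all hosts
    int64_t s = sc;   // sign-extends: 0xffffffffffffffff
    int64_t u = uc;   // zero-extends: 0x00000000000000ff
    std::printf("%llx %llx\n",
                (unsigned long long)s, (unsigned long long)u);
    return 0;
  }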
-
Rafael Espindola authored
llvm-svn: 129844
-
Daniel Dunbar authored
triple component. llvm-svn: 129838
-
Johnny Chen authored
llvm-svn: 129837
- Apr 19, 2011
-
-
Daniel Dunbar authored
predicates. llvm-svn: 129816
-
Daniel Dunbar authored
llvm-svn: 129813
-
Daniel Dunbar authored
llvm-svn: 129812
-
Daniel Dunbar authored
llvm-svn: 129811
-
Daniel Dunbar authored
llvm-svn: 129810
-
Daniel Dunbar authored
llvm-svn: 129809
-
Daniel Dunbar authored
llvm-svn: 129803
-
Eric Christopher authored
llvm-svn: 129781
-
Bob Wilson authored
Making use of VFP / NEON floating point multiply-accumulate / subtraction is difficult on current ARM implementations for a few reasons.

1. Even though a single vmla has latency that is one cycle shorter than a pair of vmul + vadd, a RAW hazard during the first (4? on Cortex-A8) can cause additional pipeline stall. So it's frequently better to just codegen vmul + vadd.
2. A vmla followed by a vmul, vmadd, or vsub causes the second fp instruction to stall for 4 cycles. We need to schedule them apart.
3. A vmla followed by a vmla is a special case. Obviously, issuing back-to-back RAW vmla + vmla is very bad. But this isn't ideal either:

     vmul
     vadd
     vmla

   Instead, we want to expand the second vmla:

     vmla
     vmul
     vadd

   Even with the 4 cycle vmul stall, the second sequence is still 2 cycles faster.

Up to now, isel has simply avoided codegen'ing fp vmla / vmls. This works well enough but it isn't the optimal solution. This patch attempts to make it possible to use vmla / vmls in cases where it is profitable.

A. Add missing isel predicates which cause vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to compute a fmul and a fmla.
C. Add additional isel checks for vmla, avoid cases where vmla is feeding into fp instructions (except for the #3 exceptional case).
D. Add ARM hazard recognizer to model the vmla / vmls hazards.
E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the vmla / vmls will trigger one of the special hazards.

Enable these fp vmlx codegen changes for Cortex-A9.

rdar://8659675
llvm-svn: 129775
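A minimal C++ illustration of point B above (the functions are assumptions for clarity, not the isel code):

  // Single-use fmul feeding an fadd: a natural vmla candidate.
  float mac(float a, float b, float c) {
    return a + b * c;
  }

  // The fmul result is reused, so it should stay a separate vmul + vadd;
  // selecting a vmla here would still require computing b * c on its own.
  float reuse(float a, float b, float c, float &out) {
    float m = b * c;
    out = m;
    return a + m;
  }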
-
-
Bob Wilson authored
(and add false dependency) when it isn't dependent on last CPSR defining instruction. rdar://8928208 llvm-svn: 129773
-
Bob Wilson authored
Add an avoidWriteAfterWrite() target hook to identify register classes that suffer from write-after-write hazards. For those register classes, try to avoid writing the same register in two consecutive instructions.

This is currently disabled by default. We should not spill to avoid hazards! The command line flag -avoid-waw-hazard can be used to enable waw avoidance.

llvm-svn: 129772
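A toy sketch of the avoidance idea (hypothetical names, not the actual hook or scheduler code):

  #include <vector>

  // Pick a destination register, skipping the one written by the previous
  // instruction when the register class is marked as WAW-hazard prone.
  // Note the fallback: we never spill just to avoid a hazard.
  // Assumes Candidates is non-empty.
  unsigned pickDest(const std::vector<unsigned> &Candidates,
                    unsigned PrevWritten, bool ClassHasWAWHazard) {
    for (unsigned R : Candidates)
      if (!ClassHasWAWHazard || R != PrevWritten)
        return R;
    return Candidates.front();
  }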
-
Bob Wilson authored
pipelines, at least on Cortex-A9. llvm-svn: 129771
-
Bob Wilson authored
llvm-svn: 129770
-
Eli Friedman authored
llvm-svn: 129765
-