Commits · a3088f09b3d647b25e7d01e16bdf2b15ac18417e · Roger Ferrer / llvm-epi-0.8

Jun 23, 2012

Handle aliases to tls variables in all architectures, not just x86. · a3088f09
Rafael Espindola authored Jun 23, 2012
```
llvm-svn: 159058
```
a3088f09

(sub X, imm) gets canonicalized to (add X, -imm) · 68c2f9a9

Evan Cheng authored Jun 23, 2012

There are patterns to handle immediates when they fit in the immediate field.
e.g. %sub = add i32 %x, -123
=>   sub r0, r0, #123
Add patterns to catch immediates that do not fit but should be materialized
with a single movw instruction rather than a movw + movt pair.
e.g. %sub = add i32 %x, -65535
=>   movw r1, #65535
     sub r0, r0, r1
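
The selection described above can be sketched roughly as follows. This is only an illustration, not the TableGen patterns from the commit: the helper names and the fixed registers r0/r1 are hypothetical, and it assumes the ARM-mode "modified immediate" rule (an 8-bit value rotated right by an even amount) as the test for whether an immediate fits in the immediate field.

```python
def is_arm_modified_imm(value):
    """True if value is an 8-bit constant rotated right by an even amount."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Rotating left by `rot` undoes a rotate-right by `rot`.
        rotated = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if rotated <= 0xFF:
            return True
    return False


def lower_add_negative_imm(imm):
    """Pick an instruction sequence for `add i32 %x, -imm` with imm > 0.

    Hypothetical helper: register names are fixed for illustration only.
    """
    if is_arm_modified_imm(imm):
        # Fits in the immediate field: a single sub.
        return ["sub r0, r0, #%d" % imm]
    if imm <= 0xFFFF:
        # Does not fit, but a single movw can materialize it.
        return ["movw r1, #%d" % imm, "sub r0, r0, r1"]
    # Otherwise fall back to the movw + movt pair.
    return [
        "movw r1, #%d" % (imm & 0xFFFF),
        "movt r1, #%d" % (imm >> 16),
        "sub r0, r0, r1",
    ]
```

Calling `lower_add_negative_imm(123)` and `lower_add_negative_imm(65535)` gives the two sequences shown in the commit message.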

rdar://11726136

llvm-svn: 159057

68c2f9a9

ARM: Add a better diagnostic for some out of range immediates. · 087affe2

Jim Grosbach authored Jun 22, 2012

As an example of how the custom DiagnosticType can be used to provide
better operand-mismatch diagnostics, add a custom diagnostic for
the imm0_15 operand class used for several system instructions.
Update the tests to expect the improved diagnostic.

rdar://8987109

llvm-svn: 159051

087affe2

Add support for the PPC isel instruction. · 460e94d8

Hal Finkel authored Jun 22, 2012

The isel (integer select) instruction is supported on the 440 and A2
embedded cores and on the POWER7.

llvm-svn: 159045

460e94d8

Whitespace. · f5cdea3d
Chad Rosier authored Jun 22, 2012
```
llvm-svn: 159035
```
f5cdea3d

Jun 22, 2012

Revert r158679 - use case is unclear (and it increases the memory footprint). · 8db55472

Hal Finkel authored Jun 22, 2012

Original commit message:
    Allow up to 64 functional units per processor itinerary.

    This patch changes the type used to hold the FU bitset from unsigned to uint64_t.
    This will be needed for some upcoming PowerPC itineraries.

llvm-svn: 159027

8db55472

Use "NoItineraries" for processors with no itineraries. · 9c302673

Andrew Trick authored Jun 22, 2012

This makes it explicit when ScoreboardHazardRecognizer will be used.
"GenericItineraries" would only make sense if it contained real
itinerary values and still required ScoreboardHazardRecognizer.

llvm-svn: 158963

9c302673

Functions calling __builtin_eh_return must have a frame pointer. · 321d41a8

Jakob Stoklund Olesen authored Jun 22, 2012

The code in X86TargetLowering::LowerEH_RETURN() assumes that a frame
pointer exists, but the frame pointer was forced by the presence of
llvm.eh.unwind.init which isn't guaranteed.

If llvm.eh.unwind.init is actually required in functions calling
eh.return (is it?), we should diagnose that instead of emitting bad
machine code.

This should fix the dragonegg-x86_64-linux-gcc-4.6-test bot.

llvm-svn: 158961

321d41a8

ARM scheduling fix: don't guess at implicit operand latency. · 77d0b889

Andrew Trick authored Jun 22, 2012

This is a minor drive-by fix with no robust way to unit test.
As an example see neon-div.ll:
SU(16):   %Q8<def> = VMOVLsv4i32 %D17, pred:14, pred:%noreg, %Q8<imp-use,kill>
 val SU(1): Latency=2 Reg=%Q8
...should be latency=1

llvm-svn: 158960

77d0b889

ARM scheduling fix: compute predicated implicit use properly. · 3ccb1b8c

Andrew Trick authored Jun 22, 2012

Minor drive-by fix to clean up latency computation. Calling
getOperandLatency with a deliberately incorrect operand index does not
give you the latency you want.

llvm-svn: 158959

3ccb1b8c

Rename -allow-excess-fp-precision flag to -fuse-fp-ops, and switch from a boolean flag to an enum: { Fast, Standard, Strict } (default = Standard). · b8650f10

Lang Hames authored Jun 22, 2012

This option controls the creation by optimizations of fused FP ops that store
intermediate results in higher precision than IEEE allows (e.g. FMAs). The
behavior of this option is intended to match the behavior specified by a
soon-to-be-introduced frontend flag: '-ffuse-fp-ops'.

Fast mode - allow formation of fused FP ops whenever they're profitable.

Standard mode - allow fusion only for 'blessed' FP ops. At present the only
blessed op is the fmuladd intrinsic. In the future more blessed ops may be
added.

Strict mode - allow fusion only if/when it can be proven that the excess
precision won't affect the result.

Note: This option only controls formation of fused ops by the optimizers.  Fused
operations that are explicitly requested (e.g. FMA via the llvm.fma.* intrinsic)
will always be honored, regardless of the value of this option.

Internally TargetOptions::AllowExcessFPPrecision has been replaced by
TargetOptions::AllowFPOpFusion.
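
As a rough model of the three modes described above, the decision could be written like this. The enum and function names are hypothetical and are not the actual TargetOptions API. The boolean inputs stand in for analyses the optimizer would perform.

```python
from enum import Enum


class FPOpFusionMode(Enum):
    # Illustrative names only; they mirror the modes in the commit message.
    FAST = "fast"
    STANDARD = "standard"
    STRICT = "strict"


def may_fuse(mode, explicit_fma=False, blessed=False,
             profitable=False, provably_exact=False):
    """Return True if a fused FP op may be formed under `mode`."""
    if explicit_fma:
        # Explicitly requested fused ops (e.g. llvm.fma.*) are always honored.
        return True
    if mode is FPOpFusionMode.FAST:
        return profitable
    if mode is FPOpFusionMode.STANDARD:
        return blessed
    # Strict: only when excess precision provably cannot change the result.
    return provably_exact
```

For example, `may_fuse(FPOpFusionMode.STANDARD, profitable=True)` is false, because under Standard mode profitability alone is not enough.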

llvm-svn: 158956

b8650f10

Convert the PPC backend to use the new FMA infrastructure. · 0a479ae7

Hal Finkel authored Jun 22, 2012

The existing contraction patterns are replaced with fma/fneg.
Overall functionality should be the same.

llvm-svn: 158955

0a479ae7

Jun 21, 2012

1. fix null program output after some other changes · 765c3123
Akira Hatanaka authored Jun 21, 2012
```
2. re-enable null.ll test
3. fix some minor style violations

Patch by Reed Kotler.

llvm-svn: 158935
```
765c3123
Treat TargetGlobalAddress as a constant for the purpose of matching pre-inc stores on PPC. · a86b0f20
Hal Finkel authored Jun 21, 2012
```
Thanks to Tobias von Koch for pointing out this problem.

llvm-svn: 158932
```
a86b0f20

The inline asm operand modifier 'c' is supposed to be generic across architectures. · b2fd5f66

Jack Carter authored Jun 21, 2012

It has the following description in the GNU sources:

    Substitute immediate value without immediate syntax

Several architectures, such as x86, have local implementations
of operand modifier 'c' which go slightly beyond the above description.
To make use of the generic modifiers without overriding the
local implementation, call the base class method
AsmPrinter::PrintAsmOperand() from the "default" case of the switch
statement in the locally derived method. That way, a modifier that is
already handled locally never reaches the generic version.

This change was needed because test/CodeGen/generic/asm-large-immediate.ll
failed on a native Mips board. The test assumed that a generic
implementation was in place.
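
The delegation pattern described above can be sketched in simplified form. The classes below are hypothetical stand-ins, not the LLVM AsmPrinter interface: the real method writes to a stream and returns an error flag, and the 'z' modifier and the "local:" prefix are invented purely for illustration.

```python
class BaseAsmPrinter:
    """Stand-in for the generic printer that knows modifier 'c'."""

    def print_asm_operand(self, operand, modifier):
        if modifier == "c":
            # Substitute the immediate value without immediate syntax.
            return str(operand)
        raise ValueError("unknown operand modifier: %r" % modifier)


class DelegatingTargetPrinter(BaseAsmPrinter):
    """Target with its own modifiers that defers everything else."""

    def print_asm_operand(self, operand, modifier):
        if modifier == "z":  # hypothetical target-specific modifier
            return "$zero" if operand == 0 else "$%d" % operand
        # The "default" case: fall back to the generic implementation.
        return super().print_asm_operand(operand, modifier)


class OverridingTargetPrinter(BaseAsmPrinter):
    """Target whose local 'c' handling wins over the generic one."""

    def print_asm_operand(self, operand, modifier):
        if modifier == "c":
            return "local:%d" % operand
        return super().print_asm_operand(operand, modifier)
```

With this shape, `DelegatingTargetPrinter().print_asm_operand(42, "c")` reaches the generic code, while `OverridingTargetPrinter` never does for 'c'.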

Affected files:

    lib/Target/Mips/MipsAsmPrinter.cpp:
        Changed the default case to call the base method.
    lib/CodeGen/AsmPrinter/AsmPrinterInlineAsm.cpp
        Added 'c' to the switch cases.
    test/CodeGen/Mips/asm-large-immediate.ll
        Mips compiled version of the generic one

Contributor: Jack Carter
llvm-svn: 158925

b2fd5f66

Add a missing llvm.fma -> VFNMS pattern to the ARM backend. · 90b2a4cb
Lang Hames authored Jun 21, 2012
```
llvm-svn: 158902
```
90b2a4cb

Jun 20, 2012

Revert r158846. · 87505f46
Akira Hatanaka authored Jun 20, 2012
```
llvm-svn: 158855
```
87505f46

In MipsDisassembler.cpp, instead of defining register class tables, use the ones that are generated by TableGen and are already available in MipsGenRegisterInfo.inc. · da448fe0

Akira Hatanaka authored Jun 20, 2012

Suggested by Jakob Stoklund Olesen.

Also, fix a bug in function DecodeAFGR64RegisterClass.

Patch by Vladimir Medic. 

llvm-svn: 158846

da448fe0

Add support for generating reg+reg (indexed) pre-inc loads on PPC. · ca542bef
Hal Finkel authored Jun 20, 2012
```
llvm-svn: 158823
```
ca542bef

Remove 'static' from inline functions defined in header files. · 5c0997f0

Chandler Carruth authored Jun 20, 2012

There is a pretty staggering amount of this in LLVM's header files, and this
is not all of the instances I'm afraid. These include all of the
functions that (in my build) are used by a non-static inline (or
external) function. Specifically, these issues were caught by the new
'-Winternal-linkage-in-inline' warning.

I'll try to just clean up the remainder of the clearly redundant "static
inline" cases on functions (not methods!) defined within headers if
I can do so in a reliable way.

There were even several cases of a missing 'inline' altogether, or my
personal favorite "static bool inline". Go figure. ;]

llvm-svn: 158800

5c0997f0

Add predicate check around some patterns. · 21d04fc1
Craig Topper authored Jun 20, 2012
```
llvm-svn: 158797
```
21d04fc1
Add predicate check around some patterns. · 3b662a62
Craig Topper authored Jun 20, 2012
```
llvm-svn: 158795
```
3b662a62

Don't insert 128-bit UNDEF into 256-bit vectors. Just keep the 256-bit vector.... · b9e8e189

Craig Topper authored Jun 20, 2012

Don't insert 128-bit UNDEF into 256-bit vectors. Just keep the 256-bit vector. Original patch by Elena Demikhovsky. Tweaked by me to allow possibility of covering more cases.

llvm-svn: 158792

b9e8e189

Add DAG-combines for aggressive FMA formation. · 39fb1d08

Lang Hames authored Jun 19, 2012

This patch adds DAG combines to form FMAs from pairs of FADD + FMUL or
FSUB + FMUL. The combines are performed when:
(a) Either
      AllowExcessFPPrecision option (-enable-excess-fp-precision for llc)
        OR
      UnsafeFPMath option (-enable-unsafe-fp-math)
    are set, and
(b) TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT) is true for the type of
    the FADD/FSUB, and
(c) The FMUL only has one user (the FADD/FSUB).

If your target has fast FMA instructions you can make use of these combines by
overriding TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT) to return true for
types supported by your FMA instruction, and adding patterns to match ISD::FMA
to your FMA instructions.
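
Conditions (a) through (c) above amount to a simple predicate. The sketch below restates them with hypothetical field names. It does not reproduce the DAG combiner code, and `fma_faster_for_type` stands in for the result of the target's isFMAFasterThanMulAndAdd(VT) query.

```python
from dataclasses import dataclass


@dataclass
class CombineContext:
    # Hypothetical inputs gathered before attempting the combine.
    allow_excess_fp_precision: bool  # -enable-excess-fp-precision
    unsafe_fp_math: bool             # -enable-unsafe-fp-math
    fma_faster_for_type: bool        # target hook result for the FADD/FSUB type
    fmul_use_count: int              # number of users of the FMUL node


def should_form_fma(ctx):
    """Return True when an FADD/FSUB + FMUL pair may become an FMA."""
    options_ok = ctx.allow_excess_fp_precision or ctx.unsafe_fp_math  # (a)
    return (
        options_ok
        and ctx.fma_faster_for_type   # (b)
        and ctx.fmul_use_count == 1   # (c) the FADD/FSUB is the only user
    )
```

A target that never reports FMA as faster would therefore see no change from these combines, whatever options are set.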

llvm-svn: 158757

39fb1d08

Jun 19, 2012

Implement PPCInstrInfo::isCoalescableExtInstr(). · 0f855e42

Jakob Stoklund Olesen authored Jun 19, 2012

The PPC::EXTSW instruction preserves the low 32 bits of its input, just
like some of the x86 instructions. Use it to reduce register pressure
when the low 32 bits have multiple uses.

This requires a small change to PeepholeOptimizer since EXTSW takes a
64-bit input register.
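
The property being exploited can be checked with a small model of sign extension. This is an illustration of the arithmetic only. The function name mirrors the instruction, but nothing here comes from the PPC backend.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def extsw(value):
    """Sign-extend the low 32 bits of `value` to a 64-bit result.

    The result is returned as a non-negative integer holding the 64-bit
    register pattern.
    """
    low = value & MASK32
    if low & 0x80000000:
        # Bit 31 is set, so fill the upper 32 bits with ones.
        low |= 0xFFFFFFFF00000000
    return low & MASK64


def low32(value):
    """Extract the low 32 bits, as a user of the sub-register would see."""
    return value & MASK32
```

For any input `v`, `low32(extsw(v)) == low32(v)`, so users that only need the low 32 bits can read them from the extended register instead of keeping the original value live.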

This is related to PR5997.

llvm-svn: 158743

0f855e42

Have ARM ELF use correct reloc for "b" instr. · 7f5d79f8

Jan Wen Voung authored Jun 19, 2012

The condition code didn't actually matter for arm "b" instructions,
unlike "bl".  It should just use the R_ARM_JUMP24 reloc.
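
A minimal sketch of the resulting choice is shown below. The "b" case follows the commit message. The "bl" case (R_ARM_CALL when unconditional, R_ARM_JUMP24 when conditional) is an assumption based on the ARM ELF ABI and is not stated in the message. The function name is hypothetical.

```python
def arm_branch_reloc(mnemonic, conditional):
    """Pick an ELF relocation type for an ARM-mode branch instruction."""
    if mnemonic == "b":
        # The condition code is irrelevant for "b".
        return "R_ARM_JUMP24"
    if mnemonic == "bl":
        # Assumed from the ARM ELF ABI: only unconditional bl uses R_ARM_CALL.
        return "R_ARM_JUMP24" if conditional else "R_ARM_CALL"
    raise ValueError("not a branch handled by this sketch: %r" % mnemonic)
```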

llvm-svn: 158722

7f5d79f8

Mark most PPC register classes to avoid write-after-write. · d465810f

Hal Finkel authored Jun 19, 2012

For processors with the G5-like instruction-grouping scheme, this helps avoid
early group termination due to a write-after-write dependency within the group.
It should also help on pipelined embedded cores.

On POWER7, over the test suite, this gives an average 0.5% speedup. The largest
speedups are:

SingleSource/Benchmarks/Stanford/Quicksort - 33%
MultiSource/Applications/d/make_dparser - 21%
MultiSource/Benchmarks/FreeBench/analyzer/analyzer - 12%
MultiSource/Benchmarks/MiBench/telecomm-FFT/telecomm-fft - 12%

Largest slowdowns:

SingleSource/Benchmarks/Stanford/Bubblesort - 23%
MultiSource/Benchmarks/Prolangs-C++/city/city - 21%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - 16%
MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode - 13%

llvm-svn: 158719

d465810f

Make MipsLongBranch::runOnMachineFunction return true. · 9f96bb86
Akira Hatanaka authored Jun 19, 2012
```
llvm-svn: 158702
```
9f96bb86
Use MachineBasicBlock::instr_iterator instead of MachineBasicBlock::iterator in MipsCodeEmitter.cpp. · 9846239b
Akira Hatanaka authored Jun 19, 2012
```
llvm-svn: 158701
```
9846239b
Add support for generating reg+reg preinc stores on PPC. · 1cc27e44
Hal Finkel authored Jun 19, 2012
```
PPC will now generate STWUX and friends.

llvm-svn: 158698
```
1cc27e44

Move the support for using .init_array from ARM to the generic TargetLoweringObjectFileELF. · ca3e0ee8

Rafael Espindola authored Jun 19, 2012

Use this to support it on X86. Unlike ARM, on X86 it is not easy to find out
if .init_array should be used or not, so the decision is made via
TargetOptions and defaults to off.

Add a command line option to llc that enables it.

llvm-svn: 158692

ca3e0ee8

ARM: use NEON loads and stores if possible when handling struct byval. · 6e1fd46f
Manman Ren authored Jun 18, 2012
```
This change is to be enabled in clang.

rdar://9877866

llvm-svn: 158684
```
6e1fd46f

Jun 18, 2012

Allow up to 64 functional units per processor itinerary. · 8eac0096

Hal Finkel authored Jun 18, 2012

This patch changes the type used to hold the FU bitset from unsigned to uint64_t.
This will be needed for some upcoming PowerPC itineraries.
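
To illustrate why the width of the bitset type caps the number of functional units, here is a small model. It assumes `unsigned` is 32 bits wide, as on common targets. The helper name is hypothetical and unrelated to the itinerary code itself.

```python
def fu_mask(unit_indices, width):
    """Build a functional-unit bitset, one bit per unit index.

    `width` is the number of bits in the underlying integer type
    (32 for a typical `unsigned`, 64 for `uint64_t`).
    """
    mask = 0
    for idx in unit_indices:
        if not 0 <= idx < width:
            raise OverflowError(
                "unit %d does not fit in a %d-bit mask" % (idx, width)
            )
        mask |= 1 << idx
    return mask
```

With `width=32`, unit index 40 is rejected, while `width=64` can represent it.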

llvm-svn: 158679

8eac0096

ARM: Define generic HINT instruction. · cb540f5c

Jim Grosbach authored Jun 18, 2012

The NOP, WFE, WFI, SEV and YIELD instructions are all hints with
a different immediate value in bits [7:0]. Define a generic HINT
instruction and refactor NOP, WFE, WFI, SEV and YIELD to be
assembly aliases of that.
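
The alias relationship can be pictured as a lookup table. The immediate values and the ARM-mode base encoding below come from the ARM architecture documentation, not from this commit, and the helper names are hypothetical.

```python
# Hint immediates as given in the ARM architecture documentation.
HINT_IMMEDIATES = {"nop": 0, "yield": 1, "wfe": 2, "wfi": 3, "sev": 4}

# ARM-mode (A1) encoding of `hint #0` with the "always" condition.
HINT_BASE_ENCODING = 0xE320F000


def expand_alias(mnemonic):
    """Rewrite a hint alias as the generic HINT instruction."""
    return "hint #%d" % HINT_IMMEDIATES[mnemonic.lower()]


def encode_hint(mnemonic):
    """Place the alias's immediate in bits [7:0] of the base encoding."""
    return HINT_BASE_ENCODING | HINT_IMMEDIATES[mnemonic.lower()]
```

For instance, `expand_alias("WFI")` yields `hint #3`, and all five aliases differ only in the low byte of the encoding.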

rdar://11600518

llvm-svn: 158674

cb540f5c

This change handles another case for generating the bic instruction when a compile time constant is known. · 3237ce73

Joel Jones authored Jun 18, 2012

This occurs when implicitly zero-extending function arguments from 16 bits
to 32 bits. The 8-bit case doesn't need to be handled, as 8-bit constants
are encoded directly and therefore don't need a separate load instruction
to form the constant in a register.

<rdar://problem/11481151>

llvm-svn: 158659

3237ce73

Temporarily revert r158087. · 2cc11fd8

Chandler Carruth authored Jun 18, 2012

This patch causes problems when both dynamic stack realignment and
dynamic allocas combine in the same function. With this patch, we no
longer build the epilog correctly, and silently restore registers from
the wrong position in the stack.

Thanks to Matt for tracking this down, and getting at least an initial
test case to Chad. I'm going to try to check a variation of that test
case in so we can easily track the fixes required.

llvm-svn: 158654

2cc11fd8

Jun 16, 2012

Cleanup trip-count finding for PPC CTR loops (and some bug fixes). · 6261c2dc

Hal Finkel authored Jun 16, 2012

This cleans up the method used to find trip counts in order to form CTR loops on PPC.
This refactoring allows the pass to find loops which have a constant trip count but also
happen to end with a comparison to zero. This also adds explicit FIXMEs to mark two different
classes of loops that are currently ignored.

In addition, we now search through all potential induction operations instead of just the first.
Also, we check the predicate code on the conditional branch and abort the transformation if the
code is not EQ or NE, and we then make sure that the branch to be transformed matches the
condition register defined by the comparison (multiple possible comparisons will be considered).

llvm-svn: 158607

6261c2dc

No need to pollute Intel syntax with bonus mnemonics; operand size is explicitly specified · 390edb0d
Kay Tiong Khoo authored Jun 16, 2012
```
llvm-svn: 158603
```
390edb0d
Mips/AsmParser/CMakeLists.txt: Fix dependency. · e2d4a093
NAKAMURA Takumi authored Jun 16, 2012
```
llvm-svn: 158602
```
e2d4a093
Fix the encoding of the armv7m (MClass) for MSR registers other than apsr, iapsr, eapsr and xpsr which also needed to have 0b10 in their mask encoding bits. · 6c7279ec
Kevin Enderby authored Jun 15, 2012
```
llvm-svn: 158560
```
6c7279ec