- Aug 21, 2014
-
Jon Roelofs authored
On pre-v6 hardware, 'MOV lo, lo' gives undefined results, so such copies need to be avoided. This patch trades performance for simplicity and a quick implementation... As they say: correctness first, then performance. See http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075998.html for a few ideas on how to make this better. llvm-svn: 216138
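For low-to-low copies on such cores, a correctness-first lowering might look like the sketch below (registers hypothetical, and not necessarily the exact sequence the patch emits; when the flags are dead, a flag-setting 'movs r0, r1' would also do):

push {r1}    @ spill the source low register
pop  {r0}    @ reload into the destination: r0 = r1, CPSR preserved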
-
- Aug 20, 2014
-
Quentin Colombet authored
the isRegSequence property. This is a follow-up to r215394 and r215404, which respectively introduce the isRegSequence property and use it for ARM. Thanks to the property introduced by the previous commits, this patch is able to optimize the following sequence:

vmov d0, r2, r3
vmov d1, r0, r1
vmov r0, s0
vmov r1, s2
udiv r0, r1, r0
vmov r1, s1
vmov r2, s3
udiv r1, r2, r1
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov r0, r1, d16
bx lr

into:

udiv r0, r0, r2
udiv r1, r1, r3
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov r0, r1, d16
bx lr

This patch refactors how the copy optimizations are done in the peephole optimizer. Prior to this patch, we had one copy-related optimization that replaced a copy or bitcast with a generic copy that is more suitable in terms of register file. With this patch, the peephole optimizer features two copy-related optimizations:

1. One for rewriting generic copies to generic copies: PeepholeOptimizer::optimizeCoalescableCopy.
2. One for replacing non-generic copies with generic copies: PeepholeOptimizer::optimizeUncoalescableCopy.

The goals of these two optimizations are slightly different: the first rewrites the operands of the instruction, whereas the second kills off the non-generic instruction and replaces it with a (sequence of) generic instruction(s).

Both optimizations rely on the ValueTracker introduced in r212100. The ValueTracker has been refactored to use the information from the TargetInstrInfo for non-generic instructions. As part of the refactoring, we switched the tracking from the index of the definition to the actual register (virtual or physical). This change provides better consistency with register-related APIs and eases the use of the TargetInstrInfo.

Moreover, this patch introduces a new helper class, CopyRewriter, used to ease the rewriting of generic copies (i.e., #1).

Finally, this patch adds a dead code elimination pass right after the peephole optimizer to get rid of dead code that may appear after rewriting.

This is related to <rdar://problem/12702965>. Review: http://reviews.llvm.org/D4874 llvm-svn: 216088
-
Yi Kong authored
LLVM generates an illegal `rbit r0, #352` instruction for the rbit intrinsic. According to the ARM ARM, rbit only takes a register argument, not an immediate; the correct form is rbit <Rd>, <Rm>. The bug was originally introduced in r211057. Differential Revision: http://reviews.llvm.org/D4980 llvm-svn: 216064
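Concretely (the register choices below are hypothetical):

rbit r0, r1      @ valid: RBIT takes its source in a register
rbit r0, #352    @ invalid: RBIT has no immediate form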
-
- Aug 19, 2014
-
Juergen Ributzka authored
Note: This was originally reverted to track down a buildbot error. This commit exposed a latent bug that was fixed in r215753. Therefore it is reapplied without any modifications. I ran it through SPEC2k and SPEC2k6 for AArch64 and it didn't introduce any new regressions.

Original commit message: This changes the order in which FastISel tries to materialize a constant. Originally it would try a simple target-independent approach first, which can lead to the generation of inefficient code. On X86 this would result in the use of movabsq to materialize any 64-bit integer constant - even for simple and small values such as 0 and 1. Some very odd floating-point materializations could be observed too. On AArch64 it would materialize the constant 0 in a register even though the architecture has an actual "zero" register. On ARM it would generate unnecessary mov instructions or fail to use mvn.

This change simply swaps the order and always asks the target first whether it wants to materialize the constant. This doesn't fix all the issues mentioned above, but it enables the targets to implement such optimizations. Related to <rdar://problem/17420988>. llvm-svn: 216006
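To make the ARM case concrete, a hedged sketch (the constant is chosen purely for illustration):

mvn r0, #15     @ r0 = 0xFFFFFFF0 in a single instruction
@ without asking the target first, the same value could end up as a
@ literal-pool load or a multi-instruction mov/orr sequence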
-
- Aug 18, 2014
-
Oliver Stannard authored
Externally-defined functions with weak linkage should not be tail-called on ARM or AArch64, as the AAELF spec requires normal calls to undefined weak functions to be replaced with a NOP or jump to the next instruction. The behaviour of branch instructions in this situation (as used for tail calls) is implementation-defined, so we cannot rely on the linker replacing the tail call with a return. llvm-svn: 215890
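A rough illustration of the constraint described above (symbol name hypothetical):

bl   weak_fn    @ normal call: if weak_fn stays undefined, the linker
                @ may rewrite this to a NOP and execution falls through
b    weak_fn    @ tail call: the branch has no defined rewrite; a NOP
                @ here would run the following code instead of returning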
-
Saleem Abdulrasool authored
The set of functions defined in the RTABI was separated for no real reason. This brings us closer to proper utilisation of the functions defined by the RTABI. It also sets the ground for correctly emitting function calls to AEABI functions on all AEABI-conforming platforms. The pre-existing lie about the behaviour of __ldivmod and __uldivmod is propagated, as fixing it is beyond the scope of this change. The changes to the test are due to the fact that we now use the divmod functions, which return both the quotient and the remainder, and thus we no longer need to invoke two functions on Linux (making it closer to EABI's behaviour). llvm-svn: 215862
-
- Aug 15, 2014
-
Chad Rosier authored
Phabricator Revision: http://reviews.llvm.org/D4935 llvm-svn: 215772
-
Juergen Ributzka authored
Thanks Jim for finding this. llvm-svn: 215733
-
Juergen Ributzka authored
FastEmit_i won't always succeed in materializing an i32 constant; it can simply fail. That would trigger a fall-back to SelectionDAG, which is really not necessary. This fix first falls back to a constant-pool load to materialize the constant before giving up for good. This fixes <rdar://problem/18022633>. llvm-svn: 215682
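When direct emission fails, a literal-pool load (the label and value below are hypothetical) is still far cheaper than restarting the whole selection in SelectionDAG:

ldr  r0, .LCPI0_0       @ materialize the constant from the literal pool
...
.LCPI0_0:
.long 0x12345678        @ not encodable as an ARM modified immediate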
-
- Aug 14, 2014
-
Juergen Ributzka authored
This reverts:
r215595 "[FastISel][X86] Add large code model support for materializing floating-point constants."
r215594 "[FastISel][X86] Use XOR to materialize the "0" value."
r215593 "[FastISel][X86] Emit more efficient instructions for integer constant materialization."
r215591 "[FastISel][AArch64] Make use of the zero register when possible."
r215588 "[FastISel] Let the target decide first if it wants to materialize a constant."
r215582 "[FastISel][AArch64] Cleanup constant materialization code. NFCI."
llvm-svn: 215673
-
Sanjay Patel authored
This patch allows a vector fneg of a bitcasted integer value to be optimized in the same way that we already optimize a scalar fneg. If the integer variable is a constant, we can precompute the result and not require any logic ops. This patch is very similar to a fabs patch committed at r214892. Differential Revision: http://reviews.llvm.org/D4852 llvm-svn: 215646
-
Juergen Ributzka authored
This changes the order in which FastISel tries to materialize a constant. Originally it would try a simple target-independent approach first, which can lead to the generation of inefficient code. On X86 this would result in the use of movabsq to materialize any 64-bit integer constant - even for simple and small values such as 0 and 1. Some very odd floating-point materializations could be observed too. On AArch64 it would materialize the constant 0 in a register even though the architecture has an actual "zero" register. On ARM it would generate unnecessary mov instructions or fail to use mvn. This change simply swaps the order and always asks the target first whether it wants to materialize the constant. This doesn't fix all the issues mentioned above, but it enables the targets to implement such optimizations. Related to <rdar://problem/17420988>. llvm-svn: 215588
-
- Aug 13, 2014
-
Juergen Ributzka authored
This change is also in preparation for a future change to make sure that the constant materialization uses MOVT/MOVW when available and not a load from the constant pool. llvm-svn: 215584
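For reference, the MOVW/MOVT idiom that the future change aims to prefer over a constant-pool load (value hypothetical):

movw r0, #0x5678     @ r0 = 0x00005678 (bottom halfword)
movt r0, #0x1234     @ r0 = 0x12345678 (top halfword)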
-
- Aug 11, 2014
-
Saleem Abdulrasool authored
For many Thumb-1 register-register instructions, setting the CPSR is not permitted inside an IT block. We would not correctly flag those instructions. The previous change to identify this scenario was insufficient, as it did not actually catch all the instances. The current list is formed by manual inspection of the ARMv6-M ARM.

The change to the Thumb2 IT block test is due to the fact that the new, more stringent checking of the MIs results in the If Conversion pass being prevented from executing (since not all the instructions in the BB are predicable). This results in code gen changes.

Thanks to Tim Northover for pointing out that the previous patch was insufficient and hinting that the v6-M ARM would be much easier to work from than the v7 or v8! llvm-svn: 215382
-
Sanjay Patel authored
Add a missing RUN line in the ARM codegen test for fneg ops. We should also explicitly specify +/-neonfp. The bug was introduced at r99570 when use of "-arm-use-neon-fp" was removed. Differential Revision: http://reviews.llvm.org/D4846 llvm-svn: 215377
-
Oliver Stannard authored
By default, LLVM uses the "C" calling convention for all runtime library functions. The half-precision FP conversion functions use the soft-float calling convention, and are needed for some targets which use the hard-float convention by default, so must have their calling convention explicitly set. llvm-svn: 215348
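A hedged sketch of what this means in practice (__aeabi_h2f is the EABI helper name; the constant is illustrative):

movw  r0, #0x3c00        @ half-precision 1.0 passed in a core register
bl    __aeabi_h2f        @ soft-float convention: result returns in r0
vmov  s0, r0             @ a hard-float caller moves it into the FP file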
-
Saleem Abdulrasool authored
The ARM ARM states that CPSR may not be updated by a MUL in Thumb mode. Due to the ordering of the Thumb2 Size Reduction and If Conversion passes, we would end up generating a Thumb MULS inside an IT block.

The If Conversion pass uses the TTI isPredicable method to ensure that it can transform a Basic Block. However, because we only check for IT handling on Thumb2 functions, we may miss some cases. Even then, it only validates that the CPSR is not *live*, rather than that it is not accessed. This corrects the handling for that particular case, since the same restriction does not hold on the vast majority of the instructions.

This does prevent the If Conversion optimization from kicking in in certain cases, but generating correct code is more valuable. Addresses PR20555. llvm-svn: 215328
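A hedged illustration (operands hypothetical):

it     eq
muleq  r0, r1, r0    @ fine: the predicated MUL leaves CPSR untouched
@ the old pass ordering could instead produce a flag-setting MULS in
@ this position, which the ARM ARM does not permit inside an IT block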
-
- Aug 08, 2014
-
Adrian Prantl authored
Thanks to dblaikie for pointing this out! llvm-svn: 215166
-
Adrian Prantl authored
llvm-svn: 215160
-
- Aug 07, 2014
-
Akira Hatanaka authored
BranchFolderPass was not correctly setting the basic block branch weights when tail-merging created or merged blocks. This patch recomputes the weights of tail-merged blocks using the following formula:

branch_weight(merged block to successor j) =
    sum(block_frequency(bb) * branch_probability(bb -> j))

where bb is a block that is in the set of merged blocks. <rdar://problem/16256423> llvm-svn: 215135
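A small worked example under assumed numbers: if blocks bb1 (frequency 8, probability 0.75 of branching to j) and bb2 (frequency 4, probability 0.5) are tail-merged, the merged block's weight to successor j is 8 * 0.75 + 4 * 0.5 = 8.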
-
- Aug 06, 2014
-
Tim Northover authored
Particularly on MachO, we were generating "blx _dest" instructions on M-class CPUs, which don't actually exist. They happen to get fixed up by the linker into valid "bl _dest" instructions (which is why such a massive issue has remained largely undetected), but we shouldn't rely on that. llvm-svn: 214959
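Roughly, the difference (using the symbol from the message above):

blx _dest    @ invalid: BLX (immediate) does not exist on M-class cores
bl  _dest    @ correct; M-class has no ARM state to exchange to anyway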
-
Tim Northover authored
llvm-svn: 214958
-
David Blaikie authored
This was coming in weird debug info that had variables (and hence debug_locs) but was in GMLT mode (because it was missing the 13th field of the compile_unit metadata), so no ranges were constructed. We should always have at least one range for any CU with a debug_loc in it, because the range should cover the debug_loc. The assertion just ensures that the "!= 1" range case inside the subsequent loop doesn't get entered for the case where there are no ranges at all, which should never reach here in the first place. llvm-svn: 214939
-
David Blaikie authored
DebugInfo: Fix a bunch of tests that, owing to their compile_unit metadata not including a 13th field, had some subtle behavior. Without the 13th field, the "emission kind" field defaults to 0, which is not equal to either of the values of the emission kind enum (1 == full debug info, 2 == line tables only). In this particular instance, the comparison with "FullDebugInfo" was done when adding elements to the ranges list, so for these test cases no values were added to the ranges list.

This got weirder when emitting debug_loc entries: the addresses should be relative to the range of the CU if the CU has only one range (the reasonable assumption is that if we're emitting debug_loc lists for a CU, that CU has at least one range - but due to the above situation, it has zero), so the ranges were emitted relative to the start of the section rather than relative to the start of the CU's singular range.

Fix these tests by accounting for the difference in the description of debug_loc entries (in some cases making the test ignorant to these differences, in others adding the extra label difference expression, etc.) or the presence/absence of high/low_pc on the CU, and add the 13th field to their CUs to enable proper "full debug info" emission here.

In a future commit I'll fix up a bunch of other test cases that are not so rigorously depending on this behavior, but still doing similarly weird things due to the missing 13th field. llvm-svn: 214937
-
- Aug 05, 2014
-
Jon Roelofs authored
This reverts r214893, re-applying r214881 with the test case relaxed a bit to satiate the build bots.

POP on armv4t cannot be used to change thumb state (unlike later non-M-class architectures), therefore we need a different return sequence that uses 'bx' instead:

POP {r3}
ADD sp, #offset
BX r3

This patch also fixes an issue where the return value in r3 would get clobbered for functions that return 128 bits of data. In that case, we generate this sequence instead:

MOV ip, r3
POP {r3}
ADD sp, #offset
MOV lr, r3
MOV r3, ip
BX lr

http://reviews.llvm.org/D4748 llvm-svn: 214928
-
Sanjay Patel authored
1. Added ':' to CHECK-LABELs
2. Added more CHECKs
3. Added CHECK-NEXTs
4. Added verbose hex immediate comments to CHECKs
llvm-svn: 214921
-
Jon Roelofs authored
llvm-svn: 214893
-
Sanjay Patel authored
Allow vector fabs operations on bitcasted constant integer values to be optimized in the same way that we already optimize scalar fabs. So for code like this:

%bitcast = bitcast i64 18446744069414584320 to <2 x float> ; 0xFFFF_FFFF_0000_0000
%fabs = call <2 x float> @llvm.fabs.v2f32(<2 x float> %bitcast)
%ret = bitcast <2 x float> %fabs to i64

Instead of generating something like this:

movabsq (constant pool load of mask for sign bits)
vmovq (move from integer register to vector/fp register)
vandps (mask off sign bits)
vmovq (move vector/fp register back to integer return register)

We should generate:

mov (put constant value in return register)

I have also removed a redundant clause in the first 'if' statement:

N0.getOperand(0).getValueType().isInteger()

is the same thing as:

IntVT.isInteger()

Testcases for x86 and ARM added to existing files that deal with vector fabs. One existing testcase for x86 was removed because it is no longer ideal.

For more background, please see http://reviews.llvm.org/D4770 and http://llvm.org/bugs/show_bug.cgi?id=20354.

Differential Revision: http://reviews.llvm.org/D4785 llvm-svn: 214892
-
Jon Roelofs authored
POP on armv4t cannot be used to change thumb state (unlike later non-M-class architectures), therefore we need a different return sequence that uses 'bx' instead:

POP {r3}
ADD sp, #offset
BX r3

This patch also fixes an issue where the return value in r3 would get clobbered for functions that return 128 bits of data. In that case, we generate this sequence instead:

MOV ip, r3
POP {r3}
ADD sp, #offset
MOV lr, r3
MOV r3, ip
BX lr

http://reviews.llvm.org/D4748 llvm-svn: 214881
-
David Blaikie authored
It's a bit of a tradeoff, since llvm-dwarfdump doesn't print the name of the global symbol being used as an address in the addressing mode, but this avoids the dependence on hardcoded set labels that keep changing (5+ commits over the last few years that each update the set label as it changes due to other, unrelated differences in output).

This could instead have been changed to match the set name and then match the name in the string pool, but that would present other issues (needing to skip over the sets that weren't of interest, etc.), and checking that the addresses (granted, without relocations applied - so it's not the whole story) match in the two variable location descriptions seems sufficient and fairly stable here.

There are a few other tests with similar label dependence that I'll update soonish. llvm-svn: 214878
-
- Aug 02, 2014
-
Akira Hatanaka authored
expanding pseudo LOAD_STACK_GUARD using instructions that are normally used in PIC mode. This patch fixes the bug. <rdar://problem/17886592> llvm-svn: 214614
-
- Aug 01, 2014
-
Juergen Ributzka authored
This is a followup patch for r214366, which added the same behavior to the AArch64 and X86 FastISel code. This fix reproduces the already existing behavior of SelectionDAG in FastISel. llvm-svn: 214531
-
- Jul 31, 2014
-
Rafael Espindola authored
Before this patch we had:

@a = weak global ...

but:

@b = alias weak ...

The patch changes aliases to look more like global variables.

Looking at some really old code suggests that the reason was that the old bison-based parser had a reduction for alias linkages and another one for global variable linkages. Putting the alias first avoided the reduce/reduce conflict.

The days of the old .ll parser are long gone. The new one parses just "linkage", and a later check is responsible for deciding if a linkage is valid in a given context. llvm-svn: 214355
-
- Jul 29, 2014
-
Tim Northover authored
ARM does actually define the name for this conversion, so we should use it on "-eabi" platforms. llvm-svn: 214176
-
Tim Northover authored
We need to make sure we use the softened version of all appropriate operands in the libcall, or things go horribly wrong. This may entail actually executing a 1-stage softening. llvm-svn: 214175
-
- Jul 25, 2014
-
Akira Hatanaka authored
address of the stack guard was being spilled to the stack. Previously the address of the stack guard would get spilled to the stack if it was impossible to keep it in a register. This patch introduces a new target-independent node and pseudo instruction which gets expanded post-RA to a sequence of instructions that load the stack guard value. The register allocator can now just rematerialize the value when it can't keep it in a register. <rdar://problem/12475629> llvm-svn: 213967
-
David Blaikie authored
* Add CUs to the named CU node
* Add missing DW_TAG_subprogram nodes
* Add llvm::Functions to the DW_TAG_subprogram nodes

This cleans up the tests so that they don't break under a soon-to-be-made change that is more strict about such things. llvm-svn: 213951
-
Amara Emerson authored
Patch by Ben Foster! Differential Revision: http://reviews.llvm.org/D4657 llvm-svn: 213944
-
NAKAMURA Takumi authored
llvm-svn: 213933
-
NAKAMURA Takumi authored
It sometimes confuses FileCheck. Consider the case where the path contains 'stmib'. :) llvm-svn: 213932
-