Commits · 7706107e6f6e17dfa8e1f2d85c44cd7430c32e52 · Lorenzo Albano / LLVM bpEVL

Jan 21, 2014

[NVPTX] Add missing patterns for div.approx with immediate denominator · 7706107e
Justin Holewinski authored Jan 21, 2014
```
llvm-svn: 199746
```
7706107e

tools: support decoding ARM EHABI opcodes in readobj · 9f0a21ef

Saleem Abdulrasool authored Jan 21, 2014

Add support to llvm-readobj to decode the actual opcodes.  The ARM EHABI opcodes
are a variable-length instruction set that describes the operations required for
properly unwinding stack frames.
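
To make the "variable length" point concrete, here is a minimal sketch of a decoder for a small subset of the EHABI unwind opcodes. It is not the llvm-readobj implementation: the function name `decodeEHABI` and the output strings are hypothetical, and the encodings are the ones I understand the ARM EHABI specification to define (00xxxxxx, 01xxxxxx, 1000iiii iiiiiiii, 1010Lnnn, and 0xB0). Everything else is reported as unhandled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: decode a subset of ARM EHABI unwind opcodes to text.
std::vector<std::string> decodeEHABI(const std::vector<uint8_t>& bytes) {
  std::vector<std::string> out;
  for (std::size_t i = 0; i < bytes.size(); ++i) {
    const unsigned op = bytes[i];
    if ((op & 0xC0u) == 0x00u) {            // 00xxxxxx: raise vsp
      out.push_back("vsp = vsp + " + std::to_string(((op & 0x3Fu) << 2) + 4));
    } else if ((op & 0xC0u) == 0x40u) {     // 01xxxxxx: lower vsp
      out.push_back("vsp = vsp - " + std::to_string(((op & 0x3Fu) << 2) + 4));
    } else if ((op & 0xF0u) == 0x80u) {     // 1000iiii iiiiiiii: two bytes
      if (i + 1 == bytes.size()) {
        out.push_back("truncated opcode");
        break;
      }
      const unsigned mask = ((op & 0x0Fu) << 8) | bytes[++i];
      if (mask == 0) {
        out.push_back("refuse to unwind");
        continue;
      }
      std::string regs;
      for (unsigned bit = 0; bit < 12; ++bit) {  // bit 0 is r4, bit 11 is r15
        if (mask & (1u << bit)) {
          if (!regs.empty()) regs += ", ";
          regs += "r" + std::to_string(4 + bit);
        }
      }
      assert(!regs.empty());  // mask != 0, so at least one register is named
      out.push_back("pop {" + regs + "}");
    } else if ((op & 0xF0u) == 0xA0u) {     // 10100nnn / 10101nnn
      const unsigned last = 4 + (op & 0x07u);
      std::string regs = "r4";
      if (last != 4) regs += "-r" + std::to_string(last);
      if (op & 0x08u) regs += ", lr";       // 10101nnn also pops r14
      out.push_back("pop {" + regs + "}");
    } else if (op == 0xB0u) {               // finish
      out.push_back("finish");
    } else {
      out.push_back("unhandled opcode");
    }
  }
  return out;
}
```

For example, under these assumptions the byte sequence `0x01 0xA9 0xB0` decodes to three operations, while `0x84 0x00` is a single two-byte opcode.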

The primary motivation for this change is to ease the creation of tests for the
ARM EHABI object emission as well as the unwinding directive handling in the ARM
IAS.

Thanks to Logan Chien for an extra test case!

llvm-svn: 199708

9f0a21ef

ARM IAS: add support for .unwind_raw directive · d9f08603

Saleem Abdulrasool authored Jan 21, 2014

This implements the unwind_raw directive for the ARM IAS.  The unwind_raw
directive takes the form of a stack offset value followed by one or more bytes
representing the opcodes to be emitted.  The opcode emitted will be interpreted as
if it were assembled by the opcode assembler via the standard unwinding
directives.

Thanks to Logan Chien for an extra test!

llvm-svn: 199707

d9f08603

ARM IAS: support .personalityindex · 662f5c1a

Saleem Abdulrasool authored Jan 21, 2014

The .personalityindex directive is equivalent to the .personality directive with
the ARM EABI personality with the specific index (0, 1, 2).  Both of these
directives indicate personality routines, so enhance the personality directive
handling to take into account personalityindex.

Bonus fix: flush the UnwindContext at the beginning of a new function.

Thanks to Logan Chien for additional tests!

llvm-svn: 199706

662f5c1a

[AArch64 NEON] Fix a bug caused by undef lane when generating VEXT. · 6d379abd

Kevin Qin authored Jan 21, 2014

It was committed as r199628 but reverted in r199631 because it caused
regression test failures. I had committed an old version of the patch.
Sorry for the mistake.

llvm-svn: 199704

6d379abd

Jan 20, 2014

[X86] Teach how to combine a vselect into a movss/movsd · 450d1661

Andrea Di Biagio authored Jan 20, 2014

Add target specific rules for combining vselect dag nodes into movss/movsd
when possible.

If the vector type of the vselect dag node in input is either MVT::v4i32 or
MVT::v4f32, then try to fold according to rules:

  1) fold (vselect (build_vector (0, -1, -1, -1)), A, B) -> (movss A, B)
  2) fold (vselect (build_vector (-1, 0, 0, 0)), A, B) -> (movss B, A)

If the vector type of the vselect dag node in input is either MVT::v2i64 or
MVT::v2f64 (and we have SSE2), then try to fold according to rules:

  3) fold (vselect (build_vector (0, -1)), A, B) -> (movsd A, B)
  4) fold (vselect (build_vector (-1, 0)), A, B) -> (movsd B, A)
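
The first two rules can be checked with a small lane-by-lane model. This is only a sketch, not the X86 backend code: `vselect`, `movss`, and `classifyMask` are hypothetical stand-ins, and the operand order assumed for `movss` (low lane from the second operand, remaining lanes from the first) is inferred from the rules themselves. The movsd rules follow the same shape with two lanes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec4 = std::array<float, 4>;
using Mask4 = std::array<int32_t, 4>;

// vselect: lane i comes from a when the mask lane is all-ones (-1), else from b.
Vec4 vselect(const Mask4& mask, const Vec4& a, const Vec4& b) {
  Vec4 r{};
  for (std::size_t i = 0; i < 4; ++i) {
    assert(mask[i] == 0 || mask[i] == -1);  // constant build_vector mask
    r[i] = mask[i] ? a[i] : b[i];
  }
  return r;
}

// Assumed movss node semantics: low lane from b, upper three lanes from a.
Vec4 movss(const Vec4& a, const Vec4& b) {
  return {b[0], a[1], a[2], a[3]};
}

enum class Fold { None, MovssAB, MovssBA };

// Recognize the two constant masks that rules 1 and 2 match.
Fold classifyMask(const Mask4& m) {
  if (m == Mask4{0, -1, -1, -1}) return Fold::MovssAB;  // rule 1
  if (m == Mask4{-1, 0, 0, 0}) return Fold::MovssBA;    // rule 2
  return Fold::None;
}
```

With distinct values in every lane of `A` and `B`, `vselect` under the rule 1 mask gives the same vector as `movss(A, B)`, and under the rule 2 mask the same vector as `movss(B, A)`.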

llvm-svn: 199683

450d1661

Debug info: On ARM ensure that all __TEXT sections come before the · 671af5ca

Adrian Prantl authored Jan 20, 2014

optional DWARF sections, so compiling with -g does not result in
different code being generated for PC-relative loads.

This is reapplying a diet r197922 (__TEXT-only).

llvm-svn: 199681

671af5ca

Revert "Debug info: On ARM ensure that the data sections come before the" · 1a89924d

Adrian Prantl authored Jan 20, 2014

Cut back on the cargo cult. The order of __DATA sections doesn't affect
generated code.

This reverts commit r197922.

llvm-svn: 199680

1a89924d

Remove the useless pseudo instructions VDUPfdf and VDUPfqf, replacing them... · 43ccae1b
James Molloy authored Jan 20, 2014
```
Remove the useless pseudo instructions VDUPfdf and VDUPfqf, replacing them with patterns to match VDUPLN.

llvm-svn: 199675
```
43ccae1b
Fix misched-aa-colored.ll to require asserts (trying again) · 9ff54e1f
Hal Finkel authored Jan 20, 2014
```
Perhaps it needs to be in caps.

llvm-svn: 199661
```
9ff54e1f
Fix misched-aa-colored.ll to require asserts. · a6bcadeb
Hal Finkel authored Jan 20, 2014
```
-misched=shuffle is only available in asserts (!NDEBUG) builds. Maybe we should change that.

llvm-svn: 199659
```
a6bcadeb

Update IR when merging slots in stack coloring · cd9569c1

Hal Finkel authored Jan 20, 2014

The way that stack coloring updated MMOs when merging stack slots, while
correct, is suboptimal, and is incompatible with the use of AA during
instruction scheduling. The solution, which involves the use of const_cast (and
more importantly, updating the IR from within an MI-level pass), obviously
requires some explanation:

When the stack coloring pass was originally committed, the code in
ScheduleDAGInstrs::buildSchedGraph tracked possible alias sets by using
GetUnderlyingObject, and all load/store and store/store memory control
dependencies were added between SUs at the object level (where only one
object, that returned by GetUnderlyingObject, was used to identify the object
associated with each MMO). When stack coloring merged stack slots, it would
replace MMOs derived from the remapped alloca with the alloca with which the
remapped alloca was being replaced. Because ScheduleDAGInstrs only used single
objects, and tracked alias sets at the object level, this was a fine solution.

In r169744, (Andy and) I updated the code in ScheduleDAGInstrs to use
GetUnderlyingObjects, and track alias sets using, potentially, multiple
underlying objects for each MMO. This was done, primarily, to provide the
ability to look through PHIs, and provide better scheduling for
induction-variable-dependent loads and stores inside loops. At this point, the
MMO-updating code in stack coloring became suboptimal, because it would clear
the MMOs for (i.e. completely pessimize) all instructions for which r169744
might help in scheduling. Updating the IR directly is the simplest fix for this
(and the one with, by far, the least compile-time impact), but others are
possible (we could give each MMO a small vector of potential values, or make
use of a remapping table, constructed from MFI, inside ScheduleDAGInstrs).

Unfortunately, replacing all MMO values derived from the remapped alloca with
the base replacement alloca fundamentally breaks our ability to use AA during
instruction scheduling (which is critical to performance on some targets). The
reason is that the original MMO might have had an offset (either constant or
dynamic) from the base remapped alloca, and that offset is not present in the
updated MMO. One possible way around this would be to use
GetPointerBaseWithConstantOffset, and update not only the MMO's value, but also
its offset based on the original offset. Unfortunately, this solution would
only handle constant offsets, and for safety (because AA is not completely
restricted to deducing relationships with constant offsets), we would need to
clear all MMOs without constant offsets over the entire function. This would be
an even worse pessimization than the current single-object restriction. Any
other solution would involve passing around a vector of remapped allocas, and
teaching AA to use it, introducing additional complexity and overhead into AA.
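
The offset problem can be illustrated with a toy model. This is a sketch under loud assumptions: `MemRef`, `mayAlias`, and the two remap helpers are hypothetical and stand in for MMOs and AA only loosely. Here an access is a (slot, offset, size) triple, whereas in the real code the offset lives in the IR value that the MMO points at.

```cpp
#include <cassert>

// Toy memory reference: `size` bytes at `offset` from stack slot `base`.
struct MemRef {
  int base;
  int offset;
  int size;
};

// Toy alias query: distinct slots never alias, same slot aliases on overlap.
bool mayAlias(const MemRef& x, const MemRef& y) {
  assert(x.size > 0 && y.size > 0);
  if (x.base != y.base) return false;
  return x.offset < y.offset + y.size && y.offset < x.offset + x.size;
}

// Models replacing the reference with the bare replacement slot: the offset
// carried by the original reference is lost.
MemRef remapToBaseOnly(MemRef r, int from, int to) {
  if (r.base == from) {
    r.base = to;
    r.offset = 0;
  }
  return r;
}

// Models a remap that keeps the original offset.
MemRef remapKeepingOffset(MemRef r, int from, int to) {
  if (r.base == from) r.base = to;
  return r;
}
```

If slot 1 is merged into slot 0, an access at slot 1 + 8 really overlaps an access at slot 0 + 8. The base-only remap records it at offset 0, so the toy query reports no alias and a scheduler trusting it could reorder the two accesses.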

Instead, when remapping an alloca, we replace all IR uses of that alloca as
well (optionally inserting a bitcast as necessary). This is even more efficient
than the old MMO-updating code in the stack coloring pass (because it removes
the need to call GetUnderlyingObject on all MMO values), removes the
single-object pessimization in the default configuration, and enables the
correct use of AA during instruction scheduling (all without any additional
overhead).

LLVM now no longer miscompiles itself on x86_64 when using -enable-misched
-enable-aa-sched-mi -misched-bottomup=0 -misched-topdown=0 -misched=shuffle!
Fixed PR18497.

Because the alloca replacement is now done at the IR level, unless the MMO
directly refers to the remapped alloca, the change cannot be seen at the MI
level. As a result, there is no good way to fix test/CodeGen/X86/pr14090.ll.

llvm-svn: 199658

cd9569c1

[x86] Fix disassembly of MOV16ao16 et al. · caaa2850

David Woodhouse authored Jan 20, 2014

The addition of IC_OPSIZE_ADSIZE in r198759 wasn't quite complete. It
also turns out to have been unnecessary. The disassembler handles the
AdSize prefix for itself, and doesn't care about the difference between
(e.g.) MOV8ao8 and MOV8ao8_16 definitions. So just let them coexist and
don't worry about it.

llvm-svn: 199654

caaa2850

[x86] Fix 16-bit disassembly of JCXZ/JECXZ · 9c74fdb8
David Woodhouse authored Jan 20, 2014
```
llvm-svn: 199653
```
9c74fdb8

[x86] Rename MOVSD/STOSD/LODSD/OUTSD to MOVSL/STOSL/LODSL/OUTSL · 3442f342

David Woodhouse authored Jan 20, 2014

The disassembler has a special case for 'L' vs. 'W' in its heuristic for
checking for 32-bit and 16-bit equivalents. We could expand the heuristic,
but better just to be consistent in using the 'L' suffix.

llvm-svn: 199652

3442f342

[x86] Fix disassembly of callw instruction · 70ced3e0

David Woodhouse authored Jan 20, 2014

Not quite sure why this was marked isAsmParserOnly, but it means that the
disassembler can't see it either.

llvm-svn: 199651

70ced3e0

[x86] Fix 16-bit handling of OpSize bit · 5cf4c675

David Woodhouse authored Jan 20, 2014

When disassembling in 16-bit mode the meaning of the OpSize bit is
inverted. Instructions found in the IC_OPSIZE context will actually
*not* have the 0x66 prefix, and instructions in the IC context will
have the 0x66 prefix. Make use of the existing special-case handling
for the 0x66 prefix being in the wrong place, to cope with this.
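
The inversion can be written down as a small truth table. This is a sketch, not the disassembler code: `Mode`, `operandSizeBits`, and `expects66Prefix` are hypothetical names, and it assumes that the OpSize context holds the 16-bit-operand forms, which is how I read the description above.

```cpp
#include <cassert>

enum class Mode { Bits16, Bits32 };

// Operand size selected by the 0x66 prefix: it toggles away from the default
// operand size of the current mode.
int operandSizeBits(Mode mode, bool has66Prefix) {
  const int defaultBits = (mode == Mode::Bits16) ? 16 : 32;
  const int toggledBits = (mode == Mode::Bits16) ? 32 : 16;
  return has66Prefix ? toggledBits : defaultBits;
}

// Whether an instruction from the OpSize context (assumed to be the
// 16-bit-operand form) or from the plain context carries a 0x66 byte.
bool expects66Prefix(Mode mode, bool inOpSizeContext) {
  const bool expected =
      (mode == Mode::Bits16) ? !inOpSizeContext : inOpSizeContext;
  // Cross-check against the operand size the prefix actually selects.
  assert(operandSizeBits(mode, expected) == (inOpSizeContext ? 16 : 32));
  return expected;
}
```

In 32-bit mode the OpSize-context instruction has the prefix; in 16-bit mode it is the plain-context instruction that has it.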

llvm-svn: 199650

5cf4c675

[x86] Support i386-*-*-code16 triple for emitting 16-bit code · 71d15eda
David Woodhouse authored Jan 20, 2014
```
llvm-svn: 199648
```
71d15eda

[PM] Wire up the Verifier for the new pass manager and connect it to the · 4d35631a

Chandler Carruth authored Jan 20, 2014

various opt verifier commandline options.

Mostly mechanical wiring of the verifier to the new pass manager.
Exercises one of the more unusual aspects of it -- a pass can be either
a module or function pass interchangeably. If this is ever problematic,
we can make things more constrained, but for things like the verifier
where there is an "obvious" applicability at both levels, it seems
convenient.

This is the next-to-last piece of basic functionality left to make the
opt commandline driving of the new pass manager minimally functional for
testing and further development. There is still a lot to be done there
(notably the factoring into .def files to kill the current boilerplate
code) but it is relatively uninteresting. The only interesting bit left
for minimal functionality is supporting the registration of analyses.
I'm planning on doing that on top of the .def file switch mostly because
the boilerplate for the analyses would be significantly worse.

llvm-svn: 199646

4d35631a

ARM: add tlsldo relocation · e51c8138

Kai Nacke authored Jan 20, 2014

Add support for the symbol(tlsldo) relocation. This is required in order to 
solve PR18554.

Reviewed by R. Golin, A. Korobeynikov.

llvm-svn: 199644

e51c8138

[ARM] Do not generate Tag_DIV_use=AllowDIVExt when hardware div is... · 10e76a4e

Artyom Skrobov authored Jan 20, 2014

[ARM] Do not generate Tag_DIV_use=AllowDIVExt when hardware div is non-optional: it should have the default value of AllowDIVIfExists

llvm-svn: 199638

10e76a4e

Revert r199628: "[AArch64 NEON] Fix a bug caused by undef lane when generating VEXT." · f835fc6f
Chandler Carruth authored Jan 20, 2014
```
This commit fails the newly added regression tests.

llvm-svn: 199631
```
f835fc6f

Fix all the remaining lost-fast-math-flags bugs I've been able to find. The... · 1664dc89

Owen Anderson authored Jan 20, 2014

Fix all the remaining lost-fast-math-flags bugs I've been able to find. The most important of these are cases in the generic logic for combining BinaryOperators.
This logic hadn't been updated to handle FastMathFlags, and it took me a while to detect it because it doesn't show up in a simple search for CreateFAdd.

llvm-svn: 199629

1664dc89

[AArch64 NEON] Fix a bug caused by undef lane when generating VEXT. · ff42e06e
Kevin Qin authored Jan 20, 2014
```
llvm-svn: 199628
```
ff42e06e

[AArch64 NEON] Accept both #0.0 and #0 for comparing with floating point zero in asm parser. · ef66ff78

Kevin Qin authored Jan 20, 2014

For FCMEQ, FCMGE, FCMGT, FCMLE and FCMLT, floating point zero will be
printed as #0.0 instead of #0. To support legacy code that uses #0,
let the asm parser accept both #0.0 and #0.
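
The accept-both, print-one behaviour amounts to the following sketch. The helper names `isFPZeroImmediate` and `canonicalFPZero` are hypothetical and operate on a plain token string; the actual AArch64 asm parser works on its own token stream.

```cpp
#include <cassert>
#include <string>

// Accept either spelling of the floating point zero immediate.
bool isFPZeroImmediate(const std::string& token) {
  return token == "#0.0" || token == "#0";
}

// Whatever spelling was parsed, the printed form is always "#0.0".
std::string canonicalFPZero(const std::string& token) {
  assert(isFPZeroImmediate(token));
  return "#0.0";
}
```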

llvm-svn: 199621

ef66ff78

Jan 19, 2014

InstCombine: Modernize a bunch of cast combines. · b80e1699
Benjamin Kramer authored Jan 19, 2014
```
Also make them vector-aware.

llvm-svn: 199608
```
b80e1699
InstCombine: Replace a hand-rolled version of isKnownToBeAPowerOfTwo with the real thing. · 7a74bd47
Benjamin Kramer authored Jan 19, 2014
```
llvm-svn: 199604
```
7a74bd47
InstCombine: Teach most integer add/sub/mul/div combines how to deal with vectors. · 72196f3a
Benjamin Kramer authored Jan 19, 2014
```
llvm-svn: 199602
```
72196f3a
InstCombine: Refactor fmul/fdiv combines to handle vectors. · 76b15d04
Benjamin Kramer authored Jan 19, 2014
```
llvm-svn: 199598
```
76b15d04

Fix a really nasty SROA bug with how we handled out-of-bounds memcpy · 1bf38c6a

Chandler Carruth authored Jan 19, 2014

intrinsics.

Reported on the list by Evan with a couple of attempts to fix, but it
took a while to dig down to the root cause. There are two overlapping
bugs here, both centering around the circumstance of discovering
a memcpy operand which is known to be completely outside the bounds of
the alloca.

First, we need to kill the *other* side of the memcpy if it was added to
this alloca. Otherwise we'll factor it into our slicing and try to
rewrite it even though we know for a fact that it is dead. This is made
more tricky because we can visit the sides in either order. So we have
to both kill the other side and skip instructions marked as dead. The
latter really should be goodness in every case, but here is a matter of
correctness.

Second, we need to actually remove the *uses* of the alloca by the
memcpy when queuing it for later deletion. Otherwise it may still be
using the alloca when we go to promote it (if the rewrite re-uses the
existing alloca instruction). Do this by factoring out the
use-clobbering used when nixing a Phi argument and re-using it
across the operands of a to-be-deleted instruction.

llvm-svn: 199590

1bf38c6a

ARM ELF: ensure that the tag types are corrected · 93900055

Saleem Abdulrasool authored Jan 19, 2014

Ensure that the tag types are reflected on a replacement.  This is particularly
important for the compatibility tag which has multiple representations where the
last definition wins.

llvm-svn: 199577

93900055

ARM: update build attributes for ABI r2.09 · 196c3212

Saleem Abdulrasool authored Jan 19, 2014

Update the tag names as per the current ABI errata.  Mark deprecated tags
as such.

llvm-svn: 199576

196c3212

LoopVectorizer: A reduction that has multiple uses of the reduction value is not · cc742dd9

Arnold Schwaighofer authored Jan 19, 2014

a reduction.

Really. Under certain circumstances (the use list of an instruction has to be
set up right - hence the extra pass in the test case) we would not recognize
when a value in a potential reduction cycle was used multiple times by the
reduction cycle.
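
A toy example of why an extra use of the accumulator breaks the reduction property. This is not the test case from the PR: `step`, `runSequential`, and `runTwoLanes` are hypothetical, and the two-lane split only mimics what a vectorized reduction does (independent partial results combined at the end).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One loop iteration. With useAccTwice the accumulator feeds the cycle twice.
int step(int acc, int v, bool useAccTwice) {
  return useAccTwice ? acc + acc + v : acc + v;
}

// The original scalar loop.
int runSequential(const std::vector<int>& a, bool useAccTwice) {
  int acc = 0;
  for (int v : a) acc = step(acc, v, useAccTwice);
  return acc;
}

// Two independent lanes, combined with an add at the end, as a vectorized
// add reduction would do.
int runTwoLanes(const std::vector<int>& a, bool useAccTwice) {
  assert(a.size() % 2 == 0);  // keep the toy simple: equal work per lane
  int lane[2] = {0, 0};
  for (std::size_t i = 0; i < a.size(); ++i)
    lane[i % 2] = step(lane[i % 2], a[i], useAccTwice);
  return lane[0] + lane[1];
}
```

For the input {1, 2, 3, 4}, the plain sum agrees between both versions, but the variant that uses the accumulator twice does not, so treating it as a reduction would change the result.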

Fixes PR18526.
radar://15851149

llvm-svn: 199570

cc742dd9

[PM] Make the verifier work independently of any pass manager. · 043949d4

Chandler Carruth authored Jan 19, 2014

This makes the 'verifyFunction' and 'verifyModule' functions totally
independent operations on the LLVM IR. It also cleans up their API a bit
by lifting the abort behavior into their clients and just using an
optional raw_ostream parameter to control printing.

The implementation of the verifier is now just an InstVisitor with no
multiple inheritance. It also is significantly more const-correct, and
hides the const violations internally. The two layers that force us to
break const correctness are building a DomTree and dispatching through
the InstVisitor.

A new VerifierPass is used to implement the legacy pass manager
interface in terms of the other pieces.

The error messages produced may be slightly different now, and we may
have slightly different short circuiting behavior with different usage
models of the verifier, but generally everything works equivalently and
this unblocks wiring the verifier up to the new pass manager.

llvm-svn: 199569

043949d4

Jan 18, 2014
- Don't refuse to transform constexpr(call(arg, ...)) to call(constexpr(arg),... · a6a17d77
  Nick Lewycky authored Jan 18, 2014
  
  Don't refuse to transform constexpr(call(arg, ...)) to call(constexpr(arg), ...) just because the function has multiple return values even if their return types are the same. Patch by Eduard Burtescu! llvm-svn: 199564
  a6a17d77
- ARM: Let the assembler reject v5 instructions in v4 mode. · dd39a98b
  Benjamin Kramer authored Jan 18, 2014
  
  PR18524. llvm-svn: 199559
  dd39a98b
- [CMake] Add llvm-tblgen to dependencies of check-llvm. · 3189afc4
  NAKAMURA Takumi authored Jan 18, 2014
  
  llvm-tblgen is not built when external LLVM_TABLEGEN is specified. Even then, llvm-tblgen should be built for testing tblgen itself. llvm-svn: 199558
  3189afc4
- InstCombine: Make the (fmul X, -1.0) -> (fsub -0.0, X) transform handle vectors too. · fea9ac99
  Benjamin Kramer authored Jan 18, 2014
  
  PR18532. llvm-svn: 199553
  fea9ac99
- Debug info (LTO): Move the creation of accessibility flags to · ef129fbb
  Adrian Prantl authored Jan 18, 2014
  
  getOrCreateSubprogramDIE to avoid attributes being added twice when DIEs are merged. rdar://problem/15842330. llvm-svn: 199536
  ef129fbb
- Fix more instances of dropped fast math flags when optimizing FADD... · 48b842ef
  Owen Anderson authored Jan 18, 2014
  
  Fix more instances of dropped fast math flags when optimizing FADD instructions. All found by inspection (aka grep). llvm-svn: 199528
  48b842ef
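
The (fmul X, -1.0) -> (fsub -0.0, X) transform in the list above subtracts from -0.0 rather than +0.0, and a scalar check shows why that matters for signed zeros. This is a sketch with hypothetical helper names, assuming default IEEE 754 semantics (no fast-math compiler flags).

```cpp
#include <cassert>
#include <cmath>

double viaFMul(double x) { return x * -1.0; }
double viaFSubNegZero(double x) { return -0.0 - x; }
double viaFSubPosZero(double x) { return 0.0 - x; }

// Equal as numbers and equal in sign, so +0.0 and -0.0 are told apart.
bool sameValueAndSign(double p, double q) {
  return p == q && std::signbit(p) == std::signbit(q);
}

// The transform is valid for x if both forms agree, sign of zero included.
void checkTransform(double x) {
  assert(sameValueAndSign(viaFMul(x), viaFSubNegZero(x)));
}
```

Subtracting from +0.0 instead would give +0.0 for an input of +0.0, whereas the multiplication gives -0.0.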