Commits · e186319319600c36bd3b4c832e65f58f01217dff · Roger Ferrer / llvm-epi

Oct 17, 2014
- Introduce LLVMParseCommandLineOptions C API function. · e1863193
  Peter Collingbourne authored Oct 16, 2014
```
llvm-svn: 219975
```
  e1863193
Oct 16, 2014

Reduce code duplication between patchpoint and non-patchpoint lowering. NFC. · fd4633e1

Juergen Ributzka authored Oct 16, 2014

This is in preparation for another patch that makes patchpoints invokable.

Reviewers: atrick, ributzka
Reviewed By: ributzka
Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5657

llvm-svn: 219967

fd4633e1

[SROA] Switch the common variable name for the 'AllocaSlices' class to · 8393406f

Chandler Carruth authored Oct 16, 2014

'AS'.

Using 'S' as this was a terrible idea. Arguably, 'AS' is not much
better, but it at least follows the idea of using initialisms and
removes active confusion about the AllocaSlices variable and a Slice
variable.

llvm-svn: 219963

8393406f

[SROA] More range-based cleanups to SROA, these brought to you by · 61747042

Chandler Carruth authored Oct 16, 2014

clang-modernize.

I did have to clean up the variable types and whitespace a bit because
the use of auto made the code much less readable here.

llvm-svn: 219962

61747042

[SROA] Switch a couple of overly complex iterator accessors to just be · 57d4cae2

Chandler Carruth authored Oct 16, 2014

ArrayRef accessors.

I think this even came up in review that this was over-engineered, and
indeed it was. Time to un-build it.

llvm-svn: 219958

57d4cae2

Erase fence insertion from SelectionDAGBuilder.cpp (NFC) · e2de06be

Robin Morisset authored Oct 16, 2014

Summary:
Backends can use setInsertFencesForAtomic to signal to the middle-end that
monotonic is the only memory ordering they can accept for
stores/loads/rmws/cmpxchg. The code lowering those accesses with a stronger
ordering to fences + monotonic accesses is currently living in
SelectionDAGBuilder.cpp. In this patch I propose moving this logic out of it
for several reasons:
- There is lots of redundancy to avoid: extremely similar logic already
  exists in AtomicExpand.
- The current code in SelectionDAGBuilder does not use any target-hooks, it
  does the same transformation for every backend that requires it
- As a result it is plain *unsound*, as it was apparently designed for ARM.
  It happens to mostly work for the other targets because they are extremely
  conservative, but Power for example had to switch to AtomicExpand to be
  able to use lwsync safely (see r218331).
- Because it produces IR-level fences, it cannot be made sound! This is noted
  in the C++11 standard (section 29.3, page 1140):
```
Fences cannot, in general, be used to restore sequential consistency for atomic
operations with weaker ordering semantics.
```
It can also be seen by the following example (called IRIW in the literature):
```
atomic<int> x = y = 0;
int r1, r2, r3, r4;
Thread 0:
  x.store(1);
Thread 1:
  y.store(1);
Thread 2:
  r1 = x.load();
  r2 = y.load();
Thread 3:
  r3 = y.load();
  r4 = x.load();
```
r1 = r3 = 1 and r2 = r4 = 0 is impossible as long as the accesses are all seq_cst.
But if they are lowered to monotonic accesses, no amount of fences can prevent it.
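
The seq_cst half of this claim can be checked by brute force. The following is a minimal sketch, not part of the patch: it assumes that sequential consistency can be modelled as one global interleaving of the six accesses that respects each thread's program order, enumerates every such interleaving, and collects the resulting (r1, r2, r3, r4) values. The names `State`, `explore`, and `seqCstOutcomes` are made up for this illustration, and the sketch does not attempt to model monotonic accesses or fences.

```cpp
#include <array>
#include <cassert>
#include <set>

// Values read by the two reader threads: {r1, r2, r3, r4}.
using Outcome = std::array<int, 4>;

// Snapshot of the whole program at one point of an interleaving.
struct State {
  int x = 0, y = 0;       // shared variables
  bool storedX = false;   // Thread 0 has executed x.store(1)
  bool storedY = false;   // Thread 1 has executed y.store(1)
  int pc2 = 0, pc3 = 0;   // number of loads already done by Threads 2 and 3
  Outcome r{{0, 0, 0, 0}};
};

// Recursively tries every thread that can still take a step.
void explore(State s, std::set<Outcome> &out) {
  assert(s.pc2 <= 2 && s.pc3 <= 2);
  if (s.storedX && s.storedY && s.pc2 == 2 && s.pc3 == 2) {
    out.insert(s.r);  // all six accesses have executed
    return;
  }
  if (!s.storedX) {  // Thread 0: x.store(1)
    State n = s;
    n.x = 1;
    n.storedX = true;
    explore(n, out);
  }
  if (!s.storedY) {  // Thread 1: y.store(1)
    State n = s;
    n.y = 1;
    n.storedY = true;
    explore(n, out);
  }
  if (s.pc2 < 2) {  // Thread 2: r1 = x.load(), then r2 = y.load()
    State n = s;
    n.r[n.pc2] = (n.pc2 == 0) ? n.x : n.y;
    ++n.pc2;
    explore(n, out);
  }
  if (s.pc3 < 2) {  // Thread 3: r3 = y.load(), then r4 = x.load()
    State n = s;
    n.r[2 + n.pc3] = (n.pc3 == 0) ? n.y : n.x;
    ++n.pc3;
    explore(n, out);
  }
}

// All outcomes reachable under the interleaving model.
std::set<Outcome> seqCstOutcomes() {
  std::set<Outcome> out;
  explore(State{}, out);
  return out;
}
```

Under this model, the set returned by `seqCstOutcomes()` contains outcomes such as {1, 1, 1, 1} and {0, 0, 0, 0}, but never {1, 0, 1, 0}, which is the outcome described above as impossible.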

This patch does three things (I could cut it into parts, but then some of them
would not be tested/testable, please tell me if you would prefer that):
- it provides a default implementation for emitLeadingFence/emitTrailingFence in
terms of IR-level fences, which mimics the original logic of SelectionDAGBuilder.
As we saw above, this is unsound, but it is the best that can be done without
knowing the targets well (and there is a comment warning about this risk).
- it then switches Mips/Sparc/XCore to use AtomicExpand, relying on this default
implementation (which exactly replicates the logic of SelectionDAGBuilder, so no
functional change).
- it finally erases this logic from SelectionDAGBuilder, as it is dead code.

Ideally, each target would define its own override for emitLeading/TrailingFence
using target-specific fences, but I do not know the Sparc/Mips/XCore memory model
well enough to do this, and they appear to be dealing fine with the ARM-inspired
default expansion for now (probably because they are overly conservative, as
Power was). If anyone wants to compile fences more aggressively on these
platforms, the long comment should make it clear why they should first override
emitLeading/TrailingFence.

Test Plan: make check-all, no functional change

Reviewers: jfb, t.p.northover

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D5474

llvm-svn: 219957

e2de06be

R600/SI: Remove unnecessary VALU patterns · 70c82173

Matt Arsenault authored Oct 16, 2014

These haven't been necessary since selecting
SALU instructions in non-entry blocks
was enabled.

llvm-svn: 219956

70c82173

[SROA] Start more deeply moving SROA to use ranges rather than just · c659df93

Chandler Carruth authored Oct 16, 2014

iterators.

There are a ton of places where it essentially wants ranges
rather than just iterators. This is just the first step that adds the
core slice range typedefs and uses them in a couple of places. I still
have to explicitly construct them because they've not been punched
throughout the entire set of code. More range-based cleanups incoming.

llvm-svn: 219955

c659df93

R600: Fix nonsensical implementation of computeKnownBits for BFE · a3fe7c62
Matt Arsenault authored Oct 16, 2014
```
This was resulting in invalid simplifications of sdiv

llvm-svn: 219953
```
a3fe7c62
Delete -std-compile-opts. · 11aaaeeb
Rafael Espindola authored Oct 16, 2014
```
These days -std-compile-opts was just a silly alias for -O3.

llvm-svn: 219951
```
11aaaeeb

Allow call-slot optzn for destinations with a suitable dereferenceable attribute · d20816fd

Bjorn Steinbrink authored Oct 16, 2014

Summary:
Currently, call slot optimization requires that if the destination is an
argument, the argument has the sret attribute. This is to ensure that
the memory access won't trap. In addition to sret, we can also allow the
optimization to happen for arguments that have the new dereferenceable
attribute, which gives the same guarantee.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5832

llvm-svn: 219950

d20816fd

Fix lang-ref doc bug: s/icmp lt/icmp slt/ · ec81c0b4
Jon Roelofs authored Oct 16, 2014
```
llvm-svn: 219947
```
ec81c0b4
[llvm-objdump] Fix -private-headers for mach-o to print all LC_*_DYLIB variants · 15558914
Nick Kledzik authored Oct 16, 2014
```
llvm-svn: 219945
```
15558914

fold: sqrt(x * x * y) -> fabs(x) * sqrt(y) · c699a611

Sanjay Patel authored Oct 16, 2014

If a square root call has an FP multiplication argument that can be reassociated,
then we can hoist a repeated factor out of the square root call and into a fabs().

In the simplest case, this:

   y = sqrt(x * x);

becomes this:

   y = fabs(x);

This patch relies on an earlier optimization in instcombine or reassociate to put the
multiplication tree into a canonical form, so we don't have to search over
every permutation of the multiplication tree.

Because there are no IR-level FastMathFlags for intrinsics (PR21290), we have to
use function-level attributes to do this optimization. This needs to be fixed
for both the intrinsics and in the backend.
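
As a numeric illustration of the identity behind this fold, here is a small sketch that evaluates both forms in plain C++. It is not the InstCombine code; the function names `sqrtOfProduct`, `hoistedForm`, and `nearlyEqual` are hypothetical, and the tolerance is an arbitrary choice for the example.

```cpp
#include <cassert>
#include <cmath>

// Original form: the repeated factor is multiplied inside the square root.
double sqrtOfProduct(double x, double y) {
  assert(y >= 0.0 && "the example only considers real-valued results");
  return std::sqrt(x * x * y);
}

// Folded form: the repeated factor is hoisted out of the sqrt as fabs(x).
double hoistedForm(double x, double y) {
  assert(y >= 0.0 && "the example only considers real-valued results");
  return std::fabs(x) * std::sqrt(y);
}

// Relative comparison; the two forms are not guaranteed to agree bit for bit.
bool nearlyEqual(double a, double b, double relTol = 1e-12) {
  return std::fabs(a - b) <= relTol * std::fmax(std::fabs(a), std::fabs(b));
}
```

For x = -3 and y = 4 both forms evaluate to 6.0, and for x = -5 and y = 1 both reduce to fabs(x) = 5.0. In general the two forms can round differently, which is consistent with the fold requiring a multiplication that may be reassociated.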

Differential Revision: http://reviews.llvm.org/D5787

llvm-svn: 219944

c699a611

[AArch64] Fix miscompile of sdiv-by-power-of-2. · 03a06110

Juergen Ributzka authored Oct 16, 2014

When the constant divisor was larger than 32 bits, the AArch64 backend would
emit the wrong code, because the shift was defined as a shift of a 32-bit
constant '(1<<Lg2(divisor))' and we would lose the upper 32 bits.
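
The effect of the truncation can be sketched outside the backend. The snippet below is only a model of the problem described here, not the AArch64 lowering itself: `shiftedOneWide` and `shiftedOneNarrow` are hypothetical helpers, and the narrow variant imitates the 32-bit constant by truncating the result to 32 bits.

```cpp
#include <cassert>
#include <cstdint>

// 1 << lg2 computed with a 64-bit constant: correct for any lg2 below 64.
uint64_t shiftedOneWide(unsigned lg2) {
  assert(lg2 < 64);
  return uint64_t(1) << lg2;
}

// Model of the bug: the value only keeps its low 32 bits, so every power of
// two at or above 2^32 collapses to zero.
uint64_t shiftedOneNarrow(unsigned lg2) {
  assert(lg2 < 64);
  return static_cast<uint32_t>(uint64_t(1) << lg2);
}
```

For a divisor of 2^8 the two helpers agree, but for a divisor of 2^40 the narrow variant yields 0 instead of 1099511627776.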

This fixes rdar://problem/18678801.

llvm-svn: 219934

03a06110

[mips] Account for endianness when expanding BuildPairF64/ExtractElementF64 nodes. · 167c3721

Vasileios Kalintiris authored Oct 16, 2014

Summary:
In order to support big endian targets for the BuildPairF64 nodes we
just need to swap the low/high pair registers. Additionally, for the
ExtractElementF64 nodes we have to calculate the correct stack offset
with respect to the node's register/operand that we want to extract.
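
The stack-offset part can be illustrated with a small byte-level model. This is a sketch under the assumption that the 64-bit value occupies an 8-byte stack slot in target byte order; `halfOffset`, `storeU64`, `loadU32`, and `extractHalf` are hypothetical names and do not correspond to functions in the Mips backend.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Byte offset of the requested 32-bit half inside an 8-byte slot.
// Little endian: low half at 0, high half at 4. Big endian: the reverse.
unsigned halfOffset(bool wantHigh, bool bigEndian) {
  return (wantHigh != bigEndian) ? 4u : 0u;
}

// Writes a 64-bit value into the slot using the given byte order.
void storeU64(std::array<uint8_t, 8> &slot, uint64_t v, bool bigEndian) {
  for (unsigned i = 0; i < 8; ++i) {
    unsigned shift = bigEndian ? 8 * (7 - i) : 8 * i;
    slot[i] = static_cast<uint8_t>(v >> shift);
  }
}

// Reads a 32-bit value from the slot at `offset` using the given byte order.
uint32_t loadU32(const std::array<uint8_t, 8> &slot, unsigned offset,
                 bool bigEndian) {
  assert(offset + 4 <= slot.size());
  uint32_t v = 0;
  for (unsigned i = 0; i < 4; ++i) {
    unsigned shift = bigEndian ? 8 * (3 - i) : 8 * i;
    v |= static_cast<uint32_t>(slot[offset + i]) << shift;
  }
  return v;
}

// Spills the 64-bit pattern and reloads one half, as an extract would.
uint32_t extractHalf(uint64_t bits, bool wantHigh, bool bigEndian) {
  std::array<uint8_t, 8> slot{};
  storeU64(slot, bits, bigEndian);
  return loadU32(slot, halfOffset(wantHigh, bigEndian), bigEndian);
}
```

With the endian-aware offset, `extractHalf(0x1122334455667788, true, ...)` returns 0x11223344 for both byte orders; using a fixed offset of 4 for the high half would only be right in the little-endian case.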

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5753

llvm-svn: 219931

167c3721

[mips] Marked the DI/EI instruction aliases as MIPS32r2 · 711028f7

Vasileios Kalintiris authored Oct 16, 2014

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5751

llvm-svn: 219927

711028f7

Test commit access: remove extra new line at the end of file · f445a56b
Vasileios Kalintiris authored Oct 16, 2014
```
llvm-svn: 219925
```
f445a56b
Add missing header guard. · 0445380f
Benjamin Kramer authored Oct 16, 2014
```
llvm-svn: 219922
```
0445380f

Reapply r219832 - InstCombine: Narrow switch instructions using known bits. · 5c221ef9

Akira Hatanaka authored Oct 16, 2014

The code committed in r219832 asserted when it attempted to shrink a switch
statement whose type was larger than 64-bit.

llvm-svn: 219902

5c221ef9

TRE: make TRE a bit more aggressive · 7f529219

Saleem Abdulrasool authored Oct 16, 2014

Make tail recursion elimination a bit more aggressive.  This allows us to get
tail recursion on functions that are just branches to a different function.  The
fact that the function takes a byval argument does not restrict it from being
optimised into just a tail call.

llvm-svn: 219899

7f529219

Revert r219832. · 40c2cf4a
Akira Hatanaka authored Oct 16, 2014
```
llvm-svn: 219884
```
40c2cf4a

[LVI] Add some additional comments about caching and context instructions · 2400c96c

Hal Finkel authored Oct 16, 2014

Philip Reames and I had a long conversation about this, mostly because it is
not obvious why the current logic is correct. Hopefully, these comments will
prevent such confusion in the future.

llvm-svn: 219882

2400c96c

llvm/Support/Options.h: Use \tparam. [-Wdocumentation] · e870f233
NAKAMURA Takumi authored Oct 16, 2014
```
llvm-svn: 219881
```
e870f233
R600: Remove dead function · f1b34cf6
Matt Arsenault authored Oct 16, 2014
```
llvm-svn: 219879
```
f1b34cf6

Revert "r219834 - Teach ScalarEvolution to sharpen range information" · 360b1ed5

Sanjoy Das authored Oct 15, 2014

This change breaks the asan buildbots:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/13468

llvm-svn: 219878

360b1ed5

Preserve non-byval pointer alignment attributes using @llvm.assume when inlining · 68dc3c7a

Hal Finkel authored Oct 15, 2014

For pointer-typed function arguments, enhanced alignment can be asserted using
the 'align' attribute. When inlining, if this enhanced alignment information is
not otherwise available, preserve it using @llvm.assume-based alignment
assumptions.

llvm-svn: 219876

68dc3c7a

Add CreateAlignmentAssumption to IRBuilder · 6f814db8

Hal Finkel authored Oct 15, 2014

Clang CodeGen had a utility function for creating pointer alignment assumptions
using the @llvm.assume intrinsic. This functionality will also be needed by the
inliner (to preserve function-argument alignment attributes when inlining), so
this moves the utility function into IRBuilder where it can be used both by
Clang CodeGen and also other LLVM-level code.
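
For readers unfamiliar with alignment assumptions, the condition they express is that the low bits of the pointer's address are zero. The sketch below shows only that arithmetic in plain C++; `satisfiesAlignment` is a hypothetical helper and not the IRBuilder API.

```cpp
#include <cassert>
#include <cstdint>

// True when `addr` is a multiple of `alignment`, which must be a power of two.
// This is the mask test that an alignment assumption states about a pointer.
bool satisfiesAlignment(std::uintptr_t addr, std::uintptr_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (addr & (alignment - 1)) == 0;
}
```

For example, address 72 satisfies an alignment of 8 but not an alignment of 16, while address 64 satisfies both.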

llvm-svn: 219875

6f814db8

[AVX512] Add DQ subvector inserts · 4285c1f8

Adam Nemet authored Oct 15, 2014

In AVX512f we support 64x2 and 32x8 inserts via matching them to 32x4 and 64x4
respectively.  These are matched by "Alt" Pat<>'s (Alt stands for alternative
VTs).

Since DQ has native support for these instructions, I peeled off the non-"Alt"
part of the baseclass into vinsert_for_size_no_alt. The DQ instructions are
derived from this multiclass.  The "Alt" Pat<>'s are disabled with DQ.

Fixes <rdar://problem/18426089>

llvm-svn: 219874

4285c1f8

[AVX512] Add SKX testing to avx512-insert-extract.ll · 2b71ca5d
Adam Nemet authored Oct 15, 2014
```
This is in preparation for adding DQ subvector inserts to this testcase.

llvm-svn: 219873
```
2b71ca5d
[AVX512] Fix test to produce a defined value · f3aba14d
Adam Nemet authored Oct 15, 2014
```
We're inserting into an 8-wide vector, so the index should be < 8.

llvm-svn: 219872
```
f3aba14d

[AVX512] Two new attributes in X86VectorVTInfo for subvector insert · 449b3f09

Adam Nemet authored Oct 15, 2014

The new attributes are NumElts and the CD8TupleForm.  This prepares the code
to enable x8 and x2 inserts.

NFC, no change in X86.td.expanded except for the new attributes.

llvm-svn: 219871

449b3f09

[AVX512] Rename arg from Opcode32/64 to Opcode128/256 in vinsert_for_size · b1c3ef4b

Adam Nemet authored Oct 15, 2014

It's the W bit that selects between 32 or 64 elt type and not the opcode.  The
opcode selects between the width of the insert (128 or 256).

llvm-svn: 219870

b1c3ef4b

R600: Remove unnecessary part of computeKnownBitsForTargetNode · 20893b36
Matt Arsenault authored Oct 15, 2014
```
Zero-width BFEs are combined away already, so there's no point in
handling them.

llvm-svn: 219868
```
20893b36
Move variable down to use · 6de7af42
Matt Arsenault authored Oct 15, 2014
```
llvm-svn: 219867
```
6de7af42

Add MachOObjectFile::getUuid() · 6909b5b5

Alexander Potapenko authored Oct 15, 2014

This CL introduces MachOObjectFile::getUuid(). This function returns an ArrayRef to the object file's UUID, or an empty ArrayRef if the object file doesn't contain an LC_UUID load command.
The new function will be used by llvm-symbolizer.

llvm-svn: 219866

6909b5b5

Updating documentation based on my change to remove the template disambiguation. · 42e929f7
Chris Bieneman authored Oct 15, 2014
```
llvm-svn: 219862
```
42e929f7
Fixing the build failure due to compiler warnings and unnecessary disambiguation. · 5c4e9551
Chris Bieneman authored Oct 15, 2014
```
llvm-svn: 219861
```
5c4e9551

Oct 15, 2014

Defining a new API for debug options that doesn't rely on static global cl::opts. · 732e0aa9

Chris Bieneman authored Oct 15, 2014

Summary:
This is based on the discussions from the LLVMDev thread:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075886.html

Reviewers: chandlerc

Reviewed By: chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5389

llvm-svn: 219854

732e0aa9

R600/SI: Fix bug where immediates were being used in DS addr operands · c8d7920a

Tom Stellard authored Oct 15, 2014

The SelectDS1Addr1Offset complex pattern always tries to store constant
lds pointers in the offset operand and store a zero value in the addr operand.
Since the addr operand does not accept immediates, the zero value
needs to first be copied to a register.

This newly created zero value will not go through normal instruction
selection, so we need to manually insert a V_MOV_B32_e32 in the complex
pattern.

This bug was hidden by the fact that if there was another zero value
in the DAG that had not been selected yet, then the CSE done by the DAG
would use the unselected node for the addr operand rather than the one
that was just created.  This would lead to the zero value being selected
and the DAG automatically inserting a V_MOV_B32_e32 instruction.

llvm-svn: 219848

c8d7920a