  1. May 05, 2015
    • Andrey Churbanov · 7ecb0714
    • [Inliner] Discard empty COMDAT groups · ac256cfe
      David Majnemer authored
      COMDAT groups which have been rendered unused because of inlining are
      discardable if we can prove that we've made the group empty.
      
      This fixes PR22285.
      
      llvm-svn: 236539
    • Refactor UpdatePredRedefs and StepForward to avoid duplication. NFC · 7605e37a
      Pete Cooper authored
      Note, this is a reapplication of r236515 with a fix to not assert on non-register operands, but instead skip them until the subsequent commit.  Original commit message follows.
      
      The code was basically the same here already.  Just added an out parameter for a vector of seen defs so that UpdatePredRedefs can call StepForward first, then do its own post processing on the seen defs.
      
      Will be used in the next commit to also handle regmasks.
      
      llvm-svn: 236538
    • Thumb2SizeReduction: Check the correct set of registers for LDMIA. · 85a0e23b
      Peter Collingbourne authored
      The register set for LDMIA begins at offset 3, not 4. We were previously
      missing the short encoding of this instruction in the case where the base
      register was the first register in the register set.
      
      Also clean up some dead code:
      
      - The isARMLowRegister check is redundant with what VerifyLowRegs does;
        replace with an assert.
      - Remove handling of LDMDB instruction, which has no short encoding (and
        does not appear in ReduceTable).
      
      Differential Revision: http://reviews.llvm.org/D9485
      
      llvm-svn: 236535
    • [DAGCombiner] Account for getVectorIdxTy() when narrowing vector load · 9958c489
      Ulrich Weigand authored
      This patch makes ReplaceExtractVectorEltOfLoadWithNarrowedLoad convert
      the element number from getVectorIdxTy() to PtrTy before doing pointer
      arithmetic on it.  This is needed on z, where element numbers are i32
      but pointers are i64.
      
      Original patch by Richard Sandiford.
      
      llvm-svn: 236530
    • [DAGCombiner] Fix ReplaceExtractVectorEltOfLoadWithNarrowedLoad for BE · af2c618e
      Ulrich Weigand authored
      For little-endian, the function would convert (extract_vector_elt (load X), Y)
      to X + Y*sizeof(elt).  For big-endian it would instead use
      X + sizeof(vec) - Y*sizeof(elt).  The big-endian case wasn't right since
      vector index order always follows memory/array order, even for big-endian.
      (Note that the current handling has to be wrong for Y==0 since it would
      access beyond the end of the vector.)
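
      As a rough sketch (not a test from the patch), the narrowed form loads the
      element directly at X + Y*sizeof(elt), and that address is computed the same
      way on little- and big-endian targets:

       define i32 @narrowed(<4 x i32>* %p, i64 %y) {
         ; scalar load at %p + %y * 4, replacing (extract_vector_elt (load %p), %y)
         %base = bitcast <4 x i32>* %p to i32*
         %addr = getelementptr i32, i32* %base, i64 %y
         %elt = load i32, i32* %addr, align 4
         ret i32 %elt
       }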
      
      Original patch by Richard Sandiford.
      
      llvm-svn: 236529
    • [LegalizeVectorTypes] Allow single loads and stores for more short vectors · 2693c0a4
      Ulrich Weigand authored
      When lowering a load or store for TypeWidenVector, the type legalizer
      would use a single load or store if the associated integer type was legal.
      E.g. it would load a v4i8 as an i32 if i32 was legal.
      
      This patch extends that behavior to promoted integers as well as legal ones.
      If the integer type for the full vector width is TypePromoteInteger,
      the element type is going to be TypePromoteInteger too, and it's still
      better to use a single promoting load or truncating store rather than N
      individual promoting loads or truncating stores.  E.g. if you have a v2i8
      on a target where i16 is promoted to i32, it's better to load the v2i8 as
      an i16 rather than load both i8s individually.
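
      As a sketch (not a test from the patch; assumes a target where i16 is promoted
      to i32), the kind of access this changes:

       define <2 x i8> @load_v2i8(<2 x i8>* %ptr) {
         ; now lowered as one i16 load (then promoted), not two separate i8 loads
         %val = load <2 x i8>, <2 x i8>* %ptr, align 2
         ret <2 x i8> %val
       }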
      
      Original patch by Richard Sandiford.
      
      llvm-svn: 236528
    • [SystemZ] Add vector intrinsics · c1708b26
      Ulrich Weigand authored
      This adds intrinsics to allow access to all of the z13 vector instructions.
      Note that instructions whose semantics can be described by standard LLVM IR
      do not get any intrinsics.
      
      For each instruction whose semantics *cannot* (fully) be described, we
      define an LLVM IR target-specific intrinsic that directly maps to this
      instruction.
      
      For instructions that also set the condition code, the LLVM IR intrinsic
      returns the post-instruction CC value as a second result.  Instruction
      selection will attempt to detect code that compares that CC value against
      constants and use the condition code directly instead.
      
      Based on a patch by Richard Sandiford.
      
      llvm-svn: 236527
    • [SystemZ] Mark v1i128 and v1f128 as unsupported · 5211f9ff
      Ulrich Weigand authored
      The ABI specifies that <1 x i128> and <1 x fp128> are supposed to be
      passed in vector registers.  We do not yet support those types, and
      some infrastructure is missing before we can do so.
      
      In order to prevent accidentally generating code violating the ABI,
      this patch adds checks to detect those types and error out if user
      code attempts to use them.
      
      llvm-svn: 236526
    • [SystemZ] Handle sub-128 vectors · cd2a1b53
      Ulrich Weigand authored
      The ABI allows sub-128 vectors to be passed and returned in registers,
      with the vector occupying the upper part of a register.  We therefore
      want to legalize those types by widening the vector rather than promoting
      the elements.
      
      The patch includes some simple tests for sub-128 vectors and also tests
      that we can recognize various pack sequences, some of which use sub-128
      vectors as temporary results.  One of these forms is based on the pack
      sequences generated by llvmpipe when no intrinsics are used.
      
      Signed unpacks are recognized as BUILD_VECTORs whose elements are
      individually sign-extended.  Unsigned unpacks can have the equivalent
      form with zero extension, but they also occur as shuffles in which some
      elements are zero.
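
      For illustration only (not code from the patch), one shuffle form of an
      unsigned unpack, in which every other result element is zero:

       define <16 x i8> @unpack_sketch(<16 x i8> %v) {
         ; interleave zero bytes with source bytes, zero-extending each byte in place
         %r = shufflevector <16 x i8> %v, <16 x i8> zeroinitializer,
                <16 x i32> <i32 16, i32 0, i32 17, i32 1, i32 18, i32 2, i32 19, i32 3,
                            i32 20, i32 4, i32 21, i32 5, i32 22, i32 6, i32 23, i32 7>
         ret <16 x i8> %r
       }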
      
      Based on a patch by Richard Sandiford.
      
      llvm-svn: 236525
    • [SystemZ] Add CodeGen support for scalar f64 ops in vector registers · 49506d78
      Ulrich Weigand authored
      The z13 vector facility includes some instructions that operate only on the
      high f64 in a v2f64, effectively extending the FP register set from 16
      to 32 registers.  It's still better to use the old instructions if the
      operands happen to fit though, since the older instructions have a shorter
      encoding.
      
      Based on a patch by Richard Sandiford.
      
      llvm-svn: 236524
    • [SystemZ] Add CodeGen support for v4f32 · 80b3af7a
      Ulrich Weigand authored
      The architecture doesn't really have any native v4f32 operations except
      v4f32->v2f64 and v2f64->v4f32 conversions, with only half of the v4f32
      elements being used.  Even so, using vector registers for <4 x float>
      and scalarising individual operations is much better than generating
      completely scalar code, since there's much less register pressure.
      It's also more efficient to do v4f32 comparisons by extending to 2
      v2f64s, comparing those, then packing the result.
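
      A rough sketch of that compare strategy (illustrative only, using an ordered
      greater-than compare):

       define <4 x i1> @cmp_v4f32(<4 x float> %a, <4 x float> %b) {
         ; split each operand in half and extend to v2f64
         %a.lo = shufflevector <4 x float> %a, <4 x float> undef, <2 x i32> <i32 0, i32 1>
         %a.hi = shufflevector <4 x float> %a, <4 x float> undef, <2 x i32> <i32 2, i32 3>
         %b.lo = shufflevector <4 x float> %b, <4 x float> undef, <2 x i32> <i32 0, i32 1>
         %b.hi = shufflevector <4 x float> %b, <4 x float> undef, <2 x i32> <i32 2, i32 3>
         %a.lo.d = fpext <2 x float> %a.lo to <2 x double>
         %a.hi.d = fpext <2 x float> %a.hi to <2 x double>
         %b.lo.d = fpext <2 x float> %b.lo to <2 x double>
         %b.hi.d = fpext <2 x float> %b.hi to <2 x double>
         ; compare as doubles, then pack the two halves of the result back together
         %c.lo = fcmp ogt <2 x double> %a.lo.d, %b.lo.d
         %c.hi = fcmp ogt <2 x double> %a.hi.d, %b.hi.d
         %c = shufflevector <2 x i1> %c.lo, <2 x i1> %c.hi, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
         ret <4 x i1> %c
       }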
      
      This particularly helps with llvmpipe.
      
      Based on a patch by Richard Sandiford.
      
      llvm-svn: 236523
    • [SystemZ] Add CodeGen support for v2f64 · cd808237
      Ulrich Weigand authored
      This adds ABI and CodeGen support for the v2f64 type, which is natively
      supported by z13 instructions.
      
      Based on a patch by Richard Sandiford.
      
      llvm-svn: 236522
    • [SystemZ] Add CodeGen support for integer vector types · ce4c1095
      Ulrich Weigand authored
      This is the first of a series of patches to add CodeGen support exploiting
      the instructions of the z13 vector facility.  This patch adds support
      for the native integer vector types (v16i8, v8i16, v4i32, v2i64).
      
      When the vector facility is present, we default to the new vector ABI.
      This is characterized by two major differences:
      - Vector types are passed/returned in vector registers
        (except for unnamed arguments of a variable-argument list function).
      - Vector types are at most 8-byte aligned.
      
      The reason for the choice of 8-byte vector alignment is that the hardware
      is able to efficiently load vectors at 8-byte alignment, and the ABI only
      guarantees 8-byte alignment of the stack pointer, so requiring any higher
      alignment for vectors would require dynamic stack re-alignment code.
      
      However, for compatibility with old code that may use vector types, when
      *not* using the vector facility, the old alignment rules (vector types
      are naturally aligned) remain in use.
      
      These alignment rules are not only implemented at the C language level
      (implemented in clang), but also at the LLVM IR level.  This is done
      by selecting a different DataLayout string depending on whether the
      vector ABI is in effect or not.
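
      As a rough illustration (not quoted from the patch; the exact string lives in
      the SystemZ target), the vector-ABI layout mainly differs by a component that
      caps vector alignment at 64 bits, along the lines of:

       target datalayout = "E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-v128:64-a:8:16-n32:64"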
      
      Based on a patch by Richard Sandiford.
      
      llvm-svn: 236521
    • [SystemZ] Add z13 vector facility and MC support · a8b04e1c
      Ulrich Weigand authored
      This patch adds support for the z13 processor type and its vector facility,
      and adds MC support for all new instructions provided by that facility.
      
      Apart from defining the new instructions, the main changes are:
      
      - Adding VR128, VR64 and VR32 register classes.
      - Making FP64 a subclass of VR64 and FP32 a subclass of VR32.
      - Adding a D(V,B) addressing mode for scatter/gather operations.
      - Adding 1-, 2-, and 3-bit immediate operands for some 4-bit fields.
        Until now all immediate operands have been the same width as the
        underlying field (hence the assert->return change in decode[SU]ImmOperand).
      
      In addition, sys::getHostCPUName is extended to detect running natively
      on a z13 machine.
      
      Based on a patch by Richard Sandiford.
      
      llvm-svn: 236520
    • Revert "Refactor UpdatePredRedefs and StepForward to avoid duplication. NFC" · 336d90b6
      Pete Cooper authored
      This reverts commit 963cdbccf6e5578822836fd9b2ebece0ba9a60b7 (ie r236514)
      
      This is to get the bots green while I investigate.
      
      llvm-svn: 236518
    • Revert "Fix IfConverter to handle regmask machine operands." · 05b84d41
      Pete Cooper authored
      This reverts commit b27413cbfd78d959c18e713bfa271fb69e6b3303 (ie r236515).
      
      This is to get the bots green while I investigate the failures.
      
      llvm-svn: 236517
    • Fix IfConverter to handle regmask machine operands. · 6ebc2077
      Pete Cooper authored
      A regmask (typically seen on a call) clobbers the set of registers it lists.  The IfConverter, in UpdatePredRedefs, was handling register defs, but not regmasks.
      
      These are slightly different to a def in that we need to add both an implicit use and def to appease the machine verifier.  Otherwise, uses after the if converted call could think they are reading an undefined register.
      
      Reviewed by Matthias Braun and Quentin Colombet.
      
      llvm-svn: 236515
    • Refactor UpdatePredRedefs and StepForward to avoid duplication. NFC · bbd1c727
      Pete Cooper authored
      The code was basically the same here already.  Just added an out parameter for a vector of seen defs so that UpdatePredRedefs can call StepForward first, then do its own post processing on the seen defs.
      
      Will be used in the next commit to also handle regmasks.
      
      llvm-svn: 236514
    • Fix typo in assert message. NFC. · 32a0bee2
      Diego Novillo authored
      llvm-svn: 236513
    • Fix the clang -Werror build, use of uninitialized variable. · b10516e4
      David Blaikie authored
      llvm-svn: 236512
    • Update BasicAliasAnalysis to understand that nothing aliases with undef values. · 3459d6ea
      Daniel Berlin authored
      It got this right in some cases (if one of them was an identified object), but not in all cases.
      
      This caused stores to undef to block load-forwarding in some cases, etc.
      
      Added test to Transforms/GVN to verify optimization occurs as expected.
      
      llvm-svn: 236511
    • David Blaikie · 73cf872a
    • Re-land "[WinEH] Add an EH registration and state insertion pass for 32-bit x86" · 0738a9c0
      Reid Kleckner authored
      This reverts commit r236360.
      
      This change exposed a bug in WinEHPrepare by opting win32 code into EH
      preparation. We already knew that WinEHPrepare has bugs, and is the
      status quo for x64, so I don't think that's a reason to hold off on this
      change. I disabled exceptions in the sanitizer tests in r236505 and an
      earlier revision.
      
      llvm-svn: 236508
    • [ShrinkWrap] Add (a simplified version of) shrink-wrapping. · 61b305ed
      Quentin Colombet authored
      This patch introduces a new pass that computes the safe points at which to
      insert the prologue and epilogue of the function.
      The goal is to find safe points that are cheaper than the entry and exit
      blocks.
      
      As an example, and to avoid introducing regressions, this patch also
      implements the required bits to enable the shrink-wrapping pass for AArch64.
      
      
      ** Context **
      
      Currently we insert the prologue and epilogue of the method/function in the
      entry and exit blocks. Although this is correct, we can do a better job when
      those are not immediately required and insert them at less frequently executed
      places.
      The job of the shrink-wrapping pass is to identify such places.
      
      
      ** Motivating example **
      
      Let us consider the following function that performs a call in only one branch
      of an if:
      define i32 @f(i32 %a, i32 %b)  {
       %tmp = alloca i32, align 4
       %tmp2 = icmp slt i32 %a, %b
       br i1 %tmp2, label %true, label %false
      
      true:
       store i32 %a, i32* %tmp, align 4
       %tmp4 = call i32 @doSomething(i32 0, i32* %tmp)
       br label %false
      
      false:
       %tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ]
       ret i32 %tmp.0
      }
      
      On AArch64 this code generates (removing the cfi directives to ease
      readability):
      _f:                                     ; @f
      ; BB#0:
        stp x29, x30, [sp, #-16]!
        mov  x29, sp
        sub sp, sp, #16             ; =16
        cmp  w0, w1
        b.ge  LBB0_2
      ; BB#1:                                 ; %true
        stur  w0, [x29, #-4]
        sub x1, x29, #4             ; =4
        mov  w0, wzr
        bl  _doSomething
      LBB0_2:                                 ; %false
        mov  sp, x29
        ldp x29, x30, [sp], #16
        ret
      
      With shrink-wrapping we could generate:
      _f:                                     ; @f
      ; BB#0:
        cmp  w0, w1
        b.ge  LBB0_2
      ; BB#1:                                 ; %true
        stp x29, x30, [sp, #-16]!
        mov  x29, sp
        sub sp, sp, #16             ; =16
        stur  w0, [x29, #-4]
        sub x1, x29, #4             ; =4
        mov  w0, wzr
        bl  _doSomething
        add sp, x29, #16            ; =16
        ldp x29, x30, [sp], #16
      LBB0_2:                                 ; %false
        ret
      
      Therefore, we would pay the overhead of setting up/destroying the frame only if
      we actually do the call.
      
      
      ** Proposed Solution **
      
      This patch introduces a new machine pass that performs the shrink-wrapping
      analysis (see the comments at the beginning of ShrinkWrap.cpp for more details).
      It then stores the safe save and restore point into the MachineFrameInfo
      attached to the MachineFunction.
      This information is then used by the PrologEpilogInserter (PEI) to place the
      related code at the right place. This pass runs right before the PEI.
      
      Unlike the original paper of Chow from PLDI’88, this implementation of
      shrink-wrapping does not use expensive data-flow analysis and does not need hacks
      to properly avoid frequently executed points. Instead, it relies on dominance and
      loop properties.
      
      The pass is off by default and each target can opt-in by setting the
      EnableShrinkWrap boolean to true in their derived class of TargetPassConfig.
      This setting can also be overwritten on the command line by using
      -enable-shrink-wrap.
      
      Before you try out the pass for your target, make sure you properly fix your
      emitProlog/emitEpilog/adjustForXXX methods to cope with basic blocks that are not
      necessarily the entry block.
      
      
      ** Design Decisions **
      
      1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but
      for debugging and clarity I thought it was best to have its own file.
      2. Right now, we only support one save point and one restore point. At some
      point we can expand this to several save points and restore points; the impacted
      components would then be:
      - The pass itself: New algorithm needed.
      - MachineFrameInfo: Hold a list or set of Save/Restore points instead of one
        pointer.
      - PEI: Should loop over the save points and restore points.
      Anyhow, at least for this first iteration, I do not believe it is interesting
      to support the complex cases. We should revisit that when we have motivating
      examples.
      
      Differential Revision: http://reviews.llvm.org/D9210
      
      <rdar://problem/3201744>
      
      llvm-svn: 236507
    • [Orc] Reapply r236465 with fixes for the MSVC bots. · cd68eba3
      Lang Hames authored
      llvm-svn: 236506
    • [bugpoint] Increase default memory limit to 400MB to fix bugpoint tests. · 85202063
      Daniel Sanders authored
      I tracked down the bug to an unchecked malloc in SmallVectorBase::grow_pod().
      This malloc is returning NULL on my machine when running under bugpoint but not
      when -enable-valgrind is given.
      
      llvm-svn: 236504
    • This patch adds ABI support for v1i128 data type. · d4eb73c0
      Kit Barton authored
      It adds v1i128 to the appropriate register classes and checks parameter passing
      and return values.
      
      This is related to http://reviews.llvm.org/D9081, which will add instructions
      that exploit the v1i128 datatype.
      
      Phabricator review: http://reviews.llvm.org/D9475
      
      llvm-svn: 236503
    • Igor Laevsky · 2aa8cafa
    • [mips] Generate code for insert/extract operations when using the N64 ABI and MSA. · eda60d21
      Daniel Sanders authored
      Summary:
      When using the N64 ABI, element-indices use the i64 type instead of i32.
      In many cases, we can use iPTR to account for this, but additional patterns
      and pseudos are also required.
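
      For illustration (not a test from the patch), an extract whose index is an
      i64, as is natural under N64:

       define i32 @get_lane(<4 x i32> %v, i64 %idx) {
         %elt = extractelement <4 x i32> %v, i64 %idx
         ret i32 %elt
       }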
      
      This fixes most (but not quite all) failures in the test-suite when using
      N64 and MSA together.
      
      Reviewers: vkalintiris
      
      Reviewed By: vkalintiris
      
      Subscribers: llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D9342
      
      llvm-svn: 236494
    • Fix regression in parsing armv{6,7}hl- triples. These are used by SUSE and Redhat currently. · 5eb52b74
      Ismail Donmez authored
      
      Reviewed by Jonathan Roelofs.
      
      llvm-svn: 236492
    • [mips][msa] Test basic operations for the N32 ABI too. · 4160c802
      Daniel Sanders authored
      Summary:
      This required adding instruction aliases for dneg.
      
      N64 will be enabled shortly but requires additional bugfixes.
      
      Reviewers: vkalintiris
      
      Reviewed By: vkalintiris
      
      Subscribers: llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D9341
      
      llvm-svn: 236489
    • Kostya Serebryany
    • [Orc] Revert r236465 - It broke the Windows bots. · ac31a1f1
      Lang Hames authored
      Looks like the usual missing explicit move-constructor issue with MSVC. I should
      have a fix shortly.
      
      llvm-svn: 236472
    • [X86] Fix assertion while DAG combining offsets and ExternalSymbols · 9dad227b
      Reid Kleckner authored
      ExternalSymbol nodes do not contain offsets, unlike GlobalValue nodes.
      
      llvm-svn: 236471
    • [ARM] IT block insertion needs to update kill flags · 4dddbcfb
      Pete Cooper authored
      When forming an IT block from the first MOV here:
      
      	%R2<def> = t2MOVr %R0, pred:1, pred:%CPSR, opt:%noreg
      	%R3<def> = tMOVr %R0<kill>, pred:14, pred:%noreg
      
      the move into R3 is moved out of the IT block so that later instructions on the same predicate can be inside this block, and we can share the IT instruction.
      
      However, when moving the R3 copy out of the IT block, we need to clear its kill flags for anything in use at this point in time, ie, R0 here.
      
      This appeases the machine verifier which thought that R0 wasn't defined when used.
      
      I have a test case, but it's extremely register allocator specific.  It would be too fragile to commit a test which depends on the register allocator here.
      
      llvm-svn: 236468
    • Add TransformUtils dependency to lli. · b5445cce
      Pete Cooper authored
      After r236465, Orc uses ValueMaterializer and so needs to link against TransformUtils to get the ValueMaterializer::anchor().
      
      llvm-svn: 236467
    • [Orc] Refactor the compile-on-demand layer to make module partitioning lazy, and avoid cloning unused decls into every partition. · a68970df
      Lang Hames authored
      
      Module partitioning showed up as a source of significant overhead when I
      profiled some trivial test cases. Avoiding the overhead of partitioning
      for uncalled functions helps to mitigate this.
      
      This change also means that it is no longer necessary to have a
      LazyEmittingLayer underneath the CompileOnDemand layer, since the
      CompileOnDemandLayer will not extract or emit function bodies until they are
      called.
      
      llvm-svn: 236465