  1. Jun 12, 2015
    • [ARM] Disabling vfp4 should disable fp16 · d9e39d53
      John Brawn authored
      ARMTargetParser::getFPUFeatures should disable fp16 whenever it
      disables vfp4, as otherwise something like -mcpu=cortex-a7 -mfpu=none
      leaves us with fp16 enabled (though the only effect that will have is
      a wrong build attribute).
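
      A minimal sketch of the dependency rule being enforced (a
      hypothetical helper, not the actual ARMTargetParser code): when
      vfp4 is dropped from the feature list, fp16 must go with it.

        #include <string>
        #include <vector>

        // Hypothetical sketch: disabling vfp4 must also disable the
        // dependent fp16 feature, otherwise -mfpu=none can leave
        // "+fp16" behind and emit a wrong build attribute.
        void disableVFP4(std::vector<std::string> &Features) {
          Features.push_back("-vfp4");
          Features.push_back("-fp16"); // fp16 depends on vfp4
        }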
      
      Differential Revision: http://reviews.llvm.org/D10397
      
      llvm-svn: 239599
    • [WinEH] Put finally pointers in the handler scope table field · 81d1cc00
      Reid Kleckner authored
      We were putting them in the filter field, which is correct for 64-bit
      but wrong for 32-bit.
      
      Also switch the order of scope table entry emission so outermost entries
      are emitted first, and fix an obvious state assignment bug.
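
      For reference, a rough sketch of the 32-bit scope table entry
      layout involved (field names are illustrative, following the
      documented three-field shape; the emitted table may differ):

        #include <cstdint>

        // Illustrative layout only: a __finally funclet has no filter
        // expression, so its pointer belongs in the handler field,
        // not the filter field.
        struct ScopeTableEntry {
          uint32_t EnclosingLevel; // state of lexically enclosing scope
          void *FilterFunc;        // filter expression; null for __finally
          void *HandlerFunc;       // __except block or __finally funclet
        };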
      
      llvm-svn: 239574
    • [WinEH] Create an llvm.x86.seh.exceptioninfo intrinsic · a9d62535
      Reid Kleckner authored
      This intrinsic is like framerecover plus a load. It recovers the EH
      registration stack allocation from the parent frame and loads the
      exception information field out of it, giving back a pointer to an
      EXCEPTION_POINTERS struct. It's designed for clang to use in SEH filter
      expressions instead of accessing the EXCEPTION_POINTERS parameter that
      is available on x64.
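
      As context, the source pattern this supports looks like the
      following (ordinary MSVC-style SEH; illustrative usage only,
      not code from the patch):

        #include <windows.h>

        // Filter expressions receive EXCEPTION_POINTERS via
        // GetExceptionInformation(); on x86, clang can lower that
        // through llvm.x86.seh.exceptioninfo instead of an x64-style
        // parameter.
        static int Filter(EXCEPTION_POINTERS *Info) {
          return Info->ExceptionRecord->ExceptionCode ==
                         EXCEPTION_ACCESS_VIOLATION
                     ? EXCEPTION_EXECUTE_HANDLER
                     : EXCEPTION_CONTINUE_SEARCH;
        }

        void Demo() {
          __try {
            *(volatile int *)0 = 0; // fault to exercise the filter
          } __except (Filter(GetExceptionInformation())) {
            // handled
          }
        }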
      
      This required a minor change to MC to allow defining a label variable to
      another absolute framerecover label variable.
      
      llvm-svn: 239567
  2. Jun 11, 2015
    • Object: Prepend __imp_ when mangling a dllimport symbol in IRObjectFile. · 82e657b5
      Peter Collingbourne authored
      We cannot prepend __imp_ in the IR mangler because a function reference may
      be emitted unmangled in a constant initializer. The linker is expected to
      resolve such references to thunks. This is covered by the new test case.
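
      A minimal sketch of the symbol-table-level mangling described
      (a hypothetical helper; the real change lives in IRObjectFile's
      symbol handling):

        #include <string>

        // dllimport symbols are reported with an __imp_ prefix at the
        // object-symbol level, while the IR-level name stays unmangled
        // so constant initializers can still reference the function.
        std::string getObjectSymbolName(const std::string &Name,
                                        bool IsDLLImport) {
          return IsDLLImport ? "__imp_" + Name : Name;
        }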
      
      Strictly speaking we ought to emit two undefined symbols, one with __imp_ and
      one without, as we cannot know which symbol the final object file will refer
      to. However, this would require rather intrusive changes to IRObjectFile,
      and lld works fine without it for now.
      
      This reimplements r239437, which was reverted in r239502.
      
      Differential Revision: http://reviews.llvm.org/D10400
      
      llvm-svn: 239560
    • This reverts commit r239529 and r239514. · 65d37e64
      Rafael Espindola authored
      Revert "[AArch64] Match interleaved memory accesses into ldN/stN instructions."
      Revert "Fixing MSVC 2013 build error."
      
      The test/CodeGen/AArch64/aarch64-interleaved-accesses.ll test was failing on OS X.
      
      llvm-svn: 239544
    • Revert "Fix merges of non-zero vector stores" · 2691c59e
      Reid Kleckner authored
      This reverts commit r239539.
      
      It was causing SDAG assertions while building freetype.
      
      llvm-svn: 239543
    • Fix merges of non-zero vector stores · e23a063d
      Matt Arsenault authored
      Now actually stores the non-zero constant instead of 0.
      I somehow forgot to include this part of r238108.
      
      The test change was just an independent instruction order swap,
      so just add another check line to satisfy CHECK-NEXT.
      
      llvm-svn: 239539
    • R600/SI: Add -mcpu=bonaire to a test that uses flat address space · 53e015f3
      Tom Stellard authored
      Flat instructions don't exist on SI, but there is a bug in the backend that
      allows them to be selected.
      
      llvm-svn: 239533
    • [AArch64] Match interleaved memory accesses into ldN/stN instructions. · 4566d18e
      Hao Liu authored
      Add a pass, AArch64InterleavedAccess, to identify and match
      interleaved memory accesses. The pass transforms an interleaved
      load/store into ldN/stN intrinsics. As the Loop Vectorizer
      disables optimization on interleaved accesses by default, this
      optimization is also disabled by default; it can be enabled with
      -aarch64-interleaved-access-opt=true.
      
      E.g. Transform an interleaved load (Factor = 2):
             %wide.vec = load <8 x i32>, <8 x i32>* %ptr
             %v0 = shuffle %wide.vec, undef, <0, 2, 4, 6>  ; Extract even elements
             %v1 = shuffle %wide.vec, undef, <1, 3, 5, 7>  ; Extract odd elements
           Into:
              %ld2 = call { <4 x i32>, <4 x i32> } @aarch64.neon.ld2(%ptr)
              %v0 = extractvalue { <4 x i32>, <4 x i32> } %ld2, 0
              %v1 = extractvalue { <4 x i32>, <4 x i32> } %ld2, 1
      
      E.g. Transform an interleaved store (Factor = 2):
             %i.vec = shuffle %v0, %v1, <0, 4, 1, 5, 2, 6, 3, 7>  ; Interleaved vec
             store <8 x i32> %i.vec, <8 x i32>* %ptr
           Into:
             %v0 = shuffle %i.vec, undef, <0, 1, 2, 3>
             %v1 = shuffle %i.vec, undef, <4, 5, 6, 7>
              call void @aarch64.neon.st2(%v0, %v1, %ptr)
      
      llvm-svn: 239514
    • [X86][SSE] Vectorized i8 and i16 shift operators · 5965680d
      Simon Pilgrim authored
      This patch ensures that SHL/SRL/SRA shifts for i8 and i16 vectors
      avoid scalarization. It builds on the existing vectorized i8 SHL
      implementation, which moves the shift-amount bits up to the
      sign-bit position and applies the 4-, 2- and 1-bit shifts
      separately, with several improvements (see the sketch after this
      list):

      1 - SSE41 targets can use (v)pblendvb directly with the sign bit
      instead of performing a comparison to feed into a VSELECT node.
      2 - pre-SSE41 targets were masking and comparing against a 0x80
      constant; we avoid this by using the fact that a set sign bit
      means a negative integer, which can be compared against zero to
      feed the VSELECT, removing the need for a constant mask (zero
      generation is much cheaper).
      3 - SRA i8 needs to be unpacked to the upper byte of an i16 so
      that the i16 psraw instruction can be used for sign extension;
      this takes more work than SHL/SRL, but perf tests indicate it is
      still beneficial.

      The i16 implementation is similar but simpler than i8: 8-, 4-,
      2- and 1-bit shifts are needed, but less shift masking is
      involved. Note that SSE41's use of (v)pblendvb requires the i16
      shift amount to be splatted to both bytes.
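
      A minimal SSE2 sketch of the power-of-two shift ladder described
      above, for a variable per-byte SHL (my illustration, assuming
      shift amounts in [0,7]; the patch itself works on SelectionDAG
      nodes rather than intrinsics):

        #include <emmintrin.h>

        // Walk the shift amount's bits from bit 2 down to bit 0; after
        // each candidate shift, select per byte using the sign bit,
        // tested with a signed compare against zero rather than a 0x80
        // mask.
        static __m128i shl_v16i8(__m128i v, __m128i amt) {
          const __m128i zero = _mm_setzero_si128();
          __m128i a = _mm_slli_epi16(amt, 5); // amt bit 2 -> sign bit
          // No psllb exists: emulate i8 shifts with psllw plus a mask
          // clearing bits shifted across byte boundaries.
          __m128i sel = _mm_cmpgt_epi8(zero, a); // sign-bit-set bytes
          __m128i sh =
              _mm_and_si128(_mm_slli_epi16(v, 4), _mm_set1_epi8((char)0xF0));
          v = _mm_or_si128(_mm_and_si128(sel, sh), _mm_andnot_si128(sel, v));
          a = _mm_add_epi8(a, a); // amt bit 1 -> sign bit (paddb: no carry)
          sel = _mm_cmpgt_epi8(zero, a);
          sh = _mm_and_si128(_mm_slli_epi16(v, 2), _mm_set1_epi8((char)0xFC));
          v = _mm_or_si128(_mm_and_si128(sel, sh), _mm_andnot_si128(sel, v));
          a = _mm_add_epi8(a, a); // amt bit 0 -> sign bit
          sel = _mm_cmpgt_epi8(zero, a);
          sh = _mm_and_si128(_mm_slli_epi16(v, 1), _mm_set1_epi8((char)0xFE));
          v = _mm_or_si128(_mm_and_si128(sel, sh), _mm_andnot_si128(sel, v));
          return v;
        }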
      
      Tested on SSE2, SSE41 and AVX machines.
      
      Differential Revision: http://reviews.llvm.org/D9474
      
      llvm-svn: 239509
    • LLVM support for vector quad bit permute and gather instructions through builtins · ea1db8a6
      Nemanja Ivanovic authored
      This patch corresponds to review:
      http://reviews.llvm.org/D10096
      
      This is the back end portion of the patch related to D10095.
      The patch adds the instructions and back end intrinsics for:
      vbpermq
      vgbbd
      
      llvm-svn: 239505
  3. Jun 09, 2015
    • [NVPTX] fix a crash bug in NVPTXFavorNonGenericAddrSpaces · 75589ffc
      Jingyue Wu authored
      Summary:
      We used to assume V->RAUW only modifies the operand list of V's user.
      However, if V and V's user are Constants, RAUW may replace and invalidate V's
      user entirely.
      
      This patch fixes the above issue by letting the caller replace the
      operand instead of calling RAUW on Constants.
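
      A rough sketch of the safer pattern (illustrative, not the
      actual pass code): the caller rewrites its own operand slot
      rather than calling replaceAllUsesWith on a Constant, whose
      ConstantExpr users RAUW may rebuild and invalidate:

        #include "llvm/IR/Instruction.h"

        using namespace llvm;

        // Hypothetical helper: only the one use owned by the caller
        // is rewritten; other users of the old value, constant or
        // otherwise, are left untouched.
        static void replaceOperandWith(Instruction *User, unsigned OpIdx,
                                       Value *NewV) {
          User->setOperand(OpIdx, NewV);
        }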
      
      Test Plan: @nested_const_expr and @rauw in access-non-generic.ll
      
      Reviewers: broune, jholewinski
      
      Reviewed By: broune, jholewinski
      
      Subscribers: jholewinski, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D10345
      
      llvm-svn: 239435
    • [WinEH] Add 32-bit SEH state table emission prototype · f12c030f
      Reid Kleckner authored
      This gets all the handler info through to the asm printer and we can
      look at the .xdata tables now. I've convinced one small catch-all test
      case to work, but other than that, it would be a stretch to say this is
      functional.
      
      The state numbering algorithm avoids doing any scope reconstruction as
      we do for C++ to simplify the implementation.
      
      llvm-svn: 239433
    • [AArch64] Remove an overly conservative check when generating store pairs. · cf90acc1
      Chad Rosier authored
      Store instructions do not modify register values and therefore it's safe
      to form a store pair even if the source register has been read in between
      the two store instructions.
      
      Previously, the read of w1 (see below) prevented the formation of a stp.
      
              str     w0, [x2]
              ldr     w8, [x2, #8]
              add     w0, w8, w1
              str     w1, [x2, #4]
              ret
      
      We now generate the following code.
      
              stp     w0, w1, [x2]
              ldr     w8, [x2, #8]
              add     w0, w8, w1
              ret
      
      All correctness tests with -Ofast on A57 with Spec200x and EEMBC pass.
      Performance results for SPEC2K were within noise.
      
      llvm-svn: 239432
    • Remove DisableTailCalls from TargetOptions and the code in resetTargetOptions · d9699bc7
      Akira Hatanaka authored
      that was resetting it.
      
      Remove the uses of DisableTailCalls in subclasses of TargetLowering and use
      the value of function attribute "disable-tail-calls" instead. Also,
      unconditionally add pass TailCallElim to the pipeline and check the function
      attribute at the start of runOnFunction to disable the pass on a per-function
      basis. 
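
      A minimal sketch of the per-function gating described
      (illustrative; names follow the LLVM pass API, but the patch's
      exact code may differ):

        #include "llvm/IR/Function.h"

        using namespace llvm;

        // Checked at the start of runOnFunction: honor the function
        // attribute instead of the removed
        // TargetOptions::DisableTailCalls flag.
        static bool shouldRunTailCallElim(const Function &F) {
          return F.getFnAttribute("disable-tail-calls").getValueAsString() !=
                 "true";
        }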
       
      This is part of the work to remove TargetMachine::resetTargetOptions, and since
      DisableTailCalls was the last non-fast-math option that was being reset in that
      function, we should be able to remove the function entirely after the work to
      propagate IR-level fast-math flags to DAG nodes is completed.
      
      Out-of-tree users should remove the uses of DisableTailCalls and make changes
      to attach attribute "disable-tail-calls"="true" or "false" to the functions in
      the IR.
      
      rdar://problem/13752163
      
      Differential Revision: http://reviews.llvm.org/D10099
      
      llvm-svn: 239427
    • The constant initialization for globals in NVPTX is generated as an · cd50135a
      Samuel Antao authored
      array of bytes. The generation of these byte arrays assumed a
      little endian host, which prevented big endian hosts from being
      used to generate PTX code. This patch fixes the problem by
      changing the way the bytes are extracted so that it works on
      both little and big endian hosts.
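
      A minimal sketch of the endian-independent approach (a
      hypothetical helper, not the patch's code): extract bytes by
      shifting the value rather than reinterpreting host memory, so
      the output order no longer depends on the host byte order:

        #include <cstdint>
        #include <vector>

        // Emit the little-endian encoding of V regardless of host
        // endianness.
        static std::vector<uint8_t> toLEBytes(uint64_t V, unsigned NumBytes) {
          std::vector<uint8_t> Bytes;
          for (unsigned I = 0; I != NumBytes; ++I)
            Bytes.push_back(uint8_t(V >> (8 * I))); // byte I of encoding
          return Bytes;
        }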
      
      llvm-svn: 239412
    • Implement computeKnownBits for min/max nodes · 705eb8f6
      Matt Arsenault authored
      llvm-svn: 239378
    • [NVPTX] run SROA after NVPTXFavorNonGenericAddrSpaces · 2e4d1dd0
      Jingyue Wu authored
      Summary:
      This cleans up most allocas NVPTXLowerKernelArgs emits for byval
      parameters.
      
      Test Plan: strengthens bug21465.ll to verify that no redundant
      local loads/stores remain.
      
      Reviewers: eliben, jholewinski
      
      Reviewed By: eliben, jholewinski
      
      Subscribers: jholewinski, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D10322
      
      llvm-svn: 239368