- Dec 03, 2014
-
Charlie Turner authored
LLVM understands an -enable-sign-dependent-rounding-fp-math codegen option. When the user has specified this option, the Tag_ABI_FP_rounding attribute should be emitted with value 1. As far as I can tell, this option does not currently disable transformations and optimizations that assume default floating-point rounding behaviour, but the intention should be recorded in the build attributes regardless of what the compiler actually does with it. Change-Id: If838578df3dc652b6f2796b8d152545674bcb30e llvm-svn: 223218
-
Charlie Turner authored
Change-Id: I051866d073fc6ce87ce3e693a3762da6d81f4393 llvm-svn: 223217
-
Matt Arsenault authored
Select i1 logical ops directly to 64-bit SALU instructions. Vector i1 values really live in SGPRs, with one bit per work-item in the wave. This saves about 4 instructions when and/or/xor-ing any condition, and also helps write conditions that need to be passed in vcc. This should work correctly now that the SGPR live range fixing pass works. More work is needed to eliminate the VReg_1 pseudo register class and possibly the entire SILowerI1Copies pass. llvm-svn: 223206
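As an illustration (a hypothetical reduced input, not the commit's testcase), an i1 AND of two per-lane conditions like the one below can now select directly to a 64-bit SALU instruction such as S_AND_B64, with each bit of the SGPR pair holding one lane's value:

    define i1 @both_positive(i32 %a, i32 %b) {
      %c0 = icmp sgt i32 %a, 0
      %c1 = icmp sgt i32 %b, 0
      ; previously went through VALU/VReg_1 copies; now a single S_AND_B64
      %c = and i1 %c0, %c1
      ret i1 %c
    }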
-
Tom Stellard authored
We just needed to remove the assertion in AMDGPURegisterInfo::getFrameRegister(), which is called when initializing the parser for inline assembly. llvm-svn: 223197
-
Matt Arsenault authored
This matches SC's behavior. llvm-svn: 223194
-
Peter Collingbourne authored
Patch by Ben Gamari! This redefines the `prefix` attribute introduced previously and introduces a `prologue` attribute. There are three primary usecases that these attributes aim to serve:

1. Function prologue sigils.

2. Function hot-patching: enable the user to insert `nop` operations at the beginning of the function which can later be safely replaced with a call to some instrumentation facility.

3. Runtime metadata: allow a compiler to insert data for use by the runtime during execution. GHC is one example of a compiler that needs this functionality for its tables-next-to-code functionality.

Previously `prefix` served cases (1) and (2) quite well by allowing the user to introduce arbitrary data at the entrypoint but before the function body. Case (3), however, was poorly handled by this approach, as it required that prefix data be valid executable code. Here we redefine the notion of prefix data to instead be data which occurs immediately before the function entrypoint (i.e. the symbol address). Since prefix data now occurs before the function entrypoint, there is no need for the data to be valid code. The previous notion of prefix data now goes under the name "prologue data" to emphasize its duality with the function epilogue. The intention here is to handle cases (1) and (2) with prologue data and case (3) with prefix data; the resulting IR syntax is sketched below.

References: this idea arose out of discussions [1] with Reid Kleckner in response to a proposal to introduce the notion of symbol offsets to enable handling of case (3).

[1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-May/073235.html

Test Plan: testsuite Differential Revision: http://reviews.llvm.org/D6454 llvm-svn: 223189
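A minimal sketch of the two attributes (illustrative values: `i8 144` is a single x86 `nop`, suitable for hot-patching, and the prefix data on @g is arbitrary runtime metadata placed immediately before the symbol address):

    ; Prologue data: executed as part of the function, must be valid code.
    define void @f() prologue i8 144 {
      ret void
    }

    ; Prefix data: laid out before @g's entrypoint, need not be valid code.
    define void @g() prefix i32 123 {
      ret void
    }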
-
Hal Finkel authored
We need to use the custom expansion of readcyclecounter on all 32-bit targets (even those with 64-bit registers). This should fix the ppc64 buildbot. llvm-svn: 223182
-
Tim Northover authored
A global variable without an explicit alignment specified should be assumed to be ABI-aligned according to its type, like on other platforms. This allows us to use better memory operations when accessing it. rdar://18533701 llvm-svn: 223180
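For illustration (hypothetical globals, not from the commit): the first global is now assumed to carry the 4-byte ABI alignment of i32, enabling wider memory operations, while an explicit `align` attribute continues to be honoured:

    @var = global i32 0             ; no explicit alignment: assume ABI alignment (4)
    @under = global i32 0, align 1  ; explicitly under-aligned: keep conservative accesses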
-
Tim Northover authored
This frequently leads to cases like:

    ldr xD, [xN, :lo12:var]
    add xA, xN, :lo12:var
    ldr xD, [xA, #8]

where the ADD would have been needed anyway, and the two distinct addressing modes can prevent the formation of an ldp. Because of how we handle ADRP (aggressively forming an ADRP/ADD pseudo-inst at ISel time), this pattern also results in duplicated ADRP instructions (one on its own to cover the ldr, and one combined with the add). llvm-svn: 223172
-
- Dec 02, 2014
-
Simon Pilgrim authored
4i32 shuffles for single insertions into zero vectors lower to X86vzmovl, which was using (v)blendps, causing domain switch stalls. This patch fixes this by using (v)pblendw instead. The updated tests on test/CodeGen/X86/sse41.ll still contain a domain stall due to the use of insertps; I'm looking at fixing this in a future patch. Differential Revision: http://reviews.llvm.org/D6458 llvm-svn: 223165
-
Hal Finkel authored
We've long supported readcyclecounter on PPC64, but it is easier there (the read of the 64-bit time-base register can be accomplished via a single instruction). This now provides an implementation for PPC32 as well. On PPC32, the time-base register is still 64 bits, but can only be read 32 bits at a time via two separate SPRs. The ISA manual explains how to do this properly (it involves re-reading the upper bits and looping if the counter has wrapped while being read). This requires PPC to implement a custom integer splitting legalization for the READCYCLECOUNTER node, turning it into a target-specific SDAG node, which then gets turned into a pseudo-instruction, which is then expanded to the necessary sequence (which has three SPR reads, the comparison and the branch). Thanks to Paul Hargrove for pointing out to me that this was still unimplemented. llvm-svn: 223161
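A minimal use of the intrinsic (illustrative, not the commit's test): on PPC32 the call below is now legalized into the pseudo-instruction that expands to the three-SPR-read/compare/branch sequence described above.

    declare i64 @llvm.readcyclecounter()

    define i64 @cycles() {
      ; reads the 64-bit time base; a single instruction on PPC64,
      ; the looping two-SPR sequence on PPC32
      %t = call i64 @llvm.readcyclecounter()
      ret i64 %t
    }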
-
Lang Hames authored
Reduce the number of nops emitted for stackmap shadows on AArch64 by counting non-stackmap instructions up to the next branch target towards the requested shadow. <rdar://problem/14959522> llvm-svn: 223156
-
Tom Stellard authored
llvm-svn: 223154
-
Matt Arsenault authored
llvm-svn: 223151
-
Daniel Sanders authored
Summary: Like N32/N64, they must be passed in the upper bits of the register. The new code could be merged with the existing if-statements but I've refrained from doing this since it will make porting the O32 implementation to tablegen harder later. Reviewers: vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6463 llvm-svn: 223148
-
Philip Reames authored
This is the third patch in a small series. It contains the CodeGen support for lowering the gc.statepoint intrinsic sequences (223078) to the STATEPOINT pseudo machine instruction (223085). The change also includes the set of helper routines and classes for working with gc.statepoints, gc.relocates, and gc.results, since the lowering code uses them. With this change, gc.statepoints should be functionally complete. The documentation will follow in the fourth change, and there will likely be some cleanup changes, but interested parties can start experimenting now. I'm not particularly happy with the amount of code or complexity involved with the lowering step, but at least it's fairly well isolated. The statepoint lowering code is split into its own files, and anyone not working on the statepoint support itself should be able to ignore it. During the lowering process, we currently spill aggressively to the stack. This is not entirely ideal (and we have plans to do better), but it's functional, relatively straightforward, and closely matches the implementations of the patchpoint intrinsics. Most of the complexity comes from trying to keep relocated copies of values in the same stack slots across statepoints. Doing so avoids the insertion of pointless load and store instructions to reshuffle the stack. The current implementation isn't as effective as I'd like, but it is functional and 'good enough' for many common use cases. In the long term, I'd like to figure out how to integrate the statepoint lowering with the register allocator. In principle, we shouldn't need to eagerly spill at all. The register allocator should do any spilling required, and the statepoint should simply record that fact. Depending on how challenging that turns out to be, we may invest in a smarter global stack slot assignment mechanism as a stopgap measure. Reviewed by: atrick, ributzka llvm-svn: 223137
-
Ahmed Bougacha authored
Go through implicit defs of CSMI and MI, and clear the kill flags on their uses in all the instructions between CSMI and MI. We might have made some of the kill flags redundant; consider:

    subs  ... %NZCV<imp-def>       <- CSMI
    csinc ... %NZCV<imp-use,kill>  <- this kill flag isn't valid anymore
    subs  ... %NZCV<imp-def>       <- MI, to be eliminated
    csinc ... %NZCV<imp-use,kill>

Since we eliminated MI and reused a register imp-def'd by CSMI (here %NZCV), that register, if it was killed before MI, should have that kill flag removed, because its lifetime was extended. Also, add an exhaustive testcase for the motivating example. Reviewed by: Juergen Ributzka <juergen@apple.com> llvm-svn: 223133
-
Tim Northover authored
The blocking code originated in ARM, which is more aggressive about casting types to a canonical representative before doing anything else, so I missed out most vector HFAs and broke the ABI. This should fix it. llvm-svn: 223126
-
Tom Stellard authored
llvm-svn: 223125
-
Charlie Turner authored
The default ARM floating-point mode does not support IEEE 754 mode exactly. Of relevance to this patch is that input denormals are flushed to zero. The way in which they're flushed to zero depends on the architecture:

* For VFPv2, it is implementation defined as to whether the sign of zero is preserved.

* For VFPv3 and above, the sign of zero is always preserved when a denormal is flushed to zero.

When FP support has been disabled, the strategy taken by this patch is to assume the software support will mirror the behaviour of the hardware support for the target *if it existed*. That is, for architectures which can only have VFPv2, it is assumed the software will flush to positive zero. For later architectures it is assumed the software will flush to zero preserving sign. Change-Id: Icc5928633ba222a4ba3ca8c0df44a440445865fd llvm-svn: 223110
-
- Dec 01, 2014
-
Jingyue Wu authored
Summary: ".weak" symbols cannot be consumed by ptxas (PR21685). This patch makes the weak directive in MCAsmPrinter customizable, and disables emitting ".weak" symbols for NVPTX. Test Plan: weak-linkage.ll Reviewers: jholewinski Reviewed By: jholewinski Subscribers: majnemer, jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D6455 llvm-svn: 223077
-
Reid Kleckner authored
Previously we just used "cc 10" in the .ll files, but that isn't very human readable. llvm-svn: 223076
-
Ahmed Bougacha authored
r208210 introduced an optimization that improves the vector select codegen by doing the setcc on vectors directly. This is a problem when the setcc operands are i1s, because the optimization would create vectors of i1, which aren't legal. Part of PR21549. Differential Revision: http://reviews.llvm.org/D6308 llvm-svn: 223075
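A guess at the shape of the offending input (the commit's actual testcase may differ): the select condition is a setcc whose operands are i1, so rewriting it as a vector compare would require an illegal vector-of-i1 type.

    define <2 x i32> @sel(i1 %a, i1 %b, <2 x i32> %x, <2 x i32> %y) {
      ; setcc on i1 operands: must not be turned into a <2 x i1> vector setcc
      %c = icmp eq i1 %a, %b
      %r = select i1 %c, <2 x i32> %x, <2 x i32> %y
      ret <2 x i32> %r
    }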
-
Ahmed Bougacha authored
r213378 improved f16 bitcasts, so that they go directly through subregs, instead of through the stack. That code now causes an assertion failure for bitcasts from other 16-bit types (most importantly v2i8). Correct that by doing the custom lowering for i16 bitcasts only when the input is an f16. Part of PR21549. Differential Revision: http://reviews.llvm.org/D6307 llvm-svn: 223074
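A plausible reduced input (hedged; the in-tree test may differ): a bitcast from a 16-bit type other than f16, which previously took the f16 custom-lowering path and asserted.

    define i16 @cast(<2 x i8> %v) {
      ; 16-bit bitcast whose source is not f16: now handled generically
      %r = bitcast <2 x i8> %v to i16
      ret i16 %r
    }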
-
Ahmed Bougacha authored
The MachineVerifier used to check that there was always exactly one unconditional branch to a non-landingpad (normal) successor. If that normal successor to an invoke BB is unreachable, it seems reasonable to only have one successor, the landing pad. On targets other than AArch64 (and on AArch64 with a different testcase), the branch folder turns the branch to the landing pad into a fallthrough. The MachineVerifier, which relies on AnalyzeBranch, is unable to check the condition, and doesn't complain. However, it does in this specific testcase, where the branch to the landing pad remained. Make the MachineVerifier accept it. llvm-svn: 223059
-
Tim Northover authored
Patch by Ben Gamari. llvm-svn: 223055
-
Hans Wennborg authored
I didn't foresee this affecting the Clang test suite :/ llvm-svn: 223054
-
Hans Wennborg authored
This can significantly reduce the size of the switch, allowing for more efficient lowering. I also worked with the idea of exploiting unreachable defaults by omitting the range check for jump tables, but always ended up with a non-negligible binary size increase. It might be worth looking into this some more. llvm-svn: 223049
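For illustration (a hypothetical input): a switch whose default destination is unreachable, which the lowering can now exploit when forming case clusters.

    define i32 @f(i32 %x) {
    entry:
      ; the default is unreachable, so only the listed cases matter
      switch i32 %x, label %def [
        i32 0, label %a
        i32 1, label %b
      ]
    def:
      unreachable
    a:
      ret i32 10
    b:
      ret i32 20
    }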
-
Jay Foad authored
Summary: PowerPC DWARF unwind info defined CFA as SP + offset even in a function where the stack had been dynamically realigned. This clearly doesn't work because the offset from SP to CFA is not a constant. Fix it by defining CFA as BP instead. This was causing the AddressSanitizer null_deref test to fail 50% of the time, depending on whether SP happened to be 32-byte aligned on entry to a particular function or not. Reviewers: willschm, uweigand, hfinkel Reviewed By: hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6410 llvm-svn: 222996
-
Akira Hatanaka authored
This commit fixes a bug in stack protector pass where edge weights were not set when new basic blocks were added to lists of successor basic blocks. Differential Revision: http://reviews.llvm.org/D5766 llvm-svn: 222987
-
- Nov 29, 2014
-
Hans Wennborg authored
This doesn't seem to have worked in a long time, but other optimizations would clean it up. llvm-svn: 222961
-
- Nov 28, 2014
-
Matt Arsenault authored
This was trying to create an MVT with 3x vectors, which created an invalid EVT. llvm-svn: 222942
-
Duncan P. N. Exon Smith authored
This reverts commit r222632 (and follow-up r222636), which caused a host of LNT failures on an internal bot. I'll respond to the commit on the list with a reproduction of one of the failures. Conflicts: lib/Target/X86/X86TargetTransformInfo.cpp llvm-svn: 222936
-
Sanjay Patel authored
Allow unaligned 16-byte memop codegen for btver2. No functional changes for any other subtargets. Replace the existing supposed small memcpy test with an actual test of a small memcpy. The previous test wasn't using FileCheck either. This patch should allow us to close PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ). Differential Revision: http://reviews.llvm.org/D6360 llvm-svn: 222925
-
- Nov 27, 2014
-
Tim Northover authored
The AAPCS treats small structs and homogeneous floating (or vector) aggregates specially, and guarantees they either get passed as a contiguous block of registers, or prevent any future use of those registers and get passed on the stack. This concept can fit quite neatly into LLVM's own type system, mapping an HFA to [N x float] and so on, and small structs to [N x i64]. Doing so allows front-ends to emit AAPCS compliant code without having to duplicate the register counting logic. llvm-svn: 222903
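A sketch of the mapping described above (hypothetical declarations, not taken from the commit): a front end can express an HFA of four floats as [4 x float] and a small struct as [2 x i64], and the backend guarantees the contiguous-registers-or-stack treatment.

    ; HFA of four floats: passed in s0-s3, or entirely on the stack.
    declare void @takes_hfa([4 x float])

    ; Small struct: passed in x0-x1, or entirely on the stack.
    declare void @takes_struct([2 x i64])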
-
Charlie Turner authored
The string data for string-valued build attributes were being unconditionally uppercased. There is no mention in the ARM ABI addenda about case conventions, so it's technically implementation defined as to whether the data are capitalised in some way or not. However, there are good reasons not to capitalise the data:

* It's less work.

* Some vendors may legitimately have case-sensitive checks for these attributes which would fail on LLVM-generated object files.

* There could be locale issues with uppercasing.

The original reasons for uppercasing appear to have stemmed from an old codesourcery toolchain behaviour, see http://comments.gmane.org/gmane.comp.compilers.llvm.cvs/87133

With this patch, the emitted object file no longer capitalises string data; it is encoded as seen in the assembly source. Change-Id: Ibe20dd6e60d2773d57ff72a78470839033aa5538 llvm-svn: 222882
-
- Nov 26, 2014
-
Will Newton authored
This mostly entails adding relocations; however, there are a couple of changes to existing relocations:

1. R_AARCH64_NONE is defined to be zero rather than 256. R_AARCH64_NONE has been defined to be zero for a long time elsewhere, e.g. in binutils and glibc since the submission of the AArch64 port in 2012, so this is required for compatibility.

2. R_AARCH64_TLSDESC_ADR_PAGE is renamed to R_AARCH64_TLSDESC_ADR_PAGE21. I don't think there is any way for relocation names to leak out of LLVM, so this should not break anything.

Tested with check-all with no regressions. llvm-svn: 222821
-
Elena Demikhovsky authored
including SAE mode and memory operand. Added an AVX512_maskable_scalar template that should cover all scalar instructions in the future. The main difference between AVX512_maskable_scalar<> and AVX512_maskable<> is the use of X86select instead of vselect; I need it because I can't create a vselect node for an MVT::i1 mask for a scalar instruction. http://reviews.llvm.org/D6378 llvm-svn: 222820
-
- Nov 25, 2014
-
Simon Pilgrim authored
Since (v)pslldq / (v)psrldq instructions resolve to a single input argument, it is useful to match them much earlier than we currently do; this prevents more complicated shuffles (notably insertion into a zero vector) from matching before them. Differential Revision: http://reviews.llvm.org/D6409 llvm-svn: 222796
-
Cameron McInally authored
llvm-svn: 222786
-