Commits · b0b17469629b58de2183f4648c3d124d44820414 · Roger Ferrer / llvm-epi

Sep 06, 2016

AMDGPU/SI: Teach SIInstrInfo::FoldImmediate() to fold immediates into copies · 2add8a11

Tom Stellard authored Sep 06, 2016

Summary:
I put this code here, because I want to re-use it in a few other places.
This supersedes some of the immediate folding code we have in SIFoldOperands.
I think the peephole optimizer is probably a better place for folding
immediates into copies, since it does some register coalescing at the same time.

This will also make it easier to transition SIFoldOperands into a smarter pass,
where it looks at all uses of an instruction at once to determine the optimal way to
fold operands.  Right now, the pass just considers one operand at a time.
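
As a rough illustration of what folding an immediate into a copy means, here is a toy Python sketch. The tuple-based instruction encoding and the `fold_immediates_into_copies` helper are hypothetical, and the sketch assumes each virtual register is defined only once. It does not model the real SIInstrInfo::FoldImmediate() interface.

```python
# Toy model: each instruction is (opcode, dest, source).
# ("MOV_IMM", "%a", 5) defines %a as the constant 5;
# ("COPY", "%b", "%a") copies %a into %b.
# Assumption: every virtual register is defined exactly once (SSA-like).
def fold_immediates_into_copies(instrs):
    imm_defs = {}  # virtual register -> known immediate value
    folded = []
    for opcode, dest, src in instrs:
        if opcode == "MOV_IMM":
            imm_defs[dest] = src
            folded.append((opcode, dest, src))
        elif opcode == "COPY" and src in imm_defs:
            # The copy's source is a known immediate: materialize it directly.
            imm_defs[dest] = imm_defs[src]
            folded.append(("MOV_IMM", dest, imm_defs[src]))
        else:
            folded.append((opcode, dest, src))
    return folded


program = [
    ("MOV_IMM", "%a", 5),
    ("COPY", "%b", "%a"),
    ("COPY", "%c", "%x"),  # %x is not a known immediate, so this stays a copy
]
result = fold_immediates_into_copies(program)
```

After the fold, `%b` no longer depends on `%a`, which is the kind of rewrite that can leave the original move dead.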

Reviewers: arsenm

Subscribers: wdng, nhaehnle, arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23402

llvm-svn: 280744

2add8a11

AMDGPU : Add XNACK feature to GPUs that support it. · 5e832e86
Wei Ding authored Sep 06, 2016
```
Differential Revision: http://reviews.llvm.org/D24276

llvm-svn: 280742
```
5e832e86
Fix ItaniumDemangle.cpp build with MSVC 2013 · b2881f1f
Reid Kleckner authored Sep 06, 2016
```
llvm-svn: 280740
```
b2881f1f

[llvm-cov] Add the "Go to first unexecuted line" feature. · d36b47c4

Ying Yi authored Sep 06, 2016

This patch provides easy navigation to find the zero count lines, especially useful when the source file is very large.
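
The navigation target can be pictured as the first line whose execution count is zero. The following is only a sketch of that idea in Python; the `first_unexecuted_line` helper and the list-of-counts representation (with `None` for lines that carry no count) are assumptions, not llvm-cov's implementation.

```python
def first_unexecuted_line(line_counts):
    """Return the 1-based number of the first line with a zero count, or None."""
    for line_number, count in enumerate(line_counts, start=1):
        if count == 0:
            return line_number
    return None


# None marks lines without a count (for example blank lines or comments).
counts = [3, 3, None, 0, 1, 0]
target = first_unexecuted_line(counts)
```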

Differential Revision: https://reviews.llvm.org/D23277

llvm-svn: 280739

d36b47c4

[AArch64] Adjust the scheduling model for Exynos M1. · 405c90e6
Evandro Menezes authored Sep 06, 2016
```
Further refine the model for branches.

llvm-svn: 280736
```
405c90e6
[AArch64] Adjust the scheduling model for Exynos M1. · 77e6b5d4
Evandro Menezes authored Sep 06, 2016
```
Further refine the model for stores.

llvm-svn: 280735
```
77e6b5d4
[AArch64] Adjust the scheduling model for Exynos M1. · 199cad4f
Evandro Menezes authored Sep 06, 2016
```
Further refine the model for loads.

llvm-svn: 280734
```
199cad4f

Add a C++ Itanium demangler to LLVM. · b940b66c

Rafael Espindola authored Sep 06, 2016

This adds a copy of the demangler in libcxxabi.

The code also has no dependencies on anything else in LLVM. To enforce
that I added it as another library. That way a BUILD_SHARED_LIBS will
fail if anyone adds a use of StringRef, for example.

The no llvm dependency combined with the fact that this has to build
on linux, OS X and Windows required a few changes to the code. In
particular:

    No constexpr.
    No alignas

On OS X at least this library has only one global symbol:
__ZN4llvm16itanium_demangleEPKcPcPmPi
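
To see how that symbol encodes the function name, here is a minimal Python sketch that decodes only the nested-name prefix (`_ZN`, then length-prefixed components, then `E`). The `nested_name` helper is hypothetical and handles nothing else in the Itanium mangling grammar. It is unrelated to the demangler code added by this commit.

```python
def nested_name(symbol):
    """Decode only the `_ZN<len><name>...E` nested-name prefix of a mangled symbol."""
    # Mach-O (OS X) prepends one extra underscore to C-level symbol names.
    if symbol.startswith("__Z"):
        symbol = symbol[1:]
    if not symbol.startswith("_ZN"):
        raise ValueError("not a nested Itanium-mangled name")
    pos = 3
    parts = []
    while symbol[pos] != "E":
        start = pos
        while symbol[pos].isdigit():
            pos += 1
        if pos == start:
            raise ValueError("unsupported construct in this sketch")
        length = int(symbol[start:pos])
        parts.append(symbol[pos:pos + length])
        pos += length
    # Everything after the closing E encodes the parameter types.
    return "::".join(parts), symbol[pos + 1:]


name, params = nested_name("__ZN4llvm16itanium_demangleEPKcPcPmPi")
```

Here `name` is `llvm::itanium_demangle`, and the remaining `PKcPcPmPi` encodes the parameter list `(char const*, char*, unsigned long*, int*)`.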

My current plan is:

    Commit something like this
    Change lld to use it
    Change lldb to use it as the fallback

    Add a few #ifdefs so that exactly the same file can be used in
    libcxxabi to export abi::__cxa_demangle.

Once the fast demangler in lldb can handle any names this
implementation can be replaced with it and we will have the one true
demangler.

llvm-svn: 280732

b940b66c

fix formatting; NFC · 4e463b4a
Sanjay Patel authored Sep 06, 2016
```
llvm-svn: 280727
```
4e463b4a
[MCTargetDesc] Delete dead code. Found by GCC7 -Wunused-function. · 5715012b
Davide Italiano authored Sep 06, 2016
```
Also unbreak newer gcc build with -Werror.

llvm-svn: 280726
```
5715012b
Fix comment formatting for DebugInfoFlags.def · a2cd4131
Victor Leschuk authored Sep 06, 2016
```
llvm-svn: 280722
```
a2cd4131

bugpoint: Return Errors instead of passing around strings · 1c039155

Justin Bogner authored Sep 06, 2016

This replaces the threading of `std::string &Error` through all of
these APIs with checked Error returns instead. There are very few
places here that actually emit any errors right now, but threading the
APIs through will allow us to replace a bunch of exit(1)'s that are
scattered through this code with proper error handling.

This is more or less NFC, but does move around where a couple of error
messages are printed out.

llvm-svn: 280720

1c039155

[RDF] Ignore undef use operands · 7c9b0126
Krzysztof Parzyszek authored Sep 06, 2016
```
llvm-svn: 280717
```
7c9b0126
Formatting with clang-format patch r280700 · 40c6235b
Leny Kholodov authored Sep 06, 2016
```
llvm-svn: 280716
```
40c6235b

[SelectionDAG] Simplify extract_subvector( insert_subvector ( Vec, In, Idx ), Idx ) -> In · 1b4462b7

Simon Pilgrim authored Sep 06, 2016

If we are extracting a subvector that has just been inserted then we should just use the original inserted subvector.

This has come up in several x86 shuffle lowering cases where we are crossing 128-bit lanes.
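
The identity behind this fold can be checked on plain lists. This Python sketch uses hypothetical `insert_subvector` and `extract_subvector` helpers that only mimic the node semantics, and it assumes the extracted width equals the inserted width. It is not the DAG combine itself.

```python
def insert_subvector(vec, sub, idx):
    """Return a copy of vec with sub written at element offset idx."""
    assert 0 <= idx and idx + len(sub) <= len(vec)
    return vec[:idx] + sub + vec[idx + len(sub):]


def extract_subvector(vec, idx, length):
    """Return the length elements of vec starting at element offset idx."""
    return vec[idx:idx + length]


vec = [0, 1, 2, 3, 4, 5, 6, 7]
sub = [10, 11, 12, 13]
# Extracting at the same index and width as the insert yields the inserted
# subvector, so the insert/extract pair can be replaced by `sub`.
roundtrip = extract_subvector(insert_subvector(vec, sub, 4), 4, len(sub))
```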

Differential Revision: https://reviews.llvm.org/D24254

llvm-svn: 280715

1b4462b7

[JumpThreading] Only write back branch-weight MDs for blocks that originally had PGO info · c520822d

Adam Nemet authored Sep 06, 2016

Currently the pass updates branch weights in the IR if the function has
any PGO info (entry frequency is set).  However we could still have
regions of the CFG that do not have branch weights collected (e.g. a
cold region).  In this case we'd use static estimates.  Since static
estimates for branches are determined independently, they are
inconsistent.  Updating them can "randomly" inflate block frequencies.

I've run into this in a completely cold loop of h264ref from
SPEC.  -Rpass-with-hotness showed the loop to be completely cold during
inlining (before JT) but completely hot during vectorization (after JT).

The new testcase demonstrates the problem.  We check array elements
against 1, 2 and 3 in a loop.  The check against 3 is the loop-exiting
check.  The block names should be self-explanatory.

In this example, jump threading incorrectly updates the weight of the
loop-exiting branch to 0, drastically inflating the frequency of the
loop (in the range of billions).

There is no run-time profile info for edges inside the loop, so branch
probabilities are estimated.  These are the resulting branch and block
frequencies for the loop body:

                check_1 (16)
            (8) /  |
            eq_1   | (8)
                \  |
                check_2 (16)
            (8) /  |
            eq_2   | (8)
                \  |
                check_3 (16)
            (1) /  |
       (loop exit) | (15)
                   |
              (back edge)

First we thread eq_1 -> check_2 to check_3.  Frequencies are updated to
remove the frequency of eq_1 from check_2 and then from the false edge
leaving check_2.  Changed frequencies are highlighted with * *:

                check_1 (16)
            (8) /  |
           eq_1~   | (8)
           /       |
          /     check_2 (*8*)
         /  (8) /  |
         \  eq_2   | (*0*)
          \     \  |
           ` --- check_3 (16)
            (1) /  |
       (loop exit) | (15)
                   |
              (back edge)

Next we thread eq_1 -> check_3 and eq_2 -> check_3 to check_1 as new
back edges.  Frequencies are updated to remove the frequency of eq_1 and
eq_2 from check_3 and then the false edge leaving check_3 (changed
frequencies are highlighted with * *):

                  check_1 (16)
              (8) /  |
             eq_1~   | (8)
             /       |
            /     check_2 (*8*)
           /  (8) /  |
          /-- eq_2~  | (*0*)
  (back edge)        |
                  check_3 (*0*)
            (*0*) /  |
         (loop exit) | (*0*)
                     |
                (back edge)

As a result, the loop exit edge ends up with 0 frequency, which in turn gives
the loop header maximum frequency.
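
The arithmetic of the two threading steps can be replayed in a few lines of Python. This is only a sketch of the subtractions shown in the diagrams. The `remove_threaded` helper and its saturation at zero are assumptions of the sketch, chosen to match the 0 values in the diagrams, and it does not reproduce the pass's actual update code.

```python
def remove_threaded(freq, threaded):
    """Subtract a threaded edge's frequency, saturating at zero."""
    return max(freq - threaded, 0)


eq_1 = eq_2 = 8

# Step 1: thread eq_1 -> check_2 to check_3.
check_2 = remove_threaded(16, eq_1)
check_2_false_edge = remove_threaded(8, eq_1)

# Step 2: thread eq_1 -> check_3 and eq_2 -> check_3 to check_1.
check_3 = remove_threaded(remove_threaded(16, eq_1), eq_2)
check_3_false_edge = remove_threaded(remove_threaded(15, eq_1), eq_2)
```

Both incoming frequencies were estimates, so subtracting them wipes out check_3 entirely even though the loop-entry sample is real profile data.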

There are a few potential problems here:

1. The profile data seems odd.  There is a single profile sample of the
loop being entered.  On the other hand, there are no weights inside the
loop.

2. Based on static estimation we shouldn't set edges to "extreme"
values, i.e. extremely likely or unlikely.

3. We shouldn't create profile metadata that is calculated from static
estimation.  I am not sure what the policy is but it seems to make sense to
treat profile metadata as something that is known to originate from
profiling.  Estimated probabilities should only be reflected in BPI/BFI.

Any one of these would probably fix the immediate problem.  I went for 3
because I think it's a good policy to have and added a FIXME about 2.

Differential Revision: https://reviews.llvm.org/D24118

llvm-svn: 280713

c520822d

Fix for Bindings/Go/go.test after patch r280700 · dabff7d8
Leny Kholodov authored Sep 06, 2016
```
llvm-svn: 280711
```
dabff7d8

[Sparc][Leon] Corrected supported atomics size for processors supporting Leon... · 92cac932

Chris Dewhurst authored Sep 06, 2016

[Sparc][Leon] Corrected supported atomics size for processors supporting Leon CASA instruction back to 32 bits.

This was erroneously checked in for 64 bits while trying to find out whether there was a way to get 64-bit atomicity on Leon processors. There is not, and this change should not have been checked in. There is no unit test for this, as the existing unit tests check the 32-bit behaviour, which was the original intention of the code.

llvm-svn: 280710

92cac932

[mips] Tighten FastISel restrictions · b432a3ed

Simon Dardis authored Sep 06, 2016

LLVM PR/29052 highlighted that FastISel for MIPS attempted to lower
arguments assuming that it was using the paired 32bit registers to
perform operations for f64. This mode of operation is not supported
for MIPSR6.

This patch resolves the reported issue by adding additional checks
for unsupported floating point unit configuration.

Thanks to mike.k for reporting this issue!

Reviewers: seanbruno, vkalintiris

Differential Revision: https://reviews.llvm.org/D23795

llvm-svn: 280706

b432a3ed

[PPC] Claim stack frame before storing into it, if no red zone is present · 020ec299

Krzysztof Parzyszek authored Sep 06, 2016

Unlike PPC64, PPC32/SVR4 does not have a red zone. In its absence
there is no guarantee that this part of the stack will not be modified
by an interrupt. To avoid this, make sure to claim the stack frame first
before storing into it.

This fixes https://llvm.org/bugs/show_bug.cgi?id=26519.

Differential Revision: https://reviews.llvm.org/D24093

llvm-svn: 280705

020ec299

DebugInfo: use strongly typed enum for debug info flags · 5fcc4185

Leny Kholodov authored Sep 06, 2016

Use ADT/BitmaskEnum for DINode::DIFlags for the following purposes:

    * Get rid of unsigned int for flags to avoid problems on platforms with sizeof(int) < 4
    * Flags are now strongly typed

Patch by: Victor Leschuk <vleschuk@gmail.com>

Differential Revision: https://reviews.llvm.org/D23766

llvm-svn: 280700

5fcc4185

[RegisterScavenger] Remove aliasing registers of operands from the candidate set · 0b7c4af3

Silviu Baranga authored Sep 06, 2016

Summary:
In addition to excluding the register operands of the current
instruction, also exclude any registers that alias them. We can't consider
these as candidates because using them would clobber the corresponding
register operand of the current instruction.
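
A small Python sketch of the pruning rule follows. The alias table, the register names, and the `prune_candidates` helper are hypothetical. The real scavenger works on LLVM's register information rather than string sets.

```python
# Hypothetical alias table: each register maps to the registers it overlaps.
ALIASES = {
    "W0": {"X0"}, "X0": {"W0"},
    "W1": {"X1"}, "X1": {"W1"},
    "W2": {"X2"}, "X2": {"W2"},
}


def prune_candidates(candidates, operand_regs):
    """Drop every operand register, and everything aliasing it, from the candidates."""
    blocked = set()
    for reg in operand_regs:
        blocked.add(reg)
        blocked |= ALIASES.get(reg, set())
    return candidates - blocked


# The current instruction uses W0 and X2, so X0 (aliasing W0) must go as well.
remaining = prune_candidates({"X0", "X1", "X2"}, operand_regs={"W0", "X2"})
```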

This change doesn't include a test case and it would probably be difficult
to produce a stable one since the bug depends on the results of register
allocation.

Reviewers: MatzeB, qcolombet, hfinkel

Subscribers: hfinkel, llvm-commits

Differential Revision: https://reviews.llvm.org/D24130

llvm-svn: 280698

0b7c4af3

[AVX-512] Fix masked VPERMI2PS isel when the index comes from a bitcast. · 4fa3b50f

Craig Topper authored Sep 06, 2016

We need to bitcast the index operand to a floating point type so that it matches the result type. If not then the passthru part of the DAG will be a bitcast from the index's original type to the destination type. This makes it very difficult to match. The other option would be to add 5 sets of patterns for every other possible type.

llvm-svn: 280696

4fa3b50f

[AVX-512] Add a test case to show that we don't select masked vpermi2ps when... · cf9f1b8d

Craig Topper authored Sep 06, 2016

[AVX-512] Add a test case to show that we don't select masked vpermi2ps when the index operand comes from a bitcast.

It doesn't work because we're looking for a bitcast from the v4i32 index operand to v4f32 for the passthru part of the DAG. But since the index is bitcasted from v2i64 and bitcasts fold, we actually have a bitcast from v2i64 to v4f32 in the passthru part of the DAG.

Taken from optimized output from clang's test case.

llvm-svn: 280695

cf9f1b8d

[X86] Remove unused encoding from IntrinsicType enum. · 43fbd840
Craig Topper authored Sep 06, 2016
```
llvm-svn: 280694
```
43fbd840
[X86] Fix indentation. NFC · a0055d31
Craig Topper authored Sep 06, 2016
```
llvm-svn: 280693
```
a0055d31

Revert "bugpoint: Stop threading errors through APIs that never fail" · 24dac6af

Justin Bogner authored Sep 06, 2016

This isn't the right thing to do - it turns out a number of the APIs
that "never fail" just exit(1) if something bad happens. We can and
should thread Error through this instead.

That diff will make more sense with this reverted. Sorry for the
noise.

This reverts r280690

llvm-svn: 280691

24dac6af

bugpoint: Stop threading errors through APIs that never fail · 46b1a9a7

Justin Bogner authored Sep 06, 2016

This simplifies ListReducer and most of its subclasses by removing the
std::string &Error that was threaded through all of them but almost
never used. If we end up needing error handling in more places here we
can reinstate it using llvm::Error instead of these unwieldy strings.

The 2 cases (out of 12) that actually can hit the error cases are a
little bit awkward now, but those will clean up as I refactor this API
further.

llvm-svn: 280690

46b1a9a7

ARM: workaround bundled operation predication · bfa25bd1

Saleem Abdulrasool authored Sep 06, 2016

This is a Windows ARM specific issue. If the code path in the if conversion
ends up using a relocation which will form an IMAGE_REL_ARM_MOV32T, we end up
with a bundle to ensure that the mov.w/mov.t pair is not split up. This is
normally fine, however, if the branch is also predicated, then we end up trying
to predicate the bundle.

For now, report a bundle as being unpredicatable. Although this is false, this
would trigger a failure case previously anyways, so this is no worse. That is,
there should not be any code which would previously have been if converted and
predicated which would not be now.

Under certain circumstances, it may be possible to "predicate the bundle". This
would require scanning all bundle instructions, ensuring that the bundle
contains only predicatable instructions, and converting the bundle into an IT
block sequence. If the bundle is larger than the maximal IT block length (4
instructions), it would require materializing multiple IT blocks from the single
bundle.
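
The splitting step described above could look roughly like this. It is a Python sketch of the idea only. The `split_into_it_blocks` helper, the string instruction names, and the `is_predicable` callback are hypothetical, and nothing like this is implemented by the commit.

```python
MAX_IT_BLOCK_LEN = 4  # maximal IT block length stated in the message


def split_into_it_blocks(bundle, is_predicable):
    """Split a bundle into chunks of at most 4, or return None if it cannot be predicated."""
    if not all(is_predicable(instr) for instr in bundle):
        return None
    return [bundle[i:i + MAX_IT_BLOCK_LEN]
            for i in range(0, len(bundle), MAX_IT_BLOCK_LEN)]


# Hypothetical instruction names; "opaque_op" stands for anything unpredicable.
bundle = ["movw", "movt", "add", "ldr", "str", "sub"]
blocks = split_into_it_blocks(bundle, lambda instr: instr != "opaque_op")
```

A six-instruction bundle yields one block of four and one of two, while a bundle containing any unpredicable instruction is rejected as a whole.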

llvm-svn: 280689

bfa25bd1

Revert "DebugInfo: use strongly typed enum for debug info flags" · 3821b53b
Mehdi Amini authored Sep 06, 2016
```
This reverts commit r280686; bots are broken.

llvm-svn: 280688
```
3821b53b
[LTO] Constify (NFC) · 767e1457
Mehdi Amini authored Sep 06, 2016
```
llvm-svn: 280687
```
767e1457

DebugInfo: use strongly typed enum for debug info flags · 356d6b63

Mehdi Amini authored Sep 06, 2016

Use ADT/BitmaskEnum for DINode::DIFlags for the following purposes:
    * Get rid of unsigned int for flags to avoid problems on platforms with sizeof(int) < 4
    * Flags are now strongly typed
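
As a loose analogy for what "strongly typed" buys here, Python's `enum.Flag` behaves in a similar spirit: combining flags yields the flag type, and mixing a flag with a plain integer is rejected. The class below is a hypothetical stand-in with illustrative member names and values. It is not the C++ BitmaskEnum machinery or the real DINode::DIFlags definition.

```python
import enum


class DIFlags(enum.Flag):
    # Hypothetical subset; names and values are illustrative only.
    PRIVATE = 1
    ARTIFICIAL = 1 << 6


# Combining two flags yields another DIFlags value, not a bare integer.
flags = DIFlags.PRIVATE | DIFlags.ARTIFICIAL
has_artificial = DIFlags.ARTIFICIAL in flags
```

With this definition, an expression such as `flags | 1` raises `TypeError` instead of silently producing an untyped integer.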

Patch by: Victor Leschuk <vleschuk@gmail.com>

Differential Revision: https://reviews.llvm.org/D23766

llvm-svn: 280686

356d6b63

Fix DenseSet::insert_as() for MSVC2015 (NFC) · ac00212f

Mehdi Amini authored Sep 06, 2016

The latest MSVC update apparently resolves the call from the
const ref variant to itself, leading to infinite
recursion. It is not clear to me why the r-value overload is
not selected. `ValueT` is a pointer type, and the functional-style
cast in the call `insert_as(ValueT(V), LookupKey);` should result
in an r-value ref. A bug in MSVC?

Differential Revision: https://reviews.llvm.org/D23956

llvm-svn: 280685

ac00212f

[AVX-512] Fix v8i64 shift by immediate lowering on 32-bit targets. · 62d0a5e7
Craig Topper authored Sep 06, 2016
```
llvm-svn: 280684
```
62d0a5e7

CodeGen: ensure that libcalls are always AAPCS CC · a6519b1d

Saleem Abdulrasool authored Sep 06, 2016

All of the builtins are designed to be invoked with ARM AAPCS CC even on ARM
AAPCS VFP CC hosts.  Tweak the default initialisation to ARM AAPCS CC rather
than C CC for ARM/thumb targets.

The changes to the tests are necessary to ensure that the calling convention for
the lowered library calls is honoured.  Furthermore, these adjustments cause
certain branch invocations to change to branch-and-link since the returned value
needs to be moved across registers (d0 -> r0, r1).

llvm-svn: 280683

a6519b1d

[AVX-512] Teach fastisel load/store handling to use EVEX encoded instructions... · dfc4fc9f

Craig Topper authored Sep 05, 2016

[AVX-512] Teach fastisel load/store handling to use EVEX encoded instructions for 128/256-bit vectors and scalar single/double.

Still need to fix the register classes to allow the extended range of registers.

llvm-svn: 280682

dfc4fc9f

[X86] Update fast-isel store test to have more 256 and 512-bit test cases. Add... · 70e13480

Craig Topper authored Sep 05, 2016

[X86] Update fast-isel store test to have more 256 and 512-bit test cases. Add command lines for AVX and AVX512 feature sets.

llvm-svn: 280681

70e13480

[X86] Update fast-isel vector load test to have more 256 and 512-bit test... · f54ebca2

Craig Topper authored Sep 05, 2016

[X86] Update fast-isel vector load test to have more 256 and 512-bit test cases. Add a command line for SKX features too.

llvm-svn: 280680

f54ebca2

fix FileCheck variables for test added with r280677 · e341c919

Sanjay Patel authored Sep 05, 2016

The script (utils/update_test_checks.py) seems to have problems 
with variable names that start with the same string. 

llvm-svn: 280679

e341c919

[Coroutines] Part12: Handle alloca address-taken · ccabaca2

Gor Nishanov authored Sep 05, 2016

Summary:
Move early uses of spilled variables after CoroBegin.

For example, if a parameter had its address taken, we may end up with code
like:
        define @f(i32 %n) {
          %n.addr = alloca i32
          store %n, %n.addr
          ...
          call @coro.begin

This patch fixes the problem by moving uses of spilled variables after CoroBegin.

Reviewers: majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24234

llvm-svn: 280678

ccabaca2