Commits · 000ce9a6868e36ce5d31217c67d69317f6c4ef9a · Roger Ferrer / llvm-epi

Nov 16, 2016

[LoopVectorize] Fix for non-determinism in codegen · 000ce9a6

Mandeep Singh Grang authored Nov 16, 2016

Summary: This patch fixes issues in codegen uncovered due to https://reviews.llvm.org/D26718

Reviewers: mssimpso

Subscribers: llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D26727

llvm-svn: 287135

000ce9a6

AMDGPU/SI: Avoid creating unnecessary copies in the SIFixSGPRCopies pass · 0d162b1c

Tom Stellard authored Nov 16, 2016

Summary:
1. Don't try to copy values to and from the same register class.
2. Replace copies of registers with immediate values with v_mov/s_mov
   instructions.

The main purpose of this change is to make MachineSink do a better job of
determining when it is beneficial to split a critical edge, since the pass
assumes that copies will become move instructions.

This prevents a regression in uniform-cfg.ll if we enable critical edge
splitting for AMDGPU.

Reviewers: arsenm

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: https://reviews.llvm.org/D23408

llvm-svn: 287131

0d162b1c

[ExecutionEngine] Fix examples build broken in r287126 and other Include What You Use warnings. · caf28033
Eugene Zelenko authored Nov 16, 2016
```
llvm-svn: 287130
```
caf28033
fix comment formatting; NFC · 4ce99d4d
Sanjay Patel authored Nov 16, 2016
```
llvm-svn: 287127
```
4ce99d4d

[ExecutionEngine] Fix some Clang-tidy modernize-use-default,... · cecb0183

Eugene Zelenko authored Nov 16, 2016

[ExecutionEngine] Fix some Clang-tidy modernize-use-default, modernize-use-equals-delete and Include What You Use warnings; other minor fixes.

Differential revision: https://reviews.llvm.org/D26729

llvm-svn: 287126

cecb0183

[x86] add fake scalar FP logic instructions to ReplaceableInstrs to save some bytes · 7f3d51f8

Sanjay Patel authored Nov 16, 2016

We can replace "scalar" FP-bitwise-logic with other forms of bitwise-logic instructions. 
Scalar SSE/AVX FP-logic instructions only exist in your imagination and/or the bowels of 
compilers, but logically equivalent int, float, and double variants of bitwise-logic 
instructions are reality in x86, and the float variant may be a shorter instruction 
depending on which flavor (SSE or AVX) of vector ISA you have...so just prefer float all 
the time.

This is a preliminary step towards solving PR6137:
https://llvm.org/bugs/show_bug.cgi?id=6137

Differential Revision:
https://reviews.llvm.org/D26712

llvm-svn: 287122

7f3d51f8

[Orc] Re-enable the RPC unit test disabled in r286917. · d4758898

Lang Hames authored Nov 16, 2016

This unit test infinite-looped on s390x due to a thread_yield being optimized
out. I've updated the QueueChannel class (where thread_yield was called) to use
a condition variable instead. This should cause the unit test to behave
correctly.
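
As a rough illustration of the approach described above, a queue whose reader blocks on a condition variable instead of spinning with a yield might look like the following. `ByteQueue` and `roundTrip` are hypothetical names for this sketch; this is not the actual QueueChannel code from the unit test.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical byte queue; not the QueueChannel class from the Orc unit test.
class ByteQueue {
public:
  void push(char c) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(c);
    }
    ready_.notify_one(); // Wake a reader blocked in pop().
  }

  char pop() {
    std::unique_lock<std::mutex> lock(mutex_);
    // Sleep until data is available; the predicate guards against
    // spurious wakeups, and no busy-wait or yield is needed.
    ready_.wait(lock, [this] { return !queue_.empty(); });
    char c = queue_.front();
    queue_.pop();
    return c;
  }

private:
  std::mutex mutex_;
  std::condition_variable ready_;
  std::queue<char> queue_;
};

// Sends one byte from a producer thread to the calling thread.
inline char roundTrip(char c) {
  ByteQueue q;
  std::thread producer([&q, c] { q.push(c); });
  char received = q.pop();
  producer.join();
  return received;
}

inline void checkRoundTrip() { assert(roundTrip('x') == 'x'); }
```

Because the reader sleeps inside `wait`, correctness does not depend on the compiler preserving a yield call in a spin loop.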

llvm-svn: 287121

d4758898

[sancov] Name the global containing the main source file name · 3a83e768

Reid Kleckner authored Nov 16, 2016

If the global name doesn't start with __sancov_gen, ASan will insert
unnecessary red zones around it.

llvm-svn: 287117

3a83e768

test commit, changed tab to spaces, NFC · e870398e
Daniil Fukalov authored Nov 16, 2016
```
llvm-svn: 287116
```
e870398e
Add a little endian variant of TCE. · 8483cf0a
Pekka Jaaskelainen authored Nov 16, 2016
```
llvm-svn: 287111
```
8483cf0a
[X86] Add integer division test for PR23590 · 79416ea7
Simon Pilgrim authored Nov 16, 2016
```
Shows missed opportunity to recognise reduced integer division result size

llvm-svn: 287110
```
79416ea7

[X86][AVX512] Autoupgrade lossless i32/u32 to f64 conversion intrinsics with generic IR · b57dd171

Simon Pilgrim authored Nov 16, 2016

Both the (V)CVTDQ2PD (i32 to f64) and (V)CVTUDQ2PD (u32 to f64) conversion instructions are lossless and can be safely represented as generic SINT_TO_FP/UINT_TO_FP calls instead of x86 intrinsics without affecting final codegen.
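
The "lossless" claim can be checked with a small sketch, assuming `double` is an IEEE-754 binary64 type with a 53-bit significand: every 32-bit integer, signed or unsigned, then converts to `double` and back without change. The helper names are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumption of this sketch: double has at least 32 significand bits
// (53 for IEEE-754 binary64), so no 32-bit integer needs rounding.
static_assert(std::numeric_limits<double>::digits >= 32,
              "double cannot represent every 32-bit integer exactly");

// Hypothetical helpers: convert to double and back, then compare.
inline bool roundTripsSigned(int32_t v) {
  return static_cast<int32_t>(static_cast<double>(v)) == v;
}

inline bool roundTripsUnsigned(uint32_t v) {
  return static_cast<uint32_t>(static_cast<double>(v)) == v;
}

inline void checkLosslessConversion() {
  assert(roundTripsSigned(std::numeric_limits<int32_t>::min()));
  assert(roundTripsSigned(std::numeric_limits<int32_t>::max()));
  assert(roundTripsUnsigned(std::numeric_limits<uint32_t>::max()));
}
```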

LLVM counterpart to D26686

Differential Revision: https://reviews.llvm.org/D26736

llvm-svn: 287108

b57dd171

[X86][AVX512] Added some mask/maskz tests for sitofp/uitofp i32 to f64 · 9e355bc5
Simon Pilgrim authored Nov 16, 2016
```
llvm-svn: 287106
```
9e355bc5
[X86] Regenerated integer divide tests to test on 32 and 64 bit targets · c223aa52
Simon Pilgrim authored Nov 16, 2016
```
llvm-svn: 287104
```
c223aa52
[X86][SSE] Added PSUBUS from SELECT tests from D25987 · dd8c71c6
Simon Pilgrim authored Nov 16, 2016
```
llvm-svn: 287103
```
dd8c71c6

[mips] Fix unsigned/signed type error · 8ca1cbcc

Simon Dardis authored Nov 16, 2016

MipsFastISel uses a class to represent addresses with a signed member
to represent the offset. MipsFastISel::emitStore, emitLoad and computeAddress
all treated the offset as being positive. In cases where the offset was
actually negative and a frame pointer was used, this would cause the constant
synthesis routine to crash as it would generate an unexpected instruction
sequence when frame indexes are replaced.
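
The general hazard can be sketched as follows. This does not reproduce the patch; the helper names are hypothetical, and the 16-bit range is used only because MIPS load/store immediates are 16-bit signed values. A small negative offset fits a signed range check but looks enormous once it is treated as unsigned.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration; not code from MipsFastISel.
inline bool fitsSigned16(int64_t offset) {
  return offset >= -32768 && offset <= 32767;
}

// Treats the offset as if it could only be non-negative.
inline bool fitsUnsigned16(int64_t offset) {
  return static_cast<uint64_t>(offset) <= 0xFFFFu;
}

inline void checkOffsetRanges() {
  const int64_t offset = -8; // e.g. a slot below the frame pointer
  assert(fitsSigned16(offset));    // representable as a signed immediate
  assert(!fitsUnsigned16(offset)); // but huge when viewed as unsigned
  assert(static_cast<uint64_t>(offset) == 0xFFFFFFFFFFFFFFF8ull);
}
```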

Reviewers: vkalintiris

Differential Revision: https://reviews.llvm.org/D26192

llvm-svn: 287099

8ca1cbcc

[mips] not instruction alias · 7b7cb8d9

Simon Dardis authored Nov 16, 2016

This patch adds the single operand form of the not alias to microMIPS and
MIPS along with additional tests.

This partially resolves PR/30381.

Thanks to Sean Bruno for reporting the issue!

llvm-svn: 287097

7b7cb8d9

Remove TimeValue class · 0c20e05e

Pavel Labath authored Nov 16, 2016

Summary:
All uses have been replaced by appropriate std::chrono types, and the class is
now unused.

Reviewers: zturner, mehdi_amini

Subscribers: llvm-commits, mgorny

Differential Revision: https://reviews.llvm.org/D26447

llvm-svn: 287094

0c20e05e

[X86][AVX512] Removing llvm x86 intrinsics for _mm_mask_move_{ss|sd} intrinsics. · 4d60243b
Ayman Musa authored Nov 16, 2016
```
Differential Revision: https://reviews.llvm.org/D26128

llvm-svn: 287087
```
4d60243b

[X86] Remove the scalar intrinsics for fadd/fsub/fdiv/fmul · 6910fa0e

Craig Topper authored Nov 16, 2016

Summary: These intrinsics have been unused by clang for a while. This patch removes them. We auto upgrade them to extractelements, a scalar operation and then an insertelement. This matches the sequence used by clang's intrinsic file.
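
The extract / scalar-op / insert sequence can be modelled on a plain four-element array. This is only a sketch of the semantics, not the upgrade code itself; `Vec4` and `addScalarLane` are names invented for the example.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;

// Models a scalar add on lane 0: the upper three lanes come from `a`.
inline Vec4 addScalarLane(Vec4 a, const Vec4 &b) {
  float lhs = a[0];      // extractelement a, 0
  float rhs = b[0];      // extractelement b, 0
  float sum = lhs + rhs; // scalar fadd
  a[0] = sum;            // insertelement a, sum, 0
  return a;
}

inline void checkAddScalarLane() {
  Vec4 r = addScalarLane({1.0f, 2.0f, 3.0f, 4.0f}, {10.0f, 20.0f, 30.0f, 40.0f});
  assert(r[0] == 11.0f);
  assert(r[1] == 2.0f && r[2] == 3.0f && r[3] == 4.0f);
}
```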

Reviewers: zvi, delena, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26660

llvm-svn: 287083

6910fa0e

[ELF] Convert ELF.h to Expected<T>. · 6cf09265

Davide Italiano authored Nov 16, 2016

This has two advantages:
1) We slowly move away from ErrorOr to the new handling interface,
in the hope of having uniform error handling in LLVM, eventually.
2) We're starting to have *meaningful* error messages for invalid
object ELF files, rather than a generic "parse error". At some point
we should also include the offset to improve the quality of the
diagnostic.

llvm-svn: 287081

6cf09265

test: use separate input file for test · d05c5aea

Saleem Abdulrasool authored Nov 16, 2016

Rather than using sed to generate the input and pipe the result to
strings, use the static input instead.

llvm-svn: 287079

d05c5aea

[AMDGPU] Refactor v_mac_{f16, f32} patterns into a class NFC · bf998c70
Konstantin Zhuravlyov authored Nov 16, 2016
```
Differential Revision: https://reviews.llvm.org/D26711

llvm-svn: 287077
```
bf998c70

AArch64: Use DeadRegisterDefinitionsPass before regalloc. · 3d51cf0a

Matthias Braun authored Nov 16, 2016

Doing this before register allocation reduces register pressure as we do
not even have to allocate a register for those dead definitions.

Differential Revision: https://reviews.llvm.org/D26111

llvm-svn: 287076

3d51cf0a

Fix build break when the host C compiler is C89. · 6b335d19
Richard Smith authored Nov 16, 2016
```
llvm-svn: 287075
```
6b335d19

[AMDGPU] Handle f16 select{_cc} · 2a87a420

Konstantin Zhuravlyov authored Nov 16, 2016

- Select `select` to `v_cndmask_b32`
- Expand `select_cc`
- Refactor patterns

Differential Revision: https://reviews.llvm.org/D26714

llvm-svn: 287074

2a87a420

[XRay][docs] Define requirements on installed log handlers. · 6eec7d41

Dean Michael Berris authored Nov 16, 2016

Summary:
We update the documentation to define what the requirements are for the
provided XRay log handler. This is to make it clear that the function
pointer provided must do internal synchronisation and that there are no
guarantees provided by XRay on when the function shall be invoked once
it has been installed as a log handler.

Reviewers: rSerge, rengolin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26651

llvm-svn: 287073

6eec7d41

[RegAllocGreedy] Record missed hint for late recoloring. · fb9b0cdc

Quentin Colombet authored Nov 16, 2016

In https://reviews.llvm.org/D25347, Geoff noticed that we still have
useless copies that we can eliminate after register allocation. At the
time the allocation is chosen for those copies, they are not useless
but, because of changes in the surrounding code, later on they might
become useless.
The Greedy allocator already has a mechanism to deal with such cases
with a late recoloring. However, we failed to record some of the
missed hints.

This commit fixes that.

llvm-svn: 287070

fb9b0cdc

Align Modi and FileInfo substreams on 32-byte offsets. · fb1e6d22

Rui Ueyama authored Nov 16, 2016

This is required by DbiStream, but DbiStreamBuilder didn't align
these substreams, so the output of DbiStreamBuilder couldn't be
read by DbiStream.

Test will be added to LLD.

llvm-svn: 287067

fb1e6d22

Fixed the lost FastMathFlags for CALL operations in SLPVectorizer. · b3dc774a
Vyacheslav Klochkov authored Nov 16, 2016
```
Reviewer: Michael Zolotukhin.
Differential Revision: https://reviews.llvm.org/D26575

llvm-svn: 287064
```
b3dc774a

[BypassSlowDivision] Handle division by constant numerators better. · 28605735

Justin Lebar authored Nov 16, 2016

Summary:
We don't do BypassSlowDivision when the denominator is a constant, but
we do do it when the numerator is a constant.

This patch makes two related changes to BypassSlowDivision when the
numerator is a constant:

 * If the numerator is too large to fit into the bypass width, don't
   bypass slow division (because we'll never run the smaller-width
   code).

 * If we bypass slow division where the numerator is a constant, don't
   OR together the numerator and denominator when determining whether
   both operands fit within the bypass width.  We need to check only the
   denominator.
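
The two rules above can be sketched as a small decision function. This is an illustration under stated assumptions, not the pass's code: `BypassPlan` and `planBypass` are hypothetical names, operands are modelled as 64-bit unsigned values, and the bypass width is assumed to be between 1 and 63 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical outcome of the decision; not a type from LLVM.
enum class BypassPlan { NoBypass, CheckDenominatorOnly, CheckBothOperands };

inline BypassPlan planBypass(bool numeratorIsConstant, uint64_t numerator,
                             unsigned bypassBits) {
  assert(bypassBits >= 1 && bypassBits < 64 && "shift must be well defined");
  // Bits that must be zero for a value to fit in the bypass width.
  const uint64_t highBits = ~((uint64_t{1} << bypassBits) - 1);

  if (!numeratorIsConstant)
    return BypassPlan::CheckBothOperands; // usual case: OR both operands
  if (numerator & highBits)
    return BypassPlan::NoBypass; // the narrow path could never run
  return BypassPlan::CheckDenominatorOnly; // numerator is known to fit
}

inline void checkPlanBypass() {
  assert(planBypass(false, 0, 32) == BypassPlan::CheckBothOperands);
  assert(planBypass(true, uint64_t{1} << 40, 32) == BypassPlan::NoBypass);
  assert(planBypass(true, 100, 32) == BypassPlan::CheckDenominatorOnly);
}
```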

Reviewers: tra

Subscribers: llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D26699

llvm-svn: 287062

28605735

[BypassSlowDivision] Simplify partially-tautological if statement. · 583b8687
Justin Lebar authored Nov 16, 2016
```
if (A || (B && A)) --> if (A).

llvm-svn: 287061
```
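
The simplification is the Boolean absorption law. A brute-force check over all four input combinations, written as a standalone sketch (the function name is invented here and assumes `A` and `B` have no side effects):

```cpp
#include <cassert>
#include <initializer_list>

// Returns true when (a || (b && a)) == a for every combination of a and b.
inline bool absorptionHolds() {
  for (bool a : {false, true})
    for (bool b : {false, true})
      if ((a || (b && a)) != a)
        return false;
  return true;
}

inline void checkAbsorption() { assert(absorptionHolds()); }
```
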
583b8687

Fix Modi and File count if there are more than 65535 modules/files. · 50701318

Rui Ueyama authored Nov 16, 2016

These numbers are intended to be capped at 65535, but
`std::min<uint16_t>(UINT16_MAX, N)` always returns N (truncated to 16 bits)
for any N because the expression is the same as
`std::min((uint16_t)UINT16_MAX, (uint16_t)N)`.
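
A cap is normally written with `std::min`, and the pitfall comes from forcing `uint16_t` as the template argument: both arguments are converted to 16 bits before they are compared, so a large N wraps around instead of being capped. The sketch below contrasts that with comparing in a wider type first; `capNarrow` and `capWide` are hypothetical names, and `size_t` stands in for whatever type the count has.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Pitfall: n is implicitly converted to uint16_t (modulo 65536) before
// the comparison, so the "cap" never takes effect.
inline uint16_t capNarrow(size_t n) {
  return std::min<uint16_t>(UINT16_MAX, n);
}

// Compare in the wide type, then narrow the already-capped result.
inline uint16_t capWide(size_t n) {
  return static_cast<uint16_t>(std::min<size_t>(UINT16_MAX, n));
}

inline void checkCaps() {
  assert(capNarrow(70000) == 4464); // 70000 - 65536, not 65535
  assert(capWide(70000) == 65535);
  assert(capWide(10) == 10);
}
```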

llvm-svn: 287060

50701318

Always use relative jump table encodings on PowerPC64. · 8c1a9ac5

Joerg Sonnenberger authored Nov 16, 2016

For the default, small and medium code model, use the existing
difference from the jump table towards the label. For all other code
models, set up the picbase and use the difference between the picbase and
the block address.

Overall, this results in smaller data tables at the expense of one or
two more arithmetic operations at the jump site. Given that we only create
jump tables with a lot more than two entries, it is a net win in size.
For larger code models the assumption remains that individual functions
are no larger than 2GB.

Differential Revision: https://reviews.llvm.org/D26336

llvm-svn: 287059

8c1a9ac5

AMDGPU/GCN: Exit early in hazard recognizer if there is no vreg argument · e8cc395e

Jan Vesely authored Nov 15, 2016

wbinvl.* are vector instructions that do not use vector registers.

v2: check only M?BUF instructions

Differential Revision: https://reviews.llvm.org/D26633

llvm-svn: 287056

e8cc395e

[x86] regenerate checks; NFC · aaf43045
Sanjay Patel authored Nov 15, 2016
```
llvm-svn: 287051
```
aaf43045

General clean up of Mach-O error handling in llvm-objdump. · 844c4ac5

Kevin Enderby authored Nov 15, 2016

To get a good error message for all files that could contain Mach-O
files the code in llvm-objdump needs to use the archive member name
and name of the architecture of a slice of a universal file in those cases
where the error comes from a Mach-O file in an archive or a universal file.

Most of this is fixed by moving the call to checkSymbolTable() into
ProcessMachO() and calling it when the operation needs the symbol
table. And then calling the form of report_error() that has the
ArchiveName and ArchitectureName arguments. One other place
needed to call this form of report_error() also with these arguments.

Also changed the code in MachODump.cpp to not use report_fatal_error()
and use report_error() instead to make the code smaller and cleaner. All
cases of this are for errors with the symbol table which should now never
be tripped since checkSymbolTable() should be called first to get a good
error message in these cases.

llvm-svn: 287050

844c4ac5

[x86] auto-generate better checks; NFC · 07529a31
Sanjay Patel authored Nov 15, 2016
```
llvm-svn: 287049
```
07529a31

Nov 15, 2016

[x86] auto-generate better checks; NFC · 87cb0745
Sanjay Patel authored Nov 15, 2016
```
llvm-svn: 287048
```
87cb0745

[AddressSanitizer] Add support for (constant-)masked loads and stores. · ec350b71

Filipe Cabecinhas authored Nov 15, 2016

This patch adds support for instrumenting masked loads and stores under
ASan, if they have a constant mask.

isInterestingMemoryAccess now supports returning a mask to be applied to
the loads, and instrumentMop will use it to generate additional checks.

Added tests for v4i32, v8i32, and v4p0i32 (~v4i64) for both loads and
stores (as well as a test to verify we don't add checks to non-constant
masks).

Differential Revision: https://reviews.llvm.org/D26230

llvm-svn: 287047

ec350b71