Commits · 7a462ab7ae7d4fe45eace5942519bf305391e11f · Roger Ferrer / llvm-epi

Mar 08, 2019

[cmake] Remove llvm from LLVM_ALL_PROJECTS · 7a462ab7

Shoaib Meenai authored Mar 08, 2019

LLVM is always built; including it in LLVM_ENABLE_PROJECTS has no
effect, but since it's in LLVM_ALL_PROJECTS, we produce a confusing
message about it being disabled. Drop it from LLVM_ALL_PROJECTS to avoid
this. Pointed out by David Greene on the mailing list [1].

[1] http://lists.llvm.org/pipermail/llvm-dev/2019-March/130854.html

llvm-svn: 355735

7a462ab7

[GN] Merge 355720. · 13661a9c
Mitch Phillips authored Mar 08, 2019
```
llvm-svn: 355734
```
13661a9c

[RegionPass] Fix forgotten "!". · 65c5821e

Michael Kruse authored Mar 08, 2019

Commit r355068 "Fix IR/Analysis layering issue with OptBisect" uses the
template

   return Gate.isEnabled() && !Gate.shouldRunPass(this, getDescription(...));

for all pass kinds. For the RegionPass, it left out the not operator,
causing region passes to be skipped as soon as a pass gate is used.
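
To illustrate why the missing `!` matters, here is a minimal Python sketch of the skip predicate. The function names and boolean parameters are hypothetical stand-ins for `Gate.isEnabled()` and `Gate.shouldRunPass(...)`; this is not LLVM code.

```python
def skip_pass(gate_enabled: bool, should_run: bool) -> bool:
    """Intended template: skip only when a gate is active and vetoes the pass."""
    return gate_enabled and not should_run


def skip_pass_buggy(gate_enabled: bool, should_run: bool) -> bool:
    """The same predicate with the negation left out."""
    return gate_enabled and should_run


# Print the full truth table: inputs, intended result, buggy result.
for gate_enabled in (False, True):
    for should_run in (False, True):
        print(gate_enabled, should_run,
              skip_pass(gate_enabled, should_run),
              skip_pass_buggy(gate_enabled, should_run))
```

In this model, the buggy form skips a pass exactly when an active gate says it should run, which matches the behavior described above.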

llvm-svn: 355733

65c5821e

AMDGPU: Move d16 load matching to preprocess step · e8c03a25

Matt Arsenault authored Mar 08, 2019

When matching half of the build_vector to a load, there could still be
a hidden dependency on the other half of the build_vector the pattern
wouldn't detect. If there was an additional chain dependency on the
other value, a cycle could be introduced.

I don't think a tablegen pattern is capable of matching the necessary
conditions, so move this into PreprocessISelDAG. Check isPredecessorOf
for the other value to avoid a cycle. This has a warning that it's
expensive, so this should probably be moved into an MI pass eventually
that will have more freedom to reorder instructions to help match
this. That is currently complicated by the lack of a computeKnownBits
type mechanism for the selected function.

llvm-svn: 355731

e8c03a25

DAG: Don't try to cluster loads with tied inputs · 26e76ef0

Matt Arsenault authored Mar 08, 2019

This avoids breaking possible value dependencies when sorting loads by
offset.

AMDGPU has some load instructions that write into the high or low bits
of the destination register, and have a tied input for the other input
bits. These can easily have the same base pointer, but be a swizzle so
the high address load needs to come first. This was inserting glue
forcing the opposite ordering, producing a cycle the InstrEmitter
would assert on. It may be potentially expensive to look for the
dependency between the other loads, so just skip any where this could
happen.

Fixes bug 40936 by reverting r351379, which added a hacky attempt to
fix this by adding chains in this case, which I think was just working
around broken glue before the InstrEmitter. The core of the patch is
re-implementing the fix for that problem.

llvm-svn: 355728

26e76ef0

[x86] add tests for extracted vector FP cmp; NFC · 43f098e7
Sanjay Patel authored Mar 08, 2019
```
llvm-svn: 355727
```
43f098e7
Revert "[runtimes] Move libunwind, libc++abi and libc++ to lib/ and include/" · 1262e52e
Matthew Voss authored Mar 08, 2019
```
This broke the windows bots.

This reverts commit 28302c66.

llvm-svn: 355725
```
1262e52e
AMDGPU: Add more tests for d16 loads · 74c9c305
Matt Arsenault authored Mar 08, 2019
```
Also fix a few cases that weren't testing what they were supposed to.

llvm-svn: 355724
```
74c9c305

AMDGPU: Don't bother checking the chain in areLoadsFromSameBasePtr · f587fd9c

Matt Arsenault authored Mar 08, 2019

This is only called in contexts that are verifying the chain itself,
and the query itself is only asking about the address.

llvm-svn: 355723

f587fd9c

AMDGPU: Correct DS implementation of areLoadsFromSameBasePtr · 07f904be

Matt Arsenault authored Mar 08, 2019

This was checking the wrong operands for the base register and the
offsets. The indexes are shifted by the number of output registers
from the machine instruction definition, and the chain is moved to the
end.

llvm-svn: 355722

07f904be

[DEBUG_INFO][NVPTX]Emit empty .debug_loc section in presence of the debug option. · 78fcb838

Alexey Bataev authored Mar 08, 2019

Summary:
If the LLVM module shows that it has debug info, but the file is
actually empty and the real debug info is not emitted, the ptxas tool
emits error 'Debug information not found in presence of .target debug'.
We need at least one empty debug section to silence this message. Section
`.debug_loc` is not emitted for PTX, so we can emit an empty `.debug_loc`
section if the `debug` option was emitted.

Reviewers: tra

Subscribers: jholewinski, aprantl, llvm-commits

Differential Revision: https://reviews.llvm.org/D57250

llvm-svn: 355719

78fcb838

[DAGCombiner] fold (add (add (xor a, -1), b), 1) -> (sub b, a) · 782ac933

Amaury Sechet authored Mar 08, 2019

Summary: This pattern is sometimes created after legalization.
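
The fold rests on the two's-complement identity `~a == -a - 1`, so `(~a + b) + 1 == b - a`. The following Python sketch checks this on wraparound integers; the 32-bit width and the helper names are assumptions chosen for illustration only.

```python
import random

MASK32 = 0xFFFFFFFF  # assume 32-bit wraparound integers for illustration


def before_fold(a: int, b: int) -> int:
    """(add (add (xor a, -1), b), 1) evaluated modulo 2**32."""
    not_a = (a ^ MASK32) & MASK32  # xor with all-ones (-1) is bitwise NOT
    return (((not_a + b) & MASK32) + 1) & MASK32


def after_fold(a: int, b: int) -> int:
    """(sub b, a) evaluated modulo 2**32."""
    return (b - a) & MASK32


# Compare both forms on random inputs plus a few edge cases.
rng = random.Random(0)
samples = [(rng.getrandbits(32), rng.getrandbits(32)) for _ in range(1000)]
samples += [(0, 0), (MASK32, 0), (0, MASK32), (MASK32, MASK32)]
all_equal = all(before_fold(a, b) == after_fold(a, b) for a, b in samples)
print(all_equal)
```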

Reviewers: efriedma, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58874

llvm-svn: 355716

782ac933

[CFLAnders] Fix typo in comment; NFC · 4ea679f1

George Burgess IV authored Mar 08, 2019

Patch by Enna1!

Differential Revision: https://reviews.llvm.org/D58756

llvm-svn: 355715

4ea679f1

[RegisterCoalescer] Limit the number of joins for large live interval with · 72ec6801

Wei Mi authored Mar 08, 2019

many valnos.

Recently we found compile timeouts in several cases when
SpeculativeLoadHardening was enabled. Most of the compile time was spent
in the register coalescing pass, where the register coalescer tried to join
many other live intervals with some very large live intervals with many valnos.

Specifically, every time JoinVals::mapValues is called, computeAssignment is
called getNumValNums() times for the target live interval. If the large
live interval has N valnos and N copies associated with it, trying to
coalesce those copies costs at least N^2.

The patch limits the effort spent trying to join those very large live
intervals with others. By default, once a live interval with > 100 valnos
has been coalesced with other live intervals more than 100 times,
we stop coalescing for that live interval. That puts a compile
time cap on the N^2 algorithm and effectively solves the compile time
problem we saw.
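
The limit can be pictured with a small Python sketch. The class, function, and constant names are hypothetical and do not mirror the RegisterCoalescer source. Only the two thresholds of 100 come from the description above, and the exact boundary handling is an assumption.

```python
LARGE_VALNO_THRESHOLD = 100  # "> 100 valnos" from the commit message
MAX_JOINS_FOR_LARGE = 100    # "more than 100 times" from the commit message


class LiveIntervalStub:
    """Hypothetical stand-in for a live interval; it only tracks counts."""

    def __init__(self, num_valnos: int) -> None:
        self.num_valnos = num_valnos
        self.join_count = 0


def should_try_join(interval: LiveIntervalStub) -> bool:
    """Refuse further joins once a large interval has been joined too often."""
    if interval.num_valnos <= LARGE_VALNO_THRESHOLD:
        return True
    return interval.join_count <= MAX_JOINS_FOR_LARGE


def coalesce_copies(interval: LiveIntervalStub, num_copies: int) -> int:
    """Attempt num_copies joins; return how many were actually performed."""
    performed = 0
    for _ in range(num_copies):
        if not should_try_join(interval):
            break
        interval.join_count += 1
        performed += 1
    return performed
```

Since each join on an interval with N valnos costs on the order of N, capping the number of joins bounds the total work for that interval at a constant multiple of N instead of N^2.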

Differential revision: https://reviews.llvm.org/D59143

llvm-svn: 355714

72ec6801

[x86] prevent infinite looping from inverse shuffle transforms · b22f438d
Sanjay Patel authored Mar 08, 2019
```
llvm-svn: 355713
```
b22f438d
[X86] Add test case for PR22473 · 53652fea
Simon Pilgrim authored Mar 08, 2019
```
llvm-svn: 355712
```
53652fea

[ARM][FIX] Fix vfmal.f16 and vfmsl.f16 operand · c20c37ba

Diogo N. Sampaio authored Mar 08, 2019

The indexed variants of the vfmal.f16 and vfmsl.f16
instructions use the upper bits of the indexed
operand to store the index (1 bit for the double
variant, 2 bits for the quad).

This limits the usable registers to d0 - d7 or
s0 - s15. This patch enforces this limitation.

Differential Revision: https://reviews.llvm.org/D59021

llvm-svn: 355707

c20c37ba

Fix typo in constant vector · 00ab0339
Simon Pilgrim authored Mar 08, 2019
```
llvm-svn: 355699
```
00ab0339

[llvm-readelf]Don't lose negative-ness of negative addends for no symbol relocations · b41130be

James Henderson authored Mar 08, 2019

llvm-readelf prints relocation addends as:

  <symbol value>[+-]<absolute addend>

where [+-] is determined from whether addend is less than zero or not.
However, it does not print the +/- if there is no symbol, which meant
that negative addends became their positive value with no indication
that this had happened. This patch stops the absolute conversion when
addends are negative and there is no associated symbol.
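
A rough Python sketch of the before and after behavior follows. The function names and the exact spacing and number formatting are assumptions for illustration; the real llvm-readelf output may differ.

```python
def format_addend_old(symbol: str, addend: int) -> str:
    """Pre-patch behavior as described: the sign is lost without a symbol."""
    if symbol:
        sign = "-" if addend < 0 else "+"
        return f"{symbol} {sign} {abs(addend):x}"
    return f"{abs(addend):x}"


def format_addend_new(symbol: str, addend: int) -> str:
    """Post-patch behavior: skip the absolute conversion when no symbol."""
    if symbol:
        sign = "-" if addend < 0 else "+"
        return f"{symbol} {sign} {abs(addend):x}"
    return f"{addend:x}"  # Python renders negative values with a leading "-"


print(format_addend_old("", -4), format_addend_new("", -4))
```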

Reviewed by: Higuoxing, mattd, MaskRay

Differential Revision: https://reviews.llvm.org/D59095

llvm-svn: 355696

b41130be

gn build: Merge r355685 · 6bce2f8e
Nico Weber authored Mar 08, 2019
```
llvm-svn: 355695
```
6bce2f8e

gn build: Unbreak finding a working `gn` on $PATH on Unix after r355645 · c3130a8a

Nico Weber authored Mar 08, 2019

From the Python subprocess docs:

   If shell is True, it is recommended to pass args as a string rather than as
   a sequence.

   [...]

   If args is a sequence, the first item specifies the command string, and any
   additional items will be treated as additional arguments to the shell itself.

Prior to this change, the `--version` would be passed to the shell, not to
a potential gn binary on $PATH, and running `gn` without any arguments makes
it exit with an exit code != 0, so the script would think that there wasn't
a working gn binary on $PATH.

Fix this by following the documentation's recommendation of using a string
now that we pass shell=True. I tested this on macOS and Windows, each with
the three cases of

- no gn on PATH (should run gn downloaded by get.py if present,
  else suggest running get.py)
- broken gn wrapper on PATH (should behave like the previous item)
- working gn on PATH (should use gn on PATH)
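
As a hedged sketch of the pattern (the helper name `command_succeeds` is hypothetical and not taken from gn.py), passing the whole command as one string with `shell=True` looks like this:

```python
import os
import subprocess


def command_succeeds(command: str) -> bool:
    """Run `command` through the system shell; True if it exits with 0."""
    with open(os.devnull, "w") as devnull:
        return subprocess.call(command, shell=True,
                               stdout=devnull, stderr=subprocess.STDOUT) == 0


# Pass one string. With shell=True on POSIX, a list such as
# ["gn", "--version"] would run plain `gn`, and "--version" would be
# handed to the shell itself rather than to gn.
has_gn_on_path = command_succeeds("gn --version")
print("gn on PATH:", has_gn_on_path)
```

A missing binary makes the shell return a nonzero exit code instead of raising an exception, which is what lets the caller fall back to the downloaded gn.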

llvm-svn: 355694

c3130a8a

gn build: Unbreak get.py and gn.py on Windows · 38e6bcc1

Nico Weber authored Mar 08, 2019

`os.uname()` doesn't exist on Windows, so use `platform.machine()` which
returns `os.uname()[4]` on non-Win and (on 64-bit systems) "AMD64" on Windows.
Also use `sys.platform` instead of `platform` to check for Windows-ness for the
file extension in gn.py (get.py got this right).
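
A minimal sketch of the two portable checks, assuming hypothetical helper names and assuming the Windows test is a comparison against `"win32"` (the actual scripts may phrase it differently):

```python
import platform
import sys


def host_cpu() -> str:
    """Machine type string; available on Windows, unlike os.uname()."""
    return platform.machine()


def exe_suffix(sys_platform: str = sys.platform) -> str:
    """File extension for executables for the given sys.platform value."""
    return ".exe" if sys_platform == "win32" else ""


print(host_cpu(), repr(exe_suffix()))
```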

Differential Revision: https://reviews.llvm.org/D59115

llvm-svn: 355693

38e6bcc1

[DAGCombine] Merge visitSMULO+visitUMULO into visitMULO. NFCI. · 04e8439f
Simon Pilgrim authored Mar 08, 2019
```
llvm-svn: 355690
```
04e8439f
[DAGCombine] Merge visitSADDO+visitUADDO into visitADDO. NFCI. · c71d6d15
Simon Pilgrim authored Mar 08, 2019
```
llvm-svn: 355689
```
c71d6d15
[DAGCombine] Merge visitSSUBO+visitUSUBO into visitSUBO. NFCI. · 2c2e76a9
Simon Pilgrim authored Mar 08, 2019
```
llvm-svn: 355688
```
2c2e76a9

[IR][ARM] Add function pointer alignment to datalayout · 308e82ec

Michael Platings authored Mar 08, 2019

Use this feature to fix a bug on ARM where 4 byte alignment is
incorrectly assumed.

Differential Revision: https://reviews.llvm.org/D57335

llvm-svn: 355685

308e82ec

[SelectionDAG] Allow the user to specify a memeq function. · 8e16d733

Clement Courbet authored Mar 08, 2019

Summary:
Right now, when we encounter a string equality check,
e.g. `if (memcmp(a, b, s) == 0)`, we try to expand to a comparison if `s` is a
small compile-time constant, and fall back on calling `memcmp()` otherwise.

This is sub-optimal because memcmp has to compute much more than
equality.

This patch replaces `memcmp(a, b, s) == 0` by `bcmp(a, b, s) == 0` on platforms
that support `bcmp`.

`bcmp` can be made much more efficient than `memcmp` because equality
compare is trivially parallel while lexicographic ordering has a chain
dependency.
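
The difference between the two contracts can be sketched in Python. These are illustrative models with hypothetical names, assuming equal-length buffers; they are not the libc implementations.

```python
def memcmp_like(a: bytes, b: bytes) -> int:
    """Three-way compare: the result depends on the first differing byte."""
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    return 0


def bcmp_like(a: bytes, b: bytes) -> int:
    """Equality only: 0 if all bytes match, nonzero otherwise."""
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y  # accumulation order is irrelevant, so chunks can run in parallel
    return diff


print(memcmp_like(b"abc", b"abd"), bcmp_like(b"abc", b"abd") != 0)
```

Both agree on whether the buffers are equal, which is all that an `== 0` check needs.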

Subscribers: fedor.sergeev, jyknight, ckennelly, gchatelet, llvm-commits

Differential Revision: https://reviews.llvm.org/D56593

llvm-svn: 355672

8e16d733

[AMDGPU] V_CVT_F32_UBYTE{0,1,2,3} are full rate instructions · 1a98dc18

Carl Ritson authored Mar 08, 2019

Summary: Fix a bug in the scheduling model where V_CVT_F32_UBYTE{0,1,2,3} are incorrectly marked as quarter rate instructions.

Reviewers: arsenm, rampitec

Reviewed By: rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59091

llvm-svn: 355671

1a98dc18

[X86] Improve the type checking in isLegalMaskedLoad and isLegalMaskedGather. · 4505c99e

Craig Topper authored Mar 08, 2019

We were just checking pointer size and type primitive size. But this caused unintended things like vectors of half being accepted by masked load/store.

For FP we now explicitly check for only double and float.

For pointers we now let any pointer through, trusting that only 32 and 64 would be used to generate assembly.

We only check bitwidth after checking that the type is an integer.

llvm-svn: 355667

4505c99e

[runtimes] Move libunwind, libc++abi and libc++ to lib/ and include/ · 28302c66

Petr Hosek authored Mar 08, 2019

This change is a consequence of the discussion in "RFC: Place libs in
Clang-dedicated directories", specifically the suggestion that
libunwind, libc++abi and libc++ shouldn't be using Clang resource
directory.  Tools like clangd make this assumption, but this is
currently not true for the LLVM_ENABLE_PER_TARGET_RUNTIME_DIR build.
This change addresses that by moving the output of these libraries to
lib/<target> and include/ directories, leaving resource directory only
for compiler-rt runtimes and Clang builtin headers.

Differential Revision: https://reviews.llvm.org/D59013

llvm-svn: 355665

28302c66

[Bitcode] Fix bitcode compatibility issue with clang.arc.use intrinsic · ed982292

Steven Wu authored Mar 08, 2019

Summary:
In r349534, the ObjC ARC implementation was switched to use intrinsics and,
at the same time, clang.arc.use was renamed to llvm.objc.clang.arc.use to
make the naming more consistent. As a side effect, LLVM no longer
recognizes the old name as an intrinsic and generates external references
to it instead.

Rather than upgrade the old intrinsics name to the new one and wait for
the arc-contract pass to remove it, simply remove it in the bitcode
upgrader.

rdar://problem/48607063

Reviewers: pete, ahatanak, erik.pilkington, dexonsmith

Reviewed By: pete, dexonsmith

Subscribers: jkorous, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59112

llvm-svn: 355663

ed982292

[x86] add extract FP tests for target-specific nodes; NFC · 5ed14ef1
Sanjay Patel authored Mar 07, 2019
```
llvm-svn: 355655
```
5ed14ef1

Temporarily disable debug output in GenericDomTreeConstruction.h · de04a8c1

Adrian Prantl authored Mar 07, 2019

to get the modules bots running again.

The LLVM_DEBUG macro only plays well with a modular build of LLVM when
the header is marked as textual, but doing so causes redefinition
errors.

llvm-svn: 355653

de04a8c1

Make GenericDomTreeConstruction textual instead. · 1d1ff88b
Adrian Prantl authored Mar 07, 2019
```
I think the problem is that it uses the LLVM_DEBUG macro in function bodies.

llvm-svn: 355652
```
1d1ff88b

Mar 07, 2019

Work around a module build error on the LLDB incremental green dragon bot. · d61c80b8
Adrian Prantl authored Mar 07, 2019
```
llvm-svn: 355646
```
d61c80b8

[GN] Locate prebuilt binaries correctly. · c90886b9

Mitch Phillips authored Mar 07, 2019

Use the system shell to see if we can find a 'gn' binary on $PATH. This solves the error wherein subprocess.call fails ungracefully if the binary doesn't exist.

llvm-svn: 355645

c90886b9

Add secondary libstdc++ 4.8 and 5.1 detection mechanisms · 51dcfdbb

Hubert Tong authored Mar 07, 2019

Summary:
The date-based approach to detecting unsupported versions of libstdc++
does not handle bug fix releases of older versions. As an example, the
`__GLIBCXX__` value associated with version 5.1, `20150422`, is less
than the values associated with versions 4.8.5 and 4.9.3.

This patch adds secondary checks based on certain properties in
sufficiently new versions of libstdc++.
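
The ordering problem can be shown with a few integers. In this Python sketch, `20150422` for 5.1 comes from the text above. The values for 4.8.5 and 4.9.3 are assumptions recalled from the libstdc++ release list and should be verified before being relied on. The function name is hypothetical.

```python
GLIBCXX_5_1 = 20150422    # from the commit message
GLIBCXX_4_8_5 = 20150623  # assumed value; check the libstdc++ documentation
GLIBCXX_4_9_3 = 20150626  # assumed value; check the libstdc++ documentation


def looks_at_least_5_1_by_date(glibcxx: int) -> bool:
    """Date-only check: treats any later release date as a newer version."""
    return glibcxx >= GLIBCXX_5_1


# Later bug-fix releases of older branches pass the date-only check.
print(looks_at_least_5_1_by_date(GLIBCXX_4_8_5))
print(looks_at_least_5_1_by_date(GLIBCXX_4_9_3))
```

This is why a date comparison alone cannot reject those releases, and why a check on library properties is needed as a second line of defense.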

Reviewers: jfb, tstellar, rnk, sfertile, nemanjai

Reviewed By: jfb

Subscribers: mgorny, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58682

llvm-svn: 355638

51dcfdbb

[X86] Correct scheduler information for rotate by constant for Haswell, Broadwell, and Skylake. · d0c2dba6

Craig Topper authored Mar 07, 2019

Rotate with explicit immediate is a single uop from Haswell on. An immediate of 1 has a dependency on the previous writer of flags, but the other immediate values do not.

The implicit rotate by 1 instruction is 2 uops. But the flags are merged after the rotate uop so the data result does not see the flag dependency. But I don't think we have any way of modeling that.

RORX is 1 uop without the load. 2 uops with the load. We currently model these with WriteShift/WriteShiftLd.

Differential Revision: https://reviews.llvm.org/D59077

llvm-svn: 355636

d0c2dba6

[X86] Model ADC/SBB with immediate 0 more accurately in the Haswell scheduler model · b3af5d3e

Craig Topper authored Mar 07, 2019

Haswell and possibly Sandybridge have an optimization for ADC/SBB with immediate 0 to use a single uop flow. This only applies to GR16/GR32/GR64 with an 8-bit immediate. It does not apply to GR8. It also does not apply to the implicit AX/EAX/RAX forms.

Differential Revision: https://reviews.llvm.org/D59058

llvm-svn: 355635

b3af5d3e

[CodeGen] Reuse BlockUtils for -unreachableblockelim pass (NFC) · 4e467043

Brian Gesiak authored Mar 07, 2019

Summary:
The logic in the -unreachableblockelim pass does the following:

1. It traverses the function it's given in depth-first order and
   creates a set of basic blocks that are unreachable from the
   function's entry node.
2. It iterates over each of those unreachable blocks and (1) removes any
   successors' references to the dead block, and (2) replaces any uses of
   instructions from the dead block with null.

The logic in (2) above is identical to what the `llvm::DeleteDeadBlocks`
function from `BasicBlockUtils.h` does. The only difference is that
`llvm::DeleteDeadBlocks` replaces uses of instructions from dead blocks
not with null, but with undef.

Replace the duplicate logic in the -unreachableblockelim pass with a
call to `llvm::DeleteDeadBlocks`. This results in less code but no
functional change (NFC).
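
The two steps described above can be sketched in Python on a toy control-flow graph. The successor-map representation and the function names are assumptions for illustration; the sketch does not model instruction uses or the undef replacement.

```python
def unreachable_blocks(cfg: dict, entry: str) -> set:
    """Step 1: blocks not reached by a depth-first walk from the entry."""
    reachable = set()
    stack = [entry]
    while stack:
        block = stack.pop()
        if block in reachable:
            continue
        reachable.add(block)
        stack.extend(cfg.get(block, ()))
    return set(cfg) - reachable


def delete_dead_blocks(cfg: dict, dead: set) -> dict:
    """Step 2 (simplified): drop dead blocks and any edges that name them."""
    return {block: [s for s in succs if s not in dead]
            for block, succs in cfg.items() if block not in dead}


cfg = {
    "entry": ["a"],
    "a": ["exit"],
    "dead1": ["a", "dead2"],
    "dead2": ["dead1"],
    "exit": [],
}
dead = unreachable_blocks(cfg, "entry")
print(sorted(dead), delete_dead_blocks(cfg, dead))
```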

Reviewers: mkazantsev, wmi, davidxl, silvas, davide

Reviewed By: davide

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59064

llvm-svn: 355634

4e467043