Commits · 9af03864df746aa9a9cf3573da952ce6c5d902cd · Lorenzo Albano / LLVM bpEVL

Jan 17, 2021

mydeveloperday authored Jan 17, 2021

Reverting {D92753} due to issues with #pragma indentation in #ifdef/endif structure

9af03864

Reapply [BasicAA] Handle recursive queries more efficiently · 0b84afa5

Nikita Popov authored Nov 10, 2020

There are no changes relative to the original commit. However, an issue
this exposed in BasicAA assumption tracking has been fixed in the
previous commit.

-----

An alias query currently works out roughly like this:

 * Look up location pair in cache.
 * Perform BasicAA logic (including cache lookup and insertion...)
 * Perform a recursive query using BestAAResults.
   * Look up location pair in cache (and thus do not recurse into BasicAA)
   * Query all the other AA providers.
 * Query all the other AA providers.

This is a lot of unnecessary work, all ultimately caused by the
BestAAResults query at the end of aliasCheck(). The reason we perform
it is that aliasCheck() is called recursively, and we of
course want those recursive queries to also make use of other AA
providers, not just BasicAA. We can solve this by making the recursive
queries directly use BestAAResults (which will check both BasicAA
and other providers), rather than recursing into aliasCheck().

There are some tradeoffs:

 * We can no longer pass through the precomputed underlying object
   to aliasCheck(). This is not a major concern, because nowadays
   getUnderlyingObject() is quite cheap.
 * Results from other AA providers are no longer cached inside
   BasicAA. The way this worked was already a bit iffy, in that a
   result could be cached, but if it was MayAlias, we'd still end
   up re-querying other providers anyway. If we want to cache
   non-BasicAA results, we should do that in a more principled manner.

In any case, despite those tradeoffs, this works out to be a decent
compile-time improvement. I think it also simplifies the mental model
of how BasicAA works. It took me quite a while to fully understand
how these things interact.

Differential Revision: https://reviews.llvm.org/D90094

0b84afa5

[BasicAA] Move assumption tracking into AAQI · b1c2f128

Nikita Popov authored Jan 16, 2021

D91936 placed the tracking for the assumptions into BasicAA.
However, when recursing over phis, we may use fresh AAQI instances.
In this case AssumptionBasedResults from an inner AAQI can result
in the removal of an element from the outer AAQI.

To avoid this, move the tracking into AAQI. This generally makes
more sense, as the NoAlias assumptions themselves are also stored
in AAQI.

The test case only produces an assertion failure with D90094
reapplied. I think the issue exists independently of that change
as well, but I wasn't able to come up with a reproducer.

b1c2f128

[ELF] Support R_PPC_ADDR24 (ba foo; bla foo) · 3809f4eb
Fangrui Song authored Jan 17, 2021

3809f4eb

[VE] Support VE in libunwind · 3cbd476c

Kazushi (Jam) Marukawa authored Dec 26, 2020

Modify libunwind to support SjLj exception handling routines for VE.
In order to do that, we need to implement not only SjLj exception
handling routines but also a Registers_ve class.  This implementation
of Registers_ve is incomplete.  We will work on it later when we need
backtrace in libunwind.

Reviewed By: #libunwind, compnerd

Differential Revision: https://reviews.llvm.org/D94591

3cbd476c

[RISCV] Remove an extra map lookup from RISCVCompressInstEmitter. NFC · 061f681c

Craig Topper authored Jan 16, 2021

When we looked up the map to see if the entry already existed,
this created the new entry for us. So save a reference to it so
we can use it to update the entry instead of looking it up again.

Also remove unnecessary StringRef constructors around string
literals on calls to this function.
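The single-lookup pattern described above can be sketched with a standard container. This is only an illustration using `std::map`, not the emitter's actual code; the helper `getOrAssignIndex` and its parameters are hypothetical names.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: returns a 1-based index for Key, assigning a new one
// the first time Key is seen. A stored value of 0 means "not assigned yet".
unsigned getOrAssignIndex(std::map<std::string, unsigned> &Indices,
                          std::vector<std::string> &Keys,
                          const std::string &Key) {
  // operator[] value-initializes a missing entry to 0, so this one lookup
  // both tests for presence and creates the slot.
  unsigned &Entry = Indices[Key];
  if (Entry)
    return Entry;

  Keys.push_back(Key);
  // Write through the saved reference instead of calling Indices[Key] again.
  Entry = static_cast<unsigned>(Keys.size());
  return Entry;
}
```

Holding the reference is safe here because nothing else is inserted into the map between the lookup and the update. With containers that may reallocate on insertion, such as `llvm::DenseMap`, that condition matters.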

061f681c

[RISCV] Few more minor cleanups to RISCVCompressInstEmitter. NFC · 1327c730

Craig Topper authored Jan 16, 2021

-Use StringRef instead of std::string.
-Const correct a parameter.
-Don't call StringRef::data() before printing. Just pass the StringRef.

1327c730

[RISCV] Simplify mergeCondAndCode in RISCVCompressInstEmitter.cpp. NFC · 2b6a9262

Craig Topper authored Jan 16, 2021

Instead of forming a std::string and returning it to pass into another
raw_ostream, just pass the raw_ostream as a parameter.

Take StringRef as arguments instead of raw_string_ostream references,
making the caller responsible for converting to strings. Use
StringRef operations instead of std::string::substr.

2b6a9262

[RISCV] Replace dyn_casts that are only checked by an assert with a cast. NFC · 97f7e4e8
Craig Topper authored Jan 16, 2021

97f7e4e8

[RISCV] Remove unneeded StringRef to std::string conversions in RISCVCompressInstEmitter. NFC · 633c5afc

Craig Topper authored Jan 16, 2021

Stop concatenating std::string before streaming into a raw_ostream.
Just stream the pieces.

Remove some new lines from asserts. Remove std::string concatenation
from an assert. assert strings aren't really evaluated like this at
runtime. An assertion failure will just print exactly what's between
the parentheses in the source.
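A short illustration of the assert idiom the message refers to. The function `checkedDivide` is a hypothetical example, not code from the emitter.

```cpp
#include <cassert>

// Hypothetical function used only to illustrate the idiom.
int checkedDivide(int Num, int Den) {
  // A string literal is never null, so `&& "..."` does not change the
  // condition. It shows up in the diagnostic only because assert prints the
  // source text of its argument when the check fails, which is why building
  // the message at runtime (for example by concatenating strings) gains
  // nothing.
  assert(Den != 0 && "Denominator must be non-zero");
  return Num / Den;
}
```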

633c5afc

[X86] Default to -x86-pad-for-align=false to drop assembler difference with or w/o -g · a048ce13

Fangrui Song authored Jan 16, 2021

Fix PR48742: the D75203 assembler optimization locates MCRelaxableFragment's
within two MCSymbol's and relaxes some MCRelaxableFragment's to reduce the size
of a MCAlignFragment.  A -g build has more MCSymbol's and therefore may have
different assembler output (e.g. a MCRelaxableFragment (jmp) may have 5 bytes
with -O1 while 2 bytes with -O1 -g).

`.p2align 4, 0x90` is common due to loops. For a larger program with a
lot of temporary labels, a difference in assembly output is more or less
inevitable. The cost seems to outweigh the benefits, so we default to
-x86-pad-for-align=false until the heuristic is improved.

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D94542

a048ce13

Jan 16, 2021

[InstCombine] Replace one-use select operand based on condition · 5238e7b3

Nikita Popov authored Jan 16, 2021

InstCombine already performs a fold where X == Y ? f(X) : Z is
transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However,
if f(X) only has one use, then we can always directly replace the
use inside the instruction. To actually be profitable, limit it to
the case where Y is a non-expr constant.

This could be further extended to replace uses further up a one-use
instruction chain, but for now this only looks one level up.

Among other things, this also subsumes D94860.

Differential Revision: https://reviews.llvm.org/D94862
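The equivalence behind the fold can be checked on plain integers. This sketch is an assumption-laden model: `f` is an arbitrary hypothetical function, and ordinary 8-bit arithmetic does not capture LLVM IR details such as undef or poison values.

```cpp
#include <cstdint>

// Hypothetical f: any pure function of its operand works for this argument.
static uint8_t f(uint8_t V) { return static_cast<uint8_t>(V * 3 + 1); }

// Original form: X == Y ? f(X) : Z
uint8_t selectBefore(uint8_t X, uint8_t Y, uint8_t Z) {
  return X == Y ? f(X) : Z;
}

// Rewritten form: X == Y ? f(Y) : Z
uint8_t selectAfter(uint8_t X, uint8_t Y, uint8_t Z) {
  return X == Y ? f(Y) : Z;
}

// Exhaustively compare both forms for a fixed constant Y.
bool formsAgree(uint8_t Y) {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned Z = 0; Z < 256; ++Z)
      if (selectBefore(static_cast<uint8_t>(X), Y, static_cast<uint8_t>(Z)) !=
          selectAfter(static_cast<uint8_t>(X), Y, static_cast<uint8_t>(Z)))
        return false;
  return true;
}
```

The true arm is only reached when X equals Y, so substituting Y for X there cannot change the result.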

5238e7b3

[SimplifyCFG] markAliveBlocks(): catchswitch: preserve PostDomTree · 32fc3231

Roman Lebedev authored Jan 16, 2021

When removing catchpad's from catchswitch, if that removes a successor,
we need to record that in DomTreeUpdater.

This fixes PostDomTree preservation failure in an existing test.
This appears to be the single issue that I see in my current test coverage.

32fc3231

[ARM] Align blocks that are not fallthrough targets · 14547242

David Green authored Jan 16, 2021

If the previous block in a function does not fall through, the nops added
to align the next block will never be executed. This means we can freely
(except for code size) align more such blocks. This happens in the
constant islands pass (as it cannot happen later) and only at aggressive
optimization levels, as it does increase code size.

Differential Revision: https://reviews.llvm.org/D94394

14547242

[ARM] Test for aligned blocks. NFC · 2a5b576e
David Green authored Jan 16, 2021

2a5b576e

[NFC] Removed extra text in comments · bfd75bdf
Dávid Bolvanský authored Jan 16, 2021

bfd75bdf

[mlir][sparse] improved sparse runtime support library · d8fc2730

Aart Bik authored Jan 15, 2021

Added the ability to read (an extended version of) the FROSTT
file format, so that we can now read in sparse tensors of arbitrary
rank. Generalized the API to deal with more than two dimensions.

Also added the ability to sort the indices of sparse tensors
lexicographically. This is an important step towards supporting
auto gen of initialization code, since sparse storage formats
are easier to initialize if the indices are sorted. Since most
external formats don't enforce such properties, it is convenient
to have this ability in our runtime support library.

Lastly, the re-entrant problem of the original implementation
is fixed by passing an opaque object around (rather than having
a single static variable, ugh!).

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D94852
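Lexicographic sorting of sparse tensor indices can be sketched as follows. The `Element` struct and `sortLexicographically` are hypothetical names for a coordinate-style representation, not the runtime library's actual API.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical coordinate-format element: one index per dimension plus the
// stored value, so tensors of arbitrary rank can be represented.
struct Element {
  std::vector<uint64_t> Indices;
  double Value;
};

// Sort elements so that their index tuples are in lexicographic order.
void sortLexicographically(std::vector<Element> &Elements) {
  std::sort(Elements.begin(), Elements.end(),
            [](const Element &A, const Element &B) {
              // std::vector's operator< compares element by element, which
              // is lexicographic order on the index tuples.
              return A.Indices < B.Indices;
            });
}
```

For a rank-3 tensor, an element at (0, 1, 5) would sort before (0, 2, 1), which in turn sorts before (1, 0, 2).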

d8fc2730

[OpenMP] Added the support for hidden helper task in RTL · ed939f85

Shilei Tian authored Jan 16, 2021

The basic design is to create an outermost parallel team. It is not a regular team: it is only created when the first hidden helper task is encountered, and it is only responsible for executing hidden helper tasks. We first use `pthread_create` to create a new thread, which serves as both the initial thread and the main thread of the hidden helper team. This initial thread then initializes a new root, just as the RTL does during initialization, and directly calls `__kmpc_fork_call`, as if it had encountered a parallel region.

In the wrapped function for this team, the main thread (the initial thread created via `pthread_create` on Linux) waits on a condition variable that is only signaled when the RTL is being destroyed. The other worker threads do nothing. The main thread has to wait there because, in the current implementation, once it finishes the wrapped function of this team it starts to free the team, which is not what we want.

Two environment variables, `LIBOMP_NUM_HIDDEN_HELPER_THREADS` and `LIBOMP_USE_HIDDEN_HELPER_TASK`, are also set to configure the number of threads and enable/disable this feature. By default, the number of hidden helper threads is 8.

Here are some open issues to be discussed:
1. The main thread goes to sleep once initialization is finished. As Andrey mentioned, we might need to wake it from time to time to do some work. What kind of update/check should be put here?

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D77609

ed939f85

[SLP] remove opcode field from reduction data class · 49b96cd9

Sanjay Patel authored Jan 16, 2021

This is NFC-intended and another step towards supporting
intrinsics as reduction candidates.

The remaining bits of the OperationData class do not make
much sense as-is, so I will try to improve that, but I'm
trying to take minimal steps because it's still not clear
how this was intended to work.

49b96cd9

[SLP] fix typos; NFC · fcfcc3cc
Sanjay Patel authored Jan 16, 2021

fcfcc3cc

[SLP] remove unnecessary use of 'OperationData' · 48dbac5b

Sanjay Patel authored Jan 16, 2021

This is another NFC-intended patch to allow matching
intrinsics (example: maxnum) as candidates for reductions.

It's possible that the loop/if logic can be reduced now,
but it's still difficult to understand how this all works.

48dbac5b

[InstSimplify] Handle commutativity for 'and' and 'outer or' for (~A & B) | ~(A | B) --> ~A · 63bedc80
Dávid Bolvanský authored Jan 16, 2021

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D94870

63bedc80

[ARM] Add low overhead loops terminators to AnalyzeBranch · 372eb2bb

David Green authored Jan 16, 2021

This treats low overhead loop branches the same as jump tables and
indirect branches in analyzeBranch - they cannot be analyzed but the
direct branches on the end of the block may be removed. This helps
remove the unnecessary branches earlier, which can help produce better
codegen (and change block layout in a number of cases).

Differential Revision: https://reviews.llvm.org/D94392

372eb2bb

[ARM] Remove LLC tests from transform/hardware loop tests. · c1ab698d

David Green authored Jan 16, 2021

We now have a lot of llc tests for hardware loops in CodeGen, which test
a larger variety of loops and are easier to maintain. This removes the
llc from mixed llc/opt tests.

c1ab698d

[InstSimplify] Precommit new testcases; NFC · 416854d0
Dávid Bolvanský authored Jan 16, 2021

416854d0

[llvm] Use *::empty (NFC) · 2082b10d
Kazu Hirata authored Jan 16, 2021

2082b10d

[llvm] Construct SmallVector with iterator ranges (NFC) · 19aacdb7
Kazu Hirata authored Jan 16, 2021

19aacdb7

[StringExtras] Fix comment typos (NFC) · ba0fc7e1
Kazu Hirata authored Jan 16, 2021

ba0fc7e1

[LTO] Remove options to disable inlining, vectorization & GVNLoadPRE. · bca16e2f

Florian Hahn authored Jan 16, 2021

This patch removes some ancient options as a clean-up before moving
code-gen to use LTOBackend in D94487.

I think it would be preferable to remove those ancient options, because

  1. There are no corresponding options in LTOBackend based tools,
  2. There are no unit tests for them,
  3. They are not passed through by Clang,
  4. At least for GVNLoadPRE, users could just use GVN's `enable-load-pre`.

Alternatively we could add support for those options to lto::Config &
co, but I think it would be better to remove them, unless they are
actually used in practice.

Reviewed By: steven_wu, tejohnson

Differential Revision: https://reviews.llvm.org/D94783

bca16e2f

[InstSimplify] Update comments, remove redundant tests · bdd4dda5
Dávid Bolvanský authored Jan 16, 2021

bdd4dda5

[RISCV] Correct alignment settings for vector registers. · 098dbf19

Hsiangkai Wang authored Jan 15, 2021

According to "9. Vector Memory Alignment Constraints" in the V
specification, vector memory accesses are aligned to the size of the
element. Our current implementation supports ELEN up to 64, so under
that assumption we can take the alignment of vector registers to be 64.

Differential Revision: https://reviews.llvm.org/D94751

098dbf19

[InstSimplify] Add (~A & B) | ~(A | B) --> ~A · a4e2a514
Dávid Bolvanský authored Jan 16, 2021
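The identity follows from De Morgan's law: ~(A | B) equals ~A & ~B, so the left-hand side becomes ~A & (B | ~B), which is ~A. A small exhaustive check over 8-bit values is sketched below; the function names are hypothetical and the check models plain integers only, not LLVM IR.

```cpp
#include <cstdint>

// Left-hand side of the fold: (~A & B) | ~(A | B)
uint8_t foldLHS(uint8_t A, uint8_t B) {
  return static_cast<uint8_t>((~A & B) | ~(A | B));
}

// Right-hand side of the fold: ~A
uint8_t foldRHS(uint8_t A) { return static_cast<uint8_t>(~A); }

// Exhaustively check every pair of 8-bit operands.
bool foldHolds() {
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned B = 0; B < 256; ++B)
      if (foldLHS(static_cast<uint8_t>(A), static_cast<uint8_t>(B)) !=
          foldRHS(static_cast<uint8_t>(A)))
        return false;
  return true;
}
```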

a4e2a514

[Tests] Added tests for new instcombine or simplification; NFC · 9fc814ed
Dávid Bolvanský authored Jan 16, 2021

9fc814ed

Fix llvm::Optional build breaks in MSVC using std::is_trivially_copyable · 25c1578a

James Player authored Jan 16, 2021

Current code breaks this version of MSVC due to a mismatch between `std::is_trivially_copyable` and `llvm::is_trivially_copyable` for `std::pair` instantiations.  Hence I was attempting to use `std::is_trivially_copyable` to set `llvm::is_trivially_copyable<T>::value`.

I spent some time root causing an `llvm::Optional` build error on MSVC 16.8.3 related to the change described above:

```
62>C:\src\ocg_llvm\llvm-project\llvm\include\llvm/ADT/BreadthFirstIterator.h(96,12): error C2280: 'llvm::Optional<std::pair<std::pair<unsigned int,llvm::Graph<4>::NodeSubset> *,llvm::Optional<llvm::Graph<4>::ChildIterator>>> &llvm::Optional<std::pair<std::pair<unsigned int,llvm::Graph<4>::NodeSubset> *,llvm::Optional<llvm::Graph<4>::ChildIterator>>>::operator =(const llvm::Optional<std::pair<std::pair<unsigned int,llvm::Graph<4>::NodeSubset> *,llvm::Optional<llvm::Graph<4>::ChildIterator>>> &)': attempting to reference a deleted function (compiling source file C:\src\ocg_llvm\llvm-project\llvm\unittests\ADT\BreadthFirstIteratorTest.cpp)
...
```
The "trivial" specialization of `optional_detail::OptionalStorage` assumes that the value type is trivially copy constructible and trivially copy assignable. The specialization is invoked based on a check of `is_trivially_copyable` alone, which does not imply both `is_trivially_copy_assignable` and `is_trivially_copy_constructible` are true.

According to the spec (https://en.cppreference.com/w/cpp/named_req/TriviallyCopyable), a deleted assignment operator does not make `is_trivially_copyable` false. So I think all these properties need to be checked explicitly in order to specialize `OptionalStorage` to the "trivial" version:
```
/// Storage for any type.
template <typename T, bool = std::is_trivially_copy_constructible<T>::value
                          && std::is_trivially_copy_assignable<T>::value>
class OptionalStorage {
```
The above fixed my build break in MSVC, but I think we need to explicitly check `is_trivially_copy_constructible` too, since it might be possible that the copy constructor is deleted.  It would also be ideal to move over to `std::is_trivially_copyable` instead of the `llvm` namespace version.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D93510
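The gap between `is_trivially_copyable` and the individual copy traits can be reproduced with a small type. This is a sketch with hypothetical names (`NoAssign`, `UsesTrivialStorage`); it mirrors the condition quoted above but is not the `OptionalStorage` implementation.

```cpp
#include <type_traits>

// Hypothetical type: copy construction is trivial, copy assignment is deleted.
struct NoAssign {
  int Value;
  NoAssign(const NoAssign &) = default;
  NoAssign &operator=(const NoAssign &) = delete;
};

// A deleted assignment operator does not stop the type from being trivially
// copyable, so this trait alone is expected to report true.
constexpr bool Copyable = std::is_trivially_copyable<NoAssign>::value;

// The individual properties tell the two operations apart.
constexpr bool CopyConstructible =
    std::is_trivially_copy_constructible<NoAssign>::value;
constexpr bool CopyAssignable =
    std::is_trivially_copy_assignable<NoAssign>::value;

// Mirrors the condition quoted above for selecting the "trivial" storage.
template <typename T>
struct UsesTrivialStorage
    : std::integral_constant<
          bool, std::is_trivially_copy_constructible<T>::value &&
                    std::is_trivially_copy_assignable<T>::value> {};
```

With this condition, `int` selects the trivial storage while `NoAssign` does not, even though `NoAssign` is trivially copyable.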

25c1578a

[ASTMatchers] Add support for CXXRewrittenBinaryOperator · b765eaf9
Stephen Kelly authored Jan 05, 2021

Differential Revision: https://reviews.llvm.org/D94130

b765eaf9

[ASTMatchers] Add binaryOperation matcher · e810e95e

Stephen Kelly authored Jan 05, 2021

This is a simple utility which allows matching on binaryOperator and
cxxOperatorCallExpr. It can also be extended to support
cxxRewrittenBinaryOperator.

Add generic support for MapAnyOfMatchers to auto-marshalling functions.

Differential Revision: https://reviews.llvm.org/D94129

e810e95e

[LegalizeDAG] Handle NeedInvert when expanding BR_CC · 4f155567

Bjorn Pettersson authored Jan 15, 2021

This is a follow-up fix to commit 03c8d6a0.
Seems like we now end up with NeedInvert being set in the result
from LegalizeSetCCCondCode more often than in the past, so we
need to handle NeedInvert when expanding BR_CC.

Not sure how to deal with the "Tmp4.getNode()" case properly,
but current assumption is that that code path isn't impacted
by the changes in 03c8d6a0 so we can simply move
the old assert into the if-branch and only handle NeedInvert in the
else-branch.

I think that the test case added here, for PowerPC, might also have
failed before commit 03c8d6a0. But we started
to hit the assert more often downstream after merging that
commit.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D94762

4f155567

[ASTMatchers] Make cxxOperatorCallExpr matchers API-compatible with n-ary operators · dbe056c2
Stephen Kelly authored Jan 02, 2021

This makes them composable with mapAnyOf().

Differential Revision: https://reviews.llvm.org/D94128

dbe056c2

[ASTMatchers] Add mapAnyOf matcher · a7101450

Stephen Kelly authored Jan 01, 2021

Make it possible to compose a matcher for different base nodes.

This accepts one or more node matcher functors and zero or more
matchers, composing the latter into the former.

This allows composing of matchers where the same inner matcher name is
used for the same concept, but with a different node functor. Currently,
there is a limitation that the nodes must be in the same "clade", so
while

  mapAnyOf(ifStmt, forStmt).with(hasBody(stmt()))

can be used, functionDecl can not be added to the tuple.

It is possible to use this in clang-query, but it will require changes
to the QueryParser, so is deferred to a future review.

Differential Revision: https://reviews.llvm.org/D94127

a7101450

[InstCombine] Add more tests for select operand replacement (NFC) · f0a0ec2d
Nikita Popov authored Jan 16, 2021

f0a0ec2d