Commits · 4c7c87f245c5291ed38d47983e89d69b98660008 · Lorenzo Albano / LLVM bpEVL

Mar 05, 2020

[X86] Simplify the code at the end of lowerShuffleAsBroadcast. · 4c7c87f2

Craig Topper authored Mar 04, 2020

The original code could create a bitcast from f64 to i64 and back
on 32-bit targets. This was only working because getBitcast was
able to fold the casts away to avoid leaving the illegal i64 type.

Now we handle the scalar case directly by broadcasting using the
scalar type as the element type. Then bitcasting to the final VT.
This works since we ensure the scalar type is the same size as
the final VT element type. No more casts to i64.

For the vector case, we cast to VT or subvector of VT. And then
do the broadcast.

I think this all matches what we generated before, just in a more
readable way.

4c7c87f2

clang: Treat ieee mode as the default for denormal-fp-math · c64ca930

Matt Arsenault authored Nov 05, 2019

The IR hasn't switched the default yet, so explicitly add the ieee
attributes.

I'm still not really sure how the target default denormal mode should
interact with -fno-unsafe-math-optimizations. The target may have
selected the default mode to be non-IEEE based on the flags or based
on its true behavior, but we don't know which is the case. Since the
only users of a non-IEEE mode without a flag still support IEEE mode,
just reset to IEEE.

c64ca930

Consistently capitalize a variable [NFC] · c94a4133
Philip Reames authored Mar 04, 2020
```
One instance in a copy paste was pointed out in a review, fix all instances at once.
```
c94a4133

Fix dyld opcode *_ADD_ADDR_IMM_SCALED error detection. · df058699

Michael Trent authored Mar 04, 2020

Summary:
Move the check for malformed REBASE_OPCODE_ADD_ADDR_IMM_SCALED and
BIND_OPCODE_DO_BIND_ADD_ADDR_IMM_SCALED opcodes after the immediate
has been applied to the SegmentOffset. This fixes specious errors
where SegmentOffset is pointing between two sections when trying to
correct the SegmentOffset value.

Update the regression tests to verify the proper error message.

Reviewers: pete, ab, lhames, steven_wu, jhenderson

Reviewed By: pete

Subscribers: hiraditya, dexonsmith, rupprecht, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75629

df058699

[DebugInfo] Avoid crashing on an invalid section identifier. · cc61283b

Igor Kudrin authored Mar 04, 2020

A DWARFSectionKind is read from input. It is not validated on parsing,
so an unexpected value may result in reaching llvm_unreachable() in
DWARFUnitIndex::getColumnHeader() when dumping the index section.

Differential Revision: https://reviews.llvm.org/D75609

cc61283b

[DAGCombine] Check the uses of negated floating constant and remove the hack · 3906ae38

QingShan Zhang authored Mar 05, 2020

PowerPC hits an assertion due to somewhat the same reason as https://reviews.llvm.org/D70975.
Though there are already some hack, it still failed with some case, when the operand 0 is NOT
a const fp, it is another fma that with const fp. And that const fp is negated which result in multi-uses.

A better fix is to check the uses of the negated const fp. If there are already use of its negated
value, we will have benefit as no extra Node is added.

Differential revision: https://reviews.llvm.org/D75501

3906ae38

[AVR][NFC] Use Register instead of unsigned · ea6eb813

Jim Lin authored Mar 05, 2020

Summary: Use Register type for variables instead of unsigned type.

Reviewers: dylanmckay

Reviewed By: dylanmckay

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75595

ea6eb813

Fix buildbots with merge that didn't happen for 4050b01b . · ffe6695a
Greg Clayton authored Mar 04, 2020

ffe6695a

Fix GSYM tests to run the yaml files and fix test failures on some machines. · 4050b01b

Greg Clayton authored Mar 04, 2020

YAML files were not being run during lit testing as there was no lit.local.cfg file. Once this was fixed, some buildbots would fail due to a StringRef that pointed to a std::string inside of a temporary llvm::Triple object. These issues are fixed here by making a local triple object that stays around long enough so the StringRef points to valid data. Fixed memory sanitizer bot bugs as well.

Differential Revision: https://reviews.llvm.org/D75390

4050b01b

AMDGPU/GlobalISel: Support llvm.trap and llvm.debugtrap intrinsics · 3fda1fde

hsmahesha authored Mar 05, 2020

Summary: Lower trap and debugtrap intrinsics to AMDGPU machine instruction(s).

Reviewers: arsenm, nhaehnle, kerbowa, cdevadas, t-tye, kzhuravl

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, rovka, dstuttard, tpr, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74688

3fda1fde

[X86] Add a private member function determinePaddingPrefix for X86AsmBackend · b3722dea

Shengchen Kan authored Mar 01, 2020

Summary: X86 can reduce the bytes of NOP by padding instructions with prefixes to get a better peformance in some cases. So a private member function `determinePaddingPrefix` is added to determine which prefix is the most suitable.

Reviewers: annita.zhang, reames, MaskRay, craig.topper, LuoYuanke, jyknight

Reviewed By: reames

Subscribers: llvm-commits, dexonsmith, hiraditya

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75357

b3722dea

[mlir][Linalg] Fix load/store operations generated while lower loops when · 755c0502

MaheshRavishankar authored Mar 04, 2020

output has zero rank.

While lowering to loops, no indices should be used in the load/store
operation if the buffer is zero-rank.

Differential Revision: https://reviews.llvm.org/D75391

755c0502

[X86] Relax existing instructions to reduce the number of nops needed for alignment purposes · f708c823

Philip Reames authored Mar 04, 2020

If we have an explicit align directive, we currently default to emitting nops to fill the space. As discussed in the context of the prefix padding work for branch alignment (D72225), we're allowed to play other tricks such as extending the size of previous instructions instead.

This patch will convert near jumps to far jumps if doing so decreases the number of bytes of nops needed for a following align. It does so as a post-pass after relaxation is complete. It intentionally works without moving any labels or doing anything which might require another round of relaxation.

The point of this patch is mainly to mock out the approach. The optimization implemented is real, and possibly useful, but the main point is to demonstrate an approach for implementing such "pad previous instruction" approaches. The key notion in this patch is to treat padding previous instructions as an optional optimization, not as a core part of relaxation. The benefit to this is that we avoid the potential concern about increasing the distance between two labels and thus causing further potentially non-local code grown due to relaxation. The downside is that we may miss some opportunities to avoid nops.

For the moment, this patch only implements a small set of existing relaxations.. Assuming the approach is satisfactory, I plan to extend this to a broader set of instructions where there are obvious "relaxations" which are roughly performance equivalent.

Note that this patch *doesn't* change which instructions are relaxable. We may wish to explore that separately to increase optimization opportunity, but I figured that deserved it's own separate discussion.

There are possible downsides to this optimization (and all "pad previous instruction" variants). The major two are potentially increasing instruction fetch and perturbing uop caching. (i.e. the usual alignment risks) Specifically:
* If we pad an instruction such that it crosses a fetch window (16 bytes on modern X86-64), we may cause the decoder to have to trigger a fetch it wouldn't have otherwise. This can effect both decode speed, and icache pressure.
* Intel's uop caching have particular restrictions on instruction combinations which can fit in a particular way. By moving around instructions, we can both cause misses an change misses into hits. Many of the most painful cases are around branch density, so I don't expect this to be too bad on the whole.

On the whole, I expect to see small swings (i.e. the typical alignment change problem), but nothing major or systematic in either direction.

Differential Revision: https://reviews.llvm.org/D75203

f708c823

[libc++] Mark deprecation test as UNSUPPORTED on Clang 6 · 2b2a1a42
Louis Dionne authored Mar 04, 2020

2b2a1a42

[clangd] Track document versions, include them with diags, enhance logs · 2cd33e6f

Sam McCall authored Mar 04, 2020

Summary:
This ties to an LSP feature (diagnostic versioning) but really a lot
of the value is in being able to log what's happening with file versions
and queues more descriptively and clearly.

As such it's fairly invasive, for a logging patch :-\

Key decisions:
 - at the LSP layer, we don't reqire the client to provide versions (LSP
   makes it mandatory but we never enforced it). If not provided,
   versions start at 0 and increment. DraftStore handles this.
 - don't propagate magically using contexts, but rather manually:
   addDocument -> ParseInputs -> (ParsedAST, Preamble, various callbacks)
   Context-propagation would hide the versions from ClangdServer, which
   would make producing good log messages hard
 - within ClangdServer, treat versions as opaque and unordered.
   std::string is a convenient type for this, and allows richer versions
   for embedders. They're "mandatory" but "null" is a reasonable default.

Subscribers: ilya-biryukov, javed.absar, MaskRay, jkorous, arphaman, kadircet, usaxena95, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D75582

2cd33e6f

[clangd] Remove unused+broken InvalidationError class. · e6d9b2cb
Sam McCall authored Mar 05, 2020

e6d9b2cb
Revert "[clang-doc] Improving Markdown Output" · ea086d10
Petr Hosek authored Mar 04, 2020
```
This reverts commit 45499f38, it's
still failing on Windows bots.
```
ea086d10
Add constexpr to DenormalMode constructors · b2dcde08
Matt Arsenault authored Mar 04, 2020
```
This will allow their use in member initializers in a future commit.
```
b2dcde08
X86: Generate mir checks in sqrt test · 7459781b
Matt Arsenault authored Feb 12, 2020

7459781b

[ORC] Decompose LazyCallThroughManager::callThroughToSymbol() · 76c59a63

Stefan Gränitz authored Mar 04, 2020

Summary: Decompose callThroughToSymbol() into findReexport(), resolveSymbol(), notifyResolved() and reportCallThroughError(). This allows derived classes to reuse the functionality while adding their own code in between.

Reviewers: lhames

Reviewed By: lhames

Subscribers: hiraditya, steven_wu, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75084

76c59a63

[clangd] Cancel certain operations if the file changes before we start. · c627b120

Sam McCall authored Mar 04, 2020

Summary:
Otherwise they can force us to build lots of snapshots that we don't need.
Particularly, try to do this for operations that are frequently
generated by editors without explicit user interaction, and where
editing the file makes the result less useful. (Code action
enumeration is a good example).

https://github.com/clangd/clangd/issues/298

This doesn't return the "right" LSP error code (ContentModified) to the client,
we need to teach the cancellation API to distinguish between different causes.

Reviewers: kadircet

Subscribers: ilya-biryukov, javed.absar, MaskRay, jkorous, arphaman, jfb, usaxena95, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D75602

c627b120

[X86] Convert vXi1 vectors to xmm/ymm/zmm types via... · eadea786

Craig Topper authored Mar 04, 2020

[X86] Convert vXi1 vectors to xmm/ymm/zmm types via getRegisterTypeForCallingConv rather than using CCPromoteToType in the td file

Previously we tried to promote these to xmm/ymm/zmm by promoting
in the X86CallingConv.td file. But this breaks when we run out
of xmm/ymm/zmm registers and need to fall back to memory. We end
up trying to create a non-sensical scalar to vector. This lead
to an assertion. The new tests in avx512-calling-conv.ll all
trigger this assertion.

Since we really want to treat these types like we do on avx2,
it seems better to promote them before the calling convention
code gets involved. Except when the calling convention is one
that passes the vXi1 type in a k register.

The changes in avx512-regcall-Mask.ll are because we indicated
that xmm/ymm/zmm types should be passed indirectly for the
Win64 ABI before we go to the common lines that promoted the
vXi1 types. This caused the promoted types to be picked up by
the default calling convention code. Now we promote them earlier
so they get passed indirectly as though they were xmm/ymm/zmm.

Differential Revision: https://reviews.llvm.org/D75154

eadea786

Mar 04, 2020

[clangd] Fix isInsideMainFile to be aware of preamble. · 2be45697

Sam McCall authored Mar 02, 2020

Reviewers: kadircet

Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, usaxena95, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D75460

2be45697

[dsymutil] Fix template stripping in getDIENames(...) to account for overloaded operators · 37549464

shafik authored Mar 04, 2020

Currently dsymutil when generating accelerator tables will attempt to strip the template parameters from names for subroutines.
For some overload operators which contain < in their names e.g. operator< the current method ends up stripping the operator name as well,
we just end up with the name operator in the table for each case.

Differential Revision: https://reviews.llvm.org/D75545

37549464

Partially inline basic_string copy constructor in UNSTABLE · b019c5c0

Martijn Vels authored Mar 04, 2020

    Summary:
    This is a recommit of https://reviews.llvm.org/D73223 where the added function accidentally ended up inside an idef block.

    This change splits the copy constructor up inlining short initialization, and explicitly outlining long initialization into __init_copy_ctor_external() which is the externally instantiated slow path.

    For unstable ABI, this has the following changes:

    remove basic_string(const basic_string&)
    remove basic_string(const basic_string&, const Allocator&)
    add __init_copy_ctor_external(const value_type*, size_type)
    Quick local benchmark for Copy:

    Master
    ```
    ---------------------------------------------------------------
    Benchmark                    Time             CPU   Iterations
    ---------------------------------------------------------------
    BM_StringCopy_Empty       3.50 ns         3.51 ns    199326720
    BM_StringCopy_Small       3.50 ns         3.51 ns    199510016
    BM_StringCopy_Large       15.7 ns         15.7 ns     45230080
    BM_StringCopy_Huge        1503 ns         1503 ns       464896
    ```
    With this change
    ```
    ---------------------------------------------------------------
    Benchmark                    Time             CPU   Iterations
    ---------------------------------------------------------------
    BM_StringCopy_Empty       1.99 ns         2.00 ns    356471808
    BM_StringCopy_Small       3.29 ns         3.30 ns    203425792
    BM_StringCopy_Large       13.3 ns         13.3 ns     52948992
    BM_StringCopy_Huge        1472 ns         1472 ns       475136
    ```

    Subscribers: libcxx-commits

    Tags: #libc

    Differential Revision: https://reviews.llvm.org/D75639

b019c5c0

[clang-doc] Improving Markdown Output · 45499f38

Petr Hosek authored Feb 29, 2020

This change has two components. The moves the generated file
for a namespace to the directory named after the namespace in
a file named 'index.<format>'. This greatly improves the browsing
experience since the index page is shown by default for a directory.

The second improves the markdown output by adding the links to the
referenced pages for children objects and the link back to the source
code.

Patch By: Clayton

Differential Revision: https://reviews.llvm.org/D72954

45499f38

[X86] Disable commuting for the first source operand of zero masked scalar fma... · 6ca96765

Craig Topper authored Mar 04, 2020

[X86] Disable commuting for the first source operand of zero masked scalar fma intrinsic instructions.

I believe this is the correct fix for D75506 rather than disabling all commuting. We can still commute the remaining two sources.

Differential Revision:m https://reviews.llvm.org/D75526

6ca96765

[scudo][standalone] Do not fill 32b regions at once · a0e86420

Kostya Kortchinsky authored Mar 03, 2020

Summary:
For the 32b primary, whenever we created a region, we would fill it
all at once (eg: create all the transfer batches for all the blocks
in that region). This wasn't ideal as all the potential blocks in
a newly created region might not be consummed right away, and it was
using extra memory (and release cycles) to keep all those free
blocks.

So now we keep track of the current region for a given class, and
how filled it is, carving out at most `MaxNumBatches` worth of
blocks at a time.

Additionally, lower `MaxNumBatches` on Android from 8 to 4. This
lowers the randomness of blocks, which isn't ideal for security, but
keeps things more clumped up for PSS/RSS accounting purposes.

Subscribers: #sanitizers, llvm-commits

Tags: #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D75551

a0e86420

AMDGPU: Remove VOP3OpSelMods0 complex pattern · 15bf916b
Matt Arsenault authored Mar 04, 2020
```
Use default operand of 0 instead.
```
15bf916b

[MLIR][Affine][NFC] Expose expandAffineMap · cdc5cba7

Frank Laub authored Mar 04, 2020

Summary:
Expose expandAffineMap so that it can be used by lowerings defined outside of
MLIR core.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D75589

cdc5cba7

[InstSimplify] Constant fold icmp of gep · c6ff3c9b

Nikita Popov authored Feb 29, 2020

InstSimplify can fold icmps of gep where the base pointers are the
same and the offsets are constant. It does so by constructing a
constant expression icmp and assumes that it gets folded -- but
this doesn't actually happen, because GEP expressions can usually
only be folded by the target-dependent constant folding layer.
As such, we need to explicitly invoke it here.

Differential Revision: https://reviews.llvm.org/D75407

c6ff3c9b

[mlir][vulkan-runner] Add basic timing for compute pipeline · f6981ac5

Lei Zhang authored Mar 03, 2020

This commit adds timestamp query commands in Vulkan runner's
compute pipeline to gain insights into how long it takes to
run the compute shader. This commit also adds timing from CPU
side for VkQueueSubmit and vkQueueWaitIdle.

Differential Revision: https://reviews.llvm.org/D75531

f6981ac5

Revert "[GlobalISel][Localizer] Enable intra-block localization of already-local uses." · 5583c2f2
Muhammad Omair Javaid authored Mar 05, 2020
```
This reverts commit e91e1df6.
```
5583c2f2
[libc++] Un-xfail GCC test for new version · 50b8088b
Eric Fiselier authored Mar 04, 2020

50b8088b

AMDGPU/GlobalISel: Don't use vector G_EXTRACT in arg lowering · 9e1d2afc

Matt Arsenault authored Feb 25, 2020

Create a wider source vector, and unmerge with dead defs like the
legalizer. The legalization handling for G_EXTRACT is incomplete, and
it's preferrable to keep everything in 32-bit pieces.

We should probably start moving these functions into utils, since we
have a growing number of places that do almost the same thing.

9e1d2afc

AMDGPU/GlobalISel: Switch target in argument test · f70e7dc1

Matt Arsenault authored Feb 25, 2020

Since this is still largely relying on the DAG argument type lowering
code, this has inherited the problem where i16 vectors have a
different ABI on targets with and without legal i16. Switch to using a
target with legal i16, so the i16 vector argument tests are more
useful.

f70e7dc1

GlobalISel: Move some legalizer functions to utils · b71203a7
Matt Arsenault authored Feb 24, 2020

b71203a7
GlobalISel: Set alignment on function argument stack load/store · fb0c35fa
Matt Arsenault authored Aug 28, 2019

fb0c35fa

[OPENMP50]Codegen for 'destroy' clause in depobj directive. · b27ff4d0

Alexey Bataev authored Mar 04, 2020

If the destroy clause is appplied, the previously allocated memory for
the dependency object must be destroyed.

b27ff4d0

Fix regression in bdad0a1b: force rebuilding of StmtExpr nodes in · f545ede9
Richard Smith authored Mar 04, 2020
```
TreeTransform if the 'dependent' flag would change.
```
f545ede9