- May 05, 2020
-
-
Andrea Di Biagio authored
[MCA] Fixed a bug where loads and stores were sometimes incorrectly marked as dependent. Fixes PR45793. This fixes a regression introduced by a very old commit 280ac1fd (was llvm-svn 361950). Commit 280ac1fd redesigned the logic in the LSUnit with the goal of speeding up isReady() queries, and stabilising the LSUnit API (while also making the load store unit more customisable). The concept of MemoryGroup (effectively an alias set) was added by that commit to better describe and track dependencies between memory operations. However, that concept was not just used for alias dependencies, but it was also used for describing memory "order" dependencies (enforced by the memory consistency model). Instructions of the same memory group were considered "equivalent" as in: independent operations that can potentially execute in parallel. The problem was that the cost of a dependency (in terms of number of cycles) should have been different for "order" dependencies. Instructions in an order dependency simply have to wait until their predecessors are "issued" to an underlying pipeline (rather than having to wait until predecessors have been fully executed). For simple "order" dependencies, this was effectively introducing an artificial delay on the "issue" of independent loads and stores. This patch fixes the issue and adds a new test named 'independent-load-stores.s' to a bunch of x86 targets. That test contains the reproducer posted by Fabian Ritter on PR45793. I had to rerun the update-mca-tests script on several files. To avoid expected regressions on some Exynos tests, I have added a -noalias=false flag (to match the old strict behavior on latencies). Some tests for processor Barcelona are improved/fixed by this change and they now show better results. In a few tests we were incorrectly counting the time spent by instructions in a scheduler queue. In one case in particular we now correctly see a store executed out of order.
That test was affected by the same underlying issue reported as PR45793. Reviewers: mattd Differential Revision: https://reviews.llvm.org/D79351
-
Pratyai Mazumder authored
Reviewers: vitalybuka, kcc Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79392
-
Alex Zinenko authored
This was missing from the original commit that changed the interface of `::build` methods to take `OpBuilder &` instead of `Builder *`.
-
Heejin Ahn authored
Summary: This fixes a few things that are connected. It is very hard to provide an independent test case for each of those fixes, because they are interconnected and sometimes one masks another. The provided test case triggers some of the bugs below but not all.

---

1. Background: `placeBlockMarker` takes a BB, and if the BB is a destination of some branch, it places an `end_block` marker there, computes the nearest common dominator of all predecessors (what we call the 'header'), and places a `block` marker there. When we first place markers, we traverse BBs from top to bottom. For example, when there are 5 BBs A, B, C, D, and E, and B, D, and E are branch destinations, if we mark the BB given to `placeBlockMarker` with `*` and draw a rectangle representing the border of `block` and `end_block` markers, the process is going to look like

```
                           -------
             -----         |-----|
---          |---|         ||---||
|A|          ||A||         |||A|||
---   -->    |---|   -->   ||---||
*B           | B |         || B ||
C            | C |         || C ||
D            -----         |-----|
E            *D            |  D  |
             E             -------
                           *E
```

which means that when we first place markers, we go from inner to outer scopes. So when we place a `block` marker, if the header already contains another `block` or `try` marker, it has to belong to an inner scope, so the existing `block`/`try` markers should go _after_ the new marker. This was the assumption we had. But after placing all markers we run the `fixUnwindMismatches` function. There we do some control flow transformation and create some branches, and we call `placeBlockMarker` again to place `block`/`end_block` markers for those newly created branches. We can't assume that we are traversing branch destination BBs from top to bottom now, because we are basically inserting some new markers in the middle of existing markers.

Fix: `placeBlockMarker` no longer assumes that the BBs it is given arrive in top-to-bottom order. When placing `block` markers, it computes whether existing `block` or `try` markers are inner or outer scopes with respect to the current scope.

---

2. Background: In `fixUnwindMismatches`, when there is a call whose correct unwind destination mismatches the current destination after initially placing `try` markers, we wrap that call with a new nested `try`/`catch`/`end` and jump to the correct handler within the new `catch`. The correct handler code is split into a separate BB from its original EH pad so it can be branched to. Here's an example:

- Before

```
mbb:
  call @foo       <- Unwind destination mismatch!
wrong-ehpad:
  catch
  ...
cont:
  end_try
  ...
correct-ehpad:
  catch
  [handler code]
```

- After

```
mbb:
  try                (new)
  call @foo
nested-ehpad:        (new)
  catch              (new)
  local.set n / drop (new)
  br %handler        (new)
nested-end:          (new)
  end_try            (new)
wrong-ehpad:
  catch
  ...
cont:
  end_try
  ...
correct-ehpad:
  catch
  local.set n / drop (new)
handler:             (new)
  end_try
  [handler code]
```

Note that after this transformation, it is possible that there are no calls left that actually unwind to `correct-ehpad`. `call @foo` now branches to `handler`, and there may be no other calls that unwind to `correct-ehpad`. In this case `correct-ehpad` does not have any predecessors anymore. This can cause a bug in `placeBlockMarker`, because we may need to place an `end_block` marker in `handler`, and `placeBlockMarker` computes the nearest common dominator of all predecessors. If one of `handler`'s predecessors (here `correct-ehpad`) does not have any predecessors, i.e., there is no way of reaching it, we cannot correctly compute the common dominator of the predecessors of `handler`, and we end up placing no `block`/`end` markers. This bug sometimes masks bug 1.

Fix: When an EH pad does not have any predecessors after this transformation, we delete all its successors, so that its successors don't have any dangling predecessors.

---

3. Background: The `handler` BB in the example shown in bug 2 doesn't actually need an `end_block` marker, despite being a new branch destination, because it already has an `end_try` marker which can serve the same purpose. I just put that example there for illustration. There is one case where we actually need to place an `end_block` marker: when the branch destination is the appendix BB. The appendix BB is created when a call that is supposed to unwind to the caller ends up unwinding to a wrong EH pad. In this case we also wrap the call with a nested `try`/`catch`/`end`, create an 'appendix' BB at the very end of the function, and branch to that BB, where we rethrow the exception to the caller.

Fix: When we don't actually need to place block markers, we don't.

---

4. In case we fall through to the continuation BB after the catch block, after extracting handler code in `fixUnwindMismatches` (refer to bug 2 for an example), we now have to add a branch to it to bypass the handler.

- Before

```
try
  ...
  (falls through to 'cont')
catch
  handler body
end
              <-- cont
```

- After

```
try
  ...
  br %cont    (new)
catch
end
handler body
              <-- cont
```

The problem is that we haven't been placing a new `end_block` marker in the `cont` BB in this case. We should, and this fixes it. But it is hard to provide a test case that triggers this bug, because the current compilation pipeline from .ll to .s does not generate this kind of code; we always have a `br` after `invoke`. But code without `br` is still valid, and we can have that kind of code if we have some pipeline changes or optimizations later. Even MIR test cases cannot trigger this part for now, because we don't encode auxiliary EH-related data structures (such as `WasmEHFuncInfo`) in MIR now. Those functionalities can be added later, but I don't think we should block this fix on that.

Reviewers: dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79324
-
Pierre-vh authored
This patch makes the folding of or(A, B) into not(and(not(A), not(B))) more aggressive for I1 vectors. This only affects Thumb2 MVE and improves codegen, because it removes a lot of msr/mrs instructions on VPR.P0. This patch also adds a xor(vcmp) -> !vcmp fold for MVE. Differential Revision: https://reviews.llvm.org/D77202
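The two folds mentioned above rest on simple boolean identities. The following is only a lane-wise sketch in Python, not the ARM backend code: the helper names (`or_direct`, `or_via_and`, `vcmp_eq`, `vcmp_ne`, `xor_all_true`) are hypothetical, and predicate vectors are modelled as plain lists of booleans.

```python
def or_direct(a, b):
    """Lane-wise or(A, B)."""
    return [x or y for x, y in zip(a, b)]


def or_via_and(a, b):
    """Lane-wise not(and(not(A), not(B))), i.e. De Morgan's form of or."""
    return [not ((not x) and (not y)) for x, y in zip(a, b)]


def vcmp_eq(xs, ys):
    """Hypothetical vector compare: one predicate bit per lane."""
    return [x == y for x, y in zip(xs, ys)]


def vcmp_ne(xs, ys):
    """The inverted compare, standing in for !vcmp."""
    return [x != y for x, y in zip(xs, ys)]


def xor_all_true(pred):
    """xor of a predicate with an all-true vector flips every lane."""
    return [bit ^ True for bit in pred]


# All four input combinations, one per lane.
a = [False, False, True, True]
b = [False, True, False, True]
print(or_direct(a, b) == or_via_and(a, b))

xs, ys = [1, 2, 3, 4], [1, 0, 3, 5]
print(xor_all_true(vcmp_eq(xs, ys)) == vcmp_ne(xs, ys))
```

Both comparisons hold for every lane, which is why the rewritten forms can replace the originals without changing results.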
-
Pierre-vh authored
This patch adds an implementation of PerformVSELECTCombine in the ARM DAG Combiner that transforms vselect(not(cond), lhs, rhs) into vselect(cond, rhs, lhs). Normally, this should be done by the target-independent DAG Combiner, but it doesn't handle the kind of constants that we generate, so we have to reimplement it here. Differential Revision: https://reviews.llvm.org/D77712
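Why swapping the operands is safe can be checked lane by lane. This is a minimal sketch in Python, assuming a hypothetical `vselect` helper that picks `lhs` where the condition is true and `rhs` otherwise; it models the semantics only and is not the DAG combiner code.

```python
def vselect(cond, lhs, rhs):
    """Lane-wise select: lhs where cond is true, rhs otherwise (illustrative model)."""
    return [l if c else r for c, l, r in zip(cond, lhs, rhs)]


def vnot(cond):
    """Lane-wise logical not of a predicate vector."""
    return [not c for c in cond]


cond = [True, False, True, False]
lhs = [10, 11, 12, 13]
rhs = [20, 21, 22, 23]

before = vselect(vnot(cond), lhs, rhs)  # vselect(not(cond), lhs, rhs)
after = vselect(cond, rhs, lhs)         # vselect(cond, rhs, lhs)
print(before == after)
```

The rewritten form needs no explicit `not`, which is the point of the combine.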
-
Peter Smith authored
A linker will create .ARM.exidx sections for InputSections that don't have them. This can cause a relocation out of range error if the InputSection happens to be extremely far away from the other sections. This is often the case for the vector table on older ARM CPUs, as the only two places where the table can be placed are 0 and 0xffff0000. We fix this by removing InputSections that need a linker generated .ARM.exidx section if that would cause an error. Differential Revision: https://reviews.llvm.org/D79289
-
David Green authored
-
Martin Storsjö authored
-
Haojian Wu authored
Summary: I was surprised to see that the LocalOffset can exceed uint32_t, but it does happen and leads to crashes in one of our huge internal TUs with a large preamble. With this patch, the crash is gone. Reviewers: sammccall Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D79397
-
Alexander Belyaev authored
Adding this pattern reduces code duplication. There is no need to have a custom implementation for lowering to llvm.cmpxchg. Differential Revision: https://reviews.llvm.org/D78753
-
David Sherwood authored
Summary: I have fixed several places in getSplatSourceVector and isSplatValue to work correctly with scalable vectors. I added new support for the ISD::SPLAT_VECTOR DAG node as one of the obvious cases we can support with scalable vectors. In other places I have tried to do the sensible thing, such as bail out for vector types we don't yet support or don't intend to support. It's not possible to add IR test cases to cover these changes, since they are currently only ever exercised on certain targets, e.g. only X86 targets use the result of getSplatSourceVector. I've assumed that X86 tests already exist to test these code paths for fixed vectors. However, I have added some AArch64 unit tests that test the specific functions I have changed. Differential revision: https://reviews.llvm.org/D79083
-
Julian Lettner authored
The argparse 'append' action concatenates multiple occurrences of an argument (even when we specify `nargs=1` or `nargs='?'`). This means that we create multiple identical output files if the `--output` argument is given more than once. This isn't useful and we instead want this to behave like a standard optional argument: last occurrence wins.
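The difference between the two behaviours can be seen with two small parsers. This is an illustrative sketch, not the script touched by the commit: only the `--output` flag name comes from the message, and the parser variables and file names are made up.

```python
import argparse

# Parser using the 'append' action: every occurrence is collected into a list.
append_parser = argparse.ArgumentParser()
append_parser.add_argument("--output", action="append", nargs="?")

# Parser using the default 'store' action: the last occurrence wins.
store_parser = argparse.ArgumentParser()
store_parser.add_argument("--output")

argv = ["--output", "a.json", "--output", "b.json"]

appended = append_parser.parse_args(argv).output
stored = store_parser.parse_args(argv).output

print(appended)  # ['a.json', 'b.json']
print(stored)    # b.json
```

With the default action there is a single output path to write, so repeating the flag no longer produces duplicate files.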
-
Reid Kleckner authored
The GSIHashStreamBuilder doesn't need to know the stream index. Standardize the naming (Idx -> Index in public APIs).
-
Stephen Neuendorffer authored
-
Stephen Neuendorffer authored
This reverts commit ab1ca6e6.
-
Stephen Neuendorffer authored
Portions of MLIR which depend on LLVMIR generally need to depend on intrinsics_gen, to ensure that tablegen'd header files from LLVM are built first. Without this, we get errors, typically about llvm/IR/Attributes.inc not being found. Note that previously the Linalg Dialect depended on intrinsics_gen, but it doesn't need to, since it doesn't use LLVMIR. Differential Revision: https://reviews.llvm.org/D79389
-
Jonas Devlieghere authored
This patch threads the virtual file system through dsymutil. Currently there is no good way to find out exactly what files are necessary in order to reproduce a dsymutil link, at least not without knowledge of dsymutil's internals. My motivation for this change is to add lightweight "reproducers" that automatically gather the input object files through the FileCollectorFileSystem. The files together with the YAML mapping will allow us to transparently reproduce a dsymutil link, even without having to mess with the OSO path prefix. Differential revision: https://reviews.llvm.org/D79376
-
River Riddle authored
This revision adds support for merging identical blocks, or those with the same operations that branch to the same successors. Operands that mismatch between the different blocks are replaced with new block arguments added to the merged block. Differential Revision: https://reviews.llvm.org/D79134
-
Geoffrey Martin-Noble authored
This change removes tabs from the comments printed by the asmprinter after basic block declarations in favor of two spaces. This is currently the only place in the printed IR that uses tabs. Differential Revision: https://reviews.llvm.org/D79377
-
Sergey Dmitriev authored
Summary: Otherwise we can get unaccounted references to call graph nodes. Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79382
-
Yaxun (Sam) Liu authored
A union ctor does not call the ctors of its data members, and a union dtor does not call the dtors of its data members. A union also does not have base classes. Currently, when clang checks whether a union has an empty ctor/dtor, it checks the ctors/dtors of its data members. This causes clang to incorrectly diagnose device side global variables and shared variables as having non-empty ctors/dtors. This patch fixes that. Differential Revision: https://reviews.llvm.org/D79367
-
Zakk Chen authored
Summary: Unless the user requested an output object (--lto-obj-path), an unused empty combined module is not emitted. This change is helpful for some targets (e.g. RISC-V) which encode the ABI info in IR module flags (target-abi). An empty unused module has no ABI info, so the linker would report an error when merging incompatible ABIs. Reviewers: tejohnson, espindola, MaskRay Subscribers: emaste, inglorion, arichardson, hiraditya, simoncook, MaskRay, steven_wu, dexonsmith, PkmX, dang, lenary, s.egerton, luismarques, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78988
-
Adrian Prantl authored
-
Nicolas Vasilache authored
Summary: In the particular case of an insertion in a block without a terminator, the BlockBuilder insertion point should be block->end(). Adding a unit test to exercise this. Differential Revision: https://reviews.llvm.org/D79363
-
River Riddle authored
This allows for walking the operations nested directly within a region, without traversing nested regions. Differential Revision: https://reviews.llvm.org/D79056
-
River Riddle authored
This removes the unnecessary/costly context synchronization when parsing, as the context is guaranteed to not be used by any other threads.
-
Reid Kleckner authored
It looks like the new implementation is correct, since there were TODOs here about getting the new behavior. I am not sure if "C:..\.." should become "C:" or "C:\", though. The new output doesn't precisely match the TODO message, but it seems appropriate given the specification of remove_dots and how .. traversals work at the root directory.
-
Lang Hames authored
Referring to the link order of a dylib better matches the terminology used in static compilation. As upcoming patches will increase the number of places where link order matters (for example when closing JITDylibs), it's better to get this name change out of the way early.
-
Reid Kleckner authored
This reverts commit fb5fd746. Re-instates commit 53913a65. The fix is to trim off trailing separators, as in `/foo/bar/`, and produce `/foo/bar`. VFS tests rely on this. I added unit tests for remove_dots.
-
Reid Kleckner authored
Profiling shows that time is spent destroying the allocator member of PDBLinker, and that is unneeded.
-
Hanhan Wang authored
Summary: Following D78974, this patch implements the emulation for the store op. The emulation is done with atomic operations. E.g., if the stored value is an i8, the StoreOp is rewritten to:
1) load a 32-bit integer
2) clear 8 bits in the loaded value
3) store the 32-bit value back
4) load a 32-bit integer
5) modify 8 bits in the loaded value
6) store the 32-bit value back
Steps 1 to 3 are done by AtomicAnd as one atomic step, and steps 4 to 6 are done by AtomicOr as another atomic step. Differential Revision: https://reviews.llvm.org/D79272
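The six steps above collapse into one AND and one OR on the containing 32-bit word. The sketch below models that arithmetic in Python on a plain integer; it is not the MLIR lowering itself, the function name `emulate_i8_store` is hypothetical, and numbering bytes from the least significant end is an assumption made for illustration. Atomicity is not modelled.

```python
def emulate_i8_store(word: int, byte_index: int, value: int) -> int:
    """Model an i8 store into a 32-bit word as an AND followed by an OR.

    Assumption: byte_index 0 is the least significant byte of the word.
    """
    shift = 8 * byte_index
    clear_mask = ~(0xFF << shift) & 0xFFFFFFFF

    # Steps 1-3 (one AtomicAnd): clear the 8 bits that will be overwritten.
    word &= clear_mask

    # Steps 4-6 (one AtomicOr): merge the new byte into the cleared slot.
    word |= (value & 0xFF) << shift
    return word


result = emulate_i8_store(0x11223344, 1, 0xAB)
print(hex(result))  # 0x1122ab44
```

Only the targeted byte changes; the other three bytes of the word pass through both operations untouched.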
-
Haruki Imai authored
This std::copy_n copies the 8-byte APInt raw data one byte at a time, starting from the beginning of the char array. This is not a problem on little endian, but on big endian the data is not copied correctly, because it should be copied from the end of the char array.
- Example of 4 byte data (such as float32)
Little endian (first 4 bytes):
Address | 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08
Data    | 0xcd 0xcc 0x8c 0x3f 0x00 0x00 0x00 0x00
Big endian (last 4 bytes):
Address | 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08
Data    | 0x00 0x00 0x00 0x00 0x3f 0x8c 0xcc 0xcd
In general, when copying N (N < 8) bytes of data on big endian, the start address should be incremented by (8 - N) bytes. The original code has no problem with 8-byte data (such as double), even on big endian. Differential Revision: https://reviews.llvm.org/D78076
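The (8 - N) offset rule can be checked with the float32 bit pattern from the example. This is a small Python model of the byte layout only, not the patched C++ code; the helper name `copy_value_bytes` is hypothetical.

```python
def copy_value_bytes(raw8: bytes, n: int, big_endian: bool) -> bytes:
    """Pick the n meaningful bytes out of an 8-byte raw buffer.

    On big endian the value sits at the end of the buffer, so the start
    offset is (8 - n); on little endian it sits at the beginning.
    """
    start = 8 - n if big_endian else 0
    return raw8[start:start + n]


bits = 0x3F8CCCCD  # the float32 bit pattern used in the example above

little = bits.to_bytes(8, "little")  # cd cc 8c 3f 00 00 00 00
big = bits.to_bytes(8, "big")        # 00 00 00 00 3f 8c cc cd

from_little = int.from_bytes(copy_value_bytes(little, 4, False), "little")
from_big = int.from_bytes(copy_value_bytes(big, 4, True), "big")

# Copying from the start of the big-endian buffer picks up only padding.
wrong = int.from_bytes(big[:4], "big")

print(hex(from_little), hex(from_big), hex(wrong))
```

For N = 8 the offset is zero on both layouts, which matches the note that 8-byte data was never affected.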
-
Fangrui Song authored
We currently only support extern relocations. `X86_64_RELOC_SIGNED_{1,2,4}` are like X86_64_RELOC_SIGNED, but with the implicit addend fixed to 1, 2, and 4, respectively. See the comment in `lib/Target/X86/MCTargetDesc/X86MachObjectWriter.cpp RecordX86_64Relocation`. Reviewed By: int3 Differential Revision: https://reviews.llvm.org/D79311
-
- May 04, 2020
-
-
Krzysztof Parzyszek authored
Register live ranges may have had gaps that after coalescing should be removed. This is done by adding a new segment to the range, and merging it with neighboring segments. When doing so, do not assume that each subrange of the register ended at the same index. If a subrange ended earlier, adding this segment could make the live range invalid. Instead, if the subrange is not live at the start of the segment, extend it first.
-
Vedant Kumar authored
Allow Language() to be called from const methods within UserExpression.
-
Jonas Devlieghere authored
Fix warning: ISO C++ requires the name after '::~' to be found in the same scope as the name before '::~' [-Wdtor-name]
-
Sanjay Patel authored
D79360 could change this kind of sequence.
-
Davide Italiano authored
Debug info generation & codegen now steps onto the correct line.
-
Stephen Neuendorffer authored
The previous patch broke flang, which has some yet-to-be-resolved cyclic dependencies. This patch fixes the breakage by restricting the generated dependencies to public libraries, which is probably more sensible anyway. Differential Revision: https://reviews.llvm.org/D79366
-