  1. May 05, 2020
    • Andrea Di Biagio's avatar
      [MCA] Fixed a bug where loads and stores were sometimes incorrectly marked as... · 5578ec32
      Andrea Di Biagio authored
      [MCA] Fixed a bug where loads and stores were sometimes incorrectly marked as dependent. Fixes PR45793.
      
      This fixes a regression introduced by a very old commit 280ac1fd (was
      llvm-svn 361950).
      
      Commit 280ac1fd redesigned the logic in the LSUnit with the goal of
      speeding up isReady() queries, and stabilising the LSUnit API (while also making
      the load store unit more customisable).
      
      The concept of MemoryGroup (effectively an alias set) was added by that commit
      to better describe and track dependencies between memory operations.  However,
      that concept was not just used for alias dependencies, but it was also used for
      describing memory "order" dependencies (enforced by the memory consistency
      model).
      
      Instructions of the same memory group were considered "equivalent" as in:
      independent operations that can potentially execute in parallel.  The problem
      was that the cost of a dependency (in terms of number of cycles) should have
      been different for "order" dependencies. Instructions in an order dependency
      simply have to wait until their predecessors are "issued" to an
      underlying pipeline (rather than having to wait until predecessors have been
      fully executed). For simple "order" dependencies, this was effectively
      introducing an artificial delay on the "issue" of independent loads and stores.
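
The difference in cost between the two kinds of dependency can be illustrated with a toy model. This is only a sketch, not llvm-mca's LSUnit code: the `MemOp` class, the `earliest_issue` helper, and the cycle numbers are hypothetical, and it assumes a dependent operation may issue in the same cycle in which its predecessor is issued.

```python
from dataclasses import dataclass


@dataclass
class MemOp:
    """Hypothetical memory operation: when it was issued and how long it runs."""
    issue_cycle: int
    latency: int


def earliest_issue(pred: MemOp, dependency: str) -> int:
    """Earliest cycle at which an operation depending on `pred` may issue."""
    if dependency == "order":
        # Order dependency: only wait until the predecessor has been issued.
        return pred.issue_cycle
    if dependency == "data":
        # Data (alias) dependency: wait until the predecessor has fully executed.
        return pred.issue_cycle + pred.latency
    raise ValueError(f"unknown dependency kind: {dependency}")


store = MemOp(issue_cycle=3, latency=4)
order_ready = earliest_issue(store, "order")
data_ready = earliest_issue(store, "data")
```

In this model, treating an order dependency as a data dependency delays the dependent operation by the full latency of its predecessor, which is the artificial delay described above.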
      
      This patch fixes the issue and adds a new test named 'independent-load-stores.s'
      to a bunch of x86 targets. That test contains the reproducer posted by Fabian
      Ritter on PR45793.
      
      I had to rerun the update-mca-tests script on several files. To avoid expected
      regressions on some Exynos tests, I have added a -noalias=false flag (to match
      the old strict behavior on latencies).
      
      Some tests for processor Barcelona are improved/fixed by this change and they
      now show better results.  In a few tests we were incorrectly counting the time
      spent by instructions in a scheduler queue.  In one case in particular we now
      correctly see a store executed out of order.  That test was affected by the same
      underlying issue reported as PR45793.
      
      Reviewers: mattd
      
      Differential Revision: https://reviews.llvm.org/D79351
      5578ec32
    • Pratyai Mazumder's avatar
      [SanitizerCoverage] Replace the unconditional store with a load, then a conditional store. · 08032e71
      Pratyai Mazumder authored
      Reviewers: vitalybuka, kcc
      
      Subscribers: hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D79392
      08032e71
    • Alex Zinenko's avatar
      [mlir] NFC: update ::build signature in the tutorial document · 898f74c3
      Alex Zinenko authored
      This was missing from the original commit that changed the interface of
      `::build` methods to take `OpBuilder &` instead of `Builder *`.
      898f74c3
    • Heejin Ahn's avatar
      [WebAssembly] Fix block marker placing after fixUnwindMismatches · 834debff
      Heejin Ahn authored
      Summary:
      This fixes a few things that are connected. It is very hard to provide
      an independent test case for each of those fixes, because they are
      interconnected and sometimes one masks another. The provided test case
      triggers some of those bugs below but not all.
      
      ---
      
      1. Background:
      `placeBlockMarker` takes a BB, and if the BB is a destination of some
      branch, it places `end_block` marker there, and computes the nearest
      common dominator of all predecessors (what we call 'header') and places
      a `block` marker there.
      
      When we first place markers, we traverse BBs from top to bottom. For
      example, when there are 5 BBs A, B, C, D, and E and B, D, and E are
      branch destinations, if we mark the BB given to `placeBlockMarker` with `*`
      and draw a rectangle representing the border of `block` and `end_block`
      markers, the process is going to look like
      ```
                             -------
                 -----       |-----|
       ---       |---|       ||---||
       |A|       ||A||       |||A|||
       ---  -->  |---|  -->  ||---||
       *B        | B |       || B ||
        C        | C |       || C ||
        D        -----       |-----|
        E         *D         |  D  |
                   E         -------
                               *E
      ```
      which means when we first place markers, we go from inner to outer
      scopes. So when we place a `block` marker, if the header already
      contains other `block` or `try` marker, it has to belong to an inner
      scope, so the existing `block`/`try` markers should go _after_ the new
      marker. This was the assumption we had.
      
      But after placing all markers we run `fixUnwindMismatches` function.
      There we do some control flow transformation and create some branches,
      and we call `placeBlockMarker` again to place `block`/`end_block`
      markers for those newly created branches. We can't assume that we are
      traversing branch destination BBs from top to bottom now because we are
      basically inserting some new markers in the middle of existing markers.
      
      Fix:
      In `placeBlockMarker`, we no longer assume that the BBs given are in
      top-to-bottom order, and when placing `block` markers, we
      calculate whether existing `block` or `try` markers are inner or
      outer scopes with respect to the current scope.
      
      ---
      
      2. Background:
      In `fixUnwindMismatches`, when there is a call whose correct unwind
      destination mismatches the current destination after initially placing
      `try` markers, we wrap that with a new nested `try`/`catch`/`end` and
      jump to the correct handler within the new `catch`. The correct handler
      code is split as a separate BB from its original EH pad so it can be
      branched to. Here's an example:
      
      - Before
      ```
      mbb:
        call @foo       <- Unwind destination mismatch!
      wrong-ehpad:
        catch
        ...
      cont:
        end_try
        ...
      correct-ehpad:
        catch
        [handler code]
      ```
      
      - After
      ```
      mbb:
        try                (new)
        call @foo
      nested-ehpad:        (new)
        catch              (new)
        local.set n / drop (new)
        br %handler        (new)
      nested-end:          (new)
        end_try            (new)
      wrong-ehpad:
        catch
        ...
      cont:
        end_try
        ...
      correct-ehpad:
        catch
        local.set n / drop (new)
      handler:             (new)
        end_try
        [handler code]
      ```
      
      Note that after this transformation, it is possible there are no calls
      to actually unwind to `correct-ehpad` here. `call @foo` now
      branches to `handler`, and there can be no other calls to unwind to
      `correct-ehpad`. In this case `correct-ehpad` does not have any
      predecessors anymore.
      
      This can cause a bug in `placeBlockMarker`, because we may need to place
      `end_block` marker in `handler`, and `placeBlockMarker` computes the
      nearest common dominator of all predecessors. If one of `handler`'s
      predecessors (here `correct-ehpad`) does not have any predecessors, i.e.,
      no way of reaching it, we cannot correctly compute the common dominator
      of predecessors of `handler`, and end up placing no `block`/`end`
      markers. This bug actually sometimes masks the bug 1.
      
      Fix:
      When we have an EH pad that does not have any predecessors after this
      transformation, we delete all its successors, so that its successors don't
      have any dangling predecessors.
      
      ---
      
      3. Background:
      Actually the `handler` BB in the example shown in bug 2 doesn't need
      `end_block` marker, despite it being a new branch destination, because
      it already has `end_try` marker which can serve the same purpose. I just
      put that example there for illustration purposes. There is a case where we
      actually need to place `end_block` marker: when the branch dest is the
      appendix BB. The appendix BB is created when a call that is
      supposed to unwind to the caller ends up unwinding to a wrong EH pad. In
      this case we also wrap the call with a nested `try`/`catch`/`end`,
      create an 'appendix' BB at the very end of the function, and branch to
      that BB, where we rethrow the exception to the caller.
      
      Fix:
      When we don't actually need to place block markers, we don't.
      
      ---
      
      4. In case we fall through to the continuation BB after the catch block,
      after extracting handler code in `fixUnwindMismatches` (refer to bug 2
      for an example), we now have to add a branch to it to bypass the
      handler.
      - Before
      ```
      try
        ...
        (falls through to 'cont')
      catch
        handler body
      end
                    <-- cont
      ```
      
      - After
      ```
      try
        ...
        br %cont    (new)
      catch
      end
      handler body
                    <-- cont
      ```
      
      The problem is, we haven't been placing a new `end_block` marker in the
      `cont` BB in this case. We should, and this fixes it. But it is hard to
      provide a test case that triggers this bug, because the current
      compilation pipeline from .ll to .s does not generate this kind of code;
      we always have a `br` after `invoke`. But code without `br` is still
      valid, and we can have that kind of code if we have some pipeline
      changes or optimizations later. Even mir test cases cannot trigger this
      part for now, because we don't encode auxiliary EH-related data
      structures (such as `WasmEHFuncInfo`) in mir now. Those functionalities
      can be added later, but I don't think we should block this fix on that.
      
      Reviewers: dschuff
      
      Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D79324
      834debff
    • Pierre-vh's avatar
      [Target][ARM] Fold or(A, B) more aggressively for I1 vectors · d5eb7ffa
      Pierre-vh authored
      This patch makes the folding of or(A, B) into not(and(not(A), not(B)))
      more aggressive for I1 vectors. This only affects Thumb2 MVE and improves
      codegen, because it removes a lot of msr/mrs instructions on VPR.P0.
      
      This patch also adds a xor(vcmp) -> !vcmp fold for MVE.
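
The or fold is an application of De Morgan's law. As a quick sanity check, here is a minimal sketch that models I1 vectors as Python lists of booleans; the helper names are hypothetical and this says nothing about how the ARM backend represents predicates.

```python
from itertools import product


def or_lanes(a, b):
    """Lane-wise or(A, B) on two boolean vectors."""
    return [x or y for x, y in zip(a, b)]


def or_via_not_and(a, b):
    """Lane-wise not(and(not(A), not(B))), the folded form."""
    return [not ((not x) and (not y)) for x, y in zip(a, b)]


# Build two 4-lane vectors that together cover every input combination.
lanes = list(product([False, True], repeat=2))
a = [x for x, _ in lanes]
b = [y for _, y in lanes]

assert or_lanes(a, b) == or_via_not_and(a, b)
```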
      
      Differential Revision: https://reviews.llvm.org/D77202
      d5eb7ffa
    • Pierre-vh's avatar
      [Target][ARM] Add PerformVSELECTCombine for MVE Integer Ops · ffdda495
      Pierre-vh authored
      This patch adds an implementation of PerformVSELECTCombine in the
      ARM DAG Combiner that transforms vselect(not(cond), lhs, rhs) into
      vselect(cond, rhs, lhs).
      
      Normally, this should be done by the target-independent DAG Combiner,
      but it doesn't handle the kind of constants that we generate, so we
      have to reimplement it here.
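
The equivalence behind this combine can be checked on a toy model. The sketch below treats vectors as Python lists; `vselect` and `vnot` are hypothetical stand-ins for the DAG nodes, not LLVM APIs.

```python
def vselect(cond, lhs, rhs):
    """Per lane: pick the lhs element where cond is true, otherwise rhs."""
    return [l if c else r for c, l, r in zip(cond, lhs, rhs)]


def vnot(cond):
    """Per-lane logical negation of a predicate vector."""
    return [not c for c in cond]


cond = [True, False, False, True]
lhs = [1, 2, 3, 4]
rhs = [10, 20, 30, 40]

# Negating the condition is the same as swapping the two value operands.
assert vselect(vnot(cond), lhs, rhs) == vselect(cond, rhs, lhs)
```

Swapping the operands removes the explicit negation, so one fewer operation has to be materialised for the predicate.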
      
      Differential Revision: https://reviews.llvm.org/D77712
      ffdda495
    • Peter Smith's avatar
      [ELF][ARM] Do not create .ARM.exidx sections for out of range inputs · 48aebfc9
      Peter Smith authored
      A linker will create .ARM.exidx sections for InputSections that don't
      have them. This can cause a relocation out of range error if the
      InputSection happens to be extremely far away from the other sections.
      This is often the case for the vector table on older ARM CPUs as the only
      two places where the table can be placed are 0 and 0xffff0000. We fix this
      by removing InputSections that need a linker generated .ARM.exidx
      section if that would cause an error.
      
      Differential Revision: https://reviews.llvm.org/D79289
      48aebfc9
    • David Green's avatar
      [ARM] MVE predcast with const test. NFC · 09767af8
      David Green authored
      09767af8
    • Martin Storsjö's avatar
      5a1c3017
    • Haojian Wu's avatar
      [clang] Fix an uint32_t overflow in large preamble. · 4f8d9722
      Haojian Wu authored
      Summary:
      I was surprised to see the LocalOffset can exceed uint32_t, but it
      does happen and leads to crashes in one of our huge internal TUs with a large
      preamble.
      
      With this patch, the crash is gone.
      
      Reviewers: sammccall
      
      Subscribers: cfe-commits
      
      Tags: #clang
      
      Differential Revision: https://reviews.llvm.org/D79397
      4f8d9722
    • Alexander Belyaev's avatar
      [MLIR] Add conversion from AtomicRMWOp -> GenericAtomicRMWOp. · b79751e8
      Alexander Belyaev authored
      Adding this pattern reduces code duplication. There is no need to have a
      custom implementation for lowering to llvm.cmpxchg.
      
      Differential Revision: https://reviews.llvm.org/D78753
      b79751e8
    • David Sherwood's avatar
      [CodeGen] Fix warnings due to SelectionDAG::getSplatSourceVector · cd3a54c5
      David Sherwood authored
      Summary:
      I have fixed several places in getSplatSourceVector and isSplatValue
      to work correctly with scalable vectors. I added new support for
      the ISD::SPLAT_VECTOR DAG node as one of the obvious cases we can
      support with scalable vectors. In other places I have tried to do
      the sensible thing, such as bail out for vector types we don't yet
      support or don't intend to support.
      
      It's not possible to add IR test cases to cover these changes, since
      they are currently only ever exercised on certain targets, e.g.
      only X86 targets use the result of getSplatSourceVector. I've
      assumed that X86 tests already exist to test these code paths for
      fixed vectors. However, I have added some AArch64 unit tests that
      test the specific functions I have changed.
      
      Differential revision: https://reviews.llvm.org/D79083
      cd3a54c5
    • Julian Lettner's avatar
      [lit] Create one output file when `--output` is specified more than once · 47b25c33
      Julian Lettner authored
      The argparse 'append' action concatenates multiple occurrences of an
      argument (even when we specify `nargs=1` or `nargs='?'`).  This means
      that we create multiple identical output files if the `--output`
      argument is given more than once.  This isn't useful and we instead want
      this to behave like a standard optional argument: last occurrence wins.
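
The difference between the two argparse behaviours can be seen in a minimal sketch. This is not lit's actual option-parsing code; only the `--output` flag name comes from the commit message, and the file names are hypothetical.

```python
import argparse


def parse_with_append(argv):
    """'append' collects every occurrence of --output into a list."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--output", action="append", default=[])
    return parser.parse_args(argv).output


def parse_with_store(argv):
    """The default 'store' action keeps only the last occurrence."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--output")
    return parser.parse_args(argv).output


argv = ["--output", "a.json", "--output", "b.json"]
appended = parse_with_append(argv)  # both paths are kept
stored = parse_with_store(argv)     # last occurrence wins
```

With the default action there is only ever one output path to write, which matches the "last occurrence wins" behaviour described above.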
      47b25c33
    • Reid Kleckner's avatar
      [PDB] Move stream index tracking to GSIStreamBuilder · b7438c25
      Reid Kleckner authored
      The GSIHashStreamBuilder doesn't need to know the stream index.
      Standardize the naming (Idx -> Index in public APIs).
      b7438c25
    • Stephen Neuendorffer's avatar
    • Stephen Neuendorffer's avatar
      5469f434
    • Stephen Neuendorffer's avatar
      [MLIR] Normalize usage of intrinsics_gen · 146192ad
      Stephen Neuendorffer authored
      Portions of MLIR which depend on LLVMIR generally need to depend on
      intrinsics_gen, to ensure that tablegen'd header files from LLVM are built
      first.  Without this, we get errors, typically about llvm/IR/Attributes.inc
      not being found.
      
      Note that previously the Linalg Dialect depended on intrinsics_gen, but it
      doesn't need to, since it doesn't use LLVMIR.
      
      Differential Revision: https://reviews.llvm.org/D79389
      146192ad
    • Jonas Devlieghere's avatar
      [dsymutil] Thread the VFS through dsymutil (NFC) · 0be7acab
      Jonas Devlieghere authored
      This patch threads the virtual file system through dsymutil.
      
      Currently there is no good way to find out exactly what files are
      necessary in order to reproduce a dsymutil link, at least not without
      knowledge of dsymutil's internals.  My motivation for this change is
      to add lightweight "reproducers" that automatically gather the input
      object files through the FileCollectorFileSystem. The files together
      with the YAML mapping will allow us to transparently reproduce a
      dsymutil link, even without having to mess with the OSO path prefix.
      
      Differential revision: https://reviews.llvm.org/D79376
      0be7acab
    • River Riddle's avatar
      [mlir] Add support for merging identical blocks during canonicalization · 469c02d0
      River Riddle authored
      This revision adds support for merging identical blocks, or those with the same operations that branch to the same successors. Operands that mismatch between the different blocks are replaced with new block arguments added to the merged block.
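
A toy model may help picture the merge. This is a sketch only, not the MLIR implementation: blocks are modelled as `(ops, successor)` pairs, each op as an `(op_name, operands)` tuple, op results are ignored, and the function and argument names are hypothetical.

```python
def merge_identical_blocks(blocks):
    """Merge blocks with the same ops and successor.

    Returns (merged_ops, block_arg_names, values_per_block), or None when
    the blocks differ in their operations or successor.
    """
    first_ops, first_succ = blocks[0]
    shape = [(name, len(operands)) for name, operands in first_ops]
    for ops, succ in blocks[1:]:
        if succ != first_succ:
            return None
        if [(name, len(operands)) for name, operands in ops] != shape:
            return None

    merged, arg_names = [], []
    values_per_block = [[] for _ in blocks]
    for op_index, (name, operands) in enumerate(first_ops):
        new_operands = []
        for operand_index, operand in enumerate(operands):
            candidates = [ops[op_index][1][operand_index] for ops, _ in blocks]
            if all(c == operand for c in candidates):
                new_operands.append(operand)  # same value in every block
            else:
                # Mismatching operand: replace it with a new block argument
                # and record what each original block would pass for it.
                arg = f"%arg{len(arg_names)}"
                arg_names.append(arg)
                for values, candidate in zip(values_per_block, candidates):
                    values.append(candidate)
                new_operands.append(arg)
        merged.append((name, tuple(new_operands)))
    return merged, arg_names, values_per_block


block_a = ([("addi", ("%x", "%c1")), ("br", ())], "^exit")
block_b = ([("addi", ("%x", "%c2")), ("br", ())], "^exit")
result = merge_identical_blocks([block_a, block_b])
```

Here the two blocks differ only in the second operand of `addi`, so the merged block takes one argument and each predecessor passes its own constant.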
      
      Differential Revision: https://reviews.llvm.org/D79134
      469c02d0
    • Geoffrey Martin-Noble's avatar
      [mlir] Remove tabs from predecessor comments · 13090ec7
      Geoffrey Martin-Noble authored
      This change removes tabs from the comments printed by the asmprinter after basic
      block declarations in favor of two spaces. This is currently the only place in
      the printed IR that uses tabs.
      
      Differential Revision: https://reviews.llvm.org/D79377
      13090ec7
    • Sergey Dmitriev's avatar
      [CallGraphUpdater] Removed references to callees when deleting function · f637334d
      Sergey Dmitriev authored
      Summary: Otherwise we can get unaccounted references to call graph nodes.
      
      Reviewers: jdoerfert, sstefan1
      
      Reviewed By: jdoerfert
      
      Subscribers: hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D79382
      f637334d
    • Yaxun (Sam) Liu's avatar
      [CUDA][HIP] Fix empty ctor/dtor check for union · d75a6e93
      Yaxun (Sam) Liu authored
      A union ctor does not call the ctors of its data members. A union dtor does not call the dtors of its data members.
      Also, a union does not have base classes.
      
      Currently when clang checks whether a union has an empty ctor/dtor, it checks the ctors/dtors of its
      data members. This causes clang to incorrectly diagnose device side global variables and shared variables as
      having non-empty ctors/dtors.
      
      This patch fixes that.
      
      Differential Revision: https://reviews.llvm.org/D79367
      d75a6e93
    • Zakk Chen's avatar
      [LTO] Suppress emission of empty combined module by default · ad5fad0a
      Zakk Chen authored
      Summary:
      Unless the user requested an output object (--lto-obj-path), an
      unused empty combined module is not emitted.
      
      This change is helpful for some targets (e.g. RISC-V) which encode the
      ABI info in IR module flags (target-abi). An empty unused module has no ABI
      info, so the linker would report a linking error when merging
      incompatible ABIs.
      
      Reviewers: tejohnson, espindola, MaskRay
      
      Subscribers: emaste, inglorion, arichardson, hiraditya, simoncook, MaskRay, steven_wu, dexonsmith, PkmX, dang, lenary, s.egerton, luismarques, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D78988
      ad5fad0a
    • Adrian Prantl's avatar
      Clarify comment · 36183811
      Adrian Prantl authored
      36183811
    • Nicolas Vasilache's avatar
      [mlir][EDSC] Fix off-by-one BlockBuilder insertion point. · 036772ac
      Nicolas Vasilache authored
      Summary:
      In the particular case of an insertion in a block without a terminator, the BlockBuilder insertion point should be block->end().
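
The off-by-one can be pictured with a list-based model. This is a sketch under stated assumptions, not the EDSC code: ops are plain strings, the `TERMINATORS` set is hypothetical, and it assumes that a block which does have a terminator gets new ops inserted just before it.

```python
# Hypothetical set of op names that end a block.
TERMINATORS = {"br", "cond_br", "return"}


def insertion_index(ops):
    """Index at which a new op should be inserted into the block's op list."""
    if ops and ops[-1] in TERMINATORS:
        return len(ops) - 1  # insert before the terminator (assumption)
    return len(ops)          # no terminator: insert at the end of the block


def insert_op(ops, new_op):
    """Return a copy of `ops` with `new_op` placed at the insertion point."""
    index = insertion_index(ops)
    return ops[:index] + [new_op] + ops[index:]
```

Always using "one before the end" would place the new op before the last existing op of a block that has no terminator, which is the kind of off-by-one this commit addresses.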
      
      Adding a unit test to exercise this.
      
      Differential Revision: https://reviews.llvm.org/D79363
      036772ac
    • River Riddle's avatar
      [mlir][IR] Add a Region::getOps method that returns a range of immediately nested operations · 1e4faf23
      River Riddle authored
      This allows for walking the operations nested directly within a region, without traversing nested regions.
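
The distinction between immediately nested operations and a full walk can be sketched as follows. The classes below are a hypothetical Python model of regions, blocks, and operations, not the MLIR C++ API.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    regions: list = field(default_factory=list)


@dataclass
class Block:
    ops: list = field(default_factory=list)


@dataclass
class Region:
    blocks: list = field(default_factory=list)


def get_ops(region):
    """Yield only the operations directly inside the region's blocks."""
    for block in region.blocks:
        yield from block.ops


def walk(region):
    """Yield every operation, recursing into nested regions."""
    for op in get_ops(region):
        yield op
        for nested in op.regions:
            yield from walk(nested)


inner = Region([Block([Op("inner.add")])])
outer = Region([Block([Op("outer.loop", [inner]), Op("outer.return")])])

direct_names = [op.name for op in get_ops(outer)]
all_names = [op.name for op in walk(outer)]
```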
      
      Differential Revision: https://reviews.llvm.org/D79056
      1e4faf23
    • River Riddle's avatar
      [mlir][mlir-opt] Disable multithreading when parsing the input module. · 6bce7d8d
      River Riddle authored
      This removes the unnecessary/costly context synchronization when parsing, as the context is guaranteed to not be used by any other threads.
      6bce7d8d
    • Reid Kleckner's avatar
      Update LLDB filespec tests for remove_dots change · 58c7bf24
      Reid Kleckner authored
      It looks like the new implementation is correct, since there were TODOs
      here about getting the new behavior.
      
      I am not sure if "C:..\.." should become "C:" or "C:\", though. The new
      output doesn't precisely match the TODO message, but it seems
      appropriate given the specification of remove_dots and how .. traversals
      work at the root directory.
      58c7bf24
    • Lang Hames's avatar
      [ORC] Rename SearchOrder operations on JITDylib to LinkOrder. · c66f8900
      Lang Hames authored
      Referring to the link order of a dylib better matches the terminology used in
      static compilation. As upcoming patches will increase the number of places where
      link order matters (for example when closing JITDylibs) it's better to get this
      name change out of the way early.
      c66f8900
    • Reid Kleckner's avatar
      Re-land "Optimize path::remove_dots" · 75cbf6dc
      Reid Kleckner authored
      This reverts commit fb5fd746.
      Re-instates commit 53913a65
      
      The fix is to trim off trailing separators, as in `/foo/bar/` and
      produce `/foo/bar`. VFS tests rely on this. I added unit tests for
      remove_dots.
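
To make the trailing-separator behaviour concrete, here is a simplified model of dot removal. It is not LLVM's implementation: it only handles `/` separators, the function name is hypothetical, and the handling of `..` at the root or at the start of a relative path is an assumption.

```python
def remove_dots_sketch(path, remove_dot_dot=True):
    """Drop '.' components, optionally resolve '..', trim trailing separators."""
    absolute = path.startswith("/")
    components = []
    for comp in path.split("/"):
        if comp in ("", "."):
            continue  # empty pieces come from repeated or trailing separators
        if comp == ".." and remove_dot_dot:
            if components and components[-1] != "..":
                components.pop()  # step back over the previous component
                continue
            if absolute:
                continue  # '..' at the root stays at the root (assumption)
        components.append(comp)
    joined = "/".join(components)
    return "/" + joined if absolute else joined
```

Because the result is rebuilt from the surviving components, an input such as `/foo/bar/` comes back without its trailing separator.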
      75cbf6dc
    • Reid Kleckner's avatar
      [PDB] Use the global BumpPtrAllocator · 2868ee5b
      Reid Kleckner authored
      Profiling shows that time is spent destroying the allocator member of
      PDBLinker, and that is unneeded.
      2868ee5b
    • Hanhan Wang's avatar
      [mlir][StandardToSPIRV] Emulate bitwidths not supported for store op. · 5d10613b
      Hanhan Wang authored
      Summary:
      As in D78974, this patch implements the emulation for store op. The emulation is
      done with atomic operations. E.g., if the stored value is i8, rewrite the
      StoreOp to:
      
       1) load a 32-bit integer
       2) clear 8 bits in the loaded value
       3) store the 32-bit value back
       4) load a 32-bit integer
       5) modify 8 bits in the loaded value
       6) store the 32-bit value back
      
      Steps 1 to 3 are done by AtomicAnd as one atomic step, and step 4
      to step 6 are done by AtomicOr as another atomic step.
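
The two-step update can be sketched on a plain 32-bit integer. This models only the bit arithmetic, not the SPIR-V lowering: Python integers are not atomic, and the function name and the `byte_index` parameter are hypothetical.

```python
def emulate_i8_store(word, byte_index, value):
    """Store an 8-bit value into one byte of a 32-bit word using and/or."""
    shift = 8 * byte_index
    clear_mask = ~(0xFF << shift) & 0xFFFFFFFF

    # Steps 1-3: clear the target byte (the role played by AtomicAnd).
    word &= clear_mask
    # Steps 4-6: set the target byte to the new value (the role of AtomicOr).
    word |= (value & 0xFF) << shift
    return word


updated = emulate_i8_store(0x11223344, 1, 0xAB)
```

Splitting the update into an and followed by an or means each half only touches the bits it needs, leaving the other three bytes of the word unchanged.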
      
      Differential Revision: https://reviews.llvm.org/D79272
      5d10613b
    • Haruki Imai's avatar
      [mlir] Support big endian in DenseElementsAttr · 3a7be241
      Haruki Imai authored
      This std::copy_n copies the 8-byte APInt raw data byte by byte, starting
      from the beginning of the char array. This is no problem in little endian,
      but the data is not copied correctly in big endian because the data should
      be copied from the end of the char array.
      
      - Example of 4 byte data (such as float32)
      
      Little endian (First 4 bytes):
      Address | 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08
      Data    | 0xcd 0xcc 0x8c 0x3f 0x00 0x00 0x00 0x00
      
      Big endian (Last 4 bytes):
      Address | 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08
      Data    | 0x00 0x00 0x00 0x00 0x3f 0x8c 0xcc 0xcd
      
      In general, when it copies N(N<8) byte data in big endian, the start
      address should be incremented by (8 - N) bytes.
      The original code has no problem when it includes 8 byte data(such as
       double) even in big endian.
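
The (8 - N) offset rule can be checked with Python's `struct` module. This is a sketch of the byte layout only, not the MLIR code; the helper name is hypothetical, and the value 1.1 is chosen because its float32 bit pattern matches the bytes in the example above.

```python
import struct


def element_bytes(raw_word, n, big_endian):
    """Return the n meaningful bytes of an 8-byte raw word."""
    start = (8 - n) if big_endian else 0  # skip the leading padding bytes
    return raw_word[start:start + n]


# Bit pattern of the float32 value 1.1, widened to a 64-bit integer.
bits = struct.unpack("<I", struct.pack("<f", 1.1))[0]

little_word = struct.pack("<Q", bits)  # cd cc 8c 3f 00 00 00 00
big_word = struct.pack(">Q", bits)     # 00 00 00 00 3f 8c cc cd

little_elem = element_bytes(little_word, 4, big_endian=False)
big_elem = element_bytes(big_word, 4, big_endian=True)
```

For N = 8 the offset is zero in both cases, which is why 8-byte data such as double was unaffected.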
      
      Differential Revision: https://reviews.llvm.org/D78076
      3a7be241
    • Fangrui Song's avatar
      [lld-macho] Support X86_64_RELOC_SIGNED_{1,2,4} · 6939fe6e
      Fangrui Song authored
      We currently only support extern relocations.
      `X86_64_RELOC_SIGNED_{1,2,4}` are like X86_64_RELOC_SIGNED, but with the
      implicit addend fixed to 1, 2, and 4, respectively.
      See the comment in `lib/Target/X86/MCTargetDesc/X86MachObjectWriter.cpp RecordX86_64Relocation`.
      
      Reviewed By: int3
      
      Differential Revision: https://reviews.llvm.org/D79311
      6939fe6e
  2. May 04, 2020