  1. Oct 12, 2021
    • Jay Foad's avatar
      Revert "[AMDGPU] Enable load clustering in the post-RA scheduler" · 66ce1015
      Jay Foad authored
      This reverts commit 66e13c7f.
      
      It was committed by accident.
      66ce1015
    • Jay Foad's avatar
      [TwoAddressInstruction] Remove ad hoc machine verification · f7ee21aa
      Jay Foad authored
      With the -early-live-intervals command line flag,
      TwoAddressInstructionPass::runOnMachineFunction would call
      MachineFunction::verify before returning to check the live intervals.
      But there was not much benefit to doing this since -verify-machineinstrs
      and LLVM_ENABLE_EXPENSIVE_CHECKS provide a more general way of
      scheduling machine verification after every pass.
      
      Also it caused problems on targets like Lanai which are marked as "not
      machine verifier clean", since verification would fail for known
      target-specific problems which are nothing to do with LiveIntervals.
      
      Differential Revision: https://reviews.llvm.org/D111618
      f7ee21aa
    • Jay Foad's avatar
      [AMDGPU] Enable load clustering in the post-RA scheduler · 66e13c7f
      Jay Foad authored
      This has a couple of benefits:
      1. It can sometimes fix clusters that got broken apart when the register
         allocator inserted a copy.
      2. Post-RA scheduling does not have to worry about increasing register
         pressure, which in some cases gives it more freedom to reorder
         instructions.
      
      Testing on a collection of 10,000 graphics shaders compiled for gfx1010
      showed:
      - The average length of each run of one or more load instructions
        increased by about 1%.
      - The number of runs of two or more load instructions increased by
        about 4%.
      66e13c7f
    • Jeremy Morse's avatar
      [DebugInfo][NFC] Move LiveDebugValues class to header · 838b4a53
      Jeremy Morse authored
      This patch shifts the InstrRefBasedLDV class declaration to a header.
      Partially because it's already massive, but mostly so that I can start
      writing some unit tests for it. This patch also adds the boilerplate for
      said unit tests.
      
      Differential Revision: https://reviews.llvm.org/D110165
      838b4a53
    • Bradley Smith's avatar
      [AArch64][SVE] Add fixed type lowering for EXTRACT_SUBVECTOR · 2eb42e3d
      Bradley Smith authored
      Depends on D111135
      
      Differential Revision: https://reviews.llvm.org/D111165
      2eb42e3d
    • Simon Pilgrim's avatar
      61d124f7
    • Kerry McLaughlin's avatar
      [LoopVectorize] Classify pointer induction updates as scalar only if they have one use · 1439ef1a
      Kerry McLaughlin authored
      collectLoopScalars collects pointer induction updates in ScalarPtrs, assuming
      that the instruction will be scalar after vectorization. This may crash later
      in VPReplicateRecipe::execute() if there is another user of the instruction,
      other than the Phi node, which needs to be widened.
      
      This changes collectLoopScalars so that if there are any other users of
      Update other than a Phi node, it is not added to ScalarPtrs.
      
      Reviewed By: david-arm, fhahn
      
      Differential Revision: https://reviews.llvm.org/D111294
      1439ef1a
    • Florian Hahn's avatar
      [LoopPeel] Use any_of & contains instead of for & find. · 40d85f16
      Florian Hahn authored
      Using contains was suggested in D108114, but I forgot to include it when
      landing the patch.
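
      The shape of this cleanup can be sketched with standard-library
      equivalents. This is a minimal illustration, not the LoopPeel code: the
      function names, the element type, and the `contains` helper are
      hypothetical, and the patch itself presumably uses LLVM's own helpers.

```cpp
#include <algorithm>
#include <set>
#include <vector>

// Hypothetical helper standing in for a container's contains() method.
static bool contains(const std::set<int> &S, int V) { return S.count(V) != 0; }

// Before: a hand-written loop with find() compared against end().
bool anyBlockedLoop(const std::vector<int> &Items,
                    const std::set<int> &Blocked) {
  for (int I : Items)
    if (Blocked.find(I) != Blocked.end())
      return true;
  return false;
}

// After: any_of expresses "does any element satisfy the predicate", and
// contains expresses the membership test directly.
bool anyBlocked(const std::vector<int> &Items, const std::set<int> &Blocked) {
  return std::any_of(Items.begin(), Items.end(),
                     [&](int I) { return contains(Blocked, I); });
}
```

      Both functions return the same result for any input; the second form
      only makes the intent easier to read.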
      40d85f16
    • Sjoerd Meijer's avatar
      [FuncSpec] Allow ConstExprs that are function pointers · fc0fa851
      Sjoerd Meijer authored
      This is a follow up of D110529 that disallowed constexprs. That change
      introduced a regression as this also disallowed constexprs that are function
      pointers, which is actually one of the motivating use cases that we do want to
      support.
      
      Differential Revision: https://reviews.llvm.org/D111567
      fc0fa851
    • Florian Hahn's avatar
      [LoopPeel] Peel if it turns invariant loads dereferenceable. · cd0ba9dc
      Florian Hahn authored
      This patch adds a new cost heuristic that allows peeling a single
      iteration off read-only loops, if the loop contains a load that
      
          1. is feeding an exit condition,
          2. dominates the latch,
          3. is not already known to be dereferenceable,
          4. and has a loop invariant address.
      
      If all non-latch exits are terminated with unreachable, such loads
      in the loop are guaranteed to be dereferenceable after peeling,
      enabling hoisting/CSE'ing them.
      
      This enables vectorization of loops with certain runtime-checks, like
      multiple calls to `std::vector::at` if the vector is passed as pointer.
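
      A loop of the kind described above might look like the following
      sketch. The function name and body are illustrative assumptions, not
      taken from the patch or its tests. The vector arrives by pointer, so the
      loads inside `at()` that read the vector's bounds have a loop-invariant
      address and feed the bounds check, and the failing path throws instead
      of returning. Whether peeling actually fires depends on the compiler
      version and optimization flags.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: each iteration calls at() twice on a vector that is
// passed by pointer. The loop only reads memory; Sum lives in a register.
long sumPairs(const std::vector<int> *Vec, std::size_t N) {
  long Sum = 0;
  for (std::size_t I = 0; I + 1 < N; ++I)
    Sum += Vec->at(I) + Vec->at(I + 1); // bounds checks throw on failure
  return Sum;
}
```

      For a three-element vector {1, 2, 3} and N = 3, the loop runs twice and
      returns (1 + 2) + (2 + 3) = 8.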
      
      Reviewed By: mkazantsev
      
      Differential Revision: https://reviews.llvm.org/D108114
      cd0ba9dc
    • jacquesguan's avatar
      [RISCV] Rename assembler mnemonic of unordered floating-point reductions for v1.0-rc change · 0608bbd4
      jacquesguan authored
      Rename vfredsum and vfwredsum to vfredusum and vfwredusum. Add aliases for vfredsum and vfwredsum.
      
      Reviewed By: luismarques, HsiangKai, khchen, frasercrmck, kito-cheng, craig.topper
      
      Differential Revision: https://reviews.llvm.org/D105690
      0608bbd4
    • Lang Hames's avatar
      [ORC] More attempts to work around compiler failures. · 5829ba7a
      Lang Hames authored
      Commit 731f991c seems to have helped, but did not catch all instances (see
      https://lab.llvm.org/buildbot/#/builders/193/builds/104). Switch more inner
      structs to C++98 initializers to work around the issue. Add FIXMEs to revisit
      in the future.
      5829ba7a
    • Lang Hames's avatar
      [ORC] Add more explicit narrowing casts. · 3a52a639
      Lang Hames authored
      This should fix the buildbot failure at
      https://lab.llvm.org/buildbot/#/builders/187/builds/2140
      3a52a639
    • Lang Hames's avatar
      [ORC] Fix a typo in a variable name. · 9ca50641
      Lang Hames authored
      9ca50641
    • Lang Hames's avatar
      Re-apply e50aea58, "Major JITLinkMemoryManager refactor", with fixes. · 962a2479
      Lang Hames authored
      Adds explicit narrowing casts to JITLinkMemoryManager.cpp.
      
      Honors -slab-address option in llvm-jitlink.cpp, which was accidentally
      dropped in the refactor.
      
      This effectively reverts commit 6641d29b.
      962a2479
    • Yonghong Song's avatar
      BPF: rename BTF_KIND_TAG to BTF_KIND_DECL_TAG · 1321e472
      Yonghong Song authored
      Per discussion in https://reviews.llvm.org/D111199,
      the existing btf_tag attribute will be renamed to
      btf_decl_tag. This patch updates the BTF backend to
      use the btf_decl_tag attribute name and also
      renames BTF_KIND_TAG to BTF_KIND_DECL_TAG.
      
      Differential Revision: https://reviews.llvm.org/D111592
      1321e472
    • hsmahesha's avatar
      [AMDGPU] Remove dead frame indices after sgpr spill. · 52cb3af0
      hsmahesha authored
      All frame indices which are dead after sgpr spill should be removed from
      the function frame. Otherwise, there is a side effect: later passes such as
      "stack slot coloring" may re-map the free frame index ids, which in turn
      could corrupt the bookkeeping of "frame index to VGPR lane".
      
      Reviewed By: cdevadas
      
      Differential Revision: https://reviews.llvm.org/D111150
      52cb3af0
    • Yonghong Song's avatar
      [NFC][Attr] rename attribute btf_tag to btf_decl_tag · 325d0007
      Yonghong Song authored
      Per discussion in https://reviews.llvm.org/D111199,
      the existing btf_tag attribute will be renamed to
      btf_decl_tag. This patch mostly updates the Bitcode and
      DebugInfo test cases with the new attribute name.
      
      Differential Revision: https://reviews.llvm.org/D111591
      325d0007
    • Freddy Ye's avatar
      [X86][ISel] Lowering llvm.thread.pointer · d57a87ea
      Freddy Ye authored
      Reviewed By: pengfei
      
      Differential Revision: https://reviews.llvm.org/D110681
      d57a87ea
    • Lang Hames's avatar
      Revert "[JITLink][ORC] Major JITLinkMemoryManager refactor." · 6641d29b
      Lang Hames authored
      This reverts commit e50aea58 while I
      investigate bot failures.
      6641d29b
    • Lang Hames's avatar
      [JITLink][ORC] Major JITLinkMemoryManager refactor. · e50aea58
      Lang Hames authored
      This commit substantially refactors the JITLinkMemoryManager API to: (1) add
      asynchronous versions of key operations, (2) give memory manager implementations
      full control over link graph address layout, (3) enable more efficient tracking
      of allocated memory, and (4) support "allocation actions" and finalize-lifetime
      memory.
      
      Together these changes provide a more usable API, and enable more powerful and
      efficient memory manager implementations.
      
      To support these changes the JITLinkMemoryManager::Allocation inner class has
      been split into two new classes: InFlightAllocation, and FinalizedAllocation.
      The allocate method returns an InFlightAllocation that tracks memory (both
      working and executor memory) prior to finalization. The finalize method returns
      a FinalizedAllocation object, and the InFlightAllocation is discarded. Breaking
      Allocation into InFlightAllocation and FinalizedAllocation allows
      InFlightAllocation subclasses to be written more naturally, and FinalizedAlloc
      to be implemented and used efficiently (see (3) below).
      
      In addition to the memory manager changes this commit also introduces a new
      MemProt type to represent memory protections (MemProt replaces use of
      sys::Memory::ProtectionFlags in JITLink), and a new MemDeallocPolicy type that
      can be used to indicate when a section should be deallocated (see (4) below).
      
      Plugin/pass writers who were using sys::Memory::ProtectionFlags will have to
      switch to MemProt -- this should be straightforward. Clients with out-of-tree
      memory managers will need to update their implementations. Clients using
      in-tree memory managers should mostly be able to ignore it.
      
      Major features:
      
      (1) More asynchrony:
      
      The allocate and deallocate methods are now asynchronous by default, with
      synchronous convenience wrappers supplied. The asynchronous versions allow
      clients (including JITLink) to request and deallocate memory without blocking.
      
      (2) Improved control over graph address layout:
      
      Instead of a SegmentRequestMap, JITLinkMemoryManager::allocate now takes a
      reference to the LinkGraph to be allocated. The memory manager is responsible
      for calculating the memory requirements for the graph, and laying out the graph
      (setting working and executor memory addresses) within the allocated memory.
      This gives memory managers full control over JIT'd memory layout. For clients
      that don't need or want this degree of control the new "BasicLayout" utility can
      be used to get a segment-based view of the graph, similar to the one provided by
      SegmentRequestMap. Once segment addresses are assigned the BasicLayout::apply
      method can be used to automatically lay out the graph.
      
      (3) Efficient tracking of allocated memory.
      
      The FinalizedAlloc type is a wrapper for an ExecutorAddr and requires only
      64 bits to store in the controller. The meaning of the address held by the
      FinalizedAlloc is left up to the memory manager implementation, but the
      FinalizedAlloc type enforces a requirement that deallocate be called on any
      non-default values prior to destruction. The deallocate method takes a
      vector<FinalizedAlloc>, allowing for bulk deallocation of many allocations in a
      single call.
      
      Memory manager implementations will typically store the address of some
      allocation metadata in the executor in the FinalizedAlloc, as holding this
      metadata in the executor is often cheaper and may allow for clean deallocation
      even in failure cases where the connection with the controller is lost.
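
      The ownership rule described in (3) can be sketched as a move-only
      wrapper around a 64-bit address. This is a hedged illustration only: the
      class name FinalizedAllocSketch, the sentinel value, release(), and
      deallocateAll() are hypothetical and are not the real JITLink API.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a FinalizedAlloc-like handle: one 64-bit address
// whose meaning is left to the memory manager.
class FinalizedAllocSketch {
public:
  static constexpr uint64_t Invalid = ~0ULL; // assumed "default" value

  FinalizedAllocSketch() = default;
  explicit FinalizedAllocSketch(uint64_t A) : Addr(A) {}
  FinalizedAllocSketch(const FinalizedAllocSketch &) = delete;
  FinalizedAllocSketch &operator=(const FinalizedAllocSketch &) = delete;
  FinalizedAllocSketch(FinalizedAllocSketch &&O) noexcept
      : Addr(std::exchange(O.Addr, Invalid)) {}
  FinalizedAllocSketch &operator=(FinalizedAllocSketch &&O) noexcept {
    assert(Addr == Invalid && "overwriting a live allocation");
    Addr = std::exchange(O.Addr, Invalid);
    return *this;
  }
  // Enforce the rule: non-default values must be deallocated before
  // destruction.
  ~FinalizedAllocSketch() {
    assert(Addr == Invalid && "allocation was not deallocated");
  }

  // Hand the address back to the memory manager and reset to the default.
  uint64_t release() { return std::exchange(Addr, Invalid); }

private:
  uint64_t Addr = Invalid;
};

static_assert(sizeof(FinalizedAllocSketch) == 8, "handle is 64 bits");

// Bulk deallocation: one call takes ownership of many handles. Here the
// "deallocation" just records each address in Freed.
void deallocateAll(std::vector<FinalizedAllocSketch> Allocs,
                   std::vector<uint64_t> &Freed) {
  for (auto &A : Allocs)
    Freed.push_back(A.release());
}
```

      Because the handle is move-only, a caller passes a batch with
      `deallocateAll(std::move(Allocs), Freed)`, and a handle that is
      destroyed without being released trips the assertion in debug builds.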
      
      (4) Support for "allocation actions" and finalize-lifetime memory.
      
      Allocation actions are pairs (finalize_act, deallocate_act) of JITTargetAddress
      triples (fn, arg_buffer_addr, arg_buffer_size), that can be attached to a
      finalize request. At finalization time, after memory protections have been
      applied, each of the "finalize_act" elements will be called in order (skipping
      any elements whose fn value is zero) as
      
      ((char*(*)(const char *, size_t))fn)((const char *)arg_buffer_addr,
                                           (size_t)arg_buffer_size);
      
      At deallocation time the deallocate elements will be run in reverse order (again
      skipping any elements where fn is zero).
      
      The returned char * should be null to indicate success, or a non-null
      heap-allocated string error message to indicate failure.
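
      The calling convention and ordering described above can be sketched as
      follows. This is a simplified in-process illustration under stated
      assumptions: the type and function names are hypothetical, addresses are
      treated as pointers in the current process, the error string is assumed
      to be malloc-allocated, and stopping at the first error is an
      assumption that the text above does not specify.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical stand-ins for the (fn, arg_buffer_addr, arg_buffer_size)
// triple and the (finalize_act, deallocate_act) pair.
struct ActionCall {
  uint64_t Fn;
  uint64_t ArgAddr;
  uint64_t ArgSize;
};
struct AllocAction {
  ActionCall Finalize;
  ActionCall Dealloc;
};

using ActionFn = char *(*)(const char *, size_t);

// Runs one call. Returns an empty string on success or the error message.
static std::string runCall(const ActionCall &C) {
  if (C.Fn == 0)
    return {}; // elements whose fn is zero are skipped
  auto *F = reinterpret_cast<ActionFn>(static_cast<uintptr_t>(C.Fn));
  char *Err =
      F(reinterpret_cast<const char *>(static_cast<uintptr_t>(C.ArgAddr)),
        static_cast<size_t>(C.ArgSize));
  if (!Err)
    return {}; // null means success
  std::string Msg(Err);
  std::free(Err); // assumption: the message was allocated with malloc
  return Msg;
}

// Finalize actions run in order.
std::string runFinalizeActions(const std::vector<AllocAction> &Actions) {
  for (const AllocAction &A : Actions) {
    std::string Err = runCall(A.Finalize);
    if (!Err.empty())
      return Err; // assumption: stop at the first failure
  }
  return {};
}

// Deallocate actions run in reverse order.
std::string runDeallocActions(const std::vector<AllocAction> &Actions) {
  for (auto It = Actions.rbegin(); It != Actions.rend(); ++It) {
    std::string Err = runCall(It->Dealloc);
    if (!Err.empty())
      return Err; // assumption: stop at the first failure
  }
  return {};
}
```

      In a real out-of-process JIT the addresses refer to executor memory and
      the calls run in the executor, so this sketch only mirrors the ordering
      and the success/failure convention.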
      
      These actions allow finalization and deallocation to be extended to include
      operations like registering and deregistering eh-frames, TLS sections,
      initializers and deinitializers, and language metadata sections. Previously these
      operations required separate callWrapper invocations. Compared to callWrapper
      invocations, actions require no extra IPC/RPC, reducing costs and eliminating
      a potential source of errors.
      
      Finalize-lifetime memory can be used to support finalize actions (e.g. with
      extra metadata, or synthesized finalize actions) without incurring permanent
      memory overhead: sections with finalize lifetime should be destroyed by
      memory managers immediately after finalization actions have been run.
      e50aea58
    • Amara Emerson's avatar
      [AArch64][GlobalISel] Fix combiner assertion in matchConstantOp(). · 53ebfa7c
      Amara Emerson authored
      We shouldn't call APInt::getSExtValue() on a >64b value.
      53ebfa7c
    • Guozhi Wei's avatar
      [TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation · 6599961c
      Guozhi Wei authored
      This patch contains the following enhancements to SrcRegMap and DstRegMap:
      
        1 In findOnlyInterestingUse, check not only whether the Reg is a two-address
          use, but also whether it can become one after commutation.
      
        2 If a physical register is clobbered, remove SrcRegMap entries that are
          mapped to it.
      
        3 In processTiedPairs, when creating a new COPY instruction, add a SrcRegMap
          entry only when the COPY instruction is coalescable (that is, the COPY
          src is killed).
      
      With these enhancements isProfitableToCommute can make better commute
      decisions, and ultimately more register copies are removed.
      
      Differential Revision: https://reviews.llvm.org/D108731
      6599961c
  2. Oct 11, 2021