  1. May 21, 2020
  2. May 20, 2020
    • Nico Weber's avatar
      Give microsoftDemangle() an outparam for how many input bytes were consumed. · bc1c3655
      Nico Weber authored
      Demangling Itanium symbols either consumes the whole input or fails,
      but Microsoft symbols can be successfully demangled with just some
      of the input.
      
      Add an outparam that enables clients to know how much of the input was
      consumed, and use this flag to give llvm-undname an opt-in warning
      on partially consumed symbols.
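      
      The outparam pattern described above can be sketched with a toy parser.
      This is only an illustration: `toyDemangle`, `partiallyConsumed`, and the
      parsing rule are hypothetical and do not reflect the real
      microsoftDemangle() signature or logic.

      ```cpp
      #include <cassert>
      #include <cctype>
      #include <cstddef>
      #include <string>

      // Hypothetical toy parser, NOT the real microsoftDemangle(): it accepts a
      // leading '?' followed by identifier characters and stops at the first
      // character it does not understand. If NRead is non-null, it receives the
      // number of input bytes that were consumed.
      bool toyDemangle(const std::string &Mangled, std::string &Out,
                       std::size_t *NRead) {
        if (NRead)
          *NRead = 0;
        if (Mangled.empty() || Mangled[0] != '?')
          return false; // Nothing was consumed on failure.

        std::size_t I = 1;
        while (I < Mangled.size() &&
               (std::isalnum(static_cast<unsigned char>(Mangled[I])) ||
                Mangled[I] == '_'))
          ++I;

        assert(I <= Mangled.size() && "cannot consume more than the input");
        Out = Mangled.substr(1, I - 1);
        if (NRead)
          *NRead = I;
        return true;
      }

      // A client compares the consumed count with the input length to decide
      // whether to warn about trailing, unparsed bytes.
      bool partiallyConsumed(const std::string &Mangled) {
        std::string Out;
        std::size_t NRead = 0;
        return toyDemangle(Mangled, Out, &NRead) && NRead < Mangled.size();
      }
      ```

      With this toy rule, `partiallyConsumed("?foo@@junk")` is true because
      parsing stops at the first `@`, while `partiallyConsumed("?foo")` is false.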
      
      Differential Revision: https://reviews.llvm.org/D80173
      bc1c3655
    • Roman Lebedev's avatar
      [InstCombine] `insertelement` is negatible if both sources are negatible · 55430f53
      Roman Lebedev authored
      ----------------------------------------
      define <2 x i4> @negate_insertelement(<2 x i4> %src, i4 %a, i32 %x, <2 x i4> %b) {
      %0:
        %t0 = sub <2 x i4> { 0, 0 }, %src
        %t1 = sub i4 0, %a
        %t2 = insertelement <2 x i4> %t0, i4 %t1, i32 %x
        %t3 = sub <2 x i4> %b, %t2
        ret <2 x i4> %t3
      }
      =>
      define <2 x i4> @negate_insertelement(<2 x i4> %src, i4 %a, i32 %x, <2 x i4> %b) {
      %0:
        %t2.neg = insertelement <2 x i4> %src, i4 %a, i32 %x
        %t3 = add <2 x i4> %t2.neg, %b
        ret <2 x i4> %t3
      }
      Transformation seems to be correct!
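      
      As an informal cross-check of the identity above, the two forms can be
      compared exhaustively in plain C++. This is a sketch, not part of the
      patch: the helper names are hypothetical, i4 lanes are modeled as the low
      4 bits of a byte, and only in-range indices (0 and 1) are covered.

      ```cpp
      #include <array>
      #include <cstdint>

      // Each lane models an i4 value stored in the low 4 bits.
      using V2 = std::array<std::uint8_t, 2>;

      // Wrapping 4-bit subtraction and addition.
      std::uint8_t sub4(unsigned A, unsigned B) {
        return static_cast<std::uint8_t>((A - B) & 0xFu);
      }
      std::uint8_t add4(unsigned A, unsigned B) {
        return static_cast<std::uint8_t>((A + B) & 0xFu);
      }

      // Source pattern: b - insertelement(0 - src, 0 - a, x).
      V2 before(V2 Src, std::uint8_t A, unsigned X, V2 B) {
        V2 T0 = {sub4(0, Src[0]), sub4(0, Src[1])};
        T0[X] = sub4(0, A);
        return {sub4(B[0], T0[0]), sub4(B[1], T0[1])};
      }

      // Target pattern: insertelement(src, a, x) + b.
      V2 after(V2 Src, std::uint8_t A, unsigned X, V2 B) {
        Src[X] = A;
        return {add4(Src[0], B[0]), add4(Src[1], B[1])};
      }

      // Tries every i4 lane value and both in-range indices.
      bool equivalentForAllInputs() {
        for (std::uint8_t S0 = 0; S0 < 16; ++S0)
          for (std::uint8_t S1 = 0; S1 < 16; ++S1)
            for (std::uint8_t A = 0; A < 16; ++A)
              for (std::uint8_t B0 = 0; B0 < 16; ++B0)
                for (std::uint8_t B1 = 0; B1 < 16; ++B1)
                  for (unsigned X = 0; X < 2; ++X) {
                    V2 Src = {S0, S1};
                    V2 B = {B0, B1};
                    if (before(Src, A, X, B) != after(Src, A, X, B))
                      return false;
                  }
        return true;
      }
      ```

      The check holds because subtracting a negated lane is the same as adding
      the original lane in wrapping arithmetic, whichever lane the insert hits.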
      55430f53
    • Roman Lebedev's avatar
      [InstCombine] Negator: `extractelement` is negatible if src is negatible · ebed96fd
      Roman Lebedev authored
      ----------------------------------------
      define i4 @negate_extractelement(<2 x i4> %x, i32 %y, i4 %z) {
      %0:
        %t0 = sub <2 x i4> { 0, 0 }, %x
        call void @use_v2i4(<2 x i4> %t0)
        %t1 = extractelement <2 x i4> %t0, i32 %y
        %t2 = sub i4 %z, %t1
        ret i4 %t2
      }
      =>
      define i4 @negate_extractelement(<2 x i4> %x, i32 %y, i4 %z) {
      %0:
        %t0 = sub <2 x i4> { 0, 0 }, %x
        call void @use_v2i4(<2 x i4> %t0)
        %t1.neg = extractelement <2 x i4> %x, i32 %y
        %t2 = add i4 %t1.neg, %z
        ret i4 %t2
      }
      Transformation seems to be correct!
      ebed96fd
    • aartbik's avatar
      [llvm] [CodeGen] [X86] Fix issues with v4i1 instruction selection · 645bba8d
      aartbik authored
      Summary:
      Fixes issue
      https://bugs.llvm.org/show_bug.cgi?id=45995
      
      Reviewers: mehdi_amini, nicolasvasilache, reidtatge, craig.topper, ftynse, bkramer
      
      Reviewed By: craig.topper
      
      Subscribers: RKSimon, hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D80231
      645bba8d
    • Arthur Eubanks's avatar
      Reland [X86] Codegen for preallocated · 8a887556
      Arthur Eubanks authored
      See https://reviews.llvm.org/D74651 for the preallocated IR constructs
      and LangRef changes.
      
      In X86TargetLowering::LowerCall(), if a call is preallocated, record
      each argument's offset from the stack pointer and the total stack
      adjustment. Associate the call Value with an integer index. Store the
      info in X86MachineFunctionInfo with the integer index as the key.
      
      This adds two new target independent ISDOpcodes and two new target
      dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.
      
      The setup ISelDAG node takes in a chain and outputs a chain and a
      SrcValue of the preallocated call Value. It is lowered to a target
      dependent node with the SrcValue replaced with the integer index key by
      looking in X86MachineFunctionInfo. In
      X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
      %esp adjustment, the exact amount determined by looking in
      X86MachineFunctionInfo with the integer index key.
      
      The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
      call Value, and the arg index int constant. It produces a chain and the
      pointer to the arg. It is lowered to a target dependent node with the
      SrcValue replaced with the integer index key by looking in
      X86MachineFunctionInfo. In
      X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
      lea of the stack pointer plus an offset determined by looking in
      X86MachineFunctionInfo with the integer index key.
      
      Force any function containing a preallocated call to use the frame
      pointer.
      
      Does not yet handle a setup without a call, or a conditional call.
      Does not yet handle musttail. That requires a LangRef change first.
      
      Tried to look at all references to inalloca and see if they apply to
      preallocated. I've made preallocated versions of tests testing inalloca
      whenever possible and when they make sense (e.g. not alloca related,
      inalloca edge cases).
      
      Aside from the tests added here, I checked that this codegen produces
      correct code for something like
      
      ```
      struct A {
              A();
              A(A&&);
              ~A();
      };
      
      void bar() {
              foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
      }
      ```
      
      by replacing the inalloca version of the .ll file with the appropriate
      preallocated code. Running the executable produces the same results as
      using the current inalloca implementation.
      
      Reverted due to unexpectedly passing tests, added REQUIRES: asserts for reland.
      
      Subscribers: hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D77689
      8a887556
    • Arthur Eubanks's avatar
      Revert "[X86] Codegen for preallocated" · b8cbff51
      Arthur Eubanks authored
      This reverts commit 810567dc.
      
      Some tests are unexpectedly passing
      b8cbff51
    • Hiroshi Yamauchi's avatar
      [ProfileSummary] Refactor getFromMD to prepare for another optional field. NFC. · f9a6163f
      Hiroshi Yamauchi authored
      Summary:
      Rename 'i' to 'I'.
      Factor out the optional field handling to getOptionalVal().
      Split out of D79951.
      
      Reviewers: davidxl
      
      Subscribers: eraman, hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D80230
      f9a6163f
    • Arthur Eubanks's avatar
      [X86] Codegen for preallocated · 810567dc
      Arthur Eubanks authored
      See https://reviews.llvm.org/D74651 for the preallocated IR constructs
      and LangRef changes.
      
      In X86TargetLowering::LowerCall(), if a call is preallocated, record
      each argument's offset from the stack pointer and the total stack
      adjustment. Associate the call Value with an integer index. Store the
      info in X86MachineFunctionInfo with the integer index as the key.
      
      This adds two new target independent ISDOpcodes and two new target
      dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.
      
      The setup ISelDAG node takes in a chain and outputs a chain and a
      SrcValue of the preallocated call Value. It is lowered to a target
      dependent node with the SrcValue replaced with the integer index key by
      looking in X86MachineFunctionInfo. In
      X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
      %esp adjustment, the exact amount determined by looking in
      X86MachineFunctionInfo with the integer index key.
      
      The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
      call Value, and the arg index int constant. It produces a chain and the
      pointer to the arg. It is lowered to a target dependent node with the
      SrcValue replaced with the integer index key by looking in
      X86MachineFunctionInfo. In
      X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
      lea of the stack pointer plus an offset determined by looking in
      X86MachineFunctionInfo with the integer index key.
      
      Force any function containing a preallocated call to use the frame
      pointer.
      
      Does not yet handle a setup without a call, or a conditional call.
      Does not yet handle musttail. That requires a LangRef change first.
      
      Tried to look at all references to inalloca and see if they apply to
      preallocated. I've made preallocated versions of tests testing inalloca
      whenever possible and when they make sense (e.g. not alloca related,
      inalloca edge cases).
      
      Aside from the tests added here, I checked that this codegen produces
      correct code for something like
      
      ```
      struct A {
              A();
              A(A&&);
              ~A();
      };
      
      void bar() {
              foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
      }
      ```
      
      by replacing the inalloca version of the .ll file with the appropriate
      preallocated code. Running the executable produces the same results as
      using the current inalloca implementation.
      
      Subscribers: hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D77689
      810567dc
    • Matt Arsenault's avatar
      AMDGPU/GlobalISel: Fix splitting 64-bit extensions · e8f6b0e5
      Matt Arsenault authored
      This was replicating the low bits into the high bits for G_ZEXT,
      rather than using 0.
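      
      The difference between the buggy and the fixed split can be sketched in
      plain C++. The helper names below are hypothetical and this is not the
      AMDGPU code; it only models a 64-bit result as a (lo, hi) pair of 32-bit
      halves.

      ```cpp
      #include <cstdint>
      #include <utility>

      // (lo, hi) 32-bit halves of a 64-bit value.
      using Halves = std::pair<std::uint32_t, std::uint32_t>;

      // Correct split of a zero extension: the high half is 0.
      Halves splitZExt64(std::uint32_t Src) { return {Src, 0u}; }

      // The bug described above: the low bits are replicated into the high half.
      Halves buggySplitZExt64(std::uint32_t Src) { return {Src, Src}; }

      // Reassembles the halves so the result can be compared with a plain
      // 32-bit to 64-bit widening conversion.
      std::uint64_t joinHalves(Halves H) {
        return (static_cast<std::uint64_t>(H.second) << 32) | H.first;
      }
      ```

      For any input, `joinHalves(splitZExt64(X))` equals
      `static_cast<std::uint64_t>(X)`, while the buggy variant differs whenever
      `X` is nonzero.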
      e8f6b0e5
    • Pierre-vh's avatar
      [Target][ARM] Make Low Overhead Loops coexist with VPT blocks. · 835251f7
      Pierre-vh authored
      Previously, the LowOverheadLoops pass couldn't handle VPT blocks
      with conditions, or with multiple VCTPs. This patch improves the
      LowOverheadLoops pass so it can handle those cases.
      
      It also adds support for VCMPs before the VCTP.
      
      Differential Revision: https://reviews.llvm.org/D78206
      835251f7
    • Sam Parker's avatar
      [NFCI][CostModel] Refactor getIntrinsicInstrCost · 8cc911fa
      Sam Parker authored
      Combine the two API calls into one by introducing a structure to hold
      the relevant data. This has the added benefit of moving the boilerplate
      code for arguments and flags into the constructors. This is
      intended to be a non-functional change, but the complicated web of
      logic involved here makes it very hard to guarantee.
      
      Differential Revision: https://reviews.llvm.org/D79941
      8cc911fa
    • Georgii Rymar's avatar
      [yaml2obj] - Implement the "Offset" property for the Fill Chunk. · baf32259
      Georgii Rymar authored
      Similar to a regular section chunk, a Fill should have this property.
      This patch implements it.
      
      Differential revision: https://reviews.llvm.org/D80190
      baf32259
    • Florian Hahn's avatar
      [SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC). · bcbd26bf
      Florian Hahn authored
      SCEVExpander modifies the underlying function so it is more suitable in
      Transforms/Utils, rather than Analysis. This allows using other
      transform utils in SCEVExpander.
      
      This patch was originally committed as b8a3c34e, but broke the
      modules build, as LoopAccessAnalysis was using the Expander.
      
      The code-gen part of LAA was moved to lib/Transforms recently, so this
      patch can be landed again.
      
      Reviewers: sanjoy.google, efriedma, reames
      
      Reviewed By: sanjoy.google
      
      Differential Revision: https://reviews.llvm.org/D71537
      bcbd26bf
    • Kang Zhang's avatar
      [PowerPC] Enable machine verification for 3 passes · 3f376eca
      Kang Zhang authored
      Summary:
      For PowerPC, three passes have machine verification disabled:
      ```
      PPCTargetMachine.cpp:    addPass(&LiveVariablesID, false);
      PPCTargetMachine.cpp:    addPass(createPPCEarlyReturnPass(), false);
      PPCTargetMachine.cpp:  addPass(createPPCBranchSelectionPass(), false);
      ```
      This patch enables machine verification for the above three passes.
      
      Reviewed By: steven.zhang
      
      Differential Revision: https://reviews.llvm.org/D79840
      3f376eca
    • Simon Pilgrim's avatar
      CommandFlags.h - remove unnecessary includes. NFC. · d9b9ce6c
      Simon Pilgrim authored
      Replace with forward declarations and move necessary includes down to source files.
      
      Exposes an implicit dependency on TargetMachine.h in llvm-opt-fuzzer.cpp
      d9b9ce6c
    • Jay Foad's avatar
      [IR] Simplify BasicBlock::removePredecessor. NFCI. · e5fc9a36
      Jay Foad authored
      This is the second attempt at landing this patch, after fixing the
      KeepOneInputPHIs behaviour to also keep zero input PHIs.
      
      Differential Revision: https://reviews.llvm.org/D80141
      e5fc9a36
    • Jay Foad's avatar
      Revert "[IR] Simplify BasicBlock::removePredecessor. NFCI." · b42b30c3
      Jay Foad authored
      This reverts commit 59f49f7e.
      
      It was causing buildbot failures.
      b42b30c3
    • QingShan Zhang's avatar
      [DAGCombine] Remove the getNegatibleCost to avoid the out of sync with getNegatedExpression · 2b59e9f1
      QingShan Zhang authored
      getNegatibleCost and getNegatedExpression are used to evaluate the cost of negating an expression and to perform the negation.
      However, negating the expression changes the DAG, so the cost might change while we are negating.
      Because the previously computed cost can no longer be trusted, we may negate the wrong expression and hit an assertion.
      
      This patch removes getNegatibleCost so that it cannot get out of sync with getNegatedExpression,
      and checks the cost while negating the expression. It also reduces the code duplicated between
      getNegatibleCost and getNegatedExpression, and fixes the crash for the test in D76638.
      
      Reviewed By: RKSimon, spatel
      
      Differential Revision: https://reviews.llvm.org/D77319
      2b59e9f1
    • Matt Arsenault's avatar
      AMDGPU: Annotate functions that have stack objects · 21d2884a
      Matt Arsenault authored
      Relying on any MachineFunction state in the MachineFunctionInfo
      constructor is hazardous, because the construction time is unclear and
      determined by the first use. The function may be only partially
      constructed, which is part of why we have many of these hacky string
      attributes to track what we need for ABI lowering.
      
      For SelectionDAG, all stack objects are created up-front before
      calling convention lowering so stack objects are visible at
      construction time. For GlobalISel, none of the IR function has been
      visited yet and the allocas haven't been added to the MachineFrameInfo
      yet. This should fix failing to set flat_scratch_init in GlobalISel
      when needed.
      
      This pass really needs to be turned into some kind of analysis, but I
      haven't found a nice way to use one here.
      21d2884a
    • Matt Arsenault's avatar
      GlobalISel: Copy correct flags to select · 08ae9453
      Matt Arsenault authored
      This was looking for a compare condition, and copying the compare
      flags. I don't think this was ever correct outside of certain min/max
      patterns which aren't checked, but this probably predates select
      instructions having fast math flags.
      08ae9453
    • Matt Arsenault's avatar
      AMDGPU: Fix DAG divergence for implicit function arguments · 074b8026
      Matt Arsenault authored
      This should be directly implied from the register class, and there's
      no need to special case live ins here. This was getting the wrong
      answer for the queue ptr argument in callable functions, since it's
      not an explicit IR argument and is always uniform.
      
      Fixes not using scalar loads for the aperture in addrspacecast
      lowering, and any other places that use implicit SGPR arguments.
      074b8026
    • Matt Arsenault's avatar
      AMDGPU: Use member initializers in MFI · 61813b80
      Matt Arsenault authored
      61813b80
    • Brian Cain's avatar
      [Hexagon] pX.new cannot be used with p3:0 as producer · cfba1a96
      Brian Cain authored
      Writes to p3:0 do not produce new values, so we should bar any .new
      consumer from using it as a producer.
      cfba1a96
  3. May 19, 2020