  1. May 21, 2020
    • Yevgeny Rouban's avatar
      [BranchProbabilityInfo] Set edge probabilities at once and fix calcMetadataWeights() · 81384874
      Yevgeny Rouban authored
      Hide the method that sets the probability of a single edge and
      introduce a public method that sets the probabilities of all
      outgoing edges at once.
      Setting individual edge probabilities is error-prone. Moreover, it is
      difficult to check that the total probability is 1.0 because there is
      no easy way to know when the user has finished setting all
      the probabilities.
      
      A related bug is fixed in BranchProbabilityInfo::calcMetadataWeights().
      Changing unreachable branch probabilities to raw(1) and distributing
      the rest (oldProbability - raw(1)) over the reachable branches could
      introduce a total probability inaccuracy bigger than 1/numOfBranches.
      
      Reviewers: yamauchi, ebrevnov
      Tags: #llvm
      Differential Revision: https://reviews.llvm.org/D79396
      81384874
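The motivation above can be illustrated with a minimal sketch. It is not the BranchProbabilityInfo API: `EdgeProbabilities`, `setAll`, `evenSplit`, the fixed-point `Denominator` and the one-unit-per-edge rounding slack are all hypothetical names and assumptions chosen for illustration. The point is only that a single call receiving every outgoing edge gives a well-defined place to check that the probabilities sum to one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Assumed fixed-point scale: a probability is Numerator / Denominator.
constexpr uint32_t Denominator = 1u << 31;

// Hypothetical stand-in for the per-block table of outgoing edge probabilities.
class EdgeProbabilities {
  std::vector<uint32_t> Probs;

public:
  // Sets the probabilities of all outgoing edges in one call, so the
  // "sums to one" invariant can be checked right here.
  void setAll(const std::vector<uint32_t> &NewProbs) {
    assert(!NewProbs.empty() && "expected at least one outgoing edge");
    uint64_t Total =
        std::accumulate(NewProbs.begin(), NewProbs.end(), uint64_t{0});
    // Assumed tolerance: at most one unit of rounding error per edge.
    uint64_t Slack = NewProbs.size();
    assert(Total + Slack >= Denominator && Total <= Denominator + Slack &&
           "edge probabilities must sum to one");
    Probs = NewProbs;
  }

  uint32_t get(size_t Idx) const { return Probs.at(Idx); }
  size_t size() const { return Probs.size(); }
};

// Splits the whole probability mass evenly; the division remainder goes to
// the first edge so that the total is exactly Denominator.
std::vector<uint32_t> evenSplit(size_t NumEdges) {
  assert(NumEdges > 0 && "cannot split over zero edges");
  const uint32_t Share = static_cast<uint32_t>(Denominator / NumEdges);
  std::vector<uint32_t> Result(NumEdges, Share);
  Result[0] += static_cast<uint32_t>(Denominator % NumEdges);
  return Result;
}
```

With a per-edge setter there is no comparable point at which such a check could run, because the table is legitimately incomplete between calls.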
    • Craig Topper's avatar
      [LegalizeDAG] Modify ExpandLegalINT_TO_FP to swap data for little/big endian instead of the pointers. · ae5ab2f4
      Craig Topper authored
      
      This will make it easier to pass the pointer info and alignment
      correctly to the loads/stores.
      
      While there, also make the i32 stores independent and use a token
      factor to join them before the load.
      ae5ab2f4
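As a rough illustration of "swap the data, not the pointers", the sketch below converts an unsigned 32-bit integer to a double by writing two 32-bit words into a buffer and reading the buffer back as a double. This is an assumption about the general shape of such an expansion, not a transcription of ExpandLegalINT_TO_FP; the function names are hypothetical, and the code assumes IEEE-754 doubles whose byte order matches the integer byte order of the host.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Detects the host byte order at run time.
bool hostIsLittleEndian() {
  const uint32_t One = 1;
  unsigned char FirstByte;
  std::memcpy(&FirstByte, &One, 1);
  return FirstByte == 1;
}

// Builds the double 2^52 + Value from two 32-bit words, then subtracts 2^52.
double uint32ToDouble(uint32_t Value) {
  // High word of an IEEE-754 double with exponent 2^52 and a zero upper
  // mantissa; the low word of the mantissa will hold Value.
  const uint32_t HighWord = 0x43300000u;

  // The two slots always sit at offsets 0 and 4. Only the data written to
  // them depends on the byte order, so each store has a fixed address.
  uint32_t Words[2];
  if (hostIsLittleEndian()) {
    Words[0] = Value;
    Words[1] = HighWord;
  } else {
    Words[0] = HighWord;
    Words[1] = Value;
  }

  double Biased;
  static_assert(sizeof(Biased) == sizeof(Words), "expected a 64-bit double");
  std::memcpy(&Biased, Words, sizeof(Biased));
  return Biased - 4503599627370496.0; // subtract 2^52
}
```

Because the two stores never overlap and their addresses are fixed, neither has to be ordered after the other; only the final load has to wait for both.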
    • Juneyoung Lee's avatar
      Add CanonicalizeFreezeInLoops pass · d9a4a244
      Juneyoung Lee authored
      Summary:
      If an induction variable is frozen and used, SCEV yields an imprecise result
      because it doesn't say anything about frozen variables.
      
      For this reason, a performance regression appeared after
      https://reviews.llvm.org/D76483 was merged: SCEV yielded an imprecise
      result, which prevented LSR from optimizing a loop.
      
      The suggested solution here is to add a pass that canonicalizes frozen variables
      inside a loop. Specifically, it pushes freezes out of the loop by freezing
      the initial value and step values instead, and by dropping nsw/nuw flags from the instructions used by the freeze.
      This solution was also mentioned at https://reviews.llvm.org/D70623 .
      
      Reviewers: spatel, efriedma, lebedev.ri, fhahn, jdoerfert
      
      Reviewed By: fhahn
      
      Subscribers: nikic, mgorny, hiraditya, javed.absar, llvm-commits, sanwou01, nlopes
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D77523
      d9a4a244
    • Eli Friedman's avatar
      [AArch64] Fix unwind info generated by outliner. · b4f9b347
      Eli Friedman authored
      The offsets were wrong. The result is now the same as what the compiler
      would generate for a function that spills lr normally.
      
      Differential Revision: https://reviews.llvm.org/D80238
      b4f9b347
    • Eli Friedman's avatar
      Make Value::getPointerAlignment() return an Align, not a MaybeAlign. · f26bdb53
      Eli Friedman authored
      If we don't know anything about the alignment of a pointer, Align(1) is
      still correct: all pointers are at least 1-byte aligned.
      
      Included in this patch is a bugfix for an issue discovered during this
      cleanup: pointers with "dereferenceable" attributes/metadata were
      assumed to be aligned according to the type of the pointer.  This
      wasn't intentional, as far as I can tell, so Loads.cpp was fixed to
      stop making this assumption. Frontends may need to be updated.  I
      updated clang's handling of C++ references, and added a release note for
      this.
      
      Differential Revision: https://reviews.llvm.org/D80072
      f26bdb53
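The reasoning behind the return-type change can be sketched as follows. The types below are hypothetical stand-ins, not LLVM's `Align` and `MaybeAlign`; a value of 0 is used here, by assumption, to mean "nothing is known about this pointer".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical alignment type: always a valid alignment, at least 1 byte.
struct Alignment {
  uint64_t Bytes = 1;
};

// Hypothetical "maybe" variant: callers must handle the unknown state.
struct MaybeAlignment {
  bool Known = false;
  uint64_t Bytes = 0;
};

// Old-style query: returns an explicit "unknown" when nothing is known.
MaybeAlignment maybeAlignmentOf(uint64_t KnownBytes) {
  if (KnownBytes == 0)
    return MaybeAlignment{};
  return MaybeAlignment{true, KnownBytes};
}

// New-style query: falls back to 1, which is correct for every pointer.
Alignment alignmentOf(uint64_t KnownBytes) {
  return Alignment{KnownBytes == 0 ? 1 : KnownBytes};
}

// True if Address is a multiple of the alignment.
bool isAligned(Alignment A, uint64_t Address) {
  assert(A.Bytes != 0 && "alignment must be non-zero");
  return Address % A.Bytes == 0;
}
```

With the second form, a caller can pass the result straight to something like `isAligned` without first deciding what an unknown alignment should mean.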
    • Francis Visoiu Mistrih's avatar
      [AArch64] Provide Darwin variants of most calling conventions · 161122ea
      Francis Visoiu Mistrih authored
      With the new SVE stack layout, we now need to provide a Darwin variant
      for all the calling conventions based on the main AAPCS CSR save order.
      
      This also changes APCS_SwiftError to have a Darwin and a non-Darwin
      version, assuming it could be used on other platforms these days, and
      restricts the AArch64_CXX_TLS calling convention to Darwin.
      
      Differential Revision: https://reviews.llvm.org/D73805
      161122ea
    • Stanislav Mekhanoshin's avatar
      [AMDGPU] Always expand ext/insertelement with divergent idx · 4eecf171
      Stanislav Mekhanoshin authored
      Even though a series of cmp/cndmask can produce quite a lot of
      code, that is still better than a loop. In the case of doubles we
      would even produce two loops.
      
      Differential Revision: https://reviews.llvm.org/D80032
      4eecf171
    • Craig Topper's avatar
      [LegalizeVectorTypes] Create correct memoperands in SplitVecRes_INSERT_SUBVECTOR. · 17bd86bc
      Craig Topper authored
      Previously this code just used a default constructed
      MachinePointerInfo. But we know the accesses are to a fixed stack
      object or at least somewhere on the stack.
      
      While there, fix the alignment passed to the full vector load/stores.
      
      I don't think this function is currently exercised in tree so I
      don't know how to test it. I just noticed it when I removed
      non-constant index support in this function.
      
      Differential Revision: https://reviews.llvm.org/D80058
      17bd86bc
  2. May 20, 2020
    • Nico Weber's avatar
      Give microsoftDemangle() an outparam for how many input bytes were consumed. · bc1c3655
      Nico Weber authored
      Demangling Itanium symbols either consumes the whole input or fails,
      but Microsoft symbols can be successfully demangled with just some
      of the input.
      
      Add an outparam that enables clients to know how much of the input was
      consumed, and use it to give llvm-undname an opt-in warning
      on partially consumed symbols.
      
      Differential Revision: https://reviews.llvm.org/D80173
      bc1c3655
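The idea of a "bytes consumed" outparam can be shown with a toy parser. Nothing below is the real `microsoftDemangle()` signature or the Microsoft mangling grammar: `toyDemangle`, `partiallyConsumed` and the `?name@@` format are invented for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy stand-in for a demangler. It accepts "?<name>@@" and ignores whatever
// follows. On success it returns <name> and, if NConsumed is non-null,
// stores how many input bytes were used. On failure it returns "".
std::string toyDemangle(const std::string &Mangled, size_t *NConsumed) {
  if (NConsumed)
    *NConsumed = 0;
  if (Mangled.empty() || Mangled[0] != '?')
    return "";
  const size_t End = Mangled.find("@@", 1);
  if (End == std::string::npos || End == 1)
    return "";
  if (NConsumed)
    *NConsumed = End + 2; // '?' + name + "@@"
  return Mangled.substr(1, End - 1);
}

// A client can use the outparam to warn about trailing, unparsed input.
bool partiallyConsumed(const std::string &Mangled) {
  size_t N = 0;
  const std::string Name = toyDemangle(Mangled, &N);
  return !Name.empty() && N < Mangled.size();
}
```

Keeping the outparam optional (a null pointer is accepted) lets existing callers that do not care about partial consumption stay unchanged.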
    • Roman Lebedev's avatar
      [InstCombine] `insertelement` is negatible if both sources are negatible · 55430f53
      Roman Lebedev authored
      ----------------------------------------
      define <2 x i4> @negate_insertelement(<2 x i4> %src, i4 %a, i32 %x, <2 x i4> %b) {
      %0:
        %t0 = sub <2 x i4> { 0, 0 }, %src
        %t1 = sub i4 0, %a
        %t2 = insertelement <2 x i4> %t0, i4 %t1, i32 %x
        %t3 = sub <2 x i4> %b, %t2
        ret <2 x i4> %t3
      }
      =>
      define <2 x i4> @negate_insertelement(<2 x i4> %src, i4 %a, i32 %x, <2 x i4> %b) {
      %0:
        %t2.neg = insertelement <2 x i4> %src, i4 %a, i32 %x
        %t3 = add <2 x i4> %t2.neg, %b
        ret <2 x i4> %t3
      }
      Transformation seems to be correct!
      55430f53
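The identity behind this fold can also be checked by brute force outside the verifier. The sketch below models `<2 x i4>` as two bytes holding values modulo 16 and compares the two function bodies from the proof above for every input with an in-range index. The helper names are hypothetical, and out-of-range indices, which yield poison in IR, are deliberately not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V2I4 = std::array<uint8_t, 2>; // two lanes, each a value in [0, 15]

// Wraps an integer to 4 bits, mimicking i4 arithmetic.
uint8_t wrap4(int V) { return static_cast<uint8_t>(V & 0xF); }

V2I4 sub(V2I4 L, V2I4 R) { return {wrap4(L[0] - R[0]), wrap4(L[1] - R[1])}; }
V2I4 add(V2I4 L, V2I4 R) { return {wrap4(L[0] + R[0]), wrap4(L[1] + R[1])}; }

V2I4 insertElement(V2I4 Vec, uint8_t Elt, unsigned Idx) {
  Vec[Idx] = Elt;
  return Vec;
}

// Source pattern: %b - insertelement(0 - %src, 0 - %a, %x)
V2I4 before(V2I4 Src, uint8_t A, unsigned X, V2I4 B) {
  V2I4 T0 = sub(V2I4{0, 0}, Src);
  uint8_t T1 = wrap4(0 - A);
  V2I4 T2 = insertElement(T0, T1, X);
  return sub(B, T2);
}

// Target pattern: insertelement(%src, %a, %x) + %b
V2I4 after(V2I4 Src, uint8_t A, unsigned X, V2I4 B) {
  return add(insertElement(Src, A, X), B);
}

// Exhaustively compares both patterns over all 4-bit inputs and both indices.
bool transformHolds() {
  for (int S0 = 0; S0 < 16; ++S0)
    for (int S1 = 0; S1 < 16; ++S1)
      for (int A = 0; A < 16; ++A)
        for (unsigned X = 0; X < 2; ++X)
          for (int B0 = 0; B0 < 16; ++B0)
            for (int B1 = 0; B1 < 16; ++B1) {
              V2I4 Src = {wrap4(S0), wrap4(S1)};
              V2I4 B = {wrap4(B0), wrap4(B1)};
              if (before(Src, wrap4(A), X, B) != after(Src, wrap4(A), X, B))
                return false;
            }
  return true;
}
```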
    • Roman Lebedev's avatar
      [InstCombine] Negator: `extractelement` is negatible if src is negatible · ebed96fd
      Roman Lebedev authored
      ----------------------------------------
      define i4 @negate_extractelement(<2 x i4> %x, i32 %y, i4 %z) {
      %0:
        %t0 = sub <2 x i4> { 0, 0 }, %x
        call void @use_v2i4(<2 x i4> %t0)
        %t1 = extractelement <2 x i4> %t0, i32 %y
        %t2 = sub i4 %z, %t1
        ret i4 %t2
      }
      =>
      define i4 @negate_extractelement(<2 x i4> %x, i32 %y, i4 %z) {
      %0:
        %t0 = sub <2 x i4> { 0, 0 }, %x
        call void @use_v2i4(<2 x i4> %t0)
        %t1.neg = extractelement <2 x i4> %x, i32 %y
        %t2 = add i4 %t1.neg, %z
        ret i4 %t2
      }
      Transformation seems to be correct!
      ebed96fd
    • aartbik's avatar
      [llvm] [CodeGen] [X86] Fix issues with v4i1 instruction selection · 645bba8d
      aartbik authored
      Summary:
      Fixes issue
      https://bugs.llvm.org/show_bug.cgi?id=45995
      
      Reviewers: mehdi_amini, nicolasvasilache, reidtatge, craig.topper, ftynse, bkramer
      
      Reviewed By: craig.topper
      
      Subscribers: RKSimon, hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D80231
      645bba8d
    • Arthur Eubanks's avatar
      Reland [X86] Codegen for preallocated · 8a887556
      Arthur Eubanks authored
      See https://reviews.llvm.org/D74651 for the preallocated IR constructs
      and LangRef changes.
      
      In X86TargetLowering::LowerCall(), if a call is preallocated, record
      each argument's offset from the stack pointer and the total stack
      adjustment. Associate the call Value with an integer index. Store the
      info in X86MachineFunctionInfo with the integer index as the key.
      
      This adds two new target independent ISDOpcodes and two new target
      dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.
      
      The setup ISelDAG node takes in a chain and outputs a chain and a
      SrcValue of the preallocated call Value. It is lowered to a target
      dependent node with the SrcValue replaced with the integer index key by
      looking in X86MachineFunctionInfo. In
      X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
      %esp adjustment, the exact amount determined by looking in
      X86MachineFunctionInfo with the integer index key.
      
      The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
      call Value, and the arg index int constant. It produces a chain and the
      pointer to the arg. It is lowered to a target dependent node with the
      SrcValue replaced with the integer index key by looking in
      X86MachineFunctionInfo. In
      X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
      lea of the stack pointer plus an offset determined by looking in
      X86MachineFunctionInfo with the integer index key.
      
      Force any function containing a preallocated call to use the frame
      pointer.
      
      Does not yet handle a setup without a call, or a conditional call.
      Does not yet handle musttail. That requires a LangRef change first.
      
      Tried to look at all references to inalloca and see if they apply to
      preallocated. I've made preallocated versions of tests testing inalloca
      whenever possible and when they make sense (e.g. not alloca related,
      inalloca edge cases).
      
      Aside from the tests added here, I checked that this codegen produces
      correct code for something like
      
      ```
      struct A {
              A();
              A(A&&);
              ~A();
      };
      
      void bar() {
              foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
      }
      ```
      
      by replacing the inalloca version of the .ll file with the appropriate
      preallocated code. Running the executable produces the same results as
      using the current inalloca implementation.
      
      Reverted due to unexpectedly passing tests, added REQUIRES: asserts for reland.
      
      Subscribers: hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D77689
      8a887556
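The bookkeeping described in this message (an integer index per preallocated call, used as the key for the stack adjustment and the per-argument offsets) might look roughly like the sketch below. It is not the X86MachineFunctionInfo interface: the class, its method names, the `const void *` call handle and the 4-byte slot rounding are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical record for one preallocated call.
struct PreallocatedLayout {
  size_t StackAdjustment = 0;     // total bytes reserved for the arguments
  std::vector<size_t> ArgOffsets; // offset of each argument from the stack pointer
};

// Hypothetical table keyed by a small integer id per call.
class PreallocatedTable {
  std::map<const void *, size_t> IdOfCall;
  std::vector<PreallocatedLayout> Layouts;

public:
  // Returns the id already associated with Call, or assigns the next one.
  size_t getOrCreateId(const void *Call) {
    auto It = IdOfCall.find(Call);
    if (It != IdOfCall.end())
      return It->second;
    const size_t Id = Layouts.size();
    IdOfCall.emplace(Call, Id);
    Layouts.emplace_back();
    return Id;
  }

  // Records argument offsets and the total adjustment for one call.
  // Assumption: every argument occupies a multiple of 4 bytes.
  void recordLayout(size_t Id, const std::vector<size_t> &ArgSizes) {
    PreallocatedLayout &Layout = Layouts.at(Id);
    Layout.ArgOffsets.clear();
    size_t Offset = 0;
    for (size_t Size : ArgSizes) {
      Layout.ArgOffsets.push_back(Offset);
      Offset += (Size + 3) / 4 * 4;
    }
    Layout.StackAdjustment = Offset;
  }

  // What the "setup" lowering would look up: the stack pointer adjustment.
  size_t stackAdjustment(size_t Id) const {
    return Layouts.at(Id).StackAdjustment;
  }

  // What the "arg" lowering would look up: stack pointer plus this offset.
  size_t argOffset(size_t Id, size_t ArgIdx) const {
    return Layouts.at(Id).ArgOffsets.at(ArgIdx);
  }
};
```

Keying by a plain integer rather than by the call itself means the later lowering steps only need to carry a constant around, which matches the message's description of replacing the SrcValue with the integer index.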
    • Arthur Eubanks's avatar
      Revert "[X86] Codegen for preallocated" · b8cbff51
      Arthur Eubanks authored
      This reverts commit 810567dc.
      
      Some tests are unexpectedly passing
      b8cbff51
    • Hiroshi Yamauchi's avatar
      [ProfileSummary] Refactor getFromMD to prepare for another optional field. NFC. · f9a6163f
      Hiroshi Yamauchi authored
      Summary:
      Rename 'i' to 'I'.
      Factor out the optional field handling to getOptionalVal().
      Split out of D79951.
      
      Reviewers: davidxl
      
      Subscribers: eraman, hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D80230
      f9a6163f
    • Arthur Eubanks's avatar
      [X86] Codegen for preallocated · 810567dc
      Arthur Eubanks authored
      See https://reviews.llvm.org/D74651 for the preallocated IR constructs
      and LangRef changes.
      
      In X86TargetLowering::LowerCall(), if a call is preallocated, record
      each argument's offset from the stack pointer and the total stack
      adjustment. Associate the call Value with an integer index. Store the
      info in X86MachineFunctionInfo with the integer index as the key.
      
      This adds two new target independent ISDOpcodes and two new target
      dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.
      
      The setup ISelDAG node takes in a chain and outputs a chain and a
      SrcValue of the preallocated call Value. It is lowered to a target
      dependent node with the SrcValue replaced with the integer index key by
      looking in X86MachineFunctionInfo. In
      X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
      %esp adjustment, the exact amount determined by looking in
      X86MachineFunctionInfo with the integer index key.
      
      The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
      call Value, and the arg index int constant. It produces a chain and the
      pointer to the arg. It is lowered to a target dependent node with the
      SrcValue replaced with the integer index key by looking in
      X86MachineFunctionInfo. In
      X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
      lea of the stack pointer plus an offset determined by looking in
      X86MachineFunctionInfo with the integer index key.
      
      Force any function containing a preallocated call to use the frame
      pointer.
      
      Does not yet handle a setup without a call, or a conditional call.
      Does not yet handle musttail. That requires a LangRef change first.
      
      Tried to look at all references to inalloca and see if they apply to
      preallocated. I've made preallocated versions of tests testing inalloca
      whenever possible and when they make sense (e.g. not alloca related,
      inalloca edge cases).
      
      Aside from the tests added here, I checked that this codegen produces
      correct code for something like
      
      ```
      struct A {
              A();
              A(A&&);
              ~A();
      };
      
      void bar() {
              foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
      }
      ```
      
      by replacing the inalloca version of the .ll file with the appropriate
      preallocated code. Running the executable produces the same results as
      using the current inalloca implementation.
      
      Subscribers: hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D77689
      810567dc
    • Matt Arsenault's avatar
      AMDGPU/GlobalISel: Fix splitting 64-bit extensions · e8f6b0e5
      Matt Arsenault authored
      This was replicating the low bits into the high bits for G_ZEXT,
      rather than using 0.
      e8f6b0e5
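The difference is easy to see on plain integers. The sketch below splits a 64-bit extension of a 32-bit value into two 32-bit halves; the struct and function names are hypothetical, and `buggyZext` only mimics the behaviour the message describes (low bits replicated into the high half).

```cpp
#include <cassert>
#include <cstdint>

// A 64-bit value represented as two 32-bit halves.
struct Halves {
  uint32_t Lo;
  uint32_t Hi;
};

// Zero extension: the high half is always 0.
Halves zext32To64(uint32_t V) { return {V, 0u}; }

// Sign extension: the high half replicates the sign bit.
Halves sext32To64(uint32_t V) {
  return {V, (V & 0x80000000u) ? 0xFFFFFFFFu : 0u};
}

// The described bug: the low bits are copied into the high half.
Halves buggyZext(uint32_t V) { return {V, V}; }

// Reassembles the two halves into one 64-bit value.
uint64_t join(Halves H) {
  return (static_cast<uint64_t>(H.Hi) << 32) | H.Lo;
}
```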
    • Pierre-vh's avatar
      [Target][ARM] Make Low Overhead Loops coexist with VPT blocks. · 835251f7
      Pierre-vh authored
      Previously, the LowOverheadLoops pass couldn't handle VPT blocks
      with conditions, or with multiple VCTPs. This patch improves the
      LowOverheadLoops pass so it can handle those cases.
      
      It also adds support for VCMPs before the VCTP.
      
      Differential Revision: https://reviews.llvm.org/D78206
      835251f7
    • Sam Parker's avatar
      [NFCI][CostModel] Refactor getIntrinsicInstrCost · 8cc911fa
      Sam Parker authored
      Combine the two API calls into one by introducing a structure to hold
      the relevant data. This has the added benefit of moving the boilerplate
      code for arguments and flags into the constructors. This is
      intended to be a non-functional change, but the complicated web of
      logic involved here makes it very hard to guarantee.
      
      Differential Revision: https://reviews.llvm.org/D79941
      8cc911fa
    • Georgii Rymar's avatar
      [yaml2obj] - Implement the "Offset" property for the Fill Chunk. · baf32259
      Georgii Rymar authored
      Similar to a regular section chunk, a Fill should have this property.
      This patch implements it.
      
      Differential revision: https://reviews.llvm.org/D80190
      baf32259
    • Florian Hahn's avatar
      [SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC). · bcbd26bf
      Florian Hahn authored
      SCEVExpander modifies the underlying function so it is more suitable in
      Transforms/Utils, rather than Analysis. This allows using other
      transform utils in SCEVExpander.
      
      This patch was originally committed as b8a3c34e, but broke the
      modules build, as LoopAccessAnalysis was using the Expander.
      
      The code-gen part of LAA was moved to lib/Transforms recently, so this
      patch can be landed again.
      
      Reviewers: sanjoy.google, efriedma, reames
      
      Reviewed By: sanjoy.google
      
      Differential Revision: https://reviews.llvm.org/D71537
      bcbd26bf
    • Kang Zhang's avatar
      [PowerPC] Enable machine verification for 3 passes · 3f376eca
      Kang Zhang authored
      Summary:
      For PowerPC, three passes have machine verification disabled:
      ```
      PPCTargetMachine.cpp:    addPass(&LiveVariablesID, false);
      PPCTargetMachine.cpp:    addPass(createPPCEarlyReturnPass(), false);
      PPCTargetMachine.cpp:  addPass(createPPCBranchSelectionPass(), false);
      ```
      This patch enables machine verification for the above three passes.
      
      Reviewed By: steven.zhang
      
      Differential Revision: https://reviews.llvm.org/D79840
      3f376eca
    • Simon Pilgrim's avatar
      CommandFlags.h - remove unnecessary includes. NFC. · d9b9ce6c
      Simon Pilgrim authored
      Replace with forward declarations and move necessary includes down to source files.
      
      Exposes an implicit dependency on TargetMachine.h in llvm-opt-fuzzer.cpp
      d9b9ce6c
    • Jay Foad's avatar
      [IR] Simplify BasicBlock::removePredecessor. NFCI. · e5fc9a36
      Jay Foad authored
      This is the second attempt at landing this patch, after fixing the
      KeepOneInputPHIs behaviour to also keep zero input PHIs.
      
      Differential Revision: https://reviews.llvm.org/D80141
      e5fc9a36
    • Jay Foad's avatar
      Revert "[IR] Simplify BasicBlock::removePredecessor. NFCI." · b42b30c3
      Jay Foad authored
      This reverts commit 59f49f7e.
      
      It was causing buildbot failures.
      b42b30c3
    • QingShan Zhang's avatar
      [DAGCombine] Remove the getNegatibleCost to avoid the out of sync with getNegatedExpression · 2b59e9f1
      QingShan Zhang authored
      We have getNegatibleCost/getNegatedExpression to evaluate the cost and negate the expression.
      However, while negating the expression, the cost might change because we are changing the DAG,
      and we then hit an assertion if we negated the wrong expression, as the cost is no longer trustworthy.
      
      This patch removes getNegatibleCost to avoid getting out of sync with getNegatedExpression,
      and checks the cost while negating the expression. It also reduces the duplicated code between
      getNegatibleCost and getNegatedExpression, and fixes the crash for the test in D76638.
      
      Reviewed By: RKSimon, spatel
      
      Differential Revision: https://reviews.llvm.org/D77319
      2b59e9f1
    • Matt Arsenault's avatar
      AMDGPU: Annotate functions that have stack objects · 21d2884a
      Matt Arsenault authored
      Relying on any MachineFunction state in the MachineFunctionInfo
      constructor is hazardous, because the construction time is unclear and
      determined by the first use. The function may be only partially
      constructed, which is part of why we have many of these hacky string
      attributes to track what we need for ABI lowering.
      
      For SelectionDAG, all stack objects are created up-front before
      calling convention lowering so stack objects are visible at
      construction time. For GlobalISel, none of the IR function has been
      visited yet and the allocas haven't been added to the MachineFrameInfo
      yet. This should fix failing to set flat_scratch_init in GlobalISel
      when needed.
      
      This pass really needs to be turned into some kind of analysis, but I
      haven't found a nice way to use one here.
      21d2884a
    • Matt Arsenault's avatar
      GlobalISel: Copy correct flags to select · 08ae9453
      Matt Arsenault authored
      This was looking for a compare condition, and copying the compare
      flags. I don't think this was ever correct outside of certain min/max
      patterns which aren't checked, but this probably predates select
      instructions having fast math flags.
      08ae9453
    • Matt Arsenault's avatar
      AMDGPU: Fix DAG divergence for implicit function arguments · 074b8026
      Matt Arsenault authored
      This should be directly implied from the register class, and there's
      no need to special case live ins here. This was getting the wrong
      answer for the queue ptr argument in callable functions, since it's
      not an explicit IR argument and is always uniform.
      
      Fixes not using scalar loads for the aperture in addrspacecast
      lowering, and any other places that use implicit SGPR arguments.
      074b8026
    • Matt Arsenault's avatar
      AMDGPU: Use member initializers in MFI · 61813b80
      Matt Arsenault authored
      61813b80
    • Brian Cain's avatar
      [Hexagon] pX.new cannot be used with p3:0 as producer · cfba1a96
      Brian Cain authored
      Writes to p3:0 do not produce new values, so we should bar any .new
      consumer from using it as a producer.
      cfba1a96
  3. May 19, 2020