  1. Jul 03, 2018
  2. Jul 02, 2018
    • [WebAssembly] Support for atomic stores · 402b4908
      Heejin Ahn authored
      Summary: Add support for atomic store instructions.
      
      Reviewers: dschuff
      
      Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D48839
      
      llvm-svn: 336145
    • [ARM] Fix PR37382: Don't optimize mul.with.overflow on thumbv6m. · fd10286e
      Vadzim Dambrouski authored
      Reviewers: efriedma, rogfer01, javed.absar
      
      Reviewed By: efriedma, rogfer01
      
      Subscribers: kristof.beyls, chrib, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D48846
      
      llvm-svn: 336144
    • [llvm-mca] Clear the content of map VariantDescriptors in InstrBuilder before... · 9b3cb081
      Andrea Di Biagio authored
      [llvm-mca] Clear the content of map VariantDescriptors in InstrBuilder before we start analyzing a new CodeBlock. NFCI.
      
      Different CodeBlocks don't overlap. The same MCInst cannot appear in more than
      one code block because all blocks are instantiated before the simulation is run.
      
      We should always clear the content of map VariantDescriptors before every
      simulation, since VariantDescriptors cannot possibly store useful information
      for the next blocks. It is also "safer" to clear its content because `MCInst*`
      is used as the key type for map VariantDescriptors.
      
      llvm-svn: 336142
    • [SCEV] Strengthen StrengthenNoWrapFlags (reapply r334428). · c7cef4bc
      Tim Shen authored
      Summary:
      Comment on Transforms/LoopVersioning/incorrect-phi.ll: With the change
      SCEV is able to prove that the loop doesn't wrap-self (due to zext i16
      to i64), disabling the entire loop versioning pass. Removed the zext and
      just use i64.
      
      Reviewers: sanjoy
      
      Subscribers: jlebar, hiraditya, javed.absar, bixia, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D48409
      
      llvm-svn: 336140
    • [WebAssembly] Fix fast-isel optimization of branch conditions. · b01d8762
      Dan Gohman authored
      LLVM doesn't guarantee anything about the high bits of a register holding
      an i1 value at the IR level, so don't translate LLVM IR i1 values directly
      into WebAssembly conditional branch operands. WebAssembly's conditional
      branches do demand all 32 bits be valid.
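
      A minimal sketch of the hazard, assuming a simple integer model of a 32-bit register. The helper names are hypothetical, and masking with 1 is only one way to normalize the value; it is not necessarily what the patch itself does.

```python
MASK32 = 0xFFFFFFFF


def wasm_br_if_taken(operand: int) -> bool:
    """Model of a conditional branch that tests all 32 bits of its operand."""
    return (operand & MASK32) != 0


def normalize_i1(reg: int) -> int:
    """Keep only bit 0, the single bit that an i1 value defines."""
    return reg & 1


# A register whose bit 0 holds i1 'false' but whose high bits are stale.
stale_false = 0xABCD0000

# Using the register directly takes the branch even though the i1 is false.
used_directly = wasm_br_if_taken(stale_false)

# Normalizing first gives the intended result.
used_normalized = wasm_br_if_taken(normalize_i1(stale_false))
```

      Here `used_directly` is true (the wrong answer) and `used_normalized` is false.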
      
      Fixes PR38019.
      
      llvm-svn: 336138
    • [X86] Add phony registers for high halves of regs with low halves · fd974949
      Krzysztof Parzyszek authored
      Add registers still missing after r328016 (D43353):
      - for bits 15-8  of SI, DI, BP, SP (*H), and R8-R15 (*BH),
      - for bits 31-16 of R8-R15 (*WH).
      
      Thanks to Craig Topper for pointing it out.
      
      llvm-svn: 336134
    • Replace "Replacable" with "Replaceable". [NFC] · 0e15501f
      Alina Sbirlea authored
      llvm-svn: 336133
    • Replace unused output filenames with /dev/null in tests · f50ad6c3
      Fangrui Song authored
      Similar to rLLD336129
      
      llvm-svn: 336131
    • [SLP] Recognize min/max pattern using instructions producing same values. · 3b416db1
      Farhana Aleen authored
      Summary: It is common to have the following min/max pattern during the intermediate stages of SLP since we only optimize at the end. This patch tries to catch such patterns and allow more vectorization.
      
               %1 = extractelement <2 x i32> %a, i32 0
               %2 = extractelement <2 x i32> %a, i32 1
               %cond = icmp sgt i32 %1, %2
               %3 = extractelement <2 x i32> %a, i32 0
               %4 = extractelement <2 x i32> %a, i32 1
               %select = select i1 %cond, i32 %3, i32 %4
      
      Author: FarhanaAleen
      
      Reviewed By: ABataev, RKSimon, spatel
      
      Differential Revision: https://reviews.llvm.org/D47608
      
      llvm-svn: 336130
    • [InstCombine] reverse canonicalization of add --> or to allow more shuffle folding · b999d741
      Sanjay Patel authored
      This extends D48485 to allow another pair of binops (add/or) to be combined either
      with or without a leading shuffle:
      or X, C --> add X, C (when X and C have no common bits set)
      
      Here, we need value tracking to determine that the 'or' can be reversed into an 'add',
      and we've added general infrastructure to allow extending to other opcodes or moving 
      to where other passes could use that functionality.
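
      The arithmetic fact behind the reversal can be checked with a small sketch. It assumes fixed-width unsigned integers, and the function names are hypothetical. In LLVM the operands are not concrete numbers, so the "no common bits" condition has to be proven by value tracking rather than computed as it is here.

```python
def no_common_bits(x: int, c: int) -> bool:
    """True when no bit position is set in both operands."""
    return (x & c) == 0


def or_as_add(x: int, c: int, bits: int = 32) -> int:
    """Compute 'or x, c' as 'add x, c'; valid only without common bits."""
    if not no_common_bits(x, c):
        raise ValueError("operands share set bits; or != add")
    return (x + c) & ((1 << bits) - 1)


# With disjoint bits no carry is ever generated, so add and or agree.
disjoint_result = or_as_add(0b10100000, 0b00000101)

# With a shared bit the two operations differ: 3 | 1 == 3, but 3 + 1 == 4.
shared_or, shared_add = 3 | 1, 3 + 1
```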
      
      Differential Revision: https://reviews.llvm.org/D48662
      
      llvm-svn: 336128
    • [MC] Error on a .zerofill directive in a non-virtual section · 4d5b1073
      Francis Visoiu Mistrih authored
      On darwin, all virtual sections have zerofill type, and having a
      .zerofill directive in a non-virtual section is not allowed. Instead of
      asserting, show a nicer error.
      
      In order to use the equivalent of .zerofill in a non-virtual section,
      the usage of .zero or .space is required.
      
      This patch replaces the assert with an error.
      
      Differential Revision: https://reviews.llvm.org/D48517
      
      llvm-svn: 336127
    • nm: Add -no-weak flag for hiding weak symbols · d4f77a52
      Dave Lee authored
      Summary:
      This adds a new -no-weak flag to nm to hide weak symbols in its output.
      This also adds a -W alias for this which is analogous to -U.
      
      Patch by Keith Smiley
      
      Reviewers: kastiglione, enderby, compnerd
      
      Reviewed By: kastiglione
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D48751
      
      llvm-svn: 336126
    • [SLPVectorizer][X86] Begin adding alternate tests for call operators · 35f196c1
      Simon Pilgrim authored
      Alternate opcode handling only supports binary operators; these tests demonstrate a missed opportunity to vectorize ceil/floor calls.
      
      llvm-svn: 336125
    • Tighten up a test for -check-debugify, NFC · 9b6c096f
      Vedant Kumar authored
      Use an -implicit-check-not to make sure an error which should not occur
      in fact does not occur before the first CHECK line.
      
      Suggested by Paul Robinson in post-commit feedback for r335897.
      
      llvm-svn: 336123
    • [CostModel][X86] Add cost tests for fp rounding intrinsics · ac193d4b
      Simon Pilgrim authored
      Add cost tests for fp ceil, floor, nearbyint, rint and trunc.
      
      llvm-svn: 336122
    • [X86] Don't use aligned load/store instructions for fp128 if the load/store isn't aligned. · 56440b97
      Craig Topper authored
      Similarly, don't fold fp128 loads into SSE instructions if the load isn't aligned, unless we're targeting an AMD CPU that doesn't check alignment on arithmetic instructions.
      
      Should fix PR38001
      
      llvm-svn: 336121
    • [AArch64][GlobalISel] Any-extend vararg parameters to stack slot size on Darwin. · 846f2436
      Amara Emerson authored
      We currently don't any-extend vararg parameters before storing them to the stack
      locations on Darwin. SelectionDAG, however, does this, and so there is user code
      in the wild which inadvertently relies on this extension. This can manifest
      in cases where the value stored is (int)0, but the actual parameter is interpreted
      by va_arg as a pointer, and so not extending to 64 bits causes the callee to
      load additional undefined bits.
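
      A minimal sketch of the failure mode, assuming an 8-byte little-endian stack slot that still holds stale bytes. All names are hypothetical. The extended store is modeled as writing zero high bits, which is what usually results on AArch64 because writing a W register clears the upper half of the X register. An any-extend by itself does not promise any particular high bits.

```python
import struct


def store_narrow(slot: bytearray, value: int) -> None:
    """Store only the low 4 bytes; the upper 4 bytes keep their old contents."""
    slot[0:4] = struct.pack("<I", value & 0xFFFFFFFF)


def store_extended(slot: bytearray, value: int) -> None:
    """Widen to the full 8-byte slot before storing (high bits modeled as 0)."""
    slot[0:8] = struct.pack("<Q", value & 0xFFFFFFFF)


def load_pointer(slot: bytearray) -> int:
    """What va_arg sees when the callee reads the slot as a 64-bit pointer."""
    return struct.unpack("<Q", bytes(slot[0:8]))[0]


# The caller passes (int)0, intending it as a null pointer.
narrow_slot = bytearray(b"\xAA" * 8)   # stale stack contents
store_narrow(narrow_slot, 0)
narrow_ptr = load_pointer(narrow_slot)

extended_slot = bytearray(b"\xAA" * 8)
store_extended(extended_slot, 0)
extended_ptr = load_pointer(extended_slot)
```

      In this model `narrow_ptr` is a non-null garbage value, while `extended_ptr` is 0.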
      
      llvm-svn: 336120
    • Revert "[Dominators] Add the DomTreeUpdater class" · 198f3b16
      Jakub Kuderski authored
      Temporary revert because of a failing test on some buildbots.
      
      This reverts commit r336114.
      
      llvm-svn: 336117
    • [WebAssembly] Convert remaining tests from elf to wasm output format · 7fecdef5
      Sam Clegg authored
      Differential Revision: https://reviews.llvm.org/D48748
      
      llvm-svn: 336116
    • Follow up of r335953 - [ARM][AArch64] Armv8.4-A Enablement · b0004b83
      Sjoerd Meijer authored
      Imply dotprod for armv8.4-a, because it is mandatory from v8.4.
      
      llvm-svn: 336115
    • [Dominators] Add the DomTreeUpdater class · e813a9b3
      Jakub Kuderski authored
      Summary:
      This patch is the first in a series of patches related to "RFC - A new dominator tree updater for LLVM" (http://lists.llvm.org/pipermail/llvm-dev/2018-June/123883.html).
      
      This patch introduces the DomTreeUpdater class, which provides a cleaner API to perform updates on available dominator trees (none, only DomTree, only PostDomTree, both) using different update strategies (eagerly or lazily) to simplify the updating process.
      
      —Prior to the patch—
      
         - Directly calling update functions of DominatorTree updates the data structure eagerly while DeferredDominance does updates lazily.
         - DeferredDominance class cannot be used when a PostDominatorTree also needs to be updated.
         - Functions receiving DT/DDT need to branch a lot which is currently necessary.
         - Functions using both DomTree and PostDomTree need to call the update function separately on both trees.
         - People need to construct an additional DeferredDominance class to use functions only receiving DDT.
      
      —After the patch—
      
      Patch by Chijun Sima <simachijun@gmail.com>.
      
      Reviewers: kuhar, brzycki, dmgreen, grosser, davide
      
      Reviewed By: kuhar, brzycki
      
      Subscribers: vsk, mgorny, llvm-commits
      
      Author: NutshellySima
      
      Differential Revision: https://reviews.llvm.org/D48383
      
      llvm-svn: 336114
    • [X86][SSE] Blend any v8i16/v4i32 shift with 2 unique shift values · 2bc8e079
      Simon Pilgrim authored
      We were only doing this for basic blends, despite shuffle lowering now being good enough to handle more complex blends. This means that the two v8i16 splat shifts are performed in parallel instead of serially as in the general shift case.
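
      The lowering strategy can be sketched on plain lists, assuming left shifts on 16-bit lanes. The function names are hypothetical and this is only a model of the technique, not the X86 lowering code: shift the whole vector by each of the two amounts, then blend the two results per lane.

```python
def splat_shift(vec, amount, bits=16):
    """Shift every lane left by the same amount, truncating to the lane width."""
    mask = (1 << bits) - 1
    return [(x << amount) & mask for x in vec]


def shift_by_two_amounts(vec, amounts, bits=16):
    """Per-lane shift where 'amounts' contains exactly two distinct values."""
    unique = sorted(set(amounts))
    if len(unique) != 2:
        raise ValueError("expected exactly two unique shift amounts")
    low, high = unique
    by_low = splat_shift(vec, low, bits)    # independent of by_high,
    by_high = splat_shift(vec, high, bits)  # so both can run in parallel
    # Blend: pick each lane from whichever splat shift matches its amount.
    return [h if a == high else l for l, h, a in zip(by_low, by_high, amounts)]


vec = [1, 2, 3, 4, 5, 6, 7, 0x8001]
amounts = [1, 1, 3, 1, 3, 3, 1, 3]  # an irregular, non-basic blend pattern
blended = shift_by_two_amounts(vec, amounts)
reference = [(x << a) & 0xFFFF for x, a in zip(vec, amounts)]
```

      `blended` matches `reference`, the result of shifting each lane individually.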
      
      llvm-svn: 336113
    • [X86][SSE] Add v8i16 shift test for 2 shift values that doesn't match basic blend · a6be2437
      Simon Pilgrim authored
      We have special case support for 2 shift values for basic blends, but irregular shift patterns end up using the generic lowering, despite shuffle lowering being good enough to handle more complex blends.
      
      llvm-svn: 336112
    • [ValueTracking] allow undef elements when matching vector abs · 284ba0c1
      Sanjay Patel authored
      llvm-svn: 336111
    • Disable failing test on x86_64-pc-windows-gnu, see PR38006. · d414c6c1
      Yaron Keren authored
      llvm-svn: 336110
    • [CodeGen] Make block removal order deterministic in CodeGenPrepare · 23bba56f
      David Stenberg authored
      Summary:
      Replace use of a SmallPtrSet with a SmallSetVector to make the worklist
      iteration order deterministic. This is done as the order the blocks are
      removed may affect whether or not PHI nodes in successor blocks are
      removed.
      
      For example, consider the following case where %bb1 and %bb2 are
      removed:
      
          bb1:
            br i1 undef, label %bb3, label %bb4
          bb2:
            br i1 undef, label %bb4, label %bb3
          bb3:
            pv1 = phi type [ undef, %bb1 ], [ undef, %bb2], [ v0, %other ]
            br label %bb4
          bb4:
            pv2 = phi type [ undef, %bb1 ], [ undef, %bb2 ],
                           [ pv1, %bb3 ], [ v0, %other ]
      
      If %bb2 is removed before %bb1, the incoming values from %bb1 and %bb2
      to pv1 will be removed before %bb1 is removed as a predecessor to %bb4.
      The pv1 node will thus be optimized out (to v0) at the time %bb1 is
      removed as a predecessor to %bb4, leaving the blocks as follows when
      the incoming value from %bb1 has been removed:
      
          bb3: ; pv1 optimized out, incoming value to pv2 is v0
            br label %bb4
          bb4:
            pv2 = phi type [ v0, %bb3 ], [ v0, %other ]
      
      The pv2 PHI node will be optimized away by removePredecessor() as all
      incoming values are identical.
      
      In case %bb2 is removed after %bb1, pv1 will not be optimized out at the
      time %bb2 is removed as a predecessor to %bb4, leaving the blocks as
      follows when the incoming value from %bb2 to pv2 has been removed:
      
          bb3:
            pv1 = phi type [ undef, %bb2 ], [ v0, %other ]
            br label %bb4
          bb4:
            pv2 = phi type [ pv1, %bb3 ], [ v0, %other ]
      
      The pv2 PHI node will thus not be removed in this case, ultimately
      leading to the following output:
      
          bb3: ; pv1 optimized out, incoming value to pv2 is v0
            br label %bb4
          bb4:
            pv2 = phi type [ v0, %bb3 ], [ v0, %other ]
      
      I have not looked into changing DeleteDeadBlock() so that the redundant
      PHI nodes are removed.
      
      I have not added a test case, as I was not able to create a particularly
      small (and not messy) reproducer. This is likely due to SmallPtrSet
      behaving deterministically when in small mode.
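
      The property the change relies on can be sketched with a toy insertion-ordered set. The `SetVector` class below is a hypothetical Python stand-in, not LLVM's SmallSetVector: it rejects duplicates like a set but iterates in insertion order, so a worklist built from it is processed the same way on every run.

```python
class SetVector:
    """Toy worklist: set-like membership, vector-like iteration order."""

    def __init__(self):
        self._seen = set()   # fast duplicate detection
        self._order = []     # remembers insertion order

    def insert(self, item) -> bool:
        """Add item if new; return True only when it was actually added."""
        if item in self._seen:
            return False
        self._seen.add(item)
        self._order.append(item)
        return True

    def __iter__(self):
        return iter(self._order)


worklist = SetVector()
inserted = [worklist.insert(bb) for bb in ["bb2", "bb1", "bb2"]]

# Iteration follows insertion order, not addresses or hash values.
removal_order = list(worklist)
```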
      
      Reviewers: void, dexonsmith, spatel, skatkov, fhahn, bkramer, nhaehnle
      
      Reviewed By: fhahn
      
      Subscribers: mgrang, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D48369
      
      llvm-svn: 336109
    • [X86] Fix test/MC/AsmParser/exprs-invalid.s after rL336104 · 07ef10cc
      Alex Bradbury authored
      This was my mistake for only running test/MC/X86 and test/CodeGen/X86. 
      Arguably .word should be removed from this test, as it is not supported 
      universally.
      
      llvm-svn: 336107
    • [llvm-exegesis] Change how the native architecture is determined · 346856dc
      John Brawn authored
      Currently the llvm-exegesis native architecture is determined by comparing the
      llvm native architecture with X86, so to add a new target would mean adding a
      new check. Change this to building up a list of the targets llvm-exegesis
      supports then using that, as this means that when adding a new target you just
      add the target to the list of supported targets.
      
      Differential Revision: https://reviews.llvm.org/D48778
      
      llvm-svn: 336105
    • [X86] Use addAliasForDirective to support the .word directive (reland) · c4890878
      Alex Bradbury authored
      The X86 asm parser currently has custom parsing logic for .word. Rather than
      use this custom logic, we can just use addAliasForDirective to enable the
      reuse of AsmParser::parseDirectiveValue.
      
      See also similar changes to Sparc (rL333078), AArch64 (rL333077), and Hexagon
      (rL332607) backends.
      
      Differential Revision: https://reviews.llvm.org/D47004
      
      This is a fixed reland of rL336100. This should have been caught in 
      pre-commit testing so apologies for the noise.
      
      llvm-svn: 336104
    • Revert r336100 · c000e4dc
      Alex Bradbury authored
      This was a bad change. .word == 2byte on x86.
      
      llvm-svn: 336103
    • [SLPVectorizer] Remove nullptr early-outs from Instruction::ShuffleVector getEntryCost · d5fb50e3
      Simon Pilgrim authored
      This code is only used by alternate opcodes so the InstructionsState has already confirmed that every Value is an Instruction, plus we use cast<Instruction> which will assert on failure.
      
      llvm-svn: 336102
    • [InstCombine] adjust shuffle tests with IR flags; NFC · 951f617e
      Sanjay Patel authored
      Due to current limitations in constant analysis, we need flags
      on add or mul to show propagation for the potential transform
      suggested in these tests (no other binops currently report 
      identity constants).
      
      llvm-svn: 336101
    • [X86] Use addAliasForDirective to support the .word directive · 42485ec9
      Alex Bradbury authored
      The X86 asm parser currently has custom parsing logic for .word. Rather than 
      use this custom logic, we can just use addAliasForDirective to enable the 
      reuse of AsmParser::parseDirectiveValue.
      
      See also similar changes to Sparc (rL333078), AArch64 (rL333077), and Hexagon 
      (rL332607) backends.
      
      Differential Revision: https://reviews.llvm.org/D47004
      
      llvm-svn: 336100
    • [llvm-exegesis] Delegate the decision of cycle counter name to the target · 8fc5ec78
      John Brawn authored
      Currently the cycle counter is taken from the subtarget schedule model, which
      isn't any use if the subtarget doesn't have one. Delegate the decision to the
      target benchmark runner, as it may know better what to do in that case, with
      the default being the current behaviour.
      
      Differential Revision: https://reviews.llvm.org/D48779
      
      llvm-svn: 336099
    • Recommit r328307: [IPSCCP] Use constant range information for comparisons of parameters. · 4ebba909
      Florian Hahn authored
      This version contains a fix to add values for which the state in ParamState changes
      to the worklist if the state in ValueState did not change. To avoid adding the
      same value multiple times, mergeInValue returns true if it added the value to
      the worklist. The value is added to the worklist depending on its state in
      ValueState.
      
      Original message:
      For comparisons with parameters, we can use the ParamState lattice
      elements which also provide constant range information. This improves
      the code for PR33253 further and gets us closer to using
      ValueLatticeElement for all values.
      
      Also, as we are using the range information in the solver directly, we
      do not need tryToReplaceWithConstantRange afterwards anymore.
      
      Reviewers: dberlin, mssimpso, davide, efriedma
      
      Reviewed By: mssimpso
      
      Differential Revision: https://reviews.llvm.org/D43762
      
      llvm-svn: 336098
    • [InstCombine] add tests for shuffle-binop; NFC · d9800845
      Sanjay Patel authored
      This is another pattern mentioned in PR37806.
      
      llvm-svn: 336096
    • [SLPVectorizer] Fix alternate opcode + shuffle cost function to correctly handle SK_Select patterns. · 265793d5
      Simon Pilgrim authored
      We were always using the opcodes of the first 2 scalars for the costs of the alternate opcode + shuffle. This made sense when we used SK_Alternate and opcodes were guaranteed to be alternating, but this fails for the more general SK_Select case.
      
      This fix exposes an issue demonstrated by the fmul_fdiv_v4f32_const test - the SLM model has v4f32 fdiv costs which are more than twice those of the f32 scalar cost, meaning that the cost model determines that the vectorization is not performant. Unfortunately it completely ignores the fact that the fdiv by a constant will be changed into a fmul by InstCombine for a much lower cost vectorization. But at least we're seeing this now...
      
      llvm-svn: 336095
    • [SLPVectorizer] Only Alternate opcodes use ShuffleVector cases for... · 409bd5f4
      Simon Pilgrim authored
      [SLPVectorizer] Only Alternate opcodes use ShuffleVector cases for getEntryCost/vectorizeTree. NFCI.
      
      Add assertions - we're already assuming this in how we use the AltOpcode and treat everything as BinaryOperators.
      
      llvm-svn: 336092
    • [AArch64][SVE] Asm: Support for (SQ)INCP/DECP (scalar, vector) · 8d4c01a7
      Sander de Smalen authored
      Increments/decrements the result by the number of active bits
      from the predicate.
      
      The inc/dec variants added are:
      - incp   x0, p0.h     (scalar)
      - incp   z0.h, p0     (vector)
      
      The unsigned saturating inc/dec variants added are:
      - uqincp x0, p0.h     (scalar)
      - uqincp w0, p0.h     (scalar, 32bit)
      - uqincp z0.h, p0     (vector)
      
      The signed saturating inc/dec variants added are:
      - sqincp x0, p0.h     (scalar)
      - sqincp x0, p0.h, w0 (scalar, 32bit)
      - sqincp z0.h, p0     (vector)
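
      The scalar increment forms listed above can be modeled roughly as follows, assuming the predicate is given as a list of booleans with one entry per element. The Python function names simply mirror the mnemonics and are hypothetical, and the decrement forms would be the mirror image. This is a sketch of the arithmetic only, not of the assembler support the patch adds.

```python
def active_count(pred) -> int:
    """Number of active (true) predicate elements."""
    return sum(1 for bit in pred if bit)


def incp(x: int, pred, bits: int = 64) -> int:
    """Plain increment: wraps around at the register width."""
    return (x + active_count(pred)) & ((1 << bits) - 1)


def uqincp(x: int, pred, bits: int = 64) -> int:
    """Unsigned saturating increment: clamps at the largest unsigned value."""
    return min(x + active_count(pred), (1 << bits) - 1)


def sqincp(x: int, pred, bits: int = 64) -> int:
    """Signed saturating increment: clamps at the largest signed value."""
    return min(x + active_count(pred), (1 << (bits - 1)) - 1)


pred = [True, True, True, False]        # three active elements

wrapped = incp(2**64 - 1, pred)         # wraps past zero
clamped_u = uqincp(2**64 - 2, pred)     # sticks at 2**64 - 1
clamped_s32 = sqincp(2**31 - 1, pred, bits=32)  # sticks at 2**31 - 1
```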
      
      llvm-svn: 336091