  1. Jul 11, 2018
  2. Jul 10, 2018
    • Craig Topper's avatar
      [X86] Remove X86ISD::MOVLPS and X86ISD::MOVLPD. NFCI · dea0b88b
      Craig Topper authored
      These ISD nodes try to select the MOVLPS and MOVLPD instructions, which are special load-only instructions. They load data and merge it into the lower 64 bits of an XMM register. They are logically equivalent to our MOVSD node plus a load.
      
      There was only one place in X86ISelLowering that used MOVLPD and no places that selected MOVLPS. The one place that selected MOVLPD had to choose between it and MOVSD based on whether there was a load. But lowering is too early to tell if the load can really be folded. So in isel we have patterns that use MOVSD for MOVLPD if we can't find a load.
      
      We also had patterns that select the MOVLPD instruction for a MOVSD if we can find a load, but didn't choose the MOVLPD ISD opcode for some reason.
      
      So it seems better to just standardize on MOVSD ISD opcode and manage MOVSD vs MOVLPD instruction with isel patterns.
      
      llvm-svn: 336728
      dea0b88b
    • Scott Linder's avatar
      [AMDGPU] Fix layering issue with AMDGPUHSAMetadataStreamer (NFC) · 01ce144d
      Scott Linder authored
      llvm-svn: 336722
      01ce144d
    • Teresa Johnson's avatar
      [ThinLTO] Use std::map to get deterministic imports files · c0320ef4
      Teresa Johnson authored
      Summary:
      I noticed that the .imports files emitted for distributed ThinLTO
      backends do not have consistent ordering. This is because StringMap
      iteration order is not guaranteed to be deterministic. Since we already
      have a std::map with this information, used when emitting the individual
      index files (ModuleToSummariesForIndex), use it for the imports files as
      well.
      
      This issue is likely causing some unnecessary rebuilds of the ThinLTO
      backends in our distributed build system as the imports files are inputs
      to those backends.
      
      Reviewers: pcc, steven_wu, mehdi_amini
      
      Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D48783
      
      llvm-svn: 336721
      c0320ef4
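The ordering argument in the ThinLTO commit above can be illustrated with standard containers. This is a minimal sketch, not the code from the patch: `std::unordered_map` stands in for `StringMap` (an assumption made only to have an unordered container available), and `orderedImports` is a hypothetical helper name.

```cpp
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: returns the module paths in a reproducible order.
// Iterating an unordered container directly gives no ordering guarantee,
// so the entries are first copied into a std::map, which iterates its keys
// in sorted order.
std::vector<std::string>
orderedImports(const std::unordered_map<std::string, int> &Imports) {
  std::map<std::string, int> Sorted(Imports.begin(), Imports.end());
  std::vector<std::string> Paths;
  for (const auto &Entry : Sorted)
    Paths.push_back(Entry.first);
  return Paths;
}
```

Writing the paths out in this order yields the same file contents on every run, which is what a build system needs in order to skip unchanged inputs.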
    • Craig Topper's avatar
      [X86] Remove dead SDNode object from X86InstrFragmentsSIMD.td. NFC · fb302d01
      Craig Topper authored
      It points to an opcode that doesn't exist.
      
      llvm-svn: 336720
      fb302d01
    • Craig Topper's avatar
      [X86] Remove AddedComplexity from register form of NOT. NFCI · 04ded1ac
      Craig Topper authored
      I believe isProfitableToFold will stop the load folding that this was intended to overcome.
      
      Given an (xor load, -1), isProfitableToFold will see that the immediate can be folded with the xor using a one byte immediate since it can be sign extended. It doesn't know about NOT, but the one byte immediate check is enough to stop the fold.
      
      llvm-svn: 336712
      04ded1ac
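The point about the immediate in the NOT commit above rests on a simple identity: a one-byte immediate of -1, sign-extended to the operand width, is all ones, so the xor is a bitwise NOT. A minimal sketch of that arithmetic for the 32-bit case (the function name is hypothetical and this does not model isProfitableToFold itself):

```cpp
#include <cstdint>

// Hypothetical model of "xor with a sign-extended 8-bit immediate".
// The immediate is widened to 32 bits with sign extension before the xor.
uint32_t xorWithSignExtendedImm8(uint32_t X, int8_t Imm) {
  return X ^ static_cast<uint32_t>(static_cast<int32_t>(Imm));
}
```

With `Imm == -1` the result equals `~X` for every `X`.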
    • Craig Topper's avatar
      [X86] Remove AddedComplexity from MMX_X86movw2d patterns. · 0f6275ab
      Craig Topper authored
      There were only 3 patterns with this node as a root and they all had the same AddedComplexity. So this doesn't really do anything.
      
      llvm-svn: 336711
      0f6275ab
    • Scott Linder's avatar
      [AMDGPU] Refactor HSAMetadataStream::emitKernel (NFC) · 2ad2c18b
      Scott Linder authored
      Move all metadata construction into AMDGPUHSAMetadataStreamer.
      
      Differential Revision: https://reviews.llvm.org/D48176
      
      llvm-svn: 336707
      2ad2c18b
    • Alexander Ivchenko's avatar
      [GlobalISel][X86_64] Support for G_SITOFP · 48ca0550
      Alexander Ivchenko authored
      The instruction selection is automatically handled by tablegen
      
      llvm-svn: 336703
      48ca0550
    • Eugene Leviant's avatar
      [Evaluator] Examine alias when evaluating function call · 6a572b8e
      Eugene Leviant authored
      This fixes PR38120
      
      llvm-svn: 336702
      6a572b8e
    • Simon Pilgrim's avatar
      [DAGCombiner] Add special case fast paths for udiv x,1 and udiv x,-1 · 4cb46093
      Simon Pilgrim authored
      udiv x,-1 was going down the (slow) BuildUDIV route resulting in unnecessary shifts.
      
      llvm-svn: 336701
      4cb46093
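The two special cases in the udiv commit above have simple closed forms, which is why a fast path is possible. The sketch below shows the arithmetic for 32-bit values only; the function names are hypothetical and the exact nodes DAGCombiner emits are not shown in the commit message.

```cpp
#include <cstdint>

// udiv x, 1 is just x.
uint32_t udivByOne(uint32_t X) { return X; }

// As an unsigned value, -1 is the maximum value (all ones), so the quotient
// is 1 only when x is also all ones, and 0 otherwise. No shifts or
// multiplies are needed.
uint32_t udivByAllOnes(uint32_t X) { return X == UINT32_MAX ? 1u : 0u; }
```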
    • Jonas Devlieghere's avatar
      Revert "[AccelTable] Provide abstraction for emitting DWARF5 accelerator tables." · 3192e35b
      Jonas Devlieghere authored
      This reverts r336529 because an alternative approach turned out to be a
      better fit for dsymutil.
      
      llvm-svn: 336698
      3192e35b
    • Konstantin Zhuravlyov's avatar
      AMDGPU: Make hidden argument metadata consistent with · f0badd5a
      Konstantin Zhuravlyov authored
      amdgpu-implicitarg-num-bytes attribute
      
      Differential Revision: https://reviews.llvm.org/D49096
      
      llvm-svn: 336697
      f0badd5a
    • Sanjay Patel's avatar
      [InstCombine] allow flag propagation when using safe constant · c8d9d812
      Sanjay Patel authored
      This corresponds with the code for the single binop pattern
      added in rL336684.
      
      llvm-svn: 336696
      c8d9d812
    • Ulrich Weigand's avatar
      [gcov] Fix ABI when calling llvm_gcov_... routines from instrumentation code · b961fdc5
      Ulrich Weigand authored
      The llvm_gcov_... routines in compiler-rt are regular C functions that
      need to be called using the proper C ABI for the target. The current
      code simply calls them using plain LLVM IR types. Since the types are
      mostly simple, this happens to just work on certain targets. But other
      targets still need special handling; in particular, it may be necessary
      to sign- or zero-extend sub-word values to comply with the ABI. This
      caused gcov failures on SystemZ in particular.
      
      Now the very same problem was already fixed for the llvm_profile_ calls
      here: https://reviews.llvm.org/D21736
      
      This patch uses the same method to fix the llvm_gcov_ calls, in
      particular calls to llvm_gcda_start_file, llvm_gcda_emit_function, and
      llvm_gcda_emit_arcs.
      
      Reviewed By: marco-c
      
      Differential Revision: https://reviews.llvm.org/D49134
      
      llvm-svn: 336692
      b961fdc5
    • Jonas Devlieghere's avatar
      [MC] Add interface to finish pending labels. · e13e6dbe
      Jonas Devlieghere authored
      When manually finishing the object writer in dsymutil, it's possible
      that there are pending labels that haven't been resolved. This results
      in an assertion when the assembler tries to fixup a label that doesn't
      have an address yet.
      
      Differential revision: https://reviews.llvm.org/D49131
      
      llvm-svn: 336688
      e13e6dbe
    • Sanjay Patel's avatar
      [InstCombine] safely allow non-commutative binop identity constant folds · 509a1e7a
      Sanjay Patel authored
      This was originally intended with D48893, but as discussed there, we
      have to make the folds safe from producing extra poison. This should
      give the single binop folds the same capabilities as the existing
      folds for 2-binops+shuffle.
      
      LLVM binary opcode review: there are a total of 18 binops. There are 7 
      commutative binops (add, mul, and, or, xor, fadd, fmul) which we already 
      fold. We're able to fold 6 more opcodes with this patch (shl, lshr, ashr,
      fdiv, udiv, sdiv). There are no folds for srem/urem/frem AFAIK. We don't 
      bother with sub/fsub with constant operand 1 because those are 
      canonicalized to add/fadd. 7 + 6 + 3 + 2 = 18.
      
      llvm-svn: 336684
      509a1e7a
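The six opcodes added by the InstCombine commit above share one property: each has a constant that acts as an identity when it is the second operand (0 for the shifts, 1 for the divisions). The sketch below checks that property on scalar values; it is an illustration of the identities, not of the fold itself, and `rightIdentityHolds` is a hypothetical name.

```cpp
#include <cstdint>

// The opcode tally from the commit message.
static_assert(7 + 6 + 3 + 2 == 18, "binop count from the commit message");

// Checks "x op identity == x" for shl, lshr, ashr, udiv, sdiv and fdiv.
bool rightIdentityHolds(int32_t X) {
  uint32_t U = static_cast<uint32_t>(X);
  double F = static_cast<double>(X);
  return (U << 0) == U &&  // shl  x, 0
         (U >> 0) == U &&  // lshr x, 0
         (X >> 0) == X &&  // ashr x, 0
         (U / 1u) == U &&  // udiv x, 1
         (X / 1) == X &&   // sdiv x, 1
         (F / 1.0) == F;   // fdiv x, 1.0
}
```

None of these opcodes is commutative, so the identity only applies with the constant as operand 1, which matches the wording of the commit title.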
    • Paul Robinson's avatar
      Support -fdebug-prefix-map in llvm-mc. This is useful to omit the · c17c8bf7
      Paul Robinson authored
      debug compilation dir when compiling assembly files with -g.
      Part of PR38050.
      
      Patch by Siddhartha Bagaria!
      
      Differential Revision: https://reviews.llvm.org/D48988
      
      llvm-svn: 336680
      c17c8bf7
    • Sanjay Patel's avatar
      3333106a
    • Sander de Smalen's avatar
      [AArch64][SVE] Asm: Support for predicated unary operations. · 53108d48
      Sander de Smalen authored
      This patch adds support for the following instructions:
        CLS  (Count Leading Sign bits)
        CLZ  (Count Leading Zeros)
        CNT  (Count non-zero bits)
        CNOT (Logically invert boolean condition in vector)
        NOT  (Bitwise invert vector)
        FABS (Floating-point absolute value)
        FNEG (Floating-point negate)
      
      All operations are predicated and unary, e.g.
        clz  z0.s, p0/m, z1.s
      
      - CLS, CLZ, CNT, CNOT and NOT have variants for 8, 16, 32
        and 64 bit elements.
      
      - FABS and FNEG have variants for 16, 32 and 64 bit elements.
      
      llvm-svn: 336677
      53108d48
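The `p0/m` suffix in the SVE example above selects merging predication: lanes whose predicate bit is clear keep the previous value of the destination register. A minimal scalar model of `clz z0.s, p0/m, z1.s` is sketched below, assuming a fixed width of four 32-bit lanes (real SVE vectors are scalable); all names are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Counts leading zero bits of a 32-bit value; returns 32 for zero.
unsigned countLeadingZeros32(uint32_t V) {
  unsigned N = 0;
  for (uint32_t Bit = 0x80000000u; Bit != 0 && (V & Bit) == 0; Bit >>= 1)
    ++N;
  return N;
}

// Models a merging-predicated unary operation: active lanes receive
// clz(Zn[i]); inactive lanes leave Zd[i] untouched.
void clzMerging(std::array<uint32_t, 4> &Zd, const std::array<bool, 4> &Pg,
                const std::array<uint32_t, 4> &Zn) {
  for (std::size_t I = 0; I < Zd.size(); ++I)
    if (Pg[I])
      Zd[I] = countLeadingZeros32(Zn[I]);
}
```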
    • Matt Arsenault's avatar
      Reapply "AMDGPU: Force inlining if LDS global address is used" · a680199a
      Matt Arsenault authored
      This reverts commit r336623
      
      llvm-svn: 336675
      a680199a
    • Sanjay Patel's avatar
      [InstCombine] allow more shuffle-binop folds with safe constants · 06ea4206
      Sanjay Patel authored
      The case with 2 variables is more complicated than the case where
      we eliminate the shuffle entirely because a shuffle with an undef 
      mask element creates an undef result. 
      
      I'm not aware of any current analysis/transform that recognizes 
      undef propagating to a div/rem/shift, but we have to guard against 
      the possibility.
      
      llvm-svn: 336668
      06ea4206
    • Anastasis Grammenos's avatar
      [DebugInfo][LoopVectorize] Preserve DL in induction PHI and Add · 612bf7ca
      Anastasis Grammenos authored
      Differential Revision: https://reviews.llvm.org/D48968
      
      llvm-svn: 336667
      612bf7ca
    • Simon Pilgrim's avatar
      [DAGCombiner] visitREM - call visitSDIVLike/visitUDIVLike directly to avoid recursive combining. · 641097d5
      Simon Pilgrim authored
      As suggested by @efriedma on D48975 use the visitSDIVLike/visitUDIVLike functions introduced at rL336656.
      
      llvm-svn: 336664
      641097d5
    • Krzysztof Parzyszek's avatar
      [Hexagon] Add implicit uses even when untied explicit uses are present · c052451a
      Krzysztof Parzyszek authored
      An explicit untied use is not sufficient to maintain liveness of a
      register redefined in a predicated instruction. For example
        %1 = COPY %0
        ...
        %1 = A2_paddif %2, %1, 1
      could become
        $r1 = COPY $r0
        ...
        $r1 = A2_paddif $p0, $r1, 1
      and later
        $r1 = COPY $r0                ;; this is not really dead!
        ...
        $r1 = A2_paddif $p0, $r0, 1
      
      llvm-svn: 336662
      c052451a
    • Karl-Johan Karlsson's avatar
      [LowerSwitch] Fixed faulty PHI nodes · 1ffeb5d7
      Karl-Johan Karlsson authored
      Summary:
      Fixed two cases where PHI nodes need to be updated by lowerswitch.
      
      When lowerswitch finds out that the switch default branch is not
      reachable, it removes the old default and replaces it with the most
      popular block from the cases, but it forgets to update the PHI
      nodes in the default block.
      
      The PHI nodes also need to be updated when the switch is replaced
      with a single branch.
      
      Reviewers: hans, reames, arsenm
      
      Reviewed By: arsenm
      
      Differential Revision: https://reviews.llvm.org/D47203
      
      llvm-svn: 336659
      1ffeb5d7
    • Sam McCall's avatar
      [Support] Harden JSON against invalid UTF-8. · e6057bc6
      Sam McCall authored
      Parsing invalid UTF-8 input is now a parse error.
      Creating JSON values from invalid UTF-8 now triggers an assertion, and
      (in no-assert builds) substitutes the unicode replacement character.
      Strings retrieved from json::Value are always valid UTF-8.
      
      llvm-svn: 336657
      e6057bc6
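To make "invalid UTF-8" in the JSON commit above concrete, the sketch below is a small standalone validity check. It is not LLVM's implementation and the name `isValidUTF8` is hypothetical; it rejects stray or missing continuation bytes, overlong encodings, surrogate code points, and values above U+10FFFF.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

bool isValidUTF8(const std::string &S) {
  std::size_t I = 0, N = S.size();
  while (I < N) {
    unsigned char C = static_cast<unsigned char>(S[I]);
    if (C < 0x80) { // ASCII byte
      ++I;
      continue;
    }
    std::size_t Len;
    uint32_t CP, Min; // decoded code point and smallest value for this length
    if ((C & 0xE0) == 0xC0) { Len = 2; CP = C & 0x1F; Min = 0x80; }
    else if ((C & 0xF0) == 0xE0) { Len = 3; CP = C & 0x0F; Min = 0x800; }
    else if ((C & 0xF8) == 0xF0) { Len = 4; CP = C & 0x07; Min = 0x10000; }
    else return false; // stray continuation byte or invalid lead byte
    if (I + Len > N)
      return false; // sequence truncated at end of input
    for (std::size_t K = 1; K < Len; ++K) {
      unsigned char T = static_cast<unsigned char>(S[I + K]);
      if ((T & 0xC0) != 0x80)
        return false; // expected a continuation byte
      CP = (CP << 6) | (T & 0x3F);
    }
    if (CP < Min || CP > 0x10FFFF || (CP >= 0xD800 && CP <= 0xDFFF))
      return false; // overlong, out of range, or surrogate
    I += Len;
  }
  return true;
}
```

A parser following the behaviour described in the commit would report a parse error whenever a check like this fails on its input.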
    • Simon Pilgrim's avatar
      [DAGCombiner] Split SDIV/UDIV optimization expansions from the rest of the combines. NFCI. · ce5c19b6
      Simon Pilgrim authored
      As suggested by @efriedma on D48975, this patch separates the BuildDiv/Pow2 style optimizations from the rest of the visitSDIV/visitUDIV to make it easier to reuse the combines and will allow us to avoid some rather nasty node recursive combining in visitREM.
      
      llvm-svn: 336656
      ce5c19b6
    • Chandler Carruth's avatar
      [PM/Unswitch] Fix unused variable in r336646. · 148861f5
      Chandler Carruth authored
      llvm-svn: 336647
      148861f5
    • Chandler Carruth's avatar
      [PM/Unswitch] Fix a collection of closely related issues with trivial · 47dc3a34
      Chandler Carruth authored
      switch unswitching.
      
      The core problem was the way we handled unswitching trivial exit
      edges through the default successor of a switch. For some reason
      I thought the right way to do this was to add a block containing
      unreachable and point the default successor at this block. In
      retrospect, this has an amazing number of problems.
      
      The first issue is the one that this pass has always worked around -- we
      have to *detect* such edges and avoid unswitching them again. This
      seemed pretty easy really. You just look for an edge to a block
      containing unreachable. However, this pattern is woefully unsound. So
      many things can break it. The amazing thing is that I found a test case
      where *simple-loop-unswitch itself* breaks this! When we do
      a *non-trivial* unswitch of a switch we will end up splitting this exit
      edge. The result will be a default successor that is an exit and
      terminates in ... a perfectly normal branch. So the first test case that
      I started trying to fix is added to the nontrivial test cases. This is
      a ridiculous example that did just amazing things previously. With just
      unswitch, it would create 10+ copies of this stuff stamped out. But if
      you combine it *just right* with a bunch of other passes (like
      simplify-cfg, loop rotate, and some LICM) you can get it to do this
      infinitely. Or at least, I never got it to finish. =[
      
      This, in turn, uncovered another related issue. When we are manipulating
      these switches after doing a trivial unswitch we never correctly updated
      PHI nodes to reflect our edits. As soon as I started changing how these
      edges were managed, it became obvious there were more issues that
      I couldn't realistically leave unaddressed, so I wrote more test cases
      around PHI updates here and ensured all of that works now.
      
      And this, in turn, required some adjustment to how we collect and manage
      the exit successor when it is the default successor. That showed a clear
      bug where we failed to include it in our search for the outer-most loop
      reached by an unswitched exit edge. This was actually already tested and
      the test case didn't work. I (wrongly) thought that was due to SCEV
      failing to analyze the switch. In fact, it was just a simple bug in the
      code that skipped the default successor. While changing this, I handled
      it correctly and have updated the test to reflect that we now get
      precise SCEV analysis of trip counts for the outer loop in one of these
      cases.
      
      llvm-svn: 336646
      47dc3a34
    • Simon Pilgrim's avatar
      [X86][SSE] Prefer BLEND(SHL(v,c1),SHL(v,c2)) over MUL(v, c3) · d32ca2c0
      Simon Pilgrim authored
      Now that rL336250 has landed, we should prefer 2 immediate shifts + a shuffle blend over performing a multiply. Despite the increase in instructions, this is quicker (especially for slow v4i32 multiplies) and avoids loads and constant pool usage. It does mean, however, that we increase register pressure. The code size will go up a little but by less than what we save on the constant pool data.
      
      This patch also adds support for v16i16 to the BLEND(SHIFT(v,c1),SHIFT(v,c2)) combine, and also prevents blending on pre-SSE41 shifts if it would introduce extra blend masks/constant pool usage.
      
      Differential Revision: https://reviews.llvm.org/D48936
      
      llvm-svn: 336642
      d32ca2c0
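The equivalence behind the BLEND(SHL, SHL) commit above is that multiplying by a vector whose lanes hold only two distinct powers of two gives the same result as two uniform shifts followed by a per-lane select. The sketch below models this on four 32-bit lanes with the constants {2, 2, 8, 8}; the lane layout and all names are assumptions chosen for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using V4 = std::array<uint32_t, 4>;

// Reference form: per-lane multiply by the constant vector {2, 2, 8, 8}.
V4 mulByConstants(const V4 &V) {
  const V4 C = {2, 2, 8, 8};
  V4 R{};
  for (std::size_t I = 0; I < 4; ++I)
    R[I] = V[I] * C[I];
  return R;
}

// Rewritten form: shift the whole vector by 1, shift the whole vector by 3,
// then blend, taking lanes 0-1 from the first and lanes 2-3 from the second.
V4 blendOfShifts(const V4 &V) {
  V4 Shl1{}, Shl3{}, R{};
  for (std::size_t I = 0; I < 4; ++I) {
    Shl1[I] = V[I] << 1;
    Shl3[I] = V[I] << 3;
  }
  for (std::size_t I = 0; I < 4; ++I)
    R[I] = (I < 2) ? Shl1[I] : Shl3[I];
  return R;
}
```

The shift amounts are immediates, so the rewritten form needs no constant vector in memory, at the cost of holding two temporaries live at once.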
    • Craig Topper's avatar
      [X86] Use IsProfitableToFold to block vinsertf128rm in favor of insert_subreg... · 08b81a55
      Craig Topper authored
      [X86] Use IsProfitableToFold to block vinsertf128rm in favor of insert_subreg instead of artificially increasing pattern complexity to give priority.
      
      This is a much more direct way to solve the issue than just giving extra priority.
      
      llvm-svn: 336639
      08b81a55
    • Craig Topper's avatar
      [X86] Remove some seemingly unnecessary patterns. · db73f564
      Craig Topper authored
      We're missing the EVEX equivalents of these patterns and seem to get along fine.
      
      I think we end up with X86vzload for the obvious IR cases that would produce this DAG.
      
      llvm-svn: 336638
      db73f564
    • Craig Topper's avatar
      [X86] Correct vfixupimm load patterns to look for an integer load, not a... · 866a377e
      Craig Topper authored
      [X86] Correct vfixupimm load patterns to look for an integer load, not a floating point load bitcasted to integer.
      
      DAG combine wouldn't let a floating point load bitcasted to integer exist. It would just be an integer load.
      
      llvm-svn: 336626
      866a377e