  1. Jan 26, 2016
• [MC] Use .p2align instead of .align · 61d15ae4
      Dan Gohman authored
      For historic reasons, the behavior of .align differs between targets.
      Fortunately, there are alternatives, .p2align and .balign, which make the
      interpretation of the parameter explicit, and which behave consistently across
      targets.
      
      This patch teaches MC to use .p2align instead of .align, so that people reading
      code for multiple architectures don't have to remember which way each platform
      does its .align directive.
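
For illustration, here is how the three directives read the same operand (a sketch of the documented semantics; the exact padding emitted still depends on the section and target):

  .p2align 4      # always 2^4 = 16-byte alignment, on every target
  .balign  16     # always 16-byte alignment, on every target
  .align   16     # ambiguous: 16 bytes on some targets (e.g. x86 ELF),
                  # 2^16 bytes on targets that treat the operand as a
                  # power of two (e.g. 32-bit ARM)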
      
      Differential Revision: http://reviews.llvm.org/D16549
      
      llvm-svn: 258750
  2. Jan 25, 2016
  3. Jan 24, 2016
  4. Jan 23, 2016
  5. Jan 22, 2016
• fixed to test features, not CPU models · 908ea731
      Sanjay Patel authored
      llvm-svn: 258568
• AMDGPU: Add new name for barrier intrinsic · 10ca39ca
      Matt Arsenault authored
      llvm-svn: 258558
• AMDGPU: Rename intrinsics to use amdgcn prefix · bef34e21
      Matt Arsenault authored
      The intrinsic target prefix should match the target name
      as it appears in the triple.
      
      This is not yet complete, but gets most of the important ones.
      llvm.AMDGPU.* intrinsics used by mesa and libclc are still handled
for compatibility for now.
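
As a sketch of the convention (using the barrier intrinsic from the neighboring commit as the example; the exact set of renamed intrinsics is in the patch itself):

  declare void @llvm.AMDGPU.barrier.local()   ; old mesa/libclc name, still accepted
  declare void @llvm.amdgcn.s.barrier()       ; new name, prefixed with the
                                              ; target name from the triple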
      
      llvm-svn: 258557
• [AArch64] Cleanup ccmp test check labels. NFC. · 8e491e2d
      Ahmed Bougacha authored
      llvm-svn: 258541
• AMDGPU: Fix crash with invariant markers · 0b783ef0
      Matt Arsenault authored
      The promote alloca pass didn't handle these intrinsics and crashed.
      These intrinsics should accept any address space, but for now just
      erase them to avoid breaking.
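
A minimal sketch of the kind of input that used to crash the pass (intrinsic signature as of this revision; the committed test may differ):

  declare {}* @llvm.invariant.start(i64, i8* nocapture)

  define void @f() {
    %buf = alloca [16 x i8]
    %p = getelementptr inbounds [16 x i8], [16 x i8]* %buf, i32 0, i32 0
    %inv = call {}* @llvm.invariant.start(i64 16, i8* %p)
    ret void
  }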
      
      llvm-svn: 258537
• [NVPTX] expand mul_lohi to mul_lo and mul_hi · 585ec867
      Jingyue Wu authored
      Summary: Fixes PR26186.
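
Conceptually (a sketch at the IR level; the actual expansion acts on the SelectionDAG mul_lohi node), the combined low/high multiply becomes two independent halves:

  %a64  = sext i32 %a to i64
  %b64  = sext i32 %b to i64
  %full = mul i64 %a64, %b64
  %lo   = trunc i64 %full to i32      ; mul_lo
  %hi64 = lshr i64 %full, 32
  %hi   = trunc i64 %hi64 to i32      ; mul_hi (signed variant)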
      
      Reviewers: grosser, jholewinski
      
      Subscribers: jholewinski, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D16479
      
      llvm-svn: 258536
• [AArch64] Lower 2-CC FCCMPs (one/ueq) using AND'ed CCs. · 99209b90
      Ahmed Bougacha authored
      The current behavior is incorrect, as the two CCs returned by
      changeFPCCToAArch64CC, intended to be OR'ed, are instead used
      in an AND ccmp chain.
      
      Consider:
      define i32 @t(float %a, float %b, float %c, float %d, i32 %e, i32 %f) {
        %cc1 = fcmp one float %a, %b
        %cc2 = fcmp olt float %c, %d
        %and = and i1 %cc1, %cc2
        %r = select i1 %and, i32 %e, i32 %f
        ret i32 %r
      }
      
      Assuming (%a < %b) and (%c < %d); we used to do:
        fcmp  s0, s1            # nzcv <- 1000
        orr   w8, wzr, #0x1     # w8 <- 1
        csel  w9, w8, wzr, mi   # w9 <- 1
        csel  w8, w8, w9, gt    # w8 <- 1
        fcmp  s2, s3            # nzcv <- 1000
        cset   w9, mi           # w9 <- 1
        tst    w8, w9           # (w8 & w9) == 1, so: nzcv <- 0000
        csel  w0, w0, w1, ne    # w0 <- w0
      
      We now do:
        fcmp  s2, s3            # nzcv <- 1000
        fccmp s0, s1, #0, mi    #  mi, so: nzcv <- 1000
        fccmp s0, s1, #8, le    # !le, so: nzcv <- 1000
        csel  w0, w0, w1, pl    # !pl, so: w0 <- w1
      
      In other words, we transformed:
        (c < d) &&  ((a < b) || (a > b))
      into:
        (c < d) &&   (a u>= b) && (a u<= b)
      whereas, per De Morgan's, we wanted:
        (c < d) && !((a u>= b) && (a u<= b))
      
      Note that this problem doesn't occur in the test-suite.
      
      changeFPCCToAArch64CC produces disjunct CCs; here, one -> mi/gt.
      We can't represent that in the fccmp chain; it can't express
      arbitrary OR sequences, as one comment explains:
        In general we can create code for arbitrary "... (and (and A B) C)"
        sequences.  We can also implement some "or" expressions, because
        "(or A B)" is equivalent to "not (and (not A) (not B))" and we can
        implement some  negation operations. [...] However there is no way
        to negate the result of a partial sequence.
      
      Instead, introduce changeFPCCToANDAArch64CC, which produces the
      conjunct cond codes:
      - (a one b)
          == ((a olt b) || (a ogt b))
          == ((a ord b) && (a une b))
      - (a ueq b)
          == ((a uno b) || (a oeq b))
          == ((a ule b) && (a uge b))
      
      Note that, at first, one might think that, when PushNegate is true,
      we should use the disjunct CCs, in effect doing:
        (a || b)
        = !(!a && !(b))
        = !(!a && !(b1 || b2))  <- changeFPCCToAArch64CC(b, b1, b2)
        = !(!a && !b1 && !b2)
      
However, we can take advantage of the fact that the CC is already
negated, which lets us avoid special-casing PushNegate and instead
do the simpler-to-reason-about:
      
        (a || b)
        = !(!a && (!b))
        = !(!a && (b1 && b2))   <- changeFPCCToANDAArch64CC(!b, b1, b2)
        = !(!a && b1 && b2)
      
      This makes both emitConditionalCompare cases behave identically,
      and produces correct ccmp sequences for the 2-CC fcmps.
      
      llvm-svn: 258533
• [Hexagon] Use general purpose registers to spill pred/mod registers into · 7b413c6c
      Krzysztof Parzyszek authored
      Patch by Tobias Edler Von Koch.
      
      llvm-svn: 258527
• AMDGPU: Rename some r600 intrinsics to use correct TargetPrefix · 59bd3014
      Matt Arsenault authored
These aren't directly emitted by mesa; they are inserted by a pass.
      
      llvm-svn: 258523
• AMDGPU: Remove AMDGPU.fract intrinsic · 0cbaa176
      Matt Arsenault authored
Mesa doesn't use this, and it is already pattern-matched
from fsub x, (ffloor x).
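
That pattern, spelled out (a sketch using the generic floor intrinsic):

  declare float @llvm.floor.f32(float)

  ; fract(x) == x - floor(x), so no dedicated intrinsic is needed
  %f     = call float @llvm.floor.f32(float %x)
  %fract = fsub float %x, %f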
      
      llvm-svn: 258513
• [SelectionDAG] Fold more offsets into GlobalAddresses · 0bf3ae84
      Dan Gohman authored
This reapplies r258296 and r258366, and also fixes an existing bug in
SelectionDAG.cpp's isMemSrcFromString, which neglected to account for the
offset in a GlobalAddressSDNode and is uncovered by those patches.
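
For instance (an illustrative sketch, borrowing the string from the revert below), the address of an element inside a global can be folded into a single GlobalAddressSDNode carrying a constant offset:

  @kData = constant [10 x i8] c"asdf jkl;\00"

  define i8* @p() {
    ; &kData[3]: after folding, one GlobalAddressSDNode with offset 3
    %p = getelementptr inbounds [10 x i8], [10 x i8]* @kData, i64 0, i64 3
    ret i8* %p
  }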
      
      llvm-svn: 258482
• Do not lower VSETCC if operand is an f16 vector · 71e9a2a4
      Pirama Arumuga Nainar authored
      Summary:
      SETCC with f16 vectors has OperationAction set to Expand but still gets
      lowered to FCM* intrinsics based on its result type.  This patch skips
      lowering of VSETCC if the operand is an f16 vector.
      
      v4 and v8 tests included.
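
A minimal sketch of the comparison shape involved (not the committed test itself):

  define <4 x i1> @cmp_v4f16(<4 x half> %a, <4 x half> %b) {
    %c = fcmp olt <4 x half> %a, %b
    ret <4 x i1> %c
  }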
      
      Reviewers: ab, jmolloy
      
      Subscribers: srhines, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D15361
      
      llvm-svn: 258471
• Revert "[SelectionDAG] Fold more offsets into GlobalAddresses" · b7ecfa5b
      Reid Kleckner authored
      This reverts r258296 and the follow up r258366. With this change, we
      miscompiled the following program on Windows:
        #include <string>
        #include <iostream>
        static const char kData[] = "asdf jkl;";
        int main() {
          std::string s(kData + 3, sizeof(kData) - 3);
          std::cout << s << '\n';
        }
      
      llvm-svn: 258465
  6. Jan 21, 2016
• Avoid unnecessary stack realignment in musttail thunks with SSE2 enabled · 18ec96f0
      Reid Kleckner authored
      The X86 musttail implementation finds register parameters to forward by
      running the calling convention algorithm until a non-register location
      is returned. However, assigning a vector memory location has the side
      effect of increasing the function's stack alignment. We shouldn't
      increase the stack alignment when we are only looking for register
      parameters, so this change conditionalizes it.
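
A rough sketch of the forwarding-thunk shape involved (hypothetical IR; the real thunks come from clang, e.g. for member-pointer adjustments):

  declare void @target(i8*, ...)

  define void @thunk(i8* %this, ...) {
    ; adjust 'this' and forward all arguments, including vector arguments,
    ; without growing or realigning the stack
    %adj = getelementptr i8, i8* %this, i32 8
    musttail call void (i8*, ...) @target(i8* %adj, ...)
    ret void
  }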
      
      llvm-svn: 258442
• [X86][SSE] Improve i16 splatting shuffles · 5ba1c127
      Simon Pilgrim authored
      Better handling of the annoying pshuflw/pshufhw ops which only shuffle lower/upper halves of a vector.
      
      Added vXi16 unary shuffle support for cases where i16 elements (from the same half of the source) are being splatted to the whole of one of the halves. This avoids the general lowering case which must shuffle the 32-bit elements first - meaning that we used to end up with unnecessary duplicate pshuflw/pshufhw shuffles.
      
      Note this has the side effect of a lot of SSSE3 test cases no longer needing to use PSHUFB, as it falls below the 3 op combine threshold for when PSHUFB is typically worth it. I've raised PR26183 to discuss if the threshold should be changed and whether we need to make it more specific to the target CPU.
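
One illustrative case (a sketch, not one of the committed tests): lane 1, which lives in the low half, splatted across the whole low half while the upper half is left alone; this maps to a single pshuflw:

  define <8 x i16> @splat_low_lane1(<8 x i16> %a) {
    %s = shufflevector <8 x i16> %a, <8 x i16> undef,
           <8 x i32> <i32 1, i32 1, i32 1, i32 1, i32 4, i32 5, i32 6, i32 7>
    ret <8 x i16> %s
  }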
      
      Differential Revision: http://reviews.llvm.org/D14901
      
      llvm-svn: 258440