  1. Aug 04, 2014
    • [x86] Implement more aggressive use of PACKUS chains for lowering common patterns of v16i8 shuffles. · 06e6f1ca
      Chandler Carruth authored
      
      This implements one of the more important FIXMEs for the SSE2 support in
      the new shuffle lowering. We now generate the optimal shuffle sequence
      for truncate-derived shuffles which show up essentially everywhere.
      
      Unfortunately, this exposes a weakness in other parts of the shuffle
      logic -- we can no longer form PSHUFB here. I'll add the necessary
      support for that and other things in a subsequent commit.
      
      llvm-svn: 214702
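      To make the pattern concrete, here is a small illustrative sketch (not from the commit; the function name and constants are assumptions) of a truncate-derived shuffle lowered as a chain of packs, using PACKSSDW for the first step and PACKUSWB for the second. Compile with SSE2 enabled:

          #include <emmintrin.h>  /* SSE2 */
          #include <stdio.h>

          /* Truncate eight i32 lanes, held in two XMM registers, to i8 lanes.
           * Sign-extending the low 16 bits first makes PACKSSDW an exact
           * i32->i16 truncate; masking to 0x00FF makes PACKUSWB an exact
           * i16->i8 truncate. */
          static __m128i trunc_v8i32_to_v8i8(__m128i lo, __m128i hi) {
              lo = _mm_srai_epi32(_mm_slli_epi32(lo, 16), 16);
              hi = _mm_srai_epi32(_mm_slli_epi32(hi, 16), 16);
              __m128i w = _mm_packs_epi32(lo, hi);           /* PACKSSDW */
              w = _mm_and_si128(w, _mm_set1_epi16(0x00FF));
              return _mm_packus_epi16(w, w);                 /* PACKUSWB */
          }

          int main(void) {
              __m128i lo = _mm_setr_epi32(0x101, 0x202, 0x303, 0x404);
              __m128i hi = _mm_setr_epi32(0x505, 0x606, 0x707, 0x808);
              unsigned char out[16];
              _mm_storeu_si128((__m128i *)out, trunc_v8i32_to_v8i8(lo, hi));
              for (int i = 0; i < 8; ++i) printf("%02x ", out[i]);
              printf("\n");  /* prints: 01 02 03 04 05 06 07 08 */
              return 0;
          }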
    • Revert "r214669 - MachineCombiner Pass for selecting faster instruction" · f31ecf3f
      Kevin Qin authored
      This commit broke "make check" for several hours, so it has been reverted.
      
      llvm-svn: 214697
    • [x86] Handle single input shuffles in the SSSE3 case more intelligently. · 37a18821
      Chandler Carruth authored
      I spent some time looking into a better or more principled way to handle
      this, for example by detecting arbitrary "unneeded" ORs... But really,
      there wasn't any point. We shouldn't build blatantly wrong code this
      late in the pipeline and then add more stages and logic later on to fix
      it, when avoiding it here is this simple.
      
      llvm-svn: 214680
    • [x86] Fix the test case added in r214670 and tweaked in r214674 further. · 7bbfd245
      Chandler Carruth authored
      Fundamentally, there isn't a really portable way to test the constant
      pool contents. Instead, pin this test to the bare-metal triple. This
      also makes it a 64-bit triple which allows us to only match a single
      constant pool rather than two. It can also hard-code the '.' prefix, as
      the format should be stable now that it has a fixed triple. Finally,
      I've switched it to use CHECK-NEXT to be more precise about the expected
      instruction sequence, and to use FileCheck variables rather than
      hard-coding choices made by the register allocator.
      
      llvm-svn: 214679
    • Account for possible leading '.' in label string. · 065cabf4
      Sanjay Patel authored
      llvm-svn: 214674
    • fix for PR20354 - Miscompile of fabs due to vectorization · 2ef67440
      Sanjay Patel authored
      This is intended to be the minimal change needed to fix PR20354 ( http://llvm.org/bugs/show_bug.cgi?id=20354 ). The check for a vector operation was wrong; we need to check that the fabs itself is not a vector operation.
      
      This patch does not generate optimal code. A constant pool load and 'and' op will be generated instead of just returning a value that we can calculate in advance (as we do for the scalar case). I've put a 'TODO' comment for that here and expect to have that patch ready soon.
      
      There is a very similar optimization that we can do in visitFNEG, so I've put another 'TODO' there and expect to have another patch for that too.
      
      llvm-svn: 214670
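      For context, the scalar trick the message alludes to is clearing the IEEE-754 sign bit. A minimal sketch (the function name is an assumption, not from the patch):

          #include <stdint.h>
          #include <string.h>
          #include <stdio.h>

          /* Scalar fabs as a bit operation: clear the sign bit. This is the
           * "value we can calculate in advance"; the vector path instead
           * loads the 0x7FFF... mask from the constant pool and ANDs it. */
          static double fabs_bits(double x) {
              uint64_t bits;
              memcpy(&bits, &x, sizeof bits);  /* type-pun safely */
              bits &= ~(1ULL << 63);           /* clear the sign bit */
              memcpy(&x, &bits, sizeof x);
              return x;
          }

          int main(void) {
              printf("%f %f\n", fabs_bits(-2.5), fabs_bits(3.0));
              return 0;  /* prints: 2.500000 3.000000 */
          }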
    • MachineCombiner Pass for selecting faster instruction sequence - AArch64 target support · 35ba4671
      Gerolf Hoflehner authored
      
       This patch turns off madd/msub generation in the DAGCombiner and generates
       them in the MachineCombiner instead. It replaces the original code sequence
       with the combined sequence when it is beneficial to do so.
      
       When there is no machine model support, it always generates the
       madd/msub instruction. This also holds when the objective is to
       optimize for code size: the combined sequence is shorter, so it is
       always chosen and does not need to be evaluated.
      
       When there is a machine model the combined instruction sequence
       is evaluated for critical path and resource length using machine
       trace metrics and the original code sequence is replaced when it is
       determined to be faster.
      
       rdar://16319955
      
      llvm-svn: 214669
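      As a hedged illustration of the pattern involved (the function name is an assumption), this is the kind of multiply-plus-add source that can fuse into a single AArch64 madd:

          #include <stdio.h>

          /* A multiply feeding an add can fuse into one instruction:
           *     madd x0, x0, x1, x2    ; x0 = x0 * x1 + x2
           * Whether fusing beats a separate mul+add on a given core is what
           * the MachineCombiner decides via machine trace metrics. */
          long madd_pattern(long a, long b, long c) {
              return a * b + c;
          }

          int main(void) {
              printf("%ld\n", madd_pattern(3, 4, 5));  /* prints: 17 */
              return 0;
          }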
  2. Aug 02, 2014
    • 6b999ae6
      James Molloy authored
    • [AArch64] Teach DAGCombiner that converting two consecutive loads into a vector load is not a good transform when paired loads are available. · ce45be04
      James Molloy authored
      
      The combiner was creating Q-register loads and stores, which then had to be spilled because there are no callee-save Q registers!
      
      llvm-svn: 214634
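      A small illustrative sketch (names assumed) of the consecutive-load pattern in question, which AArch64 can lower to a single ldp of X registers rather than a Q-register vector load:

          #include <stdint.h>
          #include <stdio.h>

          /* Two consecutive 64-bit loads that AArch64 can pair into one ldp.
           * Merging them into a single 128-bit vector load would use a
           * Q register instead, and with no callee-save Q registers the
           * value may be spilled around calls. */
          static void copy_pair(uint64_t *dst, const uint64_t *src) {
              uint64_t a = src[0];  /* expected: ldp x8, x9, [x1] */
              uint64_t b = src[1];
              dst[0] = a;           /* expected: stp x8, x9, [x0] */
              dst[1] = b;
          }

          int main(void) {
              uint64_t src[2] = {1, 2}, dst[2];
              copy_pair(dst, src);
              printf("%llu %llu\n", (unsigned long long)dst[0],
                     (unsigned long long)dst[1]);  /* prints: 1 2 */
              return 0;
          }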
    • [x86] Give this test a bare metal triple so it doesn't use the weird Darwin x86 asm comment prefix designed to work around GAS on that platform. · bec57b40
      Chandler Carruth authored
      That makes the comment-matching of the test much more stable.
      
      llvm-svn: 214629
    • [x86] Largely complete the use of PSHUFB in the new vector shuffle lowering with a small addition to it and adding PSHUFB combining. · 4c57955f
      Chandler Carruth authored
      
      There is one obvious place in the new vector shuffle lowering where we
      should form PSHUFBs directly: when without them we will unpack a vector
      of i8s across two different registers and do a potentially 4-way blend
      as i16s only to re-pack them into i8s afterward. This is the crazy
      expensive fallback path for i8 shuffles and we can just directly use
      pshufb here as it will always be cheaper (the unpack and pack are
      two instructions so even a single shuffle between them hits our
      three instruction limit for forming PSHUFB).
      
      However, this doesn't generate very good code in many cases, and it
      leaves a bunch of common patterns not using PSHUFB. So this patch also
      adds support for extracting a shuffle mask from PSHUFB in the X86
      lowering code, and uses it to handle PSHUFBs in the recursive shuffle
      combining. This allows us to combine through them, combine multiple ones
      together, and generally produce sufficiently high quality code.
      
      Extracting the PSHUFB mask is annoyingly complex because it could be
      either pre-legalization or post-legalization. At least this doesn't have
      to deal with re-materialized constants. =] I've added decode routines to
      handle the different patterns that show up at this level and we dispatch
      through them as appropriate.
      
      The two primary test cases are updated. For the v16 test case there is
      still a lot of room for improvement. Since I was going through it
      systematically I left behind a bunch of FIXME lines that I'm hoping to
      turn into ALL lines by the end of this.
      
      llvm-svn: 214628
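      To illustrate what a single PSHUFB buys here (example values assumed; compile with SSSE3), each mask byte selects a source byte by index, so one instruction performs an arbitrary byte shuffle:

          #include <tmmintrin.h>  /* SSSE3 */
          #include <stdio.h>

          /* PSHUFB: result byte i = v[mask[i]] (a set high bit in a mask
           * byte zeroes that lane). One pshufb replaces the unpack ->
           * 4-way i16 blend -> repack fallback described above. */
          int main(void) {
              __m128i v    = _mm_setr_epi8('a','b','c','d','e','f','g','h',
                                           'i','j','k','l','m','n','o','p');
              /* Reverse the 16 bytes: indices 15 down to 0. */
              __m128i mask = _mm_setr_epi8(15,14,13,12,11,10,9,8,
                                           7,6,5,4,3,2,1,0);
              __m128i r    = _mm_shuffle_epi8(v, mask);  /* PSHUFB */
              char out[17] = {0};
              _mm_storeu_si128((__m128i *)out, r);
              printf("%s\n", out);  /* prints: ponmlkjihgfedcba */
              return 0;
          }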
    • [x86] Teach the target shuffle mask extraction to recognize unary forms of normally binary shuffle instructions like PUNPCKL and MOVLHPS. · 34f9a987
      Chandler Carruth authored
      
      This detects cases where a single register is used for both operands
      making the shuffle behave in a unary way. We detect this and adjust the
      mask to use the unary form which allows the existing DAG combine for
      shuffle instructions to actually work at all.
      
      As a consequence, this uncovered a number of obvious bugs in the
      existing DAG combine which are fixed. It also now canonicalizes several
      shuffles even with the existing lowering. These typically try to match
      the shuffle to the domain of the input, where before we only really
      modeled them with the floating-point variants. All of the cases which
      change to an integer shuffle here have something in the integer domain, so
      there are no more or fewer domain crosses here AFAICT. Technically, it
      might be better to go from a GPR directly to the floating point domain,
      but detecting floating point *outputs* despite integer inputs is a lot
      more code and seems unlikely to be worthwhile in practice. If folks are
      seeing domain-crossing regressions here though, let me know and I can
      hack something up to fix it.
      
      Also as a consequence, a bunch of missed opportunities to form pshufb
      now can be formed. Notably, splats of i8s now form pshufb.
      Interestingly, this improves the existing splat lowering too. We go from
      3 instructions to 1. Yes, we may tie up a register, but it seems very
      likely to be worth it, especially if splatting the 0th byte (the
      common case) as then we can use a zeroed register as the mask.
      
      llvm-svn: 214625
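      A sketch of the splat case mentioned above (example values assumed): with an all-zero mask, PSHUFB broadcasts byte 0, so a zeroed register serves as the mask:

          #include <tmmintrin.h>  /* SSSE3 */
          #include <stdio.h>

          /* An all-zero PSHUFB mask selects source byte 0 into every lane,
           * giving the single-instruction splat described above. */
          int main(void) {
              __m128i v = _mm_setr_epi8(42, 1, 2, 3, 4, 5, 6, 7,
                                        8, 9, 10, 11, 12, 13, 14, 15);
              __m128i splat = _mm_shuffle_epi8(v, _mm_setzero_si128());
              unsigned char out[16];
              _mm_storeu_si128((__m128i *)out, splat);
              printf("%u %u %u\n", out[0], out[7], out[15]);  /* 42 42 42 */
              return 0;
          }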
    • [ARM] In dynamic-no-pic mode, ARM's post-RA pseudo expansion was incorrectly expanding pseudo LOAD_STACK_GUARD using instructions that are normally used in pic mode. This patch fixes the bug. · dc08c30d
      Akira Hatanaka authored
      
      <rdar://problem/17886592>
      
      llvm-svn: 214614
    • R600: Cleanup fneg tests · 4de32444
      Matt Arsenault authored
      llvm-svn: 214612
    • [x86] Make some questionable tests not spew assembly to stdout, which makes a mess of the lit output when they ultimately fail. · 063f425e
      Chandler Carruth authored
      
      The 2012-10-02-DAGCycle test is really frustrating because the *only*
      explanation for what it is testing is a rdar link. I would really rather
      that rdar links (which are not public or part of the open source
      project) were not committed to the source code. Regardless, the actual
      problem *must* be described as the rdar link is completely opaque. The
      fact that this test didn't check for any particular output further
      exacerbates the inability of any other developer to debug failures.
      
      The mem-promote-integers test has nice comments and *seems* to be
      a great test for our lowering... except that we don't actually check
      that any of the generated code is correct or matches some pattern. We
      just avoid crashing. It would be great to go back and populate this test
      with the actual expectations.
      
      llvm-svn: 214605
    • [X86] Simplify X87 stackifier pass. · 3516669a
      Akira Hatanaka authored
      Stop using ST registers for function returns and inline-asm instructions and use
      FP registers instead. This allows removing a large amount of code in the
      stackifier pass that was needed to track register liveness and handle copies
      between ST and FP registers and function calls returning floating point values.
      
      It also fixes a bug which manifests when an ST register defined by an
      inline-asm instruction was live across another inline-asm instruction, as shown
      in the following sequence of machine instructions:
      
      1. INLINEASM <es:frndint> $0:[regdef], %ST0<imp-def,tied5>
      2. INLINEASM <es:fldcw $0>
      3. %FP0<def> = COPY %ST0
      
      <rdar://problem/16952634>
      
      llvm-svn: 214580
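      A hedged sketch of the problem pattern (x86-only, GCC-style extended asm; the control-word value is an assumption): an inline asm defines a value on the x87 stack top, and it stays live across a second inline asm, mirroring the numbered sequence above:

          #include <stdio.h>

          int main(void) {
              double x = 2.7, r;
              unsigned short cw = 0x027f;  /* assumed x87 control word */
              /* 1. defines %ST0: "=t" pins the output to the x87 stack top */
              __asm__ ("frndint" : "=t"(r) : "0"(x));
              /* 2. a second inline asm while r is still live in ST0 */
              __asm__ volatile ("fldcw %0" :: "m"(cw));
              /* 3. r is copied out of ST0 here */
              printf("%f\n", r);  /* prints: 3.000000 */
              return 0;
          }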
    • llvm/test/CodeGen/Mips/cconv/arguments-varargs.ll: Add explicit -mtriple=(mips|mipsel)-linux on 4 lines. · 49a53507
      NAKAMURA Takumi authored
      
      llvm-svn: 214578
  3. Jul 31, 2014
    • Disable IsSub subregister assert. PR18663. · 44ff8f06
      Will Schmidt authored
      This is a follow-up to the activity in the bug at
      http://llvm.org/bugs/show_bug.cgi?id=18663 .  The underlying issue has
      to do with how the KILL pseudo-instruction is handled.  I defer to
      Hal/Jakob/Uli for additional details and background.
      
      This will disable the (bad?) assert, add an associated fixme comment,
      and add a pair of tests.
      
      The code change and the pr18663-2.ll test are copied from the referenced
      bug.  That test does not immediately fail in my environment, but I have
      added the pr18663.ll test which does.
      
      (Comment from Hal)
      to provide everyone else with some context, this assert was not bad when
      it was written. At that time, we only generated KILL pseudo instructions
      around subregister copies. This logic, unfortunately, had its own problems.
      In r199797, the relevant logic in MachineCopyPropagation was replaced to
      generate KILLs for other kinds of copies too. This change in semantics broke
      this now-problematic assumption in AggressiveAntiDepBreaker. The
      AggressiveAntiDepBreaker really needs a proper cleanup to deal with the
      change, but removing the assert (which just allows the function to return
      false) is a safe conservative behavior, and should do for the time being.
      
      llvm-svn: 214429
    • Fix ScalarEvolutionExpander when creating a PHI in a block with duplicate predecessors · 36eff0f8
      Hal Finkel authored
      It seems that when I fixed this, almost exactly a year ago, I did not
      quite do it correctly. When we have duplicate block predecessors, we
      indeed cannot have different incoming values for the same block, but we
      *must* have duplicate entries. So, instead of skipping the duplicates,
      we now explicitly add the duplicate incoming values.
      
      Fixes PR20442.
      
      llvm-svn: 214423
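      As a rough illustration (C source; names and values assumed) of how duplicate predecessors arise: two switch cases targeting the same block create two CFG edges, and a PHI in that block must list the predecessor once per edge, with identical incoming values:

          #include <stdio.h>

          /* "case 0" and "case 1" both branch to the same block, so the
           * switch block appears twice in that block's predecessor list.
           * Any PHI there needs one entry per edge -- the duplicate-entry
           * rule this fix enforces -- with the same incoming value. */
          int classify(int x) {
              int r;
              switch (x) {
              case 0:      /* edge 1 to the shared block */
              case 1:      /* edge 2 to the same shared block */
                  r = 10;  /* same incoming value on both edges */
                  break;
              default:
                  r = 20;
                  break;
              }
              return r;
          }

          int main(void) {
              printf("%d %d %d\n", classify(0), classify(1), classify(5));
              return 0;  /* prints: 10 10 20 */
          }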
    • [FastISel][AArch64] Add sqrt intrinsic support. · 130e77e4
      Juergen Ributzka authored
      Fixes <rdar://problem/17867067>.
      
      llvm-svn: 214388