  1. Jul 08, 2013
    • Hal Finkel's avatar
      Fix PromoteIntRes_BUILD_VECTOR crash with i1 vectors · 8cb9a0e1
      Hal Finkel authored
      This fixes a bug (found by llvm-stress) in
      DAGTypeLegalizer::PromoteIntRes_BUILD_VECTOR where it assumed that the result
      type would always be larger than the original operands. This is not always
      true, however, with boolean vectors. For example, promoting a node of type v8i1
      (where the operands will be of type i32, the type to which i1 is promoted) will
      yield a node with a result vector element type of i16 (and operands of type
      i32). As a result, we cannot blindly assume that we can ANY_EXTEND the operands
      to the result type.
      
      llvm-svn: 185794
      8cb9a0e1
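The width comparison at the heart of the fix can be sketched as a standalone helper (hypothetical names, not the DAGTypeLegalizer API):

```cpp
#include <cassert>

// Hypothetical sketch of the decision described above: the old code assumed
// the result element type is always wider than the operands and could be
// ANY_EXTENDed to; for promoted i1 vectors the operands may instead be wider.
enum class Adjust { AnyExtend, Truncate, None };

Adjust chooseAdjust(unsigned OperandBits, unsigned ResultEltBits) {
  if (ResultEltBits > OperandBits)
    return Adjust::AnyExtend;  // the case the old code assumed always held
  if (ResultEltBits < OperandBits)
    return Adjust::Truncate;   // e.g. v8i1: i32 operands, i16 result elements
  return Adjust::None;
}
```

For the v8i1 case in the message, `chooseAdjust(32, 16)` must yield a truncate, which is exactly the blind-extend assumption the patch removes.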
    • Nico Rieck's avatar
      Revert "Reuse %rax after calling __chkstk on win64" · 43b51056
      Nico Rieck authored
      This reverts commit 01f8d579f7672872324208ac5bc4ac311e81b22e.
      
      llvm-svn: 185781
      43b51056
  2. Jul 07, 2013
  3. Jul 06, 2013
  4. Jul 05, 2013
    • Arnold Schwaighofer's avatar
      ARM: Add a pack pattern for matching arithmetic shift right · 97c1343c
      Arnold Schwaighofer authored
      llvm-svn: 185714
      97c1343c
    • Arnold Schwaighofer's avatar
      ARM: Fix incorrect pack pattern · 50b76b52
      Arnold Schwaighofer authored
      A "pkhtb x, x, y asr #num" uses the lower 16 bits of "y asr #num" and packs them
      into the bottom half of "x". An arithmetic shift and a logical shift are only
      equivalent in this context if the shift amount is 16. We would be shifting ones
      into the bottom 16 bits instead of zeros if "y" is negative.
      
      radar://14338767
      
      llvm-svn: 185712
      50b76b52
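The shift-equivalence claim can be checked in plain C++ (an illustration, not the backend pattern; arithmetic right shift of negative signed values is only guaranteed from C++20, though all common platforms behave this way):

```cpp
#include <cassert>
#include <cstdint>

// The low 16 bits of "y asr #n" and "y lsr #n" agree at n == 16 even for
// negative y, but differ for larger shifts once the shifted-in sign bits
// reach the low half of the result.
uint16_t low16Asr(int32_t y, unsigned n) {
  return static_cast<uint16_t>(y >> n);                         // arithmetic
}
uint16_t low16Lsr(int32_t y, unsigned n) {
  return static_cast<uint16_t>(static_cast<uint32_t>(y) >> n);  // logical
}
```

With `y = -1` and `n = 20`, the arithmetic shift leaves `0xFFFF` in the low half while the logical shift leaves `0x0FFF`, which is the miscompile the fix prevents.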
    • Richard Sandiford's avatar
      [SystemZ] Remove no-op MVCs · c40f27b5
      Richard Sandiford authored
      The stack coloring pass has code to delete stores and loads that become
      trivially dead after coloring.  Extend it to cope with single instructions
      that copy from one frame index to another.
      
      The testcase happens to show an example of this kicking in at the moment.
      It did occur in real code too, though.
      
      llvm-svn: 185705
      c40f27b5
    • Richard Sandiford's avatar
      Fix double renaming bug in stack coloring pass · b5d9bd6f
      Richard Sandiford authored
      The stack coloring pass renumbered frame indexes with a loop of the form:
      
        for each frame index FI
          for each instruction I that uses FI
            for each use of FI in I
              rename FI to FI'
      
      This caused problems if an instruction used two frame indexes F0 and F1
      and if F0 was renamed to F1 and F1 to F2.  The first time we visited the
      instruction we changed F0 to F1, then we changed both F1s to F2.
      
      In other words, the problem was that SSRefs recorded which instructions
      used an FI, but not which MachineOperands and MachineMemOperands within
      that instruction used it.
      
      This is easily fixed for MachineOperands by walking the instructions
      once and processing each operand in turn.  There's already a loop to
      do that for dead store elimination, so it seemed more efficient to
      fuse the two at the block level.
      
      MachineMemOperands are more tricky because they can be shared between
      instructions.  The patch handles them by making SSRefs an array of
      MachineMemOperands rather than an array of MachineInstrs.  We might end
      up processing the same MachineMemOperand twice, but that's OK because
      we always know from the SSRefs index what the original frame index was.
      
      llvm-svn: 185703
      b5d9bd6f
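A toy model of the cascade, with frame indexes as plain ints (illustrative only; the real pass works on MachineOperands and MachineMemOperands):

```cpp
#include <cassert>
#include <vector>

// An "instruction" is just a list of frame indexes. With remap F0->F1,
// F1->F2, looping per frame index cascades: the first visit turns F0 into
// F1, and the next visit rewrites both F1s to F2.
std::vector<int> renamePerFI(std::vector<int> ops,
                             const std::vector<int> &remap) {
  for (int fi = 0; fi < static_cast<int>(remap.size()); ++fi)
    for (int &op : ops)
      if (op == fi)
        op = remap[fi];
  return ops;
}

// The fix: walk each instruction's operands once, so every operand is
// remapped from its original index exactly once.
std::vector<int> renamePerOperand(std::vector<int> ops,
                                  const std::vector<int> &remap) {
  for (int &op : ops)
    op = remap[op];
  return ops;
}
```

For an instruction using F0 and F1 with that remap, the per-FI walk ends at {F2, F2} while the single pass correctly produces {F1, F2}.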
    • Richard Sandiford's avatar
      [SystemZ] Enable the use of MVC for frame-to-frame spills · 8976ea72
      Richard Sandiford authored
      ...now that the problem that prompted the restriction has been fixed.
      
      The original spill-02.py was a compromise because at the time I couldn't
      find an example that actually failed without the two scavenging slots.
      The version included here did.
      
      llvm-svn: 185701
      8976ea72
    • Richard Sandiford's avatar
      [SystemZ] Allocate a second register scavenging slot · 23943229
      Richard Sandiford authored
      This is another prerequisite for frame-to-frame MVC copies.
      I'll commit the patch that makes use of the slot separately.
      
      The downside of trying to test many corner cases with each of the
      available addressing modes is that a fair few tests need to account
      for the new frame layout.  I do still think it's useful to have all
      these tests though, since it's something that wouldn't get much coverage
      otherwise.
      
      llvm-svn: 185698
      23943229
    • Joey Gouly's avatar
      PR16490: fix a crash in ARMDAGToDAGISel::SelectInlineAsm. · 606f3fbc
      Joey Gouly authored
      In the SelectionDAG, immediate operands to inline asm are constructed as
      two separate operands. The first is a constant of value InlineAsm::Kind_Imm
      and the second is a constant with the value of the immediate.
      
      In ARMDAGToDAGISel::SelectInlineAsm, if we reach an operand of Kind_Imm we
      should skip over the next operand too.
      
      llvm-svn: 185688
      606f3fbc
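The two-slot layout and the fix can be modeled with a plain operand stream (a hypothetical encoding; the real operands are SDNodes):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Each Kind_Imm marker is followed by a slot holding the immediate's value,
// so a correct walker must consume both slots per immediate.
enum Kind { Kind_Reg = 1, Kind_Imm = 2 };

int countImmediates(const std::vector<int> &Ops) {
  int N = 0;
  for (std::size_t I = 0; I < Ops.size(); ++I) {
    if (Ops[I] == Kind_Imm) {
      ++N;
      ++I;  // skip the value slot too; omitting this is the bug class above
    }
  }
  return N;
}
```

Note that a value slot may happen to hold the same number as a marker, which is why failing to skip it misinterprets the stream.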
  5. Jul 03, 2013
    • Quentin Colombet's avatar
      [ARM] Improve the instruction selection of vector loads. · 04b3a0fd
      Quentin Colombet authored
      In the ARM back-end, build_vector nodes are lowered to a target-specific
      build_vector that uses a floating point type.
      This works well unless the inserted bitcasts survive until instruction
      selection. In that case, they incur moves between the integer unit and the
      floating point unit that may result in inefficient code.
      
      In other words, this conversion may introduce artificial dependencies when the
      code leading to the build vector cannot be completed with a floating point type.
      
      In particular, this happens when loads are not aligned.
      
      Before this patch, in that case, the compiler generates general purpose loads
      and creates the floating point vector from them, instead of directly using the
      vector unit.
      
      The patch uses a vector-friendly sequence of code when the inserted bitcasts to
      floating point survive DAGCombine.
      
      This is done by a target specific DAGCombine that changes the target specific
      build_vector into a sequence of insert_vector_elt that get rid of the bitcasts.
      
      <rdar://problem/14170854>
      
      llvm-svn: 185587
      04b3a0fd
    • Ulrich Weigand's avatar
      [PowerPC] Use mtocrf when available · 49f487e6
      Ulrich Weigand authored
      
      Just as with mfocrf, it is also preferable to use mtocrf instead of
      mtcrf when only a single CR register is to be written.
      
      Current code however always emits mtcrf.  This probably does not matter
      when using an external assembler, since the GNU assembler will in fact
      automatically replace mtcrf with mtocrf when possible.  It does create
      inefficient code with the integrated assembler, however.
      
      To fix this, this patch adds MTOCRF/MTOCRF8 instruction patterns and
      uses those instead of MTCRF/MTCRF8 everywhere.  Just as done in the
      MFOCRF patch committed as 185556, these patterns will be converted
      back to MTCRF if MTOCRF is not available on the machine.
      
      As a side effect, this allows us to modify the MTCRF pattern to accept
      the full range of mask operands for the benefit of the asm parser.
      
      llvm-svn: 185561
      49f487e6
    • Rafael Espindola's avatar
      b0fccb22
    • Rafael Espindola's avatar
      Remove another old test. · 8490bbd1
      Rafael Espindola authored
      It was only passing because 'grep andpd' was not finding any andpd, but
      we don't fail if part of a pipe fails.
      
      llvm-svn: 185552
      8490bbd1
    • Rafael Espindola's avatar
      Remove test for the old EH system. It doesn't parse anymore. · 447dbc38
      Rafael Espindola authored
      llvm-svn: 185551
      447dbc38
    • Richard Sandiford's avatar
      [SystemZ] Fold more spills · ed1fab6b
      Richard Sandiford authored
      Add a mapping from register-based <INSN>R instructions to the corresponding
      memory-based <INSN>.  Use it to cut down on the number of spill loads.
      
      Some instructions extend their operands from smaller fields, so this
      required a new TSFlags field to say how big the unextended operand is.
      
      This optimisation doesn't trigger for C(G)R and CL(G)R because in practice
      we always combine those instructions with a branch.  Adding a test for every
      other case probably seems excessive, but it did catch a missed optimisation
      for DSGF (fixed in r185435).
      
      llvm-svn: 185529
      ed1fab6b
    • Tim Northover's avatar
      ARM: relax the atomic release barrier to "dmb ishst" on Swift · 36b2417f
      Tim Northover authored
      Swift cores implement store barriers that are stronger than the ARM
      specification but weaker than general barriers. They are, in fact, just about
      enough to provide the ordering needed for atomic operations with release
      semantics.
      
      This patch makes use of that quirk.
      
      llvm-svn: 185527
      36b2417f
    • Richard Osborne's avatar
      [XCore] Add ISel pattern for LDWCP · a1cff61d
      Richard Osborne authored
      Patch by Robert Lytton.
      
      llvm-svn: 185518
      a1cff61d
  6. Jul 02, 2013
    • Ulrich Weigand's avatar
      [PowerPC] Remove VK_PPC_TLSGD and VK_PPC_TLSLD · 40509956
      Ulrich Weigand authored
      
      The PowerPC-specific modifiers VK_PPC_TLSGD and VK_PPC_TLSLD
      correspond exactly to the generic modifiers VK_TLSGD and VK_TLSLD.
      This causes some confusion with the asm parser, since VK_PPC_TLSGD
      is output as @tlsgd, which is then read back in as VK_TLSGD.
      
      To avoid this confusion, this patch removes the PowerPC-specific
      modifiers and uses the generic modifiers throughout.  (The only
      drawback is that the generic modifiers are printed in upper case
      while the usual convention on PowerPC is to use lower-case modifiers.
      But this is just a cosmetic issue.)
      
      llvm-svn: 185476
      40509956
    • Richard Sandiford's avatar
      [SystemZ] Use DSGFR over DSGR in more cases · e6e78855
      Richard Sandiford authored
      Fixes some cases where we were using full 64-bit division for (sdiv i32, i32)
      and (sdiv i64, i32).
      
      The "32" in "SDIVREM32" just refers to the second operand.  The first operand
      of all *DIVREM*s is a GR128.
      
      llvm-svn: 185435
      e6e78855
    • Richard Sandiford's avatar
      [SystemZ] Use MVC to spill loads and stores · f6bae1e4
      Richard Sandiford authored
      Try to use MVC when spilling the destination of a simple load or the source
      of a simple store.  As explained in the comment, this doesn't yet handle
      the case where the load or store location is also a frame index, since
      that could lead to two simultaneous scavenger spills, something the
      backend can't handle yet.  spill-02.py tests that this restriction kicks in,
      but unfortunately I've not yet found a case that would fail without it.
      The volatile trick I used for other scavenger tests doesn't work here
      because we can't use MVC for volatile accesses anyway.
      
      I'm planning on relaxing the restriction later, hopefully with a test
      that does trigger the problem...
      
      Tests @f8 and @f9 also showed that L(G)RL and ST(G)RL were wrongly
      classified as SimpleBDX{Load,Store}.  It wouldn't be easy to test for
      that bug separately, which is why I didn't split out the fix as a
      separate patch.
      
      llvm-svn: 185434
      f6bae1e4
    • Richard Osborne's avatar
      [XCore] Fix instruction selection for zext, mkmsk instructions. · e4cc9868
      Richard Osborne authored
      r182680 replaced CountLeadingZeros_32 with a template function
      countLeadingZeros that relies on using the correct argument type to give
      the right result. The type passed in the XCore backend after this
      revision was incorrect in a couple of places.
      
      Patch by Robert Lytton.
      
      llvm-svn: 185430
      e4cc9868
    • Tim Northover's avatar
      DAGCombiner: fix use-counting issue when forming zextload · 6823900e
      Tim Northover authored
      DAGCombiner was counting all uses of a load node when considering whether it's
      worth combining into a zextload. Really, it wants to ignore the chain and just
      count real uses.
      
      rdar://problem/13896307
      
      llvm-svn: 185419
      6823900e
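A minimal model of the corrected counting (illustrative only; real SDNode use lists distinguish chain and value uses by the result number being consumed):

```cpp
#include <cassert>
#include <vector>

// A load's use list mixes chain uses with value uses; only the value uses
// should count toward the "is the zextload combine worth it" heuristic.
struct Use {
  bool IsChain;
};

int countRealUses(const std::vector<Use> &Uses) {
  int N = 0;
  for (const Use &U : Uses)
    if (!U.IsChain)
      ++N;
  return N;
}
```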
    • Hal Finkel's avatar
      Cleanup PPC Altivec registers in CSR lists and improve VRSAVE handling · 52727c6b
      Hal Finkel authored
      There are a couple of (small) related changes here:
      
      1. The printed name of the VRSAVE register has been changed from VRsave to
      vrsave in order to match the name accepted by GNU binutils.
      
      2. Support for parsing vrsave has been added to the asm parser (it seems that
      there was no test case specifically covering this code, so I've added one).
      
      3. The list of Altivec registers, which was common to all calling conventions,
      has been separated out. This allows us to define the base CSR lists, and then
      lists for each ABI with Altivec included. This allows SjLj, for example, to
      work correctly on non-Altivec targets without using unnatural definitions of
      the NoRegs CSR list.
      
      4. VRSAVE is now always reserved on non-Darwin targets and all Altivec
      registers are reserved when Altivec is disabled.
      
      With these changes, it is now possible to compile a function containing
      __builtin_unwind_init() on Linux/PPC64 with debugging information. This did not
      work previously because GNU binutils assumes that all .cfi_offset offsets will
      be 8-byte aligned on PPC64 (and errors out if you provide a non-8-byte-aligned
      offset). This is not true for the vrsave register. However, because this
      register is used only on Darwin, GCC does not bother printing a .cfi_offset
      entry for it (even though there is a slot in the stack frame for it as
      specified by the ABI). This change allows us to do the same: we will also not
      print .cfi_offset directives for vrsave.
      
      llvm-svn: 185409
      52727c6b
  7. Jul 01, 2013
    • Bill Schmidt's avatar
      Index: test/CodeGen/PowerPC/reloc-align.ll · 48fc20a0
      Bill Schmidt authored
      ===================================================================
      --- test/CodeGen/PowerPC/reloc-align.ll	(revision 0)
      +++ test/CodeGen/PowerPC/reloc-align.ll	(revision 0)
      @@ -0,0 +1,34 @@
      +; RUN: llc -mcpu=pwr7 -O1 < %s | FileCheck %s
      +
      +; This test verifies that the peephole optimization of address accesses
      +; does not produce a load or store with a relocation that can't be
      +; satisfied for a given instruction encoding.  Reduced from a test supplied
      +; by Hal Finkel.
      +
      +target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"
      +target triple = "powerpc64-unknown-linux-gnu"
      +
      +%struct.S1 = type { [8 x i8] }
      +
      +@main.l_1554 = internal global { i8, i8, i8, i8, i8, i8, i8, i8 } { i8 -1, i8 -6, i8 57, i8 62, i8 -48, i8 0, i8 58, i8 80 }, align 1
      +
      +; Function Attrs: nounwind readonly
      +define signext i32 @main() #0 {
      +entry:
      +  %call = tail call fastcc signext i32 @func_90(%struct.S1* byval bitcast ({ i8, i8, i8, i8, i8, i8, i8, i8 }* @main.l_1554 to %struct.S1*))
      +; CHECK-NOT: ld {{[0-9]+}}, main.l_1554@toc@l
      +  ret i32 %call
      +}
      +
      +; Function Attrs: nounwind readonly
      +define internal fastcc signext i32 @func_90(%struct.S1* byval nocapture %p_91) #0 {
      +entry:
      +  %0 = bitcast %struct.S1* %p_91 to i64*
      +  %bf.load = load i64* %0, align 1
      +  %bf.shl = shl i64 %bf.load, 26
      +  %bf.ashr = ashr i64 %bf.shl, 54
      +  %bf.cast = trunc i64 %bf.ashr to i32
      +  ret i32 %bf.cast
      +}
      +
      +attributes #0 = { nounwind readonly "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" }
      Index: lib/Target/PowerPC/PPCAsmPrinter.cpp
      ===================================================================
      --- lib/Target/PowerPC/PPCAsmPrinter.cpp	(revision 185327)
      +++ lib/Target/PowerPC/PPCAsmPrinter.cpp	(working copy)
      @@ -679,7 +679,26 @@ void PPCAsmPrinter::EmitInstruction(const MachineI
             OutStreamer.EmitRawText(StringRef("\tmsync"));
             return;
           }
      +    break;
      +  case PPC::LD:
      +  case PPC::STD:
      +  case PPC::LWA: {
      +    // Verify alignment is legal, so we don't create relocations
      +    // that can't be supported.
      +    // FIXME:  This test is currently disabled for Darwin.  The test
      +    // suite shows a handful of test cases that fail this check for
      +    // Darwin.  Those need to be investigated before this sanity test
      +    // can be enabled for those subtargets.
      +    if (!Subtarget.isDarwin()) {
      +      unsigned OpNum = (MI->getOpcode() == PPC::STD) ? 2 : 1;
      +      const MachineOperand &MO = MI->getOperand(OpNum);
      +      if (MO.isGlobal() && MO.getGlobal()->getAlignment() < 4)
      +        llvm_unreachable("Global must be word-aligned for LD, STD, LWA!");
      +    }
      +    // Now process the instruction normally.
      +    break;
         }
      +  }
       
         LowerPPCMachineInstrToMCInst(MI, TmpInst, *this);
         OutStreamer.EmitInstruction(TmpInst);
      Index: lib/Target/PowerPC/PPCISelDAGToDAG.cpp
      ===================================================================
      --- lib/Target/PowerPC/PPCISelDAGToDAG.cpp	(revision 185327)
      +++ lib/Target/PowerPC/PPCISelDAGToDAG.cpp	(working copy)
      @@ -1530,6 +1530,14 @@ void PPCDAGToDAGISel::PostprocessISelDAG() {
             if (GlobalAddressSDNode *GA = dyn_cast<GlobalAddressSDNode>(ImmOpnd)) {
               SDLoc dl(GA);
               const GlobalValue *GV = GA->getGlobal();
      +        // We can't perform this optimization for data whose alignment
      +        // is insufficient for the instruction encoding.
      +        if (GV->getAlignment() < 4 &&
      +            (StorageOpcode == PPC::LD || StorageOpcode == PPC::STD ||
      +             StorageOpcode == PPC::LWA)) {
      +          DEBUG(dbgs() << "Rejected this candidate for alignment.\n\n");
      +          continue;
      +        }
               ImmOpnd = CurDAG->getTargetGlobalAddress(GV, dl, MVT::i64, 0, Flags);
             } else if (ConstantPoolSDNode *CP =
                        dyn_cast<ConstantPoolSDNode>(ImmOpnd)) {
      
      llvm-svn: 185380
      48fc20a0
    • Akira Hatanaka's avatar
      8b5b1e07
    • Anton Korobeynikov's avatar
      Really fix the test. Sorry for the breakage... · ba8f4c5e
      Anton Korobeynikov authored
      llvm-svn: 185369
      ba8f4c5e
    • Anton Korobeynikov's avatar
      Fix the test which relies on uncommitted change · 02678370
      Anton Korobeynikov authored
      llvm-svn: 185368
      02678370
    • Anton Korobeynikov's avatar
      Add jump tables handling for MSP430. · 82bedb1f
      Anton Korobeynikov authored
      Patch by Job Noorman!
      
      llvm-svn: 185364
      82bedb1f
    • Cameron Zwarich's avatar
      Fix PR16508. · 867bfcd5
      Cameron Zwarich authored
      When phis get lowered, destination copies are inserted using an iterator that is
      determined once for all phis in the block, which BuildMI interprets as a request
      to insert an instruction directly before the iterator. In the case of a cyclic
      phi, source copies may also be inserted directly before this iterator, which can
      cause source copies to be inserted before destination copies. The fix is to keep
      an iterator to the last phi and then advance it while lowering each phi in order
      to insert destination copies directly after the phis.
      
      llvm-svn: 185363
      867bfcd5
    • Hal Finkel's avatar
      Don't form PPC CTR loops for over-sized exit counts · 25e4a0d4
      Hal Finkel authored
      Although you can't generate this from C on PPC64, if you have a loop using a
      64-bit counter on PPC32 then you can't form a CTR-based loop for it. This had
      been causing the PPCCTRLoops pass to assert.
      
      Thanks to Joerg Sonnenberger for providing a test case!
      
      llvm-svn: 185361
      25e4a0d4
    • Tim Northover's avatar
      AArch64: correct CodeGen of MOVZ/MOVK combinations. · 8625fd8c
      Tim Northover authored
      According to the AArch64 ELF specification (4.6.8), it's the
      assembler's responsibility to make sure the shift amount is correct in
      relocated MOVZ/MOVK instructions.
      
      This wasn't being obeyed by either the MCJIT CodeGen or RuntimeDyldELF
      (which happened to work out well for JIT tests). This commit should
      make us compliant in this area.
      
      llvm-svn: 185360
      8625fd8c
    • Tim Northover's avatar
      Revert r185339 (ARM: relax the atomic release barrier to "dmb ishst") · 7f3d9e1f
      Tim Northover authored
      Turns out I'd misread the architecture reference manual and thought
      that was a load/store-store barrier, when it's not.
      
      Thanks for pointing it out Eli!
      
      llvm-svn: 185356
      7f3d9e1f
    • Tim Northover's avatar
      ARM: relax the atomic release barrier to "dmb ishst" · 953abab4
      Tim Northover authored
      I believe the full "dmb ish" barrier is not required to guarantee release
      semantics for atomic operations. The weaker "dmb ishst" prevents previous
      operations from being reordered with a store executed afterwards, which is enough.
      
      A key point to note (fortunately already correct) is that this barrier alone is
      *insufficient* for sequential consistency, no matter how liberally placed.
      
      llvm-svn: 185339
      953abab4
    • Justin Holewinski's avatar
      [NVPTX] Add support for module-scope inline asm · d2bbdf05
      Justin Holewinski authored
      Since we were explicitly not calling AsmPrinter::doInitialization,
      any module-scope inline asm was not being printed.
      
      llvm-svn: 185336
      d2bbdf05
    • Justin Holewinski's avatar
      [NVPTX] 64-bit ADDC/ADDE are not legal · 51cb1349
      Justin Holewinski authored
      llvm-svn: 185333
      51cb1349