  1. Jul 08, 2013
    • Fix PromoteIntRes_BUILD_VECTOR crash with i1 vectors · 8cb9a0e1
      Hal Finkel authored
      This fixes a bug (found by llvm-stress) in
      DAGTypeLegalizer::PromoteIntRes_BUILD_VECTOR where it assumed that the result
      type would always be larger than the original operands. This is not always
      true, however, with boolean vectors. For example, promoting a node of type v8i1
      (where the operands will be of type i32, the type to which i1 is promoted) will
      yield a node with a result vector element type of i16 (and operands of type
      i32). As a result, we cannot blindly assume that we can ANY_EXTEND the operands
      to the result type.
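
      As a sketch of what the fix implies (illustrative only, not the actual
      DAGTypeLegalizer code; NOutVTElem names the promoted result's element
      type, e.g. i16 for v8i16, and Op an operand that may already be wider):

        // Any-extend narrower operands and truncate wider ones;
        // getAnyExtOrTrunc covers both directions (and the equal case).
        SDValue Elt = DAG.getAnyExtOrTrunc(Op, dl, NOutVTElem);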
      
      llvm-svn: 185794
  2. Jul 03, 2013
    • [PowerPC] Use mtocrf when available · 49f487e6
      Ulrich Weigand authored
      
      Just as with mfocrf, it is also preferable to use mtocrf instead of
      mtcrf when only a single CR register is to be written.
      
      The current code, however, always emits mtcrf.  This probably does not matter
      when using an external assembler, since the GNU assembler will in fact
      automatically replace mtcrf with mtocrf when possible.  It does create
      inefficient code with the integrated assembler, however.
      
      To fix this, this patch adds MTOCRF/MTOCRF8 instruction patterns and
      uses those instead of MTCRF/MTCRF8 everywhere.  Just as done in the
      MFOCRF patch committed as 185556, these patterns will be converted
      back to MTCRF if MTOCRF is not available on the machine.
      
      As a side effect, this allows us to modify the MTCRF pattern to accept
      the full range of mask operands for the benefit of the asm parser.
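
      As a minimal sketch of the selection policy (assuming the subtarget's
      hasMFOCRF() feature query guards mtocrf just as it does mfocrf; the
      surrounding emission code is elided):

        // Prefer the single-field form when available; the patterns added
        // here are rewritten back to MTCRF/MTCRF8 otherwise.
        unsigned Opc = Subtarget.hasMFOCRF() ? PPC::MTOCRF : PPC::MTCRF;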
      
      llvm-svn: 185561
  3. Jul 02, 2013
    • [PowerPC] Remove VK_PPC_TLSGD and VK_PPC_TLSLD · 40509956
      Ulrich Weigand authored
      
      The PowerPC-specific modifiers VK_PPC_TLSGD and VK_PPC_TLSLD
      correspond exactly to the generic modifiers VK_TLSGD and VK_TLSLD.
      This causes some confusion with the asm parser, since VK_PPC_TLSGD
      is output as @tlsgd, which is then read back in as VK_TLSGD.
      
      To avoid this confusion, this patch removes the PowerPC-specific
      modifiers and uses the generic modifiers throughout.  (The only
      drawback is that the generic modifiers are printed in upper case
      while the usual convention on PowerPC is to use lower-case modifiers.
      But this is just a cosmetic issue.)
      
      llvm-svn: 185476
    • Cleanup PPC Altivec registers in CSR lists and improve VRSAVE handling · 52727c6b
      Hal Finkel authored
      There are a couple of (small) related changes here:
      
      1. The printed name of the VRSAVE register has been changed from VRsave to
      vrsave in order to match the name accepted by GNU binutils.
      
      2. Support for parsing vrsave has been added to the asm parser (it seems that
      there was no test case specifically covering this code, so I've added one).
      
      3. The list of Altivec registers, which was common to all calling conventions,
      has been separated out. This allows us to define the base CSR lists, and then
      lists for each ABI with Altivec included. This allows SjLj, for example, to
      work correctly on non-Altivec targets without using unnatural definitions of
      the NoRegs CSR list.
      
      4. VRSAVE is now always reserved on non-Darwin targets and all Altivec
      registers are reserved when Altivec is disabled.
      
      With these changes, it is now possible to compile a function containing
      __builtin_unwind_init() on Linux/PPC64 with debugging information. This did not
      work previously because GNU binutils assumes that all .cfi_offset offsets will
      be 8-byte aligned on PPC64 (and errors out if you provide a non-8-byte-aligned
      offset). This is not true for the vrsave register, however. Because this
      register is used only on Darwin, GCC does not bother printing a .cfi_offset
      entry for it (even though there is a slot in the stack frame for it as
      specified by the ABI). This change allows us to do the same: we will also not
      print .cfi_offset directives for vrsave.
      
      llvm-svn: 185409
  4. Jul 01, 2013
    • Index: test/CodeGen/PowerPC/reloc-align.ll · 48fc20a0
      Bill Schmidt authored
      ===================================================================
      --- test/CodeGen/PowerPC/reloc-align.ll	(revision 0)
      +++ test/CodeGen/PowerPC/reloc-align.ll	(revision 0)
      @@ -0,0 +1,34 @@
      +; RUN: llc -mcpu=pwr7 -O1 < %s | FileCheck %s
      +
      +; This test verifies that the peephole optimization of address accesses
      +; does not produce a load or store with a relocation that can't be
      +; satisfied for a given instruction encoding.  Reduced from a test supplied
      +; by Hal Finkel.
      +
      +target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"
      +target triple = "powerpc64-unknown-linux-gnu"
      +
      +%struct.S1 = type { [8 x i8] }
      +
      +@main.l_1554 = internal global { i8, i8, i8, i8, i8, i8, i8, i8 } { i8 -1, i8 -6, i8 57, i8 62, i8 -48, i8 0, i8 58, i8 80 }, align 1
      +
      +; Function Attrs: nounwind readonly
      +define signext i32 @main() #0 {
      +entry:
      +  %call = tail call fastcc signext i32 @func_90(%struct.S1* byval bitcast ({ i8, i8, i8, i8, i8, i8, i8, i8 }* @main.l_1554 to %struct.S1*))
      +; CHECK-NOT: ld {{[0-9]+}}, main.l_1554@toc@l
      +  ret i32 %call
      +}
      +
      +; Function Attrs: nounwind readonly
      +define internal fastcc signext i32 @func_90(%struct.S1* byval nocapture %p_91) #0 {
      +entry:
      +  %0 = bitcast %struct.S1* %p_91 to i64*
      +  %bf.load = load i64* %0, align 1
      +  %bf.shl = shl i64 %bf.load, 26
      +  %bf.ashr = ashr i64 %bf.shl, 54
      +  %bf.cast = trunc i64 %bf.ashr to i32
      +  ret i32 %bf.cast
      +}
      +
      +attributes #0 = { nounwind readonly "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" }
      Index: lib/Target/PowerPC/PPCAsmPrinter.cpp
      ===================================================================
      --- lib/Target/PowerPC/PPCAsmPrinter.cpp	(revision 185327)
      +++ lib/Target/PowerPC/PPCAsmPrinter.cpp	(working copy)
      @@ -679,7 +679,26 @@ void PPCAsmPrinter::EmitInstruction(const MachineI
             OutStreamer.EmitRawText(StringRef("\tmsync"));
             return;
           }
      +    break;
      +  case PPC::LD:
      +  case PPC::STD:
      +  case PPC::LWA: {
      +    // Verify alignment is legal, so we don't create relocations
      +    // that can't be supported.
      +    // FIXME:  This test is currently disabled for Darwin.  The test
      +    // suite shows a handful of test cases that fail this check for
      +    // Darwin.  Those need to be investigated before this sanity test
      +    // can be enabled for those subtargets.
      +    if (!Subtarget.isDarwin()) {
      +      unsigned OpNum = (MI->getOpcode() == PPC::STD) ? 2 : 1;
      +      const MachineOperand &MO = MI->getOperand(OpNum);
      +      if (MO.isGlobal() && MO.getGlobal()->getAlignment() < 4)
      +        llvm_unreachable("Global must be word-aligned for LD, STD, LWA!");
      +    }
      +    // Now process the instruction normally.
      +    break;
         }
      +  }
       
         LowerPPCMachineInstrToMCInst(MI, TmpInst, *this);
         OutStreamer.EmitInstruction(TmpInst);
      Index: lib/Target/PowerPC/PPCISelDAGToDAG.cpp
      ===================================================================
      --- lib/Target/PowerPC/PPCISelDAGToDAG.cpp	(revision 185327)
      +++ lib/Target/PowerPC/PPCISelDAGToDAG.cpp	(working copy)
      @@ -1530,6 +1530,14 @@ void PPCDAGToDAGISel::PostprocessISelDAG() {
             if (GlobalAddressSDNode *GA = dyn_cast<GlobalAddressSDNode>(ImmOpnd)) {
               SDLoc dl(GA);
               const GlobalValue *GV = GA->getGlobal();
      +        // We can't perform this optimization for data whose alignment
      +        // is insufficient for the instruction encoding.
      +        if (GV->getAlignment() < 4 &&
      +            (StorageOpcode == PPC::LD || StorageOpcode == PPC::STD ||
      +             StorageOpcode == PPC::LWA)) {
      +          DEBUG(dbgs() << "Rejected this candidate for alignment.\n\n");
      +          continue;
      +        }
               ImmOpnd = CurDAG->getTargetGlobalAddress(GV, dl, MVT::i64, 0, Flags);
             } else if (ConstantPoolSDNode *CP =
                        dyn_cast<ConstantPoolSDNode>(ImmOpnd)) {
      
      llvm-svn: 185380
    • Fix PR16508. · 867bfcd5
      Cameron Zwarich authored
      When phis get lowered, destination copies are inserted using an iterator that is
      determined once for all phis in the block, which BuildMI interprets as a request
      to insert an instruction directly before the iterator. In the case of a cyclic
      phi, source copies may also be inserted directly before this iterator, which can
      cause source copies to be inserted before destination copies. The fix is to keep
      an iterator to the last phi and then advance it while lowering each phi in order
      to insert destination copies directly after the phis.
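
      A sketch of the corrected insertion scheme (hypothetical names, not
      the verbatim patch):

        // Re-derive the insertion point from the last PHI each time, so
        // destination copies are built after the PHIs and can no longer
        // be preceded by freshly inserted source copies.
        MachineBasicBlock::iterator LastPHIIt =
            std::prev(MBB.SkipPHIsAndLabels(MBB.begin()));
        while (MBB.begin()->isPHI())
          LowerPHINode(MBB, std::next(LastPHIIt));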
      
      llvm-svn: 185363
    • Don't form PPC CTR loops for over-sized exit counts · 25e4a0d4
      Hal Finkel authored
      Although you can't generate this from C on PPC64, if you have a loop using a
      64-bit counter on PPC32 then you can't form a CTR-based loop for it. This had
      been causing the PPCCTRLoops pass to assert.
      
      Thanks to Joerg Sonnenberger for providing a test case!
      
      llvm-svn: 185361
  5. Jun 29, 2013
    • PPC: Ignore spill/restore requests for VRSAVE (except on Darwin) · ac1a24b5
      Hal Finkel authored
      This fixes PR16418, which reports that a function calling
      __builtin_unwind_init() asserts. The cause is that this generates a
      spill/restore for VRSAVE, and we support that only on Darwin (because VRSAVE is
      only really used on Darwin).
      
      The test case checks only that we don't crash. We can add correctness checks
      once someone verifies what behavior the function is supposed to have.
      
      llvm-svn: 185235
  6. Jun 28, 2013
    • Fix CodeGen/PowerPC/stack-protector.ll on OpenBSD · 147c287d
      Hal Finkel authored
      On OpenBSD, the stack-smash protection transform uses "__guard_local"
      and "__stack_smash_handler" instead of "__stack_chk_guard" and
      "__stack_chk_fail".  However, CodeGen/PowerPC/stack-protector.ll
      doesn't specify a target OS, so on OpenBSD it fails.
      
      Add -mtriple=ppc32-unknown-linux to make the test host-OS agnostic. While
      there, convert to FileCheck.
      
      Patch by Matthew Dempsky.
      
      llvm-svn: 185206
    • Fix a PPC rlwimi instruction-selection bug · 4ca70100
      Hal Finkel authored
      Under certain (evidently rare) circumstances, this code used to convert OR(a,
      AND(x, y)) into OR(a, x). This was incorrect: with a = 0, x = 1, y = 0, for
      example, OR(a, AND(x, y)) yields 0 while OR(a, x) yields 1.
      
      While there, I've added a comment to the code immediately above.
      
      llvm-svn: 185201
  7. Jun 13, 2013
    • [PowerPC] Disable fast-isel for existing -O0 tests for PowerPC. · 4a28e827
      Bill Schmidt authored
      This is a preliminary patch for fast instruction selection on
      PowerPC.  Code generation can differ between DAG isel and fast isel.
      Existing tests that specify -O0 were written to expect DAG isel.  Make
      this explicit by adding -fast-isel=false to the tests.
      
      In some cases specifying -fast-isel=false produces different code even
      when there isn't a fast instruction selector specified.  This is
      because TM.Options.EnableFastISel = 1 at -O0 whether or not a FastISel
      object exists.  Thus disabling fast isel can actually produce less
      conservative code.  Because of this, some of the expected code
      generation in the -O0 tests needs to be adjusted.
      
      In particular, handling of function arguments is less conservative
      with -fast-isel=false (see isOnlyUsedInEntryBlock() in
      SelectionDAGBuilder.cpp).  This results in fewer stack accesses and,
      in some cases, reduced stack size as uselessly loaded values are no
      longer stored back to spill locations in the stack.
      
      No functional change with this patch; test case adjustments only.
      
      llvm-svn: 183939
  8. Jun 08, 2013
    • Disallow i64 div/rem in PPC32 counter loops · fa5f6f74
      Hal Finkel authored
      On PPC32, [su]div,rem on i64 types are transformed into runtime library
      function calls. As a result, they are not allowed in counter-based loops (the
      counter-loops verification pass caught this error; this change fixes PR16169).
      
      llvm-svn: 183581
  9. May 30, 2013
    • Change how we iterate over relocations on ELF. · 4f60a38f
      Rafael Espindola authored
      For COFF and MachO, sections semantically have relocations that apply to them.
      That is not the case on ELF.
      
      In relocatable objects (.o), a section with relocations in ELF has offsets to
      another section where the relocations should be applied.
      
      In dynamic objects and executables, relocations don't have an offset, they have
      a virtual address. The section sh_info may or may not point to another section,
      but that is not actually used for resolving the relocations.
      
      This patch exposes that in the ObjectFile API. It has the following advantages:
      
      * Most (all?) clients can handle this more efficiently. They will normally walk
      all relocations, so making an effort to iterate in a particular order doesn't
      save time.
      
      * llvm-readobj now prints relocations in the same way the native readelf does.
      
      * Probably most important, relocations that don't point to any section are now
      visible. This is the case of relocations in the rela.dyn section. See the
      updated relocation-executable.test for example.
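
      In the llvm::object API this walk now looks roughly as follows (a
      sketch in the current range-based spelling; the interface of the time
      used explicit error_code iterators, but the shape is the same):

        #include "llvm/Object/ObjectFile.h"
        using namespace llvm::object;

        void walkRelocations(const ObjectFile &Obj) {
          for (const SectionRef &Sec : Obj.sections())
            for (const RelocationRef &R : Sec.relocations())
              (void)R.getOffset(); // section offset in .o files,
                                   // virtual address in DSOs/executables
        }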
      
      llvm-svn: 182908
  10. May 26, 2013
    • Prefer to duplicate PPC Altivec loads when expanding unaligned loads · 7d8a691b
      Hal Finkel authored
      When expanding unaligned Altivec loads, we use the decremented offset trick to
      prevent page faults. Unfortunately, if we have a sequence of consecutive
      unaligned loads, this leads to suboptimal code generation because the 'extra'
      load from the first unaligned load can be combined with the base load from the
      second (but only if the decremented offset trick is not used for the first).
      Search up and down the chain, through loads and token factors, looking for
      consecutive loads, and if one is found, don't use the offset reduction trick.
      These duplicate loads are later combined to yield the desired sequence (in the
      future, we might want a more-powerful chain search, but that will require some
      changes to allow the combiner routines to access the AA object).
      
      This should complete the initial implementation of the optimized unaligned
      Altivec load expansion. There is some refactoring that should be done, but
      that will happen when the unaligned store expansion is added.
      
      llvm-svn: 182719
  11. May 25, 2013
    • PPC: Combine duplicate (offset) lvsl Altivec intrinsics · bc2ee4c4
      Hal Finkel authored
      The lvsl permutation control instruction is a function only of the alignment of
      the pointer operand (relative to the 16-byte natural alignment of Altivec
      vectors). As a result, multiple lvsl intrinsics whose operands differ by a
      multiple of 16 can be combined: for example, lvsl on p and on p + 32 produce
      the same permutation vector.
      
      llvm-svn: 182708
    • PPC: Initial support for permutation-based unaligned Altivec loads · cf2e9080
      Hal Finkel authored
      Altivec only directly supports aligned loads, but the loads have a strange
      property: If given an unaligned address, they truncate the address to the next
      lower aligned address, and load from there.  This property, along with an extra
      load and some special-purpose permutation-control instructions that generate
      the appropriate permutations from the original unaligned address, allows
      efficient lowering of unaligned loads. This code uses the trick explained in the
      Apple Velocity Engine optimization overview document to prevent the needed
      extra load from possibly causing a page fault if the original address happens
      to be aligned.
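
      The generated sequence corresponds to the classic AltiVec idiom,
      sketched here with C intrinsics (the offset of 15 rather than 16 in
      the second load is the trick that keeps it from faulting when the
      address is already aligned):

        #include <altivec.h>

        vector unsigned char load_unaligned(const unsigned char *p) {
          vector unsigned char MSQ  = vec_ld(0, p);    // lower aligned quadword
          vector unsigned char LSQ  = vec_ld(15, p);   // upper aligned quadword
          vector unsigned char mask = vec_lvsl(0, p);  // permutation from p's low bits
          return vec_perm(MSQ, LSQ, mask);             // reassemble the unaligned data
        }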
      
      As noted in the FIXMEs, there are several additional optimizations that can be
      performed to reduce the cost of these loads even more. These will be
      implemented in future commits.
      
      llvm-svn: 182691
  12. May 18, 2013
    • Check InlineAsm clobbers in PPCCTRLoops · 2f474f0e
      Hal Finkel authored
      We don't need to reject all inline asm as using the counter register (most does
      not). Only those that explicitly clobber the counter register need to prevent
      the transformation.
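
      A sketch of the required test (illustrative names; after parsing,
      clobber constraint codes carry unprefixed register names such as
      "{ctr}"):

        static bool asmClobbersCTR(const InlineAsm *IA) {
          InlineAsm::ConstraintInfoVector CIV = IA->ParseConstraints();
          for (const InlineAsm::ConstraintInfo &C : CIV)
            if (C.Type == InlineAsm::isClobber)
              for (const std::string &Code : C.Codes)
                if (Code == "{ctr}")
                  return true;
          return false;
        }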
      
      llvm-svn: 182191
  13. May 16, 2013
    • Fix cpu on test CodeGen/PowerPC/ctrloop-fp64.ll · 778c73c5
      Hal Finkel authored
      We need ppc instead of generic to override native features on ppc machines.
      
      llvm-svn: 182049
    • Create a new preheader in PPCCTRLoops to avoid counter register clobbers · 5f587c59
      Hal Finkel authored
      Some IR-level instructions (such as FP <-> i64 conversions) are not chained
      w.r.t. the mtctr intrinsic and yet may become function calls that clobber the
      counter register. At the selection-DAG level, these might be reordered with the
      mtctr intrinsic causing miscompiles. To avoid this situation, if an existing
      preheader has instructions that might use the counter register, create a new
      preheader for the mtctr intrinsic. This extra block will be remerged with the
      old preheader at the MI level, but will prevent unwanted reordering at the
      selection-DAG level.
      
      llvm-svn: 182045
    • [PowerPC] Use true offset value in "memrix" machine operands · 9d980cbd
      Ulrich Weigand authored
      
      This is the second part of the change to always return "true"
      offset values from getPreIndexedAddressParts, tackling the
      case of "memrix" type operands.
      
      This is about instructions like LD/STD that only have a 14-bit
      field to encode immediate offsets, which are implicitly extended
      by two zero bits by the machine, so that in effect we can access
      16-bit offsets as long as they are a multiple of 4.
      
      The PowerPC back end currently handles such instructions by
      carrying the 14-bit value (as it will get encoded into the
      actual machine instructions) in the machine operand fields
      for such instructions.  This means that those values are
      in fact not the true offset, but rather the offset divided
      by 4 (and then truncated to an unsigned 14-bit value).
      
      Like in the case fixed in r182012, this makes common code
      operations on such offset values not work as expected.
      Furthermore, there doesn't really appear to be any strong
      reason why we should encode machine operands this way.
      
      This patch therefore changes the encoding of "memrix" type
      machine operands to simply contain the "true" offset value
      as a signed immediate value, while enforcing the rules that
      it must fit in a 16-bit signed value and must also be a
      multiple of 4.
      
      This change must be made simultaneously in all places that
      access machine operands of this type.  However, just about
      all those changes make the code simpler; in many cases we
      can now just share the same code for memri and memrix
      operands.
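
      The encoding relationship, as a small sketch (helper names are
      hypothetical; isInt<> is from llvm/Support/MathExtras.h):

        // A "true" memrix offset is valid iff it fits in 16 signed bits
        // and is a multiple of 4; the instruction carries only the upper
        // 14 bits.
        bool isValidDSOffset(int64_t Offset) {
          return isInt<16>(Offset) && (Offset & 3) == 0;
        }
        uint16_t encodeDSField(int64_t Offset) {
          // e.g. a true offset of 32 is carried in the DS field as 8
          return static_cast<uint16_t>((Offset >> 2) & 0x3FFF);
        }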
      
      llvm-svn: 182032
    • PPC32 cannot form counter loops around i64 FP conversions · 47db66d4
      Hal Finkel authored
      On PPC32, i64 FP conversions are implemented using runtime calls (which clobber
      the counter register). These must be excluded.
      
      llvm-svn: 182023
    • Use new CHECK-DAG support to stabilize CodeGen/PowerPC/recipest.ll · 22f91919
      Bill Schmidt authored
      While testing some experimental code to add vector-scalar registers to
      PowerPC, I noticed that a couple of independent instructions were
      flipped by the scheduler.  The new CHECK-DAG support is perfect for
      avoiding this problem.
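
      Schematically (placeholder mnemonics, not lines from recipest.ll),
      CHECK-DAG turns an ordered pair of checks into an unordered one:

        ; CHECK-DAG: fmul
        ; CHECK-DAG: fadd

      Both patterns must still match, but in either order, so a benign
      scheduling flip no longer breaks the test.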
      
      llvm-svn: 182020
    • [PowerPC] Report true displacement value from getPreIndexedAddressParts · 7aa76b6a
      Ulrich Weigand authored
      
      DAGCombiner::CombineToPreIndexedLoadStore calls a target routine to
      decompose a memory address into a base/offset pair.  It expects the
      offset (if constant) to be the true displacement value in order to
      perform optional additional optimizations; in particular, to convert
      other uses of the original pointer into uses of the new base pointer
      after pre-increment.
      
      The PowerPC implementation of getPreIndexedAddressParts, however,
      simply calls SelectAddressRegImm, which returns a TargetConstant.
      This value is appropriate for encoding into the instruction, but
      it is not always usable as true displacement value:
      
      - Its type is always MVT::i32, even on 64-bit, where addresses
        ought to be i64 ... this causes the optimization to simply
        always fail on 64-bit due to this line in DAGCombiner:
      
            // FIXME: In some cases, we can be smarter about this.
            if (Op1.getValueType() != Offset.getValueType()) {
      
      - Its value is truncated to an unsigned 16-bit value if negative.
        This causes the above optimization to generate wrong code.
      
      This patch fixes both problems by simply returning the true
      displacement value (in its original type).  This doesn't
      affect any other user of the displacement.
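
      A sketch of the shape of the fix (hypothetical variable names, written
      against the current getConstant signature):

        // Return the true, sign-extended displacement in the base
        // pointer's own type, not the i32 TargetConstant used for
        // instruction encoding.
        int64_t Imm = cast<ConstantSDNode>(Disp)->getSExtValue();
        Offset = DAG.getConstant(Imm, SDLoc(Parent), Base.getValueType());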
      
      llvm-svn: 182012
    • Extend test to check the .cfi instructions. · c533b559
      Rafael Espindola authored
      I am about to refactor the calls to addFrameMove and some of the ppc
      ones were not being tested.
      
      llvm-svn: 182009
    • Extend test for better coverage. · a5c7ceed
      Rafael Espindola authored
      Without this change nothing was covering this addFrameMove:
      
      // For 64-bit SVR4 when we have spilled CRs, the spill location
      // is SP+8, not a frame-relative slot.
      if (Subtarget.isSVR4ABI()
          && Subtarget.isPPC64()
          && (PPC::CR2 <= Reg && Reg <= PPC::CR4)) {
        MachineLocation CSDst(PPC::X1, 8);
        MachineLocation CSSrc(PPC::CR2);
        MMI.addFrameMove(Label, CSDst, CSSrc);
        continue;
      }
      
      llvm-svn: 181976
  14. May 15, 2013
    • Implement PPC counter loops as a late IR-level pass · 25c1992b
      Hal Finkel authored
      The old PPCCTRLoops pass, like the Hexagon pass version from which it was
      derived, could only handle some simple loops in canonical form. We cannot
      directly adapt the new Hexagon hardware loops pass, however, because the
      Hexagon pass contains a fundamental assumption that non-constant-trip-count
      loops will contain a guard, and this is not always true (the result being that
      incorrect negative counts can be generated). With this commit, we replace the
      pass with a late IR-level pass which makes use of ScalarEvolution (SE) to calculate the
      backedge-taken counts and safely generate the loop-count expressions (including
      any necessary max() parts). This IR level pass inserts custom intrinsics that
      are lowered into the desired decrement-and-branch instructions.
      
      The most fragile part of this new implementation is that interfering uses of
      the counter register must be detected on the IR level (and, on PPC, this also
      includes any indirect branches in addition to function calls). Also, to make
      all of this work, we need a variant of the mtctr instruction that is marked
      as having side effects. Without this, machine-code level CSE, DCE, etc.
      illegally transform the resulting code. Hopefully, this can be improved
      in the future.
      
      This new pass is smaller than the original (and much smaller than the new
      Hexagon hardware loops pass), and can handle many additional cases correctly.
      In addition, the preheader-creation code has been copied from LoopSimplify, and
      after we decide on where it belongs, this code will be refactored so that it
      can be explicitly shared (making this implementation even smaller).
      
      The new test-case files ctrloop-{le,lt,ne}.ll have been adapted from tests for
      the new Hexagon pass. There are a few classes of loops that this pass does not
      transform (noted by FIXMEs in the files), but these deficiencies can be
      addressed within the SE infrastructure (thus helping many other passes as well).
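
      The SE-based count computation reduces to something like the following
      sketch (the real pass also guards non-computable counts per exit and
      folds in the max() parts mentioned above):

        const SCEV *BECount = SE->getBackedgeTakenCount(L);
        if (isa<SCEVCouldNotCompute>(BECount))
          return false; // no usable count; leave the loop alone
        // trip count = backedge-taken count + 1
        const SCEV *TripCount =
            SE->getAddExpr(BECount, SE->getConstant(BECount->getType(), 1));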
      
      llvm-svn: 181927
  15. May 14, 2013
    • PPC32: Fix stack collision between FP and CR save areas. · ef3d1a24
      Bill Schmidt authored
      The changes to CR spill handling missed a case for 32-bit PowerPC.
      The code in PPCFrameLowering::processFunctionBeforeFrameFinalized()
      checks whether CR spill has occurred using a flag in the function
      info.  This flag is only set by storeRegToStackSlot and
      loadRegFromStackSlot.  spillCalleeSavedRegisters does not call
      storeRegToStackSlot, but instead produces MI directly.  Thus we don't
      see the CR is spilled when assigning frame offsets, and the CR spill
      ends up colliding with some other location (generally the FP slot).
      
      This patch sets the flag in spillCalleeSavedRegisters for PPC32 so
      that the CR spill is properly detected and gets its own slot in the
      stack frame.
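
      The essence of the fix is a one-line bookkeeping update in the PPC32
      spill path (a sketch with illustrative local names; setSpillsCR() is
      the flag setter the function-info check relies on):

        // Mirror what storeRegToStackSlot records, so that frame
        // finalization knows a CR spill slot must be assigned.
        if (PPC::CRRCRegClass.contains(Reg) && !Subtarget.isPPC64())
          FI->setSpillsCR();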
      
      llvm-svn: 181800
  16. May 13, 2013
    • PPC64: Constant initializers with dynamic relocations go in .data.rel.ro. · 22d40dcf
      Bill Schmidt authored
      This fixes warning messages observed in the oggenc application test in
      projects/test-suite.  Special handling is needed for the 64-bit
      PowerPC SVR4 ABI when a constant is initialized with a pointer to a
      function in a shared library.  Because a function address is
      implemented as the address of a function descriptor, the use of copy
      relocations can lead to problems with initialization.  GNU ld
      therefore replaces copy relocations with dynamic relocations to be
      resolved by the dynamic linker.  This means the constant cannot reside
      in the read-only data section, but instead belongs in .data.rel.ro,
      which is designed for constants containing dynamic relocations.
      
      The implementation creates a class PPC64LinuxTargetObjectFile
      inheriting from TargetLoweringObjectFileELF, which behaves like its
      parent except to place constants of this sort into .data.rel.ro.
      
      The test case is reduced from the oggenc application.
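
      A sketch of the override's shape (simplified; the real class also
      verifies that the initializer actually requires dynamic relocations):

        class PPC64LinuxTargetObjectFile : public TargetLoweringObjectFileELF {
        public:
          const MCSection *SelectSectionForGlobal(const GlobalValue *GV,
                                                  SectionKind Kind, Mangler *Mang,
                                                  const TargetMachine &TM) const {
            if (Kind.isReadOnly())
              Kind = SectionKind::getReadOnlyWithRel(); // maps to .data.rel.ro
            return TargetLoweringObjectFileELF::SelectSectionForGlobal(
                GV, Kind, Mang, TM);
          }
        };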
      
      llvm-svn: 181723
  17. Apr 30, 2013
    • LocalStackSlotAllocation improvements · 7153251a
      Hal Finkel authored
      First, taking advantage of the fact that the virtual base registers are allocated in order of the local frame offsets, remove the quadratic register-searching behavior. Because of the ordering, we only need to check the last virtual base register created.
      
      Second, store the frame index in the FrameRef structure, and get the frame index and the local offset from this structure at the top of the loop iteration. This allows us to de-nest the loops in insertFrameReferenceRegisters (and I think makes the code cleaner). I also moved the needsFrameBaseReg check into the first loop over instructions so that we don't bother pushing FrameRefs for instructions that don't want a virtual base register anyway.
      
      Lastly, and this is the only functionality change, avoid the creation of single-use virtual base registers. These are currently not useful because, in general, they end up replacing what would be one r+r instruction with an add and a r+i instruction. Committing this removes the XFAIL in CodeGen/PowerPC/2007-09-07-LoadStoreIdxForms.ll
      
      Jim has okayed this off-list.
      
      llvm-svn: 180799
    • TBAA: remove !tbaa from testing cases if not used. · 1a5ff287
      Manman Ren authored
      This will make it easier to turn on struct-path aware TBAA since the metadata
      format will change.
      
      llvm-svn: 180796
  18. Apr 20, 2013
    • Fix PPC optimizeCompareInstr swapped-sub argument handling · e632239d
      Hal Finkel authored
      When matching a compare with a subtract where the arguments of the compare are
      swapped w.r.t. the arguments of the subtract, we need to negate the predicates
      (or CR bit indices) of the users. This, however, is not the same as inverting
      the predicate (negating LT -> GT, but inverting LT -> GE, for example). The ARM
      backend seems to do this correctly, but when I adapted the code for the PPC
      backend, I introduced an error in this logic.
      
      Comparison optimization is now enabled again by default.
      
      llvm-svn: 179899
  19. Apr 19, 2013
    • Disable PPC comparison optimization by default · b12da6be
      Hal Finkel authored
      This seems to cause a stage-2 LLVM compile failure (by crashing TableGen); so
      I'm disabling this for now.
      
      llvm-svn: 179807
    • Implement optimizeCompareInstr for PPC · 82656cb2
      Hal Finkel authored
      Many PPC instructions have a so-called 'record form', which stores into a
      specific condition register the result of comparing the instruction's result
      with zero (always as a signed comparison). For integer operations on PPC64,
      this is always a 64-bit comparison.
      
      This implementation is derived from the implementation in the ARM backend;
      there are some differences because PPC condition registers are allocatable
      virtual registers (although the record forms always use a specific one), and we
      look for a matching subtraction instruction after the compare (but before the
      first use) in addition to before it.
      
      llvm-svn: 179802
  20. Apr 15, 2013
    • Fix PPC64 CR spill location for callee-saved registers · 6736988a
      Hal Finkel authored
      This fixes an ABI bug for non-Darwin PPC64. For the callee-saved condition
      registers, the spill location is specified relative to the stack pointer (SP +
      8). However, this is not relative to the SP after the new stack frame is
      established, but instead relative to the caller's stack pointer (it is stored
      into the linkage area of the parent's stack frame).
      
      So, like with the link register, we don't directly spill the CRs with other
      callee-saved registers, but just mark them to be spilled during prologue
      generation.
      
      In practice, this reverts r179457 for PPC64 (but leaves it in place for PPC32).
      
      llvm-svn: 179500
  21. Apr 13, 2013
    • Spill and restore PPC CR registers using the FP when we have one · d85a04b3
      Hal Finkel authored
      For functions that need to spill CRs, and have dynamic stack allocations, the
      value of the SP during the restore is not what it was during the save, and so
      we need to use the FP in these cases (as for all of the other spills and
      restores, but the CR restore has a special code path because its reserved slot,
      like the link register, is specified directly relative to the adjusted SP).
      
      llvm-svn: 179457