  1. May 15, 2013
    • Implement PPC counter loops as a late IR-level pass · 25c1992b
      Hal Finkel authored
      The old PPCCTRLoops pass, like the Hexagon pass version from which it was
      derived, could only handle some simple loops in canonical form. We cannot
      directly adapt the new Hexagon hardware loops pass, however, because the
      Hexagon pass contains a fundamental assumption that non-constant-trip-count
      loops will contain a guard, and this is not always true (the result being that
      incorrect negative counts can be generated). With this commit, we replace the
      pass with a late IR-level pass which makes use of ScalarEvolution (SE) to
      calculate the
      backedge-taken counts and safely generate the loop-count expressions (including
      any necessary max() parts). This IR level pass inserts custom intrinsics that
      are lowered into the desired decrement-and-branch instructions.
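      
      As a rough sketch of the SE query involved (assumed shape, not the
      actual pass code; the real pass must also expand the SCEV back into IR
      and insert the counter intrinsics):
      
        #include "llvm/Analysis/LoopInfo.h"
        #include "llvm/Analysis/ScalarEvolution.h"
        using namespace llvm;
        
        // Trip count = backedge-taken count + 1; SE folds in any umax()
        // parts needed to keep the expression safe.
        static const SCEV *getTripCount(Loop *L, ScalarEvolution &SE) {
          const SCEV *BECount = SE.getBackedgeTakenCount(L);
          if (isa<SCEVCouldNotCompute>(BECount))
            return 0; // Unknown trip count: leave the loop alone.
          return SE.getAddExpr(BECount,
                               SE.getConstant(BECount->getType(), 1));
        }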
      
      The most fragile part of this new implementation is that interfering uses of
      the counter register must be detected on the IR level (and, on PPC, this also
      includes any indirect branches in addition to function calls). Also, to make
      all of this work, we need a variant of the mtctr instruction that is marked
      as having side effects. Without this, machine-code level CSE, DCE, etc.
      illegally transform the resulting code. Hopefully, this can be improved
      in the future.
      
      This new pass is smaller than the original (and much smaller than the new
      Hexagon hardware loops pass), and can handle many additional cases correctly.
      In addition, the preheader-creation code has been copied from LoopSimplify, and
      once we decide where it belongs, this code will be refactored so that it
      can be explicitly shared (making this implementation even smaller).
      
      The new test-case files ctrloop-{le,lt,ne}.ll have been adapted from tests for
      the new Hexagon pass. There are a few classes of loops that this pass does not
      transform (noted by FIXMEs in the files), but these deficiencies can be
      addressed within the SE infrastructure (thus helping many other passes as well).
      
      llvm-svn: 181927
    • Fix legalization of SETCC with promoted integer intrinsics · 1f6a7f53
      Hal Finkel authored
      If the input operands to SETCC are promoted, we need to make sure that we
      either use the promoted form of both operands (or neither); a mixture is not
      allowed. This can happen, for example, if a target has a custom promoted
      i1-returning intrinsic (where i1 is not a legal type). In this case, we need to
      use the promoted form of both operands.
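      
      A sketch of the rule in type-legalizer style (helper names follow
      DAGTypeLegalizer conventions; the exact code here is assumed):
      
        // If either SETCC operand was promoted, promote the other one too,
        // so the comparison sees a consistent pair of types.
        if (getTypeAction(LHS.getValueType()) ==
            TargetLowering::TypePromoteInteger)
          LHS = GetPromotedInteger(LHS);
        if (getTypeAction(RHS.getValueType()) ==
            TargetLowering::TypePromoteInteger)
          RHS = GetPromotedInteger(RHS);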
      
      This change only augments the behavior of the existing logic in the case where
      the input types (which may or may not have already been legalized) disagree,
      and should not affect existing target code because this case would otherwise
      cause an assert in the SETCC operand promotion code.
      
      This will be covered by (essentially all of the) tests for the new PPCCTRLoops
      infrastructure.
      
      llvm-svn: 181926
    • Fix miscompile due to StackColoring incorrectly merging stack slots (PR15707) · d2c42d76
      Derek Schuff authored
      IR optimisation passes can result in a basic block that contains:
      
        llvm.lifetime.start(%buf)
        ...
        llvm.lifetime.end(%buf)
        ...
        llvm.lifetime.start(%buf)
      
      Before this change, calculateLiveIntervals() was ignoring the second
      lifetime.start() and was regarding %buf as being dead from the
      lifetime.end() through to the end of the basic block.  This can cause
      StackColoring to incorrectly merge %buf with another stack slot.
      
      Fix by removing the incorrect Starts[pos].isValid() and
      Finishes[pos].isValid() checks.
      
      Just doing:
            Starts[pos] = Indexes->getMBBStartIdx(MBB);
            Finishes[pos] = Indexes->getMBBEndIdx(MBB);
      unconditionally would be enough to fix the bug, but it causes some
      test failures due to stack slots not being merged when they were
      before.  So, in order to keep the existing tests passing, treat LiveIn
      and LiveOut separately rather than approximating the live ranges by
      merging LiveIn and LiveOut.
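      
      A sketch of that shape (the liveness tests here are illustrative
      names, not the actual code):
      
        // Extend the interval to a block boundary only in the directions
        // the slot is actually live, instead of one merged approximation.
        if (isLiveIn(pos))
          Starts[pos] = Indexes->getMBBStartIdx(MBB);
        if (isLiveOut(pos))
          Finishes[pos] = Indexes->getMBBEndIdx(MBB);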
      
      This fixes PR15707.
      Patch by Mark Seaborn.
      
      llvm-svn: 181922
    • Cleanup relocation sorting for ELF. · 0f2a6fe6
      Rafael Espindola authored
      We want the order to be deterministic on all platforms. NAKAMURA Takumi
      fixed that in r181864. This patch is just two small cleanups:
      
      * Move the function to the cpp file. It is only passed to array_pod_sort.
      * Remove the PPC implementation, which is now redundant.
      
      llvm-svn: 181910
    • PPCISelLowering.h: Escape \@ in comments. [-Wdocumentation] · dc9f013a
      NAKAMURA Takumi authored
      llvm-svn: 181907
    • Whitespace. · dcc66456
      NAKAMURA Takumi authored
      llvm-svn: 181906
    • [objc-arc] Fixed a spelling error and made the statistic descriptions be consistent about their usage of periods. · b4e7f4d8
      Michael Gottesman authored
      
      llvm-svn: 181901
    • Support unaligned load/store on more ARM targets · 72ddaba7
      Derek Schuff authored
      This patch matches GCC behavior: the code used to allow unaligned
      load/store on ARM only for v6+ Darwin; it will now allow unaligned
      load/store for v6+ Darwin as well as for v7+ on other targets.
      
      The distinction is made because v6 doesn't guarantee support (but LLVM
      assumes that Apple controls hardware+kernel and therefore has conformant
      v6 CPUs), whereas v7 does provide this guarantee (and Linux behaves
      sanely).
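      
      In subtarget-predicate terms, the new policy looks roughly like this
      (a sketch using ARMSubtarget-style helpers, not the patch itself):
      
        // Allow unaligned accesses on v7+ everywhere, and on v6+ only for
        // Darwin, where conformant hardware is assumed.
        bool AllowsUnaligned =
            Subtarget->hasV7Ops() ||
            (Subtarget->hasV6Ops() && Subtarget->isTargetDarwin());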
      
      Overall this should slightly improve performance in most cases because of
      reduced I$ pressure.
      
      Patch by JF Bastien.
      
      llvm-svn: 181897
    • Remove MCELFObjectTargetWriter::adjustFixupOffset hack · 06840768
      Ulrich Weigand authored
      
      Now that PowerPC no longer uses adjustFixupOffset, and no other
      back-end (ever?) did, we can remove the infrastructure itself
      (incidentally addressing a FIXME to that effect).
      
      llvm-svn: 181895
    • [PowerPC] Remove need for adjustFixupOffset hack · 2fb140ef
      Ulrich Weigand authored
      
      Now that applyFixup understands differently-sized fixups, we can define
      fixup_ppc_lo16/fixup_ppc_lo16_ds/fixup_ppc_ha16 to properly be 2-byte
      fixups, applied at an offset of 2 relative to the start of the 
      instruction text.
      
      This has the benefit that if we actually need to generate a real
      relocation record, its address will come out correctly automatically,
      without having to fiddle with the offset in adjustFixupOffset.
      
      Tested on both 64-bit and 32-bit PowerPC, using external and
      integrated assembler.
      
      llvm-svn: 181894
    • [SystemZ] Make use of SUBTRACT HALFWORD · ffd14417
      Richard Sandiford authored
      Thanks to Ulrich Weigand for noticing that this instruction was missing.
      
      llvm-svn: 181893
    • [PowerPC] Correctly handle fixups of other than 4 byte size · 56f5b28d
      Ulrich Weigand authored
      
      The PPCAsmBackend::applyFixup routine handles the case where a
      fixup can be resolved within the same object file.  However,
      this routine is currently hard-coded to assume the size of
      any fixup is always exactly 4 bytes.
      
      This is sort-of correct for fixups on instruction text, but only
      because several fixups that really ought to be 2-byte fixups are
      presented as 4-byte fixups instead (requiring another hack in
      PPCELFObjectWriter::adjustFixupOffset to clean it up).
      
      However, this assumption breaks down completely for fixups
      on data, which legitimately can be of any size (1, 2, 4, or 8).
      
      This patch makes applyFixup aware of fixups of varying sizes,
      introducing a new helper routine getFixupKindNumBytes (along
      the lines of what the ARM back end does).  Note that in order
      to handle fixups of size 8, we also need to fix the return type
      of adjustFixupValue to uint64_t to avoid truncation.
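      
      The helper has roughly this shape (a sketch: FK_Data_* are generic MC
      fixup kinds, the PPC kinds are from PPCFixupKinds.h, and the exact
      case list is assumed):
      
        // Map each fixup kind to the number of bytes it actually patches.
        static unsigned getFixupKindNumBytes(unsigned Kind) {
          switch (Kind) {
          default: llvm_unreachable("Unknown fixup kind!");
          case FK_Data_1:           return 1;
          case FK_Data_2:           return 2;
          case FK_Data_4:           return 4;
          case FK_Data_8:           return 8;
          case PPC::fixup_ppc_lo16:
          case PPC::fixup_ppc_ha16: return 2; // 2-byte fixups on text
          case PPC::fixup_ppc_br24: return 4;
          }
        }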
      
      Tested on both 64-bit and 32-bit PowerPC, using external and
      integrated assembler.
      
      llvm-svn: 181891
    • [SystemZ] Add more future work items to the README · 619859f4
      Richard Sandiford authored
      Based on an analysis by Ulrich Weigand.
      
      llvm-svn: 181882
    • Fix build on Windows · 0588513e
      Timur Iskhodzhanov authored
      llvm-svn: 181873
    • Use only explicit bool conversion operators · 041f1aa3
      David Blaikie authored
      BitVector/SmallBitVector::reference::operator bool remain implicit since
      they model a bool more exactly, rather than something else that can be
      boolean tested.
      
      The most common (non-buggy) case are where such objects are used as
      return expressions in bool-returning functions or as boolean function
      arguments. In those cases I've used (& added if necessary) a named
      function to provide the equivalent (or sometimes negative, depending on
      convenient wording) test.
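      
      For illustration, a minimal example of the pattern (hypothetical
      class, not taken from the patch):
      
        struct Handle {
          void *Ptr;
          // Explicit: usable in if/while conditions, but not convertible
          // to int or implicitly passed where a bool is expected.
          explicit operator bool() const { return Ptr != 0; }
          // Named alternative for return expressions and bool arguments.
          bool isValid() const { return Ptr != 0; }
        };
        
        // if (H) { ... }       // OK: contextual conversion to bool
        // bool B = H;          // error: the conversion is explicit
        // return H.isValid();  // named function reads clearly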
      
      One behavior change (YAMLParser) was made, though no test case is
      included as I'm not sure how to reach that code path. Essentially any
      comparison of llvm::yaml::document_iterators would be invalid if neither
      iterator was at the end.
      
      This helped uncover a couple of bugs in Clang - test cases provided for
      those in a separate commit along with similar changes to `operator bool`
      instances in Clang.
      
      llvm-svn: 181868
    • LoopVectorize: Fix comments · 09cee972
      Arnold Schwaighofer authored
      No functionality change.
      
      llvm-svn: 181862
    • LoopVectorize: Hoist conditional loads if possible · 2d920477
      Arnold Schwaighofer authored
      InstCombine can be uncooperative toward vectorization by sinking loads
      into conditional blocks, which prevents vectorization.
      
      Undo this optimization if there are unconditional memory accesses to the same
      addresses in the loop.
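      
      A hypothetical example of the pattern (not from the test suite):
      InstCombine may sink the unconditional load of a[i] into the if-block;
      because the same address is also accessed unconditionally, hoisting it
      back out is safe and re-enables vectorization.
      
        void f(int *a, const int *b, int n) {
          for (int i = 0; i < n; ++i) {
            if (b[i])
              a[i] += 1; // conditional load/store of a[i]
            a[i] *= 2;   // unconditional access to the same address
          }
        }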
      
      radar://13815763
      
      llvm-svn: 181860
    • Speed up Value::isUsedInBasicBlock() for long use lists. · 0925b24d
      Jakob Stoklund Olesen authored
      This is expanding Ben's original heuristic for short basic blocks to
      also work for longer basic blocks and huge use lists.
      
      Scan the basic block and the use list in parallel, terminating the
      search when the shorter list ends. In almost all cases, either the basic
      block or the use list is short, and the function returns quickly.
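      
      A sketch of the lock-step scan (simplified; this follows the idea
      described above rather than quoting the patch):
      
        #include "llvm/IR/BasicBlock.h"
        #include "llvm/IR/Instruction.h"
        #include <algorithm>
        using namespace llvm;
        
        // Walk the block's instructions and V's uses in parallel; the
        // shorter of the two lists bounds the total work.
        static bool usedInBlock(const Value *V, const BasicBlock *BB) {
          BasicBlock::const_iterator BI = BB->begin(), BE = BB->end();
          Value::const_use_iterator UI = V->use_begin(), UE = V->use_end();
          for (; BI != BE && UI != UE; ++BI, ++UI) {
            // Block side: is V an operand of this instruction?
            if (std::find(BI->op_begin(), BI->op_end(), V) != BI->op_end())
              return true;
            // Use-list side: does this user live in BB?
            const Instruction *User = dyn_cast<Instruction>(*UI);
            if (User && User->getParent() == BB)
              return true;
          }
          return false;
        }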
      
      In one crazy test case with very long use chains, CodeGenPrepare runs
      400x faster. When compiling ARMDisassembler.cpp it is 5x faster.
      
      <rdar://problem/13840497>
      
      llvm-svn: 181851
    • Fix two typos · 149e281a
      Sylvestre Ledru authored
      llvm-svn: 181848
    • Object: Fix Mach-O relocation printing. · 9dab0cc6
      Ahmed Bougacha authored
      There were two problems that made llvm-objdump -r crash:
      - for non-scattered relocations, the symbol/section index is actually in the
        (aptly named) symbolnum field.
      - sections are 1-indexed (see the sketch below).
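      
      A sketch of the corrected lookup (field names from the Mach-O
      relocation_info layout; the surrounding variables are assumed):
      
        // Non-scattered: the index lives in r_symbolnum, and section
        // ordinals are 1-based, so subtract one before indexing.
        uint32_t Val = RE.r_symbolnum;
        if (IsExtern)
          SymbolIndex = Val;      // index into the symbol table
        else
          SectionIndex = Val - 1; // sections are 1-indexed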
      
      llvm-svn: 181843
    • ARM ISel: Don't create illegal types during LowerMUL · af85f608
      Arnold Schwaighofer authored
      The transformation here turns a "mul(ext(X), ext(X))" into a
      "vmull(X, X)", stripping off the extension. We have to make sure that X
      still has a valid vector type - possibly by recreating an extension to a
      smaller type. In the case of an extload of a memory type smaller than
      64 bits, we used to create an ext(load()). The problem with doing this -
      instead of recreating an extload - is that an illegal type is exposed.
      
      This patch fixes this by creating extloads instead of ext(load()) sequences.
      
      Fixes PR15970.
      
      radar://13871383
      
      llvm-svn: 181842