  1. Aug 20, 2013
    • Hal Finkel's avatar
      Add a llvm.copysign intrinsic · 0c5c01aa
      Hal Finkel authored
      This adds an llvm.copysign intrinsic. We already have LibFunc recognition for
      copysign (which is turned into the FCOPYSIGN SDAG node). In order to
      autovectorize calls to copysign in the loop vectorizer, we need a corresponding
      intrinsic as well.
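
A minimal sketch of the kind of loop this is aimed at. The function name and signature are hypothetical, and whether a given compiler actually vectorizes the loop depends on its version, the target, and the optimization flags.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: gives each mag[i] the sign of sgn[i].
// Every iteration is independent, so the copysign call is the only thing a
// loop vectorizer has to know how to widen.
void copySigns(const double *mag, const double *sgn, double *out,
               std::size_t n) {
  assert(mag != nullptr && sgn != nullptr && out != nullptr);
  for (std::size_t i = 0; i < n; ++i)
    out[i] = std::copysign(mag[i], sgn[i]);
}
```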
      
      In addition to the expected changes to the language reference, the loop
      vectorizer, BasicTTI, and the SDAG builder (the intrinsic is transformed into
      an FCOPYSIGN node, just like the function call), this also adds FCOPYSIGN to a
      few lists in LegalizeVector{Ops,Types} so that vector copysigns can be
      expanded.
      
      In TargetLoweringBase::initActions, I've made the default action for FCOPYSIGN
      be Expand for vector types. This seems correct for all in-tree targets, and I
      think is the right thing to do because, previously, there was no way to generate
      vector-valued FCOPYSIGN nodes (and most targets don't specify an action for
      vector-typed FCOPYSIGN).
      
      llvm-svn: 188728
      0c5c01aa
    • Hal Finkel's avatar
      Don't form PPC CTR-based loops around a copysignl call · 1cf48ab8
      Hal Finkel authored
      copysign/copysignf never become function calls (because the SDAG expansion code
      does not lower to the corresponding function call, but rather directly
      implements the associated logic), but copysignl almost always is lowered into a
      call to the requested libm function (and, thus, might clobber CTR).
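
To illustrate what "directly implements the associated logic" can mean, here is a sketch of copysign done purely with bit operations. It assumes IEEE-754 binary64 doubles; the name copysignBits is hypothetical and this is not the SDAG expansion code itself.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Keep the magnitude bits of `mag` and take only the sign bit from `sgn`.
// No library call is involved, so nothing here could clobber a counter
// register the way a real function call might.
double copysignBits(double mag, double sgn) {
  static_assert(sizeof(double) == sizeof(std::uint64_t),
                "sketch assumes a 64-bit double");
  const std::uint64_t signMask = std::uint64_t{1} << 63;
  std::uint64_t m = 0, s = 0;
  std::memcpy(&m, &mag, sizeof m);
  std::memcpy(&s, &sgn, sizeof s);
  const std::uint64_t r = (m & ~signMask) | (s & signMask);
  double out = 0.0;
  std::memcpy(&out, &r, sizeof out);
  return out;
}

// Small usage check.
void checkCopysignBits() {
  assert(copysignBits(3.0, -1.0) == -3.0);
  assert(copysignBits(-2.5, 1.0) == 2.5);
}
```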
      
      llvm-svn: 188727
      1cf48ab8
  2. Aug 19, 2013
    • Mihai Popa's avatar
      Thumb2 add immediate alias for SP · 4a9df8a7
      Mihai Popa authored
      The Thumb2 add immediate is in fact defined for SP. The manual is misleading because it points to a different section for add immediate with SP; however, the encoding is the same as for add immediate with register, only with the SP operand hard-coded. As such, add immediate with SP and add immediate with register can safely be treated as the same instruction.
      
      All the patch does is adjust a register constraint on an instruction alias.
      
      llvm-svn: 188676
      4a9df8a7
    • Elena Demikhovsky's avatar
      AVX-512: added arithmetic and logical operations. · 1490c5eb
      Elena Demikhovsky authored
      ADD, SUB and MUL for integer and FP types; OR, AND and XOR.
      Added embedded broadcast forms for these instructions.
      
      llvm-svn: 188673
      1490c5eb
    • Richard Sandiford's avatar
      [SystemZ] Add negative integer absolute (load negative) · 784a5803
      Richard Sandiford authored
      For now this matches the equivalent of (neg (abs ...)), which did hit a few
      times in projects/test-suite.  We should probably also match cases where
      absolute-like selects are used with reversed arguments.
      
      llvm-svn: 188671
      784a5803
    • Richard Sandiford's avatar
      [SystemZ] Add integer absolute (load positive) · 4b897054
      Richard Sandiford authored
      llvm-svn: 188670
      4b897054
    • Richard Sandiford's avatar
      [SystemZ] Add support for sibling calls · 709bda66
      Richard Sandiford authored
      This first cut is pretty conservative.  The final argument register (R6)
      is call-saved, so we would need to make sure that the R6 argument to a
      sibling call is the same as the R6 argument to the calling function,
      which seems worth keeping as a separate patch.
      
      Saying that integer truncations are free means that we no longer
      use the extending instructions LGF and LLGF for spills in int-conv-09.ll
      and int-conv-10.ll.  Instead we treat the registers as 64 bits wide and
      truncate them to 32 bits where necessary.  I think it's unlikely we'd
      use LGF and LLGF for spills in other situations for the same reason,
      so I'm removing the tests rather than replacing them.  The associated
      code is generic and applies to many more instructions than just
      LGF and LLGF, so there is no corresponding code removal.
      
      llvm-svn: 188669
      709bda66
    • Hal Finkel's avatar
      Add the PPC fcpsgn instruction · dbc78e1f
      Hal Finkel authored
      Modern PPC cores support a floating-point copysign instruction, and we can use
      this to lower the FCOPYSIGN node (which is created from calls to the libm
      copysign function). A couple of extra patterns are necessary because the
      operand types of FCOPYSIGN need not agree.
      
      llvm-svn: 188653
      dbc78e1f
  3. Aug 18, 2013
  4. Aug 17, 2013
  5. Aug 16, 2013
    • Bill Schmidt's avatar
      [PowerPC] Preparatory refactoring for making prologue and epilogue · 8893a3d1
      Bill Schmidt authored
      safe on PPC32 SVR4 ABI
      
      [Patch and following text by Mark Minich; committing on his behalf.]
      
      There are FIXMEs in PowerPC/PPCFrameLowering.cpp, in method
      PPCFrameLowering::emitPrologue(), related to "negative offsets of R1"
      on PPC32 SVR4. They're true, but the real issue is that on PPC32 SVR4
      (and any ABI without a Red Zone), no spills may be made until after
      the stackframe is claimed, which also includes the LR spill which is
      at a positive offset. The same problem exists in emitEpilogue(),
      though there's no FIXME for it. I intend to fix this issue, making
      LLVM-compiled code finally safe for use on SVR4/EABI/e500 32-bit
      platforms (including in particular, OS-free embedded systems & kernel
      code, where interrupts may share the same stack as user code).
      
      In preparation for making these changes, to make the diffs for the
      functional changes less cluttered, I am providing the non-functional
      refactorings in two stages:
      
      Stage 1 does some minor fluffy refactorings to pull multiple method
      calls up into a single bool, creating named bools for repeated uses of
      obscure logic, moving some code up earlier because either stage 2 or
      my final version will require it earlier, and rewording/adding some
      comments. My stage 1 changes can be characterized as primarily fluffy
      cleanup, the purpose of which may be unclear until the stage 2 or
      final changes are made.
      
      My stage 2 refactorings combine the separate PPC32 & PPC64 logic,
      which is currently performed by largely duplicate code, into a single
      flow, with the differences handled by a group of constants initialized
      early in the methods.
      
      This submission is for my stage 1 changes. There should be no
      functional changes whatsoever; this is a pure refactoring.
      
      llvm-svn: 188573
      8893a3d1
    • Michel Danzer's avatar
      R600/SI: Add pattern for xor of i1 · 8522270d
      Michel Danzer authored
      
      
      Fixes two recent piglit regressions with radeonsi.
      
      Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
      llvm-svn: 188559
      8522270d
    • Michel Danzer's avatar
      R600/SI: Fix broken encoding of DS_WRITE_B32 · 20680b1c
      Michel Danzer authored
      
      
      The logic in SIInsertWaits::getHwCounts() only really made sense for SMRD
      instructions, and trying to shoehorn it into handling DS_WRITE_B32 caused
      it to corrupt the encoding of that by clobbering the first operand with
      the second one.
      
      Undo that damage and only apply the SMRD logic to SMRD instructions.
      
      Fixes some derivatives-related piglit regressions with radeonsi.
      
      Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
      llvm-svn: 188558
      20680b1c
    • Daniel Sanders's avatar
      Reverted test commit (r188556) · 6b32f892
      Daniel Sanders authored
      llvm-svn: 188557
      6b32f892
    • Daniel Sanders's avatar
      Test commit. Just a blank line · 7a2c9bc8
      Daniel Sanders authored
      llvm-svn: 188556
      7a2c9bc8
    • Benjamin Kramer's avatar
      a8eecee1
    • Benjamin Kramer's avatar
      When initializing the PIC global base register on ARM/ELF add pc to fix the address. · 30920666
      Benjamin Kramer authored
      This unbreaks PIC with fast isel on ELF targets (PR16717). The output matches
      what GCC and SDag do for PIC but may not cover all of the many flavors of PIC
      that exist.
      
      llvm-svn: 188551
      30920666
    • Mihai Popa's avatar
      Add support for Thumb2 literal loads with negative zero offset · 46c1bcb4
      Mihai Popa authored
      Thumb2 literal loads use an offset encoding which allows for 
      negative zero. This fixes parsing and encoding so that #-0 
      is correctly processed. The parser represents #-0 as INT32_MIN.
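
A minimal sketch of the sentinel idea, assuming the offset field is an add/subtract flag plus a 12-bit magnitude. The struct and function names are hypothetical and are not taken from the parser or encoder touched by this patch.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical model of a literal-load offset field.
struct LiteralOffset {
  bool add;            // true: PC + imm12, false: PC - imm12
  std::uint32_t imm12; // magnitude, 0..4095
};

// Sentinel standing in for "#-0", as described in the commit message.
constexpr std::int32_t kNegZero = std::numeric_limits<std::int32_t>::min();

LiteralOffset encodeLiteralOffset(std::int32_t offset) {
  if (offset == kNegZero)
    return {false, 0}; // subtract zero: distinct from plain "#0"
  assert(offset > -4096 && offset < 4096 && "offset must fit in 12 bits");
  if (offset < 0)
    return {false, static_cast<std::uint32_t>(-offset)};
  return {true, static_cast<std::uint32_t>(offset)};
}
```

A plain int cannot distinguish +0 from -0, which is why a value outside the legal offset range is used as the marker.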
      
      llvm-svn: 188549
      46c1bcb4
    • Mihai Popa's avatar
      Fix Thumb2 aliasing complementary instructions taking modified immediates · cf276b2c
      Mihai Popa authored
      There are many Thumb instructions which take 12-bit immediates encoded in a special
      8-bit value + 4-bit rotator form. Not all numbers are representable, and it's legal
      to transform an assembly instruction to be able to encode the immediate.
      
      For example: AND and BIC are complementary instructions; one can switch the AND
      to a BIC as long as the immediate is complemented. 
      
      The intent is to switch one instruction into its complementary one when the immediate
      cannot be encoded in the form requested in the original assembly and when the 
      complementary immediate is encodable.
      
      The patch addresses two issues:
      1. definition of t2SOImmNot immediate - it has to check that the original value is
      not encoded naturally
      2. t2AND and t2BIC instruction aliases which should use the Thumb2 SOImm operand 
      rather than the ARM one.
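
The AND/BIC switch can be sketched as follows. The encodability test reflects my reading of the Thumb2 modified-immediate rules (four byte-replication patterns, or an 8-bit value with its top bit set rotated right by 8 to 31). All names here are hypothetical and this is not the TableGen definition changed by the patch.

```cpp
#include <cassert>
#include <cstdint>

// Rotate left; r must be in 1..31.
static std::uint32_t rotl32(std::uint32_t v, unsigned r) {
  return (v << r) | (v >> (32 - r));
}

// True if v can be written as a Thumb2 modified immediate.
bool isT2ModImm(std::uint32_t v) {
  const std::uint32_t lo = v & 0xFFu;
  const std::uint32_t hi = (v >> 8) & 0xFFu;
  if (v == lo) return true;                       // 0x000000XY
  if (v == (lo | (lo << 16))) return true;        // 0x00XY00XY
  if (v == ((hi << 8) | (hi << 24))) return true; // 0xXY00XY00
  if (v == lo * 0x01010101u) return true;         // 0xXYXYXYXY
  for (unsigned r = 8; r <= 31; ++r) {
    // Undo a rotate-right by r and look for an 8-bit value with bit 7 set.
    const std::uint32_t x = rotl32(v, r);
    if (x >= 0x80u && x <= 0xFFu) return true;
  }
  return false;
}

enum class Choice { KeepAnd, SwitchToBic, NotEncodable };

// "and rd, rn, #imm" is equivalent to "bic rd, rn, #~imm", so the
// complement is tried only when the original value has no natural encoding.
Choice chooseAndOrBic(std::uint32_t imm) {
  if (isT2ModImm(imm)) return Choice::KeepAnd;
  if (isT2ModImm(~imm)) return Choice::SwitchToBic;
  return Choice::NotEncodable;
}

// Small usage check: 0xFFFFFF00 is not encodable, but its complement 0xFF is.
void checkChoices() {
  assert(chooseAndOrBic(0xFFu) == Choice::KeepAnd);
  assert(chooseAndOrBic(0xFFFFFF00u) == Choice::SwitchToBic);
}
```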
      
      llvm-svn: 188548
      cf276b2c
    • Richard Sandiford's avatar
      [SystemZ] Use SRST to implement strlen and strnlen · 0dec06a2
      Richard Sandiford authored
      It would also make sense to use it for memchr; I'm working on that now.
      
      llvm-svn: 188547
      0dec06a2
    • Richard Sandiford's avatar
      [SystemZ] Use MVST to implement strcpy and stpcpy · bb83a50f
      Richard Sandiford authored
      llvm-svn: 188546
      bb83a50f
    • Richard Sandiford's avatar
      [SystemZ] Use CLST to implement strcmp · ca232710
      Richard Sandiford authored
      llvm-svn: 188544
      ca232710
    • Richard Sandiford's avatar
      [SystemZ] Fix handling of 64-bit memcmp results · e3827751
      Richard Sandiford authored
      Generalize r188163 to cope with return types other than MVT::i32, just
      as the existing visitMemCmpCall code did.  I've split this out into a
      subroutine so that it can be used for other upcoming patches.
      
      I also noticed that I'd used the wrong API to record the out chain.
      It's a load that uses DAG.getRoot() rather than getRoot(), so the out
      chain should go on PendingLoads.  I don't have a testcase for that because
      we don't do any interesting scheduling on z yet.
      
      llvm-svn: 188540
      e3827751
    • Richard Sandiford's avatar
      [SystemZ] Fix sign of integer memcmp result · a5901257
      Richard Sandiford authored
      r188163 used CLC to implement memcmp.  Code that compares the result
      directly against zero can test the CC value produced by CLC, but code
      that needs an integer result must use IPM.  The sequence I'd used was:
      
         ipm <reg>
         sll <reg>, 2
         sra <reg>, 30
      
      but I'd forgotten that this inverts the order, so that CC==1 ("less")
      becomes an integer greater than zero, and CC==2 ("greater") becomes
      an integer less than zero.  This sequence should only be used if the
      CLC arguments are reversed to compensate.  The problem then is that
      the branch condition must also be reversed when testing the CLC
      result directly.
      
      Rather than do that, I went for a different sequence that works with
      the natural CLC order:
      
         ipm <reg>
         srl <reg>, 28
         rll <reg>, <reg>, 31
      
      One advantage of this is that it doesn't clobber CC.  A disadvantage
      is that any sign extension to 64 bits must be done separately,
      rather than being folded into the shifts.
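
The difference between the two sequences can be checked with a small model. It assumes IPM places CC in bits 29..28 of the low 32-bit word (counting from the least significant bit), the program mask in bits 27..24, and zeros in the top two bits. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Model of IPM on the low 32 bits of a register.
std::uint32_t ipm(std::uint32_t reg, unsigned cc, unsigned programMask) {
  assert(cc <= 3 && programMask <= 15);
  return (reg & 0x00FFFFFFu) | (cc << 28) | (programMask << 24);
}

// Interpret a 32-bit pattern as a two's-complement integer.
std::int64_t asSigned32(std::uint32_t v) {
  return (v & 0x80000000u) ? std::int64_t(v) - (std::int64_t{1} << 32)
                           : std::int64_t(v);
}

// ipm; sll 2; sra 30
std::int64_t viaSllSra(std::uint32_t r) {
  r <<= 2;                           // sll: CC now sits in bits 31..30
  const std::uint32_t top = r >> 30; // the two CC bits
  // sra 30 sign-extends those two bits: 0, 1, -2, -1 for CC 0..3.
  return (top & 2u) ? std::int64_t(top) - 4 : std::int64_t(top);
}

// ipm; srl 28; rll 31
std::int64_t viaSrlRll(std::uint32_t r) {
  r >>= 28;                 // srl: r is now exactly CC
  r = (r << 31) | (r >> 1); // rll by 31, i.e. rotate right by 1
  return asSigned32(r);
}
```

With CC==1 ("less") the first sequence yields a positive value and the second a negative one; with CC==2 ("greater") the signs are the other way round, which is the inversion described above.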
      
      llvm-svn: 188538
      a5901257
    • Tom Stellard's avatar
      Revert "R600/SI: Fix incorrect encoding of DS_WRITE_B32 instructions" · dba25713
      Tom Stellard authored
      This reverts commit a6a39ced095c2f453624ce62c4aead25db41a18f.
      This is the wrong version of this fix.
      
      llvm-svn: 188523
      dba25713
    • Tom Stellard's avatar
      R600/SI: Fix incorrect encoding of DS_WRITE_B32 instructions · 82bef57f
      Tom Stellard authored
      The SIInsertWaits pass was overwriting the first operand (gds bit) of
      DS_WRITE_B32 with the second operand (value to write).  This meant that
      any time the value to write was stored in an odd number VGPR, the gds
      bit would be set causing the instruction to write to GDS instead of LDS.
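
A toy model of why only odd-numbered VGPRs triggered the problem, assuming the encoded gds field keeps just the lowest bit of whatever value ends up in the first operand. The struct and function names are hypothetical and do not correspond to the SIInsertWaits code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical operand list for a DS write: operand 0 is the 1-bit gds flag,
// operand 1 is the hardware index of the VGPR holding the value to store.
struct DsWriteOperands {
  std::uint32_t gds;
  std::uint32_t dataVgpr;
};

// Only the lowest bit of the gds operand fits into the encoded field.
bool encodedGdsBit(const DsWriteOperands &ops) {
  return (ops.gds & 1u) != 0;
}

// Models the bug: the first operand is overwritten with the second.
DsWriteOperands clobberFirstOperand(DsWriteOperands ops) {
  ops.gds = ops.dataVgpr;
  return ops;
}

// Small usage check: an odd register index sets the bit, an even one does not.
void checkClobber() {
  assert(encodedGdsBit(clobberFirstOperand({0u, 3u})));
  assert(!encodedGdsBit(clobberFirstOperand({0u, 4u})));
}
```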
      
      llvm-svn: 188522
      82bef57f