  1. May 05, 2015
    • Andrey Churbanov · 7ecb0714
    • [Inliner] Discard empty COMDAT groups · ac256cfe
      David Majnemer authored
      COMDAT groups which have been rendered unused because of inlining are
      discardable if we can prove that we've made the group empty.
      
      This fixes PR22285.
      
      llvm-svn: 236539
    • Refactor UpdatePredRedefs and StepForward to avoid duplication. NFC · 7605e37a
      Pete Cooper authored
      Note, this is a reapplication of r236515 with a fix to not assert on non-register operands, but instead skip them until the subsequent commit.  Original commit message follows.
      
      The code was basically the same here already.  Just added an out parameter for a vector of seen defs so that UpdatePredRedefs can call StepForward first, then do its own post processing on the seen defs.
      
      Will be used in the next commit to also handle regmasks.
      
      llvm-svn: 236538
    • Thumb2SizeReduction: Check the correct set of registers for LDMIA. · 85a0e23b
      Peter Collingbourne authored
      The register set for LDMIA begins at offset 3, not 4. We were previously
      missing the short encoding of this instruction in the case where the base
      register was the first register in the register set.
      
      Also clean up some dead code:
      
      - The isARMLowRegister check is redundant with what VerifyLowRegs does;
        replace with an assert.
      - Remove handling of LDMDB instruction, which has no short encoding (and
        does not appear in ReduceTable).
      
      Differential Revision: http://reviews.llvm.org/D9485
      
      llvm-svn: 236535
    • [DAGCombiner] Account for getVectorIdxTy() when narrowing vector load · 9958c489
      Ulrich Weigand authored
      This patch makes ReplaceExtractVectorEltOfLoadWithNarrowedLoad convert
      the element number from getVectorIdxTy() to PtrTy before doing pointer
      arithmetic on it.  This is needed on z, where element numbers are i32
      but pointers are i64.
      
      Original patch by Richard Sandiford.
      
      llvm-svn: 236530
    • [DAGCombiner] Fix ReplaceExtractVectorEltOfLoadWithNarrowedLoad for BE · af2c618e
      Ulrich Weigand authored
      For little-endian, the function would convert (extract_vector_elt (load X), Y)
      to X + Y*sizeof(elt).  For big-endian it would instead use
      X + sizeof(vec) - Y*sizeof(elt).  The big-endian case wasn't right since
      vector index order always follows memory/array order, even for big-endian.
      (Note that the current handling has to be wrong for Y==0 since it would
      access beyond the end of the vector.)
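
      As a rough sketch (not a test from the patch), the narrowed form loads the
      element directly at X + Y*sizeof(elt), and that address is computed the same
      way on little- and big-endian targets:

       define i32 @narrowed(<4 x i32>* %p, i64 %y) {
         ; scalar load at %p + %y * 4, replacing (extract_vector_elt (load %p), %y)
         %base = bitcast <4 x i32>* %p to i32*
         %addr = getelementptr i32, i32* %base, i64 %y
         %elt = load i32, i32* %addr, align 4
         ret i32 %elt
       }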
      
      Original patch by Richard Sandiford.
      
      llvm-svn: 236529
    • [LegalizeVectorTypes] Allow single loads and stores for more short vectors · 2693c0a4
      Ulrich Weigand authored
      When lowering a load or store for TypeWidenVector, the type legalizer
      would use a single load or store if the associated integer type was legal.
      E.g. it would load a v4i8 as an i32 if i32 was legal.
      
      This patch extends that behavior to promoted integers as well as legal ones.
      If the integer type for the full vector width is TypePromoteInteger,
      the element type is going to be TypePromoteInteger too, and it's still
      better to use a single promoting load or truncating store rather than N
      individual promoting loads or truncating stores.  E.g. if you have a v2i8
      on a target where i16 is promoted to i32, it's better to load the v2i8 as
      an i16 rather than load both i8s individually.
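
      As a sketch (not a test from the patch; assumes a target where i16 is promoted
      to i32), the kind of access this changes:

       define <2 x i8> @load_v2i8(<2 x i8>* %ptr) {
         ; now lowered as one i16 load (then promoted), not two separate i8 loads
         %val = load <2 x i8>, <2 x i8>* %ptr, align 2
         ret <2 x i8> %val
       }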
      
      Original patch by Richard Sandiford.
      
      llvm-svn: 236528
    • [SystemZ] Add vector intrinsics · c1708b26
      Ulrich Weigand authored
      This adds intrinsics to allow access to all of the z13 vector instructions.
      Note that instructions whose semantics can be described by standard LLVM IR
      do not get any intrinsics.
      
      For each instruction whose semantics *cannot* (fully) be described, we
      define an LLVM IR target-specific intrinsic that directly maps to this
      instruction.
      
      For instructions that also set the condition code, the LLVM IR intrinsic
      returns the post-instruction CC value as a second result.  Instruction
      selection will attempt to detect code that compares that CC value against
      constants and use the condition code directly instead.
      
      Based on a patch by Richard Sandiford.
      
      llvm-svn: 236527
    • [SystemZ] Mark v1i128 and v1f128 as unsupported · 5211f9ff
      Ulrich Weigand authored
      The ABI specifies that <1 x i128> and <1 x fp128> are supposed to be
      passed in vector registers.  We do not yet support those types, and
      some infrastructure is missing before we can do so.
      
      In order to prevent accidentally generating code violating the ABI,
      this patch adds checks to detect those types and error out if user
      code attempts to use them.
      
      llvm-svn: 236526
    • [SystemZ] Handle sub-128 vectors · cd2a1b53
      Ulrich Weigand authored
      The ABI allows sub-128 vectors to be passed and returned in registers,
      with the vector occupying the upper part of a register.  We therefore
      want to legalize those types by widening the vector rather than promoting
      the elements.
      
      The patch includes some simple tests for sub-128 vectors and also tests
      that we can recognize various pack sequences, some of which use sub-128
      vectors as temporary results.  One of these forms is based on the pack
      sequences generated by llvmpipe when no intrinsics are used.
      
      Signed unpacks are recognized as BUILD_VECTORs whose elements are
      individually sign-extended.  Unsigned unpacks can have the equivalent
      form with zero extension, but they also occur as shuffles in which some
      elements are zero.
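
      For illustration only (not code from the patch), one shuffle form of an
      unsigned unpack, in which every other result element is zero:

       define <16 x i8> @unpack_sketch(<16 x i8> %v) {
         ; interleave zero bytes with source bytes, zero-extending each byte in place
         %r = shufflevector <16 x i8> %v, <16 x i8> zeroinitializer,
                <16 x i32> <i32 16, i32 0, i32 17, i32 1, i32 18, i32 2, i32 19, i32 3,
                            i32 20, i32 4, i32 21, i32 5, i32 22, i32 6, i32 23, i32 7>
         ret <16 x i8> %r
       }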
      
      Based on a patch by Richard Sandiford.
      
      llvm-svn: 236525
    • [SystemZ] Add CodeGen support for scalar f64 ops in vector registers · 49506d78
      Ulrich Weigand authored
      The z13 vector facility includes some instructions that operate only on the
      high f64 in a v2f64, effectively extending the FP register set from 16
      to 32 registers.  It's still better to use the old instructions if the
      operands happen to fit though, since the older instructions have a shorter
      encoding.
      
      Based on a patch by Richard Sandiford.
      
      llvm-svn: 236524
    • [SystemZ] Add CodeGen support for v4f32 · 80b3af7a
      Ulrich Weigand authored
      The architecture doesn't really have any native v4f32 operations except
      v4f32->v2f64 and v2f64->v4f32 conversions, with only half of the v4f32
      elements being used.  Even so, using vector registers for <4 x float>
      and scalarising individual operations is much better than generating
      completely scalar code, since there's much less register pressure.
      It's also more efficient to do v4f32 comparisons by extending to 2
      v2f64s, comparing those, then packing the result.
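
      A rough sketch of that compare strategy (illustrative only, using an ordered
      greater-than compare):

       define <4 x i1> @cmp_v4f32(<4 x float> %a, <4 x float> %b) {
         ; split each operand in half and extend to v2f64
         %a.lo = shufflevector <4 x float> %a, <4 x float> undef, <2 x i32> <i32 0, i32 1>
         %a.hi = shufflevector <4 x float> %a, <4 x float> undef, <2 x i32> <i32 2, i32 3>
         %b.lo = shufflevector <4 x float> %b, <4 x float> undef, <2 x i32> <i32 0, i32 1>
         %b.hi = shufflevector <4 x float> %b, <4 x float> undef, <2 x i32> <i32 2, i32 3>
         %a.lo.d = fpext <2 x float> %a.lo to <2 x double>
         %a.hi.d = fpext <2 x float> %a.hi to <2 x double>
         %b.lo.d = fpext <2 x float> %b.lo to <2 x double>
         %b.hi.d = fpext <2 x float> %b.hi to <2 x double>
         ; compare as doubles, then pack the two halves of the result back together
         %c.lo = fcmp ogt <2 x double> %a.lo.d, %b.lo.d
         %c.hi = fcmp ogt <2 x double> %a.hi.d, %b.hi.d
         %c = shufflevector <2 x i1> %c.lo, <2 x i1> %c.hi, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
         ret <4 x i1> %c
       }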
      
      This particularly helps with llvmpipe.
      
      Based on a patch by Richard Sandiford.
      
      llvm-svn: 236523
    • [SystemZ] Add CodeGen support for v2f64 · cd808237
      Ulrich Weigand authored
      This adds ABI and CodeGen support for the v2f64 type, which is natively
      supported by z13 instructions.
      
      Based on a patch by Richard Sandiford.
      
      llvm-svn: 236522
    • [SystemZ] Add CodeGen support for integer vector types · ce4c1095
      Ulrich Weigand authored
      This is the first of a series of patches to add CodeGen support exploiting
      the instructions of the z13 vector facility.  This patch adds support
      for the native integer vector types (v16i8, v8i16, v4i32, v2i64).
      
      When the vector facility is present, we default to the new vector ABI.
      This is characterized by two major differences:
      - Vector types are passed/returned in vector registers
        (except for unnamed arguments of a variable-argument list function).
      - Vector types are at most 8-byte aligned.
      
      The reason for the choice of 8-byte vector alignment is that the hardware
      is able to efficiently load vectors at 8-byte alignment, and the ABI only
      guarantees 8-byte alignment of the stack pointer, so requiring any higher
      alignment for vectors would require dynamic stack re-alignment code.
      
      However, for compatibility with old code that may use vector types, when
      *not* using the vector facility, the old alignment rules (vector types
      are naturally aligned) remain in use.
      
      These alignment rules are not only implemented at the C language level
      (implemented in clang), but also at the LLVM IR level.  This is done
      by selecting a different DataLayout string depending on whether the
      vector ABI is in effect or not.
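
      As a rough illustration (not quoted from the patch; the exact string lives in
      the SystemZ target), the vector-ABI layout mainly differs by a component that
      caps vector alignment at 64 bits, along the lines of:

       target datalayout = "E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-v128:64-a:8:16-n32:64"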
      
      Based on a patch by Richard Sandiford.
      
      llvm-svn: 236521
    • [SystemZ] Add z13 vector facility and MC support · a8b04e1c
      Ulrich Weigand authored
      This patch adds support for the z13 processor type and its vector facility,
      and adds MC support for all new instructions provided by that facility.
      
      Apart from defining the new instructions, the main changes are:
      
      - Adding VR128, VR64 and VR32 register classes.
      - Making FP64 a subclass of VR64 and FP32 a subclass of VR32.
      - Adding a D(V,B) addressing mode for scatter/gather operations.
      - Adding 1-, 2-, and 3-bit immediate operands for some 4-bit fields.
        Until now all immediate operands have been the same width as the
        underlying field (hence the assert->return change in decode[SU]ImmOperand).
      
      In addition, sys::getHostCPUName is extended to detect running natively
      on a z13 machine.
      
      Based on a patch by Richard Sandiford.
      
      llvm-svn: 236520
    • Revert "Refactor UpdatePredRedefs and StepForward to avoid duplication. NFC" · 336d90b6
      Pete Cooper authored
      This reverts commit 963cdbccf6e5578822836fd9b2ebece0ba9a60b7 (ie r236514)
      
      This is to get the bots green while I investigate.
      
      llvm-svn: 236518
    • Revert "Fix IfConverter to handle regmask machine operands." · 05b84d41
      Pete Cooper authored
      This reverts commit b27413cbfd78d959c18e713bfa271fb69e6b3303 (ie r236515).
      
      This is to get the bots green while I investigate the failures.
      
      llvm-svn: 236517
    • Fix IfConverter to handle regmask machine operands. · 6ebc2077
      Pete Cooper authored
      A regmask (typically seen on a call) clobbers the set of registers it lists.  The IfConverter, in UpdatePredRedefs, was handling register defs, but not regmasks.
      
      These are slightly different to a def in that we need to add both an implicit use and def to appease the machine verifier.  Otherwise, uses after the if converted call could think they are reading an undefined register.
      
      Reviewed by Matthias Braun and Quentin Colombet.
      
      llvm-svn: 236515
    • Refactor UpdatePredRedefs and StepForward to avoid duplication. NFC · bbd1c727
      Pete Cooper authored
      The code was basically the same here already.  Just added an out parameter for a vector of seen defs so that UpdatePredRedefs can call StepForward first, then do its own post processing on the seen defs.
      
      Will be used in the next commit to also handle regmasks.
      
      llvm-svn: 236514
    • Fix typo in assert message. NFC. · 32a0bee2
      Diego Novillo authored
      llvm-svn: 236513
    • Fix the clang -Werror build, use of uninitialized variable. · b10516e4
      David Blaikie authored
      llvm-svn: 236512
    • Update BasicAliasAnalysis to understand that nothing aliases with undef values. · 3459d6ea
      Daniel Berlin authored
      It got this right in some cases (if one of them was an identified object), but not in all cases.
      
      This caused stores to undef to block load-forwarding in some cases, etc.
      
      Added test to Transforms/GVN to verify optimization occurs as expected.
      
      llvm-svn: 236511
    • David Blaikie · 73cf872a
    • Re-land "[WinEH] Add an EH registration and state insertion pass for 32-bit x86" · 0738a9c0
      Reid Kleckner authored
      This reverts commit r236360.
      
      This change exposed a bug in WinEHPrepare by opting win32 code into EH
      preparation. We already knew that WinEHPrepare has bugs, and is the
      status quo for x64, so I don't think that's a reason to hold off on this
      change. I disabled exceptions in the sanitizer tests in r236505 and an
      earlier revision.
      
      llvm-svn: 236508
    • [ShrinkWrap] Add (a simplified version of) shrink-wrapping. · 61b305ed
      Quentin Colombet authored
      This patch introduces a new pass that computes the safe points at which to
      insert the prologue and epilogue of the function.
      The goal is to find safe points that are cheaper than the entry and exit
      blocks.
      
      As an example, and to avoid introducing regressions, this patch also
      implements the required bits to enable the shrink-wrapping pass for AArch64.
      
      
      ** Context **
      
      Currently we insert the prologue and epilogue of the method/function in the
      entry and exit blocks. Although this is correct, we can do a better job when
      those are not immediately required and insert them at less frequently executed
      places.
      The job of the shrink-wrapping pass is to identify such places.
      
      
      ** Motivating example **
      
      Let us consider the following function that performs a call in only one branch
      of an if:
      define i32 @f(i32 %a, i32 %b)  {
       %tmp = alloca i32, align 4
       %tmp2 = icmp slt i32 %a, %b
       br i1 %tmp2, label %true, label %false
      
      true:
       store i32 %a, i32* %tmp, align 4
       %tmp4 = call i32 @doSomething(i32 0, i32* %tmp)
       br label %false
      
      false:
       %tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ]
       ret i32 %tmp.0
      }
      
      On AArch64 this code generates (removing the cfi directives to ease
      readability):
      _f:                                     ; @f
      ; BB#0:
        stp x29, x30, [sp, #-16]!
        mov  x29, sp
        sub sp, sp, #16             ; =16
        cmp  w0, w1
        b.ge  LBB0_2
      ; BB#1:                                 ; %true
        stur  w0, [x29, #-4]
        sub x1, x29, #4             ; =4
        mov  w0, wzr
        bl  _doSomething
      LBB0_2:                                 ; %false
        mov  sp, x29
        ldp x29, x30, [sp], #16
        ret
      
      With shrink-wrapping we could generate:
      _f:                                     ; @f
      ; BB#0:
        cmp  w0, w1
        b.ge  LBB0_2
      ; BB#1:                                 ; %true
        stp x29, x30, [sp, #-16]!
        mov  x29, sp
        sub sp, sp, #16             ; =16
        stur  w0, [x29, #-4]
        sub x1, x29, #4             ; =4
        mov  w0, wzr
        bl  _doSomething
        add sp, x29, #16            ; =16
        ldp x29, x30, [sp], #16
      LBB0_2:                                 ; %false
        ret
      
      Therefore, we would pay the overhead of setting up/destroying the frame only if
      we actually do the call.
      
      
      ** Proposed Solution **
      
      This patch introduces a new machine pass that performs the shrink-wrapping
      analysis (see the comments at the beginning of ShrinkWrap.cpp for more details).
      It then stores the safe save and restore point into the MachineFrameInfo
      attached to the MachineFunction.
      This information is then used by the PrologEpilogInserter (PEI) to place the
      related code at the right place. This pass runs right before the PEI.
      
      Unlike the original paper of Chow from PLDI’88, this implementation of
      shrink-wrapping does not use expensive data-flow analysis and does not need hacks
      to properly avoid frequently executed points. Instead, it relies on dominance and
      loop properties.
      
      The pass is off by default and each target can opt-in by setting the
      EnableShrinkWrap boolean to true in their derived class of TargetPassConfig.
      This setting can also be overwritten on the command line by using
      -enable-shrink-wrap.
      
      Before you try out the pass for your target, make sure you properly fix your
      emitProlog/emitEpilog/adjustForXXX methods to cope with basic blocks that are not
      necessarily the entry block.
      
      
      ** Design Decisions **
      
      1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but
      for debugging and clarity I thought it was best to have its own file.
      2. Right now, we only support one save point and one restore point. At some
      point we can expand this to several save points and restore points; the impacted
      components would then be:
      - The pass itself: New algorithm needed.
      - MachineFrameInfo: Hold a list or set of Save/Restore points instead of one
        pointer.
      - PEI: Should loop over the save points and restore points.
      Anyhow, at least for this first iteration, I do not believe it is interesting
      to support the complex cases. We should revisit that when we have motivating
      examples.
      
      Differential Revision: http://reviews.llvm.org/D9210
      
      <rdar://problem/3201744>
      
      llvm-svn: 236507
    • [Orc] Reapply r236465 with fixes for the MSVC bots. · cd68eba3
      Lang Hames authored
      llvm-svn: 236506
    • [bugpoint] Increase default memory limit to 400MB to fix bugpoint tests. · 85202063
      Daniel Sanders authored
      I tracked down the bug to an unchecked malloc in SmallVectorBase::grow_pod().
      This malloc is returning NULL on my machine when running under bugpoint but not
      when -enable-valgrind is given.
      
      llvm-svn: 236504
    • This patch adds ABI support for v1i128 data type. · d4eb73c0
      Kit Barton authored
      It adds v1i128 to the appropriate register classes and checks parameter passing
      and return values.
      
      This is related to http://reviews.llvm.org/D9081, which will add instructions
      that exploit the v1i128 datatype.
      
      Phabricator review: http://reviews.llvm.org/D9475
      
      llvm-svn: 236503
    • Igor Laevsky · 2aa8cafa
    • [mips] Generate code for insert/extract operations when using the N64 ABI and MSA. · eda60d21
      Daniel Sanders authored
      Summary:
      When using the N64 ABI, element-indices use the i64 type instead of i32.
      In many cases, we can use iPTR to account for this, but additional patterns
      and pseudos are also required.
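
      For illustration (not a test from the patch), an extract whose index is an
      i64, as is natural under N64:

       define i32 @get_lane(<4 x i32> %v, i64 %idx) {
         %elt = extractelement <4 x i32> %v, i64 %idx
         ret i32 %elt
       }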
      
      This fixes most (but not quite all) failures in the test-suite when using
      N64 and MSA together.
      
      Reviewers: vkalintiris
      
      Reviewed By: vkalintiris
      
      Subscribers: llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D9342
      
      llvm-svn: 236494
    • Fix regression in parsing armv{6,7}hl- triples. These are used by SUSE and Redhat currently. · 5eb52b74
      Ismail Donmez authored
      
      Reviewed by Jonathan Roelofs.
      
      llvm-svn: 236492
    • [mips][msa] Test basic operations for the N32 ABI too. · 4160c802
      Daniel Sanders authored
      Summary:
      This required adding instruction aliases for dneg.
      
      N64 will be enabled shortly but requires additional bugfixes.
      
      Reviewers: vkalintiris
      
      Reviewed By: vkalintiris
      
      Subscribers: llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D9341
      
      llvm-svn: 236489
    • Kostya Serebryany
    • [Orc] Revert r236465 - It broke the Windows bots. · ac31a1f1
      Lang Hames authored
      Looks like the usual missing explicit move-constructor issue with MSVC. I should
      have a fix shortly.
      
      llvm-svn: 236472
    • [X86] Fix assertion while DAG combining offsets and ExternalSymbols · 9dad227b
      Reid Kleckner authored
      ExternalSymbol nodes do not contain offsets, unlike GlobalValue nodes.
      
      llvm-svn: 236471
    • [ARM] IT block insertion needs to update kill flags · 4dddbcfb
      Pete Cooper authored
      When forming an IT block from the first MOV here:
      
      	%R2<def> = t2MOVr %R0, pred:1, pred:%CPSR, opt:%noreg
      	%R3<def> = tMOVr %R0<kill>, pred:14, pred:%noreg
      
      the move into R3 is moved out of the IT block so that later instructions on the same predicate can be inside this block, and we can share the IT instruction.
      
      However, when moving the R3 copy out of the IT block, we need to clear its kill flags for anything in use at this point in time, ie, R0 here.
      
      This appeases the machine verifier which thought that R0 wasn't defined when used.
      
      I have a test case, but it's extremely register allocator specific.  It would be too fragile to commit a test which depends on the register allocator here.
      
      llvm-svn: 236468
    • Add TransformUtils dependency to lli. · b5445cce
      Pete Cooper authored
      After r236465, Orc uses ValueMaterializer and so needs to link against TransformUtils to get the ValueMaterializer::anchor().
      
      llvm-svn: 236467
    • [Orc] Refactor the compile-on-demand layer to make module partitioning lazy, and avoid cloning unused decls into every partition. · a68970df
      Lang Hames authored
      
      Module partitioning showed up as a source of significant overhead when I
      profiled some trivial test cases. Avoiding the overhead of partitioning
      for uncalled functions helps to mitigate this.
      
      This change also means that it is no longer necessary to have a
      LazyEmittingLayer underneath the CompileOnDemand layer, since the
      CompileOnDemandLayer will not extract or emit function bodies until they are
      called.
      
      llvm-svn: 236465