  4. Dec 09, 2014
    • [ARM] Make testcase more explicit. NFC. · 9d2d7c1b
      Ahmed Bougacha authored
      llvm-svn: 223841
    • [ARM] Also support v2f64 vld1/vst1. · be0b2276
      Ahmed Bougacha authored
      It was missing from the VLD1/VST1 handling logic, even though the
      corresponding instructions exist (same form as v2i64).
      
      In preparation for a future patch.
      
      llvm-svn: 223832
    • [FastISel][AArch64] Fix a missing nullptr check in 'computeAddress'. · c6f314b8
      Juergen Ributzka authored
      The load/store value type is currently not available when lowering the memcpy
      intrinsic. Add the missing nullptr check to support this in 'computeAddress'.
      
      Fixes rdar://problem/19178947.
      
      llvm-svn: 223818
    • [AVX512] Added lowering for VBROADCASTSS/SD instructions. · 8e8c3996
      Robert Khasanov authored
Lowering patterns are written via the avx512_broadcast_pat multiclass, since the pattern generates VBROADCAST and COPY_TO_REGCLASS nodes.
      Added lowering tests.
      
      llvm-svn: 223804
    • [PowerPC 4/4] Enable little-endian support for VSX. · efe9ce21
      Bill Schmidt authored
      With the foregoing three patches, VSX instructions can be used for
      little endian.  This patch removes the restriction that prevented
      this, and re-enables the test cases from the first three patches.
      
      llvm-svn: 223792
    • [PowerPC 3/4] Little-endian adjustments for VSX vector shuffle · 3014435c
      Bill Schmidt authored
      When performing instruction selection for ISD::VECTOR_SHUFFLE, there
      is special code for handling v2f64 and v2i64 using VSX instructions.
      This code must be adjusted for little-endian.  Because the two inputs
      are treated as a double-wide register, we must swap their order for
      little endian.  To get the appropriate mask elements to use with the
      big-endian biased XXPERMDI instruction, we must reverse their order
      and invert the bits.
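      The "reverse their order and invert the bits" rule can be sketched in a few lines of Python (an illustrative model of the mask rewrite only, not the actual PPCISelLowering code; `le_xxpermdi_mask` is a hypothetical name):

```python
def le_xxpermdi_mask(be_mask):
    """Rewrite a 2-element doubleword selector for little-endian:
    reverse the element order, then invert each selector bit, so the
    big-endian biased XXPERMDI picks the intended doublewords."""
    assert len(be_mask) == 2 and all(b in (0, 1) for b in be_mask)
    return [1 - b for b in reversed(be_mask)]
```

      For example, the big-endian selectors (0, 0) become (1, 1) on little-endian.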
      
      A new test is added to test the 16 possible values of the shuffle
      mask.  It is initially disabled for reasons specified in the test.  It
      is re-enabled by patch 4/4.
      
      llvm-svn: 223791
    • 41879626
      Bill Schmidt authored
    • [CodeGenPrepare] Split branch conditions into multiple conditional branches. · c1bbcbbd
      Juergen Ributzka authored
      This optimization transforms code like:
      bb1:
        %0 = icmp ne i32 %a, 0
        %1 = icmp ne i32 %b, 0
        %or.cond = or i1 %0, %1
        br i1 %or.cond, label %TrueBB, label %FalseBB
      
into multiple branch instructions:
      
      bb1:
        %0 = icmp ne i32 %a, 0
        br i1 %0, label %TrueBB, label %bb2
      bb2:
        %1 = icmp ne i32 %b, 0
        br i1 %1, label %TrueBB, label %FalseBB
      
      This optimization is already performed by SelectionDAG, but not by FastISel.
      FastISel cannot perform this optimization, because it cannot generate new
      MachineBasicBlocks.
      
Performing this optimization at CodeGenPrepare time makes it available to both
SelectionDAG and FastISel, and the implementation in SelectionDAG could be
removed. There are currently a few differences in codegen for X86 and PPC, so
this commit only enables it for FastISel.
      
      Reviewed by Jim Grosbach
      
      This fixes rdar://problem/19034919.
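      The before/after control flow can be modeled in Python to check that the two forms agree (illustrative only; the block labels mirror the IR above):

```python
def fused_branch(a, b):
    # bb1: %or.cond = or i1 (a != 0), (b != 0); single conditional branch
    return "TrueBB" if (a != 0) or (b != 0) else "FalseBB"

def split_branch(a, b):
    # bb1 branches on (a != 0) alone; bb2 tests (b != 0)
    if a != 0:
        return "TrueBB"  # taken from bb1
    if b != 0:
        return "TrueBB"  # taken from bb2
    return "FalseBB"
```

      Both functions reach TrueBB exactly when either comparison holds, so the split form preserves semantics (the icmps have no side effects, so skipping the second one in bb1 is safe).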
      
      llvm-svn: 223786
    • [PowerPC 1/4] Little-endian adjustments for VSX loads/stores · fae5d715
      Bill Schmidt authored
      This patch addresses the inherent big-endian bias in the lxvd2x,
      lxvw4x, stxvd2x, and stxvw4x instructions.  These instructions load
      vector elements into registers left-to-right (with the first element
      loaded into the high-order bits of the register), regardless of the
      endian setting of the processor.  However, these are the only
      vector memory instructions that permit unaligned storage accesses, so
      we want to use them for little-endian.
      
      To make this work, a lxvd2x or lxvw4x is replaced with an lxvd2x
      followed by an xxswapd, which swaps the doublewords.  This works for
      lxvw4x as well as lxvd2x, because for lxvw4x on an LE system the
      vector elements are in LE order (right-to-left) within each
doubleword.  (Thus after an lxvw4x of a <4 x float> the elements will
appear as 1, 0, 3, 2.  Following the swap, they will appear as 3, 2,
1, 0, as desired.)  For stores, an stxvd2x or stxvw4x is replaced
      with an stxvd2x preceded by an xxswapd.
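      The element reordering on the load path can be checked with a small Python model (data movement only; the function name is made up for illustration):

```python
def le_vsx_load(mem_words):
    """Model lxvd2x + xxswapd for a <4 x 32-bit> vector on little-endian.
    Register contents are listed high-order element first."""
    # lxvd2x: doublewords load left-to-right, but the words within each
    # doubleword are in LE order, so memory 0 1 2 3 reads as 1, 0, 3, 2
    reg = [mem_words[1], mem_words[0], mem_words[3], mem_words[2]]
    # xxswapd: swap the two doublewords, giving 3, 2, 1, 0 as desired
    return reg[2:] + reg[:2]
```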
      
      Introduction of extra swap instructions provides correctness, but
      obviously is not ideal from a performance perspective.  Future patches
      will address this with optimizations to remove most of the introduced
      swaps, which have proven effective in other implementations.
      
      The introduction of the swaps is performed during lowering of LOAD,
      STORE, INTRINSIC_W_CHAIN, and INTRINSIC_VOID operations.  The latter
      are used to translate intrinsics that specify the VSX loads and stores
      directly into equivalent sequences for little endian.  Thus code that
      uses vec_vsx_ld and vec_vsx_st does not have to be modified to be
      ported from BE to LE.
      
      We introduce new PPCISD opcodes for LXVD2X, STXVD2X, and XXSWAPD for
      use during this lowering step.  In PPCInstrVSX.td, we add new SDType
      and SDNode definitions for these (PPClxvd2x, PPCstxvd2x, PPCxxswapd).
      These are recognized during instruction selection and mapped to the
      correct instructions.
      
      Several tests that were written to use -mcpu=pwr7 or pwr8 are modified
      to disable VSX on LE variants because code generation changes with
this and subsequent patches in this set.  I chose to include all of
these in the first patch rather than try to rigorously sort out which
tests were broken by one or another of the patches.  Sorry about that.
      
      The new test vsx-ldst-builtin-le.ll, and the changes to vsx-ldst.ll,
      are disabled until LE support is enabled because of breakages that
      occur as noted in those tests.  They are re-enabled in patch 4/4.
      
      llvm-svn: 223783
    • [x86] Fix the test to actually test things for the CPU names, add the · f57ac3bd
      Chandler Carruth authored
      missing barcelona CPU which that test uncovered, and remove the 32-bit
      x86 CPUs which I really wasn't prepared to audit and test thoroughly.
      
      If anyone wants to clean up the 32-bit only x86 CPUs, go for it.
      
      Also, if anyone else wants to try to de-duplicate the AMD CPUs, that'd
      be cool, but from the looks of it wouldn't save as much as it did for
      the Intel CPUs.
      
      llvm-svn: 223774
    • 5303c6fc
      Chandler Carruth authored
    • [X86] Convert esp-relative movs of function arguments into pushes, step 1 · c69bb43f
      Michael Kuperstein authored
      This handles the simplest case for mov -> push conversion:
      1. x86-32 calling convention, everything is passed through the stack.
      2. There is no reserved call frame.
      3. Only registers or immediates are pushed, no attempt to combine a mem-reg-mem sequence into a single PUSHmm.
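      A rough sketch of the conversion's gating logic, under the simplest-case restrictions above (hypothetical helper, not the actual X86 pass):

```python
def movs_to_pushes(movs):
    """movs: list of (offset, operand) esp-relative stores for an
    outgoing call, simplest case only. Convert to a push sequence
    when the offsets are contiguous 0, 4, 8, ... with no gaps;
    otherwise bail out and keep the movs."""
    offsets = [off for off, _ in movs]
    if offsets != list(range(0, 4 * len(movs), 4)):
        return None  # cannot fold into pushes
    # pushes store at decreasing addresses, so emit them in reverse
    # argument order: the last (highest-offset) argument goes first
    return [("push", operand) for _, operand in reversed(movs)]
```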
      
      Differential Revision: http://reviews.llvm.org/D6503
      
      llvm-svn: 223757
    • Handle early-clobber registers in the aggressive anti-dep breaker · c8cf2b88
      Hal Finkel authored
The aggressive anti-dep breaker, used by the PowerPC backend during post-RA
scheduling (but available to all targets), did not handle early-clobber MI
operands at all. When constructing the list of available registers for the
replacement of some def operand, check the using instructions, and remove
registers assigned to early-clobbered defs from the set.
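      The pruning step can be illustrated with a toy model (dictionaries stand in for machine operands; all names are invented):

```python
def prune_early_clobbers(candidates, use_instrs):
    """Remove from `candidates` any register assigned to an
    early-clobber def of an instruction that uses the value,
    since such a register is written before the use is read."""
    pruned = set(candidates)
    for instr in use_instrs:
        for op in instr["operands"]:
            if op.get("is_def") and op.get("early_clobber"):
                pruned.discard(op["reg"])
    return pruned
```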
      
      Fixes PR21452.
      
      llvm-svn: 223727
    • MISched: Fix moving stores across barriers · 3e01d47d
      Tom Stellard authored
      This fixes an issue with ScheduleDAGInstrs::buildSchedGraph
      where stores without an underlying object would not be added
      as a predecessor to the current BarrierChain.
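      The invariant the fix restores can be shown with a toy dependence builder (illustrative only, not the ScheduleDAGInstrs API):

```python
def barrier_predecessors(stores, barrier="BarrierChain"):
    # any store whose underlying object is unknown must be ordered
    # before the current barrier, so it cannot be moved across it
    return [(s["name"], barrier)
            for s in stores
            if s.get("underlying_object") is None]
```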
      
      llvm-svn: 223717
  5. Dec 08, 2014
    • [PowerPC] Don't use a non-allocatable register to implement the 'cc' alias · aa10b3ca
      Hal Finkel authored
      GCC accepts 'cc' as an alias for 'cr0', and we need to do the same when
      processing inline asm constraints. This had previously been implemented using a
      non-allocatable register, named 'cc', that was listed as an alias of 'cr0', but
      the infrastructure does not seem to support this properly (neither the register
      allocator nor the scheduler properly accounts for the alias). Instead, we can
      just process this as a naming alias inside of the inline asm
      constraint-processing code, so we'll do that instead.
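      The naming alias amounts to a simple lookup before constraint matching (a sketch; the real change lives in the PowerPC inline-asm constraint handling):

```python
def canonicalize_ppc_reg(name):
    # 'cc' is accepted as an alias for 'cr0', matching GCC, without
    # introducing a separate non-allocatable register for it
    aliases = {"cc": "cr0"}
    return aliases.get(name, name)
```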
      
      There are two regression tests, one where the post-RA scheduler did the wrong
      thing with the non-allocatable alias, and one where the register allocator did
      the wrong thing. Fixes PR21742.
      
      llvm-svn: 223708
    • [CompactUnwind] Fix register encoding logic · 27de9b0f
      Bruno Cardoso Lopes authored
Fix a bug in the compact unwind encoding logic that tried to encode
more callee-saved registers than it should, leading to an early bailout
in the encoding logic and unnecessary fallback to DWARF frame mode.

Also remove no-compact-unwind.ll, which was testing the wrong thing
because of this bug, and move it to the valid 'compact unwind' tests.
Added a few more tests too.
      
      llvm-svn: 223676
    • AArch64: treat HFAs containing "half" types as blocks too. · 67be569a
      Tim Northover authored
      llvm-svn: 223669
    • [X86] Improved tablegen patterns for matching TZCNT/LZCNT. · d80836ed
      Andrea Di Biagio authored
      Teach ISel how to match a TZCNT/LZCNT from a conditional move if the
      condition code is X86_COND_NE.
The existing tablegen patterns only allowed matching TZCNT/LZCNT from an
X86cond with condition code equal to X86_COND_E. To avoid introducing
extra rules, I added an 'ImmLeaf' definition that checks whether the
condition code is COND_E or COND_NE.
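      The two select forms the pattern now covers are equivalent, which a small Python model makes explicit (bit width and names are illustrative):

```python
def cttz32(x):
    # software model of 32-bit TZCNT: count trailing zeros, 32 for input 0
    if x == 0:
        return 32
    n = 0
    while (x >> n) & 1 == 0:
        n += 1
    return n

def select_cond_e(x):
    # conditional-move form matched before: (x == 0) ? 32 : cttz(x)
    return 32 if x == 0 else cttz32(x)

def select_cond_ne(x):
    # newly matched form: (x != 0) ? cttz(x) : 32
    return cttz32(x) if x != 0 else 32
```

      Both selects compute exactly TZCNT's semantics, so either shape can be lowered to the single instruction.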
      
      llvm-svn: 223668