  1. Jul 27, 2011
  2. Jul 26, 2011
  3. Jul 25, 2011
  4. Jul 22, 2011
  5. Jul 21, 2011
      - Register v16i16 as valid VR256 register class · 178fb406
      Bruno Cardoso Lopes authored
      - Add more bitcasts for v16i16
      - Since 135661 and 135662 already added the splat logic,
      just add one more splat test for v16i16
      
      llvm-svn: 135663
      Add support for 256-bit versions of VPERMIL instruction. This is a new · b878caa5
      Bruno Cardoso Lopes authored
      instruction introduced in AVX, which can operate on 128- and 256-bit vectors.
      It treats a 256-bit vector as two independent 128-bit lanes. It can permute
      any 32- or 64-bit elements inside a lane, and restricts the second lane to
      have the same permutation as the first one. With the improved splat support
      introduced earlier today, adding codegen for this instruction enables more
      efficient 256-bit code:
      
      Instead of:
        vextractf128  $0, %ymm0, %xmm0
        punpcklbw %xmm0, %xmm0
        punpckhbw %xmm0, %xmm0
        vinsertf128 $0, %xmm0, %ymm0, %ymm1
        vinsertf128 $1, %xmm0, %ymm1, %ymm0
        vextractf128  $1, %ymm0, %xmm1
        shufps  $1, %xmm1, %xmm1
        movss %xmm1, 28(%rsp)
        movss %xmm1, 24(%rsp)
        movss %xmm1, 20(%rsp)
        movss %xmm1, 16(%rsp)
        vextractf128  $0, %ymm0, %xmm0
        shufps  $1, %xmm0, %xmm0
        movss %xmm0, 12(%rsp)
        movss %xmm0, 8(%rsp)
        movss %xmm0, 4(%rsp)
        movss %xmm0, (%rsp)
        vmovaps (%rsp), %ymm0
      We get:
        vextractf128  $0, %ymm0, %xmm0
        punpcklbw %xmm0, %xmm0
        punpckhbw %xmm0, %xmm0
        vinsertf128 $0, %xmm0, %ymm0, %ymm1
        vinsertf128 $1, %xmm0, %ymm1, %ymm0
        vpermilps $85, %ymm0, %ymm0
      
      llvm-svn: 135662
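The effect of `vpermilps $85, %ymm0, %ymm0` in the shorter sequence can be illustrated with a small model. This is only a sketch of the immediate form of the instruction on 32-bit elements, assuming the per-lane behaviour described in the commit message; the function name `vpermilps_imm` and the element labels are hypothetical and not part of LLVM.

```python
def vpermilps_imm(src, imm8):
    """Model the immediate form of VPERMILPS on a list of 32-bit elements.

    src holds 4 elements (128-bit) or 8 elements (256-bit). Each 128-bit
    lane is permuted independently, and the same four 2-bit selectors
    taken from imm8 are applied to every lane.
    """
    assert len(src) in (4, 8)
    assert 0 <= imm8 <= 0xFF
    out = []
    for lane_start in range(0, len(src), 4):
        lane = src[lane_start:lane_start + 4]
        for i in range(4):
            sel = (imm8 >> (2 * i)) & 0b11  # selector for destination slot i
            out.append(lane[sel])
    return out


# Hypothetical labels: a0..a3 are the low lane, b0..b3 the high lane.
ymm0 = ["a0", "a1", "a2", "a3", "b0", "b1", "b2", "b3"]

# 85 == 0b01010101: every destination slot selects element 1 of its lane.
splat = vpermilps_imm(ymm0, 85)
```

Because 85 is `0b01010101`, each lane is filled with its own element 1, which is why a single instruction can stand in for the extract, shuffle, and store sequence once both lanes already hold the same data.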
      Improve splat promotion to handle AVX types: v32i8 and v16i16. Also · fb4920eb
      Bruno Cardoso Lopes authored
      refactor the code and add a bunch of comments. The final shuffle
      emitted by handling 256-bit types is suitable for the VPERM shuffle
      instruction, which is going to be introduced in a subsequent commit (with
      a testcase which covers this commit).
      
      llvm-svn: 135661
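One way to picture splat promotion for narrow elements is as repeated "unpack with itself" steps inside a 128-bit lane until the wanted value fills a 32-bit slot, after which a single 32-bit shuffle finishes the splat. The sketch below is a simplified model under that assumption, not the LLVM implementation; `unpack_low`, `unpack_high`, `promote_splat`, and `splat_imm` are hypothetical names.

```python
def unpack_low(v):
    """Interleave the low half of v with itself (PUNPCKL* with equal operands)."""
    half = v[:len(v) // 2]
    return [x for x in half for _ in range(2)]


def unpack_high(v):
    """Interleave the high half of v with itself (PUNPCKH* with equal operands)."""
    half = v[len(v) // 2:]
    return [x for x in half for _ in range(2)]


def promote_splat(lane, elt_no):
    """Widen a splat of lane[elt_no] until it sits in one of four 32-bit slots.

    lane is the list of narrow elements in one 128-bit lane (16 x i8 or
    8 x i16). Returns the unpacked lane and the index of the 32-bit slot
    that now holds copies of the requested element.
    """
    v = list(lane)
    num_elems = len(v)  # logical element count, halves on every step
    while num_elems > 4:
        half = num_elems // 2
        if elt_no < half:
            v = unpack_low(v)
        else:
            v = unpack_high(v)
            elt_no -= half
        num_elems = half
    return v, elt_no


def splat_imm(slot):
    """Immediate that selects the same 32-bit slot for all four positions."""
    return slot * 0x55


# Splat byte 5 of a 16 x i8 lane: one low unpack, then one high unpack.
lane = list(range(16))
widened, slot = promote_splat(lane, 5)
imm = splat_imm(slot)
```

For byte 5 the model performs a low unpack followed by a high unpack and ends with slot 1, so the final immediate is 85, which lines up with the `punpcklbw`, `punpckhbw`, `vpermilps $85` sequence quoted in r135662.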
      Tidy up code · 0bdeacf0
      Bruno Cardoso Lopes authored
      llvm-svn: 135656
  6. Jul 20, 2011
  7. Jul 18, 2011
  8. Jul 16, 2011
      Fix a couple of things: · 8df9cfc2
      Bruno Cardoso Lopes authored
      1) Promote non-legal 256-bit loads to v4i64. This lets us
      canonicalize the loads and handle them the same way we already do
      for 128-bit registers. Despite what one of the removed comments
      explained, the load promotion does not interfere with VPERM; it is only a
      matter of doing the appropriate bitcasts when that instruction is
      introduced. Also make LOAD v8i32 legal.
      
      2) Doing 1) exposed two bugs:
      - v4i64 was being promoted to itself for several opcodes (introduced
      in r124447 by David Greene) causing endless recursion and the stack to
      explode.
      - there was no support for allOnes BUILD_VECTORs and ANDNP would fail to
      match because it was generating early target constant pools during
      lowering.
      
      3) The testcases are already checked in; doing 1) exposed the
      bugs in the current testcases.
      
      4) Tidy up code to be more clear and explicit about AVX.
      
      llvm-svn: 135313
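The point about load promotion only needing "the appropriate bitcasts" can be illustrated outside the compiler: loading 256 bits as four 64-bit integers and then reinterpreting them yields the same bit pattern as loading the narrower element type directly. This is a minimal sketch assuming little-endian layout as on x86; the helper name `bitcast_v4i64_to_v16i16` is hypothetical.

```python
import struct

# A 256-bit value as it would sit in memory (32 bytes).
raw = bytes(range(32))

# The promoted load type: 4 x i64, little-endian.
as_v4i64 = struct.unpack("<4Q", raw)

# The type the code originally asked for: 16 x i16.
as_v16i16 = struct.unpack("<16H", raw)


def bitcast_v4i64_to_v16i16(v):
    """Reinterpret 4 x i64 as 16 x i16 without changing any bits."""
    return struct.unpack("<16H", struct.pack("<4Q", *v))


roundtrip = bitcast_v4i64_to_v16i16(as_v4i64)
```

Since the bitcast changes only the interpretation of the bits, a later shuffle or permute can still see the vector in whichever element type it needs.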
  9. Jul 14, 2011
      Check register class matching instead of width of type matching · 92464be2
      Eric Christopher authored
      when determining validity of matching constraint. Allow i1
      types access to the GR8 reg class for x86.
      
      Fixes PR10352 and rdar://9777108
      
      llvm-svn: 135180
      [VECTOR-SELECT] · 771f2967
      Nadav Rotem authored
      During type legalization we often use the SIGN_EXTEND_INREG SDNode.
      When this SDNode is legalized during the LegalizeVector phase, it is
      scalarized because non-simple types are automatically marked to be expanded.
      In this patch we add support for lowering SIGN_EXTEND_INREG manually.
      This fixes CodeGen/X86/vec_sext.ll when running with the '-promote-elements'
      flag.
      
      llvm-svn: 135144
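For readers unfamiliar with SIGN_EXTEND_INREG: it sign-extends a value from a narrower width while staying in the same, wider element type. A common way to express it is a left shift followed by an arithmetic right shift, applied per lane. The sketch below models that on plain integers; whether this patch lowers the node to exactly this shift pair is an assumption, and the function names are hypothetical.

```python
def sign_extend_inreg(x, from_bits, width=32):
    """Sign-extend the low from_bits of x within a width-bit element.

    Modeled as SHL by (width - from_bits) followed by an arithmetic
    shift right by the same amount. The result is returned as an
    unsigned width-bit pattern.
    """
    assert 0 < from_bits <= width
    shift = width - from_bits
    mask = (1 << width) - 1
    shifted = (x << shift) & mask            # SHL, truncated to width bits
    if shifted & (1 << (width - 1)):         # reinterpret as signed
        shifted -= 1 << width
    return (shifted >> shift) & mask         # SRA, back to a bit pattern


def vector_sign_extend_inreg(vec, from_bits, width=32):
    """Apply the same operation to every lane instead of scalarizing by hand."""
    return [sign_extend_inreg(x, from_bits, width) for x in vec]


# Four 32-bit lanes whose low 8 bits carry an i8 value.
lanes = [0x0000007F, 0x000000FF, 0x12345680, 0x00000000]
extended = vector_sign_extend_inreg(lanes, 8)
```

On real hardware the two shifts are single vector instructions, which is the benefit of keeping the operation vectorized rather than expanding it element by element.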
  10. Jul 13, 2011
  11. Jul 08, 2011
  12. Jun 29, 2011
  13. Jun 28, 2011
      Clean up the handling of the x87 fp stack to make it more robust. · 7297e7e2
      Jakob Stoklund Olesen authored
      Drop the FpMov instructions, use plain COPY instead.
      
      Drop the FpSET/GET instruction for accessing fixed stack positions.
      Instead use normal COPY to/from ST registers around inline assembly, and
      provide a single new FpPOP_RETVAL instruction that can access the return
      value(s) from a call. This is still necessary since you cannot tell from
      the CALL instruction alone if it returns anything on the FP stack. Teach
      fast isel to use this.
      
      This provides a much more robust way of handling fixed stack registers -
      we can tolerate arbitrary FP stack instructions inserted around calls
      and inline assembly. Live range splitting could sometimes break x87 code
      by inserting spill code in unfortunate places.
      
      As a bonus we handle floating point inline assembly correctly now.
      
      llvm-svn: 134018
  14. Jun 25, 2011