  1. Aug 13, 2013
  2. Aug 11, 2013
  3. Aug 08, 2013
  4. Aug 07, 2013
  5. Aug 06, 2013
      Refactor isInTailCallPosition handling · a4415854
      Tim Northover authored
      This change came about primarily because of two issues in the existing code.
      Neither of:
      
      define i64 @test1(i64 %val) {
        %in = trunc i64 %val to i32
        tail call i32 @ret32(i32 returned %in)
        ret i64 %val
      }
      
      define i32 @test2(i32 %val) {
        tail call i32 @ret32(i32 returned undef)
        ret i32 42
      }
      
      should be a tail call, yet both currently are, and the function sameNoopInput
      is responsible. The main problem is that it treats the "tail call" value and
      the "ret" value completely symmetrically, but in reality different things are
      allowed on each side.
      
      For these cases:
      1. A truncation is only acceptable when "tail call" produces a larger value
         than "ret" needs, never the reverse.
      2. Undef should only be allowed as a source for ret, not as a result of the
         call.
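      The two rules can be checked one return slot at a time. The sketch below is a
      hypothetical C model, not LLVM code: Slot, value_id and slot_passes_through are
      invented names, a slot is assumed to be "some SSA value observed at some bit
      width", and the backend-legality question raised next is ignored.

```c
#include <stdbool.h>

/* Hypothetical model of one return slot, as seen from one side:
   the value the tail call returns, or the value ret needs. */
typedef struct {
    bool is_undef;   /* the slot holds undef */
    int value_id;    /* identifies the underlying SSA value, e.g. %val */
    unsigned bits;   /* width at which that value is observed in the slot */
} Slot;

/* Returns true if ret's slot can be satisfied by what the call produced. */
static bool slot_passes_through(Slot call, Slot ret)
{
    /* Rule 2: undef is acceptable as the source for ret... */
    if (ret.is_undef)
        return true;
    /* ...but not as what the call produces for a concrete ret value. */
    if (call.is_undef)
        return false;
    /* Rule 1: ret may use a truncation of the call's value, never a wider one. */
    return call.value_id == ret.value_id && ret.bits <= call.bits;
}
```

      In this model test1 fails because ret needs 64 bits of %val while the call
      only returns 32 of them, and test2 fails because the call's result is undef.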
      
      Along the way I noticed that a mismatch between what this function treats as a
      valid truncation and what the backends see can lead to invalid calls as well
      (see x86-32 test case).
      
      This patch refactors the code so that instead of being based primarily on
      values which it recurses into when necessary, it starts by inspecting the type
      and considers each fundamental slot that the backend will see in turn. For
      example, given a pathological function that returned {{}, {{}, i32, {}}, i32}
      we would consider each "real" i32 in turn, and ask if it passes through
      unchanged. This is much closer to what the backend sees as a result of
      ComputeValueVTs.
      
      Aside from the bug fixes, this eliminates the recursion that's going on and, I
      believe, makes the bulk of the code significantly easier to understand. The
      trade-off is the nasty iterators needed to find the real types inside a
      returned value.
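      Finding the "real" slots of a nested aggregate without recursion can be
      sketched with an explicit stack of (aggregate, next element) frames. This is a
      hypothetical C model: the Type struct, MAX_DEPTH and flatten_slots are invented
      here and are not LLVM's types or iterators.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, minimal type representation: aggregates and two scalar kinds. */
typedef enum { TY_STRUCT, TY_I32, TY_I64 } TypeKind;

typedef struct Type {
    TypeKind kind;
    size_t num_elems;                 /* TY_STRUCT only */
    const struct Type *const *elems;  /* TY_STRUCT only */
} Type;

#define MAX_DEPTH 32

/* Store the non-aggregate slots of t, left to right, into out[] and return how
   many were found (at most max_out are stored). Uses an explicit stack instead
   of recursion; returns SIZE_MAX if nesting is deeper than MAX_DEPTH. */
static size_t flatten_slots(const Type *t, const Type **out, size_t max_out)
{
    struct Frame { const Type *agg; size_t next; } stack[MAX_DEPTH];
    size_t depth = 0, n = 0;

    if (t->kind != TY_STRUCT) {       /* a scalar is its own single slot */
        if (max_out > 0)
            out[0] = t;
        return 1;
    }
    stack[depth].agg = t;
    stack[depth].next = 0;
    depth++;
    while (depth > 0) {
        struct Frame *f = &stack[depth - 1];
        if (f->next == f->agg->num_elems) {   /* finished this aggregate */
            depth--;
            continue;
        }
        const Type *child = f->agg->elems[f->next++];
        if (child->kind == TY_STRUCT) {       /* descend; empty structs add nothing */
            if (depth == MAX_DEPTH)
                return SIZE_MAX;
            stack[depth].agg = child;
            stack[depth].next = 0;
            depth++;
        } else {
            if (n < max_out)
                out[n] = child;
            n++;
        }
    }
    return n;
}
```

      For {{}, {{}, i32, {}}, i32} this yields exactly the two i32 slots, in order,
      and each pair of call/ret slots can then be compared independently.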
      
      llvm-svn: 187787
  6. Aug 05, 2013
  7. Aug 04, 2013
      X86: Turn fp selects into mask operations. · 5bc180c1
      Benjamin Kramer authored
      double test(double a, double b, double c, double d) { return a<b ? c : d; }
      
      before:
      _test:
      	ucomisd	%xmm0, %xmm1
      	ja	LBB0_2
      	movaps	%xmm3, %xmm2
      LBB0_2:
      	movaps	%xmm2, %xmm0
      
      after:
      _test:
      	cmpltsd	%xmm1, %xmm0
      	andpd	%xmm0, %xmm2
      	andnpd	%xmm3, %xmm0
      	orpd	%xmm2, %xmm0
      
      Small speedup on Benchmarks/SmallPT
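      The "after" sequence is a branchless bitwise select: cmpltsd builds an
      all-ones or all-zeros mask, then (mask & c) | (~mask & d) picks a result. The
      C model below mirrors that on the raw bit patterns; select_lt_masked and its
      helpers are illustrative names, and a C compiler is not guaranteed to emit
      this exact instruction sequence for it.

```c
#include <stdint.h>
#include <string.h>

static uint64_t bits_of(double x) { uint64_t u; memcpy(&u, &x, sizeof u); return u; }
static double double_of(uint64_t u) { double x; memcpy(&x, &u, sizeof x); return x; }

/* Same result as a < b ? c : d, computed with masks like the "after" code. */
static double select_lt_masked(double a, double b, double c, double d)
{
    uint64_t mask = (a < b) ? UINT64_MAX : 0;  /* cmpltsd: all ones or all zeros */
    uint64_t take_c = bits_of(c) & mask;       /* andpd */
    uint64_t take_d = ~mask & bits_of(d);      /* andnpd */
    return double_of(take_c | take_d);         /* orpd */
}
```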
      
      llvm-svn: 187706
  8. Jul 31, 2013
      Added INSERT and EXTRACT instructions from AVX-512 ISA. · 67b05fc0
      Elena Demikhovsky authored
      All insertf*/extractf* functions replaced with insert/extract since we have insertf and inserti forms.
      Added lowering for INSERT_VECTOR_ELT / EXTRACT_VECTOR_ELT for 512-bit vectors.
      Added lowering for EXTRACT/INSERT subvector for 512-bit vectors.
      Added a test.
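      To make the node semantics concrete, the hypothetical C model below treats a
      512-bit vector as 16 f32 lanes and a 128-bit subvector as 4. It also shows one
      common strategy for wide vectors, expressing an element insert/extract through
      the 128-bit chunk that holds the element; whether this commit's lowering works
      exactly this way is an assumption. V512, V128 and all function names are
      invented for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model: a 512-bit vector as 16 x f32 lanes, a 128-bit one as 4. */
typedef struct { float lane[16]; } V512;
typedef struct { float lane[4]; } V128;

/* EXTRACT_SUBVECTOR: copy the 128-bit chunk starting at lane idx (a multiple of 4). */
static V128 extract_subvector(V512 v, size_t idx)
{
    V128 r;
    memcpy(r.lane, &v.lane[idx], sizeof r.lane);
    return r;
}

/* INSERT_SUBVECTOR: overwrite the 128-bit chunk starting at lane idx. */
static V512 insert_subvector(V512 v, V128 sub, size_t idx)
{
    memcpy(&v.lane[idx], sub.lane, sizeof sub.lane);
    return v;
}

/* INSERT_VECTOR_ELT via subvectors: take out the chunk holding lane i,
   update it, and put it back. */
static V512 insert_vector_elt(V512 v, float x, size_t i)
{
    size_t base = i & ~(size_t)3;  /* first lane of the enclosing chunk */
    V128 chunk = extract_subvector(v, base);
    chunk.lane[i & 3] = x;
    return insert_subvector(v, chunk, base);
}

/* EXTRACT_VECTOR_ELT the same way. */
static float extract_vector_elt(V512 v, size_t i)
{
    return extract_subvector(v, i & ~(size_t)3).lane[i & 3];
}
```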
      
      llvm-svn: 187491
  9. Jul 09, 2013
      AArch64/PowerPC/SystemZ/X86: This patch fixes the interface, usage, and all · 73de7bf5
      Stephen Lin authored
      in-tree implementations of TargetLoweringBase::isFMAFasterThanMulAndAdd in
      order to resolve the following issues with fmuladd (i.e. optional FMA)
      intrinsics:
      
      1. On X86(-64) targets, ISD::FMA nodes are formed when lowering fmuladd
      intrinsics even if the subtarget does not support FMA instructions, leading
      to laughably bad code generation in some situations.
      
      2. On AArch64 targets, ISD::FMA nodes are formed for operations on fp128,
      resulting in a call to a software fp128 FMA implementation.
      
      3. On PowerPC targets, FMAs are not generated from fmuladd intrinsics on types
      like v2f32, v8f32, v4f64, etc., even though they promote, split, scalarize,
      etc. to types that support hardware FMAs.
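      The three issues suggest a decision that looks at the subtarget and at the
      element type a value legalizes to, rather than at the original vector width.
      The C model below is a hypothetical illustration of that reasoning only; FpVT,
      Subtarget and fma_faster_than_fmul_fadd are invented and do not reproduce any
      in-tree target's rules.

```c
#include <stdbool.h>

typedef enum { ELT_F32, ELT_F64, ELT_F128 } FpElt;

/* Hypothetical FP value type: element type plus lane count (1 for scalars). */
typedef struct { FpElt elt; unsigned num_elts; } FpVT;

/* Hypothetical subtarget summary. */
typedef struct { bool has_fma; } Subtarget;

/* Decide whether an fmuladd should become a fused FMA node. The lane count is
   deliberately ignored: this model assumes v2f32, v8f32, v4f64, ... are promoted,
   split or scalarized to legal types, so only the element type matters. */
static bool fma_faster_than_fmul_fadd(const Subtarget *st, FpVT vt)
{
    if (!st->has_fma)       /* issue 1: no hardware FMA instructions at all */
        return false;
    switch (vt.elt) {
    case ELT_F32:
    case ELT_F64:
        return true;        /* issue 3: fuse even for non-legal vector widths */
    case ELT_F128:
        return false;       /* issue 2: would become a software FMA call */
    }
    return false;
}
```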
      
      The function has also been slightly renamed for consistency and to force a
      merge/build conflict for any out-of-tree target implementing it. To resolve,
      see comments and fixed in-tree examples.
      
      llvm-svn: 185956
  10. Jun 22, 2013
  11. Jun 07, 2013
  12. May 25, 2013
  13. May 18, 2013
  14. Apr 05, 2013
  15. Mar 29, 2013
  16. Mar 26, 2013
  17. Mar 01, 2013
      Fix PR10475 · 6af16fc3
      Michael Liao authored
      - ISD::SHL/SRL/SRA must have either both scalar or both vector operands,
        but TLI.getShiftAmountTy() has so far only returned a scalar type. As a
        result, backend logic that relies on this requirement breaks.
      - Rename the original TLI.getShiftAmountTy() to
        TLI.getScalarShiftAmountTy() and redefine TLI.getShiftAmountTy() to
        return either the target-specified scalar type or the same vector type
        as the 1st operand.
      - Fix most TICG logic that assumes TLI.getShiftAmountTy() returns a
        simple scalar type.
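      The redefined rule can be modeled in a few lines: a vector shift gets a vector
      amount of the same type as its first operand, and a scalar shift gets the
      target's scalar amount type. IntVT and the function names below are
      hypothetical, and the scalar type is assumed to be i8 purely for illustration.

```c
#include <stdbool.h>

/* Hypothetical value type: integer element width and lane count (1 = scalar). */
typedef struct { unsigned elt_bits; unsigned num_elts; } IntVT;

static bool is_vector(IntVT vt) { return vt.num_elts > 1; }

/* Target-chosen scalar shift amount type (assumed to be i8 in this sketch). */
static IntVT get_scalar_shift_amount_ty(void)
{
    IntVT t = {8, 1};
    return t;
}

/* Vector shifts use the first operand's own type for the amount,
   scalar shifts use the target's scalar shift amount type. */
static IntVT get_shift_amount_ty(IntVT lhs)
{
    return is_vector(lhs) ? lhs : get_scalar_shift_amount_ty();
}
```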
      
      llvm-svn: 176364
  18. Feb 15, 2013
  19. Jan 29, 2013
      Teach SDISel to combine fsin / fcos into a fsincos node if the following · 0e88c7d8
      Evan Cheng authored
      conditions are met:
      1. They share the same operand and are in the same BB.
      2. Both outputs are used.
      3. The target has a native instruction that maps to ISD::FSINCOS node or
         the target provides a sincos library call.
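      The three conditions amount to a single predicate over the two nodes. The
      sketch below is a hypothetical C model, not SelectionDAG code: Node, its fields
      and can_combine_to_fsincos are invented for illustration.

```c
#include <stdbool.h>

typedef enum { OP_FSIN, OP_FCOS } Opcode;

/* Hypothetical, simplified DAG node. */
typedef struct {
    Opcode opcode;
    int operand_id;  /* identity of the single operand */
    int block_id;    /* basic block the node belongs to */
    int num_uses;    /* number of uses of the result */
} Node;

static bool can_combine_to_fsincos(const Node *sin_node, const Node *cos_node,
                                   bool has_native_fsincos, bool has_sincos_libcall)
{
    return sin_node->opcode == OP_FSIN && cos_node->opcode == OP_FCOS
        /* 1. same operand, same basic block */
        && sin_node->operand_id == cos_node->operand_id
        && sin_node->block_id == cos_node->block_id
        /* 2. both outputs are used */
        && sin_node->num_uses > 0 && cos_node->num_uses > 0
        /* 3. native instruction or sincos library call available */
        && (has_native_fsincos || has_sincos_libcall);
}
```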
      
      Implemented the generic optimization in SDISel and enabled it for
      Mac OS X. Also added an additional optimization for x86_64 Mac OS X by
      using an alternative entry point, __sincos_stret, which returns the two
      results in xmm0 / xmm1.
      
      rdar://13087969
      PR13204
      
      llvm-svn: 173755
  20. Jan 28, 2013
  21. Jan 21, 2013
  22. Jan 20, 2013
  23. Jan 09, 2013
  24. Jan 07, 2013
      Switch TargetTransformInfo from an immutable analysis pass that requires · 664e354d
      Chandler Carruth authored
      a TargetMachine to construct (and thus isn't always available), to an
      analysis group that supports layered implementations much like
      AliasAnalysis does. This is a pretty massive change, with a few parts
      that I was unable to easily separate (sorry), so I'll walk through it.
      
      The first step of this conversion was to make TargetTransformInfo an
      analysis group, and to sink the nonce implementations in
      ScalarTargetTransformInfo and VectorTargetTransformInfo into
      a NoTargetTransformInfo pass. This allows other passes to add a hard
      requirement on TTI, and assume they will always get at least one
      implementation.
      
      The TargetTransformInfo analysis group leverages the delegation chaining
      trick that AliasAnalysis uses, where the base class for the analysis
      group delegates to the previous analysis *pass*, allowing all but the
      NoFoo analysis passes to only implement the parts of the interfaces they
      support. It also introduces a new trick where each pass in the group
      retains a pointer to the top-most pass that has been initialized. This
      allows passes to implement one API in terms of another API and benefit
      when some other pass above them in the stack has more precise results
      for the second API.
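      Both tricks can be modeled with plain function pointers: a NULL hook means
      "delegate to the previous pass", and a hook that is implemented in terms of
      other queries issues them through the top-most pass. This is a hypothetical C
      sketch of the pattern, not the C++ classes; TTIPass, the hooks and the numbers
      returned are invented.

```c
#include <stddef.h>

typedef struct TTIPass TTIPass;
struct TTIPass {
    const char *name;
    const TTIPass *prev;  /* previous pass in the stack; NULL for the bottom pass */
    const TTIPass *top;   /* top-most initialized pass */
    /* NULL means "not implemented here, delegate to prev". */
    int (*num_registers)(const TTIPass *self);
    int (*register_bits)(const TTIPass *self);
    int (*total_register_bits)(const TTIPass *self);
};

/* Delegation chaining: walk down until a pass implements the hook. The bottom
   pass implements every hook, so each walk terminates. */
static int query_num_registers(const TTIPass *p)
{
    while (p->num_registers == NULL) p = p->prev;
    return p->num_registers(p);
}

static int query_register_bits(const TTIPass *p)
{
    while (p->register_bits == NULL) p = p->prev;
    return p->register_bits(p);
}

static int query_total_register_bits(const TTIPass *p)
{
    while (p->total_register_bits == NULL) p = p->prev;
    return p->total_register_bits(p);
}

/* Bottom ("NoTTI"-style) pass: conservative defaults for every hook. */
static int no_num_registers(const TTIPass *self) { (void)self; return 8; }
static int no_register_bits(const TTIPass *self) { (void)self; return 32; }

/* One API built on two others, queried through the top-most pass so a more
   precise pass higher in the stack is used when present. */
static int no_total_register_bits(const TTIPass *self)
{
    return query_num_registers(self->top) * query_register_bits(self->top);
}

/* Hypothetical target pass that only knows its register count. */
static int target_num_registers(const TTIPass *self) { (void)self; return 16; }
```

      Even when the bottom pass answers total_register_bits, it picks up the
      target's register count because it asks through top.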
      
      The second step of this conversion is to create a pass that implements
      the TargetTransformInfo analysis using the target-independent
      abstractions in the code generator. This replaces the
      ScalarTargetTransformImpl and VectorTargetTransformImpl classes in
      lib/Target with a single pass in lib/CodeGen called
      BasicTargetTransformInfo. This class actually provides most of the TTI
      functionality, basing it upon the TargetLowering abstraction and other
      information in the target independent code generator.
      
      The third step of the conversion adds support to all TargetMachines to
      register custom analysis passes. This allows building those passes with
      access to TargetLowering or other target-specific classes, and it also
      allows each target to customize the set of analysis passes desired in
      the pass manager. The baseline LLVMTargetMachine implements this
      interface to add the BasicTTI pass to the pass manager, and all of the
      tools that want to support target-aware TTI passes call this routine on
      whatever target machine they end up with to add the appropriate passes.
      
      The fourth step of the conversion created target-specific TTI analysis
      passes for the X86 and ARM backends. These passes contain the custom
      logic that was previously in their extensions of the
      ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces.
      I separated them into their own file, as now all of the interface bits
      are private and they just expose a function to create the pass itself.
      Then I extended these target machines to set up a custom set of analysis
      passes, first adding BasicTTI as a fallback, and then adding their
      customized TTI implementations.
      
      The fourth step required moving logic that was shared between the
      target-independent layer and the specific targets to a different
      interface, as they no longer derive from each other. As a consequence,
      helper functions were added to TargetLowering representing the common
      logic needed both in the target implementation and the codegen
      implementation of the TTI pass. While technically this is the only
      change that could have been committed separately, it would have been
      a nightmare to extract.
      
      The final step of the conversion was just to delete all the old
      boilerplate. This got rid of the ScalarTargetTransformInfo and
      VectorTargetTransformInfo classes, all of the support in all of the
      targets for producing instances of them, and all of the support in the
      tools for manually constructing a pass based around them.
      
      Now that TTI is a relatively normal analysis group, two things become
      straightforward. First, we can sink it into lib/Analysis which is a more
      natural layer for it to live. Second, clients of this interface can
      depend on it *always* being available which will simplify their code and
      behavior. These (and other) simplifications will follow in subsequent
      commits; this one is clearly big enough.
      
      Finally, I'm very aware that many of the comments and much of the
      documentation need to be updated. As soon as I had this working, and
      plausibly well
      commented, I wanted to get it committed and in front of the build bots.
      I'll be doing a few passes over documentation later if it sticks.
      
      Commits to update DragonEgg and Clang will be made presently.
      
      llvm-svn: 171681
  25. Jan 04, 2013
      LoopVectorizer: · e1d5c4b8
      Nadav Rotem authored
      1. Add code to estimate register pressure.
      2. Add code to select the unroll factor based on register pressure.
      3. Add bits to TargetTransformInfo to provide the number of registers.
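      A simplified model of the first two items: estimate pressure as the largest
      number of overlapping live ranges, then choose the largest power-of-two unroll
      factor whose copies still fit in the registers left over after loop
      invariants. The formula, the power-of-two rounding and the function names are
      assumptions for illustration; the LoopVectorizer's actual heuristic may differ.

```c
/* Register pressure estimate: the maximum number of live ranges
   [start[i], end[i]] (inclusive instruction indices) that overlap any
   instruction index in [0, num_instrs). */
static int max_live_values(const int *start, const int *end, int n, int num_instrs)
{
    int best = 0;
    for (int p = 0; p < num_instrs; ++p) {
        int live = 0;
        for (int i = 0; i < n; ++i)
            if (start[i] <= p && p <= end[i])
                ++live;
        if (live > best)
            best = live;
    }
    return best;
}

/* Largest power-of-two unroll factor such that that many copies of the
   loop-local live values fit in the registers not taken by loop invariants,
   capped at max_uf. */
static unsigned select_unroll_factor(int num_registers, int loop_invariant_regs,
                                     int max_local_live, unsigned max_uf)
{
    int avail = num_registers - loop_invariant_regs;
    unsigned uf, pow2 = 1;

    if (max_local_live <= 0)
        return max_uf;
    if (avail < max_local_live)
        return 1;
    uf = (unsigned)(avail / max_local_live);
    while (pow2 * 2 <= uf)
        pow2 *= 2;
    return pow2 < max_uf ? pow2 : max_uf;
}
```

      Here num_registers stands in for the register count that the third item adds
      to TargetTransformInfo.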
      
      llvm-svn: 171469
  26. Jan 03, 2013
  27. Dec 28, 2012
  28. Dec 27, 2012
  29. Dec 21, 2012
  30. Dec 19, 2012
  31. Dec 17, 2012
  32. Dec 15, 2012
  33. Dec 12, 2012
      Sorry about the churn. One more change to getOptimalMemOpType() hook. Did I · 962711ee
      Evan Cheng authored
      mention the inline memcpy / memset expansion code is a mess?
      
      This patch splits the ZeroOrLdSrc argument into two: IsMemset and ZeroMemset.
      The first indicates whether it is expanding a memset or a memcpy / memmove.
      The latter indicates whether the memset is a memset of zero. It's totally
      possible (likely, even) that targets may want to do different things for
      memcpy and memset of zero.
      
      llvm-svn: 169959