    Significantly simplify and improve handling of FP function results on x86-32. · a91f77ea
    Chris Lattner authored
    This case returns the value in ST(0) and then has to convert it to an SSE
    register.  This causes significant codegen ugliness in some cases.  For 
    example in the trivial fp-stack-direct-ret.ll testcase we used to generate:
    
    _bar:
    	subl	$28, %esp
    	call	L_foo$stub
    	fstpl	16(%esp)
    	movsd	16(%esp), %xmm0
    	movsd	%xmm0, 8(%esp)
    	fldl	8(%esp)
    	addl	$28, %esp
    	ret
    
    because we move the result of foo() into an XMM register, then have to
    move it back for the return of bar.
    
    Instead of hacking ever-more special cases into the call result lowering code
    we take a much simpler approach: on x86-32, fp return is modeled as always 
    returning into an f80 register which is then truncated to f32 or f64 as needed.
    Similarly, when the function being compiled returns an FP result, we model it
    as an extension to f80 followed by the return.
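
    Concretely, the earlier testcase corresponds to roughly this C code (a
    sketch under assumptions: the actual fp-stack-direct-ret.ll is LLVM IR, and
    the body of foo is invented here only to make the example self-contained).
    Because bar returns foo's result unchanged, the f80-to-f64 truncate after
    the call cancels against the f64-to-f80 extend at bar's return:

    ```c
    #include <stdio.h>

    /* Hypothetical stand-ins for the .ll testcase: bar() returns foo()'s
       FP result unchanged, so under the new model the truncate inserted
       after the call and the extend inserted at the return annihilate. */
    double foo(void) { return 3.25; }

    double bar(void) { return foo(); }

    int main(void) {
        printf("%g\n", bar());
        return 0;
    }
    ```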
    
    This exposes the truncate and extensions to the dag combiner, allowing target
    independent code to hack on them, eliminating them in this case.  This gives 
    us this code for the example above:
    
    _bar:
    	subl	$12, %esp
    	call	L_foo$stub
    	addl	$12, %esp
    	ret
    
    The nasty aspect of this is that these conversions are not legal, but we want
    the second pass of dag combiner (post-legalize) to be able to hack on them.
    To handle this, we lie to legalize and say they are legal, then custom expand
    them on entry to the isel pass (PreprocessForFPConvert).  This is gross, but
    less gross than the code it is replacing :)
    
    This also allows us to generate better code in several other cases.  For 
    example on fp-stack-ret-conv.ll, we now generate:
    
    _test:
    	subl	$12, %esp
    	call	L_foo$stub
    	fstps	8(%esp)
    	movl	16(%esp), %eax
    	cvtss2sd	8(%esp), %xmm0
    	movsd	%xmm0, (%eax)
    	addl	$12, %esp
    	ret
    
    where before we produced (incidentally, the old bad code is identical to what
    gcc produces):
    
    _test:
    	subl	$12, %esp
    	call	L_foo$stub
    	fstpl	(%esp)
    	cvtsd2ss	(%esp), %xmm0
    	cvtss2sd	%xmm0, %xmm0
    	movl	16(%esp), %eax
    	movsd	%xmm0, (%eax)
    	addl	$12, %esp
    	ret
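
    A rough C-level sketch of the fp-stack-ret-conv.ll shape (an assumption;
    the real testcase is LLVM IR, and foo's body is invented here for
    self-containment): test() widens foo's float result to double and stores
    it through a pointer, which needs only the single cvtss2sd that the new
    lowering emits, not the cvtsd2ss/cvtss2sd round trip of the old code:

    ```c
    #include <stdio.h>

    /* Hypothetical reconstruction: foo returns float; test stores it,
       widened to double, through p.  One float->double conversion
       suffices; the old lowering bounced the value through f64 first. */
    float foo(void) { return 1.5f; }

    void test(double *p) { *p = (double)foo(); }

    int main(void) {
        double d;
        test(&d);
        printf("%g\n", d);
        return 0;
    }
    ```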
    
    Note that we generate slightly worse code on pr1505b.ll due to a scheduling 
    deficiency that is unrelated to this patch.
    
    llvm-svn: 46307