README.txt

Target Independent Opportunities:

//===---------------------------------------------------------------------===//

Dead argument elimination should be enhanced to handle cases where an argument
is dead in an externally visible function.  Though the argument can't be removed
from the externally visible function, the caller doesn't need to pass it in.
For example, in this testcase:

  void foo(int X) __attribute__((noinline));
  void foo(int X) { sideeffect(); }
  void bar(int A) { foo(A+1); }

We compile bar to:

define void @bar(i32 %A) nounwind ssp {
  %0 = add nsw i32 %A, 1                          ; <i32> [#uses=1]
  tail call void @foo(i32 %0) nounwind noinline ssp
  ret void
}

The add is dead; we could pass in 'i32 undef' instead.  This occurs for C++
templates etc., which usually have linkonce_odr/weak_odr linkage, not internal
linkage.
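The call-site half of this transformation can be sketched on a toy IR.  The
Callee struct, UNDEF_OPERAND and undef_dead_call_args are all hypothetical
(they are not LLVM's DeadArgumentElimination API).  The sketch assumes the
callee body that was analyzed is the one that will run, which the _odr
linkages above guarantee.  It leaves the callee's signature alone and only
replaces the caller's operands, after which the add feeding them becomes dead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy IR: an operand is a non-negative value id, or UNDEF. */
#define UNDEF_OPERAND (-1)

typedef struct {
    size_t num_params;
    /* param_is_used[i] is false if the callee body never reads param i. */
    const bool *param_is_used;
} Callee;

/* Replace each actual argument whose formal parameter is dead in the
   callee with UNDEF_OPERAND.  The callee itself is not modified, so this
   stays legal for externally visible callees (assuming an _odr-style
   guarantee that the analyzed body is the one that runs).  Returns the
   number of operands rewritten. */
size_t undef_dead_call_args(const Callee *callee, int *args) {
    size_t rewritten = 0;
    for (size_t i = 0; i < callee->num_params; ++i) {
        if (!callee->param_is_used[i] && args[i] != UNDEF_OPERAND) {
            args[i] = UNDEF_OPERAND;
            ++rewritten;
        }
    }
    return rewritten;
}
```

For the testcase above, foo's single parameter is unused, so the call in bar
would pass undef and the 'add' could then be deleted as dead code.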

//===---------------------------------------------------------------------===//

With the recent changes to make the implicit def/use set explicit in
machineinstrs, we should change the target descriptions for 'call' instructions
so that the .td files don't list all the call-clobbered registers as implicit
defs.  Instead, these should be added by the code generator (e.g. on the dag).

This has a number of uses:

1. PPC32/64 and X86 32/64 can avoid having multiple copies of call instructions
   for their different impdef sets.
2. Targets with multiple calling convs (e.g. x86) which have different clobber
   sets don't need copies of call instructions.
3. 'Interprocedural register allocation' can be done to reduce the clobber sets
   of calls.
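A minimal sketch of points 2 and 3, under made-up assumptions: one call
instruction definition whose clobber set is attached at code generation time.
The set is looked up per calling convention and, when the callee's actual
clobbers are known, narrowed to them ("interprocedural register allocation").
RegMask, CallingConv, kDefaultClobbers and call_clobbers are hypothetical
names, and the masks are invented rather than taken from any real target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register mask: bit r set means register r is clobbered. */
typedef uint32_t RegMask;

typedef enum { CC_C, CC_FAST, CC_NUM } CallingConv;

/* Default call-clobbered set per calling convention (invented values). */
static const RegMask kDefaultClobbers[CC_NUM] = {
    [CC_C] = 0x0000000Fu,
    [CC_FAST] = 0x00000003u,
};

/* Clobber set for a call.  callee_defs is the set of registers the callee
   (including anything it calls) may leave modified on return.  A conforming
   callee restores callee-saved registers, so intersecting with the
   convention's default set is a safe narrowing. */
RegMask call_clobbers(CallingConv cc, bool callee_known, RegMask callee_defs) {
    RegMask clobbers = kDefaultClobbers[cc];
    if (callee_known)
        clobbers &= callee_defs;
    return clobbers;
}
```

With this arrangement, a single call instruction serves every calling
convention, because the convention only selects a table entry.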

//===---------------------------------------------------------------------===//

Make the PPC branch selector target independent.

//===---------------------------------------------------------------------===//

Get the C front-end to expand hypot(x,y) -> llvm.sqrt(x*x+y*y) when errno and
precision don't matter (-ffast-math).  Misc/mandel will like this. :)  This
isn't safe in general, even on Darwin.  See the libm implementation of hypot
for examples (it special-cases x/y being exactly zero to get signed zeros etc.
right).
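Range is another way the naive expansion goes wrong, separate from the
zero special cases: for x = y = 1e200, x*x overflows to +inf, so the sqrt
returns +inf even though hypot(x, y) is about 1.414e200.  For x = y = 1e-200,
the radicand underflows to 0.  The helper names below are hypothetical.  The
sketch only inspects the radicand, so it needs nothing from libm beyond the
classification macros.

```c
#include <math.h>
#include <stdbool.h>

/* Radicand of the naive expansion hypot(x, y) -> sqrt(x*x + y*y). */
double naive_hypot_radicand(double x, double y) {
    return x * x + y * y;
}

/* True when x and y are finite but the radicand overflowed to infinity,
   so the naive expansion would return +inf. */
bool naive_hypot_overflows(double x, double y) {
    return isfinite(x) && isfinite(y) && isinf(naive_hypot_radicand(x, y));
}
```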

//===---------------------------------------------------------------------===//

Solve this DAG isel folding deficiency:

int X, Y;

void fn1(void)
{
  X = X | (Y << 3);
}

compiles to

fn1:
	movl Y, %eax
	shll $3, %eax
	orl X, %eax
	movl %eax, X
	ret

The problem is the store's chain operand is not the load X but rather
a TokenFactor of the load X and load Y, which prevents the folding.

There are two ways to fix this:

1. The dag combiner can start using alias analysis to realize that X and Y
   don't alias, making the store to X not dependent on the load from Y.
2. The generated isel could be made smarter in cases where it can't
   disambiguate the pointers.

Number 1 is the preferred solution.

This has been "fixed" by a TableGen hack. But that is a short term workaround
which will be removed once the proper fix is made.

//===---------------------------------------------------------------------===//

On targets with expensive 64-bit multiply, we could LSR this:

for (i = ...; ++i) {
   x = 1ULL << i;
}

into:

unsigned long long tmp = 1;
for (i = ...; ++i, tmp += tmp)
   x = tmp;

This would be a win on ppc32, but not x86 or ppc64.
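A runnable version of the two loops, assuming the trip count is at most 64 so
that the shift amount stays in range.  pow2_shift and pow2_doubling are
hypothetical names.  tmp is unsigned so that the final doubling after 2^63
wraps to 0 instead of being signed overflow.

```c
#include <stdint.h>

/* Original form: a variable 64-bit shift every iteration (requires n <= 64). */
void pow2_shift(uint64_t *out, unsigned n) {
    for (unsigned i = 0; i < n; ++i)
        out[i] = 1ULL << i;
}

/* Strength-reduced form: carry tmp across iterations and double it. */
void pow2_doubling(uint64_t *out, unsigned n) {
    uint64_t tmp = 1;
    for (unsigned i = 0; i < n; ++i, tmp += tmp)
        out[i] = tmp;
}
```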

//===---------------------------------------------------------------------===//

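Shrink: (setlt (loadi32 P), 0) -> (setlt (loadi8 Phi), 0), where Phi points at
the byte of *P that holds the sign bit (P+3 on little-endian targets, P on
big-endian targets).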
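The equivalence can be checked at the source level.  This sketch assumes a
plain little- or big-endian layout.  is_negative_i32 and is_negative_i8 are
hypothetical names.  The narrowed version tests bit 7 of the high byte rather
than casting to a signed char, which avoids implementation-defined conversions
in C.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Full-width form: (setlt (loadi32 P), 0). */
bool is_negative_i32(const void *p) {
    int32_t v;
    memcpy(&v, p, sizeof v);
    return v < 0;
}

/* Narrowed form: load only the byte that contains the sign bit. */
bool is_negative_i8(const void *p) {
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    bool little_endian = (first == 1);
    const unsigned char *bytes = p;
    unsigned char hi = bytes[little_endian ? 3 : 0];
    return (hi & 0x80u) != 0; /* same as (setlt (loadi8 Phi), 0) */
}
```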

//===---------------------------------------------------------------------===//

Reassociate should turn things like:

int factorial(int X) {
 return X*X*X*X*X*X*X*X;
}

into llvm.powi calls, allowing the code generator to produce balanced
multiplication trees.

First, the intrinsic needs to be extended to support integers; second, the
code generator needs to be enhanced to lower these calls to multiplication
trees.
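What "balanced multiplication trees" buys is shown below: X^8 as three
squarings instead of a chain of seven multiplies, plus a generic powi lowering
by repeated squaring that uses O(log n) multiplies.  All function names are
hypothetical.  The sketch uses uint32_t so that wraparound is defined, which
matches LLVM's 'mul' without nsw.  Integer multiplication modulo 2^32 is
associative, so all three forms agree.

```c
#include <stddef.h>
#include <stdint.h>

/* As written in factorial(): a linear chain of 7 multiplies. */
uint32_t pow8_chain(uint32_t x) {
    return x * x * x * x * x * x * x * x;
}

/* Balanced tree: 3 multiplies. */
uint32_t pow8_tree(uint32_t x) {
    uint32_t x2 = x * x;
    uint32_t x4 = x2 * x2;
    return x4 * x4;
}

/* Generic lowering of an integer powi(x, n), n >= 0, by repeated squaring. */
uint32_t powi_u32(uint32_t x, unsigned n) {
    uint32_t result = 1;
    while (n != 0) {
        if (n & 1u)
            result *= x;
        x *= x;
        n >>= 1;
    }
    return result;
}
```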

//===---------------------------------------------------------------------===//

Interesting? testcase for add/shift/mul reassoc:

int bar(int x, int y) {
  return x*x*x+y+x*x*x*x*x*y*y*y*y;
}
int foo(int z, int n) {
  return bar(z, n) + bar(2*z, 2*n);
}

This is blocked on not handling X*X*X -> powi(X, 3) (see note above).  The issue
is that we end up with t = 2*X and s = t*t, and don't turn this into 4*X*X
(which uses the same number of multiplies and is canonical) because the 2*X has
multiple uses.  Here's a simple example:

define i32 @test15(i32 %X1) {
  %B = mul i32 %X1, 47   ; X1*47
  %C = mul i32 %B, %B
  ret i32 %C
}
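The canonicalization in question is (C*X)*(C*X) -> (C*C)*(X*X).  Once C is a
constant, C*C folds away.  For test15, that gives X1*X1 multiplied by 2209
(47*47), still two multiplies.  The sketch below checks the identity with
wrapping uint32_t arithmetic.  Both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Shape reassociate currently leaves alone: t = c*x; s = t*t. */
uint32_t square_of_scaled(uint32_t x, uint32_t c) {
    uint32_t t = c * x;
    return t * t;
}

/* Canonical form: the constant factors are grouped, so c*c can fold. */
uint32_t scaled_square(uint32_t x, uint32_t c) {
    return (c * c) * (x * x);
}
```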


//===---------------------------------------------------------------------===//

Reassociate should handle the example in GCC PR16157:

extern int a0, a1, a2, a3, a4; extern int b0, b1, b2, b3, b4; 
void f () {  /* this can be optimized to four additions... */ 
        b4 = a4 + a3 + a2 + a1 + a0; 
        b3 = a3 + a2 + a1 + a0; 
        b2 = a2 + a1 + a0; 
        b1 = a1 + a0; 
} 

This requires reassociating to forms of expressions that are already available,
something that reassoc doesn't think about yet.
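The four-addition form comes from reassociating each sum so that it reuses the
previous one.  Here is a sketch with the globals replaced by arrays:
sums_naive and sums_reused are hypothetical names, and a[i]/b[i] stand in for
ai/bi.  As written, the sums take 4+3+2+1 = 10 additions.  After
reassociation, they take 4.

```c
/* Written form: 10 additions. */
void sums_naive(const int a[5], int b[5]) {
    b[4] = a[4] + a[3] + a[2] + a[1] + a[0];
    b[3] = a[3] + a[2] + a[1] + a[0];
    b[2] = a[2] + a[1] + a[0];
    b[1] = a[1] + a[0];
}

/* Reassociated so each sum builds on the previous one: 4 additions. */
void sums_reused(const int a[5], int b[5]) {
    int s1 = a[1] + a[0];
    int s2 = a[2] + s1;
    int s3 = a[3] + s2;
    int s4 = a[4] + s3;
    b[1] = s1;
    b[2] = s2;
    b[3] = s3;
    b[4] = s4;
}
```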


//===---------------------------------------------------------------------===//

This function (derived from GCC PR19988):
double foo(double x, double y) {
  return ((x + 0.1234 * y) * (x + -0.1234 * y));
}

compiles to:
_foo:
	movapd	%xmm1, %xmm2
	mulsd	LCPI1_1(%rip), %xmm1
	mulsd	LCPI1_0(%rip), %xmm2
	addsd	%xmm0, %xmm1
	addsd	%xmm0, %xmm2
	movapd	%xmm1, %xmm0
	mulsd	%xmm2, %xmm0
	ret

Instcombine should be able to turn it into:

double foo(double x, double y) {
  return ((x + 0.1234 * y) * (x - 0.1234 * y));
}

This allows the multiply by a constant to be CSE'd, producing:

_foo:
	mulsd	LCPI1_0(%rip), %xmm1
	movapd	%xmm1, %xmm2
	addsd	%xmm0, %xmm2
	subsd	%xmm1, %xmm0
	mulsd	%xmm2, %xmm0
	ret

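This doesn't need -ffast-math support at all: in the default rounding mode,
(-0.1234)*y is exactly -(0.1234*y), and x + -(p) is exactly x - p, so the
rewritten function returns bit-identical results.  This is particularly bad
because the llvm-gcc frontend is canonicalizing the latter into the former, but
clang doesn't have this problem.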
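At the source level, the proposed instcombine amounts to computing the product
once and using it in both an add and a sub.  foo_orig and foo_cse are
hypothetical names for the two versions.  Because the rewrite only relies on
negation being exact, the two functions are expected to agree bit for bit in
the default rounding mode.

```c
/* As written: two distinct constants, two multiplies by a constant. */
double foo_orig(double x, double y) {
    return (x + 0.1234 * y) * (x + -0.1234 * y);
}

/* After the proposed rewrite: one product shared by the add and the sub. */
double foo_cse(double x, double y) {
    double t = 0.1234 * y;
    return (x + t) * (x - t);
}
```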

//===---------------------------------------------------------------------===//

These two functions should generate the same code on big-endian systems: