//===---------------------------------------------------------------------===//
We should recognize idioms for add-with-carry and turn them into the appropriate
intrinsics. This example:
unsigned add32carry(unsigned sum, unsigned x) {
  unsigned z = sum + x;
  if (sum + x < x)
    z++;
  return z;
}
Compiles to: clang t.c -S -o - -O3 -fomit-frame-pointer -m64 -mkernel
_add32carry: ## @add32carry
addl %esi, %edi
cmpl %esi, %edi
sbbl %eax, %eax
andl $1, %eax
addl %edi, %eax
ret
with clang, but to:
_add32carry:
leal (%rsi,%rdi), %eax
cmpl %esi, %eax
adcl $0, %eax
ret
with gcc.
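As a rough source-level illustration of the target form, the idiom can be
restated with the GCC/Clang builtin __builtin_uadd_overflow, whose return value
is the carry out of the add. The names add32carry_builtin and check_add32carry
are hypothetical and only serve this sketch, which assumes a compiler that
provides the builtin:

```c
#include <assert.h>
#include <limits.h>

/* The idiom from the example above: add, then fold the carry back in. */
static unsigned add32carry(unsigned sum, unsigned x) {
  unsigned z = sum + x;
  if (sum + x < x) /* true exactly when the unsigned add wrapped around */
    z++;
  return z;
}

/* Hypothetical equivalent using a GCC/Clang builtin (an assumption of this
   sketch, not something the note prescribes). */
static unsigned add32carry_builtin(unsigned sum, unsigned x) {
  unsigned z;
  unsigned carry = __builtin_uadd_overflow(sum, x, &z); /* 1 on wraparound */
  return z + carry;
}

/* Both forms should agree, including on inputs that produce a carry. */
static void check_add32carry(void) {
  assert(add32carry(1u, 2u) == add32carry_builtin(1u, 2u));
  assert(add32carry(UINT_MAX, 1u) == add32carry_builtin(UINT_MAX, 1u));
  assert(add32carry(UINT_MAX, UINT_MAX) ==
         add32carry_builtin(UINT_MAX, UINT_MAX));
}
```

The second form makes the carry an explicit value, which is roughly what an
add-with-carry intrinsic would expose to the code generator.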
//===---------------------------------------------------------------------===//
Dead argument elimination should be enhanced to handle cases where an argument
is dead in an externally visible function. Though the argument can't be removed
from the externally visible function, the caller doesn't need to pass it in.
For example in this testcase:
void foo(int X) __attribute__((noinline));
void foo(int X) { sideeffect(); }
void bar(int A) { foo(A+1); }
We compile bar to:
define void @bar(i32 %A) nounwind ssp {
  %0 = add nsw i32 %A, 1                          ; <i32> [#uses=1]
  tail call void @foo(i32 %0) nounwind noinline ssp
  ret void
}
The add is dead; we could pass in 'i32 undef' instead. This occurs for C++
templates, etc., which usually have linkonce_odr/weak_odr linkage, not internal
linkage.
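A source-level sketch of the intended effect follows. The real transformation
would happen on LLVM IR and pass 'i32 undef'; C has no undef, so a placeholder
constant stands in for it. The body of sideeffect, the counter, and the names
bar_after and check_dead_arg are hypothetical, added only to make the sketch
self-contained:

```c
#include <assert.h>

static int sideeffect_count = 0;               /* hypothetical observable state */
static void sideeffect(void) { sideeffect_count++; }

void foo(int X) __attribute__((noinline));
/* X is never read, so the argument is dead inside foo. */
void foo(int X) { (void)X; sideeffect(); }

/* Today: the caller still computes A+1 just to pass it along. */
void bar(int A) { foo(A + 1); }

/* Desired: the caller skips the add and passes any value at all
   (0 here is a stand-in for undef). */
void bar_after(int A) { (void)A; foo(0); }

/* The observable behaviour (one side effect per call) is unchanged. */
static void check_dead_arg(void) {
  int before = sideeffect_count;
  bar(5);
  bar_after(5);
  assert(sideeffect_count == before + 2);
}
```

foo keeps its signature, so other externally visible callers are unaffected;
only the call site in bar changes.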
//===---------------------------------------------------------------------===//
With the recent changes to make the implicit def/use set explicit in
machineinstrs, we should change the target descriptions for 'call' instructions
so that the .td files don't list all the call-clobbered registers as implicit
defs. Instead, these should be added by the code generator (e.g. on the dag).
This has a number of uses:
1. PPC32/64 and X86 32/64 can avoid having multiple copies of call instructions
for their different impdef sets.
2. Targets with multiple calling convs (e.g. x86) which have different clobber
sets don't need copies of call instructions.
3. 'Interprocedural register allocation' can be done to reduce the clobber sets
of calls.
//===---------------------------------------------------------------------===//
We should recognize various "overflow detection" idioms and translate them into
llvm.uadd.with.overflow and similar intrinsics. Here is a multiply idiom:
unsigned int mul(unsigned int a, unsigned int b) {
  if ((unsigned long long)a*b > 0xffffffff)
    exit(0);
  return a*b;
}
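For comparison, here is a hedged sketch of the check the idiom performs next to
the GCC/Clang builtin __builtin_umul_overflow, the source-level counterpart of
llvm.umul.with.overflow. The helper names are hypothetical, and the sketch
assumes a 32-bit unsigned int and a compiler that provides the builtin:

```c
#include <assert.h>
#include <stdbool.h>

/* Idiom from the note: widen, multiply, compare against the 32-bit maximum. */
static bool mul_overflows_idiom(unsigned int a, unsigned int b) {
  return (unsigned long long)a * b > 0xffffffffULL;
}

/* Hypothetical rewrite: the builtin returns true on overflow and stores the
   wrapped product through the pointer. */
static bool mul_overflows_builtin(unsigned int a, unsigned int b,
                                  unsigned int *product) {
  return __builtin_umul_overflow(a, b, product);
}

/* The two checks should agree on both sides of the overflow boundary. */
static void check_mul_overflow(void) {
  unsigned int p;
  /* 65535 * 65537 == 0xffffffff: the largest product that still fits. */
  assert(!mul_overflows_idiom(65535u, 65537u));
  assert(!mul_overflows_builtin(65535u, 65537u, &p));
  assert(p == 0xffffffffu);
  /* 65536 * 65536 == 2^32: one past the largest representable value. */
  assert(mul_overflows_idiom(65536u, 65536u));
  assert(mul_overflows_builtin(65536u, 65536u, &p));
}
```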
//===---------------------------------------------------------------------===//
Get the C front-end to expand hypot(x,y) -> llvm.sqrt(x*x+y*y) when errno and
precision don't matter (-ffast-math). Misc/mandel will like this. :) This isn't
safe in general, even on darwin. See the libm implementation of hypot for
examples (which special-case when x/y are exactly zero to get signed zeros etc.
right).
//===---------------------------------------------------------------------===//
Solve this DAG isel folding deficiency:
int X, Y;
void fn1(void)
{
  X = X | (Y << 3);
}
compiles to
fn1:
movl Y, %eax
shll $3, %eax
orl X, %eax
movl %eax, X
ret
The problem is the store's chain operand is not the load X but rather
a TokenFactor of the load X and load Y, which prevents the folding.
There are two ways to fix this:
1. The dag combiner can start using alias analysis to realize that X and Y
don't alias, making the store to X not dependent on the load from Y.
2. The generated isel could be made smarter in the case it can't
disambiguate the pointers.
Number 1 is the preferred solution.
This has been "fixed" by a TableGen hack. But that is a short-term workaround
which will be removed once the proper fix is made.
//===---------------------------------------------------------------------===//
On targets with expensive 64-bit multiply, we could LSR this:
for (i = ...; ++i) {
  x = 1ULL << i;
into:
long long tmp = 1;
for (i = ...; ++i, tmp+=tmp)
  x = tmp;
This would be a win on ppc32, but not x86 or ppc64.
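A runnable sketch of the two loop shapes, under the assumption that i runs from
0 to 63 (the note leaves the bounds elided). The names LSR_N, fill_shift,
fill_add and check_lsr_forms are hypothetical, and the accumulator is unsigned
here so that the final doubling wraps instead of overflowing a signed value:

```c
#include <assert.h>

#define LSR_N 64 /* assumed trip count, one entry per bit of a 64-bit value */

/* Original form: one variable 64-bit shift per iteration. */
static void fill_shift(unsigned long long out[LSR_N]) {
  for (int i = 0; i < LSR_N; ++i)
    out[i] = 1ULL << i;
}

/* Strength-reduced form: keep a running value and double it with an add. */
static void fill_add(unsigned long long out[LSR_N]) {
  unsigned long long tmp = 1;
  for (int i = 0; i < LSR_N; ++i, tmp += tmp)
    out[i] = tmp;
}

/* Both loops should produce the same sequence of powers of two. */
static void check_lsr_forms(void) {
  unsigned long long a[LSR_N], b[LSR_N];
  fill_shift(a);
  fill_add(b);
  for (int i = 0; i < LSR_N; ++i)
    assert(a[i] == b[i]);
}
```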
//===---------------------------------------------------------------------===//
Shrink: (setlt (loadi32 P), 0) -> (setlt (loadi8 Phi), 0)
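The idea is that the sign of a 32-bit value is decided entirely by its most
significant byte, so only that byte needs to be loaded. A minimal sketch in C,
reading 'Phi' as the address of the high byte (an assumption); the function
names are hypothetical, and the byte position is probed at run time so the
sketch does not depend on endianness:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Wide form: load all 32 bits, then compare against zero. */
static int is_negative_wide(const int32_t *p) { return *p < 0; }

/* Narrow form: look only at the most significant byte's top bit. */
static int is_negative_narrow(const int32_t *p) {
  const unsigned char *bytes = (const unsigned char *)p;
  /* Find where the high byte lives: 0x01000000 stores 0x01 in that byte. */
  const uint32_t probe = 0x01000000u;
  unsigned char probe_bytes[4];
  memcpy(probe_bytes, &probe, sizeof probe_bytes);
  size_t hi = probe_bytes[0] ? 0 : 3; /* big-endian : little-endian */
  return (bytes[hi] & 0x80) != 0;
}

/* The two tests should agree, including near the sign boundary. */
static void check_shrink_setlt(void) {
  const int32_t samples[] = {0, 1, -1, INT32_MAX, INT32_MIN, 0x00800000, -256};
  for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i)
    assert(is_negative_wide(&samples[i]) == is_negative_narrow(&samples[i]));
}
```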
//===---------------------------------------------------------------------===//
Reassociate should turn things like:
int factorial(int X) {
  return X*X*X*X*X*X*X*X;
}
into llvm.powi calls, allowing the code generator to produce balanced
multiplication trees.
First, the intrinsic needs to be extended to support integers, and second the
code generator needs to be enhanced to lower these to multiplication trees.
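To make "balanced multiplication tree" concrete, here is a small sketch
comparing the chain as written with a tree that squares repeatedly. The names
pow8_chain, pow8_tree, powi_u and check_powi are hypothetical, and unsigned
arithmetic is used so that wraparound is well defined:

```c
#include <assert.h>

/* As written in the source: a left-to-right chain of 7 multiplies. */
static unsigned pow8_chain(unsigned x) { return x * x * x * x * x * x * x * x; }

/* Balanced tree: X^8 = ((X^2)^2)^2, only 3 multiplies. */
static unsigned pow8_tree(unsigned x) {
  unsigned x2 = x * x;
  unsigned x4 = x2 * x2;
  return x4 * x4;
}

/* Hypothetical integer powi: exponentiation by repeated squaring, the kind
   of lowering an integer powi could use for a general exponent. */
static unsigned powi_u(unsigned x, unsigned n) {
  unsigned result = 1;
  while (n) {
    if (n & 1)
      result *= x; /* this bit of the exponent contributes the current power */
    x *= x;
    n >>= 1;
  }
  return result;
}

/* All three forms should agree, even when the product wraps. */
static void check_powi(void) {
  assert(pow8_chain(3u) == pow8_tree(3u));
  assert(pow8_chain(1000u) == pow8_tree(1000u));
  assert(powi_u(1000u, 8u) == pow8_tree(1000u));
}
```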
//===---------------------------------------------------------------------===//
Interesting? testcase for add/shift/mul reassoc:
int bar(int x, int y) {
  return x*x*x+y+x*x*x*x*x*y*y*y*y;
}
int foo(int z, int n) {
  return bar(z, n) + bar(2*z, 2*n);
}
This is blocked on not handling X*X*X -> powi(X, 3) (see note above). The issue
is that we end up getting t = 2*X, s = t*t and don't turn this into 4*X*X,
which is the same number of multiplies and is canonical, because the 2*X has
multiple uses. Here's a simple example:
define i32 @test15(i32 %X1) {
  %B = mul i32 %X1, 47   ; X1*47
  %C = mul i32 %B, %B
  ret i32 %C
}
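The same computation in C, next to the form with the constants folded
together (47*47 = 2209). Both use two multiplies. The function names are
hypothetical, and unsigned arithmetic is used so that the two forms agree even
when the product wraps:

```c
#include <assert.h>

/* Shape of @test15: scale first, then square the scaled value. */
static unsigned square_of_scaled(unsigned x) {
  unsigned b = x * 47u; /* b has two uses in the multiply below */
  return b * b;
}

/* Canonical shape: square first, with the constants combined into one. */
static unsigned scaled_square(unsigned x) {
  return 2209u * (x * x); /* 2209 == 47 * 47 */
}

/* The forms should match for small inputs and for inputs that wrap. */
static void check_scaled_square(void) {
  assert(square_of_scaled(3u) == scaled_square(3u));
  assert(square_of_scaled(123456789u) == scaled_square(123456789u));
  assert(square_of_scaled(0xffffffffu) == scaled_square(0xffffffffu));
}
```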
//===---------------------------------------------------------------------===//
Reassociate should handle the example in GCC PR16157:
extern int a0, a1, a2, a3, a4; extern int b0, b1, b2, b3, b4;
void f () {  /* this can be optimized to four additions... */
  b4 = a4 + a3 + a2 + a1 + a0;
  b3 = a3 + a2 + a1 + a0;
  b2 = a2 + a1 + a0;
  b1 = a1 + a0;
}
This requires reassociating to forms of expressions that are already available,
something that reassoc doesn't think about yet.
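A sketch of the four-addition form, in which each sum reuses the one computed
before it. The array-based signatures and the names f_naive, f_reuse and
check_pr16157 are hypothetical (the testcase itself uses globals), and the
inputs are kept small so that signed overflow does not come into play:

```c
#include <assert.h>

/* As written in the testcase: ten additions, nothing shared. b[0] is unused. */
static void f_naive(const int a[5], int b[5]) {
  b[4] = a[4] + a[3] + a[2] + a[1] + a[0];
  b[3] = a[3] + a[2] + a[1] + a[0];
  b[2] = a[2] + a[1] + a[0];
  b[1] = a[1] + a[0];
}

/* Reassociated so each result builds on an already available sum:
   four additions in total. */
static void f_reuse(const int a[5], int b[5]) {
  b[1] = a[1] + a[0];
  b[2] = a[2] + b[1];
  b[3] = a[3] + b[2];
  b[4] = a[4] + b[3];
}

/* Both versions should store the same prefix sums. */
static void check_pr16157(void) {
  const int a[5] = {1, 2, 3, 4, 5};
  int naive[5] = {0}, reuse[5] = {0};
  f_naive(a, naive);
  f_reuse(a, reuse);
  for (int i = 1; i < 5; ++i)
    assert(naive[i] == reuse[i]);
}
```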
//===---------------------------------------------------------------------===//