Newer
Older
//===---------------------------------------------------------------------===//
Dead argument elimination should be enhanced to handle cases when an argument is
dead to an externally visible function. Though the argument can't be removed
from the externally visible function, the caller doesn't need to pass it in.
For example in this testcase:
void foo(int X) __attribute__((noinline));
void foo(int X) { sideeffect(); }
void bar(int A) { foo(A+1); }
We compile bar to:
define void @bar(i32 %A) nounwind ssp {
%0 = add nsw i32 %A, 1 ; <i32> [#uses=1]
tail call void @foo(i32 %0) nounwind noinline ssp
ret void
}
The add is dead, we could pass in 'i32 undef' instead. This occurs for C++
templates etc, which usually have linkonce_odr/weak_odr linkage, not internal
linkage.
//===---------------------------------------------------------------------===//
With the recent changes to make the implicit def/use set explicit in
machineinstrs, we should change the target descriptions for 'call' instructions
so that the .td files don't list all the call-clobbered registers as implicit
defs. Instead, these should be added by the code generator (e.g. on the dag).
This has a number of uses:
1. PPC32/64 and X86 32/64 can avoid having multiple copies of call instructions
for their different impdef sets.
2. Targets with multiple calling convs (e.g. x86) which have different clobber
sets don't need copies of call instructions.
3. 'Interprocedural register allocation' can be done to reduce the clobber sets
of calls.
//===---------------------------------------------------------------------===//
Make the PPC branch selector target independant
//===---------------------------------------------------------------------===//
Get the C front-end to expand hypot(x,y) -> llvm.sqrt(x*x+y*y) when errno and
precision don't matter (ffastmath). Misc/mandel will like this. :) This isn't
safe in general, even on darwin. See the libm implementation of hypot for
examples (which special case when x/y are exactly zero to get signed zeros etc
right).
//===---------------------------------------------------------------------===//
Solve this DAG isel folding deficiency:
int X, Y;
void fn1(void)
{
X = X | (Y << 3);
}
compiles to
fn1:
movl Y, %eax
shll $3, %eax
orl X, %eax
movl %eax, X
ret
The problem is the store's chain operand is not the load X but rather
a TokenFactor of the load X and load Y, which prevents the folding.
There are two ways to fix this:
1. The dag combiner can start using alias analysis to realize that y/x
don't alias, making the store to X not dependent on the load from Y.
2. The generated isel could be made smarter in the case it can't
disambiguate the pointers.
Number 1 is the preferred solution.
This has been "fixed" by a TableGen hack. But that is a short term workaround
which will be removed once the proper fix is made.
//===---------------------------------------------------------------------===//
On targets with expensive 64-bit multiply, we could LSR this:
for (i = ...; ++i) {
x = 1ULL << i;
into:
long long tmp = 1;
for (i = ...; ++i, tmp+=tmp)
x = tmp;
This would be a win on ppc32, but not x86 or ppc64.
//===---------------------------------------------------------------------===//
Shrink: (setlt (loadi32 P), 0) -> (setlt (loadi8 Phi), 0)
//===---------------------------------------------------------------------===//
Reassociate should turn things like:
int factorial(int X) {
return X*X*X*X*X*X*X*X;
}
into llvm.powi calls, allowing the code generator to produce balanced
multiplication trees.
First, the intrinsic needs to be extended to support integers, and second the
code generator needs to be enhanced to lower these to multiplication trees.
//===---------------------------------------------------------------------===//
Interesting? testcase for add/shift/mul reassoc:
int bar(int x, int y) {
return x*x*x+y+x*x*x*x*x*y*y*y*y;
}
int foo(int z, int n) {
return bar(z, n) + bar(2*z, 2*n);
}
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
This is blocked on not handling X*X*X -> powi(X, 3) (see note above). The issue
is that we end up getting t = 2*X s = t*t and don't turn this into 4*X*X,
which is the same number of multiplies and is canonical, because the 2*X has
multiple uses. Here's a simple example:
define i32 @test15(i32 %X1) {
%B = mul i32 %X1, 47 ; X1*47
%C = mul i32 %B, %B
ret i32 %C
}
//===---------------------------------------------------------------------===//
Reassociate should handle the example in GCC PR16157:
extern int a0, a1, a2, a3, a4; extern int b0, b1, b2, b3, b4;
void f () { /* this can be optimized to four additions... */
b4 = a4 + a3 + a2 + a1 + a0;
b3 = a3 + a2 + a1 + a0;
b2 = a2 + a1 + a0;
b1 = a1 + a0;
}
This requires reassociating to forms of expressions that are already available,
something that reassoc doesn't think about yet.
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
//===---------------------------------------------------------------------===//
This function: (derived from GCC PR19988)
double foo(double x, double y) {
return ((x + 0.1234 * y) * (x + -0.1234 * y));
}
compiles to:
_foo:
movapd %xmm1, %xmm2
mulsd LCPI1_1(%rip), %xmm1
mulsd LCPI1_0(%rip), %xmm2
addsd %xmm0, %xmm1
addsd %xmm0, %xmm2
movapd %xmm1, %xmm0
mulsd %xmm2, %xmm0
ret
Instcombine should be able to turn it into:
double foo(double x, double y) {
return ((x + 0.1234 * y) * (x - 0.1234 * y));
}
Which allows the multiply by constant to be CSE'd, producing:
_foo:
mulsd LCPI1_0(%rip), %xmm1
movapd %xmm1, %xmm2
addsd %xmm0, %xmm2
subsd %xmm1, %xmm0
mulsd %xmm2, %xmm0
ret
This doesn't need -ffast-math support at all. This is particularly bad because
the llvm-gcc frontend is canonicalizing the later into the former, but clang
doesn't have this problem.
//===---------------------------------------------------------------------===//
These two functions should generate the same code on big-endian systems:
Loading
Loading full blame...