- Jan 24, 2011
-
-
Chris Lattner authored
occurs because instcombine sinks loads and inserts phis. This kicks in on such apps as 175.vpr, eon, 403.gcc, xalancbmk and a bunch of times in spec2006 in some app that uses std::deque. This resolves the last of rdar://7339113. llvm-svn: 124090
-
- Jan 23, 2011
-
-
Chris Lattner authored
common cases. This triggers a surprising number of times in SPEC2K6 because min/max idioms end up doing this. For example, code from the STL ends up looking like this to SRoA:

  %202 = load i64* %__old_size, align 8, !tbaa !3
  %203 = load i64* %__old_size, align 8, !tbaa !3
  %204 = load i64* %__n, align 8, !tbaa !3
  %205 = icmp ult i64 %203, %204
  %storemerge.i = select i1 %205, i64* %__n, i64* %__old_size
  %206 = load i64* %storemerge.i, align 8, !tbaa !3

We can now promote both the __n and the __old_size allocas.

This addresses another chunk of rdar://7339113, poor codegen on stringswitch.

llvm-svn: 124088
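A minimal C++ sketch of the kind of source that can produce this shape. The function name `growBy` is hypothetical and not taken from the STL; it only assumes that `std::max` returns a reference to one of its arguments, so unoptimized code first selects between two stack addresses and then loads through the result.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for container growth logic; not actual STL code.
constexpr std::uint64_t growBy(std::uint64_t old_size, std::uint64_t n) {
  // std::max yields a reference to either old_size or n. Before promotion,
  // that is a select between the two stack slots followed by one load.
  const std::uint64_t &larger = std::max(old_size, n);
  return old_size + larger;
}
```

Once both slots are promoted, the select operates on the two values directly and no memory access remains.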
-
Chris Lattner authored
that have PHI or select uses of their element pointers. This can often happen when instcombine sinks two loads into a successor, inserting a phi or select. With this patch, we can scalarize the alloca, but the pinned elements are not yet promoted. This is still a win for large aggregates where only one element is used. This fixes rdar://8904039 and part of rdar://7339113 (poor codegen on stringswitch). llvm-svn: 124070
-
Chris Lattner authored
No functionality change. llvm-svn: 124067
-
Chris Lattner authored
handle the "Transformation preventing inst" printing, so that -scalarrepl -debug will always print the rejected instruction. No functionality change. llvm-svn: 124066
-
Chris Lattner authored
X86 backend has been fixed. llvm-svn: 124064
-
- Jan 18, 2011
-
-
Cameron Zwarich authored
llvm-svn: 123724
-
- Jan 17, 2011
-
-
Cameron Zwarich authored
checks enabled: 1) Use '<' to compare integers in a comparison function rather than '<='. 2) Use the uniqued set DefBlocks rather than Info.DefiningBlocks to initialize the priority queue. The speedup of scalarrepl on test-suite + SPEC2000 + SPEC2006 is a bit less, at just under 16% rather than 17%. llvm-svn: 123662
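The first point can be sketched as follows. The names `LevelEntry`, `CompareByLevel`, and `popDeepestFirst` are hypothetical and are not the identifiers used in the patch; the sketch only assumes a priority queue keyed on an integer level. Standard containers require a strict ordering in which no element compares less than itself, and `<=` breaks that requirement.

```cpp
#include <queue>
#include <utility>
#include <vector>

// Hypothetical work item: (block id, dominator tree level).
using LevelEntry = std::pair<unsigned, unsigned>;

struct CompareByLevel {
  // Strict '<': an entry is never "less than" itself. Writing '<=' here
  // would make comp(a, a) true, which checked library builds reject.
  constexpr bool operator()(const LevelEntry &lhs,
                            const LevelEntry &rhs) const {
    return lhs.second < rhs.second;
  }
};

// Pops entries with the deepest level first and returns their block ids.
inline std::vector<unsigned>
popDeepestFirst(const std::vector<LevelEntry> &entries) {
  std::priority_queue<LevelEntry, std::vector<LevelEntry>, CompareByLevel>
      queue(CompareByLevel{}, entries);
  std::vector<unsigned> order;
  while (!queue.empty()) {
    order.push_back(queue.top().first);
    queue.pop();
  }
  return order;
}
```

Entries that share a level may pop in any relative order, so the sketch makes no promise about ties.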
-
Cameron Zwarich authored
llvm-svn: 123618
-
Cameron Zwarich authored
eliminating a potentially quadratic data structure, this also gives a 17% speedup when running -scalarrepl on test-suite + SPEC2000 + SPEC2006. My initial experiment gave a greater speedup around 25%, but I moved the dominator tree level computation from dominator tree construction to PromoteMemToReg. Since this approach to computing IDFs has a much lower overhead than the old code using precomputed DFs, it is worth looking at using this new code for the second scalarrepl pass as well. llvm-svn: 123609
-
- Jan 16, 2011
-
-
Chris Lattner authored
llvm-svn: 123590
-
Chris Lattner authored
then don't try to decimate it into its individual pieces. This will just make a mess of the IR and is pointless if none of the elements are individually accessed. This was generating really terrible code for std::bitset (PR8980) because it happens to be lowered by clang as an {[8 x i8]} structure instead of {i64}.

The testcase now is optimized to:

define i64 @test2(i64 %X) {
  br label %L2

L2:                                               ; preds = %0
  ret i64 %X
}

before we generated:

define i64 @test2(i64 %X) {
  %sroa.store.elt = lshr i64 %X, 56
  %1 = trunc i64 %sroa.store.elt to i8
  %sroa.store.elt8 = lshr i64 %X, 48
  %2 = trunc i64 %sroa.store.elt8 to i8
  %sroa.store.elt9 = lshr i64 %X, 40
  %3 = trunc i64 %sroa.store.elt9 to i8
  %sroa.store.elt10 = lshr i64 %X, 32
  %4 = trunc i64 %sroa.store.elt10 to i8
  %sroa.store.elt11 = lshr i64 %X, 24
  %5 = trunc i64 %sroa.store.elt11 to i8
  %sroa.store.elt12 = lshr i64 %X, 16
  %6 = trunc i64 %sroa.store.elt12 to i8
  %sroa.store.elt13 = lshr i64 %X, 8
  %7 = trunc i64 %sroa.store.elt13 to i8
  %8 = trunc i64 %X to i8
  br label %L2

L2:                                               ; preds = %0
  %9 = zext i8 %1 to i64
  %10 = shl i64 %9, 56
  %11 = zext i8 %2 to i64
  %12 = shl i64 %11, 48
  %13 = or i64 %12, %10
  %14 = zext i8 %3 to i64
  %15 = shl i64 %14, 40
  %16 = or i64 %15, %13
  %17 = zext i8 %4 to i64
  %18 = shl i64 %17, 32
  %19 = or i64 %18, %16
  %20 = zext i8 %5 to i64
  %21 = shl i64 %20, 24
  %22 = or i64 %21, %19
  %23 = zext i8 %6 to i64
  %24 = shl i64 %23, 16
  %25 = or i64 %24, %22
  %26 = zext i8 %7 to i64
  %27 = shl i64 %26, 8
  %28 = or i64 %27, %25
  %29 = zext i8 %8 to i64
  %30 = or i64 %29, %28
  ret i64 %30
}

In this case, instcombine was able to eliminate the nonsense, but in PR8980 enough PHIs are in play that instcombine backs off. It's better to not generate this stuff in the first place.

llvm-svn: 123571
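The old output amounts to an identity computed the long way. A small C++ sketch of the same arithmetic, assuming nothing beyond what the IR shows (the function name `splitAndRebuild` is made up for illustration):

```cpp
#include <cstdint>

// Mirrors the old IR: split a 64-bit value into eight bytes, then rebuild
// it by widening, shifting, and or-ing the pieces back together.
constexpr std::uint64_t splitAndRebuild(std::uint64_t x) {
  std::uint8_t bytes[8] = {};
  for (int i = 0; i < 8; ++i)
    bytes[i] = static_cast<std::uint8_t>(x >> (8 * i)); // lshr + trunc
  std::uint64_t result = 0;
  for (int i = 0; i < 8; ++i)
    result |= static_cast<std::uint64_t>(bytes[i]) << (8 * i); // zext + shl + or
  return result;
}
```

Since the result always equals the input, leaving the aggregate whole lets the function return %X directly.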
-
Chris Lattner authored
of a constant. llvm-svn: 123570
-
Chris Lattner authored
multiple uses. In some cases, all the uses are the same operation, so instcombine can go ahead and promote the phi. In the testcase this pushes an add out of the loop. llvm-svn: 123568
-
- Jan 15, 2011
-
-
Chris Lattner authored
to use it. llvm-svn: 123501
-
- Jan 14, 2011
-
-
Chris Lattner authored
llvm-svn: 123457
-
Chris Lattner authored
and one that uses SSAUpdater (-scalarrepl-ssa) llvm-svn: 123436
-
Chris Lattner authored
instead of DomTree/DomFrontier. This may be interesting for reducing compile time. This is currently disabled, but seems to work just fine. When this is enabled, we eliminate two runs of dominator frontier, one in the "early per-function" optimizations and one in the "interlaced with inliner" function passes. llvm-svn: 123434
-
- Jan 13, 2011
-
-
Bob Wilson authored
llvm-svn: 123396
-
Bob Wilson authored
llvm-svn: 123383
-
Bob Wilson authored
This is a minor extension of SROA to handle a special case that is important for some ARM NEON operations. Some of the NEON intrinsics return multiple values, which are handled as struct types containing multiple elements of the same vector type. The corresponding return types declared in the arm_neon.h header have equivalent arrays. We need SROA to recognize that it can split up those arrays and structs into separate vectors, even though they are not always accessed with the same type. SROA already handles loads and stores of an entire alloca by using insertvalue/extractvalue to access the individual pieces, and that code works the same regardless of whether the type is a struct or an array. So, all that needs to be done is to check for compatible arrays and homogeneous structs. llvm-svn: 123381
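To illustrate why the two forms are interchangeable, here is a rough C++ sketch. `Vec4`, `PairAsStruct`, and `PairAsArray` are hypothetical stand-ins, not the real NEON types, and the layout check is an assumption that holds on mainstream ABIs rather than a guarantee of the language.

```cpp
#include <cstddef>

// Hypothetical stand-in for a 128-bit vector type.
struct Vec4 {
  float lanes[4];
};

// Multi-value result modeled as a homogeneous struct.
struct PairAsStruct {
  Vec4 first;
  Vec4 second;
};

// The same two vectors modeled as an array.
struct PairAsArray {
  Vec4 val[2];
};

// True when both forms place the second vector at the same offset and
// occupy the same number of bytes.
constexpr bool sameLayout() {
  return sizeof(PairAsStruct) == sizeof(PairAsArray) &&
         offsetof(PairAsStruct, second) == sizeof(Vec4);
}
```

When the layouts agree like this, splitting either form yields the same two vector-sized pieces.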
-
Bob Wilson authored
SROA only splits up structs and arrays one level at a time, so padding can only cause trouble if it is located in between the struct or array elements. llvm-svn: 123380
-
- Jan 02, 2011
-
-
Chris Lattner authored
so that Dominators.h is *just* domtree. Also prune #includes a bit. llvm-svn: 122714
-
- Dec 26, 2010
-
-
Chris Lattner authored
llvm-svn: 122572
-
- Dec 23, 2010
-
-
Mon P Wang authored
llvm-svn: 122462
-
- Dec 15, 2010
-
-
Dan Gohman authored
function so that it can live in Analysis instead of VMCore. llvm-svn: 121885
-
- Nov 24, 2010
-
-
Nick Lewycky authored
whether the pointer can be replaced with the global variable it is a copy of. Fixes PR8680. llvm-svn: 120126
-
- Nov 20, 2010
-
-
Benjamin Kramer authored
llvm-svn: 119908
-
- Nov 18, 2010
-
-
Chris Lattner authored
llvm-svn: 119690
-
Chris Lattner authored
if it is passed as a byval argument. The byval argument will just be a read, so it is safe to read from the original global instead. This allows us to promote away the %agg.tmp alloca in PR8582 llvm-svn: 119686
-
Chris Lattner authored
to ignore calls that obviously can't modify the alloca because they are readonly/readnone. llvm-svn: 119683
-
Chris Lattner authored
optimization. If the alloca that is "memcpy'd from constant" also has a memcpy from *it*, ignore it: it is a load.

We now optimize the testcase to:

define void @test2() {
  %B = alloca %T
  %a = bitcast %T* @G to i8*
  %b = bitcast %T* %B to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %b, i8* %a, i64 124, i32 4, i1 false)
  call void @bar(i8* %b)
  ret void
}

previously we would generate:

define void @test() {
  %B = alloca %T
  %b = bitcast %T* %B to i8*
  %G.0 = getelementptr inbounds %T* @G, i32 0, i32 0
  %tmp3 = load i8* %G.0, align 4
  %G.1 = getelementptr inbounds %T* @G, i32 0, i32 1
  %G.15 = bitcast [123 x i8]* %G.1 to i8*
  %1 = bitcast [123 x i8]* %G.1 to i984*
  %srcval = load i984* %1, align 1
  %B.0 = getelementptr inbounds %T* %B, i32 0, i32 0
  store i8 %tmp3, i8* %B.0, align 4
  %B.1 = getelementptr inbounds %T* %B, i32 0, i32 1
  %B.12 = bitcast [123 x i8]* %B.1 to i8*
  %2 = bitcast [123 x i8]* %B.1 to i984*
  store i984 %srcval, i984* %2, align 1
  call void @bar(i8* %b)
  ret void
}

llvm-svn: 119682
-
- Oct 19, 2010
-
-
Owen Anderson authored
Get rid of static constructors for pass registration. Instead, every pass exposes an initializeMyPassFunction(), which must be called in the pass's constructor. This function uses static dependency declarations to recursively initialize the pass's dependencies. Clients that only create passes through the createFooPass() APIs will require no changes. Clients that want to use the CommandLine options for passes will need to manually call the appropriate initialization functions in PassInitialization.h before parsing commandline arguments. I have tested this with all standard configurations of clang and llvm-gcc on Darwin. It is possible that there are problems with the static dependencies that will only be visible with non-standard options. If you encounter any crash in pass registration/creation, please send the testcase to me directly. llvm-svn: 116820
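A rough sketch of the registration scheme described above. Everything here is hypothetical: `registeredPasses`, `initializeDomTreePass`, `initializeMyPass`, and `MyPass` are illustrative names, not LLVM's registry or API. The sketch assumes only that each initializer runs once and initializes its dependencies first.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical registry, created on first use rather than by a static
// constructor at program start.
inline std::vector<std::string> &registeredPasses() {
  static std::vector<std::string> names;
  return names;
}

// Registers a dependency pass exactly once.
inline void initializeDomTreePass() {
  static const bool done = [] {
    registeredPasses().push_back("domtree");
    return true;
  }();
  (void)done;
}

// Registers this pass exactly once, initializing its dependency first.
inline void initializeMyPass() {
  static const bool done = [] {
    initializeDomTreePass();
    registeredPasses().push_back("mypass");
    return true;
  }();
  (void)done;
}

struct MyPass {
  // The constructor triggers registration, so clients that only create
  // passes need no extra calls.
  MyPass() {
    initializeMyPass();
    assert(!registeredPasses().empty());
  }
};
```

Constructing `MyPass` several times leaves the registry with one entry per pass, dependencies first.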
-
- Oct 16, 2010
-
-
Benjamin Kramer authored
llvm-svn: 116670
-
- Oct 12, 2010
-
-
Owen Anderson authored
perform initialization without static constructors AND without explicit initialization by the client. For the moment, passes are required to initialize both their (potential) dependencies and any passes they preserve. I hope to be able to relax the latter requirement in the future. llvm-svn: 116334
-
- Oct 08, 2010
-
-
Owen Anderson authored
llvm-svn: 115996
-
- Oct 01, 2010
-
-
Dale Johannesen authored
The x86_mmx type is used for MMX intrinsics, parameters and return values where these use MMX registers, and is also supported in load, store, and bitcast. Only the above operations generate MMX instructions, and optimizations do not operate on or produce MMX intrinsics. MMX-sized vectors <2 x i32> etc. are lowered to XMM or split into smaller pieces. Optimizations may occur on these forms and the result cast back to x86_mmx, provided the result feeds into a previously existing x86_mmx operation. The point of all this is to prevent optimizations from introducing MMX operations, which is unsafe due to the EMMS problem. llvm-svn: 115243
-
- Sep 02, 2010
-
-
Chris Lattner authored
llvm-svn: 112763
-
- Sep 01, 2010
-
-
Chris Lattner authored
on llvmdev: SRoA is introducing MMX datatypes like <1 x i64>, which then cause random problems because the X86 backend is producing mmx stuff without inserting proper emms calls. In the short term, force off MMX datatypes. In the long term, the X86 backend should not select generic vector types to MMX registers. This is being worked on, but won't be done in time for 2.8. rdar://8380055 llvm-svn: 112696
-
- Aug 18, 2010
-
-
Chris Lattner authored
llvm-svn: 111342
-