- Jan 16, 2011
-
-
Chris Lattner authored
llvm-svn: 123573
-
Chris Lattner authored
then don't try to decimate it into its individual pieces. This will just make a mess of the IR and is pointless if none of the elements are individually accessed. This was generating really terrible code for std::bitset (PR8980) because it happens to be lowered by clang as an {[8 x i8]} structure instead of {i64}.

The testcase now is optimized to:

  define i64 @test2(i64 %X) {
    br label %L2

  L2:                                               ; preds = %0
    ret i64 %X
  }

before we generated:

  define i64 @test2(i64 %X) {
    %sroa.store.elt = lshr i64 %X, 56
    %1 = trunc i64 %sroa.store.elt to i8
    %sroa.store.elt8 = lshr i64 %X, 48
    %2 = trunc i64 %sroa.store.elt8 to i8
    %sroa.store.elt9 = lshr i64 %X, 40
    %3 = trunc i64 %sroa.store.elt9 to i8
    %sroa.store.elt10 = lshr i64 %X, 32
    %4 = trunc i64 %sroa.store.elt10 to i8
    %sroa.store.elt11 = lshr i64 %X, 24
    %5 = trunc i64 %sroa.store.elt11 to i8
    %sroa.store.elt12 = lshr i64 %X, 16
    %6 = trunc i64 %sroa.store.elt12 to i8
    %sroa.store.elt13 = lshr i64 %X, 8
    %7 = trunc i64 %sroa.store.elt13 to i8
    %8 = trunc i64 %X to i8
    br label %L2

  L2:                                               ; preds = %0
    %9 = zext i8 %1 to i64
    %10 = shl i64 %9, 56
    %11 = zext i8 %2 to i64
    %12 = shl i64 %11, 48
    %13 = or i64 %12, %10
    %14 = zext i8 %3 to i64
    %15 = shl i64 %14, 40
    %16 = or i64 %15, %13
    %17 = zext i8 %4 to i64
    %18 = shl i64 %17, 32
    %19 = or i64 %18, %16
    %20 = zext i8 %5 to i64
    %21 = shl i64 %20, 24
    %22 = or i64 %21, %19
    %23 = zext i8 %6 to i64
    %24 = shl i64 %23, 16
    %25 = or i64 %24, %22
    %26 = zext i8 %7 to i64
    %27 = shl i64 %26, 8
    %28 = or i64 %27, %25
    %29 = zext i8 %8 to i64
    %30 = or i64 %29, %28
    ret i64 %30
  }

In this case, instcombine was able to eliminate the nonsense, but in PR8980 enough PHIs are in play that instcombine backs off. It's better to not generate this stuff in the first place.

llvm-svn: 123571
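To see why the longer IR in this commit message is pure overhead, here is a minimal C++ sketch of the same computation: split a 64-bit value into eight bytes with shift-and-truncate, then rebuild it with extend, shift, and or. The function names `splitBytes` and `joinBytes` are hypothetical and are not part of LLVM; the sketch only mirrors the arithmetic in the IR.

```cpp
#include <array>
#include <cstdint>

// Mirrors the lshr/trunc sequence: element 0 holds the most significant
// byte (like %1 in the IR), element 7 the least significant (like %8).
std::array<std::uint8_t, 8> splitBytes(std::uint64_t x) {
  std::array<std::uint8_t, 8> bytes{};
  for (int i = 0; i < 8; ++i)
    bytes[i] = static_cast<std::uint8_t>(x >> (56 - 8 * i));
  return bytes;
}

// Mirrors the zext/shl/or chain that reassembles the i64.
std::uint64_t joinBytes(const std::array<std::uint8_t, 8> &bytes) {
  std::uint64_t x = 0;
  for (int i = 0; i < 8; ++i)
    x |= static_cast<std::uint64_t>(bytes[i]) << (56 - 8 * i);
  return x;
}
```

For any input, `joinBytes(splitBytes(x))` gives back `x`, which is why the whole sequence can collapse to `ret i64 %X` when no byte is accessed individually.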
-
Chris Lattner authored
of a constant. llvm-svn: 123570
-
Chris Lattner authored
multiple uses. In some cases, all the uses are the same operation, so instcombine can go ahead and promote the phi. In the testcase this pushes an add out of the loop. llvm-svn: 123568
-
- Jan 15, 2011
-
-
Chris Lattner authored
realize that ConstantFoldTerminator doesn't preserve dominfo. llvm-svn: 123527
-
rdar://8785296
Chris Lattner authored
The basic issue is that isel (very reasonably!) expects conditional branches to be folded, so CGP leaving around a bunch of dead computation feeding conditional branches isn't such a good idea. Just fold branches on constants into unconditional branches. llvm-svn: 123526
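The folding described here can be illustrated with a toy model. This is a sketch under stated assumptions, not the CGP code: the `Branch` struct and `foldConstantBranch` function are hypothetical and do not use any LLVM API.

```cpp
#include <optional>
#include <string>

// Hypothetical toy terminator: a conditional branch whose condition may be
// a known constant, or an unconditional branch to trueTarget.
struct Branch {
  bool conditional = false;
  std::optional<bool> constantCond;  // set only when the condition is a constant
  std::string trueTarget;
  std::string falseTarget;
};

// If the branch is conditional on a constant, rewrite it in place as an
// unconditional branch to the taken successor. Returns true on change.
bool foldConstantBranch(Branch &br) {
  if (!br.conditional || !br.constantCond)
    return false;
  std::string taken = *br.constantCond ? br.trueTarget : br.falseTarget;
  br.conditional = false;
  br.constantCond.reset();
  br.trueTarget = taken;
  br.falseTarget.clear();
  return true;
}
```

Once the branch no longer uses its condition, the computation that fed it is dead and can be removed, so isel never sees it.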
-
Chris Lattner authored
llvm-svn: 123525
-
Chris Lattner authored
have objectsize folding recursively simplify away their result when it folds. It is important to catch this here, because otherwise we won't eliminate the cross-block values at isel and other times. llvm-svn: 123524
-
Chris Lattner authored
potentially invalidate it (like inline asm lowering) to be sunk into their proper place, cleaning up a ton of code. llvm-svn: 123523
-
Chris Lattner authored
to use it. llvm-svn: 123501
-
- Jan 14, 2011
-
-
Chris Lattner authored
llvm-svn: 123457
-
Chris Lattner authored
and one that uses SSAUpdater (-scalarrepl-ssa) llvm-svn: 123436
-
Chris Lattner authored
instead of DomTree/DomFrontier. This may be interesting for reducing compile time. This is currently disabled, but seems to work just fine. When this is enabled, we eliminate two runs of dominator frontier, one in the "early per-function" optimizations and one in the "interlaced with inliner" function passes. llvm-svn: 123434
-
- Jan 13, 2011
-
-
Bob Wilson authored
llvm-svn: 123396
-
Bob Wilson authored
llvm-svn: 123383
-
Bob Wilson authored
This is a minor extension of SROA to handle a special case that is important for some ARM NEON operations. Some of the NEON intrinsics return multiple values, which are handled as struct types containing multiple elements of the same vector type. The corresponding return types declared in the arm_neon.h header have equivalent arrays. We need SROA to recognize that it can split up those arrays and structs into separate vectors, even though they are not always accessed with the same type. SROA already handles loads and stores of an entire alloca by using insertvalue/extractvalue to access the individual pieces, and that code works the same regardless of whether the type is a struct or an array. So, all that needs to be done is to check for compatible arrays and homogeneous structs. llvm-svn: 123381
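The struct/array equivalence this relies on can be sketched in plain C++. The types below are hypothetical stand-ins (`Vec4` for a vector type, `VecPair` for a multi-value intrinsic result); they are not the real arm_neon.h declarations, and the checks hold only under the assumption, true on common ABIs, that the struct has no padding.

```cpp
#include <cstddef>
#include <cstring>

struct Vec4 { float lanes[4]; };            // hypothetical stand-in for a vector type
struct VecPair { Vec4 val0; Vec4 val1; };   // homogeneous struct: all elements share one type
using VecArray = Vec4[2];                   // the equivalent array form

// The two forms are interchangeable only if the layouts line up.
static_assert(sizeof(VecPair) == sizeof(VecArray), "struct and array must have the same size");
static_assert(offsetof(VecPair, val1) == sizeof(Vec4), "second element must sit at the same offset");

// Store the value as a struct, then read one element back through the array view.
Vec4 secondElementViaArray(const VecPair &p) {
  VecArray arr;
  std::memcpy(arr, &p, sizeof(arr));
  return arr[1];
}
```

Because each element sits at the same offset in both forms, either one can be split into the same separate vectors.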
-
Bob Wilson authored
SROA only splits up structs and arrays one level at a time, so padding can only cause trouble if it is located in between the struct or array elements. llvm-svn: 123380
-
- Jan 12, 2011
-
-
Devang Patel authored
llvm-svn: 123318
-
Chris Lattner authored
llvm-svn: 123302
-
Chris Lattner authored
of the bootstrap miscompare issue. llvm-svn: 123299
-
Chris Lattner authored
the source of the bootstrap problem. llvm-svn: 123298
-
- Jan 11, 2011
-
-
Jakob Stoklund Olesen authored
llvm-svn: 123288
-
Cameron Zwarich authored
once at the beginning of GVN instead of once per iteration. llvm-svn: 123278
-
Cameron Zwarich authored
llvm-svn: 123270
-
Chris Lattner authored
actually reached in the testcase in PR8954, but it's safe and good practice. llvm-svn: 123224
-
Chris Lattner authored
phi nodes. It is called from MergeBlockIntoPredecessor which is called from GVN, which claims to preserve these. I'm skeptical that this is the actual problem behind PR8954, but this is a stab in the right direction. llvm-svn: 123222
-
Chris Lattner authored
llvm-svn: 123221
-
Chris Lattner authored
necessarily an uncond branch to the header. This fixes PR8955 (the assertion tripping). llvm-svn: 123219
-
- Jan 10, 2011
-
-
Chris Lattner authored
llvm-svn: 123149
-
Chris Lattner authored
back to life. llvm-svn: 123146
-
Chris Lattner authored
llvm-svn: 123144
-
- Jan 09, 2011
-
-
Chris Lattner authored
without informing memdep. This could cause nondeterministic weirdness based on where instructions happen to get allocated, and will hopefully breathe some life into some broken testers. llvm-svn: 123124
-
Cameron Zwarich authored
llvm-svn: 123117
-
Chris Lattner authored
that have the bit set. llvm-svn: 123104
-
- Jan 08, 2011
-
-
Chris Lattner authored
updating memdep when fusing stores together. This fixes the crash optimizing the bullet benchmark. llvm-svn: 123091
-
Chris Lattner authored
llvm-svn: 123090
-
Chris Lattner authored
larger memsets. Among other things, this fixes rdar://8760394 and allows us to handle "Example 2" from http://blog.regehr.org/archives/320, compiling it into a single 4096-byte memset:

  _mad_synth_mute:                        ## @mad_synth_mute
  ## BB#0:                                ## %entry
          pushq   %rax
          movl    $4096, %esi             ## imm = 0x1000
          callq   ___bzero
          popq    %rax
          ret

llvm-svn: 123089
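The idea of combining neighbouring memsets into one larger memset can be sketched as interval merging. This is a toy model, not the pass's actual logic: `MemsetRange` and `mergeMemsets` are hypothetical names, offsets are assumed to be relative to a single base pointer, and the input is assumed to contain no overlapping ranges with different fill bytes (sorting would otherwise change which write wins).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one memset: fills [start, end) with value.
struct MemsetRange {
  std::size_t start;
  std::size_t end;
  std::uint8_t value;
};

// Merge ranges that touch or overlap and use the same fill byte.
std::vector<MemsetRange> mergeMemsets(std::vector<MemsetRange> ranges) {
  std::sort(ranges.begin(), ranges.end(),
            [](const MemsetRange &a, const MemsetRange &b) { return a.start < b.start; });
  std::vector<MemsetRange> merged;
  for (const MemsetRange &r : ranges) {
    if (!merged.empty() && merged.back().value == r.value && r.start <= merged.back().end)
      merged.back().end = std::max(merged.back().end, r.end);  // extend the previous range
    else
      merged.push_back(r);                                     // gap or different byte: new range
  }
  return merged;
}
```

In this model, two zero-fills covering [0, 1024) and [1024, 4096) collapse into one 4096-byte range, while ranges separated by a gap stay separate.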
-
Chris Lattner authored
P and P+1 are relative to the same base pointer. llvm-svn: 123087
-
Chris Lattner authored
memset into a single larger memset. llvm-svn: 123086
-
Chris Lattner authored
Split memset formation logic out into its own "tryMergingIntoMemset" helper function. llvm-svn: 123081
-