- Nov 20, 2011
-
-
Chandler Carruth authored
properly account for the *global* probability of the edge being taken. This manifested as a very large number of unconditional branches to blocks being merged against the CFG even though they weren't particularly hot within the CFG. The fix is to check whether the edge being merged is both locally hot relative to other successors for the source block, and globally hot compared to other (unmerged) predecessors of the destination block. This introduces a new crasher on GCC single-source, but it's currently behind a flag, and Ben has offered to work on the reduction. =] llvm-svn: 145010
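The two-part check described above can be sketched roughly as follows. This is a minimal illustration in Python, not the pass's actual code: the function name, the dictionary-based frequency and probability tables, and the 0.8 local threshold are all assumptions made for the example.

```python
def is_edge_worth_merging(src, dst, block_freq, edge_prob, preds, merged,
                          local_threshold=0.8):
    """Return True if the edge src->dst is hot both locally and globally.

    All parameters are hypothetical stand-ins for the real analyses:
      block_freq: dict block -> estimated execution frequency
      edge_prob:  dict (src, dst) -> probability the edge is taken
      preds:      dict block -> list of predecessor blocks
      merged:     set of predecessors of dst that are already merged
    """
    # Locally hot: the edge dominates the other successors of src.
    if edge_prob[(src, dst)] < local_threshold:
        return False

    # Globally hot: no other unmerged predecessor sends more
    # frequency into dst than this edge does.
    this_freq = block_freq[src] * edge_prob[(src, dst)]
    for pred in preds[dst]:
        if pred == src or pred in merged:
            continue
        if block_freq[pred] * edge_prob[(pred, dst)] > this_freq:
            return False
    return True
```

In this model an unconditional branch (probability 1.0) from a cold block is rejected when a hotter, still unmerged predecessor also reaches the destination, which is the situation the commit message describes.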
-
- Nov 19, 2011
-
-
Chandler Carruth authored
formation phase and into the initial walk of the basic blocks. We essentially pre-merge all blocks where unanalyzable fallthrough exists, as we won't be able to update the terminators effectively after any reorderings. This is quite a bit more principled as there may be CFGs where the second half of the unanalyzable pair has some analyzable predecessor that gets placed first. Then it may get placed next, implicitly breaking the unanalyzable branch even though we never even looked at the part that isn't analyzable. I've included a test case that triggers this (thanks Benjamin yet again!), and I'm hoping to synthesize some more general ones as I dig into related issues. Also, to make this new scheme work we have to be able to handle branches into the middle of a chain, so add this check. We always fall back on the incoming ordering. Finally, this starts to really underscore a known limitation of the current implementation -- we don't consider broken predecessors when merging successors. This can cause major missed opportunities, and is something I'm planning on looking at next (modulo more bug reports). llvm-svn: 144994
-
- Nov 18, 2011
-
-
Devang Patel authored
DISubrange supports unsigned lower/upper array bounds, so let's not fake it in the end while emitting DWARF. If a FE needs to encode signed lower/upper array bounds then we need to extend DISubrange or add DISignedSubrange. llvm-svn: 144937
-
- Nov 17, 2011
-
-
Chad Rosier authored
ADDs. MaxOffs is used as a threshold to limit the size of the offset. Tradeoffs being: (1) If we can't materialize the large constant then we'll cause fast-isel to bail. (2) Too large of an offset can't be directly encoded in the ADD, resulting in a MOV+ADD. Generally not a bad thing because otherwise we would have had ADD+ADD, but on Thumb this turns into a MOVS+MOVT+ADD. Working on a fix for that. (3) Conversely, with too low a threshold we'll miss opportunities to coalesce ADDs. rdar://10412592 llvm-svn: 144886
-
Eli Friedman authored
Make sure to replace the chain properly when DAGCombining a LOAD+EXTRACT_VECTOR_ELT into a single LOAD. Fixes PR10747/PR11393. llvm-svn: 144863
-
- Nov 16, 2011
-
-
Chad Rosier authored
target-independent selector or the target-specific selector. llvm-svn: 144833
-
Chad Rosier authored
for a single miss and not all predecessor instructions that get selected by the selection DAG instruction selector. This is still not exact (e.g., it overstates misses when folded/dead instructions are present), but it is a step in the right direction. llvm-svn: 144832
-
-
Evan Cheng authored
llvm-svn: 144804
-
Evan Cheng authored
and code model. This eliminates the need to pass OptLevel flag all over the place and makes it possible for any codegen pass to use this information. llvm-svn: 144788
-
Bob Wilson authored
There may be many invokes that share one landing pad, and the previous code would record the landing pad once for each invoke. Besides the wasted effort, a pair of volatile loads gets inserted every time the landing pad is processed. The rest of the code can get optimized away when a landing pad is processed repeatedly, but the volatile loads remain, resulting in code like:

    LBB35_18:
    Ltmp483:
        ldr r2, [r7, #-72]
        ldr r2, [r7, #-68]
        ldr r2, [r7, #-72]
        ldr r2, [r7, #-68]
        ldr r2, [r7, #-72]
        ldr r2, [r7, #-68]
        ldr r2, [r7, #-72]
        ldr r2, [r7, #-68]
        ldr r2, [r7, #-72]
        ldr r2, [r7, #-68]
        ldr r2, [r7, #-72]
        ldr r2, [r7, #-68]
        ldr r2, [r7, #-72]
        ldr r2, [r7, #-68]
        ldr r2, [r7, #-72]
        ldr r2, [r7, #-68]
        ldr r4, [r7, #-72]
        ldr r2, [r7, #-68]

llvm-svn: 144787
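The idea of recording each shared landing pad only once can be illustrated with a small sketch. This is a simplified Python model under stated assumptions, not the SjLj lowering code itself: the function name and the representation of invokes as (invoke, landing pad) name pairs are hypothetical.

```python
def collect_landing_pads(invokes):
    """Return each distinct landing pad once, in first-seen order.

    invokes: iterable of (invoke_name, landing_pad_name) pairs; this
    representation is an assumption made for the example.
    """
    seen = set()
    pads = []
    for _invoke, pad in invokes:
        if pad in seen:
            # Already recorded via an earlier invoke; skip so the pad
            # is processed (and its loads emitted) only once.
            continue
        seen.add(pad)
        pads.append(pad)
    return pads
```

With three invokes sharing one pad, the pad appears once in the result, so any per-pad work is done once rather than three times.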
-
rdar://problem/10444602
Bob Wilson authored
This same basic code was in the older version of the SjLj exception handling, but it was removed in the recent revisions to that code. It needs to be there. llvm-svn: 144782
-
Evan Cheng authored
llvm-svn: 144776
-
Evan Cheng authored
If the 2addr instruction has other kills, don't move it below any other uses since we don't want to extend other live ranges. llvm-svn: 144772
-
Evan Cheng authored
RescheduleKillAboveMI() must backtrack to before the rescheduled DBG_VALUE instructions. rdar://10451185 llvm-svn: 144771
-
-
Eli Friedman authored
llvm-svn: 144768
-
Eli Friedman authored
Add a couple asserts so it will be easier to debug if we accidentally pass indexed loads/stores to the legalizer. llvm-svn: 144767
-
Owen Anderson authored
llvm-svn: 144747
-
Eric Christopher authored
failure during bootstrap with it turned on. llvm-svn: 144731
-
Chad Rosier authored
    %arrayidx135 = getelementptr inbounds [4 x [4 x [4 x [4 x i32]]]]* %M0, i32 0, i64 0
    %arrayidx136 = getelementptr inbounds [4 x [4 x [4 x i32]]]* %arrayidx135, i32 0, i64 %idxprom134

Prior to this commit, the GEP instruction that defines %arrayidx136 thought that %arrayidx135 was a trivial kill. The GEP that defines %arrayidx135 doesn't generate any code and thus %M0 gets folded into the second GEP. Thus, we need to look through GEPs with all zero indices. rdar://10443319 llvm-svn: 144730
-
- Nov 15, 2011
-
-
Pete Cooper authored
by later instructions. Only done for DEC64m right now. Fixes <rdar://problem/6172640> llvm-svn: 144705
-
Devang Patel authored
llvm-svn: 144696
-
Rafael Espindola authored
has a reference to it. Unfortunately, that doesn't work for codegen passes since we don't get notified of MBB's being deleted (the original BB stays). Use that fact to our advantage and after printing a function, check if any of the IL BBs corresponds to a symbol that was not printed. This fixes pr11202. llvm-svn: 144674
-
Benjamin Kramer authored
llvm-svn: 144648
-
Benjamin Kramer authored
llvm-svn: 144647
-
Jakob Stoklund Olesen authored
A function using any RC alias is enough to enable the ExeDepsFix pass. llvm-svn: 144636
-
Jay Foad authored
llvm-svn: 144635
-
Jay Foad authored
llvm-svn: 144634
-
Evan Cheng authored
Set SeenStore to true to prevent loads from being moved; also eliminates a non-deterministic behavior. llvm-svn: 144628
-
Chandler Carruth authored
block sequence when recovering from unanalyzable control flow constructs, *always* use the function sequence. I'm not sure why I ever went down the path of trying to use the loop sequence, it is fundamentally not the correct sequence to use. We're trying to preserve the incoming layout in the cases of unreasonable control flow, and that is only encoded at the function level. We already have a filter to select *exactly* the sub-set of blocks within the function that we're trying to form into a chain. The resulting code layout is also significantly better because of this. In several places we were ending up with completely unreasonable control flow constructs due to the ordering chosen by the loop structure for its internal storage. This change removes a completely wasteful vector of basic blocks, saving memory allocation in the common case even though it costs us CPU in the fairly rare case of unnatural loops. Finally, it fixes the latest crasher reduced out of GCC's single source. Thanks again to Benjamin Kramer for the reduction, my bugpoint skills failed at it. llvm-svn: 144627
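The approach of walking the function's block order and filtering it down to the blocks of interest can be sketched as below. This is an illustrative Python model only: the function name and the use of a plain list and set for the function layout and the block filter are assumptions, not the pass's data structures.

```python
def blocks_in_function_order(function_blocks, block_filter):
    """Return the filtered blocks in the function's incoming layout order.

    function_blocks: list of blocks in function (layout) order
    block_filter:    set of blocks being formed into a chain,
                     e.g. the blocks of one loop (hypothetical model)
    """
    # No separate per-loop vector is kept; the function order is the
    # only source of ordering, and the filter selects the subset.
    return [bb for bb in function_blocks if bb in block_filter]
```

The cost is a scan of the whole function when recovery is needed, which matches the trade-off the message describes: more CPU in the rare case, no extra vector in the common one.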
-
Jakob Stoklund Olesen authored
Two new TargetInstrInfo hooks let the target tell ExecutionDepsFix about instructions with partial register updates causing unwanted false dependencies. The ExecutionDepsFix pass will break the false dependencies if the updated register was written in the previous N instructions. The small loop added to sse-domains.ll runs twice as fast with dependency-breaking instructions inserted. llvm-svn: 144602
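The "written in the previous N instructions" test can be sketched as a simple distance check. This is a minimal Python model, not the real hooks: the function name, the `last_def` dictionary, and the instruction indices are all hypothetical, and treating a register with no recorded def as safe is an assumption of the example.

```python
def needs_dependency_break(last_def, reg, cur_index, clearance):
    """Decide whether a partial update of reg should be preceded by a
    dependency-breaking instruction.

    last_def:  dict reg -> index of the last instruction that wrote reg
    cur_index: index of the instruction doing the partial update
    clearance: N, how many preceding instructions count as "recent"
    """
    if reg not in last_def:
        # Assumption for this sketch: no known recent write, no break.
        return False
    distance = cur_index - last_def[reg]
    return distance <= clearance
```

For example, with a def at index 10 and N = 2, a partial update at index 12 would trigger a break while one at index 13 would not.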
-
Jakob Stoklund Olesen authored
Keep track of the last instruction to define each register individually instead of per DomainValue. This lets us track more accurately when a register was last written. Also track register ages across basic blocks. When entering a new basic block, use the least stale predecessor def as a worst case estimate for register age. The register age is used to arbitrate between conflicting domains. The most recently defined register wins. llvm-svn: 144601
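The two rules above (take the least stale predecessor def at block entry, and let the most recently defined register win a domain conflict) might be modeled as follows. This is a hedged Python sketch: the function names, and the representation of register age as "instructions since the last def", are assumptions made for illustration.

```python
def entry_ages(pred_exit_ages):
    """Merge register ages at the entry of a basic block.

    pred_exit_ages: list of dicts, one per predecessor, mapping
    reg -> instructions since its last def at the end of that block.
    The smallest age (least stale def) is kept as the worst case.
    """
    merged = {}
    for ages in pred_exit_ages:
        for reg, age in ages.items():
            if reg not in merged or age < merged[reg]:
                merged[reg] = age
    return merged


def pick_domain(candidates):
    """candidates: non-empty list of (age, domain) pairs for the
    conflicting registers; the most recently defined one wins."""
    return min(candidates, key=lambda c: c[0])[1]
```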
-
- Nov 14, 2011
-
-
Evan Cheng authored
llvm-svn: 144569
-
Evan Cheng authored
"kill". This looks like a bug upstream. Since that's going to take some time to understand, loosen the assertion and disable the optimization when multiple kills are seen. llvm-svn: 144568
-
Evan Cheng authored
instructions of the two-address operands) in order to avoid inserting copies. This fixes the few regressions introduced when the two-address hack was disabled (without regressing the improvements). rdar://10422688 llvm-svn: 144559
-
Jakob Stoklund Olesen authored
I broke this in r144515, it affected most ARM testers. <rdar://problem/10441389> llvm-svn: 144547
-
Chandler Carruth authored
cleans up all the chains allocated during the processing of each function so that for very large inputs we don't just grow memory usage without bound. llvm-svn: 144533
-
Chandler Carruth authored
tests when I forcibly enabled block placement. It is apparently possible for an unanalyzable block to fall through to a non-loop block. I don't actually believe this is correct, I believe that 'canFallThrough' is returning true needlessly for the code construct, and I've left a bit of a FIXME on the verification code to try to track down why this is coming up. Anyways, removing the assert doesn't degrade the correctness of the algorithm. llvm-svn: 144532
-
Chandler Carruth authored
this pass. We're leaving already merged blocks on the worklist, and scanning them again and again only to determine each time through that indeed they aren't viable. We can instead remove them once we're going to have to scan the worklist. This is the easy way to implement removing them. If this remains on the profile (as I somewhat suspect it will), we can get a lot more clever here, as the worklist's order is essentially irrelevant. We can use swapping and fold the two loops to reduce overhead even when there are many blocks on the worklist but only a few of them are removed. llvm-svn: 144531
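The swapping idea mentioned at the end relies on the worklist's order being irrelevant. A minimal Python sketch of that swap-and-pop removal is shown below; it is not the commit's implementation (which the message calls "the easy way"), and the function name and the `is_merged` predicate are hypothetical.

```python
def remove_merged(worklist, is_merged):
    """Remove already-merged blocks from worklist in place.

    Order is not preserved: each removed entry is overwritten by the
    last entry and the list is shortened, so removal is O(1) per block.
    """
    i = 0
    while i < len(worklist):
        if is_merged(worklist[i]):
            worklist[i] = worklist[-1]
            worklist.pop()
            # Do not advance: the swapped-in entry still needs a check.
        else:
            i += 1
```

Because only the removed slots are touched, the cost stays low even when the worklist is long and few blocks are removed.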
-