- Mar 27, 2012
-
-
Eric Christopher authored
Fixes PR10105 llvm-svn: 153524
-
Chad Rosier authored
undefined behavior, which Rafael was kind enough to fix. Original commit message for r153423: Use the new range metadata in computeMaskedBits and add a new optimization to instruction simplify that lets us remove an and when loading a boolean value. llvm-svn: 153521
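A minimal standalone sketch of the known-bits reasoning this optimization relies on (toy helpers invented here, not LLVM's actual computeMaskedBits code): range metadata of [0, 2) on a load proves every bit above bit 0 is zero, so a following and with 1 is redundant.

```cpp
#include <cassert>
#include <cstdint>

// For a value known to lie in [0, Hi), return a mask of bits that are
// provably zero (a toy stand-in for what known-bits analysis derives
// from !range metadata).
uint64_t knownZeroMask(uint64_t Hi) {
  assert(Hi > 0 && "empty range");
  uint64_t MaxVal = Hi - 1, Low = 0;
  while (MaxVal) { Low = (Low << 1) | 1; MaxVal >>= 1; }
  return ~Low;
}

// The instruction-simplify view: (v & 1) == v whenever all bits except
// bit 0 are known zero, as for a boolean load with range [0, 2).
bool andWithOneIsRedundant(uint64_t RangeHi) {
  return (knownZeroMask(RangeHi) | 1) == ~0ull;
}

int main() {
  assert(andWithOneIsRedundant(2));   // boolean-valued load
  assert(!andWithOneIsRedundant(4));  // bit 1 may be set; keep the and
}
```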
-
Jakob Stoklund Olesen authored
This pass tries to update kill flags, but there are still many bugs. Passes after the load/store optimizer don't need accurate liveness, so don't even try. <rdar://problem/11101911> llvm-svn: 153519
-
Jakob Stoklund Olesen authored
llvm-svn: 153518
-
Jakob Stoklund Olesen authored
Branch folding can use a register scavenger to update liveness information when required. Don't do that if liveness information is already invalid. llvm-svn: 153517
-
Jakob Stoklund Olesen authored
llvm-svn: 153516
-
Chris Lattner authored
llvm-svn: 153513
-
Jakob Stoklund Olesen authored
Late optimization passes like branch folding and tail duplication can transform the machine code in a way that makes it expensive to keep the register liveness information up to date. There is a fuzzy line between register allocation and late scheduling where the liveness information degrades. The MRI::tracksLiveness() flag makes the line clear: While true, liveness information is accurate, and can be used for register scavenging. Once the flag is false, liveness information is not accurate, and can only be used as a hint. Late passes generally don't need the liveness information, but they will sometimes use the register scavenger to help update it. The scavenger enforces strict correctness, and we spend a lot of code updating register liveness information that may never be used. llvm-svn: 153511
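A standalone sketch of the contract described here (toy types invented for illustration, not LLVM's actual MachineRegisterInfo): liveness data may only be trusted, and the scavenger only run, while the flag is still set.

```cpp
#include <cassert>
#include <set>

// Toy stand-in for the MRI::tracksLiveness() contract.
struct RegInfoModel {
  std::set<unsigned> LiveIns;   // simplified liveness data
  bool TracksLiveness = true;   // mirrors MRI::tracksLiveness()

  // Accurate queries (e.g. for register scavenging) are only legal
  // while liveness is still being kept up to date.
  bool isLiveIn(unsigned Reg) const {
    assert(TracksLiveness && "liveness is stale; cannot scavenge");
    return LiveIns.count(Reg) != 0;
  }

  // A late pass (load/store optimizer, branch folding, ...) calls this
  // once it stops maintaining the information.
  void invalidateLiveness() { TracksLiveness = false; }
};

int main() {
  RegInfoModel MRI;
  MRI.LiveIns.insert(5);
  assert(MRI.isLiveIn(5));   // fine: liveness still accurate
  MRI.invalidateLiveness();  // from here on, only usable as a hint
}
```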
-
Chandler Carruth authored
size bloat. Unfortunately, I expect this to disable the majority of the benefit from r152737. I'm hopeful at least that it will fix PR12345. To explain this requires... quite a bit of backstory I'm afraid. TL;DR: The change in r152737 actually did The Wrong Thing for linkonce-odr functions. This change makes it do the right thing. The benefits we saw were simple luck, not any actual strategy. Benchmark numbers after a mini-blog-post so that I've written down my thoughts on why all of this works and doesn't work...

To understand what's going on here, you have to understand how the "bottom-up" inliner actually works. There are two fundamental modes to the inliner:

1) Standard fixed-cost bottom-up inlining. This is the mode we usually think about. It walks from the bottom of the CFG up to the top, looking at callsites, taking information about the callsite and the called function and computing the expected cost of inlining into that callsite. If the cost is under a fixed threshold, it inlines. It's a touch more complicated than that due to all the bonuses, weights, etc. Inlining the last callsite to an internal function gets higher weight, etc. But essentially, this is the mode of operation.

2) Deferred bottom-up inlining (a term I just made up). This is the interesting mode for this patch and r152737. Initially, this works just like mode #1, but once we have the cost of inlining into the callsite, we don't just compare it with a fixed threshold. First, we check something else. Let's give some names to the entities at this point, or we'll end up hopelessly confused. We're considering inlining a function 'A' into its callsite within a function 'B'. We want to check whether 'B' has any callers, and whether it might be inlined into those callers. If so, we also check whether inlining 'A' into 'B' would block any of the opportunities for inlining 'B' into its callers. We take the sum of the costs of inlining 'B' into its callers where that inlining would be blocked by inlining 'A' into 'B', and if that cost is less than the cost of inlining 'A' into 'B', then we skip inlining 'A' into 'B'.

Now, in order for #2 to make sense, we have to have some confidence that we will actually have the opportunity to inline 'B' into its callers when cheaper, *and* that we'll be able to revisit the decision and inline 'A' into 'B' if that ever becomes the correct tradeoff. This often isn't true for external functions -- we can see very few of their callers, and we won't be able to re-consider inlining 'A' into 'B' if 'B' is external when we finally see more callers of 'B'. There are two cases where we believe this to be true for C/C++ code: functions local to a translation unit, and functions with an inline definition in every translation unit which uses them. These are represented as internal linkage and linkonce-odr (resp.) in LLVM. I enabled this logic for linkonce-odr in r152737.

Unfortunately, when I did that, I also introduced a subtle bug. There was an implicit assumption that the last caller of the function within the TU was the last caller of the function in the program. We want to bonus the last caller of the function in the program by a huge amount for inlining because inlining that callsite has very little cost. Unfortunately, the last caller in the TU of a linkonce-odr function is *not* the last caller in the program, and so we don't want to apply this bonus. If we do, we can apply it to one callsite *per-TU*. Because of the way deferred inlining works, when it sees this bonus applied to one callsite in the TU for 'B', it decides that inlining 'B' is of the *utmost* importance just so we can get that final bonus. It then proceeds to essentially force deferred inlining regardless of the actual cost tradeoff. The result? PR12345: code bloat, code bloat, code bloat. Another result is getting *damn* lucky on a few benchmarks, and the over-inlining exposing critically important optimizations.

I would very much like a list of benchmarks that regress after this change goes in, with bitcode before and after. This will help me greatly understand what opportunities the current cost analysis is missing. Initial benchmark numbers look very good. WebKit files that exhibited the worst of PR12345 went from growing to shrinking compared to Clang with r152737 reverted.

- Bootstrapped Clang is 3% smaller with this change.
- Bootstrapped Clang -O0 over a single-source-file of lib/Lex is 4% faster with this change.

Please let me know about any other performance impact you see. Thanks to Nico for reporting and urging me to actually fix it, Richard Smith, Duncan Sands, Manuel Klimek, and Benjamin Kramer for talking through the issues today. llvm-svn: 153506
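A standalone sketch of the mode #2 decision described above, with toy cost numbers and invented names rather than the real InlineCost machinery: inlining 'A' into 'B' is skipped when the outer inlinings of 'B' that it would block are collectively cheaper.

```cpp
#include <cassert>
#include <vector>

// One caller of 'B': the cost of inlining 'B' there, and whether that
// inlining would be blocked by first inlining 'A' into 'B'.
struct CallerOfB {
  int CostOfInliningB;
  bool BlockedByAIntoB;
};

// Deferred bottom-up inlining, as described above: defer (skip) the
// inner inlining when the blocked outer inlinings are cheaper in sum.
bool shouldDeferInliningAIntoB(int CostAIntoB,
                               const std::vector<CallerOfB> &Callers) {
  int BlockedCost = 0;
  for (const CallerOfB &C : Callers)
    if (C.BlockedByAIntoB)
      BlockedCost += C.CostOfInliningB;
  return BlockedCost < CostAIntoB;
}

int main() {
  // Two blocked callers of B with total cost 60, versus a cost of 100
  // for inlining A into B: defer. At cost 40, inline A into B instead.
  std::vector<CallerOfB> Callers = {{30, true}, {30, true}, {50, false}};
  assert(shouldDeferInliningAIntoB(100, Callers));
  assert(!shouldDeferInliningAIntoB(40, Callers));
}
```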
-
Craig Topper authored
llvm-svn: 153502
-
Craig Topper authored
llvm-svn: 153500
-
Akira Hatanaka authored
MachinePointerInfo when getStore is called to create a node that stores an argument passed in a register to the stack. Without this change, the post-RA scheduler will fail to discover the dependencies between the store instructions and the instructions that load from a structure passed by value. The link to the related discussion is here: http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-March/048055.html llvm-svn: 153499
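A loose standalone model of the failure mode the message describes (toy types invented here, not the real scheduler or MachinePointerInfo): the dependence between a store and a later load is only discovered when both carry pointer info identifying the same fixed stack slot, so an argument-spill store with no info leaves the byval load unordered.

```cpp
#include <cassert>
#include <optional>

// Toy stand-in for MachinePointerInfo: either "no information" or a
// known fixed stack slot.
struct MemRef {
  std::optional<int> FixedStackSlot;
};

// Returns true when a store->load dependence edge is discovered.
bool dependenceFound(const MemRef &Store, const MemRef &Load) {
  return Store.FixedStackSlot && Load.FixedStackSlot &&
         *Store.FixedStackSlot == *Load.FixedStackSlot;
}

int main() {
  MemRef UntaggedStore{std::nullopt}, TaggedStore{4}, ByvalLoad{4};
  assert(!dependenceFound(UntaggedStore, ByvalLoad)); // the bug
  assert(dependenceFound(TaggedStore, ByvalLoad));    // after the fix
}
```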
-
Akira Hatanaka authored
llvm-svn: 153498
-
Akira Hatanaka authored
llvm-svn: 153497
-
Akira Hatanaka authored
set it in MipsMCCodeEmitter::getMachineOpValue. Assert in getMachineOpValue if MachineOperand MO is of an unexpected type. llvm-svn: 153494
-
Akira Hatanaka authored
offset applied to it. llvm-svn: 153493
-
Evan Cheng authored
register that's read by the preheader terminator. rdar://11095580 llvm-svn: 153492
-
Akira Hatanaka authored
cleared. No functionality change. llvm-svn: 153491
-
Lang Hames authored
copies being considered for removal. Make sure to track all of the copies, rather than just the most recent encountered, by holding a DenseSet instead of an unsigned in SrcMap. No test case - couldn't reduce something with a sane size. llvm-svn: 153487
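A minimal sketch of the data-structure change described here, with std containers standing in for LLVM's DenseMap/DenseSet and an opaque stand-in for MachineInstr:

```cpp
#include <cassert>
#include <map>
#include <set>

struct CopyInstr {};  // opaque stand-in for a machine copy instruction

// Before the fix, SrcMap mapped each source register to only the most
// recently seen copy, so earlier copies escaped invalidation:
//   std::map<unsigned, CopyInstr *> SrcMap;
// After: every copy from a given source register is tracked.
std::map<unsigned, std::set<CopyInstr *>> SrcMap;

void recordCopy(unsigned SrcReg, CopyInstr *MI) {
  SrcMap[SrcReg].insert(MI);  // keep every copy, not just the latest
}

int main() {
  CopyInstr A, B;
  recordCopy(7, &A);
  recordCopy(7, &B);              // the earlier copy is not dropped
  assert(SrcMap[7].size() == 2);
}
```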
-
Akira Hatanaka authored
llvm-svn: 153486
-
Evan Cheng authored
produces a 32-bit immediate which is consumed by the use. It tries to fold the immediate by breaking it into two parts and folding them into the immediate fields of two uses, e.g.

movw r2, #40885
movt r3, #46540
add  r0, r0, r3
=>
add.w r0, r0, #3019898880
add.w r0, r0, #30146560

However, this transformation is incorrect if the user produces a flag, e.g.

movw r2, #40885
movt r3, #46540
adds r0, r0, r3
=>
add.w  r0, r0, #3019898880
adds.w r0, r0, #30146560

Note the adds.w may not set the carry flag even if the original sequence would. rdar://11116189 llvm-svn: 153484
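A standalone sketch of the guarded transform, with helpers invented for illustration (the encodability test is simplified; the real ARM backend checks the full Thumb2 immediate forms): split the constant into two encodable halves, but refuse outright when the user sets flags.

```cpp
#include <cassert>
#include <cstdint>

// Toy test for an ARM-style "modified immediate": an 8-bit value
// rotated right by an even amount (simplified; Thumb2 allows more).
bool isModifiedImm(uint32_t Imm) {
  for (unsigned Rot = 0; Rot < 32; Rot += 2) {
    uint32_t V = Rot ? ((Imm << Rot) | (Imm >> (32 - Rot))) : Imm;
    if (V <= 0xFFu)
      return true;
  }
  return false;
}

// Try to express Imm as Hi + Lo with both halves encodable. The key
// point of this commit: never fold when the user sets flags, since the
// two-instruction form may compute carry differently.
bool splitForFold(uint32_t Imm, bool UserSetsFlags, uint32_t &Hi,
                  uint32_t &Lo) {
  if (UserSetsFlags)
    return false;  // e.g. adds: keep the movw/movt sequence
  unsigned Top = 31;
  while (Top > 0 && !(Imm & (1u << Top)))
    --Top;
  uint32_t Mask = Top >= 7 ? 0xFFu << (Top - 7) : 0xFFu;
  Hi = Imm & Mask;
  Lo = Imm & ~Mask;
  return Lo != 0 && isModifiedImm(Hi) && isModifiedImm(Lo);
}

int main() {
  uint32_t Hi, Lo;
  // 0xB5CC0000 is the combined movw/movt constant from the example.
  assert(splitForFold(0xB5CC0000u, false, Hi, Lo) && Hi + Lo == 0xB5CC0000u);
  assert(!splitForFold(0xB5CC0000u, true, Hi, Lo)); // flag user: no fold
}
```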
-
Lang Hames authored
llvm-svn: 153483
-
Andrew Trick authored
Fixes PR11882: NULL dereference in ComputeLoadConstantCompareExitLimit. llvm-svn: 153480
-
- Mar 26, 2012
-
-
Eric Christopher authored
backtrace locations. Testcase forthcoming, but I wanted to get some testing here. Should fix: PR12323 PR12314 rdar://11091100 llvm-svn: 153471
-
Nadav Rotem authored
153465 was incorrect. In this code we wanted to check that the pointer operand is of pointer type (and not vector type). llvm-svn: 153468
-
Sean Callanan authored
relocations. The algorithm is the same as that for x86_64. Scattered relocations, a feature present in i386 but not on x86_64, are not yet supported. llvm-svn: 153466
-
Nadav Rotem authored
llvm-svn: 153465
-
Andrew Trick authored
Fixes PR11950. llvm-svn: 153463
-
Andrew Trick authored
llvm-svn: 153462
-
Chris Lattner authored
llvm-svn: 153458
-
Eric Christopher authored
llvm-svn: 153456
-
Eric Christopher authored
llvm-svn: 153455
-
Chad Rosier authored
Original commit message: Use the new range metadata in computeMaskedBits and add a new optimization to instruction simplify that lets us remove an and when loading a boolean value. llvm-svn: 153452
-
Andrew Trick authored
Thanks Andrey. llvm-svn: 153451
-
Kostya Serebryany authored
[tsan] treat vtable pointer updates in a special way (requires tbaa); fix a bug (forgot to return true after instrumenting); make sure the tsan tests are run llvm-svn: 153448
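A rough standalone model of the special case described here, with logging stubs standing in for the tsan runtime's entry points (compiler-rt provides hooks along the lines of __tsan_vptr_update; the routing shown is the idea, not the actual instrumentation code): a store that TBAA identifies as a vtable-pointer update goes to a dedicated callback instead of the plain write hook.

```cpp
#include <cstdio>

// Stubs standing in for the tsan runtime hooks; they only log here.
static void tsanWrite(void *Addr) {
  std::printf("plain write at %p\n", Addr);
}
static void tsanVptrUpdate(void **VPtr, void *New) {
  std::printf("vptr update at %p <- %p\n", (void *)VPtr, New);
}

// The instrumentation choice: a store marked (via TBAA) as a vtable
// pointer update is routed to the dedicated hook so the runtime can
// treat races on construction/destruction specially.
void instrumentStore(void **Addr, void *Val, bool IsVtableAccess) {
  if (IsVtableAccess)
    tsanVptrUpdate(Addr, Val);
  else
    tsanWrite((void *)Addr);
  *Addr = Val;  // the original store
}

int main() {
  void *Slot = nullptr;
  int Obj = 0;
  instrumentStore(&Slot, &Obj, /*IsVtableAccess=*/true);
  instrumentStore(&Slot, &Obj, /*IsVtableAccess=*/false);
}
```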
-
Benjamin Kramer authored
llvm-svn: 153438
-
Douglas Gregor authored
llvm-svn: 153436
-
Anton Korobeynikov authored
Patch by Sylvestre Ledru! llvm-svn: 153435
-
Craig Topper authored
llvm-svn: 153429
-
Eric Christopher authored
llvm-svn: 153428
-