  1. Mar 27, 2012
    • Eric Christopher's avatar
      Use DW_AT_low_pc for a single entry point into a routine. · 7ed2efca
      Eric Christopher authored
      Fixes PR10105
      
      llvm-svn: 153524
      7ed2efca
    • Chad Rosier's avatar
      Reapply r153423; the original commit was fine. The failing test, distray, had · 8e6dbccd
      Chad Rosier authored
      undefined behavior, which Rafael was kind enough to fix.
      
      Original commit message for r153423:
      Use the new range metadata in computeMaskedBits and add a new optimization to
      instruction simplify that lets us remove an and when loading a boolean value.
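      As a rough illustration (not the actual LLVM code), the simplification this
      message describes can be modeled as follows: a load annotated with range
      metadata only produces values in that half-open interval, so an and whose
      mask already covers every possible value is a no-op. The function name and
      interface below are hypothetical.

      ```python
      def and_is_redundant(lo, hi, mask):
          # A load carrying !range [lo, hi) can only produce values in that
          # half-open interval. If every such value v already satisfies
          # v & mask == v, the 'and' changes nothing and can be removed --
          # the simplification this commit adds.
          return all((v & mask) == v for v in range(lo, hi))

      # Boolean load: !range {0, 2} means the value is 0 or 1,
      # so 'and %v, 1' is redundant.
      print(and_is_redundant(0, 2, 1))   # True
      print(and_is_redundant(0, 4, 1))   # False: the value 2 would change
      ```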
      
      llvm-svn: 153521
      8e6dbccd
    • Jakob Stoklund Olesen's avatar
      ARMLoadStoreOptimizer invalidates register liveness. · 4acbcb31
      Jakob Stoklund Olesen authored
      This pass tries to update kill flags, but there are still many bugs.
      Passes after the load/store optimizer don't need accurate liveness, so
      don't even try.
      
      <rdar://problem/11101911>
      
      llvm-svn: 153519
      4acbcb31
    • Jakob Stoklund Olesen's avatar
      Print SSA and liveness tracking flags in MF::print(). · 6c08534a
      Jakob Stoklund Olesen authored
      llvm-svn: 153518
      6c08534a
    • Jakob Stoklund Olesen's avatar
      Branch folding may invalidate liveness. · d1664a15
      Jakob Stoklund Olesen authored
      Branch folding can use a register scavenger to update liveness
      information when required. Don't do that if liveness information is
      already invalid.
      
      llvm-svn: 153517
      d1664a15
    • Jakob Stoklund Olesen's avatar
      Invalidate liveness in Thumb2ITBlockPass. · 14459cdc
      Jakob Stoklund Olesen authored
      llvm-svn: 153516
      14459cdc
    • Chris Lattner's avatar
      1cc25e8a
    • Jakob Stoklund Olesen's avatar
      Add an MRI::tracksLiveness() flag. · 9c1ad5cb
      Jakob Stoklund Olesen authored
      Late optimization passes like branch folding and tail duplication can
      transform the machine code in a way that makes it expensive to keep the
      register liveness information up to date. There is a fuzzy line between
      register allocation and late scheduling where the liveness information
      degrades.
      
      The MRI::tracksLiveness() flag makes the line clear: While true,
      liveness information is accurate, and can be used for register
      scavenging. Once the flag is false, liveness information is not
      accurate, and can only be used as a hint.
      
      Late passes generally don't need the liveness information, but they will
      sometimes use the register scavenger to help update it. The scavenger
      enforces strict correctness, and we have to spend a lot of code to
      update register liveness that may never be used.
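      The contract described above can be sketched as a toy model (this is not
      LLVM's C++ class; names and the exception are illustrative only):

      ```python
      class MachineRegisterInfo:
          """Toy model of the tracksLiveness() contract: while the flag is
          true, liveness information is accurate and may drive the register
          scavenger; once cleared, it is only a hint."""

          def __init__(self):
              self._tracks_liveness = True

          def tracks_liveness(self):
              return self._tracks_liveness

          def invalidate_liveness(self):
              # Called by late passes (branch folding, tail duplication, ...)
              # that stop keeping liveness up to date.
              self._tracks_liveness = False

      def run_scavenger(mri):
          # The scavenger enforces strict correctness, so it must refuse
          # to run on stale liveness information.
          if not mri.tracks_liveness():
              raise RuntimeError("liveness is stale; scavenging is unsafe")
          return "scavenged"

      mri = MachineRegisterInfo()
      print(run_scavenger(mri))   # scavenged
      mri.invalidate_liveness()   # e.g. after branch folding
      ```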
      
      llvm-svn: 153511
      9c1ad5cb
    • Chandler Carruth's avatar
      Make a seemingly tiny change to the inliner and fix the generated code · b9e35fbc
      Chandler Carruth authored
      size bloat. Unfortunately, I expect this to disable the majority of the
      benefit from r152737. I'm hopeful at least that it will fix PR12345. To
      explain this requires... quite a bit of backstory I'm afraid.
      
      TL;DR: The change in r152737 actually did The Wrong Thing for
      linkonce-odr functions. This change makes it do the right thing. The
      benefits we saw were simple luck, not any actual strategy. Benchmark
      numbers after a mini-blog-post so that I've written down my thoughts on
      why all of this works and doesn't work...
      
      To understand what's going on here, you have to understand how the
      "bottom-up" inliner actually works. There are two fundamental modes to
      the inliner:
      
      1) Standard fixed-cost bottom-up inlining. This is the mode we usually
         think about. It walks from the bottom of the CFG up to the top,
         looking at callsites, taking information about the callsite and the
         called function and computing the expected cost of inlining into that
         callsite. If the cost is under a fixed threshold, it inlines. It's
         a touch more complicated than that due to all the bonuses, weights,
         etc. Inlining the last callsite to an internal function gets higher
         weight, etc. But essentially, this is the mode of operation.
      
      2) Deferred bottom-up inlining (a term I just made up). This is the
         interesting mode for this patch and r152737. Initially, this works
         just like mode #1, but once we have the cost of inlining into the
         callsite, we don't just compare it with a fixed threshold. First, we
         check something else. Let's give some names to the entities at this
         point, or we'll end up hopelessly confused. We're considering
         inlining a function 'A' into its callsite within a function 'B'. We
         want to check whether 'B' has any callers, and whether it might be
         inlined into those callers. If so, we also check whether inlining 'A'
         into 'B' would block any of the opportunities for inlining 'B' into
         its callers. We take the sum of the costs of inlining 'B' into its
         callers where that inlining would be blocked by inlining 'A' into
         'B', and if that cost is less than the cost of inlining 'A' into 'B',
         then we skip inlining 'A' into 'B'.
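      The decision rule in mode #2 can be sketched like this (the function name
      and the example costs are hypothetical, not taken from the inliner):

      ```python
      def defer_inlining(cost_a_into_b, blocked_b_into_caller_costs):
          # Deferred bottom-up inlining rule as described above: sum the
          # costs of inlining B into each caller where that inlining would
          # be blocked by first inlining A into B. If that total is cheaper
          # than inlining A into B, skip (defer) the A-into-B inlining.
          return sum(blocked_b_into_caller_costs) < cost_a_into_b

      print(defer_inlining(100, [30, 40]))  # True: B-into-callers is cheaper
      print(defer_inlining(50, [30, 40]))   # False: inline A into B now
      ```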
      
      Now, in order for #2 to make sense, we have to have some confidence that
      we will actually have the opportunity to inline 'B' into its callers
      when cheaper, *and* that we'll be able to revisit the decision and
      inline 'A' into 'B' if that ever becomes the correct tradeoff. This
      often isn't true for external functions -- we can see very few of their
      callers, and we won't be able to re-consider inlining 'A' into 'B' if
      'B' is external when we finally see more callers of 'B'. There are two
      cases where we believe this to be true for C/C++ code: functions local
      to a translation unit, and functions with an inline definition in every
      translation unit which uses them. These are represented as internal
      linkage and linkonce-odr (resp.) in LLVM. I enabled this logic for
      linkonce-odr in r152737.
      
      Unfortunately, when I did that, I also introduced a subtle bug. There
      was an implicit assumption that the last caller of the function within
      the TU was the last caller of the function in the program. We want to
      bonus the last caller of the function in the program by a huge amount
      for inlining because inlining that callsite has very little cost.
      Unfortunately, the last caller in the TU of a linkonce-odr function is
      *not* the last caller in the program, and so we don't want to apply this
      bonus. If we do, we can apply it to one callsite *per-TU*. Because of
      the way deferred inlining works, when it sees this bonus applied to one
      callsite in the TU for 'B', it decides that inlining 'B' is of the
      *utmost* importance just so we can get that final bonus. It then
      proceeds to essentially force deferred inlining regardless of the actual
      cost tradeoff.
      
      The result? PR12345: code bloat, code bloat, code bloat. Another result
      is getting *damn* lucky on a few benchmarks, and the over-inlining
      exposing critically important optimizations. I would very much like
      a list of benchmarks that regress after this change goes in, with
      bitcode before and after. This will help me greatly understand what
      opportunities the current cost analysis is missing.
      
      Initial benchmark numbers look very good. WebKit files that exhibited
      the worst of PR12345 went from growing to shrinking compared to Clang
      with r152737 reverted.
      
      - Bootstrapped Clang is 3% smaller with this change.
      - Bootstrapped Clang -O0 over a single-source-file of lib/Lex is 4%
        faster with this change.
      
      Please let me know about any other performance impact you see. Thanks to
      Nico for reporting and urging me to actually fix this, and to Richard Smith,
      Sands, Manuel Klimek, and Benjamin Kramer for talking through the issues
      today.
      
      llvm-svn: 153506
      b9e35fbc
    • Craig Topper's avatar
      Prune some includes · 1fcf5bca
      Craig Topper authored
      llvm-svn: 153502
      1fcf5bca
    • Craig Topper's avatar
      Remove unnecessary llvm:: qualifications · f6e7e12f
      Craig Topper authored
      llvm-svn: 153500
      f6e7e12f
    • Akira Hatanaka's avatar
      Pass the llvm IR pointer value and offset to the constructor of · 8a7633c7
      Akira Hatanaka authored
      MachinePointerInfo when getStore is called to create a node that stores an
      argument passed in a register to the stack. Without this change, the post-RA
      scheduler will fail to discover the dependencies between the store
      instructions and the instructions that load from a structure passed by value.
      
      The link to the related discussion is here:
      http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-March/048055.html
      
      llvm-svn: 153499
      8a7633c7
    • Akira Hatanaka's avatar
      Fix bug in LowerConstantPool. · 769f69f9
      Akira Hatanaka authored
      llvm-svn: 153498
      769f69f9
    • Akira Hatanaka's avatar
      2a36c9f4
    • Akira Hatanaka's avatar
      Retrieve and add the offset of a symbol in applyFixup rather than retrieve and · fe384a2c
      Akira Hatanaka authored
      set it in MipsMCCodeEmitter::getMachineOpValue. Assert in getMachineOpValue if
      MachineOperand MO is of an unexpected type. 
      
      llvm-svn: 153494
      fe384a2c
    • Akira Hatanaka's avatar
      Define function MipsGetSymAndOffset which returns a fixup's symbol and the · a06bc1c6
      Akira Hatanaka authored
      offset applied to it.
      
      llvm-svn: 153493
      a06bc1c6
    • Evan Cheng's avatar
      Post-ra LICM should take care not to hoist an instruction that would clobber a · 7fede873
      Evan Cheng authored
      register that's read by the preheader terminator.
      
      rdar://11095580
      
      llvm-svn: 153492
      7fede873
    • Akira Hatanaka's avatar
      Rewrite computation of Value in adjustFixupValue so that the upper 48-bits are · da728197
      Akira Hatanaka authored
      cleared. No functionality change.
      
      llvm-svn: 153491
      da728197
    • Lang Hames's avatar
      During MachineCopyPropagation a register may be the source operand of multiple · 551662bf
      Lang Hames authored
      copies being considered for removal. Make sure to track all of the copies,
      rather than just the most recent encountered, by holding a DenseSet instead of
      an unsigned in SrcMap.
      
      No test case - couldn't reduce something with a sane size.
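      A toy model of the fix (Python dict-of-sets standing in for the DenseSet
      in SrcMap; register names and IDs are made up):

      ```python
      from collections import defaultdict

      # SrcMap maps a source register to the set of candidate copies that
      # read it. The bug was keeping only a single entry (an unsigned),
      # which silently dropped all but the most recently seen copy from
      # the same source register.
      src_map = defaultdict(set)

      def record_copy(copy_id, src_reg):
          src_map[src_reg].add(copy_id)

      record_copy(1, "r0")
      record_copy(2, "r0")            # a second copy from the same source
      print(src_map["r0"])            # both copies tracked, not just #2
      ```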
      
      llvm-svn: 153487
      551662bf
    • Akira Hatanaka's avatar
      Reserve hardware registers. · ba5100c1
      Akira Hatanaka authored
      llvm-svn: 153486
      ba5100c1
    • Evan Cheng's avatar
      ARM has a peephole optimization which looks for a def / use pair. The def · a2b48d98
      Evan Cheng authored
      produces a 32-bit immediate which is consumed by the use. It tries to
      fold the immediate by breaking it into two parts and folding them into the
      immediate fields of two uses, e.g.
             movw    r2, #40885
             movt    r3, #46540
             add     r0, r0, r3
      =>
             add.w   r0, r0, #3019898880
             add.w   r0, r0, #30146560
      However, this transformation is incorrect if the user produces a flag, e.g.
             movw    r2, #40885
             movt    r3, #46540
             adds    r0, r0, r3
      =>
             add.w   r0, r0, #3019898880
             adds.w  r0, r0, #30146560
      Note the adds.w may not set the carry flag even if the original sequence
      would.
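      The split can be sketched roughly as follows. The encodability test below
      is a simplified stand-in for the real Thumb2 modified-immediate check
      (ThumbExpandImm), and the greedy split may pick different halves than the
      compiler does; both are illustrative assumptions, not LLVM's code.

      ```python
      def fits_8bit_window(imm):
          # Simplified encodability test: the value fits in one contiguous
          # 8-bit window somewhere in the 32-bit word (a stand-in for the
          # real Thumb2 modified-immediate rules).
          imm &= 0xFFFFFFFF
          if imm == 0:
              return True
          while imm & 1 == 0:
              imm >>= 1
          return imm < 256

      def split_imm(imm):
          # Greedy split: peel the top 8 significant bits into one chunk,
          # leave the remainder in the other, and succeed only if both
          # halves are encodable on their own.
          shift = max(imm.bit_length() - 8, 0)
          hi = imm & (0xFF << shift)
          lo = imm - hi
          if fits_8bit_window(hi) and fits_8bit_window(lo):
              return hi, lo
          return None

      # movt r3, #46540 puts 46540 into the high half of r3:
      hi, lo = split_imm(46540 << 16)
      print(hex(hi), hex(lo))   # two encodable chunks summing to the original
      ```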
      
      rdar://11116189
      
      llvm-svn: 153484
      a2b48d98
    • Lang Hames's avatar
      Add a debug option to dump PBQP graphs during register allocation. · 95e021fa
      Lang Hames authored
      llvm-svn: 153483
      95e021fa
    • Andrew Trick's avatar
      SCEV fix: Handle loop invariant loads. · 7004e4b9
      Andrew Trick authored
      Fixes PR11882: NULL dereference in ComputeLoadConstantCompareExitLimit.
      
      llvm-svn: 153480
      7004e4b9
  2. Mar 26, 2012