  Feb 06, 2014
    • [CodeGenPrepare] Move away sign extensions that get in the way of addressing mode · 3a4bf040
      Quentin Colombet authored
      
      Basically the idea is to transform code like this:
      %idx = add nsw i32 %a, 1
      %sextidx = sext i32 %idx to i64
      %gep = getelementptr i8* %myArray, i64 %sextidx
      %val = load i8* %gep
      
      Into:
      %sexta = sext i32 %a to i64
      %idx = add nsw i64 %sexta, 1
      %gep = getelementptr i8* %myArray, i64 %idx
      %val = load i8* %gep
      
      That way the computation can be folded into the addressing mode.
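
      As a source-level illustration (hypothetical example; the exact IR depends
      on the frontend and target), the pattern above is what a 32-bit signed
      index into an array produces on a 64-bit target:

      // 'a + 1' is computed in 32 bits and then sign-extended for the address
      // computation; signed overflow being undefined is what puts 'nsw' on the
      // add and makes hoisting the sext above it legal.
      char loadNext(char *myArray, int a) {
        return myArray[a + 1];
      }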
      
      This transformation is done as part of the addressing mode matcher.
      If the matching fails (not profitable, addressing mode not legal, etc.), the
      matcher will revert the related promotions.
      
      <rdar://problem/15519855>
      
      llvm-svn: 200947
    • R600/SI: Add a MUBUF store pattern for Reg+Imm offsets · e2367945
      Tom Stellard authored
      llvm-svn: 200935
    • R600/SI: Add a MUBUF store pattern for Imm offsets · 2937cbc0
      Tom Stellard authored
      llvm-svn: 200934
    • R600/SI: Add a MUBUF load pattern for Reg+Imm offsets · 11624bc5
      Tom Stellard authored
      llvm-svn: 200933
    • R600/SI: Use immediate offsets for SMRD instructions whenever possible · 044e418f
      Tom Stellard authored
      There was a problem with the old pattern that caused us to copy some
      larger immediates into registers when we could have encoded them
      in the instruction.
      
      llvm-svn: 200932
    • X86: add costs for 64-bit vector ext/trunc & rebalance · f0e21616
      Tim Northover authored
      The most important part of this is probably adding any cost at all for
      operations like zext <8 x i8> to <8 x i32>. Before, they were being
      recorded as extremely costly (24, I believe), which made LLVM fall back
      to a 4-wide vectorisation of a loop.
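
      For reference, the kind of loop affected is one whose widening needs that
      zext (illustrative example, not taken from the commit):

      // Summing bytes into 32-bit accumulators: vectorising this 8 or 16 wide
      // requires extending each <N x i8> load to <N x i32> before the add.
      void sumBytes(const unsigned char *in, unsigned *out, int n) {
        for (int i = 0; i < n; ++i)
          out[i] += in[i];
      }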
      
      It also rebalances the values for sext, zext and trunc. Lacking any
      other sane metric that might work across CPU microarchitectures, I went
      with instruction counts. This seems to be in reasonable accord with the
      rest of the table (sitofp, ...), though no doubt at least one value is
      sub-optimal for some bizarre reason.
      
      Finally, separate AVX and AVX2 values are provided where appropriate.
      The CodeGen is quite different in many cases.
      
      rdar://problem/15981990
      
      llvm-svn: 200928
    • Nick Lewycky · 99384949
    • [PM] Add a new "lazy" call graph analysis pass for the new pass manager. · bf71a34e
      Chandler Carruth authored
      The primary motivation for this pass is to separate the call graph
      analysis used by the new pass manager's CGSCC pass management from the
      existing call graph analysis pass. That analysis pass is (somewhat
      unfortunately) over-constrained by the existing CallGraphSCCPassManager
      requirements. Those requirements make it *really* hard to cleanly layer
      the needed functionality for the new pass manager on top of the existing
      analysis.
      
      However, there are also a bunch of things that the pass manager would
      specifically benefit from doing differently from the existing call graph
      analysis, and this new implementation tries to address several of them:
      
      - Be lazy about scanning function definitions. The existing pass eagerly
        scans the entire module to build the initial graph. This new pass is
        significantly lazier, and I plan to push this even further to
        maximize locality during CGSCC walks. (A rough sketch of this lazy
        population appears after the list.)
      - Don't use a single synthetic node to partition functions with an
        indirect call from functions whose address is taken. This node creates
        a huge choke-point which would preclude good parallelization across
        the fan-out of the SCC graph when we get to the point of looking at
        such changes to LLVM.
      - Use a memory dense and lightweight representation of the call graph
        rather than value handles and tracking call instructions. This will
        require explicit update calls instead of some updates working
        transparently, but should end up being significantly more efficient.
        The explicit update calls ended up being needed in many cases for the
        existing call graph so we don't really lose anything.
      - Doesn't explicitly model SCCs and thus doesn't provide an "identity"
        for an SCC which is stable across updates. This is essential for the
        new pass manager to work correctly.
      - Only form the graph necessary for traversing all of the functions in
        an SCC-friendly order. This is a much simpler graph structure and
        should be more memory dense. It does limit the ways in which it is
        appropriate to use this analysis. I wish I had a better name than
        "call graph". I've commented extensively on this aspect.
      
      This is still very much a WIP, in fact it is really just the initial
      bits. But it is about the fourth version of the initial bits that I've
      implemented with each of the others running into really frustrating
      problems. This looks like it will actually work and I'd like to split the
      actual complexity across commits for the sake of my reviewers. =] The
      rest of the implementation along with lots of wiring will follow
      somewhat more rapidly now that there is a good path forward.
      
      Naturally, this doesn't impact any of the existing optimizer. This code
      is specific to the new pass manager.
      
      A bunch of thanks are deserved for the various folks that have helped
      with the design of this, especially Nick Lewycky who actually sat with
      me to go through the fundamentals of the final version here.
      
      llvm-svn: 200903
    • [DAG] Don't pull the binary operation through the shift if the operands have opaque constants. · fa0eba6c
      Juergen Ributzka authored
      During DAGCombine visitShiftByConstant assumes that certain binary operations
      with only constant operands can always be folded successfully. This is no longer
      true when the constant is opaque. This commit fixes visitShiftByConstant by not
      performing the optimization for opaque constants. Otherwise we would end up in
      an infinite DAGCombine loop.
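
      For reference, the fold visitShiftByConstant performs amounts to the
      reassociation below (sketched at the source level; the real transform
      operates on SelectionDAG nodes and the constant is illustrative):

      #include <cstdint>

      // (x + c1) << c2  becomes  (x << c2) + (c1 << c2), which is only a win if
      // (c1 << c2) folds to a plain constant -- no longer the case once c1 has
      // been made opaque, hence skipping the fold to avoid re-combining forever.
      uint64_t before(uint64_t x) { return (x + 0x12345) << 3; }
      uint64_t after(uint64_t x)  { return (x << 3) + (0x12345ull << 3); }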
      
      llvm-svn: 200900
    • Set default of inlinecold-threshold to 225. · d4612449
      Manman Ren authored
      225 is the default value of inline-threshold. This change will make sure
      we have the same inlining behavior as prior to r200886.
      
      As Chandler points out, even though we don't have code in our testing
      suite that uses the cold attribute, there are larger applications that do
      use the cold attribute.
      
      r200886 + this commit intend to keep the same behavior as prior to r200886.
      We can later on tune the inlinecold-threshold.
      
      The main purpose of r200886 is to help performance of instrumentation-based
      PGO before we actually hook up the inliner with analysis passes such as BPI
      and BFI. For instrumentation-based PGO, we try to increase inlining of hot
      functions and reduce inlining of cold functions by setting inlinecold-threshold.
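
      For context, a "cold function" here is one carrying the cold attribute,
      e.g. (illustrative source, not taken from the commit):

      #include <cstdio>

      // The inliner evaluates functions marked cold against inlinecold-threshold
      // rather than inline-threshold; after this change both default to 225.
      __attribute__((cold)) void reportError(const char *msg) {
        std::fprintf(stderr, "error: %s\n", msg);
      }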
      
      Another option suggested by Chandler is to use a boolean flag that controls
      whether we should use OptSizeThreshold for cold functions. The default value
      of the boolean flag should not change the current behavior. But it gives us
      less freedom in controlling inlining of cold functions.
      
      llvm-svn: 200898
    • Update the X86 assembler for .intel_syntax to accept the << and >> bitwise operators. · d6b10713
      Kevin Enderby authored
      
      rdar://15975725
      
      llvm-svn: 200896
    • Disable most IR-level transform passes on functions marked 'optnone'. · af4e64d0
      Paul Robinson authored
      Ideally only those transform passes that run at -O0 remain enabled;
      in reality we get as close as we reasonably can.
      Passes are responsible for disabling themselves; it's not the job of
      the pass manager to do it for them.
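
      As a sketch of what "disabling themselves" looks like inside a pass
      (illustrative only; the pass name is hypothetical and the exact check the
      real passes use may differ):

      #include "llvm/IR/Attributes.h"
      #include "llvm/IR/Function.h"
      #include "llvm/Pass.h"
      using namespace llvm;

      namespace {
      struct MyTransform : public FunctionPass {
        static char ID;
        MyTransform() : FunctionPass(ID) {}

        bool runOnFunction(Function &F) override {
          // Bail out before touching anything if the function opted out of
          // optimization; the pass manager does not perform this check for us.
          if (F.hasFnAttribute(Attribute::OptimizeNone))
            return false;
          // ... the actual transformation would go here ...
          return false;
        }
      };
      }
      char MyTransform::ID = 0;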
      
      llvm-svn: 200892