  1. Jan 24, 2014
    • Add Constant Hoisting Pass · 4f3df4ad
      Juergen Ributzka authored
      Retry commit r200022 with a fix for the build bot errors. Constant expressions
      have (unlike instructions) module scope use lists and therefore may have users
      in different functions. The fix is to simply ignore these out-of-function uses.
      
      llvm-svn: 200034
    • InstCombine: Don't try to use aggregate elements of ConstantExprs. · 09b0f88a
      Benjamin Kramer authored
      PR18600.
      
      llvm-svn: 200028
    • Revert "Add Constant Hoisting Pass" · 50e7e80d
      Juergen Ributzka authored
      This reverts commit r200022 to unbreak the build bots.
      
      llvm-svn: 200024
    • Add Constant Hoisting Pass · 38b67d0c
      Juergen Ributzka authored
      This pass identifies expensive constants to hoist and coalesces them to
      better prepare the function for SelectionDAG-based code generation. This
      works around the limitations of the basic-block-at-a-time approach.
      
      First it scans all instructions for integer constants and calculates their
      cost. If the constant can be folded into the instruction (the cost is
      TCC_Free) or the cost is just a simple operation (TCC_Basic), then we don't
      consider it expensive and leave it alone. This is the default behavior, and
      the default implementation of getIntImmCost will always return TCC_Free.
      
      If the cost is more than TCC_Basic, then the integer constant can't be folded
      into the instruction and it might be beneficial to hoist the constant.
      Similar constants are coalesced to reduce register pressure and
      materialization code.
      
      When a constant is hoisted, it is also hidden behind a bitcast to force it to
      be live-out of the basic block. Otherwise the constant would be just
      duplicated and each basic block would have its own copy in the SelectionDAG.
      The SelectionDAG recognizes such constants as opaque and doesn't perform
      certain transformations on them, which would create a new expensive constant.
      
      This optimization is only applied to integer constants in instructions and
      simple (that is, not nested) constant cast expressions. For example:
      %0 = load i64* inttoptr (i64 big_constant to i64*)
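      The candidate-collection strategy described above can be sketched as follows (an illustrative Python sketch, not the LLVM C++ implementation; the cost tiers mirror the message, and `collect_hoist_candidates` and the example costs are assumptions):

      ```python
      # Illustrative sketch of the constant hoisting strategy described above.
      # Cost tiers follow the commit message: free / basic / expensive.
      TCC_FREE, TCC_BASIC, TCC_EXPENSIVE = 0, 1, 2

      def get_int_imm_cost(constant):
          # Default behavior per the commit: every constant is considered free.
          return TCC_FREE

      def collect_hoist_candidates(instructions, cost_fn=get_int_imm_cost):
          """Group expensive integer constants by value so that similar
          constants can be coalesced into a single materialization."""
          candidates = {}
          for inst, const in instructions:
              if cost_fn(const) <= TCC_BASIC:
                  continue  # foldable or cheap: leave it alone
              candidates.setdefault(const, []).append(inst)
          return candidates

      # Hypothetical target where constants wider than 32 bits are expensive.
      expensive = lambda c: TCC_EXPENSIVE if c > 2**32 else TCC_FREE
      insts = [("load", 2**47 + 1), ("add", 2**47 + 1), ("mul", 3)]
      print(collect_hoist_candidates(insts, cost_fn=expensive))
      ```

      With the default cost function nothing is hoisted; with the hypothetical target hook, the two uses of the big constant are coalesced into one candidate.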
      
      Reviewed by Eric
      
      llvm-svn: 200022
    • Fix known typos · cb402911
      Alp Toker authored
      Sweep the codebase for common typos. Includes some changes to visible function
      names that were misspelt.
      
      llvm-svn: 200018
    • [LPM] Fix a logic error in LICM spotted by inspection. · cc497b6a
      Chandler Carruth authored
      We completely skipped promotion in LICM if the loop has a preheader or
      dedicated exits, but not *both*. We hoist if there is a preheader, and
      sink if there are dedicated exits, but either hoisting or sinking can
      move loop invariant code out of the loop!
      
      I have no idea if this has a practical consequence. If anyone has ideas
      for a test case, let me know.
      
      llvm-svn: 199966
    • [cleanup] Use the type-based preservation method rather than a string · abfa3e56
      Chandler Carruth authored
      literal that bakes a pass name and forces parsing it in the pass
      manager.
      
      llvm-svn: 199963
  2. Jan 23, 2014
    • Remove tail marker when changing an argument to an alloca. · 2a05ea5c
      Rafael Espindola authored
      Argument promotion can replace an argument of a call with an alloca. This
      requires clearing the tail marker as it is very likely that the callee is now
      using an alloca in the caller.
      
      This fixes pr14710.
      
      llvm-svn: 199909
    • [LPM] Make LoopSimplify no longer a LoopPass and instead both a utility · aa7fa5e4
      Chandler Carruth authored
      function and a FunctionPass.
      
      This has many benefits. The motivating use case was to be able to
      compute function analysis passes *after* running LoopSimplify (to avoid
      invalidating them) and then to run other passes which require
      LoopSimplify. Specifically passes like unrolling and vectorization are
      critical to wire up to BranchProbabilityInfo and BlockFrequencyInfo so
      that they can be profile aware. For the LoopVectorize pass the only
      things in the way are LoopSimplify and LCSSA. This fixes LoopSimplify
      and LCSSA is next on my list.
      
      There are also a bunch of other benefits of doing this:
      - It is now very feasible to make more passes *preserve* LoopSimplify
        because they can simply run it after changing a loop. Because
        subsequent passes can assume LoopSimplify is preserved, we can reduce
        the runs of this pass to the times when we actually mutate a loop
        structure.
      - The new pass manager should be able to more easily support loop passes
        factored in this way.
      - We can at long, long last observe that LoopSimplify is preserved
        across SCEV. This *halves* the number of times we run LoopSimplify!!!
      
      Now, getting here wasn't trivial. First off, the interfaces used by
      LoopSimplify are all over the map regarding how analysis are updated. We
      end up with weird "pass" parameters as a consequence. I'll try to clean
      at least some of this up later -- I'll have to have it all clean for the
      new pass manager.
      
      Next up I discovered a really frustrating bug. LoopUnroll *claims* to
      preserve LoopSimplify. That's actually a lie. But the way the
      LoopPassManager ends up running the passes, it always ran LoopSimplify
      on the unrolled-into loop, rectifying this oversight before any
      verification could kick in and point out that in fact nothing was
      preserved. So I've added code to the unroller to *actually* simplify the
      surrounding loop when it succeeds at unrolling.
      
      The only functional change in the test suite is that we now catch a case
      that was previously missed because SCEV and other loop transforms see
      their containing loops as simplified and thus don't miss some
      opportunities. One test case has been converted to check that we catch
      this case rather than checking that we miss it but at least don't get
      the wrong answer.
      
      Note that I have #if-ed out all of the verification logic in
      LoopSimplify! This is a temporary workaround while extracting these bits
      from the LoopPassManager. Currently, there is no way to have a pass in
      the LoopPassManager which preserves LoopSimplify along with one which
      does not. The LPM will try to verify on each loop in the nest that
      LoopSimplify holds but the now-Function-pass cannot distinguish what
      loop is being verified and so must try to verify all of them. The inner
      most loop is clearly no longer simplified as there is a pass which
      didn't even *attempt* to preserve it. =/ Once I get LCSSA out (and maybe
      LoopVectorize and some other fixes) I'll be able to re-enable this check
      and catch any places where we are still failing to preserve
      LoopSimplify. If this causes problems I can back this out and try to
      commit *all* of this at once, but so far this seems to work and allow
      much more incremental progress.
      
      llvm-svn: 199884
  3. Jan 22, 2014
  4. Jan 20, 2014
    • Fix all the remaining lost-fast-math-flags bugs I've been able to find. The... · 1664dc89
      Owen Anderson authored
      Fix all the remaining lost-fast-math-flags bugs I've been able to find.  The most important of these are cases in the generic logic for combining BinaryOperators.
      This logic hadn't been updated to handle FastMathFlags, and it took me a while to detect it because it doesn't show up in a simple search for CreateFAdd.
      
      llvm-svn: 199629
  5. Jan 19, 2014
  6. Jan 18, 2014
  7. Jan 17, 2014
    • [asan] extend asan-coverage (still experimental). · 714c67c3
      Kostya Serebryany authored
       - add a mode for collecting per-block coverage (-asan-coverage=2).
         So far the implementation is naive (all blocks are instrumented), and
         the performance overhead on top of asan can be as high as 30%.
       - Make sure the one-time calls to __sanitizer_cov are moved to the
         function bottom, which in turn required copying the original debug
         info into the call instruction.
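      The difference between the two coverage modes can be sketched like this (an illustrative Python sketch; `instrument` is a hypothetical helper, not the actual instrumentation pass):

      ```python
      # Sketch of the asan coverage levels described above.
      def instrument(function_blocks, coverage_level):
          """Return the basic blocks that receive a __sanitizer_cov call.

          level 0: coverage disabled, nothing is instrumented.
          level 1: only the function entry block is instrumented.
          level 2: naive per-block mode -- every block is instrumented.
          """
          if coverage_level == 0:
              return []
          if coverage_level == 1:
              return function_blocks[:1]
          return list(function_blocks)  # level 2: all blocks

      blocks = ["entry", "loop.header", "loop.body", "exit"]
      print(instrument(blocks, 1))
      print(instrument(blocks, 2))
      ```

      Level 2 instruments every block, which matches the naive implementation and the higher overhead reported in the table below.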
      
      Here is the performance data on SPEC 2006
      (train data, comparing asan with asan-coverage={0,1,2}):
      
                                   asan+cov0     asan+cov1      diff 0-1    asan+cov2       diff 0-2      diff 1-2
             400.perlbench,        65.60,        65.80,         1.00,        76.20,         1.16,         1.16
                 401.bzip2,        65.10,        65.50,         1.01,        75.90,         1.17,         1.16
                   403.gcc,         1.64,         1.69,         1.03,         2.04,         1.24,         1.21
                   429.mcf,        21.90,        22.60,         1.03,        23.20,         1.06,         1.03
                 445.gobmk,       166.00,       169.00,         1.02,       205.00,         1.23,         1.21
                 456.hmmer,        88.30,        87.90,         1.00,        91.00,         1.03,         1.04
                 458.sjeng,       210.00,       222.00,         1.06,       258.00,         1.23,         1.16
            462.libquantum,         1.73,         1.75,         1.01,         2.11,         1.22,         1.21
               464.h264ref,       147.00,       152.00,         1.03,       160.00,         1.09,         1.05
               471.omnetpp,       115.00,       116.00,         1.01,       140.00,         1.22,         1.21
                 473.astar,       133.00,       131.00,         0.98,       142.00,         1.07,         1.08
             483.xalancbmk,       118.00,       120.00,         1.02,       154.00,         1.31,         1.28
                  433.milc,        19.80,        20.00,         1.01,        20.10,         1.02,         1.01
                  444.namd,        16.20,        16.20,         1.00,        17.60,         1.09,         1.09
                447.dealII,        41.80,        42.20,         1.01,        43.50,         1.04,         1.03
                450.soplex,         7.51,         7.82,         1.04,         8.25,         1.10,         1.05
                453.povray,        14.00,        14.40,         1.03,        15.80,         1.13,         1.10
                   470.lbm,        33.30,        34.10,         1.02,        34.10,         1.02,         1.00
               482.sphinx3,        12.40,        12.30,         0.99,        13.00,         1.05,         1.06
      
      llvm-svn: 199488
  8. Jan 16, 2014
  9. Jan 15, 2014
    • Switch-to-lookup tables: set threshold to 3 cases · 4744ac17
      Hans Wennborg authored
      There has been an old FIXME to find the right cut-off for when it's worth
      analyzing and potentially transforming a switch to a lookup table.
      
      The switches always have two or more cases. I could not measure any speed-up
      by transforming a switch with two cases. A switch with three cases gets a nice
      speed-up, and I couldn't measure any compile-time regression, so I think this
      is the right threshold.
      
      In a Clang self-host, this causes 480 new switches to be transformed,
      and reduces the final binary size by 8 KB.
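      The transformation this threshold gates can be sketched as follows (an illustrative Python sketch; `build_lookup` is a hypothetical helper, while the real transform operates on LLVM IR switch instructions):

      ```python
      # Sketch of the switch-to-lookup-table idea: a switch over reasonably
      # dense case values becomes a bounds check plus a table index.
      def build_lookup(cases, default, min_cases=3):
          """Return a lookup function when the switch has at least
          `min_cases` cases (the threshold set by this commit)."""
          if len(cases) < min_cases:
              return None  # below threshold: not worth transforming
          lo = min(cases)
          table = [cases.get(lo + i, default) for i in range(max(cases) - lo + 1)]
          def lookup(x):
              # Out-of-range values fall through to the default case.
              return table[x - lo] if lo <= x <= lo + len(table) - 1 else default
          return lookup

      f = build_lookup({1: "a", 2: "b", 4: "c"}, default="d")
      print(f(2), f(3), f(9))
      ```

      A two-case switch returns `None` (left alone), matching the finding above that transforming two cases brings no speed-up.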
      
      llvm-svn: 199294
    • LoopVectorize: Only strip casts from integer types when replacing symbolic · dc4c9460
      Arnold Schwaighofer authored
      strides
      
      Fixes PR18480.
      
      llvm-svn: 199291
  10. Jan 14, 2014
    • Do pointer cast simplifications on addrspacecast · 2d353d1a
      Matt Arsenault authored
      llvm-svn: 199254
    • Remove a check for an illegal condition. · f08a44f9
      Matt Arsenault authored
      Bitcasts can't be between address spaces anymore.
      
      llvm-svn: 199253
    • Make nocapture analysis work with addrspacecast · e55a2c2e
      Matt Arsenault authored
      llvm-svn: 199246
    • Reapply "LTO: add API to set strategy for -internalize" · 93be7c4f
      Duncan P. N. Exon Smith authored
      Reapply r199191, reverted in r199197 because it carelessly broke
      Other/link-opts.ll.  The problem was that calling
      createInternalizePass("main") would select
      createInternalizePass(bool("main")) instead of
      createInternalizePass(ArrayRef<const char *>("main")).  This commit
      fixes the bug.
      
      The original commit message follows.
      
      Add API to LTOCodeGenerator to specify a strategy for the -internalize
      pass.
      
      This is a new attempt at Bill's change in r185882, which he reverted in
      r188029 due to problems with the gold linker.  This puts the onus on the
      linker to decide whether (and what) to internalize.
      
      In particular, running internalize before outputting an object file may
      change a 'weak' symbol into an internal one, even though that symbol
      could be needed by an external object file --- e.g., with arclite.
      
      This patch enables three strategies:
      
      - LTO_INTERNALIZE_FULL: the default (and the old behaviour).
      - LTO_INTERNALIZE_NONE: skip -internalize.
      - LTO_INTERNALIZE_HIDDEN: only -internalize symbols with hidden
        visibility.
      
      LTO_INTERNALIZE_FULL should be used when linking an executable.
      
      Outputting an object file (e.g., via ld -r) is more complicated, and
      depends on whether hidden symbols should be internalized.  E.g., for
      ld -r, LTO_INTERNALIZE_NONE can be used when -keep_private_externs, and
      LTO_INTERNALIZE_HIDDEN can be used otherwise.  However,
      LTO_INTERNALIZE_FULL is inappropriate, since the output object file will
      eventually need to link with others.
      
      lto_codegen_set_internalize_strategy() sets the strategy for subsequent
      calls to lto_codegen_write_merged_modules() and lto_codegen_compile*().
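      The decision the three strategies make per symbol can be sketched like this (an illustrative Python sketch; `should_internalize` is a hypothetical helper, while the real entry point is `lto_codegen_set_internalize_strategy`):

      ```python
      # Sketch of the three internalize strategies described above.
      LTO_INTERNALIZE_FULL, LTO_INTERNALIZE_NONE, LTO_INTERNALIZE_HIDDEN = range(3)

      def should_internalize(strategy, visibility):
          """Decide whether a non-preserved symbol gets internal linkage."""
          if strategy == LTO_INTERNALIZE_NONE:
              return False  # skip -internalize entirely
          if strategy == LTO_INTERNALIZE_HIDDEN:
              return visibility == "hidden"  # only hidden symbols
          return True  # FULL: the default (and the old behaviour)

      print(should_internalize(LTO_INTERNALIZE_HIDDEN, "default"))
      ```

      For `ld -r` with `-keep_private_externs`, NONE leaves everything alone; HIDDEN internalizes only the symbols that could never be seen by another object file anyway.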
      
      <rdar://problem/14334895>
      
      llvm-svn: 199244
    • Decouple dllexport/dllimport from linkage · 7157bb76
      Nico Rieck authored
      Representing dllexport/dllimport as distinct linkage types prevents using
      these attributes on templates and inline functions.
      
      Instead of introducing further mixed linkage types to include linkonce and
      weak ODR, the old import/export linkage types are replaced with a new
      separate visibility-like specifier:
      
        define available_externally dllimport void @f() {}
        @Var = dllexport global i32 1, align 4
      
      Linkage for dllexported globals and functions is now equal to their linkage
      without dllexport. Imported globals and functions must be either
      declarations with external linkage, or definitions with
      AvailableExternallyLinkage.
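      The linkage rule stated in the last paragraph can be sketched as a check (an illustrative Python sketch; `valid_dllimport` is a hypothetical predicate, not LLVM's actual verifier code):

      ```python
      # Sketch of the dllimport rule above: an imported global must be either
      # a declaration with external linkage, or a definition with
      # available_externally linkage.
      def valid_dllimport(is_definition, linkage):
          if is_definition:
              return linkage == "available_externally"
          return linkage == "external"

      print(valid_dllimport(False, "external"))  # declaration: allowed
      print(valid_dllimport(True, "internal"))   # definition with wrong linkage
      ```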
      
      llvm-svn: 199218
    • Revert "Decouple dllexport/dllimport from linkage" · 9d2e0df0
      Nico Rieck authored
      Revert this for now until I fix an issue in Clang with it.
      
      This reverts commit r199204.
      
      llvm-svn: 199207
    • Decouple dllexport/dllimport from linkage · e43aaf79
      Nico Rieck authored
      Representing dllexport/dllimport as distinct linkage types prevents using
      these attributes on templates and inline functions.
      
      Instead of introducing further mixed linkage types to include linkonce and
      weak ODR, the old import/export linkage types are replaced with a new
      separate visibility-like specifier:
      
        define available_externally dllimport void @f() {}
        @Var = dllexport global i32 1, align 4
      
      Linkage for dllexported globals and functions is now equal to their linkage
      without dllexport. Imported globals and functions must be either
      declarations with external linkage, or definitions with
      AvailableExternallyLinkage.
      
      llvm-svn: 199204
    • Revert r199191, "LTO: add API to set strategy for -internalize" · 23c0ab53
      NAKAMURA Takumi authored
      Please also update Other/link-opts.ll next time.
      
      llvm-svn: 199197