  1. Jan 22, 2014
  2. Jan 20, 2014
    • Fix all the remaining lost-fast-math-flags bugs I've been able to find. The... · 1664dc89
      Owen Anderson authored
      Fix all the remaining lost-fast-math-flags bugs I've been able to find.  The most important of these are cases in the generic logic for combining BinaryOperators.
      This logic hadn't been updated to handle FastMathFlags, and it took me a while to detect it because it doesn't show up in a simple search for CreateFAdd.
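A toy sketch of the kind of bug described above: when generic code merges two binary operations into one, the result needs an explicit decision about which flags it keeps. The types below (ToyFlags, ToyBinOp) are hypothetical stand-ins, not LLVM's FastMathFlags or BinaryOperator, and the "keep only flags present on both inputs" policy is one conservative choice assumed here for illustration, not necessarily what this commit does.

```cpp
// Toy model only: ToyFlags and ToyBinOp are hypothetical stand-ins,
// not LLVM's FastMathFlags or BinaryOperator classes.
struct ToyFlags {
    bool noNaNs = false;
    bool noInfs = false;
    bool unsafeAlgebra = false;
};

struct ToyBinOp {
    char opcode = '+';  // e.g. '+' for a floating-point add
    ToyFlags flags;     // optimization hints attached to this operation
};

// Keep a flag only if both inputs carry it (assumed conservative policy).
ToyFlags intersect(const ToyFlags &a, const ToyFlags &b) {
    ToyFlags r;
    r.noNaNs = a.noNaNs && b.noNaNs;
    r.noInfs = a.noInfs && b.noInfs;
    r.unsafeAlgebra = a.unsafeAlgebra && b.unsafeAlgebra;
    return r;
}

// Merge two operations into one. Forgetting the flags line here would
// silently produce a flag-less result, which is the "lost flags" failure mode.
ToyBinOp combine(const ToyBinOp &outer, const ToyBinOp &inner) {
    ToyBinOp merged;
    merged.opcode = outer.opcode;
    merged.flags = intersect(outer.flags, inner.flags);
    return merged;
}
```

Calling combine() on two adds that both carry noNaNs, where only one carries noInfs, yields a result with noNaNs set and noInfs cleared.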
      
      llvm-svn: 199629
      1664dc89
  3. Jan 19, 2014
  4. Jan 18, 2014
  5. Jan 17, 2014
    • [asan] extend asan-coverage (still experimental). · 714c67c3
      Kostya Serebryany authored
       - add a mode for collecting per-block coverage (-asan-coverage=2).
         So far the implementation is naive (all blocks are instrumented),
         the performance overhead on top of asan could be as high as 30%.
       - Make sure the one-time calls to __sanitizer_cov are moved to the function bottom,
         which in turn required copying the original debug info into the call insn.
      
      Here is the performance data on SPEC 2006
      (train data, comparing asan with asan-coverage={0,1,2}):
      
                                   asan+cov0     asan+cov1      diff 0-1    asan+cov2       diff 0-2      diff 1-2
             400.perlbench,        65.60,        65.80,         1.00,        76.20,         1.16,         1.16
                 401.bzip2,        65.10,        65.50,         1.01,        75.90,         1.17,         1.16
                   403.gcc,         1.64,         1.69,         1.03,         2.04,         1.24,         1.21
                   429.mcf,        21.90,        22.60,         1.03,        23.20,         1.06,         1.03
                 445.gobmk,       166.00,       169.00,         1.02,       205.00,         1.23,         1.21
                 456.hmmer,        88.30,        87.90,         1.00,        91.00,         1.03,         1.04
                 458.sjeng,       210.00,       222.00,         1.06,       258.00,         1.23,         1.16
            462.libquantum,         1.73,         1.75,         1.01,         2.11,         1.22,         1.21
               464.h264ref,       147.00,       152.00,         1.03,       160.00,         1.09,         1.05
               471.omnetpp,       115.00,       116.00,         1.01,       140.00,         1.22,         1.21
                 473.astar,       133.00,       131.00,         0.98,       142.00,         1.07,         1.08
             483.xalancbmk,       118.00,       120.00,         1.02,       154.00,         1.31,         1.28
                  433.milc,        19.80,        20.00,         1.01,        20.10,         1.02,         1.01
                  444.namd,        16.20,        16.20,         1.00,        17.60,         1.09,         1.09
                447.dealII,        41.80,        42.20,         1.01,        43.50,         1.04,         1.03
                450.soplex,         7.51,         7.82,         1.04,         8.25,         1.10,         1.05
                453.povray,        14.00,        14.40,         1.03,        15.80,         1.13,         1.10
                   470.lbm,        33.30,        34.10,         1.02,        34.10,         1.02,         1.00
               482.sphinx3,        12.40,        12.30,         0.99,        13.00,         1.05,         1.06
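The "diff" columns in the table above appear to be ratios of running times rather than differences (for example, 76.20 / 65.60 rounds to 1.16 for 400.perlbench). A minimal sketch of that calculation, assuming two-decimal rounding; the function name slowdown is made up for this example:

```cpp
#include <cassert>
#include <cmath>

// Ratio of an instrumented running time to a baseline running time,
// rounded to two decimals as in the table. A value of 1.16 means the
// instrumented build took about 16% longer than the baseline.
double slowdown(double baseline, double instrumented) {
    assert(baseline > 0.0);  // a zero baseline would make the ratio meaningless
    return std::round(instrumented / baseline * 100.0) / 100.0;
}
```

For instance, slowdown(65.60, 76.20) reproduces the "diff 0-2" entry for 400.perlbench, and slowdown(133.00, 131.00) reproduces the 0.98 "diff 0-1" entry for 473.astar.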
      
      llvm-svn: 199488
      714c67c3
  6. Jan 16, 2014
  7. Jan 15, 2014
    • Switch-to-lookup tables: set threshold to 3 cases · 4744ac17
      Hans Wennborg authored
      There has been an old FIXME to find the right cut-off for when it's worth
      analyzing and potentially transforming a switch to a lookup table.
      
      The switches always have two or more cases. I could not measure any speed-up
      by transforming a switch with two cases. A switch with three cases gets a nice
      speed-up, and I couldn't measure any compile-time regression, so I think this
      is the right threshold.
      
      In a Clang self-host, this causes 480 new switches to be transformed,
      and reduces the final binary size by 8 KB.
      
      llvm-svn: 199294
      4744ac17
    • LoopVectorize: Only strip casts from integer types when replacing symbolic · dc4c9460
      Arnold Schwaighofer authored
      strides
      
      Fixes PR18480.
      
      llvm-svn: 199291
      dc4c9460
  8. Jan 14, 2014
    • Do pointer cast simplifications on addrspacecast · 2d353d1a
      Matt Arsenault authored
      llvm-svn: 199254
      2d353d1a
    • Remove a check for an illegal condition. · f08a44f9
      Matt Arsenault authored
      Bitcasts can't be between address spaces anymore.
      
      llvm-svn: 199253
      f08a44f9
    • Make nocapture analysis work with addrspacecast · e55a2c2e
      Matt Arsenault authored
      llvm-svn: 199246
      e55a2c2e
    • Reapply "LTO: add API to set strategy for -internalize" · 93be7c4f
      Duncan P. N. Exon Smith authored
      Reapply r199191, reverted in r199197 because it carelessly broke
      Other/link-opts.ll.  The problem was that calling
      createInternalizePass("main") would select
      createInternalizePass(bool("main")) instead of
      createInternalizePass(ArrayRef<const char *>("main")).  This commit
      fixes the bug.
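The overload pitfall described above can be reproduced in isolation. A string literal converts to bool through standard conversions only (array-to-pointer, then pointer-to-bool), and overload resolution prefers that over any user-defined conversion such as a converting constructor. ToyArrayRef and pick() below are hypothetical stand-ins for this sketch, not the LLVM declarations:

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for an ArrayRef-like type; only the implicit
// single-element constructor matters for this illustration.
struct ToyArrayRef {
    std::vector<const char *> items;
    ToyArrayRef(const char *one) : items(1, one) {}
};

// Two overloads with the same shape as the pair described in the message.
// The return value just reports which overload was selected.
std::string pick(bool) { return "bool"; }
std::string pick(ToyArrayRef) { return "array"; }
```

With these declarations, pick("main") selects the bool overload, while pick(ToyArrayRef("main")) selects the array overload. Spelling out the intended type at the call site is one way to avoid the surprise.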
      
      The original commit message follows.
      
      Add API to LTOCodeGenerator to specify a strategy for the -internalize
      pass.
      
      This is a new attempt at Bill's change in r185882, which he reverted in
      r188029 due to problems with the gold linker.  This puts the onus on the
      linker to decide whether (and what) to internalize.
      
      In particular, running internalize before outputting an object file may
      change a 'weak' symbol into an internal one, even though that symbol
      could be needed by an external object file --- e.g., with arclite.
      
      This patch enables three strategies:
      
      - LTO_INTERNALIZE_FULL: the default (and the old behaviour).
      - LTO_INTERNALIZE_NONE: skip -internalize.
      - LTO_INTERNALIZE_HIDDEN: only -internalize symbols with hidden
        visibility.
      
      LTO_INTERNALIZE_FULL should be used when linking an executable.
      
      Outputting an object file (e.g., via ld -r) is more complicated, and
      depends on whether hidden symbols should be internalized.  E.g., for
      ld -r, LTO_INTERNALIZE_NONE can be used when -keep_private_externs, and
      LTO_INTERNALIZE_HIDDEN can be used otherwise.  However,
      LTO_INTERNALIZE_FULL is inappropriate, since the output object file will
      eventually need to link with others.
      
      lto_codegen_set_internalize_strategy() sets the strategy for subsequent
      calls to lto_codegen_write_merged_modules() and lto_codegen_compile*().
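The guidance above on which strategy a linker might pick can be summarized as a small decision function. This is only a sketch of the rule as stated in this message; the sketch namespace, the Strategy and Output enums, and chooseStrategy() are hypothetical and are not part of the LTO API:

```cpp
namespace sketch {

// Mirrors the three strategies named in the message (FULL, NONE, HIDDEN).
enum class Strategy { Full, None, Hidden };

// What the linker is producing: a final executable, or a relocatable
// object file (e.g. via ld -r) that will be linked again later.
enum class Output { Executable, RelocatableObject };

// Executables can internalize everything. Relocatable objects must not,
// since other objects may still need their symbols; whether hidden symbols
// are internalized depends on -keep_private_externs.
Strategy chooseStrategy(Output out, bool keepPrivateExterns) {
    if (out == Output::Executable)
        return Strategy::Full;
    return keepPrivateExterns ? Strategy::None : Strategy::Hidden;
}

}  // namespace sketch
```

A linker following this rule would then pass the chosen strategy to lto_codegen_set_internalize_strategy() before generating code.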
      
      <rdar://problem/14334895>
      
      llvm-svn: 199244
      93be7c4f
    • Decouple dllexport/dllimport from linkage · 7157bb76
      Nico Rieck authored
      Representing dllexport/dllimport as distinct linkage types prevents using
      these attributes on templates and inline functions.
      
      Instead of introducing further mixed linkage types to include linkonce and
      weak ODR, the old import/export linkage types are replaced with a new
      separate visibility-like specifier:
      
        define available_externally dllimport void @f() {}
        @Var = dllexport global i32 1, align 4
      
      Linkage for dllexported globals and functions is now equal to their linkage
      without dllexport. Imported globals and functions must be either
      declarations with external linkage, or definitions with
      AvailableExternallyLinkage.
      
      llvm-svn: 199218
      7157bb76
    • Revert "Decouple dllexport/dllimport from linkage" · 9d2e0df0
      Nico Rieck authored
      Revert this for now until I fix an issue in Clang with it.
      
      This reverts commit r199204.
      
      llvm-svn: 199207
      9d2e0df0
    • Decouple dllexport/dllimport from linkage · e43aaf79
      Nico Rieck authored
      Representing dllexport/dllimport as distinct linkage types prevents using
      these attributes on templates and inline functions.
      
      Instead of introducing further mixed linkage types to include linkonce and
      weak ODR, the old import/export linkage types are replaced with a new
      separate visibility-like specifier:
      
        define available_externally dllimport void @f() {}
        @Var = dllexport global i32 1, align 4
      
      Linkage for dllexported globals and functions is now equal to their linkage
      without dllexport. Imported globals and functions must be either
      declarations with external linkage, or definitions with
      AvailableExternallyLinkage.
      
      llvm-svn: 199204
      e43aaf79
    • Revert r199191, "LTO: add API to set strategy for -internalize" · 23c0ab53
      NAKAMURA Takumi authored
      Please also update Other/link-opts.ll next time.
      
      llvm-svn: 199197
      23c0ab53
    • LTO: add API to set strategy for -internalize · 43ea3478
      Duncan P. N. Exon Smith authored
      Add API to LTOCodeGenerator to specify a strategy for the -internalize
      pass.
      
      This is a new attempt at Bill's change in r185882, which he reverted in
      r188029 due to problems with the gold linker.  This puts the onus on the
      linker to decide whether (and what) to internalize.
      
      In particular, running internalize before outputting an object file may
      change a 'weak' symbol into an internal one, even though that symbol
      could be needed by an external object file --- e.g., with arclite.
      
      This patch enables three strategies:
      
      - LTO_INTERNALIZE_FULL: the default (and the old behaviour).
      - LTO_INTERNALIZE_NONE: skip -internalize.
      - LTO_INTERNALIZE_HIDDEN: only -internalize symbols with hidden
        visibility.
      
      LTO_INTERNALIZE_FULL should be used when linking an executable.
      
      Outputting an object file (e.g., via ld -r) is more complicated, and
      depends on whether hidden symbols should be internalized.  E.g., for
      ld -r, LTO_INTERNALIZE_NONE can be used when -keep_private_externs, and
      LTO_INTERNALIZE_HIDDEN can be used otherwise.  However,
      LTO_INTERNALIZE_FULL is inappropriate, since the output object file will
      eventually need to link with others.
      
      lto_codegen_set_internalize_strategy() sets the strategy for subsequent
      calls to lto_codegen_write_merged_modules() and lto_codegen_compile*().
      
      <rdar://problem/14334895>
      
      llvm-svn: 199191
      43ea3478
  9. Jan 13, 2014
    • [PM] Split DominatorTree into a concrete analysis result object which · 73523021
      Chandler Carruth authored
      can be used by both the new pass manager and the old.
      
      This removes it from any of the virtual mess of the pass interfaces and
      lets it derive cleanly from the DominatorTreeBase<> template. In turn,
      tons of boilerplate interface can be nuked and it turns into a very
      straightforward extension of the base DominatorTree interface.
      
      The old analysis pass is now a simple wrapper. The names and style of
      this split should match the split between CallGraph and
      CallGraphWrapperPass. All of the users of DominatorTree have been
      updated to match using many of the same tricks as with CallGraph. The
      goal is that the common type remains the resulting DominatorTree rather
      than the pass. This will make subsequent work toward the new pass
      manager significantly easier.
      
      Also in numerous places things became cleaner because I switched from
      re-running the pass (!!! midway through some other pass's run !!!) to
      directly recomputing the domtree.
      
      llvm-svn: 199104
      73523021
    • [PM] Pull the generic graph algorithms and data structures for dominator · e509db41
      Chandler Carruth authored
      trees into the Support library.
      
      These are all expressed in terms of the generic GraphTraits and CFG,
      with no reliance on any concrete IR types. Putting them in support
      clarifies that and makes the fact that the static analyzer in Clang uses
      them much more sane. When moving the Dominators.h file into the IR
      library I claimed that this was the right home for it but not something
      I planned to work on. Oops.
      
      So why am I doing this? It happens to be one step toward breaking the
      requirement that IR verification can only be performed from inside of
      a pass context, which completely blocks the implementation of
      verification for the new pass manager infrastructure. Fixing it will
      also allow removing the concept of the "preverify" step (WTF???) and
      allow the verifier to cleanly flag functions which fail verification in
      a way that precludes even computing dominance information. Currently,
      that results in a fatal error even when you ask the verifier to not
      fatally error. It's awesome like that.
      
      The yak shaving will continue...
      
      llvm-svn: 199095
      e509db41
    • [cleanup] Move the Dominators.h and Verifier.h headers into the IR · 5ad5f15c
      Chandler Carruth authored
      directory. These passes are already defined in the IR library, and it
      doesn't make any sense to have the headers in Analysis.
      
      Long term, I think there is going to be a much better way to divide
      these matters. The dominators code should be fully separated into the
      abstract graph algorithm and have that put in Support where it becomes
      obvious that even Clang's CFGBlocks can use it. Then the verifier can
      manually construct dominance information from the Support-driven
      interface while the Analysis library can provide a pass which
      caches, reconstructs, and supports a nice update API.
      
      But those are very long term, and so I don't want to leave the really
      confusing structure until that day arrives.
      
      llvm-svn: 199082
      5ad5f15c
    • Re-sort #include lines again, prior to moving headers around. · 07baed53
      Chandler Carruth authored
      llvm-svn: 199080
      07baed53
  10. Jan 12, 2014
    • Switch-to-lookup tables: Don't require a result for the default · ac114a3c
      Hans Wennborg authored
      case when the lookup table doesn't have any holes.
      
      This means we can build a lookup table for switches like this:
      
        switch (x) {
          case 0: return 1;
          case 1: return 2;
          case 2: return 3;
          case 3: return 4;
          default: exit(1);
        }
      
      The default case doesn't yield a constant result here, but that doesn't matter,
      since a default result is only necessary for filling holes in the lookup table,
      and this table doesn't have any holes.
      
      This makes us transform 505 more switches in a clang bootstrap, and shaves 164 KB
      off the resulting clang binary.
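For illustration, here is a hand-written C++ equivalent of what a lookup-table transformation of the switch above could look like. This is a sketch of the idea at source level, not the IR the pass actually emits; viaSwitch and viaTable are names made up for this example.

```cpp
#include <cstdlib>

// The original switch from the message.
int viaSwitch(int x) {
    switch (x) {
        case 0: return 1;
        case 1: return 2;
        case 2: return 3;
        case 3: return 4;
        default: std::exit(1);
    }
}

// Table form: cases 0..3 are contiguous, so the table has no holes and
// no default *value* is needed. The default path is only reached through
// the range check, and it can do anything (here, exit).
int viaTable(int x) {
    static const int table[] = {1, 2, 3, 4};
    if (static_cast<unsigned>(x) < 4u)  // one compare also rejects negative x
        return table[x];
    std::exit(1);
}
```

Both functions return the same value for every x in 0..3 and exit for anything else.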
      
      llvm-svn: 199025
      ac114a3c
  11. Jan 11, 2014
    • LoopVectorizer: Enable strided memory accesses versioning per default · 66c742ae
      Arnold Schwaighofer authored
      I saw no compile or execution time regressions on x86_64 -mavx -O3.
      
      radar://13075509
      
      llvm-svn: 199015
      66c742ae
    • LoopVectorize.cpp: Appease MSC16. · 41c409ce
      NAKAMURA Takumi authored
      Excuse me, I hope msc16 builders would be fine till its end day.
      Introduce nullptr then. ;)
      
      llvm-svn: 199001
      41c409ce
    • Extend and simplify the sample profile input file. · 9518b63b
      Diego Novillo authored
      1- Use the line_iterator class to read profile files.
      
      2- Allow comments in profile file. Lines starting with '#'
         are completely ignored while reading the profile.
      
      3- Add parsing support for discriminators and indirect call samples.
      
         Our external profiler can emit more profile information that we are
         currently not handling. This patch does not add new functionality to
         support this information, but it allows profile files to provide it.
      
         I will add actual support later on (for at least one of these
         features, I need support for DWARF discriminators in Clang).
      
         A sample line may contain the following additional information:
      
         Discriminator. This is used if the sampled program was compiled with
         DWARF discriminator support
         (http://wiki.dwarfstd.org/index.php?title=Path_Discriminators). This
         is currently only emitted by GCC and we just ignore it.
      
         Potential call targets and samples. If present, this line contains a
         call instruction. This models both direct and indirect calls. Each
         called target is listed together with the number of samples. For
         example,
      
                          130: 7  foo:3  bar:2  baz:7
      
         The above means that at relative line offset 130 there is a call
         instruction that calls one of foo(), bar() and baz(), with baz()
         being the most frequent call target.
      
         Differential Revision: http://llvm-reviews.chandlerc.com/D2355
      
      4- Simplify format of profile input file.
      
         This implements earlier suggestions to simplify the format of the
         sample profile file. The symbol table is not necessary and function
         profiles do not need to know the number of samples in advance.
      
         Differential Revision: http://llvm-reviews.chandlerc.com/D2419
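A rough sketch of how a sample line with call targets, such as the "130: 7  foo:3  bar:2  baz:7" example in item 3, might be parsed, including the rule from item 2 that lines starting with '#' are ignored. This is not the parser added by this patch: SampleLine and parseSampleLine are hypothetical names, and discriminators are left out because their syntax is not spelled out in this message.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record for one parsed sample line.
struct SampleLine {
    unsigned offset = 0;             // relative line offset, e.g. 130
    unsigned long long samples = 0;  // sample count for the line, e.g. 7
    std::vector<std::pair<std::string, unsigned long long>> targets;
};

// Parses "<offset>: <samples> [<target>:<count>]...".
// Returns false for empty lines, '#' comment lines, and malformed input;
// 'out' is only written on success.
bool parseSampleLine(const std::string &line, SampleLine &out) {
    if (line.empty() || line[0] == '#')
        return false;

    std::istringstream in(line);
    SampleLine parsed;
    char colon = 0;
    if (!(in >> parsed.offset >> colon >> parsed.samples) || colon != ':')
        return false;

    // Any remaining tokens are call targets of the form name:count.
    std::string token;
    while (in >> token) {
        const std::size_t sep = token.rfind(':');
        if (sep == std::string::npos || sep == 0 || sep + 1 == token.size())
            return false;
        const std::string count = token.substr(sep + 1);
        if (count.find_first_not_of("0123456789") != std::string::npos)
            return false;
        parsed.targets.emplace_back(token.substr(0, sep), std::stoull(count));
    }

    out = parsed;
    return true;
}
```

Parsing the example line yields offset 130, 7 samples, and three targets (foo with 3 samples, bar with 2, baz with 7); a line such as "# comment" is rejected.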
      
      llvm-svn: 198973
      9518b63b