  1. Jan 13, 2014
    • [PM] Split DominatorTree into a concrete analysis result object which · 73523021
      Chandler Carruth authored
      can be used by both the new pass manager and the old.
      
      This removes it from any of the virtual mess of the pass interfaces and
      lets it derive cleanly from the DominatorTreeBase<> template. In turn,
      tons of boilerplate interface can be nuked and it turns into a very
      straightforward extension of the base DominatorTree interface.
      
      The old analysis pass is now a simple wrapper. The names and style of
      this split should match the split between CallGraph and
      CallGraphWrapperPass. All of the users of DominatorTree have been
      updated to match using many of the same tricks as with CallGraph. The
      goal is that the common type remains the resulting DominatorTree rather
      than the pass. This will make subsequent work toward the new pass
      manager significantly easier.
      
      Also in numerous places things became cleaner because I switched from
      re-running the pass (!!! midway through some other pass's run !!!) to
      directly recomputing the domtree.
      
      llvm-svn: 199104
    • AVX-512: Embedded Rounding Control - encoding and printing · b19c9dc1
      Elena Demikhovsky authored
      Changed intrinsics for vrcp14/vrcp28 vrsqrt14/vrsqrt28 - aligned with GCC.
      
      llvm-svn: 199102
    • [PM] Pull the generic graph algorithms and data structures for dominator · e509db41
      Chandler Carruth authored
      trees into the Support library.
      
      These are all expressed in terms of the generic GraphTraits and CFG,
      with no reliance on any concrete IR types. Putting them in support
      clarifies that and makes the fact that the static analyzer in Clang uses
      them much more sane. When moving the Dominators.h file into the IR
      library I claimed that this was the right home for it but not something
      I planned to work on. Oops.
      
      So why am I doing this? It happens to be one step toward breaking the
      requirement that IR verification can only be performed from inside of
      a pass context, which completely blocks the implementation of
      verification for the new pass manager infrastructure. Fixing it will
      also allow removing the concept of the "preverify" step (WTF???) and
      allow the verifier to cleanly flag functions which fail verification in
      a way that precludes even computing dominance information. Currently,
      that results in a fatal error even when you ask the verifier to not
      fatally error. It's awesome like that.
      
      The yak shaving will continue...
      
      llvm-svn: 199095
    • Revert "ReMat: fix overly cavalier attitude to sub-register indices" · 7fdd4857
      Tim Northover authored
      Very sorry, this was a premature patch that I still need to investigate and
      finish off (for some reason beyond me at the moment it doesn't actually fix the
      issue in all cases).
      
      This reverts commit r199091.
      
      llvm-svn: 199093
    • ReMat: fix overly cavalier attitude to sub-register indices · 59f8d4b4
      Tim Northover authored
      There are two attempted optimisations in reMaterializeTrivialDef, trying to
      avoid promoting the size of a register too much when rematerializing.
      Unfortunately, both appear to be flawed. First, we see if the original register
      would have worked, but this is inadequate. Consider:
      
          v1 = SOMETHING (v1 is QQ)
          v2:Q0 = COPY v1:Q1 (v1, v2 are QQ)
          ...
          uses of v2
      
      In this case even though v2 *could* be used directly as the output of
      SOMETHING, this would set the wrong bits of the QQ register involved. The
      correct rematerialization must be:
      
          v2:Q0_Q1 = SOMETHING (v2 promoted to QQQ)
          ...
          uses of v2:Q1_Q2
      
      For the second optimisation, if the correct remat is "v2:idx = SOMETHING" then
      we can't necessarily expect v2 itself to be valid for SOMETHING, but we do try
      to hunt for a class between v1 and v2 that works. Unfortunately, this is also
      wrong:
      
          v1 = SOMETHING (v1 is QQ)
          v2:Q0_Q1 = COPY v1 (v1 is QQ, v2 is QQQ)
          ...
          uses of v2 as a QQQ
      
      The canonical rematerialization here is "v2:Q0_Q1 = SOMETHING". However, the
      current logic would decide that v2 could be a QQ (no interest is taken in later
      uses).
      
      This patch, therefore, always accepts the widened register class without trying
      to be clever. Generally there is no penalty to this (e.g. in the common GR32 <
      GR64 case, expanding the width doesn't matter because it's not like you were
      going to do anything else with the high bits of a GR32 register). It can
      increase register pressure in cases like the ARM VFP regs though (multiple
      non-overlapping but equivalent subregisters). Hopefully this situation is rare
      enough that it won't matter.
      
      Unfortunately, no in-tree targets actually expose this as far as I can tell
      (there are so few isAsCheapAsAMove instructions for it to trigger on) so I've
      been unable to produce a test. It was exposed in our ARM64 SPEC tests though,
      and I will be adding a test there that we should be able to contribute
      soon(TM).
      
      llvm-svn: 199091
    • [cleanup] Move the Dominators.h and Verifier.h headers into the IR · 5ad5f15c
      Chandler Carruth authored
      directory. These passes are already defined in the IR library, and it
      doesn't make any sense to have the headers in Analysis.
      
      Long term, I think there is going to be a much better way to divide
      these matters. The dominators code should be fully separated into the
      abstract graph algorithm and have that put in Support where it becomes
      obvious that even Clang's CFGBlocks can use it. Then the verifier can
      manually construct dominance information from the Support-driven
      interface while the Analysis library can provide a pass which caches
      and reconstructs that information and supports a nice update API.
      
      But those are very long term, and so I don't want to leave the really
      confusing structure until that day arrives.
      
      llvm-svn: 199082
    • Re-sort #include lines again, prior to moving headers around. · 07baed53
      Chandler Carruth authored
      llvm-svn: 199080
    • [PM] Wire up support for writing bitcode with new PM. · b7bdfd65
      Chandler Carruth authored
      This moves the old pass creation functionality to its own header and
      updates the callers of that routine. Then it adds a new PM supporting
      bitcode writer to the header file, and wires that up in the opt tool.
      A test is added that round-trips code into bitcode and back out using
      the new pass manager.
      
      llvm-svn: 199078
    • [AArch64 NEON] Add missing patterns for bitcast from or to v1f64 · cfef55d6
      Kevin Qin authored
      llvm-svn: 199070
    • [AArch64 NEON] Add more scenarios to use perm instructions when lowering shuffle_vector · 21e8f1c4
      Kevin Qin authored
      This patch covered 2 more scenarios:
      
      1.  Two operands of shuffle_vector are the same, like
      %shuffle.i = shufflevector <8 x i8> %a, <8 x i8> %a, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
      
      2. One of the operands is undef, like
      %shuffle.i = shufflevector <8 x i8> %a, <8 x i8> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
      
      After this patch, perm instructions can be emitted instead of many INS instructions.
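
As a rough illustration of the two scenarios above, the following Python sketch models how shufflevector picks lanes and checks for the even-lane mask used in both examples. The helper names `shufflevector` and `is_even_lane_mask` are hypothetical and are not LLVM APIs. An even-lane mask is the shape that an unzip-style permute (UZP1 on AArch64) produces, but which instruction this patch actually selects is an assumption here.

```python
def shufflevector(a, b, mask):
    """Model of IR shufflevector: index i < n picks a[i], index i >= n picks b[i - n].

    Pass b=None to model an undef second operand; lanes taken from it are None.
    """
    n = len(a)
    second = list(b) if b is not None else [None] * n
    joined = list(a) + second
    return [joined[i] for i in mask]


def is_even_lane_mask(mask):
    """True when the mask selects lanes 0, 2, 4, ... of the concatenated inputs."""
    return list(mask) == list(range(0, 2 * len(mask), 2))


mask = [0, 2, 4, 6, 8, 10, 12, 14]
a = [10, 11, 12, 13, 14, 15, 16, 17]

# Scenario 1: both operands are the same vector.
same_operand = shufflevector(a, a, mask)
# Scenario 2: the second operand is undef.
undef_operand = shufflevector(a, None, mask)
```

In both cases only the even lanes of `%a` carry defined data, which is why a single permute can stand in for a chain of per-lane inserts.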
      
      llvm-svn: 199069
    • correct target directive handling error handling · a6505ca4
      Saleem Abdulrasool authored
      The target specific parser should return `false' if the target AsmParser handles
      the directive, and `true' if the generic parser should handle the directive.
      Many of the target specific directive handlers would `return Error' which does
      not follow these semantics.  This change updates the target specific
      routines so that they conform to the semantics of ParseDirective.
      
      Conformance to the semantics improves diagnostics emitted for the invalid
      directives.  X86 is taken as a sample to ensure that multiple diagnostics are
      not presented for a single error.
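
The return-value convention described above can be sketched in Python as follows. This is only an illustration of the contract, not LLVM code: the function names, the `.code` directive and its operands, and the diagnostic strings are all hypothetical.

```python
def target_parse_directive(directive, operand, diagnostics):
    """Hypothetical target hook.

    Returns False when the target owns the directive (even if it reported an
    error while parsing it) and True when the generic parser should take over.
    """
    if directive == ".code":
        if operand not in ("16", "32", "64"):
            diagnostics.append(f"error: invalid operand '{operand}' for {directive}")
        # The target handled the directive, so the generic parser must not retry it.
        return False
    return True


def parse_directive(directive, operand, diagnostics):
    """Hypothetical generic driver: ask the target first, then fall back."""
    if not target_parse_directive(directive, operand, diagnostics):
        return
    if directive == ".text":
        return  # a directive the generic parser knows about
    diagnostics.append(f"error: unknown directive {directive}")


diags = []
parse_directive(".code", "99", diags)
```

If the target hook returned True after reporting its own error, the generic parser would run as well and add a second, misleading "unknown directive" diagnostic for the same input.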
      
      llvm-svn: 199068
  2. Jan 12, 2014
  3. Jan 11, 2014
    • LoopVectorizer: Enable strided memory accesses versioning per default · 66c742ae
      Arnold Schwaighofer authored
      I saw no compile or execution time regressions on x86_64 -mavx -O3.
      
      radar://13075509
      
      llvm-svn: 199015
    • [Sparc] Bundle instruction with delay slot and its filler. · 0653218b
      Venkatraman Govindaraju authored
      Now, we can use -verify-machineinstrs with the SPARC backend.
      
      llvm-svn: 199014
    • Fix 'ned' typo in doc comment · 798060e0
      Alp Toker authored
      Patch by Jasper Neumann!
      
      llvm-svn: 199007
    • [PM] Add names to passes under the new pass manager, and a debug output · a13f27cc
      Chandler Carruth authored
      mode that can be used to debug the execution of everything.
      
      No support for analyses here, that will come later. This already helps
      show parts of the opt commandline integration that aren't working. Tests
      of that will start using it as the bugs are fixed.
      
      llvm-svn: 199004
    • LoopVectorize.cpp: Appease MSC16. · 41c409ce
      NAKAMURA Takumi authored
      Excuse me, I hope msc16 builders would be fine till its end day.
      Introduce nullptr then. ;)
      
      llvm-svn: 199001
    • [anyregcc] Fix callee-save mask for anyregcc · 976d94b8
      Juergen Ributzka authored
      Use separate callee-save masks for XMM and YMM registers for anyregcc on X86 and
      select the proper mask depending on the target cpu we compile for.
      
      llvm-svn: 198985
    • Revert r198979 - accidental commit. · 942f22c4
      Eric Christopher authored
      llvm-svn: 198981
    • Reformat. · ceec7b02
      Eric Christopher authored
      llvm-svn: 198980
    • Update function name and add some helpful comments. · 67cde9ac
      Eric Christopher authored
      llvm-svn: 198979
    • Extend and simplify the sample profile input file. · 9518b63b
      Diego Novillo authored
      1- Use the line_iterator class to read profile files.
      
      2- Allow comments in profile file. Lines starting with '#'
         are completely ignored while reading the profile.
      
      3- Add parsing support for discriminators and indirect call samples.
      
         Our external profiler can emit more profile information that we are
         currently not handling. This patch does not add new functionality to
         support this information, but it allows profile files to provide it.
      
         I will add actual support later on (for at least one of these
         features, I need support for DWARF discriminators in Clang).
      
         A sample line may contain the following additional information:
      
         Discriminator. This is used if the sampled program was compiled with
         DWARF discriminator support
         (http://wiki.dwarfstd.org/index.php?title=Path_Discriminators). This
         is currently only emitted by GCC and we just ignore it.
      
         Potential call targets and samples. If present, this line contains a
         call instruction. This models both direct and indirect calls. Each
         called target is listed together with the number of samples. For
         example,
      
                          130: 7  foo:3  bar:2  baz:7
      
         The above means that at relative line offset 130 there is a call
         instruction that calls one of foo(), bar() and baz(), with baz()
         being the most frequent call target.
      
         Differential Revision: http://llvm-reviews.chandlerc.com/D2355
      
      4- Simplify format of profile input file.
      
         This implements earlier suggestions to simplify the format of the
         sample profile file. The symbol table is not necessary and function
         profiles do not need to know the number of samples in advance.
      
         Differential Revision: http://llvm-reviews.chandlerc.com/D2419
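
The sample line shown in item 3 and the comment rule from item 2 can be parsed roughly as below. This is a minimal Python sketch, assuming only the shape of the example line `130: 7  foo:3  bar:2  baz:7`; the function name `parse_sample_line` is hypothetical, and discriminators are deliberately not handled because their textual form is not given here.

```python
import re

# offset ":" samples, followed by zero or more "name:count" call targets.
_SAMPLE_LINE = re.compile(r"^\s*(\d+):\s*(\d+)((?:\s+[^\s:]+:\d+)*)\s*$")


def parse_sample_line(line):
    """Return (offset, samples, call_targets), or None for a '#' comment line."""
    if line.startswith("#"):
        return None
    match = _SAMPLE_LINE.match(line)
    if match is None:
        raise ValueError(f"unrecognized sample line: {line!r}")
    offset = int(match.group(1))
    samples = int(match.group(2))
    call_targets = {}
    for item in match.group(3).split():
        name, count = item.rsplit(":", 1)
        call_targets[name] = int(count)
    return offset, samples, call_targets


parsed = parse_sample_line("130: 7  foo:3  bar:2  baz:7")
```

A line without call targets yields an empty dictionary, so callers can treat plain instruction samples and call-site samples uniformly.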
      
      llvm-svn: 198973
    • Propagation of profile samples through the CFG. · 0accb3d2
      Diego Novillo authored
      This adds a propagation heuristic to convert instruction samples
      into branch weights. It implements a similar heuristic to the one
      implemented by Dehao Chen on GCC.
      
      The propagation proceeds in 3 phases:
      
      1- Assignment of block weights. All the basic blocks in the function
         are initially assigned the same weight as their most frequently
         executed instruction.
      
      2- Creation of equivalence classes. Since samples may be missing from
         blocks, we can fill in the gaps by setting the weights of all the
         blocks in the same equivalence class to the same weight. To compute
         the concept of equivalence, we use dominance and loop information.
         Two blocks B1 and B2 are in the same equivalence class if B1
         dominates B2, B2 post-dominates B1 and both are in the same loop.
      
      3- Propagation of block weights into edges. This uses a simple
         propagation heuristic. The following rules are applied to every
         block B in the CFG:
      
         - If B has a single predecessor/successor, then the weight
           of that edge is the weight of the block.
      
         - If all the edges are known except one, and the weight of the
           block is already known, the weight of the unknown edge will
           be the weight of the block minus the sum of all the known
           edges. If the sum of all the known edges is larger than B's weight,
           we set the unknown edge weight to zero.
      
         - If there is a self-referential edge, and the weight of the block is
           known, the weight for that edge is set to the weight of the block
           minus the weight of the other incoming edges to that block (if
           known).
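
The first two edge rules in phase 3 can be sketched as a single helper. This is a minimal Python illustration, not the pass's actual code: the name `infer_unknown_edge`, the use of `None` for an unknown weight, and the string edge labels are all assumptions.

```python
def infer_unknown_edge(block_weight, edge_weights):
    """Infer the one unknown edge weight on one side of a block.

    edge_weights maps an edge label to its weight, with None meaning unknown.
    Returns (edge, weight) when exactly one edge is unknown, otherwise None.
    """
    unknown = [edge for edge, weight in edge_weights.items() if weight is None]
    if len(unknown) != 1:
        return None
    known_sum = sum(w for w in edge_weights.values() if w is not None)
    # Clamp at zero when the known edges already exceed the block's weight.
    return unknown[0], max(block_weight - known_sum, 0)


single = infer_unknown_edge(42, {"B->C": None})
remainder = infer_unknown_edge(100, {"B->C": 30, "B->D": None})
clamped = infer_unknown_edge(10, {"B->C": 30, "B->D": None})
```

A block with a single predecessor or successor is just the case where no other edges are known, so the edge receives the full block weight.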
      
      Since this propagation is not guaranteed to finalize for every CFG, we
      only allow it to proceed for a limited number of iterations (controlled
      by -sample-profile-max-propagate-iterations). It currently uses the same
      GCC default of 100.
      
      Before propagation starts, the pass builds (for each block) a list of
      unique predecessors and successors. This is necessary to handle
      identical edges in multiway branches. Since we visit all blocks and all
      edges of the CFG, it is cleaner to build these lists once at the start
      of the pass.
      
      Finally, the patch fixes the computation of relative line locations.
      The profiler emits lines relative to the function header. To discover
      it, we traverse the compilation unit looking for the subprogram
      corresponding to the function. The line number of that subprogram is the
      line where the function begins. That becomes line zero for all the
      relative locations.
      
      llvm-svn: 198972