  1. Mar 10, 2015
    • DataLayout is mandatory, update the API to reflect it with references. · a28d91d8
      Mehdi Amini authored
      Summary:
      Now that the DataLayout is a mandatory part of the module, let's start
      cleaning the codebase. This patch is a first attempt at doing that.
      
      This patch is not exactly NFC as for instance some places were passing
      a nullptr instead of the DataLayout, possibly just because there was a
      default value on the DataLayout argument to many functions in the API.
      Even though it is not purely NFC, there is no change in the
      validation.
      
      I turned as many pointers to DataLayout into references as possible; this
      helped figure out all the places where a nullptr could come up.
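      The sketch below illustrates the pointer-to-reference change. It uses a
      hypothetical `DataLayout` stand-in and made-up function names, not the
      actual LLVM signatures. With a defaulted pointer, a caller can silently
      pass nothing. With a reference, that call site stops compiling, which is
      how the nullptr callers surface.

```cpp
#include <cstdint>

// Hypothetical stand-in for llvm::DataLayout: only the pointer size matters here.
struct DataLayout {
  unsigned PointerSizeInBits = 64;
};

// Before (hypothetical API): an optional pointer with a nullptr default, so
// every callee must handle "no layout" and callers can forget to pass one.
uint64_t pointerSizeInBytesOld(const DataLayout *DL = nullptr) {
  if (!DL)
    return 0; // unknown size: the caller has to cope with a missing layout
  return DL->PointerSizeInBits / 8;
}

// After (hypothetical API): the layout is mandatory, and the signature says so.
uint64_t pointerSizeInBytes(const DataLayout &DL) {
  return DL.PointerSizeInBits / 8;
}
```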
      
      I initially had a local version of this patch broken into over 30
      independent commits, but some later commits were cleaning the API and
      touching parts of the code modified in the earlier commits, so it
      seemed cleaner without the intermediate state.
      
      Test Plan:
      
      Reviewers: echristo
      
      Subscribers: llvm-commits
      
      From: Mehdi Amini <mehdi.amini@apple.com>
      llvm-svn: 231740
  2. Mar 09, 2015
  3. Mar 06, 2015
    • Add a new pass "Loop Interchange" · 88db86dd
      Karthik Bhat authored
      This pass interchanges loops to provide a more cache-friendly memory access pattern.
      
      For example, given a loop like:
        for(int i=0;i<N;i++)
          for(int j=0;j<N;j++)
            A[j][i] = A[j][i]+B[j][i];
      
      it is interchanged to:
        for(int j=0;j<N;j++)
          for(int i=0;i<N;i++)
            A[j][i] = A[j][i]+B[j][i];
      
      This pass is currently disabled by default.
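      Both loop orders compute the same result. Each iteration reads and writes
      only its own A[j][i], so there is no loop-carried dependence, and only
      the memory access order changes. A minimal C++ sketch of the two orders
      follows; the `Matrix` alias and function names are illustrative.

```cpp
#include <vector>

// Each row A[j] is a contiguous vector, so A[j][i] and A[j][i+1] are adjacent
// in memory while A[j][i] and A[j+1][i] live in different rows.
using Matrix = std::vector<std::vector<int>>;

// Original order: the inner loop walks down a column, jumping between rows.
void addColumnOrder(Matrix &A, const Matrix &B, int N) {
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      A[j][i] = A[j][i] + B[j][i];
}

// Interchanged order: the inner loop walks along a single row.
void addRowOrder(Matrix &A, const Matrix &B, int N) {
  for (int j = 0; j < N; j++)
    for (int i = 0; i < N; i++)
      A[j][i] = A[j][i] + B[j][i];
}
```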
      
      To give a brief introduction, it consists of 3 stages:

      LoopInterchangeLegality: Checks the legality of loop interchange based on the dependency matrix.
      LoopInterchangeProfitability: A very basic heuristic has been added to check for profitability. This will evolve over time.
      LoopInterchangeTransform: Does the actual transform.
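      The following is a sketch of the kind of check the legality stage
      performs. It uses the textbook direction-vector criterion, and the
      pass's actual rules may differ. Interchanging two loops is legal if every
      dependence direction vector, with those two entries swapped, still has
      '<' as its first non-'=' entry. The names are hypothetical, and '*'
      directions are omitted for brevity. In the example above, the only
      dependence has direction vector "==", so interchange is legal.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Each dependence is a direction vector, outermost loop first, with one of
// '<', '=' or '>' per loop level (simplified: no '*' entries).
using DepMatrix = std::vector<std::string>;

// Valid if the first non-'=' entry is '<' (source runs before sink),
// or if every entry is '=' (a loop-independent dependence).
static bool isLexPositiveOrZero(const std::string &Dir) {
  for (char C : Dir) {
    if (C == '<') return true;
    if (C == '>') return false;
  }
  return true;
}

// Legal if every dependence stays valid after swapping the two loop columns.
// Assumes each vector has entries for both Outer and Inner.
bool isInterchangeLegal(const DepMatrix &Deps, size_t Outer, size_t Inner) {
  for (std::string Dir : Deps) { // copy, so the swap does not touch Deps
    std::swap(Dir[Outer], Dir[Inner]);
    if (!isLexPositiveOrZero(Dir))
      return false;
  }
  return true;
}
```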
      
      LNT performance tests show improvements in the Polybench/linear-algebra/kernels/mvt and Polybench/linear-algebra/kernels/gemver benchmarks.
      
      TODO:
      1) Add support for reductions and lcssa phi.
      2) Improve profitability model.
      3) Improve loop selection algorithm to select best loop for interchange. Currently the innermost loop is selected for interchange.
      4) Reduce the compile-time regression found in LLVM LNT due to this pass.
      5) Fix issues in the Dependency Analysis module.
      
      A special thanks to Hal for reviewing this code.
      Review: http://reviews.llvm.org/D7499
      
      llvm-svn: 231458
  4. Mar 04, 2015
    • Make DataLayout Non-Optional in the Module · 46a43556
      Mehdi Amini authored
      Summary:
      DataLayout keeps the string used for its creation.
      
      As a side effect it is no longer needed in the Module.
      This is "almost" NFC: the string is no longer
      canonicalized, so you can't rely on two "equal" DataLayouts
      having the same string returned by getStringRepresentation().
      
      Get rid of DataLayoutPass: the DataLayout is in the Module
      
      The DataLayout is "per-module", let's enforce this by not
      duplicating it more than necessary.
      One more step toward non-optionality of the DataLayout in the
      module.
      
      Make DataLayout Non-Optional in the Module
      
      Module->getDataLayout() will never return nullptr anymore.
      
      Reviewers: echristo
      
      Subscribers: resistor, llvm-commits, jholewinski
      
      Differential Revision: http://reviews.llvm.org/D7992
      
      From: Mehdi Amini <mehdi.amini@apple.com>
      llvm-svn: 231270
  5. Mar 03, 2015
    • LowerBitSets: Use byte arrays instead of bit sets to represent in-memory bit sets. · da2dbf21
      Peter Collingbourne authored
      By loading from indexed offsets into a byte array and applying a mask, a
      program can test bits from the bit set with a relatively short instruction
      sequence. For example, suppose we have 15 bit sets to lay out:
      
      A (16 bits), B (15 bits), C (14 bits), D (13 bits), E (12 bits),
      F (11 bits), G (10 bits), H (9 bits), I (7 bits), J (6 bits), K (5 bits),
      L (4 bits), M (3 bits), N (2 bits), O (1 bit)
      
      These bits can be laid out in a 16-byte array like this:
      
            Byte Offset
          0123456789ABCDEF
      Bit
        7 HHHHHHHHHIIIIIII
        6 GGGGGGGGGGJJJJJJ
        5 FFFFFFFFFFFKKKKK
        4 EEEEEEEEEEEELLLL
        3 DDDDDDDDDDDDDMMM
        2 CCCCCCCCCCCCCCNN
        1 BBBBBBBBBBBBBBBO
        0 AAAAAAAAAAAAAAAA
      
      For example, to test bit X of A, we evaluate ((bits[X] & 1) != 0), or to
      test bit X of I, we evaluate ((bits[9 + X] & 0x80) != 0). This can be done
      in 1-2 machine instructions on x86, or 4-6 instructions on ARM.
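      The test expressions above can be written as a small C++ sketch. Each bit
      set is described by the byte offset where it starts and the single-bit
      mask it occupies in every byte: A is at offset 0 with mask 1, and I is at
      offset 9 with mask 0x80. The struct and function names are hypothetical.
      The pass itself emits IR, not C++.

```cpp
#include <cstddef>
#include <cstdint>

// Where one bit set lives inside the shared byte array.
struct BitSetLocation {
  size_t ByteOffset; // first byte used by this bit set
  uint8_t Mask;      // the one bit of each byte that belongs to it
};

// Tests bit X of a bit set: one indexed load plus one AND.
bool testBit(const uint8_t *Bits, BitSetLocation Loc, size_t X) {
  return (Bits[Loc.ByteOffset + X] & Loc.Mask) != 0;
}

// Sets bit X of a bit set while the array is being built.
void setBit(uint8_t *Bits, BitSetLocation Loc, size_t X) {
  Bits[Loc.ByteOffset + X] |= Loc.Mask;
}
```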
      
      This uses the LPT multiprocessor scheduling algorithm to lay out the bits
      efficiently.
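      Here is a minimal sketch of LPT (longest processing time first) applied to
      this problem. Each of the 8 bit positions of a byte acts as a
      "processor", and a bit set's size is its "job length". Sets are placed
      largest first, each on the currently least-loaded bit position. For the
      15 example bit sets, this greedy assignment reproduces the table above.
      The function and struct names are hypothetical. The pass's own
      implementation may break ties or order inputs differently.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Result of placing one bit set: which bit (0-7) of each byte it uses and
// the byte offset at which it starts.
struct Placement {
  std::string Name;
  unsigned Bit;
  size_t ByteOffset;
};

// LPT: sort by size (largest first), then give each bit set to the bit
// position that currently uses the fewest bytes.
std::vector<Placement>
layoutLPT(std::vector<std::pair<std::string, size_t>> Sets) {
  std::stable_sort(Sets.begin(), Sets.end(), [](const auto &L, const auto &R) {
    return L.second > R.second;
  });
  std::array<size_t, 8> Load{}; // bytes already used in each bit position
  std::vector<Placement> Result;
  for (const auto &[Name, Size] : Sets) {
    // min_element returns the lowest bit position on ties.
    unsigned Bit = static_cast<unsigned>(
        std::min_element(Load.begin(), Load.end()) - Load.begin());
    Result.push_back({Name, Bit, Load[Bit]});
    Load[Bit] += Size;
  }
  return Result;
}
```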
      
      Saves ~450KB of instructions in a recent build of Chromium.
      
      Differential Revision: http://reviews.llvm.org/D7954
      
      llvm-svn: 231043
  6. Feb 28, 2015
  7. Feb 25, 2015
    • LowerBitSets: Align referenced globals. · eba7f73f
      Peter Collingbourne authored
      This change aligns globals to the next highest power of 2 bytes, up to a
      maximum of 128. This makes it more likely that we will be able to compress
      bit sets with a greater alignment. In many more cases, we can now take
      advantage of a new optimization also introduced in this patch that removes
      bit set checks if the bit set is all ones.
      
      The 128 byte maximum was found to provide the best tradeoff between instruction
      overhead and data overhead in a recent build of Chromium. It allows us to
      remove ~2.4MB of instructions at the cost of ~250KB of data.
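      Below is a sketch of the alignment rule and the all-ones check described
      above. It assumes the rounding is applied to each global's size and that
      a size already equal to a power of two is kept. The helper names are
      hypothetical. Presumably an all-ones bit set still needs the range and
      alignment check; only the per-bit test goes away.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Rounds a size up to a power of two, capped at MaxAlign bytes
// (MaxAlign is assumed to be a power of two).
uint64_t bitSetAlignmentFor(uint64_t SizeInBytes, uint64_t MaxAlign = 128) {
  uint64_t Align = 1;
  while (Align < SizeInBytes && Align < MaxAlign)
    Align <<= 1;
  return Align;
}

// True if every bit is set, in which case the per-bit test is redundant.
bool isAllOnes(const std::vector<bool> &Bits) {
  return std::all_of(Bits.begin(), Bits.end(), [](bool B) { return B; });
}
```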
      
      Differential Revision: http://reviews.llvm.org/D7873
      
      llvm-svn: 230540
    • LowerBitSets: Introduce global layout builder. · 1baeaa39
      Peter Collingbourne authored
      The builder is based on a layout algorithm that tries to keep members of
      small bit sets together. The new layout compresses Chromium's bit sets to
      around 15% of their original size.
      
      Differential Revision: http://reviews.llvm.org/D7796
      
      llvm-svn: 230394
  8. Feb 22, 2015
  9. Feb 20, 2015
    • Introduce bitset metadata format and bitset lowering pass. · e6909c8e
      Peter Collingbourne authored
      This patch introduces a new mechanism that allows IR modules to co-operatively
      build pointer sets corresponding to addresses within a given set of
      globals. One particular use case for this is to allow a C++ program to
      efficiently verify (at each call site) that a vtable pointer is in the set
      of valid vtable pointers for the class or its derived classes. One way of
      doing this is for a toolchain component to build, for each class, a bit set
      that maps to the memory region allocated for the vtables, such that each 1
      bit in the bit set maps to a valid vtable for that class, and lay out the
      vtables next to each other, to minimize the total size of the bit sets.
      
      The patch introduces a metadata format for representing pointer sets, an
      '@llvm.bitset.test' intrinsic and an LTO lowering pass that lays out the globals
      and builds the bitsets, and documents the new feature.
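      The sketch below shows one plausible runtime check after lowering a
      pointer set test. It assumes the valid addresses sit at a fixed
      power-of-two spacing from the start of the combined global layout. The
      struct, its fields, and the function are hypothetical. The real lowering
      emits IR and may combine these checks differently.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical lowered bit set: start of the laid-out globals, log2 of the
// spacing between candidate addresses, and one bit per candidate slot.
struct LoweredBitSet {
  uintptr_t Base;
  unsigned AlignLog2;
  std::vector<bool> Bits;
};

// Returns true if Ptr is one of the addresses recorded in the set.
bool isMember(const LoweredBitSet &S, uintptr_t Ptr) {
  uintptr_t Offset = Ptr - S.Base; // wraps to a huge value if Ptr < Base
  uintptr_t Mask = (uintptr_t(1) << S.AlignLog2) - 1;
  if (Offset & Mask) // not on a slot boundary
    return false;
  uintptr_t Index = Offset >> S.AlignLog2;
  if (Index >= S.Bits.size()) // outside the region (also catches Ptr < Base)
    return false;
  return S.Bits[Index];
}
```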
      
      Differential Revision: http://reviews.llvm.org/D7288
      
      llvm-svn: 230054
  10. Feb 17, 2015
    • [BDCE] Add a bit-tracking DCE pass · 2bb61ba2
      Hal Finkel authored
      BDCE is a bit-tracking dead code elimination pass. It is based on ADCE (the
      "aggressive DCE" pass), with the added capability to track dead bits of integer
      valued instructions and remove those instructions when all of the bits are
      dead.
      
      Currently, it does not actually do this all-bits-dead removal, but rather
      replaces the instruction's uses with a constant zero, and lets instcombine (and
      the later run of ADCE) do the rest. Because we essentially get a run of ADCE
      "for free" while tracking the dead bits, we also do what ADCE does and remove
      actually-dead instructions as well (this includes instructions newly trivially
      dead because all bits were dead, but not all such instructions can be removed).
      
      The motivation for this is a case like:
      
      int __attribute__((const)) foo(int i);
      int bar(int x) {
        x |= (4 & foo(5));
        x |= (8 & foo(3));
        x |= (16 & foo(2));
        x |= (32 & foo(1));
        x |= (64 & foo(0));
        x |= (128 & foo(4));
        return x >> 4;
      }
      
      As it turns out, if you order the bit-field insertions so that all of the dead
      ones come last, then instcombine will remove them. However, if you pick some
      other order (such as the one above), the fact that some of the calls to foo()
      are useless is not locally obvious, and we don't remove them (without this
      pass).
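      The next block is a hand-computed analogue of the reasoning, not the
      pass itself. `x >> 4` only uses bits 4 and above of x. The masks 4 and 8
      touch only bits 2 and 3, so the results of foo(5) and foo(3) have no
      live bits. The sketch checks each mask against the demanded bits; its
      names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Bits of x that can still affect `x >> 4` for a 32-bit int: bits 4..31.
constexpr uint32_t DemandedByShift = ~uint32_t(0) << 4;

// For each `x |= (Mask & foo(...))` term, the call's result only matters if
// its mask overlaps the demanded bits; otherwise all of its bits are dead.
std::vector<uint32_t> deadMasks(const std::vector<uint32_t> &Masks,
                                uint32_t Demanded) {
  std::vector<uint32_t> Dead;
  for (uint32_t M : Masks)
    if ((M & Demanded) == 0)
      Dead.push_back(M);
  return Dead;
}
```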
      
      I did a quick compile-time overhead check using sqlite from the test suite
      (Release+Asserts). BDCE took ~0.4% of the compilation time (making it about
      twice as expensive as ADCE).
      
      I've not looked at why yet, but we eliminate instructions due to having
      all-dead bits in:
      External/SPEC/CFP2006/447.dealII/447.dealII
      External/SPEC/CINT2006/400.perlbench/400.perlbench
      External/SPEC/CINT2006/403.gcc/403.gcc
      MultiSource/Applications/ClamAV/clamscan
      MultiSource/Benchmarks/7zip/7zip-benchmark
      
      llvm-svn: 229462
  11. Feb 16, 2015
  12. Feb 14, 2015
  13. Feb 13, 2015
    • [PM] Remove the old 'PassManager.h' header file at the top level of · 30d69c2e
      Chandler Carruth authored
      LLVM's include tree and the use of using declarations to hide the
      'legacy' namespace for the old pass manager.
      
      This undoes the primary modules-hostile change I made to keep
      out-of-tree targets building. I sent an email inquiring about whether
      this would be reasonable to do at this phase and people seemed fine with
      it, so I'm making it a reality. This should allow us to start bootstrapping
      with modules to a certain extent along with making it easier to mix and
      match headers in general.
      
      The updates to any code for users of LLVM are very mechanical. Switch
      from including "llvm/PassManager.h" to "llvm/IR/LegacyPassManager.h".
      Qualify the types which now produce compile errors with "legacy::". The
      most common ones are "PassManager", "PassManagerBase", and
      "FunctionPassManager".
      
      llvm-svn: 229094
  14. Feb 12, 2015
    • DeadArgElim: aggregate Return assessment properly. · 02438033
      Tim Northover authored
      I mistakenly thought the liveness of each "RetVal(F, i)" depended only on F. It
      actually depends on the index too, which means we need to be careful about how
      the results are combined before return. In particular if a single Use returns
      Live, that counts for the entire object, at the granularity we're considering.
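      Below is a simplified sketch of the combining rule described above. The
      enum echoes DeadArgElim's Live/MaybeLive results. The function is a
      hypothetical reduction of the idea, not the pass's code. Per-index
      results for one Use are folded together, and any single Live result
      makes the whole object Live.

```cpp
#include <vector>

// Simplified liveness results, in the spirit of DeadArgElim's lattice.
enum class Liveness { MaybeLive, Live };

// Folds the per-index results of one Use: one Live result is enough to make
// the entire object Live at this granularity.
Liveness combine(const std::vector<Liveness> &PerIndex) {
  for (Liveness L : PerIndex)
    if (L == Liveness::Live)
      return Liveness::Live;
  return Liveness::MaybeLive;
}
```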
      
      llvm-svn: 228885
  15. Feb 11, 2015
  16. Feb 10, 2015
    • DeadArgElim: arguments affect all returned sub-values by default. · 43c0d2db
      Tim Northover authored
      Unless we meet an insertvalue on a path from some value to a return, that value
      will be live if *any* of the return's components are live, so all of those
      components must be added to the MaybeLiveUses.
      
      Previously we were deleting arguments if sub-value 0 turned out to be dead.
      
      llvm-svn: 228731
  17. Feb 09, 2015
    • DeadArgElim: fix mismatch in accounting of array return types. · 705d2af9
      Tim Northover authored
      Some parts of DeadArgElim were only considering the individual fields
      of StructTypes separately, but others (where insertvalue &
      extractvalue instructions occur) also looked into ArrayTypes.
      
      This one is an actual bug; the mismatch can lead to an argument being
      considered used by a return sub-value that isn't being tracked (and
      hence is dead by default). It then gets incorrectly eliminated.
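      The bug came from two parts of the analysis splitting return values into
      sub-values differently. This sketch shows one shared counting helper that
      treats struct fields and array elements alike. The type model and names
      are hypothetical; they only stand in for LLVM's StructType and ArrayType.

```cpp
#include <cstddef>

// Toy type model: just enough to count return sub-values.
struct TypeInfo {
  enum Kind { Scalar, Struct, Array } K;
  size_t NumElements; // fields for Struct, elements for Array; unused for Scalar
};

// A single helper used everywhere, so every part of the analysis tracks the
// same number of sub-values for struct and array returns.
size_t numRetVals(const TypeInfo &T) {
  switch (T.K) {
  case TypeInfo::Struct:
  case TypeInfo::Array:
    return T.NumElements;
  case TypeInfo::Scalar:
    return 1;
  }
  return 1;
}
```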
      
      llvm-svn: 228559
    • DeadArgElim: assess uses of entire return value aggregate. · 854c927d
      Tim Northover authored
      Previously, a non-extractvalue use of an aggregate return value meant
      the entire return was considered live (the algorithm gave up
      entirely). This was correct, but conservative. It's better to actually
      look at that Use, making the analysis results apply to all sub-values
      under consideration.
      
      E.g.
      
        %val = call { i32, i32 } @whatever()
        [...]
        ret { i32, i32 } %val
      
      The return is using the entire aggregate (sub-values 0 and 1). We can
      still simplify @whatever if we can prove that this return is itself
      unused.
      
      Also unifies the logic slightly between aggregate and non-aggregate
      cases.
      
      llvm-svn: 228558
  18. Feb 04, 2015
  19. Jan 30, 2015
    • [PM] Sink the population of the pass manager with target-specific · 1efa12d6
      Chandler Carruth authored
      analyses back into the LTO code generator.
      
      The pass manager builder (and the transforms library in general)
      shouldn't be referencing the target machine at all.
      
      This makes the LTO population work like the others -- the data layout
      and target transform info need to be pre-populated.
      
      llvm-svn: 227576
  20. Jan 27, 2015
  21. Jan 19, 2015
  22. Jan 15, 2015
    • [PM] Separate the TargetLibraryInfo object from the immutable pass. · b98f63db
      Chandler Carruth authored
      The pass is really just a means of accessing a cached instance of the
      TargetLibraryInfo object, and this way we can re-use that object for the
      new pass manager as its result.
      
      Lots of delta, but nothing interesting happening here. This is the
      common pattern that is developing to allow analyses to live in both the
      old and new pass manager -- a wrapper pass in the old pass manager
      emulates the separation intrinsic to the new pass manager between the
      result and pass for analyses.
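      The following is a sketch of the result/wrapper separation described
      above. All class names are hypothetical and only mirror the shape of the
      pattern, not the actual TargetLibraryInfo API. The result is a plain
      object. The old-pass-manager wrapper owns one cached instance. A
      new-pass-manager analysis simply produces the result.

```cpp
#include <string>
#include <utility>

// The analysis result: a plain object with no pass-manager machinery.
class LibraryInfo {
public:
  explicit LibraryInfo(std::string Triple) : Triple(std::move(Triple)) {}
  const std::string &triple() const { return Triple; }

private:
  std::string Triple;
};

// Old pass manager: a thin wrapper that owns one cached result object.
class LibraryInfoWrapperPass {
public:
  explicit LibraryInfoWrapperPass(std::string Triple) : Info(std::move(Triple)) {}
  LibraryInfo &getInfo() { return Info; }

private:
  LibraryInfo Info;
};

// New pass manager: the analysis just computes and returns the result.
struct LibraryInfoAnalysis {
  std::string Triple;
  LibraryInfo run() const { return LibraryInfo(Triple); }
};
```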
      
      llvm-svn: 226157
    • 24ebfcb6
      NAKAMURA Takumi authored
    • [PM] Move TargetLibraryInfo into the Analysis library. · 62d4215b
      Chandler Carruth authored
      While the term "Target" is in the name, it doesn't really have to do
      with the LLVM Target library -- this isn't an abstraction which LLVM
      targets generally need to implement or extend. It has much more to do
      with modeling the various runtime libraries on different OSes and with
      different runtime environments. The "target" in this sense is the more
      general sense of a target of cross compilation.
      
      This is in preparation for porting this analysis to the new pass
      manager.
      
      No functionality changed, and updates inbound for Clang and Polly.
      
      llvm-svn: 226078
  23. Jan 13, 2015
    • Standardize {pred,succ,use,user}_empty() · 40c3e03e
      Ramkumar Ramachandra authored
      The functions {pred,succ,use,user}_{begin,end} exist, but many users
      have to compare *_begin() against *_end() by hand to determine whether the
      BasicBlock or User is empty. Fix this with a standard *_empty(),
      demonstrating a few use cases.
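      This sketch shows the shape of such a helper. `Block`, `pred_begin`,
      `pred_end` and `pred_empty` here are hypothetical stand-ins that mirror
      the names, not the LLVM declarations. The point is that the
      begin/end comparison lives in one place instead of at every call site.

```cpp
#include <vector>

// Hypothetical stand-in for a basic block with a predecessor list.
struct Block {
  std::vector<Block *> Preds;
};

inline auto pred_begin(const Block &B) { return B.Preds.begin(); }
inline auto pred_end(const Block &B) { return B.Preds.end(); }

// The standardized helper replaces hand-written begin == end checks.
inline bool pred_empty(const Block &B) { return pred_begin(B) == pred_end(B); }
```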
      
      llvm-svn: 225760
  24. Jan 04, 2015
    • [PM] Split the AssumptionTracker immutable pass into two separate APIs: · 66b3130c
      Chandler Carruth authored
      a cache of assumptions for a single function, and an immutable pass that
      manages those caches.
      
      The motivation for this change is twofold. Immutable analyses are
      really hacks around the current pass manager design and don't exist in
      the new design. This is usually OK, but it requires that the core logic
      of an immutable pass be reasonably partitioned off from the pass logic.
      This change does precisely that. As a consequence it also paves the way
      for the *many* utility functions that deal in the assumptions to live in
      both pass manager worlds by creating a separate non-pass object with
      its own independent API that they all rely on. Now, the only bits of the
      system that deal with the actual pass mechanics are those that actually
      need to deal with the pass mechanics.
      
      Once this separation is made, several simplifications become pretty
      obvious in the assumption cache itself. Rather than using a set and
      callback value handles, it can just be a vector of weak value handles.
      The callers can easily skip the handles that are null, and eventually we
      can wrap all of this up behind a filter iterator.
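      The next block sketches the "vector of weak handles" design as an
      analogy. `std::weak_ptr` stands in for LLVM's weak value handles, and the
      class name is hypothetical. Deleted assumptions leave expired entries
      behind, and callers simply skip them.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

struct Assumption {}; // stand-in for an assume call

// A flat vector of weak handles instead of a set plus callback handles.
class AssumptionCacheSketch {
public:
  void registerAssumption(const std::shared_ptr<Assumption> &A) {
    Handles.push_back(A);
  }

  // Callers skip handles whose target has been destroyed.
  size_t countLive() const {
    size_t N = 0;
    for (const auto &H : Handles)
      if (!H.expired())
        ++N;
    return N;
  }

private:
  std::vector<std::weak_ptr<Assumption>> Handles;
};
```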
      
      For now, this adds boilerplate to the various passes, but this kind of
      boilerplate will end up making it possible to port these passes to the
      new pass manager, and so it will end up factored away pretty reasonably.
      
      llvm-svn: 225131
  25. Dec 15, 2014
  26. Dec 09, 2014
    • IR: Split Metadata from Value · 5bf8fef5
      Duncan P. N. Exon Smith authored
      Split `Metadata` away from the `Value` class hierarchy, as part of
      PR21532.  Assembly and bitcode changes are in the wings, but this is the
      bulk of the change for the IR C++ API.
      
      I have a follow-up patch prepared for `clang`.  If this breaks other
      sub-projects, I apologize in advance :(.  Help me compile it on Darwin
      and I'll try to fix it.  FWIW, the errors should be easy to fix, so it may
      be simpler to just fix it yourself.
      
      This breaks the build for all metadata-related code that's out-of-tree.
      Rest assured the transition is mechanical and the compiler should catch
      almost all of the problems.
      
      Here's a quick guide for updating your code:
      
        - `Metadata` is the root of a class hierarchy with three main classes:
          `MDNode`, `MDString`, and `ValueAsMetadata`.  It is distinct from
          the `Value` class hierarchy.  It is typeless -- i.e., instances do
          *not* have a `Type`.
      
        - `MDNode`'s operands are all `Metadata *` (instead of `Value *`).
      
        - `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be
          replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively.
      
          If you're referring solely to resolved `MDNode`s -- post graph
          construction -- just use `MDNode*`.
      
        - `MDNode` (and the rest of `Metadata`) have only limited support for
          `replaceAllUsesWith()`.
      
          As long as an `MDNode` is pointing at a forward declaration -- the
          result of `MDNode::getTemporary()` -- it maintains a side map of its
          uses and can RAUW itself.  Once the forward declarations are fully
          resolved, RAUW support is dropped on the ground.  This means that
          uniquing collisions on changing operands cause nodes to become
          "distinct".  (This already happened fairly commonly, whenever an
          operand went to null.)
      
          If you're constructing complex (non self-reference) `MDNode` cycles,
          you need to call `MDNode::resolveCycles()` on each node (or on a
          top-level node that somehow references all of the nodes).  Also,
          don't do that.  Metadata cycles (and the RAUW machinery needed to
          construct them) are expensive.
      
        - An `MDNode` can only refer to a `Constant` through a bridge called
          `ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`).
      
          As a side effect, accessing an operand of an `MDNode` that is known
          to be, e.g., `ConstantInt`, takes three steps: first, cast from
          `Metadata` to `ConstantAsMetadata`; second, extract the `Constant`;
          third, cast down to `ConstantInt`.
      
          The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have
          metadata schema owners transition away from using `Constant`s when
          the type isn't important (and they don't care about referring to
          `GlobalValue`s).
      
          In the meantime, I've added transitional API to the `mdconst`
          namespace that matches semantics with the old code, in order to
          avoid adding the error-prone three-step equivalent to every call
          site.  If your old code was:
      
              MDNode *N = foo();
              bar(isa             <ConstantInt>(N->getOperand(0)));
              baz(cast            <ConstantInt>(N->getOperand(1)));
              bak(cast_or_null    <ConstantInt>(N->getOperand(2)));
              bat(dyn_cast        <ConstantInt>(N->getOperand(3)));
              bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4)));
      
          you can trivially match its semantics with:
      
              MDNode *N = foo();
              bar(mdconst::hasa               <ConstantInt>(N->getOperand(0)));
              baz(mdconst::extract            <ConstantInt>(N->getOperand(1)));
              bak(mdconst::extract_or_null    <ConstantInt>(N->getOperand(2)));
              bat(mdconst::dyn_extract        <ConstantInt>(N->getOperand(3)));
              bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4)));
      
          and when you transition your metadata schema to `MDInt`:
      
              MDNode *N = foo();
              bar(isa             <MDInt>(N->getOperand(0)));
              baz(cast            <MDInt>(N->getOperand(1)));
              bak(cast_or_null    <MDInt>(N->getOperand(2)));
              bat(dyn_cast        <MDInt>(N->getOperand(3)));
              bay(dyn_cast_or_null<MDInt>(N->getOperand(4)));
      
        - A `CallInst` -- specifically, intrinsic instructions -- can refer to
          metadata through a bridge called `MetadataAsValue`.  This is a
          subclass of `Value` where `getType()->isMetadataTy()`.
      
          `MetadataAsValue` is the *only* class that can legally refer to a
          `LocalAsMetadata`, which is a bridged form of non-`Constant` values
          like `Argument` and `Instruction`.  It can also refer to any other
          `Metadata` subclass.
      
      (I'll break all your testcases in a follow-up commit, when I propagate
      this change to assembly.)
      
      llvm-svn: 223802
  27. Dec 03, 2014
    • Prologue support · 51d2de7b
      Peter Collingbourne authored
      Patch by Ben Gamari!
      
      This redefines the `prefix` attribute introduced previously and
      introduces a `prologue` attribute.  There are three primary use cases
      that these attributes aim to serve:
      
        1. Function prologue sigils
      
        2. Function hot-patching: Enable the user to insert `nop` operations
           at the beginning of the function which can later be safely replaced
           with a call to some instrumentation facility
      
        3. Runtime metadata: Allow a compiler to insert data for use by the
           runtime during execution. GHC is one example of a compiler that
           needs this functionality for its tables-next-to-code functionality.
      
      Previously `prefix` served cases (1) and (2) quite well by allowing the user
      to introduce arbitrary data at the entrypoint but before the function
      body. Case (3), however, was poorly handled by this approach as it
      required that prefix data was valid executable code.
      
      Here we redefine the notion of prefix data to instead be data which
      occurs immediately before the function entrypoint (i.e. the symbol
      address). Since prefix data now occurs before the function entrypoint,
      there is no need for the data to be valid code.
      
      The previous notion of prefix data now goes under the name "prologue
      data" to emphasize its duality with the function epilogue.
      
      The intention here is to handle cases (1) and (2) with prologue data and
      case (3) with prefix data.
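      The redefined prefix data can be simulated with a byte buffer. The data
      sits immediately before the function's symbol address. A runtime holding
      only that address can therefore find it by looking backwards, and it
      never has to be valid code. Everything here is hypothetical, including
      the `Info` record and the section layout. It is not how LLVM emits
      prefix data; it only illustrates the addressing.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical runtime metadata a compiler might store as prefix data.
struct Info {
  uint32_t Arity;
  uint32_t Tag;
};

constexpr size_t kPrefixSize = sizeof(Info);

// Simulated code section laid out as [prefix data][function body]; the
// function's symbol address (entrypoint) is just past the prefix.
struct Section {
  std::array<uint8_t, 64> Bytes{};
  size_t EntryOffset = kPrefixSize;
};

void writePrefix(Section &S, const Info &I) {
  std::memcpy(S.Bytes.data() + S.EntryOffset - kPrefixSize, &I, kPrefixSize);
}

// Reads the data back relative to the entrypoint (tables-next-to-code style).
Info readPrefix(const Section &S) {
  Info I;
  std::memcpy(&I, S.Bytes.data() + S.EntryOffset - kPrefixSize, kPrefixSize);
  return I;
}
```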
      
      References
      ----------
      
      This idea arose out of discussions[1] with Reid Kleckner in response to a
      proposal to introduce the notion of symbol offsets to enable handling of
      case (3).
      
      [1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-May/073235.html
      
      Test Plan: testsuite
      
      Differential Revision: http://reviews.llvm.org/D6454
      
      llvm-svn: 223189
  28. Nov 21, 2014
  29. Nov 19, 2014
  30. Oct 28, 2014
  31. Oct 26, 2014
    • Add an option to the LTO code generator to disable vectorization during LTO · eb1a38fa
      Arnold Schwaighofer authored
      We used to always vectorize (SLP and loop vectorization) in the LTO pass pipeline.
      
      r220345 changed it so that we used the PassManager's fields 'LoopVectorize' and
      'SLPVectorize' out of the desire to be able to disable vectorization using the
      cl::opt flags 'vectorize-loops'/'slp-vectorize', to which those fields
      default.
      Unfortunately, this turns off vectorization because those fields
      default to false.
      This commit adds flags to the LTO library to disable LTO vectorization, which
      reconciles the desire to optionally disable vectorization during LTO with
      the desired behavior of defaulting to enabled vectorization.
      
      We really want tools to set PassManager flags directly to enable/disable
      vectorization and not go the route via cl::opt flags *in*
      PassManagerBuilder.cpp.
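      The following is a minimal sketch of the recommended approach, in which
      the tool sets the builder fields directly from its own option. The field
      names LoopVectorize and SLPVectorize come from the text above. The
      structs, the `DisableVectorization` option and the function are
      hypothetical.

```cpp
// Hypothetical mirror of the two builder fields; both default to false here,
// which is what turned vectorization off when LTO relied on the defaults.
struct BuilderFlags {
  bool LoopVectorize = false;
  bool SLPVectorize = false;
};

// Hypothetical LTO-side option: vectorization stays on unless the tool opts out.
struct LTOOptions {
  bool DisableVectorization = false;
};

// The tool sets the builder fields explicitly instead of inheriting defaults.
BuilderFlags configureForLTO(const LTOOptions &Opts) {
  BuilderFlags F;
  F.LoopVectorize = !Opts.DisableVectorization;
  F.SLPVectorize = !Opts.DisableVectorization;
  return F;
}
```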
      
      llvm-svn: 220652
  32. Oct 24, 2014
  33. Oct 22, 2014
    • LTO: respect command-line options that disable vectorization. · f42a6ea5
      JF Bastien authored
      Summary: Patches 202051 and 208013 added calls to LTO's PassManager which unconditionally add LoopVectorizePass and SLPVectorizerPass instead of following the logic in PassManagerBuilder::populateModulePassManager and honoring the -vectorize-loops -run-slp-after-loop-vectorization flags.
      
      Reviewers: nadav, aschwaighofer, yijiang
      
      Subscribers: llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D5884
      
      llvm-svn: 220345