  1. Mar 19, 2015
  2. Mar 17, 2015
  3. Mar 14, 2015
  4. Mar 12, 2015
  5. Mar 10, 2015
    • Enable loop-rotate before loop-vectorize by default · 267e12f7
      Michael Zolotukhin authored
      llvm-svn: 231820
      267e12f7
    • remove function names from comments; NFC · f1b0db15
      Sanjay Patel authored
      llvm-svn: 231801
      f1b0db15
    • DataLayout is mandatory, update the API to reflect it with references. · a28d91d8
      Mehdi Amini authored
      Summary:
      Now that the DataLayout is a mandatory part of the module, let's start
      cleaning the codebase. This patch is a first attempt at doing that.
      
      This patch is not exactly NFC as for instance some places were passing
      a nullptr instead of the DataLayout, possibly just because there was a
      default value on the DataLayout argument to many functions in the API.
      Even though it is not purely NFC, there is no change in the
      validation.
      
      I turned as many pointers to DataLayout into references as I could; this
      helped figure out all the places where a nullptr could come up.
      
      I initially had a local version of this patch broken into over 30
      independent commits, but some later commits were cleaning the API and
      touching parts of the code modified in the previous commits, so it
      seemed cleaner without the intermediate state.
      
      Test Plan:
      
      Reviewers: echristo
      
      Subscribers: llvm-commits
      
      From: Mehdi Amini <mehdi.amini@apple.com>
      llvm-svn: 231740
      a28d91d8
  6. Mar 09, 2015
  7. Mar 06, 2015
    • Add a new pass "Loop Interchange" · 88db86dd
      Karthik Bhat authored
      This pass interchanges loops to provide a more cache-friendly memory access.
      
      For example, a loop like:
        for(int i=0;i<N;i++)
          for(int j=0;j<N;j++)
            A[j][i] = A[j][i]+B[j][i];
      
      is interchanged to:
        for(int j=0;j<N;j++)
          for(int i=0;i<N;i++)
            A[j][i] = A[j][i]+B[j][i];
      
      This pass is currently disabled by default.
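The two loop nests above can be compared directly in a small standalone program. This is only a sketch of the source-level effect, not of the pass itself: the `Matrix` alias and the names `addBeforeInterchange`, `addAfterInterchange`, and `makeMatrix` are hypothetical helpers introduced for illustration. Both functions compute the same result; only the traversal order differs, with the interchanged version walking each row contiguously.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical square-matrix type for this sketch: a vector of rows.
using Matrix = std::vector<std::vector<int>>;

// Loop order before interchange: i is the outer loop, j the inner loop.
// The inner loop changes the row index, so each step jumps to another row.
void addBeforeInterchange(Matrix &A, const Matrix &B) {
  const std::size_t N = A.size();
  for (std::size_t i = 0; i < N; i++)
    for (std::size_t j = 0; j < N; j++)
      A[j][i] = A[j][i] + B[j][i];
}

// Loop order after interchange: j is the outer loop, i the inner loop.
// The inner loop now walks along one row, touching adjacent elements.
void addAfterInterchange(Matrix &A, const Matrix &B) {
  const std::size_t N = A.size();
  for (std::size_t j = 0; j < N; j++)
    for (std::size_t i = 0; i < N; i++)
      A[j][i] = A[j][i] + B[j][i];
}

// Builds an N x N matrix whose element (r, c) is Seed + r * N + c.
Matrix makeMatrix(std::size_t N, int Seed) {
  Matrix M(N, std::vector<int>(N));
  for (std::size_t r = 0; r < N; r++)
    for (std::size_t c = 0; c < N; c++)
      M[r][c] = Seed + static_cast<int>(r * N + c);
  return M;
}
```

Each iteration reads and writes only its own element, so reordering the iterations cannot change the result here; in general that is what the legality stage has to establish.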
      
      As a brief introduction, the pass consists of 3 stages:
      
      LoopInterchangeLegality: Checks the legality of loop interchange based on the dependency matrix.
      LoopInterchangeProfitability: A very basic heuristic that checks for profitability. This will evolve over time.
      LoopInterchangeTransform: Performs the actual transform.
      
      LNT performance tests show improvement in the Polybench/linear-algebra/kernels/mvt and Polybench/linear-algebra/kernels/gemver benchmarks.
      
      TODO:
      1) Add support for reductions and lcssa phi.
      2) Improve profitability model.
      3) Improve loop selection algorithm to select best loop for interchange. Currently the innermost loop is selected for interchange.
      4) Reduce the compile-time regression found in LLVM LNT due to this pass.
      5) Fix issues in Dependency Analysis module.
      
      A special thanks to Hal for reviewing this code.
      Review: http://reviews.llvm.org/D7499
      
      llvm-svn: 231458
      88db86dd
  8. Mar 04, 2015
    • Make DataLayout Non-Optional in the Module · 46a43556
      Mehdi Amini authored
      Summary:
      DataLayout keeps the string used for its creation.
      
      As a side effect it is no longer needed in the Module.
      This is "almost" NFC: the string is no longer
      canonicalized, so you can't rely on two "equal" DataLayouts
      having the same string returned by getStringRepresentation().
      
      Get rid of DataLayoutPass: the DataLayout is in the Module
      
      The DataLayout is "per-module", let's enforce this by not
      duplicating it more than necessary.
      One more step toward non-optionality of the DataLayout in the
      module.
      
      Make DataLayout Non-Optional in the Module
      
      Module->getDataLayout() will never return nullptr anymore.
      
      Reviewers: echristo
      
      Subscribers: resistor, llvm-commits, jholewinski
      
      Differential Revision: http://reviews.llvm.org/D7992
      
      From: Mehdi Amini <mehdi.amini@apple.com>
      llvm-svn: 231270
      46a43556
  9. Mar 03, 2015
    • LowerBitSets: Use byte arrays instead of bit sets to represent in-memory bit sets. · da2dbf21
      Peter Collingbourne authored
      By loading from indexed offsets into a byte array and applying a mask, a
      program can test bits from the bit set with a relatively short instruction
      sequence. For example, suppose we have 15 bit sets to lay out:
      
      A (16 bits), B (15 bits), C (14 bits), D (13 bits), E (12 bits),
      F (11 bits), G (10 bits), H (9 bits), I (7 bits), J (6 bits), K (5 bits),
      L (4 bits), M (3 bits), N (2 bits), O (1 bit)
      
      These bits can be laid out in a 16-byte array like this:
      
            Byte Offset
          0123456789ABCDEF
      Bit
        7 HHHHHHHHHIIIIIII
        6 GGGGGGGGGGJJJJJJ
        5 FFFFFFFFFFFKKKKK
        4 EEEEEEEEEEEELLLL
        3 DDDDDDDDDDDDDMMM
        2 CCCCCCCCCCCCCCNN
        1 BBBBBBBBBBBBBBBO
        0 AAAAAAAAAAAAAAAA
      
      For example, to test bit X of A, we evaluate ((bits[X] & 1) != 0), or to
      test bit X of I, we evaluate ((bits[9 + X] & 0x80) != 0). This can be done
      in 1-2 machine instructions on x86, or 4-6 instructions on ARM.
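The two test expressions above can be sketched in C++ as follows. The `BitSetSlot` struct and the names `SlotA`, `SlotI`, `setBit`, and `testBit` are hypothetical and exist only for this illustration; the byte offsets and masks are read off the layout diagram (A occupies bit 0 of bytes 0 to 15, I occupies bit 7 of bytes 9 to 15).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical description of where one bit set lives in the byte array:
// it starts at ByteOffset, uses one bit position (Mask) in each byte,
// and spans Size consecutive bytes.
struct BitSetSlot {
  unsigned ByteOffset;
  std::uint8_t Mask;
  unsigned Size;
};

// Slots taken from the diagram in the commit message.
constexpr BitSetSlot SlotA{0, 0x01, 16}; // bit 0 of bytes 0x0..0xF
constexpr BitSetSlot SlotI{9, 0x80, 7};  // bit 7 of bytes 0x9..0xF

using ByteArray = std::array<std::uint8_t, 16>;

// Marks bit X of the given bit set as present.
void setBit(ByteArray &Bytes, const BitSetSlot &S, unsigned X) {
  assert(X < S.Size);
  Bytes[S.ByteOffset + X] |= S.Mask;
}

// Tests bit X of the given bit set: one indexed load plus one mask.
bool testBit(const ByteArray &Bytes, const BitSetSlot &S, unsigned X) {
  assert(X < S.Size);
  return (Bytes[S.ByteOffset + X] & S.Mask) != 0;
}
```

With these slots, `testBit(Bytes, SlotA, X)` is the `(bits[X] & 1) != 0` check and `testBit(Bytes, SlotI, X)` is the `(bits[9 + X] & 0x80) != 0` check from the text.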
      
      This uses the LPT multiprocessor scheduling algorithm to lay out the bits
      efficiently.
      
      Saves ~450KB of instructions in a recent build of Chromium.
      
      Differential Revision: http://reviews.llvm.org/D7954
      
      llvm-svn: 231043
      da2dbf21
  10. Feb 28, 2015
  11. Feb 25, 2015
    • LowerBitSets: Align referenced globals. · eba7f73f
      Peter Collingbourne authored
      This change aligns globals to the next highest power of 2 bytes, up to a
      maximum of 128. This makes it more likely that we will be able to compress
      bit sets with a greater alignment. In many more cases, we can now take
      advantage of a new optimization also introduced in this patch that removes
      bit set checks if the bit set is all ones.
      
      The 128 byte maximum was found to provide the best tradeoff between instruction
      overhead and data overhead in a recent build of Chromium. It allows us to
      remove ~2.4MB of instructions at the cost of ~250KB of data.
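A rough sketch of the alignment rule described above, under two assumptions that the commit message does not spell out: that the power of two is chosen from the global's size in bytes, and that a size which is already a power of two is kept as is. The names `MaxAlign` and `alignmentForSize` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Upper bound on the alignment, taken from the commit message.
constexpr std::uint64_t MaxAlign = 128;

// Assumed rule: round the size up to a power of two, but never above
// MaxAlign. A size of 0 or 1 yields an alignment of 1.
std::uint64_t alignmentForSize(std::uint64_t SizeInBytes) {
  std::uint64_t Align = 1;
  while (Align < SizeInBytes && Align < MaxAlign)
    Align <<= 1;
  return Align;
}
```

For example, under these assumptions a 24-byte global would be aligned to 32 bytes, while a 200-byte global would be capped at 128.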
      
      Differential Revision: http://reviews.llvm.org/D7873
      
      llvm-svn: 230540
      eba7f73f
    • LowerBitSets: Introduce global layout builder. · 1baeaa39
      Peter Collingbourne authored
      The builder is based on a layout algorithm that tries to keep members of
      small bit sets together. The new layout compresses Chromium's bit sets to
      around 15% of their original size.
      
      Differential Revision: http://reviews.llvm.org/D7796
      
      llvm-svn: 230394
      1baeaa39
  12. Feb 22, 2015
  13. Feb 20, 2015
    • Introduce bitset metadata format and bitset lowering pass. · e6909c8e
      Peter Collingbourne authored
      This patch introduces a new mechanism that allows IR modules to co-operatively
      build pointer sets corresponding to addresses within a given set of
      globals. One particular use case for this is to allow a C++ program to
      efficiently verify (at each call site) that a vtable pointer is in the set
      of valid vtable pointers for the class or its derived classes. One way of
      doing this is for a toolchain component to build, for each class, a bit set
      that maps to the memory region allocated for the vtables, such that each 1
      bit in the bit set maps to a valid vtable for that class, and lay out the
      vtables next to each other, to minimize the total size of the bit sets.
      
      The patch introduces a metadata format for representing pointer sets, an
      '@llvm.bitset.test' intrinsic and an LTO lowering pass that lays out the globals
      and builds the bitsets, and documents the new feature.
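The vtable use case can be illustrated with a simplified membership test. This is a sketch of the general idea only, not of the intrinsic or the lowering pass: the `AddressBitSet` struct, its fields, and `contains` are hypothetical, and addresses are modeled as plain integers. Each 1 bit marks a valid address at a fixed stride from the start of the region.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a bit set over a memory region: bit I is set if
// the address Base + I * Stride is a valid member (e.g. a valid vtable).
struct AddressBitSet {
  std::uint64_t Base;
  std::uint64_t Stride;
  std::vector<bool> Bits;
};

// Returns true only if Addr lies in the region, is a whole number of
// strides from Base, and its bit is set.
bool contains(const AddressBitSet &S, std::uint64_t Addr) {
  if (Addr < S.Base)
    return false;
  const std::uint64_t Offset = Addr - S.Base;
  if (Offset % S.Stride != 0)
    return false;
  const std::uint64_t Index = Offset / S.Stride;
  return Index < S.Bits.size() && S.Bits[Index];
}
```

Laying the members out next to each other, as the text suggests, keeps `Bits` short because fewer 0 bits are needed between valid addresses.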
      
      Differential Revision: http://reviews.llvm.org/D7288
      
      llvm-svn: 230054
      e6909c8e
  14. Feb 17, 2015
    • [BDCE] Add a bit-tracking DCE pass · 2bb61ba2
      Hal Finkel authored
      BDCE is a bit-tracking dead code elimination pass. It is based on ADCE (the
      "aggressive DCE" pass), with the added capability to track dead bits of integer
      valued instructions and remove those instructions when all of the bits are
      dead.
      
      Currently, it does not actually do this all-bits-dead removal, but rather
      replaces the instruction's uses with a constant zero, and lets instcombine (and
      the later run of ADCE) do the rest. Because we essentially get a run of ADCE
      "for free" while tracking the dead bits, we also do what ADCE does and remove
      actually-dead instructions as well (this includes instructions newly trivially
      dead because all bits were dead, but not all such instructions can be removed).
      
      The motivation for this is a case like:
      
      int __attribute__((const)) foo(int i);
      int bar(int x) {
        x |= (4 & foo(5));
        x |= (8 & foo(3));
        x |= (16 & foo(2));
        x |= (32 & foo(1));
        x |= (64 & foo(0));
        x |= (128 & foo(4));
        return x >> 4;
      }
      
      As it turns out, if you order the bit-field insertions so that all of the dead
      ones come last, then instcombine will remove them. However, if you pick some
      other order (such as the one above), the fact that some of the calls to foo()
      are useless is not locally obvious, and we don't remove them (without this
      pass).
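To see why some of the calls in the example are useless, note that `x >> 4` discards bits 0 to 3, so the terms masked with 4 and 8 can never reach the result. The sketch below models this with unsigned 32-bit values (an assumption made to avoid signed-shift subtleties; the original uses `int`). The names `demandedThroughShr`, `maskedTermIsDead`, `barOriginal`, `barPruned`, and the stand-in `allOnes` are hypothetical and are not part of the pass.

```cpp
#include <cassert>
#include <cstdint>

// Bits of the operand that can affect (Operand >> Shift), given which
// bits of the shifted result are used. Result bit i is operand bit i + Shift.
constexpr std::uint32_t demandedThroughShr(std::uint32_t DemandedOut,
                                           unsigned Shift) {
  return DemandedOut << Shift;
}

// A term of the form (Mask & value) is dead if none of its bits are demanded.
constexpr bool maskedTermIsDead(std::uint32_t Mask, std::uint32_t Demanded) {
  return (Mask & Demanded) == 0;
}

using Foo = std::uint32_t (*)(int);

// The function from the commit message, with foo passed in as a parameter.
std::uint32_t barOriginal(std::uint32_t X, Foo F) {
  X |= (4 & F(5));
  X |= (8 & F(3));
  X |= (16 & F(2));
  X |= (32 & F(1));
  X |= (64 & F(0));
  X |= (128 & F(4));
  return X >> 4;
}

// The same function with the two dead terms (masks 4 and 8) removed.
std::uint32_t barPruned(std::uint32_t X, Foo F) {
  X |= (16 & F(2));
  X |= (32 & F(1));
  X |= (64 & F(0));
  X |= (128 & F(4));
  return X >> 4;
}

// Stand-in for foo that sets every bit, so no term is masked away by luck.
std::uint32_t allOnes(int) { return 0xFFFFFFFFu; }
```

Removing the two calls is only valid because `foo` is declared `__attribute__((const))` in the original, meaning it has no side effects.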
      
      I did a quick compile-time overhead check using sqlite from the test suite
      (Release+Asserts). BDCE took ~0.4% of the compilation time (making it about
      twice as expensive as ADCE).
      
      I've not looked at why yet, but we eliminate instructions due to having
      all-dead bits in:
      External/SPEC/CFP2006/447.dealII/447.dealII
      External/SPEC/CINT2006/400.perlbench/400.perlbench
      External/SPEC/CINT2006/403.gcc/403.gcc
      MultiSource/Applications/ClamAV/clamscan
      MultiSource/Benchmarks/7zip/7zip-benchmark
      
      llvm-svn: 229462
      2bb61ba2
  15. Feb 16, 2015
  16. Feb 14, 2015
  17. Feb 13, 2015
    • [PM] Remove the old 'PassManager.h' header file at the top level of · 30d69c2e
      Chandler Carruth authored
      LLVM's include tree and the use of using declarations to hide the
      'legacy' namespace for the old pass manager.
      
      This undoes the primary modules-hostile change I made to keep
      out-of-tree targets building. I sent an email inquiring about whether
      this would be reasonable to do at this phase and people seemed fine with
      it, so making it a reality. This should allow us to start bootstrapping
      with modules to a certain extent along with making it easier to mix and
      match headers in general.
      
      The updates to any code for users of LLVM are very mechanical. Switch
      from including "llvm/PassManager.h" to "llvm/IR/LegacyPassManager.h".
      Qualify the types which now produce compile errors with "legacy::". The
      most common ones are "PassManager", "PassManagerBase", and
      "FunctionPassManager".
      
      llvm-svn: 229094
      30d69c2e
  18. Feb 12, 2015
    • DeadArgElim: aggregate Return assessment properly. · 02438033
      Tim Northover authored
      I mistakenly thought the liveness of each "RetVal(F, i)" depended only on F. It
      actually depends on the index too, which means we need to be careful about how
      the results are combined before return. In particular if a single Use returns
      Live, that counts for the entire object, at the granularity we're considering.
      
      llvm-svn: 228885
      02438033
  19. Feb 11, 2015
  20. Feb 10, 2015
    • DeadArgElim: arguments affect all returned sub-values by default. · 43c0d2db
      Tim Northover authored
      Unless we meet an insertvalue on a path from some value to a return, that value
      will be live if *any* of the return's components are live, so all of those
      components must be added to the MaybeLiveUses.
      
      Previously we were deleting arguments if sub-value 0 turned out to be dead.
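The difference between the old and new behavior can be shown with a toy model. This is not the DeadArgElim code: the function names are hypothetical, and the liveness of each returned sub-value is reduced to a single boolean per component.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Behavior described by this commit: a value that reaches the return
// without passing through an insertvalue is live if ANY returned
// component is live.
bool liveIfAnyComponentLive(const std::vector<bool> &RetComponentLive) {
  return std::any_of(RetComponentLive.begin(), RetComponentLive.end(),
                     [](bool Live) { return Live; });
}

// Simplified model of the previous behavior: only sub-value 0 was consulted.
bool liveIfFirstComponentLive(const std::vector<bool> &RetComponentLive) {
  return !RetComponentLive.empty() && RetComponentLive.front();
}
```

With a return whose component 0 is dead and component 1 is live, the old rule would treat the value as dead and allow its deletion, while the new rule keeps it.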
      
      llvm-svn: 228731
      43c0d2db
  21. Feb 09, 2015
    • DeadArgElim: fix mismatch in accounting of array return types. · 705d2af9
      Tim Northover authored
      Some parts of DeadArgElim were only considering the individual fields
      of StructTypes separately, but others (where insertvalue &
      extractvalue instructions occur) also looked into ArrayTypes.
      
      This one is an actual bug; the mismatch can lead to an argument being
      considered used by a return sub-value that isn't being tracked (and
      hence is dead by default). It then gets incorrectly eliminated.
      
      llvm-svn: 228559
      705d2af9
    • DeadArgElim: assess uses of entire return value aggregate. · 854c927d
      Tim Northover authored
      Previously, a non-extractvalue use of an aggregate return value meant
      the entire return was considered live (the algorithm gave up
      entirely). This was correct, but conservative. It's better to actually
      look at that Use, making the analysis results apply to all sub-values
      under consideration.
      
      E.g.
      
        %val = call { i32, i32 } @whatever()
        [...]
        ret { i32, i32 } %val
      
      The return is using the entire aggregate (sub-values 0 and 1). We can
      still simplify @whatever if we can prove that this return is itself
      unused.
      
      Also unifies the logic slightly between aggregate and non-aggregate
      cases.
      
      llvm-svn: 228558
      854c927d
  22. Feb 04, 2015
  23. Jan 30, 2015
    • [PM] Sink the population of the pass manager with target-specific · 1efa12d6
      Chandler Carruth authored
      analyses back into the LTO code generator.
      
      The pass manager builder (and the transforms library in general)
      shouldn't be referencing the target machine at all.
      
      This makes the LTO population work like the others -- the data layout
      and target transform info need to be pre-populated.
      
      llvm-svn: 227576
      1efa12d6
  24. Jan 27, 2015
  25. Jan 19, 2015