  1. Mar 07, 2016
    • Adam Nemet's avatar
      Revert "Enable LoopLoadElimination by default" · 81113ef6
      Adam Nemet authored
      This reverts commit r262250.
      
      It causes SPEC2006/gcc to generate wrong result (166.s) in AArch64 when
      running with *ref* data set.  The error happens with
      "-Ofast -flto -fuse-ld=gold" or "-O3 -fno-strict-aliasing".
      
      llvm-svn: 262839
      81113ef6
  2. Mar 04, 2016
  3. Mar 03, 2016
  4. Mar 02, 2016
    • Chandler Carruth's avatar
      [AA] Hoist the logic to reformulate various AA queries in terms of other · 12884f7f
      Chandler Carruth authored
      parts of the AA interface out of the base class of every single AA
      result object.
      
      Because this logic reformulates the query in terms of some other aspect
      of the API, it would easily cause O(n^2) query patterns in alias
      analysis. These could in turn be magnified further based on the number
      of call arguments, and then further based on the number of AA queries
      made for a particular call. This ended up causing problems for Rust that
      were actually noticeable enough to get a bug (PR26564) and probably other
      places as well.
      
      When originally re-working the AA infrastructure, the desire was to
      regularize the pattern of refinement without losing any generality.
      While I think it was successful, that is clearly proving to be too
      costly. And the cost is needless: we gain no actual improvement from this
      generality of making a direct query to TBAA able to re-use some other
      alias analysis's refinement logic for one of the other APIs, or some
      such. In short, this is entirely wasted work.
      
      To the extent possible, delegation to other API surfaces should be done
      at the aggregation layer so that we can avoid re-walking the
      aggregation. In fact, this significantly simplifies the logic as we no
      longer need to smuggle the aggregation layer into each alias analysis
      (or the TargetLibraryInfo into each alias analysis just so we can form
      argument memory locations!).
      
      However, we also have some delegation logic inside of BasicAA and some
      of it even makes sense. When the delegation logic is baking in specific
      knowledge of aliasing properties of the LLVM IR, as opposed to simply
      reformulating the query to utilize a different alias analysis interface
      entry point, it makes a lot of sense to restrict that logic to
      a different layer such as BasicAA. So one aspect of the delegation that
      was in every AA base class is that when we don't have operand bundles,
      we re-use function AA results as a fallback for callsite alias results.
      This relies on the IR properties of calls and functions w.r.t. aliasing,
      and so seems a better fit to BasicAA. I've lifted the logic up to that
      point where it seems to be a natural fit. This still does a bit of
      redundant work (we query function attributes twice, once via the
      callsite and once via the function AA query) but it is *exactly* twice
      here, no more.
      
      The end result is that all of the delegation logic is hoisted out of the
      base class and into either the aggregation layer when it is a pure
      retargeting to a different API surface, or into BasicAA when it relies
      on the IR's aliasing properties. This should fix the quadratic query
      pattern reported in PR26564, although I don't have a stand-alone test
      case to reproduce it.
      
      It also seems general goodness. Now the numerous AAs that don't need
      target library info don't carry it around and depend on it. I think
      I can even rip out the general access to the aggregation layer and only
      expose that in BasicAA as it is the only place where we re-query in that
      manner.
      
      However, this is a non-trivial change to the AA infrastructure so I want
      to get some additional eyes on this before it lands. Sadly, it can't
      wait long because we should really cherry pick this into 3.8 if we're
      going to go this route.
      
      Differential Revision: http://reviews.llvm.org/D17329
      
      llvm-svn: 262490
      12884f7f
  5. Mar 01, 2016
  6. Feb 29, 2016
    • Adam Nemet's avatar
      Enable LoopLoadElimination by default · dd9e637a
      Adam Nemet authored
      Summary:
      I re-benchmarked this and results are similar to original results in
      D13259:
      
      On ARM64:
        SingleSource/Benchmarks/Polybench/linear-algebra/solvers/dynprog -59.27%
        SingleSource/Benchmarks/Polybench/stencils/adi                   -19.78%
      
      On x86:
        SingleSource/Benchmarks/Polybench/linear-algebra/solvers/dynprog  -27.14%
      
      And of course the original ~20% gain on SPECint_2006/456.hmmer with Loop
      Distribution.
      
      In terms of compile time, there is ~5% increase on both
      SingleSource/Benchmarks/Misc/oourafft and
      SingleSource/Benchmarks/Linkpack/linkpack-pc.  These are both very tiny
      loop-intensive programs where SCEV computations dominate compile time.
      
      The reason that time spent in SCEV increases has to do with the design
      of the old pass manager.  If a transform pass does not preserve an
      analysis we *invalidate* the analysis even if there was *no*
      modification made by the transform pass.
      
      This means that currently we don't take advantage of LLE and LV sharing
      the same analysis (LAA) and unfortunately we recompute LAA *and* SCEV
      for LLE.
      
      (There should be a way to work around this limitation in the case of
      SCEV and LAA since both compute things on demand and internally cache
      their result.  Thus we could pretend that transform passes preserve
      these analyses and manually invalidate them upon actual modification.
      On the other hand, the new pass manager is supposed to solve this, so
      sure if this is worthwhile.)
      
      Reviewers: hfinkel, dberlin
      
      Subscribers: dberlin, reames, mssimpso, aemerson, joker.eph, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D16300
      
      llvm-svn: 262250
      dd9e637a
  7. Feb 24, 2016
    • Artur Pilipenko's avatar
      NFC. Move isDereferenceable to Loads.h/cpp · 31bcca47
      Artur Pilipenko authored
      This is part of the refactoring to unify the isSafeToLoadUnconditionally and isDereferenceablePointer functions. In a subsequent change I'm going to eliminate isDereferenceableAndAlignedPointer from the Loads API, leaving isSafeToLoadSpeculatively as the only function to check if a load instruction can be speculated.
      
      Reviewed By: hfinkel
      
      Differential Revision: http://reviews.llvm.org/D16180
      
      llvm-svn: 261736
      31bcca47
  8. Feb 22, 2016
  9. Feb 21, 2016
    • Duncan P. N. Exon Smith's avatar
      ADT: Remove == and != comparisons between ilist iterators and pointers · e9bc579c
      Duncan P. N. Exon Smith authored
      I missed == and != when I removed implicit conversions between iterators
      and pointers in r252380 since they were defined outside ilist_iterator.
      
      Since they depend on getNodePtrUnchecked(), they indirectly rely on UB.
      This commit removes all uses of these operators.  (I'll delete the
      operators themselves in a separate commit so that it can be easily
      reverted if necessary.)
      
      There should be NFC here.
      
      llvm-svn: 261498
      e9bc579c
  10. Feb 18, 2016
    • Richard Trieu's avatar
      Remove uses of builtin comma operator. · 7a083814
      Richard Trieu authored
      Cleanup for upcoming Clang warning -Wcomma.  No functionality change intended.
      
      llvm-svn: 261270
      7a083814
    • Chandler Carruth's avatar
      [PM] Port the PostOrderFunctionAttrs pass to the new pass manager and · 9c4ed175
      Chandler Carruth authored
      convert one test to use this.
      
      This is a particularly significant milestone because it required
      a working per-function AA framework which can be queried over each
      function from within a CGSCC transform pass (and additionally a module
      analysis to be accessible). This is essentially *the* point of the
      entire pass manager rewrite. A CGSCC transform is able to query for
      multiple different functions' analysis results. It works. The whole
      thing appears to actually work and accomplish the original goal. While
      we were able to hack function attrs and basic-aa to "work" in the old
      pass manager, this port doesn't use any of that, it directly leverages
      the new fundamental functionality.
      
      For this to work, the CGSCC framework also has to support SCC-based
      behavior analysis, etc. The only part of the CGSCC pass infrastructure
      not sorted out at this point are the updates in the face of inlining and
      running function passes that mutate the call graph.
      
      The changes are pretty boring and boiler-plate. Most of the work was
      factored into more focused preparatory patches. But this is what wires
      it all together.
      
      llvm-svn: 261203
      9c4ed175
  11. Feb 17, 2016
    • Mehdi Amini's avatar
      Define the ThinLTO Pipeline (experimental) · 1db10ac6
      Mehdi Amini authored
      Summary:
      On the contrary to Full LTO, ThinLTO can afford to shift compile time
      from the frontend to the linker: both phases are parallel (even if
      it is not totally "free": projects like clang are reusing products
      from the "compile phase" for multiple links, think about
      libLLVMSupport reused for opt, llc, etc.).
      
      This pipeline is based on the proposal in D13443 for full LTO. We
      didn't move forward on this proposal because the LTO link was far too
      long after that. We believe that we can afford it with ThinLTO.
      
      The ThinLTO pipeline integrates in the regular O2/O3 flow:
      
       - The compile phase performs the inliner with a somewhat lighter
         function simplification. (TODO: tune the inliner thresholds here)
         This is intended to simplify the IR and get rid of obvious things
         like linkonce_odr that will be inlined.
       - The link phase will run the pipeline from the start, extended with
         some specific passes that leverage the augmented knowledge we have
         during LTO. Especially after the inliner is done, a sequence of
         globalDCE/globalOpt is performed, followed by another run of the
         "function simplification" passes. It is not clear if this part
         of the pipeline will stay as is, as the split model of ThinLTO
         does not allow the same benefit as FullLTO without added tricks.
      
      The measurements on the public test suite as well as on our internal
      suite show an overall net improvement. The binary size for the clang
      executable is reduced by 5%. We're still tuning it with the bringup
      of ThinLTO and it will evolve, but this should provide a good starting
      point.
      
      Reviewers: tejohnson
      
      Differential Revision: http://reviews.llvm.org/D17115
      
      From: Mehdi Amini <mehdi.amini@apple.com>
      llvm-svn: 261029
      1db10ac6
  12. Feb 16, 2016
  13. Feb 13, 2016
    • Benjamin Kramer's avatar
    • Chandler Carruth's avatar
      [attrs] Move the norecurse deduction to operate on the node set rather · 632d208c
      Chandler Carruth authored
      than the SCC object, and have it scan the instruction stream directly
      rather than relying on call records.
      
      This makes the behavior of this routine consistent between libc routines
      and LLVM intrinsics for libc routines. We can go and start teaching it
      about those being norecurse, but we should behave the same for the
      intrinsic and the libc routine rather than differently. I chatted with
      James Molloy and the inconsistency doesn't seem intentional and likely
      is due to intrinsic calls not being modelled in the call graph analyses.
      
      This also fixes a bug where we would deduce norecurse on optnone
      functions, when generally we try to handle optnone functions as-if they
      were replaceable and thus unanalyzable.
      
      llvm-svn: 260813
      632d208c
  14. Feb 12, 2016
    • Chandler Carruth's avatar
      [attrs] Simplify the convergent removal to directly use the pre-built · 3937bc70
      Chandler Carruth authored
      node set rather than walking the SCC directly.
      
      This directly exposes the functions and has already had null entries
      filtered out. We also don't need to handle optnone as it has
      already been handled in the caller -- we never try to remove convergent
      when there are optnone functions in the SCC.
      
      With this change, the code for removing convergent should work with the
      new pass manager and a different SCC analysis.
      
      llvm-svn: 260668
      3937bc70
    • Chandler Carruth's avatar
      [attrs] Consolidate the test for a non-SCC, non-convergent function call · 057df3d4
      Chandler Carruth authored
      with the test for a non-convergent intrinsic call.
      
      While it is possible to use the call records to search for function
      calls, we're going to do an instruction scan anyway to find the
      intrinsics, so we can handle both cases while scanning instructions. This
      will also make the logic more amenable to the new pass manager which
      doesn't use the same call graph structure.
      
      My next patch will remove use of CallGraphNode entirely and allow this
      code to work with both the old and new pass manager. Fortunately, it
      should also get strictly simpler without changing functionality.
      
      llvm-svn: 260666
      057df3d4
    • Chandler Carruth's avatar
      [attrs] Run clang-format over a newly added routine in function-attrs · bbbbec0b
      Chandler Carruth authored
      before I update it to be friendly with the new pass manager.
      
      llvm-svn: 260653
      bbbbec0b
  15. Feb 11, 2016
    • Mehdi Amini's avatar
      Revert "Refactor the PassManagerBuilder: extract a "addFunctionSimplificationPasses()"" · 9c1c3ac6
      Mehdi Amini authored
      This reverts commit r260603.
      I didn't intend to push it :(
      
      From: Mehdi Amini <mehdi.amini@apple.com>
      llvm-svn: 260607
      9c1c3ac6
    • Mehdi Amini's avatar
      Revert "Define the ThinLTO Pipeline" · c5bf5ecc
      Mehdi Amini authored
      This reverts commit r260604.
      I didn't intend to push this now.
      
      From: Mehdi Amini <mehdi.amini@apple.com>
      llvm-svn: 260606
      c5bf5ecc
    • Mehdi Amini's avatar
      Define the ThinLTO Pipeline · 484470d6
      Mehdi Amini authored
      Summary:
      On the contrary to Full LTO, ThinLTO can afford to shift compile time
      from the frontend to the linker: both phases are parallel.
      This pipeline is based on the proposal in D13443 for full LTO. We
      didn't move forward on this proposal because the link was far too long
      after that.
      
      This patch refactors the "function simplification" passes that are part
      of the inliner loop into a helper function (this part is NFC and can be
      committed separately to simplify the diff). The ThinLTO pipeline
      integrates in the regular O2/O3 flow:
      
       - The compile phase performs the inliner with a somewhat lighter
         function simplification. (TODO: tune the inliner thresholds here)
         This is intended to simplify the IR and get rid of obvious things
         like linkonce_odr that will be inlined.
       - The link phase will run the pipeline from the start, extended with
         some specific passes that leverage the augmented knowledge we have
         during LTO. Especially after the inliner is done, a sequence of
         globalDCE/globalOpt is performed, followed by another run of the
         "function simplification" passes.
      
      The measurements on the public test suite as well as on our internal
      suite show an overall net improvement. The binary size for the clang
      executable is reduced by 5%. We're still tuning it with the bringup
      of ThinLTO but this should provide a good starting point.
      
      Reviewers: tejohnson
      
      Subscribers: joker.eph, llvm-commits, dexonsmith
      
      Differential Revision: http://reviews.llvm.org/D17115
      
      From: Mehdi Amini <mehdi.amini@apple.com>
      llvm-svn: 260604
      484470d6
    • Mehdi Amini's avatar
      Refactor the PassManagerBuilder: extract a "addFunctionSimplificationPasses()" · f9a3718c
      Mehdi Amini authored
      It is intended to contain the passes run over a function after the
      inliner is done with a function and before it moves to its callers.
      
      From: Mehdi Amini <mehdi.amini@apple.com>
      llvm-svn: 260603
      f9a3718c
    • Ashutosh Nema's avatar
      Fixed typo in comment & coding style for LoopVersioningLICM. · 2260a3a0
      Ashutosh Nema authored
      llvm-svn: 260504
      2260a3a0
    • Teresa Johnson's avatar
      Fix Windows bot failure in Transforms/FunctionImport/funcimport.ll · 41806854
      Teresa Johnson authored
      Make sure we split ":" from the end of the global function id (which
      is <path>:<function> for local functions) instead of the beginning to
      avoid splitting at the wrong place for Windows file paths that contain
      a ":".
      
      llvm-svn: 260469
      41806854
    • Mehdi Amini's avatar
      FunctionImport: add a progressive heuristic to limit importing too deep in the callgraph · 40641748
      Mehdi Amini authored
      The current function importer will walk the callgraph, importing
      transitively any callee that is below the threshold. This can
      lead to importing very deep, which is costly in compile time and not
      necessarily beneficial, as most of the inlining would happen in
      imported functions and not necessarily in user code.
      
      The actual factor has been carefully chosen by flipping a coin ;)
      Some tuning needs to be done (just as at the existing limiting threshold).
      
      Reviewers: tejohnson
      
      Differential Revision: http://reviews.llvm.org/D17082
      
      From: Mehdi Amini <mehdi.amini@apple.com>
      llvm-svn: 260466
      40641748
    • Mehdi Amini's avatar
      Use a StringSet in Internalize, and allow to create the pass from an existing one (NFC) · c87d7d02
      Mehdi Amini authored
      There is no reason to pass an array of "char *" to rebuild a set if
      the client already has one.
      
      From: Mehdi Amini <mehdi.amini@apple.com>
      llvm-svn: 260462
      c87d7d02
  16. Feb 10, 2016
    • Teresa Johnson's avatar
      Restore "[ThinLTO] Use MD5 hash in function index." with fix · e1164de5
      Teresa Johnson authored
      This restores commit r260408, along with a fix for a bot failure.
      
      The bot failure was caused by dereferencing a unique_ptr in the same
      call instruction parameter list where it was passed via std::move.
      Apparently due to luck this was not exposed when I built the compiler
      with clang, only with gcc.
      
      llvm-svn: 260442
      e1164de5
    • Teresa Johnson's avatar
      Revert "[ThinLTO] Use MD5 hash in function index." due to bot failure · 89f38fb5
      Teresa Johnson authored
      This reverts commit r260408. Bot failure that I need to investigate.
      
      llvm-svn: 260412
      89f38fb5
    • Teresa Johnson's avatar
      [ThinLTO] Use MD5 hash in function index. · 0919a840
      Teresa Johnson authored
      Summary:
      This patch uses the lower 64-bits of the MD5 hash of a function name as
      a GUID in the function index, instead of storing function names. Any
      local functions are first given a global name by prepending the original
      source file name. This is the same naming scheme and GUID used by PGO in
      the indexed profile format.
      
      This change has a couple of benefits. The primary benefit is size
      reduction in the combined index file, for example 483.xalancbmk's
      combined index file was reduced by around 70%. It should also result in
      memory savings for the index file in memory, as the in-memory map is
      also indexed by the hash instead of the string.
      
      Second, this enables integration with indirect call promotion, since the
      indirect call profile targets are recorded using the same global naming
      convention and hash. This will enable the function importer to easily
      locate function summaries for indirect call profile targets to enable
      their import and subsequent promotion.
      
      The original source file name is recorded in the bitcode in a new
      module-level record for use in the ThinLTO backend pipeline.
      
      Reviewers: davidxl, joker.eph
      
      Subscribers: llvm-commits, joker.eph
      
      Differential Revision: http://reviews.llvm.org/D17028
      
      llvm-svn: 260408
      0919a840
    • Teresa Johnson's avatar
      [ThinLTO] Move global processing from Linker to TransformUtils (NFC) · 488a800a
      Teresa Johnson authored
      Summary:
      As discussed on IRC, move the ThinLTOGlobalProcessing code out of
      the linker, and into TransformUtils. The name of the class is changed
      to FunctionImportGlobalProcessing.
      
      Reviewers: joker.eph, rafael
      
      Subscribers: joker.eph, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D17081
      
      llvm-svn: 260395
      488a800a
    • Justin Lebar's avatar
      Add convergent-removing bits to FunctionAttrs pass. · 260854bf
      Justin Lebar authored
      Summary:
      Remove the convergent attribute on any functions which provably do not
      contain or invoke any convergent functions.
      
      After this change, we'll be able to modify clang to conservatively add
      'convergent' to all functions when compiling CUDA.
      
      Reviewers:  jingyue, joker.eph
      
      Subscribers: llvm-commits, tra, jhen, hfinkel, resistor, chandlerc, arsenm
      
      Differential Revision: http://reviews.llvm.org/D17013
      
      llvm-svn: 260319
      260854bf
    • Peter Collingbourne's avatar
      Fix GCC build. · 9b656527
      Peter Collingbourne authored
      llvm-svn: 260317
      9b656527
  17. Feb 09, 2016
    • Peter Collingbourne's avatar
      WholeProgramDevirt: introduce. · df49d1bb
      Peter Collingbourne authored
      This pass implements whole program optimization of virtual calls in cases
      where we know (via bitset information) that the list of callees is fixed. This
      includes the following:
      
      - Single implementation devirtualization: if a virtual call has a single
        possible callee, replace all calls with a direct call to that callee.
      
      - Virtual constant propagation: if the virtual function's return type is an
        integer <=64 bits and all possible callees are readnone, for each class and
        each list of constant arguments: evaluate the function, store the return
        value alongside the virtual table, and rewrite each virtual call as a load
        from the virtual table.
      
      - Uniform return value optimization: if the conditions for virtual constant
        propagation hold and each function returns the same constant value, replace
        each virtual call with that constant.
      
      - Unique return value optimization for i1 return values: if the conditions
        for virtual constant propagation hold and a single vtable's function
        returns 0, or a single vtable's function returns 1, replace each virtual
        call with a comparison of the vptr against that vtable's address.
      
      Differential Revision: http://reviews.llvm.org/D16795
      
      llvm-svn: 260312
      df49d1bb
    • Sanjoy Das's avatar
      [FunctionAttrs] Fix SCC logic around operand bundles · 10c8a04b
      Sanjoy Das authored
      FunctionAttrs does an "optimistic" analysis of SCCs as a unit, which
      means normally it is able to disregard calls from an SCC into itself.
      However, calls and invokes with operand bundles are allowed to have
      memory effects not fully described by the memory effects on the call
      target, so we can't be optimistic around operand-bundled calls from an
      SCC into itself.
      
      llvm-svn: 260244
      10c8a04b
    • Sanjoy Das's avatar
      Add an "addUsedAAAnalyses" helper function · 1c481f50
      Sanjoy Das authored
      Summary:
      Passes that call `getAnalysisIfAvailable<T>` also need to call
      `addUsedIfAvailable<T>` in `getAnalysisUsage` to indicate to the
      legacy pass manager that it uses `T`.  This contract was being
      violated by passes that used `createLegacyPMAAResults`.  This change
      fixes this by exposing a helper in AliasAnalysis.h,
      `addUsedAAAnalyses`, that is complementary to createLegacyPMAAResults
      and does the right thing when called from `getAnalysisUsage`.
      
      Reviewers: chandlerc
      
      Subscribers: mcrosier, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D17010
      
      llvm-svn: 260183
      1c481f50
  18. Feb 06, 2016
    • Ashutosh Nema's avatar
      New Loop Versioning LICM Pass · df6763ab
      Ashutosh Nema authored
      Summary:
      When alias analysis is uncertain about the aliasing between any two accesses,
      it will return MayAlias. This uncertainty from alias analysis restricts LICM
      from proceeding further. In cases where alias analysis is uncertain we might
      use loop versioning as an alternative.
      
      Loop Versioning will create a version of the loop with aggressive aliasing
      assumptions in addition to the original with conservative (default) aliasing
      assumptions. The version of the loop making aggressive aliasing assumptions
      will have all the memory accesses marked as no-alias. These two versions of
      loop will be preceded by a memory runtime check. This runtime check consists
      of bound checks for all unique memory accesses in the loop, and it ensures the
      lack of memory aliasing. The result of the runtime check determines which of
      the loop versions is executed: If the runtime check detects any memory
      aliasing, then the original loop is executed. Otherwise, the version with
      aggressive aliasing assumptions is used.
      
      The pass is off by default and can be enabled with command line option 
      -enable-loop-versioning-licm.
      
      Reviewers: hfinkel, anemet, chatur01, reames
      
      Subscribers: MatzeB, grosser, joker.eph, sanjoy, javed.absar, sbaranga,
                   llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D9151
      
      llvm-svn: 259986
      df6763ab
  19. Feb 03, 2016