  1. Apr 03, 2014
  2. Apr 01, 2014
    • Move partial/runtime unrolling late in the pipeline · 86b3064f
      Hal Finkel authored
      The generic (concatenation) loop unroller is currently placed early in the
      standard optimization pipeline. This is a good place to perform full unrolling,
      but not the right place to perform partial/runtime unrolling. However, most
      targets don't enable partial/runtime unrolling, so this never mattered.
      
      Even so, some x86 cores benefit from partial/runtime unrolling of very
      small loops, and follow-up commits will enable this. First, we need to move
      partial/runtime unrolling late in the optimization pipeline (importantly, this
      is after SLP and loop vectorization, as vectorization can drastically change
      the size of a loop), while keeping the full unrolling where it is now. This
      change does just that.
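      A minimal sketch of the intended ordering using the legacy pass manager; it
      assumes the public factory functions createLoopUnrollPass,
      createLoopVectorizePass, and createSLPVectorizerPass, and is not the actual
      PassManagerBuilder change:
      
      #include "llvm/IR/LegacyPassManager.h"
      #include "llvm/Transforms/Scalar.h"
      #include "llvm/Transforms/Vectorize.h"
      
      // Illustrative pipeline fragment: full unrolling stays early, while a second
      // unroller runs after the vectorizers, once loop bodies have their final size.
      static void addLoopPassesSketch(llvm::legacy::PassManager &PM) {
        PM.add(llvm::createLoopUnrollPass());     // early: full unrolling
        PM.add(llvm::createLoopVectorizePass());  // can drastically change loop size
        PM.add(llvm::createSLPVectorizerPass());
        PM.add(llvm::createLoopUnrollPass());     // late: partial/runtime unrolling
      }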
      
      llvm-svn: 205264
  3. Mar 30, 2014
    • Add a missing break. · 5e66a7e6
      Rafael Espindola authored
      Patch by Tobias Güntner.
      
      I tried to write a test, but the only difference is the Changed value that
      gets returned. It can be tested with "opt -debug-pass=Executions -functionattrs",
      but that doesn't seem worth it.
      
      llvm-svn: 205121
  4. Mar 23, 2014
  5. Mar 18, 2014
  6. Mar 17, 2014
    • Use range metadata instead of introducing selects. · 172c5d34
      Dan Gohman authored
      When GlobalOpt has determined that a GlobalVariable only ever has two values,
      it would convert the GlobalVariable to a boolean, and introduce SelectInsts
      at every load, to choose between the two possible values. These SelectInsts
      introduce overhead and other unpleasantness.
      
      This patch makes GlobalOpt just add range metadata to loads from such
      GlobalVariables instead. This enables the same main optimization (as seen in
      test/Transforms/GlobalOpt/integer-bool.ll), without introducing selects.
      
      The main downside is that it doesn't get the memory savings of shrinking such
      GlobalVariables, but this is expected to be negligible.
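      A rough sketch of how a load from such a two-valued global could be annotated
      with MDBuilder; the helper name is hypothetical and this is not the actual
      GlobalOpt code:
      
      #include "llvm/ADT/APInt.h"
      #include "llvm/IR/Instructions.h"
      #include "llvm/IR/LLVMContext.h"
      #include "llvm/IR/MDBuilder.h"
      
      using namespace llvm;
      
      // Hypothetical helper: mark a load from a two-valued global as yielding 0 or 1.
      static void annotateTwoValuedLoad(LoadInst *LI) {
        unsigned BitWidth = LI->getType()->getIntegerBitWidth();
        MDBuilder MDB(LI->getContext());
        // !range covers the half-open interval [0, 2), i.e. the values 0 and 1.
        MDNode *Range = MDB.createRange(APInt(BitWidth, 0), APInt(BitWidth, 2));
        LI->setMetadata(LLVMContext::MD_range, Range);
      }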
      
      llvm-svn: 204076
  7. Mar 14, 2014
  8. Mar 13, 2014
    • First patch of a patch series that improves MergeFunctions performance time from O(N*N) to O(N*log(N)) · d8eb0bcb
      Stepan Dyatkovskiy authored
      The idea is to introduce a total ordering over the set of functions. That
      allows us to build a binary tree and perform the function look-up procedure in O(log(N)) time.
      
      This patch introduces a total ordering among Type instances; it is effectively
      an improvement of the existing isEquivalentType:
      0. Coerce pointers in address space 0 to integers.
      1. If the left and right types are equal (the same Type* value), return 0
      (meaning equal).
      2. If the types are of different kinds (different type IDs), return the result
      of comparing the type IDs as numbers.
      3. If the types are vectors or integers, return the result of comparing their
      Type* pointers (cast to numbers).
      4. Check whether the type ID belongs to the following group: 
      * Void 
      * Float 
      * Double 
      * X86_FP80 
      * FP128 
      * PPC_FP128 
      * Label 
      * Metadata 
      If so, return 0.
      5. If left and right are pointers, return the result of comparing their
      address spaces as numbers.
      6. If the types are composite, both LEFT and RIGHT are expanded and their
      element types are compared in the same way. If we get Res != 0 at some stage,
      return it. Otherwise return 0.
      7. All other cases are llvm_unreachable.
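      A compressed sketch of such a comparator over Type*, following the steps
      above; the names are hypothetical and composite types (step 6) are elided, so
      this is not the actual MergeFunctions code:
      
      #include <cstdint>
      #include "llvm/IR/Type.h"
      
      // Hypothetical sketch of a total ordering over llvm::Type* (steps 1-5 above).
      static int cmpNumbers(uint64_t L, uint64_t R) {
        return L < R ? -1 : (L > R ? 1 : 0);
      }
      
      static int cmpTypesSketch(llvm::Type *TyL, llvm::Type *TyR) {
        if (TyL == TyR)                                         // step 1
          return 0;
        if (TyL->getTypeID() != TyR->getTypeID())               // step 2
          return cmpNumbers(TyL->getTypeID(), TyR->getTypeID());
      
        switch (TyL->getTypeID()) {
        case llvm::Type::IntegerTyID:                           // step 3 (vectors
          return cmpNumbers((uintptr_t)TyL, (uintptr_t)TyR);    // handled likewise)
        case llvm::Type::VoidTyID:                              // step 4 group
        case llvm::Type::FloatTyID:                             // (plus X86_FP80,
        case llvm::Type::DoubleTyID:                            // FP128, PPC_FP128,
        case llvm::Type::LabelTyID:                             // Metadata, ...)
          return 0;
        case llvm::Type::PointerTyID:                           // step 5
          return cmpNumbers(TyL->getPointerAddressSpace(),
                            TyR->getPointerAddressSpace());
        default:
          // Step 6 (composite types: recurse over element types) is omitted from
          // this sketch; step 7 would be llvm_unreachable.
          return 0;
        }
      }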
      
      llvm-svn: 203788
  9. Mar 12, 2014
    • Revive SizeOptLevel-explaining comments that were dropped in r203669 · 95b540f2
      Eli Bendersky authored
      llvm-svn: 203675
    • Move duplicated code into a helper function (exposed through overload). · 49f65652
      Eli Bendersky authored
      There's a bit of duplicated "magic" code in opt.cpp and Clang's CodeGen that
      computes the inliner threshold from opt level and size opt level.
      
      This patch moves the code to a function that lives alongside the inliner itself,
      providing a convenient overload for inliner creation.
      
      A separate patch can be committed to Clang to use this once it's committed to
      LLVM. Standalone tools that use the inlining pass can also avoid duplicating
      this code and fearing it will go out of sync.
      
      Note: this patch also restructures the conditional logic of the computation to
      be cleaner.
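      A hedged sketch of what the shared helper might look like; 225 is the
      documented default inline-threshold, while the other constants here are
      illustrative assumptions:
      
      // Sketch only: map (OptLevel, SizeOptLevel) to an inliner threshold.
      static int computeInlineThresholdSketch(unsigned OptLevel,
                                              unsigned SizeOptLevel) {
        if (OptLevel > 2)      // -O3: assumed more aggressive threshold
          return 275;
        if (SizeOptLevel == 1) // -Os: assumed size-oriented threshold
          return 75;
        if (SizeOptLevel == 2) // -Oz: assumed minimum-size threshold
          return 25;
        return 225;            // default inline-threshold
      }
      
      With such a helper, both opt and Clang's CodeGen can construct the inliner
      through one overload instead of duplicating the computation.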
      
      llvm-svn: 203669
  10. Mar 11, 2014
    • IR: add a second ordering operand to cmpxchg for failure · e94a518a
      Tim Northover authored
      The syntax for "cmpxchg" should now look something like:
      
      	cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic
      
      where the second ordering argument gives the required semantics in the case
      that no exchange takes place. It should be no stronger than the first ordering
      constraint and cannot be either "release" or "acq_rel" (since no store will
      have taken place).
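      For intuition, this mirrors the success/failure ordering pair that C++11
      atomics already expose; a small C++ example (not LLVM IR or API code) that a
      frontend would lower to such a two-ordering cmpxchg:
      
      #include <atomic>
      
      // compare_exchange takes separate success and failure orderings, matching the
      // new cmpxchg form: the failure ordering applies when no exchange happens, so
      // it must be no stronger than the success ordering and cannot be release/acq_rel.
      bool try_update(std::atomic<int> &A, int Expected, int Desired) {
        return A.compare_exchange_strong(Expected, Desired,
                                         std::memory_order_acquire,   // success
                                         std::memory_order_relaxed);  // failure ("monotonic" in IR)
      }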
      
      rdar://problem/15996804
      
      llvm-svn: 203559
  11. Mar 09, 2014
    • [C++11] Add range based accessors for the Use-Def chain of a Value. · cdf47884
      Chandler Carruth authored
      This requires a number of steps.
      1) Move value_use_iterator into the Value class as an implementation
         detail
      2) Change it to actually be a *Use* iterator rather than a *User*
         iterator.
      3) Add an adaptor which is a User iterator that always looks through the
         Use to the User.
      4) Wrap these in Value::use_iterator and Value::user_iterator typedefs.
      5) Add the range adaptors as Value::uses() and Value::users().
      6) Update *all* of the callers to correctly distinguish between whether
         they wanted a use_iterator (and to explicitly dig out the User when
         needed), or a user_iterator which makes the Use itself totally
         opaque.
      
      Because #6 requires churning essentially everything that walked the
      Use-Def chains, I went ahead and added all of the range adaptors and
      switched them to range-based loops where appropriate. Also because the
      renaming requires at least churning every line of code, it didn't make
      any sense to split these up into multiple commits -- all of which would
      touch all of the same lines of code.
      
      The result is still not quite optimal. The Value::use_iterator is a nice
      regular iterator, but Value::user_iterator is an iterator over User*s
      rather than over the User objects themselves. As a consequence, it fits
      a bit awkwardly into the range-based world and it has the weird
      extra-dereferencing 'operator->' that so many of our iterators have.
      I think this could be fixed by providing something which transforms
      a range of T&s into a range of T*s, but that *can* be separated into
      another patch, and it isn't yet 100% clear whether this is the right
      move.
      
      However, this change gets us most of the benefit and cleans up
      a substantial amount of code around Use and User. =]
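      A small usage sketch of the resulting distinction between the two range
      accessors:
      
      #include "llvm/IR/Use.h"
      #include "llvm/IR/User.h"
      #include "llvm/IR/Value.h"
      
      // users(): iterate Users directly; the Use itself stays opaque.
      // uses():  iterate Uses; dig out the User explicitly when needed.
      static unsigned countUsesSketch(llvm::Value &V) {
        for (llvm::User *U : V.users())
          (void)U;                        // per-user work would go here
        unsigned N = 0;
        for (llvm::Use &U : V.uses()) {
          llvm::User *Usr = U.getUser();  // explicit access to the user from the Use
          (void)Usr;
          ++N;                            // one increment per operand slot using V
        }
        return N;
      }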
      
      llvm-svn: 203364
  12. Mar 07, 2014
  13. Mar 06, 2014
  14. Mar 05, 2014
  15. Mar 04, 2014
  16. Mar 03, 2014
  17. Mar 02, 2014
  18. Feb 28, 2014
  19. Feb 26, 2014
  20. Feb 25, 2014
  21. Feb 24, 2014
    • LTO: Add the loop vectorizer to the LTO pipeline. · 6ccda923
      Arnold Schwaighofer authored
      During the LTO phase, LICM (informed by GlobalModRef) will move loop-invariant
      global variables out of loops. This makes more loops countable, presenting
      opportunities for the loop vectorizer.
      
      Adding the loop vectorizer improves some TSVC benchmarks and the twolf/ref dataset
      (5%) on x86-64.
      
      radar://15970632
      
      llvm-svn: 202051
  22. Feb 21, 2014
  23. Feb 13, 2014
    • GlobalOpt: Aliases don't have sections, don't copy them when replacing · 22b19da9
      Reid Kleckner authored
      As defined in LangRef, aliases do not have sections.  However, LLVM's
      GlobalAlias class inherits from GlobalValue, which means we can read and
      set its section.  We should probably ban that as a separate change,
      since it doesn't make much sense for an alias to have a section that
      differs from its aliasee.
      
      Fixes PR18757, where the section was being lost on the global in code
      from Clang like:
      
      extern "C" {
      __attribute__((used, section("CUSTOM"))) static int in_custom_section;
      }
      
      Reviewers: rafael.espindola
      
      Differential Revision: http://llvm-reviews.chandlerc.com/D2758
      
      llvm-svn: 201286
  24. Feb 06, 2014
    • Set default of inlinecold-threshold to 225. · d4612449
      Manman Ren authored
      225 is the default value of inline-threshold. This change will make sure
      we have the same inlining behavior as prior to r200886.
      
      As Chandler points out, even though we don't have code in our testing
      suite that uses the cold attribute, there are larger applications that do
      use it.
      
      r200886 + this commit intend to keep the same behavior as prior to r200886.
      We can later on tune the inlinecold-threshold.
      
      The main purpose of r200886 is to help performance of instrumentation based
      PGO before we actually hook up inliner with analysis passes such as BPI and BFI.
      For instrumentation based PGO, we try to increase inlining of hot functions and
      reduce inlining of cold functions by setting inlinecold-threshold.
      
      Another option suggested by Chandler is to use a boolean flag that controls
      whether we should use OptSizeThreshold for cold functions. The default value
      of the boolean flag should not change the current behavior. But it gives us
      less freedom in controlling inlining of cold functions.
      
      llvm-svn: 200898
    • Disable most IR-level transform passes on functions marked 'optnone'. · af4e64d0
      Paul Robinson authored
      Ideally, only those transform passes that run at -O0 remain enabled;
      in reality, we get as close as we reasonably can.
      Passes are responsible for disabling themselves; it's not the job of
      the pass manager to do it for them.
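      A minimal sketch of that self-disabling pattern for a hypothetical legacy
      FunctionPass; the check shown is simply the 'optnone' function attribute:
      
      #include "llvm/IR/Attributes.h"
      #include "llvm/IR/Function.h"
      #include "llvm/Pass.h"
      
      // Hypothetical pass: bail out early on functions marked 'optnone'.
      struct SketchTransformPass : llvm::FunctionPass {
        static char ID;
        SketchTransformPass() : llvm::FunctionPass(ID) {}
      
        bool runOnFunction(llvm::Function &F) override {
          if (F.hasFnAttribute(llvm::Attribute::OptimizeNone))
            return false; // leave 'optnone' functions untouched
          // ... the pass's actual transformation would run here ...
          return false;
        }
      };
      char SketchTransformPass::ID = 0;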
      
      llvm-svn: 200892
  25. Feb 05, 2014
  26. Feb 04, 2014
  27. Feb 03, 2014
    • inalloca: Don't remove dead arguments in the presence of inalloca args · d47a59a4
      Reid Kleckner authored
      It disturbs the layout of the parameters in memory and registers,
      leading to problems in the backend.
      
      The plan for optimizing internal inalloca functions going forward is to
      essentially SROA the argument memory and demote any captured arguments
      (things that aren't trivially written by a load or store) to an indirect
      pointer to a static alloca.
      
      llvm-svn: 200717