  1. Oct 16, 2014
  2. Oct 14, 2014
  3. Oct 12, 2014
  4. Oct 10, 2014
    • SimplifyCFG: Don't convert phis into selects if we could remove undef behavior instead · d7d010eb
      Arnold Schwaighofer authored
      We used to transform this:
      
        define void @test6(i1 %cond, i8* %ptr) {
        entry:
          br i1 %cond, label %bb1, label %bb2
      
        bb1:
          br label %bb2
      
        bb2:
          %ptr.2 = phi i8* [ %ptr, %entry ], [ null, %bb1 ]
          store i8 2, i8* %ptr.2, align 8
          ret void
        }
      
      into this:
      
        define void @test6(i1 %cond, i8* %ptr) {
          %ptr.2 = select i1 %cond, i8* null, i8* %ptr
          store i8 2, i8* %ptr.2, align 8
          ret void
        }
      
      because the simplifycfg transformation into selects happened to run before the
      simplifycfg transformation that removes unreachable control flow.  (We have
      'unreachable control flow' here because the store to null is undefined
      behavior.)
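
      For intuition, a rough C-level equivalent of @test6 (hypothetical source,
      shown only to illustrate why the null path counts as unreachable) is:

        void test6(bool cond, char *ptr) {
          char *p = cond ? nullptr : ptr;  // becomes the phi/select on %ptr.2
          *p = 2;                          // a store through null is undefined
                                           // behavior, so the cond == true path
                                           // can be treated as unreachable
        }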
      
      The existing transformation that removes unreachable control flow in simplifycfg
      is:
      
        /// If BB has an incoming value that will always trigger undefined behavior
        /// (eg. null pointer dereference), remove the branch leading here.
        static bool removeUndefIntroducingPredecessor(BasicBlock *BB)
      
      Now we generate:
      
        define void @test6(i1 %cond, i8* %ptr) {
          store i8 2, i8* %ptr, align 8
          ret void
        }
      
      I did not see any impact on the test-suite + externals.
      
      rdar://18596215
      
      llvm-svn: 219462
  5. Oct 07, 2014
    • LoopUnroll: Create sub-loops in LoopInfo · c46cfcbb
      Duncan P. N. Exon Smith authored
      `LoopUnrollPass` says that it preserves `LoopInfo` -- make it so.  In
      particular, tell `LoopInfo` about copies of inner loops when unrolling
      the outer loop.
      
      Conservatively, also tell `ScalarEvolution` to forget about the original
      versions of these loops, since their inputs may have changed.
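
      A minimal sketch of the book-keeping this implies (assumed shape only, not
      the actual patch):

        #include "llvm/Analysis/LoopInfo.h"
        #include "llvm/Analysis/ScalarEvolution.h"
        using namespace llvm;

        // Hypothetical helper: once unrolling has produced a clone of an inner
        // loop, the clone must appear in the LoopInfo tree, and ScalarEvolution
        // must drop its cached facts about the original loop.
        static void noteClonedInnerLoop(Loop &OuterLoop, Loop &ClonedInner,
                                        Loop &OriginalInner, ScalarEvolution &SE) {
          OuterLoop.addChildLoop(&ClonedInner); // keep the loop nest consistent
          SE.forgetLoop(&OriginalInner);        // its trip counts may have changed
        }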
      
      Fixes PR20987.
      
      llvm-svn: 219241
    • LoopUnroll: Only check for ScalarEvolution analysis once, NFC · 9b4d37e8
      Duncan P. N. Exon Smith authored
      A follow-up commit will add a use in a tight loop.  We might as well just
      find the analysis once anyway.
      
      llvm-svn: 219239
    • Two case switch to select optimization · 963bc87d
      Marcello Maggioni authored
      This optimization tries to convert switch instructions that are used to select a value and have only 2 unique cases + a default block
      into a select or a couple of selects (depending on whether the default block is reachable or not).
      
      The typical case this optimization wants to be able to optimize is this one:
      
      Example:
      switch (a) {
        case 10:                %0 = icmp eq i32 %a, 10
          return 10;            %1 = select i1 %0, i32 10, i32 4
        case 20:        ---->   %2 = icmp eq i32 %a, 20
          return 2;             %3 = select i1 %2, i32 2, i32 %1
        default:
          return 4;
      }
      
      It also lays the groundwork for further optimizations that are planned and under review.
      
      llvm-svn: 219223
    • LoopUnroll: Change code order of changes to new basic blocks · e5d7d979
      Duncan P. N. Exon Smith authored
      Add new basic blocks to `LoopInfo` earlier.  No functionality change
      intended (simplifies upcoming bugfix patch).
      
      llvm-svn: 219150
    • Sink comment, NFC · 0bbf5418
      Duncan P. N. Exon Smith authored
      llvm-svn: 219149
  6. Oct 01, 2014
    • DIBuilder: Encapsulate DIExpression's element type · 611afb22
      Duncan P. N. Exon Smith authored
      `DIExpression`'s elements are 64-bit integers that are stored as
      `ConstantInt`.  The accessors already encapsulate the storage.  This
      commit updates the `DIBuilder` API to also encapsulate that.
      
      llvm-svn: 218797
    • Move the complex address expression out of DIVariable and into an extra argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. · 87b7eb9d
      Adrian Prantl authored
      Previously, DIVariable was a variable-length field that had an optional
      reference to a Metadata array consisting of a variable number of
      complex address expressions. In the case of OpPiece expressions this wastes
      a lot of storage in the IR, because when an aggregate type is, e.g.,
      SROA'd into all of its n individual members, the IR will contain n copies
      of the DIVariable, all alike, only differing in the complex address
      reference at the end.
      
      By making the complex address into an extra argument of the
      dbg.value/dbg.declare intrinsics, all of the pieces can reference the
      same variable and the complex address expressions can be uniqued across
      the CU, too.
      Down the road, this will allow us to move other flags, such as
      "indirection" out of the DIVariable, too.
      
      The new intrinsics look like this:
      declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
      declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)
      
      This patch adds a new LLVM-local tag to DIExpressions, so we can detect
      and pretty-print DIExpression metadata nodes.
      
      What this patch doesn't do:
      
      This patch does not touch the "Indirect" field in DIVariable; but moving
      that into the expression would be a natural next step.
      
      http://reviews.llvm.org/D4919
      rdar://problem/17994491
      
      Thanks to dblaikie and dexonsmith for reviewing this patch!
      
      Note: I accidentally committed a bogus older version of this patch previously.
      llvm-svn: 218787
    • Revert r218778 while investigating buildbot breakage. · b458dc2e
      Adrian Prantl authored
      "Move the complex address expression out of DIVariable and into an extra"
      
      llvm-svn: 218782
    • Move the complex address expression out of DIVariable and into an extra argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. · 25a7174e
      Adrian Prantl authored
      Previously, DIVariable was a variable-length field that had an optional
      reference to a Metadata array consisting of a variable number of
      complex address expressions. In the case of OpPiece expressions this wastes
      a lot of storage in the IR, because when an aggregate type is, e.g.,
      SROA'd into all of its n individual members, the IR will contain n copies
      of the DIVariable, all alike, only differing in the complex address
      reference at the end.
      
      By making the complex address into an extra argument of the
      dbg.value/dbg.declare intrinsics, all of the pieces can reference the
      same variable and the complex address expressions can be uniqued across
      the CU, too.
      Down the road, this will allow us to move other flags, such as
      "indirection" out of the DIVariable, too.
      
      The new intrinsics look like this:
      declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
      declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)
      
      This patch adds a new LLVM-local tag to DIExpressions, so we can detect
      and pretty-print DIExpression metadata nodes.
      
      What this patch doesn't do:
      
      This patch does not touch the "Indirect" field in DIVariable; but moving
      that into the expression would be a natural next step.
      
      http://reviews.llvm.org/D4919
      rdar://problem/17994491
      
      Thanks to dblaikie and dexonsmith for reviewing this patch!
      
      llvm-svn: 218778
    • C API: Add LLVMCloneModule() · 0a4e9a3b
      Tom Stellard authored
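      A minimal usage sketch of the new entry point (the caller is hypothetical;
      the function takes and returns an LLVMModuleRef):

        #include "llvm-c/Core.h"

        // Make an independent copy of a module and discard it again.
        void cloneAndDiscard(LLVMModuleRef M) {
          LLVMModuleRef Copy = LLVMCloneModule(M);
          LLVMDisposeModule(Copy);
        }
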
      llvm-svn: 218775
    • [SimplifyCFG] threshold for folding branches with common destination · fc029670
      Jingyue Wu authored
      Summary:
      This patch adds a threshold that controls the number of bonus instructions
      allowed for folding branches with common destination. The original code allows
      at most one bonus instruction. With this patch, users can customize the
      threshold to allow multiple bonus instructions. The default threshold is still
      1, so that the code behaves the same as before when users do not specify this
      threshold.
      
      The motivation of this change is that tuning this threshold significantly (up
      to 25%) improves the performance of some CUDA programs in our internal code
      base. In general, branch instructions are very expensive for GPU programs.
      Therefore, it is sometimes worth trading more arithmetic computation for
      straighter control flow. Here's a reduced example:
      
        __global__ void foo(int a, int b, int c, int d, int e, int n,
                            const int *input, int *output) {
          int sum = 0;
          for (int i = 0; i < n; ++i)
            sum += (((i ^ a) > b) && (((i | c ) ^ d) > e)) ? 0 : input[i];
          *output = sum;
        }
      
      The select statement in the loop body translates to two branch instructions "if
      ((i ^ a) > b)" and "if (((i | c) ^ d) > e)" which share a common destination.
      With the default threshold, SimplifyCFG is unable to fold them, because
      computing the condition of the second branch "(i | c) ^ d > e" requires two
      bonus instructions. With the threshold increased, SimplifyCFG can fold the two
      branches so that the loop body contains only one branch, making the code
      conceptually look like:
      
        sum += (((i ^ a) > b) & (((i | c ) ^ d) > e)) ? 0 : input[i];
      
      Increasing the threshold significantly improves the performance of this
      particular example. In the configuration where both conditions are guaranteed
      to be true, increasing the threshold from 1 to 2 improves the performance by
      18.24%. Even in the configuration where the first condition is false and the
      second condition is true, which favors shortcuts, increasing the threshold from
      1 to 2 still improves the performance by 4.35%.
      
      We are still looking for a good threshold and maybe a better cost model than
      just counting the number of bonus instructions. However, according to the above
      numbers, we think it is at least worth adding a threshold to enable more
      experiments and tuning. Let me know what you think. Thanks!
      
      Test Plan: Added one test case to check the threshold is in effect
      
      Reviewers: nadav, eliben, meheff, resistor, hfinkel
      
      Reviewed By: hfinkel
      
      Subscribers: hfinkel, llvm-commits
      
      Differential Revision: http://reviews.llvm.org/D5529
      
      llvm-svn: 218711
  7. Sep 29, 2014
    • Use a loop to simplify the runtime unrolling prologue. · fc02e3c3
      Kevin Qin authored
      Runtime unrolling will create a prologue to execute the extra
      iterations that can't be evenly divided by the unroll factor.  It
      generates an if-then-else sequence to jump into a (factor - 1)
      times unrolled loop body, like
      
          extraiters = tripcount % loopfactor
          if (extraiters == 0) jump Loop:
          if (extraiters == loopfactor) jump L1
          if (extraiters == loopfactor-1) jump L2
          ...
          L1:  LoopBody;
          L2:  LoopBody;
          ...
          if tripcount < loopfactor jump End
          Loop:
          ...
          End:
      
      This means that if the unroll factor is 4, the loop body will be
      emitted 7 times: 3 copies in the loop prologue and 4 in the loop.
      This commit instead uses a loop to execute the extra iterations
      in the prologue, like
      
              extraiters = tripcount % loopfactor
              if (extraiters == 0) jump Loop:
              else jump Prol
       Prol:  LoopBody;
              extraiters -= 1                 // Omitted if unroll factor is 2.
              if (extraiters != 0) jump Prol: // Omitted if unroll factor is 2.
              if (tripcount < loopfactor) jump End
       Loop:
       ...
       End:
      
      Then when the unroll factor is 4, the loop body will be copied
      only 5 times: 1 in the prologue loop and 4 in the original loop.
      And if the unroll factor is 2, no new loop is created, just as
      in the original solution.
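
      At the source level, the new structure corresponds roughly to the following
      sketch (hypothetical code, unroll factor 4; body(i) stands for one iteration
      of the original loop body, so it appears once in the prologue loop and four
      times in the main loop):

        void body(int i);

        void unrolled(int tripcount) {
          int i = 0;
          int extra = tripcount % 4;          // remainder iterations
          for (int j = 0; j < extra; ++j)     // prologue loop
            body(i++);
          for (; i + 3 < tripcount; i += 4) { // main unrolled loop
            body(i); body(i + 1); body(i + 2); body(i + 3);
          }
        }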
      
      llvm-svn: 218604
  8. Sep 24, 2014
    • GlobalOpt: Preserve comdats of unoptimized initializers · 78927e88
      Reid Kleckner authored
      Rather than slurping in and splatting out the whole ctor list, preserve
      the existing array entries without trying to understand them.  Only
      remove the entries that we know we can optimize away.  This way we don't
      need to wire through priority and comdats or anything else we might add.
      
      Fixes a linker issue where the .init_array or .ctors entry would point
      to discarded initialization code if the comdat group from the TU with
      the faulty global_ctors entry was dropped.
      
      llvm-svn: 218337
  9. Sep 17, 2014
  10. Sep 15, 2014
    • Remove dead code in SimplifyCFG · b67140b8
      Jingyue Wu authored
      Summary: UsedByBranch is always true according to how BonusInst is defined.
      
      Test Plan:
      Passes check-all, and also verified 
      
      if (BonusInst && !UsedByBranch) {
        ...
      }
      
      is never entered during check-all.
      
      Reviewers: resistor, nadav, jingyue
      
      Reviewed By: jingyue
      
      Subscribers: llvm-commits, eliben, meheff
      
      Differential Revision: http://reviews.llvm.org/D5324
      
      llvm-svn: 217824
  11. Sep 13, 2014
  12. Sep 07, 2014
    • Make use of @llvm.assume in ValueTracking (computeKnownBits, etc.) · 60db0589
      Hal Finkel authored
      This change, which allows @llvm.assume to be used from within computeKnownBits
      (and other associated functions in ValueTracking), adds some (optional)
      parameters to computeKnownBits and friends. These functions now (optionally)
      take a "context" instruction pointer, an AssumptionTracker pointer, and also a
      DomTree pointer, and most of the changes are just to pass this new information
      when it is easily available from InstSimplify, InstCombine, etc.
      
      As explained below, the significant conceptual change is that known properties
      of a value might depend on the control-flow location of the use (we care
      whether the @llvm.assume dominates the use, since assumptions have
      control-flow dependencies). This means that, when we ask if bits are known in a
      value, we might get different answers for different uses.
      
      The significant changes are all in ValueTracking. Two main changes: First, as
      with the rest of the code, new parameters need to be passed around. To make
      this easier, I grouped them into a structure, and I made internal static
      versions of the relevant functions that take this structure as a parameter. The
      new code does what you might expect: it looks for @llvm.assume calls that make
      use of the value we're trying to learn something about (often indirectly),
      attempts to pattern match that expression, and uses the result if successful.
      By making use of the AssumptionTracker, the process of finding @llvm.assume
      calls is not expensive.
      
      Part of the structure being passed around inside ValueTracking is a set of
      already-considered @llvm.assume calls. This is to prevent a query that uses,
      for example, assume(a == b) from recursing on itself. The context and DT params
      are used to find applicable assumptions. An assumption needs to dominate the
      context instruction, or come after it deterministically. In this latter case we
      only handle the specific case where both the assumption and the context
      instruction are in the same block, and we need to exclude assumptions from
      being used to simplify their own ephemeral values (those which contribute only
      to the assumption) because otherwise the assumption would prove its feeding
      comparison trivial and would be removed.
      
      This commit adds the plumbing and the logic for a simple masked-bit propagation
      (just enough to write a regression test). Future commits add more patterns
      (and, correspondingly, more regression tests).
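
      As a source-level illustration (hypothetical example; clang lowers
      __builtin_assume to @llvm.assume), the masked-bit propagation enables
      folds like:

        unsigned lowBits(unsigned x) {
          __builtin_assume((x & 7) == 0); // the low three bits of x become known
                                          // zero at uses dominated by the assume
          return x & 7;                   // so this expression can fold to 0
        }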
      
      llvm-svn: 217342
    • Add an Assumption-Tracking Pass · 74c2f355
      Hal Finkel authored
      This adds an immutable pass, AssumptionTracker, which keeps a cache of
      @llvm.assume call instructions within a module. It uses callback value handles
      to keep stale functions and intrinsics out of the map, and it relies on any
      code that creates new @llvm.assume calls to notify it of the new instructions.
      The benefit is that code needing to find @llvm.assume intrinsics can do so
      directly, without scanning the function, thus allowing the cost of @llvm.assume
      handling to be negligible when none are present.
      
      The current design is intended to be lightweight. We don't keep track of
      anything until we need a list of assumptions in some function. The first time
      this happens, we scan the function. After that, we add/remove @llvm.assume
      calls from the cache in response to registration calls and ValueHandle
      callbacks.
      
      There are no new direct test cases for this pass, but because it calls its
      validation function upon module finalization, we'll pick up detectable
      inconsistencies from the other tests that touch @llvm.assume calls.
      
      This pass will be used by follow-up commits that make use of @llvm.assume.
      
      llvm-svn: 217334
  13. Sep 04, 2014
  14. Sep 01, 2014
    • Feed AA to the inliner and use AA->getModRefBehavior in AddAliasScopeMetadata · 0c083024
      Hal Finkel authored
      This feeds AA through the IFI structure into the inliner so that
      AddAliasScopeMetadata can use AA->getModRefBehavior to figure out which
      functions only access their arguments (instead of just hard-coding some
      knowledge of memory intrinsics). Most of the information is only available from
      BasicAA; this is important for preserving alias scoping information for
      target-specific intrinsics when doing the noalias parameter attribute to
      metadata conversion.
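
      The kind of query this enables looks roughly like the following (assumed
      shape only, not the actual patch):

        #include "llvm/Analysis/AliasAnalysis.h"
        using namespace llvm;

        // Hypothetical check: only functions that AA knows access memory solely
        // through their pointer arguments can be described completely by
        // alias.scope metadata.
        static bool accessesOnlyArgMemory(AliasAnalysis *AA, const Function *F) {
          AliasAnalysis::ModRefBehavior MRB = AA->getModRefBehavior(F);
          return AliasAnalysis::onlyAccessesArgPointees(MRB);
        }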
      
      llvm-svn: 216866
    • Fix AddAliasScopeMetadata again - alias.scope must be a complete description · cbb85f24
      Hal Finkel authored
      I thought that I had fixed this problem in r216818, but I did not do a very
      good job. The underlying issue is that when we add alias.scope metadata we are
      asserting that this metadata completely describes the aliasing relationships
      within the current aliasing scope domain, and so in the context of translating
      noalias argument attributes, the pointers must all be based on noalias
      arguments (as underlying objects) and have no other kind of underlying object.
      In r216818, excluding appropriate accesses from getting alias.scope metadata is
      done by looking for underlying objects that are not identified function-local
      objects -- but that's wrong because allocas, etc. are also function-local
      objects and we need to explicitly check that all underlying objects are the
      noalias arguments for which we're adding metadata aliasing scopes.
      
      This fixes the underlying-object check for adding alias.scope metadata, and
      does some refactoring of the related capture-checking eligibility logic (and
      adds more comments; hopefully making everything a bit clearer).
      
      Fixes self-hosting on x86_64 with -mllvm -enable-noalias-to-md-conversion (the
      feature is still disabled by default).
      
      llvm-svn: 216863
  15. Aug 30, 2014
    • Fix AddAliasScopeMetadata to not add scopes when deriving from unknown pointers · a3708df4
      Hal Finkel authored
      The previous implementation of AddAliasScopeMetadata, which adds noalias
      metadata to preserve noalias parameter attribute information when inlining, had
      a flaw: it would add alias.scope metadata to accesses which might have been
      derived from pointers other than noalias function parameters. This was
      incorrect because even an access known not to alias with all noalias function
      parameters could easily alias with an access derived from some other pointer.
      Instead, when deriving from some unknown pointer, we cannot add alias.scope
      metadata at all. This fixes a miscompile of the test-suite's tramp3d-v4.
      Furthermore, we cannot add alias.scope to functions unless we know they
      access only argument-derived pointers (currently, we know this only for
      memory intrinsics).
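
      A source-level illustration of the problem (hypothetical code; with
      -mllvm -enable-noalias-to-md-conversion the restrict/noalias parameters of f
      become alias.scope and noalias metadata when f is inlined into g):

        static void f(int *__restrict a, int *__restrict b, int *other) {
          *a = 1;      // based on the noalias argument a: may join a's scope
          *b = 2;      // based on the noalias argument b: may join b's scope
          *other = 3;  // based on an unknown pointer: must not get alias.scope
                       // metadata, since that metadata claims a complete aliasing
                       // description within the scope domain
        }

        void g(int *p, int *q) { f(p, p + 16, q); }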
      
      Also, we fix a theoretical problem with using the NoCapture attribute to skip
      the capture check. This is incorrect (as explained in the comment added), but
      would not matter in any code generated by Clang because we get only inferred
      nocapture attributes in Clang-generated IR.
      
      This functionality is not yet enabled by default.
      
      llvm-svn: 216818
  16. Aug 29, 2014
  17. Aug 27, 2014
  18. Aug 25, 2014
  19. Aug 22, 2014
    • Use DILexicalBlockFile, rather than DILexicalBlock, to track discriminator... · 2f3f76fd
      David Blaikie authored
      Use DILexicalBlockFile, rather than DILexicalBlock, to track discriminator changes to ensure discriminator changes don't introduce new DWARF DW_TAG_lexical_blocks.
      
      This went somewhat unnoticed in the original implementation of discriminators,
      but the use of DILexicalBlock to track discriminator changes could cause
      instructions to end up in new, small DW_TAG_lexical_blocks.
      
      Instead, use DILexicalBlockFile which we already use to track file
      changes without introducing new scopes, so it works well to track
      discriminator changes in the same way.
      
      llvm-svn: 216239
  20. Aug 21, 2014
  21. Aug 18, 2014
  22. Aug 15, 2014
  23. Aug 14, 2014
    • Copy noalias metadata from call sites to inlined instructions · 61c38612
      Hal Finkel authored
      When a call site with noalias metadata is inlined, that metadata can be
      propagated directly to the inlined instructions (only those that might access
      memory because it is not useful on the others). Prior to inlining, the noalias
      metadata could express that a call would not alias with some other memory
      access, which implies that no instruction within that called function would
      alias. By propagating the metadata to the inlined instructions, we preserve
      that knowledge.
      
      This should complete the enhancements requested in PR20500.
      
      llvm-svn: 215676
    • Add noalias metadata for general calls (not just memory intrinsics) during inlining · d2dee16c
      Hal Finkel authored
      When preserving noalias function parameter attributes by adding noalias
      metadata in the inliner, we should do this for general function calls (not just
      memory intrinsics). The logic is very similar to what already existed (except
      that we want to add this metadata even for functions taking no relevant
      parameters). This metadata can be used by ModRef queries in the caller after
      inlining.
      
      This addresses the first part of PR20500. Adding noalias metadata during
      inlining is still turned off by default.
      
      llvm-svn: 215657
  24. Aug 13, 2014