  1. Nov 16, 2013
    • Apply the InstCombine fptrunc sqrt optimization to llvm.sqrt · 12100bf7
      Hal Finkel authored
      InstCombine, in visitFPTrunc, applies the following optimization to sqrt calls:
      
        (fptrunc (sqrt (fpext x))) -> (sqrtf x)
      
      but does not apply the same optimization to llvm.sqrt. This is a problem
      because, to enable vectorization, Clang generates llvm.sqrt instead of sqrt in
      fast-math mode, and because this optimization is being applied to sqrt and not
      applied to llvm.sqrt, sometimes the fast-math code is slower.
      
      This change makes InstCombine apply this optimization to llvm.sqrt as well.
      
      This fixes the specific problem in PR17758, although the same underlying issue
      (optimizations applied to libcalls are not applied to intrinsics) exists for
      other optimizations in SimplifyLibCalls.
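
      The fold is only legal because narrowing the double-precision square root
      gives the same value as a correctly rounded single-precision square root.
      The following is a minimal sketch that checks this numerically, not code
      from the patch. It assumes sqrtf returns the correctly rounded result (as
      IEEE 754 requires of square root) and that math.sqrt is correctly rounded
      in double precision; all helper names are hypothetical.

```python
import math
import random
import struct
from fractions import Fraction


def to_f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def f32_bits(x):
    """Bit pattern of a value that is already representable as float32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def f32_from_bits(b):
    return struct.unpack("<f", struct.pack("<I", b))[0]


def sqrt_via_double(x):
    """Models fptrunc(sqrt(fpext x)): sqrt in double, then narrow to float."""
    return to_f32(math.sqrt(x))


def is_correctly_rounded_f32_sqrt(x, r):
    """True if r is the float32 nearest to the exact sqrt(x), for x, r > 0.

    The exact root lies between the midpoints to r's float32 neighbours
    exactly when x lies between the squares of those midpoints.
    """
    b = f32_bits(r)
    below = Fraction(f32_from_bits(b - 1))
    above = Fraction(f32_from_bits(b + 1))
    mid_lo = (below + Fraction(r)) / 2
    mid_hi = (Fraction(r) + above) / 2
    return mid_lo ** 2 <= Fraction(x) <= mid_hi ** 2


def check(samples=2000, seed=0):
    """Check the equivalence on a fixed pseudo-random sample of float32 inputs."""
    rng = random.Random(seed)
    for _ in range(samples):
        x = to_f32(rng.uniform(1e-3, 1e6))
        if not is_correctly_rounded_f32_sqrt(x, sqrt_via_double(x)):
            return False
    return True
```

      A sampled check like this is only evidence, not a proof; the underlying
      argument is that double carries more than twice the precision of float
      plus two bits, so the second rounding cannot change the result.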
      
      llvm-svn: 194935
    • InstCombine: fold (A >> C) == (B >> C) --> (A^B) < (1 << C) for constant Cs. · 03f3e248
      Benjamin Kramer authored
      This is common in bitfield code.
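
      The identity behind the fold can be checked exhaustively on a small bit
      width. This is an illustrative sketch rather than code from the patch; it
      assumes unsigned values, a logical right shift, an unsigned comparison,
      and a shift amount smaller than the bit width. The function names are
      hypothetical.

```python
def fold_matches(a, b, c):
    """Compare the original predicate with its folded form for one input."""
    original = (a >> c) == (b >> c)
    # The shifted values are equal exactly when A and B differ only in the
    # low C bits, i.e. when A ^ B is smaller than 1 << C.
    folded = (a ^ b) < (1 << c)
    return original == folded


def check_fold(bits=6):
    """Exhaustively check every (A, B, C) for a small unsigned bit width."""
    limit = 1 << bits
    return all(
        fold_matches(a, b, c)
        for a in range(limit)
        for b in range(limit)
        for c in range(bits)
    )
```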
      
      llvm-svn: 194925
    • LoopVectorizer: Use abi alignment for accesses with no alignment · dbb7b87d
      Arnold Schwaighofer authored
      When we vectorize a scalar access with no alignment specified, we have to set
      the target's ABI alignment of the scalar type on the vectorized access.
      Reusing the alignment of zero would be wrong, because zero means "use the ABI
      alignment of the accessed type" and most targets have a bigger ABI alignment
      for vector types.
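
      A small model of the problem, as a sketch only: an alignment of zero is
      read as "the ABI alignment of the accessed type", so copying it onto a
      vector access silently claims a stronger alignment. The table values
      below are assumed for illustration and do not describe any particular
      target; the function names are hypothetical.

```python
# Assumed ABI alignments in bytes, for illustration only.
ABI_ALIGN = {"i32": 4, "<4 x i32>": 16}


def effective_alignment(type_name, align):
    """Alignment an access actually promises; 0 means 'ABI alignment'."""
    return ABI_ALIGN[type_name] if align == 0 else align


def vectorized_alignment(scalar_type, scalar_align):
    """Alignment to put on the widened access: what the scalar guaranteed."""
    return effective_alignment(scalar_type, scalar_align)


# The scalar access only guaranteed 4 bytes...
scalar_guarantee = effective_alignment("i32", 0)
# ...but copying 'align 0' to the vector access would claim 16 bytes.
naive_claim = effective_alignment("<4 x i32>", 0)
# Setting the scalar's ABI alignment explicitly keeps the claim honest.
fixed_claim = effective_alignment("<4 x i32>", vectorized_alignment("i32", 0))
```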
      
      This probably fixes PR17878.
      
      llvm-svn: 194876
  2. Nov 15, 2013
  3. Nov 14, 2013
  4. Nov 13, 2013
    • Use StringRef instead of std::string · 86a7492f
      Jakub Staszak authored
      llvm-svn: 194601
    • SampleProfileLoader pass. Initial setup. · 8d6568b5
      Diego Novillo authored
      This adds a new scalar pass that reads a file with samples generated
      by 'perf' during runtime. The samples read from the profile are
      incorporated and emitted as IR metadata reflecting that profile.
      
      The profile file is assumed to have been generated by an external
      profile source. The profile information is converted into IR metadata,
      which is later used by the analysis routines to estimate block
      frequencies, edge weights and other related data.
      
      External profile information files have no fixed format; each profiler
      is free to define its own. This includes both the on-disk representation
      of the profile and the kind of profile information stored in the file.
      A common kind of profile is based on sampling (e.g., perf), which
      essentially counts how many times each line of the program has been
      executed during the run.
      
      The SampleProfileLoader pass is organized as a scalar transformation.
      On startup, it reads the file given in -sample-profile-file to
      determine what kind of profile it contains.  This file is assumed to
      contain profile information for the whole application. The profile
      data in the file is read and incorporated into the internal state of
      the corresponding profiler.
      
      To facilitate testing, I've organized the profilers to support two file
      formats: text and native. The native format is whatever on-disk
      representation the profiler wants to support. I think this will mostly
      be bitcode files, but it could be anything the profiler wants to
      support. To do this, every profiler must implement the
      SampleProfile::loadNative() function.
      
      The text format is mostly meant for debugging. Records are separated by
      newlines, but each profiler is free to interpret records as it sees fit.
      Profilers must implement the SampleProfile::loadText() function.
      
      Finally, the pass will call SampleProfile::emitAnnotations() for each
      function in the current translation unit. This function needs to
      translate the loaded profile into IR metadata, which the analyzer will
      later be able to use.
      
      This patch implements the first steps towards the above design. I've
      implemented a sample-based flat profiler. The format of the profile is
      fairly simplistic. Each sampled function contains a list of relative
      line locations (from the start of the function) together with a count
      representing how many samples were collected at that line during
      execution. I generate this profile using perf and a separate converter
      tool.
      
      Currently, I have only implemented a text format for these profiles. I
      am interested in initial feedback to the whole approach before I send
      the other parts of the implementation for review.
      
      This patch implements:
      
      - The SampleProfileLoader pass.
      - The base ExternalProfile class with the core interface.
      - A SampleProfile sub-class using the above interface. The profiler
        generates branch weight metadata on every branch instruction that
        matches the profiles.
      - A text loader class to assist the implementation of
        SampleProfile::loadText().
      - Basic unit tests for the pass.
      
      Additionally, the patch uses profile information to compute branch
      weights based on instruction samples.
      
      This patch converts instruction samples into branch weights. It
      does a fairly simplistic conversion:
      
      Given a multi-way branch instruction, it calculates the weight of
      each branch based on the maximum sample count gathered from each
      target basic block.
      
      Note that this assignment of branch weights is somewhat lossy and can be
      misleading. If a basic block has more than one incoming branch, all the
      incoming branches will get the same weight. In reality, it may be that
      only one of them is the most heavily taken branch.
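
      The conversion described above can be sketched as follows. This is a
      simplified model, not the pass itself: blocks are plain names, each
      mapped to the sample counts of its lines, and the function names are
      hypothetical.

```python
from typing import Dict, List


def block_weight(samples: List[int]) -> int:
    """Weight of a block: the maximum sample count seen on any of its lines."""
    return max(samples, default=0)


def branch_weights(targets: List[str],
                   samples_by_block: Dict[str, List[int]]) -> List[int]:
    """One weight per branch target, taken from the target block's samples."""
    return [block_weight(samples_by_block.get(t, [])) for t in targets]


samples = {"then": [120, 118, 5], "else": [3, 2], "join": [125, 121]}

# A two-way branch gets one weight per successor.
cond_weights = branch_weights(["then", "else"], samples)

# Both edges into "join" get the same weight, whichever is taken more often.
# This is the lossy case the note above describes.
from_then = branch_weights(["join"], samples)
from_else = branch_weights(["join"], samples)
```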
      
      I will adjust this assignment in subsequent patches.
      
      llvm-svn: 194566
    • Update the docs to match the function name. · ea186b95
      Nadav Rotem authored
      llvm-svn: 194537
  5. Nov 12, 2013
  6. Nov 11, 2013
    • Fix PR17952. · 3168ab33
      Shuxin Yang authored
        The symptom is that an assertion is triggered. I added the assertion to
      detect the situation where a value is propagated from dead blocks.
      (We can certainly get rid of the assertion; it is safe to do so, because
       propagating a value from a dead block to a live join node is certainly OK.)
      
        The root cause of this bug is that edge-splitting is conducted on the fly.
      The edge being split could be a dead edge, so the block that
      splits the critical edge needs to be flagged "dead" as well.
      
        There are 3 ways to fix this bug:
        1) Get rid of the assertion, as I mentioned earlier.
        2) When a dead edge is split, flag the inserted block "dead".
        3) Proactively split the critical edges connecting dead and live blocks when
           new dead blocks are revealed.
      
        This fix goes with 3), at a cost of 2 additional LOC.
      
        A test case was added by Rafael the other day.
      
      llvm-svn: 194424
    • Move debug message in vectorizer · 3f67a7de
      Renato Golin authored
      No functional change, just better reporting.
      
      llvm-svn: 194388
    • [msan] Propagate origin for insertvalue, extractvalue. · 560e0893
      Evgeniy Stepanov authored
      llvm-svn: 194374
  7. Nov 10, 2013
  8. Nov 08, 2013
    • Remove dead code from LoopUnswitch · 1a642aef
      Hal Finkel authored
      LoopUnswitch's code simplification routine has logic to convert conditional
      branches into unconditional branches, after unswitching makes the condition
      constant, and then remove any blocks that this renders dead. Unfortunately,
      this code is dead, currently broken, and furthermore, has never been alive
      (at least as far back as 2006).
      
      No functionality change intended.
      
      llvm-svn: 194277
  9. Nov 05, 2013
    • [objc-arc] Convert the one directional retain/release relation assert to a... · 24b2f6fd
      Michael Gottesman authored
      [objc-arc] Convert the one directional retain/release relation assert to a conditional check + fail.
      
      Due to the previously added overflow checks, we can have a retain/release
      relation that is one directional. This occurs specifically when we run into an
      additive overflow causing us to drop state in only one direction. If that
      occurs, we should bail and not optimize that retain/release instead of
      asserting.
      
      Apologies for the size of the testcase. It is necessary to cause the additive
      cfg overflow to trigger.
      
      rdar://15377890
      
      llvm-svn: 194083
    • Add a runtime unrolling parameter to the LoopUnroll pass constructor · 081eaef6
      Hal Finkel authored
      As with the other loop unrolling parameters (the unrolling threshold, partial
      unrolling, etc.), runtime unrolling can now also be controlled via the
      constructor. This will be necessary for moving non-trivial unrolling late in
      the pass manager (after loop vectorization).
      
      No functionality change intended.
      
      llvm-svn: 194027
  10. Nov 04, 2013
  11. Nov 03, 2013
  12. Nov 02, 2013