  1. Jun 08, 2019
  2. Jun 07, 2019
    • [docs]Move llvm-readobj from "Developer Tools" to "Basic Commands" · aa8753bc
      James Henderson authored
      On the Command Guide page, there are multiple sections with links to the
      different documentation pages available for LLVM tools. The "Basic
      Commands" section includes tools like llvm-objdump, llvm-nm and so on. The
      "Developer Tools" section contains things like FileCheck and lit. This
      change moves llvm-readobj from the latter block into the former.
      
      Reviewed by: MaskRay
      
      Differential Revision: https://reviews.llvm.org/D63011
      
      llvm-svn: 362813
  3. Jun 06, 2019
    • FileCheck [6/12]: Introduce numeric variable definition · 71d3f227
      Thomas Preud'homme authored
      Summary:
      This patch is part of a patch series to add support for FileCheck
      numeric expressions. This specific patch introduces support for defining
      a numeric variable in a CHECK directive.
      
      This commit introduces support for defining a numeric variable from a
      literal value in the input text. Numeric expressions can then use the
      variable, provided they appear on a later line than the definition.
      
      Copyright:
          - Linaro (changes up to diff 183612 of revision D55940)
          - GraphCore (changes in later versions of revision D55940 and
                       in new revision created off D55940)
      
      Reviewers: jhenderson, chandlerc, jdenny, probinson, grimar, arichardson, rnk
      
      Subscribers: hiraditya, llvm-commits, probinson, dblaikie, grimar, arichardson, tra, rnk, kristina, hfinkel, rogfer01, JonChesterfield
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D60386
      
      llvm-svn: 362705
  4. May 30, 2019
    • [Docs] Modernize references to macOS · d45eaf94
      J. Ryan Stinnett authored
      Summary:
      This updates all places in documentation that refer to "Mac OS X", "OS X", etc.
      to instead use the modern name "macOS" when no specific version number is
      mentioned.
      
      If a specific version is mentioned, this attempts to use the OS name at the time
      of that version:
      
      * Mac OS X for 10.0 - 10.7
      * OS X for 10.8 - 10.11
      * macOS for 10.12 - present
      
      Reviewers: JDevlieghere
      
      Subscribers: mgorny, christof, arphaman, cfe-commits, lldb-commits, libcxx-commits, llvm-commits
      
      Tags: #clang, #lldb, #libc, #llvm
      
      Differential Revision: https://reviews.llvm.org/D62654
      
      llvm-svn: 362113
  5. May 23, 2019
    • FileCheck: Improve FileCheck variable terminology · 1a944d27
      Thomas Preud'homme authored
      Summary:
      Terminology introduced by [[#]] blocks is confusing and does not
      integrate well with existing terminology.
      
      First, variables referred to by [[]] blocks are called "pattern variables"
      while the text a CHECK directive needs to match is called a "CHECK
      pattern". This is inconsistent with variables in [[#]] blocks, since
      [[#]] blocks are also found in CHECK patterns yet those variables are
      called "numeric variables".
      
      Second, the replacing of both [[]] and [[#]] blocks by the value of the
      variable or expression they contain is represented by a
      FileCheckPatternSubstitution class. The naming refers to being a
      substitution in a CHECK pattern but could be wrongly understood as being
      a substitution of a pattern variable.
      
      Third and lastly, comments use "numeric expression" to refer both to the
      [[#]] blocks as well as to the numeric expressions these blocks contain
      which get evaluated at match time.
      
      This patch resolves these confusions by
      - calling variables in [[]] and [[#]] blocks string and numeric
        variables respectively;
      - referring to [[]] and [[#]] as substitution *blocks*, with the former
        being a string substitution block and the latter a numeric
        substitution block;
      - calling the replacement of a [[]] or [[#]] block by the value of the
        variable or expression it contains a substitution (as opposed to a
        definition, when the block is used to define a variable), with the
        former being a string substitution and the latter a numeric
        substitution;
      - renaming the FileCheckPatternSubstitution class to FileCheckSubstitution,
        with FileCheckStringSubstitution and
        FileCheckNumericSubstitution subclasses;
      - restricting the use of "numeric expression" to refer to the expression
        that is evaluated in a numeric substitution.
      
      While numeric substitution blocks only support numeric substitutions of
      numeric expressions at the moment, there are plans to extend numeric
      substitution blocks to support numeric definitions, as well as both a
      numeric definition and a numeric substitution in the same numeric
      substitution block.
      
      Reviewers: jhenderson, jdenny, probinson, arichardson
      
      Subscribers: hiraditya, arichardson, probinson, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D62146
      
      llvm-svn: 361445
  6. May 15, 2019
  7. May 14, 2019
  8. May 13, 2019
    • FileCheck [5/12]: Introduce regular numeric variables · e47362c1
      Thomas Preud'homme authored
      Summary:
      This patch is part of a patch series to add support for FileCheck
      numeric expressions. This specific patch introduces regular numeric
      variables which can be set on the command-line.
      
      This commit introduces regular numeric variables that can be set on the
      command-line to a numeric value with the -D option. They can then be
      used in CHECK patterns in numeric expressions with the same shape as
      @LINE numeric expressions, i.e. VAR, VAR+offset or VAR-offset where offset
      is an integer literal.
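      
      As a rough illustration of the expression shape described above (VAR,
      VAR+offset or VAR-offset), the Python sketch below evaluates such an
      expression against a table of variable values. It is not FileCheck's
      implementation: `evaluate_numeric_expression`, the regular expression and
      the variable table are hypothetical names chosen for this example, and
      the tolerance for spaces around the operator is an assumption.
      
      ```python
      import re

      # Hypothetical helper, not FileCheck code: accepts VAR, VAR+offset or
      # VAR-offset, where offset is an unsigned integer literal.
      _EXPR_RE = re.compile(
          r"^\s*(@?[A-Za-z_][A-Za-z0-9_]*)\s*(?:([+-])\s*(\d+))?\s*$"
      )


      def evaluate_numeric_expression(expr, variables):
          """Return the value of `expr` given a {name: int} table of variables."""
          match = _EXPR_RE.match(expr)
          if match is None:
              raise ValueError(f"unsupported numeric expression: {expr!r}")
          name, sign, offset = match.groups()
          if name not in variables:
              raise KeyError(f"undefined numeric variable: {name}")
          value = variables[name]
          if sign is None:
              return value  # bare variable, no offset
          return value + int(offset) if sign == "+" else value - int(offset)
      ```
      
      With a table such as `{"VAR": 10}`, the expression `VAR+2` evaluates to 12
      and `VAR-3` to 7; anything outside the three supported shapes is rejected.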
      
      The commit also enables strict whitespace in the verbose.txt testcase to
      check the position of the location diagnostics. It fixes one of the
      existing CHECK directives in the process, which was not accurately testing a
      location diagnostic (i.e. the diagnostic was correct, not the CHECK).
      
      Copyright:
          - Linaro (changes up to diff 183612 of revision D55940)
          - GraphCore (changes in later versions of revision D55940 and
                       in new revision created off D55940)
      
      Reviewers: jhenderson, chandlerc, jdenny, probinson, grimar, arichardson, rnk
      
      Subscribers: hiraditya, llvm-commits, probinson, dblaikie, grimar, arichardson, tra, rnk, kristina, hfinkel, rogfer01, JonChesterfield
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D60385
      
      llvm-svn: 360578
  9. May 09, 2019
    • [MCA] Add support for nested and overlapping region markers · 4e62554b
      Andrea Di Biagio authored
      This patch fixes PR41523
      https://bugs.llvm.org/show_bug.cgi?id=41523
      
      Regions can now nest/overlap provided that they have different names.
      Anonymous regions cannot overlap.
      
      Region end markers must specify the region name. The only exception is for when
      there is only one user-defined region; in that particular case, the region end
      marker doesn't need to specify a name.
      
      Incorrect region end markers are no longer ignored. Instead, the tool reports an
      error and exits with an error code.
      
      Added test cases to verify the new diagnostic error messages.
      
      Updated the llvm-mca docs to reflect this feature change.
      
      Differential Revision: https://reviews.llvm.org/D61676
      
      llvm-svn: 360351
  10. May 02, 2019
    • FileCheck [4/12]: Introduce @LINE numeric expressions · 288ed91e
      Thomas Preud'homme authored
      Summary:
      This patch is part of a patch series to add support for FileCheck
      numeric expressions. This specific patch introduces the @LINE numeric
      expressions.
      
      This commit introduces a new syntax to express a relation a numeric
      value in the input text must have with the line number of a given CHECK
      pattern: [[#<@LINE numeric expression>]]. Further commits build on that
      to express relations between several numeric values in the input text.
      To help with naming, regular variables are renamed to pattern
      variables and the old @LINE expression syntax is referred to as a legacy
      numeric expression.
      
      Compared to existing @LINE expressions, this new syntax allows arbitrary
      spacing between the components of the expression. It otherwise offers the
      same functionality, but the commit serves to introduce some of the data
      structures needed to support more general numeric expressions.
      
      Copyright:
          - Linaro (changes up to diff 183612 of revision D55940)
          - GraphCore (changes in later versions of revision D55940 and
                       in new revision created off D55940)
      
      Reviewers: jhenderson, chandlerc, jdenny, probinson, grimar, arichardson, rnk
      
      Subscribers: hiraditya, llvm-commits, probinson, dblaikie, grimar, arichardson, tra, rnk, kristina, hfinkel, rogfer01, JonChesterfield
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D60384
      
      llvm-svn: 359741
  11. May 01, 2019
  12. Apr 30, 2019
  13. Apr 19, 2019
  14. Apr 18, 2019
  15. Apr 08, 2019
  16. Apr 05, 2019
  17. Mar 28, 2019
    • [llvm-exegesis] Introduce a 'naive' clustering algorithm (PR40880) · c2423fe6
      Roman Lebedev authored
      Summary:
      This is an alternative to D59539.
      
      Let's suppose we have measured 4 different opcodes, and got: `0.5`, `1.0`, `1.5`, `2.0`.
      Let's suppose we are using `-analysis-clustering-epsilon=0.5`.
      By default now we will start processing the `0.5` point, find that `1.0` is its neighbor, add them to a new cluster.
      Then we will notice that `1.5` is a neighbor of `1.0` and add it to that same cluster.
      Then we will notice that `2.0` is a neighbor of `1.5` and add it to that same cluster.
      So all these points ended up in the same cluster.
      This may or may not be a correct implementation of dbscan clustering algorithm.
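      
      The chaining effect described above can be reproduced with a small Python
      sketch. It is a simplified neighbor-chaining loop standing in for the
      dbscan-style clustering, not the llvm-exegesis code; `chain_clusters` is
      a hypothetical name, and the minimum-points rule of real dbscan is
      deliberately left out.
      
      ```python
      def chain_clusters(points, epsilon):
          """Group 1-D points, transitively merging neighbors within `epsilon`."""
          unvisited = set(range(len(points)))
          clusters = []
          while unvisited:
              seed = min(unvisited)
              unvisited.remove(seed)
              cluster, frontier = [seed], [seed]
              while frontier:
                  current = frontier.pop()
                  # "Add neighbors of a neighbor": every reachable point joins.
                  neighbors = [i for i in sorted(unvisited)
                               if abs(points[i] - points[current]) <= epsilon]
                  for i in neighbors:
                      unvisited.remove(i)
                      cluster.append(i)
                      frontier.append(i)
              clusters.append(sorted(points[i] for i in cluster))
          return clusters


      measurements = [0.5, 1.0, 1.5, 2.0]
      print(chain_clusters(measurements, epsilon=0.5))
      # prints [[0.5, 1.0, 1.5, 2.0]]
      ```
      
      Although `0.5` and `2.0` are three epsilons apart, they land in the same
      cluster because each intermediate point is within epsilon of the next.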
      
      But this is rather horribly broken for the purpose of comparing the clusters with the LLVM sched data.
      Let's suppose all those opcodes are currently in the same sched cluster.
      If I specify `-analysis-inconsistency-epsilon=0.5`, then no matter
      the LLVM values this cluster will **never** match the LLVM values,
      and thus this cluster will **always** be displayed as inconsistent.
      
      The solution is obviously to split off some of these opcodes into a different sched cluster.
      But how do I do that? Out of 4 opcodes displayed in the inconsistency report,
      which ones are the "bad ones"? Which ones are the most different from the checked-in data?
      I'd need to go into the `.yaml` and look it up manually.
      
      The trivial solution is, when creating clusters, to not use the full dbscan algorithm,
      but instead "pick some unclustered point, pick all unclustered points that are its neighbors,
      put them all into a new cluster, repeat". And as it happens, we can arrive
      at that algorithm by not performing the "add neighbors of a neighbor to the cluster" step.
      
      But that won't work well once we teach analyze mode to operate in non-1D mode
      (i.e. on more than a single measurement type at a time), because the clustering would
      depend on the order of the measurements.
      
      Instead, let's just create a single cluster per opcode, and put all the points of that opcode into said cluster.
      And simultaneously check that every point in that cluster is a neighbor of every other point in the cluster,
      and if they are not, the cluster (==opcode) is unstable.
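      
      A minimal Python sketch of the per-opcode approach described above,
      assuming each benchmark is a single 1-D measurement. The function name
      `naive_clusters` and the `(opcode, measurement)` input layout are
      hypothetical and do not mirror the llvm-exegesis data structures.
      
      ```python
      from collections import defaultdict
      from itertools import combinations


      def naive_clusters(benchmarks, epsilon):
          """Map each opcode to (measurements, is_stable).

          `benchmarks` is an iterable of (opcode, measurement) pairs.
          """
          by_opcode = defaultdict(list)
          for opcode, value in benchmarks:
              by_opcode[opcode].append(value)

          result = {}
          for opcode, values in by_opcode.items():
              # Stable only if every point is a neighbor of every other point.
              stable = all(abs(a - b) <= epsilon
                           for a, b in combinations(values, 2))
              result[opcode] = (values, stable)
          return result
      ```
      
      Because the grouping key is the opcode itself, the result does not depend
      on the order in which the measurements are visited.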
      
      This is //yet another// step to bring me closer to being able to continue cleanup of the bdver2 sched model.
      
      Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40880 | PR40880 ]].
      
      Reviewers: courbet, gchatelet
      
      Reviewed By: courbet
      
      Subscribers: tschuett, jdoerfert, RKSimon, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D59820
      
      llvm-svn: 357152
  18. Mar 27, 2019
  19. Mar 14, 2019
    • Speeding up llvm-cov export with multithreaded renderFiles implementation. · a80d9ce5
      Max Moroz authored
      Summary:
      CoverageExporterJson::renderFiles accounts for most of the execution time given a large profdata file with multiple binaries.
      
      The proposed solution is to generate JSON for each file in parallel and sort at the end to preserve deterministic output. This patch also adds flags to skip generating parts of the output, to trim the output size.
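      
      The "render in parallel, sort at the end" idea can be sketched in Python
      as follows. This is an illustration of the general pattern only, not the
      C++ code in CoverageExporterJson; `render_file`, `render_files` and the
      record layout are hypothetical.
      
      ```python
      import json
      from concurrent.futures import ThreadPoolExecutor, as_completed


      def render_file(record):
          """Build the JSON-ready dict for one file (stand-in for real rendering)."""
          filename, covered, total = record
          return {"filename": filename, "covered": covered, "count": total}


      def render_files(records, max_workers=4):
          """Render every record in parallel, then sort for deterministic output."""
          with ThreadPoolExecutor(max_workers=max_workers) as pool:
              futures = [pool.submit(render_file, record) for record in records]
              # Results arrive in completion order, which varies between runs.
              rendered = [future.result() for future in as_completed(futures)]
          # Sorting by filename restores a stable order regardless of scheduling.
          rendered.sort(key=lambda entry: entry["filename"])
          return json.dumps(rendered)
      ```
      
      The final sort is what keeps the output identical between runs even
      though the workers may finish in any order.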
      
      Patch by Sajjad Mirza (@sajjadm).
      
      Reviewers: Dor1s, vsk
      
      Reviewed By: Dor1s, vsk
      
      Subscribers: liaoyuke, mgrang, jdoerfert, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D59277
      
      llvm-svn: 356178
  20. Mar 04, 2019
    • [MCA] Highlight kernel bottlenecks in the summary view. · be3281a2
      Andrea Di Biagio authored
      This patch adds a new flag named -bottleneck-analysis to print out information
      about throughput bottlenecks.
      
      MCA knows how to identify and classify dynamic dispatch stalls. However, it
      doesn't know how to analyze and highlight kernel bottlenecks.  The goal of this
      patch is to teach MCA how to correlate increases in backend pressure to backend
      stalls (and therefore, the loss of throughput).
      
      From a Scheduler point of view, backend pressure is a function of the scheduler
      buffer usage (i.e. how the number of uOps in the scheduler buffers changes over
      time). Backend pressure increases (or decreases) when there is a mismatch
      between the number of opcodes dispatched, and the number of opcodes issued in
      the same cycle. Since buffer resources are limited, continuous increases in
      backend pressure eventually lead to dispatch stalls. So, there is a
      strong correlation between dispatch stalls and how backend pressure changes over
      time.
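      
      The relationship between buffer usage and dispatch stalls can be
      illustrated with a toy Python model. It is only a sketch under simplifying
      assumptions (one scheduler buffer, per-cycle dispatch and issue counts
      given as inputs) and does not reflect how llvm-mca is implemented;
      `simulate_backend_pressure` is a hypothetical name.
      
      ```python
      def simulate_backend_pressure(dispatched, issued, buffer_size):
          """Track scheduler-buffer occupancy across cycles.

          Returns (pressure_increase_cycles, stall_cycles) as lists of cycle
          indices.
          """
          occupancy = 0
          increases, stalls = [], []
          for cycle, (want, out) in enumerate(zip(dispatched, issued)):
              free = buffer_size - occupancy
              accepted = min(want, free)
              if accepted < want:
                  stalls.append(cycle)  # dispatch stall: buffer is full
              in_buffer = occupancy + accepted
              new_occupancy = in_buffer - min(out, in_buffer)
              if new_occupancy > occupancy:
                  increases.append(cycle)  # more dispatched than issued
              occupancy = new_occupancy
          return increases, stalls
      ```
      
      For example, dispatching two uOps per cycle while issuing only one, with a
      three-entry buffer, raises pressure in the first two cycles and stalls
      dispatch from the third cycle on.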
      
      This patch teaches MCA how to identify situations where backend pressure increases
      due to:
       - unavailable pipeline resources.
       - data dependencies.
      
      Data dependencies may delay execution of instructions and therefore increase the
      time that uOps have to spend in the scheduler buffers. That often translates to
      an increase in backend pressure which may eventually lead to a bottleneck.
      Contention on pipeline resources may also delay execution of instructions, and
      lead to a temporary increase in backend pressure.
      
      Internally, the Scheduler classifies instructions based on whether register /
      memory operands are available or not.
      
      An instruction is marked as "ready to execute" only if data dependencies are
      fully resolved.
      Every cycle, the Scheduler attempts to execute all instructions that are ready
      to execute. If an instruction cannot execute because of unavailable pipeline
      resources, then the Scheduler internally updates a BusyResourceUnits mask with
      the ID of each unavailable resource.
      
      ExecuteStage is responsible for tracking changes in backend pressure. If backend
      pressure increases during a cycle because of contention on pipeline resources,
      then ExecuteStage sends a "backend pressure" event to the listeners.
      That event would contain information about instructions delayed by resource
      pressure, as well as the BusyResourceUnits mask.
      
      Note that ExecuteStage also knows how to identify situations where backpressure
      increased because of delays introduced by data dependencies.
      
      The SummaryView observes "backend pressure" events and prints out a "bottleneck
      report".
      
      Example of bottleneck report:
      
      ```
      Cycles with backend pressure increase [ 99.89% ]
      Throughput Bottlenecks:
        Resource Pressure       [ 0.00% ]
        Data Dependencies:      [ 99.89% ]
         - Register Dependencies [ 0.00% ]
         - Memory Dependencies   [ 99.89% ]
      ```
      
      A bottleneck report is printed out only if increases in backend pressure
      eventually caused backend stalls.
      
      About the time complexity:
      
      Time complexity is linear in the number of instructions in the
      Scheduler::PendingSet.
      
      The average slowdown tends to be in the range of ~5-6%.
      For memory intensive kernels, the slowdown can be significant if flag
      -noalias=false is specified. In the worst case scenario I have observed a
      slowdown of ~30% when flag -noalias=false was specified.
      
      We can definitely recover part of that slowdown if we optimize class LSUnit (by
      doing extra bookkeeping to speed up queries). For now, this new analysis is
      disabled by default, and it can be enabled via flag -bottleneck-analysis. Users
      of MCA as a library can enable the generation of pressure events through the
      constructor of ExecuteStage.
      
      This patch partially addresses https://bugs.llvm.org/show_bug.cgi?id=37494
      
      Differential Revision: https://reviews.llvm.org/D58728
      
      llvm-svn: 355308
  21. Feb 28, 2019
    • [PGO] Context sensitive PGO (part 2) · a6ff69f6
      Rong Xu authored
      Part 2 of CSPGO changes (mostly related to ProfileSummary).
      Note that I use a default parameter in setProfileSummary() and getSummary().
      This is to break the dependency in clang. I will make the parameter explicit
      after changing clang in a separate patch.
      
      Differential Revision: https://reviews.llvm.org/D54175
      
      llvm-svn: 355131
  22. Feb 25, 2019
    • [llvm-exegesis] Split Epsilon param into two (PR40787) · 542e5d7b
      Roman Lebedev authored
      Summary:
      This eps param is used for two distinct things:
      * initial point clusterization
      * checking clusters against the llvm values
      
      What if one wants to only look at highly different clusters, without changing
      the clustering itself? In particular, this helps to weed out noisy measurements
      (since the clusterization epsilon is still small, there is a better chance
      that noisy measurements from the same opcode will go into different clusters).
      
      Splitting it into two params makes this possible.
      
      This is nearly-free performance-wise:
      Old:
      ```
      $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
      no exegesis target for x86_64-unknown-linux-gnu, using default
      Parsed 10099 benchmark points
      Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
      ...
       Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs):
      
                  390.01 msec task-clock                #    0.998 CPUs utilized            ( +-  0.25% )
                      12      context-switches          #   31.735 M/sec                    ( +- 27.38% )
                       0      cpu-migrations            #    0.000 K/sec
                    4745      page-faults               # 12183.732 M/sec                   ( +-  0.54% )
              1562711900      cycles                    # 4012303.327 GHz                   ( +-  0.24% )  (82.90%)
               185567822      stalled-cycles-frontend   #   11.87% frontend cycles idle     ( +-  0.52% )  (83.30%)
               392106234      stalled-cycles-backend    #   25.09% backend cycles idle      ( +-  1.31% )  (33.79%)
              1839236666      instructions              #    1.18  insn per cycle
                                                        #    0.21  stalled cycles per insn  ( +-  0.15% )  (50.37%)
               407035764      branches                  # 1045074878.710 M/sec              ( +-  0.12% )  (66.80%)
                10896459      branch-misses             #    2.68% of all branches          ( +-  0.17% )  (83.20%)
      
                0.390629 +- 0.000972 seconds time elapsed  ( +-  0.25% )
      ```
      ```
      $ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
      no exegesis target for x86_64-unknown-linux-gnu, using default
      Parsed 50572 benchmark points
      Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
      ...
       Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (9 runs):
      
                 6803.36 msec task-clock                #    0.999 CPUs utilized            ( +-  0.96% )
                     262      context-switches          #   38.546 M/sec                    ( +- 23.06% )
                       0      cpu-migrations            #    0.065 M/sec                    ( +- 76.03% )
                   13287      page-faults               # 1953.206 M/sec                    ( +-  0.32% )
             27252537904      cycles                    # 4006024.257 GHz                   ( +-  0.95% )  (83.31%)
              1496314935      stalled-cycles-frontend   #    5.49% frontend cycles idle     ( +-  0.97% )  (83.32%)
             16128404524      stalled-cycles-backend    #   59.18% backend cycles idle      ( +-  0.30% )  (33.37%)
             17611143370      instructions              #    0.65  insn per cycle
                                                        #    0.92  stalled cycles per insn  ( +-  0.05% )  (50.04%)
              3894906599      branches                  # 572537147.437 M/sec               ( +-  0.03% )  (66.69%)
               116314514      branch-misses             #    2.99% of all branches          ( +-  0.20% )  (83.35%)
      
                  6.8118 +- 0.0689 seconds time elapsed  ( +-  1.01%)
      ```
      New:
      ```
      $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html
      no exegesis target for x86_64-unknown-linux-gnu, using default
      Parsed 10099 benchmark points
      Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
      ...
       Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (25 runs):
      
                  400.14 msec task-clock                #    0.998 CPUs utilized            ( +-  0.66% )
                      12      context-switches          #   29.429 M/sec                    ( +- 25.95% )
                       0      cpu-migrations            #    0.100 M/sec                    ( +-100.00% )
                    4714      page-faults               # 11796.496 M/sec                   ( +-  0.55% )
              1603131306      cycles                    # 4011840.105 GHz                   ( +-  0.66% )  (82.85%)
               199538509      stalled-cycles-frontend   #   12.45% frontend cycles idle     ( +-  2.40% )  (83.10%)
               402249109      stalled-cycles-backend    #   25.09% backend cycles idle      ( +-  1.19% )  (34.05%)
              1847783963      instructions              #    1.15  insn per cycle
                                                        #    0.22  stalled cycles per insn  ( +-  0.18% )  (50.64%)
               407162722      branches                  # 1018925730.631 M/sec              ( +-  0.12% )  (67.02%)
                10932779      branch-misses             #    2.69% of all branches          ( +-  0.51% )  (83.28%)
      
                 0.40077 +- 0.00267 seconds time elapsed  ( +-  0.67% )
      
      lebedevri@pini-pini:/build/llvm-build-Clang-release$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html
      no exegesis target for x86_64-unknown-linux-gnu, using default
      Parsed 50572 benchmark points
      Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
      ...
       Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (9 runs):
      
                 6947.79 msec task-clock                #    1.000 CPUs utilized            ( +-  0.90% )
                     217      context-switches          #   31.236 M/sec                    ( +- 36.16% )
                       1      cpu-migrations            #    0.096 M/sec                    ( +- 50.00% )
                   13258      page-faults               # 1908.389 M/sec                    ( +-  0.34% )
             27830796523      cycles                    # 4006032.286 GHz                   ( +-  0.89% )  (83.30%)
              1504554006      stalled-cycles-frontend   #    5.41% frontend cycles idle     ( +-  2.10% )  (83.32%)
             16716574843      stalled-cycles-backend    #   60.07% backend cycles idle      ( +-  0.65% )  (33.38%)
             17755545931      instructions              #    0.64  insn per cycle
                                                        #    0.94  stalled cycles per insn  ( +-  0.09% )  (50.04%)
              3897255686      branches                  # 560980426.597 M/sec               ( +-  0.06% )  (66.70%)
               117045395      branch-misses             #    3.00% of all branches          ( +-  0.47% )  (83.34%)
      
                  6.9507 +- 0.0627 seconds time elapsed  ( +-  0.90% )
      ```
      
      I.e. it's a +2.6% slowdown for one whole sweep, or +2% for 5 whole sweeps.
      Within noise, I'd say.
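      
      The quoted percentages follow from the "seconds time elapsed" figures in
      the perf output above. A short Python check of the arithmetic, where
      `slowdown_percent` is a helper name introduced for this example:
      
      ```python
      def slowdown_percent(old_seconds, new_seconds):
          """Relative slowdown of `new_seconds` versus `old_seconds`, in percent."""
          return (new_seconds / old_seconds - 1.0) * 100.0


      # Elapsed times copied from the perf stat reports (old vs. new).
      one_sweep = slowdown_percent(0.390629, 0.40077)
      five_sweeps = slowdown_percent(6.8118, 6.9507)
      print(f"{one_sweep:.1f}% {five_sweeps:.1f}%")
      # prints 2.6% 2.0%
      ```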
      
      Should help with [[ https://bugs.llvm.org/show_bug.cgi?id=40787 | PR40787 ]].
      
      Reviewers: courbet, gchatelet
      
      Reviewed By: courbet
      
      Subscribers: tschuett, RKSimon, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D58476
      
      llvm-svn: 354767
  23. Feb 20, 2019
    • [llvm-exegesis] Opcode stabilization / reclusterization (PR40715) · 69716394
      Roman Lebedev authored
      Summary:
      Given an instruction `Opcode`, we can make benchmarks (measurements) of the
      instruction characteristics/performance. Then, to facilitate further analysis
      we group the benchmarks with *similar* characteristics into clusters.
      Now, this is all not entirely deterministic. Some instructions have variable
      characteristics, depending on their arguments. And thus, if we do several
      benchmarks of the same instruction `Opcode`, we may end up with *different*
      performance characteristics measurements. And when we then do clustering,
      these several benchmarks of the same instruction `Opcode` may end up being
      clustered into *different* clusters. This is not great for further analysis.
      
      We shall find every `Opcode` with benchmarks not in just one cluster, and move
      *all* the benchmarks of said `Opcode` into one new unstable cluster per `Opcode`.
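      
      The reclusterization step can be sketched in Python as below. This is an
      illustration of the idea only, assuming each benchmark already carries a
      cluster id; `stabilize` and the tuple layout are hypothetical and do not
      correspond to the llvm-exegesis classes.
      
      ```python
      from collections import defaultdict


      def stabilize(assignments):
          """Recluster so that every opcode lives in exactly one cluster.

          `assignments` is a list of (opcode, cluster_id) pairs, one per
          benchmark. Returns a list of (opcode, cluster_id, is_unstable) triples.
          """
          clusters_by_opcode = defaultdict(set)
          for opcode, cluster_id in assignments:
              clusters_by_opcode[opcode].add(cluster_id)

          # Fresh ids for the new per-opcode unstable clusters.
          next_id = max((cid for _, cid in assignments), default=-1) + 1
          unstable_id = {}
          for opcode in sorted(clusters_by_opcode):
              if len(clusters_by_opcode[opcode]) > 1:
                  unstable_id[opcode] = next_id
                  next_id += 1

          return [
              (opcode, unstable_id[opcode], True) if opcode in unstable_id
              else (opcode, cluster_id, False)
              for opcode, cluster_id in assignments
          ]
      ```
      
      Iterating over the opcodes in sorted order keeps the new cluster ids the
      same from run to run.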
      
      I have solved this by making `ClusterId` a bit field, adding an `IsUnstable` bit,
      and introducing the `-analysis-display-unstable-clusters` switch to toggle between
      displaying stable-only clusters and unstable-only clusters.
      
      The reclusterization is deterministically stable and produces identical reports
      between runs. (Or at least that is what I'm seeing; maybe it isn't.)
      
      Timings/comparisons:
      old (current trunk/head) {F8303582}
      ```
      $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
      no exegesis target for x86_64-unknown-linux-gnu, using default
      Parsed 43970 benchmark points
      Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
      ...
      no exegesis target for x86_64-unknown-linux-gnu, using default
      Parsed 43970 benchmark points
      Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
      
       Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs):
      
                 6624.73 msec task-clock                #    0.999 CPUs utilized            ( +-  0.53% )
                     172      context-switches          #   25.965 M/sec                    ( +- 29.89% )
                       0      cpu-migrations            #    0.042 M/sec                    ( +- 56.54% )
                   31073      page-faults               # 4690.754 M/sec                    ( +-  0.08% )
             26538711696      cycles                    # 4006230.292 GHz                   ( +-  0.53% )  (83.31%)
              2017496807      stalled-cycles-frontend   #    7.60% frontend cycles idle     ( +-  0.93% )  (83.32%)
             13403650062      stalled-cycles-backend    #   50.51% backend cycles idle      ( +-  0.33% )  (33.37%)
             19770706799      instructions              #    0.74  insn per cycle
                                                        #    0.68  stalled cycles per insn  ( +-  0.04% )  (50.04%)
              4419821812      branches                  # 667207369.714 M/sec               ( +-  0.03% )  (66.69%)
               121741669      branch-misses             #    2.75% of all branches          ( +-  0.28% )  (83.34%)
      
                  6.6283 +- 0.0358 seconds time elapsed  ( +-  0.54% )
      ```
      
      patch, with reclustering but without filtering (i.e. outputting all the stable *and* unstable clusters) {F8303586}
      ```
      $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html
      no exegesis target for x86_64-unknown-linux-gnu, using default
      Parsed 43970 benchmark points
      Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html'
      ...
      no exegesis target for x86_64-unknown-linux-gnu, using default
      Parsed 43970 benchmark points
      Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html'
      
       Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html' (25 runs):
      
                 6475.29 msec task-clock                #    0.999 CPUs utilized            ( +-  0.31% )
                     213      context-switches          #   32.952 M/sec                    ( +- 23.81% )
                       1      cpu-migrations            #    0.130 M/sec                    ( +- 43.84% )
                   31287      page-faults               # 4832.057 M/sec                    ( +-  0.08% )
             25939086577      cycles                    # 4006160.279 GHz                   ( +-  0.31% )  (83.31%)
              1958812858      stalled-cycles-frontend   #    7.55% frontend cycles idle     ( +-  0.68% )  (83.32%)
             13218961512      stalled-cycles-backend    #   50.96% backend cycles idle      ( +-  0.29% )  (33.37%)
             19752995402      instructions              #    0.76  insn per cycle
                                                        #    0.67  stalled cycles per insn  ( +-  0.04% )  (50.04%)
              4417079244      branches                  # 682195472.305 M/sec               ( +-  0.03% )  (66.70%)
               121510065      branch-misses             #    2.75% of all branches          ( +-  0.19% )  (83.34%)
      
                  6.4832 +- 0.0229 seconds time elapsed  ( +-  0.35% )
      ```
      Funnily, *this* measurement shows that said reclustering actually improved performance.
      
      patch, with reclustering, only the stable clusters {F8303594}
      ```
      $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html
      no exegesis target for x86_64-unknown-linux-gnu, using default
      Parsed 43970 benchmark points
      Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html'
      ...
      no exegesis target for x86_64-unknown-linux-gnu, using default
      Parsed 43970 benchmark points
      Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html'
      
       Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html' (25 runs):
      
                 6387.71 msec task-clock                #    0.999 CPUs utilized            ( +-  0.13% )
                     133      context-switches          #   20.792 M/sec                    ( +- 23.39% )
                       0      cpu-migrations            #    0.063 M/sec                    ( +- 61.24% )
                   31318      page-faults               # 4903.256 M/sec                    ( +-  0.08% )
             25591984967      cycles                    # 4006786.266 GHz                   ( +-  0.13% )  (83.31%)
              1881234904      stalled-cycles-frontend   #    7.35% frontend cycles idle     ( +-  0.25% )  (83.33%)
             13209749965      stalled-cycles-backend    #   51.62% backend cycles idle      ( +-  0.16% )  (33.36%)
             19767554347      instructions              #    0.77  insn per cycle
                                                        #    0.67  stalled cycles per insn  ( +-  0.04% )  (50.03%)
              4417480305      branches                  # 691618858.046 M/sec               ( +-  0.03% )  (66.68%)
               118676358      branch-misses             #    2.69% of all branches          ( +-  0.07% )  (83.33%)
      
                  6.3954 +- 0.0118 seconds time elapsed  ( +-  0.18% )
      ```
      Performance improved even further?! Makes sense I guess, fewer clusters to print.
      
      patch, with reclustering, only the unstable clusters {F8303601}
      ```
      $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters
      no exegesis target for x86_64-unknown-linux-gnu, using default
      Parsed 43970 benchmark points
      Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html'
      ...
      no exegesis target for x86_64-unknown-linux-gnu, using default
      Parsed 43970 benchmark points
      Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html'
      
       Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters' (25 runs):
      
                 6124.96 msec task-clock                #    1.000 CPUs utilized            ( +-  0.20% )
                     194      context-switches          #   31.709 M/sec                    ( +- 20.46% )
                       0      cpu-migrations            #    0.039 M/sec                    ( +- 49.77% )
                   31413      page-faults               # 5129.261 M/sec                    ( +-  0.06% )
             24536794267      cycles                    # 4006425.858 GHz                   ( +-  0.19% )  (83.31%)
              1676085087      stalled-cycles-frontend   #    6.83% frontend cycles idle     ( +-  0.46% )  (83.32%)
             13035595603      stalled-cycles-backend    #   53.13% backend cycles idle      ( +-  0.16% )  (33.36%)
             18260877653      instructions              #    0.74  insn per cycle
                                                        #    0.71  stalled cycles per insn  ( +-  0.05% )  (50.03%)
              4112411983      branches                  # 671484364.603 M/sec               ( +-  0.03% )  (66.68%)
               114066929      branch-misses             #    2.77% of all branches          ( +-  0.11% )  (83.32%)
      
                  6.1278 +- 0.0121 seconds time elapsed  ( +-  0.20% )
      ```
      This tells us that writing the `-analysis-inconsistencies-output-file=` output itself takes only ~0.4 sec for 43970 benchmark points (3 whole sweeps).
      (Also, wow, this is fast; it originally took several minutes.)
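
The mean elapsed times from the four `perf stat` summaries above can be compared directly. The sketch below only redoes that subtraction. The labels are shorthand taken from the output file names, and since the message does not say which pair of runs the ~0.4 sec estimate refers to, every run is compared against the unstable-only one.

```python
# Mean wall-clock times (seconds) copied from the perf stat summaries above.
# The keys are shorthand labels derived from the output file names.
elapsed = {
    "old": 6.6283,
    "new-all": 6.4832,
    "new-stable": 6.3954,
    "new-unstable": 6.1278,
}


def delta_vs(baseline: str) -> dict:
    """Return how much longer each run took than the chosen baseline run."""
    base = elapsed[baseline]
    return {
        name: round(seconds - base, 4)
        for name, seconds in elapsed.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, diff in delta_vs("new-unstable").items():
        print(f"{name:>12}: +{diff:.4f} s")
```

The run that prints all clusters comes out roughly 0.36 s slower than the unstable-only run, which is in line with the ~0.4 sec figure.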
      
      Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40715 | PR40715 ]].
      
      Reviewers: courbet, gchatelet
      
      Reviewed By: courbet
      
      Subscribers: tschuett, jdoerfert, llvm-commits, RKSimon
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D58355
      
      llvm-svn: 354441
      69716394
  24. Feb 19, 2019
    • [llvm-cov] Add support for gcov --hash-filenames option · a0b97254
      Vedant Kumar authored
      The patch adds support for --hash-filenames to llvm-cov. This option adds the
      md5 hash of the source path to the name of the generated .gcov file. The option
      is crucial for cases where you have multiple files with the same name but can't
      use --preserve-paths because the resulting filenames exceed the filesystem limit.
      
      From gcov(1):
      
      ```
      -x
      --hash-filenames
          By default, gcov uses the full pathname of the source files to
          create an output filename.  This can lead to long filenames that
          can overflow filesystem limits.  This option creates names of the
          form source-file##md5.gcov, where the source-file component is
          the final filename part and the md5 component is calculated from
          the full mangled name that would have been used otherwise.
      ```
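
The naming scheme quoted above can be sketched in a few lines. This is only an illustration of the `source-file##md5.gcov` shape: the function name, the `#`-separated mangled names in the usage example, and the choice to hash the UTF-8 bytes of that string are assumptions, not a description of what gcov or llvm-cov hash internally.

```python
import hashlib
import posixpath


def hashed_gcov_name(source_path: str, mangled_name: str) -> str:
    """Build a name of the form source-file##md5.gcov.

    source_path  -- path of the source file; only its final component is kept.
    mangled_name -- the long name that would otherwise have been used
                    (hypothetical input; the real mangling is not modelled).
    """
    digest = hashlib.md5(mangled_name.encode("utf-8")).hexdigest()
    return f"{posixpath.basename(source_path)}##{digest}.gcov"


if __name__ == "__main__":
    # Two files with the same base name end up with distinct, short names.
    print(hashed_gcov_name("lib/a/util.c", "lib#a#util.c"))
    print(hashed_gcov_name("lib/b/util.c", "lib#b#util.c"))
```

Because only the final path component and a fixed-length digest appear in the result, the output name stays short however deep the source tree is.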
      
      Patch by Igor Ignatev!
      
      Differential Revision: https://reviews.llvm.org/D58370
      
      llvm-svn: 354379
      a0b97254
  25. Feb 18, 2019
  26. Feb 04, 2019
    • [llvm-exegesis] Don't default to running&dumping all analyses to '-' · 21193f4b
      Roman Lebedev authored
      Summary:
      Until I looked at the source, I didn't even understand that I could
      disable the 'cluster' output. I have always silenced it via ` &> /dev/null`.
      (And hoped it wasn't contributing much to the run time.)
      
      While I expect that it has its use-cases, I have never once needed it so far.
      If I forget to silence it, the console is completely flooded with that output.
      
      How about not expecting users to opt out of analyses,
      but to explicitly specify the analyses that should be performed?
      
      Reviewers: courbet, gchatelet
      
      Reviewed By: courbet
      
      Subscribers: tschuett, RKSimon, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D57648
      
      llvm-svn: 353021
      21193f4b
  27. Jan 30, 2019
  28. Jan 29, 2019
  29. Jan 25, 2019
    • [llvm-symbolizer] Add switch to adjust addresses by fixed offset · 759d5e67
      James Henderson authored
      If a stack trace or similar has a list of addresses from an executable
      or DSO loaded at a variable address (e.g. due to ASLR), the addresses
      will not directly correspond to the addresses stored in the object file.
      If a user wishes to use llvm-symbolizer, they have to subtract the load
      address from every address. This is somewhat inconvenient, especially as
      the output of --print-address will result in the adjusted address being
      listed, rather than the address coming from the stack trace, making it
      harder to map results between the two.
      
      This change adds a new llvm-symbolizer switch, --adjust-vma, which
      takes an offset that is then used to do this calculation
      automatically. The printed address remains the input address (allowing for
      easy mapping), whilst the specified offset is applied to the addresses
      when performing the lookup.
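
The split between the printed address and the lookup address can be sketched as follows. The symbol table and the `symbolize` function are hypothetical, and the sketch assumes the offset is the load address that a user would otherwise subtract by hand; the commit message does not spell out the sign convention of the real switch.

```python
import bisect

# Hypothetical symbol table: (start address in the object file, name), sorted.
SYMBOLS = [(0x1000, "main"), (0x1040, "helper"), (0x10A0, "cleanup")]


def symbolize(input_addr: int, adjust_vma: int = 0) -> str:
    """Look up input_addr after subtracting adjust_vma, but report input_addr."""
    lookup_addr = input_addr - adjust_vma
    starts = [start for start, _ in SYMBOLS]
    # Find the last symbol whose start address is <= lookup_addr.
    idx = bisect.bisect_right(starts, lookup_addr) - 1
    name = SYMBOLS[idx][1] if idx >= 0 else "??"
    return f"{input_addr:#x} {name}"


if __name__ == "__main__":
    # An address taken from a stack trace of a module loaded at 0x7f0000000000.
    print(symbolize(0x7F0000001044, adjust_vma=0x7F0000000000))
```

The address in the result is the one from the stack trace, so the output can be matched against the trace line by line.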
      
      The switch is conceptually similar to llvm-objdump's new switch of the
      same name (see D57051), which in turn mirrors a GNU switch. There is no
      equivalent switch in addr2line.
      
      Reviewed by: grimar
      
      Differential Revision: https://reviews.llvm.org/D57151
      
      llvm-svn: 352195
      759d5e67
  30. Jan 24, 2019
  31. Jan 23, 2019
    • [llvm-symbolizer] Improve compatibility of --functions with GNU addr2line · 25ce596c
      James Henderson authored
      This fixes https://bugs.llvm.org/show_bug.cgi?id=40072.
      
      GNU addr2line's --functions switch is off by default, has a short alias
      of -f, and does not take an argument. This patch changes llvm-symbolizer
      to allow the second and third points (changing the default behaviour may
      have negative impacts on users). If the option is missing a value, it
      now treats it as "linkage".
      
      This change does cause one previously valid command line to behave
      differently. Before, --functions <value> was accepted, but now only
      --functions=<value> is allowed (as well as --functions). The old
      form will result in the value being treated as a positional
      argument.
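
A minimal sketch of the parsing rule described above, written as a hand-rolled parser rather than the option library the tool actually uses. The function name is hypothetical, `-f` is assumed to behave like a bare `--functions`, and `None` stands in for "option absent, existing default applies".

```python
def parse_functions_option(argv):
    """Return (functions_value, positional_args).

    functions_value is None when the option is absent, so the tool's existing
    default (not modelled here) would apply.
    """
    value = None
    positionals = []
    for arg in argv:
        if arg in ("--functions", "-f"):
            # Bare option: a missing value is treated as "linkage".
            value = "linkage"
        elif arg.startswith("--functions="):
            # Only the =<value> form carries a value.
            value = arg.split("=", 1)[1]
        else:
            # Everything else is positional, including a detached "short".
            positionals.append(arg)
    return value, positionals


if __name__ == "__main__":
    print(parse_functions_option(["--functions=short", "0x1000"]))
    print(parse_functions_option(["--functions", "short", "0x1000"]))
```

The second call shows the behaviour change: the detached `short` no longer sets the option and is handed on as a positional argument.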
      
      The previous testing for --functions=short has been pulled out into a
      new test that also tests the other accepted values and option formats.
      
      Reviewed by: ruiu
      
      Differential Revision: https://reviews.llvm.org/D57049
      
      llvm-svn: 351968
      25ce596c
  32. Jan 22, 2019
    • [FileCheck] Suppress old -v/-vv diags if dumping input · 352695c3
      Joel E. Denny authored
      The old diagnostic form of the trace produced by -v and -vv looks
      like:
      
      ```
      check1:1:8: remark: CHECK: expected string found in input
      CHECK: abc
             ^
      <stdin>:1:3: note: found here
      ; abc def
        ^~~
      ```
      
      When dumping annotated input is requested (via -dump-input), I find
      that this old trace is not useful and is sometimes harmful:
      
      1. The old trace is mostly redundant because the same basic
         information also appears in the input dump's annotations.
      
      2. The old trace buries any error diagnostic between it and the input
         dump, but I find it useful to see any error diagnostic up front.
      
      3. FILECHECK_OPTS=-dump-input=fail requests annotated input dumps only
         for failed FileCheck calls.  However, I have to also add -v or -vv
         to get a full set of annotations, and that can produce massive
         output from all FileCheck calls in all tests.  That's a real
         problem when I run this in the IDE I use, which grinds to a halt as
         it tries to capture all that output.
      
      When -dump-input=fail|always, this patch suppresses the old trace from
      -v or -vv.  Error diagnostics still print as usual.  If you want the
      old trace, perhaps to see variable expansions, you can set
      -dump-input=none (the default).
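
The resulting rule is small enough to state as a predicate. This is a sketch of the behaviour described in this message, not FileCheck's code: the function name is hypothetical and only the three -dump-input values named above are modelled.

```python
def prints_old_trace(verbose: bool, dump_input: str = "none") -> bool:
    """Return True if the old -v/-vv remark/note trace would still be printed."""
    if dump_input not in ("none", "fail", "always"):
        raise ValueError(f"unexpected -dump-input value: {dump_input!r}")
    # The trace needs -v or -vv, and is suppressed whenever an input dump
    # has been requested via -dump-input=fail or -dump-input=always.
    return verbose and dump_input == "none"


if __name__ == "__main__":
    for mode in ("none", "fail", "always"):
        print(mode, prints_old_trace(True, mode))
```

Error diagnostics are outside this predicate; per the message they print as usual in every mode.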
      
      Reviewed By: probinson
      
      Differential Revision: https://reviews.llvm.org/D55825
      
      llvm-svn: 351881
      352695c3
    • [llvm-symbolizer] Add support for --basenames/-s · 33c16a3f
      James Henderson authored
      This fixes https://bugs.llvm.org/show_bug.cgi?id=40068.
      
      --basenames is a GNU addr2line switch which strips the directory names
      from the file path in the output.
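
As a small illustration of the effect, the sketch below formats a source location with and without directory names. The function and the example path are hypothetical, and the `path:line:column` layout is used only for illustration.

```python
import posixpath


def format_location(path: str, line: int, column: int, basenames: bool = False) -> str:
    """Format a source location, optionally stripping directory names."""
    shown = posixpath.basename(path) if basenames else path
    return f"{shown}:{line}:{column}"


if __name__ == "__main__":
    print(format_location("/tmp/src/foo.c", 12, 3))
    print(format_location("/tmp/src/foo.c", 12, 3, basenames=True))
```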
      
      Reviewed by: ruiu
      
      Differential Revision: https://reviews.llvm.org/D56919
      
      llvm-svn: 351795
      33c16a3f
  33. Jan 21, 2019
  34. Jan 17, 2019
    • [llvm-readobj][ELF]Add demangling support · e50d9cb3
      James Henderson authored
      This change adds demangling support to the ELF side of llvm-readobj,
      under the switch --demangle/-C.
      
      The following places are demangled: symbol table dumps (static and
      dynamic), relocation dumps (static and dynamic), addrsig dumps, call
      graph profile dumps, and group section signature symbols.
      
      Although GNU readelf doesn't support demangling, it is still a useful
      feature to have, and brings it on a par with llvm-objdump's
      capabilities.
      
      This fixes https://bugs.llvm.org/show_bug.cgi?id=40054.
      
      Reviewed by: grimar, rupprecht
      
      Differential Revision: https://reviews.llvm.org/D56791
      
      llvm-svn: 351450
      e50d9cb3
  35. Jan 16, 2019
  36. Jan 15, 2019
    • llvm-objdump -m -D should disassemble all text segments · 7e660211
      Michael Trent authored
      Summary:
      When running llvm-objdump with the -macho option, objdump will by default
      disassemble only the __TEXT,__text section (or __TEXT_EXEC,__text when
      disassembling MH_KEXT_BUNDLE files). The -disassemble-all option is
      treated no differently than -disassemble.
      
      This change updates llvm-objdump's MachO parsing code to disassemble all
      __text sections found in a file when -disassemble-all is specified. This
      is useful for disassembling files with more than one __text section, or
      when disassembling files whose __text section is not present in __TEXT.
      
      I added a lit test case that verifies "llvm-objdump -m -d" and 
      "llvm-objdump -m -D" produce the expected results on a reference binary. 
      I also updated the CommandGuide documentation for llvm-objdump.rst and
      verified it renders correctly as man and html.
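
The section selection described above can be sketched as follows. This is an illustration of the rule only, not llvm-objdump's MachO code: the function name and the `(segment, section)` tuple representation are assumptions.

```python
def sections_to_disassemble(sections, disassemble_all=False, is_kext_bundle=False):
    """Pick the (segment, section) pairs to disassemble.

    sections        -- iterable of (segment_name, section_name) pairs.
    disassemble_all -- models -disassemble-all (-D) instead of -disassemble (-d).
    is_kext_bundle  -- True for MH_KEXT_BUNDLE files.
    """
    if disassemble_all:
        # -D: every __text section, whichever segment it lives in.
        return [(seg, sect) for seg, sect in sections if sect == "__text"]
    # -d: only the __text section of the default segment.
    default_segment = "__TEXT_EXEC" if is_kext_bundle else "__TEXT"
    return [
        (seg, sect)
        for seg, sect in sections
        if seg == default_segment and sect == "__text"
    ]


if __name__ == "__main__":
    example = [("__TEXT", "__text"), ("__TEXT", "__cstring"), ("__TEXT_EXEC", "__text")]
    print(sections_to_disassemble(example))
    print(sections_to_disassemble(example, disassemble_all=True))
```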
      
      rdar://42899338
      
      Reviewers: ab, pete, lhames
      
      Reviewed By: lhames
      
      Subscribers: rupprecht, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D56649
      
      llvm-svn: 351238
      7e660211