Commits · 351314a14f70246f1e873fe5bc04cbd70cbd6d92 · Roger Ferrer / llvm-epi

Feb 06, 2019

[cmake] Add all subprojects to LLVM_ALL_PROJECTS · 351314a1

Shoaib Meenai authored Feb 06, 2019

Make LLVM_ALL_PROJECTS reflect all top-level directories in the monorepo
rather than an arbitrary subset. clang-tools-extra is technically
unnecessary since it gets enabled by clang, but having it there for
consistency shouldn't hurt either.

Differential Revision: https://reviews.llvm.org/D57843

llvm-svn: 353346

351314a1

[PowerPC] Add vector truncate test to prep for D56507 NFC · 42f58498
Roland Froese authored Feb 06, 2019
```
llvm-svn: 353344
```
42f58498
[cmake] Add openmp to LLVM_ALL_PROJECTS · af8eadd9
Shoaib Meenai authored Feb 06, 2019
```
It'll get ignored in LLVM_ENABLE_PROJECTS after r353148 otherwise.

llvm-svn: 353343
```
af8eadd9
[libObject][NFC] Include filename in error message · d3a7e9d1
Jordan Rupprecht authored Feb 06, 2019
```
llvm-svn: 353341
```
d3a7e9d1

[LICM/MSSA] Add promotion to scalars by building an AliasSetTracker with MemorySSA. · 6cba96ed

Alina Sbirlea authored Feb 06, 2019

Summary:
Experimentally we found that promotion to scalars carries fewer benefits
than sinking and hoisting in LICM. When using MemorySSA, we build an
AliasSetTracker on demand in order to reuse the current infrastructure.
We only build it if fewer than AccessCapForMSSAPromotion accesses exist in the
loop, a cap that is by default set to 250. This value ensures there are
no runtime regressions, and there are small compile time gains for
pathological cases. A much lower value (20) was found to yield a single
regression in the llvm-test-suite and much higher benefits for compile
times. Conservatively we set the current cap to a high value, but we will
explore lowering it when MemorySSA is enabled by default.

Reviewers: sanjoy, chandlerc

Subscribers: nemanjai, jlebar, Prazek, george.burgess.iv, jfb, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D56625

llvm-svn: 353339

6cba96ed

[DAG] Immediately cleanup unused nodes from extend-based combines. · b3506bf9
Nirav Dave authored Feb 06, 2019
```
llvm-svn: 353338
```
b3506bf9

Move IR flag handling directly into builder calls for cases translated from... · f0d81a31

Michael Berg authored Feb 06, 2019

Move IR flag handling directly into builder calls for cases translated from Instructions in GlobalISel

Reviewers: aditya_nandakumar, volkan

Reviewed By: aditya_nandakumar

Subscribers: rovka, kristof.beyls, volkan, Petar.Avramovic

Differential Revision: https://reviews.llvm.org/D57630

llvm-svn: 353336

f0d81a31

[AliasSetTracker] Pass MustAlias to addPointer more often. · 910c6bef

Alina Sbirlea authored Feb 06, 2019

Summary:
Pass the alias info to addPointer when available. Will save an alias()
call for must sets when adding a known Must or May alias.
[Part of a series of cleanup patches]

Reviewers: reames, mkazantsev

Subscribers: sanjoy, jlebar, llvm-commits

Differential Revision: https://reviews.llvm.org/D56613

llvm-svn: 353335

910c6bef

[X86] Change the CPU on the test case for pr40529.ll to really show the bug. NFC · 1c7ee208
Craig Topper authored Feb 06, 2019
```
llvm-svn: 353334
```
1c7ee208

[X86][DAG] Avoid creating dangling bitcast. · c6bfa103

Nirav Dave authored Feb 06, 2019

combineExtractWithShuffle may leave a dangling bitcast which may
prevent further optimization in later passes. Avoid constructing it
unless it is used.

llvm-svn: 353333

c6bfa103

[x86] add tests for horizontal ops (PR38971, PR33758); NFC · 29a710be
Sanjay Patel authored Feb 06, 2019
```
llvm-svn: 353332
```
29a710be

[SystemZ] Improved handling of the @llvm.ctlz intrinsic. · b21dde05

Jonas Paulsson authored Feb 06, 2019

Since SystemZ supports counting of leading zeros with the FLOGR instruction,
isCheapToSpeculateCtlz() should return true, which it now does.

ISD::CTLZ_ZERO_UNDEF i32 is now handled the same way as ISD::CTLZ is, which
is needed since promotion to i64 is required and CTLZ_ZERO_UNDEF is only
expanded to CTLZ if it is Legal or Custom.

Review: Ulrich Weigand
https://reviews.llvm.org/D57710

llvm-svn: 353330

b21dde05
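
The promotion to i64 mentioned in the SystemZ ctlz commit above can be illustrated with a little arithmetic. This is a minimal sketch, not the backend's lowering: it assumes the i32 value is zero-extended and counted with a 64-bit leading-zero count, and the helper names `ctlz` and `ctlz32_via_64` are hypothetical.

```python
def ctlz(value: int, bits: int) -> int:
    """Count leading zeros of an unsigned `bits`-wide value (returns `bits` for 0)."""
    if not 0 <= value < (1 << bits):
        raise ValueError("value does not fit in the given width")
    return bits - value.bit_length()


def ctlz32_via_64(value: int) -> int:
    """32-bit ctlz computed on the zero-extended 64-bit value.

    Zero-extension adds exactly 32 leading zero bits, so they are
    subtracted again. (Assumption of this sketch, not taken from the patch.)
    """
    return ctlz(value, 64) - 32
```

Because the 64-bit count of zero is 64, the promoted form yields 32 for a zero input, which matches the defined result of a 32-bit ctlz.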

build: Remove the cmake check for malloc.h. · 02fc3c69

Peter Collingbourne authored Feb 06, 2019

As far as I can tell, malloc.h is only being used here to provide
a definition of mallinfo (malloc itself is declared in stdlib.h via
cstdlib). We already have a macro for whether mallinfo is available,
so switch to using that instead.

Differential Revision: https://reviews.llvm.org/D57807

llvm-svn: 353329

02fc3c69

[SystemZ] Wait with VGBM selection until after DAGCombine2. · 8cda83a5

Jonas Paulsson authored Feb 06, 2019

Don't lower BUILD_VECTORs to BYTE_MASK, but instead expose the BUILD_VECTORs
to the DAGCombiner and select them to VGBM in Select(). This allows the
DAGCombiner to understand the constant vector values.

For floating point, only all-zeros vectors are now generated with VGBM, as it
turned out to be somewhat complicated to handle any arbitrary constants,
while in practice this is very rare and hardly needed.

The SystemZ ISD opcodes z_byte_mask, z_vzero and z_vones have been removed.

Review: Ulrich Weigand
https://reviews.llvm.org/D57152

llvm-svn: 353325

8cda83a5
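
As a rough illustration of what makes a constant vector a candidate for VGBM in the commit above: a byte mask can only describe vectors whose bytes are each 0x00 or 0xFF. The sketch below checks that property for a 16-byte constant. The function name `vgbm_mask` is hypothetical, and the bit order (byte 0 mapped to the most significant mask bit) is an assumption of this sketch rather than something stated in the commit.

```python
from typing import Optional


def vgbm_mask(vector_bytes: bytes) -> Optional[int]:
    """Return a 16-bit byte mask if every byte is 0x00 or 0xFF, else None."""
    if len(vector_bytes) != 16:
        raise ValueError("expected a 16-byte vector constant")
    mask = 0
    for byte in vector_bytes:
        if byte == 0xFF:
            bit = 1
        elif byte == 0x00:
            bit = 0
        else:
            return None  # not expressible as a byte mask
        # Assumed ordering: the first byte ends up in the top mask bit.
        mask = (mask << 1) | bit
    return mask
```

An all-zeros vector maps to mask 0, which corresponds to the only floating-point case the commit says is still generated with VGBM.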

[opt-viewer] Add --filter option to select remarks for displaying. · 169f6423

Florian Hahn authored Feb 06, 2019

This allows limiting the displayed remarks to the ones with names
matching the filter (regular) expression.

Generating HTML pages for a larger project with optimization remarks can
result in huge HTML documents, and using --filter allows focusing on a
set of interesting remarks.

Reviewers: hfinkel, anemet, thegameg, serge-sans-paille

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D57827

llvm-svn: 353322

169f6423
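
The idea behind the --filter option above can be sketched as a regular-expression filter over remark names. This is not opt-viewer's code: the `Remark` record and `filter_remarks` function are hypothetical, the sample remark names are only illustrative, and the use of `re.search` (a match anywhere in the name) is an assumption about the matching rule.

```python
import re
from typing import Iterable, List, NamedTuple


class Remark(NamedTuple):
    # Hypothetical, simplified stand-in for an optimization remark record.
    pass_name: str
    name: str


def filter_remarks(remarks: Iterable[Remark], pattern: str) -> List[Remark]:
    """Keep only the remarks whose name matches the regular expression."""
    regex = re.compile(pattern)
    return [remark for remark in remarks if regex.search(remark.name)]


# Illustrative input only.
remarks = [
    Remark("inline", "NotInlined"),
    Remark("inline", "Inlined"),
    Remark("licm", "Hoisted"),
]
```

With this matching rule, anchoring the pattern (for example `^Inlined$`) is what separates `Inlined` from `NotInlined`.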

[SelectionDAG] Cleanup some code comments. NFC · 350352c8

Bjorn Pettersson authored Feb 06, 2019

Don't repeat the function name in some doxygen
comments.

(Just a minor cleanup, while testing to push
from the git monorepo setup.)

llvm-svn: 353317

350352c8

[GlobalISel][NFC] Gardening: Factor out code for simple unary intrinsics · e288c526

Jessica Paquette authored Feb 06, 2019

There was a lot of repeated code wrt unary math intrinsics in
translateKnownIntrinsic. This factors out the repeated MIRBuilder code into
two functions: translateSimpleUnaryIntrinsic and getSimpleUnaryIntrinsicOpcode.

This simplifies adding simple unary intrinsics, since after this, all you have
to do is add the mapping to SimpleUnaryIntrinsicOpcodes.

Differential Revision: https://reviews.llvm.org/D57774

llvm-svn: 353316

e288c526
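
The table-driven shape described in the GlobalISel commit above might look roughly like the following. It is a language-neutral sketch in Python, not the C++ from the patch: the function names mirror the ones in the commit message, but the table entries, the string-based opcodes, and the textual "instruction" output are assumptions made for illustration.

```python
from typing import Optional

# Illustrative mapping from intrinsic name to generic opcode name.
SIMPLE_UNARY_INTRINSIC_OPCODES = {
    "ceil": "G_FCEIL",
    "cos": "G_FCOS",
    "sin": "G_FSIN",
}


def get_simple_unary_intrinsic_opcode(intrinsic: str) -> Optional[str]:
    """Return the opcode for a simple unary intrinsic, or None if not in the table."""
    return SIMPLE_UNARY_INTRINSIC_OPCODES.get(intrinsic)


def translate_simple_unary_intrinsic(intrinsic: str, src: str) -> Optional[str]:
    """Emit one textual instruction for table-listed intrinsics; None otherwise."""
    opcode = get_simple_unary_intrinsic_opcode(intrinsic)
    if opcode is None:
        return None  # caller falls back to the hand-written cases
    return f"{opcode} {src}"
```

Supporting another simple unary intrinsic then amounts to adding one table entry, which is the benefit the commit message describes.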

[yaml2obj] Allow number for ELF symbol type · c836e488

James Henderson authored Feb 06, 2019

yaml2obj previously only recognised standard STT_* names, and didn't
allow arbitrary numbers. This change allows the user to specify a number
for the type instead. It also adds a test to verify the existing
behaviour for obj2yaml for unknown symbol types.

Reviewed by: grimar

Differential Revision: https://reviews.llvm.org/D57822

llvm-svn: 353315

c836e488
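
The behaviour described in the yaml2obj commit above, accepting either a standard STT_* name or a number, can be sketched as follows. This is not yaml2obj's implementation: `parse_symbol_type` is a hypothetical helper, and rejecting values above 0xF is an assumption based on the ELF symbol type occupying the low four bits of `st_info`.

```python
# Standard ELF symbol type values.
STT_NAMES = {
    "STT_NOTYPE": 0,
    "STT_OBJECT": 1,
    "STT_FUNC": 2,
    "STT_SECTION": 3,
    "STT_FILE": 4,
    "STT_COMMON": 5,
    "STT_TLS": 6,
    "STT_GNU_IFUNC": 10,
}


def parse_symbol_type(text: str) -> int:
    """Map a known STT_* name to its value, otherwise parse the text as a number."""
    if text in STT_NAMES:
        return STT_NAMES[text]
    value = int(text, 0)  # accepts decimal and 0x-prefixed hexadecimal
    if not 0 <= value <= 0xF:
        # Assumed range check; the real tool may behave differently.
        raise ValueError(f"symbol type out of range: {text}")
    return value
```

Numeric input makes it possible to describe symbol types that have no standard name, which is useful for testing how tools react to unknown types.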

[InstCombine] X | C == C --> (X & ~C) == 0 · 68bc5fb0

Sanjay Patel authored Feb 06, 2019

We should canonicalize to one of these forms,
and compare-with-zero could be more conducive
to follow-on transforms. This also leads to
generally better codegen as shown in PR40611:
https://bugs.llvm.org/show_bug.cgi?id=40611

llvm-svn: 353313

68bc5fb0
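
The equivalence behind the InstCombine fold above can be checked exhaustively for a small bit width. The sketch below does this for 8-bit values. The function names are made up for the example, and the check says nothing about how InstCombine implements the transform.

```python
BITS = 8
MASK = (1 << BITS) - 1  # keeps ~C within the modelled bit width


def or_form(x: int, c: int) -> bool:
    """Original pattern: (X | C) == C."""
    return (x | c) == c


def and_form(x: int, c: int) -> bool:
    """Canonical pattern: (X & ~C) == 0, with ~C truncated to BITS bits."""
    return (x & (~c & MASK)) == 0


def forms_agree() -> bool:
    """Compare both forms for every pair of BITS-wide values."""
    return all(
        or_form(x, c) == and_form(x, c)
        for x in range(MASK + 1)
        for c in range(MASK + 1)
    )
```

Both forms ask the same question, namely whether X has any bit set outside of C, which is why the compare-with-zero form is a valid canonicalization.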

[InstCombine] add tests for PR40611 and regenerate checks; NFC · 51abb86f
Sanjay Patel authored Feb 06, 2019
```
Lots of unrelated diffs here from the newer version of the script.

llvm-svn: 353312
```
51abb86f

AArch64: enforce even/odd register pairs for CASP instructions. · 474f5d9b

Tim Northover authored Feb 06, 2019

ARMv8.1a CASP instructions need the first of the pair to be an even register
(otherwise the encoding is unallocated). We enforced this during assembly, but
not CodeGen before.

llvm-svn: 353308

474f5d9b
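
The constraint from the CASP commit above can be written as a small predicate over register numbers. This is a sketch with a hypothetical helper name. It models only the even-first and consecutive-pair rule, not register classes or how the zero register is treated.

```python
def is_valid_casp_pair(first: int, second: int) -> bool:
    """True if `first` is an even register number and `second` is the next one."""
    return first >= 0 and first % 2 == 0 and second == first + 1
```

For example, the pair (4, 5) satisfies the rule, while (1, 2) starts on an odd register and (2, 4) is not consecutive.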

[InlineAsm][X86] Add backend support for X86 flag output parameters. · e5c37958
Nirav Dave authored Feb 06, 2019
```
Allow custom handling of inline assembly output parameters and add X86
flag parameter support.

llvm-svn: 353307
```
e5c37958
[SelectionDAGBuilder] Refactor Inline Asm output check. NFCI. · 54511076
Nirav Dave authored Feb 06, 2019
```
llvm-svn: 353305
```
54511076

[SystemZ] Do not return INT_MIN from strcmp/memcmp · 17a00126

Ulrich Weigand authored Feb 06, 2019

The IPM sequence currently generated to compute the strcmp/memcmp
result will return INT_MIN for the "less than zero" case.  While
this is in compliance with the standard, strictly speaking, it
turns out that common applications cannot handle this, e.g. because
they negate a comparison result in order to implement reverse
compares.

This patch changes code to use a different sequence that will result
in -2 for the "less than zero" case (same as GCC).  However, this
requires that the two source operands of the compare instructions
are inverted, which breaks the optimization in removeIPMBasedCompare.
Therefore, I've removed this (and all of optimizeCompareInstr), and
replaced it with a mostly equivalent optimization in combineCCMask
at the DAGcombine level.

llvm-svn: 353304

17a00126
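
The problem with INT_MIN described in the SystemZ strcmp/memcmp commit above can be reproduced with a small model of 32-bit arithmetic. In C, negating INT_MIN is undefined behaviour. This sketch assumes the common two's-complement wrap-around outcome, and both helper names are hypothetical.

```python
INT_MIN = -(2 ** 31)


def wrap_i32(value: int) -> int:
    """Reduce a Python integer to a 32-bit two's-complement value."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def reverse_compare(result: int) -> int:
    """Model an application that negates a compare result to reverse the order."""
    return wrap_i32(-result)
```

Under this model `reverse_compare(INT_MIN)` is still negative, so the reversed comparison reports the wrong order, whereas a result of -2 negates cleanly to 2.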

AArch64: annotate atomics with dropped acquire semantics when printing. · 71025a2f

Tim Northover authored Feb 06, 2019

A quirk of the v8.1a spec is that when the writeback register for an atomic
read-modify-write instruction is wzr/xzr, the instruction no longer enforces
acquire ordering. However, it's still written with the misleading 'a' mnemonic.

So this adds an annotation when disassembling such instructions, mentioning the
change.

llvm-svn: 353303

71025a2f

[x86] vectorize cast ops in lowering to avoid register file transfers · e84fbb67

Sanjay Patel authored Feb 06, 2019

The proposal in D56796 may cross the line because we're trying to avoid vectorization 
transforms in generic DAG combining. So this is an alternate, later, x86-specific 
translation of that patch.

There are several potential follow-ups to enhance this:
1. Allow extraction from non-zero element index.
2. Peek through extends of smaller width integers.
3. Support x86-specific conversion opcodes like X86ISD::CVTSI2P

Differential Revision: https://reviews.llvm.org/D56864

llvm-svn: 353302

e84fbb67

[MCA] Speedup ResourceManager queries. NFCI · 02974728

Andrea Di Biagio authored Feb 06, 2019

When a resource unit R is released, the ResourceManager notifies groups that
contain R. Before this patch, the logic in method ResourceManager::release()
implemented a potentially slow iterative search of dependent groups on the
entire set of processor resources.
This patch replaces that logic with a simpler (and often faster) lookup on array
`Resource2Groups`.  This patch gives an average speedup of ~3-4% (observed on a
release build when testing for target btver2).
No functional change intended.

llvm-svn: 353301

02974728
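
The change described in the MCA commit above, replacing a search over all resources with a lookup in a precomputed array, follows a common pattern that can be sketched like this. The group and unit names are invented for the example, and the dictionary only plays the role that `Resource2Groups` plays in the commit. It is not the data structure used by llvm-mca.

```python
from typing import Dict, FrozenSet, List

# Hypothetical processor model: each group names the resource units it contains.
GROUPS: Dict[str, FrozenSet[str]] = {
    "G01": frozenset({"U0", "U1"}),
    "G12": frozenset({"U1", "U2"}),
}


def groups_by_search(unit: str) -> List[str]:
    """Slow path: scan every group each time a unit is released."""
    return sorted(name for name, units in GROUPS.items() if unit in units)


def build_resource_to_groups() -> Dict[str, List[str]]:
    """Precompute, once, the groups that contain each unit."""
    table: Dict[str, List[str]] = {}
    for name in sorted(GROUPS):
        for unit in GROUPS[name]:
            table.setdefault(unit, []).append(name)
    return table


RESOURCE_TO_GROUPS = build_resource_to_groups()


def groups_by_lookup(unit: str) -> List[str]:
    """Fast path: a single table lookup per released unit."""
    return RESOURCE_TO_GROUPS.get(unit, [])
```

Both functions return the same groups for every unit. The lookup version simply moves the scan to a one-time setup step.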

gn build: Merge r353265, r353237 · da2bb5d5
Nico Weber authored Feb 06, 2019
```
llvm-svn: 353298
```
da2bb5d5
Attempt to fix buildbot after r353289 · ef6eba24
Eugene Leviant authored Feb 06, 2019
```
llvm-svn: 353294
```
ef6eba24

[DAGCombine][NFC] GatherAllAliases should take a LSBaseSDNode. · 5a6712b6

Clement Courbet authored Feb 06, 2019

GatherAllAliases only makes sense for LSBaseSDNode. Enforce it with
static typing instead of runtime cast.

llvm-svn: 353291

5a6712b6

[NFC] Simplify check in guard widening · cd48ac36
Max Kazantsev authored Feb 06, 2019
```
llvm-svn: 353290
```
cd48ac36
[llvm-objcopy] Allow regular expressions in name comparison · f324f6dc
Eugene Leviant authored Feb 06, 2019
```
Differential revision: https://reviews.llvm.org/D57517

llvm-svn: 353289
```
f324f6dc

[DebugInfo] Print correct value for special opcode address increment · b6b5b1a5

James Henderson authored Feb 06, 2019

The wrong variable was being used when printing the address increment in
verbose output of .debug_line. This patch fixes this.

Reviewed by: JDevlieghere

Differential Revision: https://reviews.llvm.org/D57693

llvm-svn: 353288

b6b5b1a5
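
For context on the value being printed in the .debug_line commit above, a DWARF special opcode encodes both an address and a line increment. The sketch below follows the standard formula for the simple case of one operation per instruction. The function name and the sample header values used with it are illustrative, and it is not the code from the patch.

```python
from typing import Tuple


def decode_special_opcode(
    opcode: int,
    opcode_base: int,
    line_base: int,
    line_range: int,
    min_inst_length: int = 1,
) -> Tuple[int, int]:
    """Return (address_increment, line_increment) for a special opcode."""
    if opcode < opcode_base:
        raise ValueError("not a special opcode")
    adjusted = opcode - opcode_base
    address_increment = (adjusted // line_range) * min_inst_length
    line_increment = line_base + (adjusted % line_range)
    return address_increment, line_increment
```

With a header of opcode_base=13, line_base=-5 and line_range=14, opcode 0x4b has an adjusted value of 62, giving an address increment of 4 and a line increment of 1.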

[DebugInfo][llvm-symbolizer] Add some tests for edge cases when symbolizing · cd1424ae

James Henderson authored Feb 06, 2019

This patch adds half a dozen new tests that test various edge cases in
the behaviour of the symbolizer and DWARF data parsing. All of them test
the current behaviour.

Reviewed by: JDevlieghere, aprantl

Differential Revision: https://reviews.llvm.org/D57741

llvm-svn: 353286

cd1424ae

[yaml::BinaryRef] Slight perf tuning (for llvm-exegesis analysis mode) · 41828010

Roman Lebedev authored Feb 06, 2019

Summary:
llvm-exegesis uses this functionality to read its benchmark dumps.
This reading of `.yaml`s takes ~60% of runtime for 14656 benchmark points (i.e. one sweep over all x86 instructions),
but only 30% of time for 3x as many benchmark points.

In particular, this `BinaryRef` appears to be an obvious pain point.
Without patch:
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file="" -analysis-inconsistencies-output-file=/tmp/clusters-orig.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-orig.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-orig.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file= -analysis-inconsistencies-output-file=/tmp/clusters-orig.html' (25 runs):

            972.86 msec task-clock                #    0.994 CPUs utilized            ( +-  0.25% )
                30      context-switches          #   30.774 M/sec                    ( +- 21.74% )
                 0      cpu-migrations            #    0.370 M/sec                    ( +- 67.81% )
             11873      page-faults               # 12211.512 M/sec                   ( +-  0.00% )
        3898373408      cycles                    # 4009682.186 GHz                   ( +-  0.25% )  (83.12%)
         360399748      stalled-cycles-frontend   #    9.24% frontend cycles idle     ( +-  0.54% )  (83.24%)
        1099450483      stalled-cycles-backend    #   28.20% backend cycles idle      ( +-  0.59% )  (33.63%)
        4910528820      instructions              #    1.26  insn per cycle
                                                  #    0.22  stalled cycles per insn  ( +-  0.13% )  (50.21%)
        1111976775      branches                  # 1143726625.854 M/sec              ( +-  0.10% )  (66.77%)
          23248474      branch-misses             #    2.09% of all branches          ( +-  0.19% )  (83.29%)

           0.97850 +- 0.00647 seconds time elapsed  ( +-  0.66% )
```
With the patch:
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file="" -analysis-inconsistencies-output-file=/tmp/clusters-new.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file= -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (25 runs):

            905.29 msec task-clock                #    0.999 CPUs utilized            ( +-  0.11% )
                15      context-switches          #   16.533 M/sec                    ( +- 32.27% )
                 0      cpu-migrations            #    0.000 K/sec
             11873      page-faults               # 13121.789 M/sec                   ( +-  0.00% )
        3627759720      cycles                    # 4009283.100 GHz                   ( +-  0.11% )  (83.19%)
         370401480      stalled-cycles-frontend   #   10.21% frontend cycles idle     ( +-  0.22% )  (83.19%)
        1007114438      stalled-cycles-backend    #   27.76% backend cycles idle      ( +-  0.34% )  (33.62%)
        4414014304      instructions              #    1.22  insn per cycle
                                                  #    0.23  stalled cycles per insn  ( +-  0.08% )  (50.36%)
        1003751700      branches                  # 1109314021.971 M/sec              ( +-  0.07% )  (66.97%)
          24611010      branch-misses             #    2.45% of all branches          ( +-  0.10% )  (83.41%)

           0.90593 +- 0.00105 seconds time elapsed  ( +-  0.12% )
```
So this decreases the overall run time of llvm-exegesis analysis mode (on one sweep) by roughly 7%.

To be noted, the `BinaryRef::writeAsBinary()` change is the reason for the perf changes;
the use of `llvm::isHexDigit()` instead of `isxdigit()` does not appear to have any perf impact,
I have only changed it "for symmetry".

The `writeAsBinary()` change is correct: it produces an identical de-hex-ified buffer, and the final output is thus identical:
```
$ sha512sum /tmp/clusters-*
db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf  /tmp/clusters-new.html
db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf  /tmp/clusters-orig.html
```

Reviewers: silvas, espindola, sbc100, zturner, courbet, gchatelet

Reviewed By: gchatelet

Subscribers: tschuett, RKSimon, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57699

llvm-svn: 353282

41828010
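
To make "de-hex-ifying" in the yaml::BinaryRef commit above concrete, the sketch below decodes a hex string into bytes with one table lookup per digit. It illustrates the operation only. It is not the code from the patch, the name `dehexify` is hypothetical, and no claim is made that the patch uses a lookup table.

```python
# Value of every accepted hex digit, in either case.
HEX_VALUE = {ch: int(ch, 16) for ch in "0123456789abcdefABCDEF"}


def dehexify(hex_text: str) -> bytes:
    """Decode pairs of hex digits into bytes, one table lookup per digit."""
    if len(hex_text) % 2 != 0:
        raise ValueError("hex string must have an even number of digits")
    out = bytearray()
    for i in range(0, len(hex_text), 2):
        try:
            high = HEX_VALUE[hex_text[i]]
            low = HEX_VALUE[hex_text[i + 1]]
        except KeyError as err:
            raise ValueError(f"not a hex digit: {err.args[0]!r}") from None
        out.append((high << 4) | low)
    return bytes(out)
```

Comparing the result against `bytes.fromhex` on the same input is a quick way to confirm that a rewritten decoder still produces an identical buffer, in the same spirit as the sha512sum check in the commit message.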

Fix misspelled filenames in file headers of llvm/{MC,Object,CodeGen}/*.h · b8ee8c85
Fangrui Song authored Feb 06, 2019
```
llvm-svn: 353278
```
b8ee8c85
[NFC] Factor out detachment of dead blocks from their erasing · 36b392cb
Max Kazantsev authored Feb 06, 2019
```
llvm-svn: 353277
```
36b392cb
[LoopSimplifyCFG] Do not count dead exit blocks twice, make CFG simpler · a4ccfc18
Max Kazantsev authored Feb 06, 2019
```
llvm-svn: 353276
```
a4ccfc18
[NFC] Revert rL353274 · 0d7ad3c9
Max Kazantsev authored Feb 06, 2019
```
llvm-svn: 353275
```
0d7ad3c9
[NFC] Extend API of DeleteDeadBlock(s) to collect updates without DTU · 61e6ffc3
Max Kazantsev authored Feb 06, 2019
```
llvm-svn: 353274
```
61e6ffc3