Commits · d6317d22a92d596731b1dd9d4a8010879f3e43d9 · Lorenzo Albano / LLVM bpEVL

Jan 19, 2019

Convert two more files that were using Windows line endings and remove · 4a50956c

Chandler Carruth authored Jan 19, 2019

a stray single '\r' from one file. These are the last line ending issues
I can find in the files containing parts of LLVM's file headers.

llvm-svn: 351634

4a50956c

Enable IPConstantPropagation to work with abstract call sites · 36872b5d

Johannes Doerfert authored Jan 19, 2019

This modification of the currently unused inter-procedural constant
propagation pass (IPConstantPropagation) shows how abstract call sites
enable optimization of callback calls alongside direct and indirect
calls. Through minimal changes, mostly dealing with the partial mapping
of callbacks, inter-procedural constant propagation was enabled for
callbacks, e.g., OpenMP runtime calls or pthreads_create.

Differential Revision: https://reviews.llvm.org/D56447

llvm-svn: 351628

36872b5d

AbstractCallSite -- A unified interface for (in)direct and callback calls · 18251842

Johannes Doerfert authored Jan 19, 2019

  An abstract call site is a wrapper that allows to treat direct,
  indirect, and callback calls the same. If an abstract call site
  represents a direct or indirect call site it behaves like a stripped
  down version of a normal call site object. The abstract call site can
  also represent a callback call, thus the fact that the initially
  called function (=broker) may invoke a third one (=callback callee).
  In this case, the abstract call side hides the middle man, hence the
  broker function. The result is a representation of the callback call,
  inside the broker, but in the context of the original instruction that
  invoked the broker.

  Again, there are up to three functions involved when we talk about
  callback call sites. The caller (1), which invokes the broker
  function. The broker function (2), that may or may not invoke the
  callback callee. And finally the callback callee (3), which is the
  target of the callback call.

  The abstract call site will handle the mapping from parameters to
  arguments depending on the semantic of the broker function. However,
  it is important to note that the mapping is often partial. Thus, some
  arguments of the call/invoke instruction are mapped to parameters of
  the callee while others are not. At the same time, arguments of the
  callback callee might be unknown, thus "null" if queried.

  This patch introduces also !callback metadata which describe how a
  callback broker maps from parameters to arguments. This metadata is
  directly created by clang for known broker functions, provided through
  source code attributes by the user, or later deduced by analyses.

For motivation and additional information please see the corresponding
talk (slides/video)
  https://llvm.org/devmtg/2018-10/talk-abstracts.html#talk20
as well as the LCPC paper
  http://compilers.cs.uni-saarland.de/people/doerfert/par_opt_lcpc18.pdf

Differential Revision: https://reviews.llvm.org/D54498

llvm-svn: 351627

18251842

Reapply "[CGP] Check for existing inttotpr before creating new one" · a0383d6c
Roman Tereshin authored Jan 19, 2019
```
Original commit: r351582

llvm-svn: 351626
```
a0383d6c

[MergeFunc] Allow merging identical vararg functions using aliases · b537b946

Vedant Kumar authored Jan 19, 2019

Thanks to Nikita Popov for pointing out this missed case.

This is a follow-up to r351411, which disabled function merging for
vararg functions outright due to a miscompile (see llvm.org/PR40345).

Differential Revision: https://reviews.llvm.org/D56865

llvm-svn: 351624

b537b946

[HotColdSplit] Mark inherently cold functions as such · b755a2df

Vedant Kumar authored Jan 19, 2019

If an inherently cold function is found, mark it as cold. For now this
means applying the `cold` and `minsize` attributes.

As a drive-by, revisit and clean up the criteria for considering a
function for splitting. Add tests.

llvm-svn: 351623

b755a2df

[HotColdSplit] Remove a set which tracked split functions (NFC) · 4de1962b

Vedant Kumar authored Jan 19, 2019

Use the begin/end iterator idiom to avoid visiting split functions,
instead of doing a set lookup.

llvm-svn: 351622

4de1962b

[CodeExtractor] Emit lifetime markers around reloads of outputs · 17d9f14b

Vedant Kumar authored Jan 19, 2019

CodeExtractor permits extracting a region of blocks from a function even
when values defined within the region are used outside of it.

This is typically done by creating an alloca in the original function
and reloading the alloca after a call to the extracted function.

Wrap the reload in lifetime start/end markers to promote stack coloring.

Suggested by Sergei Kachkov!

Differential Revision: https://reviews.llvm.org/D56045

llvm-svn: 351621

17d9f14b

Revert "Reapply "[CGP] Check for existing inttotpr before creating new one"" · 022bf3e8

Roman Tereshin authored Jan 19, 2019

This reverts commit r351618.

Compiler RT + ASAN tests are failing for PowerPC. Not sure
how would I reproduce these on macOS, so reverting (again)
until I do.

llvm-svn: 351619

022bf3e8

Reapply "[CGP] Check for existing inttotpr before creating new one" · dd6f9f68
Roman Tereshin authored Jan 19, 2019
```
Original commit: r351582

llvm-svn: 351618
```
dd6f9f68

Revert r351584: "GlobalISel: Verify g_zextload and g_sextload" · d5015edb

Amara Emerson authored Jan 19, 2019

This new assertion triggered on the AArch64 GlobalISel bots. Reverting while it's being investigated.

llvm-svn: 351617

d5015edb

[X86] Deduplicate static calling convention helpers for code size, NFC · 38f9900a

Reid Kleckner authored Jan 19, 2019

Summary:
Right now we include ${TGT}GenCallingConv.inc once per each instruction
selection method implemented by ${TGT}:
- ${TGT}ISelLowering.cpp
- ${TGT}CallLowering.cpp
- ${TGT}FastISel.cpp

Instead, add a mechanism to tablegen for marking a particular convention
as "External", which causes tablegen to emit into the ::llvm namespace,
instead of as a static helper. This allows us to provide a header to
forward declare it, so we can simply call the function from all the
places it is referenced. Typically the calling convention analyzer is
called indirectly, so it doesn't benefit from inlining.

This saves a bit of final binary size, but mostly just saves object file
size:

before  after   diff   artifact
12852K  12492K  -360K  X86ISelLowering.cpp.obj
4640K   4280K   -360K  X86FastISel.cpp.obj
1704K   2092K   +388K  X86CallingConv.cpp.obj
52448K  52336K  -112K  llc.exe

I didn't collect before numbers for X86CallLowering.cpp.obj, which is
for GlobalISel, but we should save 360K there as well.

This patch applies the strategy to the X86 backend, but there is no
reason it couldn't be applied to the other backends that implement
multiple ISel strategies, like AArch64.

Reviewers: craig.topper, hfinkel, efriedma

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D56883

llvm-svn: 351616

38f9900a

Remove F_modify flag from FileOutputBuffer. · 8e7600dc

Rui Ueyama authored Jan 19, 2019

This code is dead. There is no use of the feature in the entire LLVM codebase.

Differential Revision: https://reviews.llvm.org/D56939

llvm-svn: 351613

8e7600dc

Jan 18, 2019

AMDGPU/GlobalISel: Legalize more types for select · 96e47014
Matt Arsenault authored Jan 18, 2019
```
llvm-svn: 351599
```
96e47014
Revert "[CGP] Check for existing inttotpr before creating new one" · 86ac5326
Roman Tereshin authored Jan 18, 2019
```
This reverts commit r351582.

Bots are failing. Reverting this to fix and re-commit later.

llvm-svn: 351598
```
86ac5326
AMDGPU/GlobalISel: Legalize illegal g_constant · 4599159a
Matt Arsenault authored Jan 18, 2019
```
llvm-svn: 351596
```
4599159a
GlobalISel: Verify G_BITCAST · bd3a5b29
Matt Arsenault authored Jan 18, 2019
```
llvm-svn: 351594
```
bd3a5b29
GlobalISel: Verify G_ICMP/G_FCMP vector types · 215c4f68
Matt Arsenault authored Jan 18, 2019
```
llvm-svn: 351591
```
215c4f68

AMDGPU: Remove llvm.SI.load.const · 85af701e

Matt Arsenault authored Jan 18, 2019

It's taken 3 years, but now all of the old AMDGPU and SI intrinsics
are finally gone

llvm-svn: 351586

85af701e

GlobalISel: Verify g_zextload and g_sextload · f67ae611
Matt Arsenault authored Jan 18, 2019
```
llvm-svn: 351584
```
f67ae611

[X86] Lower avx512f scatter intrinsics to X86MaskedScatterSDNode instead of... · 08d3d32e

Craig Topper authored Jan 18, 2019

[X86] Lower avx512f scatter intrinsics to X86MaskedScatterSDNode instead of going directly to MachineSDNode.

This sends these intrinsics through isel in a much more normal way. This should allow addressing mode matching in isel to make better use of the displacement field.

llvm-svn: 351583

08d3d32e

[CGP] Check for existing inttotpr before creating new one · 85a0467a

Roman Tereshin authored Jan 18, 2019

Make sure CodeGenPrepare doesn't emit multiple inttoptr instructions of
the same integer value while sinking address computations, but rather
CSEs them on the fly: excessive inttoptr's confuse SCEV into thinking
that related pointers have nothing to do with each other.

This problem blocks LoadStoreVectorizer from vectorizing some of the
loads / stores in a downstream target.

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D56838

llvm-svn: 351582

85a0467a

[SelectionDAG] Updates for -dag-dump-verbose · d4023bd2

Bjorn Pettersson authored Jan 18, 2019

Summary:
This patch makes some changes related to -dag-dump-verbose.
Main use case has been when debugging how SelectionDAG is
dealing with debug info (SDDbgValue nodes).

1) We now print the number of DbgValues that are mapped to each
   SDNode.
2) Removed duplicated printing of DebugLoc (nowadays DebugLoc is
   printed also when not using -dag-dump-verbose).
3) Renamed SDDbgValue::dump to SDDbgValue::print, and added a
   new SDDbgValue::dump that will start a new line after calling
   print.
4) SDDbgValue::print now prints "Order", and it also prints
   some additional information when kind is CONST/FRAMEIX/VREG.
5) SelectionDAG::dump() now dumps all SDDbgValue nodes after
   the list of SDNodes (both "regular" and "ByVal" SDDbgValue:s).
   Invalidated nodes are not printed.
6) Prohibit inline printing of SDNode operands that has SDDbgValue
   nodes associated to them.

Reviewers: jmorse, aprantl

Reviewed By: aprantl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56793

llvm-svn: 351581

d4023bd2

Fix the buildbot issue introduced by r351421 · eaa421d1

Sanjin Sijaric authored Jan 18, 2019

The EXPENSIVE_CHECK x86_64 Windows buildbot is failing due to this change. Fix
the map access.

llvm-svn: 351577

eaa421d1

[GlobalISel] Change to range-based invocation of llvm::sort · f0e99779
Mandeep Singh Grang authored Jan 18, 2019
```
llvm-svn: 351574
```
f0e99779

[SelectionDAG] Split very large token factors for chained stores to 64k chunks. · dc4e1547

Florian Hahn authored Jan 18, 2019

Similar to D55073. Without this change, the DAG combiner crashes on code
with more than 64k of stores in a single basic block that form parallelizable
chains.

No test case, as it would be very IR file.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D56740

llvm-svn: 351571

dc4e1547

[X86] Lower avx2/avx512f gather intrinsics to X86MaskedGatherSDNode instead of... · b9d4461f

Craig Topper authored Jan 18, 2019

[X86] Lower avx2/avx512f gather intrinsics to X86MaskedGatherSDNode instead of going directly to MachineSDNode.:

This sends these intrinsics through isel in a much more normal way. This should allow addressing mode matching in isel to make better use of the displacement field.

Differential Revision: https://reviews.llvm.org/D56827

llvm-svn: 351570

b9d4461f

[LCSSA] Skip blocks in sub-loops when scanning for uses. · be7cbe3f

Florian Hahn authored Jan 18, 2019

Summary:
Scanning blocks in sub-loops for uses is unnecessary, as they were
already handled while dealing with the containing sub-loop.

This speeds up LCSSA for highly nested loops. For the test case in PR37202, it
halves the time spent in LCSSA. In cases were we won't be able to skip
any blocks, the additional lookup should be negligible.

Time-passes without this patch for test case from PR37202:

  Total Execution Time: 48.5505 seconds (48.5511 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  10.0822 ( 21.0%)   0.1406 ( 27.0%)  10.2228 ( 21.1%)  10.2228 ( 21.1%)  Loop-Closed SSA Form Pass
  10.0417 ( 20.9%)   0.1467 ( 28.2%)  10.1884 ( 21.0%)  10.1890 ( 21.0%)  Loop-Closed SSA Form Pass #2
   4.2703 (  8.9%)   0.0040 (  0.8%)   4.2742 (  8.8%)   4.2742 (  8.8%)  Unswitch loops
   2.7376 (  5.7%)   0.0229 (  4.4%)   2.7605 (  5.7%)   2.7611 (  5.7%)  Loop-Closed SSA Form Pass #5
   2.7332 (  5.7%)   0.0214 (  4.1%)   2.7546 (  5.7%)   2.7546 (  5.7%)  Loop-Closed SSA Form Pass #3
   2.7088 (  5.6%)   0.0230 (  4.4%)   2.7319 (  5.6%)   2.7324 (  5.6%)  Loop-Closed SSA Form Pass #4
   2.6855 (  5.6%)   0.0236 (  4.5%)   2.7091 (  5.6%)   2.7090 (  5.6%)  Loop-Closed SSA Form Pass #6
   2.1648 (  4.5%)   0.0018 (  0.4%)   2.1666 (  4.5%)   2.1664 (  4.5%)  Unroll loops
   1.8371 (  3.8%)   0.0009 (  0.2%)   1.8379 (  3.8%)   1.8380 (  3.8%)  Value Propagation
   1.8149 (  3.8%)   0.0021 (  0.4%)   1.8170 (  3.7%)   1.8169 (  3.7%)  Loop Invariant Code Motion
   1.6755 (  3.5%)   0.0226 (  4.3%)   1.6981 (  3.5%)   1.6980 (  3.5%)  Loop-Closed SSA Form Pass #7

Time-passes with this patch

  Total Execution Time: 29.9285 seconds (29.9276 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   5.2786 ( 17.7%)   0.0021 (  1.2%)   5.2806 ( 17.6%)   5.2808 ( 17.6%)  Unswitch loops
   4.3739 ( 14.7%)   0.0303 ( 18.1%)   4.4042 ( 14.7%)   4.4042 ( 14.7%)  Loop-Closed SSA Form Pass
   4.2658 ( 14.3%)   0.0192 ( 11.5%)   4.2850 ( 14.3%)   4.2851 ( 14.3%)  Loop-Closed SSA Form Pass #2
   2.2307 (  7.5%)   0.0013 (  0.8%)   2.2320 (  7.5%)   2.2318 (  7.5%)  Loop Invariant Code Motion
   2.0888 (  7.0%)   0.0012 (  0.7%)   2.0900 (  7.0%)   2.0897 (  7.0%)  Unroll loops
   1.6761 (  5.6%)   0.0013 (  0.8%)   1.6774 (  5.6%)   1.6774 (  5.6%)  Value Propagation
   1.3686 (  4.6%)   0.0029 (  1.8%)   1.3716 (  4.6%)   1.3714 (  4.6%)  Induction Variable Simplification
   1.1457 (  3.8%)   0.0010 (  0.6%)   1.1468 (  3.8%)   1.1468 (  3.8%)  Loop-Closed SSA Form Pass #4
   1.1384 (  3.8%)   0.0005 (  0.3%)   1.1389 (  3.8%)   1.1389 (  3.8%)  Loop-Closed SSA Form Pass #6
   1.1360 (  3.8%)   0.0027 (  1.6%)   1.1387 (  3.8%)   1.1387 (  3.8%)  Loop-Closed SSA Form Pass #5
   1.1331 (  3.8%)   0.0010 (  0.6%)   1.1341 (  3.8%)   1.1340 (  3.8%)  Loop-Closed SSA Form Pass #3

Reviewers: davide, efriedma, mzolotukhin

Reviewed By: davide, efriedma

Subscribers: hiraditya, dmgreen, llvm-commits

Differential Revision: https://reviews.llvm.org/D56848

llvm-svn: 351567

be7cbe3f

[AMDGPU] Add some missing always-uniform values. · 3ed09f8e

Neil Henning authored Jan 18, 2019

This commit adds some missing intrinsics into the isAlwaysUniform list
for the AMDGPU backend.

Differential Revision: https://reviews.llvm.org/D56845

llvm-svn: 351562

3ed09f8e

[SelectionDAGBuilder] Cleanup InlineAsm Output generation. NFCI. · 0a45bf0e

Nirav Dave authored Jan 18, 2019

Defer inline asm's output fixup work until after we've generated the
inline asm node itself. Remove StoresToEmit, IndirectStoresToEmit, and
RetValRegs in favor of using ConstraintOperands.

llvm-svn: 351558

0a45bf0e

[x86] simplify code for SDValue.getOperand(); NFC · b6c91a1a
Sanjay Patel authored Jan 18, 2019
```
llvm-svn: 351557
```
b6c91a1a

[AMDGPU][MC][GFX8+][DISASSEMBLER] Corrected 1/2pi value for 64-bit operands · 6bc26aaa

Dmitry Preobrazhensky authored Jan 18, 2019

See bug 39332: https://bugs.llvm.org/show_bug.cgi?id=39332

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D56794

llvm-svn: 351555

6bc26aaa

[SelectionDAG] Add getTokenFactor, which splits nodes with > 64k operands. · d2c733b4

Florian Hahn authored Jan 18, 2019

This functionality is required at multiple places which potentially
create large operand lists, like SelectionDAGBuilder or DAGCombiner.

Differential Revision: https://reviews.llvm.org/D56739

llvm-svn: 351552

d2c733b4

Add __[_[_]]Z demangling to new common demangle function · f5356944

James Henderson authored Jan 18, 2019

This is a follow-up to r351448. It adds support for other _*Z extensions
of the Itanium demanling, to the newly available demangle function
heuristic.

Reviewed by: erik.pilkington, rupprecht, grimar

Differential Revision: https://reviews.llvm.org/D56855

llvm-svn: 351551

f5356944

[AMDGPU][MC] Disabled use of 2 different literals with SOP2/SOPC instructions · 61105bab

Dmitry Preobrazhensky authored Jan 18, 2019

See bug 39319: https://bugs.llvm.org/show_bug.cgi?id=39319

Reviewers: artem.tamazov, arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D56847

llvm-svn: 351549

61105bab

[ADT] Add streaming operators for llvm::Optional · 47e9a21d

Pavel Labath authored Jan 18, 2019

Summary:
The operators simply print the underlying value or "None".

The trickier part of this patch is making sure the streaming operators
work even in unit tests (which was my primary motivation, though I can
also see them being useful elsewhere). Since the stream operator was a
template, implicit conversions did not kick in, and our gtest glue code
was explicitly introducing an implicit conversion to make sure other
implicit conversions do not kick in :P. I resolve that by specializing
llvm_gtest::StreamSwitch for llvm:Optional<T>.

Reviewers: sammccall, dblaikie

Reviewed By: sammccall

Subscribers: mgorny, dexonsmith, kristina, llvm-commits

Differential Revision: https://reviews.llvm.org/D56795

llvm-svn: 351548

47e9a21d

[AVR] Fix codegen bug in 16-bit loads · 77364be4

Dylan McKay authored Jan 18, 2019

Prior to this patch, the AVR::LDWRdPtr instruction was always lowered to
instructions of this pattern:

    ld  $GPR8, [PTR:XYZ]+
    ld  $GPR8, [PTR]+1

This has a problem; the [PTR] is incremented in-place once, but never
decremented.

Future uses of the same pointer will use the now clobbered value,
leading to the pointer being incorrect by an offset of one.

This patch modifies the expansion code of the LDWRdPtr pseudo
instruction so that the pointer variable is not silently clobbered in
future uses in the same live range.

Patch by Keshav Kini.

llvm-svn: 351544

77364be4

[SelectionDAG] Add static getMaxNumOperands function to SDNode. · 1b817723

Florian Hahn authored Jan 18, 2019

Summary:
Use this helper to make sure we use the same value at various places.
This will likely be needed at more places were we currently crash
because we use more operands than possible.

Also makes it easier to change in the future.

Reviewers: RKSimon, craig.topper, efriedma, aemerson

Reviewed By: RKSimon

Subscribers: hiraditya, arsenm, llvm-commits

Differential Revision: https://reviews.llvm.org/D56859

llvm-svn: 351537

1b817723

[ScheduleDAGRRList] Do not preschedule the node has ADJCALLSTACKDOWN parent · e84c729a

Shiva Chen authored Jan 18, 2019

We should not pre-scheduled the node has ADJCALLSTACKDOWN parent,
or else, when bottom-up scheduling, ADJCALLSTACKDOWN and
ADJCALLSTACKUP may hold CallResource too long and make other
calls can't be scheduled. If there's no other available node
to schedule, the scheduler will try to rename the register by
creating copy to avoid the conflict which will fail because
CallResource is not a real physical register.

llvm-svn: 351527

e84c729a

[AVR] Rewrite the CBRRdK instruction as an alias of ANDIRdK · d770da98

Dylan McKay authored Jan 18, 2019

The CBR instruction is just an ANDI instruction with the immediate
complemented.

Because of this, prior to this change TableGen would warn due to a
decoding conflict.

This commit fixes the existing compilation warning:

  ===============
  [423/492] Building AVRGenDisassemblerTables.inc...
  Decoding Conflict:
                  0111............
                  01..............
                  ................
          ANDIRdK 0111____________
          CBRRdK 0111____________
  ================

After this commit, there are no more decoding conflicts in the AVR
backend's instruction definitions.

Thanks to Eli F for pointing me torward `t2_so_imm_not` as an example of
how to perform a complement in an instruction alias.

Fixes BugZilla PR38802.

llvm-svn: 351526

d770da98