Commits · 4ba041fa25b7e34bd10c598b53b04f023681b322 · Roger Ferrer / llvm-epi

Jun 28, 2018

[X86] Use PatFrag with hardcoded numbers for FROUND_NO_EXC/FROUND_CURRENT... · ec5d568a

Craig Topper authored Jun 28, 2018

[X86] Use PatFrag with hardcoded numbers for FROUND_NO_EXC/FROUND_CURRENT instead of ImmLeafs with predicates where one of the two numbers was hardcoded.

This is more efficient for the isel table generator since we can use CheckChildInteger instead of MoveChild, CheckPredicate, MoveParent. This reduced the table size by 1-2K.

I wish there was a way to share the values with X86BaseInfo.h and still use a PatFrag like this. These numbers are fixed by the X86 intrinsic spec going back many years and we should never need to change them. So we shouldn't waste table bytes to support sharing.

llvm-svn: 335806

ec5d568a

[X86] Change how we prefer shift by immediate over folding a load into a shift. · ab70f588

Craig Topper authored Jun 28, 2018

BMI2 added new shift by register instructions that have the ability to fold a load.

Normally without doing anything special isel would prefer folding a load over folding an immediate because the load folding pattern has higher "complexity". This would require an instruction to move the immediate into a register. We would rather fold the immediate instead and have a separate instruction for the load.

We used to enforce this priority by artificially lowering the complexity of the load pattern.

This patch changes this to instead reject the load fold in isProfitableToFoldLoad if there is an immediate. This is more consistent with other binops and feels less hacky.

llvm-svn: 335804

ab70f588

[cmake][xcode-toolchain] add support for major Xcode version >= 10 · 3ddd210a

Alex Lorenz authored Jun 28, 2018

The regex that extracts the Xcode version should support major versions with two
digits.
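
As a rough illustration of the point above (not the actual CMake regex, which is not shown here), the sketch below extracts a major version from text of the form "Xcode 10.0". The helper name `xcodeMajorVersion` and the input format are assumptions made for the example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the major version found in text such as
// "Xcode 10.0", or -1 if no version is present.
// Illustration only; this is not the pattern used by the CMake script.
int xcodeMajorVersion(const std::string &text) {
  // "[0-9]+" accepts one or more digits before the dot, so both "9.4" and
  // "10.0" match. A pattern allowing exactly one digit before the dot
  // would not match "10.0".
  static const std::regex pattern("Xcode ([0-9]+)\\.[0-9]+");
  std::smatch match;
  if (!std::regex_search(text, match, pattern))
    return -1;
  return std::stoi(match[1].str());
}
```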

rdar://41465184

llvm-svn: 335801

3ddd210a

[CGProfile] Fix unused variable warning. · 98f5475f
Michael J. Spencer authored Jun 28, 2018
```
llvm-svn: 335797
```
98f5475f

Add support for generating a call graph profile from Branch Frequency Info. · 5bf1ead3

Michael J. Spencer authored Jun 27, 2018

=== Generating the CG Profile ===

The CGProfile module pass simply gets the block profile count for each BB and scans for call instructions.  For each call instruction it adds an edge from the current function to the called function with the current BB block profile count as the weight.

After scanning all the functions, it generates an appending module flag containing the data. The format looks like:
```
!llvm.module.flags = !{!0}

!0 = !{i32 5, !"CG Profile", !1}
!1 = !{!2, !3, !4} ; List of edges
!2 = !{void ()* @a, void ()* @b, i64 32} ; Edge from a to b with a weight of 32
!3 = !{void (i1)* @freq, void ()* @a, i64 11}
!4 = !{void (i1)* @freq, void ()* @b, i64 20}
```
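
The edge collection described above can be sketched outside of LLVM roughly as follows. `Block`, `Function`, and `buildCallGraphProfile` are hypothetical stand-ins rather than LLVM types, and summing the weights of repeated caller/callee pairs is an assumption of this sketch, not something the commit message states.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for IR: a basic block carries its profile count
// and the names of the functions it calls.
struct Block {
  uint64_t count;
  std::vector<std::string> callees;
};
struct Function {
  std::string name;
  std::vector<Block> blocks;
};

// (caller, callee) -> accumulated weight.
using EdgeWeights = std::map<std::pair<std::string, std::string>, uint64_t>;

// For every call in every block, add an edge from the enclosing function to
// the callee, weighted by the block's profile count. Repeated edges are
// summed here (an assumption of this sketch).
EdgeWeights buildCallGraphProfile(const std::vector<Function> &functions) {
  EdgeWeights edges;
  for (const Function &f : functions)
    for (const Block &b : f.blocks)
      for (const std::string &callee : b.callees)
        edges[{f.name, callee}] += b.count;
  return edges;
}
```

Fed a function `a` whose only block has count 32 and calls `b`, and a function `freq` with blocks of count 11 and 20 calling `a` and `b`, this yields the same three edges as the metadata example above.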

Differential Revision: https://reviews.llvm.org/D48105

llvm-svn: 335794

5bf1ead3

Jun 27, 2018

Move some code from PDBFileBuilder to MSFBuilder. · ee8010ab

Zachary Turner authored Jun 27, 2018

The code to emit the pieces of the MSF file was actually in
PDBFileBuilder.  Move this to MSFBuilder so that we can
theoretically emit an MSF without having a PDB file.

llvm-svn: 335789

ee8010ab

[X86] Make folding table checking threadsafe · e214f046
Benjamin Kramer authored Jun 27, 2018
```
This is a benign race, but tsan likes to complain about it. Just make it
happy.

llvm-svn: 335788
```
e214f046

[X86] In X86DAGToDAGISel::PreprocessISelDAG, make sure we don't access N after we delete it. · 880e34ed

Craig Topper authored Jun 27, 2018

If we turn X86ISD::AND into ISD::AND, we delete N. But we were continuing onto the next block of code even though N no longer existed.

Just happened to notice it. I assume asan didn't notice it because we explicitly unpoison deleted nodes and give them a DELETE_NODE opcode.

llvm-svn: 335787

880e34ed

[RISCV] Add machine function pass to merge base + offset · 9b65ffb0

Sameer AbuAsal authored Jun 27, 2018

Summary:
   In r333455 we added a peephole to fix the corner cases that result
   from separating base + offset lowering of global address. The
   peephole didn't handle some of the cases because it only has a basic
   block view instead of a function level view.

   This patch replaces that logic with a machine function pass. In
   addition to handling the original cases it handles uses of the global
   address across blocks in a function and folding an offset from an LW/SW
   instruction. This pass won't run for OptNone compilation, so there
   will be a negative impact overall vs the old approach at O0.

Reviewers: asb, apazos, mgrang

Reviewed By: asb

Subscribers: MartinMosbeck, brucehoult, the_o, rogfer01, mgorny, rbar, johnrusso, simoncook, niosHD, kito-cheng, shiva0217, zzheng, llvm-commits, edward-jones

Differential Revision: https://reviews.llvm.org/D47857

llvm-svn: 335786

9b65ffb0

[llvm-objdump] Add -x --all-headers options · 8513cd4c

Fangrui Song authored Jun 27, 2018

Reviewers: paulsemel, echristo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48622

llvm-svn: 335785

8513cd4c

[InstCombine] add tests for vector-select-of-binops with 2 variables; NFC · 1ef49be8
Sanjay Patel authored Jun 27, 2018
```
llvm-svn: 335778
```
1ef49be8
Document the git config for Windows to do line-endings correctly. · bc0748e8
Paul Robinson authored Jun 27, 2018
```
Differential Revision: https://reviews.llvm.org/D48494

llvm-svn: 335775
```
bc0748e8
[DAGCombine] Disable TokenFactor simplifications when optnone. · 7c57ae57
Nirav Dave authored Jun 27, 2018
```
llvm-svn: 335773
```
7c57ae57

[ADT] drop_begin: use adl_begin/adl_end. NFC. · 2cb21999

Michael Kruse authored Jun 27, 2018

Summary:
The instantiation of the drop_begin function template usually fails because the functions begin() and end() do not exist. Only when used on a container from the std namespace (or `llvm::iterator_range`s of something derived from `std::iterator`) are they matched to std::begin() and std::end() via Koenig lookup.

Explicitly use llvm::adl_begin and llvm::adl_end to make drop_begin applicable to anything iterable (including C-style arrays).
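
The idea can be sketched in isolation as follows. This is not the LLVM implementation: the `sketch` namespace, `beginImpl`/`endImpl`, `IterRange`, and `dropBegin` are hypothetical names. The helpers bring `std::begin`/`std::end` into scope and then call them unqualified, so the standard overloads (including the ones for C-style arrays) and any overloads found by argument-dependent lookup are all candidates.

```cpp
#include <cstddef>
#include <iterator>
#include <vector>

namespace sketch {
namespace detail {
// Make std::begin/std::end visible, then call unqualified so that
// argument-dependent lookup can also find overloads in the range's namespace.
using std::begin;
using std::end;

template <typename R> auto beginImpl(R &r) -> decltype(begin(r)) {
  return begin(r);
}
template <typename R> auto endImpl(R &r) -> decltype(end(r)) {
  return end(r);
}
} // namespace detail

// Minimal iterator pair usable in a range-based for loop.
template <typename It> struct IterRange {
  It first, last;
  It begin() const { return first; }
  It end() const { return last; }
};

// Returns the range with its first n elements skipped.
// The caller must ensure the range has at least n elements.
template <typename R>
auto dropBegin(R &range, std::size_t n)
    -> IterRange<decltype(detail::beginImpl(range))> {
  return {std::next(detail::beginImpl(range), static_cast<std::ptrdiff_t>(n)),
          detail::endImpl(range)};
}
} // namespace sketch
```

With this shape, `sketch::dropBegin(arr, 1)` works for `int arr[] = {1, 2, 3, 4};` as well as for a `std::vector<int>`.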

A solution for general `llvm::iterator_range`s was already tried in r244620, but got reverted in r244621 due to MSVC not liking it.

Reviewers: dblaikie, grosbach, aaron.ballman, ruiu

Reviewed By: dblaikie, aaron.ballman

Subscribers: aaron.ballman, llvm-commits

Differential Revision: https://reviews.llvm.org/D48598

llvm-svn: 335772

2cb21999

[WebAssembly] Try fixing test/CodeGen/WebAssembly/vector_sdiv.ll · 5dc371a7
Fangrui Song authored Jun 27, 2018
```
llvm-svn: 335771
```
5dc371a7
[X86] Fix unmatched parenthesis in r335768 · b0d57a53
Fangrui Song authored Jun 27, 2018
```
llvm-svn: 335769
```
b0d57a53

[X86] Teach the disassembler to use %eiz/%riz instead of NoRegister when the... · 6bea2c7f

Craig Topper authored Jun 27, 2018

[X86] Teach the disassembler to use %eiz/%riz instead of NoRegister when the SIB byte is present, but doesn't encode an index register and there was another shorter encoding that would achieve the same result.

The %eiz/%riz are dummy registers that force the encoder to emit a SIB byte when it normally wouldn't. By emitting them in the disassembly output we ensure that assembling the disassembler output would also produce a SIB byte.

This should match the behavior of objdump from binutils.

llvm-svn: 335768

6bea2c7f

[globalisel][legalizer] Add AtomicOrdering to LegalityQuery and use it in AArch64 · bdeb880d

Daniel Sanders authored Jun 27, 2018

Now that we have the ability to legalize based on MMOs, add support for
legalizing based on AtomicOrdering and use it to correct the legalization
of the atomic instructions.

Also extend all() to be a variadic template as this ruleset now requires
3 and 4 argument versions.
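
A variadic conjunction of predicates can be written along these lines. This is a standalone sketch, not the GlobalISel code: `Query` and its fields are hypothetical stand-ins for a legality query, and `Predicate` is simply a `std::function`.

```cpp
#include <functional>

// Hypothetical stand-in for a legality query.
struct Query {
  int sizeInBits;
  bool isAtomic;
  int ordering;
};
using Predicate = std::function<bool(const Query &)>;

// Base case: a single predicate is returned unchanged.
inline Predicate all(Predicate p) { return p; }

// Recursive case: true only if the first predicate and all remaining
// predicates hold. Accepts any number of arguments from two upwards.
template <typename... Rest>
Predicate all(Predicate p0, Predicate p1, Rest... rest) {
  Predicate tail = all(p1, rest...);
  return [p0, tail](const Query &q) { return p0(q) && tail(q); };
}
```

Because the recursion peels off one predicate at a time, the same definition covers the two, three, and four argument cases without separate overloads.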

llvm-svn: 335767

bdeb880d

[ThinLTO] Fix test · 6835c284

Teresa Johnson authored Jun 27, 2018

Fix test changes added in r335760. Even though we are invoking llvm-lto2
in single threaded mode, the order of processing the modules in the
backend is apparently not deterministic. Handle the expected debug
messages in any order. (The determinism would be good to fix, but not
related to this change.)

This also undoes the change I made in r335764 to help debug this.

llvm-svn: 335766

6835c284

[ThinLTO] Modify test to help diagnose bot failures · 6535b356

Teresa Johnson authored Jun 27, 2018

I am getting bot failures from r335760 that are difficult to diagnose
since the stderr is getting redirected to FileCheck. Save and dump the
debug output to stderr to help debug the issue.

llvm-svn: 335764

6535b356

[DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zeros · d052de85

Sanjay Patel authored Jun 27, 2018

As noted in the D44909 review, the transform from (fptosi+sitofp) to ftrunc 
can produce -0.0 where the original code does not:

```
#include <stdio.h>

int main(int argc) {
  float x;
  x = -0.8 * argc;
  printf("%f\n", (float)((int)x));
  return 0;
}
```

```
$ clang -O0 -mavx fp.c ; ./a.out 
0.000000
$ clang -O1 -mavx fp.c ; ./a.out 
-0.000000
```

Ideally, we'd use IR/node flags to predicate the transform, but the IR parser 
doesn't currently allow fast-math-flags on the cast instructions. So for now, 
just use the function attribute that corresponds to clang's "-fno-signed-zeros" 
option.
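
The difference can also be checked directly by comparing the sign bits of the two results. The function names below are hypothetical; `viaIntCast` mirrors the cast pair and `viaTrunc` mirrors the floating-point truncation it was being replaced with.

```cpp
#include <cmath>

// Round-trips through int, like the (float)((int)x) cast pair.
inline float viaIntCast(float x) {
  return static_cast<float>(static_cast<int>(x));
}

// Truncates in the floating-point domain, which preserves the sign of zero.
inline float viaTrunc(float x) { return std::trunc(x); }

// True only if both results are equal and have the same sign bit;
// plain == cannot tell 0.0 from -0.0.
inline bool sameValueAndSign(float a, float b) {
  return a == b && std::signbit(a) == std::signbit(b);
}
```

For an input such as -0.8f the two results compare equal with `==`, but only the truncated one has its sign bit set, so `sameValueAndSign` reports a mismatch.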

Differential Revision: https://reviews.llvm.org/D48085

llvm-svn: 335761

d052de85

[ThinLTO] Print names in function import debug messages when available · 7e7b13d0

Teresa Johnson authored Jun 27, 2018

Summary:
Rather than just print the GUID, when it is available in the index,
print the global name as well in the function import thin link debug
messages. Names will be available when the combined index is being
built by the same process, e.g. a linker or "llvm-lto2 run".

Reviewers: davidxl

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, llvm-commits

Differential Revision: https://reviews.llvm.org/D48612

llvm-svn: 335760

7e7b13d0

[Object] Allow iterating over an IRObjectFile's modules · 2b1327b9

Justin Bogner authored Jun 27, 2018

If you've already loaded an IRObjectFile and need access to the
Modules themselves you shouldn't have to reparse a byte stream to do
it. Adds an accessor for the modules in IRObjectFile.

llvm-svn: 335759

2b1327b9

[MachineOutliner] Don't outline sequences where x16/x17/nzcv are live across · f472f615

Jessica Paquette authored Jun 27, 2018

It isn't safe to outline sequences of instructions where x16/x17/nzcv are live
across the sequence.

This teaches the outliner to check whether or not a specific candidate has
x16/x17/nzcv live across it and to discard the candidate if so.

https://bugs.llvm.org/show_bug.cgi?id=37573
https://reviews.llvm.org/D47655

llvm-svn: 335758

f472f615

[InstCombine] add more tests for shuffle with different binops; NFC · 7e45aebe
Sanjay Patel authored Jun 27, 2018
```
llvm-svn: 335756
```
7e45aebe

[X86] Use bts/btr/btc for single bit set/clear/complement of a variable bit position · 812fcb35

Craig Topper authored Jun 27, 2018

If we are just modifying a single bit at a variable bit position we can use the BT* instructions to make the change instead of shifting a 1 (or rotating a -1) and doing a binop. These instructions also ignore the upper bits of their index input so we can also remove an and if one is present on the index.
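
For reference, the source-level shapes in question look roughly like the sketch below, written for 32-bit values. The function names are hypothetical, and the sketch only shows the shift-and-binop forms; it does not demonstrate which instructions a given compiler selects for them.

```cpp
#include <cstdint>

// Shift-and-binop forms of single-bit updates at a variable position.
// The "& 31" is the explicit index mask mentioned above; in C++ it also
// keeps the shift amount in range.
inline uint32_t setBit(uint32_t x, uint32_t n) { return x | (1u << (n & 31)); }
inline uint32_t clearBit(uint32_t x, uint32_t n) { return x & ~(1u << (n & 31)); }
inline uint32_t complementBit(uint32_t x, uint32_t n) { return x ^ (1u << (n & 31)); }

// Hypothetical helper: rotate left by s (mod 32) without an out-of-range shift.
inline uint32_t rotateLeft(uint32_t v, uint32_t s) {
  s &= 31;
  return (v << s) | (v >> ((32 - s) & 31));
}

// Rotate form of the clear mask: rotating the constant that has only bit 0
// clear yields a mask with only bit n clear.
inline uint32_t clearBitViaRotate(uint32_t x, uint32_t n) {
  return x & rotateLeft(~1u, n);
}
```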

Fixes PR37938.

llvm-svn: 335754

812fcb35

[X86] Add test cases for D48606. · 069628b4
Craig Topper authored Jun 27, 2018
```
llvm-svn: 335753
```
069628b4

[AliasSet] Fix UnknownInstructions printing · 555e41bb

Jakub Kuderski authored Jun 27, 2018

Summary:
AliasSet::print uses `I->printAsOperand` to print UnknownInstructions. The problem is that not all UnknownInstructions have names (e.g. call instructions). When such instructions are printed, they appear as `<badref>` in AliasSets, which is very confusing, as the values are perfectly valid.

This patch fixes that by printing UnknownInstructions without a name using `print` instead of `printAsOperand`.

Reviewers: asbirlea, chandlerc, sanjoy, grosser

Reviewed By: asbirlea

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48609

llvm-svn: 335751

555e41bb

[dsymutil] Move abstractions into separate files (NFC) · c0fb4b6b

Jonas Devlieghere authored Jun 27, 2018

This patch splits off some abstractions used by dsymutil's dwarf linker
and moves them into separate header and implementation files. This
almost halves the number of LOC in DwarfLinker.cpp and makes it a lot
easier to understand what functionality lives where.

Differential revision: https://reviews.llvm.org/D48647

llvm-svn: 335749

c0fb4b6b

[llvm-mca] Register listeners with stages; remove Pipeline dependency from Stage. · 7b5a36ec

Matt Davis authored Jun 27, 2018

Summary:
This patch removes a few callbacks from Pipeline.  It comes at the cost of
registering Listeners with all Stages.  Not all stages need listeners or issue
callbacks, so this registration is a bit redundant.  However, as we build out
the API, this redundancy can disappear.

The main purpose here is to move callback code from the Pipeline and into the
stages that actually issue those callbacks. This removes the back-pointer to
the Pipeline that was put into a few Stage subclasses.

Reviewers: andreadb, courbet, RKSimon

Reviewed By: andreadb, courbet

Subscribers: tschuett, gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D48576

llvm-svn: 335748

7b5a36ec

[X86][SSE] Add missing AVX512 rotation tests · 8a02b253

Simon Pilgrim authored Jun 27, 2018

Increase coverage to make sure we're not doing anything stupid without AVX512BW

llvm-svn: 335746

8a02b253

[X86] Rename the autoupgraded of packed fp compare and fpclass intrinsics that... · 31cbe75b

Craig Topper authored Jun 27, 2018

[X86] Rename the autoupgraded of packed fp compare and fpclass intrinsics that don't take a mask as input to exclude '.mask.' from their name.

I think the intrinsics named 'avx512.mask.' should refer to the previous behavior of taking a mask argument in the intrinsic instead of using a 'select' or 'and' instruction in IR to accomplish the masking. This is more consistent with the goal that eventually we will have no intrinsics that have masking builtin. When we reach that goal, we should have no intrinsics named "avx512.mask".

llvm-svn: 335744

31cbe75b

[AMDGPU] Convert rcp to rcp_iflag · 1a1687f1

Stanislav Mekhanoshin authored Jun 27, 2018

If the source of an rcp instruction is the result of any conversion
from an integer, convert it into an rcp_iflag instruction. No FP
exception can ever happen except division by zero if a single precision
rcp argument is a representation of an integral number.

Differential Revision: https://reviews.llvm.org/D48569

llvm-svn: 335742

1a1687f1

[AArch64] Reverting FP16 vcvth_n_s64_f16 to fix · 31632715
Luke Geeson authored Jun 27, 2018
```
llvm-svn: 335737
```
31632715

[AArch64] Add custom lowering for v4i8 trunc store · cadcfed7

Adhemerval Zanella authored Jun 27, 2018

This patch adds a custom trunc store lowering for v4i8 vector types.
Since there is no v.4b register, the v4i8 is promoted to v4i16 (v.4h)
and the default action for v4i8 is to extract each element and issue 4
byte stores.

A better strategy would be to extend the promoted v4i16 to v8i16
(with undef elements) and extract and store the word lane which
represents the v4i8 subvector. The construction:

  define void @foo(<4 x i16> %x, i8* nocapture %p) {
    %0 = trunc <4 x i16> %x to <4 x i8>
    %1 = bitcast i8* %p to <4 x i8>*
    store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2
    ret void
  }

Can be optimized from:

  umov    w8, v0.h[3]
  umov    w9, v0.h[2]
  umov    w10, v0.h[1]
  umov    w11, v0.h[0]
  strb    w8, [x0, #3]
  strb    w9, [x0, #2]
  strb    w10, [x0, #1]
  strb    w11, [x0]
  ret

To:

  xtn     v0.8b, v0.8h
  str     s0, [x0]
  ret

The patch also adjusts the memory cost for autovectorization, so the C
code:

  void foo (const int *src, int width, unsigned char *dst)
  {
    for (int i = 0; i < width; i++)
       *dst++ = *src++;
  }

can be vectorized to:

  .LBB0_4:                                // %vector.body
                                          // =>This Inner Loop Header: Depth=1
        ldr     q0, [x0], #16
        subs    x12, x12, #4            // =4
        xtn     v0.4h, v0.4s
        xtn     v0.8b, v0.8h
        st1     { v0.s }[0], [x2], #4
        b.ne    .LBB0_4

Instead of byte operations.

llvm-svn: 335735

cadcfed7

[NEON] Support vldNq intrinsics in AArch32 (LLVM part) · 7231598f

Ivan A. Kosarev authored Jun 27, 2018

This patch adds support for the q versions of the dup
(load-to-all-lanes) NEON intrinsics, such as vld2q_dup_f16() for
example.

Currently, non-q versions of the dup intrinsics are implemented
in clang by generating IR that first loads the elements of the
structure into the first lane with the lane (to-single-lane)
intrinsics, and then propagating it to the other lanes. There are at
least two problems with this approach. First, there are no
double-spaced to-single-lane byte-element instructions. For
example, there is no such instruction as 'vld2.8 { d0[0], d2[0]
}, [r0]'. That means we cannot rely on the to-single-lane
intrinsics and instructions to implement the q versions of the
dup intrinsics. Note that to-all-lanes instructions do support
all sizes of data items, including bytes.

The second problem with the current approach is that we need a
separate vdup instruction to propagate the structure to each
lane. So for vld4q_dup_f16() we would need four vdup instructions
in addition to the initial vld instruction.

This patch introduces dup LLVM intrinsics and reworks handling of
the currently supported (non-q) NEON dup intrinsics to expand
them into those LLVM intrinsics, thus eliminating the need for
using to-single-lane intrinsics and instructions.

Additionally, this patch adds support for u64 and s64 dup NEON
intrinsics. These are marked as AArch64-only in the ARM NEON
Reference, but it seems there are no reasons to not support them
in AArch32 mode. Please correct me if that is wrong.

This is what we generate with this patch applied:

vld2q_dup_f16:
  vld2.16 {d0[], d2[]}, [r0]
  vld2.16 {d1[], d3[]}, [r0]

vld3q_dup_f16:
  vld3.16 {d0[], d2[], d4[]}, [r0]
  vld3.16 {d1[], d3[], d5[]}, [r0]

vld4q_dup_f16:
  vld4.16 {d0[], d2[], d4[], d6[]}, [r0]
  vld4.16 {d1[], d3[], d5[], d7[]}, [r0]

Differential Revision: https://reviews.llvm.org/D48439

llvm-svn: 335733

7231598f

[ValueLattice] Return false if value range did not change in mergeIn. · f681413e
Florian Hahn authored Jun 27, 2018
```
llvm-svn: 335729
```
f681413e
[DAGCombiner] visitSDIV - add special case handling for (sdiv X, 1) -> X in pow2 expansion · d3e583a5
Simon Pilgrim authored Jun 27, 2018
```
For divisor = 1, perform a select of X - reduces scalarisation of simple SDIVs

llvm-svn: 335727
```
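
As background, a scalar sketch of a shift-based signed division by a power of two shows one plausible reason a divisor of 1 wants separate handling: the rounding bias is formed with a shift by (bit width - k), which is out of range when k is 0. This is an illustration under that assumption, not the DAGCombiner code, and `sdivPow2` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Signed division of x by 2^k, rounding toward zero like sdiv.
// Assumes >> on a negative int32_t is an arithmetic shift (guaranteed since
// C++20 and the behaviour of mainstream compilers before that).
inline int32_t sdivPow2(int32_t x, unsigned k) {
  assert(k <= 30 && "divisor must be a positive power of two in int32_t range");
  if (k == 0)
    return x; // divisor 1: the shift by (32 - k) below would be out of range
  // All ones if x is negative, zero otherwise.
  uint32_t signMask = static_cast<uint32_t>(x >> 31);
  // 2^k - 1 for negative x, 0 otherwise; makes the final shift round to zero.
  int32_t bias = static_cast<int32_t>(signMask >> (32 - k));
  return (x + bias) >> k;
}
```

In a vector expansion where each lane has its own power-of-two divisor, the early return has no direct equivalent, which is where selecting X for the lanes whose divisor is 1 would come in.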
d3e583a5
Build TaskQueueTest in threads=on builds, fixes regression from r335608. · 0d63dbbc
Nico Weber authored Jun 27, 2018
```
llvm-svn: 335724
```
0d63dbbc

[llvm-mca] Avoid calling method update() on instructions that are already in... · eb1bef60

Andrea Di Biagio authored Jun 27, 2018

[llvm-mca] Avoid calling method update() on instructions that are already in the IS_READY state. NFCI

When promoting instructions from the wait queue to the ready queue, we should
check if an instruction has already reached the IS_READY state before
calling method update().

llvm-svn: 335722

eb1bef60