Commits · cac8ce06dd446dff8b8256bf19c0f3f46dbdf361 · Roger Ferrer / llvm-epi

Feb 16, 2017

GlobalISel: legalize va_arg on AArch64. · 9136617a

Tim Northover authored Feb 15, 2017

Uses a Custom implementation because the slot sizes being a multiple of the
pointer size isn't really universal, even for the architectures that do have a
simple "void *" va_list.

llvm-svn: 295255

9136617a
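
As a rough illustration of why the slot size matters for the va_arg legalization above, here is a minimal sketch of stepping through a simple "void *" style va_list. The function names and the explicit `slot_size` parameter are assumptions made for this example; they are not taken from the patch.

```python
from typing import Tuple


def align_to(value: int, align: int) -> int:
    """Round value up to the next multiple of align (align must be a power of two)."""
    return (value + align - 1) & ~(align - 1)


def va_arg(va_list_ptr: int, size: int, align: int, slot_size: int) -> Tuple[int, int]:
    """Return (address of the argument, updated va_list pointer).

    slot_size is passed in explicitly because it is target-specific and is
    not guaranteed to be a multiple of the pointer size on every target.
    """
    arg_addr = align_to(va_list_ptr, align)          # honor the argument's alignment
    next_ptr = arg_addr + align_to(size, slot_size)  # consume whole stack slots
    return arg_addr, next_ptr
```

With 8-byte slots, a 4-byte argument still advances the pointer by 8, while a 16-byte-aligned argument first skips ahead to the next 16-byte boundary.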

GlobalISel: support translating va_arg · 4a652227

Tim Northover authored Feb 15, 2017

Since (say) i128 and [16 x i8] map to the same type in generic MIR, we also
need to attach the required alignment info.

llvm-svn: 295254

4a652227

Implement intrinsic mangling for literal struct types. · 3c1432fe

Daniel Berlin authored Feb 15, 2017

Fixes PR 31921

Summary:
PredicateInfo requires an ugly workaround to try to avoid literal
struct types, because intrinsic mangling is not implemented for them.
This workaround does not work in all cases (you can hit the
assert by bootstrapping with -print-predicateinfo), and can't be made
to work without DFS'ing the type (i.e., copying getMangledStr and using a
version that detects if it would crash).

Rather than do that, I just implemented the mangling. It seems
simple, since they are unified structurally.

Looking at the overloaded-mangling testcase we have, it actually turns
out the gc intrinsics will *also* crash if you try to use a literal
struct.  Thus, the testcase added fails before this patch, and works
after, without needing to resort to predicateinfo.

Reviewers: chandlerc, davide

Subscribers: llvm-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D29925

llvm-svn: 295253

3c1432fe
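
The idea of mangling a literal struct structurally (by walking its element types) can be sketched as follows. The tuple-based type representation and the exact prefix and suffix strings are assumptions for illustration only and should not be read as the spelling used by the patch.

```python
def mangle(ty):
    """Build a mangled name for a toy type description (hypothetical scheme)."""
    kind = ty[0]
    if kind == "int":              # ("int", bit_width)
        return f"i{ty[1]}"
    if kind == "array":            # ("array", element_count, element_type)
        return f"a{ty[1]}{mangle(ty[2])}"
    if kind == "named_struct":     # ("named_struct", name)
        return f"s_{ty[1]}s"
    if kind == "literal_struct":   # ("literal_struct", [element_types])
        # Literal structs have no name, so recurse into the elements.
        # The trailing "s" keeps nested structs distinguishable.
        return "sl_" + "".join(mangle(e) for e in ty[1]) + "s"
    raise ValueError(f"unsupported type kind: {kind}")
```

Because the name is derived purely from structure, two structurally identical literal structs mangle to the same string, which matches the observation that such types are unified structurally.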

Feb 15, 2017

AMDGPU: Remove dead node definitions · 824de226
Matt Arsenault authored Feb 15, 2017
```
llvm-svn: 295247
```
824de226
Fix typos · 900b21c3
Matt Arsenault authored Feb 15, 2017
```
llvm-svn: 295246
```
900b21c3
AMDGPU: Consolidate sendmsg/sendmsghalt handling and tests · a78ca62c
Matt Arsenault authored Feb 15, 2017
```
llvm-svn: 295244
```
a78ca62c
[Support] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). · 454d0cea
Eugene Zelenko authored Feb 15, 2017
```
llvm-svn: 295243
```
454d0cea
DAG: Do not scalarize fsub if fneg is legal · 5de8dc9c
Matt Arsenault authored Feb 15, 2017
```
Tests will be included with future commit.

llvm-svn: 295242
```
5de8dc9c
Re-apply r295110 and r295144 with a fix for the ASan issue. · 50cbd7cc
Peter Collingbourne authored Feb 15, 2017
```
llvm-svn: 295241
```
50cbd7cc
AMDGPU: Replace assert with report_fatal_error · d122abea
Matt Arsenault authored Feb 15, 2017
```
Also use a more refined condition.

llvm-svn: 295239
```
d122abea

[GlobalObject] Fix setSection("") · 5e1e5918

Keno Fischer authored Feb 15, 2017

Summary:
In rL291613, the section name was interned in LLVMContext. However,
this broke the ability to remove the section from a GlobalObject,
because it tried to intern empty strings, which is not allowed.
Fix that and add an appropriate regression test.

Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D29795

llvm-svn: 295238

5e1e5918
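
The failure mode described in the setSection("") fix can be modeled with a small sketch: an interning pool that rejects empty strings, and a setter that treats the empty name as "remove the section" instead of forwarding it to the pool. The class and method names below are hypothetical stand-ins, not the LLVM API.

```python
class Context:
    """Toy string-interning pool that, like the one described, rejects empty names."""

    def __init__(self):
        self._pool = {}

    def intern(self, name):
        if not name:
            raise ValueError("cannot intern an empty section name")
        return self._pool.setdefault(name, name)


class GlobalObject:
    def __init__(self, ctx):
        self._ctx = ctx
        self._section = None

    def set_section(self, name):
        if not name:
            # An empty name means "no section"; never hand it to the pool.
            self._section = None
            return
        self._section = self._ctx.intern(name)

    def has_section(self):
        return self._section is not None
```

A regression test in this model sets a section, clears it with an empty string, and checks that no error is raised and the section is gone.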

[InstCombine] improve formatting; NFC · 845ea963
Sanjay Patel authored Feb 15, 2017
```
llvm-svn: 295237
```
845ea963

AssumptionCache: Disable the verifier by default, move it behind a hidden... · 9421c2dc

Peter Collingbourne authored Feb 15, 2017

AssumptionCache: Disable the verifier by default, move it behind a hidden cl::opt and verify from releaseMemory().

This is a short term solution to the problem that many passes currently fail
to update the assumption cache. In the long term the verifier should not
be controllable with a flag. We should either fix all passes to correctly
update the assumption cache and enable the verifier unconditionally or
somehow arrange for the assumption list to be updated automatically by passes.

Differential Revision: https://reviews.llvm.org/D30003

llvm-svn: 295236

9421c2dc

[X86][SSE] Don't call EltsFromConsecutiveLoads if any element is missing. · 5b4c30fb

Simon Pilgrim authored Feb 15, 2017

Minor performance speedup: if any call to getShuffleScalarElt fails to get a result, don't bother calling it for the remaining elements, as EltsFromConsecutiveLoads will fail anyhow.

llvm-svn: 295235

5b4c30fb

AddressSanitizer: don't track swifterror memory addresses · 8d61e003

Arnold Schwaighofer authored Feb 15, 2017

They are register promoted by ISel and so it makes no sense to treat them as
memory.

Inserting calls to the address sanitizer would also generate invalid IR.

You would hit:

"swifterror value can only be loaded and stored from, or as a swifterror
argument!"

llvm-svn: 295230

8d61e003
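
Conceptually, the sanitizer change amounts to filtering swifterror addresses out of the set of memory accesses that get instrumented. The sketch below models that with a hypothetical `Access` record; it is not the pass's actual data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    kind: str                    # "load" or "store"
    address: str                 # name of the accessed value
    is_swifterror: bool = False  # True if the address is a swifterror value


def accesses_to_instrument(accesses):
    """Keep only accesses that really touch memory.

    swifterror values are promoted to registers during instruction selection,
    so instrumenting them would be both pointless and invalid.
    """
    return [a for a in accesses if not a.is_swifterror]
```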

[AArch64] Make am_ldrlit an iPTR - not OtherVT - operand. NFC-ish. · f8acf568

Ahmed Bougacha authored Feb 15, 2017

am_ldrlit diverged from am_brcond in r207105, but kept the OtherVT
operand type.  It made sense for branch targets, as those are
represented as MVT::Other in SDAG.  But loads operate on pointers.

This shouldn't have an observable effect on any in-tree code, but helps
make the patterns consistent for external users.

llvm-svn: 295229

f8acf568

[OptDiag] Pass const Values/Types to Argument. NFC. · 36026006
Ahmed Bougacha authored Feb 15, 2017
```
llvm-svn: 295228
```
36026006
[IR] Accept 'const Type &' in the Type operator<<. NFC. · f9e5a1dd
Ahmed Bougacha authored Feb 15, 2017
```
Type::print is const; there's no reason for the operator not to be.

llvm-svn: 295227
```
f9e5a1dd

[LTO] Add ability to emit assembly to new LTO API · f454b9ea

Tobias Edler von Koch authored Feb 15, 2017

Summary:
Add a field to LTO::Config, CGFileType, to select the file type to emit (object
or assembly). This is useful for testing and to implement -save-temps.

Reviewers: tejohnson, mehdi_amini, pcc

Reviewed By: mehdi_amini

Subscribers: davide, llvm-commits

Differential Revision: https://reviews.llvm.org/D29475

llvm-svn: 295226

f454b9ea

Codegen: Make chains from trellis-shaped CFGs · 7fbec9bd

Kyle Butt authored Feb 15, 2017

Lay out trellis-shaped CFGs optimally.
A trellis of the shape below:

  A     B
  |\   /|
  | \ / |
  |  X  |
  | / \ |
  |/   \|
  C     D

would be laid out A; B->C; D by the current layout algorithm. Now we identify
trellises and lay them out either A->C; B->D or A->D; B->C. This scales with an
increasing number of predecessors. A trellis is a group of 2 or more
predecessor blocks that all have the same successors.

Because of this, we can tail-duplicate to extend existing trellises.
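
The trellis definition above can be sketched directly: group blocks by their successor set and keep the groups with at least two members. The second helper picks one fallthrough successor per predecessor by brute force over edge weights. The weight table is a hypothetical input, and the brute-force search is a stand-in for whatever selection the patch actually performs.

```python
from collections import defaultdict
from itertools import permutations


def find_trellises(successors):
    """Return (predecessors, successors) pairs for every trellis in the CFG.

    successors maps each block name to the list of its successor blocks.
    """
    groups = defaultdict(list)
    for block, succs in successors.items():
        if len(succs) >= 2:
            groups[frozenset(succs)].append(block)
    return [(sorted(preds), sorted(succs))
            for succs, preds in groups.items() if len(preds) >= 2]


def best_fallthroughs(preds, succs, weight):
    """Pair each predecessor with a distinct successor, maximizing total weight.

    Assumes len(preds) <= len(succs); weight maps (pred, succ) to a frequency.
    """
    best = max(permutations(succs, len(preds)),
               key=lambda perm: sum(weight[(p, s)] for p, s in zip(preds, perm)))
    return list(zip(preds, best))
```

For the A/B/C/D trellis drawn above, the two candidate pairings are exactly A->C; B->D and A->D; B->C, and the heavier pair of edges wins.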

As an example consider the following CFG:

    B   D   F   H
   / \ / \ / \ / \
  A---C---E---G---Ret

Where A,C,E,G are all small (Currently 2 instructions).

The CFG preserving layout is then A,B,C,D,E,F,G,H,Ret.

The current code will copy C into B, E into D and G into F and yield the layout
A,C,B(C),E,D(E),G,F(G),H,Ret

define void @straight_test(i32 %tag) {
entry:
  br label %test1
test1: ; A
  %tagbit1 = and i32 %tag, 1
  %tagbit1eq0 = icmp eq i32 %tagbit1, 0
  br i1 %tagbit1eq0, label %test2, label %optional1
optional1: ; B
  call void @a()
  br label %test2
test2: ; C
  %tagbit2 = and i32 %tag, 2
  %tagbit2eq0 = icmp eq i32 %tagbit2, 0
  br i1 %tagbit2eq0, label %test3, label %optional2
optional2: ; D
  call void @b()
  br label %test3
test3: ; E
  %tagbit3 = and i32 %tag, 4
  %tagbit3eq0 = icmp eq i32 %tagbit3, 0
  br i1 %tagbit3eq0, label %test4, label %optional3
optional3: ; F
  call void @c()
  br label %test4
test4: ; G
  %tagbit4 = and i32 %tag, 8
  %tagbit4eq0 = icmp eq i32 %tagbit4, 0
  br i1 %tagbit4eq0, label %exit, label %optional4
optional4: ; H
  call void @d()
  br label %exit
exit:
  ret void
}

Here is the layout after D27742:
straight_test:                          # @straight_test
; ... Prologue elided
; BB#0:                                 # %entry ; A (merged with test1)
; ... More prologue elided
	mr 30, 3
	andi. 3, 30, 1
	bc 12, 1, .LBB0_2
; BB#1:                                 # %test2 ; C
	rlwinm. 3, 30, 0, 30, 30
	beq	 0, .LBB0_3
	b .LBB0_4
.LBB0_2:                                # %optional1 ; B (copy of C)
	bl a
	nop
	rlwinm. 3, 30, 0, 30, 30
	bne	 0, .LBB0_4
.LBB0_3:                                # %test3 ; E
	rlwinm. 3, 30, 0, 29, 29
	beq	 0, .LBB0_5
	b .LBB0_6
.LBB0_4:                                # %optional2 ; D (copy of E)
	bl b
	nop
	rlwinm. 3, 30, 0, 29, 29
	bne	 0, .LBB0_6
.LBB0_5:                                # %test4 ; G
	rlwinm. 3, 30, 0, 28, 28
	beq	 0, .LBB0_8
	b .LBB0_7
.LBB0_6:                                # %optional3 ; F (copy of G)
	bl c
	nop
	rlwinm. 3, 30, 0, 28, 28
	beq	 0, .LBB0_8
.LBB0_7:                                # %optional4 ; H
	bl d
	nop
.LBB0_8:                                # %exit ; Ret
	ld 30, 96(1)                    # 8-byte Folded Reload
	addi 1, 1, 112
	ld 0, 16(1)
	mtlr 0
	blr

The tail-duplication has produced some benefit, but it has also produced a
trellis which is not laid out optimally. With this patch, we improve the layouts
of such trellises, and decrease the cost calculation for tail-duplication
accordingly.

This patch produces the layout A,C,E,G,B,D,F,H,Ret. This layout does have
back edges, which is a negative, but it has a bigger compensating
positive, which is that it handles the case where there are long strings
of skipped blocks much better than the original layout. Both layouts
handle runs of executed blocks equally well. Branch prediction also
improves if there is any correlation between subsequent optional blocks.
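
To make the skipped-blocks claim concrete, here is a small model that counts non-fallthrough transitions for a given tag value. Block orders follow the two assembly listings in this message; counting only "next block is not adjacent in the layout" is a simplifying assumption and ignores everything else that affects performance.

```python
def executed_blocks(tag):
    """Blocks run by straight_test after tail duplication, in execution order.

    B, D and F each carry a copy of the following test (C, E and G).
    """
    path = ["A"]
    path.append("B" if tag & 1 else "C")
    path.append("D" if tag & 2 else "E")
    path.append("F" if tag & 4 else "G")
    if tag & 8:
        path.append("H")
    path.append("Ret")
    return path


def taken_branches(layout, path):
    """Count transitions whose target is not the next block in the layout."""
    position = {block: i for i, block in enumerate(layout)}
    return sum(1 for cur, nxt in zip(path, path[1:])
               if position[nxt] != position[cur] + 1)


# Block order of the listing after D27742, and of the listing with this patch.
BEFORE = ["A", "C", "B", "E", "D", "G", "F", "H", "Ret"]
AFTER = ["A", "C", "E", "G", "B", "D", "F", "H", "Ret"]
```

In this model, `tag == 0` (every optional block skipped) takes three non-fallthrough transitions in the earlier layout and one in the new layout.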

Here is the resulting concrete layout:

straight_test:                          # @straight_test
; BB#0:                                 # %entry ; A (merged with test1)
	mr 30, 3
	andi. 3, 30, 1
	bc 12, 1, .LBB0_4
; BB#1:                                 # %test2 ; C
	rlwinm. 3, 30, 0, 30, 30
	bne	 0, .LBB0_5
.LBB0_2:                                # %test3 ; E
	rlwinm. 3, 30, 0, 29, 29
	bne	 0, .LBB0_6
.LBB0_3:                                # %test4 ; G
	rlwinm. 3, 30, 0, 28, 28
	bne	 0, .LBB0_7
	b .LBB0_8
.LBB0_4:                                # %optional1 ; B (Copy of C)
	bl a
	nop
	rlwinm. 3, 30, 0, 30, 30
	beq	 0, .LBB0_2
.LBB0_5:                                # %optional2 ; D (Copy of E)
	bl b
	nop
	rlwinm. 3, 30, 0, 29, 29
	beq	 0, .LBB0_3
.LBB0_6:                                # %optional3 ; F (Copy of G)
	bl c
	nop
	rlwinm. 3, 30, 0, 28, 28
	beq	 0, .LBB0_8
.LBB0_7:                                # %optional4 ; H
	bl d
	nop
.LBB0_8:                                # %exit

Differential Revision: https://reviews.llvm.org/D28522

llvm-svn: 295223

7fbec9bd

include function name in dot filename · 538d6668
Xinliang David Li authored Feb 15, 2017
```
Differential Revision: http://reviews.llvm.org/D29975

llvm-svn: 295220
```
538d6668

ThreadSanitizer: don't track swifterror memory addresses · 8eb1a485

Arnold Schwaighofer authored Feb 15, 2017

They are register promoted by ISel and so it makes no sense to treat them as
memory.

Inserting calls to the thread sanitizer would also generate invalid IR.

You would hit:

"swifterror value can only be loaded and stored from, or as a swifterror
argument!"

llvm-svn: 295215

8eb1a485

[DAG] Don't try to create an INSERT_SUBVECTOR with an illegal source · ba80db39

Michael Kuperstein authored Feb 15, 2017

We currently can't legalize those, but we should really not be creating
them in the first place, since legalization would probably look similar to the
way we legalize CONCAT_VECTORS - basically replace the INSERT with a BUILD.

This fixes PR311956.

Differential Revision: https://reviews.llvm.org/D29961

llvm-svn: 295213

ba80db39
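
The remark that legalizing an INSERT_SUBVECTOR would look like replacing the INSERT with a BUILD can be illustrated on plain lists: inserting a subvector gives the same result as building the vector element by element from the two sources. This is a value-level sketch only and says nothing about how SelectionDAG nodes are actually constructed.

```python
def insert_subvector(vec, sub, index):
    """Overwrite len(sub) elements of vec starting at index."""
    result = list(vec)
    result[index:index + len(sub)] = sub
    return result


def as_build_vector(vec, sub, index):
    """Same value, expressed as an element-by-element build."""
    return [sub[i - index] if index <= i < index + len(sub) else vec[i]
            for i in range(len(vec))]
```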

Expose getBaseDiscriminatorFromDiscriminator,... · 726da628

Dehao Chen authored Feb 15, 2017

Expose getBaseDiscriminatorFromDiscriminator, getDuplicationFactorFromDiscriminator and getCopyIdentifierFromDiscriminator API so that downstream tools can use them to get the correct encoding.

Summary: Discriminators are now encoded with rich information. This patch exposes the encoding API to downstream tools.

Reviewers: davidxl, hfinkel

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29852

llvm-svn: 295210

726da628
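
A downstream tool would use the three accessors to split one discriminator value into its components. The sketch below shows the general shape of such a packed encoding. The field widths and bit positions are purely hypothetical and are not taken from the patch; only the three component names come from the commit message.

```python
# Hypothetical field widths, chosen only for illustration.
BASE_BITS = 8
FACTOR_BITS = 12
COPY_BITS = 12


def encode_discriminator(base, factor, copy_id):
    """Pack the three components into one integer (hypothetical layout)."""
    for value, bits in ((base, BASE_BITS), (factor, FACTOR_BITS), (copy_id, COPY_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError("component does not fit in its field")
    return base | (factor << BASE_BITS) | (copy_id << (BASE_BITS + FACTOR_BITS))


def base_discriminator(d):
    return d & ((1 << BASE_BITS) - 1)


def duplication_factor(d):
    return (d >> BASE_BITS) & ((1 << FACTOR_BITS) - 1)


def copy_identifier(d):
    return (d >> (BASE_BITS + FACTOR_BITS)) & ((1 << COPY_BITS) - 1)
```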

[Inline] add tests to show attribute information loss; NFC · 05621864
Sanjay Patel authored Feb 15, 2017
```
llvm-svn: 295209
```
05621864
[X86][SSE] Propagate undef upper elements from scalar_to_vector during shuffle combining · da25d5c7
Simon Pilgrim authored Feb 15, 2017
```
Only do this for integer types currently; for float types (in particular insertps), load folding often fails with this.

llvm-svn: 295208
```
da25d5c7

[AMDGPU] Revert failed scheduling · 582a5237

Stanislav Mekhanoshin authored Feb 15, 2017

This patch reverts a region's scheduling to the original untouched state
if we have decreased occupancy.

In addition it switches to use TargetRegisterInfo occupancy callback
for pressure limits instead of gradually increasing limits which were
just passed by. We are going to stay with the best schedule so we do
not need to tolerate worsened scheduling anymore.

Differential Revision: https://reviews.llvm.org/D29971

llvm-svn: 295206

582a5237
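
The revert-on-regression policy described above can be sketched as: remember the original order, try a schedule, and keep it only if occupancy did not drop. The `scheduler` and `occupancy` callables are hypothetical stand-ins for the real scheduler and the TargetRegisterInfo occupancy callback.

```python
def schedule_region(region, scheduler, occupancy):
    """Return the new schedule, or the untouched original if occupancy decreased."""
    original = list(region)            # keep the untouched order around
    before = occupancy(original)
    candidate = scheduler(original)
    if occupancy(candidate) < before:
        return original                # revert the failed scheduling
    return candidate
```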

Revert "[JumpThreading] Thread through guards" · 94c8d497

Anna Thomas authored Feb 15, 2017

This reverts commit r294617.

We fail on an assert while trying to get a condition from an
unconditional branch.

llvm-svn: 295200

94c8d497

[X86] Regenerate scalar stack reload test · d811bdd6
Simon Pilgrim authored Feb 15, 2017
```
llvm-svn: 295195
```
d811bdd6
Fix unittest for buildbot with mips host (32bit big endian) from r295174 · 4b21d022
David Bozier authored Feb 15, 2017
```
llvm-svn: 295188
```
4b21d022
[InlineFunction] use getFunction(); NFC · 288f075f
Sanjay Patel authored Feb 15, 2017
```
llvm-svn: 295185
```
288f075f
Fix spelling mistake - paramater -> parameter. NFCI. · 1746e215
Simon Pilgrim authored Feb 15, 2017
```
llvm-svn: 295182
```
1746e215
[InlineFunction] use getCaller(); NFCI · 32d753ca
Sanjay Patel authored Feb 15, 2017
```
llvm-svn: 295181
```
32d753ca
[InlineFunction] use range-for loop; NFCI · ada717e2
Sanjay Patel authored Feb 15, 2017
```
llvm-svn: 295179
```
ada717e2
[X86] Regenerate i64 ext-load on 32-bit target tests · a0e56d2d
Simon Pilgrim authored Feb 15, 2017
```
llvm-svn: 295177
```
a0e56d2d

Attempt to fix buildbots after commit of r295173. · 5c8e5f37

David Bozier authored Feb 15, 2017

The unit test needs to check the endianness of the host platform (the test was failing on big-endian hosts).

llvm-svn: 295174

5c8e5f37

Fix incorrect formatting of DataRefImpl members in operator<< function · 4ab9a06f

David Bozier authored Feb 15, 2017

Changed format specifiers to use the format macro constant for pointer type.
Moved the width part of the format specifier to the correct place for formatting members a and b.

Added a unit test to confirm the output.

Differential Revision: https://reviews.llvm.org/D28957

llvm-svn: 295173

4ab9a06f

[X86][SSE] Allow matchVectorShuffleWithUNPCK to recognise ZERO inputs · 0f0e5bd3

Simon Pilgrim authored Feb 15, 2017

Add support for specifying an UNPCK input as ZERO. This particularly improves ZEXT cases with non-zero offsets.

llvm-svn: 295169

0f0e5bd3
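
The connection between an unpack with a zero input and zero extension can be shown on plain byte lists: interleaving the low half of a vector with zeros yields, on a little-endian target, the same bytes as zero-extending each element to twice its width. The helper names are made up for this sketch, and only the low-half case is modeled.

```python
def unpack_lo(a, b):
    """Interleave the low halves of a and b: a0, b0, a1, b1, ..."""
    half = len(a) // 2
    out = []
    for x, y in zip(a[:half], b[:half]):
        out += [x, y]
    return out


def zext_lo_via_unpack(byte_vec):
    """Zero-extend the low-half bytes to 16 bits using an unpack with zero."""
    zero = [0] * len(byte_vec)
    interleaved = unpack_lo(byte_vec, zero)
    # Little-endian: each (low byte, high byte) pair forms one 16-bit element.
    return [lo | (hi << 8) for lo, hi in zip(interleaved[0::2], interleaved[1::2])]
```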

[LLVM][XRAY][MIPS] Support xray on mips/mipsel/mips64/mips64el · ec657929

Sagar Thakur authored Feb 15, 2017

Summary: Adds support for xray instrumentation on mips for both 32-bit and 64-bit.

Reviewed by sdardis, dberris
Differential: D27697

llvm-svn: 295164

ec657929

Revert r295110 and r295144. · eef9b033

Daniel Jasper authored Feb 15, 2017

This fails under ASAN:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/798/steps/check-llvm%20asan/logs/stdio

llvm-svn: 295162

eef9b033