Commits · 4c77a696ae4d42d791b7443ce387d9f42197e10d · Roger Ferrer / llvm-epi

Jul 17, 2019

Update email address. · 4c77a696
Qiu Chaofan authored Jul 17, 2019
```
llvm-svn: 366291
```
4c77a696
gn build: Merge r366265 · 67cf3d61
Nico Weber authored Jul 17, 2019
```
llvm-svn: 366289
```
67cf3d61
gn build: Merge r366216 · 420f3f64
Nico Weber authored Jul 17, 2019
```
llvm-svn: 366288
```
420f3f64
[AMDGPU] Autogenerate register asm names · e5012ab3
Stanislav Mekhanoshin authored Jul 16, 2019
```
Differential Revision: https://reviews.llvm.org/D64839

llvm-svn: 366283
```
e5012ab3
ARM: Fix missing immarg for space intrinsic · 1bd9c654
Matt Arsenault authored Jul 16, 2019
```
llvm-svn: 366280
```
1bd9c654

GlobalISel: Add overload of handleAssignments with CCState · 1c3f4ec7

Matt Arsenault authored Jul 16, 2019

AMDGPU needs to allocate special argument registers separately from
the user function argument list, so needs direct control over the
CCState.

The ArgLocs argument is only really necessary because CCState doesn't
allow access to it.

llvm-svn: 366279

1c3f4ec7

[TableGen] Generate offsets into a flat array for getOperandType · 418516c7

Justin Bogner authored Jul 16, 2019

Rather than an array of std::initializer_list, generate a table of
offsets and a flat array of the operands for getOperandType. This is a
bit more efficient on platforms that don't manage to get the array of
initializer_lists initialized at link time (I'm looking at you
macOS). It's also quite a bit faster to compile.
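The flat layout described above can be sketched as follows. This is a minimal C++ sketch, not the code InstrInfoEmitter actually emits: the names `OpTypes`, `Offsets` and `OpcodeOperandTypes`, and the three-opcode table, are hypothetical. One offsets array indexed by opcode points into a single flat array of operand types, and `Offsets[Opcode + 1]` marks where that opcode's operands end. Both arrays are plain integer data, so they can be constant-initialized without any runtime constructors.

```cpp
#include <cstdint>

// Hypothetical operand-type IDs; the real enum is generated by TableGen.
namespace OpTypes {
enum OperandType : int16_t { IMM = 0, REG = 1, MEM = 2 };
}

// Hypothetical generated tables for three opcodes:
//   opcode 0 -> {REG, REG, IMM}, opcode 1 -> {REG, MEM}, opcode 2 -> {}
// Offsets[Opcode] is where that opcode's operands start in the flat array;
// Offsets[Opcode + 1] is where they end.
static const uint16_t Offsets[] = {0, 3, 5, 5};
static const int16_t OpcodeOperandTypes[] = {
    OpTypes::REG, OpTypes::REG, OpTypes::IMM, // opcode 0
    OpTypes::REG, OpTypes::MEM,               // opcode 1
};

// Returns the operand type, or -1 if OpIdx is out of range for Opcode.
int16_t getOperandType(uint16_t Opcode, uint16_t OpIdx) {
  unsigned Begin = Offsets[Opcode];
  unsigned End = Offsets[Opcode + 1];
  if (Begin + OpIdx >= End)
    return -1;
  return OpcodeOperandTypes[Begin + OpIdx];
}
```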

llvm-svn: 366278

418516c7

[WebAssembly] Compile all TLS on Emscripten as local-exec · 0a8d4df7

Guanzhong Chen authored Jul 16, 2019

Summary:
Currently, on Emscripten, dynamic linking is not supported with threads.
This means that if thread-local storage is used, it must be used in a
statically-linked executable. Hence, local-exec is the only possible model.

This diff compiles all TLS variables to use local-exec on Emscripten as a
temporary measure until dynamic linking is supported with threads.

The goal for this is to allow C++ types with constructors to be thread-local.

Currently, when `clang` compiles a `thread_local` variable with a constructor,
it generates `__tls_guard` variable:

    @__tls_guard = internal thread_local global i8 0, align 1

As no TLS model is specified, this is treated as general-dynamic, which we do
not support (and cannot support without implementing dynamic linking support
with threads in Emscripten). As a result, any C++ constructor in `thread_local`
variables would not compile.

By compiling all `thread_local` as local-exec, `__tls_guard` will compile and
we can support C++ constructors with TLS without implementing dynamic linking
with threads.
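For illustration, here is a plain C++ example of the kind of variable this change is meant to make compile on Emscripten: a `thread_local` object whose type has a constructor. The names are hypothetical. The constructor cannot run at compile time, so clang emits a per-thread guard (the `__tls_guard` variable shown above) to ensure it runs at most once per thread. `ConstructionCount` makes that observable in a single-threaded run.

```cpp
#include <string>

// Counts how many times the constructor has run (across all threads).
static int ConstructionCount = 0;

struct Greeting {
  std::string Text;
  Greeting() : Text("hello") { ++ConstructionCount; }
};

// A thread_local object with a non-trivial constructor. The compiler keeps a
// per-thread "already initialized" flag so the constructor runs once per thread.
thread_local Greeting PerThreadGreeting;

// Every access goes through the per-thread initialization check.
const std::string &greeting() { return PerThreadGreeting.Text; }
```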

Depends on D64537

Reviewers: tlively, aheejin, sbc100

Reviewed By: aheejin

Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64776

llvm-svn: 366275

0a8d4df7

[TableGen] Add "getOperandType" to get operand types from opcode/opidx · fe66fdb8

Justin Bogner authored Jul 16, 2019

The InstrInfoEmitter outputs an enum called "OperandType" which gives
numerical IDs to each operand type. This patch makes use of this enum
to define a function called "getOperandType", which allows looking up
the type of an operand given its opcode and operand index.
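A sketch of the lookup described here, using the array-of-`std::initializer_list` layout that commit 418516c7 later replaced with a flat table. The enum values and the two-opcode table are hypothetical, and the real generated function's exact signature may differ. The sketch only illustrates the indexing: first by opcode, then by operand index.

```cpp
#include <cstdint>
#include <initializer_list>

// Hypothetical stand-in for the generated OperandType enum.
namespace OpTypes {
enum OperandType : int16_t { IMM = 0, REG = 1, MEM = 2 };
}

// One initializer_list per opcode, indexed by opcode number.
static const std::initializer_list<OpTypes::OperandType> OperandTypesByOpcode[] = {
    {OpTypes::REG, OpTypes::REG, OpTypes::IMM}, // opcode 0
    {OpTypes::REG, OpTypes::MEM},               // opcode 1
};

// Look up the type of operand OpIdx of instruction Opcode.
// Returns -1 if OpIdx is past the end of the opcode's operand list.
int16_t getOperandType(uint16_t Opcode, uint16_t OpIdx) {
  const auto &Ops = OperandTypesByOpcode[Opcode];
  if (OpIdx >= Ops.size())
    return -1;
  return Ops.begin()[OpIdx];
}
```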

Patch by Nicolas Guillemot. Thanks!

Differential Revision: https://reviews.llvm.org/D63320

llvm-svn: 366274

fe66fdb8

[WebAssembly] Implement thread-local storage (local-exec model) · 42bba4b8

Guanzhong Chen authored Jul 16, 2019

Summary:
Thread local variables are placed inside a `.tdata` segment. Their symbols are
offsets from the start of the segment. The address of a thread local variable
is computed as `__tls_base` + the offset from the start of the segment.

The `.tdata` segment is a passive segment, and `memory.init` is used once per thread
to initialize the thread-local storage.

`__tls_base` is a wasm global. Since each thread has its own wasm instance,
it is effectively thread local. Currently, `__tls_base` must be initialized
at thread startup, and so cannot be used with dynamic libraries.

`__tls_base` is to be initialized with a new linker-synthesized function,
`__wasm_init_tls`, which takes as an argument a block of memory to use as the
storage for thread locals. It then initializes the block of memory and sets
`__tls_base`. As `__wasm_init_tls` will handle the memory initialization,
the memory does not have to be zeroed.

To help allocate memory for thread-local storage, a new compiler intrinsic
is introduced: `__builtin_wasm_tls_size()`. This intrinsic function returns
the size of the thread-local storage for the current function.

The expected usage is to run something like the following upon thread startup:

    __wasm_init_tls(malloc(__builtin_wasm_tls_size()));
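To make the address computation and the initialization order concrete, here is a host-side C++ model of the scheme. Every name in it is a hypothetical stand-in: `kTdataImage` plays the `.tdata` segment, `TlsBase` plays the `__tls_base` global, `tlsSize()` plays `__builtin_wasm_tls_size()`, and `initTls()` plays `__wasm_init_tls`, using `memcpy` where the real function uses `memory.init`. It is not wasm code and does not model one instance per thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical contents of the passive .tdata segment: two 32-bit
// thread-locals, at offsets 0 and 4 from the start of the segment.
static const int32_t kTdataImage[] = {7, 42};

// Stand-in for the __tls_base wasm global. In the real design each thread
// has its own wasm instance, so each thread sees its own copy of this value.
static unsigned char *TlsBase = nullptr;

// Stand-in for __builtin_wasm_tls_size().
std::size_t tlsSize() { return sizeof(kTdataImage); }

// Stand-in for __wasm_init_tls: copy the .tdata image into the caller's block
// and point the TLS base at it. The block need not be zeroed beforehand
// because it is fully overwritten here.
void initTls(void *Block) {
  std::memcpy(Block, kTdataImage, sizeof(kTdataImage));
  TlsBase = static_cast<unsigned char *>(Block);
}

// Address of a thread-local = TLS base + the symbol's offset in the segment.
void *tlsAddress(std::size_t Offset) { return TlsBase + Offset; }

// Reads a 32-bit thread-local at the given segment offset.
int32_t readTlsI32(std::size_t Offset) {
  int32_t Value;
  std::memcpy(&Value, tlsAddress(Offset), sizeof(Value));
  return Value;
}
```

Mirroring the expected startup sequence, a thread would call `initTls(std::malloc(tlsSize()))`, after which `readTlsI32(4)` returns 42.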

Reviewers: tlively, aheejin, kripken, sbc100

Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, jfb, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D64537

llvm-svn: 366272

42bba4b8

AMDGPU: Partially revert r366250 · 21f2858d

Matt Arsenault authored Jul 16, 2019

GCCBuiltin doesn't work for these, because they have a mangled type
(although they arguably should not).

llvm-svn: 366271

21f2858d

Jul 16, 2019

[ORC][docs] Fix an RST error: the code-block directive needs a newline after it. · c23619b0
Lang Hames authored Jul 16, 2019
```
llvm-svn: 366270
```
c23619b0
[ORC][docs] Trim ORCv1 to ORCv2 transition section, add a how-to section. · 607cd44b
Lang Hames authored Jul 16, 2019
```
llvm-svn: 366269
```
607cd44b

[x86] use more phadd for reductions · d746a210

Sanjay Patel authored Jul 16, 2019

This is part of what is requested by PR42023:
https://bugs.llvm.org/show_bug.cgi?id=42023

There's an extension needed for FP add, but exactly how we would specify
that using flags is not clear to me, so I left that as a TODO.
We're still missing patterns for partial reductions when the input vector
is 256-bit or 512-bit, but I think that's a failure of vector narrowing.
If we can reduce the widths, then this matching should work on those tests.
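As a reminder of why `phadd` suits reductions, here is a small C++ model of the lane semantics of SSSE3 `phaddd`, applied twice to sum a 4 x i32 vector. It is a scalar emulation with hypothetical helper names, not the backend's pattern matching, and it says nothing about which IR shuffle sequences the patch actually matches.

```cpp
#include <array>
#include <cstdint>

using V4 = std::array<uint32_t, 4>;

// Emulates the lane semantics of PHADDD a, b: sums of adjacent pairs of a
// go to the low half of the result, sums of adjacent pairs of b to the high
// half. Unsigned lanes give the same wrap-around behaviour as the hardware.
V4 phaddd(const V4 &A, const V4 &B) {
  return {A[0] + A[1], A[2] + A[3], B[0] + B[1], B[2] + B[3]};
}

// Full horizontal sum of a 4 x i32 vector using two horizontal adds.
uint32_t reduceAdd(const V4 &V) {
  V4 H1 = phaddd(V, V);   // {v0+v1, v2+v3, v0+v1, v2+v3}
  V4 H2 = phaddd(H1, H1); // lane 0 = v0+v1+v2+v3
  return H2[0];
}
```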

Differential Revision: https://reviews.llvm.org/D64760

llvm-svn: 366268

d746a210

DWARF: Skip zero column for inline call sites · 40580d36

David Blaikie authored Jul 16, 2019

D64033 <https://reviews.llvm.org/D64033> added DW_AT_call_column for
inline sites. However, that change wasn't aware of "-gno-column-info".
To avoid adding column info when "-gno-column-info" is used, now
DW_AT_call_column is only added when we have non-zero column (when
"-gno-column-info" is used, column will be zero).
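The rule can be sketched in C++ under simplifying assumptions. The types and the string-keyed attribute list are hypothetical stand-ins, not LLVM's DWARF emission API, and other call-site attributes such as `DW_AT_call_file` are omitted. A zero column is treated as "no column information", so `DW_AT_call_column` is simply left out.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model of an inlined call site's source location.
struct InlinedAt {
  unsigned Line;
  unsigned Column; // 0 when column info is unavailable (e.g. -gno-column-info)
};

using Attribute = std::pair<std::string, unsigned>;

// Builds the call-site attributes for an inlined-subroutine entry.
// DW_AT_call_column is emitted only for a non-zero column.
std::vector<Attribute> callSiteAttributes(const InlinedAt &Site) {
  std::vector<Attribute> Attrs;
  Attrs.emplace_back("DW_AT_call_line", Site.Line);
  if (Site.Column != 0)
    Attrs.emplace_back("DW_AT_call_column", Site.Column);
  return Attrs;
}
```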

Patch by Wenlei He!

Differential Revision: https://reviews.llvm.org/D64784

llvm-svn: 366264

40580d36

AMDGPU/GlobalISel: Select G_ASHR · f8c82844
Matt Arsenault authored Jul 16, 2019
```
llvm-svn: 366257
```
f8c82844
AMDGPU/GlobalISel: Select G_LSHR · e5b28b98
Matt Arsenault authored Jul 16, 2019
```
llvm-svn: 366256
```
e5b28b98

[PowerPC][HTM] Fix impossible reg-to-reg copy assert with ttest builtin · 65e34a31

Jinsong Ji authored Jul 16, 2019

Summary:
This was exposed by our internal testing.
The reduced testcase asserts with "Impossible reg-to-reg copy".

We can't use COPY to do 32-bit to 64-bit conversion.

Reviewers: kbarton, hfinkel, nemanjai

Reviewed By: hfinkel

Subscribers: hiraditya, MaskRay, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64499

llvm-svn: 366255

65e34a31

AMDGPU/GlobalISel: Select G_SHL · 1b69fd27

Matt Arsenault authored Jul 16, 2019

I think this manages to not break the DAG handling with the divergent
predicates because the standalone divergent patterns end up with a
higher priority than the pattern on the instruction definition.

The 16-bit versions don't work yet.

llvm-svn: 366254

1b69fd27

[AMDGPU] Change register type for v32 vectors · 6e0fa292

Stanislav Mekhanoshin authored Jul 16, 2019

When it is AReg_1024, this results in unnecessary copying of
32-element vectors into AGPRs even though they are not intended
for an mfma instruction.

Differential Revision: https://reviews.llvm.org/D64815

llvm-svn: 366252

6e0fa292

Fix -Wreturn-type warning. NFC. · ccf22ef9
Michael Liao authored Jul 16, 2019
```
llvm-svn: 366251
```
ccf22ef9
AMDGPU: Fix some missing GCCBuiltin declarations · afdf6b3c
Matt Arsenault authored Jul 16, 2019
```
llvm-svn: 366250
```
afdf6b3c
AMDGPU/GlobalISel: Fix selection of private stores · 2d104077
Matt Arsenault authored Jul 16, 2019
```
llvm-svn: 366249
```
2d104077
AMDGPU/GlobalISel: Select private loads · 7161fb0b
Matt Arsenault authored Jul 16, 2019
```
llvm-svn: 366248
```
7161fb0b
AMDGPU/GlobalISel: Select flat stores · dad1f892
Matt Arsenault authored Jul 16, 2019
```
llvm-svn: 366246
```
dad1f892

AMDGPU: Add register classes to flat store patterns · 7eb1902c

Matt Arsenault authored Jul 16, 2019

For some reason GlobalISelEmitter needs register classes to import
these, although it works for the load patterns.

llvm-svn: 366242

7eb1902c

[IndVars] Speculative fix for an assertion failure seen in bots · 6e1c3bb1

Philip Reames authored Jul 16, 2019

I don't have an IR sample which is actually failing, but the issue described in the comment is theoretically possible, and should be guarded against even if there's a different root cause for the bot failures.

llvm-svn: 366241

6e1c3bb1

AMDGPU: Replace store PatFrags · 8f8d07e9
Matt Arsenault authored Jul 16, 2019
```
Convert the easy cases to formats understood by GlobalISel.

llvm-svn: 366240
```
8f8d07e9

AMDGPU/GlobalISel: Select flat loads · 35c96598

Matt Arsenault authored Jul 16, 2019

Now that the patterns use the new PatFrag address space support, the
only blocker to importing most load patterns is the addressing mode
complex patterns.

llvm-svn: 366237

35c96598

Teach `llvm-pdbutil pretty -native` about `-injected-sources` · d100b5dd

Nico Weber authored Jul 16, 2019

`pretty -native -injected-sources -injected-source-content` works with
this patch, and produces identical output to the dia version.

Differential Revision: https://reviews.llvm.org/D64428

llvm-svn: 366236

d100b5dd

[AMDGPU] Optimize atomic max/min · 17060f0a

Jay Foad authored Jul 16, 2019

Summary:
Extend the atomic optimizer to handle signed and unsigned max and min
operations, as well as add and subtract.
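Here is a conceptual C++ model of how an atomic optimizer can handle `max`. It assumes the usual approach: reduce across lanes, issue a single atomic, then reconstruct each lane's return value with an exclusive prefix scan. It uses plain loops over a vector of lane values rather than the wave-level operations the AMDGPU implementation uses, and the function names are hypothetical. The reference version serializes the per-lane atomics in lane order, which is the result the optimized version must reproduce.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

// Reference: each active lane performs its own atomic max, in lane order.
// Returns the value each lane's atomic would return (the old value).
std::vector<int32_t> serialAtomicMax(int32_t &Mem, const std::vector<int32_t> &Lanes) {
  std::vector<int32_t> Results;
  for (int32_t V : Lanes) {
    Results.push_back(Mem);
    Mem = std::max(Mem, V);
  }
  return Results;
}

// Optimized form: reduce across lanes, issue one atomic, then reconstruct
// each lane's result from the old value and an exclusive prefix max.
std::vector<int32_t> optimizedAtomicMax(int32_t &Mem, const std::vector<int32_t> &Lanes) {
  const int32_t Identity = std::numeric_limits<int32_t>::min(); // identity for signed max
  int32_t WaveMax = Identity;
  for (int32_t V : Lanes)
    WaveMax = std::max(WaveMax, V);

  // The single atomic, done by one lane on behalf of the whole wave.
  int32_t Old = Mem;
  Mem = std::max(Mem, WaveMax);

  std::vector<int32_t> Results;
  int32_t Prefix = Identity; // max of the lanes before the current one
  for (int32_t V : Lanes) {
    Results.push_back(std::max(Old, Prefix));
    Prefix = std::max(Prefix, V);
  }
  return Results;
}
```

For `min`, swap `std::max` for `std::min` and use the type's maximum as the identity; for the unsigned variants the identities are `0` for max and `UINT32_MAX` for min.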

Reviewers: arsenm, sheredom, critson, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64328

llvm-svn: 366235

17060f0a

AMDGPU: Redefine load PatFrags · c6fd5abe

Matt Arsenault authored Jul 16, 2019

Rewrite PatFrags using the new PatFrag address space matching in
tablegen. These will now work with both SelectionDAG and GlobalISel.

llvm-svn: 366234

c6fd5abe

AMDGPU: Fix missing immarg for mfma intrinsics · c65a9db4
Matt Arsenault authored Jul 16, 2019
```
llvm-svn: 366230
```
c65a9db4

[AMDGPU] Add the adjusted FP as a livein register. · b3f967d4

Michael Liao authored Jul 16, 2019

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64145

llvm-svn: 366223

b3f967d4

[Strict FP] Allow more relaxed scheduling · 450c62e3

Ulrich Weigand authored Jul 16, 2019

Reimplement scheduling constraints for strict FP instructions in
ScheduleDAGInstrs::buildSchedGraph to allow for more relaxed
scheduling. Specifically, allow one strict FP instruction to
be scheduled across another, as long as it is not moved across
any global barrier.

Differential Revision: https://reviews.llvm.org/D64412

Reviewed By: cameron.mcinally

llvm-svn: 366222

450c62e3

Revert [tools] [llvm-nm] Default to reading from stdin not a.out · 2eacf698
Alex Brachet authored Jul 16, 2019
```
This reverts r365889 (git commit 60c81354)

llvm-svn: 366219
```
2eacf698
Add missing test for r366215 · 88ed076e
Amara Emerson authored Jul 16, 2019
```
llvm-svn: 366218
```
88ed076e

[Remarks] Simplify and refactor the RemarkParser interface · 94bad22c

Francis Visoiu Mistrih authored Jul 16, 2019

Before, everything was based on some kind of type erased parser
implementation which contained a lot of boilerplate code when multiple
formats were to be supported.

This simplifies it by:

* the remark now owns its arguments
* *always* returning an error from the implementation side
* working around the way the YAML parser reports errors: catch them through
callbacks and re-insert them in a proper llvm::Error
* add a CParser wrapper that is used when implementing the C API to
avoid cluttering the C++ API with useless state
* LLVMRemarkParserGetNext now returns an object that needs to be
released to avoid leaking resources
* add a new API to dispose of a remark entry: LLVMRemarkEntryDispose
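Here is a minimal mock of the ownership rule described for the C API: each entry returned by the "get next" call belongs to the caller and must be released. The real functions named in the commit are `LLVMRemarkParserGetNext` and `LLVMRemarkEntryDispose`. The mock's names, types, and end-of-input convention are hypothetical and do not reproduce their signatures or error handling.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical mock entry and parser types.
struct MockRemark {
  std::string Name;
};

struct MockParser {
  std::vector<std::string> Pending; // names of remarks still to be returned
  std::size_t Next = 0;
};

int LiveEntries = 0; // number of entries handed out but not yet disposed

// Returns a newly allocated entry owned by the caller, or nullptr at end of input.
MockRemark *parserGetNext(MockParser *P) {
  if (P->Next == P->Pending.size())
    return nullptr;
  ++LiveEntries;
  return new MockRemark{P->Pending[P->Next++]};
}

// Releases an entry obtained from parserGetNext().
void entryDispose(MockRemark *R) {
  --LiveEntries;
  delete R;
}

// Typical consumer loop: take ownership, use the entry, dispose of it.
std::vector<std::string> collectNames(MockParser *P) {
  std::vector<std::string> Names;
  while (MockRemark *R = parserGetNext(P)) {
    Names.push_back(R->Name);
    entryDispose(R);
  }
  return Names;
}
```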

llvm-svn: 366217

94bad22c

[Remarks][NFC] Combine ParserFormat and SerializerFormat · cc909812
Francis Visoiu Mistrih authored Jul 16, 2019
```
It's useless to have both.

llvm-svn: 366216
```
cc909812
[ADCE] Fix non-deterministic behaviour due to iterating over a pointer set. · 228a7b4f
Amara Emerson authored Jul 16, 2019
```
Original patch by Yann Laigle-Chapuy

Differential Revision: https://reviews.llvm.org/D64785

llvm-svn: 366215
```
228a7b4f
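Iterating a hash set keyed on pointers visits elements in an order that depends on the pointer values. Those values can change from run to run, so any output derived from that order is non-deterministic. The commit text does not show the exact fix. A common remedy in LLVM is to iterate in insertion order instead, as `llvm::SetVector` does. Here is a hypothetical, minimal C++ version of that idea:

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Minimal insertion-ordered set, in the spirit of llvm::SetVector:
// the unordered_set answers "seen already?", the vector fixes the order.
template <typename T> class OrderedSet {
  std::vector<T> Order;
  std::unordered_set<T> Seen;

public:
  // Returns true if Value was newly inserted.
  bool insert(const T &Value) {
    if (!Seen.insert(Value).second)
      return false;
    Order.push_back(Value);
    return true;
  }
  // Iteration follows insertion order, independent of pointer values.
  typename std::vector<T>::const_iterator begin() const { return Order.begin(); }
  typename std::vector<T>::const_iterator end() const { return Order.end(); }
  std::size_t size() const { return Order.size(); }
};
```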