Commits · 45bc148093dcf0ccac5d0a42d6c4c4091eb0cf04 · Lorenzo Albano / LLVM bpEVL

Aug 08, 2018

AMDGPU: Fix enabling denormals by default on pre-VI targets · 45bc1480

Matt Arsenault authored Aug 08, 2018

Fast FMAF is not a sufficient condition to enable denormals.
Before VI, enabling denormals caused F32 instructions to
run at F64 speeds.

llvm-svn: 339278

[DebugInfo][OpenCL] Address post-commit review for r338299 · 58df0e4d

Scott Linder authored Aug 08, 2018

NFC refactor of code to generate debug info for OpenCL 2.X blocks.

Differential Revision: https://reviews.llvm.org/D50099

llvm-svn: 339265

Aug 07, 2018
- Fix one hard coded value I missed in r339185. · be1166a4
  Douglas Yung authored Aug 07, 2018
```
llvm-svn: 339188
```
- Make test more robust by not checking hard coded debug info values, but... · dca675a0
  Douglas Yung authored Aug 07, 2018
```
Make test more robust by not checking hard coded debug info values, but instead check the relationships between them.

llvm-svn: 339185
```
- [OpenCL] Restore r338899 (reverted in r338904), fixing stack-use-after-return · f8b3df4d
  Scott Linder authored Aug 07, 2018
```
Always emit alloca in entry block for enqueue_kernel builtin.

Ensures the statically sized alloca is not converted to DYNAMIC_STACKALLOC
later because it is not in the entry block.

llvm-svn: 339150
```
- AMDGPU: Add builtin for s_dcache_wb · 31c895ec
  Matt Arsenault authored Aug 07, 2018
```
llvm-svn: 339110
```
- AMDGPU: Add builtin for s_dcache_inv_vol · 24f39247
  Matt Arsenault authored Aug 07, 2018
```
llvm-svn: 339109
```
Aug 03, 2018

Revert "[OpenCL] Always emit alloca in entry block for enqueue_kernel builtin" · c7d3d34b
Vlad Tsyrklevich authored Aug 03, 2018
```
This reverts commit r338899; it was causing ASan test failures on sanitizer-x86_64-linux-fast.

llvm-svn: 338904
```
[OpenCL] Always emit alloca in entry block for enqueue_kernel builtin · 91f57846

Scott Linder authored Aug 03, 2018

Ensures the statically sized alloca is not converted to DYNAMIC_STACKALLOC
later because it is not in the entry block.

Differential Revision: https://reviews.llvm.org/D50104

llvm-svn: 338899
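
The reason block placement matters can be sketched with a small Python model. It assumes the rule that an alloca is treated as static only when its size is a constant and it sits in the function's entry block; the names `Alloca` and `lowering_kind` are hypothetical and are not LLVM API.

```python
from dataclasses import dataclass


@dataclass
class Alloca:
    """Toy stand-in for an alloca instruction (hypothetical, not LLVM API)."""
    constant_size: bool   # True if the allocation size is a compile-time constant
    in_entry_block: bool  # True if the instruction lives in the entry block


def lowering_kind(alloca: Alloca) -> str:
    """Classify how this toy backend would lower the allocation."""
    # Assumption: only constant-size allocas in the entry block get a fixed
    # stack slot; everything else is lowered as a dynamic stack allocation.
    if alloca.constant_size and alloca.in_entry_block:
        return "static stack slot"
    return "DYNAMIC_STACKALLOC"
```

Under this model, a statically sized alloca emitted outside the entry block is still classified as dynamic, which is the situation the commit avoids by always emitting it in the entry block.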

Aug 02, 2018

AMDGPU: Fix missing declaration of queue ptr builtin · e3d81572
Matt Arsenault authored Aug 02, 2018
```
llvm-svn: 338754
```
Try to make builtin address space declarations not useless · c65f966d

Matt Arsenault authored Aug 02, 2018

The way address space declarations for builtins currently work
is nearly useless. The code assumes the address space used for
a builtin is a confusingly named "target address space" from user
code using __attribute__((address_space(N))) that matches
the builtin declaration. There's no way to use this to declare
a builtin that returns a language specific address space.
The terminology used is highly confusing since it has nothing
to do with the address space selected by the target to use
for a language address space.

This feature is essentially unused as-is. AMDGPU and NVPTX
are the only in-tree targets attempting to use this. The AMDGPU
builtins certainly do not behave as intended (i.e. all of the
builtins returning pointers can never compile because the numbered
address space never matches the expected named address space).

Some of the NVPTX builtins are missing tests, and the others
seem to rely on an implicit addrspacecast.

Change the used address space for builtins based on a target
hook to allow using a language address space for a builtin.
This allows the same builtin declaration to be used for multiple
languages with similarly purposed address spaces (e.g. the same
AMDGPU builtin can be used in OpenCL and CUDA even though the
constant address spaces are arbitrarily different).

This breaks the possibility of using arbitrary numbered
address spaces alongside the named address spaces for builtins.
If this is an issue we probably need to introduce another builtin
declaration character to distinguish language address spaces from
so-called "target address spaces".

llvm-svn: 338707
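
The idea of resolving a builtin's address space through a per-language hook can be sketched as below. This is only a toy Python model: the table contents, the "logical" space names, and the function `builtin_address_space` are assumptions made for illustration and are not taken from Clang or any target.

```python
# Illustrative table only: the keys and values are names chosen for this
# sketch, not values read from Clang or from any target description.
LANGUAGE_ADDRESS_SPACES = {
    "opencl": {"constant": "opencl_constant", "global": "opencl_global"},
    "cuda": {"constant": "cuda_constant", "global": "cuda_device"},
}


def builtin_address_space(language: str, logical_space: str) -> str:
    """Resolve the address space a builtin declaration refers to.

    The builtin is declared once with a logical space such as "constant";
    the hook picks the language-specific address space for it.
    """
    try:
        return LANGUAGE_ADDRESS_SPACES[language][logical_space]
    except KeyError:
        raise ValueError(
            f"no address space for {logical_space!r} in {language!r}"
        ) from None
```

With such a hook, one declaration that mentions the "constant" space resolves differently for OpenCL and CUDA, which is the reuse across languages that the commit message describes.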

Aug 01, 2018
- AMDGPU: Add clamp bit to dot builtins · 9057546c
  Konstantin Zhuravlyov authored Aug 01, 2018
```
Differential Revision: https://reviews.llvm.org/D50011

llvm-svn: 338471
```
Jul 30, 2018

[DebugInfo][OpenCL] Generate correct block literal debug info for OpenCL · 2b5cf041

Scott Linder authored Jul 30, 2018

OpenCL block literal structs have different fields which are now correctly
identified in the debug info.

Differential Revision: https://reviews.llvm.org/D49930

llvm-svn: 338299

Jul 13, 2018

CodeGen: specify alignment + inbounds for automatic variable initialization · 9aab85a6

JF Bastien authored Jul 13, 2018

Summary: Automatic variable initialization was generating default-aligned stores (which are deprecated) instead of using the known alignment from the alloca. Further, they didn't specify inbounds.

Subscribers: dexonsmith, cfe-commits

Differential Revision: https://reviews.llvm.org/D49209

llvm-svn: 337041

May 21, 2018

[AMDGPU] fixes for lds f32 builtins · 1b14a3ad

Daniil Fukalov authored May 21, 2018

1. added restrictions to memory scope, order and volatile parameters
2. added custom processing for these builtins; the code is currently unused, but is
   needed to switch off the GCCBuiltin link to the builtins (ongoing change to the llvm
   tree)
3. renamed the builtins as requested

Differential Revision: https://reviews.llvm.org/D43281

llvm-svn: 332848

May 16, 2018

[OpenCL] make test independent of optimizer · cda77b30

Sanjay Patel authored May 16, 2018

There shouldn't be any tests that run the entire optimizer here,
but the last test in this file is definitely going to break with 
a change in LLVM IR canonicalization. Change that part to check
the unoptimized IR because that's the real intent of this file.

llvm-svn: 332473

May 09, 2018

[OpenCL] Fix typos in emitted enqueue kernel function names · 3cab24aa

Yaxun Liu authored May 09, 2018

Two typos: 
vaarg => vararg
get_kernel_preferred_work_group_multiple => get_kernel_preferred_work_group_size_multiple

Differential Revision: https://reviews.llvm.org/D46601

llvm-svn: 331895

[OpenCL] Add constant address space to __func__ in AST. · 59055b94

Anastasia Stulova authored May 09, 2018

Added string literal helper function to obtain the type
attributed by a constant address space.

Also fixed the predefined __func__ expr to use the helper
to construct the string literal correctly.

Differential Revision: https://reviews.llvm.org/D46049

llvm-svn: 331877

May 01, 2018

Add Microsoft Mangling for OpenCL Half Type · 14c10853

Erich Keane authored May 01, 2018

Half-type mangling is accomplished following the method introduced by Erich 
Keane for mangling _Float16. Updated the half.cl LIT test to cover this 
particular case.

Patch By: vbridgers

Differential Revision: https://reviews.llvm.org/D46131

llvm-svn: 331263

Apr 30, 2018
- AMDGPU: Add Vega12 and Vega20 · d2da3c20
  Matt Arsenault authored Apr 30, 2018
```
Changes by
  Matt Arsenault
  Konstantin Zhuravlyov

llvm-svn: 331216
```
Apr 27, 2018

[OpenCL] Add separate read_only and write_only pipe IR types · 4700faa2

Sven van Haastregt authored Apr 27, 2018

SPIR-V encodes the read_only and write_only access qualifiers of pipes,
so separate LLVM IR types are required to target SPIR-V.  Other backends
may also find this useful.

These new types are `opencl.pipe_ro_t` and `opencl.pipe_wo_t`, which
replace `opencl.pipe_t`.

This replaces __get_pipe_num_packets(...) and __get_pipe_max_packets(...),
which took a read_only pipe, with separate versions for read_only and
write_only pipes, namely:

 * __get_pipe_num_packets_ro(...)
 * __get_pipe_num_packets_wo(...)
 * __get_pipe_max_packets_ro(...)
 * __get_pipe_max_packets_wo(...)

These separate versions exist to avoid needing a bitcast to one of the
two qualified pipe types.

Patch by Stuart Brady.

Differential Revision: https://reviews.llvm.org/D46015

llvm-svn: 331026
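
The naming scheme in this commit can be sketched as a small Python helper. The type and builtin names come from the commit message; the helper functions `pipe_ir_type` and `pipe_query_builtin` and the `query` strings are hypothetical and exist only for illustration.

```python
# Suffix used for each pipe access qualifier in the names from the commit.
_ACCESS_SUFFIX = {"read_only": "ro", "write_only": "wo"}


def pipe_ir_type(access: str) -> str:
    """Return the IR type name for a pipe with the given access qualifier."""
    return f"opencl.pipe_{_ACCESS_SUFFIX[access]}_t"


def pipe_query_builtin(query: str, access: str) -> str:
    """Return the builtin name for a pipe query on the given access qualifier."""
    if query not in ("num_packets", "max_packets"):
        raise ValueError(f"unknown pipe query: {query!r}")
    return f"__get_pipe_{query}_{_ACCESS_SUFFIX[access]}"
```

Because the access qualifier selects both the type and the builtin, a write_only pipe reaches a builtin whose parameter already has the matching type, so no bitcast between the two pipe types is needed.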

Apr 20, 2018

Fix some tests that were failing on Windows · a417362c
Hans Wennborg authored Apr 20, 2018
```
llvm-svn: 330441
```
[OpenCL] Add 'denorms-are-zero' function attribute · 3858e26f

Alexey Sotkin authored Apr 20, 2018

Summary:
Generate attribute 'denorms-are-zero'='true' if '-cl-denorms-are-zero'
compile option was specified and 'denorms-are-zero'='false' otherwise.

Patch by krisb

Reviewers: Anastasia, yaxunl

Reviewed By:  yaxunl 

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D45808

llvm-svn: 330404

Apr 06, 2018

Fix typos in clang · 2a8c18d9

Alexander Kornienko authored Apr 06, 2018

Found via codespell -q 3 -I ../clang-whitelist.txt
Where whitelist consists of:

  archtype
  cas
  classs
  checkk
  compres
  definit
  frome
  iff
  inteval
  ith
  lod
  methode
  nd
  optin
  ot
  pres
  statics
  te
  thru

Patch by luzpaz! (This is a subset of D44188 that applies cleanly with a few
files that have dubious fixes reverted.)

Differential revision: https://reviews.llvm.org/D44188

llvm-svn: 329399

Mar 27, 2018
- AMDGPU: Update datalayout for stack alignment · b130ea56
  Matt Arsenault authored Mar 27, 2018
```
llvm-svn: 328657
```
Mar 23, 2018

[AMDGPU] Fix codegen for inline assembly · ac1263cd

Yaxun Liu authored Mar 23, 2018

Need to override convertConstraint to recognise amdgpu specific register names.

Differential Revision: https://reviews.llvm.org/D44533

llvm-svn: 328359

[AMDGPU] Update OpenCL to use 48 bytes of implicit arguments for AMDGPU (CLANG) · 68e11a6e

Tony Tye authored Mar 23, 2018

Add two additional implicit arguments for OpenCL for the AMDGPU target using the AMDHSA runtime to support device enqueue.

Differential Revision: https://reviews.llvm.org/D44696

llvm-svn: 328350

[AMDGPU] Remove use of OpenCL triple environment and replace with function... · 1a3f3a2d

Tony Tye authored Mar 23, 2018

[AMDGPU] Remove use of OpenCL triple environment and replace with function attribute for AMDGPU (CLANG)


- Remove use of the opencl and amdopencl environment member of the target triple for the AMDGPU target.
- Use a function attribute to communicate to the AMDGPU backend.

Differential Revision: https://reviews.llvm.org/D43735

llvm-svn: 328347

Mar 15, 2018
- Recommit r326946 after reducing CallArgList memory footprint · 5b330e8d
  Yaxun Liu authored Mar 15, 2018
```
llvm-svn: 327634
```
Mar 10, 2018
- Revert r326946. It caused stack overflows by significantly increasing the size of a CallArgList. · 007cb6df
  Richard Smith authored Mar 10, 2018
```
llvm-svn: 327195
```
Mar 07, 2018

CodeGen: Fix address space of indirect function argument · 06dd8114

Yaxun Liu authored Mar 07, 2018

The indirect function argument is in alloca address space in LLVM IR. However,
during Clang codegen for C++, the address space of indirect function argument
should match its address space in the source code, i.e., default addr space, even
for indirect argument. This is because destructor of the indirect argument may
be called in the caller function, and address of the indirect argument may be
taken, in either case the indirect function argument is expected to be in default
addr space, not the alloca address space.

Therefore, the indirect function argument should be mapped to the temp var
cast to the default address space. The caller will cast it to the alloca addr space
when passing it to the callee. In the callee, the argument is also cast to the
default address space and used.

CallArg is refactored to facilitate this fix.

Differential Revision: https://reviews.llvm.org/D34367

llvm-svn: 326946

[OpenCL] Remove block invoke function from emitted block literal struct · cb35e9fa

Yaxun Liu authored Mar 07, 2018

OpenCL runtime tracks the invoke function emitted for
any block expression. Due to restrictions on blocks in
OpenCL (v2.0 s6.12.5), it is always possible to know the
block invoke function when emitting call of block expression
or __enqueue_kernel builtin functions. Since __enqueue_kernel
already has an argument for the invoke function, it is redundant
to have invoke function member in the llvm block literal structure.

This patch removes invoke function from the llvm block literal
structure. It also removes the bitcast of block invoke function
to the generic block literal type which is useless for OpenCL.

This will save some space for the kernel argument, and also
eliminate some store instructions.

Differential Revision: https://reviews.llvm.org/D43783

llvm-svn: 326937

Feb 23, 2018

Bring r325915 back. · 922f2aa9

Rafael Espindola authored Feb 23, 2018

The tests that failed on a Windows host have been fixed.

Original message:

Start setting dso_local for COFF.

With this there are still some GVs where we don't set dso_local
because setGVProperties is never called. I intend to fix that in
followup commits. This is just the bare minimum to teach
shouldAssumeDSOLocal what it should do for COFF.

llvm-svn: 325940

Feb 22, 2018

[OpenCL] Add '-cl-uniform-work-group-size' compile option · 20f65928

Alexey Sotkin authored Feb 22, 2018

Summary:
OpenCL 2.0 specification defines '-cl-uniform-work-group-size' option,
which requires that the global work-size be a multiple of the work-group
size specified to clEnqueueNDRangeKernel and allows optimizations that
are made possible by this restriction.

The patch introduces the support of this option.

To keep information about whether an OpenCL kernel has uniform work
group size or not, clang generates 'uniform-work-group-size' function
attribute for every kernel:
- "uniform-work-group-size"="true" for OpenCL 1.2 and lower,
- "uniform-work-group-size"="true" for OpenCL 2.0 and higher if
 '-cl-uniform-work-group-size' option was specified,
- "uniform-work-group-size"="false" for OpenCL 2.0 and higher if no
 '-cl-uniform-work-group-size' option was specified.

If the function is not an OpenCL kernel, 'uniform-work-group-size'
attribute isn't generated.

Patch by: krisb

Reviewers: yaxunl, Anastasia, b-sumner

Reviewed By: yaxunl, Anastasia

Subscribers: nhaehnle, yaxunl, Anastasia, cfe-commits

Differential Revision: https://reviews.llvm.org/D43570

llvm-svn: 325771
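
The attribute rule listed in this commit message can be written out as a short Python sketch. The function name `uniform_work_group_size_attr` and the encoding of OpenCL versions as integers (120 for 1.2, 200 for 2.0) are assumptions made for this illustration.

```python
from typing import Optional


def uniform_work_group_size_attr(is_kernel: bool, opencl_version: int,
                                 flag_given: bool) -> Optional[str]:
    """Return the attribute text for a function, or None if none is generated.

    opencl_version uses an assumed integer encoding: 120 means OpenCL 1.2,
    200 means OpenCL 2.0. flag_given says whether the
    '-cl-uniform-work-group-size' option was passed.
    """
    # Non-kernel functions do not get the attribute at all.
    if not is_kernel:
        return None
    # Before OpenCL 2.0 the work-group size is always uniform; from 2.0 on
    # it is uniform only when the option was specified.
    uniform = opencl_version < 200 or flag_given
    value = "true" if uniform else "false"
    return f'"uniform-work-group-size"="{value}"'
```

For example, a kernel compiled as OpenCL 2.0 without the option yields `"uniform-work-group-size"="false"`, while the same kernel compiled as OpenCL 1.2 yields `"uniform-work-group-size"="true"`.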

Feb 15, 2018

Clean up AMDGCN tests · f8ad59d9

Yaxun Liu authored Feb 15, 2018

Differential Revision: https://reviews.llvm.org/D43340

llvm-svn: 325279

[OpenCL] Fix __enqueue_block for block with captures · fa13d015

Yaxun Liu authored Feb 15, 2018

The following test case causes an issue with codegen of __enqueue_block:

  void (^block)(void) = ^{ callee(id, out); };
  enqueue_kernel(queue, 0, ndrange, block);

Clang first does codegen for the block expression in the first line and deletes its block info.
Clang then tries to do codegen for the same block expression again for the second line,
and fails because the block info is gone.

The fix is to do normal codegen for both lines. Introduce an API to OpenCL runtime to
record llvm block invoke function and llvm block literal emitted for each AST block
expression, and use the recorded information for generating the wrapper kernel.

The EmitBlockLiteral APIs are cleaned up to minimize changes to the normal codegen
of blocks.

Another minor issue is that a cleanup AST expression is generated for a block
with captures; it can be stripped by IgnoreImplicit.

Differential Revision: https://reviews.llvm.org/D43240

llvm-svn: 325264
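
The bookkeeping this fix describes can be modelled in a few lines of Python. This is a sketch under stated assumptions, not Clang's implementation: the class `BlockRecords`, the function `wrapper_kernel_name`, and the `_kernel` suffix are all hypothetical.

```python
class BlockRecords:
    """Remembers what was emitted for each block expression (toy model)."""

    def __init__(self):
        # Maps a block expression key to (invoke function, block literal).
        self._by_expr = {}

    def record(self, block_expr, invoke_fn, literal):
        """Store the results of normal codegen for one block expression."""
        self._by_expr[block_expr] = (invoke_fn, literal)

    def lookup(self, block_expr):
        """Return the recorded (invoke function, block literal) pair."""
        return self._by_expr[block_expr]


def wrapper_kernel_name(records, block_expr):
    """Derive a wrapper kernel name from the recorded invoke function."""
    invoke_fn, _literal = records.lookup(block_expr)
    # Hypothetical naming scheme, used only to show the lookup being consumed.
    return invoke_fn + "_kernel"
```

In this model the enqueue step reads the record made when the block expression was first emitted, so it no longer depends on block info that has already been deleted.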

Feb 13, 2018
- [AMDGPU] Change constant addr space to 4 · 651bd73c
  Yaxun Liu authored Feb 13, 2018
```
Differential Revision: https://reviews.llvm.org/D43171

llvm-svn: 325031
```
Feb 09, 2018
- AMDGPU: Update for datalayout change · e7da136a
  Matt Arsenault authored Feb 09, 2018
```
llvm-svn: 324748
```
Feb 08, 2018
- Fix crash on array initializer with non-0 alloca addrspace · 935574a4
  Matt Arsenault authored Feb 08, 2018
```
llvm-svn: 324641
```
Feb 04, 2018
- Recommit rL323890: [AMDGPU] Add ds_fadd, ds_fmin, ds_fmax builtin functions · da2a0558
  Daniil Fukalov authored Feb 04, 2018
```
Fixed asserts in tests.

llvm-svn: 324201
```