Commits · 2c4d52467a25aed5ec9ed868fe8b74a1a67d1535 · Lorenzo Albano / LLVM bpEVL

Jun 05, 2022
- [Transforms/Utils] Use predecessors (NFC) · 2c4d5246
  Kazu Hirata authored Jun 05, 2022
  
  2c4d5246
- Recommit: "[MLIR][NVVM] Replace fdiv on fp16 with promoted (fp32)... · 400fef08
  Christian Sigg authored Jun 04, 2022
```
Recommit: "[MLIR][NVVM] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration."

This change rolls bcfc0a90 forward (i.e., reverting 369ce54b) with fixed CMakeLists.txt.
```
  400fef08
- Remove unneeded cl::ZeroOrMore for cl::list options · d0d1c416
  Fangrui Song authored Jun 04, 2022
  
  d0d1c416
- Use llvm::less_second (NFC) · e0039b8d
  Kazu Hirata authored Jun 04, 2022
  
  e0039b8d
- [Target] Use MachineBasicBlock::erase (NFC) · 9a8e65de
  Kazu Hirata authored Jun 04, 2022
  
  9a8e65de
- [CodeGen] Use a range-based for loop (NFC) · bcf4fa45
  Kazu Hirata authored Jun 04, 2022
  
  bcf4fa45
- Use static_cast from SmallString to std::string (NFC) · 8cc9fa6f
  Kazu Hirata authored Jun 04, 2022
  
  8cc9fa6f
- Use llvm::less_first (NFC) · 4969a692
  Kazu Hirata authored Jun 04, 2022
  
  4969a692
- [CodeGen] Use StringRef::contains (NFC) · 32ce076d
  Kazu Hirata authored Jun 04, 2022
  
  32ce076d
- [Transforms] Use llvm::is_contained (NFC) · f83a88a1
  Kazu Hirata authored Jun 04, 2022
  
  f83a88a1
- [SPARC] Fix type for i64 inline asm operands · 700eadca
  LemonBoy authored Jun 04, 2022
```
Differential Revision: https://reviews.llvm.org/D101694
```
  700eadca
Jun 04, 2022

[VPlan] Update vector latch terminator edge to exit block after execution. · 416a5080

Florian Hahn authored Jun 04, 2022

Instead of setting the successor to the exit using CFG.ExitBB, set it to
nullptr initially. The successor to the exit block is later set either
through createEmptyBasicBlock or after VPlan execution (because at the
moment, no block is created by VPlan for the exit block, the existing
one is reused).

This also enables BranchOnCond to be used as terminator for the exiting
block of the topmost vector region.

Depends on D126618.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D126679

416a5080

[mlir] Use context provided rather than getContext · 29794ab0
Jacques Pienaar authored Jun 04, 2022
```
Avoids "pass state was never initialized" assertion failure.
```
29794ab0

[flang][runtime] Catch OPEN of connected file · 03c066ab

Peter Klausler authored Jun 03, 2022

Diagnose OPEN(FILE=f) when f is already connected by the same name to
a distinct external I/O unit.

Differential Revision: https://reviews.llvm.org/D127035

03c066ab

[flang][runtime] Emit error message rather than crashing for MOD(ULO)(x,P=0) · 562fd2c9

Peter Klausler authored Jun 02, 2022

Add extra arguments and checks to the runtime support library so that
a call to the intrinsic functions MOD and MODULO with "denominator"
argument P of zero will cause a crash with a source location rather
than an uninformative floating-point error or integer division by
zero signal.

Additional work is required in lowering to (1) pass source file path and
source line number arguments and (2) actually call these runtime
library APIs instead of emitting inline code for MOD &/or MODULO.
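The check described above can be sketched as follows. This is only an illustration, not the flang runtime's actual API: the name `checkedMod` and the `sourceFile` / `sourceLine` parameters are hypothetical, and an exception stands in for the runtime's crash handler so the behavior is easy to observe.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not a flang runtime entry point): integer MOD(a, p)
// that reports the caller's source location when P is zero instead of
// letting the hardware raise a division-by-zero signal.
long checkedMod(long a, long p, const char *sourceFile, int sourceLine) {
  if (p == 0) {
    throw std::invalid_argument(std::string(sourceFile) + ":" +
                                std::to_string(sourceLine) +
                                ": MOD with P=0");
  }
  if (p == -1) {
    return 0; // avoids overflow in a % p when a is the most negative value
  }
  // Like Fortran's MOD, the C++ % result takes the sign of the first operand.
  return a % p;
}
```

A call such as `checkedMod(7, 0, "prog.f90", 12)` would then fail with a message that begins with `prog.f90:12`.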

Differential Revision: https://reviews.llvm.org/D127034

562fd2c9

[flang][runtime] Fix deadlock in error recovery · 11f928af

Peter Klausler authored Jun 02, 2022

When an external I/O statement is in a recoverable error
state before any data transfers take place (for example,
an unformatted transfer with ERR=/IOSTAT=/IOMSG= attempted on
a formatted unit), ensure that the unit's mutex is still
released at the end of the statement.

Differential Revision: https://reviews.llvm.org/D127032

11f928af

[flang] When folding FINDLOC, convert operands to a common type · ed71a0b4

Peter Klausler authored Jun 01, 2022

For example, FINDLOC(A,X) should convert both A and X to COMPLEX(8)
if the operands are REAL(8) and COMPLEX(4), so that comparisons
can be done without losing information. The current implementation
unconditionally converts X to the type of the array A.
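A small C++ analogy may help show why the common type matters. It is a sketch only: the helper names are hypothetical, `std::complex<float>` stands in for COMPLEX(4), and `double` stands in for REAL(8). None of this is flang's folding code.

```cpp
#include <complex>

// Hypothetical helper: compare after converting X to the array element type
// (single-precision complex), which rounds the double-precision value first.
bool equalAfterNarrowing(std::complex<float> a, double x) {
  return a == std::complex<float>(static_cast<float>(x), 0.0f);
}

// Hypothetical helper: compare after converting both operands to the common
// type (double-precision complex), so no information is lost on either side.
bool equalInCommonType(std::complex<float> a, double x) {
  return std::complex<double>(a.real(), a.imag()) ==
         std::complex<double>(x, 0.0);
}
```

With `a = (0.1f, 0)` and `x = 0.1`, the narrowing comparison reports a match because `x` is rounded to single precision, while the common-type comparison correctly reports that the two values differ.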

Differential Revision: https://reviews.llvm.org/D127030

ed71a0b4

[flang][runtime] Fix WRITE after OPEN(.., ACCESS="APPEND") · 9a163ffe

Peter Klausler authored Jun 01, 2022

The initial size of the file was not being captured as the file position
on which the first output buffer should be framed.

Differential Revision: https://reviews.llvm.org/D127029

9a163ffe

[flang][runtime] Fix edge case discrepancies with EN output editing · dfcccc6d

Peter Klausler authored Jun 01, 2022

The "engineering" ENw.d output editing descriptor has some difficult
edge case behavior for values that might format into a bunch of 9's
or round up to a 1 for a given scale factor.  Fix the algorithm,
and add tests to protect against regressions.

Differential Revision: https://reviews.llvm.org/D127028

dfcccc6d

[flang] Don't crash on initialization with a zero-sized derived type · d484fe93

Peter Klausler authored Jun 01, 2022

Avoid calls to memcpy with zero byte counts if their address argument
calculations may not be valid expressions.
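The guard can be sketched as below, assuming a hypothetical `copyBytes` wrapper (not a flang function). In C and C++, passing invalid pointers to `memcpy` is undefined behavior even when the byte count is zero, so the sketch skips the call entirely for empty objects.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical wrapper: performs no copy for zero-sized objects, so callers
// never hand memcpy an address that was not validly computed.
void copyBytes(void *dst, const void *src, std::size_t bytes) {
  if (bytes == 0) {
    return; // nothing to copy; do not touch dst or src
  }
  std::memcpy(dst, src, bytes);
}
```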

Differential Revision: https://reviews.llvm.org/D127027

d484fe93

[flang][runtime] Don't crash after surviving internal output overflow · ea5b205b

Peter Klausler authored May 31, 2022

After the program has survived its attempt to overflow the output buffer
with an internal WRITE using ERR=, IOSTAT=, &/or IOMSG=, don't crash
by accidentally blank-filling the next record that usually doesn't exist.

Differential Revision: https://reviews.llvm.org/D127024

ea5b205b

[flang][runtime] Don't let random seed queries change the sequence · ea1a69d6

Peter Klausler authored Jun 03, 2022

When the current seed of the pseudo-random generator is queried
with CALL RANDOM_SEED(GET=n), that query should not change the
stream of pseudo-random numbers produced by CALL RANDOM_NUMBER().

Differential Revision: https://reviews.llvm.org/D127023

ea1a69d6

Revert "[MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication... · 369ce54b

Mehdi Amini authored Jun 04, 2022

Revert "[MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration."

This reverts commit bcfc0a90.

The build is broken with shared library enabled.

369ce54b

Remove unneeded cl::ZeroOrMore for cl::opt options · 36c7d79d

Fangrui Song authored Jun 04, 2022

Similar to 557efc9a.
This commit handles options where cl::ZeroOrMore is more than one line below
cl::opt.

36c7d79d

[MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with... · bcfc0a90

Christian Sigg authored Jun 03, 2022

[MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration.

This is correct for all values, i.e. the same as promoting the division to fp32 in the NVPTX backend. But it is faster (~10% on average, sometimes more) because:

- it performs fewer Newton iterations
- it avoids the slow path for e.g. denormals
- it allows reuse of the reciprocal for multiple divisions by the same divisor

Test program:
```
#include <stdio.h>
#include "cuda_fp16.h"

// This is a variant of CUDA's own __hdiv which is faster than hdiv_promote below
// and doesn't suffer from the perf cliff of div.rn.fp32 with 'special' values.
__device__ half hdiv_newton(half a, half b) {
  float fa = __half2float(a);
  float fb = __half2float(b);

  float rcp;
  asm("{rcp.approx.ftz.f32 %0, %1;\n}" : "=f"(rcp) : "f"(fb));

  float result = fa * rcp;
  auto exponent = reinterpret_cast<const unsigned&>(result) & 0x7f800000;
  if (exponent != 0 && exponent != 0x7f800000) {
    float err = __fmaf_rn(-fb, result, fa);
    result = __fmaf_rn(rcp, err, result);
  }

  return __float2half(result);
}

// Surprisingly, this is faster than CUDA's own __hdiv.
__device__ half hdiv_promote(half a, half b) {
  return __float2half(__half2float(a) / __half2float(b));
}

// This is an approximation that is accurate up to 1 ulp.
__device__ half hdiv_approx(half a, half b) {
  float fa = __half2float(a);
  float fb = __half2float(b);

  float result;
  asm("{div.approx.ftz.f32 %0, %1, %2;\n}" : "=f"(result) : "f"(fa), "f"(fb));
  return __float2half(result);
}

__global__ void CheckCorrectness() {
  int i = threadIdx.x + blockIdx.x * blockDim.x;
  half x = reinterpret_cast<const half&>(i);
  for (int j = 0; j < 65536; ++j) {
    half y = reinterpret_cast<const half&>(j);
    half d1 = hdiv_newton(x, y);
    half d2 = hdiv_promote(x, y);
    auto s1 = reinterpret_cast<const short&>(d1);
    auto s2 = reinterpret_cast<const short&>(d2);
    if (s1 != s2) {
      printf("%f (%u) / %f (%u), got %f (%hu), expected: %f (%hu)\n",
             __half2float(x), i, __half2float(y), j, __half2float(d1), s1,
             __half2float(d2), s2);
      //__trap();
    }
  }
}

__device__ half dst;

__global__ void ProfileBuiltin(half x) {
  #pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = x / x;
  }
  dst = x;
}

__global__ void ProfilePromote(half x) {
  #pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_promote(x, x);
  }
  dst = x;
}

__global__ void ProfileNewton(half x) {
  #pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_newton(x, x);
  }
  dst = x;
}

__global__ void ProfileApprox(half x) {
  #pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_approx(x, x);
  }
  dst = x;
}

int main() {
  CheckCorrectness<<<256, 256>>>();
  half one = __float2half(1.0f);
  ProfileBuiltin<<<1, 1>>>(one);  // 1.001s
  ProfilePromote<<<1, 1>>>(one);  // 0.560s
  ProfileNewton<<<1, 1>>>(one);   // 0.508s
  ProfileApprox<<<1, 1>>>(one);   // 0.304s
  auto status = cudaDeviceSynchronize();
  printf("%s\n", cudaGetErrorString(status));
}
```

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D126158

bcfc0a90

[flang][runtime] Signal new I/O error on floating-point input overflow · 9c54d762

Peter Klausler authored Jun 03, 2022

Besides raising the IEEE floating-point overflow exception, treat
a floating-point overflow on input as an I/O error catchable with
ERR=, IOSTAT=, &/or IOMSG=.

Differential Revision: https://reviews.llvm.org/D127022

9c54d762

[BOLT][UTILS] Usability improvements for nfc-check-setup · b346af6d

Amir Ayupov authored Jun 03, 2022

1. Stash local changes before checkout.
2. Print a message that the source repository revision has been changed, with
   instructions to switch back.
3. Make the script executable.
4. Print sample instructions how to run bolt tests.
5. Assume that llvm-bolt-wrapper script is in the same source directory.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D126941

b346af6d

[flang] Don't discard lower bounds of implicit-shape named constants · 08c6a323

Peter Klausler authored May 30, 2022

F18 preserves lower bounds of explicit-shape named constant arrays, but
failed to also do so for implicit-shape named constants. Fix.

Differential Revision: https://reviews.llvm.org/D127021

08c6a323

[flang][runtime] Ensure that 0. <= RANDOM_NUMBER() < 1. · f3278e0f

Peter Klausler authored May 30, 2022

It was possible for RANDOM_NUMBER() to return 1.0.
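One common way such a bug arises is rounding during the integer-to-float conversion. The sketch below illustrates that general pitfall and one way around it. It is an assumption about the class of problem, not a description of the flang runtime's generator or its fix, and both function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical naive mapping: scales all 32 random bits by 2^-32. Values near
// 2^32 round up when converted to float, so the result can be exactly 1.0f.
float naiveUnitFloat(std::uint32_t bits) {
  return static_cast<float>(bits) * (1.0f / 4294967296.0f);
}

// Hypothetical safe mapping: keeps only the top 24 bits, which a float
// represents exactly, so the largest result is 1 - 2^-24 and stays below 1.
float halfOpenUnitFloat(std::uint32_t bits) {
  return static_cast<float>(bits >> 8) * (1.0f / 16777216.0f);
}
```

For the input `0xFFFFFFFF`, the naive mapping yields exactly `1.0f`, while the 24-bit mapping stays strictly below it.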

Differential Revision: https://reviews.llvm.org/D127020

f3278e0f

Revert D126950 "[lld][WebAssembly] Retain data segments referenced via __start/__stop" · 025b3096

Fangrui Song authored Jun 03, 2022

This reverts commit dcf3368e.

It breaks -DLLVM_ENABLE_ASSERTIONS=on builds. In addition, the description is
incorrect about ld.lld behavior. For wasm, there should be justification to add
the new mode.

025b3096

[flang] Distinguish intrinsic module USE in module files; correct search paths · 15faac90

Peter Klausler authored May 30, 2022

In the USE statements that f18 emits to module files, ensure that symbols
from intrinsic modules are marked as such on their USE statements. And
ensure that the current working directory (".") cannot override the intrinsic
module search path when trying to locate an intrinsic module.

Differential Revision: https://reviews.llvm.org/D127019

15faac90

[Hexagon][bolt] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC · 72f9c694
Fangrui Song authored Jun 03, 2022
```
Similar to 557efc9a
```
72f9c694
[clang-link-wrapper] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC · 734c2234
Fangrui Song authored Jun 03, 2022
```
Similar to 557efc9a
```
734c2234

[llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC · 557efc9a

Fangrui Song authored Jun 03, 2022

Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!`
error. More were added due to cargo cult. Since the error has been removed,
cl::ZeroOrMore is unneeded.

Also remove cl::init(false) while touching the lines.

557efc9a

[RISCV] Add more patterns for FNMADD · f14d18c7

LiaoChunyu authored Jun 02, 2022

D54205 handles fnmadd: -rs1 * rs2 - rs3
This patch adds fnmadd: -(rs1 * rs2 + rs3) (the nsz flag on the FMA)
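The two forms agree except for the sign of a zero result, which is presumably why the second pattern is tied to the nsz flag. The sketch below illustrates this with `std::fma` on the host. The function names are hypothetical, and this is not the RISC-V backend code.

```cpp
#include <cmath>

// -(rs1 * rs2 + rs3): negate the result of a fused multiply-add.
double negatedFusedSum(double rs1, double rs2, double rs3) {
  return -std::fma(rs1, rs2, rs3);
}

// -rs1 * rs2 - rs3: negate the product and subtract the addend in one fused
// operation.
double fusedNegatedProductMinus(double rs1, double rs2, double rs3) {
  return std::fma(-rs1, rs2, -rs3);
}
```

With `rs1 = 1`, `rs2 = 1`, `rs3 = -1` in the default rounding mode, the first form gives `-0.0` (the negation of `+0.0`) and the second gives `+0.0`. For nonzero results, such as `rs1 = 2`, `rs2 = 3`, `rs3 = 4`, both return `-10`.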

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D126852

f14d18c7

[libc++][ranges][NFC] Fix a patch link in ranges status. · 7c63cc19
varconst authored Jun 03, 2022

7c63cc19
[libc++][ranges][NFC] Mark range algorithms that are in progress. · faf43ad7
varconst authored Jun 03, 2022

faf43ad7

[lld][WebAssembly] Retain data segments referenced via __start/__stop · dcf3368e

Yuta Saito authored Jun 04, 2022

As the ELF linker does, retain all data segments named X referenced
through `__start_X` or `__stop_X`.

For example, `FOO_MD` should not be stripped in the case below, but it is currently stripped by mistake:

```llvm
@FOO_MD  = global [4 x i8] c"bar\00", section "foo_md", align 1
@__start_foo_md = external constant i8*
@__stop_foo_md = external constant i8*
@llvm.used = appending global [1 x i8*] [i8* bitcast (i32 ()* @foo_md_size to i8*)], section "llvm.metadata"

define i32 @foo_md_size()  {
entry:
  ret i32 sub (
    i32 ptrtoint (i8** @__stop_foo_md to i32),
    i32 ptrtoint (i8** @__start_foo_md to i32)
  )
}
```

This fixes https://github.com/llvm/llvm-project/issues/55839

Reviewed By: sbc100

Differential Revision: https://reviews.llvm.org/D126950

dcf3368e

[flang] Correct folding of CSHIFT and EOSHIFT for DIM>1 · e0adee84

Peter Klausler authored May 29, 2022

The algorithm was wrong for higher dimensions, and so were
the expected test results.  Rework.

Differential Revision: https://reviews.llvm.org/D127018

e0adee84

[pseudo] Fix leaks after D126731 · 47ec8b55
Fangrui Song authored Jun 03, 2022
```
Array operator new cookies help lsan find allocations, while std::array
can't.
```
47ec8b55