Commits · 4cb38bfe76b7ef157485338623c931d04d17b958 · Lorenzo Albano / LLVM bpEVL

Mar 31, 2022

[clangd] IncludeCleaner: Add support for IWYU pragma private · 4cb38bfe
Kirill Bobyrev authored Mar 31, 2022
```
Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D120306
```
4cb38bfe

[LV] Remove unneeded createHeaderBranch.(NFCI) · 32bc83d1

Florian Hahn authored Mar 31, 2022

The only remaining use was to get the exit block of the loop. Instead of
relying on the loop, use the successor of VectorHeaderBB
(LoopMiddleBlock) directly to set VPTransformState::CFG::ExitB

Depends on D121621.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D121623

32bc83d1

[AArch64] Set MaxBytesForLoopAlignment for more targets · 7d676714
Nicholas Guy authored Mar 22, 2022
```
Differential Revision: https://reviews.llvm.org/D122566
```
7d676714

Added an empty __init__.py file to the MLIR Python bindings · b50893db

Sergei Lebedev authored Mar 31, 2022

While not strictly required after PEP-420, it is better to have one, since not
all tooling supports implicit namespace packages.
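
As a rough illustration of the distinction, the sketch below builds two throwaway package directories, one with an empty `__init__.py` and one without, and checks how Python sees each. The names `regular_pkg` and `implicit_pkg` are made up for this example and have nothing to do with the MLIR bindings.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def package_kinds():
    """Import a regular and an implicit namespace package and report each kind."""
    with tempfile.TemporaryDirectory() as root:
        root_path = Path(root)
        # Regular package: the directory contains an (empty) __init__.py.
        (root_path / "regular_pkg").mkdir()
        (root_path / "regular_pkg" / "__init__.py").touch()
        # Implicit namespace package (PEP 420): a directory with no __init__.py.
        (root_path / "implicit_pkg").mkdir()

        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            regular = importlib.import_module("regular_pkg")
            implicit = importlib.import_module("implicit_pkg")
            # A regular package's __file__ points at its __init__.py;
            # a namespace package has no such file.
            return {
                "regular_pkg": getattr(regular, "__file__", None) is not None,
                "implicit_pkg": getattr(implicit, "__file__", None) is not None,
            }
        finally:
            sys.path.remove(root)
            sys.modules.pop("regular_pkg", None)
            sys.modules.pop("implicit_pkg", None)
```

Both imports succeed, but only the regular package is backed by a file, which is the property that tools without PEP 420 support tend to rely on.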

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D122794

b50893db

Fixed mypy type errors in MLIR Python type stubs · 65b2f24c

Sergei Lebedev authored Mar 31, 2022

This commit fixes or disables all errors reported by

    python3 -m mypy -p mlir --show-error-codes

Note that unhashable types cannot be currently expressed in a way compatible
with typeshed. See https://github.com/python/typeshed/issues/6243 for details.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D122790

65b2f24c

[X86] Extend xor-lea test coverage · a1d09f3a
Simon Pilgrim authored Mar 31, 2022
```
Add ADD/SUB(XOR(X,MIN_SIGNED_VALUE),Y) tests
```
a1d09f3a

[VPlan] Remove unneeded Loop variable (NFC). · 2c494f09

Florian Hahn authored Mar 31, 2022

Suggested in D121623. The remaining uses of L can be replaced, reducing
the need for the variable.

2c494f09

[AddressSanitizer] Allow prefixing memintrinsic calls in kernel mode · b8e49fdc

Marco Elver authored Mar 30, 2022

Allow receiving memcpy/memset/memmove instrumentation by using __asan or
__hwasan prefixed versions for AddressSanitizer and HWAddressSanitizer
respectively when compiling in kernel mode, by passing params
-asan-kernel-mem-intrinsic-prefix or -hwasan-kernel-mem-intrinsic-prefix.

By default the kernel-specialized versions of both passes drop the
prefixes for calls generated by memintrinsics. This assumes that all
locations that can lower the intrinsics to libcalls can safely be
instrumented. This unfortunately is not the case when implicit calls to
memintrinsics are inserted by the compiler in no_sanitize functions [1].

To solve the issue, normal memcpy/memset/memmove need to be
uninstrumented, and instrumented code should instead use the prefixed
versions. This also aligns with ASan behaviour in user space.

[1] https://lore.kernel.org/lkml/Yj2yYFloadFobRPx@lakrids/

Reviewed By: glider

Differential Revision: https://reviews.llvm.org/D122724

b8e49fdc

[MLIR][Presburger] Remove forward declaration to PresburgerLocalSpace · 8de84198
Groverkss authored Mar 31, 2022
```
This patch removes a forward declaration to PresburgerLocalSpace, a
class which does not exist anymore.
```
8de84198

[flang] Allow user to recover from bad edit descriptor with INTEGER · 88d4b85f

Jean Perier authored Mar 31, 2022

The runtime was crashing when an INTEGER was passed to formatted output with
a bad edit descriptor, even when the user provided IOSTAT. Flang
already signals an error when facing a similar error with other
types. Do the same with INTEGERs.

The corresponding input case already signals an error.

Differential Revision: https://reviews.llvm.org/D122749

88d4b85f

[flang] Skip `D` when including D debug line · 14c7754a

Jean Perier authored Mar 31, 2022

When including debug lines as code, the `D` should be treated as
white space. Previously an error was raised about bad labels because
the `D` remained a `D` when the source line was considered as code.
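
The idea can be sketched in a few lines of Python. This is only an illustration of the rule described above, not flang's implementation; the helper name `debug_line_as_code` is hypothetical.

```python
def debug_line_as_code(line: str) -> str:
    """Return a fixed-form source line with a leading D/d replaced by a blank.

    Illustrative only: a debug line is marked by `D` (or `d`) in column 1.
    When such a line is compiled as code, column 1 must read as white space
    so that the label field is not seen as starting with a letter.
    """
    if line[:1] in ("D", "d"):
        return " " + line[1:]
    return line
```

For instance, `debug_line_as_code("D     PRINT *, X")` yields the same statement with a blank in column 1, while lines that do not start with `D` pass through unchanged.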

Differential Revision: https://reviews.llvm.org/D122711

14c7754a

[X86] combineCarryThroughADD - recognise X86ISD::ADD(AND(X,1),-1) pattern can... · 481b1856

Simon Pilgrim authored Mar 31, 2022

[X86] combineCarryThroughADD - recognise X86ISD::ADD(AND(X,1),-1) pattern can be folded to X86ISD::BT

As mentioned on D122482, if we've generated a masked overflow test, see if we can fold it to X86ISD::BT to feed an X86ISD::ADC/SBB
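
The arithmetic behind the fold can be checked with a small model. Adding -1 (all ones) to `X & 1` carries out of the top bit exactly when bit 0 of `X` is set, which is the same value a bit test of bit 0 would put in the carry flag. The sketch below assumes a 32-bit width and uses hypothetical helper names; it models the flag arithmetic only, not the DAG combine itself.

```python
MASK32 = 0xFFFFFFFF  # assumed operand width for this illustration


def carry_of_masked_add(x: int) -> int:
    """Carry-out of the 32-bit addition (x & 1) + 0xFFFFFFFF, i.e. + (-1)."""
    total = (x & 1) + MASK32
    return total >> 32


def bit_test(x: int, bit: int = 0) -> int:
    """Value a bit test of `bit` in x would place in the carry flag."""
    return (x >> bit) & 1
```

For every input the two functions agree, so a consumer that only needs the carry (such as an add-with-carry or subtract-with-borrow) can take it from the bit test instead.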

Differential Revision: https://reviews.llvm.org/D122572

481b1856

[RISCV][RVV] Add Uses = [FRM] and mayRaiseFPException = true to RVV instructions · 2f1261ab

ShihPo Hung authored Mar 06, 2022

This patch adds Uses = [FRM] and mayRaiseFPException = true to the following
instructions:

VFADD, VFSUB, VFRSUB, VFMUL, VFDIV, VFRDIV
VFWADD, VFWSUB, VFWMUL
VFMADD, VFMACC, VFMSAC, VFMSUB
VFNMADD, VFNMACC, VFNMSAC, VFNMSUB
VFWMACC, VFWMSAC
VFWNMACC, VFWNMSAC
VFSQRT, VFREC7
VFREDOSUM, VFREDUSUM
VFWREDOSUM, VFWREDUSUM

and only adds mayRaiseFPException = true to the following instructions:

VFRSQRT7
VFMIN, VFMAX, VFREDMIN, VFREDMAX
VMFEQ, VMFNE, VMFLT, VMFLE, VMFGT, VMFGE

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D121087

2f1261ab

[LV] Invalidate widening decisions after maximizing vector bandwidth · b65267ca

David Green authored Mar 31, 2022

When MaximizeVectorBandwidth is enabled, we can end up (via calls to
collectUniformsAndScalars/setCostBasedWideningDecision through
calculateRegisterUsage) making widening decisions before we have decided
whether to fold the tail by masking. These decisions will be wrong if we
later decide to fold the tail, for example when the trip count is very
low. The vectorizer will then use incorrect costs for loads that should get
masked, using standard memory operation costs instead.

This still uses the EmulatedMaskMemRefHack costs for now (a bit
unfortunately), but the old costs without this change were 1, leading to
overly optimistic vectorization.

This slightly changes the way that the MaximizeVectorBandwidth option
works to make it easier to test, always honouring the option if it is
set.

Differential Revision: https://reviews.llvm.org/D120215

b65267ca

[mlir][memref][NFC] Remove unused function · 566975a2
Matthias Springer authored Mar 31, 2022
```
This fixes a compiler warning.
```
566975a2
[AMDGPU] Document the intended semantics of llvm.amdgcn.s.buffer.load · aa4c055e
Jay Foad authored Mar 29, 2022
```
Differential Revision: https://reviews.llvm.org/D122653
```
aa4c055e

[mlir][tensor] Fix bufferization of CollapseShapeOp / ExpandShapeOp · 51df6238

Matthias Springer authored Mar 31, 2022

Infer a tighter MemRef type instead of always falling back to the most dynamic MemRef type. Always using the most dynamic type was inefficient and caused op verification errors.

Differential Revision: https://reviews.llvm.org/D122649

51df6238

[RISCV][NFC] Fix comment to refer to correct file · 893d63fb
Fraser Cormack authored Mar 31, 2022

893d63fb
[mlir][memref] Fix CollapseShapeOp verifier · 86d118e7
Matthias Springer authored Mar 31, 2022
```
Differential Revision: https://reviews.llvm.org/D122647
```
86d118e7

[mlir][memref] Fix ExpandShapeOp verifier · 2bd7ee45

Matthias Springer authored Mar 31, 2022

* Complete rewrite of the verifier.
* CollapseShapeOp verifier will be updated in a subsequent commit.
* Update and expand op documentation.
* Add a new builder that infers the result type based on the source type, result shape and reassociation indices. In essence, only the result layout map is inferred.

Differential Revision: https://reviews.llvm.org/D122641

2bd7ee45

[Support/BLAKE3] Re-enable building with the simd-optimized implementations, v2 · 5426da8f

Argyrios Kyrtzidis authored Mar 31, 2022

* Support compiling with clang-5
* Check for `LLVM_DISABLE_ASSEMBLY_FILES` and have it set by
  `compiler-rt/lib/sanitizer_common/symbolizer/scripts/build_symbolizer.sh`
  which wants to receive and process only bitcode files.

5426da8f

Revert "[Clang] Add option to set alternative toolchain path" · e1b85430

Qiu Chaofan authored Mar 31, 2022

--overlay-platform-toolchain inserts a whole new toolchain path with
higher priority than the system default, which could be achieved by
composing smaller options. We need to figure out an alternative solution
and what is missing among these basic options.

e1b85430

[runtimes] Create Tests.cmake if it does not exist · d6623d72

Petr Hosek authored Mar 30, 2022

This is necessary so that Tests.cmake is always included in the
generated build file and any changes made by subbuilds are detected
without needing to rerun CMake.

This is equivalent to an earlier version of D121647.

Differential Revision: https://reviews.llvm.org/D121647

d6623d72

[docs] [tools] Document and alphabetize all llvm-config command-line options · aaf66084
Frances Wingerter authored Mar 31, 2022
```
Also implements explicit handling for the already-documented --help
flag.
```
aaf66084
[VP][LangRef] Correct select operands in vp.fptosi docs · cc67a8fc
Fraser Cormack authored Mar 31, 2022

cc67a8fc
[RISCV] Add VL patterns for vfwmul/vfwadd/vfwsub · b3851e99
Lian Wang authored Mar 24, 2022
```
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D122369
```
b3851e99

[test-release] Added -silent-log flag to test-release.sh · a30972fb

Tobias Hieta authored Mar 31, 2022

This flag silences the build output of test-release.sh so that
it works better in CI systems. The build output is still written
to the log files but is not echoed to stdout.

Reviewed By: tstellar

Differential Revision: https://reviews.llvm.org/D122146

a30972fb

[Driver] Move legacy -f[no-]unit-at-a-time to clang_ignored_gcc_optimization_f_Group · 85bd90cb
Fangrui Song authored Mar 30, 2022
```
Move to clang_ignored_gcc_optimization_f_Group like other ignored options. This
decreases code size a bit: ~400 bytes on x86-64.
```
85bd90cb

Mapping of FP operations to constrained intrinsics · 881350a9

Serge Pavlov authored Mar 30, 2022

A new function 'getConstrainedIntrinsic' is added, which for any given
instruction returns the ID of the corresponding constrained intrinsic. If
there is no constrained counterpart for the instruction or the instruction
is already a constrained intrinsic, the function returns zero.

This is a recommit of 115b3ace, reverted in
8160dd58.

Differential Revision: https://reviews.llvm.org/D69562

881350a9

[LoongArch] Construct codegen infra and generate first add instruction. · a1c67439

wanglei authored Mar 31, 2022

This patch constructs the codegen infra and successfully generates the first
'add' instruction. It adds an integer calling convention for fixed arguments,
which are passed in general-purpose registers.

New test added here:

  CodeGen/LoongArch/ir-instruction/add.ll

The test file is placed in a subdirectory because we will use
subdirectories to distinguish different categories of tests (e.g.
intrinsic, inline-asm ...).

Reviewed By: MaskRay, SixWeining

Differential Revision: https://reviews.llvm.org/D122366

a1c67439

[C++20] [Modules] Use '-' as the separator of partitions when searching · ee572129

Chuanqi Xu authored Mar 31, 2022

in filesystems

It is simpler to search for a module unit with the -fprebuilt-module-path
option. However, the partition separator ':' is not filesystem-friendly.
Following the discussion in https://reviews.llvm.org/D118586, I think
we reached consensus to use '-' as the separator instead. '-' is also
GCC's choice.

Previously I thought it would be better to add an option, but now I feel
that would be over-engineering. Another reason is that there are already too
many options for modules (mainly for clang modules). Given that it is not bad
to use '-' when searching, I think it is acceptable not to add an
option.
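
A minimal sketch of the name mapping, assuming the prebuilt module file is named after the module with a `.pcm` extension. The helper name and the example directory are hypothetical, and the lookup clang actually performs may involve more steps.

```python
from pathlib import PurePosixPath


def prebuilt_module_candidate(prebuilt_dir: str, module_name: str) -> str:
    """Map a module or partition name to a candidate file in a prebuilt dir.

    Assumption for illustration: the file stem is the module name with the
    partition separator ':' replaced by '-', plus a '.pcm' extension.
    """
    file_stem = module_name.replace(":", "-")
    return str(PurePosixPath(prebuilt_dir) / f"{file_stem}.pcm")
```

With this mapping, a partition named `M:Part` would be looked up as `M-Part.pcm`, while a plain module name such as `M` is unaffected.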

Reviewed By: iains

Differential Revision: https://reviews.llvm.org/D120874

ee572129

[GVNHoist] drop debug location according to the debug info guide · 368681f8

Aditya Kumar authored Mar 30, 2022

According to the LLVM debug info update guide: https://llvm.org/docs/HowToUpdateDebugInfo.html,
"Hoisting identical instructions which appear in several successor
blocks into a predecessor block. In this case there is no single
merged instruction. The rule for dropping locations applies".

Thanks to Yuanbo Li for reporting this.

Reviewed By: dblaikie

Reviewers: sebpop, tejohnson, dblaikie

Differential Revision: https://reviews.llvm.org/D122730

368681f8

[mlir][Vector] Fold ShuffleOp if result is identical to one of source vectors. · 01ad70fd

jacquesguan authored Mar 30, 2022

For example, we could do the following eliminations:
  fold vector.shuffle V1, V2, [0, 1, 2, 3] : <4xi32>, <2xi32> -> V1
  fold vector.shuffle V1, V2, [4, 5] : <4xi32>, <2xi32> -> V2
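
The condition can be modelled outside MLIR as a check on the mask alone. The following is a sketch with a hypothetical helper name; it returns a label instead of an MLIR value and ignores everything but the mask and the two source lengths.

```python
from typing import Optional, Sequence


def fold_shuffle(mask: Sequence[int], v1_len: int, v2_len: int) -> Optional[str]:
    """Return "V1" or "V2" if the shuffle reproduces that source, else None.

    Mask indices 0..v1_len-1 select from V1, and indices
    v1_len..v1_len+v2_len-1 select from V2.
    """
    mask = list(mask)
    if mask == list(range(v1_len)):
        return "V1"  # identity over the first source
    if mask == list(range(v1_len, v1_len + v2_len)):
        return "V2"  # identity over the second source
    return None  # a genuine shuffle, nothing to fold
```

Applied to the two cases above, `fold_shuffle([0, 1, 2, 3], 4, 2)` gives `"V1"` and `fold_shuffle([4, 5], 4, 2)` gives `"V2"`; any reordering or mixing of sources gives `None`.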

Differential Revision: https://reviews.llvm.org/D122706

01ad70fd

[X86] Add test with abs intrinsic for x86-partial-reduction optimization · 3728eebd
Wei Xiao authored Mar 30, 2022

3728eebd
[flang] Correct a typo when parsing format token white space · 09b1a6d6
V Donaldson authored Mar 29, 2022
```
A format such as "( D   C, X6. 2  )" is parsed the same as "(DC,X6.2)".
```
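
The equivalence in the example can be reproduced with a one-line normalization. This is a simplified sketch with a hypothetical helper name: it drops every blank and does not account for blanks inside quoted character literals, which a real format parser has to preserve.

```python
def strip_format_blanks(fmt: str) -> str:
    """Remove blanks from a format string (ignores quoted literals)."""
    return fmt.replace(" ", "")
```

Here `strip_format_blanks("( D   C, X6. 2  )")` produces `"(DC,X6.2)"`.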
09b1a6d6

[LLDB] Fix NSIndexPathSyntheticFrontEnd::Impl::Clear() to only clear the active union member · 14cad95d

Shafik Yaghmour authored Mar 30, 2022

NSIndexPathSyntheticFrontEnd::Impl::Clear() currently calls Clear() on both
union members regardless of which one is active. I modified it to only call
Clear() on the active member.

Differential Revision: https://reviews.llvm.org/D122753

14cad95d

[Utils] Add URL formatting for revert_checker · a4b56d76

Jordan R Abrahams-Whitehead authored Mar 30, 2022

This lets revert_checker.py be called with the -u option, which
formats the revert and reverted SHAs into handy URLs that point to the
LLVM reviews associated with those SHAs. This is useful for viewers who want
to look quickly at the changes made by SHAs that were potentially reverted.
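
A sketch of what such formatting might look like. The function names are hypothetical, and the `https://reviews.llvm.org/rG` prefix is an assumption about how the review site addresses commits, not something taken from the script.

```python
# Assumed URL prefix for commits on the review site (not taken from the script).
REVIEW_COMMIT_PREFIX = "https://reviews.llvm.org/rG"


def sha_to_review_url(sha: str, prefix: str = REVIEW_COMMIT_PREFIX) -> str:
    """Turn a commit SHA into a URL under the assumed prefix."""
    return f"{prefix}{sha}"


def format_revert(revert_sha: str, reverted_sha: str) -> str:
    """Describe a revert as a pair of URLs instead of bare SHAs."""
    return f"{sha_to_review_url(revert_sha)} reverts {sha_to_review_url(reverted_sha)}"
```

Passing a revert SHA and the SHA it reverts to `format_revert` returns one line containing both links.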

Differential Revision: https://reviews.llvm.org/D122772

a4b56d76

[LoopIdiom] Merge TBAA of adjacent stores when creating memset · e02f4976

Stephen Long authored Mar 30, 2022

Factor in the TBAA of adjacent stores instead of just the head store
when merging stores into a memset. We were seeing GVN remove a load that
had a TBAA that matched the 2nd store because GVN determined it didn't
match the TBAA of the memset. The memset had the TBAA of only the first
store.

For example, loading the field pi_ of shared_count after a memset that creates
an array of shared_ptr:

    template<class T>
    class shared_ptr {
      T *p;
      shared_count refcount;
    };

    class shared_count {
      sp_counted_base *pi_;
    };

Differential Revision: https://reviews.llvm.org/D122205

e02f4976

[Clang][CodeGen]Beautify dump format, add indent for nested struct and struct members · 907d3ace

wangyihan authored Mar 31, 2022

Beautify the dump format by adding indentation for nested structs and struct members, and fix the test cases in dump-struct-builtin.c.
For example, given this struct:
```
  struct A {
    int a;
    struct B {
      int b;
      struct C {
        struct D {
          int d;
          union E {
            int x;
            int y;
          } e;
        } d;
        int c;
      } c;
    } b;
  };
```
Before:
```
struct A {
int a = 0
struct B {
    int b = 0
struct C {
struct D {
            int d = 0
union E {
                int x = 0
                int y = 0
                }
            }
        int c = 0
        }
    }
}
```
After:
```
struct A {
    int a = 0
    struct B {
        int b = 0
        struct C {
            struct D {
                int d = 0
                union E {
                    int x = 0
                    int y = 0
                }
            }
            int c = 0
        }
    }
}
```
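
The "After" layout corresponds to a plain recursive walk in which each nesting level adds one indent step. The sketch below reproduces that layout for the example struct. It is an illustration with a hypothetical data model (tuples for records and fields, every field printed as `= 0`), not the code in Clang's CodeGen.

```python
def dump_record(kind, name, members, depth=0, indent="    "):
    """Return the lines of an indented dump for a struct or union.

    `members` is a list of (type, field) leaves or nested
    (kind, name, members) records; leaf values are printed as 0.
    """
    pad = indent * depth
    lines = [f"{pad}{kind} {name} {{"]
    for member in members:
        if len(member) == 2:
            type_name, field = member
            lines.append(f"{pad}{indent}{type_name} {field} = 0")
        else:
            nested_kind, nested_name, nested_members = member
            # Nested records are dumped one level deeper than their parent.
            lines.extend(
                dump_record(nested_kind, nested_name, nested_members, depth + 1, indent)
            )
    lines.append(f"{pad}}}")
    return lines


# The struct A from the example above, in the hypothetical tuple model.
EXAMPLE = ("struct", "A", [
    ("int", "a"),
    ("struct", "B", [
        ("int", "b"),
        ("struct", "C", [
            ("struct", "D", [
                ("int", "d"),
                ("union", "E", [("int", "x"), ("int", "y")]),
            ]),
            ("int", "c"),
        ]),
    ]),
])
```

Joining `dump_record(*EXAMPLE)` with newlines gives the same text as the "After" listing.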

Reviewed By: erichkeane

Differential Revision: https://reviews.llvm.org/D122704

907d3ace

[libc] Improve the performance of expm1f. · a5466f04

Tue Ly authored Mar 27, 2022

Improve the performance of expm1f:
- Rearrange the selection logic for different cases to improve the overall
throughput.
- Use the same degree-4 polynomial for large inputs as `expf`
(https://reviews.llvm.org/D122418), reduced from a degree-7 polynomial.

Performance benchmark using the perf tool from the CORE-MATH project
(https://gitlab.inria.fr/core-math/core-math/-/tree/master):
Before this patch:
```
$ ./perf.sh expm1f

CORE-MATH reciprocal throughput   : 15.362
System LIBC reciprocal throughput : 53.288
LIBC reciprocal throughput        : 54.572

$ ./perf.sh expm1f --latency

CORE-MATH latency   : 57.759
System LIBC latency : 147.146
LIBC latency        : 118.057
```

After this patch:
```
$ ./perf.sh expm1f

CORE-MATH reciprocal throughput   : 15.359
System LIBC reciprocal throughput : 53.188
LIBC reciprocal throughput        : 14.600

$ ./perf.sh expm1f --latency

CORE-MATH latency   : 57.774
System LIBC latency : 147.119
LIBC latency        : 60.280

```
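
The improvement can be read off as ratios of the LIBC rows before and after the patch. The snippet below only redoes that arithmetic on the numbers reported above; the dictionary and function names are made up for the example.

```python
# LIBC rows from the benchmark output above (lower is better for both).
BEFORE = {"reciprocal throughput": 54.572, "latency": 118.057}
AFTER = {"reciprocal throughput": 14.600, "latency": 60.280}


def speedups(before, after):
    """Ratio of the old to the new measurement for each metric."""
    return {metric: before[metric] / after[metric] for metric in before}
```

This works out to roughly a 3.7x improvement in reciprocal throughput and close to 2x in latency.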

Reviewed By: michaelrj, santoshn, zimmermann6

Differential Revision: https://reviews.llvm.org/D122538

a5466f04