Commits · 40651b25e8c41cc8ff821b0132cbda93a0763a8d · Roger Ferrer / llvm-epi-0.8

Feb 14, 2020

Merge remote-tracking branch 'upstream/master' · 40651b25
Jenkins CI authored 5 years ago

40651b25

[scudo][standalone] Allow setting release to OS · 5f91c7b9

Summary:
Add a method to set the release to OS value as the system runs,
and allow this to be set differently in the primary and the secondary.
Also, add a default value to use for primary and secondary. This
allows Android to have a default that is different for
primary/secondary.

Update mallopt to support setting the release to OS value.

Reviewers: pcc, cryptoad

Reviewed By: cryptoad

Subscribers: cryptoad, jfb, #sanitizers, llvm-commits

Tags: #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D74448

5f91c7b9

[AsmPrinter] Use the MCAsmInfo to determine if we need descriptors. · b75692c3

Sean Fertile authored 5 years ago

In https://reviews.llvm.org/rG8b737688c21a9755cae14cb9343930e0882164ab I
switched the condition gating the creation of the descriptor symbol from
checking whether the MCAsmInfo says we need to support descriptors to
checking whether the OS is AIX. Technically the two should be
interchangeable: if we are targeting AIX then we need to emit XCOFF object
files, and the MCAsmInfo must return true for needing function descriptors.

This doesn't account for lit tests whose RUN lines set only the arch,
e.g. test/CodeGen/XCore/section-name.ll. When such a test is run natively
on AIX, we end up with the target xcore-ibm-aix, for which
needFunctionDescriptors is false.

This patch reverts to using the MCAsmInfo and adds an assert that the
target OS must be AIX since that is the only target using the descriptor
hook.
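
The gating described above can be sketched as follows. This is a minimal illustration, not the code from the patch: `TargetInfo` and `shouldEmitDescriptorSymbol` are hypothetical stand-ins for the real triple and MCAsmInfo queries.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the target triple plus the MCAsmInfo flag.
struct TargetInfo {
  std::string arch;
  std::string os;
  bool needsFunctionDescriptors;
};

// Gate on the capability flag rather than on the OS name, and assert that
// the only OS expected to set the flag is AIX.
bool shouldEmitDescriptorSymbol(const TargetInfo &target) {
  if (!target.needsFunctionDescriptors)
    return false;
  assert(target.os == "aix" && "only AIX is expected to use the descriptor hook");
  return true;
}
```

With this shape, a target such as xcore-ibm-aix (flag false, OS AIX) simply skips the descriptor path instead of being treated as an XCOFF target.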

Differential Revision: https://reviews.llvm.org/D74622

b75692c3

fix some comment typos to cycle bots · 87e80e5e
Nico Weber authored 5 years ago

87e80e5e

[windows] Add /Gw to compiler flags · 09153ab9

Nico Weber authored 5 years ago

This is like -fdata-sections, and it's not part of /O2 by default for some reason.

In the cmake build, reduces the size of clang.exe from 70,358,016 bytes to 69,982,720 bytes.

clang-format.exe goes from 3,703,296 bytes to 3,331,072 bytes.

Differential Revision: https://reviews.llvm.org/D74573

09153ab9

[AMDGPU] Always enable XNACK feature when support is explicitly requested · 07824e65
Austin Kerbow authored 5 years ago
```
Differential Revision: https://reviews.llvm.org/D74630
```
07824e65
[docs] Add note on using cmake to perform the build · 4af3be7b
Evandro Menezes authored 5 years ago
```
Repeat the build instructions from the top level README in the Getting
Started guide.
```
4af3be7b

AMDGPU: Add option to disable CGP division expansion · 9ec66860

Matt Arsenault authored 5 years ago

The division expansions in AMDGPUCodeGenPrepare can't be relied on for
correctness, since they punt to later optimization and possibly
legalization in some cases. We still need a way to be able to write
tests for the legalizer versions of the expansion. This is mostly for
GlobalISel, since the optimizations the expansion expects aren't
implemented.

The interaction with the flag to expand 64-bit division in the IR is
pretty confusing, but these flags have different purposes.

9ec66860

[x86] remove stray test assertions; NFC · 63ed0ece

Sanjay Patel authored 5 years ago

I updated the prefix and forgot to manually remove the old names
as part of rG6071fc57a45.f

63ed0ece

[x86] regenerate complete test checks for sqrt{est}; NFC · 6071fc57

Sanjay Patel authored 5 years ago

The existing checks were trying to test both CPU-specific
codegen and generic codegen with explicit attributes for
the various sqrt estimate possibilities, but that was hard
to decipher and update (D69989).

Instead generate the complete results for various CPUs,
and that makes it clear which models have slow/fast sqrt
attributes along with all of the other potential diffs
(FMA, AVX2, scheduling).

Also, explicitly add the function attributes corresponding
to whether DAZ/FTZ denorm settings are expected.

6071fc57

AMDGPU: Add option to expand 64-bit integer division in IR · 34d9a16e

Matt Arsenault authored 5 years ago

I didn't realize we were already expanding 24/32-bit division here.
Use the available IntegerDivision utilities. This uses loops, so it
produces significantly smaller code than the inline DAG expansion.

This now requires width reductions of 64-bit divisions before
introducing the expanded loops.

This helps work around missing legalization in GlobalISel for
division, which are the only remaining core instructions that didn't
work at all.

I think this is plausibly a better implementation than exists in the
DAG, although turning it on by default misses out on the constant
value optimizations and also needs benchmarking.
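
To illustrate why a loop-based expansion is compact, here is a minimal sketch of shift-subtract (restoring) division for unsigned 64-bit values. It is not the IR that the IntegerDivision utilities emit; the function name `udivrem64` is hypothetical, and the sketch assumes a non-zero divisor.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical helper: computes {quotient, remainder} one bit per iteration.
// Precondition: divisor != 0.
std::pair<std::uint64_t, std::uint64_t> udivrem64(std::uint64_t dividend,
                                                  std::uint64_t divisor) {
  std::uint64_t quotient = 0;
  std::uint64_t remainder = 0;
  for (int bit = 63; bit >= 0; --bit) {
    // Bring down the next dividend bit. Before the shift, the remainder is at
    // most dividend >> (bit + 1), so the shift cannot overflow 64 bits.
    remainder = (remainder << 1) | ((dividend >> bit) & 1u);
    if (remainder >= divisor) {
      remainder -= divisor;
      quotient |= std::uint64_t{1} << bit;
    }
  }
  return {quotient, remainder};
}
```

The loop body is a handful of operations repeated 64 times, which is the code-size trade-off the commit message refers to: smaller code than a fully inlined expansion, at the cost of a loop.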

34d9a16e

[X86] Use ZERO_EXTEND instead of SIGN_EXTEND in the fast isel handling of convert_from_fp16. · 391cc4dd
Craig Topper authored 5 years ago

391cc4dd
[X86] Add AVX512 support to the fast isel code for Intrinsic::convert_from_fp16/convert_to_fp16. · fc0c72b2
Craig Topper authored 5 years ago

fc0c72b2

[LoopRotate] Get and update MSSA only if available in legacy pass manager. · 1326a5a4

Alina Sbirlea authored 5 years ago

Summary:
Potential fix for: https://bugs.llvm.org/show_bug.cgi?id=44889 and https://bugs.llvm.org/show_bug.cgi?id=44408

In the legacy pass manager, LoopRotate need not compute MemorySSA when it is not in the same loop pass manager as other loop passes.
There is currently no way to differentiate between the two cases, so this change limits LoopRotate to updating MemorySSA only when the analysis is already available.
A side effect is that this will split the loop pipeline.

This issue does not apply to the new pass manager, where we have a flag specifying if all loop passes in that loop pass manager preserve MemorySSA.

Reviewers: dmgreen, fedor.sergeev, nikic

Subscribers: Prazek, hiraditya, george.burgess.iv, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74574

1326a5a4

GlobalISel: Lower s64->s16 G_FPTRUNC · bfbfa185

Matt Arsenault authored 5 years ago

This is more or less directly ported from the AMDGPU custom lowering
for FP_TO_FP16. I made a few minor fixups (using G_UNMERGE_VALUES
instead of creating shift/trunc to extract the two halves, and zexting
an inverted compare instead of select_cc).

This also does not include the fast math expansion from the DAG, which
converts to f32 and then to f16. I think that belongs in a
pre-legalize combine instead.

bfbfa185

[GlobalISel] LegalizationArtifactCombiner: Fix a bug in tryCombineMerges · 187686a2

Volkan Keles authored 5 years ago

As with the COPY instructions explained in D70616, we don't check the constraints
when combining G_UNMERGE_VALUES. Use the same logic used in D70616 to check
if registers can be replaced, or a COPY instruction needs to be built.

https://reviews.llvm.org/D70564

187686a2

[Hexagon] v67+ HVX register pairs should support either direction · bf3b86bc

Brian Cain authored 6 years ago

Assembler now permits pairs like 'v0:1', which are encoded
differently from the odd-first pairs like 'v1:0'.

The compiler will require more work to leverage these new register
pairs.

bf3b86bc

Fix tests after previous commit · 70530652
Aaron Puchert authored 5 years ago
```
We don't want to test for this warning, so we just fix it.
```
70530652

Warn about zero-parameter K&R definitions in -Wstrict-prototypes · 2f26bc55

Aaron Puchert authored 5 years ago

Summary:
Zero-parameter K&R definitions specify that the function has no
parameters, but they are still not prototypes, so calling the function
with the wrong number of parameters is just a warning, not an error.

The C11 standard doesn't seem to directly define what a prototype is,
but it can be inferred from 6.9.1p7: "If the declarator includes a
parameter type list, the list also specifies the types of all the
parameters; such a declarator also serves as a function prototype
for later calls to the same function in the same translation unit."
This refers to 6.7.6.3p5: "If, in the declaration “T D1”, D1 has
the form
    D(parameter-type-list)
or
    D(identifier-list_opt)
[...]". Later in 6.11.7 it also refers only to the parameter-type-list
variant as prototype: "The use of function definitions with separate
parameter identifier and declaration lists (not prototype-format
parameter type and identifier declarators) is an obsolescent feature."

We already correctly treat an empty parameter list as non-prototype
declaration, so we can just take that information.

GCC also warns about this with -Wstrict-prototypes.

This shouldn't affect C++, because there all FunctionType's are
FunctionProtoTypes. I added a simple test for that.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D66919

2f26bc55

[APInt] Add some basic APInt::byteSwap unit tests · f0181cc7
Simon Pilgrim authored 5 years ago
```
As noted on D74621 we currently have no test coverage
```
f0181cc7
TTI: Fix vectorization cost for bswap · b38940df
Matt Arsenault authored 5 years ago

b38940df
[lldb/Plugin] s/LLDB_PLUGIN/LLDB_PLUGIN_DEFINE/ (NFC) · bba9ba8d
Jonas Devlieghere authored 5 years ago
```
Rename LLDB_PLUGIN to LLDB_PLUGIN_DEFINE as Pavel suggested in D73067 to
avoid name conflict.
```
bba9ba8d
[libc++] Add missing include for is_same in test · e8358455
Eric Fiselier authored 5 years ago

e8358455
AMDGPU: Improve i16/v2i16 bswap · 8c2c0b36
Matt Arsenault authored 5 years ago

8c2c0b36
[X86] Fix copy/paste mistake in comment. NFC · 7badb389
Craig Topper authored 5 years ago

7badb389
AMDGPU: Add baseline tests for 16-bit bswap · e0fd2d6d
Matt Arsenault authored 5 years ago

e0fd2d6d
Merge remote-tracking branch 'upstream/master' · 6c5ec5fb
Jenkins CI authored 5 years ago

6c5ec5fb
AMDGPU/GlobalISel: Handle G_BSWAP · a257bde4
Matt Arsenault authored 5 years ago

a257bde4

[libc++] Remove cycle between <type_traits> and <cstddef> · cccf1ef0

Eric Fiselier authored 5 years ago

This was caused by byte depending on traits. This patch moves
the minimal amount of meta-programming into <cstddef> to break the cycle.

cccf1ef0

Fix compilation breakage introduced by 8404aeb5. · 0d2ba657
Alexandre Ganea authored 5 years ago
```
Also fix BitVector unittest failure when LLVM_ENABLE_ASSERTIONS is OFF, introduced by d110c3a9.
```
0d2ba657

[Driver] Rename AddGoldPlugin to addLTOOptions. NFC · 597dfb3b

Fangrui Song authored 5 years ago

AddGoldPlugin does more than adding `-plugin path/to/LLVMgold.so`.
It works with lld and GNU ld, and adds other LTO options.
So AddGoldPlugin is no longer a suitable name.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D74591

597dfb3b

Reverting D73027 [DependenceAnalysis] Dependencies for loads marked with... · cae643d5

Evgeniy Brevnov authored 5 years ago

Reverting D73027 [DependenceAnalysis] Dependencies for loads marked with "invariant.load" should not be shared with general accesses (PR42151).

cae643d5

add type_traits include as required for std::integral_constant · e337fb07
Eric Fiselier authored 5 years ago

e337fb07

Revert "Reland D74436 "Change clang option -ffp-model=precise to select ffp-contract=on"" · 9122b92f

Melanie Blower authored 5 years ago

This reverts commit 0a1123eb.
Reverting because it is causing trouble for PowerPC.
I also fixed the test fp-model.c, which was looking for an incorrect error message.

9122b92f

[Support] On Windows, ensure hardware_concurrency() extends to all CPU sockets and all NUMA groups · 8404aeb5

Alexandre Ganea authored 5 years ago

The goal of this patch is to maximize CPU utilization on multi-socket or high core count systems, so that parallel computations such as LLD/ThinLTO can use all hardware threads in the system. Before this patch, on Windows, at most 64 hardware threads could be used, in some cases dispatched only on one CPU socket.

== Background ==
Windows doesn't have a flat cpu_set_t like Linux. Instead, it projects hardware CPUs (or NUMA nodes) to applications through a concept of "processor groups". A "processor" is the smallest unit of execution on a CPU, that is, a hyper-thread if SMT is active; a core otherwise. There's a limit of 32 processors on older 32-bit versions of Windows, which was later raised to 64 processors with 64-bit versions of Windows. This limit comes from the affinity mask, which historically has the width of a pointer, sizeof(void*). Consequently, the concept of "processor groups" was introduced for dealing with systems with more than 64 hyper-threads.

By default, the Windows OS assigns only one "processor group" to each starting application, in a round-robin manner. If the application wants to use more processors, it needs to programmatically enable it, by assigning threads to other "processor groups". This also means that affinity cannot cross "processor group" boundaries; one can only specify a "preferred" group on start-up, but the application is free to allocate more groups if it wants to.

This creates a peculiar situation, where newer CPUs like the AMD EPYC 7702P (64-cores, 128-hyperthreads) are projected by the OS as two (2) "processor groups". This means that by default, an application can only use half of the cores. This situation could only get worse in the years to come, as dies with more cores will appear on the market.
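
The arithmetic behind the 64-processor limit can be sketched as below. `minProcessorGroups` is a hypothetical helper that only computes the smallest group count an affinity mask of a given width allows; the OS may choose to form more groups than this minimum, for example along socket or NUMA boundaries.

```cpp
// Hypothetical helper: smallest number of processor groups able to hold
// `logicalProcessors` when one group's affinity mask has `maskBits` bits.
constexpr unsigned minProcessorGroups(unsigned logicalProcessors,
                                      unsigned maskBits = 64) {
  return (logicalProcessors + maskBits - 1) / maskBits;
}

// 64 cores with 2 hyper-threads each need at least two 64-bit groups.
static_assert(minProcessorGroups(128) == 2, "128 logical processors need 2 groups");
static_assert(minProcessorGroups(64) == 1, "64 logical processors fit in 1 group");
```

An application confined to its starting group therefore sees at most `maskBits` processors, which is why half of a 128-thread CPU goes unused by default.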

== The problem ==
The heavyweight_hardware_concurrency() API was introduced so that only *one hardware thread per core* was used. Once that API returns, that original intention is lost; only the number of threads is retained. Consider a situation, on Windows, where the system has 2 CPU sockets, 18 cores each, each core having 2 hyper-threads, for a total of 72 hyper-threads. Both heavyweight_hardware_concurrency() and hardware_concurrency() currently return 36, because on Windows they are simply wrappers over std::thread::hardware_concurrency() -- which can only return processors from the current "processor group".

== The changes in this patch ==
To solve this situation, we capture (and retain) the initial intention until the point of usage, through a new ThreadPoolStrategy class. The computation of the number of threads to use is deferred as late as possible, until the moment when the std::threads are created (ThreadPool in the case of ThinLTO).

When using hardware_concurrency(), setting ThreadCount to 0 now means to use all the possible hardware CPU (SMT) threads. Providing a ThreadCount above the maximum number of threads has no effect; the maximum is used instead.
The heavyweight_hardware_concurrency() is similar to hardware_concurrency(), except that only one thread per hardware *core* will be used.

When LLVM_ENABLE_THREADS is OFF, the threading APIs will always return 1, to ensure any caller loops will be exercised at least once.
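
The rules above can be sketched with a small strategy object whose thread count is computed only when asked. This is an illustration under stated assumptions, not LLVM's ThreadPoolStrategy: `Topology`, `ThreadStrategy`, and all field names are hypothetical, and the machine topology is passed in rather than queried from the OS.

```cpp
#include <algorithm>

// Hypothetical description of the machine, supplied by the caller.
struct Topology {
  unsigned cores;          // physical cores across all sockets
  unsigned threadsPerCore; // 2 when SMT is active, 1 otherwise
};

// Hypothetical strategy: records the caller's intent and defers the
// actual thread count until compute() is called.
struct ThreadStrategy {
  unsigned requested; // 0 means "use everything available"
  bool onePerCore;    // true models heavyweight_hardware_concurrency()

  unsigned compute(const Topology &machine, bool threadsEnabled = true) const {
    if (!threadsEnabled)
      return 1; // mirrors the LLVM_ENABLE_THREADS=OFF behaviour described above
    unsigned maxThreads =
        onePerCore ? machine.cores : machine.cores * machine.threadsPerCore;
    if (requested == 0)
      return maxThreads;
    return std::min(requested, maxThreads); // requests above the maximum are clamped
  }
};
```

For the 2-socket, 18-cores-per-socket example, `Topology{36, 2}` gives 72 threads for the plain strategy and 36 for the one-thread-per-core strategy, because the intent survives until the count is actually needed.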

Differential Revision: https://reviews.llvm.org/D71775

8404aeb5

[clang-scan-deps] Switch to using a ThreadPool · d9049e87

Alexandre Ganea authored 5 years ago

Use a ThreadPool instead of plain std::threads in clang-scan-deps.
This is needed to further support https://reviews.llvm.org/D71775.

Differential Revision: https://reviews.llvm.org/D74569

d9049e87

[ADT] Support BitVector as a key in DenseSet/Map · d110c3a9

Alexandre Ganea authored 5 years ago

This patch adds DenseMapInfo<> support for BitVector and SmallBitVector.

This is part of https://reviews.llvm.org/D71775, where a BitVector is used as a thread affinity mask.
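
As a rough standard-library analogue of using a bit vector as a hash-container key (it does not use LLVM's DenseMapInfo or BitVector), the sketch below relies on the `std::hash<std::vector<bool>>` specialization that the standard provides. The alias `AffinityMask` and the function `countDistinctMasks` are hypothetical names.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical alias: one bit per hardware thread.
using AffinityMask = std::vector<bool>;

// Counts distinct masks by inserting them into a hash set keyed on the
// bit vector itself.
std::size_t countDistinctMasks(const std::vector<AffinityMask> &masks) {
  std::unordered_set<AffinityMask> unique(masks.begin(), masks.end());
  return unique.size();
}
```

Keying on the mask lets threads that share an affinity be grouped with a single lookup.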

d110c3a9

Fix line endings produced by update_cc_test_checks.py · c2931070

Alex Richardson authored 5 years ago

Use the same approach as update_llc_test_checks.py to always write \n
line endings. This should fix the Windows buildbots.

c2931070

[libc++] Remove unnecessary typenames from std/numerics/c.math/abs.pass.cpp · f54e7b4e

Louis Dionne authored 5 years ago

There are some unnecessary typenames in std/numerics/c.math/abs.pass.cpp;
i.e. they're not in a dependent context.
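
A minimal sketch of the distinction, not taken from the test file itself (all names below are hypothetical): outside a template the nested type name is non-dependent, so `typename` is accepted but redundant; inside a template body the name depends on `T` and the keyword is needed.

```cpp
#include <type_traits>

// Non-dependent context: no template parameter is involved, so both aliases
// name the same type and the `typename` keyword adds nothing.
using WithoutKeyword = std::make_unsigned<int>::type;
using WithKeyword = typename std::make_unsigned<int>::type;
static_assert(std::is_same<WithoutKeyword, WithKeyword>::value,
              "same type either way");

template <class T>
typename std::make_unsigned<T>::type toUnsigned(T value) {
  // Dependent context: the nested name depends on T, so `typename` is
  // needed in this block-scope declaration.
  typedef typename std::make_unsigned<T>::type Unsigned;
  return static_cast<Unsigned>(value);
}
```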

Patch by Bryce Adelstein Lelbach

Differential Revision: https://reviews.llvm.org/D72106

f54e7b4e

Revert "[clang-tools-extra] fix the check for if '-latomic' is necessary" · 13700c38
Luís Marques authored 5 years ago
```
This reverts commit 1d40c415.
This seems to have caused build failures on ARM/AArch64.
```
13700c38