Commits · 19e291aac04f3fcaf3de55763e496b187158c938 · Lorenzo Albano / LLVM bpEVL

Oct 13, 2015

Looks like malformed-machos 00000031.a test is just getting a different error · 19e291aa
Kevin Enderby authored Oct 13, 2015
```
on some of the bots.  I’ll remove this test for now.

llvm-svn: 250141
```

DAGCombiner: Don't stop finding better chain on 2 aliases · e5d9515f

Matt Arsenault authored Oct 13, 2015

The comment says this was stopped because it was unlikely to be
profitable. This is not true if you want to combine vector loads
with multiple components.

For a simple case that looks like

t0 = load t0 ...
t1 = load t0 ...
t2 = load t0 ...
t3 = load t0 ...

t4 = store t0:1, t0:1
  t5 = store t4, t1:0
    t6 = store t5, t2:0
      t7 = store t6, t3:0

We want to get all of these stores onto a chain
that is a TokenFactor of these N loads. This mostly
solves the AMDGPU merge-stores.ll regressions
with -combiner-alias-analysis for merging vector
stores of vector loads.

llvm-svn: 250138


x86: preserve flags when folding atomic operations · 986ed68e

JF Bastien authored Oct 13, 2015

Summary:
D4796 taught LLVM to fold some atomic integer operations into a single
instruction. The pattern was unaware that the instructions clobbered
flags.

This patch adds the missing EFLAGS definition.

Floating point operations don't set flags, so the subsequent fadd
optimization is correct. The same applies to the surrounding
load/store optimizations.

Reviewers: rsmith, rtrieu

Subscribers: llvm-commits, reames, morisset

Differential Revision: http://reviews.llvm.org/D13680

llvm-svn: 250135


AMDGPU: Refactor isVGPRToSGPRCopy · f0d9e47d

Matt Arsenault authored Oct 13, 2015

It should now correctly handle physical registers and make
it easier to identify the other direction.

llvm-svn: 250132


Remove the correct unstable malformed-machos test mem-crup-0261.macho and · 3c4927b7

Kevin Enderby authored Oct 13, 2015

restore the malformed-machos 00000031.a test.  Hopefully this will get all the
build bots happy again.  I’ll again keep an eye on them.

llvm-svn: 250130


DAGCombiner: Combine extract_vector_elt from build_vector · 61dc235f

Matt Arsenault authored Oct 12, 2015

This basic combine was surprisingly missing.
AMDGPU legalizes many operations in terms of 32-bit vector components,
so not doing this results in many extra copies and subregister extracts
that need to be cleaned up later.

InstCombine already does this for the hasOneUse case. The target hook
fixes a handful of tests that would otherwise break (e.g. ARM/vmov.ll),
where a vector instruction that materializes a repeated immediate turns
into a constant vector load with more scalar copies from it.

llvm-svn: 250129


[InstCombine] Tidied up SSE4A tests. · aa0ec7f4
Simon Pilgrim authored Oct 12, 2015
```
First stage of bugfix discussed in D13348

llvm-svn: 250121
```
Temporarily remove the test added in r250117 while I investigate why two · 0b3bfd15
Kevin Enderby authored Oct 12, 2015
```
of the build bots get a different error on that malformed file.

llvm-svn: 250120
```

Assign correct edge weights to unwind destinations when lowering invoke statement. · bf22f506

Cong Hou authored Oct 12, 2015

When lowering an invoke statement, all unwind destinations are directly added as successors of the call site block, and the weights of those new edges are not assigned properly: the default weight of 16 is used for them. This patch calculates the proper edge weights when collecting all unwind destinations.

Differential revision: http://reviews.llvm.org/D13354

llvm-svn: 250119


[SelectionDAG] Add common vector constant folding helper function · c8832fc2

Simon Pilgrim authored Oct 12, 2015

We have a number of functions that implement constant folding of vectors (unary and binary ops) in near identical manners (and the differences don't appear to be critical).

This patch introduces a common implementation (SelectionDAG::FoldConstantVectorArithmetic) and calls this in both the unary and binary op cases.

After this initial patch I intend to begin enabling vector constant folding for a wider number of opcodes in SelectionDAG::getNode().

Differential Revision: http://reviews.llvm.org/D13665

llvm-svn: 250118


Fixed bugs in llvm-objdump while parsing Mach-O files from malformed archives · 90395545

Kevin Enderby authored Oct 12, 2015

that caused aborts.  This was because the characters of the ‘Size’ field in
the archive header were not decimal digits.
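
A minimal sketch of the kind of check this implies, assuming the common ar layout in which the 'Size' field is a space-padded decimal string. The helper name `parseArchiveSizeField` and its -1 error convention are hypothetical and are not taken from the llvm-objdump sources; the point is only that a malformed field is reported rather than causing an abort.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not the llvm-objdump code): parse the space-padded
// decimal 'Size' field of an archive member header.
// Returns the size, or -1 if the field is empty, non-decimal, or too large.
std::int64_t parseArchiveSizeField(const std::string &Field) {
  // Trailing spaces are treated as padding (assumption: common ar format).
  std::string::size_type End = Field.size();
  while (End > 0 && Field[End - 1] == ' ')
    --End;
  if (End == 0)
    return -1;

  std::int64_t Value = 0;
  for (std::string::size_type I = 0; I < End; ++I) {
    char C = Field[I];
    if (C < '0' || C > '9')
      return -1; // non-decimal character: report an error, don't abort
    std::int64_t Digit = C - '0';
    if (Value > (INT64_MAX - Digit) / 10)
      return -1; // the next step would overflow
    Value = Value * 10 + Digit;
  }
  return Value;
}
```

A caller would treat a negative result as a malformed archive member and emit a diagnostic instead of continuing to parse.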

rdar://22983603

llvm-svn: 250117


Oct 12, 2015

[CMake] Adding support for passing in profiling data. · 9ad0380b

Chris Bieneman authored Oct 12, 2015

Adds LLVM_PROFDATA_FILE option to allow specifying a profile data file to be used during compilation of LLVM and subprojects.

llvm-svn: 250108


Update the branch weight metadata in JumpThreading pass. · 3320bcd8

Cong Hou authored Oct 12, 2015

In JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB).
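
The scaling described above can be sketched as follows. This is an illustration only: the function name `scaleWeightAfterThreading`, the use of plain integers for frequencies, the clamping of degenerate inputs to zero, and the floating-point arithmetic are all assumptions, not the code in the patch.

```cpp
#include <cstdint>

// Hypothetical helper: scale the BB->SuccBB branch weight by
//   1 - Freq(PredBB->BB) / Freq(BB->SuccBB)
// as the commit message describes.
std::uint32_t scaleWeightAfterThreading(std::uint32_t Weight,
                                        std::uint64_t PredToBBFreq,
                                        std::uint64_t BBToSuccFreq) {
  // Assumption: if nothing is left of the edge frequency, the weight is 0.
  if (BBToSuccFreq == 0 || PredToBBFreq >= BBToSuccFreq)
    return 0;
  double Scale = 1.0 - static_cast<double>(PredToBBFreq) /
                           static_cast<double>(BBToSuccFreq);
  // Truncates toward zero; a real implementation may round differently.
  return static_cast<std::uint32_t>(Weight * Scale);
}
```

For example, with a weight of 100, Freq(PredBB->BB) = 1 and Freq(BB->SuccBB) = 4, the scale factor is 0.75 and the adjusted weight is 75.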

Differential revision: http://reviews.llvm.org/D10979

llvm-svn: 250089


Make Win64 localescape offsets FP relative instead of SP relative · 4a5f35c0

Reid Kleckner authored Oct 12, 2015

We made them SP relative back in March (r233137) because that's the
value the runtime passes to EH functions. With the new cleanuppad IR,
funclets adjust their frame argument from SP to FP, so our offsets
should now be FP-relative.

llvm-svn: 250088


[llvm-symbolizer] Add -print-address option · 80f82fb2
Hemant Kulkarni authored Oct 12, 2015
```
Differential Revision: http://reviews.llvm.org/D13518

llvm-svn: 250086
```

[x86] Fix wrong lowering of vsetcc nodes (PR25080). · b0fe4eb1

Andrea Di Biagio authored Oct 12, 2015

Function LowerVSETCC (in X86ISelLowering.cpp) worked under the wrong
assumption that for non-AVX512 targets, the source type and destination type
of a type-legalized setcc node were always the same type.

This assumption was unfortunately incorrect; the type legalizer is not always
able to promote the return type of a setcc to the same type as the first
operand of a setcc.

In the case of a vsetcc node, the legalizer firstly checks if the first input
operand has a legal type. If so, then it promotes the return type of the vsetcc
to that same type. Otherwise, the return type is promoted to the 'next legal
type', which, for vectors of MVT::i1 is always a 128-bit integer vector type.

Example (-mattr=+avx):

  %0 = trunc <8 x i32> %a to <8 x i23>
  %1 = icmp eq <8 x i23> %0, zeroinitializer

The initial selection dag for the code above is:

v8i1 = setcc t5, t7, seteq:ch
  t5: v8i23 = truncate t2
    t2: v8i32,ch = CopyFromReg t0, Register:v8i32 %vreg1
    t7: v8i32 = build_vector of all zeroes.

The type legalizer would firstly check if 't5' has a legal type. If so, then it
would reuse that same type to promote the return type of the setcc node.
Unfortunately 't5' is of illegal type v8i23, and therefore it cannot be used to
promote the return type of the setcc node. Consequently, the setcc return type
is promoted to v8i16. Later on, 't5' is promoted to v8i32 thus leading to the
following dag node:
  v8i16 = setcc t32, t25, seteq:ch

  where t32 and t25 are now values of type v8i32.

Before this patch, function LowerVSETCC would have wrongly expanded the setcc
to a single X86ISD::PCMPEQ. Surprisingly, ISel was still able to match an
instruction. In our case, ISel would have matched a VPCMPEQWrr:
  t37: v8i16 = X86ISD::VPCMPEQWrr t36, t25

However, t36 and t25 are both VR256, while the result type is instead of class
VR128. This inconsistency ended up causing the insertion of COPY instructions
like this:
  %vreg7<def> = COPY %vreg3; VR128:%vreg7 VR256:%vreg3

Which is an invalid full copy (not a sub register copy).
Eventually, the backend would have hit an UNREACHABLE "Cannot emit physreg copy
instruction" in the attempt to expand the malformed pseudo COPY instructions.

This patch fixes the problem by adding the missing logic in LowerVSETCC to handle
the corner case of a setcc with 128-bit return type and 256-bit operand type.

This problem was originally reported by Dimitry as PR25080. It has been latent
for a very long time. I have added the minimal reproducible from that bugzilla
as test setcc-lowering.ll.

Differential Revision: http://reviews.llvm.org/D13660

llvm-svn: 250085


Add - and -= operators to BlockFrequency using saturating arithmetic. · 61e13de4
Cong Hou authored Oct 12, 2015
```
llvm-svn: 250077
```
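
As an illustration of what saturating subtraction means here, the sketch below uses a toy `Freq` class standing in for a block frequency value. The class and its members are hypothetical and simplified; the only behavior taken from the commit title is that `-` and `-=` clamp at zero instead of wrapping around.

```cpp
#include <cstdint>

// Toy stand-in for a block frequency value; not the LLVM class itself.
class Freq {
  std::uint64_t Value;

public:
  explicit Freq(std::uint64_t V = 0) : Value(V) {}
  std::uint64_t get() const { return Value; }

  // Saturating subtraction: clamp at zero rather than wrapping around.
  Freq &operator-=(Freq RHS) {
    Value = Value > RHS.Value ? Value - RHS.Value : 0;
    return *this;
  }

  // Binary minus defined in terms of operator-=.
  Freq operator-(Freq RHS) const {
    Freq Result(*this);
    Result -= RHS;
    return Result;
  }
};
```

With this definition, `Freq(10) - Freq(3)` holds 7, while `Freq(3) - Freq(10)` holds 0 rather than a very large wrapped value.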
[libFuzzer] mention more trophies and improve the link formatting · 928eb33a
Kostya Serebryany authored Oct 12, 2015
```
llvm-svn: 250076
```
combine predicates; NFCI · 0dc91b31
Sanjay Patel authored Oct 12, 2015
```
llvm-svn: 250075
```

Turn const/const& into value type for BlockFrequency in functions of this... · 90c6cf8e

Cong Hou authored Oct 12, 2015

Turn const/const& into value type for BlockFrequency in functions of this class. Also fix a naming issue. NFC.

llvm-svn: 250074


[llvm-symbolizer] Reverting r250067 · e901616b
Colin LeMahieu authored Oct 12, 2015
```
llvm-svn: 250072
```
AMDGPU: Register some more passes so -print-before works · 8c0ef8b3
Matt Arsenault authored Oct 12, 2015
```
llvm-svn: 250071
```

Enable verifier after PeepholeOptimizer · 07a72bad

Matt Arsenault authored Oct 12, 2015

No tests fail with this enabled so I assume it was an accident
that it isn't enabled now.

llvm-svn: 250070


Don't call PrepareEHLandingPad on non EH pads · 9abb3c06

Reid Kleckner authored Oct 12, 2015

This was a minor bug in r249492. Calling PrepareEHLandingPad on a
non-landingpad was a no-op, but it attempted to get the generic pointer
register class, which apparently doesn't exist for some targets.

llvm-svn: 250068


[llvm-symbolizer] Add -print-address option · c07c7edd
Hemant Kulkarni authored Oct 12, 2015
```
Differential Revision: http://reviews.llvm.org/D13518

llvm-svn: 250067
```

[WinEH] Remove CatchObjRecoverIdx · 99c1d13e

David Majnemer authored Oct 12, 2015

CatchObjRecoverIdx was used for the old scheme, it is no longer
relevant.

llvm-svn: 250065


fix typos; NFC · b814ef1a
Sanjay Patel authored Oct 12, 2015
```
llvm-svn: 250059
```
[mips][micromips] Initial support for micromips DSP instructions and addu.qb implementation · 2e386d3d
Zoran Jovanovic authored Oct 12, 2015
```
Differential Revision: http://reviews.llvm.org/D12798

llvm-svn: 250058
```

[Debug] Look through bitcasts to find argument registers · cca893ff

Oliver Stannard authored Oct 12, 2015

On targets where f32 is not legal, we have to look through a BITCAST SDNode to
find the register that an argument is stored in when emitting debug info, or we
will not be able to emit a DW_AT_location for it.

Differential Revision: http://reviews.llvm.org/D13005

llvm-svn: 250056


[mips][FastISel] Clang-format switch statement. NFC. · 2a95f828
Vasileios Kalintiris authored Oct 12, 2015
```
llvm-svn: 250053
```

[AArch64]Fix bug in function names in test case · 54f3ddfb

Jun Bum Lim authored Oct 12, 2015

Functions in this test case need to be renamed as their names are the same
as the instructions we are comparing with.

llvm-svn: 250052


fix capitalization; NFC · 53d1d8b7
Sanjay Patel authored Oct 12, 2015
```
llvm-svn: 250049
```

Fix rename() sometimes failing if another process uses openFileForRead() · 7f68a716

Greg Bedwell authored Oct 12, 2015

On Windows, fs::rename() could fail if another process was reading the
file at the same time using fs::openFileForRead().  In most cases the user
wouldn't notice as fs::rename() will continue to retry for 2000ms.  Typically
this is enough for the read to complete and a retry to succeed, but if the
disk is being hit too hard then the response time might be longer than the
retry time and the rename would fail with a permission error.

Add FILE_SHARE_DELETE to the sharing flags for CreateFileW() in
fs::openFileForRead() and try ReplaceFileW() prior to MoveFileExW()
in fs::rename().

Based on an initial patch by Edd Dawson!

Differential Revision: http://reviews.llvm.org/D13647

llvm-svn: 250046


[mips][ias] Implement macro expansion when bcc has an immediate where a register belongs. · b1ef88c1

Daniel Sanders authored Oct 12, 2015

Summary: Fixes PR24915.

Reviewers: vkalintiris

Subscribers: emaste, seanbruno, llvm-commits

Differential Revision: http://reviews.llvm.org/D13533

llvm-svn: 250042


[mips] Whitespace cleanup in MIPS16 tests to reduce noise in following changes. NFC. · 332cef6c
Daniel Sanders authored Oct 12, 2015
```
Mostly tabs -> spaces and double spacing.

llvm-svn: 250041
```

[mips] Clean up most macro expansions to use the emit*() functions. · 2a5ce1ac

Daniel Sanders authored Oct 12, 2015

Reviewers: vkalintiris

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13591

llvm-svn: 250040


[mips] Handle undef when extracting subregs from FP64 registers. · 2fb8564d

Daniel Sanders authored Oct 12, 2015

Summary:
This removes unnecessary instructions when extracting from an undefined register
and also fixes a crash for O32 when passing undef to a double argument
held in integer registers.

Reviewers: vkalintiris

Subscribers: llvm-commits, zoran.jovanovic, petarj

Differential Revision: http://reviews.llvm.org/D13467

llvm-svn: 250039


GlobalOpt does not treat externally_initialized globals correctly · 939724cd

Oliver Stannard authored Oct 12, 2015

GlobalOpt currently merges stores into the initialisers of internal,
externally_initialized globals, but should not do so as the value of the global
may change between the initialiser and any code in the module being run.

llvm-svn: 250035


[ARM] Mark Swift MISched model as incomplete · fa4e994a

James Molloy authored Oct 12, 2015

The Swift Machine Scheduler Model is incomplete. There are instructions
missing which can trigger the "incomplete machine model" abort. This was
observed when a downstream SchedMachineModel was added to the ARM
target.

Patch by Christof Douma!

llvm-svn: 250033


[LoopVectorize] Shrink integer operations into the smallest type possible · 55d633bd

James Molloy authored Oct 12, 2015

C semantics force sub-int-sized values (e.g. i8, i16) to be promoted to int
type (e.g. i32) whenever arithmetic is performed on them.

For targets with native i8 or i16 operations, usually InstCombine can shrink
the arithmetic type down again. However InstCombine refuses to create illegal
types, so for targets without i8 or i16 registers, the lengthening and
shrinking remains.

Most SIMD ISAs (e.g. NEON) however support vectors of i8 or i16 even when
their scalar equivalents do not, so during vectorization it is important to
remove these lengthens and truncates when deciding the profitability of
vectorization.

The algorithm this uses starts at truncs and icmps, trawling their use-def
chains until they terminate or instructions outside the loop are found (or
unsafe instructions like inttoptr casts are found). If the use-def chains
starting from different root instructions (truncs/icmps) meet, they are
unioned. The demanded bits of each node in the graph are ORed together to form
an overall mask of the demanded bits in the entire graph. The minimum bitwidth
that graph can be truncated to is the bitwidth minus the number of leading
zeroes in the overall mask.
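
The last step of that paragraph can be sketched in a few lines. This is not the LoopVectorize implementation: it assumes 32-bit values, takes the per-node demanded-bits masks as a plain vector, and uses the hypothetical names `countLeadingZeros32` and `minBitWidth`. It shows only the ORing of masks and the "bitwidth minus leading zeroes" computation.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: number of leading zero bits in a 32-bit value
// (32 when the value is zero).
unsigned countLeadingZeros32(std::uint32_t V) {
  unsigned N = 0;
  for (std::uint32_t Bit = 0x80000000u; Bit != 0 && (V & Bit) == 0; Bit >>= 1)
    ++N;
  return N;
}

// OR the demanded-bits masks of every node in one unioned use-def graph,
// then return the bitwidth minus the leading zeroes of the overall mask.
unsigned minBitWidth(const std::vector<std::uint32_t> &DemandedMasks) {
  std::uint32_t Overall = 0;
  for (std::uint32_t Mask : DemandedMasks)
    Overall |= Mask;
  return 32 - countLeadingZeros32(Overall);
}
```

For instance, masks 0xFF and 0x0F combine to 0xFF, which has 24 leading zeroes in 32 bits, so the graph could be truncated to 8 bits under these assumptions.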

The intention is that this algorithm should "first do no harm", so it will
never insert extra cast instructions. This is why the use-def graphs are
unioned, so that subgraphs with different minimum bitwidths do not need casts
inserted between them.

This algorithm works hard to reduce compile time impact. DemandedBits are only
queried if there are extends of illegal types and if a truncate to an illegal
type is seen. In the general case, this results in a simple linear scan of the
instructions in the loop.

No non-noise compile time impact was seen on a clang bootstrap build.

llvm-svn: 250032
