Commits · 73523021d0a97c150a76a5cf4a91e99cd03b9efb · Roger Ferrer / llvm-epi-0.8

Jan 13, 2014

[PM] Split DominatorTree into a concrete analysis result object which · 73523021

Chandler Carruth authored Jan 13, 2014

can be used by both the new pass manager and the old.

This removes it from any of the virtual mess of the pass interfaces and
lets it derive cleanly from the DominatorTreeBase<> template. In turn,
tons of boilerplate interface can be nuked and it turns into a very
straightforward extension of the base DominatorTree interface.

The old analysis pass is now a simple wrapper. The names and style of
this split should match the split between CallGraph and
CallGraphWrapperPass. All of the users of DominatorTree have been
updated to match using many of the same tricks as with CallGraph. The
goal is that the common type remains the resulting DominatorTree rather
than the pass. This will make subsequent work toward the new pass
manager significantly easier.

Also in numerous places things became cleaner because I switched from
re-running the pass (!!! midway through some other pass's run !!!) to
directly recomputing the domtree.

llvm-svn: 199104

73523021

AVX-512: Embedded Rounding Control - encoding and printing · b19c9dc1
Elena Demikhovsky authored Jan 13, 2014
```
Changed intrinsics for vrcp14/vrcp28 vrsqrt14/vrsqrt28 - aligned with GCC.

llvm-svn: 199102
```
b19c9dc1

[PM] Pull the generic graph algorithms and data structures for dominator · e509db41

Chandler Carruth authored Jan 13, 2014

trees into the Support library.

These are all expressed in terms of the generic GraphTraits and CFG,
with no reliance on any concrete IR types. Putting them in support
clarifies that and makes the fact that the static analyzer in Clang uses
them much more sane. When moving the Dominators.h file into the IR
library I claimed that this was the right home for it but not something
I planned to work on. Oops.
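
As an illustration of what a dominator computation with no reliance on concrete IR types can look like, here is a minimal sketch over a plain adjacency list. It uses the simple iterative scheme of Cooper, Harvey and Kennedy. It is not LLVM's implementation and does not go through GraphTraits, and the names `Graph` and `computeIDoms` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// succs[n] lists the successors of node n; nodes are plain integers.
using Graph = std::vector<std::vector<int>>;

// Returns the immediate dominator of every node. The entry node is its own
// immediate dominator; unreachable nodes get -1.
std::vector<int> computeIDoms(const Graph &succs, int entry) {
  const int n = static_cast<int>(succs.size());
  assert(entry >= 0 && entry < n);

  // Iterative depth-first search that records a postorder numbering.
  std::vector<int> postNum(n, -1), order;
  std::vector<bool> seen(n, false);
  std::vector<std::pair<int, std::size_t>> stack;
  stack.emplace_back(entry, 0);
  seen[entry] = true;
  while (!stack.empty()) {
    int node = stack.back().first;
    if (stack.back().second < succs[node].size()) {
      int s = succs[node][stack.back().second++];
      if (!seen[s]) {
        seen[s] = true;
        stack.emplace_back(s, 0);
      }
    } else {
      postNum[node] = static_cast<int>(order.size());
      order.push_back(node);
      stack.pop_back();
    }
  }

  // Predecessor lists, restricted to edges leaving reachable nodes.
  std::vector<std::vector<int>> preds(n);
  for (int u = 0; u < n; ++u)
    if (seen[u])
      for (int v : succs[u])
        preds[v].push_back(u);

  std::vector<int> idom(n, -1);
  idom[entry] = entry;
  for (bool changed = true; changed;) {
    changed = false;
    // Visit nodes in reverse postorder, skipping the entry node.
    for (auto it = order.rbegin(); it != order.rend(); ++it) {
      int b = *it;
      if (b == entry)
        continue;
      int newIdom = -1;
      for (int p : preds[b]) {
        if (idom[p] == -1)
          continue;  // predecessor not processed yet
        if (newIdom == -1) {
          newIdom = p;
          continue;
        }
        // Walk both candidates up the tree until they meet.
        int a = p, c = newIdom;
        while (a != c) {
          while (postNum[a] < postNum[c]) a = idom[a];
          while (postNum[c] < postNum[a]) c = idom[c];
        }
        newIdom = a;
      }
      if (idom[b] != newIdom) {
        idom[b] = newIdom;
        changed = true;
      }
    }
  }
  return idom;
}
```

For a diamond-shaped graph (0 branches to 1 and 2, both of which reach 3), the sketch reports node 0 as the immediate dominator of nodes 1, 2 and 3.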

So why am I doing this? It happens to be one step toward breaking the
requirement that IR verification can only be performed from inside of
a pass context, which completely blocks the implementation of
verification for the new pass manager infrastructure. Fixing it will
also allow removing the concept of the "preverify" step (WTF???) and
allow the verifier to cleanly flag functions which fail verification in
a way that precludes even computing dominance information. Currently,
that results in a fatal error even when you ask the verifier to not
fatally error. It's awesome like that.

The yak shaving will continue...

llvm-svn: 199095

e509db41

Revert "ReMat: fix overly cavalier attitude to sub-register indices" · 7fdd4857

Tim Northover authored Jan 13, 2014

Very sorry, this was a premature patch that I still need to investigate and
finish off (for some reason beyond me at the moment it doesn't actually fix the
issue in all cases).

This reverts commit r199091.

llvm-svn: 199093

7fdd4857

ReMat: fix overly cavalier attitude to sub-register indices · 59f8d4b4

Tim Northover authored Jan 13, 2014

There are two attempted optimisations in reMaterializeTrivialDef, trying to
avoid promoting the size of a register too much when rematerializing.
Unfortunately, both appear to be flawed. First, we see if the original register
would have worked, but this is inadequate. Consider:

    v1 = SOMETHING (v1 is QQ)
    v2:Q0 = COPY v1:Q1 (v1, v2 are QQ)
    ...
    uses of v2

In this case even though v2 *could* be used directly as the output of
SOMETHING, this would set the wrong bits of the QQ register involved. The
correct rematerialization must be:

    v2:Q0_Q1 = SOMETHING (v2 promoted to QQQ)
    ...
    uses of v2:Q1_Q2

For the second optimisation, if the correct remat is "v2:idx = SOMETHING" then
we can't necessarily expect v2 itself to be valid for SOMETHING, but we do try
to hunt for a class between v1 and v2 that works. Unfortunately, this is also
wrong:

    v1 = SOMETHING (v1 is QQ)
    v2:Q0_Q1 = COPY v1 (v1 is QQ, v2 is QQQ)
    ...
    uses of v2 as a QQQ

The canonical rematerialization here is "v2:Q0_Q1 = SOMETHING". However, the
current logic would decide that v2 could be a QQ (no interest is taken in later
uses).

This patch, therefore, always accepts the widened register class without trying
to be clever. Generally there is no penalty to this (e.g. in the common GR32 <
GR64 case, expanding the width doesn't matter because it's not like you were
going to do anything else with the high bits of a GR32 register). It can
increase register pressure in cases like the ARM VFP regs though (multiple
non-overlapping but equivalent subregisters). Hopefully this situation is rare
enough that it won't matter.

Unfortunately, no in-tree targets actually expose this as far as I can tell
(there are so few isAsCheapAsAMove instructions for it to trigger on) so I've
been unable to produce a test. It was exposed in our ARM64 SPEC tests though,
and I will be adding a test there that we should be able to contribute
soon(TM).

llvm-svn: 199091

59f8d4b4

[cleanup] Move the Dominators.h and Verifier.h headers into the IR · 5ad5f15c

Chandler Carruth authored Jan 13, 2014

directory. These passes are already defined in the IR library, and it
doesn't make any sense to have the headers in Analysis.

Long term, I think there is going to be a much better way to divide
these matters. The dominators code should be fully separated into the
abstract graph algorithm and have that put in Support where it becomes
obvious that even Clang's CFGBlocks can use it. Then the verifier can
manually construct dominance information from the Support-driven
interface while the Analysis library can provide a pass which caches,
reconstructs, and supports a nice update API.

But those are very long term, and so I don't want to leave the really
confusing structure until that day arrives.

llvm-svn: 199082

5ad5f15c

Re-sort #include lines again, prior to moving headers around. · 07baed53
Chandler Carruth authored Jan 13, 2014
```
llvm-svn: 199080
```
07baed53

[PM] Wire up support for writing bitcode with new PM. · b7bdfd65

Chandler Carruth authored Jan 13, 2014

This moves the old pass creation functionality to its own header and
updates the callers of that routine. Then it adds a new PM supporting
bitcode writer to the header file, and wires that up in the opt tool.
A test is added that round-trips code into bitcode and back out using
the new pass manager.

llvm-svn: 199078

b7bdfd65

[AArch64 NEON] Add missing patterns for bitcast from or to v1f64 · cfef55d6
Kevin Qin authored Jan 13, 2014
```
llvm-svn: 199070
```
cfef55d6

[AArch64 NEON] Add more scenarios to use perm instructions when lowering shuffle_vector · 21e8f1c4

Kevin Qin authored Jan 13, 2014

This patch covered 2 more scenarios:

1.  Two operands of shuffle_vector are the same, like
%shuffle.i = shufflevector <8 x i8> %a, <8 x i8> %a, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>

2. One of the operands is undef, like
%shuffle.i = shufflevector <8 x i8> %a, <8 x i8> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>

After this patch, perm instructions will have a chance to be emitted instead of lots of INS.

llvm-svn: 199069

21e8f1c4

correct target directive handling error handling · a6505ca4

Saleem Abdulrasool authored Jan 13, 2014

The target specific parser should return `false' if the target AsmParser handles
the directive, and `true' if the generic parser should handle the directive.
Many of the target specific directive handlers would `return Error' which does
not follow these semantics. This change simply changes the target specific
routines to conform to the semantics of ParseDirective.

Conformance to the semantics improves diagnostics emitted for the invalid
directives. X86 is taken as a sample to ensure that multiple diagnostics are
not presented for a single error.
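
The return convention described above can be sketched in isolation. This is a standalone model rather than LLVM's actual parser code: the function names, the `.mytarget_option` directive and the diagnostic strings are all hypothetical. It only shows why returning `false` after reporting an error prevents a second diagnostic.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical target hook. Returns false when the target has dealt with the
// directive (even if it was malformed and a diagnostic was emitted), and true
// when the generic parser should take over.
bool parseTargetDirective(const std::string &name, const std::string &operand,
                          std::vector<std::string> &diags) {
  if (name != ".mytarget_option")  // hypothetical target-specific directive
    return true;                   // not ours: defer to the generic parser
  if (operand.empty())
    diags.push_back("expected an operand for " + name);
  return false;  // handled, so the generic parser must not look at it again
}

// Hypothetical generic driver: falls back only when the target returned true.
void parseDirective(const std::string &name, const std::string &operand,
                    std::vector<std::string> &diags) {
  if (!parseTargetDirective(name, operand, diags))
    return;
  if (name != ".text")  // stand-in for the generic directive table
    diags.push_back("unknown directive " + name);
}

// Usage: a malformed target directive produces exactly one diagnostic.
void demoSingleDiagnostic() {
  std::vector<std::string> diags;
  parseDirective(".mytarget_option", "", diags);
  assert(diags.size() == 1);
}
```

In this model, had the hook returned `true` after reporting the missing operand, the driver would have added an "unknown directive" message on top of it, which is the duplicate-diagnostic pattern the change is meant to avoid.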

llvm-svn: 199068

a6505ca4

Jan 12, 2014

Handle bundled terminators in isBlockOnlyReachableByFallthrough. · 1995b9fe

Jakob Stoklund Olesen authored Jan 12, 2014

Targets like SPARC and MIPS have delay slots and normally bundle the
delay slot instruction with the corresponding terminator.

Teach isBlockOnlyReachableByFallthrough to find any MBB operands on
bundled terminators so SPARC doesn't need to specialize this function.

llvm-svn: 199061

1995b9fe

raw_fd_ostream: Don't change STDERR to O_BINARY, or w*printf() (in assert())... · 4961f7a8

NAKAMURA Takumi authored Jan 12, 2014

raw_fd_ostream: Don't change STDERR to O_BINARY, or w*printf() (in assert()) would barf wide chars after llvm::errs().

llvm-svn: 199057

4961f7a8

raw_stream formatter: [Win32] Use std::signbit() if available, instead of _fpclass(). · 79addb8d
NAKAMURA Takumi authored Jan 12, 2014
```
FIXME: It should be generic to C++11. For now, it is dedicated to mingw-w64.
llvm-svn: 199052
```
79addb8d
Fix non-deterministic SDNodeOrder-dependent codegen · b5262d6d
Nico Rieck authored Jan 12, 2014
```
Reset SelectionDAGBuilder's SDNodeOrder to ensure deterministic code
generation.

llvm-svn: 199050
```
b5262d6d

[PM] Add module and function printing passes for the new pass manager. · 52eef887

Chandler Carruth authored Jan 12, 2014

This implements the legacy passes in terms of the new ones. It adds
basic testing using explicit runs of the passes. Next up will be wiring
the basic output mechanism of opt up when the new pass manager is
engaged unless bitcode writing is requested.

llvm-svn: 199049

52eef887

[PM] Simplify the IR printing passes significantly now that a narrower · e0af664c

Chandler Carruth authored Jan 12, 2014

API is exposed.

This removes the support for deleting the ostream, switches the member
and constructor order around to be consistent with the creation
routines, and switches to using references.

llvm-svn: 199047

e0af664c

[PM] Simplify the interface exposed for IR printing passes. · 9d805139

Chandler Carruth authored Jan 12, 2014

Nothing was using the ability of the pass to delete the raw_ostream it
printed to, and nothing was trying to pass it a pointer to the
raw_ostream. Also, the function variant had a different order of
arguments from all of the others which was just really confusing. Now
the interface accepts a reference, doesn't offer to delete it, and uses
a consistent order. The implementations of the printing passes haven't
been updated with this simplification; this is just the API switch.

llvm-svn: 199044

9d805139

[PM] Run clang-format and remove redundant or obvious comments before · 3dd261d0
Chandler Carruth authored Jan 12, 2014
```
the heavy factoring needed to share logic between the new pass manager
and the old.

llvm-svn: 199043
```
3dd261d0

[PM] Rename the IR printing pass header to a more generic and correct · b8ddc704

Chandler Carruth authored Jan 12, 2014

name to match the source file which I got earlier. Update the include
sites. Also modernize the comments in the header to use the more
recommended doxygen style.

llvm-svn: 199041

b8ddc704

ARM IAS: fix diagnostics of improper qualification · bdae4b87

Saleem Abdulrasool authored Jan 12, 2014

An improper qualifier would result in a superfluous error due to the parser not
consuming the remainder of the statement.  Simply consume the remainder of the
statement to avoid the error.

llvm-svn: 199035

bdae4b87

[Sparc] Add support for parsing floating point instructions. · cd4d9ac6
Venkatraman Govindaraju authored Jan 12, 2014
```
llvm-svn: 199033
```
cd4d9ac6

ARM: change implicit immediate forms of {ld,st}r{,b}t to pseudo-instructions · fb3950ec

Saleem Abdulrasool authored Jan 12, 2014

The implicit immediate 0 forms are assembly aliases, not distinct instruction
encodings.  Fix the initial implementation introduced in r198914 to an alias to
avoid two separate instruction definitions for the same encoding.

An InstAlias is insufficient in this case due to the need to add an
additional operand for the implicit zero.  By using the AsmPseudoInst,
fall back to the C++ code to transform the instruction to the equivalent
_POST_IMM form, inserting the additional implicit immediate 0.

llvm-svn: 199032

fb3950ec

[Sparc] Replace (unsigned)-1 with ~0U as suggested by Reid Kleckner. · 0b9debf1
Venkatraman Govindaraju authored Jan 12, 2014
```
llvm-svn: 199031
```
0b9debf1

The SPARCv9 ABI returns a float in %f0. · e7084a1c

Jakob Stoklund Olesen authored Jan 12, 2014

This is different from the argument passing convention which puts the
first float argument in %f1.

With this patch, all returned floats are treated as if the 'inreg' flag
were set. This means multiple float return values get packed in %f0,
%f1, %f2, ...

Note that when returning a struct in registers, clang will set the
'inreg' flag on the return value, so that behavior is unchanged. This
also happens when returning a float _Complex.

llvm-svn: 199028

e7084a1c

Add missing mul aliases for armv4 support. Add checks that armv4 can · 485f00fe
Joerg Sonnenberger authored Jan 12, 2014
```
assemble the various mul instructions.

llvm-svn: 199026
```
485f00fe

Switch-to-lookup tables: Don't require a result for the default · ac114a3c

Hans Wennborg authored Jan 12, 2014

case when the lookup table doesn't have any holes.

This means we can build a lookup table for switches like this:

  switch (x) {
    case 0: return 1;
    case 1: return 2;
    case 2: return 3;
    case 3: return 4;
    default: exit(1);
  }

The default case doesn't yield a constant result here, but that doesn't matter,
since a default result is only necessary for filling holes in the lookup table,
and this table doesn't have any holes.

This makes us transform 505 more switches in a clang bootstrap, and shaves 164 KB
off the resulting clang binary.
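
A hand-written equivalent of the transformed code may help make the point concrete. This is only a sketch of the idea at the source level; the names `viaSwitch` and `viaTable` are made up, and the optimizer performs the transformation on IR rather than on C++.

```cpp
#include <cassert>
#include <cstdlib>

// The switch from the commit message, wrapped in a function.
int viaSwitch(unsigned x) {
  switch (x) {
    case 0: return 1;
    case 1: return 2;
    case 2: return 3;
    case 3: return 4;
    default: std::exit(1);
  }
}

// Table-based form: cases 0..3 are contiguous, so the table has no holes and
// no constant "default result" is needed. Out-of-range inputs simply take the
// original default path.
int viaTable(unsigned x) {
  static const int Table[] = {1, 2, 3, 4};
  if (x < 4)
    return Table[x];
  std::exit(1);
}

// Usage check: both forms agree on every in-range input.
void checkSwitchMatchesTable() {
  for (unsigned x = 0; x < 4; ++x)
    assert(viaSwitch(x) == viaTable(x));
}
```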

llvm-svn: 199025

ac114a3c

[Sparc] Add missing processor types: v7 and niagara · a66b314c
Venkatraman Govindaraju authored Jan 11, 2014
```
llvm-svn: 199024
```
a66b314c

ARM IAS: support emitting constant values in target expressions · 2d48edec

Saleem Abdulrasool authored Jan 11, 2014

A 32-bit immediate value can be formed from a constant expression and loaded
into a register.  Add support to emit this into an object file.  Because this
value is a constant, a relocation must *not* be produced for it.

llvm-svn: 199023

2d48edec

Jan 11, 2014

LoopVectorizer: Enable strided memory accesses versioning per default · 66c742ae
Arnold Schwaighofer authored Jan 11, 2014
```
I saw no compile or execution time regressions on x86_64 -mavx -O3.

radar://13075509

llvm-svn: 199015
```
66c742ae
[Sparc] Bundle instruction with delay slot and its filler. Now, we can use... · 0653218b
Venkatraman Govindaraju authored Jan 11, 2014
```
[Sparc] Bundle instruction with delay slot and its filler. Now, we can use -verify-machineinstrs with SPARC backend.

llvm-svn: 199014
```
0653218b
Fix 'ned' typo in doc comment · 798060e0
Alp Toker authored Jan 11, 2014
```
Patch by Jasper Neumann!

llvm-svn: 199007
```
798060e0

[PM] Add names to passes under the new pass manager, and a debug output · a13f27cc

Chandler Carruth authored Jan 11, 2014

mode that can be used to debug the execution of everything.

No support for analyses here, that will come later. This already helps
show parts of the opt commandline integration that aren't working. Tests
of that will start using it as the bugs are fixed.

llvm-svn: 199004

a13f27cc

LoopVectorize.cpp: Appease MSC16. · 41c409ce

NAKAMURA Takumi authored Jan 11, 2014

Excuse me, I hope msc16 builders would be fine till its end day.
Introduce nullptr then. ;)

llvm-svn: 199001

41c409ce

[anyregcc] Fix callee-save mask for anyregcc · 976d94b8

Juergen Ributzka authored Jan 11, 2014

Use separate callee-save masks for XMM and YMM registers for anyregcc on X86 and
select the proper mask depending on the target cpu we compile for.

llvm-svn: 198985

976d94b8

Revert r198979 - accidental commit. · 942f22c4
Eric Christopher authored Jan 11, 2014
```
llvm-svn: 198981
```
942f22c4
Reformat. · ceec7b02
Eric Christopher authored Jan 11, 2014
```
llvm-svn: 198980
```
ceec7b02
Update function name and add some helpful comments. · 67cde9ac
Eric Christopher authored Jan 11, 2014
```
llvm-svn: 198979
```
67cde9ac

Extend and simplify the sample profile input file. · 9518b63b

Diego Novillo authored Jan 10, 2014

1- Use the line_iterator class to read profile files.

2- Allow comments in profile file. Lines starting with '#'
   are completely ignored while reading the profile.

3- Add parsing support for discriminators and indirect call samples.

   Our external profiler can emit more profile information that we are
   currently not handling. This patch does not add new functionality to
   support this information, but it allows profile files to provide it.

   I will add actual support later on (for at least one of these
   features, I need support for DWARF discriminators in Clang).

   A sample line may contain the following additional information:

   Discriminator. This is used if the sampled program was compiled with
   DWARF discriminator support
   (http://wiki.dwarfstd.org/index.php?title=Path_Discriminators). This
   is currently only emitted by GCC and we just ignore it.

   Potential call targets and samples. If present, this line contains a
   call instruction. This models both direct and indirect calls. Each
   called target is listed together with the number of samples. For
   example,

                    130: 7  foo:3  bar:2  baz:7

   The above means that at relative line offset 130 there is a call
   instruction that calls one of foo(), bar() and baz(), with baz()
   being the most frequent call target.

   Differential Revision: http://llvm-reviews.chandlerc.com/D2355

4- Simplify format of profile input file.

   This implements earlier suggestions to simplify the format of the
   sample profile file. The symbol table is not necessary and function
   profiles do not need to know the number of samples in advance.

   Differential Revision: http://llvm-reviews.chandlerc.com/D2419
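
The sample line shown in item 3 can be parsed along the following lines. This is a minimal sketch that assumes only the form shown in the example (`offset: samples target:count ...`) plus the '#' comment rule from item 2. It does not handle discriminators, and the `SampleLine` type and `parseSampleLine` function are hypothetical rather than taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct SampleLine {
  unsigned offset = 0;   // line offset relative to the function
  uint64_t samples = 0;  // samples collected at that offset
  std::vector<std::pair<std::string, uint64_t>> targets;  // callee and count
};

// Returns false for comment lines, blank lines, and lines that do not match
// the assumed "offset: samples [target:count ...]" shape.
bool parseSampleLine(const std::string &line, SampleLine &out) {
  if (line.empty() || line[0] == '#')
    return false;

  std::istringstream in(line);
  SampleLine parsed;
  char colon = 0;
  if (!(in >> parsed.offset >> colon >> parsed.samples) || colon != ':')
    return false;

  std::string token;
  while (in >> token) {
    std::size_t sep = token.rfind(':');
    if (sep == std::string::npos || sep == 0 || sep + 1 == token.size())
      return false;
    std::string count = token.substr(sep + 1);
    if (count.find_first_not_of("0123456789") != std::string::npos)
      return false;
    parsed.targets.emplace_back(token.substr(0, sep), std::stoull(count));
  }
  out = parsed;
  return true;
}

// Usage: the example line from the commit message.
void demoParse() {
  SampleLine s;
  assert(parseSampleLine("130: 7  foo:3  bar:2  baz:7", s));
  assert(s.targets.size() == 3);
}
```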

llvm-svn: 198973

9518b63b

Propagation of profile samples through the CFG. · 0accb3d2

Diego Novillo authored Jan 10, 2014

This adds a propagation heuristic to convert instruction samples
into branch weights. It implements a similar heuristic to the one
implemented by Dehao Chen on GCC.

The propagation proceeds in 3 phases:

1- Assignment of block weights. All the basic blocks in the function
   are initially assigned the same weight as their most frequently
   executed instruction.

2- Creation of equivalence classes. Since samples may be missing from
   blocks, we can fill in the gaps by setting the weights of all the
   blocks in the same equivalence class to the same weight. To compute
   the concept of equivalence, we use dominance and loop information.
   Two blocks B1 and B2 are in the same equivalence class if B1
   dominates B2, B2 post-dominates B1 and both are in the same loop.

3- Propagation of block weights into edges. This uses a simple
   propagation heuristic. The following rules are applied to every
   block B in the CFG:

   - If B has a single predecessor/successor, then the weight
     of that edge is the weight of the block.

   - If all the edges are known except one, and the weight of the
     block is already known, the weight of the unknown edge will
     be the weight of the block minus the sum of all the known
     edges. If the sum of all the known edges is larger than B's weight,
     we set the unknown edge weight to zero.

   - If there is a self-referential edge, and the weight of the block is
     known, the weight for that edge is set to the weight of the block
     minus the weight of the other incoming edges to that block (if
     known).
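
The "all edges known except one" rule from phase 3 amounts to a clamped subtraction. A minimal sketch, assuming a hypothetical `Edge` record and helper name (neither is taken from the patch):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Edge {
  uint64_t weight;
  bool known;
};

// If exactly one edge of a block is unknown, sets it to the block weight
// minus the sum of the known edges, clamped at zero. Returns true when an
// edge was filled in, false when there are zero or several unknown edges.
bool inferSingleUnknownEdge(uint64_t blockWeight, std::vector<Edge> &edges) {
  Edge *unknown = nullptr;
  uint64_t knownSum = 0;
  for (Edge &e : edges) {
    if (e.known)
      knownSum += e.weight;
    else if (unknown)
      return false;  // more than one unknown edge: rule does not apply
    else
      unknown = &e;
  }
  if (!unknown)
    return false;  // nothing to infer
  unknown->weight = knownSum > blockWeight ? 0 : blockWeight - knownSum;
  unknown->known = true;
  return true;
}

// Usage: a block of weight 100 with one known edge of weight 30.
void demoInfer() {
  std::vector<Edge> edges = {{30, true}, {0, false}};
  assert(inferSingleUnknownEdge(100, edges));
  assert(edges[1].weight == 70);
}
```

With a single, still-unknown edge the same arithmetic yields the block weight itself, which matches the single predecessor/successor rule.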

Since this propagation is not guaranteed to terminate for every CFG, we
only allow it to proceed for a limited number of iterations (controlled
by -sample-profile-max-propagate-iterations). It currently uses the same
GCC default of 100.

Before propagation starts, the pass builds (for each block) a list of
unique predecessors and successors. This is necessary to handle
identical edges in multiway branches. Since we visit all blocks and all
edges of the CFG, it is cleaner to build these lists once at the start
of the pass.

Finally, the patch fixes the computation of relative line locations.
The profiler emits lines relative to the function header. To discover
it, we traverse the compilation unit looking for the subprogram
corresponding to the function. The line number of that subprogram is the
line where the function begins. That becomes line zero for all the
relative locations.

llvm-svn: 198972

0accb3d2