Commits · cd4d9ac62adc1424d4c952fac1c3e258c9b76a08 · Roger Ferrer / llvm-epi-0.8

Jan 12, 2014

[Sparc] Add support for parsing floating point instructions. · cd4d9ac6
Venkatraman Govindaraju authored Jan 12, 2014
```
llvm-svn: 199033
```
cd4d9ac6

ARM: change implicit immediate forms of {ld,st}r{,b}t to pseudo-instructions · fb3950ec

Saleem Abdulrasool authored Jan 12, 2014

The implicit immediate 0 forms are assembly aliases, not distinct instruction
encodings.  Change the initial implementation introduced in r198914 to an alias
to avoid two separate instruction definitions for the same encoding.

An InstAlias is insufficient in this case due to the need to add an additional
operand for the implicit zero.  By using the AsmPseudoInst, fall back to the C++
code to transform the instruction to the equivalent _POST_IMM form, inserting
the additional implicit immediate 0.

llvm-svn: 199032

fb3950ec

The SPARCv9 ABI returns a float in %f0. · e7084a1c

Jakob Stoklund Olesen authored Jan 12, 2014

This is different from the argument passing convention which puts the
first float argument in %f1.

With this patch, all returned floats are treated as if the 'inreg' flag
were set. This means multiple float return values get packed in %f0,
%f1, %f2, ...

Note that when returning a struct in registers, clang will set the
'inreg' flag on the return value, so that behavior is unchanged. This
also happens when returning a float _Complex.

llvm-svn: 199028

e7084a1c

Typo · 4bde0302
Joerg Sonnenberger authored Jan 12, 2014
```
llvm-svn: 199027
```
4bde0302
Add missing mul aliases for armv4 support. Add checks that armv4 can · 485f00fe
Joerg Sonnenberger authored Jan 12, 2014
```
assemble the various mul instructions.

llvm-svn: 199026
```
485f00fe

Switch-to-lookup tables: Don't require a result for the default · ac114a3c

Hans Wennborg authored Jan 12, 2014

case when the lookup table doesn't have any holes.

This means we can build a lookup table for switches like this:

  switch (x) {
    case 0: return 1;
    case 1: return 2;
    case 2: return 3;
    case 3: return 4;
    default: exit(1);
  }

The default case doesn't yield a constant result here, but that doesn't matter,
since a default result is only necessary for filling holes in the lookup table,
and this table doesn't have any holes.

This makes us transform 505 more switches in a clang bootstrap, and shaves 164 KB
off the resulting clang binary.
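
As a rough illustration of the idea, the switch above can be compared with a hand-written table lookup. This is only a sketch: the function names `viaSwitch`, `viaLookupTable` and `checkCases` are hypothetical, and the code is not the output of the actual transformation.

```cpp
#include <cassert>
#include <cstdlib>

// The original switch: the default case calls exit() and yields no constant.
int viaSwitch(unsigned x) {
  switch (x) {
    case 0: return 1;
    case 1: return 2;
    case 2: return 3;
    case 3: return 4;
    default: std::exit(1);
  }
}

// Conceptual equivalent: cases 0..3 are contiguous, so the table has no
// holes and no default *value* is needed. A range check still sends every
// other input to the default block.
int viaLookupTable(unsigned x) {
  static const int Table[4] = {1, 2, 3, 4};
  if (x < 4)
    return Table[x];
  std::exit(1);
}

// Both versions agree on every covered case.
void checkCases() {
  for (unsigned x = 0; x < 4; ++x)
    assert(viaSwitch(x) == viaLookupTable(x));
}
```

A default result would only be required if some index inside the covered range had no case of its own.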

llvm-svn: 199025

ac114a3c

[Sparc] Add missing processor types: v7 and niagara · a66b314c
Venkatraman Govindaraju authored Jan 11, 2014
```
llvm-svn: 199024
```
a66b314c

ARM IAS: support emitting constant values in target expressions · 2d48edec

Saleem Abdulrasool authored Jan 11, 2014

A 32-bit immediate value can be formed from a constant expression and loaded
into a register.  Add support to emit this into an object file.  Because this
value is a constant, a relocation must *not* be produced for it.

llvm-svn: 199023

2d48edec

Jan 11, 2014

Fix broken CHECK lines. · c10563d1
Benjamin Kramer authored Jan 11, 2014
```
llvm-svn: 199016
```
c10563d1
[Sparc] Bundle instruction with delay slot and its filler. Now, we can use... · 0653218b
Venkatraman Govindaraju authored Jan 11, 2014
```
[Sparc] Bundle instruction with delay slot and its filler. Now, we can use -verify-machineinstrs with SPARC backend.

llvm-svn: 199014
```
0653218b
[PM] Actually nest pass managers correctly when parsing the pass · 258dbb3b
Chandler Carruth authored Jan 11, 2014
```
pipeline string. Add tests that cover this now that we have execution
dumping in the pass managers.

llvm-svn: 199005
```
258dbb3b
llvm/test/Transforms/SampleProfile/syntax.ll: Eliminate locale-sensitive message check. · a64d0bcc
NAKAMURA Takumi authored Jan 11, 2014
```
llvm-svn: 199000
```
a64d0bcc
llvm/test/CodeGen/X86/anyregcc.ll: Add explicit -mtriple=x86_64-unknown-unknown. · 80a474c1
NAKAMURA Takumi authored Jan 11, 2014
```
XMM registers really are spilled when targeting Win64.

llvm-svn: 198999
```
80a474c1

[PM] Add (very skeletal) support to opt for running the new pass · 66445382

Chandler Carruth authored Jan 11, 2014

manager. I cannot emphasize enough that this is a WIP. =] I expect it
to change a great deal as things stabilize, but I think it's really
important to get *some* functionality here so that the infrastructure
can be tested more traditionally from the commandline.

The current design is looking something like this:

  ./bin/opt -passes='module(pass_a,pass_b,function(pass_c,pass_d))'

So rather than custom-parsed flags, there is a single flag with a string
argument that is parsed into the pass pipeline structure. This makes it
really easy to have nice structural properties that are very explicit.
There is one obvious and important shortcut. You can start off the
pipeline with a pass, and the minimal context of pass managers will be
built around the entire specified pipeline. This makes the common case
for tests super easy:

  ./bin/opt -passes=instcombine,sroa,gvn

But this won't introduce any of the complexity of the fully inferred old
system -- we only ever do this for the *entire* argument, and we only
look at the first pass. If the other passes don't fit in the pass
manager selected it is a hard error.
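
To make the nested structure concrete, here is a minimal recursive-descent sketch that turns such a string into a tree. It is an illustration only, not the parser embedded in opt: `PipelineNode`, `parseList` and `parsePipeline` are hypothetical names, and the sketch does not check pass names or build the implicit pass managers for the shortcut form.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical tree node: a pass name, or a manager name with nested children.
struct PipelineNode {
  std::string Name;
  std::vector<PipelineNode> Children;
};

// Parses a comma-separated list of elements starting at Pos, stopping at ')'
// or at the end of the text. Returns false on malformed input.
static bool parseList(const std::string &Text, std::size_t &Pos,
                      std::vector<PipelineNode> &Out) {
  while (true) {
    std::size_t Start = Pos;
    while (Pos < Text.size() && Text[Pos] != ',' && Text[Pos] != '(' &&
           Text[Pos] != ')')
      ++Pos;
    if (Pos == Start)
      return false; // empty name
    PipelineNode Node;
    Node.Name = Text.substr(Start, Pos - Start);
    if (Pos < Text.size() && Text[Pos] == '(') {
      ++Pos; // consume '('
      if (!parseList(Text, Pos, Node.Children))
        return false;
      if (Pos >= Text.size() || Text[Pos] != ')')
        return false; // unbalanced parenthesis
      ++Pos; // consume ')'
    }
    Out.push_back(Node);
    if (Pos < Text.size() && Text[Pos] == ',') {
      ++Pos; // consume ',' and parse the next element
      continue;
    }
    return true;
  }
}

// Parses the whole string; any trailing text is treated as an error.
bool parsePipeline(const std::string &Text, std::vector<PipelineNode> &Out) {
  assert(Out.empty() && "expected an empty output vector");
  std::size_t Pos = 0;
  return parseList(Text, Pos, Out) && Pos == Text.size();
}
```

For `module(pass_a,pass_b,function(pass_c,pass_d))` this yields one top-level `module` node with three children, the last of which is a `function` node holding two passes.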

The other interesting aspect here is that I'm not relying on any
registration facilities. Such facilities may be unavoidable for
supporting plugins, but I have alternative ideas for plugins that I'd
like to try first. My plan is essentially to build everything without
registration until we hit an absolute requirement.

Instead of registration of pass names, there will be a library dedicated
to parsing pass names and the pass pipeline strings described above.
Currently, this is directly embedded into opt for simplicity as it is
very early, but I plan to eventually pull this into a library that opt,
bugpoint, and even Clang can depend on. It should end up as a good home
for things like the existing PassManagerBuilder as well.

There are a bunch of FIXMEs in the code for the parts of this that are
just stubbed out to make the patch more incremental. A quick list of
what's coming up directly after this:
- Support for function passes and building the structured nesting.
- Support for printing the pass structure, and FileCheck tests of all of
  this code.
- The .def-file based pass name parsing.
- IR printing passes and the corresponding tests.

Some obvious things that I'm not going to do right now, but am
definitely planning on as the pass manager work gets a bit further:
- Pull the parsing into a library, including the builders.
- Thread the rest of the target stuff into the new pass manager.
- Wire support for the new pass manager up to llc.
- Plugin support.

Some things that I'd like to have, but are significantly lower on my
priority list. I'll get to these eventually, but they may also be places
where others want to contribute:
- Adding nice error reporting for broken pass pipeline descriptions.
- Typo-correction for pass names.

llvm-svn: 198998

66445382

[anyregcc] Fix callee-save mask for anyregcc · 976d94b8

Juergen Ributzka authored Jan 11, 2014

Use separate callee-save masks for XMM and YMM registers for anyregcc on X86 and
select the proper mask depending on the target cpu we compile for.

llvm-svn: 198985

976d94b8

Extend and simplify the sample profile input file. · 9518b63b

Diego Novillo authored Jan 10, 2014

1- Use the line_iterator class to read profile files.

2- Allow comments in profile file. Lines starting with '#'
   are completely ignored while reading the profile.

3- Add parsing support for discriminators and indirect call samples.

   Our external profiler can emit more profile information that we are
   currently not handling. This patch does not add new functionality to
   support this information, but it allows profile files to provide it.

   I will add actual support later on (for at least one of these
   features, I need support for DWARF discriminators in Clang).

   A sample line may contain the following additional information:

   Discriminator. This is used if the sampled program was compiled with
   DWARF discriminator support
   (http://wiki.dwarfstd.org/index.php?title=Path_Discriminators). This
   is currently only emitted by GCC and we just ignore it.

   Potential call targets and samples. If present, this line contains a
   call instruction. This models both direct and indirect calls. Each
   called target is listed together with the number of samples. For
   example,

                    130: 7  foo:3  bar:2  baz:7

   The above means that at relative line offset 130 there is a call
   instruction that calls one of foo(), bar() and baz(), with baz()
   being the relatively more frequent call target.

   Differential Revision: http://llvm-reviews.chandlerc.com/D2355

4- Simplify format of profile input file.

   This implements earlier suggestions to simplify the format of the
   sample profile file. The symbol table is not necessary and function
   profiles do not need to know the number of samples in advance.

   Differential Revision: http://llvm-reviews.chandlerc.com/D2419
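
The call-target line shown in item 3 (`130: 7  foo:3  bar:2  baz:7`) could be read with something like the sketch below. It is not the reader added by this patch: the `SampleLine` struct and the function names are hypothetical, and discriminators are left out because their textual form is not shown above.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory form of one sample line.
struct SampleLine {
  unsigned Offset = 0;   // line offset relative to the function header
  unsigned Samples = 0;  // samples collected at that line
  std::vector<std::pair<std::string, unsigned>> Targets; // callee and count
};

// Parses "<offset>: <samples> [<callee>:<count> ...]".
bool parseSampleLine(const std::string &Line, SampleLine &Out) {
  Out = SampleLine();
  std::istringstream In(Line);
  char Colon = 0;
  if (!(In >> Out.Offset >> Colon) || Colon != ':')
    return false;
  if (!(In >> Out.Samples))
    return false;
  std::string Tok;
  while (In >> Tok) {
    std::size_t Sep = Tok.rfind(':');
    if (Sep == std::string::npos || Sep == 0 || Sep + 1 == Tok.size())
      return false; // need both a name and a count
    std::istringstream CountIn(Tok.substr(Sep + 1));
    unsigned Count = 0;
    if (!(CountIn >> Count) || !CountIn.eof())
      return false; // count must be a plain number
    Out.Targets.push_back(std::make_pair(Tok.substr(0, Sep), Count));
  }
  return true;
}

// Returns the callee with the most samples (first one wins ties).
const std::string &mostFrequentTarget(const SampleLine &L) {
  assert(!L.Targets.empty() && "line has no call targets");
  std::size_t Best = 0;
  for (std::size_t I = 1; I < L.Targets.size(); ++I)
    if (L.Targets[I].second > L.Targets[Best].second)
      Best = I;
  return L.Targets[Best].first;
}
```

For the example line, the parsed offset is 130, the sample count is 7, and `mostFrequentTarget` picks `baz`.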

llvm-svn: 198973

9518b63b

Propagation of profile samples through the CFG. · 0accb3d2

Diego Novillo authored Jan 10, 2014

This adds a propagation heuristic to convert instruction samples
into branch weights. It implements a similar heuristic to the one
implemented by Dehao Chen on GCC.

The propagation proceeds in 3 phases:

1- Assignment of block weights. All the basic blocks in the function
   are initially assigned the same weight as their most frequently
   executed instruction.

2- Creation of equivalence classes. Since samples may be missing from
   blocks, we can fill in the gaps by setting the weights of all the
   blocks in the same equivalence class to the same weight. To compute
   the concept of equivalence, we use dominance and loop information.
   Two blocks B1 and B2 are in the same equivalence class if B1
   dominates B2, B2 post-dominates B1 and both are in the same loop.

3- Propagation of block weights into edges. This uses a simple
   propagation heuristic. The following rules are applied to every
   block B in the CFG:

   - If B has a single predecessor/successor, then the weight
     of that edge is the weight of the block.

   - If all the edges are known except one, and the weight of the
     block is already known, the weight of the unknown edge will
     be the weight of the block minus the sum of all the known
     edges. If the sum of all the known edges is larger than B's weight,
     we set the unknown edge weight to zero.

   - If there is a self-referential edge, and the weight of the block is
     known, the weight for that edge is set to the weight of the block
     minus the weight of the other incoming edges to that block (if
     known).
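
The "all edges known except one" rule in phase 3 can be sketched as follows. This is a simplified illustration under assumed types, not the code in the pass: `Edge` and `inferLastUnknownEdge` are hypothetical names, and the self-referential edge rule is not modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical edge record: a weight plus a flag saying whether it is known.
struct Edge {
  std::uint64_t Weight;
  bool Known;
};

// If exactly one edge of the block is unknown, give it the block weight minus
// the sum of the known edges, clamped at zero. Returns true if a weight was
// inferred. A block with a single edge is the special case where the sum of
// known edges is zero, so the edge gets the block weight.
bool inferLastUnknownEdge(std::uint64_t BlockWeight, std::vector<Edge> &Edges) {
  Edge *Unknown = nullptr;
  std::uint64_t KnownSum = 0;
  for (Edge &E : Edges) {
    if (E.Known)
      KnownSum += E.Weight;
    else if (Unknown)
      return false; // more than one unknown edge: nothing can be inferred yet
    else
      Unknown = &E;
  }
  if (!Unknown)
    return false; // every edge is already known
  Unknown->Weight = KnownSum > BlockWeight ? 0 : BlockWeight - KnownSum;
  Unknown->Known = true;
  assert(Unknown->Weight <= BlockWeight);
  return true;
}
```

With a block weight of 100 and known edges of 30 and 50, the remaining edge is assigned 20; if the known edges sum to more than 100, it is assigned 0.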

Since this propagation is not guaranteed to converge for every CFG, we
only allow it to proceed for a limited number of iterations (controlled
by -sample-profile-max-propagate-iterations). It currently uses the same
GCC default of 100.

Before propagation starts, the pass builds (for each block) a list of
unique predecessors and successors. This is necessary to handle
identical edges in multiway branches. Since we visit all blocks and all
edges of the CFG, it is cleaner to build these lists once at the start
of the pass.

Finally, the patch fixes the computation of relative line locations.
The profiler emits lines relative to the function header. To discover
it, we traverse the compilation unit looking for the subprogram
corresponding to the function. The line number of that subprogram is the
line where the function begins. That becomes line zero for all the
relative locations.

llvm-svn: 198972

0accb3d2

Jan 10, 2014

LoopVectorizer: Handle strided memory accesses by versioning · c2e9d759

Arnold Schwaighofer authored Jan 10, 2014

 for (i = 0; i < N; ++i)
   A[i * Stride1] += B[i * Stride2];

We take loops like this and check that the symbolic strides 'Stride1/2' are one
and drop to the scalar loop if they are not.
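
Written out by hand, the versioned loop looks roughly like the sketch below. The function name `addStrided` is hypothetical, and the code only mirrors the shape of the transformation; the vectorizer works on IR and also performs other legality checks that are not shown here.

```cpp
#include <cassert>
#include <cstddef>

// Hand-written equivalent of loop versioning on symbolic strides.
void addStrided(int *A, const int *B, std::size_t N, std::size_t Stride1,
                std::size_t Stride2) {
  assert(A && B && "expected valid arrays");
  if (Stride1 == 1 && Stride2 == 1) {
    // Versioned copy: both accesses are consecutive, so this loop is a
    // candidate for vectorization.
    for (std::size_t i = 0; i < N; ++i)
      A[i] += B[i];
  } else {
    // Fallback: the original scalar loop with arbitrary strides.
    for (std::size_t i = 0; i < N; ++i)
      A[i * Stride1] += B[i * Stride2];
  }
}
```

Both branches compute the same result when the strides are one; the runtime check only decides which copy of the loop runs.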

This is currently disabled by default and hidden behind the flag
'enable-mem-access-versioning'.

radar://13075509

llvm-svn: 198950

c2e9d759

Amending test/MC/ARM/thumb2-mclass.s to match its apparent original purpose... · 4e62c0b2

Artyom Skrobov authored Jan 10, 2014

Amending test/MC/ARM/thumb2-mclass.s to match its apparent original purpose (to test the ARMv6M/ARMv7M commonality), and creating a new test case for the differences between ARMv6M and ARMv7M

llvm-svn: 198946

4e62c0b2

Must not produce Tag_CPU_arch_profile for pre-ARMv7 cores (e.g. cortex-m0) · 4d91d944
Artyom Skrobov authored Jan 10, 2014
```
llvm-svn: 198945
```
4d91d944

ARM: fix regression caused by r198914 · b16c09f2

Saleem Abdulrasool authored Jan 10, 2014

The disassembler would no longer be able to disambiguate between the two
variants (explicit immediate #0 vs implicit, omitted #0) for the ldrt, strt,
ldrbt, strbt mnemonics as both versions indicated the disassembler routine.

llvm-svn: 198944

b16c09f2

Make sure -use-init-array has intended effect on all AArch64 ELF targets, not just linux. · 58306ad9
Kristof Beyls authored Jan 10, 2014
```
llvm-svn: 198937
```
58306ad9

llvm/test/ExecutionEngine/MCJIT/load-object-a.ll: Remove "REQUIRES:shell".... · d38ac746

NAKAMURA Takumi authored Jan 10, 2014

llvm/test/ExecutionEngine/MCJIT/load-object-a.ll: Remove "REQUIRES:shell". This doesn't depend on shell's behavior.

llvm-svn: 198931

d38ac746

llvm/test/ExecutionEngine/MCJIT/lit.local.cfg: Add "AMD64" in the host_arch list. · 566080cc
NAKAMURA Takumi authored Jan 10, 2014
```
FIXME: We should not take CMake's ${CMAKE_SYSTEM_PROCESSOR}...
llvm-svn: 198930
```
566080cc
llvm/test/ExecutionEngine/MCJIT/load-object-a.ll: Fix not to use %t.cachedir/%p. · 52f9d381
NAKAMURA Takumi authored Jan 10, 2014
```
%p is like X:\foo\bar.

llvm-svn: 198926
```
52f9d381

ARM IAS: support #:{lower,upper}16: for GNU compatibility · 435f4565

Saleem Abdulrasool authored Jan 10, 2014

The GNU assembler supports prefixing the expression with a '#' to indicate that
the value that is being moved is in fact a constant.  This improves the
compatibility of the integrated assembler's parser with this syntax.
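
For a constant operand, the two operators select the two halves of a 32-bit value. The sketch below shows only that arithmetic; the function names are hypothetical and nothing here reflects how the assembler itself evaluates the expression.

```cpp
#include <cassert>
#include <cstdint>

// Low 16 bits of a 32-bit constant, as selected by :lower16:.
std::uint16_t lower16(std::uint32_t V) {
  return static_cast<std::uint16_t>(V & 0xFFFFu);
}

// High 16 bits of a 32-bit constant, as selected by :upper16:.
std::uint16_t upper16(std::uint32_t V) {
  return static_cast<std::uint16_t>(V >> 16);
}

// Recombines the halves into the full constant.
std::uint32_t combineHalves(std::uint16_t Lo, std::uint16_t Hi) {
  return (static_cast<std::uint32_t>(Hi) << 16) | Lo;
}

// Splitting and recombining must give back the original constant.
void checkRoundTrip(std::uint32_t V) {
  assert(combineHalves(lower16(V), upper16(V)) == V);
}
```

For example, 0x12345678 splits into a lower half of 0x5678 and an upper half of 0x1234.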

llvm-svn: 198916

435f4565

ARM IAS: support GNU extension for ldrd, strd · e6e6d714

Saleem Abdulrasool authored Jan 10, 2014

The GNU assembler has an extension that allows for the elision of the paired
register (dt2) for the LDRD and STRD mnemonics.  Add support for this in the
assembly parser.  Canonicalise the usage during the instruction parsing from
the specified version.

llvm-svn: 198915

e6e6d714

ARM IAS: support implicit immediate 0s for {LD,ST}R{B,}T · 5bfefb6a

Saleem Abdulrasool authored Jan 10, 2014

The ARM ARM indicates the mnemonics as follows:

  ldrbt{<c>}{<q>} <Rt>, [<Rn>] {, #+/-<imm>}
  ldrt{<c>}{<q>} <Rt>, [<Rn>] {, #+/-<imm>}
  strbt{<c>}{<q>} <Rt>, [<Rn>] {, #<imm>}
  strt{<c>}{<q>} <Rt>, [<Rn>] {, #+/-<imm>}

This improves the parser to deal with the implicit immediate 0 for the mnemonics
as per the specification.

Thanks to Joerg Sonnenberger for the tests!

llvm-svn: 198914

5bfefb6a

[Sparc] Emit retl/ret instead of jmp instruction. It improves the readability... · ad40dfcb
Venkatraman Govindaraju authored Jan 10, 2014
```
[Sparc] Emit retl/ret instead of jmp instruction. It improves the readability of the assembly generated.

llvm-svn: 198910
```
ad40dfcb
[Sparc] Add support for parsing jmpl instruction and make indirect call and... · 0d288d31
Venkatraman Govindaraju authored Jan 10, 2014
```
[Sparc] Add support for parsing jmpl instruction and make indirect call and jmp instructions as aliases to jmpl.

llvm-svn: 198909
```
0d288d31

Revert "Revert r198851, "Prototype of skeleton type units for fission"" · 15ed5ebf

David Blaikie authored Jan 10, 2014

This reverts commit r198865 which reverts r198851.

ASan identified a use of the uninitialized DwarfTypeUnit::Ty variable
in skeleton type units.

llvm-svn: 198908

15ed5ebf

Fix a bug with the ARM thumb2 CBZ and CBNZ instructions that · 9bd296ab

Kevin Enderby authored Jan 10, 2014

branch to the next instruction.  This cannot be encoded but can be
turned into a NOP.

rdar://15062072

llvm-svn: 198904

9bd296ab

Jan 09, 2014

Revert r198851, "Prototype of skeleton type units for fission" · c5bf5729
NAKAMURA Takumi authored Jan 09, 2014
```
It caused undefined behavior. DwarfTypeUnit::Ty might not be initialized properly, I guess.

llvm-svn: 198865
```
c5bf5729

Fixed old typo in ScalarEvolution, that caused wrong SCEVs zext operation. · 431993b5

Stepan Dyatkovskiy authored Jan 09, 2014

Detailed description is here:
http://llvm.org/bugs/show_bug.cgi?id=18000#c16

Special thanks to David Wiberg for participating in the bugfix process.

llvm-svn: 198863

431993b5

[SystemZ] Fix RNSBG bug introduced by r197802 · 3875cb60

Richard Sandiford authored Jan 09, 2014

The zext handling added in r197802 wasn't right for RNSBG.  This patch
restricts it to ROSBG, RXSBG and RISBG.  (The tests for RISBG were added
in r197802 since RISBG was the motivating example.)

llvm-svn: 198862

3875cb60

Handle masked rotate amounts · 15cfc1c3

Richard Sandiford authored Jan 09, 2014

At the moment we expect rotates to have the form:

   (or (shl X, Y), (shr X, Z))

where Y == bitsize(X) - Z or Z == bitsize(X) - Y.  This form means that
the (or ...) is undefined for Y == 0 or Z == 0.  This undefinedness can
be avoided by using Y == (C * bitsize(X) - Z) & (bitsize(X) - 1) or
Z == (C * bitsize(X) - Y) & (bitsize(X) - 1) for any integer C
(including 0, the most natural choice).
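
A small sketch of the masked form for a 32-bit value, using C == 0. The function name `rotl32` is hypothetical, and the code illustrates the source-level pattern rather than the DAG matching added by this patch.

```cpp
#include <cassert>
#include <cstdint>

// Rotate left by Y using a masked right-shift amount. With
// Z = (0 - Y) & 31, neither shift amount can reach 32, so the expression
// is well defined for every Y in [0, 31], including Y == 0.
std::uint32_t rotl32(std::uint32_t X, unsigned Y) {
  assert(Y < 32 && "rotate amount out of range");
  unsigned Z = (0u - Y) & 31u; // the C == 0 case of (C * bitsize - Y) & (bitsize - 1)
  return (X << Y) | (X >> Z);
}
```

With the unmasked form, Y == 0 would give Z == 32 and a shift by the full bit width; here Z is 0 and the result is simply X.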

llvm-svn: 198861

15cfc1c3

Match the InstCombine form of rotates by X+C · 0f264db3

Richard Sandiford authored Jan 09, 2014

InstCombine converts (sub 32, (add X, C)) into (sub 32-C, X),
so a rotate left of a 32-bit Y by X+C could appear as either:

   (or (shl Y, (add X, C)), (shr Y, (sub 32, (add X, C))))

without InstCombine or:

   (or (shl Y, (add X, C)), (shr Y, (sub 32-C, X)))

with it.

We already matched the first form.  This patch handles the second too.
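
The two forms compute the same right-shift amount, since 32 - (X + C) == (32 - C) - X. A minimal sketch of both, with hypothetical function names and plain C++ shifts standing in for the DAG nodes:

```cpp
#include <cassert>
#include <cstdint>

// Form without InstCombine: (or (shl Y, (add X, C)), (shr Y, (sub 32, (add X, C))))
std::uint32_t rotlForm1(std::uint32_t Y, unsigned X, unsigned C) {
  unsigned Amt = X + C;
  assert(Amt > 0 && Amt < 32 && "sketch only covers rotate amounts 1..31");
  return (Y << Amt) | (Y >> (32 - Amt));
}

// Form with InstCombine: (or (shl Y, (add X, C)), (shr Y, (sub 32-C, X)))
std::uint32_t rotlForm2(std::uint32_t Y, unsigned X, unsigned C) {
  unsigned Amt = X + C;
  assert(Amt > 0 && Amt < 32 && "sketch only covers rotate amounts 1..31");
  return (Y << Amt) | (Y >> ((32 - C) - X));
}
```

Both functions return the same value for any X and C whose sum lies in the range 1 to 31.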

llvm-svn: 198860

0f264db3

Add an "-object-cache-dir=<string>" option to LLI. This option specifies the · 1ddecc07

Lang Hames authored Jan 09, 2014

root path to which object files managed by the LLIObjectCache instance should be
written. This option defaults to "", in which case objects are cached in the
same directory as the bitcode they are derived from.

The load-object-a.ll test has been rewritten to use this option to support
testing in environments where the test directory is not writable.

llvm-svn: 198852

1ddecc07

Prototype of skeleton type units for fission · a588365d
David Blaikie authored Jan 09, 2014
```
llvm-svn: 198851
```
a588365d

llvm-readobj: address review comments for ARM EHABI printing · 5b060a92

Saleem Abdulrasool authored Jan 09, 2014

Rename bytecode to opcodes to make it more clear.  Change an impossible case to
llvm_unreachable instead.  Avoid allocation of a buffer by modifying the
PrintOpcodes iteration.

llvm-svn: 198848

5b060a92