Commits · d6432f838ec42488ba4c68d1c36b97eb8e367728 · Roger Ferrer / llvm-epi-0.8

Jan 17, 2014

[mips][sched] Split IIImul and IIImult into subclasses. · e95a137b

Daniel Sanders authored Jan 17, 2014

IIImul -> II_MUL
IIImult -> II_MULT, II_MULTU, II_MADD, II_MADDU, II_MSUB, II_MSUBU, II_DMULT, II_DMULTU

No functional change since the InstrItinData's have been duplicated.

llvm-svn: 199495

e95a137b

[mips][sched] Split IIHiLo into II_MFHI_MFLO and II_MTHI_MTLO · 9342557e
Daniel Sanders authored Jan 17, 2014
```
No functional change since the InstrItinData's have been duplicated.

llvm-svn: 199493
```
9342557e

Add MLA alias for ARMv4 support. · afc43a1c

Renato Golin authored Jan 17, 2014

Fix MLA defs to use register class GPRnopc.
Add encoding tests for multiply instructions.
(Alias for MUL/SMLAL/UMLAL added by r199026.)

Patch by Zhaoshi.

llvm-svn: 199491

afc43a1c

[PM] [cleanup] Rename some of the Verifier's members, re-arrange them, · bf2b652c

Chandler Carruth authored Jan 17, 2014

and tweak comments prior to more invasive surgery. Also clean up some
other non-doxygen comments, and run clang-format over the parts that are
going to change dramatically in subsequent commits so that those don't
get cluttered with formatting changes.

No functionality changed.

llvm-svn: 199489

bf2b652c

[asan] extend asan-coverage (still experimental). · 714c67c3

Kostya Serebryany authored Jan 17, 2014

 - add a mode for collecting per-block coverage (-asan-coverage=2).
   So far the implementation is naive (all blocks are instrumented),
   the performance overhead on top of asan could be as high as 30%.
 - Make sure the one-time calls to __sanitizer_cov are moved to function bottom,
   which in turn required copying the original debug info into the call insn.

Here is the performance data on SPEC 2006
(train data, comparing asan with asan-coverage={0,1,2}):

                             asan+cov0     asan+cov1      diff 0-1    asan+cov2       diff 0-2      diff 1-2
       400.perlbench,        65.60,        65.80,         1.00,        76.20,         1.16,         1.16
           401.bzip2,        65.10,        65.50,         1.01,        75.90,         1.17,         1.16
             403.gcc,         1.64,         1.69,         1.03,         2.04,         1.24,         1.21
             429.mcf,        21.90,        22.60,         1.03,        23.20,         1.06,         1.03
           445.gobmk,       166.00,       169.00,         1.02,       205.00,         1.23,         1.21
           456.hmmer,        88.30,        87.90,         1.00,        91.00,         1.03,         1.04
           458.sjeng,       210.00,       222.00,         1.06,       258.00,         1.23,         1.16
      462.libquantum,         1.73,         1.75,         1.01,         2.11,         1.22,         1.21
         464.h264ref,       147.00,       152.00,         1.03,       160.00,         1.09,         1.05
         471.omnetpp,       115.00,       116.00,         1.01,       140.00,         1.22,         1.21
           473.astar,       133.00,       131.00,         0.98,       142.00,         1.07,         1.08
       483.xalancbmk,       118.00,       120.00,         1.02,       154.00,         1.31,         1.28
            433.milc,        19.80,        20.00,         1.01,        20.10,         1.02,         1.01
            444.namd,        16.20,        16.20,         1.00,        17.60,         1.09,         1.09
          447.dealII,        41.80,        42.20,         1.01,        43.50,         1.04,         1.03
          450.soplex,         7.51,         7.82,         1.04,         8.25,         1.10,         1.05
          453.povray,        14.00,        14.40,         1.03,        15.80,         1.13,         1.10
             470.lbm,        33.30,        34.10,         1.02,        34.10,         1.02,         1.00
         482.sphinx3,        12.40,        12.30,         0.99,        13.00,         1.05,         1.06
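
The diff columns appear to be plain ratios of the timing columns (for example, diff 0-2 is asan+cov2 divided by asan+cov0). A minimal sketch that recomputes them for three rows copied from the table above; the helper name `slowdown` is an illustrative choice, not something from the commit:

```python
# Timings copied from three rows of the table above:
# (asan+cov0, asan+cov1, asan+cov2)
timings = {
    "400.perlbench": (65.60, 65.80, 76.20),
    "403.gcc": (1.64, 1.69, 2.04),
    "483.xalancbmk": (118.00, 120.00, 154.00),
}


def slowdown(base, other):
    """Ratio of two timings, rounded to two decimals like the table."""
    return round(other / base, 2)


# (diff 0-1, diff 0-2, diff 1-2) per benchmark
ratios = {
    name: (slowdown(c0, c1), slowdown(c0, c2), slowdown(c1, c2))
    for name, (c0, c1, c2) in timings.items()
}

for name, (d01, d02, d12) in ratios.items():
    print(f"{name:>15}: diff 0-1={d01:.2f}  diff 0-2={d02:.2f}  diff 1-2={d12:.2f}")
```

The 483.xalancbmk row is the one behind the "as high as 30%" figure for per-block coverage.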

llvm-svn: 199488

714c67c3

[PM] Remove the preverifier and directly compute the DominatorTree for · 7677760e

Chandler Carruth authored Jan 17, 2014

the verifier after ensuring the CFG is at least usefully formed.

This fixes a number of problems:
1) The PreVerifier was missing the controls the Verifier provides over
   *how* an invalid module is handled -- it just aborted the program!
   Now it uses the same logic as the Verifier which is significantly
   more library-friendly.
2) The DominatorTree used previously could have been cached and not
   updated due to bugs in prior passes and we would silently use the
   stale tree. This could cause dominance errors to not be as quickly
   diagnosed.
3) We can now (in the next patch) pull the functionality of the verifier
   apart from the pass infrastructure so that you can verify IR without
   having any form of pass manager. This in turn frees the code to share
   logic between old and new pass manager variants.
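
Point 2 can be illustrated with a toy CFG. This is only a sketch in Python, not LLVM code: `compute_dominators` is a hypothetical helper using the textbook iterative algorithm, and the "buggy pass" is simulated by editing the graph without refreshing the cached result.

```python
def compute_dominators(succs, entry):
    """Iterative dominator sets for a CFG given as {node: [successors]}."""
    nodes = set(succs)
    preds = {n: set() for n in nodes}
    for n, targets in succs.items():
        for t in targets:
            preds[t].add(n)

    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = set(nodes)
            for p in preds[n]:
                new &= dom[p]      # must be dominated via every predecessor
            new |= {n}             # every node dominates itself
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


cfg = {"entry": ["a"], "a": ["b"], "b": []}
cached = compute_dominators(cfg, "entry")   # analysis result kept around

cfg["entry"].append("b")                    # a pass adds an edge, cache not updated
fresh = compute_dominators(cfg, "entry")    # what the verifier now computes itself

print("cached says a dominates b:", "a" in cached["b"])
print("fresh says a dominates b: ", "a" in fresh["b"])
```

Checking dominance against `cached` would accept IR that is only valid if `a` still dominates `b`; recomputing avoids depending on earlier passes having kept the tree up to date.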

Along the way I fixed at least one annoying bug -- the state for
'Broken' wasn't being cleared from run to run causing all functions
visited after the first broken function to be marked as broken
regardless of whether *they* were a problem. Fortunately, I don't really
know much of a way to observe this peculiarity.
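
A minimal sketch of that kind of stale-state bug, in Python rather than the verifier's actual code; `ToyVerifier` and its fields are hypothetical names used only for illustration:

```python
class ToyVerifier:
    """Tracks a 'broken' flag across per-function runs."""

    def __init__(self, reset_between_runs):
        self.reset_between_runs = reset_between_runs
        self.broken = False

    def run_on_function(self, function_is_valid):
        if self.reset_between_runs:
            self.broken = False          # the fix: clear state for each run
        if not function_is_valid:
            self.broken = True
        return self.broken


# Validity of four functions; only the second one is actually invalid.
functions = [True, False, True, True]

buggy = ToyVerifier(reset_between_runs=False)
fixed = ToyVerifier(reset_between_runs=True)

buggy_results = [buggy.run_on_function(ok) for ok in functions]
fixed_results = [fixed.run_on_function(ok) for ok in functions]

print(buggy_results)  # every function after the broken one is also flagged
print(fixed_results)  # only the broken function is flagged
```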

In case folks are worried about the runtime cost, it's negligible.
I looked at running the entire regression test suite (which should be
a relatively good use of the verifier) before and after but was unable
to even measure the time spent on the verifier and there was no
regression from before to after. I checked both with debug builds and
optimized builds.

llvm-svn: 199487

7677760e

[AArch64 NEON] Expand vector for UDIV/SDIV/UREM/SREM/FREM as neon doesn't support these operations. · e0faea11
Kevin Qin authored Jan 17, 2014
```
llvm-svn: 199485
```
e0faea11
Switch a few instructions to use RI instead of I so they don't require REX_W to... · 80ab268b
Craig Topper authored Jan 17, 2014
```
Switch a few instructions to use RI instead of I so they don't require REX_W to be explicitly specified.

llvm-svn: 199479
```
80ab268b
Add OpSize16 flags to 32-bit CRC32 instructions so they can be encoded correctly in 16-bit mode. · f124c6a5
Craig Topper authored Jan 17, 2014
```
llvm-svn: 199478
```
f124c6a5
Teach x86 asm parser to handle 'opaque ptr' in Intel syntax. · 2d4b3c97
Craig Topper authored Jan 17, 2014
```
llvm-svn: 199477
```
2d4b3c97
Teach X86 asm parser to understand 'ZMMWORD PTR' in Intel syntax. · 9ac290ad
Craig Topper authored Jan 17, 2014
```
llvm-svn: 199476
```
9ac290ad
Fix intel syntax for 64-bit version of FXSAVE/FXRSTOR to use '64' suffix instead of 'q' · a49c2960
Craig Topper authored Jan 17, 2014
```
llvm-svn: 199474
```
a49c2960

VEX_PREFIX_66 doesn't need to set the hasOpSize flag since VEX instructions... · 5a444969

Craig Topper authored Jan 17, 2014

VEX_PREFIX_66 doesn't need to set the hasOpSize flag since VEX instructions don't use the size fields it controls.

llvm-svn: 199470

5a444969

Replace duplicated code with an existing helper function. · 3cbe1606
Craig Topper authored Jan 17, 2014
```
llvm-svn: 199468
```
3cbe1606

[AArch64] Fix the problem that f16_to_f32 and f32_to_f16 can't be selected. · 17457a2e

Hao Liu authored Jan 17, 2014

Also add copy support for FPR16.
Also add a missing test case file that belongs to commit r197361.

llvm-svn: 199463

17457a2e

[AArch64 NEON] Custom lower conversion between vector integer and vector... · 212d9b4a

Kevin Qin authored Jan 17, 2014

[AArch64 NEON] Custom lower conversion between vector integer and vector floating point if element bit-width doesn't match.

llvm-svn: 199462

212d9b4a

[AArch64] Fix the problem that concat_vectors of two v1i32 types can't be selected. · 18d92262
Hao Liu authored Jan 17, 2014
```
Also fix the problem that scalar_to_vector from f32 to v2f32/v4f32 can't be selected.

llvm-svn: 199461
```
18d92262

Jan 16, 2014

Change inalloca rules to make it only apply to the last parameter · 60d3a835

Reid Kleckner authored Jan 16, 2014

This makes things a lot easier, because we can now talk about the
"argument allocation", which allocates all the memory for the call in
one shot.

The only functional change is to the verifier for a feature that hasn't
shipped yet.
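
The tightened rule can be pictured as a check over a call's per-parameter attribute sets. This is a sketch in Python under that assumption; `check_inalloca` and the attribute-set representation are hypothetical and do not mirror the verifier's real data structures:

```python
def check_inalloca(param_attrs):
    """Return error strings for a list of per-parameter attribute sets.

    An empty result means the parameter list is accepted.
    """
    errors = []
    last = len(param_attrs) - 1
    for index, attrs in enumerate(param_attrs):
        if "inalloca" in attrs and index != last:
            errors.append(
                f"inalloca on parameter {index}, "
                "but only the last parameter may carry it"
            )
    return errors


accepted = check_inalloca([set(), {"nocapture"}, {"inalloca"}])
rejected = check_inalloca([{"inalloca"}, set(), {"inalloca"}])

print(accepted)
print(rejected)
```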

llvm-svn: 199434

60d3a835

[opt][PassInfo] Allow opt to run passes that need target machine. · dc0b2ea2

Quentin Colombet authored Jan 16, 2014

When registering a pass, a pass can now specify a second constructor that takes a
pointer to TargetMachine as its argument.
The PassInfo class has been updated to reflect that possibility.
If such a constructor exists opt will use it instead of the default constructor
when instantiating the pass.

Since such IR passes are supposed to be rare, no specific support has been
added to this commit to allow an easy registration of such a pass.
In other words, for such a pass, the initialization function has to be
hand-written (see CodeGenPrepare for instance).

Now, codegenprepare can be tested using opt:
opt -codegenprepare -mtriple=mytriple input.ll
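
The selection logic described above can be sketched as follows. This is a Python toy, not the real C++ PassInfo API: the class layout, `create_pass`, and `ToyPass` are all hypothetical, and a plain string stands in for the TargetMachine pointer.

```python
class PassInfo:
    """Toy registry entry holding one or two ways to build a pass."""

    def __init__(self, name, default_ctor, target_machine_ctor=None):
        self.name = name
        self.default_ctor = default_ctor
        self.target_machine_ctor = target_machine_ctor

    def create_pass(self, target_machine):
        # Prefer the TargetMachine-aware constructor when one was registered.
        if self.target_machine_ctor is not None:
            return self.target_machine_ctor(target_machine)
        return self.default_ctor()


class ToyPass:
    def __init__(self, target_machine=None):
        self.target_machine = target_machine


# A pass registered with both constructors, and one with only the default.
codegenprepare = PassInfo("codegenprepare", ToyPass, lambda tm: ToyPass(tm))
plain = PassInfo("plain", ToyPass)

with_tm = codegenprepare.create_pass("mytriple")
without_tm = plain.create_pass("mytriple")

print(with_tm.target_machine, without_tm.target_machine)
```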

llvm-svn: 199430

dc0b2ea2

Fix two cases where we could lose fast math flags when optimizing FADD expressions. · e7321660
Owen Anderson authored Jan 16, 2014
```
llvm-svn: 199427
```
e7321660
Fix an instance where we would drop fast math flags when performing an fdiv to... · 4557a156
Owen Anderson authored Jan 16, 2014
```
Fix an instance where we would drop fast math flags when performing an fdiv to reciprocal multiply transformation.

llvm-svn: 199425
```
4557a156
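Replacing a division by a constant with a multiply by its reciprocal can change the rounded result, which is why such a rewrite is tied to fast math flags and why the flags should survive on the new instruction. A small Python sketch of both points; the `Instr` record and `fdiv_to_reciprocal_fmul` are hypothetical, and `arcp` is used here on the assumption that it is the relevant flag:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    opcode: str
    lhs: str
    rhs: float
    fast_math_flags: frozenset = frozenset()


def fdiv_to_reciprocal_fmul(inst):
    """Rewrite `fdiv x, c` as `fmul x, 1/c`, carrying the flags over."""
    assert inst.opcode == "fdiv" and inst.rhs != 0.0
    return Instr("fmul", inst.lhs, 1.0 / inst.rhs, inst.fast_math_flags)


# The two forms are not bit-identical in general.
x, c = 3.0, 10.0
exact_division = x / c
reciprocal_multiply = x * (1.0 / c)
print(exact_division, reciprocal_multiply)

original = Instr("fdiv", "%x", 10.0, frozenset({"arcp"}))
rewritten = fdiv_to_reciprocal_fmul(original)
print(rewritten)
```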
Fix a bug in InstCombine where we failed to preserve fast math flags when... · e8537fc7
Owen Anderson authored Jan 16, 2014
```
Fix a bug in InstCombine where we failed to preserve fast math flags when optimizing an FMUL expression.

llvm-svn: 199424
```
e8537fc7
llvm-objdump/COFF: Print DLL name in the export table header. · da49d0d4
Rui Ueyama authored Jan 16, 2014
```
llvm-svn: 199422
```
da49d0d4
Teach InstCombine that (fmul X, -1.0) can be simplified to (fneg X), which... · f74cfe03
Owen Anderson authored Jan 16, 2014
```
Teach InstCombine that (fmul X, -1.0) can be simplified to (fneg X), which LLVM expresses as (fsub -0.0, X).

llvm-svn: 199420
```
f74cfe03
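The choice of -0.0 rather than 0.0 in the fsub form matters for signed zeros. A quick check in Python, whose floats are IEEE 754 doubles; the helper `bits` and the sample list are illustrative, and NaN is left out because its sign handling is a separate question:

```python
import struct


def bits(value):
    """Raw IEEE 754 encoding, so +0.0 and -0.0 compare as different."""
    return struct.pack(">d", value)


samples = [0.0, -0.0, 1.5, -2.25, float("inf"), float("-inf")]

# x * -1.0 compared with (-0.0 - x) and with (0.0 - x), bit for bit.
matches_neg_zero = all(bits(x * -1.0) == bits(-0.0 - x) for x in samples)
matches_pos_zero = all(bits(x * -1.0) == bits(0.0 - x) for x in samples)

print("fsub -0.0, X matches fmul X, -1.0:", matches_neg_zero)
print("fsub  0.0, X matches fmul X, -1.0:", matches_pos_zero)
```

The mismatch in the second case comes from X = +0.0: multiplying by -1.0 gives -0.0, while 0.0 - 0.0 gives +0.0.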
Use static instead of anonymous namespace. · 686738e2
Rui Ueyama authored Jan 16, 2014
```
llvm-svn: 199419
```
686738e2
Reduce nesting. · 5efa665f
Rui Ueyama authored Jan 16, 2014
```
llvm-svn: 199418
```
5efa665f
Use the current local variable naming style. · 8ff24d25
Rui Ueyama authored Jan 16, 2014
```
llvm-svn: 199417
```
8ff24d25

Tweak the MCExternalSymbolizer to print references to C string literals · 8f4921c3

Kevin Enderby authored Jan 16, 2014

with raw_ostream's write_escaped() method.

For example darwin's otool(1) program that uses the llvm
disassembler now produces disassembly like this:

leaq	0x7b(%rip), %rdi ## literal pool for: "%f\ntoto\n"

and does not print the raw newlines, which would mess up the output.
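
The effect of escaping can be sketched in Python. This is an approximation written for illustration, not the raw_ostream implementation: the function below is a local stand-in that escapes backslash, tab, newline, and double quote, and falls back to three-digit octal for other non-printable bytes, which is assumed to be close to the default behaviour.

```python
def write_escaped(text):
    """Escape a string so it can be shown on one line as a C literal."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch == "\\":
            out.append("\\\\")
        elif ch == "\t":
            out.append("\\t")
        elif ch == "\n":
            out.append("\\n")
        elif ch == '"':
            out.append('\\"')
        elif 0x20 <= byte <= 0x7E:
            out.append(ch)                 # printable ASCII passes through
        else:
            out.append(f"\\{byte:03o}")    # everything else as octal
    return "".join(out)


literal = "%f\ntoto\n"
comment = f'## literal pool for: "{write_escaped(literal)}"'
print(comment)
```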

rdar://15145300

llvm-svn: 199407

8f4921c3

[mips][sched] Removed IIXfer. No instructions use it. · 59f99150
Daniel Sanders authored Jan 16, 2014
```
llvm-svn: 199403
```
59f99150

[mips][sched] Put AND, OR, XOR, MOVT_I, and MOVF_I in the same itinerary class... · 4aefdc7b

Daniel Sanders authored Jan 16, 2014

[mips][sched] Put AND, OR, XOR, MOVT_I, and MOVF_I in the same itinerary class as their non-microMIPS counterparts.

No functional change since both classes have the same InstrItinData definition.

llvm-svn: 199402

4aefdc7b

Add an emitRawComment function and use it to simplify some uses of EmitRawText. · 0b694814
Rafael Espindola authored Jan 16, 2014
```
llvm-svn: 199397
```
0b694814
[mips][sched] Split IIseb into II_SEB and II_SEH · 4d20f0c0
Daniel Sanders authored Jan 16, 2014
```
No functional change since there are no InstrItinData's.

llvm-svn: 199396
```
4d20f0c0

[mips][sched] Split IILogic into II_AND, II_OR, II_XOR, II_ANDI, II_ORI, II_XORI · 306ef07b

Daniel Sanders authored Jan 16, 2014

This is necessary because the classes are shared between all implementations.

No functional change since the InstrItinData's have been duplicated.

llvm-svn: 199394

306ef07b

[mips][sched] Split IIArith in preparation for the first scheduler targeting a specific MIPS CPU. · 980589a8

Daniel Sanders authored Jan 16, 2014

IIArith -> II_ADD, II_ADDU, II_AND, II_CL[ZO], II_DADDIU, II_DADDU,
II_DROTR, II_DROTR32, II_DROTRV, II_DSLL, II_DSLL32, II_DSLLV,
II_DSR[AL], II_DSR[AL]32, II_DSR[AL]V, II_DSUBU, II_LUI, II_MOV[ZFNT],
II_NOR, II_OR, II_RDHWR, II_ROTR, II_ROTRV, II_SLL, II_SLLV, II_SR[AL],
II_SR[AL]V, II_SUBU, II_XOR

No functional change since the InstrItinData's have been duplicated.

This is necessary because the classes are shared between all schedulers.

Once this patch series is committed there will be an InstrItinClass for
each mnemonic with minimal grouping. This does increase the size of the
itinerary tables for each MIPS scheduler but we have a few options for dealing
with that later. These options include reducing the number of classes once
we see the best way to simplify them, or by extending tablegen to be able
to compress the table by eliminating duplicate entries, etc.
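
The duplicate-elimination option could look roughly like this. It is a Python sketch of the idea only, not tablegen code: `compress_itineraries` is a hypothetical helper, and the (latency, unit) values are made-up placeholders rather than real MIPS itinerary data.

```python
def compress_itineraries(class_to_data):
    """Store each distinct itinerary row once and map classes to row indices."""
    unique_rows = []
    row_index = {}
    class_to_row = {}
    for itin_class, data in class_to_data.items():
        if data not in row_index:
            row_index[data] = len(unique_rows)
            unique_rows.append(data)
        class_to_row[itin_class] = row_index[data]
    return unique_rows, class_to_row


# Placeholder data: after the split, many classes still share one description.
table = {
    "II_ADD": (1, "ALU"),
    "II_ADDU": (1, "ALU"),
    "II_AND": (1, "ALU"),
    "II_OR": (1, "ALU"),
    "II_MUL": (17, "IMULDIV"),
}

rows, mapping = compress_itineraries(table)
print(len(table), "classes ->", len(rows), "stored rows")
print(mapping)
```

Lookups still go through the per-mnemonic class, so a CPU-specific scheduler can give one class its own row without affecting the others.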

llvm-svn: 199391

980589a8

[mips] Correct itin class for MULT_MM and MULTu_MM to IIImult. · bfe1830a
Daniel Sanders authored Jan 16, 2014
```
This matches the itin class used by the non-microMIPS equivalents of these
instructions.

llvm-svn: 199389
```
bfe1830a

[mips] IIImult should have an InstrItinData in the generic scheduler. Used the... · 818058b2

Daniel Sanders authored Jan 16, 2014

[mips] IIImult should have an InstrItinData in the generic scheduler. Used the same one as for IIImul.

Affects:
  DMULT, DMULTu, MADD, MADD_MM, MADDU, MADDU_MM, MSUB, MSUB_MM, MSUBU,
  MSUBU_MM, MULT, MULTu

Does not affect MULT_MM, MULTu_MM since they are currently miscategorised
as IIImul.

llvm-svn: 199381

818058b2

ReMat: fix overly cavalier attitude to sub-register indices · 3657cb03

Tim Northover authored Jan 16, 2014

There are two attempted optimisations in reMaterializeTrivialDef, trying to
avoid promoting the size of a register too much when rematerializing.
Unfortunately, both appear to be flawed. First, we see if the original register
would have worked, but this is inadequate. Consider:

    v1 = SOMETHING (v1 is QQ)
    v2:Q0 = COPY v1:Q1 (v1, v2 are QQ)
    ...
    uses of v2

In this case even though v2 *could* be used directly as the output of
SOMETHING, this would set the wrong bits of the QQ register involved. The
correct rematerialization must be:

    v2:Q0_Q1 = SOMETHING (v2 promoted to QQQ)
    ...
    uses of v2:Q1_Q2

For the second optimisation, if the correct remat is "v2:idx = SOMETHING" then
we can't necessarily expect v2 itself to be valid for SOMETHING, but we do try
to hunt for a class between v1 and v2 that works. Unfortunately, this is also
wrong:

    v1 = SOMETHING (v1 is QQ)
    v2:Q0_Q1 = COPY v1 (v1 is QQ, v2 is QQQ)
    ...
    uses of v2 as a QQQ

The canonical rematerialization here is "v2:Q0_Q1 = SOMETHING". However current
logic would decide that v2 could be a QQ (no interest is taken in later uses).

This patch, therefore, always accepts the widened register class without trying
to be clever. Generally there is no penalty to this (e.g. in the common GR32 <
GR64 case, expanding the width doesn't matter because it's not like you were
going to do anything else with the high bits of a GR32 register). It can
increase register pressure in cases like the ARM VFP regs though (multiple
non-overlapping but equivalent subregisters). This situation can be
spotted by the fact that both source and destination in the
not-quite-coalesced pair have a sub-register index and
rematerialisation is skipped in that situation.

Unfortunately, no in-tree targets actually expose this as far as I can tell
(there are so few isAsCheapAsAMove instructions for it to trigger on) so I've
been unable to produce a test. It was exposed in our ARM64 SPEC tests though,
and I will be adding a test there that we should be able to contribute
soon(TM).

rdar://problem/15775279

llvm-svn: 199376

3657cb03

[asan] Remove -fsanitize-address-zero-base-shadow command line · 13665367

Evgeniy Stepanov authored Jan 16, 2014

flag from clang, and disable zero-base shadow support on all platforms
where it is not the default behavior.

- It is completely unused, as far as we know.
- It is ABI-incompatible with non-zero-base shadow, which means all
objects in a process must be built with the same setting. Failing to
do so results in a segmentation fault at runtime.
- It introduces a backward dependency of compiler-rt on user code,
which is uncommon and complicates testing.
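
The ABI incompatibility follows from how ASan maps application addresses to shadow addresses, namely (addr >> 3) + offset. A small Python sketch; the non-zero offset and the sample address are assumed example values, not taken from this commit:

```python
SHADOW_SCALE = 3  # one shadow byte describes 8 application bytes


def shadow_address(addr, shadow_offset):
    """ASan-style mapping from an application address to its shadow byte."""
    return (addr >> SHADOW_SCALE) + shadow_offset


ZERO_BASE_OFFSET = 0
NONZERO_BASE_OFFSET = 0x7FFF8000   # assumed example offset
addr = 0x602000000010              # arbitrary example heap address

zero_base = shadow_address(addr, ZERO_BASE_OFFSET)
nonzero_base = shadow_address(addr, NONZERO_BASE_OFFSET)

# Code built with different settings checks different shadow bytes
# for the same application address.
print(hex(zero_base), hex(nonzero_base))
```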

This is the LLVM part of a larger change.

llvm-svn: 199371

13665367

For ARM, fix assertion failures for some ld/st 3/4 instructions with writeback. · 4df2363a
Jiangning Liu authored Jan 16, 2014
```
llvm-svn: 199369
```
4df2363a
AVX-512: fixed a compare pattern · d1487261
Elena Demikhovsky authored Jan 16, 2014
```
llvm-svn: 199366
```
d1487261