Commits · 05bcde6d9a79c2356a25bde8d6d9d63794bc6029 · Roger Ferrer / llvm-epi-0.8

Sep 16, 2013

This patch implements Mips load/store instructions from/to coprocessor 2. Test cases are added. · 05bcde6d
Vladimir Medic authored Sep 16, 2013
```
llvm-svn: 190780
```
05bcde6d
ARM: Deduplicate ConstantPoolValues. · 2ef689ca
Benjamin Kramer authored Sep 16, 2013
```
llvm-svn: 190779
```
2ef689ca

[SystemZ] Improve extload handling · 109a7c6f

Richard Sandiford authored Sep 16, 2013

The port originally had special patterns for extload, mapping them to the
same instructions as sextload.  It seemed neater to have patterns that
match "an extension that is allowed to be signed" and "an extension that
is allowed to be unsigned".

This was originally meant to be a clean-up, but it does improve the handling
of promoted integers a little, as shown by args-06.ll.

llvm-svn: 190777

109a7c6f

Make F16C feature flag imply AVX rather than just checking both in the patterns. · a6d204ec
Craig Topper authored Sep 16, 2013
```
llvm-svn: 190775
```
a6d204ec

PPC: Don't restrict lvsl generation to after type legalization · 40c34781

Hal Finkel authored Sep 15, 2013

This is a re-commit of r190764, with an extra check to make sure that we're not
performing the transformation on illegal types (a small test case has been
added for this as well).

Original commit message:

The PPC backend uses a target-specific DAG combine to turn unaligned Altivec
loads into a permutation-based sequence when possible. Unfortunately, the
target-specific DAG combine is not always called on all loads of interest
(sometimes the routines in DAGCombine call CombineTo such that the new node and
users are not added to the worklist); allowing the combine to trigger early
(before type legalization) mitigates this problem. Because the autovectorizers
only create legal vector types, I don't expect a lot of cases where this
optimization is enabled by type legalization in practice.

llvm-svn: 190771

40c34781
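
The permutation-based sequence this message refers to can be pictured with a scalar model. The sketch below is not the backend's code: it assumes 16-byte vectors and uses hypothetical helper names (`load_aligned`, `load_unaligned_via_permute`) to show how two aligned loads plus a byte selection driven by the low four address bits (the role `lvsl` and `vperm` play on Altivec) reproduce an unaligned load.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Vec16 = std::array<std::uint8_t, 16>;

// Hypothetical model of an aligned vector load: the low four bits of the
// offset are ignored, so the load always starts on a 16-byte boundary.
Vec16 load_aligned(const std::vector<std::uint8_t> &mem, std::size_t off) {
  std::size_t base = off & ~static_cast<std::size_t>(15);
  assert(base + 16 <= mem.size() && "aligned block must lie inside mem");
  Vec16 v{};
  for (std::size_t i = 0; i < 16; ++i)
    v[i] = mem[base + i];
  return v;
}

// Unaligned load built from two aligned loads and a permute.
// shift = off & 15 plays the role of the permute control vector.
Vec16 load_unaligned_via_permute(const std::vector<std::uint8_t> &mem,
                                 std::size_t off) {
  Vec16 lo = load_aligned(mem, off);       // block containing the first byte
  Vec16 hi = load_aligned(mem, off + 15);  // block containing the last byte
  std::size_t shift = off & 15;
  Vec16 out{};
  for (std::size_t i = 0; i < 16; ++i) {
    std::size_t idx = shift + i;           // index into the 32-byte lo:hi pair
    out[i] = idx < 16 ? lo[idx] : hi[idx - 16];
  }
  return out;
}
```

When the offset is already aligned, both loads hit the same block and the selection degenerates to a copy, which is why the sequence is safe for any alignment.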

Replace some unnecessary vector copies with references. · 7d605268
Benjamin Kramer authored Sep 15, 2013
```
llvm-svn: 190770
```
7d605268

Sep 15, 2013

Revert r190764: PPC: Don't restrict lvsl generation to after type legalization · 31025a63

Hal Finkel authored Sep 15, 2013

This is causing test-suite failures.

Original commit message:

The PPC backend uses a target-specific DAG combine to turn unaligned Altivec
loads into a permutation-based sequence when possible. Unfortunately, the
target-specific DAG combine is not always called on all loads of interest
(sometimes the routines in DAGCombine call CombineTo such that the new node and
users are not added to the worklist); allowing the combine to trigger early
(before type legalization) mitigates this problem. Because the autovectorizers
only create legal vector types, I don't expect a lot of cases where this
optimization is enabled by type legalization in practice.

llvm-svn: 190765

31025a63

PPC: Don't restrict lvsl generation to after type legalization · 2945d4e9

Hal Finkel authored Sep 15, 2013

The PPC backend uses a target-specific DAG combine to turn unaligned Altivec
loads into a permutation-based sequence when possible. Unfortunately, the
target-specific DAG combine is not always called on all loads of interest
(sometimes the routines in DAGCombine call CombineTo such that the new node and
users are not added to the worklist); allowing the combine to trigger early
(before type legalization) mitigates this problem. Because the autovectorizers
only create legal vector types, I don't expect a lot of cases where this
optimization is enabled by type legalization in practice.

llvm-svn: 190764

2945d4e9

Expand the mask capability for deciding which functions are mips16 and mips32 · 65553152
Reed Kotler authored Sep 15, 2013
```
so it can be better used for general interoperability testing between mips32
and mips16.

llvm-svn: 190762
```
65553152

Sep 14, 2013

Add the remaining Intel SHA instructions · 8eb45a4e
Ben Langmuir authored Sep 14, 2013
```
Also assembly/disassembly tests, and for sha256rnds2, aliases with an explicit
xmm0 dependency.

llvm-svn: 190754
```
8eb45a4e
Fix spelling. · 516be56f
Robert Wilhelm authored Sep 14, 2013
```
llvm-svn: 190749
```
516be56f
Fixed bug when generating Load Upper Immediate microMIPS instruction. · fc26cfcd
Zoran Jovanovic authored Sep 14, 2013
```
llvm-svn: 190746
```
fc26cfcd
Support for microMIPS DIV instructions. · 3671a544
Zoran Jovanovic authored Sep 14, 2013
```
llvm-svn: 190745
```
3671a544
Support for misc microMIPS instructions. · ab852781
Zoran Jovanovic authored Sep 14, 2013
```
llvm-svn: 190744
```
ab852781

Sep 13, 2013

Add missing break statement in PPCISelLowering · c3cfbf86
Hal Finkel authored Sep 13, 2013
```
As it turns out, not a problem in practice, but it should be there.

llvm-svn: 190720
```
c3cfbf86

Adds support for Atom Silvermont (SLM) - -march=slm · 3fe264d6

Preston Gurd authored Sep 13, 2013

Implements instruction scheduler latencies for Silvermont,
using latencies from the Intel Silvermont Optimization Guide.

Auto detects SLM.

Turns on post RA scheduler when generating code for SLM.

llvm-svn: 190717

3fe264d6

[ARMv8] Change hasV8Fp to hasFPARMv8, and other command line options · ccd04894
Joey Gouly authored Sep 13, 2013
```
to be more consistent.

llvm-svn: 190692
```
ccd04894
[ARMv8] Emit the proper .fpu directive. · 3c0e5567
Joey Gouly authored Sep 13, 2013
```
Patch by Bradley Smith!

llvm-svn: 190683
```
3c0e5567
Test commit to verify that commit access works. · def5d347
Zoran Jovanovic authored Sep 13, 2013
```
llvm-svn: 190676
```
def5d347
[SystemZ] Use getTarget{Insert,Extract}Subreg rather than getMachineNode · d8163208
Richard Sandiford authored Sep 13, 2013
```
Just a clean-up, no behavioral change intended.

llvm-svn: 190673
```
d8163208
[SystemZ] Try to fold shifts into TMxx · 030c1657
Richard Sandiford authored Sep 13, 2013
```
E.g. "SRL %r2, 2; TMLL %r2, 1" => "TMLL %r2, 4".

llvm-svn: 190672
```
030c1657
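
The fold rests on a simple identity: testing bits of a value after a logical right shift is the same as testing the correspondingly left-shifted mask on the unshifted value, provided the shifted mask still fits in the halfword the test instruction looks at. The sketch below is only a model under that assumption. The function names are hypothetical, and it captures just the "is any selected bit set" outcome, not the full condition code that TMLL produces.

```cpp
#include <cassert>
#include <cstdint>

// Models "SRL r, shift; TMLL r, mask": shift first, then test the low
// halfword against the mask. Returns true if any selected bit is set.
bool test_after_shift(std::uint32_t value, unsigned shift, std::uint16_t mask) {
  return (((value >> shift) & 0xFFFFu) & mask) != 0;
}

// Models the folded form "TMLL r, mask << shift" on the unshifted value.
// Only valid while the shifted mask still fits in the low halfword.
bool test_folded(std::uint32_t value, unsigned shift, std::uint16_t mask) {
  assert(shift < 16 && "shift must stay inside the halfword");
  std::uint32_t folded = static_cast<std::uint32_t>(mask) << shift;
  assert(folded <= 0xFFFFu && "folded mask must fit in the low halfword");
  return ((value & 0xFFFFu) & folded) != 0;
}
```

For the example in the message, `shift` is 2 and `mask` is 1, so the folded mask is 4 and both forms test bit 2 of the original register.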

AArch64: use RegisterOperand for NEON registers. · 635a9790

Tim Northover authored Sep 13, 2013

Previously we modelled VPR128 and VPR64 as essentially identical
register-classes containing V0-V31 (which had Q0-Q31 as "sub_alias"
sub-registers). This model is starting to cause significant problems
for code generation, particularly writing EXTRACT/INSERT_SUBREG
patterns for converting between the two.

The change here switches to classifying VPR64 & VPR128 as
RegisterOperands, which are essentially aliases for RegisterClasses
with different parsing and printing behaviour. This fits almost
exactly with their real status (VPR128 == FPR128 printed strangely,
VPR64 == FPR64 printed strangely).

llvm-svn: 190665

635a9790

Move operator to end of previous line to match coding standards. · 21a916b6
Craig Topper authored Sep 13, 2013
```
llvm-svn: 190659
```
21a916b6
R600: Move clamp handling code to R600ISelLowering.cpp · 0167a313
Vincent Lejeune authored Sep 12, 2013
```
llvm-svn: 190645
```
0167a313
R600: Move code handling literal folding into R600ISelLowering. · 9a248e5c
Vincent Lejeune authored Sep 12, 2013
```
llvm-svn: 190644
```
9a248e5c
R600: Move fabs/fneg/sel folding logic into PostProcessIsel · ab3baf80
Vincent Lejeune authored Sep 12, 2013
```
This move makes it possible to correctly handle multiple instructions
from a single pattern.

llvm-svn: 190643
```
ab3baf80
Remove an unused variable, fixing -Werror build with latest Clang. · 51428e36
Chandler Carruth authored Sep 12, 2013
```
llvm-svn: 190640
```
51428e36

Fix PPC ABI for ByVal structs with vector members · 262a2247

Hal Finkel authored Sep 12, 2013

When a structure is passed by value, and that structure contains a vector
member, according to the PPC ABI, the structure will receive enhanced alignment
(so that the vector within the structure will always be aligned).

This should resolve PR16641.

llvm-svn: 190636

262a2247

Sep 12, 2013

Make the PPC fast-math sqrt expansion safe at 0 · 1e2e3ea5

Hal Finkel authored Sep 12, 2013

In fast-math mode sqrt(x) is calculated using the fast expansion of the
reciprocal of the reciprocal sqrt expansion. The reciprocal and reciprocal
sqrt expansions use the associated estimate instructions along with some Newton
iterations. Unfortunately, as a result, sqrt(0) was being calculated as NaN,
which is not correct. Now we explicitly return a result of zero if the input is
zero.

llvm-svn: 190624

1e2e3ea5
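
Why the unguarded expansion yields NaN at zero can be seen in a small numeric model. This is a sketch, not the PPC lowering itself: the hardware estimate instructions are replaced by deliberately imprecise stand-ins (`rsqrt_estimate`, `recip_estimate`), the iteration counts are arbitrary, and all function names are hypothetical. At x = 0 the reciprocal square root estimate is infinite, and the Newton step multiplies 0 by infinity, which is NaN.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Stand-in for a hardware reciprocal-sqrt estimate: about 1% off on purpose.
float rsqrt_estimate(float x) {
  if (x == 0.0f)
    return std::numeric_limits<float>::infinity();
  return 1.01f / std::sqrt(x);
}

// Stand-in for a hardware reciprocal estimate: also about 1% off.
float recip_estimate(float a) { return 1.01f / a; }

// Newton refinement of y ~ 1/sqrt(x): y <- y * (1.5 - 0.5 * x * y * y).
float rsqrt_newton(float x) {
  float y = rsqrt_estimate(x);
  for (int i = 0; i < 2; ++i)
    y = y * (1.5f - 0.5f * x * y * y);
  return y;
}

// Newton refinement of r ~ 1/a: r <- r * (2 - a * r).
float recip_newton(float a) {
  float r = recip_estimate(a);
  for (int i = 0; i < 2; ++i)
    r = r * (2.0f - a * r);
  return r;
}

// sqrt(x) as the reciprocal of the reciprocal sqrt, with no guard.
float fast_sqrt_unsafe(float x) {
  assert(x >= 0.0f && "model only covers non-negative inputs");
  return recip_newton(rsqrt_newton(x));
}

// Same expansion, but a zero input returns zero explicitly.
float fast_sqrt_safe(float x) {
  return x == 0.0f ? 0.0f : fast_sqrt_unsafe(x);
}
```

In this model `fast_sqrt_unsafe(0.0f)` is NaN while `fast_sqrt_safe(0.0f)` is zero, and both agree closely with the true square root for positive inputs.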

Implement asm support for a few PowerPC bookIII instructions that are needed for assembling · 62cb6354
Roman Divacky authored Sep 12, 2013
```
FreeBSD kernel.

llvm-svn: 190618
```
62cb6354

Partial support for Intel SHA Extensions (sha1rnds4) · 1650175d

Ben Langmuir authored Sep 12, 2013

Add basic assembly/disassembly support for the first Intel SHA
instruction 'sha1rnds4'. Also includes feature flag, and test cases.

Support for the remaining instructions will follow in a separate patch.

llvm-svn: 190611

1650175d

Mark PPC MFTB and DST (and friends) as deprecated · 0096dbd5

Hal Finkel authored Sep 12, 2013

Use the new instruction deprecation feature to mark mftb (now replaced with
mfspr) and dst (along with the other Altivec cache control instructions) as
deprecated when targeting cores supporting at least ISA v2.03.

llvm-svn: 190605

0096dbd5

Add an instruction deprecation feature to TableGen. · 0e76fa7d

Joey Gouly authored Sep 12, 2013

The 'Deprecated' class allows you to specify a SubtargetFeature that the
instruction is deprecated on.

The 'ComplexDeprecationPredicate' class allows you to define a custom
predicate that is called to check for deprecation.
For example:
  ComplexDeprecationPredicate<"MCR">

would mean you would have to define the following function:
  bool getMCRDeprecationInfo(MCInst &MI, MCSubtargetInfo &STI,
                             std::string &Info)

which returns 'false' for not deprecated, and returns 'true' for deprecated
and stores the warning message in 'Info'.

The MCTargetAsmParser constructor was changed to take an extra argument of
the MCInstrInfo class, so out-of-tree targets will need to be changed.

llvm-svn: 190598

0e76fa7d
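
The contract of such a predicate can be sketched outside LLVM. The block below is illustrative only: `FakeMCInst` and `FakeSubtargetInfo` are hypothetical stand-ins for `MCInst` and `MCSubtargetInfo`, and the rule it encodes (one opcode is deprecated when a made-up feature flag is set) is invented for the example. Only the function name and the return convention follow the message above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins; the real types come from the LLVM MC headers.
struct FakeMCInst { unsigned Opcode; };
struct FakeSubtargetInfo { bool HasNewerISA; };

// Invented opcode value, used only to have something to match on.
constexpr unsigned kHypotheticalMCROpcode = 42;

// Returns true and fills Info when the instruction is deprecated,
// returns false (leaving Info untouched) otherwise.
bool getMCRDeprecationInfo(FakeMCInst &MI, FakeSubtargetInfo &STI,
                           std::string &Info) {
  if (MI.Opcode == kHypotheticalMCROpcode && STI.HasNewerISA) {
    Info = "instruction is deprecated on this subtarget";
    return true;
  }
  return false;
}

// Usage: the caller owns the message buffer and reads it only on 'true'.
void example_usage() {
  FakeMCInst MI{kHypotheticalMCROpcode};
  FakeSubtargetInfo STI{true};
  std::string Info;
  bool Deprecated = getMCRDeprecationInfo(MI, STI, Info);
  assert(Deprecated && !Info.empty());
}
```

Passing the message out through a reference keeps the return value a plain yes/no answer, so a caller can skip building a warning when nothing is deprecated.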

AVX-512: implemented extractelement with variable index. · 8952974e
Elena Demikhovsky authored Sep 12, 2013
```
Added parsing of mask register and "zeroing" semantic, like {%k1} {z}.

llvm-svn: 190595
```
8952974e
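
For readers unfamiliar with the `{%k1} {z}` syntax, the difference between the default merging behaviour and the "zeroing" semantic can be modelled on a four-lane vector. This is a sketch with hypothetical names (`Vec4`, `masked_add`), not code from the patch: lanes whose mask bit is clear either keep the old destination value (merging) or become zero (zeroing).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec4 = std::array<int, 4>;

// Model of a masked lane-wise add. Bit i of k enables lane i.
// With zeroing set, disabled lanes are cleared; otherwise they keep dst.
Vec4 masked_add(const Vec4 &dst, const Vec4 &a, const Vec4 &b,
                std::uint8_t k, bool zeroing) {
  assert((k >> 4) == 0 && "only four lanes are modelled");
  Vec4 out{};
  for (unsigned i = 0; i < 4; ++i) {
    if (k & (1u << i))
      out[i] = a[i] + b[i];
    else
      out[i] = zeroing ? 0 : dst[i];
  }
  return out;
}
```

With a mask of `0x5`, lanes 0 and 2 receive the sum, while lanes 1 and 3 either retain the destination's previous contents or are cleared, depending on the `zeroing` flag.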

PPC: Enable aggressive anti-dependency breaking · 7fe6a539

Hal Finkel authored Sep 12, 2013

Aggressive anti-dependency breaking is enabled by default for all PPC cores.
This provides a general speedup on the P7 and other platforms (among other
factors, the instruction group formation for the non-embedded PPC cores is done
during post-RA scheduling). In order to do this safely, the incompatibility
between uses of the MFOCRF instruction and anti-dependency breaking is
resolved by marking MFOCRF with hasExtraSrcRegAllocReq. As noted in the removed
FIXME, the problem was that MFOCRF's output is sensitive to the identity of the
source register, and always paired with a shift to undo this effect. Because
anti-dependency breaking is unaware of this hidden dependency of the shift
amount on the source register of the MFOCRF instruction, changing that register
must be inhibited.

Two test cases were adjusted: The SjLj test was made more insensitive to
register choices and scheduling; the saveCR test disabled anti-dependency
breaking because part of what it is testing is proper register reuse.

llvm-svn: 190587

7fe6a539
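
The hidden dependency described above can be illustrated with a simplified model. It assumes the condition register is eight 4-bit fields with field 0 in the most significant nibble, and that moving field n leaves it at its original bit position in the result. The bits the hardware leaves undefined are modelled as zero, and all function names are hypothetical. The point is that the shift amount is a function of the field number, so renaming the source field without updating the shift extracts the wrong bits.

```cpp
#include <cassert>
#include <cstdint>

// Shift needed to bring CR field 'field' down to the low four bits.
unsigned shift_for_field(unsigned field) {
  assert(field < 8 && "the model has eight CR fields");
  return 4 * (7 - field);
}

// Model of moving one CR field: the field stays at its own bit position.
// Bits outside the field are undefined on hardware; zero in this model.
std::uint32_t mfocrf_model(std::uint32_t cr, unsigned field) {
  return cr & (0xFu << shift_for_field(field));
}

// The paired shift (and mask) that undoes the position dependence.
std::uint32_t extract_field(std::uint32_t moved, unsigned shift_amount) {
  return (moved >> shift_amount) & 0xFu;
}
```

With `cr = 0x12345678`, reading field 2 with its own shift gives 3. Reading field 5 while still shifting for field 2 gives 0 rather than 6, which is the kind of mismatch that renaming the register behind the shift's back would cause.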

R600/SI: expose TBUFFER_STORE_FORMAT_* for OpenGL transform feedback · afcf12f3

Tom Stellard authored Sep 12, 2013

For _XYZ, the type of VDATA is v4i32, because v3i32 doesn't exist.

The ADDR64 bit is not exposed. A simpler intrinsic that doesn't take
a resource descriptor might be nicer.

The maximum number of input SGPRs is bumped to 17.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 190575

afcf12f3

R600: Don't use trans slot for instructions that read LDS source registers · 7f6fa4c4

Tom Stellard authored Sep 12, 2013

This fixes some regressions in the piglit local memory store tests
introduced by recent commits which made the scheduler aware of the trans
slot.

It's not possible to test this using lit, because there is no way to
determine from the assembly dumps whether or not an instruction is in
the trans slot.

Even if this were possible, the test would be highly sensitive to
changes in the scheduler and might generate confusing false negatives.

Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 190574

7f6fa4c4

Greatly simplify the PPC A2 scheduling itinerary · f574c277

Hal Finkel authored Sep 11, 2013

As Andy pointed out to me a long time ago, there are no structural hazards in
the later pipeline stages of the A2, and so modeling them is useless. Also,
modeling the top pre-dispatch stages is deceiving because, when multiple
hardware threads are active, those resources are shared among the threads. The
bypass definitions were mostly wrong, and so those have been removed. The
resulting itinerary is much simpler, and more accurate.

llvm-svn: 190562

f574c277

Enable MI scheduling (and CodeGen AA) by default for embedded PPC cores · 21442b24

Hal Finkel authored Sep 11, 2013

For embedded PPC cores (especially the A2 core), using the MI scheduler with AA
is far superior to the other scheduling options.

llvm-svn: 190558

21442b24

Sep 11, 2013

Use the appropriate return type for the compact unwind encoding. · 7b650a75
Bill Wendling authored Sep 11, 2013
```
llvm-svn: 190551
```
7b650a75