Commits · bbd10792c2deefec6625981d0fc08d2560abcb04 · Roger Ferrer / llvm-epi-0.8

Aug 30, 2012

Introduce 'UseSSEx' to force SSE legacy encoding · bbd10792

Michael Liao authored Aug 30, 2012

- Add 'UseSSEx' to keep SSE legacy insns from being selected when AVX is
  enabled.

  Because of the penalty for intermixing SSE and AVX instructions, we need
  to prevent SSE legacy insns from being generated unless they are
  explicitly requested through intrinsics. For patterns supported by both
  SSE and AVX, we have so far relied on AddedComplexity or on position in
  the td file to make sure the AVX insn is tried first. That is error-prone
  and introduces bugs accidentally.

  'UseSSEx' is disabled when AVX is turned on. For SSE insns inherited
  by AVX, we need this predicate to force VEX encoding or SSE legacy
  encoding only.

  For insns not inherited by AVX, we still use the previous predicates,
  i.e. 'HasSSEx'. So far, these insns fall into the following
  categories:
  * SSE insns with MMX operands
  * SSE insns with GPR/MEM operands only (xFENCE, PREFETCH, CLFLUSH,
    CRC, etc.)
  * SSE4A insns.
  * MMX insns.
  * x87 insns added by SSE.
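
A minimal sketch of how the two predicate families described above relate, written in Python for illustration only. The `Subtarget` class, its fields, and the helper names are hypothetical stand-ins rather than LLVM's X86 subtarget API; the only rule taken from the message is that 'UseSSEx' is disabled when AVX is turned on, while 'HasSSEx' keeps its previous meaning.

```python
from dataclasses import dataclass


@dataclass
class Subtarget:
    """Hypothetical stand-in for a subtarget description (not an LLVM class)."""
    sse_level: int  # highest SSE level available, e.g. 2 for SSE2
    has_avx: bool   # True when AVX is enabled


def has_sse(st: Subtarget, level: int) -> bool:
    """Models 'HasSSEx': the feature is available, regardless of AVX."""
    return st.sse_level >= level


def use_sse(st: Subtarget, level: int) -> bool:
    """Models 'UseSSEx': legacy SSE encoding is allowed only when AVX is off."""
    return has_sse(st, level) and not st.has_avx


sse_only = Subtarget(sse_level=2, has_avx=False)
with_avx = Subtarget(sse_level=2, has_avx=True)
```

Under this model, a pattern guarded by `use_sse` is never a candidate once AVX is on, so selection no longer depends on pattern ordering or AddedComplexity.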

2 test cases are modified:

 - test/CodeGen/X86/fast-isel-x86-64.ll
   AVX code generation differs from SSE. 'vcvtsi2sdq' cannot be selected
   by fast-isel because of its complicated pattern, so fast-isel falls
   back to materializing it from the constant pool.

 - test/CodeGen/X86/widen_load-1.ll
   AVX code generation differs from SSE after fixing SSE/AVX
   intermixing. Exec-domain fixing prefers 'vmovapd' over
   'vmovaps'.

llvm-svn: 162919

bbd10792

PPCISelLowering.cpp: Fix r162725. · ac49029f

NAKAMURA Takumi authored Aug 30, 2012

[Tobias von Koch] What's happening here is that the CR6SET/CR6UNSET is breaking the chain of register copies glued to the function call (BL_SVR4 node). The scheduler then moves other instructions in between those and the function call, which isn't good!

Right. That's the case where there is no chain of register copies before the call, so InFlag == 0... Attached is a new revision of the patch which should fix this for good.

llvm-svn: 162916

ac49029f

PPCISelLowering.cpp: Whitespace. · 8ad54e04
NAKAMURA Takumi authored Aug 30, 2012
```
llvm-svn: 162915
```
8ad54e04
Add support for moving pure S-register to NEON pipeline if desired · ca9f384f
Tim Northover authored Aug 30, 2012
```
llvm-svn: 162898
```
ca9f384f
Only perform DAG combine on FMAs of legal types. · e39ad7b5
Craig Topper authored Aug 30, 2012
```
llvm-svn: 162892
```
e39ad7b5

Fix PR13727 · 3c898064

Michael Liao authored Aug 30, 2012

- The root cause is that target constant materialization in X86 fast-isel
  creates PC-relative addressing, which may overflow the 32-bit range in a
  non-Small code model if the .rodata section is allocated too far away from
  the code segment in MCJIT, which so far uses the Large code model.
- Fix it with the same logic already used for non-Small code models in
  fast-isel: skip the case when the code model is not Small.
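
A small sketch of the range problem described above, in Python for illustration. The function name and the addresses are made up for the example; the only assumption is that a PC-relative operand is a signed 32-bit displacement measured from the address of the next instruction.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def fits_rel32(target_addr: int, next_insn_addr: int) -> bool:
    """Return True if the PC-relative displacement fits in a signed 32-bit field."""
    disp = target_addr - next_insn_addr
    return INT32_MIN <= disp <= INT32_MAX


# Illustrative addresses only.
code = 0x0000_0000_0040_1000
near_rodata = code + 0x2000      # close to the code: displacement fits
far_rodata = code + (1 << 32)    # more than 2 GiB away: displacement overflows
```

When the JIT is free to place .rodata anywhere in a 64-bit address space, nothing guarantees `fits_rel32` holds, which is why the PC-relative form is only safe under the Small code model.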

llvm-svn: 162881

3c898064

Aug 29, 2012

Rename hasVolatileMemoryRef() to hasOrderedMemoryRef(). · cea3e774

Jakob Stoklund Olesen authored Aug 29, 2012

Ordered memory operations are more constrained than volatile loads and
stores because they must be ordered with respect to all other memory
operations.

llvm-svn: 162861

cea3e774

Reserve space for the mandatory traceback fields on PPC64. · 1859d265

Hal Finkel authored Aug 29, 2012

We need to reserve space for the mandatory traceback fields,
though leaving them as zero is appropriate for now.

Although the ABI calls for these fields to be filled in fully, no
compiler on Linux currently does this, and GDB does not read these
fields.  GDB uses the first word of zeroes during exception handling to
find the end of the function and the size field, allowing it to compute
the beginning of the function.  DWARF information is used for everything
else.  We need the extra 8 bytes of pad so the size field is found in
the right place.

As a comparison, GCC fills in a few of the fields -- language, number
of saved registers -- but ignores the rest.  IBM's proprietary OSes do
make use of the full traceback table facility.

Patch by Bill Schmidt.

llvm-svn: 162854

1859d265

Refactor setExecutionDomain to be clearer about what it's doing and more robust. · 771f1607
Tim Northover authored Aug 29, 2012
```
llvm-svn: 162844
```
771f1607
Make helper function static. · 8f5c5ded
Benjamin Kramer authored Aug 29, 2012
```
llvm-svn: 162843
```
8f5c5ded

Make MemoryBuiltins aware of TargetLibraryInfo. · 8bcc9711

Benjamin Kramer authored Aug 29, 2012

This disables malloc-specific optimization when -fno-builtin (or -ffreestanding)
is specified. This has been a problem for a long time but became more severe
with the recent memory builtin improvements.

Since the memory builtin functions are used everywhere, this required passing
TLI in many places. This means that functions that now have an optional TLI
argument, like RecursivelyDeleteTriviallyDeadFunctions, won't remove dead
mallocs anymore if the TLI argument is missing. I've updated most passes to do
the right thing.
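
A rough sketch of the optional-argument behaviour described above, in Python for illustration. `LibraryInfo` and `is_removable_malloc` are hypothetical names, not the LLVM API; the sketch only models the stated rule that a missing TLI argument means dead mallocs are no longer removed.

```python
from typing import Optional, Set


class LibraryInfo:
    """Hypothetical stand-in for target library information."""

    def __init__(self, available: Set[str]):
        self.available = set(available)

    def has(self, name: str) -> bool:
        return name in self.available


def is_removable_malloc(callee: str, tli: Optional[LibraryInfo] = None) -> bool:
    """Treat a call as a removable allocation only if the library info says so.

    With no library info the answer is conservatively False, mirroring the
    behaviour the message describes for a missing TLI argument.
    """
    if tli is None:
        return False
    return callee == "malloc" and tli.has("malloc")


hosted = LibraryInfo({"malloc", "free"})
freestanding = LibraryInfo(set())  # models -fno-builtin / -ffreestanding
```

The conservative default is the design point: a caller that forgets to pass library information loses an optimization but never miscompiles a user-defined `malloc`.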

Fixes PR13694 and probably others.

llvm-svn: 162841

8bcc9711

Convert FMA4 patterns to use target specific nodes instead of intrinsics to align with FMA3. · a999c662
Craig Topper authored Aug 29, 2012
```
llvm-svn: 162829
```
a999c662
Cleanup sloppy code. Jakob's review. · b57e2257
Andrew Trick authored Aug 29, 2012
```
llvm-svn: 162825
```
b57e2257
[arm-fast-isel] Add support for ARM PIC. · e87e559e
Jush Lu authored Aug 29, 2012
```
llvm-svn: 162823
```
e87e559e

Fix ARM vector copies of overlapping register tuples. · bd0073dd

Andrew Trick authored Aug 29, 2012

I have tested the fix, but have not been successful in generating
a robust unit test. This can only be exposed through particular
register assignments.

llvm-svn: 162821

bd0073dd

cleanup · 4cc6949a
Andrew Trick authored Aug 29, 2012
```
llvm-svn: 162820
```
4cc6949a
Typo. · 3b1336ce
Chad Rosier authored Aug 28, 2012
```
llvm-svn: 162807
```
3b1336ce
Add comments on the literal value used. · 407d659f
Michael Liao authored Aug 28, 2012
```
llvm-svn: 162805
```
407d659f

Aug 28, 2012

The instruction DEXT may be transformed into DEXTU or DEXTM depending · cd6b0e13

Jack Carter authored Aug 28, 2012

on the size of the extraction and its position in the 64 bit word.

This patch allows support of the dext transformations with mips64 direct
object output.

DINS:  0 <= msb < 32, 0 <= lsb < 32, 0 <= pos < 32, 1 <= size <= 32
  The field is entirely contained in the right-most word of the doubleword

DINSM: 32 <= msb < 64, 0 <= lsb < 32, 0 <= pos < 32, 2 <= size <= 64
  The field straddles the words of the doubleword

DINSU: 32 <= msb < 64, 32 <= lsb < 64, 32 <= pos < 64, 1 <= size <= 32
  The field is entirely contained in the left-most word of the doubleword
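
The ranges listed above can be read as a selection rule. The sketch below, in Python for illustration, picks a variant from the field position and size. The function name is hypothetical, and it assumes `lsb = pos` and `msb = pos + size - 1`, which the message does not state explicitly.

```python
def select_dins_variant(pos: int, size: int) -> str:
    """Pick DINS, DINSM or DINSU from the field position and size.

    Assumes lsb is the lowest bit of the field and msb the highest,
    i.e. lsb = pos and msb = pos + size - 1.
    """
    if not (0 <= pos < 64 and size >= 1 and pos + size <= 64):
        raise ValueError("field must lie inside the 64-bit doubleword")
    lsb = pos
    msb = pos + size - 1
    if msb < 32:
        return "DINS"   # entirely in the right-most word
    if lsb < 32:
        return "DINSM"  # straddles both words
    return "DINSU"      # entirely in the left-most word
```

For example, `select_dins_variant(31, 2)` covers bits 31 and 32, so it straddles the two words and yields the DINSM form.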

llvm-svn: 162782

cd6b0e13

Explicitly update the number of nodes to be traversed · 710e1a59
Michael Liao authored Aug 28, 2012
```
llvm-svn: 162780
```
710e1a59

Some instructions are passed to the assembler to be · c20a21b8

Jack Carter authored Aug 28, 2012

transformed to the final instruction variant. An
example would be dsrll which is transformed into 
dsll32 if the shift value is greater than 32.

For direct object output we need to do this transformation
in the codegen. If the instruction was inside branch
delay slot, it was being missed. This patch corrects this
oversight.
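
A sketch of the kind of rewrite described above, in Python for illustration, using the doubleword shift-left pair as the example. The function name is hypothetical. It assumes the shift amount is encoded in a 5-bit field and that the `32` variant adds 32 to the encoded amount, so the sketch switches variants at 32 and above.

```python
from typing import Tuple


def lower_dsll(shamt: int) -> Tuple[str, int]:
    """Return the (mnemonic, encoded shift amount) pair for a 64-bit left shift."""
    if not 0 <= shamt < 64:
        raise ValueError("shift amount must be in 0..63")
    if shamt < 32:
        return ("dsll", shamt)        # fits the 5-bit field directly
    return ("dsll32", shamt - 32)     # the '32' variant adds 32 back
```

With direct object output there is no assembler pass to do this, so the same mapping has to be applied to every emitted instruction, including one sitting in a branch delay slot.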

llvm-svn: 162779

c20a21b8

Emit word of zeroes after the last instruction as a start of the mandatory · 8c4b6a30

Roman Divacky authored Aug 28, 2012

traceback table on PowerPC64. This helps gdb handle exceptions. The other
mandatory fields are ignored by gdb and harder to implement, so just add
a FIXME there.

Patch by Bill Schmidt. PR13641.

llvm-svn: 162778

8c4b6a30

Follow-up patch to r162731. · 206cefe6

Akira Hatanaka authored Aug 28, 2012

Fix a couple of bugs in mips' long branch pass.
This patch was supposed to be committed along with r162731, so I don't have a
new test case.

llvm-svn: 162777

206cefe6

Add PPC Freescale e500mc and e5500 subtargets. · 742b535e

Hal Finkel authored Aug 28, 2012

Add subtargets for Freescale e500mc (32-bit) and e5500 (64-bit) to
the PowerPC backend.

Patch by Tobias von Koch.

llvm-svn: 162764

742b535e

The commutative flag is already correctly set within the multiclass. If we set · cc567180
Bill Wendling authored Aug 28, 2012
```
it here, then a 'register-memory' version would wrongly get the commutative
flag.
<rdar://problem/12180135>

llvm-svn: 162741
```
cc567180
Convert V_SETALLONES/AVX_SETALLONES/AVX2_SETALLONES to Post-RA pseudos. · 72f51c39
Craig Topper authored Aug 28, 2012
```
llvm-svn: 162740
```
72f51c39
Merge AVX_SET0PSY/AVX_SET0PDY/AVX2_SET0 into a single post-RA pseudo. · bd509eea
Craig Topper authored Aug 28, 2012
```
llvm-svn: 162738
```
bd509eea

Fix PR12312 · b7d85b63

Michael Liao authored Aug 28, 2012

- Add a target-specific DAG optimization to recognize a pattern PTEST-able.
  Such a pattern is a OR'd tree with X86ISD::OR as the root node. When
  X86ISD::OR node has only its flag result being used as a boolean value and
  all its leaves are extracted from the same vector, it could be folded into an
  X86ISD::PTEST node.
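
A sketch of why the fold is sound, in Python for illustration. Lanes are modelled as plain integers and the function names are hypothetical. It relies on PTEST setting the zero flag when the bitwise AND of its two operands is all zero, so testing a vector against itself asks whether every lane is zero.

```python
from functools import reduce
from typing import Sequence


def or_tree_is_zero(lanes: Sequence[int]) -> bool:
    """Scalar form: extract every lane, OR them together, test the result."""
    return reduce(lambda a, b: a | b, lanes, 0) == 0


def ptest_zf(a: Sequence[int], b: Sequence[int]) -> bool:
    """Model of the zero flag after PTEST a, b: set when (a AND b) is all zero."""
    return all((x & y) == 0 for x, y in zip(a, b))


def folded_is_zero(lanes: Sequence[int]) -> bool:
    """Folded form: a single PTEST of the vector against itself."""
    return ptest_zf(lanes, lanes)
```

Both forms agree on every input, because an OR of lanes is zero exactly when each lane is zero.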

llvm-svn: 162735

b7d85b63

Revert r162713: "Add ATOMIC_LDR* pseudo-instructions to model atomic_load on ARM." · b3de7b17

Jakob Stoklund Olesen authored Aug 28, 2012

This wasn't the right way to enforce ordering of atomics.

We are already setting the isVolatile bit on memory operands of atomic
operations which is good enough to enforce the correct ordering.

llvm-svn: 162732

b3de7b17

Fix mips' long branch pass. · b5af7121

Akira Hatanaka authored Aug 28, 2012

Instructions emitted to compute branch offsets now use immediate operands
instead of symbolic labels. This change was needed because there were problems
when R_MIPS_HI16/LO16 relocations were used to make shared objects.

llvm-svn: 162731

b5af7121

Split several PPC instruction classes. · 679c73cb

Hal Finkel authored Aug 28, 2012

Slight reorganisation of PPC instruction classes for scheduling. No
functionality change for existing subtargets.
 - Clearly separate load/store-with-update instructions from regular loads and stores.
 - Split IntRotateD -> IntRotateD and IntRotateDI
 - Split out fsub and fadd from FPGeneral -> FPAddSub
 - Update existing itineraries

Patch by Tobias von Koch.

llvm-svn: 162729

679c73cb

Allow remat of LI on PPC. · 686f2ee2

Hal Finkel authored Aug 28, 2012

Allow load-immediates to be rematerialised in the register coalescer for
PPC. This makes test/CodeGen/PowerPC/big-endian-formal-args.ll fail,
because it relies on a register move getting emitted. The immediate load is
equivalent, so change this test case.

Patch by Tobias von Koch.

llvm-svn: 162727

686f2ee2

Eliminate redundant CR moves on PPC32. · 5ab37803

Hal Finkel authored Aug 28, 2012

The 32-bit ABI requires CR bit 6 to be set if the call has fp arguments and
unset if it doesn't. The solution up to now was to insert a MachineNode to
set/unset the CR bit, which produces a CR vreg. This vreg was then copied
into CR bit 6. When the register allocator saw a bunch of these in the same
function, it allocated the set/unset CR bit in some random CR register (1
extra instruction) and then emitted CR moves before every vararg function
call, rather than just setting and unsetting CR bit 6 directly before every
vararg function call. This patch instead inserts a PPCcrset/PPCcrunset
instruction, which is then matched by a dedicated instruction pattern.
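
A sketch of the per-call decision described above, in Python for illustration. The function name is hypothetical, and the sketch assumes only vararg calls need the bit set up. The returned strings use the base instructions behind the `crset` and `crclr` extended mnemonics.

```python
from typing import Optional


def cr6_instruction(is_vararg: bool, has_fp_args: bool) -> Optional[str]:
    """Return the instruction to place directly before a call, or None."""
    if not is_vararg:
        return None  # assumption: non-vararg calls need no CR bit 6 setup
    # crset 6 is the extended mnemonic for creqv 6, 6, 6 (bit := 1);
    # crclr 6 is the extended mnemonic for crxor 6, 6, 6 (bit := 0).
    return "creqv 6, 6, 6" if has_fp_args else "crxor 6, 6, 6"
```

Writing the bit directly this way takes one instruction per call, instead of materializing the value in another CR field and moving it.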

Patch by Tobias von Koch.

llvm-svn: 162725

5ab37803

Optimize zext on PPC64. · e39526a7

Hal Finkel authored Aug 28, 2012

The zeroextend IR instruction is lowered to an 'and' node with an immediate
mask operand, which in turn gets legalised to a sequence of ori's & ands.
This can be done more efficiently using the rldicl instruction.
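
A sketch of what rldicl computes, in Python for illustration, to show why one instruction can replace the mask sequence. The helper names are hypothetical. It uses the PowerPC bit numbering where bit 0 is the most significant bit, and it assumes a zero extension from 32 bits corresponds to a rotate of 0 with mask begin 32.

```python
MASK64 = (1 << 64) - 1


def rotl64(x: int, sh: int) -> int:
    """Rotate a 64-bit value left by sh bits."""
    sh %= 64
    return ((x << sh) | (x >> (64 - sh))) & MASK64


def rldicl(rs: int, sh: int, mb: int) -> int:
    """Rotate left doubleword immediate, then clear left.

    Bits 0..mb-1 (bit 0 being the most significant) are cleared,
    so the low 64 - mb bits of the rotated value survive.
    """
    if not (0 <= sh < 64 and 0 <= mb < 64):
        raise ValueError("sh and mb must be in 0..63")
    mask = (1 << (64 - mb)) - 1
    return rotl64(rs & MASK64, sh) & mask


def zext32(x: int) -> int:
    """Zero-extend the low 32 bits of x to 64 bits."""
    return rldicl(x, 0, 32)
```

The same helper covers narrower sources by changing the mask begin, for example `rldicl(x, 0, 48)` keeps only the low 16 bits.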

Patch by Tobias von Koch.

llvm-svn: 162724

e39526a7

More missing mayLoad flags on AVX multiclasses. · 89d6b29d
Jakob Stoklund Olesen authored Aug 28, 2012
```
llvm-svn: 162714
```
89d6b29d

Add ATOMIC_LDR* pseudo-instructions to model atomic_load on ARM. · b24cb8c5

Jakob Stoklund Olesen authored Aug 27, 2012

It is not safe to use normal LDR instructions because they may be
reordered by the scheduler. The ATOMIC_LDR pseudos have a mayStore flag
that prevents reordering.

Atomic loads are also prevented from participating in rematerialization
and load folding.

llvm-svn: 162713

b24cb8c5

Make sure we add the predicate after all of the registers are added. · 988a47d7
Bill Wendling authored Aug 27, 2012
```
<rdar://problem/12183003>

llvm-svn: 162703
```
988a47d7

Aug 27, 2012
- Remove MMX shift intrinsic handling code that also exists in SelectionDAGBuilder. · a737ef89
  Craig Topper authored Aug 27, 2012
```
llvm-svn: 162661
```
  a737ef89
- Don't allow vextractf128 to be folded with unaligned stores. We don't fold... · 5af2fed5
  Craig Topper authored Aug 27, 2012
```
Don't allow vextractf128 to be folded with unaligned stores. We don't fold unaligned loads so shouldn't fold unaligned stores as it can cause an alignment fault to occur.

llvm-svn: 162658
```
  5af2fed5
- Fold some patterns into instruction definitions so tablegen can infer flags... · 6d44554c
  Craig Topper authored Aug 27, 2012
```
Fold some patterns into instruction definitions so tablegen can infer flags, removing the need for an explicit 'neverHasSideEffects = 1'

llvm-svn: 162656
```
  6d44554c