Commits · c1d1a4816eb11dcdd205e0916801e62b97c81f05 · Roger Ferrer / llvm-epi-0.8

Apr 02, 2013

Add 64-bit shift instructions. · c1d1a481

Jakob Stoklund Olesen authored Apr 02, 2013

SPARC v9 defines new 64-bit shift instructions. The 32-bit shift right
instructions are still usable as zero and sign extensions.
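
As a rough illustration of the extension behaviour, the sketch below models the 32-bit SRL and SRA forms acting on a 64-bit register value. The function names srl32 and sra32 are made up for this example and are not LLVM or SPARC identifiers. A shift count of 0 yields a plain zero or sign extension of the low word.

```cpp
#include <cstdint>

// Toy model (hypothetical helper names): 32-bit shift right on a 64-bit
// register. Only the low 5 bits of the count are used.

// SRL: shift the low 32 bits right and clear bits 63:32.
constexpr std::uint64_t srl32(std::uint64_t reg, unsigned count) {
  return static_cast<std::uint32_t>(reg) >> (count & 31);
}

// SRA: shift the low 32 bits right and fill everything above with bit 31.
constexpr std::uint64_t sra32(std::uint64_t reg, unsigned count) {
  const std::uint64_t low = reg & 0xFFFFFFFFull;
  // Sign-extend bit 31 into bits 63:32 using unsigned arithmetic only.
  const std::uint64_t ext = (low ^ 0x80000000ull) - 0x80000000ull;
  const unsigned n = count & 31;
  const std::uint64_t sign = (ext >> 63) ? ~0ull : 0ull;
  // Logical shift, then refill the vacated top bits with the sign.
  return n == 0 ? ext : (ext >> n) | (sign << (64 - n));
}

// Count 0 acts as a zero extension (SRL) or sign extension (SRA).
static_assert(srl32(0xFFFFFFFF80000000ull, 0) == 0x0000000080000000ull, "");
static_assert(sra32(0x0000000080000000ull, 0) == 0xFFFFFFFF80000000ull, "");
```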

This adds new F3_Sr and F3_Si instruction formats that probably should
be used for the 32-bit shifts as well. They don't really encode an
simm13 field.

llvm-svn: 178525

c1d1a481

Add predicates for distinguishing 32-bit and 64-bit modes. · 739d722e

Jakob Stoklund Olesen authored Apr 02, 2013

The 'sparc' architecture produces 32-bit code while 'sparcv9' produces
64-bit code.

It is also possible to run 32-bit code using SPARC v9 instructions with:

  llc -march=sparc -mattr=+v9

llvm-svn: 178524

739d722e

Add support for 64-bit calling convention. · 0b21f35a

Jakob Stoklund Olesen authored Apr 02, 2013

This is far from complete, but it is enough to make it possible to write
test cases using i64 arguments.

Missing features:
- Floating point arguments.
- Receiving arguments on the stack.
- Calls.

llvm-svn: 178523

0b21f35a

Add an I64Regs register class for 64-bit registers. · 5ad3b353

Jakob Stoklund Olesen authored Apr 02, 2013

We are going to use the same registers for 32-bit and 64-bit values, but
in two different register classes. The I64Regs register class has a
larger spill size and alignment.

The addition of an i64 register class confuses TableGen's type
inference, so it is necessary to clarify the type of some immediates and
the G0 register.

In 64-bit mode, pointers are i64 and should use the I64Regs register
class. Implement getPointerRegClass() to dynamically provide the pointer
register class depending on the subtarget. Use ptr_rc and iPTR for
memory operands.
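
The subtarget-dependent choice can be pictured with the standalone sketch below. The Subtarget and RegClass types here are simplified stand-ins invented for this example, not the LLVM classes, and the 4-byte and 8-byte spill sizes are assumptions that only illustrate "larger spill size".

```cpp
// Hypothetical stand-ins for the real LLVM types.
enum class RegClass { IntRegs, I64Regs };

struct Subtarget {
  bool is64Bit;
};

// Pointers live in I64Regs in 64-bit mode and in IntRegs otherwise.
constexpr RegClass getPointerRegClass(const Subtarget &st) {
  return st.is64Bit ? RegClass::I64Regs : RegClass::IntRegs;
}

// Assumed spill sizes: the same physical register spills more bytes
// when it is allocated from the 64-bit class.
constexpr unsigned spillSizeBytes(RegClass rc) {
  return rc == RegClass::I64Regs ? 8 : 4;
}

static_assert(getPointerRegClass(Subtarget{false}) == RegClass::IntRegs, "");
static_assert(spillSizeBytes(getPointerRegClass(Subtarget{true})) == 8, "");
```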

Finally, add the i64 type to the IntRegs register class. This register
class is not used to hold i64 values, I64Regs is for that. The type is
required to appease TableGen's type checking in output patterns like this:

  def : Pat<(add i64:$a, i64:$b), (ADDrr $a, $b)>;

SPARC v9 uses the same ADDrr instruction for i32 and i64 additions, and
TableGen doesn't know to check the type of register sub-classes.

llvm-svn: 178522

5ad3b353

Fix typo in PPCISelLowering · 93d75ea0

Hal Finkel authored Apr 02, 2013

Thanks to Bill Schmidt for finding this in review of r178480.

llvm-svn: 178521

93d75ea0

The divide unit is not pipelined, but it is still buffered. · e1d88cfb

Andrew Trick authored Apr 02, 2013

Buffered means a later divide may be executed out-of-order while a
prior divide is sitting (buffered) in a reservation station.

You can tell it's not pipelined, because operations that use it
reserve it for more than one cycle:

def : WriteRes<WriteIDiv, [HWPort0, HWDivider]> {
  let Latency = 25;
  let ResourceCycles = [1, 10];
}

We don't currently distinguish between an unpipelined operation and one
that is split into multiple micro-ops requiring the same unit, except
that the latter may have NumMicroOps > 1 if it also consumes
issue/dispatch resources.
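
To make the effect of ResourceCycles concrete, here is a toy timing model. It is not the LLVM scheduler: it assumes a stream of independent divides, at most one issue per cycle, and a single divider that stays reserved for a fixed number of cycles. The names DivTiming and scheduleIndependentDivides are invented for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct DivTiming {
  unsigned start; // cycle in which the divide begins executing
  unsigned done;  // cycle in which its result is available
};

// Toy model: the i-th divide can issue no earlier than cycle i, and it must
// also wait until the divider is free again.
std::vector<DivTiming> scheduleIndependentDivides(unsigned count,
                                                  unsigned latency,
                                                  unsigned dividerCycles) {
  assert(dividerCycles >= 1 && "a unit is reserved for at least one cycle");
  std::vector<DivTiming> out;
  unsigned dividerFreeAt = 0;
  for (unsigned i = 0; i < count; ++i) {
    const unsigned start = std::max(i, dividerFreeAt);
    out.push_back({start, start + latency});
    dividerFreeAt = start + dividerCycles;
  }
  return out;
}
```

With latency 25 and a 10-cycle reservation, three divides start at cycles 0, 10 and 20 in this model. A fully pipelined unit (reservation of 1 cycle) would start them at cycles 0, 1 and 2.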

llvm-svn: 178519

e1d88cfb

Target/R600: Fix CMake build to add missing files. · fd98f7f2
NAKAMURA Takumi authored Apr 01, 2013
```
llvm-svn: 178508
```
fd98f7f2

Apr 01, 2013

Mips direct object exception handling regression · 9423f507

Jack Carter authored Apr 01, 2013

Revision 177141 caused a regression in all but
mips64 little endian. That is because none of the
other Mips targets had test cases checking the
contents of the .eh_frame section. This patch both
fixes the llvm code and adds an assembler test case
covering the current 4 flavors.

The test cases unfortunately rely on llvm-objdump. A
preferable method would be to use a pretty printer output
such as what readelf -wf <elf_file> would give.

I also changed the name of the test case to correct a typo.

llvm-svn: 178506

9423f507

R600: Add support for native control flow · bfaa63a6
Vincent Lejeune authored Apr 01, 2013
```
llvm-svn: 178505
```
bfaa63a6
R600/SI: Share code recording ShaderTypeAttribute between generations · ace6f735
Vincent Lejeune authored Apr 01, 2013
```
llvm-svn: 178504
```
ace6f735
R600: Emit CF_ALU and use true kcache register. · f43bc57b
Vincent Lejeune authored Apr 01, 2013
```
llvm-svn: 178503
```
f43bc57b
Fix top-comment header and some indentation · e60fc2f6
Eli Bendersky authored Apr 01, 2013
```
llvm-svn: 178492
```
e60fc2f6
Fix a bad assert in PPCTargetLowering · 3f88d089
Hal Finkel authored Apr 01, 2013
```
llvm-svn: 178489
```
3f88d089
Correct assertion condition · 6662fd0f
Shuxin Yang authored Apr 01, 2013
```
llvm-svn: 178484
```
6662fd0f

Merge load/store sequences with addresses: base + index + offset · 6752366e

Arnold Schwaighofer authored Apr 01, 2013

We would also like to merge sequences that involve a variable index like in the
example below.

    int index = *idx++;
    int i0 = c[index+0];
    int i1 = c[index+1];
    b[0] = i0;
    b[1] = i1;

By extending the parsing of the base pointer to handle dags that contain a
base, index, and offset we can handle examples like the one above.

The dag for the code above will look something like:

 (load (i64 add (i64 copyfromreg %c)
                (i64 signextend (i8 load %index))))

 (load (i64 add (i64 copyfromreg %c)
                (i64 signextend (i32 add (i32 signextend (i8 load %index))
                                         (i32 1)))))

The code that parses the tree ignores the intermediate sign extensions. However,
if there is a sign extension it needs to be on all indexes.

 (load (i64 add (i64 copyfromreg %c)
                (i64 signextend (add (i8 load %index)
                                     (i8 1))))

 vs

 (load (i64 add (i64 copyfromreg %c)
                (i64 signextend (i32 add (i32 signextend (i8 load %index))
                                         (i32 1)))))
radar://13536387
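
A minimal sketch of the matching idea, assuming each address has already been decomposed into a base, an index and a byte offset. The struct layout, the integer node ids and the helper names are hypothetical and only loosely follow the description above. They are not the DAGCombiner code.

```cpp
#include <cstdint>

// Hypothetical decomposition of an address: base + index + offset.
struct BaseIndexOffset {
  int base;                // id of the base node, e.g. %c
  int index;               // id of the variable index node, e.g. %index
  std::int64_t offset;     // constant byte offset
  bool indexSignExtended;  // whether the index went through a sign extension
};

// Two addresses are comparable only if base and index agree and the sign
// extension is present on both indexes or on neither.
constexpr bool sameBaseAndIndex(const BaseIndexOffset &a,
                                const BaseIndexOffset &b) {
  return a.base == b.base && a.index == b.index &&
         a.indexSignExtended == b.indexSignExtended;
}

// The second access directly follows the first if the offsets differ by
// exactly one element size.
constexpr bool isConsecutive(const BaseIndexOffset &first,
                             const BaseIndexOffset &second,
                             std::int64_t elemSizeBytes) {
  return sameBaseAndIndex(first, second) &&
         second.offset - first.offset == elemSizeBytes;
}
```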

llvm-svn: 178483

6752366e

Add more PPC floating-point conversion instructions · f6d45f23

Hal Finkel authored Apr 01, 2013

The P7 and A2 have additional floating-point conversion instructions which
allow a direct two-instruction sequence (plus load/store) to convert from all
combinations (signed/unsigned i32/i64) <--> (float/double) (on previous cores,
only some combinations were directly available).

llvm-svn: 178480

f6d45f23

Use ImmToIdxMap.count in PPCRegisterInfo · 39caf9f5

Hal Finkel authored Apr 01, 2013

Code improvement suggested by Jakob (in review of r178450). No functionality
change intended.

llvm-svn: 178473

39caf9f5

Add the PPC popcntw instruction · 290376dd

Hal Finkel authored Apr 01, 2013

The popcntw instruction is available whenever the popcntd instruction is
available, and performs a separate popcnt on the lower and upper 32 bits.
Ignoring the high-order count, this can be used for the 32-bit input case
(saving on the explicit zero extension otherwise required to use popcntd).
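
The difference between the two instructions can be modelled in a few lines. The functions below are plain software emulations written for this note, with the mnemonics reused as function names for readability. They show why the low word of the popcntw result is already the 32-bit answer even when the upper half of the register holds unrelated bits.

```cpp
#include <cstdint>

// Portable bit count (clears the lowest set bit on each iteration).
constexpr unsigned popcount64(std::uint64_t v) {
  unsigned n = 0;
  while (v) {
    v &= v - 1;
    ++n;
  }
  return n;
}

// Emulated popcntd: one count over all 64 bits.
constexpr std::uint64_t popcntd(std::uint64_t v) { return popcount64(v); }

// Emulated popcntw: an independent count per 32-bit word, each result placed
// in the corresponding word of the destination.
constexpr std::uint64_t popcntw(std::uint64_t v) {
  return (static_cast<std::uint64_t>(popcount64(v >> 32)) << 32) |
         popcount64(v & 0xFFFFFFFFull);
}

// 32-bit popcount: keep the low word of popcntw, ignore the high-order count.
constexpr unsigned popcount32ViaPopcntw(std::uint64_t reg) {
  return static_cast<unsigned>(popcntw(reg) & 0xFFFFFFFFull);
}
```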

llvm-svn: 178470

290376dd

Add support for vector data types in the LLVM interpreter. · be79a7ac
Nadav Rotem authored Apr 01, 2013
```
Patch by:
Veselov, Yuri <Yuri.Veselov@intel.com>

llvm-svn: 178469
```
be79a7ac

Treat PPCISD::STFIWX like the memory opcode that it is · 60c75107

Hal Finkel authored Apr 01, 2013

PPCISD::STFIWX is really a memory opcode, and so it should come after
FIRST_TARGET_MEMORY_OPCODE, and we should use DAG.getMemIntrinsicNode to create
nodes using it.

No functionality change intended (although there could be optimization benefits
from preserving the MMO information).

llvm-svn: 178468

60c75107

Remove unused typedef. · fee96f83
Duncan Sands authored Apr 01, 2013
```
llvm-svn: 178462
```
fee96f83

ARM Scheduler Model: Add resources instructions, map resources in subtargets · 6793aebb

Arnold Schwaighofer authored Apr 01, 2013

Reapply r177968:
After commit 178074 we can now have undefined scheduler variants.

Move the CortexA9 resources into the CortexA9 SchedModel namespace. Define
resource mappings under the CortexA9 SchedModel. Define resources and mappings
for the SwiftModel.

Incorporate Andrew's feedback.

llvm-svn: 178460

6793aebb

X86TTI: Add accurate costs for itofp operations, based on the actual instruction counts. · 52ceb443
Benjamin Kramer authored Apr 01, 2013
```
llvm-svn: 178459
```
52ceb443
Whitespace cleanup · bc6f4bae
Joe Abbey authored Apr 01, 2013
```
llvm-svn: 178454
```
bc6f4bae

Mar 31, 2013

R600: Emit native instructions for tex · 53f3525d
Vincent Lejeune authored Mar 31, 2013
```
llvm-svn: 178452
```
53f3525d
There is no longer any need to silence this compiler warning as the warning has · e1aa194a
Duncan Sands authored Mar 31, 2013
```
been turned off globally.

llvm-svn: 178451
```
e1aa194a

Cleanup ImmToIdxMap and noImmForm in PPCRegisterInfo · 8540f777

Hal Finkel authored Mar 31, 2013

ImmToIdxMap should be a DenseMap (not a std::map) because there
is no ordering requirement. Also, we don't need a separate list
of instructions for noImmForm in eliminateFrameIndex, because this
list is essentially the complement of the keys in ImmToIdxMap.
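
The shape of the cleanup, sketched outside of LLVM: std::unordered_map stands in for DenseMap so the example compiles on its own, and the small opcode enum is a hypothetical subset chosen for illustration. noImmForm is derived from the map instead of being kept as a second list.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical opcode subset: immediate-form instructions and their
// register-indexed counterparts, plus one indexed-only instruction (LVX).
enum Opcode : unsigned { LWZ, LWZX, STW, STWX, ADDI, ADD, LVX };

// Unordered hash map from immediate form to indexed form. No ordering is
// needed, only lookup by key.
const std::unordered_map<unsigned, unsigned> &immToIdxMap() {
  static const std::unordered_map<unsigned, unsigned> map = {
      {LWZ, LWZX}, {STW, STWX}, {ADDI, ADD}};
  return map;
}

// An opcode has no immediate form exactly when it is not a key in the map.
bool noImmForm(unsigned opcode) { return immToIdxMap().count(opcode) == 0; }

// Look up the indexed replacement for an immediate-form opcode.
unsigned indexedForm(unsigned opcode) {
  assert(!noImmForm(opcode) && "opcode has no immediate form to replace");
  return immToIdxMap().at(opcode);
}
```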

No functionality change intended.

llvm-svn: 178450

8540f777

X86: Promote sitofp <8 x i16> to <8 x i32> when AVX is available. · b60633fb
Benjamin Kramer authored Mar 31, 2013
```
A vector sext + sitofp is a lot cheaper than 8 scalar conversions.

llvm-svn: 178448
```
b60633fb

Add the PPC lfiwax instruction · beb296be

Hal Finkel authored Mar 31, 2013

This instruction is available on modern PPC64 CPUs, and is now used
to improve the SINT_TO_FP lowering (by eliminating the need for the
separate sign extension instruction and decreasing the amount of
needed stack space).

llvm-svn: 178446

beb296be

Cleanup PPC(64) i32 -> float/double conversion · e53429a1

Hal Finkel authored Mar 31, 2013

The existing SINT_TO_FP code for i32 -> float/double conversion was disabled
because it relied on broken EXTSW_32/STD_32 instruction definitions. The
original intent had been to enable these 64-bit instructions to be used on CPUs
that support them even in 32-bit mode.  Unfortunately, this form of lying to
the infrastructure was buggy (as explained in the FIXME comment) and had
therefore been disabled.

This re-enables this functionality, using regular DAG nodes, but only when
compiling in 64-bit mode. The old STD_32/EXTSW_32 definitions (which were dead)
are removed.

llvm-svn: 178438

e53429a1

Mar 30, 2013

DAGCombine: visitXOR can replace a node without returning it, bail out in that case. · 93354432
Benjamin Kramer authored Mar 30, 2013
```
Fixes the crash reported in PR15608.

llvm-svn: 178429
```
93354432

Change '@SECREL' suffix to GAS-compatible '@SECREL32'. · 9c9e0a2c

Benjamin Kramer authored Mar 30, 2013

'@SECREL' is what is used by the Microsoft assembler, but GNU as expects '@SECREL32'.
With the patch, the MC-generated code works fine in combination with a recent GNU as (2.23.51.20120920 here).

Patch by David Nadlinger!
Differential Revision: http://llvm-reviews.chandlerc.com/D429

llvm-svn: 178427

9c9e0a2c

Put private class into an anonymous namespace. · a73cc5ee
Benjamin Kramer authored Mar 30, 2013
```
llvm-svn: 178420
```
a73cc5ee
[NVPTX] Remove support for SM < 2.0. This was never fully supported anyway. · 59fd8ba5
Justin Holewinski authored Mar 30, 2013
```
llvm-svn: 178417
```
59fd8ba5

[NVPTX] Add NVVMReflect pass to allow compile-time selection of · b94bd05b

Justin Holewinski authored Mar 30, 2013

specific code paths.

This allows us to write code like:

  if (__nvvm_reflect("FOO"))
    // Do something
  else
    // Do something else

and compile into a library, then give "FOO" a value at kernel
compile-time so the check becomes a no-op.
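
Conceptually the substitution behaves like a table lookup that is resolved before the code runs. The host-side sketch below only mimics that idea: reflectValue and selectPath are invented names, and treating a missing name as 0 is an assumption of this sketch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model of the reflect table: name -> value chosen at kernel
// compile time. Names that are absent read as 0 in this sketch.
int reflectValue(const std::map<std::string, int> &settings,
                 const std::string &key) {
  assert(!key.empty() && "reflect queries are made with a named key");
  const auto it = settings.find(key);
  return it == settings.end() ? 0 : it->second;
}

// Mirrors the if/else above: returns 1 for the "do something" path and 2 for
// the "do something else" path. Once the value is a known constant, only one
// branch can ever be taken.
int selectPath(const std::map<std::string, int> &settings) {
  return reflectValue(settings, "FOO") ? 1 : 2;
}
```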

llvm-svn: 178416

b94bd05b

[NVPTX] Run clang-format on all NVPTX sources. · 0497ab14

Justin Holewinski authored Mar 30, 2013

Hopefully this resolves any outstanding style issues and gives us
an automated way of ensuring we conform to the style guidelines.

llvm-svn: 178415

0497ab14

Implement XOR reassociation. It is based on following rules: · 7b0c94e2

Shuxin Yang authored Mar 30, 2013

  rule 1: (x | c1) ^ c2 => (x & ~c1) ^ (c1^c2),
     only useful when c1=c2
  rule 2: (x & c1) ^ (x & c2) = (x & (c1^c2))
  rule 3: (x | c1) ^ (x | c2) = (x & c3) ^ c3 where c3 = c1 ^ c2
  rule 4: (x | c1) ^ (x & c2) => (x & c3) ^ c1, where c3 = ~c1 ^ c2
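
The four identities can be checked by brute force. The sketch below is a standalone test written for this note, not part of the patch. It tries every combination of 8-bit x, c1 and c2 and reports whether all rules hold.

```cpp
#include <cassert>

// Exhaustively verify the four rewrite rules on 8-bit values.
bool xorReassociationRulesHold() {
  const unsigned mask = 0xFFu;
  for (unsigned x = 0; x <= mask; ++x) {
    for (unsigned c1 = 0; c1 <= mask; ++c1) {
      for (unsigned c2 = 0; c2 <= mask; ++c2) {
        const unsigned notC1 = ~c1 & mask; // ~c1 restricted to 8 bits
        const unsigned c3 = c1 ^ c2;       // constant used by rules 2 and 3
        const unsigned c4 = notC1 ^ c2;    // constant used by rule 4
        assert(c3 <= mask && c4 <= mask);

        // rule 1: (x | c1) ^ c2 == (x & ~c1) ^ (c1 ^ c2)
        if (((x | c1) ^ c2) != ((x & notC1) ^ (c1 ^ c2))) return false;
        // rule 2: (x & c1) ^ (x & c2) == x & (c1 ^ c2)
        if (((x & c1) ^ (x & c2)) != (x & c3)) return false;
        // rule 3: (x | c1) ^ (x | c2) == (x & c3) ^ c3
        if (((x | c1) ^ (x | c2)) != ((x & c3) ^ c3)) return false;
        // rule 4: (x | c1) ^ (x & c2) == (x & c4) ^ c1
        if (((x | c1) ^ (x & c2)) != ((x & c4) ^ c1)) return false;
      }
    }
  }
  return true;
}
```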

 It reduces an application's size (in terms of # of instructions) by 8.9%.
 Reviewed by Pete Cooper. Thanks a lot!

 rdar://13212115  

llvm-svn: 178409

7b0c94e2

[mips] Add patterns for DSP indexed load instructions. · b3c1847b
Akira Hatanaka authored Mar 30, 2013
```
llvm-svn: 178408
```
b3c1847b
[mips] Define reg+imm load/store pattern templates. · b1457304
Akira Hatanaka authored Mar 30, 2013
```
llvm-svn: 178407
```
b1457304

[mips] Fix DSP instructions to have explicit accumulator register operands. · fb221c19

Akira Hatanaka authored Mar 30, 2013

Check that instruction selection can select multiply-add/sub DSP instructions
from a pattern that doesn't have intrinsics.

llvm-svn: 178406

fb221c19