Commits · caf3d89ff5e6b93bd5dc4d845f9e26e70f121754 · Roger Ferrer / llvm-epi-0.8

Mar 19, 2013
- Annotate a lot of X86InstrInfo.td with SchedRW lists. · caf3d89f
  Jakob Stoklund Olesen authored Mar 19, 2013
```
llvm-svn: 177417
```
- [ms-inline asm] Move the size directive asm rewrite into the target specific · 120eefd1
  Chad Rosier authored Mar 19, 2013
```
logic as a QOI cleanup.
rdar://13445327

llvm-svn: 177413
```
- [ms-inline asm] Avoid emitting a redundant sizing directive, if we've already · 2707d534
  Chad Rosier authored Mar 18, 2013
```
parsed one.  Test case coming shortly.
rdar://13446980

llvm-svn: 177347
```
Mar 18, 2013

Add SchedRW annotations to most of X86InstrSSE.td. · a5158c8f

Jakob Stoklund Olesen authored Mar 18, 2013

We hitch a ride with the existing OpndItins class that was used to add
instruction itinerary classes in the many multiclasses in this file.

Use the link provided by the X86FoldableSchedWrite.Folded to find the
right SchedWrite for folded loads.

llvm-svn: 177326


Annotate X86 arithmetic instructions with SchedRW lists. · e2289b78

Jakob Stoklund Olesen authored Mar 18, 2013

This new-style scheduling information is going to replace the
instruction itineraries.

This also serves as a test case for Andy's fix in r177317.

llvm-svn: 177323


TLS support for MinGW targets. · 3e7005f1

Anton Korobeynikov authored Mar 18, 2013

MinGW is almost completely compatible with MSVC, except that the _tls_array global is not available.

Patch by David Nadlinger!

llvm-svn: 177257


Post process ADC/SBB and use a shorter encoding if they use a sign extended immediate. · 0498b88d
Craig Topper authored Mar 18, 2013
```
llvm-svn: 177243
```
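
The commit above has no body, so the following is only a sketch of the general x86 rule it relies on: the arithmetic group has a short form (opcode 0x83) whose immediate is a single sign-extended byte, next to the full-width form (opcode 0x81). The helper names are hypothetical and a 32-bit operand size is assumed; this is not the code from the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when `imm` survives a round trip through a sign-extended byte. */
bool fits_in_simm8(int32_t imm)
{
    return imm >= INT8_MIN && imm <= INT8_MAX;
}

/*
 * Hypothetical helper: writes the immediate bytes (little-endian) that a
 * 32-bit ADC/SBB would carry and returns how many were written: 1 for
 * the short form, 4 for the full form.
 */
size_t encode_immediate(int32_t imm, uint8_t out[4])
{
    assert(out != NULL);

    size_t n = fits_in_simm8(imm) ? 1 : 4;
    uint32_t bits = (uint32_t)imm;

    for (size_t i = 0; i < n; i++)
        out[i] = (uint8_t)(bits >> (8 * i));
    return n;
}
```

With these assumptions, an immediate of -1 needs one byte (0xFF), while 128 needs all four because it does not fit in a signed byte.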
Refactor some duplicated code into helper functions. · 7e9a1cb1
Craig Topper authored Mar 18, 2013
```
llvm-svn: 177242
```

Mar 16, 2013

Add X86 code emitter support for AVX-encoded MRMDestReg instructions. · 612f7bfa
Craig Topper authored Mar 16, 2013
```
Previously we weren't skipping the VVVV encoded register. Based on patch by Michael Liao.

llvm-svn: 177221
```

Define more SchedWrites for annotating X86 instructions. · 63bff2eb

Jakob Stoklund Olesen authored Mar 16, 2013

Since almost all X86 instructions can fold loads, use a multiclass to
define register/memory pairs of SchedWrites.

An X86FoldableSchedWrite represents the register version of an
instruction. It holds a reference to the SchedWrite to use when the
instruction folds a load.

This will be used inside multiclasses that define rr and rm instruction
versions together.

llvm-svn: 177210


Mar 15, 2013
- Silence anonymous type in anonymous union warnings. · 8996c5d4
  Eric Christopher authored Mar 15, 2013
```
llvm-svn: 177135
```
- Unaligned loads should use the VMOVUPS opcode. · adfa5eaf
  Nadav Rotem authored Mar 14, 2013
```
llvm-svn: 177130
```
Mar 14, 2013

Prepare for adding InstrSchedModel annotations to X86 instructions. · 71236682

Jakob Stoklund Olesen authored Mar 14, 2013

The new InstrSchedModel is easier to use than the instruction
itineraries. It will be used to model instruction latency and throughput
in modern Intel microarchitectures like Sandy Bridge.

InstrSchedModel should be able to coexist with instruction itinerary
classes, but for cleanliness we should switch the Atom processor model
to the new InstrSchedModel as well.

llvm-svn: 177122


[fast-isel] The X86FastISel::FastLowerArguments function doesn't properly handle · 4b54f594
Chad Rosier authored Mar 14, 2013
```
the win64 calling convention.
rdar://13423768

llvm-svn: 177113
```
Fix the name of a variable to match its declaration. Fixes build failure from r177014. · ba824298
Craig Topper authored Mar 14, 2013
```
llvm-svn: 177015
```

Fix a bug in the calculation of the VEX.B bit for FMA4 rr with the VEX.W bit... · 87299973

Craig Topper authored Mar 14, 2013

Fix a bug in the calculation of the VEX.B bit for FMA4 rr with the VEX.W bit set. The VEX.B was being calculated from the wrong operand. Fixes at least some portion of PR14185.

llvm-svn: 177014


Teach X86 MC instruction lowering that VMOVAPSrr and other VEX-encoded... · a66d81d5

Craig Topper authored Mar 14, 2013

Teach X86 MC instruction lowering that VMOVAPSrr and other VEX-encoded register to register moves should be switched from using the MRMSrcReg form to the MRMDestReg form if the source register is a 64-bit extended register and the destination register is not. This allows the instruction to be encoded using the 2-byte VEX form instead of the 3-byte VEX form. The GNU assembler has similar behavior.

llvm-svn: 177011
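
The form switch described in this commit message can be sketched as follows. The sketch assumes the usual VEX rules: the 2-byte prefix can only express the extension bit for ModRM.reg (VEX.R), so an extended register in ModRM.rm (which needs VEX.B) forces the 3-byte prefix. It also assumes the opcode otherwise qualifies for the 2-byte form. Register numbers 8 to 15 stand for the extended registers, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Which ModRM slot holds which operand of a register-to-register move. */
enum move_form {
    MRM_SRC_REG,   /* ModRM.reg = destination, ModRM.rm = source      */
    MRM_DEST_REG   /* ModRM.reg = source,      ModRM.rm = destination */
};

static bool is_extended(unsigned reg)
{
    assert(reg < 16);
    return reg >= 8;
}

/* Switch forms only when the source is extended and the destination is not. */
enum move_form pick_move_form(unsigned dst, unsigned src)
{
    if (is_extended(src) && !is_extended(dst))
        return MRM_DEST_REG;
    return MRM_SRC_REG;
}

/* VEX prefix length in bytes: 3 if ModRM.rm holds an extended register. */
unsigned vex_prefix_bytes(enum move_form form, unsigned dst, unsigned src)
{
    unsigned rm = (form == MRM_SRC_REG) ? src : dst;
    return is_extended(rm) ? 3 : 2;
}
```

For a move from register 9 to register 1, the default form needs the 3-byte prefix, while the switched form fits in 2 bytes. When both registers are extended, neither form avoids the 3-byte prefix, so nothing is gained by switching.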


Fix PR15309 · 20d28704
Michael Liao authored Mar 14, 2013
```
- Fix the typo on type checking

llvm-svn: 177010
```

Mar 11, 2013
- Fixes disassembler crashes on 2013 Haswell RTM instructions. · f15856eb
  Kevin Enderby authored Mar 11, 2013
```
rdar://13318048

llvm-svn: 176828
```
Mar 08, 2013

DAGCombiner: Use correct value type for checking legality of BR_CC v3 · b1588fc0

Tom Stellard authored Mar 08, 2013

LegalizeDAG.cpp uses the value of the comparison operands when checking
the legality of BR_CC, so DAGCombiner should do the same.

v2:
  - Expand more BR_CC value types for NVPTX

v3:
  - Expand correct BR_CC value types for Hexagon, Mips, and XCore.

llvm-svn: 176694


Mar 07, 2013

X86: Fold EXTRACT_SUBVECTORs of a BUILD_VECTOR into a smaller BUILD_VECTOR. · 2c3d0df8

Benjamin Kramer authored Mar 07, 2013

That can usually be lowered efficiently and is common in Sandy Bridge code.
It would be nice to do this in DAGCombiner but we can't insert arbitrary
BUILD_VECTORs this late.

Fixes PR15462.

llvm-svn: 176634
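
As an illustration of the fold in this commit: extracting a contiguous run of lanes from a vector built out of individual operands yields the same result as building a smaller vector from just those operands. The sketch below models operands as plain integers; the function name and signature are hypothetical and do not mirror the SelectionDAG API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of the fold: given the operands of a BUILD_VECTOR
 * and the lane range an EXTRACT_SUBVECTOR selects, produce the operands
 * of the smaller BUILD_VECTOR that replaces both nodes.
 */
void fold_extract_of_build_vector(const int *build_ops, size_t num_ops,
                                  size_t start, size_t sub_len, int *out)
{
    /* The extracted range must lie inside the original vector. */
    assert(start <= num_ops && sub_len <= num_ops - start);

    memcpy(out, build_ops + start, sub_len * sizeof *out);
}
```

For an 8-lane vector built from operands 10 through 17, extracting the upper 4 lanes folds to a 4-lane vector built from 14 through 17.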


Fix two remaining issues after fixing PR15355 when CMOV is not available · d5cac37d

Michael Liao authored Mar 07, 2013

- Phi nodes should be replaced/updated after lowering CMOV into a branch
  because the 'mainMBB' operand in the Phi node is changed.
- Add EFLAGS as a live-in before lowering the 2nd CMOV. This is necessary
  because we reuse the EFLAGS generated before the 1st lowered CMOV, which
  does not clobber EFLAGS, and that has to be specified explicitly.
- Test cases using '-attr=-cmov' are added.

llvm-svn: 176598


Mar 06, 2013

Fix PR15355 · da22b30b

Michael Liao authored Mar 06, 2013

- Clear the 'mayStore' flag when loading from the atomic variable before the
  spin loop
- Clear the kill flag on the registers forming the address of that atomic
  variable, since they go from one use to multiple uses
- Don't use a physical register as a live-in register in a BB (neither entry
  nor landing pad); copy it into a virtual register instead

(patch by Cameron Zwarich)

llvm-svn: 176538


Mar 05, 2013

The current X86 NOP padding uses one long NOP followed by the remainder in · 4c8979cd

David Sehr authored Mar 05, 2013

one-byte NOPs.  If the processor actually executes those NOPs, as it sometimes
does with aligned bundling, this can have a performance impact.  From my
micro-benchmarks run on my one machine, a 15-byte NOP followed by twelve
one-byte NOPs is about 20% worse than a 15 followed by a 12.  This patch
changes NOP emission to emit as many 15-byte NOPs (the maximum length) as
possible, followed by at most one shorter NOP.

llvm-svn: 176464
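
The padding rule described in this commit message can be sketched as a small planning routine. The 15-byte maximum comes from the message itself; the function name, signature, and output buffer are hypothetical, and the sketch only computes NOP lengths rather than emitting any bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Longest single NOP, as stated in the commit message. */
enum { MAX_NOP_LEN = 15 };

/*
 * Hypothetical helper: splits `count` bytes of padding into NOP lengths,
 * using as many maximum-length NOPs as possible and at most one shorter
 * NOP at the end. Lengths are written to `sizes` (capacity `cap`); the
 * return value is the number of NOPs planned.
 */
size_t plan_nop_padding(size_t count, unsigned sizes[], size_t cap)
{
    size_t n = 0;

    while (count > 0) {
        unsigned len = count > MAX_NOP_LEN ? MAX_NOP_LEN : (unsigned)count;

        assert(n < cap);
        sizes[n++] = len;
        count -= len;
    }
    return n;
}
```

For the 27-byte case mentioned in the message, this plans a 15-byte NOP followed by a 12-byte NOP instead of a 15-byte NOP and twelve one-byte NOPs.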


Mar 04, 2013

Bypass Slow Divides · 485296d1

Preston Gurd authored Mar 04, 2013

* Only apply the divide bypass optimization when not optimizing for size.
* Fixed a bug caused by the constant for the 0 value being of type Int32;
  the dividend type is now used to generate the constant instead.
* For Atom x86-64, apply the divide bypass to use 16-bit divides instead of
  64-bit divides when operand values are small enough.
* Added lit tests for 64-bit divide bypass.

Patch by Tyler Nowicki!

llvm-svn: 176442
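
The idea behind the 64-bit case can be illustrated at the source level: check at run time whether both operands fit in 16 bits, and take the narrow division if so. This is only a sketch of the check, with a hypothetical function name. The actual pass rewrites LLVM IR, and a C compiler is free to pick a different instruction for the narrow path than the one this code suggests.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: unsigned 64-bit division with a narrow fast path. */
uint64_t udiv64_with_bypass(uint64_t dividend, uint64_t divisor)
{
    assert(divisor != 0);

    /* Both operands fit in 16 bits exactly when no higher bit is set. */
    if (((dividend | divisor) >> 16) == 0) {
        uint16_t a = (uint16_t)dividend;
        uint16_t b = (uint16_t)divisor;

        return (uint64_t)(a / b);   /* narrow path */
    }
    return dividend / divisor;      /* full-width path */
}
```

OR-ing the operands first means a single shift and compare covers both of them, which keeps the added check cheap relative to the slow divide it avoids.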


Mar 02, 2013

X86 cost model: Adjust cost for custom lowered vector multiplies · 20ef54f4

Arnold Schwaighofer authored Mar 02, 2013

This matters, for example, in the following matrix multiply:

int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
  int i, j, k, val;
  for (i=0; i<rows; i++) {
    for (j=0; j<cols; j++) {
      val = 0;
      for (k=0; k<cols; k++) {
        val += m1[i][k] * m2[k][j];
      }
      m3[i][j] = val;
    }
  }
  return(m3);
}

Taken from the test-suite benchmark Shootout.

We estimate the cost of the multiply to be 2 while we generate 9 instructions
for it and end up being quite a bit slower than the scalar version (48% on my
machine).

Also, properly differentiate between AVX1 and AVX2. On AVX1 we still split the
vector into two 128-bit halves and handle the subvector muls like above with 9
instructions.
Only on AVX2 will we have a cost of 9 for v4i64.

I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an
add instead of a mul because with a mul we now no longer vectorize. I did
verify that the mul would indeed be more expensive when vectorized, using 3
kernels:

for (i ...)
   r += a[i] * 3;
for (i ...)
  m1[i] = m1[i] * 3; // This matches the test case in avx1.ll
and a matrix multiply.

In each case the vectorized version was considerably slower.

radar://13304919

llvm-svn: 176403


Mar 01, 2013

Fix PR10475 · 6af16fc3

Michael Liao authored Mar 01, 2013

- ISD::SHL/SRL/SRA must have either both scalar or both vector operands,
  but TLI.getShiftAmountTy() so far only returns a scalar type. As a
  result, backend logic that relies on this breaks.
- Rename the original TLI.getShiftAmountTy() to
  TLI.getScalarShiftAmountTy() and re-define TLI.getShiftAmountTy() to
  return the target-specified scalar type or the same vector type as the
  1st operand.
- Fix most TICG logic that assumes TLI.getShiftAmountTy() returns a simple
  scalar type.

llvm-svn: 176364


GCC thinks that this variable might be used uninitialized (it isn't). · 2cb41d37
Duncan Sands authored Mar 01, 2013
```
llvm-svn: 176341
```

Feb 28, 2013
- Re-format comments (and check commit access) · d4842e5e
  Yiannis Tsiouris authored Feb 28, 2013
```
llvm-svn: 176270
```
Feb 27, 2013
- Revert r176166 because it broke one of the lit tests. · 08ab877c
  Nadav Rotem authored Feb 27, 2013
```
llvm-svn: 176171
```
- std::string to StringRef. · 85e1211f
  Nadav Rotem authored Feb 27, 2013
```
llvm-svn: 176166
```
Feb 26, 2013
- [fast-isel] Make sure the FastLowerArguments function checks to make sure the · 1b33e8d6
  Chad Rosier authored Feb 26, 2013
```
arguments type is a simple type.
rdar://13290455

llvm-svn: 176066
```
- Refine fix to PR10499, no functionality change · 609a5272
  Michael Liao authored Feb 25, 2013
```
- Put expensive checking after simple one

llvm-svn: 176060
```
- Fix PR10499 · ab976680
  Michael Liao authored Feb 25, 2013
```
- Check whether SSE is available before lowering all 1s vector building with
  PCMPEQD, which is only available from SSE2

llvm-svn: 176058
```
Feb 25, 2013

[fast-isel] Add X86FastIsel::FastLowerArguments to handle functions with 6 or · a92ef4ba

Chad Rosier authored Feb 25, 2013

fewer scalar integer (i32 or i64) arguments. It completely eliminates the need
for SDISel for trivial functions.

Also, add the new llc -fast-isel-abort-args option, which is similar to the
-fast-isel-abort option, but for formal argument lowering.

llvm-svn: 176052
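
The eligibility rule stated in this commit message (six or fewer scalar integer arguments of type i32 or i64) can be sketched as a simple predicate. The enum and function names are hypothetical, and the real X86FastISel code may apply further checks that the message does not describe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified argument classification for this sketch only. */
enum arg_type { ARG_I32, ARG_I64, ARG_OTHER };

/* Limit taken from the commit message. */
enum { MAX_FAST_ARGS = 6 };

/* True when every argument is i32 or i64 and there are at most six. */
bool can_fast_lower_arguments(const enum arg_type *args, size_t count)
{
    assert(args != NULL || count == 0);

    if (count > MAX_FAST_ARGS)
        return false;
    for (size_t i = 0; i < count; i++) {
        if (args[i] != ARG_I32 && args[i] != ARG_I64)
            return false;
    }
    return true;
}
```

A function that fails this predicate would fall back to the slower argument lowering path, which is the situation the -fast-isel-abort-args option is meant to surface.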


[ms-inline asm] Add support for the pushad/popad mnemonics. · 669bb3ee
Chad Rosier authored Feb 25, 2013
```
rdar://13254235

llvm-svn: 176036
```

Feb 24, 2013
- Revert r169638 because it broke Mesa llvmpipe tests. · b532fca9
  Nadav Rotem authored Feb 24, 2013
```
Fix PR15239.

llvm-svn: 175985
```
Feb 23, 2013
- X86: Disable cmov-memory patterns on subtargets without cmov. · ee23dcb4
  Benjamin Kramer authored Feb 23, 2013
```
Fixes PR15115.

llvm-svn: 175962
```
Feb 22, 2013
- x86_64: designate most general purpose and SSE registers as callee save under coldcc · 7b57621f
  Peter Collingbourne authored Feb 22, 2013
```
llvm-svn: 175911
```
Feb 21, 2013

Move the eliminateCallFramePseudoInstr method from TargetRegisterInfo · 8da87163

Eli Bendersky authored Feb 21, 2013

to TargetFrameLowering, where it belongs. Incidentally, this allows us
to delete some duplicated (and slightly different!) code in TRI.

There are potentially other layering problems that can be cleaned up
as a result, or in a similar manner.

The refactoring was OK'd by Anton Korobeynikov on llvmdev.

Note: this touches the target interfaces, so out-of-tree targets may
be affected.

llvm-svn: 175788
