- Mar 19, 2013
-
-
Jakob Stoklund Olesen authored
llvm-svn: 177417
-
Chad Rosier authored
logic as a QOI cleanup. rdar://13445327 llvm-svn: 177413
-
Chad Rosier authored
parsed one. Test case coming shortly. rdar://13446980 llvm-svn: 177347
-
- Mar 18, 2013
-
-
Jakob Stoklund Olesen authored
We hitch a ride with the existing OpndItins class that was used to add instruction itinerary classes in the many multiclasses in this file. Use the link provided by the X86FoldableSchedWrite.Folded to find the right SchedWrite for folded loads. llvm-svn: 177326
-
Jakob Stoklund Olesen authored
This new-style scheduling information is going to replace the instruction iteneraries. This also serves as a test case for Andy's fix in r177317. llvm-svn: 177323
-
Anton Korobeynikov authored
MinGW is almost completely compatible to MSVC, with the exception of the _tls_array global not being available. Patch by David Nadlinger! llvm-svn: 177257
-
Craig Topper authored
llvm-svn: 177243
-
Craig Topper authored
llvm-svn: 177242
-
- Mar 16, 2013
-
-
Craig Topper authored
Previously we weren't skipping the VVVV encoded register. Based on patch by Michael Liao. llvm-svn: 177221
-
Jakob Stoklund Olesen authored
Since almost all X86 instructions can fold loads, use a multiclass to define register/memory pairs of SchedWrites. An X86FoldableSchedWrite represents the register version of an instruction. It holds a reference to the SchedWrite to use when the instruction folds a load. This will be used inside multiclasses that define rr and rm instruction versions together. llvm-svn: 177210
-
- Mar 15, 2013
-
-
Eric Christopher authored
llvm-svn: 177135
-
Nadav Rotem authored
llvm-svn: 177130
-
- Mar 14, 2013
-
-
Jakob Stoklund Olesen authored
The new InstrSchedModel is easier to use than the instruction itineraries. It will be used to model instruction latency and throughput in modern Intel microarchitectures like Sandy Bridge. InstrSchedModel should be able to coexist with instruction itinerary classes, but for cleanliness we should switch the Atom processor model to the new InstrSchedModel as well. llvm-svn: 177122
-
Chad Rosier authored
the win64 calling convention. rdar://13423768 llvm-svn: 177113
-
Craig Topper authored
llvm-svn: 177015
-
Craig Topper authored
Fix a bug in the calculation of the VEX.B bit for FMA4 rr with the VEX.W bit set. The VEX.B was being calculated from the wrong operand. Fixes at least some portion of PR14185. llvm-svn: 177014
-
Craig Topper authored
Teach X86 MC instruction lowering that VMOVAPSrr and other VEX-encoded register to register moves should be switched from using the MRMSrcReg form to the MRMDestReg form if the source register is a 64-bit extended register and the destination register is not. This allows the instruction to be encoded using the 2-byte VEX form instead of the 3-byte VEX form. The GNU assembler has similar behavior. llvm-svn: 177011
-
Michael Liao authored
- Fix the typo on type checking llvm-svn: 177010
-
- Mar 11, 2013
-
-
Kevin Enderby authored
rdar://13318048 llvm-svn: 176828
-
- Mar 08, 2013
-
-
Tom Stellard authored
LegalizeDAG.cpp uses the value of the comparison operands when checking the legality of BR_CC, so DAGCombiner should do the same. v2: - Expand more BR_CC value types for NVPTX v3: - Expand correct BR_CC value types for Hexagon, Mips, and XCore. llvm-svn: 176694
-
- Mar 07, 2013
-
-
Benjamin Kramer authored
That can usually be lowered efficiently and is common in sandybridge code. It would be nice to do this in DAGCombiner but we can't insert arbitrary BUILD_VECTORs this late. Fixes PR15462. llvm-svn: 176634
-
Michael Liao authored
- Phi nodes should be replaced/updated after lowering CMOV into branch because 'mainMBB' updating operand in Phi node is changed. - Add EFLAGS in livein before lowering the 2nd CMOV. It's necessary as we will reuse the EFLAGS generated before the 1st lowered CMOV, which won't clobber EFLAGS. However, we need explicitly specify that. - '-attr=-cmov' test case are added. llvm-svn: 176598
-
- Mar 06, 2013
-
-
Michael Liao authored
- Clear 'mayStore' flag when loading from the atomic variable before the spin loop - Clear kill flag from one use to multiple use in registers forming the address to that atomic variable - don't use a physical register as live-in register in BB (neither entry nor landing pad.) by copying it into virtual register (patch by Cameron Zwarich) llvm-svn: 176538
-
- Mar 05, 2013
-
-
David Sehr authored
one-byte NOPs. If the processor actually executes those NOPs, as it sometimes does with aligned bundling, this can have a performance impact. From my micro-benchmarks run on my one machine, a 15-byte NOP followed by twelve one-byte NOPs is about 20% worse than a 15 followed by a 12. This patch changes NOP emission to emit as many 15-byte (the maximum) as possible followed by at most one shorter NOP. llvm-svn: 176464
-
- Mar 04, 2013
-
-
Preston Gurd authored
* Only apply divide bypass optimization when not optimizing for size. * Fixed bug caused by constant for 0 value of type Int32, used dividend type to generate the constant instead. * For atom x86-64 apply the divide bypass to use 16-bit divides instead of 64-bit divides when operand values are small enough. * Added lit tests for 64-bit divide bypass. Patch by Tyler Nowicki! llvm-svn: 176442
-
- Mar 02, 2013
-
-
Arnold Schwaighofer authored
This matters for example in following matrix multiply: int **mmult(int rows, int cols, int **m1, int **m2, int **m3) { int i, j, k, val; for (i=0; i<rows; i++) { for (j=0; j<cols; j++) { val = 0; for (k=0; k<cols; k++) { val += m1[i][k] * m2[k][j]; } m3[i][j] = val; } } return(m3); } Taken from the test-suite benchmark Shootout. We estimate the cost of the multiply to be 2 while we generate 9 instructions for it and end up being quite a bit slower than the scalar version (48% on my machine). Also, properly differentiate between avx1 and avx2. On avx-1 we still split the vector into 2 128bits and handle the subvector muls like above with 9 instructions. Only on avx-2 will we have a cost of 9 for v4i64. I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an add instead of a mul because with a mul we now no longer vectorize. I did verify that the mul would be indeed more expensive when vectorized with 3 kernels: for (i ...) r += a[i] * 3; for (i ...) m1[i] = m1[i] * 3; // This matches the test case in avx1.ll and a matrix multiply. In each case the vectorized version was considerably slower. radar://13304919 llvm-svn: 176403
-
- Mar 01, 2013
-
-
Michael Liao authored
- ISD::SHL/SRL/SRA must have either both scalar or both vector operands but TLI.getShiftAmountTy() so far only return scalar type. As a result, backend logic assuming that breaks. - Rename the original TLI.getShiftAmountTy() to TLI.getScalarShiftAmountTy() and re-define TLI.getShiftAmountTy() to return target-specificed scalar type or the same vector type as the 1st operand. - Fix most TICG logic assuming TLI.getShiftAmountTy() a simple scalar type. llvm-svn: 176364
-
Duncan Sands authored
llvm-svn: 176341
-
- Feb 28, 2013
-
-
Yiannis Tsiouris authored
llvm-svn: 176270
-
- Feb 27, 2013
-
-
Nadav Rotem authored
llvm-svn: 176171
-
Nadav Rotem authored
llvm-svn: 176166
-
- Feb 26, 2013
-
-
Chad Rosier authored
arguments type is a simple type. rdar://13290455 llvm-svn: 176066
-
Michael Liao authored
- Put expensive checking after simple one llvm-svn: 176060
-
Michael Liao authored
- Check whether SSE is available before lowering all 1s vector building with PCMPEQD, which is only available from SSE2 llvm-svn: 176058
-
- Feb 25, 2013
-
-
Chad Rosier authored
fewer scalar integer (i32 or i64) arguments. It completely eliminates the need for SDISel for trivial functions. Also, add the new llc -fast-isel-abort-args option, which is similar to -fast-isel-abort option, but for formal argument lowering. llvm-svn: 176052
-
Chad Rosier authored
rdar://13254235 llvm-svn: 176036
-
- Feb 24, 2013
-
-
Nadav Rotem authored
Fix PR15239. llvm-svn: 175985
-
- Feb 23, 2013
-
-
Benjamin Kramer authored
Fixes PR15115. llvm-svn: 175962
-
- Feb 22, 2013
-
-
Peter Collingbourne authored
llvm-svn: 175911
-
- Feb 21, 2013
-
-
Eli Bendersky authored
to TargetFrameLowering, where it belongs. Incidentally, this allows us to delete some duplicated (and slightly different!) code in TRI. There are potentially other layering problems that can be cleaned up as a result, or in a similar manner. The refactoring was OK'd by Anton Korobeynikov on llvmdev. Note: this touches the target interfaces, so out-of-tree targets may be affected. llvm-svn: 175788
-