- Jan 26, 2016
-
-
Dan Gohman authored
For historical reasons, the behavior of .align differs between targets. Fortunately, there are alternatives, .p2align and .balign, which make the interpretation of the parameter explicit and which behave consistently across targets. This patch teaches MC to use .p2align instead of .align, so that people reading code for multiple architectures don't have to remember which way each platform does its .align directive. Differential Revision: http://reviews.llvm.org/D16549 llvm-svn: 258750
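As an illustration (a minimal sketch, assuming GNU-style assembler directives, shown here via module-level asm in C++; not part of the patch itself):
    // Hypothetical sketch of the directive semantics:
    asm(".p2align 4");  // 2^4 = 16-byte alignment on every target
    asm(".balign 16");  // 16-byte alignment on every target
    // ".align 16" means 16 bytes on x86 ELF but 2^16 bytes on targets
    // (e.g. ARM) that read the operand as a power of two; this is the
    // ambiguity that .p2align and .balign avoid.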
-
- Jan 25, 2016
-
-
Matthias Braun authored
There's a special case in EmitLoweredSelect() that produces an improved lowering for cmov(cmov) patterns. However, this special lowering is currently broken if the inner cmov has multiple users, so this patch stops using it in that case. If you wonder why this wasn't fixed by continuing to use the special lowering and inserting a 2nd PHI for the inner cmov: I believe this would incur additional copies/register pressure, so the special lowering would no longer improve upon the normal one in this case. This fixes http://llvm.org/PR26256 (= rdar://24329747) llvm-svn: 258729
-
Simon Pilgrim authored
Its main use is to allow memory folding of the 1st operand. Differential Revision: http://reviews.llvm.org/D16521 llvm-svn: 258726
-
Michael Zuckerman authored
Differential Revision: http://reviews.llvm.org/D16520 llvm-svn: 258688
-
Michael Zuckerman authored
Differential Revision: http://reviews.llvm.org/D16519 llvm-svn: 258686
-
Asaf Badouh authored
VPMADD52LUQ - Packed Multiply of Unsigned 52-bit Integers and Add the Low 52-bit Products to Qword Accumulators
VPMADD52HUQ - Packed Multiply of Unsigned 52-bit Integers and Add the High 52-bit Products to 64-bit Accumulators
Differential Revision: http://reviews.llvm.org/D16407 llvm-svn: 258680
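At the source level these correspond to the AVX-512 IFMA intrinsics; a minimal sketch (assuming a compiler with AVX512IFMA support; the function name is illustrative):
    #include <immintrin.h>

    // Accumulate the low, then the high, 52-bit halves of the 52x52-bit
    // products of b and c into the qword accumulators in acc.
    __m512i madd52(__m512i acc, __m512i b, __m512i c) {
      acc = _mm512_madd52lo_epu64(acc, b, c);  // VPMADD52LUQ
      return _mm512_madd52hi_epu64(acc, b, c); // VPMADD52HUQ
    }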
-
Igor Breger authored
Use AVX1 FP instructions (vmaskmovps/pd) in place of the AVX2 int instructions (vpmaskmovd/q). Differential Revision: http://reviews.llvm.org/D16528 llvm-svn: 258675
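From C++ the FP forms are reachable via the AVX maskload intrinsics; a minimal sketch (the function name is illustrative):
    #include <immintrin.h>

    // Masked load: lanes whose mask element has the sign bit set are loaded,
    // the rest read as zero. Lowers to vmaskmovps even on AVX1-only targets.
    __m256 load_first_three(const float *p) {
      __m256i mask = _mm256_setr_epi32(-1, -1, -1, 0, 0, 0, 0, 0);
      return _mm256_maskload_ps(p, mask);
    }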
-
- Jan 24, 2016
-
-
Elena Demikhovsky authored
Changes in X86.td:
- I set features of Intel processors in incremental form: IVB = SNB + X, HSW = IVB + X, ...
- I added the Skylake client processor and defined its features
- FeatureADX was missing on KNL
- Added some new features to the appropriate processors: SMAP, IFMA, PREFETCHWT1, VMFUNC and others
Differential Revision: http://reviews.llvm.org/D16357 llvm-svn: 258659
-
Igor Breger authored
Differential Revision: http://reviews.llvm.org/D16137 llvm-svn: 258657
-
- Jan 23, 2016
-
-
Simon Pilgrim authored
llvm-svn: 258646
-
Sanjay Patel authored
For the moment, this file takes way too long to run (see inline comments), but that should be a temporary problem. The fact that the compile time is so slow for a target that doesn't support maskmov may be a bug worth investigating too. llvm-svn: 258629
-
Simon Pilgrim authored
If the INSERTPS zeroes out all the referenced elements from either of the 2 input vectors (and the input is not already UNDEF), then set that input to UNDEF to reduce dependencies. llvm-svn: 258622
-
Manuel Jacob authored
llvm-svn: 258615
-
- Jan 22, 2016
-
-
Sanjay Patel authored
llvm-svn: 258568
-
Dan Gohman authored
This reapplies r258296 and r258366, and also fixes an existing bug in SelectionDAG.cpp's isMemSrcFromString: it neglected to account for the offset in a GlobalAddressSDNode, a bug uncovered by those patches. llvm-svn: 258482
-
Reid Kleckner authored
This reverts r258296 and the follow-up r258366. With this change, we miscompiled the following program on Windows:
    #include <string>
    #include <iostream>
    static const char kData[] = "asdf jkl;";
    int main() {
      std::string s(kData + 3, sizeof(kData) - 3);
      std::cout << s << '\n';
    }
llvm-svn: 258465
-
- Jan 21, 2016
-
-
Reid Kleckner authored
The X86 musttail implementation finds register parameters to forward by running the calling convention algorithm until a non-register location is returned. However, assigning a vector memory location has the side effect of increasing the function's stack alignment. We shouldn't increase the stack alignment when we are only looking for register parameters, so this change conditionalizes it. llvm-svn: 258442
-
Simon Pilgrim authored
Better handling of the annoying pshuflw/pshufhw ops which only shuffle the lower/upper halves of a vector. Added vXi16 unary shuffle support for cases where i16 elements (from the same half of the source) are being splatted to the whole of one of the halves. This avoids the general lowering case, which must shuffle the 32-bit elements first, meaning that we used to end up with unnecessary duplicate pshuflw/pshufhw shuffles. Note this has the side effect that a lot of SSSE3 test cases no longer need to use PSHUFB, as they fall below the 3-op combine threshold for when PSHUFB is typically worth it. I've raised PR26183 to discuss whether the threshold should be changed and whether we need to make it more specific to the target CPU. Differential Revision: http://reviews.llvm.org/D14901 llvm-svn: 258440
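For reference, the half-shuffles in question are exposed as SSE2 intrinsics; a minimal sketch (the function name is illustrative):
    #include <emmintrin.h>

    // Splat i16 element 0 across the lower half via pshuflw; the upper half
    // is left untouched. pshufhw is the upper-half analogue.
    __m128i splat_low_half(__m128i v) {
      return _mm_shufflelo_epi16(v, _MM_SHUFFLE(0, 0, 0, 0));
    }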
-
Igor Breger authored
Implemented intrinsics for the following instructions (reg move): VMOVDQU8/16, VMOVDQA32/64, VMOVAPS/PD. Differential Revision: http://reviews.llvm.org/D16316 llvm-svn: 258398
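These surface as the masked move intrinsics; a minimal sketch (the function name is illustrative):
    #include <immintrin.h>

    // Masked register-to-register move: lanes whose mask bit is clear keep
    // the value from src. Lowers to a masked VMOVDQA32.
    __m512i masked_mov(__m512i src, __mmask16 k, __m512i a) {
      return _mm512_mask_mov_epi32(src, k, a);
    }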
-
Michael Zuckerman authored
Differential Revision: http://reviews.llvm.org/D16398 llvm-svn: 258397
-
Dan Gohman authored
This fixes a miscompile in MultiSource/Benchmarks/MiBench/consumer-lame introduced in r258296. llvm-svn: 258366
-
- Jan 20, 2016
-
-
Michael Zuckerman authored
Differential Revision: http://reviews.llvm.org/D16296 llvm-svn: 258316
-
Igor Breger authored
Differential Revision: http://reviews.llvm.org/D16350 llvm-svn: 258309
-
Dan Gohman authored
SelectionDAG previously missed opportunities to fold constants into GlobalAddresses in several areas. For example, given `(add (add GA, c1), y)`, it would often reassociate to `(add (add GA, y), c1)`, missing the opportunity to create `(add GA+c, y)`. This isn't often visible on targets such as X86 which effectively reassociate adds in their complex address-mode folding logic, however it is currently visible on WebAssembly since it currently has very simple address mode folding code that doesn't reassociate anything. This patch fixes this by making SelectionDAG fold offsets into GlobalAddresses at the same times that it folds constants together, so that it doesn't miss any opportunities to perform such folding. Differential Revision: http://reviews.llvm.org/D16090 llvm-svn: 258296
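A hedged C++ illustration of the addressing pattern this improves (the names are hypothetical; arr becomes the GlobalAddress GA):
    // The address of arr[i + 3] is conceptually (add (add GA, 12), (shl i, 2));
    // folding the constant into the GlobalAddress gives (add GA+12, (shl i, 2))
    // rather than reassociating the +12 away from GA.
    extern int arr[16];
    int load_offset(long i) { return arr[i + 3]; }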
-
Sanjoy Das authored
Summary: This teaches MachineSink to not sink instructions that might break the implicit null check optimization that runs later. This should not affect frontends that do not use implicit null checks.
Reviewers: aadg, reames, hfinkel, atrick
Subscribers: majnemer, llvm-commits
Differential Revision: http://reviews.llvm.org/D14632 llvm-svn: 258254
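For context, a hypothetical sketch of the source pattern that implicit null checks target (runtimes mark such branches so the backend can fold the explicit test into a faulting load; sinking the load past other code would break that):
    // Hypothetical example: with implicit null checks, the explicit test can
    // be deleted and the load itself is allowed to fault on a null pointer.
    int deref_or_default(int *p) {
      if (p == nullptr)
        return -1;
      return *p;
    }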
-
Quentin Colombet authored
calling convention. The implementation of the related callbacks in the x86 backend for such functions is not ready to deal with a prologue block that is not the entry block of the function. This fixes PR26107, but the longer term solution would be to fix those callbacks. llvm-svn: 258221
-
Simon Pilgrim authored
Add support for decoding VZEXT_MOVL target shuffle masks, allowing it to be used as a source in target shuffle combines. llvm-svn: 258215
-
- Jan 19, 2016
-
-
Simon Pilgrim authored
As vector shuffles can only reference two inputs, many (V)INSERTPS patterns end up being split over two target shuffles. This patch adds combines to attempt to combine (V)INSERTPS nodes with input/output nodes that are just zeroing out these additional vector elements. Differential Revision: http://reviews.llvm.org/D16072 llvm-svn: 258205
-
Asaf Badouh authored
Cover all widths and types (pd/ps/sd/ss) of the fixupimm instruction and intrinsics. Differential Revision: http://reviews.llvm.org/D16313 llvm-svn: 258124
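A minimal, strongly hedged sketch of one of the intrinsics involved (assuming the _mm512_fixupimm_pd signature from the Intel intrinsics guide; the function name is illustrative):
    #include <immintrin.h>

    // VFIXUPIMMPD: fix up special values (NaN, +/-0, +/-Inf) in a using the
    // per-class response table encoded in c; imm8 controls fault reporting.
    __m512d fixup(__m512d a, __m512d b, __m512i c) {
      return _mm512_fixupimm_pd(a, b, c, 0);
    }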
-
- Jan 18, 2016
-
-
Simon Pilgrim authored
llvm-svn: 258094
-
Simon Pilgrim authored
llvm-svn: 258091
-
Simon Pilgrim authored
AVX2 can only broadcast from the zeroth element of a vector, but if the broadcastable element is the zeroth element of a 128-bit subvector it's advantageous to extract the subvector, broadcast from that, and avoid loading the shuffle mask data that would be needed for VPERMPS/VPERMD. The only exception is when the source type is 4f64 or 4i64, which can use the immediate shuffles VPERMPD/VPERMQ directly. Differential Revision: http://reviews.llvm.org/D16050 llvm-svn: 258081
-
Igor Breger authored
Implemented intrinsics for the following instructions (store): VMOVDQU8/16/32/64, VMOVDQA32/64, VMOVAPS/PD, VMOVUPS/PD. Differential Revision: http://reviews.llvm.org/D16271 llvm-svn: 258047
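These are exposed as the masked store intrinsics; a minimal sketch (the function name is illustrative):
    #include <immintrin.h>

    // Masked unaligned store: only lanes whose mask bit is set are written.
    // Lowers to a masked VMOVDQU32.
    void masked_store(void *p, __mmask16 k, __m512i v) {
      _mm512_mask_storeu_epi32(p, k, v);
    }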
-
Igor Breger authored
Code example; previous implementation:
    movzbl %dil, %eax
    kmovw %eax, %k0
New code:
    kmovw %edi, %k0
Differential Revision: http://reviews.llvm.org/D16287 llvm-svn: 258045
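In source this move typically arises when materializing a mask from an integer, e.g. via _mm512_int2mask (a minimal sketch; the function name is illustrative):
    #include <immintrin.h>

    // Move an integer into a mask register; with this patch the zero-extend
    // through a GPR (movzbl) is no longer emitted before the kmovw.
    __mmask16 to_mask(int bits) {
      return _mm512_int2mask(bits);
    }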
-
- Jan 17, 2016
-
-
Simon Pilgrim authored
llvm-svn: 258013
-
Michael Zuckerman authored
Differential Revision: http://reviews.llvm.org/D16189 llvm-svn: 258008
-
Michael Zuckerman authored
Differential Revision: http://reviews.llvm.org/D16194 llvm-svn: 258006
-
- Jan 16, 2016
-
-
Simon Pilgrim authored
Added support for the extraction of the upper 128-bit subvectors for lower/upper half undef shuffles if it would reduce the number of extractions/insertions or avoid loads of AVX2 permps/permd shuffle masks. Minor follow-up to D15477. llvm-svn: 258000
-
Simon Pilgrim authored
llvm-svn: 257998
-
Manman Ren authored
%RBP can't be handled explicitly. We generate the following code:
    pushq %rbp
    movq %rsp, %rbp
    ...
    movq %rbx, (%rbp) ## 8-byte Spill
where %rbp will be overwritten by the spilled value. The fix is to let PEI handle %RBP. PR26136 llvm-svn: 257997
-