- Sep 05, 2016
-
Simon Pilgrim authored
llvm-svn: 280661
-
Igor Breger authored
Differential Revision: http://reviews.llvm.org/D23983 llvm-svn: 280650
-
Craig Topper authored
[AVX-512] Simplify X86InstrInfo::copyPhysReg for 128/256-bit vectors with AVX512, but not VLX. We should use the VEX opcodes and trust the register allocator not to use the extended XMM/YMM register space. Previously we were widening these copies to the whole ZMM register. The register allocator shouldn't use XMM16-31 or YMM16-31 in this configuration, as the instructions to spill them aren't available. llvm-svn: 280648
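For illustration, a minimal standalone sketch of the opcode choice described above (hypothetical helper, not the actual X86InstrInfo::copyPhysReg code): 128/256-bit copies keep the VEX-encoded moves, and only genuine 512-bit copies use the EVEX ZMM move.

```cpp
// Hypothetical sketch, not LLVM code: pick a move opcode for a full-register
// vector copy under AVX-512 without VLX. The 128/256-bit cases stay on the
// VEX encodings and rely on the register allocator avoiding XMM16-31/YMM16-31,
// since those registers cannot be spilled in this configuration.
enum class VecCopyOpc { VMOVAPSrr, VMOVAPSYrr, VMOVAPSZrr };

VecCopyOpc pickVectorCopyOpcode(unsigned sizeInBits) {
  if (sizeInBits == 512)
    return VecCopyOpc::VMOVAPSZrr; // EVEX-encoded ZMM copy
  if (sizeInBits == 256)
    return VecCopyOpc::VMOVAPSYrr; // VEX-encoded YMM copy
  return VecCopyOpc::VMOVAPSrr;    // VEX-encoded XMM copy
}
```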
-
Craig Topper authored
llvm-svn: 280645
-
- Sep 04, 2016
-
Simon Pilgrim authored
llvm-svn: 280634
-
Craig Topper authored
[AVX-512] Remove 128-bit and 256-bit masked floating point add/sub/mul/div intrinsics and upgrade to native IR. llvm-svn: 280633
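As a rough illustration of what "upgrade to native IR" means here, the masked arithmetic becomes a plain FP operation plus a select on the mask. A minimal sketch using IRBuilder (the helper name is illustrative, not the actual AutoUpgrade code):

```cpp
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// Sketch: a masked vector fadd expressed in native IR. The mask selects, per
// lane, between the arithmetic result and the pass-through value.
Value *emitMaskedFAdd(IRBuilder<> &B, Value *A, Value *Bv,
                      Value *PassThru, Value *Mask) {
  Value *Sum = B.CreateFAdd(A, Bv);           // plain IR arithmetic
  return B.CreateSelect(Mask, Sum, PassThru); // masking via select
}
```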
-
Simon Pilgrim authored
llvm-svn: 280631
-
Simon Pilgrim authored
llvm-svn: 280629
-
Igor Breger authored
https://llvm.org/bugs/show_bug.cgi?id=30249 llvm-svn: 280625
-
Simon Pilgrim authored
llvm-svn: 280624
-
Craig Topper authored
llvm-svn: 280611
-
- Sep 03, 2016
-
Xinliang David Li authored
CGP currently drops the select's MD_prof profile data when generating the conditional branch, which can lead to bad code layout. The patch fixes the issue. Differential Revision: http://reviews.llvm.org/D24169 llvm-svn: 280600
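A minimal sketch of the idea (hypothetical helper, not the exact CodeGenPrepare change): when the select is turned into a conditional branch, its branch-weight metadata should be carried over to the new branch rather than dropped.

```cpp
#include "llvm/IR/Instructions.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Metadata.h"
using namespace llvm;

// Carry the select's !prof branch weights over to the branch that replaces it.
void transferProfMetadata(SelectInst *Sel, BranchInst *Br) {
  if (MDNode *Prof = Sel->getMetadata(LLVMContext::MD_prof))
    Br->setMetadata(LLVMContext::MD_prof, Prof);
}
```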
-
Craig Topper authored
[AVX-512] Add integer ADD/SUB instructions to load folding tables. Add an AVX512 stack folding test. llvm-svn: 280593
-
Craig Topper authored
llvm-svn: 280581
-
Wei Mi authored
Add -mtriple=x86_64-unknown-linux-gnu for the test and move it to CodeGen/X86. llvm-svn: 280568
-
- Sep 02, 2016
-
Andrea Di Biagio authored
This fixes a regression introduced by revision 268094. Revision 268094 added the following dag combine rule:

  // trunc (shl x, K) -> shl (trunc x), K  =>  K < vt.size / 2

That rule converts a truncate of a shift-by-constant into a shift of a truncated value. We do this only if the shift count is less than half the size in bits of the truncated value (K < vt.size / 2). The problem is that the constraint on the shift count is incorrect, so the rule doesn't work well in some cases involving vector types. The combine rule should have been written instead like this:

  // trunc (shl x, K) -> shl (trunc x), K  =>  K < vt.getScalarSizeInBits()

Basically, if K is smaller than the "scalar size in bits" of the truncated value then we know that by "sinking" the truncate into the operand of the shift we would never accidentally make the shift undefined.

This patch fixes the check on the shift count, and adds test cases to make sure that we don't regress the behavior.

Differential Revision: https://reviews.llvm.org/D24154
llvm-svn: 280482
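The bound can be checked with ordinary integer arithmetic; the standalone program below (not LLVM code) verifies that for an i64 -> i32 truncate, any shift amount below the scalar width of the truncated type gives the same result whether the truncate is applied before or after the shift.

```cpp
#include <cassert>
#include <cstdint>

int main() {
  const uint64_t samples[] = {0x1234567890ABCDEFULL, ~0ULL, 1ULL};
  for (uint32_t k = 0; k < 32; ++k) {      // any k < 32 is safe for i64 -> i32
    for (uint64_t x : samples) {
      uint32_t truncOfShl = static_cast<uint32_t>(x << k); // trunc (shl x, k)
      uint32_t shlOfTrunc = static_cast<uint32_t>(x) << k; // shl (trunc x), k
      assert(truncOfShl == shlOfTrunc);
    }
  }
  // k >= 32 would make the narrowed shift undefined, which is exactly what the
  // corrected K < vt.getScalarSizeInBits() check rules out.
  return 0;
}
```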
-
Craig Topper authored
[AVX-512] Move tests for masked floating point logical operations to avx512dqvl-intrinsics-upgrade.ll since they have now been autoupgraded. llvm-svn: 280467
-
Craig Topper authored
[AVX-512] Add more patterns for masked and broadcasted logical operations where the select or broadcast has a floating point type. These are needed in order to remove the masked floating point logical operation intrinsics and use native IR. llvm-svn: 280465
-
Craig Topper authored
[AVX-512] Add execution domain fixing for logical operations with broadcast loads. This builds on the handling of masked ops since we need to keep element size the same. llvm-svn: 280464
-
Michael Kuperstein authored
When expanding a SETCC for which the low half is known to evaluate to false, we can only throw it away for LT/GT comparisons, not LE/GE. This fixes PR29170. Differential Revision: https://reviews.llvm.org/D24151 llvm-svn: 280424
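A standalone illustration (not the legalizer code): splitting a 64-bit unsigned compare into 32-bit halves. When the low-half compare is false, the high-half compare alone gives the right answer for LT but not for LE.

```cpp
#include <cassert>
#include <cstdint>

int main() {
  // High halves equal, low half of a greater than low half of b, so both the
  // low-half "<" and "<=" compares evaluate to false.
  uint64_t a = 0x0000000100000005ULL, b = 0x0000000100000003ULL;
  uint32_t hiA = a >> 32, hiB = b >> 32;
  uint32_t loA = static_cast<uint32_t>(a), loB = static_cast<uint32_t>(b);
  assert(!(loA < loB) && !(loA <= loB));

  // LT: replacing the whole compare with the high-half compare is still correct.
  assert((a < b) == (hiA < hiB));
  // LE: the same replacement would be wrong -- the low half still matters when
  // the high halves are equal. This is the PR29170 situation.
  assert((a <= b) != (hiA <= hiB));
  return 0;
}
```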
-
- Sep 01, 2016
-
Michael Kuperstein authored
Prior to this, we could generate a vector_shuffle from an IR shuffle when the size of the result was exactly the sum of the sizes of the input vectors. If the output vector was narrower - e.g. a <12 x i8> being formed by a shuffle with two <8 x i8> inputs - we would lower the shuffle to a sequence of extracts and inserts.

Instead, we can form a larger vector_shuffle, and then extract a subvector of the right size - e.g. shuffle the two <8 x i8> inputs into a <16 x i8> and then extract a <12 x i8>.

This also includes a target-specific X86 combine that in the presence of AVX2 combines:
  (vector_shuffle <mask> (concat_vectors t1, undef) (concat_vectors t2, undef))
into:
  (vector_shuffle <mask> (concat_vectors t1, t2), undef)
in cases where this allows us to form VPERMD/VPERMQ.

(This is not a separate commit, as that pattern does not appear without the DAGBuilder change.)

llvm-svn: 280418
-
Andrey Turetskiy authored
According to the spec, the cvtdq2pd and cvtps2pd instructions don't require the memory operand to be aligned to 16 bytes. This patch removes this requirement from the memory folding table. Differential Revision: https://reviews.llvm.org/D23919 llvm-svn: 280402
-
Michael Kuperstein authored
Legalization tends to create anyext(trunc) patterns. This should always be combined - into either a single trunc, a single ext, or nothing if the types match exactly. But if we happen to combine the trunc first, we may pull the trunc away from the anyext or make it implicit (e.g. the truncate(extract) -> extract(bitcast) fold). To prevent this, we can avoid doing the fold, similarly to how we already handle fpround(fpextend). Differential Revision: https://reviews.llvm.org/D23893 llvm-svn: 280386
-
Elena Demikhovsky authored
Optimize FMA combined with FNEG: negation of the whole result, like -(a*b+c), and FNEG + FMA, like a*b-c or (-a)*b+c. The bug description is here: https://llvm.org/bugs/show_bug.cgi?id=28892 Differential Revision: https://reviews.llvm.org/D23313 llvm-svn: 280368
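The identity being exploited can be checked numerically; the snippet below (standalone, not LLVM code) shows that negating a fused multiply-add result equals negating two of its inputs, which is what allows the FNEG to be folded into the FMA.

```cpp
#include <cassert>
#include <cmath>

int main() {
  const double vals[] = {1.5, -2.25, 3.0, 0.125};
  for (double a : vals)
    for (double b : vals)
      for (double c : vals)
        // -(a*b + c) as one fused op equals (-a)*b + (-c) as one fused op:
        // negation is exact, so the single rounding step matches.
        assert(-std::fma(a, b, c) == std::fma(-a, b, -c));
  return 0;
}
```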
-
Dean Michael Berris authored
Summary:
This change promotes the 'isTailCall(...)' member function to TargetInstrInfo as a query interface for determining on a per-target basis whether a given MachineInstr is a tail call instruction. We build upon this in the XRay instrumentation pass to emit special sleds for tail call optimisations, where we emit the correct kind of sled.

The tail call sleds look like a mix between the function entry and function exit sleds. Form-wise, the sled comes before the "jmp" instruction that implements the tail call similar to how we do it for the function entry sled. Functionally, because we know this is a tail call, it behaves much like an exit sled -- i.e. at runtime we may use the exit trampolines instead of a different kind of trampoline.

A follow-up change to recognise these sleds will be done in compiler-rt, so that we can start intercepting these initially as exits, but also have the option to have different log entries to more accurately reflect that this is actually a tail call.

Reviewers: echristo, rSerge, majnemer
Subscribers: mehdi_amini, dberris, llvm-commits
Differential Revision: https://reviews.llvm.org/D23986
llvm-svn: 280334
-
- Aug 31, 2016
-
Tim Northover authored
More preparation for dropping source types from MachineInstrs: registers coming out of already-selected code (i.e. non-generic instructions) don't have a type, but that information is needed, so we must add it manually. This is done via a new G_TYPE instruction. llvm-svn: 280292
-
Philip Reames authored
This is a first step towards supporting deopt value lowering and reporting entirely with the register allocator. I hope to build on this in the near future to support live-on-return semantics, but I have a use case which allows me to test and investigate code quality with just the live-in semantics so I've chosen to start there. For those curious, my use case is our implementation of the "__llvm_deoptimize" function we bind to @llvm.deoptimize. I'm choosing not to hard code that fact in the patch and instead make it configurable via function attributes.

The basic approach here is modelled on what is done for the "Live In" values on stackmaps and patchpoints. (A secondary goal here is to remove one of the last barriers to merging the pseudo instructions.) We start by adding the operands directly to the STATEPOINT SDNode. Once we've lowered to MI, we extend the remat logic used by the register allocator to fold virtual register uses into StackMap::Indirect entries as needed. This does rely on the fact that the register allocator rematerializes. If it didn't along some code path, we could end up with more vregs than physical registers and fail to allocate.

Today, we *only* fold in the register allocator. This can create some weird effects when combined with arguments passed on the stack because we don't fold them appropriately. I have an idea how to fix that, but it needs this patch in place to work on that effectively. (There's some weird interaction with the scheduler as well, more investigation needed.)

My near term plan is to land this patch off-by-default, experiment in my local tree to identify any correctness issues and then start fixing codegen problems one by one as I find them. Once I have the live-in lowering fully working (both correctness and code quality), I'm hoping to move on to the live-on-return semantics.

Note: I don't have any *known* miscompiles with this patch enabled, but I'm pretty sure I'll find at least a couple. Thus, the "experimental" tag and the fact it's off by default.

Differential Revision: https://reviews.llvm.org/D24000
llvm-svn: 280250
-
Simon Pilgrim authored
Associate x86_sse2_cvtpd2ps with X86ISD::VFPROUND to avoid inserting unnecessary zeroing shuffles. Differential Revision: https://reviews.llvm.org/D23797 llvm-svn: 280249
-
Simon Pilgrim authored
Add patterns to avoid inserting unnecessary zeroing shuffles when lowering fptrunc to (v)cvtpd2ps Differential Revision: https://reviews.llvm.org/D23797 llvm-svn: 280214
-
Craig Topper authored
This is needed in order to replace the masked floating point logical op intrinsics with native IR. llvm-svn: 280195
-
Craig Topper authored
[AVX-512] Add test cases for masked floating point logic operations with bitcasts between the logic ops and the select. We don't currently select masked operations for these cases. Test cases taken from optimized clang output after trying to convert the masked floating point logical op intrinsics to native IR. llvm-svn: 280194
-
Craig Topper authored
llvm-svn: 280193
-
Dean Michael Berris authored
Add a .mir test to catch this case, and fix the xray-instrumentation pass to handle it appropriately. llvm-svn: 280192
-
- Aug 29, 2016
-
Reid Kleckner authored
We should revert this change once we drop support for MSVC 2013. llvm-svn: 279979
-
Sanjay Patel authored
Assuming the default FP env, we should not treat fdiv and frem any differently in terms of trapping behavior than any other FP op. I.e., FP ops do not trap with the default FP env. This matches how we treat these ops in IR with isSafeToSpeculativelyExecute(). There's a similar bug in Constant::canTrap(). This bug manifests in PR29114: https://llvm.org/bugs/show_bug.cgi?id=29114 ...as a sequence of scalar divisions instead of a vector division on x86 for a <3 x float> type. Differential Revision: https://reviews.llvm.org/D23974 llvm-svn: 279970
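A quick standalone check of the premise: under the default FP environment, dividing by zero raises a status flag and produces inf, but it does not trap.

```cpp
#include <cfenv>
#include <cmath>
#include <cstdio>

int main() {
  std::feclearexcept(FE_ALL_EXCEPT);
  volatile float x = 1.0f, y = 0.0f;  // volatile keeps the division at run time
  float q = x / y;                    // default FP env: no trap, result is +inf
  std::printf("q = %f, isinf = %d, FE_DIVBYZERO = %d\n",
              q, std::isinf(q) ? 1 : 0,
              std::fetestexcept(FE_DIVBYZERO) ? 1 : 0);
  return 0;
}
```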
-
Krzysztof Parzyszek authored
MRI::getMaxLaneMaskForVReg does not always cover the whole register. For example, on X86 the upper 16 bits of EAX cannot be accessed via any subregister. Consequently, there is no lane mask that covers only that part of EAX. getMaxLaneMaskForVReg will return the union of the lane masks for all subregisters, and in the case of EAX, that union will not cover the upper 16 bits. This fixes https://llvm.org/bugs/show_bug.cgi?id=29132 llvm-svn: 279969
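A bit-level analogy (lane masks in LLVM are per-subregister-lane, not per-bit, so this is only a conceptual model): the union of EAX's addressable sub-registers never reaches the upper 16 bits.

```cpp
#include <cassert>
#include <cstdint>

int main() {
  // Model EAX as 32 bits. Its addressable sub-registers are AX (bits 0-15),
  // AL (bits 0-7) and AH (bits 8-15); no sub-register covers bits 16-31.
  const uint32_t AX = 0x0000FFFFu, AL = 0x000000FFu, AH = 0x0000FF00u;
  const uint32_t unionOfSubRegs = AX | AL | AH; // analogue of the "max lane mask"
  assert(unionOfSubRegs == 0x0000FFFFu);        // upper 16 bits are not covered
  return 0;
}
```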
-
Igor Breger authored
The problem occurs when the node is not updated in place and UpdateNodeOperation() returns a node that already exists. In this case an assert fails in PromoteIntegerOperand(), since N has 2 results (val + chain). Differential Revision: http://reviews.llvm.org/D23756 llvm-svn: 279961
-
Igor Breger authored
Differential Revision: http://reviews.llvm.org/D23490 llvm-svn: 279960
-
Craig Topper authored
[X86] Don't lower FABS/FNEG masking directly to a ConstantPool load. Just create a ConstantFPSDNode and let that be lowered. This allows broadcast loads to be used when available. llvm-svn: 279958
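The constant in question is the usual sign-bit mask; below is a standalone sketch of the bit trick itself (not the backend lowering code).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// FABS clears the sign bit, FNEG flips it; x86 lowers these to AND/XOR with a
// mask constant, which is why how that constant is materialized matters.
float fabsViaMask(float x) {
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  bits &= 0x7FFFFFFFu;                 // clear the sign bit
  std::memcpy(&x, &bits, sizeof bits);
  return x;
}

float fnegViaMask(float x) {
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  bits ^= 0x80000000u;                 // flip the sign bit
  std::memcpy(&x, &bits, sizeof bits);
  return x;
}

int main() {
  assert(fabsViaMask(-3.5f) == 3.5f && fabsViaMask(2.0f) == 2.0f);
  assert(fnegViaMask(4.25f) == -4.25f);
  return 0;
}
```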
-
Craig Topper authored
llvm-svn: 279956
-