- Oct 24, 2016
-
-
Ehsan Amiri authored
https://reviews.llvm.org/D23614 Currently we load +0.0 from the constant area. This changes that to be generated with an XOR instruction instead. llvm-svn: 284995
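A minimal C++ illustration of the kind of code affected (not taken from the patch itself): the constant +0.0 no longer has to be loaded from memory and can instead be materialized by zeroing a register.

```cpp
// Illustrative only: returning +0.0 previously required a load from the
// constant area; after this change the backend can produce it with an
// XOR (register-zeroing) instruction instead of a memory access.
double positiveZero() {
  return +0.0;
}
```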
-
Eli Friedman authored
The optimization has correctness issues, so reverting for now to fix tests on thumb1 targets. llvm-svn: 284993
-
Evandro Menezes authored
Add support for estimating the square root or its reciprocal, and division or its reciprocal, using the combiner's generic Newton series. Differential revision: https://reviews.llvm.org/D25291 llvm-svn: 284986
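For reference, a short C++ sketch of the standard Newton-Raphson refinement steps behind such estimates (illustrative code, not taken from the patch):

```cpp
// One Newton-Raphson refinement step for the reciprocal 1/a,
// starting from an estimate x:  x' = x * (2 - a * x).
double refineRecip(double a, double x) {
  return x * (2.0 - a * x);
}

// One refinement step for the reciprocal square root 1/sqrt(a),
// starting from an estimate x:  x' = x * (1.5 - 0.5 * a * x * x).
double refineRsqrt(double a, double x) {
  return x * (1.5 - 0.5 * a * x * x);
}

// Division a/b can then be approximated as a * refined(1/b), and sqrt(a)
// as a * refined(1/sqrt(a)), with each extra step improving accuracy.
```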
-
Ehsan Amiri authored
https://reviews.llvm.org/D24924 This improves the code generated for a sequence of AND, ANY_EXT, SRL instructions. This is a targeted fix for this special pattern; the pattern is generated by the target-independent DAG combiner, so a more general fix may not be necessary. If we come across other similar cases, some ideas for handling them are discussed in the code review. llvm-svn: 284983
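A hedged C++ example of a source shape that can produce such a node sequence (hypothetical, for illustration only; the function name is made up):

```cpp
#include <cstdint>

// Masking a 32-bit value, widening it, and shifting the widened result can
// turn into an AND followed by an extension and an SRL in the DAG.
uint64_t extractField(uint32_t x) {
  uint64_t wide = x & 0xFF00u;  // AND, then extension to 64 bits
  return wide >> 8;             // SRL on the extended value
}
```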
-
Nicolai Haehnle authored
Summary: The v_movreld machine instruction is used with three operands that are in a sense tied to each other (the explicit VGPR_32 def and the implicit VGPR_NN def and use). There is no way to express that using the currently available operand bits, and indeed there are cases where the Two-Address Instructions pass does the wrong thing. This patch introduces a new set of pseudo instructions that are identical in intended semantics to v_movreld, but have only two tied operands. Having to add a new set of pseudo instructions is admittedly annoying, but it's a fairly straightforward and solid approach. The only alternative I see is to try to teach the Two-Address Instructions pass about three-address instructions, and I'm afraid that's trickier and is going to end up more fragile. Note that v_movrels does not suffer from this problem, and so this patch does not touch it. This fixes several GL45-CTS.shaders.indexing.* tests. Reviewers: tstellarAMD, arsenm Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25633 llvm-svn: 284980
-
Pavel Labath authored
llvm-svn: 284975
-
Joel Jones authored
Summary: Add relocations for AArch64 ILP32. Includes: - Addition of definitions for R_AARCH32_* - Definition of new -target-abi: ilp32 - Definition of data layout string - Tests for added relocations. Not comprehensive, but matches existing tests for 64-bit. Renames "CHECK-OBJ" to "CHECK-OBJ-LP64". - Tests for llvm-readobj Reviewers: zatrazz, peter.smith, echristo, t.p.northover Subscribers: aemerson, rengolin, mehdi_amini Differential Revision: https://reviews.llvm.org/D25159 llvm-svn: 284973
-
Krzysztof Parzyszek authored
llvm-svn: 284972
-
Simon Dardis authored
Add synci to the microMIPS instruction definitions and mark the MIPS sync & synci instructions as not being part of microMIPS. This does not cover the sync instruction alias, which will be handled in a different patch. Add sync to the valid tests for microMIPS. Reviewers: vkalintiris Differential Revision: https://reviews.llvm.org/D25795 llvm-svn: 284962
-
Craig Topper authored
A Clang patch to replace the 512-bit vector and 64-bit element versions with native IR will follow. llvm-svn: 284955
-
- Oct 23, 2016
-
-
Simon Pilgrim authored
We were defaulting to SSE2 costs, which didn't take into account the availability of PBLENDW/PBLENDVB to improve the merging of per-element shift results. llvm-svn: 284939
-
Simon Pilgrim authored
More obvious implementation and faster too. llvm-svn: 284937
-
Dylan McKay authored
This adds a super basic implementation of a machine code disassembler. It doesn't support any operands with custom encoding. llvm-svn: 284930
-
- Oct 22, 2016
-
-
Simon Pilgrim authored
llvm-svn: 284922
-
Simon Pilgrim authored
llvm-svn: 284921
-
James Molloy authored
tPCRelJT may not be the first instruction in a block. Check that instead of dereferencing a broken iterator. llvm-svn: 284917
-
Craig Topper authored
llvm-svn: 284915
-
Craig Topper authored
[X86] Add support for lowering v4i64 and v8i64 shuffles directly to PALIGNR. I think shuffle combine can figure it out later, but we should try to get it right up front. llvm-svn: 284914
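As an illustration of why such shuffles map onto PALIGNR (intrinsics sketch, not code from the patch): swapping the two 64-bit elements within each 128-bit lane is exactly a per-lane byte rotation.

```cpp
#include <immintrin.h>

// Requires AVX2. Concatenating each 128-bit lane with itself and shifting
// right by 8 bytes swaps the two i64 elements per lane -- the <1,0,3,2>
// v4i64 shuffle expressed as a byte rotate (VPALIGNR).
__m256i swapLaneHalves(__m256i a) {
  return _mm256_alignr_epi8(a, a, 8);
}
```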
-
Craig Topper authored
[X86] Remove unnecessary AVX2 check that was already covered by an assertion earlier in the function. NFC llvm-svn: 284913
-
Craig Topper authored
[X86] Remove 128-bit lane handling from the main loop of matchVectorShuffleAsByteRotate. Instead, check is128BitLaneRepeatedShuffleMask before the loop and just loop over the repeated mask. I plan to use the loop to support VALIGND/Q shuffles, so this makes it easier to reuse. llvm-svn: 284912
-
Simon Pilgrim authored
llvm-svn: 284911
-
Konstantin Zhuravlyov authored
This will prevent the following regression when enabling i16 support (D18049): test/CodeGen/AMDGPU/cvt_f32_ubyte.ll Differential Revision: https://reviews.llvm.org/D25805 llvm-svn: 284891
-
- Oct 21, 2016
-
-
Tom Stellard authored
Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25782 llvm-svn: 284875
-
Peter Collingbourne authored
If a 64-bit value is tested against a bit which is known to be in the range [0..31) (modulo 64), we can use the 32-bit BT instruction, which has a slightly shorter encoding. Differential Revision: https://reviews.llvm.org/D25862 llvm-svn: 284864
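Illustrative C++ only (not LLVM source): when the bit index is provably within the low word, a test like the one below is the kind of pattern that can now select the shorter 32-bit BT encoding.

```cpp
#include <cstdint>

// The mask makes the bit index provably small, so testing the 64-bit value
// can use the 32-bit form of BT, which has a slightly shorter encoding.
bool testLowBit(uint64_t value, unsigned index) {
  unsigned idx = index & 31;
  return (value >> idx) & 1;
}
```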
-
Simon Pilgrim authored
llvm-svn: 284863
-
Simon Pilgrim authored
llvm-svn: 284860
-
Simon Pilgrim authored
[X86][AVX512] Added support for combining target shuffles to AVX512 vpermpd/vpermq/vpermps/vpermd/vpermw llvm-svn: 284858
-
Krzysztof Parzyszek authored
llvm-svn: 284857
-
Krzysztof Parzyszek authored
After register allocation it is possible to have a spill of a register that is only partially defined. That in itself is fine, but it creates a problem for double vector registers. Stores of such registers are pseudo instructions that are expanded into pairs of individual vector stores, and in the case of a partially defined source, one of the stores may use an entirely undefined register. To avoid this, track the defined parts and only generate actual stores for those. llvm-svn: 284841
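A minimal, self-contained C++ sketch of the idea (hypothetical names, not the actual Hexagon backend code): the expansion consults a mask of defined halves and emits only the stores whose sources are actually defined.

```cpp
#include <cstdio>

enum Half { LoHalf = 1 << 0, HiHalf = 1 << 1 };

// Hypothetical expansion of a double-vector-register store: 'definedMask'
// records which halves are defined at the spill point, and only those
// halves get a real store, so no store reads an undefined register.
void expandDoubleVectorStore(unsigned definedMask) {
  if (definedMask & LoHalf)
    std::puts("  store low half");
  if (definedMask & HiHalf)
    std::puts("  store high half");
}

int main() {
  expandDoubleVectorStore(LoHalf);           // partially defined: one store
  expandDoubleVectorStore(LoHalf | HiHalf);  // fully defined: both stores
}
```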
-
Derek Schuff authored
Summary: The operands need to be reordered so that the callee is the last argument. Adds a pseudo-instruction and a pass to lower it into a real call_indirect. This is the first of two options for fixing the problem. Reviewers: dschuff, sunfish Subscribers: jfb, beanz, mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D25708 llvm-svn: 284840
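A tiny C++ sketch of the reordering itself (hypothetical helper, not the actual WebAssembly backend code): the pseudo-instruction carries the callee first, and lowering rotates it to the end so it becomes the last operand of the real call_indirect.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// { callee, arg0, arg1, ... }  ->  { arg0, arg1, ..., callee }
std::vector<std::string> reorderCalleeLast(std::vector<std::string> ops) {
  if (!ops.empty())
    std::rotate(ops.begin(), ops.begin() + 1, ops.end());
  return ops;
}
```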
-
Abderrazek Zaafrani authored
llvm-svn: 284839
-
Simon Pilgrim authored
llvm-svn: 284835
-
Abderrazek Zaafrani authored
llvm-svn: 284832
-
Artem Tamazov authored
Fixes Bug 28215. Lit tests updated. Differential Revision: https://reviews.llvm.org/D25837 llvm-svn: 284825
-
Simon Pilgrim authored
llvm-svn: 284823
-
Simon Pilgrim authored
llvm-svn: 284821
-
Bjorn Pettersson authored
Summary: The spill size was incorrectly set to 196 bits, which isn't a multiple of 8. This problem was detected when experimenting with asserts that the spill size should be a multiple of the byte size. The corrected spill size is 192 bits. Note that tablegen (RegisterInfoEmitter) divides the size set in the RegisterClass definition by 8, so this change should not have any impact on the tablegen output (trunc(192/8) == trunc(196/8) == 24 bytes). Reviewers: t.p.northover Subscribers: llvm-commits, aemerson, rengolin Differential Revision: https://reviews.llvm.org/D25818 llvm-svn: 284814
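A quick check of the arithmetic in the note above (plain C++, just to make the truncation explicit):

```cpp
// RegisterInfoEmitter divides the RegisterClass size by 8; integer division
// truncates, so the old and corrected values emit the same byte count.
static_assert(196 / 8 == 24, "old 196-bit value truncates to 24 bytes");
static_assert(192 / 8 == 24, "corrected 192-bit value is exactly 24 bytes");
```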
-
- Oct 20, 2016
-
-
Michael Kuperstein authored
This lets the loop vectorizer generate interleaved memory accesses on x86. Differential Revision: https://reviews.llvm.org/D25350 llvm-svn: 284779
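Illustrative C++ only: a stride-2 loop of the kind whose loads the vectorizer can now turn into interleaved memory accesses on x86.

```cpp
// Each iteration reads two adjacent (interleaved) elements; the vectorizer
// can combine these strided loads into wide interleaved accesses.
void sumPairs(const float *in, float *out, int n) {
  for (int i = 0; i < n; ++i)
    out[i] = in[2 * i] + in[2 * i + 1];
}
```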
-
Konstantin Zhuravlyov authored
Differential Revision: https://reviews.llvm.org/D25746 llvm-svn: 284760
-
Konstantin Zhuravlyov authored
[AMDGPU] Emit constant address space data in .rodata section and use relocations instead of fixups (amdhsa only) Differential Revision: https://reviews.llvm.org/D25693 llvm-svn: 284759
-