- Apr 04, 2018
-
-
Lei Huang authored
Legalize and emit code for the following quad-precision fma: * xsmaddqp * xsnmaddqp * xsmsubqp * xsnmsubqp Differential Revision: https://reviews.llvm.org/D44843 llvm-svn: 329206
-
Dmitry Preobrazhensky authored
See bug 36958: https://bugs.llvm.org/show_bug.cgi?id=36958 Differential Revision: https://reviews.llvm.org/D45099 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329197
-
Dmitry Preobrazhensky authored
See bug 35999: https://bugs.llvm.org/show_bug.cgi?id=35999 Differential Revision: https://reviews.llvm.org/D45084 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329187
-
Nico Weber authored
Makes it easier to see mistakes such as the one fixed in r329178 and makes the different target CMakeLists more consistent. Also remove some stale-looking comments from the Nios2 target cmakefile. No intended behavior change. llvm-svn: 329181
-
Nico Weber authored
They were added in r285274, in what looks like a merge mishap. AVRGenMCCodeEmitter.inc is the only non-dupe tablegen invocation added in that revision. Also sort the tablegen lines to make this easier to spot in the future. llvm-svn: 329178
-
Benjamin Kramer authored
llvm-svn: 329170
-
Nicolai Haehnle authored
Summary: These new image intrinsics contain the texture type as part of their name and have each component of the address/coordinate as individual parameters. This is a preparatory step for implementing the A16 feature, where coordinates are passed as half-floats or -ints, but the Z compare value and texel offsets are still full dwords, making it difficult or impossible to distinguish between A16 on or off in the old-style intrinsics. Additionally, these intrinsics pass the 'texfailpolicy' and 'cachectrl' as i32 bit fields to reduce operand clutter and allow for future extensibility. v2: - gather4 supports 2darray images - fix a bug with 1D images on SI Change-Id: I099f309e0a394082a5901ea196c3967afb867f04 Reviewers: arsenm, rampitec, b-sumner Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D44939 llvm-svn: 329166
-
Nicolai Haehnle authored
Summary: When an i1-value is defined inside of a loop and used outside of it, we cannot simply use the SGPR bitmask from the loop's last iteration. There are also useful and correct cases of an i1-value being copied between basic blocks, e.g. when a condition is computed outside of a loop and used inside it. The concept of dominators is not sufficient to capture what is going on, so I propose the notion of "lane-dominators". Fixes a bug encountered in Nier: Automata. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103743 Change-Id: If37b969ddc71d823ab3004aeafb9ea050e45bd9a Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D40547 llvm-svn: 329164
-
John Brawn authored
Differential Revision: https://reviews.llvm.org/D44573 llvm-svn: 329163
-
Mikhail Maltsev authored
Summary: Patch https://reviews.llvm.org/D44467 implements conversion of invalid vmov instructions into valid ones. It turned out that some valid instructions also get converted, for example vmov.i64 d2, #0xff00ff00ff00ff00 -> vmov.i16 d2, #0xff00 Such behavior is incorrect because according to the ARM ARM section F2.7.7 Modified immediate constants in T32 and A32 Advanced SIMD instructions, "On assembly, the data type must be matched in the table if possible." This patch fixes the isNEONmovReplicate check so that the above instruction is not modified any more. Reviewers: rengolin, olista01 Reviewed By: rengolin Subscribers: javed.absar, kristof.beyls, rogfer01, llvm-commits Differential Revision: https://reviews.llvm.org/D44678 llvm-svn: 329158
-
Craig Topper authored
These both use a 16-bit load, but one used loadi16_anyext and the other used extloadi32i16. The only difference between them is that loadi16_anyext checked that the load was at least 2 byte aligned and non-volatile. But the alignment doesn't matter here. Just use extloadi32i16 for both. llvm-svn: 329154
-
Craig Topper authored
llvm-svn: 329153
-
Craig Topper authored
[X86] Remove more dead code left over from the handling of i8/i16 UMUL_LOHI/SMUL_LOHI that is no longer needed. NFC llvm-svn: 329152
-
Craig Topper authored
These are promoted to i16/i32 multiplies by a DAG combine. llvm-svn: 329147
-
Craig Topper authored
llvm-svn: 329146
-
Vlad Tsyrklevich authored
llvm-svn: 329140
-
Vlad Tsyrklevich authored
Summary: The ShadowCallStack pass instruments functions marked with the shadowcallstack attribute. The instrumented prolog saves the return address to [gs:offset] where offset is stored and updated in [gs:0]. The instrumented epilog loads/updates the return address from [gs:0] and checks that it matches the return address on the stack before returning. Reviewers: pcc, vitalybuka Reviewed By: pcc Subscribers: cryptoad, eugenis, craig.topper, mgorny, llvm-commits, kcc Differential Revision: https://reviews.llvm.org/D44802 llvm-svn: 329139
-
Jessica Paquette authored
This commit is similar to r329120, but uses the existing getUsesRedZone() function in X86MachineFunctionInfo. This teaches the outliner to look at whether or not a function *truly* uses a redzone instead of just the noredzone attribute on a function. Thus, after this commit, it's possible to outline from x86 without using -mno-red-zone and still get outlining results. This also adds a new test for the new redzone behaviour. llvm-svn: 329134
-
Farhana Aleen authored
Summary: There are no packed instructions for min3 or max3. So, performMinMaxCombine should not optimize vectors of f16 to min3/max3. Author: FarhanaAleen Reviewed By: arsenm Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D45219 llvm-svn: 329131
-
Evandro Menezes authored
Fix typo and simplify matching expression. llvm-svn: 329130
-
Ikhlas Ajbar authored
Move the check canPeel() to Hexagon Target before setting PeelCount. Differential Revision: https://reviews.llvm.org/D44880 llvm-svn: 329129
-
- Apr 03, 2018
-
-
Jessica Paquette authored
This patch adds a hasRedZone() function to AArch64MachineFunctionInfo. It returns true if the function is known to use a redzone, false if it is known to not use a redzone, and no value otherwise. This removes the requirement to pass -mno-red-zone when outlining for AArch64. https://reviews.llvm.org/D45189 llvm-svn: 329120
-
Farhana Aleen authored
This reverts commit 9a0ce889d1c39c74d69ecad5ce9c875155ae55de. This was committed by mistake. llvm-svn: 329119
-
Farhana Aleen authored
llvm-svn: 329114
-
Jun Bum Lim authored
Summary: This change declare that PostRAMachineSinking and ShrinkWrap require NoVRegs property, so now the MachineFunctionPass can enforce this check. These passes are disabled in NVPTX & WebAssembly. Reviewers: dschuff, jlebar, tra, jgravelle-google, MatzeB, sebpop, thegameg, mcrosier Reviewed By: dschuff, thegameg Subscribers: jholewinski, jfb, sbc100, aheejin, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D45183 llvm-svn: 329095
-
Krzysztof Parzyszek authored
Specifying the HVX vector length should be done via the -mhvx-length option. llvm-svn: 329079
-
Andrea Di Biagio authored
[MC][Tablegen] Allow the definition of processor register files in the scheduling model for llvm-mca This patch allows the description of register files in processor scheduling models. This addresses PR36662. A new tablegen class named 'RegisterFile' has been added to TargetSchedule.td. Targets can optionally describe register files for their processors using that class. In particular, class RegisterFile allows to specify: - The total number of physical registers. - Which target registers are accessible through the register file. - The cost of allocating a register at register renaming stage. Example (from this patch - see file X86/X86ScheduleBtVer2.td) def FpuPRF : RegisterFile<72, [VR64, VR128, VR256], [1, 1, 2]> Here, FpuPRF describes a register file for MMX/XMM/YMM registers. On Jaguar (btver2), a YMM register definition consumes 2 physical registers, while MMX/XMM register definitions only cost 1 physical register. The syntax allows to specify an empty set of register classes. An empty set of register classes means: this register file models all the registers specified by the Target. For each register class, users can specify an optional register cost. By default, register costs default to 1. A value of 0 for the number of physical registers means: "this register file has an unbounded number of physical registers". This patch is structured in two parts. * Part 1 - MC/Tablegen * A first part adds the tablegen definition of RegisterFile, and teaches the SubtargetEmitter how to emit information related to register files. Information about register files is accessible through an instance of MCExtraProcessorInfo. The idea behind this design is to logically partition the processor description which is only used by external tools (like llvm-mca) from the processor information used by the llvm machine schedulers. I think that this design would make easier for targets to get rid of the extra processor information if they don't want it. * Part 2 - llvm-mca related * The second part of this patch is related to changes to llvm-mca. The main differences are: 1) class RegisterFile now needs to take into account the "cost of a register" when allocating physical registers at register renaming stage. 2) Point 1. triggered a minor refactoring which lef to the removal of the "maximum 32 register files" restriction. 3) The BackendStatistics view has been updated so that we can print out extra details related to each register file implemented by the processor. The effect of point 3. is also visible in tests register-files-[1..5].s. Differential Revision: https://reviews.llvm.org/D44980 llvm-svn: 329067
-
Hiroshi Inoue authored
Reorder entries added in my previous commit (rL328969) to keep alphabetical order. llvm-svn: 329064
-
Craig Topper authored
TSFlag doesn't need to disambiguate NoPrfx from PS. So shift the encodings so PS is NoPrfx|0x4. llvm-svn: 329049
-
Yonghong Song authored
Commit 37962a331c77 ("bpf: Improve expanding logic in LowerSELECT_CC") intended to improve code quality for certain jmp conditions. The commit, however, has a couple of issues: (1). In code, just swap is not enough, ConditionalCode CC should also be swapped, otherwise incorrect code will be generated. (2). The ConditionalCode swap should be subject to getHasJmpExt(). If getHasJmpExt() is False, certain conditional codes will not be supported and swap may generate incorrect code. The original goal for this patch is to optimize jmp operations which does not have JmpExt turned on. If JmpExt is on, better code could be generated. For example, the test select_ri.ll is introduced to demonstrate the optimization. The same result can be achieved with -mcpu=v2 flag. Signed-off-by:
Yonghong Song <yhs@fb.com> Acked-by:
Alexei Starovoitov <ast@kernel.org> llvm-svn: 329043
-
Ikhlas Ajbar authored
For Hexagon, peeling loops with small runtime trip count is beneficial for our benchmarks. We set PeelCount in HexagonTargetInfo.cpp and we use PeelCount set by the target for computing the desired peel count. Differential Revision: https://reviews.llvm.org/D44880 llvm-svn: 329042
-
- Apr 02, 2018
-
-
Dmitry Preobrazhensky authored
See bug 36847: https://bugs.llvm.org/show_bug.cgi?id=36847 Differential Revision: https://reviews.llvm.org/D45097 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 328988
-
Dmitry Preobrazhensky authored
Fixed a bug which caused Tablegen crash. See bug 36837: https://bugs.llvm.org/show_bug.cgi?id=36837 Differential Revision: https://reviews.llvm.org/D45085 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 328983
-
Krzysztof Parzyszek authored
llvm-svn: 328981
-
Nico Weber authored
llvm-svn: 328978
-
Dmitry Preobrazhensky authored
See bug 36837: https://bugs.llvm.org/show_bug.cgi?id=36837 Differential Revision: https://reviews.llvm.org/D45085 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 328975
-
Lama Saba authored
If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load, This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory. A "store forward block" occurs in cases that a store cannot be forwarded to the load. The most typical case of store forward block on Intel Core microarchiticutre that a small store cannot be forwarded to a large load. The estimated penalty for a store forward block is ~13 cycles. This pass tries to recognize and handle cases where "store forward block" is created by the compiler when lowering memcpy calls to a sequence of a load and a store. The pass currently only handles cases where memcpy is lowered to XMM/YMM registers, it tries to break the memcpy into smaller copies. breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM. Differential revision: https://reviews.llvm.org/D41330 Change-Id: Ib48836ccdf6005989f7d4466fa2035b7b04415d9 llvm-svn: 328973
-
Hiroshi Inoue authored
This patch adds L(D|W|H|B)XTLS instructions introduced by https://reviews.llvm.org/rL327635 in P9InstrResources.td. llvm-svn: 328969
-
Craig Topper authored
[X86][Silvermont] Use correct latency and throughput information for divide and square root in the scheduler model. Data taken from Table 16-17 in the Intel Optimization Manual. llvm-svn: 328962
-
Craig Topper authored
Data taken from the AVX512_SKX_PortAssign spreadsheet at http://instlatx64.atw.hu/ llvm-svn: 328961
-