- Jun 12, 2017
-
-
Tony Jiang authored
Power9 has instructions that will reverse the bytes within an element for all sizes (half-word, word, double-word and quad-word). These can be used for the vec_revb builtins in altivec.h. However, we implement these to match vector shuffle nodes as that will cover both the builtins and vector shuffles that occur in the SDAG through other means. Differential Revision: https://reviews.llvm.org/D33690 llvm-svn: 305214
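As a rough illustration of why matching shuffle nodes also covers the builtins: reversing the bytes within each element of a 16-byte vector is a fixed single-input byte shuffle. The sketch below is plain Python, not LLVM code. The helper names `revb_mask` and `shuffle_bytes` are hypothetical, and byte positions are simply list indices, with no particular endianness implied.

```python
VECTOR_BYTES = 16  # a 128-bit vector register


def revb_mask(elem_bytes):
    """Shuffle mask that reverses the bytes inside each element.

    elem_bytes is 2, 4, 8 or 16 (half-word, word, double-word, quad-word).
    """
    if elem_bytes not in (2, 4, 8, 16):
        raise ValueError("unsupported element size")
    mask = []
    for base in range(0, VECTOR_BYTES, elem_bytes):
        # Within one element, byte i is taken from byte (elem_bytes - 1 - i).
        mask.extend(base + elem_bytes - 1 - i for i in range(elem_bytes))
    return mask


def shuffle_bytes(vec, mask):
    """Apply a single-input byte shuffle: result[i] = vec[mask[i]]."""
    return [vec[m] for m in mask]


vec = list(range(VECTOR_BYTES))
word_reversed = shuffle_bytes(vec, revb_mask(4))
```

Any shuffle whose mask equals one of these four patterns could, under this model, be served by the corresponding byte-reverse instruction, whether it came from `vec_revb` or from elsewhere.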
-
Tony Jiang authored
Note that if we need the result of both the divide and the modulo then we compute the modulo based on the result of the divide and not using the new hardware instruction. Commit on behalf of STEFAN PINTILIE. Differential Revision: https://reviews.llvm.org/D33940 llvm-svn: 305210
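The note about needing both results relies on the identity `rem = a - (a / b) * b` for truncating division. A minimal Python sketch of that identity follows; the function names are hypothetical, and `trunc_div` exists only because Python's `//` rounds toward negative infinity rather than toward zero.

```python
def trunc_div(a, b):
    """Signed division rounding toward zero, as C and LLVM's sdiv do."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def div_and_mod(a, b):
    """Return (quotient, remainder) using one divide, one multiply, one subtract."""
    q = trunc_div(a, b)
    r = a - q * b  # remainder derived from the quotient, no second divide
    return q, r
```

When the quotient is already available, the multiply and subtract replace a second division-class operation.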
-
Matt Arsenault authored
For the last component, the same register use was added as an implicit use and another implicit kill use. llvm-svn: 305205
-
Geoff Berry authored
Summary: This change enables the sin(x) cos(x) -> sincos(x) optimization on GNU target triples. This optimization was being inhibited when -ffast-math wasn't set because sincos in GLibC does not set errno, while sin and cos do. However, this optimization will only run if the attributes on the sin/cos calls include readnone, which is how clang represents the fact that it doesn't care about the errno values set by these functions (via the -fno-math-errno flag). Reviewers: hfinkel, bogner Subscribers: mcrosier, javed.absar, llvm-commits, paul.redmond Differential Revision: https://reviews.llvm.org/D32921 llvm-svn: 305204
-
Matt Arsenault authored
Also fix reporting r+r as a valid addressing mode without offsets. llvm-svn: 305203
-
Matt Arsenault authored
llvm-svn: 305201
-
Matt Arsenault authored
For convenience the operand is always present in the instruction, but it isn't valid to use except on GFX9. llvm-svn: 305200
-
Haicheng Wu authored
SW prefetch is good for Falkor. Differential Revision: http://reviews.llvm.org/D34084 llvm-svn: 305199
-
Matt Arsenault authored
llvm-svn: 305194
-
Than McIntosh authored
Summary: The old check for slot overlap treated 2 slots `S` and `T` as overlapping if there existed a CFG node in which both of the slots could possibly be active. That is overly conservative and caused stack blowups in Rust programs. Instead, check whether there is a single CFG node in which both of the slots are possibly active *together*. Fixes PR32488. Patch by Ariel Ben-Yehuda <ariel.byd@gmail.com> Reviewers: thanm, nagisa, llvm-commits, efriedma, rnk Reviewed By: thanm Subscribers: dotdash Differential Revision: https://reviews.llvm.org/D31583 llvm-svn: 305193
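The difference between the two overlap checks can be shown with a toy model. This is not how the pass represents liveness; it assumes, purely for illustration, that each slot maps a block name to half-open `(start, end)` ranges of instruction indices in which the slot may be active. All names here are hypothetical.

```python
def overlaps_old(slot_a, slot_b):
    """Old, conservative rule: conflict if both slots may be live in some block."""
    return bool(set(slot_a) & set(slot_b))


def overlaps_new(slot_a, slot_b):
    """Refined rule: conflict only if the live ranges intersect inside one block."""
    for block in set(slot_a) & set(slot_b):
        for a_start, a_end in slot_a[block]:
            for b_start, b_end in slot_b[block]:
                # Half-open intervals intersect when each starts before the other ends.
                if a_start < b_end and b_start < a_end:
                    return True
    return False


# Slot S is live at the start of block "bb1"; slot T only becomes live afterwards.
S = {"bb1": [(0, 4)]}
T = {"bb1": [(4, 9)], "bb2": [(0, 3)]}
```

In this model `S` and `T` share a block, so the old rule keeps them apart, while the refined rule lets them share stack space because they are never active together.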
-
Sanjay Patel authored
This step is just intended to reduce code duplication rather than change any functionality. A follow-up would be to replace PPCTargetLowering::spliceIntoChain() usage with this new helper. Differential Revision: https://reviews.llvm.org/D33649 llvm-svn: 305192
-
Sanjay Patel authored
This is a follow-up to https://reviews.llvm.org/D33879 / https://reviews.llvm.org/rL304939, and was discussed in https://reviews.llvm.org/D33338. We prefer this form because a narrower shift may be cheaper, and we can more easily fold a zext than a sext.
http://rise4fun.com/Alive/slVe
Name: shz
%s = sext i8 %x to i12
%r = lshr i12 %s, 4
=>
%a = ashr i8 %x, 4
%r = zext i8 %a to i12
llvm-svn: 305190
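The i8/i12 instance of this fold is small enough to check exhaustively. The sketch below models the four IR operations on unsigned bit patterns in plain Python. The helper names are hypothetical, and this is a sanity check of the one instance shown, not a replacement for the Alive proof.

```python
def sext(value, from_bits, to_bits):
    """Sign-extend an unsigned bit pattern from from_bits to to_bits."""
    sign = 1 << (from_bits - 1)
    signed = (value ^ sign) - sign
    return signed & ((1 << to_bits) - 1)


def zext(value, from_bits, to_bits):
    """Zero-extend: the bit pattern is unchanged, only the width grows."""
    return value & ((1 << from_bits) - 1)


def lshr(value, amount, bits):
    """Logical shift right: zeros are shifted in from the top."""
    return (value & ((1 << bits) - 1)) >> amount


def ashr(value, amount, bits):
    """Arithmetic shift right: copies of the sign bit are shifted in."""
    sign = 1 << (bits - 1)
    signed = ((value & ((1 << bits) - 1)) ^ sign) - sign
    return (signed >> amount) & ((1 << bits) - 1)


def before(x):
    return lshr(sext(x, 8, 12), 4, 12)


def after(x):
    return zext(ashr(x, 4, 8), 8, 12)


# Collect every i8 input for which the two forms disagree.
mismatches = [x for x in range(256) if before(x) != after(x)]
```

An empty `mismatches` list means the two forms agree on all 256 inputs.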
-
Daniel Neilson authored
Summary: The method TargetTransformInfo::getRegisterBitWidth() is declared const, but the type erasing implementation classes (TargetTransformInfo::Concept & TargetTransformInfo::Model) that were introduced by Chandler in https://reviews.llvm.org/D7293 do not have the method declared const. This is an NFC to tidy up the const consistency between TTI and its implementation. Reviewers: chandlerc, rnk, reames Reviewed By: reames Subscribers: reames, jfb, arsenm, dschuff, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, llvm-commits Differential Revision: https://reviews.llvm.org/D33903 llvm-svn: 305189
-
Simon Pilgrim authored
First possible step towards merging SSE/AVX memory folding pattern fragments. Also allows us to remove the duplicate non-temporal load logic. Differential Revision: https://reviews.llvm.org/D33902 llvm-svn: 305184
-
Craig Topper authored
llvm-svn: 305180
-
Yaron Keren authored
Address http://bugs.llvm.org/pr32207 by making BannerPrinted local to runOnSCC and skipping banner for function declarations. Reviewed By: Mehdi AMINI Differential Revision: https://reviews.llvm.org/D34086 llvm-svn: 305179
-
- Jun 11, 2017
-
-
Sanjay Patel authored
I was looking closer at the x86 test diffs in D33866, and the first change seems like it shouldn't happen in the first place. So this patch will resolve that. Using Agner's tables and AMD docs, vperm2f128 and vinsertf128 have identical timing for any given CPU model, so we should be able to interchange those without affecting perf. But as we can see in some of the diffs here, using vperm2f128 allows load folding, so we should take that opportunity to reduce code size and register pressure. A secondary advantage is making AVX1 and AVX2 codegen more similar. Given that vperm2f128 was introduced with AVX1, we should be selecting it in all of the same situations that we would with AVX2. If there's some reason that an AVX1 CPU would not want to use this instruction, that should be fixed up in a later pass. Differential Revision: https://reviews.llvm.org/D33938 llvm-svn: 305171
-
Xinliang David Li authored
Differential Revision: http://reviews.llvm.org/D33847 llvm-svn: 305170
-
Simon Pilgrim authored
llvm-svn: 305163
-
Amaury Sechet authored
Summary: UADDO has two results, and one must check the result number before doing any kind of combine. Without that check, the transform is invalid. Reviewers: joerg Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34088 llvm-svn: 305162
-
Davide Italiano authored
llvm-svn: 305160
-
- Jun 10, 2017
-
-
David Blaikie authored
llvm-svn: 305152
-
Vedant Kumar authored
lib/Object/WindowsResource.cpp:578:3: runtime error: store to misaligned address 0x7fa09aedebbe for type 'unsigned int', which requires 4 byte alignment 0x7fa09aedebbe: note: pointer points here 00 00 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ^ llvm-svn: 305149
-
Geoff Berry authored
[EarlyCSE] Add option to use MemorySSA for function simplification run of EarlyCSE (off by default). Summary: Use MemorySSA for memory dependency checking in the EarlyCSE pass at the start of the function simplification portion of the pipeline. We rely on the fact that GVNHoist runs just after this pass of EarlyCSE to amortize the MemorySSA construction cost since GVNHoist uses MemorySSA and EarlyCSE preserves it. This is turned off by default. A follow-up change will turn it on to allow for easier reversion in case it breaks something. llvm-svn: 305146
-
Galina Kistanova authored
llvm-svn: 305143
-
Wei Ding authored
Differential Revision: http://reviews.llvm.org/D28531 llvm-svn: 305137
-
Andrew Kaylor authored
Differential Revision: https://reviews.llvm.org/D33737 llvm-svn: 305132
-
Sanjay Patel authored
We're currently passing endian-ness around as a param (and not uniformly), so this eliminates the need for that. I'd like to add a constant fold call too, and that requires a DL. llvm-svn: 305129
-
I-Jui (Ray) Sung authored
Summary: - Fix assertion failures on F16 to/from int types in FastISel by falling back to regular ISel - Add a testcase of various conversion cases with FastISel (-O0) Reviewers: kristof.beyls, jmolloy, SjoerdMeijer Reviewed By: SjoerdMeijer Subscribers: SjoerdMeijer, llvm-commits, srhines, pirama, aemerson, rengolin, javed.absar, kristof.beyls Differential Revision: https://reviews.llvm.org/D33734 llvm-svn: 305127
-
- Jun 09, 2017
-
-
Craig Topper authored
llvm-svn: 305115
-
Craig Topper authored
Previously it was a non-const reference named Result, which would tend to make someone think that it was an outparam when really it's an input. llvm-svn: 305114
-
Zachary Turner authored
llvm-svn: 305108
-
Yaxun Liu authored
Currently there is a bug in SROA::presplitLoadsAndStores which causes an assertion failure in GEPOperator::accumulateConstantOffset. It does not consider the case where the pointer operand of a load or store is in a non-zero address space and its size differs from the size of a pointer in address space 0. This patch fixes the assertion failure when compiling Blender Cycles kernels for the amdgpu backend. Differential Revision: https://reviews.llvm.org/D33298 llvm-svn: 305107
-
Keno Fischer authored
Summary: isSafeToSpeculativelyExecute is the wrong predicate to use here. All it checks is whether it is safe to hoist a value with respect to unaligned/un-dereferenceable accesses. However, not only are we sinking rather than hoisting, our concern is that the location we're loading from may have been modified. Instead, forbid sinking any load across a critical edge. Reviewers: majnemer Subscribers: davide, llvm-commits Differential Revision: https://reviews.llvm.org/D33179 llvm-svn: 305102
-
Stanislav Mekhanoshin authored
Differential Revision: https://reviews.llvm.org/D34046 llvm-svn: 305098
-
Zachary Turner authored
Previously extractors tried to be stateless with any additional context information needed in order to parse items being passed in via the extraction method. This led to quite cumbersome implementation challenges and awkwardness of use. This patch brings back support for stateful extractors, making the implementation and usage simpler. llvm-svn: 305093
-
Eric Beckmann authored
Summary: Add the WindowsResourceCOFFWriter class for producing the final COFF after all parsing is done. Reviewers: hiraditya!, zturner, ruiu Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34020 llvm-svn: 305092
-
Simon Pilgrim authored
If the inputs won't saturate during packing then we can treat the PACKSS as a truncation shuffle llvm-svn: 305091
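The condition "won't saturate" can be made concrete for a single lane. The sketch below assumes the i16-to-i8 form of PACKSS and models one lane in plain Python; the helper names are hypothetical. It checks that signed saturation and plain truncation agree whenever the input already fits in a signed 8-bit value.

```python
def to_signed(value, bits):
    """Interpret an unsigned bit pattern as a two's-complement integer."""
    sign = 1 << (bits - 1)
    return ((value & ((1 << bits) - 1)) ^ sign) - sign


def packss_lane(x):
    """Signed-saturate one i16 lane (given as 0..0xFFFF) to an i8 bit pattern."""
    clamped = max(-128, min(127, to_signed(x, 16)))
    return clamped & 0xFF


def trunc_lane(x):
    """Plain truncation: keep the low 8 bits."""
    return x & 0xFF


def fits_in_i8(x):
    """True when the i16 value is already representable as a signed i8."""
    return -128 <= to_signed(x, 16) <= 127


# Every in-range i16 input for which saturation and truncation differ.
disagreements = [
    x for x in range(1 << 16) if fits_in_i8(x) and packss_lane(x) != trunc_lane(x)
]
```

Outside that range the two differ: 0x0100 (256) saturates to 0x7F but truncates to 0x00, which is why the range check is required before treating the pack as a truncation shuffle.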
-
Craig Topper authored
[LazyValueInfo] Don't run the more complex predicate handling code for EQ and NE in getPredicateResult Summary: Unless I'm mistaken, the special handling for EQ/NE should cover everything and there is no reason to fallthrough to the more complex code. For that matter I'm not sure there's any reason to special case EQ/NE other than avoiding creating temporary ConstantRanges. This patch moves the complex code into an else so we only do it when we are handling a predicate other than EQ/NE. Reviewers: anna, reames, resistor, Farhana Reviewed By: anna Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34000 llvm-svn: 305086
-
Krzysztof Parzyszek authored
- Add some missing patterns. - Use C4_cmplte in branch patterns. - Fix signedness of immediate operand in M2_accii. llvm-svn: 305085
-