- Jul 10, 2018
-
-
Scott Linder authored
Move all metadata construction into AMDGPUHSAMetadataStreamer. Differential Revision: https://reviews.llvm.org/D48176 llvm-svn: 336707
-
Alexander Ivchenko authored
The instruction selection is automatically handled by tablegen llvm-svn: 336703
-
Eugene Leviant authored
This fixes PR38120 llvm-svn: 336702
-
Simon Pilgrim authored
udiv x, -1 was going down the (slow) BuildUDIV route, resulting in unnecessary shifts. llvm-svn: 336701
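For an N-bit unsigned udiv, the divisor -1 is the all-ones value 2^N - 1. The quotient can therefore only be 1 (when x is itself all-ones) or 0, so no magic-number multiply or shifts are needed. Below is a minimal Python sketch of that identity, checked exhaustively for 8-bit values. It models the arithmetic only, not the DAGCombiner code, and both function names are hypothetical.

```python
def udiv_by_all_ones(x: int, bits: int) -> int:
    """Unsigned x / (2**bits - 1), i.e. 'udiv x, -1' on a bits-wide integer."""
    all_ones = (1 << bits) - 1
    return x // all_ones

def udiv_by_all_ones_as_select(x: int, bits: int) -> int:
    """Cheap equivalent: select(x == -1, 1, 0), with no multiply or shifts."""
    all_ones = (1 << bits) - 1
    return 1 if x == all_ones else 0

# Exhaustive check over every 8-bit unsigned value.
for x in range(1 << 8):
    assert udiv_by_all_ones(x, 8) == udiv_by_all_ones_as_select(x, 8)
```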
-
Jonas Devlieghere authored
This reverts r336529 because an alternative approach turned out to be a better fit for dsymutil. llvm-svn: 336698
-
Konstantin Zhuravlyov authored
amdgpu-implicitarg-num-bytes attribute Differential Revision: https://reviews.llvm.org/D49096 llvm-svn: 336697
-
Sanjay Patel authored
This corresponds with the code for the single binop pattern added in rL336684. llvm-svn: 336696
-
Simon Pilgrim authored
Match the tests in combine-sdiv.ll llvm-svn: 336694
-
Ulrich Weigand authored
The llvm_gcov_... routines in compiler-rt are regular C functions that need to be called using the proper C ABI for the target. The current code simply calls them using plain LLVM IR types. Since the types are mostly simple, this happens to just work on certain targets. But other targets still need special handling; in particular, it may be necessary to sign- or zero-extend sub-word values to comply with the ABI. This caused gcov failures on SystemZ in particular. The very same problem was already fixed for the llvm_profile_ calls here: https://reviews.llvm.org/D21736 This patch uses the same method to fix the llvm_gcov_ calls, in particular calls to llvm_gcda_start_file, llvm_gcda_emit_function, and llvm_gcda_emit_arcs. Reviewed By: marco-c Differential Revision: https://reviews.llvm.org/D49134 llvm-svn: 336692
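To see why the extension matters, consider a sub-word value passed in a 64-bit register. SystemZ is one such target, but the register width here is an assumption for illustration. Zero- and sign-extension leave different bits in the upper part of the register. In LLVM IR the caller states which one it performed with the zeroext/signext parameter attributes. The Python helpers below are hypothetical and only model the resulting bit patterns.

```python
def zero_extend(value: int, from_bits: int) -> int:
    """Widen an unsigned from_bits-wide value: all upper bits become 0."""
    return value & ((1 << from_bits) - 1)

def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Widen a two's-complement from_bits-wide value to to_bits:
    the upper bits are copies of the sign bit."""
    value &= (1 << from_bits) - 1
    sign_bit = 1 << (from_bits - 1)
    if value & sign_bit:
        # Set every bit between from_bits and to_bits.
        value |= ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
    return value

# The same 8-bit pattern 0xFF yields two different 64-bit register contents.
print(hex(zero_extend(0xFF, 8)))      # 0xff
print(hex(sign_extend(0xFF, 8, 64)))  # 0xffffffffffffffff
```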
-
Heejin Ahn authored
llvm-svn: 336691
-
Konstantin Zhuravlyov authored
hsa-metadata-enqueu-kernel.ll -> hsa-metadata-enqueue-kernel.ll llvm-svn: 336689
-
Jonas Devlieghere authored
When manually finishing the object writer in dsymutil, it's possible that there are pending labels that haven't been resolved. This results in an assertion when the assembler tries to fixup a label that doesn't have an address yet. Differential revision: https://reviews.llvm.org/D49131 llvm-svn: 336688
-
Paul Robinson authored
llvm-svn: 336687
-
Sanjay Patel authored
This was originally intended with D48893, but as discussed there, we have to make the folds safe from producing extra poison. This should give the single binop folds the same capabilities as the existing folds for 2-binops+shuffle. LLVM binary opcode review: there are a total of 18 binops. There are 7 commutative binops (add, mul, and, or, xor, fadd, fmul) which we already fold. We're able to fold 6 more opcodes with this patch (shl, lshr, ashr, fdiv, udiv, sdiv). There are no folds for srem/urem/frem AFAIK. We don't bother with sub/fsub with constant operand 1 because those are canonicalized to add/fadd. 7 + 6 + 3 + 2 = 18. llvm-svn: 336684
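The opcode arithmetic in this message can be checked mechanically. The Python sketch below lists the 18 binary opcodes of LLVM IR (the LangRef's "binary" and "bitwise binary" operations) and confirms that the four groups in the message cover each one exactly once. The grouping follows the message; the variable names are illustrative.

```python
# The 18 binary opcodes of LLVM IR.
LLVM_BINARY_OPCODES = {
    "add", "fadd", "sub", "fsub", "mul", "fmul",
    "udiv", "sdiv", "fdiv", "urem", "srem", "frem",
    "shl", "lshr", "ashr", "and", "or", "xor",
}

ALREADY_FOLDED = ["add", "mul", "and", "or", "xor", "fadd", "fmul"]  # 7 commutative
NEWLY_FOLDED = ["shl", "lshr", "ashr", "fdiv", "udiv", "sdiv"]       # 6 added here
NOT_FOLDED = ["srem", "urem", "frem"]                                # 3 with no folds
CANONICALIZED = ["sub", "fsub"]  # 2: 'X - C' is canonicalized to 'X + (-C)'

groups = [ALREADY_FOLDED, NEWLY_FOLDED, NOT_FOLDED, CANONICALIZED]
covered = [op for group in groups for op in group]

# 7 + 6 + 3 + 2 = 18, with no opcode counted twice and none missing.
assert [len(g) for g in groups] == [7, 6, 3, 2]
assert len(covered) == len(set(covered)) == 18
assert set(covered) == LLVM_BINARY_OPCODES
```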
-
Rui Ueyama authored
This accessor is useful and could be slightly more efficient than Str.val().data() because you can avoid StringRef instantiation. Differential Revision: https://reviews.llvm.org/D49133 llvm-svn: 336683
-
Krzysztof Parzyszek authored
If a machine function satisfies SSA, the IsSSA property is assumed even if the pass to be executed runs after exiting from SSA. If the pass output then does not conform to SSA, a verifier error will be flagged (with expensive checks enabled). llvm-svn: 336682
-
Paul Robinson authored
debug compilation dir when compiling assembly files with -g. Part of PR38050. Patch by Siddhartha Bagaria! Differential Revision: https://reviews.llvm.org/D48988 llvm-svn: 336680
-
Sanjay Patel authored
llvm-svn: 336679
-
Sander de Smalen authored
This patch adds support for the following instructions:
  CLS  (Count Leading Sign bits)
  CLZ  (Count Leading Zeros)
  CNT  (Count non-zero bits)
  CNOT (Logically invert boolean condition in vector)
  NOT  (Bitwise invert vector)
  FABS (Floating-point absolute value)
  FNEG (Floating-point negate)
All operations are predicated and unary, e.g. clz z0.s, p0/m, z1.s
  - CLS, CLZ, CNT, CNOT and NOT have variants for 8, 16, 32 and 64 bit elements.
  - FABS and FNEG have variants for 16, 32 and 64 bit elements.
llvm-svn: 336677
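The p0/m in the example means merging predication. Lanes whose predicate bit is set receive the result, and inactive lanes keep the old value of the destination register. Below is a lane-level Python sketch of clz z0.s, p0/m, z1.s. For readability it assumes four 32-bit (.s) lanes, whereas the real SVE vector length is implementation-defined. The helper names are hypothetical.

```python
def clz32(x: int) -> int:
    """Count leading zeros of a 32-bit value (32 when x == 0)."""
    x &= 0xFFFFFFFF
    return 32 - x.bit_length()

def sve_unary_merging(op, zd, pg, zn):
    """Model 'op zd.s, pg/m, zn.s': active lanes get op(zn[i]),
    inactive lanes keep the destination's previous zd[i]."""
    return [op(n) if active else d for d, active, n in zip(zd, pg, zn)]

z0 = [7, 7, 7, 7]                   # old destination contents
p0 = [True, False, True, False]     # governing predicate
z1 = [1, 0, 0x80000000, 5]          # source vector
print(sve_unary_merging(clz32, z0, p0, z1))  # [31, 7, 0, 7]
```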
-
Matt Arsenault authored
This reverts commit r336623 llvm-svn: 336675
-
Sanjay Patel authored
The case with 2 variables is more complicated than the case where we eliminate the shuffle entirely because a shuffle with an undef mask element creates an undef result. I'm not aware of any current analysis/transform that recognizes that undef propagating to a div/rem/shift, but we have to guard against the possibility. llvm-svn: 336668
-
Anastasis Grammenos authored
Differential Revision: https://reviews.llvm.org/D48968 llvm-svn: 336667
-
Simon Pilgrim authored
As suggested by @efriedma on D48975 use the visitSDIVLike/visitUDIVLike functions introduced at rL336656. llvm-svn: 336664
-
Krzysztof Parzyszek authored
An explicit untied use is not sufficient to maintain liveness of a register redefined in a predicated instruction. For example,
  %1 = COPY %0
  ...
  %1 = A2_paddif %2, %1, 1
could become
  $r1 = COPY $r0
  ...
  $r1 = A2_paddif $p0, $r1, 1
and later
  $r1 = COPY $r0      ;; this is not really dead!
  ...
  $r1 = A2_paddif $p0, $r0, 1
llvm-svn: 336662
-
Karl-Johan Karlsson authored
Summary: Fixed two cases where PHI nodes need to be updated by lowerswitch. When lowerswitch finds that the switch default branch is not reachable, it removes the old default and replaces it with the most popular block from the cases, but it forgot to update the PHI nodes in the default block. The PHI nodes also need to be updated when the switch is replaced with a single branch. Reviewers: hans, reames, arsenm Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D47203 llvm-svn: 336659
-
Sam McCall authored
Parsing invalid UTF-8 input is now a parse error. Creating JSON values from invalid UTF-8 now triggers an assertion, and (in no-assert builds) substitutes the Unicode replacement character (U+FFFD). Strings retrieved from json::Value are always valid UTF-8. llvm-svn: 336657
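Python's standard codecs can mimic the two policies. Strict decoding rejects invalid UTF-8 outright, while errors="replace" substitutes U+FFFD. This is only an analogy and assumes nothing about llvm::json internals. In particular, the number of replacement characters emitted per malformed sequence may differ from LLVM's behavior. The function names are hypothetical.

```python
def parse_json_string(raw: bytes) -> str:
    """Parsing path: invalid UTF-8 is a hard error (raises UnicodeDecodeError)."""
    return raw.decode("utf-8")

def make_json_string(raw: bytes) -> str:
    """Construction path in a no-assert build: bad bytes become U+FFFD."""
    return raw.decode("utf-8", errors="replace")

bad = b"caf\xe9"  # Latin-1 'e acute', which is not valid UTF-8 on its own
try:
    parse_json_string(bad)
except UnicodeDecodeError:
    print("parse error")
print(repr(make_json_string(bad)))  # 'caf\ufffd'
```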
-
Simon Pilgrim authored
As suggested by @efriedma on D48975, this patch separates the BuildDiv/Pow2 style optimizations from the rest of the visitSDIV/visitUDIV to make it easier to reuse the combines and will allow us to avoid some rather nasty recursive node combining in visitREM. llvm-svn: 336656
-
Florian Hahn authored
Reviewers: dcaballe, hsaito, rengolin Reviewed By: dcaballe Differential Revision: https://reviews.llvm.org/D49032 llvm-svn: 336653
-
Simon Pilgrim authored
llvm-svn: 336649
-
Chandler Carruth authored
llvm-svn: 336647
-
Chandler Carruth authored
switch unswitching.

The core problem was the way we handled unswitching trivial exit edges through the default successor of a switch. For some reason I thought the right way to do this was to add a block containing unreachable and point the default successor at this block. In retrospect, this has an amazing number of problems.

The first issue is the one that this pass has always worked around: we have to *detect* such edges and avoid unswitching them again. This seemed pretty easy really. You just look for an edge to a block containing unreachable. However, this pattern is woefully unsound. So many things can break it. The amazing thing is that I found a test case where *simple-loop-unswitch itself* breaks this! When we do a *non-trivial* unswitch of a switch we will end up splitting this exit edge. The result will be a default successor that is an exit and terminates in ... a perfectly normal branch. So the first test case that I started trying to fix is added to the nontrivial test cases. This is a ridiculous example that did just amazing things previously. With just unswitch, it would create 10+ copies of this stuff stamped out. But if you combine it *just right* with a bunch of other passes (like simplify-cfg, loop rotate, and some LICM) you can get it to do this infinitely. Or at least, I never got it to finish. =[

This, in turn, uncovered another related issue. When we are manipulating these switches after doing a trivial unswitch we never correctly updated PHI nodes to reflect our edits. As soon as I started changing how these edges were managed, it became obvious there were more issues that I couldn't realistically leave unaddressed, so I wrote more test cases around PHI updates here and ensured all of that works now.

And this, in turn, required some adjustment to how we collect and manage the exit successor when it is the default successor. That showed a clear bug where we failed to include it in our search for the outer-most loop reached by an unswitched exit edge. This was actually already tested and the test case didn't work. I (wrongly) thought that was due to SCEV failing to analyze the switch. In fact, it was just a simple bug in the code that skipped the default successor. While changing this, I handled it correctly and have updated the test to reflect that we now get precise SCEV analysis of trip counts for the outer loop in one of these cases. llvm-svn: 336646
-
Mikhail Dvoretckii authored
This patch adds fast-isel tests for the IR patterns produced for truncation intrinsics in rC336643. Differential Revision: https://reviews.llvm.org/D48822 llvm-svn: 336645
-
Simon Pilgrim authored
Now that rL336250 has landed, we should prefer 2 immediate shifts + a shuffle blend over performing a multiply. Despite the increase in instructions, this is quicker (especially for slow v4i32 multiplies) and avoids loads and constant pool usage. It does, however, increase register pressure. The code size will go up a little, but by less than what we save on the constant pool data. This patch also adds support for v16i16 to the BLEND(SHIFT(v,c1),SHIFT(v,c2)) combine, and also prevents blending on pre-SSE41 shifts if it would introduce extra blend masks/constant pool usage. Differential Revision: https://reviews.llvm.org/D48936 llvm-svn: 336642
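Below is a lane-level Python model of the equivalence behind the BLEND(SHIFT(v,c1),SHIFT(v,c2)) form. It assumes v4i32 and a left shift whose per-lane amounts take only two distinct values, c1 and c2. Under that assumption, multiplying each lane by 2^amount gives the same result as two uniform immediate shifts followed by a per-lane blend. The helper names are hypothetical, and the sketch does not model X86 instruction selection or instruction costs.

```python
MASK32 = 0xFFFFFFFF

def mul_by_pow2_vector(v, shift_amounts):
    """Multiply form: lane i is multiplied by 2**shift_amounts[i], wrapping at 32 bits."""
    return [(x * (1 << s)) & MASK32 for x, s in zip(v, shift_amounts)]

def blend_of_two_shifts(v, c1, c2, take_c2):
    """Shift+blend form: two uniform immediate shifts, then lane i takes the
    c2-shifted value where take_c2[i] is true, else the c1-shifted value."""
    shifted1 = [(x << c1) & MASK32 for x in v]
    shifted2 = [(x << c2) & MASK32 for x in v]
    return [b if sel else a for a, b, sel in zip(shifted1, shifted2, take_c2)]

v = [1, 3, 0x80000001, 7]
amounts = [2, 5, 2, 5]              # only two distinct amounts: c1 = 2, c2 = 5
take_c2 = [s == 5 for s in amounts]
assert mul_by_pow2_vector(v, amounts) == blend_of_two_shifts(v, 2, 5, take_c2)
```

The blend mask depends only on which lanes use c2. That dependence is why the message notes that, before SSE4.1, blending could need extra masks or constant pool data.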
-
Craig Topper authored
[X86] Regenerate vector-shuffle-512-v8.ll so the script will merge the 32 and 64 bit checks together. NFC llvm-svn: 336641
-
Craig Topper authored
[X86] Use IsProfitableToFold to block vinsertf128rm in favor of insert_subreg instead of artificially increasing pattern complexity to give priority. This is a much more direct way to solve the issue than just giving extra priority. llvm-svn: 336639
-
Craig Topper authored
We're missing the EVEX equivalents of these patterns and seem to get along fine. I think we end up with X86vzload for the obvious IR cases that would produce this DAG. llvm-svn: 336638
-
Craig Topper authored
We no longer need custom handling in clang. llvm-svn: 336627
-
Craig Topper authored
[X86] Correct vfixupimm load patterns to look for an integer load, not a floating point load bitcasted to integer. DAG combine wouldn't let a floating point load bitcasted to integer exist. It would just be an integer load. llvm-svn: 336626
-
Craig Topper authored
[X86] Add test cases that show failure to fold load into vfixupimm instructions due to bad isel pattern. llvm-svn: 336625
-
Craig Topper authored
The only places it was used were places where VT was the same as FloatVT, so switch those uses to VT and drop it. llvm-svn: 336624
-