- Feb 19, 2019
-
-
Andrew Scheidecker authored
llvm-svn: 354378
-
Craig Topper authored
[X86] Don't consider functions ABI compatible for ArgumentPromotion pass if they view 512-bit vectors differently. The use of the -mprefer-vector-width=256 command line option mixed with functions using vector intrinsics can create situations where one function thinks 512-bit vectors are legal, but another function does not. If a 512-bit vector is passed between them via a pointer, it's possible ArgumentPromotion might try to pass it by value instead. This will result in type legalization for the two functions handling the 512-bit vector differently, leading to runtime failures. Had the 512-bit vector been passed by value from clang codegen, both functions would have been tagged with a min-legal-vector-width=512 function attribute. That would make them be legalized the same way. I observed this issue in 32-bit mode where a union containing a 512-bit vector was being passed by a function that used intrinsics to one that did not. The caller ended up passing in zmm0 and the callee tried to read it from ymm0 and ymm1. The fix implemented here is just to consider it a mismatch if two functions would handle 512-bit vectors differently, without looking at the types that are being considered. This is the easiest and safest fix, but it can be improved in the future. Differential Revision: https://reviews.llvm.org/D58390 llvm-svn: 354376
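The conservative check described above can be sketched as follows. This is a minimal standalone model, not LLVM's actual code: the `FnVectorInfo` struct, its fields, and the legality rule in `uses512BitVectors` are simplifying assumptions made for illustration.

```cpp
// Hypothetical, simplified view of the per-function facts that decide
// whether 512-bit vectors are legal. These are not LLVM's real types.
struct FnVectorInfo {
  bool HasAVX512;               // AVX-512 is available to the function
  unsigned PreferVectorWidth;   // e.g. set by -mprefer-vector-width
  unsigned MinLegalVectorWidth; // e.g. the min-legal-vector-width attribute
};

// Assumed rule: a function treats 512-bit vectors as legal if AVX-512 is
// available and it either prefers wide vectors or something requires them.
bool uses512BitVectors(const FnVectorInfo &F) {
  return F.HasAVX512 &&
         (F.PreferVectorWidth >= 512 || F.MinLegalVectorWidth >= 512);
}

// Conservative check: the two functions are compatible for argument
// promotion only if they agree on 512-bit legality, regardless of the
// types of the arguments being promoted.
bool argsABICompatible(const FnVectorInfo &Caller, const FnVectorInfo &Callee) {
  return uses512BitVectors(Caller) == uses512BitVectors(Callee);
}
```

In this model, a caller that uses intrinsics (and so requires 512-bit vectors) and a callee compiled with a 256-bit preference and no such requirement disagree, so promotion would be rejected even when the promoted argument is not a vector at all.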
-
Matthew Voss authored
Tests that use multiple short switches now test them both grouped and ungrouped, and ensure that the output of the grouped and ungrouped variants is identical. Differential Revision: https://reviews.llvm.org/D57904 llvm-svn: 354375
-
Daniel Sanders authored
Surprisingly, check_symbol_exists is not sufficient. The macOS linker checks the called functions against a compatibility list for the given deployment target, and check_symbol_exists doesn't trigger this check as it never calls the function. This fixes the GreenDragon bots, where the deployment target is 10.9. llvm-svn: 354374
-
Daniel Sanders authored
Summary: Instruments is a useful tool for finding performance issues in LLVM but it can be difficult to identify regions of interest on the timeline that we can use to filter the profiler or allocations instrument. Xcode 10 and the latest macOS/iOS/etc. added support for the os_signpost() API which allows us to annotate the timeline with information that's meaningful to LLVM. This patch causes timer start and end events to emit signposts. When used with -time-passes, this causes the passes to be annotated on the Instruments timeline. In addition to visually showing the duration of passes on the timeline, it also allows us to filter the profile and allocations instrument down to an individual pass, allowing us to find the issues within that pass without being drowned out by the noise from other parts of the compiler. Using this in conjunction with the Time Profiler (in high frequency mode) and the Allocations instrument is how I found the SparseBitVector that should have been a BitVector and the DenseMap that could be replaced by a sorted vector a couple months ago. I added NamedRegionTimers to TableGen and used the resulting annotations to identify the slow portions of the Register Info Emitter. Some of these were placed according to educated guesses while others were placed according to hot functions from a previous profile. From there I filtered the profile to a slow portion and the aforementioned issues stood out in the profile. To use this feature enable LLVM_SUPPORT_XCODE_SIGNPOSTS in CMake and run the compiler under Instruments with -time-passes like so: instruments -t 'Time Profiler' bin/llc -time-passes -o - input.ll Then open the resulting trace in Instruments. There was a talk at WWDC 2018 that explained the feature which can be found at https://developer.apple.com/videos/play/wwdc2018/405/ if you'd like to know more about it.
Reviewers: bogner Reviewed By: bogner Subscribers: jdoerfert, mgorny, kristina, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D52954 llvm-svn: 354365
-
Jordan Rupprecht authored
Summary: As suggested in rL353995 Reviewers: compnerd Reviewed By: compnerd Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58298 llvm-svn: 354364
-
Simon Pilgrim authored
D42042 introduced the ability for the ExecutionDomainFixPass to more easily change between BLENDPD/BLENDPS/PBLENDW as the domains required. With this ability, we can avoid most bitcasts/scaling in the DAG that was occurring with X86ISD::BLENDI lowering/combining, blend with the vXi32/vXi64 vectors directly and use isel patterns to lower to the float vector equivalent vectors. This helps the shuffle combining and SimplifyDemandedVectorElts be more aggressive as we lose track of fewer UNDEF elements than when we go up/down through bitcasts. I've introduced a basic blend(bitcast(x),bitcast(y)) -> bitcast(blend(x,y)) fold, there are more generalizations I can do there (e.g. widening/scaling and handling the tricky v16i16 repeated mask case). The vector-reduce-smin/smax regressions will be fixed in a future improvement to SimplifyDemandedBits to peek through bitcasts and support X86ISD::BLENDV. Reapplied after reversion at rL353699 - AVX2 isel fix was applied at rL354358, additional test at rL354360/rL354361 Differential Revision: https://reviews.llvm.org/D57888 llvm-svn: 354363
-
Serge Guelton authored
llvm-svn: 354362
-
Simon Pilgrim authored
llvm-svn: 354361
-
Simon Pilgrim authored
Reduced test case for the regression caused in D57888/rL353610 llvm-svn: 354360
-
Nikita Popov authored
Directly use the correct shift amount type if it is possible, and future-proof the code against vectors. The added test makes sure that bitwidths that do not fit into the shift amount type do not assert. Split out from D57997. llvm-svn: 354359
-
Simon Pilgrim authored
This was the cause of the regression in D57888 - the commuted load pattern wasn't hidden by the predicate so once we enabled v4i32 blends on SSE41+ targets then isel was incorrectly matched against AVX2+ instructions. llvm-svn: 354358
-
Craig Topper authored
klocwork critical issues in CG files: Patch by Xiang Zhang (xiangzhangllvm) Differential Revision: https://reviews.llvm.org/D58363 llvm-svn: 354357
-
Craig Topper authored
When parsing a sequence of tokens beginning with {, it will hit an assert and crash if the token afterwards is not an identifier. Instead of this, return a more verbose error as seen elsewhere in the function. Patch by Brandon Jones (BrandonTJones) Differential Revision: https://reviews.llvm.org/D57375 llvm-svn: 354356
-
Craig Topper authored
[X86] Filter out tuning feature flags and a few ISA feature flags when checking for function inline compatibility. Tuning flags don't have any effect on the available instructions so aren't a good reason to prevent inlining. There are also some ISA flags that don't have any intrinsics or ABI requirements that we can exclude. I've put only the most basic ones like cmpxchg16b and lahfsahf. These are interesting because they aren't present in all 64-bit CPUs, but we have codegen workarounds when they aren't present. Loosening these checks can help with scenarios where a caller has a more specific CPU than a callee. The default tuning flags on our generic 'x86-64' CPU can currently make it inline compatible with other CPUs. I've also added an example test for 'nocona' and 'prescott' where 'nocona' is just a 64-bit capable version of 'prescott' but in 32-bit mode they should be completely compatible. I've based the implementation here on the similar code in AMDGPU. Differential Revision: https://reviews.llvm.org/D58371 llvm-svn: 354355
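The filtering idea above can be sketched with a plain bitset. This is an illustrative model under stated assumptions, not the code from the patch: the feature enum, the contents of the ignore list, and the subset rule (the caller must provide every non-ignored callee feature) are all hypothetical names and choices made for the example.

```cpp
#include <bitset>
#include <initializer_list>

// Hypothetical feature indices for illustration; not LLVM's real enum.
enum Feature : unsigned {
  FeatureSSE2,
  FeatureAVX2,
  FeatureCMPXCHG16B,
  FeatureLAHFSAHF,
  TuningSlowLEA,
  NumFeatures
};
using FeatureBits = std::bitset<NumFeatures>;

// Helper to build a feature set from a list of feature indices.
FeatureBits makeBits(std::initializer_list<Feature> Fs) {
  FeatureBits Bits;
  for (Feature F : Fs)
    Bits.set(F);
  return Bits;
}

// Assumed ignore list: a tuning flag plus the two ISA flags that have
// codegen workarounds when absent.
FeatureBits inlineIgnoreList() {
  return makeBits({FeatureCMPXCHG16B, FeatureLAHFSAHF, TuningSlowLEA});
}

// The callee may be inlined if, after masking out ignored flags, every
// feature the callee relies on is also available in the caller.
bool areInlineCompatible(FeatureBits Caller, FeatureBits Callee) {
  const FeatureBits Keep = ~inlineIgnoreList();
  const FeatureBits RealCaller = Caller & Keep;
  const FeatureBits RealCallee = Callee & Keep;
  return (RealCaller & RealCallee) == RealCallee;
}
```

With this rule, a callee that differs from its caller only in tuning flags or in cmpxchg16b/lahfsahf stays inlinable, while a callee that needs AVX2 is still rejected for a caller without it.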
-
Matt Arsenault authored
llvm-svn: 354354
-
Hans Wennborg authored
llvm-svn: 354353
-
Hans Wennborg authored
llvm-svn: 354352
-
Matt Arsenault authored
llvm-svn: 354348
-
Simon Pilgrim authored
The VBROADCAST combines and SimplifyDemandedVectorElts improvements mean that we now more consistently use shorter (128-bit) X86vzload input operands. Follow up to D58053 llvm-svn: 354346
-
Matt Arsenault authored
llvm-svn: 354345
-
James Henderson authored
yaml2obj/obj2yaml previously supported SHT_LOOS, SHT_HIOS, and SHT_LOPROC for section types. These are simply values that delineate a range and don't really make sense as valid values. For example, if a section has type value 0x70000000, obj2yaml shouldn't print this value as SHT_LOPROC. Additionally, support was inconsistent, as the three other range markers (SHT_HIPROC, SHT_LOUSER and SHT_HIUSER) were missing. This change removes the three previously supported range markers (SHT_LOOS, SHT_HIOS and SHT_LOPROC). It also adds support for specifying the type as an integer, to allow section types that LLVM doesn't know about. Reviewed by: grimar Differential Revision: https://reviews.llvm.org/D58383 llvm-svn: 354344
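The resulting behaviour can be sketched as a name lookup with an integer fallback. This is a standalone illustration, not the ObjectYAML code: `sectionTypeName` is a hypothetical helper, the table covers only a few well-known types, and the exact hex formatting is an assumption.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Range-marker values from the ELF specification. They delimit ranges and
// deliberately have no entry in the name table below.
constexpr uint32_t kShtLoOS = 0x60000000;
constexpr uint32_t kShtHiOS = 0x6fffffff;
constexpr uint32_t kShtLoProc = 0x70000000;
constexpr uint32_t kShtHiProc = 0x7fffffff;
constexpr uint32_t kShtLoUser = 0x80000000;
constexpr uint32_t kShtHiUser = 0xffffffff;

// Returns the symbolic name for a known section type, or the raw value as
// a hex integer for anything unknown (including the range markers).
std::string sectionTypeName(uint32_t Type) {
  switch (Type) {
  case 0: return "SHT_NULL";
  case 1: return "SHT_PROGBITS";
  case 2: return "SHT_SYMTAB";
  case 3: return "SHT_STRTAB";
  default: {
    char Buf[16];
    std::snprintf(Buf, sizeof(Buf), "0x%08X", static_cast<unsigned>(Type));
    return Buf;
  }
  }
}
```

Under this sketch a section of type 0x70000000 is reported as a number rather than as SHT_LOPROC, which matches the intent described in the commit message.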
-
Simon Pilgrim authored
llvm-svn: 354343
-
Matt Arsenault authored
llvm-svn: 354342
-
Simon Pilgrim authored
This patch adds scalar/subvector BROADCAST handling to EltsFromConsecutiveLoads. It mainly shows codegen changes to 32-bit code which failed to handle i64 loads, although 64-bit code is also using this new path to more efficiently combine to a broadcast load. Differential Revision: https://reviews.llvm.org/D58053 llvm-svn: 354340
-
George Rimar authored
This patch adds support for parsing and dumping the .gnu.version section. Description of the section is: https://refspecs.linuxfoundation.org/LSB_1.3.0/gLSB/gLSB/symversion.html#SYMVERTBL Differential revision: https://reviews.llvm.org/D58280 llvm-svn: 354338
-
George Rimar authored
Recommit r354328, r354329 "[obj2yaml][yaml2obj] - Add support of parsing/dumping of the .gnu.version_r section." Fix: Replace assert(!IO.getContext() && "The IO context is initialized already"); with assert(IO.getContext() && "The IO context is not initialized"); (this was introduced in r354329, where I tried to quick-fix the darwin BB and seem to have copy-pasted the assert from the wrong place). Original commit message: The section is described here: https://refspecs.linuxfoundation.org/LSB_1.3.0/gLSB/gLSB/symverrqmts.html Patch just teaches obj2yaml/yaml2obj to dump and parse such sections. We did the finalization of string tables very late, and I had to move the logic to make it a bit earlier. That was needed in this patch since .gnu.version_r adds strings to .dynstr. This might also be useful for implementing other special sections. Everything else changed in this patch seems to be straightforward. Differential revision: https://reviews.llvm.org/D58119 llvm-svn: 354335
-
Alex Bradbury authored
llvm-svn: 354333
-
George Rimar authored
Revert r354328, r354329 "[obj2yaml][yaml2obj] - Add support of parsing/dumping of the .gnu.version_r section." Something went wrong. Bots are unhappy: http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/44113/steps/test/logs/stdio llvm-svn: 354332
-
George Rimar authored
Bot: http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/30188/steps/build_Lld/logs/stdio
Error:
/Users/buildslave/as-bldslv9_new/lld-x86_64-darwin13/llvm.src/lib/ObjectYAML/ELFYAML.cpp:1013:15: error: unused variable 'Object' [-Werror,-Wunused-variable]
const auto *Object = static_cast<ELFYAML::Object *>(IO.getContext());
^
/Users/buildslave/as-bldslv9_new/lld-x86_64-darwin13/llvm.src/lib/ObjectYAML/ELFYAML.cpp:1023:15: error: unused variable 'Object' [-Werror,-Wunused-variable]
const auto *Object = static_cast<ELFYAML::Object *>(IO.getContext());
Fix: change
const auto *Object = static_cast<ELFYAML::Object *>(IO.getContext());
assert(Object && "The IO context is not initialized");
to
assert(!IO.getContext() && "The IO context is initialized already");
llvm-svn: 354329
-
George Rimar authored
The section is described here: https://refspecs.linuxfoundation.org/LSB_1.3.0/gLSB/gLSB/symverrqmts.html Patch just teaches obj2yaml/yaml2obj to dump and parse such sections. We did the finalization of string tables very late, and I had to move the logic to make it a bit earlier. That was needed in this patch since .gnu.version_r adds strings to .dynstr. This might also be useful for implementing other special sections. Everything else changed in this patch seems to be straightforward. Differential revision: https://reviews.llvm.org/D58119 llvm-svn: 354328
-
Alex Bradbury authored
Re-organise calling convention tests to prepare for ilp32f and ilp32d hard float ABI tests. It's also clear that we need to introduce similar tests for lp64. llvm-svn: 354323
-
George Rimar authored
Fix BB after r354319 "[yaml2obj] - Do not skip zeroes blocks if there are relocations against them." Fix: move the test to x86 folder. Seems it is needed, because llvm-objdump invocation used in test has -D (disasm) flag. BB: http://lab.llvm.org:8011/builders/clang-hexagon-elf/builds/23016 /local/buildbot/slaves/hexagon-build-02/clang-hexagon-elf/stage1/bin/llvm-objdump: error: '/local/buildbot/slaves/hexagon-build-02/clang-hexagon-elf/stage1/test/tools/llvm-objdump/Output/disasm-zeroes-relocations.test.tmp': can't find target: : error: unable to get target for 'x86_64--', see --version and --triple. . llvm-svn: 354322
-
George Rimar authored
This is for -D -reloc combination. With this patch, we do not skip the zero bytes that have a relocation against them when -reloc is used. If -reloc is not used, then the behavior will be the same. Differential revision: https://reviews.llvm.org/D58174 llvm-svn: 354319
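The rule described above can be sketched as a small helper. This is a simplified model and not llvm-objdump's implementation: the function name, its parameters, and the use of a set of relocation offsets are assumptions for illustration, and details such as minimum run length or alignment are left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Returns how many zero bytes starting at Offset may be skipped by the
// disassembler. When ShowRelocs is true (modelling -reloc), the run stops
// at the first byte that has a relocation against it, so that byte is
// still printed. When ShowRelocs is false, relocations are not consulted.
std::size_t countSkippableZeros(const std::vector<uint8_t> &Bytes,
                                std::size_t Offset,
                                const std::set<std::size_t> &RelocOffsets,
                                bool ShowRelocs) {
  std::size_t N = 0;
  while (Offset + N < Bytes.size() && Bytes[Offset + N] == 0) {
    if (ShowRelocs && RelocOffsets.count(Offset + N) != 0)
      break; // a relocation targets this zero byte, so do not skip it
    ++N;
  }
  return N;
}
```

For a section holding four zero bytes followed by a non-zero byte, with a relocation at offset 2, this sketch skips two bytes when relocations are shown and four when they are not.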
-
George Rimar authored
This fixes https://bugs.llvm.org/show_bug.cgi?id=40339 Previously if the addresses were set in YAML they were ignored for .dynsym and .dynstr sections. The patch fixes that. Differential revision: https://reviews.llvm.org/D58168 llvm-svn: 354318
-
Diana Picus authored
Both files mentioned in the comment now include TargetOpcodes.def. Just mention that directly. llvm-svn: 354316
-
Max Kazantsev authored
We are planning to be able to delete the current loop in LoopSimplifyCFG in the future. Add API to notify the loop pass manager that it happened. llvm-svn: 354314
-
Max Kazantsev authored
llvm-svn: 354313
-
Diana Picus authored
Same as arm mode. llvm-svn: 354310
-
Fangrui Song authored
Summary: After (x,y) is inserted, depth-based search finds all affected v that satisfy: depth(nca(x,y))+1 < depth(v) && there exists a path P from y to v where every w on P satisfies depth(v) <= depth(w). This reduces to a widest path problem (maximizing the depth of the minimum vertex in the path) which can be solved by a modified version of Dijkstra with a bucket queue (named depth-based search in the paper). The algorithm visits vertices in decreasing order of bucket number. However, the current code misused priority_queue to extract them in increasing order. I cannot think of a failing scenario, but it surely may process vertices more than once due to the local usage of Processed. This patch fixes this bug and simplifies/optimizes the code a bit. It also adds more comments. Reviewers: kuhar Reviewed By: kuhar Subscribers: kristina, jdoerfert, llvm-commits, NutshellySima, brzycki Tags: #llvm Differential Revision: https://reviews.llvm.org/D58349 llvm-svn: 354306
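The widest-path formulation above can be sketched as a Dijkstra variant that extracts vertices in decreasing order of bottleneck depth. This is a minimal standalone sketch, not the dominator tree code from the patch: it uses a `std::priority_queue` max-heap instead of a bucket queue, and `affectedVertices`, `Adj`, `Depth`, and `NcaDepth` are hypothetical names, with depths assumed to be non-negative.

```cpp
#include <algorithm>
#include <queue>
#include <utility>
#include <vector>

// Adj is an adjacency list, Depth[v] is the depth of v in the dominator
// tree, Y is the target of the inserted edge and NcaDepth is
// depth(nca(x,y)). Returns, in sorted order, every v reachable from Y by
// a path on which all vertices w have Depth[w] >= Depth[v], and for
// which Depth[v] > NcaDepth + 1.
std::vector<int> affectedVertices(const std::vector<std::vector<int>> &Adj,
                                  const std::vector<int> &Depth, int Y,
                                  int NcaDepth) {
  // Best[v]: largest achievable minimum depth over all paths Y -> v,
  // or -1 if v has not been reached yet.
  std::vector<int> Best(Adj.size(), -1);
  // Max-heap of (bottleneck, vertex): largest bottleneck comes out first.
  std::priority_queue<std::pair<int, int>> Queue;
  Best[Y] = Depth[Y];
  Queue.push({Best[Y], Y});

  std::vector<int> Affected;
  while (!Queue.empty()) {
    const int B = Queue.top().first;
    const int V = Queue.top().second;
    Queue.pop();
    if (B != Best[V])
      continue; // stale entry, a wider path to V was found later
    // The bottleneck equals Depth[V] exactly when some path from Y keeps
    // every vertex at depth >= Depth[V].
    if (B == Depth[V] && Depth[V] > NcaDepth + 1)
      Affected.push_back(V);
    for (int W : Adj[V]) {
      const int Cand = std::min(B, Depth[W]);
      if (Cand > Best[W]) {
        Best[W] = Cand;
        Queue.push({Cand, W});
      }
    }
  }
  std::sort(Affected.begin(), Affected.end());
  return Affected;
}
```

Because extracted bottleneck values never increase, each vertex is expanded at most once, which is the property the commit message says was lost when vertices were extracted in increasing order.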
-