- Aug 20, 2017
-
-
Sam Elliott authored
Summary: This updates the Inliner to only add a single Optimization Remark when Inlining, rather than an Analysis Remark and an Optimization Remark. Fixes https://bugs.llvm.org/show_bug.cgi?id=33786 Reviewers: anemet, davidxl, chandlerc Reviewed By: anemet Subscribers: haicheng, fhahn, mehdi_amini, dblaikie, llvm-commits, eraman Differential Revision: https://reviews.llvm.org/D36054 llvm-svn: 311273
-
Igor Breger authored
Differential Revision: https://reviews.llvm.org/D34978 llvm-svn: 311272
-
Sam Elliott authored
Summary: The New Pass Manager infrastructure was forgetting to keep around the optimization remark yaml file that the compiler might have been producing. This meant setting the option to '-' for stdout worked, but setting it to a filename didn't give file output (presumably it was deleted because compilation didn't explicitly keep it). This change just ensures that the file is kept if compilation succeeds. So far I have updated one of the optimization remark output tests to add a version with the new pass manager. It is my intention for this patch to also include changes to all tests that use `-opt-remark-output=` but I wanted to get the code patch ready for review while I was making all those changes. Fixes https://bugs.llvm.org/show_bug.cgi?id=33951 Reviewers: anemet, chandlerc Reviewed By: anemet, chandlerc Subscribers: javed.absar, chandlerc, fhahn, llvm-commits Differential Revision: https://reviews.llvm.org/D36906 llvm-svn: 311271
-
Chandler Carruth authored
cmov self-refrencing. Pointed out by Amjad Aboud in code review, test case minorly simplified from the one he posted. llvm-svn: 311267
-
Craig Topper authored
We can load the memory VT and check for natural alignment. This also adds a new preferNonTemporalLoad helper that checks the correct subtarget feature based on the load size. This shrinks the isel table by at least 5000 bytes by allowing more reordering and combining to occur. llvm-svn: 311266
-
Craig Topper authored
We can read the memoryVT and get its store size directly from the SDNode to check its alignment. llvm-svn: 311265
-
Craig Topper authored
[AVX512] Use alignedstore256 in a pattern that's emitting a 256-bit movaps from an extract subvector operation. llvm-svn: 311263
-
- Aug 19, 2017
-
-
Victor Leschuk authored
Otherwise it can be used uninitialized in move ctor. llvm-svn: 311262
-
Martin Storsjö authored
Differential Revision: https://reviews.llvm.org/D36930 llvm-svn: 311260
-
Martin Storsjö authored
This is the exact same fix as in SVN r247254. In that commit, the fix was applied only for isVTRNMask and isVTRN_v_undef_Mask, but the same issue is present for VZIP/VUZP as well. This fixes PR33921. Differential Revision: https://reviews.llvm.org/D36899 llvm-svn: 311258
-
Teresa Johnson authored
The tests added in r311254 require a target triple since they are running through code generation. Fix bot failures by requiring an x86 target. llvm-svn: 311257
-
Konstantin Zhuravlyov authored
- Move *load* functions before *atomic* functions - Move *store* functions before *atomic* functions llvm-svn: 311256
-
Jatin Bhateja authored
Summary: If all the operands of a BUILD_VECTOR extract elements from same vector then split the vector efficiently based on the maximum vector access index. Reviewers: zvi, delena, RKSimon, thakis Reviewed By: RKSimon Subscribers: chandlerc, eladcohen, llvm-commits Differential Revision: https://reviews.llvm.org/D35788 llvm-svn: 311255
-
Teresa Johnson authored
Summary: Follow up to fix in r311023, which fixed the case where the combined index is written to disk. The same samplePGO logic exists for the in-memory index when computing imports, so we need to filter out GlobalVariable summaries there too. Reviewers: davidxl Subscribers: inglorion, llvm-commits Differential Revision: https://reviews.llvm.org/D36919 llvm-svn: 311254
-
Craig Topper authored
The SSE MOVDDUP instruction only loads 64-bits with no alignment restriction. llvm-svn: 311253
-
Jatin Bhateja authored
Summary: This reverts commit rL311247. Differential Revision: https://reviews.llvm.org/D36927 llvm-svn: 311252
-
Jatin Bhateja authored
llvm-svn: 311247
-
Jatin Bhateja authored
Summary: This reverts commit rL311242. Differential Revision: https://reviews.llvm.org/D36924 llvm-svn: 311246
-
Jatin Bhateja authored
llvm-svn: 311242
-
Victor Leschuk authored
llvm-svn: 311238
-
Victor Leschuk authored
llvm-svn: 311236
-
Victor Leschuk authored
When run manually it fails, but when run under buildbot it causes hang. llvm-svn: 311230
-
Chandler Carruth authored
a function into itself. We tried to fix this before in r306495 but that got reverted as the assert was actually hit. This fixes the original bug (which we seem to have lost track of with the revert) by blocking a second remapping when the function being inlined is also the caller and the remapping could succeed but erroneously. The included test case would actually load from an inlined copy of the alloca before this change, failing to load the stored value and miscompiling. Many thanks to Richard Smith for diagnosing a user miscompile to this bug, and to Kyle for the first attempt and initial analysis and David Li for remembering the issue and how to fix it and suggesting the patch. I'm just stitching it together and landing it. =] llvm-svn: 311229
-
Chandler Carruth authored
tested and why. llvm-svn: 311228
-
Chandler Carruth authored
llvm-svn: 311227
-
Chandler Carruth authored
operands into control flow. We have seen periodically performance problems with cmov where one operand comes from memory. On modern x86 processors with strong branch predictors and speculative execution, this tends to be much better done with a branch than cmov. We routinely see cmov stalling while the load is completed rather than continuing, and if there are subsequent branches, they cannot be speculated in turn. Also, in many (even simple) cases, macro fusion causes the control flow version to be fewer uops. Consider the IACA output for the initial sequence of code in a very hot function in one of our internal benchmarks that motivates this, and notice the micro-op reduction provided. Before, SNB: ``` Throughput Analysis Report -------------------------- Block Throughput: 2.20 Cycles Throughput Bottleneck: Port1 | Num Of | Ports pressure in cycles | | | Uops | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | | --------------------------------------------------------------------- | 1 | | 1.0 | | | | | CP | mov rcx, rdi | 0* | | | | | | | | xor edi, edi | 2^ | 0.1 | 0.6 | 0.5 0.5 | 0.5 0.5 | | 0.4 | CP | cmp byte ptr [rsi+0xf], 0xf | 1 | | | 0.5 0.5 | 0.5 0.5 | | | | mov rax, qword ptr [rsi] | 3 | 1.8 | 0.6 | | | | 0.6 | CP | cmovbe rax, rdi | 2^ | | | 0.5 0.5 | 0.5 0.5 | | 1.0 | | cmp byte ptr [rcx+0xf], 0x10 | 0F | | | | | | | | jb 0xf Total Num Of Uops: 9 ``` After, SNB: ``` Throughput Analysis Report -------------------------- Block Throughput: 2.00 Cycles Throughput Bottleneck: Port5 | Num Of | Ports pressure in cycles | | | Uops | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | | --------------------------------------------------------------------- | 1 | 0.5 | 0.5 | | | | | | mov rax, rdi | 0* | | | | | | | | xor edi, edi | 2^ | 0.5 | 0.5 | 1.0 1.0 | | | | | cmp byte ptr [rsi+0xf], 0xf | 1 | 0.5 | 0.5 | | | | | | mov ecx, 0x0 | 1 | | | | | | 1.0 | CP | jnbe 0x39 | 2^ | | | | 1.0 1.0 | | 1.0 | CP | cmp byte ptr [rax+0xf], 0x10 | 0F | | | | | | | | jnb 0x3c Total Num Of Uops: 7 ``` The difference even manifests in a throughput cycle rate difference on Haswell. Before, HSW: ``` Throughput Analysis Report -------------------------- Block Throughput: 2.00 Cycles Throughput Bottleneck: FrontEnd | Num Of | Ports pressure in cycles | | | Uops | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | 6 | 7 | | --------------------------------------------------------------------------------- | 0* | | | | | | | | | | mov rcx, rdi | 0* | | | | | | | | | | xor edi, edi | 2^ | | | 0.5 0.5 | 0.5 0.5 | | 1.0 | | | | cmp byte ptr [rsi+0xf], 0xf | 1 | | | 0.5 0.5 | 0.5 0.5 | | | | | | mov rax, qword ptr [rsi] | 3 | 1.0 | 1.0 | | | | | 1.0 | | | cmovbe rax, rdi | 2^ | 0.5 | | 0.5 0.5 | 0.5 0.5 | | | 0.5 | | | cmp byte ptr [rcx+0xf], 0x10 | 0F | | | | | | | | | | jb 0xf Total Num Of Uops: 8 ``` After, HSW: ``` Throughput Analysis Report -------------------------- Block Throughput: 1.50 Cycles Throughput Bottleneck: FrontEnd | Num Of | Ports pressure in cycles | | | Uops | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | 6 | 7 | | --------------------------------------------------------------------------------- | 0* | | | | | | | | | | mov rax, rdi | 0* | | | | | | | | | | xor edi, edi | 2^ | | | 1.0 1.0 | | | 1.0 | | | | cmp byte ptr [rsi+0xf], 0xf | 1 | | 1.0 | | | | | | | | mov ecx, 0x0 | 1 | | | | | | | 1.0 | | | jnbe 0x39 | 2^ | 1.0 | | | 1.0 1.0 | | | | | | cmp byte ptr [rax+0xf], 0x10 | 0F | | | | | | | | | | jnb 0x3c Total Num Of Uops: 6 ``` Note that this cannot be usefully restricted to inner loops. Much of the hot code we see hitting this is not in an inner loop or not in a loop at all. The optimization still remains effective and indeed critical for some of our code. I have run a suite of internal benchmarks with this change. I saw a few very significant improvements and a very few minor regressions, but overall this change rarely has a significant effect. However, the improvements were very significant, and in quite important routines responsible for a great deal of our C++ CPU cycles. The gains pretty clealy outweigh the regressions for us. I also ran the test-suite and SPEC2006. Only 11 binaries changed at all and none of them showed any regressions. Amjad Aboud at Intel also ran this over their benchmarks and saw no regressions. Differential Revision: https://reviews.llvm.org/D36858 llvm-svn: 311226
-
Chandler Carruth authored
The primary thing that this accomplishes is to allow future re-use of these routines in more contexts and clarify the behavior w.r.t. loops. For example, if handling outer loops is desirable, doing so in a inside-out order becomes straight forward because it walks the loop nest itself (rather than walking the function's basic blocks) and de-couples the CMOV rewriting from the loop structure as there isn't actually anything loop-specific about this transformation. This patch should be essentially a no-op. It potentially changes the order in which we visit the inner loops, but otherwise should merely set the stage for subsequent changes. Differential Revision: https://reviews.llvm.org/D36783 llvm-svn: 311225
-
Dinar Temirbulatov authored
[SLPVectorizer] Tighten up VLeft, VRight declaration, remove unnecessary testcase test/Transforms/SLPVectorizer/X86/reorder.ll, NFCI. llvm-svn: 311223
-
Dinar Temirbulatov authored
[SLPVectorizer] Add opcode parameter to reorderAltShuffleOperands, reorderInputsAccordingToOpcode functions. Reviewers: mkuper, RKSimon, ABataev, mzolotukhin, spatel, filcab Subscribers: llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D36766 llvm-svn: 311221
-
Matthias Braun authored
This doesn't really change anything as Tablegen would have inferred those indices anyway; defining them gives us shorter names that are easier to read while debugging (i.e. "ssub_4" rather than "dsub2_then_ssub_0") llvm-svn: 311218
-
Adrian Prantl authored
They won't affect the DWARF output, but they will mess with the sorting of the fragments. This fixes the crash reported in PR34159. https://bugs.llvm.org/show_bug.cgi?id=34159 llvm-svn: 311217
-
Eric Beckmann authored
mt.exe performs a tree merge where certain element nodes are combined into one. This introduces the possibility of xml namespaces conflicting with each other. The original mt.exe has a hierarchy whereby certain namespace names can override others, and nodes that would then end up in ambigious namespaces have their namespaces explicitly defined. This namespace handles this merging process. llvm-svn: 311215
-
Eugene Zelenko authored
[Analysis] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 311212
-
Xinliang David Li authored
llvm-svn: 311209
-
Xinliang David Li authored
Differential Revsion: http://reviews.llvm.org/D36864 llvm-svn: 311208
-
Amjad Aboud authored
Differential Revision: https://reviews.llvm.org/D36679 llvm-svn: 311206
-
Max Kazantsev authored
Clamp function was too optimistic when choosing signed or unsigned min/max function for calculations. In fact, `!IsSignedPredicate` guarantees us that `Smallest` and `Greatest` can be compared safely using unsigned predicates, but we did not check this for `S` which can in theory be negative. This patch makes Clamp use signed min/max for cases when it fails to prove `S` being non-negative, and it adds a test where such situation may lead to incorrect conditions calculation. Differential Revision: https://reviews.llvm.org/D36873 llvm-svn: 311205
-
- Aug 18, 2017
-
-
Justin Bogner authored
Since stripDebugInfo runs before the verifier when reading IR, we can end up in a situation where we read some invalid IR but don't know its invalid yet. Before this patch we would crash in stripDebugInfo when given IR with a completely empty basic block, and after we get a nice error from the verifier instead. llvm-svn: 311202
-
Jonas Devlieghere authored
This patch hides the .debug_str offset and DIE reference offsets into the CU when llvm-dwarfdump is invoked with -brief. Differential Revision: https://reviews.llvm.org/D36835 llvm-svn: 311201
-
Simon Pilgrim authored
llvm-svn: 311198
-