- Feb 06, 2014
-
-
Tim Northover authored
The most important part of this is probably adding any cost at all for operations like zext <8 x i8> to <8 x i32>. Before they were being recorded as extremely costly (24, I believe) which made LLVM fall back on a 4-wide vectorisation of a loop. It also rebalances the values for sext, zext and trunc. Lacking any other sane metric that might work across CPU microarchitectures I went for instructions. This seems to be in reasonable accord with the rest of the table (sitofp, ...) though no doubt at least one value is sub-optimal for some bizarre reason. Finally, separate AVX and AVX2 values are provided where appropriate. The CodeGen is quite different in many cases. rdar://problem/15981990 llvm-svn: 200928
-
Eli Bendersky authored
llvm-svn: 200927
-
Sergey Matveev authored
llvm-svn: 200926
-
Sergey Matveev authored
llvm-svn: 200925
-
Alexander Kornienko authored
Summary: This patch introduces several improvements to clang-tidy diagnostic; 1. Make filtering of messages from non-user code more reliable. Output an error when it or any of the related notes touches user code. This fixes an assertion when an error has a location in a system header, and one of the notes relates to user code. 2. In order for 1. to work, subscribe to the static analyzer diagnostics using a custom PathDiagnosticConsumer. 3. Enable colors on supported terminals. 4. Output FixItHints. Reviewers: djasper Reviewed By: djasper CC: cfe-commits Differential Revision: http://llvm-reviews.chandlerc.com/D2714 llvm-svn: 200924
-
https://code.google.com/p/thread-sanitizer/issues/detail?id=47Alexander Potapenko authored
llvm-svn: 200923
-
Alexander Potapenko authored
This should fix https://code.google.com/p/thread-sanitizer/issues/detail?id=47. llvm-svn: 200922
-
David Majnemer authored
Properly support fields that come from anonymous unions and structs when used as template arguments for pointer to data member params. llvm-svn: 200921
-
David Majnemer authored
Properly determine the inheritance model when dealing with nullptr: - If a nullptr template argument is being checked against pointer-to-member parameter, nail down an inheritance model. N.B. We will chose an inheritance model even if we won't ultimately choose the template to instantiate! Cooky, right? - Null pointer-to-datamembers have a virtual base table offset of -1, not zero. Previously, we chose an offset of 0. llvm-svn: 200920
-
Puyan Lotfi authored
The aim in this patch is to reduce work that VirtRegRewriter needs to do when telling MachineRegisterInfo which physregs are in use. Up until now VirtRegRewriter::rewrite has been doing rewriting and populating def info and then proceeding to set whether a physreg is used based this info for every physreg that the target provides. This can be expensive when a target has an unusually high number of supported physregs, and is a noticeable chunk of compile time for small programs on such targets. So to reduce compile time, this patch simply adds the use of a SparseSet to the rewrite function that is used to flag each physreg that is encountered in a MachineFunction. Afterward, rather than iterating over the set of all physregs for a given target to set the physregs used in MachineRegisterInfo, the new way is to iterate over the set of physregs that were actually encountered and set in the SparseSet. This improves compile time because the existing rewrite function was iterating over all MachineOperands already, and because the iterations afterward to setPhysRegUsed is reduced by use of the SparseSet data. llvm-svn: 200919
-
Tim Northover authored
I believe VZEXT_MOVL means "zero all vector elements except the first" (and should have identical input & output types) whereas VZEXT means "zero extend each element of a vector (discarding higher elements if necessary)". For example: (v4i32 (vzext (v16i8 ...))) should zero extend the low 4 bytes of the incoming vector to 32-bits, discarding higher bytes. However, somewhere in the past, these two concepts had become confused, even leading to a nonsensical VSEXT_MOVL. This re-merges the nodes where appropriate (all VSEXT_MOVL -> VSEXT, VZEXT_MOVL -> VZEXT when it's an actual extension). rdar://problem/15981990 llvm-svn: 200918
-
Puyan Lotfi authored
programs on targets with large register files. The root of the compile time overhead was in the use of llvm::SmallVector to hold PhysRegEntries, which resulted in slow-down from calling llvm::SmallVector::assign(N, 0). In contrast std::vector uses the faster __platform_bzero to zero out primitive buffers when assign is called, while SmallVector uses an iterator. The fix for this was simply to replace the SmallVector with a dynamically allocated buffer and to initialize or reinitialize the buffer based on the total registers that the target architecture requires. The changes support cases where a pass manager may be reused for different targets, and note that the PhysRegEntries is allocated using calloc mainly for good for, and also to quite tools like Valgrind (see comments for more info on this). There is an rdar to track the fact that SmallVector doesn't have platform specific speedup optimizations inside of it for things like this, and I'll create a bugzilla entry at some point soon as well. TL;DR: This fix replaces the expensive llvm::SmallVector<unsigned char>::assign(N, 0) with a call to calloc for N bytes which is much faster because SmallVector's assign uses iterators. llvm-svn: 200917
-
Dmitry Vyukov authored
we don't use assembly files llvm-svn: 200916
-
Dmitry Vyukov authored
llvm-svn: 200915
-
Dmitry Vyukov authored
llvm-svn: 200914
-
Puyan Lotfi authored
large register files. The omission of Queries.clear() is perfectly safe because LiveIntervalUnion::Query doesn't contain any data that needs freeing and because LiveRegMatrix::runOnFunction happens to reset the OwningArrayPtr holding Queries every time it is run, so there's no need to zero out the queries either. Not having to do this for very large numbers of physregs is a noticeable constant cost reduction in compilation of small programs. llvm-svn: 200913
-
Simon Atanasyan authored
llvm-svn: 200911
-
Kostya Serebryany authored
llvm-svn: 200910
-
NAKAMURA Takumi authored
clang/test/FixIt/fixit-unicode-with-utf8-output.c has begun complained since LLVM r200885. Although it is changes for StringRef, it brought LLVM_ON_WIN32 to Support/Locale.cpp. Before r200885, LLVM_ON_WIN32 was undefined in Locale.cpp! FIXME: We should consider i18n on win32. llvm-svn: 200909
-
Kostya Serebryany authored
[asan] introduce two functions that will allow implementations of C++ garbage colection to work with asan's fake stack llvm-svn: 200908
-
Nick Lewycky authored
llvm-svn: 200907
-
Craig Topper authored
llvm-svn: 200906
-
Chandler Carruth authored
build but spectacularly changed behavior of the C++98 build. =] This shows my one problem with not having unittests -- basic API expectations aren't well exercised by the integration tests because they *happen* to not come up, even though they might later. I'll probably add a basic unittest to complement the integration testing later, but I wanted to revive the bots. llvm-svn: 200905
-
Marshall Clow authored
Fix PR17221 - can't catch virtual base classes when throwing derived NULL pointers. Specifically, libc++abi would crash when you tried it. llvm-svn: 200904
-
Chandler Carruth authored
The primary motivation for this pass is to separate the call graph analysis used by the new pass manager's CGSCC pass management from the existing call graph analysis pass. That analysis pass is (somewhat unfortunately) over-constrained by the existing CallGraphSCCPassManager requirements. Those requirements make it *really* hard to cleanly layer the needed functionality for the new pass manager on top of the existing analysis. However, there are also a bunch of things that the pass manager would specifically benefit from doing differently from the existing call graph analysis, and this new implementation tries to address several of them: - Be lazy about scanning function definitions. The existing pass eagerly scans the entire module to build the initial graph. This new pass is significantly more lazy, and I plan to push this even further to maximize locality during CGSCC walks. - Don't use a single synthetic node to partition functions with an indirect call from functions whose address is taken. This node creates a huge choke-point which would preclude good parallelization across the fanout of the SCC graph when we got to the point of looking at such changes to LLVM. - Use a memory dense and lightweight representation of the call graph rather than value handles and tracking call instructions. This will require explicit update calls instead of some updates working transparently, but should end up being significantly more efficient. The explicit update calls ended up being needed in many cases for the existing call graph so we don't really lose anything. - Doesn't explicitly model SCCs and thus doesn't provide an "identity" for an SCC which is stable across updates. This is essential for the new pass manager to work correctly. - Only form the graph necessary for traversing all of the functions in an SCC friendly order. This is a much simpler graph structure and should be more memory dense. It does limit the ways in which it is appropriate to use this analysis. I wish I had a better name than "call graph". I've commented extensively this aspect. This is still very much a WIP, in fact it is really just the initial bits. But it is about the fourth version of the initial bits that I've implemented with each of the others running into really frustrating problms. This looks like it will actually work and I'd like to split the actual complexity across commits for the sake of my reviewers. =] The rest of the implementation along with lots of wiring will follow somewhat more rapidly now that there is a good path forward. Naturally, this doesn't impact any of the existing optimizer. This code is specific to the new pass manager. A bunch of thanks are deserved for the various folks that have helped with the design of this, especially Nick Lewycky who actually sat with me to go through the fundamentals of the final version here. llvm-svn: 200903
-
Chandler Carruth authored
in my next patch. Sorry for the breakage. llvm-svn: 200902
-
Chandler Carruth authored
necessary until we add analyses to the driver, but I have such an analysis ready and wanted to split this out. This is actually exercised by the existing tests of the new pass manager as the analysis managers are cross-checked and validated by the function and module managers. llvm-svn: 200901
-
Juergen Ributzka authored
During DAGCombine visitShiftByConstant assumes that certain binary operations with only constant operands can always be folded successfully. This is no longer true when the constant is opaque. This commit fixes visitShiftByConstant by not performing the optimization for opaque constants. Otherwise we would end up in an infinite DAGCombine loop. llvm-svn: 200900
-
Serge Pavlov authored
In the following code: struct A { static const int sz; }; template<class T> void f() { T arr[A::sz]; } the array 'arr' is represented as a variable size array in the template. If 'A::sz' gets value below in the translation unit, the array in instantiation can turn into constant size array. This change fixes PR18633. Differential Revision: http://llvm-reviews.chandlerc.com/D2688 llvm-svn: 200899
-
Manman Ren authored
225 is the default value of inline-threshold. This change will make sure we have the same inlining behavior as prior to r200886. As Chandler points out, even though we don't have code in our testing suite that uses cold attribute, there are larger applications that do use cold attribute. r200886 + this commit intend to keep the same behavior as prior to r200886. We can later on tune the inlinecold-threshold. The main purpose of r200886 is to help performance of instrumentation based PGO before we actually hook up inliner with analysis passes such as BPI and BFI. For instrumentation based PGO, we try to increase inlining of hot functions and reduce inlining of cold functions by setting inlinecold-threshold. Another option suggested by Chandler is to use a boolean flag that controls if we should use OptSizeThreshold for cold functions. The default value of the boolean flag should not change the current behavior. But it gives us less freedom in controlling inlining of cold functions. llvm-svn: 200898
-
Richard Smith authored
using-declaration, and they declare the same function (either because the using-declaration is in the same namespace as the declaration it imports, or because they're both extern "C"), they do not conflict. llvm-svn: 200897
-
Kevin Enderby authored
the << and >> bitwise operators. rdar://15975725 llvm-svn: 200896
-
Rafael Espindola authored
It is only used in MachObjectWriter.cpp. Another leftover from early days of ELF in MC. llvm-svn: 200895
-
Rafael Espindola authored
This is a nop. doesSectionRequireSymbols is only used from isSymbolLinkerVisible. isSymbolLinkerVisible only use from ELF was in if (!Asm.isSymbolLinkerVisible(Symbol) && !Symbol.isUndefined()) return false; if (Symbol.isTemporary()) return false; If the symbol is a temporary this code returns false and it is irrelevant if we take the first if or not. If the symbol is not a temporary, Asm.isSymbolLinkerVisible returns true without ever calling doesSectionRequireSymbols. This was an horrible leftover from when support for ELF was first added. llvm-svn: 200894
-
Manman Ren authored
llvm-svn: 200893
-
Paul Robinson authored
Ideally only those transform passes that run at -O0 remain enabled, in reality we get as close as we reasonably can. Passes are responsible for disabling themselves, it's not the job of the pass manager to do it for them. llvm-svn: 200892
-
Manman Ren authored
llvm-svn: 200891
-
Rafael Espindola authored
llvm-svn: 200890
-
Nick Lewycky authored
has a more precise type. llvm-svn: 200889
-
Matt Arsenault authored
llvm-svn: 200888
-