- Feb 07, 2014
-
-
Venkatraman Govindaraju authored
[Sparc] Emit correct encoding for atomic instructions. Also, add support for parsing CAS instructions to test the CAS encoding. llvm-svn: 200963
-
Venkatraman Govindaraju authored
llvm-svn: 200962
-
Venkatraman Govindaraju authored
llvm-svn: 200961
-
Manman Ren authored
Fix a bug triggered in IfConverterTriangle when CvtBB has multiple predecessors by getting the weights before removing a successor. llvm-svn: 200958
-
Jim Grosbach authored
Generalize the AArch64 .td nodes for AssertZext and AssertSext. Use them to match the relevant pextr store instructions. The test widen_load-2.ll requires a slight change because with the stores gone, the remaining instructions are scheduled in a different order. Add test cases for SSE4 and AVX variants. Resolves rdar://13414672. Patch by Adam Nemet <anemet@apple.com>. llvm-svn: 200957
-
Rafael Espindola authored
llvm-svn: 200955
-
- Feb 06, 2014
-
-
Quentin Colombet authored
mode. Basically the idea is to transform code like this: %idx = add nsw i32 %a, 1 %sextidx = sext i32 %idx to i64 %gep = gep i8* %myArray, i64 %sextidx load i8* %gep Into: %sexta = sext i32 %a to i64 %idx = add nsw i64 %sexta, 1 %gep = gep i8* %myArray, i64 %idx load i8* %gep That way the computation can be folded into the addressing mode. This transformation is done as part of the addressing mode matcher. If the matching fails (not profitable, addressing mode not legal, etc.), the matcher will revert the related promotions. <rdar://problem/15519855> llvm-svn: 200947
-
Tom Stellard authored
llvm-svn: 200935
-
Tom Stellard authored
llvm-svn: 200934
-
Tom Stellard authored
llvm-svn: 200933
-
Tom Stellard authored
There was a problem with the old pattern, so we were copying some larger immediates into registers when we could have been encoding them in the instruction. llvm-svn: 200932
-
Tim Northover authored
The most important part of this is probably adding any cost at all for operations like zext <8 x i8> to <8 x i32>. Before they were being recorded as extremely costly (24, I believe) which made LLVM fall back on a 4-wide vectorisation of a loop. It also rebalances the values for sext, zext and trunc. Lacking any other sane metric that might work across CPU microarchitectures I went for instructions. This seems to be in reasonable accord with the rest of the table (sitofp, ...) though no doubt at least one value is sub-optimal for some bizarre reason. Finally, separate AVX and AVX2 values are provided where appropriate. The CodeGen is quite different in many cases. rdar://problem/15981990 llvm-svn: 200928
-
Nick Lewycky authored
llvm-svn: 200907
-
Chandler Carruth authored
The primary motivation for this pass is to separate the call graph analysis used by the new pass manager's CGSCC pass management from the existing call graph analysis pass. That analysis pass is (somewhat unfortunately) over-constrained by the existing CallGraphSCCPassManager requirements. Those requirements make it *really* hard to cleanly layer the needed functionality for the new pass manager on top of the existing analysis. However, there are also a bunch of things that the pass manager would specifically benefit from doing differently from the existing call graph analysis, and this new implementation tries to address several of them: - Be lazy about scanning function definitions. The existing pass eagerly scans the entire module to build the initial graph. This new pass is significantly more lazy, and I plan to push this even further to maximize locality during CGSCC walks. - Don't use a single synthetic node to partition functions with an indirect call from functions whose address is taken. This node creates a huge choke-point which would preclude good parallelization across the fanout of the SCC graph when we got to the point of looking at such changes to LLVM. - Use a memory dense and lightweight representation of the call graph rather than value handles and tracking call instructions. This will require explicit update calls instead of some updates working transparently, but should end up being significantly more efficient. The explicit update calls ended up being needed in many cases for the existing call graph so we don't really lose anything. - Doesn't explicitly model SCCs and thus doesn't provide an "identity" for an SCC which is stable across updates. This is essential for the new pass manager to work correctly. - Only form the graph necessary for traversing all of the functions in an SCC friendly order. This is a much simpler graph structure and should be more memory dense. It does limit the ways in which it is appropriate to use this analysis. I wish I had a better name than "call graph". I've commented extensively this aspect. This is still very much a WIP, in fact it is really just the initial bits. But it is about the fourth version of the initial bits that I've implemented with each of the others running into really frustrating problms. This looks like it will actually work and I'd like to split the actual complexity across commits for the sake of my reviewers. =] The rest of the implementation along with lots of wiring will follow somewhat more rapidly now that there is a good path forward. Naturally, this doesn't impact any of the existing optimizer. This code is specific to the new pass manager. A bunch of thanks are deserved for the various folks that have helped with the design of this, especially Nick Lewycky who actually sat with me to go through the fundamentals of the final version here. llvm-svn: 200903
-
Juergen Ributzka authored
During DAGCombine visitShiftByConstant assumes that certain binary operations with only constant operands can always be folded successfully. This is no longer true when the constant is opaque. This commit fixes visitShiftByConstant by not performing the optimization for opaque constants. Otherwise we would end up in an infinite DAGCombine loop. llvm-svn: 200900
-
Manman Ren authored
225 is the default value of inline-threshold. This change will make sure we have the same inlining behavior as prior to r200886. As Chandler points out, even though we don't have code in our testing suite that uses cold attribute, there are larger applications that do use cold attribute. r200886 + this commit intend to keep the same behavior as prior to r200886. We can later on tune the inlinecold-threshold. The main purpose of r200886 is to help performance of instrumentation based PGO before we actually hook up inliner with analysis passes such as BPI and BFI. For instrumentation based PGO, we try to increase inlining of hot functions and reduce inlining of cold functions by setting inlinecold-threshold. Another option suggested by Chandler is to use a boolean flag that controls if we should use OptSizeThreshold for cold functions. The default value of the boolean flag should not change the current behavior. But it gives us less freedom in controlling inlining of cold functions. llvm-svn: 200898
-
Kevin Enderby authored
the << and >> bitwise operators. rdar://15975725 llvm-svn: 200896
-
Paul Robinson authored
Ideally only those transform passes that run at -O0 remain enabled, in reality we get as close as we reasonably can. Passes are responsible for disabling themselves, it's not the job of the pass manager to do it for them. llvm-svn: 200892
-
- Feb 05, 2014
-
-
Manman Ren authored
Added command line option inlinecold-threshold to set threshold for inlining functions with cold attribute. Listen to the cold attribute when it would decrease the inline threshold. llvm-svn: 200886
-
Quentin Colombet authored
find a register. The idea is to choose a color for the variable that cannot be allocated and recolor its interferences around. Unlike the current register allocation scheme, it is allowed to change the color of an already assigned (but maybe not splittable or spillable) live interval while propagating this change to its neighbors. In other word, there are two things that may help finding an available color: - Already assigned variables (RS_Done) can be recolored to different color. - The recoloring allows to catch solutions that needs to touch more that just the neighbors of the current allocated variable. E.g., vA can use {R1, R2 } vB can use { R2, R3} vC can use {R1 } Where vA, vB, and vC cannot be split anymore (they are reloads for instance) and they all interfere. vA is assigned R1 vB is assigned R2 vC tries to evict vA but vA is already done. => Regular register allocation heuristic fails. Last chance recoloring kicks in: vC does as if vA was evicted => vC uses R1. vC is marked as fixed. vA needs to find a color. None are available. vA cannot evict vC: vC is a fixed virtual register now. vA does as if vB was evicted => vA uses R2. vB needs to find a color. R3 is available. Recoloring => vC = R1, vA = R2, vB = R3. <rdar://problem/15947839> llvm-svn: 200883
-
Rafael Espindola authored
Clang itself was not using this. The only way to access it was via llc. llvm-svn: 200862
-
Petar Jovanovic authored
This patch adds NaCl target for Mips. It also forbids indexed loads and stores if the target is NaCl. Patch by Sasa Stankovic. Differential Revision: http://llvm-reviews.chandlerc.com/D2690 llvm-svn: 200855
-
Petar Jovanovic authored
Small code model (and default reloc model) set Reloc::PIC_ in this test, and PIC is not yet supported in MCJIT for MIPS. llvm-svn: 200852
-
Elena Demikhovsky authored
llvm-svn: 200849
-
Logan Chien authored
In Thumb1 mode, bl instruction might be selected for branches between basic blocks in the function if the offset is greater than 2KB. However, this might cause SEGV because the destination symbol is not marked as thumb function and the execution mode will be reset to ARM mode. Since we are sure that these symbols are in the same data fragment, we can simply resolve these local symbols, and don't emit any relocation information for this bl instruction. llvm-svn: 200842
-
Elena Demikhovsky authored
llvm-svn: 200837
-
Michel Danzer authored
Fixes opencl-example if_* tests with radeonsi. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74469 Reviewed-by:
Tom Stellard <thomas.stellard@amd.com> llvm-svn: 200830
-
Kai Nacke authored
This fixes PR18554. Reviewers: Renato Golin, Keith Walker llvm-svn: 200826
-
Elena Demikhovsky authored
Added VPTESTNM instruction. Added a pattern to vselect (lit tests will follow). llvm-svn: 200823
-
Rafael Espindola authored
llvm-svn: 200818
-
Rafael Espindola authored
llvm-svn: 200808
-
Manman Ren authored
llvm-svn: 200807
-
Rafael Espindola authored
llvm-svn: 200803
-
- Feb 04, 2014
-
-
Benjamin Kramer authored
For the odd case of platforms with exp2 available but not ldexp. llvm-svn: 200795
-
Petar Jovanovic authored
Patch implements %hi(sym1 - sym2) and %lo(sym1 - sym2) expressions for MIPS by creating target expression class MipsMCExpr. Patch by Sasa Stankovic. Differential Revision: http://llvm-reviews.chandlerc.com/D2592 llvm-svn: 200783
-
David Peixotto authored
This patch fixes the ldr-pseudo implementation to work when used in inline assembly. The fix is to move arm assembler constant pools from the ARMAsmParser class to the ARMTargetStreamer class. Previously we kept the assembler generated constant pools in the ARMAsmParser object. This does not work for inline assembly because a new parser object is created for each blob of inline assembly. This patch moves the constant pools to the ARMTargetStreamer class so that the constant pool will remain alive for the entire code generation process. An ARMTargetStreamer class is now required for the arm backend. There was no existing implementation for MachO, only Asm and ELF. Instead of creating an empty MachO subclass, we decided to make the ARMTargetStreamer a non-abstract class and provide default (llvm_unreachable) implementations for the non constant-pool related methods. Differential Revision: http://llvm-reviews.chandlerc.com/D2638 llvm-svn: 200777
-
Tom Stellard authored
llvm-svn: 200774
-
Tom Stellard authored
The OpenCL specs say: "The vector versions of the math functions operate component-wise. The description is per-component." Patch by: Jan Vesely Reviewed-by:
Tom Stellard <thomas.stellard@amd.com> Signed-off-by:
Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 200773
-
Tim Northover authored
rdar://problem/13729466 llvm-svn: 200771
-
Tim Northover authored
There was an extremely confusing proliferation of LLVM intrinsics to implement the vacge & vacgt instructions. This combines them all into two polymorphic intrinsics, shared across both backends. llvm-svn: 200768
-