- Apr 08, 2012
-
-
Chandler Carruth authored
optimizations which are valid for position independent code being linked into a single executable, but not for such code being linked into a shared library. I discussed the design of this with Eric Christopher, and the decision was to support an optional bit rather than a completely separate relocation model. Fundamentally, this is still PIC relocation, its just that certain optimizations are only valid under a PIC relocation model when the resulting code won't be in a shared library. The simplest path to here is to expose a single bit option in the TargetOptions. If folks have different/better designs, I'm all ears. =] I've included the first optimization based upon this: changing TLS models to the *Exec models when PIE is enabled. This is the LLVM component of PR12380 and is all of the hard work. llvm-svn: 154294
-
Chandler Carruth authored
in TargetLowering. There was already a FIXME about this location being odd. The interface is simplified as a consequence. This will also make it easier to change TLS models when compiling with PIE. llvm-svn: 154292
-
Benjamin Kramer authored
EngineBuilder::create is expected to take ownership of the TargetMachine passed to it. Delete it on error or when we create an interpreter that doesn't need it. llvm-svn: 154288
-
Chandler Carruth authored
where a chain outside of the loop block-set ended up in the worklist for scheduling as part of the contiguous loop. However, asserting the first block in the chain is in the loop-set isn't a valid check -- we may be forced to drag a chain into the worklist due to one block in the chain being part of the loop even though the first block is *not* in the loop. This occurs when we have been forced to form a chain early due to un-analyzable branches. No test case here as I have no idea how to even begin reducing one, and it will be hopelessly fragile. We have to somehow end up with a loop header of an inner loop which is a successor of a basic block with an unanalyzable pair of branch instructions. Ow. Self-host triggers it so it is unlikely it will regress. This at least gets block placement back to passing selfhost and the test suite. There are still a lot of slowdown that I don't like coming out of block placement, although there are now also a lot of speedups. =[ I'm seeing swings in both directions up to 10%. I'm going to try to find time to dig into this and see if we can turn this on for 3.1 as it does a really good job of cleaning up after some loops that degraded with the inliner changes. llvm-svn: 154287
-
Chandler Carruth authored
debugging. llvm-svn: 154286
-
Chandler Carruth authored
GEPs, bit casts, and stores reaching it but no other instructions. These often show up during the iterative processing of the inliner, SROA, and DCE. Once we hit this point, we can completely remove the alloca. These were actually showing up in the final, fully optimized code in a bunch of inliner tests I've been working on, and notably they show up after LLVM finishes optimizing away all function calls involved in hash_combine(a, b). llvm-svn: 154285
-
Nadav Rotem authored
Previously we used three instructions to broadcast an immediate value into a vector register. On Sandybridge we continue to load the broadcasted value from the constant pool. llvm-svn: 154284
-
Bill Wendling authored
llvm-svn: 154283
-
Bill Wendling authored
llvm-svn: 154282
-
Bill Wendling authored
llvm-svn: 154281
-
Bill Wendling authored
An MDNode has a list of MDNodeOperands allocated directly after it as part of its allocation. Therefore, the Parent of the MDNodeOperands can be found by walking back through the operands to the beginning of that list. Mark the first operand's value pointer as being the 'first' operand so that we know where the beginning of said list is. This saves a *lot* of space during LTO with -O0 -g flags. llvm-svn: 154280
-
Bill Wendling authored
value pointer by making the value pointer into a pointer-int pair with 2 bits available for flags. llvm-svn: 154279
-
Craig Topper authored
Turn avx2 vinserti128 intrinsic calls into INSERT_SUBVECTOR DAG nodes and remove patterns for selecting the intrinsic. Similar was already done for avx1. llvm-svn: 154272
-
- Apr 07, 2012
-
-
Craig Topper authored
Move vinsertf128 patterns near the instruction definitions. Add AddedComplexity to AVX2 vextracti128 patterns to give them priority over the integer versions of vextractf128 patterns. llvm-svn: 154268
-
Craig Topper authored
llvm-svn: 154267
-
Nadav Rotem authored
shuffle node because it could introduce new shuffle nodes that were not supported efficiently by the target. 2. Add a more restrictive shuffle-of-shuffle optimization for cases where the second shuffle reverses the transformation of the first shuffle. llvm-svn: 154266
-
Duncan Sands authored
reciprocal if converting to the reciprocal is exact. Do it even if inexact if -ffast-math. This substantially speeds up ac.f90 from the polyhedron benchmarks. llvm-svn: 154265
-
Chandler Carruth authored
optimizers could do this for us, but expecting partial SROA of classes with template methods through cloning is probably expecting too much heroics. With this change, the begin/end pointer pairs which indicate the status of each loop iteration are actually passed directly into each layer of the combine_data calls, and the inliner has a chance to see when most of the combine_data function could be deleted by inlining. Similarly for 'length'. We have to be careful to limit the places where in/out reference parameters are used as those will also defeat the inliner / optimizers from properly propagating constants. With this change, LLVM is able to fully inline and unroll the hash computation of small sets of values, such as two or three pointers. These now decompose into essentially straight-line code with no loops or function calls. There is still one code quality problem to be solved with the hashing -- LLVM is failing to nuke the alloca. It removes all loads from the alloca, leaving only lifetime intrinsics and dead(!!) stores to the alloca. =/ Very unfortunate. llvm-svn: 154264
-
Chandler Carruth authored
speculate. Without this, loop rotate (among many other places) would suddenly stop working in the presence of debug info. I found this looking at loop rotate, and have augmented its tests with a reduction out of a very hot loop in yacr2 where failing to do this rotation costs sometimes more than 10% in runtime performance, perturbing numerous downstream optimizations. This should have no impact on performance without debug info, but the change in performance when debug info is enabled can be extreme. As a consequence (and this how I got to this yak) any profiling of performance problems should be treated with deep suspicion -- they may have been wildly innacurate of debug info was enabled for profiling. =/ Just a heads up. llvm-svn: 154263
-
Benjamin Kramer authored
Found by inspection. llvm-svn: 154262
-
rdar://problem/11203543Bob Wilson authored
The tLDRr instruction with the last register operand set to the zero register prints in assembly as if no register was specified, and the assembler encodes it as a tLDRi instruction with a zero immediate. With the integrated assembler, that zero register gets emitted as "r0", so we get "ldr rx, [ry, r0]" which is broken. Emit the instruction as tLDRi with a zero immediate. I don't know if there's a good way to write a testcase for this. Suggestions welcome. Opportunities for follow-up work: 1) The asm printer should complain if a non-optional register operand is set to the zero register, instead of silently dropping it. 2) The integrated assembler should complain in the same situation, instead of silently emitting the operand as "r0". llvm-svn: 154261
-
Hongbin Zheng authored
llvm-svn: 154249
-
NAKAMURA Takumi authored
Cygwin-1.7 supports dw2. Some recent mingw distros support one, too. I have confirmed test-suite/SingleSource/Benchmarks/Shootout-C++/except.cpp can pass on Cygwin. llvm-svn: 154247
-
Alexis Hunt authored
string. llvm-svn: 154243
-
Alexis Hunt authored
by default. This is a behaviour configurable in the MCAsmInfo. I've decided to turn it on by default in (possibly optimistic) hopes that most assemblers are reasonably sane. If this proves a problem, switching to default seems reasonable. I'm not sure if this is the opportune place to test, but it seemed good to make sure it was tested somewhere. llvm-svn: 154235
-
Jim Grosbach authored
llvm-svn: 154226
-
- Apr 06, 2012
-
-
Jakob Stoklund Olesen authored
llvm-svn: 154210
-
Jakob Stoklund Olesen authored
After register masks were introdruced to represent the call clobbers, it is no longer necessary to have duplicate instruction for iOS. llvm-svn: 154209
-
Akira Hatanaka authored
llvm-svn: 154202
-
Chandler Carruth authored
which exists for this purpose. llvm-svn: 154199
-
Sean Callanan authored
disassembler requires a MCSubtargetInfo and a MCInstrInfo to exist in order to initialize the instruction printer and disassembler; however, although the printer and disassembler keep references to these objects they do not own them. Previously, the MCSubtargetInfo and MCInstrInfo objects were just leaked. I have extended LLVMDisasmContext to own these objects and delete them when it is destroyed. llvm-svn: 154192
-
Jakob Stoklund Olesen authored
ARM and Thumb2 mode can use cmn instructions to compare against negative immediates. Thumb1 mode can't. llvm-svn: 154183
-
David Chisnall authored
parameter until we have a more sensible API for doing the same thing. Reviewed by Chandler. llvm-svn: 154180
-
Chandler Carruth authored
simplification has been performed. This is a bit less efficient (requires another ilist walk of the basic blocks) but shouldn't matter in practice. More importantly, it's just too much work to keep track of all the various ways the return instructions can be mutated while simplifying them. This fixes yet another crasher, reported by Daniel Dunbar. llvm-svn: 154179
-
Chandler Carruth authored
Smith for pointing this out in review. llvm-svn: 154178
-
Duncan Sands authored
The modifications are a lot more trivial than they appear to be in the diff! llvm-svn: 154174
-
Craig Topper authored
llvm-svn: 154172
-
Benjamin Kramer authored
llvm-svn: 154171
-
Benjamin Kramer authored
DenseMap: Perform the pod-like object optimization when the value type is POD-like, not the DenseMapInfo for it. Purge now unused template arguments. This has been broken since r91421. Patch by Lubos Lunak! llvm-svn: 154170
-
Craig Topper authored
Allow 256-bit shuffles to be split if a 128-bit lane contains elements from a single source. This is a rewrite of the 256-bit shuffle splitting code based on similar code from legalize types. Fixes PR12413. llvm-svn: 154166
-