- Apr 08, 2012
-
-
Chandler Carruth authored
where a chain outside of the loop block-set ended up in the worklist for scheduling as part of the contiguous loop. However, asserting the first block in the chain is in the loop-set isn't a valid check -- we may be forced to drag a chain into the worklist due to one block in the chain being part of the loop even though the first block is *not* in the loop. This occurs when we have been forced to form a chain early due to un-analyzable branches. No test case here as I have no idea how to even begin reducing one, and it will be hopelessly fragile. We have to somehow end up with a loop header of an inner loop which is a successor of a basic block with an unanalyzable pair of branch instructions. Ow. Self-host triggers it so it is unlikely it will regress. This at least gets block placement back to passing selfhost and the test suite. There are still a lot of slowdowns that I don't like coming out of block placement, although there are now also a lot of speedups. =[ I'm seeing swings in both directions up to 10%. I'm going to try to find time to dig into this and see if we can turn this on for 3.1 as it does a really good job of cleaning up after some loops that degraded with the inliner changes. llvm-svn: 154287
-
Chandler Carruth authored
debugging. llvm-svn: 154286
-
Chandler Carruth authored
GEPs, bit casts, and stores reaching it but no other instructions. These often show up during the iterative processing of the inliner, SROA, and DCE. Once we hit this point, we can completely remove the alloca. These were actually showing up in the final, fully optimized code in a bunch of inliner tests I've been working on, and notably they show up after LLVM finishes optimizing away all function calls involved in hash_combine(a, b). llvm-svn: 154285
-
Nadav Rotem authored
Previously we used three instructions to broadcast an immediate value into a vector register. On Sandybridge we continue to load the broadcasted value from the constant pool. llvm-svn: 154284
-
Bill Wendling authored
llvm-svn: 154283
-
Bill Wendling authored
llvm-svn: 154282
-
Bill Wendling authored
llvm-svn: 154281
-
Bill Wendling authored
An MDNode has a list of MDNodeOperands allocated directly after it as part of its allocation. Therefore, the Parent of the MDNodeOperands can be found by walking back through the operands to the beginning of that list. Mark the first operand's value pointer as being the 'first' operand so that we know where the beginning of said list is. This saves a *lot* of space during LTO with -O0 -g flags. llvm-svn: 154280
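The walk-back described above can be sketched in standalone C++. This is only an illustration of the idea, not LLVM's actual code: `Node`, `Operand`, `isFirst`, and `parentOf` are hypothetical names, and the flag is stored as a plain `bool` here rather than inside the value pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-ins for MDNodeOperand and MDNode; not LLVM's real classes.
struct Operand {
  int value;
  bool isFirst; // true only for operand 0 of its node
};

struct Node {
  std::size_t numOperands;

  // The operands live in the same allocation, immediately after the node.
  Operand *operands() { return reinterpret_cast<Operand *>(this + 1); }

  static Node *create(std::size_t n) {
    assert(n > 0 && "sketch assumes at least one operand");
    void *mem = std::malloc(sizeof(Node) + n * sizeof(Operand));
    if (!mem)
      throw std::bad_alloc();
    Node *node = new (mem) Node{n};
    for (std::size_t i = 0; i < n; ++i)
      new (node->operands() + i) Operand{0, i == 0}; // flag only the first one
    return node;
  }

  // Both types are trivially destructible, so freeing the block is enough.
  static void destroy(Node *node) { std::free(node); }
};

static_assert(sizeof(Node) % alignof(Operand) == 0,
              "operands must be suitably aligned right after the node");

// Recover the owning node without storing a parent pointer in each operand:
// step back to the flagged first operand, then step back over the node header.
Node *parentOf(Operand *op) {
  while (!op->isFirst)
    --op;
  return reinterpret_cast<Node *>(op) - 1;
}
```

The trade-off in this sketch is a linear walk on lookup in exchange for dropping one pointer per operand, which is where the space saving would come from.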
-
Bill Wendling authored
value pointer by making the value pointer into a pointer-int pair with 2 bits available for flags. llvm-svn: 154279
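A pointer-int pair of this kind can be sketched as follows. LLVM has its own `PointerIntPair` utility; the `TaggedPtr` class below is a hypothetical standalone version that assumes the pointee is at least 4-byte aligned, so the two low bits of the address are always zero and can carry flags.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mask name; two low bits are reserved for flags.
constexpr std::uintptr_t kFlagMask = 0x3;

// Minimal sketch of a pointer plus a 2-bit integer packed into one word.
template <typename T> class TaggedPtr {
  static_assert(alignof(T) >= 4, "need two free low bits in the pointer");
  std::uintptr_t bits = 0;

public:
  TaggedPtr() = default;
  explicit TaggedPtr(T *p, unsigned flags = 0) { set(p, flags); }

  void set(T *p, unsigned flags) {
    auto raw = reinterpret_cast<std::uintptr_t>(p);
    assert((raw & kFlagMask) == 0 && "pointer is not 4-byte aligned");
    assert(flags <= kFlagMask && "only two flag bits are available");
    bits = raw | flags;
  }

  // Mask the flags off before handing the pointer back.
  T *pointer() const { return reinterpret_cast<T *>(bits & ~kFlagMask); }
  unsigned flags() const { return static_cast<unsigned>(bits & kFlagMask); }
};
```

The object stays the size of a single pointer, so adding a flag costs no extra storage.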
-
Richard Smith authored
converting from std::nullptr_t, the subexpression might have side-effects. llvm-svn: 154278
-
Michael J. Spencer authored
llvm-svn: 154277
-
Michael J. Spencer authored
llvm-svn: 154276
-
Michael J. Spencer authored
llvm-svn: 154275
-
Michael J. Spencer authored
llvm-svn: 154274
-
Francois Pichet authored
ext_reserved_user_defined_literal must not default to Error in MicrosoftMode. Hence create ext_ms_reserved_user_defined_literal that doesn't default to Error; otherwise MSVC headers won't parse. Fixes PR12383. llvm-svn: 154273
-
Craig Topper authored
Turn avx2 vinserti128 intrinsic calls into INSERT_SUBVECTOR DAG nodes and remove patterns for selecting the intrinsic. Similar was already done for avx1. llvm-svn: 154272
-
Simon Atanasyan authored
llvm-svn: 154270
-
Simon Atanasyan authored
llvm-svn: 154269
-
- Apr 07, 2012
-
-
Craig Topper authored
Move vinsertf128 patterns near the instruction definitions. Add AddedComplexity to AVX2 vextracti128 patterns to give them priority over the integer versions of vextractf128 patterns. llvm-svn: 154268
-
Craig Topper authored
llvm-svn: 154267
-
Nadav Rotem authored
shuffle node because it could introduce new shuffle nodes that were not supported efficiently by the target. 2. Add a more restrictive shuffle-of-shuffle optimization for cases where the second shuffle reverses the transformation of the first shuffle. llvm-svn: 154266
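One way to picture the restricted case in item 2 is a check on two single-input shuffle masks: the second undoes the first exactly when their composition is the identity. The sketch below is an assumption about what "reverses" means, not the DAG combiner's code; `composeMasks` and `secondUndoesFirst` are hypothetical names, and undefined lanes (negative indices) are conservatively rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Convention assumed here: shuffle(v, mask)[i] == v[mask[i]].

// Mask equivalent to shuffle(shuffle(v, first), second).
std::vector<int> composeMasks(const std::vector<int> &first,
                              const std::vector<int> &second) {
  std::vector<int> out(second.size());
  for (std::size_t i = 0; i < second.size(); ++i) {
    int idx = second[i];
    assert(idx >= 0 && static_cast<std::size_t>(idx) < first.size());
    out[i] = first[static_cast<std::size_t>(idx)];
  }
  return out;
}

// True if applying `second` after `first` yields the original vector,
// so the pair of shuffles could be replaced by its input.
bool secondUndoesFirst(const std::vector<int> &first,
                       const std::vector<int> &second) {
  if (first.size() != second.size())
    return false;
  for (std::size_t i = 0; i < second.size(); ++i) {
    int idx = second[i];
    if (idx < 0 || static_cast<std::size_t>(idx) >= first.size())
      return false; // undefined or out-of-range lane: give up
    if (first[static_cast<std::size_t>(idx)] != static_cast<int>(i))
      return false;
  }
  return true;
}
```

Because the result is the original vector, this case never introduces a new shuffle mask that the target would have to support.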
-
Duncan Sands authored
reciprocal if converting to the reciprocal is exact. Do it even if inexact if -ffast-math. This substantially speeds up ac.f90 from the polyhedron benchmarks. llvm-svn: 154265
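As a rough illustration of the exactness condition: dividing by a constant gives the same result as multiplying by its reciprocal when that reciprocal is exactly representable, which for binary floating point means the constant is a power of two. The helper names below are hypothetical, and the check is deliberately conservative (it also rejects reciprocals that would be subnormal or overflow).

```cpp
#include <cassert>
#include <cmath>

// Conservative test: true only if 1/c is exactly representable as a normal
// double, i.e. |c| is a power of two with an in-range reciprocal.
bool hasExactReciprocal(double c) {
  if (!std::isfinite(c) || c == 0.0)
    return false;
  int exponent = 0;
  // |c| == mantissa * 2^exponent with mantissa in [0.5, 1).
  double mantissa = std::frexp(std::fabs(c), &exponent);
  if (mantissa != 0.5)
    return false; // not a power of two
  return std::isnormal(1.0 / c);
}

// Multiply instead of divide; only valid under the exactness precondition.
double multiplyByReciprocal(double x, double c) {
  assert(hasExactReciprocal(c));
  return x * (1.0 / c);
}

// Picks the cheaper form when it cannot change the result.
double divideByConstant(double x, double c) {
  return hasExactReciprocal(c) ? multiplyByReciprocal(x, c) : x / c;
}
```

Under -ffast-math the exactness check would simply be skipped, at the cost of a possible rounding difference.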
-
Chandler Carruth authored
optimizers could do this for us, but expecting partial SROA of classes with template methods through cloning is probably expecting too much heroics. With this change, the begin/end pointer pairs which indicate the status of each loop iteration are actually passed directly into each layer of the combine_data calls, and the inliner has a chance to see when most of the combine_data function could be deleted by inlining. Similarly for 'length'. We have to be careful to limit the places where in/out reference parameters are used as those will also defeat the inliner / optimizers from properly propagating constants. With this change, LLVM is able to fully inline and unroll the hash computation of small sets of values, such as two or three pointers. These now decompose into essentially straight-line code with no loops or function calls. There is still one code quality problem to be solved with the hashing -- LLVM is failing to nuke the alloca. It removes all loads from the alloca, leaving only lifetime intrinsics and dead(!!) stores to the alloca. =/ Very unfortunate. llvm-svn: 154264
-
Chandler Carruth authored
speculate. Without this, loop rotate (among many other places) would suddenly stop working in the presence of debug info. I found this looking at loop rotate, and have augmented its tests with a reduction out of a very hot loop in yacr2 where failing to do this rotation costs sometimes more than 10% in runtime performance, perturbing numerous downstream optimizations. This should have no impact on performance without debug info, but the change in performance when debug info is enabled can be extreme. As a consequence (and this is how I got to this yak) any profiling of performance problems should be treated with deep suspicion -- they may have been wildly inaccurate if debug info was enabled for profiling. =/ Just a heads up. llvm-svn: 154263
-
Benjamin Kramer authored
Found by inspection. llvm-svn: 154262
-
rdar://problem/11203543
Bob Wilson authored
The tLDRr instruction with the last register operand set to the zero register prints in assembly as if no register was specified, and the assembler encodes it as a tLDRi instruction with a zero immediate. With the integrated assembler, that zero register gets emitted as "r0", so we get "ldr rx, [ry, r0]" which is broken. Emit the instruction as tLDRi with a zero immediate. I don't know if there's a good way to write a testcase for this. Suggestions welcome. Opportunities for follow-up work: 1) The asm printer should complain if a non-optional register operand is set to the zero register, instead of silently dropping it. 2) The integrated assembler should complain in the same situation, instead of silently emitting the operand as "r0". llvm-svn: 154261
-
Hongbin Zheng authored
performance, patched by Johannes Doerfert <johannes@jdoerfert.de>. llvm-svn: 154260
-
Hongbin Zheng authored
llvm-svn: 154259
-
Hongbin Zheng authored
patched by Johannes Doerfert <johannes@jdoerfert.de>. llvm-svn: 154258
-
Benjamin Kramer authored
llvm-svn: 154255
-
NAKAMURA Takumi authored
llvm-svn: 154254
-
Jason Molenda authored
llvm-svn: 154252
-
Tobias Grosser authored
Grouped unrolling means that we unroll a loop such that the different instances of a certain statement are scheduled right after each other, but we do not generate any vector code. The idea here is that we can schedule the bb vectorizer right afterwards and use its heuristics to decide when vectorization should be performed. llvm-svn: 154251
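To make the schedule concrete, here is a hand-written sketch of what a loop with two statements might look like before and after grouped unrolling by a factor of four. The loop body, the factor, and the function names are all made up for illustration; Polly's generated code is not shown in the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference loop: statements S1 and S2 alternate on every iteration.
void scalarLoop(const std::vector<float> &b, std::vector<float> &a,
                std::vector<float> &c) {
  assert(a.size() == b.size() && c.size() == b.size());
  for (std::size_t i = 0; i < b.size(); ++i) {
    a[i] = b[i] + 1.0f; // S1
    c[i] = a[i] * 2.0f; // S2
  }
}

// Grouped unrolling by 4: the four instances of S1 are adjacent, then the
// four instances of S2. The code is still scalar; a later pass such as a
// basic-block vectorizer may decide whether to merge each group.
void groupedUnrolledLoop(const std::vector<float> &b, std::vector<float> &a,
                         std::vector<float> &c) {
  assert(a.size() == b.size() && c.size() == b.size());
  const std::size_t n = b.size();
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    a[i] = b[i] + 1.0f;         // S1, instance 0
    a[i + 1] = b[i + 1] + 1.0f; // S1, instance 1
    a[i + 2] = b[i + 2] + 1.0f; // S1, instance 2
    a[i + 3] = b[i + 3] + 1.0f; // S1, instance 3
    c[i] = a[i] * 2.0f;         // S2, instance 0
    c[i + 1] = a[i + 1] * 2.0f; // S2, instance 1
    c[i + 2] = a[i + 2] * 2.0f; // S2, instance 2
    c[i + 3] = a[i + 3] * 2.0f; // S2, instance 3
  }
  for (; i < n; ++i) { // remainder iterations keep the original order
    a[i] = b[i] + 1.0f;
    c[i] = a[i] * 2.0f;
  }
}
```

The reordering is only valid here because each S2 instance depends solely on the S1 instance of the same iteration and the three arrays are distinct.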
-
Jason Molenda authored
nanoseconds in 32-bit expression would cause pthread_cond_timedwait to time out immediately. Add explicit casts to the TimeValue::TimeValue ctor that takes a struct timeval and change the NanoSecsPerSec etc constants defined in TimeValue to be uint64_t so any other calculations involving these should be promoted to 64-bit even when lldb is built for 32-bit. <rdar://problem/11204073>, <rdar://problem/11179821>, <rdar://problem/11194705>. llvm-svn: 154250
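The overflow is easy to reproduce in isolation. The sketch below is not lldb's TimeValue code: the two constants and both functions are hypothetical, and unsigned 32-bit arithmetic is used so the wraparound is well defined. It only shows why widening the constant to `uint64_t` changes the result.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 32-bit and 64-bit versions of a nanoseconds-per-second constant.
constexpr std::uint32_t kNanoSecsPerSec32 = 1000000000u;
constexpr std::uint64_t kNanoSecsPerSec64 = 1000000000ull;

// Every operand is 32 bits wide, so the sum wraps modulo 2^32 before it is
// widened for the return value.
std::uint64_t toNanosNarrow(std::uint32_t sec, std::uint32_t usec) {
  assert(usec < 1000000u && "microseconds field out of range");
  std::uint32_t ns = sec * kNanoSecsPerSec32 + usec * 1000u;
  return ns;
}

// The 64-bit constant promotes the whole expression to 64 bits first.
std::uint64_t toNanosWide(std::uint32_t sec, std::uint32_t usec) {
  assert(usec < 1000000u && "microseconds field out of range");
  return sec * kNanoSecsPerSec64 + usec * std::uint64_t{1000};
}
```

With `sec == 5`, the narrow version yields 705032704 instead of 5000000000. A deadline computed that way lies far in the past, which is consistent with a timed wait returning immediately.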
-
Hongbin Zheng authored
llvm-svn: 154249
-
John McCall authored
  - The [class.protected] restriction is non-trivial for any instance member, even if the access lacks an object (for example, if it's a pointer-to-member constant). In this case, it is equivalent to requiring the naming class to equal the context class.
  - The [class.protected] restriction applies to accesses to constructors and destructors. A protected constructor or destructor can only be used to create or destroy a base subobject, as a direct result.
  - Several places were dropping or misapplying object information. The standard could really be much clearer about what the object type is supposed to be in some of these accesses. Usually it's easy enough to find a reasonable answer, but still, the standard makes a very confident statement about accesses to instance members only being possible in either pointer-to-member literals or member access expressions, which just completely ignores concepts like constructor and destructor calls, using declarations, unevaluated field references, etc. llvm-svn: 154248
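A small example may help with the first two points. The classes below are hypothetical; the accepted forms compile, and the forms that the [class.protected] rule rejects are left as comments so the snippet still builds.

```cpp
#include <cassert>

class Base {
protected:
  Base() = default;  // protected constructor
  ~Base() = default; // protected destructor
  int value = 7;
};

class Derived : public Base {
public:
  // OK: Derived's implicit constructor and destructor create and destroy
  // a *base subobject*, which is the permitted use.

  int viaThis() const { return value; } // OK: object type is Derived

  static int viaDerived(const Derived &d) { return d.value; } // OK

  // Ill-formed: the object expression has type Base, not Derived.
  // static int viaBase(const Base &b) { return b.value; }

  int viaMemberPointer() const {
    // OK: the naming class (Derived) equals the context class.
    int Base::*p = &Derived::value;
    // Ill-formed: naming class is Base, context class is Derived.
    // int Base::*q = &Base::value;
    return this->*p;
  }

  // Ill-formed: this would construct a complete Base object rather than
  // a base subobject.
  // static void makeBase() { Base b; }
};

// Usage sketch exercising only the well-formed accesses.
inline void checkProtectedAccess() {
  Derived d;
  assert(d.viaThis() == 7);
  assert(Derived::viaDerived(d) == 7);
  assert(d.viaMemberPointer() == 7);
}
```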
-
NAKAMURA Takumi authored
Cygwin-1.7 supports dw2. Some recent mingw distros support it, too. I have confirmed test-suite/SingleSource/Benchmarks/Shootout-C++/except.cpp can pass on Cygwin. llvm-svn: 154247
-
Nick Kledzik authored
llvm-svn: 154246
-
Alexis Hunt authored
string. llvm-svn: 154243
-
Nick Kledzik authored
a hello world executable from atoms. There is still much to be fleshed out. Added one test case, test/darwin/hello-world.objtxt, which exercises the darwin platform. Added -platform option to lld-core tool to dynamically select platform. llvm-svn: 154242
-