- Jan 22, 2014
-
-
Matt Arsenault authored
llvm-svn: 199836
-
Tim Northover authored
llvm-svn: 199801
-
Chandler Carruth authored
inconsistent results for different orderings of alloca slices. The fundamental issue is that it is just always a mistake to return early from this function. There is no effective early exit to leverage. This patch stops trynig to do so and simplifies the code a bit as a consequence. Original diagnosis and patch by James Molloy with some name tweaks by me in part reflecting feedback from Duncan Smith on the mailing list. llvm-svn: 199771
-
- Jan 20, 2014
-
-
Owen Anderson authored
Fix all the remaining lost-fast-math-flags bugs I've been able to find. The most important of these are cases in the generic logic for combining BinaryOperators. This logic hadn't been updated to handle FastMathFlags, and it took me a while to detect it because it doesn't show up in a simple search for CreateFAdd. llvm-svn: 199629
-
- Jan 19, 2014
-
-
Benjamin Kramer authored
Also make them vector-aware. llvm-svn: 199608
-
Benjamin Kramer authored
llvm-svn: 199605
-
Benjamin Kramer authored
llvm-svn: 199604
-
Benjamin Kramer authored
llvm-svn: 199602
-
Benjamin Kramer authored
llvm-svn: 199598
-
Chandler Carruth authored
intrinsics. Reported on the list by Evan with a couple of attempts to fix, but it took a while to dig down to the root cause. There are two overlapping bugs here, both centering around the circumstance of discovering a memcpy operand which is known to be completely outside the bounds of the alloca. First, we need to kill the *other* side of the memcpy if it was added to this alloca. Otherwise we'll factor it into our slicing and try to rewrite it even though we know for a fact that it is dead. This is made more tricky because we can visit the sides in either order. So we have to both kill the other side and skip instructions marked as dead. The latter really should be goodness in every case, but here is a matter of correctness. Second, we need to actually remove the *uses* of the alloca by the memcpy when queuing it for later deletion. Otherwise it may still be using the alloca when we go to promote it (if the rewrite re-uses the existing alloca instruction). Do this by factoring out the use-clobbering used when for nixing a Phi argument and re-using it across the operands of a to-be-deleted instruction. llvm-svn: 199590
-
Arnold Schwaighofer authored
a reduction. Really. Under certain circumstances (the use list of an instruction has to be set up right - hence the extra pass in the test case) we would not recognize when a value in a potential reduction cycle was used multiple times by the reduction cycle. Fixes PR18526. radar://15851149 llvm-svn: 199570
-
- Jan 18, 2014
-
-
Nick Lewycky authored
Don't refuse to transform constexpr(call(arg, ...)) to call(constexpr(arg), ...)) just because the function has multiple return values even if their return types are the same. Patch by Eduard Burtescu! llvm-svn: 199564
-
Benjamin Kramer authored
PR18532. llvm-svn: 199553
-
Owen Anderson authored
Fix more instances of dropped fast math flags when optimizing FADD instructions. All found by inspection (aka grep). llvm-svn: 199528
-
- Jan 17, 2014
-
-
Kostya Serebryany authored
- add a mode for collecting per-block coverage (-asan-coverage=2). So far the implementation is naive (all blocks are instrumented), the performance overhead on top of asan could be as high as 30%. - Make sure the one-time calls to __sanitizer_cov are moved to function buttom, which in turn required to copy the original debug info into the call insn. Here is the performance data on SPEC 2006 (train data, comparing asan with asan-coverage={0,1,2}): asan+cov0 asan+cov1 diff 0-1 asan+cov2 diff 0-2 diff 1-2 400.perlbench, 65.60, 65.80, 1.00, 76.20, 1.16, 1.16 401.bzip2, 65.10, 65.50, 1.01, 75.90, 1.17, 1.16 403.gcc, 1.64, 1.69, 1.03, 2.04, 1.24, 1.21 429.mcf, 21.90, 22.60, 1.03, 23.20, 1.06, 1.03 445.gobmk, 166.00, 169.00, 1.02, 205.00, 1.23, 1.21 456.hmmer, 88.30, 87.90, 1.00, 91.00, 1.03, 1.04 458.sjeng, 210.00, 222.00, 1.06, 258.00, 1.23, 1.16 462.libquantum, 1.73, 1.75, 1.01, 2.11, 1.22, 1.21 464.h264ref, 147.00, 152.00, 1.03, 160.00, 1.09, 1.05 471.omnetpp, 115.00, 116.00, 1.01, 140.00, 1.22, 1.21 473.astar, 133.00, 131.00, 0.98, 142.00, 1.07, 1.08 483.xalancbmk, 118.00, 120.00, 1.02, 154.00, 1.31, 1.28 433.milc, 19.80, 20.00, 1.01, 20.10, 1.02, 1.01 444.namd, 16.20, 16.20, 1.00, 17.60, 1.09, 1.09 447.dealII, 41.80, 42.20, 1.01, 43.50, 1.04, 1.03 450.soplex, 7.51, 7.82, 1.04, 8.25, 1.10, 1.05 453.povray, 14.00, 14.40, 1.03, 15.80, 1.13, 1.10 470.lbm, 33.30, 34.10, 1.02, 34.10, 1.02, 1.00 482.sphinx3, 12.40, 12.30, 0.99, 13.00, 1.05, 1.06 llvm-svn: 199488
-
- Jan 16, 2014
-
-
Quentin Colombet authored
When registering a pass, a pass can now specify a second construct that takes as argument a pointer to TargetMachine. The PassInfo class has been updated to reflect that possibility. If such a constructor exists opt will use it instead of the default constructor when instantiating the pass. Since such IR passes are supposed to be rare, no specific support has been added to this commit to allow an easy registration of such a pass. In other words, for such pass, the initialization function has to be hand-written (see CodeGenPrepare for instance). Now, codegenprepare can be tested using opt: opt -codegenprepare -mtriple=mytriple input.ll llvm-svn: 199430
-
Owen Anderson authored
llvm-svn: 199427
-
Owen Anderson authored
Fix an instance where we would drop fast math flags when performing an fdiv to reciprocal multiply transformation. llvm-svn: 199425
-
Owen Anderson authored
Fix a bug in InstCombine where we failed to preserve fast math flags when optimizing an FMUL expression. llvm-svn: 199424
-
Owen Anderson authored
Teach InstCombine that (fmul X, -1.0) can be simplified to (fneg X), which LLVM expresses as (fsub -0.0, X). llvm-svn: 199420
-
Evgeniy Stepanov authored
flag from clang, and disable zero-base shadow support on all platforms where it is not the default behavior. - It is completely unused, as far as we know. - It is ABI-incompatible with non-zero-base shadow, which means all objects in a process must be built with the same setting. Failing to do so results in a segmentation fault at runtime. - It introduces a backward dependency of compiler-rt on user code, which is uncommon and complicates testing. This is the LLVM part of a larger change. llvm-svn: 199371
-
- Jan 15, 2014
-
-
Hans Wennborg authored
There has been an old FIXME to find the right cut-off for when it's worth analyzing and potentially transforming a switch to a lookup table. The switches always have two or more cases. I could not measure any speed-up by transforming a switch with two cases. A switch with three cases gets a nice speed-up, and I couldn't measure any compile-time regression, so I think this is the right threshold. In a Clang self-host, this causes 480 new switches to be transformed, and reduces the final binary size with 8 KB. llvm-svn: 199294
-
Arnold Schwaighofer authored
strides Fixes PR18480. llvm-svn: 199291
-
- Jan 14, 2014
-
-
Matt Arsenault authored
llvm-svn: 199254
-
Matt Arsenault authored
Bitcasts can't be between address spaces anymore. llvm-svn: 199253
-
Matt Arsenault authored
llvm-svn: 199246
-
Duncan P. N. Exon Smith authored
Reapply r199191, reverted in r199197 because it carelessly broke Other/link-opts.ll. The problem was that calling createInternalizePass("main") would select createInternalizePass(bool("main")) instead of createInternalizePass(ArrayRef<const char *>("main")). This commit fixes the bug. The original commit message follows. Add API to LTOCodeGenerator to specify a strategy for the -internalize pass. This is a new attempt at Bill's change in r185882, which he reverted in r188029 due to problems with the gold linker. This puts the onus on the linker to decide whether (and what) to internalize. In particular, running internalize before outputting an object file may change a 'weak' symbol into an internal one, even though that symbol could be needed by an external object file --- e.g., with arclite. This patch enables three strategies: - LTO_INTERNALIZE_FULL: the default (and the old behaviour). - LTO_INTERNALIZE_NONE: skip -internalize. - LTO_INTERNALIZE_HIDDEN: only -internalize symbols with hidden visibility. LTO_INTERNALIZE_FULL should be used when linking an executable. Outputting an object file (e.g., via ld -r) is more complicated, and depends on whether hidden symbols should be internalized. E.g., for ld -r, LTO_INTERNALIZE_NONE can be used when -keep_private_externs, and LTO_INTERNALIZE_HIDDEN can be used otherwise. However, LTO_INTERNALIZE_FULL is inappropriate, since the output object file will eventually need to link with others. lto_codegen_set_internalize_strategy() sets the strategy for subsequent calls to lto_codegen_write_merged_modules() and lto_codegen_compile*(). <rdar://problem/14334895> llvm-svn: 199244
-
Nico Rieck authored
Representing dllexport/dllimport as distinct linkage types prevents using these attributes on templates and inline functions. Instead of introducing further mixed linkage types to include linkonce and weak ODR, the old import/export linkage types are replaced with a new separate visibility-like specifier: define available_externally dllimport void @f() {} @Var = dllexport global i32 1, align 4 Linkage for dllexported globals and functions is now equal to their linkage without dllexport. Imported globals and functions must be either declarations with external linkage, or definitions with AvailableExternallyLinkage. llvm-svn: 199218
-
Nico Rieck authored
Revert this for now until I fix an issue in Clang with it. This reverts commit r199204. llvm-svn: 199207
-
Nico Rieck authored
Representing dllexport/dllimport as distinct linkage types prevents using these attributes on templates and inline functions. Instead of introducing further mixed linkage types to include linkonce and weak ODR, the old import/export linkage types are replaced with a new separate visibility-like specifier: define available_externally dllimport void @f() {} @Var = dllexport global i32 1, align 4 Linkage for dllexported globals and functions is now equal to their linkage without dllexport. Imported globals and functions must be either declarations with external linkage, or definitions with AvailableExternallyLinkage. llvm-svn: 199204
-
NAKAMURA Takumi authored
Please update also Other/link-opts.ll, in next time. llvm-svn: 199197
-
Duncan P. N. Exon Smith authored
Add API to LTOCodeGenerator to specify a strategy for the -internalize pass. This is a new attempt at Bill's change in r185882, which he reverted in r188029 due to problems with the gold linker. This puts the onus on the linker to decide whether (and what) to internalize. In particular, running internalize before outputting an object file may change a 'weak' symbol into an internal one, even though that symbol could be needed by an external object file --- e.g., with arclite. This patch enables three strategies: - LTO_INTERNALIZE_FULL: the default (and the old behaviour). - LTO_INTERNALIZE_NONE: skip -internalize. - LTO_INTERNALIZE_HIDDEN: only -internalize symbols with hidden visibility. LTO_INTERNALIZE_FULL should be used when linking an executable. Outputting an object file (e.g., via ld -r) is more complicated, and depends on whether hidden symbols should be internalized. E.g., for ld -r, LTO_INTERNALIZE_NONE can be used when -keep_private_externs, and LTO_INTERNALIZE_HIDDEN can be used otherwise. However, LTO_INTERNALIZE_FULL is inappropriate, since the output object file will eventually need to link with others. lto_codegen_set_internalize_strategy() sets the strategy for subsequent calls to lto_codegen_write_merged_modules() and lto_codegen_compile*(). <rdar://problem/14334895> llvm-svn: 199191
-
- Jan 13, 2014
-
-
Chandler Carruth authored
can be used by both the new pass manager and the old. This removes it from any of the virtual mess of the pass interfaces and lets it derive cleanly from the DominatorTreeBase<> template. In turn, tons of boilerplate interface can be nuked and it turns into a very straightforward extension of the base DominatorTree interface. The old analysis pass is now a simple wrapper. The names and style of this split should match the split between CallGraph and CallGraphWrapperPass. All of the users of DominatorTree have been updated to match using many of the same tricks as with CallGraph. The goal is that the common type remains the resulting DominatorTree rather than the pass. This will make subsequent work toward the new pass manager significantly easier. Also in numerous places things became cleaner because I switched from re-running the pass (!!! mid way through some other passes run!!!) to directly recomputing the domtree. llvm-svn: 199104
-
Chandler Carruth authored
trees into the Support library. These are all expressed in terms of the generic GraphTraits and CFG, with no reliance on any concrete IR types. Putting them in support clarifies that and makes the fact that the static analyzer in Clang uses them much more sane. When moving the Dominators.h file into the IR library I claimed that this was the right home for it but not something I planned to work on. Oops. So why am I doing this? It happens to be one step toward breaking the requirement that IR verification can only be performed from inside of a pass context, which completely blocks the implementation of verification for the new pass manager infrastructure. Fixing it will also allow removing the concept of the "preverify" step (WTF???) and allow the verifier to cleanly flag functions which fail verification in a way that precludes even computing dominance information. Currently, that results in a fatal error even when you ask the verifier to not fatally error. It's awesome like that. The yak shaving will continue... llvm-svn: 199095
-
Chandler Carruth authored
directory. These passes are already defined in the IR library, and it doesn't make any sense to have the headers in Analysis. Long term, I think there is going to be a much better way to divide these matters. The dominators code should be fully separated into the abstract graph algorithm and have that put in Support where it becomes obvious that evn Clang's CFGBlock's can use it. Then the verifier can manually construct dominance information from the Support-driven interface while the Analysis library can provide a pass which both caches, reconstructs, and supports a nice update API. But those are very long term, and so I don't want to leave the really confusing structure until that day arrives. llvm-svn: 199082
-
Chandler Carruth authored
llvm-svn: 199080
-
- Jan 12, 2014
-
-
Hans Wennborg authored
case when the lookup table doesn't have any holes. This means we can build a lookup table for switches like this: switch (x) { case 0: return 1; case 1: return 2; case 2: return 3; case 3: return 4; default: exit(1); } The default case doesn't yield a constant result here, but that doesn't matter, since a default result is only necessary for filling holes in the lookup table, and this table doesn't have any holes. This makes us transform 505 more switches in a clang bootstrap, and shaves 164 KB off the resulting clang binary. llvm-svn: 199025
-
- Jan 11, 2014
-
-
Arnold Schwaighofer authored
I saw no compile or execution time regressions on x86_64 -mavx -O3. radar://13075509 llvm-svn: 199015
-
NAKAMURA Takumi authored
Excuse me, I hope msc16 builders would be fine till its end day. Introduce nullptr then. ;) llvm-svn: 199001
-
Diego Novillo authored
1- Use the line_iterator class to read profile files. 2- Allow comments in profile file. Lines starting with '#' are completely ignored while reading the profile. 3- Add parsing support for discriminators and indirect call samples. Our external profiler can emit more profile information that we are currently not handling. This patch does not add new functionality to support this information, but it allows profile files to provide it. I will add actual support later on (for at least one of these features, I need support for DWARF discriminators in Clang). A sample line may contain the following additional information: Discriminator. This is used if the sampled program was compiled with DWARF discriminator support (http://wiki.dwarfstd.org/index.php?title=Path_Discriminators). This is currently only emitted by GCC and we just ignore it. Potential call targets and samples. If present, this line contains a call instruction. This models both direct and indirect calls. Each called target is listed together with the number of samples. For example, 130: 7 foo:3 bar:2 baz:7 The above means that at relative line offset 130 there is a call instruction that calls one of foo(), bar() and baz(). With baz() being the relatively more frequent call target. Differential Revision: http://llvm-reviews.chandlerc.com/D2355 4- Simplify format of profile input file. This implements earlier suggestions to simplify the format of the sample profile file. The symbol table is not necessary and function profiles do not need to know the number of samples in advance. Differential Revision: http://llvm-reviews.chandlerc.com/D2419 llvm-svn: 198973
-