- Dec 05, 2013
-
Yi Jiang authored
llvm-svn: 196544
-
Renato Golin authored
The test is platform independent, but I don't want to force vector-width, as that could spoil the pragma test. llvm-svn: 196539
-
Renato Golin authored
The intended behaviour is to force vectorization in the presence of the flag (either on or off), and to fall back to the default behaviour in its absence. Tests were added to make sure all cases are covered in opt. No tests were added for other tools, on the assumption that they should use the PassManagerBuilder in the same way. This patch also removes the outdated -late-vectorize flag, which was on by default and not helping much. The pragma metadata is attached in the same place as other loop metadata, but nothing forbids attaching it to a function (to enable #pragma optimize) or to basic blocks (to hint the basic-block vectorizers), etc.; the logic should be the same all around. Patches to Clang to produce the metadata will follow once the initial implementation is agreed upon and committed. Patches to other vectorizers (such as SLP and BB) will be added once we're happy with the pass manager changes. llvm-svn: 196537
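For reference, a minimal sketch of the intended usage once the Clang side lands (the pragma spelling shown is the one Clang eventually adopted; this commit only handles the metadata side, and the message says the Clang patches are still to come):

```cpp
// Assumed later Clang spelling of the loop hint. The pragma attaches loop
// metadata of the kind this PassManagerBuilder change teaches the
// vectorizer to honor; nothing here is from the commit itself.
void saxpy(float *a, const float *b, float alpha, int n) {
#pragma clang loop vectorize(enable)
  for (int i = 0; i < n; ++i)
    a[i] += alpha * b[i];
}
```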
-
Arnold Schwaighofer authored
We were creating external uses for scalar values in MustGather entries that also had a ScalarToTreeEntry (they are also present in a vectorized tuple). This meant we would keep a value 'alive' as both a scalar and a vectorized value, causing havoc. This is not necessary because when we create a MustGather vector we explicitly create external use entries for the insertelement instructions of the MustGather vector elements. Fixes PR18129. radar://15582184 llvm-svn: 196508
-
Alp Toker authored
This patch tries to avoid unrelated changes other than fixing a few hyphen-related ambiguities and contractions in nearby lines. llvm-svn: 196471
-
- Dec 03, 2013
-
Yunzhong Gao authored
referenced in a way that even the linker does not see. Differential Revision: http://llvm-reviews.chandlerc.com/D2280 llvm-svn: 196300
-
Arnold Schwaighofer authored
Clang enables vectorization at optimization levels > 1 and size level < 2; opt should behave similarly. Loop vectorization and SLP vectorization can be disabled with the flags -disable-loop-vectorization and -disable-slp-vectorization. llvm-svn: 196294
-
NAKAMURA Takumi authored
llvm/test/Transforms/SampleProfile/syntax.ll: Relax an expression so it does not check a locale-dependent message. llvm-svn: 196195
-
- Dec 02, 2013
-
Kay Tiong Khoo authored
Conservative fix for PR17827: don't optimize a shift + and + compare sequence where the shift is logical unless the comparison is unsigned. llvm-svn: 196129
-
Diego Novillo authored
The profile file parser needed some tests for its parsing actions. This adds tests for each of the error messages emitted by the parser. llvm-svn: 196106
-
Alp Toker authored
llvm-svn: 196064
-
- Nov 28, 2013
-
Stephen Canon authored
llvm-svn: 195934
-
- Nov 26, 2013
-
Nadav Rotem authored
PR1860 - We can't save a list of ExtractElement instructions to CSE because some of these instructions may be removed and optimized in future iterations. Instead we save a list of basic blocks that we need to CSE. llvm-svn: 195791
-
Arnold Schwaighofer authored
In signed arithmetic we could end up with an i64 trip count for an i32 phi. Because it is signed arithmetic we know that this is only defined if the i32 does not wrap. It is therefore safe to truncate the i64 trip count to an i32 value. Fixes PR18049. llvm-svn: 195787
-
Nadav Rotem authored
we generate PHI nodes with multiple entries from the same basic block but with different values. Enabling CSE on ExtractElement instructions makes sure that all of the RAUW'ed instructions are the same. llvm-svn: 195773
-
Stepan Dyatkovskiy authored
Short description: this issue is about treating pointers as integers. We treat pointers as different if they reference different address spaces. At the same time, we treat pointers as equal to integers (with machine address width). This was a source of false positives. Consider the following case on a 32-bit machine:

```llvm
void foo0(i32 addrspace(1)* %p)
void foo1(i32 addrspace(2)* %p)
void foo2(i32 %p)
```

foo0 != foo1, while foo1 == foo2 and foo0 == foo2. As you can see, this breaks transitivity, which means the result depends on the order in which functions are presented in the module. The following order causes merging of foo0 and foo1: foo2, foo0, foo1. First foo0 is merged with foo2 and foo0 is erased; then foo1 is merged with foo2. So depending on order, things we don't expect could get merged. The fix: forbid treating any pointer as an integer, except for pointers in address space 0. llvm-svn: 195769
-
- Nov 25, 2013
-
Chandler Carruth authored
llvm-svn: 195691
-
- Nov 23, 2013
-
Manman Ren authored
We are going to drop debug info without a version number or with a different version number, to make sure we don't crash when we see bitcode files with a different debug info metadata format. Make the tests more robust by removing hard-coded metadata numbers in CHECK lines. llvm-svn: 195535
-
- Nov 22, 2013
-
Manman Ren authored
We are going to drop debug info without a version number or with a different version number, to make sure we don't crash when we see bitcode files with a different debug info metadata format. llvm-svn: 195504
-
Matt Arsenault authored
If the beginning of the loop was also the entry block of the function, branches were inserted to the entry block, which isn't allowed. If this occurs, create a new dummy function entry block that branches to the start of the loop. llvm-svn: 195493
-
Matt Arsenault authored
llvm-svn: 195492
-
Richard Sandiford authored
llvm-svn: 195471
-
Yi Jiang authored
llvm-svn: 195406
-
- Nov 21, 2013
-
Kostya Serebryany authored
Summary: Don't speculate loads under ThreadSanitizer. This fixes https://code.google.com/p/thread-sanitizer/issues/detail?id=40
Also discussed here: http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-November/067929.html
Reviewers: chandlerc
Reviewed By: chandlerc
CC: llvm-commits, dvyukov
Differential Revision: http://llvm-reviews.chandlerc.com/D2227
llvm-svn: 195324
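A hedged illustration of the pattern being disabled (hypothetical code, not from the patch; the speculating pass is not named in the summary, but if-conversion of a guarded load is the classic case):

```cpp
// Hypothetical example: turning this branch into a select would hoist the
// load of `shared` above the `ready` check. Under ThreadSanitizer that
// converts a guarded read into an unconditional one and produces false
// race reports, so the speculation is suppressed.
int shared;

int readIfReady(bool ready) {
  int v = 0;
  if (ready)
    v = shared;  // must stay guarded when building with TSan
  return v;
}
```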
-
- Nov 20, 2013
-
Yuchen Wu authored
Instead of permanently outputting "MVLL" as the file checksum, clang will create gcno and gcda checksums by hashing the destination block numbers of every arc. This allows llvm-cov to check whether the two gcov files are synchronized. Regenerated the test files so they contain the checksum. Also added a negative test to ensure an error when the checksums don't match. llvm-svn: 195191
-
- Nov 19, 2013
-
Arnold Schwaighofer authored
We are slicing an array of Value pointers and processing those slices in a loop. The problem is that we might invalidate a later slice by vectorizing a former slice. Use a WeakVH to track the pointer; if the pointer is deleted or RAUW'ed, we can tell. The test case will only fail when running with libgmalloc. radar://15498655 llvm-svn: 195162
-
Chandler Carruth authored
order of slices of the alloca which have exactly the same size and other properties. This was found by a perniciously unstable sort implementation used to flush out buggy uses of the algorithm. The fundamental idea is that findCommonType should return the best common type it can find across all of the slices in the range. There were two bugs here previously:
1) We would accept an integer type smaller than a byte-width multiple, and if there were different bit-width integer types, we would accept the first one. This caused an actual failure in the testcase updated here when the sort order changed.
2) If we found a bad combination of types or a non-load, non-store use before an integer-typed load or store we would bail, but if we found the integer-typed load or store first, we would use it. The correct behavior is to always use an integer-typed operation which covers the partition if one exists.
While a clever debugging sort algorithm found problem #1 in our existing test cases, I have no useful test case ideas for #2. I spotted it by inspection when looking at this code. llvm-svn: 195118
-
- Nov 18, 2013
-
Paul Robinson authored
(except functions marked always_inline). Functions with 'optnone' must also have 'noinline' so they don't get inlined into any other function. Based on work by Andrea Di Biagio. llvm-svn: 195046
-
Arnold Schwaighofer authored
In some cases the loop exit count computation can overflow. Extend the type to prevent most of those cases. The problem is loops like:

```c
int main() {
  int a = 1;
  char b = 0;
lbl:
  a &= 4;
  b--;
  if (b) goto lbl;
  return a;
}
```

The backedge count is 255. The induction variable type is i8. If we add one to 255 to get the exit count we overflow to zero. To work around this issue we extend the type of the induction variable to i32 in the case of i8 and i16. PR17532 llvm-svn: 195008
-
- Nov 17, 2013
-
Hal Finkel authored
Generally speaking, control flow paths with error reporting calls are cold. So far, error reporting calls are calls to perror and calls to fprintf, fwrite, etc. with stderr as the stream. This can be extended in the future. The primary motivation is to improve block placement (the cold attribute affects the static branch prediction heuristics). llvm-svn: 194943
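A sketch of the kind of call site affected (illustrative only; the commit names perror and fprintf/fwrite on stderr as the recognized error-reporting calls):

```cpp
#include <cstdio>
#include <cstdlib>

// The branch containing perror() is now inferred to be cold, steering the
// static branch prediction heuristics and block placement toward the
// non-error path.
FILE *openOrDie(const char *path) {
  FILE *f = std::fopen(path, "r");
  if (!f) {
    std::perror("fopen");  // recognized error-reporting call
    std::exit(EXIT_FAILURE);
  }
  return f;
}
```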
-
Hal Finkel authored
This adds a loop rerolling pass: the opposite of (partial) loop unrolling. The transformation aims to take loops like this:

```c
for (int i = 0; i < 3200; i += 5) {
  a[i]     += alpha * b[i];
  a[i + 1] += alpha * b[i + 1];
  a[i + 2] += alpha * b[i + 2];
  a[i + 3] += alpha * b[i + 3];
  a[i + 4] += alpha * b[i + 4];
}
```

and turn them into this:

```c
for (int i = 0; i < 3200; ++i) {
  a[i] += alpha * b[i];
}
```

and loops like this:

```c
for (int i = 0; i < 500; ++i) {
  x[3*i]   = foo(0);
  x[3*i+1] = foo(0);
  x[3*i+2] = foo(0);
}
```

and turn them into this:

```c
for (int i = 0; i < 1500; ++i) {
  x[i] = foo(0);
}
```

There are two motivations for this transformation:
1. Code-size reduction (especially relevant, obviously, when compiling for code size).
2. Providing greater choice to the loop vectorizer (and generic unroller) to choose the unrolling factor (and a better ability to vectorize). The loop vectorizer can take vector lengths and register pressure into account when choosing an unrolling factor, for example, and a pre-unrolled loop limits that choice. This is especially problematic if the manual unrolling was optimized for a machine different from the current target.

The current implementation is limited to single basic-block loops only. The rerolling recognition should work regardless of how the loop iterations are intermixed within the loop body (subject to dependency and side-effect constraints), but the significant restriction is that the order of the instructions in each iteration must be identical. This seems sufficient to capture all current use cases. This pass is not currently enabled by default at any optimization level. llvm-svn: 194939
-
- Nov 16, 2013
-
Hal Finkel authored
InstCombine, in visitFPTrunc, applies the following optimization to sqrt calls: (fptrunc (sqrt (fpext x))) -> (sqrtf x) but does not apply the same optimization to llvm.sqrt. This is a problem because, to enable vectorization, Clang generates llvm.sqrt instead of sqrt in fast-math mode, and because this optimization is being applied to sqrt and not applied to llvm.sqrt, sometimes the fast-math code is slower. This change makes InstCombine apply this optimization to llvm.sqrt as well. This fixes the specific problem in PR17758, although the same underlying issue (optimizations applied to libcalls are not applied to intrinsics) exists for other optimizations in SimplifyLibCalls. llvm-svn: 194935
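The source pattern that produces this IR shape looks roughly like the following (illustrative):

```cpp
#include <cmath>

// Compiled with fast math, the call below becomes llvm.sqrt on a value
// fpext'ed to double, with the result fptrunc'ed back to float. After this
// change InstCombine folds the sequence to a single-precision square root,
// matching what it already did for the sqrt libcall.
float rootf(float x) {
  return static_cast<float>(std::sqrt(static_cast<double>(x)));
}
```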
-
Benjamin Kramer authored
This is common in bitfield code. llvm-svn: 194925
-
Arnold Schwaighofer authored
When we vectorize a scalar access with no alignment specified, we have to set the target's ABI alignment of the scalar access on the vectorized access. Using the same alignment of zero would be wrong because most targets will have a bigger ABI alignment for vector types. This probably fixes PR17878. llvm-svn: 194876
-
- Nov 15, 2013
-
Manman Ren authored
We used to use std::map<IndicesVector, LoadInst*> for OriginalLoads, and when we tried to promote two arguments, they would both write to OriginalLoads, causing the created loads for the two arguments to share the same original load, so the same TBAA tag and alignment were applied to the created loads for both arguments. The fix is to use std::map<std::pair<Argument*, IndicesVector>, LoadInst*> for OriginalLoads, so each Argument writes to a different part of the map. PR17906 llvm-svn: 194846
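In outline, the keying change looks like this (a simplified sketch using stand-ins for the LLVM classes named in the message, not the actual ArgumentPromotion code):

```cpp
#include <map>
#include <utility>
#include <vector>

struct Argument;   // stand-in for llvm::Argument
struct LoadInst;   // stand-in for llvm::LoadInst
using IndicesVector = std::vector<unsigned>;

// Before: two arguments with identical index paths collide on one key, so
// the created loads for both arguments share one original load, and with
// it the wrong TBAA tag and alignment.
std::map<IndicesVector, LoadInst *> OriginalLoadsOld;

// After: the Argument is part of the key, so each argument writes to its
// own part of the map.
std::map<std::pair<Argument *, IndicesVector>, LoadInst *> OriginalLoadsNew;
```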
-
Matt Arsenault authored
llvm-svn: 194786
-
Matt Arsenault authored
Patch by Michele Scandale! llvm-svn: 194760
-
- Nov 14, 2013
-
Yunzhong Gao authored
with and without -g. Adding a test case to make sure that the threshold used in the memory dependence analysis is respected. The test case also checks that debug intrinsics are not counted towards this threshold. Differential Revision: http://llvm-reviews.chandlerc.com/D2141 llvm-svn: 194646
-
- Nov 13, 2013
-
Diego Novillo authored
This adds a new scalar pass that reads a file with samples generated by 'perf' during runtime. The samples read from the profile are incorporated and emitted as IR metadata reflecting that profile.

The profile file is assumed to have been generated by an external profile source. The profile information is converted into IR metadata, which is later used by the analysis routines to estimate block frequencies, edge weights and other related data. External profile information files have no fixed format; each profiler is free to define its own. This includes both the on-disk representation of the profile and the kind of profile information stored in the file. A common kind of profile is based on sampling (e.g., perf), which essentially counts how many times each line of the program has been executed during the run.

The SampleProfileLoader pass is organized as a scalar transformation. On startup, it reads the file given in -sample-profile-file to determine what kind of profile it contains. This file is assumed to contain profile information for the whole application. The profile data in the file is read and incorporated into the internal state of the corresponding profiler.

To facilitate testing, I've organized the profilers to support two file formats: text and native. The native format is whatever on-disk representation the profiler wants to support; I think this will mostly be bitcode files, but it could be anything the profiler wants to support. To do this, every profiler must implement the SampleProfile::loadNative() function. The text format is mostly meant for debugging. Records are separated by newlines, but each profiler is free to interpret records as it sees fit. Profilers must implement the SampleProfile::loadText() function.

Finally, the pass will call SampleProfile::emitAnnotations() for each function in the current translation unit. This function needs to translate the loaded profile into IR metadata, which the analyzer will later be able to use.

This patch implements the first steps towards the above design. I've implemented a sample-based flat profiler. The format of the profile is fairly simplistic: each sampled function contains a list of relative line locations (from the start of the function) together with a count representing how many samples were collected at that line during execution. I generate this profile using perf and a separate converter tool. Currently, I have only implemented a text format for these profiles. I am interested in initial feedback on the whole approach before I send the other parts of the implementation for review.

This patch implements:
- The SampleProfileLoader pass.
- The base ExternalProfile class with the core interface.
- A SampleProfile sub-class using the above interface. The profiler generates branch weight metadata on every branch instruction that matches the profiles.
- A text loader class to assist the implementation of SampleProfile::loadText().
- Basic unit tests for the pass.

Additionally, the patch uses profile information to compute branch weights based on instruction samples. It does a fairly simplistic conversion: given a multi-way branch instruction, it calculates the weight of each branch based on the maximum sample count gathered from each target basic block. Note that this assignment of branch weights is somewhat lossy and can be misleading. If a basic block has more than one incoming branch, all the incoming branches will get the same weight. In reality, it may be that only one of them is the most heavily taken branch. I will adjust this assignment in subsequent patches. llvm-svn: 194566
-
- Nov 12, 2013
-
Nadav Rotem authored
Fold (iszero(A&K1) | iszero(A&K2)) -> (A&(K1|K2)) != (K1|K2) if we know that K1 and K2 are 'one-hot' (only one bit is on). llvm-svn: 194525
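In source terms the fold says that, for one-hot masks, "either bit is clear" is the same as "not both bits are set" (a sketch; the mask values are arbitrary):

```cpp
// K1 and K2 are one-hot: exactly one bit set in each.
enum : unsigned { K1 = 0x4, K2 = 0x10 };

bool before(unsigned a) { return (a & K1) == 0 || (a & K2) == 0; }
bool after(unsigned a)  { return (a & (K1 | K2)) != (K1 | K2); }
// before(a) == after(a) for all a.
```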
-