  1. Dec 05, 2013
• Move test to X86 dir · e593fea5
      Renato Golin authored
Test is platform independent, but I don't want to force vector-width, as
that could spoil the pragma test.
      
      llvm-svn: 196539
• Add #pragma vectorize enable/disable to LLVM · 729a3ae9
      Renato Golin authored
The intended behaviour is to force vectorization in the presence
of the flag (either turning it on or off), and to keep the behaviour
as expected in its absence. Tests were added to make sure all
cases are covered in opt. No tests were added to other tools, on
the assumption that they should use the PassManagerBuilder in the
same way.
      
      This patch also removes the outdated -late-vectorize flag, which was
      on by default and not helping much.
      
The pragma metadata is attached in the same place as other loop
metadata, but nothing forbids attaching it to a function
(to enable #pragma optimize) or to basic blocks (to hint the basic-block
vectorizers), etc. The logic should be the same all around.
      
      Patches to Clang to produce the metadata will be produced after the
      initial implementation is agreed upon and committed. Patches to other
      vectorizers (such as SLP and BB) will be added once we're happy with
      the pass manager changes.
      
      llvm-svn: 196537
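As a rough illustration of attaching such a hint as loop metadata (a minimal sketch, not part of the patch; it uses today's C++ API and today's metadata spelling, llvm.loop.vectorize.enable, both of which differ from the 3.4-era code):

#include "llvm/IR/Constants.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Metadata.h"
#include "llvm/IR/Type.h"
using namespace llvm;

// Attach a vectorize-enable/disable hint to the loop's latch branch.
void markLoopVectorize(BranchInst *LatchBr, bool Enable) {
  LLVMContext &Ctx = LatchBr->getContext();
  Metadata *HintOps[] = {
      MDString::get(Ctx, "llvm.loop.vectorize.enable"), // assumed hint name
      ConstantAsMetadata::get(
          ConstantInt::get(Type::getInt1Ty(Ctx), Enable))};
  MDNode *Hint = MDNode::get(Ctx, HintOps);
  // A loop-ID node lists itself as its own first operand.
  Metadata *LoopIDOps[] = {nullptr, Hint};
  MDNode *LoopID = MDNode::getDistinct(Ctx, LoopIDOps);
  LoopID->replaceOperandWith(0, LoopID);
  LatchBr->setMetadata("llvm.loop", LoopID);
}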
• SLPVectorizer: An in-tree vectorized entry cannot also be a scalar external use · 7ee53cac
      Arnold Schwaighofer authored
We were creating external uses for scalar values in MustGather entries that also
had a ScalarToTreeEntry (i.e., they are also present in a vectorized tuple). This
meant we would keep a value alive both as a scalar and as part of a vector,
causing havoc. This is not necessary, because when we create a MustGather
vector we explicitly create external-use entries for the insertelement
instructions of the MustGather vector elements.
      
      Fixes PR18129.
      
      radar://15582184
      
      llvm-svn: 196508
• Correct word hyphenations · f907b891
      Alp Toker authored
      This patch tries to avoid unrelated changes other than fixing a few
      hyphen-related ambiguities and contractions in nearby lines.
      
      llvm-svn: 196471
  2. Dec 03, 2013
  3. Dec 02, 2013
  4. Nov 28, 2013
  5. Nov 26, 2013
• PR1860 - We can't save a list of ExtractElement instructions to CSE because... · b0082d24
      Nadav Rotem authored
      PR1860 - We can't save a list of ExtractElement instructions to CSE because some of these instructions
      may be removed and optimized in future iterations. Instead we save a list of basic blocks that we need to CSE.
      
      llvm-svn: 195791
• LoopVectorizer: Truncate i64 trip counts of i32 phis if necessary · a2c8e008
      Arnold Schwaighofer authored
      In signed arithmetic we could end up with an i64 trip count for an i32 phi.
      Because it is signed arithmetic we know that this is only defined if the i32
does not wrap. It is therefore safe to truncate the i64 trip count to an i32
      value.
      
      Fixes PR18049.
      
      llvm-svn: 195787
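For a concrete picture (a generic example, not the test case from the patch):

// 'i' is an i32 phi; SCEV may compute the trip count as an i64, but since
// signed overflow is undefined, 'i' cannot wrap and the i64 count can be
// truncated back to i32 for the vectorized loop.
void saxpy(float *a, const float *b, float alpha, int n) {
  for (int i = 0; i < n; ++i)
    a[i] += alpha * b[i];
}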
• PR18060 - When we RAUW values with ExtractElement instructions in some cases · f9f8482e
      Nadav Rotem authored
      we generate PHI nodes with multiple entries from the same basic block but
with different values. Enabling CSE on ExtractElement instructions makes sure
      that all of the RAUWed instructions are the same.
      
      llvm-svn: 195773
• PR17925 bugfix. · abb8505d
      Stepan Dyatkovskiy authored
Short description.

This issue is about treating pointers as integers. We treat pointers as
different if they reference different address spaces. At the same time, we
treat a pointer as equal to an integer (of machine address width). That was
a source of false positives. Consider the following case on a 32-bit machine:

void foo0(i32 addrspace(1)* %p)
void foo1(i32 addrspace(2)* %p)
void foo2(i32 %p)

foo0 != foo1, while
foo1 == foo2 and foo0 == foo2.

As you can see, this breaks transitivity, which means the result depends on
the order in which functions appear in the module. The order foo2, foo0,
foo1 causes foo0 and foo1 to be merged: first foo0 is merged with foo2 and
erased; then foo1 is merged with foo2. So, depending on the order, things we
don't expect to be merged can end up merged.

The fix:
Never treat a pointer as an integer, except for pointers in address space 0.
      
      llvm-svn: 195769
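A minimal sketch of the fixed comparison rule (a hypothetical helper, not the actual MergeFunctions code):

#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"
using namespace llvm;

// Pointers compare equal only within one address space, and only address
// space 0 pointers may still compare equal to pointer-width integers.
static bool typesComparable(Type *A, Type *B, const DataLayout &DL) {
  if (A == B)
    return true;
  auto *PA = dyn_cast<PointerType>(A);
  auto *PB = dyn_cast<PointerType>(B);
  if (PA && PB)
    return PA->getAddressSpace() == PB->getAddressSpace();
  if (PA && isa<IntegerType>(B))   // pointer vs. integer
    return PA->getAddressSpace() == 0 &&
           B->getIntegerBitWidth() == DL.getPointerSizeInBits(0);
  if (PB && isa<IntegerType>(A))
    return PB->getAddressSpace() == 0 &&
           A->getIntegerBitWidth() == DL.getPointerSizeInBits(0);
  return false;
}

With this rule, foo0 == foo1 stays false and foo0 == foo2 becomes false as well, so equality is transitive again.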
  6. Nov 25, 2013
  7. Nov 23, 2013
  8. Nov 22, 2013
  9. Nov 21, 2013
  10. Nov 20, 2013
• llvm-cov: Added file checksum to gcno and gcda files. · babe7491
      Yuchen Wu authored
Instead of permanently outputting "MVLL" as the file checksum, clang
will now create gcno and gcda checksums by hashing the destination block
numbers of every arc. This allows llvm-cov to check whether the two gcov
files are synchronized.

Regenerated the test files so they contain the checksum. Also added a
negative test to ensure an error is reported when the checksums don't match.
      
      llvm-svn: 195191
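The hashing idea, as a minimal sketch (a hypothetical mixing function; the real gcno/gcda checksum algorithm is the one defined by the patch, not reproduced here):

#include <cstdint>
#include <vector>

// Fold the destination block number of every arc into one checksum, so a
// .gcno/.gcda pair produced by different builds will disagree and llvm-cov
// can flag the mismatch instead of reporting bogus coverage.
uint32_t fileChecksum(const std::vector<uint32_t> &ArcDestBlocks) {
  uint32_t Hash = 0;
  for (uint32_t Dest : ArcDestBlocks)
    Hash = Hash * 1000003u + Dest; // simple polynomial mix
  return Hash;
}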
  11. Nov 19, 2013
• SLPVectorizer: Fix stale Value pointers in array · 8bc4a0ba
      Arnold Schwaighofer authored
We are slicing an array of Value pointers and processing those slices in a loop.
      The problem is that we might invalidate a later slice by vectorizing a former
      slice.
      
      Use a WeakVH to track the pointer. If the pointer is deleted or RAUW'ed we can
      tell.
      
      The test case will only fail when running with libgmalloc.
      
      radar://15498655
      
      llvm-svn: 195162
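The shape of the fix, as a minimal sketch (assuming the era's WeakVH semantics, which nulled the handle on deletion and followed RAUW; in current LLVM that combined behavior lives in WeakTrackingVH):

#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Value.h"
#include "llvm/IR/ValueHandle.h"
using namespace llvm;

void processSlices(ArrayRef<Value *> Ops) {
  // Hold the slice through value handles instead of raw Value pointers.
  SmallVector<WeakVH, 8> Slice(Ops.begin(), Ops.end());
  for (WeakVH &VH : Slice) {
    Value *V = VH; // null if vectorizing an earlier slice deleted the value
    if (!V)
      continue;    // stale entry: skip instead of touching freed memory
    // ... process V; later entries may be deleted or RAUW'ed here ...
  }
}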
• Fix an issue where SROA computed different results based on the relative · a1262006
      Chandler Carruth authored
      order of slices of the alloca which have exactly the same size and other
      properties. This was found by a perniciously unstable sort
      implementation used to flush out buggy uses of the algorithm.
      
      The fundamental idea is that findCommonType should return the best
      common type it can find across all of the slices in the range. There
      were two bugs here previously:
      
1) We would accept an integer type whose width is not a byte-width multiple,
   and if there were integer types of different bit-widths, we would accept
   the first one seen. This caused an actual failure in the testcase updated
   here when the sort order changed.
2) If we found a bad combination of types or a non-load, non-store use
   before an integer-typed load or store, we would bail, but if we found the
   integer-typed load or store first, we would use it. The correct behavior
   is to always use an integer-typed operation that covers the partition, if
   one exists.
      
While a clever debugging sort algorithm found problem #1 in our existing
test cases, I have no useful test case ideas for #2; I spotted it by
inspection while looking at this code.
      
      llvm-svn: 195118
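A much-simplified sketch of the corrected selection rule (a hypothetical helper, not SROA's actual findCommonType):

#include "llvm/ADT/ArrayRef.h"
#include "llvm/IR/DerivedTypes.h"
#include <cstdint>
using namespace llvm;

// An integer load/store exactly covering the partition always wins; integer
// candidates of any other width never become the common type, regardless of
// the order in which the slices are visited.
Type *pickCommonType(ArrayRef<Type *> UseTypes, uint64_t PartitionBytes) {
  Type *Fallback = nullptr;
  for (Type *T : UseTypes) {
    if (auto *IT = dyn_cast<IntegerType>(T)) {
      if (IT->getBitWidth() == PartitionBytes * 8)
        return IT;  // covers the whole partition: use it unconditionally
      continue;     // wrong-width integer: never the answer
    }
    if (!Fallback)
      Fallback = T; // first non-integer candidate (simplified)
  }
  return Fallback;
}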
  12. Nov 18, 2013
• The 'optnone' attribute means don't inline anything into this function · dcbe35ba
      Paul Robinson authored
      (except functions marked always_inline).
      Functions with 'optnone' must also have 'noinline' so they don't get
      inlined into any other function.
      
      Based on work by Andrea Di Biagio.
      
      llvm-svn: 195046
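At the source level the interaction looks like this (an illustrative example using Clang attributes, not part of the patch):

__attribute__((always_inline)) static inline int tiny(int x) { return x + 1; }
static int helper(int x) { return x * 2; }

// 'optnone' keeps helper() from being inlined into bar(); the mandatory
// 'noinline' keeps bar() itself from being inlined into its callers.
__attribute__((optnone, noinline)) int bar(int x) {
  int a = tiny(x);   // exception: always_inline callees may still be inlined
  int b = helper(x); // stays an out-of-line call inside the optnone function
  return a + b;
}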
• LoopVectorizer: Extend the induction variable to a larger type · b72cb4ec
      Arnold Schwaighofer authored
In some cases the loop exit count computation can overflow. Extend the type to
      prevent most of those cases.
      
      The problem is loops like:
      int main ()
      {
        int a = 1;
        char b = 0;
        lbl:
          a &= 4;
          b--;
          if (b) goto lbl;
        return a;
      }
      
      The backedge count is 255. The induction variable type is i8. If we add one to
      255 to get the exit count we overflow to zero.
      
      To work around this issue we extend the type of the induction variable to i32 in
      the case of i8 and i16.
      
      PR17532
      
      llvm-svn: 195008
  13. Nov 17, 2013
• Add the cold attribute to error-reporting call sites · 66cd3f1b
      Hal Finkel authored
      Generally speaking, control flow paths with error reporting calls are cold.
      So far, error reporting calls are calls to perror and calls to fprintf,
      fwrite, etc. with stderr as the stream. This can be extended in the future.
      
      The primary motivation is to improve block placement (the cold attribute
      affects the static branch prediction heuristics).
      
      llvm-svn: 194943
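For illustration, the kind of call site this marks (a generic example, not from the patch):

#include <cstdio>
#include <cstdlib>

FILE *open_or_die(const char *path) {
  FILE *f = fopen(path, "r");
  if (!f) {
    // The perror call site receives the cold attribute, so the static
    // branch prediction heuristics treat this whole path as unlikely and
    // block placement can move it out of the hot code.
    perror("fopen");
    exit(1);
  }
  return f;
}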
• Add a loop rerolling pass · bf45efde
      Hal Finkel authored
      This adds a loop rerolling pass: the opposite of (partial) loop unrolling. The
      transformation aims to take loops like this:
      
      for (int i = 0; i < 3200; i += 5) {
        a[i]     += alpha * b[i];
        a[i + 1] += alpha * b[i + 1];
        a[i + 2] += alpha * b[i + 2];
        a[i + 3] += alpha * b[i + 3];
        a[i + 4] += alpha * b[i + 4];
      }
      
      and turn them into this:
      
      for (int i = 0; i < 3200; ++i) {
        a[i] += alpha * b[i];
      }
      
      and loops like this:
      
      for (int i = 0; i < 500; ++i) {
        x[3*i] = foo(0);
        x[3*i+1] = foo(0);
        x[3*i+2] = foo(0);
      }
      
      and turn them into this:
      
      for (int i = 0; i < 1500; ++i) {
        x[i] = foo(0);
      }
      
      There are two motivations for this transformation:
      
        1. Code-size reduction (especially relevant, obviously, when compiling for
      code size).
      
        2. Providing greater choice to the loop vectorizer (and generic unroller) to
      choose the unrolling factor (and a better ability to vectorize). The loop
      vectorizer can take vector lengths and register pressure into account when
      choosing an unrolling factor, for example, and a pre-unrolled loop limits that
      choice. This is especially problematic if the manual unrolling was optimized
      for a machine different from the current target.
      
      The current implementation is limited to single basic-block loops only. The
      rerolling recognition should work regardless of how the loop iterations are
      intermixed within the loop body (subject to dependency and side-effect
      constraints), but the significant restriction is that the order of the
      instructions in each iteration must be identical. This seems sufficient to
      capture all current use cases.
      
      This pass is not currently enabled by default at any optimization level.
      
      llvm-svn: 194939
  14. Nov 16, 2013
• Apply the InstCombine fptrunc sqrt optimization to llvm.sqrt · 12100bf7
      Hal Finkel authored
      InstCombine, in visitFPTrunc, applies the following optimization to sqrt calls:
      
        (fptrunc (sqrt (fpext x))) -> (sqrtf x)
      
but does not apply the same optimization to llvm.sqrt. This is a problem
because, to enable vectorization, Clang generates llvm.sqrt instead of sqrt
in fast-math mode, and because the optimization is applied to sqrt but not
to llvm.sqrt, the fast-math code is sometimes slower.
      
      This change makes InstCombine apply this optimization to llvm.sqrt as well.
      
      This fixes the specific problem in PR17758, although the same underlying issue
      (optimizations applied to libcalls are not applied to intrinsics) exists for
      other optimizations in SimplifyLibCalls.
      
      llvm-svn: 194935
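The source pattern behind the fold, for reference (a generic example):

#include <cmath>

// (fptrunc (sqrt (fpext x))) -> (sqrtf x): the round trip through double
// is unnecessary. With fast-math, Clang emits llvm.sqrt for this call, so
// the same fold must fire on the intrinsic or the "fast" code loses.
float root(float x) {
  return (float)sqrt((double)x);
}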
• InstCombine: fold (A >> C) == (B >> C) --> (A^B) < (1 << C) for constant Cs. · 03f3e248
      Benjamin Kramer authored
      This is common in bitfield code.
      
      llvm-svn: 194925
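The identity is easy to check exhaustively on 8-bit values (a quick standalone verification, not part of the patch):

#include <cassert>

int main() {
  // (A >> C) == (B >> C) holds exactly when A and B agree above bit C,
  // i.e. when A^B has only its low C bits set: (A^B) < (1 << C), unsigned.
  for (unsigned C = 0; C < 8; ++C)
    for (unsigned A = 0; A < 256; ++A)
      for (unsigned B = 0; B < 256; ++B)
        assert(((A >> C) == (B >> C)) == ((A ^ B) < (1u << C)));
  return 0;
}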
• LoopVectorizer: Use abi alignment for accesses with no alignment · dbb7b87d
      Arnold Schwaighofer authored
When we vectorize a scalar access with no alignment specified, we have to set
the target's ABI alignment of the scalar access on the vectorized access.
Keeping the alignment of zero would be wrong, because zero on the new access
means the ABI alignment of the vector type, and most targets have a bigger
ABI alignment for vector types than for their scalar elements.
      
      This probably fixes PR17878.
      
      llvm-svn: 194876
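The rule in miniature (a hypothetical helper; the real change lives in the LoopVectorizer's memory-instruction widening):

#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Type.h"
using namespace llvm;

// An explicit alignment carries over unchanged; an unspecified alignment (0)
// must be pinned to the scalar type's ABI alignment, because 0 on the new
// vector access would instead mean the vector type's larger ABI alignment.
uint64_t vectorizedAlign(uint64_t ScalarAlign, Type *ScalarTy,
                         const DataLayout &DL) {
  return ScalarAlign ? ScalarAlign : DL.getABITypeAlign(ScalarTy).value();
}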
  15. Nov 15, 2013
  16. Nov 14, 2013
  17. Nov 13, 2013
• SampleProfileLoader pass. Initial setup. · 8d6568b5
      Diego Novillo authored
This adds a new scalar pass that reads a file with samples generated
by 'perf' at runtime. The samples read from the profile are
incorporated and emitted as IR metadata reflecting that profile.
      
      The profile file is assumed to have been generated by an external
      profile source. The profile information is converted into IR metadata,
      which is later used by the analysis routines to estimate block
      frequencies, edge weights and other related data.
      
      External profile information files have no fixed format, each profiler
      is free to define its own. This includes both the on-disk representation
      of the profile and the kind of profile information stored in the file.
      A common kind of profile is based on sampling (e.g., perf), which
      essentially counts how many times each line of the program has been
      executed during the run.
      
      The SampleProfileLoader pass is organized as a scalar transformation.
      On startup, it reads the file given in -sample-profile-file to
      determine what kind of profile it contains.  This file is assumed to
      contain profile information for the whole application. The profile
      data in the file is read and incorporated into the internal state of
      the corresponding profiler.
      
      To facilitate testing, I've organized the profilers to support two file
formats: text and native. The native format is whatever on-disk
representation the profiler wants to support; I expect this will mostly
be bitcode files, but it could be anything. To support a native format,
every profiler must implement the SampleProfile::loadNative() function.
      
      The text format is mostly meant for debugging. Records are separated by
      newlines, but each profiler is free to interpret records as it sees fit.
      Profilers must implement the SampleProfile::loadText() function.
      
      Finally, the pass will call SampleProfile::emitAnnotations() for each
      function in the current translation unit. This function needs to
      translate the loaded profile into IR metadata, which the analyzer will
      later be able to use.
      
      This patch implements the first steps towards the above design. I've
      implemented a sample-based flat profiler. The format of the profile is
      fairly simplistic. Each sampled function contains a list of relative
      line locations (from the start of the function) together with a count
      representing how many samples were collected at that line during
      execution. I generate this profile using perf and a separate converter
      tool.
      
      Currently, I have only implemented a text format for these profiles. I
      am interested in initial feedback to the whole approach before I send
      the other parts of the implementation for review.
      
      This patch implements:
      
      - The SampleProfileLoader pass.
      - The base ExternalProfile class with the core interface.
- A SampleProfile sub-class using the above interface. The profiler
  generates branch weight metadata on every branch instruction that
  matches the profile.
      - A text loader class to assist the implementation of
        SampleProfile::loadText().
      - Basic unit tests for the pass.
      
      Additionally, the patch uses profile information to compute branch
      weights based on instruction samples.
      
      This patch converts instruction samples into branch weights. It
      does a fairly simplistic conversion:
      
      Given a multi-way branch instruction, it calculates the weight of
      each branch based on the maximum sample count gathered from each
      target basic block.
      
      Note that this assignment of branch weights is somewhat lossy and can be
      misleading. If a basic block has more than one incoming branch, all the
      incoming branches will get the same weight. In reality, it may be that
      only one of them is the most heavily taken branch.
      
      I will adjust this assignment in subsequent patches.
      
      llvm-svn: 194566
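For a sense of the text format, a hypothetical profile might look like the following (the record syntax here is invented for illustration; the actual format is whatever the loader defines). A function header is followed by one record per sampled line, pairing the line's offset from the function start with its sample count:

main 3
0 12
2 9400
5 31

Here line offset 2 within main accumulated 9400 samples, so the branches reaching its basic block would receive proportionally higher weights.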
  18. Nov 12, 2013