Commits · 84de61148b466485e3743e42b1763b8ff20b337f · Roger Ferrer / llvm-epi-0.8

Jan 22, 2014

Handle an addrspacecast case in memcpyopt · 84de6114
Matt Arsenault authored Jan 22, 2014
```
llvm-svn: 199836
```
84de6114
Loop strength reduce: fix function name. · bc6659c4
Tim Northover authored Jan 22, 2014
```
llvm-svn: 199801
```
bc6659c4

[SROA] Fix a bug which could cause the common type finding to return · 4de31543

Chandler Carruth authored Jan 21, 2014

inconsistent results for different orderings of alloca slices. The
fundamental issue is that it is just always a mistake to return early
from this function. There is no effective early exit to leverage. This
patch stops trying to do so and simplifies the code a bit as
a consequence.

Original diagnosis and patch by James Molloy with some name tweaks by me
in part reflecting feedback from Duncan Smith on the mailing list.

llvm-svn: 199771

4de31543

Jan 20, 2014

Fix all the remaining lost-fast-math-flags bugs I've been able to find. The... · 1664dc89

Owen Anderson authored Jan 20, 2014

Fix all the remaining lost-fast-math-flags bugs I've been able to find. The most important of these are cases in the generic logic for combining BinaryOperators.
This logic hadn't been updated to handle FastMathFlags, and it took me a while to detect it because it doesn't show up in a simple search for CreateFAdd.

llvm-svn: 199629

1664dc89

Jan 19, 2014

InstCombine: Modernize a bunch of cast combines. · b80e1699
Benjamin Kramer authored Jan 19, 2014
```
Also make them vector-aware.

llvm-svn: 199608
```
b80e1699
InstCombine: Hoist 3 copies of AddOne/SubOne into a header. · 970f4959
Benjamin Kramer authored Jan 19, 2014
```
llvm-svn: 199605
```
970f4959
InstCombine: Replace a hand-rolled version of isKnownToBeAPowerOfTwo with the real thing. · 7a74bd47
Benjamin Kramer authored Jan 19, 2014
```
llvm-svn: 199604
```
7a74bd47
InstCombine: Teach most integer add/sub/mul/div combines how to deal with vectors. · 72196f3a
Benjamin Kramer authored Jan 19, 2014
```
llvm-svn: 199602
```
72196f3a
InstCombine: Refactor fmul/fdiv combines to handle vectors. · 76b15d04
Benjamin Kramer authored Jan 19, 2014
```
llvm-svn: 199598
```
76b15d04

Fix a really nasty SROA bug with how we handled out-of-bounds memcpy · 1bf38c6a

Chandler Carruth authored Jan 19, 2014

intrinsics.

Reported on the list by Evan with a couple of attempts to fix, but it
took a while to dig down to the root cause. There are two overlapping
bugs here, both centering around the circumstance of discovering
a memcpy operand which is known to be completely outside the bounds of
the alloca.

First, we need to kill the *other* side of the memcpy if it was added to
this alloca. Otherwise we'll factor it into our slicing and try to
rewrite it even though we know for a fact that it is dead. This is made
more tricky because we can visit the sides in either order. So we have
to both kill the other side and skip instructions marked as dead. The
latter really should be goodness in every case, but here it is a matter of
correctness.

Second, we need to actually remove the *uses* of the alloca by the
memcpy when queuing it for later deletion. Otherwise it may still be
using the alloca when we go to promote it (if the rewrite re-uses the
existing alloca instruction). Do this by factoring out the
use-clobbering used when nixing a Phi argument and re-using it
across the operands of a to-be-deleted instruction.

llvm-svn: 199590

1bf38c6a

LoopVectorizer: A reduction that has multiple uses of the reduction value is not · cc742dd9

Arnold Schwaighofer authored Jan 19, 2014

a reduction.

Really. Under certain circumstances (the use list of an instruction has to be
set up right - hence the extra pass in the test case) we would not recognize
when a value in a potential reduction cycle was used multiple times by the
reduction cycle.

Fixes PR18526.
radar://15851149

llvm-svn: 199570

cc742dd9

Jan 18, 2014
Don't refuse to transform constexpr(call(arg, ...)) to call(constexpr(arg),... · a6a17d77
Nick Lewycky authored Jan 18, 2014
```
Don't refuse to transform constexpr(call(arg, ...)) to call(constexpr(arg), ...) just because the function has multiple return values even if their return types are the same. Patch by Eduard Burtescu!

llvm-svn: 199564
```
a6a17d77
InstCombine: Make the (fmul X, -1.0) -> (fsub -0.0, X) transform handle vectors too. · fea9ac99
Benjamin Kramer authored Jan 18, 2014
```
PR18532.

llvm-svn: 199553
```
fea9ac99
Fix more instances of dropped fast math flags when optimizing FADD... · 48b842ef
Owen Anderson authored Jan 18, 2014
```
Fix more instances of dropped fast math flags when optimizing FADD instructions.  All found by inspection (aka grep).

llvm-svn: 199528
```
48b842ef

Jan 17, 2014

[asan] extend asan-coverage (still experimental). · 714c67c3

Kostya Serebryany authored Jan 17, 2014

 - Add a mode for collecting per-block coverage (-asan-coverage=2).
   So far the implementation is naive (all blocks are instrumented);
   the performance overhead on top of asan could be as high as 30%.
 - Make sure the one-time calls to __sanitizer_cov are moved to the function bottom,
   which in turn required copying the original debug info into the call insn.

Here is the performance data on SPEC 2006
(train data, comparing asan with asan-coverage={0,1,2}):

                             asan+cov0     asan+cov1      diff 0-1    asan+cov2       diff 0-2      diff 1-2
       400.perlbench,        65.60,        65.80,         1.00,        76.20,         1.16,         1.16
           401.bzip2,        65.10,        65.50,         1.01,        75.90,         1.17,         1.16
             403.gcc,         1.64,         1.69,         1.03,         2.04,         1.24,         1.21
             429.mcf,        21.90,        22.60,         1.03,        23.20,         1.06,         1.03
           445.gobmk,       166.00,       169.00,         1.02,       205.00,         1.23,         1.21
           456.hmmer,        88.30,        87.90,         1.00,        91.00,         1.03,         1.04
           458.sjeng,       210.00,       222.00,         1.06,       258.00,         1.23,         1.16
      462.libquantum,         1.73,         1.75,         1.01,         2.11,         1.22,         1.21
         464.h264ref,       147.00,       152.00,         1.03,       160.00,         1.09,         1.05
         471.omnetpp,       115.00,       116.00,         1.01,       140.00,         1.22,         1.21
           473.astar,       133.00,       131.00,         0.98,       142.00,         1.07,         1.08
       483.xalancbmk,       118.00,       120.00,         1.02,       154.00,         1.31,         1.28
            433.milc,        19.80,        20.00,         1.01,        20.10,         1.02,         1.01
            444.namd,        16.20,        16.20,         1.00,        17.60,         1.09,         1.09
          447.dealII,        41.80,        42.20,         1.01,        43.50,         1.04,         1.03
          450.soplex,         7.51,         7.82,         1.04,         8.25,         1.10,         1.05
          453.povray,        14.00,        14.40,         1.03,        15.80,         1.13,         1.10
             470.lbm,        33.30,        34.10,         1.02,        34.10,         1.02,         1.00
         482.sphinx3,        12.40,        12.30,         0.99,        13.00,         1.05,         1.06
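
The three "diff" columns appear to be ratios of run times rather than differences; recomputing a few rows reproduces the printed values. A minimal sketch of that arithmetic, where the helper name `ratioInHundredths` is made up for illustration:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: ratio of two run times, expressed in hundredths
// and rounded to the nearest integer (so 103 stands for 1.03).
long ratioInHundredths(double Time, double Baseline) {
  return std::lround(Time / Baseline * 100.0);
}

// Recompute the 403.gcc row from the table: 1.64, 1.69 and 2.04.
void checkGccRow() {
  assert(ratioInHundredths(1.69, 1.64) == 103); // "diff 0-1" column
  assert(ratioInHundredths(2.04, 1.64) == 124); // "diff 0-2" column
  assert(ratioInHundredths(2.04, 1.69) == 121); // "diff 1-2" column
}
```

Read this way, a value of 1.24 means the run with -asan-coverage=2 took about 24% longer than the run with -asan-coverage=0.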

llvm-svn: 199488

714c67c3

Jan 16, 2014

[opt][PassInfo] Allow opt to run passes that need target machine. · dc0b2ea2

Quentin Colombet authored Jan 16, 2014

When registering a pass, a pass can now specify a second constructor that
takes a pointer to TargetMachine as its argument.
The PassInfo class has been updated to reflect that possibility.
If such a constructor exists, opt will use it instead of the default constructor
when instantiating the pass.

Since such IR passes are supposed to be rare, no specific support has been
added to this commit to allow an easy registration of such a pass.
In other words, for such a pass, the initialization function has to be
hand-written (see CodeGenPrepare for instance).
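
The selection rule described above can be sketched in isolation. This is only an illustration of the idea, not the LLVM code: every type and function name below is a hypothetical stand-in, and the real PassInfo interface may look different.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins; none of these are the real LLVM classes.
struct TargetMachine {
  std::string Triple;
};

struct Pass {
  std::string Description;
};

using DefaultCtorFn = Pass (*)();
using TargetMachineCtorFn = Pass (*)(const TargetMachine *);

// Registration record holding the default constructor and, optionally,
// a second constructor that takes a TargetMachine pointer.
struct PassInfo {
  DefaultCtorFn NormalCtor;
  TargetMachineCtorFn TargetMachineCtor; // null when the pass has none

  // Prefer the TargetMachine-aware constructor when one was registered.
  Pass createPass(const TargetMachine *TM) const {
    if (TargetMachineCtor)
      return TargetMachineCtor(TM);
    return NormalCtor();
  }
};

Pass makeGenericPass() { return Pass{"generic"}; }

Pass makeTargetAwarePass(const TargetMachine *TM) {
  return Pass{"target-aware:" + TM->Triple};
}
```

A pass registered with only `makeGenericPass` is built the old way; one that also registers `makeTargetAwarePass` receives the target machine that the driver built from its command-line options.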

Now, codegenprepare can be tested using opt:
opt -codegenprepare -mtriple=mytriple input.ll

llvm-svn: 199430

dc0b2ea2

Fix two cases where we could lose fast math flags when optimizing FADD expressions. · e7321660
Owen Anderson authored Jan 16, 2014
```
llvm-svn: 199427
```
e7321660
Fix an instance where we would drop fast math flags when performing an fdiv to... · 4557a156
Owen Anderson authored Jan 16, 2014
```
Fix an instance where we would drop fast math flags when performing an fdiv to reciprocal multiply transformation.

llvm-svn: 199425
```
4557a156
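
As background on why this rewrite is tied to fast math flags at all: dividing by a constant and multiplying by its reciprocal do not always round to the same double. A small sketch in plain C++ (the function names are made up for illustration, and the code assumes IEEE-754 doubles compiled without fast-math options):

```cpp
#include <cassert>

// Plain division.
double divide(double X, double C) { return X / C; }

// The "reciprocal multiply" form of the same computation.
double timesReciprocal(double X, double C) { return X * (1.0 / C); }

void checkReciprocalExamples() {
  // 1/4 is exactly representable, so both forms agree.
  assert(divide(3.0, 4.0) == timesReciprocal(3.0, 4.0));
  // 1/10 is not, and the two forms round differently for X = 3.
  assert(divide(3.0, 10.0) != timesReciprocal(3.0, 10.0));
}
```

When the result can change like this, the rewritten instruction has to keep the flags that permitted the rewrite, which is the kind of bookkeeping the commit fixes.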
Fix a bug in InstCombine where we failed to preserve fast math flags when... · e8537fc7
Owen Anderson authored Jan 16, 2014
```
Fix a bug in InstCombine where we failed to preserve fast math flags when optimizing an FMUL expression.

llvm-svn: 199424
```
e8537fc7
Teach InstCombine that (fmul X, -1.0) can be simplified to (fneg X), which... · f74cfe03
Owen Anderson authored Jan 16, 2014
```
Teach InstCombine that (fmul X, -1.0) can be simplified to (fneg X), which LLVM expresses as (fsub -0.0, X).

llvm-svn: 199420
```
f74cfe03
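
The choice of -0.0 rather than 0.0 in the fsub form matters for signed zeros. A short sketch in plain C++ (helper names are made up; IEEE-754 doubles and default rounding are assumed):

```cpp
#include <cassert>
#include <cmath>

double viaFMul(double X) { return X * -1.0; }
double viaFSubNegZero(double X) { return -0.0 - X; }
double viaFSubPosZero(double X) { return 0.0 - X; }

// Equal as numbers and equal in sign, so +0.0 and -0.0 count as different.
bool sameValueAndSign(double A, double B) {
  return A == B && std::signbit(A) == std::signbit(B);
}

void checkNegationExamples() {
  // For X = +0.0 the product is -0.0, and so is (-0.0 - X).
  assert(sameValueAndSign(viaFMul(0.0), viaFSubNegZero(0.0)));
  // (0.0 - X) gives +0.0 instead, so it is not a faithful negation.
  assert(!sameValueAndSign(viaFMul(0.0), viaFSubPosZero(0.0)));
}
```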

[asan] Remove -fsanitize-address-zero-base-shadow command line · 13665367

Evgeniy Stepanov authored Jan 16, 2014

flag from clang, and disable zero-base shadow support on all platforms
where it is not the default behavior.

- It is completely unused, as far as we know.
- It is ABI-incompatible with non-zero-base shadow, which means all
objects in a process must be built with the same setting. Failing to
do so results in a segmentation fault at runtime.
- It introduces a backward dependency of compiler-rt on user code,
which is uncommon and complicates testing.

This is the LLVM part of a larger change.

llvm-svn: 199371

13665367

Jan 15, 2014

Switch-to-lookup tables: set threshold to 3 cases · 4744ac17

Hans Wennborg authored Jan 15, 2014

There has been an old FIXME to find the right cut-off for when it's worth
analyzing and potentially transforming a switch to a lookup table.

The switches always have two or more cases. I could not measure any speed-up
by transforming a switch with two cases. A switch with three cases gets a nice
speed-up, and I couldn't measure any compile-time regression, so I think this
is the right threshold.

In a Clang self-host, this causes 480 new switches to be transformed,
and reduces the final binary size by 8 KB.

llvm-svn: 199294

4744ac17

LoopVectorize: Only strip casts from integer types when replacing symbolic · dc4c9460
Arnold Schwaighofer authored Jan 15, 2014
```
strides

Fixes PR18480.

llvm-svn: 199291
```
dc4c9460

Jan 14, 2014

Do pointer cast simplifications on addrspacecast · 2d353d1a
Matt Arsenault authored Jan 14, 2014
```
llvm-svn: 199254
```
2d353d1a
Remove a check for an illegal condition. · f08a44f9
Matt Arsenault authored Jan 14, 2014
```
Bitcasts can't be between address spaces anymore.

llvm-svn: 199253
```
f08a44f9
Make nocapture analysis work with addrspacecast · e55a2c2e
Matt Arsenault authored Jan 14, 2014
```
llvm-svn: 199246
```
e55a2c2e

Reapply "LTO: add API to set strategy for -internalize" · 93be7c4f

Duncan P. N. Exon Smith authored Jan 14, 2014

Reapply r199191, reverted in r199197 because it carelessly broke
Other/link-opts.ll.  The problem was that calling
createInternalizePass("main") would select
createInternalizePass(bool("main")) instead of
createInternalizePass(ArrayRef<const char *>("main")).  This commit
fixes the bug.
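
The overload pitfall described above can be reproduced without LLVM. In this sketch `NameList` is a hypothetical stand-in for ArrayRef<const char *>, and `pickOverload` stands in for the two createInternalizePass overloads. The explicit construction at the end is one way to reach the list overload, not necessarily what the commit does.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for llvm::ArrayRef<const char *>: constructible
// from a single C string through a user-defined converting constructor.
struct NameList {
  const char *First;
  NameList(const char *Name) : First(Name) {}
};

// Two overloads shaped like the ones named in the commit message.
std::string pickOverload(bool) { return "bool"; }
std::string pickOverload(NameList) { return "list"; }

// A string literal decays to const char *, and pointer-to-bool is a
// standard conversion. Standard conversions outrank user-defined ones,
// so the bool overload is chosen.
std::string callWithLiteral() { return pickOverload("main"); }

// Constructing the list type explicitly selects the intended overload.
std::string callWithExplicitList() { return pickOverload(NameList("main")); }
```

Some compilers can warn about the literal-to-bool conversion, but the call is well-formed, which is how a mistake like this can get past a build.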

The original commit message follows.

Add API to LTOCodeGenerator to specify a strategy for the -internalize
pass.

This is a new attempt at Bill's change in r185882, which he reverted in
r188029 due to problems with the gold linker.  This puts the onus on the
linker to decide whether (and what) to internalize.

In particular, running internalize before outputting an object file may
change a 'weak' symbol into an internal one, even though that symbol
could be needed by an external object file --- e.g., with arclite.

This patch enables three strategies:

- LTO_INTERNALIZE_FULL: the default (and the old behaviour).
- LTO_INTERNALIZE_NONE: skip -internalize.
- LTO_INTERNALIZE_HIDDEN: only -internalize symbols with hidden
  visibility.

LTO_INTERNALIZE_FULL should be used when linking an executable.

Outputting an object file (e.g., via ld -r) is more complicated, and
depends on whether hidden symbols should be internalized.  E.g., for
ld -r, LTO_INTERNALIZE_NONE can be used when -keep_private_externs, and
LTO_INTERNALIZE_HIDDEN can be used otherwise.  However,
LTO_INTERNALIZE_FULL is inappropriate, since the output object file will
eventually need to link with others.

lto_codegen_set_internalize_strategy() sets the strategy for subsequent
calls to lto_codegen_write_merged_modules() and lto_codegen_compile*().
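
The guidance above on which strategy fits which link can be written down as a small decision function. This is a sketch from the linker's point of view: the enum mirrors the three names in the message, while `LinkJob` and `chooseStrategy` are hypothetical and not part of libLTO.

```cpp
#include <cassert>

// Hypothetical mirror of the three strategies named in the message.
enum class InternalizeStrategy { Full, None, Hidden };

// Hypothetical description of what the linker has been asked to produce.
struct LinkJob {
  bool ProducesExecutable;
  bool KeepPrivateExterns; // e.g. ld -r -keep_private_externs
};

InternalizeStrategy chooseStrategy(const LinkJob &Job) {
  // Executables are final, so everything may be internalized.
  if (Job.ProducesExecutable)
    return InternalizeStrategy::Full;
  // Object output (ld -r) still has to link with other objects later,
  // so Full is never appropriate here.
  return Job.KeepPrivateExterns ? InternalizeStrategy::None
                                : InternalizeStrategy::Hidden;
}
```

The linker would pass the chosen value to lto_codegen_set_internalize_strategy() before asking for code generation. The exact signature of that function is not shown in the message, so it is left out here.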

<rdar://problem/14334895>

llvm-svn: 199244

93be7c4f

Decouple dllexport/dllimport from linkage · 7157bb76

Nico Rieck authored Jan 14, 2014

Representing dllexport/dllimport as distinct linkage types prevents using
these attributes on templates and inline functions.

Instead of introducing further mixed linkage types to include linkonce and
weak ODR, the old import/export linkage types are replaced with a new
separate visibility-like specifier:

  define available_externally dllimport void @f() {}
  @Var = dllexport global i32 1, align 4

Linkage for dllexported globals and functions is now equal to their linkage
without dllexport. Imported globals and functions must be either
declarations with external linkage, or definitions with
AvailableExternallyLinkage.

llvm-svn: 199218

7157bb76

Revert "Decouple dllexport/dllimport from linkage" · 9d2e0df0

Nico Rieck authored Jan 14, 2014

Revert this for now until I fix an issue in Clang with it.

This reverts commit r199204.

llvm-svn: 199207

9d2e0df0

Decouple dllexport/dllimport from linkage · e43aaf79

Nico Rieck authored Jan 14, 2014

Representing dllexport/dllimport as distinct linkage types prevents using
these attributes on templates and inline functions.

Instead of introducing further mixed linkage types to include linkonce and
weak ODR, the old import/export linkage types are replaced with a new
separate visibility-like specifier:

  define available_externally dllimport void @f() {}
  @Var = dllexport global i32 1, align 4

Linkage for dllexported globals and functions is now equal to their linkage
without dllexport. Imported globals and functions must be either
declarations with external linkage, or definitions with
AvailableExternallyLinkage.

llvm-svn: 199204

e43aaf79

Revert r199191, "LTO: add API to set strategy for -internalize" · 23c0ab53
NAKAMURA Takumi authored Jan 14, 2014
```
Please also update Other/link-opts.ll next time.

llvm-svn: 199197
```
23c0ab53

LTO: add API to set strategy for -internalize · 43ea3478

Duncan P. N. Exon Smith authored Jan 14, 2014

Add API to LTOCodeGenerator to specify a strategy for the -internalize
pass.

This is a new attempt at Bill's change in r185882, which he reverted in
r188029 due to problems with the gold linker.  This puts the onus on the
linker to decide whether (and what) to internalize.

In particular, running internalize before outputting an object file may
change a 'weak' symbol into an internal one, even though that symbol
could be needed by an external object file --- e.g., with arclite.

This patch enables three strategies:

- LTO_INTERNALIZE_FULL: the default (and the old behaviour).
- LTO_INTERNALIZE_NONE: skip -internalize.
- LTO_INTERNALIZE_HIDDEN: only -internalize symbols with hidden
  visibility.

LTO_INTERNALIZE_FULL should be used when linking an executable.

Outputting an object file (e.g., via ld -r) is more complicated, and
depends on whether hidden symbols should be internalized.  E.g., for
ld -r, LTO_INTERNALIZE_NONE can be used when -keep_private_externs, and
LTO_INTERNALIZE_HIDDEN can be used otherwise.  However,
LTO_INTERNALIZE_FULL is inappropriate, since the output object file will
eventually need to link with others.

lto_codegen_set_internalize_strategy() sets the strategy for subsequent
calls to lto_codegen_write_merged_modules() and lto_codegen_compile*().

<rdar://problem/14334895>

llvm-svn: 199191

43ea3478

Jan 13, 2014

[PM] Split DominatorTree into a concrete analysis result object which · 73523021

Chandler Carruth authored Jan 13, 2014

can be used by both the new pass manager and the old.

This removes it from any of the virtual mess of the pass interfaces and
lets it derive cleanly from the DominatorTreeBase<> template. In turn,
tons of boilerplate interface can be nuked and it turns into a very
straightforward extension of the base DominatorTree interface.

The old analysis pass is now a simple wrapper. The names and style of
this split should match the split between CallGraph and
CallGraphWrapperPass. All of the users of DominatorTree have been
updated to match using many of the same tricks as with CallGraph. The
goal is that the common type remains the resulting DominatorTree rather
than the pass. This will make subsequent work toward the new pass
manager significantly easier.

Also in numerous places things became cleaner because I switched from
re-running the pass (!!! midway through some other pass's run !!!) to
directly recomputing the domtree.

llvm-svn: 199104

73523021

[PM] Pull the generic graph algorithms and data structures for dominator · e509db41

Chandler Carruth authored Jan 13, 2014

trees into the Support library.

These are all expressed in terms of the generic GraphTraits and CFG,
with no reliance on any concrete IR types. Putting them in support
clarifies that and makes the fact that the static analyzer in Clang uses
them much more sane. When moving the Dominators.h file into the IR
library I claimed that this was the right home for it but not something
I planned to work on. Oops.
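
To illustrate what "expressed in terms of generic graph traits" can look like, here is a deliberately simplified sketch. It is not the LLVM implementation: `SimpleGraphTraits` is a hypothetical miniature of a traits adaptor, and the algorithm is the textbook iterative dominator-set computation rather than whatever the library uses. It also assumes node 0 is the entry and every node is reachable from it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical traits adaptor: the algorithm only talks to the graph
// through these static functions, never through concrete IR types.
template <typename GraphT> struct SimpleGraphTraits;

struct AdjacencyGraph {
  std::vector<std::vector<std::size_t>> Preds; // Preds[N]: predecessors of N
};

template <> struct SimpleGraphTraits<AdjacencyGraph> {
  static std::size_t size(const AdjacencyGraph &G) { return G.Preds.size(); }
  static const std::vector<std::size_t> &
  predecessors(const AdjacencyGraph &G, std::size_t N) {
    return G.Preds[N];
  }
};

// Returns Dom where Dom[N][D] is true when node D dominates node N.
template <typename GraphT>
std::vector<std::vector<bool>> computeDominators(const GraphT &G) {
  using Traits = SimpleGraphTraits<GraphT>;
  const std::size_t N = Traits::size(G);
  std::vector<std::vector<bool>> Dom(N, std::vector<bool>(N, true));
  if (N == 0)
    return Dom;
  Dom[0].assign(N, false);
  Dom[0][0] = true; // the entry is dominated only by itself
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t Node = 1; Node < N; ++Node) {
      // Intersect the dominator sets of all predecessors...
      std::vector<bool> New(N, true);
      for (std::size_t P : Traits::predecessors(G, Node))
        for (std::size_t D = 0; D < N; ++D)
          New[D] = New[D] && Dom[P][D];
      New[Node] = true; // ...and add the node itself.
      if (New != Dom[Node]) {
        Dom[Node] = New;
        Changed = true;
      }
    }
  }
  return Dom;
}

// Diamond-shaped graph: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.
AdjacencyGraph makeDiamond() {
  AdjacencyGraph G;
  G.Preds.resize(4);
  G.Preds[1] = {0};
  G.Preds[2] = {0};
  G.Preds[3] = {1, 2};
  return G;
}
```

Any other graph type could reuse `computeDominators` by specializing the traits struct, which is the property that lets non-IR clients share the same code.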

So why am I doing this? It happens to be one step toward breaking the
requirement that IR verification can only be performed from inside of
a pass context, which completely blocks the implementation of
verification for the new pass manager infrastructure. Fixing it will
also allow removing the concept of the "preverify" step (WTF???) and
allow the verifier to cleanly flag functions which fail verification in
a way that precludes even computing dominance information. Currently,
that results in a fatal error even when you ask the verifier to not
fatally error. It's awesome like that.

The yak shaving will continue...

llvm-svn: 199095

e509db41

[cleanup] Move the Dominators.h and Verifier.h headers into the IR · 5ad5f15c

Chandler Carruth authored Jan 13, 2014

directory. These passes are already defined in the IR library, and it
doesn't make any sense to have the headers in Analysis.

Long term, I think there is going to be a much better way to divide
these matters. The dominators code should be fully separated into the
abstract graph algorithm and have that put in Support where it becomes
obvious that even Clang's CFGBlocks can use it. Then the verifier can
manually construct dominance information from the Support-driven
interface while the Analysis library can provide a pass which
caches, reconstructs, and supports a nice update API.

But those are very long term, and so I don't want to leave the really
confusing structure until that day arrives.

llvm-svn: 199082

5ad5f15c

Re-sort #include lines again, prior to moving headers around. · 07baed53
Chandler Carruth authored Jan 13, 2014
```
llvm-svn: 199080
```
07baed53

Jan 12, 2014

Switch-to-lookup tables: Don't require a result for the default · ac114a3c

Hans Wennborg authored Jan 12, 2014

case when the lookup table doesn't have any holes.

This means we can build a lookup table for switches like this:

  switch (x) {
    case 0: return 1;
    case 1: return 2;
    case 2: return 3;
    case 3: return 4;
    default: exit(1);
  }

The default case doesn't yield a constant result here, but that doesn't matter,
since a default result is only necessary for filling holes in the lookup table,
and this table doesn't have any holes.

This makes us transform 505 more switches in a clang bootstrap, and shaves 164 KB
off the resulting clang binary.
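
For illustration, the switch above and a hand-written table lookup with the same behaviour could look as follows in C++. This is only a source-level picture of the idea; the function names are made up and the pass itself works on IR.

```cpp
#include <cassert>
#include <cstdlib>

// The switch from the commit message, wrapped in a function.
int viaSwitch(unsigned X) {
  switch (X) {
    case 0: return 1;
    case 1: return 2;
    case 2: return 3;
    case 3: return 4;
    default: std::exit(1);
  }
}

// Equivalent form: cases 0..3 are contiguous, so the table has no holes
// and no default *value* is needed. Out-of-range inputs still take the
// original default path.
int viaLookupTable(unsigned X) {
  static const int Table[4] = {1, 2, 3, 4};
  if (X < 4)
    return Table[X];
  std::exit(1);
}
```

If case 2 were missing, Table[2] would need some value to hold, and that is the situation in which a constant default result is still required.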

llvm-svn: 199025

ac114a3c

Jan 11, 2014

LoopVectorizer: Enable strided memory accesses versioning per default · 66c742ae
Arnold Schwaighofer authored Jan 11, 2014
```
I saw no compile or execution time regressions on x86_64 -mavx -O3.

radar://13075509

llvm-svn: 199015
```
66c742ae

LoopVectorize.cpp: Appease MSC16. · 41c409ce

NAKAMURA Takumi authored Jan 11, 2014

Excuse me, I hope msc16 builders would be fine till its end day.
Introduce nullptr then. ;)

llvm-svn: 199001

41c409ce

Extend and simplify the sample profile input file. · 9518b63b

Diego Novillo authored Jan 10, 2014

1- Use the line_iterator class to read profile files.

2- Allow comments in profile file. Lines starting with '#'
   are completely ignored while reading the profile.

3- Add parsing support for discriminators and indirect call samples.

   Our external profiler can emit more profile information that we are
   currently not handling. This patch does not add new functionality to
   support this information, but it allows profile files to provide it.

   I will add actual support later on (for at least one of these
   features, I need support for DWARF discriminators in Clang).

   A sample line may contain the following additional information:

   Discriminator. This is used if the sampled program was compiled with
   DWARF discriminator support
   (http://wiki.dwarfstd.org/index.php?title=Path_Discriminators). This
   is currently only emitted by GCC and we just ignore it.

   Potential call targets and samples. If present, this line contains a
   call instruction. This models both direct and indirect calls. Each
   called target is listed together with the number of samples. For
   example,

                    130: 7  foo:3  bar:2  baz:7

   The above means that at relative line offset 130 there is a call
   instruction that calls one of foo(), bar() and baz(), with baz()
   being the most frequent call target.

   Differential Revision: http://llvm-reviews.chandlerc.com/D2355

4- Simplify format of profile input file.

   This implements earlier suggestions to simplify the format of the
   sample profile file. The symbol table is not necessary and function
   profiles do not need to know the number of samples in advance.

   Differential Revision: http://llvm-reviews.chandlerc.com/D2419
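
As an illustration of the sample-line shape shown in item 3 and the comment rule in item 2, here is a minimal parsing sketch. It is not the code from this commit: `SampleLine` and `parseSampleLine` are hypothetical names, it uses only the standard library rather than line_iterator, and it ignores discriminators because their syntax is not given above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record for one line such as "130: 7  foo:3  bar:2  baz:7".
struct SampleLine {
  unsigned LineOffset = 0;
  std::uint64_t NumSamples = 0;
  std::vector<std::pair<std::string, std::uint64_t>> CallTargets;
};

// Returns false for comment lines and for lines that do not fit the
// shape above; Out is only written on success.
bool parseSampleLine(const std::string &Line, SampleLine &Out) {
  if (Line.empty() || Line[0] == '#')
    return false; // comments are ignored
  std::istringstream In(Line);
  SampleLine Result;
  char Colon = 0;
  if (!(In >> Result.LineOffset >> Colon >> Result.NumSamples) || Colon != ':')
    return false;
  std::string Token;
  while (In >> Token) { // optional "target:count" entries
    std::size_t Sep = Token.rfind(':');
    if (Sep == std::string::npos || Sep == 0 || Sep + 1 == Token.size())
      return false;
    std::string Count = Token.substr(Sep + 1);
    if (Count.find_first_not_of("0123456789") != std::string::npos)
      return false;
    Result.CallTargets.emplace_back(Token.substr(0, Sep), std::stoull(Count));
  }
  Out = Result;
  return true;
}
```

Parsing the example line yields offset 130, 7 samples, and the three targets foo, bar and baz with counts 3, 2 and 7.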

llvm-svn: 198973

9518b63b