Commits · 3ef5e46b6d7ef95435328936864f509fa7f62880 · Roger Ferrer / llvm-epi-0.8

Mar 10, 2014

MemCpyOpt: When merging memsets also merge the trivial case of two memsets... · 3ef5e46b

Benjamin Kramer authored Mar 10, 2014

MemCpyOpt: When merging memsets also merge the trivial case of two memsets with the same destination.
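
As a rough illustration of the idea, and not the pass's actual code, the sketch below merges memset-like ranges that touch or overlap. The `MemsetRange` type and `mergeMemsets` function are hypothetical names, and the model assumes all ranges refer to the same base object and only merges ranges that write the same byte value. Two memsets with the same destination are simply the case where both ranges share a start offset.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical model of one memset: fill bytes [start, end) with `value`.
struct MemsetRange {
    int64_t start;
    int64_t end;
    uint8_t value;
};

// Merge ranges that write the same byte value and touch or overlap.
// Ranges with different values are never merged in this sketch.
std::vector<MemsetRange> mergeMemsets(std::vector<MemsetRange> ranges) {
    std::stable_sort(ranges.begin(), ranges.end(),
                     [](const MemsetRange &a, const MemsetRange &b) {
                         return a.start < b.start;
                     });
    std::vector<MemsetRange> merged;
    for (const MemsetRange &r : ranges) {
        if (!merged.empty() && merged.back().value == r.value &&
            r.start <= merged.back().end) {
            // Same value and no gap: extend the previous range.
            merged.back().end = std::max(merged.back().end, r.end);
        } else {
            merged.push_back(r);
        }
    }
    return merged;
}
```

With two ranges starting at offset 0 and covering 8 and 16 bytes, the result is a single 16-byte range.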

The testcase is from PR19092, but I think the bug described there is actually a clang issue.

llvm-svn: 203489

3ef5e46b

For functions with ARM target specific calling convention, when simplify-libcall · 0e8f4612

Evan Cheng authored Mar 10, 2014

optimize a call to an LLVM intrinsic to something that involves a call to a C
library call, make sure it sets the right calling convention on the call.

e.g.
extern double pow(double, double);
double t(double x) {
  return pow(10, x);
}

Compiles to something like this for AAPCS-VFP:
define arm_aapcs_vfpcc double @t(double %x) #0 {
entry:
  %0 = call double @llvm.pow.f64(double 1.000000e+01, double %x)
  ret double %0
}

declare double @llvm.pow.f64(double, double) #1

Simplify libcall (part of instcombine) will turn the above into:
define arm_aapcs_vfpcc double @t(double %x) #0 {
entry:
  %__exp10 = call double @__exp10(double %x) #1
  ret double %__exp10
}

declare double @__exp10(double)

The pre-instcombine code works because calls to LLVM builtins are special.
Instruction selection will choose the right calling convention for the call.
However, the code after instcombine is wrong. The call to __exp10 will use
the C calling convention.

I can think of 3 options to fix this.

1. Make "C" calling convention just work since the target should know what CC
   is being used.

   This doesn't work because each function can use different CC with the "pcs"
   attribute.

2. Have Clang add the right CC keyword on the calls to LLVM builtin.

   This will work but it doesn't match the LLVM IR specification which states
   these are "Standard C Library Intrinsics".

3. Fix simplify libcall so the resulting calls to the C routines will have the
   proper CC keyword. e.g.
   %__exp10 = call arm_aapcs_vfpcc double @__exp10(double %x) #1

   This works and is the solution I implemented here.

Both solutions #2 and #3 would work. After carefully considering the pros and
cons, I decided to implement #3 for the following reasons.

1. It doesn't change the "spec" of the intrinsics.
2. It's a self-contained fix.

There are a couple of potential downsides.
1. There could be other places in the optimizer that are broken in the same way
   and are not addressed by this.
2. There could be other calling conventions that need to be propagated by
   simplify-libcall and are not handled.

But for now, this is the fix that I'm most comfortable with.
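
Option 3 can be pictured with a toy model. Everything below is a hypothetical stand-in (`CallingConv`, `Function`, `Call`, `simplifyPow10`), not LLVM's classes, and it assumes the proper convention for the new call is the one of the enclosing function, as in the `@t` example above.

```cpp
#include <string>

// Hypothetical, simplified stand-ins for IR entities; not LLVM's classes.
enum class CallingConv { C, ARM_AAPCS_VFP };

struct Function {
    std::string name;
    CallingConv cc;
};

struct Call {
    std::string callee;
    CallingConv cc;
    bool isIntrinsic;
};

// Assumes the arguments were already matched as pow(10, x).
// The intrinsic call needs no explicit convention, but its replacement is an
// ordinary library call, so it is given the caller's convention explicitly.
Call simplifyPow10(const Call &powCall, const Function &caller) {
    if (!powCall.isIntrinsic || powCall.callee != "llvm.pow.f64")
        return powCall;  // Not the pattern this sketch handles.
    return Call{"__exp10", caller.cc, false};
}
```

Leaving the new call at `CallingConv::C` corresponds to the broken output shown earlier.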

llvm-svn: 203488

0e8f4612

Mar 09, 2014

Revert r203230, "CodeGenPrep: sink extends of illegal types into use block." · 1783e1e9
NAKAMURA Takumi authored Mar 09, 2014
```
It choked i686 stage2.

llvm-svn: 203386
```
1783e1e9

IR: Change inalloca's grammar a bit · c4ab61cb

David Majnemer authored Mar 09, 2014

The grammar for LLVM IR is not well specified in any document but seems
to obey the following rules:

 - Attributes which have parenthesized arguments are never preceded by
   commas.  This form of attribute is the only one which ever has
   optional arguments.  However, not all of these attributes support
   optional arguments: 'thread_local' supports an optional argument but
   'addrspace' does not.  Interestingly, 'addrspace' is documented as
   being a "qualifier".  What constitutes a qualifier?  I cannot find a
   definition.

 - Some attributes use a space between the keyword and the value.
   Examples of this form are 'align' and 'section'.  These are always
   preceded by a comma.

 - Otherwise, the attribute has no argument.  These attributes do not
   have a preceding comma.

Sometimes an attribute goes before the instruction, between the
instruction and its type, or after its type.  'atomicrmw' has
'volatile' between the instruction and the type while 'call' has 'tail'
preceding the instruction.

With all this in mind, it seems most consistent for 'inalloca' on an
'alloca' instruction to occur between the instruction and the
type.  Unlike the current formulation, there would be no preceding
comma.  The combination 'alloca inalloca' doesn't look particularly
appetizing, perhaps a better spelling of 'inalloca' is down the road.

llvm-svn: 203376

c4ab61cb

Mar 07, 2014

CodeGenPrep: sink extends of illegal types into use block. · ad3d81d3

Tim Northover authored Mar 07, 2014

This helps the instruction selector to lower an i64 * i64 -> i128
multiplication into a single instruction on targets which support it.
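
For context, this is the kind of source that produces such a multiplication. The sketch assumes a compiler that provides `unsigned __int128` (a GCC/Clang extension, not standard C++). The names `U128Parts` and `mulWide` are made up for illustration. The two casts correspond to the extends that need to be visible next to the multiply.

```cpp
#include <cstdint>

// Hypothetical helper type holding the two halves of a 128-bit product.
struct U128Parts {
    uint64_t lo;
    uint64_t hi;
};

// Widening multiply: both 64-bit operands are extended to 128 bits and
// multiplied. unsigned __int128 is a compiler extension, not standard C++.
U128Parts mulWide(uint64_t a, uint64_t b) {
    unsigned __int128 p = static_cast<unsigned __int128>(a) * b;
    return {static_cast<uint64_t>(p), static_cast<uint64_t>(p >> 64)};
}
```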

Patch by Manuel Jacob.

llvm-svn: 203230

ad3d81d3

InstCombine: form shuffles from wider range of insert/extractelements · fad2761c

Tim Northover authored Mar 07, 2014

Sequences of insertelement/extractelement instructions are sometimes used to build
vectors; this code tries to put them back together into shuffles, but
could only produce completely uniform shuffle types (<N x T> from two
<N x T> sources).

This should allow shuffles with different numbers of elements on the
input and output sides as well.
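
A toy model of shuffle semantics shows what "different numbers of elements" means. This is only a sketch, using a hypothetical `shuffle` function over `std::vector<int>`: each mask entry indexes into the concatenation of the two inputs, and the output length is the mask length rather than the input length.

```cpp
#include <cstddef>
#include <vector>

// Toy shuffle: mask entry i selects element i of concat(a, b).
// The result has mask.size() elements, which may differ from a.size().
std::vector<int> shuffle(const std::vector<int> &a, const std::vector<int> &b,
                         const std::vector<std::size_t> &mask) {
    std::vector<int> out;
    out.reserve(mask.size());
    for (std::size_t idx : mask) {
        // at() throws on an out-of-range index instead of reading garbage.
        out.push_back(idx < a.size() ? a.at(idx) : b.at(idx - a.size()));
    }
    return out;
}
```

Two 4-element inputs with a 2-element mask give a 2-element result, the non-uniform case the message describes.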

llvm-svn: 203229

fad2761c

Allow constant folding of round function whenever feasible · b67688a8
Karthik Bhat authored Mar 07, 2014
```
llvm-svn: 203198
```
b67688a8

Mar 06, 2014

Allow constant folding of copysign · daa8cd10
Karthik Bhat authored Mar 06, 2014
```
llvm-svn: 203076
```
daa8cd10

Change math intrinsic attributes from readonly to readnone. These · b741b945

Raul E. Silvera authored Mar 06, 2014

are operations that do not access memory but may be sensitive
to floating-point environment changes. LLVM does not attempt
to model FP environment changes, so this was unnecessarily conservative
and was getting in the way of some optimizations, in particular
SLP vectorization.

llvm-svn: 203037

b741b945

Mar 05, 2014
- LoopVectorizer: Preserve fast-math flags · ab12363c
  Arnold Schwaighofer authored Mar 05, 2014
```
Fixes PR19045.

llvm-svn: 203008
```
  ab12363c
- ConstantFolding: Also fold the vector overloads of our math intrinsics. · 061d147f
  Benjamin Kramer authored Mar 05, 2014
```
llvm-svn: 202997
```
  061d147f
- Trivial test commit. · 18ebc7cd
  Raul E. Silvera authored Mar 05, 2014
```
llvm-svn: 202924
```
  18ebc7cd
- Allow constant folding of fma and fmuladd · 8377858c
  Matt Arsenault authored Mar 05, 2014
```
llvm-svn: 202914
```
  8377858c
Mar 03, 2014

Pass to emit DWARF path discriminators. · f5041ce5

Diego Novillo authored Mar 03, 2014

DWARF discriminators are used to distinguish multiple control flow paths
on the same source location. When this happens, instructions across
basic block boundaries will share the same debug location.

This pass detects this situation and creates a new lexical scope for one
of the two instructions. This lexical scope is a child scope of the
original and contains a new discriminator value. This discriminator is
then picked up by MCObjectStreamer::EmitDwarfLocDirective to be
written to the object file.
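
The numbering idea can be sketched with a toy model. This covers only the assignment of distinct discriminator values, not the creation of lexical scopes. `Inst` and `assignDiscriminators` are hypothetical names, and the policy (first block seen on a line keeps 0, each further block gets the next value) is an assumption for illustration.

```cpp
#include <map>
#include <vector>

// Hypothetical instruction record: owning basic block, source line, and the
// discriminator to be filled in.
struct Inst {
    int block;
    unsigned line;
    unsigned discriminator;
};

// For each source line, the first basic block seen keeps discriminator 0 and
// every further block sharing that line gets the next unused value.
void assignDiscriminators(std::vector<Inst> &insts) {
    std::map<unsigned, std::map<int, unsigned>> seen;  // line -> block -> value
    for (Inst &i : insts) {
        std::map<int, unsigned> &blocks = seen[i.line];
        auto it = blocks.find(i.block);
        if (it == blocks.end()) {
            unsigned next = static_cast<unsigned>(blocks.size());
            it = blocks.emplace(i.block, next).first;
        }
        i.discriminator = it->second;
    }
}
```

A line such as `if (x) a(); else b();` would typically place the two calls in different blocks on the same line, which is the case being distinguished.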

This fixes http://llvm.org/bugs/show_bug.cgi?id=18270.

llvm-svn: 202752

f5041ce5

Feb 27, 2014

Add a debug info code generation level to the compile unit metadata · 75d49db1

Eric Christopher authored Feb 27, 2014

and update everything accordingly. This can be used to conditionalize
the amount of output in the backend based on the amount of debug
requested/metadata emission scheme by a front end (e.g. clang).

Paired with a commit to clang.

llvm-svn: 202332

75d49db1

Feb 26, 2014

Fix broken FileCheck prefixes · 0a0c674b
Nico Rieck authored Feb 26, 2014
```
llvm-svn: 202308
```
0a0c674b

GlobalOpt: Apply fastcc to internal x86_thiscallcc functions · 22869378

Reid Kleckner authored Feb 26, 2014

We should apply fastcc whenever profitable.  We can expand this list,
but there are lots of conventions with performance implications that we
don't want to change.

Differential Revision: http://llvm-reviews.chandlerc.com/D2705

llvm-svn: 202293

22869378

Fix broken FileCheck prefix · 5645b363
Nico Rieck authored Feb 26, 2014
```
llvm-svn: 202291
```
5645b363
Fix PR18165: LSR must avoid scaling factors that exceed the limit on truncated use. · 429e9edd
Andrew Trick authored Feb 26, 2014
```
Patch by Michael Zolotukhin!

llvm-svn: 202273
```
429e9edd

[SROA] Use the correct index integer size in GEPs through non-default · dfb2efd0

Chandler Carruth authored Feb 26, 2014

address spaces.

This isn't really a correctness issue (the values are truncated) but it's
much cleaner.

Patch by Matt Arsenault!

llvm-svn: 202252

dfb2efd0

[SROA] Teach SROA how to handle pointers from address spaces other than · 286d87ed

Chandler Carruth authored Feb 26, 2014

the default.

Based on the patch by Matt Arsenault, D1764!

I switched one place to use the more direct pointer type to compute the
desired address space, and I reworked the memcpy rewriting section to
reflect significant refactorings that this patch helped inspire.

Thanks to several of the folks who helped review and improve the patch
as well.

llvm-svn: 202247

286d87ed

[SROA] Split the alignment computation completely for the memcpy rewriting · aa72b93a

Chandler Carruth authored Feb 26, 2014

to work independently for the slice side and the other side.

This allows us to only compute the minimum of the two when we actually
rewrite to a memcpy that needs to take the minimum, and preserve higher
alignment for one side or the other when rewriting to loads and stores.

This fix was inspired by seeing the result of some refactoring that
makes addrspace handling better.

llvm-svn: 202242

aa72b93a

[SROA] Fix PR18615 with some long overdue simplifications to the bounds · 6aedc106

Chandler Carruth authored Feb 26, 2014

checking in SROA.

The primary change is to just rely on uge for checking that the offset
is within the allocation size. This removes the explicit checks against
isNegative which were terribly error prone (including the reversed logic
that led to PR18615) and prevented us from supporting stack allocations
larger than half the address space.... Ok, so maybe the latter isn't
*common* but it's a silly restriction to have.
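
The reason a single unsigned comparison is enough can be shown in a few lines. This is a sketch with a hypothetical `offsetOutOfBounds` helper, not SROA's code: an offset that would be negative when read as signed has a huge unsigned bit pattern, so it fails the same check as an offset past the end.

```cpp
#include <cstdint>

// `offset` is the raw bit pattern of the offset. A "negative" offset becomes
// a very large unsigned value, so one unsigned >= comparison rejects both
// negative offsets and offsets at or past the end of the allocation.
bool offsetOutOfBounds(uint64_t offset, uint64_t allocSize) {
    return offset >= allocSize;
}
```

Because the sign is never inspected, an allocation larger than half the address space needs no special handling in this model.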

Also, we used to try to support a PHI node which loaded from before the
start of the allocation if any of the loaded bytes were within the
allocation. This doesn't make any sense, we have never really supported
loading or storing *before* the allocation starts. The simplified logic
just doesn't care.

We continue to allow loading past the end of the allocation in part to
support cases where there is a PHI and some loads are larger than others
and the larger ones reach past the end of the allocation. We could solve
this a different and more conservative way, but I'm still somewhat
paranoid about this.

llvm-svn: 202224

6aedc106

Feb 25, 2014

[SROA] Fix another instability in SROA with respect to the slice · 3bf18ed5

Chandler Carruth authored Feb 25, 2014

ordering.

The fundamental problem that we're hitting here is that the use-def
chain ordering is *itself* not a stable thing to be relying on in the
rewriting for SROA. Further, we use a non-stable sort over the slices to
arrange them based on the section of the alloca they're operating on.
With a debugging STL implementation (or different implementations in
stage2 and stage3) this can cause stage2 != stage3.
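
To illustrate only the sorting half of the problem, the sketch below orders hypothetical `Slice` records with `std::stable_sort`, which keeps equal elements in their input order on every STL implementation. This is not the fix made in this commit, and it does not help when the input order itself is unstable.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical slice record: byte range [begin, end) plus an identifier.
struct Slice {
    unsigned begin;
    unsigned end;
    int id;
};

// Order by start offset, then larger slices first. Slices covering exactly
// the same range compare equal; stable_sort keeps them in input order,
// whereas std::sort may order them differently across implementations.
void sortSlices(std::vector<Slice> &slices) {
    std::stable_sort(slices.begin(), slices.end(),
                     [](const Slice &a, const Slice &b) {
                         if (a.begin != b.begin)
                             return a.begin < b.begin;
                         return a.end > b.end;
                     });
}
```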

The specific aspect of this problem fixed in this commit deals with the
rewriting and load-speculation around PHIs and Selects. This, like many
other aspects of the use-rewriting in SROA, is really part of the
"strong SSA-formation" that is done by SROA where it works very hard to
canonicalize loads and stores in *just* the right way to satisfy the
needs of mem2reg[1]. When we have a select (or a PHI) with 2 uses of the
same alloca, we test that loads downstream of the select are
speculatable around it twice. If only one of the operands to the select
needs to be rewritten, then if we get lucky we rewrite that one first
and the select is immediately speculatable. This can cause the order of
operand visitation, and thus the order of slices to be rewritten, to
change an alloca from promotable to non-promotable and vice versa.

The fix is to defer all of the speculation until *after* the rewrite
phase is done. Once we've rewritten everything, we can accurately test
for whether speculation will work (once, instead of twice!) and the
order ceases to matter.

This also happens to simplify the other subtlety of speculation -- we
need to *not* speculate anything unless the result of speculating will
make the alloca fully promotable by mem2reg. I had a previous attempt at
simplifying this, but it was still pretty horrible.

There is actually already a *really* nice test case for this in
basictest.ll, but on multiple STL implementations and inputs, we just
got "lucky". Fortunately, the test case is very small and we can
essentially build it in exactly the opposite way to get reasonable
coverage in both directions even from normal STL implementations.

llvm-svn: 202092

3bf18ed5

Feb 24, 2014
- SLPVectorizer: Try vectorizing 'splat' stores · 9611d23d
  Arnold Schwaighofer authored Feb 24, 2014
```
Vectorize sequential stores of a broadcasted value.
5% on eon.

radar://16124699

llvm-svn: 202067
```
  9611d23d
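
A minimal example of the source pattern, assuming a hypothetical `storeSplat` function: the same scalar is stored to four consecutive locations, which a vectorizer could in principle turn into one broadcast followed by one vector store.

```cpp
// Sequential stores of one broadcast ("splat") value to adjacent elements.
// The caller must provide at least four writable floats at dst.
void storeSplat(float *dst, float v) {
    dst[0] = v;
    dst[1] = v;
    dst[2] = v;
    dst[3] = v;
}
```
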
Feb 21, 2014

Make sure that value handle users see the transformation of an indirect call... · 75080ff2

Nick Lewycky authored Feb 20, 2014

Make sure that value handle users see the transformation of an indirect call to a direct call. This is important for the CallGraph iteration. Patch by Björn Steinbrink!

llvm-svn: 201822

75080ff2

Feb 19, 2014

X86: move test requiring X86TargetLowering info into its own directory · d495642c
Tim Northover authored Feb 19, 2014
```
If LLVM is built without X86 as a supported target then the test would
mysteriously fail.

llvm-svn: 201668
```
d495642c

Try adding datalayout in case that's what Hexagon doesn't like. · d496355b

Tim Northover authored Feb 19, 2014

Just a wild stab in the dark really, but in the absence of any ability to
reproduce the problem...

llvm-svn: 201658

d496355b

X86 CodeGenPrep: sink shufflevectors before shifts · aeb8e06d

Tim Northover authored Feb 19, 2014

On x86, shifting a vector by a scalar is significantly cheaper than shifting a
vector by another fully general vector. Unfortunately, because SelectionDAG
operates on just one basic block at a time, the shufflevector instruction that
reveals whether the right-hand side of a shift *is* really a scalar is often
not visible to CodeGen when it's needed.

This adds another handler to CodeGenPrepare, to sink any useful shufflevector
instructions down to the basic block where they're used, predicated on a target
hook (since on other architectures, doing so will often just introduce extra
real work).

rdar://problem/16063505

llvm-svn: 201655

aeb8e06d

Feb 17, 2014
- fix for null VectorizedValue assertion in the SLP Vectorizer (in function... · 7a463d06
  Gerolf Hoflehner authored Feb 17, 2014
```
fix for null VectorizedValue assertion in the SLP Vectorizer (in function vectorizeTree()). radar://16064178

llvm-svn: 201501
```
  7a463d06
Feb 16, 2014

SCEVExpander: Try hard not to create derived induction variables in other loops · 26f567d8

Arnold Schwaighofer authored Feb 16, 2014

During LSR of one loop we can run into a situation where we have to expand the
start of a recurrence of a loop induction variable in this loop. This start
value is derived from the induction variable of a preceding loop. SCEV
has canonicalized this value to a different recurrence than the recurrence of
the preceding loop's induction variable (the type and/or step direction has
changed). When we come to instantiate this SCEV we create a second induction
variable in the preceding loop. This patch tries to base such derived
induction variables on the preceding loop's induction variable.

This helps twolf on arm and seems to help scimark2 on x86.

Reapply with a fix for the case of a value derived from a pointer.

radar://15970709

llvm-svn: 201496

26f567d8

Fix broken CHECK lines · 76471787
Nico Rieck authored Feb 16, 2014
```
llvm-svn: 201479
```
76471787

Feb 15, 2014

Revert "SCEVExpander: Try hard not to create derived induction variables in other loops" · 847d9614
Arnold Schwaighofer authored Feb 15, 2014
```
This reverts commit r201465. It broke an arm bot.

llvm-svn: 201466
```
847d9614

SCEVExpander: Try hard not to create derived induction variables in other loops · 1e12f856

Arnold Schwaighofer authored Feb 15, 2014

This helps twolf on arm and seems to help scimark2 on x86.

radar://15970709

llvm-svn: 201465

1e12f856

Feb 14, 2014
- Do more addrspacecast transforms that happen for bitcast. · aa689f50
  Matt Arsenault authored Feb 14, 2014
```
Makes addrspacecast (gep) do addrspacecast (gep) instead.

llvm-svn: 201376
```
  aa689f50
Feb 13, 2014

Re-commit: Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove... · 753e1762

Daniel Sanders authored Feb 13, 2014

Re-commit: Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call

Summary:
AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for
targets with mature MC support. Such targets will always parse the inline
assembly (even when emitting assembly). Targets without mature MC support
continue to use EmitRawText() for assembly output.

The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced
with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler
to parse inline assembly (even when emitting assembly output). UseIntegratedAs
is set to true for targets that consider any failure to parse valid assembly
to be a bug. Target specific subclasses generally enable the integrated
assembler in their constructor. The default value can be overridden with
-no-integrated-as.

All tests that rely on inline assembly supporting invalid assembly (for example,
those that use mnemonics such as 'foo' or 'hello world') have been updated to
disable the integrated assembler.

Changes since review (and last commit attempt):
- Fixed test failures that were missed due to configuration of local build.
  (fixes crash.ll and a couple others).
- Fixed tests that happened to pass because the local build was on X86
  (should fix 2007-12-17-InvokeAsm.ll)
- mature-mc-support.ll's should no longer require all targets to be compiled.
  (should fix ARM and PPC buildbots)
- Object output (-filetype=obj and similar) now forces the integrated assembler
  to be enabled regardless of default setting or -no-integrated-as.
  (should fix SystemZ buildbots)

Reviewers: rafael

Reviewed By: rafael

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2686

llvm-svn: 201333

753e1762

GlobalOpt: Aliases don't have sections, don't copy them when replacing · 22b19da9

Reid Kleckner authored Feb 13, 2014

As defined in LangRef, aliases do not have sections.  However, LLVM's
GlobalAlias class inherits from GlobalValue, which means we can read and
set its section.  We should probably ban that as a separate change,
since it doesn't make much sense for an alias to have a section that
differs from its aliasee.

Fixes PR18757, where the section was being lost on the global in code
from Clang like:

extern "C" {
__attribute__((used, section("CUSTOM"))) static int in_custom_section;
}

Reviewers: rafael.espindola

Differential Revision: http://llvm-reviews.chandlerc.com/D2758

llvm-svn: 201286

22b19da9

Remove a very old instcombine where we would turn sequences of selects into · 883b5add

Owen Anderson authored Feb 12, 2014

logical operations on the i1's driving them. This is a bad idea for every
target I can think of (confirmed with micro tests on all of: x86-64, ARM,
AArch64, Mips, and PowerPC) because it forces the i1 to be materialized into
a general purpose register, whereas consuming it directly into a select generally
allows it to exist only transiently in a predicate or flags register.
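
One plausible shape of such a transform, assumed here for illustration and not taken from the removed code, is folding a nested select into a single select on the AND of the two conditions. The functions `nestedSelect` and `combinedSelect` are hypothetical. They compute the same value, but the second must first materialize the combined condition.

```cpp
// Nested form: each condition is consumed directly by its own select.
int nestedSelect(bool c0, bool c1, int a, int b) {
    return c0 ? (c1 ? a : b) : b;
}

// Combined form: the two conditions are first merged with a logical AND,
// and the merged value drives a single select.
int combinedSelect(bool c0, bool c1, int a, int b) {
    bool both = c0 & c1;
    return both ? a : b;
}
```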

Chandler ran a set of performance tests with this change, and reported no
measurable change on x86-64.

llvm-svn: 201275

883b5add

Feb 12, 2014

Revert r201237+r201238: Demote EmitRawText call in AsmPrinter::EmitInlineAsm()... · abe212a3

Daniel Sanders authored Feb 12, 2014

Revert r201237+r201238: Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call

It introduced multiple test failures in the buildbots.

llvm-svn: 201241

abe212a3

Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call · a7d504cf

Daniel Sanders authored Feb 12, 2014

Summary:
AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for targets with mature MC support. Such targets will always parse the inline assembly (even when emitting assembly). Targets without mature MC support continue to use EmitRawText() for assembly output.

The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler to parse inline assembly (even when emitting assembly output). UseIntegratedAs is set to true for targets that consider any failure to parse valid assembly to be a bug. Target specific subclasses generally enable the integrated assembler in their constructor. The default value can be overridden with -no-integrated-as.

All tests that rely on inline assembly supporting invalid assembly (for example, those that use mnemonics such as 'foo' or 'hello world') have been updated to disable the integrated assembler.

Reviewers: rafael

Reviewed By: rafael

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2686

llvm-svn: 201237

a7d504cf