- Mar 26, 2014
-
Timur Iskhodzhanov authored
llvm-svn: 204795
-
Daniel Sanders authored
llvm-svn: 204793
-
Timur Iskhodzhanov authored
Fix PR19239 - Add support for generating debug info for functions without lexical scopes and/or debug info at all llvm-svn: 204790
-
Rafael Espindola authored
This reverts commit r204781. I will follow up with the msan folks to see what they were trying to do with aliases to weak aliases. llvm-svn: 204784
-
Hal Finkel authored
These instructions are essentially the same as their Altivec counterparts, but have access to the larger VSX register file. llvm-svn: 204782
-
Rafael Espindola authored
Aliases are just another name for a position in a file. As such, the regular symbol resolutions are not applied. For example, given

  define void @my_func() { ret void }
  @my_alias = alias weak void ()* @my_func
  @my_alias2 = alias void ()* @my_alias

we produce, without this patch:

  .weak my_alias
  my_alias = my_func
  .globl my_alias2
  my_alias2 = my_alias

That is, in the resulting ELF file my_alias, my_alias2 and my_func are just 3 names pointing to offset 0 of .text. That is *not* the semantics of IR linking. For example, linking in a

  @my_alias = alias void ()* @other_func

would require the strong my_alias to override the weak one, and my_alias2 would end up pointing to other_func. There is no way to represent that with aliases being just another name, so the best solution seems to be to just disallow it, converting a miscompile into an error. llvm-svn: 204781
-
David Blaikie authored
Allows this test to pass on COFF platforms so we don't need to restrict this test to a single target anymore. llvm-svn: 204780
-
Rafael Espindola authored
The logic was incorrect for variables, causing them to end up in the wrong section if the section had an index >= 0xff00. llvm-svn: 204771
-
Quentin Colombet authored
Adds the different broadcast instructions to the ReplaceableInstrsAVX2 table. That way the ExeDepsFix pass can take better decisions when AVX2 broadcasts are across domain (int <-> float). In particular, prior to this patch we were generating:

  vpbroadcastd LCPI1_0(%rip), %ymm2
  vpand %ymm2, %ymm0, %ymm0
  vmaxps %ymm1, %ymm0, %ymm0 ## <- domain change penalty

Now, we generate the following nice sequence where everything is in the float domain:

  vbroadcastss LCPI1_0(%rip), %ymm2
  vandps %ymm2, %ymm0, %ymm0
  vmaxps %ymm1, %ymm0, %ymm0

<rdar://problem/16354675> llvm-svn: 204770
-
Rafael Espindola authored
We need .symtab_shndxr if and only if a symbol references a section with an index >= 0xff00. The old code was trying to figure out ahead of time whether the section was needed, making it fairly dependent on the code actually writing the table. It was also somewhat conservative and would create the section in cases where it was not needed.

If I remember correctly, the old structure was there so that the sections were created in the same order gas creates them. That was valuable when MC's support for ELF was new and we tested with elf-dump.py.

This patch refactors the symbol table creation into another class and makes it obvious that .symtab_shndxr is really only created when we are about to output a reference to a section index >= 0xff00. While here, also improve the tests to use macros. One file is one section short of needing .symtab_shndxr; the second one has just the right number. llvm-svn: 204769
-
Hal Finkel authored
The VSX instruction set has two types of FMA instructions: A-type (where the addend is taken from the output register) and M-type (where one of the product operands is taken from the output register). This adds a small pass that runs just after MI scheduling (and, thus, just before register allocation) that mutates A-type instructions (that are created during isel) into M-type instructions when:

1. This will eliminate an otherwise-necessary copy of the addend
2. One of the product operands is killed by the instruction

The "right" moment to make this decision is in between scheduling and register allocation, because only there do we know whether or not one of the product operands is killed by any particular instruction. Unfortunately, this also makes the implementation somewhat complicated, because the MIs are not in SSA form and we need to preserve the LiveIntervals analysis.

As a simple example, if we have:

  %vreg5<def> = COPY %vreg9; VSLRC:%vreg5,%vreg9
  %vreg5<def,tied1> = XSMADDADP %vreg5<tied0>, %vreg17, %vreg16, %RM<imp-use>; VSLRC:%vreg5,%vreg17,%vreg16
  ...
  %vreg9<def,tied1> = XSMADDADP %vreg9<tied0>, %vreg17, %vreg19, %RM<imp-use>; VSLRC:%vreg9,%vreg17,%vreg19
  ...

we can eliminate the copy by changing from the A-type to the M-type instruction. This means:

  %vreg5<def,tied1> = XSMADDADP %vreg5<tied0>, %vreg17, %vreg16, %RM<imp-use>; VSLRC:%vreg5,%vreg17,%vreg16

is replaced by:

  %vreg16<def,tied1> = XSMADDMDP %vreg16<tied0>, %vreg18, %vreg9, %RM<imp-use>; VSLRC:%vreg16,%vreg18,%vreg9

and we remove:

  %vreg5<def> = COPY %vreg9; VSLRC:%vreg5,%vreg9

llvm-svn: 204768
-
- Mar 25, 2014
-
Rafael Espindola authored
While at it, factor some logic into FragmentWriter. This will allow more code to be factored out of the fairly large ELFObjectWriter. llvm-svn: 204765
-
Juergen Ributzka authored
llvm-svn: 204758
-
Hal Finkel authored
Although the first two operands are the ones that can be swapped, the tied input operand is listed before them, so we need to adjust for that. I have a test case for this, but it goes along with an upcoming commit (so it will come soon). llvm-svn: 204748
-
Hal Finkel authored
TableGen will create a lookup table for the A-type FMA instructions providing their corresponding M-form opcodes. This will be used by upcoming commits. llvm-svn: 204746
-
Matt Arsenault authored
Remove handling of select_cc, since it makes no sense to be there. This now does nothing, but I'll be adding some handling of other target nodes soon. llvm-svn: 204743
-
Duncan P. N. Exon Smith authored
Implement Pass::releaseMemory() in BlockFrequencyInfo and MachineBlockFrequencyInfo. Just delete the private implementation when not in use. Switch to a std::unique_ptr to make the logic more clear. <rdar://problem/14292693> llvm-svn: 204741
-
Duncan P. N. Exon Smith authored
<rdar://problem/14292693> llvm-svn: 204740
-
Juergen Ributzka authored
If getElementPtr uses a constant as base pointer, then make the constant opaque. This prevents constant folding it with the offset. The offset can usually be encoded in the load/store instruction itself and the base address doesn't have to be rematerialized several times. llvm-svn: 204739
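  (Illustrative only, not from the commit: a hand-written sketch of the kind of IR this targets. The function name and the constant address are made up.)

    ; Three loads off one large constant address. Keeping the base opaque means it
    ; is materialized once and the small element offsets fold into each load's
    ; addressing mode, instead of being folded into three separate large constants.
    define i32 @read_regs() {
      %base = inttoptr i64 3735928559 to i32*
      %p1 = getelementptr i32* %base, i64 1
      %p2 = getelementptr i32* %base, i64 2
      %v0 = load i32* %base
      %v1 = load i32* %p1
      %v2 = load i32* %p2
      %s0 = add i32 %v0, %v1
      %s1 = add i32 %s0, %v2
      ret i32 %s1
    }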
-
Juergen Ributzka authored
The cost for the first four stackmap operands was always TCC_Free. This is only true for the first two operands. All other operands are TCC_Free only if they fit in 64 bits. llvm-svn: 204738
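  (For reference, a hedged sketch of the operand layout in question; the operand values and names here are made up.)

    ; <id> and <numShadowBytes> are the first two operands and are always free to
    ; materialize; the live values that follow (%a, %b) are free only when they
    ; fit in 64 bits.
    call void (i64, i32, ...)* @llvm.experimental.stackmap(i64 12345, i32 8, i64 %a, i64 %b)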
-
Juergen Ributzka authored
Usually opaque constants shouldn't be folded, unless they are simple unary operations that don't create new constants; even then, the fold shouldn't drop the opaque constant flag. This commit fixes that. Related to <rdar://problem/14774662> llvm-svn: 204737
-
Adam Nemet authored
This used to resort to splitting the 256-bit operation into two 128-bit shuffles and then recombining the results. Fixes <rdar://problem/16167303> llvm-svn: 204735
-
Adam Nemet authored
I found three implementations of this. This splits it out into a new function and uses it from the three places. My plan is to add a fourth use when lowering a vector_shuffle:v16i16.

Compared the assembly output of test/CodeGen/X86 before and after. The only change is due to how the first PSHUFB was generated in LowerVECTOR_SHUFFLEv8i16. If the shuffle mask specified undef (i.e. -1), the old implementation would write -1 * 2 and -1 * 2 + 1 (254 and 255) in the control mask. Now we write 0x80. These are of course interchangeable since bit 7 decides if a constant zero is written in the result byte. The other instances of this code use 0x80 consistently.

Related to <rdar://problem/16167303> llvm-svn: 204734
-
Richard Osborne authored
Summary: Previously the code didn't check if the before and after types for the store were pointers to different address spaces. This resulted in instcombine using a bitcast to convert between pointers to different address spaces, causing an assertion due to the invalid cast. It is not appropriate to use addrspacecast in this case because it is not guaranteed to be a no-op cast. Instead, bail out and do not do the transformation.

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D3117 llvm-svn: 204733
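  (A reduced sketch of the shape of IR involved, as I read the description above; this is not the original test case and the names are made up.)

    ; The store destination is a bitcast of a slot that really holds addrspace(1)
    ; pointers. Folding the bitcast into the store would require casting %v from
    ; i8* to i8 addrspace(1)*, which a plain bitcast cannot express, so the
    ; transformation now bails out instead.
    define void @f(i8* %v, i8 addrspace(1)** %slot) {
      %slot.cast = bitcast i8 addrspace(1)** %slot to i8**
      store i8* %v, i8** %slot.cast
      ret void
    }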
-
Richard Osborne authored
No functionality change. llvm-svn: 204732
-
Benjamin Kramer authored
If we have a loop of the form

  for (unsigned n = 0; n != (k & -32); n += 32) {}

then we know that n is always divisible by 32 and the loop must terminate. Even if the loop counter overflows, this invariant still holds. PR19183. Our loop vectorizer creates this pattern and it's also occasionally formed by loop counters derived from pointers. llvm-svn: 204728
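  (The C loop above, written out as the IR shape the trip-count logic sees; a hand-written sketch, not taken from the commit's tests.)

    define void @f(i32 %k) {
    entry:
      %bound = and i32 %k, -32
      br label %header
    header:                                          ; n != (k & -32)
      %n = phi i32 [ 0, %entry ], [ %n.next, %latch ]
      %cont = icmp ne i32 %n, %bound
      br i1 %cont, label %latch, label %exit
    latch:                                           ; n += 32
      %n.next = add i32 %n, 32
      br label %header
    exit:
      ret void
    }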
-
Matt Arsenault authored
If GT/UGT or LT/ULT were set to expand, a comparison with a constant would replace it with the illegal cond code. There are several more places later in this function that will have the same basic problem. Theoretically R600 should hit this problem for a test, but for some reason it doesn't. llvm-svn: 204727
-
Evgeniy Stepanov authored
Some bits of select result may be initialized even if select condition is not. https://code.google.com/p/memory-sanitizer/issues/detail?id=50 llvm-svn: 204716
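  (A small made-up example, not from the linked issue, of why that is sound.)

    ; The two arms differ only in bit 5, so every other bit of %r is defined even
    ; when %cond itself is uninitialized; only the disagreeing bit needs to be
    ; treated as poisoned by the shadow propagation.
    %r = select i1 %cond, i8 16, i8 48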
-
Daniel Sanders authored
Differential Revision: http://llvm-reviews.chandlerc.com/D3171 llvm-svn: 204714
-
Cameron McInally authored
llvm-svn: 204713
-
Daniel Sanders authored
Summary: Remove the XFAIL added in my previous commit and correct the test such that it correctly tests the expansion of the assembler temporary. Also added a test to check that $at is always $1 when written by the user. Corrected the new assembler temporary warnings so that they are emitted for numeric registers too. Differential Revision: http://llvm-reviews.chandlerc.com/D3169 llvm-svn: 204711
-
Daniel Sanders authored
Summary: The assembler temporary is normally $at ($1) but can be reassigned using '.set at=$reg'. Regardless of which register is nominated as the assembler temporary, $at remains $1 when written by the user. Adds warnings under the following conditions:

* The register nominated as the assembler temporary is used by the user.
* '.set noat' is in effect and $at is used by the user.

Both of these only work for named registers. I have a follow-up commit that makes it work for numeric registers as well.

XFAIL set-at-directive.s since it incorrectly tests that $at is redefined by '.set at=$reg'. Testcases will follow in a separate commit.

Patch by David Chisnall. His work was sponsored by DARPA and AFRL.

Differential Revision: http://llvm-reviews.chandlerc.com/D3167 llvm-svn: 204710
-
Erik Verbruggen authored
GCC 4.0.1 and Xcode 2 are no longer supported for building llvm/clang. llvm-svn: 204705
-
Yaron Keren authored
it has no value for us. llvm-svn: 204704
-
David Majnemer authored
This is a pretty straightforward translation for COFF: we just need to stick the data in a COMDAT section marked as IMAGE_COMDAT_SELECT_NODUPLICATES. N.B. We must be careful to avoid sticking entities with private linkage in COMDAT groups. COFF is pretty hostile to the renaming of entities, so we must be careful to disallow GlobalVariables with unstable names. llvm-svn: 204703
-
David Blaikie authored
Based on code review feedback from Eric in r204672. llvm-svn: 204702
-
Andrew Trick authored
Extracts coming from phis were being hoisted, while all others were sunk to their uses. This was inconsistent and didn't seem to serve a purpose. Changing all extracts to be sunk to uses is a prerequisite for adding block frequency to the SLP vectorizer's cost model. I benchmarked the change in isolation (without block frequency). I only saw noise on x86 and some potentially significant improvements on ARM. No major regressions is good enough for me. llvm-svn: 204699
-
David Blaikie authored
Implement debug_loc.dwo, as well as llvm-dwarfdump support for dumping this section.

Outlined in the DWARF5 spec and http://gcc.gnu.org/wiki/DebugFission, the debug_loc.dwo section has more variation than the standard debug_loc, allowing 3 different forms of entry (plus the end of list entry). GCC seems to, and Clang certainly, only use one form, so I've just implemented dumping support for that for now.

It wasn't immediately obvious that there was a good refactoring to share the implementation of dumping support between debug_loc and debug_loc.dwo, so they're separate for now - ideas welcome or I may come back to it at some point.

As per a comment in the code, we could choose different forms that may reduce the number of debug_addr entries we emit, but that will require further study. llvm-svn: 204697
-
David Blaikie authored
This seems excessive - switching section isn't expensive (or if it is we're already being wasteful, since we emitted the debug_loc section symbol earlier anyway) and otherwise there's no work that happens in this function when the list is empty. llvm-svn: 204696
-
Manman Ren authored
When the register allocator's stage is RS_Spill, we choose spill over using the CSR for the first time, if the spill cost is lower than CSRCost. When the register allocator's stage is < RS_Split, we choose pre-splitting over using the CSR for the first time, if the cost of splitting is lower than CSRCost.

CSRCost is set with the command-line option "regalloc-csr-first-time-cost". The default value is 0, which generates the same code as before this commit. With a value of 15 (1 << 14 is the entry frequency), I measured a performance gain of 3% on 253.perlbmk and 1.7% on 197.parser, with instrumented PGO, on an arm device.

rdar://16162005 llvm-svn: 204690