Commits · 100f8f60c396ffc6cf5beac64b55e157dd6d22ed · Roger Ferrer / llvm-epi

Jan 16, 2015

Check commit access · 100f8f60
Sumanth Gundapaneni authored Jan 16, 2015
```
llvm-svn: 226302
```

[AVX512] Add intrinsics for masked aligned FP loads and stores · 3e8b22bc

Adam Nemet authored Jan 16, 2015

Similar to the unaligned cases.

Test was generated with update_llc_test_checks.py.

Part of <rdar://problem/17688758>

llvm-svn: 226296

[AVX512] Remove trailing whitespaces in this test · 9b8cfa21
Adam Nemet authored Jan 16, 2015
```
llvm-svn: 226295
```
IR: Allow 16-bits for column info · 2f5bb313
Duncan P. N. Exon Smith authored Jan 16, 2015
```
Raise the limit for column information from 8 bits to 16 bits.

llvm-svn: 226291
```

IR: Cleanup dead code, NFC · c9cddb08

Duncan P. N. Exon Smith authored Jan 16, 2015

Line/column fixups already exist in `MDLocation`.  Delete the duplicated
logic in `DebugLoc`.

llvm-svn: 226290

[Hexagon] Updating call/jump instruction patterns. · 2e3a26de
Colin LeMahieu authored Jan 16, 2015
```
llvm-svn: 226288
```

[X86][DAG] Disable target specific combine on INSERTPS dag nodes at -O0. · ae47bc6a

Andrea Di Biagio authored Jan 16, 2015

This patch disables target specific combine on X86ISD::INSERTPS dag nodes
if optlevel is CodeGenOpt::None.

The backend currently implements a target specific combine rule that converts
a vector load used by an INSERTPS dag node into a scalar load plus a
scalar_to_vector. This allows ISel to select a single INSERTPSrm instead of
two instructions (i.e. a vector load plus INSERTPSrr).

However, the existing target combine rule on INSERTPS nodes only works under
the assumption that ISel will always be able to match an INSERTPSrm. This is
not true in general at -O0, since the backend only allows folding a load into
the memory operand of an instruction if the optimization level is not
CodeGenOpt::None.

In the example below:

//
__m128 test(__m128 a, __m128 *b) {
  __m128 c = _mm_insert_ps(a, *b, 1 << 6);
  return c;
}
//

Before this patch, at -O0, the backend would have canonicalized the load to 'b'
into a scalar load plus scalar_to_vector. Later on, ISel would have selected an
INSERTPSrr leaving the insertps mask in an inconsistent state:

  movss 4(%rdi), %xmm1
  insertps  $64, %xmm1, %xmm0 # xmm0 = xmm1[1],xmm0[1,2,3].

With this patch, the backend avoids folding the vector load into the operand of
the INSERTPS. The new codegen at -O0 is:

  movaps (%rdi), %xmm1
  insertps  $64, %xmm1, %xmm0 # xmm0 = xmm1[1],xmm0[1,2,3].
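
To make the mask inconsistency concrete, here is a minimal scalar model of the register form of INSERTPS, written as a sketch rather than as backend code. It assumes the usual immediate layout (bits 7:6 select the source lane, bits 5:4 the destination lane, bits 3:0 are a zero mask). The names `Vec4`, `insertpsRR`, `loadScalar` and `loadVector` are hypothetical helpers introduced only for this illustration.

```cpp
#include <array>
#include <cstdint>

using Vec4 = std::array<float, 4>;

// Hypothetical model of the register-register form of INSERTPS.
// imm[7:6]: source lane, imm[5:4]: destination lane, imm[3:0]: zero mask.
Vec4 insertpsRR(Vec4 dst, const Vec4 &src, std::uint8_t imm) {
  const unsigned srcLane = (imm >> 6) & 3;
  const unsigned dstLane = (imm >> 4) & 3;
  dst[dstLane] = src[srcLane];
  for (unsigned lane = 0; lane < 4; ++lane)
    if (imm & (1u << lane))
      dst[lane] = 0.0f;
  return dst;
}

// Models movss from memory: one float into lane 0, upper lanes zeroed.
Vec4 loadScalar(const float *p) { return Vec4{{*p, 0.0f, 0.0f, 0.0f}}; }

// Models movaps from memory: all four lanes loaded.
Vec4 loadVector(const float *p) { return Vec4{{p[0], p[1], p[2], p[3]}}; }
```

With an immediate of 64 (`1 << 6`), `insertpsRR(a, loadScalar(b + 1), 64)` reads lane 1 of a register whose upper lanes were zeroed by the scalar load, so lane 0 of the result becomes 0 instead of `b[1]`. `insertpsRR(a, loadVector(b), 64)` reads lane 1 of the full vector and yields `b[1]`, which matches the code generated after the patch.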

llvm-svn: 226277

[mips] Remove a redundant semicolon and add space before curly brackets. NFC. · f476200c
Toma Tabacu authored Jan 16, 2015
```
llvm-svn: 226269
```

[X86] Refactored stack memory folding tests to explicitly force register spilling · 367db8ee

Simon Pilgrim authored Jan 16, 2015

The current 'big vectors' stack folded reload testing pattern is very bulky and makes it difficult to test all instructions as big vectors will tend to use only the ymm instruction implementations.

This patch changes the tests to use a nop call that lists explicit xmm registers as side effects. With this we can force a partial register spill of the relevant registers and then check that the reload is correctly folded. The asm generated only adds the forced spill, a nop instruction and a couple of extra labels (a fraction of the current approach).

More exhaustive tests will follow shortly; I've added some extra tests (the xmm versions of some of the existing folding tests) as a starting point.

Differential Revision: http://reviews.llvm.org/D6932

llvm-svn: 226264

Revert r226242 - Revert Revert Don't create new comdats in CodeGen · 60b72136
Timur Iskhodzhanov authored Jan 16, 2015
```
This breaks AddressSanitizer (ninja check-asan) on Windows

llvm-svn: 226251
```
Use report_fatal_error instead of llvm_unreachable, so we don't crash on user input · 3ca723c9
Filipe Cabecinhas authored Jan 16, 2015
```
llvm-svn: 226248
```

[PowerPC] Adjust PatchPoints for ppc64le · 52f7c018

Hal Finkel authored Jan 16, 2015

Bill Schmidt pointed out that some adjustments would be needed to properly
support powerpc64le (using the ELF V2 ABI). For one thing, R11 is not available
as a scratch register, so we need to use R12. R12 is also available under ELF
V1, so to maintain consistency, I flipped the order to make R12 the first
scratch register in the array under both ABIs.

llvm-svn: 226247

Fix Reassociate handling of constant in presence of undef float · 590a2700
Mehdi Amini authored Jan 16, 2015
```
http://reviews.llvm.org/D6993

llvm-svn: 226245
```

Revert "Revert Don't create new comdats in CodeGen" · 67a79e72

Rafael Espindola authored Jan 16, 2015

This reverts commit r226173, adding r226038 back.

No change in this commit, but clang was changed to also produce trivial comdats for
constructors, destructors and vtables when needed.

Original message:

Don't create new comdats in CodeGen.

This patch stops the implicit creation of comdats during codegen.

Clang now sets the comdat explicitly when it is required. With this patch clang and gcc
now produce the same result in pr19848.

llvm-svn: 226242

Work around to get the build bot clang-cmake-armv7-a15-full green by · 15e9420f

Kevin Enderby authored Jan 16, 2015

removing the macho-archive-headers.test added with r226228 that it is
failing on for now while I try to figure out what is going on.

llvm-svn: 226241

Another attempt to fix the build bot clang-cmake-armv7-a15-full failing on · 95f1860d
Kevin Enderby authored Jan 16, 2015
```
the macho-archive-headers.test added with r226228.

llvm-svn: 226239
```

Add a new pass "inductive range check elimination" · a1837a34

Sanjoy Das authored Jan 16, 2015

IRCE eliminates range checks of the form

  0 <= A * I + B < Length

by splitting a loop's iteration space into three segments in a way
that the check is completely redundant in the middle segment.  As an
example, IRCE will convert

  len = < known positive >
  for (i = 0; i < n; i++) {
    if (0 <= i && i < len) {
      do_something();
    } else {
      throw_out_of_bounds();
    }
  }

to

  len = < known positive >
  limit = smin(n, len)
  // no first segment
  for (i = 0; i < limit; i++) {
    if (0 <= i && i < len) { // this check is fully redundant
      do_something();
    } else {
      throw_out_of_bounds();
    }
  }
  for (i = limit; i < n; i++) {
    if (0 <= i && i < len) {
      do_something();
    } else {
      throw_out_of_bounds();
    }
  }


IRCE can deal with multiple range checks in the same loop (it takes
the intersection of the ranges that will make each of them redundant
individually).
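
The split described above can be sketched at source level as follows. This is only an illustration of the transformation on the example loop, not the pass itself, which works on LLVM IR. The `Trace` struct and both function names are hypothetical, and the out-of-bounds path is modelled by a counter instead of a throw so that the two versions can be compared.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical record of what a loop did: which iterations took the
// in-bounds path, and how many took the out-of-bounds path.
struct Trace {
  std::vector<int> inBounds;
  int outOfBounds = 0;
};

// The loop before the transformation: every iteration is checked.
Trace originalLoop(int n, int len) {
  assert(len >= 0 && "len is assumed to be known non-negative");
  Trace t;
  for (int i = 0; i < n; i++) {
    if (0 <= i && i < len)
      t.inBounds.push_back(i);
    else
      t.outOfBounds++;
  }
  return t;
}

// The loop after splitting the iteration space at limit = smin(n, len).
Trace splitLoop(int n, int len) {
  assert(len >= 0 && "len is assumed to be known non-negative");
  Trace t;
  const int limit = std::min(n, len);
  // Middle segment: 0 <= i < limit <= len, so the check is dropped.
  for (int i = 0; i < limit; i++)
    t.inBounds.push_back(i);
  // Last segment: the check is kept.
  for (int i = limit; i < n; i++) {
    if (0 <= i && i < len)
      t.inBounds.push_back(i);
    else
      t.outOfBounds++;
  }
  return t;
}
```

For any `n` and non-negative `len`, both functions should record the same in-bounds iterations and the same out-of-bounds count; only the amount of checking in the first loop differs.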

Currently IRCE does not do any profitability analysis.  That is a
TODO.

Please note that the status of this pass is *experimental*, and it is
not part of any default pass pipeline.  Having said that, I would love
to get feedback and general input from people interested in trying
this out.

This pass was originally r226201.  It was reverted because it used C++
features not supported by MSVC 2012.

Differential Revision: http://reviews.llvm.org/D6693

llvm-svn: 226238

This should fix the build bot clang-cmake-armv7-a15-full failing on · a975d4df
Kevin Enderby authored Jan 16, 2015
```
the macho-archive-headers.test added with r226228.

llvm-svn: 226232
```
R600/SI: Add patterns for v_cvt_{flr|rpi}_i32_f32 · eeb2a7e6
Matt Arsenault authored Jan 15, 2015
```
llvm-svn: 226230
```
Fix edge case when Start overflowed in 32 bit mode · c552c9ab
Filipe Cabecinhas authored Jan 15, 2015
```
llvm-svn: 226229
```
Add the option, -archive-headers, used with -macho to print the Mach-O archive... · 13023a1a
Kevin Enderby authored Jan 15, 2015
```
Add the option, -archive-headers, used with -macho to print the Mach-O archive headers to llvm-objdump.

llvm-svn: 226228
```

R600/SI: Fix trailing comma with modifiers · 268757ba

Matt Arsenault authored Jan 15, 2015

Instructions with 1 operand can still use source modifiers,
so make sure we don't print an extra comma afterwards.

llvm-svn: 226226

[Hexagon] Adding new-value store and bit reverse instructions. · cd9c4e3e
Colin LeMahieu authored Jan 15, 2015
```
llvm-svn: 226224
```

Jan 15, 2015

Report fatal errors instead of segfaulting/asserting on a few invalid accesses... · 40139500

Filipe Cabecinhas authored Jan 15, 2015

Report fatal errors instead of segfaulting/asserting on a few invalid accesses while reading MachO files.

Summary:
Shift an older “invalid file” test to get a consistent naming for these tests.

Bugs found by afl-fuzz

Reviewers: rafael

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6945

llvm-svn: 226219

[Object] Add SF_Exported flag. This flag will be set on all symbols that would · 7e0692b6

Lang Hames authored Jan 15, 2015

be exported from a dylib if their containing object file were linked into one.

No test case: No command line tools query this flag, and there are no Object
unit tests.

llvm-svn: 226217

Revert r226201 (Add a new pass "inductive range check elimination") · 7f62ac8e

Sanjoy Das authored Jan 15, 2015

The change used C++11 features not supported by MSVC 2012.  I will fix
the change to use things supported by MSVC 2012 and recommit shortly.

llvm-svn: 226216

InductiveRangeCheckElimination: Remove extra ';' · f1f72c9e
David Majnemer authored Jan 15, 2015
```
This silences a GCC warning.

llvm-svn: 226215
```
Fixing pedantic build warnings. · 204096b5
Andrew Kaylor authored Jan 15, 2015
```
llvm-svn: 226214
```
[Hexagon] Fix 226206 by uncommenting required pattern and changing patterns... · c59328e6
Colin LeMahieu authored Jan 15, 2015
```
[Hexagon] Fix 226206 by uncommenting required pattern and changing patterns for simple load-extends.

llvm-svn: 226210
```

[PowerPC] Loosen ELFv1 PPC64 func descriptor loads for indirect calls · e2ab0f17

Hal Finkel authored Jan 15, 2015

Function pointers under PPC64 ELFv1 (which is used on PPC64/Linux on the
POWER7, A2 and earlier cores) are really pointers to a function descriptor, a
structure with three pointers: the actual pointer to the code to which to jump,
the pointer to the TOC needed by the callee, and an environment pointer. We
used to chain these loads, and make them opaque to the rest of the optimizer,
so that they'd always occur directly before the call. This is not necessary,
and in fact, highly suboptimal on embedded cores. Once the function pointer is
known, the loads can be performed ahead of time; in fact, they can be hoisted
out of loops.

Now these function descriptors are almost always generated by the linker, and
thus the contents of the descriptors are invariant. As a result, by default,
we'll mark the associated loads as invariant (allowing them to be hoisted out
of loops). I've added a target feature to turn this off, however, just in case
someone needs that option (constructing an on-stack descriptor, casting it to a
function pointer, and then calling it cannot be well-defined C/C++ code, but I
can imagine some JIT-compilation system doing so).
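
As a rough source-level picture of what marking the loads invariant permits, the sketch below models a function descriptor as a plain struct and contrasts loading its fields on every call with loading them once. Everything here is hypothetical: `FunctionDescriptor`, the two-argument callee signature and the function names are illustration only, and a real ELFv1 call places the TOC and environment pointers in registers rather than passing them as arguments.

```cpp
// Hypothetical model of a three-pointer function descriptor.
struct FunctionDescriptor {
  void (*entry)(void *toc, void *env); // code address
  void *toc;                           // TOC pointer needed by the callee
  void *env;                           // environment pointer
};

// Loads chained to each call: the descriptor is re-read every iteration.
void callWithLoadsInLoop(const FunctionDescriptor *fd, int iterations) {
  for (int i = 0; i < iterations; ++i) {
    void (*entry)(void *, void *) = fd->entry;
    void *toc = fd->toc;
    void *env = fd->env;
    entry(toc, env);
  }
}

// Loads treated as invariant: read once, then reused in the loop.
// This is only equivalent if nothing modifies the descriptor meanwhile.
void callWithHoistedLoads(const FunctionDescriptor *fd, int iterations) {
  void (*entry)(void *, void *) = fd->entry;
  void *toc = fd->toc;
  void *env = fd->env;
  for (int i = 0; i < iterations; ++i)
    entry(toc, env);
}

// Hypothetical callee for experiments: counts calls through env.
void countCall(void * /*toc*/, void *env) { ++*static_cast<int *>(env); }
```

Both functions make the same calls as long as the descriptor contents do not change during the loop, which is the assumption the invariant marking encodes and the reason the commit keeps an opt-out for code that builds descriptors at run time.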

Consider this simple test:
  $ cat call.c

  typedef void (*fp)();
  void bar(fp x) {
    for (int i = 0; i < 1600000000; ++i)
      x();
  }

  $ cat main.c

  typedef void (*fp)();
  void bar(fp x);
  void foo() {}
  int main() {
    bar(foo);
  }

On the PPC A2 (the BG/Q supercomputer), marking the function-descriptor loads
as invariant brings the execution time down to ~8 seconds from ~32 seconds with
the loads in the loop.

The difference on the POWER7 is smaller. Compiling with:

  gcc -std=c99 -O3 -mcpu=native call.c main.c : ~6 seconds [this is 4.8.2]

  clang -O3 -mcpu=native call.c main.c : ~5.3 seconds

  clang -O3 -mcpu=native call.c main.c -mno-invariant-function-descriptors : ~4 seconds
  (looks like we'd benefit from additional loop unrolling here, as a first
   guess, because this is faster with the extra loads)

The -mno-invariant-function-descriptors option will be added to Clang shortly.

llvm-svn: 226207

[Hexagon] Updating indexed load-extend patterns and changing test to new expected output. · f87697f0
Colin LeMahieu authored Jan 15, 2015
```
llvm-svn: 226206
```

Add a new pass "inductive range check elimination" · 7059e295

Sanjoy Das authored Jan 15, 2015

IRCE eliminates range checks of the form

  0 <= A * I + B < Length

by splitting a loop's iteration space into three segments in a way
that the check is completely redundant in the middle segment.  As an
example, IRCE will convert

  len = < known positive >
  for (i = 0; i < n; i++) {
    if (0 <= i && i < len) {
      do_something();
    } else {
      throw_out_of_bounds();
    }
  }

to

  len = < known positive >
  limit = smin(n, len)
  // no first segment
  for (i = 0; i < limit; i++) {
    if (0 <= i && i < len) { // this check is fully redundant
      do_something();
    } else {
      throw_out_of_bounds();
    }
  }
  for (i = limit; i < n; i++) {
    if (0 <= i && i < len) {
      do_something();
    } else {
      throw_out_of_bounds();
    }
  }


IRCE can deal with multiple range checks in the same loop (it takes
the intersection of the ranges that will make each of them redundant
individually).
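
The arithmetic behind "the intersection of the ranges" can be sketched as below. This is a hedged illustration, not the pass's implementation: it assumes a positive coefficient `A`, ignores integer overflow, and uses hypothetical names (`RangeCheck`, `Interval`, `ceilDiv`, `safeRange`, `middleSegment`). For each check it computes the half-open interval of `I` in which `0 <= A * I + B < Length` holds, then intersects those intervals with the loop's own range `[0, n)`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One range check of the form 0 <= a * i + b < length, with a > 0.
struct RangeCheck {
  long long a, b, length;
};

// Half-open interval [begin, end) of induction variable values.
struct Interval {
  long long begin, end;
};

// Ceiling division for a positive divisor. C++ division truncates toward
// zero, which is already the ceiling for negative numerators.
long long ceilDiv(long long n, long long d) {
  assert(d > 0);
  long long q = n / d;
  if (n % d != 0 && n > 0)
    ++q;
  return q;
}

// Values of i for which a single check is guaranteed to pass.
Interval safeRange(const RangeCheck &c) {
  assert(c.a > 0 && "sketch only handles a positive coefficient");
  return {ceilDiv(-c.b, c.a), ceilDiv(c.length - c.b, c.a)};
}

// Intersection of all safe ranges with the loop's range [0, n).
Interval middleSegment(const std::vector<RangeCheck> &checks, long long n) {
  Interval r{0, n};
  for (const RangeCheck &c : checks) {
    const Interval s = safeRange(c);
    r.begin = std::max(r.begin, s.begin);
    r.end = std::min(r.end, s.end);
  }
  if (r.end < r.begin)
    r.end = r.begin; // normalise an empty intersection
  return r;
}
```

For the checks `0 <= i < 10` and `0 <= 2 * i - 4 < 20` in a loop over `[0, 100)`, the individual safe ranges are `[0, 10)` and `[2, 12)`, so the middle segment is `[2, 10)`; iterations before 2 and from 10 onward would run in the segments that keep their checks.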

Currently IRCE does not do any profitability analysis.  That is a
TODO.

Please note that the status of this pass is *experimental*, and it is
not part of any default pass pipeline.  Having said that, I would love
to get feedback and general input from people interested in trying
this out.

Differential Revision: http://reviews.llvm.org/D6693

llvm-svn: 226201

Revert "r226086 - Revert "r226071 - [RegisterCoalescer] Remove copies to reserved registers"" · 5ef58eb8

Hal Finkel authored Jan 15, 2015

Reapply r226071 with fixes. Two fixes:

 1. We need to manually remove the old and create the new 'dead defs'
    associated with physical register definitions when we move the definition of
    the physical register from the copy point to the point of the original vreg def.

    This problem was picked up by the machine instruction verifier, and could trigger a
    verification failure on test/CodeGen/X86/2009-02-12-DebugInfoVLA.ll, so I've
    turned on the verifier in the tests.

 2. When moving the def point of the phys reg up, we need to make sure that it
    is neither defined nor read in between the two instructions. We don't, however,
    extend the live ranges of phys reg defs to cover uses, so just checking for
    live-range overlap between the pair interval and the phys reg aliases won't
    pick up reads. As a result, we manually iterate over the range and check for
    reads.

    A test soon to be committed to the PowerPC backend will test this change.

Original commit message:

[RegisterCoalescer] Remove copies to reserved registers

This allows the RegisterCoalescer to join "non-flipped" range pairs with a
physical destination register -- which allows the RegisterCoalescer to remove
copies like this:

<vreg> = something (maybe a load, for example)
... (things that don't use PHYSREG)
PHYSREG = COPY <vreg>

(with all of the restrictions normally applied by the RegisterCoalescer: having
compatible register classes, etc.)

Previously, the RegisterCoalescer handled only the opposite case (copying
*from* a physical register). I don't handle the problem fully here, but try to
get the common case where there is only one use of <vreg> (the COPY).

An upcoming commit to the PowerPC backend will make this pattern much more
common on PPC64/ELF systems.

llvm-svn: 226200

Style cleanup of old gc.root lowering code · 66c9fb0d

Philip Reames authored Jan 15, 2015

Use static functions for helpers rather than static member functions. This (a) changes the linkage (minor at best) and (b) makes it obvious that no object state is involved.
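
A small sketch of the two styles, using made-up names rather than the actual gc.root lowering code: `LoweringBefore`, `LoweringAfter`, `scale` and `scaleValue` are all hypothetical.

```cpp
// Before: the helper is a static member function. It has to be declared
// inside the class, so it is visible wherever the class definition is.
class LoweringBefore {
public:
  int run(int x) const { return scale(x) + 1; }

private:
  static int scale(int x) { return 2 * x; }
};

// After: the helper is a file-local static function. It has internal
// linkage and plainly cannot touch any object state.
static int scaleValue(int x) { return 2 * x; }

class LoweringAfter {
public:
  int run(int x) const { return scaleValue(x) + 1; }
};
```

Both classes compute the same result; the second form keeps the helper out of the class declaration and limits it to the translation unit that defines it.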

llvm-svn: 226198

R600/SI: Improve fpext / fptrunc test coverage · 59b09ab9
Matt Arsenault authored Jan 15, 2015
```
llvm-svn: 226197
```
clang-format GCStrategy.cpp & GCRootLowering.cpp (NFC) · b8714416
Philip Reames authored Jan 15, 2015
```
llvm-svn: 226196
```

Split GCStrategy.cpp into two files (NFC) · f27f3738

Philip Reames authored Jan 15, 2015

This is preparation for an update to http://reviews.llvm.org/D6811.  GCStrategy.cpp will hopefully be moving into IR/, whereas the lowering logic needs to stay in CodeGen/.

llvm-svn: 226195

[Hexagon] Removing old versions of vsplice, valign, cl0, ct0 and updating... · 538b8581
Colin LeMahieu authored Jan 15, 2015
```
[Hexagon] Removing old versions of vsplice, valign, cl0, ct0 and updating references to new versions.

llvm-svn: 226194
```

R600/SI: Unify VOP2 instructions which are VOP3-only on VI · f0b130ac

Marek Olsak authored Jan 15, 2015

This removes some duplicated classes and definitions.

These instructions are defined:
  _e32 // pseudo
  _e32_si
  _e64 // pseudo
  _e64_si
  _e64_vi

llvm-svn: 226191

R600/SI: Use 64-bit encoding by default for opcodes that are VOP3-only on VI · c5368505
Marek Olsak authored Jan 15, 2015
```
llvm-svn: 226190
```