Commits · a2e28156b4c5b1c35f2a242c07b2251fa9550b34 · Roger Ferrer / llvm-epi-0.8

Mar 22, 2013

R600: Use legacy (0 * anything = 0) MUL instructions for pow intrinsics · a2e28156

Michel Danzer authored Mar 22, 2013

Fixes wrong lighting in some corner cases with r600g and radeonsi, e.g.
manifested by failure of two piglit/glean tests and intermittent black
patches in many apps.

Tested on SI and RS880.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62012 [radeonsi]
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=58150 [r600g]

NOTE: This is a candidate for the Mesa stable branch.

Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 177730

a2e28156
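
To illustrate why the multiply flavour matters for pow, here is a minimal sketch. It assumes pow(x, y) is expanded as exp2(y * log2(x)); the commit message does not spell out the expansion, so that is an assumption. The helper names (legacy_mul, pow_ieee_mul, pow_legacy_mul) are hypothetical and only model the arithmetic on the host CPU, not the R600 instructions themselves.

```cpp
#include <cmath>

// Legacy multiply as described in the commit title: 0 * anything = 0,
// even when the other operand is infinite or NaN.
inline float legacy_mul(float a, float b) {
    if (a == 0.0f || b == 0.0f) return 0.0f;
    return a * b;
}

// Assumed expansion of pow using an IEEE multiply.
// For x = 0 and y = 0: log2(0) = -inf, and 0 * -inf = NaN.
inline float pow_ieee_mul(float x, float y) {
    return std::exp2(y * std::log2(x));
}

// Same assumed expansion using the legacy multiply.
// For x = 0 and y = 0: the product is forced to 0, so exp2(0) = 1.
inline float pow_legacy_mul(float x, float y) {
    return std::exp2(legacy_mul(y, std::log2(x)));
}
```

With an IEEE multiply, pow(0, 0) comes out as NaN in this model, which is the kind of value that could plausibly show up as a black patch in a lighting computation; with the legacy multiply it comes out as 1.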

[asan] Change the way we report the alloca frame on stack-buffer-overflow. · cdd35a90

Kostya Serebryany authored Mar 22, 2013

Before: the function name was stored by the compiler as a constant string
and the run-time was printing it.
Now: the PC is stored instead and the run-time prints the full symbolized frame.
This adds a couple of instructions into every function with a non-empty stack frame,
but also reduces the binary size because we store fewer strings (I saw a 2% size reduction).
This change bumps the asan ABI version to v3.

llvm part.

Example of report (now):
==31711==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffa77cf1c5 at pc 0x41feb0 bp 0x7fffa77cefb0 sp 0x7fffa77cefa8
READ of size 1 at 0x7fffa77cf1c5 thread T0
    #0 0x41feaf in Frame0(int, char*, char*, char*) stack-oob-frames.cc:20
    #1 0x41f7ff in Frame1(int, char*, char*) stack-oob-frames.cc:24
    #2 0x41f477 in Frame2(int, char*) stack-oob-frames.cc:28
    #3 0x41f194 in Frame3(int) stack-oob-frames.cc:32
    #4 0x41eee0 in main stack-oob-frames.cc:38
    #5 0x7f0c5566f76c (/lib/x86_64-linux-gnu/libc.so.6+0x2176c)
    #6 0x41eb1c (/usr/local/google/kcc/llvm_cmake/a.out+0x41eb1c)
Address 0x7fffa77cf1c5 is located in stack of thread T0 at offset 293 in frame
    #0 0x41f87f in Frame0(int, char*, char*, char*) stack-oob-frames.cc:12  <<<<<<<<<<<<<< this is new
  This frame has 6 object(s):
    [32, 36) 'frame.addr'
    [96, 104) 'a.addr'
    [160, 168) 'b.addr'
    [224, 232) 'c.addr'
    [288, 292) 's'
    [352, 360) 'd'

llvm-svn: 177724

cdd35a90

tsan: handle vptr loads specially · 55e63ef4

Dmitry Vyukov authored Mar 22, 2013

This is required to determine ctor/dtor vs virtual call races.
http://llvm-reviews.chandlerc.com/D566

llvm-svn: 177717

55e63ef4

Fix llvm::removeUnreachableBlocks to handle unreachable loops. · 2a066afc
Evgeniy Stepanov authored Mar 22, 2013
```
llvm-svn: 177713
```
2a066afc

InstCombine: Improve the result bitvect type when folding (cmp pred (load (gep... · f364bc63

Arnaud A. de Grandmaison authored Mar 22, 2013

InstCombine: Improve the result bitvect type when folding (cmp pred (load (gep GV, i)) C) to a bit test.

The original code used i32, and i64 if legal. This introduced unneeded
casts when they aren't legal, or when the index variable i has another
type. In order of preference: try to use i's type; use the smallest
fitting legal type (using an added DataLayout method); default to i32.
A testcase checks that this works when the index gep operand is i16.

Patch by: Ahmed Bougacha <ahmed.bougacha@gmail.com>
Reviewed by: Duncan

llvm-svn: 177712

f364bc63
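
The fold above can be pictured with a small sketch. The table kTable, the compared constant 2, and both function names are hypothetical; the point is only that a compare of a constant-table load can become a shift-and-mask on a precomputed constant, and that the constant can live in the index's own type (16 bits here, mirroring the i16 testcase) without extra casts.

```cpp
#include <cstdint>

// Hypothetical constant global indexed by a 16-bit value.
static const int kTable[9] = {2, 0, 2, 2, 0, 0, 2, 0, 2};

// Original form: load from the table, then compare against a constant.
inline bool lookup_compare(uint16_t i) {
    return kTable[i] == 2;
}

// Folded form: bit k of the mask is set iff kTable[k] == 2.
// Set bits are 0, 2, 3, 6 and 8, i.e. 1 + 4 + 8 + 64 + 256 = 0x14D.
// The mask has the same width as the index, so no widening cast is needed.
inline bool bit_test(uint16_t i) {
    const uint16_t mask = 0x14D;
    return ((mask >> i) & 1u) != 0;
}
```

Both functions agree for every valid index 0..8; the second one needs no memory access.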

Remove ScavengedRC from RegisterScavenging · 7dbe0f06

Hal Finkel authored Mar 22, 2013

ScavengedRC was a dead private variable (set, but not otherwise used). No
functionality change intended.

llvm-svn: 177708

7dbe0f06

Reorder the DIFile field in DILexicalBlock to become a prefix common with other DIScopes · f333dc95
David Blaikie authored Mar 22, 2013
```
llvm-svn: 177703
```
f333dc95

Revert r177543: Add timing of the IR parsing code with a new · 0a9875ab

Chandler Carruth authored Mar 22, 2013

-time-ir-parsing flag

This breaks the layering of the Support library. We can't add an
implementation side to IRReader because it refers directly to entities
only accessible as part of the IR, AsmParser, and BitcodeReader
libraries. It can only be used in a context where all of those libraries
will be available.

We'll need to find some other way to get this functionality, and
hopefully solve the long-standing layering problem of IRReader.h...

llvm-svn: 177695

0a9875ab

Fix the invalid opcode for Mips branch instructions in the assembler · 4f69a0f2

Jack Carter authored Mar 22, 2013

For Mips, a branch adds an 18-bit signed offset (the 16-bit
offset field shifted left 2 bits) to the address of the
instruction following the branch (not the branch itself),
in the branch delay slot, to form a PC-relative effective
target address.

Previously, the code generator did not perform the
shift of the immediate branch offset, which resulted
in a wrong instruction opcode. This patch fixes the issue.

Contributor: Vladimir Medic
llvm-svn: 177687

4f69a0f2
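
The offset arithmetic described above can be sketched as follows. The function names are hypothetical and this is not the assembler's code; it only shows the relation between the 16-bit field, the 18-bit byte offset, and the address of the delay-slot instruction, assuming 4-byte instructions and an in-range, word-aligned target.

```cpp
#include <cstdint>

// Compute the 16-bit offset field for a branch at branch_addr.
// The byte offset is measured from the instruction after the branch
// (branch_addr + 4) and stored divided by 4, i.e. without its two low bits.
inline uint16_t encode_branch_offset(uint32_t branch_addr, uint32_t target_addr) {
    int64_t byte_offset = static_cast<int64_t>(target_addr) -
                          (static_cast<int64_t>(branch_addr) + 4);
    return static_cast<uint16_t>(byte_offset / 4);
}

// Recover the target: sign-extend the 16-bit field, shift it left by 2
// (multiply by 4), and add it to the address of the delay-slot instruction.
inline uint32_t branch_target(uint32_t branch_addr, uint16_t field) {
    int32_t words = (static_cast<int32_t>(field) ^ 0x8000) - 0x8000;
    int64_t target = static_cast<int64_t>(branch_addr) + 4 +
                     static_cast<int64_t>(words) * 4;
    return static_cast<uint32_t>(target);
}
```

For a branch at 0x1000 targeting 0x1010, the byte offset from 0x1004 is 12, so the field holds 3; storing 12 unshifted is the kind of mistake the commit describes.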

This patch enables the Mips assembler to use symbols as offsets for instructions · 9e65aa35

Jack Carter authored Mar 22, 2013

This patch uses the generated instruction info tables to
identify memory (load/store) instructions.
After successful matching, and based on the operand type
and size, it emits additional instructions to the output.

Contributor: Vladimir Medic
llvm-svn: 177685

9e65aa35

Remove the G8RC_NOX0_and_GPRC_NOR0 PPC register class · f70c41ea

Hal Finkel authored Mar 21, 2013

As Jakob pointed out in his review of r177423, having a shared ZERO
register between the 32- and 64-bit register classes causes this
odd G8RC_NOX0_and_GPRC_NOR0 class to be created. As recommended,
this adds a ZERO8 register which differentiates the 32- and 64-bit
zeros.

No functionality change intended.

llvm-svn: 177683

f70c41ea

Always forward 'resume' instructions to the outer landing pad. · 173c71ff

Bill Wendling authored Mar 21, 2013

How did this ever work?

Basically, if you have a function that's inlined into the caller, it may not
have any 'call' instructions, but any 'resume' instructions it may have should
still be forwarded to the outer (caller's) landing pad. This requires that all
of the 'landingpad' instructions in the callee have their clauses merged with
the caller's outer 'landingpad' instruction (hence the bit of ugly code in the
`forwardResume' method).

Testcase in a follow-up commit to the test-suite repository.

<rdar://problem/13360379> & PR15555

llvm-svn: 177680

173c71ff

Fix a register-class comparison bug in PPCCTRLoops · 891671af

Hal Finkel authored Mar 21, 2013

Thanks to Jakob for isolating the underlying problem from the
test case in r177423. The original commit had introduced
asymmetric copy operations, but these turned out to be a work-around
to the real problem (the use of == instead of hasSubClassEq in PPCCTRLoops).

llvm-svn: 177679

891671af

Refactor the filename/directory information in DISubprogram to refer directly... · 5ef3fcb7
David Blaikie authored Mar 21, 2013
```
Refactor the filename/directory information in DISubprogram to refer directly to the pair rather than the DIFile.

llvm-svn: 177677
```
5ef3fcb7
Add a query to tell if a landing pad has a catch-all. · d254ab22
Bill Wendling authored Mar 21, 2013
```
llvm-svn: 177675
```
d254ab22

Mar 21, 2013

Move the DIFile in DISubprogram to the beginning to be a common prefix along with other DIScopes · 0d7d62e4
David Blaikie authored Mar 21, 2013
```
llvm-svn: 177674
```
0d7d62e4

<rdar://problem/13477190 > On Darwin, use DARWIN_USER_TEMP_DIR or... · a86ddf04

Douglas Gregor authored Mar 21, 2013

<rdar://problem/13477190> On Darwin, use DARWIN_USER_TEMP_DIR or DARWIN_USER_CACHE_DIR for the system temporary directory.

The DARWIN_USER_TEMP_DIR and DARWIN_USER_CACHE_DIR configuration
settings are more idiomatic for Darwin than the TMPDIR environment
variable.

llvm-svn: 177669

a86ddf04

This patch enables the Mips .set directive to define aliases · d76b2376

Jack Carter authored Mar 21, 2013

The .set directive in the Mips assembler can be
used to set the value of a symbol to an expression. 
This changes the symbol's value and type to conform 
to the expression's.

Syntax: .set symbol, expression

This patch implements the parsing of the above syntax 
and enables the parser to use defined symbols when 
parsing operands.

Contributor: Vladimir Medic
llvm-svn: 177667

d76b2376

Implement builtin_{setjmp/longjmp} on PPC · 756810fe

Hal Finkel authored Mar 21, 2013

This implements SJLJ lowering on PPC, making the Clang functions
__builtin_{setjmp/longjmp} functional on PPC platforms. The implementation
strategy is similar to that on X86, with the exception that a branch-and-link
variant is used to get the right jump address. Credit goes to Bill Schmidt for
suggesting the use of the unconditional bcl form (instead of the regular bl
instruction) to limit return-address-cache pollution.

Benchmarking the speed at -O3 of:

static jmp_buf env_sigill;

void foo() {
    __builtin_longjmp(env_sigill, 1);
}

main() {
    ...

    for (int i = 0; i < c; ++i) {
        if (__builtin_setjmp(env_sigill)) {
            goto done;
        } else {
            foo();
        }

done:;
    }

    ...
}

vs. the same code using the libc setjmp/longjmp functions on a P7 shows that
this builtin implementation is ~4x faster with Altivec enabled and ~7.25x
faster with Altivec disabled. This comparison is somewhat unfair because the
libc version must also save/restore the VSX registers which we don't yet
support.

llvm-svn: 177666

756810fe

Remove unused field in DISubprogram · cc8d0901
David Blaikie authored Mar 21, 2013
```
llvm-svn: 177661
```
cc8d0901

Add support for spilling VRSAVE on PPC · a1431df5

Hal Finkel authored Mar 21, 2013

Although there is only one Altivec VRSAVE register, it is a member of
a register class, and we need the ability to spill it. Because this
register is normally callee-preserved and handled by special code, this
has never before been necessary. However, this capability will be required by
a forthcoming commit adding SjLj support.

llvm-svn: 177654

a1431df5

Correct PPC FRAMEADDR lowering using a pseudo-register · aa03c03a

Hal Finkel authored Mar 21, 2013

The old code used to lower FRAMEADDR tried to replicate the logic in the real
frame-lowering code that determines whether or not the frame pointer (r31) will
be used. When it seemed as though the frame pointer would not be used, the
stack pointer (r1) was used instead. Unfortunately, because the stack size is
not yet known, this does not work. Instead, this change introduces new
always-reserved pseudo-registers (FP and FP8) that are replaced during prologue
insertion with the real frame-pointer register (either r1 or r31).

It is important that this intrinsic always return a valid frame address because
it is used by Clang to store the frame address as part of code generation for
__builtin_setjmp.

llvm-svn: 177653

aa03c03a

Avoid NEON SP-FP unless unsafe-math or Darwin · b4dd6c59

Renato Golin authored Mar 21, 2013

NEON is not IEEE 754 compliant, so we should avoid lowering single-precision
floating point operations with NEON unless unsafe-math is turned on. The
equivalent VFP instructions are IEEE 754 compliant, but in some cores they're
much slower, so some archs/OSs might still request it to be on by default,
such as Swift and Darwin.

llvm-svn: 177651

b4dd6c59

Hoist the definition of getTypeSizeInBits to be inlinable and in the · df973ed6

Chandler Carruth authored Mar 21, 2013

header.

This method is called in the hot path for *many* passes; SROA is what
caught my interest. A common pattern is that the branch of the switch
to be taken is known at the call site, so it is a very good
candidate for inlining and simplification. Moving it into the header
allows the optimizer to fold a lot of boring, repetitive code in
callers of this routine.

I'm seeing pretty significant speedups in parts of SROA and I suspect
other passes will see similar speedups if they end up working with type
sizes frequently. I've not seen any significant growth of the binaries
as a consequence, but let me know if you see anything suspicious here.

llvm-svn: 177632

df973ed6
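
A minimal sketch of the pattern, not LLVM's actual code: TypeKind, SimpleType, typeSizeInBits and floatPairBits are hypothetical names. It shows a size query written as a switch and defined inline in a header, so that a caller which already knows the kind lets the optimizer pick the branch at compile time.

```cpp
#include <cstdint>

// Hypothetical stand-ins for a type hierarchy; not the LLVM classes.
enum class TypeKind { Half, Float, Double, Integer };

struct SimpleType {
    TypeKind kind;
    unsigned int_bits;  // only meaningful when kind == TypeKind::Integer
};

// Defined inline (as it would be in a header) so callers can see the switch.
inline uint64_t typeSizeInBits(const SimpleType &t) {
    switch (t.kind) {
    case TypeKind::Half:    return 16;
    case TypeKind::Float:   return 32;
    case TypeKind::Double:  return 64;
    case TypeKind::Integer: return t.int_bits;
    }
    return 0;
}

// The kind is a known constant here, so after inlining the whole call
// can be folded down to the constant 64.
inline uint64_t floatPairBits() {
    SimpleType f{TypeKind::Float, 0};
    return 2 * typeSizeInBits(f);
}
```

If typeSizeInBits lived out of line in a .cpp file, the caller would have to make a real call and run the switch at run time (absent link-time optimization).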

[SROA] Prefix names using a custom IRBuilder inserter. · 34f0c7fc

Chandler Carruth authored Mar 21, 2013

The key part of this is ensuring that name prefixes remain in a Twine
form until we get to a point where we can nuke them under NDEBUG. This
is tricky using the old APIs as they played fast and loose with Twine,
which is prone to serious error. The inserter is much cleaner as it is
actually in the call stack leading to the setName call, and so has
a good opportunity to prepend the prefix.

This matters more than you might imagine because most runs over an
alloca find a single partition, and rewrite 3 or 4 instructions
referring to it. As a consequence doing this lazily and exclusively with
Twine allows the optimizer to delete more of it and shaves another 2% to
3% off of the release build's SROA run time for PR15412. I also think
the APIs are cleaner, and the use of Twine is more reliable, so
I consider it a win-win despite the churn required to reach this state.

llvm-svn: 177631

34f0c7fc

[msan] Add an option to disable poisoning of shadow for undef values. · a9a962ca
Evgeniy Stepanov authored Mar 21, 2013
```
llvm-svn: 177630
```
a9a962ca

simplify-libcalls: Removed unused variable · cf691565

Meador Inge authored Mar 21, 2013

The 'Modified' variable should have been removed from SimplifyLibCalls
in r177619, but was missed.  This commit removes it.

llvm-svn: 177622

cf691565

Fix missing std::. Not sure how this compiles for anyone else. · 4ab769f4
Matt Arsenault authored Mar 21, 2013
```
llvm-svn: 177620
```
4ab769f4

Move library call prototype attribute inference to functionattrs · 6b6a161c

Meador Inge authored Mar 21, 2013

The simplify-libcalls pass implemented a doInitialization hook to infer
function prototype attributes for well-known functions.  Given that the
simplify-libcalls pass is going away *and* that the functionattrs pass
is already in place to deduce function attributes, I am moving this logic
to the functionattrs pass.  This approach was discussed during patch
review:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20121126/157465.html.

llvm-svn: 177619

6b6a161c

Add a WriteMicrocoded for ancient microcoded instructions. · 5891cf97
Jakob Stoklund Olesen authored Mar 21, 2013
```
llvm-svn: 177611
```
5891cf97
Debug info: refactor the first field of DICompileUnit to be a raw file/directory pair · efb0d65e
David Blaikie authored Mar 20, 2013
```
This removes the DICompileUnit special case from DIScope.

llvm-svn: 177610
```
efb0d65e

Use pre-inc, pre-dec when possible. · 773be0ce

Jakub Staszak authored Mar 20, 2013

They are generally faster (or at least not slower) than post-inc, post-dec.

llvm-svn: 177608

773be0ce
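
A small sketch of why pre-increment can be cheaper for class types. The Iter type and the g_copies counter are hypothetical and exist only to make the extra copy visible; for plain integers a compiler typically generates the same code for both forms.

```cpp
// Hypothetical counter that records how many times an Iter is copied.
static int g_copies = 0;

struct Iter {
    int pos;
    explicit Iter(int p) : pos(p) {}
    Iter(const Iter &o) : pos(o.pos) { ++g_copies; }

    // Pre-increment: modify in place and return a reference, no copy.
    Iter &operator++() { ++pos; return *this; }

    // Post-increment: must keep the old value around, which costs a copy.
    Iter operator++(int) { Iter old(*this); ++pos; return old; }
};

// Number of Iter copies made by n pre-increments.
inline int copiesForPre(int n) {
    g_copies = 0;
    Iter it(0);
    for (int k = 0; k < n; ++k) ++it;
    return g_copies;
}

// Number of Iter copies made by n post-increments.
inline int copiesForPost(int n) {
    g_copies = 0;
    Iter it(0);
    for (int k = 0; k < n; ++k) it++;
    return g_copies;
}
```

The pre-increment loop makes no copies, while the post-increment loop makes at least one per iteration even though the returned value is discarded.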

Remove 'else' after 'return'. · fa41def6
Jakub Staszak authored Mar 20, 2013
```
llvm-svn: 177607
```
fa41def6
Make variable name more explicit and eliminate redundant lookup in SDNodeOrdering · 7478f3d7
Justin Holewinski authored Mar 20, 2013
```
llvm-svn: 177600
```
7478f3d7
Model prefetches and barriers as loads. · 712f6748
Jakob Stoklund Olesen authored Mar 20, 2013
```
It's not yet clear if these instructions need a more careful model.

llvm-svn: 177599
```
712f6748
Add a catch-all WriteSystem SchedWrite type. · 5b535c96
Jakob Stoklund Olesen authored Mar 20, 2013
```
This is used for all the expensive system instructions.

llvm-svn: 177598
```
5b535c96

Mar 20, 2013
- When computing the demanded bits of Load SDNodes, make sure that we are... · 4536d582
  Nadav Rotem authored Mar 20, 2013
```
When computing the demanded bits of Load SDNodes, make sure that we are looking at the loaded-value operand and not the ptr result (in case of pre-inc loads).
rdar://13348420

llvm-svn: 177596
```
  4536d582
- Debug Info: Swap the 2nd and 3rd parameters to DICompileUnit to match the common DIScope prefix · 3b88852a
  David Blaikie authored Mar 20, 2013
```
llvm-svn: 177595
```
  3b88852a
- Annotate the remaining SSE MOV instructions. · cd4ebb76
  Jakob Stoklund Olesen authored Mar 20, 2013
```
llvm-svn: 177592
```
  cd4ebb76
- Annotate SSE horizontal and integer instructions. · c6dc70d8
  Jakob Stoklund Olesen authored Mar 20, 2013
```
llvm-svn: 177591
```
  c6dc70d8