- Mar 22, 2013
-
-
Michel Danzer authored
Fixes wrong lighting in some corner cases with r600g and radeonsi, e.g. manifested by failure of two piglit/glean tests and intermittent black patches in many apps. Tested on SI and RS880. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62012 [radeonsi] Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=58150 [r600g] NOTE: This is a candidate for the Mesa stable branch. Reviewed-by:
Christian König <christian.koenig@amd.com> llvm-svn: 177730
-
Kostya Serebryany authored
Before: the function name was stored by the compiler as a constant string and the run-time was printing it. Now: the PC is stored instead and the run-time prints the full symbolized frame. This adds a couple of instructions into every function with a non-empty stack frame, but also reduces the binary size because we store fewer strings (I saw a 2% size reduction). This change bumps the asan ABI version to v3. llvm part. Example of report (now):
==31711==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffa77cf1c5 at pc 0x41feb0 bp 0x7fffa77cefb0 sp 0x7fffa77cefa8
READ of size 1 at 0x7fffa77cf1c5 thread T0
    #0 0x41feaf in Frame0(int, char*, char*, char*) stack-oob-frames.cc:20
    #1 0x41f7ff in Frame1(int, char*, char*) stack-oob-frames.cc:24
    #2 0x41f477 in Frame2(int, char*) stack-oob-frames.cc:28
    #3 0x41f194 in Frame3(int) stack-oob-frames.cc:32
    #4 0x41eee0 in main stack-oob-frames.cc:38
    #5 0x7f0c5566f76c (/lib/x86_64-linux-gnu/libc.so.6+0x2176c)
    #6 0x41eb1c (/usr/local/google/kcc/llvm_cmake/a.out+0x41eb1c)
Address 0x7fffa77cf1c5 is located in stack of thread T0 at offset 293 in frame
    #0 0x41f87f in Frame0(int, char*, char*, char*) stack-oob-frames.cc:12 <<<<<<<<<<<<<< this is new
  This frame has 6 object(s):
    [32, 36) 'frame.addr'
    [96, 104) 'a.addr'
    [160, 168) 'b.addr'
    [224, 232) 'c.addr'
    [288, 292) 's'
    [352, 360) 'd'
llvm-svn: 177724
-
Dmitry Vyukov authored
This is required to determine ctor/dtor vs virtual call races. http://llvm-reviews.chandlerc.com/D566 llvm-svn: 177717
-
Evgeniy Stepanov authored
llvm-svn: 177713
-
Arnaud A. de Grandmaison authored
InstCombine: Improve the result bitvect type when folding (cmp pred (load (gep GV, i)) C) to a bit test. The original code used i32, and i64 if legal. This introduced unneeded casts when they aren't legal, or when the index variable i has another type. In order of preference: try to use i's type; use the smallest fitting legal type (using an added DataLayout method); default to i32. A testcase checks that this works when the index gep operand is i16. Patch by : Ahmed Bougacha <ahmed.bougacha@gmail.com> Reviewed by : Duncan llvm-svn: 177712
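The fold described above can be pictured with a small stand-alone sketch. It is not the InstCombine code: the table `kTable`, the constant 7 and both helper names are hypothetical, chosen only to show how a compare of a loaded table entry becomes a bit test, and how the mask can share the 16-bit type of the index so that no extra cast is needed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constant table standing in for the global GV.
constexpr int kTable[8] = {3, 7, 3, 1, 7, 7, 0, 3};

// Original form: load kTable[i] and compare the result against the constant 7.
constexpr bool lookupIsSeven(uint16_t i) { return kTable[i] == 7; }

// Folded form: bit k of the mask is set exactly when kTable[k] == 7.
// Entries 1, 4 and 5 hold 7, so the mask is 0b00110010 = 0x32.
// The mask uses the index's own 16-bit type (an assumption mirroring the
// "try to use i's type" preference in the commit message).
constexpr uint16_t kMask = 0x32;
constexpr bool bitTestIsSeven(uint16_t i) { return ((kMask >> i) & 1) != 0; }

// Compile-time spot checks; both forms are only defined for i < 8.
static_assert(lookupIsSeven(1) && bitTestIsSeven(1), "entry 1 is 7");
static_assert(!lookupIsSeven(0) && !bitTestIsSeven(0), "entry 0 is not 7");
```

Both functions agree for every valid index, which is the property the fold relies on.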
-
Hal Finkel authored
ScavengedRC was a dead private variable (set, but not otherwise used). No functionality change intended. llvm-svn: 177708
-
David Blaikie authored
llvm-svn: 177703
-
Chandler Carruth authored
-time-ir-parsing flag This breaks the layering of the Support library. We can't add an implementation side to IRReader because it refers directly to entities only accessible as part of the IR, AsmParser, and BitcodeReader libraries. It can only be used in a context where all of those libraries will be available. We'll need to find some other way to get this functionality, and hopefully solve the long-standing layering problem of IRReader.h... llvm-svn: 177695
-
Jack Carter authored
For a Mips branch, an 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address. Previously, the code generator did not perform the shift of the immediate branch offset, which resulted in a wrong instruction opcode. This patch fixes the issue. Contributor: Vladimir Medic llvm-svn: 177687
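The offset arithmetic in this message can be sketched as follows. This is an illustration of the address computation only, not the code generator change itself; the helper names `signExtend16`, `branchTarget` and `encodeOffset` are hypothetical, and 32-bit addresses with 4-byte instructions are assumed.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the 16-bit offset field to 32 bits without relying on
// implementation-defined narrowing conversions.
constexpr int32_t signExtend16(uint16_t field) {
  return (static_cast<int32_t>(field) ^ 0x8000) - 0x8000;
}

// Decode: the field is shifted left by 2 (multiplied by 4) and added to the
// address of the instruction after the branch, i.e. branchAddr + 4.
constexpr uint32_t branchTarget(uint32_t branchAddr, uint16_t offsetField) {
  return branchAddr + 4u + static_cast<uint32_t>(signExtend16(offsetField) * 4);
}

// Encode: the byte distance from the delay slot is shifted right by 2 before
// it is stored in the 16-bit field. Skipping this shift is the kind of
// mistake the commit message describes.
constexpr uint16_t encodeOffset(uint32_t branchAddr, uint32_t target) {
  return static_cast<uint16_t>(((target - (branchAddr + 4u)) >> 2) & 0xFFFFu);
}

static_assert(branchTarget(0x1000, 0x0003) == 0x1010, "forward branch");
static_assert(branchTarget(0x1000, 0xFFFE) == 0x0FFC, "backward branch");
```

Encoding a target and decoding the resulting field returns the original target as long as the distance fits in the 18-bit signed range and is a multiple of 4.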
-
Jack Carter authored
This patch uses the generated instruction info tables to identify memory/load store instructions. After successful matching and based on the operand type and size, it generates additional instructions to the output. Contributor: Vladimir Medic llvm-svn: 177685
-
Hal Finkel authored
As Jakob pointed out in his review of r177423, having a shared ZERO register between the 32- and 64-bit register classes causes this odd G8RC_NOX0_and_GPRC_NOR0 class to be created. As recommended, this adds a ZERO8 register which differentiates the 32- and 64-bit zeros. No functionality change intended. llvm-svn: 177683
-
Bill Wendling authored
How did this ever work? Basically, if you have a function that's inlined into the caller, it may not have any 'call' instructions, but any 'resume' instructions it may have should still be forwarded to the outer (caller's) landing pad. This requires that all of the 'landingpad' instructions in the callee have their clauses merged with the caller's outer 'landingpad' instruction (hence the bit of ugly code in the `forwardResume' method). Testcase in a follow-up commit to the test-suite repository. <rdar://problem/13360379> & PR15555 llvm-svn: 177680
-
Hal Finkel authored
Thanks to Jakob for isolating the underlying problem from the test case in r177423. The original commit had introduced asymmetric copy operations, but these turned out to be a work-around to the real problem (the use of == instead of hasSubClassEq in PPCCTRLoops). llvm-svn: 177679
-
David Blaikie authored
Refactor the filename/directory information in DISubprogram to refer directly to the pair rather than the DIFile. llvm-svn: 177677
-
Bill Wendling authored
llvm-svn: 177675
-
- Mar 21, 2013
-
-
David Blaikie authored
llvm-svn: 177674
-
Douglas Gregor authored
<rdar://problem/13477190> On Darwin, use DARWIN_USER_TEMP_DIR or DARWIN_USER_CACHE_DIR for the system temporary directory. The DARWIN_USER_TEMP_DIR and DARWIN_USER_CACHE_DIR configuration settings are more idiomatic for Darwin than the TMPDIR environment variable. llvm-svn: 177669
-
Jack Carter authored
The .set directive in the Mips assembler can be used to set the value of a symbol to an expression. This changes the symbol's value and type to conform to the expression's. Syntax: .set symbol, expression This patch implements the parsing of the above syntax and enables the parser to use defined symbols when parsing operands. Contributor: Vladimir Medic llvm-svn: 177667
-
Hal Finkel authored
This implements SJLJ lowering on PPC, making the Clang functions __builtin_{setjmp/longjmp} functional on PPC platforms. The implementation strategy is similar to that on X86, with the exception that a branch-and-link variant is used to get the right jump address. Credit goes to Bill Schmidt for suggesting the use of the unconditional bcl form (instead of the regular bl instruction) to limit return-address-cache pollution. Benchmarking the speed at -O3 of:
static jmp_buf env_sigill;
void foo() {
  __builtin_longjmp(env_sigill,1);
}
main() {
  ...
  for (int i = 0; i < c; ++i) {
    if (__builtin_setjmp(env_sigill)) {
      goto done;
    } else {
      foo();
    }
done:;
  }
  ...
}
vs. the same code using the libc setjmp/longjmp functions on a P7 shows that this builtin implementation is ~4x faster with Altivec enabled and ~7.25x faster with Altivec disabled. This comparison is somewhat unfair because the libc version must also save/restore the VSX registers which we don't yet support. llvm-svn: 177666
-
David Blaikie authored
llvm-svn: 177661
-
Hal Finkel authored
Although there is only one Altivec VRSAVE register, it is a member of a register class, and we need the ability to spill it. Because this register is normally callee-preserved and handled by special code this has never before been necessary. However, this capability will be required by a forthcoming commit adding SjLj support. llvm-svn: 177654
-
Hal Finkel authored
The old code used to lower FRAMEADDR tried to replicate the logic in the real frame-lowering code that determines whether or not the frame pointer (r31) will be used. When it seemed as though the frame pointer would not be used, the stack pointer (r1) was used instead. Unfortunately, because the stack size is not yet known, this does not work. Instead, this change introduces new always-reserved pseudo-registers (FP and FP8) that are replaced during prologue insertion with the real frame-pointer register (either r1 or r31). It is important that this intrinsic always return a valid frame address because it is used by Clang to store the frame address as part of code generation for __builtin_setjmp. llvm-svn: 177653
-
Renato Golin authored
NEON is not IEEE 754 compliant, so we should avoid lowering single-precision floating point operations with NEON unless unsafe-math is turned on. The equivalent VFP instructions are IEEE 754 compliant, but in some cores they're much slower, so some archs/OSs might still request it to be on by default, such as Swift and Darwin. llvm-svn: 177651
-
Chandler Carruth authored
header. This method is called in the hot path for *many* passes, SROA is what caught my interest. A common pattern is that which branch of the switch should be taken is known in the callsite and so it is a very good candidate for inlining and simplification. Moving it into the header allows the optimizer to fold a lot of boring, repetitive code in callers of this routine. I'm seeing pretty significant speedups in parts of SROA and I suspect other passes will see similar speedups if they end up working with type sizes frequently. I've not seen any significant growth of the binaries as a consequence, but let me know if you see anything suspicious here. llvm-svn: 177632
-
Chandler Carruth authored
The key part of this is ensuring that name prefixes remain in a Twine form until we get to a point where we can nuke them under NDEBUG. This is tricky using the old APIs as they played fast and loose with Twine, which is prone to serious error. The inserter is much cleaner as it is actually in the call stack leading to the setName call, and so has a good opportunity to prepend the prefix. This matters more than you might imagine because most runs over an alloca find a single partition, and rewrite 3 or 4 instructions referring to it. As a consequence doing this lazily and exclusively with Twine allows the optimizer to delete more of it and shaves another 2% to 3% off of the release build's SROA run time for PR15412. I also think the APIs are cleaner, and the use of Twine is more reliable, so I consider it a win-win despite the churn required to reach this state. llvm-svn: 177631
-
Evgeniy Stepanov authored
llvm-svn: 177630
-
Meador Inge authored
The 'Modified' variable should have been removed from SimplifyLibCalls in r177619, but was missed. This commit removes it. llvm-svn: 177622
-
Matt Arsenault authored
llvm-svn: 177620
-
Meador Inge authored
The simplify-libcalls pass implemented a doInitialization hook to infer function prototype attributes for well-known functions. Given that the simplify-libcalls pass is going away *and* that the functionattrs pass is already in place to deduce function attributes, I am moving this logic to the functionattrs pass. This approach was discussed during patch review: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20121126/157465.html. llvm-svn: 177619
-
Jakob Stoklund Olesen authored
llvm-svn: 177611
-
David Blaikie authored
This removes the DICompileUnit special case from DIScope. llvm-svn: 177610
-
Jakub Staszak authored
They are generally faster (at least not slower) than post-inc, post-dec. llvm-svn: 177608
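A minimal sketch of why the prefix forms are never slower, assuming the comparison is between prefix and postfix increment on a class type. `CountingIter` and both counting helpers are hypothetical and only make the extra copy visible; for built-in integers compilers typically generate the same code for both forms.

```cpp
#include <cassert>

// Hypothetical iterator-like type that counts how many copies are made.
struct CountingIter {
  int pos = 0;
  static int copies;

  CountingIter() = default;
  CountingIter(const CountingIter &other) : pos(other.pos) { ++copies; }
  CountingIter &operator=(const CountingIter &) = default;

  // Prefix form: advance in place and return a reference, no copy.
  CountingIter &operator++() {
    ++pos;
    return *this;
  }

  // Postfix form: must keep the old value, so at least one copy is made.
  CountingIter operator++(int) {
    CountingIter old(*this);
    ++pos;
    return old;
  }
};

int CountingIter::copies = 0;

// Number of copies made by n prefix increments.
int copiesAfterPreInc(int n) {
  CountingIter::copies = 0;
  CountingIter it;
  for (int i = 0; i < n; ++i)
    ++it;
  return CountingIter::copies;
}

// Number of copies made by n postfix increments.
int copiesAfterPostInc(int n) {
  CountingIter::copies = 0;
  CountingIter it;
  for (int i = 0; i < n; ++i)
    it++;
  return CountingIter::copies;
}
```

The postfix count is at least one copy per increment; whether the returned temporary adds a second copy depends on copy elision, so the sketch does not claim an exact number.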
-
Jakub Staszak authored
llvm-svn: 177607
-
Justin Holewinski authored
llvm-svn: 177600
-
Jakob Stoklund Olesen authored
It's not yet clear if these instructions need a more careful model. llvm-svn: 177599
-
Jakob Stoklund Olesen authored
This is used for all the expensive system instructions. llvm-svn: 177598
-
- Mar 20, 2013
-
-
Nadav Rotem authored
When computing the demanded bits of Load SDNodes, make sure that we are looking at the loaded-value operand and not the ptr result (in case of pre-inc loads). rdar://13348420 llvm-svn: 177596
-
David Blaikie authored
llvm-svn: 177595
-
Jakob Stoklund Olesen authored
llvm-svn: 177592
-
Jakob Stoklund Olesen authored
llvm-svn: 177591
-