- Dec 11, 2012
-
-
Evan Cheng authored
1. Teach it to use overlapping unaligned load / store to copy / set the trailing bytes. e.g. On 86, use two pairs of movups / movaps for 17 - 31 byte copies. 2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is. e.g. x86 and ARM. 3. When memcpy from a constant string, do *not* replace the load with a constant if it's not possible to materialize an integer immediate with a single instruction (required a new target hook: TLI.isIntImmLegal()). 4. Use unaligned load / stores more aggressively if target hooks indicates they are "fast". 5. Update ARM target hooks to use unaligned load / stores. e.g. vld1.8 / vst1.8. Also increase the threshold to something reasonable (8 for memset, 4 pairs for memcpy). This significantly improves Dhrystone, up to 50% on ARM iOS devices. rdar://12760078 llvm-svn: 169791
-
Arnold Schwaighofer authored
Analyse Phis under the starting assumption that they are NoAlias. Recursively look at their inputs. If they MayAlias/MustAlias there must be an input that makes them so. Addresses bug 14351. llvm-svn: 169788
-
- Dec 10, 2012
-
-
Lang Hames authored
InitSections is called before the MCContext is initialized it could cause duplicate temporary symbols to be emitted later (after context initialization resets the temporary label counter). llvm-svn: 169785
-
Eric Christopher authored
llvm-svn: 169780
-
Eric Christopher authored
llvm-svn: 169779
-
Eric Christopher authored
llvm-svn: 169776
-
Nadav Rotem authored
llvm-svn: 169774
-
Tom Stellard authored
llvm-svn: 169773
-
Tom Stellard authored
llvm-svn: 169772
-
Nadav Rotem authored
llvm-svn: 169771
-
Eli Bendersky authored
llvm-svn: 169762
-
Akira Hatanaka authored
getMipsRegisterNumbering and use MCRegisterInfo::getEncodingValue instead. llvm-svn: 169760
-
Eric Christopher authored
going on and makes a lot of the terminology in comments make more sense. llvm-svn: 169758
-
Eric Christopher authored
llvm-svn: 169757
-
Eric Christopher authored
llvm-svn: 169756
-
Bill Wendling authored
The `-mno-red-zone' flag wasn't being propagated to the functions that code coverage generates. This allowed some of them to use the red zone when that wasn't allowed. <rdar://problem/12843084> llvm-svn: 169754
-
Nadav Rotem authored
while (i--) sum+=A[i]; llvm-svn: 169752
-
Eli Bendersky authored
the assembler. This is useful in order to know how the numbers add up, since in particular the Align fragments account for a non-trivial portion of the emitted fragments (especially on -O0 which sets relax-all). llvm-svn: 169747
-
Hal Finkel authored
misched used GetUnderlyingObject in order to break false load/store dependencies, and the -enable-aa-sched-mi feature similarly relied on GetUnderlyingObject in order to ensure it is safe to use the aliasing analysis. Unfortunately, GetUnderlyingObject does not recurse through phi nodes, and so (especially due to LSR) all of these mechanisms failed for induction-variable-dependent loads and stores inside loops. This change replaces uses of GetUnderlyingObject with GetUnderlyingObjects (which will recurse through phi and select instructions) in misched. Andy reviewed, tested and simplified this patch; Thanks! llvm-svn: 169744
-
Chandler Carruth authored
Accidental commit... git svn betrayed me. Sorry for the noise. llvm-svn: 169741
-
Chandler Carruth authored
Summary: Not all chips targeted by x86_64 have this feature, but a dramatically increasing number do. Specifying a chip-specific tuning parameter will continue to turn the feature on or off as appropriate for that particular chip, but the generic flag should try to achieve the best performance on the most widely available hardware. Today, the number of chips with fast UA access dwarfs those without in the x86-64 space. Note that this also brings LLVM's code generation for this '-march' flag more in line with that of modern GCCs. CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D195 llvm-svn: 169740
-
Chandler Carruth authored
Thanks to the PaX folks for noticing in review! We need some tests here, any sugestions welcome... llvm-svn: 169739
-
Chandler Carruth authored
Intel chips. The model number rules were determined by inspecting Intel's documentation for their newer chip model numbers. My understanding is that all of the newer Intel chips have fast unaligned memory access, but if anyone is concerned about a particular chip, just shout. No tests updated; it's not clear we have dedicated tests for the chips' various features, but if anyone would like tests (or can point me at some existing ones), I'm happy to oblige. llvm-svn: 169730
-
Chandler Carruth authored
This visitor provides infrastructure for recursively traversing the use-graph of a pointer-producing instruction like an alloca or a malloc. It maintains a worklist of uses to visit, so it can handle very deep recursions. It automatically looks through instructions which simply translate one pointer to another (bitcasts and GEPs). It tracks the offset relative to the original pointer as long as that offset remains constant and exposes it during the visit as an APInt offset. Finally, it performs conservative escape analysis. However, currently it has some limitations that should be addressed going forward: 1) It doesn't handle vectors of pointers. 2) It doesn't provide a cheaper visitor when the constant offset tracking isn't needed. 3) It doesn't support non-instruction pointer values. The current functionality is exactly what is required to implement the SROA pointer-use visitors in terms of this one, rather than in terms of their own ad-hoc base visitor, which was always very poorly specified. SROA has been converted to use this, and the code there deleted which this utility now provides. Technically speaking, using this new visitor allows SROA to handle a few more cases than it previously did. It is now more aggressive in ignoring chains of instructions which look like they would defeat SROA, but in fact do not because they never result in a read or write of memory. While this is "neat", it shouldn't be interesting for real programs as any such chains should have been removed by others passes long before we get to SROA. As a consequence, I've not added any tests for these features -- it shouldn't be part of SROA's contract to perform such heroics. The goal is to extend the functionality of this visitor going forward, and re-use it from passes like ASan that can benefit from doing a detailed walk of the uses of a pointer. Thanks to Ben Kramer for the code review rounds and lots of help reviewing and debugging this patch. llvm-svn: 169728
-
Craig Topper authored
llvm-svn: 169727
-
NAKAMURA Takumi authored
llvm-svn: 169724
-
Chandler Carruth authored
When SROA was evaluating a mixture of i1 and i8 loads and stores, in just a particular case, it would tickle a latent bug where we compared bits to bytes rather than bits to bits. As a consequence of the latent bug, we would allow integers through which were not byte-size multiples, a situation the later rewriting code was never intended to handle. In release builds this could trigger all manner of oddities, but the reported issue in PR14548 was forming invalid bitcast instructions. The only downside of this fix is that it makes it more clear that SROA in its current form is not capable of handling mixed i1 and i8 loads and stores. Sometimes with the previous code this would work by luck, but usually it would crash, so I'm not terribly worried. I'll watch the LNT numbers just to be sure. llvm-svn: 169719
-
- Dec 09, 2012
-
-
Michael Ilseman authored
llvm-svn: 169712
-
Paul Redmond authored
- added function to VectorTargetTransformInfo to query cost of intrinsics - vectorize trivially vectorizable intrinsic calls such as sin, cos, log, etc. Reviewed by: Nadav llvm-svn: 169711
-
Michael Ilseman authored
llvm-svn: 169710
-
Paul Redmond authored
llvm-svn: 169709
-
Jakub Staszak authored
No functionality change. llvm-svn: 169703
-
Jakub Staszak authored
llvm-svn: 169701
-
Chandler Carruth authored
This will more closely match the behavior of the new PtrUseVisitor that I am adding. Hopefully this will not change the actual behavior in any way, but by making the processing order more similar help in debugging. llvm-svn: 169697
-
Craig Topper authored
llvm-svn: 169692
-
Shuxin Yang authored
- fix a bug which cause sigfault. - add two testing cases which was causing crash llvm-svn: 169687
-
- Dec 08, 2012
-
-
Craig Topper authored
Teach DAG combine to handle vector logical operations with vectors of all 1s or all 0s. These cases can show up when vectors are split for legalizing. Fix some tests that were dependent on these cases not being combined. llvm-svn: 169684
-
Chandler Carruth authored
There are still bugs in this pass, as well as other issues that are being worked on, but the bugs are crashers that occur pretty easily in the wild. Test cases have been sent to the original commit's review thread. This reverts the commits: r169671: Fix a logic error. r169604: Move the popcnt tests to an X86 subdirectory. r168931: Initial commit adding the pass. llvm-svn: 169683
-
Benjamin Kramer authored
llvm-svn: 169676
-
Shuxin Yang authored
llvm-svn: 169671
-