- Mar 15, 2013
-
-
Eric Christopher authored
issues. llvm-svn: 177136
-
Eric Christopher authored
llvm-svn: 177135
-
Nadav Rotem authored
llvm-svn: 177131
-
Nadav Rotem authored
llvm-svn: 177130
-
David Blaikie authored
(these were added in r177089) llvm-svn: 177129
-
Akira Hatanaka authored
llvm-svn: 177128
-
- Mar 14, 2013
-
-
Andrew Trick authored
This is the other half of r177122 that I meant to commit at the same time. llvm-svn: 177123
-
Jakob Stoklund Olesen authored
The new InstrSchedModel is easier to use than the instruction itineraries. It will be used to model instruction latency and throughput in modern Intel microarchitectures like Sandy Bridge. InstrSchedModel should be able to coexist with instruction itinerary classes, but for cleanliness we should switch the Atom processor model to the new InstrSchedModel as well. llvm-svn: 177122
-
Reed Kotler authored
See the Mips16ISetLowering.cpp patch to see a use of this. For now the extra code in Mips16ISetLowering.cpp is a nop but is used for test purposes. Mips32 registers are set up and then removed, and then the Mips16 registers are set up. Normally you need to add register classes and then call computeRegisterProperties. llvm-svn: 177120
-
Arnold Schwaighofer authored
Also remove some unneeded function attributes. llvm-svn: 177114
-
Chad Rosier authored
the win64 calling convention. rdar://13423768 llvm-svn: 177113
-
Andrew Trick authored
This allows arbitrary groups of processor resources. Using something in a subset automatically counts against the superset. Currently, this only works if the superset is also a ProcResGroup as opposed to a SuperUnit. This allows SandyBridge to be expressed naturally, which will be checked in shortly.
def SBPort01 : ProcResGroup<[SBPort0, SBPort1]>;
def SBPort15 : ProcResGroup<[SBPort1, SBPort5]>;
def SBPort23 : ProcResGroup<[SBPort2, SBPort3]>;
def SBPort015 : ProcResGroup<[SBPort0, SBPort1, SBPort5]>;
llvm-svn: 177112
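As a rough illustration of the subset/superset accounting described in this entry, the sketch below models each ProcResGroup as a set of units and lists the groups that a given use would be charged against. It is a toy model in Python, not the TableGen backend's implementation. `GROUPS` and `charged_groups` are hypothetical names, and treating a bare unit as a one-element subset is an assumption.

```python
# Toy model: each group is the set of processor resource units it contains.
# The group names and members are copied from the defs in this entry.
GROUPS = {
    "SBPort01": {"SBPort0", "SBPort1"},
    "SBPort15": {"SBPort1", "SBPort5"},
    "SBPort23": {"SBPort2", "SBPort3"},
    "SBPort015": {"SBPort0", "SBPort1", "SBPort5"},
}


def charged_groups(resource):
    """Return the sorted names of every group that a use of `resource` counts against.

    `resource` may be a group name or a single unit name. A single unit is
    treated as a one-element set (an assumption made for this sketch).
    """
    units = GROUPS.get(resource, {resource})
    # A use counts against every group whose members are a superset of `units`.
    return sorted(name for name, members in GROUPS.items() if units <= members)
```

Under this model, a use of SBPort01 is charged to SBPort01 and also to SBPort015, because {SBPort0, SBPort1} is a subset of {SBPort0, SBPort1, SBPort5}.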
-
Hal Finkel authored
This is a generic function (derived from PEI); moving it into MachineFrameInfo eliminates a current redundancy between the ARM and AArch64 backends, and will allow it to be used by the PowerPC target code. No functionality change intended. llvm-svn: 177111
-
Hal Finkel authored
Add the current PEI register scavenger as a parameter to the processFunctionBeforeFrameFinalized callback. This change is necessary in order to allow the PowerPC target code to set the register scavenger frame index after the save-area offset adjustments performed by processFunctionBeforeFrameFinalized. Only after these adjustments have been made is it possible to estimate the size of the stack frame. llvm-svn: 177108
-
Hal Finkel authored
Make requiresFrameIndexScavenging return true, and create virtual registers in the spilling code instead of using the register scavenger directly. This makes the target-level code simpler, and importantly, delays the scavenging until after callee-saved register processing (which will be important for later changes). Also cleans up trackLivenessAfterRegAlloc (makes it inline in the header with the other related functions). This makes it clear that it always returns true. No functionality change intended. llvm-svn: 177107
-
Hal Finkel authored
We used to add a spill slot for the register scavenger whenever the function has a frame pointer. This is unnecessarily conservative: We may need the spill slot for dynamic stack allocations, and functions with dynamic stack allocations always have a FP, but we might also have a FP for other reasons (such as the user explicitly disabling frame-pointer elimination), and we don't necessarily need a spill slot for those functions. The structsinregs test needed adjustment because it disables FP elimination. llvm-svn: 177106
-
Arnold Schwaighofer authored
By terrible I mean we store/load from the stack. This matters on PAQp8 in _Z5trainPsS_ii (which is inlined into Mixer::update) where we decide to vectorize a loop with a VF of 8 resulting in a 25% degradation on a cortex-a8.
LV: Found an estimated cost of 2 for VF 8 For instruction: icmp slt i32
LV: Found an estimated cost of 2 for VF 8 For instruction: select i1, i32, i32
The bug that tracks the CodeGen part is PR14868. radar://13403975 llvm-svn: 177105
-
Akira Hatanaka authored
No functionality changes. llvm-svn: 177104
-
Jyotsna Verma authored
We are warning the user about the alignment, so we should not assert. llvm-svn: 177103
-
Arnold Schwaighofer authored
llvm-svn: 177102
-
Akira Hatanaka authored
Patch by Stephen Hines. llvm-svn: 177101
-
Arnold Schwaighofer authored
We generate a select with a vectorized condition argument when the condition is NOT loop invariant. Not the other way around. llvm-svn: 177098
-
Akira Hatanaka authored
llvm-svn: 177096
-
Akira Hatanaka authored
No intended functionality changes. llvm-svn: 177095
-
Hal Finkel authored
I don't think that it is otherwise clear how the overlapping offsets are processed into distinct spill slots. Comment that this is done in processFunctionBeforeFrameFinalized. llvm-svn: 177094
-
Akira Hatanaka authored
llvm-svn: 177092
-
Akira Hatanaka authored
llvm-svn: 177090
-
Akira Hatanaka authored
mips16 and MipsSEDAGToDAGISel is for mips32/64. No functionality changes. llvm-svn: 177089
-
Shuxin Yang authored
Rules include:
1) x*y +/- x*z => x*(y +/- z) (the order of operands doesn't matter)
2) y/x +/- z/x => (y +/- z)/x
The transformation is disabled if the new add/sub expr "y +/- z" is a denormal/NaN/infinity. rdar://12911472 llvm-svn: 177088
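The guard on the new add/sub expression in this entry can be sketched as follows. This is a Python illustration using double precision, not the code from the commit. `can_factor` is a hypothetical helper, it assumes y and z are constants known when the transformation is considered, and accepting an exact zero result is an assumption the entry does not address.

```python
import math
import sys


def can_factor(y, z, subtract=False):
    """Decide whether x*y +/- x*z may be rewritten as x*(y +/- z).

    Mirrors the stated rule: refuse when the new operand y +/- z is a
    NaN, an infinity, or a denormal value.
    """
    inner = y - z if subtract else y + z
    if math.isnan(inner) or math.isinf(inner):
        return False
    # Nonzero magnitudes below the smallest normal double are denormal.
    if inner != 0.0 and abs(inner) < sys.float_info.min:
        return False
    return True
```

For example, `can_factor(3.0, 4.0)` allows the rewrite, while `can_factor(1e308, 1e308)` refuses it because the sum overflows to infinity.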
-
Adrian Prantl authored
This is the backend part of a CFE test with the same name. llvm-svn: 177087
-
Vincent Lejeune authored
llvm-svn: 177078
-
Alexey Samsonov authored
llvm-svn: 177063
-
Chandler Carruth authored
The fundamental problem is that SROA didn't allow for overly wide loads where the bits past the end of the alloca were masked away and the load was sufficiently aligned to ensure there is no risk of page fault, or other trapping behavior. With such widened loads, SROA would delete the load entirely rather than clamping it to the size of the alloca in order to allow mem2reg to fire. This was exposed by a test case that neatly arranged for GVN to run first, widening certain loads, followed by an inline step, and then SROA which miscompiles the code. However, I see no reason why this hasn't been plaguing us in other contexts. It seems deeply broken. Diagnosing all of the above took all of 10 minutes of debugging. The really annoying aspect is that fixing this completely breaks the pass. ;] There was an implicit reliance on the fact that no loads or stores extended past the alloca once we decided to rewrite them in the final stage of SROA. This was used to encode information about whether the loads and stores had been split across multiple partitions of the original alloca. That required threading explicit tracking of whether a *use* of a partition is split across multiple partitions. Once that was done, another problem arose: we allowed splitting of integer loads and stores iff they were loads and stores to the entire alloca. This is a really arbitrary limitation, and splitting at least some integer loads and stores is crucial to maximize promotion opportunities. My first attempt was to start removing the restriction entirely, but currently that does Very Bad Things by causing *many* common alloca patterns to be fully decomposed into i8 operations and lots of or-ing together to produce larger integers on demand. The code bloat is terrifying. That is still the right end-goal, but substantial work must be done to either merge partitions or ensure that small i8 values are eagerly merged in some other pass. 
Sadly, figuring all this out took essentially all the time and effort here. So the end result is that we allow splitting only when the load or store at least covers the alloca. That ensures widened loads and stores don't hurt SROA, and that we don't rampantly decompose operations more than we have previously. All of this was already fairly well tested, and so I've just updated the tests to cover the wide load behavior. I can add a test that crafts the pass ordering magic which caused the original PR, but that seems really brittle and to provide little benefit. The fundamental problem is that widened loads should Just Work. llvm-svn: 177055
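The final rule in this entry ("we allow splitting only when the load or store at least covers the alloca") can be written as a small predicate. This is a minimal sketch, not the SROA source. `is_splittable` and its parameters are hypothetical names, offsets and sizes are assumed to be in bytes relative to the start of the alloca, and reading "covers" as "starts at offset 0 and is at least as large as the alloca" is an assumption. The actual pass checks further conditions not modeled here, such as the access being an integer load or store.

```python
def is_splittable(offset, size, alloca_size):
    """Return True if an access may be split across alloca partitions.

    offset:      byte offset of the access from the start of the alloca
    size:        byte size of the access (may exceed the alloca for widened loads)
    alloca_size: byte size of the alloca
    """
    # The access must begin at the start of the alloca and reach at least its end.
    return offset == 0 and size >= alloca_size
```

With an 8-byte alloca, an 8-byte access and a widened 16-byte access at offset 0 both qualify, while a 4-byte access at any offset does not.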
-
Joerg Sonnenberger authored
linkers to interact with GNU ld. llvm-svn: 177016
-
Craig Topper authored
llvm-svn: 177015
-
Craig Topper authored
Fix a bug in the calculation of the VEX.B bit for FMA4 rr with the VEX.W bit set. The VEX.B was being calculated from the wrong operand. Fixes at least some portion of PR14185. llvm-svn: 177014
-
Craig Topper authored
Teach X86 MC instruction lowering that VMOVAPSrr and other VEX-encoded register-to-register moves should be switched from using the MRMSrcReg form to the MRMDestReg form if the source register is a 64-bit extended register and the destination register is not. This allows the instruction to be encoded using the 2-byte VEX form instead of the 3-byte VEX form. The GNU assembler has similar behavior. llvm-svn: 177011
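Why the swap saves a byte can be sketched as below. In the MRMSrcReg form the source register sits in ModRM.rm, and an extended register there needs the VEX.B bit, which only the 3-byte VEX prefix carries. In the MRMDestReg form the source sits in ModRM.reg, whose extension bit (VEX.R) is available in the 2-byte prefix. The Python functions are hypothetical helpers for illustration, not the X86 MC lowering code. Registers are given as numbers 0 to 15, and the sketch assumes an opcode in the 0F map with no VEX.W or VEX.X requirement.

```python
def is_extended(reg):
    """Registers 8..15 need an extension bit in the prefix."""
    return reg >= 8


def choose_form(dst, src):
    """Pick the encoding form following the rule described in this entry."""
    if is_extended(src) and not is_extended(dst):
        return "MRMDestReg"
    return "MRMSrcReg"


def vex_prefix_length(form, dst, src):
    """Return 2 or 3, the VEX prefix length needed for a reg-to-reg move.

    Only an extended register in ModRM.rm forces the 3-byte prefix here,
    because that is the operand that needs VEX.B.
    """
    rm = src if form == "MRMSrcReg" else dst
    return 3 if is_extended(rm) else 2
```

For a move from register 9 to register 1, `choose_form(1, 9)` selects MRMDestReg and the prefix length drops from 3 to 2. When both registers are extended, no form avoids the 3-byte prefix, so the default form is kept.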
-
Michael Liao authored
- Fix the typo on type checking llvm-svn: 177010
-
Jiong Wang authored
llvm-svn: 177009
-
Nick Lewycky authored
something else and forgot to remove. llvm-svn: 177007
-