Commits · 31f4354c75e22f41586896ea58845519c838c91e · Roger Ferrer / llvm-epi

Mar 15, 2013
- Turn anonymous type in anonymous union warning back on after cleaning up · 31f4354c
  Eric Christopher authored Mar 15, 2013
```
issues.

llvm-svn: 177136
```
  31f4354c
- Silence anonymous type in anonymous union warnings. · 8996c5d4
  Eric Christopher authored Mar 15, 2013
```
llvm-svn: 177135
```
  8996c5d4
- Add a triple to the test. · 4a4827ce
  Nadav Rotem authored Mar 15, 2013
```
llvm-svn: 177131
```
  4a4827ce
- Unaligned loads should use the VMOVUPS opcode. · adfa5eaf
  Nadav Rotem authored Mar 14, 2013
```
llvm-svn: 177130
```
  adfa5eaf
- Remove some unused variables to clean the Clang -Werror build · 6e5e0316
  David Blaikie authored Mar 14, 2013
```
(these were added in r177089)

llvm-svn: 177129
```
  6e5e0316
- [mips] Set isAllocatable bit of unallocatable register classes to 0. · b83b2eda
  Akira Hatanaka authored Mar 14, 2013
```
llvm-svn: 177128
```
  b83b2eda
Mar 14, 2013

Fix r177112: Add ProcResGroup. · a5c747b0

Andrew Trick authored Mar 14, 2013

This is the other half of r177122 that I meant to commit at the same time.

llvm-svn: 177123

a5c747b0

Prepare for adding InstrSchedModel annotations to X86 instructions. · 71236682

Jakob Stoklund Olesen authored Mar 14, 2013

The new InstrSchedModel is easier to use than the instruction
itineraries. It will be used to model instruction latency and throughput
in modern Intel microarchitectures like Sandy Bridge.

InstrSchedModel should be able to coexist with instruction itinerary
classes, but for cleanliness we should switch the Atom processor model
to the new InstrSchedModel as well.

llvm-svn: 177122

71236682

Add a new method which enables one to change register classes. · fafaa9d9

Reed Kotler authored Mar 14, 2013

See the Mips16ISetLowering.cpp patch to see a use of this.
For now the extra code in Mips16ISetLowering.cpp is a nop but is
used for test purposes. Mips32 registers are set up and then removed and
then the Mips16 registers are set up.

Normally you need to add register classes and then call
computeRegisterProperties.

llvm-svn: 177120

fafaa9d9

LoopVectorizer: Insert some white space to make test case more readable · 9b55e31b
Arnold Schwaighofer authored Mar 14, 2013
```
Also remove some unneeded function attributes.

llvm-svn: 177114
```
9b55e31b
[fast-isel] The X86FastISel::FastLowerArguments function doesn't properly handle · 4b54f594
Chad Rosier authored Mar 14, 2013
```
the win64 calling convention.
rdar://13423768

llvm-svn: 177113
```
4b54f594

MachineModel: Add a ProcResGroup class. · 4e67cba8

Andrew Trick authored Mar 14, 2013

This allows arbitrary groups of processor resources. Using something in
a subset automatically counts against the superset. Currently, this
only works if the superset is also a ProcResGroup as opposed to a
SuperUnit.

This allows SandyBridge to be expressed naturally, which will be
checked in shortly.

def SBPort01 : ProcResGroup<[SBPort0, SBPort1]>;
def SBPort15 : ProcResGroup<[SBPort1, SBPort5]>;
def SBPort23  : ProcResGroup<[SBPort2, SBPort3]>;
def SBPort015 : ProcResGroup<[SBPort0, SBPort1, SBPort5]>;
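
As a rough illustration of "using something in a subset automatically counts against the superset", here is a toy Python model. The group names mirror the defs above; the helpers `charged_groups` and `account` and the cycle-counting scheme are hypothetical and are not how TableGen or the LLVM scheduler implements this.

```python
# Toy model only: the group names come from the defs above, but the
# accounting scheme is an illustrative assumption, not LLVM's implementation.
GROUPS = {
    "SBPort01": frozenset({"SBPort0", "SBPort1"}),
    "SBPort15": frozenset({"SBPort1", "SBPort5"}),
    "SBPort23": frozenset({"SBPort2", "SBPort3"}),
    "SBPort015": frozenset({"SBPort0", "SBPort1", "SBPort5"}),
}


def charged_groups(used):
    """Return, sorted by name, every group whose units include all units of `used`."""
    units = GROUPS[used]
    return sorted(name for name, members in GROUPS.items() if units <= members)


def account(uses):
    """Accumulate cycles per group for a list of (group, cycles) uses."""
    totals = {name: 0 for name in GROUPS}
    for group, cycles in uses:
        # A use of a group is charged to that group and to each superset group.
        for name in charged_groups(group):
            totals[name] += cycles
    return totals
```

In this model a use of SBPort01 is charged to SBPort01 and to SBPort015, while a use of SBPort23 is charged to SBPort23 alone.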

llvm-svn: 177112

4e67cba8

Move estimateStackSize from ARM into MachineFrameInfo · 628ba128

Hal Finkel authored Mar 14, 2013

This is a generic function (derived from PEI); moving it into
MachineFrameInfo eliminates a current redundancy between the ARM and AArch64
backends, and will allow it to be used by the PowerPC target code.

No functionality change intended.

llvm-svn: 177111

628ba128

Provide the register scavenger to processFunctionBeforeFrameFinalized · 5a765fdd

Hal Finkel authored Mar 14, 2013

Add the current PEI register scavenger as a parameter to the
processFunctionBeforeFrameFinalized callback.

This change is necessary in order to allow the PowerPC target code to
set the register scavenger frame index after the save-area offset
adjustments performed by processFunctionBeforeFrameFinalized. Only
after these adjustments have been made is it possible to estimate
the size of the stack frame.

llvm-svn: 177108

5a765fdd

Use frame-index scavenging for PPC register spilling · ad26f4de

Hal Finkel authored Mar 14, 2013

Make requiresFrameIndexScavenging return true, and create virtual registers in
the spilling code instead of using the register scavenger directly. This makes
the target-level code simpler, and importantly, delays the scavenging until
after callee-saved register processing (which will be important for later
changes).

Also cleans up trackLivenessAfterRegAlloc (makes it inline in the header with
the other related functions). This makes it clear that it always returns true.

No functionality change intended.

llvm-svn: 177107

ad26f4de

Not all PPC functions with a frame pointer need a RS spill slot · e987a311

Hal Finkel authored Mar 14, 2013

We used to add a spill slot for the register scavenger whenever the function
has a frame pointer. This is unnecessarily conservative: We may need the spill
slot for dynamic stack allocations, and functions with dynamic stack
allocations always have a FP, but we might also have a FP for other reasons
(such as the user explicitly disabling frame-pointer elimination), and we don't
necessarily need a spill slot for those functions.

The structsinregs test needed adjustment because it disables FP elimination.

llvm-svn: 177106

e987a311

ARM cost model: Increase cost of some vector selects we do terrible on · 8070b382

Arnold Schwaighofer authored Mar 14, 2013

By terrible I mean we store/load from the stack.

This matters on PAQp8 in _Z5trainPsS_ii (which is inlined into Mixer::update)
where we decide to vectorize a loop with a VF of 8 resulting in a 25%
degradation on a cortex-a8.

LV: Found an estimated cost of 2 for VF 8 For instruction:   icmp slt i32
LV: Found an estimated cost of 2 for VF 8 For instruction:   select i1, i32, i32

The bug that tracks the CodeGen part is PR14868.

radar://13403975

llvm-svn: 177105

8070b382

[mips] Fix filename in comment and delete unnecessary lines of code. · 44ebe001
Akira Hatanaka authored Mar 14, 2013
```
No functionality changes.

llvm-svn: 177104
```
44ebe001
Hexagon: Removed asserts regarding alignment and offset. · ec613665
Jyotsna Verma authored Mar 14, 2013
```
We are warning the user about the alignment, so we should not assert.

llvm-svn: 177103
```
ec613665
Add missing asserts flag to test - it uses debug flags · 4991ce9d
Arnold Schwaighofer authored Mar 14, 2013
```
llvm-svn: 177102
```
4991ce9d
Android uses cacheflush(long start, long end, long flags) for MIPS. · 7239a600
Akira Hatanaka authored Mar 14, 2013
```
Patch by Stephen Hines.

llvm-svn: 177101
```
7239a600

LoopVectorize: Invert case when we use a vector cmp value to query select cost · c63cf3a0

Arnold Schwaighofer authored Mar 14, 2013

We generate a select with a vectorized condition argument when the condition is
NOT loop invariant. Not the other way around.
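
A minimal Python sketch of the corrected decision, assuming the cost query only needs the type of the select condition. The function name `select_condition_type` is hypothetical and the real code operates on LLVM types, not strings.

```python
def select_condition_type(condition_is_loop_invariant: bool, vf: int) -> str:
    """Return the IR type assumed for the select condition at vector width vf."""
    if condition_is_loop_invariant:
        # One scalar condition is shared by every vector lane.
        return "i1"
    # The condition changes per iteration, so each lane needs its own bit.
    return f"<{vf} x i1>"
```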

llvm-svn: 177098

c63cf3a0

Add back lines which were accidentally deleted in CMakeLists.txt. · 7cc48f45
Akira Hatanaka authored Mar 14, 2013
```
llvm-svn: 177096
```
7cc48f45
[mips] Define function MipsSEDAGToDAGISel::selectAddESubE. · b8835b82
Akira Hatanaka authored Mar 14, 2013
```
No intended functionality changes.

llvm-svn: 177095
```
b8835b82

Add a comment about overlapping PPC frame offsets · ad92b465

Hal Finkel authored Mar 14, 2013

I don't think that it is otherwise clear how the overlapping offsets
are processed into distinct spill slots. Comment that this is done
in processFunctionBeforeFrameFinalized.

llvm-svn: 177094

ad92b465

[mips] Rename functions and variables to start with proper case. · 040d2255
Akira Hatanaka authored Mar 14, 2013
```
llvm-svn: 177092
```
040d2255
Add header file MipsISelDAGToDAG.h. · 29a0da35
Akira Hatanaka authored Mar 14, 2013
```
llvm-svn: 177090
```
29a0da35
[mips] Define two subclasses of MipsDAGToDAGISel. Mips16DAGToDAGISel is for · 30a84787
Akira Hatanaka authored Mar 14, 2013
```
mips16 and MipsSEDAGToDAGISel is for mips32/64. 

No functionality changes.

llvm-svn: 177089
```
30a84787

Perform factorization as a last resort of unsafe fadd/fsub simplification. · 2eca602f

Shuxin Yang authored Mar 14, 2013

Rules include:
  1) x*y +/- x*z => x*(y +/- z)
    (the order of operands doesn't matter)

  2) y/x +/- z/x => (y +/- z)/x 

The transformation is disabled if the new add/sub expr "y +/- z" is a
denormal/NaN/infinity.
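
The guard can be sketched in Python for the case where y and z are double-precision constants. This is an illustration under that assumption, not the pass itself, which works on LLVM IR. The helper names `is_denormal`, `blocks_factorization` and `combine_factors` are hypothetical.

```python
import math
import sys


def is_denormal(value: float) -> bool:
    """True for non-zero doubles below the smallest normal magnitude."""
    return value != 0.0 and abs(value) < sys.float_info.min


def blocks_factorization(value: float) -> bool:
    """True if the folded 'y +/- z' is a denormal, NaN, or infinity."""
    return math.isnan(value) or math.isinf(value) or is_denormal(value)


def combine_factors(y: float, z: float, subtract: bool = False):
    """Fold 'y +/- z' for x*(y +/- z); return None if the rewrite is disabled."""
    combined = y - z if subtract else y + z
    return None if blocks_factorization(combined) else combined
```

For example, `combine_factors(2.0, 3.0)` yields `5.0`, so `x*2.0 + x*3.0` could become `x*5.0`, whereas two constants whose sum overflows to infinity yield `None` and the expression is left alone.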

rdar://12911472

llvm-svn: 177088

2eca602f

Test that we emit a DW_AT_location for self captured by a block. · ed6d9554
Adrian Prantl authored Mar 14, 2013
```
This is the backend part of a CFE test with the same name.

llvm-svn: 177087
```
ed6d9554
R600: Factorize code handling Const Read Port limitation · 0a22bc41
Vincent Lejeune authored Mar 14, 2013
```
llvm-svn: 177078
```
0a22bc41
[ASan] emit instrumentation for initialization order checking by default · 819eddc3
Alexey Samsonov authored Mar 14, 2013
```
llvm-svn: 177063
```
819eddc3

PR14972: SROA vs. GVN exposed a really bad bug in SROA. · a1c54bbe

Chandler Carruth authored Mar 14, 2013

The fundamental problem is that SROA didn't allow for overly wide loads
where the bits past the end of the alloca were masked away and the load
was sufficiently aligned to ensure there is no risk of page fault, or
other trapping behavior. With such widened loads, SROA would delete the
load entirely rather than clamping it to the size of the alloca in order
to allow mem2reg to fire. This was exposed by a test case that neatly
arranged for GVN to run first, widening certain loads, followed by an
inline step, and then SROA which miscompiles the code. However, I see no
reason why this hasn't been plaguing us in other contexts. It seems
deeply broken.

Diagnosing all of the above took all of 10 minutes of debugging. The
really annoying aspect is that fixing this completely breaks the pass.
;] There was an implicit reliance on the fact that no loads or stores
extended past the alloca once we decided to rewrite them in the final
stage of SROA. This was used to encode information about whether the
loads and stores had been split across multiple partitions of the
original alloca. That required threading explicit tracking of whether
a *use* of a partition is split across multiple partitions.

Once that was done, another problem arose: we allowed splitting of
integer loads and stores iff they were loads and stores to the entire
alloca. This is a really arbitrary limitation, and splitting at least
some integer loads and stores is crucial to maximize promotion
opportunities. My first attempt was to start removing the restriction
entirely, but currently that does Very Bad Things by causing *many*
common alloca patterns to be fully decomposed into i8 operations and
lots of or-ing together to produce larger integers on demand. The code
bloat is terrifying. That is still the right end-goal, but substantial
work must be done to either merge partitions or ensure that small i8
values are eagerly merged in some other pass. Sadly, figuring all this
out took essentially all the time and effort here.

So the end result is that we allow splitting only when the load or store
at least covers the alloca. That ensures widened loads and stores don't
hurt SROA, and that we don't rampantly decompose operations more than we
have previously.
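
One reading of "at least covers the alloca" is sketched below in Python, with byte offsets measured from the start of the alloca. The helpers `covers_alloca` and `clamp_to_alloca` are hypothetical names for illustration and do not correspond to functions in the SROA pass.

```python
def covers_alloca(offset: int, size: int, alloca_size: int) -> bool:
    """True if the access [offset, offset + size) spans the whole alloca."""
    return offset <= 0 and offset + size >= alloca_size


def clamp_to_alloca(offset: int, size: int, alloca_size: int):
    """Clamp an access to the bytes that actually lie inside the alloca."""
    begin = max(offset, 0)
    end = min(offset + size, alloca_size)
    return begin, end
```

Under this reading, an 8-byte load at offset 0 of a 4-byte alloca is splittable and is clamped to bytes 0 to 4 rather than deleted, while a 2-byte load of the same alloca is not splittable.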

All of this was already fairly well tested, and so I've just updated the
tests to cover the wide load behavior. I can add a test that crafts the
pass ordering magic which caused the original PR, but that seems really
brittle and to provide little benefit. The fundamental problem is that
widened loads should Just Work.

llvm-svn: 177055

a1c54bbe

Add two of the float related ARM-specific entries for e_flags needed for · 7118befd
Joerg Sonnenberger authored Mar 14, 2013
```
linkers to interact with GNU ld.

llvm-svn: 177016
```
7118befd
Fix the name of a variable to match its declaration. Fixes build failure from r177014. · ba824298
Craig Topper authored Mar 14, 2013
```
llvm-svn: 177015
```
ba824298

Fix a bug in the calculation of the VEX.B bit for FMA4 rr with the VEX.W bit... · 87299973

Craig Topper authored Mar 14, 2013

Fix a bug in the calculation of the VEX.B bit for FMA4 rr with the VEX.W bit set. The VEX.B was being calculated from the wrong operand. Fixes at least some portion of PR14185.

llvm-svn: 177014

87299973

Teach X86 MC instruction lowering that VMOVAPSrr and other VEX-encoded... · a66d81d5

Craig Topper authored Mar 14, 2013

Teach X86 MC instruction lowering that VMOVAPSrr and other VEX-encoded register to register moves should be switched from using the MRMSrcReg form to the MRMDestReg form if the source register is a 64-bit extended register and the destination register is not. This allows the instruction to be encoded using the 2-byte VEX form instead of the 3-byte VEX form. The GNU assembler has similar behavior.
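
The reasoning can be sketched in Python. The 2-byte VEX prefix carries only the R bit, which extends ModRM.reg; extending ModRM.rm needs the B bit, which exists only in the 3-byte prefix. MRMSrcReg places the destination in ModRM.reg and the source in ModRM.rm, and MRMDestReg does the opposite. The sketch assumes register numbers 0 to 15 and ignores the other reasons a 3-byte prefix may be required (VEX.W, VEX.X, other opcode maps). The function names are hypothetical.

```python
def choose_move_form(dest: int, src: int) -> str:
    """Pick the encoding form for a VEX register-to-register move."""
    # Swap forms only when it moves the extended register into ModRM.reg.
    if src >= 8 and dest < 8:
        return "MRMDestReg"
    return "MRMSrcReg"


def vex_prefix_bytes(form: str, dest: int, src: int) -> int:
    """Return the VEX prefix length (2 or 3) under the stated assumptions."""
    # MRMSrcReg: ModRM.rm holds the source; MRMDestReg: it holds the destination.
    rm = src if form == "MRMSrcReg" else dest
    # An extended register in ModRM.rm needs VEX.B, hence the 3-byte prefix.
    return 3 if rm >= 8 else 2
```

With destination register 1 and source register 9, the default MRMSrcReg form needs the 3-byte prefix, while the swapped MRMDestReg form fits in 2 bytes. If both registers are extended, neither form avoids the 3-byte prefix.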

llvm-svn: 177011

a66d81d5

Fix PR15309 · 20d28704
Michael Liao authored Mar 14, 2013
```
- Fix the typo on type checking

llvm-svn: 177010
```
20d28704
test commit: remove blank line. · 5bbb96d7
Jiong Wang authored Mar 14, 2013
```
llvm-svn: 177009
```
5bbb96d7
Remove a change to the debug info in this test, that I made while testing · 3d28d4de
Nick Lewycky authored Mar 14, 2013
```
something else and forgot to remove.

llvm-svn: 177007
```
3d28d4de