Commits · d6f46b8af8825be5287a41d9103ba290f34858f5 · Roger Ferrer / llvm-epi-0.8

Aug 29, 2010
- now that loop passes don't use DomFrontier, there is no reason · d6f46b8a
  Chris Lattner authored Aug 29, 2010
```
for the unroller to pretend it supports updating it.  It still
has a horrible hack for DomTree.

llvm-svn: 112444
```
- Make IVUsers iterative instead of recursive. · 3a08ed79
  Dan Gohman authored Aug 29, 2010
```
This has the side effect of reversing the order of most of
IVUsers' results.

llvm-svn: 112442
```
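The order reversal mentioned in this commit is a common result of replacing recursion with an explicit worklist. The sketch below illustrates the effect on a toy tree. `Node`, `visitRecursive`, and `visitIterative` are hypothetical names invented for this illustration and are not taken from the IVUsers code.

```cpp
#include <string>
#include <vector>

// Hypothetical toy tree node; not an LLVM type.
struct Node {
  std::string Name;
  std::vector<Node *> Children;
};

// Recursive pre-order walk: siblings are visited first to last.
void visitRecursive(const Node *N, std::vector<std::string> &Out) {
  Out.push_back(N->Name);
  for (const Node *C : N->Children)
    visitRecursive(C, Out);
}

// Iterative walk with an explicit worklist. Popping from the back means
// the last child pushed is the first one visited, so siblings come out
// in the opposite order from the recursive version.
void visitIterative(const Node *Root, std::vector<std::string> &Out) {
  std::vector<const Node *> Worklist{Root};
  while (!Worklist.empty()) {
    const Node *N = Worklist.back();
    Worklist.pop_back();
    Out.push_back(N->Name);
    for (const Node *C : N->Children)
      Worklist.push_back(C);
  }
}
```

Pushing children in reverse would restore the recursive order, at the cost of an extra reversal per node.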
- Optionally rerun dedicated-register filtering after applying · 002ff89c
  Dan Gohman authored Aug 29, 2010
```
other filtering techniques, as those may allow it to filter
out more obviously unprofitable candidates.

llvm-svn: 112441
```
- Fix several areas in LSR to do a better job keeping the main · f031792c
  Dan Gohman authored Aug 29, 2010
```
LSRInstance data structures up to date. This fixes some
pessimizations caused by stale data which will be exposed
in an upcoming change.

llvm-svn: 112440
```
- Refactor the three main groups of code out of · e9e0873b
  Dan Gohman authored Aug 29, 2010
```
NarrowSearchSpaceUsingHeuristics into separate functions.

llvm-svn: 112439
```
- Delete a bogus check. · 37a0f680
  Dan Gohman authored Aug 29, 2010
```
llvm-svn: 112438
```
- Add some comments. · b6a520d6
  Dan Gohman authored Aug 29, 2010
```
llvm-svn: 112437
```
- Move this debug output into GenerateAllReuseFormula, to declutter · bf673e06
  Dan Gohman authored Aug 29, 2010
```
the high-level logic.

llvm-svn: 112436
```
- Delete an unused declaration. · d366b6d5
  Dan Gohman authored Aug 29, 2010
```
llvm-svn: 112435
```
- Do one lookup instead of two. · 4f13bbfe
  Dan Gohman authored Aug 29, 2010
```
llvm-svn: 112434
```
- Restructure the {A,+,B}<L> * {C,+,D}<L> folding so that it folds · d1da5cdf
  Dan Gohman authored Aug 29, 2010
```
all applicable addrecs before recursing on getMulExpr, instead of
recursing on getMulExpr for each one.

llvm-svn: 112433
```
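For readers unfamiliar with the `{A,+,B}<L>` notation: it denotes a value that starts at `A` and increases by `B` on each iteration of loop `L`. The commit does not spell out the folded form. The sketch below uses the ordinary algebraic identity `{A,+,B} * {C,+,D} = {A*C,+,A*D + B*C + B*D,+,2*B*D}` and checks it numerically. `evaluateAddRec` and `mulAffineAddRecs` are hypothetical helper names, not ScalarEvolution APIs, and plain 64-bit integers stand in for SCEV expressions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Value of the chain of recurrences {Ops[0],+,Ops[1],+,...} at iteration N:
// the sum over I of Ops[I] * C(N, I), where C(N, I) is a binomial coefficient.
std::int64_t evaluateAddRec(const std::vector<std::int64_t> &Ops,
                            std::int64_t N) {
  std::int64_t Result = 0;
  std::int64_t Binomial = 1; // C(N, 0)
  for (std::size_t I = 0; I < Ops.size(); ++I) {
    Result += Ops[I] * Binomial;
    // C(N, I+1) = C(N, I) * (N - I) / (I + 1); the division is exact.
    Binomial = Binomial * (N - static_cast<std::int64_t>(I)) /
               static_cast<std::int64_t>(I + 1);
  }
  return Result;
}

// Product of two affine recurrences over the same loop, returned as the
// operand list of a quadratic recurrence.
std::vector<std::int64_t> mulAffineAddRecs(std::int64_t A, std::int64_t B,
                                           std::int64_t C, std::int64_t D) {
  return {A * C, A * D + B * C + B * D, 2 * B * D};
}
```

Evaluating the folded recurrence at iteration `N` should match `(A + B*N) * (C + D*N)` for every non-negative `N`.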
- Batch up subtracts along with adds, when analyzing long chains of · 3e6fc189
  Dan Gohman authored Aug 29, 2010
```
operations.

llvm-svn: 112432
```
- Micro-optimize GroupByComplexity. · 7712d290
  Dan Gohman authored Aug 29, 2010
```
llvm-svn: 112431
```
- Hold AddRec->getLoop() in a variable, to make the Mul code more consistent · 0f2de013
  Dan Gohman authored Aug 29, 2010
```
with the Add code.

llvm-svn: 112430
```
- Rename a variable, for consistency. · 028c1815
  Dan Gohman authored Aug 29, 2010
```
llvm-svn: 112429
```
- Use iterators instead of indices. · 28a84d4b
  Dan Gohman authored Aug 29, 2010
```
llvm-svn: 112428
```
- Fix lowering of INSERT_VECTOR_ELT in SPU. · 1e616572
  Kalle Raiskila authored Aug 29, 2010
```
The IDX was treated as byte index, not element index.

llvm-svn: 112422
```
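The difference between a byte index and an element index is a factor of the element size. The sketch below shows the distinction on a 16-byte vector of 32-bit lanes. The vector width, the lane type, and the names `byteOffsetOfElement` and `insertElement` are assumptions made for illustration, and the code is not the SPU lowering itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Byte offset of element Idx in a packed vector of ElemSize-byte elements.
constexpr std::size_t byteOffsetOfElement(std::size_t Idx,
                                          std::size_t ElemSize) {
  return Idx * ElemSize;
}

// Insert Val at element index Idx of a 16-byte vector of 32-bit lanes.
// Using Idx directly as a byte offset would write to the wrong lane
// (or straddle two lanes) for every index other than 0.
void insertElement(std::uint8_t (&Vec)[16], std::size_t Idx,
                   std::uint32_t Val) {
  std::memcpy(Vec + byteOffsetOfElement(Idx, sizeof(Val)), &Val, sizeof(Val));
}
```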
- Fix whitespaces. No functionality changes. · 8fc2b590
  Bill Wendling authored Aug 29, 2010
```
llvm-svn: 112421
```
- licm preserves the cfg, it doesn't have to explicitly say it · f94f6bb0
  Chris Lattner authored Aug 29, 2010
```
preserves domfrontier.  It does preserve AA though.

llvm-svn: 112419
```
- now that it doesn't use the PromoteMemToReg function, LICM doesn't · abe61ef3
  Chris Lattner authored Aug 29, 2010
```
require DomFrontier.  Dropping this doesn't actually save any runs
of the pass though.

llvm-svn: 112418
```
- completely rewrite the memory promotion algorithm in LICM. · 1dc98b47
  Chris Lattner authored Aug 29, 2010
```
Among other things, this uses SSAUpdater instead of 
PromoteMemToReg.

llvm-svn: 112417
```
- Remove NEON vaddl, vaddw, vsubl, and vsubw intrinsics. Instead, use llvm · d0c05488
  Bob Wilson authored Aug 29, 2010
```
IR add/sub operations with one or both operands sign- or zero-extended.
Auto-upgrade the old intrinsics.

llvm-svn: 112416
```
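The equivalence this commit relies on is that a widening add equals an ordinary add performed after extending the operands. Below is a scalar C++ model of the 8-bit to 16-bit case with eight lanes. `wideningAddS8` and `wideningAddU8` are hypothetical names for this illustration, not NEON intrinsics or LLVM functions.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Signed case: sign-extend each 8-bit lane to 16 bits, then add.
std::array<std::int16_t, 8> wideningAddS8(const std::array<std::int8_t, 8> &A,
                                          const std::array<std::int8_t, 8> &B) {
  std::array<std::int16_t, 8> R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = static_cast<std::int16_t>(static_cast<std::int16_t>(A[I]) +
                                     static_cast<std::int16_t>(B[I]));
  return R;
}

// Unsigned case: zero-extend each 8-bit lane to 16 bits, then add.
std::array<std::uint16_t, 8>
wideningAddU8(const std::array<std::uint8_t, 8> &A,
              const std::array<std::uint8_t, 8> &B) {
  std::array<std::uint16_t, 8> R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = static_cast<std::uint16_t>(static_cast<std::uint16_t>(A[I]) +
                                      static_cast<std::uint16_t>(B[I]));
  return R;
}
```

Because the sum is formed in the wider type, it cannot wrap: the extreme 8-bit inputs still fit in 16 bits.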
- use getUniqueExitBlocks instead of a manual set. · 9c3931a5
  Chris Lattner authored Aug 29, 2010
```
llvm-svn: 112412
```
- A couple of small missed optimizations. · f75de6ea
  Eli Friedman authored Aug 29, 2010
```
llvm-svn: 112411
```
- reimplement LICM::sink to use SSAUpdater instead of PromoteMemToReg. · 85bf5421
  Chris Lattner authored Aug 29, 2010
```
This leads to much simpler code.

llvm-svn: 112410
```
- implement SSAUpdater::RewriteUseAfterInsertions, a helpful form of RewriteUse. · c3fb03e2
  Chris Lattner authored Aug 29, 2010
```
llvm-svn: 112409
```
- remove dead proto · b50407f1
  Chris Lattner authored Aug 29, 2010
```
llvm-svn: 112408
```
- reduce indentation in LICM::sink by using early exits, use · cd96b4df
  Chris Lattner authored Aug 29, 2010
```
getUniqueExitBlocks instead of getExitBlocks and a manual
set to eliminate dupes.

llvm-svn: 112405
```
- modernize this pass a bit: use efficient set/map and reduce indentation. · 188cc5a0
  Chris Lattner authored Aug 29, 2010
```
llvm-svn: 112404
```
- when merging two alias sets, the result set is volatile if either · dc8070ed
  Chris Lattner authored Aug 29, 2010
```
of the sets is volatile.  We were dropping the volatile bit of the
merged in set, leading (luckily) to assertions in cases like 
PR7535.  I cannot produce a testcase that repros with opt, but this
is obviously correct.

llvm-svn: 112402
```
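The rule stated in this commit, that the merged set is volatile if either input is, amounts to OR-ing the two flags during the merge. The sketch below shows that rule on a simplified stand-in. `ToyAliasSet` and its members are hypothetical and do not mirror the real `AliasSet` class.

```cpp
#include <vector>

// Hypothetical stand-in for an alias set; not llvm::AliasSet.
struct ToyAliasSet {
  bool Volatile = false;
  std::vector<int> Pointers; // placeholder for the tracked pointers

  // Absorb Other into this set. The volatile bit must be carried over:
  // if either set is volatile, the merged set is volatile.
  void mergeFrom(ToyAliasSet &Other) {
    Volatile |= Other.Volatile;
    Pointers.insert(Pointers.end(), Other.Pointers.begin(),
                    Other.Pointers.end());
    Other.Pointers.clear();
  }
};
```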
- more cleanup · eef6b19d
  Chris Lattner authored Aug 29, 2010
```
llvm-svn: 112401
```
- clean this up · afb7074f
  Chris Lattner authored Aug 29, 2010
```
llvm-svn: 112400
```
- - Add a parameter to T2I_bin_irs for those patterns which set the S bit. · df9ec17d
  Bill Wendling authored Aug 29, 2010
```
- Create T2I_bin_sw_irs to be like T2I_bin_w_irs, but that it sets the S bit.

llvm-svn: 112399
```
- add a bunch more common shuffles to the instprinter. · 38ccc8b8
  Chris Lattner authored Aug 29, 2010
```
llvm-svn: 112397
```
- Name ANDflag to ANDS, which is less stupid. · b0dc465c
  Bill Wendling authored Aug 29, 2010
```
llvm-svn: 112395
```
- File missing from last commit. · ac64ed09
  Bill Wendling authored Aug 29, 2010
```
llvm-svn: 112394
```
- Create an ARMISD::AND node. This node is exactly like the "ARM::AND" node, but · 0a65116c
  Bill Wendling authored Aug 29, 2010
```
it sets the CPSR register.

llvm-svn: 112393
```
Aug 28, 2010

- I have manually decoded the imm field of an insertps one too many · 7a05e6dc
  Chris Lattner authored Aug 28, 2010

times.  This patch causes llc and llvm-mc (which both default to
verbose-asm) to print out comments after a few common shuffle 
instructions which indicate the shuffle mask, e.g.:

	insertps	$113, %xmm3, %xmm0     ## xmm0 = zero,xmm0[1,2],xmm3[1]
	unpcklps	%xmm1, %xmm0    ## xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
	pshufd	$1, %xmm1, %xmm1        ## xmm1 = xmm1[1,0,0,0]

This is carefully factored to keep the information extraction (of the
shuffle mask) separate from the printing logic.  I plan to move the
extraction part out somewhere else at some point for other parts of
the x86 backend that want to introspect on the behavior of shuffles.

llvm-svn: 112387

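To make the manual decoding concrete: for the register form of `insertps`, bits 7:6 of the immediate select the source element, bits 5:4 select the destination slot, and bits 3:0 are a zero mask applied after the insert. The sketch below decodes an immediate into a comment-like string. `describeInsertPS` is a hypothetical helper, not the instprinter code from this commit, and unlike the output shown above it does not merge adjacent elements into the `xmm0[1,2]` form.

```cpp
#include <cstdint>
#include <string>

// Describe the result of AT&T-syntax "insertps $Imm, Src, Dst" (register form).
std::string describeInsertPS(std::uint8_t Imm, const std::string &Src,
                             const std::string &Dst) {
  unsigned SrcElt = (Imm >> 6) & 3; // which element of Src is inserted
  unsigned DstElt = (Imm >> 4) & 3; // which slot of Dst receives it
  unsigned ZeroMask = Imm & 0xF;    // slots forced to zero afterwards
  std::string Out;
  for (unsigned I = 0; I < 4; ++I) {
    if (I != 0)
      Out += ",";
    if (ZeroMask & (1u << I))
      Out += "zero";
    else if (I == DstElt)
      Out += Src + "[" + std::to_string(SrcElt) + "]";
    else
      Out += Dst + "[" + std::to_string(I) + "]";
  }
  return Out;
}
```

For `$113` (binary `01 11 0001`) with source `xmm3` and destination `xmm0`, this yields `zero,xmm0[1],xmm0[2],xmm3[1]`, which agrees with the comment in the commit message.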

- fix the buildvector->insertp[sd] logic to not always create a redundant · 94656b1c
  Chris Lattner authored Aug 28, 2010

insertp[sd] $0, which is a noop.  Before:

_f32:                                   ## @f32
	pshufd	$1, %xmm1, %xmm2
	pshufd	$1, %xmm0, %xmm3
	addss	%xmm2, %xmm3
	addss	%xmm1, %xmm0
                                        ## kill: XMM0<def> XMM0<kill> XMM0<def>
	insertps	$0, %xmm0, %xmm0
	insertps	$16, %xmm3, %xmm0
	ret

after:

_f32:                                   ## @f32
	movdqa	%xmm0, %xmm2
	addss	%xmm1, %xmm2
	pshufd	$1, %xmm1, %xmm1
	pshufd	$1, %xmm0, %xmm3
	addss	%xmm1, %xmm3
	movdqa	%xmm2, %xmm0
	insertps	$16, %xmm3, %xmm0
	ret

The extra movs are due to a random (poor) scheduling decision.

llvm-svn: 112379


- fix the BuildVector -> unpcklps logic to not do pointless shuffles · bcb6090a
  Chris Lattner authored Aug 28, 2010

when the top elements of a vector are undefined.  This happens all
the time for X86-64 ABI stuff because only the low 2 elements of
a 4 element vector are defined.  For example, on:

_Complex float f32(_Complex float A, _Complex float B) {
  return A+B;
}

We used to produce (with SSE2, SSE4.1+ uses insertps):

_f32:                                   ## @f32
	movdqa	%xmm0, %xmm2
	addss	%xmm1, %xmm2
	pshufd	$16, %xmm2, %xmm2
	pshufd	$1, %xmm1, %xmm1
	pshufd	$1, %xmm0, %xmm0
	addss	%xmm1, %xmm0
	pshufd	$16, %xmm0, %xmm1
	movdqa	%xmm2, %xmm0
	unpcklps	%xmm1, %xmm0
	ret

We now produce:

_f32:                                   ## @f32
	movdqa	%xmm0, %xmm2
	addss	%xmm1, %xmm2
	pshufd	$1, %xmm1, %xmm1
	pshufd	$1, %xmm0, %xmm3
	addss	%xmm1, %xmm3
	movaps	%xmm2, %xmm0
	unpcklps	%xmm3, %xmm0
	ret

This implements rdar://8368414

llvm-svn: 112378

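The two `pshufd $16` instructions in the old sequence only affect the upper result lanes, which is why dropping them is safe when those lanes are undefined. Below is a scalar C++ model of the two instructions involved. `Vec4`, `pshufd`, and `unpcklps` are hypothetical stand-ins written for this illustration (using `int` lanes rather than floats), not LLVM code.

```cpp
#include <array>

using Vec4 = std::array<int, 4>;

// Model of "pshufd $Imm, Src, Dst": result lane I is Src[(Imm >> 2*I) & 3].
Vec4 pshufd(unsigned Imm, const Vec4 &Src) {
  Vec4 R{};
  for (unsigned I = 0; I < 4; ++I)
    R[I] = Src[(Imm >> (2 * I)) & 3];
  return R;
}

// Model of AT&T-syntax "unpcklps Src, Dst": interleave the low halves,
// giving Dst[0], Src[0], Dst[1], Src[1]. The upper halves of both
// inputs are never read.
Vec4 unpcklps(const Vec4 &Dst, const Vec4 &Src) {
  Vec4 R = {{Dst[0], Src[0], Dst[1], Src[1]}};
  return R;
}
```

With the real part in lane 0 of one register and the imaginary part in lane 0 of the other, `unpcklps(pshufd(16, Re), pshufd(16, Im))` and `unpcklps(Re, Im)` agree in lanes 0 and 1. Only lanes 2 and 3 differ, and those are the lanes the commit describes as undefined.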