Commits · fc33e1d99b40936321cbfb9a8c8e4841da9351aa · Roger Ferrer / llvm-epi-0.8

May 17, 2013

X86: Make shuffle -> shift conversion more aggressive about undefs. · fc33e1d9

Benjamin Kramer authored May 17, 2013

Shuffles that only move an element into position 0 of the vector are common in
the output of the loop vectorizer and often generate suboptimal code when SSSE3
is not available. Lower them to vector shifts if possible.

We still prefer palignr over psrldq because it has higher throughput on
sandybridge.

llvm-svn: 182102

fc33e1d9
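
As a rough illustration of the idea in this commit (a toy model, not the X86ISelLowering code): when a shuffle only defines the lanes it moves toward position 0 and leaves the rest undef, a whole-vector shift such as psrldq yields an acceptable result, because an undef lane may hold any value. All helper names below are hypothetical.

```python
UNDEF = -1  # marks a lane whose value the shuffle does not care about

def shuffle(vec, mask):
    """Reference shuffle: lane i takes vec[mask[i]]; undef lanes become None."""
    return [None if m == UNDEF else vec[m] for m in mask]

def shift_right_lanes(vec, n):
    """Model of a psrldq-style shift: lanes move toward index 0, zeros shift in."""
    return vec[n:] + [0] * n

def matches(expected, actual):
    """An undef (None) lane in the expected result accepts any value."""
    return all(e is None or e == a for e, a in zip(expected, actual))

def shift_amount_for(mask):
    """Return the lane shift that implements `mask`, or None if no shift fits.

    A right shift by n puts source lane i + n into lane i, so every defined
    lane i must satisfy mask[i] == i + n for one common n.
    """
    amounts = {m - i for i, m in enumerate(mask) if m != UNDEF}
    if len(amounts) != 1:
        return None
    n = amounts.pop()
    return n if 0 < n < len(mask) else None

vec = [10, 11, 12, 13]
mask = [2, UNDEF, UNDEF, UNDEF]   # only moves element 2 into position 0
n = shift_amount_for(mask)
shifted = shift_right_lanes(vec, n)
print(n, shifted, matches(shuffle(vec, mask), shifted))
```

The more undef lanes a mask has, the fewer constraints `shift_amount_for` must satisfy, which is the sense in which treating undefs aggressively lets more shuffles become shifts.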

May 05, 2013

Remove a recently redundant transform from X86ISelLowering. · 66fb70de

David Majnemer authored May 05, 2013

X86ISelLowering has support to treat:
(icmp ne (and (xor %flags, -1), (shl 1, flag)), 0)

as if it were actually:
(icmp eq (and %flags, (shl 1, flag)), 0)

However, r179386 has code at the InstCombine level to handle this.

llvm-svn: 181145

66fb70de
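
The two forms quoted in this commit message test the same thing: whether bit `flag` of `%flags` is clear. A small brute-force check, assuming 8-bit values to keep the search exhaustive (the function names are made up for this sketch):

```python
def bit_clear_via_xor(flags, flag, width=8):
    """(icmp ne (and (xor flags, -1), (shl 1, flag)), 0) on `width`-bit values."""
    all_ones = (1 << width) - 1
    return ((flags ^ all_ones) & (1 << flag)) != 0

def bit_clear_direct(flags, flag):
    """(icmp eq (and flags, (shl 1, flag)), 0)"""
    return (flags & (1 << flag)) == 0

# Exhaustive check over every 8-bit flags value and every bit index.
equivalent = all(
    bit_clear_via_xor(flags, flag) == bit_clear_direct(flags, flag)
    for flags in range(256)
    for flag in range(8)
)
print(equivalent)
```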

Fix an odd comment. · 42932bdc
Nadav Rotem authored May 04, 2013
```
llvm-svn: 181136
```
42932bdc

May 02, 2013
- 80-col fixup. · 06badde1
  Michael Liao authored May 02, 2013
```
llvm-svn: 180915
```
  06badde1
- Avoid duplicating logic on frame register selecting when lowering eh_return · afafa98f
  Michael Liao authored May 02, 2013
```
No functionality change

llvm-svn: 180914
```
  afafa98f
- Avoid duplicating logic on frame register selecting when lowering frameaddr · 31d39a4a
  Michael Liao authored May 02, 2013
```
No functionality change

llvm-svn: 180912
```
  31d39a4a
Apr 20, 2013
- Remove unused ShouldFoldAtomicFences flag. · 16aba170
  Tim Northover authored Apr 20, 2013
```
I think it's almost impossible to fold atomic fences profitably under
LLVM/C++11 semantics. As a result, this is now unused and just
cluttering up the target interface.

llvm-svn: 179940
```
  16aba170
- Remove unused MEMBARRIER DAG node; it's been replaced by ATOMIC_FENCE. · a2b53390
  Tim Northover authored Apr 20, 2013
```
llvm-svn: 179939
```
  a2b53390
- ArrayRefize getMachineNode(). No functionality change. · b53d8963
  Michael Liao authored Apr 19, 2013
```
llvm-svn: 179901
```
  b53d8963
Apr 19, 2013
- Use 'array_lengthof' where possible to avoid magic numbers · e28fab22
  Michael Liao authored Apr 19, 2013
```
llvm-svn: 179833
```
  e28fab22
Apr 18, 2013
- X86: Add an SSE2 lowering for 64 bit compares when pcmpgtq (SSE4.2) isn't available. · c5578288
  Benjamin Kramer authored Apr 18, 2013
```
This pattern started popping up in vectorized min/max reductions.

llvm-svn: 179797
```
  c5578288
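
One common way to build a signed 64-bit greater-than from 32-bit pieces is sketched below. It is a scalar model under the assumption that only 32-bit signed compares and equality are available (as with pcmpgtd and pcmpeqd); it is not necessarily the exact instruction sequence this commit emits, and the function names are hypothetical.

```python
def to_signed(value, bits):
    """Reinterpret an unsigned `bits`-wide integer as two's complement."""
    return value - (1 << bits) if value >> (bits - 1) else value

def sgt64_via_32bit_halves(a, b):
    """Signed a > b on 64-bit bit patterns, using only 32-bit comparisons."""
    a_hi, a_lo = a >> 32, a & 0xFFFFFFFF
    b_hi, b_lo = b >> 32, b & 0xFFFFFFFF
    hi_gt = to_signed(a_hi, 32) > to_signed(b_hi, 32)  # signed, like pcmpgtd
    hi_eq = a_hi == b_hi                               # like pcmpeqd
    # Flipping the sign bit turns an unsigned compare into a signed one.
    lo_gt = to_signed(a_lo ^ 0x80000000, 32) > to_signed(b_lo ^ 0x80000000, 32)
    return hi_gt or (hi_eq and lo_gt)

samples = [0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF, 0x100000000,
           0x7FFFFFFFFFFFFFFF, 0x8000000000000000, 0xFFFFFFFFFFFFFFFF]
agrees = all(
    sgt64_via_32bit_halves(a, b) == (to_signed(a, 64) > to_signed(b, 64))
    for a in samples for b in samples
)
print(agrees)
```

The high halves decide the result unless they are equal, in which case the low halves are compared as unsigned values.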
Apr 11, 2013

Optimize vector select from all 0s or all 1s · 55658d42

Michael Liao authored Apr 11, 2013

As packed comparisons in AVX/SSE produce all 0s or all 1s in each SIMD lane,
vector select could be simplified to AND/OR or removed if one or both values
being selected is all 0s or all 1s.

llvm-svn: 179267

55658d42
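
The identities behind this simplification can be checked with a small lane-wise model. This is only a sketch assuming 32-bit lanes and a condition vector whose lanes are all 0s or all 1s, as a packed compare would produce; the helpers are hypothetical.

```python
LANE_MASK = 0xFFFFFFFF  # model 32-bit lanes

def vselect(cond, on_true, on_false):
    """Reference select: each cond lane is all 1s (pick on_true) or all 0s."""
    return [t if c == LANE_MASK else f for c, t, f in zip(cond, on_true, on_false)]

def vand(a, b):
    return [x & y for x, y in zip(a, b)]

def vor(a, b):
    return [x | y for x, y in zip(a, b)]

cond = [LANE_MASK, 0, LANE_MASK, 0]   # e.g. the result of a packed compare
x = [1, 2, 3, 4]
zeros = [0] * 4
ones = [LANE_MASK] * 4

# select(cond, x, 0)     == cond AND x
# select(cond, 1s, x)    == cond OR x
# select(cond, 1s, 0)    == cond (the select disappears)
print(vselect(cond, x, zeros) == vand(cond, x))
print(vselect(cond, ones, x) == vor(cond, x))
print(vselect(cond, ones, zeros) == cond)
```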

Enhance bool simplification in X86 to handle more cases · f7bf8705

Michael Liao authored Apr 11, 2013

This patch is revised based on a patch from Victor Umansky
<victor.umansky@intel.com>. More cases are handled in X86's bool
simplification, i.e.
- SETCC_CARRY
- value is truncated to i1 with AND

As a by-product, PR5443 is also fixed.

llvm-svn: 179265

f7bf8705

Apr 10, 2013
- __sincosf_stret returns sinf / cosf in bits 0:31 and 32:63 of xmm0, not in xmm0 / xmm1. · ac0469c5
  Evan Cheng authored Apr 10, 2013
```
rdar://13599493

llvm-svn: 179141
```
  ac0469c5
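
To picture the layout this commit describes, the sketch below packs two single-precision values into one 64-bit integer with sinf in bits 0:31 and cosf in bits 32:63, then splits them again. It only models the bit layout of the low half of xmm0; `pack_sincos` and `unpack_sincos` are hypothetical names, not part of any runtime.

```python
import math
import struct

def pack_sincos(angle):
    """Build the low 64 bits of the return register: sinf in 0:31, cosf in 32:63."""
    sin_bits, = struct.unpack("<I", struct.pack("<f", math.sin(angle)))
    cos_bits, = struct.unpack("<I", struct.pack("<f", math.cos(angle)))
    return sin_bits | (cos_bits << 32)

def unpack_sincos(low64):
    """Split the packed value back into (sinf, cosf) single-precision floats."""
    sin_val, = struct.unpack("<f", struct.pack("<I", low64 & 0xFFFFFFFF))
    cos_val, = struct.unpack("<f", struct.pack("<I", low64 >> 32))
    return sin_val, cos_val

s, c = unpack_sincos(pack_sincos(0.5))
print(s, c)
```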
Apr 05, 2013

Use the target options specified on a function to reset the back-end. · eb108bad

Bill Wendling authored Apr 05, 2013

During LTO, the target options on functions within the same Module may
change. This would necessitate resetting some of the back-end. Do this for X86,
because it's a Friday afternoon.

llvm-svn: 178917

eb108bad

Mar 31, 2013
- X86: Promote sitofp <8 x i16> to <8 x i32> when AVX is available. · b60633fb
  Benjamin Kramer authored Mar 31, 2013
```
A vector sext + sitofp is a lot cheaper than 8 scalar conversions.

llvm-svn: 178448
```
  b60633fb
Mar 29, 2013

Remove the old CodePlacementOpt pass. · 70671b99

Benjamin Kramer authored Mar 29, 2013

It was superseded by MachineBlockPlacement and has been disabled by default since LLVM 3.1.

llvm-svn: 178349

70671b99

Add support of RDSEED defined in AVX2 extension · a486a11d
Michael Liao authored Mar 28, 2013
```
llvm-svn: 178314
```
a486a11d

Enhance boolean simplification to handle 16-/64-bit RDRAND · 5fff5c7b

Michael Liao authored Mar 28, 2013

- RDRAND always clears the destination value when a random value is not
  available (i.e. CF == 0). This value is truncated or zero-extended as
  the false boolean value to be returned. Boolean simplification needs
  to skip this 'zext' or 'trunc' node.

llvm-svn: 178312

5fff5c7b

Skip moving call address loading into callseq when targets prefer register indirect call. · 96b42608

Michael Liao authored Mar 28, 2013

To let the load of a call address be folded into the call, the load is
moved from outside the callseq into it. This move adds a non-glued node
(the load) to a glued sequence. The non-glued load is only removed when
DAG selection folds it into a memory-form call instruction. When that
instruction selection is disabled, the leftover load breaks DAG
scheduling.

To prevent that, the move is disabled when the target favors
register-indirect calls.

The previous workaround, which disabled CALL32m/CALL64m instruction
selection, is removed.

llvm-svn: 178308

96b42608

Mar 28, 2013
- Make Win32 put the SRet address into EAX, fixes PR15556 · a2fd5fdd
  Timur Iskhodzhanov authored Mar 28, 2013
```
llvm-svn: 178291
```
  a2fd5fdd
Mar 27, 2013

· 663e6f95

Preston Gurd authored Mar 27, 2013

For the current Atom processor, the fastest way to handle a call
indirect through a memory address is to load the memory address into
a register and then call indirect through the register.

This patch implements this improvement by modifying SelectionDAG to
force a function address which is a memory reference to be loaded
into a virtual register.

Patch by Sriram Murali.

llvm-svn: 178171

663e6f95

Fix typo (common to both X86 and PPC) · 1996f3d8
Hal Finkel authored Mar 27, 2013
```
Thanks to Bill Schmidt for pointing this out during code review!

llvm-svn: 178170
```
1996f3d8

Mar 26, 2013

Add XTEST codegen support · 03f9ad0e
Michael Liao authored Mar 26, 2013
```
llvm-svn: 178083
```
03f9ad0e

Revise alignment checking/calculation on 256-bit unaligned memory access · 5fbcd817

Michael Liao authored Mar 25, 2013

- It's still considered aligned when the specified alignment is larger
  than the natural alignment;
- The new alignment for the high 128-bit vector should be min(16,
  alignment) as the pointer is advanced by 16, a power-of-2 offset.

llvm-svn: 177947

5fbcd817
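
The min(16, alignment) rule can be sanity-checked numerically. The sketch below assumes power-of-2 alignments and small example addresses; `high_half_alignment` is a hypothetical helper, not the name used in the patch.

```python
def high_half_alignment(alignment):
    """Alignment that can be assumed for the upper 128-bit half at base + 16."""
    return min(16, alignment)

def is_aligned(address, alignment):
    return address % alignment == 0

# For each power-of-2 base alignment, every suitably aligned base address
# must leave base + 16 aligned to the computed value.
holds = all(
    is_aligned(base + 16, high_half_alignment(alignment))
    for alignment in (1, 2, 4, 8, 16, 32, 64)
    for base in range(0, 256, alignment)
)
print(holds)
print(is_aligned(32 + 16, 32))
```

The last line shows why the original alignment cannot simply be reused: a 32-byte-aligned base plus 16 is not 32-byte aligned.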

Mar 20, 2013

Fix PR15296 · 0f4ea0c4

Michael Liao authored Mar 20, 2013

- Move SRA/SRL/SHL lowering support from DAG combination to DAG lowering
  to support extended 256-bit integer in AVX but not AVX2.

llvm-svn: 177478

0f4ea0c4

Mark all variable shifts needing customizing · 5a4e81d2

Michael Liao authored Mar 20, 2013

- Prepare moving logic from DAG combining into DAG lowering. There's no
  functionality change.

llvm-svn: 177477

5a4e81d2

Move scalar immediate shift lowering into a dedicated func · 48e8a372
Michael Liao authored Mar 20, 2013
```
- no functionality change

llvm-svn: 177476
```
48e8a372

Mar 19, 2013
- Optimize sext <4 x i8> and <4 x i16> to <4 x i64>. · 0f1bc60d
  Nadav Rotem authored Mar 19, 2013
```
Patch by Ahmad, Muhammad T <muhammad.t.ahmad@intel.com>

llvm-svn: 177421
```
  0f1bc60d
Mar 18, 2013

TLS support for MinGW targets. · 3e7005f1

Anton Korobeynikov authored Mar 18, 2013

MinGW is almost completely compatible with MSVC, with the exception of the _tls_array global not being available.

Patch by David Nadlinger!

llvm-svn: 177257

3e7005f1

Mar 14, 2013
- Fix PR15309 · 20d28704
  Michael Liao authored Mar 14, 2013
```
- Fix the typo on type checking

llvm-svn: 177010
```
  20d28704
Mar 08, 2013

DAGCombiner: Use correct value type for checking legality of BR_CC v3 · b1588fc0

Tom Stellard authored Mar 08, 2013

LegalizeDAG.cpp uses the value of the comparison operands when checking
the legality of BR_CC, so DAGCombiner should do the same.

v2:
  - Expand more BR_CC value types for NVPTX

v3:
  - Expand correct BR_CC value types for Hexagon, Mips, and XCore.

llvm-svn: 176694

b1588fc0

Mar 07, 2013

X86: Fold EXTRACT_SUBVECTORs of a BUILD_VECTOR into a smaller BUILD_VECTOR. · 2c3d0df8

Benjamin Kramer authored Mar 07, 2013

That can usually be lowered efficiently and is common in sandybridge code.
It would be nice to do this in DAGCombiner but we can't insert arbitrary
BUILD_VECTORs this late.

Fixes PR15462.

llvm-svn: 176634

2c3d0df8

Fix two remaining issues after fixing PR15355 when CMOV is not available · d5cac37d

Michael Liao authored Mar 07, 2013

- Phi nodes should be replaced/updated after lowering CMOV into branch
  because 'mainMBB' updating operand in Phi node is changed.
- Add EFLAGS in livein before lowering the 2nd CMOV. It's necessary as
  we will reuse the EFLAGS generated before the 1st lowered CMOV, which
  won't clobber EFLAGS. However, we need to specify that explicitly.
- '-attr=-cmov' test cases are added.

llvm-svn: 176598

d5cac37d

Mar 06, 2013

Fix PR15355 · da22b30b

Michael Liao authored Mar 06, 2013

- Clear 'mayStore' flag when loading from the atomic variable before the
  spin loop
- Clear kill flag from one use to multiple use in registers forming the
  address to that atomic variable
- Don't use a physical register as a live-in register in a BB (neither entry
  nor landing pad); copy it into a virtual register instead

(patch by Cameron Zwarich)

llvm-svn: 176538

da22b30b

Mar 04, 2013

Bypass Slow Divides · 485296d1

Preston Gurd authored Mar 04, 2013

* Only apply the divide bypass optimization when not optimizing for size.
* Fixed a bug caused by creating the 0 constant with type Int32;
  the dividend type is now used to generate the constant instead.
* For Atom x86-64, apply the divide bypass to use 16-bit divides instead of
  64-bit divides when operand values are small enough.
* Added lit tests for 64-bit divide bypass.

Patch by Tyler Nowicki!

llvm-svn: 176442

485296d1
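
The general shape of a divide bypass is a cheap run-time check followed by one of two divide paths. The sketch below is an assumption-laden model for unsigned operands only: the OR-based range test and the names `fits_in_bits` and `bypassed_udiv` are illustrative and are not taken from the patch.

```python
def fits_in_bits(value, bits):
    """True if an unsigned value is representable in `bits` bits."""
    return value >> bits == 0

def bypassed_udiv(dividend, divisor, narrow_bits=16):
    """Return (quotient, remainder, path) for unsigned 64-bit operands.

    A cheap run-time test on (dividend | divisor) picks the narrow divide
    when both operands are small; otherwise the full-width divide runs.
    """
    if fits_in_bits(dividend | divisor, narrow_bits):
        path = "narrow"   # stands in for the short 16-bit divide
    else:
        path = "wide"     # stands in for the slow 64-bit divide
    return dividend // divisor, dividend % divisor, path

print(bypassed_udiv(1000, 7))
print(bypassed_udiv(1 << 40, 7))
```

Both paths compute the same quotient and remainder; the bypass only pays off when the narrow divide is much cheaper than the wide one, which is why the extra compare and branch are skipped when optimizing for size.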

Mar 01, 2013

Fix PR10475 · 6af16fc3

Michael Liao authored Mar 01, 2013

- ISD::SHL/SRL/SRA must have either both scalar or both vector operands
  but TLI.getShiftAmountTy() so far only returns a scalar type. As a
  result, backend logic assuming that breaks.
- Rename the original TLI.getShiftAmountTy() to
  TLI.getScalarShiftAmountTy() and re-define TLI.getShiftAmountTy() to
  return the target-specified scalar type or the same vector type as the
  1st operand.
- Fix most TICG logic assuming TLI.getShiftAmountTy() is a simple scalar
  type.

llvm-svn: 176364

6af16fc3

Feb 26, 2013

Refine fix to PR10499, no functionality change · 609a5272
Michael Liao authored Feb 25, 2013
```
- Put expensive checking after simple one

llvm-svn: 176060
```
609a5272

Fix PR10499 · ab976680

Michael Liao authored Feb 25, 2013

- Check whether SSE is available before lowering all 1s vector building with
  PCMPEQD, which is only available from SSE2

llvm-svn: 176058

ab976680

Feb 24, 2013
- Revert r169638 because it broke Mesa llvmpipe tests. · b532fca9
  Nadav Rotem authored Feb 24, 2013
```
Fix PR15239.

llvm-svn: 175985
```
  b532fca9