Commits · 985b1dc2d8497a17c4b32209aa458ec1fdb757da · Roger Ferrer / llvm-epi-0.8

Sep 25, 2012
- Add missing i64 max/min/umax/umin on 32-bit target · de51caf2
  Michael Liao authored Sep 25, 2012
```
- Turn on atomic6432.ll and add specific test case as well

llvm-svn: 164616
```
  de51caf2
- Fix an illegal tailcall opt where the callee returns a double via xmm while... · 446ff28d
  Evan Cheng authored Sep 25, 2012
```
Fix an illegal tailcall opt where the callee returns a double via xmm while caller returns x86_fp80 via st0. rdar://12229511

llvm-svn: 164588
```
  446ff28d
Sep 20, 2012

Re-work X86 code generation of atomic ops with spin-loop · 3237662b

Michael Liao authored Sep 20, 2012

- Rewrite/merge pseudo-atomic instruction emitters to address the
  following issue:
  * Reduce one unnecessary load in spin-loop

    Previously the spin-loop looked like

        thisMBB:
        newMBB:
          ld  t1 = [bitinstr.addr]
          op  t2 = t1, [bitinstr.val]
          not t3 = t2  (if Invert)
          mov EAX = t1
          lcs dest = [bitinstr.addr], t3  [EAX is implicit]
          bz  newMBB
          fallthrough -->nextMBB

    The 'ld' at the beginning of newMBB should be lifted out of the loop,
    as lcs (or CMPXCHG on x86) loads the current memory value into
    EAX. The loop is refined as:

        thisMBB:
          EAX = LOAD [MI.addr]
        mainMBB:
          t1 = OP [MI.val], EAX
          LCMPXCHG [MI.addr], t1, [EAX is implicitly used & defined]
          JNE mainMBB
        sinkMBB:

  * Remove immopc as, so far, all pseudo-atomic instructions have
    all-register form only; there is no immediate operand.

  * Remove unnecessary attributes/modifiers in pseudo-atomic instruction
    td

  * Fix issues in PR13458

- Add comprehensive tests on atomic ops on various data types.
  NOTE: Some of them are turned off due to missing functionality.

- Revise tests due to the new spin-loop generated.

llvm-svn: 164281
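
The refined loop shape can be illustrated at the source level. The following is a minimal sketch, not the code from this commit: `atomic_rmw` and `atomic_rmw_self_check` are hypothetical names, and `std::atomic::compare_exchange_weak` stands in for LCMPXCHG. As with CMPXCHG and EAX, a failed exchange writes the current memory value back into `expected`, so the load happens only once, before the loop.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: atomically replaces addr with op(addr, val) and
// returns the previous value, mirroring the refined loop shape.
template <typename Op>
uint32_t atomic_rmw(std::atomic<uint32_t> &addr, uint32_t val, Op op) {
  // thisMBB: the only explicit load, performed once before the loop.
  uint32_t expected = addr.load();
  // mainMBB: compute the new value and attempt the compare-and-swap.
  // On failure, compare_exchange_weak stores the current memory value in
  // 'expected', so the loop body needs no reload.
  while (!addr.compare_exchange_weak(expected, op(expected, val))) {
  }
  // sinkMBB: 'expected' holds the value observed just before the update.
  return expected;
}

// Small self-check using a NAND-style operation.
inline void atomic_rmw_self_check() {
  std::atomic<uint32_t> x{0xF0u};
  const uint32_t old = atomic_rmw(
      x, 0x3Cu, [](uint32_t a, uint32_t b) { return ~(a & b); });
  assert(old == 0xF0u);
  assert(x.load() == 0xFFFFFFCFu);
}
```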

3237662b

Sep 13, 2012

Add wider vector/integer support for PR12312 · 137f8aed

Michael Liao authored Sep 13, 2012

- Enhance the fix to PR12312 to support wider integers, such as 256-bit
  integers. If more than one fully evaluated vector is found, POR them
  first, followed by the final PTEST.

llvm-svn: 163832
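
A portable model of the POR-then-PTEST idea, as a rough sketch only. It assumes the pattern being lowered is an all-zero test of a wide integer split into 128-bit pieces, which the message above does not state explicitly. `Vec128`, `por`, `ptest_zf` and `is_zero_256` are hypothetical names; the only hardware fact relied on is that PTEST sets ZF when the AND of its two operands is all zeros.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one 128-bit vector register.
struct Vec128 {
  uint64_t lo;
  uint64_t hi;
};

// Models POR: bitwise OR of two 128-bit values.
constexpr Vec128 por(Vec128 a, Vec128 b) {
  return Vec128{a.lo | b.lo, a.hi | b.hi};
}

// Models the ZF result of PTEST: set when (a AND b) is all zeros.
constexpr bool ptest_zf(Vec128 a, Vec128 b) {
  return ((a.lo & b.lo) | (a.hi & b.hi)) == 0;
}

// A 256-bit value is zero exactly when the OR of its two halves is zero,
// so one POR followed by one PTEST of the result against itself suffices.
constexpr bool is_zero_256(Vec128 low_half, Vec128 high_half) {
  return ptest_zf(por(low_half, high_half), por(low_half, high_half));
}

inline void wide_zero_self_check() {
  assert(is_zero_256(Vec128{0, 0}, Vec128{0, 0}));
  assert(!is_zero_256(Vec128{0, 0}, Vec128{0, 1}));
  assert(!is_zero_256(Vec128{4, 0}, Vec128{0, 0}));
}
```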

137f8aed

Sep 11, 2012
- Make a bunch of lowering helper functions static instead of member functions. No functional change. · a29ed865
  Craig Topper authored Sep 11, 2012
```
llvm-svn: 163596
```
  a29ed865
Aug 19, 2012

When unsafe math is used, we can use commutative FMAX and FMIN. In some cases · 178250ad

Nadav Rotem authored Aug 19, 2012

this allows for better code generation.

Added a new DAGCombine transformation to convert FMAX and FMIN to FMAXC and
FMINC, which are commutative.

For example:

  movaps  %xmm0, %xmm1
  movsd LC(%rip), %xmm0
  minsd %xmm1, %xmm0

becomes:

  minsd LC(%rip), %xmm0

llvm-svn: 162187
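
Why the plain min is not commutative can be seen with a small model. This is an illustrative sketch, not code from the commit: `minsd_model` is a hypothetical function that mimics the MINSD rule of returning the second operand whenever the first is not strictly less than it, which matters for NaN and signed zero. Under unsafe math those cases are ignored, so the operands may be swapped.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical model of MINSD: the result is the second operand unless the
// first operand compares strictly less than it.
inline double minsd_model(double first, double second) {
  return first < second ? first : second;
}

inline void minsd_self_check() {
  const double nan = std::numeric_limits<double>::quiet_NaN();
  // Swapping the operands changes the result when one of them is NaN.
  assert(minsd_model(nan, 1.0) == 1.0);
  assert(std::isnan(minsd_model(1.0, nan)));
  // It also changes the sign of the result when both are zero.
  assert(std::signbit(minsd_model(0.0, -0.0)));
  assert(!std::signbit(minsd_model(-0.0, 0.0)));
  // For ordinary values the order does not matter.
  assert(minsd_model(2.0, 3.0) == minsd_model(3.0, 2.0));
}
```

The self-check assumes strict IEEE semantics; compiling it with a fast-math flag may invalidate the NaN and signed-zero assertions.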

178250ad

Aug 17, 2012

Make ReplaceATOMIC_BINARY_64 a static function. Use a nested switch to reduce... · 602e1abe

Craig Topper authored Aug 17, 2012

Make ReplaceATOMIC_BINARY_64 a static function. Use a nested switch to reduce to only a single call to it thus allowing it to be inlined by the compiler.

llvm-svn: 162088

602e1abe

Aug 14, 2012

fix PR11334 · 34107b91

Michael Liao authored Aug 14, 2012

- FP_EXTEND only supports extending from vectors with a matching number of
  elements. This results in the scalarization of extending to v2f64 from
  v2f32, which will be legalized to v4f32, not matching v2f64.
- add X86-specific VFPEXT supporting extending from v4f32 to v2f64.
- add BUILD_VECTOR lowering helper to recover the original
  extending from v4f32 to v2f64.
- test case is enhanced to include different vector widths.

llvm-svn: 161894

34107b91

Aug 13, 2012

Remove the LowerMMXCONCAT_VECTORS function. It could never execute because... · a7aaa62d

Craig Topper authored Aug 13, 2012

Remove the LowerMMXCONCAT_VECTORS function. It could never execute because there are no legal 64-bit vector types that could be used as inputs to a 128-bit concat_vectors. Remove a target specific SDNode and its patterns that become unused as a result.

llvm-svn: 161742

a7aaa62d

Aug 06, 2012

Implement proper handling for pcmpistri/pcmpestri intrinsics. Requires custom... · ab47fe4e

Craig Topper authored Aug 06, 2012

Implement proper handling for pcmpistri/pcmpestri intrinsics. Requires custom handling in DAGISelToDAG due to limitations in TableGen's implicit def handling. Fixes PR11305.

llvm-svn: 161318

ab47fe4e

Aug 03, 2012

Fall back to selection DAG isel for calls to builtin functions. · 3e6fa462

Bob Wilson authored Aug 03, 2012

Fast isel doesn't currently have support for translating builtin function
calls to target instructions. For embedded environments where the library
functions are not available, this is a matter of correctness and not
just optimization. Most of this patch is just arranging to make the
TargetLibraryInfo available in fast isel. <rdar://problem/12008746>

llvm-svn: 161232

3e6fa462

Aug 01, 2012
- Added FMA functionality to X86 target. · 3cb3b004
  Elena Demikhovsky authored Aug 01, 2012
```
llvm-svn: 161110
```
  3cb3b004
Jul 19, 2012
- Remove tabs. · 318f03f5
  Bill Wendling authored Jul 19, 2012
```
llvm-svn: 160479
```
  318f03f5
Jul 17, 2012

This is another case where instcombine demanded bits optimization created · f579beca

Evan Cheng authored Jul 17, 2012

large immediates. Add dag combine logic to recover in case the large
immediate doesn't fit in the cmp immediate operand field.

int foo(unsigned long l) {
  return (l>> 47) == 1;
}

we produce

  %shr.mask = and i64 %l, -140737488355328
  %cmp = icmp eq i64 %shr.mask, 140737488355328
  %conv = zext i1 %cmp to i32
  ret i32 %conv

which codegens to

movq    $0xffff800000000000,%rax
andq    %rdi,%rax
movq    $0x0000800000000000,%rcx
cmpq    %rcx,%rax
sete    %al
movzbl    %al,%eax
ret

TargetLowering::SimplifySetCC would transform
(X & -256) == 256 -> (X >> 8) == 1
if the immediate fails the isLegalICmpImmediate() test. For x86,
that's immediates which are not a signed 32-bit immediate.
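
The equivalence behind this transform can be checked directly. The sketch below is illustrative only: `mask_form`, `shift_form` and `fits_simm32` are hypothetical helpers, with `fits_simm32` standing in for the signed 32-bit immediate rule described above rather than for the real isLegalICmpImmediate() hook.

```cpp
#include <cassert>
#include <cstdint>

// The form instcombine produces: mask off the low 47 bits, then compare.
constexpr bool mask_form(uint64_t l) {
  return (l & 0xFFFF800000000000ull) == 0x0000800000000000ull;
}

// The form the DAG combine recovers: shift, then compare with a small value.
constexpr bool shift_form(uint64_t l) { return (l >> 47) == 1; }

// Hypothetical stand-in for the x86 rule: a compare immediate must fit in a
// signed 32-bit field.
constexpr bool fits_simm32(int64_t imm) {
  return imm >= INT32_MIN && imm <= INT32_MAX;
}

inline void setcc_self_check() {
  const uint64_t samples[] = {0ull, 1ull << 46, 1ull << 47,
                              (1ull << 47) + 12345ull, 1ull << 48,
                              ~0ull};
  for (uint64_t l : samples) {
    assert(mask_form(l) == shift_form(l));
  }
  // 2^47 cannot be encoded as a cmp immediate, while 1 can.
  assert(!fits_simm32(140737488355328LL));
  assert(fits_simm32(1));
}
```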

Based on a patch by Eli Friedman.

PR10328
rdar://9758774

llvm-svn: 160346

f579beca

Jul 12, 2012

Add intrinsics for Ivy Bridge's rdrand instruction. · 0ab2794e

Benjamin Kramer authored Jul 12, 2012

The rdrand/cmov sequence is the same that is emitted by both
GCC and ICC.

Fixes PR13284.

llvm-svn: 160117

0ab2794e

Jun 09, 2012

Use XOP vpcom intrinsics in patterns instead of a target specific SDNode type.... · a54893c6

Craig Topper authored Jun 09, 2012

Use XOP vpcom intrinsics in patterns instead of a target specific SDNode type. Remove the custom lowering code that selected the SDNode type.

llvm-svn: 158279

a54893c6

Jun 01, 2012

Implement the local-dynamic TLS model for x86 (PR3985) · 789acfb6

Hans Wennborg authored Jun 01, 2012

This implements codegen support for accesses to thread-local variables
using the local-dynamic model, and adds a clean-up pass so that the base
address for the TLS block can be re-used between local-dynamic accesses on
an execution path.

llvm-svn: 157818

789acfb6

May 25, 2012

Change interface for TargetLowering::LowerCallTo and TargetLowering::LowerCall · aa58397b

Justin Holewinski authored May 25, 2012

to pass around a struct instead of a large set of individual values.  This
cleans up the interface and allows more information to be added to the struct
for future targets without requiring changes to each and every target.

NV_CONTRIB

llvm-svn: 157479

aa58397b

Apr 27, 2012

X86: Don't emit conditional floating point moves when targeting pre-pentiumpro architectures. · 913da4b2

Benjamin Kramer authored Apr 27, 2012

* Model FPSW (the FPU status word) as a register.
* Add ISel patterns for the FUCOM*, FNSTSW and SAHF instructions.
* During Legalize/Lowering, build a node sequence to transfer the comparison
result from FPSW into EFLAGS. If you're wondering about the right-shift: That's
an implicit sub-register extraction (%ax -> %ah) which is handled later on by
the instruction selector.
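
The FPSW-to-EFLAGS transfer can be modelled in a few lines. This is a sketch under stated assumptions, not the lowering code itself: `Flags`, `ah_of` and `flags_from_fpsw` are hypothetical names. It relies on the x87 status word keeping C0, C2 and C3 in bits 8, 10 and 14, and on SAHF copying bits 0, 2 and 6 of AH into CF, PF and ZF, which is why a right shift by 8 (the %ax -> %ah extraction) lines the condition codes up with the flags.

```cpp
#include <cassert>
#include <cstdint>

// x87 condition-code bits in the FPU status word.
constexpr uint16_t kC0 = 1u << 8;
constexpr uint16_t kC2 = 1u << 10;
constexpr uint16_t kC3 = 1u << 14;

// Hypothetical container for the three flags of interest.
struct Flags {
  bool cf;
  bool pf;
  bool zf;
};

// The right shift by 8 models extracting AH from AX after FNSTSW.
constexpr uint8_t ah_of(uint16_t ax) { return static_cast<uint8_t>(ax >> 8); }

// Models SAHF for the relevant bits: AH bit 0 -> CF, bit 2 -> PF, bit 6 -> ZF.
constexpr Flags flags_from_fpsw(uint16_t fpsw) {
  return Flags{(ah_of(fpsw) & 0x01) != 0, (ah_of(fpsw) & 0x04) != 0,
               (ah_of(fpsw) & 0x40) != 0};
}

inline void fpsw_self_check() {
  const Flags less = flags_from_fpsw(kC0);  // ST(0) < source
  assert(less.cf && !less.pf && !less.zf);
  const Flags equal = flags_from_fpsw(kC3);  // ST(0) == source
  assert(!equal.cf && !equal.pf && equal.zf);
  const Flags unordered = flags_from_fpsw(kC0 | kC2 | kC3);
  assert(unordered.cf && unordered.pf && unordered.zf);
}
```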

Fixes PR6679. Patch by Christoph Erhardt!

llvm-svn: 155704

913da4b2

Apr 16, 2012
- Merge vpermps/vpermd and vpermpd/vpermq SD nodes. · b86fa404
  Craig Topper authored Apr 16, 2012
```
llvm-svn: 154782
```
  b86fa404
Apr 15, 2012
- Added VPERM optimization for AVX2 shuffles · 779a72b4
  Elena Demikhovsky authored Apr 15, 2012
```
llvm-svn: 154761
```
  779a72b4
Apr 14, 2012
- Fix X86 codegen for 'atomicrmw nand' to generate *x = ~(*x & y), not *x = ~*x & y. · 3e8f1f6a
  Richard Smith authored Apr 13, 2012
```
llvm-svn: 154705
```
  3e8f1f6a
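
The difference between the two formulas is easy to see on concrete values. A minimal, non-atomic sketch with hypothetical helper names (`nand_correct`, `nand_buggy`); it only illustrates the arithmetic, not the generated code:

```cpp
#include <cassert>
#include <cstdint>

// Intended semantics of 'atomicrmw nand': new value is ~(old & operand).
constexpr uint32_t nand_correct(uint32_t x, uint32_t y) { return ~(x & y); }

// What the earlier codegen computed instead: (~old) & operand.
constexpr uint32_t nand_buggy(uint32_t x, uint32_t y) { return ~x & y; }

inline void nand_self_check() {
  assert(nand_correct(0xF0u, 0x3Cu) == 0xFFFFFFCFu);
  assert(nand_buggy(0xF0u, 0x3Cu) == 0x0000000Cu);
  assert(nand_correct(0xF0u, 0x3Cu) != nand_buggy(0xF0u, 0x3Cu));
}
```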
Apr 11, 2012

Reapply 154396 after fixing a test. · 9bc178ac

Nadav Rotem authored Apr 11, 2012

Original message:
Modify the code that lowers shuffles to blends from using blendvXX to vblendXX.
blendV uses a register for the selection while Vblend uses an immediate.
On sandybridge they still have the same latency and execute on the same execution ports.

llvm-svn: 154483

9bc178ac

Apr 10, 2012

Temporarily revert this patch to see if it brings the buildbots back. · 65ada95b
Eric Christopher authored Apr 10, 2012
```
llvm-svn: 154425
```
65ada95b

Modify the code that lowers shuffles to blends from using blendvXX to vblendXX. · f934f917

Nadav Rotem authored Apr 10, 2012

blendv uses a register for the selection while vblend uses an immediate.
On sandybridge they still have the same latency and execute on the same execution ports.

llvm-svn: 154396

f934f917

Fix a long standing tail call optimization bug. When a libcall is emitted · f8bad080

Evan Cheng authored Apr 10, 2012

the legalizer always uses the DAG entry node. This is wrong when the libcall is
emitted as a tail call since it effectively folds the return node. If
the return node's input chain is not the entry (i.e. call, load, or store)
use that as the tail call input chain.

PR12419
rdar://9770785
rdar://11195178

llvm-svn: 154370

f8bad080

Apr 09, 2012
- Fix a bug in the lowering of broadcasts: ConstantPools need to use the target pointer type. · b801ca39
  Nadav Rotem authored Apr 09, 2012
```
Move NormalizeVectorShuffle and LowerVectorBroadcast into X86TargetLowering.

llvm-svn: 154310
```
  b801ca39
Apr 04, 2012

Always compute all the bits in ComputeMaskedBits. · ba0a6cab

Rafael Espindola authored Apr 04, 2012

This allows us to keep passing reduced masks to SimplifyDemandedBits, but
know about all the bits if SimplifyDemandedBits fails. This allows instcombine
to simplify cases like the one in the included testcase.

llvm-svn: 154011

ba0a6cab

Feb 28, 2012

Re-commit r151623 with fix. Only issue special no-return calls if it's a direct call. · 65f9d19c
Evan Cheng authored Feb 28, 2012
```
llvm-svn: 151645
```
65f9d19c

Revert r151623 "Some ARM implementations, e.g. A-series, do return stack... · ee7b8993

Daniel Dunbar authored Feb 28, 2012

Revert r151623 "Some ARM implementations, e.g. A-series, do return stack prediction. ...", it is breaking the Clang build during the Compiler-RT part.

llvm-svn: 151630

ee7b8993

Some ARM implementations, e.g. A-series, do return stack prediction. That is, · 87c7b09d

Evan Cheng authored Feb 28, 2012

the processor keeps a return address stack (RAS) which stores the address
and the instruction execution state of the instruction after a function-call
type branch instruction.

Calling a "noreturn" function with normal call instructions (e.g. bl) can
corrupt the RAS and cause 100% return misprediction, so LLVM should use an
unconditional branch instead, i.e.
mov lr, pc
b _foo
The "mov lr, pc" is issued in order to get proper backtrace.

rdar://8979299

llvm-svn: 151623

87c7b09d

Feb 25, 2012

Target/X86: Fix assertion failures and warnings caused by r151382 _ftol2... · bdf94879

NAKAMURA Takumi authored Feb 25, 2012

Target/X86: Fix assertion failures and warnings caused by r151382 _ftol2 lowering for i386-*-win32 targets. Patch by Joe Groff.

[Joe Groff] Hi everyone. My previous patch applied as r151382 had a few problems:
Clang raised a warning, and X86 LowerOperation would assert out for
fptoui f64 to i32 because it improperly lowered to an illegal
BUILD_PAIR. Here's a patch that addresses these issues. Let me know if
any other changes are necessary. Thanks.

llvm-svn: 151432

bdf94879

Feb 24, 2012
- Add WIN_FTOL_* pseudo-instructions to model the unique calling convention · 248d65e7
  Michael J. Spencer authored Feb 24, 2012
```
used by the Win32 _ftol2 runtime function. Patch by Joe Groff!

llvm-svn: 151382
```
  248d65e7
Feb 22, 2012

Make all pointers to TargetRegisterClass const since they are all pointers to... · 760b134f

Craig Topper authored Feb 22, 2012

Make all pointers to TargetRegisterClass const since they are all pointers to static data that should not be modified.

llvm-svn: 151134

760b134f

Feb 19, 2012
- Make a bunch of X86ISelLowering shuffle functions static now that they are no... · 3e5c04e4
  Craig Topper authored Feb 19, 2012
```
Make a bunch of X86ISelLowering shuffle functions static now that they are no longer needed by isel.

llvm-svn: 150908
```
  3e5c04e4
Feb 05, 2012

Add target specific node for PMULUDQ. Change patterns to use it and custom... · 1d471e31

Craig Topper authored Feb 05, 2012

Add target specific node for PMULUDQ. Change patterns to use it and custom lower intrinsics to it. Use it instead of intrinsic to handle 64-bit vector multiplies.

llvm-svn: 149807

1d471e31

Feb 02, 2012
- Optimization for SIGN_EXTEND operation on AVX. · fb44980b
  Elena Demikhovsky authored Feb 02, 2012
```
Special handling was added for v4i32 -> v4i64 and v8i16 -> v8i32
extensions.

llvm-svn: 149600
```
  fb44980b
Feb 01, 2012
- Optimization for "truncate" operation on AVX. · 0e48c70b
  Elena Demikhovsky authored Feb 01, 2012
```
Truncating v4i64 -> v4i32 and v8i32 -> v8i16 may be done with a set of shuffles.

llvm-svn: 149485
```
  0e48c70b
Jan 30, 2012

Move some XOP patterns into instruction definition. Replace VPCMOV intrinsic... · ca29bcfc

Craig Topper authored Jan 30, 2012

Move some XOP patterns into instruction definition. Replace VPCMOV intrinsic patterns with custom lowering to target specific nodes.

llvm-svn: 149216

ca29bcfc

Jan 23, 2012
- Combine X86 CMPPD and CMPPS node types. Simplifies selection code and pattern matching. · 0b7ad76b
  Craig Topper authored Jan 22, 2012
```
llvm-svn: 148670
```
  0b7ad76b