Commits · f036ca466e72be59530679068ac8f817c3b1dbe5 · Roger Ferrer / llvm-epi-0.8

Nov 06, 2012
- CostModel: add another known vector trunc optimization. · f036ca46
  Nadav Rotem authored Nov 06, 2012
```
llvm-svn: 167488
```
  f036ca46
- Cost Model: add tables for some avx type-conversion hacks. · 0914f0b2
  Nadav Rotem authored Nov 06, 2012
```
llvm-svn: 167480
```
  0914f0b2
- Refactor the getTypeLegalizationCost interface. No functionality change. · 48c5b8e6
  Nadav Rotem authored Nov 05, 2012
```
llvm-svn: 167422
```
  48c5b8e6
- CostModel: Add tables for the common x86 compares. · c378a806
  Nadav Rotem authored Nov 05, 2012
```
llvm-svn: 167421
```
  c378a806
Nov 05, 2012
- Suppress signed/unsigned comparison warning. · 18d27620
  Richard Smith authored Nov 05, 2012
```
llvm-svn: 167410
```
  18d27620
- Cost Model: Normalize the insert/extract index when splitting types · 856ffa66
  Nadav Rotem authored Nov 05, 2012
```
llvm-svn: 167402
```
  856ffa66
- Implement the cost of abnormal x86 instruction lowering as a table. · 7411623f
  Nadav Rotem authored Nov 05, 2012
```
llvm-svn: 167395
```
  7411623f
Nov 03, 2012
- X86 CostModel: Add support for a some of the common arithmetic instructions for SSE4, AVX and AVX2. · c2345cbe
  Nadav Rotem authored Nov 03, 2012
```
llvm-svn: 167347
```
  c2345cbe
Nov 01, 2012

Revert the majority of the next patch in the address space series: · 5da3f051

Chandler Carruth authored Nov 01, 2012

r165941: Resubmit the changes to llvm core to update the functions to
         support different pointer sizes on a per address space basis.

Despite this commit log, this change primarily changed stuff outside of
VMCore, and those changes do not carry any tests for correctness (or
even plausibility), and we have consistently found questionable or flat
out incorrect cases in these changes. Most of them are probably correct,
but we need to devise a system that makes it more clear when we have
handled the address space concerns correctly, and ideally each pass that
gets updated would receive an accompanying test case that exercises that
pass specificaly w.r.t. alternate address spaces.

However, from this commit, I have retained the new C API entry points.
Those were an orthogonal change that probably should have been split
apart, but they seem entirely good.

In several places the changes were very obvious cleanups with no actual
multiple address space code added; these I have not reverted when
I spotted them.

In a few other places there were merge conflicts due to a cleaner
solution being implemented later, often not using address spaces at all.
In those cases, I've preserved the new code which isn't address space
dependent.

This is part of my ongoing effort to clean out the partial address space
code which carries high risk and low test coverage, and not likely to be
finished before the 3.2 release looms closer. Duncan and I would both
like to see the above issues addressed before we return to these
changes.

llvm-svn: 167222

5da3f051

(For X86) Enhancement to add-carray/sub-borrow (adc/sbb) optimization. · 01efdd6c

Shuxin Yang authored Oct 31, 2012

  The adc/sbb optimization is to able to convert following expression
into a single adc/sbb instruction:
  (ult) ... = x + 1 // where the ult is unsigned-less-than comparison
  (ult) ... = x - 1

  This change is to flip the "x >u y" (i.e. ugt comparison) in order 
to expose the adc/sbb opportunity.

llvm-svn: 167180

01efdd6c

Oct 31, 2012
- Clean up redundant SP register maintained in X86 TLI · e2d7e4e8
  Michael Liao authored Oct 31, 2012
```
llvm-svn: 167104
```
  e2d7e4e8
Oct 30, 2012
- X86 MMX: optimize transfer from mmx to i32 · acb8becc
  Manman Ren authored Oct 30, 2012
```
We used to generate a store (movq) + a load.
Now we use movd.

rdar://9946746

llvm-svn: 167056
```
  acb8becc
- Re-commit r166971. I reverted it to quickly, when buildbots didn't have a chance · a3d8e997
  Jakub Staszak authored Oct 30, 2012
```
to test it with chapni's fix (-mattr=+avx).

llvm-svn: 166985
```
  a3d8e997
- Revert r166971. It causes buildbot failure. To be investigated. · d74cb61d
  Jakub Staszak authored Oct 29, 2012
```
llvm-svn: 166979
```
  d74cb61d
Oct 29, 2012

Remove unused variable. · c3a92131
Jakub Staszak authored Oct 29, 2012
```
llvm-svn: 166973
```
c3a92131
Simplify code. No functionality change. · 9c361bdf
Jakub Staszak authored Oct 29, 2012
```
llvm-svn: 166972
```
9c361bdf

Allow to fold vector load if there is more than one bitcast, so in the case: · c8f4825b

Jakub Staszak authored Oct 29, 2012

%0 = load <8 x i16>* %dest
%1 = shufflevector <8 x i16> %0, <8 x i16> %in,
      <8 x i32> < i32 0, i32 1, i32 2, i32 3, i32 13, i32 undef, i32 14, i32 14>
store <8 x i16> %1, <8 x i16>* %dest

We get:
  vmovlpd (%eax), %xmm0, %xmm0

instead of:
  vmovaps (%eax), %xmm1
  vmovsd  %xmm1, %xmm0, %xmm0

No extra test-case is added. I just fixed the existing one
(also it uses FileCheck now).

llvm-svn: 166971

c8f4825b

Silence a GCC warning about comparing signed and unsigned types. · ac8448e0
Duncan Sands authored Oct 29, 2012
```
llvm-svn: 166922
```
ac8448e0

Oct 25, 2012
- Clean up where SlotSize should be used instead of pointer size. · 6d810bd9
  Michael Liao authored Oct 25, 2012
```
llvm-svn: 166664
```
  6d810bd9
Oct 24, 2012

Add custom conversion from v2u32 to v2f32 in 32-bit mode · c5af149e

Michael Liao authored Oct 24, 2012

- As there's no 64-bit GPRs in 32-bit mode, a custom conversion from v2u32 to
  v2f32 is added to improve the efficiency of the code generated.

llvm-svn: 166545

c5af149e

Oct 23, 2012
- Fix PR14161 · 2843625b
  Michael Liao authored Oct 23, 2012
```
- Check index being extracted to be constant 0 before simplfiying.
  Otherwise, retain the original sequence.

llvm-svn: 166504
```
  2843625b
- Silence -Wsign-compare · bdcebd32
  Matt Beaumont-Gay authored Oct 23, 2012
```
llvm-svn: 166494
```
  bdcebd32
- Add custom UINT_TO_FP from v4i8/v4i16/v8i8/v8i16 to v4f32/v8f32 · c03c03d5
  Michael Liao authored Oct 23, 2012
```
- Replace v4i8/v8i8 -> v8f32 DAG combine with custom lowering to reduce
  DAG combine overhead.
- Extend the support to v4i16/v8i16 as well.

llvm-svn: 166487
```
  c03c03d5
- Enable lowering ZERO_EXTEND/ANY_EXTEND to PMOVZX from SSE4.1 · 1be96bb5
  Michael Liao authored Oct 23, 2012
```
llvm-svn: 166486
```
  1be96bb5
Oct 19, 2012

This patch is to fix radar://8426430 . It is about llvm support of __builtin_debugtrap() · cdde059a

Shuxin Yang authored Oct 19, 2012

which is supposed to consistently raise SIGTRAP across all systems. In contrast,
__builtin_trap() behave differently on different systems. e.g. it raises SIGTRAP on ARM, and
SIGILL on X86. The purpose of __builtin_debugtrap() is to consistently provide "trap"
functionality, in the mean time preserve the compatibility with on gcc on __builtin_trap().

  The X86 backend is already able to handle debugtrap(). This patch is to:
  1) make front-end recognize "__builtin_debugtrap()" (emboddied in the one-line change to Clang).
  2) In DAG legalization phase, by default, "debugtrap" will be replaced with "trap", which
     make the __builtin_debugtrap() "available" to all existing ports without the hassle of
     changing their code.
  3) If trap-function is specified (via -trap-func=xyz to llc), both __builtin_debugtrap() and
     __builtin_trap() will be expanded into the function call of the specified trap function.
    This behavior may need change in the future.

  The provided testing-case is to make sure 2) and 3) are working for ARM port, and we
already have a testing case for x86. 

llvm-svn: 166300

cdde059a

Lower BUILD_VECTOR to SHUFFLE + INSERT_VECTOR_ELT for X86 · 4b7ccfca

Michael Liao authored Oct 19, 2012

- If INSERT_VECTOR_ELT is supported (above SSE2, either by custom
  sequence of legal insn), transform BUILD_VECTOR into SHUFFLE +
  INSERT_VECTOR_ELT if most of elements could be built from SHUFFLE with few
  (so far 1) elements being inserted.

llvm-svn: 166288

4b7ccfca

Oct 17, 2012

Check SSSE3 instead of SSE4.1 · cef9541d

Michael Liao authored Oct 17, 2012

- All shuffle insns required, especially PSHUB, are added in SSSE3.

llvm-svn: 166086

cef9541d

Fix setjmp on models with non-Small code model nor non-Static relocation model · 6f720613

Michael Liao authored Oct 17, 2012

- MBB address is only valid as an immediate value in Small & Static
  code/relocation models. On other models, LEA is needed to load IP address of
  the restore MBB.
- A minor fix of MBB in MC lowering is added as well to enable target
  relocation flag being propagated into MC.

llvm-svn: 166084

6f720613

Oct 16, 2012

Support v8f32 to v8i8/vi816 conversion through custom lowering · 02ca3454

Michael Liao authored Oct 16, 2012

- Add custom FP_TO_SINT on v8i16 (and v8i8 which is legalized as v8i16 due to
  vector element-wise widening) to reduce DAG combiner and its overhead added
  in X86 backend.

llvm-svn: 166036

02ca3454

Reapply r165661, Patch by Shuxin Yang <shuxin.llvm@gmail.com>. · 1705a999

NAKAMURA Takumi authored Oct 16, 2012

Original message:

The attached is the fix to radar://11663049. The optimization can be outlined by following rules:

   (select (x != c), e, c) -> select (x != c), e, x),
   (select (x == c), c, e) -> select (x == c), x, e)
where the <c> is an integer constant.

 The reason for this change is that : on x86, conditional-move-from-constant needs two instructions;
however, conditional-move-from-register need only one instruction.

  While the LowerSELECT() sounds to be the most convenient place for this optimization, it turns out to be a bad place. The reason is that by replacing the constant <c> with a symbolic value, it obscure some instruction-combining opportunities which would otherwise be very easy to spot. For that reason, I have to postpone the change to last instruction-combining phase.

  The change passes the test of "make check-all -C <build-root/test" and "make -C project/test-suite/SingleSource".

Original message since r165661:

My previous change has a bug: I negated the condition code of a CMOV, and go ahead creating a new CMOV using the *ORIGINAL* condition code.

llvm-svn: 166017

1705a999

Add __builtin_setjmp/_longjmp supprt in X86 backend · 97bf363a

Michael Liao authored Oct 15, 2012

- Besides used in SjLj exception handling, __builtin_setjmp/__longjmp is also
  used as a light-weight replacement of setjmp/longjmp which are used to
  implementation continuation, user-level threading, and etc. The support added
  in this patch ONLY addresses this usage and is NOT intended to support SjLj
  exception handling as zero-cost DWARF exception handling is used by default
  in X86.

llvm-svn: 165989

97bf363a

Oct 15, 2012

Resubmit the changes to llvm core to update the functions to support different... · 4bb926d9

Micah Villmow authored Oct 15, 2012

Resubmit the changes to llvm core to update the functions to support different pointer sizes on a per address space basis.

llvm-svn: 165941

4bb926d9

Oct 13, 2012

X86: Fix accidentally swapped operands. · ecd15d7f
Benjamin Kramer authored Oct 13, 2012
```
llvm-svn: 165871
```
ecd15d7f

X86: Promote i8 cmov when both operands are coming from truncates of the same width. · d6b9362f

Benjamin Kramer authored Oct 13, 2012

X86 doesn't have i8 cmovs so isel would emit a branch. Emitting branches at this
level is often not a good idea because it's too late for many optimizations to
kick in. This solution doesn't add any extensions (truncs are free) and tries
to avoid introducing partial register stalls by filtering direct copyfromregs.

I'm seeing a ~10% speedup on reading a random .png file with libpng15 via
graphicsmagick on x86_64/westmere, but YMMV depending on the microarchitecture.

llvm-svn: 165868

d6b9362f

Oct 11, 2012
- Revert 165732 for further review. · 0c61134d
  Micah Villmow authored Oct 11, 2012
```
llvm-svn: 165747
```
  0c61134d
- Add in the first iteration of support for llvm/clang/lldb to allow variable... · 08318973
  Micah Villmow authored Oct 11, 2012
```
Add in the first iteration of support for llvm/clang/lldb to allow variable per address space pointer sizes to be optimized correctly.

llvm-svn: 165726
```
  08318973
- Revert r165661, "Patch by Shuxin Yang <shuxin.llvm@gmail.com>." · da0730c2
  NAKAMURA Takumi authored Oct 11, 2012
```
It broke stage2 clang and test-suite/MultiSource/Benchmarks/mediabench/g721/g721encode.

llvm-svn: 165692
```
  da0730c2
- Change MachineInstrBuilder::addDisp to copy over target flags by default. · 60a25a57
  Evan Cheng authored Oct 11, 2012
```
llvm-svn: 165677
```
  60a25a57
Oct 10, 2012

Patch by Shuxin Yang <shuxin.llvm@gmail.com>. · 17418964

Nadav Rotem authored Oct 10, 2012

Original message:

The attached is the fix to radar://11663049. The optimization can be outlined by following rules:

   (select (x != c), e, c) -> select (x != c), e, x),
   (select (x == c), c, e) -> select (x == c), x, e)
where the <c> is an integer constant.

 The reason for this change is that : on x86, conditional-move-from-constant needs two instructions;
however, conditional-move-from-register need only one instruction.

  While the LowerSELECT() sounds to be the most convenient place for this optimization, it turns out to be a bad place. The reason is that by replacing the constant <c> with a symbolic value, it obscure some instruction-combining opportunities which would otherwise be very easy to spot. For that reason, I have to postpone the change to last instruction-combining phase.

  The change passes the test of "make check-all -C <build-root/test" and "make -C project/test-suite/SingleSource".

llvm-svn: 165661

17418964

Add support for FP_ROUND from v2f64 to v2f32 · e999b865

Michael Liao authored Oct 10, 2012

- Due to the current matching vector elements constraints in
  ISD::FP_ROUND, rounding from v2f64 to v4f32 (after legalization from
  v2f32) is scalarized. Add a customized v2f32 widening to convert it
  into a target-specific X86ISD::VFPROUND to work around this
  constraints.

llvm-svn: 165631

e999b865