Commits · 2843625bb520feca64c25976d2170b700216b562 · Roger Ferrer / llvm-epi-0.8

Oct 23, 2012
- Fix PR14161 · 2843625b
  Michael Liao authored Oct 23, 2012
```
- Check that the index being extracted is constant 0 before simplifying.
  Otherwise, retain the original sequence.

llvm-svn: 166504
```
  2843625b
- Silence -Wsign-compare · bdcebd32
  Matt Beaumont-Gay authored Oct 23, 2012
```
llvm-svn: 166494
```
  bdcebd32
- Add custom UINT_TO_FP from v4i8/v4i16/v8i8/v8i16 to v4f32/v8f32 · c03c03d5
  Michael Liao authored Oct 23, 2012
```
- Replace v4i8/v8i8 -> v8f32 DAG combine with custom lowering to reduce
  DAG combine overhead.
- Extend the support to v4i16/v8i16 as well.

llvm-svn: 166487
```
  c03c03d5
- Enable lowering ZERO_EXTEND/ANY_EXTEND to PMOVZX from SSE4.1 · 1be96bb5
  Michael Liao authored Oct 23, 2012
```
llvm-svn: 166486
```
  1be96bb5
Oct 19, 2012

This patch is to fix radar://8426430 . It is about llvm support of __builtin_debugtrap() · cdde059a

Shuxin Yang authored Oct 19, 2012

which is supposed to consistently raise SIGTRAP across all systems. In contrast,
__builtin_trap() behaves differently on different systems, e.g. it raises SIGTRAP on ARM and
SIGILL on X86. The purpose of __builtin_debugtrap() is to consistently provide "trap"
functionality while preserving compatibility with gcc on __builtin_trap().

  The X86 backend is already able to handle debugtrap(). This patch is to:
  1) make the front-end recognize "__builtin_debugtrap()" (embodied in the one-line change to Clang).
  2) In the DAG legalization phase, by default, "debugtrap" will be replaced with "trap", which
     makes __builtin_debugtrap() "available" to all existing ports without the hassle of
     changing their code.
  3) If a trap function is specified (via -trap-func=xyz to llc), both __builtin_debugtrap() and
     __builtin_trap() will be expanded into a call to the specified trap function.
     This behavior may need to change in the future.

  The provided test case makes sure 2) and 3) work for the ARM port; we
already have a test case for x86.

llvm-svn: 166300

cdde059a
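
The three cases in the message above can be modelled as a small decision function. This is only a sketch of the described behaviour, not the legalizer code: `TrapKind`, `TargetModel`, and `lowerTrap` are hypothetical names, and letting the trap function take priority over native debugtrap support is an assumption read from case 3.

```cpp
#include <string>

// Hypothetical model; none of these names come from LLVM itself.
enum class TrapKind { Trap, DebugTrap };

struct TargetModel {
  bool hasNativeDebugTrap;   // true for a port that handles debugtrap itself
  std::string trapFuncName;  // empty unless a trap function was specified
};

// Returns a textual description of what the builtin is lowered to.
std::string lowerTrap(TrapKind kind, const TargetModel &target) {
  // Case 3: a specified trap function replaces both builtins with a call.
  if (!target.trapFuncName.empty())
    return "call " + target.trapFuncName;
  // Case 2: by default, debugtrap is replaced with trap.
  if (kind == TrapKind::DebugTrap && !target.hasNativeDebugTrap)
    return "trap";
  // Otherwise the node is kept as it is.
  return kind == TrapKind::DebugTrap ? "debugtrap" : "trap";
}
```

With no trap function set, a port without native support sees `lowerTrap(TrapKind::DebugTrap, ...)` fall back to `"trap"`, which is why existing ports need no changes.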

Lower BUILD_VECTOR to SHUFFLE + INSERT_VECTOR_ELT for X86 · 4b7ccfca

Michael Liao authored Oct 19, 2012

- If INSERT_VECTOR_ELT is supported (above SSE2, either by a custom
  sequence or a legal insn), transform BUILD_VECTOR into SHUFFLE +
  INSERT_VECTOR_ELT if most of the elements can be built from SHUFFLE with a few
  (so far 1) elements being inserted.

llvm-svn: 166288

4b7ccfca

Oct 17, 2012

Check SSSE3 instead of SSE4.1 · cef9541d

Michael Liao authored Oct 17, 2012

- All shuffle insns required, especially PSHUFB, are added in SSSE3.

llvm-svn: 166086

cef9541d

Fix setjmp on models with non-Small code model or non-Static relocation model · 6f720613

Michael Liao authored Oct 17, 2012

- MBB address is only valid as an immediate value in Small & Static
  code/relocation models. On other models, LEA is needed to load IP address of
  the restore MBB.
- A minor fix of MBB in MC lowering is added as well to enable target
  relocation flag being propagated into MC.

llvm-svn: 166084

6f720613

Oct 16, 2012

Support v8f32 to v8i8/v8i16 conversion through custom lowering · 02ca3454

Michael Liao authored Oct 16, 2012

- Add custom FP_TO_SINT on v8i16 (and v8i8, which is legalized as v8i16 due to
  vector element-wise widening) to reduce the DAG combines, and their overhead,
  added in the X86 backend.

llvm-svn: 166036

02ca3454

Reapply r165661, Patch by Shuxin Yang <shuxin.llvm@gmail.com>. · 1705a999

NAKAMURA Takumi authored Oct 16, 2012

Original message:

The attached is the fix to radar://11663049. The optimization can be outlined by the following rules:

   (select (x != c), e, c) -> (select (x != c), e, x),
   (select (x == c), c, e) -> (select (x == c), x, e)
where <c> is an integer constant.

 The reason for this change is that, on x86, conditional-move-from-constant needs two instructions;
however, conditional-move-from-register needs only one instruction.

  While LowerSELECT() sounds like the most convenient place for this optimization, it turns out to be a bad place. The reason is that replacing the constant <c> with a symbolic value obscures some instruction-combining opportunities which would otherwise be very easy to spot. For that reason, I have to postpone the change to the last instruction-combining phase.

  The change passes the test of "make check-all -C <build-root/test" and "make -C project/test-suite/SingleSource".

Original message since r165661:

My previous change had a bug: I negated the condition code of a CMOV, and went ahead creating a new CMOV using the *ORIGINAL* condition code.

llvm-svn: 166017

1705a999
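
The two select rewrites in the message above keep the result unchanged because the arm that is replaced is only taken when `x == c`. A minimal scalar sketch of that argument, with `select` modelled as the ternary operator (all function names here are illustrative, not from the patch):

```cpp
#include <cstdint>

// (select (x != c), e, c): the false arm is the constant c.
int32_t selectNeOriginal(int32_t x, int32_t c, int32_t e) {
  return (x != c) ? e : c;
}

// Rewritten form: the false arm is taken only when x == c, so x can replace c.
int32_t selectNeRewritten(int32_t x, int32_t c, int32_t e) {
  return (x != c) ? e : x;
}

// (select (x == c), c, e): the true arm is the constant c.
int32_t selectEqOriginal(int32_t x, int32_t c, int32_t e) {
  return (x == c) ? c : e;
}

// Rewritten form: the true arm is taken only when x == c.
int32_t selectEqRewritten(int32_t x, int32_t c, int32_t e) {
  return (x == c) ? x : e;
}

// Checks both rewrites over a small range of x values around c.
bool rewritesAgree(int32_t c, int32_t e) {
  for (int32_t x = c - 4; x <= c + 4; ++x) {
    if (selectNeOriginal(x, c, e) != selectNeRewritten(x, c, e)) return false;
    if (selectEqOriginal(x, c, e) != selectEqRewritten(x, c, e)) return false;
  }
  return true;
}
```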

Add __builtin_setjmp/_longjmp support in X86 backend · 97bf363a

Michael Liao authored Oct 15, 2012

- Besides being used in SjLj exception handling, __builtin_setjmp/__longjmp is also
  used as a light-weight replacement of setjmp/longjmp, which are used to
  implement continuations, user-level threading, etc. The support added
  in this patch ONLY addresses this usage and is NOT intended to support SjLj
  exception handling, as zero-cost DWARF exception handling is used by default
  in X86.

llvm-svn: 165989

97bf363a

Oct 15, 2012

Resubmit the changes to llvm core to update the functions to support different... · 4bb926d9

Micah Villmow authored Oct 15, 2012

Resubmit the changes to llvm core to update the functions to support different pointer sizes on a per address space basis.

llvm-svn: 165941

4bb926d9

Oct 13, 2012

X86: Fix accidentally swapped operands. · ecd15d7f
Benjamin Kramer authored Oct 13, 2012
```
llvm-svn: 165871
```
ecd15d7f

X86: Promote i8 cmov when both operands are coming from truncates of the same width. · d6b9362f

Benjamin Kramer authored Oct 13, 2012

X86 doesn't have i8 cmovs so isel would emit a branch. Emitting branches at this
level is often not a good idea because it's too late for many optimizations to
kick in. This solution doesn't add any extensions (truncs are free) and tries
to avoid introducing partial register stalls by filtering direct copyfromregs.

I'm seeing a ~10% speedup on reading a random .png file with libpng15 via
graphicsmagick on x86_64/westmere, but YMMV depending on the microarchitecture.

llvm-svn: 165868

d6b9362f
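
The promotion described above relies on the fact that selecting between two truncated values gives the same result as selecting at the wider width and truncating once. A small sketch of that equivalence in plain C++ (the function names are illustrative and the 32-bit source width is an assumption):

```cpp
#include <cstdint>

// Narrow form: both operands are truncated to 8 bits, then one is selected.
// This is the shape that would need an i8 conditional move.
uint8_t selectNarrow(bool cond, uint32_t a, uint32_t b) {
  return cond ? static_cast<uint8_t>(a) : static_cast<uint8_t>(b);
}

// Promoted form: select at the original 32-bit width, then truncate once.
uint8_t selectPromoted(bool cond, uint32_t a, uint32_t b) {
  return static_cast<uint8_t>(cond ? a : b);
}
```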

Oct 11, 2012
- Revert 165732 for further review. · 0c61134d
  Micah Villmow authored Oct 11, 2012
```
llvm-svn: 165747
```
  0c61134d
- Add in the first iteration of support for llvm/clang/lldb to allow variable... · 08318973
  Micah Villmow authored Oct 11, 2012
```
Add in the first iteration of support for llvm/clang/lldb to allow variable per address space pointer sizes to be optimized correctly.

llvm-svn: 165726
```
  08318973
- Revert r165661, "Patch by Shuxin Yang <shuxin.llvm@gmail.com>." · da0730c2
  NAKAMURA Takumi authored Oct 11, 2012
```
It broke stage2 clang and test-suite/MultiSource/Benchmarks/mediabench/g721/g721encode.

llvm-svn: 165692
```
  da0730c2
- Change MachineInstrBuilder::addDisp to copy over target flags by default. · 60a25a57
  Evan Cheng authored Oct 11, 2012
```
llvm-svn: 165677
```
  60a25a57
Oct 10, 2012

Patch by Shuxin Yang <shuxin.llvm@gmail.com>. · 17418964

Nadav Rotem authored Oct 10, 2012

Original message:

The attached is the fix to radar://11663049. The optimization can be outlined by the following rules:

   (select (x != c), e, c) -> (select (x != c), e, x),
   (select (x == c), c, e) -> (select (x == c), x, e)
where <c> is an integer constant.

 The reason for this change is that, on x86, conditional-move-from-constant needs two instructions;
however, conditional-move-from-register needs only one instruction.

  While LowerSELECT() sounds like the most convenient place for this optimization, it turns out to be a bad place. The reason is that replacing the constant <c> with a symbolic value obscures some instruction-combining opportunities which would otherwise be very easy to spot. For that reason, I have to postpone the change to the last instruction-combining phase.

  The change passes the test of "make check-all -C <build-root/test" and "make -C project/test-suite/SingleSource".

llvm-svn: 165661

17418964

Add support for FP_ROUND from v2f64 to v2f32 · e999b865

Michael Liao authored Oct 10, 2012

- Due to the current matching vector elements constraint in
  ISD::FP_ROUND, rounding from v2f64 to v4f32 (after legalization from
  v2f32) is scalarized. Add a customized v2f32 widening to convert it
  into a target-specific X86ISD::VFPROUND to work around this
  constraint.

llvm-svn: 165631

e999b865

Add alternative support for FP_EXTEND from v2f32 to v2f64 · effae0c8

Michael Liao authored Oct 10, 2012

- Due to the current matching vector elements constraint in ISD::FP_EXTEND,
  extending from v2f32 to v2f64 is scalarized. Add a customized v2f32 widening
  to convert it into a target-specific X86ISD::VFPEXT to work around this
  constraint. This patch also reverts a previous attempt to fix this issue by
  recovering the scalarized ISD::FP_EXTEND pattern and thus significantly
  reduces the overhead of supporting non-power-2 vector FP extend.

llvm-svn: 165625

effae0c8

When expanding atomic load arith instructions, do not lose target flags. rdar://12453106 · 3903e1be
Evan Cheng authored Oct 09, 2012
```
llvm-svn: 165568
```
3903e1be

Oct 09, 2012

Create enums for the different attributes. · c9b22d73

Bill Wendling authored Oct 09, 2012

We use the enums to query whether an Attributes object has that attribute. The
opaque layer is responsible for knowing where that specific attribute is stored.

llvm-svn: 165488

c9b22d73

Oct 08, 2012
- Move TargetData to DataLayout. · cdfe20b9
  Micah Villmow authored Oct 08, 2012
```
llvm-svn: 165402
```
  cdfe20b9
Oct 04, 2012
- This patch corrects commit 165126 by using an integer bit width instead of · 0d67f510
  Preston Gurd authored Oct 04, 2012
```
a pointer to a type, in order to remove the uses of getGlobalContext().

Patch by Tyler Nowicki.

llvm-svn: 165255
```
  0d67f510
- Add register encoding support in X86 backend · f54249b5
  Michael Liao authored Oct 04, 2012
```
- Add 'HwEncoding' for X86 registers and call getEncodingValue() to
  retrieve their encoding values.
- This is the first step toward adopting the new scheme. Further revision is ongoing.

llvm-svn: 165241
```
  f54249b5
- Use new accessor methods to query for attributes. · b0a290ef
  Bill Wendling authored Oct 04, 2012
```
llvm-svn: 165205
```
  b0a290ef
- Clean up trailing whitespace · d60d8143
  Michael Liao authored Oct 03, 2012
```
llvm-svn: 165182
```
  d60d8143
Sep 30, 2012

Change getX86SubSuperRegister to take an MVT::SimpleValueType rather than an... · 4f1c8caf

Craig Topper authored Sep 30, 2012

Change getX86SubSuperRegister to take an MVT::SimpleValueType rather than an EVT and add llvm_unreachable to the switches. Helps it compile to dramatically better code.

llvm-svn: 164919

4f1c8caf

Sep 26, 2012

Remove the `hasFnAttr' method from Function. · 863bab68

Bill Wendling authored Sep 26, 2012

The hasFnAttr method has been replaced by querying the Attributes explicitly. No
intended functionality change.

llvm-svn: 164725

863bab68

Sep 25, 2012
- Add missing i64 max/min/umax/umin on 32-bit target · de51caf2
  Michael Liao authored Sep 25, 2012
```
- Turn on atomic6432.ll and add specific test case as well

llvm-svn: 164616
```
  de51caf2
- Fix an illegal tailcall opt where the callee returns a double via xmm while... · 446ff28d
  Evan Cheng authored Sep 25, 2012
```
Fix an illegal tailcall opt where the callee returns a double via xmm while caller returns x86_fp80 via st0. rdar://12229511

llvm-svn: 164588
```
  446ff28d
Sep 21, 2012

Add missing i8 max/min/umax/umin support · a8801860
Michael Liao authored Sep 21, 2012
```
- Fix PR5145 and turn on test 8-bit atomic ops

llvm-svn: 164358
```
a8801860

Revise td of X86 atomic instructions · c33bebff

Michael Liao authored Sep 21, 2012

- Rewrite most atomic instructions in templates for both better
  maintenance and future extensions, such as HLE in TSX.

llvm-svn: 164357

c33bebff

Sep 20, 2012

Re-work X86 code generation of atomic ops with spin-loop · 3237662b

Michael Liao authored Sep 20, 2012

- Rewrite/merge pseudo-atomic instruction emitters to address the
  following issue:
  * Reduce one unnecessary load in spin-loop

    previously the spin-loop looks like

        thisMBB:
        newMBB:
          ld  t1 = [bitinstr.addr]
          op  t2 = t1, [bitinstr.val]
          not t3 = t2  (if Invert)
          mov EAX = t1
          lcs dest = [bitinstr.addr], t3  [EAX is implicit]
          bz  newMBB
          fallthrough -->nextMBB

    the 'ld' at the beginning of newMBB should be lifted out of the loop
    as lcs (or CMPXCHG on x86) will load the current memory value into
    EAX. This loop is refined as:

        thisMBB:
          EAX = LOAD [MI.addr]
        mainMBB:
          t1 = OP [MI.val], EAX
          LCMPXCHG [MI.addr], t1, [EAX is implicitly used & defined]
          JNE mainMBB
        sinkMBB:

  * Remove immopc as, so far, all pseudo-atomic instructions have
    all-register form only; there is no immediate operand.

  * Remove unnecessary attributes/modifiers in pseudo-atomic instruction
    td

  * Fix issues in PR13458

- Add comprehensive tests on atomic ops on various data types.
  NOTE: Some of them are turned off due to missing functionality.

- Revise tests due to the new spin-loop generated.

llvm-svn: 164281

3237662b
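
The refined loop shape above has a direct source-level analogue in `std::atomic`: `compare_exchange_weak` writes the current memory value back into its `expected` argument on failure, just as CMPXCHG reloads EAX, so only one load is needed before the loop. A minimal sketch for an atomic NAND (the `atomicNand` helper is illustrative, not part of the patch):

```cpp
#include <atomic>
#include <cstdint>

// Atomically replaces the stored value with ~(old & val); returns old.
uint32_t atomicNand(std::atomic<uint32_t> &cell, uint32_t val) {
  // Single load before the loop, like "EAX = LOAD [MI.addr]" in thisMBB.
  uint32_t expected = cell.load();
  // On failure, compare_exchange_weak stores the value it observed into
  // `expected`, so the next iteration recomputes from it without reloading.
  while (!cell.compare_exchange_weak(expected, ~(expected & val))) {
  }
  // On success `expected` still holds the value that was replaced.
  return expected;
}
```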

Sep 15, 2012
- X86: Emitting x87 fsin/fcos for sinf/cosf is not safe without unsafe fp math. · ece43425
  Benjamin Kramer authored Sep 15, 2012
```
This was only an issue if sse is disabled.

llvm-svn: 163967
```
  ece43425
Sep 13, 2012

Fix comment · 8b48bf27
Michael Liao authored Sep 13, 2012
```
llvm-svn: 163835
```
8b48bf27

Add wider vector/integer support for PR12312 · 137f8aed

Michael Liao authored Sep 13, 2012

- Enhance the fix to PR12312 to support wider integers, such as 256-bit
  integers. If more than one fully evaluated vector is found, POR them
  first, followed by the final PTEST.

llvm-svn: 163832

137f8aed
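
The idea of OR-ing the pieces of a wide integer together and then doing a single zero test can be sketched with a scalar model. This is an illustration only: `Vec128` stands in for one 128-bit register, `por` and `ptestAllZero` imitate what POR and a PTEST of a register against itself compute, and none of these names appear in the patch.

```cpp
#include <cstdint>
#include <vector>

// Stand-in for one 128-bit vector register.
struct Vec128 {
  uint64_t lo;
  uint64_t hi;
};

// Bitwise OR of two 128-bit values, as POR would compute.
Vec128 por(Vec128 a, Vec128 b) { return {a.lo | b.lo, a.hi | b.hi}; }

// Models the zero flag of PTEST v, v: set exactly when every bit of v is 0.
bool ptestAllZero(Vec128 v) { return (v.lo | v.hi) == 0; }

// A wide integer split into 128-bit parts is zero iff the OR of all parts
// is zero, so one final test is enough regardless of the number of parts.
bool wideIsZero(const std::vector<Vec128> &parts) {
  Vec128 acc{0, 0};
  for (const Vec128 &part : parts) acc = por(acc, part);
  return ptestAllZero(acc);
}
```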

Sep 12, 2012

Fix PR11985 · abb87d48

Michael Liao authored Sep 12, 2012

    
- BlockAddress has no support of BA + offset form and there is no way to
  propagate that offset into machine operand;
- Add BA + offset support and a new interface 'getTargetBlockAddress' to
  simplify target block address forming;
- All targets are modified to use new interface and X86 backend is enhanced to
  support BA + offset addressing.

llvm-svn: 163743

abb87d48

Indentation fixes. No functional change. · ad495964
Craig Topper authored Sep 12, 2012
```
llvm-svn: 163682
```
ad495964