Commits · 083189730e40a2da15e6b66e5b220b707febdb9e · Roger Ferrer / llvm-epi-0.8

Oct 11, 2012
- Add in the first iteration of support for llvm/clang/lldb to allow variable... · 08318973
  Micah Villmow authored Oct 11, 2012
  
  Add in the first iteration of support for llvm/clang/lldb to allow variable per address space pointer sizes to be optimized correctly. llvm-svn: 165726
  08318973
- Revert r165661, "Patch by Shuxin Yang <shuxin.llvm@gmail.com>." · da0730c2
  NAKAMURA Takumi authored Oct 11, 2012
  
  It broke stage2 clang and test-suite/MultiSource/Benchmarks/mediabench/g721/g721encode. llvm-svn: 165692
  da0730c2
- Change MachineInstrBuilder::addDisp to copy over target flags by default. · 60a25a57
  Evan Cheng authored Oct 11, 2012
  
  llvm-svn: 165677
  60a25a57
Oct 10, 2012

Patch by Shuxin Yang <shuxin.llvm@gmail.com>. · 17418964

Nadav Rotem authored Oct 10, 2012

Original message:

The attached patch fixes radar://11663049. The optimization can be outlined by the following rules:

   (select (x != c), e, c) -> (select (x != c), e, x),
   (select (x == c), c, e) -> (select (x == c), x, e)
where <c> is an integer constant.

The reason for this change is that, on x86, a conditional move from a constant needs two instructions,
whereas a conditional move from a register needs only one.

While LowerSELECT() seems to be the most convenient place for this optimization, it turns out to be a bad place. Replacing the constant <c> with a symbolic value obscures some instruction-combining opportunities that would otherwise be very easy to spot. For that reason, I have to postpone the change to the last instruction-combining phase.

The change passes "make check-all -C <build-root>/test" and "make -C project/test-suite/SingleSource".

llvm-svn: 165661

17418964
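
The rewrite rule in 17418964 relies on the fact that the constant arm is only taken when `x` equals `c`, so `x` can stand in for `c` there. The following is a minimal C++ sketch of that equivalence, not the DAG-combine code from the patch; the function names are hypothetical and 32-bit integers are assumed purely for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Original form: the false arm is the constant c itself.
inline int32_t selectWithConstant(int32_t x, int32_t c, int32_t e) {
  return (x != c) ? e : c;
}

// Rewritten form: the false arm reuses x, which equals c whenever it is taken.
inline int32_t selectWithRegister(int32_t x, int32_t c, int32_t e) {
  return (x != c) ? e : x;
}

// Compare both forms for every (x, c, e) triple in [lo, hi].
inline bool formsAgree(int32_t lo, int32_t hi) {
  for (int32_t x = lo; x <= hi; ++x)
    for (int32_t c = lo; c <= hi; ++c)
      for (int32_t e = lo; e <= hi; ++e)
        if (selectWithConstant(x, c, e) != selectWithRegister(x, c, e))
          return false;
  return true;
}
```

Calling `formsAgree(-4, 4)` checks the two forms against each other on a small input range; the `x == c` rule follows by the same argument with the arms swapped.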

Add support for FP_ROUND from v2f64 to v2f32 · e999b865

Michael Liao authored Oct 10, 2012

- Due to the current matching vector elements constraints in
  ISD::FP_ROUND, rounding from v2f64 to v4f32 (after legalization from
  v2f32) is scalarized. Add a customized v2f32 widening to convert it
  into a target-specific X86ISD::VFPROUND to work around these
  constraints.

llvm-svn: 165631

e999b865

Add alternative support for FP_EXTEND from v2f32 to v2f64 · effae0c8

Michael Liao authored Oct 10, 2012

- Due to the current matching vector elements constraints in ISD::FP_EXTEND,
  extending from v2f32 to v2f64 is scalarized. Add a customized v2f32 widening
  to convert it into a target-specific X86ISD::VFPEXT to work around these
  constraints. This patch also reverts a previous attempt to fix this issue by
  recovering the scalarized ISD::FP_EXTEND pattern and thus significantly
  reduces the overhead of supporting non-power-2 vector FP extend.

llvm-svn: 165625

effae0c8

When expanding atomic load arith instructions, do not lose target flags. rdar://12453106 · 3903e1be
Evan Cheng authored Oct 09, 2012
```
llvm-svn: 165568
```
3903e1be

Oct 09, 2012

Create enums for the different attributes. · c9b22d73

Bill Wendling authored Oct 09, 2012

We use the enums to query whether an Attributes object has that attribute. The
opaque layer is responsible for knowing where that specific attribute is stored.

llvm-svn: 165488

c9b22d73
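
As an illustration of the idea in c9b22d73 (callers name an attribute by enum, and only the wrapper knows how it is stored), here is a small C++ sketch. All names (`AttrKind`, `AttrSet`, `add`, `has`) are hypothetical and are not the LLVM `Attributes` interface; the bitmask storage is likewise an assumption made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical attribute kinds; not the actual LLVM enumerators.
enum class AttrKind : unsigned { NoReturn, NoUnwind, ReadOnly, AlwaysInline };

// Opaque wrapper: callers query by enum and never see the storage layout.
class AttrSet {
public:
  AttrSet &add(AttrKind kind) {
    bits_ |= maskFor(kind);
    return *this;
  }

  bool has(AttrKind kind) const { return (bits_ & maskFor(kind)) != 0; }

private:
  // The storage detail (one bit per kind) is private to this layer.
  static uint64_t maskFor(AttrKind kind) {
    return uint64_t{1} << static_cast<unsigned>(kind);
  }

  uint64_t bits_ = 0;
};
```

Because queries go through `has(AttrKind)`, the storage could later change (for example, to a table) without touching any caller.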

Oct 08, 2012
- Move TargetData to DataLayout. · cdfe20b9
  Micah Villmow authored Oct 08, 2012
  
  llvm-svn: 165402
  cdfe20b9
Oct 04, 2012
- This patch corrects commit 165126 by using an integer bit width instead of · 0d67f510
  Preston Gurd authored Oct 04, 2012
  
  a pointer to a type, in order to remove the uses of getGlobalContext(). Patch by Tyler Nowicki. llvm-svn: 165255
  0d67f510
- Add register encoding support in X86 backend · f54249b5
  Michael Liao authored Oct 04, 2012
  
  - Add 'HwEncoding' for X86 registers and call getEncodingValue() to retrieve their encoding values. - This is the first step toward adopting the new scheme. Further revision is ongoing. llvm-svn: 165241
  f54249b5
- Use new accessor methods to query for attributes. · b0a290ef
  Bill Wendling authored Oct 04, 2012
  
  llvm-svn: 165205
  b0a290ef
- Clean up trailing whitespace · d60d8143
  Michael Liao authored Oct 03, 2012
  
  llvm-svn: 165182
  d60d8143
Sep 30, 2012

Change getX86SubSuperRegister to take an MVT::SimpleValueType rather than an... · 4f1c8caf

Craig Topper authored Sep 30, 2012

Change getX86SubSuperRegister to take an MVT::SimpleValueType rather than an EVT and add llvm_unreachable to the switches. Helps it compile to dramatically better code.

llvm-svn: 164919

4f1c8caf

Sep 26, 2012

Remove the `hasFnAttr' method from Function. · 863bab68

Bill Wendling authored Sep 26, 2012

The hasFnAttr method has been replaced by querying the Attributes explicitly. No
intended functionality change.

llvm-svn: 164725

863bab68

Sep 25, 2012
- Add missing i64 max/min/umax/umin on 32-bit target · de51caf2
  Michael Liao authored Sep 25, 2012
  
  - Turn on atomic6432.ll and add specific test case as well llvm-svn: 164616
  de51caf2
- Fix an illegal tailcall opt where the callee returns a double via xmm while... · 446ff28d
  Evan Cheng authored Sep 25, 2012
  
  Fix an illegal tailcall opt where the callee returns a double via xmm while caller returns x86_fp80 via st0. rdar://12229511 llvm-svn: 164588
  446ff28d
Sep 21, 2012

Add missing i8 max/min/umax/umin support · a8801860
Michael Liao authored Sep 21, 2012
```
- Fix PR5145 and turn on test 8-bit atomic ops

llvm-svn: 164358
```
a8801860

Revise td of X86 atomic instructions · c33bebff

Michael Liao authored Sep 21, 2012

- Rewrite most atomic instructions as templates, both for better
  maintenance and for future extensions, such as HLE in TSX.

llvm-svn: 164357

c33bebff

Sep 20, 2012

Re-work X86 code generation of atomic ops with spin-loop · 3237662b

Michael Liao authored Sep 20, 2012

- Rewrite/merge pseudo-atomic instruction emitters to address the
  following issue:
  * Reduce one unnecessary load in spin-loop

    Previously, the spin-loop looked like:

        thisMBB:
        newMBB:
          ld  t1 = [bitinstr.addr]
          op  t2 = t1, [bitinstr.val]
          not t3 = t2  (if Invert)
          mov EAX = t1
          lcs dest = [bitinstr.addr], t3  [EAX is implicit]
          bz  newMBB
          fallthrough -->nextMBB

    The 'ld' at the beginning of newMBB should be lifted out of the loop,
    as lcs (or CMPXCHG on x86) will load the current memory value into
    EAX. The loop is refined as:

        thisMBB:
          EAX = LOAD [MI.addr]
        mainMBB:
          t1 = OP [MI.val], EAX
          LCMPXCHG [MI.addr], t1, [EAX is implicitly used & defined]
          JNE mainMBB
        sinkMBB:

  * Remove immopc since, so far, all pseudo-atomic instructions have an
    all-register form only; there is no immediate operand.

  * Remove unnecessary attributes/modifiers in pseudo-atomic instruction
    td

  * Fix issues in PR13458

- Add comprehensive tests on atomic ops on various data types.
  NOTE: Some of them are turned off due to missing functionality.

- Revise tests due to the new spin-loop generated.

llvm-svn: 164281

3237662b
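
The refined spin-loop in 3237662b has a direct analogue in portable C++: load once before the loop, then let a failed compare-exchange refresh the observed value. The sketch below uses `std::atomic` rather than the backend's pseudo-instruction expansion; the function name `atomicFetchNand` is hypothetical, and NAND is chosen only as an example of an operation built on a compare-exchange loop.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Atomic NAND built on compare-exchange. Returns the value seen before the update.
inline uint32_t atomicFetchNand(std::atomic<uint32_t> &target, uint32_t val) {
  // Single load before the loop (the role of thisMBB in the commit message).
  uint32_t expected = target.load();
  uint32_t desired;
  do {
    // Loop body (the role of mainMBB): compute the new value from the
    // last value observed.
    desired = ~(expected & val);
    // On failure, compare_exchange_weak stores the current memory value
    // into 'expected', so no explicit reload is needed inside the loop.
  } while (!target.compare_exchange_weak(expected, desired));
  return expected;
}
```

The update of `expected` on failure plays the same role as CMPXCHG leaving the current memory value in EAX, which is what makes the in-loop load redundant.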

Sep 15, 2012
- X86: Emitting x87 fsin/fcos for sinf/cosf is not safe without unsafe fp math. · ece43425
  Benjamin Kramer authored Sep 15, 2012
  
  This was only an issue if sse is disabled. llvm-svn: 163967
  ece43425
Sep 13, 2012

Fix comment · 8b48bf27
Michael Liao authored Sep 13, 2012
```
llvm-svn: 163835
```
8b48bf27

Add wider vector/integer support for PR12312 · 137f8aed

Michael Liao authored Sep 13, 2012

- Enhance the fix to PR12312 to support wider integers, such as 256-bit
  integers. If more than one fully evaluated vector is found, POR them
  first, followed by the final PTEST.

llvm-svn: 163832

137f8aed
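
To see why ORing the vectors first is sufficient in 137f8aed, consider a scalar model, assuming the pattern being lowered is an all-zero test on a wide integer. The C++ sketch below is not the lowering code; `Vec128`, `por`, `ptestAllZero`, and `isZero256` are hypothetical names that only model the instructions' effect.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec128 = std::array<uint64_t, 2>;

// Scalar model of POR: bitwise OR of two 128-bit vectors.
inline Vec128 por(const Vec128 &a, const Vec128 &b) {
  return {{a[0] | b[0], a[1] | b[1]}};
}

// Scalar model of the zero check done by PTEST v, v: true when every bit is clear.
inline bool ptestAllZero(const Vec128 &v) { return (v[0] | v[1]) == 0; }

// A 256-bit integer held as two 128-bit halves is zero exactly when the
// OR of the halves is zero, so one final test is enough.
inline bool isZero256(const Vec128 &lo, const Vec128 &hi) {
  return ptestAllZero(por(lo, hi));
}
```

Any set bit in either half survives the OR, so a single final test covers both halves.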

Sep 12, 2012

Fix PR11985 · abb87d48

Michael Liao authored Sep 12, 2012

    
- BlockAddress has no support of BA + offset form and there is no way to
  propagate that offset into machine operand;
- Add BA + offset support and a new interface 'getTargetBlockAddress' to
  simplify target block address forming;
- All targets are modified to use new interface and X86 backend is enhanced to
  support BA + offset addressing.

llvm-svn: 163743

abb87d48

Indentation fixes. No functional change. · ad495964
Craig Topper authored Sep 12, 2012
```
llvm-svn: 163682
```
ad495964

Sep 11, 2012
- Make a bunch of lowering helper functions static instead of member functions. No functional change. · a29ed865
  Craig Topper authored Sep 11, 2012
  
  llvm-svn: 163596
  a29ed865
Sep 10, 2012

Remove redundant semicolons which are null statements. · ca1e27be
Dmitri Gribenko authored Sep 10, 2012
```
llvm-svn: 163547
```
ca1e27be
Enhance PR11334 fix to support extload from v2f32/v4f32 · 400f7ef8
Michael Liao authored Sep 10, 2012
```
    
- Fix a remaining issue of PR11674 as well

llvm-svn: 163528
```
400f7ef8

Add boolean simplification support from CMOV · c3d5b21c

Michael Liao authored Sep 10, 2012

- If a boolean value is generated from CMOV and tested as a boolean value,
  simplify the use of the test result by referencing the original condition.
  The RDRAND intrinsic is one such case.

llvm-svn: 163516

c3d5b21c

The VPSHUFB 256-bit instruction may be generated when one of input vector is... · 264fb021

Elena Demikhovsky authored Sep 10, 2012

The VPSHUFB 256-bit instruction may be generated when one of the input vectors is undefined or zeroinitializer.
I've added the "zeroinitializer" case in this patch.

llvm-svn: 163506

264fb021

Sep 08, 2012
- Add instruction selection for ffloor of vectors when SSE4.1 or AVX is enabled. · 4ed79bd7
  Craig Topper authored Sep 08, 2012
  
  llvm-svn: 163473
  4ed79bd7
- Use 256-bit alignment for constant pool value for 256-bit vector FNEG lowering. · 0955a9f4
  Craig Topper authored Sep 08, 2012
  
  llvm-svn: 163463
  0955a9f4
- Add support for lowering FABS of vector types. · 98f2e861
  Craig Topper authored Sep 08, 2012
  
  llvm-svn: 163461
  98f2e861
- Set operation action for FFLOOR to Expand for all vector types for X86. Set... · 3e41a5bb
  Craig Topper authored Sep 08, 2012
  
  Set operation action for FFLOOR to Expand for all vector types for X86. Set FFLOOR of v4f32 to Expand for ARM. v2f64 was already correct. llvm-svn: 163458
  3e41a5bb
Sep 06, 2012
- AVX2 optimization. · 42777877
  Elena Demikhovsky authored Sep 06, 2012
  
  Added generation of VPSHUFB instruction for <32 x i8> vector shuffle when possible. llvm-svn: 163312
  42777877
- Remove duplicated helper function · 2d95a2b5
  Michael Liao authored Sep 06, 2012
  
  llvm-svn: 163295
  2d95a2b5
- Use iPTR instead of i32 for extract_subvector/insert_subvector index in... · f3e4aa8c
  Craig Topper authored Sep 06, 2012
  
  Use iPTR instead of i32 for extract_subvector/insert_subvector index in lowering and patterns. This makes it consistent with the incoming DAG nodes from the DAG builder. llvm-svn: 163293
  f3e4aa8c
- Stop casting away const qualifier needlessly. · ad06cee2
  Roman Divacky authored Sep 05, 2012
  
  llvm-svn: 163258
  ad06cee2
Sep 04, 2012

Generic Bypass Slow Div · cdf540d5

Preston Gurd authored Sep 04, 2012

- CodeGenPrepare pass for identifying div/rem ops
- Backend specifies the type mapping using addBypassSlowDivType
- Enabled only for Intel Atom with O2 32-bit -> 8-bit
- Replace IDIV with instructions which test its value and use DIVB if the value
is positive and less than 256.
- In the case when the quotient and remainder of a divide are used a DIV
and a REM instruction will be present in the IR. In the non-Atom case
they are both lowered to IDIVs and CSE removes the redundant IDIV instruction,
using the quotient and remainder from the first IDIV. However,
due to this optimization CSE is not able to eliminate redundant
IDIV instructions because they are located in different basic blocks.
This is overcome by calculating both the quotient (DIV) and remainder (REM)
in each basic block that is inserted by the optimization and reusing the result
values when a subsequent DIV or REM instruction uses the same operands.
- Test cases check for the presence of the optimization when calculating
either the quotient, the remainder, or both.

Patch by Tyler Nowicki!

llvm-svn: 163150

cdf540d5
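
The control flow that cdf540d5 describes can be sketched at the source level: test whether the operands fit in 8 bits, take a narrow divide if so, and compute quotient and remainder together in each path. This C++ sketch is not the CodeGenPrepare pass itself; `DivRem` and `bypassSlowDiv` are hypothetical names, unsigned operands are assumed for simplicity (the commit talks about IDIV with positive values), and the caller must pass a non-zero divisor.

```cpp
#include <cassert>
#include <cstdint>

struct DivRem {
  uint32_t quotient;
  uint32_t remainder;
};

// Hypothetical helper modelling the bypass. Precondition: divisor != 0.
inline DivRem bypassSlowDiv(uint32_t dividend, uint32_t divisor) {
  // Fast path: no bit above the low 8 is set in either operand, so an
  // 8-bit divide yields the same quotient and remainder.
  if (((dividend | divisor) & ~0xFFu) == 0) {
    uint8_t a = static_cast<uint8_t>(dividend);
    uint8_t b = static_cast<uint8_t>(divisor);
    return {static_cast<uint32_t>(a / b), static_cast<uint32_t>(a % b)};
  }
  // Slow path: full-width divide. Both results are produced in the same
  // block so that a later div or rem on the same operands can reuse them.
  return {dividend / divisor, dividend % divisor};
}
```

Returning both values from each path mirrors the commit's workaround for CSE, which cannot merge divides that end up in different basic blocks.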

This patch optimizes shuffle instruction - generates 2 instructions instead of 4. · cbe99bbb

Elena Demikhovsky authored Sep 04, 2012

Since this specific shuffle is widely used in many workloads, we see a ~10% performance improvement on them.

shufflevector <8 x float> %A, <8 x float> %B, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14>

Before:

vmovaps (%rdx), %ymm0
vshufps $8, %ymm0, %ymm0, %ymm0
vmovaps (%rcx), %ymm1
vshufps $8, %ymm0, %ymm1, %ymm1
vunpcklps       %ymm0, %ymm1, %ymm0

After:

vmovaps (%rcx), %ymm0
vmovsldup       (%rdx), %ymm1
vblendps        $85, %ymm0, %ymm1, %ymm0

llvm-svn: 163134

cbe99bbb
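
To check that the shorter sequence in cbe99bbb computes the same shuffle, the instructions can be modelled on plain arrays. The C++ sketch below is a scalar model only, with hypothetical helper names; it assumes the usual lane semantics (vmovsldup copies each even lane into the following odd lane, and vblendps picks per lane according to the immediate's bits) and does not use intrinsics.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using V8 = std::array<float, 8>;

// Reference semantics of the mask <0, 8, 2, 10, 4, 12, 6, 14>:
// indices 0-7 select from a, indices 8-15 select from b.
inline V8 shuffleReference(const V8 &a, const V8 &b) {
  static const int mask[8] = {0, 8, 2, 10, 4, 12, 6, 14};
  V8 out{};
  for (std::size_t i = 0; i < 8; ++i)
    out[i] = mask[i] < 8 ? a[mask[i]] : b[mask[i] - 8];
  return out;
}

// Scalar model of vmovsldup: copy each even lane into the next odd lane.
inline V8 movsldup(const V8 &v) {
  V8 out{};
  for (std::size_t i = 0; i < 8; i += 2)
    out[i] = out[i + 1] = v[i];
  return out;
}

// Scalar model of vblendps: lane i comes from onSet when bit i of imm
// is set, otherwise from onClear.
inline V8 blend(const V8 &onClear, const V8 &onSet, unsigned imm) {
  V8 out{};
  for (std::size_t i = 0; i < 8; ++i)
    out[i] = ((imm >> i) & 1u) ? onSet[i] : onClear[i];
  return out;
}

// Two-step form: duplicate b's even lanes, then blend a into the even lanes.
// 85 is 0b01010101, i.e. lanes 0, 2, 4 and 6 come from a.
inline V8 shuffleTwoStep(const V8 &a, const V8 &b) {
  return blend(movsldup(b), a, 85);
}
```

With `a = {a0..a7}` and `b = {b0..b7}`, both functions yield `{a0, b0, a2, b2, a4, b4, a6, b6}`, which is the interleaving the mask describes.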