- Oct 11, 2012
-
-
Micah Villmow authored
Add in the first iteration of support for llvm/clang/lldb to allow variable per address space pointer sizes to be optimized correctly. llvm-svn: 165726
-
NAKAMURA Takumi authored
It broke stage2 clang and test-suite/MultiSource/Benchmarks/mediabench/g721/g721encode. llvm-svn: 165692
-
Evan Cheng authored
llvm-svn: 165677
-
- Oct 10, 2012
-
-
Nadav Rotem authored
Original message: The attached is the fix to radar://11663049. The optimization can be outlined by following rules: (select (x != c), e, c) -> select (x != c), e, x), (select (x == c), c, e) -> select (x == c), x, e) where the <c> is an integer constant. The reason for this change is that : on x86, conditional-move-from-constant needs two instructions; however, conditional-move-from-register need only one instruction. While the LowerSELECT() sounds to be the most convenient place for this optimization, it turns out to be a bad place. The reason is that by replacing the constant <c> with a symbolic value, it obscure some instruction-combining opportunities which would otherwise be very easy to spot. For that reason, I have to postpone the change to last instruction-combining phase. The change passes the test of "make check-all -C <build-root/test" and "make -C project/test-suite/SingleSource". llvm-svn: 165661
-
Michael Liao authored
- Due to the current matching vector elements constraints in ISD::FP_ROUND, rounding from v2f64 to v4f32 (after legalization from v2f32) is scalarized. Add a customized v2f32 widening to convert it into a target-specific X86ISD::VFPROUND to work around this constraints. llvm-svn: 165631
-
Michael Liao authored
- Due to the current matching vector elements constraints in ISD::FP_EXTEND, rounding from v2f32 to v2f64 is scalarized. Add a customized v2f32 widening to convert it into a target-specific X86ISD::VFPEXT to work around this constraints. This patch also reverts a previous attempt to fix this issue by recovering the scalarized ISD::FP_EXTEND pattern and thus significantly reduces the overhead of supporting non-power-2 vector FP extend. llvm-svn: 165625
-
-
- Oct 09, 2012
-
-
Bill Wendling authored
We use the enums to query whether an Attributes object has that attribute. The opaque layer is responsible for knowing where that specific attribute is stored. llvm-svn: 165488
-
- Oct 08, 2012
-
-
Micah Villmow authored
llvm-svn: 165402
-
- Oct 04, 2012
-
-
Preston Gurd authored
a pointer to a type, in order to remove the uses of getGlobalContext(). Patch by Tyler Nowicki. llvm-svn: 165255
-
Michael Liao authored
- Add 'HwEncoding' for X86 registers and call getEncodingValue() to retrieve their encoding values. - This's the first step to adopt new scheme. Furthur revising is onging. llvm-svn: 165241
-
Bill Wendling authored
llvm-svn: 165205
-
Michael Liao authored
llvm-svn: 165182
-
- Sep 30, 2012
-
-
Craig Topper authored
Change getX86SubSuperRegister to take an MVT::SimpleValueType rather than an EVT and add llvm_unreachable to the switches. Helps it compile to dramatically better code. llvm-svn: 164919
-
- Sep 26, 2012
-
-
Bill Wendling authored
The hasFnAttr method has been replaced by querying the Attributes explicitly. No intended functionality change. llvm-svn: 164725
-
- Sep 25, 2012
-
-
Michael Liao authored
- Turn on atomic6432.ll and add specific test case as well llvm-svn: 164616
-
Evan Cheng authored
Fix an illegal tailcall opt where the callee returns a double via xmm while caller returns x86_fp80 via st0. rdar://12229511 llvm-svn: 164588
-
- Sep 21, 2012
-
-
Michael Liao authored
- Fix PR5145 and turn on test 8-bit atomic ops llvm-svn: 164358
-
Michael Liao authored
- Rewirte most atomic instructions in templates for both better maintenance and future extensions, such as HLE in TSX. llvm-svn: 164357
-
- Sep 20, 2012
-
-
Michael Liao authored
- Rewrite/merge pseudo-atomic instruction emitters to address the following issue: * Reduce one unnecessary load in spin-loop previously the spin-loop looks like thisMBB: newMBB: ld t1 = [bitinstr.addr] op t2 = t1, [bitinstr.val] not t3 = t2 (if Invert) mov EAX = t1 lcs dest = [bitinstr.addr], t3 [EAX is implicit] bz newMBB fallthrough -->nextMBB the 'ld' at the beginning of newMBB should be lift out of the loop as lcs (or CMPXCHG on x86) will load the current memory value into EAX. This loop is refined as: thisMBB: EAX = LOAD [MI.addr] mainMBB: t1 = OP [MI.val], EAX LCMPXCHG [MI.addr], t1, [EAX is implicitly used & defined] JNE mainMBB sinkMBB: * Remove immopc as, so far, all pseudo-atomic instructions has all-register form only, there is no immedidate operand. * Remove unnecessary attributes/modifiers in pseudo-atomic instruction td * Fix issues in PR13458 - Add comprehensive tests on atomic ops on various data types. NOTE: Some of them are turned off due to missing functionality. - Revise tests due to the new spin-loop generated. llvm-svn: 164281
-
- Sep 15, 2012
-
-
Benjamin Kramer authored
This was only an issue if sse is disabled. llvm-svn: 163967
-
- Sep 13, 2012
-
-
Michael Liao authored
llvm-svn: 163835
-
Michael Liao authored
- Enhance the fix to PR12312 to support wider integer, such as 256-bit integer. If more than 1 fully evaluated vectors are found, POR them first followed by the final PTEST. llvm-svn: 163832
-
- Sep 12, 2012
-
-
Michael Liao authored
- BlockAddress has no support of BA + offset form and there is no way to propagate that offset into machine operand; - Add BA + offset support and a new interface 'getTargetBlockAddress' to simplify target block address forming; - All targets are modified to use new interface and X86 backend is enhanced to support BA + offset addressing. llvm-svn: 163743
-
Craig Topper authored
llvm-svn: 163682
-
- Sep 11, 2012
-
-
Craig Topper authored
llvm-svn: 163596
-
- Sep 10, 2012
-
-
Dmitri Gribenko authored
llvm-svn: 163547
-
Michael Liao authored
- Fix an remaining issue of PR11674 as well llvm-svn: 163528
-
Michael Liao authored
- If a boolean value is generated from CMOV and tested as boolean value, simplify the use of test result by referencing the original condition. RDRAND intrinisc is one of such cases. llvm-svn: 163516
-
Elena Demikhovsky authored
The VPSHUFB 256-bit instruction may be generated when one of input vector is undefined or zeroinitializer. I've added the "zeroinitializer" case in this patch. llvm-svn: 163506
-
- Sep 08, 2012
-
-
Craig Topper authored
llvm-svn: 163473
-
Craig Topper authored
llvm-svn: 163463
-
Craig Topper authored
llvm-svn: 163461
-
Craig Topper authored
Set operation action for FFLOOR to Expand for all vector types for X86. Set FFLOOR of v4f32 to Expand for ARM. v2f64 was already correct. llvm-svn: 163458
-
- Sep 06, 2012
-
-
Elena Demikhovsky authored
Added generation of VPSHUB instruction for <32 x i8> vector shuffle when possible. llvm-svn: 163312
-
Michael Liao authored
llvm-svn: 163295
-
Craig Topper authored
Use iPTR instead of i32 for extract_subvector/insert_subvector index in lowering and patterns. This makes it consistent with the incoming DAG nodes from the DAG builder. llvm-svn: 163293
-
Roman Divacky authored
llvm-svn: 163258
-
- Sep 04, 2012
-
-
Preston Gurd authored
- CodeGenPrepare pass for identifying div/rem ops - Backend specifies the type mapping using addBypassSlowDivType - Enabled only for Intel Atom with O2 32-bit -> 8-bit - Replace IDIV with instructions which test its value and use DIVB if the value is positive and less than 256. - In the case when the quotient and remainder of a divide are used a DIV and a REM instruction will be present in the IR. In the non-Atom case they are both lowered to IDIVs and CSE removes the redundant IDIV instruction, using the quotient and remainder from the first IDIV. However, due to this optimization CSE is not able to eliminate redundant IDIV instructions because they are located in different basic blocks. This is overcome by calculating both the quotient (DIV) and remainder (REM) in each basic block that is inserted by the optimization and reusing the result values when a subsequent DIV or REM instruction uses the same operands. - Test cases check for the presents of the optimization when calculating either the quotient, remainder, or both. Patch by Tyler Nowicki! llvm-svn: 163150
-
Elena Demikhovsky authored
Since this specific shuffle is widely used in many workloads we have ~10% performance on them. shufflevector <8 x float> %A, <8 x float> %B, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14> vmovaps (%rdx), %ymm0 vshufps $8, %ymm0, %ymm0, %ymm0 vmovaps (%rcx), %ymm1 vshufps $8, %ymm0, %ymm1, %ymm1 vunpcklps %ymm0, %ymm1, %ymm0 vmovaps (%rcx), %ymm0 vmovsldup (%rdx), %ymm1 vblendps $85, %ymm0, %ymm1, %ymm0 llvm-svn: 163134
-