- Jul 30, 2010
-
-
Jim Grosbach authored
have 4 bits per register in the operand encoding), but have undefined behavior when the operand value is 13 or 15 (SP and PC, respectively). The trivial coalescer in linear scan sometimes will merge a copy from SP into a subsequent instruction which uses the copy, and if that instruction cannot legally reference SP, we get bad code such as:
  mls r0, r9, r0, sp
instead of:
  mov r2, sp
  mls r0, r9, r0, r2
This patch adds a new register class for use by Thumb2 that excludes the problematic registers (SP and PC) and is used instead of GPR for those operands which cannot legally reference PC or SP. The trivial coalescer explicitly requires that the register class of the destination for the COPY instruction contain the source register for the COPY to be considered for coalescing. This prevents errant instructions like that above. PR7499. llvm-svn: 109842
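As a minimal, self-contained sketch of the guard described here (the types and names below are illustrative, not LLVM's actual data structures): a COPY is only considered for coalescing when the destination's register class also contains the source register, so a copy from SP into the new SP/PC-free class is left alone.

  #include <set>
  #include <string>

  // Toy model of the coalescing guard described in the commit message.
  struct RegClass {
    std::set<std::string> Regs;
    bool contains(const std::string &R) const { return Regs.count(R) != 0; }
  };

  // Only coalesce a COPY when the destination's class can also hold the source.
  bool canCoalesceCopy(const RegClass &DstClass, const std::string &SrcReg) {
    return DstClass.contains(SrcReg);
  }

  // Example: a class modeled on the new Thumb2 class (GPR minus SP and PC)
  // rejects a copy from "sp", so the bad "mls ..., sp" form above is never created.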
-
Nate Begeman authored
llvm-svn: 109813
-
- Jul 29, 2010
-
-
Bob Wilson authored
transformations. llvm-svn: 109800
-
Dale Johannesen authored
integers with mov + vdup. 8003375. This is currently disabled by default because LICM will not hoist a VDUP, so it pessimizes the code if the construct occurs inside a loop (8248029). llvm-svn: 109799
-
Bob Wilson authored
PR7745. llvm-svn: 109788
-
Nate Begeman authored
Add intrinsics __builtin_arm_qadd & __builtin_arm_qsub to allow access to the QADD & QSUB instructions. They behave identically to the __qadd & __qsub RealView instruction intrinsics. llvm-svn: 109770
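A brief usage sketch (assuming a Clang build targeting ARM, where these builtins are exposed; the function name is just illustrative):

  // Saturating signed arithmetic via the new builtins, matching the intent of
  // RealView's __qadd/__qsub.
  int saturating_mac(int acc, int a, int b) {
    int diff = __builtin_arm_qsub(a, b);   // saturating a - b (QSUB)
    return __builtin_arm_qadd(acc, diff);  // saturating acc + diff (QADD)
  }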
-
Jakob Stoklund Olesen authored
We do sometimes load from a too-small stack slot when dealing with x86 arguments (varargs and smaller-than-32-bit args). It looks like we know what we are doing in those cases, so I am going to remove the assert instead of artificially enlarging stack slot sizes. The assert in storeRegToStackSlot stays in. We don't want to write beyond the bounds of a stack slot. llvm-svn: 109764
-
Jim Grosbach authored
ARM mode version of r109693. Remove incorrect substitution pattern for UXTB16. It wrongly assumed the input shift was actually a rotate. rdar://8240138 llvm-svn: 109696
-
Jim Grosbach authored
Remove incorrect substitution pattern for UXTB16. It wrongly assumed the input shift was actually a rotate. rdar://8240138 llvm-svn: 109693
-
Jim Grosbach authored
llvm-svn: 109691
-
- Jul 28, 2010
-
-
Jakob Stoklund Olesen authored
The size of this object isn't used for anything - technically it is of variable size. This avoids a false positive from the assert in X86InstrInfo::loadRegFromStackSlot, and fixes PR7735. llvm-svn: 109652
-
Dan Gohman authored
of a std::vector. llvm-svn: 109597
-
Dan Gohman authored
to avoid undefined behavior on overflow, noticed by John Regehr. llvm-svn: 109594
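The commit title is elided above, so the exact change isn't shown here. As a general, hypothetical illustration of this class of fix (not the code this commit touched): signed integer overflow is undefined behavior in C++, and a common remedy is to do the arithmetic in an unsigned type, where wraparound is well defined.

  #include <cstdint>

  // Hypothetical example only: negating INT32_MIN directly overflows and is
  // undefined behavior, so perform the negation in an unsigned type and
  // convert back (well defined on two's-complement targets).
  int32_t safe_negate(int32_t x) {
    return static_cast<int32_t>(0u - static_cast<uint32_t>(x));
  }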
-
Nate Begeman authored
This is about 4x faster and smaller than the existing scalarization. llvm-svn: 109566
-
Nate Begeman authored
~40% faster vector shl <4 x i32> on SSE 4.1. Larger improvements for smaller types coming in future patches. For:
  define <2 x i64> @shl(<4 x i32> %r, <4 x i32> %a) nounwind readnone ssp {
  entry:
    %shl = shl <4 x i32> %r, %a                  ; <<4 x i32>> [#uses=1]
    %tmp2 = bitcast <4 x i32> %shl to <2 x i64>  ; <<2 x i64>> [#uses=1]
    ret <2 x i64> %tmp2
  }
We get:
  _shl:                  ## @shl
    pslld     $23, %xmm1
    paddd     LCPI0_0, %xmm1
    cvttps2dq %xmm1, %xmm1
    pmulld    %xmm1, %xmm0
    ret
Instead of:
  _shl:                  ## @shl
    pshufd    $3, %xmm0, %xmm2
    movd      %xmm2, %eax
    pshufd    $3, %xmm1, %xmm2
    movd      %xmm2, %ecx
    shll      %cl, %eax
    movd      %eax, %xmm2
    pshufd    $1, %xmm0, %xmm3
    movd      %xmm3, %eax
    pshufd    $1, %xmm1, %xmm3
    movd      %xmm3, %ecx
    shll      %cl, %eax
    movd      %eax, %xmm3
    punpckldq %xmm2, %xmm3
    movd      %xmm0, %eax
    movd      %xmm1, %ecx
    shll      %cl, %eax
    movd      %eax, %xmm2
    movhlps   %xmm0, %xmm0
    movd      %xmm0, %eax
    movhlps   %xmm1, %xmm1
    movd      %xmm1, %ecx
    shll      %cl, %eax
    movd      %eax, %xmm0
    punpckldq %xmm0, %xmm2
    movdqa    %xmm2, %xmm0
    punpckldq %xmm3, %xmm0
    ret
llvm-svn: 109549
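For reference, the same trick can be written with SSE intrinsics (a hedged sketch, not code from this patch): the shift amounts are moved into a float's exponent field, so the cvttps2dq produces 2^amt per lane, and the pmulld then performs the shift as a multiply; the constant 0x3f800000 (the float exponent bias, 127 << 23) presumably corresponds to LCPI0_0 above.

  #include <smmintrin.h>  // SSE4.1, for _mm_mullo_epi32 (pmulld)

  // Sketch: v << amt computed as v * 2^amt, with 2^amt built by placing amt
  // in the exponent field of an IEEE float and truncating back to integer.
  static __m128i shl_v4i32(__m128i v, __m128i amt) {
    __m128i exp  = _mm_add_epi32(_mm_slli_epi32(amt, 23),      // amt << 23 (pslld)
                                 _mm_set1_epi32(0x3f800000));  // + bias    (paddd)
    __m128i pow2 = _mm_cvttps_epi32(_mm_castsi128_ps(exp));    // 2^amt     (cvttps2dq)
    return _mm_mullo_epi32(v, pow2);                           // v * 2^amt (pmulld)
  }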
-
- Jul 27, 2010
-
-
Michael J. Spencer authored
llvm-svn: 109494
-
Jakob Stoklund Olesen authored
subregister operands like this:
  %reg1040:sub_32bit<def> = MOV32rm <fi#-2>, 1, %reg0, 0, %reg0, %reg1040<imp-def>; mem:LD4[FixedStack-2](align=8)
Make them return false when subreg operands are present. VirtRegRewriter is making bad assumptions otherwise. This fixes PR7713. llvm-svn: 109489
-
Jakob Stoklund Olesen authored
with a too-big register class. llvm-svn: 109488
-
Eli Friedman authored
llvm-svn: 109458
-
Anton Korobeynikov authored
llvm-svn: 109456
-
- Jul 26, 2010
-
-
Evan Cheng authored
llvm-svn: 109450
-
Anton Korobeynikov authored
llvm-svn: 109448
-
Bruno Cardoso Lopes authored
we are using AVX and no AVX version of the desired instruction is present, this is better for incremental dev (without fallbacks it's easier to spot what's missing). Not sure this is the best hack though (we can also disable all HasSSE* predicates by dynamically marking them 'false' if AVX is present). llvm-svn: 109434
-
Anton Korobeynikov authored
This assumption is not satisfied due to global merging. Work around the issue by temporarily disabling merging of const globals. Also, ignore LLVM "special" globals. This fixes PR7716. llvm-svn: 109423
-
Evan Cheng authored
llvm-svn: 109421
-
- Jul 25, 2010
-
-
Douglas Gregor authored
llvm-svn: 109373
-
Douglas Gregor authored
llvm-svn: 109372
-
- Jul 24, 2010
-
-
Anton Korobeynikov authored
llvm-svn: 109359
-
Evan Cheng authored
appropriate for targets without detailed instruction itineraries. The scheduler schedules for increased instruction level parallelism in low register pressure situations; it schedules to reduce register pressure when the register pressure becomes high. On x86_64, this is a win for all tests in CFP2000. It also sped up 256.bzip2 by 16%. llvm-svn: 109300
-
Bruno Cardoso Lopes authored
llvm-svn: 109295
-
Jim Grosbach authored
function live-in set. This will give us tGPR for Thumb1 and GPR otherwise, so the copy will be spillable. rdar://8224931 llvm-svn: 109293
-
Dale Johannesen authored
comments explaining why it was wrong. 8225024. Fix the real problem in 8213383: the code that splits very large blocks when no other place to put constants can be found was not considering the case that the block contained a Thumb tablejump. llvm-svn: 109282
-
Evan Cheng authored
it's too late to start backing off aggressive latency scheduling when most of the registers are in use, so the threshold should be a bit tighter.
- Correctly handle live-outs, extract_subreg, etc.
- Enable register pressure aware scheduling by default for the hybrid scheduler. For ARM, this is almost always a win on # of instructions. It's runtime neutral for most of the tests. But for some kernels with high register pressure it can be a huge win. e.g. 464.h264ref reduced number of spills by 54 and sped up by 20%. llvm-svn: 109279
-
Bruno Cardoso Lopes authored
llvm-svn: 109276
-
- Jul 23, 2010
-
-
Bruno Cardoso Lopes authored
llvm-svn: 109248
-
Gabor Greif authored
llvm-svn: 109224
-
Gabor Greif authored
llvm-svn: 109222
-
Bruno Cardoso Lopes authored
llvm-svn: 109207
-
Bruno Cardoso Lopes authored
llvm-svn: 109206
-
Bruno Cardoso Lopes authored
Add complete assembler support for FMA3 instructions, with descriptions and encodings taken from the AVX manual llvm-svn: 109204
-