- Jul 29, 2010
-
Jim Grosbach authored
llvm-svn: 109691
-
- Jul 28, 2010
-
Jakob Stoklund Olesen authored
The size of this object isn't used for anything; technically it is of variable size. This avoids a false positive from the assert in X86InstrInfo::loadRegFromStackSlot and fixes PR7735. llvm-svn: 109652
-
Dan Gohman authored
of a std::vector. llvm-svn: 109597
-
Dan Gohman authored
to avoid undefined behavior on overflow, noticed by John Regehr. llvm-svn: 109594
-
Nate Begeman authored
This is about 4x faster and smaller than the existing scalarization. llvm-svn: 109566
-
Nate Begeman authored
~40% faster vector shl <4 x i32> on SSE 4.1. Larger improvements for smaller types coming in future patches. For:

define <2 x i64> @shl(<4 x i32> %r, <4 x i32> %a) nounwind readnone ssp {
entry:
  %shl = shl <4 x i32> %r, %a                   ; <<4 x i32>> [#uses=1]
  %tmp2 = bitcast <4 x i32> %shl to <2 x i64>   ; <<2 x i64>> [#uses=1]
  ret <2 x i64> %tmp2
}

We get:

_shl:                         ## @shl
  pslld   $23, %xmm1
  paddd   LCPI0_0, %xmm1
  cvttps2dq %xmm1, %xmm1
  pmulld  %xmm1, %xmm0
  ret

Instead of:

_shl:                         ## @shl
  pshufd  $3, %xmm0, %xmm2
  movd    %xmm2, %eax
  pshufd  $3, %xmm1, %xmm2
  movd    %xmm2, %ecx
  shll    %cl, %eax
  movd    %eax, %xmm2
  pshufd  $1, %xmm0, %xmm3
  movd    %xmm3, %eax
  pshufd  $1, %xmm1, %xmm3
  movd    %xmm3, %ecx
  shll    %cl, %eax
  movd    %eax, %xmm3
  punpckldq %xmm2, %xmm3
  movd    %xmm0, %eax
  movd    %xmm1, %ecx
  shll    %cl, %eax
  movd    %eax, %xmm2
  movhlps %xmm0, %xmm0
  movd    %xmm0, %eax
  movhlps %xmm1, %xmm1
  movd    %xmm1, %ecx
  shll    %cl, %eax
  movd    %eax, %xmm0
  punpckldq %xmm0, %xmm2
  movdqa  %xmm2, %xmm0
  punpckldq %xmm3, %xmm0
  ret

llvm-svn: 109549
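For reference, the fast sequence is the standard trick of materializing 2^a in a float's exponent field and then multiplying. A minimal C++ sketch with SSE intrinsics, assuming SSE4.1 and shift amounts in [0, 31] (the helper name is illustrative, not from the patch):

#include <smmintrin.h>  // SSE4.1, for _mm_mullo_epi32 (pmulld)

// Per-lane variable shift-left of four 32-bit lanes, mirroring the
// codegen above.
static __m128i shl_v4i32(__m128i r, __m128i a) {
  // (a << 23) + 0x3f800000 builds the float 2^a in each lane: the bias
  // constant is the bit pattern of 1.0f (biased exponent 127).
  __m128i exp = _mm_add_epi32(_mm_slli_epi32(a, 23),
                              _mm_set1_epi32(0x3f800000));
  // Truncating float->int conversion (cvttps2dq) recovers the integer 2^a.
  __m128i pow2 = _mm_cvttps_epi32(_mm_castsi128_ps(exp));
  // r << a == r * 2^a in the low 32 bits (pmulld).
  return _mm_mullo_epi32(r, pow2);
}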
-
- Jul 27, 2010
-
Michael J. Spencer authored
llvm-svn: 109494
-
Jakob Stoklund Olesen authored
subregister operands like this:

  %reg1040:sub_32bit<def> = MOV32rm <fi#-2>, 1, %reg0, 0, %reg0, %reg1040<imp-def>; mem:LD4[FixedStack-2](align=8)

Make them return false when subreg operands are present. VirtRegRewriter is making bad assumptions otherwise. This fixes PR7713. llvm-svn: 109489
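The kind of guard this implies looks roughly like the following C++ sketch against the MachineInstr API (the helper is hypothetical, not the actual patch):

#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineOperand.h"

using namespace llvm;

// Before reporting an instruction as a plain stack-slot load/store, bail
// out if any register operand carries a subregister index: the memory
// access then touches only part of the register, so it is not a full
// spill or reload.
static bool hasSubregOperand(const MachineInstr &MI) {
  for (const MachineOperand &MO : MI.operands())
    if (MO.isReg() && MO.getSubReg())
      return true;
  return false;
}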
-
Jakob Stoklund Olesen authored
with a too-big register class. llvm-svn: 109488
-
Eli Friedman authored
llvm-svn: 109458
-
Anton Korobeynikov authored
llvm-svn: 109456
-
- Jul 26, 2010
-
Evan Cheng authored
llvm-svn: 109450
-
Anton Korobeynikov authored
llvm-svn: 109448
-
Bruno Cardoso Lopes authored
we are using AVX and no AVX version of the desired instruction is present; this is better for incremental development (without fallbacks it's easier to spot what's missing). Not sure this is the best hack though (we could also disable all HasSSE* predicates by dynamically marking them 'false' if AVX is present). llvm-svn: 109434
-
Anton Korobeynikov authored
This assumption is not satisfied due to global merging. Work around the issue by temporarily disabling merging of const globals. Also, ignore LLVM "special" globals. This fixes PR7716. llvm-svn: 109423
-
Evan Cheng authored
llvm-svn: 109421
-
- Jul 25, 2010
-
Douglas Gregor authored
llvm-svn: 109373
-
Douglas Gregor authored
llvm-svn: 109372
-
- Jul 24, 2010
-
Anton Korobeynikov authored
llvm-svn: 109359
-
Evan Cheng authored
appropriate for targets without detailed instruction itineraries. The scheduler schedules for increased instruction-level parallelism in low register pressure situations; it schedules to reduce register pressure when the register pressure becomes high. On x86_64, this is a win for all tests in CFP2000. It also sped up 256.bzip2 by 16%. llvm-svn: 109300
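A minimal sketch of that two-mode heuristic, in plain C++ with illustrative names (none of these types or thresholds are the actual scheduler code): favor the critical path while pressure is low, and favor freeing registers once pressure is high.

#include <algorithm>
#include <vector>

struct Candidate {
  int Height;    // critical-path height; larger = more latency-critical
  int RegDelta;  // net change in live registers if scheduled; negative frees regs
};

// Pick the next instruction from a non-empty ready list.
static const Candidate *pick(const std::vector<Candidate> &Ready,
                             int LiveRegs, int RegLimit) {
  bool HighPressure = LiveRegs >= RegLimit;  // threshold is illustrative
  return &*std::min_element(
      Ready.begin(), Ready.end(),
      [&](const Candidate &A, const Candidate &B) {
        if (HighPressure)
          return A.RegDelta < B.RegDelta;  // prefer freeing registers
        return A.Height > B.Height;        // prefer the critical path (ILP)
      });
}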
-
Bruno Cardoso Lopes authored
llvm-svn: 109295
-
Jim Grosbach authored
function live-in set. This will give us tGPR for Thumb1 and GPR otherwise, so the copy will be spillable. rdar://8224931 llvm-svn: 109293
-
Dale Johannesen authored
comments explaining why it was wrong. 8225024. Fix the real problem in 8213383: the code that splits very large blocks when no other place to put constants can be found was not considering the case where the block contained a Thumb tablejump. llvm-svn: 109282
-
Evan Cheng authored
it's too late to start backing off aggressive latency scheduling when most of the registers are in use, so the threshold should be a bit tighter.
- Correctly handle live-outs, extract_subreg, etc.
- Enable register pressure aware scheduling by default for the hybrid scheduler. For ARM, this is almost always a win on number of instructions. It's runtime-neutral for most of the tests, but for some kernels with high register pressure it can be a huge win; e.g., 464.h264ref reduced the number of spills by 54 and sped up by 20%. llvm-svn: 109279
-
Bruno Cardoso Lopes authored
llvm-svn: 109276
-
- Jul 23, 2010
-
Bruno Cardoso Lopes authored
llvm-svn: 109248
-
Gabor Greif authored
llvm-svn: 109224
-
Gabor Greif authored
llvm-svn: 109222
-
Bruno Cardoso Lopes authored
llvm-svn: 109207
-
Bruno Cardoso Lopes authored
llvm-svn: 109206
-
Bruno Cardoso Lopes authored
Add complete assembler support for FMA3 instructions, with descriptions and encodings taken from the AVX manual. llvm-svn: 109204
-
Dale Johannesen authored
SSE, so we can't return floating point values if this is disabled. Detect this error for clang. With SSE1 only, f64 is a problem; it can be done, but neither llvm-gcc nor clang has ever generated correct code for it. Since nobody noticed this, I think it's OK to treat it as an error for now. This also handles SSE-sized vectors of floating point. 8207686, 8204109. llvm-svn: 109201
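For illustration, this is the sort of source that now gets a diagnostic instead of silently wrong code when SSE is disabled on x86-64 (the exact flags are an assumption about typical usage):

// Under the x86-64 SysV calling convention, floating-point arguments and
// return values travel in %xmm registers, which do not exist without SSE.
// Reproduce with something like: clang -target x86_64 -mno-sse -c fp.c
double half(double x) { return x * 0.5; }  // FP arg/return require SSE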
-
Bruno Cardoso Lopes authored
Fix some AVX instructions which didn't have the HasAVX prefix, and also a problem with PINSRW, which was totally wrong because of a typo I introduced previously. llvm-svn: 109198
-
- Jul 22, 2010
-
Chris Lattner authored
ARM/PPC/MSP430-specific code (these are the only targets that implement the hook) can directly reference their target-specific InstrInfo classes. llvm-svn: 109171
-
Bruno Cardoso Lopes authored
Add remaining AVX instructions (most of them dealing with GR64 destinations). This completes the assembler support for the general AVX ISA, but we still miss instructions behind the FMA3 and CLMUL feature flags, which are the next step. llvm-svn: 109168
-
Chris Lattner authored
llvm-svn: 109167
-
Chris Lattner authored
This is probably not the best way to implement "force LR to be spilled if the Thumb function size is > 2048"; to do this right, it should use the branch shortening infrastructure, but I'm just preserving functionality here. llvm-svn: 109165
-
Chris Lattner authored
llvm-svn: 109154
-
Chris Lattner authored
rip out the implementation of X86InstrInfo::GetInstSizeInBytes. The code being ripped out just implemented a copied and hacked-up version of the (old) instruction encoder, and is buggy and terrible in other ways. Since "GetInstSizeInBytes" is really only there to support the JIT's "NeedsExactSize" hook (which no one is using), just rip out the code. I will rip out the NeedsExactSize hook next. This resolves rdar://7617809 - switch X86InstrInfo::GetInstSizeInBytes to use X86MCCodeEmitter. llvm-svn: 109149
-
Xerxes Ranby authored
llvm-svn: 109125
-