Commits · 9bc48e521518407efb7910ac81be022cf66c91f6 · Roger Ferrer / llvm-epi-0.8

Jan 11, 2012

Disable the transformation I added in r147936 to see if it fixes some · 9bc48e52

Chandler Carruth authored Jan 11, 2012

strange build bot failures that look like a miscompile into an infloop.
I'll investigate this tomorrow, but I'd both like to know whether my
patch is the culprit, and get the bots back to green.

llvm-svn: 147945

9bc48e52

Hoist a really redundant code pattern into a helper function, and delete · 3eacfb83
Chandler Carruth authored Jan 11, 2012

lots of lines of code. No functionality changed.

llvm-svn: 147942

3eacfb83
Simplify the AND-rooted mask+shift checking code to match that of the · b0049f4a
Chandler Carruth authored Jan 11, 2012

SRL-rooted code.

llvm-svn: 147941

b0049f4a

Unify the interface of the three mask+shift transform helpers, and · 3dbcda84

Chandler Carruth authored Jan 11, 2012

factor the differences that were hiding in one of them into its other
caller, the SRL handling code. No change in behavior.

llvm-svn: 147940

3dbcda84

Clarify and make explicit some of the requirements for transforming · aa01e666

Chandler Carruth authored Jan 11, 2012

mask+shift pairs at the beginning of the ISD::AND case block, and then
hoist the final pattern into a helper function, simplifying and
reflowing it appropriately. This should have no observable behavior
change, but several simplifications fell out of this such as directly
computing the new mask constant, etc.

llvm-svn: 147939

aa01e666

Hoist the logic to transform shift+mask combinations into sub-register · 51d3076b

Chandler Carruth authored Jan 11, 2012

extracts and scaled addressing modes into its own helper function. No
functionality changed here, just hoisting and layout fixes falling out
of that hoisting.

llvm-svn: 147937

51d3076b

Teach the X86 instruction selection to do some heroic transforms to · 55b2cdee

Chandler Carruth authored Jan 11, 2012

detect a pattern which can be implemented with a small 'shl' embedded in
the addressing mode scale. This happens in real code as follows:

  unsigned x = my_accelerator_table[input >> 11];

Here we have some lookup table that we index using the high bits of
'input'. Each entry in the table is 4 bytes, which means this
implicitly gets turned into (once lowered out of a GEP):

  *(unsigned*)((char*)my_accelerator_table + ((input >> 11) << 2));

The shift right followed by a shift left is canonicalized to a smaller
shift right and masking off the low bits. That hides the shift left,
which x86 has an addressing mode designed to support. We now detect
masks of this form, and produce the longer shift right followed by the
proper addressing mode. In addition to saving a (rather large)
instruction, this also reduces stalls in Intel chips on benchmarks I've
measured.

In order for all of this to work, one part of the DAG needs to be
canonicalized *still further* than it currently is. This involves
removing pointless 'trunc' nodes between a zextload and a zext. Without
that, we end up generating spurious masks and hiding the pattern.
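
The arithmetic behind this can be checked with a small sketch. The function names below are hypothetical and are not taken from the patch; the shift amounts follow the `input >> 11` example with 4-byte table entries, and the addressing-mode scale is modelled as a plain multiply by 4.

```cpp
#include <cassert>
#include <cstdint>

// Byte offset as written after GEP lowering: shift right, then scale by 4.
constexpr std::uint32_t offsetShrShl(std::uint32_t input) {
  return (input >> 11) << 2;
}

// Canonicalized form: a smaller shift right plus a mask of the low two bits.
constexpr std::uint32_t offsetShrMask(std::uint32_t input) {
  return (input >> 9) & ~std::uint32_t{3};
}

// Form the transform aims for: index = input >> 11, with the multiply by 4
// left to the addressing-mode scale (modelled here as ordinary arithmetic).
constexpr std::uint32_t offsetScaledIndex(std::uint32_t input) {
  return (input >> 11) * 4;
}

// Hypothetical helper: true when all three forms agree for one input.
constexpr bool formsAgree(std::uint32_t input) {
  return offsetShrShl(input) == offsetShrMask(input) &&
         offsetShrMask(input) == offsetScaledIndex(input);
}

static_assert(formsAgree(0u), "forms must agree for zero");
static_assert(formsAgree(0xFFFFFFFFu), "forms must agree for all-ones");
```

All three forms compute the same offset; what differs is which one exposes a scale that the address computation can absorb.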

llvm-svn: 147936

55b2cdee

Jan 09, 2012

Don't rely on the fact that shift values are never very large, and thus · c16622da

Chandler Carruth authored Jan 09, 2012

this subtraction will result in small negative numbers at worst which
become very large positive numbers on assignment and are thus caught by
the <=4 check on the next line. The >0 check was clearly intended to catch
these as negative numbers.

Spotted by inspection, and impossible to trigger given the shift widths
that can be used.
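
A minimal sketch of the hazard, under assumptions: the names and the base value 8 are hypothetical, and only the `>0` and `<=4` bounds come from the message above. With unsigned arithmetic the lower bound never rejects a wrapped difference, so correctness rests on the upper bound alone; with signed arithmetic the lower bound does the job it was written for.

```cpp
#include <cassert>

// Unsigned subtraction: wraps to a very large value when shift > base.
inline unsigned wrappedDiff(unsigned base, unsigned shift) {
  return base - shift;
}

// Range check on the unsigned difference. A wrapped value passes "> 0"
// and is rejected only because it also exceeds the upper bound.
inline bool inRangeUnsigned(unsigned base, unsigned shift) {
  unsigned diff = base - shift;
  return diff > 0 && diff <= 4;
}

// Same check with a signed difference. A too-large shift now yields a
// negative value, which the "> 0" test rejects directly.
inline bool inRangeSigned(int base, int shift) {
  int diff = base - shift;
  return diff > 0 && diff <= 4;
}
```

Both versions return the same answers for small shift amounts, which matches the note that the problem cannot be triggered in practice.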

llvm-svn: 147773

c16622da

Nov 16, 2011
- Added missing comment about new custom lowering of DEC64 · 48784ed5
  Pete Cooper authored Nov 16, 2011
  
  llvm-svn: 144811
  48784ed5
Nov 15, 2011
- Added custom lowering for load->dec->store sequence in x86 when the EFLAGS register is used · 7c7ba1ba
  Pete Cooper authored Nov 15, 2011
  
  by later instructions. Only done for DEC64m right now. Fixes <rdar://problem/6172640> llvm-svn: 144705
  7c7ba1ba
Nov 03, 2011

Reapply r143206, with fixes. Disallow physical register lifetimes · 198b7ffc

Dan Gohman authored Nov 03, 2011

across calls, and only check for nested dependences on the special
call-sequence-resource register.

llvm-svn: 143660

198b7ffc

Oct 29, 2011
- Revert r143206, as there are still some failing tests. · 9b9c9701
  Dan Gohman authored Oct 29, 2011
  
  llvm-svn: 143262
  9b9c9701
Oct 28, 2011

Reapply r143177 and r143179 (reverting r143188), with scheduler · 73057ad2

Dan Gohman authored Oct 28, 2011

fixes: Use a separate register, instead of SP, as the
calling-convention resource, to avoid spurious conflicts with
actual uses of SP. Also, fix unscheduling of calling sequences,
which can be triggered by pseudo-two-address dependencies.

llvm-svn: 143206

73057ad2

Speculatively disable Dan's commits 143177 and 143179 to see if · 225a7037

Duncan Sands authored Oct 28, 2011

it fixes the dragonegg self-host (it looks like gcc is miscompiled).
Original commit messages:
Eliminate LegalizeOps' LegalizedNodes map and have it just call RAUW
on every node as it legalizes them. This makes it easier to use
hasOneUse() heuristics, since unneeded nodes can be removed from the
DAG earlier.

Make LegalizeOps visit the DAG in an operands-last order. It previously
used operands-first, because LegalizeTypes has to go operands-first, and
LegalizeTypes used to be part of LegalizeOps, but they're now split.
The operands-last order is more natural for several legalization tasks.
For example, it allows lowering code for nodes with floating-point or
vector constants to see those constants directly instead of seeing the
lowered form (often constant-pool loads). This makes some things
somewhat more complicated today, though it ought to allow things to be
simpler in the future. It also fixes some bugs exposed by Legalizing
using RAUW aggressively.

Remove the part of LegalizeOps that attempted to patch up invalid chain
operands on libcalls generated by LegalizeTypes, since it doesn't work
with the new LegalizeOps traversal order. Instead, define what
LegalizeTypes is doing to be correct, and transfer the responsibility
of keeping calls from having overlapping calling sequences into the
scheduler.

Teach the scheduler to model callseq_begin/end pairs as having a
physical register definition/use to prevent calls from having
overlapping calling sequences. This is also somewhat complicated, though
there are ways it might be simplified in the future.

This addresses rdar://9816668, rdar://10043614, rdar://8434668, and others.
Please direct high-level questions about this patch to management.

Delete #if 0 code accidentally left in.

llvm-svn: 143188

225a7037

Eliminate LegalizeOps' LegalizedNodes map and have it just call RAUW · 4db3f7dd

Dan Gohman authored Oct 28, 2011

on every node as it legalizes them. This makes it easier to use
hasOneUse() heuristics, since unneeded nodes can be removed from the
DAG earlier.

Make LegalizeOps visit the DAG in an operands-last order. It previously
used operands-first, because LegalizeTypes has to go operands-first, and
LegalizeTypes used to be part of LegalizeOps, but they're now split.
The operands-last order is more natural for several legalization tasks.
For example, it allows lowering code for nodes with floating-point or
vector constants to see those constants directly instead of seeing the
lowered form (often constant-pool loads). This makes some things
somewhat more complicated today, though it ought to allow things to be
simpler in the future. It also fixes some bugs exposed by Legalizing
using RAUW aggressively.

Remove the part of LegalizeOps that attempted to patch up invalid chain
operands on libcalls generated by LegalizeTypes, since it doesn't work
with the new LegalizeOps traversal order. Instead, define what
LegalizeTypes is doing to be correct, and transfer the responsibility
of keeping calls from having overlapping calling sequences into the
scheduler.

Teach the scheduler to model callseq_begin/end pairs as having a
physical register definition/use to prevent calls from having
overlapping calling sequences. This is also somewhat complicated, though
there are ways it might be simplified in the future.

This addresses rdar://9816668, rdar://10043614, rdar://8434668, and others.
Please direct high-level questions about this patch to management.

llvm-svn: 143177

4db3f7dd

Oct 08, 2011

Add TEST8ri_NOREX pseudo to constrain sub_8bit_hi copies. · 729abd36

Jakob Stoklund Olesen authored Oct 08, 2011

In 64-bit mode, sub_8bit_hi sub-registers can only be used by NOREX
instructions. The COPY created from the EXTRACT_SUBREG DAG node cannot
target all GR8 registers, only those in GR8_NOREX.

To enforce this, we ensure that all instructions using the
EXTRACT_SUBREG are GR8_NOREX constrained.

This fixes PR11088.

llvm-svn: 141499

729abd36

Aug 01, 2011
- Teach PreprocessISelDAG to be aware of vector types and to not process them. · 616fe605
  Bruno Cardoso Lopes authored Aug 01, 2011
  
  llvm-svn: 136653
  616fe605
Jul 13, 2011

Make sure we don't combine a large displacement and a frame index in the same... · 344ec797

Eli Friedman authored Jul 13, 2011

Make sure we don't combine a large displacement and a frame index in the same addressing mode on x86-64.  It can overflow, leading to a crash/miscompile.
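
One way such a safety check could look, as a sketch only: the names are hypothetical, and the choice of a 31-bit limit when a frame index is involved is an assumption made here to leave headroom for the frame offset that is added later, not something stated in the commit message.

```cpp
#include <cassert>
#include <cstdint>

// True when v is representable as a signed integer of `bits` bits
// (valid for 1 <= bits <= 63).
constexpr bool fitsSigned(std::int64_t v, unsigned bits) {
  return v >= -(std::int64_t{1} << (bits - 1)) &&
         v < (std::int64_t{1} << (bits - 1));
}

// Hypothetical policy: a displacement on its own must fit the 32-bit field.
// When the base is a frame index, require 31 bits instead (assumption), so
// that adding a frame offset of similar size cannot overflow the field.
constexpr bool dispIsFoldable(std::int64_t disp, bool baseIsFrameIndex) {
  return baseIsFrameIndex ? fitsSigned(disp, 31) : fitsSigned(disp, 32);
}
```

Under these assumptions two 31-bit signed values always sum to a value that fits in 32 bits, which is the property the tighter limit is meant to buy.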

<rdar://problem/9763308>

llvm-svn: 135084

344ec797

Refactor out checking for displacements on x86-64 addressing modes. No... · ef67e7d6

Eli Friedman authored Jul 13, 2011

Refactor out checking for displacements on x86-64 addressing modes. No functionality change. Refactoring in preparation for an additional safety check in FoldOffsetIntoAddress.

Part of <rdar://problem/9763308>.

llvm-svn: 135079

ef67e7d6

Jul 02, 2011
- TargetConstant immediates won't be placed into registers so tighten · a8a56f7e
  Eric Christopher authored Jul 01, 2011
  
  up the valid constant check earlier. rdar://9692967 llvm-svn: 134286
  a8a56f7e
Jun 30, 2011
- Fix a small thinko for constant i64 lock/orq optimization where we · c9321737
  Eric Christopher authored Jun 30, 2011
  
  didn't have an opcode for 64-bit constant or expressions. Fixes rdar://9692967 llvm-svn: 134121
  c9321737
May 20, 2011
- Re-commit 131641 with fixes; de-pseudoize MOVSX16rr8 and friends. · 91f1d247
  Stuart Hastings authored May 20, 2011
  
  rdar://problem/8614450 llvm-svn: 131746
  91f1d247
May 17, 2011
- Update comment. · 56a42ebf
  Eric Christopher authored May 17, 2011
  
  llvm-svn: 131459
  56a42ebf
- Support XOR and AND optimization with no return value. · a1d9e295
  Eric Christopher authored May 17, 2011
  
  Finishes off rdar://8470697 llvm-svn: 131458
  a1d9e295
- Couple less magic numbers. · abfe3131
  Eric Christopher authored May 17, 2011
  
  llvm-svn: 131457
  abfe3131
- Make this code a little less magic number laden. · eb47a2a1
  Eric Christopher authored May 17, 2011
  
  llvm-svn: 131456
  eb47a2a1
May 11, 2011
- Turn this into a table, this will make more sense shortly. · 2a9dbbbb
  Eric Christopher authored May 11, 2011
  
  Part of rdar://8470697 llvm-svn: 131200
  2a9dbbbb
- Optimize atomic lock or that doesn't use the result value. · 4a34e61e
  Eric Christopher authored May 10, 2011
  
  Next up: xor and and. Part of rdar://8470697 llvm-svn: 131171
  4a34e61e
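
For context on the atomic `or` entries above, this is the kind of source pattern involved, sketched with standard C++ atomics. The function names are hypothetical. When the value returned by the read-modify-write is discarded, the operation does not need to produce the old value, which is the case the optimization targets.

```cpp
#include <atomic>
#include <cassert>

// Result of the atomic OR is discarded: only the memory update matters.
inline void setBits(std::atomic<unsigned>& flags, unsigned mask) {
  (void)flags.fetch_or(mask);
}

// Result is used: the previous value has to be returned to the caller.
inline unsigned setBitsAndGetOld(std::atomic<unsigned>& flags, unsigned mask) {
  return flags.fetch_or(mask);
}
```

The same distinction applies to `fetch_xor` and `fetch_and`, which the later entry covers.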
Apr 23, 2011
- Silence an overzealous uninitialized variable warning from GCC. · 3db05465
  Benjamin Kramer authored Apr 23, 2011
  
  llvm-svn: 130053
  3db05465
Apr 22, 2011

X86: Try to use a smaller encoding by transforming (X << C1) & C2 into (X &... · 4c816247

Benjamin Kramer authored Apr 22, 2011

X86: Try to use a smaller encoding by transforming (X << C1) & C2 into (X & (C2 >> C1)) << C1. (Part of PR5039)

This tends to happen a lot with bitfield code generated by clang. A simple example for x86_64 is

  uint64_t foo(uint64_t x) { return (x&1) << 42; }

which used to compile into bloated code:

  shlq $42, %rdi ## encoding: [0x48,0xc1,0xe7,0x2a]
  movabsq $4398046511104, %rax ## encoding: [0x48,0xb8,0x00,0x00,0x00,0x00,0x00,0x04,0x00,0x00]
  andq %rdi, %rax ## encoding: [0x48,0x21,0xf8]
  ret ## encoding: [0xc3]

with this patch we can fold the immediate into the and:

  andq $1, %rdi ## encoding: [0x48,0x83,0xe7,0x01]
  movq %rdi, %rax ## encoding: [0x48,0x89,0xf8]
  shlq $42, %rax ## encoding: [0x48,0xc1,0xe0,0x2a]
  ret ## encoding: [0xc3]

It's possible to save another byte by using 'andl' instead of 'andq' but I currently see no way of doing
that without making this code even more complicated. See the TODOs in the code.
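
The identity behind the transform can be sketched as follows. The function names are hypothetical, shifts are logical (unsigned), and C1 is assumed to be smaller than the bit width. Masking before the shift lets the mask constant shrink from C2 to C2 >> C1, which is what allows the shorter immediate encoding.

```cpp
#include <cassert>
#include <cstdint>

// Original form: shift first, then mask with the (possibly wide) constant c2.
constexpr std::uint64_t shiftThenMask(std::uint64_t x, unsigned c1,
                                      std::uint64_t c2) {
  return (x << c1) & c2;
}

// Transformed form: mask with the narrower constant c2 >> c1, then shift.
constexpr std::uint64_t maskThenShift(std::uint64_t x, unsigned c1,
                                      std::uint64_t c2) {
  return (x & (c2 >> c1)) << c1;
}

// The foo() example: C1 = 42 and C2 = 1 << 42, so the new mask is just 1.
static_assert(shiftThenMask(~std::uint64_t{0}, 42, std::uint64_t{1} << 42) ==
                  maskThenShift(~std::uint64_t{0}, 42, std::uint64_t{1} << 42),
              "both forms must agree");
```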

llvm-svn: 129990

4c816247

Feb 16, 2011
- Swap VT and DebugLoc operands of getExtLoad() for consistency with · 81c43060
  Stuart Hastings authored Feb 16, 2011
  
  other getNode() methods. Radar 9002173. llvm-svn: 125665
  81c43060
Feb 13, 2011

Enhance ComputeMaskedBits to know that aligned frameindexes · 46c01a30

Chris Lattner authored Feb 13, 2011

have their low bits set to zero.  This allows us to optimize
out explicit stack alignment code like in stack-align.ll:test4 when
it is redundant.

Doing this causes the code generator to start turning FI+cst into
FI|cst all over the place, which is general goodness (that is the
canonical form) except that various pieces of the code generator
don't handle OR aggressively.  Fix this by introducing a new
SelectionDAG::isBaseWithConstantOffset predicate, and using it
in places that are looking for ADD(X,CST).  The ARM backend in
particular was missing a lot of addressing mode folding opportunities
around OR.
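
Why FI+cst may become FI|cst can be shown with a short sketch. The helper names are hypothetical and do not reflect the SelectionDAG API. The underlying fact is that when two values share no set bits, addition produces no carries, so ADD and OR give the same result; an aligned base has known-zero low bits, so any constant that fits inside them qualifies.

```cpp
#include <cassert>
#include <cstdint>

// ADD and OR agree exactly when the operands have no set bits in common.
constexpr bool addEqualsOr(std::uint64_t base, std::uint64_t offset) {
  return (base & offset) == 0;
}

// Mask of low bits known to be zero for an address aligned to `align`
// (align is assumed to be a power of two).
constexpr std::uint64_t knownZeroLowBits(std::uint64_t align) {
  return align - 1;
}

// True when the offset only touches bits known to be zero in the base.
constexpr bool offsetFitsInKnownZero(std::uint64_t align,
                                     std::uint64_t offset) {
  return (offset & ~knownZeroLowBits(align)) == 0;
}
```

Code that pattern-matches base-plus-constant addresses therefore has to accept either spelling, which is the role the new predicate plays.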

llvm-svn: 125470

46c01a30

Jan 27, 2011
- lib/Target/X86/X86ISelDAGToDAG.cpp: __main should be WINCALL64 on Win64. · f3e20b9f
  NAKAMURA Takumi authored Jan 27, 2011
  
  CALL64 marks %xmm* as dead. llvm-svn: 124354
  f3e20b9f
Jan 16, 2011

fix PR8514, a bug where the "heroic" transformation of shift/and · 35a2e65b

Chris Lattner authored Jan 16, 2011

into and/shift would cause nodes to move around and a dangling pointer
to happen.  The code tried to avoid this with a HandleSDNode, but 
got the details wrong.

llvm-svn: 123578

35a2e65b

Jan 14, 2011
- 'HiReg' is written but never read. Nuke its · b5241b2b
  Ted Kremenek authored Jan 14, 2011
  
  declaration and its assignments. Found by clang static analyzer. llvm-svn: 123486
  b5241b2b
Jan 06, 2011

PR8918 - When used with MinGW64, LLVM generates a "calll __main" at the · 81d40711

Bill Wendling authored Jan 06, 2011

beginning of the "main" function. The assembler complains about the invalid
suffix for the 'call' instruction. The right instruction is "callq __main".
Patch by KS Sreeram!

llvm-svn: 122933

81d40711

Dec 21, 2010
- rename MVT::Flag to MVT::Glue. "Flag" is a terrible name for · 3e5fbd74
  Chris Lattner authored Dec 21, 2010
  
  something that just glues two nodes together, even if it is sometimes used for flags. llvm-svn: 122310
  3e5fbd74
Dec 05, 2010

it turns out that when ".with.overflow" intrinsics were added to the X86 · 364bb0a0

Chris Lattner authored Dec 05, 2010

backend, they were all implemented except umul.  This one fell back
to the default implementation that did a hi/lo multiply and compared the
top.  Fix this to check the overflow flag that the 'mul' instruction
sets, so we can avoid an explicit test.  Now we compile:

void *func(long count) {
      return new int[count];
}

into:

__Z4funcl:                              ## @_Z4funcl
	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
	seto	%cl                     ## encoding: [0x0f,0x90,0xc1]
	testb	%cl, %cl                ## encoding: [0x84,0xc9]
	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
	jmp	__Znam                  ## TAILCALL

instead of:

__Z4funcl:                              ## @_Z4funcl
	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
	testq	%rdx, %rdx              ## encoding: [0x48,0x85,0xd2]
	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
	jmp	__Znam                  ## TAILCALL

Other than the silly seto+test, this is using the o bit directly, so it's going in the right
direction.
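
The size computation that `new int[count]` implies can be sketched at the source level. This assumes a GCC- or Clang-compatible compiler, since it uses the `__builtin_mul_overflow` builtin; the function name is hypothetical, and the all-ones result on overflow mirrors the `movq $-1` in the listings above.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Number of bytes to request for `count` elements of `elemSize` bytes each.
// On overflow, return the largest size_t so the allocation is sure to fail.
inline std::size_t arrayAllocSize(std::size_t count, std::size_t elemSize) {
  std::size_t bytes = 0;
  if (__builtin_mul_overflow(count, elemSize, &bytes))
    return std::numeric_limits<std::size_t>::max();
  return bytes;
}
```

The builtin reports overflow as a boolean alongside the product, which is the information the 'mul' instruction's overflow flag already carries.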

llvm-svn: 120935

364bb0a0

Oct 27, 2010

Use a MemIntrinsicSDNode for ISD::PREFETCH, which touches · e660f4d0

Dale Johannesen authored Oct 26, 2010

memory, so a MachineMemOperand is useful (not propagated
into the MachineInstr yet).  No functional change except
for dump output.

llvm-svn: 117413

e660f4d0

Oct 06, 2010
- Use #NAME# to have the CMOV multiclass define things with the same names as before · 1a1c6001
  Chris Lattner authored Oct 05, 2010
  
  (e.g. CMOVBE16rr instead of CMOVBErr16). llvm-svn: 115705
  1a1c6001