- Oct 14, 2004
Tanya Lattner authored
Checking in code that works on my simple test case. However, there is still a bug with branches that I need to fix. llvm-svn: 16979
Misha Brukman authored
the instruction binary format, all others are simply operands and should not have the `field' label llvm-svn: 16978
Misha Brukman authored
llvm-svn: 16977
Misha Brukman authored
llvm-svn: 16976
- Oct 13, 2004
Reid Spencer authored
llvm-svn: 16950
- Oct 11, 2004
Chris Lattner authored
llvm-svn: 16917
Misha Brukman authored
llvm-svn: 16905
Misha Brukman authored
llvm-svn: 16904
Misha Brukman authored
llvm-svn: 16903
Misha Brukman authored
llvm-svn: 16902
Reid Spencer authored
llvm-svn: 16898
Reid Spencer authored
llvm-svn: 16897
Reid Spencer authored
llvm-svn: 16893
- Oct 10, 2004
Brian Gaeke authored
returns 'bool' type. llvm-svn: 16884
Brian Gaeke authored
Wrap a long comment line. llvm-svn: 16883
Brian Gaeke authored
argument values passed in (so they're not dead until *after* the call), and callees are free to modify those registers. llvm-svn: 16882
Brian Gaeke authored
Deal with allocating stack space for outgoing args and copying them into the correct stack slots (at least, we can copy <=32-bit int args). We now correctly generate ADJCALLSTACK* instructions. llvm-svn: 16881
Chris Lattner authored
llvm-svn: 16873
- Oct 09, 2004
Chris Lattner authored
llvm-svn: 16870
Brian Gaeke authored
llvm-svn: 16866
Brian Gaeke authored
default 32/BE target on sparc hosts, and ppc will continue to be the default on other hosts. llvm-svn: 16865
Chris Lattner authored
the -sse* options (to avoid misleading people). Also, the stack alignment of the target doesn't depend on whether SSE is eventually implemented, so remove a comment. llvm-svn: 16860
Chris Lattner authored
which prevented setcc's from being folded into branches. It appears that conditional branchinst's CC operand is actually operand(2), not operand(0) as we might expect. :( llvm-svn: 16859
- Oct 08, 2004
Misha Brukman authored
llvm-svn: 16852
Misha Brukman authored
llvm-svn: 16849
Misha Brukman authored
llvm-svn: 16848
Misha Brukman authored
llvm-svn: 16847
Chris Lattner authored
instcombine xform, which is why we didn't notice it before. llvm-svn: 16840
Nate Begeman authored
of one or more 1 bits (may wrap from least significant bit to most significant bit) as the rlwinm rather than andi., andis., or some longer instruction sequence.

int andn4(int z) { return z & -4; }
int clearhi(int z) { return z & 0x0000FFFF; }
int clearlo(int z) { return z & 0xFFFF0000; }
int clearmid(int z) { return z & 0x00FFFF00; }
int clearwrap(int z) { return z & 0xFF0000FF; }

_andn4:
        rlwinm r3, r3, 0, 0, 29
        blr

_clearhi:
        rlwinm r3, r3, 0, 16, 31
        blr

_clearlo:
        rlwinm r3, r3, 0, 0, 15
        blr

_clearmid:
        rlwinm r3, r3, 0, 8, 23
        blr

_clearwrap:
        rlwinm r3, r3, 0, 24, 7
        blr

llvm-svn: 16832
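A minimal sketch of the mask test implied above, in C with GCC-style builtins; the helper name and the conversion to IBM-style bit numbering are illustrative, not code from this commit. It reports whether a 32-bit AND mask is a single (possibly wrapping) run of 1 bits and, if so, the rlwinm MB/ME fields:

    #include <stdint.h>

    /* Returns 1 and fills *mb/*me (IBM numbering: bit 0 is the MSB) when
     * `mask' is one contiguous run of 1 bits, possibly wrapping LSB -> MSB. */
    static int run_to_rlwinm(uint32_t mask, int *mb, int *me) {
        if (mask == 0 || mask == 0xFFFFFFFFu)
            return 0;                          /* degenerate masks handled elsewhere */
        uint32_t m = mask;
        int wrapped = 0;
        if ((m & 1) && (m & 0x80000000u)) {    /* run wraps around the word */
            m = ~m;                            /* complement is a plain run */
            wrapped = 1;
        }
        uint32_t low = m & -m;                 /* lowest set bit */
        if ((m + low) & m)
            return 0;                          /* more than one run of 1s */
        int hi = 31 - __builtin_clz(m);        /* highest set bit, LSB = 0 */
        int lo = __builtin_ctz(m);             /* lowest set bit,  LSB = 0 */
        if (!wrapped) {
            *mb = 31 - hi;
            *me = 31 - lo;
        } else {                               /* wrapped run: the ends swap */
            *mb = 31 - (lo - 1);
            *me = 31 - (hi + 1);
        }
        return 1;
    }

For clearwrap's 0xFF0000FF this yields MB=24, ME=7, matching the rlwinm r3, r3, 0, 24, 7 shown above.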
Nate Begeman authored
1. Fix an illegal argument to getClassB when deciding whether or not to sign extend a byte load.
2. Initial addition of isLoad and isStore flags to the instruction .td file for eventual use in a scheduler.
3. Rewrite of how constants are handled in emitSimpleBinaryOperation so that we can emit the PowerPC shifted immediate instructions far more often.

This allows us to emit the following code:

int foo(int x) { return x | 0x00F0000; }

_foo:
.LBB_foo_0:     ; entry
        ; IMPLICIT_DEF
        oris r3, r3, 15
        blr

llvm-svn: 16826
Nate Begeman authored
loading a 32bit constant into a register whose low halfword is all zeroes. We now omit the ori after the lis for the following C code:

int bar(int y) { return y * 0x00F0000; }

_bar:
.LBB_bar_0:     ; entry
        ; IMPLICIT_DEF
        lis r2, 15
        mullw r3, r3, r2
        blr

llvm-svn: 16825
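A small sketch of the halfword check behind this entry and the oris example in the entry above, using a hypothetical helper name rather than the actual instruction-selection code: lis materializes the high halfword shifted left 16, and the trailing ori can be dropped whenever the low halfword is zero.

    #include <stdint.h>
    #include <stdio.h>

    /* Print a (simplified) PowerPC sequence that loads `imm' into r2. */
    static void emit_load_imm(uint32_t imm) {
        uint16_t hi = (uint16_t)(imm >> 16);
        uint16_t lo = (uint16_t)(imm & 0xFFFF);
        printf("\tlis r2, %u\n", hi);           /* r2 = hi << 16 */
        if (lo != 0)                            /* the check added here */
            printf("\tori r2, r2, %u\n", lo);
    }

    int main(void) {
        emit_load_imm(0x00F0000);               /* bar()'s constant: lis r2, 15 only */
        return 0;
    }

(The sketch ignores the li-only case for small constants and the assembler's signed-operand syntax for lis.)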
Nate Begeman authored
llvm-svn: 16824
- Oct 06, 2004
Chris Lattner authored
llvm-svn: 16770
Chris Lattner authored
the JIT had last night. llvm-svn: 16766
Nate Begeman authored
llvm-svn: 16765
Nate Begeman authored
instruction. Now, rather than emitting the following loop out of bisect:

.LBB_main_19:   ; no_exit.0.i
        rlwinm r3, r2, 3, 0, 28
        lfdx f1, r3, r27
        addis r3, r30, ha16(.CPI_main_1-"L00000$pb")
        lfd f2, lo16(.CPI_main_1-"L00000$pb")(r3)
        fsub f2, f2, f1
        addis r3, r30, ha16(.CPI_main_1-"L00000$pb")
        lfd f4, lo16(.CPI_main_1-"L00000$pb")(r3)
        fcmpu cr0, f1, f4
        bge .LBB_main_64        ; no_exit.0.i
.LBB_main_63:   ; no_exit.0.i
        b .LBB_main_65  ; no_exit.0.i
.LBB_main_64:   ; no_exit.0.i
        fmr f2, f1
.LBB_main_65:   ; no_exit.0.i
        addi r3, r2, 1
        rlwinm r3, r3, 3, 0, 28
        lfdx f1, r3, r27
        addis r3, r30, ha16(.CPI_main_1-"L00000$pb")
        lfd f4, lo16(.CPI_main_1-"L00000$pb")(r3)
        fsub f4, f4, f1
        addis r3, r30, ha16(.CPI_main_1-"L00000$pb")
        lfd f5, lo16(.CPI_main_1-"L00000$pb")(r3)
        fcmpu cr0, f1, f5
        bge .LBB_main_67        ; no_exit.0.i
.LBB_main_66:   ; no_exit.0.i
        b .LBB_main_68  ; no_exit.0.i
.LBB_main_67:   ; no_exit.0.i
        fmr f4, f1
.LBB_main_68:   ; no_exit.0.i
        fadd f1, f2, f4
        addis r3, r30, ha16(.CPI_main_2-"L00000$pb")
        lfd f2, lo16(.CPI_main_2-"L00000$pb")(r3)
        fmul f1, f1, f2
        rlwinm r3, r2, 3, 0, 28
        lfdx f2, r3, r28
        fadd f4, f2, f1
        fcmpu cr0, f4, f0
        bgt .LBB_main_70        ; no_exit.0.i
.LBB_main_69:   ; no_exit.0.i
        b .LBB_main_71  ; no_exit.0.i
.LBB_main_70:   ; no_exit.0.i
        fmr f0, f4
.LBB_main_71:   ; no_exit.0.i
        fsub f1, f2, f1
        addi r2, r2, -1
        fcmpu cr0, f1, f3
        blt .LBB_main_73        ; no_exit.0.i
.LBB_main_72:   ; no_exit.0.i
        b .LBB_main_74  ; no_exit.0.i
.LBB_main_73:   ; no_exit.0.i
        fmr f3, f1
.LBB_main_74:   ; no_exit.0.i
        cmpwi cr0, r2, -1
        fmr f16, f0
        fmr f17, f3
        bgt .LBB_main_19        ; no_exit.0.i

We emit this instead:

.LBB_main_19:   ; no_exit.0.i
        rlwinm r3, r2, 3, 0, 28
        lfdx f1, r3, r27
        addis r3, r30, ha16(.CPI_main_1-"L00000$pb")
        lfd f2, lo16(.CPI_main_1-"L00000$pb")(r3)
        fsub f2, f2, f1
        fsel f1, f1, f1, f2
        addi r3, r2, 1
        rlwinm r3, r3, 3, 0, 28
        lfdx f2, r3, r27
        addis r3, r30, ha16(.CPI_main_1-"L00000$pb")
        lfd f4, lo16(.CPI_main_1-"L00000$pb")(r3)
        fsub f4, f4, f2
        fsel f2, f2, f2, f4
        fadd f1, f1, f2
        addis r3, r30, ha16(.CPI_main_2-"L00000$pb")
        lfd f2, lo16(.CPI_main_2-"L00000$pb")(r3)
        fmul f1, f1, f2
        rlwinm r3, r2, 3, 0, 28
        lfdx f2, r3, r28
        fadd f4, f2, f1
        fsub f5, f0, f4
        fsel f0, f5, f0, f4
        fsub f1, f2, f1
        addi r2, r2, -1
        fsub f2, f1, f3
        fsel f3, f2, f3, f1
        cmpwi cr0, r2, -1
        fmr f16, f0
        fmr f17, f3
        bgt .LBB_main_19        ; no_exit.0.i

llvm-svn: 16764
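For reference, a hedged C sketch of what fsel computes (names are illustrative and NaN behaviour is glossed over): fsel d, a, x, y yields x when a >= 0.0 and y otherwise, so a compare-and-branch of the form "p >= q ? x : y" can be folded into an fsub that forms p - q followed by an fsel.

    /* What a single fsel instruction computes (modulo NaN handling). */
    static double fsel_like(double a, double x, double y) {
        return (a >= 0.0) ? x : y;
    }

    /* Branch-free max(p, q), mirroring the fsub/fsel pairs in the new loop:
     *     fsub f, p, q
     *     fsel d, f, p, q
     */
    static double max_via_fsel(double p, double q) {
        return fsel_like(p - q, p, q);
    }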
Chris Lattner authored
t:
        mov %EDX, DWORD PTR [%ESP + 4]
        mov %ECX, 2
        mov %EAX, %EDX
        sar %EDX, 31
        idiv %ECX
        mov %EAX, %EDX
        ret

Generate:

t:
        mov %ECX, DWORD PTR [%ESP + 4]
***     mov %EAX, %ECX
        cdq
        and %ECX, 1
        xor %ECX, %EDX
        sub %ECX, %EDX
***     mov %EAX, %ECX
        ret

Note that the two marked moves are redundant, and should be eliminated by the register allocator, but aren't.

Compare this to GCC, which generates:

t:
        mov %eax, DWORD PTR [%esp+4]
        mov %edx, %eax
        shr %edx, 31
        lea %ecx, [%edx+%eax]
        and %ecx, -2
        sub %eax, %ecx
        ret

or ICC 8.0, which generates:

t:
        movl 4(%esp), %ecx        #3.5
        movl $-2147483647, %eax   #3.25
        imull %ecx                #3.25
        movl %ecx, %eax           #3.25
        sarl $31, %eax            #3.25
        addl %ecx, %edx           #3.25
        subl %edx, %eax           #3.25
        addl %eax, %eax           #3.25
        negl %eax                 #3.25
        subl %eax, %ecx           #3.25
        movl %ecx, %eax           #3.25
        ret                       #3.25

We would be in great shape if not for the moves.

llvm-svn: 16763
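The branch-free sequence above relies on an identity worth spelling out; here is a hedged C sketch (assuming >> on a negative int is an arithmetic shift; not taken from the commit itself): for signed x, x % 2 equals ((x & 1) ^ s) - s with s = x >> 31, which is exactly the cdq / and / xor / sub in the generated code.

    #include <assert.h>

    static int rem2_branchless(int x) {
        int s = x >> 31;             /* cdq: 0 for x >= 0, -1 for x < 0 */
        return ((x & 1) ^ s) - s;    /* and %ECX,1 ; xor %ECX,%EDX ; sub %ECX,%EDX */
    }

    int main(void) {
        for (int x = -8; x <= 8; ++x)
            assert(rem2_branchless(x) == x % 2);
        return 0;
    }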
Chris Lattner authored
Thanks to Jeff Cohen for pointing out my goof. llvm-svn: 16762
Chris Lattner authored
s:      ;; X / 4
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, %EAX
        sar %ECX, 1
        shr %ECX, 30
        mov %EDX, %EAX
        add %EDX, %ECX
        sar %EAX, 2
        ret

When we really meant:

s:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, %EAX
        sar %ECX, 1
        shr %ECX, 30
        add %EAX, %ECX
        sar %EAX, 2
        ret

Hey, this also reduces register pressure too :)

llvm-svn: 16761
Chris Lattner authored
instead of:

s:      ;; X / 2
        movl 4(%esp), %eax
        movl %eax, %ecx
        shrl $31, %ecx
        movl %eax, %edx
        addl %ecx, %edx
        sarl $1, %eax
        ret

t:      ;; X / -2
        movl 4(%esp), %eax
        movl %eax, %ecx
        shrl $31, %ecx
        movl %eax, %edx
        addl %ecx, %edx
        sarl $1, %eax
        negl %eax
        ret

Emit:

s:
        movl 4(%esp), %eax
        cmpl $-2147483648, %eax
        sbbl $-1, %eax
        sarl $1, %eax
        ret

t:
        movl 4(%esp), %eax
        cmpl $-2147483648, %eax
        sbbl $-1, %eax
        sarl $1, %eax
        negl %eax
        ret

llvm-svn: 16760
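A hedged sketch of what the cmpl/sbbl/sarl sequence computes (illustrative C, not the selector code): C division truncates toward zero while an arithmetic shift rounds toward negative infinity, so a negative dividend needs a +1 bias before the shift, and that bias is exactly what cmpl $-2147483648 / sbbl $-1 produce without a branch.

    #include <assert.h>

    static int div2_trunc(int x) {
        int bias = (x < 0) ? 1 : 0;  /* carry from cmpl/sbbl; no branch in the asm */
        return (x + bias) >> 1;      /* sarl $1; assumes >> is an arithmetic shift */
    }

    int main(void) {
        for (int x = -9; x <= 9; ++x) {
            assert( div2_trunc(x) == x /  2);
            assert(-div2_trunc(x) == x / -2);  /* X / -2 just appends a negl */
        }
        return 0;
    }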