Chilly Willy, on 18 Jun 2014 - 9:33 PM wrote:
Just a note about trying to get the GPU gcc running. I installed dosbox, and you can run gcc like this:
cd to the directory where the official devkit is.
Run under dosbox using the command line:
Code: Select all
dosbox -c "MOUNT C ." -c "C:" -c "PATH=C:\\jaguar\\Bin;C:\\jaguar\\Bin\\AGPU\\2.6\\" -c "GCC.EXE -mnoalt -c -S C:\\test\\test.c -o C:\\test\\test.s" -c "EXIT"
That sets the C drive to the devkit, makes C: the current drive, sets the PATH for the compiler, and then runs gcc on the test file (which is just in devkit/test/ for the test). It converts this:
Code: Select all
/* simple test file for gcc */
/* compile (curdir is devkit):
dosbox -c "MOUNT C ." -c "C:" -c "PATH=C:\\jaguar\\Bin;C:\\jaguar\\Bin\\AGPU\\2.6\\" -c "GCC.EXE -mnoalt -c -S C:\\test\\test.c -o C:\\test\\test.s" -c "EXIT"
*/

int global = 5;

int GetTest1(int val)
{
    int test = 0;

    if (val >= 10) {
        test = 100;
    } else if (val >= 10) { /* same test as the branch above, so never taken */
        test = 10;
    } else if (val) {
        test = 1;
    }
    return test;
}

int GetTest2(int val)
{
    int test = 0;

    switch (val) {
        case 1:
        case 2:
        case 3:
        case 4:
        case 5:
        case 6:
        case 7:
        case 8:
        case 9:
            test = 1;
            break;
        default:
            test = 10;
    }
    return test;
}

int Test3(int val)
{
    int x, y, z;

    for (x=0, z=0; x<val; x++) {
        y = 1 << x;
        z += (GetTest1(x) + GetTest2(y));
    }
    return z;
}
into this:
Code: Select all
;GCC for Atari Jaguar GPU/DSP (Jun 12 1995) (C)1994-95 Brainstorm
    MACRO _RTS
    load (ST),TMP
    jump T,(TMP)
    addqt #4,ST ;rts
    ENDM
_test_start::
    .GPU
    .ORG $F03000
ST .REGEQU r18
FP .REGEQU r17
TMP .REGEQU r16
GT .CCDEF $15
gcc2_compiled_for_madmac:
;(.DATA)
    .LONG
_global::
    .DC.L 5
;(.TEXT)
    .EVEN
_GetTest1::
    subqt #4,ST
    store FP,(ST)
    move ST,FP ;link
    subqt #8,ST
    move FP,r1 ;movsi FP->r1
    move FP,r2 ;movsi FP->r2
    subqt #4,r2 ;isubqtsi3 r2-#4->r2
    move r2,r3 ;movsi r2->r3
    store r0,(r3) ;movsi r0->(r3)
    move FP,r0 ;movsi FP->r0
    move FP,r4 ;movsi FP->r4
    subqt #8,r4 ;isubqtsi3 r4-#8->r4
    move r4,r0 ;movsi r4->r0
    moveq #0,r6 ;movsi #0->r6
    store r6,(r0) ;movsi r6->(r0)
    move FP,r0 ;movsi FP->r0
    move FP,r4 ;movsi FP->r4
    subqt #4,r4 ;isubqtsi3 r4-#4->r4
    move r4,r0 ;movsi r4->r0
    load (r0),r4 ;movsi (r0)->r4
    moveq #9,r0 ;movsi #9->r0
    cmp r4,r0 ;rcmpsi r4,r0
    movei #L2,TMP
    jump GT,(TMP)
    nop ;jgt L2
    move FP,r0 ;movsi FP->r0
    move FP,r4 ;movsi FP->r4
    subqt #8,r4 ;isubqtsi3 r4-#8->r4
    move r4,r0 ;movsi r4->r0
    movei #100,r6 ;movsi #100->r6
    store r6,(r0) ;movsi r6->(r0)
    movei #L3,TMP
    jump T,(TMP)
    nop ;jt L3
    .EVEN
L2:
    move FP,r0 ;movsi FP->r0
    move FP,r4 ;movsi FP->r4
    subqt #4,r4 ;isubqtsi3 r4-#4->r4
    move r4,r0 ;movsi r4->r0
    load (r0),r4 ;movsi (r0)->r4
    moveq #9,r0 ;movsi #9->r0
    cmp r4,r0 ;rcmpsi r4,r0
    movei #L4,TMP
    jump GT,(TMP)
    nop ;jgt L4
    move FP,r0 ;movsi FP->r0
    move FP,r4 ;movsi FP->r4
    subqt #8,r4 ;isubqtsi3 r4-#8->r4
    move r4,r0 ;movsi r4->r0
    moveq #10,r6 ;movsi #10->r6
    store r6,(r0) ;movsi r6->(r0)
    movei #L5,TMP
    jump T,(TMP)
    nop ;jt L5
    .EVEN
L4:
    move FP,r0 ;movsi FP->r0
    move FP,r4 ;movsi FP->r4
    subqt #4,r4 ;isubqtsi3 r4-#4->r4
    move r4,r0 ;movsi r4->r0
    load (r0),r4 ;movsi (r0)->r4
    cmpq #0,r4 ;tstsi r4
    movei #L6,TMP
    jump EQ,(TMP)
    nop ;jeq L6
    move FP,r0 ;movsi FP->r0
    move FP,r4 ;movsi FP->r4
    subqt #8,r4 ;isubqtsi3 r4-#8->r4
    move r4,r0 ;movsi r4->r0
    moveq #1,r6 ;movsi #1->r6
    store r6,(r0) ;movsi r6->(r0)
L6:
L5:
L3:
    move FP,r0 ;movsi FP->r0
    move FP,r4 ;movsi FP->r4
    subqt #8,r4 ;isubqtsi3 r4-#8->r4
    move r4,r5 ;movsi r4->r5
    load (r5),r0 ;movsi (r5)->r0
    movei #L1,TMP
    jump T,(TMP)
    nop ;jt L1
    .EVEN
L1:
    move FP,ST
    load (ST),FP
    addqt #4,ST ;unlk
    _RTS
    .EVEN
_GetTest2::
    subqt #4,ST
    store FP,(ST)
    move ST,FP ;link
    subqt #8,ST
    move FP,r1 ;movsi FP->r1
    move FP,r2 ;movsi FP->r2
    subqt #4,r2 ;isubqtsi3 r2-#4->r2
    move r2,r3 ;movsi r2->r3
    store r0,(r3) ;movsi r0->(r3)
    move FP,r0 ;movsi FP->r0
    move FP,r4 ;movsi FP->r4
    subqt #8,r4 ;isubqtsi3 r4-#8->r4
    move r4,r0 ;movsi r4->r0
    moveq #0,r6 ;movsi #0->r6
    store r6,(r0) ;movsi r6->(r0)
    move FP,r0 ;movsi FP->r0
    move FP,r4 ;movsi FP->r4
    subqt #4,r4 ;isubqtsi3 r4-#4->r4
    move r4,r5 ;movsi r4->r5
    load (r5),r0 ;movsi (r5)->r0
    moveq #9,r4 ;movsi #9->r4
    cmp r4,r0 ;cmpsi r4,r0
    movei #L18,TMP
    jump GT,(TMP)
    nop ;jgt L18
    moveq #1,r4 ;movsi #1->r4
    cmp r4,r0 ;cmpsi r4,r0
    movei #L18,TMP
    jump MI,(TMP)
    nop ;jlt L18
    movei #L9,TMP
    jump T,(TMP)
    nop ;jt L9
    .EVEN
L9:
L10:
L11:
L12:
L13:
L14:
L15:
L16:
L17:
    move FP,r0 ;movsi FP->r0
    move FP,r4 ;movsi FP->r4
    subqt #8,r4 ;isubqtsi3 r4-#8->r4
    move r4,r0 ;movsi r4->r0
    moveq #1,r6 ;movsi #1->r6
    store r6,(r0) ;movsi r6->(r0)
    movei #L8,TMP
    jump T,(TMP)
    nop ;jt L8
    .EVEN
L18:
    move FP,r0 ;movsi FP->r0
    move FP,r4 ;movsi FP->r4
    subqt #8,r4 ;isubqtsi3 r4-#8->r4
    move r4,r0 ;movsi r4->r0
    moveq #10,r6 ;movsi #10->r6
    store r6,(r0) ;movsi r6->(r0)
L8:
    move FP,r0 ;movsi FP->r0
    move FP,r4 ;movsi FP->r4
    subqt #8,r4 ;isubqtsi3 r4-#8->r4
    move r4,r5 ;movsi r4->r5
    load (r5),r0 ;movsi (r5)->r0
    movei #L7,TMP
    jump T,(TMP)
    nop ;jt L7
    .EVEN
L7:
    move FP,ST
    load (ST),FP
    addqt #4,ST ;unlk
    _RTS
    .EVEN
_Test3::
    subqt #4,ST
    store FP,(ST)
    move ST,FP ;link
    subqt #32,ST
    subqt #8,ST
    move ST,r14
    store r19,(ST)
    store r20,(r14+1)
    store r21,(r14+2)
    store r22,(r14+3)
    store r23,(r14+4)
    store r24,(r14+5)
    move FP,r19 ;movsi FP->r19
    move FP,r20 ;movsi FP->r20
    subqt #4,r20 ;isubqtsi3 r20-#4->r20
    move r20,r21 ;movsi r20->r21
    store r0,(r21) ;movsi r0->(r21)
    move FP,r0 ;movsi FP->r0
    move FP,r1 ;movsi FP->r1
    subqt #8,r1 ;isubqtsi3 r1-#8->r1
    move r1,r0 ;movsi r1->r0
    moveq #0,r3 ;movsi #0->r3
    store r3,(r0) ;movsi r3->(r0)
    move FP,r0 ;movsi FP->r0
    move FP,r1 ;movsi FP->r1
    subqt #16,r1 ;isubqtsi3 r1-#16->r1
    move r1,r0 ;movsi r1->r0
    moveq #0,r3 ;movsi #0->r3
    store r3,(r0) ;movsi r3->(r0)
L21:
    move FP,r0 ;movsi FP->r0
    move FP,r1 ;movsi FP->r1
    subqt #8,r1 ;isubqtsi3 r1-#8->r1
    move r1,r0 ;movsi r1->r0
    move FP,r1 ;movsi FP->r1
    move FP,r2 ;movsi FP->r2
    subqt #4,r2 ;isubqtsi3 r2-#4->r2
    move r2,r1 ;movsi r2->r1
    load (r0),r0 ;movsi (r0)->r0
    load (r1),r1 ;movsi (r1)->r1
    cmp r1,r0 ;cmpsi r1,r0
    movei #L24,TMP
    jump MI,(TMP)
    nop ;jlt L24
    movei #L22,TMP
    jump T,(TMP)
    nop ;jt L22
    .EVEN
L24:
    move FP,r0 ;movsi FP->r0
    move FP,r1 ;movsi FP->r1
    subqt #12,r1 ;isubqtsi3 r1-#12->r1
    move r1,r0 ;movsi r1->r0
    move FP,r1 ;movsi FP->r1
    move FP,r2 ;movsi FP->r2
    subqt #8,r2 ;isubqtsi3 r2-#8->r2
    move r2,r1 ;movsi r2->r1
    moveq #1,r2 ;movsi #1->r2
    load (r1),r1 ;movsi (r1)->r1
    neg r1 ;negsi2 r1->r1
    sha r1,r2 ;iashlsi3 r2<<r1->r2
    neg r1 ;negsi2 r1->r1
    store r2,(r0) ;movsi r2->(r0)
    move FP,r0 ;movsi FP->r0
    move FP,r1 ;movsi FP->r1
    subqt #16,r1 ;isubqtsi3 r1-#16->r1
    move r1,r22 ;movsi r1->r22
    move ST,r0 ;movsi ST->r0
    move FP,r1 ;movsi FP->r1
    move FP,r0 ;movsi FP->r0
    subqt #8,r0 ;isubqtsi3 r0-#8->r0
    move r0,r1 ;movsi r0->r1
    load (r1),r0 ;movsi (r1)->r0
    movei #_GetTest1,r23 ;movsi #_GetTest1->r23
    move PC,TMP
    subqt #4,ST
    addqt #10,TMP
    jump T,(r23)
    store TMP,(ST) ;call r23->r0
    move r0,r23 ;movsi r0->r23
    move ST,r0 ;movsi ST->r0
    move FP,r1 ;movsi FP->r1
    move FP,r0 ;movsi FP->r0
    subqt #12,r0 ;isubqtsi3 r0-#12->r0
    move r0,r1 ;movsi r0->r1
    load (r1),r0 ;movsi (r1)->r0
    movei #_GetTest2,r24 ;movsi #_GetTest2->r24
    move PC,TMP
    subqt #4,ST
    addqt #10,TMP
    jump T,(r24)
    store TMP,(ST) ;call r24->r0
    move FP,r1 ;movsi FP->r1
    move FP,r2 ;movsi FP->r2
    subqt #16,r2 ;isubqtsi3 r2-#16->r2
    move r2,r1 ;movsi r2->r1
    add r23,r0 ;iaddsi3 r23+r0->r0
    load (r1),r1 ;movsi (r1)->r1
    add r1,r0 ;iaddsi3 r1+r0->r0
    store r0,(r22) ;movsi r0->(r22)
L23:
    move FP,r0 ;movsi FP->r0
    move FP,r1 ;movsi FP->r1
    subqt #8,r1 ;isubqtsi3 r1-#8->r1
    move r1,r0 ;movsi r1->r0
    move FP,r1 ;movsi FP->r1
    move FP,r2 ;movsi FP->r2
    subqt #8,r2 ;isubqtsi3 r2-#8->r2
    move r2,r0 ;movsi r2->r0
    move FP,r1 ;movsi FP->r1
    move FP,r2 ;movsi FP->r2
    subqt #8,r2 ;isubqtsi3 r2-#8->r2
    move r2,r1 ;movsi r2->r1
    load (r1),r2 ;movsi (r1)->r2
    move r2,r1 ;movsi r2->r1
    addqt #1,r1 ;iaddqtsi3 #1+r1->r1
    move r1,r2 ;movsi r1->r2
    store r2,(r0) ;movsi r2->(r0)
    movei #L21,TMP
    jump T,(TMP)
    nop ;jt L21
    .EVEN
L22:
    move FP,r0 ;movsi FP->r0
    move FP,r1 ;movsi FP->r1
    subqt #16,r1 ;isubqtsi3 r1-#16->r1
    move r1,r2 ;movsi r1->r2
    load (r2),r0 ;movsi (r2)->r0
    movei #L20,TMP
    jump T,(TMP)
    nop ;jt L20
    .EVEN
L20:
    move ST,r14
    load (ST),r19
    load (r14+1),r20
    load (r14+2),r21
    load (r14+3),r22
    load (r14+4),r23
    load (r14+5),r24
    move FP,ST
    load (ST),FP
    addqt #4,ST ;unlk
    _RTS
    .LONG
    .68000
_test_end::
_test_size .EQU *-_test_start
    .GLOBL _test_size
    .IF _test_size>$1000
    .PRINT "Code size (",/l/x _test_size,") is over $1000"
    .FAIL
    .ENDIF
I need to play around with the gcc switches (only using -mnoalt in the test), then make it use smac on the output file. My intention is to make a standard up-to-date gcc cross-compiler for the 68000, and have the makefile call the GPU GCC as above. The 68000 object files will be converted to COFF so that in the end, sln can link it all up. All of that will be handled by rules in the makefile so that the programmer doesn't need to worry about any of the internals. They'll simply define a set of 68k objects, a set of gpu objects, a set of dsp objects, and all the rest should be handled by the rules.
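For a build rule to call the GPU gcc once per source file, the dosbox invocation from this post has to be generated with the file name filled in. A minimal sketch of that step in C follows; the helper name build_gpu_gcc_command and its dir/name parameters are hypothetical, and it assumes the source sits at C:\<dir>\<name>.c inside the mounted devkit, as test\test.c does. The switches and PATH are copied from the command in the post, nothing else is implied about the toolchain.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the dosbox command line from the post for one
 * source file. dir and name are assumptions of this sketch: the file is
 * expected at C:\<dir>\<name>.c inside the mounted devkit directory.
 * Backslashes are doubled in the output because the command is meant to be
 * pasted into (or run by) a shell, exactly as in the original post.
 * Returns 0 on success, -1 if the buffer is too small. */
int build_gpu_gcc_command(char *out, size_t out_size,
                          const char *dir, const char *name)
{
    assert(out != NULL && dir != NULL && name != NULL);

    int n = snprintf(out, out_size,
        "dosbox -c \"MOUNT C .\" -c \"C:\" "
        "-c \"PATH=C:\\\\jaguar\\\\Bin;C:\\\\jaguar\\\\Bin\\\\AGPU\\\\2.6\\\\\" "
        "-c \"GCC.EXE -mnoalt -c -S C:\\\\%s\\\\%s.c -o C:\\\\%s\\\\%s.s\" "
        "-c \"EXIT\"",
        dir, name, dir, name);

    /* snprintf reports the length it wanted; anything >= out_size was cut. */
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Calling build_gpu_gcc_command(buf, sizeof buf, "test", "test") reproduces the command used for the test file; a makefile rule would do the same substitution with its own pattern variables.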
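Now that I know how to call dosbox to run the gpu gcc, I'm thinking about different program structures. The first is the default structure Atari used - the main program runs on the 68000 with blocks of helper code loaded to the gpu/dsp local ram as needed. That is easy to handle, but I will need functions to load and execute risc code.
The second structure would be the 68000 running code like normal, but the gpu also running out of main ram. The 68000 code could halt the 68000, leaving the gpu as the main processor. The gpu could still load and execute gpu functions in local ram for best speed on code that needs the speed. Remember that in most apps/games, only maybe 5% of the code needs to be optimized, and part of the optimizing of gpu code is running from local ram. Of course, this sort of setup requires the gpu gcc be generating good code.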
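Loading a block of compiled GPU code into local ram and starting it could look roughly like the sketch below. It is not code from the thread: the function names gpu_load and gpu_start are hypothetical, the register addresses and the GPUGO bit are assumptions to be checked against the devkit headers, and the copy uses 32-bit words on the assumption that local ram wants long accesses. The only values taken from the listing are the $F03000 load address and the $1000 size limit. Pointers are passed in as parameters so the logic can also be exercised on a host machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 4K of GPU local ram, the same limit as the ".IF _test_size>$1000" check. */
#define GPU_LOCAL_RAM_BYTES 0x1000u

/* Assumed Jaguar addresses (not from the thread); verify before real use. */
#define GPU_RAM_ADDR 0xF03000u  /* matches ".ORG $F03000" in the listing */
#define G_PC_ADDR    0xF02110u  /* assumed: GPU program counter register */
#define G_CTRL_ADDR  0xF02114u  /* assumed: GPU control register */
#define GPUGO        0x1u       /* assumed: "run" bit in the control register */

/* Copy a code block into GPU local ram as 32-bit words.
 * Returns 0 on success, -1 if the block is too big or not long-aligned. */
int gpu_load(volatile uint32_t *local_ram, const uint32_t *code, size_t bytes)
{
    assert(local_ram != NULL && code != NULL);

    if (bytes > GPU_LOCAL_RAM_BYTES || (bytes & 3u) != 0)
        return -1;

    for (size_t i = 0; i < bytes / 4; i++)
        local_ram[i] = code[i];
    return 0;
}

/* Point the GPU at an entry address and set the (assumed) run bit. */
void gpu_start(volatile uint32_t *g_pc, volatile uint32_t *g_ctrl,
               uint32_t entry)
{
    assert(g_pc != NULL && g_ctrl != NULL);

    *g_pc = entry;
    *g_ctrl = GPUGO;
}
```

On the 68000 the pointers would be the hardware addresses cast to volatile uint32_t pointers, for example gpu_load((volatile uint32_t *)GPU_RAM_ADDR, code, size) followed by gpu_start with the two register addresses and GPU_RAM_ADDR as the entry point.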
Running jwarn is a good suggestion; I also need a small command line tool (I will write it myself) that goes through the generated assembly files for code meant to run in main ram, converts jump/jr into mjump/mjr opcodes, and puts mpad in front of code labels (replacing the .EVEN directive the generated files have in front of labels). With those in place, smac should be able to assemble the code with the proper alignment for running in main ram.
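The per-line rewrite that such a tool would perform can be sketched in C as below. This is only an illustration under stated assumptions, not the tool itself: the function name convert_line is hypothetical, and it assumes mjump, mjr and mpad are drop-in replacements for jump, jr and .EVEN with the operands left unchanged. It looks only at the first token of each line, so mnemonics inside comments are left alone. Labels that have no .EVEN in front of them are not handled here, and it does not try to decide whether the jump inside the _RTS macro should be converted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Rewrite one line of generated GPU assembly for main-ram execution.
 * Assumption of this sketch: "jump" -> "mjump", "jr" -> "mjr" and
 * ".EVEN" -> "mpad" are one-for-one replacements of the first token.
 * Returns 0 on success, -1 if the output buffer is too small. */
int convert_line(const char *in, char *out, size_t out_size)
{
    assert(in != NULL && out != NULL);

    size_t indent = strspn(in, " \t");        /* leading whitespace */
    const char *tok = in + indent;            /* start of first token */
    size_t len = strcspn(tok, " \t\r\n;");    /* length of first token */
    const char *repl = NULL;

    if (len == 4 && strncmp(tok, "jump", 4) == 0)
        repl = "mjump";
    else if (len == 2 && strncmp(tok, "jr", 2) == 0)
        repl = "mjr";
    else if (len == 5 && strncmp(tok, ".EVEN", 5) == 0)
        repl = "mpad";

    int n;
    if (repl != NULL)   /* keep indentation, swap the token, keep the rest */
        n = snprintf(out, out_size, "%.*s%s%s", (int)indent, in, repl, tok + len);
    else                /* anything else passes through untouched */
        n = snprintf(out, out_size, "%s", in);

    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

A complete tool would read the .s file line by line, pass each line through this function, and write the result to the file handed to smac.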
The DSP doesn't have a lot of bandwidth for running from main ram, so it should probably always be loaded and run in local ram.
The way I see the Jaguar working is like this:
The DSP runs code in local ram, acting as a sound mixer and manually handling serial (to avoid the serial bug).
The 68000 spends most of its time halted, occasionally interrupted to process the music score, read the pads, and maybe set up the OP lists.
The GPU now runs the game, mainly from main ram (which is faster than the 68000 would run the same code in main ram), but also still calling code in local ram as needed for things that need the most speed.
It wouldn't be the absolute fastest way to run things, but it would simplify things for programmers while still being fairly fast (faster than trying to run most of the game code on the 68000).
Chily Willy diving into the risc gcc
Moderator: a31chris
What came after the Jaguar was the PS1 which, for all its greatness, ushered in corporate development and with it the bleached, repetitive, bland titles which for the most part we're still playing today. - David Wightman
Re: Chily Willy diving into the risc gcc
Chilly Willy wrote: By the way, using -O2 -fomit-frame-pointer (as recommended in the readme for gcc) gives a much smaller asm.
Code: Select all
;GCC for Atari Jaguar GPU/DSP (Jun 12 1995) (C)1994-95 Brainstorm
    MACRO _RTS
    load (ST),TMP
    jump T,(TMP)
    addqt #4,ST ;rts
    ENDM
_test_start::
    .GPU
    .ORG $F03000
ST .REGEQU r18
TMP .REGEQU r16
GT .CCDEF $15
gcc2_compiled_for_madmac:
;(.DATA)
    .LONG
_global::
    .DC.L 5
;(.TEXT)
    .EVEN
_GetTest1::
    move r0,r2 ;movsi r0->r2
    cmpq #9,r2 ;pcmpsi #9,r2
    jr GT,L4 ;jgt L4
    moveq #0,r1 ;movsi #0->r1
    movei #100,r1 ;movsi #100->r1
    jr T,L7 ;jt L7
    move r1,r0 ;movsi r1->r0
    .EVEN
L4:
    cmpq #0,r2 ;tstsi r2
    jr EQ,L7 ;jeq L7
    move r1,r0 ;movsi r1->r0
    moveq #1,r1 ;movsi #1->r1
    move r1,r0 ;movsi r1->r0
L7:
    _RTS
    .EVEN
_GetTest2::
    cmpq #9,r0 ;cmpsi #9,r0
    jr GT,L19 ;jgt L19
    moveq #10,r1 ;movsi #10->r1
    cmpq #1,r0 ;cmpsi #1,r0
    jr MI,L21 ;jlt L21
    move r1,r0 ;movsi r1->r0
    moveq #1,r1 ;movsi #1->r1
L19:
    move r1,r0 ;movsi r1->r0
L21:
    _RTS
    .EVEN
_Test3::
    subqt #32,ST
    move ST,r14
    store r19,(ST)
    store r20,(r14+1)
    store r21,(r14+2)
    store r22,(r14+3)
    store r23,(r14+4)
    store r24,(r14+5)
    store r25,(r14+6)
    store r26,(r14+7)
    move r0,r23 ;movsi r0->r23
    moveq #0,r22 ;movsi #0->r22
    cmp r23,r22 ;cmpsi r23,r22
    movei #L24,TMP
    jump PL,(TMP) ;jge L24
    moveq #0,r21 ;movsi #0->r21
    moveq #1,r26 ;movsi #1->r26
    movei #_GetTest1,r25 ;movsi #_GetTest1->r25
    movei #_GetTest2,r24 ;movsi #_GetTest2->r24
L26:
    neg r21 ;negsi2 r21->r21
    move r26,r20 ;movsi r26->r20
    sha r21,r20 ;iashlsi3 r20<<r21->r20
    neg r21 ;negsi2 r21->r21
    move r21,r0 ;movsi r21->r0
    move PC,TMP
    subqt #4,ST
    addqt #10,TMP
    jump T,(r25)
    store TMP,(ST) ;call r25->r0
    move r0,r19 ;movsi r0->r19
    move r20,r0 ;movsi r20->r0
    addqt #1,r21 ;iaddqtsi3 #1+r21->r21
    move PC,TMP
    subqt #4,ST
    addqt #10,TMP
    jump T,(r24)
    store TMP,(ST) ;call r24->r0
    add r0,r19 ;iaddsi3 r0+r19->r19
    cmp r23,r21 ;cmpsi r23,r21
    movei #L26,TMP
    jump MI,(TMP) ;jlt L26
    add r19,r22 ;iaddsi3 r19+r22->r22
L24:
    move r22,r0 ;movsi r22->r0
    move ST,r14
    load (ST),r19
    load (r14+1),r20
    load (r14+2),r21
    load (r14+3),r22
    load (r14+4),r23
    load (r14+5),r24
    load (r14+6),r25
    load (r14+7),r26
    addqt #32,ST
    _RTS
    .LONG
    .68000
_test_end::
_test_size .EQU *-_test_start
    .GLOBL _test_size
    .IF _test_size>$1000
    .PRINT "Code size (",/l/x _test_size,") is over $1000"
    .FAIL
    .ENDIF
Test3() still has the stack setup/cleanup... there's another switch for that.
Re: Chily Willy diving into the risc gcc
Chilly Willy wrote: Now that I know how to call dosbox to run the gpu gcc, I'm thinking about different program structures. The first is the default structure Atari used - the main program runs on the 68000 with blocks of helper code loaded to the gpu/dsp local ram as needed. That is easy to handle, but I will need functions to load and execute risc code.
The second structure would be the 68000 running code like normal, but the gpu also running out of main ram. The 68000 code could halt the 68000, leaving the gpu as the main processor. The gpu could still load and execute gpu functions in local ram for best speed on code that needs the speed. Remember that in most apps/games, only maybe 5% of the code needs to be optimized, and part of the optimizing of gpu code is running from local ram. Of course, this sort of setup requires the gpu gcc be generating good code.
I am going to assume that one of the big pushes almost immediately was for the risc gcc to produce good code, knowing it would be running in only 4k of local ram. I am going to guess that the push was for it to produce good, optimized code as soon as possible.
And HVS was of course using it, so it must have been very usable.
Re: Chily Willy diving into the risc gcc
When attempting to assemble test.s with smac, the error shown after the listing below is generated from the '.IF _test_size>$1000' line.
Code: Select all
load (ST),TMP
jump T,(TMP)
addqt #4,ST ;rts
ENDM
_test_start::
.GPU
.ORG $F03000
ST .REGEQU r18
FP .REGEQU r17
TMP .REGEQU r16
GT .CCDEF $15
gcc2_compiled_for_madmac:
;(.DATA)
.LONG
....
.LONG
.68000
_test_end::
_test_size .EQU *-_test_start
.GLOBL _test_size
.IF _test_size>$1000
.PRINT "Code size (",/l/x _test_size,") is over $1000"
.FAIL
.ENDIF
Smac (Win32) report 1 error:
test.s[339]: Error: bad (section) expression
SMAC bugfix notification
The above-mentioned problem with Smac has now been fixed. Thank you very much, Chilly Willy!
http://3do.cdinteractive.co.uk/viewtopi ... =35&t=3591
Re: Chily Willy diving into the risc gcc
Code: Select all
;GCC for Atari Jaguar GPU/DSP (Jun 12 1995) (C)1994-95 Brainstorm