GPU documentation questions
Hi everyone, some beginner's questions on the GPU below, all from the TechRef v8. Greatly appreciate any help or clarification that anyone can provide here!
(1) On page 35 there's a statement regarding the register score boarding:
"WARNING - No score-board protection applies to writes. Therefore, if two instructions both write to the same
register and the first one completes after the second, the data will be written out of sequence. If they both write
at the same time, then the results are unpredictable. This only applies where the second instruction does not
read the register."
Can this scenario only happen when two LOADs from memory target the same register, with no intervening instruction that reads that register (and so triggers wait states)? For example:
0: LOAD [external memory] r10
1: ... stuff not reading r10 ...
2: LOAD [internal memory] r10
With 2 completing before 0? I was struggling to think of any other instruction sequence that could trigger this.
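To make the scenario concrete, here is a toy Python model of what I think the warning describes. It is only a sketch: the function name, the cycle numbers and the latencies are all made up for illustration and are not taken from the TechRef. It assumes each write lands at issue time plus latency and that later landings overwrite earlier ones.

```python
# Toy model of the quoted hazard: two loads target the same register,
# nothing reads it in between, and writes land in completion order.
# The latencies are invented numbers, not figures from the TechRef.

def final_register_value(loads):
    """loads: list of (issue_cycle, latency, value), all writing one register.

    Returns the value left in the register once every write has landed.
    Two writes landing on the same final cycle are reported as None,
    mirroring the "unpredictable" case in the warning.
    """
    landings = sorted((issue + latency, value) for issue, latency, value in loads)
    last_cycle, last_value = landings[-1]
    if len(landings) > 1 and landings[-2][0] == last_cycle:
        return None
    return last_value

# Line 0: slow external load issued at cycle 0.
# Line 2: fast internal load issued at cycle 2.
external = (0, 10, "external")
internal = (2, 1, "internal")

result = final_register_value([external, internal])
print(result)
```

If this model matches the hardware, the register ends up holding the value from the earlier (external) load, even though the later (internal) load was issued after it.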
(2) On page 38 there's an example ISR that clears the interrupt mask:
1: movei GPU_FLAGS,r30 ; point R30 at flags register
2: load (r30),r29 ; get flags
3: bclr 3,r29 ; clear IMASK
4: bset 11,r29 ; and interrupt 2 latch
5: load (r31),r28 ; get last instruction address
6: addq 2,r28 ; point at next to be executed
7: addq 4,r31 ; updating the stack pointer
8: jump (r28) ; and return
9: store r29,(r30) ; restore flags
Regarding line 7 above, if r31 is meant to hold the address of the last instruction that was executed before the interrupt occurred, why is:
(i) the interrupt service routine altering it at all?
(ii) and why is it altering it by adding 4 specifically?
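One reading that might account for it, which I'm not sure is right, is sketched below as a rough Python model of lines 5 to 7. It takes the listing's comments at face value and assumes r31 points at a stack slot containing the saved address, rather than holding the address itself. The addresses and the `memory` dictionary are hypothetical, and the 2-byte and 4-byte sizes are my assumption about instruction and stack entry widths.

```python
# Toy model of lines 5-7 of the listing, taking its comments at face value.
# All addresses here are made up for illustration.

memory = {0x00F03FFC: 0x00F03100}   # stack slot -> address of last instruction executed
r31 = 0x00F03FFC                    # assumed stack pointer value on entry to the ISR

r28 = memory[r31]   # line 5: load (r31),r28  reads the saved address through r31
r28 += 2            # line 6: addq 2,r28      steps past one 16-bit instruction (assumed width)
r31 += 4            # line 7: addq 4,r31      steps past one 32-bit stack entry (assumed width)

print(hex(r28), hex(r31))
```

Under that reading the `addq 4` would simply be popping the saved address off the stack, but I'd appreciate confirmation.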
(3) On the systolic matrix multiply instruction: assuming you have a suitable calculation, is there any performance benefit to MMULT over writing out the IMULTN ... IMACN ... RESMAC sequence yourself? From my reading it appeared to behave like a C-style macro, so the two would function identically.
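For reference, this is the arithmetic I understand the manual sequence to perform, written as a small Python sketch. The function name `manual_mac` and the sample values are mine, and the model covers the arithmetic only; it says nothing about cycle counts, which is the part I'm asking about.

```python
# Rough model of the manual sequence: IMULTN starts the accumulation,
# each IMACN adds another product, RESMAC writes the accumulated result out.

def manual_mac(row, column):
    acc = row[0] * column[0]                # IMULTN: first product
    for a, b in zip(row[1:], column[1:]):
        acc += a * b                        # IMACN: multiply and accumulate
    return acc                              # RESMAC: result written to a register

result = manual_mac([1, 2, 3], [4, 5, 6])
print(result)
```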
(4) Is there any additional performance overhead when moving data between registers in different banks when compared to within the same bank, i.e. MOVEFA vs MOVE?
(5) Excluding the interrupt service do both register banks behave identically in terms of register usage?
(6) And now the painfully basic question that I'll kick myself for later if I don't ask: when you want to schedule work on another processor, say the M68K or the Blitter, is this done by sending it an interrupt? And if you wanted to wait for completion (say, the Blitter copying a new batch of program code into the GPU's internal memory), would you clear the GPUGO bit and wait for that processor to set it again, with no requirement for it to raise an interrupt on the GPU?
As I said, if anyone's able to help on these thank you very much.