Jaguar programming tips and tricks

Let's get coding!

Moderator: a31chris

a31chris
Jaguar MOD
Posts: 841
Joined: Mon Apr 01, 2013 7:09 am

Jaguar programming tips and tricks

Post by a31chris » Tue Feb 18, 2014 4:57 am

Some Jaguar programming tips and tricks discovered through the years. So that these things are easy to find and never lost, I am creating this sticky, where I will gather them as I find them or as they are pointed out.

This first one may be the solution to the quoted problem:
While the 68k has the bus, the DSP and GPU can apparently run their own code, but they can not touch any of the other hardware, even on the same chip. (For instance, if the GPU wants to talk to the line buffers or even the video registers, even though they are on the same chip, it needs to acquire the bus --
And here we go...
Kskunk wrote:
Tursi wrote: Even though they have higher bus priority and can steal the bus away from the 68k, they still lose a cycle doing so and it's noticeable in throughput.
It's usually several cycles. You can't interrupt a 68K bus cycle in progress. (This is a 68K limitation.) The 68K bus cycle takes 8 Jag RISC cycles. So depending on your luck you could wait anywhere from 1 to 8 cycles. It's almost inevitable that the 68K has changed the current DRAM page, which makes it 6 to 13 cycles round trip.

This works out fine for simple 2D games, since the OP wants to lock up the bus while it works (the overhead is minimal) and if you're using the blitter, it's probably being activated at the end of a 68K bus cycle.

But once you're using the GPU and DSP, the 68K is a pretty bad idea. The DSP is also pretty difficult to use for all the same reasons. It locks up the bus for 6 cycles on reads, 12 for writes (because of the workaround for the write bug). It's better than the 68K because it can run code out of its SRAM, but it's often hard to find algorithms that do a ton of computation and almost no I/O. That's one reason the DSP is usually idle except for some sound synthesis.
Tursi wrote: While the 68k has the bus, the DSP and GPU can apparently run their own code, but they can not touch any of the other hardware, even on the same chip. (For instance, if the GPU wants to talk to the line buffers or even the video registers, even though they are on the same chip, it needs to acquire the bus -- at least according to my experimentation. Someone who can read the netlists may be able to confirm or deny that more accurately.)
Who do you know that can read netlists?

Both the GPU and DSP have a dedicated 'local bus' that can be accessed without using the main bus. On the GPU, the local bus contains SRAM, GPU control regs (for interrupts, divider, matrix, etc), the blitter, and line buffer writes (everything mapped from F02-F0F). This helps it set up polygons when scan converting without needing the bus.

The line buffer feature only works on 32-bit writes (no reads or 16-bit access), but apparently does not disturb the main bus. I'm far away from my Jaguar right now so I can't test if this really works. The netlists imply it was designed to work this way. This feature might be useful for some kind of special effect but I'm not creative enough to think of any.

On the DSP, the local bus contains SRAM, DSP control regs, math table ROM and some DSP peripherals (everything mapped from F12-F1F). This means the DSP is able to do CD access (at least the serial kind) and audio playback without touching the main bus. Off-chip stuff like joystick access obviously requires external bus cycles, but so do the UART and timers.

- KS
Emboldened areas added by moderator
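
To put rough numbers on the bus-stealing cost Kskunk describes, here is a small back-of-the-envelope sketch. It only does arithmetic on the figures from his post: a 68K bus cycle of 8 RISC cycles, a wait of 1 to 8 cycles, and a 6 to 13 cycle round trip once the DRAM page has changed. The 5-cycle page penalty is inferred from the difference between those two ranges, and treating the wait as evenly spread is an assumption, not a measurement.

```python
from fractions import Fraction

M68K_BUS_CYCLE = 8     # RISC cycles per 68K bus cycle (figure from the post)
PAGE_MISS_PENALTY = 5  # assumption: inferred from 6 - 1 == 13 - 8 == 5


def wait_range(page_changed):
    """Best and worst case wait, in RISC cycles, to take the bus from the 68K."""
    extra = PAGE_MISS_PENALTY if page_changed else 0
    return 1 + extra, M68K_BUS_CYCLE + extra


def average_wait(page_changed):
    """Midpoint of the range, assuming every wait length is equally likely."""
    low, high = wait_range(page_changed)
    return Fraction(low + high, 2)


if __name__ == "__main__":
    for page_changed in (False, True):
        low, high = wait_range(page_changed)
        avg = float(average_wait(page_changed))
        print(f"page changed: {page_changed}, wait {low}-{high}, average {avg}")
```

Under those assumptions every external access made while the 68K is running costs several RISC cycles on average before it even starts, which is the point of the post.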

This appears to be not so much something new as something the two posters above may have overlooked. From page 36 of the v8 Jaguar Tech ref manual:
To the GPU programmer the local RAM, local hardware registers, and external memory all appear in the same
address space. The GPU memory controller determines whether a transfer is local or external, and generates
the appropriate cycle. The only difference to the programmer is that only 32-bit transfers are possible within the
GPU local address space, whereas 8, 16, 32 or 64-bit transfers are permitted externally.
The local RAM sits on an internal GPU 32-bit bus. Also present on this bus are various GPU control registers,
and the Blitter control registers. When a GPU transfer occurs outside the local address space, a gateway
connects the local bus to the main bus. If a sixty-four bit transfer is requested, a special register is used for the
other half of the data.
The address space is organised as follows:
F02000 - F021FF graphics processor control registers
F02200 - F022FF Blitter registers
F02300 - F02FFF reserved
F03000 - F03FFF local RAM
F04000 - F0FFFF reserved
This local address space is also available to external devices via the I/O mechanism.
The GPU local bus can therefore perform transfers for three quite separate mechanisms. These are, in
decreasing order of priority:
- CPU I/O access
- Operand data transfer
- Instruction fetch
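
The address map above can be written down as a small lookup table. This is only a sketch built from the ranges and transfer widths quoted from the manual; the function names (`classify`, `allowed_widths`) are made up for illustration and are not part of any Jaguar toolchain.

```python
# GPU local address space, copied from the tech ref excerpt above.
GPU_LOCAL_MAP = [
    (0xF02000, 0xF021FF, "graphics processor control registers"),
    (0xF02200, 0xF022FF, "Blitter registers"),
    (0xF02300, 0xF02FFF, "reserved"),
    (0xF03000, 0xF03FFF, "local RAM"),
    (0xF04000, 0xF0FFFF, "reserved"),
]

EXTERNAL = "external (main bus)"


def classify(address):
    """Name the region an address falls in, or EXTERNAL if it is not GPU local."""
    for start, end, name in GPU_LOCAL_MAP:
        if start <= address <= end:
            return name
    return EXTERNAL


def allowed_widths(address):
    """Transfer widths in bits the manual permits for a GPU access to this address."""
    if classify(address) == EXTERNAL:
        return (8, 16, 32, 64)
    return (32,)  # only 32-bit transfers inside the local address space


if __name__ == "__main__":
    for addr in (0x004000, 0xF02210, 0xF03000):
        print(hex(addr), classify(addr), allowed_widths(addr))
```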
What came after the Jaguar was the PS1 which for all its greatness, ushered in corporate development and with it the bleached, repetitive, bland titles which for the most part we're still playing today. - David Wightman


Display List Tricks

Post by a31chris » Tue Feb 18, 2014 5:00 am

Here is Super Burnout programmer Olivier Nallet's account of a Display List trick he used to get an incredible number of sprites on screen at once:
Because there was only 4 KB of memory on the GPU, I was hot-swapping portions of the assembly code, module by module, trying to reuse the cached memory as much as possible (something that is also commonly done on PS3 SPUs). Everything worked well together and I'm pretty sure I still had CPU cycles left even with more than 1000 sprites on the screen. By the way, I was displaying the 1000+ sprites on the Jag with a trick on the display list. The Jag was a killer in 2D (OK, at that time), but the only downside was that if the display list contained too many sprites, it actually ate bandwidth on the sprites to display, sometimes creating a wobbling effect on some lines (due to the fact that the Jag didn't have a screen buffer but was displaying everything on the fly, a nice design to save expensive memory but with some constraints).
So the way I resolved that was to actually use the branches of the display list: by having 3 levels of branches I was able to split the screen into 8 horizontal bands of 30 pixels or so.
Then I just had to fill each of the sub-display lists separately, so I had just 125+ sprites per display list, but 1000+ total displayed on the screen. That way each horizontal line had almost full bandwidth to display the sprites. The nightmare was on the GPU side though: every single sprite had to be split into pieces to be placed on the proper horizontal band with the proper offset initialized. Sometimes sprites like bosses could be split over the whole 8 bands. Everything was displayed with that, even the multiple levels of tiles for the background.
Here is a link to the original post and thread courtesy of AtariAge.

http://atariage.com/forums/topic/111887 ... try1399401
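
The band-splitting step in that account can be sketched in a few lines. This is not Nallet's code, just an illustration under stated assumptions: exactly 30-pixel bands, a binary branch at each of the 3 levels (giving 8 bands), and a made-up `split_sprite` helper that returns which rows of a sprite go into which band's sub-list.

```python
BAND_HEIGHT = 30                 # "30 pixels or so" in the post; exact value assumed
BRANCH_LEVELS = 3                # levels of display list branches
NUM_BANDS = 2 ** BRANCH_LEVELS   # assumes a two-way branch per level: 8 bands


def split_sprite(y, height):
    """Cut a sprite at band boundaries.

    Returns a list of (band, y_in_band, source_row, rows) tuples: which band's
    sub-list gets the piece, where it starts inside that band, which row of
    the sprite data it starts from, and how many rows it covers.
    """
    pieces = []
    row = 0
    while row < height:
        screen_y = y + row
        band = screen_y // BAND_HEIGHT
        y_in_band = screen_y % BAND_HEIGHT
        rows = min(BAND_HEIGHT - y_in_band, height - row)
        if 0 <= band < NUM_BANDS:  # rows above or below the screen are dropped
            pieces.append((band, y_in_band, row, rows))
        row += rows
    return pieces


if __name__ == "__main__":
    print(split_sprite(20, 25))       # a sprite straddling bands 0 and 1
    print(len(split_sprite(0, 240)))  # a boss covering every band
    print(1000 // NUM_BANDS)          # sprites per sub-list if spread evenly
```

Each piece would then become its own object in the matching band's sub-list, which is where the per-sprite GPU work he calls a nightmare comes from.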

GPU in Main bug workaround rules

Post by a31chris » Tue Feb 18, 2014 9:37 am

Discovered, or perhaps rediscovered, around 2006 by AtariOwl and Gorf, here are the rules for working around the bug that stops the GPU from successfully running code out of main memory:

RISC in Main RAM rules:

Here they are in a nutshell.

Definition:
A page is one block of 256 bytes.

All JUMP instructions must sit on an address ending in 0, 4, 8 or C hex.

All JUMP instructions that jump to an external page must land on an address ending in 0, 4, 8 or C hex.

All JUMP instructions that jump within a page must land on an address ending in 2, 6, A or E hex.

The JR instruction can sit anywhere.

The JR instruction follows the same destination rules as JUMP.

All JUMP or JR instructions must be followed by two NOPs, but certain instructions can be used in place of the first NOP.

Stay away from tight loops out in main RAM.

Since Owl already revealed it, I'll post once again main to local and local to main.

JUMP instructions only: they must sit on an address ending in 0 or 8 hex, whether going from local to main or from main to local.

These rules may no longer need to be applied by hand, because SMAC (SubQMods Macro Assembler) is supposed to have some macro functionality to handle them automatically if you want to use it (see the instructions included with SMAC).
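
For anyone checking addresses by hand, the alignment rules above can be expressed as a small checker. This is only a sketch of my reading of the rules, not a tested tool: "external page" is taken to mean a destination in a different 256-byte page than the branch itself, the GPU local RAM range is taken from the tech ref map (F03000 - F03FFF), `check_branch` is a made-up name, and the two-NOP rule is not checked at all.

```python
PAGE_SIZE = 256                            # "one block of 256 bytes"
GPU_LOCAL_RAM = range(0xF03000, 0xF04000)  # local RAM per the tech ref map

ALIGNED_4 = (0x0, 0x4, 0x8, 0xC)
OFFSET_2 = (0x2, 0x6, 0xA, 0xE)


def is_local(addr):
    return addr in GPU_LOCAL_RAM


def check_branch(kind, src, dst):
    """Return a list of rule violations for a JUMP or JR at src going to dst."""
    if kind not in ("JUMP", "JR"):
        raise ValueError("kind must be 'JUMP' or 'JR'")
    problems = []
    if is_local(src) and is_local(dst):
        return problems  # local to local: the workaround rules do not apply
    if is_local(src) != is_local(dst):
        # Main to local or local to main: only the 0/8 rule is stated.
        if kind != "JUMP":
            problems.append("the main/local rule only covers JUMP")
        elif src & 0xF not in (0x0, 0x8):
            problems.append("JUMP between main and local must sit on 0 or 8")
        return problems
    # Both addresses are in main RAM.
    if kind == "JUMP" and src & 0xF not in ALIGNED_4:
        problems.append("JUMP must sit on an address ending in 0, 4, 8 or C")
    same_page = src // PAGE_SIZE == dst // PAGE_SIZE
    wanted = OFFSET_2 if same_page else ALIGNED_4
    if dst & 0xF not in wanted:
        where = "within a page" if same_page else "to an external page"
        endings = ", ".join(format(n, "X") for n in wanted)
        problems.append(f"destination {where} must end in one of {endings}")
    return problems


if __name__ == "__main__":
    print(check_branch("JUMP", 0x004000, 0x004102))  # wrong landing address
    print(check_branch("JUMP", 0x004000, 0x004104))  # no violations
    print(check_branch("JUMP", 0x004004, 0xF03000))  # main to local, bad source
```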

Further reading:
http://atariowlproject.blogspot.com/sea ... 0in%20Main

This workaround does not appear to have been known by HVS, but McGroarty talks about something like it in a Slashdot response to Carmack:
Subject: Ah, the Jaguar... (Score:1)
Author: bvmcg
Date: Sunday March 05, @08:07AM EST
Forum: Slashdot com

>The little RISC engines were decent processors. [...] the only
>thing truly wrong with them was that they had scratchpad memory
>instead of caches, and couldn't execute code from main memory.
>I had to chunk the DOOM renderer into nine sequentially loaded
>overlays to get it working (with hindsight, I would have done it
>differently in about three...)

Actually, you could execute code out of main memory. You merely had to be
careful about crossing page boundaries because the instruction pointer
wouldn't update properly. I'd say the biggest problem with the processors was
Atari & Brainstorm's documentation. =)

We manually paged pieces in for NBA Jam, White Men Can't Jump and Ruiner
Pinball. Vid Grid sat entirely in one chunk on either RISC with the 68000 just
facilitating major modes. (and you thought 64k games were gone!)

For Dactyl Joust, we were using an automatic memory paging system which was
started with Ruiner. This worked by augmenting function calls to load in each
function in 256-byte chunks, as many as needed, and doing address fixups.
Rarely-called support routines remained in main store, specially tagged to
avoid being loaded in. (See above re: running from main RAM and crossing page
boundaries. The addresses had to be guaranteed by creating a million sections
in the link file. Can you say link file nightmare?) In the end though, C and
eventually C++ use became pretty invisible (read easy and efficient) even on
the GPU RISC processor.

Going back and looking at Jaguar code again when I did Tempest/X3 for
Playstation was a total trip. Even just a couple years later, I'd forgotten
how fun/weird/ugly that beastie was. I honestly miss it though. I really do.
For all its quirks (especially because of its quirks!) it was a great little
box.
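
To get a feel for the 256-byte paging McGroarty describes, here is a tiny sizing sketch. It only covers the "as many chunks as needed" arithmetic, not the loader or the address fixups. The 4 KB figure is the GPU local RAM size from the address map earlier in the thread, and `chunks_needed`, `fits_in_local` and the `reserved` parameter are hypothetical names for illustration.

```python
CHUNK_SIZE = 256            # bytes per chunk, from the post
LOCAL_RAM_SIZE = 4 * 1024   # GPU local RAM, F03000 - F03FFF


def chunks_needed(function_size):
    """Number of 256-byte chunks a function of this many bytes occupies."""
    return -(-function_size // CHUNK_SIZE)  # ceiling division


def fits_in_local(function_sizes, reserved=0):
    """True if all the functions can be resident at once.

    reserved is the number of bytes set aside for data, stack and the loader
    itself (a hypothetical parameter, not something from the post).
    """
    total = sum(chunks_needed(size) for size in function_sizes) * CHUNK_SIZE
    return total <= LOCAL_RAM_SIZE - reserved


if __name__ == "__main__":
    print(chunks_needed(257))
    print(fits_in_local([1000, 2000]))
    print(fits_in_local([2000, 2000, 300]))
```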

Re: Jaguar programming tips and tricks

Post by a31chris » Fri Feb 28, 2014 8:56 pm

Parallelism on the Jaguar.
Gorf wrote: If you run the GPU in local only while the DSP runs in its local only and you run the 68k, you can run all three processors in parallel, but then you kill the bandwidth of the bus with the 68k running. If you do not run the 68k and let the blitter and the OPL use the main bus while the other two (GPU/DSP) run in local, you will get some pretty good results. However, if you run the GPU out in main using some careful and properly thought out interrupt processing, so that when the blitter and the OPL are running the GPU runs only from local (this would probably be when you are drawing the current frame and then setting up the next frame, using the blitter to move the next amount of frame info into the GPU local), you should be able to pull off some amazingly efficient processing. While the blitter and the OPL are being set up for the next frame, run the GPU out in main for AI and game logic.

As far as benchmarks, there are none for this particular method that I know of, other than a few tests which have shown that in a tight loop the GPU out in main is slower than the 68k. However, with an unrolled loop the GPU can achieve about the same performance in main as in local.

It's all a balancing act. This of course is using the 68k only for setup of the system initially and for setting up a new game level and then killing it.

The 68k should not be used at all in the main loop and should only be used in between game processing for new levels or such. Any use otherwise will only result in hammering the bus and reducing its bandwidth to one quarter and its speed to half, a serious blow to the Jaguar's performance and the main reason why most of the games had really bad frame rate performance. Using the 68k at all, even for just a few instructions, puts a serious damper on the Jaguar's performance.
Kskunk wrote: I went back to the hardware and found a new texturing hack: In parallel, the blitter can generate addresses while the GPU reorders memory access to exploit page locality. It's faster per-pixel, but you lose so much GPU that small polygons suck. See, hard to go one post without tripping over new Jaguar hacks! If only Carmack had known blah blah 640x480...

The Blitter Trick

Post by a31chris » Mon Apr 21, 2014 8:02 pm

There is a rumor out there of a 'Blitter Hack' that Scatologic found when developing Battlesphere that improves the Blitter's performance. From the BS development diary:
Latest cut of Battlesphere™ is running just fine. Framerate is indeed up, thanks to the special hardware 'hack' devised by Scott and Myself. Nobody has thought of this little ditty before... it's too COOL! For what it's worth, this little trick would have easily made DOOM a 320x240 game at 20-30fps all the time.. This game is running so smooth now. Things are shaping up nice...

... in the 25-30fps just about all the time. Sure, flood the screen with ships, debris, explosions, and shots and we're down to 15 or so, but man does this thing haul... Heh heh, no one's gonna figure out the little magic trick it took to make that one happen.... Reminds me of the olden days of 800 programming where there were things you could make the hardware do that the designers never dreamed of. This is so cool.

Framerate is still very high. We run constantly over 20FPS, usually between 30-60fps, depending on the amount of action onscreen. Our little Blitter Trick™ has ensured that even with lots of explosions going off at once, the framerate is really high. We're quite proud of this little 'hack' we came up with. It really works!! Not that we were anything but screaming fast before... the load management going on between the processors by our custom engines is no slouch. It's also 'generic' enough that we'll re-use most of it for our next Jag title.
Here are some more clues, from another interview with Scatologic, that the blitter 'trick' improved polygon performance:
Scatologic wrote: Our polygon engine uses the blitter in some strange ways that make it about the fastest rendering engine anyone ever wrote. Heck, we beat on the Jaguar so hard that we had to put breaks in the screen video objects to give the DSP cycles to play the audio.
More discussion on AA:

http://atariage.com/forums/topic/198087 ... ter-trick/

But at any rate this hack is most likely out there waiting to be found by someone else.

The Graphics Programming Black Book/3D animation

Post by a31chris » Wed Apr 23, 2014 11:06 pm

The Graphics Programming Black Book. Written by Michael Abrash, John Carmack's friend and a member of the original Quake development team.

http://www.jagregory.com/abrash-black-book/

This book is touted by Carmack himself and recommended by Thunderbird, Oppressor and Tursi lion as a must read for any serious Jag coder.
Tursi mentioned that it talks about the Quake engine and how it uses a linebuffer and he hypothesized it could be adapted for the Jaguar.

Scott Corley's guidelines to a 3D animation engine

http://www.gamasutra.com/view/feature/1 ... engine.php

Some more BLITTER tips

Post by a31chris » Mon Nov 17, 2014 8:54 pm

Kskunk wrote: There's another Jaguar video mode I found a long time ago while poking around with BJL. In the Jaguar docs there are some "gaps" where registers "should" be, such as between F00054 (HEQ) and F00058 (BG). If you set bit 2 in the undocumented register F00056, you get "black and white CRY" mode.

In this mode, C chooses a grayscale shade from 0-255 and I shades that intensity. This mode is well-supported by the blitter (using TOPNEN) and can produce a few interesting shading effects not possible in normal CRY mode. The downside is, obviously, no color.
Tursi wrote: I *did* find an unexpected [Blitter] combination that made a small performance boost, improving my previous best score - previously I was copying in phrase mode using 16-bit pixels, just because I really am using 16-bit color. I changed it to 32-bit pixels, and saw an improvement of about 20 pixels per frame (I'm not certain why the pixel depth should even matter in phrase mode...)

General tips

Post by a31chris » Wed Nov 19, 2014 8:48 pm

Gunstar wrote: I have to admit that I really like SuperCross3D too. Too bad about the terrible frame rate, not so bad in practice mode or when you're way ahead of the pack though... but that should have been the slowest speed, not the fastest (as far as overall framerate goes). I often think when I'm playing it that it seems very rushed and unfinished, but if they had just dropped some things, in this case less would have been better. For example, since the framerate sucks, they should have dropped the whole "Arena Screen" thing in the background; I'm sure that's eating up hordes of processor time that the framerate could have used.
Thunderbird replied and wrote: Actually, the "Arena Screen" trick is pretty simple, and doesn't take up any extra CPU cycles (mostly). You just use the visible frame buffer as the source for the pixels on the frame currently being rendered. It's an old trick. It's not like a separate screen has to be rendered for that screen. If it were a rearview mirror or something, THEN it would have a different view which would require its own rendering.

More BLITTER hints, rumors and allegations

Post by a31chris » Sat Mar 21, 2015 9:17 pm

The conventional wisdom is the Blitter cannot use the system bus very well to facilitate texture mapping. Another rumor is that a workaround has been found for this as well:
Gorf wrote: The texture mapping is the fault of the blitter, not the bus. The blitter had bugs and this was one that only allowed PIXEL mode texturing... even this has a workaround now, so this is not even true anymore.

We know the blitter has full speed access to the RAM in the GPU... yes, it's way too little but it's there. The fact is the blitter was not able to read more than a pixel, not that the bus could not handle it. This is no longer true anyway. As with every console, new discoveries and workarounds come along and prove otherwise. We have discovered such workarounds.

What they knew then is very different to what WE know now.
http://atariage.com/forums/topic/110830 ... ga-saturn/
The Iron Soldier guys discovered a ‘hack’ which allowed the texture palette to be a texture source, doubling the speed of texture mapping for small textures
source

So those are out there as possibilities as well...