Quake Engine on the Jaguar (Scrummy posts)

a31chris · Post by **a31chris** » Sat Apr 19, 2014 9:00 pm

I was looking for Scrummy posts on AA. I don't know if you guys remember him but he would just post some bizarre and hysterical posts back in the day on AA.

While I was referencing his name I found this interesting thread:

http://atariage.com/forums/topic/43179- ... ke-engine/

and here is the Scrummy post inside the thread that the search found. So funny:

The Jaguar main testicle could easily handle the frame rate with texture mapping, but there is a problem with the memory, particularly if we were to use the CD. The whole thing was not very well thought out. Here you had this really fat pipe, and at the end of it, your pond was kinda small.

Friggin awesomely funny.

But on a further note, whatever happened to Downix? Around 2004 he did not have access to the Jaguar 'netlists' that would let him read how the architecture was built. When the thread was necrobumped (thank you, thank you very much) 4 years later he had some interesting things to say.

Such as: the Jaguar's memory can be expanded up to 8 MB of RAM via cartridge.

And from what I can tell the Jaguar may not be lacking in horsepower to do the Quake engine, but its non-existent video memory presents a problem. His comparisons of the Jaguar to the Nintendo DS and the N-Gage are interesting.

Whatever happened to Downix? He just disappeared.

a31chris · Post by **a31chris** » Sat Apr 19, 2014 9:03 pm

Things I've been thinking about. In the thread they talk about the Jaguar needing 1 MB more RAM. Video RAM? The PSX does Quake II with 2 MB of main RAM and roughly 1 MB of video RAM, so it must be video RAM that's needed.

MikeFulton · Post by **MikeFulton** » Sat Jun 21, 2014 7:50 pm

The "small pond" idea was likely not referring to video memory, per se. It was probably referring to the built-in RAM on the GPU/DSP, which would be used to hold textures for texture-mapping.

The Jaguar didn't have dedicated video memory. The display processor used main memory for the frame buffers. And "frame buffers" is an idea that requires explanation for the Jaguar context. The Jaguar video processor didn't necessarily have to use a single full-screen frame buffer. Your program maintained a display list of raster items that could be arbitrarily scaled and positioned, and could be overlapped. This couldn't be done willy-nilly, as it used up a lot of memory bandwidth when you had multiple overlapping rasters, but you could do some cool stuff with it.

Many games, if not most, did use a single, full-screen frame buffer.
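For a sense of scale on the shared-memory point, here is a minimal arithmetic sketch. The 320x240, 16-bit, double-buffered mode is an assumed example, not something stated in the thread; the 2 MB main RAM figure is the one mentioned elsewhere in this thread.

```python
MAIN_RAM_BYTES = 2 * 1024 * 1024  # 2 MB of main RAM (figure quoted in this thread)


def framebuffer_bytes(width, height, bits_per_pixel, buffers=1):
    """Bytes of main RAM taken by `buffers` full-screen frame buffers."""
    return width * height * bits_per_pixel // 8 * buffers


# Assumed example mode: 320x240, 16 bits per pixel, double-buffered.
used = framebuffer_bytes(320, 240, 16, buffers=2)
share = 100 * used / MAIN_RAM_BYTES
print(f"{used} bytes, about {share:.1f}% of main RAM")
```

Whatever the frame buffers take comes out of the same pool that has to hold code, level data and textures, which is the "not enough memory for everything" concern in a nutshell.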

I dunno what the reference to using the CD is all about, unless it's just a "not enough memory for everything" issue.

MikeFulton · Post by **MikeFulton** » Sat Jun 21, 2014 8:48 pm

a31chris wrote: While I was referencing his name I found this interesting thread:

http://atariage.com/forums/topic/43179- ... ke-engine/

Interesting thing... that thread says the PSX has dedicated 3D hardware but the Jaguar didn't. That's actually not really true. The PSX does not actually have what most people would consider to be "dedicated 3D hardware". At least, not in a way that would distinguish it from Jaguar.

The main processor in the PSX is a MIPS R3000, which is a basic, general-purpose RISC processor otherwise known for being used in early Silicon Graphics workstations. Sony added a co-processor (the "GTE") that implements matrix math functions. This is used for doing your basic 3D graphics transformations on polygon vertices but has nothing to do with the actual pixel pushing. By comparison, the Jaguar GPU has similar matrix math instructions, so the two machines are on fairly even ground at that point.
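As a rough illustration of what "3D graphics transformations on polygon vertices" means in practice, here is a minimal sketch in plain Python. It is not GTE or Jaguar GPU code; the function names, the focal length of 256 and the sample vertex are all made up for the example.

```python
def transform_vertex(m, t, v):
    """Rotate vertex v by the 3x3 matrix m, then translate by t."""
    return tuple(
        sum(m[row][col] * v[col] for col in range(3)) + t[row]
        for row in range(3)
    )


def project(v, focal=256.0):
    """Perspective-project a camera-space vertex onto the screen plane."""
    x, y, z = v
    return (focal * x / z, focal * y / z)


IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))

# Push a vertex 512 units in front of the camera, then project it.
camera_space = transform_vertex(IDENTITY, (0, 0, 512), (64, 32, 0))
print(project(camera_space))  # (32.0, 16.0)
```

Every vertex of every visible polygon goes through something like this each frame, which is why hardware help with the multiply-accumulate work matters on both machines.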

The rendering loop of a PSX game is basically this:

* Do your basic game-play calculations (R3000)
* Build a list of the polygons needed for the next frame (R3000)
* Do your 3D transforms for the polygon vertices for the next frame (R3000/GTE)
* SUBDIVIDE POLYGONS AS NEEDED (R3000/GTE)
* Create the render object list. (This step might be integrated into the previous 3 in some cases) (R3000)
* Wait for the previous frame to finish rendering, if needed, then wait for the next vertical blank
* Pass the render object list to the GPU for rendering the next frame. (GPU asynchronous operation)

Once you pass the render object list to the GPU, the actual polygon rendering process is strictly 2D, using only X-Y values for each vertex, no Z (depth) value, and a uniform texture scale factor for the entire polygon. This is why PSX games, particularly early ones, sometimes have texture warping. This occurs when there are large polygons with a significant difference in the Z values for each vertex. This most commonly happens on walls or ground objects that extend into the frame from the edge of the screen. The way around this is to subdivide polygons with Z-values that are "close" to the viewpoint so that the texture for each can be scaled more appropriately.

By comparison, a true 3D polygon renderer would use X-Y-Z values for each vertex and adjust the texture scale factor individually for each pixel, according to the interpolated Z value.
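The difference between the two approaches can be shown with a few lines of arithmetic. This is a minimal sketch with made-up numbers (a span running from depth 1 to depth 4, texture coordinate 0 to 64); the function names are hypothetical and nothing here is PSX-specific code.

```python
def affine_u(u0, u1, t):
    """2D-only interpolation: ignores depth entirely."""
    return u0 + (u1 - u0) * t


def perspective_u(u0, z0, u1, z1, t):
    """Interpolate u/z and 1/z linearly in screen space, then divide per pixel."""
    u_over_z = u0 / z0 + (u1 / z1 - u0 / z0) * t
    one_over_z = 1 / z0 + (1 / z1 - 1 / z0) * t
    return u_over_z / one_over_z


# Halfway across a wall that runs from z=1 (near) to z=4 (far).
flat = affine_u(0, 64, 0.5)
correct = perspective_u(0, 1, 64, 4, 0.5)
print(flat, correct)
```

With these numbers the affine result is 32 while the perspective-correct result is about 12.8, so the texture is off by almost a third of its width at the middle of the polygon. When both ends of the span sit at the same depth the two results agree, which is why subdividing polygons until the depth range of each piece is small hides the warping.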

Compared to the Jaguar, the PSX setup is more efficient because the GPU has its own dedicated RAM and it can be building a new buffer for a frame while the R3000 and GTE are busy processing the game logic and building the render list for the next frame. In many cases, the only thing the GPU has to get from main memory while processing a frame is the list of rendering instructions. This means the rendering process doesn't have a huge impact on how fast the other stuff gets done. (There are possible exceptions beyond the scope of this post.)

On Jaguar, if you use the GPU mainly for rendering and try to do the calculations for the next frame on the 68000, the two processors end up competing for the bus constantly, slowing everything down. The question is, how much? It's very context dependent, and a lot of games did it this way.

a31chris · Post by **a31chris** » Sun Jun 22, 2014 4:52 am

I'll reprint one of Gorf's posts on what his thoughts were on the Atari Polygon renderer. I think it relates to what you are saying.

Concerning the Atari Renderer, Gorf wrote: The problem with the Atari renderer that is in source form is that it is loaded with a lot of 68k code. Not only that, the renderer itself is poorly written: it uses 16-bit values, it uses the matrix instructions, and it was written to draw one model at a time as opposed to building a list of polygons to rip through, as it should. So for every model you want to draw per frame, it processes each one through all the setup... very inefficient. A good renderer would consider ALL models per frame, build a list of polys and then render them using 32-bit values, and instead of the matrix multiply instructions, which take WAY too much setup, it should use the MAC instructions, which would better handle 32-bit values. Something the MMULT instruction can't do.

What Scott (JagMod) did to speed up the renderer is get rid of the 68k code altogether, eliminate all the blits of the code to GPU local and put them into one big chunk. The other issue with the Atari renderer is that it has the GPU stopped completely while the blitter keeps loading vertices into the RAM. What should be happening is that the verts should be out in main and keep the GPU running at all times. OR use a small portion of the local for the blitter to move the verts to WHILE the GPU is constantly processing them, or at least doing something else while waiting for the blitter to load the next set.

A big problem besides the 68k running way too much is the fact that the GPU is constantly waiting in a tight loop for the blitter to finish moving the data.....a ton of wasted cycles right there. JagMod had a small sample code where the GPU never stops working and got well over 80,000 blits PER FRAME using this method. This was a first attempt too.

I maintain that we have not seen anywhere near the true ability of the Jaguar's 3D capabilities just based on these serious flaws of coding inefficiencies. Just about every game uses this piss poor method of GPU waiting and over use of the 68k.
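The cost of having the GPU sit in a wait loop can be estimated with a toy timing model. This is only a sketch: the cycle counts are invented, the function names are hypothetical, and real bus contention between the blitter and the GPU is ignored.

```python
def serial_time(batches):
    """GPU idles while the blitter loads each batch, then processes it."""
    return sum(load + work for load, work in batches)


def overlapped_time(batches):
    """Blitter loads batch i+1 while the GPU is still processing batch i."""
    if not batches:
        return 0
    total = batches[0][0]  # the very first load cannot be hidden
    for i, (_, work) in enumerate(batches):
        next_load = batches[i + 1][0] if i + 1 < len(batches) else 0
        total += max(work, next_load)
    return total


# Four vertex batches, each taking 40 cycles to load and 100 to process
# (made-up numbers for illustration only).
batches = [(40, 100)] * 4
print(serial_time(batches), overlapped_time(batches))  # 560 440
```

In this model every load after the first is hidden behind processing, so the only stall left is the initial one. How much of that survives on real hardware depends on how badly the two units fight over the bus.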

a31chris · Post by **a31chris** » Sun Jun 22, 2014 5:51 am

They wanted to use the CD to hold all of Quake but they figured the Jaguar did not have enough RAM, so they proposed expanding RAM via cartridge. It would be slower of course but not useless.

MikeFulton · Post by **MikeFulton** » Mon Jun 23, 2014 4:29 am

Assuming we're talking about the same thing, the 3D Polygon Renderer was really more of a demo / sample program than anything else. It was never intended to be a ready-to-go 3D library. (Admittedly, it's possible we referred to it as a "library" at times, but it really wasn't.) As such, it doesn't surprise me that it doesn't scale up very well.

I was all for making a real 3D library to give out, but the consensus was that developers would want to roll their own code. Keep in mind that this was in the days before console makers provided any sort of libraries to developers. Plus, that particular code was done by Eric Smith, who had about 14 billion things on his plate already.

a31chris · Post by **a31chris** » Mon Jun 23, 2014 4:39 am

It's amazing you remember all these details from 20 years ago. Who did what etc etc.

SoilentGreen · Post by **SoilentGreen** » Wed Jun 25, 2014 3:11 am

It's even more amazing that the post I started in 2004 (I was Andreaus44 then, now I'm SoilentGreen) is still being talked about.

That was almost 10 years ago! Wow, time flies!

MikeFulton · Post by **MikeFulton** » Wed Jun 25, 2014 6:45 am

a31chris wrote: It's amazing you remember all these details from 20 years ago. Who did what etc etc.

I'm sure there is no shortage of stuff I've forgotten...

Walter_j64bit · Post by **Walter_j64bit** » Tue Aug 05, 2014 2:38 am

I've just seen this post. I have to agree with Stone on this: if id says the Jag can do Quake, I'm happy with that. Too bad id didn't have the time to make the game for the Jag.

a31chris · Post by **a31chris** » Tue Aug 05, 2014 6:00 am

Walter_j64bit wrote: I've just seen this post. I have to agree with Stone on this: if id says the Jag can do Quake, I'm happy with that. Too bad id didn't have the time to make the game for the Jag.

Where did Carmack say that?

Walter_j64bit · Post by **Walter_j64bit** » Tue Aug 05, 2014 11:17 am

a31chris wrote:
Walter_j64bit wrote: I've just seen this post. I have to agree with Stone on this: if id says the Jag can do Quake, I'm happy with that. Too bad id didn't have the time to make the game for the Jag.
Where did Carmack say that?

I've not seen that but it must be on the net. Cyberroach has it listed. Say, maybe we should ask John Carmack.
http://cyberroach.com/

a31chris · Post by **a31chris** » Wed Oct 22, 2014 7:09 pm

Well maybe Carmack didn't say it but it seems doable. Here is Douglas Little's work converting the Quake II ENGINE to a STOCK Falcon030. He is not porting it. He is completely rebuilding it for the Falcon. And for now it's an EXPERIMENT to see how far he can go. He does not know if he can get the whole game in.

https://www.youtube.com/channel/UCTB3TZ ... vpEobZzMEw

It's amazing what can be done if we just lose the love affair with the whole MISSION IMPOSSIBRU! thing and TRY. And encourage others instead of trying to discourage them.

dml wrote: I agree with anyone who has given the opinion that it would be *hard* to implement a Quake based engine on the Jag. That is certainly true. I just suggest that it is doable, that past Jag games (especially from that time!) are not proof of hard limits - and that if JC was curious to try it at the time I'm sure he would have got results.

a31chris · Post by **a31chris** » Wed Oct 22, 2014 8:18 pm

dml's work is so interesting I recommend everyone read it who has the least bit of interest in technical stuff. Here he talks about replacing the need for floating point texturing with something that doesn't require floating point:

dml on Mon Aug 11, 2014 7:32 pm wrote: I figured something out this evening which allows per-pixel perspective correct texture mapping... using no divides, and needing only a tiny amount of RAM.

This raises the probability of Quake-style texturing on a plain F030 from 'no chance' to 'technically feasible'. Doesn't mean it can be done fast enough to be used in the Q2 engine but it's much, much closer to the target.

[Attached image: pctm.png]

Experiment conducted on PC version using floats. I have still to solve some problems with a fixedpoint implementation. Not insignificant problems, but hopefully not as hard as figuring out the technique in the first place.

It is not the same as the other strange method I came up with for the first version of BadMood, but there was always that to fall back on - an improved, modified interpretation of it anyway.

The challenge is to make it work with fixedpoint. If that can be done then it opens up some new opportunities for the old F030.

dml on Tue Aug 12, 2014 8:00 pm wrote:
I have found a second way to do this, which is even cheaper in some cases but not nearly so accurate. In fact it is so cheap that it might also work on a plain 68000. It's based on a mathematical technique I used to compress audio samples in a game long ago.

The first method though will be better on DSP because the cost difference isn't much and with some changes it can be shown that accuracy is practically as good as the original divide method, when used per pixel. I also now think it will convert to fixedpoint. I think maybe it is now a matter of effort - at least as sure as I can be without having done it.

Rethink EVERYTHING we think we know: Original thread

txg/mnx · Post by **txg/mnx** » Thu Oct 23, 2014 7:37 am

About memory: I recently opened a Jaguar to see which RAM chips are used. According to the Jaguar schematic, it seems a second bank of memory could be added to bring the internal Jaguar memory to 2MB. I did track down the same memory chips but haven't ordered them yet; it's on my to-do list. The upgrade looks pretty simple. The only thing is that the Jaguar uses SOJ chips, which require slightly better soldering skills, but it should still be doable.
When it works, I will check whether a simple upgrade board can be made, so more people can upgrade their Jaguar. To keep the Jaguar 100% compatible, the extra RAM must also be enabled/disabled by a switch, because it could cause issues with programs that don't know there is an extra 2MB of RAM. It should not be an issue, except maybe in rare cases.
A 4MB Jag would be much more powerful with graphics, I think.

a31chris · Post by **a31chris** » Thu Oct 23, 2014 4:41 pm

What are you talking about? The Jaguar already has 2mb internal ram. You mean to bring it up to 4mb?

MikeFulton · Post by **MikeFulton** » Thu Oct 23, 2014 10:21 pm

a31chris wrote: Well maybe Carmack didn't say it but it seems doable. Here is Douglas Little's work converting the Quake II ENGINE to a STOCK Falcon030. He is not porting it. He is completely rebuilding it for the Falcon. And for now it's an EXPERIMENT to see how far he can go. He does not know if he can get the whole game in.

https://www.youtube.com/channel/UCTB3TZ ... vpEobZzMEw

This makes me wonder what the real performance bottleneck for Quake on Falcon030 might be. I haven't had time to do more than briefly preview Doug Little's videos.

There are basically three parts to the game loop...

1) Game logic
2) scene preparation
3) scene rendering

I presume the division of labor would have the 68030 doing stages 1 and 3 while stage 2 would be handled, at least in part, by the DSP.

Regarding the game logic, even if the rendering is taken out of the equation, I have to wonder what sort of frame rate you could get out of the Falcon030's 16mhz processor, particularly in the slowed-down 16-bit graphics mode.

With regards to scene preparation, I would presume one would have the 68030 queue up vertex values that need the 3D math stuff and then hand that off to the DSP. Then the DSP would spit back a transformed list of polygon vertices ready for the rendering stage. What does the 68030 do while it's waiting for the DSP? In order to minimize idle time, one might want to have the 68030 cranking away on the game logic for the next frame while the DSP is cooking stuff to be rendered.

As far as rendering goes, the first question I have is, does your OpenGL (-ish) library for Falcon030 do real 3D polygon rendering, with the texture scaling factor interpolated between vertices? If so, then the Falcon030's blitter is pretty much a paperweight because each pixel will need to be output individually. Because of the setup required for each blitter operation, there's a certain minimum number of pixels you have to be pushing at one time before it becomes cheaper to use it instead of just accessing the frame buffer directly.

If you used a constant texture scale factor per polygon (like PS1), using the blitter is more practical, but then you realistically have to spend more CPU time doing polygon subdivision.

a31chris · Post by **a31chris** » Fri Oct 24, 2014 12:32 am

In the next post I link to the original thread if you ever get time to read it and are interested. I'll re-link it here:

http://www.atari-forum.com/viewtopic.php?f=68&t=26775

a31chris · Post by **a31chris** » Fri Oct 24, 2014 4:02 pm

Posting by proxy for Douglas Little

Hi Mike,

This makes me wonder what the real performance bottleneck for Quake on Falcon030 might be. I haven't had time to do more than briefly preview Doug Little's videos.

The main bottleneck - the area causing most difficulty in the current version - is the sheer number of edges in the scene coupled with the 'lumpy' PVS which switches on big chunks of geometry as you move around. The Quake engine needs to process/classify all edges yielded by the PVS (first against frustum planes, hierarchically, then against the viewport, individually) and that involves a lot of organizing/reindexing and transferring global edge and vertex information from system RAM to a local, packed representation on DSP in small batches. There isn't enough DSP RAM to hold it all at once, so it needs to be batched, <=256 faces at a time.
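The batching itself is simple to picture. Below is a minimal sketch of splitting a visible-face list into chunks of at most 256; the function name and the face list are hypothetical, and the real engine's reindexing and packing of edge and vertex data for the DSP is not shown.

```python
MAX_FACES_PER_BATCH = 256  # batch limit quoted in the post above


def face_batches(faces, limit=MAX_FACES_PER_BATCH):
    """Yield consecutive slices of `faces`, each at most `limit` long."""
    for start in range(0, len(faces), limit):
        yield faces[start:start + limit]


# Pretend the PVS handed back 600 visible faces this frame.
visible_faces = list(range(600))
sizes = [len(batch) for batch in face_batches(visible_faces)]
print(sizes)  # [256, 256, 88]
```

Each batch means another round of packing and pushing data through the narrow CPU-to-DSP port, so a PVS that suddenly switches on a large chunk of geometry shows up directly as more transfers per frame.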

There are basically three parts to the game loop...

1) Game logic
2) scene preparation
3) scene rendering

So far this codebase just draws maps - but is also processing events for game objects and collision detection for the player. It doesn't support enemies just now and doesn't draw any of the dynamic objects (pickups, doors etc).

I presume the division of labor would have the 68030 doing stages 1 and 3 while stage 2 would be handled, at least in part, by the DSP.

This would probably be a nice scenario for decent concurrency but the Falcon is not so well equipped for it. The DSP has 32kwords of local ram (really, 16kwords of paired/wide memory) and much of that is used to generate/collect global scene information (surfaces, clipped edges, drawing spans) before the drawing phase. It can't access system RAM itself. There is a narrow port to get stuff between the CPU and DSP. A small amount of local ram is used for a processing buffer for incoming geometry (in batches). But the 32MHz DSP is fast with its local memory, at 2 clocks for most operations including mul.

The DSP is needed to accelerate most stages in the system, and is fully responsible for the geometry pipeline after the PVS & BSP stage, up to the production of spans for drawing. Some long-duration concurrency can be found during stages such as scan-conversion into spans, but most of the concurrency is short duration and cooperative with the CPU.

This does present some challenges for making it run quickly.

Regarding the game logic, even if the rendering is taken out of the equation, I have to wonder what sort of frame rate you could get out of the Falcon030's 16mhz processor, particularly in the slowed-down 16-bit graphics mode.

I had exactly this problem when I bolted my own Doom 3D engine (for Falcon) onto Id's Doom game code. The game code took as long as the rendering. In some cases - at least briefly - the game code would actually take longer than the rendering. Hardly any of the code would fit in the tiny CPU cache, and thrashed badly from the main bus.

Correcting that was painful - it involved changing the collision detection system (implementing a DSP based BSP raycast among many other changes) and the game object tick management system and base rate, and it was a lot of hassle. In the end though it was possible to push the game code costs into the background and the game ended up playable.

The same problems would occur in Quake2, plus a few new ones (edge density of enemy objects, ZBuffering those faces or inserting them into the spanbuffer, cost of full collision detection for moving map objects). For this reason I'm doubtful the Quake2 singleplayer mode will be practical to run in its original form. But I'm still interested in drawing the worlds in any case, and I think drawing 2 players plus map objects is doable, for example.

With regards to scene preparation, I would presume one would have the 68030 queue up vertex values that need the 3D math stuff and then hand that off to the DSP. Then the DSP would spit back a transformed list of polygon vertices ready for the rendering stage. What does the 68030 do while it's waiting for the DSP? In order to minimize idle time, one might want to have the 68030 cranking away on the game logic for the next frame while the DSP is cooking stuff to be rendered.

Almost, yes. The DSP is also responsible for a later stage - the spanbuffer, which sorts and clips polygons against each other, and issues non-overlapping spans back to the CPU for drawing (as part of a texture chain).

This would let the CPU perform drawing while the DSP does something else, so additional concurrency could be set up, but this doesn't work for texturing because the DSP is again needed to generate texture uvs for each pixel. The CPU could do this on its own for affine texturing but not for perspective-correct texturing.

As far as rendering goes, the first question I have is, does your OpenGL (-ish) library for Falcon030 do real 3D polygon rendering, with the texture scaling factor interpolated between vertices?

The Falcon version currently only fills flat surfaces, mainly because I'm still optimizing the upper and middle-lower levels of the 3D pipe. Most recently optimizing the spanbuffer scan conversion pass.

However, yes. I have prototype code in the PC version of the same engine which implements z-correction with the integer unit only (24bits, to make it DSP friendly), no floating point involved. It generates per-pixel perspective correct uvs.

It does not use edge subdivision, and does not use spanlets (divide-in-flight) as with the original Quake. It uses an approximation of the divide curve as the driver and avoids divide instructions at all stages, since that's way too slow for CPU and very awkward on DSP (it has an iterative divide, but not a concurrent one - need to divide 24 times for a 24bit result!).
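To see why an iterative divide is so costly per pixel, here is a generic bit-serial (restoring) division sketch that produces one quotient bit per loop pass. It is a textbook technique written in Python for illustration, not the DSP's actual instruction sequence and not dml's approximation method, which the thread does not describe in detail.

```python
def divide_bit_serial(dividend, divisor, bits=24):
    """Unsigned restoring division: one quotient bit per iteration.

    Assumes 0 <= dividend < 2**bits and divisor > 0.
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    quotient = 0
    remainder = 0
    for i in range(bits - 1, -1, -1):
        # Bring down the next dividend bit, most significant first.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


print(divide_bit_serial(1_000_000, 7))  # (142857, 1)
```

Twenty-four dependent steps for every pixel's 1/z is a heavy bill, which is the motivation for replacing the divide with something cheaper.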

If so, then the Falcon030's blitter is pretty much a paperweight because each pixel will need to be output individually.

For this engine, the Falcon's blitter is a paperweight, yes. It is far less sophisticated than the Jaguar blitter - it is a two-source, integer-addressing block transfer device with some logic operations and bitplane scrolling support. Aside from a few interesting quirks and tricks which have been found to work with it, the blitter is really not capable of 3D work, except where it is needed to fill flat colour or copy contiguous data in rows or rectangles.

The Jaguar blitter is a lot faster, has a wider bus, has fixedpoint addressing IIRC, and I'm sure I had it affine-texturing cubes back in the day as one of my early tests with the devkit. (I think I recently found the COF file for that and gave it to the Jaguar emulator guy to help improve emulation.)
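Fixed-point addressing is what makes affine texturing cheap: the texture position is stepped by a constant fraction per pixel using only adds and shifts. The sketch below shows the idea in software with an assumed 16.16 format; it says nothing about the Jaguar blitter's actual register layout, and all names are hypothetical.

```python
FRAC_BITS = 16          # assumed 16.16 fixed-point format
ONE = 1 << FRAC_BITS


def to_fixed(value):
    """Convert a number to 16.16 fixed point."""
    return int(round(value * ONE))


def affine_span(u0, v0, u1, v1, length):
    """Return texel coordinates for `length` pixels (length > 0).

    The per-pixel loop uses only integer adds and shifts.
    """
    u, v = to_fixed(u0), to_fixed(v0)
    du = (to_fixed(u1) - u) // length
    dv = (to_fixed(v1) - v) // length
    texels = []
    for _ in range(length):
        texels.append((u >> FRAC_BITS, v >> FRAC_BITS))
        u += du
        v += dv
    return texels


print(affine_span(0, 0, 8, 4, 8))
```

For this span the u coordinate advances one texel per pixel while v advances half a texel per pixel, so v only changes on every second pixel. A blitter limited to integer addressing has no way to express that half step.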

Because of the setup required for each blitter operation, there's a certain minimum number of pixels you have to be pushing at one time before it becomes cheaper to use it instead of just accessing the frame buffer directly.

Yes, and that is even true when filling flat colour on the Falcon using the blitter. I do use it for that in the current version, but the gain is small over the CPU, and can go slightly negative if the scene is dense enough with many tall thin surfaces.
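The break-even point described here falls out of a one-line inequality: the blitter wins once setup + n * blit_cost < n * cpu_cost. The sketch below solves that for n. The cycle counts are invented for illustration and are not measured Falcon figures; the function name is hypothetical.

```python
def break_even_pixels(setup_cost, blit_per_pixel, cpu_per_pixel):
    """Smallest pixel count at which the blitter is strictly cheaper than the CPU.

    All costs are whole cycle counts. Returns None if the blitter never wins.
    """
    saving = cpu_per_pixel - blit_per_pixel
    if saving <= 0:
        return None
    return setup_cost // saving + 1


# Made-up costs: 120 cycles of setup, 2 cycles/pixel blitted, 8 cycles/pixel by CPU.
print(break_even_pixels(120, 2, 8))  # 21
```

A scene full of tall thin surfaces produces lots of spans shorter than the break-even length, so each one pays the setup cost without earning it back. That matches the "slightly negative" result described above.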

For texturing the CPU+DSP need to be coupled, with the CPU performing the system ram pixel transfers from texture to framebuffer, and the DSP generating texture uvs concurrently, slightly ahead of each plot. This will certainly be slower than filling flat colour, but the time is constant. The resolution will be dropped from 320x200/160 down to 160x120 most probably - chunky columns, and the spanbuffer rotated 90° to minimize edge crossings in the scene and save a bit more time.

If you used a constant texture scale factor per polygon (like PS1), using the blitter is more practical, but then you realistically have to spend more CPU time doing polygon subdivision.

Yes that's right. In this case it will be correcting for each pixel (or can do, but we'll see if distributing across N pixels helps too) so there won't be a need to generate additional edges. However I had some fun with subdivision in the past, on the same problem.

Back in '97 I attempted a Quake (1) port which tried to adaptively subdivide the spans using the DSP, by testing the z-correction error at the span midpoint and splitting the span length. This worked, but the result was visibly very wobbly and nasty to watch, because the eye was not expecting the correction error to move around, even if it was kept fairly small. I think in the end, to stop it wobbling, it was more efficient to just leave it as it was, with fixed intervals for the divides.
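A rough sketch of midpoint-error subdivision as described above follows. This is a reconstruction of the general idea only, not the '97 code: the error threshold, the function names and the use of floats are all assumptions, and the integer midpoint is treated as the true screen midpoint, which is only exact for even-length spans.

```python
def true_midpoint(u0, z0, u1, z1):
    """Perspective-correct u and z halfway along the span in screen space."""
    inv_z = (1 / z0 + 1 / z1) / 2
    u_over_z = (u0 / z0 + u1 / z1) / 2
    return u_over_z / inv_z, 1 / inv_z


def subdivide(x0, x1, u0, z0, u1, z1, max_error=0.5):
    """Split the span [x0, x1) until affine error at each midpoint is small."""
    um, zm = true_midpoint(u0, z0, u1, z1)
    error = abs((u0 + u1) / 2 - um)
    if error <= max_error or x1 - x0 <= 1:
        return [(x0, x1)]
    xm = (x0 + x1) // 2
    return (subdivide(x0, xm, u0, z0, um, zm, max_error)
            + subdivide(xm, x1, um, zm, u1, z1, max_error))


flat_wall = subdivide(0, 64, 0, 2, 64, 2)     # constant depth: no split needed
steep_wall = subdivide(0, 64, 0, 1, 64, 4)    # depth 1 to 4: many splits
print(len(flat_wall), len(steep_wall))
```

Because the split points depend on the depths at the span ends, they shift from frame to frame as the view moves. That is one plausible reading of why the residual error appeared to wobble, whereas fixed intervals keep the error in the same place on screen.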

Thanks for the questions on the project!

Doug

a31chris · Post by **a31chris** » Sat Oct 25, 2014 2:47 am

dml's newest video just put up: Quake II revised spanbuffer on the Falcon030

https://www.youtube.com/watch?v=RFbTfX770Lg

Very cool. Almost the entire q2dm1 map. We would play the hell out of this map on Quake 1 vs each other with Rocket Arena free for all.

Damn I gotta install Quake 1 again and get all the maps. I miss RA.

a31chris · Post by **a31chris** » Sun Oct 26, 2014 7:55 pm

Here is where dml has prototyped a technique on the PC for fixed point perspective correct texturing on the Quake engine. Very cool to watch.

https://www.youtube.com/watch?v=yocH61FLKFY

a31chris · Post by **a31chris** » Sun Oct 26, 2014 10:55 pm

MikeFulton wrote:The Jaguar didn't have dedicated video memory. The display processor used main memory for the frame buffers. And "frame buffers" is an idea that requires explanation for the Jaguar context. The Jaguar video processor didn't necessarily have to use a single full-screen frame buffer. Your program maintained a display list of raster items that could be arbitrarily scaled and positioned, and could be overlapped. This couldn't be done willy-nilly, as it used up a lot of memory bandwidth when you had multiple overlapping rasters, but you could do some cool stuff with it.

The Jaguar does have a hardware line buffer though doesn't it? It can hold one line of a frame rather than a whole frame in hardware?

Tursi was mentioning that Michael Abrash's Graphics Programming Black Book talked about running the Quake rendering engine with a linebuffer. He hypothesized that this could be adapted to the Jaguar.

http://3do.cdinteractive.co.uk/viewtopi ... 926#p36926

MikeFulton · Post by **MikeFulton** » Tue Oct 28, 2014 9:28 am

a31chris wrote:
MikeFulton wrote:The Jaguar didn't have dedicated video memory. The display processor used main memory for the frame buffers. And "frame buffers" is an idea that requires explanation for the Jaguar context. The Jaguar video processor didn't necessarily have to use a single full-screen frame buffer. Your program maintained a display list of raster items that could be arbitrarily scaled and positioned, and could be overlapped. This couldn't be done willy-nilly, as it used up a lot of memory bandwidth when you had multiple overlapping rasters, but you could do some cool stuff with it.
The Jaguar does have a hardware line buffer though doesn't it? It can hold one line of a frame rather than a whole frame in hardware?

Tursi was mentioning that Michael Abrash's Graphics Programming Black Book talked about running the Quake rendering engine with a line buffer. He hypothesized that this could be adapted to the Jaguar.

http://3do.cdinteractive.co.uk/viewtopi ... 926#p36926

The line buffer you're referring to is where the object processor would output the result of the object compositing that it did, line by line. It quite literally represents the scanline being output. It has to be updated during each horizontal blank period.

I know I've read Abrash's book but it's been quite awhile. I dunno if what he was talking about was something that would apply to the Jaguar... I tend to doubt it.
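
To make the line buffer idea concrete, here is a rough conceptual model of compositing a display list into a single scanline. This is only a sketch: the dictionary layout, the use of 0 as a transparent pixel, and the function name are assumptions for illustration and do not reflect the real object processor's object format or timing.

```python
def composite_scanline(objects, y, width, background=0):
    """Build one scanline by drawing each object that covers line y.

    Objects are drawn in list order, so later objects overlap earlier ones.
    A pixel value of 0 is treated as transparent (an assumption).
    """
    line = [background] * width
    for obj in objects:
        row = y - obj["y"]
        if not 0 <= row < len(obj["pixels"]):
            continue  # this object does not touch the current scanline
        for i, px in enumerate(obj["pixels"][row]):
            x = obj["x"] + i
            if px != 0 and 0 <= x < width:
                line[x] = px
    return line


# Two overlapping one-line objects; the second has a transparent gap.
display_list = [
    {"x": 0, "y": 0, "pixels": [[1, 1, 1, 1]]},
    {"x": 2, "y": 0, "pixels": [[2, 0, 2]]},
]
```

Calling `composite_scanline(display_list, 0, 6)` gives `[1, 1, 2, 1, 2, 0]`. Every overlapping object means more source pixels read per line, which is the memory bandwidth cost Mike mentions.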

a31chris · Post by **a31chris** » Wed Oct 29, 2014 8:45 am

Thanks Mike! Perhaps he was talking about Line Buffer in general on the Jaguar, like using some of the 4k local and not that specific one on the Jaguar. I dunno. Hard to understand you geniuses sometimes.

Some updates from dml's thread on his Q2 engine:

https://www.youtube.com/watch?v=Vp7CHxY9H2I

Have fixed an annoying bug which caused lots of maps to hang - there was a limit of 15 edges per face, and some maps were expecting 16-20.

It's now possible to run the huge q2dm8 map - 'warehouse'. Not exactly fast, but it runs.

https://www.youtube.com/watch?v=kILvl6Kzj18

In the few months since the original Atari-Forum thread started, this guy has moved fast. Really fast. He really knows his stuff.

Of course Atari didn't hire this guy either when they had the chance. They went with the Phase Zero crew, who took the money Atari gave them and ran over to Sega or wherever. sigh.

a31chris · Post by **a31chris** » Wed Oct 29, 2014 7:20 pm

Textures with VRAM. The PSX has like 1mb and the Jaguar has zilch. How VRAM works is a little foggy to me for textures. Are the textures held in system RAM on the PSX and then shot to the VRAM, where the 1mb helps enable it to display more colors and more textures at once? Or are these textures held in one part of VRAM then displayed on the screen in another part? While we know the Jaguar is crippled at polygon texturing, we know it can be done.

Gouraud shaded Quake with decal textures placed strategically might be called for on a system like the Jaguar.

MikeFulton · Post by **MikeFulton** » Wed Oct 29, 2014 8:21 pm

a31chris wrote:Textures with VRAM. The PSX has like 1mb and the Jaguar has zilch. How VRAM works is a little foggy to me for textures. Are the textures held in system RAM on the PSX and then shot to the VRAM, where the 1mb helps enable it to display more colors and more textures at once? Or are these textures held in one part of VRAM then displayed on the screen in another part? While we know the Jaguar is crippled at polygon texturing, we know it can be done.

Gouraud shaded Quake with decal textures placed strategically might be called for on a system like the Jaguar.

On PSX, all textures are rendered from VRAM. The VRAM buffer is organized so that you have raster areas known as "texture pages" and your individual textures are a specific subsection of that texture page. You can have multiple texture pages setup, but there is a cost to switch from one to another, so you often need to try to do a secondary sort of your polygons by texture page.

Ideally, all of your textures for a given frame fit into VRAM at once, and you can preload them once and after that you just render polygons for each frame.

However, it's likely you have more textures than can fit into VRAM all at once. So to accommodate this, there's a GPU command for "COPY BITMAP TO/FROM MAIN MEMORY" which can be used to load new texture pages into VRAM. (It could also be used for things like background images.)

Since this bangs the main system bus and memory pretty hard, obviously, you want this to be done as little as possible, so while the usual situation was to order your polygons according to Z-value, there were some games that did a secondary ordering according to which texture page(s) needed to be in VRAM, so that it could render as many polygons as possible with one set of textures, then swap in different ones and do the next set.
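
The secondary ordering described above can be sketched roughly like this. It is an illustration only, not PSX library code: the polygon dictionaries, the `z_bucket_size` value, and both function names are hypothetical. Polygons are grouped into coarse depth buckets (far to near), and within each bucket they are sorted by texture page so the renderer switches pages less often. The trade-off is that exact depth order inside a bucket is given up.

```python
def order_polygons(polys, z_bucket_size=256):
    """Sort far-to-near by coarse Z bucket, then by texture page within a bucket."""
    return sorted(polys, key=lambda p: (-(p["z"] // z_bucket_size), p["tpage"]))


def count_page_switches(polys):
    """Count how many times the active texture page changes while drawing."""
    switches = 0
    current = None
    for p in polys:
        if p["tpage"] != current:
            switches += 1
            current = p["tpage"]
    return switches


polys = [
    {"z": 900, "tpage": 0}, {"z": 880, "tpage": 1}, {"z": 850, "tpage": 0},
    {"z": 120, "tpage": 1}, {"z": 100, "tpage": 0}, {"z": 90, "tpage": 1},
]
strict_z = sorted(polys, key=lambda p: -p["z"])   # pure depth sort
bucketed = order_polygons(polys)                  # depth bucket + page sort
```

On this made-up data the pure depth sort changes texture page 6 times, while the bucketed ordering changes it 4 times.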

a31chris · Post by **a31chris** » Thu Oct 30, 2014 10:52 pm

dml wrote:Some of the Q2 maps are extremely dense. This is partly because textures never truly 'repeat', so flat surfaces must be broken into unique tiles, creating many more vertices than would be needed for a flat-filled version. This is why the floors of big rooms seem to be made of randomly sized tiles - it is necessary in order to texture each 'world pixel' uniquely. Unique texturing is required for unique pixel lighting (lightmaps). So it's really the Quake lighting that forces the polycount to be much higher than the 'geometric surface' count - which is already fairly high to give the game a decent look.

I think this was somehow sidestepped in the PSX port, but I'm not sure of the details. They did a very, very good job of that port, but a significant portion of the savings came from changing the content (e.g. the maps) to better suit the HW polygon engine in that box. They didn't need to retain anything that was designed with software rast in mind.

The Q2 tech is very close to Q1, with some algorithm optimizations, added capabilities and other improvements, but the approach is the same. The Q2 maps are, however, more complex than Q1's because the machine spec went up quite a lot in the years between releases.

One of the 'other' reasons I picked Q2, is simply that Q1 maps would also potentially be usable with smallish changes, and probably run faster since they were aimed at slower processors.

I am doing my best to squeeze everything out of both chips (stock Falcon030), and in this case the DSP is struggling, needing a lot of hand optimization in many places.

It is very difficult to approach performance of a processor at least 2 generations on from the 030, and probably 10x the clock rate. I am not finding it easy at all. But I don't give up easily so long as it is interesting, and there are still things waiting to be done to help the speed.

More video of Doug's progress:
https://www.youtube.com/watch?v=Tp965ZL9Uvs

While uploading the vid I had also collected some updated profiling info which looks quite promising - the optimizations are really making an impact in the right areas, so a lot of the time is now being wasted in the remaining code - stuff which still needs to be rewritten or was otherwise expected to move up the list again. That means there's definitely room to make it faster.

In one of the slowest areas of a particularly bad map (under 4fps), converting the scene from polygons into spans takes only 11% now - that's about 1.5 VBLs.

For the same scene, filling takes only about 20% - around 3 VBLs.

So it is really input-bound now - the output side is fast enough. The remaining C code is also beginning to matter again.
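
As a rough sanity check on those figures, the percentages can be converted to VBLs like this. The 60 Hz refresh rate and the exact 4 fps figure are assumptions (the post only says "under 4fps" and doesn't name the video mode), and the helper names are made up.

```python
def vbls_per_frame(fps, refresh_hz=60.0):
    """Number of vertical blank periods that pass while one frame is rendered."""
    return refresh_hz / fps


def stage_vbls(share, fps, refresh_hz=60.0):
    """VBLs spent in a stage that takes `share` (0..1) of the frame time."""
    return share * vbls_per_frame(fps, refresh_hz)


frame = vbls_per_frame(4)               # 15 VBLs per frame at 4 fps, 60 Hz
span_gen = stage_vbls(0.11, 4)          # polygons -> spans
fill = stage_vbls(0.20, 4)              # span filling
rest = frame - span_gen - fill          # everything else
```

Under those assumptions span generation comes to about 1.65 VBLs and filling to 3 VBLs, in line with the "about 1.5" and "around 3" quoted, which leaves roughly 10 of the 15 VBLs per frame in the other stages.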

Original posts:
http://www.atari-forum.com/viewtopic.ph ... 00#p260828

a31chris · Post by **a31chris** » Fri Oct 31, 2014 7:55 pm

Can you define in layman's terms what you mean by input/output/middle stage of the rendering engine?

I'm assuming 'output' is the Falcon outputting gfx to the screen.

Input is how the program is feeding 3d info to the Falcon's processors?

What are the middle stages?

Thanks for your time. These videos are awesome.

dml wrote:Sure - explanation below.

By 'input stages' I mean gathering relevant visible data from the map from the BSP structure, using the PVS, camera location, camera bounding planes (frustum) and associated filtering and reindexing of faces. This is almost all CPU-side in the current version, but will be broken up soon between the two processors.

By 'middle stages' I mean transforming vertices by the camera view, classifying and clipping the face edges, generating the GET (global edge table) and AET (active edge table) activation lists which drive the rasterizer later. This is now all DSP-side, but not optimized.

By 'output stages' I mean all 2D rasterization work - scan-converting the GET into polygon spans for drawing, transmitting the spans back to the host CPU, and drawing them with the host CPU or blitter. This work is split between the two processors and mostly optimized.
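
For readers who haven't met the GET/AET terms before, here is a minimal textbook-style sketch of scan-converting a single polygon into spans with a global edge table and an active edge table. It is not dml's DSP code, which works on whole scenes. Integer vertex coordinates, float edge stepping, and all names here are assumptions for illustration.

```python
def build_get(vertices):
    """Global edge table: edges bucketed by the scanline where they start."""
    get = {}
    n = len(vertices)
    for i in range(n):
        (x0, y0), (x1, y1) = vertices[i], vertices[(i + 1) % n]
        if y0 == y1:
            continue  # horizontal edges never cross a scanline
        if y0 > y1:
            x0, y0, x1, y1 = x1, y1, x0, y0  # make the edge run top to bottom
        dx = (x1 - x0) / (y1 - y0)
        get.setdefault(y0, []).append({"x": float(x0), "y_end": y1, "dx": dx})
    return get


def scan_convert(vertices):
    """Return (y, x_left, x_right) spans for a polygon with integer vertices."""
    get = build_get(vertices)
    if not get:
        return []
    y = min(get)
    y_max = max(e["y_end"] for edges in get.values() for e in edges)
    aet, spans = [], []
    while y < y_max:
        aet.extend(get.get(y, []))                 # activate edges starting here
        aet = [e for e in aet if e["y_end"] > y]   # retire finished edges
        aet.sort(key=lambda e: e["x"])
        for left, right in zip(aet[0::2], aet[1::2]):
            spans.append((y, left["x"], right["x"]))
        for e in aet:
            e["x"] += e["dx"]                      # step each edge to the next line
        y += 1
    return spans
```

For the triangle `[(0, 0), (4, 0), (0, 4)]` this yields four spans, one per scanline, shrinking from x = 0..4 down to x = 0..1. In dml's split, lists like these spans are what get sent back to the host CPU or blitter for filling.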

a31chris · Post by **a31chris** » Thu Nov 06, 2014 6:05 pm

dml wrote:Quick update. Last night's changes seem to have made a big difference to speed. Best of all, it's now even faster on a real F030 than in Hatari. I had already seen a good speedup in Hatari but didn't try F030 until tonight.

There are still some slow areas (like the smashed wall & rubble geometry at the start of base1), and some new problems are showing up...

The FPU performance of the collision detection is beginning to show on real hardware (FPU is artificially very fast in Hatari) so the framerate suddenly drops if the player's feet touch complex geometry. Took me a while to figure out what was going on there but it makes sense now. In all other areas the F030 HW now seems to be quicker and maintains a decent framerate in most areas of many maps.

Currently the best way to avoid the FPU slowdown is to avoid maps with lots of detail in the floors (like craters and smashed up tiles etc.). These are mainly single-player maps. Multiplayer maps don't seem to be much affected, mainly because those maps were designed to stop players getting snagged in the geometry during competitions.

The collision detection needs to be rewritten but I don't have time for it just now. It can at least be improved though, given time.

There is one very nice thing about Q2 though which makes this less of a concern for multiplayer mode:

The server doesn't need to run on the same 16MHz machine, and the server does all of the collision detection and game object management. The client only needs to draw stuff quickly and handle player input and network packets.

New video showing several maps with current improvements:

https://www.youtube.com/watch?v=amr0JBi0xdk

Original posts:

http://www.atari-forum.com/viewtopic.ph ... 25#p261100

3DO ZONE Forums