3DO ZONE Forums

quake Engine on the Jaguar (Scrummy posts)
Page 2 of 2

Author:  a31chris [ Tue Nov 11, 2014 2:38 am ]
Post subject:  Re: quake Engine on the Jaguar (Scrummy posts)

And here we go. Preliminary texture mapping tests:


Author:  a31chris [ Wed Nov 12, 2014 3:11 am ]
Post subject:  Re: quake Engine on the Jaguar (Scrummy posts)

dml wrote:
So if anyone wants to have a go at it on the Falcon (or another retro box with a RISC chip inside), here's how the mapper works:

1) First you need to build a table of coefficients for quadratic equations, offline.

This involves solving and storing A,B,C terms in order to later query y=(Ax^2+Bx+C) for any (x), where (x) is essentially the scene (z) term for a given pixel, and the result (y) approximates (1/z). I'm using a table of 1024 equations but you can optimize in either direction to save space or gain range/accuracy.

It should be possible to use linear equations if the table is big enough, and perhaps save some cycles while still getting a decent(ish) approximation - but for the Falcon's DSP it's not much trouble to just do it properly.

Note that the table stores equations - triplets, not single values. This means you're performing a lookup on a set of curves - not single value samples.

Generating the table is a bit hard - it involves performing a best-fit on equations using a set of sample points on each curve. I use subdivision but random sampling may work. For most of the table entries, the same 3 points will converge to the best fit, but for entries near the ends of the table the choices will move due to clamping effects enforced on A,B,C for the legal fixedpoint range. It's important to be aware of this detail or you'll get stuck. There are a few gotchas involved in generating the table and due to the nature of best-fit algorithms, you can end up with a broken solver that looks like it is nearly working - beware.

Despite those problems, it's relatively easy to understand/test in floating point because A,B,C can be kept in their natural range. A fixed-point version however is much more difficult since the terms need to be normalized to maximize use of available bits, and for optimal precision they must be differently normalized. This part is a challenge but it can be shown to (just) work with as few as 23 bits + sign for all source terms.
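The floating-point version of the table build can be sketched in C as below. This is only an illustration under stated assumptions, not dml's solver: each quadratic is made to pass through 1/x at the start, midpoint and end of its slice (plain interpolation rather than the best fit with clamping described above), and the names QuadEq, recip_table_build, recip_table_eval and the [xmin, xmax) input range are hypothetical.

```c
#include <assert.h>

/* One table entry: coefficients of y = A*x*x + B*x + C (hypothetical layout). */
typedef struct { double A, B, C; } QuadEq;

/* Fill tab[0..n-1] so that entry i approximates 1/x on the i-th equal slice
   of [xmin, xmax). Assumption: each quadratic simply interpolates 1/x at the
   slice start, midpoint and end; this is not a true best fit. */
void recip_table_build(QuadEq *tab, int n, double xmin, double xmax)
{
    assert(n > 0 && xmin > 0.0 && xmax > xmin);
    double seg = (xmax - xmin) / n;   /* slice width */
    double h = seg * 0.5;             /* spacing between the 3 sample points */
    for (int i = 0; i < n; i++) {
        double x0 = xmin + i * seg, x1 = x0 + h, x2 = x0 + seg;
        double y0 = 1.0 / x0, y1 = 1.0 / x1, y2 = 1.0 / x2;
        double d1 = (y1 - y0) / h;                        /* first divided difference */
        double A  = (y0 - 2.0 * y1 + y2) / (2.0 * h * h); /* second divided difference */
        tab[i].A = A;
        tab[i].B = d1 - A * (x0 + x1);
        tab[i].C = y0 - d1 * x0 + A * x0 * x1;
    }
}

/* Look up the slice containing x and evaluate its quadratic (Horner form). */
double recip_table_eval(const QuadEq *tab, int n, double xmin, double xmax, double x)
{
    assert(x >= xmin && x < xmax);
    int ix = (int)((x - xmin) / (xmax - xmin) * n);
    if (ix >= n) ix = n - 1;          /* guard against rounding at the top edge */
    const QuadEq *q = &tab[ix];
    return (q->A * x + q->B) * x + q->C;
}
```

Because each entry is a curve rather than a single sample, the lookup stays accurate between table points. The fixed-point normalization of A, B and C, which dml describes as the hard part, is not attempted here.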

2) Implement the runtime part, which efficiently performs y=(Ax^2+Bx+C).

For this to be efficient, you really need a RISC device with a multiply-accumulate and fast shifting capability. Or at the very least, a very fast multiplier and careful coding. Unfortunately the Falcon's DSP is terrible at shifting and does present some problems of its own in getting this to work fast. Left as an exercise for the reader.

The transform looks a bit like this:

normbits = 23; // for Falcon's 24/48 DSP accumulator - 1 bit auto denorm on this device. should be 32 for a 64bit RISC accumulator.
qbits = 13; // 10 table bits + 13 precision bits == 23
tbits = 8; // arbitrary fraction retained for texture u,v, multiply precision

// during setup, get z, uz, vz normalized into fixedpoint range
z *= (int)(1<<normbits);

x = (int)pixel_z;
ix = (x >> qbits);
A = qtab[ix].A;
B = qtab[ix].B;
Q = qtab[ix].C; // C already shifted by (tbits)
Q += (B*x) >> (normbits-tbits);
Q += (((x*x)>>normbits) * A) >> (normbits-tbits); // A*x^2 term: square x, then scale by A
return Q;

On the DSP it looks a bit like this (not optimized, not scheduled and missing some details):

;   x0   z
;   schedule these moves elsewhere, fuse across >1 iteration
   move            y:qtab_ptr,a
   move            y:rshft12,y0
   mac   x0,y0,a         y:c_FFFFFE,y0   ; &qtab[(X>>12)]               
   and   y0,a               ; &qtab[(X>>13)*2]
   move   a,r4
;   schedule u,v part here, overlap x/y access if table overlaps low memory etc. etc.
   move            x:(r4),b   ; C
   mpy   x0,x0,a         y:(r4)+,y0   ; B
   mac   y0,x0,b      a,x1   y:(r4)+,y0   ; A
   mac   y0,x1,b               ;
   move    b,x0 ; 1/z

Now multiply x0 (1/z) by uz and vz and combine the results into a texture address. uz and vz should be pre-normalized.
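That last step can be sketched in floating-point C as follows. It is a minimal illustration, not dml's span code: it assumes the usual perspective-correct setup in which 1/z, u/z and v/z are interpolated linearly across the span, it assumes non-negative texture coordinates and a square power-of-two texture, and the names span_texel_offsets, inv_z, u_over_z and v_over_z are hypothetical. How these map onto dml's z, uz and vz is also an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Walk one horizontal span of `count` pixels and write a texel offset per pixel.
   inv_z (1/z), u_over_z and v_over_z vary linearly in screen space; u and v do not.
   The texture is assumed to be (1 << tex_bits) texels square. */
void span_texel_offsets(double inv_z0, double u_over_z0, double v_over_z0,
                        double inv_z1, double u_over_z1, double v_over_z1,
                        int count, int tex_bits, uint32_t *out)
{
    assert(count >= 2 && tex_bits > 0 && tex_bits <= 15);
    assert(inv_z0 > 0.0 && inv_z1 > 0.0);
    uint32_t mask = (1u << tex_bits) - 1u;
    double d_inv_z = (inv_z1 - inv_z0) / (count - 1);
    double d_uz    = (u_over_z1 - u_over_z0) / (count - 1);
    double d_vz    = (v_over_z1 - v_over_z0) / (count - 1);
    for (int i = 0; i < count; i++) {
        double inv_z = inv_z0 + d_inv_z * i;
        /* The quadratic table lookup would replace this divide at runtime. */
        double z = 1.0 / inv_z;
        uint32_t u = (uint32_t)((u_over_z0 + d_uz * i) * z) & mask;
        uint32_t v = (uint32_t)((v_over_z0 + d_vz * i) * z) & mask;
        out[i] = (v << tex_bits) | u;   /* row-major texel offset */
    }
}
```

For a span whose depth runs from z=1 to z=4 while u runs from 0 to 60, the middle pixel lands near u=12 rather than the u=30 a purely linear walk would give. That difference is why the per-pixel reciprocal matters.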

Author:  a31chris [ Wed Nov 12, 2014 3:16 am ]
Post subject:  Re: quake Engine on the Jaguar (Scrummy posts)

MikeFulton wrote:
a31chris wrote:
While I was referencing his name I found this interesting thread:

http://atariage.com/forums/topic/43179- ... ke-engine/

Interesting thing... that thread says the PSX has dedicated 3D hardware but the Jaguar didn't. That's actually not really true. The PSX does not actually have what most people would consider to be "dedicated 3D hardware". At least, not in a way that would distinguish it from Jaguar.

The main processor in the PSX is a MIPS R3000, which is a basic, general purpose RISC processor otherwise known for being used in early Silicon Graphics workstations. Sony added a co-processor chip (the "GTE") that implements matrix math functions. This is used for doing your basic 3D graphics transformations on polygon vertices but has nothing to do with the actual pixel pushing. By comparison, the Jaguar GPU has similar matrix math instructions, so the two machines are on fairly even ground at that point.

The rendering loop of a PSX game is basically this:

Mike if you are reading this thread, do you have any idea how on the PSX version of Quake II they sidestepped the complex geometry lighting issue? DML hazarded a guess:

I guess it's probably quite clever and PSX specific - maybe they subdivide the faces efficiently and vertex-light it (although it doesn't appear that way to me, I didn't notice mach banding) or maybe they use the hardware triangle engine to compose lightmaps with textures in VRAM on the fly.

Author:  MikeFulton [ Thu Nov 13, 2014 4:36 am ]
Post subject:  Re: quake Engine on the Jaguar (Scrummy posts)

I don't recall having had any interaction with the developers of Quake II, but it would have been in a fairly early stage of development when I left SCEA in April '98, if it had started at all.

Author:  a31chris [ Thu Nov 13, 2014 8:26 am ]
Post subject:  Re: quake Engine on the Jaguar (Scrummy posts)

Thanks Mike! Thought I'd run it by you and see if you had any insight into how they did it.

Anyways he has put up another video for everyone to enjoy:


Author:  Walter_j64bit [ Sun Nov 30, 2014 1:49 am ]
Post subject:  Re: quake Engine on the Jaguar (Scrummy posts)

WOW, Chris you have been busy with the idea of Quake being plausible on the Jag. 8)

Author:  a31chris [ Sun Jan 11, 2015 8:10 pm ]
Post subject:  Re: quake Engine on the Jaguar (Scrummy posts)

After experimenting with textures he has gone back to flat shaded poly work.

DML wrote:
FPS on 'fatal1' startpoint rose from 12 to over 16fps.

I'm going to keep hacking at this until I get completely stuck for ways to speed it up in TC mode, and then will switch back to texturing performance.

Which Quake II map is the 'fatal1' startpoint on?

Author:  a31chris [ Fri Jan 23, 2015 8:11 am ]
Post subject:  Re: quake Engine on the Jaguar (Scrummy posts)

DML wrote:
I compiled a new video last night, focused mainly on outdoor or sprawling maps / big geometry / angled, non-boxy scenery. Deep stress testing for recent optimization work. (I think my Falcon actually squeaked - so much violence after 15 years asleep)

There are no textures in this one - it is flat-fill @ 320x160 / 16bit TC. This is the format I use for performance testing the engine code.

Anyway I think it's starting to run up against some hardware limits at 16/32, or at least design limits for what I've done with the program. I'm sure there are still ways to optimize it, but it's getting harder and taking longer with each try. The last optimization I tested was nasty, complicated and didn't really make much difference in the end... 1-2%. So I'm going to finally stop with this and fix the newly added bugs before looking at textures again.

Trivia: ARMA5 is a map I used to play at lunchtimes while I was working on PC games. I had a 450MHz PII (P3?) at the time, with an early NV graphics card. It's quite heavy going but the old bird just about copes :)


Author:  a31chris [ Sun Feb 01, 2015 5:46 am ]
Post subject:  Re: quake Engine on the Jaguar (Scrummy posts)

Another interesting kick ass video is up.


Quick test demonstrating working transparency without a z-buffer, and z-clipping of transparencies with exaggerated nearplane.

Author:  a31chris [ Fri Feb 06, 2015 8:47 pm ]
Post subject:  Re: quake Engine on the Jaguar (Scrummy posts)

Here's his latest video with texturing on a Falcon with a 16 bit bus:

https://dl.dropboxusercontent.com/u/129 ... 160x80.avi


Author:  a31chris [ Tue Feb 17, 2015 7:15 pm ]
Post subject:  Re: quake Engine on the Jaguar (Scrummy posts)

Even after all that he's still finding more optimizations...

DML wrote:
After spending a bit of time with the texturing routines I had two minor breakthroughs which will probably result in better texture fillrate.

So I expect the following will likely become possible soon:

- 192x120 resolution with textures (fullscreen+overscan chunky mode on RGB)
- 16bit surfaces (a bit like BadMooD - better lighting and less fuzz. currently all texture and lighting is 8bit)
- colour lightmaps, like HW/OpenGL (software Q2 used monochrome lighting)

and then...
Another speedup is on the way. After fixing most of the correctness issues and getting a stable render at all z-distances, I tried dropping the span arithmetic from 48bit to 24bit effective, and got nearly the same result. So there will soon be a 6x reduction in the amount of code needed to set up each span, and a 50% reduction in data transmitted to DSP per face - which is nice.

This guy is amazing.

Author:  a31chris [ Tue Feb 24, 2015 11:09 pm ]
Post subject:  Re: quake Engine on the Jaguar (Scrummy posts)

Another even more impressive video is up and available:


Correlating original post: http://www.atari-forum.com/viewtopic.ph ... 75#p269069

Author:  a31chris [ Wed Feb 25, 2015 7:15 am ]
Post subject:  Re: quake Engine on the Jaguar (Scrummy posts)

Did you guys see the Egyptian motif in the last minute of that video? That was awesome! I love Egyptian stuff! So badass. :)

Author:  a31chris [ Wed Mar 18, 2015 8:26 pm ]
Post subject:  Re: quake Engine on the Jaguar (Scrummy posts)

DML's blog on Atari forum, where he is posting info on the optimization updates he has done since the last video, makes for impressive reading. He's finding more and more ways to speed things up.

I made a few more simple improvements that actually got rid of all of the padding nops and doubled the size of the jumptower. The impact of this on speed of complex scenery is actually quite good - it spends more time drawing pixels and less flyback time on very short spans.

For now it's partly a negative trade because it meant pushing some other code out of DSP fast memory - slowing other areas down - and causing more cache misses on the CPU side (bad!) but these can be bought back later with other changes. The fact that it is faster everywhere despite this is a good sign.

Imagine him or someone else doing these optimizations on the 32x or Jaguar where the caches are much larger and things won't get 'bumped out' so easily.

It really makes you wonder...

Original post:
http://www.atari-forum.com/viewtopic.ph ... 00#p270509

Author:  a31chris [ Wed May 13, 2015 5:45 am ]
Post subject:  Re: quake Engine on the Jaguar (Scrummy posts)


Vid #1 mainly covers continuity with big data. The camera is no longer stuck in one place carefully viewing the same stuff.

https://www.youtube.com/watch?v=LHsmzo0 ... e=youtu.be

Author:  a31chris [ Wed May 13, 2015 6:19 am ]
Post subject:  Re: quake Engine on the Jaguar (Scrummy posts)

Vid #2 covers breadth - different kinds of environment and complicated scenery.


Author:  a31chris [ Mon Jul 13, 2015 8:05 am ]
Post subject:  Re: quake Engine on the Jaguar (Scrummy posts)

DML wrote:
So the other thing I had been working on is an alternate way to perform square-root operations for realtime 3D.

These are very expensive to perform via the 882 FPU and even more so using algorithms on the CPU. Tables help, but they need a lot of space to make a real difference, which makes them useless on a DSP 56k with very little RAM. The excellent Carmack/3DLabs sqrt() trick - which exploits floating point bit representation - deserves a mention. But it requires an FPU and is therefore still expensive (and limiting) on a Falcon, and useless on the DSP.
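For readers who haven't met the floating-point trick mentioned above, here is a sketch of its widely circulated form. The magic constant is the one from the released Quake III Arena source; the function names fast_rsqrt and fast_sqrt and the memcpy-based bit reinterpretation are choices made for this illustration. It is not part of DML's DSP method, which he has not described.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) by reinterpreting the IEEE-754 bits of x as an integer.
   Halving and negating that integer roughly halves and negates the exponent;
   one Newton-Raphson step then refines the first guess. */
float fast_rsqrt(float x)
{
    assert(x > 0.0f);
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);          /* well-defined type pun */
    bits = 0x5f3759dfu - (bits >> 1);        /* initial guess */
    float y;
    memcpy(&y, &bits, sizeof y);
    return y * (1.5f - 0.5f * x * y * y);    /* one Newton-Raphson iteration */
}

/* sqrt(x) follows as x * (1/sqrt(x)). */
float fast_sqrt(float x)
{
    return x * fast_rsqrt(x);
}
```

The trick depends on having floating-point hardware for the refinement step, which is the limitation DML points out for the Falcon and its DSP.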

(I will point out that square-root is highly valuable for 3D graphics. Having access to a fast sqrt() makes a real difference to what is possible!)

So far I had been using a modified/improved bitwise algorithm on the DSP, both integer and fixedpoint versions. This works quite well but requires 23 iterations of a 5-instruction sequence. That's 23*5*2 = 230+ cycles (!!!). I tried translating other algorithms to DSP but this remained the general winner for speed/accuracy. There is a partial-table solution which should be faster but it didn't save much and consumed a lot more space and registers. In any case all methods tried are either so slow (or so inaccurate) that they have limited use.

But I didn't give up!

After some experiments I developed a solution which closely approximates a 23bit fixedpoint sqrt() in just 10 cycles.

A modified/compound version can also approximate 1.0 / sqrt(x) - albeit less accurately - which can then be used to normalize 3D vectors very very quickly. I wouldn't use this for important math (!) but I think it should suffice for most graphics uses.

The fun part - this method is continuous, accurate enough to replace other methods and fast enough to use per-pixel.

There is some other stuff going on which ties in with this, but it is early stages and I'm not close to describing it yet.

Below is a dump from random samples using this integer-only sqrt() approximation. Only results deviating by >= 0.01% from the expected value are reported. The dump indicates that accuracy decreases with small source values, which turns out to be ok for most common cases of sqrt() in graphics problems. It isn't too much of a surprise for integer-based formulas anyway: unlike floats, fewer bits are available for smaller numbers.

[x=realvalue]  [y=expected] [y=actual] [error >= 0.01%]
r:0.4277 ye:0.6540 ya:0.653809 e:0.02%
r:0.0586 ye:0.2421 ya:0.241943 e:0.06%
r:0.1701 ye:0.4124 ya:0.412109 e:0.08%
r:0.2395 ye:0.4893 ya:0.489258 e:0.02%
r:0.3707 ye:0.6088 ya:0.608398 e:0.07%
r:0.3865 ye:0.6217 ya:0.621826 e:0.02%
r:0.1175 ye:0.3427 ya:0.342529 e:0.06%
r:0.3985 ye:0.6313 ya:0.631104 e:0.03%
r:0.4556 ye:0.6750 ya:0.675293 e:0.04%
r:0.3674 ye:0.6061 ya:0.605957 e:0.03%
r:0.2812 ye:0.5302 ya:0.530029 e:0.04%
r:0.1479 ye:0.3846 ya:0.384521 e:0.01%
r:0.3074 ye:0.5544 ya:0.554199 e:0.04%
r:0.0485 ye:0.2203 ya:0.219971 e:0.16%
r:0.4198 ye:0.6479 ya:0.647705 e:0.03%
r:0.0109 ye:0.1045 ya:0.104248 e:0.19%
r:0.4058 ye:0.6370 ya:0.636475 e:0.08%
r:0.1180 ye:0.3434 ya:0.343262 e:0.05%
r:0.3377 ye:0.5812 ya:0.580811 e:0.06%
r:0.2195 ye:0.4685 ya:0.468262 e:0.05%
r:0.3707 ye:0.6088 ya:0.608398 e:0.07%
r:0.0187 ye:0.1369 ya:0.136719 e:0.12%
etot: 0.000649

Author:  a31chris [ Fri Jul 17, 2015 4:55 am ]
Post subject:  Re: quake Engine on the Jaguar (Scrummy posts)

DML wrote:
Maybe the DSP one can be improved but 5 ops and 23bit result was as near as I got for the traditional way. Note that it operates on 23bit fractions so in/out values are shifted by 1 bit, as is typical for DSP.
sqrt   macro      xysqr,xyroot,Txy
   move      y:<cy_point5,b
   tfr      b,a      #<0,xyroot   ;            : pattern-accumulator
   do      #<23,_loop
   lsr      b      a,Txy      ; shift   trial bit      : new trial pattern
   mpy      Txy,Txy,a         ; trial   (x*x)
   cmp      xysqr,a      xyroot,a   ; (x*x)>a?         : restore pattern-acc for update
   tle      Txy,a            ; condition update pattern-acc
   add      b,a      a,xyroot   ; combine bit         : save updated pattern-acc
_loop
   endm

Sorry about the tabs.

It is possible to get rid of the lsr shift but seemingly not the parallel move - so 5 ops it is for now. BTW unrolling it a bit can remove a few ops I think from the final iter but I didn't bother, kept it small. It takes forever anyway.
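The trial-bit idea behind the macro can be sketched in portable C as below. This is a plain reading of the classic bit-by-bit method with 23 iterations, not a line-for-line translation of the DSP code. The name sqrt_frac23 is hypothetical, and values are treated as unsigned fractions scaled by 2^23, which is an assumption about the number format.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 23

/* Bit-by-bit square root on unsigned fractions scaled by 2^23
   (value = n / 2^23, 0 <= value < 1). Each pass tries setting the next
   lower bit of the root and keeps it if the square does not overshoot. */
uint32_t sqrt_frac23(uint32_t x)
{
    assert(x < (1u << FRAC_BITS));
    uint64_t target = (uint64_t)x << FRAC_BITS;   /* x rescaled to match root*root */
    uint32_t root = 0;
    for (uint32_t bit = 1u << (FRAC_BITS - 1); bit != 0; bit >>= 1) {
        uint32_t trial = root | bit;              /* new trial pattern */
        if ((uint64_t)trial * trial <= target)    /* trial squared still <= x? */
            root = trial;                         /* keep the bit */
    }
    return root;
}
```

One result bit is settled per iteration, which is why the cost scales with the precision wanted and why 23 bits means 23 passes.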

Original Post

Author:  a31chris [ Fri Jul 17, 2015 5:02 am ]
Post subject:  Re: quake Engine on the Jaguar (Scrummy posts)

DML wrote:
I just got the DSP version of approximate sqrt() working, have tested it and I am now pretty certain that it will be accurate enough to replace the other one in most cases.

The body of the calculation is 8 cycles (4 ops) but there is a 12bit normalizing shift involved afterwards and the fastest I could do this was 10 cycles (5 ops - beating my previous impl for a 48bit dynamic shift by 2 ops). So the full arithmetic takes 18 cycles on DSP after all...

There is also some addressing setup code - which can be amortized into a loop (same as was done for the texturemapper) but standalone it's another bunch of cycles. So let's say the first pass on the DSP is 18 cycles best case, up to 30 worst case if just called once. More than the 10 cycles I had sketched out but I won't complain. Definitely better than 230 though.

For the texturemapper I was able to play with normalization of each term, at the expense of accuracy and removed nearly all of the shifting from the original version. Not sure I can do that here but it's only the first iteration. Maybe another day.

It's definitely nice to see my test running waaaaay faster with this upgrade.

Author:  a31chris [ Mon Oct 05, 2015 2:13 am ]
Post subject:  Re: quake Engine on the Jaguar (Scrummy posts)

I am going to give VladR a nod here. A few years ago he oddly became interested in working on the Jaguar. He was the head 3D researcher for Nokia or somesuch. He is a very talented 3D engine programmer.

The reason I say it's odd is that he seems to like working on 3D algorithms in higher level languages such as C and seems to have no real taste for going 'to the metal' as it were. Which, considering the relative state of the Jaguar's development tools, makes his choosing the Jaguar a rather odd one IMO.

That being said, while he seems to distance himself from going low level as much as possible, he seems to really get a thrill out of pushing 3D algorithms and finding faster ways to do things in 3D, on the Jaguar, in C. Which currently means working with the 68k mainly.

He seems to have made some breakthroughs and has some interesting ideas.

VladR wrote:
I finally reorganized my dev set-up, and connected jag (including skunk) to a small tv that fit along the wall next to my PC, so I can deploy the builds within 10 seconds to jag and test it right away.

I obviously tried the H.E.R.O. first and have been optimizing and refactoring it since. You guys didn't mention that my last public 30-fps build ran more like 24 fps on real HW.

But that's fixed now. I crossed the 30-fps about 3 weeks ago, then jumped to 45 fps about 2 weeks ago.

Getting over 55 fps was quite challenging, but I came up with a new line drawing algorithm that is incredibly fast compared to Bresenham (and others).

Yesterday I finally crossed the 60 fps barrier. On an actual jag. No GPU, No DSP, No ASM - just the 'slow' C compiled to bus-hogging 68k driving the Blitter/OP :-)

This code, when rewritten to GPU, should be able to handle something like 640x480 in an acceptable smooth framerate, I believe.


Above he talks about a new line drawing algorithm. If it is the breakthrough he claims, something like this may be useful on the Falcon030/Sega32x.
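VladR has not published his algorithm, so it can't be shown here. For reference, the Bresenham baseline he is comparing against looks roughly like this in C. The Point type, the bresenham_line name and the output-buffer interface are hypothetical choices for this sketch; a real renderer would write pixels directly.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int x, y; } Point;

/* Classic integer Bresenham line, all octants. Writes up to max_out points
   into out and returns the number of points on the line, which may exceed
   max_out if the buffer is too small. */
int bresenham_line(int x0, int y0, int x1, int y1, Point *out, int max_out)
{
    assert(out != NULL && max_out > 0);
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;                 /* running error term */
    int n = 0;
    for (;;) {
        if (n < max_out) { out[n].x = x0; out[n].y = y0; }
        n++;
        if (x0 == x1 && y0 == y1) break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }   /* step in x */
        if (e2 <= dx) { err += dx; y0 += sy; }   /* step in y */
    }
    return n;
}
```

The inner loop is only adds, compares and branches per pixel, so any faster method has to win by doing less per pixel or by handing whole runs to hardware such as the Blitter.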

It also brings up a question. On a system with relatively hardwired 3D hardware such as the PSX, I wonder if such a new/faster line drawing algorithm would benefit the PSX or if you're just stuck with whatever the 3D hardware may have built into it.

Another idea he had that he has started a blog on is procedural texturing. He wants to see what he can do in this regard to get Doom 3 style textures on the Jaguar. Once again these techniques would most likely be more useful on the 32x rather than the Sega Saturn.

http://vladr.blog.com/2015/05/31/doom-3 ... ri-jaguar/

With his love for 3D work and high level languages it's unfortunate that the GPU compiler tools and gpu manager developed by High Voltage Software are not currently available. Such a thing would give him everything he is dreaming of. High level access to the gpu and he wouldn't have to worry about overlays or the hardware bug that currently hampers the GCC for GPU. Everything would be transparent and automated for him.

But perhaps in the future they will be found. Or recreated.

Author:  Saturn [ Wed Oct 14, 2015 3:02 am ]
Post subject:  Re: quake Engine on the Jaguar (Scrummy posts)

Some great info in here. Just watched every video and was blown away by those graphics :shock:
Hope this becomes a release someday. In whatever form the last build is in. Demo or finished product. Some outstanding work :!:

Author:  a31chris [ Tue Jan 03, 2017 10:58 pm ]
Post subject:  Re: quake Engine on the Jaguar (Scrummy posts)

DML has changed directions a little bit. Not sure why.


All times are UTC [ DST ]