Optimus, though unable to register here, contacted me on Dingoonity to react to my post.
We had a very long discussion and after a while, he agreed to post the source code to his 3DO demo :
https://bitbucket.org/Optimus6128/3dold
It can be seen in action here :
https://www.youtube.com/watch?v=Oj2LrdX-sMU
https://www.youtube.com/watch?v=o4LrTO62A5o
I'm still not that knowledgeable about the 3DO so my comments may not be accurate.
Optimus, bug me if you have an issue.
3D Graphics
I got confirmation that the 3DO is indeed quad-based.
However, he dislikes the term "Distorted sprites" that many people adopted when talking about the Sega Saturn's VDP chips and how they work.
In his own words :
Because at the end, even the triangles on PS1 are 2D interpolation in the final stage. It's just that quads fit better for sprites too, while you would need two triangles in PS1 to draw a quad for a 2d sprite.
As I suspected, the CEL engine (GPU) only takes care of the drawing : the transformations, distortions and rotations are done entirely on the main ARM60 CPU.
The 3DO reportedly has a math co-processor, but it's unclear whether the official libraries use it, or whether it even exists (the include headers don't discuss it at length).
He explains how drawing 3D graphics works :
So, anyway the whole process is to rotate/translate 3d points, and in a final state project it back to 2d coordinates on screen. This can be usually done on the CPU...
Now, the quad rasterizer of 3DO takes some vectors. When I have to draw sprite, either zooming/rotated or simple, I have different cases, where I need to calculate less for the vectors needed. So, it's faster to do simple sprites, than zoom than rotate, as I prepare and calculate much less and then prepare my cell quads with the sprite bitmap for rendering. In 3D there is more, I might waste time on CPU, I could invest better ways, but the whole thing is the CPU is used for 3d rotation/transformation, then projection on 2D, my geometry only uses quads instead of triangles, so I have 4 projected 2D points for each quad of the geometry, just have to calculate some vectors for the interpolation
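To give an idea of what "calculating some vectors" from 4 projected points can look like, here is a minimal sketch. It is not the code from his repo or from the official API: the struct and function names are hypothetical, and plain floats stand in for whatever fixed-point formats the hardware expects. It derives a start position, a step per source pixel, a step per source row, and a per-row change of the pixel step.

```c
/* Hypothetical types for illustration; this is not the 3DO CCB layout. */
typedef struct { float x, y; } Vec2;

typedef struct {
    Vec2 pos;  /* screen position of the bitmap's top-left pixel       */
    Vec2 hd;   /* screen step per source pixel along the first row     */
    Vec2 vd;   /* screen step of the row start per source row          */
    Vec2 hdd;  /* change applied to hd from one source row to the next */
} QuadVectors;

/* q[0..3]: top-left, top-right, bottom-right, bottom-left, in screen space.
   w, h: width and height of the source bitmap in pixels (both > 0). */
static QuadVectors quad_to_vectors(const Vec2 q[4], int w, int h)
{
    QuadVectors v;
    Vec2 hd_last;  /* pixel step measured along the bottom edge */

    v.pos = q[0];

    v.hd.x = (q[1].x - q[0].x) / (float)w;
    v.hd.y = (q[1].y - q[0].y) / (float)w;

    v.vd.x = (q[3].x - q[0].x) / (float)h;
    v.vd.y = (q[3].y - q[0].y) / (float)h;

    hd_last.x = (q[2].x - q[3].x) / (float)w;
    hd_last.y = (q[2].y - q[3].y) / (float)w;

    /* Spread the difference between top-edge and bottom-edge steps over h rows. */
    v.hdd.x = (hd_last.x - v.hd.x) / (float)h;
    v.hdd.y = (hd_last.y - v.hd.y) / (float)h;

    return v;
}
```

For an axis-aligned rectangle, hdd comes out as zero and only pos, hd and vd matter, which fits his remark that plain sprites need less preparation than zoomed or rotated ones.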
Because the CPU might be bottlenecked by expensive transformation/rotations, Optimus told me he had to resort to fixed-point.
in my demo I used fixed point math to be as fast as possible, because the 3DO is quite slow even for that.
...
So, the quad rasterization is all on hardware so it's fast enough, no software rendering needed, but to rotate/translate and prepare the cells each frame is on CPU much slower.
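For readers who have never used fixed-point math, here is a minimal sketch of the CPU-side work he describes: rotate a point, then project it to 2D, using 16.16 fixed point. The 16.16 format, the type and function names, and the use of a 64-bit intermediate are my assumptions for illustration; his demo may do all of this differently.

```c
#include <stdint.h>

typedef int32_t fixed;                 /* assumed 16.16 fixed point */
#define FIX_ONE 65536                  /* 1.0 in 16.16              */

typedef struct { fixed x, y, z; } Vec3;

static fixed fix_from_int(int v) { return (fixed)(v * FIX_ONE); }

/* Multiply two 16.16 values; the 64-bit intermediate avoids overflow. */
static fixed fix_mul(fixed a, fixed b)
{
    return (fixed)(((int64_t)a * b) / FIX_ONE);
}

/* Rotate p around the Y axis. s and c are sin and cos of the angle in
   16.16, typically read from a precomputed table. */
static Vec3 rotate_y(Vec3 p, fixed s, fixed c)
{
    Vec3 r;
    r.x = fix_mul(p.x, c) + fix_mul(p.z, s);
    r.y = p.y;
    r.z = fix_mul(p.z, c) - fix_mul(p.x, s);
    return r;
}

/* Perspective projection to integer screen coordinates.
   focal is in pixels, (cx, cy) is the screen centre.
   The caller must make sure p.z > 0 (point in front of the camera). */
static void project(Vec3 p, int focal, int cx, int cy, int *sx, int *sy)
{
    *sx = cx + (int)(((int64_t)p.x * focal) / p.z);
    *sy = cy - (int)(((int64_t)p.y * focal) / p.z);
}
```

Each quad needs its 4 corners run through this kind of code every frame, which is where the CPU time goes.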
I would invite everyone to look at his repo if they want to know how it is actually implemented.
I'm honestly surprised by how short it is.
3DO Doom
Then we talked about Doom and wondered how the framerate could be improved...
Optimus suggested some ways to improve it :
Doom is drawing the walls using cells, but drawing floors/ceilings with software rendering directly on the videobuffer...
3DO Doom is partially software rendered, as Rebecca Heineman didn't have enough time to make the ceilings/floors fully hardware rendered.
She did have the time to accelerate the walls though. (She even explained in her livestream how she did it, via ARM assembly.)
Also, even the walls are drawn as columns, but with many square cels with width 1 and height the height of the column.
I would expect the most optimal would be to find the rectangle that fits the linedef wall on screen coordinates and only render a single cell per linedef. But interestingly enough, even good ports like PS1 did it like that, many thin triangles as columns, instead of drawing two triangles for a wall surface.
According to Optimus, the walls are drawn very inefficiently using multiple CELs.
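To illustrate what drawing a wall as columns implies, here is a rough sketch that builds one descriptor per screen column for a single wall. This is not taken from the 3DO Doom source; the struct, the function and the centred-on-horizon layout are hypothetical. It only shows that the number of cels grows with the on-screen width of the wall, where a single quad would need one.

```c
/* Hypothetical per-column descriptor: one 1-pixel-wide cel each. */
typedef struct { int x, top, height; } WallColumn;

/* A wall covers screen columns x0..x1 (inclusive, x0 <= x1) and is h0
   pixels tall at x0 and h1 pixels tall at x1. cy is the horizon row.
   Fills at most max entries of out and returns how many were written. */
static int build_wall_columns(int x0, int h0, int x1, int h1, int cy,
                              WallColumn *out, int max)
{
    int n = 0;
    int span = x1 - x0;

    for (int x = x0; x <= x1 && n < max; x++) {
        /* The on-screen height of a flat wall varies linearly with x. */
        int h = (span == 0) ? h0 : h0 + (h1 - h0) * (x - x0) / span;

        out[n].x = x;
        out[n].top = cy - h / 2;
        out[n].height = h;
        n++;
    }
    return n;
}
```

A wall that spans 100 screen columns ends up as 100 separate entries to set up and submit.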
However, 3DO's own documentation seems to contradict this :
Huge Cels Draw Slowly
Q: I have a 16-bit, 320-x-400 cel. I'd like to spin it, zoom it, and scroll it over the screen, but when I use DrawCel() to display it, the screen update rate seems to be only 5 to 10 frames per second. If a cel is less than one third the size of the screen, updates seem to occur at 30 frames a second. Do you think I'm doing something wrong?
A: Nothing's really wrong; you're just trying to render a boatload of data. 320 * 400 * 16 bits/pixel == 256000 bytes that the cel engine must crunch through.
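The figure in that answer is easy to check. A tiny helper (the name is mine) that computes the source data size of a cel from its dimensions and bit depth:

```c
#include <stddef.h>

/* Bytes of source pixel data for a cel of the given size and depth. */
static size_t cel_bytes(size_t width, size_t height, size_t bits_per_pixel)
{
    return width * height * bits_per_pixel / 8;
}
```

With the numbers from the answer, cel_bytes(320, 400, 16) gives 256000, matching the documentation.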
Though it's possible I'm wrong, or that it does not matter anyway, because he admits he does not know Doom's true bottleneck :
But anyway, maybe that's not the bottleneck of Doom (I have to compile and profile). I was talking with Rebecca Heineman on youtube, what she thought needed to be done to optimize Doom, and she was talking about visibility and drawing order, and generally big changes to the flow and structure of Doom might have to be done for speed improvements. Most ports, even PS1 which used the GPU, still rendered many individual stretched columns for each wall surface.
So it seems that even the PS1 port may have been rushed... lol
A while later, he may have understood why the walls were drawn this way :
I was kinda thinking what I said about Doom, where you have a single wall and you could fit a single quad to rasterize it, but in 3DO they use many quads with width 1 to simulate column rendering like the original Doom (and even PS1 turns to be doing that, surprisingly, which seem wasteful to me). But I now think there would be technical problems doing it without columns. Maybe they can be overcome with more memory or tricks.
It's no wonder Heineman did not bother to optimize Doom's wall rendering when she had so little time to do it.
The thing is, a linedef in doom (it's the 2d line in the map that will be an individual wall) has offsets for U,V, which you can change, and many linedefs don't start from 0,0 and don't end at the end of the bitmap. It's very arbitrary. And from what I see from CELs, they stretch a whole bitmap and I didn't find a way to offset the U,Vs (I don't see such flags for this in the CEL structure. Which is a pity because there are other effects one could do, like envmapping, by offsetting the UVs, but it seems the 3DO will stay in pure texture mapped, not even gouraud shaded, just texture mapped with faded pals to darken a whole quad). I could think of some tricks but it seems it made sense to do column rendering with multiple quads.
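A small sketch of the U offset problem he describes. With column rendering, each screen column can pick its own texture column, so an arbitrary starting offset and wrap-around are easy. The function name is hypothetical and it assumes a power-of-two texture width; it is not from the Doom source.

```c
/* Texture column to use at a given distance along the wall.
   x_offset is the wall's horizontal texture offset, dist_along_wall is
   measured in texture pixels from the start of the wall.
   Assumes tex_width is a power of two so the wrap is a simple mask. */
static int texture_column(int x_offset, int dist_along_wall, int tex_width)
{
    return (x_offset + dist_along_wall) & (tex_width - 1);
}
```

A wall that starts at offset 48 in a 64-pixel-wide texture and runs for 40 pixels uses columns 48 to 63 and then wraps around to columns 0 to 23. A single cel that stretches the whole bitmap from its first column to its last cannot express that, whereas one cel per column can.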
I honestly thought the 3DO was able to do Gouraud shading but alas it can't.
Goodbye potential Mario 64 port.
I wish I could post everything he said but :
- His account did not get approved
- He posted a huge wall of text
He also mentioned how we can achieve faster software rendering (at 320x120) by drawing 2 pixels at a time with a 32-bit int, and the fact that the official 3DO API has functions for 3D and faster routines like fasterMapCels...
Then you could see what I am doing on the 3d engine code. The function fasterMapCels is something I found somewhere in the source code of the official API. I copied it. They had this and then another function which was slower. I don't understand everything from it yet. It takes an arbitrary 4 point Quad and creates the CEL vectors that are needed to be fed. For every quad (which will always change shape in a 3d rotating object) I have to call this every time, it must be a waste for many polygons. I wish I can find even faster ways.
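As for the software rendering trick mentioned earlier (2 pixels at a time with a 32-bit int), here is a minimal sketch of the general idea with 16-bit pixels. The function names are mine, and which half of the 32-bit word ends up as which pixel on the real 3DO framebuffer is not covered here.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack two 16-bit pixels into one 32-bit word (a in the high half). */
static uint32_t pack2(uint16_t a, uint16_t b)
{
    return ((uint32_t)a << 16) | b;
}

/* Fill pair_count pixel pairs with one colour, one 32-bit store per pair
   instead of two 16-bit stores. */
static void fill_pairs(uint32_t *dst, size_t pair_count, uint16_t color)
{
    uint32_t both = pack2(color, color);

    for (size_t i = 0; i < pair_count; i++) {
        dst[i] = both;
    }
}
```

The gain comes from halving the number of memory writes, at the cost of working on pixels in pairs.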