Hi Travis,
Bricksmith 'flattens' library parts (e.g. a 2x4 brick is a single geometry VBO and index VBO with a single pile of tris, lines and quads, even though it was built out of studs and boxes and tubes). Models are then drawn by making one draw call per part, with a transform set to position the part.
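To make the flattening step concrete, here is a minimal sketch of the idea, not Bricksmith's actual code. The `SubMesh` type, the `flatten` function and the translation-only `offset` are all assumptions made for illustration; a real part library would apply full transforms and also carry lines and quads.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class SubMesh:
    """Hypothetical sub-primitive (e.g. a stud) placed at an offset inside a part."""
    vertices: List[Vec3]
    indices: List[int]  # triangle indices into `vertices`
    offset: Vec3 = (0.0, 0.0, 0.0)


def flatten(submeshes):
    """Merge sub-primitives into one vertex list and one index list."""
    vertices = []
    indices = []
    for sub in submeshes:
        base = len(vertices)  # this sub-mesh's indices shift by this amount
        ox, oy, oz = sub.offset
        vertices.extend((x + ox, y + oy, z + oz) for x, y, z in sub.vertices)
        indices.extend(base + i for i in sub.indices)
    return vertices, indices


# Two copies of one triangle, the second shifted along x, become one mesh.
tri = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
flat_vertices, flat_indices = flatten([
    SubMesh(tri, [0, 1, 2]),
    SubMesh(tri, [0, 1, 2], offset=(2.0, 0.0, 0.0)),
])
```

The two flat lists are what would be uploaded once as the part's geometry VBO and index VBO. This sketch does not merge duplicate vertices, so it corresponds to the "unmerged" counts.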
The result is that index buffers and VBOs don't grow without bound. We definitely have parts with more than 65k vertices (too many for 16-bit indices) - the 48x48 baseplate is something like 400k vertices unmerged and about 200k vertices merged. But with 32-bit indices all the major cards will handle that without blowing up -- the days of 32-bit indices causing havoc are pretty much gone unless you're supporting Win95 users with Voodoo 2's. :-)
GLES supports 16-bit indices everywhere and 32-bit indices in ES 3.0 (in ES 2.0 only via the OES_element_index_uint extension) - the motivation for 16 is that most of the GLES driver stacks support small-data-size optimizations, and mobile devices are always bandwidth constrained. :-(
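One way to get the bandwidth benefit where it is available is to choose the index width per part. The sketch below is an assumption about how that could look, not something Bricksmith is known to do; `pack_indices` is a hypothetical helper, and the packed bytes would be uploaded and drawn with GL_UNSIGNED_SHORT or GL_UNSIGNED_INT respectively.

```python
import struct


def pack_indices(indices, vertex_count):
    """Pack indices as 16-bit when every vertex is addressable, else 32-bit."""
    # 16-bit indices can address vertices 0..65535, i.e. up to 65536 vertices.
    code = "H" if vertex_count <= 65536 else "I"
    # "<" selects little-endian with standard sizes: H = 2 bytes, I = 4 bytes.
    return struct.pack("<%d%s" % (len(indices), code), *indices)


small = pack_indices([0, 1, 2], vertex_count=3)          # 2 bytes per index
large = pack_indices([0, 1, 70000], vertex_count=70001)  # 4 bytes per index
```

A part like the 48x48 baseplate would land in the 32-bit path, while most small bricks would stay at 16 bits.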
We do a few things to cope with the large number of draw calls that one-draw-per-part and one-VBO-per-library-part induce:
- We draw by part type, to cut down on VBO/pointer binds - changing VBOs is a big CPU cost!
- We use hardware instancing to draw more than one brick per draw call where supported in hardware.
- We actually pass the transform matrix as immediate-mode attributes instead of uniforms (uniforms are what the native glRotate would give you). On Windows YMMV, but on OS X replacing uniforms with attributes can get you about a 2x win if (1) you don't have too many attributes (e.g. only do this for the transform, not everything) and (2) you change the transform matrix with every draw call.
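The first two points can be sketched on the CPU side as follows. This is a rough illustration under assumptions, not Bricksmith's implementation: `build_batches`, the `max_instances` parameter and the part names are hypothetical, and each returned batch stands for one VBO bind plus one (instanced) draw call.

```python
from collections import defaultdict


def build_batches(placements, max_instances=None):
    """Group (part_name, transform) pairs so each part's VBO is bound once."""
    by_part = defaultdict(list)
    for part_name, transform in placements:
        by_part[part_name].append(transform)

    batches = []
    for part_name in sorted(by_part):
        transforms = by_part[part_name]
        # No limit means all instances of a part go into a single draw.
        step = max_instances or len(transforms)
        for start in range(0, len(transforms), step):
            batches.append((part_name, transforms[start:start + step]))
    return batches


# Placeholder 4x4 transform and made-up part names, for illustration only.
IDENTITY = (1.0, 0.0, 0.0, 0.0,
            0.0, 1.0, 0.0, 0.0,
            0.0, 0.0, 1.0, 0.0,
            0.0, 0.0, 0.0, 1.0)
placements = [("brick_2x4", IDENTITY), ("brick_2x2", IDENTITY), ("brick_2x4", IDENTITY)]

instanced = build_batches(placements)                  # one draw per part type
fallback = build_batches(placements, max_instances=1)  # one draw per brick, still sorted
```

With `max_instances=1` this degrades to one draw per brick, which is the case where passing the transform as attributes rather than uniforms would matter most, since the matrix changes on every draw.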
cheers
BEn