Jonathan N · 2023-04-04, 16:13

I've been working on an experimental renderer for LDraw files. The goal is to create a performant rendering engine targeting modern GPUs. The renderer is built on top of wgpu, which uses Vulkan, DX12, or Metal depending on the platform. These APIs are newer than OpenGL and enable access to more hardware features, and they're supported on most devices made in the last 10 years or so. The renderer should also work in the browser eventually, once WebGPU is released.

Framerates can easily be several times higher than in existing renderers in applications like LeoCAD, Blender, LDView, etc. When viewing Datsville with standard resolution primitives and no logos on studs, my desktop with an RTX 3060 only gets about 1 fps (1000+ ms per frame) in those applications. I get about 6-7 fps when fully zoomed out in ldr_wgpu. Framerates shoot up to 60-120 fps when viewing a more modest area of the scene.

While the newer graphics APIs have less CPU overhead, the biggest performance improvements come from only rendering what's actually visible each frame. This is why the framerate is highly variable as the camera moves around. ldr_wgpu performs culling at the object level. Frustum culling removes any objects that are outside the camera's viewable area. Occlusion culling removes any objects that are completely hidden behind other objects. While conceptually simple, occlusion culling is hard to implement efficiently. See the README in the GitHub repository linked below and the source code for more details. The culling is calculated in real time each frame and only requires precalculating bounding boxes and bounding spheres. When fully zoomed out, the entire scene is in view, so frustum culling no longer helps. Occlusion culling can often reduce the amount of rendered geometry by 2x-3x.
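To make the two object-level tests a bit more concrete, here's a minimal CPU-side sketch in Rust. It is not the ldr_wgpu code (see the README for how it actually does this); the `Plane`, `Sphere`, and `DepthTiles` types and the `box_frustum` helper are hypothetical names for illustration. Frustum culling keeps an object unless its bounding sphere lies entirely outside one of the six frustum planes. The occlusion test is a simplified version of the common hierarchical depth (Hi-Z) idea, assumed here rather than taken from the post: each coarse tile stores the farthest depth already drawn, and an object is hidden only if its nearest point is behind that depth in every tile its screen-space bounds cover. A real renderer would typically run tests like these on the GPU for every object.

```rust
/// A plane stored as dot(normal, p) + d = 0, with the normal pointing
/// into the frustum. (Hypothetical representation for this sketch.)
#[derive(Clone, Copy)]
struct Plane {
    normal: [f32; 3],
    d: f32,
}

/// A precomputed bounding sphere for one object.
#[derive(Clone, Copy)]
struct Sphere {
    center: [f32; 3],
    radius: f32,
}

fn dot(a: [f32; 3], b: [f32; 3]) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

/// Frustum culling: keep the object unless its bounding sphere lies
/// entirely on the outside of at least one plane.
fn sphere_in_frustum(sphere: Sphere, planes: &[Plane; 6]) -> bool {
    planes
        .iter()
        .all(|p| dot(p.normal, sphere.center) + p.d >= -sphere.radius)
}

/// Hypothetical test helper: an axis-aligned box "frustum" covering
/// -half..=half on every axis.
fn box_frustum(half: f32) -> [Plane; 6] {
    let axes: [[f32; 3]; 6] = [
        [1.0, 0.0, 0.0], [-1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0], [0.0, -1.0, 0.0],
        [0.0, 0.0, 1.0], [0.0, 0.0, -1.0],
    ];
    axes.map(|normal| Plane { normal, d: half })
}

/// One level of a Hi-Z style depth buffer: each tile stores the farthest
/// depth drawn in it (0.0 = near plane, 1.0 = far plane).
struct DepthTiles {
    width: usize,
    height: usize,
    max_depth: Vec<f32>,
}

impl DepthTiles {
    /// The object covers tiles x0..=x1, y0..=y1 (assumed already clamped to
    /// the buffer) and its nearest point has depth `min_depth`. It is
    /// occluded only if it is behind the farthest depth in every tile.
    fn is_occluded(&self, x0: usize, y0: usize, x1: usize, y1: usize, min_depth: f32) -> bool {
        debug_assert!(x1 < self.width && y1 < self.height);
        for y in y0..=y1 {
            for x in x0..=x1 {
                if self.max_depth[y * self.width + x] >= min_depth {
                    // Some pixel in this tile may be behind the object,
                    // so the object could be visible.
                    return false;
                }
            }
        }
        true
    }
}

fn main() {
    let planes = box_frustum(10.0);
    let near = Sphere { center: [0.0, 0.0, 0.0], radius: 1.0 };
    let far_away = Sphere { center: [20.0, 0.0, 0.0], radius: 1.0 };
    println!("near in frustum: {}", sphere_in_frustum(near, &planes));
    println!("far away in frustum: {}", sphere_in_frustum(far_away, &planes));

    // 2x2 tiles: three tiles already contain geometry at depth 0.5,
    // one tile is still empty (far plane).
    let tiles = DepthTiles { width: 2, height: 2, max_depth: vec![0.5, 0.5, 0.5, 1.0] };
    println!("occluded: {}", tiles.is_occluded(0, 0, 0, 0, 0.8));
}
```

Both tests are conservative: an object is only skipped when its bounding volume proves it can't contribute any pixels, which is why precomputed bounding spheres and boxes are all the culling needs.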

There have been a number of posts about level of detail for studs and primitives. This improves performance but greatly reduces the visual quality. Thankfully, this isn't the only way to reduce the amount of vertex processing the GPU needs to do. Indexed drawing can be faster than non-indexed drawing, but only if duplicate vertices are merged. Modern GPUs can cache the results of processing a vertex, so a vertex referenced by the same index several times may only need to go through the vertex shader once. Merging duplicate vertices cut the vertex processing time in half when profiling on my M1 MacBook Air.
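Here's a minimal Rust sketch of the merging step, turning a non-indexed triangle list into unique vertices plus an index buffer. The `deduplicate` function is a hypothetical name, and it assumes vertices are positions only and are merged when they are bit-for-bit identical (so `0.0` and `-0.0` stay separate). A real implementation may also need to include normals or colors in the key, or weld vertices within a small tolerance.

```rust
use std::collections::HashMap;

/// Converts a triangle list with duplicated vertices into unique vertices
/// and an index buffer. Positions are compared by their exact bit patterns.
fn deduplicate(vertices: &[[f32; 3]]) -> (Vec<[f32; 3]>, Vec<u32>) {
    let mut unique = Vec::new();
    let mut indices = Vec::with_capacity(vertices.len());
    // f32 isn't Hash/Eq, so key the map on the raw bits of each component.
    let mut lookup: HashMap<[u32; 3], u32> = HashMap::new();
    for v in vertices {
        let key = [v[0].to_bits(), v[1].to_bits(), v[2].to_bits()];
        let index = *lookup.entry(key).or_insert_with(|| {
            unique.push(*v);
            (unique.len() - 1) as u32
        });
        indices.push(index);
    }
    (unique, indices)
}

fn main() {
    // Two triangles forming a quad, stored without indices (6 vertices).
    let triangles: [[f32; 3]; 6] = [
        [0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
        [0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0],
    ];
    let (vertices, indices) = deduplicate(&triangles);
    println!("{} unique vertices, indices {:?}", vertices.len(), indices);
}
```

For this quad, the 6 input vertices collapse to 4 unique ones, and the shared edge's two vertices are each referenced twice by the index buffer. How often the GPU actually reuses a cached result depends on its vertex cache and on the order of the indices.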

The other main topic of discussion I've seen is BFC as a performance enhancement. Whether this helps is highly dependent on the situation. Backface culling happens after the vertices are processed by the vertex shader, so it only saves work in rasterization and pixel/fragment shading. If you have a very simple pixel/fragment shader, as is common in many CAD programs, there won't be much of a performance difference. I haven't been able to measure any difference in my testing since I'm currently just using flat shading. The processing time each frame was easily dominated by the vertex shader. I'll reassess the impact if I implement more complicated lighting and shading in the future.
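The reason backface culling can't save any vertex shader work is that facing is decided from the transformed positions: the triangle's winding order on screen is only known once the vertex shader has run. GPUs do this in fixed-function hardware, but the test itself is small. Here's a minimal Rust sketch, assuming clip-space positions with w > 0 (real GPUs clip against the near plane first) and counter-clockwise triangles as front facing; both are assumptions for this example, and `is_back_facing` is a hypothetical name.

```rust
/// Converts a clip-space position to normalized device coordinates (x, y).
/// Assumes w > 0, i.e. the vertex is in front of the camera.
fn to_ndc(clip: [f32; 4]) -> [f32; 2] {
    [clip[0] / clip[3], clip[1] / clip[3]]
}

/// Twice the signed area of a 2D triangle. Positive means the vertices
/// wind counter-clockwise in a y-up coordinate system.
fn signed_area2(a: [f32; 2], b: [f32; 2], c: [f32; 2]) -> f32 {
    (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
}

/// Returns true if the triangle would be culled. Zero-area triangles are
/// also treated as culled since they cover no pixels.
fn is_back_facing(clip: [[f32; 4]; 3]) -> bool {
    let [a, b, c] = clip.map(to_ndc);
    signed_area2(a, b, c) <= 0.0
}

fn main() {
    // The outputs of the vertex shader for one triangle.
    let ccw: [[f32; 4]; 3] = [
        [0.0, 0.0, 0.0, 1.0],
        [1.0, 0.0, 0.0, 1.0],
        [0.0, 1.0, 0.0, 1.0],
    ];
    let cw = [ccw[0], ccw[2], ccw[1]];
    println!("ccw culled: {}, cw culled: {}", is_back_facing(ccw), is_back_facing(cw));
}
```

All three vertices still have to be fully processed before this test can run, which matches the observation that BFC made no measurable difference when the frame time was dominated by the vertex shader.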

The code is designed to serve as a reference for people wanting to optimize their own renderers. I've done my best to comment any techniques and link to papers and articles where appropriate. I don't have any prebuilt executables at the moment, but you can download the code from GitHub and compile it yourself. This requires having the Rust toolchain installed. I eventually plan on making this into a library that people can use in their own Rust projects. There are still a number of features and improvements to address, like normals and more realistic shading. The GitHub repository will be updated periodically with new techniques and insights.

https://github.com/ScanMountGoat/ldr_wgpu
