ldr_wgpu high performance realtime LDraw renderer
#1
I've been working on an experimental renderer for LDraw files. The goal is to create a performant rendering engine targeting modern GPUs. The renderer is built on top of wgpu, which uses Vulkan, DX12, or Metal depending on the platform. These APIs are newer than OpenGL and expose more hardware features, and they're supported by most devices made in the last 10 years or so. The renderer should also work in the browser eventually once WebGPU is released.


   


The framerates can easily be several times higher than those of existing renderers in applications like LeoCAD, Blender, LDView, etc. When viewing Datsville with standard resolution primitives and no logos on studs, my desktop with an RTX 3060 only gets about 1 fps (1000+ ms per frame) in existing applications. In ldr_wgpu I get about 6-7 fps fully zoomed out, and framerates shoot up to 60-120 fps when viewing a more modest area of the scene.


While the newer graphics APIs have less CPU overhead, the biggest performance improvements come from only rendering what's actually visible each frame. This is why the framerate is highly variable as the camera moves around. ldr_wgpu performs culling at the object level. Frustum culling skips any objects outside the camera's viewable area, and occlusion culling skips any objects completely obscured by other objects. While conceptually simple, occlusion culling is hard to implement efficiently; see the README in the github repository linked below and the source code for more details. The culling is calculated each frame in real time and only requires precalculating bounding boxes and bounding spheres. When fully zoomed out, frustum culling no longer helps, but occlusion culling can often reduce the amount of rendered geometry by 2x-3x.
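As a rough sketch of the frustum half of the test (not the actual code from the repository), each object's precomputed bounding sphere is checked against the six planes of the camera frustum. The Plane and BoundingSphere types and the plane extraction are my own assumptions here, and the same per-object math maps naturally onto one compute shader thread per object.

Code:
// Hypothetical sketch; the six planes are assumed to be extracted from the
// view-projection matrix elsewhere, with normals pointing into the frustum.
struct Plane {
    normal: [f32; 3],
    d: f32,
}

struct BoundingSphere {
    center: [f32; 3],
    radius: f32,
}

fn dot(a: [f32; 3], b: [f32; 3]) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

// Returns false if the sphere lies entirely outside any frustum plane,
// in which case the object can be skipped for this frame.
fn is_visible(sphere: &BoundingSphere, frustum: &[Plane; 6]) -> bool {
    frustum
        .iter()
        .all(|plane| dot(plane.normal, sphere.center) + plane.d >= -sphere.radius)
}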


There have been a number of posts about using lower levels of detail for studs and primitives. This improves performance but greatly reduces visual quality. Thankfully, it isn't the only way to reduce the amount of vertex processing the GPU has to do. Indexed drawing can be faster than non-indexed drawing, but only if duplicate vertices are merged. Modern GPUs cache the vertex shader output for recently used indices, so shared vertices only need to be processed once. Merging duplicates cut the vertex processing time in half when profiling on my M1 MacBook Air.
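As a simplified sketch of that merging step (again, not the repository's actual code), duplicate vertices can be collapsed by hashing each vertex while building the index buffer. Only bitwise-identical positions merge here; a real implementation would also include normals, colors, and any welding tolerance in the key.

Code:
use std::collections::HashMap;

// Hypothetical sketch: merge duplicate positions and build an index buffer.
fn index_vertices(positions: &[[f32; 3]]) -> (Vec<[f32; 3]>, Vec<u32>) {
    let mut unique = Vec::new();
    let mut indices = Vec::with_capacity(positions.len());
    // Hash the exact bit patterns since f32 itself doesn't implement Hash.
    let mut seen: HashMap<[u32; 3], u32> = HashMap::new();

    for p in positions {
        let key = [p[0].to_bits(), p[1].to_bits(), p[2].to_bits()];
        let index = *seen.entry(key).or_insert_with(|| {
            unique.push(*p);
            (unique.len() - 1) as u32
        });
        indices.push(index);
    }
    (unique, indices)
}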


The other main topic of discussion I've seen is BFC as a performance enhancement. Whether this helps or not is highly dependent on the situation. Backface culling happens after the vertices are processed by the vertex shader. If you have a very simple pixel/fragment shader as is common in many CAD programs, there won't be much of a performance difference. I haven't been able to measure any performance difference in my testing since I'm currently just using flat shading. The processing time each frame was easily dominated by the vertex shader. I'll reassess the impact if I implement more complicated lighting and shading in the future.
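For anyone who wants to test this themselves in wgpu, backface culling is just a flag on the render pipeline, so it's cheap to toggle for benchmarking. This is only a sketch of the relevant pipeline field and assumes the geometry's winding has already been normalized to counter-clockwise.

Code:
// Sketch of the relevant wgpu pipeline state; all other pipeline fields
// are elided. Requires the wgpu crate.
fn primitive_state(enable_bfc: bool) -> wgpu::PrimitiveState {
    wgpu::PrimitiveState {
        topology: wgpu::PrimitiveTopology::TriangleList,
        front_face: wgpu::FrontFace::Ccw,
        // Pass false to disable backface culling when comparing performance.
        cull_mode: enable_bfc.then_some(wgpu::Face::Back),
        ..Default::default()
    }
}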


The code is designed to serve as a reference for people wanting to optimize their own renderers. I've done my best to comment the techniques used and link to papers and articles where appropriate. I don't have any prebuilt executables at the moment, but you can download the code from github and compile it yourself. This requires having the Rust language toolchain installed. I plan on eventually making this into a library that people can use for their own Rust projects. There are still a number of features and improvements to address, like normals and more realistic shading. The github repository will be updated periodically with new techniques and insights.

https://github.com/ScanMountGoat/ldr_wgpu

   
RE: ldr_wgpu high performance realtime LDraw renderer
#2
Looks very interesting! Is there some binary available, ready to be tried? - Though my ageing hardware probably won't benefit from it Wink
RE: ldr_wgpu high performance realtime LDraw renderer
#3
(2023-04-05, 7:31)Philippe Hurbain Wrote: Looks very interesting! Is there some binary available, ready to be tried? - Though my ageing hardware probably won't benefit from it Wink

If I got it correctly, only the code is available, so you have to build it yourself.
RE: ldr_wgpu high performance realtime LDraw renderer
#4
(2023-04-05, 7:43)Max Murtazin Wrote: If I got it correctly, only the code is available, so you have to build it yourself.

Yeah. Rust is one of the more straightforward languages to build. I've updated the readme with build instructions. You'll also need the C++ build tools from Visual Studio if you're on Windows. The goal is to eventually create a rendering library and then use it to make a simple viewing application that people can download. The current implementation is still very much a proof of concept, so I'm not providing any precompiled binaries yet.

Out of curiosity, what GPU does your "ageing" machine have? From my experience, the first generation of supported hardware tends to not work well if at all. Intel Haswell CPUs were released around 2013 and apparently support Vulkan, but I've gotten numerous reports from users on other applications that it doesn't actually work. If you have an older dedicated GPU from AMD or Nvidia, the driver support tends to be better since a lot of games use DX12.
RE: ldr_wgpu high performance realtime LDraw renderer
#5
(2023-04-05, 14:26)Jonathan N Wrote: Out of curiosity, what GPU does your "ageing" machine have? From my experience, the first generation of supported hardware tends to not work well if at all. Intel Haswell CPUs were released around 2013 and apparently support Vulkan, but I've gotten numerous reports from users on other applications that it doesn't actually work. If you have an older dedicated GPU from AMD or Nvidia, the driver support tends to be better since a lot of games use DX12.
My machine is equipped with a GeForce GTS240 board. That's 2009 hardware Blush. When I say ageing...
RE: ldr_wgpu high performance realtime LDraw renderer
#6
(2023-04-05, 15:48)Philippe Hurbain Wrote: My machine is equipped with a GeForce GTS240 board. That's 2009 hardware Blush. When I say ageing...

That is indeed quite old. It's great that it still runs Smile. Due to hardware limitations, I wouldn't expect ldr_wgpu to run much faster than existing programs on your GPU even if I could get it working.

For those curious, an OpenGL 3.3 compatible way to do occlusion culling is with occlusion queries. This tells the GPU to draw an object (usually a simplified version like a bounding box), check if it's occluded, and then tell the CPU if it's occluded or not. Issuing all those queries and waiting to hear back can take a while. With lots of small objects like in Lego models, this would add a lot of overhead. Some game engines wait some number of frames to check the queries to improve performance, but this can create flickering if the camera moves too quickly. You can also "bake" the occlusion checks by removing hidden geometry like some scripts have done, but this won't occlude as much geometry and takes a while to calculate.
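For reference, the query flow looks roughly like this using the gl crate's raw OpenGL bindings. This is only an illustration and not code from any of the programs mentioned; draw_bounding_box is a placeholder, and a GL context is assumed to be current.

Code:
// Illustration only: an OpenGL occlusion query via the gl crate.
// Assumes gl::load_with(...) has been called and a context is current.
unsafe fn bounding_box_visible(draw_bounding_box: impl Fn()) -> bool {
    let mut query = 0u32;
    gl::GenQueries(1, &mut query);

    // Draw the cheap proxy geometry with depth testing on and color writes off.
    gl::BeginQuery(gl::ANY_SAMPLES_PASSED, query);
    draw_bounding_box();
    gl::EndQuery(gl::ANY_SAMPLES_PASSED);

    // Reading the result right away stalls the CPU until the GPU catches up,
    // which is the overhead described above. Engines often defer this a frame.
    let mut any_samples = 0i32;
    gl::GetQueryObjectiv(query, gl::QUERY_RESULT, &mut any_samples);
    gl::DeleteQueries(1, &query);
    any_samples != 0
}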

By "modern" GPUs I mean GPUs with more general purpose computing capabilities like compute shaders. GPUs used to just do rendering. Now you can schedule your own computations to run on the GPU. With compute shaders, the GPU can perform a similar check just using some math operations and run this for many objects in parallel. This scales very well to large scenes with lots of objects and doesn't require the CPU to wait for anything to complete.
RE: ldr_wgpu high performance realtime LDraw renderer
#7
   

I've made some more improvements to rendering performance and loading times. I've also implemented normal calculation using an angle threshold as well as edge splitting, similar to what I do for the Blender addon ldr_tools_blender. The edge splitting is in its own module in the source code with associated unit tests for reference. The approach is inspired by Blender's but implemented slightly differently.

These are the results from loading Datsville. The time includes creating all the GPU buffers and shaders needed for rendering.
Vertices: 6368359, Indices: 12217668
Load 392515 parts, 3094 unique colored parts, and 1015 unique parts: 1.5907709s

For a high level description of the design decisions, along with names and links for the papers referenced, see the document linked below in the github repo. It also describes how I represent an LDraw scene to better utilize multithreading and reduce memory usage.

https://github.com/ScanMountGoat/ldr_wgp...TECTURE.md
RE: ldr_wgpu high performance realtime LDraw renderer
#8
(2023-05-29, 19:39)Jonathan N Wrote: I've made some more improvements to rendering performance and loading times. I've also implemented normal calculation using an angle threshold as well as edge splitting, similar to what I do for the Blender addon ldr_tools_blender. The edge splitting is in its own module in the source code with associated unit tests for reference. The approach is inspired by Blender's but implemented slightly differently.

Using the angle threshold method is not the best thing to do. LDraw models are relatively low-poly, and an angle threshold is more likely to lead to unnecessary edge splitting.

I'd suggest splitting edges either by the absence of a condline or by the presence of a line. I believe the former is what LDView does.
RE: ldr_wgpu high performance realtime LDraw renderer
#9
(2023-05-30, 5:06)Max Murtazin Wrote: Using the angle threshold method is not the best thing to do. LDraw models are relatively low-poly, and an angle threshold is more likely to lead to unnecessary edge splitting.

I'd suggest splitting edges either by the absence of a condline or by the presence of a line. I believe the former is what LDView does.

The angle threshold for smoothing is very large (< 90 degrees) and doesn't do much in practice, so I may just end up removing it. I split all line type 2 edges. I've found defining the set of "sharp" edges to be easier to work with when triangulating the faces. I haven't checked if the absence of an optional line always indicates the presence of a line for all LDraw part files.
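To illustrate what I mean by a set of sharp edges (a simplified sketch, not the code from the repository), the line type 2 edges go into a set keyed on their endpoints, and normals are only smoothed across edges that aren't in that set.

Code:
use std::collections::HashSet;

// Order the endpoints so (a, b) and (b, a) map to the same key.
fn edge_key(a: u32, b: u32) -> (u32, u32) {
    if a < b { (a, b) } else { (b, a) }
}

// Collect the "sharp" edge set from line type 2 commands, assuming each
// line's endpoints have already been resolved to vertex indices.
fn sharp_edges(type2_lines: &[(u32, u32)]) -> HashSet<(u32, u32)> {
    type2_lines.iter().map(|&(a, b)| edge_key(a, b)).collect()
}

// Face normals are only averaged across an edge if it isn't marked sharp.
fn should_smooth(a: u32, b: u32, sharp: &HashSet<(u32, u32)>) -> bool {
    !sharp.contains(&edge_key(a, b))
}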