Parsing LDraw


#1
*sigh*

Okay, I've probably been at it many times and probably even asked some questions on this topic before, but, alas, I'm at it again: trying to build an LDraw file format parser in C#. Why C#? Tl;dr: I need it for other stuff, and that stuff uses C#.

Well, the question itself: How does one even go about parsing LDraw effectively? This topic is a bit confusing to me due to the recursiveness (I guess that's what the word would be) of the format. My current approach is like this:
1. Read through all of the file, store all the faces in an array, store all the submodels in a separate array
2. Go through the submodel array, go to step 1 for each of them
3. Once the process is out of this recursive mess, "resolve" each submodel by applying the transform to its triangles and adding them to the submodel's parent
4. Done, I guess?
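
Step 1 of the list above could look roughly like the following. This is only a minimal sketch in Python rather than C#; the function name `read_faces_and_refs` is made up for illustration, quads are split into two triangles, and colours are kept as raw text. It relies only on the LDraw line types: 1 is a subfile reference, 3 a triangle, 4 a quad.

```python
def read_faces_and_refs(text):
    """Collect the faces and the subfile references found in one LDraw file."""
    triangles = []   # (colour, (p1, p2, p3)); colour is kept as the raw token
    references = []  # unparsed type 1 lines, to be loaded in a later step
    for raw in text.splitlines():
        tokens = raw.split()
        if not tokens:
            continue  # blank line
        kind = tokens[0]
        if kind == "1":
            references.append(raw.strip())
        elif kind in ("3", "4"):
            count = int(kind)  # a type 3 line has 3 points, a type 4 line has 4
            values = [float(t) for t in tokens[2:2 + 3 * count]]
            if len(values) != 3 * count:
                continue  # malformed line: skipped in this sketch
            pts = [tuple(values[i:i + 3]) for i in range(0, 3 * count, 3)]
            triangles.append((tokens[1], (pts[0], pts[1], pts[2])))
            if count == 4:
                # Second half of the quad, sharing the p1-p3 diagonal.
                triangles.append((tokens[1], (pts[0], pts[2], pts[3])))
        # Types 0 (comments/meta), 2 (lines) and 5 (optional lines) are ignored here.
    return triangles, references
```

Steps 2 and 3 would then load each entry of `references` the same way and apply its transform.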

It is very simple and doesn't account for, like, any meta or other stuff that could be important, but it should be enough for a start. I'm just not sure if this approach is good, or how it could possibly be improved.

Also, I know I can read stuff like the LDView source code, but I'm really bad at reading code written by other people, so it would probably take way more time than just asking here.
RE: Parsing LDraw
#2
In due time you will likely get an answer from Niels Schmidt or Travis Cobbs, but here's my take:

The file format is modular, not recursive. Recursive would mean that a file's reference chain (its callpath) contains that same file again. The file format implies branching, and a naive implementation may end up reading the same file hundreds of times. Is your goal to minimize the number of times a file is read, or the number of open file handles?

Before thinking about populating a data structure most efficiently, you need to define the data structure you will store the contents to. Here, I will assume you want to aggregate all quads, lines, and triangles into one big collection and kind of 'inline' all the type 1 objects. I also assume you have a single-threaded application.

In any case, I would create a temporary object structure that mimics the hierarchy of the files and file references so that each file has only one object, but possibly more than one reference path. This minimizes the number of times you need to read one file. Such a data structure could also keep track of the reference path, i.e. the path from the root object to the current object. If the file being read contains a reference that is already in its callpath then you have detected an infinite loop.
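
One way to sketch this idea (in Python, not C#) is shown below. Everything here is an assumption for illustration: `load_all`, `referenced_files` and the `read_text` callback are made-up names, file names are lower-cased on the assumption that they should compare case-insensitively, and only type 1 lines are looked at.

```python
def referenced_files(text):
    """Return the file names used by the type 1 lines of one file."""
    names = []
    for raw in text.splitlines():
        # A type 1 line has 14 leading fields; the rest is the file name,
        # which may itself contain spaces.
        tokens = raw.strip().split(None, 14)
        if len(tokens) == 15 and tokens[0] == "1":
            names.append(tokens[14].lower())
    return names


def load_all(root, read_text):
    """Read every reachable file once; raise ValueError on a reference loop.

    read_text is a caller-supplied function mapping a file name to its text.
    """
    loaded = {}  # file name -> names it references (one entry per file)

    def visit(name, path):
        if name in path:
            raise ValueError("reference loop: " + " -> ".join(path + (name,)))
        if name in loaded:
            return  # already read via another reference path
        loaded[name] = referenced_files(read_text(name))
        for child in loaded[name]:
            visit(child, path + (name,))

    visit(root.lower(), ())
    return loaded
```

The `path` tuple is the callpath from the root to the current file, so a file that shows up in its own callpath is reported as a loop, while a file reached through a second, different path is simply reused.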

Once you know the tree is loopless, you can read each file once and copy the lines, triangles, and quads into temporary arrays. Once the temporary structure has been fully populated, you can just collapse it to a single collection by applying the transforms as you mentioned. You may also wish to keep track of the winding, so keep a lookout for "BFC INVERTNEXT" and BFC CCW/CW/NOCERTIFY.
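
The collapsing step might be sketched like this in Python. The layout of `models` (a dict of already-parsed files, each with `"triangles"` and `"refs"`) is invented for the example, and the winding handling is deliberately simplified: it reverses the vertex order when the number of INVERTNEXT flags on the path is odd or the accumulated matrix mirrors the geometry, but not both, and it ignores CW/CCW and NOCERTIFY entirely. It also assumes the tree has already been checked for loops.

```python
IDENTITY = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0))


def apply(transform, point):
    """Apply a type 1 style transform (position x y z, matrix a..i) to a point."""
    (x, y, z), (a, b, c, d, e, f, g, h, i) = transform
    u, v, w = point
    return (a * u + b * v + c * w + x,
            d * u + e * v + f * w + y,
            g * u + h * v + i * w + z)


def compose(parent, child):
    """Return the transform equal to applying child first, then parent."""
    pm, (cpos, cm) = parent[1], child
    rot = tuple(sum(pm[3 * r + k] * cm[3 * k + col] for k in range(3))
                for r in range(3) for col in range(3))
    return (apply(parent, cpos), rot)


def determinant(m):
    a, b, c, d, e, f, g, h, i = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def flatten(models, name, transform=IDENTITY, invert=False, out=None):
    """Collect every triangle under `name` in root coordinates.

    models[name] = {"triangles": [(p1, p2, p3), ...],
                    "refs": [(transform, child_name, invert_next), ...]}
    """
    if out is None:
        out = []
    # A mirroring matrix (negative determinant) flips the winding, and so
    # does an odd number of INVERTNEXT flags; two flips cancel out.
    flip = invert != (determinant(transform[1]) < 0)
    for tri in models[name]["triangles"]:
        pts = [apply(transform, p) for p in tri]
        out.append(tuple(reversed(pts)) if flip else tuple(pts))
    for child_transform, child_name, invert_next in models[name]["refs"]:
        flatten(models, child_name, compose(transform, child_transform),
                invert != invert_next, out)
    return out
```

Calling `flatten(models, "main.ldr")` on such a dict returns one flat list of triangles, which is the single big collection described above.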
RE: Parsing LDraw
#3
LDView is written in C++, and the current parser was written over 20 years ago (before "modern" C++), so anybody trying to read its code might have issues. However, its parser works like this at a high level:

  1. The top level is a LDLMainModel object that contains information that is only needed at the top level. This is a subclass of the LDLModel object.
  2. The LDLModel object represents a single LDraw file. It basically contains a list of LDLFileLine objects.
  3. LDLFileLine is the base class for the various different LDraw line types. In the case of LDView, these are LDLCommentLine, LDLEmptyLine, LDLUnknownLine, LDLModelLine, LDLLineLine, LDLConditionalLineLine, LDLQuadLine, and LDLTriangleLine. (There are some other classes in between to group various ones.)
  4. Each "line" object knows how to parse the text in the corresponding line of the LDraw file, and extract the information. So, for example, the LDLModelLine scans in all the parameters in a type 1 line in an LDraw file.
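
To illustrate item 4, here is a rough Python sketch of what scanning a type 1 line involves. It is not LDView's code: `SubfileReference` and `parse_type1` are hypothetical names, and the colour is left as text rather than interpreted.

```python
from dataclasses import dataclass


@dataclass
class SubfileReference:
    colour: str      # raw colour token, left uninterpreted in this sketch
    position: tuple  # (x, y, z)
    matrix: tuple    # (a, b, c, d, e, f, g, h, i), row by row
    file_name: str


def parse_type1(line):
    """Parse '1 <colour> x y z a b c d e f g h i <file>' into a SubfileReference."""
    # Split off the 14 leading fields only, so a file name with spaces survives.
    tokens = line.strip().split(None, 14)
    if len(tokens) != 15 or tokens[0] != "1":
        raise ValueError("not a type 1 line: " + line)
    numbers = [float(t) for t in tokens[2:14]]
    return SubfileReference(tokens[1], tuple(numbers[:3]),
                            tuple(numbers[3:]), tokens[14])
```

For example, `parse_type1("1 4 0 -24 0 1 0 0 0 1 0 0 0 1 3001.dat")` gives a reference to `3001.dat` placed at (0, -24, 0) with an identity matrix.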

When a user opens a file, the LDLMainModel is asked to read it, and it first reads the whole thing in, creating its array of file lines. After that is done, it then goes through the array and asks each one to parse the data. For most line types, that just means loading the data in. For example, the triangle line scans in the three points. For the model line type, it does that, but then for the model itself it populates an LDLModel object. First, it checks the loaded models dictionary to see if that model has already been loaded. If it has, it just bumps up the reference counter and hooks onto it. If not, it loads the file (using the same code as from the main file), which then repeats the process above. (Oh, it first checks to make sure there isn't an infinite loop, like a->b->c->a. If there's an infinite loop, it generates an error.)

LDView splits the logical loading (above) from the data needed to render the model. All of the above loads the LDraw file into memory, but it's not in a format that is suitable for rendering. It then walks through the loaded data and transfers it into structures more suitable for rendering. So, for example, each model will then have a bunch of triangles all grouped together in one contiguous array that's in the proper format for rendering. And, like the loader above has LDLModelLine and LDLModel (deduped), the renderer has TRESubModel and TREModel (deduped). The TRESubModel contains information about the transformation, while TREModel contains the rest. And if you have an LDraw file with 500 instances of 3001.dat, there will be 500 3001-specific TRESubModel instances, but only one 3001-specific TREModel instance.
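
The split between per-instance and shared data can be pictured with a small Python sketch. The classes below are stand-ins invented for this example, not LDView's TRESubModel and TREModel; they only mirror the roles described above (transformation per instance, everything else shared).

```python
from dataclasses import dataclass, field


@dataclass
class SharedModel:
    """Geometry stored once per file (the role the post gives to TREModel)."""
    name: str
    triangles: list = field(default_factory=list)


@dataclass
class Instance:
    """One placement of a model (the role the post gives to TRESubModel)."""
    transform: tuple
    model: SharedModel


def build_instances(placements):
    """placements: iterable of (file_name, transform) pairs."""
    shared = {}     # file name -> the single SharedModel for that file
    instances = []
    for name, transform in placements:
        if name not in shared:
            shared[name] = SharedModel(name)
        instances.append(Instance(transform, shared[name]))
    return instances, shared
```

With 500 placements of `3001.dat`, this yields 500 `Instance` objects that all point at the same `SharedModel`.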

There are many ways to write a program, and I wrote this over two decades ago, so I'm sure if I were writing it today it would be very different. So the above is simply meant to be an overview of how LDView works. Feel free to make use of the information any way you see fit (including ignoring it all).