LDraw.org Discussion Forums

Full Version: Leading or trailing white space characters in file names
Pages: 1 2 3 4 5 6
Thx Roland. This is far more than I expected.

Roland Melkert Wrote:An LDraw file consists of single lines separated by DOS format line breaks (#13#10).

This is already laid out here:

http://www.ldraw.org/Article512.html#termination

So we have to decide where it belongs.

w.
Travis, Chris, Allen, any comments?

w.
Sorry. When I first saw his post, I was just skimming through new posts, and saw how long it was and put it off to look at later. Then, I forgot to come back later.

I don't see any value in the added paragraph regarding comments intended to be inserted just before the Meta segment. Maybe I'm missing something, but I just don't see how it adds to the document. I'm not dead-set against adding the text, but at the moment, I don't see how it helps.

I believe that the line termination should be part of the official LDraw spec. However, given the results of Jim DeVona's experiment from two years ago, I think perhaps the official spec should allow both "DOS" (<CR><LF>) and "Unix" (<LF>) line endings. I'm OK with making DOS line endings the only ones that are officially allowed, but feel this might actually encourage software authors to explicitly only support DOS line endings.
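As an illustration of the "allow both" position, here is a minimal sketch of a reader that accepts either DOS or Unix line endings. The function name `split_ldraw_lines` is hypothetical and not taken from any existing LDraw tool, and the sketch assumes the whole file has already been read as bytes.

```python
def split_ldraw_lines(data: bytes) -> list:
    """Split raw file bytes into lines, accepting <CR><LF> and <LF> endings.

    Hypothetical helper for illustration only; not part of any LDraw tool.
    """
    lines = data.split(b"\n")
    # A file that ends with a line break yields one empty trailing element.
    if lines and lines[-1] == b"":
        lines.pop()
    # Drop the single <CR> that remains when the ending was <CR><LF>.
    return [line[:-1] if line.endswith(b"\r") else line for line in lines]
```

A reader written this way does not care which ending the writer chose, so the spec could still recommend DOS endings for output without breaking files that use Unix endings.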
Travis Cobbs Wrote:Sorry. When I first saw his post, I was just skimming through new posts, and saw how long it was and put it off to look at later. Then, I forgot to come back later.

Me too. The scope of these changes transcended the original discussion, and I forgot to loop back around.

Roland did a lot of hard work here, and has good ideas for changes. Thanks Roland! However, I think we should table these new issues until resolving the immediate issue (just to keep us on a single topic; re-introduce them afterward):
  • Line endings
  • Comments/Meta rewrite

I suggest the following revised text, in which I hope I've captured the spirit of Roland's work while avoiding raising new issues. I want to make the leading whitespace issue explicit. While framing the discussion in terms of tokens is technically accurate, I think it makes the leading whitespace issue subtle enough to be accidentally overlooked.

Replace Wrote:Every line of the file contains one command. With few exceptions, every command is independent of other lines. The exceptions are the BFC meta-commands which modify the behaviour of one or more following command lines. There is no line length restriction. The whitespace characters allowed for keyword and parameter separation include spaces and tabs. Every command starts with a number, called a line type. The function and format of the line is determined by the line type.

With Wrote:Basic parsing / file content

An LDraw file consists of one command per line. There is no line length restriction. Each command consists of optional leading whitespace followed by whitespace-delimited tokens. Some commands also have trailing arbitrary data which may itself include internal whitespace; such data is not tokenized, but treated as a single unit according to the command.

Whitespace is defined as one or more spaces (#32), tabs (#9), or combination thereof.

Lines may also be empty or consist only of whitespace. Such lines have no effect.

If a line is non-empty, the first token must be an integer from the list of valid Line Type numbers. This number dictates further parsing of tokens for that line. The parsing rules per linetype follow below.

Although not mandatory, it is recommended to not use leading whitespace before the linetype token on the first line of a file. This is to assist software in determining the character encoding of the file.

I am of the opinion to completely strike the disclaimer about first-line leading whitespace.
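For anyone who wants to see the proposed wording as code, the following is a rough sketch of the basic parsing rules: skip optional leading whitespace (spaces and tabs only), read the line type token, and hand back everything after it untouched. The names `parse_line` and `tokenize` are made up for this example, the input is assumed to be one line with its line break already removed, and the set of valid line types is assumed to be 0 through 5 as listed in the file format spec.

```python
import re

# Whitespace is spaces (#32) and tabs (#9) only, per the proposed text.
_LINE_RE = re.compile(r"[ \t]*([^ \t]+)(?:[ \t]+(.*))?", re.DOTALL)
_TOKEN_RE = re.compile(r"[^ \t]+")

# Assumption: line types 0..5 from the LDraw file format spec.
VALID_LINE_TYPES = {0, 1, 2, 3, 4, 5}


def parse_line(line):
    """Return (line_type, rest), or None for an empty/whitespace-only line.

    `rest` is returned untokenized so that commands with trailing
    arbitrary data (e.g. comments) keep their internal whitespace.
    """
    match = _LINE_RE.fullmatch(line)
    if match is None:
        return None  # empty or whitespace-only: no effect
    token = match.group(1)
    if not (token.isascii() and token.isdigit()):
        raise ValueError("first token is not an integer: %r" % token)
    line_type = int(token)
    if line_type not in VALID_LINE_TYPES:
        raise ValueError("unknown line type: %d" % line_type)
    return line_type, match.group(2) or ""


def tokenize(rest):
    """Split `rest` into whitespace-delimited tokens (spaces and tabs)."""
    return _TOKEN_RE.findall(rest)
```

The point of returning `rest` as one string is that each line type decides for itself whether to call `tokenize` on it. A comment line keeps it as is, so runs of spaces inside the comment are not collapsed.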

Replace Wrote:<file> is the filename of the sub-file referenced and must be a valid LDraw filename.

With Wrote:<file> is the filename of the sub-file referenced and must be a valid LDraw filename. Any leading and/or trailing whitespace must be ignored. Normal token separation is otherwise disabled for the filename value.
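To make the filename rule concrete, here is a small sketch of how a parser might handle the text after the line type on a type 1 line. It assumes the usual layout of 13 numeric tokens (colour, x y z, and the nine matrix values a to i) followed by the filename. The helper name `split_subfile_args` is hypothetical, and it does not check that the result is a valid LDraw filename.

```python
import re

# One token, preceded by optional spaces/tabs.
_TOKEN_RE = re.compile(r"[ \t]*([^ \t]+)")


def split_subfile_args(rest, numeric_count=13):
    """Split the text after the line type of a type 1 line.

    Returns (numeric_tokens, filename). The filename is everything after
    the last numeric token, with leading and trailing spaces/tabs ignored
    and internal whitespace preserved. Hypothetical helper for illustration.
    """
    tokens = []
    pos = 0
    for _ in range(numeric_count):
        match = _TOKEN_RE.match(rest, pos)
        if match is None:
            raise ValueError("too few parameters before the filename")
        tokens.append(match.group(1))
        pos = match.end()
    # Normal token separation is disabled from here on.
    filename = rest[pos:].strip(" \t")
    if not filename:
        raise ValueError("missing filename")
    return tokens, filename
```

With this approach a reference such as `my part.dat` survives intact, while padding before or after the name is dropped.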
I'm ok with this version.

I'm also ok with forking the line endings and comment lines issues.

My main reason for adjusting the comment paragraph was to make it clear it should be handled as a single string to avoid something like html parsing (where all whitespace is replaced by a single space).

As for the line endings, I was (mistakenly) under the impression the DOS format was de facto. This also sneaked into my LDCad project: it reads anything but always writes DOS breaks (even in the Linux version).
Roland Melkert Wrote:I'm also ok with forking the line endings and comment lines issues.

Just to be clear, I want to break up the discussion into individual issues because it is easier to keep track of everything.

Roland Melkert Wrote:My main reason for adjusting the comment paragraph was to make it clear it should be handled as a single string to avoid something like html parsing (where all whitespace is replaced by a single space).

I figured as much; the line about the "trailing arbitrary data" was intended to address that problem. I agree with you the comment section needs a re-write, but I think it has a few other issues too.

Roland Melkert Wrote:As for the line endings, I was (mistakenly) under the impression the DOS format was de facto. This also sneaked into my LDCad project: it reads anything but always writes DOS breaks (even in the Linux version).

It is de facto, just not formalized. This whole topic is worth its own separate discussion, although it might be a very short discussion. (Bricksmith also writes out CRLF, by the way.)

Allen
This works for me too.
Allen Smith Wrote:
Roland Melkert Wrote:As for the line endings, I was (mistakenly) under the impression the DOS format was de facto. This also sneaked into my LDCad project: it reads anything but always writes DOS breaks (even in the Linux version).

It is de facto, just not formalized. This whole topic is worth its own separate discussion, although it might be a very short discussion. (Bricksmith also writes out CRLF, by the way.)

Allen

DOS line-endings are de facto. It may not be formalized, but for the official Parts Library it is enforced during Parts Update packaging (yes, there were a few files that were erroneously released with bad endings, due to some buggy checking, but they have since been fixed).
Allen Smith Wrote:
' Wrote:Although not mandatory, it is recommended to not use leading whitespace before the linetype token on the first line of a file. This is to assist software in determining the character encoding of the file.

I am of the opinion to completely strike the disclaimer about first-line leading whitespace.

I don't dislike Travis' proposal of no leading whitespace on the first line, and since it is not mandatory ...

w.
Willy Tschager Wrote:
Allen Smith Wrote:
' Wrote:Although not mandatory, it is recommended to not use leading whitespace before the linetype token on the first line of a file. This is to assist software in determining the character encoding of the file.

I am of the opinion to completely strike the disclaimer about first-line leading whitespace.

I don't dislike Travis' proposal of no leading whitespace on the first line, and since it is not mandatory ...

The main trouble is that if it's not mandatory, it is of no real use to assist software in determining the file type, and thus doesn't really need to exist. What do others think?

If it stays, the phrase "character encoding" needs to change to "determining whether the file is an LDraw file."

Allen