Leading or trailing white space characters in file names
#1
Since the whitespace is out of the door, I'd like to resume this:

Travis Cobbs Wrote:In addition to the above, I also feel we should codify MLCad's behavior, since it appears to be the de-facto standard. This can be done by adding the following sentence to the end of that paragraph in the spec:

Proposed LDraw Specification Addition Wrote:Furthermore, filenames must not contain leading or trailing white space characters.

However, if there's any disagreement about the above at all, then I think we should vote on the first two changes immediately, and temporarily table discussion of leading and trailing whitespace in the same way I'm proposing we temporarily table discussion of Roland's point.
LEGO ergo sum
Reply
Re: Leading or trailing white space characters in file names
#2
I don't see any valid reason for allowing leading and/or trailing white spaces.

But if we do allow it, we need to think about introducing quotes as well so parsers know where the filename starts and ends.

Backwards compatibility will break with quotes (in the sense of software failing to find the referenced file), but so does using leading/trailing white spaces.

Also, when we declare leading and/or trailing white spaces 'invalid', then we must also define exactly what 'white spaces' are.
Reply
Re: Leading or trailing white space characters in file names
#3
I'm against supporting quoted filenames purely on the basis that they aren't supported by most (any?) current LDraw software. They probably would have been a better solution, had that been done back when spaces first started appearing in filenames in LDraw files, but since things have been the way they are right now for so long, I think giving an official stamp of approval to the behavior as I described is the best thing to do.
Reply
Re: Leading or trailing white space characters in file names
#4
Travis Cobbs Wrote:In addition to the above, I also feel we should codify MLCad's behavior, since it appears to be the de-facto standard. This can be done by adding the following sentence to the end of that paragraph in the spec:

Proposed LDraw Specification Addition Wrote:Furthermore, filenames must not contain leading or trailing white space characters.

I strongly support codifying a restriction against leading and trailing whitespace for Line Type 1 file references. Leading whitespace is technically impossible without breaking backward compatibility. Trailing whitespace is technically possible, but the de facto standard behavior of parsers is to ignore it. Unfortunately, that is tribal knowledge. I think it needs to be in the spec.

I oppose mandating any restrictions on freestanding files (unreferenced by any Type 1 lines). This is because users have name control outside of LDraw software, so they can easily create filenames using any characters or arrangements permissible on the filesystem. I think it would be weird to mandate that an editor reject files for any reason other than an internal syntax error.

I happily support any non-binding warnings in the spec advising against doing such a thing for freestanding files due to the impossibility of eventually referencing them in another file. Editor authors would then be free to convey that warning to users as well. I just think it inadvisable to mandate it.

Allen
Reply
Re: Leading or trailing white space characters in file names
#5
That makes sense. How about the following wording as a replacement for what I said above:
Proposed LDraw Specification Addition Wrote:Furthermore, filenames that contain leading or trailing white space characters cannot be referenced from other LDraw files, so they are discouraged.

In addition to the above, update the <file> specification for type 1 lines to be as follows (adding the second sentence):

Proposed LDraw Specification Change Wrote:<file> is the filename of the sub-file referenced and must be a valid LDraw filename. Any leading and trailing white space around the filename will be ignored.
Reply
Re: Leading or trailing white space characters in file names
#6
I agree with this, but it also needs a definition of what 'white spaces' are. e.g. (\t, \r, \n, \0, ' ')
Reply
Re: Leading or trailing white space characters in file names
#7
Good point. I just looked at LDView's code, and it's actually inconsistent (in other words, broken). For finding spaces prior to the filename it uses the C library function isspace(). For spaces after the filename, it only treats space and tab as white space characters.

According to here, isspace() considers the following to be white space:
  • space
  • horizontal tab
  • newline
  • vertical tab
  • form feed
  • carriage return

The newline and carriage return characters are already excluded, since they have special meaning in an LDraw file. I'm not real sure about form feed and vertical tab, though.
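
The gap between the two checks can be illustrated with a small C sketch. The names `ldraw_is_space` and `classification_differs` are hypothetical and are not taken from LDView; the sketch assumes the default "C" locale, in which `isspace()` accepts the six characters listed above.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: treats only space and horizontal tab as white space. */
bool ldraw_is_space(char c)
{
    return c == ' ' || c == '\t';
}

/* Returns true when isspace() and the stricter helper disagree about c. */
bool classification_differs(char c)
{
    return (isspace((unsigned char)c) != 0) != ldraw_is_space(c);
}
```

Within a single line, the two checks disagree only on vertical tab and form feed, which are exactly the two characters in question.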
Reply
Re: Leading or trailing white space characters in file names
#8
Roland Melkert Wrote:I agree with this, but it also needs a definition of what 'white spaces' are. e.g. (\t, \r, \n, \0, ' ')

The spec already explicitly defines the valid field-delimiting whitespace to be '\t' and ' '. I would expect the definition of "whitespace" in reference to part names would be identical to the field-delimiting values.

\0 is a pathological case, and anyone who has managed to insert one into his file deserves exactly what he's going to get.

Allen
Reply
Re: Leading or trailing white space characters in file names
#9
Change to:

Proposed LDraw Specification Change Wrote:<file> is the filename of the sub-file referenced and must be a valid LDraw filename. In addition, any leading and trailing white space around the filename will be ignored.
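
A minimal sketch of what "ignoring" the surrounding white space could look like in a parser, assuming that only space and tab count as white space. The function name `trim_filename` is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: strips leading and trailing spaces and tabs in place
   and returns s. Only ' ' and '\t' are treated as white space here. */
char *trim_filename(char *s)
{
    size_t start = 0;
    size_t end = strlen(s);

    while (s[start] == ' ' || s[start] == '\t')
        start++;
    while (end > start && (s[end - 1] == ' ' || s[end - 1] == '\t'))
        end--;

    /* Shift the remaining characters to the front and terminate. */
    memmove(s, s + start, end - start);
    s[end - start] = '\0';
    return s;
}
```

White space inside the filename is left untouched, so names containing spaces survive the trim.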
Reply
Re: Leading or trailing white space characters in file names
#10
That seems reasonable.
Reply
Re: Leading or trailing white space characters in file names
#11
If there aren't any arguments against it please call for a vote.

w.
LEGO ergo sum
Reply
Re: Leading or trailing white space characters in file names
#12
The current spec says: "The whitespace characters allowed for keyword and parameter separation include spaces and tabs". I don't think this is clear and/or complete at all.

So maybe replace:

Quote:Every line of the file contains one command. With few exceptions, every command is independent of other lines. The exceptions are the BFC meta-commands which modify the behaviour of one or more following command lines. There is no line length restriction. The whitespace characters allowed for keyword and parameter separation include spaces and tabs. Every command starts with a number, called a line type. The function and format of the line is determined by the line type.

with

Quote:Every line of the file contains one command. With few exceptions, every command is independent of other lines. The exceptions are the BFC meta-commands which modify the behaviour of one or more following command lines. There is no line length restriction. Every command starts with a number, called a line type. The function and format of the line is determined by the line type.

And add a section below it (just like "text based", "text encoding", etc)

Quote:Whitespace

The whitespace used to separate tokens throughout the document may only exist out the following: used to separate tokens may be any of the following:

- space
- horizontal tab
- vertical tab
- form feed

I've removed 'new line' and 'carriage return' from that list because of the line-oriented nature of LDraw.

And as a result of the above we can adjust the line:
Quote:<file> is the filename of the sub-file referenced and must be a valid LDraw filename.

to

Quote:<file> is the filename of the sub-file referenced and must be a valid LDraw filename. Any leading and/or trailing whitespaces must be ignored.

Although a native English speaker could probably word all of this better.
Reply
Re: Leading or trailing white space characters in file names
#13
I just canceled my call for votes in light of this.

In my opinion, we cannot add new white space characters to the LDraw file format at this late date. Space and tab should remain the only valid white space characters.

Also, I request that we split whitespace up into two words. The original whitespace text had it as one word, and I matched that in my original wording. However, it should be two words.

Finally, the following text is messed up:

Quote:The whitespace used to separate tokens throughout the document may only exist out the following: used to separate tokens may be any of the following:

I recommend the following wording for the new White Space section:

Quote:The white space characters used to separate tokens throughout the LDraw file may be either space or tab. Both should be treated the same, and any number of contiguous white space characters (1 or more) are allowed.
Reply
Re: Leading or trailing white space characters in file names
#14
Is it relevant to mention here that tokens on line types 1-4 MUST be separated by whitespace? I believe LDView (but maybe only older versions) recognises a leading minus sign as a delimiter (e.g.
Code:
1 16  0 0 0  1 0 0  0 1 0  0 0-1 s\subpart.dat
would be valid syntax).
Chris (LDraw Parts Library Admin)
Reply
Re: Leading or trailing white space characters in file names
#15
Chris Dee Wrote:Is it relevant to mention here that tokens on line types 1-4 MUST be separated by whitespace? I believe LDView (but maybe only older versions) recognises a leading minus sign as a delimiter (e.g.
Code:
1 16  0 0 0  1 0 0  0 1 0  0 0-1 s\subpart.dat
would be valid syntax).

I do not consider that valid syntax, whether recognized by any program or not. So yes, fields MUST be delimited by whitespace.

I also agree with Travis: space and tab have been the only documented delimiters for many years, and it is inadvisable to add more.

Allen
Reply
Re: Leading or trailing white space characters in file names
#16
I'd say the current spec implies that white space is required between each parameter, but doesn't outright state such. I agree that this should be added. How about the following as the content of the proposed new White Space section:

Quote:Command parameters for every line type must be separated by white space. The white space characters used to separate these parameters may be either space or tab. Both should be treated the same, and any number of contiguous white space characters (1 or more) are allowed.

We also should decide whether it's valid to have white space before the line type number at the beginning of the line. I think this is also ambiguous.

Just as a note, while the described LDView behavior doesn't surprise me now that it has been pointed out, I don't think anyone has ever mentioned it to me before. And while the current spec may be somewhat ambiguous, I suspect that if you asked any part authors if your sample line was valid, they would say no. I'll update LDView to generate an error (assuming we agree to require white space between command parameters).
Reply
Re: Leading or trailing white space characters in file names
#17
As a minor point, I believe that LDView will in fact reject the sample line, but only because it has special handling of type 1 lines to determine the exact location of the filename. So if anyone tests the above and sees that LDView rejects it, that shouldn't be taken as an indication that LDView isn't broken. A similar missing white space in any other line type (2-5) will probably be ignored by LDView.

For the developers out there, the following sscanf() should succeed on Chris's sample line:

Code:
if (sscanf(line, "%d %d %f %f %f %f %f %f %f %f %f %f %f %f %s", &lineType, &colorNumber, &x, &y, &z, &a, &b, &c, &d, &e, &f, &g, &h, &i, filename) == 15)
{
    // Success!
}

(Note that it has other problems, since %s stops at the first white space character, so it wouldn't work with filenames containing spaces.)
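
For comparison, here is a sketch of a stricter reader that rejects the `0-1` case by insisting on a space or tab after every number. The function name `read_delimited_numbers` is hypothetical; it relies on the standard `strtod()`, which is more permissive than an LDraw parser might want (it also accepts forms such as hexadecimal floats), so treat it as an illustration only.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: reads `count` numbers from *cursor, requiring each one
   to be followed by at least one space or tab. On success, advances *cursor
   past the white space that follows the last number. */
bool read_delimited_numbers(const char **cursor, double *out, int count)
{
    const char *p = *cursor;

    for (int i = 0; i < count; i++) {
        char *end;

        while (*p == ' ' || *p == '\t')
            p++;
        out[i] = strtod(p, &end);
        if (end == p)                       /* no number at this position */
            return false;
        if (*end != ' ' && *end != '\t')    /* e.g. the "0-1" case */
            return false;
        p = end;
    }
    while (*p == ' ' || *p == '\t')
        p++;
    *cursor = p;
    return true;
}
```

Because it demands white space after the last number, this shape fits type 1 lines, where a filename follows. Line types 2-5 end with a number, so they would need the final check relaxed to accept end of line.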
Reply
Re: Leading or trailing white space characters in file names
#18
Travis Cobbs Wrote:We also should decide whether it's valid to have white space before the line type number at the beginning of the line. I think this is also ambiguous.

I think if we go with the tokens separated by white space text above, leading white space on a line is automatically valid. The only exception is the file name, which at its start 'disables' the normal white space / token behavior.
Reply
Re: Leading or trailing white space characters in file names
#19
Roland Melkert Wrote:
Travis Cobbs Wrote:We also should decide whether it's valid to have white space before the line type number at the beginning of the line. I think this is also ambiguous.

I think if we go with the tokens separated by white space text above, leading white space on a line is automatically valid. The only exception is the file name, which at its start 'disables' the normal white space / token behavior.

I don't consider leading whitespace on a line to be valid. I think there's a difference between "separates" and "starts with." My parser will accept it some places by accident, but considers it a syntax error in many other places.

Have files containing lines with leading whitespace been observed in the wild?

Allen
Reply
Re: Leading or trailing white space characters in file names
#20
Leading whitespace on line types 1 to 5 is very common in the official library, as these were introduced by the inlining function of at least one older authoring tool. In the past, I believe the script that makes "~Moved to" files also put a leading space before the Type 1 line.
Chris (LDraw Parts Library Admin)
Reply
Re: Leading or trailing white space characters in file names
#21
Out of curiosity, I did a search in the parts library on my computer using the following regex:
Code:
^[ \t]+[0-5]
(The regex matches <beginning of line><one or more spaces or tabs><number between 0 and 5>.)
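
For anyone who wants to run the same check without a regex engine, the pattern can be sketched as a plain C function. The name `has_leading_white_space` is hypothetical.

```c
#include <stdbool.h>

/* Mirrors the regex ^[ \t]+[0-5]: one or more spaces or tabs at the start of
   the line, followed by a digit between 0 and 5. */
bool has_leading_white_space(const char *line)
{
    const char *p = line;

    while (*p == ' ' || *p == '\t')
        p++;
    return p > line && *p >= '0' && *p <= '5';
}
```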

It resulted in 113,134 matching lines in 1217 files. For comparison's sake, removing the leading white space from the regex resulted in 1,093,769 matching lines in 6553 files, which means that over 10% of the lines in LDraw files have leading white space.

So I think we should formalize that white space is allowed before the line type.

On a somewhat related, but separate note, I think that we should disallow white space before the 0 on the first line of an LDraw file, because the "0 " at the beginning of the file can be used as something of a magic number for the LDraw file type.
Reply
Re: Leading or trailing white space characters in file names
#22
Chris Dee Wrote:Leading whitespace on line types 1 to 5 is very common in the official library, as these were introduced by the inlining function of at least one older authoring tool. In the past, I believe the script that makes "~Moved to" files also put a leading space before the Type 1 line.

Fortunate then that the lines I would reject are all file structure metas like 0 FILE, 0 STEP, etc.

I guess I have to fix my parser!

Allen
Reply
Re: Leading or trailing white space characters in file names
#23
Travis Cobbs Wrote:On a somewhat related, but separate note, I think that we should disallow white space before the 0 on the first line of an LDraw file, because the "0 " at the beginning of the file can be used as something of a magic number for the LDraw file type.

I'm not sure I completely understand what you mean here, but isn't this a 'header issue' instead of a 'file format' issue?
Reply
Re: Leading or trailing white space characters in file names
#24
Many file types have a "magic number" at the beginning so programs can determine whether or not a file is of a given type without having to rely on the file's extension. For example, the first four characters of a GIF file are GIF8 (followed by either 7 or 9a, depending on the GIF version). Given LDraw's original generic .dat extension, having a magic number at the beginning of LDraw files could be useful. And since the first line is supposed to be a comment with the file's description (and is required to be that for parts), disallowing white space before the 0 would give something of a magic number (although not a very unique one). It's not perfect, but I think it's better than nothing.
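
As a rough sketch of how such a check might look: the function name `looks_like_ldraw` is hypothetical, and accepting a tab after the `0` as well as a space is an assumption that goes slightly beyond the "0 " described above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: true when the buffer starts with '0' followed by a
   space or tab, with no white space in front of the '0'. */
bool looks_like_ldraw(const char *data, size_t size)
{
    return size >= 2 && data[0] == '0' && (data[1] == ' ' || data[1] == '\t');
}
```

A two-byte signature like this is weak, so a real tool would probably combine it with the file extension or a look at later lines.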
Reply
Re: Leading or trailing white space characters in file names
#25
Ah, ok.

I thought you meant something like whether it is a part or a model :)

In this light I would agree on advising the removal of leading white space for the first line. But I don't think it should be part of the format, only a recommendation in order to keep the format simple.
Reply
Re: Leading or trailing white space characters in file names
#26
I agree that there should be no leading space on type 0 lines.
Chris (LDraw Parts Library Admin)
Reply
Re: Leading or trailing white space characters in file names
#27
I'm OK with this, but I will point out that the current official library contains 11 parts and 2 primitives that contain a total of 31 type 0 lines with leading white space. Since the list is relatively small, I'm including it here:

Code:
C:\LDRAW\p\48\3-16edge.dat(20): 0
C:\LDRAW\p\48\3-16ndis.dat(22): 0
C:\LDRAW\parts\2435.dat(825): 0 BFC INVERTNEXT
C:\LDRAW\parts\2920.dat(104): 0 BFC INVERTNEXT
C:\LDRAW\parts\2920.dat(106): 0 BFC INVERTNEXT
C:\LDRAW\parts\2920.dat(117): 0 BFC INVERTNEXT
C:\LDRAW\parts\2920.dat(124): 0 BFC INVERTNEXT
C:\LDRAW\parts\2920.dat(126): 0 BFC INVERTNEXT
C:\LDRAW\parts\30235.dat(17): 0 1 16 0 32 0 40 0 0 0 -20 0 0 0 16 box5.dat
C:\LDRAW\parts\30235.dat(436): 0
C:\LDRAW\parts\30382.dat(17): 0 TEXTURE END
C:\LDRAW\parts\4022.dat(320): 0 BFC INVERTNEXT
C:\LDRAW\parts\4022.dat(371): 0 BFC INVERTNEXT
C:\LDRAW\parts\4022.dat(409): 0 BFC INVERTNEXT
C:\LDRAW\parts\4035.dat(16): 0 BFC INVERTNEXT
C:\LDRAW\parts\4035.dat(56): 0 BFC INVERTNEXT
C:\LDRAW\parts\4035.dat(65): 0 BFC INVERTNEXT
C:\LDRAW\parts\4035.dat(68): 0 BFC INVERTNEXT
C:\LDRAW\parts\4035.dat(71): 0 BFC INVERTNEXT
C:\LDRAW\parts\4035.dat(74): 0 BFC INVERTNEXT
C:\LDRAW\parts\4215ap66.dat(14): 0 Red Cross Pattern
C:\LDRAW\parts\4215ap66.dat(19): 0 Background
C:\LDRAW\parts\6199.dat(42): 0 Tile  2 x  6
C:\LDRAW\parts\6199.dat(43): 0 Created by SimLego v 0.31
C:\LDRAW\parts\7930.dat(22): 0 BFC INVERTNEXT
C:\LDRAW\parts\7930.dat(26): 0 BFC INVERTNEXT
C:\LDRAW\parts\7930.dat(28): 0 BFC INVERTNEXT
C:\LDRAW\parts\7930.dat(30): 0 BFC INVERTNEXT
C:\LDRAW\parts\7930.dat(32): 0 BFC INVERTNEXT
C:\LDRAW\parts\973p8a.dat(15): 0 COMMENT neck mark
C:\LDRAW\parts\s\3739a.dat(300): 0 2 24 29.54 0 5.21  28.19 0 10.26
Reply
Re: Leading or trailing white space characters in file names
#28
In principle I'm against restricting anything without a good reason. So my question: why allow leading white space on 1..5 and not on 0 ?

Except for the file type issue Travis raised (which only affects the first line), I don't see any good reason for issuing this restriction.

Inlining can be a very powerful tool when working with (very) large ldr files. So why suddenly disallow inlining 0 lines, which basically invalidates all files using inlined code for readability purposes?

Just imagine writing e.g. a C++ program where it (only) isn't allowed to inline the 'for' statement.

For example:

Code:
0 my file
1 ..
1 ..

0 start of some sub construction (e.g. building book 1/2)
1 ..
1 ..
0 don't forget this next part is a tmp stand in
1 ..

  0 start of some sub sub construction (e.g. generated flex tube)
  1 ..
  1 ..
  0 end of sub sub

1 ..
1 ..
0 end of sub

I know you could use MPDs and/or subfiles, but sometimes you just need everything in a single file that also needs to remain readable.

And besides, working with a huge MPD can be simplified by inlining the submodels per '0 FILE' statement. This would also shift the headers of those subfiles, which would make the whole MPD invalid under the proposed restriction.

So imho such a restriction doesn't add anything to the spec (and the users), it only takes something away.

edit: Spell check was disabled in my browser for some reason.
Reply
Re: Leading or trailing white space characters in file names
#29
Yes, you are right. Let's just apply it to the first (part description) line.
Chris (LDraw Parts Library Admin)
Reply
Re: Leading or trailing white space characters in file names
#30
I agree with Roland as well. Whitespace everywhere! (Recanting my earlier belief that leading whitespace is wrong.)

Allen
Reply
Re: Leading or trailing white space characters in file names
#31
Here's a summary:

Replace in http://www.ldraw.org/Article218.html#files:

Quote:Every line of the file contains one command. With few exceptions, every command is independent of other lines. The exceptions are the BFC meta-commands which modify the behaviour of one or more following command lines. There is no line length restriction. The whitespace characters allowed for keyword and parameter separation include spaces and tabs. Every command starts with a number, called a line type. The function and format of the line is determined by the line type.

with

Quote:Every line of the file contains one command. With few exceptions, every command is independent of other lines. The exceptions are the BFC meta-commands which modify the behaviour of one or more following command lines. There is no line length restriction. Every command starts with a number, called a line type. The function and format of the line is determined by the line type.

Add a section below it:

Quote:White space

Command parameters for every line type must be separated by white space. The white space characters used to separate these parameters may be either space or tab. Both should be treated the same, and any number of contiguous white space characters (1 or more) are allowed.

And as a result of the above we can adjust the line:

Quote:<file> is the filename of the sub-file referenced and must be a valid LDraw filename.

to

Quote:<file> is the filename of the sub-file referenced and must be a valid LDraw filename. Any leading and/or trailing whitespaces must be ignored.

Add to http://www.ldraw.org/Article398.html

PartDescription is the descriptive name of the part ...

Quote:Leading white space in front of this very first linetype 0 is not allowed.

If we all agree I'm going to call for a vote.

w.
LEGO ergo sum
Reply
Re: Leading or trailing white space characters in file names
#32
I think the new white space section should be modified to add one more sentence on the end, like so:

Quote:White space

Command parameters for every line type must be separated by white space. The white space characters used to separate these parameters may be either space or tab. Both should be treated the same, and any number of contiguous white space characters (1 or more) are allowed. Any number of white space characters may also precede the line type.

The reason for my suggestion is that without it, the legality of preceding white space characters is somewhat ambiguous. Everything else you had looks good to me.
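
The proposed wording can be read as a simple tokenizer rule. A minimal sketch, assuming only space and tab count as white space; the names `next_token` and `count_tokens` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: returns the next space/tab-delimited token starting at
   *cursor, stores its length in *len, and advances *cursor past it.
   Returns NULL when no token is left. */
const char *next_token(const char **cursor, size_t *len)
{
    const char *p = *cursor;
    const char *start;

    while (*p == ' ' || *p == '\t')   /* skip any run of white space */
        p++;
    if (*p == '\0')
        return NULL;

    start = p;
    while (*p != '\0' && *p != ' ' && *p != '\t')
        p++;

    *len = (size_t)(p - start);
    *cursor = p;
    return start;
}

/* Counts the tokens on one line (line terminator already removed). */
size_t count_tokens(const char *line)
{
    size_t n = 0, len;

    while (next_token(&line, &len) != NULL)
        n++;
    return n;
}
```

Because the skip loop runs before every token, including the first, white space preceding the line type is accepted without any special case. Filenames and comments still need separate handling, since they may contain internal spaces.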
Reply
Re: Leading or trailing white space characters in file names
#33
Fine with it.

w.
LEGO ergo sum
Reply
Re: Leading or trailing white space characters in file names
#34
Maybe it's me not being a native English reader, but to me it could be interpreted like you have to pick either space or tab for white space. Maybe use 'any combination of white space characters' somewhere?

As a result of the recent discussions concerning the linetype 0 in respect to the first line I would also like to change

Quote:Leading white space in front of this very first linetype 0 is not allowed

into

Quote:Leading white space in front of this very first linetype 0 is discouraged

Otherwise you will break inlining of MPDs. Actually it's more a recommendation for the very first line of the file, whatever its linetype might be.
Reply
Re: Leading or trailing white space characters in file names
#35
Roland Melkert Wrote:Maybe it's me not being a native English reader, but to me it could be interpreted like you have to pick either space or tab for white space. Maybe use 'any combination of white space characters' somewhere?

Agree.

Roland Melkert Wrote:"Leading white space in front of this very first linetype 0 is discouraged"

I view discouragements as a means of highlighting cross-platform or serious legacy software pitfalls. In this case, I don't see any great advantage in a discouragement. The practice has already escaped into the wild, and I think clarity and decisiveness would be best here.

More knowledgeable people, please correct me.

Allen
Reply
Re: Leading or trailing white space characters in file names
#36
Willy Tschager Wrote:Here's a summary:

Replace in http://www.ldraw.org/Article218.html#files:

Quote:Every line of the file contains one command. With few exceptions, every command is independent of other lines. The exceptions are the BFC meta-commands which modify the behaviour of one or more following command lines. There is no line length restriction. Every command starts with a number, called a line type. The function and format of the line is determined by the line type.

I think the phrase "Every command starts with a number" is misleading, as a cursory reading would indicate that leading whitespace is forbidden. I also think the whole logical flow of the paragraph was/is jumbled. I would prefer the following:

Quote:Every line of the file contains one command. A command consists of optional leading whitespace, a number indicating the line type, and whitespace-delimited fields. The function and format of the line is determined by the line type number, while the fields contain the line's data. There is no line length restriction. Most commands are independent of other lines, although there are a few rare exceptions in which a line modifies the behaviour of one or more following command lines. For example, the BFC meta-commands change the interpretation of subsequent geometry.

A serious problem with both this text, and the original, is that it appears to disallow empty or whitespace-only lines. I think that should also be addressed.

Allen
Reply
Re: Leading or trailing white space characters in file names
#37
Judging by the latest posts, it seems we all agree on the spec changes. However, formatting the text seems to be the current problem.

I think the best way of doing this is rewriting the introduction (specifying the entire document's basic parse rules), followed by noting the exceptions in the affected line-type texts (e.g. comments and filename).

I'm willing to give this a try sometime this week, unless one of you is already working on it.

On a side note: It seems we forgot to increase the spec doc's version after the changes of last year.
Reply
Re: Leading or trailing white space characters in file names
#38
Roland Melkert Wrote:I'm willing to give this a try sometime this week, unless one of you is already working on it.

Roland, are you working on this?

w.
LEGO ergo sum
Reply
Re: Leading or trailing white space characters in file names
#39
Yes, although I had less time for it than planned due to work etc.

Hoping to post it this weekend.
Reply
Re: Leading or trailing white space characters in file names
#40
Ok it took me somewhat longer than planned, but here goes:

Replace Wrote:Every line of the file contains one command. With few exceptions, every command is independent of other lines. The exceptions are the BFC meta-commands which modify the behaviour of one or more following command lines. There is no line length restriction. The whitespace characters allowed for keyword and parameter separation include spaces and tabs. Every command starts with a number, called a line type. The function and format of the line is determined by the line type.

With Wrote:Basic parsing / file content

An LDraw file consists of single lines separated by DOS format line breaks (#13#10). Each line consists of a number of tokens separated by whitespace. Whitespace is a string of one or more characters made up of any combination of spaces (#32) and/or tabs (#9). Lines with zero tokens are allowed and should be ignored.

If a line is non-empty, the first token must be a single-digit numeric value. This number identifies the linetype and dictates further parsing of tokens for that line. The parsing rules per linetype follow below.

Although not mandatory, it is recommended not to use leading whitespace before the linetype token on the first line of a file. This is to assist software in determining the character encoding of the file.

"Basic parsing / file content" being a bullet just like "Text based" etc.

Next adjust the comments part....

Add before start of meta part Wrote:Please note that for any comment line normal token parsing is disabled. The value of a comment must be treated as a single string value starting with the first non-whitespace character after the '0' token up to the end of the file line, excluding trailing whitespace.

And..

Replace Wrote:A META command is a statement used to tell an LDraw compatible program to do something. There are currently many official META commands and even more unofficial META commands. In a META command, a keyword follows the line type in the line. The keyword must be in all caps. The generic META line format is:

0 !<META command> <additional parameters>

Where:

With Wrote:A META command is a statement used to tell an LDraw compatible program to do something. There are currently many official META commands and even more unofficial META commands.

A meta command line, in essence, is a special comment. So to detect them, any parser must take a second look at a comment line after it has been read as described above. The first token in the raw comment string is used to identify the command and dictates how (if at all) to parse the remainder of the line. The general format of a command line is (please note the 'missing' '0' token, which was stripped while parsing the comment):

!<META command> <additional parameters>

Where:

Finally adjust the file reference part...

Replace Wrote:<file> is the filename of the sub-file referenced and must be a valid LDraw filename.

With Wrote:<file> is the filename of the sub-file referenced and must be a valid LDraw filename. Please note that normal token separation is disabled for the filename value. The filename consists of all the characters read from the start (first non-whitespace character after the last token) up to the end of the file line, excluding trailing whitespace.

Hope this is (more) clear, although a native English speaker might want to tweak this somewhat. The only part I'm not too sure about is the meta command part; I think it could be difficult to understand for people new to the format. It might need a deeper rewrite?
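
The filename rule for type 1 lines could be sketched in C as follows. The names `is_ldraw_ws` and `type1_filename` are hypothetical. The count of 14 leading tokens (line type, colour, and twelve numbers) follows the layout used in the sscanf() example earlier in the thread, and the line terminator is assumed to have been removed already.

```c
#include <stddef.h>

/* Hypothetical helper: only space and tab count as white space. */
int is_ldraw_ws(char c)
{
    return c == ' ' || c == '\t';
}

/* Hypothetical helper: skips the 14 tokens that precede the filename on a
   type 1 line, then returns a pointer to the filename and stores its length
   (trailing white space excluded) in *len. Returns NULL when the line has
   fewer than 14 tokens or no filename. */
const char *type1_filename(const char *line, size_t *len)
{
    const char *p = line;
    const char *end;

    for (int i = 0; i < 14; i++) {
        while (is_ldraw_ws(*p))
            p++;
        if (*p == '\0')
            return NULL;
        while (*p != '\0' && !is_ldraw_ws(*p))
            p++;
    }
    while (is_ldraw_ws(*p))            /* white space before the filename */
        p++;
    if (*p == '\0')
        return NULL;

    end = p;
    while (*end != '\0')
        end++;
    while (end > p && is_ldraw_ws(end[-1]))   /* drop trailing white space */
        end--;

    *len = (size_t)(end - p);
    return p;
}
```

The sketch does not validate the numeric fields; it only locates the filename, so internal spaces in the name are preserved.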
Reply
Re: Leading or trailing white space characters in file names
#41
Thx Roland. This is far more than I expected.

Roland Melkert Wrote:An LDraw file consists of single lines separated by DOS format line breaks (#13#10).

This is already laid out here:

http://www.ldraw.org/Article512.html#termination

So we have to decide where it belongs.

w.
LEGO ergo sum
Reply
Re: Leading or trailing white space characters in file names
#42
Travis, Chris, Allen, any comments?

w.
LEGO ergo sum
Reply
Re: Leading or trailing white space characters in file names
#43
Sorry. When I first saw his post, I was just skimming through new posts, and saw how long it was and put it off to look at later. Then, I forgot to come back later.

I don't see any value in the added paragraph regarding comments intended to be inserted just before the Meta segment. Maybe I'm missing something, but I just don't see how it adds to the document. I'm not dead-set against adding the text, but at the moment, I don't see how it helps.

I believe that the line termination should be part of the official LDraw spec. However, given the results of Jim DeVona's experiment from two years ago, I think perhaps the official spec should allow both "DOS" (<CR><LF>) and "Unix" (<LF>) line endings. I'm OK with making DOS line endings the only ones that are officially allowed, but feel this might actually encourage software authors to explicitly only support DOS line endings.
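
Supporting both endings costs very little in a reader. A minimal sketch, with the hypothetical name `strip_line_ending`, that accepts either <CR><LF> or <LF> at the end of a line that has already been read into a buffer:

```c
#include <string.h>

/* Hypothetical helper: removes one trailing "\r\n" or "\n" in place and
   returns the new length. A lone '\r' without '\n' is left alone. */
size_t strip_line_ending(char *line)
{
    size_t n = strlen(line);

    if (n > 0 && line[n - 1] == '\n') {
        n--;
        if (n > 0 && line[n - 1] == '\r')
            n--;
    }
    line[n] = '\0';
    return n;
}
```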
Reply
Re: Leading or trailing white space characters in file names
#44
Travis Cobbs Wrote:Sorry. When I first saw his post, I was just skimming through new posts, and saw how long it was and put it off to look at later. Then, I forgot to come back later.

Me too. The scope of these changes transcended the original discussion, and I forgot to loop back around.

Roland did a lot of hard work here, and has good ideas for changes. Thanks Roland! However, I think we should table these new issues until resolving the immediate issue (just to keep us on a single topic; re-introduce them afterward):
  • Line endings
  • Comments/Meta rewrite

I suggest the following revised text, in which I hope I've captured the spirit of Roland's work while avoiding raising new issues. I want to make the leading whitespace issue explicit. While framing the discussion in terms of tokens is technically accurate, I think it makes the leading whitespace issue subtle enough to be accidentally overlooked.

Replace Wrote:Every line of the file contains one command. With few exceptions, every command is independent of other lines. The exceptions are the BFC meta-commands which modify the behaviour of one or more following command lines. There is no line length restriction. The whitespace characters allowed for keyword and parameter separation include spaces and tabs. Every command starts with a number, called a line type. The function and format of the line is determined by the line type.

With Wrote:Basic parsing / file content

An LDraw file consists of one command per line. There is no line length restriction. Each command consists of optional leading whitespace followed by whitespace-delimited tokens. Some commands also have trailing arbitrary data which may itself include internal whitespace; such data is not tokenized, but treated as a single unit according to the command.

Whitespace is defined as one or more spaces (#32), tabs (#9), or combination thereof.

Lines may also be empty or consist only of whitespace. Such lines have no effect.

If a line is non-empty, the first token must be an integer from the list of valid Line Type numbers. This number dictates further parsing of tokens for that line. The parsing rules per linetype follow below.

Although not mandatory, it is recommended to not use leading whitespace before the linetype token on the first line of a file. This is to assist software in determining the character encoding of the file.

I am of the opinion to completely strike the disclaimer about first-line leading whitespace.

Replace Wrote:<file> is the filename of the sub-file referenced and must be a valid LDraw filename.

With Wrote:<file> is the filename of the sub-file referenced and must be a valid LDraw filename. Any leading and/or trailing whitespace must be ignored. Normal token separation is otherwise disabled for the filename value.
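To illustrate the filename rule above, here is a minimal Python sketch of how a parser might pull the filename out of a sub-file reference line. The function name `parse_subfile_reference` is hypothetical. It assumes the line ending has already been removed, that whitespace means only space and tab as proposed above, and that a type 1 line carries 13 numeric tokens (colour, x y z, and the nine matrix values) between the line type and the filename. It does not validate the numbers.

```python
import re

def parse_subfile_reference(line: str):
    """Return (numeric_tokens, filename) for a type 1 line, or None otherwise."""
    # Ignore leading and trailing whitespace (space and tab only).
    stripped = line.strip(" \t")
    # Split off the first 14 tokens; the remainder is the filename, left
    # untokenized so that internal whitespace is preserved as-is.
    parts = re.split(r"[ \t]+", stripped, maxsplit=14)
    if len(parts) != 15 or parts[0] != "1":
        return None
    return parts[1:14], parts[14]
```

For example, `"  1 16 0 0 0 1 0 0 0 1 0 0 0 1   my part.dat \t"` yields the filename `my part.dat`: the whitespace before and after it is dropped, while the space inside it is kept.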
Reply
Re: Leading or trailing white space characters in file names
#45
I'm ok with this version.

I'm also ok with forking the line endings and comment lines issues.

My main reason for adjusting the comment paragraph was to make it clear it should be handled as a single string to avoid something like html parsing (where all whitespace is replaced by a single space).

As for the line endings, I was (mistakenly) under the impression the DOS format was de facto. This also sneaked into my LDCad project: it reads anything but always writes DOS breaks (even in the Linux version).
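A minimal Python sketch of the "always write DOS breaks" side, assuming the lines are already split and held without their endings. The name `join_with_crlf` is hypothetical, and this is not LDCad's actual code.

```python
def join_with_crlf(lines: list[str]) -> str:
    """Serialize lines with DOS (CR LF) terminators, whatever the platform."""
    # Terminate every line, including the last one, with CR LF.
    return "".join(line + "\r\n" for line in lines)
```

When writing the result to disk with Python's `open()` in text mode, passing `newline=""` keeps Python from translating the line endings a second time.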
Reply
Re: Leading or trailing white space characters in file names
#46
Roland Melkert Wrote:I'm also ok with forking the line endings and comment lines issues.

Just to be clear, I want to break up the discussion into individual issues because it is easier to keep track of everything.

Roland Melkert Wrote:My main reason for adjusting the comment paragraph was to make it clear it should be handled as a single string to avoid something like html parsing (where all whitespace is replaced by a single space).

I figured as much; the line about the "trailing arbitrary data" was intended to address that problem. I agree with you the comment section needs a re-write, but I think it has a few other issues too.

Roland Melkert Wrote:As for the line endings, I was (mistakenly) under the impression the DOS format was de facto. This also sneaked into my LDCad project: it reads anything but always writes DOS breaks (even in the Linux version).

It is de facto, just not formalized. This whole topic is worth its own separate discussion, although it might be a very short discussion. (Bricksmith also writes out CRLF, by the way.)

Allen
Reply
Re: Leading or trailing white space characters in file names
#47
This works for me too.
Reply
Re: Leading or trailing white space characters in file names
#48
Allen Smith Wrote:
Roland Melkert Wrote:As for the line endings, I was (mistakenly) under the impression the DOS format was de facto. This also sneaked into my LDCad project: it reads anything but always writes DOS breaks (even in the Linux version).

It is de facto, just not formalized. This whole topic is worth its own separate discussion, although it might be a very short discussion. (Bricksmith also writes out CRLF, by the way.)

Allen

DOS line-endings are de facto. It may not be formalized, but for the official Parts Library it is enforced during Parts Update packaging (yes, there were a few files that were erroneously released with bad endings, due to some buggy checking, but they have since been fixed).
Chris (LDraw Parts Library Admin)
Reply
Re: Leading or trailing white space characters in file names
#49
Allen Smith Wrote:
' Wrote:Although not mandatory, it is recommended to not use leading whitespace before the linetype token on the first line of a file. This is to assist software in determining the character encoding of the file.

I am of the opinion to completely strike the disclaimer about first-line leading whitespace.

I don't dislike Travis' proposal to discourage leading whitespace on the first line, and since it is not mandatory ...

w.
LEGO ergo sum
Reply
Re: Leading or trailing white space characters in file names
#50
Willy Tschager Wrote:
Allen Smith Wrote:
' Wrote:Although not mandatory, it is recommended to not use leading whitespace before the linetype token on the first line of a file. This is to assist software in determining the character encoding of the file.

I am of the opinion to completely strike the disclaimer about first-line leading whitespace.

I don't dislike Travis' proposal to discourage leading whitespace on the first line, and since it is not mandatory ...

The main trouble is that if it's not mandatory, it is of no real use to assist software in determining the file type, and thus doesn't really need to exist. What do others think?

If it stays, the phrase "character encoding" needs to change to "determining whether the file is an LDraw file."

Allen
Reply