Filenames
#1
Hi folks,

The official specification for texture mapping says:

Quote:Filenames that themselves contain double quotes must use \ as an escape character immediately prior to the double quote. The \ character itself must be doubled up (\\) in order to be used to specify a sub-directory, but it is recommended that / be used instead in this case. (In LDraw files, \ and / are interchangeable as path separators.)

I have several questions...

Are these sentences referring to quoted filenames specifically, or all filenames? It's not very clear from the context of the document. The paragraph starts (before the bit I've quoted) by talking about quoted filepaths specifically but then changes tack to talk about appending "textures/" which is surely not specific to a quoted filepath.

It would seem to make sense that a regular non-quoted path to a texture ought to be able to use a single backslash as a separator, since this seems to be common practice when referencing parts etc., and the quoted text says that separators are "interchangeable".

Should the backslash "escape character" work with characters other than " and a second backslash or just those two?

The main spec page is even fuzzier about filenames - no mention of spaces or quoting. Is it allowed, disallowed, discouraged? All we seem to get is just:
Quote:"Special characters, such as &, #, |, and ?, should be avoided as they may also cause cross-platform issues and create problems when used in URLs."
The use of the phrase 'such as' lacks clarity: which characters should be discouraged and which are fine? It would surely help people to know. Are any characters disallowed? e.g. are quotes and spaces allowed?

Or am I missing something obvious here?

Thanks everyone!
RE: Filenames
#2
Filenames are kind of the wild west as they are determined by the user and could be anything including emoji and crazy unicode characters (like: ෴). The only restriction is what the user's OS allows. This is why the filename spec is fuzzy: it's not really controllable by the developer.

We further restrict this in the official library to "a-z", "0-9", "_", and "-". Nothing official will contain anything other than those characters, and that is enforced by the library software.
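As a rough illustration of that restriction, here is a minimal sketch of a check a tool might run. The function name is hypothetical, and it assumes the rule applies to the part of the name before the extension, since the character set above does not include ".". The real library software may well apply further rules that are not covered in this thread.

```python
import re

# Character set taken from the post above: a-z, 0-9, "_" and "-".
_LIBRARY_STEM = re.compile(r"[a-z0-9_-]+")


def is_library_style_name(filename: str) -> bool:
    """Return True if the name (minus its extension) uses only a-z, 0-9, "_" and "-".

    Assumption: the extension is everything after the last "." and is not checked.
    """
    stem, dot, _ext = filename.rpartition(".")
    if not dot:
        # No "." at all: treat the whole string as the stem.
        stem = filename
    return _LIBRARY_STEM.fullmatch(stem) is not None
```

For example, `is_library_style_name("3001.dat")` passes, while a name containing a space or an upper-case letter does not.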

In short, if you are writing a parser you are free to impose (or not impose) whatever filename restrictions you wish; just be aware that there will always be a user that breaks your rules.
RE: Filenames
#3
(2024-01-08, 21:32)Orion Pobursky Wrote: Filenames are kind of the wild west as they are determined by the user and could be anything including emoji and crazy unicode characters (like: ෴). ... In short, if you are writing a parser you are free to impose (or not impose) whatever filename restrictions you wish; just be aware that there will always be a user that breaks your rules.

As Orion wrote, any Unicode character is allowed (limited only to the UTF-8 subset, which is basically everything, and excluding the LDraw newline characters).

So you only need to do extra work if the file path starts with " by processing the " and \ characters up to the last ". If no " or \ characters are present you can assume the given string can be used 'trimmed'.
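A minimal sketch of that approach, for anyone writing a parser. The function name is hypothetical, and two things are assumptions rather than anything stated in the spec text quoted so far: an unquoted filename is taken to end at the first space or tab, and a quoted filename is taken to end at the first double quote that is not preceded by an escaping backslash. The returned token still contains its escape sequences.

```python
def split_filename_token(rest: str) -> tuple[str, str]:
    """Split the raw filename token from whatever follows it on the line.

    Returns (raw_filename, remainder). Escapes inside a quoted filename are
    left untouched so they can be validated separately.
    """
    rest = rest.lstrip(" \t")
    if not rest.startswith('"'):
        # Unquoted: assumed to run up to the first space or tab.
        for i, ch in enumerate(rest):
            if ch in " \t":
                return rest[:i], rest[i:].lstrip(" \t")
        return rest, ""
    i = 1
    while i < len(rest):
        if rest[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
        elif rest[i] == '"':
            return rest[1:i], rest[i + 1:].lstrip(" \t")
        else:
            i += 1
    raise ValueError("unterminated quoted filename")
```

So `"my tex.png" more tokens` splits into `my tex.png` and `more tokens`, while an unquoted `tex.png extra` splits at the first space.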

Whether the local filesystem 'likes' the more creative filenames is a different story, though.
RE: Filenames
#4
(2024-01-08, 21:32)Orion Pobursky Wrote: Filenames are kind of the wild west as they are determined by the user and could be anything including emoji and crazy unicode characters (like: ෴). The only restriction is what the user's OS allows. This is why the filename spec is fuzzy: it's not really controllable by the developer.

We further restrict this in the official library to "a-z", "0-9", "_", and "-". Nothing official will contain anything other than those characters, and that is enforced by the library software.

In short, if you are writing a parser you are free to impose (or not impose) whatever filename restrictions you wish; just be aware that there will always be a user that breaks your rules.

Thanks for that clarification, that's good to know.

To answer my own questions:
1. (For texmap filenames) Are these sentences referring to quoted filenames specifically, or all filenames?
Quoted filenames specifically. Non-quoted texmap filenames are always treated literally, terminating at the end of the line or at whitespace ( tab(9) or space(32) ).

2. (For quoted filenames) should the backslash "escape character" work with characters other than " and a second backslash or just those two?
Let's say yes, it should work with other characters (as this is often how escape characters work in other contexts).

3. The main spec page is even fuzzier about filenames - no mention of spaces or quoting. Is it allowed, disallowed, discouraged?
Filenames in this context (i.e. not texture map filenames) are interpreted literally, so quotes and spaces have no special meaning - they are regular characters in the filename, and there is no escape character. Basically everything up to the end of the line is a filename, but with whitespace trimmed at each end.

4. The use of the phrase 'such as' lacks clarity: which characters should be discouraged and which are fine?
All UTF-8 characters are allowed. However, filing systems in different operating systems have their own restrictions on the characters allowed in filenames, and other characters create problems for URLs. So for compatibility reasons these should be avoided. In particular the characters below can cause problems:
< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
\ (backslash)
? (question mark)
* (asterisk)
& (ampersand)
# (hash)
| (vertical bar or pipe)

Let me know if you disagree with any of these answers, and thanks for your help.
RE: Filenames
#5
(2024-01-09, 10:11)Toby Nelson Wrote: Thanks for that clarification, that's good to know.

To answer my own questions:
1. (For texmap filenames) Are these sentences referring to quoted filenames specifically, or all filenames?
Quoted filenames specifically. Non-quoted texmap filenames are always treated literally, terminating at the end of the line or at whitespace ( tab(9) or space(32) ).

Actually, the first sentence in that paragraph (about putting quotes around filenames with spaces) stands alone. Nothing else in the paragraph is dependent on it. So quotes and backslashes anywhere in the filename always need to be escaped with a backslash.

(2024-01-09, 10:11)Toby Nelson Wrote: 2. (For quoted filenames) should the backslash "escape character" work with characters other than " and a second backslash or just those two?
Let's say yes, it should work with other characters (as this is often how escape characters work in other contexts).

No, only double quote and backslash are supported.

(2024-01-09, 10:11)Toby Nelson Wrote: 3. The main spec page is even fuzzier about filenames - no mention of spaces or quoting. Is it allowed, disallowed, discouraged?
Filenames in this context (i.e. not texture map filenames) are interpreted literally, so quotes and spaces have no special meaning - they are regular characters in the filename, and there is no escape character. Basically everything up to the end of the line is a filename, but with whitespace trimmed at each end.

Spaces are allowed in model reference lines, and quoting of filenames is not allowed. Everything from the first character of the model reference filename to the end of the line should be treated as the filename. I believe that the quote wrapping for TEXMAP was done purely so that other tokens could be added after the filename, and that has never been an issue with model filenames.

(2024-01-09, 10:11)Toby Nelson Wrote: 4. The use of the phrase 'such as' lacks clarity: which characters should be discouraged and which are fine?
All UTF-8 characters are allowed. However, filing systems in different operating systems have their own restrictions on the characters allowed in filenames, and other characters create problems for URLs. So for compatibility reasons these should be avoided. In particular the characters below can cause problems:
< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
\ (backslash)
? (question mark)
* (asterisk)
& (ampersand)
# (hash)
| (vertical bar or pipe)

Filenames for both textures and model references (outside library parts) are chosen by the end user. As such, we have to assume that they will choose anything that their operating system allows them to choose. So adding to what is already there (along with separate parts library filename restrictions) is counter-productive. Note that both forward and backward slash characters in all filenames must always be treated as directory separators by LDraw-compliant software. As for the others, most should actually work fine. Colon won't work on Windows (and probably also not on Mac). Some of the others cause problems on a command line due to shell interpretation, but that shouldn't affect software that is reading the files.
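The separator rule can be sketched in a couple of lines. The function name is hypothetical, and for TEXMAP filenames this assumes any escape sequences have already been resolved, so a doubled backslash has already become a single one.

```python
def ldraw_path_parts(reference: str) -> list[str]:
    """Split a filename reference into components, treating "\\" and "/" alike."""
    return reference.replace("\\", "/").split("/")
```

A caller could then rebuild a native path with something like `os.path.join(*ldraw_path_parts(name))`.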
RE: Filenames
#6
(2024-01-09, 17:06)Travis Cobbs Wrote: Actually, the first sentence in that paragraph (about putting quotes around filenames with spaces) stands alone. Nothing else in the paragraph is dependent on it. So quotes and backslashes anywhere in the filename always need to be escaped with a backslash.
Ok.

Quote:No, only double quote and backslash are supported.
Ok. To clarify, if a backslash is followed by something else (i.e. not double quote or backslash), the backslash should be interpreted as a regular character as part of the filename itself and not as a directory separator. (Note the spec says "The \ character itself must be doubled up (\\) in order to be used to specify a sub-directory...")

I would hope that this:

    s\\\hello.png

would properly be interpreted as a file "\hello.png" in subdirectory "s" (this is treating the first pair of backslashes as the directory separator), rather than file "hello.png" in a subdirectory named "s\" (regarding the final pair of backslashes as the directory separator).

Quote:Spaces are allowed in model reference lines, and quoting of filenames is not allowed. Everything from the first character of the model reference filename to the end of the line should be treated as the filename. I believe that the quote wrapping for TEXMAP was done purely so that other tokens could be added after the filename, and that has never been an issue with model filenames.
Yes, that makes sense, except I would note that for model reference lines, whitespace before the start and after the end of the filename (including the file's extension) should probably be ignored (so you can't have a filename starting with a space for instance).

Quote:Filenames for both textures and model references (outside library parts) are chosen by the end user. As such, we have to assume that they will choose anything that their operating system allows them to choose.
Agreed, the user can choose whatever they want.

Quote:So adding to what is already there (along with separate parts library filename restrictions) is counter-productive.
I'm adding characters that have special meanings in Windows filenames. I'm not saying these should be banned, just discouraged for the purposes of cross-platform compatibility. In particular I think that parsing software *should* cope with as many of these as it possibly can (ideally all of them). However a user creating or using files with any of these characters present is still more prone to cross platform compatibility problems IMHO.

Quote:Colon won't work on Windows (and probably also not on Mac).
Agreed. Its use should therefore definitely be discouraged.

Quote:Some of the others cause problems on a command line due to shell interpretation, but that shouldn't affect software that is reading the files.
People use shells, and I bet there are Windows file APIs for which they have special meaning. I'd regard them as risky.
RE: Filenames
#7
For reference: https://learn.microsoft.com/en-us/window...ing-a-file
RE: Filenames
#8
(2024-01-10, 10:58)Toby Nelson Wrote: Ok.

Ok. To clarify, if a backslash is followed by something else (i.e. not double quote or backslash), the backslash should be interpreted as a regular character as part of the filename itself and not as a directory separator. (Note the spec says "The \ character itself must be doubled up (\\) in order to be used to specify a sub-directory...")

No, a backslash followed by any character other than backslash or double quote is invalid and should cause the entire line to be rejected by LDraw-compliant software.

(2024-01-10, 10:58)Toby Nelson Wrote: I would hope that this:

    s\\\hello.png

would properly be interpreted as a file "\hello.png" in subdirectory "s" (this is treating the first pair of backslashes as the directory separator), rather than file "hello.png" in a subdirectory named "s\" (regarding the final pair of backslashes as the directory separator).

An LDraw program seeing that texture filename should reject the TEXMAP line entirely.
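A minimal sketch of that strict behaviour, for illustration only. The function name is hypothetical, it assumes any wrapping double quotes have already been removed, and raising `ValueError` stands in for "reject the TEXMAP line".

```python
def unescape_texmap_filename(raw: str) -> str:
    """Resolve \\" and \\\\ escapes; reject anything else.

    Raises ValueError for a backslash followed by any other character
    (or by nothing), and for a double quote that is not escaped.
    """
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\":
            nxt = raw[i + 1] if i + 1 < len(raw) else ""
            if nxt not in ('"', "\\"):
                raise ValueError(f"invalid escape at position {i}")
            out.append(nxt)
            i += 2
        elif ch == '"':
            raise ValueError(f"unescaped double quote at position {i}")
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

With this, `s\\hello.png` resolves to a single backslash between `s` and `hello.png`, while `s\\\hello.png` raises because the third backslash is followed by `h`.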

(2024-01-10, 10:58)Toby Nelson Wrote: Yes, that makes sense, except I would note that for model reference lines, whitespace before the start and after the end of the filename (including the file's extension) should probably be ignored (so you can't have a filename starting with a space for instance).

You are correct; I misspoke. The main LDraw spec later states: Any leading and/or trailing whitespace must be ignored.
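For model reference lines, the combined rule (everything after the numeric fields is the filename, with leading and trailing whitespace ignored) might be sketched like this. The function name is hypothetical, and the sketch assumes a type 1 line has exactly 14 whitespace-separated fields before the filename: the line type, the colour, and twelve numbers.

```python
def model_reference_filename(line: str) -> str:
    """Return the filename from a type 1 (sub-file reference) line.

    Quotes and interior spaces are kept as ordinary filename characters;
    only whitespace at either end is dropped.
    """
    # Split off the 14 leading fields; the remainder is kept in one piece.
    fields = line.strip().split(None, 14)
    if len(fields) < 15 or fields[0] != "1":
        raise ValueError("not a complete type 1 line")
    return fields[14].strip()
```

Note that a filename wrapped in double quotes comes back with the quotes still attached, since quoting has no meaning on these lines.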

(2024-01-10, 10:58)Toby Nelson Wrote: Agreed, the user can choose whatever they want.

I'm adding characters that have special meanings in Windows filenames. I'm not saying these should be banned, just discouraged for the purposes of cross-platform compatibility. In particular I think that parsing software *should* cope with as many of these as it possibly can (ideally all of them). However a user creating or using files with any of these characters present is still more prone to cross platform compatibility problems IMHO.

In my personal opinion, only colon should be mentioned at all (and it isn't on the current list of characters to avoid).

(2024-01-10, 10:58)Toby Nelson Wrote: Agreed. Its use should therefore definitely be discouraged.

People use shells, and I bet there are Windows file APIs for which they have special meaning. I'd regard them as risky.

I don't believe that native file APIs on any OSes have problems with any characters that are valid to use, and filenames referenced inside LDraw files should not ever be accessible from the command line. Any shell script that wants to parse an LDraw file and pass bits and pieces around already is going to have to deal with spaces in the filenames. And I still feel that these specs aren't an appropriate place to try to enumerate potentially problematic characters when the people using the software to create LDraw files will never read the specs.
RE: Filenames
#9
(2024-01-10, 18:01)Travis Cobbs Wrote: And I still feel that these specs aren't an appropriate place to try to enumerate potentially problematic characters when the people using the software to create LDraw files will never read the specs.

I agree, and also modern filesystems will allow just about anything.

For example, I tried this on ext4:
Code:
-rw-rw-r-- 1 roland roland 0 Jan 10 23:57 '   <>:"\?'$'\n''*&#|  '

It's all the 'special' characters mentioned above except the slash; it even has spaces before and after, and a newline in it.
RE: Filenames
#10
(2024-01-10, 18:01)Travis Cobbs Wrote: I don't believe that native file APIs on any OSes have problems with any characters that are valid to use, and filenames referenced inside LDraw files should not ever be accessible from the command line. Any shell script that wants to parse an LDraw file and pass bits and pieces around already is going to have to deal with spaces in the filenames. And I still feel that these specs aren't an appropriate place to try to enumerate potentially problematic characters when the people using the software to create LDraw files will never read the specs.
Thanks for the clarifications.

By definition if characters are 'valid to use', then they will work. But if you read the link I gave above you will see that there are a number of filing systems under Windows that have invalid characters. These are the characters in question.

I agree that users won't be reading the specs, but tool makers will, and they may find it useful. Tools could warn users about potential cross platform issues when writing or verifying the validity of files for example.
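As a rough illustration of such a warning, here is a minimal sketch using the character list from post #4. The names are hypothetical. It is meant to be applied to a single path component, since "/" and "\" are directory separators in LDraw references, and a non-empty result would only be grounds for a warning, not for rejecting the file.

```python
# Characters listed in post #4 as potentially problematic across platforms.
RISKY_CHARS = set('<>:"/\\?*&#|')


def risky_characters(name: str) -> list[str]:
    """Return the distinct risky characters found in a single filename component."""
    return sorted({ch for ch in name if ch in RISKY_CHARS})
```

A tool could call this when saving or verifying a file and show the returned characters in its warning message.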

(2024-01-10, 23:03)Roland Melkert Wrote: I agree, and also modern filesystems will allow just about anything.

For example, I tried this on ext4:
Code:
-rw-rw-r-- 1 roland roland 0 Jan 10 23:57 '   <>:"\?'$'\n''*&#|  '

It's all the 'special' characters mentioned above except the slash; it even has spaces before and after, and a newline in it.
Windows has filing systems that are still problematic though, right? See the link I gave above for more details on this.

Again, I'm not trying to ban any characters at all. None. They are all valid. The current wording in the specs says that a few special characters "should be avoided", which is not a ban. If it said "must be avoided", that would be an outright ban.

So the question arises: Should the specification document take a position on whether cross platform compatibility and URL compatibility is a good thing at all?
Personally I think it is useful for the reasons I've given. If others think so too, then I'd like to see the list of problem characters extended. It's never going to be a complete solution to cross platform compatibility (since the user can do what they like) but it can't hurt either, right?
RE: Filenames
#11
(2024-01-11, 12:19)Toby Nelson Wrote: So the question arises: Should the specification document take a position on whether cross platform compatibility and URL compatibility is a good thing at all?
I don't think so; it's up to the developer to decide depending on their targets.

Personally I don't want to limit the whole world for the sake of cross-compatibility with out-dated filesystems.

On the other hand, we could state it is recommended to limit names if one is planning to share their stuff worldwide.
RE: Filenames
#12
(2024-01-10, 11:23)Toby Nelson Wrote: For reference: https://learn.microsoft.com/en-us/window...ing-a-file

I'm pretty sure that NTFS supports most of the characters that that article suggests avoiding. And I feel that the article is quite frankly disgusting to even imply that it's a good idea for any modern software to prevent their users from using all those characters in filenames.

<s>
Code:
1 1 0 0 0 1 0 0 0 1 0 0 0 1 nul
</s>
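For tool authors who do want to flag a reference like the one above, a minimal sketch follows. The names are hypothetical, and the set is limited to the long-standing Windows device names (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9); it is not claimed to be complete. It assumes a reserved name followed by an extension should also be flagged.

```python
# Long-standing Windows device names; assumed subset, not an exhaustive list.
WINDOWS_RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{n}" for n in range(1, 10)}
    | {f"LPT{n}" for n in range(1, 10)}
)


def is_windows_reserved(name: str) -> bool:
    """Return True if the name, ignoring case and any extension, is a device name."""
    stem = name.split(".", 1)[0]
    return stem.strip().upper() in WINDOWS_RESERVED
```

This is only a check a tool could use for a warning; it says nothing about what the spec allows.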
RE: Filenames
#13
(2024-01-12, 1:40)Travis Cobbs Wrote: I'm pretty sure that NTFS supports most of the characters that that article suggests avoiding. And I feel that the article is quite frankly disgusting to even imply that it's a good idea for any modern software to prevent their users from using all those characters in filenames.

<s>
Code:
1 1 0 0 0 1 0 0 0 1 0 0 0 1 nul
</s>

NTFS does have some character restrictions, but other filing systems are still being used that have more. Like FAT32 for example.

I don't think the article is hinting at restricting the user's choice. I really don't read it that way. The article is aimed at developers. Developers create and distribute files without user input. For example if I ship an application on Windows, it would be nice to know that any resource files that I choose to ship with that application are going to work across multiple filing systems, not just one.

In the same way, if I distribute ldraw files it would be nice to know they have a chance of working on a wide range of systems.
RE: Filenames
#14
(2024-01-12, 15:01)Toby Nelson Wrote: In the same way, if I distribute ldraw files it would be nice to know they have a chance of working on a wide range of systems.

This is why the Official Library has more restrictive rules for file naming and more restrictions in general.
RE: Filenames
#15
(2024-01-11, 22:38)Roland Melkert Wrote: I don't think so; it's up to the developer to decide depending on their targets.

Personally I don't want to limit the whole world for the sake of cross-compatibility with out-dated filesystems.

On the other hand, we could state it is recommended to limit names if one is planning to share their stuff worldwide.

No limitations asked for, just clarification in the specs.

Perhaps it could be moved into a 'Guidelines' section, to emphasise it's more of a 'best practice in some circumstances' than a rule.