funny characters in identifiers?
BGB
cr88192 at gmail.com
Fri Dec 31 14:36:57 PST 2010
On 12/31/2010 2:25 PM, Per Bothner wrote:
> On 12/28/2010 01:58 PM, Charles Oliver Nutter wrote:
>> On Tue, Dec 28, 2010 at 12:21 PM, Per Bothner<per at bothner.com> wrote:
>>> Is there a plan/consensus for how to handle "illegal" characters
>>> in identifiers? I'm primarily interested in the bytecode level,
>>> not the Java source level. For example identifiers like '/'
>>> used for division in Scheme. It would be good to have a standard
>>> way to deal with this.
>> See John Rose's post on this here:
>> http://blogs.sun.com/jrose/entry/symbolic_freedom_in_the_vm
>>
>> We have implemented it in JRuby, and it works well. The down side is
>> that Java backtraces can be a little hard to read when there's lots of
>> symbolic identifiers.
> A problem with this mangling is that it isn't "safe" for class names,
> or at least not for class files. Using '\' in a filename is obviously
> problematical, especially on Windows. On Posix-based file systems the
> funny characters are in principle allowed, but will of course be awkward
> to access from shells and other tools.
>
> Windows disallows the following in file names:
> < (less than)
> > (greater than)
> : (colon)
> " (double quote)
> / (forward slash)
> \ (backslash)
> | (vertical bar or pipe)
> ? (question mark)
> * (asterisk)
> http://msdn.microsoft.com/en-us/library/aa365247(v=vs.85).aspx
> (And of course we have problems with case-insensitive file systems.)
>
> Now of course we can use an annotation to specify the source class name
> in case the source class name is invalid - but then we still need to
> mangle the class name somehow.
>
> I think a better prefix character would be '%'. It's not reserved
> for Posix or Windows or JVM, while not being a valid Java character.
> Even better might be '~' or '!' since those are also unreserved for URIs.
> I will assume '~' in the following.
>
> If we want names that are "safe for filenames" or even "safe for URIs"
> then the problem is that there are too many unsafe characters to
> encode as '~' followed by a safe non-alphanumeric. Which means that
> we need to use '~' followed by a *letter*.
>
> For example:
> '/' -> '~s' (mnemonic: slash)
> '.' -> '~d' (dot)
> '<' -> '~l' (less)
> etc etc
>
> What about non-Ascii characters? I don't know enough to know if
> such characters might cause a problem, but I don't know of any reason
> they would. They might technically be disallowed by URIs, but my
> impression is that %-mangling is handled somewhat universally and
> semi-transparently.
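For reference, the '~'-plus-letter idea quoted above could be sketched roughly as below. This is only an illustration: the class and method names (TildeMangler, mangle, demangle, containsWindowsReserved) are made up here, and the table holds just the three mappings actually given in the quoted mail, since the rest of the table is left open there. Characters without a mapping pass through unchanged, and '~' itself is not escaped, so the round trip is only reliable for names that contain no '~'.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; not an agreed-upon or standard scheme. */
public final class TildeMangler {
    private static final Map<Character, Character> ENCODE = new LinkedHashMap<>();
    private static final Map<Character, Character> DECODE = new LinkedHashMap<>();

    static {
        // Only the three mappings listed in the quoted mail.
        put('/', 's'); // slash
        put('.', 'd'); // dot
        put('<', 'l'); // less
    }

    private TildeMangler() {}

    private static void put(char raw, char letter) {
        ENCODE.put(raw, letter);
        DECODE.put(letter, raw);
    }

    /** Replace each mapped character with '~' plus its mnemonic letter. */
    public static String mangle(String name) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            Character letter = ENCODE.get(c);
            if (letter != null) {
                out.append('~').append(letter.charValue());
            } else {
                out.append(c); // assumption: unmapped characters are left alone
            }
        }
        return out.toString();
    }

    /** Reverse mangle(); a '~' not followed by a known letter is kept as-is. */
    public static String demangle(String mangled) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < mangled.length(); i++) {
            char c = mangled.charAt(i);
            if (c == '~' && i + 1 < mangled.length()
                    && DECODE.containsKey(mangled.charAt(i + 1))) {
                out.append(DECODE.get(mangled.charAt(i + 1)).charValue());
                i++; // skip the mnemonic letter
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** True if the name contains a character from the Windows list quoted above. */
    public static boolean containsWindowsReserved(String name) {
        final String reserved = "<>:\"/\\|?*";
        for (int i = 0; i < name.length(); i++) {
            if (reserved.indexOf(name.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }
}
```

With this table, TildeMangler.mangle("a/b") gives "a~sb", and the result no longer trips containsWindowsReserved. A name containing '>' would still be flagged until a mapping for it is added.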
just my quick comment...
in my VM, I ended up using a variation on JNI name-mangling for pretty
much anything needing mangling (including filenames...).
however, I did add a few additional escapes (for a few other common
characters), and ended up adding a _9xx escape in addition to the _0xxxx
escape.
list of other escapes:
'_' with '_1';
';' with '_2';
'[' with '_3';
'(' with '_4';
')' with '_5';
'/' with '_6'.
as well, '__' was used as a string-break (mostly when encoding a list of
strings as a single token).
so, nothing says something similar couldn't be used for the class
filenames as well, if needed...
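a rough sketch of the scheme described above, in Java (this is not the actual VM code, just an illustration; the class and method names are made up). two things here are assumptions not spelled out above: that '_9xx' means two hex digits for characters below 0x100, and that '_0xxxx' means four lowercase hex digits per UTF-16 unit as in JNI. ASCII letters and digits pass through unchanged.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative sketch of a JNI-style mangler with the extra escapes. */
public final class JniStyleMangler {
    private JniStyleMangler() {}

    /** Mangle a single name so only ASCII letters, digits and '_' remain. */
    public static String mangle(String name) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean plain = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (plain) {
                out.append(c);
                continue;
            }
            switch (c) {
                case '_': out.append("_1"); break;
                case ';': out.append("_2"); break;
                case '[': out.append("_3"); break;
                case '(': out.append("_4"); break;
                case ')': out.append("_5"); break;
                case '/': out.append("_6"); break;
                default:
                    if (c < 0x100) {
                        // assumption: _9xx carries two hex digits
                        out.append(String.format("_9%02x", (int) c));
                    } else {
                        // assumption: _0xxxx carries four hex digits, as in JNI
                        out.append(String.format("_0%04x", (int) c));
                    }
            }
        }
        return out.toString();
    }

    /** Encode several names as one token, using "__" as the string-break. */
    public static String mangleList(List<String> names) {
        return names.stream()
                .map(JniStyleMangler::mangle)
                .collect(Collectors.joining("__"));
    }
}
```

since every '_' coming out of mangle() is followed by a digit, "__" never shows up inside a single mangled name, which is what lets it work as a separator. for example, mangleList(List.of("a_b", "c/d")) gives "a_1b__c_6d".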
dunno if this helps for anything...
More information about the mlvm-dev
mailing list