Valid characters in a module name

Peter Levart peter.levart at gmail.com
Sat Jan 7 06:44:04 UTC 2017


Hi Ess,

I have been reminded that the syntax for "Exotic identifiers" in the 
Java language, as proposed for JDK 7 but then withdrawn, used the '#' 
character as a prefix in front of a classical string literal:

http://mail.openjdk.java.net/pipermail/coin-dev/2009-March/001131.html

In my previous message I mistakenly used the syntax of Obj-C NSString 
literals, which uses '@', instead.

On 01/07/2017 01:40 AM, Ess Kay wrote:
> As far as I can tell, the complete string @"What a wonderful world!" 
> is itself a valid module, package, class, field and method name.

And using the syntax for exotic identifiers, it would be expressed as:

#"@\"What a wonderful world!\""


> The '@' character has a reserved status in a module name but the JVM 
> spec says that it may appear with some yet to be published meaning.  
> Almost every possible string of printable characters is a valid 
> module, package, class, field and method name.  For example, the 
> string \u0022\" is a valid 8 character Java field or method name.

Written with exotic identifier syntax as:

#"\\u0022\\\""

> The string \\u0022\\" is a valid 10 character Java field or method 
> name. So a solution that uses escape characters is not as obvious as 
> it may appear at first glance. You could even throw in a leading, 
> embedded and trailing space and it would still be valid.

No problem. Any sequence of Unicode characters is expressible as a 
string literal and consequently as an exotic identifier when prefixed 
with #.
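
To illustrate, here is a minimal sketch of how a tool might render an arbitrary name in the #"..." notation. The class and method names (ExoticIdentifiers, quote) are hypothetical, and the escaping rules are assumed to be those of ordinary Java string literals; the coin-dev proposal may differ in details.

```java
final class ExoticIdentifiers {
    private ExoticIdentifiers() {}

    // Renders a raw name as #"..." using Java string-literal escapes.
    // Assumption: the notation follows plain string-literal escaping.
    static String quote(String rawName) {
        StringBuilder sb = new StringBuilder("#\"");
        for (int i = 0; i < rawName.length(); i++) {
            char c = rawName.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\b': sb.append("\\b");  break;
                case '\t': sb.append("\\t");  break;
                case '\n': sb.append("\\n");  break;
                case '\f': sb.append("\\f");  break;
                case '\r': sb.append("\\r");  break;
                default:
                    if (c < 0x20 || c == 0x7f) {
                        // Remaining control characters: four-digit hex escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }
}
```

Applied to the two names discussed above, quote yields the same two forms that are written out by hand in this message. Spaces need no escaping, so leading, embedded and trailing spaces pass through unchanged.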

>
> I haven't yet tested this but, prima facie, even non-printable 
> characters such as backspaces and carriage returns are permitted in 
> package, class, field and method names (but not module names.)  Does 
> the JVM support some escaping scheme to allow such characters in JAR 
> manifests and service provider specifications? If the answer is yes 
> then what is it?  If the answer is no then doesn't that demonstrate 
> the absurdity of the situation?

It appears that NUL, CR, and LF can't be part of header values in JAR 
manifests, but other characters can:

http://docs.oracle.com/javase/7/docs/technotes/guides/jar/jar.html#Manifest_Specification

    Notes on Manifest and Signature Files

      Line length:
        No line may be longer than 72 bytes (not characters), in its
        UTF8-encoded form. If a value would make the initial line longer
        than this, it should be continued on extra lines (each starting
        with a single SPACE).

      Limitations:
        Because header names cannot be continued, the maximum length of
        a header name is 70 bytes (there must be a colon and a SPACE
        after the name).
        NUL, CR, and LF can't be embedded in header values, and NUL, CR,
        LF and ":" can't be embedded in header names.
        Implementations should support 65535-byte (not character) header
        values, and 65535 headers per file. They might run out of
        memory, but there should not be hard-coded limits below these
        values.
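
As a rough sketch of what those two notes imply for a tool, the following checks whether a name can be carried in a manifest header value at all, and whether the header would spill past the first 72-byte line. The class, the method names and the "Example-Name" header used in the usage note are hypothetical; only the NUL/CR/LF restriction and the 72-byte figure come from the specification quoted above.

```java
import java.nio.charset.StandardCharsets;

final class ManifestValues {
    private ManifestValues() {}

    // True if the value contains none of NUL, CR or LF, the only
    // characters the quoted notes exclude from header values.
    static boolean canEmbed(String value) {
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\0' || c == '\r' || c == '\n') {
                return false;
            }
        }
        return true;
    }

    // True if "name: value" exceeds 72 bytes in UTF-8 and would
    // therefore have to be continued on extra lines.
    static boolean needsContinuation(String headerName, String value) {
        String line = headerName + ": " + value;
        return line.getBytes(StandardCharsets.UTF_8).length > 72;
    }
}
```

For example, ManifestValues.canEmbed("back\bspace") is true while ManifestValues.canEmbed("a\rb") is false, so a backspace could appear in a header value but a carriage return could not. The length check counts bytes rather than characters, as the note requires, so ManifestValues.needsContinuation("Example-Name", value) can become true for a short value that contains many non-ASCII characters.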

Regards, Peter

>
> So at this point Alan's suggested initial 'do nothing' approach is 
> attractive. For now, the flexibility that the JVM spec gives is 
> totally gratuitous in that no one as yet appears to have had any 
> reason to make use of it.
>
> On Sat, Jan 7, 2017 at 8:45 AM, Peter Levart
> <peter.levart at gmail.com> wrote:
>
>     Hi Ess,
>
>
>     On 01/06/2017 05:27 AM, Ess Kay wrote:
>>>     chances of meeting a module-info.class with funky module names is low
>>     When I raised the initial question, I had no idea that the Java verifier
>>     had been changed (with Java 6?) to allow "funky" package, class, field and
>>     method names. Somehow that change passed right under the radar. Yes - a
>>     possible option would be to simply ignore the broad character range allowed
>>     by the JVM specification and trust that in practice no one would actually
>>     use the unusual characters in package, class, field, method or module names.
>>     A downside to that option is that we will no longer be able to say to our
>>     users that we fully support the JVM specification which in some cases can
>>     be a problem. Anyway, I guess it is time to accept the overwhelming inertia
>>     of the status quo and move on to the next problem.
>
>     If I remember correctly, there was a crazy proposal in the past to
>     specify a syntax for arbitrary symbol names in Java. It went
>     roughly like:
>
>     @"the syntax of Java string in here"
>
>
>     So you could write code like:
>
>
>     public class @"What a wonderful world!" {
>         public static void @"Let's party..."() {
>         }
>     }
>
>     //
>     @"What a wonderful world!".@"Let's party..."();
>
>
>     You could adopt this in your tool, what do you think?
>
>     Regards, Peter
>
>


