Proposal: #ModuleNameCharacters

mark.reinhold at oracle.com
Wed Dec 7 23:40:17 UTC 2016


2016/12/6 0:08:58 -0800, forax at univ-mlv.fr:
> 2016/11/29 16:11:02 -0800, mark.reinhold at oracle.com:
>> ...
>> 
>> As I wrote in my reply to David, I'm open to lifting the traditional
>> restrictions on the class-file representation of qualified names in the
>> case of module names. 
> 
> Ok, cool.
> 
>>                         Given the weight of tradition and the past value
>> of the existing constraints, however, I'd like to have a more compelling
>> reason than "some future hypothetical module system might need this
>> flexibility".
> 
> Existing constraints exist because a package name is a part of a
> qualified class name. There is no tradition for module names. Module
> names in the class file are not mixed with other constrained names, so
> I see no compelling reason to add arbitrary rules to try to restrict
> module names.

Okay, okay ... taken together with David's examples, I get the point.

(Personally I've always considered the whole `.`-to-`/` mapping kind
 of archaic anyway.)

> Note that JLS module names have to be parsed by the compiler, so for
> JLS module names, having the same constraints as any other qualified
> identifier makes sense, but here we're talking about module names in
> the JVM spec, not in the JLS.

Correct.

> Now, the constant pool is typed and structured. If we want to have
> constraints on module names then, in my opinion, we should introduce a
> new constant pool item to make it clear that module names are not plain
> names but specific names, exactly as there is a Class constant pool
> item.

Agreed.  This is, in fact, an inconsistency in the present proposal,
since it imposes constraints on otherwise untagged CONSTANT_Utf8
structures.  If we're going to impose constraints on free-standing
module and package names then we should introduce the obvious new
`CONSTANT_Module_info` and `CONSTANT_Package_info` structures.
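As a rough illustration of what such tagged structures could look like
to a class-file reader, here is a minimal sketch.  It assumes the new
entries mirror the shape of `CONSTANT_Class_info` (a one-byte tag
followed by a two-byte index of a CONSTANT_Utf8 entry).  The tag values,
the class name `ModuleConstantSketch`, and the `NameRef` holder are
assumptions made for the example; nothing in this thread fixes them.

```java
import java.nio.ByteBuffer;

final class ModuleConstantSketch {
    // Assumed tag values, for illustration only.
    static final int CONSTANT_MODULE = 19;
    static final int CONSTANT_PACKAGE = 20;

    /** A tagged reference to a CONSTANT_Utf8 entry holding a name. */
    static final class NameRef {
        final int tag;        // says whether the name is a module or a package
        final int nameIndex;  // constant pool index of the Utf8 entry

        NameRef(int tag, int nameIndex) {
            this.tag = tag;
            this.nameIndex = nameIndex;
        }
    }

    /** Decodes one three-byte entry: u1 tag, then u2 name_index (big-endian). */
    static NameRef read(byte[] entry) {
        ByteBuffer buf = ByteBuffer.wrap(entry); // big-endian by default
        int tag = buf.get() & 0xFF;
        if (tag != CONSTANT_MODULE && tag != CONSTANT_PACKAGE) {
            throw new IllegalArgumentException("unexpected tag " + tag);
        }
        int nameIndex = buf.getShort() & 0xFFFF;
        return new NameRef(tag, nameIndex);
    }
}
```

Because the tag travels with the reference, a verifier can apply
module-name or package-name constraints only to the Utf8 entries that
are reached through one of these items, and leave every other
CONSTANT_Utf8 structure unconstrained.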

> And with my ASM hat on, having to add replace('.', '/') and
> replace('/', '.') in the right places is error-prone; if we can avoid
> that, it's a big win in terms of usability.

Yep.
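
To make the hazard concrete, here is a minimal sketch of the two
conversions a class-file tool has to keep straight today for class and
package names.  The class and method names (`InternalNames`,
`toInternalName`, `toSourceName`) are hypothetical helpers, not part of
ASM or any other library.

```java
final class InternalNames {
    /** Source-level form to class-file form: "java.lang.String" becomes "java/lang/String". */
    static String toInternalName(String sourceName) {
        return sourceName.replace('.', '/');
    }

    /** Class-file form back to source-level form: "java/lang/String" becomes "java.lang.String". */
    static String toSourceName(String internalName) {
        return internalName.replace('/', '.');
    }
}
```

Every place a name crosses between the two forms needs exactly one of
these calls.  A missed call, or one applied twice in the wrong
direction, yields a name that looks plausible but never matches.  If
module names are stored in class files exactly as written, neither call
is needed for them.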

>> In trying to think about the future I do wonder if, today, we should
>> reserve a character or two just in case we discover five or ten years
>> from now that we need to add more structure to module names.  Should
>> we set aside `:`, or perhaps some other character, just in case?
> 
> If we want structure, we will add another constant pool item. That's
> what Valhalla does for parameterized types.

So the question is, then: Which, if any, characters should we reserve?

Peering into the myriad alternate visions swirling around in my cloudy
crystal ball, I can see:

  - A structured namespace of modules.  `:` is a logical separator here,
    even in the source language if need be, so let's reserve it now in
    class files.

  - Module names encoded in class files together with specific version
    strings, to form compound module identifiers.  We already use `@` to
    separate module names from version strings in the module-system API
    (e.g., the result of `ModuleDescriptor::toString`), so let's reserve
    that in class files now.

(This is just my imagination, not specific suggestions for the future!)

Additionally we should reserve the universal escape character (`\`) and
for sanity also forbid any character whose code point is less than 0x20
(` `).  (Ideally we'd forbid all Unicode non-printing characters, but
it's best not to have the JVMS depend upon the Unicode specification.)

To sum up: Reserve `:`, `@`, and `\` for future use, and forbid the ASCII
non-printing characters (< 0x20).
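
A minimal sketch of a check for the rule just summarized, for anyone who
wants to try it against their own module names.  The class and method
names are hypothetical, and the sketch covers only the characters listed
above; it makes no claim about any other constraint the JVMS might
impose.

```java
final class ModuleNameCheck {
    /** Returns true if the name avoids `:`, `@`, `\`, and all code points below 0x20. */
    static boolean usesOnlyPermittedCharacters(String name) {
        return name.codePoints().noneMatch(ModuleNameCheck::isReservedOrForbidden);
    }

    private static boolean isReservedOrForbidden(int codePoint) {
        return codePoint < 0x20      // ASCII non-printing characters
            || codePoint == ':'      // reserved: possible namespace separator
            || codePoint == '@'      // reserved: possible name/version separator
            || codePoint == '\\';    // reserved: universal escape character
    }
}
```

Under this check `java.base` and `my-module 1` pass (a space is 0x20
itself, so it is not below the cutoff), while `foo:bar`, `foo@1.0`, and
any name containing a tab or a backslash are rejected.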

David -- Are these restrictions acceptable in your use cases, or if not
then at least tolerable?  I'm pretty sure I've never seen any of these
characters in Java EE module names, JAR file base names, Maven group or
artifact names, or the other examples you mentioned.

- Mark


More information about the jpms-spec-experts mailing list