Proposal: #ModuleNameCharacters
David M. Lloyd
david.lloyd at redhat.com
Thu Dec 8 16:26:40 UTC 2016
On 12/08/2016 04:26 AM, forax at univ-mlv.fr wrote:
> ----- Original Message -----
>> From: "mark reinhold" <mark.reinhold at oracle.com>
>> To: forax at univ-mlv.fr
>> Cc: jpms-spec-experts at openjdk.java.net
>> Sent: Thursday, December 8, 2016 00:40:17
>> Subject: Re: Proposal: #ModuleNameCharacters
>
>> 2016/12/6 0:08:58 -0800, forax at univ-mlv.fr:
>>> 2016/11/29 16:11:02 -0800, mark.reinhold at oracle.com:
>>>> ...
>>>>
>>>> As I wrote in my reply to David, I'm open to lifting the traditional
>>>> restrictions on the class-file representation of qualified names in the
>>>> case of module names.
>>>
>>> Ok, cool.
>>>
>>>> Given the weight of tradition and the past value
>>>> of the existing constraints, however, I'd like to have a more compelling
>>>> reason than "some future hypothetical module system might need this
>>>> flexibility".
>>>
>>> Existing constraints exist because a package name is a part of a
>>> qualified class name. There is no tradition for module names. Module
>>> names in the class file are not mixed with other constrained names, so
>>> I see no compelling reason to add arbitrary rules to try to restrict
>>> module names.
>>
>> Okay, okay ... taken together with David's examples, I get the point.
>>
>> (Personally I've always considered the whole `.`-to-`/` mapping kind
>> of archaic anyway.)
>>
>>> Note that, JLS module names have to be parsed by the compiler, so for
>>> JLS module names, having the same constraints as any other qualified
>>> identifiers makes sense, but here, we're talking about module names in
>>> the JVM spec, not in the JLS.
>>
>> Correct.
>>
>>> Now, the constant pool is typed and structured, if we want to have
>>> constraints on module names, in my opinion, we should introduce a new
>>> constant pool item to make it clear that module names are not plain
>>> names but specific names exactly like there is a Class constant pool
>>> item.
>>
>> Agreed. This is, in fact, an inconsistency in the present proposal,
>> since it imposes constraints on otherwise untagged CONSTANT_Utf8
>> structures. If we're going to impose constraints on free-standing
>> module and package names then we should introduce the obvious new
>> `CONSTANT_Module_info` and `CONSTANT_Package_info` structures.
>>
>>> And with my ASM hat on, having to add replace('.', '/') and
>>> replace('/', '.') at the right places is error-prone; if we can avoid
>>> that, it is a big win in terms of usability.
>>
>> Yep.
>>
>>>> In trying to think about the future I do wonder if, today, we should
>>>> reserve a character or two just in case we discover five or ten years
>>>> from now that we need to add more structure to module names. Should
>>>> we set aside `:`, or perhaps some other character, just in case?
>>>
>>> If we want structure, we will add another constant pool item. It's
>>> what valhalla does for parameterized types.
>>
>> So the question is, then: Which, if any, characters should we reserve?
>>
>> Peering into the myriad alternate visions swirling around in my cloudy
>> crystal ball, I can see:
>>
>> - A structured namespace of modules. `:` is a logical separator here,
>> even in the source language if need be, so let's reserve it now in
>> class files.
>>
>> - Module names encoded in class files together with specific version
>> strings, to form compound module identifiers. We already use `@` to
>> separate module names from version strings in the module-system API
>> (e.g., the result of `ModuleDescriptor::toString`), so let's reserve
>> that in class files now.
>>
>> (This is just my imagination, not specific suggestions for the future!)
>>
>> Additionally we should reserve the universal escape character (`\`) and
>> for sanity also forbid any character whose code point is less than 0x20
>> (` `). (Ideally we'd forbid all Unicode non-printing characters, but
>> it's best not to have the JVMS depend upon the Unicode specification.)
>>
>> To sum up: Reserve `:`, `@`, and `\` for future use, and forbid the ASCII
>> non-printing characters (< 0x20).
>
> You also need to reserve '/' because the java launcher (-m) uses '/' to separate the module name from the main class.
>
> Rémi
>
>>
>> David -- Are these restrictions acceptable in your use cases, or if not
>> then at least tolerable? I'm pretty sure I've never seen any of these
>> characters in Java EE module names, JAR file base names, Maven group or
>> artifact names, or the other examples you mentioned.

Breaking it down one by one...

':' is going to break two things that I know of: modules generated from
Maven coordinates which have the syntax
<groupId>:<artifactId>:<classifier>, and modules in the JBoss Modules
static loader which have a slot component, using the syntax
"module-name:slot".

'@' might be OK to reserve; I can't think of any specific conflicts,
though we have allowed this character in the past.

'\' is a problem because JBoss Modules uses that character to escape
':' (particularly in the Maven coordinates case) to avoid mixing up the
slot name with the module name. For a module named `foo\:bar:5`, the
static module loader would treat the name component as "foo:bar" and the
slot as "5", and locate the module accordingly. However, the core system
does not treat '\' specially: the proper name of this module would be
`foo\:bar:5` according to the system, and that's the string you would
have to use to load the module by name.
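
To make the escaping rule above concrete, here is a minimal sketch of how
a loader could split such a string at the first unescaped ':'. The class
and method names (SlotSyntax, split) are hypothetical and are not the
JBoss Modules API; leaving the slot text as-is after the separator is
also an assumption.

```java
// Hypothetical helper for illustration only; not part of JBoss Modules.
public class SlotSyntax {

    /**
     * Splits "name:slot" at the first unescaped ':'.
     * Returns {name, slot}; slot is null when no unescaped ':' is present.
     * A backslash makes the following character literal in the name part.
     */
    public static String[] split(String identifier) {
        StringBuilder name = new StringBuilder();
        int i = 0;
        while (i < identifier.length()) {
            char c = identifier.charAt(i);
            if (c == '\\' && i + 1 < identifier.length()) {
                // Escaped character: drop the backslash, keep the next char.
                name.append(identifier.charAt(i + 1));
                i += 2;
            } else if (c == ':') {
                // First unescaped ':' starts the slot (assumed taken as-is).
                return new String[] { name.toString(), identifier.substring(i + 1) };
            } else {
                name.append(c);
                i++;
            }
        }
        return new String[] { name.toString(), null };
    }

    public static void main(String[] args) {
        String[] parts = split("foo\\:bar:5");
        System.out.println(parts[0] + " / " + parts[1]); // prints "foo:bar / 5"
    }
}
```

Note that the full string `foo\:bar:5` remains the module's name as far as
the core system is concerned; only a front end like this one interprets
the backslash.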

'/' also may be a problem because within our container, we use file
names from the file system as the name of modules that come from the
file system. This also causes a problem for '\' on Windows. We could
possibly work out some kind of alternative in this case, with some
creative thinking.

Definitely in favor of forbidding non-printing code points 0x00 through
0x1F, and probably also 0x80 through 0x9F (we probably don't want to go
any further down the Unicode rabbit hole than that though - at least,
not in the JVM - if we want to get out of here this side of 2020).
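
For reference, the restrictions under discussion could be checked with
something like the sketch below. It is only an illustration of the
proposal as it stands in this thread (reserve ':', '@', '\' and '/',
forbid 0x00-0x1F and 0x80-0x9F), not a settled rule; the class and method
names (ModuleNameCheck, isAcceptable) are made up for the example.

```java
// Hypothetical checker for illustration; it mirrors the proposal in this
// thread, not any rule in the JVMS.
public class ModuleNameCheck {

    // Characters proposed for reservation in the thread: ':', '@', '\', '/'.
    private static final String RESERVED = ":@\\/";

    public static boolean isAcceptable(String name) {
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c < 0x20) {
                return false; // ASCII non-printing characters
            }
            if (c >= 0x80 && c <= 0x9F) {
                return false; // the additional non-printing range suggested above
            }
            if (RESERVED.indexOf(c) >= 0) {
                return false; // reserved for possible future use
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable("java.base"));       // true
        System.out.println(isAcceptable("org.example:lib")); // false (':')
        System.out.println(isAcceptable("foo\\:bar:5"));     // false ('\' and ':')
    }
}
```

The Maven-coordinate and slot-style names described above are exactly the
ones such a check would reject.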
--
- DML