Proposal: #ModuleNameCharacters (revised)
David M. Lloyd
david.lloyd at redhat.com
Tue Jan 3 16:06:53 UTC 2017
I think that this should be a policy decision that is made by the module
system in question. That's really the point of having the JVM use
general rules. And if javac itself is enforcing strict parsing rules,
I'd say the risk is minimal, especially compared to the inconvenience
that module systems like ours will encounter from such restrictions.
I do think that control characters are a different story, though.
On 01/01/2017 05:43 PM, Remi Forax wrote:
> Re-reading this thread after Ess Kay sent me a private message asking why spaces are supported, I think we should disallow 0x20 (space) too;
> a module that cannot be found because its name has a trailing space will not be fun.
>
> Rémi
>
> ----- Original Message -----
>> From: "mark reinhold" <mark.reinhold at oracle.com>
>> To: jpms-spec-experts at openjdk.java.net
>> Sent: Friday, December 9, 2016 22:45:46
>> Subject: Proposal: #ModuleNameCharacters (revised)
>
>> Issue summary
>> -------------
>>
>> #ModuleNameCharacters --- Module names are presently constrained to
>> be Java identifiers. Some existing module systems allow additional
>> characters in module names, such as hyphens and slashes. Should this
>> restriction be lifted or, perhaps, should it somehow be made
>> layer-specific? [1]
>>
>> Proposal
>> --------
>>
>> Do not change the treatment of module names in source code; they will
>> remain qualified names. Revise the encoding of module names in compiled
>> module-declaration class files to lift the current constraints but adopt
>> new, less onerous constraints that still provide for the future evolution
>> of the platform. Revise the format of class files to structure module
>> and package names in a manner consistent with that already used for other
>> kinds of constrained names.
>>
>> * * *
>>
>> Modules are a new construct of the Java programming language in the
>> present design. In the source language they are hence identified by
>> qualified names [2] in the same manner as the existing structural
>> constructs, i.e., packages and classes. As such these names do allow
>> some unusual characters, though not hyphens or slashes [3].
>>
>> In the very long term a future version of the language may well support
>> not just the declaration of modules, and of relationships between them,
>> but also the expression of operations upon them as is possible in, e.g.,
>> Standard ML [4], or qualified references in code to a type in some other
>> named module, or yet some other kind of use that we do not imagine today.
>> It would hence be unwise at this point to allow module names in source
>> code to be any different in nature than the other kinds of qualified
>> names already in the language.
>>
>> We will therefore retain the present constraints on module names in the
>> source language and also continue to enforce those constraints in the
>> `ModuleDescriptor.Builder` API, which is intended to be consistent with
>> the language. (The `ModuleDescriptor` API will continue to be able to
>> read class files that contain module names not expressible in the source
>> language.)
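To make the source-level constraint concrete, here is a minimal Java sketch that tests candidate module names with `javax.lang.model.SourceVersion.isName`, which reports whether a string is a syntactically valid qualified name (dot-separated identifiers, none of them a keyword) for the latest source version the running JDK supports. The class and method names are hypothetical, and this is not necessarily the exact check that javac or `ModuleDescriptor.Builder` performs internally.

```java
import javax.lang.model.SourceVersion;

final class ModuleNameSourceCheck {

    // Hypothetical helper: true if the name is a valid qualified name in the
    // source language (dot-separated Java identifiers, no keywords).
    static boolean isSourceLevelModuleName(String name) {
        return SourceVersion.isName(name);
    }

    public static void main(String[] args) {
        String[] candidates = {
            "com.example.app",   // ordinary qualified name
            "caf\u00e9.util",    // non-ASCII letters are identifier characters
            "a$b.c",             // '$' is an identifier character
            "foo-bar",           // hyphen is not
            "foo/bar",           // slash is not
            "foo.class"          // 'class' is a keyword
        };
        for (String c : candidates) {
            System.out.println(c + " -> " + isSourceLevelModuleName(c));
        }
    }
}
```

Names such as `foo-bar` and `foo/bar`, common in other module systems, fail this check, while names using `$` or non-ASCII letters pass it.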
>>
>> * * *
>>
>> Module names in compiled module-declaration class files are presently
>> encoded in the internal form traditionally used for qualified names:
>> Periods (`.`) are replaced with forward slashes (`/`), and periods,
>> semicolons (`;`), and left square brackets (`[`) are forbidden within
>> the individual name components [5].
>> This encoding is inconvenient for other module systems that may
>> interoperate with JPMS, so we will abandon it for module names despite
>> the fact that doing so will increase the complexity of any code that
>> parses class files.
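A minimal sketch of the internal-form encoding described above, assuming the rules of JVMS §4.2.1 [5] together with the related unqualified-name rule of JVMS §4.2.2, under which no component may contain `.`, `;`, `[`, or `/`. The helper names are hypothetical. The sketch shows why the encoding is awkward for other module systems: a name that already contains `/` cannot be represented, because `/` is the separator.

```java
final class InternalForm {

    // '.' cannot appear inside a segment produced by split("\\."), so only
    // the remaining characters forbidden in unqualified names are checked.
    private static final String FORBIDDEN_IN_SEGMENT = ";[/";

    // Hypothetical helper: "java.lang.Object" -> "java/lang/Object".
    static String toInternalForm(String qualifiedName) {
        String[] segments = qualifiedName.split("\\.", -1);
        for (String segment : segments) {
            if (segment.isEmpty()) {
                throw new IllegalArgumentException("empty segment in \"" + qualifiedName + "\"");
            }
            for (int i = 0; i < segment.length(); i++) {
                char c = segment.charAt(i);
                if (FORBIDDEN_IN_SEGMENT.indexOf(c) >= 0) {
                    throw new IllegalArgumentException(
                        "forbidden character '" + c + "' in \"" + qualifiedName + "\"");
                }
            }
        }
        return String.join("/", segments);
    }

    // Hypothetical helper: "java/lang/Object" -> "java.lang.Object".
    static String fromInternalForm(String internalName) {
        return internalName.replace('/', '.');
    }

    public static void main(String[] args) {
        System.out.println(toInternalForm("java.base")); // java/base
        try {
            toInternalForm("org/example/foo-bar");        // a slash-separated foreign name
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```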
>>
>> To allow for the future evolution of the platform we propose a different,
>> less onerous encoding of module names in class files:
>>
>> - If at some future point we find that we need to add structure to
>> module names, or combine module names with qualified type names,
>> then the `:` character would be a good candidate, even in the
>> source language if need be, so we reserve that character now.
>>
>> - We presently use `@` in the API to separate module names from
>> version strings, where available, so it is prudent to reserve
>> that character in module names in class files also, just in case
>> we someday decide to introduce compound module identifiers that
>> combine module names with version strings.
>>
>> - In further support of interoperation we will reserve the universal
>> escape character (`\`) and define the sequences `\\`, `\:`, and
>> `\@` to stand for `\`, `:`, and `@`, respectively.
>>
>> - We will finally, for sanity, forbid any character whose Unicode code
>> point is less than 0x20 (` `). (Ideally we'd forbid all Unicode
>> non-printing characters, but it's best not to have the JVMS depend
>> too deeply upon details of the Unicode specification.)
>>
>> To sum up: In module names in class files reserve `:` and `@` for future
>> use; reserve `\` as an escape character and use it to quote itself, `:`,
>> and `@`; and forbid the non-printing ASCII characters (< 0x20).
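The rules summarized above can be sketched as an encoder and a decoding validator. This is a minimal Java illustration under stated assumptions, not a reference implementation: it works on decoded `String`s rather than on the modified UTF-8 bytes of a `CONSTANT_Utf8_info`, the class and method names are hypothetical, and it treats a `\` followed by anything other than `\`, `:`, or `@` (or a trailing `\`) as malformed, which the proposal implies but does not spell out. Because Java `char` values below 0x20 are exactly the code points below 0x20, a per-`char` check suffices for the control-character rule.

```java
final class ModuleNameEncoding {

    // Hypothetical helper: escapes a raw module name for a class file.
    // '\', ':' and '@' get a '\' prefix; characters below 0x20 are rejected.
    static String encode(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c < 0x20) {
                throw new IllegalArgumentException("control character at index " + i);
            }
            if (c == '\\' || c == ':' || c == '@') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Hypothetical helper: validates and unescapes a name read from a class
    // file. Rejects control characters, unescaped ':' or '@', unknown escape
    // sequences, and a dangling '\' at the end.
    static String decode(String encoded) {
        StringBuilder sb = new StringBuilder(encoded.length());
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c < 0x20) {
                throw new IllegalArgumentException("control character at index " + i);
            }
            if (c == ':' || c == '@') {
                throw new IllegalArgumentException("unescaped '" + c + "' at index " + i);
            }
            if (c == '\\') {
                if (i + 1 == encoded.length()) {
                    throw new IllegalArgumentException("trailing escape character");
                }
                char next = encoded.charAt(++i);
                if (next != '\\' && next != ':' && next != '@') {
                    throw new IllegalArgumentException("invalid escape sequence at index " + (i - 1));
                }
                sb.append(next);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("org.example-lib/core")); // org.example-lib/core
        System.out.println(encode("weird:name@1.0"));       // weird\:name\@1.0
        System.out.println(decode("weird\\:name\\@1.0"));   // weird:name@1.0
    }
}
```

Hyphens and slashes pass through unchanged. As the proposal stands, 0x20 is still permitted, so a name with a trailing space decodes without error; that is the case Rémi's message at the top of this thread argues should also be rejected.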
>>
>> * * *
>>
>> The first version of this proposal [6] claimed that the present design is
>> consistent with the existing treatment of qualified names in class files.
>> That is, in fact, not true, since qualified names in class files today
>> are always wrapped in tagged constant-pool structures rather than simple
>> `CONSTANT_Utf8_info` structures. Class names, e.g., are wrapped in
>> `CONSTANT_Class_info` structures, which in turn reference the `Utf8`
>> structures that represent the actual class names [7].
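A minimal in-memory model of the indirection just described, assuming only what the text states: a `CONSTANT_Class_info` entry holds a `name_index`, and the class name itself lives in the `CONSTANT_Utf8_info` entry at that index. The types are hypothetical stand-ins, not the structures of any particular class-file library.

```java
import java.util.Arrays;
import java.util.List;

final class ConstantPoolDemo {

    // Hypothetical marker type for constant-pool entries.
    interface Entry {}

    // Stand-in for CONSTANT_Utf8_info: holds the actual string.
    static final class Utf8 implements Entry {
        final String value;
        Utf8(String value) { this.value = value; }
    }

    // Stand-in for CONSTANT_Class_info: holds only the index of a Utf8 entry.
    static final class ClassRef implements Entry {
        final int nameIndex;
        ClassRef(int nameIndex) { this.nameIndex = nameIndex; }
    }

    // Follows a ClassRef at the given index to the Utf8 entry naming the class.
    static String className(List<Entry> pool, int index) {
        Entry e = pool.get(index);
        if (!(e instanceof ClassRef)) {
            throw new IllegalArgumentException("not a CONSTANT_Class entry at index " + index);
        }
        Entry name = pool.get(((ClassRef) e).nameIndex);
        if (!(name instanceof Utf8)) {
            throw new IllegalArgumentException("name_index does not refer to a CONSTANT_Utf8 entry");
        }
        return ((Utf8) name).value;
    }

    public static void main(String[] args) {
        // Constant-pool indices start at 1, so slot 0 holds a null placeholder.
        List<Entry> pool = Arrays.asList(null, new Utf8("java/lang/Object"), new ClassRef(1));
        System.out.println(className(pool, 2)); // java/lang/Object
    }
}
```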
>>
>> To address this inconsistency, and particularly in light of the new
>> encoding of module names described above, we propose to use consistent
>> kinds of class-file structures for module and package names.
>>
>> Module names in a compiled module-declaration class file will be encoded
>> as above and wrapped in tagged `CONSTANT_Module_info` structures:
>>
>>     CONSTANT_Module_info {
>>         u1 tag;          // == CONSTANT_Module == 19
>>         u2 name_index;   // Index of a CONSTANT_Utf8_info
>>     }
>>
>> Package names in class files will be encoded in the traditional internal
>> form and wrapped in tagged `CONSTANT_Package_info` structures:
>>
>>     CONSTANT_Package_info {
>>         u1 tag;          // == CONSTANT_Package == 20
>>         u2 name_index;   // Index of a CONSTANT_Utf8_info
>>     }
>>
>> Existing references in the class-file format to module and package names
>> will be adjusted to refer to these new kinds of tagged structures.
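At the byte level, each of the two proposed structures is a one-byte tag followed by a two-byte big-endian `name_index`. The following sketch reads such entries with `java.io.DataInputStream`, whose `readUnsignedByte` and `readUnsignedShort` match the class-file `u1` and `u2` types. It assumes the tag values 19 and 20 given above and handles only these two tags; a real parser would dispatch on every constant-pool tag and then resolve `name_index` to a `CONSTANT_Utf8_info`. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class ModulePackageEntries {

    static final int CONSTANT_Module = 19;   // tag value from the proposal
    static final int CONSTANT_Package = 20;  // tag value from the proposal

    // A decoded CONSTANT_Module_info or CONSTANT_Package_info entry.
    static final class NameRef {
        final int tag;
        final int nameIndex; // index of the CONSTANT_Utf8_info holding the name

        NameRef(int tag, int nameIndex) {
            this.tag = tag;
            this.nameIndex = nameIndex;
        }

        @Override
        public String toString() {
            return (tag == CONSTANT_Module ? "Module" : "Package") + "#" + nameIndex;
        }
    }

    // Reads one entry: a u1 tag followed by a big-endian u2 name_index.
    static NameRef read(DataInputStream in) throws IOException {
        int tag = in.readUnsignedByte();
        if (tag != CONSTANT_Module && tag != CONSTANT_Package) {
            throw new IOException("unexpected constant-pool tag " + tag);
        }
        int nameIndex = in.readUnsignedShort();
        return new NameRef(tag, nameIndex);
    }

    public static void main(String[] args) throws IOException {
        // Write two sample entries, then read them back.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeByte(CONSTANT_Module);
        out.writeShort(5);
        out.writeByte(CONSTANT_Package);
        out.writeShort(7);
        out.flush();

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println(read(in)); // Module#5
        System.out.println(read(in)); // Package#7
    }
}
```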
>>
>>
>> [1] http://openjdk.java.net/projects/jigsaw/spec/issues/#ModuleNameCharacters
>> [2] http://docs.oracle.com/javase/specs/jls/se8/html/jls-6.html#jls-6.2
>> [3] http://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-3.8
>> [4] https://en.wikipedia.org/wiki/Standard_ML#Module_system
>> [5] http://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.2.1
>> [6] http://mail.openjdk.java.net/pipermail/jpms-spec-experts/2016-November/000468.html
>> [7] http://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.4.1
--
- DML