Proposal: #ModuleNameCharacters (revised)

David M. Lloyd david.lloyd at redhat.com
Sat Dec 10 15:24:18 UTC 2016


Well, maybe we can build on this idea: what about making 
ModuleDescriptor.Builder non-final, so we can push validation logic into 
separate overridable methods?  This way the default behavior is 
consistent, and it's much harder to accidentally bypass the logic, but 
it's also still easy to provide a customized scheme.
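
To make that concrete, here is a rough sketch of the shape I have in mind. Nothing below is real API: SketchBuilder and HyphenTolerantBuilder are made-up stand-ins for a non-final ModuleDescriptor.Builder and a custom subclass, and the default check only approximates the language rules (it does not reject keywords, for instance).

```java
import java.util.Objects;

class OverridableValidationSketch {

    /** Hypothetical stand-in for a non-final ModuleDescriptor.Builder. */
    static class SketchBuilder {
        private String name;

        // final: a subclass can change the rule, but cannot skip the hook.
        public final SketchBuilder name(String candidate) {
            validateModuleName(Objects.requireNonNull(candidate, "candidate"));
            this.name = candidate;
            return this;
        }

        public final String build() {
            if (name == null) throw new IllegalStateException("module name not set");
            return name;
        }

        // Default rule (assumption): dot-separated Java identifiers.
        protected void validateModuleName(String candidate) {
            for (String part : candidate.split("\\.", -1)) {
                if (!isIdentifier(part)) {
                    throw new IllegalArgumentException("Invalid module name: " + candidate);
                }
            }
        }

        private static boolean isIdentifier(String s) {
            if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) return false;
            for (int i = 1; i < s.length(); i++) {
                if (!Character.isJavaIdentifierPart(s.charAt(i))) return false;
            }
            return true;
        }
    }

    /** Hypothetical custom scheme: the default rule, but hyphens are also allowed. */
    static class HyphenTolerantBuilder extends SketchBuilder {
        @Override
        protected void validateModuleName(String candidate) {
            // Treat '-' like '_' for validation only; the stored name is unchanged.
            super.validateModuleName(candidate.replace('-', '_'));
        }
    }

    public static void main(String[] args) {
        System.out.println(new HyphenTolerantBuilder().name("my-module.core").build());
        try {
            new SketchBuilder().name("my-module.core");
        } catch (IllegalArgumentException expected) {
            System.out.println("default builder rejected it: " + expected.getMessage());
        }
    }
}
```

The entry point stays final so the hook always runs; only the rule itself is open to replacement.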

On 12/10/2016 01:50 AM, Remi Forax wrote:
> No, I hope it's more that ModuleDescriptor will be an interface.
> So we can have our own module descriptor builder.
>
> Rémi
>
>
>
> On December 9, 2016 11:05:02 PM GMT+01:00, "David M. Lloyd"
> <david.lloyd at redhat.com> wrote:
>
>     Whoops, hang on... one problem I didn't spot on my first read-through:
>
>             We will therefore retain the present constraints on module
>             names in the
>             source language and also continue to enforce those
>             constraints in the
>             `ModuleDescriptor.Builder` API, which is intended to be
>             consistent with
>             the language. (The `ModuleDescriptor` API will continue to
>             be able to
>             read class files that contain module names not expressible
>             in the source
>             language.)
>
>
>     So... essentially a custom module system has to generate binary
>     descriptors?  That's going to be a real pain.
>
>
>     On 12/09/2016 03:48 PM, David M. Lloyd wrote:
>
>         +1 here
>
>         On 12/09/2016 03:45 PM, mark.reinhold at oracle.com wrote:
>
>             Issue summary
>             -------------
>
>             #ModuleNameCharacters --- Module names are presently
>             constrained to
>             be Java identifiers. Some existing module systems allow
>             additional
>             characters in module names, such as hyphens and slashes.
>             Should this
>             restriction be lifted or, perhaps, should it somehow be made
>             layer-specific? [1]
>
>             Proposal
>             --------
>
>             Do not change the treatment of module names in source code;
>             they will
>             remain qualified names. Revise the encoding of module names
>             in compiled
>             module-declaration class files to lift the current
>             constraints but adopt
>             new, less onerous constraints that still provide for the
>             future evolution
>             of the platform. Revise the format of class files to
>             structure module
>             and package names in a manner consistent with that already
>             used for other
>             kinds of constrained names.
>
>             * * *
>
>             Modules are a new construct of the Java programming language
>             in the
>             present design. In the source language they are hence
>             identified by
>             qualified names [2] in the same manner as the existing
>             structural
>             constructs, i.e., packages and classes. As such these names
>             do allow
>             some unusual characters, though not hyphens or slashes [3].
>
>             In the very long term a future version of the language may
>             well support
>             not just the declaration of modules, and of relationships
>             between them,
>             but also the expression of operations upon them as is
>             possible in, e.g.,
>             Standard ML [4], or qualified references in code to a type
>             in some other
>             named module, or yet some other kind of use that we do not
>             imagine today.
>             It would hence be unwise at this point to allow module names
>             in source
>             code to be any different in nature than the other kinds of
>             qualified
>             names already in the language.
>
>             We will therefore retain the present constraints on module
>             names in the
>             source language and also continue to enforce those
>             constraints in the
>             `ModuleDescriptor.Builder` API, which is intended to be
>             consistent with
>             the language. (The `ModuleDescriptor` API will continue to
>             be able to
>             read class files that contain module names not expressible
>             in the source
>             language.)
>
>             * * *
>
>             Module names in compiled module-declaration class files are
>             presently
>             encoded in the internal form traditionally used for
>             qualified names:
>             Periods (`.`) are replaced with forward slashes (`/`), and
>             periods,
>             semicolons (`;`), and left square brackets (`[`) are
>             forbidden [5].
>             This encoding is inconvenient for other module systems that may
>             interoperate with JPMS, so we will abandon it for module
>             names despite
>             the fact that doing so will increase the complexity of any
>             code that
>             parses class files.
>
>             To allow for the future evolution of the platform we propose
>             a different,
>             less onerous encoding of module names in class files:
>
>             - If at some future point we find that we need to add
>             structure to
>             module names, or combine module names with qualified type names,
>             then the `:` character would be a good candidate, even in the
>             source language if need be, so we reserve that character now.
>
>             - We presently use `@` in the API to separate module names from
>             version strings, where available, so it is prudent to reserve
>             that character in module names in class files also, just in case
>             we someday decide to introduce compound module identifiers
>             that combine module names with version strings.
>
>             - In further support of interoperation we will reserve the
>             universal
>             escape character (`\`) and define the sequences `\\`, `\:`, and
>             `\@` to stand for `\`, `:`, and `@`, respectively.
>
>             - We will finally, for sanity, forbid any character whose
>             Unicode code
>             point is less than 0x20 (` `). (Ideally we'd forbid all Unicode
>             non-printing characters, but it's best not to have the JVMS
>             depend
>             too deeply upon details of the Unicode specification.)
>
>             To sum up: In module names in class files reserve `:` and
>             `@` for future
>             use; reserve `\` as an escape character and use it to quote
>             itself, `:`,
>             and `@`; and forbid the non-printing ASCII characters (< 0x20).
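
For illustration only, the escaping rules summarized above might be sketched in Java roughly as follows. The class and method names are hypothetical. Rejecting unescaped `:` and `@`, and rejecting escape sequences other than the three defined ones, is an assumption on my part; the proposal only says those characters are reserved.

```java
final class ModuleNameEncodingSketch {

    private ModuleNameEncodingSketch() { }

    /** Escapes `\`, `:` and `@` with a leading backslash; rejects chars below 0x20. */
    static String encode(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 4);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c < 0x20) {
                throw new IllegalArgumentException("forbidden character at index " + i);
            }
            if (c == '\\' || c == ':' || c == '@') {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    /** Reverses encode(); strictness about malformed input is an assumption. */
    static String decode(String encoded) {
        StringBuilder out = new StringBuilder(encoded.length());
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c < 0x20) {
                throw new IllegalArgumentException("forbidden character at index " + i);
            }
            if (c == ':' || c == '@') {
                throw new IllegalArgumentException("unescaped reserved character at index " + i);
            }
            if (c == '\\') {
                if (++i == encoded.length()) {
                    throw new IllegalArgumentException("dangling escape at end of name");
                }
                c = encoded.charAt(i);
                if (c != '\\' && c != ':' && c != '@') {
                    throw new IllegalArgumentException("undefined escape at index " + (i - 1));
                }
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints org.example\:core\@1.0
        System.out.println(encode("org.example:core@1.0"));
    }
}
```

Hyphens and slashes pass through untouched, which is the point of the change for other module systems.
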
>
>             * * *
>
>             The first version of this proposal [6] claimed that the
>             present design is
>             consistent with the existing treatment of qualified names
>             in class files.
>             That is, in fact, not true, since qualified names in class
>             files today
>             are always wrapped in tagged constant-pool structures rather
>             than simple
>             `CONSTANT_Utf8_info` structures. Class names, e.g., are
>             wrapped in
>             `CONSTANT_Class_info` structures, which in turn reference
>             the `Utf8`
>             structures that represent the actual class names [7].
>
>             To address this inconsistency, and particularly in light of
>             the new
>             encoding of module names described above, we propose to use
>             consistent
>             kinds of class-file structures for module and package names.
>
>             Module names in a compiled module-declaration class file
>             will be encoded
>             as above and wrapped in tagged `CONSTANT_Module_info`
>             structures:
>
>             CONSTANT_Module_info {
>                 u1 tag;         // == CONSTANT_Module == 19
>                 u2 name_index;  // Index of a CONSTANT_Utf8_info
>             }
>
>             Package names in class files will be encoded in the
>             traditional internal
>             form and wrapped in tagged `CONSTANT_Package_info` structures:
>
>             CONSTANT_Package_info {
>                 u1 tag;         // == CONSTANT_Package == 20
>                 u2 name_index;  // Index of a CONSTANT_Utf8_info
>             }
>
>             Existing references in the class-file format to module and
>             package names
>             will be adjusted to refer to these new kinds of tagged
>             structures.
>
>
>             [1]
>             http://openjdk.java.net/projects/jigsaw/spec/issues/#ModuleNameCharacters
>             [2]
>             http://docs.oracle.com/javase/specs/jls/se8/html/jls-6.html#jls-6.2
>             [3]
>             http://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-3.8
>             [4] https://en.wikipedia.org/wiki/Standard_ML#Module_system
>             [5]
>             http://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.2.1
>             [6]
>             http://mail.openjdk.java.net/pipermail/jpms-spec-experts/2016-November/000468.html
>
>             [7]
>             http://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.4.1
>
>
>
>
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.

-- 
- DML

