Proposal: #ModuleNameCharacters
David M. Lloyd
david.lloyd at redhat.com
Wed Nov 30 02:47:33 UTC 2016
On 11/29/2016 06:08 PM, mark.reinhold at oracle.com wrote:
> 2016/11/22 13:05:13 -0800, david.lloyd at redhat.com:
>> On 11/22/2016 10:49 AM, mark.reinhold at oracle.com wrote:
>>> Proposal
>>> --------
>>>
>>> Make no changes here.
>>
>> TL;DR: we can't accept this proposal as-is. Expounding more below.
>>
>>> Modules are a new construct of the Java programming language in the
>>> present design. In the source language they are hence identified by
>>> qualified names [2] in the same manner as the existing structural
>>> constructs, i.e., packages and classes. As such these names do allow
>>> some unusual characters, though not hyphens or slashes [3].
>>>
>>> Module names in compiled module-declaration class files are recorded in
>>> `CONSTANT_Utf8_info` structures, and thus have fewer constraints.
>>
>> I believe that according to the JVM spec (until now anyway), this field
>> type by itself has no constraints at all, beyond being a valid
>> "modified" UTF-8 string.
>
> It's true that a CONSTANT_Utf8_info can, as such, contain arbitrary UTF-8
> text. The additional restrictions I listed are the same as those already
> in place for other kinds of qualified names, i.e., the names of packages
> and classes. A module name is, after all, just another kind of qualified
> name in the present design.
Right, that is how it is implemented, and that is exactly the problem for us.
>>> They replace periods (`'.'`) with forward slashes (`'/'`), and disallow
>>> periods, semicolons (`';'`), and left square brackets (`'['`) [4].
>>
>> These name manglings are really just plain weird in this context, and
>> are clearly an implementation artifact. How did we arrive at this
>> place? I feel like it springs from the long-maligned conflation of
>> module descriptors with class files. But modules are not types and
>> AFAICT these restrictions really make no sense from any other
>> perspective than one of implementation.
>
> These restrictions make perfect sense from the perspective of tools that
> manipulate class files, which are likely already to assume that these
> restrictions are observed uniformly for all kinds of qualified names.
...but not for all CONSTANT_Utf8_info fields, obviously. For example,
String constant values use CONSTANT_Utf8_info without requiring name
mangling.
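
To make the mangling under discussion concrete, here is a minimal sketch of the rule as the proposal text describes it: periods become forward slashes, and a name is rejected if its class-file form still contains a period, semicolon, or left square bracket. The class and method names (InternalFormSketch, toInternalForm, isRepresentable) and the sample names are hypothetical, chosen only for illustration; this is not code from the JDK or from any module system implementation.

```java
final class InternalFormSketch {
    // Characters the proposal text lists as disallowed in the class-file form.
    private static final String FORBIDDEN = ".;[";

    /** Converts a dotted source-level name to the slash-separated class-file form. */
    public static String toInternalForm(String sourceName) {
        return sourceName.replace('.', '/');
    }

    /** Returns true if the class-file form contains none of the disallowed characters. */
    public static boolean isRepresentable(String internalName) {
        for (int i = 0; i < internalName.length(); i++) {
            if (FORBIDDEN.indexOf(internalName.charAt(i)) >= 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical sample names: one conventional, one with a hyphen, one with a bracket.
        String[] samples = {"java.base", "my-app.war", "legacy[1]"};
        for (String s : samples) {
            String internal = toInternalForm(s);
            System.out.println(s + " -> " + internal
                    + " representable=" + isRepresentable(internal));
        }
    }
}
```

Under this rule a name such as "legacy[1]" has no class-file representation at all, whereas the same text stored as a String constant in a CONSTANT_Utf8_info entry would need no transformation.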
> I'm reluctant to violate that expectation without good reason.
AFAICT nobody has this expectation. Who would expect that a module name
can't have left square brackets? What possible reason could that serve?
> These restrictions have, moreover, ensured a useful degree of freedom for
> the evolution of the platform over the last twenty years. I'm reluctant
> to give that up without good reason.
For types, sure; but this is a module name, not a type or any part of a
type. I just don't see the equivalency.
>>> The present design is, then, consistent with the existing treatment of
>>> qualified names in the language, in class files, and in the Java SE API.
>>
>> I do not believe that any of these statements is sufficient to justify
>> the constraint... more below.
>>
>>> A different module system with a more-flexible naming scheme can easily
>>> refer to JPMS modules, per the agreed interoperation requirement [5].
>>> The requirements do not mandate bidirectional interoperation, which for
>>> this issue would mean that JPMS modules must be able to refer to non-JPMS
>>> modules with non-JPMS names.
>>
>> This is also not sufficient to justify the constraint, though it does
>> help to explain why the constraint existed in the first place.
>
> There is no agreed requirement for bidirectional interoperation, nor any
> other agreed requirement that mandates that module names be arbitrary
> strings. There is thus no need, per the requirements, to support such
> module names.
Nevertheless, we must do so, and this particular enhancement does not
seem to be a big "ask".
> In the absence of a requirement to support arbitrary module names I've
> chosen in the present design to be consistent with the other kinds of
> qualified names already used in the language. (I acknowledge that you
> don't think that modules should be a language construct in the first
> place, but that's not a decision that I intend to revisit.) This will
> be the least surprising choice for the vast majority of developers who
> will use this module system.
Again, nobody will have this expectation.
>>> To support that would add significant
>>> complexity to this specification and its implementations.
>>
>> I am sympathetic to this, but I think it needs more discussion,
>> particularly as the proposed complexities have not been explained.
>
> I'm concerned mostly about the complexity of the language specification,
> which affects all users of this module system. If we change the language
> of module declarations to allow arbitrary module names, possibly via some
> sort of quoting scheme, then that's something that every developer would
> have to understand even though it would most likely be of benefit to few.
You don't need a quoting scheme to allow arbitrary names - you just
allow them - and I don't see how this complicates the language
specification in any way: in fact it should simplify it. There's just
no reason to use "internal form".
> I'm concerned also, though to a lesser degree, about the complexity of
> the class-file specification (i.e., the JVMS), the long-term evolvability
> of that specification, and the complexity of code that reads and writes
> class files, though the latter is second-order since it mostly affects
> maintainers of IDEs, compilers, and other kinds of tools rather than
> developers in general. If there's a compelling reason to lift the usual
> restrictions on the representation of qualified names in class files, at
> least in the case of module names, then I'd like to hear it.
I'm just really confused at this line of justification. Making module
names be a qualified name is the very thing that I'm arguing against.
Don't make any change to the restrictions on qualified names, and don't
make module names be qualified names; that's all there is to it.
>> But the other implicit requirement that we have is to ensure that it's
>> possible to adapt Java EE in some reasonable manner. It's my belief
>> that in order to do so we need to be able to create modules with names
>> that match the current constraints for Java EE module names (i.e.
>> effectively no constraint, just valid UTF-8). Name mangling to
>> accommodate this (which would be our only other option) is unnecessarily
>> user-unfriendly at best, and outright incompatible at worst.
>
> I don't expect Java EE modules to map directly to Java SE modules, nor
> do any of the Java EE spec leads with whom I've discussed this issue.
> EE modules and SE modules are completely different kinds of things.
On March 11 of this year (and on other occasions) I asked exactly
that, and you said [1] "Of course we have that expectation --
that's why the requirements include an entire section on dynamic
configuration". Did I misunderstand you then or now, or has something
changed? If something has changed, can we get agreement from this and
the Java EE expert groups that Java EE will not make this a requirement
until/unless both EGs agree to do so at some later time (for example,
pending updates in Java 10)?
[1]
http://mail.openjdk.java.net/pipermail/jpms-spec-experts/2016-March/000251.html
--
- DML