java.lang.constant.ClassDesc and TypeDescriptor for hidden class??

Wed Apr 15 18:40:48 UTC 2020

Hi Peter, Remi,

Lois and Harold have given further feedback.   They are concerned with 
option c's impact to JVM implementations due to this new form of 
signature whose name is outside the "[;"] envelope  whereas signatures 
are always inside the "[;]" envelopes since day 1. Option c'  appears to 
have a higher compatibility risk not only in existing libraries but 
probably also VM implementations.

Option c has a low compatibility risk while existing code still needs to 
be updated to support hidden classes.  As John advices [1], it would be 
a mistake for `Class::descriptorString` to throw an exception in the 
long run.  We have exhausted the differences among these options.

Spec change proposal is:
- extend TypeDescriptor for entities that cannot be described 
nominally.  If it can be described nominally, 
TypeDescriptor::descriptorString returns a field/method descriptor 
conforming to JVMS 4.3.  If it cannot be described nominally, the result 
string is not a type descriptor.  No nominal descriptor can be produced.
- specify in the javadoc for Class::descriptorString and 
MethodType::descriptorString that the result string when it can be 
described nominally conforming to JVMS 4.3 or when it cannot be 
described nominally.
- No JVMS change

Here is the updated specdiff and webrev:
Specdiff:
http://cr.openjdk.java.net/~mchung/valhalla/webrevs/hidden-classes/specdiff-descriptor-string/overview-summary.html

Webrev:
http://cr.openjdk.java.net/~mchung/valhalla/webrevs/hidden-classes/webrev.06-descriptor-string/

Please review.
Thanks
Mandy
[1] 
https://mail.openjdk.java.net/pipermail/valhalla-dev/2020-April/007093.html

On 4/14/20 2:16 AM, Peter Levart wrote:
>
>
> On 4/13/20 9:09 PM, Mandy Chung wrote:
>>
>>
>> On 4/12/20 5:14 AM, Remi Forax wrote:
>>>> The problem is not that 'c' is easier to parse, but that 'c`' is not
>>>> parsable at all. Do we really want unparsable method descriptors?
>>>>
>>>> If the problem is preventing resolving of hidden class names or
>>>> descriptors, then it seems that making the method descriptors 
>>>> unparsable
>>>> is not the right place to do that.
>>> I agree with Peter,
>>> throwing an exception is better, there is no way to encode a hidden 
>>> class in a descriptor because a hidden class has no name you can 
>>> lookup,
>>> if the API return an unparsable method descriptor, the user code 
>>> will throw an exception anyway.
>>>
>>
>> Several points that are noteworthy:
>>
>> 1. A resolved method never has a hidden class in its signature as a 
>> hidden class cannot be discovered by any class loader.
>
> Right.
>
>>
>> 2. When VM fails to resolve a symbolic reference to a hidden class, 
>> it might print its name or descriptor string in the error message.  
>> Lois and Harold can confirm if this should or should not cause any 
>> issue (I can't see how it would cause any issue yet).
>>
>> 3. The only way to get a method descriptor with a hidden class in it 
>> is by constructing `MethodType` with a `Class` object representing a 
>> hidden class.
>
> Or by custom code that manipulates class descriptors using String 
> operations. Suppose there's code that doesn't want to eagerly resolve 
> types and just manipulates Strings. Surely a class descriptor of a HC 
> can only be obtained when there *is* a HC already present, but ... 
> never underestimate programmers' imagination when (s)he is combining 
> information from various sources, some of them might be resolvable 
> types, some might be just descriptors, etc...
>

True.  Our imagination is powerful!

The main thing is that a bad method descriptor will fail to resolve.

>>
>> 4. `Class::descriptorString` on a hidden class is human-readable but 
>> not a valid descriptor (both option c and c')
>>
>> 5. The special character chosen by option c and c' is an illegal 
>> character for an unqualified name ("."  ";" "[" "/" see JVMS 4.2.2).  
>> This way loading a class of the name of a hidden class will always 
>> get CNFE via bytecode linkage or Class::forName etc (either from 
>> Class::getName or mapped from Class::descriptorString).
>
> Right. The JVMS may remain unchanged. But that doesn't mean that 
> Class.descriptorString() couldn't be specified to return a JVMS valid 
> descriptor for classical named types, while for HCs (or derived types 
> like arrays) it would return a special unresolvable descriptor with 
> carefully specified syntax. Such a syntax that would play well when 
> composed into the syntax of higher-level descriptors like method type 
> descriptor. Why would we want that? Because by that we get a more 
> predictable failure mode. We only fail when/if the type described by 
> such descriptor tries to be resolved.
>
> In this respect, both variants 'c' and 'c`' as you said, violate JVMS 
> spec for valid class descriptor, but 'c' has a more carefully chosen 
> syntax.
>

'c' keeps the "[;" envelope which is the long-standing format.

I would say putting the name inside the "[;" envelope may not break 
existing tools (for example if they never use the name to find the 
class) whereas putting the suffix following ';' is harder to predict how 
existing tools are impacted.

>>
>> For existing tools that map a descriptor string by trimming "L;" 
>> envelope and/or replacing "/" with ".",  "Lfoo/Foo;/123Z" (option c') 
>> may be mapped to "foo.Foo" and ".123Z" (if used ";" as a separator)...
>
> I would say that for existing tools that treat a single class 
> descriptor at once, with option 'c`' they won't treat ';' as a 
> separator between multiple elements. I would say that existing code 
> that tries to trim 'L;' would either:
>
> - remove the 'L' prefix and strip the string of ';' character wherever 
> it is, which would produce "foo/Foo/123Z" and consequently 
> "foo.Foo.123Z" (a valid binary name)
> - grok and fail (for example because something that starts with 'L' 
> does not end with ';')
> - if the code is "hackish" it might blindly trim the last character if 
> the 1st is '[', so we would end up with "foo/Foo;/123" and 
> consequently with "foo.Foo;.123" (not a valid binary name)
> - something else that neither of us can imagine now
>
>> or "foo.Foo/123Z" which are invalid name whereas "Lfoo/Foo.123Z;" 
>> (option c) may have higher chance be mapped to "foo.Foo.123Z" which 
>> is a valid binary name.
>
> Right, but neither is 'c`' immune to that interpretation. At least the 
> failure mode of 'c' is more predictable.
>
>>
>> ";" and "[" are already used for descriptor.  The remaining ones are 
>> "." and "/".
>>
>> JDWP and JDI are examples of existing tools that obtain the type 
>> descriptor by calling JVM TI `GetClassSignature` and then trims the 
>> "L;" envelope and replace "/" with ".".    Option c produces 
>> "foo.Foo.123Z" as the resulting string which might make it harder
>
> And what does option 'c`' produce?
>
> Existing JDK tools could be updated from day 0. Existing 3rd party 
> tools would have to be updated too in either case. Typical failure 
> mode for option 'c' would be that class "foo.Foo.123Z" can't be found. 
> Who knows what kind of failure modes would option 'c`' produce if 
> parsing was done in C for example. Are crashes excluded?
>
>>
>> 6. Throwing an exception (option a) may make existing libraries to 
>> catch issues very early on.  I see the consistency that John made 
>> about dual-use APIs that prints a human-readable but not resolvable 
>> descriptor.  I got convinced that option c and c' have the benefit 
>> over option a.
>>
>> 7. Existing tools or scripts that parse the descriptor string have to 
>> be updated in both option c and c' to properly handle hidden 
>> classes.  Option c may just hide the problem which is bad if it's 
>> left unnoticed but happens in customer environments.
>
> I doubt that the problem would be hidden with option 'c'. Either the 
> code would just work (because it needs not resolve the descriptor of 
> HC) or it would grok on trying to resolve it. In theory the binary 
> name "foo.Foo.123Z" could be resolved into a real class, but that's 
> hardly possible in practice unless you specifically construct such 
> case. And option 'c`' is not immune to that as well. So I don't think 
> that we would suddenly see a bunch of wrong resolvings where 
> "foo.Foo.123Z" would actually be resolved successfully. You have a say 
> in how the suffix in "foo.Foo/suffix" is constructed and by using 
> something that is not a usual name the chances can be minimized.
>
>>
>> My only concern is the compatibility risk on existing agents that 
>> assume JVM TI `GetClassSignature` returns a valid type descriptor and 
>> use it to resolution.   Both option c and c' return an invalid 
>> descriptor string and so I consider the impact is about the same.    
>> JDI and JDWP have to be updated to work with either new form. As John 
>> noted, option c' has the fail-fast properties that may help existing 
>> code to diagnose issues during migration.
>>
>> That's my summary why I went with option c'.    The preference is 
>> "slightly".
>>
>> Any other thought?
>
> I think that it is easier to debug a more predictable failure even if 
> it happens a little later (when resolving the descriptor) than it is 
> to debug an unpredictable (unimagined) failure which supposedly 
> happens a little earlier. In that respect, option 'a' is most 
> predictable, but it might be "to early" (for example, what if some 
> code just wants to log the descriptor). And 'c`' seems a little scary 
> to me, because I can't imagine all the possible failures.
> Regards, Peter