java.lang.constant.ClassDesc and TypeDescriptor for hidden class??

Peter Levart peter.levart at gmail.com
Tue Apr 14 09:16:51 UTC 2020



On 4/13/20 9:09 PM, Mandy Chung wrote:
>
>
> On 4/12/20 5:14 AM, Remi Forax wrote:
>>> The problem is not that 'c' is easier to parse, but that 'c`' is not
>>> parsable at all. Do we really want unparsable method descriptors?
>>>
>>> If the problem is preventing resolving of hidden class names or
>>> descriptors, then it seems that making the method descriptors unparsable
>>> is not the right place to do that.
>> I agree with Peter,
>> throwing an exception is better, there is no way to encode a hidden class in a descriptor because a hidden class has no name you can lookup,
>> if the API return an unparsable method descriptor, the user code will throw an exception anyway.
>>
>
> Several points that are noteworthy:
>
> 1. A resolved method never has a hidden class in its signature as a 
> hidden class cannot be discovered by any class loader.

Right.

>
> 2. When VM fails to resolve a symbolic reference to a hidden class, it 
> might print its name or descriptor string in the error message.  Lois 
> and Harold can confirm if this should or should not cause any issue (I 
> can't see how it would cause any issue yet).
>
> 3. The only way to get a method descriptor with a hidden class in it 
> is by constructing `MethodType` with a `Class` object representing a 
> hidden class.

Or by custom code that manipulates class descriptors using String 
operations. Suppose there's code that doesn't want to eagerly resolve 
types and just manipulates Strings. Surely a class descriptor of a HC 
can only be obtained when there *is* a HC already present, but ... never 
underestimate programmers' imagination when (s)he is combining 
information from various sources, some of them might be resolvable 
types, some might be just descriptors, etc...

>
> 4. `Class::descriptorString` on a hidden class is human-readable but 
> not a valid descriptor (both option c and c')
>
> 5. The special character chosen by option c and c' is an illegal 
> character for an unqualified name ("."  ";" "[" "/" see JVMS 4.2.2).  
> This way loading a class of the name of a hidden class will always get 
> CNFE via bytecode linkage or Class::forName etc (either from 
> Class::getName or mapped from Class::descriptorString).

Right. The JVMS may remain unchanged. But that doesn't mean that 
Class.descriptorString() couldn't be specified to return a JVMS valid 
descriptor for classical named types, while for HCs (or derived types 
like arrays) it would return a special unresolvable descriptor with 
carefully specified syntax. Such a syntax that would play well when 
composed into the syntax of higher-level descriptors like method type 
descriptor. Why would we want that? Because by that we get a more 
predictable failure mode. We only fail when/if the type described by 
such descriptor tries to be resolved.

In this respect, both variants 'c' and 'c`' as you said, violate JVMS 
spec for valid class descriptor, but 'c' has a more carefully chosen syntax.

>
> For existing tools that map a descriptor string by trimming "L;" 
> envelope and/or replacing "/" with ".",  "Lfoo/Foo;/123Z" (option c') 
> may be mapped to "foo.Foo" and ".123Z" (if used ";" as a separator)...

I would say that for existing tools that treat a single class descriptor 
at once, with option 'c`' they won't treat ';' as a separator between 
multiple elements. I would say that existing code that tries to trim 
'L;' would either:

- remove the 'L' prefix and strip the string of ';' character wherever 
it is, which would produce "foo/Foo/123Z" and consequently 
"foo.Foo.123Z" (a valid binary name)
- grok and fail (for example because something that starts with 'L' does 
not end with ';')
- if the code is "hackish" it might blindly trim the last character if 
the 1st is '[', so we would end up with "foo/Foo;/123" and consequently 
with "foo.Foo;.123" (not a valid binary name)
- something else that neither of us can imagine now

> or "foo.Foo/123Z" which are invalid name whereas "Lfoo/Foo.123Z;" 
> (option c) may have higher chance be mapped to "foo.Foo.123Z" which is 
> a valid binary name.

Right, but neither is 'c`' immune to that interpretation. At least the 
failure mode of 'c' is more predictable.

>
> ";" and "[" are already used for descriptor.  The remaining ones are 
> "." and "/".
>
> JDWP and JDI are examples of existing tools that obtain the type 
> descriptor by calling JVM TI `GetClassSignature` and then trims the 
> "L;" envelope and replace "/" with ".".    Option c produces 
> "foo.Foo.123Z" as the resulting string which might make it harder

And what does option 'c`' produce?

Existing JDK tools could be updated from day 0. Existing 3rd party tools 
would have to be updated too in either case. Typical failure mode for 
option 'c' would be that class "foo.Foo.123Z" can't be found. Who knows 
what kind of failure modes would option 'c`' produce if parsing was done 
in C for example. Are crashes excluded?

>
> 6. Throwing an exception (option a) may make existing libraries to 
> catch issues very early on.  I see the consistency that John made 
> about dual-use APIs that prints a human-readable but not resolvable 
> descriptor.  I got convinced that option c and c' have the benefit 
> over option a.
>
> 7. Existing tools or scripts that parse the descriptor string have to 
> be updated in both option c and c' to properly handle hidden classes.  
> Option c may just hide the problem which is bad if it's left unnoticed 
> but happens in customer environments.

I doubt that the problem would be hidden with option 'c'. Either the 
code would just work (because it needs not resolve the descriptor of HC) 
or it would grok on trying to resolve it. In theory the binary name 
"foo.Foo.123Z" could be resolved into a real class, but that's hardly 
possible in practice unless you specifically construct such case. And 
option 'c`' is not immune to that as well. So I don't think that we 
would suddenly see a bunch of wrong resolvings where "foo.Foo.123Z" 
would actually be resolved successfully. You have a say in how the 
suffix in "foo.Foo/suffix" is constructed and by using something that is 
not a usual name the chances can be minimized.

>
> My only concern is the compatibility risk on existing agents that 
> assume JVM TI `GetClassSignature` returns a valid type descriptor and 
> use it to resolution.   Both option c and c' return an invalid 
> descriptor string and so I consider the impact is about the same.    
> JDI and JDWP have to be updated to work with either new form. As John 
> noted, option c' has the fail-fast properties that may help existing 
> code to diagnose issues during migration.
>
> That's my summary why I went with option c'.    The preference is 
> "slightly".
>
> Any other thought?

I think that it is easier to debug a more predictable failure even if it 
happens a little later (when resolving the descriptor) than it is to 
debug an unpredictable (unimagined) failure which supposedly happens a 
little earlier. In that respect, option 'a' is most predictable, but it 
might be "to early" (for example, what if some code just wants to log 
the descriptor). And 'c`' seems a little scary to me, because I can't 
imagine all the possible failures.

Regards, Peter

>
> Mandy
> [1] 
> http://cr.openjdk.java.net/~mchung/valhalla/webrevs/hidden-classes/webrev.06-delta/




More information about the valhalla-dev mailing list