java.lang.constant.ClassDesc and TypeDescriptor for hidden class??
mandy.chung at oracle.com
Thu Apr 16 16:33:16 UTC 2020
On 4/16/20 2:34 AM, forax at univ-mlv.fr wrote:
> Hi Mandy,
> We have forgotten MethodType.toMethodDescriptorString() , it has to
> be updated too.
It's not forgotten; I just didn't list it explicitly in the summary:
> I still disagree with John: it's fine to use the scheme c' as a name
> for the Hidden Class. I still think the Java methods that say in
> their spec that they return a descriptor should return a valid
> descriptor, and thus throw an exception if there is a hidden class
> somewhere in the descriptor so a user has to decide how to handle
> Hidden Class before calling those methods.
> Anyway, we should move on from that. The patch is fine for me if the
> javadoc of toMethodDescriptorString is updated too, so let's record
> that I disagree and move on.
It was updated:
> And for ASM, we have already talked to deprecate any methods that
> takes a live object in Type (j.l.Class and j.l.r.Method) because it
> causes surprising classloading loop when ASM is used in a classloader
> or in an agent, hidden class not having a proper name is another data
> point in that direction.
> *From: *"mandy chung" <mandy.chung at oracle.com>
> *To: *"Peter Levart" <peter.levart at gmail.com>, "Remi Forax"
> <forax at univ-mlv.fr>
> *Cc: *"John Rose" <john.r.rose at oracle.com>, "Brian Goetz"
> <brian.goetz at oracle.com>, "valhalla-dev"
> <valhalla-dev at openjdk.java.net>
> *Sent: *Wednesday, 15 April 2020 20:40:48
> *Subject: *Re: java.lang.constant.ClassDesc and TypeDescriptor for
> hidden class??
> Hi Peter, Remi,
> Lois and Harold have given further feedback. They are concerned
> about the impact of option c' on JVM implementations, because this
> new form of signature puts part of the name outside the "L;" envelope,
> whereas signatures have always been inside the "L;" envelope since day 1.
> Option c' appears to have a higher compatibility risk not only in
> existing libraries but probably also VM implementations.
> Option c has a low compatibility risk while existing code still
> needs to be updated to support hidden classes. As John advises,
> it would be a mistake for `Class::descriptorString` to throw
> an exception in the long run. We have exhausted the differences
> among these options.
> Spec change proposal is:
> - extend TypeDescriptor for entities that cannot be described
> nominally. If it can be described nominally,
> TypeDescriptor::descriptorString returns a field/method descriptor
> conforming to JVMS 4.3. If it cannot be described nominally, the
> result string is not a type descriptor. No nominal descriptor can
> be produced.
> - specify in the javadoc for Class::descriptorString and
> MethodType::descriptorString what the result string is in both cases:
> a descriptor conforming to JVMS 4.3 when the type can be described
> nominally, and a non-descriptor string when it cannot.
> - No JVMS change
> Here is the updated specdiff and webrev:
> Please review.
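
[Editorial sketch] To make the "can be described nominally" case of the proposal concrete, here is a minimal Java example (JDK 12 or later assumed) that prints JVMS 4.3 descriptors for ordinary classes and a method type. The class name DescriptorDemo and its helper methods are made up for illustration and are not part of the patch. The hidden-class case is not exercised, because creating a hidden class needs class bytes and Lookup::defineHiddenClass.

```java
import java.lang.invoke.MethodType;

public class DescriptorDemo {
    // Field descriptor of a nominally describable class (JVMS 4.3.2).
    static String fieldDescriptor(Class<?> c) {
        return c.descriptorString();
    }

    // Method descriptor of a method type (JVMS 4.3.3).
    static String methodDescriptor(MethodType mt) {
        return mt.descriptorString();
    }

    // True when a nominal descriptor (ClassDesc) can be produced for the class.
    static boolean hasNominalDescriptor(Class<?> c) {
        return c.describeConstable().isPresent();
    }

    public static void main(String[] args) {
        MethodType mt = MethodType.methodType(int.class, String.class, Object[].class);
        System.out.println(fieldDescriptor(String.class)); // Ljava/lang/String;
        System.out.println(fieldDescriptor(int[].class));  // [I
        System.out.println(methodDescriptor(mt));          // (Ljava/lang/String;[Ljava/lang/Object;)I
        // toMethodDescriptorString() is the older method that yields the same string.
        System.out.println(mt.toMethodDescriptorString().equals(mt.descriptorString()));
        System.out.println(hasNominalDescriptor(String.class)); // true
    }
}
```

For a hidden class, the proposal above means the first two helpers would return a string that is not a valid descriptor, and no nominal descriptor could be produced.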
> On 4/14/20 2:16 AM, Peter Levart wrote:
> On 4/13/20 9:09 PM, Mandy Chung wrote:
> On 4/12/20 5:14 AM, Remi Forax wrote:
> The problem is not that 'c' is easier to parse,
> but that 'c`' is not
> parsable at all. Do we really want unparsable
> method descriptors?
> If the problem is preventing resolving of hidden
> class names or
> descriptors, then it seems that making the method
> descriptors unparsable
> is not the right place to do that.
> I agree with Peter,
> throwing an exception is better, there is no way to
> encode a hidden class in a descriptor because a hidden
> class has no name you can lookup,
> if the API returns an unparsable method descriptor, the
> user code will throw an exception anyway.
> Several points that are noteworthy:
> 1. A resolved method never has a hidden class in its
> signature as a hidden class cannot be discovered by any
> class loader.
> 2. When VM fails to resolve a symbolic reference to a
> hidden class, it might print its name or descriptor string
> in the error message. Lois and Harold can confirm if this
> should or should not cause any issue (I can't see how it
> would cause any issue yet).
> 3. The only way to get a method descriptor with a hidden
> class in it is by constructing `MethodType` with a `Class`
> object representing a hidden class.
> Or by custom code that manipulates class descriptors using
> String operations. Suppose there's code that doesn't want to
> eagerly resolve types and just manipulates Strings. Surely a
> class descriptor of a HC can only be obtained when there *is*
> a HC already present, but ... never underestimate a programmer's
> imagination when (s)he is combining information from various
> sources, some of them might be resolvable types, some might be
> just descriptors, etc...
> True. Our imagination is powerful!
> The main thing is that a bad method descriptor will fail to resolve.
> 4. `Class::descriptorString` on a hidden class is
> human-readable but not a valid descriptor (both option c
> and c')
> 5. The special character chosen by option c and c' is an
> illegal character for an unqualified name ("." ";" "["
> "/" see JVMS 4.2.2). This way loading a class of the name
> of a hidden class will always get CNFE via bytecode
> linkage or Class::forName etc (either from Class::getName
> or mapped from Class::descriptorString).
> Right. The JVMS may remain unchanged. But that doesn't mean
> that Class.descriptorString() couldn't be specified to return
> a JVMS valid descriptor for classical named types, while for
> HCs (or derived types like arrays) it would return a special
> unresolvable descriptor with carefully specified syntax. Such
> a syntax that would play well when composed into the syntax of
> higher-level descriptors like method type descriptor. Why
> would we want that? Because by that we get a more predictable
> failure mode. We only fail when/if the type described by such
> descriptor tries to be resolved.
> In this respect, both variants 'c' and 'c`' as you said,
> violate JVMS spec for valid class descriptor, but 'c' has a
> more carefully chosen syntax.
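
[Editorial sketch] Point 5 and the reply above say that a name carrying the hidden-class suffix fails only when someone tries to resolve it. A minimal Java illustration follows; the class HiddenNameLookup and the names "foo.Foo/123Z" and "foo.Foo.123Z" are hypothetical, taken from the examples in this thread, and it assumes no class named foo.Foo.123Z is on the class path.

```java
public class HiddenNameLookup {
    // Returns true if the name resolves to a class, false on ClassNotFoundException.
    static boolean resolvable(String name) {
        try {
            Class.forName(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // '/' is not legal in a binary name passed to Class.forName.
        System.out.println(resolvable("foo.Foo/123Z"));
        // A syntactically valid binary name that simply does not exist here.
        System.out.println(resolvable("foo.Foo.123Z"));
        // An ordinary class for comparison.
        System.out.println(resolvable("java.lang.String"));
    }
}
```

Both made-up names fail at lookup time, which is the "predictable failure mode" being argued for; the second one fails only because no such class exists, not because the name is malformed.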
> 'c' keeps the "L;" envelope which is the long-standing format.
> I would say putting the name inside the "L;" envelope may not
> break existing tools (for example if they never use the name to
> find the class) whereas putting the suffix following ';' is harder
> to predict how existing tools are impacted.
> For existing tools that map a descriptor string by
> trimming "L;" envelope and/or replacing "/" with ".",
> "Lfoo/Foo;/123Z" (option c') may be mapped to "foo.Foo"
> and ".123Z" (if used ";" as a separator)...
> I would say that for existing tools that treat a single class
> descriptor at once, with option 'c`' they won't treat ';' as a
> separator between multiple elements. I would say that existing
> code that tries to trim 'L;' would either:
> - remove the 'L' prefix and strip the string of ';' character
> wherever it is, which would produce "foo/Foo/123Z" and
> consequently "foo.Foo.123Z" (a valid binary name)
> - grok and fail (for example because something that starts
> with 'L' does not end with ';')
> - if the code is "hackish" it might blindly trim the last
> character if the 1st is 'L', so we would end up with
> "foo/Foo;/123" and consequently with "foo.Foo;.123" (not a
> valid binary name)
> - something else that neither of us can imagine now
> or "foo.Foo/123Z" which are invalid names whereas
> "Lfoo/Foo.123Z;" (option c) may have a higher chance of being
> mapped to "foo.Foo.123Z" which is a valid binary name.
> Right, but neither is 'c`' immune to that interpretation. At
> least the failure mode of 'c' is more predictable.
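
[Editorial sketch] The mappings discussed above can be reproduced with a few lines of Java. The three helpers are hypothetical stand-ins for what existing tools might do (none is taken from JDI, JDWP or any real tool), and the descriptor strings are the examples used in this thread.

```java
public class DescriptorMapping {
    // Careful tool: trims the envelope only if the string is "L...;", then maps '/' to '.'.
    static String trimEnvelope(String d) {
        if (d.startsWith("L") && d.endsWith(";")) {
            d = d.substring(1, d.length() - 1);
        }
        return d.replace('/', '.');
    }

    // Tool that drops the 'L' prefix and removes ';' wherever it occurs.
    static String stripAllSemicolons(String d) {
        String s = d.startsWith("L") ? d.substring(1) : d;
        return s.replace(";", "").replace('/', '.');
    }

    // "Hackish" tool: if the first character is 'L', blindly drops the first and last character.
    static String blindTrim(String d) {
        String s = d.startsWith("L") ? d.substring(1, d.length() - 1) : d;
        return s.replace('/', '.');
    }

    public static void main(String[] args) {
        String optionC = "Lfoo/Foo.123Z;";      // suffix inside the envelope
        String optionCPrime = "Lfoo/Foo;/123Z"; // suffix after ';'
        System.out.println(trimEnvelope(optionC));            // foo.Foo.123Z
        System.out.println(trimEnvelope(optionCPrime));       // Lfoo.Foo;.123Z
        System.out.println(stripAllSemicolons(optionCPrime)); // foo.Foo.123Z
        System.out.println(blindTrim(optionCPrime));          // foo.Foo;.123
    }
}
```

Option c yields the same binary-looking name under all three strategies, while option c' yields a different string for each one, which is the predictability difference being debated.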
> ";" and "[" are already used for descriptor. The
> remaining ones are "." and "/".
> JDWP and JDI are examples of existing tools that obtain
> the type descriptor by calling JVM TI `GetClassSignature`
> and then trims the "L;" envelope and replace "/" with
> ".". Option c produces "foo.Foo.123Z" as the resulting
> string which might make it harder
> And what does option 'c`' produce?
> Existing JDK tools could be updated from day 0. Existing 3rd
> party tools would have to be updated too in either case.
> Typical failure mode for option 'c' would be that class
> "foo.Foo.123Z" can't be found. Who knows what kind of failure
> modes would option 'c`' produce if parsing was done in C for
> example. Are crashes excluded?
> 6. Throwing an exception (option a) may make existing
> libraries catch issues very early on. I see the
> consistency that John made about dual-use APIs that prints
> a human-readable but not resolvable descriptor. I got
> convinced that option c and c' have the benefit over
> option a.
> 7. Existing tools or scripts that parse the descriptor
> string have to be updated in both option c and c' to
> properly handle hidden classes. Option c may just hide
> the problem which is bad if it's left unnoticed but
> happens in customer environments.
> I doubt that the problem would be hidden with option 'c'.
> Either the code would just work (because it needs not resolve
> the descriptor of HC) or it would grok on trying to resolve
> it. In theory the binary name "foo.Foo.123Z" could be resolved
> into a real class, but that's hardly possible in practice
> unless you specifically construct such case. And option 'c`'
> is not immune to that as well. So I don't think that we would
> suddenly see a bunch of wrong resolvings where "foo.Foo.123Z"
> would actually be resolved successfully. You have a say in how
> the suffix in "foo.Foo/suffix" is constructed and by using
> something that is not a usual name the chances can be minimized.
> My only concern is the compatibility risk on existing
> agents that assume JVM TI `GetClassSignature` returns a
> valid type descriptor and use it for resolution. Both
> option c and c' return an invalid descriptor string and so
> I consider the impact is about the same. JDI and JDWP
> have to be updated to work with either new form. As John
> noted, option c' has the fail-fast properties that may
> help existing code to diagnose issues during migration.
> That's my summary why I went with option c'. The
> preference is "slightly".
> Any other thought?
> I think that it is easier to debug a more predictable failure
> even if it happens a little later (when resolving the
> descriptor) than it is to debug an unpredictable (unimagined)
> failure which supposedly happens a little earlier. In that
> respect, option 'a' is most predictable, but it might be "too
> early" (for example, what if some code just wants to log the
> descriptor). And 'c`' seems a little scary to me, because I
> can't imagine all the possible failures.
> Regards, Peter