RFR: 8301065: Handle control characters in java_lang_String::print [v2]

Mon Apr 17 04:39:46 UTC 2023

On Fri, 14 Apr 2023 20:16:18 GMT, Kevin Walls <kevinw at openjdk.org> wrote:

>> src/hotspot/share/classfile/javaClasses.cpp line 768:
>> 
>>> 766:   for (int index = 0; index < length; index++) {
>>> 767:     jchar c = value->byte_at(index);
>>> 768:     if (c < ' ') {
>> 
>> `byte_at()` will give the wrong result if `is_latin1` is false.
>> 
>> Maybe we should use `java_lang_String::as_quoted_ascii()` instead?
>
> At first I read that comment as pointing out that this will not print all characters.  But that is the existing behaviour, the existing java_lang_String::print does not handle extended/non-ASCII characters, it just doesn't print them.
> 
> This is called only at debug time, including hs_err reporting, so expect nobody yet thought that lack of the extended characters was a problem (if it is, maybe we should handle it in a separate change).
> 
> But it should be better using the existing condition to fetch a char or byte, then check if that's a low/control character value.
> 
> I looked at using java_lang_String::as_quoted_ascii(), and then we see escaped versions of extended characters, as well as control characters.  At least we do see the extended characters, rather than nothing, but a concern might be that as_quoted_ascii makes an allocation, and this is called at debug time, maybe after a crash.
> 
> I also looked at the UNICODE::as_utf8 conversion methods, they all require a resource area allocation at some point.
> 
> Separately I tried a version which made an allocation with NEW_RESOURCE_ARRAY_RETURN_NULL (which as_quoted_ascii does) and called UNICODE::as_utf8 to get a utf8 string, then prints that (escaping the low-numbered characters).  That works and shows extended characters, but the allocation might be a concern.  I was thinking better to get this check done, and visit extended characters separately if there's interest.

This new version looks good to me. I agree that the extra allocation in UNICODE::as_utf8 might be a concern.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/12190#discussion_r1168150388