RFR: 8301065: Handle control characters in java_lang_String::print [v2]
Kevin Walls
kevinw at openjdk.org
Fri Apr 14 20:21:37 UTC 2023
On Wed, 25 Jan 2023 20:43:12 GMT, Ioi Lam <iklam at openjdk.org> wrote:
>> Kevin Walls has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>>
>> - update: get char or byte for control char check
>> - Merge branch 'master' into escape_objectprinting
>> - 8301065: Handle control characters in java_lang_String::print
>
> src/hotspot/share/classfile/javaClasses.cpp line 768:
>
>> 766: for (int index = 0; index < length; index++) {
>> 767: jchar c = value->byte_at(index);
>> 768: if (c < ' ') {
>
> `byte_at()` will give the wrong result if `is_latin1` is false.
>
> Maybe we should use `java_lang_String::as_quoted_ascii()` instead?
At first I read that comment as pointing out that this will not print all characters. But that is the existing behaviour, the existing java_lang_String::print does not handle extended/non-ASCII characters, it just doesn't print them.
This is called only at debug time, including hs_err reporting, so expect nobody yet thought that lack of the extended characters was a problem (if it is, maybe we should handle it in a separate change).
But it should be better using the existing condition to fetch a char or byte, then check if that's a low/control character value.
I looked at using java_lang_String::as_quoted_ascii(), and then we see escaped versions of extended characters, as well as control characters. At least we do see the extended characters, rather than nothing, but a concern might be that as_quoted_ascii makes an allocation, and this is called at debug time, maybe after a crash.
I also looked at the UNICODE::as_utf8 conversion methods, they all require a resource area allocation at some point.
Separately I tried a version which made an allocation with NEW_RESOURCE_ARRAY_RETURN_NULL (which as_quoted_ascii does) and called UNICODE::as_utf8 to get a utf8 string, then prints that (escaping the low-numbered characters). That works and shows extended characters, but the allocation might be a concern. I was thinking better to get this check done, and visit extended characters separately if there's interest.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/12190#discussion_r1167242462
More information about the hotspot-runtime-dev
mailing list