RFR: 8301065: Handle control characters in java_lang_String::print [v2]

Fri Apr 14 20:21:37 UTC 2023

On Wed, 25 Jan 2023 20:43:12 GMT, Ioi Lam <iklam at openjdk.org> wrote:

>> Kevin Walls has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>> 
>>  - update: get char or byte for control char check
>>  - Merge branch 'master' into escape_objectprinting
>>  - 8301065: Handle control characters in java_lang_String::print
>
> src/hotspot/share/classfile/javaClasses.cpp line 768:
> 
>> 766:   for (int index = 0; index < length; index++) {
>> 767:     jchar c = value->byte_at(index);
>> 768:     if (c < ' ') {
> 
> `byte_at()` will give the wrong result if `is_latin1` is false.
> 
> Maybe we should use `java_lang_String::as_quoted_ascii()` instead?

At first I read that comment as pointing out that this will not print all characters.  But that is the existing behaviour, the existing java_lang_String::print does not handle extended/non-ASCII characters, it just doesn't print them.

This is called only at debug time, including hs_err reporting, so expect nobody yet thought that lack of the extended characters was a problem (if it is, maybe we should handle it in a separate change).

But it should be better using the existing condition to fetch a char or byte, then check if that's a low/control character value.

I looked at using java_lang_String::as_quoted_ascii(), and then we see escaped versions of extended characters, as well as control characters.  At least we do see the extended characters, rather than nothing, but a concern might be that as_quoted_ascii makes an allocation, and this is called at debug time, maybe after a crash.

I also looked at the UNICODE::as_utf8 conversion methods, they all require a resource area allocation at some point.

Separately I tried a version which made an allocation with NEW_RESOURCE_ARRAY_RETURN_NULL (which as_quoted_ascii does) and called UNICODE::as_utf8 to get a utf8 string, then prints that (escaping the low-numbered characters).  That works and shows extended characters, but the allocation might be a concern.  I was thinking better to get this check done, and visit extended characters separately if there's interest.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/12190#discussion_r1167242462