PING: RFR: 8204531: Remove unused chars following '\0'

Wed Jun 13 12:18:20 UTC 2018

Hi Yasumasa,

your patch will change behavior of jsnap if file.encoding is set to
something different than US-ASCII.

Old version uses, hard wired, US-ASCII.

You now use jvm/hotspot/utilities/CStringUtilities.java, which uses

private static String encoding = System.getProperty("file.encoding",
"US-ASCII");

and thus may use a different encoding.

Best Regards, Thomas

On Wed, Jun 13, 2018 at 2:09 PM, Yasumasa Suenaga <yasuenag at gmail.com> wrote:
> PING: Could you review it?
> I want to merge this change before JDK 11 RDP 1.
>
>>>>     webrev:
>>>>       http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.00/
>>>>
>>>>     JBS:
>>>>       https://bugs.openjdk.java.net/browse/JDK-8204531
>
>
>
> Yasumasa
>
>
> On 2018/06/08 15:38, Yasumasa Suenaga wrote:
>>
>> Hi David,
>>
>> 2018-06-08 10:47 GMT+09:00 David Holmes <david.holmes at oracle.com>:
>>>
>>> Hi Yasumasa,
>>>
>>> On 7/06/2018 10:43 PM, Yasumasa Suenaga wrote:
>>>>
>>>>
>>>> Hi all,
>>>>
>>>> Please review this change:
>>>>
>>>>     webrev:
>>>>       http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.00/
>>>>
>>>>     JBS:
>>>>       https://bugs.openjdk.java.net/browse/JDK-8204531
>>>>
>>>>
>>>> We can use `jhsdb jsnap` to check all PerfData.
>>>> String values in PerfData are defined as jbyte array, but JSnap cannot
>>>> handle it well as following:
>>>>
>>>> ```
>>>> $ jhsdb jsnap --pid 28542 --all | less
>>>>
>>>> sun.gc.cause=No
>>>>
>>>> GC^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
>>>> ```
>>>>
>>>> You can see this value via `less` and `vim` on Linux. `^@` shows it is
>>>> non-ascii character.
>>>
>>>
>>>
>>> So your synopsis says "remove unused chars following \0". Is that really
>>> what is happening? I would have expected "new String(bytes, encoding)" to
>>> stop processing bytes when it encounters a NUL!
>>
>>
>> String c'tor will continue to parse byte array even if it finds '\0'.
>> You can see it on JShell as below:
>>
>> -------------
>> jshell> var bytes = new byte[]{(byte)'a', (byte)'b', (byte)'c',
>> (byte)'\0', (byte)'d', (byte)'e', (byte)'f'}
>> bytes ==> byte[7] { 97, 98, 99, 0, 100, 101, 102 }
>>
>> jshell> new String(bytes, "US-ASCII")
>> $2 ==> "abc\000def"
>> -------------
>>
>>
>>>> PerfDataEntry has null-terminated C string. So we should restore as it
>>>> in
>>>> Java layer.
>>>
>>>
>>>
>>> If the issue is really "junk" beyond the \0 nul-terminator, how did we
>>> get
>>> such values? Shouldn't the byte array already be terminated at the NUL?
>>> Or
>>> is this just a raw representation of an array in the VM using the exact
>>> length of the array regardless of content?
>>
>>
>> I think JSnap shows the array regardless of content.
>>
>>
>>> I see CStringUtilities.getString() both stops when it encounters a 0
>>> value,
>>> and uses the default file encoding (else ASCII).
>>>
>>> The fix seems perfectly reasonable, I'm just unclear where exactly this
>>> "bad" data should have been stopped.
>>
>>
>> Thanks!
>>
>>
>> Yasumasa
>>
>>
>>> Thanks,
>>> David
>>>
>>>
>>>>
>>>> Thanks,
>>>>
>>>> Yasumasa
>>>>
>>>
>