PING: RFR: 8204531: Remove unused chars following '\0'

Thu Jun 14 04:18:53 UTC 2018

Hi Thomas,

Thank you for your comment.
I uploaded new webrev:

  http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.02/

2018-06-14 3:34 GMT+09:00 Thomas Stüfe <thomas.stuefe at gmail.com>:
> Hi Yasumasa,
>
> looks good to me.
>
> small nit:
> import java.nio.charset.* -> import java.nio.charset.Charset;

Fixed.

> We loose the assert for type now from byteArrayValue(). Can you re-add
> it to the location where you call CStringUtils?

I added the assertion to check vector length.
I did not add type assertion because if-statement branches by data type.

> We do not seem to need byteArrayValue(), can this be removed?

I cannot agree that.
byteArrayValue() is public method (but it is not used AFAICS).
However, similar getter methods are privided from PerfDataEntry (e.g.
charArrayValue()). So I think we should not remove byteArrayValue().

Regards,

Yasumasa

> Best Regards, Thomas
>
>
>
> On Wed, Jun 13, 2018 at 3:33 PM, Yasumasa Suenaga <yasuenag at gmail.com> wrote:
>> Thanks Thomas,
>>
>>> That said, I'd probably just add a variant to
>>> CStringUtilities::getString which takes encoding as an argument, and
>>> then pass "US-ASCII" to remain compatible with the current version of
>>> jsnap.
>>
>>
>> I agree with you. My opinion is one of the ideals.
>> Anyway, I uploaded new webrev which is added charset support in
>> CStringUtils:
>>
>>   http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.01/
>>
>>
>> Thanks,
>>
>> Yasumasa
>>
>>
>>
>> On 2018/06/13 21:46, Thomas Stüfe wrote:
>>>
>>> Disclaimer: I am Reviewer but not versed in this corner of the jdk,
>>> nor have knowledge of the history, so take what I suggest with a grain
>>> of salt.
>>>
>>> That said, I'd probably just add a variant to
>>> CStringUtilities::getString which takes encoding as an argument, and
>>> then pass "US-ASCII" to remain compatible with the current version of
>>> jsnap.
>>>
>>> ..Thomas
>>>
>>> On Wed, Jun 13, 2018 at 2:37 PM, Yasumasa Suenaga <yasuenag at gmail.com>
>>> wrote:
>>>>
>>>> Hi Thomas,
>>>>
>>>> Thank you for your comment.
>>>>
>>>> CStringUtilities.java says the String which is constructed from byte
>>>> array
>>>> should use Charset.defaultCharset(). [1]
>>>> I'm not convinced how should we fix - US-ASCII or default charset.
>>>>
>>>> IMHO we should use default charset. However I concern I need to get CSR
>>>> approvement.
>>>> (In addition, I want to refactor CStringUtilities. I guess we can make
>>>> more
>>>> simple code :-)
>>>>
>>>>
>>>> What do you think about it?
>>>>
>>>> Yasumasa
>>>>
>>>>
>>>> [1]
>>>>
>>>> http://hg.openjdk.java.net/jdk/jdk/file/7bf4f1b5e438/src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/utilities/CStringUtilities.java#l73
>>>>
>>>>
>>>>
>>>> On 2018/06/13 21:18, Thomas Stüfe wrote:
>>>>>
>>>>>
>>>>> Hi Yasumasa,
>>>>>
>>>>> your patch will change behavior of jsnap if file.encoding is set to
>>>>> something different than US-ASCII.
>>>>>
>>>>> Old version uses, hard wired, US-ASCII.
>>>>>
>>>>> You now use jvm/hotspot/utilities/CStringUtilities.java, which uses
>>>>>
>>>>> private static String encoding = System.getProperty("file.encoding",
>>>>> "US-ASCII");
>>>>>
>>>>> and thus may use a different encoding.
>>>>>
>>>>> Best Regards, Thomas
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jun 13, 2018 at 2:09 PM, Yasumasa Suenaga <yasuenag at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> PING: Could you review it?
>>>>>> I want to merge this change before JDK 11 RDP 1.
>>>>>>
>>>>>>>>>       webrev:
>>>>>>>>>         http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.00/
>>>>>>>>>
>>>>>>>>>       JBS:
>>>>>>>>>         https://bugs.openjdk.java.net/browse/JDK-8204531
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Yasumasa
>>>>>>
>>>>>>
>>>>>> On 2018/06/08 15:38, Yasumasa Suenaga wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hi David,
>>>>>>>
>>>>>>> 2018-06-08 10:47 GMT+09:00 David Holmes <david.holmes at oracle.com>:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Yasumasa,
>>>>>>>>
>>>>>>>> On 7/06/2018 10:43 PM, Yasumasa Suenaga wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> Please review this change:
>>>>>>>>>
>>>>>>>>>       webrev:
>>>>>>>>>         http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.00/
>>>>>>>>>
>>>>>>>>>       JBS:
>>>>>>>>>         https://bugs.openjdk.java.net/browse/JDK-8204531
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> We can use `jhsdb jsnap` to check all PerfData.
>>>>>>>>> String values in PerfData are defined as jbyte array, but JSnap
>>>>>>>>> cannot
>>>>>>>>> handle it well as following:
>>>>>>>>>
>>>>>>>>> ```
>>>>>>>>> $ jhsdb jsnap --pid 28542 --all | less
>>>>>>>>>
>>>>>>>>> sun.gc.cause=No
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> GC^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
>>>>>>>>> ```
>>>>>>>>>
>>>>>>>>> You can see this value via `less` and `vim` on Linux. `^@` shows it
>>>>>>>>> is
>>>>>>>>> non-ascii character.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> So your synopsis says "remove unused chars following \0". Is that
>>>>>>>> really
>>>>>>>> what is happening? I would have expected "new String(bytes,
>>>>>>>> encoding)"
>>>>>>>> to
>>>>>>>> stop processing bytes when it encounters a NUL!
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> String c'tor will continue to parse byte array even if it finds '\0'.
>>>>>>> You can see it on JShell as below:
>>>>>>>
>>>>>>> -------------
>>>>>>> jshell> var bytes = new byte[]{(byte)'a', (byte)'b', (byte)'c',
>>>>>>> (byte)'\0', (byte)'d', (byte)'e', (byte)'f'}
>>>>>>> bytes ==> byte[7] { 97, 98, 99, 0, 100, 101, 102 }
>>>>>>>
>>>>>>> jshell> new String(bytes, "US-ASCII")
>>>>>>> $2 ==> "abc\000def"
>>>>>>> -------------
>>>>>>>
>>>>>>>
>>>>>>>>> PerfDataEntry has null-terminated C string. So we should restore as
>>>>>>>>> it
>>>>>>>>> in
>>>>>>>>> Java layer.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> If the issue is really "junk" beyond the \0 nul-terminator, how did
>>>>>>>> we
>>>>>>>> get
>>>>>>>> such values? Shouldn't the byte array already be terminated at the
>>>>>>>> NUL?
>>>>>>>> Or
>>>>>>>> is this just a raw representation of an array in the VM using the
>>>>>>>> exact
>>>>>>>> length of the array regardless of content?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I think JSnap shows the array regardless of content.
>>>>>>>
>>>>>>>
>>>>>>>> I see CStringUtilities.getString() both stops when it encounters a 0
>>>>>>>> value,
>>>>>>>> and uses the default file encoding (else ASCII).
>>>>>>>>
>>>>>>>> The fix seems perfectly reasonable, I'm just unclear where exactly
>>>>>>>> this
>>>>>>>> "bad" data should have been stopped.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>>
>>>>>>> Yasumasa
>>>>>>>
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Yasumasa
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>