PING: RFR: 8204531: Remove unused chars following '\0'

Wed Jun 13 13:33:04 UTC 2018

Thanks Thomas,

> That said, I'd probably just add a variant to
> CStringUtilities::getString which takes encoding as an argument, and
> then pass "US-ASCII" to remain compatible with the current version of
> jsnap.

I agree with you. My opinion is one of the ideals.
Anyway, I uploaded new webrev which is added charset support in CStringUtils:

   http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.01/

Thanks,

Yasumasa

On 2018/06/13 21:46, Thomas Stüfe wrote:
> Disclaimer: I am Reviewer but not versed in this corner of the jdk,
> nor have knowledge of the history, so take what I suggest with a grain
> of salt.
> 
> That said, I'd probably just add a variant to
> CStringUtilities::getString which takes encoding as an argument, and
> then pass "US-ASCII" to remain compatible with the current version of
> jsnap.
> 
> ..Thomas
> 
> On Wed, Jun 13, 2018 at 2:37 PM, Yasumasa Suenaga <yasuenag at gmail.com> wrote:
>> Hi Thomas,
>>
>> Thank you for your comment.
>>
>> CStringUtilities.java says the String which is constructed from byte array
>> should use Charset.defaultCharset(). [1]
>> I'm not convinced how should we fix - US-ASCII or default charset.
>>
>> IMHO we should use default charset. However I concern I need to get CSR
>> approvement.
>> (In addition, I want to refactor CStringUtilities. I guess we can make more
>> simple code :-)
>>
>>
>> What do you think about it?
>>
>> Yasumasa
>>
>>
>> [1]
>> http://hg.openjdk.java.net/jdk/jdk/file/7bf4f1b5e438/src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/utilities/CStringUtilities.java#l73
>>
>>
>>
>> On 2018/06/13 21:18, Thomas Stüfe wrote:
>>>
>>> Hi Yasumasa,
>>>
>>> your patch will change behavior of jsnap if file.encoding is set to
>>> something different than US-ASCII.
>>>
>>> Old version uses, hard wired, US-ASCII.
>>>
>>> You now use jvm/hotspot/utilities/CStringUtilities.java, which uses
>>>
>>> private static String encoding = System.getProperty("file.encoding",
>>> "US-ASCII");
>>>
>>> and thus may use a different encoding.
>>>
>>> Best Regards, Thomas
>>>
>>>
>>>
>>> On Wed, Jun 13, 2018 at 2:09 PM, Yasumasa Suenaga <yasuenag at gmail.com>
>>> wrote:
>>>>
>>>> PING: Could you review it?
>>>> I want to merge this change before JDK 11 RDP 1.
>>>>
>>>>>>>       webrev:
>>>>>>>         http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.00/
>>>>>>>
>>>>>>>       JBS:
>>>>>>>         https://bugs.openjdk.java.net/browse/JDK-8204531
>>>>
>>>>
>>>>
>>>>
>>>> Yasumasa
>>>>
>>>>
>>>> On 2018/06/08 15:38, Yasumasa Suenaga wrote:
>>>>>
>>>>>
>>>>> Hi David,
>>>>>
>>>>> 2018-06-08 10:47 GMT+09:00 David Holmes <david.holmes at oracle.com>:
>>>>>>
>>>>>>
>>>>>> Hi Yasumasa,
>>>>>>
>>>>>> On 7/06/2018 10:43 PM, Yasumasa Suenaga wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Please review this change:
>>>>>>>
>>>>>>>       webrev:
>>>>>>>         http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.00/
>>>>>>>
>>>>>>>       JBS:
>>>>>>>         https://bugs.openjdk.java.net/browse/JDK-8204531
>>>>>>>
>>>>>>>
>>>>>>> We can use `jhsdb jsnap` to check all PerfData.
>>>>>>> String values in PerfData are defined as jbyte array, but JSnap cannot
>>>>>>> handle it well as following:
>>>>>>>
>>>>>>> ```
>>>>>>> $ jhsdb jsnap --pid 28542 --all | less
>>>>>>>
>>>>>>> sun.gc.cause=No
>>>>>>>
>>>>>>>
>>>>>>> GC^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
>>>>>>> ```
>>>>>>>
>>>>>>> You can see this value via `less` and `vim` on Linux. `^@` shows it is
>>>>>>> non-ascii character.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> So your synopsis says "remove unused chars following \0". Is that
>>>>>> really
>>>>>> what is happening? I would have expected "new String(bytes, encoding)"
>>>>>> to
>>>>>> stop processing bytes when it encounters a NUL!
>>>>>
>>>>>
>>>>>
>>>>> String c'tor will continue to parse byte array even if it finds '\0'.
>>>>> You can see it on JShell as below:
>>>>>
>>>>> -------------
>>>>> jshell> var bytes = new byte[]{(byte)'a', (byte)'b', (byte)'c',
>>>>> (byte)'\0', (byte)'d', (byte)'e', (byte)'f'}
>>>>> bytes ==> byte[7] { 97, 98, 99, 0, 100, 101, 102 }
>>>>>
>>>>> jshell> new String(bytes, "US-ASCII")
>>>>> $2 ==> "abc\000def"
>>>>> -------------
>>>>>
>>>>>
>>>>>>> PerfDataEntry has null-terminated C string. So we should restore as it
>>>>>>> in
>>>>>>> Java layer.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> If the issue is really "junk" beyond the \0 nul-terminator, how did we
>>>>>> get
>>>>>> such values? Shouldn't the byte array already be terminated at the NUL?
>>>>>> Or
>>>>>> is this just a raw representation of an array in the VM using the exact
>>>>>> length of the array regardless of content?
>>>>>
>>>>>
>>>>>
>>>>> I think JSnap shows the array regardless of content.
>>>>>
>>>>>
>>>>>> I see CStringUtilities.getString() both stops when it encounters a 0
>>>>>> value,
>>>>>> and uses the default file encoding (else ASCII).
>>>>>>
>>>>>> The fix seems perfectly reasonable, I'm just unclear where exactly this
>>>>>> "bad" data should have been stopped.
>>>>>
>>>>>
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Yasumasa
>>>>>>>
>>>>>>
>>>>
>>