PING: RFR: 8204531: Remove unused chars following '\0'

Thu Jun 14 05:01:06 UTC 2018

Hi David,

Thank you for your comment.

I removed the assertion from PerfDataEntry.java .
Could you review again?

  http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.03/

Thanks,

Yasuamsa

2018-06-14 13:41 GMT+09:00 David Holmes <david.holmes at oracle.com>:
> Hi Yasumasa,
>
> This seems mostly okay - though like Thomas this is not an area I'm that
> familiar with. But it seems this addresses the problem without impacting
> other things.
>
> This assert is redundant:
>
>  366                 if (Assert.ASSERTS_ENABLED) {
>  367                     Assert.that(vectorLength() > 0, "not a byte
> vector");
>
> You can't reach it unless len != 0 and I assume you're not really checking
> that len is not-negative - if you are then the assert needs to go further up
> after:
>
> 328         int len = vectorLength();
>
> Thanks,
> David
>
>
> On 14/06/2018 2:18 PM, Yasumasa Suenaga wrote:
>>
>> Hi Thomas,
>>
>> Thank you for your comment.
>> I uploaded new webrev:
>>
>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.02/
>>
>>
>> 2018-06-14 3:34 GMT+09:00 Thomas Stüfe <thomas.stuefe at gmail.com>:
>>>
>>> Hi Yasumasa,
>>>
>>> looks good to me.
>>>
>>> small nit:
>>> import java.nio.charset.* -> import java.nio.charset.Charset;
>>
>>
>> Fixed.
>>
>>
>>> We loose the assert for type now from byteArrayValue(). Can you re-add
>>> it to the location where you call CStringUtils?
>>
>>
>> I added the assertion to check vector length.
>> I did not add type assertion because if-statement branches by data type.
>>
>>
>>> We do not seem to need byteArrayValue(), can this be removed?
>>
>>
>> I cannot agree that.
>> byteArrayValue() is public method (but it is not used AFAICS).
>> However, similar getter methods are privided from PerfDataEntry (e.g.
>> charArrayValue()). So I think we should not remove byteArrayValue().
>>
>>
>> Regards,
>>
>> Yasumasa
>>
>>
>>> Best Regards, Thomas
>>>
>>>
>>>
>>> On Wed, Jun 13, 2018 at 3:33 PM, Yasumasa Suenaga <yasuenag at gmail.com>
>>> wrote:
>>>>
>>>> Thanks Thomas,
>>>>
>>>>> That said, I'd probably just add a variant to
>>>>> CStringUtilities::getString which takes encoding as an argument, and
>>>>> then pass "US-ASCII" to remain compatible with the current version of
>>>>> jsnap.
>>>>
>>>>
>>>>
>>>> I agree with you. My opinion is one of the ideals.
>>>> Anyway, I uploaded new webrev which is added charset support in
>>>> CStringUtils:
>>>>
>>>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.01/
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Yasumasa
>>>>
>>>>
>>>>
>>>> On 2018/06/13 21:46, Thomas Stüfe wrote:
>>>>>
>>>>>
>>>>> Disclaimer: I am Reviewer but not versed in this corner of the jdk,
>>>>> nor have knowledge of the history, so take what I suggest with a grain
>>>>> of salt.
>>>>>
>>>>> That said, I'd probably just add a variant to
>>>>> CStringUtilities::getString which takes encoding as an argument, and
>>>>> then pass "US-ASCII" to remain compatible with the current version of
>>>>> jsnap.
>>>>>
>>>>> ..Thomas
>>>>>
>>>>> On Wed, Jun 13, 2018 at 2:37 PM, Yasumasa Suenaga <yasuenag at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Hi Thomas,
>>>>>>
>>>>>> Thank you for your comment.
>>>>>>
>>>>>> CStringUtilities.java says the String which is constructed from byte
>>>>>> array
>>>>>> should use Charset.defaultCharset(). [1]
>>>>>> I'm not convinced how should we fix - US-ASCII or default charset.
>>>>>>
>>>>>> IMHO we should use default charset. However I concern I need to get
>>>>>> CSR
>>>>>> approvement.
>>>>>> (In addition, I want to refactor CStringUtilities. I guess we can make
>>>>>> more
>>>>>> simple code :-)
>>>>>>
>>>>>>
>>>>>> What do you think about it?
>>>>>>
>>>>>> Yasumasa
>>>>>>
>>>>>>
>>>>>> [1]
>>>>>>
>>>>>>
>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/7bf4f1b5e438/src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/utilities/CStringUtilities.java#l73
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2018/06/13 21:18, Thomas Stüfe wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hi Yasumasa,
>>>>>>>
>>>>>>> your patch will change behavior of jsnap if file.encoding is set to
>>>>>>> something different than US-ASCII.
>>>>>>>
>>>>>>> Old version uses, hard wired, US-ASCII.
>>>>>>>
>>>>>>> You now use jvm/hotspot/utilities/CStringUtilities.java, which uses
>>>>>>>
>>>>>>> private static String encoding = System.getProperty("file.encoding",
>>>>>>> "US-ASCII");
>>>>>>>
>>>>>>> and thus may use a different encoding.
>>>>>>>
>>>>>>> Best Regards, Thomas
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jun 13, 2018 at 2:09 PM, Yasumasa Suenaga
>>>>>>> <yasuenag at gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> PING: Could you review it?
>>>>>>>> I want to merge this change before JDK 11 RDP 1.
>>>>>>>>
>>>>>>>>>>>        webrev:
>>>>>>>>>>>
>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.00/
>>>>>>>>>>>
>>>>>>>>>>>        JBS:
>>>>>>>>>>>          https://bugs.openjdk.java.net/browse/JDK-8204531
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Yasumasa
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2018/06/08 15:38, Yasumasa Suenaga wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi David,
>>>>>>>>>
>>>>>>>>> 2018-06-08 10:47 GMT+09:00 David Holmes <david.holmes at oracle.com>:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>
>>>>>>>>>> On 7/06/2018 10:43 PM, Yasumasa Suenaga wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> Please review this change:
>>>>>>>>>>>
>>>>>>>>>>>        webrev:
>>>>>>>>>>>
>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.00/
>>>>>>>>>>>
>>>>>>>>>>>        JBS:
>>>>>>>>>>>          https://bugs.openjdk.java.net/browse/JDK-8204531
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> We can use `jhsdb jsnap` to check all PerfData.
>>>>>>>>>>> String values in PerfData are defined as jbyte array, but JSnap
>>>>>>>>>>> cannot
>>>>>>>>>>> handle it well as following:
>>>>>>>>>>>
>>>>>>>>>>> ```
>>>>>>>>>>> $ jhsdb jsnap --pid 28542 --all | less
>>>>>>>>>>>
>>>>>>>>>>> sun.gc.cause=No
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> GC^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
>>>>>>>>>>> ```
>>>>>>>>>>>
>>>>>>>>>>> You can see this value via `less` and `vim` on Linux. `^@` shows
>>>>>>>>>>> it
>>>>>>>>>>> is
>>>>>>>>>>> non-ascii character.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> So your synopsis says "remove unused chars following \0". Is that
>>>>>>>>>> really
>>>>>>>>>> what is happening? I would have expected "new String(bytes,
>>>>>>>>>> encoding)"
>>>>>>>>>> to
>>>>>>>>>> stop processing bytes when it encounters a NUL!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> String c'tor will continue to parse byte array even if it finds
>>>>>>>>> '\0'.
>>>>>>>>> You can see it on JShell as below:
>>>>>>>>>
>>>>>>>>> -------------
>>>>>>>>> jshell> var bytes = new byte[]{(byte)'a', (byte)'b', (byte)'c',
>>>>>>>>> (byte)'\0', (byte)'d', (byte)'e', (byte)'f'}
>>>>>>>>> bytes ==> byte[7] { 97, 98, 99, 0, 100, 101, 102 }
>>>>>>>>>
>>>>>>>>> jshell> new String(bytes, "US-ASCII")
>>>>>>>>> $2 ==> "abc\000def"
>>>>>>>>> -------------
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>> PerfDataEntry has null-terminated C string. So we should restore
>>>>>>>>>>> as
>>>>>>>>>>> it
>>>>>>>>>>> in
>>>>>>>>>>> Java layer.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> If the issue is really "junk" beyond the \0 nul-terminator, how
>>>>>>>>>> did
>>>>>>>>>> we
>>>>>>>>>> get
>>>>>>>>>> such values? Shouldn't the byte array already be terminated at the
>>>>>>>>>> NUL?
>>>>>>>>>> Or
>>>>>>>>>> is this just a raw representation of an array in the VM using the
>>>>>>>>>> exact
>>>>>>>>>> length of the array regardless of content?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think JSnap shows the array regardless of content.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> I see CStringUtilities.getString() both stops when it encounters a
>>>>>>>>>> 0
>>>>>>>>>> value,
>>>>>>>>>> and uses the default file encoding (else ASCII).
>>>>>>>>>>
>>>>>>>>>> The fix seems perfectly reasonable, I'm just unclear where exactly
>>>>>>>>>> this
>>>>>>>>>> "bad" data should have been stopped.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yasumasa
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Yasumasa
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>