PING: RFR: 8204531: Remove unused chars following '\0'

Thu Jun 14 04:41:37 UTC 2018

Hi Yasumasa,

This seems mostly okay - though like Thomas this is not an area I'm that 
familiar with. But it seems this addresses the problem without impacting 
other things.

This assert is redundant:

  366                 if (Assert.ASSERTS_ENABLED) {
  367                     Assert.that(vectorLength() > 0, "not a byte 
vector");

You can't reach it unless len != 0 and I assume you're not really 
checking that len is not-negative - if you are then the assert needs to 
go further up after:

328         int len = vectorLength();

Thanks,
David

On 14/06/2018 2:18 PM, Yasumasa Suenaga wrote:
> Hi Thomas,
> 
> Thank you for your comment.
> I uploaded new webrev:
> 
>    http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.02/
> 
> 
> 2018-06-14 3:34 GMT+09:00 Thomas Stüfe <thomas.stuefe at gmail.com>:
>> Hi Yasumasa,
>>
>> looks good to me.
>>
>> small nit:
>> import java.nio.charset.* -> import java.nio.charset.Charset;
> 
> Fixed.
> 
> 
>> We loose the assert for type now from byteArrayValue(). Can you re-add
>> it to the location where you call CStringUtils?
> 
> I added the assertion to check vector length.
> I did not add type assertion because if-statement branches by data type.
> 
> 
>> We do not seem to need byteArrayValue(), can this be removed?
> 
> I cannot agree that.
> byteArrayValue() is public method (but it is not used AFAICS).
> However, similar getter methods are privided from PerfDataEntry (e.g.
> charArrayValue()). So I think we should not remove byteArrayValue().
> 
> 
> Regards,
> 
> Yasumasa
> 
> 
>> Best Regards, Thomas
>>
>>
>>
>> On Wed, Jun 13, 2018 at 3:33 PM, Yasumasa Suenaga <yasuenag at gmail.com> wrote:
>>> Thanks Thomas,
>>>
>>>> That said, I'd probably just add a variant to
>>>> CStringUtilities::getString which takes encoding as an argument, and
>>>> then pass "US-ASCII" to remain compatible with the current version of
>>>> jsnap.
>>>
>>>
>>> I agree with you. My opinion is one of the ideals.
>>> Anyway, I uploaded new webrev which is added charset support in
>>> CStringUtils:
>>>
>>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.01/
>>>
>>>
>>> Thanks,
>>>
>>> Yasumasa
>>>
>>>
>>>
>>> On 2018/06/13 21:46, Thomas Stüfe wrote:
>>>>
>>>> Disclaimer: I am Reviewer but not versed in this corner of the jdk,
>>>> nor have knowledge of the history, so take what I suggest with a grain
>>>> of salt.
>>>>
>>>> That said, I'd probably just add a variant to
>>>> CStringUtilities::getString which takes encoding as an argument, and
>>>> then pass "US-ASCII" to remain compatible with the current version of
>>>> jsnap.
>>>>
>>>> ..Thomas
>>>>
>>>> On Wed, Jun 13, 2018 at 2:37 PM, Yasumasa Suenaga <yasuenag at gmail.com>
>>>> wrote:
>>>>>
>>>>> Hi Thomas,
>>>>>
>>>>> Thank you for your comment.
>>>>>
>>>>> CStringUtilities.java says the String which is constructed from byte
>>>>> array
>>>>> should use Charset.defaultCharset(). [1]
>>>>> I'm not convinced how should we fix - US-ASCII or default charset.
>>>>>
>>>>> IMHO we should use default charset. However I concern I need to get CSR
>>>>> approvement.
>>>>> (In addition, I want to refactor CStringUtilities. I guess we can make
>>>>> more
>>>>> simple code :-)
>>>>>
>>>>>
>>>>> What do you think about it?
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>> [1]
>>>>>
>>>>> http://hg.openjdk.java.net/jdk/jdk/file/7bf4f1b5e438/src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/utilities/CStringUtilities.java#l73
>>>>>
>>>>>
>>>>>
>>>>> On 2018/06/13 21:18, Thomas Stüfe wrote:
>>>>>>
>>>>>>
>>>>>> Hi Yasumasa,
>>>>>>
>>>>>> your patch will change behavior of jsnap if file.encoding is set to
>>>>>> something different than US-ASCII.
>>>>>>
>>>>>> Old version uses, hard wired, US-ASCII.
>>>>>>
>>>>>> You now use jvm/hotspot/utilities/CStringUtilities.java, which uses
>>>>>>
>>>>>> private static String encoding = System.getProperty("file.encoding",
>>>>>> "US-ASCII");
>>>>>>
>>>>>> and thus may use a different encoding.
>>>>>>
>>>>>> Best Regards, Thomas
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 13, 2018 at 2:09 PM, Yasumasa Suenaga <yasuenag at gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> PING: Could you review it?
>>>>>>> I want to merge this change before JDK 11 RDP 1.
>>>>>>>
>>>>>>>>>>        webrev:
>>>>>>>>>>          http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.00/
>>>>>>>>>>
>>>>>>>>>>        JBS:
>>>>>>>>>>          https://bugs.openjdk.java.net/browse/JDK-8204531
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Yasumasa
>>>>>>>
>>>>>>>
>>>>>>> On 2018/06/08 15:38, Yasumasa Suenaga wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>> 2018-06-08 10:47 GMT+09:00 David Holmes <david.holmes at oracle.com>:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Yasumasa,
>>>>>>>>>
>>>>>>>>> On 7/06/2018 10:43 PM, Yasumasa Suenaga wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> Please review this change:
>>>>>>>>>>
>>>>>>>>>>        webrev:
>>>>>>>>>>          http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.00/
>>>>>>>>>>
>>>>>>>>>>        JBS:
>>>>>>>>>>          https://bugs.openjdk.java.net/browse/JDK-8204531
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> We can use `jhsdb jsnap` to check all PerfData.
>>>>>>>>>> String values in PerfData are defined as jbyte array, but JSnap
>>>>>>>>>> cannot
>>>>>>>>>> handle it well as following:
>>>>>>>>>>
>>>>>>>>>> ```
>>>>>>>>>> $ jhsdb jsnap --pid 28542 --all | less
>>>>>>>>>>
>>>>>>>>>> sun.gc.cause=No
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> GC^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
>>>>>>>>>> ```
>>>>>>>>>>
>>>>>>>>>> You can see this value via `less` and `vim` on Linux. `^@` shows it
>>>>>>>>>> is
>>>>>>>>>> non-ascii character.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> So your synopsis says "remove unused chars following \0". Is that
>>>>>>>>> really
>>>>>>>>> what is happening? I would have expected "new String(bytes,
>>>>>>>>> encoding)"
>>>>>>>>> to
>>>>>>>>> stop processing bytes when it encounters a NUL!
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> String c'tor will continue to parse byte array even if it finds '\0'.
>>>>>>>> You can see it on JShell as below:
>>>>>>>>
>>>>>>>> -------------
>>>>>>>> jshell> var bytes = new byte[]{(byte)'a', (byte)'b', (byte)'c',
>>>>>>>> (byte)'\0', (byte)'d', (byte)'e', (byte)'f'}
>>>>>>>> bytes ==> byte[7] { 97, 98, 99, 0, 100, 101, 102 }
>>>>>>>>
>>>>>>>> jshell> new String(bytes, "US-ASCII")
>>>>>>>> $2 ==> "abc\000def"
>>>>>>>> -------------
>>>>>>>>
>>>>>>>>
>>>>>>>>>> PerfDataEntry has null-terminated C string. So we should restore as
>>>>>>>>>> it
>>>>>>>>>> in
>>>>>>>>>> Java layer.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> If the issue is really "junk" beyond the \0 nul-terminator, how did
>>>>>>>>> we
>>>>>>>>> get
>>>>>>>>> such values? Shouldn't the byte array already be terminated at the
>>>>>>>>> NUL?
>>>>>>>>> Or
>>>>>>>>> is this just a raw representation of an array in the VM using the
>>>>>>>>> exact
>>>>>>>>> length of the array regardless of content?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I think JSnap shows the array regardless of content.
>>>>>>>>
>>>>>>>>
>>>>>>>>> I see CStringUtilities.getString() both stops when it encounters a 0
>>>>>>>>> value,
>>>>>>>>> and uses the default file encoding (else ASCII).
>>>>>>>>>
>>>>>>>>> The fix seems perfectly reasonable, I'm just unclear where exactly
>>>>>>>>> this
>>>>>>>>> "bad" data should have been stopped.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>>
>>>>>>>> Yasumasa
>>>>>>>>
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Yasumasa
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>