RFR(XXS) 8249087 Symbol constructor unnecessarily initializes _body[0]

Ioi Lam ioi.lam at oracle.com
Tue Jul 21 06:14:00 UTC 2020


On 7/20/20 8:59 PM, David Holmes wrote:
> On 21/07/2020 1:26 pm, Ioi Lam wrote:
>> On 7/20/20 7:57 PM, David Holmes wrote:
>>> Hi Ioi,
>>>
>>> On 21/07/2020 12:50 pm, Ioi Lam wrote:
>>>>
>>>>
>>>> On 7/20/20 7:36 PM, David Holmes wrote:
>>>>> Hi Ioi,
>>>>>
>>>>> On 21/07/2020 10:12 am, Ioi Lam wrote:
>>>>>> Hi please review this very simple fix:
>>>>>>
>>>>>> diff -r 19f26d72a8d0 src/hotspot/share/oops/symbol.cpp
>>>>>> --- a/src/hotspot/share/oops/symbol.cpp    Mon Jul 20 14:24:19 
>>>>>> 2020 -0700
>>>>>> +++ b/src/hotspot/share/oops/symbol.cpp    Mon Jul 20 17:11:57 
>>>>>> 2020 -0700
>>>>>> @@ -51,8 +51,11 @@
>>>>>>   Symbol::Symbol(const u1* name, int length, int refcount) {
>>>>>>     _hash_and_refcount = 
>>>>>> pack_hash_and_refcount((short)os::random(), refcount);
>>>>>>     _length = length;
>>>>>> -  _body[0] = 0;  // in case length == 0
>>>>>>     memcpy(_body, name, length);
>>>>>> +  // For symbols of length 0 and 1: _body[0] (and _body[1]) are 
>>>>>> uninitialized and may
>>>>>
>>>>> Can we ever have a Symbol of length zero?
>>>>>
>>>>> If the Symbol name is length 1 then surely _body[0] is initialized 
>>>>> to the single character of that name?
>>>>>
>>>>> The change seems harmless given a zero length symbol is 
>>>>> meaningless, but the commentary just confuses things to me.
>>>>>
>>>>
>>>> Hi David,
>>>>
>>>> We can have a valid Symbol of length 0. All UTF8 constants in 
>>>> classfiles, including the literal string "", are represented as 
>>>> Symbols.
>>>
>>> Ah I see.
>>>> How about
>>>>
>>>> // Random, uninitialized values may appear in _body[0] and _body[1]
>>>> // for Symbols of length 0 and 1. These random values never change 
>>>> during
>>>> // the lifetime of the Symbol, and are read only by 
>>>> Symbol::identity_hash(),
>>>
>>> What if as_quoted_ascii() were called on the zero-length symbol? 
>>> That reads _body[0] unconditionally. The same for use of base(). ??
>>
>> Hi David,
>>
>> as_quoted_ascii() and base() do not read the contents of _body[0]. 
>> They take the address of _body[0]:
>>
>> char* Symbol::as_quoted_ascii() const {
>>    const char *ptr = (const char *)&_body[0];
>
> Yep misread that one - sorry.
>
>>    int quoted_length = UTF8::quoted_ascii_length(ptr, utf8_length());
>>    char* result = NEW_RESOURCE_ARRAY(char, quoted_length + 1);
>>    UTF8::as_quoted_ascii(ptr, utf8_length(), result, quoted_length + 1);
>>    return result;
>> }
>>
>>    const u1* base() const { return &_body[0]; }
>
> Sure but what about the callers of base()? Any indexing off base() 
> should be guarded by a length() check of course, but it's hard to 
> verify that.

I think we are pretty safe -- _body[] is not zero-terminated, so if we 
have code that uses base() and goes out of bound, it will read from the 
next object (or debug padding) and will surely have caused some tests to 
fail.
>
> How about:
>
> // the lifetime of the Symbol, and should only be read by 
> Symbol::identity_hash(),
>

OK, I'll change the comment as you suggested.

Thanks
- Ioi

> Thanks,
> David
> -----
>
>> Thanks
>> - Ioi
>>
>>
>>
>>>
>>> Thanks,
>>> David
>>> -----
>>>
>>>> // which would tolerate such randomness.
>>>>
>>>>
>>>> Thanks
>>>> - Ioi
>>>>> Thanks,
>>>>> David
>>>>> -----
>>>>>
>>>>>> +  // contain random values, which will only be read by 
>>>>>> Symbol::identity_hash(),
>>>>>> +  // which would tolerate such randomness. These values never 
>>>>>> change during the lifetime
>>>>>> +  // of the Symbol.
>>>>>>   }
>>>>>>
>>>>>>
>>>>>> Passed hs tiers 1/2. Running tiers 3/4 now.
>>>>>>
>>>>>> Thanks
>>>>>> - Ioi
>>>>
>>



More information about the hotspot-runtime-dev mailing list