RFR(XXS) 8249087 Symbol constructor unnecessarily initializes _body[0]

David Holmes david.holmes at oracle.com
Tue Jul 21 03:59:37 UTC 2020


On 21/07/2020 1:26 pm, Ioi Lam wrote:
> On 7/20/20 7:57 PM, David Holmes wrote:
>> Hi Ioi,
>>
>> On 21/07/2020 12:50 pm, Ioi Lam wrote:
>>>
>>>
>>> On 7/20/20 7:36 PM, David Holmes wrote:
>>>> Hi Ioi,
>>>>
>>>> On 21/07/2020 10:12 am, Ioi Lam wrote:
>>>>> Hi please review this very simple fix:
>>>>>
>>>>> diff -r 19f26d72a8d0 src/hotspot/share/oops/symbol.cpp
>>>>> --- a/src/hotspot/share/oops/symbol.cpp    Mon Jul 20 14:24:19 2020 
>>>>> -0700
>>>>> +++ b/src/hotspot/share/oops/symbol.cpp    Mon Jul 20 17:11:57 2020 
>>>>> -0700
>>>>> @@ -51,8 +51,11 @@
>>>>>   Symbol::Symbol(const u1* name, int length, int refcount) {
>>>>>     _hash_and_refcount = 
>>>>> pack_hash_and_refcount((short)os::random(), refcount);
>>>>>     _length = length;
>>>>> -  _body[0] = 0;  // in case length == 0
>>>>>     memcpy(_body, name, length);
>>>>> +  // For symbols of length 0 and 1: _body[0] (and _body[1]) are 
>>>>> uninitialized and may
>>>>
>>>> Can we ever have a Symbol of length zero?
>>>>
>>>> If the Symbol name is length 1 then surely _body[0] is initialized 
>>>> to the single character of that name?
>>>>
>>>> The change seems harmless given a zero length symbol is meaningless, 
>>>> but the commentary just confuses things to me.
>>>>
>>>
>>> Hi David,
>>>
>>> We can have a valid Symbol of length 0. All UTF8 constants in 
>>> classfiles, including the literal string "", are represented as Symbols.
>>
>> Ah I see.
>>> How about
>>>
>>> // Random, uninitialized values may appear in _body[0] and _body[1]
>>> // for Symbols of length 0 and 1. These random values never change 
>>> during
>>> // the lifetime of the Symbol, and are read only by 
>>> Symbol::identity_hash(),
>>
>> What if as_quoted_ascii() were called on the zero-length symbol? That 
>> reads _body[0] unconditionally. The same for use of base(). ??
> 
> Hi David,
> 
> as_quoted_ascii() and base() do not read the contents of _body[0]. They 
> take the address of _body[0]:
> 
> char* Symbol::as_quoted_ascii() const {
>    const char *ptr = (const char *)&_body[0];

Yep misread that one - sorry.

>    int quoted_length = UTF8::quoted_ascii_length(ptr, utf8_length());
>    char* result = NEW_RESOURCE_ARRAY(char, quoted_length + 1);
>    UTF8::as_quoted_ascii(ptr, utf8_length(), result, quoted_length + 1);
>    return result;
> }
> 
>    const u1* base() const { return &_body[0]; }

Sure but what about the callers of base()? Any indexing off base() 
should be guarded by a length() check of course, but it's hard to verify 
that.

How about:

// the lifetime of the Symbol, and should only be read by 
Symbol::identity_hash(),

Thanks,
David
-----

> Thanks
> - Ioi
> 
> 
> 
>>
>> Thanks,
>> David
>> -----
>>
>>> // which would tolerate such randomness.
>>>
>>>
>>> Thanks
>>> - Ioi
>>>> Thanks,
>>>> David
>>>> -----
>>>>
>>>>> +  // contain random values, which will only be read by 
>>>>> Symbol::identity_hash(),
>>>>> +  // which would tolerate such randomness. These values never 
>>>>> change during the lifetime
>>>>> +  // of the Symbol.
>>>>>   }
>>>>>
>>>>>
>>>>> Passed hs tiers 1/2. Running tiers 3/4 now.
>>>>>
>>>>> Thanks
>>>>> - Ioi
>>>
> 


More information about the hotspot-runtime-dev mailing list