RFR(XXS) 8249087 Symbol constructor unnecessarily initializes _body[0]

David Holmes david.holmes at oracle.com
Thu Jul 23 04:07:37 UTC 2020


On 23/07/2020 3:42 am, Ioi Lam wrote:
> On 7/22/20 10:12 AM, Lois Foltan wrote:
>> On 7/22/2020 12:17 AM, David Holmes wrote:
>>> Hi Lois,
>>>
>>> On 22/07/2020 1:06 am, Lois Foltan wrote:
>>>> On 7/21/2020 2:24 AM, Ioi Lam wrote:
>>>>>
>>>>>
>>>>> On 7/20/20 11:12 PM, Florian Weimer wrote:
>>>>>> * Ioi Lam:
>>>>>>
>>>>>>> Hi please review this very simple fix:
>>>>>>>
>>>>>>> diff -r 19f26d72a8d0 src/hotspot/share/oops/symbol.cpp
>>>>>>> --- a/src/hotspot/share/oops/symbol.cpp    Mon Jul 20 14:24:19 
>>>>>>> 2020 -0700
>>>>>>> +++ b/src/hotspot/share/oops/symbol.cpp    Mon Jul 20 17:11:57 
>>>>>>> 2020 -0700
>>>>>>> @@ -51,8 +51,11 @@
>>>>>>>   Symbol::Symbol(const u1* name, int length, int refcount) {
>>>>>>>     _hash_and_refcount = pack_hash_and_refcount((short)os::random(),
>>>>>>> refcount);
>>>>>>>     _length = length;
>>>>>>> -  _body[0] = 0;  // in case length == 0
>>>>>>>     memcpy(_body, name, length);
>>>>>>> +  // For symbols of length 0 and 1: _body[0] (and _body[1]) are
>>>>>>> uninitialized and may
>>>>>>> +  // contain random values, which will only be read by
>>>>>>> Symbol::identity_hash(),
>>>>>>> +  // which would tolerate such randomness. These values never 
>>>>>>> change
>>>>>>> during the lifetime
>>>>>>> +  // of the Symbol.
>>>>>>>   }
>>>>>> Won't this still trip memory debuggers? Symbol::identity_hash() 
>>>>>> implies
>>>>>> that the result is eventually used in a conditional operation (a hash
>>>>>> comparison perhaps).  If it's possible one day to run Hotspot under
>>>>>> valgrind, this would result in false positives.
>>>>>
>>>>> Are you saying that valgrind will modify uninitialized memory 
>>>>> periodically after the constructor has returned, and thus will 
>>>>> cause Symbol::identity_hash() to return a different value?
>>>>>
>>>>> Without my patch, _body[1] is uninitialized for Symbols whose 
>>>>> length is 0 or 1. We have not heard of any issues related to 
>>>>> valgrind and Symbol::identity_hash().
>>>>>
>>>>> In fact, looking at the code history, the setting of "_body[0] = 0" 
>>>>> in Symbol::Symbol was introduced only recently (Feb 2020):
>>>>>
>>>>> http://hg.openjdk.java.net/jdk/jdk/annotate/4a4d185098e2/src/hotspot/share/oops/symbol.cpp#l55 
>>>>>
>>>>>
>>>>> I'll check with Lois who added the code to see the reason for doing 
>>>>> it.
>>>>
>>>> Hi Ioi,
>>>>
>>>> Reviewing this JBS issue, I have concerns over leaving both _body[0] 
>>>> and now even _body[1] uninitialized.  The signature processing 
>>>> frequently checks the first character of a Symbol via 
>>>> Symbol::char_at(0) to determine what type it is dealing with.  Is 
>>>> there a danger that the uninitialized memory actually has a valid 
>>>> type indicator in it like an '[' character for example?  The 
>>>> signature processing could potentially make wrong assumptions about 
>>>> the type it is trying to process.
>>>
>>> Aren't all the signature related symbols already guaranteed to not 
>>> have zero-length, or else the length is being pre-tested for zero?
>>
>> Hi David,
>>
>> I believe you are correct in that signature related symbols are 
>> already guaranteed to not have zero-length.  I tried several different 
>> jasm files to introduce an empty string either for a class name or for 
>> a type of a field reference and did not get past class file parsing 
>> without a ClassFormatError.
>>
>> Error: LinkageError occurred while loading main class HelloEmptyString
>>         java.lang.ClassFormatError: Class name is empty or contains 
>> illegal character in descriptor in class file HelloEmptyString
>>
>>
>> However, after reading the original issue in JDK-8249087, the 
>> objection wasn't that there was a needless initialization of _body[0] 
>> but instead it was around the inconsistency of initializing _body[0] 
>> and not _body[1].  I think it should be the responsibility of the 
>> Symbol class API to ensure initialization so that as a consumer I 
>> don't have to worry about _body[0] or _body[1]'s validity.  So I 
>> prefer the change to actually add the initialization of _body[1].  
>> Something like, "_body[0] = _body[1] = 0".
>>
> Hi Lois,
> 
> The fact that _body[0..1] is in the Symbol header is just by coincidence 
> -- we need only 6 bytes of meta-info about the Symbol, so we have 2 
> bytes left over. We use these two left-over bytes for 
> Symbol::identity_hash(). However, no one else should unconditionally 
> read these 2 bytes --  if we change the Symbol header in the future, 
> these two bytes may not be allocated anymore
> 
> It looks like leaving _body[0..1] uninitialized is confusing, and will 
> possibly lead to problems with valgrind as pointed out by Florian. How 
> about this:
> 
> Symbol::Symbol(const u1* name, int length, int refcount) {
>    _hash_and_refcount =  pack_hash_and_refcount((short)os::random(), 
> refcount);
>    _length = length;
>    // _body[0..1] are allocated in the header just by coincidence in the 
> current
>    // implementation of Symbol. They are read by identity_hash(), so 
> make sure they
>    // are initialized.
>    // No other code should assume that _body[0..1] are always allocated. 
> E.g., do
>    // not unconditionally read base()[0] as that will be invalid for an 
> empty Symbol.
>    _body[0] = _body[1] = 0;
>    memcpy(_body, name, length);
> }
> 
> I'll also change the bug title to "Always initialize _body[0..1] in 
> Symbol constructor"

That works for me.

> ----
> 
> As I discussed with Lois off-line:
> 
> There's Signature code that unconditionally reads _body[0], which would 
> assert (but class loading checks for invalid signatures that prevents 
> this from happening)

Right - and that's what I was referring to before. The signature symbol 
validity has already been established and length must be > 0. I would 
not want to see unnecessary length checks inserted there. At most an 
assertion.

Thanks,
David
-----

> BasicType Signature::basic_type(const Symbol* signature) {
>    return basic_type(signature->char_at(0));
> }
> 
> char Symbol::char_at(int index) const {
>    assert(index >=0 && index < length(), "symbol index overflow");
>    return (char)base()[index];
> }
> 
> Signature::basic_type() should be fixed to either check for length, 
> and/or assert that signature is a valid signature.
> 
> Thanks
> - Ioi
> 
> 
> 
> 


More information about the hotspot-runtime-dev mailing list