RFR(XXS) 8249087 Symbol constructor unnecessarily initializes _body[0]
Ioi Lam
ioi.lam at oracle.com
Wed Jul 22 17:42:16 UTC 2020
On 7/22/20 10:12 AM, Lois Foltan wrote:
> On 7/22/2020 12:17 AM, David Holmes wrote:
>> Hi Lois,
>>
>> On 22/07/2020 1:06 am, Lois Foltan wrote:
>>> On 7/21/2020 2:24 AM, Ioi Lam wrote:
>>>>
>>>>
>>>> On 7/20/20 11:12 PM, Florian Weimer wrote:
>>>>> * Ioi Lam:
>>>>>
>>>>>> Hi please review this very simple fix:
>>>>>>
>>>>>> diff -r 19f26d72a8d0 src/hotspot/share/oops/symbol.cpp
>>>>>> --- a/src/hotspot/share/oops/symbol.cpp Mon Jul 20 14:24:19
>>>>>> 2020 -0700
>>>>>> +++ b/src/hotspot/share/oops/symbol.cpp Mon Jul 20 17:11:57
>>>>>> 2020 -0700
>>>>>> @@ -51,8 +51,11 @@
>>>>>> Symbol::Symbol(const u1* name, int length, int refcount) {
>>>>>> _hash_and_refcount = pack_hash_and_refcount((short)os::random(),
>>>>>> refcount);
>>>>>> _length = length;
>>>>>> - _body[0] = 0; // in case length == 0
>>>>>> memcpy(_body, name, length);
>>>>>> + // For symbols of length 0 and 1: _body[0] (and _body[1]) are
>>>>>> uninitialized and may
>>>>>> + // contain random values, which will only be read by
>>>>>> Symbol::identity_hash(),
>>>>>> + // which would tolerate such randomness. These values never
>>>>>> change
>>>>>> during the lifetime
>>>>>> + // of the Symbol.
>>>>>> }
>>>>> Won't this still trip memory debuggers? Symbol::identity_hash()
>>>>> implies
>>>>> that the result is eventually used in a conditional operation (a hash
>>>>> comparison perhaps). If it's possible one day to run Hotspot under
>>>>> valgrind, this would result in false positives.
>>>>
>>>> Are you saying that valgrind will modify uninitialized memory
>>>> periodically after the constructor has returned, and thus will
>>>> cause Symbol::identity_hash() to return a different value?
>>>>
>>>> Without my patch, _body[1] is uninitialized for Symbols whose
>>>> length is 0 or 1. We have not heard of any issues related to
>>>> valgrind and Symbol::identity_hash().
>>>>
>>>> In fact, looking at the code history, the setting of "_body[0] = 0"
>>>> in Symbol::Symbol was introduced only recently (Feb 2020):
>>>>
>>>> http://hg.openjdk.java.net/jdk/jdk/annotate/4a4d185098e2/src/hotspot/share/oops/symbol.cpp#l55
>>>>
>>>>
>>>> I'll check with Lois who added the code to see the reason for doing
>>>> it.
>>>
>>> Hi Ioi,
>>>
>>> Reviewing this JBS issue, I have concerns over leaving both _body[0]
>>> and now even _body[1] uninitialized. The signature processing
>>> frequently checks the first character of a Symbol via
>>> Symbol::char_at(0) to determine what type it is dealing with. Is
>>> there a danger that the uninitialized memory actually has a valid
>>> type indicator in it like an '[' character for example? The
>>> signature processing could potentially make wrong assumptions about
>>> the type it is trying to process.
>>
>> Aren't all the signature related symbols already guaranteed to not
>> have zero-length, or else the length is being pre-tested for zero?
>
> Hi David,
>
> I believe you are correct in that signature related symbols are
> already guaranteed to not have zero-length. I tried several different
> jasm files to introduce an empty string either for a class name or for
> a type of a field reference and did not get past class file parsing
> without a ClassFormatError.
>
> Error: LinkageError occurred while loading main class HelloEmptyString
> java.lang.ClassFormatError: Class name is empty or contains
> illegal character in descriptor in class file HelloEmptyString
>
>
> However, after reading the original issue in JDK-8249087, the
> objection wasn't that there was a needless initialization of _body[0]
> but instead it was around the inconsistency of initializing _body[0]
> and not _body[1]. I think it should be the responsibility of the
> Symbol class API to ensure initialization so that as a consumer I
> don't have to worry about _body[0] or _body[1]'s validity. So I
> prefer the change to actually add the initialization of _body[1].
> Something like, "_body[0] = _body[1] = 0".
>
Hi Lois,
The fact that _body[0..1] is in the Symbol header is just by coincidence
-- we need only 6 bytes of meta-info about the Symbol, so we have 2
bytes left over. We use these two left-over bytes for
Symbol::identity_hash(). However, no one else should unconditionally
read these 2 bytes -- if we change the Symbol header in the future,
these two bytes may not be allocated anymore
It looks like leaving _body[0..1] uninitialized is confusing, and will
possibly lead to problems with valgrind as pointed out by Florian. How
about this:
Symbol::Symbol(const u1* name, int length, int refcount) {
_hash_and_refcount = pack_hash_and_refcount((short)os::random(),
refcount);
_length = length;
// _body[0..1] are allocated in the header just by coincidence in the
current
// implementation of Symbol. They are read by identity_hash(), so
make sure they
// are initialized.
// No other code should assume that _body[0..1] are always allocated.
E.g., do
// not unconditionally read base()[0] as that will be invalid for an
empty Symbol.
_body[0] = _body[1] = 0;
memcpy(_body, name, length);
}
I'll also change the bug title to "Always initialize _body[0..1] in
Symbol constructor"
----
As I discussed with Lois off-line:
There's Signature code that unconditionally reads _body[0], which would
assert (but class loading checks for invalid signatures that prevents
this from happening)
BasicType Signature::basic_type(const Symbol* signature) {
return basic_type(signature->char_at(0));
}
char Symbol::char_at(int index) const {
assert(index >=0 && index < length(), "symbol index overflow");
return (char)base()[index];
}
Signature::basic_type() should be fixed to either check for length,
and/or assert that signature is a valid signature.
Thanks
- Ioi
More information about the hotspot-runtime-dev
mailing list