Avoid some GCC 10.X warnings in HotSpot
Ioi Lam
ioi.lam at oracle.com
Thu Jun 18 05:51:54 UTC 2020
On 6/17/20 10:25 PM, David Holmes wrote:
> On 18/06/2020 3:04 pm, Ioi Lam wrote:
>>
>>
>> On 6/17/20 6:56 PM, David Holmes wrote:
>>> Hi Koichi,
>>>
>>> This is in relation to the hotspot part as these issues need to be
>>> handled separately. I have filed:
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8247818
>>>
>>> On 18/06/2020 8:46 am, Koichi Sakata wrote:
>>>> Hi all,
>>>>
>>>> I tried to build OpenJDK fastdebug with GCC 10.1 on Ubuntu 18.04,
>>>> but I saw some compiler warnings as follows:
>>>>
>>>> In file included from
>>>> /home/jyukutyo/code/jdk/src/hotspot/share/classfile/systemDictionary.hpp:31,
>>>>
>>>> from
>>>> /home/jyukutyo/code/jdk/src/hotspot/share/classfile/javaClasses.hpp:28,
>>>>
>>>> from
>>>> /home/jyukutyo/code/jdk/src/hotspot/share/precompiled/precompiled.hpp:35:
>>>>
>>>> In member function 'void Symbol::byte_at_put(int, u1)',
>>>> inlined from 'Symbol::Symbol(const u1*, int, int)' at
>>>> /home/jyukutyo/code/jdk/src/hotspot/share/oops/symbol.cpp:55:16:
>>>> /home/jyukutyo/code/jdk/src/hotspot/share/oops/symbol.hpp:130:18:
>>>> error: writing 1 byte into a region of size 0
>>>> [-Werror=stringop-overflow=]
>>>> 130 | _body[index] = value;
>>>> | ~~~~~~~~~~~~~^~~~~~~
>>>
>>> I'm not really clear on the warning here but this is an area where
>>> we trick the compiler somewhat. The _body[] is declared with a size
>>> of 2, but when we allocate Symbols we allocate sufficient memory for
>>> _body to contain the entire symbol.
>>>
>>> That said I'm struggling to see how we allocate the additional space
>>> needed for the _hash_and_refcount and _length fields ???
>>>
>>>> I can resolve them with the following patch. I believe it fixes
>>>> those potential bugs, so I'd like to contribute it.
>>>> (Our company has signed OCA.)
>>>>
>>>> Thanks,
>>>> Koichi
>>>>
>>>> ===== PATCH =====
>>>> diff -r 20d92fe3ac52 src/hotspot/share/oops/symbol.cpp
>>>> --- a/src/hotspot/share/oops/symbol.cpp Tue Jun 16 03:16:41 2020
>>>> +0000
>>>> +++ b/src/hotspot/share/oops/symbol.cpp Thu Jun 18 07:08:50 2020
>>>> +0900
>>>> @@ -50,9 +50,10 @@
>>>> Symbol::Symbol(const u1* name, int length, int refcount) {
>>>> _hash_and_refcount =
>>>> pack_hash_and_refcount((short)os::random(), refcount);
>>>> _length = length;
>>>> - _body[0] = 0; // in case length == 0
>>>> - for (int i = 0; i < length; i++) {
>>>> - byte_at_put(i, name[i]);
>>>> + if (length == 0) {
>>>> + _body[0] = 0;
>>>
>>> The check for length==0 introduces more overhead than just always
>>> setting _body[0]=0, so there is no need to add it.
>>>
>>>> + } else {
>>>> + memcpy(_body, name, length);
>>>> }
>>>> }
>>>
>>> So you are replacing byte_at_put with a memcpy call. On the surface
>>> that seems reasonable, but I have to wonder why we were using the
>>> loop in the first place. It may just be historical or it may relate
>>> to an alignment issue, or something else. Hopefully someone else
>>> (e.g. Coleen :) ) can shed more light here.
>>>
>>
>> Actually I don't know why we are setting _body[0] to zero. The
>> content of _body is not supposed to be 0-terminated so we don't need
>> a zero in there.
>
> It may not be required but it is cleaner/safer to make it NUL. Though
> arguably we should just be rejecting a length of zero to begin with.
>
zero-length symbols are valid. They are used for UTF8 strings in
classfiles that represent the literal string "".
>> E.g., if the length is 1, _body[1] will have garbage anyway.
>
> If the length is 1 then _body[1] is "out of bounds", so I'm not sure
> what your point is. If the actual length is one then _body[0] has the
> true symbol content (non NUL-terminated) and _body[1] is junk we
> shouldn't touch.
>
By the same token if length is 0, _body[0] is out of bounds, so we can
also leave it as garbage. My point is, there's no advantage of making
zero length a special case, where all other cases the string is not null
terminated.
> But the definition and allocation of Symbols confuses me as I
> mentioned because I can't see why we define _body with size 2 rather
> than just 1.
We define it as size 2 so the whole header is 8 bytes long. It doesn't
really matter, but feels nicer,
volatile uint32_t _hash_and_refcount;
u2 _length;
u1 _body[2];
... especially you see the same "2" in here:
static int byte_size(int length) {
// minimum number of natural words needed to hold these bits (no
non-heap version)
return (int)(sizeof(Symbol) + (length > 2 ? length - 2 : 0));
}
> Nor can I see how the length we pass to placement new accounts for the
> additional fields that a Symbol has. ??
>
The size of the Symbol, including additional space for the whole string,
is calculated in here:
void* Symbol::operator new(size_t sz, int len, Arena* arena) throw() {
int alloc_size = size(len)*wordSize; <<<<<<<<< HERE
address res = (address)arena->Amalloc_4(alloc_size);
return res;
}
Thanks
- Ioi
> Cheers,
> David
>
>> "java" hit Breakpoint 2, Symbol::Symbol (this=0x7fffa1621e38,
>> name=0x7ffff6dceb9c "*", length=1, refcount=65535)
>> (step until the loop finishes)
>>
>> (gdb) p *this
>> $1 = {<MetaspaceObj> = {}, _hash_and_refcount = 2027159551, _length =
>> 1, _body = <incomplete sequence \361>}
>> (gdb) p _body[0]
>> $2 = 42 '*'
>> (gdb) p _body[1]
>> $3 = 241 '\361'
>>
>> Thanks
>> - Ioi
>>
>>> Thanks,
>>> David
>>> -----
>>>
>>>> diff -r 20d92fe3ac52 src/hotspot/share/oops/symbol.hpp
>>>> --- a/src/hotspot/share/oops/symbol.hpp Tue Jun 16 03:16:41 2020
>>>> +0000
>>>> +++ b/src/hotspot/share/oops/symbol.hpp Thu Jun 18 07:08:50 2020
>>>> +0900
>>>> @@ -125,11 +125,6 @@
>>>> return (int)heap_word_size(byte_size(length));
>>>> }
>>>>
>>>> - void byte_at_put(int index, u1 value) {
>>>> - assert(index >=0 && index < length(), "symbol index overflow");
>>>> - _body[index] = value;
>>>> - }
>>>> -
>>>> Symbol(const u1* name, int length, int refcount);
>>>> void* operator new(size_t size, int len) throw();
>>>> void* operator new(size_t size, int len, Arena* arena) throw();
>>>
>>>
>>
More information about the hotspot-runtime-dev
mailing list