RFR(S): 8087143: Reduce Symbol::_identity_hash to 2 bytes
Yumin Qi
yumin.qi at oracle.com
Sat Jun 20 05:17:31 UTC 2015
Ioi,
Thanks. I missed the size(...) call. Updated webrev02, convert _body
from size 1 array to size 2 array and have done tests for it.
Thanks
Yumin
On 6/19/2015 6:39 PM, Ioi Lam wrote:
> Yumin,
>
> The new version of Symbol::size over-calculates the required size by 1
> byte. That's why your average Symbol size is only reduced from 44.198
> to 43.317 bytes (a reduction of 0.88 bytes).
>
> The code:
>
> jbyte _body[1];
> static int size(int length) {
> size_t sz = heap_word_size(sizeof(Symbol) + (length > 1 ?
> length - 1 : 0));
> return align_object_size(sz);
> }
>
> should be changed to
>
> jbyte _body[2];
> static int size(int length) {
> size_t sz = heap_word_size(sizeof(Symbol) + (length > 2 ?
> length - 2 : 0));
> return align_object_size(sz);
> }
>
> When you have this
>
> class Symbol : public MetaspaceObj {
> ATOMIC_SHORT_PAIR(
> volatile short _refcount, // needs atomic operation
> unsigned short _length // number of UTF8 characters in
> the symbol (does not need atomic op)
> );
> short _identity_hash;
> jbyte _body[1];
> }
>
> Even though you have declared only 7 bytes of fields, sizeof(Symbol)
> is 8 bytes, not 7. That's because the C++ compiler implicitly adds 1
> byte of padding. So if your string is "ab", Symbol::size() would use
>
> sizeof(Symbol) + (length > 1 ? length - 1 : 0) ==> 8 + 2 - 1
> ==> 9 bytes
>
> even though 8 bytes are enough.
>
> Please see more comments below:
>
> On 6/19/15 5:20 PM, Yumin Qi wrote:
>> Hi, Coleen and Ioi
>>
>> The revised webrev:
>> http://cr.openjdk.java.net/~minqi/8087143/webrev02/
>>
>> In this version, I have tested with Specjbb2005:
>> 1) change _body to one byte, contribute _body[0] to the calculation
>> of identity_hash. No rehash request found during the test. Like the
>> Warnings mentioned by Coleen in her email;
>> 2) Following data is the output, no. of literals is same as no. of
>> entries which refers to no. of symbol.
>>
>> before:
>> SymbolTable statistics:
>> Number of buckets : 20011 = 320176 bytes, avg 16.000
>> Number of entries : 2016 = 64512 bytes, avg 32.000
>> Number of literals : 2016 = 89104 bytes, avg 44.198
>> Total footprint : = 473792 bytes
>> Average bucket size : 0.101
>> Variance of bucket size : 0.100
>> Std. dev. of bucket size: 0.316
>> Maximum bucket size : 3
>>
>> after:
>> SymbolTable statistics:
>> Number of buckets : 20011 = 320176 bytes, avg 16.000
>> Number of entries : 2021 = 64672 bytes, avg 32.000
>> Number of literals : 2021 = 87544 bytes, avg 43.317
>> Total footprint : = 472392 bytes
>> Average bucket size : 0.101
>> Variance of bucket size : 0.100
>> Std. dev. of bucket size: 0.316
>> Maximum bucket size : 3
>>
>> About no. of literals, the calculation of the size is:
>>
>> template <class T, MEMFLAGS F> int RehashableHashtable<T,
>> F>::literal_size(Symbol *symbol) {
>> return symbol->size() * HeapWordSize;
>> }
>>
>> And the size is just the _length of the symbol which I think is the
>> length of the characters, not including Symbol itself. Cannot use
>> this data to display how many bytes we saved.
>
> This is not true. If it were true, you wouldn't have seen any changes
> in the "average literal" size printed above.
>
> Symbol::size() is defined to be this:
>
> int size() { return size(utf8_length()); }
>
> which calls the static Symbol::size(int) function, which includes the
> size of the header.
>
>> 3) correct a inaccurate calculation of size for Symbol in old version.
>> size_t sz = heap_word_size(sizeof(SymbolBase) + (length > 0 ?
>> length : 0));
>> It will allocate one extra byte when length > 0 since we
>> already reserve one byte for _body.
> Your description is incorrect. In the old version, SymbolBase did not
> include _body. Please see comments in JDK-8009575.
>
> So why did we need SymbolBase for 8009575 (the previous version) but
> don't need it for 8087143 (the current RFE)?
>
> After the fix for 8009575, we needed 8 bytes of header for Symbol, so
> if Symbol was declared as this:
>
> class Symbol : public MetaspaceObj {
> ATOMIC_SHORT_PAIR(
> volatile short _refcount, // needs atomic operation
> unsigned short _length // number of UTF8 characters in
> the symbol (does not need atomic op)
> );
> int _identity_hash;
> jbyte _body[1];
> }
>
> It would occupy 9 bytes. Most compilers would automatically pad it up
> to 12 bytes.
>
> I could have written Symbol as the following to consider the
> pad-to-12-bytes case:
>
> class Symbol : public MetaspaceObj {
> ATOMIC_SHORT_PAIR(
> volatile short _refcount, // needs atomic operation
> unsigned short _length // number of UTF8 characters in
> the symbol (does not need atomic op)
> );
> int _identity_hash;
> jbyte _body[4]; // align up to 12 bytes
>
> static int size(int length) {
> size_t sz = heap_word_size(sizeof(Symbol) + (length > 4 ?
> length - 4 : 0));
> return align_object_size(sz);
> }
> }
>
> However, the amount of structure padding by the C compiler is not
> well-defined. Although most of our compiler would pad to 12 bytes,
> some compiler may decide to pad to 16 byte. While this is highly
> unlikely, I didn't want to take the chance. Explicitly splitting out a
> SymbolBase class, which is 8 bytes, would make our size calculation
> always correct.
>
> Anyway, for the current fix, we reduced the Symbol header to only 6
> bytes. Adding a body[2] will make sizeof(Symbol) to 8 bytes. Virtually
> no compiler would add any padding to a 8-byte structure, so using
> sizeof(Symbol) is good enough.
>
> Please trace the Symbol::size(int) function inside gdb to verify that
> it performs the calculation correctly.
>
>> Now we save 3 bytes for every symbol (for symbol whose length is
>> greater than 1).
>> 4) remove java.cpp change, which is not needed, as Coleen pointed
>> out, we can use -XX:+PrintStringTableStatistics to get this data,
>> which also is a product flag.
>> 5) for .jsa files:
>> specjbb2005-org.jsa: 4026368
>> specjbb2005.jsa: 4018176
>> Saved about 8K.
>>
> Here is the output extracted from "java -Xshare:dump
> -XX:+PrintSharedSpaces" on 64-bit Linux.
>
> Before any changes:
>
> Symbol : 50913 1925736 27.4 | 0 0 0.0
> | 50913 1925736 10.5
>
> After your webrev02/
>
> Symbol : 50913 1889520 27.0 | 0 0 0.0
> | 50913 1889520 10.3
>
> After changing _body[1] => _body[2]
>
> Symbol : 50913 1856248 26.6 | 0 0 0.0
> | 50913 1856248 10.1
>
> The average Symbol size changed from
>
> 37.82 bytes to 37.11 (-0.71) to 36.45 (-1.37)
>
> As I mentioned in the bug report, the reduction (1.37) is smaller than
> the theoretical value (2 bytes) because we have many small symbols,
> and metaspace.cpp enforces a 24-byte minimum allocation size for each
> MetaspaceObj on 64-bit platforms.
>
> Thanks
> - Ioi
>
>
>> Thanks
>> Yumin
>>
>>
>> On 6/19/2015 11:21 AM, Coleen Phillimore wrote:
>>>
>>> Hi, This is okay, one comment below.
>>>
>>> On 6/19/15 12:21 PM, Yumin Qi wrote:
>>>> Coleen,
>>>>
>>>> Thanks.
>>>>
>>>> On 6/19/2015 5:14 AM, Coleen Phillimore wrote:
>>>>>
>>>>> Yumin,
>>>>>
>>>>> If the identity hash isn't unique enough you get a system
>>>>> dictionary warning like this, which doesn't stop the JVM. Could
>>>>> you check your test logs to see if this warning is output in any
>>>>> of them (jtreg keeps logs around, just checking the jtreg test
>>>>> results should be sufficient, I think).
>>>>>
>>>>> Log : HotSpot(TM) Server VM warning: Performance bug:
>>>>> SystemDictionary lookup_count=6918 lookup_length=3907
>>>>> average=0.564759 load=0.266601
>>>>>
>>>>> I think with including _body[0] in the function you may make the
>>>>> hash random enough. So this was one of my concerns with this
>>>>> change.
>>>>>
>>>> I will check that.
>>>>> In
>>>>> http://cr.openjdk.java.net/~minqi/8087143/webrev01/src/share/vm/runtime/java.cpp.udiff.html
>>>>>
>>>>> Are you sure you want all of the SymbolTable dump? Maybe it
>>>>> should be under PrintSymbolTableSizeStatistics instead?
>>>>>
>>>> I will check and make change based on the output of statistics,
>>>> maybe it is too much to print out all SymbolTable.
>>>
>>> It's way more than anyone needs normally. Maybe only print it if
>>> one adds +Verbose to PrintStringTableSizeStatistics and when we do
>>> logging, it'll be -Xlog:symboltable=trace or a lower logging level
>>> than info.
>>>
>>> Thanks,
>>> Coleen
>>>>> In
>>>>> http://cr.openjdk.java.net/~minqi/8087143/webrev01/src/share/vm/oops/symbol.hpp.udiff.html
>>>>>
>>>>> -// We separate the fields in SymbolBase from Symbol::_body so that
>>>>> -// Symbol::size(int) can correctly calculate the space needed.
>>>>>
>>>>> I'm not sure what that comment meant. With SymbolBase gone (which
>>>>> I think looks nicer), can you still correctly calculate the space
>>>>> needed?
>>>>>
>>>> Yes, now the _body changed from 1 byte array to 2 bytes array.
>>>> The original calculation is not correct I think: if symbol length
>>>> > 0, we allocate *_ONE_* extra byte for it:
>>>> - size_t sz = heap_word_size(sizeof(SymbolBase) + (length > 0 ? length : 0));
>>>> + size_t sz = heap_word_size(sizeof(Symbol) + (length > 2 ? length - 2 : 0));
>>>>
>>>>> Other than these things, this change looks good. Nice to find
>>>>> some space savings with something so frequent as symbols.
>>>>>
>>>>> Thanks,
>>>>> Coleen
>>>>>
>>>>> On 6/18/15 8:40 PM, Yumin Qi wrote:
>>>>>> Ioi,
>>>>>>
>>>>>> Thanks for review, I will add that (for print out purpose, not
>>>>>> in the final change?)
>>>>>>
>>>>>> Yumin
>>>>>>
>>>>>> On 6/18/2015 3:34 PM, ioi.lam at oracle.com wrote:
>>>>>>> Hi Yumin,
>>>>>>>
>>>>>>> I wonder why the average symbol size didn't change. They are
>>>>>>> both about 34.25 bytes long.
>>>>>>>
>>>>>>> When I tested the change, -Xshare:dump -XX:+PrintSharedSpaces
>>>>>>> show about 1.3 bytes reduction per symbol.
>>>>>>>
>>>>>>> Also, could you add some code to print out the statistics of the
>>>>>>> system dictionary to show if there's any change in the hashtable
>>>>>>> balance?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>> On Jun 18, 2015, at 1:46 PM, Yumin Qi <yumin.qi at oracle.com> wrote:
>>>>>>>>
>>>>>>>> Please review the small change for
>>>>>>>>
>>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8087143
>>>>>>>> webrev: http://cr.openjdk.java.net/~minqi/8087143/webrev01/
>>>>>>>>
>>>>>>>> Summary: _identity_hash is an integer in Symbol (SymbolBase),
>>>>>>>> it is used to compute hash bucket index by modulus division of
>>>>>>>> table size. Currently in hotspot, no table size is more than
>>>>>>>> 65535 so we can use short instead. For case with table size
>>>>>>>> over 65535 we can use the first two bytes of symbol data to be
>>>>>>>> as the upper 16 bits for the calculation but rare cases.
>>>>>>>>
>>>>>>>> This is a minor change, also eliminates SymbolBase which seems
>>>>>>>> redundant. Also add print out SymbolTable statistics under
>>>>>>>> debug flag PrintSystemDictionaryAtExit.
>>>>>>>>
>>>>>>>> Tests: JPRT, Specjbb2005.
>>>>>>>>
>>>>>>>> The SymbolTable statistics for specjbb2005:
>>>>>>>>
>>>>>>>> Before Change:
>>>>>>>> SymbolTable statistics:
>>>>>>>> Number of buckets : 20011 = 320176 bytes, avg 16.000
>>>>>>>> Number of entries : 49914 = 1597248 bytes, avg 32.000
>>>>>>>> Number of literals : 49914 = 1709688 bytes, avg 34.253
>>>>>>>> Total footprint : = 3627112 bytes
>>>>>>>> Average bucket size : 2.494
>>>>>>>> Variance of bucket size : 2.483
>>>>>>>> Std. dev. of bucket size: 1.576
>>>>>>>> Maximum bucket size : 10
>>>>>>>>
>>>>>>>> After Change:
>>>>>>>>
>>>>>>>> SymbolTable statistics:
>>>>>>>> Number of buckets : 20011 = 320176 bytes, avg 16.000
>>>>>>>> Number of entries : 49911 = 1597152 bytes, avg 32.000
>>>>>>>> Number of literals : 49911 = 1709544 bytes, avg 34.252
>>>>>>>> Total footprint : = 3626872 bytes
>>>>>>>> Average bucket size : 2.494
>>>>>>>> Variance of bucket size : 2.483
>>>>>>>> Std. dev. of bucket size: 1.576
>>>>>>>> Maximum bucket size : 10
>>>>>>>>
>>>>>>>> There is no big change for the hashtable balance.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Yumin
>>>>>>
>>>>>
>>>>
>>>
>>
>
More information about the hotspot-runtime-dev
mailing list