RFR(S): 8087143: Reduce Symbol::_identity_hash to 2 bytes
David Holmes
david.holmes at oracle.com
Mon Jun 22 00:20:39 UTC 2015
On 20/06/2015 3:17 PM, Yumin Qi wrote:
> Ioi,
> Thanks. I missed the size(...) call. Updated webrev02, convert _body
> from size 1 array to size 2 array and have done tests for it.
Why did you add:
118 public:
and make a bunch of previously private methods (and the enum) public ??
Thanks,
David
> Thanks
> Yumin
>
> On 6/19/2015 6:39 PM, Ioi Lam wrote:
>> Yumin,
>>
>> The new version of Symbol::size over-calculates the required size by 1
>> byte. That's why your average Symbol size is only reduced from 44.198
>> to 43.317 bytes (a reduction of 0.88 bytes).
>>
>> The code:
>>
>> jbyte _body[1];
>> static int size(int length) {
>> size_t sz = heap_word_size(sizeof(Symbol) + (length > 1 ?
>> length - 1 : 0));
>> return align_object_size(sz);
>> }
>>
>> should be changed to
>>
>> jbyte _body[2];
>> static int size(int length) {
>> size_t sz = heap_word_size(sizeof(Symbol) + (length > 2 ?
>> length - 2 : 0));
>> return align_object_size(sz);
>> }
>>
>> When you have this
>>
>> class Symbol : public MetaspaceObj {
>> ATOMIC_SHORT_PAIR(
>> volatile short _refcount, // needs atomic operation
>> unsigned short _length // number of UTF8 characters in
>> the symbol (does not need atomic op)
>> );
>> short _identity_hash;
>> jbyte _body[1];
>> }
>>
>> Even though you have declared only 7 bytes of fields, sizeof(Symbol)
>> is 8 bytes, not 7. That's because the C++ compiler implicitly adds 1
>> byte of padding. So if your string is "ab", Symbol::size() would use
>>
>> sizeof(Symbol) + (length > 1 ? length - 1 : 0) ==> 8 + 2 - 1
>> ==> 9 bytes
>>
>> even though 8 bytes are enough.
>>
>> Please see more comments below:
>>
>> On 6/19/15 5:20 PM, Yumin Qi wrote:
>>> Hi, Coleen and Ioi
>>>
>>> The revised webrev:
>>> http://cr.openjdk.java.net/~minqi/8087143/webrev02/
>>>
>>> In this version, I have tested with Specjbb2005:
>>> 1) change _body to one byte, contribute _body[0] to the calculation
>>> of identity_hash. No rehash request found during the test. Like the
>>> Warnings mentioned by Coleen in her email;
>>> 2) Following data is the output, no. of literals is same as no. of
>>> entries which refers to no. of symbol.
>>>
>>> before:
>>> SymbolTable statistics:
>>> Number of buckets : 20011 = 320176 bytes, avg 16.000
>>> Number of entries : 2016 = 64512 bytes, avg 32.000
>>> Number of literals : 2016 = 89104 bytes, avg 44.198
>>> Total footprint : = 473792 bytes
>>> Average bucket size : 0.101
>>> Variance of bucket size : 0.100
>>> Std. dev. of bucket size: 0.316
>>> Maximum bucket size : 3
>>>
>>> after:
>>> SymbolTable statistics:
>>> Number of buckets : 20011 = 320176 bytes, avg 16.000
>>> Number of entries : 2021 = 64672 bytes, avg 32.000
>>> Number of literals : 2021 = 87544 bytes, avg 43.317
>>> Total footprint : = 472392 bytes
>>> Average bucket size : 0.101
>>> Variance of bucket size : 0.100
>>> Std. dev. of bucket size: 0.316
>>> Maximum bucket size : 3
>>>
>>> About no. of literals, the calculation of the size is:
>>>
>>> template <class T, MEMFLAGS F> int RehashableHashtable<T,
>>> F>::literal_size(Symbol *symbol) {
>>> return symbol->size() * HeapWordSize;
>>> }
>>>
>>> And the size is just the _length of the symbol which I think is the
>>> length of the characters, not including Symbol itself. Cannot use
>>> this data to display how many bytes we saved.
>>
>> This is not true. If it were true, you wouldn't have seen any changes
>> in the "average literal" size printed above.
>>
>> Symbol::size() is defined to be this:
>>
>> int size() { return size(utf8_length()); }
>>
>> which calls the static Symbol::size(int) function, which includes the
>> size of the header.
>>
>>> 3) correct a inaccurate calculation of size for Symbol in old version.
>>> size_t sz = heap_word_size(sizeof(SymbolBase) + (length > 0 ?
>>> length : 0));
>>> It will allocate one extra byte when length > 0 since we
>>> already reserve one byte for _body.
>> Your description is incorrect. In the old version, SymbolBase did not
>> include _body. Please see comments in JDK-8009575.
>>
>> So why did we need SymbolBase for 8009575 (the previous version) but
>> don't need it for 8087143 (the current RFE)?
>>
>> After the fix for 8009575, we needed 8 bytes of header for Symbol, so
>> if Symbol was declared as this:
>>
>> class Symbol : public MetaspaceObj {
>> ATOMIC_SHORT_PAIR(
>> volatile short _refcount, // needs atomic operation
>> unsigned short _length // number of UTF8 characters in
>> the symbol (does not need atomic op)
>> );
>> int _identity_hash;
>> jbyte _body[1];
>> }
>>
>> It would occupy 9 bytes. Most compilers would automatically pad it up
>> to 12 bytes.
>>
>> I could have written Symbol as the following to consider the
>> pad-to-12-bytes case:
>>
>> class Symbol : public MetaspaceObj {
>> ATOMIC_SHORT_PAIR(
>> volatile short _refcount, // needs atomic operation
>> unsigned short _length // number of UTF8 characters in
>> the symbol (does not need atomic op)
>> );
>> int _identity_hash;
>> jbyte _body[4]; // align up to 12 bytes
>>
>> static int size(int length) {
>> size_t sz = heap_word_size(sizeof(Symbol) + (length > 4 ?
>> length - 4 : 0));
>> return align_object_size(sz);
>> }
>> }
>>
>> However, the amount of structure padding by the C compiler is not
>> well-defined. Although most of our compiler would pad to 12 bytes,
>> some compiler may decide to pad to 16 byte. While this is highly
>> unlikely, I didn't want to take the chance. Explicitly splitting out a
>> SymbolBase class, which is 8 bytes, would make our size calculation
>> always correct.
>>
>> Anyway, for the current fix, we reduced the Symbol header to only 6
>> bytes. Adding a body[2] will make sizeof(Symbol) to 8 bytes. Virtually
>> no compiler would add any padding to a 8-byte structure, so using
>> sizeof(Symbol) is good enough.
>>
>> Please trace the Symbol::size(int) function inside gdb to verify that
>> it performs the calculation correctly.
>>
>>> Now we save 3 bytes for every symbol (for symbol whose length is
>>> greater than 1).
>>> 4) remove java.cpp change, which is not needed, as Coleen pointed
>>> out, we can use -XX:+PrintStringTableStatistics to get this data,
>>> which also is a product flag.
>>> 5) for .jsa files:
>>> specjbb2005-org.jsa: 4026368
>>> specjbb2005.jsa: 4018176
>>> Saved about 8K.
>>>
>> Here is the output extracted from "java -Xshare:dump
>> -XX:+PrintSharedSpaces" on 64-bit Linux.
>>
>> Before any changes:
>>
>> Symbol : 50913 1925736 27.4 | 0 0 0.0
>> | 50913 1925736 10.5
>>
>> After your webrev02/
>>
>> Symbol : 50913 1889520 27.0 | 0 0 0.0
>> | 50913 1889520 10.3
>>
>> After changing _body[1] => _body[2]
>>
>> Symbol : 50913 1856248 26.6 | 0 0 0.0
>> | 50913 1856248 10.1
>>
>> The average Symbol size changed from
>>
>> 37.82 bytes to 37.11 (-0.71) to 36.45 (-1.37)
>>
>> As I mentioned in the bug report, the reduction (1.37) is smaller than
>> the theoretical value (2 bytes) because we have many small symbols,
>> and metaspace.cpp enforces a 24-byte minimum allocation size for each
>> MetaspaceObj on 64-bit platforms.
>>
>> Thanks
>> - Ioi
>>
>>
>>> Thanks
>>> Yumin
>>>
>>>
>>> On 6/19/2015 11:21 AM, Coleen Phillimore wrote:
>>>>
>>>> Hi, This is okay, one comment below.
>>>>
>>>> On 6/19/15 12:21 PM, Yumin Qi wrote:
>>>>> Coleen,
>>>>>
>>>>> Thanks.
>>>>>
>>>>> On 6/19/2015 5:14 AM, Coleen Phillimore wrote:
>>>>>>
>>>>>> Yumin,
>>>>>>
>>>>>> If the identity hash isn't unique enough you get a system
>>>>>> dictionary warning like this, which doesn't stop the JVM. Could
>>>>>> you check your test logs to see if this warning is output in any
>>>>>> of them (jtreg keeps logs around, just checking the jtreg test
>>>>>> results should be sufficient, I think).
>>>>>>
>>>>>> Log : HotSpot(TM) Server VM warning: Performance bug:
>>>>>> SystemDictionary lookup_count=6918 lookup_length=3907
>>>>>> average=0.564759 load=0.266601
>>>>>>
>>>>>> I think with including _body[0] in the function you may make the
>>>>>> hash random enough. So this was one of my concerns with this
>>>>>> change.
>>>>>>
>>>>> I will check that.
>>>>>> In
>>>>>> http://cr.openjdk.java.net/~minqi/8087143/webrev01/src/share/vm/runtime/java.cpp.udiff.html
>>>>>>
>>>>>>
>>>>>> Are you sure you want all of the SymbolTable dump? Maybe it
>>>>>> should be under PrintSymbolTableSizeStatistics instead?
>>>>>>
>>>>> I will check and make change based on the output of statistics,
>>>>> maybe it is too much to print out all SymbolTable.
>>>>
>>>> It's way more than anyone needs normally. Maybe only print it if
>>>> one adds +Verbose to PrintStringTableSizeStatistics and when we do
>>>> logging, it'll be -Xlog:symboltable=trace or a lower logging level
>>>> than info.
>>>>
>>>> Thanks,
>>>> Coleen
>>>>>> In
>>>>>> http://cr.openjdk.java.net/~minqi/8087143/webrev01/src/share/vm/oops/symbol.hpp.udiff.html
>>>>>>
>>>>>>
>>>>>> -// We separate the fields in SymbolBase from Symbol::_body so that
>>>>>> -// Symbol::size(int) can correctly calculate the space needed.
>>>>>>
>>>>>> I'm not sure what that comment meant. With SymbolBase gone (which
>>>>>> I think looks nicer), can you still correctly calculate the space
>>>>>> needed?
>>>>>>
>>>>> Yes, now the _body changed from 1 byte array to 2 bytes array.
>>>>> The original calculation is not correct I think: if symbol length
>>>>> > 0, we allocate *_ONE_* extra byte for it:
>>>>> - size_t sz = heap_word_size(sizeof(SymbolBase) + (length > 0 ?
>>>>> length : 0));
>>>>> + size_t sz = heap_word_size(sizeof(Symbol) + (length > 2 ?
>>>>> length - 2 : 0));
>>>>>
>>>>>> Other than these things, this change looks good. Nice to find
>>>>>> some space savings with something so frequent as symbols.
>>>>>>
>>>>>> Thanks,
>>>>>> Coleen
>>>>>>
>>>>>> On 6/18/15 8:40 PM, Yumin Qi wrote:
>>>>>>> Ioi,
>>>>>>>
>>>>>>> Thanks for review, I will add that (for print out purpose, not
>>>>>>> in the final change?)
>>>>>>>
>>>>>>> Yumin
>>>>>>>
>>>>>>> On 6/18/2015 3:34 PM, ioi.lam at oracle.com wrote:
>>>>>>>> Hi Yumin,
>>>>>>>>
>>>>>>>> I wonder why the average symbol size didn't change. They are
>>>>>>>> both about 34.25 bytes long.
>>>>>>>>
>>>>>>>> When I tested the change, -Xshare:dump -XX:+PrintSharedSpaces
>>>>>>>> show about 1.3 bytes reduction per symbol.
>>>>>>>>
>>>>>>>> Also, could you add some code to print out the statistics of the
>>>>>>>> system dictionary to show if there's any change in the hashtable
>>>>>>>> balance?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>>> On Jun 18, 2015, at 1:46 PM, Yumin Qi <yumin.qi at oracle.com> wrote:
>>>>>>>>>
>>>>>>>>> Please review the small change for
>>>>>>>>>
>>>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8087143
>>>>>>>>> webrev: http://cr.openjdk.java.net/~minqi/8087143/webrev01/
>>>>>>>>>
>>>>>>>>> Summary: _identity_hash is an integer in Symbol (SymbolBase),
>>>>>>>>> it is used to compute hash bucket index by modulus division of
>>>>>>>>> table size. Currently in hotspot, no table size is more than
>>>>>>>>> 65535 so we can use short instead. For case with table size
>>>>>>>>> over 65535 we can use the first two bytes of symbol data to be
>>>>>>>>> as the upper 16 bits for the calculation but rare cases.
>>>>>>>>>
>>>>>>>>> This is a minor change, also eliminates SymbolBase which seems
>>>>>>>>> redundant. Also add print out SymbolTable statistics under
>>>>>>>>> debug flag PrintSystemDictionaryAtExit.
>>>>>>>>>
>>>>>>>>> Tests: JPRT, Specjbb2005.
>>>>>>>>>
>>>>>>>>> The SymbolTable statistics for specjbb2005:
>>>>>>>>>
>>>>>>>>> Before Change:
>>>>>>>>> SymbolTable statistics:
>>>>>>>>> Number of buckets : 20011 = 320176 bytes, avg 16.000
>>>>>>>>> Number of entries : 49914 = 1597248 bytes, avg 32.000
>>>>>>>>> Number of literals : 49914 = 1709688 bytes, avg 34.253
>>>>>>>>> Total footprint : = 3627112 bytes
>>>>>>>>> Average bucket size : 2.494
>>>>>>>>> Variance of bucket size : 2.483
>>>>>>>>> Std. dev. of bucket size: 1.576
>>>>>>>>> Maximum bucket size : 10
>>>>>>>>>
>>>>>>>>> After Change:
>>>>>>>>>
>>>>>>>>> SymbolTable statistics:
>>>>>>>>> Number of buckets : 20011 = 320176 bytes, avg 16.000
>>>>>>>>> Number of entries : 49911 = 1597152 bytes, avg 32.000
>>>>>>>>> Number of literals : 49911 = 1709544 bytes, avg 34.252
>>>>>>>>> Total footprint : = 3626872 bytes
>>>>>>>>> Average bucket size : 2.494
>>>>>>>>> Variance of bucket size : 2.483
>>>>>>>>> Std. dev. of bucket size: 1.576
>>>>>>>>> Maximum bucket size : 10
>>>>>>>>>
>>>>>>>>> There is no big change for the hashtable balance.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Yumin
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
More information about the hotspot-runtime-dev
mailing list