RFR(S): 8087143: Reduce Symbol::_identity_hash to 2 bytes

Yumin Qi yumin.qi at oracle.com
Sat Jun 20 05:17:31 UTC 2015


Ioi,
   Thanks.  I missed the size(...) call. Updated webrev02, convert _body 
from size 1 array to size 2 array and have done tests for it.

Thanks
Yumin

On 6/19/2015 6:39 PM, Ioi Lam wrote:
> Yumin,
>
> The new version of Symbol::size over-calculates the required size by 1 
> byte. That's why your average Symbol size is only reduced from 44.198 
> to 43.317 bytes (a reduction of 0.88 bytes).
>
> The code:
>
>     jbyte _body[1];
>     static int size(int length) {
>        size_t sz = heap_word_size(sizeof(Symbol) + (length > 1 ?
>     length - 1 : 0));
>         return align_object_size(sz);
>     }
>
> should be changed to
>
>     jbyte _body[2];
>     static int size(int length) {
>        size_t sz = heap_word_size(sizeof(Symbol) + (length > 2 ?
>     length - 2 : 0));
>        return align_object_size(sz);
>     }
>
> When you have this
>
>     class Symbol : public MetaspaceObj {
>        ATOMIC_SHORT_PAIR(
>          volatile short _refcount,  // needs atomic operation
>          unsigned short _length     // number of UTF8 characters in
>     the symbol (does not need atomic op)
>        );
>        short _identity_hash;
>        jbyte _body[1];
>     }
>
> Even though you have declared only 7 bytes of fields, sizeof(Symbol) 
> is 8 bytes, not 7. That's because the C++ compiler implicitly adds 1 
> byte of padding. So if your string is "ab", Symbol::size() would use
>
>      sizeof(Symbol) + (length > 1 ? length - 1 : 0)  ==>  8 + 2 - 1 
> ==> 9 bytes
>
> even though 8 bytes are enough.
>
> Please see more comments below:
>
> On 6/19/15 5:20 PM, Yumin Qi wrote:
>> Hi, Coleen and Ioi
>>
>>   The revised webrev:
>> http://cr.openjdk.java.net/~minqi/8087143/webrev02/
>>
>>   In this version, I have tested with Specjbb2005:
>>   1) change _body to one byte, contribute _body[0] to the calculation 
>> of identity_hash. No rehash request found during the test. Like the 
>> Warnings mentioned by Coleen in her email;
>>   2)  Following data is the output, no. of literals is same as no. of 
>> entries which refers to no. of symbol.
>>
>> before:
>> SymbolTable statistics:
>> Number of buckets       :     20011 =    320176 bytes, avg 16.000
>> Number of entries       :      2016 =     64512 bytes, avg 32.000
>> Number of literals      :      2016 =     89104 bytes, avg 44.198
>> Total footprint         :           =    473792 bytes
>> Average bucket size     :     0.101
>> Variance of bucket size :     0.100
>> Std. dev. of bucket size:     0.316
>> Maximum bucket size     :         3
>>
>> after:
>> SymbolTable statistics:
>> Number of buckets       :     20011 =    320176 bytes, avg 16.000
>> Number of entries       :      2021 =     64672 bytes, avg 32.000
>> Number of literals      :      2021 =     87544 bytes, avg 43.317
>> Total footprint         :           =    472392 bytes
>> Average bucket size     :     0.101
>> Variance of bucket size :     0.100
>> Std. dev. of bucket size:     0.316
>> Maximum bucket size     :         3
>>
>>   About no. of literals, the calculation of the size is:
>>
>> template <class T, MEMFLAGS F> int RehashableHashtable<T, 
>> F>::literal_size(Symbol *symbol) {
>>   return symbol->size() * HeapWordSize;
>> }
>>
>>   And the size is just the _length of the symbol which I think is the 
>> length of the characters, not including  Symbol itself. Cannot use 
>> this data to display how many bytes we saved.
>
> This is not true.  If it were true, you wouldn't have seen any changes 
> in the "average literal" size printed above.
>
> Symbol::size() is defined to be this:
>
>   int size()                { return size(utf8_length()); }
>
> which calls the static Symbol::size(int) function, which includes the 
> size of the header.
>
>> 3) correct a inaccurate calculation of size for Symbol in old version.
>>       size_t sz = heap_word_size(sizeof(SymbolBase) + (length > 0 ? 
>> length : 0));
>>       It will allocate one extra byte when length > 0 since we 
>> already reserve one byte for _body.
> Your description is incorrect. In the old version, SymbolBase did not 
> include _body. Please see comments in JDK-8009575.
>
> So why did we need SymbolBase for 8009575 (the previous version) but 
> don't need it for 8087143 (the current RFE)?
>
> After the fix for 8009575, we needed 8 bytes of header for Symbol, so 
> if Symbol was declared as this:
>
>     class Symbol : public MetaspaceObj {
>        ATOMIC_SHORT_PAIR(
>          volatile short _refcount,  // needs atomic operation
>          unsigned short _length     // number of UTF8 characters in
>     the symbol (does not need atomic op)
>        );
>        int _identity_hash;
>        jbyte _body[1];
>     }
>
> It would occupy 9 bytes. Most compilers would automatically pad it up 
> to 12 bytes.
>
> I could have written Symbol as the following to consider the 
> pad-to-12-bytes case:
>
>     class Symbol : public MetaspaceObj {
>        ATOMIC_SHORT_PAIR(
>          volatile short _refcount,  // needs atomic operation
>          unsigned short _length     // number of UTF8 characters in
>     the symbol (does not need atomic op)
>        );
>        int _identity_hash;
>        jbyte _body[4];    // align up to 12 bytes
>
>        static int size(int length) {
>          size_t sz = heap_word_size(sizeof(Symbol) + (length > 4 ?
>     length - 4 : 0));
>          return align_object_size(sz);
>        }
>     }
>
> However, the amount of structure padding by the C compiler is not 
> well-defined. Although most of our compiler would pad to 12 bytes, 
> some compiler may decide to pad to 16 byte. While this is highly 
> unlikely, I didn't want to take the chance. Explicitly splitting out a 
> SymbolBase class, which is 8 bytes, would make our size calculation 
> always correct.
>
> Anyway, for the current fix, we reduced the Symbol header to only 6 
> bytes. Adding a body[2] will make sizeof(Symbol) to 8 bytes. Virtually 
> no compiler would add any padding to a 8-byte structure, so using 
> sizeof(Symbol) is good enough.
>
> Please trace the Symbol::size(int) function inside gdb to verify that 
> it performs the calculation correctly.
>
>>      Now we save 3 bytes for every symbol (for symbol whose length is 
>> greater than 1).
>>   4) remove java.cpp change, which is not needed, as Coleen pointed 
>> out, we can use -XX:+PrintStringTableStatistics to get this data, 
>> which also is a product flag.
>>   5) for .jsa files:
>>        specjbb2005-org.jsa:   4026368
>>        specjbb2005.jsa:          4018176
>>        Saved about 8K.
>>
> Here is the output extracted from "java  -Xshare:dump 
> -XX:+PrintSharedSpaces" on 64-bit Linux.
>
> Before any changes:
>
> Symbol              :    50913    1925736  27.4 | 0          0   0.0 
> |    50913    1925736  10.5
>
> After your webrev02/
>
> Symbol              :    50913    1889520  27.0 | 0          0   0.0 
> |    50913    1889520  10.3
>
> After changing _body[1] => _body[2]
>
> Symbol              :    50913    1856248  26.6 | 0          0   0.0 
> |    50913    1856248  10.1
>
> The average Symbol size changed from
>
>        37.82 bytes to 37.11 (-0.71) to 36.45 (-1.37)
>
> As I mentioned in the bug report, the reduction (1.37) is smaller than 
> the theoretical value (2 bytes) because we have many small symbols, 
> and metaspace.cpp enforces a 24-byte minimum allocation size for each 
> MetaspaceObj on 64-bit platforms.
>
> Thanks
> - Ioi
>
>
>> Thanks
>> Yumin
>>
>>
>> On 6/19/2015 11:21 AM, Coleen Phillimore wrote:
>>>
>>> Hi,  This is okay, one comment below.
>>>
>>> On 6/19/15 12:21 PM, Yumin Qi wrote:
>>>> Coleen,
>>>>
>>>>   Thanks.
>>>>
>>>> On 6/19/2015 5:14 AM, Coleen Phillimore wrote:
>>>>>
>>>>> Yumin,
>>>>>
>>>>> If the identity hash isn't unique enough you get a system 
>>>>> dictionary warning like this, which doesn't stop the JVM.   Could 
>>>>> you check your test logs to see if this warning is output in any 
>>>>> of them (jtreg keeps logs around, just checking the jtreg test 
>>>>> results should be sufficient, I think).
>>>>>
>>>>> Log : HotSpot(TM) Server VM warning: Performance bug: 
>>>>> SystemDictionary lookup_count=6918 lookup_length=3907 
>>>>> average=0.564759 load=0.266601
>>>>>
>>>>> I think with including _body[0] in the function you may make the 
>>>>> hash random enough.   So this was one of my concerns with this 
>>>>> change.
>>>>>
>>>> I will check that.
>>>>> In 
>>>>> http://cr.openjdk.java.net/~minqi/8087143/webrev01/src/share/vm/runtime/java.cpp.udiff.html
>>>>>
>>>>> Are you sure you want all of the SymbolTable dump?  Maybe it 
>>>>> should be under PrintSymbolTableSizeStatistics instead?
>>>>>
>>>> I will check and make change based on the output of statistics, 
>>>> maybe it is too much to print out all SymbolTable.
>>>
>>> It's way more than anyone needs normally.   Maybe only print it if 
>>> one adds +Verbose to PrintStringTableSizeStatistics and when we do 
>>> logging, it'll be -Xlog:symboltable=trace or a lower logging level 
>>> than info.
>>>
>>> Thanks,
>>> Coleen
>>>>> In 
>>>>> http://cr.openjdk.java.net/~minqi/8087143/webrev01/src/share/vm/oops/symbol.hpp.udiff.html
>>>>>
>>>>> -// We separate the fields in SymbolBase from Symbol::_body so that
>>>>> -// Symbol::size(int) can correctly calculate the space needed.
>>>>>
>>>>> I'm not sure what that comment meant.  With SymbolBase gone (which 
>>>>> I think looks nicer), can you still correctly calculate the space 
>>>>> needed?
>>>>>
>>>> Yes, now the _body changed from 1 byte array to 2 bytes array.
>>>> The original calculation is not correct I think:  if symbol length 
>>>> > 0, we allocate *_ONE_* extra byte for it:
>>>> -    size_t sz = heap_word_size(sizeof(SymbolBase) + (length > 0 ? length : 0));
>>>> +    size_t sz = heap_word_size(sizeof(Symbol) + (length > 2 ? length - 2 : 0));
>>>>
>>>>> Other than these things, this change looks good.  Nice to find 
>>>>> some space savings with something so frequent as symbols.
>>>>>
>>>>> Thanks,
>>>>> Coleen
>>>>>
>>>>> On 6/18/15 8:40 PM, Yumin Qi wrote:
>>>>>> Ioi,
>>>>>>
>>>>>>   Thanks for review, I will add that (for print out purpose, not 
>>>>>> in the final change?)
>>>>>>
>>>>>> Yumin
>>>>>>
>>>>>> On 6/18/2015 3:34 PM, ioi.lam at oracle.com wrote:
>>>>>>> Hi Yumin,
>>>>>>>
>>>>>>> I wonder why the average symbol size didn't change. They are 
>>>>>>> both about 34.25 bytes long.
>>>>>>>
>>>>>>> When I tested the change, -Xshare:dump -XX:+PrintSharedSpaces 
>>>>>>> show about 1.3 bytes reduction per symbol.
>>>>>>>
>>>>>>> Also, could you add some code to print out the statistics of the 
>>>>>>> system dictionary to show if there's any change in the hashtable 
>>>>>>> balance?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>> On Jun 18, 2015, at 1:46 PM, Yumin Qi <yumin.qi at oracle.com> wrote:
>>>>>>>>
>>>>>>>> Please review the small change for
>>>>>>>>
>>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8087143
>>>>>>>> webrev: http://cr.openjdk.java.net/~minqi/8087143/webrev01/
>>>>>>>>
>>>>>>>> Summary: _identity_hash is an integer in Symbol (SymbolBase), 
>>>>>>>> it is used to compute hash bucket index by modulus division of 
>>>>>>>> table size. Currently in hotspot, no table size is more than 
>>>>>>>> 65535 so we can use short instead.  For case with table size 
>>>>>>>> over 65535 we can use the first two bytes of symbol data to be 
>>>>>>>> as the upper 16 bits for the calculation but rare cases.
>>>>>>>>
>>>>>>>> This is a minor change, also eliminates SymbolBase which seems 
>>>>>>>> redundant.  Also add print out SymbolTable statistics under 
>>>>>>>> debug flag PrintSystemDictionaryAtExit.
>>>>>>>>
>>>>>>>> Tests: JPRT, Specjbb2005.
>>>>>>>>
>>>>>>>> The SymbolTable statistics for specjbb2005:
>>>>>>>>
>>>>>>>> Before Change:
>>>>>>>> SymbolTable statistics:
>>>>>>>> Number of buckets       :     20011 =    320176 bytes, avg 16.000
>>>>>>>> Number of entries       :     49914 =   1597248 bytes, avg 32.000
>>>>>>>> Number of literals      :     49914 =   1709688 bytes, avg 34.253
>>>>>>>> Total footprint         :           =   3627112 bytes
>>>>>>>> Average bucket size     :     2.494
>>>>>>>> Variance of bucket size :     2.483
>>>>>>>> Std. dev. of bucket size:     1.576
>>>>>>>> Maximum bucket size     :        10
>>>>>>>>
>>>>>>>> After Change:
>>>>>>>>
>>>>>>>> SymbolTable statistics:
>>>>>>>> Number of buckets       :     20011 =    320176 bytes, avg 16.000
>>>>>>>> Number of entries       :     49911 =   1597152 bytes, avg 32.000
>>>>>>>> Number of literals      :     49911 =   1709544 bytes, avg 34.252
>>>>>>>> Total footprint         :           =   3626872 bytes
>>>>>>>> Average bucket size     :     2.494
>>>>>>>> Variance of bucket size :     2.483
>>>>>>>> Std. dev. of bucket size:     1.576
>>>>>>>> Maximum bucket size     :        10
>>>>>>>>
>>>>>>>> There is no big change for the hashtable balance.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Yumin
>>>>>>
>>>>>
>>>>
>>>
>>
>



More information about the hotspot-runtime-dev mailing list