RFR(S): 8087143: Reduce Symbol::_identity_hash to 2 bytes

Yumin Qi yumin.qi at oracle.com
Sat Jun 20 00:20:16 UTC 2015


Hi, Coleen and Ioi

   The revised webrev:
   http://cr.openjdk.java.net/~minqi/8087143/webrev02/

   In this version, I have tested with Specjbb2005:
   1) change _body to one byte, contribute _body[0] to the calculation 
of identity_hash. No rehash request found during the test. Like the 
Warnings mentioned by Coleen in her email;
   2)  Following data is the output, no. of literals is same as no. of 
entries which refers to no. of symbol.

before:
SymbolTable statistics:
Number of buckets       :     20011 =    320176 bytes, avg  16.000
Number of entries       :      2016 =     64512 bytes, avg  32.000
Number of literals      :      2016 =     89104 bytes, avg  44.198
Total footprint         :           =    473792 bytes
Average bucket size     :     0.101
Variance of bucket size :     0.100
Std. dev. of bucket size:     0.316
Maximum bucket size     :         3

after:
SymbolTable statistics:
Number of buckets       :     20011 =    320176 bytes, avg  16.000
Number of entries       :      2021 =     64672 bytes, avg  32.000
Number of literals      :      2021 =     87544 bytes, avg  43.317
Total footprint         :           =    472392 bytes
Average bucket size     :     0.101
Variance of bucket size :     0.100
Std. dev. of bucket size:     0.316
Maximum bucket size     :         3

   About no. of literals, the calculation of the size is:

template <class T, MEMFLAGS F> int RehashableHashtable<T, 
F>::literal_size(Symbol *symbol) {
   return symbol->size() * HeapWordSize;
}

   And the size is just the _length of the symbol which I think is the 
length of the characters, not including  Symbol itself.  Cannot use this 
data to display how many bytes we saved.
   3) correct a inaccurate calculation of size for Symbol in old version.
       size_t sz = heap_word_size(sizeof(SymbolBase) + (length > 0 ? 
length : 0));
       It will allocate one extra byte when length > 0 since we already 
reserve one byte for _body.
      Now we save 3 bytes for every symbol (for symbol whose length is 
greater than 1).
   4) remove java.cpp change, which is not needed, as Coleen pointed 
out, we can use -XX:+PrintStringTableStatistics to get this data, which 
also is a product flag.
   5) for .jsa files:
        specjbb2005-org.jsa:   4026368
        specjbb2005.jsa:          4018176
        Saved about 8K.

Thanks
Yumin


On 6/19/2015 11:21 AM, Coleen Phillimore wrote:
>
> Hi,  This is okay, one comment below.
>
> On 6/19/15 12:21 PM, Yumin Qi wrote:
>> Coleen,
>>
>>   Thanks.
>>
>> On 6/19/2015 5:14 AM, Coleen Phillimore wrote:
>>>
>>> Yumin,
>>>
>>> If the identity hash isn't unique enough you get a system dictionary 
>>> warning like this, which doesn't stop the JVM. Could you check your 
>>> test logs to see if this warning is output in any of them (jtreg 
>>> keeps logs around, just checking the jtreg test results should be 
>>> sufficient, I think).
>>>
>>> Log : HotSpot(TM) Server VM warning: Performance bug: 
>>> SystemDictionary lookup_count=6918 lookup_length=3907 
>>> average=0.564759 load=0.266601
>>>
>>> I think with including _body[0] in the function you may make the 
>>> hash random enough.   So this was one of my concerns with this change.
>>>
>> I will check that.
>>> In 
>>> http://cr.openjdk.java.net/~minqi/8087143/webrev01/src/share/vm/runtime/java.cpp.udiff.html
>>>
>>> Are you sure you want all of the SymbolTable dump?  Maybe it should 
>>> be under PrintSymbolTableSizeStatistics instead?
>>>
>> I will check and make change based on the output of statistics, maybe 
>> it is too much to print out all SymbolTable.
>
> It's way more than anyone needs normally.   Maybe only print it if one 
> adds +Verbose to PrintStringTableSizeStatistics and when we do 
> logging, it'll be -Xlog:symboltable=trace or a lower logging level 
> than info.
>
> Thanks,
> Coleen
>>> In 
>>> http://cr.openjdk.java.net/~minqi/8087143/webrev01/src/share/vm/oops/symbol.hpp.udiff.html
>>>
>>> -// We separate the fields in SymbolBase from Symbol::_body so that
>>> -// Symbol::size(int) can correctly calculate the space needed.
>>>
>>> I'm not sure what that comment meant.  With SymbolBase gone (which I 
>>> think looks nicer), can you still correctly calculate the space needed?
>>>
>> Yes, now the _body changed from 1 byte array to 2 bytes array.
>> The original calculation is not correct I think:  if symbol length > 
>> 0, we allocate *_ONE_* extra byte for it:
>> -    size_t sz = heap_word_size(sizeof(SymbolBase) + (length > 0 ? length : 0));
>> +    size_t sz = heap_word_size(sizeof(Symbol) + (length > 2 ? length - 2 : 0));
>>
>>> Other than these things, this change looks good.  Nice to find some 
>>> space savings with something so frequent as symbols.
>>>
>>> Thanks,
>>> Coleen
>>>
>>> On 6/18/15 8:40 PM, Yumin Qi wrote:
>>>> Ioi,
>>>>
>>>>   Thanks for review, I will add that (for print out purpose, not in 
>>>> the final change?)
>>>>
>>>> Yumin
>>>>
>>>> On 6/18/2015 3:34 PM, ioi.lam at oracle.com wrote:
>>>>> Hi Yumin,
>>>>>
>>>>> I wonder why the average symbol size didn't change. They are both 
>>>>> about 34.25 bytes long.
>>>>>
>>>>> When I tested the change, -Xshare:dump -XX:+PrintSharedSpaces show 
>>>>> about 1.3 bytes reduction per symbol.
>>>>>
>>>>> Also, could you add some code to print out the statistics of the 
>>>>> system dictionary to show if there's any change in the hashtable 
>>>>> balance?
>>>>>
>>>>> Thanks
>>>>>
>>>>>> On Jun 18, 2015, at 1:46 PM, Yumin Qi <yumin.qi at oracle.com> wrote:
>>>>>>
>>>>>> Please review the small change for
>>>>>>
>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8087143
>>>>>> webrev: http://cr.openjdk.java.net/~minqi/8087143/webrev01/
>>>>>>
>>>>>> Summary: _identity_hash is an integer in Symbol (SymbolBase), it 
>>>>>> is used to compute hash bucket index by modulus division of table 
>>>>>> size. Currently in hotspot, no table size is more than 65535 so 
>>>>>> we can use short instead.  For case with table size over 65535 we 
>>>>>> can use the first two bytes of symbol data to be as the upper 16 
>>>>>> bits for the calculation but rare cases.
>>>>>>
>>>>>> This is a minor change, also eliminates SymbolBase which seems 
>>>>>> redundant.  Also add print out SymbolTable statistics under debug 
>>>>>> flag PrintSystemDictionaryAtExit.
>>>>>>
>>>>>> Tests: JPRT, Specjbb2005.
>>>>>>
>>>>>> The SymbolTable statistics for specjbb2005:
>>>>>>
>>>>>> Before Change:
>>>>>> SymbolTable statistics:
>>>>>> Number of buckets       :     20011 =    320176 bytes, avg 16.000
>>>>>> Number of entries       :     49914 =   1597248 bytes, avg 32.000
>>>>>> Number of literals      :     49914 =   1709688 bytes, avg 34.253
>>>>>> Total footprint         :           =   3627112 bytes
>>>>>> Average bucket size     :     2.494
>>>>>> Variance of bucket size :     2.483
>>>>>> Std. dev. of bucket size:     1.576
>>>>>> Maximum bucket size     :        10
>>>>>>
>>>>>> After Change:
>>>>>>
>>>>>> SymbolTable statistics:
>>>>>> Number of buckets       :     20011 =    320176 bytes, avg 16.000
>>>>>> Number of entries       :     49911 =   1597152 bytes, avg 32.000
>>>>>> Number of literals      :     49911 =   1709544 bytes, avg 34.252
>>>>>> Total footprint         :           =   3626872 bytes
>>>>>> Average bucket size     :     2.494
>>>>>> Variance of bucket size :     2.483
>>>>>> Std. dev. of bucket size:     1.576
>>>>>> Maximum bucket size     :        10
>>>>>>
>>>>>> There is no big change for the hashtable balance.
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Yumin
>>>>
>>>
>>
>



More information about the hotspot-runtime-dev mailing list