RFR(S): 8087143: Reduce Symbol::_identity_hash to 2 bytes
Yumin Qi
yumin.qi at oracle.com
Sat Jun 20 00:20:16 UTC 2015
Hi, Coleen and Ioi
The revised webrev:
http://cr.openjdk.java.net/~minqi/8087143/webrev02/
In this version, I have tested with Specjbb2005:
1) change _body to one byte, contribute _body[0] to the calculation
of identity_hash. No rehash request found during the test. Like the
Warnings mentioned by Coleen in her email;
2) Following data is the output, no. of literals is same as no. of
entries which refers to no. of symbol.
before:
SymbolTable statistics:
Number of buckets : 20011 = 320176 bytes, avg 16.000
Number of entries : 2016 = 64512 bytes, avg 32.000
Number of literals : 2016 = 89104 bytes, avg 44.198
Total footprint : = 473792 bytes
Average bucket size : 0.101
Variance of bucket size : 0.100
Std. dev. of bucket size: 0.316
Maximum bucket size : 3
after:
SymbolTable statistics:
Number of buckets : 20011 = 320176 bytes, avg 16.000
Number of entries : 2021 = 64672 bytes, avg 32.000
Number of literals : 2021 = 87544 bytes, avg 43.317
Total footprint : = 472392 bytes
Average bucket size : 0.101
Variance of bucket size : 0.100
Std. dev. of bucket size: 0.316
Maximum bucket size : 3
About no. of literals, the calculation of the size is:
template <class T, MEMFLAGS F> int RehashableHashtable<T,
F>::literal_size(Symbol *symbol) {
return symbol->size() * HeapWordSize;
}
And the size is just the _length of the symbol which I think is the
length of the characters, not including Symbol itself. Cannot use this
data to display how many bytes we saved.
3) correct a inaccurate calculation of size for Symbol in old version.
size_t sz = heap_word_size(sizeof(SymbolBase) + (length > 0 ?
length : 0));
It will allocate one extra byte when length > 0 since we already
reserve one byte for _body.
Now we save 3 bytes for every symbol (for symbol whose length is
greater than 1).
4) remove java.cpp change, which is not needed, as Coleen pointed
out, we can use -XX:+PrintStringTableStatistics to get this data, which
also is a product flag.
5) for .jsa files:
specjbb2005-org.jsa: 4026368
specjbb2005.jsa: 4018176
Saved about 8K.
Thanks
Yumin
On 6/19/2015 11:21 AM, Coleen Phillimore wrote:
>
> Hi, This is okay, one comment below.
>
> On 6/19/15 12:21 PM, Yumin Qi wrote:
>> Coleen,
>>
>> Thanks.
>>
>> On 6/19/2015 5:14 AM, Coleen Phillimore wrote:
>>>
>>> Yumin,
>>>
>>> If the identity hash isn't unique enough you get a system dictionary
>>> warning like this, which doesn't stop the JVM. Could you check your
>>> test logs to see if this warning is output in any of them (jtreg
>>> keeps logs around, just checking the jtreg test results should be
>>> sufficient, I think).
>>>
>>> Log : HotSpot(TM) Server VM warning: Performance bug:
>>> SystemDictionary lookup_count=6918 lookup_length=3907
>>> average=0.564759 load=0.266601
>>>
>>> I think with including _body[0] in the function you may make the
>>> hash random enough. So this was one of my concerns with this change.
>>>
>> I will check that.
>>> In
>>> http://cr.openjdk.java.net/~minqi/8087143/webrev01/src/share/vm/runtime/java.cpp.udiff.html
>>>
>>> Are you sure you want all of the SymbolTable dump? Maybe it should
>>> be under PrintSymbolTableSizeStatistics instead?
>>>
>> I will check and make change based on the output of statistics, maybe
>> it is too much to print out all SymbolTable.
>
> It's way more than anyone needs normally. Maybe only print it if one
> adds +Verbose to PrintStringTableSizeStatistics and when we do
> logging, it'll be -Xlog:symboltable=trace or a lower logging level
> than info.
>
> Thanks,
> Coleen
>>> In
>>> http://cr.openjdk.java.net/~minqi/8087143/webrev01/src/share/vm/oops/symbol.hpp.udiff.html
>>>
>>> -// We separate the fields in SymbolBase from Symbol::_body so that
>>> -// Symbol::size(int) can correctly calculate the space needed.
>>>
>>> I'm not sure what that comment meant. With SymbolBase gone (which I
>>> think looks nicer), can you still correctly calculate the space needed?
>>>
>> Yes, now the _body changed from 1 byte array to 2 bytes array.
>> The original calculation is not correct I think: if symbol length >
>> 0, we allocate *_ONE_* extra byte for it:
>> - size_t sz = heap_word_size(sizeof(SymbolBase) + (length > 0 ? length : 0));
>> + size_t sz = heap_word_size(sizeof(Symbol) + (length > 2 ? length - 2 : 0));
>>
>>> Other than these things, this change looks good. Nice to find some
>>> space savings with something so frequent as symbols.
>>>
>>> Thanks,
>>> Coleen
>>>
>>> On 6/18/15 8:40 PM, Yumin Qi wrote:
>>>> Ioi,
>>>>
>>>> Thanks for review, I will add that (for print out purpose, not in
>>>> the final change?)
>>>>
>>>> Yumin
>>>>
>>>> On 6/18/2015 3:34 PM, ioi.lam at oracle.com wrote:
>>>>> Hi Yumin,
>>>>>
>>>>> I wonder why the average symbol size didn't change. They are both
>>>>> about 34.25 bytes long.
>>>>>
>>>>> When I tested the change, -Xshare:dump -XX:+PrintSharedSpaces show
>>>>> about 1.3 bytes reduction per symbol.
>>>>>
>>>>> Also, could you add some code to print out the statistics of the
>>>>> system dictionary to show if there's any change in the hashtable
>>>>> balance?
>>>>>
>>>>> Thanks
>>>>>
>>>>>> On Jun 18, 2015, at 1:46 PM, Yumin Qi <yumin.qi at oracle.com> wrote:
>>>>>>
>>>>>> Please review the small change for
>>>>>>
>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8087143
>>>>>> webrev: http://cr.openjdk.java.net/~minqi/8087143/webrev01/
>>>>>>
>>>>>> Summary: _identity_hash is an integer in Symbol (SymbolBase), it
>>>>>> is used to compute hash bucket index by modulus division of table
>>>>>> size. Currently in hotspot, no table size is more than 65535 so
>>>>>> we can use short instead. For case with table size over 65535 we
>>>>>> can use the first two bytes of symbol data to be as the upper 16
>>>>>> bits for the calculation but rare cases.
>>>>>>
>>>>>> This is a minor change, also eliminates SymbolBase which seems
>>>>>> redundant. Also add print out SymbolTable statistics under debug
>>>>>> flag PrintSystemDictionaryAtExit.
>>>>>>
>>>>>> Tests: JPRT, Specjbb2005.
>>>>>>
>>>>>> The SymbolTable statistics for specjbb2005:
>>>>>>
>>>>>> Before Change:
>>>>>> SymbolTable statistics:
>>>>>> Number of buckets : 20011 = 320176 bytes, avg 16.000
>>>>>> Number of entries : 49914 = 1597248 bytes, avg 32.000
>>>>>> Number of literals : 49914 = 1709688 bytes, avg 34.253
>>>>>> Total footprint : = 3627112 bytes
>>>>>> Average bucket size : 2.494
>>>>>> Variance of bucket size : 2.483
>>>>>> Std. dev. of bucket size: 1.576
>>>>>> Maximum bucket size : 10
>>>>>>
>>>>>> After Change:
>>>>>>
>>>>>> SymbolTable statistics:
>>>>>> Number of buckets : 20011 = 320176 bytes, avg 16.000
>>>>>> Number of entries : 49911 = 1597152 bytes, avg 32.000
>>>>>> Number of literals : 49911 = 1709544 bytes, avg 34.252
>>>>>> Total footprint : = 3626872 bytes
>>>>>> Average bucket size : 2.494
>>>>>> Variance of bucket size : 2.483
>>>>>> Std. dev. of bucket size: 1.576
>>>>>> Maximum bucket size : 10
>>>>>>
>>>>>> There is no big change for the hashtable balance.
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Yumin
>>>>
>>>
>>
>
More information about the hotspot-runtime-dev
mailing list