RFR(S): 8087143: Reduce Symbol::_identity_hash to 2 bytes
Yumin Qi
yumin.qi at oracle.com
Fri Jun 26 06:29:16 UTC 2015
I ran with WLS, and the results are:
1) original:
SymbolTable statistics:
Number of buckets : 20011 = 160088 bytes, avg 8.000
Number of entries : 265300 = 6367200 bytes, avg 24.000
Number of literals : 265300 = 12325168 bytes, avg 46.457
Total footprint : = 18852456 bytes
Average bucket size : 13.258
Variance of bucket size : 13.209
Std. dev. of bucket size: 3.634
Maximum bucket size : 33
2) _identity_hash reduced to short, with the new identity_hash():
SymbolTable statistics:
Number of buckets : 20011 = 160088 bytes, avg 8.000
Number of entries : 265237 = 6365688 bytes, avg 24.000
Number of literals : 265237 = 11805440 bytes, avg 44.509
Total footprint : = 18331216 bytes
Average bucket size : 13.255
Variance of bucket size : 13.209
Std. dev. of bucket size: 3.634
Maximum bucket size : 33
Both runs show similar bucket balance. The number of symbols decreased by 63.
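
The footprint difference between the two runs can be checked with a little arithmetic. The sketch below only copies the figures from the two dumps above; the struct and function names are invented for illustration and are not part of HotSpot.

```cpp
#include <cstdint>

// Figures copied from the two "SymbolTable statistics" dumps in this message.
// The type and function names here are hypothetical, for illustration only.
struct SymbolTableStats {
    std::int64_t entries;        // number of entries (same as number of literals)
    std::int64_t bucket_bytes;   // bytes used by the bucket array
    std::int64_t entry_bytes;    // bytes used by hashtable entries
    std::int64_t literal_bytes;  // bytes used by the Symbol literals
    std::int64_t total_bytes;    // reported total footprint
};

constexpr SymbolTableStats kOriginal{265300, 160088, 6367200, 12325168, 18852456};
constexpr SymbolTableStats kReduced {265237, 160088, 6365688, 11805440, 18331216};

// Bytes saved in the total footprint.
constexpr std::int64_t total_saved() {
    return kOriginal.total_bytes - kReduced.total_bytes;
}

// Bytes saved in the literals (the Symbol objects themselves).
constexpr std::int64_t literal_saved() {
    return kOriginal.literal_bytes - kReduced.literal_bytes;
}

// Bytes saved in hashtable entries, caused by the run having fewer symbols.
constexpr std::int64_t entry_saved() {
    return kOriginal.entry_bytes - kReduced.entry_bytes;
}

// Sanity check that a dump's parts add up to its reported total.
constexpr bool parts_add_up(const SymbolTableStats& s) {
    return s.bucket_bytes + s.entry_bytes + s.literal_bytes == s.total_bytes;
}
```

With these figures the total saving is 521240 bytes (roughly 2.8% of the original footprint): 519728 bytes from the literals plus 1512 bytes (63 x 24) from the 63 fewer entries. Since the two runs did not load exactly the same set of symbols, the per-symbol average (46.457 vs. 44.509 bytes) is probably the fairer comparison.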
Thanks
Yumin
On 6/24/2015 1:10 PM, Yumin Qi wrote:
> Updated webrev at 03 again. Please let me know if it is OK.
>
> Thanks
> Yumin
>
> On 6/24/2015 9:21 AM, Yumin Qi wrote:
>> Aleksey and Ioi,
>>
>> On 6/24/15, 2:55 AM, Aleksey Shipilev wrote:
>>> Hi Ioi,
>>>
>>> On 06/23/2015 10:28 PM, Ioi Lam wrote:
>>>> I assume by "address" you mean
>>>>
>>>> ((unsigned)((this)>> (LogMinObjAlignmentInBytes + 3)));
>>> Yes.
>>>
>>>> I think for performance reasons we are storing a random number in the
>>>> Symbol instead of computing the hashcode based on the string value.
>>>> The intention of this RFE is not to change that, but just use fewer
>>>> random bits to accomplish the same goal. Yes, we are assuming a
>>>> particular hashtable implementation, but I don't think we are going
>>>> to change that any time soon.
>>> ...
>>>
>>>> And if we ever change the hash function
>>>> to heavily depend on the top 16 bits of the hash code, we can
>>>> certainly revert back to a 32-bit random number.
>>> This sounds awfully similar to introducing technical debt.
>>> Succumbing to this practice means you will have to make an avalanche
>>> modification once you need to do a little change in hash table
>>> implementation, provided somebody actually remembers to change the hash
>>> function. I'm not keen on introducing these dependencies without a
>>> *very* good reason to.
>>>
>>> In this example, if there is a hash function that returns a 32-bit
>>> value, please make sure the value is random in all 32 bits. If that
>>> proves hard, then the memory footprint savings (what, 1K out of 3626K,
>>> if I read the data right?) are not worth the hassle of complicating
>>> the hash function beyond reasonable, and introducing undocumented
>>> dependencies that will eventually backfire.
>> Currently _identity_hash = os::random(), whose random seed is
>> 1234567.
>> I think that to make it more random we need a seed of 65535, since
>> with a short, 0 is the same as 65536, and so on.
>>
>> I'm not familiar with the random algorithm (os::random()). I suppose
>> it generates a random number between 0 and _rand_seed?
>>
>>
>> Thanks
>> Yumin
>>
>>> Thanks,
>>> -Aleksey
>
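
On the question quoted above about what os::random() produces: as far as I recall from the HotSpot sources, it is a Park-Miller "minimal standard" linear congruential generator, where the seed is the generator's running state rather than an upper bound on the output. The sketch below is a standalone approximation under that assumption, not the actual HotSpot code; the function names are hypothetical. It also shows what truncating the result to a 2-byte field keeps.

```cpp
#include <cstdint>

// One step of the Park-Miller "minimal standard" generator:
//   next = (16807 * seed) mod (2^31 - 1)
// Assumption: this matches what os::random() computes; HotSpot uses a
// different (overflow-avoiding) formulation internally.
inline std::uint32_t next_random(std::uint32_t seed) {
    const std::uint64_t a = 16807;
    const std::uint64_t m = 2147483647;  // 2^31 - 1
    return static_cast<std::uint32_t>((a * seed) % m);
}

// Advance the generator n steps starting from the given seed.
inline std::uint32_t nth_random(std::uint32_t seed, int n) {
    for (int i = 0; i < n; ++i) {
        seed = next_random(seed);
    }
    return seed;
}

// Keep only the low 16 bits, as storing the value in a short would.
inline std::uint16_t to_short_hash(std::uint32_t r) {
    return static_cast<std::uint16_t>(r & 0xFFFFu);
}
```

Under this assumption every output lies between 1 and 2^31 - 2 regardless of the initial seed, so changing the seed to 65535 would not change the range; it would only pick a different starting point in the same sequence. Truncating to a short simply discards the upper bits, which is why 0 and 65536 map to the same stored value.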