RFR(S): 8087143: Reduce Symbol::_identity_hash to 2 bytes
Aleksey Shipilev
aleksey.shipilev at oracle.com
Tue Jun 23 11:08:02 UTC 2015
On 06/22/2015 11:48 PM, Ioi Lam wrote:
> Since it's absolutely unlikely that anyone would have a system
> dictionary (or any hash table) larger than 16M entries (even if we
> change SystemDictionary::calculate_systemdictionary_size), let's
> forget the top 8 bits, and just look at the bottom 24 bits of
> Symbol::identity_hash().
This is a technical debt: conditionalizing the choice of hashcode
function on an weird hash table implementation detail. If you don't want
to revisit this over and over again after it bites you, please create a
hash function that spits all the significant bits.
>>> I also have problem with "|", could it be more chance to have the
>>> results have more '1' s and make the case worse?
>>>
>> Oops, I think it should be this:
>>
>> unsigned identity_hash() {
>> unsigned lower = (unsigned)_identity_hash;
>> unsigned upper1 = ((unsigned)_body[0]);
>> unsigned upper2 = ((unsigned)_body[1]);
>> unsigned upper3 = ((unsigned)((uintptr_t)(this) >>
>> (LogMinObjAlignmentInBytes + 3)));
>> unsigned upper4 = ((unsigned)_length) ;
>> return lower | ((upper1 ^ upper2 ^ upper3 ^ upper4) << 16);
>> }
The rationale for "|" is simple: with proper shifts, it does concat, not
mixing. So you can spread the entropy of the sources over the entire
bitscale. In other words, you have to first maximize the value domain
for the result, and then try to maximize the the amount of source data
*each* result bit gets.
The baseline os::random() should pass the bit randomicity test (it is
one of the basic statistical tests for randomness), and if we are
augmenting it, we should strive to satisfy the same requirement.
For example, the hash function below is supposed to generate 32-bit
value, and we are arguing about the 16 bits upper word. Let's see:
// affects lower 8 bits
((unsigned)_body[0]);
// affects lower 8 bits, probably correlated with _body[0]
((unsigned)_body[1]);
// hopefully (!) affects 16 bits, but is it high entropy?
((unsigned)((this) >> (LogMinObjAlignmentInBytes + 3)));
// realistically, affects lower 4-10 bits
((unsigned)_length);
So you would like to cram the address at the entire 16-bit upword, and
the overlay other values over it to compensate for potentially low entropy:
identity_hash
| (address
^ (_length << 8) // cover higher bytes of address
^ (( _body[0] << 8) | _body[1])
) << 16
But if you want to go properly with choosing the hash function instead
of flying blind, you'd need to dump all these individual values for
symbols in a large workload, and simulate the simplest hash function,
instead of guessing. (There is a funny example in AoCP where Knuth
"built" the random function in the same manner by slapping pieces
together to gain more randomness, only to fail miserably)
Thanks,
-Aleksey
More information about the hotspot-runtime-dev
mailing list