RFR(S): 8087143: Reduce Symbol::_identity_hash to 2 bytes

Aleksey Shipilev aleksey.shipilev at oracle.com
Mon Jun 22 08:49:10 UTC 2015


On 06/18/2015 11:46 PM, Yumin Qi wrote:
> Please review the small change for
> 
> bug: https://bugs.openjdk.java.net/browse/JDK-8087143
> webrev: http://cr.openjdk.java.net/~minqi/8087143/webrev01/
> 
> Summary: _identity_hash is an integer in Symbol (SymbolBase); it is used
> to compute the hash bucket index by modulus division by the table size.
> Currently in hotspot no table size exceeds 65535, so we can use a
> short instead. For the rare case of a table size over 65535, we can use
> the first two bytes of the symbol data as the upper 16 bits for the
> calculation.

Wait a minute.

 1) We know there are users who deal with overloaded system hash tables
by increasing their sizes (bucket count) beyond the defaults, and into
the >64K area. The symbol table performance in that area is important.
Therefore, the considerations for entropy beyond 64K are important, and
should not be discounted as a rare case.

 2) This change seems to make the hash function much more heavyweight
than before, but hash functions are supposed to be fast. At the very
least, ditch the division [1], and do something along the lines of:

 unsigned upper = (unsigned)((uintptr_t)(this) >> LogMinObjAlignmentInBytes);

 3) Pointer addresses, especially when allocated in arena-ed allocators,
are known to have low entropy (randomness). Mixing in _body[0] would
help only so much. It might be better to ditch "this", and mix in more
_body values instead? Note that it is better to concatenate the values
rather than xor them to get more randomness.
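
To put a number on point 1, here is a minimal sketch (a hypothetical
helper, not HotSpot code) that counts how many buckets a hash of a given
width can reach once it is reduced modulo the table size. The name
reachable_buckets and its parameters are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of HotSpot: enumerate every hash value of
// the given bit width, reduce it modulo table_size, and count how many
// distinct buckets get hit.
static std::size_t reachable_buckets(unsigned hash_bits, uint32_t table_size) {
  std::vector<bool> seen(table_size, false);
  std::size_t count = 0;
  for (uint64_t h = 0; h < (uint64_t(1) << hash_bits); h++) {
    uint32_t bucket = (uint32_t)(h % table_size);
    if (!seen[bucket]) {
      seen[bucket] = true;
      count++;
    }
  }
  return count;
}
```

With a table of 1 << 20 buckets, reachable_buckets(16, 1u << 20) is
65536, while reachable_buckets(20, 1u << 20) covers all 1048576 buckets.
A bare 16-bit hash leaves most of a large table unused no matter how good
the low bits are.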

Hence, I would go with this hash:

  unsigned identity_hash() {
    return (unsigned)_identity_hash
         | (((unsigned)_body[0]) << 16)
         | (((unsigned)_body[1]) << 24);
  }

This also yields nicer generated code [2].
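
For experimenting outside the VM, the concatenating hash can be sketched
on a stand-in struct. MockSymbol and its field names are hypothetical;
the field widths (a signed 2-byte hash followed by signed bytes) are an
assumption read off the offsets and sign-extending loads in the
disassembly, not copied from the HotSpot sources. The sketch also casts
through the unsigned type of the same width first, which the snippet
above does not do; without that, a negative field value would
sign-extend into the higher bits.

```cpp
#include <cstdint>

// Hypothetical stand-in for Symbol; the layout is assumed for illustration.
struct MockSymbol {
  int16_t identity_hash_low;  // plays the role of the 2-byte _identity_hash
  int8_t  body[2];            // plays the role of _body[0] and _body[1]

  // Concatenate three fields into one 32-bit value:
  //   bits  0-15: stored hash
  //   bits 16-23: body[0]
  //   bits 24-31: body[1]
  // Each field is first converted to the unsigned type of its own width,
  // so negative values cannot spill ones into the other fields.
  uint32_t identity_hash() const {
    return  (uint32_t)(uint16_t)identity_hash_low
         | ((uint32_t)(uint8_t)body[0] << 16)
         | ((uint32_t)(uint8_t)body[1] << 24);
  }
};
```

For a MockSymbol whose stored hash is -32767 (bit pattern 0x8001) and
whose body starts with 'j', 'a' (0x6A, 0x61), identity_hash() returns
0x616A8001: the three fields sit side by side and none of them disturbs
the others.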

Or, if we feel lucky today, make a single load [3]:

  unsigned identity_hash() {
    return (unsigned)_identity_hash
            | ((((jshort*)_body)[0]) << 16);
  }
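
A portable way to try the single-load idea is sketched below, again on a
hypothetical stand-in struct (MockSymbolSingleLoad is not a HotSpot
type, and its layout is assumed). It uses memcpy instead of the pointer
cast to stay clear of alignment and aliasing questions; compilers
commonly turn a fixed-size memcpy like this into one 16-bit load, but
that is worth checking in the generated code rather than taking on
faith.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for Symbol; the layout is assumed for illustration.
struct MockSymbolSingleLoad {
  int16_t identity_hash_low;  // plays the role of the 2-byte _identity_hash
  int8_t  body[2];            // plays the role of _body[0] and _body[1]

  uint32_t identity_hash() const {
    // Read body[0] and body[1] as one 16-bit unit. Which byte ends up in
    // the higher half depends on the byte order of the machine.
    uint16_t first_two;
    std::memcpy(&first_two, body, sizeof first_two);
    return (uint32_t)(uint16_t)identity_hash_low
         | ((uint32_t)first_two << 16);
  }
};
```

On a little-endian machine this puts body[0] in bits 16-23 and body[1]
in bits 24-31, the same as the two-byte version. On a big-endian machine
the two bytes swap places, which changes the hash values but not the
amount of entropy mixed in.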


Thanks,
-Aleksey

[1] webrev.02 version; gcc 4.8.2, x86_64, __attribute__ ((noinline))
added to Symbol::identity_hash:

0000000000565ad0 <_ZN6Symbol13identity_hashEv>:
  ; %rcx = &ObjectAlignmentInBytes
  565ad0:       48 8d 0d 69 a0 9a 00    lea    0x9aa069(%rip),%rcx

  565ad7:       55                      push   %rbp

  ; %rax = "this"
  565ad8:       48 89 f8                mov    %rdi,%rax

  ; %edx = 0
  565adb:       31 d2                   xor    %edx,%edx

  ; frame pointer setup
  565add:       48 89 e5                mov    %rsp,%rbp

  ; "this"/ObjectAlignmentInBytes, result in %eax
  565ae0:       48 f7 31                divq   (%rcx)

  ; sign-extended load of a ***byte*** _body[0] to %edx
  565ae3:       0f be 57 06             movsbl 0x6(%rdi),%edx

  565ae7:       5d                      pop    %rbp

  ; %edx = (upper ^ _body[0])
  565ae8:       31 c2                   xor    %eax,%edx

  ; %eax = _identity_hash
  565aea:       0f bf 47 04             movswl 0x4(%rdi),%eax

  ; %edx = (upper ^ _body[0]) << 16
  565aee:       c1 e2 10                shl    $0x10,%edx

  ; %eax = _identity_hash + ((upper ^ _body[0]) << 16)
  565af1:       01 d0                   add    %edx,%eax

  565af3:       c3                      retq


[2] proposed version; gcc 4.8.2, x86_64, __attribute__ ((noinline))
added to Symbol::identity_hash:

0000000000565ad0 <_ZN6Symbol13identity_hashEv>:
  ; %eax = _body[0]
  565ad0:       0f be 47 06             movsbl 0x6(%rdi),%eax

  ; %edx = _body[1]
  565ad4:       0f be 57 07             movsbl 0x7(%rdi),%edx

  565ad8:       55                      push   %rbp
  565ad9:       48 89 e5                mov    %rsp,%rbp

  ; %edx = _body[1] << 24
  565adc:       c1 e2 18                shl    $0x18,%edx

  ; %eax = _body[0] << 16
  565adf:       c1 e0 10                shl    $0x10,%eax

  ; %eax = (_body[0] << 16) | (_body[1] << 24)
  565ae2:       09 d0                   or     %edx,%eax

  ; %edx = _identity_hash
  565ae4:       0f bf 57 04             movswl 0x4(%rdi),%edx
  565ae8:       5d                      pop    %rbp

  ; %eax = _identity_hash | (_body[0] << 16) | (_body[1] << 24)
  565ae9:       09 d0                   or     %edx,%eax

  565aeb:       c3                      retq

[3] proposed version #2; gcc 4.8.2, x86_64, __attribute__ ((noinline))
added to Symbol::identity_hash:

0000000000565ad0 <_ZN6Symbol13identity_hashEv>:
  ; %eax = (_body[0], _body[1])
  565ad0:       0f bf 47 06             movswl 0x6(%rdi),%eax

  ; %edx = _identity_hash
  565ad4:       0f bf 57 04             movswl 0x4(%rdi),%edx

  565ad8:       55                      push   %rbp
  565ad9:       48 89 e5                mov    %rsp,%rbp

  ; %eax = (_body[0], _body[1]) << 16
  565adc:       c1 e0 10                shl    $0x10,%eax

  ; %eax = _identity_hash | ((_body[0], _body[1]) << 16)
  565adf:       09 d0                   or     %edx,%eax

  565ae1:       5d                      pop    %rbp
  565ae2:       c3                      retq



