RFR: JDK-8261302: NMT: Improve malloc site table hashing

Thomas Stuefe stuefe at openjdk.java.net
Tue Feb 9 07:24:57 UTC 2021


While looking at NMT tuning statistics, I saw longish bucket chains in the malloc site table and looked whether this can be improved.

The current hash algorithm uses the 32bit masked sum of all stack entries as hash.

I first experimented with different hash algorithms on different platforms (x86 and ppc, the latter because it has uniform op sizes) and did actually not find a noticeable improvement over what NMT does now. It seems that using the raw code pointers as base for the hash gives us already enough entropy. Avg load factor of the table always hovered around what was expected. Regardless of the hash I tried I was not able to get rid of the few longer chains.

The biggest improvement brought an experimental table size increase: currently the table size is 511 pointer slots (~4K). Quadrupling the size would bring the load factor down to 1-2. However, there is this comment in mallocSiteTable.hpp:

https://github.com/openjdk/jdk/blob/5183d8ae1eec86202eace2c4770f81edbc73cb68/src/hotspot/share/services/mallocSiteTable.hpp#L118

wich claims that a load factor of 6 is what is aimed for and deemed acceptable. So I am not going to touch that here (even though 4 or 12K more may be an okay price to pay for more efficient lookups.

---

With that out of the way, there are still small things we can improve about the hash function:

Since the vast majority of `NativeCallStack` objects will always need a hash code, it makes no sense to delay its calculation. By doing the hash code calculation in the constructor we can make `NativeCallStack::hash()` a simple inline getter.

When calculating the hash code, I also omitted the "if stack address is 0 stop" logic. The vast majority of call stacks have the full size and nothing much is gained from omitting those 0 values from the hash code calculation.

See difference (linux x86):

Before:

Dump of assembler code for function NativeCallStack::hash() const:
=> 0x00007ffff68092f0 <+0>:     mov    0x20(%rdi),%eax        <load hash
   0x00007ffff68092f3 <+3>:     push   %rbp
   0x00007ffff68092f4 <+4>:     mov    %rsp,%rbp              
   0x00007ffff68092f7 <+7>:     test   %eax,%eax              <0?->X
   0x00007ffff68092f9 <+9>:     jne    0x7ffff6809324 <NativeCallStack::hash() const+52>
   0x00007ffff68092fb <+11>:    mov    (%rdi),%rdx
   0x00007ffff68092fe <+14>:    test   %rdx,%rdx
   0x00007ffff6809301 <+17>:    je     0x7ffff6809330 <NativeCallStack::hash() const+64>
   0x00007ffff6809303 <+19>:    mov    0x8(%rdi),%rax
   0x00007ffff6809307 <+23>:    test   %rax,%rax
   0x00007ffff680930a <+26>:    je     0x7ffff680931f <NativeCallStack::hash() const+47>
   0x00007ffff680930c <+28>:    add    %rax,%rdx
   0x00007ffff680930f <+31>:    mov    0x10(%rdi),%rax
   0x00007ffff6809313 <+35>:    test   %rax,%rax
   0x00007ffff6809316 <+38>:    je     0x7ffff680931f <NativeCallStack::hash() const+47>
   0x00007ffff6809318 <+40>:    add    %rax,%rdx
   0x00007ffff680931b <+43>:    add    0x18(%rdi),%rdx
   0x00007ffff680931f <+47>:    mov    %edx,%eax
   0x00007ffff6809321 <+49>:    mov    %edx,0x20(%rdi)
   0x00007ffff6809324 <+52>:    pop    %rbp                   <X
   0x00007ffff6809325 <+53>:    retq   
   0x00007ffff6809326 <+54>:    nopw   %cs:0x0(%rax,%rax,1)
   0x00007ffff6809330 <+64>:    xor    %edx,%edx
   0x00007ffff6809332 <+66>:    jmp    0x7ffff680931f <NativeCallStack::hash() const+47>

hash() getter is not inlined; it queries the hash code each time and, when calculating it, uses simple adds interspersed with conditional jumps because of the "if stack address is 0 stop" logic.

With this patch, the `NativeCallStack::hash()` gets inlined at the call sites to a simple load.
The hash calculation gets now inlined into the constructor and uses 128bit xmm registers and packed integer adds:

Dump of assembler code for function NativeCallStack::NativeCallStack(int, bool):
=> 0x00007ffff67cbba0 <+0>:     push   %rbp
   0x00007ffff67cbba1 <+1>:     mov    %rsp,%rbp
   0x00007ffff67cbba4 <+4>:     push   %rbx
   0x00007ffff67cbba5 <+5>:     mov    %rdi,%rbx
   0x00007ffff67cbba8 <+8>:     sub    $0x8,%rsp
   0x00007ffff67cbbac <+12>:    test   %dl,%dl
   0x00007ffff67cbbae <+14>:    movl   $0x0,0x20(%rdi)
   0x00007ffff67cbbb5 <+21>:    jne    0x7ffff67cbc10 <NativeCallStack::NativeCallStack(int, bool)+112>
   0x00007ffff67cbbb7 <+23>:    movq   $0x0,(%rdi)
   0x00007ffff67cbbbe <+30>:    movq   $0x0,0x8(%rdi)
   0x00007ffff67cbbc6 <+38>:    movq   $0x0,0x10(%rdi)
   0x00007ffff67cbbce <+46>:    movq   $0x0,0x18(%rdi)
   0x00007ffff67cbbd6 <+54>:    movdqu (%rbx),%xmm0          <<<<
   0x00007ffff67cbbda <+58>:    movdqu 0x10(%rbx),%xmm2      <<<<
   0x00007ffff67cbbdf <+63>:    shufps $0x88,%xmm2,%xmm0     <<<<
   0x00007ffff67cbbe3 <+67>:    movdqa %xmm0,%xmm1           <<<<
   0x00007ffff67cbbe7 <+71>:    psrldq $0x8,%xmm1            <<<<
   0x00007ffff67cbbec <+76>:    paddd  %xmm1,%xmm0           <<<<
   0x00007ffff67cbbf0 <+80>:    movdqa %xmm0,%xmm1           <<<<
   0x00007ffff67cbbf4 <+84>:    psrldq $0x4,%xmm1            <<<<
   0x00007ffff67cbbf9 <+89>:    paddd  %xmm1,%xmm0           <<<<
   0x00007ffff67cbbfd <+93>:    movd   %xmm0,0x20(%rbx)      <<<<
   0x00007ffff67cbc02 <+98>:    add    $0x8,%rsp
   0x00007ffff67cbc06 <+102>:   pop    %rbx
   0x00007ffff67cbc07 <+103>:   pop    %rbp
   0x00007ffff67cbc08 <+104>:   retq   
   0x00007ffff67cbc09 <+105>:   nopl   0x0(%rax)
   0x00007ffff67cbc10 <+112>:   mov    %esi,%edx
   0x00007ffff67cbc12 <+114>:   mov    $0x4,%esi
   0x00007ffff67cbc17 <+119>:   callq  0x7ffff6824960 <os::get_native_stack(unsigned char**, int, int)>
   0x00007ffff67cbc1c <+124>:   jmp    0x7ffff67cbbd6 <NativeCallStack::NativeCallStack(int, bool)+54>

Thanks, Thomas

-------------

Commit messages:
 - Remove stray include
 - Start

Changes: https://git.openjdk.java.net/jdk/pull/2473/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2473&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8261302
  Stats: 29 lines in 2 files changed: 9 ins; 16 del; 4 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2473.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2473/head:pull/2473

PR: https://git.openjdk.java.net/jdk/pull/2473


More information about the hotspot-dev mailing list