RFR: JDK-8256844: Make NMT late-initializable

David Holmes david.holmes at oracle.com
Tue Jul 27 23:02:55 UTC 2021


On 28/07/2021 12:17 am, Thomas Stuefe wrote:
> On Mon, 26 Jul 2021 21:08:04 GMT, David Holmes <david.holmes at oracle.com> wrote:
> 
>> Before looking at this, have you checked the startup performance impact?
>>
>> Thanks,
>> David
>> -----
> 
> Hi David,
> 
> Performance should not be a problem. The potentially costly part is the underlying hashmap, but we keep it operating at a very small load factor.
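> 
> For illustration, such a table could look roughly like this (a minimal sketch, not the actual patch code; `PreInitTable`, `PreInitEntry` and all constants are made up for this example):
> 
> ```c++
> // Sketch of a fixed-size chained hash table mapping malloc'ed
> // pointers to their allocation size. The bucket count is fixed,
> // so with a few hundred entries the load factor stays tiny.
> #include <cstddef>
> #include <cstdint>
> #include <cstdlib>
> 
> struct PreInitEntry {
>   const void*   payload;   // pointer returned by the raw malloc
>   size_t        size;      // requested allocation size
>   PreInitEntry* next;      // collision chain
> };
> 
> class PreInitTable {
>   static const int _num_buckets = 4096;  // fixed; never resized
>   PreInitEntry* _buckets[_num_buckets];
> 
>   static unsigned index_for(const void* p) {
>     // Cheap pointer hash; the low bits of malloc'ed pointers are
>     // all zero due to alignment, so shift before taking the modulus.
>     return (unsigned)(((uintptr_t)p >> 4) % _num_buckets);
>   }
> 
> public:
>   PreInitTable() : _buckets() {}
> 
>   void add(const void* p, size_t size) {   // O(1): prepend to chain
>     PreInitEntry* e = (PreInitEntry*)::malloc(sizeof(PreInitEntry));
>     e->payload = p;
>     e->size = size;
>     const unsigned i = index_for(p);
>     e->next = _buckets[i];
>     _buckets[i] = e;
>   }
> 
>   const PreInitEntry* find(const void* p) const {  // O(chain length)
>     for (const PreInitEntry* e = _buckets[index_for(p)]; e != nullptr; e = e->next) {
>       if (e->payload == p) { return e; }
>     }
>     return nullptr;
>   }
> };
> ```
> 
> The point is that `add` only prepends to a chain and never walks it, which is why insertion cost stays flat regardless of how full the table is.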
> 
> More details:
> 
> Adding entries is O(1). Since the pre-init phase consists almost entirely of adds, startup time is not affected. Still, to make sure this holds, I did a number of tests:
> 
> - tested wall-clock time (WCT) of a HelloWorld: no difference with and without the patch
> - tested startup time in various ways: no differences
> - repeated those tests with 25000 (!) VM arguments, the only way to influence the number of pre-init allocations: no differences (the VM gets slower, but equally so with and without the patch)
> 
> ----
> 
> The expensive operation is lookup, since we potentially need to walk a very full hashmap. Lookups affect the post-init phase more than pre-init.
> 
> To get an idea of the cost of an overly full pre-init lookup table, I modified the VM to do a configurable number of pre-init test allocations, with the intent of artificially inflating the lookup table. Then, after NMT initialization, I measured the cost of lookup. The short story: I was not able to measure any difference, even with a million pre-init allocations. Of course, with more allocations the lookup table got fuller and the VM got slower, but the time increase was caused by the malloc calls themselves, not by the table lookup.
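> 
> An isolated benchmark along those lines could look roughly like this (again only a sketch, reusing the hypothetical `PreInitTable` from above; this is not the actual test code):
> 
> ```c++
> // Fill the sketch table with N fake entries, then time N lookups.
> #include <chrono>
> #include <cstddef>
> #include <cstdint>
> #include <cstdio>
> 
> int main() {
>   const int N = 1000000;      // table population to test
>   PreInitTable table;         // sketch class from above
>   // Fake "allocations": distinct, 16-byte-aligned pointer-like keys.
>   for (int i = 0; i < N; i++) {
>     table.add((const void*)(uintptr_t)((i + 1) * 16), 1);
>   }
>   size_t found = 0;           // accumulate so lookups are not optimized away
>   const auto t0 = std::chrono::steady_clock::now();
>   for (int i = 0; i < N; i++) {
>     if (table.find((const void*)(uintptr_t)((i + 1) * 16)) != nullptr) {
>       found++;
>     }
>   }
>   const auto t1 = std::chrono::steady_clock::now();
>   const double total_ns =
>       (double)std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
>   printf("found %zu entries, avg lookup: %.1f ns\n", found, total_ns / N);
>   return 0;
> }
> ```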
> 
> Finally, I did an isolated test of the lookup table, measuring pure add and retrieval cost with artificial values. There, I could see that the cost of an add was constant (as expected), while lookup cost increased with table population. On my machine:
> 
> | lookup table entries | time per lookup |
> | -------------------- | :-------------: |
> | 1000                 | 3 ns            |
> | 1 million            | 240 ns          |
> 
> As you can see, once the lookup table population goes beyond a million entries, lookup time starts to be noticeable over background noise. But with these numbers I am not worried: the standard table population should be around *300-500*, with very long command lines resulting in populations of *~1000*. We should never see 10000 entries, let alone millions.
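> 
> (For intuition, assuming a fixed bucket count B, as suggested by resizing being only a possible future addition under point 1 below: the average chain length is N/B, so lookup time is roughly t ≈ c + k * N/B, where c is the constant hashing cost and k the cost per chain link walked. At N = 1000 the chain term is negligible and c ≈ 3 ns dominates; at N = 1 million the chain walk dominates, consistent with the jump to 240 ns.)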
> 
> Still, I added a jtreg test to verify the expected hash table population, to catch errors like an unforeseen mass of pre-init allocations (say, a leak or badly written code sneaking in), or a hash algorithm that suddenly distributes poorly.
> 
> Two more points:
> 
> 1) I kept this code deliberately simple. If we are really worried about a degenerate lookup table, there are things we could do to fix that:
>   - we could automatically resize and rehash (see the sketch below)
>   - we could, if we sense something is wrong, simply stop filling the table and disable NMT, ending the NMT init phase prematurely at the cost of not being able to use NMT.
>   
> I had already implemented the latter but removed it again to keep complexity down, and because I saw no need for it.
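> 
> A minimal sketch of the first option (resize and rehash), assuming a heap-allocated bucket array, which the fixed-size design above deliberately avoids; `grow_and_rehash` and all names here are made up, not from the patch:
> 
> ```c++
> // Quadruple the bucket count and re-chain every entry. Entries are
> // moved, not copied, so the only new allocation is the bucket array.
> #include <cstddef>
> #include <cstdint>
> #include <cstdlib>
> 
> struct PreInitEntry {      // same layout as in the earlier sketch
>   const void*   payload;
>   size_t        size;
>   PreInitEntry* next;
> };
> 
> static unsigned index_for(const void* p, int num_buckets) {
>   return (unsigned)(((uintptr_t)p >> 4) % num_buckets);
> }
> 
> void grow_and_rehash(PreInitEntry**& buckets, int& num_buckets) {
>   const int new_count = num_buckets * 4;
>   PreInitEntry** new_buckets =
>       (PreInitEntry**)::calloc(new_count, sizeof(PreInitEntry*));
>   for (int i = 0; i < num_buckets; i++) {
>     PreInitEntry* e = buckets[i];
>     while (e != nullptr) {
>       PreInitEntry* next = e->next;           // save before re-chaining
>       const unsigned j = index_for(e->payload, new_count);
>       e->next = new_buckets[j];
>       new_buckets[j] = e;
>       e = next;
>     }
>   }
>   ::free(buckets);
>   buckets = new_buckets;
>   num_buckets = new_count;
> }
> ```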
> 
> 2) In our proprietary production VM we have a system similar to NMT, but predating it. In that system we don't use malloc headers; instead we store all (millions of) malloc'ed pointers in a big hash map. It performs excellently on *all our libc variants*. It is so fast that we just leave it switched on permanently. This solution has been in production for more than 10 years, so I am confident the approach is viable. The proposed hashmap, with a planned population of 300-1000 entries, is really not much by comparison :)

Thanks Thomas! I appreciate the detailed investigation.

Cheers,
David

> -------------
> 
> PR: https://git.openjdk.java.net/jdk/pull/4874
> 

