RFR: JDK-8291878: NMT: Malloc limits

Mon Aug 22 15:55:45 UTC 2022

On Tue, 16 Aug 2022 11:14:05 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> (Second try; I withdrew the first patch https://github.com/openjdk/jdk/pull/9778 to do some other NMT improvements first.)
> 
> This PR introduces malloc limits, similar to what the ancient `-XX:MallocMaxTestWords` was intended to do. `MallocMaxTestWords` is broken, but the solution proposed in this PR works fine since it is based on NMT. I plan to remove `-XX:MallocMaxTestWords`, but in a separate RFE, since that requires fiddling with compiler tests.
> 
> ### Why this is useful:
> 
> We recently analyzed [JDK-8291919](https://bugs.openjdk.org/browse/JDK-8291919), a jdk11u-specific regression that caused compiler arena OOMs. We had such problems a lot in the past, when our CPU ports were young. They are rarer nowadays but still happen.
> 
> A switch to limit compiler-related mallocs would have been nice: something that causes the VM to stop with a fatal error in the compiler allocation path once the compiler arena size reaches a certain point. I first tried `MallocMaxTestWords`, but it turned out to be broken, since it never de-accounts freed memory.
> 
> We finally managed to get a replay file by reproducing the bug locally and ulimit-ing the virtual process size, but it was cumbersome. A simple switch like `MallocMaxTestWords` would have been much better.
> 
> In addition to customer scenarios like these, such a switch could be used to add sanity checks to compiler jtreg tests. Maybe we could have caught [JDK-8291919](https://bugs.openjdk.org/browse/JDK-8291919) before shipment.
> 
> ### How it works:
> 
> This patch introduces a new diagnostic switch, `-XX:MallocLimit`, which can be used in two ways:
> 
> 1. Impose a global limit on the total size hotspot is allowed to malloc:
> 
>    `-XX:MallocLimit=<size>`
> 
> 2. Impose a limit on a selected NMT category, or on multiple NMT categories:
> 
>    `-XX:MallocLimit=<category>:<size>[,<category>:<size>...]`
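
To make the two accepted forms concrete, here is a minimal C++ sketch of a parser for the option value. It is not the patch's code: `MallocLimitOption`, `parse_size`, and `parse_malloc_limit` are hypothetical names, the sketch assumes `k`/`m`/`g` suffixes meaning binary multiples, and it accepts any category name instead of validating it against NMT's real categories.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical parse result: either one global limit, or per-category limits.
struct MallocLimitOption {
  std::optional<uint64_t> global_limit;
  std::map<std::string, uint64_t> category_limits;  // category name -> bytes
};

// Parses "<digits>[k|m|g]" into bytes (overflow is not checked in this sketch).
static std::optional<uint64_t> parse_size(const std::string& s) {
  if (s.empty() || !std::isdigit(static_cast<unsigned char>(s[0]))) return std::nullopt;
  std::size_t pos = 0;
  uint64_t value = 0;
  while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
    value = value * 10 + static_cast<uint64_t>(s[pos] - '0');
    ++pos;
  }
  if (pos == s.size()) return value;             // plain byte count
  if (pos + 1 != s.size()) return std::nullopt;  // at most one suffix character
  switch (std::tolower(static_cast<unsigned char>(s[pos]))) {
    case 'k': return value << 10;
    case 'm': return value << 20;
    case 'g': return value << 30;
    default:  return std::nullopt;
  }
}

// Form 1: "<size>". Form 2: "<category>:<size>[,<category>:<size>...]".
static std::optional<MallocLimitOption> parse_malloc_limit(const std::string& arg) {
  MallocLimitOption result;
  if (arg.find(':') == std::string::npos) {
    auto size = parse_size(arg);
    if (!size) return std::nullopt;
    result.global_limit = *size;
    return result;
  }
  std::size_t start = 0;
  while (start <= arg.size()) {
    std::size_t comma = arg.find(',', start);
    if (comma == std::string::npos) comma = arg.size();
    const std::string entry = arg.substr(start, comma - start);
    const std::size_t colon = entry.find(':');
    if (colon == std::string::npos || colon == 0) return std::nullopt;  // no category name
    auto size = parse_size(entry.substr(colon + 1));
    if (!size) return std::nullopt;
    result.category_limits[entry.substr(0, colon)] = *size;
    start = comma + 1;  // moves past the end after the last entry
  }
  assert(!result.category_limits.empty());
  return result;
}
```

Under this reading, `1g` is 1073741824 bytes, which matches the `limit` value printed in the fatal error of the example in this PR description.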
> 
> 
> If the switch is set and the VM mallocs more than the limit allows, either in total (1) or for a given category (2), it now stops with a fatal error. That way we can, for example, cap compiler arenas at a certain maximum in situations where the compiler runs amok, and get a compiler replay file. Here is an example with an artificially introduced compiler bug:
> 
> 
> ```
> thomas@starfish$ ./images/jdk/bin/java -XX:NativeMemoryTracking=summary -XX:+UnlockDiagnosticVMOptions -XX:MallocLimit=compiler:1g -jar /shared/projects/spring-petclinic/target/spring-petclinic-2.5.0-SNAPSHOT.jar
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  Internal Error (mallocTracker.cpp:146), pid=519822, tid=519836
> #  guarantee(false) failed: MallocLimit: category "Compiler" reached limit (size: 1073765608, limit: 1073741824)
> #
> ...
> # An error report file with more information is saved as:
> # /shared/projects/openjdk/jdk-jdk/output-release/hs_err_pid519822.log
> #
> # Compiler replay data is saved as:
> # /shared/projects/openjdk/jdk-jdk/output-release/replay_pid519822.log
> #
> ```
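
The accounting behind a category limit can be sketched as follows. This is a simplified, hypothetical model rather than NMT's code: `MemCategory`, `CategoryCounters`, `tracked_malloc`, and `tracked_free` are invented names, the caller passes the size to `tracked_free` (an assumption made for brevity), and exceeding the limit returns `nullptr` where the VM would stop with a fatal error. The point it illustrates is that freeing memory decrements the counter, which is what `MallocMaxTestWords` fails to do.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical categories; NMT has many more.
enum class MemCategory : std::size_t { Compiler = 0, Other = 1, Count = 2 };

class CategoryCounters {
 public:
  // A limit of 0 means "no limit" for that category.
  void set_limit(MemCategory c, std::size_t limit) { limits_[idx(c)] = limit; }

  // Accounts the allocation, then checks the new total against the limit.
  void* tracked_malloc(MemCategory c, std::size_t size) {
    std::atomic<std::size_t>& counter = used_[idx(c)];
    const std::size_t now = counter.fetch_add(size, std::memory_order_relaxed) + size;
    const std::size_t limit = limits_[idx(c)];
    if (limit != 0 && now > limit) {
      counter.fetch_sub(size, std::memory_order_relaxed);  // undo the accounting
      return nullptr;  // stands in for the VM's fatal error
    }
    void* p = std::malloc(size);
    if (p == nullptr) counter.fetch_sub(size, std::memory_order_relaxed);
    return p;
  }

  // De-accounts on free, so the counter tracks live memory, not total allocations.
  void tracked_free(MemCategory c, void* p, std::size_t size) {
    if (p == nullptr) return;
    assert(used_[idx(c)].load(std::memory_order_relaxed) >= size);
    used_[idx(c)].fetch_sub(size, std::memory_order_relaxed);
    std::free(p);
  }

  std::size_t used(MemCategory c) const {
    return used_[idx(c)].load(std::memory_order_relaxed);
  }

 private:
  static constexpr std::size_t kCount = static_cast<std::size_t>(MemCategory::Count);
  static std::size_t idx(MemCategory c) { return static_cast<std::size_t>(c); }
  std::array<std::atomic<std::size_t>, kCount> used_{};
  std::array<std::size_t, kCount> limits_{};
};
```

Because `tracked_free` de-accounts, a loop that repeatedly allocates and frees 512 bytes under a 1024-byte limit never trips the limit; a counter that only ever grows would trip it on the third iteration.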
> 
> 
> ### Costs:
> 
> The feature requires NMT to check, on each malloc, the current number of allocated bytes against the given limit.
> 
> - NMT disabled:
>   - no costs
> - NMT enabled but malloc limits are not used
>   - we pay a very small memory overhead (138 bytes)
>   - we pay no performance cost
> - NMT enabled and malloc limits are also enabled
>   - if we have category-specific limits (e.g. `-XX:MallocLimit=compiler:1g`), the performance cost is so small that it cannot be measured, even in micro benchmarks. This makes sense, since we just compare two values that are likely to be held in registers at the time of comparison.
>   - Only with global limits (e.g. `-XX:MallocLimit=1g`) do we pay a noticeable performance overhead. That is because on each malloc we need to add up ~15 atomic counters to get the total malloc use (*). "Noticeable" means that with the renaissance "philosophers" benchmark we see a performance drop of about 4%. This can be alleviated by future NMT improvements that I have planned (the only bottleneck for NMT improvements is the number of Reviewers willing to look at them).
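
The cost difference between the two kinds of limits can be illustrated with the following sketch, which assumes, hypothetically, that the per-category counters live in one array of atomics. A category limit needs a single load and compare; a global limit has to sum every category's counter on each malloc. `MallocCounters` and `kNumCategories` are invented for illustration, with 15 chosen only to mirror the "~15 atomic counters" above.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Hypothetical: 15 mirrors the "~15 atomic counters" mentioned above.
constexpr std::size_t kNumCategories = 15;

struct MallocCounters {
  std::array<std::atomic<std::size_t>, kNumCategories> used{};

  // Category limit: one relaxed load and one compare per malloc.
  bool category_within_limit(std::size_t cat, std::size_t limit) const {
    return used[cat].load(std::memory_order_relaxed) <= limit;
  }

  // Global limit: one load per category per malloc, touching counters
  // that other threads are concurrently updating.
  std::size_t total() const {
    std::size_t sum = 0;
    for (const auto& c : used) sum += c.load(std::memory_order_relaxed);
    return sum;
  }

  bool globally_within_limit(std::size_t limit) const { return total() <= limit; }
};
```

Since the loads are relaxed and other threads keep updating the counters, `total()` in this sketch is only an approximate snapshot, so a global limit checked this way may be overshot by a few concurrent allocations.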

Friendly ping.

-------------

PR: https://git.openjdk.org/jdk/pull/9891