RFR: JDK-8291878: NMT: Malloc limits [v3]
Thomas Stuefe
stuefe at openjdk.org
Wed Aug 24 08:02:00 UTC 2022
On Tue, 23 Aug 2022 20:26:46 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
>> (Second try, I withdrew the first patch https://github.com/openjdk/jdk/pull/9778 to do some other NMT improvements first)
>>
>> This PR introduces malloc limits, similar to what the ancient `-XX:MallocMaxTestWords` was intended to provide. `MallocMaxTestWords` is broken, but the solution proposed in this PR works because it is based on NMT. I plan to remove `-XX:MallocMaxTestWords`, but in a separate RFE since it requires fiddling with compiler tests.
>>
>> ### Why this is useful:
>>
>> This allows us to generate a fatal error in the VM if the amount of malloc'ed memory, either globally or for a specific NMT category, exceeds a given threshold.
>>
>> We recently analyzed [JDK-8291919](https://bugs.openjdk.org/browse/JDK-8291919), a jdk11u-specific regression that caused compiler arena OOMs. A switch that limits compiler-related mallocs would have been very useful here: it would have triggered a fatal error in the compiler thread and produced a replay file. I first tried `MallocMaxTestWords`, but that is broken since it does not de-account memory when it is freed.
>>
>> In addition to customer scenarios like these, such a switch could be used to add sanity checks to compiler jtreg tests, adding malloc usage envelopes to tests. Maybe we could have caught [JDK-8291919](https://bugs.openjdk.org/browse/JDK-8291919) before shipment.
>>
>> ### How it works:
>>
>> The patch introduces a new diagnostic switch `-XX:MallocLimit`. That switch can be used in two ways:
>>
>> 1. Impose a global limit on the total size HotSpot is allowed to malloc:
>>
>> -XX:MallocLimit=<size>
>>
>> 2. Impose a limit on a selected NMT category, or on multiple NMT categories:
>>
>> -XX:MallocLimit=<category>:<size>[,<category>:<size>...]
>>
>>
>> If the switch is set and the VM mallocs more than the limit, either in total (1) or for the given category (2), it now stops with a fatal error. That way we can, for example, limit compiler arenas to a certain maximum in situations where the compiler runs amok, and get a compiler replay file. See here, with an artificial compiler bug introduced:
>>
>>
>> thomas at starfish$ ./images/jdk/bin/java -XX:NativeMemoryTracking=summary -XX:+UnlockDiagnosticVMOptions -XX:MallocLimit=compiler:1g -jar /shared/projects/spring-petclinic/target/spring-petclinic-2.5.0-SNAPSHOT.jar
>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> # Internal Error (mallocTracker.cpp:146), pid=519822, tid=519836
>> # guarantee(false) failed: MallocLimit: category "Compiler" reached limit (size: 1073765608, limit: 1073741824)
>> #
>> ...
>> # An error report file with more information is saved as:
>> # /shared/projects/openjdk/jdk-jdk/output-release/hs_err_pid519822.log
>> #
>> # Compiler replay data is saved as:
>> # /shared/projects/openjdk/jdk-jdk/output-release/replay_pid519822.log
>> #
>>
>>
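To illustrate the two option forms described above, here is a minimal sketch of how such a value could be parsed. It is not the parser from the patch: `MallocLimits`, `parse_size` and `parse_malloc_limit` are hypothetical names, the `k`/`m`/`g` suffix handling is an assumption modelled on the `1g` example, and category names are stored as given without validating them against the real NMT categories.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical result type: either one global limit or per-category limits.
struct MallocLimits {
    std::optional<uint64_t> global;                // "-XX:MallocLimit=<size>"
    std::map<std::string, uint64_t> per_category;  // "<category>:<size>[,...]"
};

// Parses "<digits>[k|m|g]" into bytes (suffix is case-insensitive).
// Returns std::nullopt on malformed input. Overflow is not handled here.
std::optional<uint64_t> parse_size(const std::string& s) {
    if (s.empty() || !std::isdigit(static_cast<unsigned char>(s[0]))) {
        return std::nullopt;
    }
    std::size_t i = 0;
    uint64_t value = 0;
    while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]))) {
        value = value * 10 + static_cast<uint64_t>(s[i] - '0');
        ++i;
    }
    assert(i > 0);  // guaranteed by the leading-digit check above
    if (i == s.size()) return value;          // plain byte count
    if (i + 1 != s.size()) return std::nullopt;  // more than one trailing char
    switch (std::tolower(static_cast<unsigned char>(s[i]))) {
        case 'k': return value << 10;
        case 'm': return value << 20;
        case 'g': return value << 30;
        default:  return std::nullopt;
    }
}

// Parses the option value (the part after "-XX:MallocLimit=").
std::optional<MallocLimits> parse_malloc_limit(const std::string& arg) {
    MallocLimits limits;
    if (arg.find(':') == std::string::npos) {
        // Form 1: a single global limit.
        auto size = parse_size(arg);
        if (!size) return std::nullopt;
        limits.global = *size;
        return limits;
    }
    // Form 2: comma-separated "<category>:<size>" entries.
    std::size_t start = 0;
    while (start <= arg.size()) {
        std::size_t end = arg.find(',', start);
        if (end == std::string::npos) end = arg.size();
        const std::string entry = arg.substr(start, end - start);
        const std::size_t colon = entry.find(':');
        if (colon == std::string::npos || colon == 0) return std::nullopt;
        auto size = parse_size(entry.substr(colon + 1));
        if (!size) return std::nullopt;
        limits.per_category[entry.substr(0, colon)] = *size;
        start = end + 1;
    }
    return limits;
}
```

Under these assumptions, `parse_malloc_limit("compiler:1g")` yields a limit of 1073741824 bytes for `compiler`, which matches the `limit:` value in the error message shown above.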
>> ### Costs:
>>
>> The feature requires that NMT checks the current number of allocated bytes against a given limit on each malloc.
>>
>> - NMT disabled:
>>   - no costs
>> - NMT enabled but malloc limits are not used:
>>   - we pay a very small memory overhead (138 bytes)
>>   - we don't pay a performance cost
>> - NMT enabled and malloc limits are also enabled:
>>   - If we have category-specific limits (e.g. `-XX:MallocLimit=compiler:1g`), the performance cost is too small to measure even in micro benchmarks. That makes sense, since we just compare two values that are likely to be held in registers at the time of comparison.
>>   - Only if we have global limits (e.g. `-XX:MallocLimit=1g`) do we pay a noticeable performance overhead. That is because on each malloc we need to add up ~15 atomic counters to get the total malloc use (*). Noticeable means that with the Renaissance "philosophers" benchmark we see a performance drop of about 4%. This can be alleviated by future NMT improvements that I have planned (the only bottleneck for NMT improvements is the number of Reviewers willing to look at them).
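The cost difference between category-specific and global limits described above can be sketched roughly as follows. This is not the NMT code: `MallocCounters` and its members are hypothetical, the category count of 15 is taken from the "~15" in the text, and the strict `>` comparison is an assumption. The point is only that a category check reads one counter while a global check has to sum all of them on every malloc.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed number of categories, following the "~15" mentioned in the text.
constexpr std::size_t kNumCategories = 15;

// Hypothetical per-category accounting, one atomic byte counter per category.
struct MallocCounters {
    std::atomic<uint64_t> bytes[kNumCategories]{};

    // Accounts an allocation and returns the new size of that category.
    uint64_t record_malloc(std::size_t category, uint64_t size) {
        assert(category < kNumCategories);
        return bytes[category].fetch_add(size, std::memory_order_relaxed) + size;
    }

    // De-accounts a freed block, so the counter tracks live memory only.
    void record_free(std::size_t category, uint64_t size) {
        assert(category < kNumCategories);
        bytes[category].fetch_sub(size, std::memory_order_relaxed);
    }

    // Category limit: a single load and one comparison.
    bool category_over_limit(std::size_t category, uint64_t limit) const {
        assert(category < kNumCategories);
        return bytes[category].load(std::memory_order_relaxed) > limit;
    }

    // Global total: one load per category, summed on every check.
    uint64_t total() const {
        uint64_t sum = 0;
        for (std::size_t i = 0; i < kNumCategories; ++i) {
            sum += bytes[i].load(std::memory_order_relaxed);
        }
        return sum;
    }

    // Global limit: pays for the full summation each time it is called.
    bool total_over_limit(uint64_t limit) const {
        return total() > limit;
    }
};
```

In this sketch `record_malloc` already returns the updated category size, so a category check needs no extra memory access at all, whereas `total_over_limit` touches every counter. Keeping a separate running total would avoid the summation at the cost of one more atomic update per malloc and free; whether the planned NMT improvements go in that direction is not stated here.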
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>
> Feedback Alexey
Thank you Aleksey.
The x86 errors are unrelated.
-------------
PR: https://git.openjdk.org/jdk/pull/9891