RFR: JDK-8291878: NMT: Diagnostic malloc limits [v2]

David Holmes dholmes at openjdk.org
Mon Aug 8 00:47:43 UTC 2022


On Sat, 6 Aug 2022 12:14:45 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> This PR introduces malloc limits, similar to what `MallocMaxTestWords` was intended to provide. `MallocMaxTestWords` is broken, but this one works fine since it is based on NMT. Once this is in, I'd like to remove `MallocMaxTestWords` or, if people really care, redirect it to the new switch.
>> 
>> ----
>> 
>> Why this is useful:
>> 
>> We recently analyzed [JDK-8291919](https://bugs.openjdk.org/browse/JDK-8291919), a jdk11u-specific regression that caused a compiler arena to explode. We used to see such problems a lot in the past, when our CPU ports were young. They are rarer nowadays but still happen.
>> 
>> A switch to limit compiler-related mallocs would have been nice: something to cause the VM to stop with a fatal error in the compiler allocation path when the compiler arena size reached a certain point. I first tried `MallocMaxTestWords`, but that turned out to be broken since it does not de-account memory when it is freed.
>> 
>> We finally managed to get a retry file by reproducing the bug locally and ulimit-ing the virtual process size, but it was cumbersome. A simple switch like `MallocMaxTestWords` would have been much better.
>> 
>> In addition to customer scenarios like these, such a switch could be used to add sanity checks to compiler jtreg tests. Maybe we could have caught [JDK-8291919](https://bugs.openjdk.org/browse/JDK-8291919) before shipment.
>> 
>> -----
>> 
>> How it works:
>> 
>> The patch introduces a new diagnostic switch `-XX:MallocLimit`. That switch can be used in two ways:
>> 
>> 1. Impose a total global limit on the amount of memory hotspot is allowed to malloc:
>> 
>> -XX:MallocLimit=<size>
>> 
>> 2. Impose a limit on a selected NMT category, or on multiple NMT categories:
>> 
>> -XX:MallocLimit=<category>:<size>[,<category>:<size>...]
>> 
>> 
>> If the switch is set, and the VM mallocs more in total (1) or for the given category (2), it will now stop with a fatal error. That way we can e.g. limit compiler arenas to a certain maximum in situations where the compiler runs amok, and get a compiler retry file. See here, with an artificial compiler bug introduced:
>> 
>> 
>> thomas@starfish$ ./images/jdk/bin/java  -XX:NativeMemoryTracking=summary -XX:+UnlockDiagnosticVMOptions -XX:MallocLimit=compiler:1g -jar /shared/projects/spring-petclinic/target/spring-petclinic-2.5.0-SNAPSHOT.jar
>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  Internal Error (mallocTracker.cpp:146), pid=519822, tid=519836
>> #  guarantee(false) failed: MallocLimit: category "Compiler" reached limit (size: 1073765608, limit: 1073741824) 
>> #
>> ...
>> # An error report file with more information is saved as:
>> # /shared/projects/openjdk/jdk-jdk/output-release/hs_err_pid519822.log
>> #
>> # Compiler replay data is saved as:
>> # /shared/projects/openjdk/jdk-jdk/output-release/replay_pid519822.log
>> #
>> 
>> -----
>> 
>> The patch:
>> - adds the option and its handling to NMT
>> - adds regression tests.
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix Test
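
As a reading aid for the two option forms quoted above, here is a minimal Python sketch of how such a value might be parsed and checked. It is not the HotSpot implementation: `parse_size`, `parse_malloc_limit` and `MallocAccounting` are hypothetical names, the `k`/`m`/`g` suffix handling is an assumption, the check uses "more than the limit" as the quoted text words it, and `MemoryError` merely stands in for the VM's fatal error. The sketch subtracts on free, which is the de-accounting the quoted description says `MallocMaxTestWords` lacks.

```python
# Illustrative sketch only; not HotSpot code. All names are hypothetical.
SIZE_SUFFIXES = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text):
    """Parse a size such as '1g' or '4096' into bytes (suffix rules are assumed)."""
    text = text.strip().lower()
    if not text:
        raise ValueError("empty size")
    factor = SIZE_SUFFIXES.get(text[-1])
    digits = text[:-1] if factor else text
    if not digits.isdigit():
        raise ValueError(f"invalid size: {text!r}")
    return int(digits) * (factor or 1)


def parse_malloc_limit(value):
    """Parse '<size>' or '<category>:<size>[,<category>:<size>...]'.

    Returns a dict mapping category name to a byte limit; the key None
    holds the global limit when the first form is used.
    """
    if ":" not in value:
        return {None: parse_size(value)}
    limits = {}
    for entry in value.split(","):
        category, sep, size = entry.partition(":")
        category = category.strip().lower()
        if not sep or not category:
            raise ValueError(f"invalid entry: {entry!r}")
        limits[category] = parse_size(size)
    return limits


class MallocAccounting:
    """Tracks outstanding bytes per category and checks them against limits."""

    def __init__(self, limits):
        self.limits = limits
        self.used = {}

    def on_malloc(self, category, size):
        self.used[category] = self.used.get(category, 0) + size
        self._check(category)

    def on_free(self, category, size):
        # De-account freed memory so only outstanding bytes count.
        self.used[category] = self.used.get(category, 0) - size

    def _check(self, category):
        total_limit = self.limits.get(None)
        total_used = sum(self.used.values())
        if total_limit is not None and total_used > total_limit:
            raise MemoryError(
                f"total reached limit (size: {total_used}, limit: {total_limit})")
        limit = self.limits.get(category)
        if limit is not None and self.used[category] > limit:
            raise MemoryError(
                f'category "{category}" reached limit '
                f"(size: {self.used[category]}, limit: {limit})")
```

With this sketch, `parse_malloc_limit("compiler:1g")` yields a limit of 1073741824 bytes for the `compiler` entry, the same limit value that appears in the quoted error message.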

I can see there is some benefit in artificially forcing an OOM condition, but at what cost? What is the overhead of doing this?

The description says this needs NMT, but I don't see anything that rejects the option if NMT is not present. It is also not clear whether it needs NMT enabled at a specific level of detail, or simply present.

Thanks.

-------------

PR: https://git.openjdk.org/jdk/pull/9778

