RFR: JDK-8293313: NMT: Rework MallocLimit [v4]
Gerard Ziemski
gziemski at openjdk.org
Tue Jan 24 21:10:09 UTC 2023
On Tue, 24 Jan 2023 08:14:31 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
>> Rework NMT `MallocLimit` to make it more useful for native OOM analysis. Also remove `MallocMaxTestWords`, which had been mostly redundant with NMT (for background, see JDK-8293292).
>>
>>
>> ### Background
>>
>> Some months ago we had [JDK-8291919](https://bugs.openjdk.org/browse/JDK-8291919), a compiler regression that caused compiler arenas to grow boundlessly. Process was usually OOM-killed before we could take a look, so this was difficult to analyze.
>>
>> To improve analysis options, we added NMT *malloc limits* with [JDK-8291878](https://bugs.openjdk.org/browse/JDK-8291878). Malloc limits let us set one or multiple limits to malloc load. Surpassing a limit causes the VM to stop with a fatal error in the allocation function, giving us a hs-err file and a core right at the point that (probably) causes the leak. This makes error analysis a lot simpler, and is also valuable for regression-testing footprint usage.
>>
>> Some people wished for a way to not end with a fatal error but to mimic a native OOM (causing os::malloc to return NULL). This would allow us to test resilience in the face of native OOMs, among other things.
>>
>> ### Patch
>>
>> - Expands the `MallocLimit` option syntax, allowing for an "oom mode" that mimics an oom:
>>
>>
>>
>> Global form:
>> -XX:MallocLimit=<size>[:<mode>]
>> Category-specific form:
>> -XX:MallocLimit=<category>:<size>[:<mode>][,<category>:<size>[:<mode>] ...]
>> Examples:
>> -XX:MallocLimit=3g
>> -XX:MallocLimit=3g:oom
>> -XX:MallocLimit=compiler:200m:oom
>>
>>
>> - moves parsing of `-XX:MallocLimit` out of arguments.cpp into `mallocLimit.hpp/cpp`, and rewrites it.
>>
>> - changes when limits are checked. Before, the limits were checked *after* the allocation that caused the threshold overflow. Now we check beforehand.
>>
>> - removes `MallocMaxTestWords`, replacing its uses with `MallocLimit`
>>
>> - adds new gtests and new jtreg tests
>>
>> - removes a bunch of jtreg tests which are now better served by the new gtests.
>>
>> - gives us more precise error messages upon reaching a limit. For example:
>>
>> before:
>>
>> # fatal error: MallocLimit: category "Compiler" reached limit (size: 20997680, limit: 20971520)
>>
>>
>> now:
>>
>> # fatal error: MallocLimit: reached category "Compiler" limit (triggering allocation size: 1920B, allocated so far: 20505K, limit: 20480K)
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>
> Revert strchrnul
Made more progress on the review today, tomorrow I have the parser to review and java tests...
src/hotspot/share/runtime/globals.hpp line 1344:
> 1342: "Number of exits until ZombieALot kicks in") \
> 1343: \
> 1344: product(ccstr, MallocLimit, nullptr, DIAGNOSTIC, \
Pre-existing, but I wish this flag was named `NMTMallocLimit`, not `MallocLimit`.
src/hotspot/share/services/mallocTracker.inline.hpp line 35:
> 33: #include "utilities/vmError.hpp"
> 34:
> 35: static inline bool suppress_limit_handling() {
Why did we bother to wrap `VMError::is_error_reported()` into `suppress_limit_handling()`?
Are you anticipating more exclusions here in the future?
src/hotspot/share/services/mallocTracker.inline.hpp line 56:
> 54: size_t so_far = as_snapshot()->total();
> 55: if ((so_far + s) > l->sz) { // hit the limit
> 56: if (!suppress_limit_handling()) {
I would prefer to see `suppress_limit_handling()` checked inside `total_limit_reached()`, same for `category_limit_reached()`. This way we would force those APIs to always obey the suppression without having to bother to add `suppress_limit_handling()`, they could call `VMError::is_error_reported()` directly.
-------------
Changes requested by gziemski (Committer).
PR: https://git.openjdk.org/jdk/pull/11371
More information about the hotspot-dev
mailing list