RFR: JDK-8293313: NMT: Rework MallocLimit [v4]

Gerard Ziemski gziemski at openjdk.org
Wed Jan 25 20:31:48 UTC 2023


On Tue, 24 Jan 2023 08:14:31 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Rework NMT `MallocLimit` to make it more useful for native OOM analysis. Also remove `MallocMaxTestWords`, which had been mostly redundant with NMT (for background, see JDK-8293292).
>> 
>> 
>> ### Background
>> 
>> Some months ago we had [JDK-8291919](https://bugs.openjdk.org/browse/JDK-8291919), a compiler regression that caused compiler arenas to grow boundlessly. Process was usually OOM-killed before we could take a look, so this was difficult to analyze.
>> 
>> To improve analysis options, we added NMT *malloc limits* with [JDK-8291878](https://bugs.openjdk.org/browse/JDK-8291878). Malloc limits let us set one or multiple limits to malloc load. Surpassing a limit causes the VM to stop with a fatal error in the allocation function, giving us a hs-err file and a core right at the point that (probably) causes the leak. This makes error analysis a lot simpler, and is also valuable for regression-testing footprint usage.
>> 
>> Some people wished for a way to not end with a fatal error but to mimic a native OOM (causing os::malloc to return NULL). This would allow us to test resilience in the face of native OOMs, among other things.
>> 
>> ### Patch
>> 
>> - Expands the `MallocLimit` option syntax, allowing for an "oom mode" that mimics an oom:
>> 
>> 
>> 
>> Global form:
>> -XX:MallocLimit=<size>[:<mode>]
>> Category-specific form:
>> -XX:MallocLimit=<category>:<size>[:<mode>][,<category>:<size>[:<mode>] ...]
>> Examples:
>> -XX:MallocLimit=3g
>> -XX:MallocLimit=3g:oom
>> -XX:MallocLimit=compiler:200m:oom
>> 
>> 
>> - moves parsing of `-XX:MallocLimit` out of arguments.cpp into `mallocLimit.hpp/cpp`, and rewrites it.
>> 
>> - changes when limits are checked. Before, the limits were checked *after* the allocation that caused the threshold overflow. Now we check beforehand.
>> 
>> - removes `MallocMaxTestWords`, replacing its uses with `MallocLimit`
>> 
>> - adds new gtests and new jtreg tests
>> 
>> - removes a bunch of jtreg tests which are now better served by the new gtests.
>> 
>> - gives us more precise error messages upon reaching a limit. For example:
>> 
>> before:
>> 
>> #  fatal error: MallocLimit: category "Compiler" reached limit (size: 20997680, limit: 20971520)
>> 
>> 
>> now:
>> 
>> #  fatal error: MallocLimit: reached category "Compiler" limit (triggering allocation size: 1920B, allocated so far: 20505K, limit: 20480K)
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Revert strchrnul

More feedback...

src/hotspot/share/services/mallocLimit.cpp line 150:

> 148:     }
> 149:   }
> 150: }

I am not crazy about:


static const char* flagnames[] = { "fatal", "oom" };


which requires that we position the strings in `flagnames` in a way that matches `MallocLimitMode` enum values:


enum class MallocLimitMode {
  trigger_fatal = 0,
  trigger_oom   = 1
};


Could we do instead:


void MallocLimitSet::print_on(outputStream* st) const {
  if (_glob.sz > 0) {
    st->print_cr("MallocLimit: total limit: " PROPERFMT " (%s)",
                 PROPERFMTARGS(_glob.sz), MallocLimitSet::limit_name(_glob.mode));
  } else {
    for (int i = 0; i < mt_number_of_types; i++) {
      if (_cat[i].sz > 0) {
        st->print_cr("MallocLimit: category "%s" limit: " PROPERFMT " (%s)",
          NMTUtil::flag_to_enum_name(NMTUtil::index_to_flag(i)),
          PROPERFMTARGS(_cat[i].sz), MallocLimitSet::limit_name(_cat[i].mode));
      }
    }
  }
}


and add new API to `MallocLimitSet`:


  static const char* const limit_name(MallocLimitMode l)  { return (l==MallocLimitMode::trigger_fatal)?"fatal":"oom"; }


This way we also avoid the `(int)` casting from the `enum`.

We could also potentially use `MallocLimitSet::limit_name()` in `match_mode_flag()` instead of hardcoding "oom" and "fatal"

-------------

Changes requested by gziemski (Committer).

PR: https://git.openjdk.org/jdk/pull/11371


More information about the hotspot-dev mailing list