Should OutOfMemoryError from NIO direct memory honor JVM flags for handling OOME?

Fri Sep 23 22:24:21 UTC 2022

Thank you both for the feedback.

> I realise the focus of JDK-8294052 has narrowed
> since it was created but I think it would be fair to say that it would
> probably require going through them case by case to see if it makes sense.

Yes. I completely agree that the OOMEs should be analyzed case by case.
Also completely agree that the OOMEs for large array allocations do not
need to handle these JVM flags.

> I think there is a case for triggering a heap dump or doing the other
> OOME actions when a direct buffer can't be allocated.
> ...

It sounds like I could proceed with sending an RFR to support these JVM flags
for direct memory OOMEs?
In our case, the application sets -XX:MaxDirectMemorySize and ran into
this limit.
While bumping up MaxDirectMemorySize could solve the problem, we'd like to
understand where the increase comes from. We already monitor direct memory
usage via BufferPoolMXBean, but it is not sufficient to dig into the root
cause. We really need heap dumps to understand the root cause.
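
For reference, here is a minimal sketch of the kind of BufferPoolMXBean
monitoring mentioned above. It is not our production code: the class name
DirectMemoryProbe and the 1 MiB allocation are made up for illustration,
and the pool name "direct" is what HotSpot reports for NIO direct buffers.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical helper, for illustration only.
final class DirectMemoryProbe {

    // Looks up the platform buffer pool that tracks NIO direct buffers.
    static BufferPoolMXBean directPool() {
        for (BufferPoolMXBean pool :
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool;
            }
        }
        throw new IllegalStateException("no 'direct' buffer pool found");
    }

    public static void main(String[] args) {
        // Allocate one direct buffer so the pool has something to report.
        ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20);

        BufferPoolMXBean pool = directPool();
        System.out.printf("buffers=%d used=%d bytes capacity=%d bytes%n",
                pool.getCount(), pool.getMemoryUsed(), pool.getTotalCapacity());

        // Touch the buffer after the readings so it stays reachable until here.
        System.out.println("sample buffer capacity: " + buf.capacity());
    }
}
```

The bean only exposes aggregate counts and byte totals. It says nothing
about which objects are keeping the buffers alive, which is the part a heap
dump would answer.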

-Man

On Fri, Sep 23, 2022 at 2:14 AM Thomas Stüfe <thomas.stuefe at gmail.com>
wrote:

> Hi Alan,
>
> On Fri, Sep 23, 2022 at 10:21 AM Alan Bateman <Alan.Bateman at oracle.com>
> wrote:
>
>> On 23/09/2022 00:26, Man Cao wrote:
>> > Hi all,
>> >
>> > I recently opened an RFE
>> > (https://bugs.openjdk.org/browse/JDK-8294052) to make JVM flags such
>> > as -XX:+HeapDumpOnOutOfMemoryError and -XX:+ExitOnOutOfMemoryError
>> > work for OutOfMemoryErrors from filling up the NIO direct memory.
>> > Supporting HeapDumpOnOutOfMemoryError would greatly help users debug
>> > OOMEs from NIO direct memory. In our case, the OOMEs from direct
>> > memory are infrequent and unpredictable, and it is quite infeasible to
>> > manually trigger a heap dump just before those OOMEs happen.
>> >
>> > However, [~dholmes] seems a bit wary of supporting those JVM flags for
>> > NIO direct memory, as those JVM flags are currently for OOMEs thrown
>> > from the JVM, and for OOMEs about Java heap and metaspace, which are
>> > both managed by the JVM.
>> >
>> > Do you think it is a good idea to support those JVM flags for OOMEs
>> > from NIO direct memory? Are there any concerns that we shouldn't
>> > support them?
>> >
>>
>> There are many places in the library code that throw OOME.
>>
>> One common case in APIs is where some parameters or conditions require
>> allocating an array larger than 2Gb. That's not possible of course so
>> some exception needs to be thrown and it is usually OOME (this is
>> specified in some cases, not in others). The error can be confusing as
>> there isn't really an out of memory (java or native heap). These cases
>> are probably not good candidates to trigger a heap dump or cause the VM
>> to exit.
>>
>> There are other cases (200+) where malloc or some other allocation in
>> native code fails. These throw OOME (via JNU_ThrowOutOfMemoryError as
>> you've probably found). There have been debates over the years as to
>> whether OOME is the right error to throw as seeing this exception in a
>> log can make some people think they need to increase -Xmx when
>> the issue is somewhere else. If the native heap is completely exhausted
>> then there's a good chance the VM will crash or exit quickly. Should
>> these OOMEs trigger a heap dump or cause the VM to exit? Maybe in some
>> cases where native resources are being kept alive by some object that would
>> be expected to be GC'ed. I realise the focus of JDK-8294052 has narrowed
>> since it was created but I think it would be fair to say that it would
>> probably require going through them case by case to see if it makes sense.
>>
>> I think there is a case for triggering a heap dump or doing the other
>> OOME actions when a direct buffer can't be allocated. Direct memory
>> isn't managed by the VM but we know that hitting the limit means there
>> are objects keeping the direct buffers alive and there is tuning (via a
>> XX option) that may be required for some applications, and maybe a
>> memory leak in others. Related is that we added BufferPoolMXBean in Java
>> 7 so that direct memory could be monitored by JMX tooling on the same
>> level as memory pools managed by the VM.
>>
>> -Alan
>>
>
> Thank you for the detailed explanation. Now I remember again how complex
> this issue was.
>
> For us, even if a heap dump can be of limited use when analyzing e.g.
> thread start errors, the other OOM actions can still be quite useful. Our
> customers often just wish for a way to quickly exit the VM, to be able to
> restart it. Or to crash it quickly, to get an hs-err file with information
> that is often more geared toward analyzing native OOM situations (e.g.
> rlimits, NMT, metaspace report, or the process' pmap...).
>
> We found that customers are often confused that ..OnOutOfMemory switches
> don't work for all cases of OOMEs. They come up with different mechanics,
> e.g. monitoring JVMTI ResourceExhausted. Which also does not fire for every
> resource exhaustion, e.g. if a native OOM occurs in the compiler.
>
> Out of desperation, we added a facility to our downstream VM to react to
> impending resource exhaustion with, among other things, the ability to fire
> up jcmds automatically, before the process gets OOM-killed and vanishes
> without a trace (
> https://github.com/SAP/SapMachine/wiki/SapMachine-High-Memory-Reports),
> similar to SIGDANGER on some proprietary Unices.
>
> Cheers, Thomas