<div dir="ltr"><div>Thank you both for the feedback.</div><div><br></div>> I realise the focus of JDK-8294052 has narrowed<br>> since it was created but I think it would be fair to say that it would<br>> probably require going through them case by case to see if it makes sense.<div><br></div><div>Yes. I completely agree that the OOMEs should be analyzed case by case.</div><div>I also completely agree that the OOMEs for large array allocations do not need to honor these JVM flags.</div><div><br></div><div>> I think there is a case for triggering a heap dump or doing the other<br>> OOME actions when a direct buffer can't be allocated.</div><div>> ...</div><div><br></div><div>It sounds like I could proceed with sending the RFR to support these JVM flags for direct memory OOMEs?</div><div>In our case, the application sets -XX:MaxDirectMemorySize and it hit this limit.</div><div>While bumping up MaxDirectMemorySize could solve the problem, we'd like to understand where the increase comes from. We already monitor direct memory usage via BufferPoolMXBean, but it is not sufficient to dig into the root cause. 
We really need heap dumps to understand the root cause.</div><div><br></div><div><div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr">-Man</div></div></div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Sep 23, 2022 at 2:14 AM Thomas Stüfe <<a href="mailto:thomas.stuefe@gmail.com">thomas.stuefe@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Hi Alan,</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Sep 23, 2022 at 10:21 AM Alan Bateman <<a href="mailto:Alan.Bateman@oracle.com" target="_blank">Alan.Bateman@oracle.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 23/09/2022 00:26, Man Cao wrote:<br>
> Hi all,<br>
><br>
> I recently opened an RFE <br>
> (<a href="https://bugs.openjdk.org/browse/JDK-8294052" rel="noreferrer" target="_blank">https://bugs.openjdk.org/browse/JDK-8294052</a>) to make JVM flags such <br>
> as -XX:+HeapDumpOnOutOfMemoryError and -XX:+ExitOnOutOfMemoryError <br>
> work for OutOfMemoryErrors from filling up the NIO direct memory.<br>
> Supporting HeapDumpOnOutOfMemoryError would greatly help users debug <br>
> OOMEs from NIO direct memory. In our case, the OOMEs from direct <br>
> memory are infrequent and unpredictable, and it is quite infeasible to <br>
> manually trigger a heap dump just before those OOMEs happen.<br>
><br>
> However, [~dholmes] seems a bit wary of supporting those JVM flags for <br>
> NIO direct memory, as those JVM flags are currently for OOMEs thrown <br>
> from the JVM, and for OOMEs about Java heap and metaspace, which are <br>
> both managed by the JVM.<br>
><br>
> Do you think it is a good idea to support those JVM flags for OOMEs <br>
> from NIO direct memory? Are there any concerns that we shouldn't <br>
> support them?<br>
><br>
<br>
There are many places in the library code that throw OOME.<br>
<br>
One common case in APIs is where some parameters or conditions require <br>
allocating an array larger than 2GB. That's not possible of course so <br>
some exception needs to be thrown and it is usually OOME (this is <br>
specified in some cases, not in others). The error can be confusing as <br>
there isn't really an out of memory (java or native heap). These cases <br>
are probably not good candidates to trigger a heap dump or cause the VM <br>
to exit.<br>
<br>
There are other cases (200+) where malloc or some other allocation in <br>
native code fails. These throw OOME (via JNU_ThrowOutOfMemoryError as <br>
you've probably found). There have been debates over the years as to <br>
whether OOME is the right error to throw as seeing this exception in a <br>
log can make some people think they need to increase -Xmx when <br>
the issue is somewhere else. If the native heap is completely exhausted <br>
then there's a good chance the VM will crash or exit quickly. Should <br>
these OOMEs trigger a heap dump or cause the VM to exit? Maybe in some cases <br>
where native resources are being kept alive by some object that would <br>
be expected to be GC'ed. I realise the focus of JDK-8294052 has narrowed <br>
since it was created but I think it would be fair to say that it would <br>
probably require going through them case by case to see if it makes sense.<br>
<br>
I think there is a case for triggering a heap dump or doing the other <br>
OOME actions when a direct buffer can't be allocated. Direct memory <br>
isn't managed by the VM but we know that hitting the limit means there <br>
are objects keeping the direct buffers alive and there is tuning (via an <br>
-XX option) that may be required for some applications, and maybe a <br>
memory leak in others. Related is that we added BufferPoolMXBean in Java <br>
7 so that direct memory could be monitored by JMX tooling on the same <br>
level as memory pools managed by the VM.<br>
<br>
-Alan<br></blockquote><div><br></div><div>Thank you for the detailed explanation. Now I remember again how complex this issue was.</div><div><br></div><div>For us, even if a heap dump can be of limited use when analyzing e.g. thread start errors, the other OOM actions can still be quite useful. Our customers often just wish for a way to quickly exit the VM, to be able to restart it, or to crash it quickly, to get an hs-err file with information that is often more geared toward analyzing native OOM situations (e.g. rlimits, NMT, metaspace report, or the process' pmap...).</div><div><br></div><div>We found that customers are often confused that ..OnOutOfMemory switches don't work for all cases of OOMEs. They come up with different mechanisms, e.g. monitoring JVMTI ResourceExhausted, which also does not fire for every resource exhaustion, e.g. if a native OOM occurs in the compiler.</div><div><br></div><div>Out of desperation, we added a facility to our downstream VM to react to impending resource exhaustion with, among other things, the ability to fire up jcmds automatically, before the process gets OOM-killed and vanishes without a trace (<a href="https://github.com/SAP/SapMachine/wiki/SapMachine-High-Memory-Reports" target="_blank">https://github.com/SAP/SapMachine/wiki/SapMachine-High-Memory-Reports</a>). This is similar to SIGDANGER on some proprietary Unices.</div><div><br></div><div>Cheers, Thomas</div><div><br></div><div><br></div><div><br></div><div><br></div><div> </div></div></div>
</blockquote></div>
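<div><br></div><div>For readers following the thread: the BufferPoolMXBean monitoring mentioned above can be sketched roughly as below. This is only a minimal illustration, not the monitoring code used by anyone in this discussion. The class name DirectMemoryProbe and its helper methods are hypothetical, and the pool name "direct" is what HotSpot reports for NIO direct buffers. As noted in the thread, these counters show how much direct memory is in use but not which objects keep the buffers alive, which is why a heap dump is still wanted.</div>

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not part of the JDK.
class DirectMemoryProbe {

    /** Returns the buffer pool named "direct", or null if none is exposed. */
    static BufferPoolMXBean directPool() {
        for (BufferPoolMXBean pool
                : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool;
            }
        }
        return null;
    }

    /** Formats the pool's buffer count, used bytes, and total capacity. */
    static String describe(BufferPoolMXBean pool) {
        return String.format("%s: count=%d used=%d capacity=%d",
                pool.getName(), pool.getCount(),
                pool.getMemoryUsed(), pool.getTotalCapacity());
    }

    public static void main(String[] args) {
        // Allocate 1 MiB of direct memory so the pool has something to report.
        ByteBuffer buffer = ByteBuffer.allocateDirect(1024 * 1024);
        BufferPoolMXBean pool = directPool();
        if (pool != null) {
            System.out.println(describe(pool));
        }
        // Use the buffer afterwards so it stays reachable during the report.
        System.out.println("allocated bytes: " + buffer.capacity());
    }
}
```

<div>Polling describe() periodically gives the usage trend over time. It cannot point at the owner of a leaked buffer, so it complements rather than replaces the heap dump discussed in this thread.</div>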