Should OutOfMemoryError from NIO direct memory honor JVM flags for handling OOME?

Fri Sep 23 08:20:45 UTC 2022

On 23/09/2022 00:26, Man Cao wrote:
> Hi all,
>
> I recently opened an RFE 
> (https://bugs.openjdk.org/browse/JDK-8294052) to make JVM flags such 
> as -XX:+HeapDumpOnOutOfMemoryError and -XX:+ExitOnOutOfMemoryError 
> work for OutOfMemoryErrors from filling up the NIO direct memory.
> Supporting HeapDumpOnOutOfMemoryError would greatly help users debug 
> OOMEs from NIO direct memory. In our case, the OOMEs from direct 
> memory are infrequent and unpredictable, and it is quite infeasible to 
> manually trigger a heap dump just before those OOMEs happen.
>
> However, [~dholmes] seems a bit wary of supporting those JVM flags for 
> NIO direct memory, as those JVM flags are currently for OOMEs thrown 
> from the JVM, and for OOMEs about Java heap and metaspace, which are 
> both managed by the JVM.
>
> Do you think it is a good idea to support those JVM flags for OOMEs 
> from NIO direct memory? Are there any concerns that we shouldn't 
> support them?
>

There are many places in the library code that throw OOME.

One common case in APIs is where some parameters or conditions require 
allocating an array larger than 2GB. That's not possible, of course, so 
some exception needs to be thrown, and it is usually OOME (this is 
specified in some cases, not in others). The error can be confusing as 
there isn't really an out-of-memory condition (Java heap or native heap). 
These cases are probably not good candidates to trigger a heap dump or 
cause the VM to exit.
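A small illustration of such a library-thrown OOME, assuming Java 11+ on an OpenJDK-based runtime, where String.repeat checks the result length before allocating anything. The class and method names are made up for the example. The error is constructed in Java code, so nothing is actually exhausted, and (per the point raised in the original question) it is not one of the VM-raised OOMEs that the -XX OOME flags currently act on.

```java
public final class LibraryOomeDemo {

    // Returns the OutOfMemoryError thrown by String.repeat when the result would
    // be too long, or null if nothing was thrown. String.repeat rejects the
    // length up front, so no large array is ever allocated here.
    static OutOfMemoryError tryOversizedRepeat() {
        try {
            // 2 chars * Integer.MAX_VALUE cannot fit in a single String
            "ab".repeat(Integer.MAX_VALUE);
            return null;
        } catch (OutOfMemoryError e) {
            return e;
        }
    }

    public static void main(String[] args) {
        OutOfMemoryError e = tryOversizedRepeat();
        System.out.println("caught: " + e);

        // The Java heap is nowhere near full, which is why this OOME can mislead
        Runtime rt = Runtime.getRuntime();
        System.out.printf("heap used=%d bytes, max=%d bytes%n",
                rt.totalMemory() - rt.freeMemory(), rt.maxMemory());
    }
}
```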

There are other cases (200+) where malloc or some other allocation in 
native code fails. These throw OOME (via JNU_ThrowOutOfMemoryError, as 
you've probably found). There have been debates over the years as to 
whether OOME is the right error to throw, as seeing this exception in a 
log can make some people think they need to increase -Xmx when the 
issue is somewhere else. If the native heap is completely exhausted 
then there's a good chance the VM will crash or exit quickly. Should 
these OOMEs trigger a heap dump or cause the VM to exit? Maybe in some 
cases, where native resources are being kept alive by some object that 
would be expected to be GC'ed. I realise the focus of JDK-8294052 has 
narrowed since it was created, but I think it would be fair to say that 
it would probably require going through them case by case to see if it 
makes sense.
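The "kept alive by some object" case is the one where a heap dump could point at the culprit. A minimal sketch of that pattern, using java.lang.ref.Cleaner to tie a native resource to a Java object's reachability; the native allocator is simulated with a counter, and NativeHandle, nativeBytesInUse and leakyCache are hypothetical names. While the handles sit in a long-lived collection, the native memory cannot be reclaimed, and a heap dump would show what holds them.

```java
import java.lang.ref.Cleaner;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public final class NativeResourceDemo {
    // Stand-in for native (malloc'ed) memory: just a count of bytes "in use"
    static final AtomicLong nativeBytesInUse = new AtomicLong();

    private static final Cleaner CLEANER = Cleaner.create();

    // Hypothetical wrapper owning a block of simulated native memory
    static final class NativeHandle implements AutoCloseable {
        // The cleaning action must not refer back to the NativeHandle,
        // otherwise the handle could never become unreachable
        private static final class Release implements Runnable {
            private final long size;

            Release(long size) {
                this.size = size;
            }

            @Override
            public void run() {
                nativeBytesInUse.addAndGet(-size); // simulated free()
            }
        }

        private final Cleaner.Cleanable cleanable;

        NativeHandle(long size) {
            nativeBytesInUse.addAndGet(size); // simulated malloc()
            this.cleanable = CLEANER.register(this, new Release(size));
        }

        // Explicit release; Cleanable.clean() runs the action at most once.
        // Without it, the Cleaner runs the action some time after the handle
        // becomes unreachable.
        @Override
        public void close() {
            cleanable.clean();
        }
    }

    public static void main(String[] args) {
        // A long-lived list keeps every handle reachable, so GC can never
        // release the native memory behind them
        List<NativeHandle> leakyCache = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            leakyCache.add(new NativeHandle(1024));
        }
        System.out.println("native bytes in use: " + nativeBytesInUse.get()); // 10240

        for (NativeHandle h : leakyCache) {
            h.close();
        }
        leakyCache.clear();
        System.out.println("after close: " + nativeBytesInUse.get()); // 0
    }
}
```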

I think there is a case for triggering a heap dump or doing the other 
OOME actions when a direct buffer can't be allocated. Direct memory 
isn't managed by the VM, but we know that hitting the limit means there 
are objects keeping the direct buffers alive, and there is tuning (via 
the -XX:MaxDirectMemorySize option) that may be required for some 
applications, and maybe a memory leak in others. Relatedly, we added 
BufferPoolMXBean in Java 7 so that direct memory could be monitored by 
JMX tooling on the same level as memory pools managed by the VM.
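The direct pool is exposed as the platform BufferPoolMXBean named "direct". Until something like JDK-8294052 exists, one application-side workaround for the situation in the original question is to catch the OOME at the allocation site and write a heap dump there with HotSpotDiagnosticMXBean.dumpHeap. A minimal sketch, assuming a HotSpot-based JDK (com.sun.management is not part of Java SE) and that the application owns the allocateDirect calls; allocateDirectOrDump is a hypothetical helper, and buffers allocated inside third-party libraries won't go through it.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicBoolean;

public final class DirectMemoryDiagnostics {
    // Only dump once; repeated dumps of a large heap would be expensive
    private static final AtomicBoolean dumped = new AtomicBoolean();

    // The platform buffer pool backing ByteBuffer.allocateDirect
    static BufferPoolMXBean directPool() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if (pool.getName().equals("direct")) {
                return pool;
            }
        }
        throw new IllegalStateException("no \"direct\" buffer pool found");
    }

    // Current value of -XX:MaxDirectMemorySize; "0" means the JVM chooses the limit
    static String maxDirectMemorySizeOption() {
        HotSpotDiagnosticMXBean hs = ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return hs.getVMOption("MaxDirectMemorySize").getValue();
    }

    // Hypothetical helper: allocate a direct buffer; on the first OOME, write a
    // heap dump of live objects (dumpFile must end in ".hprof") and rethrow
    static ByteBuffer allocateDirectOrDump(int capacity, String dumpFile) {
        try {
            return ByteBuffer.allocateDirect(capacity);
        } catch (OutOfMemoryError e) {
            if (dumped.compareAndSet(false, true)) {
                try {
                    HotSpotDiagnosticMXBean hs =
                            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
                    hs.dumpHeap(dumpFile, true);
                } catch (IOException | RuntimeException dumpFailure) {
                    // Best effort: keep the original OOME, record why the dump failed
                    e.addSuppressed(dumpFailure);
                }
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = allocateDirectOrDump(1 << 20, "direct-oome.hprof");
        BufferPoolMXBean pool = directPool();
        System.out.printf("direct pool: count=%d totalCapacity=%d memoryUsed=%d%n",
                pool.getCount(), pool.getTotalCapacity(), pool.getMemoryUsed());
        System.out.println("MaxDirectMemorySize=" + maxDirectMemorySizeOption());
        System.out.println("holding " + buf.capacity() + " bytes");
    }
}
```

Catching the error this way is best effort: if the direct memory failure coincides with Java heap pressure, the dump itself may fail, which is why that failure is attached to the original OOME rather than replacing it.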

-Alan