Bits.reserveMemory OutOfMemoryError does not seem to trigger -XX:OnOutOfMemoryError

David Holmes david.holmes at oracle.com
Wed Dec 18 01:31:55 UTC 2024


Hi Steven,

The -XX OOM-related flags only apply to OOM conditions detected directly 
by the VM itself - please also see:

https://bugs.openjdk.org/browse/JDK-8257790

Unfortunately the java manpage documentation didn't get updated to make 
this clear, but I will address that. We did clarify it in the source [1]:

   product(ccstrlist, OnOutOfMemoryError, "",                               \
           "Run user-defined commands on first java.lang.OutOfMemoryError " \
           "thrown from JVM")                                               \

You need to handle Java-triggered OOM conditions in the Java code.
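
For example, something along these lines in the application code - just a 
rough sketch, and the class and helper names here are made up for 
illustration:

   import java.nio.ByteBuffer;

   public class DirectOomGuard {

       // Wrap direct allocations so that a Java-level OOME, which the
       // VM's OnOutOfMemoryError hook never sees, still ends in a hard
       // process exit that the orchestrator can recover from.
       static ByteBuffer allocateDirectOrDie(int capacity) {
           try {
               return ByteBuffer.allocateDirect(capacity);
           } catch (OutOfMemoryError e) {
               // Thrown from java.nio.Bits.reserveMemory when the direct
               // memory limit is exhausted.
               e.printStackTrace();
               Runtime.getRuntime().halt(1); // exit immediately, no shutdown hooks
               throw e; // not reached
           }
       }

       public static void main(String[] args) {
           ByteBuffer buf = allocateDirectOrDie(4 * 1024 * 1024);
           System.out.println("Reserved " + buf.capacity() + " bytes");
       }
   }

Runtime.halt is deliberate there: once the process is already in a bad 
state you generally want to skip shutdown hooks and exit at once.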

David
-----

[1] https://bugs.openjdk.org/browse/JDK-8258058

On 18/12/2024 9:52 am, Steven Schlansker wrote:
> Hi hotspot-dev,
> 
> In our continuing mission to explore strange new VM memory limits
> (aren't containers fun?), we have encountered a situation where an
> unserviceable direct memory allocation request leaves the running
> application in a live but unusable state. In the container world, we
> expect to run into resource misconfigurations from time to time; it
> seems to be a fact of life for the moment. But having the app unable
> to recover sucks.
> 
> We run:
> openjdk 23.0.1+11
> netty 4.1.115
> 
> A big workload spike comes in, and suddenly we allocate a lot of
> memory, and run out:
> java.lang.OutOfMemoryError: Cannot reserve 4194304 bytes of direct buffer memory (allocated: 804069357, limit: 805306368)
>     at java.base/java.nio.Bits.reserveMemory(Bits.java:178)
>     at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:111)
>     at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:363)
>     at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:718)
>     at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:693)
> 
> We configure our JVM with
> '-XX:OnError=bin/crasher %p' '-XX:OnOutOfMemoryError=bin/crasher %p'
> 
> where crasher is a shell script that dumps various things and runs
> kill -9 on the process to ensure recovery by kubernetes starting a new
> container.
> 
> The java help page says,
> 
>> -XX:OnOutOfMemoryError=string Sets a custom command or a series of semicolon-separated commands to run when an OutOfMemoryError exception is first thrown. ...
> 
> From a simple reading of this, it sounds like this Bits.reserveMemory
> OOM should trigger the OnOutOfMemoryError handler. We would expect it
> to invoke the error handler, leading to a kill signal.
> 
> Instead, the program proceeds. Something quickly goes wrong with
> reference counting inside the Lettuce Redis client:
> 
> Caused by: java.lang.NullPointerException: Cannot invoke "io.netty.buffer.ByteBuf.refCnt()" because "this.buffer" is null
>     at io.lettuce.core.protocol.CommandHandler.channelRead(CommandHandler.java:597)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
> 
> and then the whole application wedges. I reported this separately
> (https://github.com/redis/lettuce/issues/3087).
> 
> While it would be nice to improve the Lettuce / Netty handling of OOME,
> we felt that our configuration of OnOutOfMemoryError should have
> covered this case - the help message doesn't qualify a particular type
> of OOME, such as "out of Java heap memory" - and a clean kill of the
> process would have reduced a multi-hour outage (until a human could
> notice and respond) to moments while the process restarts.
> 
> Is this an appropriate expectation? I'm happy to file an issue if this
> would be considered a bug.
> 
> Thank you for your consideration.


