Reliability of JVM in face of "recoverable" Errors, e.g. out of code cache space

Tue Aug 6 02:08:18 UTC 2024

Hi Steven,

On 3/08/2024 5:11 am, Steven Schlansker wrote:
> Hi hotspot-dev,
> 
> Please let me know if this is not an appropriate place to raise this kind of question -
> happy to move to another more appropriate list

This does seem like an issue with method linking in hotspot and so is 
appropriate. I would suggest filing a bug in JBS.

> We run the JVM (22.0.1) with many different application contexts loaded into one JVM, like an application server.
> This places a rather high demand on the code cache. We've observed warning messages about the
> code cache being full and compiler being disabled, so we increase our ReservedCodeCacheSize to make some space,
> and move on with life.
> 
> This week, we ran into a new type of failure, that is much more serious than a warning but non-fatal to the JVM.
> 
> Caused by: java.lang.ExceptionInInitializerError:
> Caused by: java.lang.NoClassDefFoundError: Could not initialize class java.time.temporal.WeekFields
> Caused by: Exception java.lang.VirtualMachineError: Out of space in CodeCache for adapters
>   at java.base/java.time.format.DateTimeFormatterBuilder$WeekBasedFieldPrinterParser.printerParser(DateTimeFormatterBuilder.java:5264)
>   at java.base/java.time.format.DateTimeFormatterBuilder$WeekBasedFieldPrinterParser.format(DateTimeFormatterBuilder.java:5248)
>   at java.base/java.time.format.DateTimeFormatterBuilder$CompositePrinterParser.format(DateTimeFormatterBuilder.java:2529)
>   at java.base/java.time.format.DateTimeFormatter.formatTo(DateTimeFormatter.java:1905)
>   at java.base/java.time.format.DateTimeFormatter.format(DateTimeFormatter.java:1879)
>   at java.base/java.time.LocalDate.format(LocalDate.java:1797)
> ...
> 
> Once this happens, the affected classes (in this case the java.time infrastructure) are effectively dead for the remainder of JVM lifetime.

I don't know if something has changed, either in VM or in library code, 
that makes this more likely now, but the ability to throw an exception 
in this circumstance has existed for many, many years. That said, it 
seems inappropriate.

> As part of our JVM reliability configuration, we attempt to set
> -XX:OnError=/bin/gather-debuginfo-then-kill -9 %p
> to ensure that unexpected errors terminate the JVM, rather than leave it in an uncertain state.
> However, this particular VirtualMachineError does not seem to be triggering this OnError logic. Reading the docs, it seems
> that this is only triggered for 'irrecoverable' errors, which I guess this does not qualify as, since it triggers a userland exception
> not a hotspot dump.
> 
> However, trying to imagine how we would recover from such a situation, it's not clear at all what to do.
> At this point some arbitrary subset of classes are no longer usable, forever. Even logging a date could fail.
> Arguably, user code shouldn't be thinking about VirtualMachineError as a possibility at all, as what can
> you even trust to work afterward?

Yes this is a problem with exceptions that can happen during static 
initializers. From a JDK perspective we need to make our classes more 
robust in this area, and avoid potentially problematic APIs if 
necessary. In this case I think we need to look at method linking and 
the role of adapters and see if this is truly a fatal condition for 
them, as it is not recoverable in any general sense by the user and can 
cripple arbitrary classes as evidenced here.
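
To illustrate why a failed static initializer is so damaging, here is a 
minimal sketch. The class names (InitFailureDemo, Fragile) are 
hypothetical, and the IllegalStateException merely stands in for the 
VirtualMachineError in your trace. The first use of the class reports 
the initialization failure; every later use fails with 
NoClassDefFoundError, because the class stays in the erroneous state.

```java
public class InitFailureDemo {

    // Hypothetical stand-in for a class such as java.time.temporal.WeekFields
    // whose static initializer fails part-way through.
    static class Fragile {
        // Not a compile-time constant, so reading it triggers class initialization.
        static final int VALUE = compute();

        private static int compute() {
            throw new IllegalStateException("simulated failure during static initialization");
        }
    }

    // Touches Fragile and returns the simple name of whatever Throwable results.
    static String touch() {
        try {
            return "ok: " + Fragile.VALUE;
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(touch()); // first use: ExceptionInInitializerError
        System.out.println(touch()); // any later use: NoClassDefFoundError
    }
}
```

Nothing the application does afterwards can re-run the initializer for 
that class in the same class loader.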

> The exception could be thrown in an arbitrary thread - maybe it's one we control, but maybe it's thrown in a background
> thread like a Jetty server or Redis client io thread. Where it is thrown is not predictable either, making it very hard to
> add a "catch" clause and terminate the JVM, since nearly any statement could fail.
> Most threads are careful to have a top-level catch and log, so the uncaught exception handler does not seem reliable either.
> 
> Ideally, I would turn on some VM option like '-XX:VMErrorIsAlwaysFatal' to trigger a hs dump, rather than ever seeing this
> sort of failure in userland.

There is the diagnostic -XX:AbortVMOnException=xxx flag, but here the 
xxx would be `VirtualMachineError` and that may be too broad to be useful.
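
For the "catch and terminate" idea you describe, a rough sketch in 
application code might look like the following. All names here 
(FatalErrorGuard, isVmError, haltIfVmError, install) are hypothetical, 
and the halt action is passed in only so the logic can be exercised 
without killing the JVM. It walks the cause chain because, as in your 
trace, the VirtualMachineError may arrive wrapped in a 
NoClassDefFoundError. It has the limitations you already noted: it only 
helps where the throwable actually reaches the handler, and it matches 
every VirtualMachineError subclass, including OutOfMemoryError and 
StackOverflowError.

```java
import java.util.function.IntConsumer;

public class FatalErrorGuard {

    // Returns true if t, or any cause in its chain, is a VirtualMachineError.
    // The depth limit guards against pathological cyclic cause chains.
    static boolean isVmError(Throwable t) {
        int depth = 0;
        for (Throwable c = t; c != null && depth < 64; c = c.getCause(), depth++) {
            if (c instanceof VirtualMachineError) {
                return true;
            }
        }
        return false;
    }

    // Invokes the supplied halt action with exit status 1 if a
    // VirtualMachineError is found anywhere in the cause chain.
    static void haltIfVmError(Throwable t, IntConsumer halt) {
        if (isVmError(t)) {
            halt.accept(1);
        }
    }

    // Installs a default uncaught exception handler that halts the JVM
    // immediately (no shutdown hooks) on a VirtualMachineError.
    public static void install() {
        Thread.setDefaultUncaughtExceptionHandler(
                (thread, t) -> haltIfVmError(t, Runtime.getRuntime()::halt));
    }
}
```

The same haltIfVmError call could be placed in the top-level catch 
blocks of threads you control, but it cannot cover third-party threads 
that catch and log Throwable themselves.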

David
-----

> 
> How can a user application recover from such an error happening? (I think it cannot.) If we cannot recover, how can we reliably
> configure the JVM to crash completely if such an error happens? I suppose a debugger-like tool could breakpoint
> on throwing VirtualMachineError, or maybe an agent could transform the VME constructor, but this doesn't feel "production-ready".
> 
> Thank you for any advice!
> Steven Schlansker
>