RFR(S): 8037842: Failing to allocate MethodCounters and MDO causes a serious performance drop

Mon Oct 20 11:04:17 UTC 2014

On 2014-10-18 07:29, David Holmes wrote:
> Hi Albert,
>
> As using the JIT is just an optimization, an OOME condition in the JIT
> should not be reported to the Java level code. It should be handled
> internally with suitable warnings and fallback - as happens elsewhere
> (but preferably not with vm_exit_out_of_memory! :) )

The VM can run out of metaspace for two reasons:
1. The user set the flag -XX:MaxMetaspaceSize to a value, e.g. 128m.
2. The VM has used up all of the native memory available for the Java
    process (we can't commit more memory).

If the VM is operating in scenario two, then the process will most 
likely have to throw an OOME at some point, either due to a metaspace 
allocation request or because some other subsystem of the VM tries to 
allocate native memory and fails.

If the user has set the MaxMetaspaceSize, then they have apparently 
chosen the wrong value, since their Java application needs more 
metaspace than they think it does. Unless the application, or the VM, 
can free metaspace memory, the VM will continue in a rather bad state:
- no JIT compilation, only interpretation
- no metaspace allocation, i.e. no class loading
Please note that just having the interpreter/compiler stop allocating 
metaspace will not give us back more metaspace.
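
The last point can be illustrated with a toy C++ model. It is a sketch under stated assumptions, not HotSpot code: `ToyMetaspace`, `allocate` and the `critical` flag are hypothetical names, and "critical" stands for allocations such as class loading while "non-critical" stands for MethodCounters and MDOs. Once the limit is hit, skipping non-critical allocations leaves used metaspace unchanged, so a later critical allocation still fails.

```cpp
#include <cstddef>

// Toy model of a metaspace with a hard limit (think -XX:MaxMetaspaceSize
// in scenario 1). All names are hypothetical; this is not HotSpot code.
class ToyMetaspace {
public:
    explicit ToyMetaspace(std::size_t capacity) : capacity_(capacity) {}

    // Returns true on success. Once a non-critical request (e.g. an MDO)
    // has failed, later non-critical requests are skipped without trying.
    bool allocate(std::size_t size, bool critical) {
        if (!critical && skip_non_critical_) {
            return false;  // skipped: nothing is allocated, nothing is freed
        }
        if (used_ + size > capacity_) {
            if (!critical) {
                skip_non_critical_ = true;  // stop trying non-critical allocs
            }
            return false;
        }
        used_ += size;
        return true;
    }

    std::size_t used() const { return used_; }

private:
    std::size_t capacity_;
    std::size_t used_ = 0;
    bool skip_non_critical_ = false;
};
```

For example, with a capacity of 100, a critical 95-unit allocation succeeds and a non-critical 10-unit allocation fails; from then on `used()` stays at 95 and a further critical 10-unit allocation (loading another class) still fails, because skipping MDOs never returns any memory.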

If I had written a Java application that encountered either of these two 
scenarios, then I would prefer that the application get an OOME, 
because something is seriously wrong with my Java process, and I would 
prefer to find that out as soon as possible (a log message might be missed).

However, I'm not familiar with the specification, so I can't comment on 
whether throwing an OOME in either of these situations is allowed. There 
might also be some middle ground here that I'm not aware of, something 
that is more critical than a warning on stderr but not as dramatic as 
throwing an OOME.

Thanks,
Erik

> David
>
> On 17/10/2014 11:18 PM, Albert Noll wrote:
>> Hi,
>>
>> could I get reviews for this patch:
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8037842
>>
>> Problem:
>> If the interpreter (or the compilers) fail to allocate from metaspace
>> (e.g., to allocate an MDO), the exception is cleared and, as a result,
>> not reported to the Java application. Not propagating the OOME to the
>> Java application can lead to a serious performance regression, since
>> every attempt to allocate from metaspace (if we have run out of
>> metaspace and a full GC cannot free memory) triggers another full GC.
>> Consequently, the application continues to run and schedules full GCs
>> until (1) a critical allocation (one that throws an OOME) fails, or
>> (2) the application finishes normally (successfully). Note that the VM
>> can continue to execute without allocating MethodCounters or MDOs.
>>
>> Solution 1:
>> Report the OOME to the Java application. This solution avoids handling
>> the problem (running a large number of full GCs) in the VM by passing
>> the problem over to the Java application. I.e., the performance
>> regression is solved by throwing an OOME. The only way to make the
>> application run is to re-run it with a larger (yet unknown) metaspace
>> size. However, the application could have continued to run (with an
>> undefined performance drop).
>>
>> Note that the metaspace size in the failing test case is artificially
>> small (20m). Should we change the default behavior of Hotspot
>> to fix such a corner case?
>>
>> Also, I am not sure whether throwing an OOME in such a case keeps
>> Hotspot conformant with the Java Language Specification.
>> The Specification says:
>>
>> "Asynchronous exceptions occur only as a result of:
>>
>> An internal error or resource limitation in the Java Virtual Machine
>> that prevents it from implementing the semantics of the Java
>> programming language. In this case, the asynchronous exception that is
>> thrown is an instance of a subclass of VirtualMachineError"
>>
>> An OOME is an asynchronous exception. As I understand the paragraph
>> above, we are only allowed to throw an asynchronous exception if we are
>> not able to "implement the semantics of the Java programming language".
>> Not being able to run the JIT compiler does not seem to constrain the
>> semantics of the Java language.
>>
>> Solution 2:
>> If allocation from metaspace fails, we (1) report a warning to the user
>> and (2) stop trying to allocate MethodCounters and MDOs (as well as all
>> other non-critical metaspace allocations), thereby avoiding the overhead
>> of running full GCs. As a result, the application can continue to run.
>> I have not yet worked on such a solution. I just bring this up for
>> discussion.
>>
>> Testing:
>> JPRT
>>
>> Webrev:
>> Here is the webrev for Solution 1. Please note that I am not familiar
>> with this part of the code.
>>
>> http://cr.openjdk.java.net/~anoll/8037842/webrev.00/
>>
>> Many thanks in advance,
>> Albert
>>
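
For reference, the cost described in the quoted problem statement, and the effect of Solution 2, can be sketched with a toy C++ model. This is not HotSpot code: `ToyAllocator`, `allocate_non_critical` and `full_gcs_after` are hypothetical names, and the model assumes that once metaspace is exhausted a full GC never frees anything and the failure is cleared (current behaviour) rather than thrown.

```cpp
#include <cstddef>

// Toy model of the behaviour described in the bug report: once metaspace is
// exhausted, every failed non-critical allocation (MethodCounters, MDO)
// triggers a full GC that cannot free anything, after which the failure is
// swallowed and execution continues. Names are hypothetical; this is not
// HotSpot code.
class ToyAllocator {
public:
    ToyAllocator(std::size_t capacity, bool disable_after_failure)
        : capacity_(capacity), disable_after_failure_(disable_after_failure) {}

    bool allocate_non_critical(std::size_t size) {
        if (disabled_) {
            return false;              // Solution 2: give up without a full GC
        }
        if (used_ + size <= capacity_) {
            used_ += size;
            return true;
        }
        ++full_gcs_;                   // full GC before failing; frees nothing
        if (disable_after_failure_) {
            disabled_ = true;          // Solution 2 would print its warning here
        }
        return false;                  // failure cleared, execution continues
    }

    std::size_t full_gcs() const { return full_gcs_; }

private:
    std::size_t capacity_;
    bool disable_after_failure_;
    std::size_t used_ = 0;
    std::size_t full_gcs_ = 0;
    bool disabled_ = false;
};

// Number of full GCs caused by `attempts` non-critical allocation attempts
// when metaspace is already exhausted (capacity 0 in this model).
std::size_t full_gcs_after(std::size_t attempts, bool disable_after_failure) {
    ToyAllocator allocator(0, disable_after_failure);
    for (std::size_t i = 0; i < attempts; ++i) {
        allocator.allocate_non_critical(1);
    }
    return allocator.full_gcs();
}
```

In this model `full_gcs_after(1000, false)` returns 1000, one full GC per attempt, while `full_gcs_after(1000, true)` returns 1. Solution 1 would instead stop at the first failure by throwing the OOME.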