RFR(S): 8037842: Failing to allocate MethodCounters and MDO causes a serious performance drop

Tue Oct 21 01:17:52 UTC 2014

On 21/10/2014 5:11 AM, Vladimir Kozlov wrote:
> Inability to allocate in metaspace is different from allocation in
> codecache.
>
> The latter is JIT-specific and needs only a warning, since the whole
> Java process is (almost) not impacted.
>
> The former should produce an OOM the same way it did when we had
> PermGen.
>
> I think Solution 1 is correct.

Throwing an asynchronous exception is very bad and should always be a 
last resort. Such exceptions can lead to corrupt state very easily.

IMHO problems encountered by the JIT should not manifest as Java-level 
exceptions. I would also consider them (regardless of whether they have 
always been there) a violation of the spec as quoted by Albert.

David

> Thanks,
> Vladimir
>
> On 10/17/14 6:18 AM, Albert Noll wrote:
>> Hi,
>>
>> could I get reviews for this patch:
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8037842
>>
>> Problem:
>> If the interpreter (or the compilers) fail to allocate from metaspace
>> (e.g., to allocate an MDO), the exception
>> is cleared and - as a result - not reported to the Java application. Not
>> propagating the OOME to the Java application
>> can lead to a serious performance regression, since every attempt to
>> allocate from metaspace (if we have run out
>> of metaspace and a full GC cannot free memory) triggers another full GC.
>> Consequently, the application continues
>> to run and schedules full GCs until (1) a critical allocation (one that
>> throws an OOME) fails, or (2) the application finishes
>> normally (successfully). Note that the VM can continue to execute
>> without allocating MethodCounters or MDOs.
>>
>> Solution 1:
>> Report OOME to the Java application. This solution avoids handling the
>> problem (running a large number of full GCs)
>> in the VM by passing the problem over to the Java application. I.e.,
>> the performance regression is solved by
>> throwing an OOME. The only way to make the application run is to re-run
>> the application with a larger (yet unknown)
>> metaspace size. However, the application could have continued to run
>> (with an undefined performance drop).
>>
>> Note that the metaspace size in the failing test case is artificially
>> small (20m). Should we change the default behavior of Hotspot
>> to fix such a corner case?
>>
>> Also, I am not sure if throwing an OOME in such a case makes Hotspot
>> conform with the Java Language Specification.
>> The Specification says:
>>
>> "Asynchronous exceptions occur only as a result of:
>>
>> An internal error or resource limitation in the Java Virtual Machine
>> that prevents
>> it from implementing the semantics of the Java programming language. In
>> this
>> case, the asynchronous exception that is thrown is an instance of a
>> subclass of
>> VirtualMachineError"
>>
>> An OOME is an asynchronous exception. As I understand the paragraph
>> above, we are only allowed to throw an asynchronous
>> exception, if we are not able to "implement the semantics of the Java
>> programming language". Not being able to run the JIT
>> compiler does not seem to constrain the semantics of the Java language.
>>
>> Solution 2:
>> If allocation from metaspace fails, we (1) report a warning to the user
>> and (2) do not try to allocate MethodCounters and MDO
>> (as well as all other non-critical metaspace allocations) and thereby
>> avoid the overhead from running full GCs. As a result, the
>> application can continue to run. I have not yet worked on such a
>> solution. I just bring this up for discussion.
>>
>> Testing:
>> JPRT
>>
>> Webrev:
>> Here is the webrev for Solution 1. Please note that I am not familiar
>> with this part of the code.
>>
>> http://cr.openjdk.java.net/~anoll/8037842/webrev.00/
>>
>> Many thanks in advance,
>> Albert
>>
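
The policy described as Solution 2 in the quoted message (warn once, then stop attempting non-critical metaspace allocations) can be modeled with a small sketch. It is written in Java rather than HotSpot's C++, and every name in it (NonCriticalAllocationGate, tryAllocate, attempts) is hypothetical. It only illustrates the idea under those assumptions and is not the proposed patch.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical model of Solution 2; none of these names exist in HotSpot.
final class NonCriticalAllocationGate {
    // Set once the first non-critical allocation has failed.
    private final AtomicBoolean disabled = new AtomicBoolean(false);
    // Number of times the underlying allocator was actually invoked.
    private final AtomicInteger attempts = new AtomicInteger();

    /**
     * Runs the allocator unless an earlier failure disabled the gate.
     * Returns null when the allocation is skipped or fails; the caller
     * is expected to carry on without the optional data.
     */
    <T> T tryAllocate(Supplier<T> allocator) {
        if (disabled.get()) {
            return null; // skipped: no attempt, so no further full GC is triggered
        }
        attempts.incrementAndGet();
        try {
            return allocator.get();
        } catch (OutOfMemoryError e) {
            // Only the caller that flips the flag prints the warning.
            if (disabled.compareAndSet(false, true)) {
                System.err.println(
                    "warning: non-critical allocation failed; further attempts disabled");
            }
            return null;
        }
    }

    boolean isDisabled() {
        return disabled.get();
    }

    int attempts() {
        return attempts.get();
    }
}
```

In this model, a caller would wrap each optional allocation (the stand-in for MethodCounters or an MDO) in tryAllocate and treat a null result as "continue without profiling data". Critical allocations would bypass the gate and still fail with an OOME as before.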