RFR(S): 8037842: Failing to allocate MethodCounters and MDO causes a serious performance drop

Albert Noll albert.noll at oracle.com
Tue Oct 21 09:10:06 UTC 2014


Hi,

I agree with David.

Executing a Java program with 'java myProgram' does not imply that we 
*must* be able to execute the program using a JIT compiler.
Throwing an OOME because we are unable to compile a method (due to 
insufficient metaspace) can cause many more problems than it solves. For 
example, I tried to run the SPECjvm2008 compiler.compiler benchmark as 
follows:

java -XX:+ExitOnMetaSpaceAllocFail -XX:MaxMetaspaceSize=16m -jar SPECjvm2008.jar -ikv -wt 10 -i 5 -it 60 compiler.compiler

If ExitOnMetaSpaceAllocFail is enabled, the JVM exits when it is 
unable to allocate MethodCounters or an MDO. On my laptop, 
compiler.compiler finishes *without a performance regression* if 
ExitOnMetaSpaceAllocFail is false. If ExitOnMetaSpaceAllocFail is true, 
the run does not finish because we run out of metaspace. Such behavior 
is clearly a regression. Given the large number of customers we have, it 
seems likely that throwing an OOME will cause trouble.

In addition, I think that throwing an OOME exposes implementation 
details of HotSpot that are not easy for the user to understand. To 
understand the cause of the OOME (and I think it is very important for 
the user to understand the behavior of the JVM), the user must know that 
HotSpot uses JIT compilers that store method profiles in metaspace. If 
we decide to throw an OOME, the program will be hard to debug, since 
the point at which a method gets compiled is not deterministic. That is, 
the customer can get OOMEs at random (asynchronous) places. From a 
serviceability point of view, throwing an OOME is a poor choice.

I think we could add an option that lets the user decide on the behavior 
(ExitOnMetaSpaceAllocFail). However, the default behavior should conform 
to the spec, i.e., ExitOnMetaSpaceAllocFail should be 'false' by 
default. I think it is reasonable to assume that someone who knows 
about -XX:MaxMetaspaceSize will also know about 
-XX:ExitOnMetaSpaceAllocFail. The argument that performance problems are 
'hidden' by not throwing an OOME is valid, but such a flag mitigates it.
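
The two behaviors such a flag would select between can be sketched as a
toy model in Java. This is not HotSpot code: ProfileAllocPolicy,
Metaspace, tryAllocate, allocateProfile and the byte sizes are
hypothetical names made up for illustration, and the model simply assumes
that every failed metaspace allocation costs one full GC that frees
nothing.

```java
/** Toy model of the proposed flag; all names here are hypothetical, not HotSpot APIs. */
final class ProfileAllocPolicy {

    /** Stand-in for a bounded metaspace. */
    static final class Metaspace {
        private long freeBytes;
        int fullGcCount = 0;

        Metaspace(long freeBytes) { this.freeBytes = freeBytes; }

        /** Returns true on success; a failed attempt costs one simulated full GC. */
        boolean tryAllocate(long bytes) {
            if (bytes <= freeBytes) {
                freeBytes -= bytes;
                return true;
            }
            fullGcCount++; // assumption: the GC runs but cannot free anything
            return false;
        }
    }

    private final Metaspace metaspace;
    private final boolean exitOnAllocFail; // models ExitOnMetaSpaceAllocFail
    private boolean profilingDisabled = false;

    ProfileAllocPolicy(Metaspace metaspace, boolean exitOnAllocFail) {
        this.metaspace = metaspace;
        this.exitOnAllocFail = exitOnAllocFail;
    }

    /** Tries to allocate non-critical profile data (MethodCounters/MDO in the real VM). */
    boolean allocateProfile(long bytes) {
        if (profilingDisabled) {
            return false; // already gave up: no allocation attempt, no further GC
        }
        if (metaspace.tryAllocate(bytes)) {
            return true;
        }
        if (exitOnAllocFail) {
            throw new OutOfMemoryError("Metaspace (simulated)");
        }
        profilingDisabled = true; // warn once, then stop trying
        System.err.println("warning: metaspace exhausted, profiling disabled (simulated)");
        return false;
    }

    boolean isProfilingDisabled() { return profilingDisabled; }

    public static void main(String[] args) {
        Metaspace ms = new Metaspace(100);
        ProfileAllocPolicy policy = new ProfileAllocPolicy(ms, false);
        for (int i = 0; i < 1000; i++) {
            policy.allocateProfile(60);
        }
        System.out.println("simulated full GCs: " + ms.fullGcCount);
    }
}
```

With the flag modeled as false, the loop in main pays for a single
simulated full GC and then carries on without profile data. With it set
to true, the first failed allocation throws an OutOfMemoryError instead.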

Thanks,
Albert


On 10/21/2014 03:17 AM, David Holmes wrote:
> On 21/10/2014 5:11 AM, Vladimir Kozlov wrote:
>> Inability to allocate in metaspace is different from allocation in
>> codecache.
>>
>> The latter is JIT-specific and needs only a warning, since the whole
>> Java process is (almost) not impacted.
>>
>> The former should produce an OOM the same way it did when we had
>> PermGen.
>>
>> I think solution 1 is correct.
>
> Throwing an asynchronous exception is very bad and should always be a 
> last resort. Such exceptions can lead to corrupt state very easily.
>
> IMHO problems encountered by the JIT should not manifest as Java-level 
> exceptions. I would also consider them (regardless of whether they 
> have always been there) a violation of the spec as quoted by Albert.
>
> David
>
>> Thanks,
>> Vladimir
>>
>> On 10/17/14 6:18 AM, Albert Noll wrote:
>>> Hi,
>>>
>>> could I get reviews for this patch:
>>>
>>> Bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8037842
>>>
>>> Problem:
>>> If the interpreter or the compilers fail to allocate from metaspace
>>> (e.g., to allocate an MDO), the exception is cleared and, as a
>>> result, not reported to the Java application. Not propagating the
>>> OOME to the Java application can lead to a serious performance
>>> regression, since every attempt to allocate from metaspace (if we
>>> have run out of metaspace and a full GC cannot free memory) triggers
>>> another full GC. Consequently, the application continues to run and
>>> schedules full GCs until (1) a critical allocation (one that throws
>>> an OOME) fails, or (2) the application finishes normally
>>> (successfully). Note that the VM can continue to execute without
>>> allocating MethodCounters or MDOs.
>>>
>>> Solution 1:
>>> Report the OOME to the Java application. This solution avoids
>>> handling the problem (running a large number of full GCs) in the VM
>>> by passing the problem over to the Java application. I.e., the
>>> performance regression is solved by throwing an OOME. The only way to
>>> make the application run is to re-run it with a larger (yet unknown)
>>> metaspace size. However, the application could have continued to run
>>> (with an undefined performance drop).
>>>
>>> Note that the metaspace size in the failing test case is artificially
>>> small (20m). Should we change the default behavior of Hotspot
>>> to fix such a corner case?
>>>
>>> Also, I am not sure if throwing an OOME in such a case makes Hotspot
>>> conform with the Java Language Specification.
>>> The Specification says:
>>>
>>> "Asynchronous exceptions occur only as a result of:
>>>
>>> An internal error or resource limitation in the Java Virtual Machine
>>> that prevents it from implementing the semantics of the Java
>>> programming language. In this case, the asynchronous exception that
>>> is thrown is an instance of a subclass of VirtualMachineError"
>>>
>>> An OOME is an asynchronous exception. As I understand the paragraph
>>> above, we are only allowed to throw an asynchronous exception if we
>>> are not able to "implement the semantics of the Java programming
>>> language". Not being able to run the JIT compiler does not seem to
>>> constrain the semantics of the Java language.
>>>
>>> Solution 2:
>>> If allocation from metaspace fails, we (1) report a warning to the user
>>> and (2) do not try to allocate MethodCounters and MDO
>>> (as well as all other non-critical metaspace allocations) and thereby
>>> avoid the overhead from running full GCs. As a result, the
>>> application can continue to run. I have not yet worked on such a
>>> solution. I just bring this up for discussion.
>>>
>>> Testing:
>>> JPRT
>>>
>>> Webrev:
>>> Here is the webrev for Solution 1. Please note that I am not familiar
>>> with this part of the code.
>>>
>>> http://cr.openjdk.java.net/~anoll/8037842/webrev.00/
>>>
>>> Many thanks in advance,
>>> Albert
>>>
