RFR(S): 8037842: Failing to allocate MethodCounters and MDO causes a serious performance drop

Vitaly Davidovich vitalyd at gmail.com
Tue Oct 21 12:35:22 UTC 2014


+1, as a user. I think an OOME from the VM should come only if something
critical cannot proceed due to some space exhaustion. PermGen was a
HotSpot implementation detail (i.e., it's not a Java or even a JVM
standard, of course) so I don't think it sets any sort of precedent that
needs to be followed. A loud warning to stdout/stderr indicating the issue
and stating that performance may degrade is sufficient.

On Oct 21, 2014 5:10 AM, "Albert Noll" <albert.noll at oracle.com> wrote:

> Hi,
>
> I agree with David.
>
> Executing a Java program with 'java myProgram' does not imply that we
> *must* be able to execute the program using a JIT compiler.
> Throwing an OOME because we are unable to compile a method (due to
> insufficient metaspace) can cause many more problems than it solves. For
> example,
> I tried to execute the SPECjvm2008 compiler.compiler benchmark as follows:
>
> java -XX:+ExitOnMetaSpaceAllocFail -XX:MaxMetaspaceSize=16m -jar
> SPECjvm2008.jar -ikv -wt 10 -i 5 -it 60 compiler.compiler
>
> If "ExitOnMetaSpaceAllocFail" is enabled, the JVM exits if we are
> unable to allocate MethodCounters and MDO. On my laptop, compiler.compiler
> finishes *without a performance regression* if ExitOnMetaSpaceAllocFail is
> false. If ExitOnMetaSpaceAllocFail is true, the run does not finish because
> we are out of metaspace. Such behavior is clearly a regression. Given the
> large number of customers we have, it seems likely that throwing an OOME
> will cause trouble.
>
> In addition, I think that throwing an OOME exposes implementation details
> of HotSpot to the user that are not easy to understand. To understand the
> cause of the OOME (and I think it is very important for the user to
> understand the behavior of the JVM) the user must know that HotSpot uses
> JIT compilers that store method profiles in metaspace. If we decide to
> throw an OOME, it will be hard to debug the program, since the point at
> which a method is compiled is not deterministic. I.e., the customer can get
> OOMEs at random (asynchronous) places. From a serviceability point of view,
> throwing an OOME is a poor choice.
>
> I think we could add an option that lets the user decide on the behavior
> (ExitOnMetaSpaceAllocFail). However, the default behavior should follow
> the spec, i.e., ExitOnMetaSpaceAllocFail should be 'false' by
> default. I think it is reasonable to assume that someone who knows about
> -XX:MaxMetaspaceSize will know about -XX:+ExitOnMetaSpaceAllocFail. The
> argument that performance problems are 'hidden' by not throwing an OOME is
> valid, but can be mitigated by such a flag.
>
> Thanks,
> Albert
>
>
> On 10/21/2014 03:17 AM, David Holmes wrote:
>
>> On 21/10/2014 5:11 AM, Vladimir Kozlov wrote:
>>
>>> Inability to allocate in metaspace is different from inability to
>>> allocate in the codecache.
>>>
>>> The latter is JIT-specific and needs only a warning, since the whole Java
>>> process is (almost) not impacted.
>>>
>>> The former should produce an OOM the same way as it did when we had
>>> PermGen.
>>>
>>> I think solution 1 is correct.
>>>
>>
>> Throwing an asynchronous exception is very bad and should always be a
>> last resort. Such exceptions can lead to corrupt state very easily.
>>
>> IMHO problems encountered by the JIT should not manifest as Java-level
>> exceptions. I would also consider them (regardless of whether they have
>> always been there) a violation of the spec as quoted by Albert.
>>
>> David
>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 10/17/14 6:18 AM, Albert Noll wrote:
>>>
>>>> Hi,
>>>>
>>>> could I get reviews for this patch:
>>>>
>>>> Bug:
>>>> https://bugs.openjdk.java.net/browse/JDK-8037842
>>>>
>>>> Problem:
>>>> If the interpreter (or the compilers) fail to allocate from metaspace
>>>> (e.g., to allocate an MDO), the exception
>>>> is cleared and - as a result - not reported to the Java application. Not
>>>> propagating the OOME to the Java application
>>>> can lead to a serious performance regression, since every attempt to
>>>> allocate from metaspace (if we have run out
>>>> of metaspace and a full GC cannot free memory) triggers another full GC.
>>>> Consequently, the application continues
>>>> to run and schedules full GCs until (1) a critical allocation (one that
>>>> throws an OOME) fails, or (2) the application finishes
>>>> normally (successfully). Note that the VM can continue to execute
>>>> without allocating MethodCounters or MDOs.
>>>>
>>>> Solution 1:
>>>> Report OOME to the Java application. This solution avoids handling the
>>>> problem (running a large number of full GCs)
>>>> in the VM by passing the problem over to the Java application. I.e.,
>>>> the performance regression is solved by
>>>> throwing an OOME. The only way to make the application run is to re-run
>>>> the application with a larger (yet unknown)
>>>> metaspace size. However, the application could have continued to run
>>>> (with an undefined performance drop).
>>>>
>>>> Note that the metaspace size in the failing test case is artificially
>>>> small (20m). Should we change the default behavior of HotSpot
>>>> to fix such a corner case?
>>>>
>>>> Also, I am not sure if throwing an OOME in such a case makes HotSpot
>>>> conform to the Java Language Specification.
>>>> The Specification says:
>>>>
>>>> "Asynchronous exceptions occur only as a result of:
>>>>
>>>> An internal error or resource limitation in the Java Virtual Machine
>>>> that prevents
>>>> it from implementing the semantics of the Java programming language. In
>>>> this
>>>> case, the asynchronous exception that is thrown is an instance of a
>>>> subclass of
>>>> VirtualMachineError"
>>>>
>>>> An OOME is an asynchronous exception. As I understand the paragraph
>>>> above, we are only allowed to throw an asynchronous
>>>> exception if we are not able to "implement the semantics of the Java
>>>> programming language". Not being able to run the JIT
>>>> compiler does not seem to constrain the semantics of the Java language.
>>>>
>>>> Solution 2:
>>>> If allocation from metaspace fails, we (1) report a warning to the user
>>>> and (2) do not try to allocate MethodCounters and MDO
>>>> (as well as all other non-critical metaspace allocations) and thereby
>>>> avoid the overhead from running full GCs. As a result, the
>>>> application can continue to run. I have not yet worked on such a
>>>> solution. I just bring this up for discussion.
>>>>
>>>> Testing:
>>>> JPRT
>>>>
>>>> Webrev:
>>>> Here is the webrev for Solution 1. Please note that I am not familiar
>>>> with this part of the code.
>>>>
>>>> http://cr.openjdk.java.net/~anoll/8037842/webrev.00/
>>>>
>>>> Many thanks in advance,
>>>> Albert
>>>>
>>>>
>
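
For illustration only, the "warn once and stop retrying" policy described as
Solution 2 in the quoted thread could be sketched roughly as below. This is
plain Java, not HotSpot code: the class NonCriticalAllocator and its methods
are hypothetical names, and the failing allocation is simulated by a Supplier
that throws OutOfMemoryError. The idea is only that the first failure prints a
warning and later non-critical requests are skipped instead of retried (in the
VM, each retry would trigger another full GC).

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical illustration of "warn once, then skip non-critical allocations".
public final class NonCriticalAllocator {
    // Set after the first failed allocation; never cleared in this sketch.
    private final AtomicBoolean disabled = new AtomicBoolean(false);
    // Counts how many allocations were actually attempted.
    private final AtomicInteger attempts = new AtomicInteger();

    // Returns the allocated object, or null if the allocation failed or
    // non-critical allocations have already been disabled.
    public <T> T tryAllocate(Supplier<T> allocation) {
        if (disabled.get()) {
            return null; // skip: do not retry after an earlier failure
        }
        attempts.incrementAndGet();
        try {
            return allocation.get();
        } catch (OutOfMemoryError e) {
            // Only the thread that flips the flag prints the warning.
            if (disabled.compareAndSet(false, true)) {
                System.err.println("Warning: non-critical allocation failed; "
                        + "profiling data will not be collected and "
                        + "performance may degrade.");
            }
            return null;
        }
    }

    public boolean isDisabled() {
        return disabled.get();
    }

    public int attempts() {
        return attempts.get();
    }
}
```

A caller would treat a null result as "run without profile data" rather than
as an error, which is the non-fatal behaviour argued for in this thread.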

