RFR(S): 8037842: Failing to allocate MethodCounters and MDO causes a serious performance drop

Vladimir Kozlov vladimir.kozlov at oracle.com
Tue Oct 21 19:46:28 UTC 2014


Okay, I see your and others' points.

Albert, I will agree with your solution 2.

Thanks,
Vladimir

On 10/21/14 5:35 AM, Vitaly Davidovich wrote:
> +1, as a user.  I think an OOME from the VM should come if something
> critical cannot proceed due to some space exhaustion.  PermGen was a
> HotSpot implementation detail (i.e., it's not a Java or even a JVM
> standard, of course) so I don't think it sets any sort of precedent that
> needs to be followed.  A loud warning to stdout/stderr indicating the issue
> and stating that performance may degrade is sufficient.
>
> Sent from my phone
> On Oct 21, 2014 5:10 AM, "Albert Noll" <albert.noll at oracle.com> wrote:
>
>> Hi,
>>
>> I agree with David.
>>
>> Executing a Java program with 'java myProgram' does not imply that we
>> *must* be able to execute the program using a JIT compiler.
>> Throwing an OOME because we are unable to compile a method (due to
>> insufficient metaspace) can cause many more problems than it solves. For
>> example,
>> I tried to execute the SPECjvm2008 compiler.compiler benchmark as follows:
>>
>> java -XX:+ExitOnMetaSpaceAllocFail -XX:MaxMetaspaceSize=16m -jar
>> SPECjvm2008.jar -ikv -wt 10 -i 5 -it 60 compiler.compiler
>>
>> If the "ExitOnMetaSpaceAllocFail" is enabled, the JVM exits if we are
>> unable to allocate MethodCounters and MDO. On my laptop, compiler.compiler
>> finishes *without a performance regression* if ExitOnMetaSpaceAllocFail is
>> false. If ExitOnMetaSpaceAllocFail is true, the run does not finish because
>> we are out of metaspace. Such behavior is clearly a regression. Given the
>> large number of customers we have, it seems likely that throwing an OOME
>> will cause trouble.
>>
>> In addition, I think that throwing an OOME exposes implementation details
>> of hotspot to the user that are not easy to understand. To understand the
>> cause of the OOME (and I think it is very important for the user to
>> understand the behavior of the JVM) the user must know that Hotspot uses
>> JIT compilers that store method profiles in metaspace. If we decide to
>> throw an OOME, it will be hard to debug the program, since compilers are
>> not deterministic in when a method is compiled. I.e., the customer can get
>> OOMEs at random (asynchronous) places. From a serviceability point of view,
>> throwing an OOME is a poor choice.
>>
>> I think we could add an option that lets the user decide on the behavior
>> (ExitOnMetaSpaceAllocFail). However, the default behavior should be
>> according to the Spec, i.e., ExitOnMetaSpaceAllocFail should be 'false' by
>> default. I think it is reasonable to assume that someone who knows about
>> -XX:MaxMetaspaceSize will know about -XX:ExitOnMetaSpaceAllocFail. The
>> argument that performance problems are 'hidden' by not throwing an OOME is
>> valid, but can be mitigated by such a flag.
>>
>> Thanks,
>> Albert
>>
>>
>> On 10/21/2014 03:17 AM, David Holmes wrote:
>>
>>> On 21/10/2014 5:11 AM, Vladimir Kozlov wrote:
>>>
>>>> Inability to allocate in metaspace is different from allocation in
>>>> codecache.
>>>>
>>>> The last one is JIT specific and needs only warning since whole java
>>>> process is (almost) not impacted.
>>>>
>>>> The first one should produce OOM the same ways as it was when we had
>>>> PermGen.
>>>>
>>>> I think the solution 1 is correct.
>>>>
>>>
>>> Throwing an asynchronous exception is very bad and should always be a
>>> last resort. Such exceptions can lead to corrupt state very easily.
>>>
>>> IMHO problems encountered by the JIT should not manifest as Java-level
>>> exceptions. I would also consider them (regardless of whether they have
>>> always been there) a violation of the spec as quoted by Albert.
>>>
>>> David
>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 10/17/14 6:18 AM, Albert Noll wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> could I get reviews for this patch:
>>>>>
>>>>> Bug:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8037842
>>>>>
>>>>> Problem:
>>>>> If the interpreter (or the compilers) fail to allocate from metaspace
>>>>> (e.g., to allocate an MDO), the exception
>>>>> is cleared and - as a result - not reported to the Java application. Not
>>>>> propagating the OOME to the Java application
>>>>> can lead to a serious performance regression, since every attempt to
>>>>> allocate from metaspace (if we have run out
>>>>> of metaspace and a full GC cannot free memory) triggers another full GC.
>>>>> Consequently, the application continues
>>>>> to run and schedules full GCs until (1) a critical allocation (one that
>>>>> throws an OOME) fails, or (2) the application finishes
>>>>> normally (successfully). Note that the VM can continue to execute
>>>>> without allocating MethodCounters or MDOs.
>>>>>
>>>>> Solution 1:
>>>>> Report OOME to the Java application. This solution avoids handling the
>>>>> problem (running a large number of full GCs)
>>>>> in the VM by passing the problem over to the Java application. I.e.,
>>>>> the performance regression is solved by
>>>>> throwing an OOME. The only way to make the application run is to re-run
>>>>> the application with a larger (yet unknown)
>>>>> metaspace size. However, the application could have continued to run
>>>>> (with an undefined performance drop).
>>>>>
>>>>> Note that the metaspace size in the failing test case is artificially
>>>>> small (20m). Should we change the default behavior of Hotspot
>>>>> to fix such a corner case?
>>>>>
>>>>> Also, I am not sure if throwing an OOME in such a case makes Hotspot
>>>>> conform with the Java Language Specification.
>>>>> The Specification says:
>>>>>
>>>>> "Asynchronous exceptions occur only as a result of:
>>>>>
>>>>> An internal error or resource limitation in the Java Virtual Machine
>>>>> that prevents
>>>>> it from implementing the semantics of the Java programming language. In
>>>>> this
>>>>> case, the asynchronous exception that is thrown is an instance of a
>>>>> subclass of
>>>>> VirtualMachineError"
>>>>>
>>>>> An OOME is an asynchronous exception. As I understand the paragraph
>>>>> above, we are only allowed to throw an asynchronous
>>>>> exception, if we are not able to "implement the semantics of the Java
>>>>> programming language". Not being able to run the JIT
>>>>> compiler does not seem to constrain the semantics of the Java language.
>>>>>
>>>>> Solution 2:
>>>>> If allocation from metaspace fails, we (1) report a warning to the user
>>>>> and (2) do not try to allocate MethodCounters and MDO
>>>>> (as well as all other non-critical metaspace allocations) and thereby
>>>>> avoid the overhead from running full GCs. As a result, the
>>>>> application can continue to run. I have not yet worked on such a
>>>>> solution. I just bring this up for discussion.
>>>>>
>>>>> Testing:
>>>>> JPRT
>>>>>
>>>>> Webrev:
>>>>> Here is the webrev for Solution 1. Please note that I am not familiar
>>>>> with this part of the code.
>>>>>
>>>>> http://cr.openjdk.java.net/~anoll/8037842/webrev.00/
>>>>>
>>>>> Many thanks in advance,
>>>>> Albert
>>>>>
>>>>>
>>
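
Editorial sketch of the policy described as Solution 2 in the thread: after the first non-critical allocation fails, warn once and stop attempting such allocations, so that later requests no longer trigger full GCs. This is a small Java model written for illustration only; it is not HotSpot code, and the class, method names, the slot-based "metaspace", and the counter standing in for full GCs are all assumptions that do not appear in the patch or the webrev.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of "Solution 2"; none of these names exist in HotSpot.
final class NonCriticalAllocSketch {
    // Simulated number of free metaspace slots (assumption: one slot per allocation).
    private final AtomicInteger remaining;
    // Set once the first non-critical allocation has failed.
    private final AtomicBoolean disabled = new AtomicBoolean(false);
    // Counts the expensive recovery attempts that stand in for full GCs.
    private final AtomicInteger expensiveRetries = new AtomicInteger();

    NonCriticalAllocSketch(int capacity) {
        this.remaining = new AtomicInteger(capacity);
    }

    // Tries a non-critical allocation (e.g. an MDO or MethodCounters).
    // Returns false if the allocation failed or was skipped.
    boolean tryAllocate() {
        if (disabled.get()) {
            return false;                     // skip: no allocation attempt, no GC
        }
        if (takeSlot()) {
            return true;
        }
        expensiveRetries.incrementAndGet();   // one simulated full GC, then retry
        if (takeSlot()) {
            return true;
        }
        // Only the thread that flips the flag prints the warning.
        if (disabled.compareAndSet(false, true)) {
            System.err.println("warning: metaspace exhausted; "
                + "profiling data will no longer be allocated, performance may degrade");
        }
        return false;
    }

    // Atomically takes one slot if any is left.
    private boolean takeSlot() {
        while (true) {
            int r = remaining.get();
            if (r <= 0) {
                return false;
            }
            if (remaining.compareAndSet(r, r - 1)) {
                return true;
            }
        }
    }

    boolean isDisabled() {
        return disabled.get();
    }

    int expensiveRetries() {
        return expensiveRetries.get();
    }
}
```

In this model the cost of exhaustion is paid once: the first failure performs a single recovery attempt and prints one warning, and every later call returns immediately. Critical allocations (those that must throw an OOME) are deliberately outside the sketch.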


More information about the hotspot-dev mailing list