[15] RFR: 8246381: VM crashes with "Current BasicObjectLock* below than low_mark"

Thu Jul 16 01:07:33 UTC 2020

Hi Jamsheed,

tl;dr version: fix looks good. Thanks for working through things with me 
on this one.

Long version ... for the sake of other reviewers (and myself) I'm going 
to walk through the problem scenario and how the fix addresses it, 
because the bug report is long and confusing and touches on a number of 
different issues with async exception handling.

We are dealing with the code generated for Java method entry, and in 
particular for a synchronized Java method. We do a lot of things in the 
entry code before we actually lock the monitor and jump to the Java 
method. Some of those things include method profiling and the counter 
overflow check for the JIT. If an exception is thrown at this point, the 
logic to remove the activation would unlock the monitor - which we 
haven't actually locked yet! So we have the 
do_not_unlock_if_synchronized flag which is stored in the current 
JavaThread. We set that flag true so that if any exceptions result in 
activation removal, the removal logic won't try to unlock the monitor. 
Once we're ready to lock the monitor we set the flag back to false (note 
there is an implicit assumption here that monitor locking can never 
raise an exception).

The problem arises with async exceptions, or more specifically the async 
exception that is raised due to an "unsafe access error". This is where 
a memory-mapped ByteBuffer causes an access violation (SEGV) due to a 
bad pointer. The signal handler simply sets a flag to indicate we 
encountered an "unsafe access error", adjusts the BCI to the next 
instruction and allows execution to proceed at the next instruction. It 
is then expected that the runtime will "soon" notice this pending unsafe 
access error and create and throw the InternalError instance that 
indicates the ByteBuffer operation failed. This requires executing Java 
code.

One of the places that checks for that pending unsafe access error is in 
the destructor of the JRT_ENTRY wrapper that is used for the method 
profiling and counter overflow checking. This occurs whilst the 
do_not_unlock_if_synchronized flag is true, so the resulting 
InternalError won't result in an attempt to unlock the not-locked monitor.

The problem is that creating the InternalError executes Java code - it 
calls constructors, which call methods etc. And some of those methods 
are synchronized. So the method entry logic for such a call will set 
do_not_unlock_if_synchronized to true, perform all the preamble related 
to the call, then set do_not_unlock_if_synchronized to false, lock the 
monitor and make the call. When construction completes the InternalError 
is thrown and we remove the activation for the method we had originally 
started to call. But now the do_not_unlock_if_synchronized flag has been 
reset to false by the nested Java method call, so we do in fact try to 
unlock a monitor that was never locked, and things break.

This nesting problem is well known and we have a mechanism for dealing 
with - the UnlockFlagSaver. The actual logic executed for profiling 
methods and doing the counter overflow check contains the requisite 
UnlockFlagSaver to avoid the problem just outlined. Unfortunately the 
async exception is processed in the JRT_ENTRY wrapper, which is outside 
the scope of those UnlockFlagSaver helpers and so they don't help in 
this case.

So the fix is to "simply" move the UnlockFlagSaver deeper into the call 
stack to the code that actually does the async exception processing:

  void JavaThread::check_and_handle_async_exceptions(bool 
check_unsafe_error) {
+   // May be we are at method entry and requires to save do not unlock 
flag.
+   UnlockFlagSaver fs(this);

so now after the InternalError has been created and thrown we will 
restore the original value of the do_not_unlock_if_synchronized flag 
(false) and so the InternalError will not cause activation removal to 
attempt to unlock the not-locked monitor.

The scope of the UnlockFlagSaver could be narrowed to the actual logic 
for processing the unsafe access error, but it seems fine at method scope.

A second fix is that the overflow counter check had an assertion that it 
was not executed with any pending exceptions. But that turned out to be 
false for reasons I can't fully explain, but it again appears to relate 
to a pending async exception being installed prior to the method call - 
and seems related to the two referenced JVM TI functions. The simple 
solution here is to delete the assertion and to check for pending 
exceptions on entry to the code and just return immediately. The 
JRT_ENTRY destructor will see the pending exception and propagate it.

Cheers,
David

On 16/07/2020 9:50 am, David Holmes wrote:
> Hi Jamsheed,
> 
> On 16/07/2020 8:16 am, Jamsheed C M wrote:
>> (Thank you Dean, adding serviceability team as this issue involves 
>> JVMTI features PopFrame, EarlyReturn features)
> 
> It is not at all obvious how your proposed fix impacts the JVM TI features.
> 
>> JBS entry: https://bugs.openjdk.java.net/browse/JDK-8246381
>>
>> (testing: mach5, tier1-5 links in JBS)
>>
>> Best regards,
>>
>> Jamsheed
>>
>> On 15/07/2020 21:25, Jamsheed C M wrote:
>>>
>>> Hi,
>>>
>>> Async handling at method entry requires it to be aware of 
>>> synchronization(like whether it is doing async handling before lock 
>>> acquire or after)
>>>
>>> This is required as exception handler rely on this info for 
>>> unlocking.  Async handling code never had this special condition 
>>> handled and it worked most of the time as we were using biased 
>>> locking which got disabled by [1]
>>>
>>> There was one other issue reported in similar time[2]. This issue got 
>>> triggered in test case by [3], back to back extra safepoint after 
>>> suspend and TLH for ThreadDeath. So in this setup both PopFrame 
>>> request and Thread.Stop request happened together for the test 
>>> scenario and it reached java method entry with pending_exception set.
>>>
>>> I have done a partial fix for the issue, mainly to handle production 
>>> mode crash failures(do not unlock flag related ones)
>>>
>>> Fix detail:
>>>
>>> 1) I save restore the "do not unlock" flag in async handling.
> 
> Sorry but you completely changed the fix compared to what we discussed 
> and what I pre-reviewed! What happened to changing from JRT_ENTRY to 
> JRT_ENTRY_NOASYNC? It is going to take me a lot of time and effort to 
> determine that this save/restore of the "do not unlock flag" is actually 
> correct and valid!
> 
>>>
>>> 2) Return for floating pending exception for some cases(PopFrame, 
>>> Early return related). This is debug(JVMTI) feature and floating 
>>> exception can get cleaned just like that in present compiler request 
>>> and deopt code.
> 
> What part of the change addresses this?
> 
> Thanks,
> David
> -----
> 
>>>
>>> webrev :http://cr.openjdk.java.net/~jcm/8246381/webrev.02/
>>>
>>> There are more problems in these code areas, like we clear all 
>>> exceptions in compilation request path(interpreter,c1), as well as 
>>> deoptimization path.
>>>
>>> All these un-handled cases will be separately handled by 
>>> https://bugs.openjdk.java.net/browse/JDK-8249451
>>>
>>> Request for review.
>>>
>>> Best regards,
>>>
>>> Jamsheed
>>>
>>> [1]https://bugs.openjdk.java.net/browse/JDK-8231264 
>>> <https://bugs.openjdk.java.net/browse/JDK-8231264>
>>>
>>> [2] https://bugs.openjdk.java.net/browse/JDK-8246727
>>>
>>> [3] https://bugs.openjdk.java.net/browse/JDK-8221207
>>>