[15] RFR: 8246381: VM crashes with "Current BasicObjectLock* below than low_mark"

Thu Jul 16 07:00:18 UTC 2020

Hi all,

could i get another review?

Best regards,

Jamsheed

On 16/07/2020 06:37, David Holmes wrote:
> Hi Jamsheed,
>
> tl;dr version: fix looks good. Thanks for working through things with 
> me on this one.
>
> Long version ... for the sake of other reviewers (and myself) I'm 
> going to walk through the problem scenario and how the fix addresses 
> it, because the bug report is long and confusing and touches on a 
> number of different issues with async exception handling.
>
> We are dealing with the code generated for Java method entry, and in 
> particular for a synchronized Java method. We do a lot of things in 
> the entry code before we actually lock the monitor and jump to the 
> Java method. Some of those things include method profiling and the 
> counter overflow check for the JIT. If an exception is thrown at this 
> point, the logic to remove the activation would unlock the monitor - 
> which we haven't actually locked yet! So we have the 
> do_not_unlock_if_synchronized flag which is stored in the current 
> JavaThread. We set that flag true so that if any exceptions result in 
> activation removal, the removal logic won't try to unlock the monitor. 
> Once we're ready to lock the monitor we set the flag back to false 
> (note there is an implicit assumption here that monitor locking can 
> never raise an exception).
>
> The problem arises with async exceptions, or more specifically the 
> async exception that is raised due to an "unsafe access error". This 
> is where a memory-mapped ByteBuffer causes an access violation (SEGV) 
> due to a bad pointer. The signal handler simply sets a flag to 
> indicate we encountered an "unsafe access error", adjusts the BCI to 
> the next instruction and allows execution to proceed at the next 
> instruction. It is then expected that the runtime will "soon" notice 
> this pending unsafe access error and create and throw the 
> InternalError instance that indicates the ByteBuffer operation failed. 
> This requires executing Java code.
>
> One of the places that checks for that pending unsafe access error is 
> in the destructor of the JRT_ENTRY wrapper that is used for the method 
> profiling and counter overflow checking. This occurs whilst the 
> do_not_unlock_if_synchronized flag is true, so the resulting 
> InternalError won't result in an attempt to unlock the not-locked 
> monitor.
>
> The problem is that creating the InternalError executes Java code - it 
> calls constructors, which call methods etc. And some of those methods 
> are synchronized. So the method entry logic for such a call will set 
> do_not_unlock_if_synchronized to true, perform all the preamble 
> related to the call, then set do_not_unlock_if_synchronized to false, 
> lock the monitor and make the call. When construction completes the 
> InternalError is thrown and we remove the activation for the method we 
> had originally started to call. But now the 
> do_not_unlock_if_synchronized flag has been reset to false by the 
> nested Java method call, so we do in fact try to unlock a monitor that 
> was never locked, and things break.
>
> This nesting problem is well known and we have a mechanism for dealing 
> with - the UnlockFlagSaver. The actual logic executed for profiling 
> methods and doing the counter overflow check contains the requisite 
> UnlockFlagSaver to avoid the problem just outlined. Unfortunately the 
> async exception is processed in the JRT_ENTRY wrapper, which is 
> outside the scope of those UnlockFlagSaver helpers and so they don't 
> help in this case.
>
> So the fix is to "simply" move the UnlockFlagSaver deeper into the 
> call stack to the code that actually does the async exception processing:
>
>  void JavaThread::check_and_handle_async_exceptions(bool 
> check_unsafe_error) {
> +   // May be we are at method entry and requires to save do not 
> unlock flag.
> +   UnlockFlagSaver fs(this);
>
> so now after the InternalError has been created and thrown we will 
> restore the original value of the do_not_unlock_if_synchronized flag 
> (false) and so the InternalError will not cause activation removal to 
> attempt to unlock the not-locked monitor.
>
> The scope of the UnlockFlagSaver could be narrowed to the actual logic 
> for processing the unsafe access error, but it seems fine at method 
> scope.
>
> A second fix is that the overflow counter check had an assertion that 
> it was not executed with any pending exceptions. But that turned out 
> to be false for reasons I can't fully explain, but it again appears to 
> relate to a pending async exception being installed prior to the 
> method call - and seems related to the two referenced JVM TI 
> functions. The simple solution here is to delete the assertion and to 
> check for pending exceptions on entry to the code and just return 
> immediately. The JRT_ENTRY destructor will see the pending exception 
> and propagate it.
>
> Cheers,
> David
>
> On 16/07/2020 9:50 am, David Holmes wrote:
>> Hi Jamsheed,
>>
>> On 16/07/2020 8:16 am, Jamsheed C M wrote:
>>> (Thank you Dean, adding serviceability team as this issue involves 
>>> JVMTI features PopFrame, EarlyReturn features)
>>
>> It is not at all obvious how your proposed fix impacts the JVM TI 
>> features.
>>
>>> JBS entry: https://bugs.openjdk.java.net/browse/JDK-8246381
>>>
>>> (testing: mach5, tier1-5 links in JBS)
>>>
>>> Best regards,
>>>
>>> Jamsheed
>>>
>>> On 15/07/2020 21:25, Jamsheed C M wrote:
>>>>
>>>> Hi,
>>>>
>>>> Async handling at method entry requires it to be aware of 
>>>> synchronization(like whether it is doing async handling before lock 
>>>> acquire or after)
>>>>
>>>> This is required as exception handler rely on this info for 
>>>> unlocking.  Async handling code never had this special condition 
>>>> handled and it worked most of the time as we were using biased 
>>>> locking which got disabled by [1]
>>>>
>>>> There was one other issue reported in similar time[2]. This issue 
>>>> got triggered in test case by [3], back to back extra safepoint 
>>>> after suspend and TLH for ThreadDeath. So in this setup both 
>>>> PopFrame request and Thread.Stop request happened together for the 
>>>> test scenario and it reached java method entry with 
>>>> pending_exception set.
>>>>
>>>> I have done a partial fix for the issue, mainly to handle 
>>>> production mode crash failures(do not unlock flag related ones)
>>>>
>>>> Fix detail:
>>>>
>>>> 1) I save restore the "do not unlock" flag in async handling.
>>
>> Sorry but you completely changed the fix compared to what we 
>> discussed and what I pre-reviewed! What happened to changing from 
>> JRT_ENTRY to JRT_ENTRY_NOASYNC? It is going to take me a lot of time 
>> and effort to determine that this save/restore of the "do not unlock 
>> flag" is actually correct and valid!
>>
>>>>
>>>> 2) Return for floating pending exception for some cases(PopFrame, 
>>>> Early return related). This is debug(JVMTI) feature and floating 
>>>> exception can get cleaned just like that in present compiler 
>>>> request and deopt code.
>>
>> What part of the change addresses this?
>>
>> Thanks,
>> David
>> -----
>>
>>>>
>>>> webrev :http://cr.openjdk.java.net/~jcm/8246381/webrev.02/
>>>>
>>>> There are more problems in these code areas, like we clear all 
>>>> exceptions in compilation request path(interpreter,c1), as well as 
>>>> deoptimization path.
>>>>
>>>> All these un-handled cases will be separately handled by 
>>>> https://bugs.openjdk.java.net/browse/JDK-8249451
>>>>
>>>> Request for review.
>>>>
>>>> Best regards,
>>>>
>>>> Jamsheed
>>>>
>>>> [1]https://bugs.openjdk.java.net/browse/JDK-8231264 
>>>> <https://bugs.openjdk.java.net/browse/JDK-8231264>
>>>>
>>>> [2] https://bugs.openjdk.java.net/browse/JDK-8246727
>>>>
>>>> [3] https://bugs.openjdk.java.net/browse/JDK-8221207
>>>>