[15] RFR: 8246381: VM crashes with "Current BasicObjectLock* below than low_mark"
coleen.phillimore at oracle.com
coleen.phillimore at oracle.com
Thu Jul 16 14:43:17 UTC 2020
Thanks to David's description of the problem and the fix, this makes
sense to me now.
I don't like it and we should revisit async exceptions for all the other
problems it causes, but this change looks safe and good.
thanks,
Coleen
On 7/16/20 3:00 AM, Jamsheed C M wrote:
> Hi all,
>
> could i get another review?
>
> Best regards,
>
> Jamsheed
>
> On 16/07/2020 06:37, David Holmes wrote:
>> Hi Jamsheed,
>>
>> tl;dr version: fix looks good. Thanks for working through things with
>> me on this one.
>>
>> Long version ... for the sake of other reviewers (and myself) I'm
>> going to walk through the problem scenario and how the fix addresses
>> it, because the bug report is long and confusing and touches on a
>> number of different issues with async exception handling.
>>
>> We are dealing with the code generated for Java method entry, and in
>> particular for a synchronized Java method. We do a lot of things in
>> the entry code before we actually lock the monitor and jump to the
>> Java method. Some of those things include method profiling and the
>> counter overflow check for the JIT. If an exception is thrown at this
>> point, the logic to remove the activation would unlock the monitor -
>> which we haven't actually locked yet! So we have the
>> do_not_unlock_if_synchronized flag which is stored in the current
>> JavaThread. We set that flag true so that if any exceptions result in
>> activation removal, the removal logic won't try to unlock the
>> monitor. Once we're ready to lock the monitor we set the flag back to
>> false (note there is an implicit assumption here that monitor locking
>> can never raise an exception).
>>
>> The problem arises with async exceptions, or more specifically the
>> async exception that is raised due to an "unsafe access error". This
>> is where a memory-mapped ByteBuffer causes an access violation (SEGV)
>> due to a bad pointer. The signal handler simply sets a flag to
>> indicate we encountered an "unsafe access error", adjusts the BCI to
>> the next instruction and allows execution to proceed at the next
>> instruction. It is then expected that the runtime will "soon" notice
>> this pending unsafe access error and create and throw the
>> InternalError instance that indicates the ByteBuffer operation
>> failed. This requires executing Java code.
>>
>> One of the places that checks for that pending unsafe access error is
>> in the destructor of the JRT_ENTRY wrapper that is used for the
>> method profiling and counter overflow checking. This occurs whilst
>> the do_not_unlock_if_synchronized flag is true, so the resulting
>> InternalError won't result in an attempt to unlock the not-locked
>> monitor.
>>
>> The problem is that creating the InternalError executes Java code -
>> it calls constructors, which call methods etc. And some of those
>> methods are synchronized. So the method entry logic for such a call
>> will set do_not_unlock_if_synchronized to true, perform all the
>> preamble related to the call, then set do_not_unlock_if_synchronized
>> to false, lock the monitor and make the call. When construction
>> completes the InternalError is thrown and we remove the activation
>> for the method we had originally started to call. But now the
>> do_not_unlock_if_synchronized flag has been reset to false by the
>> nested Java method call, so we do in fact try to unlock a monitor
>> that was never locked, and things break.
>>
>> This nesting problem is well known and we have a mechanism for
>> dealing with - the UnlockFlagSaver. The actual logic executed for
>> profiling methods and doing the counter overflow check contains the
>> requisite UnlockFlagSaver to avoid the problem just outlined.
>> Unfortunately the async exception is processed in the JRT_ENTRY
>> wrapper, which is outside the scope of those UnlockFlagSaver helpers
>> and so they don't help in this case.
>>
>> So the fix is to "simply" move the UnlockFlagSaver deeper into the
>> call stack to the code that actually does the async exception
>> processing:
>>
>> void JavaThread::check_and_handle_async_exceptions(bool
>> check_unsafe_error) {
>> + // May be we are at method entry and requires to save do not
>> unlock flag.
>> + UnlockFlagSaver fs(this);
>>
>> so now after the InternalError has been created and thrown we will
>> restore the original value of the do_not_unlock_if_synchronized flag
>> (false) and so the InternalError will not cause activation removal to
>> attempt to unlock the not-locked monitor.
>>
>> The scope of the UnlockFlagSaver could be narrowed to the actual
>> logic for processing the unsafe access error, but it seems fine at
>> method scope.
>>
>> A second fix is that the overflow counter check had an assertion that
>> it was not executed with any pending exceptions. But that turned out
>> to be false for reasons I can't fully explain, but it again appears
>> to relate to a pending async exception being installed prior to the
>> method call - and seems related to the two referenced JVM TI
>> functions. The simple solution here is to delete the assertion and to
>> check for pending exceptions on entry to the code and just return
>> immediately. The JRT_ENTRY destructor will see the pending exception
>> and propagate it.
>>
>> Cheers,
>> David
>>
>> On 16/07/2020 9:50 am, David Holmes wrote:
>>> Hi Jamsheed,
>>>
>>> On 16/07/2020 8:16 am, Jamsheed C M wrote:
>>>> (Thank you Dean, adding serviceability team as this issue involves
>>>> JVMTI features PopFrame, EarlyReturn features)
>>>
>>> It is not at all obvious how your proposed fix impacts the JVM TI
>>> features.
>>>
>>>> JBS entry: https://bugs.openjdk.java.net/browse/JDK-8246381
>>>>
>>>> (testing: mach5, tier1-5 links in JBS)
>>>>
>>>> Best regards,
>>>>
>>>> Jamsheed
>>>>
>>>> On 15/07/2020 21:25, Jamsheed C M wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Async handling at method entry requires it to be aware of
>>>>> synchronization(like whether it is doing async handling before
>>>>> lock acquire or after)
>>>>>
>>>>> This is required as exception handler rely on this info for
>>>>> unlocking. Async handling code never had this special condition
>>>>> handled and it worked most of the time as we were using biased
>>>>> locking which got disabled by [1]
>>>>>
>>>>> There was one other issue reported in similar time[2]. This issue
>>>>> got triggered in test case by [3], back to back extra safepoint
>>>>> after suspend and TLH for ThreadDeath. So in this setup both
>>>>> PopFrame request and Thread.Stop request happened together for the
>>>>> test scenario and it reached java method entry with
>>>>> pending_exception set.
>>>>>
>>>>> I have done a partial fix for the issue, mainly to handle
>>>>> production mode crash failures(do not unlock flag related ones)
>>>>>
>>>>> Fix detail:
>>>>>
>>>>> 1) I save restore the "do not unlock" flag in async handling.
>>>
>>> Sorry but you completely changed the fix compared to what we
>>> discussed and what I pre-reviewed! What happened to changing from
>>> JRT_ENTRY to JRT_ENTRY_NOASYNC? It is going to take me a lot of time
>>> and effort to determine that this save/restore of the "do not unlock
>>> flag" is actually correct and valid!
>>>
>>>>>
>>>>> 2) Return for floating pending exception for some cases(PopFrame,
>>>>> Early return related). This is debug(JVMTI) feature and floating
>>>>> exception can get cleaned just like that in present compiler
>>>>> request and deopt code.
>>>
>>> What part of the change addresses this?
>>>
>>> Thanks,
>>> David
>>> -----
>>>
>>>>>
>>>>> webrev :http://cr.openjdk.java.net/~jcm/8246381/webrev.02/
>>>>>
>>>>> There are more problems in these code areas, like we clear all
>>>>> exceptions in compilation request path(interpreter,c1), as well as
>>>>> deoptimization path.
>>>>>
>>>>> All these un-handled cases will be separately handled by
>>>>> https://bugs.openjdk.java.net/browse/JDK-8249451
>>>>>
>>>>> Request for review.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Jamsheed
>>>>>
>>>>> [1]https://bugs.openjdk.java.net/browse/JDK-8231264
>>>>> <https://bugs.openjdk.java.net/browse/JDK-8231264>
>>>>>
>>>>> [2] https://bugs.openjdk.java.net/browse/JDK-8246727
>>>>>
>>>>> [3] https://bugs.openjdk.java.net/browse/JDK-8221207
>>>>>
More information about the serviceability-dev
mailing list