[15] RFR: 8246381: VM crashes with "Current BasicObjectLock* below than low_mark"
Jamsheed C M
jamsheed.c.m at oracle.com
Thu Jul 16 14:49:48 UTC 2020
Hi Coleen,
Thank you for the review.
Best regards,
Jamsheed
On 16/07/2020 20:13, coleen.phillimore at oracle.com wrote:
>
> Thanks to David's description of the problem and the fix, this makes
> sense to me now.
> I don't like it and we should revisit async exceptions for all the
> other problems it causes, but this change looks safe and good.
>
> thanks,
> Coleen
>
> On 7/16/20 3:00 AM, Jamsheed C M wrote:
>> Hi all,
>>
>> could i get another review?
>>
>> Best regards,
>>
>> Jamsheed
>>
>> On 16/07/2020 06:37, David Holmes wrote:
>>> Hi Jamsheed,
>>>
>>> tl;dr version: fix looks good. Thanks for working through things
>>> with me on this one.
>>>
>>> Long version ... for the sake of other reviewers (and myself) I'm
>>> going to walk through the problem scenario and how the fix addresses
>>> it, because the bug report is long and confusing and touches on a
>>> number of different issues with async exception handling.
>>>
>>> We are dealing with the code generated for Java method entry, and in
>>> particular for a synchronized Java method. We do a lot of things in
>>> the entry code before we actually lock the monitor and jump to the
>>> Java method. Some of those things include method profiling and the
>>> counter overflow check for the JIT. If an exception is thrown at
>>> this point, the logic to remove the activation would unlock the
>>> monitor - which we haven't actually locked yet! So we have the
>>> do_not_unlock_if_synchronized flag which is stored in the current
>>> JavaThread. We set that flag true so that if any exceptions result
>>> in activation removal, the removal logic won't try to unlock the
>>> monitor. Once we're ready to lock the monitor we set the flag back
>>> to false (note there is an implicit assumption here that monitor
>>> locking can never raise an exception).
>>>
>>> The problem arises with async exceptions, or more specifically the
>>> async exception that is raised due to an "unsafe access error". This
>>> is where a memory-mapped ByteBuffer causes an access violation
>>> (SEGV) due to a bad pointer. The signal handler simply sets a flag
>>> to indicate we encountered an "unsafe access error", adjusts the BCI
>>> to the next instruction and allows execution to proceed at the next
>>> instruction. It is then expected that the runtime will "soon" notice
>>> this pending unsafe access error and create and throw the
>>> InternalError instance that indicates the ByteBuffer operation
>>> failed. This requires executing Java code.
>>>
>>> One of the places that checks for that pending unsafe access error
>>> is in the destructor of the JRT_ENTRY wrapper that is used for the
>>> method profiling and counter overflow checking. This occurs whilst
>>> the do_not_unlock_if_synchronized flag is true, so the resulting
>>> InternalError won't result in an attempt to unlock the not-locked
>>> monitor.
>>>
>>> The problem is that creating the InternalError executes Java code -
>>> it calls constructors, which call methods etc. And some of those
>>> methods are synchronized. So the method entry logic for such a call
>>> will set do_not_unlock_if_synchronized to true, perform all the
>>> preamble related to the call, then set do_not_unlock_if_synchronized
>>> to false, lock the monitor and make the call. When construction
>>> completes the InternalError is thrown and we remove the activation
>>> for the method we had originally started to call. But now the
>>> do_not_unlock_if_synchronized flag has been reset to false by the
>>> nested Java method call, so we do in fact try to unlock a monitor
>>> that was never locked, and things break.
>>>
>>> This nesting problem is well known and we have a mechanism for
>>> dealing with - the UnlockFlagSaver. The actual logic executed for
>>> profiling methods and doing the counter overflow check contains the
>>> requisite UnlockFlagSaver to avoid the problem just outlined.
>>> Unfortunately the async exception is processed in the JRT_ENTRY
>>> wrapper, which is outside the scope of those UnlockFlagSaver helpers
>>> and so they don't help in this case.
>>>
>>> So the fix is to "simply" move the UnlockFlagSaver deeper into the
>>> call stack to the code that actually does the async exception
>>> processing:
>>>
>>> void JavaThread::check_and_handle_async_exceptions(bool
>>> check_unsafe_error) {
>>> + // May be we are at method entry and requires to save do not
>>> unlock flag.
>>> + UnlockFlagSaver fs(this);
>>>
>>> so now after the InternalError has been created and thrown we will
>>> restore the original value of the do_not_unlock_if_synchronized flag
>>> (false) and so the InternalError will not cause activation removal
>>> to attempt to unlock the not-locked monitor.
>>>
>>> The scope of the UnlockFlagSaver could be narrowed to the actual
>>> logic for processing the unsafe access error, but it seems fine at
>>> method scope.
>>>
>>> A second fix is that the overflow counter check had an assertion
>>> that it was not executed with any pending exceptions. But that
>>> turned out to be false for reasons I can't fully explain, but it
>>> again appears to relate to a pending async exception being installed
>>> prior to the method call - and seems related to the two referenced
>>> JVM TI functions. The simple solution here is to delete the
>>> assertion and to check for pending exceptions on entry to the code
>>> and just return immediately. The JRT_ENTRY destructor will see the
>>> pending exception and propagate it.
>>>
>>> Cheers,
>>> David
>>>
>>> On 16/07/2020 9:50 am, David Holmes wrote:
>>>> Hi Jamsheed,
>>>>
>>>> On 16/07/2020 8:16 am, Jamsheed C M wrote:
>>>>> (Thank you Dean, adding serviceability team as this issue involves
>>>>> JVMTI features PopFrame, EarlyReturn features)
>>>>
>>>> It is not at all obvious how your proposed fix impacts the JVM TI
>>>> features.
>>>>
>>>>> JBS entry: https://bugs.openjdk.java.net/browse/JDK-8246381
>>>>>
>>>>> (testing: mach5, tier1-5 links in JBS)
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Jamsheed
>>>>>
>>>>> On 15/07/2020 21:25, Jamsheed C M wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Async handling at method entry requires it to be aware of
>>>>>> synchronization(like whether it is doing async handling before
>>>>>> lock acquire or after)
>>>>>>
>>>>>> This is required as exception handler rely on this info for
>>>>>> unlocking. Async handling code never had this special condition
>>>>>> handled and it worked most of the time as we were using biased
>>>>>> locking which got disabled by [1]
>>>>>>
>>>>>> There was one other issue reported in similar time[2]. This issue
>>>>>> got triggered in test case by [3], back to back extra safepoint
>>>>>> after suspend and TLH for ThreadDeath. So in this setup both
>>>>>> PopFrame request and Thread.Stop request happened together for
>>>>>> the test scenario and it reached java method entry with
>>>>>> pending_exception set.
>>>>>>
>>>>>> I have done a partial fix for the issue, mainly to handle
>>>>>> production mode crash failures(do not unlock flag related ones)
>>>>>>
>>>>>> Fix detail:
>>>>>>
>>>>>> 1) I save restore the "do not unlock" flag in async handling.
>>>>
>>>> Sorry but you completely changed the fix compared to what we
>>>> discussed and what I pre-reviewed! What happened to changing from
>>>> JRT_ENTRY to JRT_ENTRY_NOASYNC? It is going to take me a lot of
>>>> time and effort to determine that this save/restore of the "do not
>>>> unlock flag" is actually correct and valid!
>>>>
>>>>>>
>>>>>> 2) Return for floating pending exception for some cases(PopFrame,
>>>>>> Early return related). This is debug(JVMTI) feature and floating
>>>>>> exception can get cleaned just like that in present compiler
>>>>>> request and deopt code.
>>>>
>>>> What part of the change addresses this?
>>>>
>>>> Thanks,
>>>> David
>>>> -----
>>>>
>>>>>>
>>>>>> webrev :http://cr.openjdk.java.net/~jcm/8246381/webrev.02/
>>>>>>
>>>>>> There are more problems in these code areas, like we clear all
>>>>>> exceptions in compilation request path(interpreter,c1), as well
>>>>>> as deoptimization path.
>>>>>>
>>>>>> All these un-handled cases will be separately handled by
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8249451
>>>>>>
>>>>>> Request for review.
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Jamsheed
>>>>>>
>>>>>> [1]https://bugs.openjdk.java.net/browse/JDK-8231264
>>>>>> <https://bugs.openjdk.java.net/browse/JDK-8231264>
>>>>>>
>>>>>> [2] https://bugs.openjdk.java.net/browse/JDK-8246727
>>>>>>
>>>>>> [3] https://bugs.openjdk.java.net/browse/JDK-8221207
>>>>>>
>
More information about the hotspot-compiler-dev
mailing list