RFR(XXXS): 8205648 fix for 8205195 breaks secondary error handling
Thomas Stüfe
thomas.stuefe at gmail.com
Tue Jun 26 17:09:36 UTC 2018
On Tue, Jun 26, 2018 at 7:06 PM, Thomas Stüfe <thomas.stuefe at gmail.com> wrote:
> Hi Daniel,
>
> test if I understand this correctly:
>
> The original bug: we execute
> NestedThreadsListHandleInErrorHandlingTest, which is supposed to
> controlled-crash the VM and find SMR related info in hs-err file. But
> during error reporting, ThreadsSMRSupport::print_info_on() crashes,
> since internal SMR data structures are modified concurrently and
> ThreadsSMRSupport::print_info_on() does not lock. The solution was to
> unconditionally, in VMError::controlled_crash(), grab the
> Threads_lock.
>
> David spotted one path where this may be called recursively: when we
> test secondary signal handling, we call VMError::controlled_crash()
> from within the error handler. In that case, the second call to
> VMError::controlled_crash() will now, instead of crashing, trigger an
> assert, since we already own Threads_lock.
>
> This did not trip up SecondaryErrorTest.java since it was supposed to
> crash. Now it crashes with an assert, not a secondary signal. Which
> quietly castrates the test, btw - the reason we contributed this as a
> regression for running into blocked signals during signal handling,
> which is a bad thing. Converting this second signal to assert makes
> the test ineffective.
>
> So far correct?
>
> ---
>
> Yes, the secondary signal test is too coarse. In our port, we have
> embellished the "Error occurred during error reporting" message and
> print out the signal number and pc. But we have not yet brought that
> improvement upstream.
>
> I think your patch is okay.
>
> But now I wonder if it would be cleaner to try-lock-if-not-owned
> inside ThreadsSMRSupport::print_info_on() ? Or its caller,
> Thread::print_on_error()? Or above that, around that call in that one
> STEP inside VMError::report() ?
>
just to be clear, I am fine with the patch in its current form, if you
want to ship that.
..Thomas
> Thanks, Thomas
>
>
> On Tue, Jun 26, 2018 at 6:17 PM, Daniel D. Daugherty
> <daniel.daugherty at oracle.com> wrote:
>> Thanks for the quick review Serguei!
>>
>> One more review would be nice... David H. or Thomas Stüfe?
>>
>>
>> On 6/26/18 12:12 PM, serguei.spitsyn at oracle.com wrote:
>>>
>>> Hi Dan,
>>>
>>> The fix looks reasonable to me.
>>
>>
>> Thanks. I like simple one liners... :-)
>>
>>
>>> Nice catch by David!
>>
>>
>> Definitely.
>>
>> Dan
>>
>>
>>
>>>
>>> Thanks,
>>> Serguei
>>>
>>>
>>> On 6/26/18 06:06, Daniel D. Daugherty wrote:
>>>>
>>>> Greetings,
>>>>
>>>> For this one liner review, I'd love to hear from David H., Thomas Stüfe,
>>>> and Serguei Spitsyn.
>>>>
>>>> David Holmes spotted a locking problem with my fix for the following bug:
>>>>
>>>> JDK-8205195 NestedThreadsListHandleInErrorHandlingTest fails because
>>>> hs_err doesn't contain _nested_thread_list_max
>>>> https://bugs.openjdk.java.net/browse/JDK-8205195
>>>>
>>>> Webrev URL:
>>>> http://cr.openjdk.java.net/~dcubed/8205195-webrev/0-for-jdk-jdk/
>>>>
>>>> The fix for the issue is a one liner that only grabs the Threads_lock
>>>> when the caller does not already own it:
>>>>
>>>> $ hg diff
>>>> diff -r 5698cf4e50f1 src/hotspot/share/utilities/vmError.cpp
>>>> --- a/src/hotspot/share/utilities/vmError.cpp Fri Jun 22 12:15:16 2018
>>>> -0400
>>>> +++ b/src/hotspot/share/utilities/vmError.cpp Mon Jun 25 21:20:52 2018
>>>> -0400
>>>> @@ -1703,7 +1703,7 @@
>>>> // from racing with Threads::add() or Threads::remove() as we
>>>> // generate the hs_err_pid file. This makes our ErrorHandling tests
>>>> // more stable.
>>>> - MutexLockerEx ml(Threads_lock, Mutex::_no_safepoint_check_flag);
>>>> + MutexLockerEx ml(Threads_lock->owned_by_self() ? NULL : Threads_lock,
>>>> Mutex::_no_safepoint_check_flag);
>>>>
>>>> switch (how) {
>>>> case 1: vmassert(str == NULL, "expected null"); break;
>>>>
>>>>
>>>> Figuring out a way to detect the failure mode in order to test the
>>>> fix was much more involved than the fix itself. Gory details are
>>>> in the bug:
>>>>
>>>> JDK-8205648 fix for 8205195 breaks secondary error handling
>>>> https://bugs.openjdk.java.net/browse/JDK-8205648
>>>>
>>>> No webrev since the one liner diff is shown above. This fix was tested
>>>> with the test/hotspot/jtreg/runtime/ErrorHandling tests on Linux-X64
>>>> and with a special version of ErrorHandling/SecondaryErrorTest.java
>>>> that is documented in JDK-8205648.
>>>>
>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>
>>>> Dan
>>>
>>>
>>
More information about the serviceability-dev
mailing list