RFR(XXXS): 8205648 fix for 8205195 breaks secondary error handling

Thomas Stüfe thomas.stuefe at gmail.com
Tue Jun 26 17:06:12 UTC 2018


Hi Daniel,

test if I understand this correctly:

The original bug: we execute
NestedThreadsListHandleInErrorHandlingTest, which is supposed to
controlled-crash the VM and find SMR related info in hs-err file. But
during error reporting, ThreadsSMRSupport::print_info_on() crashes,
since internal SMR data structures are modified concurrently and
ThreadsSMRSupport::print_info_on() does not lock. The solution was to
unconditionally, in VMError::controlled_crash(), grab the
Threads_lock.

David spotted one path where this may be called recursively: when we
test secondary signal handling, we call VMError::controlled_crash()
from within the error handler. In that case, the second call to
VMError::controlled_crash() will now, instead of crashing, trigger an
assert, since we already own Threads_lock.

This did not trip up SecondaryErrorTest.java since it was supposed to
crash. Now it crashes with an assert, not a secondary signal. Which
quietly castrates the test, btw - the reason we contributed this as a
regression for running into blocked signals during signal handling,
which is a bad thing. Converting this second signal to assert makes
the test ineffective.

So far correct?

---

Yes, the secondary signal test is too coarse. In our port, we have
embellished the "Error occurred during error reporting" message and
print out the signal number and pc. But we have not yet brought that
improvement upstream.

I think your patch is okay.

But now I wonder if it would be cleaner to try-lock-if-not-owned
inside ThreadsSMRSupport::print_info_on() ? Or its caller,
Thread::print_on_error()? Or above that, around that call in that one
STEP inside VMError::report() ?

Thanks, Thomas


On Tue, Jun 26, 2018 at 6:17 PM, Daniel D. Daugherty
<daniel.daugherty at oracle.com> wrote:
> Thanks for the quick review Serguei!
>
> One more review would be nice... David H. or Thomas Stüfe?
>
>
> On 6/26/18 12:12 PM, serguei.spitsyn at oracle.com wrote:
>>
>> Hi Dan,
>>
>> The fix looks reasonable to me.
>
>
> Thanks. I like simple one liners... :-)
>
>
>> Nice catch by David!
>
>
> Definitely.
>
> Dan
>
>
>
>>
>> Thanks,
>> Serguei
>>
>>
>> On 6/26/18 06:06, Daniel D. Daugherty wrote:
>>>
>>> Greetings,
>>>
>>> For this one liner review, I'd love to hear from David H., Thomas Stüfe,
>>> and Serguei Spitsyn.
>>>
>>> David Holmes spotted a locking problem with my fix for the following bug:
>>>
>>>     JDK-8205195 NestedThreadsListHandleInErrorHandlingTest fails because
>>>                 hs_err doesn't contain _nested_thread_list_max
>>>     https://bugs.openjdk.java.net/browse/JDK-8205195
>>>
>>>     Webrev URL:
>>> http://cr.openjdk.java.net/~dcubed/8205195-webrev/0-for-jdk-jdk/
>>>
>>> The fix for the issue is a one liner that only grabs the Threads_lock
>>> when the caller does not already own it:
>>>
>>> $ hg diff
>>> diff -r 5698cf4e50f1 src/hotspot/share/utilities/vmError.cpp
>>> --- a/src/hotspot/share/utilities/vmError.cpp Fri Jun 22 12:15:16 2018
>>> -0400
>>> +++ b/src/hotspot/share/utilities/vmError.cpp Mon Jun 25 21:20:52 2018
>>> -0400
>>> @@ -1703,7 +1703,7 @@
>>>    // from racing with Threads::add() or Threads::remove() as we
>>>    // generate the hs_err_pid file. This makes our ErrorHandling tests
>>>    // more stable.
>>> - MutexLockerEx ml(Threads_lock, Mutex::_no_safepoint_check_flag);
>>> + MutexLockerEx ml(Threads_lock->owned_by_self() ? NULL : Threads_lock,
>>> Mutex::_no_safepoint_check_flag);
>>>
>>>    switch (how) {
>>>      case 1: vmassert(str == NULL, "expected null"); break;
>>>
>>>
>>> Figuring out a way to detect the failure mode in order to test the
>>> fix was much more involved than the fix itself. Gory details are
>>> in the bug:
>>>
>>>     JDK-8205648 fix for 8205195 breaks secondary error handling
>>>     https://bugs.openjdk.java.net/browse/JDK-8205648
>>>
>>> No webrev since the one liner diff is shown above. This fix was tested
>>> with the test/hotspot/jtreg/runtime/ErrorHandling tests on Linux-X64
>>> and with a special version of ErrorHandling/SecondaryErrorTest.java
>>> that is documented in JDK-8205648.
>>>
>>> Thanks, in advance, for any comments, questions or suggestions.
>>>
>>> Dan
>>
>>
>


More information about the hotspot-runtime-dev mailing list