RFR(s): 8065895: Synchronous signals during error reporting may terminate or hang VM process
Gerard Ziemski
gerard.ziemski at oracle.com
Tue Feb 3 21:15:19 UTC 2015
Looks good! Thank you for a great fix!
cheers
On 2/3/2015 10:40 AM, Thomas Stüfe wrote:
> Hi Gerard,
>
> thank you for your feedback! See here my new change:
> http://cr.openjdk.java.net/~stuefe/webrevs/8065895/webrev.07/webrev/
> <http://cr.openjdk.java.net/%7Estuefe/webrevs/8065895/webrev.07/webrev/>
>
> On Mon, Feb 2, 2015 at 4:19 PM, Gerard Ziemski
> <gerard.ziemski at oracle.com <mailto:gerard.ziemski at oracle.com>> wrote:
>
> hi Thomas,
>
> On 2/1/2015 9:56 AM, Thomas Stüfe wrote:
>> On Fri, Jan 30, 2015 at 11:23 PM, Gerard Ziemski
>> <gerard.ziemski at oracle.com <mailto:gerard.ziemski at oracle.com>> wrote:
>>
>> hi Thomas,
>>
>> #1 The following comment in vmError_bsd.cpp,
>> vmError_linux.cpp and vmError_solaris.cpp is now outdated and
>> should be updated from:
>>
>> // handle all synchronous program error signals which may
>> happen during error
>> // reporting. They must be unblocked, caught, handled.
>> //
>> // Note that in hotspot signal handlers are installed with a
>> fully filled
>> // signal mask, i.e. upon delivery all signals - also
>> synchronous error signals
>> // - are blocked. For synchronous error signals this is
>> deadly - program will
>> // hang or be terminated immediately if secondary errors
>> happen during error
>> // reporting.
>>
>> to the comment from vmError_aix.cpp:
>>
>> // Handle all synchronous signals which may happen during
>> signal handling,
>> // not just SIGSEGV and SIGBUS.
>>
>>
>> I am not sure I understand: the problem is that signal handlers
>> are installed in the hotspot with sa_mask completely set, so all
>> signals are blocked. See e.g. os_linux.cpp, line 4304++
>> (os::Linux::set_signal_handler()). So the comment appears to be
>> correct.
> We continue to block the signals initially, but the comment, or at
> least its last sentence, does not apply anymore because of the
> crux of your fix - ie. the program will not hang because we
> immediately unblock the synchronous traps in the crash handler. So
> maybe we should make that clear in that comment?
>
>
> Okay, I see that now. I modified the comments as you suggested earlier.
>
>> POSIX itself mentioned only the four, see documentation on
>> sigprocmask:
>>
>> "If any of the SIGFPE, SIGILL, SIGSEGV, or SIGBUS signals are
>> generated while they are blocked, the result is undefined, unless
>> the signal was generated by the /kill/()
>> <http://pubs.opengroup.org/onlinepubs/009695399/functions/kill.html> function,
>> the /sigqueue/()
>> <http://pubs.opengroup.org/onlinepubs/009695399/functions/sigqueue.html> function,
>> or the /raise/()
>> <http://pubs.opengroup.org/onlinepubs/009695399/functions/raise.html> function."
>>
>> I added SIGTRAP on AIX because it is synchronously delivered too,
>> but do not know any other signals. I am happy to add any other
>> signals you know of.
>>
> Nope, I do not know any more synchronous signals - just wanted to
> make sure we take a second look and collectively agree that this
> list is exhaustive.
>
>> I would like to test it at least twice, to see that signal
>> handling continues to work after secondary handler ran once. It
>> needs to correctly unset the signal mask for the synchronous
>> signals.
>>
>> Executing it four times may be a bit excessive; if you prefer, I
>> can remove it for the out_done case.
>>
>> Note that there is one benefit of repeatedly crashing in the
>> error handler, it is to stress the stack usage during error
>> reporting. But I admit that this test could be done much better.
>>
> It's a bit confusing at first when someone debugs this code and
> sees four of those secondary signals coming through, so maybe just
> a small comment here would suffice?
>
>
> I changed the coding to only fire for the log case. That should be
> less confusing.
>
>> vmerrors.sh is very minimal. The bug symptom was incomplete or
>> empty hs-err files, and you would not catch this with the
>> vmerrors.sh test, would you not? You also can reproduce this bug
>> only by using both -XX:ErrorHandlerTest and
>> -XX:TestCrashInErrorHandler together, so it does not quite fit
>> into what vmerrors.sh does.
>>
>> At SAP we have a nice test suite which crashes the VM in a number
>> of ways and then checks whether the hs-err files are complete
>> (enough). I was planning to add a Jtreg test which does something
>> similar, but did not add it to this change because I did not want
>> to make the change larger than necessary.
> I would think that the added complexity of the test case, would in
> fact make it easier to review the test with the fix at the same
> time. Also, since the test in this case is feasible, isn't it
> recommended to have the fix committed along with its test?
>
>
> I added a small jtreg test. Note that the test cannot really
> demonstrate the bug without my fix, because it needs the new
> -XX:TestCrashInErrorHandler options which is part of the fix. But it
> will show regressions.
>
> cheers
>
>
> Best Regards, Thomas
>
More information about the hotspot-runtime-dev
mailing list