RFR(s): 8065895: Synchronous signals during error reporting may terminate or hang VM process

David Holmes david.holmes at oracle.com
Wed Feb 4 10:18:58 UTC 2015


On 4/02/2015 8:15 PM, Thomas Stüfe wrote:
> Great, thank you!
>
> @David: Do you still need something from me to sponsor the change?

No I think I have everything I need. Will push it in the morning.

You'll be listed as Contributor as you don't have Author status for jdk9 
project.

Thanks,
David

> Kind Regards, Thaoms
>
> On Tue, Feb 3, 2015 at 10:15 PM, Gerard Ziemski
> <gerard.ziemski at oracle.com <mailto:gerard.ziemski at oracle.com>> wrote:
>
>     Looks good! Thank you for a great fix!
>
>     cheers
>
>
>     On 2/3/2015 10:40 AM, Thomas Stüfe wrote:
>>     Hi Gerard,
>>
>>     thank you for your feedback! See here my new change:
>>     http://cr.openjdk.java.net/~stuefe/webrevs/8065895/webrev.07/webrev/
>>     <http://cr.openjdk.java.net/%7Estuefe/webrevs/8065895/webrev.07/webrev/>
>>
>>
>>     On Mon, Feb 2, 2015 at 4:19 PM, Gerard Ziemski
>>     <gerard.ziemski at oracle.com <mailto:gerard.ziemski at oracle.com>> wrote:
>>
>>         hi Thomas,
>>
>>         On 2/1/2015 9:56 AM, Thomas Stüfe wrote:
>>>         On Fri, Jan 30, 2015 at 11:23 PM, Gerard Ziemski
>>>         <gerard.ziemski at oracle.com
>>>         <mailto:gerard.ziemski at oracle.com>> wrote:
>>>
>>>             hi Thomas,
>>>
>>>             #1 The following comment in vmError_bsd.cpp,
>>>             vmError_linux.cpp and vmError_solaris.cpp is now outdated
>>>             and should be updated from:
>>>
>>>             // handle all synchronous program error signals which may
>>>             happen during error
>>>             // reporting. They must be unblocked, caught, handled.
>>>             //
>>>             // Note that in hotspot signal handlers are installed
>>>             with a fully filled
>>>             // signal mask, i.e. upon delivery all signals - also
>>>             synchronous error signals
>>>             // - are blocked. For synchronous error signals this is
>>>             deadly - program will
>>>             // hang or be terminated immediately if secondary errors
>>>             happen during error
>>>             // reporting.
>>>
>>>             to the comment from vmError_aix.cpp:
>>>
>>>             // Handle all synchronous signals which may happen during
>>>             signal handling,
>>>             // not just SIGSEGV and SIGBUS.
>>>
>>>
>>>         I am not sure I understand: the problem is that signal
>>>         handlers are installed in the hotspot with sa_mask completely
>>>         set, so all signals are blocked. See e.g. os_linux.cpp, line
>>>         4304++ (os::Linux::set_signal_handler()). So the comment
>>>         appears to be correct.
>>         We continue to block the signals initially, but the comment,
>>         or at least its last sentence, does not apply anymore because
>>         of the crux of your fix - ie. the program will not hang
>>         because we immediately unblock the synchronous traps in the
>>         crash handler. So maybe we should make that clear in that comment?
>>
>>
>>     Okay, I see that now. I modified the comments as you suggested
>>     earlier.
>>
>>>         POSIX itself mentioned only the four, see documentation on
>>>         sigprocmask:
>>>
>>>         "If any of the SIGFPE, SIGILL, SIGSEGV, or SIGBUS signals are
>>>         generated while they are blocked, the result is undefined,
>>>         unless the signal was generated by the /kill/()
>>>         <http://pubs.opengroup.org/onlinepubs/009695399/functions/kill.html> function,
>>>         the /sigqueue/()
>>>         <http://pubs.opengroup.org/onlinepubs/009695399/functions/sigqueue.html> function,
>>>         or the /raise/()
>>>         <http://pubs.opengroup.org/onlinepubs/009695399/functions/raise.html> function."
>>>
>>>         I added SIGTRAP on AIX because it is synchronously delivered
>>>         too, but do not know any other signals. I am happy to add any
>>>         other signals you know of.
>>>
>>         Nope, I do not know any more synchronous signals - just wanted
>>         to make sure we take a second look and collectively agree that
>>         this list is exhaustive.
>>
>>>         I would like to test it at least twice, to see that signal
>>>         handling continues to work after secondary handler ran once.
>>>         It needs to correctly unset the signal mask for the
>>>         synchronous signals.
>>>
>>>         Executing it four times may be a bit excessive; if you
>>>         prefer, I can remove it for the out_done case.
>>>
>>>         Note that there is one benefit of repeatedly crashing in the
>>>         error handler, it is to stress the stack usage during error
>>>         reporting. But I admit that this test could be done much better.
>>>
>>         It's a bit confusing at first when someone debugs this code
>>         and sees four of those secondary signals coming through, so
>>         maybe just a small comment here would suffice?
>>
>>
>>     I changed the coding to only fire for the log case. That should be
>>     less confusing.
>>
>>>         vmerrors.sh is very minimal. The bug symptom was incomplete
>>>         or empty hs-err files, and you would not catch this with the
>>>         vmerrors.sh test, would you not? You also can reproduce this
>>>         bug only by using both -XX:ErrorHandlerTest and
>>>         -XX:TestCrashInErrorHandler together, so it does not quite
>>>         fit into what vmerrors.sh does.
>>>
>>>         At SAP we have a nice test suite which crashes the VM in a
>>>         number of ways and then checks whether the hs-err files are
>>>         complete (enough). I was planning to add a Jtreg test which
>>>         does something similar, but did not add it to this change
>>>         because I did not want to make the change larger than necessary.
>>         I would think that the added complexity of the test case,
>>         would in fact make it easier to review the test with the fix
>>         at the same time. Also, since the test in this case is
>>         feasible, isn't it recommended to have the fix committed along
>>         with its test?
>>
>>
>>     I added a small jtreg test. Note that the test cannot really
>>     demonstrate the bug without my fix, because it needs the new
>>     -XX:TestCrashInErrorHandler options which is part of the fix. But
>>     it will show regressions.
>>
>>         cheers
>>
>>
>>     Best Regards, Thomas
>>
>
>


More information about the hotspot-runtime-dev mailing list