RFR(s): 8065895: Synchronous signals during error reporting may terminate or hang VM process

Wed Jan 21 09:39:03 UTC 2015

Hi all,

I would like to take up discussion about this issue again.

As a reminder, here the bug report:
https://bugs.openjdk.java.net/browse/JDK-8065895

And here a new version of my patch:
http://cr.openjdk.java.net/~stuefe/webrevs/8065895/webrev.05/webrev/

As far as I remember the discussion, the issue itself was understood and
the fix itself met with approval (I think), but the point of contention was
the code I added to reproduce the error for regression tests.

In order to reproduce the error, I need two different synchronous signals
to happen, one in normal code, one in the error handler which writes the
hs-err file. Originally I choose SIGSEGV and SIGILL. I added functions to
generate (true, real) SIGSEGVs and SIGILLs. But nobody liked my
generate-sigill function, therefore I changed the code to generate a SIGFPE
instead. For the test, it does not matter which signals are generated as
long as they are synchronous (so, one of SEGV,ILL,BUS,FPE).

In order to check that the fix works, one can do:

java -XX:ErrorHandlerTest=15 -XX:TestCrashInErrorHandler=14

The VM will first crash with a SIGSEGV, enter error handling, crash again
with a different synchronous signal (FPE, in this case). An unfixed VM will
die immediately and we get a torn hs-err file. A fixed VM will show the
"error occurred during error handling" string and continue with the error
reporting.

Kind Regards,

Thomas Stüfe

On Mon, Dec 8, 2014 at 11:37 AM, Thomas Stüfe <thomas.stuefe at gmail.com>
wrote:

> Hi David, Dean,
>
> On Fri, Dec 5, 2014 at 8:05 AM, David Holmes <david.holmes at oracle.com>
> wrote:
>
>> On 3/12/2014 8:47 PM, Thomas Stüfe wrote:
>>
>>> Hi Dean,
>>>
>>> I dont understand. Such a function does not exist, does it? So I would
>>> have to write it:
>>>
>>> Do you mean generating and using a StubRoutine which would SIGILL? I did
>>> not do this because I wanted to be able to generate SIGILL also in
>>> initialization code, where StubRoutines may not yet be generated. This
>>> point may may be arguable, but as this function is used to test error
>>> handling, it may be interesting to test it for half-initialized VMs too.
>>>
>>> Otherwise I would implement the CPU specific
>>> generate_illegal_instruction___sequence() probably the same way as I do
>>> now the crash_with_sigill() function. That would mean a bit of more code
>>> duplication because:
>>> - Either I use the method I use now (reserve_memory and copy the
>>> instructions to the reserved page)
>>> - Or I use inline assembly - which probably does not work across
>>> multiple OSs, so for CPUs which span various OSs I would have to add one
>>> function per os_cpu combination, not just per cpu.
>>>
>>
>> I don't think there is any OS dependency with inline assembly - only
>> compiler. And I am also concerned that writing code to an executable page
>> will also enter the realm of "self-modifying code" and all the jumping
>> through hoops that entails. That aspect hadn't occurred to me till Dean
>> raised it. I'm forming the view that triggering a SIGILL is more effort
>> than it is worth for a secondary testing function.
>>
>>
> Well, the code is used and works in our VM since some years on a number of
> CPUs, so the problem with the flushing do not occur at least in our cases.
> But I agree with you, and this seems to be a point of contention and it is
> really too unimportant to stop the whole patch.
>
> The whole point of using SIGILL was to have another unblockable signal
> besides SIGSEGV to occur naturally (without raising) to be able to
> demonstrate the bug before fixing it. I will now attempt to change the
> patch to use either SIGFPE or SIGBUS as a secondary signal. Maybe
> generating those signals with pure C/C++ is easier. If that does not work
> out, I will see what I can do with raise().
>
> Thanks and Kind regards, Thomas
>
>
>