RFR(s): 8065895: Synchronous signals during error reporting may terminate or hang VM process

Thu Jan 29 13:47:34 UTC 2015

Hi David,

Thank you. Here is the new patch:

http://cr.openjdk.java.net/~stuefe/webrevs/8065895/webrev.06/webrev/

I just corrected the typos and removed trailing whitespace.

Kind Regards, Thomas

On Thu, Jan 29, 2015 at 8:47 AM, David Holmes <david.holmes at oracle.com>
wrote:

> Hi Thomas,
>
> Sorry for the delay in getting back to this. Looks good - only a few typos:
>
> debug.hpp:
>
> + // 15 - SIGILL
>
> should be SIGFPE.
>
> vmError_<os>.cpp:
>
> + // Note that in the hotspot
>
> Delete 'the'.
>
> + // hang or be ended immediately if secondary errors happend during error
>
> s/ended/terminated/
>
> s/happend/happen/
>
> Thanks,
> David
>
> On 26/01/2015 8:23 PM, Thomas Stüfe wrote:
>
>> Anyone? I would really like to close this issue.
>>
>> Kind Regards,
>>
>> Thomas Stüfe
>>
>> On Wed, Jan 21, 2015 at 10:39 AM, Thomas Stüfe <thomas.stuefe at gmail.com>
>> wrote:
>>
>>     Hi all,
>>
>>     I would like to take up discussion about this issue again.
>>
>>     As a reminder, here is the bug report:
>>     https://bugs.openjdk.java.net/browse/JDK-8065895
>>
>>     And here is a new version of my patch:
>>     http://cr.openjdk.java.net/~stuefe/webrevs/8065895/webrev.05/webrev/
>>
>>     As far as I remember the discussion, the issue itself was understood
>>     and the fix itself met with approval (I think), but the point of
>>     contention was the code I added to reproduce the error for
>>     regression tests.
>>
>>     In order to reproduce the error, I need two different synchronous
>>     signals to happen, one in normal code, one in the error handler
>>     which writes the hs-err file. Originally I chose SIGSEGV and
>>     SIGILL. I added functions to generate (true, real) SIGSEGVs and
>>     SIGILLs. But nobody liked my generate-sigill function, therefore I
>>     changed the code to generate a SIGFPE instead. For the test, it does
>>     not matter which signals are generated as long as they are
>>     synchronous (that is, one of SEGV, ILL, BUS, FPE).
>>
>>     In order to check that the fix works, one can do:
>>
>>     java -XX:ErrorHandlerTest=14 -XX:TestCrashInErrorHandler=15
>>
>>     The VM will first crash with a SIGSEGV, enter error handling, and
>>     crash again with a different synchronous signal (FPE, in this case).
>>     An unfixed VM will die immediately and we get a torn hs-err file. A
>>     fixed VM will show the "error occurred during error handling" string
>>     and continue with the error reporting.
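
To illustrate the mechanism outside the VM, here is a minimal POSIX sketch.
It is not the patch itself. Assumptions: all names (run_double_fault,
crash_handler, g_bad_page, ...) are hypothetical; both faults are generated
by writing to a PROT_NONE page (so the secondary signal is the same kind as
the first, not SIGFPE, because triggering SIGFPE is hardware dependent); and
the handler is installed with the synchronous signals in sa_mask to model an
error handler that runs with them blocked. The "fixed" variant unblocks them
on entry to the handler.

```cpp
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical names throughout; this is an illustration, not HotSpot code.
static volatile char* g_bad_page = nullptr;   // PROT_NONE page: any write faults
static volatile sig_atomic_t g_depth = 0;     // how often the handler was entered
static volatile sig_atomic_t g_unblock = 0;   // 1 = emulate the "fixed" handler

static void fill_sync_signals(sigset_t* set) {
  sigemptyset(set);
  sigaddset(set, SIGSEGV);
  sigaddset(set, SIGBUS);
  sigaddset(set, SIGFPE);
  sigaddset(set, SIGILL);
}

static void crash_handler(int) {
  g_depth = g_depth + 1;
  if (g_depth > 1) {
    _exit(42);  // secondary fault was delivered to us: "error during error handling"
  }
  if (g_unblock) {
    // Unblock the synchronous signals before doing any error reporting.
    sigset_t set;
    fill_sync_signals(&set);
    sigprocmask(SIG_UNBLOCK, &set, nullptr);  // child process is single-threaded
  }
  g_bad_page[0] = 1;  // secondary fault inside the "error handler"
  _exit(1);           // not expected to be reached
}

// Runs the double fault in a child process.
// Returns the child's exit code, or -(signal number) if a signal killed it.
int run_double_fault(bool unblock_in_handler) {
  pid_t pid = fork();
  if (pid < 0) return -1000;
  if (pid == 0) {
    alarm(5);  // safety net in case the platform hangs instead of terminating
    g_unblock = unblock_in_handler ? 1 : 0;
    void* p = mmap(nullptr, static_cast<size_t>(sysconf(_SC_PAGESIZE)),
                   PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) _exit(2);
    g_bad_page = static_cast<volatile char*>(p);

    struct sigaction sa = {};
    sa.sa_handler = crash_handler;
    sa.sa_flags = 0;
    fill_sync_signals(&sa.sa_mask);  // handler runs with these signals blocked
    sigaction(SIGSEGV, &sa, nullptr);
    sigaction(SIGBUS, &sa, nullptr);  // some systems report SIGBUS here

    g_bad_page[0] = 1;  // primary fault in "normal code"
    _exit(3);           // not expected to be reached
  }
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) return -1000;
  if (WIFEXITED(status)) return WEXITSTATUS(status);
  if (WIFSIGNALED(status)) return -WTERMSIG(status);
  return -1000;
}
```

Calling run_double_fault(true) should return 42, because the handler is
re-entered for the secondary fault. With run_double_fault(false) the child
is expected to be killed (or, on some platforms, to hang until the alarm
fires), which corresponds to the torn hs-err file described above.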
>>
>>     Kind Regards,
>>
>>     Thomas Stüfe
>>
>>
>>
>>     On Mon, Dec 8, 2014 at 11:37 AM, Thomas Stüfe
>>     <thomas.stuefe at gmail.com> wrote:
>>
>>         Hi David, Dean,
>>
>>         On Fri, Dec 5, 2014 at 8:05 AM, David Holmes
>>         <david.holmes at oracle.com> wrote:
>>
>>             On 3/12/2014 8:47 PM, Thomas Stüfe wrote:
>>
>>                 Hi Dean,
>>
>>                 I don't understand. Such a function does not exist, does
>>                 it? So I would have to write it.
>>
>>                 Do you mean generating and using a StubRoutine which
>>                 would SIGILL? I did not do this because I wanted to be
>>                 able to generate SIGILL also in initialization code,
>>                 where StubRoutines may not yet be generated. This point
>>                 may be arguable, but as this function is used to test
>>                 error handling, it may be interesting to test it for
>>                 half-initialized VMs too.
>>
>>                 Otherwise I would probably implement the CPU-specific
>>                 generate_illegal_instruction_sequence() the same way as
>>                 I now do the crash_with_sigill() function. That would
>>                 mean a bit more code duplication because:
>>                 - Either I use the method I use now (reserve_memory and
>>                 copy the instructions to the reserved page)
>>                 - Or I use inline assembly, which probably does not work
>>                 across multiple OSs, so for CPUs which span various OSs
>>                 I would have to add one function per os_cpu combination,
>>                 not just per CPU.
>>
>>
>>             I don't think there is any OS dependency with inline
>>             assembly - only compiler. And I am also concerned that
>>             writing code to an executable page will also enter the realm
>>             of "self-modifying code" and all the jumping through hoops
>>             that entails. That aspect hadn't occurred to me till Dean
>>             raised it. I'm forming the view that triggering a SIGILL is
>>             more effort than it is worth for a secondary testing function.
>>
>>
>>         Well, the code has been used in our VM for some years on a
>>         number of CPUs and works, so the problem with the flushing does
>>         not occur, at least in our cases. But I agree with you: this
>>         seems to be a point of contention, and it is really too
>>         unimportant to stop the whole patch.
>>
>>         The whole point of using SIGILL was to have another unblockable
>>         signal besides SIGSEGV to occur naturally (without raising) to
>>         be able to demonstrate the bug before fixing it. I will now
>>         attempt to change the patch to use either SIGFPE or SIGBUS as a
>>         secondary signal. Maybe generating those signals with pure C/C++
>>         is easier. If that does not work out, I will see what I can do
>>         with raise().
>>
>>         Thanks and Kind regards, Thomas
>>
>>
>>
>>
>>