RFR(s): 8065895: Synchronous signals during error reporting may terminate or hang VM process

David Holmes david.holmes at oracle.com
Thu Jan 29 07:47:33 UTC 2015


Hi Thomas,

Sorry for the delay in getting back to this. Looks good - only a few typos:

debug.hpp:

+ // 15 - SIGILL

should be SIGFPE.

vmError_<os>.cpp:

+ // Note that in the hotspot

Delete 'the'.

+ // hang or be ended immediately if secondary errors happend during error

s/ended/terminated/

s/happend/happen/

Thanks,
David

On 26/01/2015 8:23 PM, Thomas Stüfe wrote:
> Anyone? I would really like to close this issue.
>
> Kind Regards,
>
> Thomas Stüfe
>
> On Wed, Jan 21, 2015 at 10:39 AM, Thomas Stüfe <thomas.stuefe at gmail.com> wrote:
>
>     Hi all,
>
>     I would like to take up the discussion of this issue again.
>
>     As a reminder, here is the bug report:
>     https://bugs.openjdk.java.net/browse/JDK-8065895
>
>     And here is a new version of my patch:
>     http://cr.openjdk.java.net/~stuefe/webrevs/8065895/webrev.05/webrev/
>
>     As far as I remember the discussion, the issue itself was understood
>     and the fix met with approval (I think); the point of contention was
>     the code I added to reproduce the error for regression tests.
>
>     In order to reproduce the error, I need two different synchronous
>     signals to occur: one in normal code and one in the error handler
>     that writes the hs-err file. Originally I chose SIGSEGV and
>     SIGILL, and I added functions to generate real (not raised) SIGSEGVs
>     and SIGILLs. But nobody liked my generate-SIGILL function, so I
>     changed the code to generate a SIGFPE instead. For the test, it does
>     not matter which signals are generated as long as they are
>     synchronous (i.e. one of SEGV, ILL, BUS, FPE).
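A minimal sketch of generating a real SIGFPE from plain C++, assuming a POSIX system. provoke_sigfpe() and catch_sigfpe() are hypothetical names, not code from the webrev. Integer division by zero traps on x86 but not on every CPU (AArch64 and PowerPC produce a result instead of trapping), so the sketch falls back to raise(SIGFPE). A raised signal is not a true hardware fault, though: if it arrives while blocked it simply stays pending, which is why the test needs naturally occurring signals.

```cpp
// Hypothetical helpers (not from the webrev): provoke a synchronous SIGFPE.
#include <cassert>
#include <setjmp.h>
#include <signal.h>

static sigjmp_buf g_fpe_env;
static volatile sig_atomic_t g_fpe_sig = 0;     // signal number seen by the handler
static volatile sig_atomic_t g_fpe_native = 0;  // 1 if the CPU trapped on its own

static void on_fpe(int sig, siginfo_t* info, void*) {
  g_fpe_sig = sig;
  g_fpe_native = (info->si_code == FPE_INTDIV) ? 1 : 0;
  siglongjmp(g_fpe_env, 1);  // never return into the faulting instruction
}

static void provoke_sigfpe() {
  volatile int zero = 0;
  volatile int result = 1 / zero;  // integer divide by zero: traps on x86,
  (void)result;                    // but not on e.g. AArch64 or PowerPC
  raise(SIGFPE);                   // fallback: only reached if no trap occurred
}

// Returns the signal number caught while running provoke_sigfpe().
int catch_sigfpe() {
  struct sigaction sa {};
  sa.sa_sigaction = on_fpe;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset(&sa.sa_mask);
  struct sigaction old {};
  sigaction(SIGFPE, &sa, &old);
  g_fpe_sig = 0;
  if (sigsetjmp(g_fpe_env, 1) == 0) {
    provoke_sigfpe();
  }
  sigaction(SIGFPE, &old, nullptr);  // restore the previous disposition
  return g_fpe_sig;
}
```

Checking si_code distinguishes a genuine hardware trap (FPE_INTDIV) from the raise() fallback, so a test could report which kind of signal it actually exercised.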
>
>     In order to check that the fix works, one can run:
>
>     java -XX:ErrorHandlerTest=15 -XX:TestCrashInErrorHandler=14
>
>     The VM will first crash with a SIGSEGV, enter error handling, and
>     then crash again with a different synchronous signal (FPE, in this
>     case). An unfixed VM dies immediately, leaving a truncated hs-err
>     file. A fixed VM prints the "error occurred during error handling"
>     string and continues with error reporting.
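The failure mode and the general shape of the fix can be modelled in a small POSIX program. This is a sketch under stated assumptions, not the webrev's code: the crash handler is assumed to run with all signals blocked (sigfillset on sa_mask), it unblocks SIGSEGV, SIGBUS, SIGILL and SIGFPE before "reporting", and it recovers from a secondary fault with siglongjmp so that only the failing reporting step is skipped. All names are hypothetical.

```cpp
// Hypothetical model (not HotSpot code): survive a secondary synchronous
// signal that hits while the primary crash is still being reported.
#include <cassert>
#include <cstdio>
#include <pthread.h>
#include <setjmp.h>
#include <signal.h>

static sigjmp_buf g_exit_env;   // simulated "VM exit" after reporting
static sigjmp_buf g_step_env;   // resume point of the current reporting step
static volatile sig_atomic_t g_reporting = 0;
static volatile sig_atomic_t g_secondary_errors = 0;
static volatile sig_atomic_t g_steps_done = 0;

static void unblock_sync_error_signals() {
  sigset_t set;
  sigemptyset(&set);
  sigaddset(&set, SIGSEGV);
  sigaddset(&set, SIGBUS);
  sigaddset(&set, SIGILL);
  sigaddset(&set, SIGFPE);
  pthread_sigmask(SIG_UNBLOCK, &set, nullptr);
}

static void segfault() {
  volatile int* volatile p = nullptr;  // compiler cannot prove p is null
  *p = 42;                             // real, synchronous SIGSEGV
}

static void crash_handler(int) {
  if (g_reporting) {                       // secondary error during reporting
    g_secondary_errors = g_secondary_errors + 1;
    siglongjmp(g_step_env, 1);             // abandon only the failing step
  }
  g_reporting = 1;
  // sa_mask blocked every signal on entry. Without this call, the nested
  // SIGSEGV below would arrive while blocked and Linux would kill the process.
  unblock_sync_error_signals();
  for (volatile int step = 0; step < 3; step = step + 1) {
    if (sigsetjmp(g_step_env, 1) == 0) {
      if (step == 1) segfault();           // this reporting step crashes
      std::printf("step %d ok\n", step);   // not async-signal-safe; demo only
    } else {
      std::printf("[error occurred during error reporting, step %d]\n", step);
    }
    g_steps_done = g_steps_done + 1;
  }
  siglongjmp(g_exit_env, 1);               // never return to the faulting code
}

// Crashes once, then returns how many reporting steps completed.
int run_demo() {
  g_reporting = 0;
  g_secondary_errors = 0;
  g_steps_done = 0;
  struct sigaction sa {};
  sa.sa_handler = crash_handler;
  sigfillset(&sa.sa_mask);                 // block everything inside the handler
  struct sigaction old {};
  sigaction(SIGSEGV, &sa, &old);
  if (sigsetjmp(g_exit_env, 1) == 0) {
    segfault();                            // primary crash
  }
  sigaction(SIGSEGV, &old, nullptr);
  return g_steps_done;
}
```

Commenting out the unblock_sync_error_signals() call models the unfixed VM: on Linux the nested SIGSEGV is delivered while blocked, and the kernel terminates the process instead of running the handler, so steps 1 and 2 never complete.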
>
>     Kind Regards,
>
>     Thomas Stüfe
>
>
>
>     On Mon, Dec 8, 2014 at 11:37 AM, Thomas Stüfe
>     <thomas.stuefe at gmail.com> wrote:
>
>         Hi David, Dean,
>
>         On Fri, Dec 5, 2014 at 8:05 AM, David Holmes
>         <david.holmes at oracle.com> wrote:
>
>             On 3/12/2014 8:47 PM, Thomas Stüfe wrote:
>
>                 Hi Dean,
>
>                 I don't understand. Such a function does not exist, does
>                 it? So I would have to write it.
>
>                 Do you mean generating and using a StubRoutine which
>                 would trigger SIGILL? I did not do this because I wanted
>                 to be able to generate SIGILL in initialization code as
>                 well, where StubRoutines may not yet be generated. This
>                 point may be arguable, but as this function is used to
>                 test error handling, it may be interesting to test it
>                 for half-initialized VMs too.
>
>                 Otherwise I would probably implement the CPU-specific
>                 generate_illegal_instruction_sequence() the same way I
>                 now implement crash_with_sigill(). That would mean a bit
>                 more code duplication, because:
>                 - Either I use the method I use now (reserve_memory and
>                 copy the instructions to the reserved page)
>                 - Or I use inline assembly, which probably does not
>                 work across multiple OSs, so for CPUs which span various
>                 OSs I would have to add one function per os_cpu
>                 combination, not just per cpu.
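The first option (reserve a page and copy the instruction bytes into it) might look roughly like this, assuming x86/x86-64, GCC or Clang, and a POSIX system with mmap/mprotect. The function name is hypothetical and this is not the actual crash_with_sigill(). The __builtin___clear_cache call addresses the instruction-cache concern raised in this thread: it compiles to nothing on x86 but is required on CPUs whose instruction caches are not coherent with data writes.

```cpp
// Hypothetical sketch (not the webrev's crash_with_sigill()): copy an
// illegal instruction into a fresh page and execute it. x86 only.
#include <cassert>
#include <cstring>
#include <setjmp.h>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf g_ill_env;
static volatile sig_atomic_t g_ill_sig = 0;

static void on_ill(int sig) {
  g_ill_sig = sig;
  siglongjmp(g_ill_env, 1);  // do not return into the illegal instruction
}

// Returns the caught signal, 0 if none arrived, or -1 if unsupported/failed.
int run_illegal_instruction_page() {
#if defined(__x86_64__) || defined(__i386__)
  const unsigned char kUd2[] = {0x0F, 0x0B};  // x86 UD2: always raises #UD
  const size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
  void* mem = mmap(nullptr, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED) return -1;
  std::memcpy(mem, kUd2, sizeof kUd2);
  // Write first, then flip the page to executable (W^X friendly).
  if (mprotect(mem, page, PROT_READ | PROT_EXEC) != 0) {
    munmap(mem, page);
    return -1;
  }
  // Sync the I-cache: a no-op on x86, required on e.g. ARM or PowerPC.
  char* code = static_cast<char*>(mem);
  __builtin___clear_cache(code, code + sizeof kUd2);

  struct sigaction sa {};
  sa.sa_handler = on_ill;
  sigemptyset(&sa.sa_mask);
  struct sigaction old {};
  sigaction(SIGILL, &sa, &old);
  g_ill_sig = 0;
  if (sigsetjmp(g_ill_env, 1) == 0) {
    reinterpret_cast<void (*)()>(mem)();  // executes UD2 -> SIGILL
  }
  sigaction(SIGILL, &old, nullptr);
  munmap(mem, page);
  return g_ill_sig;
#else
  return -1;  // other CPUs would need their own instruction bytes
#endif
}
```

Each new CPU needs its own byte sequence and cache handling, which is the per-CPU duplication the mail describes.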
>
>
>             I don't think there is any OS dependency with inline
>             assembly, only a compiler dependency. I am also concerned
>             that writing code to an executable page enters the realm
>             of "self-modifying code" and all the jumping through hoops
>             that entails. That aspect hadn't occurred to me until Dean
>             raised it. I'm forming the view that triggering a SIGILL is
>             more effort than it is worth for a secondary testing function.
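To illustrate the point that inline assembly depends on the compiler and CPU rather than the OS, here is a sketch using GCC/Clang extended-asm syntax (an assumption; MSVC, for one, has no inline assembly on x64). The same branch builds unchanged on Linux, macOS or the BSDs, and because the compiler emits the instruction into the normal text segment, no self-modifying code is involved. The names are hypothetical.

```cpp
// Hypothetical sketch: trigger SIGILL via inline assembly (GCC/Clang syntax).
#include <cassert>
#include <setjmp.h>
#include <signal.h>

static sigjmp_buf g_asm_env;
static volatile sig_atomic_t g_asm_sig = 0;

static void on_sigill(int sig) {
  g_asm_sig = sig;
  siglongjmp(g_asm_env, 1);  // skip past the illegal instruction
}

// Depends only on compiler asm syntax and CPU, not on the OS.
static void execute_illegal_instruction() {
#if defined(__x86_64__) || defined(__i386__)
  __asm__ volatile("ud2");               // x86: UD2 always raises #UD
#elif defined(__aarch64__)
  __asm__ volatile(".inst 0x00000000");  // A64: UDF #0, permanently undefined
#else
  raise(SIGILL);                         // no instruction chosen for this CPU
#endif
}

// Returns the signal caught while executing the illegal instruction.
int catch_sigill_from_asm() {
  struct sigaction sa {};
  sa.sa_handler = on_sigill;
  sigemptyset(&sa.sa_mask);
  struct sigaction old {};
  sigaction(SIGILL, &sa, &old);
  g_asm_sig = 0;
  if (sigsetjmp(g_asm_env, 1) == 0) {
    execute_illegal_instruction();
  }
  sigaction(SIGILL, &old, nullptr);
  return g_asm_sig;
}
```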
>
>
>         Well, the code has been used in our VM for some years and works
>         on a number of CPUs, so the cache-flushing problem does not
>         occur, at least in our cases. But I agree with you: this seems
>         to be a point of contention, and it is really too unimportant to
>         hold up the whole patch.
>
>         The whole point of using SIGILL was to have another synchronous
>         signal besides SIGSEGV that occurs naturally (without raise()),
>         so that I could demonstrate the bug before fixing it. I will now
>         attempt to change the patch to use either SIGFPE or SIGBUS as
>         the secondary signal. Maybe generating those signals with pure
>         C/C++ is easier. If that does not work out, I will see what I
>         can do with raise().
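SIGBUS can also be provoked naturally from C/C++ without raise(): on Linux, touching a mapped page that lies entirely beyond the end of the underlying file raises SIGBUS. A minimal sketch, assuming POSIX mmap and a writable temporary directory for tmpfile(); provoke_sigbus() is a hypothetical name, not code from the webrev.

```cpp
// Hypothetical sketch: a "natural" SIGBUS by touching a page past EOF.
#include <cassert>
#include <cstdio>
#include <setjmp.h>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf g_bus_env;
static volatile sig_atomic_t g_bus_sig = 0;

static void on_sigbus(int sig) {
  g_bus_sig = sig;
  siglongjmp(g_bus_env, 1);  // do not retry the faulting store
}

// Returns the caught signal number, or -1 if the setup failed.
int provoke_sigbus() {
  std::FILE* f = std::tmpfile();  // empty (zero-length) temporary file
  if (f == nullptr) return -1;
  const size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
  void* mem = mmap(nullptr, page, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fileno(f), 0);
  if (mem == MAP_FAILED) {
    std::fclose(f);
    return -1;
  }

  struct sigaction sa {};
  sa.sa_handler = on_sigbus;
  sigemptyset(&sa.sa_mask);
  struct sigaction old {};
  sigaction(SIGBUS, &sa, &old);
  g_bus_sig = 0;
  if (sigsetjmp(g_bus_env, 1) == 0) {
    *static_cast<volatile char*>(mem) = 1;  // page has no file backing: SIGBUS
  }
  sigaction(SIGBUS, &old, nullptr);
  munmap(mem, page);
  std::fclose(f);
  return g_bus_sig;
}
```

Unlike the integer-division approach for SIGFPE, this does not depend on the CPU, only on the kernel's handling of file-backed mappings.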
>
>         Thanks and Kind regards, Thomas
>
>
>
>


More information about the hotspot-runtime-dev mailing list