RFR(s): 8065895: Synchronous signals during error reporting may terminate or hang VM process
David Holmes
david.holmes at oracle.com
Fri Jan 30 03:58:26 UTC 2015
Looks good!
Just need to wait for Gerard to give a second review.
I can sponsor this for you, but it may need to wait till my Monday as it
is already Friday afternoon for me.
Thanks,
David
On 29/01/2015 11:47 PM, Thomas Stüfe wrote:
> Hi David,
>
> thank you. Here is the new patch:
>
> http://cr.openjdk.java.net/~stuefe/webrevs/8065895/webrev.06/webrev/
>
> Just corrected the typos and removed trailing whitespaces.
>
> Kind Regards, Thomas
>
>
> On Thu, Jan 29, 2015 at 8:47 AM, David Holmes <david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>> wrote:
>
> Hi Thomas,
>
> Sorry for the delay in getting back to this. Looks good - only a few
> typos:
>
> debug.hpp:
>
> + // 15 - SIGILL
>
> should be SIGFPE.
>
> vmError_<os>.cpp:
>
> + // Note that in the hotspot
>
> Delete 'the'.
>
> + // hang or be ended immediately if secondary errors happend during
> error
>
> s/terminated/ended/
>
> s/happen/happend/
>
> Thanks,
> David
>
> On 26/01/2015 8:23 PM, Thomas Stüfe wrote:
>
> Anyone? I would really like to close this issue.
>
> Kind Regards,
>
> Thomas Stüfe
>
> On Wed, Jan 21, 2015 at 10:39 AM, Thomas Stüfe
> <thomas.stuefe at gmail.com <mailto:thomas.stuefe at gmail.com>
> <mailto:thomas.stuefe at gmail.__com
> <mailto:thomas.stuefe at gmail.com>>> wrote:
>
> Hi all,
>
> I would like to take up discussion about this issue again.
>
> As a reminder, here the bug report:
> https://bugs.openjdk.java.net/__browse/JDK-8065895
> <https://bugs.openjdk.java.net/browse/JDK-8065895>
>
> And here a new version of my patch:
> http://cr.openjdk.java.net/~__stuefe/webrevs/8065895/webrev.__05/webrev/
> <http://cr.openjdk.java.net/~stuefe/webrevs/8065895/webrev.05/webrev/>
>
> As far as I remember the discussion, the issue itself was
> understood
> and the fix itself met with approval (I think), but the
> point of
> contention was the code I added to reproduce the error for
> regression tests.
>
> In order to reproduce the error, I need two different
> synchronous
> signals to happen, one in normal code, one in the error handler
> which writes the hs-err file. Originally I choose SIGSEGV and
> SIGILL. I added functions to generate (true, real) SIGSEGVs and
> SIGILLs. But nobody liked my generate-sigill function,
> therefore I
> changed the code to generate a SIGFPE instead. For the
> test, it does
> not matter which signals are generated as long as they are
> synchronous (so, one of SEGV,ILL,BUS,FPE).
>
> In order to check that the fix works, one can do:
>
> java -XX:ErrorHandlerTest=15 -XX:TestCrashInErrorHandler=14
>
> The VM will first crash with a SIGSEGV, enter error
> handling, crash
> again with a different synchronous signal (FPE, in this
> case). An
> unfixed VM will die immediately and we get a torn hs-err
> file. A
> fixed VM will show the "error occurred during error
> handling" string
> and continue with the error reporting.
>
> Kind Regards,
>
> Thomas Stüfe
>
>
>
> On Mon, Dec 8, 2014 at 11:37 AM, Thomas Stüfe
> <thomas.stuefe at gmail.com <mailto:thomas.stuefe at gmail.com>
> <mailto:thomas.stuefe at gmail.__com
> <mailto:thomas.stuefe at gmail.com>>> wrote:
>
> Hi David, Dean,
>
> On Fri, Dec 5, 2014 at 8:05 AM, David Holmes
> <david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>
> <mailto:david.holmes at oracle.__com
> <mailto:david.holmes at oracle.com>>> wrote:
>
> On 3/12/2014 8:47 PM, Thomas Stüfe wrote:
>
> Hi Dean,
>
> I dont understand. Such a function does not
> exist, does
> it? So I would
> have to write it:
>
> Do you mean generating and using a StubRoutine
> which
> would SIGILL? I did
> not do this because I wanted to be able to generate
> SIGILL also in
> initialization code, where StubRoutines may not
> yet be
> generated. This
> point may may be arguable, but as this function
> is used
> to test error
> handling, it may be interesting to test it for
> half-initialized VMs too.
>
> Otherwise I would implement the CPU specific
> generate_illegal_instruction_______sequence()
> probably the
>
> same way as I do
> now the crash_with_sigill() function. That
> would mean a
> bit of more code
> duplication because:
> - Either I use the method I use now
> (reserve_memory and
> copy the
> instructions to the reserved page)
> - Or I use inline assembly - which probably
> does not
> work across
> multiple OSs, so for CPUs which span various
> OSs I would
> have to add one
> function per os_cpu combination, not just per cpu.
>
>
> I don't think there is any OS dependency with inline
> assembly - only compiler. And I am also concerned that
> writing code to an executable page will also enter
> the realm
> of "self-modifying code" and all the jumping
> through hoops
> that entails. That aspect hadn't occurred to me
> till Dean
> raised it. I'm forming the view that triggering a
> SIGILL is
> more effort than it is worth for a secondary
> testing function.
>
>
> Well, the code is used and works in our VM since some
> years on a
> number of CPUs, so the problem with the flushing do not
> occur at
> least in our cases. But I agree with you, and this
> seems to be a
> point of contention and it is really too unimportant to
> stop the
> whole patch.
>
> The whole point of using SIGILL was to have another
> unblockable
> signal besides SIGSEGV to occur naturally (without
> raising) to
> be able to demonstrate the bug before fixing it. I will now
> attempt to change the patch to use either SIGFPE or
> SIGBUS as a
> secondary signal. Maybe generating those signals with
> pure C/C++
> is easier. If that does not work out, I will see what I
> can do
> with raise().
>
> Thanks and Kind regards, Thomas
>
>
>
>
>
More information about the hotspot-runtime-dev
mailing list