RFR(s): 8065895: Synchronous signals during error reporting may terminate or hang VM process

Fri Jan 30 03:58:26 UTC 2015

Looks good!

Just need to wait for Gerard to give a second review.

I can sponsor this for you, but it may need to wait till my Monday as it 
is already Friday afternoon for me.

Thanks,
David

On 29/01/2015 11:47 PM, Thomas Stüfe wrote:
> Hi David,
>
> thank you. Here is the new patch:
>
> http://cr.openjdk.java.net/~stuefe/webrevs/8065895/webrev.06/webrev/
>
> Just corrected the typos and removed trailing whitespaces.
>
> Kind Regards, Thomas
>
>
> On Thu, Jan 29, 2015 at 8:47 AM, David Holmes <david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>> wrote:
>
>     Hi Thomas,
>
>     Sorry for the delay in getting back to this. Looks good - only a few
>     typos:
>
>     debug.hpp:
>
>     + // 15 - SIGILL
>
>     should be SIGFPE.
>
>     vmError_<os>.cpp:
>
>     + // Note that in the hotspot
>
>     Delete 'the'.
>
>     + // hang or be ended immediately if secondary errors happend during
>     error
>
>     s/terminated/ended/
>
>     s/happen/happend/
>
>     Thanks,
>     David
>
>     On 26/01/2015 8:23 PM, Thomas Stüfe wrote:
>
>         Anyone? I would really like to close this issue.
>
>         Kind Regards,
>
>         Thomas Stüfe
>
>         On Wed, Jan 21, 2015 at 10:39 AM, Thomas Stüfe
>         <thomas.stuefe at gmail.com <mailto:thomas.stuefe at gmail.com>
>         <mailto:thomas.stuefe at gmail.__com
>         <mailto:thomas.stuefe at gmail.com>>> wrote:
>
>              Hi all,
>
>              I would like to take up discussion about this issue again.
>
>              As a reminder, here the bug report:
>         https://bugs.openjdk.java.net/__browse/JDK-8065895
>         <https://bugs.openjdk.java.net/browse/JDK-8065895>
>
>              And here a new version of my patch:
>         http://cr.openjdk.java.net/~__stuefe/webrevs/8065895/webrev.__05/webrev/
>         <http://cr.openjdk.java.net/~stuefe/webrevs/8065895/webrev.05/webrev/>
>
>              As far as I remember the discussion, the issue itself was
>         understood
>              and the fix itself met with approval (I think), but the
>         point of
>              contention was the code I added to reproduce the error for
>              regression tests.
>
>              In order to reproduce the error, I need two different
>         synchronous
>              signals to happen, one in normal code, one in the error handler
>              which writes the hs-err file. Originally I choose SIGSEGV and
>              SIGILL. I added functions to generate (true, real) SIGSEGVs and
>              SIGILLs. But nobody liked my generate-sigill function,
>         therefore I
>              changed the code to generate a SIGFPE instead. For the
>         test, it does
>              not matter which signals are generated as long as they are
>              synchronous (so, one of SEGV,ILL,BUS,FPE).
>
>              In order to check that the fix works, one can do:
>
>              java -XX:ErrorHandlerTest=15 -XX:TestCrashInErrorHandler=14
>
>              The VM will first crash with a SIGSEGV, enter error
>         handling, crash
>              again with a different synchronous signal (FPE, in this
>         case). An
>              unfixed VM will die immediately and we get a torn hs-err
>         file. A
>              fixed VM will show the "error occurred during error
>         handling" string
>              and continue with the error reporting.
>
>              Kind Regards,
>
>              Thomas Stüfe
>
>
>
>              On Mon, Dec 8, 2014 at 11:37 AM, Thomas Stüfe
>              <thomas.stuefe at gmail.com <mailto:thomas.stuefe at gmail.com>
>         <mailto:thomas.stuefe at gmail.__com
>         <mailto:thomas.stuefe at gmail.com>>> wrote:
>
>                  Hi David, Dean,
>
>                  On Fri, Dec 5, 2014 at 8:05 AM, David Holmes
>                  <david.holmes at oracle.com
>         <mailto:david.holmes at oracle.com>
>         <mailto:david.holmes at oracle.__com
>         <mailto:david.holmes at oracle.com>>> wrote:
>
>                      On 3/12/2014 8:47 PM, Thomas Stüfe wrote:
>
>                          Hi Dean,
>
>                          I dont understand. Such a function does not
>         exist, does
>                          it? So I would
>                          have to write it:
>
>                          Do you mean generating and using a StubRoutine
>         which
>                          would SIGILL? I did
>                          not do this because I wanted to be able to generate
>                          SIGILL also in
>                          initialization code, where StubRoutines may not
>         yet be
>                          generated. This
>                          point may may be arguable, but as this function
>         is used
>                          to test error
>                          handling, it may be interesting to test it for
>                          half-initialized VMs too.
>
>                          Otherwise I would implement the CPU specific
>                          generate_illegal_instruction_______sequence()
>         probably the
>
>                          same way as I do
>                          now the crash_with_sigill() function. That
>         would mean a
>                          bit of more code
>                          duplication because:
>                          - Either I use the method I use now
>         (reserve_memory and
>                          copy the
>                          instructions to the reserved page)
>                          - Or I use inline assembly - which probably
>         does not
>                          work across
>                          multiple OSs, so for CPUs which span various
>         OSs I would
>                          have to add one
>                          function per os_cpu combination, not just per cpu.
>
>
>                      I don't think there is any OS dependency with inline
>                      assembly - only compiler. And I am also concerned that
>                      writing code to an executable page will also enter
>         the realm
>                      of "self-modifying code" and all the jumping
>         through hoops
>                      that entails. That aspect hadn't occurred to me
>         till Dean
>                      raised it. I'm forming the view that triggering a
>         SIGILL is
>                      more effort than it is worth for a secondary
>         testing function.
>
>
>                  Well, the code is used and works in our VM since some
>         years on a
>                  number of CPUs, so the problem with the flushing do not
>         occur at
>                  least in our cases. But I agree with you, and this
>         seems to be a
>                  point of contention and it is really too unimportant to
>         stop the
>                  whole patch.
>
>                  The whole point of using SIGILL was to have another
>         unblockable
>                  signal besides SIGSEGV to occur naturally (without
>         raising) to
>                  be able to demonstrate the bug before fixing it. I will now
>                  attempt to change the patch to use either SIGFPE or
>         SIGBUS as a
>                  secondary signal. Maybe generating those signals with
>         pure C/C++
>                  is easier. If that does not work out, I will see what I
>         can do
>                  with raise().
>
>                  Thanks and Kind regards, Thomas
>
>
>
>
>