Debugging segmentation faults in the JVM on linux-powerpc

Sun Jun 11 18:58:11 UTC 2017

Hi Adrian,

On Sun, Jun 11, 2017 at 12:53 PM, John Paul Adrian Glaubitz <glaubitz at physik.fu-berlin.de> wrote:

> Hi Thomas!
>
> On 06/11/2017 08:45 AM, Thomas Stüfe wrote:
> > I'll take a look at it. I believe the final SafeFetch implementation for
> > Zero was last done by me: https://bugs.openjdk.java.net/browse/JDK-8076185
>
> Thanks. I'm very glad to hear that someone more knowledgeable with the
> code will have a look.
>
> > SafeFetch is used to load data from a potentially unmapped address,
> > mainly in error reporting. If that load triggers a segfault, the fault is
> > caught and the function returns a special value to indicate that the
> > address was unmapped.
>
> Yeah. I have learned that now as well ;).
>
> > In debug builds, its function is tested at VM startup, which is the
> > segfault you are seeing. If it worked correctly, the signal handler would
> > recognize the segfault as originating from a SafeFetch call and, instead
> > of crashing, return the mentioned special value.
> >
> > On almost all platforms this is implemented via stub assembler, but since
> > Zero aims to be pure C, we implemented it using POSIX setjmp. I'll take a
> > look at why this stopped working.
> >
> > In the meantime, as a workaround, just comment out the calls to
> > test_safefetch32() and test_safefetchN() in StubRoutines::initialize2().
>
> That doesn't seem to work, though; it still crashes [1].
>
> I made this change:
>
> --- a/hotspot/src/share/vm/runtime/stubRoutines.cpp~    2017-05-11 15:11:42.000000000 +0300
> +++ b/hotspot/src/share/vm/runtime/stubRoutines.cpp     2017-06-11 12:25:56.068000000 +0300
> @@ -358,13 +358,6 @@
>    test_arraycopy_func(CAST_FROM_FN_PTR(address, Copy::aligned_conjoint_words), sizeof(jlong));
>    test_arraycopy_func(CAST_FROM_FN_PTR(address, Copy::aligned_disjoint_words), sizeof(jlong));
>
> -  // test safefetch routines
> -  // Not on Windows 32bit until 8074860 is fixed
> -#if ! (defined(_WIN32) && defined(_M_IX86))
> -  test_safefetch32();
> -  test_safefetchN();
> -#endif
> -
>  #endif
>  }
>
> But it still segfaults. Are there other places where safefetch*() needs to
> be disabled?
>
>
Sorry, I was wrong; the workaround cannot work. SafeFetch is required to
work in a number of other places, so there is no easy way around fixing
this. I am also very curious as to why we crash; the SafeFetch mechanism on
Zero is quite simple.

> Please note:
>
> I cannot reproduce the problem on x86_64, which made me think that there
> might be some code, guarded out on x86_64, that is only used on the generic
> Zero targets.
>

Unfortunately, I ran into a number of issues building Zero on ppc and x64,
which is quite annoying. Zero does not get enough love :) I finally managed
to build it on x64, but as you said, SafeFetch works fine there.

I was not able to build it on ppc64 (the only ppc machines I have
available). I will retry next week. Without reproducing this error, it is
difficult to fix.

Note that I have no ppc32 (big endian?) machine available.

If you feel like investigating yourself, feel free. The mechanism for
SafeFetch on Zero is quite simple; see the patch for JDK-8076185:
http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/633053d4d137

Basically, before the questionable memory access, a jump buffer is set up
and its pointer is stored in pthread TLS, from where the signal handler
retrieves it. Its existence is taken as proof that the SIGSEGV was caused
by SafeFetch, and we use longjmp to jump back to before the crash. Simple
and unexciting :)
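For anyone experimenting on a powerpc box, the mechanism described above can be reproduced outside the VM with a small, self-contained C program. This is a minimal sketch under stated assumptions, not the HotSpot code from JDK-8076185: the names safe_fetch_init, safe_fetch32, fault_handler and map_inaccessible_page are hypothetical, and it assumes Linux with glibc. It uses sigsetjmp/siglongjmp with the savemask flag set, so that SIGSEGV, which is blocked while the handler runs, is unblocked again after jumping out. Note that pthread_getspecific is not on the POSIX async-signal-safe list; the sketch relies on it working in practice.

```c
#define _GNU_SOURCE /* for MAP_ANONYMOUS with glibc */
#include <assert.h>
#include <pthread.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical names throughout; this is an illustration, not HotSpot. */
static pthread_key_t jmpbuf_key; /* per-thread pointer to the active jump buffer */

static void fault_handler(int sig, siginfo_t *info, void *ucontext) {
  (void)info;
  (void)ucontext;
  /* Assumption: pthread_getspecific works here, although POSIX does not
     list it as async-signal-safe. */
  sigjmp_buf *jb = (sigjmp_buf *)pthread_getspecific(jmpbuf_key);
  if (jb != NULL) {
    /* A buffer is registered, so the fault came from safe_fetch32. */
    siglongjmp(*jb, 1);
  }
  /* Not a SafeFetch fault: fall back to the default action (crash). */
  signal(sig, SIG_DFL);
  raise(sig);
}

void safe_fetch_init(void) {
  int rc = pthread_key_create(&jmpbuf_key, NULL);
  assert(rc == 0);
  (void)rc;
  struct sigaction sa;
  memset(&sa, 0, sizeof sa);
  sa.sa_sigaction = fault_handler;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGSEGV, &sa, NULL);
  sigaction(SIGBUS, &sa, NULL);
}

/* Returns *adr, or err_value if reading adr faults. */
int safe_fetch32(volatile int *adr, int err_value) {
  sigjmp_buf jb;
  volatile int result = err_value; /* volatile: value must survive siglongjmp */
  if (sigsetjmp(jb, 1) == 0) {     /* 1 = also save/restore the signal mask */
    pthread_setspecific(jmpbuf_key, &jb);
    result = *adr;                 /* the load that may fault */
  }
  pthread_setspecific(jmpbuf_key, NULL); /* unregister on both paths */
  return result;
}

/* Test helper: an inaccessible page standing in for an unmapped address. */
int *map_inaccessible_page(void) {
  long page = sysconf(_SC_PAGESIZE);
  void *p = mmap(NULL, (size_t)page, PROT_NONE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return p == MAP_FAILED ? NULL : (int *)p;
}
```

Build with `cc -pthread`. After `safe_fetch_init()`, `safe_fetch32(p, -1)` should return `*p` for a readable address and -1 for the page from `map_inaccessible_page()`. A second faulting call keeps working only because the saved signal mask is restored on each jump; with plain setjmp/longjmp on glibc, SIGSEGV may stay blocked after the first jump, so a later fault could then kill the process instead.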

Kind Regards, Thomas

> Thanks!
> Adrian
>
> > [1] https://buildd.debian.org/status/fetch.php?pkg=openjdk-9&arch=powerpc&ver=9%7Eb170-2&stamp=1497177935&raw=0
>
> --
>  .''`.  John Paul Adrian Glaubitz
> : :' :  Debian Developer - glaubitz at debian.org
> `. `'   Freie Universitaet Berlin - glaubitz at physik.fu-berlin.de
>   `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913
>