Debugging segmentation faults in the JVM on linux-powerpc

Sun Jun 11 19:15:26 UTC 2017

On 06/11/2017 08:58 PM, Thomas Stüfe wrote:
> Sorry, I was wrong, the workaround cannot work. SafeFetch is required to work in a number
> of other places, so there is no easy way around fixing this. I also am
> very curious as to why we crash, the SafeFetch mechanism on Zero is quite simple.

No problem.

As for why it crashes, I discussed this with another friend and expert today,
and he suggested that gcc might be optimizing too aggressively here.

>     I cannot reproduce the problem on x86_64, which made me think that there might
>     be some code guarded out on x86_64 which is only used on the generic zero targets.
> 
> 
> Unfortunately, I ran into a number of issues building zero on ppc and x64, which is quite annoying.
> Zero does not get enough love :) I finally managed to build it on
> x64, but as you said, SafeFetch works fine here.

Yep.

> I was not able to build it on ppc64 (the only ppc machines I have available). Will retry next
> week. Without reproducing this error, it is difficult to fix.

I can just create an account for you on the PPC machine which I used. It's actually a POWER8
machine, so rebuilding everything is reasonably fast - even for zero.

So, if you send me a private email, preferably signed by some trusted key together with your
public SSH key and your desired user account, I'm happy to create an account for you on
this machine. From your name, I assume you're German, so we can also speak German then.

I have provided accounts on Debian porter boxes to various upstream projects in the past :).

> Note that I have no ppc32 (big endian?) machine available.

Yes. That's big-endian. Not sure whether little-endian ppc32 actually exists.

> If you feel like investigating yourself, feel free. The mechanism for SafeFetch on zero is quite simple, see the patch for
> JDK-8076185: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/633053d4d137
> 
> Basically, before the questionable memory access, a jump buffer is set up and a pointer to it is stored in pthread TLS; the signal handler then retrieves it.
> Its existence is taken as proof that the SIGSEGV was caused by SafeFetch, and we use longjmp to jump back to before the crash. Simple and unexciting :)

Thanks. I will have a look, too. But as I said, if you want, I'll just create
an account for you.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz at debian.org
`. `'   Freie Universitaet Berlin - glaubitz at physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913