SIGILL crashes JVM on PPC64 LE

Volker Simonis volker.simonis at gmail.com
Wed Jun 1 17:06:31 UTC 2016


Hi Hiroshi, Gustavo,

I'm currently trying to better understand the cause of the crash.
When looking at the Cassandra sources [1] I can see that on ppc we
should actually not call Unsafe.getInt() at all:

UNALIGNED = arch.equals("i386") || arch.equals("x86")
     || arch.equals("amd64") || arch.equals("x86_64") || arch.equals("s390x");

public static int getInt(long address)
{
    return UNALIGNED ? unsafe.getInt(address) : getIntByByte(address);
}

Is this behavior different in the version of Cassandra which you have
used for your tests?
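
For reference, the byte-wise fallback can be sketched roughly as below. This
is only an illustration and not the Cassandra code: it reads from a byte[]
rather than from a raw address, and the class name ByteWiseRead and the
helper allowsUnaligned are made up for this sketch.

```java
import java.nio.ByteOrder;

// Illustrative sketch only: names and the byte[] "memory" are assumptions,
// not the Cassandra implementation.
final class ByteWiseRead {
    // Hypothetical stand-in for the architecture check quoted above.
    static boolean allowsUnaligned(String arch) {
        return arch.equals("i386") || arch.equals("x86")
            || arch.equals("amd64") || arch.equals("x86_64") || arch.equals("s390x");
    }

    // Assemble an int from four single-byte reads, so no multi-byte
    // (and therefore no unaligned) load is ever requested.
    static int getIntByByte(byte[] mem, int offset, ByteOrder order) {
        int b0 = mem[offset] & 0xff;
        int b1 = mem[offset + 1] & 0xff;
        int b2 = mem[offset + 2] & 0xff;
        int b3 = mem[offset + 3] & 0xff;
        if (order == ByteOrder.BIG_ENDIAN) {
            return (b0 << 24) | (b1 << 16) | (b2 << 8) | b3;
        }
        return (b3 << 24) | (b2 << 16) | (b1 << 8) | b0;
    }

    public static void main(String[] args) {
        byte[] mem = {0x00, 0x11, 0x22, 0x33, 0x44, 0x00};
        // "ppc64le" is not in the list, so the byte-wise path would be taken.
        System.out.println(allowsUnaligned("ppc64le"));
        // Read four bytes starting at the odd offset 1.
        System.out.printf("0x%08x%n", getIntByByte(mem, 1, ByteOrder.LITTLE_ENDIAN));
    }
}
```

With such a fallback in place, a multi-byte Unsafe.getInt() at an odd address
would never be reached on ppc.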

I just want to make sure that the problem we reproduce with your
stand-alone test case is the same as the one we are seeing in the
initial Cassandra crash.

Could you please provide the exact versions of Cassandra you have used
and a description of the tests and the way you have executed them when
you saw the initial error?

Thanks a lot for your help,
Volker

[1] https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/memory/MemoryUtil.java

On Wed, Jun 1, 2016 at 10:51 AM, Hiroshi H Horii <HORII at jp.ibm.com> wrote:
> Hi Volker,
>
> Thank you for reviewing our fix.
>
> To avoid generating illegal instructions when Idisp is not 4-byte aligned,
> I changed ppc.ad to always generate two instructions for each ld and lwa, as
> follows.
> That is, when Idisp is 4-byte aligned, a redundant nop() is generated.
>
>      // Operand 'ds' requires 4-alignment.
>      if (Idisp & 0x3) {
>        __ addi($dst$$Register, $mem$$base$$Register, Idisp);
>        __ ld($dst$$Register, 0, $dst$$Register);
>      } else {
>        __ ld($dst$$Register, Idisp, $mem$$base$$Register);
>        __ nop();
>      }
>
> I'm not sure whether this fix is elegant.
>
> In my understanding, an argument of size(n) in ADL must be constant.
> Correct?
> If the number could be dynamic, we could avoid generating nop()...
> Also, we may be able to fix this bug at a higher level (such as IR
> generation).
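
The fixed-size choice described above can be modelled outside of HotSpot
roughly as follows. This is a sketch under assumptions, not the ppc.ad code:
the class FixedSizeLoad and its methods are hypothetical, only ld is modelled,
and it ignores the special meaning of r0 as a base register as well as the
16-bit displacement range.

```java
// Illustrative model of "always emit two instructions"; all names are
// hypothetical and this is not HotSpot code.
final class FixedSizeLoad {
    static final int NOP = 0x60000000; // ori 0,0,0

    // D-form addi: primary opcode 14, 16-bit signed immediate.
    static int addi(int rt, int ra, int si) {
        return (14 << 26) | (rt << 21) | (ra << 16) | (si & 0xffff);
    }

    // DS-form ld: primary opcode 58, XO = 0; the displacement must be a
    // multiple of 4 because its two low bits are the XO field.
    static int ld(int rt, int disp, int ra) {
        if ((disp & 0x3) != 0) {
            throw new IllegalArgumentException("DS displacement not 4-byte aligned: " + disp);
        }
        return (58 << 26) | (rt << 21) | (ra << 16) | (disp & 0xffff);
    }

    // Always returns exactly two instruction words, so the size is constant.
    static int[] emitLoad(int dst, int base, int disp) {
        if ((disp & 0x3) != 0) {
            // Unaligned: compute the address first, then load with displacement 0.
            return new int[] { addi(dst, base, disp), ld(dst, 0, dst) };
        }
        // Aligned: a single ld suffices; pad with a nop to keep the size fixed.
        return new int[] { ld(dst, disp, base), NOP };
    }

    public static void main(String[] args) {
        for (int word : emitLoad(17, 15, 17)) {
            System.out.printf("0x%08X%n", word);
        }
        for (int word : emitLoad(17, 15, 16)) {
            System.out.printf("0x%08X%n", word);
        }
    }
}
```

Both paths produce 8 bytes of code, which is what a constant size() requires;
the cost is the extra nop in the aligned case.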
>
> Regards,
> Hiroshi
> -----------------------
> Hiroshi Horii, Ph.D.
> IBM Research - Tokyo
>
>
> Volker Simonis <volker.simonis at gmail.com> wrote on 06/01/2016 15:37:21:
>
>> From: Volker Simonis <volker.simonis at gmail.com>
>> To: Gustavo Romero <gromero at linux.vnet.ibm.com>
>> Cc: "ppc-aix-port-dev at openjdk.java.net" <ppc-aix-port-dev at openjdk.java.net>,
>> "hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>,
>> Breno Leitao <brenohl at br.ibm.com>, Hiroshi H Horii/Japan/IBM at IBMJP
>> Date: 06/01/2016 15:38
>> Subject: Re: SIGILL crashes JVM on PPC64 LE
>
>>
>> Hi Gustavo, Hiroshi,
>>
>> thanks a lot for the great analysis and the nice stand-alone test
>> case. This is indeed a problem, and it also occurs on ppc64
>> big-endian.
>>
>> I've opened "8158260: PPC64: unaligned Unsafe.getInt can lead to the
>> generation of illegal instructions"
>> (https://bugs.openjdk.java.net/browse/JDK-8158260) for this issue.
>>
>> I'm currently looking at your proposed fix and will come back with a
>> new webrev soon.
>>
>> Thanks a lot and best regards,
>> Volker
>>
>>
>> On Tue, May 31, 2016 at 3:31 AM, Gustavo Romero
>> <gromero at linux.vnet.ibm.com> wrote:
>> > Hi Volker
>> >
>> > The following test case has been isolated by Hiroshi Horii and generates
>> > the illegal instruction, crashing the JVM on PPC64 LE:
>> >
>> > UnalignedUnsafeAccess.java:
>> > http://hastebin.com/raw/uqegukific
>> >
>> > $ javac UnalignedUnsafeAccess.java
>> > $ java -Xcomp -Xbatch UnalignedUnsafeAccess
>> >
>> > The issue can be reproduced on OpenJDK 8 downstream, OpenJDK 8, and
>> > OpenJDK 9 - hs_err logs:
>> >
>> > OpenJDK 9, tag 0be6f4f5d186 jdk-9+120:
>> > http://hastebin.com/raw/ecuhukutur
>> >
>> > OpenJDK 8, tag 5aaa43d91c73 tip:
>> > http://hastebin.com/raw/ipohoyafos
>> >
>> > OpenJDK 8 downstream:
>> >
>> > Ubuntu 16.04 LTS
>> > build 1.8.0_91-8u91-b14-0ubuntu4~16.04.1-b14
>> > http://hastebin.com/raw/yetizebofo
>> >
>> > RHEL 7.2:
>> > build 1.8.0_91-b14
>> > http://hastebin.com/raw/irequfawaw
>> >
>> > The crash happens when an illegal instruction - 0xea2f0013 - is
>> > executed.
>> >
>> > The backtrace shows:
>> >
>> > Stack: [0x00003fff56030000,0x00003fff56430000], sp=0x00003fff5642b8d0,  free space=4078k
>> > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
>> > V  [libjvm.so+0x162104]  loadI2LNode::emit(CodeBuffer&, PhaseRegAlloc*) const+0x194
>> > V  [libjvm.so+0x8ece28]  Compile::fill_buffer(CodeBuffer*, unsigned int*)+0x4e8
>> > V  [libjvm.so+0x368e08]  Compile::Code_Gen()+0x3c8
>> > V  [libjvm.so+0x369e04]  Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool)+0xf64
>> > V  [libjvm.so+0x271380]  C2Compiler::compile_method(ciEnv*, ciMethod*, int)+0x1f0
>> > V  [libjvm.so+0x3785a4]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0xd54
>> > V  [libjvm.so+0x379dc8]  CompileBroker::compiler_thread_loop()+0x488
>> > V  [libjvm.so+0xa5de90]  compiler_thread_entry(JavaThread*, Thread*)+0x20
>> > V  [libjvm.so+0xa690c8]  JavaThread::thread_main_inner()+0x178
>> > V  [libjvm.so+0x8c8c10]  java_start(Thread*)+0x170
>> > C  [libpthread.so.0+0x833c]  start_thread+0xfc
>> > C  [libc.so.6+0x12b014]  clone+0xe4
>> >
>> > The loadI2LNode class is generated from the following ADL code in the
>> > ppc.ad file:
>> >
>> > instruct loadI2L(iRegLdst dst, memory mem) %{
>> >   match(Set dst (ConvI2L (LoadI mem)));
>> >   predicate(_kids[0]->_leaf->as_Load()->is_unordered());
>> >   ins_cost(MEMORY_REF_COST);
>> >
>> >   format %{ "LWA     $dst, $mem \t// loadI2L" %}
>> >   size(4);
>> >   ins_encode %{
>> >     // TODO: PPC port $archOpcode(ppc64Opcode_lwa);
>> >     int Idisp = $mem$$disp + frame_slots_bias($mem$$base, ra_);
>> >     __ lwa($dst$$Register, Idisp, $mem$$base$$Register);
>> >   %}
>> >   ins_pipe(pipe_class_memory);
>> > %}
>> >
>> > So the generated illegal instruction comes from:
>> > lwa 17,17,15  (DS-form: lwa RT, DS, RA)
>> >
>> > As the DS displacement must always be 4-byte aligned (i.e. the DS field
>> > is always concatenated with 0b00), 17 as DS (the middle 17 value) is
>> > illegal: its low bits spill into the XO field, generating the illegal
>> > instruction in question:
>> >
>> > 11101000000000000000000000000010: LWA
>> > 00000010001000000000000000000000: 17
>> > 00000000000000000000000000010001: 17
>> > 00000000000011110000000000000000: 15
>> > --------------------------------
>> > 11101010001011110000000000010011: 0xEA2F0013 => Illegal instruction
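
The bitwise OR in the table above can be reproduced with a few lines of Java.
This is a sketch for illustration only: the class DsFormEncoding and its
methods are hypothetical and deliberately skip the alignment check, and the
field positions follow the DS-form layout (primary opcode 58 and XO = 2 for
lwa).

```java
// Illustrative only: mimics an assembler that ORs the displacement into the
// instruction word without checking its alignment. Names are hypothetical.
final class DsFormEncoding {
    // lwa: primary opcode 58 in the top six bits, extended opcode (XO) 2
    // in the two lowest bits.
    static final int LWA_OPCODE = (58 << 26) | 2;

    static int encodeUnchecked(int opcode, int rt, int disp, int ra) {
        return opcode | (rt << 21) | (ra << 16) | (disp & 0xffff);
    }

    // The two lowest bits of a DS-form word select the instruction variant.
    static int xoField(int insn) {
        return insn & 0x3;
    }

    public static void main(String[] args) {
        int bad = encodeUnchecked(LWA_OPCODE, 17, 17, 15);  // lwa 17,17,15
        int good = encodeUnchecked(LWA_OPCODE, 17, 16, 15); // lwa 17,16,15
        System.out.printf("0x%08X XO=%d%n", bad, xoField(bad));   // 0xEA2F0013 XO=3
        System.out.printf("0x%08X XO=%d%n", good, xoField(good)); // 0xEA2F0012 XO=2
    }
}
```

With a displacement of 17 the low bit of the displacement lands in the XO
field and turns XO = 2 into XO = 3, which is the 0xEA2F0013 word seen in the
crash; with an aligned displacement such as 16 the XO field stays at 2.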
>> >
>> > The following change is proposed to fix the issue and deals with the
>> > unaligned displacements:
>> >
>> > OpenJDK 9 webrev:
>> > 81.de.7a9f.ip4.static.sl-reverse.com./illegal/9
>> >
>> > OpenJDK 8 webrev:
>> > 81.de.7a9f.ip4.static.sl-reverse.com./illegal/8
>> >
>> > Could we open a JIRA ticket regarding this issue in order to include it
>> > in the webrev?
>> >
>> > Thank you!
>> >
>> > Best regards,
>> > Gustavo
>> >
>> > On 12-05-2016 09:39, Volker Simonis wrote:
>> >> And I forgot to mention: I've checked and we don't emit vsel
>> >> instructions in jdk8 on ppc. So it must be a coincidence that changing
>> >> the endianness of the offending instruction yields a valid 'vsel'
>> >> instruction.
>> >>
>> >>
>> >>
>> >> On Thu, May 12, 2016 at 2:26 PM, Volker Simonis
>> >> <volker.simonis at gmail.com> wrote:
>> >>> Hi Gustavo,
>> >>>
>> >>> thanks for the bug report. The hs_err file you provided indicates that
>> >>> this crash happened with Ubuntu's openjdk 8 version. Can you still
>> >>> reproduce this with the newest jdk9 builds?
>> >>>
>> >>> Also, I can see from the hs_err file that the crash happened in the C2
>> >>> compiled method java.util.TimSort.countRunAndMakeAscending which
>> >>> doesn't seem to be related to nio and unsafe.
>> >>>
>> >>> Ideally, you could post an easy test case to reproduce the problem. If
>> >>> that's not possible, it would be helpful if you could post the output
>> >>> of a failing run with
>> >>> "-XX:CompileCommand=print,java.util.TimSort::countRunAndMakeAscending
>> >>> -XX:CompileCommand=option,java.util.TimSort::countRunAndMakeAscending,PrintOptoAssembly".
>> >>> In order to get the disassembly output for compiled methods you have
>> >>> to build the hsdis library from hotspot/src/share/tools/hsdis (it has
>> >>> a README with build instructions).
>> >>>
>> >>> Regards,
>> >>> Volker
>> >>>
>> >>>
>> >>> On Thu, May 12, 2016 at 12:32 AM, Gustavo Romero
>> >>> <gromero at linux.vnet.ibm.com> wrote:
>> >>>> Hi
>> >>>>
>> >>>> I'm getting a nasty SIGILL that crashes the JVM on PPC64 LE.
>> >>>>
>> >>>> hs_err log:
>> >>>> http://hastebin.com/raw/fovagunaci
>> >>>>
>> >>>> The application employs methods from both the java.nio.ByteBuffer and
>> >>>> sun.misc.Unsafe classes in order to write to and read from an
>> >>>> allocated buffer.
>> >>>>
>> >>>> An interesting thing is that after debugging the instruction that
>> >>>> caused the said SIGILL:
>> >>>>
>> >>>>    0x3fff902839a4:      cmpwi   cr6,r17,0
>> >>>>    0x3fff902839a8:      beq     cr6,0x3fff90283ae4
>> >>>>    0x3fff902839ac:      .long 0xea2f0013 <============ illegal instruction
>> >>>>    0x3fff902839b0:      add     r15,r15,r17
>> >>>>    0x3fff902839b4:      add     r14,r17,r14
>> >>>>
>> >>>> I found that when its endianness is changed it turns out to be a
>> >>>> valid instruction: vsel v24,v0,v5,v31
>> >>>>
>> >>>> However, I'm still unable to determine if it's an application issue,
>> >>>> something with the JVM unsafe interface code, or something else.
>> >>>>
>> >>>> Any clue on how to narrow down this SIGILL?
>> >>>>
>> >>>> Thank you!
>> >>>>
>> >>>> Regards,
>> >>>> Gustavo
>> >>>>
>> >>
>> >
>>
>