PPC64: JVM crash on Hadoop + Terasort

Gustavo Romero gromero at linux.vnet.ibm.com
Thu Dec 8 13:49:23 UTC 2016


Hi Martin,

ah! Yup, it really seems to be the same problem, a memory ordering issue... very interesting!
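
For readers less familiar with this class of bug, here is a minimal sketch of how a memory ordering issue can show up on a weakly ordered machine such as PPC64. It is not the HotSpot code and not the 8170409 change; `ChunkTable`, `publish` and `try_store` are hypothetical names made up for illustration. One thread prepares a table and then sets a flag; the release/acquire pair is what guarantees that a thread seeing the flag also sees the prepared table. With plain (relaxed) accesses the reader could observe the flag before the table contents and store through stale data.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-chunk table that one thread sets up
// and other threads later index into. Not the actual HotSpot structure.
struct ChunkTable {
    std::vector<const char*> slots;   // payload, written before publication
    std::atomic<bool> ready{false};   // publication flag
};

// Writer side: fill in the payload first, then publish with release
// semantics so the payload writes cannot be reordered after the flag store.
void publish(ChunkTable& t, std::size_t n) {
    t.slots.assign(n, nullptr);
    t.ready.store(true, std::memory_order_release);
}

// Reader side: the acquire load pairs with the release store above, so
// once `ready` is observed as true the resized `slots` is visible too.
// Returns false if the table is not published or idx is out of range.
bool try_store(ChunkTable& t, std::size_t idx, const char* card) {
    if (!t.ready.load(std::memory_order_acquire)) {
        return false;
    }
    if (idx >= t.slots.size()) {
        return false;
    }
    t.slots[idx] = card;
    return true;
}
```

A caller would invoke `publish` once from the thread that sets up the table, and `try_store` from worker threads that each own a distinct index. On x86 the missing barrier often goes unnoticed because stores are not reordered with other stores there, which may be one reason such bugs tend to surface first on PPC64.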

I backported the fix to 8 and I'll do some additional tests on Hadoop + Terasort and let
you know the results:

http://cr.openjdk.java.net/~gromero/8170409/

Does SAP plan to backport to 8u?

Thank you very much!


Best regards,
Gustavo

On 08-12-2016 07:39, Doerr, Martin wrote:
> Hi Gustavo,
> 
> this seems to be the bug which was recently fixed in 9 but not backported to 8:
> 8170409: CMS: Crash in CardTableModRefBSForCTRS::process_chunk_boundaries
> 
> Best regards,
> Martin
> 
> 
> -----Original Message-----
> From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Gustavo Romero
> Sent: Wednesday, 7 December 2016 23:21
> To: hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
> Subject: Re: PPC64: JVM crash on Hadoop + Terasort
> 
> Also - I forgot to mention - the problem seems to disappear if OpenJDK 8 tag
> 65-b17 is used instead.
> 
> On 07-12-2016 20:14, Gustavo Romero wrote:
>> Hi,
>>
>> We are experiencing some JVM crashes on 8 when trying to load a large 
>> Terasort data set (1 TiB) into Hadoop on PPC64.
>>
>> Please refer to the following hs_err log:
>> http://paste.fedoraproject.org/501254/48114364/raw/
>>
>> The crash seems to be due to an indexed doubleword store in
>> libjvm.so:
>>
>> 0x85aeb4:	2a 49 e8 7e 	stdx    r23,r8,r9
>>
>> which in turn is generated from
>>
>> <CardTableModRefBS::process_stride(Space*, MemRegion, int, int, OopsInGenClosure*, CardTableRS*, signed char**, unsigned long, unsigned long)+468>:	stdx    r23,r8,r9
>>
>> present in the ParNew GC code 
>> hotspot/src/share/vm/gc_implementation/parNew/parCardTableModRefBS.cpp
>>
>> 259	  } else {
>> 260	    // In this case we can help our neighbour by just asking them
>> 261	    // to stop at our first card (even though it may not be dirty).
>> 262	    NOISY(tty->print_cr(" LNC: first block is not a non-array object; setting LNC to first card of current chunk");)
>> 263	    assert(lowest_non_clean[cur_chunk_index] == NULL, "Write once : value should be stable hereafter");
>> 264	    jbyte* first_card_of_cur_chunk = byte_for(chunk_mr.start());
>> 265	    lowest_non_clean[cur_chunk_index] = first_card_of_cur_chunk;
>> 266	  }
>>
>> More precisely, 0x85aeb4: 2a 49 e8 7e stdx r23,r8,r9 is generated from parCardTableModRefBS.cpp:265.
>>
>> I have not yet been able to reproduce this on 9 or to create a test case, but I 
>> would like to ask whether anyone knows of a potentially related issue 
>> already reported in the JBS. I could not find any recent ones.
>>
>> Also, what would be a good program or JVM configuration to exercise 
>> that code intensively, i.e. the else branch at parCardTableModRefBS.cpp:259-265?
>>
>> Thank you very much.
>>
>>
>> Regards,
>> Gustavo
>>
> 


