PPC64: JVM crash on Hadoop + Terasort

Volker Simonis volker.simonis at gmail.com
Thu Dec 8 14:48:30 UTC 2016


Hi Gustavo,

I can do the downport.

It would be good, though, if you could confirm that the fix solves your
Hadoop/Terasort problem.

Regards,
Volker


On Thu, Dec 8, 2016 at 2:49 PM, Gustavo Romero
<gromero at linux.vnet.ibm.com> wrote:
> Hi Martin,
>
> ah! Yup, it really seems to be the same problem, a memory ordering issue... very interesting!
>
> I backported to 8 and I'll do some additional tests on Hadoop + Terasort and let
> you know about the results:
>
> http://cr.openjdk.java.net/~gromero/8170409/
>
> Does SAP plan to backport to 8u?
>
> Thank you very much!
>
>
> Best regards,
> Gustavo
>
> On 08-12-2016 07:39, Doerr, Martin wrote:
>> Hi Gustavo,
>>
>> seems to be the bug which was recently fixed in 9 but not downported to 8:
>> 8170409: CMS: Crash in CardTableModRefBSForCTRS::process_chunk_boundaries
>>
>> Best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Gustavo Romero
>> Sent: Wednesday, December 7, 2016 23:21
>> To: hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
>> Subject: Re: PPC64: JVM crash on Hadoop + Terasort
>>
>> Also - I forgot to mention - the problem seems to disappear if OpenJDK 8 tag
>> 65-b17 is used instead.
>>
>> On 07-12-2016 20:14, Gustavo Romero wrote:
>>> Hi,
>>>
>>> We are experiencing some JVM crashes on 8 when trying to load a large
>>> Terasort data set (1 TiB) into Hadoop on PPC64.
>>>
>>> Please refer to the following hs_err log:
>>> http://paste.fedoraproject.org/501254/48114364/raw/
>>>
>>> The crash seems to be due to an indexed doubleword store in
>>> libjvm.so:
>>>
>>> 0x85aeb4:    2a 49 e8 7e     stdx    r23,r8,r9
>>>
>>> which, in turn, is generated from
>>>
>>> <CardTableModRefBS::process_stride(Space*, MemRegion, int, int, OopsInGenClosure*, CardTableRS*, signed char**, unsigned long, unsigned long)+468>:  stdx    r23,r8,r9
>>>
>>> present in Parallel GC code
>>> hotspot/src/share/vm/gc_implementation/parNew/parCardTableModRefBS.cpp
>>>
>>> 259    } else {
>>> 260      // In this case we can help our neighbour by just asking them
>>> 261      // to stop at our first card (even though it may not be dirty).
>>> 262      NOISY(tty->print_cr(" LNC: first block is not a non-array object; setting LNC to first card of current chunk");)
>>> 263      assert(lowest_non_clean[cur_chunk_index] == NULL, "Write once : value should be stable hereafter");
>>> 264      jbyte* first_card_of_cur_chunk = byte_for(chunk_mr.start());
>>> 265      lowest_non_clean[cur_chunk_index] = first_card_of_cur_chunk;
>>> 266    }
>>>
>>> More precisely, 0x85aeb4: 2a 49 e8 7e stdx r23,r8,r9 is generated from parCardTableModRefBS.cpp:265.
>>>
>>> I have not been able to reproduce this on 9 yet, nor to create a test
>>> case, but I would like to ask whether anyone knows of an issue potentially
>>> related to this one that has already been reported to the JBS. I could not find any recent ones.
>>>
>>> Also, what would be a good workload or JVM configuration to exercise
>>> that code intensively, i.e. the else branch at parCardTableModRefBS.cpp:259-265?
>>>
>>> Thank you very much.
>>>
>>>
>>> Regards,
>>> Gustavo
>>>
>>
>

