RFR: 8241603: ZGC: java/lang/management/MemoryMXBean/MemoryTestZGC.sh crashes on macOS

Tue Apr 14 11:12:13 UTC 2020

Thanks a lot for testing! The APIC id issue seems to come in a slightly 
different shape from what I expected. I'll try to dig deeper and get back.

cheers,
Per

On 4/9/20 8:46 AM, Zeller, Arno wrote:
> Hi Per,
> 
> thanks for trying to find a solution for this issue! I am sorry to report that the patch did not help. The SIGSEGV still occurs. I copied some parts of the hs_err file below.
> 
> The VMware VM is always configured to have 6 cores. The difference is that, in the crashing case, it is configured to have 2 * 3 cores. When it is set to 1 * 6 cores, it works fine.
> Sorry for not being able to give you better information. I have no direct access to the hypervisor myself and have to ask our IT colleagues to make changes and then explain to me what they have done 😊.
> 
> Best regards,
> Arno
> ----
> # A fatal error has been detected by the Java Runtime Environment:
> #  SIGSEGV (0xb) at pc=0x000000010c3aff88, pid=74065, tid=9219
> ...
> Host: MacPro6,1 x86_64 3337 MHz, 6 cores, 16G, Darwin 18.5.0
> Time: Thu Apr  9 00:12:46 2020 CEST elapsed time: 0.160864 seconds (0d 0h 0m 0s)
> ...
> Current thread (0x00007fe038801000):  JavaThread "main" [_thread_in_vm, id=9219, stack(0x000070000d45e000,0x000070000d55e000)]
> 
> Stack: [0x000070000d45e000,0x000070000d55e000],  sp=0x000070000d55d380,  free space=1020k
> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
> V  [libjvm.dylib+0x7a0f88]  ZCPU::id_slow()+0x56
> V  [libjvm.dylib+0x7aef1b]  ZObjectAllocator::shared_small_page_addr() const+0x41
> V  [libjvm.dylib+0x7af7d9]  ZObjectAllocator::remaining() const+0x9
> V  [libjvm.dylib+0x7a4369]  ZHeap::unsafe_max_tlab_alloc() const+0xd
> V  [libjvm.dylib+0x56218b]  ThreadLocalAllocBuffer::compute_size(unsigned long)+0x33
> V  [libjvm.dylib+0x562080]  MemAllocator::allocate_inside_tlab_slow(MemAllocator::Allocation&) const+0xca
> V  [libjvm.dylib+0x562270]  MemAllocator::mem_allocate(MemAllocator::Allocation&) const+0x24
> V  [libjvm.dylib+0x5622d1]  MemAllocator::allocate() const+0x47
> V  [libjvm.dylib+0x7a1318]  ZCollectedHeap::array_allocate(Klass*, int, int, bool, Thread*)+0x28
> V  [libjvm.dylib+0x32c0c7]  InstanceKlass::allocate_objArray(int, int, Thread*)+0xd7
> ----
> 
>> -----Original Message-----
>> From: Per Liden <per.liden at oracle.com>
>> Sent: Tuesday, 7 April 2020 12:53
>> To: Baesken, Matthias <matthias.baesken at sap.com>; hotspot-gc-dev
>> <hotspot-gc-dev at openjdk.java.net>; Langer, Christoph
>> <christoph.langer at sap.com>; Zeller, Arno <arno.zeller at sap.com>
>> Subject: Re: RFR: 8241603: ZGC:
>> java/lang/management/MemoryMXBean/MemoryTestZGC.sh crashes on
>> macOS
>>
>> Thanks! Just checking, are you testing without the workaround[1] you
>> applied to your VMware instances?
>>
>> cheers,
>> Per
>>
>> [1] "We solved our issue by reconfiguring the VMWare VM to have no
>> hyperthreading and have the CPUs pinned to the VM. This solved the
>> issues for us." -
>> https://bugs.openjdk.java.net/browse/JDK-8241603?focusedCommentId=14327438&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14327438
>>
>>
>> On 4/7/20 12:07 PM, Baesken, Matthias wrote:
>>> Hi Per, I put your patch into our build/test queue.
>>>
>>> Best regards, Matthias
>>>
>>>
>>> -----Original Message-----
>>> From: Per Liden <per.liden at oracle.com>
>>> Sent: Monday, 6 April 2020 17:04
>>> To: hotspot-gc-dev <hotspot-gc-dev at openjdk.java.net>; Langer,
>>> Christoph <christoph.langer at sap.com>; Baesken, Matthias
>>> <matthias.baesken at sap.com>; Zeller, Arno <arno.zeller at sap.com>
>>> Subject: RFR: 8241603: ZGC:
>>> java/lang/management/MemoryMXBean/MemoryTestZGC.sh crashes on
>>> macOS
>>>
>>> It was reported that "Every few days, the test
>>> java/lang/management/MemoryMXBean/MemoryTestZGC.sh crashes on macOS. It
>>> is macOS 10.14.4, and it is a virtualized machine running with VMWare
>>> hypervisor."
>>>
>>> The problem seems to be that the hypervisor (in some configurations) can
>>> migrate a "virtual CPU" from one physical CPU to another, and start to
>>> report a different APIC id. As a result, it can appear as if there are
>>> more than os::processor_count() CPUs in the system. To avoid this, we
>>> allow more than one APIC id to be mapped to the same logical processor
>>> id, so that os::processor_id() always returns a processor id that is
>>> less than os::processor_count().
>>>
>>> One could argue that this is really a hypervisor bug, but we can still
>>> make an effort to mitigate the problem in the JVM.
>>>
>>> SAP-folks (CC:ing those who commented in the bug), since you ran into
>>> this problem and I don't have access to a VMware setup where I can
>>> test/reproduce this, could you please test this patch to verify it
>>> solves the problem? If so, that would be much appreciated.
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8241603
>>> Webrev: http://cr.openjdk.java.net/~pliden/8241603/webrev.0
>>> Testing: Tier 1-6 on macOS (but not macOS on top of VMware)
>>>
>>> cheers,
>>> Per
>>>
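
The mitigation described in the original RFR (letting more than one APIC id
map to the same logical processor id, so that the returned id always stays
below the processor count) could be sketched roughly as follows. This is an
illustration only, not the code from the webrev: the class ApicIdMapper, its
member names, and the 256-entry table (assuming an 8-bit APIC id) are all
hypothetical, and standard C++ atomics stand in for HotSpot's own primitives.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not HotSpot code): lazily maps APIC ids to logical
// processor ids in the range [0, processor_count).
class ApicIdMapper {
public:
  explicit ApicIdMapper(int processor_count)
      : _processor_count(processor_count), _next_id(0) {
    assert(processor_count > 0);
    for (auto& slot : _mapping) {
      slot.store(kUnassigned, std::memory_order_relaxed);
    }
  }

  // Returns the logical processor id for the given APIC id. The first time
  // an APIC id is seen it is given the next id, wrapped around the processor
  // count, so several APIC ids may end up sharing one logical id.
  int processor_id(std::uint8_t apic_id) {
    std::atomic<int>& slot = _mapping[apic_id];
    const int id = slot.load(std::memory_order_acquire);
    if (id != kUnassigned) {
      return id;
    }
    const int candidate =
        _next_id.fetch_add(1, std::memory_order_relaxed) % _processor_count;
    int expected = kUnassigned;
    if (slot.compare_exchange_strong(expected, candidate,
                                     std::memory_order_acq_rel)) {
      return candidate;
    }
    // Another thread assigned this slot first; use its value.
    return expected;
  }

private:
  static constexpr int kUnassigned = -1;

  const int _processor_count;
  std::atomic<int> _next_id;
  std::array<std::atomic<int>, 256> _mapping;  // assumes 8-bit APIC ids
};
```

With a processor count of 2, for example, the APIC ids 0, 2 and 4 seen in
that order would be given the logical ids 0, 1 and 0, so a "new" APIC id
reported after a migration can never push the result past the processor
count. The cost is that two CPUs may share per-CPU state.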