[jdk11u-dev] RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v7]

Thomas Stuefe stuefe at openjdk.java.net
Thu Jan 27 07:53:37 UTC 2022


On Thu, 20 Jan 2022 15:43:46 GMT, Vladimir Kempik <vkempik at openjdk.org> wrote:

>> Initial version of JEP-391 backport to jdk11u.
>> Build system changes are mostly clean, except cds disabling part, it's copy&paste from aix part.
>> Things needing attention: os_bsd_aarch64.cpp and W^X transitions.
>> serviceability agent is mostly clean.
>> This passed GHA_tier1 testing.
>> Full regression testing is running now on Azul's infra, will report/update PR when done.
>> **Update: TCK passed, full regression testing is fine on intel platforms, macos/aarch64/openjdk11 is good compared to macos/aarch64/zulu11**
>> Sharing this PR slightly earlier to let other interested parties to run the tests too.
>> Example of cross-building on intel mac (needs Xcode12/13):
>> sh configure --with-boot-jdk=/Library/Java/JavaVirtualMachines/zulu-11.jdk/Contents/Home/ --with-build-jdk=/Library/Java/JavaVirtualMachines/zulu-11.jdk/Contents/Home/ --disable-warnings-as-errors --openjdk-target=aarch64-apple-darwin --with-extra-cflags='-arch arm64' --with-extra-ldflags='-arch arm64' --with-extra-cxxflags='-arch arm64'
>
> Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Ignore TestOnError.java on macos_aarch64

Hi, 

I was asked off-list to have a quick look at these problems.

-----

About the runtime-exec related crashes. I looked at the hs-err file [forkAndExec_coredump.log](https://github.com/openjdk/jdk11u-dev/files/7927720/forkAndExec_coredump.log):

- We crash with a SIGBUS because of accessing an unaligned address: `siginfo: si_signo: 10 (SIGBUS), si_code: 1 (BUS_ADRALN), si_addr: 0x000000013000bbfc`
- because we try to execute code from there `pc=0x000000013000bbfc` and it's not 64-bit aligned.
- Crash address points into the code heap, into interpreter stub for calling native methods: `method entry point (kind = native)  [0x000000013000b940, 0x000000013000c100]  1984 bytes`
- the native method involved is `java.lang.ProcessImpl.forkAndExec(I[B[B[BI[BI[B[IZ)I+0` but I assume we never entered it. Also note that the crash offset is 0.
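
The address arithmetic behind these points can be re-checked with a small sketch. It uses only the values quoted from the hs-err file above; the variable names are illustrative and not taken from HotSpot.

```python
# Values copied from the hs-err excerpt quoted above.
PC = 0x000000013000BBFC          # crashing pc, same as si_addr
STUB_START = 0x000000013000B940  # "method entry point (kind = native)" start
STUB_END = 0x000000013000C100    # end of that stub (exclusive)

inside_stub = STUB_START <= PC < STUB_END
stub_size = STUB_END - STUB_START
offset_in_stub = PC - STUB_START

print(f"pc inside native entry stub: {inside_stub}")
print(f"stub size: {stub_size} bytes")
print(f"pc offset into stub: {offset_in_stub} (0x{offset_in_stub:x})")
print(f"pc % 8 = {PC % 8}, pc % 4 = {PC % 4}")
```

The computed stub size matches the 1984 bytes printed in the log, the pc lies 0x2bc bytes into the stub, and the pc is a multiple of 4 but not of 8.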

Atm I assume something is wrong with the native call stub generated by the template interpreter. It's weird, though, that this would manifest only when calling into this particular native method.

I don't think jspawnhelper is involved. Besides what I wrote above, we do crash in the parent process, before exec'ing jspawnhelper. I think we never even got around to forking.

It's weird that this is only happening in 11. Is it? 

Just for completeness I compared the native process handling code between 11 and 19, and I see little difference. That code has seldom changed, and nothing relevant changed for MacOS. Most changes had to do with the removal of the Solaris platform. Note that MacOS always used posix_spawn.

Looking at the code generating method entry stubs on aarch64, there are some differences between JDK19 and JDK11 in `TemplateInterpreterGenerator::generate_native_entry`. Could 11 be missing some fixes here? E.g. there is mention of a race fixed in the context of JEP 376 (https://github.com/openjdk/jdk/blob/2ea0edf2c40edde4c191864a40e7a4d741ac0b8e/src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp#L1333-L1340). Not sure how relevant that would be in older releases and outside ZGC though.

I am not an aarch64 expert, maybe someone from RedHat could look at this (@theRealAph)?

-----

NMT crashes

I looked at https://github.com/openjdk/jdk11u-dev/files/7927880/assertCrash_coredump.log.

All NMT tests crash when trying to generate an NMT report via jcmd. They crash when walking stored information about virtual memory segments prior to printing, because their boundaries are not page aligned.

What confused me was that I thought MacOS m1 uses 64k pages. I tried to confirm this by looking at the crash dump (I wish we would just print out os::vm_page_size) but I found the point where we protect stack guards:


Event: 8.927 Protecting memory [0x00000001722b8000,0x00000001722c4000] with protection modes 0


and these boundaries are not aligned to 64k pages. Could someone with an m1 Mac confirm that os::vm_page_size is 64k? Is there anything special on MacOS m1 wrt page size?
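
To make the alignment question concrete, here is a minimal sketch that tests the two boundaries from the event line above against a few candidate page sizes. The candidate sizes and the helper `is_aligned` are illustrative choices, not HotSpot code. The last line prints the page size of whatever machine runs the script via Python's `mmap.PAGESIZE`; that this equals what `os::vm_page_size` returns inside the VM is an assumption, not something the crash dump confirms.

```python
import mmap

def is_aligned(addr: int, alignment: int) -> bool:
    """Return True if addr is a multiple of alignment."""
    return addr % alignment == 0

# Boundaries from the "Protecting memory" event quoted above.
LOW = 0x00000001722B8000
HIGH = 0x00000001722C4000

# Candidate page sizes to test (chosen for illustration).
for kb in (4, 16, 64):
    size = kb * 1024
    print(f"{kb:>2}K: low aligned={is_aligned(LOW, size)}, "
          f"high aligned={is_aligned(HIGH, size)}")

print(f"protected range: {HIGH - LOW} bytes")

# Page size of the machine running this script (not read from the crash dump).
print(f"local OS page size: {mmap.PAGESIZE} bytes")
```

Both boundaries are multiples of 4K and 16K but not of 64K, and the protected range is 48K long. Running the script on an m1 Mac would show the page size the OS reports there.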

One possible reason would be that NMT also accounts for thread stacks, and traditionally accounted for those as if they were allocated using mmap. But there is nothing that says that posix threads need to align their stacks to page size. E.g. on AIX it was not so. So they may not be page-aligned, which would generate the kind of errors you see.

Zhengyu fixed this code up in JDK 13 with https://bugs.openjdk.java.net/browse/JDK-8204552. Maybe a similar solution is needed for MacOS m1?

To quickly confirm or exclude this theory, you could just disable the accounting of thread stacks (e.g. by stubbing out `MemTracker::record_thread_stack` and `MemTracker::release_thread_stack`). Other than that, maybe @zhengyu123 can take a look too.

---------

Hope I could help,

Cheers, Thomas

-------------

PR: https://git.openjdk.java.net/jdk11u-dev/pull/715

