[premain] need help merging g1BarrierSetC2.cpp
In the premain branch, we have made some changes in g1BarrierSetC2.cpp commit cc111760a0dd123935e039feb19eaac742f865b2 Author: Vladimir Kozlov <vladimir.kozlov@oracle.com> Date: Mon Aug 26 14:21:28 2024 Cleanup AOT runtime constants code commit c319a3ee2be0236feab88c9be66d71e6e93b53fc Author: Andrew Dinn <adinn@redhat.com> Date: Wed Aug 21 07:48:22 2024 AOT G1 barrier loads region grain and card shifts via runtime constants data area Recently, the following changes in g1BarrierSetC2.cpp have been pushed to mainline that caused conflicts with the leyden changes. commit 81ebbb2463df8b014bb209dc4028668fc78e8327 Author: Roberto Castañeda Lozano <rcastanedalo@openjdk.org> Date: Mon Oct 7 01:28:18 2024 8341525: G1: use bit clearing to remove tightly-coupled initialization store pre-barriers commit 0b467e902d591ae9feeec1669918d1588987cd1c Author: Roberto Castañeda Lozano <rcastanedalo@openjdk.org> Date: Thu Oct 3 01:36:33 2024 8334060: Implementation of Late Barrier Expansion for G1 When I tried merging with mainline, I couldn't merge these changes, so I ended up merging to this one (just an hour prior to 0b467e9), and pushed the result to premain. commit c6e7e551928c04b74775b5d4c03eb31232aeb2c9 (master) Author: Aleksey Shipilev <shade@openjdk.org> Date: Thu Oct 3 00:25:42 2024 8341091: CDS: Segmented roots array misses roots Andrew Dinn or Vladimir Kozlov, could you help merging with mainline? Thanks - Ioi
Hi Ioi/Vladimir, I merged the rest of master into premain, fixed the conflicts and pushed the result to my premain branch. https://github.com/adinn/leyden/tree/premain The merged code built and ran ok on aarch64. On x86 it built ok but crashed during the assembly run. The problem was that the x86 post barrier can now be called from C2 with tmp and tmp2 allocated by the register allocator. So, that means both rcx and rscratch1 are possible values for register tmp. This is a problem because the x86 AOT variant of the barrier code pushes rcx and rscratch1 to the stack, loads the region grain shift address in to rscratch1 and the grain shift into into rcx. It has to be able to use rcx in order to be able to execute shrptr(tmp). However, if tmp == rcx then the unshifted value is restored into tmp when rcx is popped. I pushed a follow-up patch to my premain that avoids this problem. It uses a pushed register to do the shift and ensures that register differs from tmp and rcx. regards, Andrew Dinn ----------- On 09/10/2024 07:08, ioi.lam@oracle.com wrote:
In the premain branch, we have made some changes in g1BarrierSetC2.cpp
commit cc111760a0dd123935e039feb19eaac742f865b2 Author: Vladimir Kozlov <vladimir.kozlov@oracle.com> Date: Mon Aug 26 14:21:28 2024
Cleanup AOT runtime constants code
commit c319a3ee2be0236feab88c9be66d71e6e93b53fc Author: Andrew Dinn <adinn@redhat.com> Date: Wed Aug 21 07:48:22 2024
AOT G1 barrier loads region grain and card shifts via runtime constants data area
Recently, the following changes in g1BarrierSetC2.cpp have been pushed to mainline that caused conflicts with the leyden changes.
commit 81ebbb2463df8b014bb209dc4028668fc78e8327 Author: Roberto Castañeda Lozano <rcastanedalo@openjdk.org> Date: Mon Oct 7 01:28:18 2024
8341525: G1: use bit clearing to remove tightly-coupled initialization store pre-barriers
commit 0b467e902d591ae9feeec1669918d1588987cd1c Author: Roberto Castañeda Lozano <rcastanedalo@openjdk.org> Date: Thu Oct 3 01:36:33 2024
8334060: Implementation of Late Barrier Expansion for G1
When I tried merging with mainline, I couldn't merge these changes, so I ended up merging to this one (just an hour prior to 0b467e9), and pushed the result to premain.
commit c6e7e551928c04b74775b5d4c03eb31232aeb2c9 (master) Author: Aleksey Shipilev <shade@openjdk.org> Date: Thu Oct 3 00:25:42 2024
8341091: CDS: Segmented roots array misses roots
Andrew Dinn or Vladimir Kozlov, could you help merging with mainline?
Thanks
- Ioi
Thanks Andrew. I'll test your merge on our CI and then push to premain. - Ioi On 10/9/24 7:46 AM, Andrew Dinn wrote:
Hi Ioi/Vladimir,
I merged the rest of master into premain, fixed the conflicts and pushed the result to my premain branch.
https://urldefense.com/v3/__https://github.com/adinn/leyden/tree/premain__;!...
The merged code built and ran ok on aarch64. On x86 it built ok but crashed during the assembly run.
The problem was that the x86 post barrier can now be called from C2 with tmp and tmp2 allocated by the register allocator. So, that means both rcx and rscratch1 are possible values for register tmp. This is a problem because the x86 AOT variant of the barrier code pushes rcx and rscratch1 to the stack, loads the region grain shift address in to rscratch1 and the grain shift into into rcx. It has to be able to use rcx in order to be able to execute shrptr(tmp). However, if tmp == rcx then the unshifted value is restored into tmp when rcx is popped.
I pushed a follow-up patch to my premain that avoids this problem. It uses a pushed register to do the shift and ensures that register differs from tmp and rcx.
regards,
Andrew Dinn -----------
On 09/10/2024 07:08, ioi.lam@oracle.com wrote:
In the premain branch, we have made some changes in g1BarrierSetC2.cpp
commit cc111760a0dd123935e039feb19eaac742f865b2 Author: Vladimir Kozlov <vladimir.kozlov@oracle.com> Date: Mon Aug 26 14:21:28 2024
Cleanup AOT runtime constants code
commit c319a3ee2be0236feab88c9be66d71e6e93b53fc Author: Andrew Dinn <adinn@redhat.com> Date: Wed Aug 21 07:48:22 2024
AOT G1 barrier loads region grain and card shifts via runtime constants data area
Recently, the following changes in g1BarrierSetC2.cpp have been pushed to mainline that caused conflicts with the leyden changes.
commit 81ebbb2463df8b014bb209dc4028668fc78e8327 Author: Roberto Castañeda Lozano <rcastanedalo@openjdk.org> Date: Mon Oct 7 01:28:18 2024
8341525: G1: use bit clearing to remove tightly-coupled initialization store pre-barriers
commit 0b467e902d591ae9feeec1669918d1588987cd1c Author: Roberto Castañeda Lozano <rcastanedalo@openjdk.org> Date: Thu Oct 3 01:36:33 2024
8334060: Implementation of Late Barrier Expansion for G1
When I tried merging with mainline, I couldn't merge these changes, so I ended up merging to this one (just an hour prior to 0b467e9), and pushed the result to premain.
commit c6e7e551928c04b74775b5d4c03eb31232aeb2c9 (master) Author: Aleksey Shipilev <shade@openjdk.org> Date: Thu Oct 3 00:25:42 2024
8341091: CDS: Segmented roots array misses roots
Andrew Dinn or Vladimir Kozlov, could you help merging with mainline?
Thanks
- Ioi
I am seeing some failures on linux/x64 with various assertion failures in the GC code, either during collection or gc verification. I merged up to d504df0e3d7928f0e7ff10ace03e002953c6a58f from Andrew's repo. runtime/cds/appcds/applications/HelidonQuickStartSE.java#leyden runtime/cds/appcds/applications/HelidonQuickStartSE.java#leyden_old runtime/cds/appcds/applications/JavacBench.java#leyden runtime/cds/appcds/applications/JavacBench.java#leyden_old runtime/cds/appcds/applications/SpringPetClinic.java#leyden_old runtime/cds/appcds/leyden/ExcludedClasses.java A sample stack trace # SIGSEGV (0xb) at pc=0x00007ff1d9260f78, pid=472145, tid=472252 V [libjvm.so+0x636f78] oopDesc::size_given_klass(Klass*)+0x28 (oop.inline.hpp:196) V [libjvm.so+0xd9e66d] G1ParScanThreadState::do_copy_to_survivor_space(G1HeapRegionAttr, oop, markWord)+0x9d (g1ParScanThreadState.cpp:480) V [libjvm.so+0xda43db] void G1ParScanThreadState::do_oop_evac<narrowOop>(narrowOop*)+0x12b (g1ParScanThreadState.cpp:221) V [libjvm.so+0xd9fa98] G1ParScanThreadState::trim_queue_to_threshold(unsigned int)+0x488 (g1ParScanThreadState.cpp:334) V [libjvm.so+0xdb96fa] G1ParScanThreadState::trim_queue_partially()+0x3a (g1ParScanThreadState.inline.hpp:53) V [libjvm.so+0xdbe943] void G1ScanHRForRegionClosure::ChunkScanner::on_dirty_cards<G1ScanHRForRegionClosure::scan_heap_roots(G1HeapRegion*)::{lambda(unsigned char*, unsigned char*)#1}>(G1ScanHRForRegionClosure::scan_heap_roots(G1HeapRegion*)::{lambda(unsigned char*, unsigned char*)#1}&&)+0x2c3 (g1RemSet.cpp:519) V [libjvm.so+0xdb6f39] G1RemSet::scan_heap_roots(G1ParScanThreadState*, unsigned int, G1GCPhaseTimes::GCParPhases, G1GCPhaseTimes::GCParPhases, bool)+0x5e9 (g1RemSet.cpp:664) V [libjvm.so+0xdea89b] G1EvacuateRegionsTask::scan_roots(G1ParScanThreadState*, unsigned int)+0x4b (g1YoungCollector.cpp:670) V [libjvm.so+0xdeac6a] G1EvacuateRegionsBaseTask::work(unsigned int)+0x8a (g1YoungCollector.cpp:656) V [libjvm.so+0x1a2b480] WorkerThread::run()+0x80 (workerThread.cpp:70) V [libjvm.so+0x18cc40a] Thread::call_run()+0xba (thread.cpp:245) V [libjvm.so+0x159e0f9] thread_native_entry(Thread*)+0x1c9 (os_linux.cpp:858) siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000000000010e I'll run tests on aarch64 as well and will report back when I get the results. Thanks - Ioi On 10/9/24 8:23 AM, ioi.lam@oracle.com wrote:
Thanks Andrew.
I'll test your merge on our CI and then push to premain.
- Ioi
On 10/9/24 7:46 AM, Andrew Dinn wrote:
Hi Ioi/Vladimir,
I merged the rest of master into premain, fixed the conflicts and pushed the result to my premain branch.
https://urldefense.com/v3/__https://github.com/adinn/leyden/tree/premain__;!...
The merged code built and ran ok on aarch64. On x86 it built ok but crashed during the assembly run.
The problem was that the x86 post barrier can now be called from C2 with tmp and tmp2 allocated by the register allocator. So, that means both rcx and rscratch1 are possible values for register tmp. This is a problem because the x86 AOT variant of the barrier code pushes rcx and rscratch1 to the stack, loads the region grain shift address in to rscratch1 and the grain shift into into rcx. It has to be able to use rcx in order to be able to execute shrptr(tmp). However, if tmp == rcx then the unshifted value is restored into tmp when rcx is popped.
I pushed a follow-up patch to my premain that avoids this problem. It uses a pushed register to do the shift and ensures that register differs from tmp and rcx.
regards,
Andrew Dinn -----------
On 09/10/2024 07:08, ioi.lam@oracle.com wrote:
In the premain branch, we have made some changes in g1BarrierSetC2.cpp
commit cc111760a0dd123935e039feb19eaac742f865b2 Author: Vladimir Kozlov <vladimir.kozlov@oracle.com> Date: Mon Aug 26 14:21:28 2024
Cleanup AOT runtime constants code
commit c319a3ee2be0236feab88c9be66d71e6e93b53fc Author: Andrew Dinn <adinn@redhat.com> Date: Wed Aug 21 07:48:22 2024
AOT G1 barrier loads region grain and card shifts via runtime constants data area
Recently, the following changes in g1BarrierSetC2.cpp have been pushed to mainline that caused conflicts with the leyden changes.
commit 81ebbb2463df8b014bb209dc4028668fc78e8327 Author: Roberto Castañeda Lozano <rcastanedalo@openjdk.org> Date: Mon Oct 7 01:28:18 2024
8341525: G1: use bit clearing to remove tightly-coupled initialization store pre-barriers
commit 0b467e902d591ae9feeec1669918d1588987cd1c Author: Roberto Castañeda Lozano <rcastanedalo@openjdk.org> Date: Thu Oct 3 01:36:33 2024
8334060: Implementation of Late Barrier Expansion for G1
When I tried merging with mainline, I couldn't merge these changes, so I ended up merging to this one (just an hour prior to 0b467e9), and pushed the result to premain.
commit c6e7e551928c04b74775b5d4c03eb31232aeb2c9 (master) Author: Aleksey Shipilev <shade@openjdk.org> Date: Thu Oct 3 00:25:42 2024
8341091: CDS: Segmented roots array misses roots
Andrew Dinn or Vladimir Kozlov, could you help merging with mainline?
Thanks
- Ioi
I'm also seeing failures on macos-aarch64 and linux-aarch64. In addition to the tests I listed below, these two also fail reliably runtime/cds/appcds/applications/MicronautFirstApp.java#leyden runtime/cds/appcds/applications/MicronautFirstApp.java#leyden_old - Ioi On 10/9/24 9:02 AM, ioi.lam@oracle.com wrote:
I am seeing some failures on linux/x64 with various assertion failures in the GC code, either during collection or gc verification. I merged up to d504df0e3d7928f0e7ff10ace03e002953c6a58f from Andrew's repo.
runtime/cds/appcds/applications/HelidonQuickStartSE.java#leyden runtime/cds/appcds/applications/HelidonQuickStartSE.java#leyden_old runtime/cds/appcds/applications/JavacBench.java#leyden runtime/cds/appcds/applications/JavacBench.java#leyden_old runtime/cds/appcds/applications/SpringPetClinic.java#leyden_old runtime/cds/appcds/leyden/ExcludedClasses.java
A sample stack trace
# SIGSEGV (0xb) at pc=0x00007ff1d9260f78, pid=472145, tid=472252 V [libjvm.so+0x636f78] oopDesc::size_given_klass(Klass*)+0x28 (oop.inline.hpp:196) V [libjvm.so+0xd9e66d] G1ParScanThreadState::do_copy_to_survivor_space(G1HeapRegionAttr, oop, markWord)+0x9d (g1ParScanThreadState.cpp:480) V [libjvm.so+0xda43db] void G1ParScanThreadState::do_oop_evac<narrowOop>(narrowOop*)+0x12b (g1ParScanThreadState.cpp:221) V [libjvm.so+0xd9fa98] G1ParScanThreadState::trim_queue_to_threshold(unsigned int)+0x488 (g1ParScanThreadState.cpp:334) V [libjvm.so+0xdb96fa] G1ParScanThreadState::trim_queue_partially()+0x3a (g1ParScanThreadState.inline.hpp:53) V [libjvm.so+0xdbe943] void G1ScanHRForRegionClosure::ChunkScanner::on_dirty_cards<G1ScanHRForRegionClosure::scan_heap_roots(G1HeapRegion*)::{lambda(unsigned char*, unsigned char*)#1}>(G1ScanHRForRegionClosure::scan_heap_roots(G1HeapRegion*)::{lambda(unsigned char*, unsigned char*)#1}&&)+0x2c3 (g1RemSet.cpp:519) V [libjvm.so+0xdb6f39] G1RemSet::scan_heap_roots(G1ParScanThreadState*, unsigned int, G1GCPhaseTimes::GCParPhases, G1GCPhaseTimes::GCParPhases, bool)+0x5e9 (g1RemSet.cpp:664) V [libjvm.so+0xdea89b] G1EvacuateRegionsTask::scan_roots(G1ParScanThreadState*, unsigned int)+0x4b (g1YoungCollector.cpp:670) V [libjvm.so+0xdeac6a] G1EvacuateRegionsBaseTask::work(unsigned int)+0x8a (g1YoungCollector.cpp:656) V [libjvm.so+0x1a2b480] WorkerThread::run()+0x80 (workerThread.cpp:70) V [libjvm.so+0x18cc40a] Thread::call_run()+0xba (thread.cpp:245) V [libjvm.so+0x159e0f9] thread_native_entry(Thread*)+0x1c9 (os_linux.cpp:858)
siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000000000010e
I'll run tests on aarch64 as well and will report back when I get the results.
Thanks
- Ioi
On 10/9/24 8:23 AM, ioi.lam@oracle.com wrote:
Thanks Andrew.
I'll test your merge on our CI and then push to premain.
- Ioi
On 10/9/24 7:46 AM, Andrew Dinn wrote:
Hi Ioi/Vladimir,
I merged the rest of master into premain, fixed the conflicts and pushed the result to my premain branch.
https://urldefense.com/v3/__https://github.com/adinn/leyden/tree/premain__;!...
The merged code built and ran ok on aarch64. On x86 it built ok but crashed during the assembly run.
The problem was that the x86 post barrier can now be called from C2 with tmp and tmp2 allocated by the register allocator. So, that means both rcx and rscratch1 are possible values for register tmp. This is a problem because the x86 AOT variant of the barrier code pushes rcx and rscratch1 to the stack, loads the region grain shift address in to rscratch1 and the grain shift into into rcx. It has to be able to use rcx in order to be able to execute shrptr(tmp). However, if tmp == rcx then the unshifted value is restored into tmp when rcx is popped.
I pushed a follow-up patch to my premain that avoids this problem. It uses a pushed register to do the shift and ensures that register differs from tmp and rcx.
regards,
Andrew Dinn -----------
On 09/10/2024 07:08, ioi.lam@oracle.com wrote:
In the premain branch, we have made some changes in g1BarrierSetC2.cpp
commit cc111760a0dd123935e039feb19eaac742f865b2 Author: Vladimir Kozlov <vladimir.kozlov@oracle.com> Date: Mon Aug 26 14:21:28 2024
Cleanup AOT runtime constants code
commit c319a3ee2be0236feab88c9be66d71e6e93b53fc Author: Andrew Dinn <adinn@redhat.com> Date: Wed Aug 21 07:48:22 2024
AOT G1 barrier loads region grain and card shifts via runtime constants data area
Recently, the following changes in g1BarrierSetC2.cpp have been pushed to mainline that caused conflicts with the leyden changes.
commit 81ebbb2463df8b014bb209dc4028668fc78e8327 Author: Roberto Castañeda Lozano <rcastanedalo@openjdk.org> Date: Mon Oct 7 01:28:18 2024
8341525: G1: use bit clearing to remove tightly-coupled initialization store pre-barriers
commit 0b467e902d591ae9feeec1669918d1588987cd1c Author: Roberto Castañeda Lozano <rcastanedalo@openjdk.org> Date: Thu Oct 3 01:36:33 2024
8334060: Implementation of Late Barrier Expansion for G1
When I tried merging with mainline, I couldn't merge these changes, so I ended up merging to this one (just an hour prior to 0b467e9), and pushed the result to premain.
commit c6e7e551928c04b74775b5d4c03eb31232aeb2c9 (master) Author: Aleksey Shipilev <shade@openjdk.org> Date: Thu Oct 3 00:25:42 2024
8341091: CDS: Segmented roots array misses roots
Andrew Dinn or Vladimir Kozlov, could you help merging with mainline?
Thanks
- Ioi
On 09/10/2024 22:23, ioi.lam@oracle.com wrote:
I'm also seeing failures on macos-aarch64 and linux-aarch64. In addition to the tests I listed below, these two also fail reliably
runtime/cds/appcds/applications/MicronautFirstApp.java#leyden runtime/cds/appcds/applications/MicronautFirstApp.java#leyden_old
I'll see if I can debug these failures. regards, Andrew Dinn -----------
On 10/9/24 9:02 AM, ioi.lam@oracle.com wrote:
I am seeing some failures on linux/x64 with various assertion failures in the GC code, either during collection or gc verification. I merged up to d504df0e3d7928f0e7ff10ace03e002953c6a58f from Andrew's repo.
runtime/cds/appcds/applications/HelidonQuickStartSE.java#leyden runtime/cds/appcds/applications/HelidonQuickStartSE.java#leyden_old runtime/cds/appcds/applications/JavacBench.java#leyden runtime/cds/appcds/applications/JavacBench.java#leyden_old runtime/cds/appcds/applications/SpringPetClinic.java#leyden_old runtime/cds/appcds/leyden/ExcludedClasses.java
A sample stack trace
# SIGSEGV (0xb) at pc=0x00007ff1d9260f78, pid=472145, tid=472252 V [libjvm.so+0x636f78] oopDesc::size_given_klass(Klass*)+0x28 (oop.inline.hpp:196) V [libjvm.so+0xd9e66d] G1ParScanThreadState::do_copy_to_survivor_space(G1HeapRegionAttr, oop, markWord)+0x9d (g1ParScanThreadState.cpp:480) V [libjvm.so+0xda43db] void G1ParScanThreadState::do_oop_evac<narrowOop>(narrowOop*)+0x12b (g1ParScanThreadState.cpp:221) V [libjvm.so+0xd9fa98] G1ParScanThreadState::trim_queue_to_threshold(unsigned int)+0x488 (g1ParScanThreadState.cpp:334) V [libjvm.so+0xdb96fa] G1ParScanThreadState::trim_queue_partially()+0x3a (g1ParScanThreadState.inline.hpp:53) V [libjvm.so+0xdbe943] void G1ScanHRForRegionClosure::ChunkScanner::on_dirty_cards<G1ScanHRForRegionClosure::scan_heap_roots(G1HeapRegion*)::{lambda(unsigned char*, unsigned char*)#1}>(G1ScanHRForRegionClosure::scan_heap_roots(G1HeapRegion*)::{lambda(unsigned char*, unsigned char*)#1}&&)+0x2c3 (g1RemSet.cpp:519) V [libjvm.so+0xdb6f39] G1RemSet::scan_heap_roots(G1ParScanThreadState*, unsigned int, G1GCPhaseTimes::GCParPhases, G1GCPhaseTimes::GCParPhases, bool)+0x5e9 (g1RemSet.cpp:664) V [libjvm.so+0xdea89b] G1EvacuateRegionsTask::scan_roots(G1ParScanThreadState*, unsigned int)+0x4b (g1YoungCollector.cpp:670) V [libjvm.so+0xdeac6a] G1EvacuateRegionsBaseTask::work(unsigned int)+0x8a (g1YoungCollector.cpp:656) V [libjvm.so+0x1a2b480] WorkerThread::run()+0x80 (workerThread.cpp:70) V [libjvm.so+0x18cc40a] Thread::call_run()+0xba (thread.cpp:245) V [libjvm.so+0x159e0f9] thread_native_entry(Thread*)+0x1c9 (os_linux.cpp:858)
siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000000000010e
I'll run tests on aarch64 as well and will report back when I get the results.
Thanks
- Ioi
On 10/9/24 8:23 AM, ioi.lam@oracle.com wrote:
Thanks Andrew.
I'll test your merge on our CI and then push to premain.
- Ioi
On 10/9/24 7:46 AM, Andrew Dinn wrote:
Hi Ioi/Vladimir,
I merged the rest of master into premain, fixed the conflicts and pushed the result to my premain branch.
https://urldefense.com/v3/__https://github.com/adinn/leyden/tree/premain__;!... The merged code built and ran ok on aarch64. On x86 it built ok but crashed during the assembly run.
The problem was that the x86 post barrier can now be called from C2 with tmp and tmp2 allocated by the register allocator. So, that means both rcx and rscratch1 are possible values for register tmp. This is a problem because the x86 AOT variant of the barrier code pushes rcx and rscratch1 to the stack, loads the region grain shift address in to rscratch1 and the grain shift into into rcx. It has to be able to use rcx in order to be able to execute shrptr(tmp). However, if tmp == rcx then the unshifted value is restored into tmp when rcx is popped.
I pushed a follow-up patch to my premain that avoids this problem. It uses a pushed register to do the shift and ensures that register differs from tmp and rcx.
regards,
Andrew Dinn -----------
On 09/10/2024 07:08, ioi.lam@oracle.com wrote:
In the premain branch, we have made some changes in g1BarrierSetC2.cpp
commit cc111760a0dd123935e039feb19eaac742f865b2 Author: Vladimir Kozlov <vladimir.kozlov@oracle.com> Date: Mon Aug 26 14:21:28 2024
Cleanup AOT runtime constants code
commit c319a3ee2be0236feab88c9be66d71e6e93b53fc Author: Andrew Dinn <adinn@redhat.com> Date: Wed Aug 21 07:48:22 2024
AOT G1 barrier loads region grain and card shifts via runtime constants data area
Recently, the following changes in g1BarrierSetC2.cpp have been pushed to mainline that caused conflicts with the leyden changes.
commit 81ebbb2463df8b014bb209dc4028668fc78e8327 Author: Roberto Castañeda Lozano <rcastanedalo@openjdk.org> Date: Mon Oct 7 01:28:18 2024
8341525: G1: use bit clearing to remove tightly-coupled initialization store pre-barriers
commit 0b467e902d591ae9feeec1669918d1588987cd1c Author: Roberto Castañeda Lozano <rcastanedalo@openjdk.org> Date: Thu Oct 3 01:36:33 2024
8334060: Implementation of Late Barrier Expansion for G1
When I tried merging with mainline, I couldn't merge these changes, so I ended up merging to this one (just an hour prior to 0b467e9), and pushed the result to premain.
commit c6e7e551928c04b74775b5d4c03eb31232aeb2c9 (master) Author: Aleksey Shipilev <shade@openjdk.org> Date: Thu Oct 3 00:25:42 2024
8341091: CDS: Segmented roots array misses roots
Andrew Dinn or Vladimir Kozlov, could you help merging with mainline?
Thanks
- Ioi
-- regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill
On 10/10/2024 09:46, Andrew Dinn wrote:
On 09/10/2024 22:23, ioi.lam@oracle.com wrote:
I'm also seeing failures on macos-aarch64 and linux-aarch64. In addition to the tests I listed below, these two also fail reliably
runtime/cds/appcds/applications/MicronautFirstApp.java#leyden runtime/cds/appcds/applications/MicronautFirstApp.java#leyden_old
I'll see if I can debug these failures.
I have been debugging MicronautFirstApp.java#leyden on aarch64 and found a few details of what is going wrong but no fix yet. The error happens in the production VM run. It is bailing out with a SEGV at a pc that is at a mapped address but does not identify real code. The problem appears consistently but not always in the same place. I have been able to identify some details of what is happening under calls to ConcurrentHashMap$ValueIterator::next or ReentrantLock$NonfairSync::initialTryLock. It also crashes in interpreted code but in that case I have not been yet able to decipher the stack and identify the method. When the crash happens it is under compiled code that has been loaded from the AOT cache. The methods cited above are actually the ones whose return address is on the stack when the SEGV happens. The preceding instruction in both cases is a branch to a stub that links to another method pulled from the cache. So, for ConcurrentHashMap$ValueIterator::next the stub transfers control to ConcurrentHashMap$Traverser::advance. The frame that sits above the return address and frame pointer certainly looks like we are in method advance (the frame is certainly the right size) but somehow we have branched off into hyperspace and hit a SEGV. I assumed this might relate to the GC change I made to the post barrier to indirectly load grain and card shift from the AOTConstants area. The called methods both included object writes and gc post barriers. So, I reverted the GC Barrier code to use hard wired shifts. The error continued to happen in the same places. So, I am leaning towards the idea that this is some other conflict with the GC changes to implement late barriers. I'll keep on debugging and post when I make progress. Perhaps Vladimir or Ashu could check the merge and also the tweak I made to the x86 barrier code to make the register pushes safe. regards, Andrew Dinn -----------
On 10/9/24 9:02 AM, ioi.lam@oracle.com wrote:
I am seeing some failures on linux/x64 with various assertion failures in the GC code, either during collection or gc verification. I merged up to ReentrantLock$NonfairSync::initialTryLocku d504df0e3d7928f0e7ff10ace03e002953c6a58f from Andrew's repo.
runtime/cds/appcds/applications/HelidonQuickStartSE.java#leyden runtime/cds/appcds/applications/HelidonQuickStartSE.java#leyden_old runtime/cds/appcds/applications/JavacBench.java#leyden runtime/cds/appcds/applications/JavacBench.java#leyden_old runtime/cds/appcds/applications/SpringPetClinic.java#leyden_old runtime/cds/appcds/leyden/ExcludedClasses.java
A sample stack trace
# SIGSEGV (0xb) at pc=0x00007ff1d9260f78, pid=472145, tid=472252 V [libjvm.so+0x636f78] oopDesc::size_given_klass(Klass*)+0x28 (oop.inline.hpp:196) V [libjvm.so+0xd9e66d] G1ParScanThreadState::do_copy_to_survivor_space(G1HeapRegionAttr, oop, markWord)+0x9d (g1ParScanThreadState.cpp:480) V [libjvm.so+0xda43db] void G1ParScanThreadState::do_oop_evac<narrowOop>(narrowOop*)+0x12b (g1ParScanThreadState.cpp:221) V [libjvm.so+0xd9fa98] G1ParScanThreadState::trim_queue_to_threshold(unsigned int)+0x488 (g1ParScanThreadState.cpp:334) V [libjvm.so+0xdb96fa] G1ParScanThreadState::trim_queue_partially()+0x3a (g1ParScanThreadState.inline.hpp:53) V [libjvm.so+0xdbe943] void G1ScanHRForRegionClosure::ChunkScanner::on_dirty_cards<G1ScanHRForRegionClosure::scan_heap_roots(G1HeapRegion*)::{lambda(unsigned char*, unsigned char*)#1}>(G1ScanHRForRegionClosure::scan_heap_roots(G1HeapRegion*)::{lambda(unsigned char*, unsigned char*)#1}&&)+0x2c3 (g1RemSet.cpp:519) V [libjvm.so+0xdb6f39] G1RemSet::scan_heap_roots(G1ParScanThreadState*, unsigned int, G1GCPhaseTimes::GCParPhases, G1GCPhaseTimes::GCParPhases, bool)+0x5e9 (g1RemSet.cpp:664) V [libjvm.so+0xdea89b] G1EvacuateRegionsTask::scan_roots(G1ParScanThreadState*, unsigned int)+0x4b (g1YoungCollector.cpp:670) V [libjvm.so+0xdeac6a] G1EvacuateRegionsBaseTask::work(unsigned int)+0x8a (g1YoungCollector.cpp:656) V [libjvm.so+0x1a2b480] WorkerThread::run()+0x80 (workerThread.cpp:70) V [libjvm.so+0x18cc40a] Thread::call_run()+0xba (thread.cpp:245) V [libjvm.so+0x159e0f9] thread_native_entry(Thread*)+0x1c9 (os_linux.cpp:858)
siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000000000010e
I'll run tests on aarch64 as well and will report back when I get the results.
Thanks
- Ioi
On 10/9/24 8:23 AM, ioi.lam@oracle.com wrote:
Thanks Andrew.
I'll test your merge on our CI and then push to premain.
- Ioi
On 10/9/24 7:46 AM, Andrew Dinn wrote:
Hi Ioi/Vladimir,
I merged the rest of master into premain, fixed the conflicts and pushed the result to my premain branch.
https://urldefense.com/v3/__https://github.com/adinn/leyden/tree/premain__;!... The merged code built and ran ok on aarch64. On x86 it built ok but crashed during the assembly run.
The problem was that the x86 post barrier can now be called from C2 with tmp and tmp2 allocated by the register allocator. So, that means both rcx and rscratch1 are possible values for register tmp. This is a problem because the x86 AOT variant of the barrier code pushes rcx and rscratch1 to the stack, loads the region grain shift address in to rscratch1 and the grain shift into into rcx. It has to be able to use rcx in order to be able to execute shrptr(tmp). However, if tmp == rcx then the unshifted value is restored into tmp when rcx is popped.
I pushed a follow-up patch to my premain that avoids this problem. It uses a pushed register to do the shift and ensures that register differs from tmp and rcx.
regards,
Andrew Dinn -----------
On 09/10/2024 07:08, ioi.lam@oracle.com wrote:
In the premain branch, we have made some changes in g1BarrierSetC2.cpp
commit cc111760a0dd123935e039feb19eaac742f865b2 Author: Vladimir Kozlov <vladimir.kozlov@oracle.com> Date: Mon Aug 26 14:21:28 2024
Cleanup AOT runtime constants code
commit c319a3ee2be0236feab88c9be66d71e6e93b53fc Author: Andrew Dinn <adinn@redhat.com> Date: Wed Aug 21 07:48:22 2024
AOT G1 barrier loads region grain and card shifts via runtime constants data area
Recently, the following changes in g1BarrierSetC2.cpp have been pushed to mainline that caused conflicts with the leyden changes.
commit 81ebbb2463df8b014bb209dc4028668fc78e8327 Author: Roberto Castañeda Lozano <rcastanedalo@openjdk.org> Date: Mon Oct 7 01:28:18 2024
8341525: G1: use bit clearing to remove tightly-coupled initialization store pre-barriers
commit 0b467e902d591ae9feeec1669918d1588987cd1c Author: Roberto Castañeda Lozano <rcastanedalo@openjdk.org> Date: Thu Oct 3 01:36:33 2024
8334060: Implementation of Late Barrier Expansion for G1
When I tried merging with mainline, I couldn't merge these changes, so I ended up merging to this one (just an hour prior to 0b467e9), and pushed the result to premain.
commit c6e7e551928c04b74775b5d4c03eb31232aeb2c9 (master) Author: Aleksey Shipilev <shade@openjdk.org> Date: Thu Oct 3 00:25:42 2024
8341091: CDS: Segmented roots array misses roots
Andrew Dinn or Vladimir Kozlov, could you help merging with mainline?
Thanks
- Ioi
Hi Ioi, I believe I have worked out what was causing the aarch64 code to break. A call from archived C2 code to G1BarrierSetRuntime::write_ref_field_pre_entry was not being relocated and so we ended up jumping to an address in the old VM 0xffff76c77578 rather than the correct address in the new VM 0xfffff4c77578. This was happening because the C2 barrier code was refactored to use some new helper methods and one of them, generate_c2_barrier_runtime_call, failed to load its branch target using a RuntimeAddress wrapper static void generate_c2_barrier_runtime_call(MacroAssembler* masm, G1BarrierStubC2* stub, const Register arg, const address runtime_path) { SaveLiveRegisters save_registers(masm, stub); if (c_rarg0 != arg) { __ mov(c_rarg0, arg); } __ mov(c_rarg1, rthread); __ mov(rscratch1, runtime_path); __ blr(rscratch1); } The mov to rscratch1 above needs to be replaced with __ lea(rscratch1, RuntimeAddress(runtime_path)); I have pushed a patch to the branch in my repo and with this patch the ExcludedClasses, JavacBench, MicronautFirstApp, HelidonQuickStart and SpringPetclinic tests all passed. I am not sure what is breaking x86 but it is not the same problem -- the x86 implementation of generate_c2_barrier_runtime_call is defined correctly as follows: static void generate_c2_barrier_runtime_call(MacroAssembler* masm, G1BarrierStubC2* stub, const Register arg, const address runtime_path) { #ifdef _LP64 SaveLiveRegisters save_registers(masm, stub); if (c_rarg0 != arg) { __ mov(c_rarg0, arg); } __ mov(c_rarg1, r15_thread); // rax is a caller-saved, non-argument-passing register, so it does not // interfere with c_rarg0 or c_rarg1. If it contained any live value before // entering this stub, it is saved at this point, and restored after the // call. If it did not contain any live value, it is free to be used. In // either case, it is safe to use it here as a call scratch register. __ call(RuntimeAddress(runtime_path), rax); #else Unimplemented(); #endif // _LP64 } I will move on now to debugging the problems on x86. regards, Andrew Dinn -----------
Hi Ioi, I think I know now have bot aarch64 and x86 code working. The tweak I made after the merge to Ashu's code in G1BarrierSetAssembler::generate_post_barrier_fast_path that selects which registers to be pushed and used as scratch registers was incomplete -- it failed to allow for aliasing of a pushed scratch register to new_val and store_addr. I pushed a follow-up patch to my premain repo which fixes this: https://github.com/adinn/leyden/commit/88b6b4b2770cc976b82a5f4c35fd87c94b947... All the tests that you flagged as failing are now passing with this extra commit. Could you merge that change from here and rerun tests: https://github.com/adinn/leyden/tree/premain regards, Andrew Dinn ----------- On 14/10/2024 15:49, Andrew Dinn wrote:
Hi Ioi,
I believe I have worked out what was causing the aarch64 code to break. A call from archived C2 code to G1BarrierSetRuntime::write_ref_field_pre_entry was not being relocated and so we ended up jumping to an address in the old VM 0xffff76c77578 rather than the correct address in the new VM 0xfffff4c77578.
This was happening because the C2 barrier code was refactored to use some new helper methods and one of them, generate_c2_barrier_runtime_call, failed to load its branch target using a RuntimeAddress wrapper
static void generate_c2_barrier_runtime_call(MacroAssembler* masm, G1BarrierStubC2* stub, const Register arg, const address runtime_path) { SaveLiveRegisters save_registers(masm, stub); if (c_rarg0 != arg) { __ mov(c_rarg0, arg); } __ mov(c_rarg1, rthread); __ mov(rscratch1, runtime_path); __ blr(rscratch1); }
The mov to rscratch1 above needs to be replaced with
__ lea(rscratch1, RuntimeAddress(runtime_path));
I have pushed a patch to the branch in my repo and with this patch the ExcludedClasses, JavacBench, MicronautFirstApp, HelidonQuickStart and SpringPetclinic tests all passed.
I am not sure what is breaking x86 but it is not the same problem -- the x86 implementation of generate_c2_barrier_runtime_call is defined correctly as follows:
static void generate_c2_barrier_runtime_call(MacroAssembler* masm, G1BarrierStubC2* stub, const Register arg, const address runtime_path) { #ifdef _LP64 SaveLiveRegisters save_registers(masm, stub); if (c_rarg0 != arg) { __ mov(c_rarg0, arg); } __ mov(c_rarg1, r15_thread); // rax is a caller-saved, non-argument-passing register, so it does not // interfere with c_rarg0 or c_rarg1. If it contained any live value before // entering this stub, it is saved at this point, and restored after the // call. If it did not contain any live value, it is free to be used. In // either case, it is safe to use it here as a call scratch register. __ call(RuntimeAddress(runtime_path), rax); #else Unimplemented(); #endif // _LP64 }
I will move on now to debugging the problems on x86.
regards,
Andrew Dinn -----------
Hi Andrew, Thanks for making the fixes. I ran tiers 1-3 in our test pipeline and don't see any regression any more. All leyden tests passed. I've merged your commits into the premain branch. Thanks - Ioi On 10/15/24 9:14 AM, Andrew Dinn wrote:
Hi Ioi,
I think I know now have bot aarch64 and x86 code working. The tweak I made after the merge to Ashu's code in G1BarrierSetAssembler::generate_post_barrier_fast_path that selects which registers to be pushed and used as scratch registers was incomplete -- it failed to allow for aliasing of a pushed scratch register to new_val and store_addr.
I pushed a follow-up patch to my premain repo which fixes this:
https://urldefense.com/v3/__https://github.com/adinn/leyden/commit/88b6b4b27...
All the tests that you flagged as failing are now passing with this extra commit.
Could you merge that change from here and rerun tests:
https://urldefense.com/v3/__https://github.com/adinn/leyden/tree/premain__;!...
regards,
Andrew Dinn -----------
On 14/10/2024 15:49, Andrew Dinn wrote:
Hi Ioi,
I believe I have worked out what was causing the aarch64 code to break. A call from archived C2 code to G1BarrierSetRuntime::write_ref_field_pre_entry was not being relocated and so we ended up jumping to an address in the old VM 0xffff76c77578 rather than the correct address in the new VM 0xfffff4c77578.
This was happening because the C2 barrier code was refactored to use some new helper methods and one of them, generate_c2_barrier_runtime_call, failed to load its branch target using a RuntimeAddress wrapper
static void generate_c2_barrier_runtime_call(MacroAssembler* masm, G1BarrierStubC2* stub, const Register arg, const address runtime_path) { SaveLiveRegisters save_registers(masm, stub); if (c_rarg0 != arg) { __ mov(c_rarg0, arg); } __ mov(c_rarg1, rthread); __ mov(rscratch1, runtime_path); __ blr(rscratch1); }
The mov to rscratch1 above needs to be replaced with
__ lea(rscratch1, RuntimeAddress(runtime_path));
I have pushed a patch to the branch in my repo and with this patch the ExcludedClasses, JavacBench, MicronautFirstApp, HelidonQuickStart and SpringPetclinic tests all passed.
I am not sure what is breaking x86 but it is not the same problem -- the x86 implementation of generate_c2_barrier_runtime_call is defined correctly as follows:
static void generate_c2_barrier_runtime_call(MacroAssembler* masm, G1BarrierStubC2* stub, const Register arg, const address runtime_path) { #ifdef _LP64 SaveLiveRegisters save_registers(masm, stub); if (c_rarg0 != arg) { __ mov(c_rarg0, arg); } __ mov(c_rarg1, r15_thread); // rax is a caller-saved, non-argument-passing register, so it does not // interfere with c_rarg0 or c_rarg1. If it contained any live value before // entering this stub, it is saved at this point, and restored after the // call. If it did not contain any live value, it is free to be used. In // either case, it is safe to use it here as a call scratch register. __ call(RuntimeAddress(runtime_path), rax); #else Unimplemented(); #endif // _LP64 }
I will move on now to debugging the problems on x86.
regards,
Andrew Dinn -----------
participants (2)
-
Andrew Dinn
-
ioi.lam@oracle.com