[lworld] RFR: 8366717: [lworld] Cleanup defensive fixing of JDK-8365996
Marc Chevalier
mchevalier at openjdk.org
Thu Dec 18 08:35:22 UTC 2025
Let's proceed piece by piece.
# `G1BarrierSetAssembler::g1_write_barrier_pre` in `g1BarrierSetAssembler_aarch64.cpp`
https://github.com/openjdk/valhalla/blob/1077e4f9f06336af30d95fc28cbab4d31c9f2a44/src/hotspot/cpu/aarch64/gc/g1/g1BarrierSetAssembler_aarch64.cpp#L216-L220
`push_call_clobbered_registers`/`pop_call_clobbered_registers` should be enough around a runtime call, that's what clobbered registers are.
But let's dig a bit, that will be useful later!
push_call_clobbered_registers()
=> push_call_clobbered_registers_except(exclude = empty set)
=> push(call_clobbered_gp_registers() - exclude, sp) // with exclude = empty set
So, we save at least `call_clobbered_gp_registers` which is defined as:
https://github.com/openjdk/valhalla/blob/1077e4f9f06336af30d95fc28cbab4d31c9f2a44/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L3614-L3620
So, we save r0-r7 and r10-r17
# `CardTableBarrierSetAssembler::oop_store_at` in `cardTableBarrierSetAssembler_aarch64.cpp`
https://github.com/openjdk/valhalla/blob/1077e4f9f06336af30d95fc28cbab4d31c9f2a44/src/hotspot/cpu/aarch64/gc/shared/cardTableBarrierSetAssembler_aarch64.cpp#L116-L124
Digging the history looks like it still makes sense, it doesn't look like an accident. If we remove the if-branch, tests are totally on fire. Let's keep it.
# `CardTableBarrierSetAssembler::oop_store_at` in `cardTableBarrierSetAssembler_x86.cpp`
Same as before. But it's not even guarantee that `InlineTypePassFieldsAsArgs` is true. We can have such a backtrace:
CardTableBarrierSetAssembler::oop_store_at (cardTableBarrierSetAssembler_x86.cpp:184)
CardTableBarrierSetAssembler::store_at (cardTableBarrierSetAssembler_x86.cpp:90)
MacroAssembler::store_heap_oop (macroAssembler_x86.cpp:5515)
do_oop_store (templateTable_x86.cpp:148)
(called, for instance, from TemplateTable::putfield_or_static_helper (templateTable_x86.cpp:2964))
where the last give `r8` for `tmp3`. It is not quite clear to me why we don't have a problem in mainline, because it looks like corrupting address base register is a bad idea given that we use it after.
# `MacroAssembler::unpack_inline_helper` in `macroAssembler_aarch64.cpp`
## First part
https://github.com/openjdk/valhalla/blob/1077e4f9f06336af30d95fc28cbab4d31c9f2a44/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L7163-L7166
Yes, these registers are saved, as we saw above! I've added some asserts to make sure that they are still in the `call_clobbered_gp_registers()` set. But it seems a bit redundant: that's what clobber are, registers that we cannot trust to be kept. If we use a register not in this set, the native callee shouldn't touch it. Better safe than sorry?
## Second Part
https://github.com/openjdk/valhalla/blob/1077e4f9f06336af30d95fc28cbab4d31c9f2a44/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L7269-L7271
This is fine.
Indeed, we might reset the stream just before, but then we exhaust it in the loop.
https://github.com/openjdk/valhalla/blob/1077e4f9f06336af30d95fc28cbab4d31c9f2a44/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L7251-L7262
This is already what we do before:
https://github.com/openjdk/valhalla/blob/1077e4f9f06336af30d95fc28cbab4d31c9f2a44/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L7182-L7183
(and this loop has no break).
So entering the if and running the second reset will just leave the stream in the same state.
About this piece, I also took the liberty of changing a tiny bit the ctor of `ScalarizedInlineArgsStream` as this class was not correct if `step` is not 1 or -1. We could also have fixed it by changing the line 79 in
https://github.com/openjdk/valhalla/blob/1077e4f9f06336af30d95fc28cbab4d31c9f2a44/src/hotspot/share/runtime/signature_cc.hpp#L75-L81
with `_depth = abs(_step)`, but that looks like a recipe for a less explicit constructor, and more obscure code for a feature that is not desirable (skipping elements).
# `gen_c2i_adapter` in `sharedRuntime_aarch64.cpp`
## First part
https://github.com/openjdk/valhalla/blob/1077e4f9f06336af30d95fc28cbab4d31c9f2a44/src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp#L623-L633
After some digging, the comment about r13 doesn't seem correct to me. I've changed it.
The second part is again about trusting `push_call_clobbered_registers` to push enough things. I've added some asserts again.
## Second part
https://github.com/openjdk/valhalla/blob/1077e4f9f06336af30d95fc28cbab4d31c9f2a44/src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp#L645-L647
Yes, but we save too much. `save_live_registers` calls `push_CPU_state(_save_vectors, ...)` defined there:
https://github.com/openjdk/valhalla/blob/1077e4f9f06336af30d95fc28cbab4d31c9f2a44/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L3649-L3673
It will always save registers r0-r29, but also the floating point registers. The difference is that with `save_vectors`, we save the whole vectors, while without `save_vectors`, we save only 1 word (8 bytes) per float register (4 words for 4 registers). So, the question is whether we use more that the lower word of each register for arguments, and... I'm not sure. We can keep saving them as it consumes only a bit of stack space, but it runs the same amount of instructions as not saving them.
But I said we save too much, that's because we will save v0-v31 (or pieces of) anyway, while `c_farg*` and `j_farg*` are only the registers v0-v7. If I understand well, in this context (we are generating an adapter), around a call (for instance to allocate a buffer) we need to save only the registers that are relevant for the calling convention, like the ones that can hold arguments. If the register v25 (for instance) contained something useful for the caller, it should have been saved by the caller before doing the call, and so before entering the adapter. Since v25 is not used by the adapter or the calling convention, we don't need to save it. Is it worth refining `save_live_registers`?
# `gen_c2i_adapter` in `sharedRuntime_x86_64.cpp`
The x86 version of the last thing:
https://github.com/openjdk/valhalla/blob/1077e4f9f06336af30d95fc28cbab4d31c9f2a44/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp#L918-L921
`/*save_vectors*/` is actually `save_wide_vectors`. `save_live_registers` calls `push_FPU_state` anyway which does `fxsave`, which saves XMM0-XMM15. If `save_wide_vectors`, then `save_live_registers` makes sure to save all of ZMM (the rest of ZMM0-ZMM15, as well as the others ZMM). `c_farg*` and `j_farg*` are defined as xmm0-xmm7. XMM registers are already 128 bits (that's 16 bytes), so if we are not using the rest of zmm registers for arguments, it should be fine. Unlike on Aarch64, saving setting `save_wide_vectors` to `true` would create a lot more instructions: only one `fxsave` is enough for the xmm0-15, but the rest must be saved a lot more manually, once piece at a time.
While we probably are good with "only" 16 bytes saved, we need to save only what is used in the calling convention and in the adapter, so my guess would be that we actually only need to save xmm0-xmm7. I'm not sure it's worth improving `save_live_registers` since saving xmm0-xmm15 is done in a single instruction, we would trade a little of stack space for quite a lot more instructions. Maybe it's just fine to over-save.
# `G1BarrierSetAssembler::g1_write_barrier_pre` in `g1BarrierSetAssembler_x86`
https://github.com/openjdk/valhalla/blob/1077e4f9f06336af30d95fc28cbab4d31c9f2a44/src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp#L238-L255
This branch was introduced in 4c3231c "Merge tag 'jdk-20+8' into lworld" on 2022-08-31.
The non-Valhalla side:
- was introduced in 9583e36 (8284161: Implementation of Virtual Threads (Preview)) on 2022-05-07
- to replace what a `push_set` from eb4849e (8283327: Add methods to save/restore registers when calling into the VM from C1/interpreter barrier code) on 2022-03-21
- that was itself replacing a few `push` from 2a0986b8 (8199417: Modularize interpreter GC barriers) on 2018-04-11 (which introduced the whole code)
The Valhalla side:
- saving the `j_farg*` was added in 2455c8ed (8251398: [lworld] TestCallingConvention::test36 spuriously fails due to incorrect field value) on 2020-08-11
- the `pusha` comes from d9f3aaa9 (8242210: [lworld] TestCallingConvention::test36 spuriously fails) on 2020-04-09
- which placed the code from 2a0986b8 (8199417: Modularize interpreter GC barriers) mentioned before.
So it seems the merge jdk was done in a very conservative way, keeping both sides that evolved separately. Yet, the Valhalla side is probably overkill.
push_call_clobbered_registers(save_fpu = true)
=> push_call_clobbered_registers(exlude = empty set, save_fpu = true)
=> push_set(call_clobbered_xmm_registers(), gp_area_size) if save_fpu
`call_clobbered_xmm_registers()` is all the xmm on all os, except on windows where it's xmm0-xmm5 + xmm16-xmm_max. That covers the `c_farg*` (since on windows it only goes from 0 to 3), but it doesn't cover the `j_farg*` (that are from 0 to 7, even on windows). But on Windows, only xmm0-xmm5 are considered volatile (clobbers). Since we are calling a native function, it should be fine (that is `call_clobbered_xmm_registers()` and its comment are correct).
Thanks,
Marc
-------------
Commit messages:
- Merge remote-tracking branch 'origin/lworld' into JDK-8366717.investigate
- oop_store_at
- assert in oop_store_at
- Cleanup
- Experimenting g1_write_barrier_pre
- gen_c2i_adapter
- unpack_inline_helper
- Actually no
- Try the thing in oop_store_at
- Asserting
- ... and 2 more: https://git.openjdk.org/valhalla/compare/619ef43b...3f013c81
Changes: https://git.openjdk.org/valhalla/pull/1824/files
Webrev: https://webrevs.openjdk.org/?repo=valhalla&pr=1824&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8366717
Stats: 72 lines in 9 files changed: 17 ins; 39 del; 16 mod
Patch: https://git.openjdk.org/valhalla/pull/1824.diff
Fetch: git fetch https://git.openjdk.org/valhalla.git pull/1824/head:pull/1824
PR: https://git.openjdk.org/valhalla/pull/1824
More information about the valhalla-dev
mailing list