[aarch64-port-dev ] RFR: 8242029: AArch64: skip G1 array copy pre-barrier if marking not active
Nick Gasson
nick.gasson at arm.com
Thu Apr 9 08:59:44 UTC 2020
On 04/08/20 20:38 pm, Andrew Haley wrote:
> On 4/8/20 7:22 AM, Nick Gasson wrote:
>> Do you think this is safe and worth doing?
>
> Please forgive me for turning this into a rather extreme thought
> experiment: if we hand-translate all GC runtime methods into all
> targets, we have an NxM problem, #collectors * #targets. So it's hard
> to justify without some heavy usage. And also, it means that if any of
> these runtime methods change, we'd risk falling behind on AArch64.
>
> Can you show us the assembly instructions that we'd save?
So I'm suggesting doing the following:
--- a/src/hotspot/cpu/aarch64/gc/g1/g1BarrierSetAssembler_aarch64.cpp
+++ b/src/hotspot/cpu/aarch64/gc/g1/g1BarrierSetAssembler_aarch64.cpp
@@ -87,13 +87,43 @@ void G1BarrierSetAssembler::gen_write_ref_array_pre_barrier(MacroAssembler* masm
 void G1BarrierSetAssembler::gen_write_ref_array_post_barrier(MacroAssembler* masm, DecoratorSet decorators,
                                                              Register start, Register count, Register scratch, RegSet saved_regs) {
-  __ push(saved_regs, sp);
-  assert_different_registers(start, count, scratch);
+
+  assert_different_registers(start, count, scratch, rscratch1, rscratch2);
   assert_different_registers(c_rarg0, count);
+
+  const Register card_addr = scratch;
+  const Register end_card_addr = rscratch1;
+
+  Label skip, slowpath, next;
+
+  __ cbz(count, skip);
+
+  __ lsr(card_addr, start, CardTable::card_shift);
+
+  __ lea(end_card_addr, Address(start, count, Address::lsl(LogBytesPerHeapOop)));
+  __ lsr(end_card_addr, end_card_addr, CardTable::card_shift);
+
+  __ load_byte_map_base(rscratch2);
+  __ add(card_addr, card_addr, rscratch2);
+  __ add(end_card_addr, end_card_addr, rscratch2);
+
+  __ bind(next);
+  __ ldrb(rscratch2, Address(card_addr));
+  __ cmpw(rscratch2, (int)G1CardTable::g1_young_card_val());
+  __ br(Assembler::NE, slowpath);
+  __ cmp(card_addr, end_card_addr);
+  __ br(Assembler::EQ, skip);
+  __ add(card_addr, card_addr, 1);
+  __ b(next);
+
+  __ bind(slowpath);
+  __ push(saved_regs, sp);
   __ mov(c_rarg0, start);
   __ mov(c_rarg1, count);
   __ call_VM_leaf(CAST_FROM_FN_PTR(address, G1BarrierSetRuntime::write_ref_array_post_entry), 2);
   __ pop(saved_regs, sp);
+
+  __ bind(skip);
 }
(And change the call sites to not pass rscratch1 as scratch.)
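To make the control flow of the fast path easier to follow, here is a rough standalone C++ model of what the generated code checks. It is only a sketch, not HotSpot code: the card table is modelled as a plain vector indexed from address 0, and `needs_slow_path`, `kCardShift`, `kOopSize`, `kYoungCard` and `kCleanCard` are hypothetical names with placeholder values chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative constants only; these are assumptions, not the real HotSpot values.
constexpr unsigned kCardShift = 9;         // assumed 512-byte cards
constexpr std::size_t kOopSize = 8;        // assumed uncompressed 8-byte oops
constexpr std::uint8_t kYoungCard = 2;     // stand-in for g1_young_card_val()
constexpr std::uint8_t kCleanCard = 0xff;  // stand-in for any non-young card value

// Returns true when the runtime call (the slow path) would still be made,
// false when every card covering the copied range is already young.
bool needs_slow_path(const std::vector<std::uint8_t>& cards,
                     std::uintptr_t start, std::size_t count) {
    if (count == 0) {
        return false;                       // cbz count, skip
    }
    std::size_t card = start >> kCardShift;
    // Like the patch, the end is derived from start + count * oop size,
    // i.e. the address one past the last copied element.
    std::size_t end_card = (start + count * kOopSize) >> kCardShift;
    assert(end_card < cards.size());        // the model's table must cover the range

    for (;;) {
        if (cards[card] != kYoungCard) {
            return true;                    // br NE, slowpath
        }
        if (card == end_card) {
            return false;                   // br EQ, skip
        }
        ++card;                             // add card_addr, card_addr, 1
    }
}
```

In this model a copy into a range whose cards are all young returns false and skips the call entirely, while a single non-young card anywhere in the range falls back to the existing runtime call.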
It has a nice speedup on the ArrayCopy microbenchmarks, but I agree this
sort of thing is a maintenance burden if it doesn't affect real
workloads.
With JDK-8242029:
Benchmark                                    Mode  Cnt   Score   Error  Units
ArrayCopy.arrayCopyObject                    avgt   15  82.314 ± 0.641  ns/op
ArrayCopy.arrayCopyObjectNonConst            avgt   15  87.351 ± 6.820  ns/op
ArrayCopy.arrayCopyObjectSameArraysBackward  avgt   15  54.272 ± 1.445  ns/op
ArrayCopy.arrayCopyObjectSameArraysForward   avgt   15  54.596 ± 1.329  ns/op
With the above modification:
Benchmark                                    Mode  Cnt   Score   Error  Units
ArrayCopy.arrayCopyObject                    avgt   15  58.913 ± 1.265  ns/op
ArrayCopy.arrayCopyObjectNonConst            avgt   15  64.682 ± 8.147  ns/op
ArrayCopy.arrayCopyObjectSameArraysBackward  avgt   15  36.866 ± 1.319  ns/op
ArrayCopy.arrayCopyObjectSameArraysForward   avgt   15  30.445 ± 3.719  ns/op
Thanks,
Nick