RFR: 8342382: Implementation of JEP G1: Improve Application Throughput with a More Efficient Write-Barrier [v30]
Thomas Schatzl
tschatzl at openjdk.org
Thu Apr 10 11:01:42 UTC 2025
On Wed, 9 Apr 2025 12:48:10 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:
>> src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 101:
>>
>>> 99: }
>>> 100:
>>> 101: void G1BarrierSetAssembler::gen_write_ref_array_post_barrier(MacroAssembler* masm, DecoratorSet decorators,
>>
>> Have you measured the performance impact of inlining this assembly code instead of resorting to a runtime call as done before? Is it worth the maintenance cost (for every platform), risk of introducing bugs, etc.?
>
> I remember a significant impact in some microbenchmarks. It's also inlined in Parallel GC. I do not consider it a big issue with regard to maintenance - these things never really change, and the method is small and contained.
> I will try to redo the numbers.
From our microbenchmarks (higher is better for the thrpt scores, lower is better for the avgt scores):
Current code:
Benchmark                                    (size)   Mode  Cnt       Score       Error  Units
ArrayCopyObject.conjoint_micro                   31  thrpt   15  166136.959 ±  5517.157  ops/ms
ArrayCopyObject.conjoint_micro                   63  thrpt   15  108880.108 ±  4331.112  ops/ms
ArrayCopyObject.conjoint_micro                  127  thrpt   15   93159.977 ±  5025.458  ops/ms
ArrayCopyObject.conjoint_micro                 2047  thrpt   15   17234.842 ±   831.344  ops/ms
ArrayCopyObject.conjoint_micro                 4095  thrpt   15    9202.216 ±   292.612  ops/ms
ArrayCopyObject.conjoint_micro                 8191  thrpt   15    3565.705 ±   121.116  ops/ms
ArrayCopyObject.disjoint_micro                   31  thrpt   15  159106.245 ±  5965.576  ops/ms
ArrayCopyObject.disjoint_micro                   63  thrpt   15   95475.658 ±  5415.267  ops/ms
ArrayCopyObject.disjoint_micro                  127  thrpt   15   84249.979 ±  6313.007  ops/ms
ArrayCopyObject.disjoint_micro                 2047  thrpt   15   10682.650 ±   381.832  ops/ms
ArrayCopyObject.disjoint_micro                 4095  thrpt   15    4471.940 ±   216.439  ops/ms
ArrayCopyObject.disjoint_micro                 8191  thrpt   15    1378.296 ±    33.421  ops/ms
ArrayCopy.arrayCopyObject                       N/A   avgt   15      13.880 ±     0.517  ns/op
ArrayCopy.arrayCopyObjectNonConst               N/A   avgt   15      14.844 ±     0.751  ns/op
ArrayCopy.arrayCopyObjectSameArraysBackward     N/A   avgt   15      11.080 ±     0.703  ns/op
ArrayCopy.arrayCopyObjectSameArraysForward      N/A   avgt   15      11.003 ±     0.135  ns/op
Runtime call:
Benchmark                                    (size)   Mode  Cnt       Score       Error  Units
ArrayCopyObject.conjoint_micro                   31  thrpt   15   73100.230 ± 11079.381  ops/ms
ArrayCopyObject.conjoint_micro                   63  thrpt   15   65039.431 ±  1996.832  ops/ms
ArrayCopyObject.conjoint_micro                  127  thrpt   15   58336.711 ±  2260.660  ops/ms
ArrayCopyObject.conjoint_micro                 2047  thrpt   15   17035.419 ±   524.445  ops/ms
ArrayCopyObject.conjoint_micro                 4095  thrpt   15    9207.661 ±   286.526  ops/ms
ArrayCopyObject.conjoint_micro                 8191  thrpt   15    3264.491 ±    73.848  ops/ms
ArrayCopyObject.disjoint_micro                   31  thrpt   15   84587.219 ±  3007.310  ops/ms
ArrayCopyObject.disjoint_micro                   63  thrpt   15   62815.254 ±  1214.310  ops/ms
ArrayCopyObject.disjoint_micro                  127  thrpt   15   58423.470 ±   285.670  ops/ms
ArrayCopyObject.disjoint_micro                 2047  thrpt   15   10720.462 ±   617.173  ops/ms
ArrayCopyObject.disjoint_micro                 4095  thrpt   15    4178.195 ±   178.942  ops/ms
ArrayCopyObject.disjoint_micro                 8191  thrpt   15    1374.268 ±    44.290  ops/ms
ArrayCopy.arrayCopyObject                       N/A   avgt   15      19.667 ±     0.740  ns/op
ArrayCopy.arrayCopyObjectNonConst               N/A   avgt   15      21.243 ±     1.891  ns/op
ArrayCopy.arrayCopyObjectSameArraysBackward     N/A   avgt   15      16.645 ±     0.504  ns/op
ArrayCopy.arrayCopyObjectSameArraysForward      N/A   avgt   15      17.409 ±     0.705  ns/op
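Obviously, the impact diminishes with larger arrays: for 31 to 127 elements the inlined barrier gives roughly 1.4x to 2.3x the throughput of the runtime call, while from 2047 elements up the two variants score close to each other. The ArrayCopy avgt benchmarks take roughly 40% to 60% longer with the runtime call. I think the inlined code is worth the effort in this case.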
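
To make the comparison between the two tables easier to read, here is a minimal sketch that divides the inlined ("Current code") throughput by the runtime-call throughput for the conjoint_micro rows. The class name `BarrierSpeedup` and the method `speedup` are hypothetical names chosen for illustration and are not part of the PR or the benchmark suite; the scores are copied from the tables above, and the error margins are ignored.

```java
public class BarrierSpeedup {
    // Ratio of inlined throughput to runtime-call throughput (ops/ms, higher is better).
    // A value above 1.0 means the inlined barrier scored higher.
    static double speedup(double inlined, double runtimeCall) {
        return inlined / runtimeCall;
    }

    public static void main(String[] args) {
        // ArrayCopyObject.conjoint_micro scores, copied from the two tables.
        int[] sizes = {31, 63, 127, 2047, 4095, 8191};
        double[] inlined = {166136.959, 108880.108, 93159.977, 17234.842, 9202.216, 3565.705};
        double[] runtimeCall = {73100.230, 65039.431, 58336.711, 17035.419, 9207.661, 3264.491};

        for (int i = 0; i < sizes.length; i++) {
            System.out.printf("conjoint_micro size=%d speedup=%.2fx%n",
                    sizes[i], speedup(inlined[i], runtimeCall[i]));
        }
    }
}
```

The ratio is about 2.3x at 31 elements and falls to about 1.0x at 4095 elements. The error columns matter here: the 31-element runtime-call score has an error of about 15% of its value, so the small-array ratios should be read as approximate.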
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/23739#discussion_r2037086410