RFR: 8308231: Faster MemAllocator::Allocation checks for verify/notification [v3]
Albert Mingkun Yang
ayang at openjdk.org
Mon May 22 21:08:54 UTC 2023
On Mon, 22 May 2023 19:19:01 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
>> In multi-array allocation benchmarks, there is a hot path through the native VM allocation code that calls many notification methods, even when they would return immediately because the allocation was satisfied from the existing TLAB. Not calling these helper methods from the `MemAllocator::Allocation` constructor/destructor looks like an incremental win for the benchmarks.
>>
>> Example on M1:
>>
>>
>> Benchmark (size) Mode Cnt Score Error Units
>>
>> # Before
>> MultiArrayAlloc.full 1 avgt 15 74,053 ± 0,869 ns/op
>> MultiArrayAlloc.full 2 avgt 15 87,800 ± 0,931 ns/op
>> MultiArrayAlloc.full 4 avgt 15 124,814 ± 0,615 ns/op
>> MultiArrayAlloc.full 8 avgt 15 188,562 ± 0,785 ns/op
>> MultiArrayAlloc.full 16 avgt 15 313,007 ± 1,108 ns/op
>> MultiArrayAlloc.full 32 avgt 15 640,276 ± 4,560 ns/op
>> MultiArrayAlloc.full 64 avgt 15 1395,220 ± 5,860 ns/op
>> MultiArrayAlloc.full 128 avgt 15 3417,848 ± 11,345 ns/op
>> MultiArrayAlloc.full 256 avgt 15 9955,360 ± 102,057 ns/op
>> MultiArrayAlloc.full 512 avgt 15 27738,002 ± 244,940 ns/op
>> MultiArrayAlloc.full 1024 avgt 15 147507,008 ± 1434,085 ns/op
>>
>> # After
>> MultiArrayAlloc.full 1 avgt 15 70,434 ± 0,373 ns/op ; 5% better
>> MultiArrayAlloc.full 2 avgt 15 82,394 ± 0,137 ns/op ; 7% better
>> MultiArrayAlloc.full 4 avgt 15 108,542 ± 0,129 ns/op ; 15% better
>> MultiArrayAlloc.full 8 avgt 15 170,697 ± 4,480 ns/op ; 11% better
>> MultiArrayAlloc.full 16 avgt 15 272,902 ± 0,877 ns/op ; 15% better
>> MultiArrayAlloc.full 32 avgt 15 524,486 ± 1,447 ns/op ; 22% better
>> MultiArrayAlloc.full 64 avgt 15 1088,932 ± 2,739 ns/op ; 28% better
>> MultiArrayAlloc.full 128 avgt 15 3151,144 ± 14,621 ns/op ; 8% better
>> MultiArrayAlloc.full 256 avgt 15 8455,293 ± 12,656 ns/op ; 18% better
>> MultiArrayAlloc.full 512 avgt 15 26060,055 ± 116,524 ns/op ; 6% better
>> MultiArrayAlloc.full 1024 avgt 15 130824,480 ± 831,703 ns/op ; 13% better
>>
>>
>> Additional testing:
>> - [x] Ad-hoc micro-benchmarks
>> - [x] Linux x86_64 fastdebug `serviceability/jvmti`
>> - [x] Linux x86_64 fastdebug `jdk/jfr`
>> - [x] Linux x86_64 fastdebug `t...
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision:
>
> - Reshuffle and simplify
> - Merge branch 'master' into JDK-8308231-memalloc-check-faster
> - Touch up comment
> - Merge branch 'master' into JDK-8308231-memalloc-check-faster
> - Hide more stuff
> - Touchups
> - Branch
> - Fix
src/hotspot/share/gc/shared/memAllocator.cpp line 98:
> 96:
> 97: if ((is_real_allocation || _tlab_end_reset_for_sample) &&
> 98: JvmtiExport::should_post_sampled_object_alloc()) {
The same check is already done in the caller.
On this note, I think checks that rely on information from outside `class Allocation` should not live at this level. (Of course, this is very subjective.)
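To illustrate the point about the duplicated check, here is a minimal standalone sketch. It is not HotSpot code: `Flags`, `Counters`, `notify_rechecking`, `notify_local_only`, and the two `run_*` drivers are hypothetical names invented for this example, and `sampling_enabled` only stands in for a global query such as `JvmtiExport::should_post_sampled_object_alloc()`. It counts how often the global flag is consulted when the callee repeats a check the caller has already made.

```cpp
// Hypothetical stand-in for global VM state; not HotSpot's real API.
struct Flags {
  bool sampling_enabled = false;  // plays the role of a should_post_* query
};

struct Counters {
  int global_checks = 0;  // how many times the global flag was consulted
  int notifications = 0;  // how many sampling notifications were posted
};

// Variant A: the callee consults the global flag again,
// although the caller has already tested it.
inline void notify_rechecking(const Flags& f, Counters& c, bool is_real_allocation) {
  c.global_checks++;
  if (is_real_allocation && f.sampling_enabled) {
    c.notifications++;
  }
}

// Variant B: the callee uses only allocation-local information;
// the global check stays in the caller.
inline void notify_local_only(Counters& c, bool is_real_allocation) {
  if (is_real_allocation) {
    c.notifications++;
  }
}

// Drives n_allocs allocations; every slow_every-th one counts as a
// "real" (non-TLAB) allocation. Uses variant A.
inline Counters run_rechecking(const Flags& f, int n_allocs, int slow_every) {
  Counters c;
  for (int i = 0; i < n_allocs; i++) {
    bool is_real_allocation = (i % slow_every == 0);
    c.global_checks++;  // the caller's check
    if (f.sampling_enabled) {
      notify_rechecking(f, c, is_real_allocation);
    }
  }
  return c;
}

// Same driver, but with variant B: one global check per allocation.
inline Counters run_caller_checks(const Flags& f, int n_allocs, int slow_every) {
  Counters c;
  for (int i = 0; i < n_allocs; i++) {
    bool is_real_allocation = (i % slow_every == 0);
    c.global_checks++;  // the only check
    if (f.sampling_enabled) {
      notify_local_only(c, is_real_allocation);
    }
  }
  return c;
}
```

Both variants post the same notifications; they differ only in how many times the global flag is read while sampling is enabled. Whether that difference is measurable in the real allocation path is a separate question that this sketch does not answer.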
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/14019#discussion_r1201102058