RFR: 8308231: Faster MemAllocator::Allocation checks for verify/notification [v3]

Aleksey Shipilev shade at openjdk.org
Fri Aug 11 14:45:58 UTC 2023


On Mon, 22 May 2023 19:19:01 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> In multi-array allocation benchmarks, there is a hot path through the native VM allocation code that calls many notification methods, even when they would return immediately because the allocation was satisfied from the existing TLAB. Not calling these helper methods from the `MemAllocator::Allocation` constructor/destructor looks like an incremental win for the benchmarks.
>> 
>> Example on M1:
>> 
>> 
>> Benchmark             (size)  Mode  Cnt        Score        Error  Units
>> 
>> # Before
>> MultiArrayAlloc.full       1  avgt   15       74,053 ±      0,869  ns/op
>> MultiArrayAlloc.full       2  avgt   15       87,800 ±      0,931  ns/op
>> MultiArrayAlloc.full       4  avgt   15      124,814 ±      0,615  ns/op
>> MultiArrayAlloc.full       8  avgt   15      188,562 ±      0,785  ns/op
>> MultiArrayAlloc.full      16  avgt   15      313,007 ±      1,108  ns/op
>> MultiArrayAlloc.full      32  avgt   15      640,276 ±      4,560  ns/op
>> MultiArrayAlloc.full      64  avgt   15     1395,220 ±      5,860  ns/op
>> MultiArrayAlloc.full     128  avgt   15     3417,848 ±     11,345  ns/op
>> MultiArrayAlloc.full     256  avgt   15     9955,360 ±    102,057  ns/op
>> MultiArrayAlloc.full     512  avgt   15    27738,002 ±    244,940  ns/op
>> MultiArrayAlloc.full    1024  avgt   15   147507,008 ±   1434,085  ns/op
>> 
>> # After
>> MultiArrayAlloc.full       1  avgt   15       70,434 ±      0,373  ns/op  ;  5% better
>> MultiArrayAlloc.full       2  avgt   15       82,394 ±      0,137  ns/op  ;  7% better
>> MultiArrayAlloc.full       4  avgt   15      108,542 ±      0,129  ns/op  ; 15% better
>> MultiArrayAlloc.full       8  avgt   15      170,697 ±      4,480  ns/op  ; 11% better
>> MultiArrayAlloc.full      16  avgt   15      272,902 ±      0,877  ns/op  ; 15% better
>> MultiArrayAlloc.full      32  avgt   15      524,486 ±      1,447  ns/op  ; 22% better
>> MultiArrayAlloc.full      64  avgt   15     1088,932 ±      2,739  ns/op  ; 17% better
>> MultiArrayAlloc.full     128  avgt   15     3151,144 ±     14,621  ns/op  ;  8% better
>> MultiArrayAlloc.full     256  avgt   15     8455,293 ±     12,656  ns/op  ; 18% better
>> MultiArrayAlloc.full     512  avgt   15    26060,055 ±    116,524  ns/op  ;  6% better
>> MultiArrayAlloc.full    1024  avgt   15   130824,480 ±    831,703  ns/op  ; 13% better
>> 
>> 
>> Additional testing:
>>  - [x] Ad-hoc micro-benchmarks
>>  - [x] Linux x86_64 fastdebug `serviceability/jvmti`
>>  - [x] Linux x86_64 fastdebug `jdk/jfr`
>>  - [x] Linux x86_64 fastdebug `t...
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase.

I force-pushed because the related code had changed a lot and the merge was exceedingly tedious. The `MultiArrayAlloc` benchmarks improve; I am going to post more thorough benchmark results next week. Meanwhile, @iklam, do you want to give this a spin with the [JDK-8310823](https://bugs.openjdk.org/browse/JDK-8310823) prototype? I think it would be sensitive to this change as well.
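
For readers who have not looked at the patch, the idea in the quoted description can be illustrated with a small standalone sketch. This is not the HotSpot code: `ToyTlab`, `AllocationGuard`, `Counters`, and `allocate` are hypothetical names invented for illustration, and the real `MemAllocator::Allocation` does considerably more. The sketch only assumes that the guard's destructor can tell whether the allocation was satisfied from the TLAB, and skips the notification helper entirely in that case instead of calling it and returning early.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical bookkeeping so the effect of the guard can be observed.
struct Counters {
  int notify_calls = 0;  // how often the slow-path notification helper ran
};

// Toy stand-in for a thread-local allocation buffer: bump-pointer
// allocation out of a small fixed region. Not HotSpot's TLAB.
class ToyTlab {
  alignas(std::max_align_t) char _buf[256];
  std::size_t _top = 0;
public:
  void* allocate(std::size_t bytes) {
    const std::size_t a = alignof(std::max_align_t);
    bytes = (bytes + a - 1) & ~(a - 1);          // round up to keep alignment
    if (bytes > sizeof(_buf) - _top) return nullptr;  // does not fit
    void* p = _buf + _top;
    _top += bytes;
    return p;
  }
};

// Hypothetical RAII guard in the spirit of MemAllocator::Allocation.
class AllocationGuard {
  Counters& _counters;
public:
  bool allocated_in_tlab = false;  // set by the allocation path below

  explicit AllocationGuard(Counters& c) : _counters(c) {}

  ~AllocationGuard() {
    // Fast path: the TLAB satisfied the request, so there is nothing to
    // report. Return before touching any notification helper.
    if (allocated_in_tlab) return;
    _counters.notify_calls++;  // stand-in for the notification helpers
  }
};

// Allocate from the TLAB if possible; otherwise fall back to a slow path
// (plain operator new here, purely as a placeholder).
void* allocate(ToyTlab& tlab, std::size_t bytes, Counters& counters) {
  AllocationGuard guard(counters);
  void* p = tlab.allocate(bytes);
  if (p != nullptr) {
    guard.allocated_in_tlab = true;
    return p;
  }
  return ::operator new(bytes);
}
```

In this sketch, a request that fits in the buffer leaves `notify_calls` untouched, while a request that is too large for it takes the fallback path and increments the counter once when the guard goes out of scope.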

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14019#issuecomment-1674890219


More information about the hotspot-gc-dev mailing list