RFR: 8322484: 22-b26 Regression in J2dBench-bimg_misc-G1 (and more) on Windows-x64 and macOS-x64
Thomas Schatzl
tschatzl at openjdk.org
Thu Jan 25 16:26:42 UTC 2024
On Thu, 25 Jan 2024 14:23:29 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:
>> Hi all,
>>
>> please review this improvement to managing region pin counts in g1.
>>
>> Some applications do millions of `Get/ReleasePrimitiveArrayCritical` operations per second, in particular some of the j2dbench benchmarks (e.g. vimg_copyarea 10M/s), on some platforms. Every pin/unpin results in an atomic operation that is the cause for these slowdown.
>>
>> Only Windows seems to be significantly affected.
>>
>> This suggested change implements a per-region thread local cache storing the current pin/unpin refcount difference, writing it back only when a thread pins/unpin an object in a different region.
>>
>> For these benchmarks this often reduces the amount of atomic operations to none, or a few handful; the worst improvement I have seen is that effective atomic operations were reduced to 1/10th. Overall all the j2dbench benchmark scores improve.
>>
>> There is a remaining issue with the `vimg_shapes_gradient` J2dbench subbenchmark: comparing the original results (before integration of region pinning) with latest jdk23 results, there is a regression of about 5%; this is caused by the backout of a bad compiler change (https://bugs.openjdk.org/browse/JDK-8322985). This will be fixed by its redo CR https://bugs.openjdk.org/browse/JDK-8323116.
>>
>> Testing: tier1-3, j2dbench, dacapo, specj*, renaissance benchmarks
>>
>> Thanks,
>> Thomas
>
> src/hotspot/share/gc/g1/g1YoungCollector.cpp line 544:
>
>> 542: assert(G1ThreadLocalData::pin_count_cache(thread).count() == 0, "must be flushed");
>> 543: }
>> 544: #endif
>
> It's not obvious to see this assert in this context. I wonder if it's cleaner (or more natural) if it's moved to `JavaThreadRetireTLABAndFlushLogs`, where we flush the cache.
It was meant similarly to the call to `_g1h->verifier()->check_region_attribute_table()` call to verify post conditions for that GC phase.
Putting it right after `JavaThreadRetireTLABAndFlushLogs` seems too close to the change to kind of make it kind of redundant - i.e. that task is small enough to understand it at a glance anyway (well, me at least).
Thinking again, it might be most useful to put this into a method (like mentioned `check_region_attribute_table`) and call it there to be able to use it elsewhere during debugging if needed.
Of course, we can also just remove this - what do you think?
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/17552#discussion_r1466620347
More information about the hotspot-gc-dev
mailing list