RFR: 8322484: 22-b26 Regression in J2dBench-bimg_misc-G1 (and more) on Windows-x64 and macOS-x64

Kim Barrett kbarrett at openjdk.org
Thu Jan 25 15:17:37 UTC 2024


On Wed, 24 Jan 2024 12:38:09 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

> Hi all,
> 
>   please review this improvement to managing region pin counts in g1.
> 
> Some applications do millions of `Get/ReleasePrimitiveArrayCritical` operations per second, in particular some of the j2dbench benchmarks (e.g. vimg_copyarea 10M/s), on some platforms. Every pin/unpin results in an atomic operation that is the cause for these slowdown. 
> 
> Only Windows seems to be significantly affected.
> 
> This suggested change implements a per-region thread local cache storing the current pin/unpin refcount difference, writing it back only when a thread pins/unpin an object in a different region.
> 
> For these benchmarks this often reduces the amount of atomic operations to none, or a few handful; the worst improvement I have seen is that effective atomic operations were reduced to 1/10th. Overall all the j2dbench benchmark scores improve.
> 
> There is a remaining issue with the `vimg_shapes_gradient` J2dbench subbenchmark: comparing the original results (before integration of region pinning) with latest jdk23 results, there is a regression of about 5%; this is caused by the backout of a bad compiler change (https://bugs.openjdk.org/browse/JDK-8322985). This will be fixed by its redo CR https://bugs.openjdk.org/browse/JDK-8323116.
> 
> Testing: tier1-3, j2dbench, dacapo, specj*, renaissance benchmarks
> 
> Thanks,
>   Thomas

Changes requested by kbarrett (Reviewer).

src/hotspot/share/gc/g1/g1RegionPinCache.hpp line 31:

> 29: #include "utilities/globalDefinitions.hpp"
> 30: 
> 31: // Holds the pinned object count increment for the given region for a Java thread.

s/increment/adjustment/ ?  Also perhaps s/the pinned/the pending pinned/.

src/hotspot/share/gc/g1/g1RegionPinCache.hpp line 34:

> 32: // I.e. the _count value may actually be negative temporarily if pinning operations
> 33: // were interleaved between two regions.
> 34: class G1RegionPinCache : public StackObj {

A bit more discussion of the rationale would probably make this easier to understand.  That is, this avoids
updating the global pinned count for a region (an atomic operation) in the very common case that a thread
increments and then decrements pinning for an object (so its containing region) without any other intervening
pinning operations or garbage collections.

src/hotspot/share/gc/g1/g1RegionPinCache.inline.hpp line 42:

> 40: inline void G1RegionPinCache::dec_count(uint region_idx) {
> 41:   if (region_idx == _region_idx) {
> 42:     --_count;

I think _count should never underflow here.  Consider adding an assertion.

-------------

PR Review: https://git.openjdk.org/jdk/pull/17552#pullrequestreview-1843949540
PR Review Comment: https://git.openjdk.org/jdk/pull/17552#discussion_r1466514630
PR Review Comment: https://git.openjdk.org/jdk/pull/17552#discussion_r1466523130
PR Review Comment: https://git.openjdk.org/jdk/pull/17552#discussion_r1466512776


More information about the hotspot-gc-dev mailing list