RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v15]

Richard Reingruber rrich at openjdk.org
Mon Oct 9 15:34:05 UTC 2023


On Fri, 6 Oct 2023 12:15:40 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:

> I find pre-processing card-table removes much complexity in determining which (part of) obj belongs to current stripe. However, synchronizing with actual scavenging introduce some complexity.

The synchronization complexity is not too bad, though. Also, it only comes from overlapping card table preprocessing with scavenging. I think this overlapping could be removed again without losing performance.

> The fact that `find_first_clean_card` copies the cached-obj-start is easy to miss

Yes, it is easy to miss. I thought it was a minor detail anyway.

> and hard to reason IMO.

It could be passed by reference if the query in `process_range` were pulled up before the `find_first_clean_card` call. Let me know if you think that would be better.

> > we would have a read only copy of the card table only for the current stripe.
> 
> It would still require pre-processing card-table, right? Otherwise, I don't see how one can work around the "interference" across stripes. Maybe this can simplify the impl of `find_first_clean_card`.

That's correct. The implementation should be straightforward. I think I'll experiment with it.
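
For illustration, a minimal sketch of scanning a private, read-only copy of one stripe's cards. This is not HotSpot code: the `Card` encoding and the names `snapshot_stripe` and `first_clean_in_snapshot` are hypothetical, chosen only to show why a snapshot is unaffected by later changes to the shared table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed encoding for this sketch only; the real card values may differ.
using Card = std::uint8_t;
constexpr Card kClean = 0;
constexpr Card kDirty = 1;

// Copy the cards [start, end) of the shared table into a private buffer.
// Later writes to the shared table do not affect the returned copy.
std::vector<Card> snapshot_stripe(const std::vector<Card>& table,
                                  std::size_t start, std::size_t end) {
  assert(start <= end && end <= table.size());
  auto first = table.begin() + static_cast<std::ptrdiff_t>(start);
  auto last = table.begin() + static_cast<std::ptrdiff_t>(end);
  return std::vector<Card>(first, last);
}

// Return the index (relative to the snapshot) of the first clean card at or
// after `from`, or snapshot.size() if there is none.
std::size_t first_clean_in_snapshot(const std::vector<Card>& snapshot,
                                    std::size_t from) {
  for (std::size_t i = from; i < snapshot.size(); ++i) {
    if (snapshot[i] == kClean) {
      return i;
    }
  }
  return snapshot.size();
}
```

Because the scan only reads the private copy, a card cleaned by another worker after the snapshot was taken does not change the result for the current stripe.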

> 
> I am not too concerned about the regression observed for "large (32K) non-array instances", because that pattern is not common in java and the pause-time is still reasonable (<100ms).

Agreed.

> The long-term optimization (or the redemption of the extra-mem-requirement) I have in mind is to use 1 bit (instead of 1 byte) for a card -- Parallel requires only a boolean info for a particular card. One can even pre-alloc two card-tables now that each card-table is 1/8 of its original size, to avoid calling malloc inside young-gc-pause.
> 
> My preference is some simple code without much regression. Ofc, this is quite subjective.

Sure. My first preference would be that the change can be backported. We discussed internally whether the increased memory consumption could be an issue. Since environments that are sensitive to this configure either Serial or G1, we thought it could be OK, at least from our point of view.
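
As a rough illustration of the one-bit-per-card idea quoted above, here is a minimal sketch, assuming a card only needs a dirty/clean flag. `BitCardTable` and its members are hypothetical names, not part of HotSpot, and the sketch ignores concurrency entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: one bit per card instead of one byte per card.
class BitCardTable {
public:
  explicit BitCardTable(std::size_t num_cards)
      : _num_cards(num_cards), _words((num_cards + 63) / 64, 0) {}

  void mark_dirty(std::size_t card) {
    assert(card < _num_cards);
    _words[card / 64] |= (std::uint64_t{1} << (card % 64));
  }

  void clear(std::size_t card) {
    assert(card < _num_cards);
    _words[card / 64] &= ~(std::uint64_t{1} << (card % 64));
  }

  bool is_dirty(std::size_t card) const {
    assert(card < _num_cards);
    return ((_words[card / 64] >> (card % 64)) & 1u) != 0;
  }

  // First clean card in [start, end), or end if all cards there are dirty.
  std::size_t find_first_clean(std::size_t start, std::size_t end) const {
    assert(start <= end && end <= _num_cards);
    for (std::size_t card = start; card < end; ++card) {
      if (!is_dirty(card)) {
        return card;
      }
    }
    return end;
  }

  // Storage used by the bitmap, in bytes.
  std::size_t size_in_bytes() const {
    return _words.size() * sizeof(std::uint64_t);
  }

private:
  std::size_t _num_cards;
  std::vector<std::uint64_t> _words;
};
```

With 1024 cards the bitmap occupies 128 bytes, one eighth of a byte-per-card table of the same length, which is what would leave room for a second pre-allocated table.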

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1753232330

