RFR: 8310031: Parallel: Implement better work distribution for large object arrays in old gen [v15]

Wed Oct 11 09:36:52 UTC 2023

On Tue, 10 Oct 2023 09:42:00 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:

> > Also it only comes from overlapping card table preprocessing with scavenging. I think this could be removed again without losing performance.
> 
> That complexity is uncalled for if its benefit is marginal.

I'll remove it.

> > It could be passed by reference if the query in process_range would be pulled up before the find_first_clean_card call.
> > The implementation should be straightforward. I think I'll experiment with it.
> 
> Could it be updated to not query object-start? That would remove much complexity inside that method.
> 
> Additionally, I wonder if the scanning-dirty-chunk iteration can be simplified a bit [...]

Probably. I thought it would be a good idea (for performance and clarity) to structure the processing as 1. objects reaching in, 2. objects contained in, 3. objects reaching out of the dirty chunk. I have now found that it is neither necessary for performance nor helpful for understanding the code.

I'll push a new version that's supposed to look very much like yours, except that it does the card table preprocessing and keeps a shadow copy of the card table entries corresponding to the current stripe on the stack (so not malloc'ed).
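
To illustrate the on-stack shadow copy idea, here is a rough, standalone sketch. It is not the HotSpot code: the names (`Card`, `kCardsPerStripe`, `process_stripe`, `DirtyRange`), the card values, and the stripe size are all made up for illustration. It also assumes that the stripe's cards are copied into a fixed-size stack buffer, cleared in the real table, and that dirty ranges are then found by scanning only the copy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical card encoding and stripe size, chosen for illustration only.
using Card = std::uint8_t;
constexpr Card kClean = 0xFF;
constexpr Card kDirty = 0x00;
constexpr std::size_t kCardsPerStripe = 128;

// Half-open range [begin, end) of card indices in the real table.
struct DirtyRange {
    std::size_t begin;
    std::size_t end;
};

// Snapshot one stripe into a stack buffer, clear the real entries,
// then locate runs of non-clean cards using only the snapshot.
std::vector<DirtyRange> process_stripe(std::vector<Card>& table,
                                       std::size_t stripe_start) {
    assert(stripe_start <= table.size());
    const std::size_t n = std::min(kCardsPerStripe, table.size() - stripe_start);

    Card shadow[kCardsPerStripe];  // lives on the stack, no malloc
    std::memcpy(shadow, table.data() + stripe_start, n);
    std::memset(table.data() + stripe_start, kClean, n);

    std::vector<DirtyRange> ranges;
    std::size_t i = 0;
    while (i < n) {
        if (shadow[i] == kClean) {
            ++i;
            continue;
        }
        std::size_t j = i;
        while (j < n && shadow[j] != kClean) {
            ++j;
        }
        ranges.push_back({stripe_start + i, stripe_start + j});
        i = j;
    }
    return ranges;
}
```

Because the dirty ranges come from the snapshot, the scan in this sketch does not depend on what happens to the real entries of the stripe after they have been copied and cleared.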

I think it would be a good base for the further enhancements you have in mind, and also suitable for backporting.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14846#issuecomment-1757252721