RFC: Parallel deferred updates

Thu Aug 11 13:41:23 UTC 2022

Hi,

We've been running SPECjbb on AWS Graviton3 with ParallelGC and often
see the "Deferred Updates" phase taking 20-25% of the total compaction
time in a full GC cycle.  Here's a typical example:

[305.992s][trace][gc,phases] GC(544) Par Compact 211.060ms
[306.063s][trace][gc,phases] GC(544) Deferred Updates 71.239ms
[306.063s][info ][gc,phases] GC(544) Compaction Phase 282.669ms

The problem seems to be that SPECjbb allocates a number of very large
object arrays (between 64kB and 2MB) which cross region boundaries, so their
interior oops cannot be updated during the normal parallel compaction
phase.  The updates are then deferred until the end of the GC cycle,
when they are processed serially.  Processing each of these large arrays
can take multiple milliseconds, so it seems like a good candidate for
doing in parallel.  AFAIK there is no correctness problem
with this as all the objects have been relocated by that point, and it
has been suggested in the past [1], although not implemented as far as I
can tell.
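
To illustrate the idea (this is not the code from the patch, just a
minimal standalone sketch): since every object has already been moved,
each deferred object can be updated independently, so workers can simply
claim objects from a shared atomic counter.  DeferredObject, forward()
and update_deferred_parallel() are hypothetical names invented for this
example, and forward() is a stand-in for the real relocation lookup.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a large object array whose interior
// references still hold pre-compaction addresses.
struct DeferredObject {
  std::vector<std::size_t> slots;
};

// Hypothetical forwarding function mapping an old address to a new one.
// A real collector would consult its relocation metadata instead.
inline std::size_t forward(std::size_t old_addr) { return old_addr + 1000; }

// Update all deferred objects using num_workers threads.  Each worker
// claims the next unprocessed object via an atomic counter, so no two
// workers ever touch the same object.
inline void update_deferred_parallel(std::vector<DeferredObject>& objs,
                                     unsigned num_workers) {
  if (num_workers == 0) num_workers = 1;  // always make progress
  std::atomic<std::size_t> next{0};
  auto worker = [&]() {
    for (;;) {
      std::size_t i = next.fetch_add(1, std::memory_order_relaxed);
      if (i >= objs.size()) return;       // nothing left to claim
      for (std::size_t& slot : objs[i].slots) slot = forward(slot);
    }
  };
  std::vector<std::thread> threads;
  for (unsigned w = 0; w < num_workers; ++w) threads.emplace_back(worker);
  for (std::thread& t : threads) t.join();
}
```

Claiming whole objects keeps the sketch simple; it does not try to
split a single very large array across several workers.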

This patch is a simple proof of concept:

https://github.com/nick-arm/jdk/commit/95e0ad3fb7dec6fcac20e9727b9cdb32821c477f

It improves critical-jOPS by about 1% on AWS c7g.16xlarge (averaged over
10 runs), and the median pause time for full GC drops from 262ms to
203ms.  I ran some other common benchmarks like DaCapo and couldn't see
any obvious regressions.  This patch doesn't fork off the worker task
unless it encounters at least one deferred object: in the relatively
common case where there are no deferred objects it's quicker to zip
through the regions on a single thread.
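
Roughly, the fast path could look like the sketch below.  It is an
illustration under my own assumptions rather than an excerpt from the
patch: Region, has_deferred and process_deferred() are hypothetical
names, and the parallel part is abstracted into a callback that receives
the index of the first region needing work.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical per-region record; has_deferred marks a region with an
// object whose interior references still need updating.
struct Region {
  bool has_deferred = false;
};

// Scan regions on the calling thread.  Only when a deferred object is
// found is run_parallel_from invoked (with the index of that region) to
// hand the remaining work to a worker task.  Returns true if it forked.
inline bool process_deferred(
    const std::vector<Region>& regions,
    const std::function<void(std::size_t)>& run_parallel_from) {
  for (std::size_t i = 0; i < regions.size(); ++i) {
    if (regions[i].has_deferred) {
      run_parallel_from(i);  // pay the fork cost only when needed
      return true;
    }
  }
  return false;              // common case: nothing deferred, no fork
}
```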

Does this sound like a reasonable approach?  If so I can create a formal
JBS ticket / PR.

[1] https://markmail.org/message/k6zc3r2ujq5wqy6k

--
Thanks,
Nick