Long pause time of Final Mark in latest 11u shenandoah

Tue Sep 1 02:38:15 UTC 2020

Hi Aleksey,

> -----Original Message-----
> From: Aleksey Shipilev [mailto:shade at redhat.com]
> Sent: September 1, 2020 1:00
> To: Liang Mao <maoliang.ml at alibaba-inc.com>; Roman Kennke
> <rkennke at redhat.com>; shenandoah-dev at openjdk.java.net
> Subject: Re: Long pause time of Final Mark in latest 11u shenandoah
> 
> On 8/31/20 6:31 AM, Liang Mao wrote:
> > Thanks very much for your quick reply! I tried the flush interval but
> > it didn't work. Then I looked into the code and reduced the value
> > of ShenandoahSATBBufferSize, which resolved the problem. But right now I
> > don't have the testing environment and cannot do more verification. I will
> > let you know once I have double-checked.
> 
> Ah. Here is the thing: current FlushInterval only works when application comes
> with a full SATB buffer to the filtering code, which can then decide to flush if the
> period had expired. If mutator thread has something in the buffer, but _not_
> anything else -- that would make the thread sit on the hidden elements until
> Final Mark. If you reduce ShenandoahSATBBufferSize, that probably makes
> those low-occupancy buffers submitted to filtering code, where flush deals with
> them.

Yes, I see: the forced flush only happens when the local buffer is full, so
reducing the buffer size can serve as a workaround in my case.
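
To illustrate the flush-on-full behavior, here is a minimal toy sketch in Java.
It is not HotSpot code: LocalBuffer, enqueue and hiddenAfter are made-up names,
and the model assumes a buffer is handed over only when it becomes full. It
only counts how many entries stay hidden in the thread-local buffer.

```java
import java.util.ArrayList;
import java.util.List;

class SatbBufferSketch {
    /** Toy stand-in for a per-thread SATB buffer (hypothetical, not HotSpot code). */
    static final class LocalBuffer {
        private final int capacity;
        private final List<Object> entries = new ArrayList<>();
        private final List<Object> completed; // queue visible to concurrent marking

        LocalBuffer(int capacity, List<Object> completed) {
            this.capacity = capacity;
            this.completed = completed;
        }

        /** Records a "previous" value; hands the buffer over only when it is full. */
        void enqueue(Object previous) {
            entries.add(previous);
            if (entries.size() == capacity) {
                completed.addAll(entries);
                entries.clear();
            }
        }

        /** Entries that marking cannot see yet. */
        int hidden() {
            return entries.size();
        }
    }

    /** Returns how many entries remain hidden after the given number of stores. */
    static int hiddenAfter(int capacity, int stores) {
        List<Object> completed = new ArrayList<>();
        LocalBuffer buffer = new LocalBuffer(capacity, completed);
        for (int i = 0; i < stores; i++) {
            buffer.enqueue(i);
        }
        return buffer.hidden();
    }

    public static void main(String[] args) {
        System.out.println(hiddenAfter(1024, 1000)); // buffer never fills: 1000 hidden
        System.out.println(hiddenAfter(32, 1000));   // 31 full hand-overs: 8 hidden
    }
}
```

In this model a smaller capacity bounds how much a thread can sit on, which
matches what I observed, but it does not remove the leftover entries entirely.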

> 
> > BTW, do you have any experience with the performance impact of reducing
> > the default value of ShenandoahSATBBufferSize? Is it possible that
> > there would be more "snapshots" in Shenandoah than in G1, because G1
> > SATB only handles old objects?
> 
> SATB handles all reference stores, and it records "previous" values in the fields.
> So it is largely irrelevant if there are "old" or "young" objects. I'd expect the
> same kind of throughput loss as you would do with G1.

IMHO, a large object graph held in a local SATB queue could be "floating
garbage" that the concurrent marking threads have not reached. If so, I guess
that is more likely to happen with "young" objects.

> 
> Really, the proper way to solve this is to "handshake" the mutator threads
> before finalizing the concurrent mark, and by doing so force them to flush
> however small SATB buffers they sit on.

Ideally, a handshake between the mutator threads and the marking threads would
resolve the problem completely. The concurrent marking thread would notify the
mutator threads to flush their remaining buffers, and concurrent marking would
eventually stop once there are no more filtered objects. But mutator threads
may not be running, which would block the handshake. We might also be able to
have the concurrent marking thread force-flush the mutator threads' local
queues under a lock. Or we could introduce an additional safepoint to flush.
Things seem a bit complicated. Do we already have a plan for this?
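
To make the first two options concrete, here is a rough single-threaded toy
model in Java. It is only a sketch under my own assumptions, not a proposal for
the actual implementation: Mutator, poll, flushTo and requestFlush are
hypothetical names, and real code would need proper synchronization between
the threads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class HandshakeFlushSketch {
    /** Toy mutator with a private SATB-like buffer (all names hypothetical). */
    static final class Mutator {
        final List<Object> local = new ArrayList<>();
        boolean blocked;        // true when the thread is not running
        boolean flushRequested; // set by the marker, honored at the next poll

        /** Moves local entries to the shared queue; the lock models the "by locks" option. */
        synchronized int flushTo(List<Object> shared) {
            int flushed = local.size();
            shared.addAll(local);
            local.clear();
            flushRequested = false;
            return flushed;
        }

        /** A running mutator checks for a pending request at its poll points. */
        void poll(List<Object> shared) {
            if (flushRequested) {
                flushTo(shared);
            }
        }
    }

    /** Marker side: ask running threads to flush, flush blocked ones on their behalf. */
    static void requestFlush(List<Mutator> mutators, List<Object> shared) {
        for (Mutator m : mutators) {
            if (m.blocked) {
                m.flushTo(shared);
            } else {
                m.flushRequested = true;
            }
        }
    }

    /** Runs one scenario and returns the number of entries visible to marking. */
    static int demo() {
        List<Object> shared = new ArrayList<>();
        Mutator running = new Mutator();
        Mutator sleeping = new Mutator();
        sleeping.blocked = true;
        running.local.add("a");
        running.local.add("b");
        sleeping.local.add("c");

        requestFlush(Arrays.asList(running, sleeping), shared); // marker flushes "c"
        running.poll(shared);                                   // mutator flushes "a", "b"
        return shared.size();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 3
    }
}
```

The blocked thread is the case I am worried about: here the marker simply
flushes on its behalf, which is only safe if the thread cannot wake up and
touch its buffer in the middle of the flush.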

> 
> --
> Thanks,
> -Aleksey

Thanks,
Liang