RFR: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing [v8]

Erik Österlund eosterlund at openjdk.java.net
Wed Oct 7 07:00:15 UTC 2020


On Tue, 6 Oct 2020 12:18:39 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

>>> Hi Erik,
>>> Can you give an overview of the use of the "poll word" and its relation to the "poll page" please?
>>> Thanks,
>>> David
>> 
>> Hi David,
>> 
>> Thanks for reviewing this code.
>> 
>> There are various polls in the VM: runtime transitions, interpreter transitions, transitions at returns, native
>> wrappers, transitions in nmethods... and they are sometimes a bit different.
>> 
>> The "poll word" encapsulates enough information to poll either for returns (the stack watermark barrier) or for
>> normal handshakes/safepoints, with a conditional branch. So in principle, we could use the "poll word" for every
>> single poll. A low-order bit is a boolean saying whether a handshake/safepoint is armed, and the rest of the word
>> holds the watermark denoting which frame has armed returns.
>> 
>> The "poll page" is for polls that do not use conditional branches, but use an indirect load instead. It is still
>> used in nmethod loop polls, because I experimentally found conditional branches to perform worse there on one
>> machine, and I did not want to risk regressions. It is also used for VM configurations that do not yet support
>> stack watermark barriers, such as Graal, PPC, S390 and 32-bit platforms. They will hopefully support this mechanism
>> eventually, but keeping the poll page allows a smoother transition. And until it is crystal clear that the
>> conditional-branch loop poll really is fast enough on sufficiently many machines, we might keep it.
>> 
>> Hope this makes sense.  Thanks,
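The poll word encoding described above can be modelled in a few lines. This is a simplified Python sketch, not HotSpot code: the names (`POLL_BIT`, `DISARMED`, `ARMED`, `watermark_word`, `handshake_poll`, `return_poll`) are hypothetical, and it assumes 64-bit words, a downward-growing stack, and a return poll that fires when the stack pointer being returned to lies above the watermark. The disarmed encoding (every bit set except the poll bit) is also an assumption of the model.

```python
# Simplified model of the per-thread "poll word" (illustrative, not HotSpot code).
# Assumptions: 64-bit words; bit 0 is the handshake/safepoint "armed" bit; the other
# bits hold a stack watermark; the stack grows downwards, so a return poll fires when
# the stack pointer being returned to lies above the watermark.

WORD_MASK = (1 << 64) - 1
POLL_BIT = 0x1

# Disarmed (assumed encoding): all bits set except the poll bit, so the bit test is
# clear and no realistic stack pointer compares above the word.
DISARMED = ~POLL_BIT & WORD_MASK

# Armed for a handshake/safepoint: only the poll bit set, so the bit test fires and,
# as a side effect, every return poll fires too.
ARMED = POLL_BIT


def watermark_word(frame_sp: int) -> int:
    """Arm returns into frames above frame_sp without arming a handshake."""
    if frame_sp & POLL_BIT:
        raise ValueError("stack addresses are assumed to be aligned")
    return frame_sp & WORD_MASK


def handshake_poll(poll_word: int) -> bool:
    """Ordinary poll: test the low bit, take the slow path if it is set."""
    return (poll_word & POLL_BIT) != 0


def return_poll(poll_word: int, sp_after_return: int) -> bool:
    """Return poll: a single unsigned compare of the stack pointer against the word."""
    return (sp_after_return & WORD_MASK) > poll_word


if __name__ == "__main__":
    wm = watermark_word(0x7000_0000)
    print(handshake_poll(wm), return_poll(wm, 0x7000_0040))  # False True
```

The point of the shared encoding is that one word answers both questions: a bit test for ordinary polls, and one compare-and-branch at returns.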
>
>> _Mailing list message from [Andrew Haley](mailto:aph at redhat.com) on [hotspot-dev](mailto:hotspot-dev at openjdk.java.net):_
>> 
>> On 06/10/2020 08:22, Erik Österlund wrote:
>> 
>> > > This PR contains the implementation of "JEP 376: ZGC: Concurrent Thread-Stack Processing" (cf.
>> > > https://openjdk.java.net/jeps/376).
>> 
>> One small thing: the couple of uses of lea(InternalAddress) should really be adr;
>> this generates much better code.
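As a rough illustration of why `adr` is attractive: on AArch64 it computes a PC-relative address in one 32-bit instruction, reaching targets within ±1 MiB, whereas materializing an arbitrary absolute 64-bit address generally takes a sequence of `movz`/`movk` instructions. The Python sketch below encodes `ADR` only; `encode_adr` is a hypothetical helper, not part of HotSpot's `MacroAssembler`, and it does not model what `lea` emits.

```python
# Minimal encoder for the AArch64 ADR instruction (illustrative helper, not HotSpot's
# assembler). ADR Xd, label computes Xd = PC + offset with a signed 21-bit byte offset.
ADR_RANGE = 1 << 20  # +/- 1 MiB


def encode_adr(rd: int, offset: int) -> int:
    """Return the 32-bit encoding of `ADR X<rd>, #offset` (offset relative to this instruction)."""
    if not 0 <= rd <= 31:
        raise ValueError("register number must be 0..31")
    if not -ADR_RANGE <= offset < ADR_RANGE:
        raise ValueError("target out of ADR range; a multi-instruction sequence is needed")
    imm = offset & ((1 << 21) - 1)  # 21-bit two's complement
    immlo = imm & 0b11              # low 2 bits go to bits 29..30
    immhi = imm >> 2                # high 19 bits go to bits 5..23
    # Bit 31 (op) is 0 for ADR; bits 24..28 hold the fixed pattern 0b10000.
    return (immlo << 29) | (0b10000 << 24) | (immhi << 5) | rd
```

For example, `encode_adr(0, 0)` yields `0x10000000`, a single instruction regardless of where the code is placed, as long as the target stays within range.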
> 
> Hi Andrew,
> 
> Thanks for having a look. I have applied your patch. That said, this code runs on the safepoint slow path, which
> should be a rather cold path, where threads have to wear coats and gloves. But it does not hurt to optimize the
> encoding further, I suppose.  Thanks,

> 
> *Mailing list message from [David Holmes](mailto:david.holmes at oracle.com) on
>  [serviceability-dev](mailto:serviceability-dev at openjdk.java.net):*
> 
> Hi Erik,
> 
> On 6/10/2020 5:37 pm, Erik Österlund wrote:
> > On Tue, 6 Oct 2020 02:57:00 GMT, David Holmes <dholmes at openjdk.org> wrote:
> > [...]
> 
> Yes, but I am somewhat surprised. The conventional wisdom has always been
> that polling based on the "poison page" approach far outperforms
> explicit load-test-branch approaches.
> 
> Cheers,
> David

When thread-local handshakes were built, both a branch-based and an indirect-load-based prototype were implemented: I
wrote the branch-based solution and Mikael Gerdin wrote the indirect-load-based one, so that we could compare them. He
compared them on many machines and found that branches were sometimes a bit faster and sometimes a bit slower, depending
on the CPU model. The results with indirect loads, however, were more stable from machine to machine, while the
branch-based solution depended more on which CPU model was used. That is why the indirect-load solution was chosen: it
was not always the best, but it was never bad on any machine.
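The two poll styles compared above can be sketched as a toy model. It is illustrative Python, not HotSpot code: `ThreadState`, `branch_poll` and `indirect_load_poll` are hypothetical names, and it assumes that a hardware fault on a protected page, which a signal handler would turn into a slow-path call, can be stood in for by a Python exception.

```python
# Toy comparison of the two poll styles (illustrative, not HotSpot code).
# Assumption: a fault on a protected page, turned into a slow-path call by a signal
# handler, is modelled as a Python exception.

WORD_MASK = (1 << 64) - 1
POLL_BIT = 0x1


class PageFault(Exception):
    """Stands in for the SIGSEGV raised by a load from a protected page."""


class Page:
    def __init__(self, readable: bool) -> None:
        self.readable = readable

    def load(self) -> int:
        if not self.readable:
            raise PageFault
        return 0


GOOD_PAGE = Page(readable=True)   # disarmed: loads succeed
BAD_PAGE = Page(readable=False)   # armed: loads fault


class ThreadState:
    """Hypothetical per-thread polling state holding a poll word and a page pointer."""

    def __init__(self) -> None:
        self.poll_word = ~POLL_BIT & WORD_MASK
        self.polling_page = GOOD_PAGE

    def arm(self) -> None:
        self.poll_word = POLL_BIT
        self.polling_page = BAD_PAGE

    def disarm(self) -> None:
        self.poll_word = ~POLL_BIT & WORD_MASK
        self.polling_page = GOOD_PAGE


def branch_poll(t: ThreadState) -> bool:
    """Load-test-branch: an explicit conditional branch on every poll."""
    return (t.poll_word & POLL_BIT) != 0


def indirect_load_poll(t: ThreadState) -> bool:
    """Indirect load: the fast path is a plain load with no branch; arming swaps the
    page pointer so that the load traps instead."""
    try:
        t.polling_page.load()  # result is ignored; only a fault matters
        return False
    except PageFault:
        return True  # the "signal handler" diverts the thread to the slow path
```

In real compiled code, the branch version adds a test and a normally not-taken branch to every poll, while the load version adds only a load and pays for the rare armed case with a much more expensive trap.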

Since then, C2 has gained loop strip mining, which makes loop polls less frequent. My hypothesis was that, with that
in place, a new evaluation would show that branching is now fine. However, one machine disagreed. To be fair, that
machine is not giving very stable results at all right now, so I am not sure whether there is a real problem. But I
decided to keep the old behaviour for now anyway, as I do not yet have a good reason to change it, nor good enough
proof that it is okay... yet.
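The effect of loop strip mining on poll frequency can be sketched as a toy model (illustrative Python, not C2 code). It assumes a strip length parameter standing in for HotSpot's `-XX:LoopStripMiningIter` flag; `run_strip_mined` and `count_polls` are hypothetical names.

```python
# Toy model of loop strip mining (illustrative, not C2 code).
# Assumption: `strip` plays the role of -XX:LoopStripMiningIter; the inner loop runs at
# most `strip` iterations with no poll, and the outer loop polls once per strip
# instead of once per iteration.
from typing import Callable


def run_strip_mined(n: int, strip: int, body: Callable[[int], None],
                    poll: Callable[[], None]) -> None:
    if strip < 1:
        raise ValueError("strip must be at least 1")
    i = 0
    while i < n:
        end = min(i + strip, n)
        while i < end:  # inner loop: no safepoint poll
            body(i)
            i += 1
        poll()          # outer loop back edge: one poll per strip


def count_polls(n: int, strip: int) -> int:
    polls = 0

    def poll() -> None:
        nonlocal polls
        polls += 1

    run_strip_mined(n, strip, lambda _i: None, poll)
    return polls
```

With `strip=1` every iteration polls, as in a loop without strip mining; with a strip of 1000, a 10,000-iteration loop polls 10 times, so the per-poll cost, whether branch or load, matters correspondingly less.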

-------------

PR: https://git.openjdk.java.net/jdk/pull/296

