RFR: 8261391: ZGC crash - SEGV in RevokeOneBias::do_thread [v3]
David Holmes
david.holmes at oracle.com
Mon Feb 22 05:11:26 UTC 2021
Hi Erik,
On 19/02/2021 6:16 pm, Erik Osterlund wrote:
> Hi David,
>
> As for the general problem of accessing a private handle from another thread, my thinking was that this could only possibly be safe if a thread ”grabs on to” a remote thread so it’s not concurrently running. Otherwise doing so would never be safe with or without the new concurrent root processing code.
It is safe whenever the two threads use a protocol that ensures safety.
So bundling a local handle into a synchronous VM-op or synchronous
handshake op is safe, because the originator won't move on until the op
has been completed. But any kind of async op would be dangerous - as
would trying to "return" a Handle via the op.
> So the reasoning was that if we make it safe for safepoint operations to access any thread’s private oops, and for handshakes to access the handshakee’s private oops, then that should cover that kind of interactions.
> However, I did think about the possibility of the handshaker’s private oops being accessed by the handshakee, and thought like you that you just shouldn’t do that. You should use global handles for such shared interactions. But ever since then I have now and then thought ”what if I missed a handle”. And this bug proves that I did indeed miss a handle.
The current case seems to me to be what you would normally encounter.
It is the handshaker that creates the handle and the op and then
potentially has the handshakee execute the op - hence the it has to be
safe for the handshakee to access the handshaker's handles. The reverse
is not expected because, AFAICS, the handshakee never creates the op so
it would only expose local handles if it "returned" them via the op -
and that is inherently unsafe!
>
> Therefore, while I agree that passing a private handle to be used by another thread is something we shouldn’t do, I would sleep better at night knowing that it is safe to do so anyway, should it happen, in case there is another strange interaction that I missed, or some new code comes in that thinks it’s safe. This is the more robust way of dealing with it.
On reflection I can see there are situations where we can construct a
safe exchange of data using handles, so yes this must work. But I don't
like the fact the code performing the exchange has to be aware that
there is some special call that needs to be done to in fact make it safe
though (beyond the protocol) - that is error prone. The Handle logic
should encapsulate this if possible. And ideally we would detect when
such a cross-thread access in in fact unsafe.
So notwithstanding I think there is more to be looked at here, the
current fix is a necessary and desirable one, and I withdraw my
(non-binding) objections.
Thanks,
David
-----
> Thanks,
> /Erik
>
>> On 18 Feb 2021, at 13:48, David Holmes <david.holmes at oracle.com> wrote:
>>
>> Correction ...
>>
>>> On 18/02/2021 12:22 pm, David Holmes wrote:
>>> More generally I am concerned by the fact that we have a potential bug if a Handle created by one thread is accessed by another, without first calling start_processing() on the creator of the Handle. This would seem to be error prone to me unless it is actually hidden away inside the Handle implementation - though how to do that efficiently is unclear. I'm very concerned that it is too easy to write code (eg. handing something off to the service thread or notification thread) that will be buggy.
>>
>> I was confusing Handles with Resources in terms of lifecycle - not realising Handles are stack-local to a Thread and can only be shared with another thread very, very carefully.
>>
>> David
>>
>>> Perhaps @fisk can comment on the general problem.
>>> Thanks,
>>> David
>>> src/hotspot/share/runtime/handshake.cpp line 84:
>>>> 82: public:
>>>> 83: AsyncHandshakeOperation(AsyncHandshakeClosure* cl, JavaThread* target, jlong start_ns)
>>>> 84: : HandshakeOperation(cl, target, NULL), _start_time_ns(start_ns) {}
>>> Not at all clear why requester is always NULL here. There is obviously a requester thread even in the async case. If that thread created a Handle in the op then don't we need to call start_processing on it?
>>> src/hotspot/share/runtime/handshake.cpp line 328:
>>>> 326:
>>>> 327: void Handshake::execute(HandshakeClosure* hs_cl) {
>>>> 328: HandshakeOperation cto(hs_cl, NULL, Thread::current());
>>> It is unfortunate that we need to manifest the current thread here (and elsewhere).
>>> -------------
>>> Changes requested by dholmes (Reviewer).
>>> PR: https://git.openjdk.java.net/jdk/pull/2571
More information about the hotspot-runtime-dev
mailing list