RFR: 8261391: ZGC crash - SEGV in RevokeOneBias::do_thread [v3]

David Holmes david.holmes at oracle.com
Mon Feb 22 07:50:15 UTC 2021


Correction ...

On 22/02/2021 3:11 pm, David Holmes wrote:
> Hi Erik,
> 
> On 19/02/2021 6:16 pm, Erik Osterlund wrote:
>> Hi David,
>>
>> As for the general problem of accessing a private handle from another 
>> thread, my thinking was that this could only possibly be safe if a 
>> thread ”grabs on to” a remote thread so it’s not concurrently running. 
>> Otherwise doing so would never be safe with or without the new 
>> concurrent root processing code.
> 
> It is safe whenever the two threads use a protocol that ensures safety. 
> So bundling a local handle into a synchronous VM-op or synchronous 
> handshake op is safe, because the originator won't move on until the op 
> has been completed. But any kind of async op would be dangerous - as 
> would trying to "return" a Handle via the op.
> 
>> So the reasoning was that if we make it safe for safepoint operations 
>> to access any thread’s private oops, and for handshakes to access the 
>> handshakee’s private oops, then that should cover that kind of 
>> interactions.
>> However, I did think about the possibility of the handshaker’s private 
>> oops being accessed by the handshakee, and thought like you that you 
>> just shouldn’t do that. You should use global handles for such shared 
>> interactions. But ever since then I have now and then thought ”what if 
>> I missed a handle”. And this bug proves that I did indeed miss a handle.
> 
> The current case seems to me to be what you would normally encounter.
> It is the handshaker that creates the handle and the op and then 
> potentially has the handshakee execute the op - hence the it has to be 
> safe for the handshakee to access the handshaker's handles. The reverse 
> is not expected because, AFAICS, the handshakee never creates the op so 
> it would only expose local handles if it "returned" them via the op - 
> and that is inherently unsafe!

Robbin pointed out that the more general issue of safety here involves 
e.g walking the stack of the handshakee which may contain oops; it isn't 
to per-se make Handles created by the handshakee appear safe.

David
-----

>>
>> Therefore, while I agree that passing a private handle to be used by 
>> another thread is something we shouldn’t do, I would sleep better at 
>> night knowing that it is safe to do so anyway, should it happen, in 
>> case there is another strange interaction that I missed, or some new 
>> code comes in that thinks it’s safe. This is the more robust way of 
>> dealing with it.
> 
> On reflection I can see there are situations where we can construct a 
> safe exchange of data using handles, so yes this must work. But I don't 
> like the fact the code performing the exchange has to be aware that 
> there is some special call that needs to be done to in fact make it safe 
> though (beyond the protocol) - that is error prone. The Handle logic 
> should encapsulate this if possible. And ideally we would detect when 
> such a cross-thread access in in fact unsafe.
> 
> So notwithstanding I think there is more to be looked at here, the 
> current fix is a necessary and desirable one, and I withdraw my 
> (non-binding) objections.
> 
> Thanks,
> David
> -----
> 
>> Thanks,
>> /Erik
>>
>>> On 18 Feb 2021, at 13:48, David Holmes <david.holmes at oracle.com> wrote:
>>>
>>> Correction ...
>>>
>>>> On 18/02/2021 12:22 pm, David Holmes wrote:
>>>> More generally I am concerned by the fact that we have a potential 
>>>> bug if a Handle created by one thread is accessed by another, 
>>>> without first calling start_processing() on the creator of the 
>>>> Handle. This would seem to be error prone to me unless it is 
>>>> actually hidden away inside the Handle implementation - though how 
>>>> to do that efficiently is unclear. I'm very concerned that it is too 
>>>> easy to write code (eg. handing something off to the  service thread 
>>>> or notification thread) that will be buggy.
>>>
>>> I was confusing Handles with Resources in terms of lifecycle - not 
>>> realising Handles are stack-local to a Thread and can only be shared 
>>> with another thread very, very carefully.
>>>
>>> David
>>>
>>>> Perhaps @fisk can comment on the general problem.
>>>> Thanks,
>>>> David
>>>> src/hotspot/share/runtime/handshake.cpp line 84:
>>>>> 82:  public:
>>>>> 83:   AsyncHandshakeOperation(AsyncHandshakeClosure* cl, 
>>>>> JavaThread* target, jlong start_ns)
>>>>> 84:     : HandshakeOperation(cl, target, NULL), 
>>>>> _start_time_ns(start_ns) {}
>>>> Not at all clear why requester is always NULL here. There is 
>>>> obviously a requester thread even in the async case. If that thread 
>>>> created a Handle in the op then don't we need to call 
>>>> start_processing on it?
>>>> src/hotspot/share/runtime/handshake.cpp line 328:
>>>>> 326:
>>>>> 327: void Handshake::execute(HandshakeClosure* hs_cl) {
>>>>> 328:   HandshakeOperation cto(hs_cl, NULL, Thread::current());
>>>> It is unfortunate that we need to manifest the current thread here 
>>>> (and elsewhere).
>>>> -------------
>>>> Changes requested by dholmes (Reviewer).
>>>> PR: https://git.openjdk.java.net/jdk/pull/2571


More information about the hotspot-runtime-dev mailing list