RFR 8240902: JDI shared memory connector can use already closed Handles
David Holmes
david.holmes at oracle.com
Wed Mar 18 23:10:44 UTC 2020
Hi Patricio,
On 19/03/2020 6:44 am, Patricio Chilano wrote:
> Hi David,
>
> On 3/18/20 4:27 AM, David Holmes wrote:
>> Hi Patricio,
>>
>> On 18/03/2020 6:14 am, Patricio Chilano wrote:
>>> Hi all,
>>>
>>> Please review the following patch:
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8240902
>>> Webrev: http://cr.openjdk.java.net/~pchilanomate/8240902/v1/webrev/
>>>
>>> Calling closeConnection() on an already created/opened connection
>>> includes calls to CloseHandle() on objects that can still be used by
>>> other threads. This can lead to either undefined behavior or, as
>>> detailed in the bug comments, changes of state of unrelated objects.
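To make the hazard concrete: Windows is free to recycle a handle value as
soon as CloseHandle() returns, so a racing thread that still holds the raw
value may end up operating on whatever unrelated object now owns it. A
sketch of the race, with invented field names rather than the connector's
actual code:

    /* Thread A, tearing down the connection: */
    CloseHandle(connection->event);   /* the OS may immediately reuse this
                                         handle value for a new object */

    /* Thread B, racing with the teardown: */
    SetEvent(connection->event);      /* if the value was recycled, this
                                         signals some unrelated object */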
>>
>> This was a really great find!
> Thanks! : )
>
>>> This issue was found while debugging the reason behind some jshell
>>> test failures seen after pushing 8230594. Not as important, but there
>>> are also calls to closeStream() from createStream()/openStream() when
>>> failing to create/open a stream that will return after executing
>>> "CHECK_ERROR(enterMutex(stream, NULL));" without closing the intended
>>> resources. Then, calling closeConnection() could assert if the reason
>>> for the previous failure was that the stream's mutex failed to be
>>> created/opened. This patch aims to address these issues too.
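The shape of that second problem is an early-return macro guarding the
cleanup path, so a stream whose mutex was never created/opened bails out of
closeStream() before releasing anything. Paraphrased, not the exact source:

    static jint closeStream(Stream *stream) {
        /* If the stream's mutex was never created/opened, enterMutex()
         * fails and CHECK_ERROR returns right here... */
        CHECK_ERROR(enterMutex(stream, NULL));
        /* ...so the CloseHandle() calls on the stream's events and
         * mutex further down are never reached. */
        /* ... */
    }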
>>
>> Patch looks good in general. The internal reference count guards
>> deletion of the internal resources, and is itself safe because we never
>> actually delete the connection. Thanks for adding the comment about
>> this aspect.
>>
>> A few items:
>>
>> Please update copyright year before pushing.
> Done.
>
>> Please align ENTER_CONNECTION/LEAVE_CONNECTION macros the same way as
>> STREAM_INVARIANT.
> Done.
>
>> 170 unsigned int refcount;
>> 171 jint state;
>>
>> I'm unclear about the use of stream->state and connection->state as
>> guards - unless accessed under a mutex these would seem to at least
>> need acquire/release semantics.
>>
>> Additionally the reads of refcount would also seem to need to some
>> form of memory synchronization - though the Windows docs for the
>> Interlocked* API does not show how to simply read such a variable!
>> Though I note that the RtlFirstEntrySList method for the "Interlocked
>> Singly Linked Lists" API does state "Access to the list is
>> synchronized on a multiprocessor system." which suggests a read of
>> such a variable does require some form of memory synchronization!
> In the case of the stream struct, the state field is protected by the
> mutex field. It is set to STATE_CLOSED while holding the mutex, and
> threads that read it must acquire the mutex first through
> sysIPMutexEnter(). For the cases where sysIPMutexEnter() didn't acquire
> the mutex, we will return something other than SYS_OK and the call
> will exit anyway. All this behaves as before; I didn't change it.
Thanks for clarifying.
> The refcount and state that I added to the SharedMemoryConnection struct
> work together. For a thread closing the connection, setting the
> connection state to STATE_CLOSED has to happen before reading the
> refcount (more on the atomicity of that read later). That's why I added
> the MemoryBarrier() call, which I now see is better placed right after
> setting the connection state to closed. For the threads accessing
> the connection, incrementing the refcount has to happen before reading
> the connection state. That's already provided by the
> InterlockedIncrement() which uses a full memory barrier. In this way if
> the thread closing the connection reads a refcount of 0, then we know
> it's safe to release the resources, since other threads accessing the
> connection will see that the state is closed after incrementing the
> refcount. If the read of refcount is not 0, then it could be that a
> thread is accessing the connection or not (it could have read a
> connection state of STATE_CLOSED after incrementing the refcount); we don't
> know, so we can't release anything. Similarly if the thread accessing
> the connection reads that the state is not closed, then we know it's
> safe to access the stream since anybody closing the connection will
> still have to read refcount which will be at least 1.
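In outline, the protocol you describe looks like this (my sketch with
simplified names, not the patch itself):

    /* Accessor side, roughly what ENTER_CONNECTION expands to: */
    InterlockedIncrement(&connection->refcount);   /* full barrier */
    if (connection->state == STATE_CLOSED) {       /* read after the increment */
        InterlockedDecrement(&connection->refcount);
        return SYS_ERR;            /* the closer may already be releasing */
    }
    /* ... use the connection; LEAVE_CONNECTION decrements refcount ... */

    /* Closer side: */
    connection->state = STATE_CLOSED;
    MemoryBarrier();               /* order the store before reading refcount */
    if (connection->refcount == 0) {
        /* No accessor can get past the state check anymore, so the
         * internal handles can be closed safely. */
    }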
> As for the atomicity of the read of refcount,
> https://docs.microsoft.com/en-us/windows/win32/sync/interlocked-variable-access
> states that "simple reads and writes to properly-aligned 32-bit
> variables are atomic operations". Maybe I should declare refcount
> explicitly as DWORD32?
It isn't the atomicity that's in question with the naked read but the
visibility. Any latency in the visibility of the store done by the
Interlocked*() function should be handled by the retry loop, but what is
to stop the C++ compiler from hoisting the read of refcount out of the
loop? It isn't even volatile (which has a stronger meaning in VS than
standard C++).
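One way to address both points would be to give the field the
volatile-qualified type that the Interlocked* API itself expects, so every
read in the retry loop is a fresh load. A sketch, assuming the field and
loop from the patch:

    typedef struct SharedMemoryConnection {
        /* ... */
        volatile LONG refcount;   /* InterlockedIncrement() takes volatile
                                     LONG*; volatile also stops the compiler
                                     from hoisting reads out of the loop */
        jint state;
    } SharedMemoryConnection;

    /* The closer's retry loop then re-reads refcount on every iteration: */
    while (attempts > 0 && connection->refcount > 0) {
        Sleep(10);                /* hypothetical backoff */
        attempts--;
    }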
> Instead of having a refcount we could have done something similar to the
> stream struct and protect access to the connection through a mutex. To
> avoid serializing all threads we could have used SRW locks and only the
> one closing the connection would do AcquireSRWLockExclusive(). It would
> change the state of the connection to STATE_CLOSED, close all handles,
> and then release the mutex. ENTER_CONNECTION() and LEAVE_CONNECTION()
> would acquire and release the mutex in shared mode. But other than maybe
> being easier to read, I don't think the change would be smaller.
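For comparison, that SRW-based alternative would look roughly like this
(sketch only; the Slim Reader/Writer lock API is in windows.h):

    static SRWLOCK connLock = SRWLOCK_INIT;

    /* ENTER_CONNECTION / LEAVE_CONNECTION as shared acquires: */
    AcquireSRWLockShared(&connLock);
    if (connection->state == STATE_CLOSED) {
        ReleaseSRWLockShared(&connLock);
        return SYS_ERR;
    }
    /* ... use the connection ... */
    ReleaseSRWLockShared(&connLock);

    /* The closer acquires exclusively, so it waits for all accessors: */
    AcquireSRWLockExclusive(&connLock);
    connection->state = STATE_CLOSED;
    /* CloseHandle() the connection's events and mutex here */
    ReleaseSRWLockExclusive(&connLock);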
>
>> 413 while (attempts>0) {
>>
>> spaces around >
> Done.
>
>> If the loop at 413 never encounters a zero refcount then it
>> doesn't close the events or the mutex but still returns SYS_OK. That
>> seems wrong but I'm not sure what the right behaviour is here.
> I can change the return value to be SYS_ERR, but I don't think there is
> much we can do about it unless we want to wait forever until we can
> release those resources.
SYS_ERR would look better, but I see now that the return value is
completely ignored anyway. So we're just going to leak resources if the
loop "times out". I guess this is the best we can do.
Thanks,
David
>
>> And please wait for serviceability folk to review this.
> Sounds good.
>
>
> Thanks for looking at this David! I will move the MemoryBarrier() and
> change the refcount to be DWORD32 if you are okay with that.
>
>
> Thanks,
> Patricio
>> Thanks,
>> David
>> -----
>>
>>> Tested in mach5 with the current baseline, tiers1-3 and several runs
>>> of open/test/langtools/:tier1 which includes the jshell tests where
>>> this connector is used. I also applied patch
>>> http://cr.openjdk.java.net/~pchilanomate/8240902/triggerbug/webrev
>>> mentioned in the comments of the bug, on top of the baseline and ran
>>> the langtools tests with and without this fix. Without the fix, around
>>> 30 repetitions were already enough to show failures in
>>> jdk/jshell/FailOverExecutionControlTest.java and
>>> jdk/jshell/FailOverExecutionControlHangingLaunchTest.java. With the
>>> fix I did several hundred runs and saw no failures. Let me know if
>>> there is any additional testing I should do.
>>>
>>> As a side note, I see there are a couple of open issues related to
>>> jshell failures (8209848) which could be related to this bug and
>>> therefore might be fixed by this patch.
>>>
>>> Thanks,
>>> Patricio
>>>
>