RFR 8240902: JDI shared memory connector can use already closed Handles

Wed Mar 18 20:44:51 UTC 2020

Hi David,

On 3/18/20 4:27 AM, David Holmes wrote:
> Hi Patricio,
>
> On 18/03/2020 6:14 am, Patricio Chilano wrote:
>> Hi all,
>>
>> Please review the following patch:
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8240902
>> Webrev: http://cr.openjdk.java.net/~pchilanomate/8240902/v1/webrev/
>>
>> Calling closeConnection() on an already created/opened connection 
>> includes calls to CloseHandle() on objects that can still be used by 
>> other threads. This can lead to either undefined behavior or, as 
>> detailed in the bug comments, changes of state of unrelated objects. 
>
> This was a really great find!
Thanks!  : )

>> This issue was found while debugging the reason behind some jshell 
>> test failures seen after pushing 8230594. Not as important, but there 
>> are also calls to closeStream() from createStream()/openStream() when 
>> failing to create/open a stream that will return after executing 
>> "CHECK_ERROR(enterMutex(stream, NULL));" without closing the intended 
>> resources. Then, calling closeConnection() could assert if the reason 
>> of the previous failure was that the stream's mutex failed to be 
>> created/opened. These patch aims to address these issues too.
>
> Patch looks good in general. The internal reference count guards 
> deletion of the internal resources, and is itself safe because never 
> actually delete the connection. Thanks for adding the comment about 
> this aspect.
>
> A few items:
>
> Please update copyright year before pushing.
Done.

> Please align ENTER_CONNECTION/LEAVE_CONNECTION macros the same way as 
> STREAM_INVARIANT.
Done.

>  170 unsigned int refcount;
>  171     jint state;
>
> I'm unclear about the use of stream->state and connection->state as 
> guards - unless accessed under a mutex these would seem to at least 
> need acquire/release semantics.
>
> Additionally the reads of refcount would also seem to need to some 
> form of memory synchronization - though the Windows docs for the 
> Interlocked* API does not show how to simply read such a variable! 
> Though I note that the RtlFirstEntrySList method for the "Interlocked 
> Singly Linked Lists" API does state "Access to the list is 
> synchronized on a multiprocessor system." which suggests a read of 
> such a variable does require some form of memory synchronization!
In the case of the stream struct, the state field is protected by the 
mutex field. It is set to STATE_CLOSED while holding the mutex, and 
threads that read it must acquire the mutex first through 
sysIPMutexEnter(). For the cases where sysIPMutexEnter() didn't acquire 
the mutex, we will return something different than SYS_OK and the call 
will exit anyways. All this behaves as before, I didn't change it.

The refcount and state that I added to the SharedMemoryConnection struct 
work together. For a thread closing the connection, setting the 
connection state to STATE_CLOSED has to happen before reading the 
refcount (more on the atomicity of that read later). That's why I added 
the MemoryBarrier() call; which I see it's better if I just move it to 
after setting the connection state to closed. For the threads accessing 
the connection, incrementing the refcount has to happen before reading 
the connection state. That's already provided by the 
InterlockedIncrement() which uses a full memory barrier. In this way if 
the thread closing the connection reads a refcount of 0, then we know 
it's safe to release the resources, since other threads accessing the 
connection will see that the state is closed after incrementing the 
refcount. If the read of refcount is not 0, then it could be that a 
thread is accessing the connection or not (it could have read a state 
connection of STATE_CLOSED after incrementing the refcount), we don't 
know, so we can't release anything. Similarly if the thread accessing 
the connection reads that the state is not closed, then we know it's 
safe to access the stream since anybody closing the connection will 
still have to read refcount which will be at least 1.
As for the atomicity of the read of refcount, from 
https://docs.microsoft.com/en-us/windows/win32/sync/interlocked-variable-access, 
it states that "simple reads and writes to properly-aligned 32-bit 
variables are atomic operations". Maybe I should declare refcount 
explicitly as DWORD32?

Instead of having a refcount we could have done something similar to the 
stream struct and protect access to the connection through a mutex. To 
avoid serializing all threads we could have used SRW locks and only the 
one closing the connection would do AcquireSRWLockExclusive(). It would 
change the state of the connection to STATE_CLOSED, close all handles, 
and then release the mutex. ENTER_CONNECTION() and LEAVE_CONNECTION() 
would acquire and release the mutex in shared mode. But other that maybe 
be more easy to read I don't think the change will be smaller.

>  413 while (attempts>0) {
>
> spaces around >
Done.

> If the loop at 413 never encounters a zero reference_count then it 
> doesn't close the events or the mutex but still returns SYS_OK. That 
> seems wrong but I'm not sure what the right behaviour is here.
I can change the return value to be SYS_ERR, but I don't think there is 
much we can do about it unless we want to wait forever until we can 
release those resources.

> And please wait for serviceability folk to review this.
Sounds good.

Thanks for looking at this David! I will move the MemoryBarrier() and 
change the refcount to be DWORD32 if you are okay with that.

Thanks,
Patricio
> Thanks,
> David
> -----
>
>> Tested in mach5 with the current baseline, tiers1-3 and several runs 
>> of open/test/langtools/:tier1 which includes the jshell tests where 
>> this connector is used. I also applied patch 
>> http://cr.openjdk.java.net/~pchilanomate/8240902/triggerbug/webrev 
>> mentioned in the comments of the bug, on top of the baseline and run 
>> the langtool tests with and without this fix. Without the fix running 
>> around 30 repetitions already shows failures in tests 
>> jdk/jshell/FailOverExecutionControlTest.java and 
>> jdk/jshell/FailOverExecutionControlHangingLaunchTest.java. With the 
>> fix I run several hundred runs and saw no failures. Let me know if 
>> there is any additional testing I should do.
>>
>> As a side note, I see there are a couple of open issues related with 
>> jshell failures (8209848) which could be related to this bug and 
>> therefore might be fixed by this patch.
>>
>> Thanks,
>> Patricio
>>