adding rsockets support into JDK
Chris Hegarty
chris.hegarty at oracle.com
Tue Dec 11 19:30:45 UTC 2018
Lucy,
Sure, the small test scenarios can be modified to make them "work". The
bigger question is how the proposed JDK-RDMA implementation code can be
modified to provide the semantics of the `SocketChannel` API.
Issue #1: rread returns EAGAIN. One possible solution could be that the
blocking code path in RdmaSocketChannelImpl could fall back to a
blocking rpoll for POLLIN if the IOUtil.read method invocation returns
IOStatus.UNAVAILABLE. I think this should work, and not have too much of
a negative impact since the fallback will only occur infrequently.
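A minimal sketch of that fallback, not the actual RdmaSocketChannelImpl
code. The `Source` interface, the `UNAVAILABLE` constant, and the
`blockingRead` helper are hypothetical stand-ins for the rsocket-backed
read, IOStatus.UNAVAILABLE, and a blocking rpoll; the sketch assumes the
poll waits for readability.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

class BlockingReadFallback {
    /** Stand-in for IOStatus.UNAVAILABLE (assumption: the real constant is JDK-internal). */
    static final int UNAVAILABLE = -2;

    /** Hypothetical abstraction over a non-blocking rsocket. */
    interface Source {
        /** Returns bytes read, -1 at EOF, or UNAVAILABLE if the read would block (EAGAIN). */
        int read(ByteBuffer dst) throws IOException;

        /** Blocks until the source is readable, e.g. via a blocking rpoll. */
        void awaitReadable() throws IOException;
    }

    /** Blocking read: wait and retry whenever the underlying read reports UNAVAILABLE. */
    static int blockingRead(Source src, ByteBuffer dst) throws IOException {
        for (;;) {
            int n = src.read(dst);
            if (n != UNAVAILABLE) {
                return n;            // bytes read or EOF
            }
            src.awaitReadable();     // the (hopefully infrequent) fallback path
        }
    }

    /** Demo with a fake source that reports UNAVAILABLE twice before delivering 3 bytes. */
    static int[] demo() {
        int[] waits = {0};
        Source fake = new Source() {
            int calls = 0;

            public int read(ByteBuffer dst) {
                if (calls++ < 2) {
                    return UNAVAILABLE;
                }
                dst.put(new byte[] {1, 2, 3});
                return 3;
            }

            public void awaitReadable() {
                waits[0]++;
            }
        };
        try {
            int n = blockingRead(fake, ByteBuffer.allocate(16));
            return new int[] {n, waits[0]};   // {bytes read, number of fallback waits}
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("read=" + r[0] + " waits=" + r[1]);
    }
}
```

The retry loop keeps the common case (data already available) on the
fast path; the wait is only entered after a read that would have blocked.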
Q: can rwrite return EAGAIN? I have not checked yet.
Issue #2: This issue is likely to be encountered mainly during testing,
since a non-blocking connect followed by an accept, on the same thread,
is not all that common in non-test code. That said, the semantics of the
SocketChannel API would lead one to expect it to work. ( I get that
rsocket is not asynchronous, but the semantics of non-blocking channels
imply some asynchronicity ). I wonder if the JDK-RDMA implementation
should have a dedicated thread that "pulls" on unfinished non-blocking
connects that are not subsequently registered with a Selector? Maybe
accepts too? I'm not sure yet.
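For comparison, here is a sketch of the same-thread pattern against the
regular TCP-backed SocketChannel, where it is expected to complete. The
class and method names are illustrative only and nothing here touches
the RDMA code; per the reports in this thread, the accept is the call
that would hang with an rsocket-backed channel.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

class SameThreadConnectAccept {
    /** Non-blocking connect, then blocking accept, then finishConnect, all on one thread. */
    static boolean connectThenAccept() {
        try (ServerSocketChannel listener = ServerSocketChannel.open();
             SocketChannel client = SocketChannel.open()) {
            // Bind to an ephemeral port on the loopback interface.
            listener.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));

            client.configureBlocking(false);
            client.connect(listener.getLocalAddress());  // may return false: connect pending

            // With TCP the kernel completes the handshake, so this blocking accept returns.
            try (SocketChannel accepted = listener.accept()) {
                while (!client.finishConnect()) {
                    Thread.yield();                      // connect still in progress
                }
                return client.isConnected() && accepted.isConnected();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("connected: " + connectThenAccept());
    }
}
```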
-Chris.
> On 11 Dec 2018, at 00:36, Lu, Yingqi <yingqi.lu at intel.com> wrote:
>
> Hi Alan/Chris,
>
> I was able to confirm that connecting on a non-blocking socket causes issues. It happens when connect/accept occurs in the same thread or in different threads in the same process.
>
> Then, I did a small tweak in Chris's sample application by spawning a thread doing rpoll on the connection_fd. Now, the connect/accept works in both of the cases above. Please let me know if this is a valid workaround for the issue.
>
> Performance-wise, this workaround should not impact send/receive at all. It might add a small overhead to the connection setup phase, and only with a non-blocking RDMA socket.
>
> The modified app code is available at
>
> For connect/accept occurring in the same thread: https://cr.openjdk.java.net/~ylu/testNonBlocking_raccept_modified.c
>
> For connect/accept occurring in two different threads: https://cr.openjdk.java.net/~ylu/testNonBlocking_raccept_modified_2threads.c
>
> Thanks,
> Lucy
>
>> -----Original Message-----
>> From: Alan Bateman [mailto:Alan.Bateman at oracle.com]
>> Sent: Saturday, December 8, 2018 8:10 AM
>> To: Chris Hegarty <chris.hegarty at oracle.com>; Lu, Yingqi
>> <yingqi.lu at intel.com>; nio-dev at openjdk.java.net
>> Cc: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Aundhe,
>> Shirish <shirish.aundhe at intel.com>; Kaczmarek, Eric
>> <eric.kaczmarek at intel.com>
>> Subject: Re: adding rsockets support into JDK
>>
>> On 08/12/2018 09:39, Chris Hegarty wrote:
>>> :
>>>
>>> - It has become apparent that mixing blocking and non-blocking
>>> connect/accept operations, in the same thread, may cause issues. For
>>> example, attempting to set up a connected socket on the same host by
>>> issuing a non-blocking connect followed by a blocking accept will
>>> just hang and not make progress [3]. Upon further enquiry it appears
>>> that the programming model for rsocket is subtly different from that
>>> of the regular Berkeley sockets ( at least for the connection
>>> handshake ). It is not immediately clear how to reasonably work
>>> around this issue ( it's not a bug in rdma-core, but more a
>>> fundamental part of its thread-less operation ).
>>>
>> Would it be possible to expand on this to say whether the same issue arises
>> when the non-blocking connect is initiated on a different thread, or in a
>> different process, or even a different machine on the fabric.
>> That is, if the socket is non-blocking and I do a rconnect and then delay before
>> doing anything else on the socket then will the peer doing accept be
>> blocked/hung in the meantime?
>>
>> -Alan